Using built-in statistical functions
- Aggregate functions are used perform some kind of mathematical or statistical calculation across a group of rows.
- The rows in each group are determined by the different values in a specified column or columns.
Alternatively you can aggregate across the entire table.
If we wanted to know the minimum
, average
and maximum age
values (numage)
across the whole SN7577 table we could write a query such as this:
SELECT
min(numage),
avg(numage),
max(numage)
FROM SN7577;
This sort of query provides us with a general view of the values for a particular column or field across the whole table.
min
, max
and avg
are builtin aggregate functions in SQLite (and any other SQL database system).
There are other such functions avaialable. https://sqlite.org/lang_aggfunc.html
- It is more likely that we would want to find such values for a range, or multiple ranges of rows where each range is determined by the values of some other column in the table.
Before we do this we will look at how we can find out what different values are contained in a given column.
The Distinct
keyword
- For the SN7577 table the allowable values in many of the columns are listed in the SN7577 document.
But this doesn’t mean that they all actually appear in the data.
- To obtain a list of umique values in a particular column you can use the
Distinct
keyword.
We will use the text version of the SN7577 table for these examples.
For Q1 How would you vote if there were a General Election tomorrow?
there are 11 possibilities listed and another (-1) for a missing value.
To find out which values are in the data we can use the query:
SELECT DISTINCT Q1
FROM SN7577_Text;
We can see from the reults of running this that all 11 values are represented and that there is no missing data in this field.
- You can have more than one column name after the
Distinct
keyword.
the results will include a row for each unique combination of the columns involved
Exercise
Q3 asks for the likelihood of voting on a scale of 1 - 10 (10 - Absolutely certain to vote). Write a query which shows all of the combinations of voting intentions (Q1) and likelihood of voting (Q3).
Solution:
Sorting the results by Q1 gives the political parties in alphabetical order.
SELECT DISTINCT Q1, Q3
FROM SN7577_Text
ORDER BY Q1
The group by
clause to summarise data
- Just knowing the combinations is of limited use. You really want to know
How many
of each of the values there are.
- To do this we use the
Group By
clause.
SELECT Q1,
count(*) as Num_potential_voters
FROM SN7577_Text
GROUP BY Q1
ORDER BY Q1;
This query gives us a count of potential voters for each party.
- three aggregation were performed over the single column Q1.
It is possible to aggregate over multiple columns by specifying them in both the select and the group by clause.
The grouping will take place based on the order of the columns listed in the group by clause.
What is not allowed is specifying a non-aggregated column in the select clause which is not mentioned in the group by clause.
Using the having
clause
In order to filter the rows returned in a non-aggregated query we used the where
clause.
- For an aggregated query the equivalent is the
having
clause.
You use the having
clause by providing it with a filter expression which references one of the aggregated columns.
In a having
clause you can use the column alias to refer to the aggregated column.
SELECT Q1 ,
sum(daily3) as Telegraph_reader
FROM SN7577
GROUP BY Q1
HAVING Telegraph_reader > 5;
Exercise
In the UK the ‘Telegraph’ is regarded as a right-wing newspaper and the Conservatives are considered to be a right-wing political party.
Do the results of the query above support this?
The ‘Mirror’ is a left-wing newspaper and Labour are considered to be a left-wing political party. Re-write the query above to see which group of supporters are more likely to read the Mirror
You can browse the Newspapers table to find out which of the daily columns refers to the Mirror
Solution:
SELECT Q1 ,
sum(daily12) as Mirror_reader
FROM SN7577
GROUP BY Q1
HAVING Mirror_reader > 5;
LS0tCnRpdGxlOiAiQWdncmVnYXRpb25zIgpvdXRwdXQ6IGh0bWxfbm90ZWJvb2sKLS0tCgoKIyBVc2luZyBidWlsdC1pbiBzdGF0aXN0aWNhbCBmdW5jdGlvbnMKCiogQWdncmVnYXRlIGZ1bmN0aW9ucyBhcmUgdXNlZCBwZXJmb3JtIHNvbWUga2luZCBvZiBtYXRoZW1hdGljYWwgb3Igc3RhdGlzdGljYWwgY2FsY3VsYXRpb24gYWNyb3NzIGEgZ3JvdXAgb2Ygcm93cy4gCiogVGhlIHJvd3MgaW4gZWFjaCBncm91cCBhcmUgZGV0ZXJtaW5lZCBieSB0aGUgZGlmZmVyZW50IHZhbHVlcyBpbiBhIHNwZWNpZmllZCBjb2x1bW4gb3IgY29sdW1ucy4gCiogQWx0ZXJuYXRpdmVseSB5b3UgY2FuIGFnZ3JlZ2F0ZSBhY3Jvc3MgdGhlIGVudGlyZSB0YWJsZS4KCgoqIElmIHdlIHdhbnRlZCB0byBrbm93IHRoZSBgbWluaW11bWAsIGBhdmVyYWdlYCBhbmQgYG1heGltdW0gYWdlYCB2YWx1ZXMgYChudW1hZ2UpYCBhY3Jvc3MgdGhlIHdob2xlIFNONzU3NyB0YWJsZSB3ZSBjb3VsZCB3cml0ZSBhIHF1ZXJ5IHN1Y2ggYXMgdGhpczoKCmBgYHNxbApTRUxFQ1QgCiAgICBtaW4obnVtYWdlKSwKICAgIGF2ZyhudW1hZ2UpLAogICAgbWF4KG51bWFnZSkKRlJPTSBTTjc1Nzc7CgpgYGAKCiogVGhpcyBzb3J0IG9mIHF1ZXJ5IHByb3ZpZGVzIHVzIHdpdGggYSBnZW5lcmFsIHZpZXcgb2YgdGhlIHZhbHVlcyBmb3IgYSBwYXJ0aWN1bGFyIGNvbHVtbiBvciBmaWVsZCBhY3Jvc3MgdGhlIHdob2xlIHRhYmxlLgoKKiBgbWluYCAsIGBtYXhgIGFuZCBgYXZnYCBhcmUgYnVpbHRpbiBhZ2dyZWdhdGUgZnVuY3Rpb25zIGluIFNRTGl0ZSAoYW5kIGFueSBvdGhlciBTUUwgZGF0YWJhc2Ugc3lzdGVtKS4gCiogVGhlcmUgYXJlIG90aGVyIHN1Y2ggZnVuY3Rpb25zIGF2YWlhbGFibGUuIApodHRwczovL3NxbGl0ZS5vcmcvbGFuZ19hZ2dmdW5jLmh0bWwKCiogSXQgaXMgbW9yZSBsaWtlbHkgdGhhdCB3ZSB3b3VsZCB3YW50IHRvIGZpbmQgc3VjaCB2YWx1ZXMgZm9yIGEgcmFuZ2UsIG9yIG11bHRpcGxlIHJhbmdlcyBvZiByb3dzIHdoZXJlIGVhY2ggcmFuZ2UgaXMgZGV0ZXJtaW5lZCBieSB0aGUgdmFsdWVzIG9mIHNvbWUgb3RoZXIgY29sdW1uIGluIHRoZSB0YWJsZS4gCiogQmVmb3JlIHdlIGRvIHRoaXMgd2Ugd2lsbCBsb29rIGF0IGhvdyB3ZSBjYW4gZmluZCBvdXQgd2hhdCBkaWZmZXJlbnQgdmFsdWVzIGFyZSBjb250YWluZWQgaW4gYSBnaXZlbiBjb2x1bW4uCgojIyBUaGUgYERpc3RpbmN0YCBrZXl3b3JkCgoqIEZvciB0aGUgU043NTc3IHRhYmxlIHRoZSBhbGxvd2FibGUgdmFsdWVzIGluIG1hbnkgb2YgdGhlIGNvbHVtbnMgYXJlIGxpc3RlZCBpbiB0aGUgU043NTc3IGRvY3VtZW50LiAKKiBCdXQgdGhpcyBkb2VzbuKAmXQgbWVhbiB0aGF0IHRoZXkgYWxsIGFjdHVhbGx5IGFwcGVhciBpbiB0aGUgZGF0YS4KCiogVG8gb2J0YWluIGEgbGlzdCBvZiB1bWlxdWUgdmFsdWVzIGluIGEgcGFydGljdWxhciBjb2x1bW4geW91IGNhbiB1c2UgdGhlIGBEaXN0aW5jdGAga2V5d29yZC4gCiogV2Ugd2lsbCB1c2UgdGhlIHRleHQgdmVyc2lvbiBvZiB0aGUgU043NTc3IHRhYmxlIGZvciB0aGVzZSBleGFtcGxlcy4KCiogRm9yIFExIGBIb3cgd291bGQgeW91IHZvdGUgaWYgdGhlcmUgd2VyZSBhIEdlbmVyYWwgRWxlY3Rpb24gdG9tb3Jyb3c/YCB0aGVyZSBhcmUgMTEgcG9zc2liaWxpdGllcyBsaXN0ZWQgYW5kIGFub3RoZXIgKC0xKSBmb3IgYSBtaXNzaW5nIHZhbHVlLgoKVG8gZmluZCBvdXQgd2hpY2ggdmFsdWVzIGFyZSBpbiB0aGUgZGF0YSB3ZSBjYW4gdXNlIHRoZSBxdWVyeToKCmBgYHNxbApTRUxFQ1QgRElTVElOQ1QgUTEKRlJPTSBTTjc1NzdfVGV4dDsKYGBgCgoqIFdlIGNhbiBzZWUgZnJvbSB0aGUgcmV1bHRzIG9mIHJ1bm5pbmcgdGhpcyB0aGF0IGFsbCAxMSB2YWx1ZXMgYXJlIHJlcHJlc2VudGVkIGFuZCB0aGF0IHRoZXJlIGlzIG5vIG1pc3NpbmcgZGF0YSBpbiB0aGlzIGZpZWxkLgoKKiBZb3UgY2FuIGhhdmUgbW9yZSB0aGFuIG9uZSBjb2x1bW4gbmFtZSBhZnRlciB0aGUgYERpc3RpbmN0YCBrZXl3b3JkLiAKKiB0aGUgcmVzdWx0cyB3aWxsIGluY2x1ZGUgYSByb3cgZm9yIGVhY2ggdW5pcXVlIGNvbWJpbmF0aW9uIG9mIHRoZSBjb2x1bW5zIGludm9sdmVkCgojIyBFeGVyY2lzZSAKUTMgYXNrcyBmb3IgdGhlIGxpa2VsaWhvb2Qgb2Ygdm90aW5nIG9uIGEgc2NhbGUgb2YgMSAtIDEwICgxMCAtIEFic29sdXRlbHkgY2VydGFpbiB0byB2b3RlKS4gV3JpdGUgYSBxdWVyeSB3aGljaCBzaG93cyBhbGwgb2YgdGhlIGNvbWJpbmF0aW9ucyBvZiB2b3RpbmcgaW50ZW50aW9ucyAoUTEpIGFuZCBsaWtlbGlob29kIG9mIHZvdGluZyAoUTMpLgoKIyMjIFNvbHV0aW9uOgpTb3J0aW5nIHRoZSByZXN1bHRzIGJ5IFExIGdpdmVzIHRoZSBwb2xpdGljYWwgcGFydGllcyBpbiBhbHBoYWJldGljYWwgb3JkZXIuCmBgYHNxbApTRUxFQ1QgRElTVElOQ1QgUTEsIFEzCkZST00gU043NTc3X1RleHQKT1JERVIgQlkgUTEKCmBgYAoKIyMgVGhlIGBncm91cCBieWAgY2xhdXNlIHRvIHN1bW1hcmlzZSBkYXRhCgoqIEp1c3Qga25vd2luZyB0aGUgY29tYmluYXRpb25zIGlzIG9mIGxpbWl0ZWQgdXNlLiBZb3UgcmVhbGx5IHdhbnQgdG8ga25vdyBgSG93IG1hbnlgIG9mIGVhY2ggb2YgdGhlIHZhbHVlcyB0aGVyZSBhcmUuIAoqIFRvIGRvIHRoaXMgd2UgdXNlIHRoZSBgR3JvdXAgQnlgIGNsYXVzZS4KCmBgYHNxbApTRUxFQ1QgUTEsCiAgICAgICBjb3VudCgqKSBhcyBOdW1fcG90ZW50aWFsX3ZvdGVycwpGUk9NIFNONzU3N19UZXh0CkdST1VQIEJZIFExCk9SREVSIEJZIFExOwoKYGBgCgoqIFRoaXMgcXVlcnkgZ2l2ZXMgdXMgYSBjb3VudCBvZiBwb3RlbnRpYWwgdm90ZXJzIGZvciBlYWNoIHBhcnR5LgoKKiB0aHJlZSBhZ2dyZWdhdGlvbiB3ZXJlIHBlcmZvcm1lZCBvdmVyIHRoZSBzaW5nbGUgY29sdW1uIFExLiAKKiBJdCBpcyBwb3NzaWJsZSB0byBhZ2dyZWdhdGUgb3ZlciBtdWx0aXBsZSBjb2x1bW5zIGJ5IHNwZWNpZnlpbmcgdGhlbSBpbiBib3RoIHRoZSBzZWxlY3QgYW5kIHRoZSBncm91cCBieSBjbGF1c2UuCgoqIFRoZSBncm91cGluZyB3aWxsIHRha2UgcGxhY2UgYmFzZWQgb24gdGhlIG9yZGVyIG9mIHRoZSBjb2x1bW5zIGxpc3RlZCBpbiB0aGUgZ3JvdXAgYnkgY2xhdXNlLgoKKiBXaGF0IGlzIG5vdCBhbGxvd2VkIGlzIHNwZWNpZnlpbmcgYSBub24tYWdncmVnYXRlZCBjb2x1bW4gaW4gdGhlIHNlbGVjdCBjbGF1c2Ugd2hpY2ggaXMgbm90IG1lbnRpb25lZCBpbiB0aGUgZ3JvdXAgYnkgY2xhdXNlLgoKIyMgVXNpbmcgdGhlIGBoYXZpbmdgIGNsYXVzZQoKKiBJbiBvcmRlciB0byBmaWx0ZXIgdGhlIHJvd3MgcmV0dXJuZWQgaW4gYSBub24tYWdncmVnYXRlZCBxdWVyeSB3ZSB1c2VkIHRoZSBgd2hlcmVgIGNsYXVzZS4gCgoqIEZvciBhbiBhZ2dyZWdhdGVkIHF1ZXJ5IHRoZSBlcXVpdmFsZW50IGlzIHRoZSBgaGF2aW5nYCBjbGF1c2UuCiogWW91IHVzZSB0aGUgYGhhdmluZ2AgY2xhdXNlIGJ5IHByb3ZpZGluZyBpdCB3aXRoIGEgZmlsdGVyIGV4cHJlc3Npb24gd2hpY2ggcmVmZXJlbmNlcyBvbmUgb2YgdGhlIGFnZ3JlZ2F0ZWQgY29sdW1ucy4KCiogSW4gYSBgaGF2aW5nYCBjbGF1c2UgeW91IGNhbiB1c2UgdGhlIGNvbHVtbiBhbGlhcyB0byByZWZlciB0byB0aGUgYWdncmVnYXRlZCBjb2x1bW4uCgpgYGBzcWwKU0VMRUNUIFExICwKICAgICAgIHN1bShkYWlseTMpIGFzIFRlbGVncmFwaF9yZWFkZXIKRlJPTSBTTjc1NzcgCkdST1VQIEJZIFExCkhBVklORyBUZWxlZ3JhcGhfcmVhZGVyID4gNTsKCmBgYAoKKiBJbiB0aGlzIGV4YW1wbGUgYWdncmVnYXRpbmcgYFRlbGVncmFwaCByZWFkZXJzYCBieSB2b3RlciBpbnRlbnRpb25zIGAoUTEpYC4KCiogV2UgYXJlIG9ubHkgaW50ZXJlc3RlZCBpbiB0aGUgZ3JvdXBzIHdoZXJlIHRoZXJlIGFyZSBtb3JlIHRoYW4gNSDigJhUZWxlZ3JhcGjigJkgcmVhZGVycy4KCiMjIEV4ZXJjaXNlIApJbiB0aGUgVUsgdGhlIOKAmFRlbGVncmFwaOKAmSBpcyByZWdhcmRlZCBhcyBhIHJpZ2h0LXdpbmcgbmV3c3BhcGVyIGFuZCB0aGUgQ29uc2VydmF0aXZlcyBhcmUgY29uc2lkZXJlZCB0byBiZSBhIHJpZ2h0LXdpbmcgcG9saXRpY2FsIHBhcnR5LgoKRG8gdGhlIHJlc3VsdHMgb2YgdGhlIHF1ZXJ5IGFib3ZlIHN1cHBvcnQgdGhpcz8KClRoZSDigJhNaXJyb3LigJkgaXMgYSBsZWZ0LXdpbmcgbmV3c3BhcGVyIGFuZCBMYWJvdXIgYXJlIGNvbnNpZGVyZWQgdG8gYmUgYSBsZWZ0LXdpbmcgcG9saXRpY2FsIHBhcnR5LiBSZS13cml0ZSB0aGUgcXVlcnkgYWJvdmUgdG8gc2VlIHdoaWNoIGdyb3VwIG9mIHN1cHBvcnRlcnMgYXJlIG1vcmUgbGlrZWx5IHRvIHJlYWQgdGhlIE1pcnJvcgoKWW91IGNhbiBicm93c2UgdGhlIE5ld3NwYXBlcnMgdGFibGUgdG8gZmluZCBvdXQgd2hpY2ggb2YgdGhlIGRhaWx5IGNvbHVtbnMgcmVmZXJzIHRvIHRoZSBNaXJyb3IKCiMjIyBTb2x1dGlvbjoKYGBgc3FsClNFTEVDVCBRMSAsCiAgICAgICBzdW0oZGFpbHkxMikgYXMgTWlycm9yX3JlYWRlcgpGUk9NIFNONzU3NyAKR1JPVVAgQlkgUTEKSEFWSU5HIE1pcnJvcl9yZWFkZXIgPiA1OwpgYGAKCgoKCgoKCgoKCgo=