How does the database represents missing data

image 1
image 1
SELECT * from 
    SN7577_nulls
WHERE numage is NULL;
image2
image2

Other representations of missing data

  • The SN7577_nulls table was specially created to demonstrate how NULLs appear in DB Browser.
  • The proper SN7577 table does not contain any missing values in the sense of having consequetive ,,s.

  • This is because the SN7577 data is provided by the UKDS (UK Data Service) whose role in part is to clean datasets before making them publicly available.

  • However the data provided to them, could well have missing data.
  • In the case of the SN7577 dataset this is typically dealt with by substituting the missing value with a value of -1.

  • This is explained in the provided data dictionary for the SN7577 dataset.

  • An extract for Q2 is shown below.

image
image
  • You can see from the extract of the SN7577_nulls file above that there are several -1 values in the Q2 field indicating the original data had no value, therefore interpreted as a NULL value for specific rows in Q2.

  • This is very different from rows which have a value of 11 for column Q2 (3rd row from bottom).

  • The value 11 means that the participant refused to provide an answer.
  • The refusal may not tell you which party they are inclined to vote for, but it does convey some kind of information.
  • A NULL value tells you nothing.

  • Different statistical packages like SPSS or Stata have their own way of representing NULL values such as -99 or -999.
  • You need to be aware of how NULL values in your dataset are being represented.

Dealing with missing data

  • Once you know how NULL values are being represented in your data you can find them and allow for them in your SQL queries.
SELECT *
FROM SN7577_nulls
WHERE Q2 = 11;
  • returns 39 rows
SELECT *
FROM SN7577_nulls
WHERE Q2 = -1;
  • returns 898 rows, nearly 70% of the sample.
  • You would have to decide if the reaming 30% was sufficient for you to use in meaningful analysis.

  • If you need to test for actual NULL values in the data, you use the IS operator and the NULL keyword

SELECT *
FROM SN7577_nulls
WHERE numage IS NULL;
  • If you wish to omit rows with NULLs then include the NOT operator.
SELECT *
FROM SN7577_nulls
WHERE numage IS NOT NULL;
LS0tCnRpdGxlOiAiU1FMIE1pc3NpbmcgRGF0YSIKb3V0cHV0OiBodG1sX25vdGVib29rCi0tLQoKIyMgSG93IGRvZXMgdGhlIGRhdGFiYXNlIHJlcHJlc2VudHMgbWlzc2luZyBkYXRhCgoqICBBIG5vdGVkIGVhcmxpZXIsIGFsbCBkYXRhYmFzZSBzeXN0ZW0gaGF2ZSB0aGUgY29uY2VwdCBvZiBhIGBOVUxMYCB2YWx1ZTsgU29tZXRoaW5nIHdoaWNoIGlzIG1pc3NpbmcgYW5kIG5vdGhpbmcgaXMga25vd24gYWJvdXQgaXQuCgoqIEluIERCIEJyb3dzZXIgd2UgY2FuIGNob29zZSBob3cgd2Ugd2FudCBgTlVMTHNgIGluIGEgdGFibGUgdG8gYmUgZGlzcGxheWVkLiAKCiogV2hlbiB3ZSBoYWQgb3VyIGluaXRpYWwgbG9vayBhdCBEQiBCcm93c2VyLCB3ZSB1c2VkIHRoZSBgVmlldyB8IFByZWZlcmVuY2VgIG9wdGlvbiB0byBjaGFuZ2UgdGhlIGJhY2tncm91bmQgY29sb3VyIG9mIGNlbGxzIGluIGEgdGFibGUgd2hpY2ggaGFzIGEgYE5VTExgIHZhbHVlcyBhcyByZWQuIAoKKiBUaGUgZXhhbXBsZSBiZWxvdyBpcyBhIHZlcnNpb24gb2YgdGhlIFNONzU3NyB0YWJsZSB3aXRoIG1vc3Qgb2YgdGhlIGNvbHVtbnMgcmVtb3ZlZCBhbmQgc29tZSBOVUxMIHZhbHVlcyBpbnRyb2R1Y2VkIGluIHRoZSBgbnVtYWdlYCBjb2x1bW4uCgohW2ltYWdlIDFdKGltYWdlcy9TUUxfMDRfTnVsbHNfMDEucG5nKQoKKiBJZiB5b3UgdHlwZSBgPU5VTExgIGluIHRoZSBmaWx0ZXIgYm94IGZvciBgbnVtYWdlYCwgb25seSB0aGUgcm93cyB3aXRoIGBOVUxMYCBpbiBgbnVtYWdlYCB3aWxsIGJlIGRpc3BsYXllZC4KKiBZb3UgY2FuIGdldCB0aGUgc2FtZSByZXN1bHRzIHVzaW5nIHRoZSBmb2xsb3dpbmcgcXVlcnk6CgpgYGBzcWwKU0VMRUNUICogZnJvbSAKICAgIFNONzU3N19udWxscwpXSEVSRSBudW1hZ2UgaXMgTlVMTDsKYGBgCiogVGhpcyB0YWJsZSB3YXMgY3JlYXRlZCBmcm9tIGEgYGNzdmAgZmlsZSB3aGljaCBsb29rcyBsaWtlIHRoaXMKCiFbaW1hZ2UyXShpbWFnZXMvU1FMXzA0X051bGxzXzAyLnBuZykKCiogVGhlIGxpbmUgbnVtYmVycyBvbiB0aGUgbGVmdCBhcmUgZnJvbSB0aGUgdGV4dCBlZGl0b3IgYW5kIGFyZSBub3QgcGFydCBvZiB0aGUgZGF0YS4KKiBZb3UgY2FuIHNlZSB0aGF0IGluIGBsaW5lcyA0LCA5IGFuZCAxNWAgdGhlcmUgYXJlIGNvbnNlcXVldGl2ZSBgLCAsYHMgd2hlcmUgdGhlIG51bWFnZSB2YWx1ZXMgc2hvdWxkIGJlLiBUaGVzZSB2YWx1ZXMgYXJlIG1pc3NpbmcgZnJvbSB0aGUgZGF0YS4KCiMjIyBPdGhlciByZXByZXNlbnRhdGlvbnMgb2YgbWlzc2luZyBkYXRhCgoqIFRoZSBgU043NTc3X251bGxzYCB0YWJsZSB3YXMgc3BlY2lhbGx5IGNyZWF0ZWQgdG8gZGVtb25zdHJhdGUgaG93IE5VTExzIGFwcGVhciBpbiBEQiBCcm93c2VyLiAKKiBUaGUgcHJvcGVyIGBTTjc1NzdgIHRhYmxlIGRvZXMgbm90IGNvbnRhaW4gYW55IG1pc3NpbmcgdmFsdWVzIGluIHRoZSBzZW5zZSBvZiBoYXZpbmcgY29uc2VxdWV0aXZlIGAsLGBzLiAKCiogVGhpcyBpcyBiZWNhdXNlIHRoZSBTTjc1NzcgZGF0YSBpcyBwcm92aWRlZCBieSB0aGUgYFVLRFMgKFVLIERhdGEgU2VydmljZSlgIHdob3NlIHJvbGUgaW4gcGFydCBpcyB0byBjbGVhbiBkYXRhc2V0cyBiZWZvcmUgbWFraW5nIHRoZW0gcHVibGljbHkgYXZhaWxhYmxlLgoKKiBIb3dldmVyIHRoZSBkYXRhIHByb3ZpZGVkIHRvIHRoZW0sIGNvdWxkIHdlbGwgaGF2ZSBtaXNzaW5nIGRhdGEuIAoqIEluIHRoZSBjYXNlIG9mIHRoZSBgU043NTc3YCBkYXRhc2V0IHRoaXMgaXMgdHlwaWNhbGx5IGRlYWx0IHdpdGggYnkgc3Vic3RpdHV0aW5nIHRoZSBtaXNzaW5nIHZhbHVlIHdpdGggYSB2YWx1ZSBvZiBgLTFgLiAgCgoqIFRoaXMgaXMgZXhwbGFpbmVkIGluIHRoZSBwcm92aWRlZCBgZGF0YSBkaWN0aW9uYXJ5YCBmb3IgdGhlIGBTTjc1NzdgIGRhdGFzZXQuIAoKKiBBbiBleHRyYWN0IGZvciBgUTJgIGlzIHNob3duIGJlbG93LgoKIVtpbWFnZV0oaW1hZ2VzL1NRTF8wNF9OdWxsc18wMy5wbmcpCgoqIFlvdSBjYW4gc2VlIGZyb20gdGhlIGV4dHJhY3Qgb2YgdGhlIGBTTjc1NzdfbnVsbHNgIGZpbGUgYWJvdmUgdGhhdCB0aGVyZSBhcmUgc2V2ZXJhbCBgLTFgIHZhbHVlcyBpbiB0aGUgYFEyYCBmaWVsZCBpbmRpY2F0aW5nIHRoZSBvcmlnaW5hbCBkYXRhIGhhZCBubyB2YWx1ZSwgdGhlcmVmb3JlIGludGVycHJldGVkIGFzIGEgYE5VTExgIHZhbHVlIGZvciBzcGVjaWZpYyByb3dzIGluIGBRMmAuCgoqIFRoaXMgaXMgdmVyeSBkaWZmZXJlbnQgZnJvbSByb3dzIHdoaWNoIGhhdmUgYSB2YWx1ZSBvZiBgMTFgIGZvciBjb2x1bW4gYFEyYCAoM3JkIHJvdyBmcm9tIGJvdHRvbSkuIAoKKiBUaGUgYHZhbHVlIDExYCBtZWFucyB0aGF0IHRoZSBwYXJ0aWNpcGFudCBgcmVmdXNlZGAgdG8gcHJvdmlkZSBhbiBhbnN3ZXIuIAoqIFRoZSByZWZ1c2FsIG1heSBub3QgdGVsbCB5b3Ugd2hpY2ggcGFydHkgdGhleSBhcmUgaW5jbGluZWQgdG8gdm90ZSBmb3IsIGJ1dCBpdCBkb2VzIGNvbnZleSBzb21lIGtpbmQgb2YgaW5mb3JtYXRpb24uIAoqICoqQSBgTlVMTGAgdmFsdWUgdGVsbHMgeW91IG5vdGhpbmcuKioKCiogRGlmZmVyZW50IHN0YXRpc3RpY2FsIHBhY2thZ2VzIGxpa2UgYFNQU1NgIG9yIGBTdGF0YWAgaGF2ZSB0aGVpciBvd24gd2F5IG9mIHJlcHJlc2VudGluZyBgTlVMTCB2YWx1ZXNgIHN1Y2ggYXMgYC05OWAgb3IgYC05OTlgLiAKKiBZb3UgbmVlZCB0byBiZSBhd2FyZSBvZiBob3cgTlVMTCB2YWx1ZXMgaW4geW91ciBkYXRhc2V0IGFyZSBiZWluZyByZXByZXNlbnRlZC4KCgojIyMgRGVhbGluZyB3aXRoIG1pc3NpbmcgZGF0YQoKKiBPbmNlIHlvdSBrbm93IGhvdyBgTlVMTGAgdmFsdWVzIGFyZSBiZWluZyByZXByZXNlbnRlZCBpbiB5b3VyIGRhdGEgeW91IGNhbiBmaW5kIHRoZW0gYW5kIGFsbG93IGZvciB0aGVtIGluIHlvdXIgU1FMIHF1ZXJpZXMuCgpgYGBzcWwKU0VMRUNUICoKRlJPTSBTTjc1NzdfbnVsbHMKV0hFUkUgUTIgPSAxMTsKCmBgYAoKKiByZXR1cm5zIDM5IHJvd3MKCmBgYHNxbApTRUxFQ1QgKgpGUk9NIFNONzU3N19udWxscwpXSEVSRSBRMiA9IC0xOwpgYGAKCiogcmV0dXJucyA4OTggcm93cywgbmVhcmx5IDcwJSBvZiB0aGUgc2FtcGxlLiAKKiBZb3Ugd291bGQgaGF2ZSB0byBkZWNpZGUgaWYgdGhlIHJlYW1pbmcgMzAlIHdhcyBzdWZmaWNpZW50IGZvciB5b3UgdG8gdXNlIGluIG1lYW5pbmdmdWwgYW5hbHlzaXMuCgoqIElmIHlvdSBuZWVkIHRvIHRlc3QgZm9yIGFjdHVhbCBgTlVMTGAgdmFsdWVzIGluIHRoZSBkYXRhLCB5b3UgdXNlIHRoZSBgSVNgIG9wZXJhdG9yIGFuZCB0aGUgYE5VTExgIGtleXdvcmQKCmBgYHNxbApTRUxFQ1QgKgpGUk9NIFNONzU3N19udWxscwpXSEVSRSBudW1hZ2UgSVMgTlVMTDsKYGBgCgoqIElmIHlvdSB3aXNoIHRvIGBvbWl0YCByb3dzIHdpdGggTlVMTHMgdGhlbiBpbmNsdWRlIHRoZSBgTk9UYCBvcGVyYXRvci4KCgpgYGBzcWwKU0VMRUNUICoKRlJPTSBTTjc1NzdfbnVsbHMKV0hFUkUgbnVtYWdlIElTIE5PVCBOVUxMOwoKYGBgCgoKCg==