Edit and Imputation

**This information has been copied across from the SCROL website and some of it may be out-of-date. This will be updated in due course**

People completing Census forms can sometimes make mistakes or accidentally leave questions unanswered. Optical scanning also introduces errors, for example from poor handwriting or dust on the forms. Users of Census data did not wish to fill in gaps in tables containing "not known" responses by making their own estimates for missing values, nor did they wish to cope with inconsistencies within or between tables caused by mistakes. There was also a danger that different users would estimate for the complete population in different ways, creating inconsistencies between their results. Consequently, the Census offices put in place an Edit and Imputation strategy, with the aim of estimating all missing data and resolving inconsistencies in responses.


For the 2001 Census, the Office for National Statistics, on behalf of all the UK Census offices, devised an Edit and Donor Imputation System (EDIS) and applied it to individual records. It was designed to iron out inconsistencies (edit) and fill in gaps (imputation) in the data received from people and households, except for the voluntary questions on religion.

The One Number Census process, described elsewhere, imputed complete records for whole households and individuals who were missed by the Census.


Edit and Imputation can be sub-divided into the following elements (the code sketch after this list illustrates them):

- Multi-tick rules dealt with cases where more than one answer box was ticked but only one option was allowed.

- Range checks prevented answers from falling outside an acceptable range.

- Filter rules resolved some inconsistencies and set fields to 'No Code Required' where, depending on (say) age, a question need not have been answered. The variable Activity was also derived at this stage.

- A set of Edit rules dealt with responses that appeared to be in error or inconsistent with answers to other questions. Edit either changed the response to a specific answer or blanked it out, leaving imputation to determine the value. Edit also identified unlikely, but not impossible, responses; the Census offices dealt with these according to how improbable the numbers of such responses were.
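
The actual EDIS rules are not reproduced here, but the following Python sketch shows the general shape of the checks above. All names, ranges and thresholds (PersonRecord, the 0-115 age range, the under-16 filter, the spouse/marital-status edit) are illustrative assumptions, not the real Census rules.

```python
from dataclasses import dataclass, field

NO_CODE_REQUIRED = "NCR"  # placeholder code; the real coding scheme is not shown here
MISSING = None            # a blanked item, to be filled in later by imputation

@dataclass
class PersonRecord:
    age: int | None = None
    marital_status: str | None = None     # hypothetical variables
    economic_activity: str | None = None
    relationship: str | None = None       # relationship to household head
    multi_ticks: dict = field(default_factory=dict)  # question name -> boxes ticked

def apply_multi_tick_rule(record: PersonRecord, question: str) -> None:
    # More than one box ticked on a single-answer question:
    # blank the item and leave it to imputation.
    if len(record.multi_ticks.get(question, [])) > 1:
        setattr(record, question, MISSING)

def apply_range_check(record: PersonRecord) -> None:
    # Blank answers outside an acceptable range (limits invented).
    if record.age is not None and not 0 <= record.age <= 115:
        record.age = MISSING

def apply_filter_rules(record: PersonRecord) -> None:
    # Where, given the age, a question need not have been answered,
    # set the field to 'No Code Required' (threshold invented).
    if record.age is not None and record.age < 16:
        record.marital_status = NO_CODE_REQUIRED
        record.economic_activity = NO_CODE_REQUIRED

def apply_edit_rules(record: PersonRecord) -> None:
    # A response inconsistent with other answers is either changed to a
    # specific value or, as here, blanked for imputation to determine.
    if record.relationship == "Spouse" and record.marital_status == "Single":
        record.marital_status = MISSING
```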

The Imputation component filled in all items still missing at the end of the Edit stage. It did this by searching the data for a similar person or household (the donor) whose responses were copied into the gaps on the record with missing answers (the recipient). The Office for National Statistics had drawn up a series of criteria to determine what was meant by 'similar'. These criteria were adapted for Scottish data and entailed defining a suitable selection of variables (Primary Matching Variables) to match on for each missing item. There were rules to cope with recipients with several missing items, and rules to ensure that no donor was used too frequently.
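
The process described above is a form of hot-deck donor imputation. A minimal sketch follows, representing records as dictionaries; the PRIMARY_MATCHING_VARIABLES table, the MAX_DONOR_USES cap and the matching logic are hypothetical stand-ins for the criteria actually drawn up by the Office for National Statistics.

```python
import random
from collections import Counter

# Hypothetical Primary Matching Variables for each item that may be missing;
# the real EDIS criteria were more elaborate and varied by item.
PRIMARY_MATCHING_VARIABLES = {
    "economic_activity": ("age_group", "sex"),
    "marital_status": ("age_group", "sex"),
}

MAX_DONOR_USES = 3  # invented cap so that no donor is used too frequently

def impute_item(recipient: dict, item: str, records: list[dict],
                donor_uses: Counter) -> bool:
    """Copy a missing item from a 'similar' complete record (the donor)
    into the record with the missing answer (the recipient)."""
    keys = PRIMARY_MATCHING_VARIABLES[item]
    candidates = [
        r for r in records
        if r is not recipient
        and r.get(item) is not None                          # donor has an answer
        and all(r.get(k) == recipient.get(k) for k in keys)  # donor is 'similar'
        and donor_uses[id(r)] < MAX_DONOR_USES               # donor not over-used
    ]
    if not candidates:
        return False  # no donor found: the record goes on to Fallback
    donor = random.choice(candidates)
    recipient[item] = donor[item]
    donor_uses[id(donor)] += 1
    return True
```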

If a record failed to find a donor, or if it had been left with any missing relationship codes, the missing values were assigned by Fallback processing. Fallback worked on the principle of a cold deck: it chose a value from a representative set of values applicable to the variable concerned, with probabilities attached to the values in the set.
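
A cold deck differs from the donor search above in that values come from a fixed, pre-prepared distribution rather than from other records in the current data. A minimal sketch, with invented values and probabilities:

```python
import random

# A representative set of values for each variable, with probabilities
# attached; these particular values and weights are invented.
COLD_DECK = {
    "economic_activity": (("Employed", "Unemployed", "Retired", "Student"),
                          (0.55, 0.05, 0.25, 0.15)),
}

def fallback_value(variable: str) -> str:
    """Cold-deck fallback: draw a value for the variable from its
    pre-set distribution."""
    values, weights = COLD_DECK[variable]
    return random.choices(values, weights=weights, k=1)[0]
```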

Even after a record had gone through Fallback processing, some records could still contain missing variables because a consistent answer could not be found. As customers would not accept any missing answers, these households had to be corrected. For person records in households with eight or fewer persons, this was done by substituting a household chosen at random from an error-free set. For all other records it was done by a data file amendment.
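
A rough sketch of the substitution step for small households; the assumption that the replacement is drawn from error-free households of the same size is mine, not stated in the text:

```python
import random

def substitute_household(bad_household: list[dict],
                         error_free_pool: list[list[dict]]) -> list[dict]:
    """Replace the person records of a household (of eight or fewer persons)
    that still has missing values after Fallback with a household chosen at
    random from an error-free set (same-size matching is an assumption)."""
    same_size = [h for h in error_free_pool if len(h) == len(bad_household)]
    return random.choice(same_size) if same_size else bad_household
```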

A data file amendment consists of individually editing the values of the fields in error on each affected record. As this is expensive, such edits were kept to a minimum.

The General Register Office for Scotland has provided more detailed information on data quality. This information will be expanded as analyses of data quality are completed.