Census Data Editing: Structure and Within Record Editing

UNSD-UNESCAP Regional Workshop on Census Data Processing: Contemporary technologies for data capture, methodology and practice of data editing, documentation and archiving

Bangkok, Thailand, 15-19 September 2008



Part I: Structure Editing

Summary

Part I: Structure Edits

What are structure edits?

Geography edits

Hierarchy of records

Correspondence between housing and population records

Editing relationships in a household

Family nuclei

What are structure edits?

Structure edits check coverage and relationships between different units: persons, households, housing units, enumeration areas, etc. Specifically, they check that:

all households and collective quarters records within an enumeration area are present and are in the proper order;

all occupied housing units have person records, but vacant units have no person records;

households have neither duplicate nor missing person records;

enumeration areas have neither duplicate nor missing housing records.
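
The checks listed above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the record layout (dicts with the keys `id`, `occupied` and `persons`) and the function name `structure_errors` are hypothetical and do not come from the workshop material.

```python
from collections import Counter


def structure_errors(housing_units):
    """Return (unit_id, message) pairs for one enumeration area.

    `housing_units` is a list of dicts with hypothetical keys:
    'id', 'occupied' (bool) and 'persons' (list of person line numbers).
    """
    errors = []
    # Enumeration area level: no duplicate housing records.
    unit_counts = Counter(unit["id"] for unit in housing_units)
    for unit_id, count in unit_counts.items():
        if count > 1:
            errors.append((unit_id, "duplicate housing record"))
    for unit in housing_units:
        persons = unit["persons"]
        # Occupied units need person records; vacant units must have none.
        if unit["occupied"] and not persons:
            errors.append((unit["id"], "occupied unit without person records"))
        if not unit["occupied"] and persons:
            errors.append((unit["id"], "vacant unit with person records"))
        # Household level: person numbers should run 1..n without repeats or gaps.
        if sorted(persons) != list(range(1, len(persons) + 1)):
            errors.append((unit["id"], "duplicate or missing person record"))
    return errors


ea = [
    {"id": 1, "occupied": True, "persons": [1, 2, 3]},
    {"id": 2, "occupied": False, "persons": [1]},
    {"id": 3, "occupied": True, "persons": [1, 1, 3]},
    {"id": 3, "occupied": True, "persons": []},
]
errors = structure_errors(ea)
```

Detecting missing housing records would also need the expected list of units from the enumeration documents, which this sketch does not model.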

Geography edits

Each EA must have the right geographic codes (city, province, region...)

Every housing unit in an EA should be entered and every record must have a valid EA code

The capture process must check this before editing of data commences

If errors remain, it is best to find the right code, for example by returning to the enumeration documents and correcting it manually.

Hierarchy of records


1_EA
    2_Housing unit
        4_Individual
        4_Individual
    2_Housing unit
    3_Collective living quarter
        4_Individual
        4_Individual
1_EA


Hierarchy of records

Type 1 (EA) followed by new Type 1 (if original EA empty) or Type 2 (Housing unit) or Type 3 (Collective Living Quarter)

Particular case of homeless people: create a dummy housing record to make structural checking easier

Type 2 (Housing Unit) followed by Type 1, 2 or 3 (if original dwelling vacant) or Type 4 (if original dwelling occupied)

Type 3 (Collective Living Quarter) followed by Type 4 (Individual)

If not occupied, empty CLQ allowed?

Type 4 (Individual) followed by Type 4 (other individual in the same dwelling or collective living quarter), or Type 2 or 3 (other dwelling or CLQ) or Type 1 (new EA)
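
The succession rules above amount to a small transition table, which can be checked mechanically. The following is a minimal sketch, not a production edit program. Three things are assumptions: the names `ALLOWED_NEXT` and `sequence_errors`, the requirement that a file begins with a Type 1 record, and the `allow_empty_clq` switch, which exists only because the slide leaves the empty-CLQ question open. Occupancy of a housing unit is not modelled here, so Type 2 may be followed by any type.

```python
# Record types: 1 = EA, 2 = housing unit, 3 = collective living quarter,
# 4 = individual. Each key maps to the types allowed to follow it.
ALLOWED_NEXT = {
    1: {1, 2, 3},
    2: {1, 2, 3, 4},
    3: {4},
    4: {1, 2, 3, 4},
}


def sequence_errors(record_types, allow_empty_clq=False):
    """Return (position, message) pairs for invalid record successions."""
    allowed = {prev: set(nxt) for prev, nxt in ALLOWED_NEXT.items()}
    if allow_empty_clq:
        # An empty CLQ may be followed by a new EA, housing unit or CLQ.
        allowed[3] |= {1, 2, 3}
    errors = []
    # Assumption: every file starts with an EA record.
    if record_types and record_types[0] != 1:
        errors.append((0, "file must start with an EA record (Type 1)"))
    for i in range(1, len(record_types)):
        prev, cur = record_types[i - 1], record_types[i]
        if cur not in allowed.get(prev, set()):
            errors.append((i, f"Type {cur} cannot follow Type {prev}"))
    return errors


# The sequence from the hierarchy diagram passes; an individual directly
# after an EA record does not.
ok = sequence_errors([1, 2, 4, 4, 2, 3, 4, 4, 1])
bad = sequence_errors([1, 4])
```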

Correspondence between housing and population records

An occupied unit should have at least one person and a vacant unit should have no people: if a Type 2 (Housing Unit) record with category “vacant” is followed by a Type 4 (Individual) record, change the category to “occupied”

The number of occupants recorded on the Housing Unit form should be exactly the same as the number of individual records in the household. If not, change the number on the Housing Unit form

Population records should be sequenced (numbered)

If a Type 3 (CLQ) record with category “Hospital” is followed by multiple Type 4 (Individual) records of category “Retirement home”, change the category of the CLQ to “Retirement home”
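
The first two rules above could be applied as follows. This is a sketch under assumptions: the field names `category` and `n_occupants` and the function `edit_housing_unit` are invented for illustration. The original record is left untouched, so that both the reported and the edited values remain available.

```python
def edit_housing_unit(unit, persons):
    """Return an edited copy of `unit` and a list describing each change."""
    edited = dict(unit)
    changes = []
    # A "vacant" unit followed by person records is re-coded as occupied.
    if persons and edited["category"] == "vacant":
        edited["category"] = "occupied"
        changes.append("category: vacant -> occupied")
    # The occupant count on the housing form follows the person records.
    if edited["n_occupants"] != len(persons):
        changes.append(f"n_occupants: {edited['n_occupants']} -> {len(persons)}")
        edited["n_occupants"] = len(persons)
    return edited, changes


unit = {"category": "vacant", "n_occupants": 0}
edited, changes = edit_housing_unit(unit, persons=[{"line": 1}, {"line": 2}])
```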

Editing relationships in a household

Each individual has a relation to the first person:

1st person (or Head, or reference person)

Spouse

Child of the 1st person or of his/her spouse

Parent

Other relative

Friend

Lodger

...

Editing relationships in a household

Household with potential inconsistencies in age reporting

Family nuclei

Father:

Sex should be male and age should be above a minimum age

Mother:

Sex should be female and age should be above a minimum age

Child:

Age under a maximum limit?
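
These checks can be written as a small function. The slide gives no thresholds, so the minimum parental age of 15 used in the example is purely an assumption. The maximum child age is optional, because the slide poses it as an open question. The function and field names are hypothetical.

```python
def parent_errors(role, person, expected_sex, min_age):
    """Check one parent: sex must match and age must exceed `min_age`."""
    errors = []
    if person["sex"] != expected_sex:
        errors.append(f"{role}: sex should be {expected_sex}")
    if person["age"] <= min_age:
        errors.append(f"{role}: age should be above {min_age}")
    return errors


def nucleus_errors(father=None, mother=None, children=(),
                   min_parent_age=15, max_child_age=None):
    """Return the edit failures for one family nucleus."""
    errors = []
    if father is not None:
        errors += parent_errors("father", father, "Male", min_parent_age)
    if mother is not None:
        errors += parent_errors("mother", mother, "Female", min_parent_age)
    for i, child in enumerate(children, start=1):
        # Applied only if the editing team decides to set a maximum.
        if max_child_age is not None and child["age"] >= max_child_age:
            errors.append(f"child {i}: age should be under {max_child_age}")
    return errors


failures = nucleus_errors(
    father={"sex": "Female", "age": 40},
    mother={"sex": "Female", "age": 12},
    children=[{"age": 3}],
)
```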


Part II: Within Record Editing

Summary

Part II: Within Record Edits

Validity and Consistency Checks

Top-down Editing versus Multiple-variable Editing

Example of Multiple-Variable Editing

Methods of Correcting and Imputing Data

Example of Hot Deck for Sample Household (Sex Only)

Example of Hot Deck for Sample Household (Sex and Age)

Issues Related to Hot Deck

Methods of Correcting and Imputing Data: General Principles

Edit Trails and the Use of Imputation Flags

Validity and Consistency Checks

Validity checks are performed to see whether the values of individual variables are plausible or lie within a reasonable range

Examples:

0 <= AGE <= 110

SEX = Female or SEX = Male

Consistency checks are performed to ensure that there is coherence between two or more variables

Examples:

Head of Household should have AGE>=15

A child should be younger than a head of household

A person with AGE<15 should never be married
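
The example rules on this slide translate directly into code. A minimal sketch follows; the field names (`age`, `sex`, `relationship`, `marital_status`) and the category labels are assumptions made for illustration. A validity check looks at one variable at a time, while a consistency check compares two or more.

```python
def validity_errors(person):
    """Single-variable checks: is each value inside its allowed range?"""
    errors = []
    if not (0 <= person["age"] <= 110):
        errors.append("AGE outside 0-110")
    if person["sex"] not in ("Female", "Male"):
        errors.append("SEX is neither Female nor Male")
    return errors


def consistency_errors(person, head_age):
    """Multi-variable checks: do the values agree with one another?"""
    errors = []
    if person["relationship"] == "Head" and person["age"] < 15:
        errors.append("head of household younger than 15")
    if person["relationship"] == "Child" and person["age"] >= head_age:
        errors.append("child not younger than head of household")
    if person["age"] < 15 and person["marital_status"] == "Married":
        errors.append("person under 15 reported as married")
    return errors


child = {"age": 47, "sex": "Male", "relationship": "Child",
         "marital_status": "Never married"}
child_failures = consistency_errors(child, head_age=45)
```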

Top-down Editing versus Multiple-Variable Editing

The Top-down Editing approach starts by editing the top-priority variable (not necessarily the first variable on the questionnaire) and moves sequentially through all items in decreasing priority

During the editing process, some edits change the value of an item more than once; this can introduce one or more errors into the dataset

Example: Child’s age first imputed on basis of mother’s age. Later child’s age re-imputed on basis of reported years of schooling, which might be inconsistent with mother’s age

In this case, child’s age should keep being re-imputed till it is consistent

Important to avoid circular editing!

Top-down Editing versus Multiple-Variable Editing

The Multiple-Variable Editing approach uses a set of rules that state the relationships between variables

Each statement is tested against data to see if true

Edit system keeps track of all false statements relating to invalid entries or inconsistencies

Assessment is then made on how to change record so that it will pass all edits and then decision is made

Fellegi-Holt principle of “minimum change” should be used
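
To make the “minimum change” idea concrete, the sketch below searches by brute force for the smallest set of fields that can be altered so that a record passes every edit. It illustrates the principle only and is not the Fellegi-Holt algorithm itself. The variables, their domains and the three edit rules are assumptions chosen to keep the example small.

```python
from itertools import combinations, product

# Hypothetical variables and their allowed values.
DOMAINS = {
    "relationship": ["Head", "Spouse", "Child"],
    "marital_status": ["Never married", "Married"],
    "age_group": ["0-14", "15+"],
}

# Each edit returns True when the record passes it.
EDITS = [
    lambda r: not (r["age_group"] == "0-14" and r["marital_status"] == "Married"),
    lambda r: not (r["age_group"] == "0-14" and r["relationship"] in ("Head", "Spouse")),
    lambda r: not (r["relationship"] == "Spouse" and r["marital_status"] != "Married"),
]


def passes(record):
    return all(edit(record) for edit in EDITS)


def minimum_change(record):
    """Return (changed_fields, fixed_record) using as few fields as possible."""
    fields = list(DOMAINS)
    for size in range(len(fields) + 1):          # try 0 changes, then 1, 2, ...
        for subset in combinations(fields, size):
            for values in product(*(DOMAINS[f] for f in subset)):
                candidate = {**record, **dict(zip(subset, values))}
                if passes(candidate):
                    return subset, candidate
    return None


record = {"relationship": "Spouse", "marital_status": "Married", "age_group": "0-14"}
changed, fixed = minimum_change(record)
```

Here the record fails two edits, yet changing the single field `age_group` is enough to satisfy all three, whereas changing relationship or marital status alone is not.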

Example of Multiple-Variable Editing: Head of household and spouse have same sex

Methods of Correcting and Imputing Data

The process of imputation changes one or more responses or missing values in a record, or in several records, to ensure that the resulting records are internally coherent

Before using any imputation method, the best strategy is to start with manual study of responses; imputation can then handle the remaining unresolved edit failures

Two methods of imputation: Cold Deck and Hot Deck

Cold Deck Imputation:

Used mainly for missing or unknown values (not for inconsistent/invalid values)

Values are imputed on a proportional basis from a distribution of valid responses (e.g., from previous census)

In doing so, cold deck draws values from a fixed (but possibly outdated) distribution of values
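
Cold deck imputation as described above can be sketched as a weighted random draw from a fixed table. The proportions in `COLD_DECK_SEX` are invented for illustration and do not come from any census. The function name and the `_imputed` flag suffix are likewise assumptions.

```python
import random

# Hypothetical distribution of valid responses, e.g. from a previous census.
COLD_DECK_SEX = {"Female": 0.51, "Male": 0.49}


def cold_deck_impute(records, field, distribution, rng):
    """Fill missing values of `field` by drawing from a fixed distribution."""
    values = list(distribution)
    weights = list(distribution.values())
    result = []
    for record in records:
        record = dict(record)            # leave the input untouched
        if record.get(field) is None:    # only missing values are imputed
            record[field] = rng.choices(values, weights=weights, k=1)[0]
            record[field + "_imputed"] = 1
        result.append(record)
    return result


rng = random.Random(0)                   # seeded so runs are repeatable
people = [{"id": 1, "sex": "Male"}, {"id": 2, "sex": None}]
result = cold_deck_impute(people, "sex", COLD_DECK_SEX, rng)
```

The distribution never changes while the file is processed, which is what distinguishes this approach from hot deck imputation.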

Methods of Correcting and Imputing Data

Hot Deck or Dynamic Imputation:

Used for both missing data and inconsistent/invalid items

Uses one or more variables to estimate the likely response based on data about individuals with similar characteristics

The “donor set” (or imputation matrix) constantly changes through updating; therefore, imputations dynamically change during the process of editing all the records

Thus, hot deck draws from a distribution that dynamically changes with each imputation and eventually (through modifications) “approaches” the distribution of current data set

Caution: if two different items for a particular record have unknown values, hot deck may not use the same “donor” to impute both missing values; in this case, it is preferable to use the same donor for both items
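
A minimal hot deck for age, with an imputation matrix indexed by sex and relationship, might look like the sketch below. The starting values in the matrix, the field names and the function `hot_deck_age` are assumptions for illustration. The sketch also assumes the initial matrix has a cell for every sex and relationship combination that occurs in the data.

```python
def hot_deck_age(persons, initial_matrix):
    """Impute missing ages from the most recent valid donor in the same cell.

    Returns the edited records and the final state of the matrix.
    """
    matrix = dict(initial_matrix)
    edited = []
    for person in persons:
        person = dict(person)
        cell = (person["sex"], person["relationship"])
        if person["age"] is None:
            person["age"] = matrix[cell]       # take the current donor value
            person["age_imputed"] = 1
        else:
            matrix[cell] = person["age"]       # valid response becomes the donor
            person["age_imputed"] = 0
        edited.append(person)
    return edited, matrix


initial = {("Male", "Head"): 40, ("Female", "Spouse"): 38,
           ("Male", "Child"): 10, ("Female", "Child"): 9}
household = [
    {"sex": "Male", "relationship": "Head", "age": 52},
    {"sex": "Female", "relationship": "Spouse", "age": None},
    {"sex": "Male", "relationship": "Child", "age": 7},
    {"sex": "Male", "relationship": "Child", "age": None},
]
edited, final_matrix = hot_deck_age(household, initial)
```

The spouse receives the starting value because no valid donor has been seen yet, while the second child receives the age of the first child, whose valid response had just updated the matrix.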

Example of Hot Deck for Sample Household (Sex Only)

Example of Hot Deck for Age (Sex and Relationship)

Initial Imputation Matrix For Age Based on Sex and Relationship

Example of Hot Deck for Age (Sex and Relationship)

Initial Imputation Matrix For Age Based on Sex and Relationship

Dynamic Imputation Matrix After Multiple Changes

Issues Related to Hot Deck

Devise dynamic imputation matrices based on people living in same small geographic area since they tend to be homogeneous with respect to many characteristics, i.e., different imputation matrices for different geographic areas should be created

Sometimes the simplest approaches are best: for example, for a missing housing attribute, it may be preferable to use the value of a neighboring household rather than using a complex imputation matrix that may result in the assignment of a value from outside the neighborhood

Before using dynamic imputation, an effort should be made to use related items instead. For example, if marital status is missing for an individual and there exists a spouse for that individual, then the value “married” should be assigned

One should edit key items such as age and sex first so that these can be used in other imputation matrices for lower priority items

Issues Related to Hot Deck

Construct imputation matrices based on research from administrative sources or previous censuses and surveys

Standardized imputation matrices (i.e., matrices with standard dimensions such as age and sex, used for example to impute language) can streamline the process since they can be tested and applied quickly

BUT if language missing, first look to language of others in the same household or to race, ethnicity, birthplace before using dynamic imputation; i.e., an attempt should be made to use related information to assign values before resorting to imputation

Some editing teams keep more than one value per cell in imputation matrices to protect against same value being imputed multiple times; e.g., in case of 4 male children in household all with ages unknown, different values will be assigned
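
One possible way to keep several values per cell is a small rotating buffer, sketched below. The class name `MultiValueCell`, the buffer size of four and the starting ages are assumptions. Actual editing systems may organise their matrices differently.

```python
from collections import deque


class MultiValueCell:
    """One imputation-matrix cell holding several donor values."""

    def __init__(self, starting_values):
        # The buffer keeps a fixed number of values; the oldest drops out.
        self.values = deque(starting_values, maxlen=len(starting_values))

    def update(self, value):
        """Store a new valid response as a donor value."""
        self.values.append(value)

    def draw(self):
        """Return the next donor value and move it to the back of the queue."""
        value = self.values[0]
        self.values.rotate(-1)
        return value


# Four male children with unknown ages receive four different values.
male_child_age = MultiValueCell([4, 7, 10, 13])
ages = [male_child_age.draw() for _ in range(4)]
```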

Issues Related to Hot Deck

Imputation matrices that are too big (with too many dimensions) cannot be updated thoroughly, leading to inefficiencies and inaccuracies

Imputation matrices that are too small (with too few dimensions or too few groupings within dimensions) may lead to the same donor value being used repeatedly in imputation before the matrix is updated

Some items such as occupation and industry are notoriously difficult to edit since the large number of categories can make dynamic imputation very cumbersome; in such cases, it may be counter-productive to impute and preferable to use “not stated”

Methods of Correcting and Imputing Data: General Principles

Imputed record should closely resemble the failed edit record; impute for a minimum number of variables

Imputed record should satisfy all edits

All imputed values should be flagged and methods and sources of imputation should be clearly specified

Both un-imputed and imputed values should be stored to allow for evaluation of degree and effects of imputation

Edit Trails and the Use of Imputation Flags

Important to generate edit trail showing all data changes and substituted values with their tallies

Counters of several types are essential to process planning and management: i) number of cases of each type of error; ii) non-response rates for each item; iii) imputation rates for each item, ….

Imputation flags are binary flags that change from initial value of 0 to 1 if original value of data is changed in any way; flags should be added onto each item that is imputed

Although a separate file with imputation flags takes up considerable space, this information is critical for planning of future censuses; e.g., as a means to investigate the age threshold below which a female with “child ever born” triggers a query edit, and to decide whether the threshold should be modified for future rounds
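
An edit trail with tallies and binary imputation flags could be kept along the following lines. This is a sketch under assumptions: the class `EditTrail`, its method names and the `_flag` suffix are invented for illustration.

```python
from collections import Counter


class EditTrail:
    """Log every data change, tally changes by item and reason, set flags."""

    def __init__(self):
        self.changes = []          # one entry per change, original value kept
        self.tallies = Counter()   # (item, reason) -> number of changes

    def apply(self, record, item, new_value, reason):
        old_value = record.get(item)
        if old_value == new_value:
            return                 # nothing changed, flag stays as it is
        self.changes.append({"id": record["id"], "item": item,
                             "old": old_value, "new": new_value,
                             "reason": reason})
        self.tallies[(item, reason)] += 1
        record[item] = new_value
        record[item + "_flag"] = 1  # flag switches from 0 to 1

    def imputation_rate(self, item, n_records):
        """Share of records in which `item` was changed at least once."""
        changed_ids = {c["id"] for c in self.changes if c["item"] == item}
        return len(changed_ids) / n_records


records = [{"id": 1, "age": None, "age_flag": 0},
           {"id": 2, "age": 34, "age_flag": 0}]
trail = EditTrail()
trail.apply(records[0], "age", 29, reason="hot deck")
rate = trail.imputation_rate("age", n_records=len(records))
```

Because each trail entry stores the original value next to the substituted one, the degree and effect of imputation can be evaluated afterwards.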


THANK YOU!
