5. Integration of Microdata and Metadata (9 slides)

This presentation discusses the integration of microdata and metadata in historical censuses. It covers topics such as sample integration, complete documentation, enumerator instructions, data dictionaries, codebooks, and systematic metadata. The presentation also highlights the use of IPUMS-International for high precision samples with implicit stratification.

stacyk

Presentation Transcript


  1. 5. Integration of Microdata and Metadata (9 slides)

  2. Integrating samples
     • Project requests new, high quality samples for historical censuses (before 1995), drawn to uniform criteria
     • Complete documentation, in official language and English translation
       - Census forms
       - Enumerator instructions
       - Data dictionaries and codebooks
     • Complete metadata for all samples
     • Systematic metadata for all variables
       - Universe
       - Definitions
       - Comparability
     • Dynamic metadata system to compare phrasing of any question in any combination of countries and censuses

  3. IPUMS-International: High precision samples with implicit stratification
     • Suppress all identifying information: names, id numbers, street addresses, low-level administrative geography (NUTS-5, NUTS-4?, NUTS-3?, NUTS-2?)
     • Sample is stratified by lowest level geography (census tract)
     • Lower standard errors than a classic random sample, to the extent that variables of interest are correlated with geography
     • Implicit geographical stratification is equivalent to extremely fine geographic stratification with proportional weighting
     • Many of our NSI partners have adopted the IPUMS sample design (see table).
     • 26 countries provided 100% microdata for the MPC to draw the sample
     • Europe: almost all NSIs have drawn samples to IPUMS specs. for all censuses
     • 188 high precision samples for 78 countries entrusting microdata (08/08/2007):

       Sample density   Censuses   Countries
       10%              125        62
       5%               34         10
       <5%              29         6
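The implicit stratification described on this slide can be illustrated with a short sketch: sort the records by geography, then take every k-th record from a random start. This is a generic systematic-sampling example, not the actual IPUMS-International procedure; the function name `systematic_sample`, the record layout, and the tract and household counts are all assumptions made for illustration.

```python
import random


def systematic_sample(records, geo_key, fraction, seed=None):
    """Draw a systematic sample from records sorted by geography.

    Sorting by the geographic key before taking every k-th record is
    what makes the stratification "implicit": each small area contributes
    records roughly in proportion to its size.
    """
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    rng = random.Random(seed)
    ordered = sorted(records, key=geo_key)
    step = 1.0 / fraction              # sampling interval k
    start = rng.uniform(0, step)       # random start within the first interval
    sample = []
    i = 0
    while start + i * step < len(ordered):
        sample.append(ordered[int(start + i * step)])
        i += 1
    return sample


# Hypothetical file: 50 census tracts with 20 households each.
households = [(tract, hh) for tract in range(50) for hh in range(20)]
sample = systematic_sample(households, geo_key=lambda r: r[0],
                           fraction=0.10, seed=1)
```

In this toy file every tract contributes exactly two households to the 10% sample, which is the proportional allocation the slide refers to.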

  5. International and chronological integration of microdata and metadata: methods and procedures
     • Integrated microdata
       - Challenge: retain all significant detail, integrate everything
       - Solution: composite coding scheme (multiple digits, each carries meaning; think ISCO)
       - Use international standards where appropriate
     • Integrated metadata (documentation)
       - Summarize and describe commonalities
       - Explain unique characteristics
       - Dynamically generate metadata according to needs (countries, samples, variables) of individual researcher using XML database
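One way to picture "dynamically generated" metadata is a filter over an XML store that returns only the notes for the variable and countries a researcher selects. The sketch below is purely illustrative: the XML layout, the element and attribute names, and the function `select_notes` are invented here and do not describe the project's real database. The two Mexico notes paraphrase the comparability text quoted later in the transcript; the third note is a placeholder.

```python
import xml.etree.ElementTree as ET

# Hypothetical metadata document; the schema is an assumption.
METADATA_XML = """
<metadata>
  <variable name="EMPSTAT">
    <note country="Mexico" year="1990">Question refers to principal activity.</note>
    <note country="Mexico" year="2000">A second question captures secondary activity.</note>
    <note country="Country B" year="2000">Placeholder note for illustration.</note>
  </variable>
</metadata>
"""


def select_notes(xml_text, variable, countries):
    """Return (country, year, text) notes for one variable and a set of countries."""
    root = ET.fromstring(xml_text)
    wanted = set(countries)
    notes = []
    for var in root.findall("variable"):
        if var.get("name") != variable:
            continue
        for note in var.findall("note"):
            if note.get("country") in wanted:
                notes.append((note.get("country"),
                              int(note.get("year")),
                              note.text.strip()))
    return sorted(notes)


mexico_notes = select_notes(METADATA_XML, "EMPSTAT", ["Mexico"])
```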

  6. Use international standards as points of departure:
     • UNESCO (1997). The International Standard Classification of Education (ISCED 1997).
     • International Labor Office (1990). International Standard Classification of Occupations (ISCO-88).
     • United Nations Statistics Division (1990). International Standard Industrial Classification of All Economic Activities (ISIC-88).
     • United Nations Economic Commission for Europe (1999). Recommendations for the 2000 Censuses of Population and Housing in the ECE Region.

  7. IPUMSi INTEGRATES
     » Integrate (harmonize), not standardize
       1. retain all original detail
       2. harmonize every digit
     » How is this possible?
       - Composite codes (multiple digits, 111)
       - Not serial (1, 2, 3, ....)
       - (example: next slide)
     » Why?
       - Researcher confidently understands the logic
       - Researcher uses as much detail as needed

  8. Composite coding scheme: Employment Status

  9. Metadata: Employment Status

     EMPSTAT: Employment status

     Description
     EMPSTAT indicates whether or not the respondent was part of the labor force -- working or seeking work -- over a specified period of time. Depending on the sample, EMPSTAT can also convey further information.

     The first digit of EMPSTAT is fully comparable, and classifies the population into three groups: employed, unemployed, and inactive. The combination of employed and unemployed yields the total labor force. The second and third digits of EMPSTAT preserve additional information available for some countries and census years but not for others.

     Employment status is sometimes referred to in other sources as "activity status."

     Comparability -- General
     The age of persons to whom the question applies varies across the samples (see Universe). The reference period for the employment status question varies. For most samples, employment status was reported with respect to the day of the census or…
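The composite-code logic described for EMPSTAT can be sketched in a few lines: the first digit alone gives the fully comparable group, and the labor force is the union of the employed and unemployed groups. The digit values assigned to each group below (1, 2, 3) and the sample codes are assumptions for illustration; the slide names the three groups but does not state which digit stands for which.

```python
# Assumed first-digit meanings; not stated in the slide text.
FIRST_DIGIT_GROUPS = {"1": "employed", "2": "unemployed", "3": "inactive"}


def general_group(code):
    """Return the fully comparable group carried by the first digit."""
    return FIRST_DIGIT_GROUPS.get(str(code)[0], "unknown")


def in_labor_force(code):
    """Employed plus unemployed together make up the labor force."""
    return general_group(code) in ("employed", "unemployed")


# Hypothetical three-digit codes; trailing digits stand for sample-specific detail.
codes = [110, 120, 210, 310, 320]
labor_force = [c for c in codes if in_labor_force(c)]
```

A researcher who needs only cross-national comparability can stop at `general_group`; one working with a single country can keep the full code and its extra detail.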

  10. Integrate: retain all significant detail, harmonize everything
      Not standardize: force square pegs in round holes

      Metadata: Employment Status, example: Mexico

      Comparability -- Mexico
      The universe and reference period are fully comparable across the Mexico samples. The 1970 Census did not provide detail on the inactive population except for "houseworkers," while the later samples have numerous subcategories.

      In 1990, the employment status question refers to "Principal Activity" and therefore under-reports secondary economic activity by students, housewives, family-workers, the semi-retired, and others.

      The 2000 Census sought to overcome deficiencies in reporting work status for people whose primary activity was not work (students, housewives, retirees, etc.), but who in fact were working according to international definitions. A second question introduced for the first time in 2000 sought to capture this secondary economic activity. For strict comparability with earlier Mexican censuses, this recovered activity (codes 1101-1106) should be considered "inactive."…
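The Mexico note above amounts to a recoding rule a researcher can switch on or off. A minimal sketch, assuming the first digit maps to employed, unemployed, and inactive as 1, 2, and 3 (an assumption, not stated in the slides) and using a hypothetical function name `mexico_status`; only the code range 1101-1106 comes from the note itself.

```python
# Codes 1101-1106 are the "recovered activity" codes cited in the Mexico note.
RECOVERED_ACTIVITY = range(1101, 1107)

# Assumed first-digit meanings; not stated in the slide text.
FIRST_DIGIT_GROUPS = {"1": "employed", "2": "unemployed", "3": "inactive"}


def mexico_status(code, strict=True):
    """Classify a detailed code, optionally applying the strict-comparability rule.

    With strict=True, recovered secondary activity is treated as "inactive"
    so that 2000 lines up with the earlier Mexican censuses.
    """
    if strict and code in RECOVERED_ACTIVITY:
        return "inactive"
    return FIRST_DIGIT_GROUPS.get(str(code)[0], "unknown")


strict_view = mexico_status(1103)                 # treated as inactive
detail_view = mexico_status(1103, strict=False)   # falls back to the first digit
```

Because the original detail is retained in the code, the choice between the two views is left to the researcher rather than fixed at harmonization time.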
