
Breaking the Census "Code": Reconstructing Original Record-Level Data from Summary Tables

Dmitry Messen, Houston-Galveston Area Council


Presentation Transcript


  1. Breaking the Census "Code": Reconstructing Original Record-Level Data from Summary Tables • Dmitry Messen • Houston-Galveston Area Council

  2. Need for Disaggregate Demographic Data • One person=one record (one household=one record) • Agent-based Land Use Forecasting Model (UrbanSim) • Household Location Choice • Population Evolution Microsimulation • Survival, child birth, migration • Household Evolution Microsimulation • Household formation and dissolution

  3. Synthesis Strategies • Strategy 1: One-step synthesis of all N attributes • Get N separate counts (one per attribute) • Fill in the table margins • Get record-level sample data (PUMS) • Estimate conditional probabilities • Run IPF (Iterative Proportional Fitting) • Fill in the table cells, preserve the margins • Quick results, but much of the available information goes unused (wasted): “Spendthrift Synthesis”
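
The IPF step named on this slide can be sketched as follows. This is a minimal illustration in Python, not the presenter's code (the implementation described later is in SAS). The function name `ipf`, the seed table, and the margin targets are made-up assumptions; in practice the seed would come from PUMS-based probabilities and the targets from the census counts.

```python
def ipf(seed, row_targets, col_targets, max_iter=1000, tol=1e-9):
    """Scale a 2-D seed table until its row and column sums match the targets.

    Hypothetical helper for illustration; assumes the two sets of targets
    have the same grand total and the seed has no all-zero row or column.
    """
    table = [[float(v) for v in row] for row in seed]
    for _ in range(max_iter):
        # Row step: rescale each row to its target margin.
        for i, target in enumerate(row_targets):
            total = sum(table[i])
            if total > 0:
                table[i] = [v * target / total for v in table[i]]
        # Column step: rescale each column to its target margin.
        for j, target in enumerate(col_targets):
            total = sum(row[j] for row in table)
            if total > 0:
                for row in table:
                    row[j] *= target / total
        # Columns now match exactly; stop once the rows match as well.
        gap = max(abs(sum(table[i]) - t) for i, t in enumerate(row_targets))
        if gap < tol:
            break
    return table


# Made-up seed (sample-based cell shares) and made-up margins.
seed = [[0.30, 0.20],
        [0.10, 0.40]]
row_targets = [60, 40]
col_targets = [50, 50]
fitted = ipf(seed, row_targets, col_targets)
```

For this toy input the fitted cells approach 40, 20, 10 and 30: the margins are met while the odds ratio of the seed is preserved, which is the sense in which IPF fills in the cells and preserves the margins.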

  4. Synthesis Strategies • Strategy 2: Multi-step synthesis • Guiding principles • Lowest level of spatial resolution • Use all available information • Minimize synthesis • Parsimonious Synthesis

  5. Census Data • Decennial Census • SF-1 Tables • Based on “Short Form” (100% count) • Basic Demographic Info • Age, Sex, Race, Hispanic, Type of Household/Family, Relation to Head of Household • SF-3 Tables • Based on “Long Form” (16% sample) • No Long Form in 2010; replaced by the American Community Survey (ACS) • Expanded Socioeconomic Data

  6. Comparing SF-1 to SF-3

  7. Short Form • Based on the “Short Form” responses, the Census Bureau compiles master files of persons and households • All SF-1 Tables are just tabulations from the master file • We can’t see the entire master file; we only have indirect information as revealed by the tabulations • It is as if the Master File were an encrypted message and we were trying to break the code • MRI/CAT-scan analogy

  8. Master File • Project Goal • To recreate the master file using available summary tabulations • Constraints • Use all available data • Minimize guessing (IPF) • Final product must be fully consistent with SF-1 tabulations • Tabulations produced from the reconstructed master file should be identical to SF-1 tables
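
The consistency requirement on this slide lends itself to an automated check: tabulate the reconstructed records and compare every cell with the published table. The sketch below is a hedged illustration in Python; the helper names (`tabulate`, `mismatches`), the field names, and the tiny data set are hypothetical and do not come from the presentation.

```python
from collections import Counter


def tabulate(records, keys):
    """Count reconstructed records by the given attributes (one cell per combination)."""
    return Counter(tuple(r[k] for k in keys) for r in records)


def mismatches(records, published, keys):
    """Return {cell: (rebuilt_count, published_count)} for every cell that differs."""
    rebuilt = tabulate(records, keys)
    cells = set(rebuilt) | set(published)
    return {
        c: (rebuilt.get(c, 0), published.get(c, 0))
        for c in cells
        if rebuilt.get(c, 0) != published.get(c, 0)
    }


# Hypothetical reconstructed person records and a hypothetical published table.
records = [
    {"sex": "M", "age": "0-17"},
    {"sex": "F", "age": "18-64"},
    {"sex": "F", "age": "18-64"},
]
published = {("M", "0-17"): 1, ("F", "18-64"): 2}

diff = mismatches(records, published, ("sex", "age"))
```

An empty `diff` means the tabulation rebuilt from the records reproduces the published cells exactly; any entry points to a cell where the reconstruction still disagrees.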

  9. Expansion Tables • SF-1 Expansion tables (e.g., 16A, 16B, 16I) • 9 categories (A, B, C, ..., I) • 5 single races • 1 Other race • 1 Two or more races • 1 Hispanic • 1 White, not Hispanic

  10. Core SF-1 Tables • Tables 27, 28, 30 • Age groups: 0-17, 18-64, 65+ (65-102) • Household Roles—Major Groups: • Householder or Spouse (HS) • Household Head (HH) • Male/Female x Fam/NonFam Alone/NonFam Not Alone • Spouse (SP) • Household Member (HM) • Non-Relative (NR) • Group quarters inhabitant (GQ1, GQ2)

  11. Operational Hierarchy • Rules of Internal Consistency (sudoku puzzle) • No additional info • External Constraints • Race-Hisp Constraint (Tables 5, 6, 8) • Race-Hisp-Age (Under 18, Over 18) • 0-17 = Under 18, 18-64 = Over 18, 65-102 = Over 18 • Sex Constraint (Table 12) • Sex-Age • IPF (aka raking, balancing) procedure
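
The age mapping on this slide can be written down as a lookup that collapses the three core age groups (0-17, 18-64, 65-102) into the two groups used by the Race-Hisp-Age constraint. The following Python sketch is illustrative only; the function name `collapse` and the counts are assumptions, not values from SF-1.

```python
# Mapping from the core-table age groups to the constraint-table age groups.
AGE_TO_CONSTRAINT = {
    "0-17": "Under 18",
    "18-64": "Over 18",
    "65-102": "Over 18",
}


def collapse(core_counts):
    """Sum (race, core_age) counts into (race, constraint_age) counts."""
    out = {}
    for (race, age), n in core_counts.items():
        key = (race, AGE_TO_CONSTRAINT[age])
        out[key] = out.get(key, 0) + n
    return out


# Hypothetical counts for one block; not real census figures.
core = {
    ("White", "0-17"): 3,
    ("White", "18-64"): 5,
    ("White", "65-102"): 2,
}
collapsed = collapse(core)
```

Comparing `collapsed` with the corresponding cells of the external constraint table shows whether a candidate race assignment is admissible before any IPF is needed.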

  12. Additional Info • Size distribution (1, 2, 3, 4, 5, 6, 7+) for Family and Non-Family Households • By Race of Household Head • Table 26 • Count of married-couple families (MCF) and Other Families by Presence (0, at least 1, at least 2) of Children (<18 years old) • By Race of Household Head • Table 35

  13. Phases • Phase 1: Race-Hispanic Assignment • Phase 2: Sex Assignment • Phase 3: Type of Family (married couple or other) Assignment • Phase 4: “Child” Role Assignment • Generate a list of people from the summary table • Phase 5: Match MCF householders with Spouses (PUMS-based probabilities) • Phase 6: Household Size Assignment • Phase 7: Assign People to Households
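
The step "Generate a list of people from the summary table" amounts to expanding each cell count into that many identical records. A minimal Python sketch follows, assuming a table stored as a dictionary of cell counts; the function name `expand`, the field names, and the counts are hypothetical.

```python
def expand(cell_counts, fields):
    """Turn {cell: count} into a list with one record (dict) per counted person."""
    records = []
    for cell, n in cell_counts.items():
        for _ in range(n):
            records.append(dict(zip(fields, cell)))
    return records


# Hypothetical summary table: two girls under 18 and one man aged 65-102.
cells = {("F", "0-17"): 2, ("M", "65-102"): 1}
people = expand(cells, ("sex", "age"))
```

Each later phase can then attach one more attribute (family type, role, household) to these records rather than to table cells.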

  14. Implementation • Implemented in SAS • Still experimental • Completed all 7 phases, now reworking the sequence • Standalone IPF module • Integer solution • 13 counties, 56K+ Blocks, 4.8M+ People
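
The slide does not say how the IPF module arrives at an integer solution. One common approach, shown here purely as an assumption, is largest-remainder rounding: take the floor of every fitted value, then hand the remaining units to the cells with the largest fractional parts. The Python sketch below works on a flat list of cells and preserves only the overall total; keeping row and column margins intact at the same time needs additional care that is not shown. The function name `integerize` and the input values are made up.

```python
import math


def integerize(values, total):
    """Round non-negative fitted values to integers that sum to `total`.

    Largest-remainder rounding; an illustrative technique, not necessarily
    the one used in the presenter's SAS module.
    """
    floors = [math.floor(v) for v in values]
    shortfall = total - sum(floors)
    # Cells with the largest fractional parts receive the leftover units.
    order = sorted(
        range(len(values)),
        key=lambda i: values[i] - floors[i],
        reverse=True,
    )
    for i in order[:shortfall]:
        floors[i] += 1
    return floors


# Made-up fractional cell values that sum to 100.
fitted_cells = [40.4, 19.6, 9.6, 30.4]
whole = integerize(fitted_cells, total=100)
```

Integer cells matter here because each unit becomes one person or household record in the reconstructed file.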

  15. What’s Next • Testing • Documenting • Assigning Socioeconomic (non-SF1) Attributes • Developing Household Evolution Model • Analyzing Census 2010 SF-1 Table shells for compatibility

  16. Thank you! Questions?
