
The Statistical Administrative Records System and Administrative Records Experiment 2000: System Design, Successes, and Challenges

Dean H. Judson

Planning, Research and Evaluation Division

U.S. Census Bureau

Outline of Presentation
  • General principles for using administrative records properly
  • Overview of StARS/AREX history, goals and design
  • Applications and evaluations: StARS 1999 and StARS 2000 versus Census 2000
How Administrative Records Are Created and Used

  • Policy changes which change the definition of events and objects
    • “Ontologies” and thresholds for observation
  • Data collection
  • Data entry errors and coding schemes
  • Data management issues
  • Query structure and spurious structure

Some Important Principles
  • Database ≠ Population!
  • Database ≠ Truth!
  • The “true” data exist in the “real world”, as does the “true” Population.
  • But the database gives us information that points to the Truth and to the Population.
[Figure: two diagrams comparing a target population with the population recorded in a database]
  • Resident U.S. Population on April 1, 2000 versus Population in StARS Database: accidental duplication; non-U.S. residents; deceased persons
  • “Current” employees of Company X, October 1, 2001 versus Population in Employee Database: accidental duplication; terminated, not yet entered in database; “Oops! Accidentally included contractors!”

Ontologies and Data Quality

[Figure: four panels mapping “real world” states to database states: Proper Representation, Incomplete Representation, Ambiguous Representation, Meaningless States]

Data Quality: the function that maps from the “real world” to the database allows one to reconstruct the “real world” from the database values. Source: Wand and Wang, 1996:90
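To make the four panels concrete, the Python sketch below classifies a mapping from “real world” states to database states. It encodes our reading of Wand and Wang’s categories (incomplete: a real-world state with no database state; ambiguous: two real-world states mapped to the same database state; meaningless: a database state that no real-world state maps to). The function name and its inputs are illustrative assumptions. Only a “proper” mapping lets one reconstruct the real-world state from the database value, which is the data quality definition above.

```python
def classify_representation(mapping, real_states, db_states):
    """Classify a real-world -> database mapping into Wand and Wang's cases (our reading).

    mapping: dict from real-world state to database state; a missing key means
             the real-world state has no database representation.
    """
    problems = set()
    # Incomplete: some real-world state cannot be represented in the database.
    if any(s not in mapping for s in real_states):
        problems.add("incomplete")
    images = [mapping[s] for s in real_states if s in mapping]
    image_set = set(images)
    # Ambiguous: two real-world states land on the same database state.
    if len(images) != len(image_set):
        problems.add("ambiguous")
    # Meaningless: a database state that no real-world state maps to.
    if any(d not in image_set for d in db_states):
        problems.add("meaningless")
    # Proper: one-to-one and onto, so the real world can be reconstructed.
    return problems or {"proper"}
```

For example, mapping real-world states s1 and s2 to the same database state d1 is flagged as ambiguous: given d1, one cannot tell which real-world state produced it.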

Coverage versus Intensity/Content: How can we get the best of both?

[Figure: Intensity/Content of Data Collection (low to high) plotted against Coverage of Target Population (low to high). A careful, well-done sample survey is high on intensity/content but low on coverage; administrative records/data warehouses are high on coverage but low on intensity/content.]
A Model for “Borrowing Strength”
  • Original DW (data warehouse) database (X)
  • Draw a representative sample of X
  • Carefully collect “ground truth” data (Y) for the sample
  • Estimate a model: Y = f(X)
  • Produce an augmented DW database, with X and estimated Y’s
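The “borrowing strength” flow can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the Census Bureau’s procedure: X is a single numeric variable, f is an ordinary least-squares line, the sample is a simple random sample, and `collect_ground_truth` is a hypothetical stand-in for the careful data collection step.

```python
import random

def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x (a single predictor, for illustration)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    return mean_y - b * mean_x, b

def borrow_strength(dw_x, collect_ground_truth, sample_size, seed=0):
    """Return the augmented DW database as (X, estimated Y) pairs.

    dw_x: one X value per data-warehouse record
    collect_ground_truth: hypothetical stand-in for careful data collection;
        it is called only for sampled record indices
    """
    rng = random.Random(seed)
    sample = rng.sample(range(len(dw_x)), sample_size)  # representative (simple random) sample
    xs = [dw_x[i] for i in sample]
    ys = [collect_ground_truth(i) for i in sample]      # "ground truth" Y, sample only
    a, b = fit_line(xs, ys)                             # estimated model Y = f(X)
    return [(x, a + b * x) for x in dw_x]               # every record gets an estimated Y
```

The expensive step (collecting Y) touches only the sample, while the cheap, high-coverage X values carry the model’s predictions to every record; a real application would use richer X and a model suited to Y.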

Background and History
  • Statistical Administrative Records System
    • Six large Federal input files: IRS 1040, IRS 1099, Selective Service, Medicare, Indian Health Service, HUD-TRACS/MTCS
    • One lookup file: SSA/Census NUMIDENT
  • AREX 2000
    • Attempt to use StARS data to simulate an administrative records census
What Was the Purpose of StARS 1999 and AREX 2000?
  • Test the feasibility of an administrative records census
    • StARS: Nationwide
    • AREX: two counties in Maryland, three in Colorado
      • MD: 1.4M persons in 558K households
      • CO: 1.2M persons in 459K households
  • Test two methods for conducting an administrative records census
    • top-down method
    • bottom-up method (match to address list, additional operations)
Can We Do This?
  • Title 13, U.S. Code (§6(a)-(c), abridged):
    • “The Secretary…may call upon any other department…of the Federal Government…for information pertinent to the work provided for in this title…To the maximum extent possible, the Secretary…shall use [such] information instead of conducting direct inquiries”
  • Privacy Act of 1974 (5 U.S.C. §552a, abridged):
    • “No agency shall disclose any record…unless…to the Bureau of the Census for purposes of planning or carrying out a census or survey or related [title 13] activity”
    • “Each agency that maintains a system of records shall…publish in the Federal Register upon establishment…the existence and character of the system of records” (StARS was published in the Federal Register, January 1999)
The Statistical Administrative Records System-1999

[Figure: StARS 1999 processing flowchart; record counts recoverable from the chart are listed below]
  • Input files: TY98 IRS 1040 (119,946,193); TY98 IRS 1099 (598,075,971); Medicare (56,837,022); Selective Service (13,176,234); HUD TRACS (3,342,234); Indian Health Service (3,106,821); Census NUMIDENT
  • Edited files: IRS 1040 (243,260,776); IRS 1099; Medicare; Selective Service; HUD TRACS; Indian Health Service
  • NUMIDENT (676,589,439) → Person Characteristics File (PCF) (396,185,872)
  • Address processing (using TIGER, Code 1, ABI): 795,742,702 → hygiene & unduplication: 136,154,293 → geocoding: 102,965,122 coded (75.6%), 33,189,171 uncoded (24.4%)
  • Person processing: 875,750,973 → SSN validation (PVS): 844,945,296 valid (96.5%), 30,805,677 invalid SSNs (3.5%) → unduplication: 279,601,038 → remove deceased/create composite record: 257,764,909
  • Models: gender, race, mortality
  • Extraction of AREX test site records: 1,459,760 in Baltimore site; 1,229,274 in Colorado site
  • Research
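As a quick consistency check on the StARS 1999 counts, the short Python script below confirms that the valid and invalid SSN counts add back to the person-processing input, that the coded and uncoded counts add back to the unduplicated address file, and recomputes the percentages shown in the chart. The constant names are ours; the numbers are the chart’s.

```python
# Counts transcribed from the StARS 1999 flowchart.
PERSON_RECORDS = 875_750_973
VALID_SSNS = 844_945_296
INVALID_SSNS = 30_805_677

ADDRESSES_AFTER_UNDUP = 136_154_293
GEOCODED = 102_965_122
UNCODED = 33_189_171

def pct(part, whole):
    """Share of `whole` represented by `part`, as a percentage rounded to one decimal."""
    return round(100 * part / whole, 1)

def check_split(total, *parts):
    """Confirm that the pieces of a processing step add back to its input."""
    return sum(parts) == total

if __name__ == "__main__":
    print(check_split(PERSON_RECORDS, VALID_SSNS, INVALID_SSNS))              # True
    print(pct(VALID_SSNS, PERSON_RECORDS), pct(INVALID_SSNS, PERSON_RECORDS))  # 96.5 3.5
    print(check_split(ADDRESSES_AFTER_UNDUP, GEOCODED, UNCODED))              # True
    print(pct(GEOCODED, ADDRESSES_AFTER_UNDUP), pct(UNCODED, ADDRESSES_AFTER_UNDUP))  # 75.6 24.4
```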

Statistical Administrative Records System-2000 (DRAFT)

[Figure: StARS 2000 processing flowchart; record counts recoverable from the chart are listed below]
  • Input files (raw → edited):
    • TY99 IRS IMF: 124,729,862 → 253,825,653
    • TY99 IRS IRMF: 583,642,950 → 568,109,788
    • Medicare: 59,198,432 → 59,197,759
    • Selective Service: 13,370,053 → 14,538,895
    • HUD TRACS: 1,991,672 → 1,991,655
    • HUD MTCS: 6,232,562 → 6,208,615
    • Indian Health Service: 2,730,407 → 2,728,548
  • NUMIDENT (721,228,119) → Census NUMIDENT/Person Characteristics File (PCF) (408,447,131)
  • Address processing (using TIGER/MAF, Code 1, ABI): 725,230,009 → hygiene & unduplication: 158,593,956 → geocoding: 125,647,359
  • Person processing: 905,432,071 → SSN validation: 895,196,891 valid, 10,235,180 invalid SSNs → unduplication: 289,968,449 → remove deceased/create composite record: 265,950,850
  • Models: gender, race, mortality

Administrative Records Experiment in 2000 (AREX 2000)
  • Five selected sites in Maryland and Colorado
    • MD: Baltimore city, Baltimore county;
    • CO: El Paso county, Douglas county, Jefferson county
  • Attempt to simulate an Administrative Records Census
  • Not all aspects of an Administrative Records Census are simulated
    • Group Quarters survey
    • Coverage measurement survey
  • Special operations not included in StARS
    • Request for physical address (PO boxes/rural routes)
    • Clerical hand geocoding
    • Field verification of addresses not matched to DMAF
AREX 2000 Evaluations
  • Process: Analyzing selected components of the AREX implementation processing
  • Outcomes: Block level analysis: Age/Race/Sex/Hispanicity comparisons to Census 2000
  • Household level analysis:
    • Comparing household distributions for matched addresses
    • Assessing the feasibility of using administrative records in lieu of a field interview to obtain data on nonresponding households
  • Available at www.census.gov/pred/www/rpts.html#AREX
  • (Synthesis of results from the Administrative Records Experiment in 2000)
Characteristics of Files Included in the StARS System
  • IRS Individual Master 1040 File:
    • Tax year data; April, 2000 refers to “tax year” 1999
    • TY ‘99 file arrives October, 2000
    • Business entities, estates, other institutions included
    • ~120 million return records/year; maximum of six person records per return
    • Households below the filing threshold do not need to file
    • Late filers systematically different than early filers
    • Tax Filing Unit ≠ Housing Unit: 10-20% of addresses are PO Boxes, business addresses, tax preparers (Czajka, 2000)
    • TY95+: SSNs of dependents requested, recorded
    • 0.5% of primary filers’, 1.6% of secondary filers’, and 3.4% of dependents’ SSNs are in error (Czajka, 1987)
    • Age, race, sex, Hispanic origin microdata not available
Characteristics of Files Included in the StARS System, cont.
  • IRS Information Returns Master File:
    • Tax year data; April, 2000 refers to “tax year” 1999
    • TY ‘99 file arrives October, 2000
    • Business entities, estates, other institutions included
    • ~700 million records/year
    • Recipient address ≠ Housing Unit
    • 10-20% of addresses are PO Boxes, business addresses, tax preparers
    • Extremely limited microdata content: Age, race, sex, Hispanic origin microdata not available; name information often truncated
    • Possible source of information on undocumented persons
Characteristics of Files Included in the StARS System, cont.
  • Selective Service File:
    • Requested 4/1/99(00) file “cut date”
    • ~13 million records
    • Registration required in 1940, suspended in 1975, resumed in 1980
    • Presumably, males 18-25 are required to inform SSS when they move
    • Females, non-immigrant aliens, hospitalized, incarcerated, and institutionalized males, and members of the armed forces are exempt
    • Limited microdata content: Race, Hispanic origin microdata not available
    • Address information may not be current
Characteristics of Files Included in the StARS System, cont.
  • Medicare Enrollment Database (EDB):
    • Requested 4/1/99(00) file “cut date” -- current and historical Medicare enrollment (“Active” and “Inactive” cases)
    • ~ 40 million records at any one point in time
    • Recipient Address ≠ Housing Unit
      • Proxy recipients listed on the file (e.g., John Doe’s benefits c/o Jane Doe; John Doe’s benefits c/o nursing home)
    • Used in population estimates system for 65+ household population estimates
    • A small portion of records at any point in time are almost certainly deceased (Kim and Sater, 2000)
    • Coverage is high (93-102%) but not perfect and unevenly distributed geographically
      • “Snowbird” states appear to have lower ratios of Medicare to 65+ population than “non-snowbird” states (Kim and Sater, 2000)
Characteristics of Files Included in the StARS System, cont.
  • Indian Health Service patient file:
    • Requested 4/1/99(00) file “cut date”
    • ~10 million patient/transaction records
    • Transaction record ≠ person record
    • Unduplication
      • about 10 million patient records, 2 million unduplicated SSNs
    • Many missing SSNs (about 20%)
    • Integral part of our race model
Characteristics of Files Included in the StARS System, cont.
  • Housing and Urban Development Tenant Rental Assistance Certification System (HUD-TRACS/MTCS):
    • Requested 4/1/99(00) file “cut date”
    • HUD subsidy payments
    • TRACS 1999: ~ 3.3 million records
    • TRACS 2000: ~ 2 million records
    • Short form data for all members of household (Race/Hispanic only for head of household)
    • Address information may represent project or landlord address
Characteristics of Files Included in the StARS System, cont.
  • Census NUMIDENT File:
    • ~700 million transaction records → 400 million individual SSN records
    • Post 1985: Enumeration at birth
    • For each SSN: Date of birth, gender, race, place of birth
  • About 50-60 million persons on the file are deceased but not identified as such
  • No current residence information on the file
  • Taxpayer ID Numbers (TINs) not on the file
  • Demographic properties:
    • About 35% of SSNs on file have alternate names (marriage, divorce, etc.)
    • About 6% missing gender
    • Race coding has changed (prior to 1980, 3 races: White, Black, Other); 20% either “unknown” or “other”
    • About 25% of SSNs have transactions with different race codes
Creating Final StARS Database
  • Select best address and demographics based on
    • geocodability
    • currency
    • quality
  • Impute missing demographics (from NUMIDENT/Person Characteristics File)
  • Flag records for deceased people
  • Final database resembles a census: one record per person, with a selected address and demographics
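The address-selection step can be sketched as a priority ordering in Python. The slide names the criteria but not how they are weighed, so this sketch assumes (hypothetically) that geocodability dominates, then currency, then a numeric quality score per source; the `CandidateAddress` fields and the scores are illustrative, not StARS’s actual rule.

```python
from dataclasses import dataclass
from datetime import date

@dataclass
class CandidateAddress:
    source: str      # input file the address came from (illustrative)
    address: str
    geocoded: bool   # could the address be placed on the ground?
    as_of: date      # date the record refers to (currency)
    quality: int     # hypothetical per-source quality score; higher is better

def select_best_address(candidates):
    """Choose one address for a person from several source files.

    Assumed priority (not taken from StARS documentation): geocodable addresses
    first, then the most current, then the higher quality score.
    """
    return max(candidates, key=lambda c: (c.geocoded, c.as_of, c.quality))
```

Because Python compares tuples element by element, a geocodable address always beats a non-geocodable one under this assumption, and dates and scores only break ties.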
Address Processing Results (StARS 1999)
  • Almost 800 million addresses at start
  • About 6 percent identified as potential businesses
  • 136 million address records after unduplication
  • About 75 percent geocoded
    • 85 percent geocoding rate for city-style addresses
Person Processing Results (StARS 1999)
  • 875 million records at start
  • 845 million have valid SSN record (96.5%)
  • 280 million after unduplication by SSN
  • 261 million after removal of known deceased
  • 257 million after removal of known deceased and persons residing in outlying territories
  • StARS 2000: 266 million after removal of known deceased before April 1, 2000 and persons residing in outlying territories
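The person-processing funnel (valid SSNs, unduplication by SSN, removal of known deceased and of persons residing in outlying territories) can be sketched in Python as a filter over person records. The record layout, the territory codes, and the rule that a person counts as a territory resident only when every address on file is in a territory are simplifying assumptions for illustration.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

CENSUS_DAY = date(2000, 4, 1)
# Illustrative territory codes (Puerto Rico, Guam, U.S. Virgin Islands,
# American Samoa, Northern Mariana Islands); not taken from StARS.
OUTLYING_TERRITORIES = {"PR", "GU", "VI", "AS", "MP"}

@dataclass
class PersonRecord:
    ssn: str
    ssn_valid: bool              # outcome of SSN validation
    state: str                   # state/territory code of this record's address
    death_date: Optional[date]   # known date of death, if any

def person_universe(records, reference_date=CENSUS_DAY):
    """Return the SSNs that survive the person-processing funnel (simplified)."""
    by_ssn = {}
    for rec in records:
        if rec.ssn_valid:                                  # drop invalid SSNs
            by_ssn.setdefault(rec.ssn, []).append(rec)     # unduplicate by SSN
    universe = set()
    for ssn, recs in by_ssn.items():
        known_deceased = any(
            r.death_date is not None and r.death_date < reference_date for r in recs
        )
        # Assumption: a person counts as a territory resident only if every
        # address on file is in an outlying territory.
        territory_only = all(r.state in OUTLYING_TERRITORIES for r in recs)
        if not known_deceased and not territory_only:
            universe.add(ssn)
    return universe
```

Note that a person whose death is recorded after the reference date stays in the universe, matching the StARS 2000 rule of removing only those known deceased before April 1, 2000.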
Additional Operations of AREX 2000
  • Clerical geocoding
  • Request for physical address (for P.O. Boxes, etc.)
  • Match to Decennial Master Address File
  • Field address verification
Major Analytic Issues with StARS Processing

Ontologies

The way in which an administrative agency “defines” the world may not match the way the Census Bureau “defines” the world, e.g.,

A delivery address suitable for receiving a payment check may not suffice for placing individuals at a residential street address

Difficult to distinguish individual units within the Basic Street Address

Race coding: Hispanic Origin is a separate race on NUMIDENT

Transaction data ≠ person data

How many names does a person have (and in what order)?

Proxies – IRS & Medicare records

JOHN WILSON
C/O MARY SMITH
1004 LAUREL LANE
ROCKMONT, MD 22345

The address is (presumably) for Mary Smith. John Wilson may or may not live there.

Major Analytic Issues with StARS Processing, cont.

Addresses that are difficult to place on the ground

About 10% of addresses are rural style

PO Boxes: 45% for IHS, 9.5% for Medicare, 7.5% for IRS 1040, 6.8% for SSS, 3.8% for IRS 1099, 0.4% for HUD-TRACS (Huang and Kim, 2000)

1995 IRS/CPS match: 86.5% of tax return cases had the same address as the residence address; 94% were coded to the same county (Sater, 1995)

John Smith

H&R BLOCK

P.O. BOX 12

GREENWAY, MD 29752

Addresses with both business and residential components

Dean H. Judson

JUDSON OLD GROWTH LOGGING SERVICES

45850 BACKWOODS HIGHWAY

BOONDOCKS, OR 96432

Major Analytic Issues with StARS Processing, cont.

Unduplication and matching

Addresses and personal characteristics are measured with substantial variation

Often it is not obvious whether a particular pair of records represents a duplicate.

Yet, with multiple files, unduplication decisions must be made.

Address matching:

101 Elm Rd, # 1 97132

101 Elm St, apt 1 97701

Versus

101 Elm Rd, #1 97132

101 Elm St, apt 1 97132
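A hypothetical Python sketch of the field-by-field comparison behind the address-matching example: each address is split into house number, street name, street type, unit, and ZIP code, and the two parsed addresses are compared field by field. The toy parser handles only the simple “number name type, unit ZIP” pattern in the example; it is not the StARS matcher, and turning an agreement pattern into a duplicate decision still requires a decision rule.

```python
import re

# Unit designators handled by this toy parser: "#", "APT", "UNIT".
UNIT = re.compile(r"(?:#|\bAPT\b|\bUNIT\b)\s*(\w+)")

def parse_address(raw):
    """Split 'number name type, unit zip' into fields (toy parser, illustrative only)."""
    text = raw.upper().replace(",", " ")
    match = UNIT.search(text)
    unit = match.group(1) if match else ""
    tokens = UNIT.sub(" ", text).split()
    zip_code = tokens.pop() if tokens[-1].isdigit() and len(tokens[-1]) == 5 else ""
    number, *name_parts, street_type = tokens
    return {"number": number, "name": " ".join(name_parts),
            "type": street_type, "unit": unit, "zip": zip_code}

def agreement_pattern(a, b):
    """Field-by-field agreement between two addresses (True = fields agree)."""
    pa, pb = parse_address(a), parse_address(b)
    return {field: pa[field] == pb[field] for field in pa}
```

Both example pairs agree on house number, street name, and unit (once “# 1”, “#1”, and “apt 1” are normalized) and disagree on street type (Rd versus St); they differ only in whether the ZIP codes agree, which is exactly the kind of case where the duplicate decision is not obvious.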

Major Analytic Issues with StARS Processing, cont.

Variations in data from different sources

Of the 50% of SSNs found on multiple files,

about 1% have more than one gender recorded

about 32% have multiple addresses

about 2% have multiple races (Huang and Kim, 2000)

“Imputation” from the NUMIDENT

Many files have limited microdata. For those that are found on the NUMIDENT, we can “impute” microdata from the approximately equivalent NUMIDENT fields.
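A minimal Python sketch of the two operations described above, on records keyed by SSN: finding SSNs whose values conflict across files, and filling missing fields from a NUMIDENT-style lookup. The dictionary layout and field names are hypothetical, and this is plain gap-filling from a lookup table, not the race, gender, or mortality models cited on this slide.

```python
from collections import defaultdict

def conflicting_ssns(records, field):
    """SSNs whose non-missing values of `field` disagree across source files."""
    values = defaultdict(set)
    for rec in records:
        if rec.get(field) is not None:
            values[rec["ssn"]].add(rec[field])
    return {ssn for ssn, seen in values.items() if len(seen) > 1}

def fill_from_numident(records, numident):
    """Copy NUMIDENT-style values into fields a source file left missing.

    records:  list of dicts, each with an "ssn" key (hypothetical layout)
    numident: dict mapping SSN -> dict of approximately equivalent fields
    """
    filled = []
    for rec in records:
        new = dict(rec)                                   # leave the input untouched
        for field, value in numident.get(rec["ssn"], {}).items():
            if new.get(field) is None:                    # fill gaps only; keep reported values
                new[field] = value
        filled.append(new)
    return filled
```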

Race Model (Bye, 1998, 1999)

Gender Model (Thompson, 1999)

Mortality Model (Falkenstein, Resnick, and Judson, 2000)

StARS 2002 “NUMIDENT Race Enhancement”

Match NUMIDENT to Census 2000

Use Census 2000 race response to improve imputation model

Major Analytic Issues with StARS Processing, cont.

Changing information states

Distinct problem from “point in time” data collection

Information states change over time/over databases

Address information ages over time and varies over databases

SAM SMITH
BOX 2 RURAL ROUTE 37
WESTPORT, VA 32784
(Dated 10/14/98, from Medicare)

SAM SMITH
486 MAIN STREET
FAIRFIELD, VA 33412
(From TY97 IRS file, filed sometime in 1998)

Mortality information ages over time and varies over databases

One database provides information about the other, provided that matching can be performed

Data processing requires complex, and substantively important, decision logic at each step

Applications
  • SSN search and validation with GEOkey
    • Earlier: 90% found in validation step, 5% in search step
    • 2001 Evaluation: 92% found in search (with GEOkey) alone
    • Apparently, our computer search outperforms the SSA manual system
  • CPS/NHIS/ACS to Census matching evaluations
    • Compare different race responses
    • Compare survey and Census coverage
    • Compare variations in Poverty estimates
  • Evaluation of synthetic estimation methods (Popoff, Judson and Fadali, 2001)
  • Multiple-system Estimation for coverage evaluation
    • Additional information to aid dual-system estimation (Asher and Fienberg, 2001)
    • Erroneous enumerations (Biemer, Brown, Wiesen, and Judson, 2001)
Applications
  • Nonresponse followup (NRFU) substitution (’04 simulation test)
  • Imputation methods improvement (’04 simulation test)
  • Master Address File (MAF) targeting
  • Census unduplication confirmation
  • Population estimation (postcensal estimates)
  • Survey improvement (noninterview adjustments)
Evaluations
  • NUMIDENT/PCF 1998 versus 1998 national estimates (Miller, Judson and Sater, 2000)
  • State level comparisons of StARS 2000 versus Census 2000
  • County StARS-synthetic methods versus county ratio estimates and Census 2000
  • Detailed comparison by (fully crossed) age, race, sex, and Hispanic origin counts versus Census 2000, at the county level
  • AREX tract, block, household evaluations on February 19th
County StARS-synthetic methods versus 1999 Estimates versus Census 2000

[Figure: % Hispanic (StARS 99 vs. 99 Estimates vs. Census 2000) for selected Colorado counties where StARS and Estimates deviate by more than 4 percentage points; vertical axis 0 to 90 percent. Counties shown: Bent, Otero, Kiowa, Pueblo, Chaffee, Morgan, Lincoln, Garfield, Costilla, Mineral, Phillips, Conejos, Crowley, Fremont, La Plata, Huerfano, Alamosa, San Juan, Archuleta, Saguache, Las Animas. Counties in which StARS 99 is closer to Census 2000 are marked with a star.]

Fully crossed age, race, sex, and Hispanic Origin array (ARSH array)
  • For every county in the U.S., count the number of nondeceased persons by:
    • Single year of age (0 to 101+)
    • Race (four groups)
    • Sex (two groups)
    • Hispanic origin (Hispanic/non-Hispanic)
    • Potentially 102 × 4 × 2 × 2 = 1,632 cells per county; 3,141 × 1,632 = 5,126,112 cells in the U.S.
  • Error measures (C = Census 2000 count, S = StARS count):
    • Simple difference: C - S
    • Algebraic percent error: 100 × (S - C)/C
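The ARSH tabulation and its two error measures can be sketched in Python. Assumptions: C is the Census 2000 count and S the StARS count for a cell; the four race groups are not named on the slide, so `R1` to `R4` are placeholders; the person record layout is hypothetical; and the percent error is left undefined when the Census cell is empty.

```python
from collections import Counter

AGES = range(102)                 # 0..100 single years, plus 101 meaning "101 and over"
RACES = ("R1", "R2", "R3", "R4")  # placeholders: the four race groups are not named here
SEXES = ("M", "F")
HISPANIC = (True, False)
CELLS_PER_COUNTY = len(AGES) * len(RACES) * len(SEXES) * len(HISPANIC)

def arsh_counts(persons):
    """Count nondeceased persons by (county, age, race, sex, Hispanic origin).

    persons: iterable of dicts with keys county, age, race, sex, hispanic,
    deceased (a hypothetical record layout).
    """
    counts = Counter()
    for p in persons:
        if p["deceased"]:
            continue
        age = min(p["age"], 101)  # top-code age at 101+
        counts[(p["county"], age, p["race"], p["sex"], p["hispanic"])] += 1
    return counts

def cell_errors(census, stars, cell):
    """Return (C - S, 100 * (S - C) / C) for one cell; percent error is None if C == 0."""
    c, s = census.get(cell, 0), stars.get(cell, 0)
    pct_error = 100 * (s - c) / c if c else None
    return c - s, pct_error
```

For a cell with C = 4 and S = 2, the simple difference is 2 and the algebraic percent error is -50, i.e., StARS undercounts that cell by half.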
[Figure: county-level plot of ARSH cells] Note: Each data point is a single county’s ARSH cell.

Age/Sex distributions, selected counties in Texas

[Figure: age/sex distributions for Anderson County (N of Houston), Andrews County (far west, NM border), Brazos County (W of Houston), and Atascosa County (southern part of state)]

Concluding Thoughts

Historians of science will say that there was an “explosion” of research into Administrative Records and Data Warehousing in the late 20th/early 21st century

Using these databases in a statistically principled way requires a new statistical paradigm:

Not survey sampling per se

Not econometric modeling per se

Not coverage measurement per se

Something new

These databases share some data quality issues with typical survey or census data, but have many others that differ

We are attacking these issues with real Census applications

For Further Reading

Alvey, W., and Scheuren, F. (1982). Background for an Administrative Records Census. Proceedings of the Social Statistics Section. Alexandria, VA: American Statistical Association.

Asher, J., and Fienberg, S. (2001). Statistical Variations on an Administrative Records Census. Proceedings of the Social Statistics Section. Alexandria, VA: American Statistical Association.

Biemer, P., Brown, G., Wiesen, C., and Judson, D.H. (2001). Triple system estimation in the presence of erroneous enumerations. Proceedings of the Social Statistics Section. Alexandria, VA: American Statistical Association. Under review at the Journal of Official Statistics.

Bye, B. (1997). Administrative Record Census for 2010 Design Proposal, Final Report. Rockville, MD: Westat, Inc.

Bye, B. (1998). Race and ethnicity modeling with SSA Numident Data: Interim report: File development and tabulations. Unpublished document available from the U.S. Bureau of the Census.

Bryant, C. (1995). Comparing the LUCA address list to “local records.” Paper presented at the 1995 State Data Center Meeting, San Francisco, CA, April 4, 1995.

Czajka, J., Moreno, L., and Schirm, A.L. (1997). On the Feasibility of Using Internal Revenue Service Records to Count the U.S. Population. Washington, DC: Mathematica Policy Research, Inc.

Czajka, J. (1999). Can we count on administrative records in future U.S. Censuses? Presentation at the Bureau of the Census, December 15, 1999.

Falkenstein, M., Resnick, D.R., and Judson, D.H. (2000). The Mortality Module of the Statistical Administrative Records System. Administrative Records Memorandum Series, U.S. Census Bureau.

Farber, J., and Shaw, K.M. (2002). Dual System Estimates of Housing Units Based on Administrative Records. To appear in the 2002 Proceedings of the American Statistical Association, Government Statistics Section [CD-ROM], Alexandria, VA: American Statistical Association.

Heimovitz, H.K. (2002). Administrative Records Experiment 2000: Outcomes. To appear in the 2002 Proceedings of the American Statistical Association, Government Statistics Section [CD-ROM], Alexandria, VA: American Statistical Association.

Huang, E., and Kim, J. (2000). One Percent Sample Study Report (SRD-DRAFT). Unpublished document available from the U.S. Bureau of the Census, February 10, 2000.


Judson, D.H., and Popoff, C.L. (2000). Research Use of Administrative Records. University of Nevada: Nevada State Demographer’s Office.

Judson, D.H. (2000). The Statistical Administrative Records System: System Design, Successes, and Challenges. Paper presented at the 2000 Data Quality Workshop, Morristown, NJ, Nov 30-Dec 1.

Judson, D.H., Popoff, C.L., and Batutis, M. (2001). An Evaluation of the Accuracy of U.S. Census Bureau County Population Estimation Methods. Statistics in Transition, 5:185-215.

Judson, D.H. (2001). A Partial Order Approach to Record Linkage. Paper presented at the Federal Committee on Statistical Methodology, Washington, DC, November 14, 2001.

Judson, D.H. (2002). Adventures in Bayesian Record Linkage. Paper presented at the Classification Society of North America, June 11, 2002.

Judson, D.H. (2002). Merging Administrative Records Databases in the Absence of a Register: Data Quality Concerns and Outcomes of an Experiment in Administrative Records Use. Paper presented at the UNECE-EUROSTAT work session on registers and administrative records in social and demographic statistics, Geneva, Switzerland, 9-11 December 2002.

Kim, M.O., and Sater, D. (2000). Defining the Medicare Data Universe for the U.S. Census Bureau's Population Estimates Program. Paper presented at the Southern Demographic Association meetings, New Orleans, LA, August 29, 2000.

Leggieri, C., and Prevost, R. (1999). Expansion of Administrative Records Uses at the Census Bureau: A Long-Range Research Plan. Paper presented at the November 1999 Meeting of the Federal Committee on Statistical Methodology, Washington, DC.

Miller, E., Judson, D.H., and Sater, D. (2000). The 100% Census NUMIDENT: Demographic Analysis of Modeled Race and Hispanic Origin Estimates Based Exclusively on Administrative Records Data. Paper presented at the Southern Demographic Association meetings, New Orleans, LA, August 29, 2000.

Popoff, C.L., Judson, D.H., and Fadali, B. (2001). Measuring the Number of People Without Health Insurance: A Test of a Synthetic Estimates Approach for Small Area Estimates using SIPP Microdata. Paper presented at the Federal Committee on Statistical Methodology, Washington, DC, November 14, 2001.


Sailer, P., Weber, M., and Yau, E. (1993). How Well Can IRS Count the Population? 1993 Proceedings of the Survey Research Methods Section. Alexandria, VA: American Statistical Association.

Sater, D. (1995). Differences in Location of Households and Tax Filing Units. Paper presented at the 1995 meeting of the Population Association of America, San Francisco, CA, April 6, 1995.

Stuart, E., and Zaslavsky, A.M. (2002). Using administrative records to predict census day residency. In C. Gatsonis, R.E. Kass, A. Carriquiry, A. Gelman, D. Higdon, D.K. Pauler, and I. Verdinelli (Eds.), Case Studies in Bayesian Statistics, Volume VI. New York, NY: Springer.

Thompson, H. (1999). The Development of a Gender Model with SSA Numident Data. Administrative Records Research Memorandum Series #32, U.S. Census Bureau.

Wand, Y., and Wang, R.Y. (1996). Anchoring data quality dimensions in ontological foundations. Communications of the ACM, 39: 86-95.

Zanutto, E., and Zaslavsky, A.M. (2001). Using Administrative Records to Impute for Nonresponse. In R. Groves, R.J.A. Little, and J. Eltinge (Eds.), Survey Nonresponse. New York: John Wiley.

Glossary of Terms

Administrative records: Data collected primarily to administer a regulation or record a transaction, rather than for data collection per se.

Administrative Records Census: A Census of Population and Housing in which a predominant component of the census-taking is performed by using administrative records databases. In practice, field operations (for example, for coverage measurement or for Group Quarters enumeration) are often conducted alongside it.

AREX2000: Administrative Records Experiment in 2000, an experimental attempt to simulate an “Administrative Records Census” in two sites in the U.S.

Basic Street Address: The primary street number and street name, omitting apartment numbers or other within-structure identifiers.

CPS: Current Population Survey, an ongoing survey administered by the U.S. Census Bureau.

Data Quality: The ability to construct a mapping from the ontological representation of a data item in a database to its appropriate ontological representation in the “real world.”

Master Address File (MAF): A file of addresses maintained by the U.S. Census Bureau for the purpose of taking its decennial census, and acting as a frame for ongoing sample surveys. The Decennial Master Address File is referred to as the “DMAF.”

Master Housing File: A file of addresses developed by the Statistical Administrative Records System.

Microdata: Data on individual person or housing characteristics, e.g., race, sex, age, street address, zip code.

Ontology: The study of “what is”, that is, the categories by which we understand the world.

StARS: Statistical Administrative Records System, an experimental database that combines information from several major Federal databases into one database that can be used for census-taking purposes.