Explore multi-year research project investigating underreporting in consumer expenditures data, aiming to identify patterns and characteristics contributing to the underreporting. Phases include macro and micro level analysis to estimate underreporting and inform new procedures.
Estimating the Level of Underreporting of Expenditures among Expenditure Reporters: A Further Micro-Level Latent Class Analysis
DISCLAIMER: The views expressed in this presentation are those of the authors and do not necessarily represent the views of the Bureau of Labor Statistics or the Department of Labor.
Outline • Background • Research Goals • Past Approach • Past Results • Current Approach • Results • Future Research
U.S. Consumer Expenditure (CE) Interview Survey • ~ 6,000 households/year • Interviewed every 3 months about the prior 3 months' expenditures • 5 consecutive interviews for each household • 6 years of CE data: 1996 – 2002
Research Goals This multi-year project has three goals: • Identify patterns of underreporting of expenditures in different commodities • Identify the characteristics of respondents contributing most to the underreports • Use the knowledge gained to design new procedures for overcoming underreporting
Phases of Research • Phase 1 (2003): Markov LCA on macro-level data; non-reporters only • Phase 2 (2004): (Ordered) LCA on micro-level data for total consumer expenditures; reporters only • Phase 2 (current): (Ordered) LCA on micro-level data for separate commodity expenditures; examination of possible causal linkages to respondent characteristics; reporters only • Phase 3 (future): Combine the macro and micro analyses; all sample; produce overall estimates of underreporting by category and respondent characteristics
Phase 1: Approach • Used 4 consecutive CE interviews • Screening question: “Since the 1st of (month, 3 months ago), have you (or any members of your household) had any expenses for __________?” • Used 1st and 2nd order Markov LCA to fit models to the dichotomous response to the screening question • Explored the effect on underreporting of family size, income, age, family type, gender, education, record use, and interview length
Phase 1: Design • Obtained estimates of false negative probability i.e. P(no purchase reported | made a purchase) • Produced estimates for each commodity of: • True proportion of “purchasers” • Accuracy rate i.e. P(report a purchase | truly made a purchase) • Used these estimates to examine relationships between demographic variables and probability of accurate reporting
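The two probabilities on this slide are complements: the accuracy rate equals one minus the false negative probability. The following is a minimal sketch of that relationship. The function names are illustrative, and the second function rests on an added assumption not stated in the slides: that there are no false positives (nobody reports a purchase they did not make).

```python
def accuracy_rate(false_negative_prob: float) -> float:
    """P(report a purchase | truly made a purchase).

    This is the complement of the false negative probability
    P(no purchase reported | made a purchase).
    """
    if not 0.0 <= false_negative_prob <= 1.0:
        raise ValueError("probability must lie in [0, 1]")
    return 1.0 - false_negative_prob


def expected_reported_share(true_purchaser_share: float,
                            false_negative_prob: float) -> float:
    """Share of respondents expected to report a purchase.

    ASSUMPTION (not from the slides): no false positives, so only
    true purchasers can report a purchase.
    """
    return true_purchaser_share * accuracy_rate(false_negative_prob)
```

Under that assumption, any gap between the true purchaser share and the reported share is attributable entirely to false negatives.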
Phase 1: Conclusions • Model fit was adequate for all commodities • Levels of underreporting vary by commodity • Variables found to be positively related to accurate reporting included: • Education • Family Size • Income • Use of Records • Length of Interview • The effect of age was highly variable
Phase 2 in 2004 • Differences between Phase 1 and Phase 2: • Used only Interview 2 data, not Markov LCA • Micro level analysis • Reporters only • Latent variable represents level of underreporting, as opposed to purchasing status as in Phase 1
Approach • Analysis Plan • Ran both ordered and unordered latent class models • Order was determined based on the theoretical relationship between values of indicators and level of underreporting • Ran all combinations of indicators in groups of 3 • Using only reporters • Using only 2nd interview data
Application of Model • For the final model: • Each combination of indicator values was assigned to a latent class • The probability of being in that class given the values of the indicators was used to assign classes • Each respondent was assigned to a latent class given the values of their indicator variables • Expenditure means were found for each latent class
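The assignment step can be sketched as follows. This is an illustration only: the class priors and conditional probabilities are made-up numbers, not CE estimates, and assigning each pattern to the class with the highest posterior probability (modal assignment) is an assumption about how "the probability of being in that class" was used. The posterior follows the standard latent class form, P(class | indicators) proportional to P(class) times the product of P(indicator value | class).

```python
from collections import defaultdict
from math import prod

# HYPOTHETICAL parameters for a 2-class model with three 2-level indicators.
CLASS_PRIORS = [0.5, 0.5]
# COND_PROBS[j][c][v - 1] = P(indicator j takes value v | latent class c)
COND_PROBS = [
    [[0.8, 0.2], [0.3, 0.7]],
    [[0.6, 0.4], [0.2, 0.8]],
    [[0.7, 0.3], [0.4, 0.6]],
]


def posterior(pattern):
    """Posterior class probabilities for one pattern of indicator values."""
    joint = [
        CLASS_PRIORS[c]
        * prod(COND_PROBS[j][c][v - 1] for j, v in enumerate(pattern))
        for c in range(len(CLASS_PRIORS))
    ]
    total = sum(joint)
    return [p / total for p in joint]


def assign_class(pattern):
    """Modal assignment (an assumption): pick the most probable class."""
    post = posterior(pattern)
    return max(range(len(post)), key=post.__getitem__)


def mean_expenditure_by_class(respondents):
    """respondents: iterable of (indicator_pattern, expenditure) pairs."""
    sums, counts = defaultdict(float), defaultdict(int)
    for pattern, expenditure in respondents:
        c = assign_class(pattern)
        sums[c] += expenditure
        counts[c] += 1
    return {c: sums[c] / counts[c] for c in sums}
```

Because the class depends only on the indicator pattern, every respondent with the same pattern lands in the same class, which matches the slide's two-stage description (patterns are assigned first, then respondents).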
Summary of Findings in 2004 • Levels of underreporting were found to vary by interview level characteristics including: • Number of contacts • Missing income data • Type and frequency of records used • Length of interview • Total expenditure means for respondents assigned to each latent class confirmed this
Current Phase 2 • Using same general methodology as 2004 • Refine indicators • Apply methodology to separate commodity categories • Identify best model for each commodity and assign respondents to latent classes • Examine the pattern of mean expenditures for each latent class to confirm results • Run demographic analysis to identify characteristics of members of each latent class
Indicators • Interview level indicators considered: • Number of contacts • Ratio of respondents/household members • Missing income data • Type and frequency of records used • Length of interview • Ratio of expenditures in last month to quarter • Combination of type of record and interview length
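The 2004 approach fitted models to every group of three indicators. If the same were done with the seven indicators listed here, the candidate models could be enumerated as below. This is a sketch under assumptions: the short variable names are invented for illustration, and the slides do not say whether all seven indicators were entered together (the combined records-and-length indicator overlaps with its two components, so some triples may have been excluded).

```python
from itertools import combinations

# Illustrative names for the seven interview-level indicators on the slide.
INDICATORS = [
    "contacts",
    "resp_hh_ratio",
    "income_missing",
    "records_use",
    "interview_length",
    "month3_expn_share",
    "records_and_length",
]


def indicator_triples(indicators):
    """Return every unordered group of three indicators."""
    return list(combinations(indicators, 3))


triples = indicator_triples(INDICATORS)
print(len(triples))  # 7 choose 3 = 35 candidate indicator sets
```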
Indicator Coding • Number of contacts (1=0-2; 2=3-5; 3=6+) • Resp/hh size (1= <.5; 2= .5+) • Income missing (1=present; 2=missing) • Records use (1=never; 2=single type or sometimes; 3=multiple types and always) • Interview length (1= <45; 2=45-90; 3= 90+) • Month3 expn/all (1= <.25; 2= .25-.5; 3= >.5) • Combined records and length (1= poor; 2= fair; 3=good)
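As an illustration of the coding scheme, three of the indicators with unambiguous cut points can be recoded as below. The function names and the use of `None` for missing income are assumptions made for the sketch. Interview length is left out because the slide's categories overlap at 90.

```python
def code_contacts(n_contacts: int) -> int:
    """Number of contacts: 1 = 0-2, 2 = 3-5, 3 = 6 or more."""
    if n_contacts < 0:
        raise ValueError("number of contacts cannot be negative")
    if n_contacts <= 2:
        return 1
    if n_contacts <= 5:
        return 2
    return 3


def code_resp_ratio(n_respondents: int, hh_size: int) -> int:
    """Respondents per household member: 1 = below .5, 2 = .5 or above."""
    if hh_size <= 0:
        raise ValueError("household size must be positive")
    return 1 if n_respondents / hh_size < 0.5 else 2


def code_income_missing(income) -> int:
    """Income: 1 = present, 2 = missing (None stands for missing here)."""
    return 2 if income is None else 1
```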
Demographic Coding • CU size (1=1; 2=2; 3=3+) • Age (1= <30; 2= 30-49; 3=50+) • Education (1=< H.S.; 2= H.S.+) • Income rank (1= <=.25; 2=.25-.75 and missing; 3= >.75) • Race (1= White; 2= Other) • Tenure (1= renter; 2= owner) • Urban (1= urban; 2= rural)
Future Research • Other categories and total expenditures • Add a Markov component • Combine the macro and micro analyses (underreporting for both reporters and nonreporters) • Produce overall estimates of underreporting by category and respondent characteristics