Creating a collection of standardized datasets on household consumption

Olivier Dupriez, World Bank, Development Data Group, 6 June 2013



Initial objective
  • Calculate poverty PPPs
  • Had price data at basic heading level from the ICP; needed consumption shares “at the poverty line” for the same breakdown to be used as weights.
  • See: A. Deaton and O. Dupriez, Purchasing power parity exchange rates for the global poor, American Economic Journal: Applied Economics, vol. 3, pp. 137-166 (2011), and also Global Poverty and Global Price Indexes
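To make the idea of consumption shares “at the poverty line” concrete, here is a minimal sketch that pools spending of households whose per capita expenditure falls within a band around the line and turns it into weights by basic heading. The band approach, the function name and the field names (`pce`, `spending`) are assumptions made for illustration; the cited paper should be consulted for the procedure actually used.

```python
def shares_near_poverty_line(households, poverty_line, bandwidth):
    """Budget shares by heading for households close to the poverty line.

    households: list of dicts with 'pce' (per capita expenditure) and
    'spending' ({heading: annual value}). Hypothetical structure.
    """
    totals = {}
    for hh in households:
        # Keep only households within +/- bandwidth of the poverty line.
        if abs(hh["pce"] - poverty_line) <= bandwidth:
            for heading, value in hh["spending"].items():
                totals[heading] = totals.get(heading, 0.0) + value
    grand_total = sum(totals.values())
    if grand_total <= 0:
        raise ValueError("no spending found within the band")
    # Shares sum to 1 and can serve as weights for the price data.
    return {heading: value / grand_total for heading, value in totals.items()}
```

A household far above the line contributes nothing to the weights, which is the point of measuring shares at the line rather than over the whole population.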
Intermediary output – data files
  • A collection of “standard” files
    • Individual level: age, sex
    • Household level: region, total expenditure (before and after fixing outliers), adult equivalents, household size, etc.
    • Household + product level:
      • Product code (original as in questionnaire, with labels) and COICOP code
      • Value purchased, home produced, received, total
      • Deflated (when available) / non-deflated
      • NO information on quantities
    • Format/structure of the data files is standard; content not so much
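The three file levels can be pictured as three record types linked by a household identifier. The sketch below uses Python dataclasses rather than the Stata files of the original project, and every field name is an assumption chosen for illustration; the slides only list the kinds of variables each file holds.

```python
from dataclasses import dataclass


@dataclass
class Individual:
    survey_id: str
    household_id: str
    age: int
    sex: str


@dataclass
class Household:
    survey_id: str
    household_id: str
    region: str
    household_size: int
    adult_equivalents: float
    total_exp_raw: float    # total expenditure before fixing outliers
    total_exp_fixed: float  # total expenditure after fixing outliers


@dataclass
class HouseholdProduct:
    survey_id: str
    household_id: str
    product_code: str   # original code from the questionnaire
    product_label: str
    coicop_code: str    # placeholder string, not a real COICOP code list
    purchased: float
    home_produced: float
    received: float

    @property
    def total(self) -> float:
        # Total value is the sum of the three acquisition modes.
        return self.purchased + self.home_produced + self.received
```

Note that there is no quantity field, in line with the files described above.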
Multiple uses and users
  • Many potential applications
    • IFC “Business Opportunities at the Base of the Pyramid”
    • Micro-macro modeling
    • Poverty/inequality analysis
    • Assessment of reliability and relevance of surveys
      • E.g., list all items related to health with percentage of respondents, for each survey
      • E.g., list all categories not covered by questionnaires
    • And many more
  • Use household consumption/expenditure surveys
    • A VERY diverse set of surveys (HBS, LSMS, HIES, etc.)
    • Ex-post harmonization has limits
  • Map all products and services to COICOP
    • From 6000+ items in the Brazil survey to fewer than 50 in other countries…
  • Annualize values by product/service and household
  • Fix outliers
  • No attempt to fill gaps (no imputation of values for missing products/services)
  • Generate the 3 standard files
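The annualization step can be sketched as simple scaling by recall period, followed by aggregation per household and product. This assumes each reported value comes with a recall period in days, which the slides do not state; the function names and the record layout are hypothetical.

```python
from collections import defaultdict

DAYS_PER_YEAR = 365


def annualize(value, recall_days):
    """Scale a value reported over `recall_days` to a yearly amount."""
    if recall_days <= 0:
        raise ValueError("recall_days must be positive")
    return value * DAYS_PER_YEAR / recall_days


def annual_by_household_product(records):
    """Sum annualized values per (household, product code).

    records: iterable of (household_id, product_code, value, recall_days).
    """
    totals = defaultdict(float)
    for household_id, product_code, value, recall_days in records:
        totals[(household_id, product_code)] += annualize(value, recall_days)
    return dict(totals)
```

Uniform scaling like this is only a starting point; items bought rarely or irregularly need separate treatment.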
Principle – Full replicability
  • One single Stata program per survey
    • Calls one “generic” program to detect and fix outliers
  • Controlled vocabulary for file names, folder names
  • Survey ID to link to on-line metadata catalog
Mapping to COICOP
  • ICP/COICOP: 110 basic headings for household consumption
  • 105 are relevant for household surveys
  • Situations:
    • Many to one (e.g., long list of vegetables)
    • One to one
    • One to many (lack of detail in questionnaire)
    • No data to one (questionnaire missed items)
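The four situations can be detected mechanically once each questionnaire item is listed with the basic heading(s) it maps to. The sketch below is one possible way to do that; the function name, the input layout and the heading labels are assumptions for illustration, not the project's actual mapping tables.

```python
from collections import defaultdict


def classify_mapping(item_to_headings, relevant_headings):
    """Label each questionnaire item and list headings with no data.

    item_to_headings: {item: [heading, ...]} (hypothetical layout).
    relevant_headings: all headings the survey should cover.
    """
    heading_to_items = defaultdict(set)
    for item, headings in item_to_headings.items():
        for heading in headings:
            heading_to_items[heading].add(item)

    situations = {}
    for item, headings in item_to_headings.items():
        if not headings:
            situations[item] = "unmapped"
        elif len(headings) > 1:
            situations[item] = "one to many"   # item too coarse
        elif len(heading_to_items[headings[0]]) > 1:
            situations[item] = "many to one"   # several items, one heading
        else:
            situations[item] = "one to one"

    # "No data to one": headings no questionnaire item maps to.
    missing = sorted(set(relevant_headings) - set(heading_to_items))
    return situations, missing
```

The `missing` list is the kind of output that supports listing the categories a questionnaire does not cover.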
Grouped categories
  • One to many: items in questionnaires are not always detailed enough to be mapped to one single COICOP basic heading
Missing categories
  • No questionnaire found to cover all 105 categories of products and services
  • On average, N basic headings missing
    • Sometimes for known reasons (e.g., pork in Muslim countries)
    • But questionnaire design needs improvement in all countries
Splitting grouped categories
  • Used breakdown from national accounts to split grouped categories (data obtained from ICP)
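A proportional split is one straightforward reading of this step: the grouped survey value is divided among its basic headings in proportion to national accounts expenditure on those headings. The sketch below assumes that reading; the function name and arguments are hypothetical.

```python
def split_grouped_value(value, headings, sna_expenditure):
    """Split one grouped survey value across several basic headings.

    sna_expenditure: {heading: national accounts expenditure}, used only
    to derive the relative weights of the headings in the group.
    """
    weights = [sna_expenditure.get(heading, 0.0) for heading in headings]
    total_weight = sum(weights)
    if total_weight <= 0:
        # No national accounts information for this group: refuse to guess.
        raise ValueError("no national accounts breakdown for these headings")
    return {
        heading: value * weight / total_weight
        for heading, weight in zip(headings, weights)
    }
```

For example, a grouped value of 100 for two headings whose national accounts values are 30 and 10 is split into 75 and 25, so the household total is preserved.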
Correlation between SNA and surveys
  • From almost perfect (very few cases) to very low (many countries)
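The slides do not say which correlation measure was used or over which units. As an illustration only, the sketch below computes a Pearson correlation between survey and SNA budget shares across the basic headings present in both sources; all names are hypothetical.

```python
from math import sqrt


def to_shares(values):
    """Convert {heading: value} to {heading: share of total}."""
    total = sum(values.values())
    return {heading: value / total for heading, value in values.items()}


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    # Fails with ZeroDivisionError if either series is constant.
    return cov / sqrt(var_x * var_y)


def share_correlation(survey, sna):
    """Correlate survey and SNA shares over the headings both cover."""
    common = sorted(set(survey) & set(sna))
    survey_shares = to_shares({h: survey[h] for h in common})
    sna_shares = to_shares({h: sna[h] for h in common})
    return pearson([survey_shares[h] for h in common],
                   [sna_shares[h] for h in common])
```

Comparing shares rather than levels keeps the measure independent of currency units and of the overall gap between survey and national accounts totals.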
Annualization challenges
  • Some problematic items:
    • Durables (use value/expenditure)
    • Imputed rents
    • Out of pocket health expenditure
    • Ceremonies, etc.
    • Food away from home
  • Validation: compare with official estimates when available, and with PovCal aggregates
    • Official figures are never replicated exactly
Detecting and fixing outliers
  • Top outliers only
  • Tried multiple options
  • Based on per capita or per household depending on item
  • 75th percentile + 5 times interquartile range
  • Replace with maximum valid value (zero values not included in calculations)
  • If outlier for multiple items, consider “rich” household and do not fix
  • Would deserve a specific research project
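The rule above can be sketched as follows, in Python rather than the generic Stata program the project used. Several details are assumptions: the percentile interpolation method, the reading of “multiple items” as more than one flagged item (the hypothetical `max_flagged_items` parameter), and the input layout. The per capita versus per household choice is left to the caller, who passes values already expressed on the chosen basis.

```python
def percentile(sorted_vals, p):
    """Linear-interpolation percentile (p in 0..100) of a sorted list."""
    k = (len(sorted_vals) - 1) * p / 100
    lo = int(k)
    hi = min(lo + 1, len(sorted_vals) - 1)
    return sorted_vals[lo] + (sorted_vals[hi] - sorted_vals[lo]) * (k - lo)


def upper_fence(values, multiplier=5.0):
    """75th percentile + multiplier * IQR, ignoring zero values."""
    positive = sorted(v for v in values if v > 0)
    if not positive:
        return float("inf")  # nothing to flag for this item
    q1, q3 = percentile(positive, 25), percentile(positive, 75)
    return q3 + multiplier * (q3 - q1)


def fix_top_outliers(data, max_flagged_items=1):
    """data: {item: {household_id: value}}. Returns a corrected copy."""
    fences = {item: upper_fence(vals.values()) for item, vals in data.items()}

    # Count, per household, the number of items flagged as top outliers.
    flagged = {}
    for item, vals in data.items():
        for hh, v in vals.items():
            if v > fences[item]:
                flagged[hh] = flagged.get(hh, 0) + 1

    fixed = {}
    for item, vals in data.items():
        # Largest non-zero value that is not itself an outlier.
        valid_max = max((v for v in vals.values() if 0 < v <= fences[item]),
                        default=0.0)
        fixed[item] = {
            # Households flagged on several items are treated as "rich"
            # and left untouched.
            hh: valid_max
            if v > fences[item] and flagged[hh] <= max_flagged_items
            else v
            for hh, v in vals.items()
        }
    return fixed
```

Only values above the fence are changed; low values are never touched, matching the “top outliers only” choice.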
Outlier fixing – Significant impact
  • Example: change in Ginis
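To see why capping top values moves the Gini, the coefficient can be computed on the same distribution before and after correction. This is a minimal unweighted sketch; real survey estimates would use household weights, and the function name is an assumption.

```python
def gini(values):
    """Unweighted Gini coefficient of non-negative values."""
    xs = sorted(values)
    n = len(xs)
    total = sum(xs)
    if n == 0 or total <= 0:
        raise ValueError("need at least one positive value")
    # Rank-weighted sum over the sorted values (ranks start at 1).
    rank_weighted = sum(rank * x for rank, x in enumerate(xs, start=1))
    return 2 * rank_weighted / (n * total) - (n + 1) / n
```

Applied to total expenditure before and after fixing outliers, the difference between the two results gives the change in Gini attributable to the correction.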
Past and future
  • 160 datasets “standardized” – 90+ low and middle-income countries
  • Many more survey datasets available at WB; could expand and update the collection if resources are available
  • Conduct in-depth research work on outliers and formulate recommendations to countries
  • Feedback to countries on issues in questionnaire design
  • Dissemination of microdata?