
DATA QUALITY, COMPOSITE INDICATORS AND AGGREGATION



Presentation Transcript


  1. UPA Package 4, Module 2: DATA QUALITY, COMPOSITE INDICATORS AND AGGREGATION

  2. Data Quality, Composite Indicators and Aggregation • Data errors • Less is more; benefit-cost of data • Data cleaning • Composite indicators • Introduction exercise 4.2.3 Exploring Data sets • Introduction exercise 4.2.4 Aggregation

  3. Data Errors • Biased data • Outliers (error or genuine extreme value) • Sample too small • Too much precision or regularity (too good to be true) • Missing values • Inconsistencies
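Two of the checks listed above, missing values and outliers, can be screened mechanically. The following is a minimal sketch, assuming missing values are stored as `None` and using the common 1.5 × IQR rule as the outlier criterion. The function name `screen_values`, the threshold and the sample data are illustrative assumptions, not part of the module.

```python
import statistics

def screen_values(values, k=1.5):
    """Count missing entries and flag values outside the IQR fences."""
    present = [v for v in values if v is not None]
    q1, _, q3 = statistics.quantiles(present, n=4)  # quartile cut points
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    outliers = [v for v in present if v < low or v > high]
    return {"missing": len(values) - len(present), "outliers": outliers}

# Hypothetical family sizes; None marks a missing value.
family_size = [4, 5, None, 3, 6, 5, 4, 40, 4]
report = screen_values(family_size)
print(report)
```

A flagged value is only a candidate: whether it is an error or a real extreme value still has to be judged against the source.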

  4. Less is more; Benefit and Cost of Data • Quality (full coverage and maintenance) • Quantity (many variables but missing values and outdated) • Physical Characteristics of a building • Ownership Characteristics of a building

  5. Benefit and Cost of Data • Data benefits and costs • Strategy and clear objectives for developing databases • Data (and functionality) requirements study • Data benefits: the value of information and its quantification • Cost reduction; effectiveness and prioritisation of (public) resource allocation • Transparency, awareness, involvement • Data costs are high (acquisition, editing, conversion, updates, maintenance)

  6. Benefit and Cost of Data: Primary and secondary data, data sharing • Primary data: ad hoc, single use, (too) expensive • Secondary data: match against the requirements of poverty studies • Combination of existing data and samples • Data collection embedded into institutional settings: from data projects to data processes

  7. Composite Indicators • Poverty without reliable income data • Slums • Composite indicators, e.g. Human Development Index, Poverty Index • Proxy indicators (consumption / income)
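One simple way to build a composite indicator from several proxy indicators is to rescale each to the range 0 to 1 and take a weighted average. The sketch below illustrates that idea only; it is not the formula of the Human Development Index or of any specific poverty index. The indicator names, the equal weights and the figures are hypothetical.

```python
def min_max(values):
    """Rescale values linearly so the smallest becomes 0 and the largest 1."""
    lo, hi = min(values), max(values)
    if hi == lo:  # no variation: every unit gets the same score
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

def composite_index(indicators, weights=None):
    """Weighted average of min-max scaled indicators, one score per unit."""
    names = list(indicators)
    if weights is None:  # default assumption: equal weights
        weights = {name: 1 / len(names) for name in names}
    scaled = {name: min_max(indicators[name]) for name in names}
    n_units = len(indicators[names[0]])
    return [sum(weights[name] * scaled[name][i] for name in names)
            for i in range(n_units)]

# Hypothetical deprivation proxies for three neighbourhoods (higher = worse).
indicators = {
    "share_without_piped_water": [0.10, 0.50, 0.90],
    "persons_per_room": [1.0, 2.0, 3.0],
}
scores = composite_index(indicators)
print(scores)
```

The choice of weights and of the scaling method changes the ranking of units, so both should be stated whenever such an index is reported.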

  8. Composite Indicators

  9. Aggregation • Aggregate cases into a single summary case • A break variable (e.g. neighborhood) defines the groups; one summary case is created per group • Aggregate functions: summary statistics, fractions
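The aggregation step described above can be sketched in a few lines. In this minimal example the break variable is a neighbourhood code, and each group of household cases is collapsed into one summary case holding a mean (a summary function) and a share (a fraction function). The field names and records are hypothetical.

```python
from collections import defaultdict

def aggregate(cases, break_var, value_var, flag_var):
    """Collapse cases into one summary case per value of the break variable."""
    groups = defaultdict(list)
    for case in cases:
        groups[case[break_var]].append(case)
    summary = {}
    for key, members in groups.items():
        n = len(members)
        summary[key] = {
            "n": n,
            "mean": sum(m[value_var] for m in members) / n,           # summary function
            "fraction": sum(1 for m in members if m[flag_var]) / n,   # fraction function
        }
    return summary

# Hypothetical household records.
households = [
    {"neighbourhood": "A", "income": 100, "owner": True},
    {"neighbourhood": "A", "income": 200, "owner": False},
    {"neighbourhood": "B", "income": 300, "owner": True},
]
result = aggregate(households, "neighbourhood", "income", "owner")
print(result)
```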

  10. Small Area Statistics • Limited (existing) data, limited funds for data collection • Sample survey and auxiliary data sets (+ analytical skills) = small area statistics • By developing a model of the relationship between the survey and the auxiliary data, more reliable estimates can be made, with the possibility of extrapolating to areas not covered by a household survey
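The modelling idea can be illustrated with the simplest possible case: an ordinary least squares line fitted on the surveyed areas, then applied to an area where only the auxiliary variable is known. This is a sketch under strong assumptions (one auxiliary variable, a linear relationship, made-up numbers); real small area estimation typically uses richer models and reports the uncertainty of each estimate.

```python
def fit_line(x, y):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

# Hypothetical surveyed areas: auxiliary variable (e.g. from a census)
# and the survey-based estimate of the variable of interest.
auxiliary = [1.0, 2.0, 3.0, 4.0]
survey = [3.0, 5.0, 7.0, 9.0]
slope, intercept = fit_line(auxiliary, survey)

# Extrapolate to an area that the household survey did not cover.
unsurveyed_auxiliary = 5.0
estimate = intercept + slope * unsurveyed_auxiliary
print(slope, intercept, estimate)
```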

  11. Introduction Exercise 4.2.3 Exploring Datasets Classifying interval data (number of foreigners, income, family size) into meaningful groups (e.g. low income, medium income, high income). Create cross tables and analyze relationships between these ordinal data sets.
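The two steps of this exercise, classifying interval data and building a cross table, can be sketched as follows. The class breaks, labels and values are hypothetical, and the exercise itself may use a statistics package rather than code like this.

```python
from collections import Counter

def classify(value, breaks, labels):
    """Return the label of the first class whose upper break exceeds value.

    breaks must be sorted and have one element fewer than labels.
    """
    for upper, label in zip(breaks, labels):
        if value < upper:
            return label
    return labels[-1]  # at or above the highest break

def cross_table(row_classes, col_classes):
    """Count how often each (row class, column class) pair occurs."""
    return Counter(zip(row_classes, col_classes))

# Hypothetical interval data for three cases.
income = [12, 25, 40]
family_size = [2, 5, 3]

income_class = [classify(v, [20, 35], ["low", "medium", "high"]) for v in income]
size_class = [classify(v, [3], ["small", "large"]) for v in family_size]
table = cross_table(income_class, size_class)
print(table)
```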

  12. Introduction Exercise 4.3.2: Cross table (mean income x mean house value), municipalities in the Netherlands

      Count (rows: housecl, columns: incomecl)

      housecl     Low   Medium   High   Very High   Total
      Low          37       50     25           0     112
      Medium       51      177    182           0     410
      High          1        1      8           6      16
      Total        89      228    215           6     538

      Symmetric Measures

                                                    Value   Asymp. Std. Error (a)   Approx. T (b)   Approx. Sig.
      Interval by Interval   Pearson's R             .309                    .044           7.517       .000 (c)
      Ordinal by Ordinal     Spearman Correlation    .288                    .042           6.956       .000 (c)
      N of Valid Cases                                538

      a. Not assuming the null hypothesis.
      b. Using the asymptotic standard error assuming the null hypothesis.
      c. Based on normal approximation.
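The Pearson's R on this slide can be recomputed from the cross table counts alone. The sketch below assumes the classes are coded as consecutive integers (housecl Low = 1 to High = 3, incomecl Low = 1 to Very High = 4); that coding is an assumption, but under it the result agrees with the reported .309 to three decimals. The Spearman coefficient needs tied ranks and is not reproduced here.

```python
import math

# Counts from the cross table: rows are housecl (Low, Medium, High),
# columns are incomecl (Low, Medium, High, Very High).
counts = [
    [37, 50, 25, 0],
    [51, 177, 182, 0],
    [1, 1, 8, 6],
]

def pearson_from_table(rows):
    """Pearson's r for integer-coded classes, weighted by cell counts."""
    n = sx = sy = sxx = syy = sxy = 0
    for x, row in enumerate(rows, start=1):        # x = row class code
        for y, count in enumerate(row, start=1):   # y = column class code
            n += count
            sx += count * x
            sy += count * y
            sxx += count * x * x
            syy += count * y * y
            sxy += count * x * y
    cov = sxy - sx * sy / n
    var_x = sxx - sx * sx / n
    var_y = syy - sy * sy / n
    return n, cov / math.sqrt(var_x * var_y)

n_cases, r = pearson_from_table(counts)
print(n_cases, round(r, 3))
```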

  13. Introduction Exercise 4.2.3 Aggregation • Central Bureau of Statistics of The Netherlands: three main spatial units: Municipality (n=538), Districts (n=2382), Neighbourhoods (n=10737) • Aggregation, summarizing data: why and what • Spatially homogeneous versus heterogeneous variables • Which statistics to use (mean or other statistical figures) • Simple and weighted aggregates
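The difference between a simple and a weighted aggregate shows up as soon as the lower-level units differ in size. The sketch below aggregates neighbourhood means to a district figure both ways; the incomes and household counts are hypothetical.

```python
def simple_mean(values):
    """Unweighted mean: every unit counts equally."""
    return sum(values) / len(values)

def weighted_mean(values, weights):
    """Mean in which each unit counts in proportion to its weight."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)

# Hypothetical neighbourhoods within one district.
mean_income = [20.0, 30.0]   # mean household income per neighbourhood
households = [100, 300]      # number of households per neighbourhood

district_simple = simple_mean(mean_income)
district_weighted = weighted_mean(mean_income, households)
print(district_simple, district_weighted)
```

Here the simple aggregate treats both neighbourhoods alike, while the weighted aggregate is pulled towards the larger one and equals the mean over all households.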

  14. Introduction Exercise 4.3.2

  15. Introduction Exercise 4.3.2

  16. Introduction Exercise 4.2.3

  17. Introduction Exercise 4.2.3
