

Tools for Civil Society to Understand and Use Development Data: Improving MDG Policymaking and Monitoring

Module 8: Living with Error


What you will learn from this module

  • What causes error in MDG indicators (MDGi’s)

  • The 3 types of error in MDGi’s, and how they differ


From where does error derive?

  • MDG indicators are derived from data

  • Data represent the population from which they were collected

  • Any shortfall in the data collection and handling system will, thus, cause error in the MDGi’s


Types of Error

We can identify three types of error in MDG indicators (and other summary statistics):

Computation error

Bias error

Sampling error


Computation Error

  • Errors made in calculating an MDG indicator or its components

  • Purely due to avoidable mistakes

  • Less likely when calculation is automated


Bias Error

Bias error is a systematic error that causes all measured values to deviate from the true value in a consistent direction, higher or lower

  • Arises when the characteristics of the population from which the sampling frame is drawn differ from the characteristics of the target population

  • Almost always a big issue when administrative data are used in deriving the MDGi in developing countries

  • Also often an issue when survey data are used


Bias Error (2)

[Figure: sample means (x) plotted along a measurement scale against the population value (X), in three rows: 1. Bias (male); 2. Bias (female); 3. No bias]


Sampling Error

  • May be thought of as “the difference between a sample and the population from which it was derived”

  • Always present when sample survey data are used to derive the MDGi

  • Not an issue with administrative data (unless these are only collected from a sample)

  • Not an issue with a census


Sampling Error (2)

[Figure: the sample mean (male) and the population value marked on a measurement scale; the distance between them is the sampling error]


Cumulative effect of bias and sampling error

[Figure: the sample mean and the population value marked on a measurement scale; the distance between them is made up of bias error plus sampling error]


SAMPLING ERROR


Dozenland: An Example of Sampling Error

Dozenland is the world’s smallest country

  • It has only 12 households, each of which consists of a single person


The Problem

Estimate the average income (in Dozenland dollars) per person

How shall we do this?

Using a census (true value)

Using a household sample of size 4

Using all possible household samples of any size


Census Data


Sample of 4

  • Dozenland government has insufficient funds to carry out a census, so instead it decides to sample four of the twelve households

  • At random, it samples the households headed by WJK, MM, DC, DJ

  • Thus sample results are 4200, 4700, 4500, 7000 Dozenland dollars (D$)

  • Sample average is: (4200+4700+4500+7000)/4 = 5100 D$


Real Error

Since we know the true answers from the hypothetical census, we can see the exact error in our sample-based estimate

The error in the estimate of the mean is

5100 - 5466.7 = -366.7 Dozenland dollars (D$)

i.e. we have underestimated average income by about 7%
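
The arithmetic on this slide can be checked with a short sketch. It uses only the four sampled incomes and the census mean quoted in the slides; the variable names are illustrative.

```python
# Incomes (D$) of the four sampled households, as given in the slides
sample = [4200, 4700, 4500, 7000]

# True mean income from the hypothetical census, as quoted in the slides
census_mean = 5466.7

sample_mean = sum(sample) / len(sample)
error = sample_mean - census_mean        # negative means an underestimate
relative_error = error / census_mean     # error as a share of the true mean

print(f"sample mean    = {sample_mean:.1f} D$")
print(f"error          = {error:.1f} D$")
print(f"relative error = {relative_error:.1%}")
```

A negative error confirms that this particular sample underestimates average income.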


Interpretation

  • This is NOT bias error, since the sample was random

  • It is purely a result of the sample being different from the population


Can We Do Better?

1. Use a different sample size (the simplest option is a larger sample, which makes the sample more similar to the population from which it is drawn)

2. Rely on statistical theory, which tells us how to estimate the sampling error


Summary results from taking all possible samples

ALL possible samples of size n (ranging from 1 to 12) from the 12 households

 n      S     Mean   Variation
 1     12   5466.7      1327.5
 2     66   5466.7       895.0
 3    220   5466.7       693.3
 4    495   5466.7       566.0
 5    792   5466.7       473.6
 6    924   5466.7       400.3
 7    792   5466.7       338.3
 8    495   5466.7       283.0
 9    220   5466.7       231.1
10     66   5466.7       179.0
11     12   5466.7       120.7
12      1   5466.7

n = sample size; S = number of samples of size n
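
The S and Variation columns can be reproduced without listing every sample. The sketch below assumes, as an interpretation not stated in the slides, that "Variation" is the standard deviation of the sample means and that it follows the textbook formula for simple random sampling without replacement, sigma * sqrt((N - n) / (n * (N - 1))), with sigma taken from the n = 1 row.

```python
from math import comb, sqrt

N = 12           # households in Dozenland
SIGMA = 1327.5   # spread of individual incomes, read from the n = 1 row

def sd_of_sample_means(n, N=N, sigma=SIGMA):
    """SD of the sample mean over all samples of size n drawn without replacement."""
    return sigma * sqrt((N - n) / (n * (N - 1)))

# One row per sample size: (n, number of possible samples, SD of sample means)
rows = [(n, comb(N, n), sd_of_sample_means(n)) for n in range(1, N + 1)]

for n, s, sd in rows:
    print(f"{n:>2} {s:>4} {sd:8.1f}")
```

Under this assumption the formula matches the table to one decimal place, and gives zero for n = 12, where the only possible sample is the whole population.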


What can we conclude?

  • For any sample size, the mean of the sample means taken over all possible samples is always the same, and is equal to the true population mean

  • The variation from sample-to-sample decreases as the sample size (n) gets bigger

  • That is, there is less uncertainty in the estimate as the sample size increases


Here’s a Big Problem

  • In real life we will only take ONE sample

  • Thus we cannot see how values vary from sample-to-sample for any given sample size, n

  • That is, we cannot measure the mean, or the variation, over all samples


Here’s a Solution

  • We can estimate the sample-to-sample variation (“standard error”) from the single sample

  • This helps us to understand how our sample mean may differ from the true population mean

    Let us consider the sample of four households

    The values in the sample are: 4200, 4700, 4500, and 7000. This yields:

    • Mean = 5100

    • Standard Error = 524

    • 95% confidence interval = 5100 ± 1666 = [3434 to 6766]
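
The slides do not show how the standard error and interval were obtained. One calculation that reproduces them is sketched below, assuming a finite population correction for sampling 4 of 12 households and a Student t multiplier with 3 degrees of freedom (3.182 at the 95% level); both choices are inferred from the numbers, not stated in the source.

```python
from math import sqrt
from statistics import mean, stdev

sample = [4200, 4700, 4500, 7000]   # incomes (D$) from the slides
N = 12                              # population size (households in Dozenland)
n = len(sample)
T_CRIT = 3.182                      # t value for 95% confidence, n - 1 = 3 df (assumed)

m = mean(sample)
s = stdev(sample)                            # sample SD (divisor n - 1)
se = s / sqrt(n) * sqrt(1 - n / N)           # standard error with finite population correction
half_width = T_CRIT * se

print(f"mean           = {m:.0f}")
print(f"standard error = {se:.0f}")
print(f"95% CI         = {m - half_width:.0f} to {m + half_width:.0f}")
```

The half-width from this sketch differs from the slide's 1666 by about one dollar, which is consistent with rounding of the standard error or the t value.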


Common Sampling Schemes

  • Simple random sampling

  • Stratified sampling – sample independently within important groups (“strata”) of the population

    • Generally decreases sampling error at minimal extra cost

  • Cluster or multi-stage sampling – sample (or sub-sample within) entire groups (“clusters”) of the population

    • Generally increases sampling error, but saves money and time
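
As a rough illustration of stratified sampling, the sketch below draws the same fraction of units independently within each stratum. The population, the stratum labels and the function name `stratified_sample` are all hypothetical and do not come from the slides.

```python
import random

# Hypothetical population of 100 households: 60 urban, 40 rural
population = [(i, "urban") for i in range(60)] + [(i, "rural") for i in range(60, 100)]

def stratified_sample(units, stratum_of, fraction, rng):
    """Draw a simple random sample of the given fraction within each stratum."""
    strata = {}
    for unit in units:
        strata.setdefault(stratum_of(unit), []).append(unit)
    chosen = []
    for members in strata.values():
        k = max(1, round(fraction * len(members)))   # at least one unit per stratum
        chosen.extend(rng.sample(members, k))
    return chosen

rng = random.Random(0)   # fixed seed so the draw is repeatable
chosen = stratified_sample(population, lambda u: u[1], 0.10, rng)
print(len(chosen), "households sampled")
```

Because each stratum is sampled separately, every group is guaranteed to be represented in proportion to its size, which is one reason stratification tends to reduce sampling error.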


Statistical Theory to Practice

  • Statistics textbooks tell us how to deal with

    • complex survey designs

    • proportions, ratios and other summaries of data

    • confidence intervals (CIs) at any confidence level

  • Although the theory differs, the principles, practice and interpretation follow exactly as for the simple case we have considered


BIAS ERROR


Missing the Target Population

In many cases, bias arises because we obtain data from a population that is not the one we really should be using (the target population)

Example: vital registration

Target population: all deaths

Population used: urban areas


Does Bias Error Matter?

Whether or not bias error occurs depends upon the difference between

  • the characteristics of persons included in the population used for data collection, and

  • the characteristics of the persons not included

Example: are infant deaths more common in rural than in urban areas?


Common Sources of Bias

  • Deliberate selection

  • Errors in defining the population

  • Non-response and human fallacy

Note: there is some overlap between these groupings


Deliberate Selection

This is where some members of the target population have a greater chance of selection into the sample than do others

Example: household surveys of income

  • An enumerator may not bother to visit isolated households, which are hard to access

  • Such households are more likely to be self-dependent, with low income

  • Result is upward bias in average income
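
The direction of this bias can be illustrated with a toy calculation. The incomes below are invented for illustration only; they are not Dozenland data or figures from the slides.

```python
# Hypothetical incomes: households the enumerator visits vs. those skipped
accessible = [5200, 4800, 6100, 5500, 7000, 4900]
isolated = [2100, 1800, 2500]   # hard to reach, lower income

everyone = accessible + isolated
true_mean = sum(everyone) / len(everyone)           # mean over the target population
observed_mean = sum(accessible) / len(accessible)   # mean over visited households only
bias = observed_mean - true_mean

print(f"true mean     = {true_mean:.0f}")
print(f"observed mean = {observed_mean:.0f}")
print(f"bias          = {bias:+.0f}")
```

However many accessible households are visited, the estimate stays too high, because the low-income isolated households never enter the sample.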


Errors in Defining the Population

This is where the population has been incorrectly specified

  • We get data for a population either from administrative systems or sample surveys

  • Incomplete administrative records (rating lists, taxpayers' lists, land registers, company registers, the voting register or street maps) or weak sampling frames from which the sample is drawn can cause bias

  • In sample surveys the error may arise because the sampling frame being used is inadequate

    Classic example: using the telephone to question potential respondents, which leaves out households without a telephone


Missing Groups

Sampling frames or administrative systems might be inadequate in that clusters of the population are missing and therefore could not be sampled.

Examples:

  • Sampling frame: a list of households omits people in institutions such as orphanages

  • Administrative systems: a business register may omit most or all rural businesses


Omission and Superfluous Units

On the other hand, the frame might cover all broad sectors but have some units omitted, or contain some “foreign elements”. For example:

  • Survey: A list of households used as a sampling frame may omit persons who have recently moved to the area, or still include persons who have moved away

  • Administrative systems: A business register might omit new businesses started in the last year because they have not yet been listed, or might include businesses that have recently closed


Duplicated Units

Some units in the population might appear twice or more.

Example:

Administrative data: A business that moves to a new location may be included in the register at both locations


Advantages or Disadvantages to Listing

The quality of administrative records can depend in part on the incentives for registration

  • If subsidies are offered to registrants, then there may be an incentive to register fraudulently

  • If registrants are taxed, then they may attempt to avoid registration.

Example: Casley and Lury (1981) give an example of a Caribbean finance department that offered fertilizer subsidies for every registered piece of land on an island

They later found that they were paying subsidies for an area greater than the entire island!


Non-Response and Human Fallacy: Non-Response

Non-response may be classified into three types:

  • Those unable to respond

  • Absentees

  • Refusals


Non-Response and Human Fallacy: Human Fallacy

  • Influenced responses occur when respondents are encouraged to answer in a certain way

Example 1: farmers might inflate their land holdings, by always rounding figures upwards, because they believe that the survey results will be used to allocate state aid, or...

Example 2: the farmers might deflate them, by rounding down, in the hope of minimizing taxation


Leading Questions and Prestige Error

Sometimes response bias is caused through leading questions such as, 'Do you agree that meat eating is barbaric?'

Most people like to please and/or will take the easy option of agreeing in the hope of avoiding further questions!

Many people do not want to appear uninformed.

On occasion, the very appearance of the enumerator can cause bias


TOTAL ERROR


Total Error

We have seen that sampling error will decrease as the sample size increases

Unfortunately, the reverse is generally true of bias error: it tends to increase as the sample size increases


Root Mean Square Error

[Figure: diagram relating bias, sampling error and root mean square error]

The total error, sampling and bias combined, is measured by the root mean square error (RMSE)

This is defined as

$$\mathrm{RMSE} = \sqrt{\text{bias}^2 + (\text{sampling error})^2}$$
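
A minimal sketch of how the two components combine, assuming the usual definition in which RMSE is the square root of bias squared plus sampling error squared. The sampling error of 524 is the standard error from the Dozenland sample; the bias of 300 is a made-up figure for illustration.

```python
from math import sqrt

def rmse(bias, sampling_error):
    """Combine bias and sampling error into a single total-error figure."""
    return sqrt(bias ** 2 + sampling_error ** 2)

sampling_error = 524   # standard error from the sample of four households
bias = 300             # hypothetical bias, not taken from the slides

total = rmse(bias, sampling_error)
print(f"RMSE = {total:.1f} D$")

# With the sampling error removed entirely, the bias still remains
print(f"RMSE with no sampling error = {rmse(bias, 0):.1f} D$")
```

The second line of output shows why a larger sample alone cannot remove total error: the bias term is unaffected by sample size.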


How Should We Treat Error?

  • Quantify it, if we can

    • generally only possible for sampling error

  • Acknowledge it, when this does not cause confusion or lead to lack of trust

  • Record it through use of metadata

  • Treat small differences in MDGi’s with scepticism

    • differences may be due to error


How Can We Minimize Error?

  • Use a larger sample size

  • Use a better sample design (e.g. stratified)

  • Be more careful in survey administration (e.g. minimize non-response)

  • Increase coverage of administrative data

  • Use statistical models to average over time periods/countries etc. (e.g. FAO method for hunger indicators in MDG1)


Summary

There are 3 types of error that may have affected an MDG indicator:

  • Computation error may be avoided by careful arithmetic or appropriate use of software

  • Sampling error is unavoidable whenever sample survey data are used

  • Bias error is often present, not always obvious, but can sometimes be minimised by taking care in the data collection process


Practical 8

  • List three ways by which bias error may arise

  • List two methods which can be used to reduce sampling error

