
### Inequality: Empirical Issues

Inequality and Poverty Measurement

Technical University of Lisbon

Frank Cowell

http://darp.lse.ac.uk/lisbon2006

July 2006

### Motivation

- Interested in sensitivity to extreme values for a number of reasons:
  - welfare properties of the income distribution
  - robustness in estimation
  - intrinsic interest in the very rich, the very poor.

### Sensitivity?

- How to define a “sensitive” inequality measure?
- Ad hoc discussion of individual measures:
  - empirical performance on actual data (Braulke 1983)
  - not satisfactory for characterising general properties.
- Welfare-theoretical approaches:
  - focus on transfer sensitivity (Shorrocks-Foster 1987)
  - but do not provide a guide to the way measures may respond to extreme values.
- Need a general and empirically applicable tool.

### The Influence Function

- Mixture distribution: F_δ = (1 − δ)F + δH(z), where H(z) puts point mass 1 at income z and 0 ≤ δ ≤ 1.

- Influence function: IF(z; I, F) = lim_{δ→0} [I(F_δ) − I(F)] / δ.

- For the class of inequality measures I(F) = ψ(ν_F, μ_F), where ν_F = ∫ φ(y) dF(y) and μ_F is the mean,

- which yields: IF(z; I, F) = ψ_ν [φ(z) − ν_F] + ψ_μ [z − μ_F], with ψ_ν, ψ_μ the partial derivatives of ψ.
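The IF definition can be checked numerically by approximating the limit with a small contamination weight. A minimal sketch for the Theil index; the lognormal test sample and δ = 1e-6 are illustrative choices, not from the slides:

```python
import math
import random

def theil_weighted(y, w):
    """Theil index of a weighted sample (weights sum to 1)."""
    mu = sum(wi * yi for wi, yi in zip(w, y))
    return sum(wi * (yi / mu) * math.log(yi / mu) for wi, yi in zip(w, y))

def influence_function(z, y, delta=1e-6):
    """Numerical IF(z; I, F): contaminate the empirical distribution with
    point mass delta at z and take the difference quotient."""
    n = len(y)
    base = theil_weighted(y, [1.0 / n] * n)
    mixed = theil_weighted(y + [z], [(1.0 - delta) / n] * n + [delta])
    return (mixed - base) / delta

random.seed(0)
sample = [random.lognormvariate(0.0, 0.7) for _ in range(5000)]
# For the Theil index the IF grows roughly like z*log(z): far-out high
# incomes dominate the estimate.
print(influence_function(10.0, sample), influence_function(100.0, sample))
```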

### Some Standard Measures

- GE: I^GE_α = [1/(α² − α)] [∫ (y/μ)^α dF(y) − 1], α ≠ 0, 1
- Theil (α = 1): I^T = ∫ (y/μ) log(y/μ) dF(y)
- MLD (α = 0): I^MLD = −∫ log(y/μ) dF(y)
- Atkinson: A_ε = 1 − [∫ (y/μ)^{1−ε} dF(y)]^{1/(1−ε)}, ε > 0
- Log var: I^LV = ∫ [log(y/μ)]² dF(y)
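A plug-in implementation of these five measures, assuming a sample of strictly positive incomes (the sample values are illustrative):

```python
import math

def ge(y, alpha):
    """Generalised Entropy index for alpha not in {0, 1}."""
    n, mu = len(y), sum(y) / len(y)
    return (sum((yi / mu) ** alpha for yi in y) / n - 1.0) / (alpha * alpha - alpha)

def theil(y):  # GE limit as alpha -> 1
    n, mu = len(y), sum(y) / len(y)
    return sum((yi / mu) * math.log(yi / mu) for yi in y) / n

def mld(y):    # GE limit as alpha -> 0
    n, mu = len(y), sum(y) / len(y)
    return -sum(math.log(yi / mu) for yi in y) / n

def atkinson(y, eps):
    n, mu = len(y), sum(y) / len(y)
    if eps == 1:
        return 1.0 - math.exp(sum(math.log(yi / mu) for yi in y) / n)
    return 1.0 - (sum((yi / mu) ** (1 - eps) for yi in y) / n) ** (1.0 / (1 - eps))

def log_var(y):
    n, mu = len(y), sum(y) / len(y)
    return sum(math.log(yi / mu) ** 2 for yi in y) / n

y = [10.0, 20.0, 30.0, 40.0, 100.0]
print(theil(y), mld(y), ge(y, 2), atkinson(y, 0.5), log_var(y))
```

Note that all five return zero for a perfectly equal sample, and GE with α close to 1 approximates the Theil index.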

### …and their IFs

Writing μ for the mean, ν_α = ∫ y^α dF, ν₀ = ∫ log y dF and ν₂ = ∫ (log y)² dF:

- GE: IF(z) = [ (z^α − ν_α)/μ^α − α ν_α (z − μ)/μ^{α+1} ] / (α² − α)
- Theil: IF(z) = (z log z − ν₁)/μ − (ν₁ + μ)(z − μ)/μ², where ν₁ = ∫ y log y dF
- MLD: IF(z) = (z − μ)/μ − (log z − ν₀)
- Atkinson: IF(z) = (1 − A_ε) [ (z − μ)/μ − (z^{1−ε} − ν_{1−ε}) / ((1 − ε) ν_{1−ε}) ]
- Log var: IF(z) = (log z)² − ν₂ − 2 log μ (log z − ν₀) + 2 (log μ − ν₀)(z − μ)/μ

### Implications

- Generalised Entropy measures with α > 1 are very sensitive to high incomes in the data.
- GE measures with α < 0 are very sensitive to low incomes.
- We can’t compare the speed of increase of the IF for different values of 0 < α < 1.
- If we don’t know the income distribution, we can’t compare the IFs of different classes of measures.
- So, let’s take a standard model…

### Using S-M to get the IFs

The Singh-Maddala (S-M) distribution, F(y) = 1 − [1 + (y/a)^b]^{−c}, is a good model of the income distribution of German households.

- Use it to get true values of the inequality measures,
- obtained from the moments: E[Y^r] = a^r Γ(1 + r/b) Γ(c − r/b) / Γ(c), for −b < r < bc.

- Take parameter values a = 100, b = 2.8, c = 1.7.

- Normalise the IFs:
- use the relative influence function IF(z; I, F) / I(F).
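Inverting the S-M CDF gives the quantile function y = a[(1 − u)^{−1/c} − 1]^{1/b}, which makes simulation straightforward. A sketch that draws a large sample and checks the simulated mean against the moment formula (the sample size and seed are arbitrary choices):

```python
import math
import random

A, B, C = 100.0, 2.8, 1.7   # Singh-Maddala parameters used on the slides

def sm_inverse_cdf(u):
    """Quantile function of the Singh-Maddala distribution."""
    return A * ((1.0 - u) ** (-1.0 / C) - 1.0) ** (1.0 / B)

def sm_moment(r):
    """E[Y^r] = a^r Gamma(1 + r/b) Gamma(c - r/b) / Gamma(c), -b < r < b*c."""
    return A ** r * math.gamma(1 + r / B) * math.gamma(C - r / B) / math.gamma(C)

random.seed(1)
sample = [sm_inverse_cdf(random.random()) for _ in range(200000)]
# True mean from the moment formula vs. the simulated mean:
print(sm_moment(1), sum(sample) / len(sample))
```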

### IF using S-M: conclusions

- As z increases, the IF increases faster for high values of α.
- As z tends to 0, the IF increases faster for small values of α.
- The IF of the Gini index increases more slowly than the others, but is larger for moderate values of z.
- Comparison of the Gini index with GE or Log Variance does not lead to clear conclusions.

### A simulation approach

- Use a simulation study to evaluate the impact of contamination in extreme observations.
- Simulate 100 samples of 200 observations from the S-M distribution.
- Contaminate just one randomly chosen observation by multiplying it by 10 (contamination in high values),
- or contaminate just one randomly chosen observation by dividing it by 10 (contamination in low values).
- Compute the quantity RC(I) = [Î(contaminated sample) − Î(original sample)] / Î(original sample), the relative change in the estimated index.
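The contamination experiment can be sketched as follows. For reproducibility this version contaminates the median observation rather than a random one, and takes RC(I) to be the relative change in the index (an assumption about the exact definition used on the slides):

```python
import random

def gini(y):
    """Gini coefficient via the sorted-sample (rank) formula."""
    ys = sorted(y)
    n = len(ys)
    cum = sum((i + 1) * yi for i, yi in enumerate(ys))
    return 2.0 * cum / (n * sum(ys)) - (n + 1.0) / n

def ge2(y):
    """GE index with alpha = 2: half the squared coefficient of variation."""
    n, mu = len(y), sum(y) / len(y)
    return (sum((yi / mu) ** 2 for yi in y) / n - 1.0) / 2.0

def rc(index, y, i, factor):
    """Relative change RC(I) when observation i is multiplied by `factor`."""
    contaminated = y[:i] + [y[i] * factor] + y[i + 1:]
    return (index(contaminated) - index(y)) / index(y)

random.seed(2)
a, b, c = 100.0, 2.8, 1.7
y = [a * ((1.0 - random.random()) ** (-1.0 / c) - 1.0) ** (1.0 / b)
     for _ in range(200)]
mid = sorted(range(len(y)), key=lambda k: y[k])[len(y) // 2]
# GE(2) reacts far more strongly than the Gini to one observation scaled by 10:
print(rc(gini, y, mid, 10.0), rc(ge2, y, mid, 10.0))
```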

[Figures: the contaminated and the empirical distribution functions]

### Contamination in high values

[Figure: RC(I) for 100 different samples, sorted so that Gini realisations are increasing]

- Gini is less affected by contamination than GE.
- The impact on Log Var and GE (0 ≤ α ≤ 1) is relatively small compared to GE (α < 0) or GE (α > 1).
- GE (0 ≤ α ≤ 1) is less sensitive if α is smaller.
- Log Var is slightly more sensitive than Gini.

### Contamination in low values

[Figure: RC(I) for 100 different samples, sorted so that Gini realisations are increasing]

- Gini is less affected by contamination than GE.
- The impact on Log Var and GE (0 ≤ α ≤ 1) is relatively small compared to GE (α < 0) or GE (α > 1).
- GE (0 ≤ α ≤ 1) is less sensitive if α is larger.
- Log Var is more sensitive than Gini.

### Influential Observations

- Drop the i-th observation from the sample.
- Call the resulting inequality estimate Î(i).
- Compare the full-sample estimate Î with Î(i),
- using the statistic Î − Î(i), the change in the estimate when observation i is removed.

- Take a sorted sample of 5,000 observations.
- Examine the 10 observations at the bottom, middle and top.
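A sketch of the leave-one-out calculation for GE (α = 2) on a sorted S-M sample of 5,000 observations, as on the slide (the seed is arbitrary):

```python
import random

def ge2(y):
    """GE index with alpha = 2."""
    n, mu = len(y), sum(y) / len(y)
    return (sum((yi / mu) ** 2 for yi in y) / n - 1.0) / 2.0

def leave_one_out(index, y, i):
    """Change in the estimate when observation i is dropped."""
    return index(y) - index(y[:i] + y[i + 1:])

random.seed(3)
a, b, c = 100.0, 2.8, 1.7
y = sorted(a * ((1.0 - random.random()) ** (-1.0 / c) - 1.0) ** (1.0 / b)
           for _ in range(5000))
# Influence of the smallest, a middle, and the largest observation:
print(leave_one_out(ge2, y, 0),
      leave_one_out(ge2, y, 2500),
      leave_one_out(ge2, y, 4999))
```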

### Influential observations: summary

- Observations in the middle of the sorted sample barely affect the estimates, compared with the smallest or highest observations.
- The highest values are more influential than the smallest values.
- The highest value is very influential for GE (α = 2):
  - its estimate changes by nearly 0.018 if we remove it.
- GE (α = −1) is strongly influenced by the smallest observation.

### Extreme values

- An extreme value is not necessarily an error or some sort of contamination:
  - it could be an observation belonging to the true distribution,
  - and it could convey important information.
- An observation is extreme in the sense that its influence on the inequality-measure estimate is important.
- Call this a high-leverage observation.

### High-leverage observations

- The term leaves open the question of whether such observations “belong” to the distribution.
- But they can have important consequences for the statistical performance of the measure.
- We can use this performance to characterise the properties of inequality measures under certain conditions.
- Focus on the Error in Rejection Probability (ERP) as a criterion.

### Davidson-Flachaire (1)

- Even in very large samples, the ERP of an asymptotic or bootstrap test based on the Theil index can be significant.
- Such tests are therefore not reliable.
- Three main possible causes:
  - nonlinearity
  - noise
  - the nature of the tails.

### Davidson-Flachaire (2)

- Three main possible causes:
  - Indices are nonlinear functions of sample moments, which induces bias and non-normality in the estimates.
  - Estimates of the covariances of the sample moments used to construct the indices are often noisy.
  - Indices are often sensitive to the exact nature of the tails: a bootstrap sample with nothing resampled from the tail can have properties quite different from those of the population.
- Simulation experiments show that cause 3 is often quantitatively the most important.
- Statistical performance should be better with MLD and GE (0 < α < 1) than with Theil.

### Empirical methods

### Empirical Distribution

- The empirical distribution: F̂(y) = (1/n) Σ_{i=1}^n ι(y_i ≤ y), where ι(·) is the indicator function, equal to 1 when its argument holds and 0 otherwise.

- Empirical moments: μ̂_α = (1/n) Σ_{i=1}^n y_i^α.

- Inequality estimate: Î = I(F̂), obtained by replacing the population moments with their empirical counterparts; e.g. for GE, Î_α = [μ̂_α/μ̂_1^α − 1] / (α² − α).

### Bootstrap

- To construct a bootstrap test, resample from the original data.
- Bootstrap inference should be superior.
- For bootstrap sample j, j = 1, …, B, a bootstrap statistic W*_j is computed almost as W from the original data,
- but I₀ in the numerator is replaced by the index Î estimated from the original data.
- Then the bootstrap P-value is P̂*(W) = (1/B) Σ_{j=1}^B ι(W*_j > W).
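A simplified sketch of a bootstrap test for the Theil index. The studentisation here reuses one set of bootstrap replications for both the standard error and the recentred statistics, which is cruder than a careful implementation would be; B = 199 and the lognormal sample are illustrative choices:

```python
import math
import random

def theil(y):
    n, mu = len(y), sum(y) / len(y)
    return sum((yi / mu) * math.log(yi / mu) for yi in y) / n

def bootstrap_pvalue(y, i0, b=199, seed=4):
    """Bootstrap P-value for H0: I = i0, using a t-type statistic.
    The recentred bootstrap statistics use the sample estimate i_hat in
    place of i0, as described above."""
    rng = random.Random(seed)
    i_hat = theil(y)
    boots = [theil([rng.choice(y) for _ in y]) for _ in range(b)]
    mean_boot = sum(boots) / b
    se = (sum((t - mean_boot) ** 2 for t in boots) / (b - 1)) ** 0.5
    w = abs(i_hat - i0) / se
    w_star = [abs(t - i_hat) / se for t in boots]
    return sum(1 for ws in w_star if ws > w) / b

random.seed(5)
y = [random.lognormvariate(0.0, 0.7) for _ in range(500)]
print(bootstrap_pvalue(y, i0=theil(y)))   # testing at the point estimate
```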

### Error in Rejection Probability: A

- ERPs of asymptotic tests at the nominal level 0.05.
- The ERP is the difference between the actual and nominal probabilities of rejection.
- Example:
  - N = 2,000 observations;
  - the ERP of GE (α = 2) is 0.11;
  - the asymptotic test over-rejects the null hypothesis:
  - the actual level is 16% when the nominal level is 5%.

### Error in Rejection Probability: B

- ERPs of bootstrap tests:
- distortions are reduced for all measures,
- but the ERP of GE (α = 2) is still very large, even in large samples.
- The ERPs of GE (α = 0.5) and GE (α = −1) are small only for large samples.
- GE (α = 0), the MLD, performs better than the others: its ERP is small for 500 or more observations.

### More on ERP for GE: what would happen in very large samples?

| α   | N = 50,000 | N = 100,000 |
|-----|------------|-------------|
| 2   | 0.0492     | 0.0415      |
| 1   | 0.0096     | 0.0096      |
| 0.5 | 0.0054     | 0.0052      |
| 0   | 0.0024     | 0.0043      |
| –1  | 0.0113     | 0.0125      |

### ERP: conclusions

- The rate of convergence to zero of the ERP of asymptotic tests is very slow.
- The same applies to the bootstrap.
- Tests based on GE measures can be unreliable, even in large samples.

### Sensitivity: a broader perspective

- The results so far are for a specific Singh-Maddala distribution.
- It is realistic, but – obviously – special.
- Consider alternative parameter values,
  - with particular focus on behaviour in the upper tail.
- Consider alternative distributions:
  - use other familiar and “realistic” functional forms,
  - focusing on the lognormal and the Pareto.

### Alternative distributions

- First consider comparative contamination performance for alternative distributions with the same inequality index.
- Use the same diagrammatic tool as before:
  - the x-axis shows the 100 different samples, sorted so that inequality realisations are increasing;
  - the y-axis shows RC(I) for the MLD index.

### Comparing Distributions

- Bootstrap tests usually improve numerical performance.
- MLD is more sensitive to contamination in high incomes when the underlying distribution’s upper tail is heavy.
- The ERP of an asymptotic or bootstrap test based on the MLD or Theil index is larger when the underlying distribution’s upper tail is heavy.

### Why the Gini…?

- Why use the Gini coefficient?
  - It has obvious intuitive appeal.
  - It is sometimes suggested that the Gini is less prone to the influence of outliers,
  - and it is less sensitive to contamination in high incomes than the GE indices.
- But there is little to choose between…
  - the Gini coefficient and MLD,
  - the Gini and the logarithmic variance.

### The Bootstrap…?

- Does the bootstrap “get you out of trouble”?
  - The bootstrap performs better than asymptotic methods,
  - but does it perform well enough?
- In terms of the ERP, the bootstrap does well only for the Gini, MLD and logarithmic variance.
- If we use a distribution with a heavy upper tail, the bootstrap performs poorly in the case of α = 0,
  - even in large samples.
