
1. Are Women Paid Less than Men?


yehudi

Presentation Transcript


  1. 1. Are Women Paid Less than Men? Intro and revision of basic statistics

  2. Learning Objectives • Review of basic statistics • Some basic Stata commands • Nature of randomness • Distributions • The use and abuse of the Normal Distribution • Basic hypothesis testing • The role of prediction & causation

  3. Wages and Gender • Some Stata commands • We can use the data in wages.dta to examine this issue • .dta is the Stata data file format • Start Stata via NAL (see the Stata manual on the blog) • Review the dataset using basic Stata commands • Editor • summ • describe • sort

  4. Wages and Gender • The basic answer to the question is to compare average wages • Stata: • summ wage if gender==1 • summ wage if gender==2 • So we can say wages differ in this sample • Can we say this difference would hold generally? • Note the implicit “out of sample” prediction in this question
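The two summ commands can also be collapsed into one table. This is only a sketch: it assumes wages.dta sits in the current working directory and that it contains the variables wage and gender (coded 1 and 2), as the commands on the slide suggest.

```stata
* Sketch only: assumes wages.dta is in the current directory and
* holds the variables wage and gender (coded 1 and 2).
use wages.dta, clear

* Mean, standard deviation and number of observations of wage by gender
tabstat wage, by(gender) statistics(mean sd n)
```

Having the standard deviation and sample size next to each mean is useful later, because whether a gap is "large" depends on both.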

  5. Statistical Inference • Key point: there would be no problem if we observed the wages of all men and women • Sampling is the source of the problem • The result could be dumb luck • The choice of sample could be dumb luck • Isolated example? • What is the chance that a different sample would lead to a radically different result?
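The "dumb luck" point can be illustrated with made-up data. In the sketch below both groups are drawn from the same hypothetical wage distribution (mean 10, standard deviation 3, values chosen purely for illustration), so any gap between the two sample averages is due to sampling alone. The names group and wage_sim are invented for this example.

```stata
* Sketch with simulated data: both groups come from the SAME
* hypothetical distribution, so any difference in means is luck.
clear
set seed 12345
set obs 200

* First 100 observations are group 1, the remaining 100 are group 2
generate group = 1 + (_n > 100)

* Simulated wages: normal with mean 10 and standard deviation 3
generate wage_sim = rnormal(10, 3)

tabstat wage_sim, by(group) statistics(mean n)
```

Re-running with a different seed gives a different gap each time, which is the sense in which one sample's result may not generalise.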

  6. Statistical Inference • Key point: statistical inference is the process of deciding when we can use a sample result to make statements about the world • Also referred to as hypothesis testing • Using this data, can we reject the hypothesis that both groups are paid the same? • Rough answer: we can if the difference between the average wages is large (how large?) • To answer precisely we need to review the nature of randomness, i.e. the role of “dumb luck”

  7. Random Variables • Outcome of an “experiment” whose value is unknown until it is observed • Dice, coin, roulette wheel • Getting a job, the wage • In our example the observed differences between wages may be random and have nothing to do with gender, i.e. dumb luck • Continuous vs Discrete Random Variables • Discrete: finite number of possible outcomes • Dice, coin, lottery • Useful for intuition • Continuous: outcome can be any value over a range • Most real world data is continuous

  8. Distribution of Random Variables • Discrete random variable • Establish the frequency of outcomes in repeated experiments • Continuous random variable • Outcome can be any value over a range • Can establish the probability of the r.v. being within a particular interval, • not of taking a particular value (zero by definition) • e.g. household income, interest rates, price of bread

  9. Probability Density Function f(x) • Mathematical representation of the distribution of a random variable • For a discrete rv: f(x) = Pr(X=x) • e.g. the probability that a die shows 3: f(3) = Pr(X=3) = 1/6

  10. Σ f(x) = 1: the probabilities sum to 1, and 0 ≤ f(x) ≤ 1 • For a continuous rv • It can take on any value within an interval: an infinite number of values • NB: the probability of any one exact value occurring = 0 • Pr(a ≤ X ≤ b) = ∫ f(x) dx, integrated from a to b • Probability is no longer measured by height; it is measured by area

  11. An example of a continuous RV • An example of the bell curve

  12. Empirical vs Theoretical Distribution • The examples so far are theoretical distributions • Real world data won't necessarily match these exactly or even approximately • We can show the empirical distribution of data on a histogram • File dice.dta contains 10 dice, each rolled 3449 times • hist dice1
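If dice.dta is not to hand, a similar empirical distribution can be simulated. The sketch below is an assumption-laden stand-in, not the course file: it creates one fair die (the variable name dice_sim is made up) rolled 3449 times and plots how often each face appears.

```stata
* Sketch: simulate one fair die rolled 3449 times
* (dice_sim is a made-up name, not a variable from dice.dta)
clear
set seed 2024
set obs 3449

* ceil(6*u) with u uniform on (0,1) gives the integers 1 to 6
generate dice_sim = ceil(6*runiform())

* Frequency table and histogram of the simulated rolls
tabulate dice_sim
histogram dice_sim, discrete fraction
```

Each face has theoretical probability 1/6, but the bars will not be exactly equal, which is the gap between the theoretical and the empirical distribution.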

  13. Empirical Dist of Dice

  14. Example using wage data • A histogram will show the distribution • Stata: hist wages, bin(50) • Can also do it separately for the two genders • hist wages if gender==1, bin(50) norm • hist wages if gender==2, bin(50) norm • This shows that the distribution of wages looks a little different for the two groups • Not just the average • Note that the bell curve is a bad approximation

  15. Characteristics of a Distribution • We can characterise the difference between distributions in many ways • The two main ones are the expected value and the variance • The expected value is the average • Weighted by probability

  16. Sample and Theoretical Mean • The sample mean and the theoretical mean will generally differ • Dice: E(X) = 1(1/6) + 2(1/6) + 3(1/6) + 4(1/6) + 5(1/6) + 6(1/6) = 3.5 • For dice1: summ dice1 • Continuous: • We already did this for wage by gender using summ

  17. Rules of Expectations • E(X+Y) = E(X)+E(Y) • E(X-Y) = E(X)-E(Y) • E(aX) = aE(X) • E(X+a) = E(X)+a • E(aX+bY+cZ) = aE(X)+bE(Y)+cE(Z) • See dice.dta for examples of this
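These rules can be checked numerically. The sketch below uses simulated dice rather than dice.dta; the variable names x, y and w and the weights 2 and 3 are arbitrary choices for illustration.

```stata
* Sketch: check E(aX+bY) = aE(X) + bE(Y) on two simulated fair dice
clear
set seed 42
set obs 10000
generate x = ceil(6*runiform())
generate y = ceil(6*runiform())

* Linear combination with a = 2 and b = 3
generate w = 2*x + 3*y

* The sample mean of w equals 2*mean(x) + 3*mean(y)
summarize x y w

* Theoretical value: 2*3.5 + 3*3.5
display 2*3.5 + 3*3.5
```

The sample mean of w should be close to, but not exactly, the theoretical 17.5, while it matches 2 times the sample mean of x plus 3 times the sample mean of y up to rounding.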

  18. Variance of a Random Variable • A distribution has an average but it also varies around that average • We need a concept to measure that dispersion • In Stata: “summ” reports the standard deviation, and “summ, detail” also reports the variance

  19. Dice Example

  20. Dice Example cont. • Note that the theoretical variance may differ from the variance in the sample • Stata: summ dice1
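As a sketch of how the theoretical variance of a single fair die can be worked out, the calculation below uses the standard identity Var(X) = E(X²) − [E(X)]². It is offered as one possible worked example, with each face assumed to have probability 1/6.

```latex
\begin{align*}
E(X)   &= \tfrac{1}{6}(1+2+3+4+5+6) = 3.5 \\
E(X^2) &= \tfrac{1}{6}(1+4+9+16+25+36) = \tfrac{91}{6} \\
\operatorname{Var}(X) &= E(X^2) - [E(X)]^2
        = \tfrac{91}{6} - \tfrac{49}{4}
        = \tfrac{35}{12} \approx 2.92
\end{align*}
```

The corresponding standard deviation is about 1.71, which is the number to compare with the Std. Dev. column that summ reports for a sample of rolls.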

  21. Rules of Variance • Var(aX) = a²Var(X) • Var(X+a) = Var(X) • Var(a+bX) = b²Var(X) • Var(aX+bY) = a²Var(X) + b²Var(Y) if X and Y are independent (we deal with dependence later) • Standard deviation = square root of the variance: σ = √σ²
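A quick numerical check of Var(a+bX) = b²Var(X), again as a sketch on simulated data. The names x and z and the constants 5 and 2 are arbitrary.

```stata
* Sketch: check Var(a+bX) = b^2 Var(X) with a = 5, b = 2
clear
set seed 7
set obs 10000
generate x = ceil(6*runiform())
generate z = 5 + 2*x

* summarize stores the sample variance in r(Var)
summarize x
display "Var(x)      = " r(Var)

summarize z
display "Var(5 + 2x) = " r(Var)
```

The second variance should be four times the first: adding 5 shifts the distribution without spreading it, and doubling x multiplies the variance by 2² = 4.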

  22. Normal Distribution • A special continuous distribution that can be very useful • AKA “Gaussian Distribution”, “Bell Curve”, “Law of Errors” • The mean and variance completely define it • X ~ N(μ, σ)

  23. Bell Curve [figure: bell curve; axis values 1000 and 1200]

  24. Calculating Probabilities from the Bell Curve • The integral is solved for you by computer • In Stata the “normal” function gives the area under the standard normal curve to the left of its argument • Standard normal: mean = 0, variance = 1 • display normal(0.5) • For other normal distributions make use of a trick • If y ~ N(μ, σ) then z = (y−μ)/σ is N(0,1) • e.g. mean 1000, standard deviation 100 • Prob(x<1200) = Prob(z<2) • di normal(2)
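The standardisation trick can be typed directly into Stata. The sketch below reuses the slide's numbers (mean 1000, standard deviation 100); the lower bound of 900 in the second command is an extra value chosen here for illustration.

```stata
* Prob(x < 1200) when x is modelled as normal, mean 1000, sd 100
display normal((1200 - 1000)/100)

* Prob(900 < x < 1200): area to the left of 1200 minus area to the left of 900
display normal((1200 - 1000)/100) - normal((900 - 1000)/100)
```

The first probability is about 0.977 and the second about 0.819. Subtracting two left-tail areas is the general way to get the probability of an interval.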

  25. Properties of the Normal • Symmetric around the mean • Positive and negative deviations are equally likely • The probability of a deviation declines with the size of the deviation • approx. 68% of the area under the curve lies in the interval [μ−σ, μ+σ] • and approx. 95% is in the interval [μ−2σ, μ+2σ]
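The 68% and 95% figures can be confirmed with the same normal function, working in standard deviation units:

```stata
* Area within one standard deviation of the mean
display normal(1) - normal(-1)

* Area within two standard deviations of the mean
display normal(2) - normal(-2)
```

These come out at roughly 0.683 and 0.954, so "95%" for two standard deviations is a rounding of a slightly larger area.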

  26. Using the Bell Curve • We often assume that data has a normal distribution • Easy to use • Often intuitive • But remember nothing is actually normal • We choose to model things as normal • It is only a convenient approximation • Could be very wrong • Always ask yourself whether it is reasonable to treat the data as normal • Check the histogram

  27. Using the Bell Curve • Nassim Taleb made a career out of complaining that we assume data is normal when it is not • See Fooled by Randomness • Already seen that it is a bad approximation to the wage data • Bad approximation to stock market data, as its tails are grossly undersized • It gives too low a probability to large changes • See sandp.dta
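To see how thin the normal tails are, the sketch below computes the probability the normal model assigns to a move of more than 5 standard deviations in either direction. The threshold of 5 is an arbitrary choice for illustration, not a figure taken from sandp.dta.

```stata
* Probability of a deviation larger than 5 standard deviations,
* in either direction, if the data really were normal
display 2*normal(-5)
```

The answer is roughly 6 in 10 million, so a dataset in which such moves turn up more than very rarely is hard to reconcile with the bell curve.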

  28. Empirical dist of % Stock Returns

  29. Answer the question! • We now have some tools which we can use to answer the question of gender bias in wages • Recall that women are paid less on average in the sample • Recall that the issue is whether we can use this fact about the sample to make statements about the world (“population”)
