
BUSH 632: Getting Beyond Fear and Loathing of Statistics




  1. BUSH 632: Getting Beyond Fear and Loathing of Statistics Lecture 1 Spring, 2007

  2. Don’t Panic • Motivation: this course is about the connection between theoretical claims and empirical data • What we’ll cover (after a very brief review): • Part 1: bivariate regression • Part 2: multivariate regression • Part 3: logit analysis and factor analysis

  3. The place of statistical analysis • Programs, policies, legislation typically consist of sets of normative claims and a (sketchy?) theory about how to achieve objectives • Policies typically attempt to map a set of beliefs and empirical claims into society, the economy, international relations. (E.g., welfare reform) • Policy analysts need to be able to identify the values served, distill the theory, and evaluate its empirical claims.

  4. The place of statistical analysis • Ingredients of strong empirical research • Theory → claims for policy (and counter-claims) • Hypotheses → measurement → analysis • Findings → back to theory… • Implications for policy • Characterizing data • Data quality: Valid? Reliable? Relevant? • Appropriate model design and execution • Are statistical models appropriate to test hypotheses? • Are models appropriately specified? • Do data conform to statistical assumptions?

  5. How to survive this class • Use the webpage • http://www.tamu.edu/classes/bush/hjsmith/courses/bush632.html • Lectures and book: as close as possible • Readings: Read ‘em or weep. • Questions: Bring ‘em to class, office hours • Stata: Use it a lot • In-class examples and exercises • Download exercises and data in advance • The place of exercises in Bush 632 • Nothing late; don’t miss class…

  6. Class Exams • Three Take-Home Exams • Characteristics and Grading Criteria • Connection to theory • Clear hypotheses • Appropriate statistical analyses • Clear and succinct explanations • Class Data Will Be Provided • From the text • www.aw-bc.com/stock_watson • From Us • On the Class Webpage

  7. A Brief Refresher on Functions and Sampling • Statistical models involve relationships • Relationships imply functions • E.g.: Coffee consumption and productivity • Functions are ubiquitous (or chaos prevails) • Most general expression: Y = f(X1, X2, …, Xn, e)

  8. Linear Functions

  9. Non-Linear Functions

  10. More Non-Linear Functions

  11. Functions in Policy • Welfare and work incentives • Employment = f(welfare programs, …) Pretty complex • Nuclear deterrence • Major power military conflict = f(nuclear capabilities, proliferation, …) • Educational Attainment • Test Scores = f(class size, institutional incentives, …) • Successful Program Implementation • Implementation = f(clarity, public support, complexity…)

  12. Sampling is also ubiquitous • “Knowing” a person: we sample • “Knowing” places: we sample • Samples are necessary to identify functions • Samples must cover relevant variables, contexts, etc. • Strategies for sampling • Soup and temperature: stir it • Stratify sample: observations in appropriate “cells” • Randomize
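
As a rough illustration of the "randomize" and "stratify" strategies above, here is a minimal Python sketch (the course itself works in Stata). The population, the stratum labels, the seed, and the helper `stratified_sample` are all made up for this example.

```python
import random

rng = random.Random(632)  # arbitrary fixed seed so the sketch is repeatable

# Hypothetical population: 80 urban and 20 rural respondents.
population = [("urban", i) for i in range(80)] + [("rural", i) for i in range(20)]

# Randomize: a simple random sample gives every unit the same chance of selection.
simple = rng.sample(population, 10)

def stratified_sample(units, per_stratum, rng):
    """Draw the same number of units from each stratum ("cell")."""
    strata = {}
    for unit in units:
        strata.setdefault(unit[0], []).append(unit)
    return [u for members in strata.values() for u in rng.sample(members, per_stratum)]

# Stratify: guarantee observations in every cell, even the small one.
stratified = stratified_sample(population, 5, rng)

rural_in_simple = sum(1 for stratum, _ in simple if stratum == "rural")
rural_in_stratified = sum(1 for stratum, _ in stratified if stratum == "rural")
print(rural_in_simple, rural_in_stratified)
```

The simple random sample may contain few or no rural respondents by chance, while the stratified draw always contains exactly five from each group.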

  13. Statistics Refresher: Topics • Characteristics of sampling distributions • Class Data • 2005 National Security Survey (phone and web) • Stata application • Means, Variance, Standard Deviations • The Normal Distribution • Medians and IQRs • Box Plots and Symmetry Plots • Central tendency • Expected value and means • Dispersion • Population variance, sample variance, standard deviations • Measures of relations • Covariation • covariance matrices • Correlations • Sampling distributions

  14. Measures of Central Tendency • In general: E[Y] = µY • For discrete functions: E[Y] = Σi yi·P(Y = yi) • For continuous functions: E[Y] = ∫ y·f(y) dy • An unbiased estimator of the expected value is the sample mean: Ȳ = (1/n) Σi Yi

  15. Rules for Expected Value • E[a] = a -- the expected value of a constant is always a constant • E[bX] = bE[X] • E[X+W] = E[X] + E[W] • E[a + bX] = E[a] + E[bX] = a + bE[X]
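
The expected-value rules above can be checked on a small discrete example. This is a sketch in Python rather than the course's Stata; the fair die and the constants `a` and `b` are assumptions chosen only for illustration.

```python
from fractions import Fraction

# Hypothetical discrete distribution: a fair six-sided die.
outcomes = [1, 2, 3, 4, 5, 6]
probs = [Fraction(1, 6)] * 6

def expected_value(values, probabilities):
    """E[X] = sum of x * P(X = x) for a discrete random variable."""
    return sum(v * p for v, p in zip(values, probabilities))

a, b = 10, 2  # arbitrary constants

e_x = expected_value(outcomes, probs)
# Transform every outcome and keep the same probabilities.
e_a_plus_bx = expected_value([a + b * x for x in outcomes], probs)

print(e_x)           # 7/2
print(e_a_plus_bx)   # 17
print(a + b * e_x)   # 17, matching E[a + bX] = a + bE[X]
```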

  16. Measures of Dispersion • Var[X] = Cov[X,X] = E[(X − E[X])²] • Sample variance: SX² = Σi (Xi − X̄)² / (n − 1) • Standard deviation: σX = √Var[X] • Sample Std. Dev: SX = √(SX²)
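
A minimal Python sketch of these dispersion measures (the course uses Stata; the data below are invented). It computes the sample variance by hand with the n − 1 divisor and contrasts it with the population formula, which divides by n.

```python
import math
import statistics

# Hypothetical sample, made up for illustration.
sample = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]

n = len(sample)
mean = sum(sample) / n
squared_deviations = sum((x - mean) ** 2 for x in sample)

# Sample variance divides by n - 1 (one degree of freedom is used up by the mean).
sample_var = squared_deviations / (n - 1)
sample_sd = math.sqrt(sample_var)

# Population variance divides by n instead.
pop_var = squared_deviations / n

print(mean, sample_var, sample_sd, pop_var)
```

The hand computation agrees with `statistics.variance` and `statistics.pvariance` from the standard library.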

  17. Rules for Variance Manipulation • Var[a] = 0 • Var[bX] = b² Var[X] • From which we can deduce: Var[a + bX] = Var[a] + Var[bX] = b² Var[X] • Var[X + W] = Var[X] + Var[W] + 2Cov[X,W]
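
These variance rules also hold exactly for sample variances and covariances, which makes them easy to verify numerically. The sketch below is in Python, not the course's Stata; the data, the constants, and the helper `sample_cov` are illustrative assumptions.

```python
import math
import statistics

def sample_cov(xs, ys):
    """Sample covariance with the n - 1 divisor."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (len(xs) - 1)

# Hypothetical paired observations.
x = [1.0, 3.0, 4.0, 6.0, 8.0]
w = [2.0, 1.0, 5.0, 4.0, 9.0]
a, b = 7.0, 3.0  # arbitrary constants

var_x = statistics.variance(x)

# Var[a + bX] should equal b^2 * Var[X]: the shift a drops out.
var_shifted = statistics.variance([a + b * xi for xi in x])

# Var[X + W] should equal Var[X] + Var[W] + 2 Cov[X, W].
var_sum = statistics.variance([xi + wi for xi, wi in zip(x, w)])
rhs_sum = var_x + statistics.variance(w) + 2 * sample_cov(x, w)

print(math.isclose(var_shifted, b ** 2 * var_x))  # True
print(math.isclose(var_sum, rhs_sum))             # True
```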

  18. Measures of Association • Cov[X,Y] = E[(X − E[X])(Y − E[Y])] = E[XY] − E[X]E[Y] • Sample covariance: SXY = Σi (Xi − X̄)(Yi − Ȳ) / (n − 1) • Correlation: rXY = SXY / (SX·SY) • Correlation is restricted to the range −1 to +1
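
The two expressions for the covariance, and the bounded range of the correlation, can be illustrated with a short Python sketch (the course itself uses Stata). The data are invented, and treating the five pairs as an entire population for the first identity is an assumption made to keep the arithmetic exact.

```python
import math

# Hypothetical paired observations.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 1.0, 4.0, 3.0, 6.0]
n = len(x)

def mean(values):
    return sum(values) / len(values)

mx, my = mean(x), mean(y)

# Population-style covariance, computed two ways.
cov_definition = mean([(xi - mx) * (yi - my) for xi, yi in zip(x, y)])
cov_shortcut = mean([xi * yi for xi, yi in zip(x, y)]) - mx * my

# Sample covariance and standard deviations use the n - 1 divisor.
s_xy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / (n - 1)
s_x = math.sqrt(sum((xi - mx) ** 2 for xi in x) / (n - 1))
s_y = math.sqrt(sum((yi - my) ** 2 for yi in y) / (n - 1))

# Correlation rescales the covariance so that it lies between -1 and +1.
r = s_xy / (s_x * s_y)

print(cov_definition, cov_shortcut, r)
```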

  19. Rules of Covariance Manipulation • Cov[a,Y] = 0 (why?) • Cov[bX,Y] = bCov[X,Y] (why?) • Cov[X + W,Y] = Cov[X,Y] + Cov[W,Y]
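
One way to answer the "why?" prompts above is to expand the definition Cov[X,Y] = E[(X − E[X])(Y − E[Y])] and apply the expected-value rules. The derivation below is a sketch along those lines, not necessarily the argument given in class.

```latex
\begin{align*}
\operatorname{Cov}[a, Y] &= E\big[(a - E[a])(Y - E[Y])\big]
  = E\big[0 \cdot (Y - E[Y])\big] = 0, \\
\operatorname{Cov}[bX, Y] &= E\big[(bX - E[bX])(Y - E[Y])\big]
  = E\big[b\,(X - E[X])(Y - E[Y])\big] = b\,\operatorname{Cov}[X, Y], \\
\operatorname{Cov}[X + W, Y] &= E\big[\big((X - E[X]) + (W - E[W])\big)(Y - E[Y])\big]
  = \operatorname{Cov}[X, Y] + \operatorname{Cov}[W, Y].
\end{align*}
```

The first line uses E[a] = a, the second uses E[bX] = bE[X], and the third uses E[X + W] = E[X] + E[W].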

  20. Covariance Matrices • Correlation Matrices (Example):

      . correlate ahe yrseduc
      (obs=2950)

                   |      ahe  yrseduc
      -------------+------------------
               ahe |   1.0000
           yrseduc |   0.3610   1.0000

      Figure 5.3 Annual Hourly Earnings and Years of Education (Stock & Watson p. 165)

  21. Characterizing Data • Rolling in the data -- before modeling • A Cautionary Tale • Sample versus population statistics:

      Concept            | Sample Statistic | Population Parameter
      Mean               | Ȳ                | µY
      Variance           | SY²              | σY²
      Standard Deviation | SY               | σY

  22. Properties of Standard Normal (Gaussian) Distributions • Can be dramatically different from sample frequencies (especially in small samples) Stata • Tails go to plus/minus infinity • The density of the distribution is key: • +/- 1.96 standard deviations cover 95% of the distribution • +/- 2.58 standard deviations cover 99% of the distribution • Student’s t distribution converges on the Gaussian as the sample size grows
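
The 95% and 99% coverage figures can be reproduced from the standard normal cumulative distribution function. This is a minimal sketch using Python's standard library rather than Stata; the helper name `central_coverage` is made up for the example.

```python
from statistics import NormalDist

z = NormalDist()  # standard normal: mean 0, standard deviation 1

def central_coverage(k):
    """Probability mass within k standard deviations of the mean."""
    return z.cdf(k) - z.cdf(-k)

cover_196 = central_coverage(1.96)
cover_258 = central_coverage(2.58)

# Going the other way: the cutoff that leaves 2.5% in each tail.
z_95 = z.inv_cdf(0.975)

print(round(cover_196, 3), round(cover_258, 3), round(z_95, 2))
```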

  23. Standard Normal (Gaussian) Distributions [Figure: sample distributions for ni = 300, ni = 100, and ni = 20] • So what? • Only the mean and standard deviation are needed to characterize the data and test simple hypotheses • Large-sample characteristics: homing in on normal
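
To get a feel for how samples of different sizes compare with the normal curve, one can simulate draws of size 20, 100, and 300. The sketch below is in Python (not the course's Stata); the seed, the simulated data, and the helper `share_within` are assumptions for illustration only.

```python
import random
from statistics import fmean, stdev

rng = random.Random(2007)  # arbitrary fixed seed for repeatability

def share_within(sample, k=1.96):
    """Share of observations within k sample standard deviations of the sample mean."""
    m, s = fmean(sample), stdev(sample)
    return sum(1 for x in sample if abs(x - m) <= k * s) / len(sample)

results = {}
for n in (20, 100, 300):
    # Simulated draws from a standard normal population.
    sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
    results[n] = (fmean(sample), stdev(sample), share_within(sample))

for n, (m, s, share) in results.items():
    print(f"n={n:3d}  mean={m:6.3f}  sd={s:5.3f}  share within 1.96 sd={share:.3f}")
```

Small samples can stray noticeably from a mean of 0, a standard deviation of 1, and 95% coverage; larger samples tend to sit closer to those values.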

  24. Order Statistics • Medians • Order statistic for central tendency • The value positioned at the middle or (n+1)/2 rank • Robustness compared to mean • Basis for “robust estimators” • Quartiles • Q1: 0-25%; Q2: 25-50%; Q3: 50-75%; Q4: 75-100% • Percentiles • List of hundredths (say that fast 20 times)
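
The robustness of the median relative to the mean shows up as soon as one extreme value enters the data. A small Python sketch (the course uses Stata; the numbers are invented):

```python
import statistics

# Hypothetical incomes, in thousands.
incomes = [12, 14, 15, 16, 18]
with_outlier = incomes + [250]

# With n = 5 observations the median sits at rank (n + 1) / 2 = 3.
median_rank = (len(incomes) + 1) / 2

mean_before = statistics.mean(incomes)
median_before = statistics.median(incomes)
mean_after = statistics.mean(with_outlier)
median_after = statistics.median(with_outlier)

print(mean_before, median_before)   # 15 15
print(mean_after, median_after)     # the mean jumps above 54; the median moves to 15.5
```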

  25. Distributional Shapes • Positive Skew • Negative Skew • Approximate Symmetry [Figures: three example distributions, each with its median MdY marked]

  26. Using the Interquartile Range (IQR) • IQR = Q3 - Q1 • Spans the middle 50% of the data • A measure of dispersion (or spread) • Robustness of IQR (relative to variance) • If Y is normally distributed, then: • SY ≈ IQR/1.35 • So: if MdY ≈ Ȳ and SY ≈ IQR/1.35, then • Y is approximately normally distributed
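
The 1.35 factor comes from the quartiles of the normal distribution, and the two conditions above can be turned into a rough screening function. This Python sketch is not from the course (which uses Stata); the function `looks_normal`, its 10% tolerance, and both data sets are assumptions for illustration.

```python
import math
from statistics import NormalDist, fmean, median, quantiles, stdev

# IQR of a standard normal: distance between its 25th and 75th percentiles.
std_normal = NormalDist()
theoretical_iqr = std_normal.inv_cdf(0.75) - std_normal.inv_cdf(0.25)
print(round(theoretical_iqr, 3))  # about 1.349, hence SY ≈ IQR / 1.35

def looks_normal(data, tol=0.1):
    """Rough screen: median close to mean, and sd close to IQR / 1.35.

    The tolerance (a fraction of the sd) is an arbitrary illustrative choice.
    """
    q1, _, q3 = quantiles(data, n=4)
    s = stdev(data)
    center_ok = abs(median(data) - fmean(data)) <= tol * s
    spread_ok = abs(s - (q3 - q1) / 1.35) <= tol * s
    return center_ok and spread_ok

# Hypothetical data: evenly spaced normal quantiles, then a right-skewed transform.
n = 200
ideal = NormalDist(mu=50, sigma=10)
bell_shaped = [ideal.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
skewed = [math.exp(x / 10) for x in bell_shaped]

print(looks_normal(bell_shaped), looks_normal(skewed))
```

A screen like this is only a quick check; the box plots and quantile-normal plots covered later give a fuller picture.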

  27. Example: The Observed Distribution of Annual Household Income (Distribution of income by gender: men=1, women=2)

  28. Interpreting Box Plots Median Income = 15.38 (men), 14.34 (women)

  29. Quantile Normal Plots • Allow comparison between an empirical distribution and the Gaussian distribution • Plots percentiles against expected normal • Most intuitive: • Normal QQ plots • Evaluate
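
The points behind a quantile-normal plot can be computed without any graphics: sort the data and pair each value with the quantile a normal distribution would put at that position. The sketch below is in Python rather than Stata; the function `qq_points`, the (i − 0.5)/n plotting position, and the sample are assumptions for illustration.

```python
from statistics import NormalDist, fmean, stdev

def qq_points(data):
    """Return (expected normal quantile, observed value) pairs for a Q-normal plot."""
    ordered = sorted(data)
    n = len(ordered)
    # Reference normal with the same mean and standard deviation as the data.
    reference = NormalDist(fmean(ordered), stdev(ordered))
    # (i - 0.5) / n is one common plotting position; other conventions exist.
    return [(reference.inv_cdf((i - 0.5) / n), y)
            for i, y in enumerate(ordered, start=1)]

# Hypothetical right-skewed sample with one large value.
sample = [1, 2, 2, 3, 3, 3, 4, 4, 5, 20]
points = qq_points(sample)

for expected, observed in points:
    print(f"expected {expected:7.2f}   observed {observed:5.1f}")
```

If the data were normal, the pairs would fall near the 45-degree line. Here both the smallest and largest observations lie above their expected values while the middle lies below, the curved pattern typical of positive skew.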

  30. Data Exploration in Stata • Access the Guns dataset from the replication data on the Stock and Watson webpage • Using Incarceration Rate: univariate analysis Stata • Using Incarceration Rate: split by Shall Issue Laws Stata • Exercises: • Graphing: Produce • Histograms • Box plots • Q-Normal plots

  31. For Next Week • Read Stock and Watson • Chapter 4 • Homework Assignment on Webpage
