Discrete and Categorical Data

William N. Evans

Department of Economics/MPRC

University of Maryland


Part I

Introduction


Introduction

  • The workhorse statistical model in the social sciences is the multivariate regression model

  • Ordinary least squares (OLS)

    • $y_i = \beta_0 + x_{1i}\beta_1 + x_{2i}\beta_2 + \dots + x_{ki}\beta_k + \varepsilon_i$

    • $y_i = x_i\beta + \varepsilon_i$


Linear model: $y_i = \alpha + x_i\beta + \varepsilon_i$

  • α and β are “population” values: they represent the true relationship between x and y

  • Unfortunately, these values are unknown

  • The job of the researcher is to estimate these values

  • Notice that if we differentiate y with respect to x, we obtain

  • $dy/dx = \beta$



Putting some concreteness on the problem

  • State of Maryland budget problems

    • Drop in revenues

    • Expensive K-12 school spending initiatives

  • Short-term solution: raise the tax on cigarettes by 34 cents/pack

  • Problem: a tax hike will reduce consumption of the taxed product

  • Question for the state: as taxes are raised, how much will cigarette consumption fall?


  • Simple model: $y_i = \alpha + x_i\beta + \varepsilon_i$

  • Suppose y is a state’s per capita consumption of cigarettes

  • x represents taxes on cigarettes

  • Question: how much will y fall if x is increased by 34 cents/pack?

  • Problem: there are many reasons why people smoke; cost is but one of them


  • Data

    • (Y) State per capita cigarette consumption for the years 1980-1997

    • (X) Tax (state + federal) in real cents per pack

    • “Scatter plot” of the data

    • Negative covariance between the variables

      • When $x > \bar{x}$, it is more likely that $y < \bar{y}$

      • When $x < \bar{x}$, it is more likely that $y > \bar{y}$

  • Goal: pick values of α and β that “best fit” the data

    • We define “best fit” in a moment


Notation

  • True model

    • $y_i = \alpha + x_i\beta + \varepsilon_i$

    • We observe data points $(y_i, x_i)$

    • The parameters α and β are unknown

    • The actual error ($\varepsilon_i$) is unknown

  • Estimated model

    • (a, b) are estimates of the parameters (α, β)

    • $e_i$ is an estimate of $\varepsilon_i$, where

    • $e_i = y_i - a - b x_i$

  • How do you estimate a and b?


  • Objective: minimize the sum of squared errors

    • $\min_{a,b} \sum_i e_i^2 = \sum_i (y_i - a - b x_i)^2$

    • Minimize the sum of squared errors (SSE)

    • Treat (+) and (-) errors equally

      • Over- or under-predicting by “5” is the same magnitude of error

      • “Quadratic form”

      • The optimal values of a and b are those that set the first derivatives of the SSE (with respect to a and b) equal to zero

      • Functions reach their minimum or maximum values where their derivatives are zero
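For the one-regressor model, the two first-order conditions can be solved in closed form: b is the sum of cross-deviations of x and y divided by the sum of squared deviations of x, and a = ȳ − b·x̄. Because the total sum of squares (SST) does not depend on a or b, the same (a, b) that minimize SSE also maximize R² = 1 − SSE/SST. A minimal Python sketch (Python rather than STATA so it runs without the course data); the function name `ols_simple` and the toy numbers are hypothetical.

```python
def ols_simple(x, y):
    """Return (a, b, r2) for the bivariate model y_i = a + b*x_i + e_i."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Setting d(SSE)/da = 0 and d(SSE)/db = 0 and solving gives:
    #   b = sum((x - x_bar)(y - y_bar)) / sum((x - x_bar)^2),  a = y_bar - b*x_bar
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b = sxy / sxx
    a = y_bar - b * x_bar
    # Residuals e_i = y_i - a - b*x_i, and R^2 = 1 - SSE/SST
    sse = sum((yi - a - b * xi) ** 2 for xi, yi in zip(x, y))
    sst = sum((yi - y_bar) ** 2 for yi in y)
    return a, b, 1.0 - sse / sst


if __name__ == "__main__":
    # Hypothetical toy data, not the cigarette data described above
    x = [1, 2, 3, 4, 5]
    y = [2.1, 3.9, 6.2, 7.8, 10.1]
    a, b, r2 = ols_simple(x, y)
    print(f"a = {a:.3f}, b = {b:.3f}, R^2 = {r2:.4f}")
```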


    • The model has a lot of nice features

      • Statistical properties are easy to establish

      • Optimal estimates are easy to obtain

      • Parameter estimates are easy to interpret

      • The model maximizes predictive fit

        • If you minimize SSE, you maximize R²

    • The model does well as a first-order approximation to lots of problems


    Discrete and Qualitative Data

    • The OLS model works well when y is a continuous variable

      • Income, wages, test scores, weight, GDP

    • It does not have as many nice properties when y is not continuous

    • Example: doctor visits

      • Integer values

      • Low counts for most people

      • Mass of observations at zero


    Downsides of forcing non-standard outcomes into the OLS world

    • Can predict outside the allowable range

      • e.g., negative MD visits

    • Does not describe the data generating process well

      • e.g., mass of observations at zero

    • Violates many properties of OLS

      • e.g., heteroskedasticity


    This talk

    • Look at situations when the data generating process does not lend itself well to OLS models

    • Mathematically describe the data generating process

    • Show how we use different optimization procedures to obtain estimates

    • Describe the statistical properties



    Types of data generating processes we will consider

    • Dichotomous events (yes or no)

      • 1=yes, 0=no

      • Graduate from high school? Work? Obese? Smoke?

    • Ordinal data

      • Self-reported health (poor, fair, good, excellent)

      • Strongly disagree, disagree, agree, strongly agree


    • Count data

      • Doctor visits, lost workdays, fatality counts

    • Duration data

      • Time to failure, time to death, time to re-employment


    Recommended Textbooks

    • Jeffrey Wooldridge, “Econometric Analysis of Cross Section and Panel Data”

      • Lots of insight and mathematical/statistical detail

      • Very good examples

    • William Greene, “Econometric Analysis”

      • More topics

      • Somewhat dated examples


    Course web page

    • www.bsos.umd.edu/econ/evans/jpsm.html

    • Contains

      • These notes

      • All STATA programs and data sets

      • A couple of “Introduction to STATA” handouts

      • Links to some useful web sites


    STATA Resources: Discrete Outcomes

    • “Regression Models for Categorical Dependent Variables Using STATA”

      • J. Scott Long and Jeremy Freese

    • Available for purchase from the STATA website for $52 (www.stata.com)

    • Post-estimation subroutines that translate results

      • Do not need to buy the book to use the subroutines


  • Will give you a list of available programs to download

  • One is

    Spostado from http://www.indiana.edu/~jslsoc/stata

  • Click on the link and install the files


    Part II

    A brief introduction to STATA


    STATA

    • Very fast, convenient, well-documented, cheap and flexible statistical package

    • Excellent for cross-section/panel data projects; not as strong for time series, but getting better

    • Manipulating large data sets from flat files is not as easy as in SAS

    • I usually clean data in SAS and estimate models in STATA


    • Key characteristic of STATA

      • All data must be loaded into RAM

      • Computations are very fast

      • But the size of the project is limited by available memory

    • Results can be generated two different ways

      • Command line

      • Write a program (a *.do file), then submit it from the command line


    Sample program to get you started

    • cps87_or.do

    • The program gets you to the point where you can

      • Load data into memory

      • Construct new variables

      • Get simple statistics

      • Run a basic regression

      • Store the results on disk


    Data (cps87_do.dta)

    • Random sample of data from the 1987 Current Population Survey outgoing rotation group

    • Sample selection

      • Males

      • Aged 21-64

      • Working 30+ hours/week

    • 19,906 observations


    Major caveat

    • The hardest thing to learn/do: getting data from some other source into a STATA data set

    • We skip over that part

    • All the data sets are already stored as STATA data files that can be loaded by typing:

      use filename


    Housekeeping at the top of the program

      * this line defines the semicolon as the ;
      * end of line delimiter;
      #delimit ;
      * set memory for 10 meg;
      set memory 10m;
      * write results to a log file;
      * the replace option writes over old;
      * log files;
      log using cps87_or.log, replace;
      * open stata data set;
      use c:\bill\stata\cps87_or;
      * list variables and labels in data set;
      desc;


      ------------------------------------------------------------------------------
                    storage  display     value
      variable name   type   format      label      variable label
      ------------------------------------------------------------------------------
      age             float  %9.0g                  age in years
      race            float  %9.0g                  1=white, non-hisp, 2=place,
                                                    n.h, 3=hisp
      educ            float  %9.0g                  years of education
      unionm          float  %9.0g                  1=union member, 2=otherwise
      smsa            float  %9.0g                  1=live in 19 largest smsa,
                                                    2=other smsa, 3=non smsa
      region          float  %9.0g                  1=east, 2=midwest, 3=south,
                                                    4=west
      earnwke         float  %9.0g                  usual weekly earnings
      ------------------------------------------------------------------------------


    Constructing new variables

    • Use the ‘gen’ command to generate new variables

    • Syntax

      • gen newvarname = expression

    • Easily construct new variables via

      • Algebraic operations

      • Math/trig functions (ln, exp, etc.)

      • Logical operators (=1 when true, =0 when false)


    From program

      * generate new variables;
      * lines 1-2 illustrate basic math functions;
      * lines 3-4 illustrate logical operators;
      * line 5 illustrates the OR statement;
      * line 6 illustrates the AND statement;
      * after you construct new variables, compress the data again;
      gen age2=age*age;
      gen earnwkl=ln(earnwke);
      gen union=unionm==1;
      gen topcode=earnwke==999;
      gen nonwhite=((race==2)|(race==3));
      gen big_ne=((region==1)&(smsa==1));
      compress;


    Getting basic statistics

    • desc: describes variables in the data set

    • sum: gets summary statistics

    • tab: produces frequencies (tables) of discrete variables


    From program

      * get descriptive statistics;
      sum;
      * get detailed descriptives for continuous variables;
      sum earnwke, detail;
      * get frequencies of discrete variables;
      tabulate unionm;
      tabulate race;
      * get two-way table of frequencies;
      tabulate region smsa, row column cell;


    Results from sum

          Variable |       Obs        Mean    Std. Dev.       Min        Max
      -------------+--------------------------------------------------------
               age |     19906    37.96619    11.15348         21         64
              race |     19906    1.199136     .525493          1          3
              educ |     19906    13.16126    2.795234          0         18
            unionm |     19906    1.769065    .4214418          1          2
              smsa |     19906    1.908369    .7955814          1          3
      -------------+--------------------------------------------------------


    Detailed summary

                        usual weekly earnings
      -------------------------------------------------------------
            Percentiles      Smallest
       1%           128            60
       5%           178            60
      10%           210            60       Obs               19906
      25%           300            63       Sum of Wgt.       19906

      50%           449                     Mean            488.264
                              Largest       Std. Dev.      236.4713
      75%           615           999
      90%           865           999       Variance        55918.7
      95%           999           999       Skewness        .668646
      99%           999           999       Kurtosis       2.632356


    Results for tab

          1=union |
          member, |
      2=otherwise |      Freq.     Percent        Cum.
      ------------+-----------------------------------
                1 |      4,597       23.09       23.09
                2 |     15,309       76.91      100.00
      ------------+-----------------------------------
            Total |     19,906      100.00


    Two-way table

      Each cell shows the frequency, row percentage, column percentage, and cell percentage.

         1=east, |
      2=midwest, |  1=live in 19 largest smsa,
        3=south, |  2=other smsa, 3=non smsa
          4=west |         1         2         3 |     Total
      -----------+-------------------------------+----------
               1 |     2,806     1,349       842 |     4,997
                 |     56.15     27.00     16.85 |    100.00
                 |     38.46     18.89     15.39 |     25.10
                 |     14.10      6.78      4.23 |     25.10
      -----------+-------------------------------+----------
               2 |     1,501     1,742     1,592 |     4,835
                 |     31.04     36.03     32.93 |    100.00
                 |     20.58     24.40     29.10 |     24.29
                 |      7.54      8.75      8.00 |     24.29
      -----------+-------------------------------+----------
               3 |     1,501     2,542     1,904 |     5,947
                 |     25.24     42.74     32.02 |    100.00
                 |     20.58     35.60     34.80 |     29.88
                 |      7.54     12.77      9.56 |     29.88
      -----------+-------------------------------+----------
               4 |     1,487     1,507     1,133 |     4,127
                 |     36.03     36.52     27.45 |    100.00
                 |     20.38     21.11     20.71 |     20.73
                 |      7.47      7.57      5.69 |     20.73
      -----------+-------------------------------+----------
           Total |     7,295     7,140     5,471 |    19,906
                 |     36.65     35.87     27.48 |    100.00
                 |    100.00    100.00    100.00 |    100.00
                 |     36.65     35.87     27.48 |    100.00


    Running a regression

    • Syntax

      reg dependent-variable independent-variables

    • Example from program

      *run simple regression;

      reg earnwkl age age2 educ nonwhite union;


            Source |       SS       df       MS              Number of obs =   19906
      -------------+------------------------------           F(  5, 19900) = 1775.70
             Model |  1616.39963     5  323.279927           Prob > F      =  0.0000
          Residual |  3622.93905 19900  .182057239           R-squared     =  0.3085
      -------------+------------------------------           Adj R-squared =  0.3083
             Total |  5239.33869 19905  .263217216           Root MSE      =  .42668

      ------------------------------------------------------------------------------
           earnwkl |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
      -------------+----------------------------------------------------------------
               age |   .0679808   .0020033    33.93   0.000     .0640542    .0719075
              age2 |  -.0006778   .0000245   -27.69   0.000    -.0007258   -.0006299
              educ |    .069219   .0011256    61.50   0.000     .0670127    .0714252
          nonwhite |  -.1716133   .0089118   -19.26   0.000    -.1890812   -.1541453
             union |   .1301547   .0072923    17.85   0.000     .1158613    .1444481
             _cons |   3.630805   .0394126    92.12   0.000     3.553553    3.708057
      ------------------------------------------------------------------------------


    Analysis of variance

    • R² = .3085

      • The variables explain 31% of the variation in log weekly earnings

    • F(5, 19900)

      • Tests the hypothesis that all covariates (except the constant) are jointly zero


    Interpret results

    • $Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i$

    • $dY/dX = \beta_1$

    • But in this case $Y = \ln(W)$, where W is weekly wages

    • $d\ln(W)/dX = (dW/W)/dX = \beta_1$

      • The proportional change in wages given a one-unit change in x (multiply by 100 for an approximate percentage change)
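Applying this to the earnings regression above: 100·b is the approximate percentage effect, which works well when b is small. For a discrete one-unit change (such as union membership), the exact change implied by the log model is 100·[exp(b) − 1]. Because age enters as both age and age², its effect is b_age + 2·b_age2·age, which varies with age. A minimal Python sketch with coefficients copied from the output; the function names are illustrative.

```python
import math

# Coefficients copied from the log weekly earnings regression above
B_AGE = 0.0679808
B_AGE2 = -0.0006778
B_EDUC = 0.069219
B_UNION = 0.1301547


def approx_pct(b):
    """Approximate % change in W for a one-unit change in x: 100 * b."""
    return 100 * b


def exact_pct(b):
    """Exact % change in W for a discrete one-unit change in x: 100 * (exp(b) - 1)."""
    return 100 * (math.exp(b) - 1)


def age_effect(age):
    """d ln(W) / d age when age enters as a quadratic: b_age + 2 * b_age2 * age."""
    return B_AGE + 2 * B_AGE2 * age


def peak_age():
    """Age at which predicted log earnings peak (where age_effect(age) = 0)."""
    return -B_AGE / (2 * B_AGE2)


if __name__ == "__main__":
    print(f"education: {approx_pct(B_EDUC):.1f}% approx, {exact_pct(B_EDUC):.1f}% exact")
    print(f"union:     {approx_pct(B_UNION):.1f}% approx, {exact_pct(B_UNION):.1f}% exact")
    print(f"effect of one more year of age at 38: {100 * age_effect(38):.2f}%")
    print(f"predicted earnings peak at age {peak_age():.1f}")
```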



    Part III

    Some notes about probability distributions


    Continuous Distributions

    • Random variables with an infinite number of possible values

    • Examples: units of measure (time, weight, distance)

    • Many discrete outcomes can be treated as continuous, e.g., SAT scores


    How to describe a continuous random variable

    • The Probability Density Function (PDF)

    • The PDF for a random variable x is defined as f(x), where

      $f(x) \ge 0$

      $\int f(x)\,dx = 1$

    • Calculus review: the integral of a function gives the “area under the curve”


    Cumulative Distribution Function (CDF)

    • Suppose x is a “measure” like distance or time

    • $0 \le x \le \infty$

    • We may be interested in $\Pr(x \le a)$


    CDF

    What if we consider all values?


    Properties of CDF

    • Note that $\Pr(x \le b) + \Pr(x > b) = 1$

    • $\Pr(x > b) = 1 - \Pr(x \le b)$

    • Many times, it is easier to work with complements


    General notation for continuous distributions

    • The PDF is denoted by a lower-case letter, such as f(x)

    • The CDF is denoted by an upper-case letter, such as F(a)


    Standard Normal Distribution

    • The most frequently used continuous distribution

    • Symmetric “bell-shaped” distribution

    • As we will show, the normal has useful properties

    • Many variables we observe in the real world look normally distributed

    • Any normal variable can be translated into a ‘standard normal’: if x has mean μ and standard deviation σ, then $z = (x - \mu)/\sigma$ is standard normal


    Examples of variables that look normally distributed

    • IQ scores

    • SAT scores

    • Heights of females

    • Log income

    • Average gestation (weeks of pregnancy)

    • As we will show in a few weeks, sample means are (approximately) normally distributed in large samples!


    Standard Normal Distribution

    • PDF: $\phi(z) = \frac{1}{\sqrt{2\pi}} e^{-z^2/2}$

    • For $-\infty < z < \infty$


    Notation

    • $\phi(z)$ is the standard normal PDF evaluated at z

    • $\Phi(a) = \Pr(z \le a)$


    Standard Normal

    • Notice that:

      • The normal is symmetric: $\phi(a) = \phi(-a)$

      • The normal is “unimodal”

      • Median = mean

      • Area under the curve = 1

      • Almost all of the area is between -3 and 3

    • Evaluations of the CDF are done with

      • Statistical functions (Excel, SAS, etc.)

      • Tables


    Standard Normal CDF

    • $\Pr(z \le -0.98) = \Phi(-0.98) = 0.1635$


    • $\Pr(z \le 1.41) = \Phi(1.41) = 0.9207$



    • $\Pr(0.1 \le z \le 1.9)$

      $= \Pr(z \le 1.9) - \Pr(z \le 0.1)$

      $= \Phi(1.9) - \Phi(0.1) = 0.9713 - 0.5398$

      $= 0.4315$
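As an alternative to tables, the standard normal CDF can be evaluated from the error function, Φ(a) = [1 + erf(a/√2)]/2; Python's standard-library `math.erf` is one such statistical function. This sketch reproduces the three worked values above and checks the symmetry property Pr(z > -A) = Φ(A). The function name is illustrative.

```python
import math


def std_normal_cdf(a):
    """Phi(a) = Pr(z <= a), computed from the error function."""
    return 0.5 * (1.0 + math.erf(a / math.sqrt(2.0)))


if __name__ == "__main__":
    print(round(std_normal_cdf(-0.98), 4))                       # Pr(z <= -0.98)
    print(round(std_normal_cdf(1.41), 4))                        # Pr(z <= 1.41)
    print(round(std_normal_cdf(1.9) - std_normal_cdf(0.1), 4))   # Pr(0.1 <= z <= 1.9)
```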


    Important Properties of Normal Distribution

    • $\Pr(z \le A) = \Phi(A)$

    • $\Pr(z > A) = 1 - \Phi(A)$

    • $\Pr(z \le -A) = \Phi(-A)$

    • $\Pr(z > -A) = 1 - \Phi(-A) = \Phi(A)$


    Section IV

    Maximum likelihood estimation


    Maximum likelihood estimation

    • Observe n independent outcomes, all drawn from the same distribution

    • $(y_1, y_2, y_3, \dots, y_n)$

    • $y_i$ is drawn from $f(y_i; \theta)$, where θ is an unknown parameter of the PDF f

    • Recall the definition of independence: if a and b are independent, Pr(a and b) = Pr(a)Pr(b)



    • MLE: pick a value for θ that best represents the chance these n values of y would have been generated randomly

    • The probability that the particular n values of y would be drawn at random is called the ‘likelihood function’; by independence it equals $L = f(y_1;\theta) f(y_2;\theta) \cdots f(y_n;\theta) = \prod_i f(y_i;\theta)$

    • To maximize L, maximize a monotonic function of L, such as ln(L)

    • Recall ln(abcd) = ln(a) + ln(b) + ln(c) + ln(d)


    • Maximize $\ln L = \ln f(y_1;\theta) + \ln f(y_2;\theta) + \dots + \ln f(y_n;\theta) = \sum_i \ln f(y_i;\theta)$

    • Pick θ so that ln L is maximized

    • $d\ln L/d\theta = 0$


    [Figure: the likelihood L plotted against θ, with candidate values θ1 and θ2 marked]


    Example: Poisson

    • Suppose y measures ‘counts’ such as doctor visits

    • $y_i$ is drawn from a Poisson distribution

    • $f(y_i;\lambda) = e^{-\lambda}\lambda^{y_i}/y_i!$ for λ > 0

    • $E[y_i] = \mathrm{Var}[y_i] = \lambda$


    • Given n observations $(y_1, y_2, y_3, \dots, y_n)$

    • Pick the value of λ that maximizes ln L

    • $\max_\lambda \ln L = \sum_i \ln f(y_i;\lambda) = \sum_i \ln[e^{-\lambda}\lambda^{y_i}/y_i!]$

      $= \sum_i [-\lambda + y_i\ln(\lambda) - \ln(y_i!)]$

      $= -n\lambda + \ln(\lambda)\sum_i y_i - \sum_i \ln(y_i!)$


    • $\ln L = -n\lambda + \ln(\lambda)\sum_i y_i - \sum_i \ln(y_i!)$

    • $d\ln L/d\lambda = -n + (1/\lambda)\sum_i y_i = 0$

    • Solve for λ

    • $\hat\lambda = \sum_i y_i / n = \bar{y}$ = sample mean of y
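A quick numerical check of this result: evaluate ln L on a grid of λ values for a hypothetical sample of doctor-visit counts and confirm that the maximizer is the sample mean. This is a Python sketch; `math.lgamma(y + 1)` is used for ln(y!), and the data and function name are illustrative.

```python
import math


def poisson_loglik(lam, ys):
    """ln L(lambda) = -n*lambda + ln(lambda)*sum(y) - sum(ln(y!))."""
    n = len(ys)
    # math.lgamma(y + 1) equals ln(y!) for non-negative integers y
    log_fact = sum(math.lgamma(y + 1) for y in ys)
    return -n * lam + math.log(lam) * sum(ys) - log_fact


if __name__ == "__main__":
    visits = [0, 1, 2, 0, 3, 1, 0, 5]         # hypothetical counts, mean 1.5
    grid = [k / 100 for k in range(1, 501)]    # lambda = 0.01, 0.02, ..., 5.00
    best = max(grid, key=lambda lam: poisson_loglik(lam, visits))
    print("grid maximizer:", best, " sample mean:", sum(visits) / len(visits))
```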



    • If $d\ln L/d\theta > 0$, increasing θ will increase ln L, so we increase θ some

    • If $d\ln L/d\theta < 0$, decreasing θ will increase ln L, so we decrease θ some

    • Keep changing θ until $d\ln L/d\theta = 0$

    • How far you ‘step’ when you change θ is determined by a number of different factors
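The stepping rule above can be sketched as a simple gradient ascent on the Poisson log likelihood: move λ in the direction of the sign of the derivative until the derivative is essentially zero. The fixed step size of 0.05 is a hypothetical choice for illustration. Statistical packages typically use Newton-type updates that scale the step by the second derivative, which converge faster and more reliably.

```python
def poisson_score(lam, ys):
    """Derivative of the Poisson log likelihood: d ln L / d lambda = -n + sum(y) / lambda."""
    return -len(ys) + sum(ys) / lam


def climb(ys, lam=1.0, step=0.05, tol=1e-8, max_iter=10_000):
    """Step lambda in the direction of the derivative's sign until it is ~0.

    The fixed step size is a hypothetical choice; too large a step can
    overshoot (or even push lambda below zero).
    """
    for iteration in range(max_iter):
        slope = poisson_score(lam, ys)
        if abs(slope) < tol:
            return lam, iteration
        # slope > 0: raise lambda; slope < 0: lower it
        lam += step * slope
    raise RuntimeError("no convergence; try a smaller step")


if __name__ == "__main__":
    visits = [0, 1, 2, 0, 3, 1, 0, 5]   # hypothetical counts, mean 1.5
    lam_hat, n_steps = climb(visits)
    print(f"lambda_hat = {lam_hat:.6f} after {n_steps} steps")
```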


    [Figure: L plotted against θ; at θ1 the slope dL/dθ > 0]


    [Figure: L plotted against θ; at θ3 the slope dL/dθ < 0]


    Properties of MLE estimates

    • Sometimes called efficient estimation: in large samples (under standard regularity conditions), no consistent, asymptotically normal estimator has a smaller variance than the MLE

    • Parameter estimates are approximately normally distributed when sample sizes are large

    • Therefore, if we divide a parameter estimate by its standard error, the ratio should be approximately normally distributed with mean zero and variance 1 if the null hypothesis (β = 0) is correct
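This ratio is the z statistic that STATA prints for probit and logit models. The Python sketch below recomputes z and its two-sided p-value, 2[1 − Φ(|z|)], for two coefficients taken from the smoking probit output later in these notes. The helper name is illustrative. `math.erfc` is used because erfc(|z|/√2) equals 2[1 − Φ(|z|)] and stays accurate for large |z|.

```python
import math


def z_test(coef, se):
    """Return z = coef / se and its two-sided p-value under H0: beta = 0."""
    z = coef / se
    # erfc(|z| / sqrt(2)) equals 2 * (1 - Phi(|z|)) for a standard normal
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p


if __name__ == "__main__":
    # (name, coefficient, standard error) copied from the smoking probit output
    for name, b, se in [("age", -0.0012684, 0.0009316), ("worka", -0.2093287, 0.0231425)]:
        z, p = z_test(b, se)
        print(f"{name}: z = {z:.2f}, p = {p:.3f}")
```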


    Section V

    Dichotomous outcomes


    Dichotomous Data

    • Suppose the data are discrete, with only two possible outcomes

    • Examples

      • Graduate high school or not

      • Patient dies or not

      • Working or not

      • Smoker or not

    • In the data, $y_i = 1$ if yes, $y_i = 0$ if no


    How to model the data generating process?

    • There are only two outcomes

    • Research question: what factors affect whether the event occurs?

    • To answer, we will model the probability that the outcome occurs

      • $\Pr(y_i=1)$ when $y_i = 1$, or

      • $\Pr(y_i=0) = 1 - \Pr(y_i=1)$ when $y_i = 0$


    • Think of the problem from an MLE perspective

    • Likelihood for the i’th observation

    • $L_i = \Pr(y_i=1)^{y_i}\,[1 - \Pr(y_i=1)]^{1-y_i}$

      • When $y_i = 1$, the only relevant part is $\Pr(y_i=1)$

      • When $y_i = 0$, the only relevant part is $1 - \Pr(y_i=1)$


    • $\ln L = \sum_i \ln L_i$

      $= \sum_i \{ y_i \ln[\Pr(y_i=1)] + (1-y_i)\ln[\Pr(y_i=0)] \}$

    • Notice that up to this point, the model is generic. The log likelihood function will be determined by the assumptions concerning how we determine $\Pr(y_i=1)$
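The generic log likelihood can be coded once, taking whatever Pr(y_i = 1) a model assigns to each observation. This is a Python sketch with hypothetical outcomes. In the simplest model, every observation gets the same probability p. The log likelihood is then highest at p = ȳ, the sample mean, which becomes the constant-only benchmark used for the pseudo R² later on.

```python
import math


def binary_loglik(ys, probs):
    """ln L = sum_i { y_i ln Pr(y_i=1) + (1 - y_i) ln Pr(y_i=0) }, with Pr(y_i=1) = probs[i]."""
    return sum(y * math.log(p) + (1 - y) * math.log(1 - p) for y, p in zip(ys, probs))


if __name__ == "__main__":
    ys = [1, 0, 0, 1, 0]  # hypothetical outcomes, sample mean 0.4
    # Simplest model: the same Pr(y=1) = p for every observation
    for p in (0.3, 0.4, 0.5):
        print(p, round(binary_loglik(ys, [p] * len(ys)), 4))
```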


    Modeling the probability

    • There is some process (biological, social, decision theoretic, etc.) that determines the outcome y

    • Some of the factors affecting the outcome are observed; some are not

    • This requires that we model how these factors affect the probabilities

    • Model from a ‘latent variable’ perspective


    • Consider a woman’s decision to work

    • $y_i^*$ = the person’s net benefit from working

    • Two components of $y_i^*$

      • Characteristics that we can measure

        • Education, age, income of spouse, prices of child care

      • Some we cannot measure

        • How much you like spending time with your kids

        • How much you like/hate your job


  • Decision rule: the person will work if $y_i^* > 0$

    (if net benefits are positive)

    $y_i = 1$ if $y_i^* > 0$

    $y_i = 0$ if $y_i^* \le 0$


    • $y_i = 1$ if $y_i^* > 0$

      • $y_i^* = x_i\beta + \varepsilon_i > 0$ only if

      • $\varepsilon_i > -x_i\beta$

  • $y_i = 0$ if $y_i^* \le 0$

    • $y_i^* = x_i\beta + \varepsilon_i \le 0$ only if

    • $\varepsilon_i \le -x_i\beta$


  • How to interpret ε?

    • When we look at certain people, we have expectations about whether y should equal 1 or 0

    • These expectations do not always hold true

    • The error ε represents deviations from what we expect

    • Go back to the work example and suppose $x_i\beta$ is ‘big.’ We observe a woman with:

      • High wages

      • Low husband’s income

      • Low cost of child care



  • We would expect such a woman to work. Consider the opposite: suppose we observe her NOT working.

  • Then $\varepsilon_i$ must have been a large negative number, since

    • $y_i = 0$ if $\varepsilon_i \le -x_i\beta$


  • The Probabilities

    • The estimation procedure used is determined by the assumed distribution of ε

    • What is the probability we observe someone with y = 1?

      • Use the definition of the CDF

      • $\Pr(y_i=1) = \Pr(y_i^* > 0) = \Pr(\varepsilon_i > -x_i\beta)$

        $= 1 - F(-x_i\beta)$



    Normal (probit) Model

    • ε is distributed as a standard normal

      • Mean zero

      • Variance 1

    • Evaluate probability (y = 1)

      • $\Pr(y_i=1) = \Pr(\varepsilon_i > -x_i\beta) = 1 - \Phi(-x_i\beta)$

      • Given symmetry: $1 - \Phi(-x_i\beta) = \Phi(x_i\beta)$

    • Evaluate probability (y = 0)

      • $\Pr(y_i=0) = \Pr(\varepsilon_i \le -x_i\beta) = \Phi(-x_i\beta)$

      • Given symmetry: $\Phi(-x_i\beta) = 1 - \Phi(x_i\beta)$


    • Summary

      • $\Pr(y_i=1) = \Phi(x_i\beta)$

      • $\Pr(y_i=0) = 1 - \Phi(x_i\beta)$

    • Notice that Φ(a) is increasing in a. Therefore, if an increase in one of the x’s raises the probability of observing y = 1, we would expect the coefficient on that variable to be (+)
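A minimal Python sketch of the probit probabilities: given an index x_i·β, Pr(y = 1) = Φ(x_i·β) and Pr(y = 0) = 1 − Φ(x_i·β). The one-regressor index −0.5 + 0.3·x is a hypothetical example with a positive coefficient, so Pr(y = 1) rises with x, as the slide argues. Function names are illustrative.

```python
import math


def std_normal_cdf(a):
    """Phi(a) = Pr(z <= a) for a standard normal z."""
    return 0.5 * (1.0 + math.erf(a / math.sqrt(2.0)))


def probit_probs(xb):
    """Return (Pr(y=1), Pr(y=0)) = (Phi(x_i b), 1 - Phi(x_i b)) for index xb."""
    p1 = std_normal_cdf(xb)
    return p1, 1.0 - p1


if __name__ == "__main__":
    # Hypothetical one-regressor index x_i*beta = -0.5 + 0.3*x (positive slope)
    for x in range(4):
        p1, p0 = probit_probs(-0.5 + 0.3 * x)
        print(x, round(p1, 3), round(p0, 3))
```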



    Logit

    • CDF: $F(a) = \exp(a)/(1+\exp(a))$

      • Symmetric, unimodal distribution

      • Looks a lot like the normal

      • Incredibly easy to evaluate the CDF and PDF

      • Mean of zero, variance $\pi^2/3 \approx 3.29$ (more variance than the standard normal)

    • Evaluate probability (y = 1)

      • $\Pr(y_i=1) = \Pr(\varepsilon_i > -x_i\beta) = 1 - F(-x_i\beta)$

      • Given symmetry: $1 - F(-x_i\beta) = F(x_i\beta)$

      • $F(x_i\beta) = \exp(x_i\beta)/(1+\exp(x_i\beta))$


    • Evaluate probability (y = 0)

      • $\Pr(y_i=0) = \Pr(\varepsilon_i \le -x_i\beta) = F(-x_i\beta)$

      • Given symmetry: $F(-x_i\beta) = 1 - F(x_i\beta)$

      • $1 - F(x_i\beta) = 1/(1+\exp(x_i\beta))$

    • When $\varepsilon_i$ has a logistic distribution

      • $\Pr(y_i=1) = \exp(x_i\beta)/(1+\exp(x_i\beta))$

      • $\Pr(y_i=0) = 1/(1+\exp(x_i\beta))$
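A Python sketch of the logit probabilities, written so that `math.exp` never overflows for large |x_i·β|. The two branches are algebraically identical to the formulas above. Because the logistic has a larger variance than the standard normal, logit coefficients come out larger than probit coefficients. A commonly quoted rough rule of thumb is a factor of about 1.6, which the comparison below illustrates. Treat that factor as a heuristic, not an exact relationship.

```python
import math


def logit_probs(xb):
    """Return (Pr(y=1), Pr(y=0)) = (exp(xb)/(1+exp(xb)), 1/(1+exp(xb))).

    Both branches compute the same quantities; they only avoid overflow in
    math.exp when |xb| is large.
    """
    if xb >= 0:
        e = math.exp(-xb)
        return 1.0 / (1.0 + e), e / (1.0 + e)
    e = math.exp(xb)
    return e / (1.0 + e), 1.0 / (1.0 + e)


def std_normal_cdf(a):
    """Phi(a), used here only to compare logit with probit."""
    return 0.5 * (1.0 + math.erf(a / math.sqrt(2.0)))


if __name__ == "__main__":
    print("logistic variance:", math.pi ** 2 / 3)
    # Probit probability at index xb vs. logit probability at roughly 1.6 * xb
    for xb in (-1.0, 0.0, 0.5, 1.0):
        print(xb, round(std_normal_cdf(xb), 4), round(logit_probs(1.6 * xb)[0], 4))
```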


    Example: Workplace smoking bans

    • Smoking supplements to the 1991 and 1993 National Health Interview Survey

    • Asked all respondents whether they currently smoke

    • Asked workers about workplace tobacco policies

    • Sample: workers

    • Key variables: current smoking and whether the worker faces a workplace smoking ban



    Description of variables in data

      . desc;

                    storage  display     value
      variable name   type   format      label      variable label
      ------------------------------------------------------------------------
      smoker          byte   %9.0g                  is current smoking
      worka           byte   %9.0g                  has workplace smoking bans
      age             byte   %9.0g                  age in years
      male            byte   %9.0g                  male
      black           byte   %9.0g                  black
      hispanic        byte   %9.0g                  hispanic
      incomel         float  %9.0g                  log income
      hsgrad          byte   %9.0g                  is hs graduate
      somecol         byte   %9.0g                  has some college
      college         float  %9.0g
      ------------------------------------------------------------------------


    Summary statistics

      sum;

          Variable |       Obs        Mean    Std. Dev.       Min        Max
      -------------+--------------------------------------------------------
            smoker |     16258      .25163     .433963          0          1
             worka |     16258    .6851396    .4644745          0          1
               age |     16258    38.54742    11.96189         18         87
              male |     16258    .3947595     .488814          0          1
             black |     16258    .1119449    .3153083          0          1
      -------------+--------------------------------------------------------
          hispanic |     16258    .0607086    .2388023          0          1
           incomel |     16258    10.42097    .7624525   6.214608   11.22524
            hsgrad |     16258    .3355271    .4721889          0          1
           somecol |     16258    .2685447    .4432161          0          1
           college |     16258    .3293763    .4700012          0          1


    Running a probit

      probit smoker age incomel male black hispanic hsgrad somecol college worka;

    • The first variable after ‘probit’ is the discrete outcome; the rest are the independent variables

    • Includes a constant by default


    Running a logit

      logit smoker age incomel male black hispanic hsgrad somecol college worka;

    • Same as probit; just change the first word


    Running linear probability

      reg smoker age incomel male black hispanic hsgrad somecol college worka, robust;

    • Simple regression

    • The default OLS standard errors are incorrect because the errors are heteroskedastic

    • The robust option produces standard errors that are valid under an arbitrary form of heteroskedasticity
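The heteroskedasticity is built into the linear probability model. Because y_i is 0 or 1, the error y_i − x_i·β equals 1 − p with probability p and −p with probability 1 − p, where p = x_i·β. Its variance, p(1 − p), therefore changes with x. A minimal Python sketch; the function name is hypothetical.

```python
def lpm_error_variance(p):
    """Var(e_i | x_i) in the linear probability model, where p = x_i * beta.

    e_i = y_i - p equals 1 - p with probability p and -p with probability
    1 - p, so its variance is p * (1 - p).
    """
    return p * (1 - p)


if __name__ == "__main__":
    for p in (0.05, 0.25, 0.5, 0.75, 0.95):
        print(p, round(lpm_error_variance(p), 4))
    # A fitted x_i*beta outside [0, 1] yields a negative "variance",
    # another symptom of predicting outside the allowable range
    print(1.2, lpm_error_variance(1.2))
```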


    Probit Results

      Probit estimates                                  Number of obs   =      16258
                                                        LR chi2(9)      =     819.44
                                                        Prob > chi2     =     0.0000
      Log likelihood = -8761.7208                       Pseudo R2       =     0.0447
      ------------------------------------------------------------------------------
            smoker |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
      -------------+----------------------------------------------------------------
               age |  -.0012684   .0009316    -1.36   0.173    -.0030943    .0005574
           incomel |   -.092812   .0151496    -6.13   0.000    -.1225047   -.0631193
              male |   .0533213   .0229297     2.33   0.020     .0083799    .0982627
             black |  -.1060518    .034918    -3.04   0.002      -.17449   -.0376137
          hispanic |  -.2281468   .0475128    -4.80   0.000    -.3212701   -.1350235
            hsgrad |  -.1748765   .0436392    -4.01   0.000    -.2604078   -.0893453
           somecol |   -.363869   .0451757    -8.05   0.000    -.4524118   -.2753262
           college |  -.7689528   .0466418   -16.49   0.000     -.860369   -.6775366
             worka |  -.2093287   .0231425    -9.05   0.000    -.2546873   -.1639702
             _cons |    .870543    .154056     5.65   0.000     .5685989    1.172487
      ------------------------------------------------------------------------------
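Probit coefficients are not themselves changes in probability. One way to translate them is to evaluate Φ(x_i·b) at chosen values of x, with and without the variable of interest. The sketch below does this in Python, using the coefficients above, for a hypothetical worker: a 38-year-old white non-Hispanic woman who is a high school graduate only, with log income 10.42 (close to the sample mean of incomel). The person is an assumption. The implied effect of a workplace ban differs at other values of x.

```python
import math

# Probit coefficients copied from the output above
COEF = {
    "age": -0.0012684, "incomel": -0.092812, "male": 0.0533213,
    "black": -0.1060518, "hispanic": -0.2281468, "hsgrad": -0.1748765,
    "somecol": -0.363869, "college": -0.7689528, "worka": -0.2093287,
}
CONS = 0.870543


def std_normal_cdf(a):
    """Phi(a) = Pr(z <= a) for a standard normal z."""
    return 0.5 * (1.0 + math.erf(a / math.sqrt(2.0)))


def pr_smoke(person):
    """Pr(smoker = 1) = Phi(x_i b) for a dict of covariate values."""
    xb = CONS + sum(COEF[k] * v for k, v in person.items())
    return std_normal_cdf(xb)


if __name__ == "__main__":
    # Hypothetical worker (see lead-in); only worka differs between the two cases
    base = {"age": 38, "incomel": 10.42, "male": 0, "black": 0, "hispanic": 0,
            "hsgrad": 1, "somecol": 0, "college": 0, "worka": 0}
    p_no_ban, p_ban = pr_smoke(base), pr_smoke(dict(base, worka=1))
    print(f"no ban: {p_no_ban:.3f}, ban: {p_ban:.3f}, change: {p_ban - p_no_ban:.3f}")
```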


    How to measure fit?

    • Regression (OLS)

      • Minimize the sum of squared errors

      • Or, maximize R²

      • The model is designed to maximize predictive capacity

    • Not the case with probit/logit

      • MLE models pick distribution parameters so as to best describe the data generating process

      • They may or may not ‘predict’ the outcome well


    Pseudo R²

    • LLk: log likelihood with all variables

    • LL1: log likelihood with only a constant

    • 0 > LLk > LL1, so |LLk| < |LL1|

    • Pseudo R² = 1 - LLk/LL1 = 1 - |LLk|/|LL1|

    • Bounded between 0 and 1

    • Not anything like an R² from a regression
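This pseudo R² (often called McFadden's) can be checked against the probit output above. STATA reports LR chi2 = 2(LLk − LL1), so LL1 can be backed out from the printed numbers. Computing 1 − LLk/LL1 then reproduces the reported 0.0447. As a second route to LL1: with only a constant, the fitted Pr(y = 1) equals the sample share of smokers, which is 4,091 of 16,258 (the totals in the actual-versus-predicted cross-tabulation below). A Python sketch; the function names are illustrative.

```python
import math


def mcfadden_pseudo_r2(ll_k, ll_1):
    """Pseudo R^2 = 1 - LL_k / LL_1 (both log likelihoods are negative)."""
    return 1.0 - ll_k / ll_1


def constant_only_loglik(n_ones, n):
    """LL_1 for a binary model with only a constant: fitted Pr(y=1) = sample mean."""
    p = n_ones / n
    return n_ones * math.log(p) + (n - n_ones) * math.log(1 - p)


if __name__ == "__main__":
    ll_k = -8761.7208                 # log likelihood from the probit output
    ll_1 = ll_k - 819.44 / 2          # since LR chi2 = 2 * (LL_k - LL_1)
    print("LL_1 via LR chi2:", ll_1)
    print("LL_1 via sample mean:", constant_only_loglik(4091, 16258))
    print("pseudo R^2:", round(mcfadden_pseudo_r2(ll_k, ll_1), 4))
```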


    Predicting Y

    • Let b be the estimated value of β

    • For any candidate vector $x_i$, we can predict probabilities $P_i$

    • $P_i = \Phi(x_i b)$

    • Once you have $P_i$, pick a threshold value, T, so that you predict

      • $Y^p = 1$ if $P_i > T$

      • $Y^p = 0$ if $P_i \le T$

  • Then compare predictions with actual outcomes: the fraction correctly predicted


    • Question: what value should we pick for T?

    • Can pick T = .5

      • Intuitive. More likely to engage in the activity than not to engage in it

      • However, when ȳ (the sample mean of Y) is small, this criterion does a poor job of predicting Yi=1

      • However, when ȳ is close to 1, this criterion does a poor job of predicting Yi=0
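A toy Python sketch of the threshold rule and the fraction correctly predicted. The probabilities and outcomes are invented for illustration and are not the smoking data. `classify` and `fraction_correct` are hypothetical helper names. With Y=1 relatively uncommon, lowering T below .5 catches more of the Y=1 cases.

```python
def classify(probs, threshold):
    """Predicted outcome: 1 if Pi > T, else 0."""
    return [1 if p > threshold else 0 for p in probs]


def fraction_correct(actual, predicted):
    """Share of observations where the prediction matches the actual outcome."""
    hits = sum(1 for a, p in zip(actual, predicted) if a == p)
    return hits / len(actual)


# Hypothetical predicted probabilities and actual outcomes (illustration only).
probs = [0.10, 0.20, 0.35, 0.40, 0.55, 0.15, 0.30, 0.60]
actual = [0, 0, 1, 0, 1, 0, 1, 0]

for t in (0.5, 0.25):
    pred = classify(probs, t)
    print(f"T = {t}: predicted {pred}, fraction correct = {fraction_correct(actual, pred):.3f}")
```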


    • * predict probability of smoking;

    • predict pred_prob_smoke;

    • * get detailed descriptive data about predicted prob;

    • sum pred_prob, detail;

    • * predict binary outcome with 50% cutoff;

    • gen pred_smoke1=pred_prob_smoke>=.5;

    • label variable pred_smoke1 "predicted smoking, 50% cutoff";

    • * compare actual values;

    • tab smoker pred_smoke1, row col cell;


    • . sum pred_prob, detail;

    • Pr(smoker)

    • -------------------------------------------------------------

    • Percentiles Smallest

    • 1% .0959301 .0615221

    • 5% .1155022 .0622963

    • 10% .1237434 .0633929 Obs 16258

    • 25% .1620851 .0733495 Sum of Wgt. 16258

    • 50% .2569962 Mean .2516653

    • Largest Std. Dev. .0960007

    • 75% .3187975 .5619798

    • 90% .3795704 .5655878 Variance .0092161

    • 95% .4039573 .5684112 Skewness .1520254

    • 99% .4672697 .6203823 Kurtosis 2.149247


    • Notice two things

      • The sample mean of the predicted probabilities (.2517) is close to the sample mean outcome (.2516)

      • 99% of the probabilities are less than .5

      • So we should predict few smokers if we use a 50% cutoff


    • | predicted smoking,

    • is current | 50% cutoff

    • smoking | 0 1 | Total

    • -----------+----------------------+----------

    • 0 | 12,153 14 | 12,167

    • | 99.88 0.12 | 100.00

    • | 74.93 35.90 | 74.84

    • | 74.75 0.09 | 74.84

    • -----------+----------------------+----------

    • 1 | 4,066 25 | 4,091

    • | 99.39 0.61 | 100.00

    • | 25.07 64.10 | 25.16

    • | 25.01 0.15 | 25.16

    • -----------+----------------------+----------

    • Total | 16,219 39 | 16,258

    • | 99.76 0.24 | 100.00

    • | 100.00 100.00 | 100.00

    • | 99.76 0.24 | 100.00


    • Check the on-diagonal elements

    • The last number in each cell is the percent of the whole sample in that cell

    • The model correctly predicts 74.75 + 0.15 = 74.90% of the obs

    • It correctly predicts only 25 of the 4,091 smokers (0.61%)



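    • In this case, 25.16% smoke

    • If everyone had the same chance of smoking, we would assign everyone Pr(y=1) = .2516

    • With a 50% cutoff we would then predict that no one smokes, and we would be correct for the 1 - .2516 = 0.7484 share of people who do not smoke, almost exactly the 74.90% the model achieves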
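The arithmetic above can be checked directly from the cell counts in the table. This is a Python sketch, and the variable names are illustrative. Predicting "nonsmoker" for everyone serves as the naive benchmark.

```python
# Cell counts from the smoker x pred_smoke1 table (50% cutoff).
n_00, n_01 = 12153, 14    # nonsmokers: predicted 0, predicted 1
n_10, n_11 = 4066, 25     # smokers:    predicted 0, predicted 1
n = n_00 + n_01 + n_10 + n_11

model_accuracy = (n_00 + n_11) / n        # on-diagonal cells
naive_accuracy = (n_00 + n_01) / n        # predict "nonsmoker" for everyone
smokers_caught = n_11 / (n_10 + n_11)     # share of smokers predicted to smoke

print(f"model {model_accuracy:.4f}, naive {naive_accuracy:.4f}, "
      f"smokers caught {smokers_caught:.4f}")
```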


    Key points about prediction

    • MLE models are not designed to maximize prediction

    • Should not be surprised they do not predict well

    • In this case, the fraction correctly predicted is not a particularly good measure of model fit


    Translating coefficients in probit: Continuous Covariates

    • Pr(yi=1) = Φ[β0 + x1iβ1+ x2iβ2+… xkiβk]

    • Suppose that x1i is a continuous variable

    • d Pr(yi=1) /d x1i = ?

    • What is the change in the probability of an event given a change in x1i?


    Marginal Effect

    • d Pr(yi=1) /d x1i

    • = β1φ[β0 + x1iβ1+ x2iβ2+… xkiβk]

    • Notice two things: the marginal effect depends on the other parameters, and on the values of x at which it is evaluated (φ is the standard normal PDF).
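A Python sketch that checks the formula against the mfx output in these notes. mfx evaluates at the sample means and reports Pr(smoker) = Φ(x̄b) = .24093439, so x̄b can be recovered with the inverse normal CDF. Multiplying φ(x̄b) by the probit coefficients on age and incomel should then reproduce the dy/dx column. `statistics.NormalDist` (Python 3.8+) supplies Φ, φ and Φ⁻¹.

```python
from statistics import NormalDist

std_normal = NormalDist()  # mean 0, sd 1

# From the mfx output: Pr(smoker) evaluated at the sample means, Phi(x_bar b).
p_at_means = 0.24093439
index_at_means = std_normal.inv_cdf(p_at_means)   # x_bar b
density = std_normal.pdf(index_at_means)          # phi(x_bar b)

# Probit coefficients from the estimation output.
b_age, b_incomel = -0.0012684, -0.092812

me_age = b_age * density           # beta_1 * phi(x b)
me_incomel = b_incomel * density
print(f"x_bar b = {index_at_means:.4f}, phi = {density:.4f}")
print(f"dPr/d age = {me_age:.7f}, dPr/d incomel = {me_incomel:.7f}")
```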


    Translating Coefficients: Discrete Covariates

    • Pr(yi=1) = Φ[β0 + x1iβ1+ x2iβ2+… xkiβk]

    • Suppose that x2i is a dummy variable (1 if yes, 0 if no)

    • A marginal effect makes no sense here: we cannot change x2i by a small amount, since it is either 1 or 0.

    • Redefine the variable of interest. Compare outcomes with and without x2i


    • y1 = Pr(yi=1 | x2i=1)

      = Φ[β0 + x1iβ1 + β2 + x3iβ3 + … ]

    • y0 = Pr(yi=1 | x2i=0)

      = Φ[β0 + x1iβ1 + x3iβ3 + … ]

      Marginal effect = y1 – y0

      The difference in probabilities with and without x2i
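The same idea for a dummy variable, as a Python sketch. Start from the x̄b implied by the reported Pr(smoker) = .24093439 and remove college's contribution at its sample mean (.329376). Then evaluate Φ with college set to 1 and to 0, holding the other x's at their means. This assumes mfx computes the discrete change this way; the result lines up with the reported -.2149199 for college.

```python
from statistics import NormalDist

Phi = NormalDist().cdf

# x_bar b at the sample means, recovered from Pr(smoker) = .24093439 (mfx output).
index_at_means = NormalDist().inv_cdf(0.24093439)

b_college = -0.7689528      # probit coefficient on college
mean_college = 0.329376     # sample mean of college (X column of mfx)

# Remove college's contribution at its mean, then set college to 0 and to 1,
# holding every other covariate at its sample mean.
index_college0 = index_at_means - b_college * mean_college
index_college1 = index_college0 + b_college

y1 = Phi(index_college1)    # Pr(smoker | college = 1)
y0 = Phi(index_college0)    # Pr(smoker | college = 0)
print(f"y1 = {y1:.4f}, y0 = {y0:.4f}, change = {y1 - y0:.4f}")
```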


    In STATA

    • For marginal effects of continuous variables and changes in probabilities for dummy variables, STATA evaluates at the sample means of the X’s


    STATA command for Marginal Effects

    • mfx compute;

    • Must come after the estimation command (e.g., probit), while its estimates are still active in the program.


    • Marginal effects after probit

    • y = Pr(smoker) (predict)

    • = .24093439

    • ------------------------------------------------------------------------------

    • variable | dy/dx Std. Err. z P>|z| [ 95% C.I. ] X

    • ---------+--------------------------------------------------------------------

    • age | -.0003951 .00029 -1.36 0.173 -.000964 .000174 38.5474

    • incomel | -.0289139 .00472 -6.13 0.000 -.03816 -.019668 10.421

    • male*| .0166757 .0072 2.32 0.021 .002568 .030783 .39476

    • black*| -.0320621 .01023 -3.13 0.002 -.052111 -.012013 .111945

    • hispanic*| -.0658551 .01259 -5.23 0.000 -.090536 -.041174 .060709

    • hsgrad*| -.053335 .01302 -4.10 0.000 -.07885 -.02782 .335527

    • somecol*| -.1062358 .01228 -8.65 0.000 -.130308 -.082164 .268545

    • college*| -.2149199 .01146 -18.76 0.000 -.237378 -.192462 .329376

    • worka*| -.0668959 .00756 -8.84 0.000 -.08172 -.052072 .68514

    • ------------------------------------------------------------------------------

    • (*) dy/dx is for discrete change of dummy variable from 0 to 1


    Interpret results

    • A 10% increase in income (a 0.1 change in log income, incomel) will reduce smoking by about 0.29 percentage points (0.1 × 2.89)

    • A 10 year increase in age will decrease smoking rates by 0.4 percentage points

    • Those with a college degree are 21.5 percentage points less likely to smoke

    • Those who face a workplace smoking ban have a 6.7 percentage point lower probability of smoking




    Marginal effects for specific characteristics

    • Can generate marginal effects for specific values of the x’s with prchange (part of the user-written SPost package)

    • prchange, x(age=40 black=0 hispanic=0 hsgrad=0 somecol=0 worka=0);

    • If an x is not specified, STATA will use the sample mean (e.g., log income in this case)

    • When you set one dummy variable in a group to 1, make sure you set the other dummies in that group to zero


    • probit: Changes in Predicted Probabilities for smoker

    • min->max 0->1 -+1/2 -+sd/2 MargEfct

    • age -0.0323 -0.0005 -0.0005 -0.0056 -0.0005

    • incomel -0.1795 -0.0320 -0.0344 -0.0263 -0.0345

    • male 0.0198 0.0198 0.0198 0.0097 0.0198

    • black -0.0385 -0.0385 -0.0394 -0.0124 -0.0394

    • hispanic -0.0804 -0.0804 -0.0845 -0.0202 -0.0847

    • hsgrad -0.0625 -0.0625 -0.0648 -0.0306 -0.0649

    • somecol -0.1235 -0.1235 -0.1344 -0.0598 -0.1351

    • college -0.2644 -0.2644 -0.2795 -0.1335 -0.2854

    • worka -0.0742 -0.0742 -0.0776 -0.0361 -0.0777


    Testing significance of individual parameters

    • In large samples, MLE estimates are approximately normally distributed

    • Null hypothesis: βj = 0

    • If the null is true and the sample is large, the estimate of βj is approximately normal with mean 0 and variance σj2

    • Using notes from before, if we divide the estimate by its standard deviation (its standard error), we get a standard normal


    • βj/se(βj) should be N(0,1)

    • βj/se(βj) = z-score

    • 95% of the distribution of a N(0,1) is between -1.96 and 1.96

    • Reject the null if |z-score| > 1.96

    • Only age is statistically insignificant (cannot reject the null)
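A short Python sketch of the z-test, using the age and college coefficients and standard errors from the probit output. The two-sided p-value is 2[1 - Φ(|z|)]. `z_test` is a hypothetical helper name.

```python
from statistics import NormalDist


def z_test(coef: float, se: float):
    """Return the z-score and two-sided p-value for H0: beta_j = 0."""
    z = coef / se
    p = 2.0 * (1.0 - NormalDist().cdf(abs(z)))
    return z, p


critical = NormalDist().inv_cdf(0.975)   # about 1.96

# Coefficients and standard errors from the probit output.
for name, coef, se in [("age", -0.0012684, 0.0009316),
                       ("college", -0.7689528, 0.0466418)]:
    z, p = z_test(coef, se)
    verdict = "reject" if abs(z) > critical else "cannot reject"
    print(f"{name}: z = {z:.2f}, p = {p:.3f} -> {verdict} H0")
```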


    When will results differ?

    • The normal and logistic CDFs look:

      • Similar in the middle of the distribution

      • Different in the tails

    • You obtain more observations in the tails of the distribution when

      • Sample sizes are large

      • ȳ (the sample mean of y) approaches 1 or 0

    • These situations will produce larger differences between probit and logit estimates
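A Python sketch comparing the two CDFs. Logit and probit indexes are on different scales, so the logistic argument is multiplied by 1.6. That factor is a common rule of thumb and is an assumption here, not something from these notes. The curves nearly coincide near the middle. In the upper tail, however, the logistic probability of not reaching 1 is several times larger.

```python
import math
from statistics import NormalDist

Phi = NormalDist().cdf


def logistic_cdf(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


# Rule-of-thumb scaling (assumption): a logit index is roughly 1.6 times a probit index.
SCALE = 1.6

for z in (0.0, 0.5, 1.0, 2.0, 3.0):
    p_probit = Phi(z)
    p_logit = logistic_cdf(SCALE * z)
    # Ratio of upper-tail probabilities: where the two models differ most.
    print(f"z={z:.1f}  probit={p_probit:.4f}  logit={p_logit:.4f}  "
          f"tail ratio={(1 - p_logit) / (1 - p_probit):.2f}")
```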


    Some nice properties of the Logit

    • Outcome, y = 1 or 0

    • Treatment, x = 1 or 0

    • Other covariates, Z

    • Context:

      • y = whether a baby is born with low birth weight

      • x = whether the mom smoked during pregnancy


    • Risk ratio

      RR = Prob(y=1|x=1)/Prob(y=1|x=0)

      The ratio of the probability of an event when x is and is not present

      How much does smoking elevate the chance your child will be a low weight birth?


    • Let Yyx be the probability that Y = y given X = x (each equal to 1 or 0)

    • Think of the risk ratio the following way

      • Y11 is the probability Y=1 when X=1

      • Y10 is the probability Y=1 when X=0

    • Y11 = RR*Y10


    • Odds Ratio

      OR = A/B = [Y11/Y01]/[Y10/Y00]

      A = [Pr(Y=1|X=1)/Pr(Y=0|X=1)]

      = odds of Y occurring if you are a smoker

      B = [Pr(Y=1|X=0)/Pr(Y=0|X=0)]

      = odds of Y occurring if you are not a smoker

      What are the relative odds of Y happening if you do or do not experience X?


    • Suppose Pr(Yi=1) = F(βo + β1Xi + β2Z) and F is the logistic function

    • Can show that

    • OR = exp(β1) = e^β1

    • This number is reported by most statistical packages


    • Details

      • Y11 = exp(βo+ β1 + β2Z) /(1+ exp(βo+ β1+ β2Z) )

      • Y10 = exp(βo+ β2Z)/(1+ exp(βo+β2Z))

      • Y01 = 1 /(1+ exp(βo+ β1 + β2Z) )

      • Y00 = 1/(1+ exp(βo+β2Z))

      • [Y11/Y01] = exp(βo+ β1 + β2Z)

      • [Y10/Y00] = exp(βo+ β2Z)

      • OR=A/B = [Y11/Y01]/[Y10/Y00]

        = exp(βo+ β1 + β2Z)/ exp(βo + β2Z)

        = exp(β1)
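A numerical check of OR = exp(β1) in Python. β1 is the smoked coefficient from the logit output in these notes. b0 and the covariate term b2·Z are hypothetical values, and changing them leaves the odds ratio unchanged.

```python
import math


def logistic(v: float) -> float:
    return math.exp(v) / (1.0 + math.exp(v))


# b1 = smoked coefficient from the logit output; b0 and b2*Z are hypothetical.
b0, b1, b2z = -2.76, 0.6740651, 0.40

y11 = logistic(b0 + b1 + b2z)   # Pr(Y=1 | X=1)
y10 = logistic(b0 + b2z)        # Pr(Y=1 | X=0)
y01, y00 = 1 - y11, 1 - y10     # Pr(Y=0 | X=1), Pr(Y=0 | X=0)

odds_ratio = (y11 / y01) / (y10 / y00)
print(f"OR = {odds_ratio:.6f}, exp(b1) = {math.exp(b1):.6f}")
```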


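    • Suppose Y is rare, i.e., ȳ (the mean of Y) is close to 0

      • Pr(Y=0|X=1) and Pr(Y=0|X=0) are both close to 1, so they approximately cancel in the odds ratio

    • Therefore, when ȳ is close to 0

      • Odds Ratio ≈ Risk Ratio

    • Why is this nice? The logit reports OR = exp(β1) directly, so for rare outcomes it also delivers an estimate of the risk ratio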


    Population attributable risk

    • Average outcome in the population, with π = fraction of the population with X=1

    • ȳ = (1-π) Y10 + π Y11 = (1-π) Y10 + π(RR)Y10

    • Average outcomes are a weighted average of outcomes for X=0 and X=1

    • What would the average outcome be in the absence of X (e.g., if smoking rates fell to 0)?

    • Ya = Y10


    Population Attributable Risk

    • PAR

    • Fraction of the outcome attributable to X

    • The difference between the current rate and the rate that would exist without X, divided by the current rate

    • PAR = (ȳ – Ya)/ȳ, where ȳ is the current average outcome

      = π(RR – 1)/[(1-π) + πRR], where π is the fraction of the population with X=1
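The algebra behind the PAR formula is written below with π for the fraction of the population with X=1 and ȳ for the current average outcome. The π notation is assumed here for the symbol lost from the slide.

```latex
\begin{aligned}
\bar{y} &= (1-\pi)\,Y_{10} + \pi\,Y_{11} = Y_{10}\,\bigl[(1-\pi) + \pi\,RR\bigr] \\
Y_a &= Y_{10} \\
PAR &= \frac{\bar{y} - Y_a}{\bar{y}}
     = \frac{Y_{10}\,\bigl[(1-\pi) + \pi\,RR - 1\bigr]}{Y_{10}\,\bigl[(1-\pi) + \pi\,RR\bigr]}
     = \frac{\pi\,(RR - 1)}{(1-\pi) + \pi\,RR}
\end{aligned}
```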


    Example: Maternal Smoking and Low Weight Births

    • 6% births are low weight

      • < 2500 grams (5.5 lbs)

      • Average birth is 3300 grams (about 7.3 lbs)

    • Maternal smoking during pregnancy has been identified as a key cofactor

      • 13% of mothers smoke

      • This number was falling about 1 percentage point per year during 1980s/90s

      • Doubles chance of low weight birth


    Natality detail data

    • Census of all births (4 million/year)

    • Annual files starting in the 60s

    • Information about

      • Baby (birth weight, length, date, sex, plurality, birth injuries)

      • Demographics (age, race, marital, educ of mom)

      • Birth (who delivered, method of delivery)

      • Health of mom (smoke/drank during preg, weight gain)


    • Smoking not available from CA or NY

    • ~3 million usable observations

    • I pulled a 0.5% random sample from 1995

    • About 12,500 obs

    • Variables: birthweight (grams), smoked, married, 4-level race, 5-level education, mother’s age at birth


    • ------------------------------------------------------------------------------

    • storage display value

    • variable name type format label variable label

    • ------------------------------------------------------------------------------

    • birthw int %9.0g birth weight in grams

    • smoked byte %9.0g =1 if mom smoked during pregnancy

    • age byte %9.0g moms age at birth

    • married byte %9.0g =1 if married

    • race4 byte %9.0g 1=white,2=black,3=asian,4=other

    • educ5 byte %9.0g 1=0-8, 2=9-11, 3=12, 4=13-15, 5=16+

    • visits byte %9.0g prenatal visits

    • ------------------------------------------------------------------------------


    • dummy |

    • variable, |

    • =1 | =1 if mom smoked

    • if BW<2500 | during pregnancy

    • grams | 0 1 | Total

    • -----------+----------------------+----------

    • 0 | 11,626 1,745 | 13,371

    • | 86.95 13.05 | 100.00

    • | 94.64 89.72 | 93.96

    • | 81.70 12.26 | 93.96

    • -----------+----------------------+----------

    • 1 | 659 200 | 859

    • | 76.72 23.28 | 100.00

    • | 5.36 10.28 | 6.04

    • | 4.63 1.41 | 6.04

    • -----------+----------------------+----------

    • Total | 12,285 1,945 | 14,230

    • | 86.33 13.67 | 100.00

    • | 100.00 100.00 | 100.00

    • | 86.33 13.67 | 100.00


    • Notice a few things

      • 13.7% of women smoke

      • 6% have low weight birth

    • Pr(LBW | Smoke) =10.28%

    • Pr(LBW |~ Smoke) = 5.36%

    • RR

      = Pr(LBW | Smoke)/ Pr(LBW |~ Smoke)

      = 0.1028/0.0536 = 1.92
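A Python sketch that computes the risk ratio and the unadjusted odds ratio from the cell counts in the table above. Because low weight births are rare (about 6%), the two are close: about 1.92 and 2.02. The logit odds ratio of 1.96 reported below additionally controls for the other covariates.

```python
# Counts from the lowbw x smoked table.
lbw_smoke, ok_smoke = 200, 1745        # smokers: low weight, normal weight
lbw_nosmoke, ok_nosmoke = 659, 11626   # nonsmokers: low weight, normal weight

p_smoke = lbw_smoke / (lbw_smoke + ok_smoke)           # Pr(LBW | smoke)
p_nosmoke = lbw_nosmoke / (lbw_nosmoke + ok_nosmoke)   # Pr(LBW | no smoke)

risk_ratio = p_smoke / p_nosmoke
odds_ratio = (lbw_smoke / ok_smoke) / (lbw_nosmoke / ok_nosmoke)
print(f"RR = {risk_ratio:.3f}, unadjusted OR = {odds_ratio:.3f}")
```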


    Logit results

    • Log likelihood = -3136.9912 Pseudo R2 = 0.0330

    • ------------------------------------------------------------------------------

    • lowbw | Coef. Std. Err. z P>|z| [95% Conf. Interval]

    • -------------+----------------------------------------------------------------

    • smoked | .6740651 .0897869 7.51 0.000 .4980861 .8500441

    • age | .0080537 .006791 1.19 0.236 -.0052564 .0213638

    • married | -.3954044 .0882471 -4.48 0.000 -.5683654 -.2224433

    • _Ieduc5_2 | -.1949335 .1626502 -1.20 0.231 -.5137221 .1238551

    • _Ieduc5_3 | -.1925099 .1543239 -1.25 0.212 -.4949791 .1099594

    • _Ieduc5_4 | -.4057382 .1676759 -2.42 0.016 -.7343769 -.0770994

    • _Ieduc5_5 | -.3569715 .1780322 -2.01 0.045 -.7059081 -.0080349

    • _Irace4_2 | .7072894 .0875125 8.08 0.000 .5357681 .8788107

    • _Irace4_3 | .386623 .307062 1.26 0.208 -.2152075 .9884535

    • _Irace4_4 | .3095536 .2047899 1.51 0.131 -.0918271 .7109344

    • _cons | -2.755971 .2104916 -13.09 0.000 -3.168527 -2.343415

    • ------------------------------------------------------------------------------


    Odds Ratios

    • Smoked

      • exp(0.674) = 1.96

      • Smokers have about twice the odds of a low weight birth (roughly twice the risk, since low weight births are rare)

    • _Irace4_2 (Blacks)

      • exp(0.707) = 2.03

      • Black mothers have about twice the odds of a low weight birth
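Because OR = exp(β), the odds ratios and confidence limits in the `logistic` output that follows are the exponentiated logit coefficients and coefficient limits. A Python check for two rows:

```python
import math

# Logit coefficients and 95% confidence limits from the output above.
coefs = {
    "smoked":    (0.6740651, 0.4980861, 0.8500441),
    "_Irace4_2": (0.7072894, 0.5357681, 0.8788107),
}

for name, (b, lo, hi) in coefs.items():
    # Exponentiate the point estimate and both confidence limits.
    print(f"{name}: OR = {math.exp(b):.4f}, "
          f"95% CI = [{math.exp(lo):.4f}, {math.exp(hi):.4f}]")
```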


    Asking for odds ratios

    • logistic y x1 x2;

    • In this case

    • xi: logistic lowbw smoked age married i.educ5 i.race4;


    • Log likelihood = -3136.9912 Pseudo R2 = 0.0330

    • ------------------------------------------------------------------------------

    • lowbw | Odds Ratio Std. Err. z P>|z| [95% Conf. Interval]

    • -------------+----------------------------------------------------------------

    • smoked | 1.962198 .1761796 7.51 0.000 1.645569 2.33975

    • age | 1.008086 .0068459 1.19 0.236 .9947574 1.021594

    • married | .6734077 .0594262 -4.48 0.000 .5664506 .8005604

    • _Ieduc5_2 | .8228894 .1338431 -1.20 0.231 .5982646 1.131852

    • _Ieduc5_3 | .8248862 .1272996 -1.25 0.212 .6095837 1.116233

    • _Ieduc5_4 | .6664847 .1117534 -2.42 0.016 .4798043 .9257979

    • _Ieduc5_5 | .6997924 .1245856 -2.01 0.045 .4936601 .9919973

    • _Irace4_2 | 2.028485 .1775178 8.08 0.000 1.70876 2.408034

    • _Irace4_3 | 1.472001 .4519957 1.26 0.208 .8063741 2.687076

    • _Irace4_4 | 1.362817 .2790911 1.51 0.131 .9122628 2.035893

    • ------------------------------------------------------------------------------


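    PAR

    • PAR = π(RR – 1)/[(1-π) + πRR], where π is the fraction of mothers who smoke

    • π = 0.137

    • RR ≈ 1.96 (the logit odds ratio; low weight births are rare, so OR ≈ RR)

    • PAR = 0.116

    • 11.6% of low weight births are attributable to maternal smoking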
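The PAR arithmetic as a Python sketch. It takes π = 0.137 (the share of mothers who smoke) and uses the logit odds ratio in place of RR. `par` is a hypothetical helper name.

```python
def par(pi: float, rr: float) -> float:
    """Population attributable risk: pi*(RR - 1) / [(1 - pi) + pi*RR]."""
    return pi * (rr - 1.0) / ((1.0 - pi) + pi * rr)


# pi = share of mothers who smoke; RR approximated by the logit odds ratio.
print(f"PAR = {par(0.137, 1.96):.3f}")
```

If RR = 1, the exposure raises no one's risk, so PAR is exactly 0.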


    Hypothesis Testing in MLE models

    • MLE estimates are asymptotically normally distributed, one of the properties of MLE

    • Therefore, standard z-tests (t-tests) of hypotheses will work as long as samples are ‘large’

    • What ‘large’ means is open to question

    • What to do when samples are ‘small’ – table that for a moment


    Testing a linear combination of parameters

    • Suppose you have a probit model

      • Φ[β0 + x1iβ1+ x2iβ2 + x3iβ3 +… ]

    • Test a linear combination of parameters

    • Simplest example: test that a subset are zero

    • β1 = β2 = β3 = β4 = 0

    • To fix the discussion

      • N observations

      • K parameters

      • J restrictions (count the equals signs, J = 4)


    Wald Test

    • Based on the fact that the parameters are distributed asymptotically normal

    • Probability theory review

      • Suppose you have m independent draws from a standard normal distribution (zi)

      • M = z1² + z2² + … + zm²

      • M is distributed as a Chi-square with m degrees of freedom
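A quick Monte Carlo check of the chi-square fact in Python. Summing m = 4 squared standard normal draws many times should give a mean near m. About 5% of the sums should exceed 9.488, the 95th percentile for 4 degrees of freedom. The seed and the number of replications are arbitrary choices.

```python
import random


def chi_square_draw(m: int, rng: random.Random) -> float:
    """Sum of m squared independent standard normal draws."""
    return sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(m))


rng = random.Random(12345)     # fixed seed so the run is reproducible
m, reps = 4, 20000
draws = [chi_square_draw(m, rng) for _ in range(reps)]

mean = sum(draws) / reps                              # chi-square mean is m
share_above = sum(d > 9.488 for d in draws) / reps    # should be near 5%
print(f"mean = {mean:.3f} (theory {m}), share above 9.488 = {share_above:.3f}")
```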


    • The Wald test constructs a ‘quadratic form’ suggested by the test you want to perform

    • This quadratic form contains squares of the estimated parameters (weighted by the inverse of their covariance matrix), so, if the hypothesis is true, it should be distributed as a Chi-square with J degrees of freedom

    • If the test statistic is ‘large’ relative to the degrees of freedom of the test, we reject, because there is a low probability we would have drawn that value at random from the distribution


    Reading critical values from a table

    • Statistics books report the ‘percentiles’ of a chi-square

      • Vertical axis (degrees of freedom)

      • Horizontal axis (percentiles)

      • Each entry is the value below which that ‘percentile’ of the distribution falls


    • Example: suppose there are 4 restrictions

    • 95% of a chi-square distribution with 4 degrees of freedom falls below 9.488

    • So there is only a 5% chance that a number drawn at random will exceed 9.488

    • If your test statistic is below 9.488, you cannot reject the null

    • If your test statistic is above 9.488, reject the null
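Instead of reading a table, the critical value can be computed. The Python sketch below assumes 4 degrees of freedom, where the chi-square CDF has the closed form 1 - e^(-x/2)(1 + x/2). It finds the 95th percentile by bisection. The two test statistics are hypothetical.

```python
import math


def chi2_cdf_df4(x: float) -> float:
    """CDF of a chi-square with 4 degrees of freedom (closed form for even df)."""
    return 1.0 - math.exp(-x / 2.0) * (1.0 + x / 2.0)


def critical_value(cdf, level: float, lo: float = 0.0, hi: float = 100.0) -> float:
    """Bisection: the x at which cdf(x) = level (cdf must be increasing)."""
    for _ in range(100):
        mid = (lo + hi) / 2.0
        if cdf(mid) < level:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


c95 = critical_value(chi2_cdf_df4, 0.95)
print(f"95th percentile, 4 df: {c95:.3f}")

for stat in (5.0, 12.0):   # hypothetical test statistics
    print(stat, "reject" if stat > c95 else "cannot reject")
```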


    Chi-square


    Wald test in STATA

    • Default test in MLE models

    • Easy to do. Look at the program

      • test hsgrad somecol college

    • Does not estimate the ‘restricted’ model

    • ‘Lower power’ than other tests, i.e., a higher chance of a false negative


    • . test hsgrad somecol college;

    • ( 1) hsgrad = 0

    • ( 2) somecol = 0

    • ( 3) college = 0

    • chi2( 3) = 504.78

    • Prob > chi2 = 0.0000

    • There is a very low chance that a variable drawn at random from a chi-square with three degrees of freedom would be this large


    -2 Log likelihood test

    • * how to run the same tests with a -2 log like test;

    • * estimate the unrestricted model and save the estimates ;

    • * in urmodel;

    • probit smoker age incomel male black hispanic

    • hsgrad somecol college worka;

    • estimates store urmodel;

    • * estimate the restricted model. save results in rmodel;

    • probit smoker age incomel male black hispanic

    • worka;

    • estimates store rmodel;

    • lrtest urmodel rmodel;


    • I prefer the -2 log likelihood test

      • Estimates both the restricted and the unrestricted model

      • Therefore, has more power than a Wald test

    • In most cases, they give the same ‘decision’ (reject/not reject)
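The -2 log likelihood (LR) statistic is -2(LLr - LLu), compared with a chi-square with J degrees of freedom. The Python sketch below takes J = 3, as when dropping hsgrad, somecol and college. It uses the closed-form upper tail of a chi-square with 3 degrees of freedom. The two log likelihoods are hypothetical, not from the smoking model.

```python
import math
from statistics import NormalDist


def chi2_sf_df3(x: float) -> float:
    """Upper-tail probability of a chi-square with 3 degrees of freedom."""
    root = math.sqrt(x)
    return (2.0 * (1.0 - NormalDist().cdf(root))
            + math.sqrt(2.0 * x / math.pi) * math.exp(-x / 2.0))


def lr_test(ll_restricted: float, ll_unrestricted: float):
    """-2 log likelihood (LR) statistic and its p-value for 3 restrictions."""
    stat = -2.0 * (ll_restricted - ll_unrestricted)
    return stat, chi2_sf_df3(stat)


# Hypothetical log likelihoods for the restricted and unrestricted models.
stat, p = lr_test(-9000.0, -8995.0)
print(f"LR chi2(3) = {stat:.2f}, p = {p:.4f}")
```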


    Section VI

    Categorical Data


    Ordered Probit

    • Many discrete outcomes are answers to questions that have a natural ordering but no quantitative interpretation:

    • Examples:

      • Self reported health status

        • (excellent, very good, good, fair, poor)

      • Do you agree with the following statement

        • Strongly agree, agree, disagree, strongly disagree



    Self reported health status

    • Excellent, very good, good, fair, poor

    • Coded as 1, 2, 3, 4, 5 on the National Health Interview Survey

    • We will code as 5, 4, 3, 2, 1 (easier to think of this way)

    • Asked on every major health survey

    • Important predictor of health outcomes, e.g. mortality

    • Key question: what predicts health status?



    Model

    • yi* = latent index of reported health

    • The latent index measures your own scale of health. As yi* rises past successive threshold values, you report poor, then fair, then good, then very good, then excellent health


    • yi = (1,2,3,4,5) for (poor, fair, good, very good, excellent)

    • Interval decision rule

      • yi=1 if yi* ≤ u1

      • yi=2 if u1 < yi* ≤ u2

      • yi=3 if u2 < yi* ≤ u3

      • yi=4 if u3 < yi* ≤ u4

      • yi=5 if yi* > u4



    • The threshold values (u1, u2, u3, u4) are unknown. We do not know the value of the index necessary to push you from very good to excellent.

    • In theory, the threshold values are different for everyone

    • The software estimates not only the β’s but also the thresholds, which are averages across people



    Probabilities

    • Let’s do the extreme categories, Pr(yi=1) and Pr(yi=5), first (Φ is the assumed standard normal distribution of εi)

    • Pr(yi=1)

    • = Pr(yi* ≤ u1)

    • = Pr(xi β +εi ≤ u1 )

    • =Pr(εi ≤ u1 - xi β)

    • = Φ[u1 - xi β] = 1- Φ[xi β – u1]


    • Pr(yi=5)

    • = Pr(yi* > u4)

    • = Pr(xi β +εi > u4 )

    • =Pr(εi > u4 - xi β)

    • = 1 - Φ[u4 - xi β] = Φ[xi β – u4]


    Sample one for y=3

    • Pr(yi=3) = Pr(u2 < yi* ≤ u3)

      = Pr(yi* ≤ u3) – Pr(yi* ≤ u2)

      = Pr(xi β +εi≤ u3) – Pr(xi β +εi≤ u2)

      = Pr(εi≤ u3- xi β) - Pr(εi≤ u2 - xi β)

      = Φ[u3- xi β] - Φ[u2 - xi β]

      = 1 - Φ[xi β - u3] – 1 + Φ[xi β - u2]

      = Φ[xi β - u2] - Φ[xi β - u3]


    Summary

    • Pr(yi=1) = 1- Φ[xi β – u1]

    • Pr(yi=2) = Φ[xi β – u1] - Φ[xi β – u2]

    • Pr(yi=3) = Φ[xi β – u2] - Φ[xi β – u3]

    • Pr(yi=4) = Φ[xi β – u3] - Φ[xi β – u4]

    • Pr(yi=5) = Φ[xi β – u4]
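A Python sketch of the summary formulas, using the cut points from the oprobit output in these notes. The index x̄β at the sample means is backed out from the reported Pr(sr_health==5) = .34103717, using Pr(y=5) = Φ[xβ - u4]. This assumes Stata's _cut1 through _cut4 correspond to u1 through u4 here.

```python
from statistics import NormalDist

Phi = NormalDist().cdf

# Cut points (_cut1.._cut4) from the oprobit output.
cuts = [0.4858634, 1.269036, 2.247251, 3.094606]


def ordered_probit_probs(index: float, cuts):
    """Pr(y=1..5) from the summary formulas, given x_i b and cut points u1..u4."""
    above = [Phi(index - u) for u in cuts]          # Phi[x b - u_k]
    probs = [1.0 - above[0]]                        # Pr(y=1)
    probs += [above[k - 1] - above[k] for k in range(1, len(cuts))]  # Pr(y=2..4)
    probs.append(above[-1])                         # Pr(y=5)
    return probs


# x_bar b implied by Pr(sr_health==5) = .34103717 in the mfx output:
# Pr(y=5) = Phi(x b - u4)  =>  x b = u4 + Phi^{-1}(.34103717)
index_at_means = cuts[-1] + NormalDist().inv_cdf(0.34103717)

probs = ordered_probit_probs(index_at_means, cuts)
for k, p in enumerate(probs, start=1):
    print(f"Pr(y={k}) = {p:.4f}")
```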


    Likelihood function

    • There are 5 possible choices for each person

    • Only 1 is observed

    • ln L = Σi ln[Pr(yi=ki)], where ki is the category observed for person i


    Programming example

    • Cancer control supplement to 1994 National Health Interview Survey

    • Question: what observed characteristics predict self reported health (1-5 scale)

    • 1=poor, 5=excellent

    • Key covariates: income, education, age, current and former smoking status

    • Programs

      • sr_health_status.do, .dta, .log


    • desc;

    • male byte %9.0g =1 if male

    • age byte %9.0g age in years

    • educ byte %9.0g years of education

    • smoke byte %9.0g current smoker

    • smoke5 byte %9.0g smoked in past 5 years

    • black float %9.0g =1 if respondent is black

    • othrace float %9.0g =1 if other race (white is ref)

    • sr_health float %9.0g 1-5 self reported health,

    • 5=excel, 1=poor

    • famincl float %9.0g log family income


    • tab sr_health;

    • 1-5 self |

    • reported |

    • health, |

    • 5=excel, |

    • 1=poor | Freq. Percent Cum.

    • ------------+-----------------------------------

    • 1 | 342 2.65 2.65

    • 2 | 991 7.68 10.33

    • 3 | 3,068 23.78 34.12

    • 4 | 3,855 29.88 64.00

    • 5 | 4,644 36.00 100.00

    • ------------+-----------------------------------

    • Total | 12,900 100.00


    In STATA

    • oprobit sr_health male age educ famincl black othrace smoke smoke5;


    • Ordered probit estimates Number of obs = 12900

    • LR chi2(8) = 2379.61

    • Prob > chi2 = 0.0000

    • Log likelihood = -16401.987 Pseudo R2 = 0.0676

    • ------------------------------------------------------------------------------

    • sr_health | Coef. Std. Err. z P>|z| [95% Conf. Interval]

    • -------------+----------------------------------------------------------------

    • male | .1281241 .0195747 6.55 0.000 .0897583 .1664899

    • age | -.0202308 .0008499 -23.80 0.000 -.0218966 -.018565

    • educ | .0827086 .0038547 21.46 0.000 .0751535 .0902637

    • famincl | .2398957 .0112206 21.38 0.000 .2179037 .2618878

    • black | -.221508 .029528 -7.50 0.000 -.2793818 -.1636341

    • othrace | -.2425083 .0480047 -5.05 0.000 -.3365958 -.1484208

    • smoke | -.2086096 .0219779 -9.49 0.000 -.2516855 -.1655337

    • smoke5 | -.1529619 .0357995 -4.27 0.000 -.2231277 -.0827961

    • -------------+----------------------------------------------------------------

    • _cut1 | .4858634 .113179 (Ancillary parameters)

    • _cut2 | 1.269036 .11282

    • _cut3 | 2.247251 .1138171

    • _cut4 | 3.094606 .1145781

    • ------------------------------------------------------------------------------


    Interpret coefficients

    • Marginal effects/changes in probabilities are now a function of 2 things

      • Point of expansion (x’s)

      • Frame of reference for outcome (y)

    • STATA

      • Picks mean values for x’s

      • You pick the value of y


    Continuous x’s

    • Consider y=5

    • d Pr(yi=5)/dxi

      = d Φ[xi β – u4]/dxi = βφ[xi β – u4]

    • Consider y=3

    • d Pr(yi=3)/dxi = βφ[xi β – u2] - βφ[xi β – u3]
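A Python sketch of these derivatives for all five outcomes. It again backs out x̄β from the reported Pr(sr_health==5) = .34103717 and uses the oprobit cut points. For y = 5 the result should match the mfx dy/dx for age (-.0074214). The five effects sum to zero because the probabilities always sum to one.

```python
from statistics import NormalDist

std_normal = NormalDist()
cuts = [0.4858634, 1.269036, 2.247251, 3.094606]   # _cut1.._cut4 from oprobit
b_age = -0.0202308                                  # oprobit coefficient on age

# x_bar b implied by Pr(sr_health==5) = .34103717 at the sample means.
index = cuts[3] + std_normal.inv_cdf(0.34103717)


def dpr_dx(beta: float, xb: float, k: int) -> float:
    """dPr(y=k)/dx for k = 1..5, from differentiating the summary formulas."""
    def dens(u: float) -> float:
        return std_normal.pdf(xb - u)   # phi[x b - u]

    if k == 1:
        return -beta * dens(cuts[0])
    if k == 5:
        return beta * dens(cuts[3])
    return beta * (dens(cuts[k - 2]) - dens(cuts[k - 1]))


effects = [dpr_dx(b_age, index, k) for k in range(1, 6)]
for k, e in enumerate(effects, start=1):
    print(f"dPr(y={k})/d age = {e:.7f}")
```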


    Discrete X’s

    • xiβ = β0 + x1i β1 + x2i β2 …. xki βk

      • x2i is yes or no (1 or 0)

    • ΔPr(yi=5) =

    • Φ[β0 + x1i β1 + β2 + x3i β3 + … + xki βk – u4]

      - Φ[β0 + x1i β1 + x3i β3 + … + xki βk – u4]

    • Change in the probabilities when x2i=1 and x2i=0


    Ask for marginal effects

    • mfx compute, predict(outcome(5));


    • mfx compute, predict(outcome(5));

    • Marginal effects after oprobit

    • y = Pr(sr_health==5) (predict, outcome(5))

    • = .34103717

    • ------------------------------------------------------------------------------

    • variable | dy/dx Std. Err. z P>|z| [ 95% C.I. ] X

    • ---------+--------------------------------------------------------------------

    • male*| .0471251 .00722 6.53 0.000 .03298 .06127 .438062

    • age | -.0074214 .00031 -23.77 0.000 -.008033 -.00681 39.8412

    • educ | .0303405 .00142 21.42 0.000 .027565 .033116 13.2402

    • famincl | .0880025 .00412 21.37 0.000 .07993 .096075 10.2131

    • black*| -.0781411 .00996 -7.84 0.000 -.097665 -.058617 .124264

    • othrace*| -.0843227 .01567 -5.38 0.000 -.115043 -.053602 .04124

    • smoke*| -.0749785 .00773 -9.71 0.000 -.09012 -.059837 .289147

    • smoke5*| -.0545062 .01235 -4.41 0.000 -.078719 -.030294 .081395

    • ------------------------------------------------------------------------------

    • (*) dy/dx is for discrete change of dummy variable from 0 to 1


    Interpret the results

    • Males are 4.7 percentage points more likely to report excellent health

    • Each year of age decreases the chance of reporting excellent health by 0.7 percentage points

    • Current smokers are 7.5 percentage points less likely to report excellent health


    Minor notes about estimation

    • Wald tests/-2 log likelihood tests are done exactly the same way as in PROBIT and LOGIT

    • Tests of individual parameters are done the same way (z-score)




    • age

    • Avg|Chg| 1 2 3 4

    • Min->Max .13358317 .0184785 .06797072 .17686112 .07064757

    • -+1/2 .00321942 .00032518 .00141642 .00424452 .00206241

    • -+sd/2 .03728014 .00382077 .01648743 .04910323 .0237889

    • MargEfct .00321947 .00032515 .00141639 .00424462 .00206252


    Section VII

    Count Data Models


    Introduction

    • Many outcomes of interest are integer counts

      • Doctor visits

      • Lost work days

      • Cigarettes smoked per day

      • Missed school days

    • OLS models can easily handle some integer outcomes


    • Example

      • SAT scores are essentially integer values

      • Few at ‘tails’

      • Distribution is fairly continuous

      • OLS models well

    • In contrast, suppose

      • High fraction of zeros

      • Small positive values


    • OLS models will

      • Predict negative values

      • Do a poor job of predicting the mass of observations at zero

    • Example

      • Dr visits in past year, Medicare patients (65+)

      • 1987 National Medical Expenditure Survey

      • Top code (for now) at 10

      • 17% have no visits


    • visits | Freq. Percent Cum.

    • ------------+-----------------------------------

    • 0 | 915 17.18 17.18

    • 1 | 601 11.28 28.46

    • 2 | 533 10.01 38.46

    • 3 | 503 9.44 47.91

    • 4 | 450 8.45 56.35

    • 5 | 391 7.34 63.69

    • 6 | 319 5.99 69.68

    • 7 | 258 4.84 74.53

    • 8 | 216 4.05 78.58

    • 9 | 192 3.60 82.19

    • 10 | 949 17.81 100.00

    • ------------+-----------------------------------

    • Total | 5,327 100.00


    Poisson Model

    • yi is drawn from a Poisson distribution

    • Poisson parameter varies across observations

    • f(yi; λi) = exp(-λi) λi^yi / yi!, for λi > 0

    • E[yi]= Var[yi] = λi = f(xi, β)


    • λi must be positive at all times

    • Therefore, we CANNOT let λi = xiβ

    • Let λi = exp(xiβ)

    • ln(λi) = (xiβ)


    • d ln(λi)/dxi = β

    • Remember that d ln(λi) = dλi/λi

    • Interpret β as the proportional change in the mean outcome for a one-unit change in x (×100 for a percentage change)
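A Python sketch of the Poisson pieces. It shows the pmf, checks that the mean equals the variance, and shows that a one-unit change in x multiplies λ by exp(β), which is about 1 + β when β is small. The coefficients b0 and b1 are hypothetical.

```python
import math


def poisson_pmf(y: int, lam: float) -> float:
    """f(y; lambda) = exp(-lambda) * lambda**y / y!"""
    return math.exp(-lam) * lam ** y / math.factorial(y)


# Hypothetical index: lambda_i = exp(b0 + b1 * x_i).
b0, b1 = 1.5, 0.10
lam0 = math.exp(b0 + b1 * 0.0)   # x = 0
lam1 = math.exp(b0 + b1 * 1.0)   # x = 1

# Mean and variance, summing the pmf over counts 0..99 (the tail is negligible).
ys = range(0, 100)
mean = sum(y * poisson_pmf(y, lam0) for y in ys)
var = sum((y - mean) ** 2 * poisson_pmf(y, lam0) for y in ys)

print(f"lambda = {lam0:.4f}, mean = {mean:.4f}, variance = {var:.4f}")
# A one-unit increase in x multiplies the mean by exp(b1), roughly 1 + b1.
print(f"ratio of means = {lam1 / lam0:.4f}, exp(b1) = {math.exp(b1):.4f}")
```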


    Problems with Poisson

    • Variance grows with the mean

      • E[yi] = Var[yi] = λi = f(xi, β)

    • Most data sets have overdispersion, where the variance grows faster than the mean

    • In the dr. visits sample, ȳ = 5.6 and s = 6.7, so the variance (about 45) is roughly 8 times the mean

    • Imposing Mean = Var is a severe restriction and tends to understate standard errors


    Negative Binomial Model

    • Where γi = exp(xiβ) and δ ≥ 0

    • E[yi] = δγi = δexp(xiβ)

    • Var[yi] = δ (1+δ) γi

    • Var[yi]/ E[yi] = (1+δ)


    • δ must always be ≥ 0

    • In this case, the variance grows faster than the mean

    • If δ=0, the model collapses into the Poisson

    • Always estimate negative binomial

    • If you cannot reject the null that δ=0, report the Poisson estimates


    • Notice that ln(E[yi]) = ln(δ) + ln(γi), so

    • d ln(E[yi]) /dxi = β

    • Parameters have the same interpretation as in the Poisson model


    In STATA

    • poisson estimates a Poisson model by MLE

      • Syntax

        poisson y independent variables

    • nbreg estimates a negative binomial model by MLE

      • Syntax

        nbreg y independent variables


    Interpret results for Poisson

    • Those with CHRONIC condition have 50% more mean MD visits

    • Those in EXCELent health have 78% fewer MD visits

    • BLACKS have 33% fewer visits than whites

    • Income elasticity is 0.021, 10% increase in income generates a 2.1% increase in visits


    Negative Binomial

    • Interpret results the same way as for the Poisson

    • Look at the coefficient/standard error on delta

    • Ho: delta = 0 (the Poisson model is correct)

    • In this case, delta = 5.21 with a standard error of 0.15 (z ≈ 34.7), so we easily reject the null

    • Var/Mean = 1 + delta = 6.21, so the Poisson is misspecified, and its standard errors should be much too small
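The test of δ = 0 and the claim about overly small Poisson standard errors can be quantified with a short Python sketch. The √(1+δ) factor is the standard quasi-likelihood result when Var = (1+δ)·E. Treat it as a rough guide to how far the Poisson standard errors understate the truth, not as the presentation's own calculation. The function names are hypothetical.

```python
import math


def wald_z(estimate: float, std_err: float) -> float:
    """z statistic for H0: parameter = 0."""
    return estimate / std_err


def se_inflation_factor(delta: float) -> float:
    """If Var = (1 + delta) * E, quasi-likelihood standard errors are sqrt(1 + delta)
    times the Poisson ones, so this is roughly how much the Poisson SEs are too small."""
    return math.sqrt(1.0 + delta)


print(round(wald_z(5.21, 0.15), 1))         # 34.7, far beyond the 1.96 critical value
print(round(se_inflation_factor(5.21), 2))  # 2.49
```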


    Selected Results, Count Models: Parameter (Standard Error)


    Section VIII

    Duration Data


    Introduction

    • Sometimes we have data on the length of time until a particular event occurs, or ‘spells’

      • Time until death

      • Time spent unemployed

      • Time to complete a PhD

    • The techniques we will discuss were originally used to examine the lifespan of objects such as light bulbs or machines, so these models are often referred to as “time to failure” models


    Notation

    • T is a random variable that indicates duration (time until death, until finding a new job, etc.)

    • t is a realization of that variable

    • f(t) is the PDF that describes the process determining the time to failure

    • The CDF, F(t) = Pr(T ≤ t), is the probability that the event has happened by time t


    • Hazard function: $h(t) = \lambda(t) = f(t)/[1 - F(t)]$

    • What is the probability the spell will end at time t, given that it has lasted until t?

    • For example, what is the chance you find a new job in month 13, given that you have already been unemployed for 12 months?
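In discrete time, the hazard in the unemployment example can be computed directly. It is the number of spells ending in month t divided by the number still in progress at the start of month t. Below is a minimal Python sketch using hypothetical, fully observed (uncensored) spell lengths. Censoring is handled separately in the likelihood discussed later.

```python
from collections import Counter


def discrete_hazard(spell_lengths):
    """Month-by-month hazard from complete (uncensored) spells:
    h(t) = (# spells ending in month t) / (# spells still in progress at the start of month t)."""
    ends = Counter(spell_lengths)
    at_risk = len(spell_lengths)
    hazard = {}
    for t in range(1, max(spell_lengths) + 1):
        hazard[t] = ends[t] / at_risk
        at_risk -= ends[t]
    return hazard


# Hypothetical unemployment spells, in months
spells = [1, 1, 2, 3, 3, 3, 5]
print(discrete_hazard(spells))
```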




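    • Duration dependence: the hazard may change with the time already spent in the spell. For example, a long spell of unemployment may lower the chance that people exit unemployment (‘damaged goods’). Mathematically:

      • If $d\lambda(t)/dt = 0$, there is no duration dependence

      • If $d\lambda(t)/dt > 0$, there is positive duration dependence: the probability the spell will end increases with time

      • If $d\lambda(t)/dt < 0$, there is negative duration dependence: the probability the spell will end decreases over time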



    Different Functional Forms

    • Exponential

      • $\lambda(t) = \lambda$

      • The hazard is the same over time, a ‘memoryless’ process

    • Weibull

      • $F(t) = 1 - \exp(-\gamma t^{\rho})$, where $\rho, \gamma > 0$

      • $\lambda(t) = \rho\gamma t^{\rho-1}$

      • If $\rho > 1$, increasing hazard

      • If $\rho < 1$, decreasing hazard

      • If $\rho = 1$, exponential
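The Weibull hazard follows from $\lambda(t) = f(t)/[1 - F(t)]$ with $f(t) = dF/dt$. The Python sketch below computes it both ways and shows how ρ controls the direction of duration dependence. The value γ = 0.2 is hypothetical.

```python
import math


def weibull_cdf(t: float, gamma: float, rho: float) -> float:
    return 1.0 - math.exp(-gamma * t ** rho)


def weibull_pdf(t: float, gamma: float, rho: float) -> float:
    # f(t) = dF/dt = rho * gamma * t**(rho - 1) * exp(-gamma * t**rho)
    return rho * gamma * t ** (rho - 1) * math.exp(-gamma * t ** rho)


def weibull_hazard(t: float, gamma: float, rho: float) -> float:
    # lambda(t) = f(t) / [1 - F(t)] = rho * gamma * t**(rho - 1)
    return rho * gamma * t ** (rho - 1)


for rho in (0.7, 1.0, 1.5):  # decreasing, constant (exponential), increasing hazard
    print(rho, [round(weibull_hazard(t, 0.2, rho), 4) for t in (1, 2, 4)])
```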



    A note about most data sets

    • Most data sets have ‘censored’ spells

      • We follow people over time

      • All will eventually die, but some do not die during the period of analysis

      • These are incomplete spells, or censored data

    • Censoring must be built into the log likelihood function


    • Let $t_i$ be the duration we observe for each person

    • Some people die, and we observe that they lived until period $t_i$

    • Others are observed for $t_i$ periods but do not die in that time

    • Let $d_i = 1$ if the spell is complete (the person died)

    • $d_i = 0$ if the spell is incomplete (censored)



    • If $d_i = 1$, we observe $f(t_i)$: someone who died in period $t_i$

    • If $d_i = 0$, someone lived past period $t_i$, and the probability of that is $1 - F(t_i)$

    • $L = \sum_i \{ d_i \ln[f(t_i)] + (1 - d_i)\ln[1 - F(t_i)] \}$
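For the exponential model, $\ln f(t) = \ln\lambda - \lambda t$ and $\ln[1 - F(t)] = -\lambda t$. Maximizing the censored log likelihood therefore gives $\hat\lambda = \sum_i d_i / \sum_i t_i$ (failures divided by total time at risk). Below is a minimal Python sketch with hypothetical follow-up data. It also shows that dropping the censored spells overstates the hazard.

```python
import math


def exp_censored_loglik(lam, t, d):
    """L = sum_i { d_i ln f(t_i) + (1 - d_i) ln[1 - F(t_i)] } for the exponential model,
    with ln f(t) = ln(lam) - lam * t and ln[1 - F(t)] = -lam * t."""
    return sum(di * (math.log(lam) - lam * ti) + (1 - di) * (-lam * ti)
               for ti, di in zip(t, d))


def exp_censored_mle(t, d):
    """dL/dlam = sum(d) / lam - sum(t) = 0  =>  lam_hat = failures / total time at risk."""
    return sum(d) / sum(t)


# Hypothetical follow-up data: months observed and a death indicator (0 = censored at 60)
t = [2, 5, 60, 60, 14, 60, 33]
d = [1, 1, 0, 0, 1, 0, 1]

lam_hat = exp_censored_mle(t, d)
naive = sum(d) / sum(ti for ti, di in zip(t, d) if di == 1)  # drops the censored spells
print(round(lam_hat, 4), round(naive, 4))  # 0.0171 vs 0.0741: ignoring censoring overstates the hazard
```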


    Introducing covariates

    • Start with the exponential

    • $\lambda(t) = \lambda$

    • Allow this to vary across people

    • $\lambda_i(t) = \lambda_i$

    • But, as in the Poisson, $\lambda_i$ must always be positive

    • Let $\lambda_i = \exp(\beta_0 + x_{1i}\beta_1 + x_{2i}\beta_2 + \dots + x_{ki}\beta_k)$


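    • In the Weibull (writing $\alpha$ for the shape parameter $\rho$)

    • $\lambda(t) = \alpha\gamma t^{\alpha-1}$

    • Allow it to vary across people

    • $\lambda_i(t) = \alpha\gamma_i t^{\alpha-1}$

    • $\gamma_i = \exp(\beta_0 + x_{1i}\beta_1 + x_{2i}\beta_2 + \dots + x_{ki}\beta_k)$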


    Interpreting Coefficients

    • This is the same for both the Weibull and the Exponential

    • In the Weibull, $\lambda_i(t) = \alpha\gamma_i t^{\alpha-1}$

    • Suppose $x_{1i}$ is a dummy variable

    • When $x_{1i} = 1$,

    • $\gamma_{i1} = \exp(\beta_0 + \beta_1 + x_{2i}\beta_2 + \dots + x_{ki}\beta_k)$

    • When $x_{1i} = 0$,

    • $\gamma_{i0} = \exp(\beta_0 + x_{2i}\beta_2 + \dots + x_{ki}\beta_k)$


    • When you construct the ratio $\gamma_{i1}/\gamma_{i0}$, all the other terms cancel, so

    • $(\alpha\gamma_{i1}t^{\alpha-1} - \alpha\gamma_{i0}t^{\alpha-1}) / \alpha\gamma_{i0}t^{\alpha-1} = e^{\beta_1} - 1$

    • This is the percentage change in the hazard when $x_{1i}$ turns from 0 to 1

    • STATA prints out $e^{\beta_1}$ (the hazard ratio); just subtract 1
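The cancellation argument can be checked numerically. The Python sketch below implements $\lambda_i(t) = \alpha\gamma_i t^{\alpha-1}$ and confirms that switching a dummy from 0 to 1 changes the hazard by the same $e^{\beta_1} - 1$ at every t. The coefficient values and the value 59 for the second covariate are hypothetical and chosen only for illustration.

```python
import math


def weibull_ph_hazard(t, x, beta, alpha):
    """lambda_i(t) = alpha * gamma_i * t**(alpha - 1), gamma_i = exp(beta_0 + sum_k x_k * beta_k)."""
    gamma_i = math.exp(beta[0] + sum(xk * bk for xk, bk in zip(x, beta[1:])))
    return alpha * gamma_i * t ** (alpha - 1)


beta = [-9.2, 0.48, 0.045]  # hypothetical: constant, dummy x1, continuous x2
alpha = 1.17                # hypothetical shape parameter
for t in (1, 12, 60):
    h1 = weibull_ph_hazard(t, [1, 59], beta, alpha)
    h0 = weibull_ph_hazard(t, [0, 59], beta, alpha)
    print(t, round((h1 - h0) / h0, 4))  # identical at every t
print(round(math.exp(beta[1]) - 1, 4))  # e^{beta_1} - 1
```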


    Suppose $x_{2i}$ is continuous

    • Suppose we increase $x_{2i}$ by 1 unit

    • $\gamma_{i1} = \exp(\beta_0 + x_{1i}\beta_1 + x_{2i}\beta_2 + \dots + x_{ki}\beta_k)$

    • $\gamma_{i2} = \exp(\beta_0 + x_{1i}\beta_1 + (x_{2i}+1)\beta_2 + \dots + x_{ki}\beta_k)$

    • Can show that

    • $(\alpha\gamma_{i2}t^{\alpha-1} - \alpha\gamma_{i1}t^{\alpha-1}) / \alpha\gamma_{i1}t^{\alpha-1}$

    • $= \exp(\beta_2) - 1$

    • This is the percentage change in the hazard for a 1-unit increase in $x_{2i}$
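For a change of more than one unit, the same algebra gives $\exp(\Delta x \cdot \beta_2) - 1$. The effects compound, so they do not simply add. Below is a minimal Python sketch with a hypothetical coefficient.

```python
import math


def pct_change_hazard(beta_k: float, dx: float = 1.0) -> float:
    """Proportional change in the hazard when x_k rises by dx units: exp(beta_k * dx) - 1."""
    return math.exp(beta_k * dx) - 1.0


beta_2 = 0.045  # hypothetical coefficient on a continuous covariate such as age
print(round(pct_change_hazard(beta_2), 4))      # one unit: about 4.6%
print(round(pct_change_hazard(beta_2, 10), 4))  # ten units: about 56.8%, not 10 x 4.6%
```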


    NHIS Multiple Cause of Death

    • NHIS

      • Annual survey of 60K households

      • Data on individuals

      • Self-reported health, doctor visits, lost workdays, etc.

    • MCOD

      • Linked NHIS respondents from 1986-1994 to the National Death Index through Dec 31, 1995

      • Identified whether the respondent died and of what cause


    • Our sample

      • Males, 50-70, who were married at the time of the survey

      • 1987-1989 surveys

      • Give everyone 5 years (60 months) of follow-up


    Key Variables

    • max_mths: maximum months observed in the survey follow-up (at most 60)

    • diedin5: respondent died during the 5 years of follow-up

    • Note that if diedin5 = 0, then max_mths = 60, so diedin5 identifies whether a spell is censored (diedin5 = 0) or complete (diedin5 = 1)


        Variable |       Obs        Mean    Std. Dev.       Min        Max
    -------------+--------------------------------------------------------
       age_s_yrs |     26654    59.42586    5.962435         50         70
        max_mths |     26654    56.49077    11.15384          0         60
           black |     26654    .0928566    .2902368          0          1
        hispanic |     26654    .0454716      .20834          0          1
          income |     26654    3.592181    1.327325          1          5
    -------------+--------------------------------------------------------
            educ |     26654    2.766677     .961846          1          4
         diedin5 |     26654    .1226082    .3279931          0          1


    Duration Data in STATA

    • First, tell STATA which variable holds the duration data

      stset length, failure(failvar)

      • length = duration variable

      • failvar = 1 when durations end in failure, 0 for censored values

    • If all data are uncensored, omit failure(failvar)



    Getting Kaplan-Meier Curves

    • Tabular presentation of results

      sts list

    • Graphical presentation

      sts graph

    • Results by subgroup

      sts graph, by(educ)
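sts list and sts graph report the Kaplan-Meier estimate of the survivor function. To show what that estimate is, here is a minimal Python sketch of the estimator. It is not STATA's code, and the data are hypothetical. At each failure time, survival is multiplied by (1 − deaths/number at risk). Censored spells stay in the risk set until their censoring time.

```python
def kaplan_meier(t, d):
    """Kaplan-Meier survivor function.

    t: observed durations; d: 1 if the spell ended in failure, 0 if censored.
    Returns [(time, S(time))] at each distinct failure time, where
    S(time) = product over failure times t_j <= time of (1 - deaths_j / at_risk_j).
    """
    data = sorted(zip(t, d))
    at_risk = len(data)
    surv = 1.0
    curve = []
    i = 0
    while i < len(data):
        time = data[i][0]
        deaths = removed = 0
        # group all observations with this duration; censored ones are still at risk at this time
        while i < len(data) and data[i][0] == time:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            surv *= 1.0 - deaths / at_risk
            curve.append((time, surv))
        at_risk -= removed
    return curve


# Hypothetical data: months observed, 1 = died, 0 = censored
print(kaplan_meier([3, 4, 4, 6, 8, 8], [1, 0, 1, 1, 0, 1]))
```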


    MLE of Duration Model with Covariates

    • Basic syntax

    • streg covariates, d(distribution)

    • streg age_s_yrs black hispanic _Ie* _Ii*, d(weibull);

      • The _Ieduc_* and _Iincome_* dummies are the kind created by xi with i.educ and i.income

    • In this model, STATA prints out exp(β), the hazard ratios

    • If you want the coefficients, add the nohr option (no hazard ratio)


    Weibull coefficients

        No. of subjects =        26631                  Number of obs   =      26631
        No. of failures =         3245
        Time at risk    =      1505705
                                                        LR chi2(10)     =     595.74
        Log likelihood  =   -12425.055                  Prob > chi2     =     0.0000

    ------------------------------------------------------------------------------
              _t |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
       age_s_yrs |   .0452588   .0031592    14.33   0.000     .0390669    .0514508
           black |   .4770152   .0511122     9.33   0.000     .3768371    .5771932
        hispanic |   .1333552    .082156     1.62   0.105    -.0276676     .294378
        _Ieduc_2 |   .0093353   .0591918     0.16   0.875    -.1066786    .1253492
        _Ieduc_3 |   -.072163   .0503131    -1.43   0.151    -.1707748    .0264488
        _Ieduc_4 |  -.1301173   .0657131    -1.98   0.048    -.2589126   -.0013221
      _Iincome_2 |  -.1867752   .0650604    -2.87   0.004    -.3142914   -.0592591
      _Iincome_3 |  -.3268927   .0688635    -4.75   0.000    -.4618627   -.1919227
      _Iincome_4 |  -.5166137   .0769202    -6.72   0.000    -.6673747   -.3658528
      _Iincome_5 |  -.5425447   .0722025    -7.51   0.000     -.684059   -.4010303
           _cons |  -9.201724   .2266475   -40.60   0.000    -9.645945   -8.757503
    -------------+----------------------------------------------------------------
           /ln_p |   .1585315   .0172241     9.20   0.000     .1247729    .1922901
    -------------+----------------------------------------------------------------
               p |   1.171789    .020183                       1.132891    1.212022
             1/p |   .8533961    .014699                       .8250675    .8826974
    ------------------------------------------------------------------------------


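    • The signs of the parameters are informative

      • The hazard increases with age

      • Blacks and Hispanics have higher mortality rates (the Hispanic coefficient is not significant at the 5% level, p = 0.105)

      • The hazard decreases with income and, more weakly, with education

    • The parameter ρ (reported as p) = 1.17

      • Its 95% confidence interval, (1.13, 1.21), excludes 1, so we can reject the null ρ = 1 (exponential)

      • The hazard is increasing over time (positive duration dependence)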


    Hazard ratios

        No. of subjects =        26631                  Number of obs   =      26631
        No. of failures =         3245
        Time at risk    =      1505705
                                                        LR chi2(10)     =     595.74
        Log likelihood  =   -12425.055                  Prob > chi2     =     0.0000

    ------------------------------------------------------------------------------
              _t | Haz. Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
       age_s_yrs |   1.046299   .0033055    14.33   0.000      1.03984    1.052797
           black |   1.611258    .082355     9.33   0.000     1.457667    1.781032
        hispanic |   1.142656    .093876     1.62   0.105     .9727116    1.342291
        _Ieduc_2 |   1.009379    .059747     0.16   0.875     .8988145    1.133544
        _Ieduc_3 |   .9303792   .0468103    -1.43   0.151     .8430114    1.026802
        _Ieduc_4 |   .8779924   .0576956    -1.98   0.048     .7718905    .9986788
      _Iincome_2 |   .8296302   .0539761    -2.87   0.004     .7303062    .9424625
      _Iincome_3 |   .7211611   .0496617    -4.75   0.000     .6301089    .8253706
      _Iincome_4 |   .5965372   .0458858    -6.72   0.000     .5130537    .6936049
      _Iincome_5 |   .5812672    .041969    -7.51   0.000     .5045648    .6696297
    -------------+----------------------------------------------------------------
           /ln_p |   .1585315   .0172241     9.20   0.000     .1247729    .1922901
    -------------+----------------------------------------------------------------
               p |   1.171789    .020183                       1.132891    1.212022
             1/p |   .8533961    .014699                       .8250675    .8826974
    ------------------------------------------------------------------------------


    Interpret coefficients

    • Age: every additional year raises the hazard by 4.6%

    • Blacks have a 61% greater hazard than whites

    • Hispanics have a 14% greater hazard than non-Hispanics (not statistically significant)

    • Educ 2, 3, 4 are dummies for 9-11, 12-15, and 16+ years of school, relative to the omitted group (educ = 1)


    • Income 2-5 are dummies for people with $10-$20K, $20-$30K, $30-$40K, and >$40K, relative to the omitted group with income <$10K

    • Income 2: those with $10-$20K have a 0.83 - 1 = -0.17, or 17%, lower hazard than those with income <$10K

    • Income 5: those with >$40K in income have a 0.58 - 1 = -0.42, or 42%, lower hazard than those with income <$10K
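The two outputs above are linked by exponentiation. Each hazard ratio is exp(coefficient), and p is exp(ln_p), with its confidence interval obtained by exponentiating the bounds for ln_p. The Python sketch below recomputes a few entries from the coefficient table and converts them into the percentage changes used in the interpretations. The dictionary simply copies numbers from the output.

```python
import math

# (coefficient, reported hazard ratio) pairs copied from the Weibull output above
weibull_rows = {
    "age_s_yrs": (0.0452588, 1.046299),
    "black": (0.4770152, 1.611258),
    "_Iincome_2": (-0.1867752, 0.8296302),
    "_Iincome_5": (-0.5425447, 0.5812672),
}

for name, (coef, reported_hr) in weibull_rows.items():
    hr = math.exp(coef)
    print(f"{name:>11}: exp(b) = {hr:.4f} (reported {reported_hr}), "
          f"change in hazard = {100 * (hr - 1):+.1f}%")

# The shape parameter is estimated as ln(p); exponentiate it and its CI bounds
ln_p, lo, hi = 0.1585315, 0.1247729, 0.1922901
print(round(math.exp(ln_p), 3), round(math.exp(lo), 3), round(math.exp(hi), 3))  # 1.172 1.133 1.212
```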


    Topics not covered

    • Time-varying covariates

    • Competing risk models

      • People can die from multiple causes

    • Cox proportional hazards model

      • Heterogeneity in baseline hazard

