discrete and categorical data
Download
Skip this Video
Download Presentation
Discrete and Categorical Data

Loading in 2 Seconds...

play fullscreen
1 / 253

Discrete and Categorical Data - PowerPoint PPT Presentation


  • 97 Views
  • Uploaded on

Discrete and Categorical Data. William N. Evans Department of Economics/MPRC University of Maryland. Part I. Introduction. Introduction. Workhorse statistical model in social sciences is the multivariate regression model Ordinary least squares (OLS)

loader
I am the owner, or an agent authorized to act on behalf of the owner, of the copyrighted work described.
capcha
Download Presentation

PowerPoint Slideshow about ' Discrete and Categorical Data' - dyami


An Image/Link below is provided (as is) to download presentation

Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author.While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server.


- - - - - - - - - - - - - - - - - - - - - - - - - - E N D - - - - - - - - - - - - - - - - - - - - - - - - - -
Presentation Transcript
discrete and categorical data

Discrete and Categorical Data

William N. Evans

Department of Economics/MPRC

University of Maryland

part i

Part I

Introduction

introduction
Introduction
  • Workhorse statistical model in social sciences is the multivariate regression model
  • Ordinary least squares (OLS)
      • yi = β0 + x1iβ1+ x2iβ2+… xkiβk+ εi
      • yi = xi β + εi
linear model y i x i i
Linear model yi =  + xi + i
  •  and  are “population” values – represent the true relationship between x and y
  • Unfortunately – these values are unknown
  • The job of the researcher is to estimate these values
  • Notice that if we differentiate y with respect to x, we obtain
  • dy/dx = 
slide5
 represents how much y will change for a fixed change in x
    • Increase in income for more education
    • Change in crime or bankruptcy when slots are legalized
    • Increase in test score if you study more
put some concreteness on the problem
Put some concretenesson the problem
  • State of Maryland budget problems
    • Drop in revenues
    • Expensive k-12 school spending initiatives
  • Short-term solution – raise tax on cigarettes by 34 cents/pack
  • Problem – a tax hike will reduce consumption of taxable product
  • Question for state – as taxes are raised, how much will cigarette consumption fall?
slide7
Simple model: yi =  + xi + i
  • Suppose y is a state’s per capita consumption of cigarettes
  • x represents taxes on cigarettes
  • Question – how much will y fall if x is increased by 34 cents/pack?
  • Problem – many reasons why people smoke – cost is but one of them –
slide8
Data
    • (Y) State per capita cigarette consumption for the years 1980-1997
    • (X) tax (State + Federal) in real cents per pack
    • “Scatter plot” of the data
    • Negative covariance between variables
      • When x>, more likely that y<
      • When x<, more likely that y>
  • Goal: pick values of  and  that “best fit” the data
    • Define best fit in a moment
notation
Notation
  • True model
      • yi =  + xi + i
      • We observe data points (yi,xi)
      • The parameters  and  are unknown
      • The actual error (i)is unknown
  • Estimated model
      • (a,b) are estimates for the parameters (,)
      • ei is an estimate of i where
      • ei=yi-a-bxi
  • How do you estimate a and b?
objective minimize sum of squared errors
Objective: Minimize sum of squared errors
  • Min iei2 = i(yi – a – bxi)2
  • Minimize sum of squared errors (SSE)
  • Treat (+) and (-) errors equally
    • Over or under predict by “5” is the same magnitude of error
    • “Quadratic form”
    • The optimal value for a and b are those that make the 1st derivative equal zero
    • Functions reach min or max values when derivatives are zero
slide13
The model has a lot of nice features
    • Statistical properties easy to establish
    • Optimal estimates easy to obtain
    • Parameter estimates are easy to interpret
    • Model maximizes prediction
      • If you minimize SSE you maximize R2
  • The model does well as a first order approximation to lots of problems
discrete and qualitative data
Discrete and Qualitative Data
  • The OLS model work well when y is a continuous variable
    • Income, wages, test scores, weight, GDP
  • Does not has as many nice properties when y is not continuous
  • Example: doctor visits
      • Integer values
      • Low counts for most people
      • Mass of observations at zero
downside of forcing non standard outcomes into ols world
Downside of forcing non-standard outcomes into OLS world?
  • Can predict outside the allowable range
    • e.g., negative MD visits
  • Does not describe the data generating process well
    • e.g., mass of observations at zero
  • Violates many properties of OLS
    • e.g. heteroskedasticity
this talk
This talk
  • Look at situations when the data generating process does not lend itself well to OLS models
  • Mathematically describe the data generating process
  • Show how we use different optimization procedure to obtain estimates
  • Describe the statistical properties
slide17
Show how to interpret parameters
  • Illustrate how to estimate the models with popular program STATA
types of data generating processes we will consider
Types of data generating processes we will consider
  • Dichotomous events (yes or no)
    • 1=yes, 0=no
    • Graduate high school? work? Are obese? Smoke?
  • Ordinal data
    • Self reported health (fair, poor, good, excel)
    • Strongly disagree, disagree, agree, strongly agree
slide19
Count data
    • Doctor visits, lost workdays, fatality counts
  • Duration data
    • Time to failure, time to death, time to re-employment
recommended textbooks
Recommended Textbooks
  • Jeffrey Wooldridge, “Econometric analysis of cross sectional and panel data”
    • Lots of insight and mathematical/statistical detail
    • Very good examples
  • William Greene, “Econometric Analysis”
    • more topics
    • Somewhat dated examples
course web page
Course web page
  • www.bsos.umd.edu/econ/evans/jpsm.html
  • Contains
    • These notes
    • All STATA programs and data sets
    • A couple of “Introduction to STATA” handouts
    • Links to some useful web sites
stata resources discrete outcomes
STATA Resources Discrete Outcomes
  • “Regression Models for Categorical Dependent Variables Using STATA”
    • J. Scott Long and Jeremy Freese
  • Available for sale from STATA website for $52 (www.stata.com)
  • Post-estimation subroutines that translate results
    • Do not need to buy the book to use the subroutines
slide23
In STATA command line type
      • net search spost
  • Will give you a list of available programs to download
  • One is

Spostado from http://www.indiana.edu/~jslsoc/stata

  • Click on the link and install the files
part ii

Part II

A brief introduction to STATA

stata
STATA
  • Very fast, convenient, well-documented, cheap and flexible statistical package
  • Excellent for cross-section/panel data projects, not as great for time series but getting better
  • Not as easy to manipulate large data sets from flat files as SAS
  • I usually clean data in SAS, estimate models in STATA
slide26
Key characteristic of STATA
    • All data must be loaded into RAM
    • Computations are very fast
    • But, size of the project is limited by available memory
  • Results can be generated two different ways
    • Command line
    • Write a program, (*.do) then submit from the command line
sample program to get you started
Sample program to get you started
  • cps87_or.do
  • Program gets you to the point where can
      • Load data into memory
      • Construct new variables
      • Get simple statistics
      • Run a basic regression
      • Store the results on a disk
data cps87 do dta
Data (cps87_do.dta)
  • Random sample of data from 1987 Current Population Survey outgoing rotation group
  • Sample selection
    • Males
    • 21-64
    • Working 30+hours/week
  • 19,906 observations
major caveat
Major caveat
  • Hardest thing to learn/do: get data from some other source and get it into STATA data set
  • We skip over that part
  • All the data sets are loaded into a STATA data file that can be called by saying:

use data file name

housekeeping at the top of the program
Housekeeping at the top of the program
  • * this line defines the semicolon as the ;
  • * end of line delimiter;
  • # delimit ;
  • * set memork for 10 meg;
  • set memory 10m;
  • * write results to a log file;
  • * the replace options writes over old;
  • * log files;
  • log using cps87_or.log,replace;
  • * open stata data set;
  • use c:\bill\stata\cps87_or;
  • * list variables and labels in data set;
  • desc;
slide31
------------------------------------------------------------------------------------------------------------------------------------------------------------
  • > -
  • storage display value
  • variable name type format label variable label
  • ------------------------------------------------------------------------------
  • > -
  • age float %9.0g age in years
  • race float %9.0g 1=white, non-hisp, 2=place,
  • n.h, 3=hisp
  • educ float %9.0g years of education
  • unionm float %9.0g 1=union member, 2=otherwise
  • smsa float %9.0g 1=live in 19 largest smsa,
  • 2=other smsa, 3=non smsa
  • region float %9.0g 1=east, 2=midwest, 3=south,
  • 4=west
  • earnwke float %9.0g usual weekly earnings
  • ------------------------------------------------------------------------------
constructing new variables
Constructing new variables
  • Use ‘gen’ command for generate new variables
  • Syntax
    • gen new variable name=math statement
  • Easily construct new variables via
    • Algebraic operations
    • Math/trig functions (ln, exp, etc.)
    • Logical operators (when true, =1, when false, =0)
from program
From program
  • * generate new variables;
  • * lines 1-2 illustrate basic math functoins;
  • * lines 3-4 line illustrate logical operators;
  • * line 5 illustrate the OR statement;
  • * line 6 illustrates the AND statement;
  • * after you construct new variables, compress the data again;
  • gen age2=age*age;
  • gen earnwkl=ln(earnwke);
  • gen union=unionm==1;
  • gen topcode=earnwke==999;
  • gen nonwhite=((race==2)|(race==3));
  • gen big_ne=((region==1)&(smsa==1));
getting basic statistics
Getting basic statistics
  • desc -- describes variables in the data set
  • sum – gets summary statistics
  • tab – produces frequencies (tables) of discrete variables
from program1
From program
  • * get descriptive statistics;
  • sum;
  • * get detailed descriptics for continuous variables;
  • sum earnwke, detail;
  • * get frequencies of discrete variables;
  • tabulate unionm;
  • tabulate race;
  • * get two-way table of frequencies;
  • tabulate region smsa, row column cell;
results from sum
Results from sum
  • Variable | Obs Mean Std. Dev. Min Max
  • -------------+--------------------------------------------------------
  • age | 19906 37.96619 11.15348 21 64
  • race | 19906 1.199136 .525493 1 3
  • educ | 19906 13.16126 2.795234 0 18
  • unionm | 19906 1.769065 .4214418 1 2
  • smsa | 19906 1.908369 .7955814 1 3
  • -------------+--------------------------------------------------------
detailed summary
Detailed summary
  • usual weekly earnings
  • -------------------------------------------------------------
  • Percentiles Smallest
  • 1% 128 60
  • 5% 178 60
  • 10% 210 60 Obs 19906
  • 25% 300 63 Sum of Wgt. 19906
  • 50% 449 Mean 488.264
  • Largest Std. Dev. 236.4713
  • 75% 615 999
  • 90% 865 999 Variance 55918.7
  • 95% 999 999 Skewness .668646
  • 99% 999 999 Kurtosis 2.632356
results for tab
Results for tab
  • 1=union |
  • member, |
  • 2=otherwise | Freq. Percent Cum.
  • ------------+-----------------------------------
  • 1 | 4,597 23.09 23.09
  • 2 | 15,309 76.91 100.00
  • ------------+-----------------------------------
  • Total | 19,906 100.00
2x2 table
2x2 Table
  • 1=east, |
  • 2=midwest, | 1=live in 19 largest smsa,
  • 3=south, | 2=other smsa, 3=non smsa
  • 4=west | 1 2 3 | Total
  • -----------+---------------------------------+----------
  • 1 | 2,806 1,349 842 | 4,997
  • | 56.15 27.00 16.85 | 100.00
  • | 38.46 18.89 15.39 | 25.10
  • | 14.10 6.78 4.23 | 25.10
  • -----------+---------------------------------+----------
  • 2 | 1,501 1,742 1,592 | 4,835
  • | 31.04 36.03 32.93 | 100.00
  • | 20.58 24.40 29.10 | 24.29
  • | 7.54 8.75 8.00 | 24.29
  • -----------+---------------------------------+----------
  • 3 | 1,501 2,542 1,904 | 5,947
  • | 25.24 42.74 32.02 | 100.00
  • | 20.58 35.60 34.80 | 29.88
  • | 7.54 12.77 9.56 | 29.88
  • -----------+---------------------------------+----------
  • 4 | 1,487 1,507 1,133 | 4,127
  • | 36.03 36.52 27.45 | 100.00
  • | 20.38 21.11 20.71 | 20.73
  • | 7.47 7.57 5.69 | 20.73
  • -----------+---------------------------------+----------
  • Total | 7,295 7,140 5,471 | 19,906
  • | 36.65 35.87 27.48 | 100.00
  • | 100.00 100.00 100.00 | 100.00
  • | 36.65 35.87 27.48 | 100.00
running a regression
Running a regression
  • Syntax

reg dependent-variable independent-variables

  • Example from program

*run simple regression;

reg earnwkl age age2 educ nonwhite union;

slide41
Source | SS df MS Number of obs = 19906
  • -------------+------------------------------ F( 5, 19900) = 1775.70
  • Model | 1616.39963 5 323.279927 Prob > F = 0.0000
  • Residual | 3622.93905 19900 .182057239 R-squared = 0.3085
  • -------------+------------------------------ Adj R-squared = 0.3083
  • Total | 5239.33869 19905 .263217216 Root MSE = .42668
  • ------------------------------------------------------------------------------
  • earnwkl | Coef. Std. Err. t P>|t| [95% Conf. Interval]
  • -------------+----------------------------------------------------------------
  • age | .0679808 .0020033 33.93 0.000 .0640542 .0719075
  • age2 | -.0006778 .0000245 -27.69 0.000 -.0007258 -.0006299
  • educ | .069219 .0011256 61.50 0.000 .0670127 .0714252
  • nonwhite | -.1716133 .0089118 -19.26 0.000 -.1890812 -.1541453
  • union | .1301547 .0072923 17.85 0.000 .1158613 .1444481
  • _cons | 3.630805 .0394126 92.12 0.000 3.553553 3.708057
  • ------------------------------------------------------------------------------
analysis of variance
Analysis of variance
  • R2 = .3085
    • Variables explain 31% of the variation in log weekly earnings
  • F(5,19900)
    • Tests the hypothesis that all covariates (except constant) are jointly zero
interpret results
Interpret results
  • Y = β0 + β1Xi + εi
  • dY/dX = β1
  • But in this case Y=ln(W) where W weekly wages
  • dln(W)/dX = (dW/W)/dX = β1
    • Percentage change in wages given a change in x
slide44
For each additional year of education, wages increase by 6.9%
  • Non whites earn 17.2% less than whites
  • Union members earn 13% more than nonunion members
part iii

Part III

Some notes about probability distributions

continuous distributions
Continuous Distributions
  • Random variables with infinite number of possible values
  • Examples -- units of measure (time, weight, distance)
  • Many discrete outcomes can be treated as continuous, e.g., SAT scores
how to describe a continuous random variable
How to describe a continuous random variable
  • The Probability Density Function (PDF)
  • The PDF for a random variable x is defined as f(x), where

f(x) $ 0

If(x)dx = 1

  • Calculus review: The integral of a function gives the “area under the curve”
cumulative distribution function cdf
Cumulative Distribution Function (CDF)
  • Suppose x is a “measure” like distance or time
  • 0 # x # 4
  • We may be interested in the Pr(x#a) ?
slide50
CDF

What if we consider all values?

properties of cdf
Properties of CDF
  • Note that Pr(x # b) + Pr(x>b) =1
  • Pr(x>b) = 1 – Pr(x # b)
  • Many times, it is easier to work with compliments
general notation for continuous distributions
General notation for continuous distributions
  • The PDF is described by lower case such as f(x)
  • The CDF is defined as upper case such as F(a)
standard normal distribution
Standard Normal Distribution
  • Most frequently used continuous distribution
  • Symmetric “bell-shaped” distribution
  • As we will show, the normal has useful properties
  • Many variables we observe in the real world look normally distributed.
  • Can translate normal into ‘standard normal’
examples of variables that look normally distributed
Examples of variables that look normally distributed
  • IQ scores
  • SAT scores
  • Heights of females
  • Log income
  • Average gestation (weeks of pregnancy)
  • As we will show in a few weeks – sample means are normally distributed!!!
standard normal distribution1
Standard Normal Distribution
  • PDF:
  • For -# z #
notation1
Notation
  • (z) is the standard normal PDF evaluated at z
  • [a] = Pr(z  a)
standard normal
Standard Normal
  • Notice that:
    • Normal is symmetric: (a) = (-a)
    • Normal is “unimodal”
    • Median=mean
    • Area under curve=1
    • Almost all area is between (-3,3)
  • Evaluations of the CDF are done with
    • Statistical functions (excel, SAS, etc)
    • Tables
standard normal cdf
Standard Normal CDF
  • Pr(z  -0.98) = [-0.98] = 0.1635
slide65
Pr(0.1  z  1.9)

= Pr(z  1.9) – Pr(z  0.1)

= M(1.9) - M(0.1) = 0.9713 - 0.5398

= 0.4315

important properties of normal distribution
Important Properties of Normal Distribution
  • Pr(z  A) = [A]
  • Pr(z > A) = 1 - [A]
  • Pr(z  - A) = [-A]
  • Pr(z > -A) = 1 - [-A] = [A]
section iv

Section IV

Maximum likelihood estimation

maximum likelihood estimation
Maximum likelihood estimation
  • Observe n independent outcomes, all drawn from the same distribution
  • (y1, y2, y3….yn)
  • yi is drawn from f(yi; θ) where θ is an unknown parameter for the PDF f
  • Recall definition of indepedence. If a and b and independent, Prob(a and b) = Pr(a)Pr(B)
slide72
Because all the draws are independent, the probability these particular n values of Y would be drawn at random is called the ‘likelihood function’ and it equals
  • L = Pr(y1)Pr(y2)…Pr(yn)
  • L = f(y1; θ)f(y2; θ)…..f(y3; θ)
slide73
MLE: pick a value for θ that best represents the chance these n values of y would have been generated randomly
  • To maximize L, maximize a monotonic function of L
  • Recall ln(abcd)=ln(a)+ln(b)+ln(c)+ln(d)
slide74
Max L = ln(L) = ln[f(y1; θ)] +ln[f(y2; θ)] +

….. ln[f(yn; θ) = Σi ln[f(yi; θ)]

  • Pick θ so that L is maximized
  • dL/dθ = 0
slide75

L

θ

θ1

θ2

example poisson
Example: Poisson
  • Suppose y measures ‘counts’ such as doctor visits.
  • yi is drawn from a Poisson distribution
  • f(yi;λ) =e-λλyi/yi! For λ>0
  • E[yi]= Var[yi] = λ
slide77
Given n observations, (y1, y2, y3….yn)
  • Pick value of λ that maximizes L
  • Max L =Σi ln[f(yi; θ)] = Σi ln[e-λλyi/yi!]

= Σi [– λ + yiln(λ) – ln(yi!)]

= -n λ + ln(λ) Σi yi – Σi ln(yi!)

slide78
L = -n λ + ln(λ) Σi yi – Σi ln(yi!)
  • dL/dθ = -n + (1/ λ )Σi yi = 0
  • Solve for λ
  • λ = Σi yi /n =  = sample mean of y
slide79
In most cases however, cannot find a ‘closed form’ solution for the parameter in ln[f(yi; θ)]
  • Must ‘search’ over all possible solutions
  • How does the search work?
  • Start with candidate value of θ.
  • Calculate dL/dθ
slide80
If dL/dθ > 0, increasing θ will increase L so we increase θ some
  • If dL/dθ < 0, decreasing θ will increase L so we decrease θ some
  • Keep changing θ until dL/dθ = 0
  • How far you ‘step’ when you change θ is determined by a number of different factors
slide81

L

dL/dθ > 0

θ1

θ

slide82

L

dL/dθ < 0

θ3

θ

properties of mle estimates
Properties of MLE estimates
  • Sometimes call efficient estimation. Can never generate a smaller variance than one obtained by MLE
  • Parameters estimates are distributed as a normal distribution when samples sizes are large
  • Therefore, if we divide the parameter by its standard error, should be normally distributed with a mean zero and variance 1 if the null (=0) is correct
section 5

Section 5

Dichotomous outcomes

dichotomous data
Dichotomous Data
  • Suppose data is discrete but there are only 2 outcomes
  • Examples
    • Graduate high school or not
    • Patient dies or not
    • Working or not
    • Smoker or not
  • In data, yi=1 if yes, yi =0 if no
how to model the data generating process
How to model the data generating process?
  • There are only two outcomes
  • Research question: What factors impact whether the event occurs?
  • To answer, will model the probability the outcome occurs
      • Pr(Yi=1) when yi=1 or
      • Pr(Yi=0) = 1- Pr(Yi=1) when yi=0
slide87
Think of the problem from a MLE perspective
  • Likelihood for i’th observation
  • Li= Pr(Yi=1)Yi [1 - Pr(Yi=1)](1-Yi)
      • When yi=1, only relevant part is Pr(Yi=1)
      • When yi=0, only relevant part is [1 - Pr(Yi=1)]
slide88
L = Σi ln[Li] =

= Σi {yi ln[Pr(yi=1)] + (1-yi)ln[Pr(yi=0)] }

  • Notice that up to this point, the model is generic. The log likelihood function will determined by the assumptions concerning how we determine Pr(yi=1)
modeling the probability
Modeling the probability
  • There is some process (biological, social, decision theoretic, etc) that determines the outcome y
  • Some of the variables impacting are observed, some are not
  • Requires that we model how these factors impact the probabilities
  • Model from a ‘latent variable’ perspective
slide90
Consider a women’s decision to work
  • yi* = the person’s net benefit to work
  • Two components of yi*
    • Characteristics that we can measure
      • Education, age, income of spouse, prices of child care
    • Some we cannot measure
      • How much you like spending time with your kids
      • how much you like/hate your job
slide91
We aggregate these two components into one equation
  • yi* = β0 + x1iβ1+ x2iβ2+… xkiβk+ εi

= xi β + εi

      • xi β (measurable characteristics but with uncertain weights)
      • εi random unmeasured characteristics
  • Decision rule: person will work if yi* > 0

(if net benefits are positive)

yi=1 if yi*>0

yi=0 if yi*≤0

slide92
yi=1 if yi*>0
      • yi* = xi β + εi > 0 only if
      • εi > - xi β
  • yi=0 if yi*≤0
      • yi* = xi β + εi ≤ 0 only if
      • εi ≤ - xi β
how to interpret
How to interpret ε?
  • When we look at certain people, we have expectations about whether y should equal 1 or 0
  • These expectations do not always hold true
  • The error ε represents deviations from what we expect
  • Go back to the work example, suppose xi β is ‘big.’ We observe a woman with:
    • High wages
    • Low husband’s income
    • Low cost of child care
slide94
We would expect this person to work, UNLESS, there is some unmeasured variable that counteracts this
  • For example:
    • Suppose a mom really likes spending time with her kids, or she hates her job.
    • The unmeasured benefit of working is then a big negative coefficient εi
slide95
If we observe them working, there are a certain range of values that εi must have been in excess of
      • yi=1 if εi > - xi β
  • If we observe someone not working, then Consider the opposite. Suppose we observe someone NOT working.
  • Then εi must not have been big or it was a bigger negative number, since
      • yi=0 if εi ≤ - xi β
the probabilities
The Probabilities
  • The estimation procedure used is determined by the assumed distribution of ε
  • What is the probability we observe someone with y=1?
    • Use definition of the CDF
    • Pr(yi=1) = Pr(yi*>0) = Pr(εi > - xi β)

= 1 – F(-xi β)

slide97
What is the probability we observe someone with y=0?
    • Use definition of the CDF
    • Pr(yi=0) = Pr(yi*≤ 0) = Pr(εi ≤ - xi β) = F(-xi β)
  • Two standard models: ε is either
    • normal or
    • logistic
normal probit model
Normal (probit) Model
  • ε is distributed as a standard normal
    • Mean zero
    • Variance 1
  • Evaluate probability (y=1)
    • Pr(yi=1) = Pr(εi > - xi β) = 1 – Ф(-xi β)
    • Given symmetry: 1 – Ф(-xi β) = Ф(xi β)
  • Evaluate probability (y=0)
    • Pr(yi=0) = Pr(εi ≤ - xi β) = Ф(-xi β)
    • Given symmetry: Ф(-xi β) = 1 - Ф(xi β)
slide99
Summary
    • Pr(yi=1) = Ф(xi β)
    • Pr(yi=0) = 1 -Ф(xi β)
  • Notice that Ф(a) is increasing a. Therefore, is one of the x’s increases the probability of observing y, we would expect the coefficient on that variable to be (+)
slide100
The standard normal assumption (variance=1) is not critical
  • In practice, the variance may be not equal t 1, but given the math of the problem, we cannot identify the variance. It is absorbed into parameter estimates
logit
Logit
  • CDF: F(a) = exp(a)/(1+exp(a))
    • Symmetric, unimodal distribution
    • Looks a lot like the normal
    • Incredibly easy to evaluate the CDF and PDF
    • Mean of zero, variance > 1 (more variance than normal)
  • Evaluate probability (y=1)
    • Pr(yi=1) = Pr(εi > - xi β) = 1 – F(-xi β)
    • Given symmetry: 1 – F(-xi β) = F(xi β)
    • F(xi β) = exp(xi β)/(1+exp(xi β))
slide102
Evaluate probability (y=0)
    • Pr(yi=0) = Pr(εi ≤ - xi β) = F(-xi β)
    • Given symmetry: F(-xi β) = 1 - F(xi β)
    • 1 - F(xi β) = 1 /(1+exp(xi β))
  • When εi is a logistic distribution
    • Pr(yi =1) = exp(xi β)/(1+exp(xi β))
    • Pr(yi=0) = 1/(1+exp(xi β))
example workplace smoking bans
Example: Workplace smoking bans
  • Smoking supplements to 1991 and 1993 National Health Interview Survey
  • Asked all respondents whether they currently smoke
  • Asked workers about workplace tobacco policies
  • Sample: workers
  • Key variables: current smoking and whether they faced by workplace ban
slide104
Data: workplace1.dta
  • Sample program: workplace1.doc
  • Results: workplace1.log
description of variables in data
Description of variables in data
  • . desc;
  • storage display value
  • variable name type format label variable label
  • ------------------------------------------------------------------------
  • > -
  • smoker byte %9.0g is current smoking
  • worka byte %9.0g has workplace smoking bans
  • age byte %9.0g age in years
  • male byte %9.0g male
  • black byte %9.0g black
  • hispanic byte %9.0g hispanic
  • incomel float %9.0g log income
  • hsgrad byte %9.0g is hs graduate
  • somecol byte %9.0g has some college
  • college float %9.0g
  • -----------------------------------------------------------------------
summary statistics
Summary statistics
  • sum;
  • Variable | Obs Mean Std. Dev. Min Max
  • -------------+--------------------------------------------------------
  • smoker | 16258 .25163 .433963 0 1
  • worka | 16258 .6851396 .4644745 0 1
  • age | 16258 38.54742 11.96189 18 87
  • male | 16258 .3947595 .488814 0 1
  • black | 16258 .1119449 .3153083 0 1
  • -------------+--------------------------------------------------------
  • hispanic | 16258 .0607086 .2388023 0 1
  • incomel | 16258 10.42097 .7624525 6.214608 11.22524
  • hsgrad | 16258 .3355271 .4721889 0 1
  • somecol | 16258 .2685447 .4432161 0 1
  • college | 16258 .3293763 .4700012 0 1
running a probit
Running a probit
  • probit smoker age incomel male black hispanic hsgrad somecol college worka;
  • The first variable after ‘probit’ is the discrete outcome, the rest of the variables are the independent variables
  • Includes a constant as a default
running a logit
Running a logit
  • logit smoker age incomel male black hispanic hsgrad somecol college worka;
  • Same as probit, just change the first word
running linear probability
Running linear probability
  • reg smoker age incomel male black hispanic hsgrad somecol college worka, robust;
  • Simple regression.
  • Standard errors are incorrect (heteroskedasticity)
  • robust option produces standard errors with arbitrary form of heteroskedasticity
probit results
Probit Results
  • Probit estimates Number of obs = 16258
  • LR chi2(9) = 819.44
  • Prob > chi2 = 0.0000
  • Log likelihood = -8761.7208 Pseudo R2 = 0.0447
  • ------------------------------------------------------------------------------
  • smoker | Coef. Std. Err. z P>|z| [95% Conf. Interval]
  • -------------+----------------------------------------------------------------
  • age | -.0012684 .0009316 -1.36 0.173 -.0030943 .0005574
  • incomel | -.092812 .0151496 -6.13 0.000 -.1225047 -.0631193
  • male | .0533213 .0229297 2.33 0.020 .0083799 .0982627
  • black | -.1060518 .034918 -3.04 0.002 -.17449 -.0376137
  • hispanic | -.2281468 .0475128 -4.80 0.000 -.3212701 -.1350235
  • hsgrad | -.1748765 .0436392 -4.01 0.000 -.2604078 -.0893453
  • somecol | -.363869 .0451757 -8.05 0.000 -.4524118 -.2753262
  • college | -.7689528 .0466418 -16.49 0.000 -.860369 -.6775366
  • worka | -.2093287 .0231425 -9.05 0.000 -.2546873 -.1639702
  • _cons | .870543 .154056 5.65 0.000 .5685989 1.172487
  • ------------------------------------------------------------------------------
how to measure fit
How to measure fit?
  • Regression (OLS)
    • minimize sum of squared errors
    • Or, maximize R2
    • The model is designed to maximize predictive capacity
  • Not the case with Probit/Logit
    • MLE models pick distribution parameters so as best describe the data generating process
    • May or may not ‘predict’ the outcome well
pseudo r 2
Pseudo R2
  • LLk log likelihood with all variables
  • LL1 log likelihood with only a constant
  • 0 > LLk > LL1 so | LLk | < |LL1|
  • Pseudo R2 = 1 - |LL1/LLk|
  • Bounded between 0-1
  • Not anything like an R2 from a regression
predicting y
Predicting Y
  • Let b be the estimated value of β
  • For any candidate vector of xi , we can predict probabilities, Pi
  • Pi = Ф(xib)
  • Once you have Pi, pick a threshold value, T, so that you predict
      • Yp = 1 if Pi > T
      • Yp = 0 if Pi ≤ T
  • Then compare, fraction correctly predicted
slide114
Question: what value to pick for T?
  • Can pick .5
    • Intuitive. More likely to engage in the activity than to not engage in it
    • However, when the  is small, this criteria does a poor job of predicting Yi=1
    • However, when the  is close to 1, this criteria does a poor job of picking Yi=0
slide115
*predict probability of smoking;
  • predict pred_prob_smoke;
  • * get detailed descriptive data about predicted prob;
  • sum pred_prob, detail;
  • * predict binary outcome with 50% cutoff;
  • gen pred_smoke1=pred_prob_smoke>=.5;
  • label variable pred_smoke1 "predicted smoking, 50% cutoff";
  • * compare actual values;
  • tab smoker pred_smoke1, row col cell;
slide116
. sum pred_prob, detail;
  • Pr(smoker)
  • -------------------------------------------------------------
  • Percentiles Smallest
  • 1% .0959301 .0615221
  • 5% .1155022 .0622963
  • 10% .1237434 .0633929 Obs 16258
  • 25% .1620851 .0733495 Sum of Wgt. 16258
  • 50% .2569962 Mean .2516653
  • Largest Std. Dev. .0960007
  • 75% .3187975 .5619798
  • 90% .3795704 .5655878 Variance .0092161
  • 95% .4039573 .5684112 Skewness .1520254
  • 99% .4672697 .6203823 Kurtosis 2.149247
slide117
Notice two things
    • Sample mean of the predicted probabilities is close to the sample mean outcome
    • 99% of the probabilities are less than .5
    • Should predict few smokers if use a 50% cutoff
slide118
| predicted smoking,
  • is current | 50% cutoff
  • smoking | 0 1 | Total
  • -----------+----------------------+----------
  • 0 | 12,153 14 | 12,167
  • | 99.88 0.12 | 100.00
  • | 74.93 35.90 | 74.84
  • | 74.75 0.09 | 74.84
  • -----------+----------------------+----------
  • 1 | 4,066 25 | 4,091
  • | 99.39 0.61 | 100.00
  • | 25.07 64.10 | 25.16
  • | 25.01 0.15 | 25.16
  • -----------+----------------------+----------
  • Total | 16,219 39 | 16,258
  • | 99.76 0.24 | 100.00
  • | 100.00 100.00 | 100.00
  • | 99.76 0.24 | 100.00
slide119
Check on-diagonal elements.
  • The last number in each 2x2 element is the fraction in the cell
  • The model correctly predicts 74.75 + 0.15 = 74.90% of the obs
  • It only predicts a small fraction of smokers
slide120
Do not be amazed by the 75% percent correct prediction
  • If you said everyone has a  chance of smoking (a case of no covariates), you would be correct Max[(,(1-)] percent of the time
slide121
In this case, 25.16% smoke.
  • If everyone had the same chance of smoking, we would assign everyone Pr(y=1) = .2516
  • We would be correct for the 1 - .2516 = 0.7484 people who do not smoke
key points about prediction
Key points about prediction
  • MLE models are not designed to maximize prediction
  • Should not be surprised they do not predict well
  • In this case, not particularly good measures of predictive capacity
translating coefficients in probit continuous covariates
Translating coefficients in probit:Continuous Covariates
  • Pr(yi=1) = Φ[β0 + x1iβ1+ x2iβ2+… xkiβk]
  • Suppose that x1i is a continuous variable
  • d Pr(yi=1) /d x1i = ?
  • What is the change in the probability of an event give a change in x1i?
marginal effect
Marginal Effect
  • d Pr(yi=1) /d x1i
  • = β1φ[β0 + x1iβ1+ x2iβ2+… xkiβk]
  • Notice two things. Marginal effect is a function of the other parameters and the values of x.
translating coefficients discrete covariates
Translating Coefficients:Discrete Covariates
  • Pr(yi=1) = Φ[β0 + x1iβ1+ x2iβ2+… xkiβk]
  • Suppose that x2i is a dummy variable (1 if yes, 0 if no)
  • Marginal effect makes no sense, cannot change x2i by a little amount. It is either 1 or 0.
  • Redefine the variable of interest. Compare outcomes with and without x2i
slide126
y1 = Pr(yi=1 | x2i=1)

= Φ[β0 + x1iβ1+ β2 + x3iβ3 +… ]

  • y0 = Pr(yi=1 | x2i=0)

= Φ[β0 + x1iβ1+ x3iβ3 … ]

Marginal effect = y1 – y0.

Difference in probabilities with and without x2i?

in stata
In STATA
  • Marginal effects for continuous variables, and Change in probabilities for dichotomous outcomes, STATA picks sample means for X’s
stata command for marginal effects
STATA command for Marginal Effects
  • mfx compute;
  • Must come after the outcome when estimates are still active in program.
slide129
Marginal effects after probit
  • y = Pr(smoker) (predict)
  • = .24093439
  • ------------------------------------------------------------------------------
  • variable | dy/dx Std. Err. z P>|z| [ 95% C.I. ] X
  • ---------+--------------------------------------------------------------------
  • age | -.0003951 .00029 -1.36 0.173 -.000964 .000174 38.5474
  • incomel | -.0289139 .00472 -6.13 0.000 -.03816 -.019668 10.421
  • male*| .0166757 .0072 2.32 0.021 .002568 .030783 .39476
  • black*| -.0320621 .01023 -3.13 0.002 -.052111 -.012013 .111945
  • hispanic*| -.0658551 .01259 -5.23 0.000 -.090536 -.041174 .060709
  • hsgrad*| -.053335 .01302 -4.10 0.000 -.07885 -.02782 .335527
  • somecol*| -.1062358 .01228 -8.65 0.000 -.130308 -.082164 .268545
  • college*| -.2149199 .01146 -18.76 0.000 -.237378 -.192462 .329376
  • worka*| -.0668959 .00756 -8.84 0.000 -.08172 -.052072 .68514
  • ------------------------------------------------------------------------------
  • (*) dy/dx is for discrete change of dummy variable from 0 to 1
interpret results1
Interpret results
  • 10% increase in income will reduce smoking by 2.9 percentage points
  • 10 year increase in age will decrease smoking rates .4 percentage points
  • Those with a college degree are 21.5 percentage points less likely to smoke
  • Those that face a workplace smoking ban have 6.7 percentage point lower probability of smoking
slide131
Do not confuse percentage point and percent differences
    • A 6.7 percentage point drop is 29% of the sample mean of 24 percent.
    • Blacks have smoking rates that are 3.2 percentage points lower than others, which is 13 percent of the sample mean
marginal effects for specific characteristics
Marginal effects for specific characteristics
  • Can generate marginal effects for a specific x
  • prchange, x(age=40 black=0 hispanic=0 hsgrad=0 somecol=0 worka=0);
  • If an x is not specified, STATA will use the sample mean (e.g., log income in this case)
  • Make sure when you specify a particular dummy variable (=1) you set the rest to zero
slide134
probit: Changes in Predicted Probabilities for smoker
  • min->max 0->1 -+1/2 -+sd/2 MargEfct
  • age -0.0323 -0.0005 -0.0005 -0.0056 -0.0005
  • incomel -0.1795 -0.0320 -0.0344 -0.0263 -0.0345
  • male 0.0198 0.0198 0.0198 0.0097 0.0198
  • black -0.0385 -0.0385 -0.0394 -0.0124 -0.0394
  • hispanic -0.0804 -0.0804 -0.0845 -0.0202 -0.0847
  • hsgrad -0.0625 -0.0625 -0.0648 -0.0306 -0.0649
  • somecol -0.1235 -0.1235 -0.1344 -0.0598 -0.1351
  • college -0.2644 -0.2644 -0.2795 -0.1335 -0.2854
  • worka -0.0742 -0.0742 -0.0776 -0.0361 -0.0777
testing significance of individual parameters
Testing significance of individual parameters
  • In large samples, MLE estimates are normally distributed
  • Null hypothesis, βj=0
  • If the null is true and the sample is larges, βj is distributed as a normal with variance σj2.
  • Using notes from before, if we divide βj by the standard deviation, we get standard normal
slide136
βj/se(βj) should be N(0,1)
  • βj/se(βj) = z-score
  • 95% of the distribution of a N(0,1) is between -1.96, 1.96
  • Reject null of the z-score > |1.96|
  • Only age is statistically insignificant (cannot reject null)
when will results differ
When will results differ?
  • Normal and logit CDF look:
    • Similar in the mid point of the distribution
    • Different in the tails
  • You obtain more observations in the tails of the distribution when
    • Samples sizes are large
    •  approaches 1 or 0
  • These situations will produce more differences in estimates
some nice properties of the logit
Some nice properties of the Logit
  • Outcome, y=1 or 0
  • Treatment, x=1 or 0
  • Other covariates, x
  • Context,
    • x = whether a baby is born with a low weight birth
    • x = whether the mom smoked or not during pregnancy
slide139
Risk ratio

RR = Prob(y=1|x=1)/Prob(y=1|x=0)

Differences in the probability of an event when x is and is not observed

How much does smoking elevate the chance your child will be a low weight birth

slide140
Let Yyx be the probability y=1 or 0 given x=1 or 0
  • Think of the risk ratio the following way
      • Y11 is the probability Y=1 when X=1
      • Y10 is the probability Y=1 when X=0
  • Y11 = RR*Y10
slide141
Odds Ratio

OR=A/B = [Y11/Y01]/[Y10/Y00]

A = [Pr(Y=1|X=1)/Pr(Y=0|X=1)]

= odds of Y occurring if you are a smoker

B = [Pr(Y=1|X=0)/Pr(Y=0|X=0)]

= odds of y happening if you are not a smoker

What are the relative odds of Y happening if you do or do not experience X

slide142
Suppose Pr(Yi =1) = F(βo+ β1Xi + β2Z) and F is the logistic function
  • Can show that
  • OR = exp(β1) = e β1
  • This number is typically reported by most statistical packages
slide143
Details
      • Y11 = exp(βo+ β1 + β2Z) /(1+ exp(βo+ β1+ β2Z) )
      • Y10 = exp(βo+ β2Z)/(1+ exp(βo+β2Z))
      • Y01 = 1 /(1+ exp(βo+ β1 + β2Z) )
      • Y00 = 1/(1+ exp(βo+β2Z)
      • [Y11/Y01] = exp(βo+ β1 + β2Z)
      • [Y10/Y00] = exp(βo+ β2Z)
      • OR=A/B = [Y11/Y01]/[Y10/Y00]

= exp(βo+ β1 + β2Z)/ exp(βo + β2Z)

= exp(β1)

slide144
Suppose Y is rare,  close to 0
    • Pr(Y=0|X=1) and Pr(Y=0|X=0) are both close to 1, so they cancel
  • Therefore, when  is close to 0
    • Odds Ratio = Risk Ratio
  • Why is this nice?
population attributable risk
Population attributable risk
  • Average outcome in the population
  •  = (1-) Y10 +  Y11 = (1- )Y10 + (RR)Y10
  • Average outcomes are a weighted average of outcomes for X=0 and X=1
  • What would the average outcome be in the absence of X (e.g., reduce smoking rates to 0)
  • Ya = Y10
population attributable risk1
Population Attributable Risk
  • PAR
  • Fraction of outcome attributed to X
  • The difference between the current rate and the rate that would exist without X, divided by the current rate
  • PAR = ( – Ya)/

= (RR – 1)/[(1-) + RR]

example maternal smoking and low weight births
Example: Maternal Smoking and Low Weight Births
  • 6% births are low weight
    • < 2500 grams (
    • Average birth is 3300 grams (5.5 lbs)
  • Maternal smoking during pregnancy has been identified as a key cofactor
    • 13% of mothers smoke
    • This number was falling about 1 percentage point per year during 1980s/90s
    • Doubles chance of low weight birth
natality detail data
Natality detail data
  • Census of all births (4 million/year)
  • Annual files starting in the 60s
  • Information about
    • Baby (birth weight, length, date, sex, plurality, birth injuries)
    • Demographics (age, race, marital, educ of mom)
    • Birth (who delivered, method of delivery)
    • Health of mom (smoke/drank during preg, weight gain)
slide149
Smoking not available from CA or NY
  • ~3 million usable observations
  • I pulled .5% random sample from 1995
  • About 12,500 obs
  • Variables: birthweight (grams), smoked, married, 4-level race, 5 level education, mothers age at birth
slide150
------------------------------------------------------------------------------------------------------------------------------------------------------------
  • > -
  • storage display value
  • variable name type format label variable label
  • ------------------------------------------------------------------------------
  • > -
  • birthw int %9.0g birth weight in grams
  • smoked byte %9.0g =1 if mom smoked during
  • pregnancy
  • age byte %9.0g moms age at birth
  • married byte %9.0g =1 if married
  • race4 byte %9.0g 1=white,2=black,3=asian,4=other
  • educ5 byte %9.0g 1=0-8, 2=9-11, 3=12, 4=13-15,
  • 5=16+
  • visits byte %9.0g prenatal visits
  • ------------------------------------------------------------------------------
slide151
dummy |
  • variable, |
  • =1 | =1 if mom smoked
  • ifBW<2500 | during pregnancy
  • grams | 0 1 | Total
  • -----------+----------------------+----------
  • 0 | 11,626 1,745 | 13,371
  • | 86.95 13.05 | 100.00
  • | 94.64 89.72 | 93.96
  • | 81.70 12.26 | 93.96
  • -----------+----------------------+----------
  • 1 | 659 200 | 859
  • | 76.72 23.28 | 100.00
  • | 5.36 10.28 | 6.04
  • | 4.63 1.41 | 6.04
  • -----------+----------------------+----------
  • Total | 12,285 1,945 | 14,230
  • | 86.33 13.67 | 100.00
  • | 100.00 100.00 | 100.00
  • | 86.33 13.67 | 100.00
slide152
Notice a few things
    • 13.7% of women smoke
    • 6% have low weight birth
  • Pr(LBW | Smoke) =10.28%
  • Pr(LBW |~ Smoke) = 5.36%
  • RR

= Pr(LBW | Smoke)/ Pr(LBW |~ Smoke)

= 0.1028/0.0536 = 1.92

logit results
Logit results
  • Log likelihood = -3136.9912 Pseudo R2 = 0.0330
  • ------------------------------------------------------------------------------
  • lowbw | Coef. Std. Err. z P>|z| [95% Conf. Interval]
  • -------------+----------------------------------------------------------------
  • smoked | .6740651 .0897869 7.51 0.000 .4980861 .8500441
  • age | .0080537 .006791 1.19 0.236 -.0052564 .0213638
  • married | -.3954044 .0882471 -4.48 0.000 -.5683654 -.2224433
  • _Ieduc5_2 | -.1949335 .1626502 -1.20 0.231 -.5137221 .1238551
  • _Ieduc5_3 | -.1925099 .1543239 -1.25 0.212 -.4949791 .1099594
  • _Ieduc5_4 | -.4057382 .1676759 -2.42 0.016 -.7343769 -.0770994
  • _Ieduc5_5 | -.3569715 .1780322 -2.01 0.045 -.7059081 -.0080349
  • _Irace4_2 | .7072894 .0875125 8.08 0.000 .5357681 .8788107
  • _Irace4_3 | .386623 .307062 1.26 0.208 -.2152075 .9884535
  • _Irace4_4 | .3095536 .2047899 1.51 0.131 -.0918271 .7109344
  • _cons | -2.755971 .2104916 -13.09 0.000 -3.168527 -2.343415
  • ------------------------------------------------------------------------------
odds ratios
Odds Ratios
  • Smoked
    • exp(0.674) = 1.96
    • Smokers are twice as likely to have a low weight birth
  • _Irace4_2 (Blacks)
    • exp(0.707) = 2.02
    • Blacks are twice as likely to have a low weight birth
asking for odds ratios
Asking for odds ratios
  • Logistic y x1 x2;
  • In this case
  • xi: logistic lowbw smoked age married i.educ5 i.race4;
slide156
Log likelihood = -3136.9912 Pseudo R2 = 0.0330
  • ------------------------------------------------------------------------------
  • lowbw | Odds Ratio Std. Err. z P>|z| [95% Conf. Interval]
  • -------------+----------------------------------------------------------------
  • smoked | 1.962198 .1761796 7.51 0.000 1.645569 2.33975
  • age | 1.008086 .0068459 1.19 0.236 .9947574 1.021594
  • married | .6734077 .0594262 -4.48 0.000 .5664506 .8005604
  • _Ieduc5_2 | .8228894 .1338431 -1.20 0.231 .5982646 1.131852
  • _Ieduc5_3 | .8248862 .1272996 -1.25 0.212 .6095837 1.116233
  • _Ieduc5_4 | .6664847 .1117534 -2.42 0.016 .4798043 .9257979
  • _Ieduc5_5 | .6997924 .1245856 -2.01 0.045 .4936601 .9919973
  • _Irace4_2 | 2.028485 .1775178 8.08 0.000 1.70876 2.408034
  • _Irace4_3 | 1.472001 .4519957 1.26 0.208 .8063741 2.687076
  • _Irace4_4 | 1.362817 .2790911 1.51 0.131 .9122628 2.035893
  • ------------------------------------------------------------------------------
slide157
PAR
  • PAR = (RR – 1)/[(1-) + RR]
  • = 0.137
  • RR = 1.96
  • PAR = 0.116
  • 11.6% of low weight births attributed to maternal smoking
hypothesis testing in mle models
Hypothesis Testing in MLE models
  • MLE are asymptotically normally distributed, one of the properties of MLE
  • Therefore, standard t-tests of hypothesis will work as long as samples are ‘large’
  • What ‘large’ means is open to question
  • What to do when samples are ‘small’ – table for a moment
testing a linear combination of parameters
Testing a linear combination of parameters
  • Suppose you have a probit model
      • Φ[β0 + x1iβ1+ x2iβ2 + x3iβ3 +… ]
  • Test a linear combination or parameters
  • Simplest example, test a subset are zero
  • β1= β2 = β3 = β4 =0
  • To fix the discussion
      • N observations
      • K parameters
      • J restrictions (count the equals signs, j=4)
wald test
Wald Test
  • Based on the fact that the parameters are distributed asymptotically normal
  • Probability theory review
    • Suppose you have m draws from a standard normal distribution (zi)
    • M = z12 + z22 + …. zm2
    • M is distributed as a Chi-square with m degrees of freedom
slide161
Wald test constructs a ‘quadratic form’ suggested by the test you want to perform
  • This combination, because it contains squares of the true parameters, should, if the hypothesis is true, be distributed as a Chi square with J degrees of freedom.
  • If the test statistic is ‘large’, relative to the degrees of freedom of the test, we reject, because there is a low probability we would have drawn that value at random from the distribution
reading critical values from a table
Reading critical values from a table
  • All stats books will report the ‘percentiles’ of a chi-square
    • Vertical axis (degrees of freedom)
    • Horizontal axis (percentiles)
    • Entry is the value where ‘percentile’ of the distribution falls below
slide163
Example: Suppose 4 restrictions
  • 95% of a chi-square distribution falls below 9.488.
  • So there is only a 5% a number drawn at random will exceed 9.488
  • If your test statistic is below, cannot reject null
  • If your test statistics is above, reject null
wald test in stata
Wald test in STATA
  • Default test in MLE models
  • Easy to do. Look at program
      • test hsgrad somecol college
  • Does not estimate the ‘restricted’ model
  • ‘Lower power’ than other tests, i.e., high chance of false negative
slide166
. test hsgrad somecol college;
  • ( 1) hsgrad = 0
  • ( 2) somecol = 0
  • ( 3) college = 0
  • chi2( 3) = 504.78
  • Prob > chi2 = 0.0000
slide167
Notice the higher value of the test statistic. There is a low chance that a variable, drawn at random from a ch-square with three degrees of freedom will be this large.
  • Reject null
2 log likelihood test
-2 Log likelihood test
  • * how to run the same tests with a -2 log like test;
  • * estimate the unresticted model and save the estimates ;
  • * in urmodel;
  • probit smoker age incomel male black hispanic
  • hsgrad somecol college worka;
  • estimates store urmodel;
  • * estimate the restricted model. save results in rmodel;
  • probit smoker age incomel male black hispanic
  • worka;
  • estimates store rmodel;
  • lrtest urmodel rmodel;
slide169
I prefer -2 log likelihood test
    • Estimates the restricted and unrestricted model
    • Therefore, has more power than a Wald test
  • In most cases, they give the same ‘decision’ (reject/not reject)
section vi

Section VI

Categorical Data

ordered probit
Ordered Probit
  • Many discrete outcomes are to questions that have a natural ordering but no quantitative interpretation:
  • Examples:
    • Self reported health status
      • (excellent, very good, good, fair, poor)
    • Do you agree with the following statement
      • Strongly agree, agree, disagree, strongly disagree
slide172
Can use the same type of model as in the previous section to analyze these outcomes
  • Another ‘latent variable’ model
  • Key to the model: there is a monotonic ordering of the qualitative responses
self reported health status
Self reported health status
  • Excellent, very good, good, fair, poor
  • Coded as 1, 2, 3, 4, 5 on National Health Interview Survey
  • We will code as 5,4,3,2,1 (easier to think of this way)
  • Asked on every major health survey
  • Important predictor of health outcomes, e.g. mortality
  • Key question: what predicts health status?
slide174
Important to note – the numbers 1-5 mean nothing in terms of their value, just an ordering to show you the lowest to highest
  • The example below is easily adapted to include categorical variables with any number of outcomes
model
Model
  • yi* = latent index of reported health
  • The latent index measures your own scale of health. Once yi* crosses a certain value you report poor, then good, then very good, then excellent health
slide176
yi = (1,2,3,4,5) for (fair, poor, VG, G, excel)
  • Interval decision rule
      • yi=1 if yi* ≤ u1
      • yi=2 if u1 < yi* ≤ u2
      • yi=3 if u2 < yi* ≤ u3
      • yi=4 if u3 < yi* ≤ u4
      • yi=5 if yi* > u4
slide177
As with logit and probit models, we will assume yi* is a function of observed and unobserved variables
  • yi* = β0 + x1i β1 + x2i β2 …. xki βk + εi
  • yi* = xi β + εi
slide178
The threshold values (u1, u2, u3, u4) are unknown. We do not know the value of the index necessary to push you from very good to excellent.
  • In theory, the threshold values are different for everyone
  • Computer will not only estimate the β’s, but also the thresholds – average across people
slide179
As with probit and logit, the model will be determined by the assumed distribution of ε
  • In practice, most people pick nornal, generating an ‘ordered probit’ (I have no idea why)
  • We will generate the math for the probit version
probabilities
Probabilities
  • Lets do the outliers, Pr(yi=1) and Pr(yi=5) first
  • Pr(yi=1)
  • = Pr(yi* ≤ u1)
  • = Pr(xi β +εi ≤ u1 )
  • =Pr(εi ≤ u1 - xi β)
  • = Φ[u1 - xi β] = 1- Φ[xi β – u1]
slide181
Pr(yi=5)
  • = Pr(yi* > u4)
  • = Pr(xi β +εi > u4 )
  • =Pr(εi > u4 - xi β)
  • = 1 - Φ[u4 - xi β] = Φ[xi β – u4]
sample one for y 3
Sample one for y=3
  • Pr(yi=3) = Pr(u2 < yi* ≤ u3)

= Pr(yi* ≤ u3) – Pr(yi* ≤ u2)

= Pr(xi β +εi≤ u3) – Pr(xi β +εi≤ u2)

= Pr(εi≤ u3- xi β) - Pr(εi≤ u2 - xi β)

= Φ[u3- xi β] - Φ[u2 - xi β]

= 1 - Φ[xi β - u3] – 1 + Φ[xi β - u2]

= Φ[xi β - u2] - Φ[xi β - u3]

summary
Summary
  • Pr(yi=1) = 1- Φ[xi β – u1]
  • Pr(yi=2) = Φ[xi β – u1] - Φ[xi β – u2]
  • Pr(yi=3) = Φ[xi β – u2] - Φ[xi β – u3]
  • Pr(yi=4) = Φ[xi β – u3] - Φ[xi β – u4]
  • Pr(yi=5) = Φ[xi β – u4]
likelihood function
Likelihood function
  • There are 5 possible choices for each person
  • Only 1 is observed
  • L = Σi ln[Pr(yi=k)] for k
programming example
Programming example
  • Cancer control supplement to 1994 National Health Interview Survey
  • Question: what observed characteristics predict self reported health (1-5 scale)
  • 1=poor, 5=excellent
  • Key covariates: income, education, age, current and former smoking status
  • Programs
      • sr_health_status.do, .dta, .log
slide186
desc;
  • male byte %9.0g =1 if male
  • age byte %9.0g age in years
  • educ byte %9.0g years of education
  • smoke byte %9.0g current smoker
  • smoke5 byte %9.0g smoked in past 5 years
  • black float %9.0g =1 if respondent is black
  • othrace float %9.0g =1 if other race (white is ref)
  • sr_health float %9.0g 1-5 self reported health,
  • 5=excel, 1=poor
  • famincl float %9.0g log family income
slide187
tab sr_health;
  • 1-5 self |
  • reported |
  • health, |
  • 5=excel, |
  • 1=poor | Freq. Percent Cum.
  • ------------+-----------------------------------
  • 1 | 342 2.65 2.65
  • 2 | 991 7.68 10.33
  • 3 | 3,068 23.78 34.12
  • 4 | 3,855 29.88 64.00
  • 5 | 4,644 36.00 100.00
  • ------------+-----------------------------------
  • Total | 12,900 100.00
in stata1
In STATA
  • oprobit sr_health male age educ famincl black othrace smoke smoke5;
slide189
Ordered probit estimates Number of obs = 12900
  • LR chi2(8) = 2379.61
  • Prob > chi2 = 0.0000
  • Log likelihood = -16401.987 Pseudo R2 = 0.0676
  • ------------------------------------------------------------------------------
  • sr_health | Coef. Std. Err. z P>|z| [95% Conf. Interval]
  • -------------+----------------------------------------------------------------
  • male | .1281241 .0195747 6.55 0.000 .0897583 .1664899
  • age | -.0202308 .0008499 -23.80 0.000 -.0218966 -.018565
  • educ | .0827086 .0038547 21.46 0.000 .0751535 .0902637
  • famincl | .2398957 .0112206 21.38 0.000 .2179037 .2618878
  • black | -.221508 .029528 -7.50 0.000 -.2793818 -.1636341
  • othrace | -.2425083 .0480047 -5.05 0.000 -.3365958 -.1484208
  • smoke | -.2086096 .0219779 -9.49 0.000 -.2516855 -.1655337
  • smoke5 | -.1529619 .0357995 -4.27 0.000 -.2231277 -.0827961
  • -------------+----------------------------------------------------------------
  • _cut1 | .4858634 .113179 (Ancillary parameters)
  • _cut2 | 1.269036 .11282
  • _cut3 | 2.247251 .1138171
  • _cut4 | 3.094606 .1145781
  • ------------------------------------------------------------------------------
interpret coefficients
Interpret coefficients
  • Marginal effects/changes in probabilities are now a function of 2 things
    • Point of expansion (x’s)
    • Frame of reference for outcome (y)
  • STATA
    • Picks mean values for x’s
    • You pick the value of y
continuous x s
Continuous x’s
  • Consider y=5
  • d Pr(yi=5)/dxi

= d Φ[xi β – u4]/dxi = βφ[xi β – u4]

  • Consider y=3
  • d Pr(yi=3)/dxi = βφ[xi β – u3] - βφ[xi β – u4]
discrete x s
Discrete X’s
  • xiβ = β0 + x1i β1 + x2i β2 …. xki βk
    • X2i is yes or no (1 or 0)
  • ΔPr(yi=5) =
  • Φ[β0 + x1i β1 + β2 + x3i β3 +.. xki βk]

- Φ[β0 + x1i β1 + x3i β3 …. xki βk]

  • Change in the probabilities when x2i=1 and x2i=0
ask for marginal effects
Ask for marginal effects
  • mfx compute, predict(outcome(5));
slide194
mfx compute, predict(outcome(5));
  • Marginal effects after oprobit
  • y = Pr(sr_health==5) (predict, outcome(5))
  • = .34103717
  • ------------------------------------------------------------------------------
  • variable | dy/dx Std. Err. z P>|z| [ 95% C.I. ] X
  • ---------+--------------------------------------------------------------------
  • male*| .0471251 .00722 6.53 0.000 .03298 .06127 .438062
  • age | -.0074214 .00031 -23.77 0.000 -.008033 -.00681 39.8412
  • educ | .0303405 .00142 21.42 0.000 .027565 .033116 13.2402
  • famincl | .0880025 .00412 21.37 0.000 .07993 .096075 10.2131
  • black*| -.0781411 .00996 -7.84 0.000 -.097665 -.058617 .124264
  • othrace*| -.0843227 .01567 -5.38 0.000 -.115043 -.053602 .04124
  • smoke*| -.0749785 .00773 -9.71 0.000 -.09012 -.059837 .289147
  • smoke5*| -.0545062 .01235 -4.41 0.000 -.078719 -.030294 .081395
  • ------------------------------------------------------------------------------
  • (*) dy/dx is for discrete change of dummy variable from 0 to 1
interpret the results
Interpret the results
  • Males are 4.7 percentage points more likely to report excellent
  • Each year of age decreases chance of reporting excellent by 0.7 percentage points
  • Current smokers are 7.5 percentage points less likely to report excellent health
minor notes about estimation
Minor notes about estimation
  • Wald tests/-2 log likelihood tests are done the exact same was as in PROBIT and LOGIT
  • Tests of individual parameters are done the same way (z-score)
slide197
Use PRCHANGE to calculate marginal effect for a specific person

prchange, x(age=40 black=0 othrace=0 smoke=0 smoke5=0 educ=16);

    • When a variable is NOT specified (famincl), STATA takes the sample mean.
slide198
PRCHANGE will produce results for all outcomes
  • male
  • Avg|Chg| 1 2 3 4
  • 0->1 .0203868 -.0020257 -.00886671 -.02677558 -.01329902
  • 5
  • 0->1 .05096698
slide199
age
  • Avg|Chg| 1 2 3 4
  • Min->Max .13358317 .0184785 .06797072 .17686112 .07064757
  • -+1/2 .00321942 .00032518 .00141642 .00424452 .00206241
  • -+sd/2 .03728014 .00382077 .01648743 .04910323 .0237889
  • MargEfct .00321947 .00032515 .00141639 .00424462 .00206252
section vii

Section VII

Count Data Models

introduction1
Introduction
  • Many outcomes of interest are integer counts
    • Doctor visits
    • Low work days
    • Cigarettes smoked per day
    • Missed school days
  • OLS models can easily handle some integer models
slide202
Example
    • SAT scores are essentially integer values
    • Few at ‘tails’
    • Distribution is fairly continuous
    • OLS models well
  • In contrast, suppose
    • High fraction of zeros
    • Small positive values
slide203
OLS models will
    • Predict negative values
    • Do a poor job of predicting the mass of observations at zero
  • Example
    • Dr visits in past year, Medicare patients(65+)
    • 1987 National Medical Expenditure Survey
    • Top code (for now) at 10
    • 17% have no visits
slide204
visits | Freq. Percent Cum.
  • ------------+-----------------------------------
  • 0 | 915 17.18 17.18
  • 1 | 601 11.28 28.46
  • 2 | 533 10.01 38.46
  • 3 | 503 9.44 47.91
  • 4 | 450 8.45 56.35
  • 5 | 391 7.34 63.69
  • 6 | 319 5.99 69.68
  • 7 | 258 4.84 74.53
  • 8 | 216 4.05 78.58
  • 9 | 192 3.60 82.19
  • 10 | 949 17.81 100.00
  • ------------+-----------------------------------
  • Total | 5,327 100.00
poisson model
Poisson Model
  • yi is drawn from a Poisson distribution
  • Poisson parameter varies across observations
  • f(yi;λi) =e-λi λiyi/yi! For λi>0
  • E[yi]= Var[yi] = λi = f(xi, β)
slide206
λi must be positive at all times
  • Therefore, we CANNOT let λi = xiβ
  • Let λi = exp(xiβ)
  • ln(λi) = (xiβ)
slide207
d ln(λi)/dxi = β
  • Remember that d ln(λi) = dλi/λi
  • Interpret β as the percentage change in mean outcomes for a change in x
problems with poisson
Problems with Poisson
  • Variance grows with the mean
    • E[yi]= Var[yi] = λi = f(xi, β)
  • Most data sets have over dispersion, where the variance grows faster than the mean
  • In dr. visits sample,  = 5.6, s=6.7
  • Impose Mean=Var, severe restriction and you tend to reduce standard errors
negative binomial model
Negative Binomial Model
  • Where γi = exp(xiβ) and δ ≥ 0
  • E[yi] = δγi = δexp(xiβ)
  • Var[yi] = δ (1+δ) γi
  • Var[yi]/ E[yi] = (1+δ)
slide210
δ must always be ≥ 0
  • In this case, the variance grows faster than the mean
  • If δ=0, the model collapses into the Poisson
  • Always estimate negative binomial
  • If you cannot reject the null that δ=0, report the Poisson estimates
slide211
Notice that ln(E[yi]) = ln(δ) + ln(γi), so
  • d ln(E[yi]) /dxi = β
  • Parameters have the same interpretation as in the Poisson model
in stata2
In STATA
  • POISSON estimates a MLE model for poisson
    • Syntax

POISSON y independent variables

  • NBREG estimates MLE negative binomial
    • Syntax

NBREG y independent variables

interpret results for poisson
Interpret results for Poisson
  • Those with CHRONIC condition have 50% more mean MD visits
  • Those in EXCELent health have 78% fewer MD visits
  • BLACKS have 33% fewer visits than whites
  • Income elasticity is 0.021, 10% increase in income generates a 2.1% increase in visits
negative binomial
Negative Binomial
  • Interpret results the same was as Poisson
  • Look at coefficient/standard error on delta
  • Ho: delta = 0 (Poisson model is correct)
  • In this case, delta = 5.21 standard error is 0.15, easily reject null.
  • Var/Mean = 1+delta = 6.21, Poisson is mis-specificed, should see very small standard errors in the wrong model
section viii

Section VIII

Duration Data

introduction2
Introduction
  • Sometimes we have data on length of time of a particular event or ‘spells’
    • Time until death
    • Time on unemployment
    • Time to complete a PhD
  • Techniques we will discuss were originally used to examine lifespan of objects like light bulbs or machines. These models are often referred to as “time to failure”
notation2
Notation
  • T is a random variable that indicates duration (time til death, find a new job, etc)
  • t is the realization of that variable
  • f(t) is a PDF that describes the process that determines the time to failure
  • CDF is F(t) represents the probability an event will happen by time t
slide219
F(t) represents the probability that the event happens by ‘t’.
  • What is the probability a person will die on or before the 65th birthday?
slide220
Survivor function, what is the chance you live past (t)
  • S(t) = 1 – F(t)
  • If 10% of a cohort dies by their 65th birthday, 90% will die sometime after their 65th birthday
slide221
Hazard function, h(t)
  • What is the probability the spell will end at time t, given that it has already lasted t
  • What is the chance you find a new job in month 12 given that you’ve been unemployed for 12 months already
slide222
PDF, CDF (Failure function), survivor function and hazard function are all related
  • λ(t) = f(t)/S(t) = f(t)/(1-F(t))
  • We focus on the ‘hazard’ rate because its relationship to time indicates ‘duration dependence’
slide223
Example: suppose the longer someone is out of work, the lower the chance they will exit unemployment – ‘damaged goods’
  • This is an example of duration dependence, the probability of exiting a state of the world is a function of the length
slide224
Mathematically
      • d λ(t) /dt = 0 then there is no duration dep.
      • d λ(t) /dt > 0 there is + duration dependence

the probability the spell will end

increases with time

      • d λ(t) /dt < 0 there is – duration dependence

the probability the spell will end

decreases over time

different functional forms
Different Functional Forms
  • Exponential
    • λ(t)= λ
    • Hazard is the same over time, a ‘memory less’ process
  • Weibull
    • F(t) = 1 – exp(-γtρ) where ρ,γ > 0
    • λ(t) = ργρ-1
    • if ρ>1, increasing hazard
    • if ρ<1, decreasing hazard
    • if ρ=1, exponential
slide227
Others: Lognormal, log-logistic, Gompertz.
  • Little more difficult – can examine when you get comfortable with Weibull
a note about most data sets
A note about most data sets
  • Most data sets have ‘censored’ spells
    • Follow people over time
    • All will eventually die, but some do not in your period of analysis
    • Incomplete spells or censored data
  • Must build into the log likelihood function
slide229
Let ti be the duration we observe for all people
  • Some people die, and their they lived until period ti
  • Others are observed for ti periods, but do not
  • Let di=1 if data is complete spell
  • di=1 if incomplete
slide230
Recall, that f(s) is the PDF for someone who dies at period s
  • F(t) is the probability you die by t
  • 1-F(t) = the probability you die after (t)
slide231
If di=1 then we observe f(ti), someone who died in period ti
  • If di=0 then someone lived past period ti and the probability of that is [1-F(ti)]
  • L = Σi {di ln[f(ti)] + (1-di)ln[1-F(ti)]}
introducing covariates
Introducing covariates
  • Look at exponential
  • λ(t)= λ
  • Allow this to vary across people
  • λi(t)= λi
  • But like Poisson, λi is always positive
  • Let λi = exp(β0 + x1i β1 + x2i β2 …. xki βk)
slide233
In the Weibull
  • λ(t) = αγtα-1
  • Allow it to vary across people
  • λi(t) = αγi tα-1
  • γi = exp(β0 + x1i β1 + x2i β2 …. xki βk)
interpreting coefficients
Interpreting Coefficients
  • This is the same for both Weibull and Exponential
  • In Weibull, λ(ti ) = αγitα-1
  • Suppose x1i is a dummy variable
  • When xi1=1, then
  • γi1 = eβ0 + β1 + x2i β2 …. xki βk
  • When xi1=0, then
  • γi0 = eβ0 + x2i β2 …. xki βk
slide235
When you construct the ratio of γi1/ γi0, all the others parameters cancel, so
  • (αγi1 tα-1 – αγi0 tα-1 ) / αγi0 tα-1 = eβ1 -1
  • Percentage change in the hazard when x1i turns from 0 to 1.
  • STATA prints out eβ1, just subtract 1
suppose x 2i is continuous
Suppose x2i is continuous
  • Suppose we increase x2i by 1 unit
  • γi1 = exp(β0 + β1x1i + x2iβ2 …. xkiβk)
  • γi2 = exp(β0 + β1 (x1i+1) + x2iβ2 …. xkiβk)
  • Can show that
  • (αγi1 tα-1 – αγi0 tα-1 ) / αγi0 tα-1 = eβ1 -1
  • = exp(β2) – 1
  • Percentage change in the hazard for 1 unit increase in x
nhis multiple cause of death
NHIS Multiple Cause of Death
  • NHIS
    • annual survey of 60K households
    • Data on individuals
    • Self-reported healthm DR visits, lost workdays, etc.
  • MCOD
    • Linked NHIS respondents from 1986-1994 to National Death Index through Dec 31, 1995
    • Identified whether respondent died and of what cause
slide238
Our sample
    • Males, 50-70, who were married at the time of the survey
    • 1987-1989 surveys
    • Give everyone 5 years (60 months) of followup
key variables
Key Variables
  • max_mths maximum months in the survey.
  • Diedin5 respondent died during the 5 years of followup
  • Note if diedn5=0, the max_mths=60. Diedin5 identifies whether the data is censored or not.
slide240
Variable | Obs Mean Std. Dev. Min Max
  • -------------+--------------------------------------------------------
  • age_s_yrs | 26654 59.42586 5.962435 50 70
  • max_mths | 26654 56.49077 11.15384 0 60
  • black | 26654 .0928566 .2902368 0 1
  • hispanic | 26654 .0454716 .20834 0 1
  • income | 26654 3.592181 1.327325 1 5
  • -------------+--------------------------------------------------------
  • educ | 26654 2.766677 .961846 1 4
  • diedin5 | 26654 .1226082 .3279931 0 1
duration data in stata
Duration Data in STATA
  • Need to identify which is the duration data

stset length, failure(failvar)

      • Length=duration variable
      • Failvar=1 when durations end in failure, =0 for censored values
  • If all data is uncensored, omit failure(failvar)
slide242
In our case
  • stset max_mths, failure(diedin5)
getting kaplan meier curves
Getting Kaplan-Meier Curves
  • Tabular presentation of results

sts list

  • Graphical presentation

sts graph

  • Results by subgroup

sts graph, by(educ)

mle of duration model with covariates
MLE of duration model with Covariates
  • Basic syntax
  • streg covariates,d(distribution)
  • streg age_s_yrs black hispanic _Ie* _Ii*, d(weibull);
  • In this model, STATA will print out exp(β)
  • If you want the coefficients, add ‘nohr’ option (no hazard ratio)
weibull coefficients
Weibull coefficients
  • No. of subjects = 26631 Number of obs = 26631
  • No. of failures = 3245
  • Time at risk = 1505705
  • LR chi2(10) = 595.74
  • Log likelihood = -12425.055 Prob > chi2 = 0.0000
  • ------------------------------------------------------------------------------
  • _t | Coef. Std. Err. z P>|z| [95% Conf. Interval]
  • -------------+----------------------------------------------------------------
  • age_s_yrs | .0452588 .0031592 14.33 0.000 .0390669 .0514508
  • black | .4770152 .0511122 9.33 0.000 .3768371 .5771932
  • hispanic | .1333552 .082156 1.62 0.105 -.0276676 .294378
  • _Ieduc_2 | .0093353 .0591918 0.16 0.875 -.1066786 .1253492
  • _Ieduc_3 | -.072163 .0503131 -1.43 0.151 -.1707748 .0264488
  • _Ieduc_4 | -.1301173 .0657131 -1.98 0.048 -.2589126 -.0013221
  • _Iincome_2 | -.1867752 .0650604 -2.87 0.004 -.3142914 -.0592591
  • _Iincome_3 | -.3268927 .0688635 -4.75 0.000 -.4618627 -.1919227
  • _Iincome_4 | -.5166137 .0769202 -6.72 0.000 -.6673747 -.3658528
  • _Iincome_5 | -.5425447 .0722025 -7.51 0.000 -.684059 -.4010303
  • _cons | -9.201724 .2266475 -40.60 0.000 -9.645945 -8.757503
  • -------------+----------------------------------------------------------------
  • /ln_p | .1585315 .0172241 9.20 0.000 .1247729 .1922901
  • -------------+----------------------------------------------------------------
  • p | 1.171789 .020183 1.132891 1.212022
  • 1/p | .8533961 .014699 .8250675 .8826974
  • ------------------------------------------------------------------------------
slide248
The sign of the parameters is informative
    • Hazard increasing in age
    • Blacks, hispanics have higher mortality rates
    • Hazard decreases with income and age
  • The parameter ρ= 1.17.
    • Check 95% confidence interval (1.13, 1.21). Can reject null p=1 (exponential)
    • Hazard is increasing over time
hazard ratios
Hazard ratios
  • No. of subjects = 26631 Number of obs = 26631
  • No. of failures = 3245
  • Time at risk = 1505705
  • LR chi2(10) = 595.74
  • Log likelihood = -12425.055 Prob > chi2 = 0.0000
  • ------------------------------------------------------------------------------
  • _t | Haz. Ratio Std. Err. z P>|z| [95% Conf. Interval]
  • -------------+----------------------------------------------------------------
  • age_s_yrs | 1.046299 .0033055 14.33 0.000 1.03984 1.052797
  • black | 1.611258 .082355 9.33 0.000 1.457667 1.781032
  • hispanic | 1.142656 .093876 1.62 0.105 .9727116 1.342291
  • _Ieduc_2 | 1.009379 .059747 0.16 0.875 .8988145 1.133544
  • _Ieduc_3 | .9303792 .0468103 -1.43 0.151 .8430114 1.026802
  • _Ieduc_4 | .8779924 .0576956 -1.98 0.048 .7718905 .9986788
  • _Iincome_2 | .8296302 .0539761 -2.87 0.004 .7303062 .9424625
  • _Iincome_3 | .7211611 .0496617 -4.75 0.000 .6301089 .8253706
  • _Iincome_4 | .5965372 .0458858 -6.72 0.000 .5130537 .6936049
  • _Iincome_5 | .5812672 .041969 -7.51 0.000 .5045648 .6696297
  • -------------+----------------------------------------------------------------
  • /ln_p | .1585315 .0172241 9.20 0.000 .1247729 .1922901
  • -------------+----------------------------------------------------------------
  • p | 1.171789 .020183 1.132891 1.212022
  • 1/p | .8533961 .014699 .8250675 .8826974
  • ------------------------------------------------------------------------------
interpret coefficients1
Interpret coefficients
  • Age: every year hazard increases by 4.6%
  • Black: have 61% greater hazard than whites
  • Hispanics: 14% greater hazard than non-hispanics
  • Educ 2, 3, 4 are some 9-11, 12-15 and 16+ years of school
slide251
Educ 3: those with 12-15 years of educ have .93 – 1 = -0.07 or a 7% lower hazard than those with <9 years of school
  • Educ 4: those with a college degree have 0.88 – 1 = -0.12 or a 12% lower hazard than those with <9 years of school
slide252
Income 2-5 are dummies for people with $10-$20K, $20-$30K, $30-$40K, >$40K
  • Income 2: Those with $10-$20K have 0.83 – 1 = -0.17 or a 17% lower hazard than those with income <$10K
  • Income 5, those with >$40K in income have a 0.58 – 1 = -0.42 or a 42% lower hazard than those with income <$10K
topics not covered
Topics not covered
  • Time varying covariates
  • Competing risk models
    • Die from multiple causes
  • Cox proportional hazard model
    • Heterogeneity in baseline hazard
ad