AMMBR II

Gerrit Rooks

Today • Introduction to Stata • Files / directories • Stata syntax • Useful commands / functions • Logistic regression analysis with Stata • Estimation • GOF • Coefficients • Checking assumptions

Stata file types • .ado • Programs that add commands to Stata • .do • Batch files that execute a set of Stata commands • .dta • Data files in Stata's format • .log • Output saved as plain text by the log using command

The working directory • The working directory is the default directory for all file operations, such as loading and saving data or logging output • cd "d:\my work\" (the quotes are needed because the path contains a space)

Saving output to log files • Syntax for the log command • log using filename [, append replace [smcl|text]] • To close a log file • log close

Using and saving datasets • Load a Stata dataset • use d:\myproject\data.dta, clear • Save • save d:\myproject\data, replace • Using cd (change directory) first • cd d:\myproject • use data, clear • save data, replace

Entering data • Data in other formats • You can use SPSS to convert data • You can use the infile and insheet commands to import data in ASCII format (from Stata 13 on, import delimited supersedes insheet) • Entering data by hand • Type edit or just click on the Data Editor button
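A minimal sketch of importing comma-separated ASCII data, assuming Stata 13 or later (where import delimited is available). To stay self-contained it uses Stata's bundled auto dataset as a stand-in and first writes it to a hypothetical file auto.csv in the working directory, then reads it back.

```stata
* stand-in data: Stata's bundled auto dataset (not the course data)
sysuse auto, clear

* write the data out as comma-separated text (auto.csv is a hypothetical name)
export delimited using auto.csv, replace

* read the text file back in, replacing the data in memory
import delimited using auto.csv, clear

describe, short
```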

Do-files • You can create a text file that contains a series of commands • Use the Do-file Editor to work with do-files • Example I
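A minimal sketch of what a do-file can contain; the name example1.do is hypothetical and the data are Stata's bundled auto dataset. Save these lines in the Do-file Editor and run them with do example1.do (or the Do button).

```stata
* example1.do (hypothetical name): a few commands run as one batch
* stand-in data: Stata's bundled auto dataset
sysuse auto, clear
describe
summarize price mpg weight
tabulate foreign
```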

Adding comments • * (at the start of a line) and // mark comments that Stata ignores • Stata ignores whatever follows /// and treats the next line as a continuation of the current command • Example II
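A short sketch of the comment markers in a do-file (run it from a do-file, since /// does not work when typed interactively). The /* */ block comment is an extra Stata marker not mentioned on the slide; the data are Stata's bundled auto dataset.

```stata
* a line starting with an asterisk is a comment
sysuse auto, clear   // everything after two slashes is ignored

/* a block comment
   can span several lines */

* three slashes continue the command on the next line
summarize price mpg ///
    weight length
```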

A recommended structure
    // if a log file is open, close it
    capture log close
    // don't pause when output scrolls off the page
    set more off
    // change directory to your working directory
    cd d:\myproject
    // log results to file myfile.log
    log using myfile, replace text
    // myfile.do, written 7 Feb 2010 to illustrate do-files
    // your commands here
    // close the log file
    log close

Serious data analysis • Ensure replicability: use do-files and log files • Document your do-files • What is obvious today is baffling in six months • Keep a research log • A diary that includes a description of every program you run • Develop a system for naming files

Serious data analysis • New variables should be given new names • Use labels and notes • Double check every new variable • ARCHIVE
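A minimal sketch of these habits using Stata's bundled auto dataset as stand-in data: a new variable gets a new name, a variable label and a note, and is then checked with assert. The variable name mpg_sq and the label text are illustrative.

```stata
* stand-in data: Stata's bundled auto dataset
sysuse auto, clear

* create a new variable under a new name instead of overwriting mpg
generate mpg_sq = mpg * mpg

* describe it with a variable label and a note
label variable mpg_sq "Mileage (mpg) squared"
notes mpg_sq: created from mpg to illustrate labels and notes

* double check the new variable (assert stops the do-file if this fails)
assert mpg_sq == mpg * mpg
summarize mpg mpg_sq
notes mpg_sq
```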

The Stata syntax • regress y x1 x2 if x3 < 20, cluster(x4) • regress = command • What action do you want to perform? • y x1 x2 = names of variables, files or other objects • On what things is the command performed? • if x3 < 20 = qualifier on observations • On which observations should the command be performed? • , cluster(x4) = options • What special things should be done in executing the command?
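The same four parts in a runnable command, sketched with Stata's bundled auto dataset as stand-in data; vce(robust) is used here as the option in place of the slide's clustering.

```stata
* stand-in data: Stata's bundled auto dataset
sysuse auto, clear

* command   varlist          if qualifier     , options
regress     price mpg weight if foreign == 0, vce(robust)
```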

Examples • tabulate smoking race if agemother > 30, row • Example of the if qualifier • sum agemother if smoking == 1 & weightmother < 100

Elements used for logical statements
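The operator table itself did not survive in this transcript. As a sketch (using Stata's bundled auto dataset as stand-in data), these are Stata's standard relational and logical operators; note that == tests equality, while a single = assigns values in generate and replace.

```stata
* stand-in data: Stata's bundled auto dataset
sysuse auto, clear

* relational operators: ==  != (or ~=)  >  >=  <  <=
count if rep78 == 3
count if rep78 != 3 & !missing(rep78)

* logical operators: & (and), | (or), ! (not)
count if foreign == 1 & mpg > 25
count if foreign == 1 | mpg > 25
count if !(foreign == 1)
```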

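Missing values • Observations with missing values are automatically excluded when Stata fits models; missing values are stored as values larger than any nonmissing number • Beware • The expression ‘age > 65’ is therefore also true for observations where age is missing • To be sure, type: ‘age > 65 & age != .’ (or ‘age > 65 & !missing(age)’, which also excludes the extended missing values .a to .z)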
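A small check of this pitfall with Stata's bundled auto dataset (stand-in data), where rep78 has five missing values: the first count is larger than the other two because those five cars are counted as "greater than 3".

```stata
* stand-in data: Stata's bundled auto dataset (rep78 has missing values)
sysuse auto, clear

* missing counts as larger than any number, so it is included here
count if rep78 > 3

* exclude the system missing value .
count if rep78 > 3 & rep78 != .

* exclude every missing code (., .a, ..., .z)
count if rep78 > 3 & !missing(rep78)
```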

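Selecting variables and observations • drop varlist (removes the listed variables) • keep varlist (keeps only the listed variables) • drop if age < 65 (removes the observations that meet the condition)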
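A sketch of selecting variables versus observations, using Stata's bundled auto dataset as stand-in data.

```stata
* stand-in data: Stata's bundled auto dataset
sysuse auto, clear

* variables (columns): keep only these five ...
keep make price mpg weight foreign
* ... then drop one of them
drop weight

* observations (rows): remove the foreign cars
drop if foreign == 1
describe, short
```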

Creating new variables • generate command • generate age2 = age * age • For the functions you can use with generate, see help functions • !! Sometimes the egen command is a useful alternative, e.g. • egen meanage = mean(age)
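A sketch contrasting generate, which works observation by observation, with egen, which can summarize across observations. It uses Stata's bundled auto dataset as stand-in data; the new variable names are illustrative.

```stata
* stand-in data: Stata's bundled auto dataset
sysuse auto, clear

* generate: one value computed per observation
generate mpg2 = mpg * mpg

* egen: the overall mean, repeated on every observation
egen meanmpg = mean(mpg)

* egen with by(): the mean within each group
egen meanmpg_by_origin = mean(mpg), by(foreign)
```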

Useful functions
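The function table did not survive in this transcript. As a sketch, here are a few standard Stata functions used with generate (see help functions for the full list), applied to Stata's bundled auto dataset as stand-in data.

```stata
* stand-in data: Stata's bundled auto dataset
sysuse auto, clear

generate ln_price   = ln(price)        // natural logarithm
generate price_back = exp(ln_price)    // exponential, the inverse of ln()
generate sqrt_wt    = sqrt(weight)     // square root
generate mpg_round  = round(mpg, 5)    // round to the nearest multiple of 5
generate rep_miss   = missing(rep78)   // 1 if rep78 is missing, else 0
```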

Replace command • replace has the same syntax as generate but is used to change the values of a variable that already exists • gen age_dum = . • replace age_dum = 0 if age < 5 • replace age_dum = 1 if age >= 5 & age < . (without & age < . missing ages would also be coded 1)
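A sketch of the generate-then-replace pattern, plus a one-line alternative that gives the same result. It uses Stata's bundled auto dataset as stand-in data; the 3000 lb cutoff is arbitrary.

```stata
* stand-in data: Stata's bundled auto dataset
sysuse auto, clear

* start with an empty variable, then fill it in with replace
generate heavy = .
replace heavy = 0 if weight < 3000
replace heavy = 1 if weight >= 3000 & weight < .   // keeps missing weights missing

* one-line alternative: a logical expression evaluates to 0 or 1
generate heavy2 = (weight >= 3000) if !missing(weight)
```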

Recode • Change values of existing variables • Change 1 to 2 and 3 to 4: recode origvar (1=2)(3=4), gen(myvar1) • Change missings to 1: recode origvar (.=1), gen(myvar2) • gen() must name a new variable; without gen(), recode changes origvar itself
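A sketch of recode with Stata's bundled auto dataset as stand-in data: several old values can map to one new value, and values not mentioned (here the missing values in the first command) are left unchanged.

```stata
* stand-in data: Stata's bundled auto dataset
sysuse auto, clear

* collapse the five repair categories into three, in a new variable
recode rep78 (1 2 = 1) (3 = 2) (4 5 = 3), generate(rep3)

* give missing values their own code, again in a new variable
recode rep78 (. = 0), generate(rep78_0)

tabulate rep78 rep3, missing
```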

Logistic regression • Let's use a set of data collected by the state of California from 1200 high schools, measuring academic achievement. • Our dependent variable is the binary variable hiqual. • Our predictor variable will be avg_ed, a continuous measure of the average education (ranging from 1 to 5) of the parents of the students in the participating high schools.

OLS in Stata
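The OLS output itself is not in this transcript. A sketch of the same idea, assuming Stata's bundled auto dataset as a stand-in: foreign (0/1) plays the role of hiqual and mpg that of avg_ed. OLS on a 0/1 outcome is a linear probability model, and nothing constrains its fitted values to lie between 0 and 1.

```stata
* stand-in data: auto dataset; foreign ~ hiqual, mpg ~ avg_ed (assumption)
sysuse auto, clear

* OLS with a binary dependent variable
regress foreign mpg

* fitted "probabilities" from the linear model
predict p_ols, xb
summarize p_ols
```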

Logistic regression in Stata
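A sketch of the logistic counterpart, with the same stand-in assumption (auto dataset, foreign for hiqual, mpg for avg_ed). logit reports coefficients on the log-odds scale, logistic reports the same model as odds ratios, and the predicted probabilities always fall between 0 and 1.

```stata
* stand-in data: auto dataset; foreign ~ hiqual, mpg ~ avg_ed (assumption)
sysuse auto, clear

* coefficients on the log-odds scale
logit foreign mpg

* the same model, reported as odds ratios
logistic foreign mpg

* predicted probabilities
predict p_logit, pr
summarize p_logit
```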

Multiple predictors
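Adding predictors just extends the variable list. A sketch with the stand-in auto dataset, where weight is a hypothetical second predictor.

```stata
* stand-in data: auto dataset; weight is a hypothetical second predictor
sysuse auto, clear
logit foreign mpg weight
```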

Model fit: the likelihood ratio test

Model fit: LR test
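The LR statistic in the logit header compares the fitted model with the intercept-only model: LR chi2 = 2(LL_model − LL_0), with degrees of freedom equal to the number of predictors. A sketch recomputing it from the stored results e(ll) and e(ll_0), using the stand-in auto dataset.

```stata
* stand-in data: Stata's bundled auto dataset
sysuse auto, clear
logit foreign mpg weight

* 2 x (LL of the fitted model - LL of the intercept-only model)
display "LR chi2 = " 2 * (e(ll) - e(ll_0))
display "df      = " e(df_m)
display "p-value = " chi2tail(e(df_m), 2 * (e(ll) - e(ll_0)))
```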

Pseudo R2: proportional change in LL
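Stata's Pseudo R2 after logit is McFadden's measure, the proportional reduction in the log likelihood: (LL_0 − LL_model) / LL_0. A sketch checking this against e(r2_p), using the stand-in auto dataset.

```stata
* stand-in data: Stata's bundled auto dataset
sysuse auto, clear
logit foreign mpg weight

* proportional change in the log likelihood
display "pseudo R2 by hand = " (e(ll_0) - e(ll)) / e(ll_0)
display "Stata's e(r2_p)   = " e(r2_p)
```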

Classification Table
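A sketch of the classification table with estat classification (default cutoff 0.5), followed by the same calculation by hand, using the stand-in auto dataset. The names pct_correct, phat and yhat are illustrative.

```stata
* stand-in data: Stata's bundled auto dataset
sysuse auto, clear
logit foreign mpg weight

* classification table, cutoff 0.5
estat classification
scalar pct_correct = r(P_corr)

* by hand: predict 1 when the predicted probability is at least 0.5
predict phat, pr
generate byte yhat = (phat >= 0.5)
tabulate foreign yhat
count if yhat == foreign
display "percent correctly classified: " 100 * r(N) / _N
```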

Interpreting coefficients: significance
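The z statistic in the logit table is the coefficient divided by its standard error. The equivalent Wald test with test gives a chi2 equal to z squared. A sketch with the stand-in auto dataset.

```stata
* stand-in data: Stata's bundled auto dataset
sysuse auto, clear
logit foreign mpg weight

* z = coefficient / standard error
display "z for weight = " _b[weight] / _se[weight]

* Wald test of weight = 0; chi2(1) = z^2
test weight
```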

Comparing models

After fitting and storing the full model, estimate the nested model

Likelihood ratio test
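A sketch of the store, refit and lrtest sequence with the stand-in auto dataset. if e(sample) keeps the nested model on exactly the observations used by the full model, which the LR test requires. The names full, reduced and ll_full are illustrative.

```stata
* stand-in data: Stata's bundled auto dataset
sysuse auto, clear

* full model, stored under a name
logit foreign mpg weight
estimates store full
scalar ll_full = e(ll)

* nested model on the same observations
logit foreign mpg if e(sample)
estimates store reduced

* likelihood ratio test of the restriction weight = 0
lrtest full reduced
```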

Interpretation of coefficients: direction
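The sign of a logit coefficient gives the direction of the effect: a positive coefficient raises the log-odds (and the probability), a negative one lowers it. A sketch with the stand-in auto dataset; margins, dydx(*) reports average effects on the probability scale, which share the coefficients' signs.

```stata
* stand-in data: Stata's bundled auto dataset
sysuse auto, clear
logit foreign mpg weight

* direction from the sign of the coefficient
display "direction of weight: " cond(_b[weight] > 0, "positive", "negative")

* average change in Pr(foreign = 1) per one-unit increase
margins, dydx(*)
matrix me = r(b)
```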

Interpretation of coefficients: Magnitude
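For magnitude, exponentiating a coefficient gives the odds ratio for a one-unit increase, and exp(c*b) gives it for a c-unit increase. A sketch with the stand-in auto dataset, where weight is in pounds, so a 1000-pound step is more readable.

```stata
* stand-in data: Stata's bundled auto dataset
sysuse auto, clear

* report odds ratios instead of log-odds coefficients
logit foreign mpg weight, or

* odds ratio per pound and per 1000 pounds (_b still holds the coefficients)
display "OR per pound:       " exp(_b[weight])
display "OR per 1000 pounds: " exp(1000 * _b[weight])

* percentage change in the odds per 1000 pounds
display "% change in odds:   " 100 * (exp(1000 * _b[weight]) - 1)
```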

The assumptions of logistic regression • The true conditional probabilities are a logistic function of the independent variables. • No important variables are omitted. • No extraneous variables are included. • The independent variables are measured without error. • The observations are independent. • The independent variables are not linear combinations of each other.

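Hosmer & Lemeshow • The test divides the sample into subgroups (typically ten, based on the predicted probabilities) and checks whether the observed and predicted numbers of events are about equal within these groups • The test should not be significant (a non-significant result indicates no evidence of a difference between observed and predicted)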

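Hosmer & Lemeshow statistic, where $\bar{\pi}_j$ is the average predicted probability in the jth group, $o_j$ the observed number of events and $n_j$ the number of observations in that group:
$$\hat{C} = \sum_{j=1}^{g} \frac{(o_j - n_j\bar{\pi}_j)^2}{n_j\bar{\pi}_j(1-\bar{\pi}_j)}$$
which is compared with a $\chi^2$ distribution with $g - 2$ degrees of freedom.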
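A sketch of the Hosmer & Lemeshow test in Stata with the stand-in auto dataset. estat gof with group(10) forms ten groups by predicted probability, and table lists observed and expected counts per group.

```stata
* stand-in data: Stata's bundled auto dataset
sysuse auto, clear
logit foreign mpg weight

* Hosmer-Lemeshow test with 10 groups; show observed vs expected per group
estat gof, group(10) table
```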

First logistic regression

Then postestimation command
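The slides do not show which postestimation command was used. As an assumption, here is a sketch with linktest, Stata's standard link (specification) test, on the stand-in auto dataset. linktest refits the outcome on the linear prediction (_hat) and its square (_hatsq). _hat should be significant, and a significant _hatsq hints at a specification error.

```stata
* stand-in data: Stata's bundled auto dataset
sysuse auto, clear
logit foreign mpg weight

* specification test (assumed command; the slide's own command is not shown)
linktest
```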

Specification error

Including interaction term helps
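A sketch of adding an interaction with factor-variable notation and re-running the specification test, again on the stand-in auto dataset. c.mpg##c.weight enters both main effects and their product. Whether the interaction helps in these stand-in data is not claimed here.

```stata
* stand-in data: Stata's bundled auto dataset
sysuse auto, clear

* main effects plus the mpg x weight interaction
logit foreign c.mpg##c.weight

* re-run the specification test on the extended model
linktest
```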

Ok now

Multicollinearity
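A sketch of two common checks on the stand-in auto dataset. First, pairwise correlations among the predictors. Second, variance inflation factors: estat vif is only available after regress, so the same predictors are refit with regress purely to obtain the VIFs (collinearity concerns the predictors, not the link function).

```stata
* stand-in data: Stata's bundled auto dataset
sysuse auto, clear

* pairwise correlations among the predictors
correlate mpg weight length

* VIFs via an auxiliary OLS fit with the same predictors
regress foreign mpg weight length
estat vif
```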
