
AMMBR II

AMMBR II. Gerrit Rooks, 8-2-10. Suppose I have a data-set with 100 observations. A graphic representation. Age groups. Logistics of logistic regression: estimate the coefficients, assess model fit, interpret coefficients, check regression assumptions. Maximum likelihood estimation.


Presentation Transcript


  1. AMMBR II Gerrit Rooks 8-2-10

  2. Suppose I have a data-set with 100 observations

  3. A graphic representation

  4. Age groups

  5. Logistics of logistic regression • Estimate the coefficients • Assess model fit • Interpret coefficients • Check regression assumptions

  6. Maximum likelihood estimation • The method of maximum likelihood yields values for the unknown parameters which maximize the probability of obtaining the observed set of data. • First we have to construct the likelihood function. • It expresses the probability of the observed data as a function of the unknown parameters.

  7. The probability of a given observation
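The formula on this slide did not survive in the transcript. As a sketch under the usual logistic-regression setup, the probability of a single observation with a binary outcome would typically be written as below. The symbols b_0, b_1, x_i, y_i and p_i are assumptions for illustration and are not taken from the slide.

```latex
p_i = \Pr(y_i = 1 \mid x_i) = \frac{e^{b_0 + b_1 x_i}}{1 + e^{b_0 + b_1 x_i}},
\qquad
\Pr(\text{obs}_i) = p_i^{\,y_i}\,(1 - p_i)^{1 - y_i}
```

With this notation, pr(obs_i) equals p_i when the outcome occurred (y_i = 1) and 1 - p_i when it did not (y_i = 0).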

  8. Since the observations are assumed to be independent, the likelihood function can be written as Likelihood = pr(obs1)*pr(obs2)*pr(obs3)…*pr(obsn) • For technical reasons the likelihood is transformed into the log-likelihood: LL = ln[pr(obs1)]+ln[pr(obs2)]+ln[pr(obs3)]…+ln[pr(obsn)]
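The log-likelihood sum can be made concrete with a few lines of Python (used here only as a neutral calculator, not as a replacement for Stata). This is a minimal sketch assuming a one-predictor logistic model. The toy data and the coefficient values are hypothetical and are not the 100-observation data set from the slides.

```python
import math

def log_likelihood(b0, b1, xs, ys):
    """Sum of ln[pr(obs_i)] for a one-predictor logistic model."""
    ll = 0.0
    for x, y in zip(xs, ys):
        p = 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))  # Pr(y = 1 | x)
        ll += math.log(p if y == 1 else 1.0 - p)    # ln of pr(obs_i)
    return ll

# Hypothetical toy data (assumption, for illustration only)
ages = [25, 35, 45, 55, 65]
outcome = [0, 0, 1, 0, 1]

ll_flat = log_likelihood(0.0, 0.0, ages, outcome)   # every p is 0.5
ll_age = log_likelihood(-4.5, 0.1, ages, outcome)   # p rises with age

print("LL with b0 = b1 = 0:       ", round(ll_flat, 4))
print("LL with b0 = -4.5, b1 = 0.1:", round(ll_age, 4))
```

Maximum likelihood estimation searches for the coefficient values that make this sum as large (as close to zero) as possible. In the toy data the second set of coefficients gives the higher log-likelihood, so it describes the data better than the flat model.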

  9. Logistics of logistic regression • Estimate the coefficients • Assess model fit • Interpret coefficients • Check regression assumptions

  10. Assessing model fit • Between-model comparisons • Pseudo R2 (similar to multiple regression) • Predictive accuracy per group

  11. 2[-53.69-(-68.33)] = 29.31
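The between-model comparison on this slide is a likelihood-ratio statistic: twice the difference between two log-likelihoods. The sketch below redoes that arithmetic in Python and adds McFadden's pseudo R2 as one possible pseudo R2. Two assumptions are made here: that -68.33 belongs to the model without predictors, and that McFadden's measure is acceptable, since the slides do not say which pseudo R2 they mean. Computed from the two rounded values, the statistic comes out at 29.28, so the slide's 29.31 presumably reflects unrounded log-likelihoods.

```python
# Log-likelihoods as quoted on the slide (rounded to two decimals)
ll_null = -68.33    # assumed: model without predictors
ll_model = -53.69   # assumed: model with predictor(s)

# Likelihood-ratio statistic: 2 * [LL(model) - LL(null)]
lr_statistic = 2 * (ll_model - ll_null)

# McFadden's pseudo R2 (one of several pseudo-R2 measures)
mcfadden_r2 = 1 - ll_model / ll_null

print("LR statistic:      ", round(lr_statistic, 2))
print("McFadden pseudo R2:", round(mcfadden_r2, 3))
```

The statistic is compared with a chi-square distribution whose degrees of freedom equal the number of predictors added to the model.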

  12. Predictive accuracy

  13. Logistics of logistic regression • Estimate the coefficients • Assess model fit • Interpret coefficients • Check regression assumptions

  14. 3. Interpreting coefficients: direction • The original b reflects changes in the logit: b > 0 -> positive relationship • The exponentiated b reflects changes in the odds: exp(b) > 1 -> positive relationship

  15. Interpreting coefficients: significance • How statistically significant is the estimate? • Logistic regression uses Wald statistics (instead of t-values) • However, the Wald statistic is sometimes underestimated (so you are more likely to make a Type II error) • Note: this is not the Wald statistic SPSS presents!

  16. 3. Interpreting coefficients: magnitude • The slope coefficient (b) is interpreted as the rate of change in the "log odds" as X changes … not very useful. • exp(b) is the effect of the independent variable on the odds, which is more useful for calculating the size of an effect

  17. Magnitude of association • Percentage change in odds: (exponentiated coefficient - 1.0) * 100

  18. Magnitude of association • For variable ‘previous’: percentage change in odds = (exponentiated coefficient - 1) * 100 = 6.7 • A one-unit increase in previous results in a 6.7% increase in the odds • Because odds ratios multiply across units, a soccer player with a 10-point higher scoring percentage has odds of scoring that are about 91% higher (1.067^10 ≈ 1.91) • For variable ‘pswq’ (worrying): percentage change in odds = (exponentiated coefficient - 1) * 100 = -20.6 • A one-unit increase in pswq results in a 20.6% decrease in the odds • So if a soccer player scores 1 on the pswq test instead of 0, the odds that (s)he will score are 20.6% lower
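The percentage-change formula is simple to check numerically. The Python sketch below assumes odds ratios of 1.067 and 0.794, which are backed out from the percentages quoted on the slide rather than read from any output. The helper name pct_change_in_odds is hypothetical. Odds ratios compound multiplicatively, so a change of several units uses exp(b) raised to that number of units, not a multiple of the one-unit percentage.

```python
def pct_change_in_odds(odds_ratio, units=1):
    """Percentage change in the odds for a change of `units` in the predictor."""
    return (odds_ratio ** units - 1.0) * 100.0

# Assumed odds ratios, implied by the slide's percentages
or_previous = 1.067   # +6.7% per unit of 'previous'
or_pswq = 0.794       # -20.6% per unit of 'pswq'

one_unit_previous = pct_change_in_odds(or_previous)
ten_units_previous = pct_change_in_odds(or_previous, units=10)
one_unit_pswq = pct_change_in_odds(or_pswq)

print("previous, +1 unit: ", round(one_unit_previous, 1), "%")
print("previous, +10 units:", round(ten_units_previous, 1), "%")
print("pswq, +1 unit:     ", round(one_unit_pswq, 1), "%")
```

If only the raw coefficient b is available, the odds ratio is obtained first with math.exp(b).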

  19. Calculating probabilities
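The content of this slide is not in the transcript. As a hedged sketch of the standard calculation, a predicted probability is obtained by inserting the coefficients into the logistic function, p = 1 / (1 + exp(-logit)). The intercept, the coefficient values and the player's scores below are hypothetical. Only the two variable names come from the slides.

```python
import math

def predicted_probability(intercept, coefs, values):
    """Logistic function applied to intercept + sum(b_k * x_k)."""
    logit = intercept + sum(b * x for b, x in zip(coefs, values))
    return 1.0 / (1.0 + math.exp(-logit))

# Hypothetical model: intercept and raw b's for 'previous' and 'pswq'
intercept = -1.0
coefs = [math.log(1.067), math.log(0.794)]

# Hypothetical player: previous = 50, pswq = 2
p_score = predicted_probability(intercept, coefs, [50, 2])
print("Predicted probability of scoring:", round(p_score, 3))
```

A logit of zero corresponds to a probability of 0.5. Positive logits give probabilities above 0.5 and negative logits give probabilities below it.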

  20. Checking assumptions • Influential data points & residuals • Follow Samantha's tips • Hosmer & Lemeshow test • Divides the sample into subgroups • Checks whether observed and predicted values differ within the subgroups • The test should not be significant; if it is, that indicates lack of fit

  21. Stata file types • .ado • Programs that add commands to Stata • .do • Batch files that execute a set of Stata commands • .dta • Data file in Stata’s format • .log • Output saved as plain text by the log using command

  22. The working directory • The working directory is the default directory for any file operations such as using & saving data, or logging output • cd "d:\my work\"

  23. Saving output to log files • Syntax for the log command • log using filename [, append replace [smcl|text]] • To close a log file • log close

  24. Using and saving datasets • Load a Stata dataset • use d:\myproject\data.dta, clear • Save • save d:\myproject\data, replace • Using change directory • cd d:\myproject • use data, clear • save data, replace

  25. Entering data • Data in other formats • You can use SPSS to convert data • You can use the infile and insheet commands to import data in ASCII format • Entering data by hand • Type edit or just click on the data-editor button

  26. Do-files • You can create a text file that contains a series of commands • Use the do-editor to work with do-files • Example I

  27. Adding comments • // or * denote comments Stata should ignore • Stata ignores whatever follows after /// and treats the next line as a continuation • Example II

  28. A recommended structure
      // if a log file is open, close it
      capture log close
      // don't pause when output scrolls off the page
      set more off
      // change directory to your working directory
      cd d:\myproject
      // log results to file myfile.log
      log using myfile, replace text
      // myfile.do - written 7 feb 2010 to illustrate do-files
      // your commands here
      // close the log file
      log close

  29. Serious data analysis • Ensure replicability: use do + log files • Document your do-files • What is obvious today is baffling in six months • Keep a research log • Diary that includes a description of every program you run • Develop a system for naming files

  30. Serious data analysis • New variables should be given new names • Use labels and notes • Double check every new variable • ARCHIVE

  31. The Stata syntax • Command • What action do you want performed? • Names of variables, files or other objects • On what things is the command performed? • Qualifier on observations • On which observations should the command be performed? • Options • What special things should be done in executing the command?

  32. Example • tabulate smoking race if agemother > 30, row • Example of the if qualifier • sum agemother if smoking == 1 & weightmother < 100

  33. Elements used for logical statements

  34. Missing values • Missing values are automatically excluded when Stata fits models; they are stored as the largest positive values • Beware • The expression ‘age > 65’ can thus also include missing values • To be sure, type: ‘age > 65 & age < .’
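The pitfall can be imitated outside Stata. The Python sketch below is only an analogy: it uses math.inf as a hypothetical stand-in for Stata's missing value '.', which sorts above every number. Python's own missing-value conventions behave differently, so this mimics Stata's rule and does not describe Python's. The ages are made up.

```python
import math

# Assumption: math.inf plays the role of Stata's '.' (larger than any number)
MISSING = math.inf

ages = [40, 70, MISSING, 66]   # hypothetical data with one missing age

# Analogue of 'age > 65': the missing value is selected as well
naive = [a > 65 for a in ages]

# Analogue of 'age > 65 & age < .': the missing value is excluded
safe = [a > 65 and a < MISSING for a in ages]

print("age > 65:           ", naive)
print("age > 65 & age < . :", safe)
```

The third observation is the one to watch: it passes the naive condition only because the missing code is larger than 65.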

  35. Selecting observations • drop variable list • keep variable list • drop if age < 65

  36. Creating new variables • generate command • generate age2 = age * age • generate • see help function • !! Sometimes the command egen is a useful alternative, e.g. • egen meanage = mean(age)

  37. Useful functions

  38. Replace command • replace has the same syntax as generate but is used to change values of a variable that already exists • gen age_dum = . • replace age_dum = 0 if age < 5 • replace age_dum = 1 if age >= 5 & age < .

  39. Recode • Change values of existing variables • Change 1 to 2 and 3 to 4 • recode origvar (1=2)(3=4), gen(myvar1) • Change missings to 1 • recode origvar (.=1), gen(myvar2)

  40. Now a little exercise • Using the clslowbwt data • Give summary statistics of the weight of the mother • Give the frequency of the number of mothers that smoked during pregnancy • Compute a dummy variable indicating whether the mother is older than 30 • Recode the race variable: join categories 2 and 3

  41. Regress the weight of the mother on race, smoking and age • regress depvar indepvars

  42. Logistic regression • logit or logistic • estat class • estat gof • Example III

  43. Use the `clslowbwt` data • Perform a logistic regression analysis of low vs normal birth weight. How can you predict this? • Estimate the coefficients • Assess model fit • Interpret coefficients • Check regression assumptions
