1 / 16

BIO503: Lecture 4 Statistical models in R --- Recap ---

BIO503: Lecture 4 Statistical models in R --- Recap ---. Stefan Bentink bentink@jimmy.harvard.edu. residual error. dependent variable. intercept. independent variable. regression coefficient. Linear Regression Models.

adin
Download Presentation

BIO503: Lecture 4 Statistical models in R --- Recap ---

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. BIO503: Lecture 4Statistical models in R--- Recap --- Stefan Bentink bentink@jimmy.harvard.edu

  2. residual error dependent variable intercept independent variable regression coefficient Linear Regression Models Using the methods of least squares, we can derive the following estimators: Our goal is to test the hypothesis: We can do this with a T test: under the null hypothesis, this follows a T distribution with (n-1) df.

  3. my.model <- lm(y ~ x)

  4. Some important functions my.model <- lm(x~y) summary(my.model) anova(my.model) predict(my.model,new.data)

  5. Specifying Models In R we use model formula to specify the model we want to fit to our data. y ~ x Simple Linear Regression y ~ x – 1 Simple Linear Regression without the intercept (line goes through origin) y ~ x1 + x2 + x3 Multiple Regression y ~ x + I(x^2) Quadratic Regression log(y) ~ x1 + x2 Multiple Regression of Transformed Variable For factors A, B: y ~ A 1-way ANOVA y ~ A + B 2-way ANOVA y ~ A*B 2-way ANOVA + interaction term

  6. ANOVA Example Let's use a different dataset: > library(MASS) > data(ChickWeight) > attach(ChickWeight) The factor Diet has 4 levels. > levels(Diet) > anova(lm(weight ~ Diet, data=ChickWeight)) Analysis of Variance Table Response: weight Df Sum Sq Mean Sq F value Pr(>F) Diet 3 155863 51954 10.81 6.433e-07 Residuals 574 2758693 4806

  7. Two-way ANOVA We can fit a two-way ANOVA: > anova(lm(weight ~ Diet + Chick, data=ChickWeight)) Analysis of Variance Table Response: weight Df Sum Sq Mean Sq F value Pr(>F) Diet 3 155863 51954 11.5045 2.571e-07 Chick 46 374243 8136 1.8015 0.001359 Residuals 528 2384450 4516 The interpretation of the model output is sequential, from the bottom to the top. This line tests the model: weight ~ Diet + Chick This line tests the model: weight ~ Diet vs weight ~ Diet + Chick.

  8. Generalized Linear Models Linear regression models hinge on the assumption that the response variable follows a Normal distribution. Generalized linear models are able to handle non-Normal response variables and transformations to linearity.

  9. Logistic Regression When faced with a binary response Y = (0,1), we use logistic regression. where Logit

  10. Fit the Logistic Regression Model > anes.logit <- glm(move ~ conc, family=binomial(link=logit), data=anesthetic) The output summary looks like this: > summary(anes.logit) Coefficients: Estimate Std. Error z value Pr(>|z|) (Intercept) 6.469 2.418 2.675 0.00748 ** conc -5.567 2.044 -2.724 0.00645 ** Estimates of P(Y=1) are given by: > fitted.values(anes.logit)

  11. Update models and model selection Some handy functions to know about: new.model <- update(old.model, new.formula) Model Selection functions available in the MASS package drop1, dropterm add1, addterm step stepAIC

  12. Survival analysis

  13. Problem 5 – Survival Analysis • Read in the data file aml.txt. This data stores the survival data on patients with Acute Myelogenous Leukemia. • Compute the Kaplan-Meier estimate for all patients in this data. Compute the corresponding Kaplan-Meier plot. Construct Kaplan-Meier plots grouped by chemotherapy status. • Using a log-rank test, test if the two survival curves (patients on maintenance chemotherapy, patients who are not) are identical. • Fit a Cox proportional hazards model to the data set. • Plot these survival functions for patients from the different groups.

  14. Survival Analysis library(survival) Example: aml leukemia data Kaplan-Meier curve fit1 <- survfit(Surv(aml$time,aml$status)~1) summary(fit1) plot(fit1) Log-rank test survdiff(Surv(time, status)~x, data=aml)

  15. Survival analysis > cp <- coxph(Surv(aml$time, + aml$status)~x,data=aml) > > summary(cp) > > plot(survfit(Surv(aml$time,aml$status)~x, + data=aml),col=c("red","green"),lwd=2)

More Related