BIO503: Lecture 4 Statistical models in R --- Recap ---

BIO503: Lecture 4Statistical models in R--- Recap --- Stefan Bentink bentink@jimmy.harvard.edu

residual error dependent variable intercept independent variable regression coefficient Linear Regression Models Using the methods of least squares, we can derive the following estimators: Our goal is to test the hypothesis: We can do this with a T test: under the null hypothesis, this follows a T distribution with (n-1) df.

my.model <- lm(y ~ x)

Some important functions my.model <- lm(x~y) summary(my.model) anova(my.model) predict(my.model,new.data)

Specifying Models In R we use model formula to specify the model we want to fit to our data. y ~ x Simple Linear Regression y ~ x – 1 Simple Linear Regression without the intercept (line goes through origin) y ~ x1 + x2 + x3 Multiple Regression y ~ x + I(x^2) Quadratic Regression log(y) ~ x1 + x2 Multiple Regression of Transformed Variable For factors A, B: y ~ A 1-way ANOVA y ~ A + B 2-way ANOVA y ~ A*B 2-way ANOVA + interaction term

ANOVA Example Let's use a different dataset: > library(MASS) > data(ChickWeight) > attach(ChickWeight) The factor Diet has 4 levels. > levels(Diet) > anova(lm(weight ~ Diet, data=ChickWeight)) Analysis of Variance Table Response: weight Df Sum Sq Mean Sq F value Pr(>F) Diet 3 155863 51954 10.81 6.433e-07 Residuals 574 2758693 4806

Two-way ANOVA We can fit a two-way ANOVA: > anova(lm(weight ~ Diet + Chick, data=ChickWeight)) Analysis of Variance Table Response: weight Df Sum Sq Mean Sq F value Pr(>F) Diet 3 155863 51954 11.5045 2.571e-07 Chick 46 374243 8136 1.8015 0.001359 Residuals 528 2384450 4516 The interpretation of the model output is sequential, from the bottom to the top. This line tests the model: weight ~ Diet + Chick This line tests the model: weight ~ Diet vs weight ~ Diet + Chick.

Generalized Linear Models Linear regression models hinge on the assumption that the response variable follows a Normal distribution. Generalized linear models are able to handle non-Normal response variables and transformations to linearity.

Logistic Regression When faced with a binary response Y = (0,1), we use logistic regression. where Logit

Fit the Logistic Regression Model > anes.logit <- glm(move ~ conc, family=binomial(link=logit), data=anesthetic) The output summary looks like this: > summary(anes.logit) Coefficients: Estimate Std. Error z value Pr(>|z|) (Intercept) 6.469 2.418 2.675 0.00748 ** conc -5.567 2.044 -2.724 0.00645 ** Estimates of P(Y=1) are given by: > fitted.values(anes.logit)

Update models and model selection Some handy functions to know about: new.model <- update(old.model, new.formula) Model Selection functions available in the MASS package drop1, dropterm add1, addterm step stepAIC

Survival analysis

Problem 5 – Survival Analysis • Read in the data file aml.txt. This data stores the survival data on patients with Acute Myelogenous Leukemia. • Compute the Kaplan-Meier estimate for all patients in this data. Compute the corresponding Kaplan-Meier plot. Construct Kaplan-Meier plots grouped by chemotherapy status. • Using a log-rank test, test if the two survival curves (patients on maintenance chemotherapy, patients who are not) are identical. • Fit a Cox proportional hazards model to the data set. • Plot these survival functions for patients from the different groups.

Survival Analysis library(survival) Example: aml leukemia data Kaplan-Meier curve fit1 <- survfit(Surv(aml$time,aml$status)~1) summary(fit1) plot(fit1) Log-rank test survdiff(Surv(time, status)~x, data=aml)

Survival analysis > cp <- coxph(Surv(aml$time, + aml$status)~x,data=aml) > > summary(cp) > > plot(survfit(Surv(aml$time,aml$status)~x, + data=aml),col=c("red","green"),lwd=2)

BIO503: Lecture 4 Statistical models in R --- Recap ---

BIO503: Lecture 4 Statistical models in R --- Recap ---

Presentation Transcript

Lecture 4 Non-Linear and Generalized Mixed Effects Models

Part 2 Statistical Mechanics

Statistical Models of Appearance for Computer Vision

INTRODUCTION TO RECAP WORKSHOP

TR 555 Statistics “Refresher” Lecture 3: Models

Lecture # 25 Pakistan and the Muslim world II

LING / C SC 439/539 Statistical Natural Language Processing

Corpora and Statistical Methods

Statistical Modeling of Text

CD5560 FABER Formal Languages, Automata and Models of Computation Lecture 10

CDT314 FABER Formal Languages, Automata and Models of Computation Lecture 6 Mälardalen University

Statistical Models to Control Extraneous Factors (Confounders and Interactions)

Distributed Memory Machines and Programming Lecture 7

Part 2 Statistical Mechanics

Statistical Methods for Image Reconstruction

Antennas and Propagation Review/Recap

Statistical Models for Web Search Click Log Analysis

CDT314 FABER Formal Languages, Automata and Models of Computation Lecture 10

Planning under Uncertainty with Markov Decision Processes: Lecture II

CDT314 FABER Formal Languages, Automata and Models of Computation Lecture 12

Lemur Toolkit Introduction

Materials for Lecture 08