Workshop in r glms 2
Download
1 / 34

Workshop in R & GLMs: #2 - PowerPoint PPT Presentation

Workshop in R & GLMs: #2 Diane Srivastava University of British Columbia srivast@zoology.ubc.ca Start by loading your Lakedata_06 dataset: diane<-read.table(file.choose(),sep=";",header=TRUE) Dataframes Two ways to identify a column (called "treatment") in your dataframe (called "diane"):

Related searches for Workshop in R & GLMs: #2

loader
I am the owner, or an agent authorized to act on behalf of the owner, of the copyrighted work described.
capcha

Download Presentation

Workshop in R & GLMs: #2

An Image/Link below is provided (as is) to download presentation

Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author.While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server.


- - - - - - - - - - - - - - - - - - - - - - - - - - E N D - - - - - - - - - - - - - - - - - - - - - - - - - -

Presentation Transcript


Workshop in r glms 2 l.jpg

Workshop in R & GLMs: #2

Diane Srivastava

University of British Columbia

srivast@zoology.ubc.ca


Slide2 l.jpg

Start by loading your Lakedata_06 dataset:

diane<-read.table(file.choose(),sep=";",header=TRUE)


Dataframes l.jpg

Dataframes

Two ways to identify a column (called "treatment") in your dataframe (called "diane"):

diane$treatment

OR

attach(diane); treatment

At end of session, remember to: detach(diane)


Summary statistics l.jpg

Summary statistics

length (x)

mean (x)

var (x)

cor (x,y)

sum (x)

summary (x)

minimum, maximum, mean, median, quartiles

What is the correlation between two variables in your dataset?


Factors l.jpg

Factors

  • A factor has several discrete levels (e.g. control, herbicide)

  • If a vector contains text, R automatically assumes it is a factor.

  • To manually convert numeric vector to a factor:

  • x <- as.factor(x)

  • To check if your vector is a factor, and what the levels are:

  • is.factor(x) ; levels(x)


Exercise l.jpg

Exercise

Make lake area into a factor called AreaFactor:

Area 0 to 5 ha: small

Area 5.1 to 10: medium

Area > 10 ha: large

You will need to:

1. Tell R how long AreaFactor will be.

AreaFactor<-Area; AreaFactor[1:length(Area)]<-"medium"

2. Assign cells in AreaFactor to each of the 3 levels

3. Make AreaFactor into a factor, then check that it is a factor


Exercise7 l.jpg

Exercise

Make lake area into a factor called AreaFactor:

Area 0 to 5 ha: small

Area 5.1 to 10: medium

Area > 10 ha: large

You will need to:

1. Tell R how long AreaFactor will be.

AreaFactor<-Area; AreaFactor[1:length(Area)]<-"medium"

2. Assign cells in AreaFactor to each of the 3 levels

AreaFactor[Area<5.1]<-“small"; AreaFactor[Area>10]<-“large"

3. Make AreaFactor into a factor, then check that it is a factor

AreaFactor<-as.factor(AreaFactor); is.factor(AreaFactor)


Linear regression l.jpg

Linear regression

ALT+126

model <- lm (y ~ x, data = diane)

insert your

dataframe

name here

invent a name for your model

linear model

insert your

x vector name here

insert your

y vector name here


Linear regression9 l.jpg

Linear regression

model <- lm (Species ~ Elevation, data = diane)

summary (model)

Call:

lm(formula = Species ~ Elevation, data = diane)

Residuals:

Min 1Q Median 3Q Max

-7.29467 -2.75041 -0.04947 1.83054 15.00270

Coefficients:

Estimate Std. Error t value Pr(>|t|)

(Intercept) 9.421568 2.426983 3.882 0.000551 ***

Elevation -0.0026090.003663 -0.712 0.482070

---

Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 4.811 on 29 degrees of freedom

Multiple R-Squared: 0.01719, Adjusted R-squared: -0.0167

F-statistic: 0.5071 on 1 and 29 DF, p-value: 0.4821


Linear regression10 l.jpg

Linear regression

model2 <- lm (Species ~ AreaFactor, data = diane)

summary (model2)

Coefficients:

Estimate Std. Error t value Pr(>|t|)

(Intercept) 11.833 1.441 8.210 6.17e-09 ***

AreaFactormedium -2.405 1.723 -1.396 0.174

AreaFactorsmall -8.288 1.792 -4.626 7.72e-05 ***

Large has mean species richness of 11.8

Medium has mean species richness of 11.8 - 2.4 = 9.4

Small has a mean species richness of 11.8 - 8.3 = 3.5

mean(Species[AreaFactor=="medio"])


Anova l.jpg

ANOVA

model2 <- lm (Species ~ AreaFactor, data = diane)

anova (model2)

Analysis of Variance Table

Response: Species

Df Sum Sq Mean Sq F value Pr(>F)

AreaFactor 2 333.85 166.92 13.393 8.297e-05 ***

Residuals 28 348.99 12.46


F tests in regression l.jpg

F tests in regression

model3 <- lm (Species ~ Elevation + pH, data = diane)

anova (model, model3)

Model 1: Species ~ Elevation

Model 2: Species ~ Elevation + pH

Res.Df RSS Df Sum of Sq F Pr(>F)

1 29 671.10

2 28 502.06 1 169.05 9.4279 0.004715 **

F 1, 28 = 9.43


Slide13 l.jpg

Exercise

  • Fit the model: Species~pH

  • Fit the model: Species~pH+pH2

  • ("pH2" is just pH2)

  • Use the ANOVA command to decide whether species richness is a linear or quadratic function of pH


Distributions not so normal l.jpg

Distributions: not so normal!

  • Review assumptions for parametric stats (ANOVA, regressions)

  • Why don’t transformations always work?

  • Introduce non-normal distributions


Tests before anova t tests l.jpg

Tests before ANOVA, t-tests

Normality

Constant variances


Tests for normality exercise l.jpg

Tests for normality: exercise

data<-c(rep(0:6,c(42,30,10,5,5,4,4)));data

How many datapoints are there?


Tests for normality exercise17 l.jpg

Tests for normality: exercise

  • Shapiro-Wilks (if sig, not normal)

    shapiro.test (data)

    If error message, make sure the stats package is loaded, then try again:

    library(stats); shapiro.test (data)


Tests for normality exercise18 l.jpg

Tests for normality: exercise

  • Kolmogorov-Smirnov (if sig, not normal)

    ks.test(data,”pnorm”,mean(data),sd=sqrt(var(data)))


Tests for normality exercise19 l.jpg

Tests for normality: exercise

  • Quantile-quantile plot (if wavers substantially off 1:1 line, not normal)

    par(pty="s")

    qqnorm(data); qqline(data)

opens up a single plot window


Tests for normality exercise20 l.jpg

Tests for normality: exercise


Tests for normality exercise21 l.jpg

Tests for normality: exercise

If the distribution isn´t normal, what is it?

freq.data<-table(data); freq.data

barplot(freq.data)


Non normal distributions l.jpg

Non-normal distributions

  • Poisson (count data, left-skewed, variance = mean)

  • Negative binomial (count data, left-skewed, variance >> mean)

  • Binomial (binary or proportion data, left-skewed, variance constrained by 0,1)

  • Gamma (variance increases as square of mean, often used for survival data)


Exercise23 l.jpg

Exercise

model2 <- lm (Species ~ AreaFactor, data = diane)

anova (model2)

1. Test for normality of residuals

resid2<- resid (model2)

...you do the rest!

2. Test for homogeneity of variances

summary (lm (abs (resid2) ~ AreaFactor))


Regression diagnostics l.jpg

Regression diagnostics

  • Residuals are normally distributed

  • Absolute value of residuals do not change with predicted value (homoscedastcity)

  • Residuals show no pattern with predicted values (i.e. the function “fits”)

  • No datapoint has undue leverage on the model.


Regression diagnostics25 l.jpg

Regression diagnostics

model3 <- lm (Species ~ Elevation + pH, data = diane)

par(mfrow=c(2,2)); plot(model3)


1 residuals are normally distributed l.jpg

1. Residuals are normally distributed

  • Straight “Normal Q-Q plot”

Std. deviance resid.

Theoretical Quantiles


2 absolute residuals do not change with predicted values l.jpg

2. Absolute residuals do not change with predicted values

  • No fan shape in Residuals vs fitted plot

  • No upward (or downward) trend in Scale-location plot

MALO

BUENO

Residuals

Sqrt (abs (SD resid.))

Fitted values

Fitted values


Examples of neutral and fan shapes l.jpg

Examples of neutral and fan-shapes


3 residuals show no pattern l.jpg

3. Residuals show no pattern

Curved residual plots result from fitting a straight line to non-linear data (e.g. quadratic)


4 no unusual leverage l.jpg

4. No unusual leverage

Cook’s distance > 1 indicates a point with undue leverage (large change in model fit when removed)


Transformations l.jpg

Transformations

Try transforming your y-variable to improve the regression diagnostic plot

  • replace Species with log(Species)

  • replace Species with sqrt(Species)


Slide32 l.jpg

Poisson distribution

  • Frequency data

  • Lots of zero (or minimum value) data

  • Variance increases with the mean


What do you do with poisson data l.jpg

What do you do with Poisson data?

  • Correct for correlation between mean and variance by log-transforming y (but log (0) is undefined!!)

  • Use non-parametric statistics (but low power)

  • Use a “generalized linear model” specifying a Poisson distribution


The problem hard to transform data to satisfy all requirements l.jpg

The problem: Hard to transform data to satisfy all requirements!

Tarea: Janka example

Janka dataset: Asks if Janka hardness values are a good estimate of timber density? N=36


ad
  • Login