
Workshop in R & GLMs: #3



Presentation Transcript


  1. Workshop in R & GLMs: #3 Diane Srivastava University of British Columbia srivast@zoology.ubc.ca

  2. Housekeeping • ls() lists the variables in the global environment • rm(list=ls()) removes EVERY variable • q() quits, with a prompt to save the workspace or not
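A minimal console session showing these housekeeping commands (x is a throwaway variable introduced just for the demonstration):

```r
x <- 1:10       # create a throwaway variable
ls()            # lists objects in the global environment; now includes "x"
rm(list = ls()) # removes every object returned by ls()
ls()            # character(0): the workspace is empty
# q()           # quits R; you are prompted to save the workspace or not
```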

  3. hard~dens

  4. hard^0.45~dens

  5. log(hard)~dens

  6. Janka exercise Conclusion: the best y transformation for optimizing model fit (highest log-likelihood)… …is not the best y transformation for normal residuals
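The Janka data themselves aren't reproduced here, so the comparison below uses a made-up stand-in data frame with the slide's variable names (dens, hard); the point is the mechanics of comparing log-likelihoods and residual normality across the three candidate transformations from slides 3-5:

```r
# Stand-in data, NOT the real Janka timber data
set.seed(1)
janka <- data.frame(dens = seq(25, 70, length.out = 36))
janka$hard <- (0.08 * janka$dens + rnorm(36, sd = 0.25))^(1/0.45)

fits <- list(raw   = lm(hard ~ dens,      data = janka),
             power = lm(hard^0.45 ~ dens, data = janka),
             log   = lm(log(hard) ~ dens, data = janka))

sapply(fits, logLik)   # higher = better fit on that transformed scale
sapply(fits, function(f) shapiro.test(residuals(f))$p.value)  # residual normality
```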

  7. This workshop • Linear, general linear, and generalized linear models • Understand how GLMs work [Excel simulation] • Definitions: e.g. deviance, link functions • Poisson GLMs [R exercise] • Binomial distribution and logistic regression • Fit GLMs in R! [Exercise]

  8. In the beginning there were… Linear models: a normally-distributed y fit to a continuous x

     y     x
     1.2   0
     1.3   0
     1.1   1
     0.9   1

  But wait…couldn’t we just code a categorical variable to be continuous?

  9. Then there were… General Linear Models: a normally-distributed y fit to a continuous OR categorical x. But wait…why do we force our data to be normal when often it isn’t?

  10. Generalized linear models • Because most things in life aren’t normal! • Proud to be Poisson! • All variances are unequal, but some are more unequal than others… • No more need for tedious transformations! • Distribution solution!

  11. What linear models do: • Transform y • Fit a line to the transformed y • Back-transform to linear y [panels: log(y) against x; y against x]

  12. What GLMs do: • Start with an arbitrary fitted line • Back-transform the line into linear space • Calculate residuals • Improve the fitted line to maximize likelihood • Repeat for many iterations [panels: log(fitted values) against x; y against x]

  13. Maximum likelihood • An iterative process is used to find the model equation with the highest probability (likelihood) of explaining the y values given the x values • The equation for the likelihood depends on the error distribution chosen • Least squares, by contrast, minimizes variation from the model • If the data are normally distributed, maximum likelihood gives the same answer as least squares
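The last bullet is easy to check in R: fitting the same normal-errors model by least squares (lm) and by maximum likelihood (glm with a gaussian family) gives identical coefficients. The data here are simulated for illustration:

```r
set.seed(42)
x <- 1:20
y <- 2 + 0.5 * x + rnorm(20)

ls.fit <- lm(y ~ x)                     # least squares
ml.fit <- glm(y ~ x, family = gaussian) # maximum likelihood, normal errors

coef(ls.fit)
coef(ml.fit)  # same intercept and slope as the least-squares fit
```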

  14. GLM simulation exercise • Simulates fitting a model with normal errors and a log link to data • Your task: understand how the spreadsheet works, then find the best slope through an iterative process
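The spreadsheet's iterative slope search can be sketched in R as well (a stand-in for the Excel exercise, with simulated data and a single free slope m):

```r
set.seed(5)
x <- seq(0.1, 2, length.out = 30)
y <- exp(1 * x) + rnorm(30, sd = 0.2)  # log-link model, true slope = 1

# Normal log-likelihood of the data for a candidate slope m
loglik <- function(m) {
  resid <- y - exp(m * x)              # back-transform the line, then take residuals
  sum(dnorm(y, mean = exp(m * x), sd = sd(resid), log = TRUE))
}

# Iterative search for the slope with the highest likelihood
best <- optimize(loglik, interval = c(0, 2), maximum = TRUE)
best$maximum  # the maximum-likelihood slope, close to the true value of 1
```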

  15. Generalized linear models In least squares, we fit: y = mx + b + error. In GLM, the model is fit more indirectly: y = g(mx + b + error), where g is a function whose inverse is called the “link function”: linkfn(expected y) = mx + b (the error enters through the chosen distribution, not the link)
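R exposes each link function and its inverse via make.link(); for a log link, linkfun is log() and linkinv is exp():

```r
lk <- make.link("log")
lk$linkfun(10)       # log(10): from the response scale to the link scale
lk$linkinv(log(10))  # exp(log(10)) = 10: back to the response scale
# So for a log link: linkfn(expected y) = m*x + b, i.e. expected y = exp(m*x + b)
```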

  16. LMs vs GLMs

     LMs: use least squares; assume normality; are based on sums of squares; fit the model to transformed y.
     GLMs: use maximum likelihood; let you specify one of several distributions; are based on deviance; fit the model to untransformed y by means of a link function.

  17. All that really matters… • With a log link function we never take log(y), so we do not need to calculate log(0) • Be careful! A log link model predicts log y, not y! • The error distribution need not be normal: Poisson, binomial, gamma, Gaussian (= normal)
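The second bullet is easy to see with predict(): on a Poisson GLM with a log link, type = "link" returns log-scale predictions and type = "response" returns the back-transformed values (simulated data for illustration):

```r
set.seed(7)
d <- data.frame(x = runif(50, 0, 2))
d$y <- rpois(50, lambda = exp(0.5 + d$x))

fit <- glm(y ~ x, family = poisson(link = log), data = d)

eta <- predict(fit, type = "link")      # predictions of log y: m*x + b
mu  <- predict(fit, type = "response")  # back-transformed: exp(m*x + b)
all.equal(exp(eta), mu)                 # TRUE: mu is just exp(eta)
```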

  18. Exercise
  1. Open up the file Rlecture.csv:
     diane <- read.table(file.choose(), sep=",", header=TRUE)
  2. Look at the dataframe. Make treat a factor (“treat”).
  3. Fit this model:
     my.first.glm <- glm(growth ~ size*treat, family = poisson(link = log), data = diane); summary(my.first.glm)
  4. Model diagnostics:
     par(mfrow=c(2,2)); plot(my.first.glm)

  19. Overdispersion [panels of residual patterns: underdispersed, overdispersed, random]

  20. Overdispersion Is your residual deviance ≈ residual df? If residual deviance >> residual df, the model is overdispersed. If residual deviance << residual df, it is underdispersed. Solution: second.glm <- glm(growth ~ size*treat, family = quasipoisson(link = log), data = diane); summary(second.glm)
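The deviance/df comparison can be made explicit with a Pearson-based dispersion estimate; the data below are simulated, well-behaved Poisson counts, so the statistic should be near 1 (my.first.glm from the exercise could be checked the same way):

```r
set.seed(3)
d <- data.frame(x = runif(40))
d$y <- rpois(40, lambda = exp(1 + d$x))
fit <- glm(y ~ x, family = poisson(link = log), data = d)

# Dispersion estimate: near 1 if the poisson variance assumption holds
disp <- sum(residuals(fit, type = "pearson")^2) / df.residual(fit)
disp
# disp >> 1: overdispersed -> refit with family = quasipoisson(link = log)
```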

  21. Options

     family     default link   other links
     binomial   logit          probit, cloglog
     gaussian   identity       --
     Gamma      inverse        identity, log
     poisson    log            identity, sqrt

  22. Rlecture.csv

  23. Binomial errors • Variance is constrained near the limits; the binomial accounts for this • Type 1: the classic example: a series of trials, each resulting in success (value = 1) or failure (value = 0) • Type 2: continuous but bounded (e.g. % mortality, bounded between 0% and 100%)

  24. Logistic regression • Least squares: arcsine transformations • GLMs: use a logit (or probit) link with binomial errors [plot: S-shaped curve of y against x]

  25. Logit p = proportion of successes. If p = e^(a+bx) / (1 + e^(a+bx)), calculate the logit: log_e(p/(1-p))

  26. Logits continued Output from logistic regression with a logit link: predicted log_e(p/(1-p)) = a + bx. To obtain any expected values of p, plug a and b into the original equation: p = e^(a+bx) / (1 + e^(a+bx))
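R has both formulas built in: qlogis() computes the logit log_e(p/(1-p)), and plogis() computes its inverse e^(a+bx)/(1+e^(a+bx)), so no hand-coding is needed:

```r
p <- 0.8
qlogis(p)                                # log(0.8/0.2), the logit of p
all.equal(qlogis(p), log(p / (1 - p)))   # TRUE

eta <- 1.5                               # some value of the linear predictor a + b*x
plogis(eta)                              # expected proportion p
all.equal(plogis(eta), exp(eta) / (1 + exp(eta)))  # TRUE
```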

  27. Binomial GLMs Type 1 binomial • Simply set family = binomial(link = logit) Type 2 binomial • First create a vector of counts (or %) not parasitized • Then cbind into a two-column matrix (parasitized, not parasitized) • Then run your binomial glm (link = logit) with the matrix as your y
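A sketch of the type 2 recipe on made-up parasitism counts (the column names and data are hypothetical, not the workshop's Rlecture.csv):

```r
set.seed(9)
d <- data.frame(size = runif(30, 1, 10), total = 20)
d$parasitized     <- rbinom(30, size = d$total, prob = plogis(-2 + 0.4 * d$size))
d$not.parasitized <- d$total - d$parasitized   # step 1: the "not parasitized" vector

# Steps 2-3: cbind into a two-column matrix and use it as y
bin.glm <- glm(cbind(parasitized, not.parasitized) ~ size,
               family = binomial(link = logit), data = d)
summary(bin.glm)
```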

  28. Homework • 1. Fit the binomial glm survival ~ size*treat • 2. Fit the binomial glm parasitism ~ size*treat • 3. Predict what size has 50% parasitism in treatment “0”
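A hint for question 3: at 50% parasitism the logit is log_e(0.5/0.5) = 0, so the predicted size solves a + b*size = 0, i.e. size = -a/b. With illustrative (not fitted) coefficients:

```r
a <- -3.2               # hypothetical intercept: in practice coef(your.glm)[1]
b <- 0.8                # hypothetical slope for size: coef(your.glm)["size"]
size50 <- -a / b        # size at which predicted parasitism is 50%
size50                  # 4
plogis(a + b * size50)  # 0.5, confirming 50% parasitism at that size
```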
