
  1. NUMERICAL ANALYSIS OF BIOLOGICAL AND ENVIRONMENTAL DATA Lecture 5 Advanced (= Modern) Regression Analysis John Birks

  2. ADVANCED REGRESSION ANALYSIS = MODERN REGRESSION ANALYSIS
     • Generalised Linear Models (GLM)
       - What are GLMs?
       - A simple GLM
       - Advantages of GLM
       - Structure of GLM: error function, linear predictor, link function
       - Parameter estimation
       - Minimal adequate model
       - Concept of deviance
       - Model building
       - Model notation
       - Examples of models
       - Model criticism
     • Classification
     • Locally weighted regression (LOWESS)
     • Spline functions
     • Generalised additive models (GAM)
     • Classification and regression trees (CART)
     • Examples of modern techniques
     • Artificial neural networks
     • Software

  3. GENERALISED LINEAR MODELS
     • Brew, J.S. & Maddy, D. 1995. Generalised linear modelling. In Statistical Modelling of Quaternary Science Data (eds. D. Maddy & J.S. Brew). Quaternary Research Association Technical Guide 5.
     • Crawley, M.J. 1993. GLIM for ecologists. Blackwell.
     • Crawley, M.J. 2002. Statistical computing: An introduction to data analysis using S-PLUS. Wiley.
     • Crawley, M.J. 2005. Statistics. An introduction using R. Wiley.
     • Crosbie, S.F. & Hinch, G.N. 1985. New Zealand J. Agr. Research 28, 19-29.
     • Faraway, J.J. 2004. Linear Models with R. Chapman & Hall/CRC.
     • Faraway, J.J. 2006. Extending the Linear Model with R. Chapman & Hall/CRC.
     • Fox, J. 2002. An R and S-PLUS companion to applied regression. Sage.
     • McCullagh, P. & Nelder, J.A. 1989. Generalized Linear Models. Chapman & Hall.
     • Nicholls, A.O. 1989. Biological Conservation 50, 51-75.
     • O’Brien, L. 1992. Introducing Quantitative Geography. Measurement, Methods and Generalized Linear Models. Routledge.

  4. WHAT ARE GENERALISED LINEAR MODELS?
     A linear model is not necessarily a straight-line relationship between the response variable and the predictor variable. A linear model is an equation containing mathematical variables, parameters and random variables that is LINEAR in the parameters and the random variables:
     - $y = a + bx$
     - $y = a + bx + cx^2 = a + bx + cz$ (where $z = x^2$)
     - $y = a + be^x = a + bz$ (where $z = \exp(x)$)
     Some non-linear models can be linearised by transformation:
     - $y = \exp(a + bx)$ becomes $\ln y = a + bx$
     - Michaelis-Menten equation: take reciprocals
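
The two linearising transformations on this slide can be sketched numerically. This is a minimal illustration, not part of the lecture: the parameter values are made up, `fit_line` is a hypothetical helper, and the Michaelis-Menten equation is assumed to take the common form y = ax / (b + x), which the transcript does not show.

```python
import math

def fit_line(x, y):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

x = [1.0, 2.0, 3.0, 4.0, 5.0]

# Exponential model y = exp(a + b*x): taking logs gives ln(y) = a + b*x.
a, b = 0.5, 0.3  # illustrative values only
y_exp = [math.exp(a + b * xi) for xi in x]
a_hat, b_hat = fit_line(x, [math.log(yi) for yi in y_exp])

# Michaelis-Menten, ASSUMED form y = a*x / (b + x):
# taking reciprocals gives 1/y = 1/a + (b/a) * (1/x).
a_mm, b_mm = 10.0, 2.0  # illustrative values only
y_mm = [a_mm * xi / (b_mm + xi) for xi in x]
intercept, slope = fit_line([1 / xi for xi in x], [1 / yi for yi in y_mm])
a_mm_hat = 1 / intercept
b_mm_hat = slope * a_mm_hat
```

With noise-free data both straight-line fits recover the original parameters. With real data, the transformation also changes the error structure, which is one motivation for using a link function instead.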

  5. Linear models are not necessarily straight-line models: a) polynomial ($y = 1 + x - x^2/15$); b) exponential ($y = 3 + 0.1e^x$). Inverse polynomials: a) the Michaelis-Menten or Holling functional response equation; b) the n-shaped curve $1/y = a + bx + c/x$. These are linear models!

  6. Some models are intrinsically non-linear, e.g. the hyperbolic function and the asymptotic exponential. No transformation can linearise them in all parameters.

  7. EXAMPLES OF GENERALISED LINEAR FUNCTIONS

  8. A SIMPLE GENERALISED LINEAR MODEL
     Primary aim: provide a mathematical expression for use in description, interpretation, prediction, or reconstruction involving the relationship between variables.
     $y = a + bx + \varepsilon$, where $a + bx$ is the systematic component and $\varepsilon$ is the error component. The error component influences the estimates of a and b.
     We want to find linear combinations of predictor (= explanatory or independent) variables (x) which best predict the response variable (y).
     • Five steps:
       - Identification of response (y) and predictor (x) variables.
       - Identification of model equation.
       - Choice of appropriate error function for response variable.
       - Appropriate model parameter estimation procedures.
       - Appropriate model evaluation procedures.

  9. ADVANTAGES OF GLM
     1: Error function can follow several distributions, not just the normal distribution. Errors may be:
       - strongly skewed
       - kurtotic
       - strictly bounded (0/1, proportions, %)
       - such that they cannot lead to negative fitted values (counts)
     2: A linear combination of the x variables, the LINEAR PREDICTOR η (‘eta’), may be used to predict y through a non-linear intermediary function, the so-called LINK FUNCTION. Use of a non-linear link function allows the model to use response and predictor variables that are measured on different scales, by effectively mapping the linear predictor onto the scale of the response variable.
     3: Common framework for regression and ANOVA.
     4: Can handle many problems that look non-linear.
     5: Not necessary to transform data, since the regression is transformed through the link function.

  10. STRUCTURE OF GENERALISED LINEAR MODEL
     (1) ERROR FUNCTION
     • Poisson: count data
     • Binomial: proportions, 1/0
     • Gamma: data with constant coefficient of variation
     • Exponential: data on time to death (survival analysis)
     CHARACTERISTICS OF COMMON GLM PROBABILITY DISTRIBUTIONS
     Choice depends on the range of y and on the proportional relationship between variance and expected value.

  11. Some members of the exponential family of probability distributions

  12. ECOLOGICALLY MEANINGFUL ERROR DISTRIBUTIONS
     • Normal errors are rarely adequate in ecology, but GLMs offer ecologically meaningful alternatives.
     • Poisson. Counts: integers, non-negative, variance increases with mean.
     • Binomial. Observed proportions from a total: integers, non-negative, have a maximum value, variance largest at a proportion of 0.5.
     • Gamma. Concentrations: non-negative real values, standard deviation increases with mean, many near-zero values and some high peaks.
     J. Oksanen (2002)

  13. (2) LINEAR PREDICTOR
     LINEAR STRUCTURE: $\eta = b_0 + b_1 x_1 + \dots + b_p x_p$, where the $x_j$ are predictor variables and the $b_j$ are unknown parameters.
     To determine the fit of a given model, a linear predictor is needed for each value of the response variable. The predicted value is then compared with a transformed value of y, the transformation to be applied being specified by the LINK FUNCTION. The fitted value is computed by applying the inverse of the link function, to get back to the original scale of measurement of y.
     - Log link: fitted values are the anti-log of the linear predictor
     - Reciprocal link: fitted values are the reciprocal of the linear predictor

  14. (3) LINK FUNCTION
     The link function relates the mean value of y to its linear predictor (η): $\eta = g(\mu)$, where $g(\cdot)$ is the link function and μ are the fitted values of y.
     y = predictable component + error component: $y = \mu + \varepsilon$
     The linear predictor is a sum of terms, one for each of the parameters. The value of η is obtained by transforming the value of y with the link function, and the predicted value of y is obtained by applying the inverse link function: $\mu = g^{-1}(\eta)$
     The link function and linear predictor can be combined to form the basic or core equation of a GLM: $y = g^{-1}(\eta) + \varepsilon$ OR $g(y) = \eta + \varepsilon$ (g is the link function, η the linear predictor, ε the error component).
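
The relationship η = g(μ) and μ = g⁻¹(η) can be sketched for a few common links. This is an illustrative sketch only; the dictionary `LINKS` and the helper `fitted_value` are hypothetical names, not anything from the lecture or from a GLM package.

```python
import math

# Each entry maps a link name to (g, g_inverse): eta = g(mu), mu = g_inverse(eta).
LINKS = {
    "identity": (lambda mu: mu, lambda eta: eta),
    "log": (math.log, math.exp),
    "reciprocal": (lambda mu: 1.0 / mu, lambda eta: 1.0 / eta),
    "logit": (lambda mu: math.log(mu / (1.0 - mu)),
              lambda eta: 1.0 / (1.0 + math.exp(-eta))),
}

def fitted_value(eta, link):
    """Back-transform a linear predictor to the scale of y."""
    return LINKS[link][1](eta)

# The same linear predictor gives different fitted values under different links.
eta = -0.4
for name in ("log", "logit"):
    print(name, fitted_value(eta, name))
```

Whatever the value of the linear predictor, the log link returns a positive fitted value and the logit link returns a value between 0 and 1, which is how the link keeps fitted values within the bounds of the response.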

  15. ENSURE FITTED VALUES STAY WITHIN REASONABLE BOUNDS

  16. Common combinations of Error Functions and Link Functions

  17. TYPES OF GLM ANALYSIS

  18. GENERALISED LINEAR MODELS – A SUMMARY
     • Mathematical extensions of linear models that do not force data into unnatural scales, and thereby allow for non-linearity and non-constant variance structures in the data.
     • Based on an assumed relationship (link function) between the mean of the response variable and the linear combination of the predictor variables.
     • Data can be assumed to be from several families of probability distributions (normal, binomial, Poisson, gamma, etc.), which better fit the non-normal error structures of most real-life data.
     • More flexible and better suited for analysing real-life data than 'conventional' regression techniques.

  19. PARAMETER ESTIMATION
     Given the error function and link function, we can now formulate the linear predictor term. We need to estimate its parameters and find the linear predictor that minimises the deviance. For the normal distribution, the least-squares algorithm is appropriate; other error functions need maximum likelihood (ML) estimation. In maximum likelihood, the aim is to find parameter values that give the ‘best fit’ to the data. ‘Best’ in ML considers:
       - data on the response variable y
       - model specification
       - parameter estimates
     We need to find the MINIMAL ADEQUATE MODEL to describe the data. The ‘BEST’ model is the one producing the minimal residual deviance, subject to the constraint that all the parameters in the model are statistically significant. The model should be minimal because of the principle of parsimony, and adequate because there is no point in retaining an inadequate model that does not describe a significant part of the variation in the data. There is NO ONE MODEL; many possible models may be adequate.

  20. PRINCIPLE OF PARSIMONY (Ockham’s Razor)
     • Models should have as few parameters as possible.
     • Linear models are to be preferred to non-linear models.
     • Models relying on few assumptions are to be preferred to models with many assumptions.
     • Models should be simplified until they are minimal adequate.
     • Simple explanations are to be preferred to complex ones.
     • Maximum likelihood estimation, given the data, model, link, and error functions, provides values for the parameters by iteratively finding the parameter values in the model that would make the data most likely, i.e. the parameter values that maximise the likelihood of the data being observed.
     • The result depends not only on the data but also on the model specification.
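
To make "finding the parameter values iteratively" concrete, here is a minimal sketch of iteratively reweighted least squares (IRLS), the standard algorithm behind most GLM software, for a single predictor with Poisson errors and a log link. The lecture does not name the algorithm, so treating IRLS as the method is an assumption; the function name `fit_poisson_log` and the example counts are hypothetical.

```python
import math

def fit_poisson_log(x, y, iterations=25):
    """Fit log(mu) = b0 + b1*x with Poisson errors by IRLS."""
    # Start from the data: eta = log(y), with a small offset to avoid log(0).
    eta = [math.log(yi + 0.1) for yi in y]
    b0 = b1 = 0.0
    for _ in range(iterations):
        mu = [math.exp(e) for e in eta]
        # Working response and weights for the log link with Poisson variance.
        z = [e + (yi - m) / m for e, yi, m in zip(eta, y, mu)]
        w = mu
        # Weighted least squares of z on x (two-parameter normal equations).
        sw = sum(w)
        swx = sum(wi * xi for wi, xi in zip(w, x))
        swxx = sum(wi * xi * xi for wi, xi in zip(w, x))
        swz = sum(wi * zi for wi, zi in zip(w, z))
        swxz = sum(wi * xi * zi for wi, xi, zi in zip(w, x, z))
        det = sw * swxx - swx ** 2
        b0 = (swxx * swz - swx * swxz) / det
        b1 = (sw * swxz - swx * swz) / det
        eta = [b0 + b1 * xi for xi in x]
    return b0, b1

# Hypothetical count data, for illustration only.
counts_x = [0, 1, 2, 3, 4, 5]
counts_y = [1, 3, 2, 6, 9, 14]
print(fit_poisson_log(counts_x, counts_y))
```

Each pass is an ordinary weighted regression; only the weights and the working response change between passes. A fixed number of iterations keeps the sketch short, whereas real software stops when the deviance stops changing.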

  21. CONCEPT OF DEVIANCE
     Deviance is a measure of the goodness of fit. Fitted values are most unlikely to match the observed data perfectly, and the size of the discrepancy between model and data is a measure of the inadequacy of the model. DEVIANCE is the measure of this discrepancy: minus twice the log likelihood of the observed data under a specified model. Its value is defined relative to an arbitrary constant, so that only differences in DEVIANCE (i.e. ratios of likelihoods) have any useful meaning. The CONSTANT is chosen so that the deviance for the FULL MODEL (a parameter for each observation) is zero. Discrepancy of fit is proportional to twice the difference between the maximum log likelihood achievable and that attained using a particular model.
     OTHER OUTPUT FOR GLM
     • Parameter estimates, standard errors, t-values
     • Standardised parameter estimates (estimates/se)
     • Fitted values
     • Covariance matrix for parameter estimates
     • Standardised residuals

  22. CALCULATION OF DEVIANCE
     The formulae used by GLIM in calculating deviance, where y is the data and μ is the fitted value under the model in question (the grand mean in the simplest case). Note that, for the grand mean, the term $\sum (y - \mu) = 0$ in the Poisson deviance, and so this reduces to $2 \sum y \ln(y/\mu)$. In the binomial deviance, n is the sample size (the binomial denominator), out of which y successes were obtained.
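
The table of formulae did not survive in this transcript. As a hedged stand-in, the sketch below uses the standard textbook deviance expressions for four error distributions; these are assumed to match the ones on the slide, and the function names are hypothetical.

```python
import math

def _y_log_ratio(y, mu):
    """y * ln(y / mu), taken as 0 when y == 0 (its limiting value)."""
    return y * math.log(y / mu) if y > 0 else 0.0

def normal_deviance(y, mu):
    """Residual sum of squares."""
    return sum((yi - mi) ** 2 for yi, mi in zip(y, mu))

def poisson_deviance(y, mu):
    return 2 * sum(_y_log_ratio(yi, mi) - (yi - mi) for yi, mi in zip(y, mu))

def binomial_deviance(y, mu, n):
    """y successes out of n trials; mu is the fitted number of successes."""
    return 2 * sum(
        _y_log_ratio(yi, mi) + _y_log_ratio(ni - yi, ni - mi)
        for yi, mi, ni in zip(y, mu, n)
    )

def gamma_deviance(y, mu):
    return 2 * sum((yi - mi) / mi - math.log(yi / mi) for yi, mi in zip(y, mu))

# Grand-mean model for some hypothetical counts.
counts = [2, 4, 6]
grand_mean = sum(counts) / len(counts)
print(poisson_deviance(counts, [grand_mean] * len(counts)))
```

For the grand-mean model the `(yi - mi)` terms sum to zero, so the Poisson deviance equals 2Σ y ln(y/μ), as the slide notes. Every deviance is zero when the fitted values equal the data.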

  23. MODEL BUILDING • Aim is to find minimal adequate model and use deviance as principal criterion for assessing different models. • GENERAL LINEAR MODELS • Common framework for Regression Analysis and ANOVA • Goodness of fit: Sum of Squares (SS) • Least squares estimation • Degrees of freedom (df) = {Number of observations} minus {number of parameters}, or df = n – p • Statistical testing: Compare two models with different number (p and m) of estimated parameters

  24. REGRESSION ANALYSIS Is the regression coefficient significant? Null model: μ = b0, df = N – 1, sum of squares SS0. Model A: μ = b0 + b1x, df = N – 2, sum of squares SSA.

  25. ANOVA (Analysis of variance) Are the class means equal (classes A, B, C)? Null model: μ = b0, df = N – 1, sum of squares SS0. Model A: μ = b0 + b1B + b2C, df = N – 3, sum of squares SSA.

  26. In GLM we have the DEVIANCE RATIO TEST. To consider if model A is a significant improvement over model B, we use the F value corresponding to α = 0.05, df1 = dfA – dfB, df2 = dfB. A value greater than the tabulated value of F would indicate that model A is a significant improvement over model B.

  27. LEAST SQUARES AND MAXIMUM LIKELIHOOD
     • Least squares maximises the Normal log-likelihood
     • Other error distributions can be used in an analogous way
     • Deviance is based on the log-likelihood, and has the same distribution
       - Deviance = 0: observed and fitted values are equal (= ‘deviation’)
       - Deviance is never negative
     • Log-likelihood, Sum of Squares and Deviance follow a Chi-squared distribution
     • A scaled Chi-squared distribution follows an F distribution

  28. STATISTICAL TESTING IN GLM
     • Deviance: same distribution as Sum of Squares
       - Chi-squared: model fits
       - F test: scaled deviance
     • Tests exactly like general linear models
     • Expected value of deviance = degrees of freedom
     • Overdispersion: model does not fit
       - Deviance > degrees of freedom
     • Deviance must be scaled
       - Divide by overdispersion coefficient (D/df)
       - Use F test (scaling automatic)
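
The scaling by D/df and the resulting F statistic can be sketched as follows. The exact formula from the lecture is not preserved in the transcript, so this uses the usual form for two nested models and assumes that df means residual degrees of freedom; the function names and the deviance values are hypothetical.

```python
def dispersion(deviance, df):
    """Overdispersion coefficient: residual deviance / residual df."""
    return deviance / df

def deviance_ratio_f(dev_simple, df_simple, dev_complex, df_complex):
    """F statistic for the drop in deviance between two nested models.

    df_* are residual degrees of freedom, so df_simple > df_complex.
    """
    drop = (dev_simple - dev_complex) / (df_simple - df_complex)
    return drop / dispersion(dev_complex, df_complex)

# Hypothetical residual deviances, for illustration only.
f_value = deviance_ratio_f(dev_simple=120.0, df_simple=48,
                           dev_complex=90.0, df_complex=46)
print(dispersion(90.0, 46), f_value)
```

Under these assumptions the statistic would be compared with a tabulated F value on (df_simple − df_complex, df_complex) degrees of freedom. A dispersion well above 1 signals overdispersion.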

  29. GOODNESS OF FIT AND MODEL INFERENCE
     • Deviance: measure of goodness of fit
       - Derived from the error function: residual sum of squares in Normal error
       - Distributed approximately like $\chi^2$
     • Residual degrees of freedom: each fitted parameter uses one degree of freedom and (probably) reduces the deviance.
     • Inference: compare change in deviance against change in degrees of freedom
     • Overdispersion: deviance larger than expected under strict likelihood model
     • Use F-statistic in place of $\chi^2$.
     J. Oksanen (2002)

  30. MODEL BUILDING The aim of the exercise is to determine the minimal adequate model in which all the parameters are significantly different from zero. This is achieved by a step-wise process of model simplification, beginning with the full model, then proceeding by the elimination of non-significant terms, and the retention of significant terms.

  31. MINIMAL ADEQUATE MODEL
     • An adequate model is statistically as acceptable as the most complex model
     • Start with all explanatory variables in the model: the full model
     • Try all models and accept the minimal adequate model
     • A minimal adequate model
       - is adequate itself
       - has no adequate submodels
     • If you are lucky, you have only one adequate model, which is minimal as well
     • If the full model has Sum of Squares $SS_f$ with p parameters, the tested model is adequate if its $SS_r$ satisfies: $SS_r / SS_f \le 1 + p\,F_{\alpha,\,p,\,n-p-1} / (n-p-1)$
     • α is the risk level adjusted for the number of parameters, e.g. $\alpha = 1 - (1 - 0.05)^p$

  32. The steps involved in model simplification. There are no hard and fast rules, and this is only a guide to one sensible way of approaching the problem of model simplification.

  33. EXAMPLE OF FINDING MINIMAL ADEQUATE MODEL
     Effect of altitude on sulphur concentration in terricolous lichens
     • Explanatory variables
       - ALT: Altitude (m)
       - SPE: Species (Cetraria nivalis, Hypogymnia physodes)
       - EXP: Exposition (E, W)
       - FJE: Fjell (three alternatives)
     • Parameters
       - n = 72, p – 1 = 23, df = 48, $\alpha = 1 - (1 - 0.05)^{23} = 0.693$
     • Minimal adequate model
       - Critical ratio: $RSS_r / RSS_f = 1 + 23 \cdot 0.819 / 48 = 1.392$
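
The arithmetic on this slide can be checked in a few lines. The sketch takes the F value 0.819 from the slide as given rather than computing it, and reads the adjusted risk level as 1 − 0.95^p, which reproduces the quoted 0.693; the variable names are illustrative.

```python
n = 72              # observations (from the slide)
p = 23              # parameters used in the adequacy criterion (from the slide)
gamma = 0.05        # nominal risk level per parameter
f_quantile = 0.819  # F value quoted on the slide, taken as given

alpha = 1 - (1 - gamma) ** p
residual_df = n - p - 1
threshold = 1 + p * f_quantile / residual_df

print(round(alpha, 3), residual_df, round(threshold, 3))
# prints: 0.693 48 1.392
```

A reduced model whose residual sum of squares is no more than about 1.39 times that of the full model would count as adequate under this criterion.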

  34. TOOLS FOR FINDING MINIMAL ADEQUATE MODEL OR PARSIMONY
     AIC: Akaike information criterion (or penalised log likelihood)
     BIC: Bayesian information criterion
     $\mathrm{AIC} = -2 \times \text{log likelihood} + 2(\text{parameters} + 1)$ (1 is added for the estimated variance, an additional parameter)
     $\mathrm{BIC} = -2 \times \text{log likelihood} + \log_e(n) \times (\text{parameters} + 1)$, where n is the number of observations

  35. The more parameters in the model, the better the fit, but the less the explanatory power. There is a trade-off between goodness of fit and the number of parameters. AIC and BIC penalise any superfluous parameters by adding 2p (AIC) or $\log_e(n)$ times p (BIC) to the deviance. AIC applies a relatively light penalty for lack of parsimony; BIC applies a heavier penalty. Select the model that gives the lowest AIC and/or BIC.
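
The two criteria can be written directly from the definitions given in the lecture. In this sketch the function names, log-likelihoods and parameter counts are hypothetical, chosen only to show the penalties at work.

```python
import math

def aic(log_likelihood, n_params):
    """AIC = -2 * log-likelihood + 2 * (parameters + 1)."""
    return -2 * log_likelihood + 2 * (n_params + 1)

def bic(log_likelihood, n_params, n_obs):
    """BIC = -2 * log-likelihood + ln(n) * (parameters + 1)."""
    return -2 * log_likelihood + math.log(n_obs) * (n_params + 1)

# Hypothetical log-likelihoods for two nested models fitted to n = 50 observations.
candidates = {"small": (-104.0, 2), "large": (-102.5, 5)}
for name, (loglik, k) in candidates.items():
    print(name, aic(loglik, k), round(bic(loglik, k, 50), 2))
```

Here the larger model fits slightly better but has a higher AIC and BIC, so the smaller model would be selected. The BIC penalty exceeds the AIC penalty whenever ln(n) > 2, i.e. for n of 8 or more.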

  36. MODEL NOTATION
     Variables X and Y; factors A, B, C with levels i, j, k (categorical variables).
     The model formula involves parameters being added to the model, one for each variable and (n – 1) for each n-level factor.
     Proportions of a given lithology (A, a factor) may depend on depth (X, a variable) and site (B, a factor). Additive model: A + B + X. Its linear predictor contains a constant parameter plus a parameter for the appropriate factor level, for each lithology factor level.
     What if the proportion of a given lithology A depends on depth and site in such a way that the effect of depth is different at different sites? The model then needs an interaction term between the main effects of B and X.
     The interaction term between two factors A and B is A.B and introduces a new factor with one level ij for each combination of factor levels.
     The interaction term between two variables X and Y (X.Y) is equal to a new variable Z = XY.
     Multiple interactions: A*B*C = A + B + C + A.B + A.C + B.C + A.B.C
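
The expansion of a full crossing of three factors into main effects and interactions (written A*B*C in GLIM-style notation) can be generated mechanically. The helper `expand_crossing` is a hypothetical name used only for this sketch.

```python
from itertools import combinations

def expand_crossing(factors):
    """Expand a full crossing such as A*B*C into its additive terms."""
    terms = []
    for order in range(1, len(factors) + 1):
        for combo in combinations(factors, order):
            terms.append(".".join(combo))
    return terms

print(" + ".join(expand_crossing(["A", "B", "C"])))
# prints: A + B + C + A.B + A.C + B.C + A.B.C
```

Main effects come first, then two-way interactions, then the three-way interaction, which is also the order in which terms are usually dropped during model simplification (highest order first).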

  37. EXAMPLES OF GLMs TAYLOR (1980): California precipitation – 30 localities

  38. (a) Location of California weather stations; (b) Map of regression residuals; (c) Map of regression residuals from second analysis.

  39. Pine and spruce needle damage and SO2 emissions

  40. Predicted damages and their 95% confidence limits against sulphur concentration of Scots pine needles. The regression model was fitted with different levels (heights of the peaks) for the transects and using observed shoot lengths as offset; the lines shown correspond to transect 1 and 1 cm shoot length.

  41. Diatom – pH responses The Gaussian response curve for the abundance value (y) of a taxon against an environmental variable (x) (u = optimum or mode; t = tolerance; c = maximum).

  42. Gaussian Logit Model
     $y_k(x)$ is the expected proportional abundance of taxon k as a function of x (pH).
     Generalised linear model: $\log[p / (1 - p)] = b_0 + b_1 x + b_2 x^2$, where p is shorthand for $y_k(x)$

  43. Gaussian response function: GLM estimation
     $\mu = h \exp[-(x - u)^2 / (2t^2)]$, so that $\log \mu = b_0 + b_1 x + b_2 x^2$
     • The Gaussian response function can be written as a generalized linear model (which is easy to fit)
       - Linear predictor: explanatory variables x and $x^2$
       - Link function: log (or logit)
       - Error: Poisson (or Binomial)
     • The original Gaussian response parameters can be found by
       - $u = -b_1 / (2b_2)$ (OPTIMUM)
       - $t = \sqrt{-1 / (2b_2)}$ (TOLERANCE)
       - $h = \exp(b_0 - b_1^2 / (4b_2))$ (HEIGHT)
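
The back-conversion from fitted quadratic coefficients to optimum, tolerance and height can be sketched as below. The optimum and height formulas are those given on the slide. The tolerance formula is not legible in the transcript, so t = sqrt(−1 / (2·b2)) is derived here by matching coefficients in log μ = log h − (x − u)² / (2t²). The function name and the numerical values are hypothetical.

```python
import math

def gaussian_parameters(b0, b1, b2):
    """Convert log mu = b0 + b1*x + b2*x**2 into optimum, tolerance, height."""
    if b2 >= 0:
        raise ValueError("b2 must be negative for a unimodal (bell-shaped) curve")
    u = -b1 / (2 * b2)                      # optimum
    t = math.sqrt(-1 / (2 * b2))            # tolerance (derived, see lead-in)
    h = math.exp(b0 - b1 ** 2 / (4 * b2))   # height at the optimum
    return u, t, h

# Round trip with illustrative values u = 5.5 (pH), t = 0.8, h = 20.
u, t, h = 5.5, 0.8, 20.0
b2 = -1 / (2 * t ** 2)
b1 = u / t ** 2
b0 = math.log(h) - u ** 2 / (2 * t ** 2)
print(gaussian_parameters(b0, b1, b2))
```

The guard on `b2` matters in practice: a fitted quadratic with a non-negative `b2` has no maximum, so no optimum or tolerance exists for that taxon.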

  44. Results of fitting Gaussian logit, linear logit and null models to the SWAP 167-lake training set and lake-water pH (225 taxa)

  45. SEVERAL GRADIENTS • Gaussian response can be fitted to several gradients: Bell-shaped models J. Oksanen (2002)

  46. INTERACTIONS IN GAUSSIAN RESPONSES • No interactions: responses parallel to the gradients • Interactions: the optimum on one gradient depends on the other J. Oksanen (2002)