
Stats 330: Lecture 32

Course Summary



The Exam!

  • 25 multiple choice questions, similar to the term test (worth 60% for STATS 330, 50% for STATS 762)

  • 3 “long answer” questions, similar to past exams

  • (STATS 330) You have to do all the multiple choice questions, and 2 out of 3 “long answer” questions

  • (STATS 762) You have to do a compulsory extra question

  • Held on the morning of Wednesday 31 October 2012



Help Schedule

  • I will be available in Rm 265 from 10:30 am to 12:00 on Monday, Tuesday and Wednesday in the week before the exam, my schedule permitting

    • Tutors: as per David Smith’s email



STATS 330: Course Summary

The course was about

  • Graphics for data analysis

  • Regression models for data analysis



Graphics

Important ideas:

  • Visualizing multivariate data

    • Pairs plots

    • 3d plots

    • Coplots

    • Trellis plots

      • Same scales

      • Plots in rows and columns

  • Diagnostic plots for model criticism
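
A minimal sketch of one of these displays, a pairs plot, in Python (pandas/matplotlib); the course itself did not use this code, and the variable names y, x1, x2 are hypothetical:

    import numpy as np
    import pandas as pd
    import matplotlib.pyplot as plt
    from pandas.plotting import scatter_matrix

    # Hypothetical data frame with one response and two covariates
    rng = np.random.default_rng(0)
    df = pd.DataFrame(rng.normal(size=(100, 3)), columns=["y", "x1", "x2"])

    scatter_matrix(df)   # pairs plot: every variable plotted against every other
    plt.show()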



Regression models

We studied 3 types of regression:

  • “Ordinary” (normal, least squares) regression for continuous responses

  • Logistic regression for binomial responses

  • Poisson regression for count responses

    (log-linear models)



Normal regression

  • Response is assumed to be N(μ, σ²)

  • Mean is a linear function of the covariates

  • μ = β0 + β1x1 + . . . + βkxk

  • Covariates can be either continuous or categorical

  • Observations are independent, with the same variance
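
As a rough illustration (not the course’s own code), a normal regression with one continuous and one categorical covariate could be fitted in Python/statsmodels as below; the data and column names are made up:

    import numpy as np
    import pandas as pd
    import statsmodels.formula.api as smf

    # Hypothetical data: continuous x1, two-level factor f, continuous response y
    rng = np.random.default_rng(1)
    df = pd.DataFrame({"x1": rng.normal(size=50),
                       "f": rng.choice(["a", "b"], size=50)})
    df["y"] = 1 + 2 * df["x1"] + (df["f"] == "b") + rng.normal(size=50)

    ols_fit = smf.ols("y ~ x1 + C(f)", data=df).fit()
    print(ols_fit.summary())   # coefficients, standard errors, t-tests, R²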



Logistic regression

  • Response (s “successes” out of n) is assumed to be binomial, Bin(n, p)

  • Logit of the probability, log(p/(1-p)), is a linear function of the covariates

  • log(p/(1-p)) = β0 + β1x1 + . . . + βkxk

  • Covariates can be either continuous or categorical

  • Observations independent
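
A sketch of a grouped logistic regression in Python/statsmodels (again, illustrative only; the data are simulated and the column names s, fail, n, x are hypothetical):

    import numpy as np
    import pandas as pd
    import statsmodels.api as sm
    import statsmodels.formula.api as smf

    # Hypothetical grouped data: s successes out of n trials at each value of x
    rng = np.random.default_rng(2)
    df = pd.DataFrame({"x": np.linspace(-2, 2, 20), "n": 30})
    df["s"] = rng.binomial(df["n"], 1 / (1 + np.exp(-(0.5 + 1.2 * df["x"]))))
    df["fail"] = df["n"] - df["s"]

    # A two-column response (successes, failures) fits the grouped binomial model
    logit_fit = smf.glm("s + fail ~ x", data=df,
                        family=sm.families.Binomial()).fit()
    print(logit_fit.summary())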



Poisson regression

  • Response is assumed to be Poisson(μ)

  • Log of the mean, log(μ), is a linear function of the covariates (log-linear models)

  • log(μ) = β0 + β1x1 + . . . + βkxk

  • Or, equivalently, μ = exp(β0 + β1x1 + . . . + βkxk)

  • Covariates can be either continuous or categorical

  • Observations independent
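
A corresponding Poisson regression sketch in Python/statsmodels, with made-up count data:

    import numpy as np
    import pandas as pd
    import statsmodels.api as sm
    import statsmodels.formula.api as smf

    # Hypothetical count data
    rng = np.random.default_rng(3)
    df = pd.DataFrame({"x": rng.normal(size=40)})
    df["y"] = rng.poisson(np.exp(0.3 + 0.8 * df["x"]))

    pois_fit = smf.glm("y ~ x", data=df, family=sm.families.Poisson()).fit()
    print(pois_fit.params)           # β's on the log-mean scale
    print(np.exp(pois_fit.params))   # multiplicative effects on the mean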



Interpretation of β-coefficients

  • For continuous covariates:

    • In normal regression, β is the increase in mean response associated with a unit increase in x

    • In logistic regression, β is the increase in log-odds associated with a unit increase in x

    • In Poisson regression, β is the increase in log mean associated with a unit increase in x

    • In logistic regression, if x is increased by 1, the odds are multiplied by a factor of exp(β)

    • In Poisson regression, if x is increased by 1, the mean is multiplied by a factor of exp(β)
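
A tiny worked example of this exp(β) reading (the coefficient values below are made up):

    import numpy as np

    b_logistic = 0.7            # hypothetical logistic coefficient for x
    print(np.exp(b_logistic))   # ≈ 2.01: a unit increase in x roughly doubles the odds

    b_poisson = -0.2            # hypothetical Poisson coefficient for x
    print(np.exp(b_poisson))    # ≈ 0.82: a unit increase in x shrinks the mean by about 18%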



Interpretation of β-coefficients

  • For categorical covariates (main effects only):

    • In normal regression, β is the increase in mean response relative to the baseline

    • In logistic regression, β is the increase in log-odds relative to the baseline

    • In logistic regression, if we change from the baseline to some level, the odds are multiplied by exp(parameter for that level) relative to the baseline

    • In Poisson regression, if we change from the baseline to some level, the mean is multiplied by exp(parameter for that level) relative to the baseline



Measures of Fit

  • R² (for normal regression)

  • Residual deviance (for logistic and Poisson regression)

    • But not for ungrouped data in logistic regression, or for Poisson data with very small means (cell counts)



Prediction

  • For normal regression,

    • Predict the response at covariates x1, . . . , xk

    • Estimate the mean response at covariates x1, . . . , xk

  • For logistic regression,

    • Estimate the log-odds at covariates x1, . . . , xk

    • Estimate the probability of “success” at covariates x1, . . . , xk

  • For Poisson regression,

    • Estimate the mean at covariates x1, . . . , xk
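
A sketch of both kinds of normal-regression prediction in Python/statsmodels (hypothetical data and column names); for logistic or Poisson GLM fits, .predict() returns estimated probabilities or means instead:

    import numpy as np
    import pandas as pd
    import statsmodels.formula.api as smf

    rng = np.random.default_rng(4)
    df = pd.DataFrame({"x1": rng.normal(size=50)})
    df["y"] = 1 + 2 * df["x1"] + rng.normal(size=50)
    ols_fit = smf.ols("y ~ x1", data=df).fit()

    new = pd.DataFrame({"x1": [0.5]})
    pred = ols_fit.get_prediction(new).summary_frame(alpha=0.05)
    print(pred[["mean", "mean_ci_lower", "mean_ci_upper"]])  # estimated mean response
    print(pred[["obs_ci_lower", "obs_ci_upper"]])            # prediction interval for a new y
    # For a logistic or Poisson GLM fit, fit.predict(new) gives the estimated
    # probability of "success" or the estimated mean at the new covariate values.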



Inference

  • Summary table

    • Estimates of regression coefficients

    • Standard errors

    • Test statistics for coefficient = 0

    • R², etc. (normal regression)

    • F-test for the null model

    • Null and residual deviances (logistic/Poisson)



Testing model vs sub-model

  • Use and interpretation of both forms of anova

    • Comparing model with a sub-model

    • Adding successive terms to a model
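
A sketch of both uses of anova in Python/statsmodels (anova_lm), with made-up data and hypothetical variable names:

    import numpy as np
    import pandas as pd
    import statsmodels.formula.api as smf
    from statsmodels.stats.anova import anova_lm

    rng = np.random.default_rng(5)
    df = pd.DataFrame({"x1": rng.normal(size=60), "x2": rng.normal(size=60)})
    df["y"] = 1 + 2 * df["x1"] + rng.normal(size=60)

    sub = smf.ols("y ~ x1", data=df).fit()        # sub-model
    full = smf.ols("y ~ x1 + x2", data=df).fit()  # full model
    print(anova_lm(sub, full))   # F-test comparing the sub-model with the full model
    print(anova_lm(full))        # sequential anova: terms added one at a time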



Topics specific to normal regression

  • Collinearity

    • VIFs

    • Correlation

    • Added variable plots

  • Model selection

    • Stepwise procedures: forward selection (FS), backward elimination (BE), stepwise

    • All possible regressions approach

      • AIC, BIC, Cp, adjusted R², cross-validation (CV)
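
For collinearity, a rough statsmodels equivalent of the VIF calculation (hypothetical columns; x2 is deliberately built to be nearly collinear with x1):

    import numpy as np
    import pandas as pd
    import statsmodels.api as sm
    from statsmodels.stats.outliers_influence import variance_inflation_factor

    rng = np.random.default_rng(6)
    x1 = rng.normal(size=80)
    df = pd.DataFrame({"x1": x1,
                       "x2": x1 + 0.1 * rng.normal(size=80),  # nearly collinear with x1
                       "x3": rng.normal(size=80)})

    X = sm.add_constant(df)   # design matrix with an intercept column
    vifs = {name: variance_inflation_factor(X.values, i)
            for i, name in enumerate(X.columns) if name != "const"}
    print(vifs)               # large VIFs (say > 10) flag collinearity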



Factors (categorical explanatory variables)

  • Factors

    • Baselines

    • Levels

    • Factor level combinations

    • Interactions

    • Dummy variables

    • Know how to express interactions in terms of means, means in terms of interactions

    • Know how to interpret zero interactions



Fitting and Choosing models

  • Fit a separate plane (or just a mean, if there are no continuous covariates) to each combination of factor levels

  • Search for a simpler sub-model (with some interactions zero) using stepwise and anova
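
One way to express “a separate plane for each factor-level combination” and then test a simpler sub-model, sketched in Python/statsmodels with made-up data:

    import numpy as np
    import pandas as pd
    import statsmodels.formula.api as smf
    from statsmodels.stats.anova import anova_lm

    rng = np.random.default_rng(7)
    df = pd.DataFrame({"x": rng.normal(size=90),
                       "f": rng.choice(["a", "b", "c"], size=90)})
    df["y"] = 1 + 2 * df["x"] + rng.normal(size=90)

    full = smf.ols("y ~ x * C(f)", data=df).fit()   # separate line for each level of f
    sub = smf.ols("y ~ x + C(f)", data=df).fit()    # parallel lines: interaction dropped
    print(anova_lm(sub, full))                      # is the interaction needed?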



Diagnostics

  • For non-planar data

    • Plot residuals vs fitted values, residuals vs the x's, partial residual plots, gam plots, Box-Cox plot

    • Transform either x’s or response, fit polynomial terms

  • For unequal variance

    • Plot residuals vs fitted values and look for a funnel effect

    • Weighted least squares

    • Transform response



Diagnostics (2)

  • For outliers and high-leverage points

    • Hat matrix diagonals

    • Standardised residuals

    • Leave-one-out diagnostics

  • Independent observations

    • ACF plots

    • Residual/previous residual

    • Time series plot of residuals

    • Durbin-Watson test
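
A sketch of these outlier/leverage and independence diagnostics in Python/statsmodels, using a hypothetical fitted model:

    import numpy as np
    import pandas as pd
    import statsmodels.formula.api as smf
    from statsmodels.stats.stattools import durbin_watson

    rng = np.random.default_rng(8)
    df = pd.DataFrame({"x": rng.normal(size=60)})
    df["y"] = 1 + 2 * df["x"] + rng.normal(size=60)
    fit = smf.ols("y ~ x", data=df).fit()

    infl = fit.get_influence()
    print(infl.hat_matrix_diag[:5])             # hat matrix diagonals (leverage)
    print(infl.resid_studentized_internal[:5])  # standardised residuals
    print(infl.cooks_distance[0][:5])           # leave-one-out influence (Cook's distance)
    print(durbin_watson(fit.resid))             # near 2 if residuals look uncorrelated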



Diagnostics (3)

  • Normality

    • Normal plot

    • Weisberg-Bingham test

    • Box-Cox plot (to select a power transformation)
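
A rough equivalent of the normal plot and the Box-Cox power selection (in practice you would use the model's residuals and response; here the data are simulated):

    import numpy as np
    import matplotlib.pyplot as plt
    import statsmodels.api as sm
    from scipy import stats

    rng = np.random.default_rng(9)
    resid = rng.normal(size=60)          # stand-in for a model's residuals
    sm.qqplot(resid, line="45")          # normal probability plot
    plt.show()

    y = rng.gamma(shape=2.0, size=60)    # a positive, right-skewed response
    _, lam = stats.boxcox(y)             # maximum-likelihood Box-Cox power
    print(lam)                           # suggested power transformation of y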



Specifics for Logistic Regression

Log-likelihood is log L = Σ [si log(pi) + (ni - si) log(1 - pi)], up to an additive constant



Deviance

Deviance = 2(log L_max - log L_mod)

  • log L_max: replace the p's with the observed frequencies si/ni

  • log L_mod: replace the p's with the estimated p's from the logistic model, i.e. p̂i = exp(ηi)/(1 + exp(ηi)), where ηi = β0 + β1xi1 + . . . + βkxik

Can't use this as a goodness-of-fit measure in the ungrouped case
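
A sketch of this deviance-based goodness-of-fit check for grouped logistic data in Python/statsmodels (the data are made up; as noted above, the comparison is not valid for ungrouped data):

    import numpy as np
    import pandas as pd
    import statsmodels.api as sm
    import statsmodels.formula.api as smf
    from scipy.stats import chi2

    rng = np.random.default_rng(10)
    df = pd.DataFrame({"x": np.linspace(-2, 2, 15), "n": 40})
    df["s"] = rng.binomial(df["n"], 1 / (1 + np.exp(-df["x"])))
    df["fail"] = df["n"] - df["s"]

    fit = smf.glm("s + fail ~ x", data=df, family=sm.families.Binomial()).fit()
    print(fit.deviance, fit.df_resid)           # residual deviance and its df
    print(chi2.sf(fit.deviance, fit.df_resid))  # p > 0.05 suggests an adequate fit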



Odds and log-odds

  • Probability: p = Pr(Y = 1)

  • Odds: p/(1 - p)

  • Log-odds: log(p/(1 - p))



Residuals

  • Pearson: ri = (si - ni p̂i) / √(ni p̂i (1 - p̂i))

  • Deviance: the signed square root of observation i's contribution to the deviance
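
Continuing the grouped logistic fit from the deviance sketch above (the object name `fit` is assumed from that sketch), both residual types are available directly:

    print(fit.resid_pearson[:5])    # Pearson residuals
    print(fit.resid_deviance[:5])   # deviance residuals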



Topics specific to Poisson regression

  • Offsets

  • Interpretation of regression coefficients

    • (same as for odds in logistic regression)

  • Correspondence between Poisson regression (Log-linear models) and the multinomial model for contingency tables

    • The “Poisson trick”
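
A minimal sketch of an offset in a Poisson model in Python/statsmodels (the exposure column, e.g. person-years, and all names are hypothetical):

    import numpy as np
    import pandas as pd
    import statsmodels.api as sm
    import statsmodels.formula.api as smf

    rng = np.random.default_rng(11)
    df = pd.DataFrame({"x": rng.normal(size=50),
                       "exposure": rng.uniform(1, 10, size=50)})  # e.g. person-years
    df["y"] = rng.poisson(df["exposure"] * np.exp(0.2 + 0.5 * df["x"]))

    # log(exposure) enters with a fixed coefficient of 1, so we model the rate y/exposure
    fit = smf.glm("y ~ x", data=df, family=sm.families.Poisson(),
                  offset=np.log(df["exposure"])).fit()
    print(np.exp(fit.params))   # rate ratios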



Contingency tables

  • Cells 1, 2, . . . , m

  • Cell probabilities p1, . . . , pm

  • Counts y1, . . . , ym

  • Log-likelihood is log L = Σ yi log(pi), up to an additive constant



Contingency tables (2)

  • A “model” for the table is anything that specifies the form of the probabilities, possibly up to k unknown parameters

  • Test if the model is OK by

    • Calculate Deviance = 2(log L_max - log L_mod)

      log L_max: replace the p's with the observed table frequencies

      log L_mod: replace the p's with the estimated p's from the model

    • Model OK if the deviance is small (p-value > 0.05)

    • Degrees of freedom m - 1 - k

    • k = number of parameters in the model



Independence models

  • Correspond to interactions being zero

  • Fit a “saturated” model using Poisson regression

  • Use anova, stepwise to see which interactions are zero

  • Identify the appropriate model

  • Models can be represented by graphs
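
A sketch of the “Poisson trick” for testing independence in a small 2×2 table (the table and counts are made up): fit the saturated log-linear model and the independence model, then compare deviances.

    import numpy as np
    import pandas as pd
    import statsmodels.api as sm
    import statsmodels.formula.api as smf
    from scipy.stats import chi2

    # Hypothetical 2x2 table in "long" form: one row per cell
    tab = pd.DataFrame({"a": ["yes", "yes", "no", "no"],
                        "b": ["yes", "no", "yes", "no"],
                        "y": [30, 10, 15, 25]})

    sat = smf.glm("y ~ a * b", data=tab, family=sm.families.Poisson()).fit()  # saturated
    ind = smf.glm("y ~ a + b", data=tab, family=sm.families.Poisson()).fit()  # independence

    dev = ind.deviance - sat.deviance     # change in deviance (saturated deviance is ~0)
    ddf = ind.df_resid - sat.df_resid
    print(dev, ddf, chi2.sf(dev, ddf))    # small p-value => the a:b interaction is needed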



Odds Ratios

  • Definition and interpretation

  • Connection to independence

  • Connection with interactions

  • Relationship between conditional ORs and interactions

  • Homogeneous association model



Association graphs

  • Each node is a factor

  • Factors are joined by a line if there is an interaction between them

  • Interpretation in terms of conditional independence

  • Interpretation in terms of collapsibility



Contingency tables: final topics

  • Association reversal

    • Simpson’s paradox

    • When you can collapse a table

  • Product multinomial

    • Comparing populations

    • Populations are the same if certain interactions are zero

  • Goodness of fit to a distribution

    • Special case of 1-dimensional table

