stats 330 lecture 32
Download
Skip this Video
Download Presentation
Stats 330: Lecture 32

Loading in 2 Seconds...

play fullscreen
1 / 32

Stats 330: Lecture 32 - PowerPoint PPT Presentation


  • 124 Views
  • Uploaded on

Stats 330: Lecture 32. Course Summary. The Exam!. 25 multiple choice questions, similar to term test (60\% for 330, 50\% for 762) 3 “long answer” questions, similar to past exams (STATS 330) You have to do all the multiple choice questions, and 2 out of 3 “long answer” questions

loader
I am the owner, or an agent authorized to act on behalf of the owner, of the copyrighted work described.
capcha
Download Presentation

PowerPoint Slideshow about 'Stats 330: Lecture 32' - talbot


An Image/Link below is provided (as is) to download presentation

Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author.While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server.


- - - - - - - - - - - - - - - - - - - - - - - - - - E N D - - - - - - - - - - - - - - - - - - - - - - - - - -
Presentation Transcript
stats 330 lecture 32
Stats 330: Lecture 32

Course

Summary

the exam
The Exam!
  • 25 multiple choice questions, similar to term test (60% for 330, 50% for 762)
  • 3 “long answer” questions, similar to past exams (STATS 330) You have to do all the multiple choice questions, and 2 out of 3 “long answer” questions
  • (STATS 762) You have to do a compulsory extra question
  • Held on am of Wed 31th October 2012
help schedule
Help Schedule
  • I will be available in Rm 265 from 10:30 am to 12:00 on Monday, Tuesday and Wednesday in the week before the exam, my schedule permitting
    • Tutors: as per David Smiths’s email
stats 330 course summary
STATS 330: Course Summary

The course was about

  • Graphics for data analysis
  • Regression models for data analysis
graphics
Graphics

Important ideas:

  • Visualizing multivariate data
    • Pairs plots
    • 3d plots
    • Coplots
    • Trellis plots
      • Same scales
      • Plots in rows and columns
  • Diagnostic plots for model criticism
regression models
Regression models

We studied 3 types of regression:

  • “Ordinary” (normal, least squares) regression for continuous responses
  • Logistic regression for binomial responses
  • Poisson regression for count responses

(log-linear models)

normal regression
Normal regression
  • Response is assumed to be N(m,s2)
  • Mean is a linear function of the covariates
  • m = b0 + b1x1 + . . . + bkxk
  • Covariates can be eithercontinuous or categorical
  • Observations independent, same variance
logistic regression
Logistic regression
  • Response ( s “successes” out of n) is assumed to be Binomial Bin(n,p)
  • Logit of Probability log(p/(1-p))is a linear function of the covariates
  • log(p/(1-p)) = b0 + b1x1 + . . . + bkxk
  • Covariates can be eithercontinuous or categorical
  • Observations independent
poisson regression
Poisson regression
  • Response is assumed to be Poisson(m)
  • Log of mean log(m) is a linear function of the covariates (log-linear models)
  • log(m) = b0 + b1x1 + . . . + bkxk
  • (Or, equivalently
  • m = exp(b0 + b1x1 + . . . + bkxk)
  • Covariates can be eithercontinuous or categorical
  • Observations independent
interpretation of b coefficients
Interpretation of b -coefficients
  • For continuous covariates:
    • In normal regression, b is the increase in mean response associated with a unit increase in x
    • In logistic regression, b is the increase in log odds associated with a unit increase in x
    • In Poisson regression, b is the increase in log mean associated with a unit increase in x
    • In logistic regression, if x is increased by 1, the odds are increased by a factor of exp(b)
    • In Poisson regression, if x is increased by 1, the mean is increased by a factor of exp(b)
interpretation of b coefficients1
Interpretation of b -coefficients
  • For categorical covariates (main effects only):
    • In normal regression, b is the increase in mean response relative to the baseline
    • In logistic regression, b is the increase in log odds relative to the baseline
    • In logistic regression, if we change from baseline to some level, the odds are increased by a factor of exp(parameter for that level) relative to the baseline
    • In Poisson regression, if we change from baseline to some level, the mean is increased by a factor of exp(parameter for that level) relative to the baseline
measures of fit
Measures of Fit
  • R2 (for normal regression)
  • Residual Deviance (for Logistic and Poisson regression)
    • But not for ungrouped data in logistic, or Poisson with very small means (cell counts)
prediction
Prediction
  • For normal regression,
    • Predict response at covariates x1, . . . ,xk
    • Estimate mean response at covariates x1, . . . ,xk
  • For logistic regression,
    • estimate log-odds at covariates x1, . . . ,xk
    • Estimate probability of “success” at covariates x1, . . . ,xk
  • For Poisson regression,
    • Estimate mean at covariates x1, . . . ,xk
inference
Inference
  • Summary table
    • Estimates of regression coefs
    • Standard errors
    • Test stats for coef = 0
    • R2 etc (normal regression)
    • F-test for null model
    • Null and residual deviances (logistic/Poisson)
testing model vs sub model
Testing model vs sub-model
  • Use and interpretation of both forms of anova
    • Comparing model with a sub-model
    • Adding successive terms to a model
topics specific to normal regression
Topics specific to normal regression
  • Collinearity
    • VIF’s
    • Correlation
    • Added variable plots
  • Model selection
    • Stepwise procedures: FS, BE, stepwise
    • All possible regressions approach
      • AIC, BIC, CP, adjusted R2, CV
factors categorical explanatory variables
Factors (categorical explanatory variables)
  • Factors
    • Baselines
    • Levels
    • Factor level combinations
    • Interactions
    • Dummy variables
    • Know how to express interactions in terms of means, means in terms of interactions
    • Know how to interpret zero interactions
fitting and choosing models
Fitting and Choosing models
  • Fit a separate plane (mean if no continuous covariates) to each combination of factor levels
  • Search for a simpler submodel (with some interactions zero) using stepwise and anova
diagnostics
Diagnostics
  • For non-planar data
    • Plot res/fitted, res/x’s, partial residual plots, gam plots, box-cox plot
    • Transform either x’s or response, fit polynomial terms
  • For unequal variance
    • Plot res/ fitted, look for funnel effect
    • Weighted least squares
    • Transform response
diagnostics 2
Diagnostics (2)
  • For outliers and high-leverage points
    • Hat matrix diagonals
    • Standardised residuals,
    • Leave-one-out diagnostics
  • Independent observations
    • Acf plots
    • Residual/previous residual
    • Time series plot of residuals
    • Durbin-Watson test
diagnostics 3
Diagnostics (3)
  • Normality
    • Normal plot
    • Weisberg-Bingham test
    • Box Cox (select power)
deviance
Deviance

Deviance = 2(log LMAX - log LMOD)

  • log LMAX: replace p’s with frequencies si/ni
  • log LMOD: replace p’s with estimated p’s from logistic model i.e.

Can’t use as goodness of fit measure in ungrouped case

odds and log odds
Odds and log-odds
  • Probabilities p: Pr(Y=1)
  • Odds p /(1- p)
  • Log-odds: log p/(1- p)
residuals
Residuals
  • Pearson
  • Deviance
topics specific to poisson regression
Topics specific to Poisson regression
  • Offsets
  • Interpretation of regression coefficients
    • (same as for odds in logistic regression)
  • Correspondence between Poisson regression (Log-linear models) and the multinomial model for contingency tables
    • The “Poisson trick”
contingency tables
Contingency tables
  • Cells 1,2,…, m
  • Cell probabilities p1, . . . , pm
  • Counts y1, . . . , ym
  • Log-likelihood is
contingency tables 2
Contingency tables (2)
  • A “model” for the table is anything that specifies the form of the probabilities, possibly up to k unknown parameters
  • Test if the model is OK by
    • Calculate Deviance = 2(log LMAX - log LMOD)

log LMAX: replace p’s with table frequencies

log LMOD: replace p’s with estimated p’s from the model

    • Model OK if deviance is small,(p-value > 0.05)
    • Degrees of freedom m - 1 - k
    • k = number of parameters in the model
independence models
Independence models
  • Correspond to interactions being zero
  • Fit a “saturated” model using Poisson regression
  • Use anova, stepwise to see which interactions are zero
  • Identify the appropriate model
  • Models can be represented by graphs
odds ratios
Odds Ratios
  • Definition and interpretation
  • Connection to independence
  • Connection with interactions
  • Relationship between conditional OR’s and interactions
  • Homogeneous association model
association graphs
Association graphs
  • Each node is a factor
  • Factors joined by lines if an interaction between them
  • Interpretation in terms of conditional independence
  • Interpretation in terms of collapsibility
contingency tables final topics
Contingency tables: final topics
  • Association reversal
    • Simpson’s paradox
    • When can you collapse
  • Product multinomial
    • comparing populations
    • populations the same if certain interactions are zero
  • Goodness of fit to a distribution
    • Special case of 1-dimensional table
ad