
Stats 330: Lecture 32


Course Summary

- 25 multiple choice questions, similar to term test (60% for 330, 50% for 762)
- 3 “long answer” questions, similar to past exams
- (STATS 330) You have to do all the multiple choice questions, and 2 out of 3 “long answer” questions
- (STATS 762) You have to do a compulsory extra question
- Held on the morning of Wed 31st October 2012

- I will be available in Rm 265 from 10:30 am to 12:00 on Monday, Tuesday and Wednesday in the week before the exam, my schedule permitting
- Tutors: as per David Smith’s email

The course was about

- Graphics for data analysis
- Regression models for data analysis

Important ideas:

- Visualizing multivariate data
- Pairs plots
- 3d plots
- Coplots
- Trellis plots
- Same scales
- Plots in rows and columns

- Diagnostic plots for model criticism

We studied 3 types of regression:

- “Ordinary” (normal, least squares) regression for continuous responses
- Logistic regression for binomial responses
- Poisson regression for count responses (log-linear models)

- Response is assumed to be $N(\mu, \sigma^2)$
- Mean is a linear function of the covariates
- $\mu = \beta_0 + \beta_1 x_1 + \dots + \beta_k x_k$
- Covariates can be either continuous or categorical
- Observations independent, same variance

- Response (s “successes” out of n) is assumed to be Binomial, $\mathrm{Bin}(n, p)$
- Logit of probability, $\log(p/(1-p))$, is a linear function of the covariates
- $\log(p/(1-p)) = \beta_0 + \beta_1 x_1 + \dots + \beta_k x_k$
- Covariates can be either continuous or categorical
- Observations independent

- Response is assumed to be $\mathrm{Poisson}(\mu)$
- Log of mean, $\log(\mu)$, is a linear function of the covariates (log-linear models)
- $\log(\mu) = \beta_0 + \beta_1 x_1 + \dots + \beta_k x_k$
- Or, equivalently, $\mu = \exp(\beta_0 + \beta_1 x_1 + \dots + \beta_k x_k)$
- Covariates can be either continuous or categorical
- Observations independent
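The three model specifications above differ only in how the linear predictor is mapped to the mean. A minimal Python sketch of that mapping follows; the function names are hypothetical and chosen for illustration, not taken from the course.

```python
import math

def linear_predictor(betas, xs):
    """Return b0 + b1*x1 + ... + bk*xk, where betas[0] is the intercept."""
    return betas[0] + sum(b * x for b, x in zip(betas[1:], xs))

def normal_mean(betas, xs):
    # Normal regression: the mean is the linear predictor itself.
    return linear_predictor(betas, xs)

def logistic_prob(betas, xs):
    # Logistic regression: invert the logit, p = 1 / (1 + exp(-eta)).
    return 1.0 / (1.0 + math.exp(-linear_predictor(betas, xs)))

def poisson_mean(betas, xs):
    # Poisson regression: invert the log, mean = exp(eta).
    return math.exp(linear_predictor(betas, xs))
```

With a linear predictor of zero, `logistic_prob` gives a probability of 0.5 and `poisson_mean` gives a mean of 1.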

- For continuous covariates:
- In normal regression, $\beta$ is the increase in mean response associated with a unit increase in x
- In logistic regression, $\beta$ is the increase in log odds associated with a unit increase in x
- In Poisson regression, $\beta$ is the increase in log mean associated with a unit increase in x
- In logistic regression, if x is increased by 1, the odds are increased by a factor of $\exp(\beta)$
- In Poisson regression, if x is increased by 1, the mean is increased by a factor of $\exp(\beta)$
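The multiplicative interpretation can be checked numerically. The sketch below uses made-up coefficient values (`b0`, `b1`) and a made-up covariate value `x`, purely for illustration: raising x by 1 multiplies the odds (logistic) and the mean (Poisson) by exp of the slope.

```python
import math

def odds(p):
    """Odds corresponding to a probability p."""
    return p / (1.0 - p)

def logistic_prob(b0, b1, x):
    """Success probability under a one-covariate logistic model."""
    return 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))

def poisson_mean(b0, b1, x):
    """Mean under a one-covariate Poisson (log-linear) model."""
    return math.exp(b0 + b1 * x)

# Hypothetical values, not estimates from any real data set.
b0, b1, x = -1.0, 0.7, 2.0

# Ratio of odds (or means) at x + 1 versus x.
odds_factor = odds(logistic_prob(b0, b1, x + 1)) / odds(logistic_prob(b0, b1, x))
mean_factor = poisson_mean(b0, b1, x + 1) / poisson_mean(b0, b1, x)
```

Both `odds_factor` and `mean_factor` agree with `math.exp(b1)` up to rounding, whatever starting value of `x` is used.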

- For categorical covariates (main effects only):
- In normal regression, $\beta$ is the increase in mean response relative to the baseline
- In logistic regression, $\beta$ is the increase in log odds relative to the baseline
- In logistic regression, if we change from baseline to some level, the odds are increased by a factor of exp(parameter for that level) relative to the baseline
- In Poisson regression, if we change from baseline to some level, the mean is increased by a factor of exp(parameter for that level) relative to the baseline

- $R^2$ (for normal regression)
- Residual deviance (for logistic and Poisson regression)
- But not for ungrouped data in logistic regression, or for Poisson regression with very small means (cell counts)

- For normal regression,
- Predict response at covariates $x_1, \dots, x_k$
- Estimate mean response at covariates $x_1, \dots, x_k$

- For logistic regression,
- Estimate log-odds at covariates $x_1, \dots, x_k$
- Estimate probability of “success” at covariates $x_1, \dots, x_k$

- For Poisson regression,
- Estimate mean at covariates $x_1, \dots, x_k$

- Summary table
- Estimates of regression coefs
- Standard errors
- Test stats for coef = 0
- $R^2$ etc. (normal regression)
- F-test for null model
- Null and residual deviances (logistic/Poisson)

- Use and interpretation of both forms of anova
- Comparing model with a sub-model
- Adding successive terms to a model

- Collinearity
- VIF’s
- Correlation
- Added variable plots
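To make the link between correlation and VIFs concrete: the variance inflation factor for covariate j is 1/(1 - R_j^2), where R_j^2 comes from regressing covariate j on the other covariates. A minimal sketch for the special case of exactly two covariates, where R_j^2 is just the squared correlation between them (the function names are hypothetical):

```python
import math

def correlation(x, y):
    """Sample (Pearson) correlation between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def vif_two_covariates(x1, x2):
    """VIF shared by both covariates when the model has only x1 and x2."""
    r = correlation(x1, x2)
    return 1.0 / (1.0 - r * r)
```

Uncorrelated covariates give a VIF of 1; the VIF grows without bound as the correlation approaches plus or minus 1.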

- Model selection
- Stepwise procedures: FS, BE, stepwise
- All possible regressions approach
- AIC, BIC, $C_p$, adjusted $R^2$, CV
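As a reminder of how two of these criteria are built, here is a small sketch using the standard definitions AIC = -2 log L + 2k and BIC = -2 log L + k log(n), where k is the number of estimated parameters and n the number of observations. The function names are hypothetical; smaller values indicate a preferred model.

```python
import math

def aic(log_lik, k):
    """Akaike information criterion from a maximised log-likelihood."""
    return -2.0 * log_lik + 2.0 * k

def bic(log_lik, k, n):
    """Bayesian information criterion; penalty grows with sample size n."""
    return -2.0 * log_lik + k * math.log(n)
```

Because log(n) exceeds 2 once n is 8 or more, BIC penalises extra parameters more heavily than AIC for all but very small data sets.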

- Factors
- Baselines
- Levels
- Factor level combinations
- Interactions
- Dummy variables
- Know how to express interactions in terms of means, means in terms of interactions
- Know how to interpret zero interactions

- Fit a separate plane (mean if no continuous covariates) to each combination of factor levels
- Search for a simpler submodel (with some interactions zero) using stepwise and anova

- For non-planar data
- Plot residuals/fitted values, residuals/x’s, partial residual plots, gam plots, Box-Cox plot
- Transform either x’s or response, fit polynomial terms

- For unequal variance
- Plot residuals/fitted values, look for funnel effect
- Weighted least squares
- Transform response

- For outliers and high-leverage points
- Hat matrix diagonals
- Standardised residuals
- Leave-one-out diagnostics
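For a regression with an intercept and a single covariate, the hat matrix diagonals have the closed form h_i = 1/n + (x_i - mean(x))^2 / sum_j (x_j - mean(x))^2. A minimal sketch of that special case (the function name is hypothetical):

```python
def leverages_simple_regression(x):
    """Hat matrix diagonals for a straight-line fit y ~ 1 + x."""
    n = len(x)
    mean_x = sum(x) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    # Points far from the mean of x get leverage close to 1.
    return [1.0 / n + (xi - mean_x) ** 2 / sxx for xi in x]

# The last covariate value is far from the rest, so it has high leverage.
h = leverages_simple_regression([1.0, 2.0, 3.0, 4.0, 10.0])
```

The leverages always sum to the number of regression coefficients (2 here), which is why values well above the average of (number of coefficients)/n are flagged.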

- Independent observations
- Acf plots
- Residual/previous residual
- Time series plot of residuals
- Durbin-Watson test
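The Durbin-Watson statistic is the sum of squared differences between successive residuals divided by the sum of squared residuals. A minimal sketch (the function name is hypothetical, and no p-value is computed):

```python
def durbin_watson(residuals):
    """Durbin-Watson statistic for residuals listed in time order."""
    numerator = sum(
        (residuals[t] - residuals[t - 1]) ** 2
        for t in range(1, len(residuals))
    )
    denominator = sum(e * e for e in residuals)
    return numerator / denominator
```

Values near 2 are consistent with uncorrelated errors, values near 0 suggest positive serial correlation, and values near 4 suggest negative serial correlation.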

- Normality
- Normal plot
- Weisberg-Bingham test
- Box Cox (select power)
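The Box-Cox family that the power is selected from transforms a positive response y to (y^lambda - 1)/lambda, with log(y) as the limiting case at lambda = 0. A minimal sketch of the transformation itself (the function name is hypothetical; choosing lambda by profile likelihood is not shown):

```python
import math

def box_cox(y, lam):
    """Box-Cox transform of a positive value y with power lam."""
    if y <= 0:
        raise ValueError("Box-Cox requires a positive response")
    if lam == 0:
        # Limiting case of (y**lam - 1) / lam as lam tends to 0.
        return math.log(y)
    return (y ** lam - 1.0) / lam
```

A power of 1 leaves the response unchanged apart from a shift, 0.5 corresponds to a square-root transformation, and 0 to a log transformation.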

Log-likelihood is

Deviance $= 2(\log L_{\mathrm{MAX}} - \log L_{\mathrm{MOD}})$

- $\log L_{\mathrm{MAX}}$: replace p’s with frequencies $s_i/n_i$
- $\log L_{\mathrm{MOD}}$: replace p’s with estimated p’s from logistic model i.e.

Can’t use as goodness of fit measure in ungrouped case
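The formulas for this slide did not survive extraction. Assuming the standard grouped binomial setup (group i has $s_i$ successes out of $n_i$ trials with success probability $p_i$), the log-likelihood and the fitted probabilities from the logistic model would presumably read:

```latex
\log L = \sum_i \left[ s_i \log p_i + (n_i - s_i) \log(1 - p_i) \right] + \text{const}

\hat{p}_i = \frac{\exp(\hat\beta_0 + \hat\beta_1 x_{i1} + \dots + \hat\beta_k x_{ik})}
                 {1 + \exp(\hat\beta_0 + \hat\beta_1 x_{i1} + \dots + \hat\beta_k x_{ik})}
```

In the ungrouped case every $n_i = 1$, so each $s_i/n_i$ is 0 or 1 and the $\log L_{\mathrm{MAX}}$ term vanishes (taking $0 \log 0 = 0$). The deviance then reduces to $-2 \log L_{\mathrm{MOD}}$, which is consistent with the remark that it cannot serve as a goodness of fit measure there.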

- Probabilities p: Pr(Y=1)
- Odds: p/(1 - p)
- Log-odds: log(p/(1 - p))

- Pearson
- Deviance

- Offsets
- Interpretation of regression coefficients
- (same as for odds in logistic regression)

- Correspondence between Poisson regression (Log-linear models) and the multinomial model for contingency tables
- The “Poisson trick”

- Cells 1, 2, …, m
- Cell probabilities $p_1, \dots, p_m$
- Counts $y_1, \dots, y_m$
- Log-likelihood is
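The formula itself was lost in extraction. Assuming the usual multinomial model for the m cell counts, it would presumably be:

```latex
\log L = \sum_{i=1}^{m} y_i \log p_i + \text{const},
\qquad \sum_{i=1}^{m} p_i = 1
```

With no restriction on the probabilities, this is maximised at the table frequencies $\hat{p}_i = y_i / n$, where $n = y_1 + \dots + y_m$.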

- A “model” for the table is anything that specifies the form of the probabilities, possibly up to k unknown parameters
- Test if the model is OK by
- Calculate Deviance $= 2(\log L_{\mathrm{MAX}} - \log L_{\mathrm{MOD}})$
- $\log L_{\mathrm{MAX}}$: replace p’s with table frequencies
- $\log L_{\mathrm{MOD}}$: replace p’s with estimated p’s from the model
- Model OK if deviance is small (p-value > 0.05)
- Degrees of freedom m - 1 - k
- k = number of parameters in the model
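As one worked instance of this recipe, the sketch below computes the deviance and its degrees of freedom for the independence model in a two-way table. Substituting table frequencies and fitted probabilities into 2(log LMAX - log LMOD) gives 2 * sum of y * log(y / fitted count). The function name and the example counts are hypothetical.

```python
import math

def independence_deviance(table):
    """Return (deviance, df) for the independence model in a two-way table.

    `table` is a list of rows of counts. Fitted counts under independence
    are (row total * column total) / grand total.
    """
    n = sum(sum(row) for row in table)
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    deviance = 0.0
    for i, row in enumerate(table):
        for j, y in enumerate(row):
            fitted = row_totals[i] * col_totals[j] / n
            if y > 0:  # a zero count contributes nothing (0 * log 0 = 0)
                deviance += 2.0 * y * math.log(y / fitted)
    m = len(table) * len(table[0])                   # number of cells
    k = (len(table) - 1) + (len(table[0]) - 1)       # free row + column probabilities
    return deviance, m - 1 - k

# Made-up counts whose rows are exactly proportional, so the model fits perfectly.
dev, df = independence_deviance([[10, 20], [20, 40]])
```

For an I by J table the degrees of freedom m - 1 - k work out to (I - 1)(J - 1); the p-value would then come from a chi-squared distribution with that many degrees of freedom.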


- Correspond to interactions being zero
- Fit a “saturated” model using Poisson regression
- Use anova, stepwise to see which interactions are zero
- Identify the appropriate model
- Models can be represented by graphs

- Definition and interpretation
- Connection to independence
- Connection with interactions
- Relationship between conditional OR’s and interactions
- Homogeneous association model

- Each node is a factor
- Factors are joined by a line if there is an interaction between them
- Interpretation in terms of conditional independence
- Interpretation in terms of collapsibility

- Association reversal
- Simpson’s paradox
- When can you collapse
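Association reversal is easiest to see with numbers. The counts below are made up purely for illustration (they are not course data): within each level of a third factor, the odds ratio between treatment and outcome is above 1, yet the table collapsed over that factor has an odds ratio below 1.

```python
def odds_ratio(table):
    """Odds ratio of a 2x2 table [[a, b], [c, d]], computed as ad / bc."""
    (a, b), (c, d) = table
    return (a * d) / (b * c)

# Rows: treatment A, treatment B. Columns: success, failure.
# Hypothetical counts for two levels of a third factor.
stratum_1 = [[9, 1], [80, 20]]
stratum_2 = [[30, 70], [2, 8]]

# Collapse over the third factor by adding the two tables cell by cell.
collapsed = [
    [c1 + c2 for c1, c2 in zip(row1, row2)]
    for row1, row2 in zip(stratum_1, stratum_2)
]

conditional_ors = [odds_ratio(stratum_1), odds_ratio(stratum_2)]
marginal_or = odds_ratio(collapsed)
```

Here the conditional odds ratios both favour treatment A while the marginal odds ratio favours treatment B, so this table cannot safely be collapsed over the third factor.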

- Product multinomial
- comparing populations
- populations the same if certain interactions are zero

- Goodness of fit to a distribution
- Special case of 1-dimensional table