Predicting Count Data

Predicting Count Data: Poisson Regression

Review: Confusing Statistical Terms
General Linear Model (GLM)
- Anything that can be written like this: $Y = B_0 + B_1 X_1 + \dots + B_k X_k + e$
- Solved using ordinary least squares
- Assumptions revolve around the normal distribution
Generalized Linear Model
- Anything that can be written like this: $g(E[Y]) = B_0 + B_1 X_1 + \dots + B_k X_k$, where $g$ is a link function
- Solved using maximum likelihood
- Assumptions use many different distributions

Remember: Why These Models?
• Linear regression assumes normal errors around the predicted score
• When we violate this assumption, our estimates of the distributions of the B's are incorrect
• Also, in some cases our estimates of the effect size are inaccurate (usually too small)

Linear Regression
• Linear regression is really a predictive model before anything else. (The statistical aspect is extra.)
[Figure: regression line, labeled B0 (intercept) and B1 (slope)]

Count Data

Examples
• (Criminal Justice) Number of offenses per year
• (Domestic Violence) Number of DV events per person
• (Epidemiology) Number of seizures per week

Count Data
• This type of data can only take discrete values that are greater than or equal to zero.
• In many situations, these data follow the Poisson distribution

Poisson Distribution
• The Poisson random variable is defined by one parameter: the mean (μ)
• It has the strong assumption that the mean is equal to the variance: μ = σ²
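The mean-equals-variance property can be checked numerically. The sketch below is only an illustration: the helper names (`poisson_pmf`, `pmf_mean_and_variance`), the choice μ = 3.5, and the truncation at 100 are assumptions made here, not part of the slides. It computes the mean and variance directly from the Poisson probability mass function.

```python
import math

def poisson_pmf(k, mu):
    """P(Y = k) for a Poisson variable with mean mu."""
    return math.exp(-mu) * mu ** k / math.factorial(k)

def pmf_mean_and_variance(mu, max_k=100):
    """Mean and variance computed from the PMF, truncated at max_k.

    The truncation is an approximation; for small mu the tail
    beyond max_k is negligible.
    """
    probs = [poisson_pmf(k, mu) for k in range(max_k + 1)]
    mean = sum(k * p for k, p in enumerate(probs))
    variance = sum((k - mean) ** 2 * p for k, p in enumerate(probs))
    return mean, variance

# mu = 3.5 is an arbitrary illustrative value
mean, variance = pmf_mean_and_variance(mu=3.5)
print(f"mean = {mean:.4f}, variance = {variance:.4f}")
```

Both numbers should agree with μ up to floating-point error, which is the single-parameter property the slide describes.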

Poisson Regression
• In this model, instead of predicting the mean of a normal distribution, you are predicting the mean of a Poisson distribution (given some predictors)

Fundamental Equation
• In linear regression: $Y = B_0 + B_1 X_1 + \dots + B_k X_k + e$
• In Poisson regression: $\ln(\mu) = B_0 + B_1 X_1 + \dots + B_k X_k$
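To make the Poisson equation concrete, here is a minimal sketch that fits the single-predictor case, ln(μ) = b0 + b1·x, by maximum likelihood using Newton-Raphson. Everything in it is an assumption for illustration: the function name `fit_poisson`, the starting values, and the toy data, which are made up so that the counts double with each unit of x. In practice you would use a statistics package rather than hand-rolled code.

```python
import math

def fit_poisson(x, y, iterations=25, tol=1e-10):
    """Fit ln(mu) = b0 + b1 * x by Newton-Raphson on the Poisson log-likelihood."""
    b0 = math.log(sum(y) / len(y))  # start from the intercept-only model
    b1 = 0.0
    for _ in range(iterations):
        mu = [math.exp(b0 + b1 * xi) for xi in x]
        # Gradient of the log-likelihood with respect to (b0, b1)
        g0 = sum(yi - mi for yi, mi in zip(y, mu))
        g1 = sum((yi - mi) * xi for xi, yi, mi in zip(x, y, mu))
        # Information matrix (negative Hessian), a symmetric 2x2 matrix
        h00 = sum(mu)
        h01 = sum(mi * xi for xi, mi in zip(x, mu))
        h11 = sum(mi * xi * xi for xi, mi in zip(x, mu))
        det = h00 * h11 - h01 * h01
        # Newton step: solve the 2x2 system by the explicit inverse
        step0 = (h11 * g0 - h01 * g1) / det
        step1 = (h00 * g1 - h01 * g0) / det
        b0 += step0
        b1 += step1
        if abs(step0) < tol and abs(step1) < tol:
            break
    return b0, b1

# Made-up toy data: counts that double with each one-unit increase in x
x = [0, 1, 2, 3, 4]
y = [1, 2, 4, 8, 16]
b0, b1 = fit_poisson(x, y)
print(f"b1 = {b1:.4f}, exp(b1) = {math.exp(b1):.4f}")
```

Because the toy counts follow μ = 2^x exactly, the fit should recover b0 close to 0 and b1 close to ln 2, so exp(b1) is close to 2.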

Assumptions
• In your outcome variable (Y), the mean equals the variance, given the predictors. (There is a test for this.)
• For violations you can use negative binomial regression, which is essentially a Poisson model where the variance is estimated separately from the mean.
• Observations are independent (as with most analyses)
• And, basically, that the predictive model makes sense ($\ln(\mu) = B_0 + B_1 X_1 + \dots + B_k X_k$)
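The slide does not say which test it has in mind. As one rough check, the sketch below computes the variance-to-mean ratio of the raw counts and the classical index-of-dispersion statistic, (n − 1)·s²/ȳ, which is approximately chi-square with n − 1 degrees of freedom if the counts are Poisson. The function name `dispersion_check` and the counts are made up for illustration, and this version ignores predictors, so treat it as a first look rather than a formal model diagnostic.

```python
import statistics

def dispersion_check(counts):
    """Return (mean, variance/mean ratio, index-of-dispersion statistic, df)."""
    n = len(counts)
    mean = statistics.mean(counts)
    variance = statistics.variance(counts)  # sample variance (n - 1 denominator)
    ratio = variance / mean                 # about 1 for Poisson data
    statistic = (n - 1) * ratio             # compare to chi-square with n - 1 df
    return mean, ratio, statistic, n - 1

# Made-up counts with a few large values, i.e. likely overdispersed
counts = [0, 1, 0, 2, 9, 0, 1, 14, 0, 3]
mean, ratio, statistic, df = dispersion_check(counts)
print(f"mean = {mean:.2f}, variance/mean = {ratio:.2f}, "
      f"statistic = {statistic:.2f} on {df} df")
```

A ratio well above 1, as with these made-up counts, suggests overdispersion, which is the situation where the negative binomial model is the usual alternative.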

Interpreting Parameters
• Like logistic regression, we have to interpret EXP(B)
• (This is the notation for $e^{B}$)
• Instead of an odds ratio, this is a relative risk (rate ratio): the factor by which the expected count is multiplied for a one-unit increase in X
• A value of 1 corresponds to the null hypothesis (no effect)
• 1.2 would mean a 20% increase in the expected count for a one-unit increase in X
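The arithmetic behind EXP(B) can be sketched as follows. The coefficients are hypothetical values chosen so that EXP(B) comes out at 1.2, matching the example on the slide; the helper names are also assumptions made for this illustration.

```python
import math

def rate_ratio(b):
    """EXP(B): multiplicative change in the expected count per unit of X."""
    return math.exp(b)

def percent_change(b):
    """The same effect expressed as a percent change in the expected count."""
    return (math.exp(b) - 1.0) * 100.0

def expected_count(b0, b1, x):
    """Expected count from the model ln(mu) = b0 + b1 * x."""
    return math.exp(b0 + b1 * x)

# Hypothetical coefficients, chosen so that EXP(B1) = 1.2
b0, b1 = 0.5, math.log(1.2)

ratio = rate_ratio(b1)
pct = percent_change(b1)
# Ratio of expected counts one unit apart; it equals EXP(B1) at any x
observed_ratio = expected_count(b0, b1, 4) / expected_count(b0, b1, 3)

print(f"EXP(B) = {ratio:.2f}, percent change = {pct:.1f}%")
```

Note that the effect is multiplicative: the ratio of expected counts one unit apart is the same wherever you start on X, while the difference in expected counts is not.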

Really, why the trouble?
• It turns out that not using Poisson isn't the worst thing ever.
• You actually get alpha deflation (the test is more conservative than its nominal level)
• BUT many journals that are used to this kind of data will reject articles that do not use the proper technique
