
Count Data

Count Data. Harry R. Erwin, PhD School of Computing and Technology University of Sunderland. Resources. Crawley, MJ (2005) Statistics: An Introduction Using R. Wiley. Freund, RJ, and WJ Wilson (1998) Regression Analysis, Academic Press.

Presentation Transcript


  1. Count Data Harry R. Erwin, PhD School of Computing and Technology University of Sunderland

  2. Resources • Crawley, MJ (2005) Statistics: An Introduction Using R. Wiley. • Freund, RJ, and WJ Wilson (1998) Regression Analysis. Academic Press. • Gentle, JE (2002) Elements of Computational Statistics. Springer. • Gonick, L, and W Smith (1993) The Cartoon Guide to Statistics. HarperResource (for fun).

  3. Introduction • These four demonstration sessions of this class address special types of data: • Counts • Proportions • Survival analysis • Binary responses

  4. Frequencies and Proportions • With frequency (count) data, we know how often something happened, but not how often it didn’t happen. • With proportion data (next week), we know both how often it happened and how often it didn’t.

  5. Count Data • Linear regression assumes constant variance and normal errors. This is not appropriate for count data: • Counts are non-negative. • Response variance usually increases with the mean. • Errors are not normally distributed. • Zeros are hard to transform.
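The variance point on this slide can be checked with a small simulation. This is only an illustrative sketch: the group means below are made-up values, not data from the book.

```r
set.seed(1)

# Hypothetical group means, chosen only for illustration
mu <- rep(c(1, 2, 5, 10, 20), each = 200)

# Simulate Poisson counts around those means
y <- rpois(length(mu), lambda = mu)

# Sample mean and variance within each group
group_mean <- tapply(y, mu, mean)
group_var  <- tapply(y, mu, var)

# The variance column should grow roughly in step with the mean column
summary_table <- cbind(mean = group_mean, variance = group_var)
print(round(summary_table, 2))
```

A constant-variance model such as lm() cannot describe this pattern, and it can also produce negative fitted values for a response that is never negative.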

  6. Handling Count Data in R • Use a glm with family=poisson. • This sets the errors to Poisson, so the variance equals the mean. • This sets the link to log, so fitted values are positive. • Book example • If you have overdispersion (residual deviance substantially greater than the residual degrees of freedom), use family=quasipoisson, which lets the variance be proportional to the mean.
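A minimal sketch of the workflow on this slide, using simulated data rather than the book's data set. The variable names x, y and dat, and the coefficients used in the simulation, are assumptions made for illustration.

```r
set.seed(42)

# Simulated data (not the book example): log of the mean count is linear in x
x <- runif(100, 0, 2)
y <- rpois(100, lambda = exp(0.5 + 0.8 * x))
dat <- data.frame(x = x, y = y)

# Poisson errors with the default log link
model <- glm(y ~ x, family = poisson, data = dat)
print(summary(model))

# Rough overdispersion check: residual deviance / residual degrees of freedom
dispersion_ratio <- deviance(model) / df.residual(model)
print(dispersion_ratio)

# If the ratio is well above 1, refit with quasipoisson errors.
# The coefficients are unchanged; the standard errors are rescaled.
model_q <- glm(y ~ x, family = quasipoisson, data = dat)
print(summary(model_q)$dispersion)
```

Because the simulated counts really are Poisson, the ratio here should come out near 1. With real data it is often larger, which is the case the quasipoisson refit is meant for.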

  7. Analysis of Count Data • Book example (230ff) • Use of table() • Use of tapply() • Fitting the glm with family = poisson. • Refitting with family = quasipoisson. • Three- and four-way interactions • Model simplification • Documentation
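The steps listed on this slide can be sketched as follows. The data are simulated, and the variable names (smoker, sex, cells) and effect sizes are assumptions for illustration. This is not a reproduction of the book's analysis, which works with more factors and higher-order interactions.

```r
set.seed(7)
n <- 200

# Simulated explanatory factors (hypothetical)
dat <- data.frame(
  smoker = factor(sample(c("no", "yes"), n, replace = TRUE)),
  sex    = factor(sample(c("female", "male"), n, replace = TRUE))
)

# Assumed truth for the simulation: smokers have a higher mean count, sex has no effect
dat$cells <- rpois(n, lambda = ifelse(dat$smoker == "yes", 2, 0.5))

# table(): how often each count value occurs
print(table(dat$cells))

# tapply(): mean count for each level of a factor
print(tapply(dat$cells, dat$smoker, mean))

# Start from the model with the interaction
full <- glm(cells ~ smoker * sex, family = poisson, data = dat)

# Model simplification: drop the interaction and compare the two fits
reduced <- update(full, . ~ . - smoker:sex)
print(anova(reduced, full, test = "Chisq"))
```

If the fit is overdispersed, both models can be refitted with family = quasipoisson and compared with test = "F" instead.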

  8. Contingency Tables • Risk of data aggregation over important explanatory variables (nuisance variables) • Book example (234ff) • The saturated model • Remove the N-way interaction and see if it was significant. • If the N-way interaction is significant, go no further. • Then remove the scientifically interesting interaction and see if it is significant. • You have to check the nuisance variables first!
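A sketch of the deletion sequence described above, on a made-up 2 x 2 x 2 table. The factor names (treatment, outcome, site) and the counts are hypothetical. Here site plays the role of the nuisance variable and treatment:outcome the scientifically interesting interaction.

```r
set.seed(3)

# One row per cell of a hypothetical three-way contingency table
cells <- expand.grid(
  treatment = c("control", "treated"),
  outcome   = c("absent", "present"),
  site      = c("A", "B")
)
cells$count <- rpois(nrow(cells), lambda = 30)  # simulated cell counts

# The saturated model: one parameter per cell, so it fits the counts exactly
saturated <- glm(count ~ treatment * outcome * site,
                 family = poisson, data = cells)

# Step 1: remove the highest-order interaction and test whether it mattered
no_three_way <- update(saturated, . ~ . - treatment:outcome:site)
print(anova(no_three_way, saturated, test = "Chisq"))

# Step 2 (only if step 1 was not significant): remove the interaction of
# interest while keeping the interactions that involve the nuisance variable
no_interest <- update(no_three_way, . ~ . - treatment:outcome)
print(anova(no_interest, no_three_way, test = "Chisq"))
```

Collapsing the table over site before testing treatment:outcome would be the aggregation risk named on the slide. Keeping the site interactions in the model avoids it.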

  9. ANCOVA with Counts • Book example (237ff) • plotting and use of split to gain insight. • analysis—testing for the need for different slopes. • use of predict() to draw lines through the plot.
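A sketch of these steps on simulated data. The variable names (species, biomass, ph) and the parameter values are assumptions for illustration and are not taken from the book's data file.

```r
set.seed(11)

# Simulated data: a count response, one continuous covariate, one factor
dat <- data.frame(
  biomass = runif(90, 0, 10),
  ph      = factor(rep(c("low", "mid", "high"), each = 30),
                   levels = c("low", "mid", "high"))
)
intercepts <- c(low = 2.5, mid = 3.0, high = 3.5)  # assumed, on the log scale
dat$species <- rpois(
  90, lambda = exp(intercepts[as.character(dat$ph)] - 0.1 * dat$biomass)
)

# Plot the points, using split() to give each factor level its own symbol
plot(species ~ biomass, data = dat, type = "n")
groups <- split(dat, dat$ph)
for (i in seq_along(groups)) {
  points(groups[[i]]$biomass, groups[[i]]$species, pch = i)
}

# Test whether different slopes are needed: interaction versus additive model
different_slopes <- glm(species ~ biomass * ph, family = poisson, data = dat)
common_slope     <- glm(species ~ biomass + ph, family = poisson, data = dat)
print(anova(common_slope, different_slopes, test = "Chisq"))

# predict() with type = "response" back-transforms from the log scale,
# so the fitted lines are curves on the original count scale
grid <- seq(0, 10, length.out = 50)
for (i in seq_along(groups)) {
  newdata <- data.frame(
    biomass = grid,
    ph = factor(names(groups)[i], levels = levels(dat$ph))
  )
  lines(grid, predict(different_slopes, newdata, type = "response"), lty = i)
}
```

The simulation uses a common slope, so the interaction test would usually be non-significant here. With real data the same comparison decides whether the simpler model is adequate.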

  10. Frequency Distributions • Book example (240ff) • testing for independence • use of table() • use of dpois() • plotting and interpretation • use the negative binomial distribution for data with variance much greater than the mean • use the binomial distribution for data with variance less than the mean
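A sketch of comparing an observed frequency distribution with Poisson and negative binomial expectations. The counts are simulated from a negative binomial distribution with made-up parameters, so the Poisson fit is expected to look poor. The moment estimate of the aggregation parameter k is one common choice, used here as an assumption rather than as the book's exact procedure.

```r
set.seed(5)

# Simulated, aggregated counts (hypothetical parameters)
counts <- rnbinom(200, size = 1, mu = 2)
values <- 0:max(counts)

# table() on a factor keeps count values that happen not to occur
observed <- table(factor(counts, levels = values))

m <- mean(counts)
v <- var(counts)
print(v / m)  # a ratio well above 1 suggests aggregation

# Expected frequencies under a Poisson distribution with the same mean
expected_pois <- length(counts) * dpois(values, lambda = m)

# Moment estimate of the negative binomial k; only valid when v > m
k <- m^2 / (v - m)
expected_nb <- length(counts) * dnbinom(values, size = k, mu = m)

# Side-by-side bars: observed, Poisson, negative binomial
barplot(
  rbind(as.vector(observed), expected_pois, expected_nb),
  beside = TRUE, names.arg = values,
  legend.text = c("observed", "Poisson", "negative binomial"),
  xlab = "count", ylab = "frequency"
)
```

When the variance is smaller than the mean, the same comparison can be made with dbinom() in place of dnbinom().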
