
Buried treasures: Old statistics in new contexts

Explore the application of old statistical methods in new and interesting contexts: how genomics meets sample surveys, how rank statistics underpin the theory of bootstrapping, and how stochastic geometry applies to cancer genetics.


Presentation Transcript


  1. Buried treasures: Old statistics in new contexts

  2. “If I have seen further it is by standing on the shoulders of giants” - Isaac Newton

  3. One form of the past effect: You are dealing with a statistical problem in a special context. You solve it by realizing a new interpretation of an old, interesting, but uncelebrated result, which was developed in a completely different context.

  4. Three vignettes. V1: Genomics meets sample surveys (methodology). V2: Bootstrapping and rank statistics (theory). V3: Cancer genetics and stochastic geometry (application).

  5. V1: Genomics meets sample surveys (methodology) V2: Bootstrapping and rank statistics (theory) V3: Cancer genetics and stochastic geometry (application)

  6. V1: Genomics meets sample surveys. Context: Second-order gene-set enrichment analysis. Buried treasure: J.W. Tukey, 1950, Some sampling simplified. J. Amer. Statist. Assoc., 45, 501-519. John Tukey

  7. Context D Pyeon, MA Newton, PF Lambert, JA den Boon, S Sengupta, CJ Marsit, CD Woodworth, JP Connor, TH Haugen, EM Smith, KT Kelsey, LP Turek and P Ahlquist (2007). Fundamental Differences in Cell Cycle Deregulation in Human Papillomavirus Positive and Human Papillomavirus Negative Head/Neck and Cervical Cancers. Cancer Research, 67, 4605-4619. MA Newton, X Ma, D Sarkar, D Pyeon, and P Ahlquist (2007). Second order enrichment analysis of microarray expression data reveals gene sets with heterogeneous activation states. Submitted.


  9. [Figure: slice of expression data from Pyeon et al. 2007. Axes: genes (a few) by tissue samples (HPV-, HPV+).]

  10. [Figure: density of fold changes between HPV+ and HPV- (all genes). Horizontal axis: log2 [ HPV+ / HPV- ], from -2 to 2; vertical axis: density.]

  11. The post-processing problem: expression results + exogenous biology

  12. Exogenous biology: B = { c : c = {genes with a specific property} }, e.g. gene ontology (GO), Kyoto Encyclopedia of Genes and Genomes (KEGG)

  13. In the HPV example, cell cycle may be an interesting gene set: excess differential expression in both directions; large sample variance (largest in KEGG, GO).

  14. Expression results: Gene set: B Gene set variance: Standardized statistic:

  15. Connection: C indexes a simple random sample of genes, i.e. finite population sampling. Centering: ?? Scaling:

  16. Following Tukey’s 1950 calculation involving “K” functions (set-level statistics whose expected value equals the same statistic computed on the whole population), we get:
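
The inheritance property quoted on this slide can be checked exactly in its simplest case, the sample variance. The sketch below is an illustration only, not the calculation from the talk: it enumerates every gene set of a fixed size drawn from a tiny finite population and averages the set-level variance. The fold-change values and the set size are hypothetical.

```python
from itertools import combinations
from statistics import mean, variance


def expected_set_variance(population, set_size):
    """Average the sample variance (divisor m - 1) over every subset of size m.

    Under simple random sampling without replacement all subsets are equally
    likely, so this average is the exact expectation of the set-level variance.
    """
    return mean(variance(subset) for subset in combinations(population, set_size))


# Hypothetical log2 fold changes for a toy "genome" of 8 genes.
fold_changes = [-1.5, -0.4, -0.1, 0.0, 0.2, 0.3, 0.9, 2.1]

set_level_expectation = expected_set_variance(fold_changes, set_size=3)
population_variance = variance(fold_changes)  # same statistic on all genes (divisor N - 1)

print(set_level_expectation, population_variance)
```

The two printed numbers agree up to floating-point rounding, which is the sense in which a set-level statistic "inherits" its population value on average.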

  17. where
        b1    b2
         1     0
        -3     1
        -4     0
        12    -2
        -6     1

  18. V1: Genomics meets sample surveys (methodology) V2: Bootstrapping and rank statistics (theory) V3: Cancer genetics and stochastic geometry (application)

  19. V2: Bootstrapping and rank statistics. Context: Mason and Newton, 1992, A rank statistics approach to the consistency of a general bootstrap. Ann. Statist., 20, 1611-24. Buried treasure: J. Hajek, 1961, Some extensions of the Wald-Wolfowitz-Noether theorem. Ann. Math. Statist., 32, 506-523. Jaroslav Hajek

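  20. Data: $X_1, \dots, X_n$ iid with mean $\mu$ and variance $\sigma^2$. CLT: $\sqrt{n}(\bar X_n - \mu) \to N(0, \sigma^2)$ in distribution. Bootstrap mean: $\bar X_n^* = \frac{1}{n} \sum_{i=1}^n M_{n,i} X_i$, where $(M_{n,1}, \dots, M_{n,n})$ are multinomial counts ($n$ draws, equal cell probabilities $1/n$). Bootstrap CLT: conditionally on the data, $\sqrt{n}(\bar X_n^* - \bar X_n) \to N(0, \sigma^2)$.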

  21. Generalized bootstrap: exchangeable weights. Mason and Newton asked: what is the CLT for this case?
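
A minimal sketch of the two weighting schemes, assuming nothing beyond the slide text: the ordinary bootstrap mean written as a weighted mean with multinomial weights, and one example of other exchangeable weights (flat Dirichlet weights, chosen here only for illustration; the slides do not name a specific scheme). The data and all function names are hypothetical.

```python
import random
from collections import Counter
from statistics import mean


def multinomial_weights(n, rng):
    """Ordinary bootstrap: counts from n draws with replacement, scaled by 1/n."""
    counts = Counter(rng.choices(range(n), k=n))
    return [counts[i] / n for i in range(n)]


def dirichlet_weights(n, rng):
    """Flat Dirichlet weights, built as normalized iid standard exponentials."""
    draws = [rng.expovariate(1.0) for _ in range(n)]
    total = sum(draws)
    return [d / total for d in draws]


def weighted_mean(x, weights):
    """Bootstrap mean in weighted form: sum of weight times data value."""
    return sum(w * xi for w, xi in zip(weights, x))


rng = random.Random(0)
x = [rng.gauss(0.0, 1.0) for _ in range(50)]  # hypothetical data

boot_multinomial = [weighted_mean(x, multinomial_weights(len(x), rng)) for _ in range(2000)]
boot_dirichlet = [weighted_mean(x, dirichlet_weights(len(x), rng)) for _ in range(2000)]

print(mean(x), mean(boot_multinomial), mean(boot_dirichlet))
```

Both weight vectors are nonnegative, sum to one, and are exchangeable, so both sets of replicates center on the sample mean; only the weight distribution differs.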

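  22. Consider two triangular arrays of numbers, $a_n(i)$ and $b_n(i)$, $i = 1, \dots, n$. For a random permutation $(R_1, \dots, R_n)$ of $1, \dots, n$, form the sum $S_n = \sum_{i=1}^n a_n(i)\, b_n(R_i)$.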

  23. Notes about the sum: it is a linear rank statistic, studied in nonparametrics; Hajek 1961 gives weak conditions for asymptotic normality (AN).

  24. Back to the general bootstrap problem: now condition on both data and weights. Key fact: random permutation. This is precisely a linear rank statistic, and Hajek (1961) gives general conditions for its asymptotic normality.
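
To make the linear rank statistic concrete, here is a small sketch that enumerates every permutation for a short pair of arrays and compares the exact mean and variance of the sum with the standard permutation-moment formulas. Those formulas are textbook results added here for illustration and are not taken from the slides; the arrays `a` (standing in for data) and `b` (standing in for ordered weights) are hypothetical.

```python
from itertools import permutations
from statistics import mean, pvariance


def linear_rank_statistic(a, b, perm):
    """Sum of a[i] * b[perm[i]] for one permutation of the indices."""
    return sum(a_i * b[r] for a_i, r in zip(a, perm))


def sum_sq_dev(values):
    """Sum of squared deviations from the mean."""
    m = mean(values)
    return sum((v - m) ** 2 for v in values)


a = [0.3, -1.2, 0.8, 2.0, -0.5]     # hypothetical data values
b = [0.05, 0.10, 0.15, 0.30, 0.40]  # hypothetical ordered weights
n = len(a)

# All n! permutations are equally likely, so these are exact moments.
values = [linear_rank_statistic(a, b, p) for p in permutations(range(n))]
exact_mean = mean(values)
exact_var = pvariance(values)

# Standard moment formulas for a uniformly random permutation.
formula_mean = n * mean(a) * mean(b)
formula_var = sum_sq_dev(a) * sum_sq_dev(b) / (n - 1)

print(exact_mean, formula_mean)
print(exact_var, formula_var)
```

With exchangeable weights, conditioning on the data and on the sorted weights leaves only the assignment of weights to observations random, which is why the weighted bootstrap mean takes this form.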

  25. V1: Genomics meets sample surveys (methodology) V2: Bootstrapping and rank statistics (theory) V3: Cancer genetics and stochastic geometry (application)

  26. V3: Cancer genetics and stochastic geometry. Context: Cellular events during tumor initiation, intestinal cancer. Buried treasure: P. Armitage, 1949, An overlap problem arising in particle counting. Biometrika, 36, 257-266. Peter Armitage

  27. Context AT Thliveris, RB Halberg, L Clipson, WF Dove, R Sullivan, MK Washington, S Stanhope, and MA Newton (2005). Polyclonality of familial murine adenomas: Analyses of mouse chimeras with low tumor multiplicity suggest short-range interactions. PNAS, 102, 6960-6965. MA Newton, L Clipson, AT Thliveris and RB Halberg (2006). A statistical test of the hypothesis that polyclonal intestinal tumors arise by random collision of initiated clones. Biometrics, 62, 721-7. MA Newton (2006). On estimating the polyclonal fraction in lineage marker studies of tumor origin. Biostatistics, 7, 503-14.


  30. Monoclonal theory of tumor origin: genetic defect appears in a cell

  31. Monoclonal theory of tumor origin: aberrant cell divides and persists

  32. Aggregation chimeras provide data on clonality.

  33. B6 Apc Min/+ Mom1 R/R <--> B6 Apc Min/+ Mom1 R/R Rosa26/+

  34. B6 Apc Min/+ Mom1 R/R <--> B6 Apc Min/+ Mom1 R/R Rosa26/+ Heterotypic tumor!

  35. Summary count data

  36. Many heterotypic tumors … but why? Clonal cooperation: recruitment; selection.

  37. Many heterotypic tumors … but why? Clonal cooperation: recruitment; selection. Random collision.

  38. Key parameters: # initiated clones, collision distance. Induced R.V.’s: # isolated clones, # doublets, # triplets, # tumors (one mouse).

  39. Key parameters: # initiated clones, collision distance. Induced R.V.’s: # isolated clones, # doublets, # triplets, # tumors (one mouse). Intractable distribution!!

  40. where But thanks to Armitage, 1949,

  41. Armitage was studying dust particles … not cancer
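
The random-collision model of slides 38 and 39 can be explored by simulation. The sketch below is a rough illustration under stated assumptions, not the model fitted in the cited papers: clones are dropped uniformly on a unit square (a hypothetical stand-in for the tissue), any two clones closer than the collision distance are linked, and each linked group counts as one tumor. It does not use Armitage's approximations; the function name and parameter values are made up for the example.

```python
import random
from collections import Counter


def clump_size_counts(n_clones, delta, rng):
    """Return a Counter mapping clump size to the number of clumps of that size.

    Clones are uniform points on the unit square; two clones within distance
    delta are linked, and clumps are the connected groups (via union-find).
    """
    points = [(rng.random(), rng.random()) for _ in range(n_clones)]
    parent = list(range(n_clones))

    def find(i):
        # Follow parent links to the root, shortening the path on the way.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(n_clones):
        for j in range(i + 1, n_clones):
            dx = points[i][0] - points[j][0]
            dy = points[i][1] - points[j][1]
            if dx * dx + dy * dy < delta * delta:
                parent[find(i)] = find(j)

    clump_sizes = Counter(find(i) for i in range(n_clones))
    return Counter(clump_sizes.values())


rng = random.Random(1)
counts = clump_size_counts(n_clones=40, delta=0.05, rng=rng)

n_isolated, n_doublets, n_triplets = counts[1], counts[2], counts[3]
n_tumors = sum(counts.values())  # every clump, whatever its size, is one tumor
print(n_isolated, n_doublets, n_triplets, n_tumors)
```

Repeating the call many times gives a Monte Carlo picture of the joint distribution of the induced counts, which is the distribution the slides describe as intractable in closed form.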

  42. Closing the inference loop • Lineage marking • Unknown N’s • Extra Poisson variation

  43. Conditional predictive p-values

  44. One form of the past effect: You are dealing with a statistical problem in a special context. You solve it by realizing a new interpretation of an old, interesting, but uncelebrated result, which was developed in a completely different context.

  45. John Tukey (1915-2000), Jaroslav Hajek (1926-1974), Peter Armitage (1924-present)

  46. # citations of key paper: John Tukey (1915-2000), 8; Jaroslav Hajek (1926-1974), 43; Peter Armitage (1924-present), 9
