##### Biostatistics 760


**Biostatistics 760**

Random Thoughts

**Upcoming Classes**

• Bios 761: Advanced Probability and Statistical Inference
• Bios 763: Generalized Linear Model Theory and Applications
• Bios 767: Longitudinal Data Analysis
• Bios 780: Theory and Methods for Survival Analysis
• Bios 841: Statistical Consulting

**Bios 761**

• Frequentist and Bayesian decision theory
• Hypothesis testing: UMP tests, etc.
• Bootstrap and other methods of inference
• Stochastic processes:
  • Poisson processes
  • Markov chains
  • Martingales
  • Brownian motion

**Bios 780**

• Time-to-event data
• Right censoring
• Counting processes; martingales
• Semiparametric approaches
• Kaplan-Meier estimator
• Log-rank statistic
• Cox model
• Data analysis

**Bios 841**

• Consulting versus collaboration
• Bringing it all together to solve problems
• Communicating about statistics
• Three real problems
• Three journal-style reports
• One final oral presentation
• Real-time problem solving
• What is the role of statistical theory?

**A Few War Stories**

• As a student: thesis on surrogates
• As a postdoc: infectious diseases
• As a new professor: cystic fibrosis (CF)*
• Working on tenure: empirical processes
• Empirical processes and cancer*
• Chair of the DSMC for NICHD
• Artificial intelligence and NSCLC

**CF Neonatal Screening**

• 1992: Joined Phil Farrell’s CF study team
• 1997: Farrell, Kosorok, Laxova, et al. published in NEJM
• 2004 (Oct. 15): CDC recommended CF newborn screening; the 1997 article was judged the only valid randomized trial
• States offering CF newborn screening: 3 in 1997, 12 in 2004, 45 today

**What Role Did “Theory” Play?**

• Used state-of-the-art statistical methods that were robust (GEE)
• In other CF research we have used:
  • Current status methods (parametric, robust)
  • Constrained regression estimation
  • Semiparametric bootstrap inference
  • Martingale-based survival analysis
• New work using artificial intelligence

**Empirical Processes and Cancer**

• Non-Hodgkin’s Lymphoma Prognostic Factors Project (1993, NEJM)
• Cox proportional hazards model employed to ascertain risks of 5 prognostic factors: Age, performance Status, serum lactate dehydrogenase Level, number of extranodal disease Sites, tumor Stage
• Diagnostics show the model fits poorly

**What is the Problem?**

• Poor survival function prediction
• Possibly incorrect interpretation of risk factor effects
• A model that adds a single parameter to the Cox model was developed and fit
• This new model fits well (Kosorok, Lee, and Fine, 2004)
• Inference for the new model is complicated

**What Does Theory Tell Us?**

• We can derive valid inferential tools for the new model: estimation and bootstrap
• Robustness was also studied: we learn theoretically that the Cox model is robust to this kind of model misspecification:
  • The direction of the regression coefficients is preserved
  • Should use robust variance for Cox model

**Theory Versus Applications**

• The title implies there is conflict between theory and applications
• This isn’t true!
• Theory provides a basis for correct thinking and problem solving for applications
• Applications drive new theoretical development

**Theory Can Be Impractical**

• Law of the iterated logarithm: needs a sample size of 10⁸ (“asymptopia”).
• Sometimes higher-order approximations are needed before it becomes useful.
• Sometimes computational properties of asymptotically optimal estimators are poor.
• Some hard problems take years to solve.

**Why Theory is Needed**

• Often it does work for practical sample sizes.
• Can reveal properties that are universally valid: simulation studies are limited to the scenarios investigated.
• Theory can lead toward methodological solutions (Cook and Kosorok, 2004, JASA).
• Theory can drive scientific discovery.
• Some results are beautiful.

**Data Mining Versus Inference**

• Data mining is summarizing and representing data no matter how complicated
• Inference is determining valid measures of uncertainty
• Patterns obtained from data mining can be misleading
• Inference without data mining may miss important structure

**The Core of Statistics**

• Statistics is the science of science
• How do we learn from our world and draw meaningful and valid conclusions from it?
• Need both data mining and valid inference
• Requires a unique kind of intuition
• Needs many different intellectual perspectives
• One of the most challenging of all fields

**Everyone Needs Core Literacy**

• All statisticians need to know enough theory to have core literacy about statistics and to be able to solve problems
• All statisticians need to know enough about applications to know what is important
• All biostatisticians need to know enough statistical methods to be useful in practice
• The purpose of a Ph.D. in Biostatistics is to enable the creation of new methodology

**Semiparametric Inference**

• The study of statistical models with parametric and/or nonparametric parts
• Can achieve trade-off between scientific meaning and model “robustness”
• Estimation and inference are often hard
• There exists an efficiency bound for parametric and some nonparametric parts
• NPMLE, testing and estimating equations

**Empirical Processes**

• Tools for complex model inference and high-dimensional data
• Can determine universal properties of semiparametric methods:
  • Consistency
  • Rate of convergence
  • Limiting distributions
  • Valid inference (empirical process bootstrap)
• Empirical processes are everywhere

**The Road Ahead**

• Whatever you choose to do, the core statistical theory classes will help you.
• Be patient as you learn.
• Be willing to work hard (struggle is good).
• It takes many different kinds of thinkers with different learning styles.
• There are important discoveries to be made in both applications and theory.