
Data and Statistics: New methods and future challenges
Phil O’Neill, University of Nottingham


Presentation Transcript

Data and Statistics: New methods and future challenges
Phil O’Neill
University of Nottingham


Professors: How they spend their time



1. High-resolution genetic data
2. Model assessment


Gardy 2011 NEJM


“High-resolution genetic data”: what are they?
- individual-level data on the pathogen
- can be taken at single or multiple time points
- high-dimensional, e.g. whole genome sequences
- proportion of individuals sampled could be high/low
- becoming far more common due to cost reduction


“High-resolution genetic data”: what use are they?
- better inference about transmission paths
- more reliable estimates of epi quantities?
- understand evolution of the pathogen


A C C C T T G G G A A A .....


Modelling and Data Analysis methods
Two kinds of approaches exist:
1. Separate genetic and epidemic components (e.g. Volz, Rasmussen)
2. Combine genetic and epidemic components (e.g. Ypma, Worby, Morelli)


1. Separate genetic and epidemic components
e.g.
- estimate phylogenetic tree
- given the tree, fit epidemic model
or
- cluster individuals into genetically similar groups
- given the groups, fit multi-type epidemic model
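The clustering step can be sketched as follows. This is only an illustration under assumptions that are not in the slides: sequences are already aligned, genetic similarity is measured by Hamming distance, groups are formed by single linkage, and the threshold and toy sequences are made up.

```python
from itertools import combinations


def hamming(a, b):
    """Number of positions at which two aligned sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b))


def cluster(seqs, threshold):
    """Single-linkage clustering: join any pair within `threshold` differences."""
    parent = list(range(len(seqs)))

    def find(i):
        # Follow parent links to the group representative, compressing the path.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(range(len(seqs)), 2):
        if hamming(seqs[i], seqs[j]) <= threshold:
            parent[find(i)] = find(j)

    groups = {}
    for i in range(len(seqs)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


# Toy aligned sequences, invented for illustration only.
seqs = ["ACCCTTGGGAAA", "ACCCTTGGGAAT", "ACGCTTGGGAAT", "TTTTTTGGGCCC"]
groups = cluster(seqs, threshold=2)
print(groups)
```

Each resulting group would then be treated as one "type" when fitting a multi-type epidemic model; that second step is not shown here.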


1. Separate genetic and epidemic components
+ “Simple” approach
+ Avoids complex modelling
- Ignores any relationship between transmission and genetic information


2. Combine genetic and epidemic components
e.g.
- model genetic evolution explicitly
- define model featuring both genetic and epidemic parts


2. Combine genetic and epidemic components
+ “Integrated” approach
- Is modelling too detailed?
- Initial conditions: typical sequence?
+/- Model differences between individuals instead?



“Model assessment”: what is it?
- Does our model fit the data?
- Is there a better model?


“Model assessment”: why do it?
- Poor fit casts doubt on conclusions from modelling
- Model choice can be a tool for directly addressing questions of interest


Linear regression: $y_k = a x_k + b + e_k$, $e_k \sim N(0, v)$
Minimise distance of model mean from observed data
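For reference, the regression case can be sketched in a few lines. This uses ordinary least squares, so "distance" is taken to mean the sum of squared residuals (an assumption; the slide does not name the criterion), and the toy data are invented.

```python
def fit_line(xs, ys):
    """Least-squares estimates of a and b in y_k = a*x_k + b + e_k."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    a = sxy / sxx
    b = mean_y - a * mean_x
    # Residuals: observed data minus the model mean a*x_k + b.
    residuals = [y - (a * x + b) for x, y in zip(xs, ys)]
    # Unbiased estimate of the error variance v (two fitted parameters).
    v = sum(r * r for r in residuals) / (n - 2)
    return a, b, v, residuals


# Toy data, invented for illustration only.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.1, 2.9, 5.2, 7.1, 8.8]
a, b, v, residuals = fit_line(xs, ys)
print(a, b, v)
```

Here the residuals are unambiguous: each is an observed value minus the fitted mean at the same x.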


For outbreak data: What are the right residuals? Should observed or unobserved data be compared to the model? (Streftaris and Gibson) Mean model may only be available via simulation Is the mean the right quantity to consider?


For outbreak data: What are the right residuals? Should observed or unobserved data be compared to the model? (Streftaris and Gibson) Mean model may only be available via simulation Is the mean the right quantity to consider?


Simulation-based approaches to model fit:
- Forward simulation – “close” to data?
- Choice of summary statistics?
- Close ties to ABC methods (McKinley, Neal)
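A minimal sketch of a forward-simulation check, under assumptions not taken from the slides: a Markov SIR model with infection rate beta*S*I and removal rate gamma*I, the final outbreak size as the single summary statistic, and fixed parameter values where a real analysis would use posterior draws. All numbers are made up.

```python
import random


def simulate_final_size(beta, gamma, n_susceptible, n_infective, rng):
    """Final size of one Markov SIR outbreak, simulated via its jump chain."""
    s, i = n_susceptible, n_infective
    while i > 0:
        infection_rate = beta * s * i
        removal_rate = gamma * i
        # The next event is an infection with probability proportional to its rate.
        if rng.random() < infection_rate / (infection_rate + removal_rate):
            s -= 1
            i += 1
        else:
            i -= 1
    return n_susceptible - s


def tail_proportion(observed, beta, gamma, n_susceptible, n_infective,
                    n_sims=2000, seed=1):
    """Proportion of simulated final sizes at least as large as the observed one."""
    rng = random.Random(seed)
    sims = [simulate_final_size(beta, gamma, n_susceptible, n_infective, rng)
            for _ in range(n_sims)]
    return sum(x >= observed for x in sims) / n_sims, sims


# Hypothetical values: 50 initial susceptibles, 1 initial infective, 30 observed cases.
proportion, sims = tail_proportion(observed=30, beta=0.04, gamma=1.0,
                                   n_susceptible=50, n_infective=1)
print(proportion)
```

A proportion very close to 0 or 1 would suggest the observed final size is unusual under the model. Whether final size is an adequate summary is exactly the "choice of summary statistics" question.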


Approaches to model choice
- Hypermodels/saturated models
- Bayesian non-parametric methods
- Bayesian methods e.g. RJMCMC
- Mixture models


Hypermodels/saturated models
e.g. Infection rates βS or βSI or βSI^0.5 in an SIR model?
Instead use βSI^α and estimate α (O’Neill and Wen)
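The nesting can be made concrete with a one-line rate function. The exponent is called `alpha` here as an assumed name, and the zero-infectives guard is an implementation choice, not something stated on the slide.

```python
def infection_rate(beta, s, i, alpha):
    """Hypermodel infection rate beta * S * I**alpha."""
    if i == 0:
        # No infectives means no infection pressure, whatever alpha is
        # (Python evaluates 0**0 as 1, so this case is handled explicitly).
        return 0.0
    return beta * s * i ** alpha


# The three candidate models are special cases of the single hypermodel.
beta, s, i = 0.5, 20, 4
print(infection_rate(beta, s, i, alpha=0.0))  # beta * S
print(infection_rate(beta, s, i, alpha=1.0))  # beta * S * I
print(infection_rate(beta, s, i, alpha=0.5))  # beta * S * I**0.5
```

Estimating the exponent then replaces a choice among three separate models with inference on one extra parameter.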


Bayesian non-parametric methods
e.g. Infection rate β(t)SI or β(t) in an SIR model; estimate β(t) in a Bayesian non-parametric manner using Gaussian process machinery (Kypraios, O’Neill and Xu; Knock and Kypraios)
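To give a feel for what a Gaussian process prior on β(t) looks like, the sketch below draws one random path. It shows a prior draw only, not the estimation procedure of the cited authors. The squared-exponential kernel, the log link that keeps β(t) positive, the jitter term and every numerical value are assumptions made for illustration.

```python
import math
import random


def sq_exp_kernel(t1, t2, variance, length_scale):
    """Squared-exponential covariance between two time points."""
    return variance * math.exp(-0.5 * ((t1 - t2) / length_scale) ** 2)


def cholesky(matrix):
    """Lower-triangular L with L L^T = matrix (matrix must be positive definite)."""
    n = len(matrix)
    lower = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(lower[i][k] * lower[j][k] for k in range(j))
            if i == j:
                lower[i][j] = math.sqrt(matrix[i][i] - s)
            else:
                lower[i][j] = (matrix[i][j] - s) / lower[j][j]
    return lower


def sample_beta(times, variance=0.25, length_scale=1.5, mean_log_beta=-1.0,
                jitter=1e-4, seed=0):
    """One draw of beta(t) at the given times, with log beta(t) ~ GP."""
    rng = random.Random(seed)
    n = len(times)
    # Covariance matrix, with a small diagonal jitter for numerical stability.
    cov = [[sq_exp_kernel(times[i], times[j], variance, length_scale)
            + (jitter if i == j else 0.0) for j in range(n)] for i in range(n)]
    lower = cholesky(cov)
    z = [rng.gauss(0.0, 1.0) for _ in range(n)]
    # Correlated Gaussian values: mean + L z.
    log_beta = [mean_log_beta + sum(lower[i][k] * z[k] for k in range(i + 1))
                for i in range(n)]
    return [math.exp(value) for value in log_beta]


times = [float(t) for t in range(10)]
beta_path = sample_beta(times)
print(beta_path)
```

In a full analysis such a prior would be combined with the epidemic likelihood, typically inside an MCMC scheme, to obtain a posterior for β(t).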


Reversible Jump MCMC
e.g. Distinct models (usually small number), estimate Bayes factors by running MCMC on union of parameter spaces (O’Neill; Neal and Roberts; Knock and O’Neill)


Mixture models
e.g. Given two models (g, h), create mixture model f(x) = α g(x) + (1 − α) h(x); estimation of α enables estimation of Bayes factors (Kypraios and O’Neill)
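One way to see the link between the mixture weight and the Bayes factor is the toy calculation below. It is a sketch under strong assumptions that are not in the slides: the weight (called `alpha` here) has a Uniform(0, 1) prior, x denotes the whole data set, and the two models are fixed normal distributions with no free parameters, so g(x) and h(x) are plain likelihoods. Under those assumptions the posterior mean m of alpha is (2g + h) / (3(g + h)), which rearranges to g/h = (3m − 1) / (2 − 3m).

```python
import math


def normal_loglik(data, mean, sd=1.0):
    """Log-likelihood of independent normal observations."""
    return sum(-0.5 * math.log(2 * math.pi * sd ** 2)
               - (x - mean) ** 2 / (2 * sd ** 2) for x in data)


def posterior_mean_alpha(g_lik, h_lik, n_grid=10001):
    """Posterior mean of alpha under a Uniform(0, 1) prior (trapezoidal rule)."""
    step = 1.0 / (n_grid - 1)
    alphas = [k * step for k in range(n_grid)]
    # Unnormalised posterior density of alpha given the data.
    dens = [a * g_lik + (1 - a) * h_lik for a in alphas]

    def trapz(values):
        return step * (sum(values) - 0.5 * (values[0] + values[-1]))

    return trapz([a * d for a, d in zip(alphas, dens)]) / trapz(dens)


def bayes_factor_from_alpha(m):
    """Bayes factor g/h recovered from the posterior mean of alpha."""
    return (3 * m - 1) / (2 - 3 * m)


# Toy data and two fixed candidate models, invented for illustration only.
data = [0.2, -0.5, 1.1, 0.4]
g_lik = math.exp(normal_loglik(data, mean=0.0))
h_lik = math.exp(normal_loglik(data, mean=1.0))

m = posterior_mean_alpha(g_lik, h_lik)
bf_mixture = bayes_factor_from_alpha(m)
bf_direct = g_lik / h_lik
print(bf_mixture, bf_direct)
```

The two printed values should agree closely. In realistic problems g(x) and h(x) are not available in closed form, and the posterior mean of the weight would be estimated by MCMC rather than on a grid.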

