
Approximate Bayesian Computation

Studying demographic parameters

Joao Lopes, Mark Beaumont

University of Reading

joao.lopes@rdg.ac.uk

slide2

ABC algorithm:

  • Assumptions:
    • Discordance between gene and species trees is not expected
    • Mutation rate is variable in space, but not in time
  • Features:
    • Based on construction of gene trees using the coalescent model
    • Easily applied to 4 or 5 populations/species
    • Some tweaks are necessary to use in more populations
  • But most importantly:
    • Handles large datasets (typically hundreds of samples per population/species)
    • Complex population/species models can be used (e.g. presence of gene flow)
    • Assumptions can be greatly relaxed (e.g. variable mutation rate over time)
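
As a rough illustration of what building gene trees under the coalescent model involves, the sketch below draws the waiting times between successive coalescent events for a sample from one constant-size population. It is not the authors' simulator: the function names are hypothetical, and time is measured in coalescent units (assumed here to be 2Ne generations for a diploid population), with no migration or population split.

```python
import random


def coalescent_waiting_times(n_samples, rng):
    """Waiting times between coalescent events for n_samples lineages.

    Assumption: a single constant-size population, time in coalescent
    units. While k lineages remain, the time to the next coalescence is
    exponential with rate k * (k - 1) / 2.
    """
    times = []
    k = n_samples
    while k > 1:
        rate = k * (k - 1) / 2.0
        times.append(rng.expovariate(rate))
        k -= 1
    return times


def time_to_mrca(n_samples, rng):
    """Total tree height: time back to the most recent common ancestor."""
    return sum(coalescent_waiting_times(n_samples, rng))
```

Under these assumptions the expected tree height for n samples is 2(1 - 1/n) coalescent units, so averaging `time_to_mrca(45, rng)` over many replicates should give a value close to 1.96.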
slide3

ABC algorithm:

F = {Ne1, Ne2, NeA, m1, m2, t}

  1. Sample from prior(s): Fi ~ p(F)
  2. Simulate data, given Fi: Di ~ p(D | Fi)
  3. Summarize Di with a set of summary statistics, obtaining Si; go to 1 until N points (S, F) have been created.
  4. Accept the points whose S is within a distance d of s’, the real data summarized by the same set of statistics.
  5. Correct the accepted values of F according to their distance from the real data by performing a local linear regression.

Figure: The population model (populations Pop1, Pop2 and Popanc; parameters Ne1, Ne2, NeA, m1, m2, t)
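
The sample, simulate, summarize, accept and correct steps of the algorithm can be sketched in a few lines of Python. This is a toy under stated assumptions, not the authors' implementation: the coalescent simulator is replaced by a hypothetical one-parameter Gaussian simulator, there is a single summary statistic, the prior is uniform, the tolerance is interpreted as the fraction of points accepted, and the regression uses Epanechnikov weights.

```python
import random


def simulate_summary(theta, rng, n_obs=20):
    """Toy simulator (assumption): mean of n_obs Normal(theta, 1) draws."""
    return sum(rng.gauss(theta, 1.0) for _ in range(n_obs)) / n_obs


def abc_regression(s_obs, n_sims, tol, rng, prior=(0.0, 10.0)):
    """Rejection ABC followed by a local linear regression adjustment."""
    # Sample from the prior, simulate, and summarize, n_sims times.
    points = []
    for _ in range(n_sims):
        theta = rng.uniform(*prior)
        points.append((simulate_summary(theta, rng), theta))

    # Accept the fraction `tol` of points closest to the observed summary.
    points.sort(key=lambda p: abs(p[0] - s_obs))
    accepted = points[:max(2, round(tol * n_sims))]
    d = abs(accepted[-1][0] - s_obs)  # distance of the farthest accepted point

    # Weighted least squares of theta on (s - s_obs), Epanechnikov weights.
    x = [s - s_obs for s, _ in accepted]
    y = [theta for _, theta in accepted]
    w = [1.0 - (xi / d) ** 2 for xi in x]
    sw = sum(w)
    x_bar = sum(wi * xi for wi, xi in zip(w, x)) / sw
    y_bar = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - x_bar) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - x_bar) * (yi - y_bar) for wi, xi, yi in zip(w, x, y))
    slope = sxy / sxx

    # Shift each accepted value to where it would sit if s equalled s_obs.
    adjusted = [yi - slope * xi for xi, yi in zip(x, y)]
    return adjusted, w


def weighted_mean(values, weights):
    """Weighted posterior mean of the adjusted parameter values."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)
```

For example, `abc_regression(4.0, 20000, 0.05, random.Random(1))` keeps the 1000 closest points, and the weighted mean of the adjusted values is an estimate of the posterior mean of the toy parameter.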

slide4

Simulated data

DNA sequence data (1 locus)

Pop1: 45 samples

Pop2: 55 samples

ABC: 200 data sets

Comparison with MCMC: 10 data sets

  • Summary statistics used:
    • mean of pairwise differences
      • in each population
      • both populations joined together
    • number of segregating sites
      • in each population
      • both populations joined together
    • number of haplotypes
      • in each population
      • both populations joined together
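
The three statistics above, each computed for Pop1, for Pop2 and for the joined sample, give nine values in total. A minimal sketch of how they could be computed from aligned sequences follows; the function names and the representation of sequences as equal-length strings are assumptions made for illustration.

```python
from itertools import combinations


def mean_pairwise_differences(seqs):
    """Average number of differing sites over all pairs of sequences."""
    pairs = list(combinations(seqs, 2))
    total = sum(sum(a != b for a, b in zip(s1, s2)) for s1, s2 in pairs)
    return total / len(pairs)


def segregating_sites(seqs):
    """Number of alignment columns with more than one distinct base."""
    return sum(len(set(column)) > 1 for column in zip(*seqs))


def number_of_haplotypes(seqs):
    """Number of distinct sequences in the sample."""
    return len(set(seqs))


def summary_statistics(pop1, pop2):
    """Nine statistics: each measure for pop1, pop2 and the joined sample."""
    stats = []
    for measure in (mean_pairwise_differences, segregating_sites,
                    number_of_haplotypes):
        for sample in (pop1, pop2, pop1 + pop2):
            stats.append(measure(sample))
    return stats
```

Here `pop1` and `pop2` are lists of strings, so `pop1 + pop2` is the joined sample.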

Relative Mean Integrated Square Error (relMISE) is computed from n, the number of accepted points; fi, the value of a given parameter for the ith point; and f’, the true value of the parameter.
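
The relMISE formula itself did not survive in this transcript. One form consistent with the symbols n, fi and f’ is relMISE = (1/n) Σ (fi − f’)² / f’², summed over the n accepted points. That form is an assumption, not necessarily the authors' exact definition, and the sketch below simply evaluates it. The function name is hypothetical, and the expression is undefined when the true value is zero.

```python
def rel_mise(accepted_values, true_value):
    """Assumed form: mean squared error relative to the squared true value."""
    if true_value == 0:
        raise ValueError("relative error is undefined for a true value of 0")
    n = len(accepted_values)
    squared_errors = sum((f - true_value) ** 2 for f in accepted_values)
    return squared_errors / (n * true_value ** 2)
```

For instance, two accepted values of 9000 and 11000 around a true value of 10000 give 0.01 under this definition.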

slide5

Simulated data: ‘Real’ data and prior information

Parameter   ‘Real’ value   Prior range
Ne1         10000          0 to 12500
Ne2         20000          0 to 40000
NeA         5000           0 to 10000
m1          0              0 to 0.0005
m2          0              0 to 0.0005
t           5000           0 to 10000

Figure legend: “real” data, ABC, prior distribution, MCMC

slide6

Simulated data

ABC (500 000 iterations, tolerance = 0.02, logit transformation, 9 summary statistics):

Figure: Simulation 8 (panels: Mig1, Mig2, Tev, Ne1, Ne2, Neanc)

average relMISE: (10 data sets)
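
The "logit" and "log" transformations named in the ABC settings are assumed here to mean transforming parameter values before the regression adjustment and back-transforming afterwards, so that adjusted values cannot leave the prior range (logit) or become negative (log). The sketch below shows one way to write such transforms; the function names are hypothetical and the bounds are whatever prior limits apply to the parameter.

```python
import math


def logit_transform(x, lower, upper):
    """Map a value strictly inside (lower, upper) onto the real line."""
    p = (x - lower) / (upper - lower)
    return math.log(p / (1.0 - p))


def inverse_logit_transform(z, lower, upper):
    """Map any real z back into the interval [lower, upper]."""
    # Numerically stable logistic function: avoids overflow for large |z|.
    if z >= 0:
        p = 1.0 / (1.0 + math.exp(-z))
    else:
        e = math.exp(z)
        p = e / (1.0 + e)
    return lower + (upper - lower) * p


def log_transform(x):
    """Map a strictly positive value onto the real line."""
    return math.log(x)


def inverse_log_transform(z):
    """Map any real z back to a positive value."""
    return math.exp(z)
```

A regression correction applied on the transformed scale can push a value arbitrarily far, yet the back-transformed result still lies within the original bounds.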

slide7

Simulated data: optimized ABC method

ABC (2 500 000 iterations, tolerance = 0.004, log transformation, 9 summary statistics):

Figure: Simulation 8 (panels: Mig1, Mig2, Tev, Ne1, Ne2, Neanc)

average relMISE: (10 data sets)

slide8

Simulated data: adding summary stats

ABC (2 500 000 iterations, tolerance = 0.004, log transformation, 21 summary statistics):

Figure: Simulation 8 (panels: Mig1, Mig2, Tev, Ne1, Ne2, Neanc)

average relMISE: (10 data sets)

slide9

Model-choice: migration present/absent

ABC (1 000 000 iterations, tolerance = 0.004, log transformation, 21 summary statistics):

  • Population model 1 (M = M1): pM1 = 2%
  • Population model 2 (M = M2): pM2 = 98%

(10 data sets)

Figure: the two candidate population models, each showing Pop1, Pop2 and Popanc
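
The slides do not say how pM1 and pM2 were obtained. One simple possibility, shown here only as an assumption, is to treat the model label as one more quantity sampled from its prior and to estimate each model's posterior probability as its share of the accepted simulations. The function name, the scalar summary statistic and the reading of the tolerance as an accepted fraction are all illustrative choices.

```python
def model_posterior_probabilities(simulations, s_obs, tol):
    """Share of each model among the simulations closest to the data.

    simulations: list of (model_label, summary) pairs, where each label
    was drawn from the model prior before simulating its summary.
    """
    ranked = sorted(simulations, key=lambda item: abs(item[1] - s_obs))
    n_keep = max(1, round(tol * len(ranked)))
    counts = {}
    for label, _ in ranked[:n_keep]:
        counts[label] = counts.get(label, 0) + 1
    return {label: count / n_keep for label, count in counts.items()}
```

With an equal prior on the two models, a result such as `{"M1": 0.02, "M2": 0.98}` would mean that 98% of the accepted simulations came from model 2.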

slide10

Simulated data: using model-choice step

ABC (2 500 000 iterations, tolerance = 0.004, log transformation, 21 summary statistics):

Figure: Simulation 8 (panels: Mig1, Mig2, Tev, Ne1, Ne2, Neanc)

average relMISE: (10 data sets)

slide11

Simulated data: 10 vs 200 datasets

ABC (2 500 000 iterations, tolerance = 0.004, log transformation, 21 summary statistics):

Figure: Simulation 8 (panels: Mig1, Mig2, Tev, Ne1, Ne2, Neanc)

average relMISE: (10 data sets) and (200 data sets)

slide12

Conclusions:

  • Comparison between ABC and MCMC methods:
    • ABC is up to 2 orders of magnitude faster than the MCMC method for a single locus
    • ABC posterior modes are similar to those from MCMC (a full-likelihood method)
    • ABC can easily incorporate more complex population models with relaxed assumptions
    • A model-choice framework follows naturally from the ABC approach
    • ABC easily handles multi-modal posterior distributions
    • ABC does not suffer from the problems associated with local maxima of the likelihood
  • ABC improves with:
    • parameter transformation
    • more iterations
    • more summary statistics
    • a model-choice framework
slide13

Take home message:

  • Phylogenetic methods based on gene trees built with the coalescent are being actively explored.
  • These methods will be available in the near future.
slide14

Acknowledgements

I would like to thank David Balding for frequent meetings on the subject, and give special thanks to Mark Beaumont for advice and comments on the work.

Support for this work was provided by EPSRC.

joao.lopes@rdg.ac.uk

http://www.rdg.ac.uk/~sar05sal