# Approximate Bayesian Computation

Studying demographic parameters. Joao Lopes, Mark Beaumont, University of Reading. joao.lopes@rdg.ac.uk

### Approximate Bayesian Computation

Studying demographic parameters

Joao Lopes, Mark Beaumont

joao.lopes@rdg.ac.uk

ABC algorithm:

• Assumptions:
  • Discordance between gene and species trees is not expected
  • Mutation rate is variable in space, but not in time
• Features:
  • Based on construction of gene trees using the coalescent model
  • Easily applied to 4 or 5 populations/species
  • Some tweaks are necessary to use it with more populations
• But most importantly:
  • Handles large datasets (typically hundreds of samples per population/species)
  • Complex population/species models can be used (e.g. presence of gene flow)
  • Assumptions can be greatly relaxed (e.g. variable mutation rate over time)
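As a rough illustration of the coalescent model behind the gene trees, the sketch below draws the waiting times between coalescent events for a sample from a single constant-size population. The function name, the single-population setting and the time scaling (units of 2·Ne generations, diploid) are assumptions made for illustration; the slides do not describe the simulator itself.

```python
import random

def coalescent_waiting_times(n_samples, rng):
    """Waiting times between coalescent events for n_samples lineages.

    Under the standard (Kingman) coalescent, while k lineages remain the
    time to the next coalescence is exponential with rate k*(k-1)/2,
    measured here in units of 2*Ne generations (an assumed scaling).
    """
    times = []
    for k in range(n_samples, 1, -1):
        rate = k * (k - 1) / 2.0
        times.append(rng.expovariate(rate))
    return times

rng = random.Random(1)
n = 45                                   # illustrative sample size
times = coalescent_waiting_times(n, rng)
tmrca = sum(times)                       # time to the most recent common ancestor
# Total branch length: k lineages are present during each waiting time.
total_length = sum(k * t for k, t in zip(range(n, 1, -1), times))
```

Mutations would then be placed on the branches in proportion to their length, which is what links the tree to the DNA sequence data.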

[Figure: population tree labelled Popanc, Pop1 and Pop2]

ABC algorithm:

F = {Ne1, Ne2, NeA, m1, m2, t}

1. Sample from the prior(s): Fi ~ p(F)
2. Simulate data given Fi: Di ~ p(D | Fi)
3. Summarize Di with a set of summary statistics, obtaining Si; go to 1 until N points (S, F) have been created.
4. Accept the points whose S is within a distance d of s', the real data summarized by the same set of statistics.
5. Correct the accepted values of F according to their distance from the real data by performing a local linear regression.
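The steps above can be sketched on a toy problem. This is a minimal sketch under stated assumptions, not the authors' implementation: the coalescent simulator is replaced by a hypothetical one-parameter Normal model, the prior is an assumed Uniform(0, 10), there is a single summary statistic, the distance d is set implicitly by accepting a fixed fraction `tol` of points, and the regression uses Epanechnikov weights.

```python
import random

def simulate_summary(f, rng, n_obs=50):
    """Toy stand-in for simulate + summarise: mean of n_obs Normal(f, 1) draws."""
    return sum(rng.gauss(f, 1.0) for _ in range(n_obs)) / n_obs

def abc_regression(s_obs, n_points, tol, rng):
    # Sample from the prior, simulate and summarise until N points exist.
    points = []
    for _ in range(n_points):
        f = rng.uniform(0.0, 10.0)            # assumed Uniform(0, 10) prior
        points.append((simulate_summary(f, rng), f))
    # Accept the fraction `tol` of points whose S is closest to s_obs.
    points.sort(key=lambda p: abs(p[0] - s_obs))
    accepted = points[: max(2, round(tol * n_points))]
    d = abs(accepted[-1][0] - s_obs)          # implied distance threshold
    # Local linear regression of F on (S - s_obs) with Epanechnikov weights.
    x = [s - s_obs for s, _ in accepted]
    y = [f for _, f in accepted]
    w = [1.0 - (xi / d) ** 2 for xi in x]
    sw = sum(w)
    xm = sum(wi * xi for wi, xi in zip(w, x)) / sw
    ym = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - xm) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - xm) * (yi - ym) for wi, xi, yi in zip(w, x, y))
    slope = sxy / sxx
    # Project each accepted value onto S = s_obs.
    adjusted = [yi - slope * xi for xi, yi in zip(x, y)]
    return adjusted, w

rng = random.Random(42)
s_obs = simulate_summary(4.0, rng)            # "real" data from a known value
adjusted, weights = abc_regression(s_obs, n_points=5000, tol=0.05, rng=rng)
post_mean = sum(w * a for w, a in zip(weights, adjusted)) / sum(weights)
```

With several parameters and summary statistics the same idea applies, but the regression becomes a weighted multiple regression and the distance is usually computed on scaled statistics.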

The population model

[Figure: population model diagram labelled with the parameters Ne1, Ne2, NeA, m1, m2 and t]

Simulated data

DNA sequence data (1 locus)

Pop1: 45 samples

Pop2: 55 samples

ABC: 200 data sets

Comparison with MCMC: 10 data sets

• Summary statistics used:
  • mean of pairwise differences
    • in each population
    • both populations joined together
  • number of segregating sites
    • in each population
    • both populations joined together
  • number of haplotypes
    • in each population
    • both populations joined together
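The three statistics, each computed for Pop1, Pop2 and the joined sample, give nine values in total. The sketch below shows one way to compute them, assuming aligned sequences of equal length stored as strings; the function names and the tiny example alignment are hypothetical.

```python
from itertools import combinations

def mean_pairwise_differences(seqs):
    """Average number of differing sites over all pairs of sequences."""
    pairs = list(combinations(seqs, 2))
    total = sum(sum(a != b for a, b in zip(s, t)) for s, t in pairs)
    return total / len(pairs)

def segregating_sites(seqs):
    """Number of alignment columns showing more than one state."""
    return sum(len(set(column)) > 1 for column in zip(*seqs))

def n_haplotypes(seqs):
    """Number of distinct sequences in the sample."""
    return len(set(seqs))

def summary_statistics(pop1, pop2):
    """Each statistic for pop1, pop2 and both populations joined together."""
    stats = []
    for fn in (mean_pairwise_differences, segregating_sites, n_haplotypes):
        for sample in (pop1, pop2, pop1 + pop2):
            stats.append(fn(sample))
    return stats

# Hypothetical toy alignment, far smaller than the 45 + 55 samples used here.
pop1 = ["ACGTA", "ACGTA", "ACGAA"]
pop2 = ["ACGTA", "TCGTA"]
stats = summary_statistics(pop1, pop2)
```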

Relative Mean Integrated Square Error (relMISE):

$$\mathrm{relMISE} = \frac{1}{n} \sum_{i=1}^{n} \frac{(f_i - f')^2}{f'^2},$$

where n is the number of accepted points, f_i is the value of a given parameter for the ith point and f' is the true value of the parameter.
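A small sketch of the error measure follows. It assumes relMISE is the mean squared deviation of the accepted values from the true value, divided by the squared true value; the exact expression did not survive in the transcript, so this form is an assumption, and the function name and example numbers are hypothetical.

```python
def rel_mise(accepted_values, true_value):
    """Assumed form: (1/n) * sum((f_i - f')**2) / f'**2."""
    if true_value == 0:
        # A relative error is undefined when the true value is zero.
        raise ValueError("true_value must be non-zero")
    n = len(accepted_values)
    squared_error = sum((f - true_value) ** 2 for f in accepted_values)
    return squared_error / (n * true_value ** 2)

# Three accepted values scattered around a true value of 10000.
score = rel_mise([9000.0, 10000.0, 11000.0], 10000.0)
```

Because the measure is relative, it can be averaged over parameters on very different scales, such as effective sizes and migration rates.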

[Figure legend: "real" data, ABC, prior distribution, MCMC]

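'Real' data and prior information

| Parameter | 'Real' data | Prior (min, max) |
|---|---|---|
| Ne1 | 10000 | 0, 12500 |
| Ne2 | 20000 | 0, 40000 |
| NeA | 5000 | 0, 10000 |
| m1 | 0 | 0, 0.0005 |
| m2 | 0 | 0, 0.0005 |
| t | 5000 | 0, 10000 |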

Simulated data

ABC (500 000 iter, tol=0.02, logit transf, sstats=9):

[Figure: Simulation 8, one panel per parameter: Mig1, Mig2, Tev, Ne1, Ne2, Neanc]

[Table: average relMISE (10 data sets)]
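The "logit transf" and "log transf" settings refer to transforming the parameters. One common reason to do this in ABC is to keep regression-adjusted values inside the prior bounds. The sketch below assumes the logit is taken relative to a parameter's lower and upper bounds; the function names and the bounds used in the example are illustrative, and the slides do not state how the transformation was applied.

```python
import math

def logit_transform(value, lower, upper):
    """Map a value in (lower, upper) onto the whole real line."""
    p = (value - lower) / (upper - lower)
    return math.log(p / (1.0 - p))

def inverse_logit_transform(z, lower, upper):
    """Map any real number back into (lower, upper)."""
    p = 1.0 / (1.0 + math.exp(-z))
    return lower + (upper - lower) * p

lower, upper = 0.0, 12500.0               # illustrative bounds
z = logit_transform(10000.0, lower, upper)
round_trip = inverse_logit_transform(z, lower, upper)
# Even a large shift on the transformed scale stays inside the bounds.
shifted = inverse_logit_transform(z + 5.0, lower, upper)
```

A log transformation works the same way for parameters that only need to stay positive: regress on log(value) and exponentiate the adjusted result.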

Simulated data: optimized ABC method

ABC (2 500 000 iter, tol=0.004, log transf, sstats=9):

[Figure: Simulation 8, one panel per parameter: Mig1, Mig2, Tev, Ne1, Ne2, Neanc]

[Table: average relMISE (10 data sets)]

ABC (2 500 000 iter, tol=0.004, log transf, sstats=21):

[Figure: Simulation 8, one panel per parameter: Mig1, Mig2, Tev, Ne1, Ne2, Neanc]

[Table: average relMISE (10 data sets)]

Model-choice: migration present/absent

ABC (1 000 000 iter, tol=0.004, log transf, sstats=21):

[Figure: two population trees, each labelled Popanc, Pop1 and Pop2: Population model 1 (M = M1) or Population model 2 (M = M2)]

pM1 = 2%, pM2 = 98% (10 data sets)
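One simple way to obtain model probabilities of this kind within ABC is to treat the model index as an extra parameter: draw the model from its prior, simulate under it, and take the share of accepted points that came from each model. The sketch below illustrates that rejection-based idea on a toy problem. The simulator, the equal prior weights and the single summary statistic are all assumptions, and the slides do not say which estimator was actually used.

```python
import random

def simulate_summary(model, rng):
    """Toy stand-in for simulating one summary statistic under each model."""
    mean = 2.0 if model == "M1" else 0.0
    return rng.gauss(mean, 1.0)

def abc_model_probabilities(s_obs, n_points, tol, rng):
    points = []
    for _ in range(n_points):
        model = rng.choice(["M1", "M2"])      # equal prior weight, assumed
        distance = abs(simulate_summary(model, rng) - s_obs)
        points.append((distance, model))
    # Keep the fraction `tol` of simulations closest to the observed summary.
    points.sort(key=lambda p: p[0])
    accepted = points[: max(1, round(tol * n_points))]
    # Posterior model probability = share of accepted points from that model.
    return {
        m: sum(1 for _, a in accepted if a == m) / len(accepted)
        for m in ("M1", "M2")
    }

probs = abc_model_probabilities(
    s_obs=0.0, n_points=20000, tol=0.01, rng=random.Random(7)
)
```

Parameter estimation can then be run under the favoured model only, which is the idea behind the model-choice step used in the next set of results.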

Simulated data: using model-choice step

ABC (2 500 000 iter, tol=0.004, log transf, sstats=21):

[Figure: Simulation 8, one panel per parameter: Mig1, Mig2, Tev, Ne1, Ne2, Neanc]

[Table: average relMISE (10 data sets)]

Simulated data: 10 vs 200 datasets

ABC (2 500 000 iter, tol=0.004, log transf, sstats=21):

[Figure: Simulation 8, one panel per parameter: Mig1, Mig2, Tev, Ne1, Ne2, Neanc]

[Table: average relMISE for 10 data sets and for 200 data sets]

Conclusions:

• Comparison between ABC and MCMC methods:
  • ABC is up to 2 orders of magnitude faster than the MCMC method for a single locus
  • ABC modes are similar to those of MCMC (full-likelihood method)
  • ABC can easily incorporate more complex population models with relaxed assumptions
  • A model-choice framework follows naturally from the ABC approach
  • ABC easily handles multi-modal posterior distributions
  • ABC does not suffer from the problems associated with local maxima in likelihood distributions
• ABC improves with:
  • parameter transformation
  • more iterations
  • more summary statistics
  • a model-choice framework

Take home message:

• Phylogenetic methods based on gene trees using the coalescent are being actively explored.
• These methods will be available in the near future.

Acknowledgements

I would like to thank David Balding for frequent meetings on the subject, and give special thanks to Mark Beaumont for advice and comments on the work.

Support for this work was provided by EPSRC.

joao.lopes@rdg.ac.uk

http://www.rdg.ac.uk/~sar05sal