### ABC

The method: practical overview

- Applications of ABC in population genetics
- Motivation for the application of ABC
- ABC approach
  - Characteristics of an ABC methodology
  - Algorithm of an ABC inference
  - Limitations of the ABC approach
  - Typical ABC run
- Present work
  - Compare the ABC algorithm with MCMC
  - Study the use of different summary statistics
  - Study the use of ABC in more complex scenarios
  - “State of the art” of the software
- Future developments

- Two processes are usually considered important in determining population structure:
  - gene flow;
  - population splitting.
- Most often these processes are modelled and inferred separately.
- Recent advances by Nielsen and Wakeley (2001) and Hey and Nielsen (2004), using Markov Chain Monte Carlo (MCMC) in a two-population scenario, make it possible to study both processes at the same time.
- An Approximate Bayesian Computation (ABC) method developed by Beaumont (2006) deals with the same problem, but in a three-population scenario. The idea is to avoid problems associated with MCMC, such as poor mixing and long convergence times. However, it relies on a couple of approximations.

The aim of this study is to see how good these approximations are.

Background using MCMC:

- Wakeley, Hey (1997, Genetics) - developed an algorithm to estimate historical demographic parameters.
- Nielsen, Wakeley (2001, Genetics) - developed an MCMC algorithm to infer demographic parameters in an “Isolation with Migration” model.
- Hey, Nielsen (2004, Genetics) - present the IM program (software that uses the MCMC algorithm previously developed).
- Hey et al. (2004, Mol. Ecol.) - introduce changes in the IM software (HapSTR data can be used).
- Won, Hey (2005, Mol. Biol. Evol.) - present a case study in 3 populations of chimpanzees.
- Hey (2005, PLoS Biol.) - the peopling of the Americas. Introduces changes in the IM software (founder population size can be inferred).

Background using ABC:

- Tavaré et al. (1997, Genetics) - presented a simulation-based algorithm to infer specific demographic parameters.
- Pritchard et al. (1999, MBE) - introduce the first ABC approach, with a rejection step, to estimate demographic parameters.
- Beaumont et al. (2002, Genetics) - introduce a regression method within an ABC framework to estimate demographic parameters.
- Marjoram et al. (2003, PNAS) - use MCMC without likelihoods within an ABC framework.
- Beaumont (2006, “Simulation, Genetics, and Human Prehistory”) - uses regression-based ABC to estimate demographic parameters within an “Isolation with Migration” model for microsatellites in three populations.
- Hickerson et al. (2006, in press) - compare ABC with IM in two-population studies for sequence data.

- Characteristics of an ABC methodology

Replace the data with summary statistics:

- Summarize a large amount of data into a few representative values
- By replacing the data with summary statistics, it is easier to decide how ‘similar’ data sets are to each other.

Get the posterior distribution by sampling values from it:

- Simulate samples (Fi, Di) from the joint density p(F, D):
  - First sample from the prior: Fi ~ p(F)
  - Then simulate the data, given Fi: Di ~ p(D | Fi)

- The posterior distribution, p(F | D) = p(D, F) / p(D), for any given D, can be estimated by the proportion of all simulated points that correspond to that particular D and F, divided by the proportion of points corresponding to D (ignoring F).
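The sampling scheme above can be sketched on a toy problem. This is a minimal illustration, not the model used in the talk: it assumes a hypothetical parameter F in [0, 1] with a uniform prior, and data D equal to the number of successes in `n_trials` Bernoulli draws. The function name, the binning of F, and all numeric values are assumptions made for the example.

```python
import random

def posterior_by_joint_sampling(observed, n_trials, n_sims, n_bins=10, seed=0):
    """Estimate p(F | D = observed) on a grid of F bins by simulating (F, D) pairs."""
    rng = random.Random(seed)
    joint_counts = [0] * n_bins  # simulations with D == observed, counted per F bin
    for _ in range(n_sims):
        f = rng.random()  # Fi ~ p(F): uniform prior on [0, 1) (assumption)
        d = sum(rng.random() < f for _ in range(n_trials))  # Di ~ p(D | Fi): binomial
        if d == observed:
            joint_counts[min(int(f * n_bins), n_bins - 1)] += 1
    matching = sum(joint_counts)  # proportional to p(D = observed), ignoring F
    if matching == 0:
        return None
    # points matching D and F, divided by points matching D
    return [count / matching for count in joint_counts]

posterior = posterior_by_joint_sampling(observed=8, n_trials=10, n_sims=20000)
```

With discrete data, an exact match on D is possible. With realistic genetic data it is not, which is why the data are replaced by summary statistics and a tolerance.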

- Algorithm of an ABC inference

[Figure: schematic of the ABC algorithm. Labels: obtained genetic data; get summary statistics (S); observed value s’; set of priors (F) with parameters F1 to F4; illustration in (Nordborg, 2001); joint distribution (S, F). Axes: Parameter, F1 against SummStats, S.]

By extracting the points near the real data set we obtain the posterior:

[Figure: joint distribution (S, F) with the points near s’ extracted, giving the posterior distribution p(F1 | S = s’).]

- Limitations of the ABC approach
  - Natural limitation due to lack of information in data sets
  - Limitation on the number of summary statistics used
  - Limitation on the calculation of summary statistics (time consuming)
  - Limitation on the time consumption of the simulation step

Limitation on the number of summary statistics used

[Figure: two panels. Summary Statistics = 1: F against S, with the region (F, S = s’) marked. Summary Statistics = 2: F against S1 and S2, with the region (F, S1 = s’1, S2 = s’2) marked.]
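A small numerical sketch of why the number of summary statistics is a limitation: at a fixed tolerance, the fraction of simulated points that fall near the observed values shrinks quickly as statistics are added. The setup is an assumption made for illustration only: the summary statistics are taken to be independent standard normal values and the observed point s’ is placed at the origin.

```python
import math
import random

def acceptance_fraction(n_stats, tol, n_sims=50_000, seed=0):
    """Fraction of simulated points within Euclidean distance tol of the observed point."""
    rng = random.Random(seed)
    accepted = 0
    for _ in range(n_sims):
        # hypothetical standardized summary statistics; observed s' is the origin
        s = [rng.gauss(0.0, 1.0) for _ in range(n_stats)]
        if math.sqrt(sum(x * x for x in s)) <= tol:
            accepted += 1
    return accepted / n_sims

fractions = {k: acceptance_fraction(k, tol=0.5) for k in (1, 2, 4)}
```

Keeping the same number of accepted points with more statistics therefore needs either many more simulations or a larger tolerance.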

- Typical ABC run

Step 1 - simulation

Step 2 - getting posterior distribution

Step 3 - estimation

- Choosing the priors
- Choosing the summary statistics
- Choosing a “rejection” method of the simulated data

- Typical ABC run

Rejection method (Pritchard et al., 1999):

[Figure: parameter F against SummStats, S. The simulated points within the tolerance d of s’ (the “real” data) give the posterior distribution p(F | S).]
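The rejection step can be sketched as below. This is a toy example under stated assumptions, not the population-genetic simulator from the talk: the hypothetical simulator draws 20 values from Normal(F, 1) and uses their mean as the single summary statistic, and the prior on F is uniform. The names `simulate_summary` and `abc_rejection` are invented for the illustration.

```python
import random
import statistics

def simulate_summary(f, rng, n_obs=20):
    """Hypothetical simulator: summary statistic = mean of n_obs draws from Normal(f, 1)."""
    return statistics.fmean([rng.gauss(f, 1.0) for _ in range(n_obs)])

def abc_rejection(s_obs, tol, n_sims, prior_low, prior_high, seed=0):
    """Keep the prior draws whose simulated summary statistic lies within tol of s_obs."""
    rng = random.Random(seed)
    accepted = []
    for _ in range(n_sims):
        f = rng.uniform(prior_low, prior_high)  # draw a parameter from the prior
        s = simulate_summary(f, rng)            # simulate data and summarize it
        if abs(s - s_obs) <= tol:               # accept if within tolerance d of s'
            accepted.append(f)
    return accepted

posterior_sample = abc_rejection(s_obs=4.0, tol=0.1, n_sims=20000,
                                 prior_low=0.0, prior_high=10.0)
```

The accepted values are treated as an approximate sample from p(F | S = s’). A smaller tolerance gives a better approximation but fewer accepted points.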

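- Typical ABC run

Local Linear Multiple Regression adjustment and Weighting (Beaumont et al., 2002):

[Figure: parameter F against SummStats, S, showing the regression line through the points near s’ (the “real” data), the weighting of those points, and the resulting posterior distribution p(F | S).]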

- Typical ABC run

Linear multiple regression: E[P(F | S = s)] is modelled with a vector of coefficients (“correlation coefficients vector”) applied to the vector of standardized summary statistics.

Local weighting: we want to minimize the least-squares error, where the weights come from an Epanechnikov kernel over a spherical acceptance region.
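For reference, the regression and weighting named on this slide can be written out as follows. This is a sketch following the formulation in Beaumont et al. (2002), the source cited for this method. The notation is an assumption chosen to match the F, S, s’ and d used elsewhere in these slides: α is the intercept, β the coefficient vector, n the number of simulated points and c a normalizing constant.

```latex
\begin{align}
F_i &= \alpha + (s_i - s')^{T}\beta + \varepsilon_i, \qquad i = 1,\dots,n \\
(\hat\alpha, \hat\beta) &= \arg\min_{\alpha,\beta} \sum_{i=1}^{n}
  \left\{ F_i - \alpha - (s_i - s')^{T}\beta \right\}^{2}
  K_d\!\left( \lVert s_i - s' \rVert \right) \\
K_d(t) &=
  \begin{cases}
    c\,d^{-1}\left( 1 - (t/d)^{2} \right), & t \le d \\
    0, & t > d
  \end{cases}
\end{align}
```

Points farther than d from s’ get zero weight, which is the spherical acceptance region. Inside it, points closer to s’ count more.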

- Typical ABC run

Least squares gives an estimate of the posterior mean.

To obtain samples from the posterior distribution, we adjust each accepted parameter value by subtracting the fitted linear effect of the difference between its simulated summary statistics and s’.

I.e. we are assuming that the conditional mean of the parameter is a linear function of the summary statistics, but all other moments remain the same.
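A minimal sketch of the adjustment, reduced to one parameter and one summary statistic so that the weighted least-squares fit can be written by hand. The toy data (uniform prior on [0, 10], statistic equal to the parameter plus Normal(0, 0.5) noise) and the function names are assumptions for illustration. The method described in the talk uses several standardized summary statistics at once.

```python
import random

def epanechnikov(t, d):
    """Epanechnikov kernel weight for distance t and tolerance d (constant factor omitted)."""
    return 1.0 - (t / d) ** 2 if t <= d else 0.0

def regression_adjust(params, stats, s_obs, d):
    """Weighted local-linear fit of F on (S - s_obs), then adjustment of the accepted F."""
    kept = [(f, s - s_obs, epanechnikov(abs(s - s_obs), d))
            for f, s in zip(params, stats) if abs(s - s_obs) <= d]
    w_sum = sum(w for _, _, w in kept)
    x_bar = sum(w * x for _, x, w in kept) / w_sum
    f_bar = sum(w * f for f, _, w in kept) / w_sum
    sxx = sum(w * (x - x_bar) ** 2 for _, x, w in kept)
    sxf = sum(w * (x - x_bar) * (f - f_bar) for f, x, w in kept)
    beta = sxf / sxx                  # fitted slope
    alpha = f_bar - beta * x_bar      # fitted value at S = s_obs: posterior mean estimate
    adjusted = [f - beta * x for f, x, _ in kept]  # move each point to S = s_obs
    weights = [w for _, _, w in kept]
    return alpha, beta, adjusted, weights

rng = random.Random(1)
params = [rng.uniform(0.0, 10.0) for _ in range(20000)]  # draws from a uniform prior
stats = [f + rng.gauss(0.0, 0.5) for f in params]        # toy simulated summary statistic
alpha, beta, adjusted, weights = regression_adjust(params, stats, s_obs=5.0, d=1.0)
```

The adjusted values, together with their weights, stand in for a sample from the posterior. Because the adjustment removes the trend in S, a wider tolerance can be used than with plain rejection.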

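One simple case:

- 6 parameters to be estimated + m (mutation rate)

[Figure: two-population model. Labels: ancestral population Popanc (size Neanc); splitting time tev1 on the time axis t; descendant populations Pop1 and Pop2 (sizes Ne1 and Ne2); migration rates m1 and m2.]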

Summary Statistics used

- Sequence Data:
  - mean of pairwise differences
    - in each population
    - both populations joined together
  - number of segregating sites
    - in each population
    - both populations joined together
  - number of haplotypes
    - in each population
    - both populations joined together
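The three statistics listed above can be computed as in the following sketch. It assumes aligned sequences of equal length stored as Python strings, and the tiny example samples are invented for illustration.

```python
from itertools import combinations

def mean_pairwise_differences(seqs):
    """Average number of differing sites over all pairs of sequences."""
    pairs = list(combinations(seqs, 2))
    total = sum(sum(a != b for a, b in zip(x, y)) for x, y in pairs)
    return total / len(pairs)

def segregating_sites(seqs):
    """Number of alignment columns with more than one distinct character."""
    return sum(len(set(column)) > 1 for column in zip(*seqs))

def number_of_haplotypes(seqs):
    """Number of distinct sequences in the sample."""
    return len(set(seqs))

def summarize(pop1, pop2):
    """Each statistic in each population and in both populations joined together."""
    samples = {"pop1": pop1, "pop2": pop2, "joined": pop1 + pop2}
    return {label: (mean_pairwise_differences(sample),
                    segregating_sites(sample),
                    number_of_haplotypes(sample))
            for label, sample in samples.items()}

pop1 = ["AACGT", "AACGT", "AATGT"]  # invented example data
pop2 = ["GACGT", "GACGA", "GACGT"]
stats = summarize(pop1, pop2)
```

For two populations this gives a vector of nine summary statistics.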

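Simulated “real” data and Prior information

| Parameter | Simulated “real” value | Prior range |
| --- | --- | --- |
| Ne1 | 1000 | 0 to 10000 |
| Ne2 | 1000 | 0 to 10000 |
| Neanc | 1000 | 0 to 10000 |
| Mig1 | 0.01 | 0 to 0.05 |
| Mig2 | 0.01 | 0 to 0.05 |
| Tev | 500 | 0 to 5000 |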
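As a sketch of how the prior information feeds the simulation step, the ranges above can be stored as a small configuration and sampled once per simulation. The slide gives only a lower and an upper bound for each parameter, so the uniform distribution used here is an assumption, as are the names `PRIOR_RANGES` and `draw_from_priors`.

```python
import random

# (lower, upper) bounds per parameter, taken from the slide
PRIOR_RANGES = {
    "Ne1": (0, 10000),
    "Ne2": (0, 10000),
    "Neanc": (0, 10000),
    "Mig1": (0, 0.05),
    "Mig2": (0, 0.05),
    "Tev": (0, 5000),
}

def draw_from_priors(rng):
    """Draw one parameter set, assuming independent uniform priors on each range."""
    return {name: rng.uniform(low, high) for name, (low, high) in PRIOR_RANGES.items()}

rng = random.Random(0)
draws = [draw_from_priors(rng) for _ in range(1000)]
```

Each drawn parameter set would then be passed to the simulator to produce one simulated data set.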

ABC vs MCMC:

[Figure: estimates for each parameter. Legend: “real” data; prior distribution; ABC method; MCMC method.
- Data 1 (no migration); Simulation 7: panels for Ne1, Ne2, Neanc, Tev.
- Data 2 (migration = 0.01); Simulation 9: panels for Ne1, Ne2, Neanc, Mig1, Mig2, Tev.]

Summary Statistics used

- Sequence Data:
  - mean of pairwise differences
    - in each population
    - both populations joined together
  - number of segregating sites
    - in each population
    - both populations joined together
  - number of haplotypes
    - in each population
    - both populations joined together
  - variance of pairwise differences
    - in each population
    - both populations joined together
  - Shannon’s index
    - in each population
    - both populations joined together
  - number of singletons
    - in each population
    - both populations joined together
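The three additional statistics might be computed as below. The slides do not give their exact definitions, so the choices here are assumptions: the variance is the population variance over all pairs, Shannon’s index is computed on haplotype frequencies with the natural logarithm, and a singleton is a segregating site where some variant appears in exactly one sequence. The example sample is invented.

```python
import math
import statistics
from collections import Counter
from itertools import combinations

def variance_of_pairwise_differences(seqs):
    """Population variance of the number of differences over all sequence pairs."""
    diffs = [sum(a != b for a, b in zip(x, y)) for x, y in combinations(seqs, 2)]
    return statistics.pvariance(diffs)

def shannon_index(seqs):
    """Shannon's index H = -sum(p * ln p) over haplotype frequencies p (assumed definition)."""
    n = len(seqs)
    return -sum((c / n) * math.log(c / n) for c in Counter(seqs).values())

def number_of_singletons(seqs):
    """Segregating sites where some variant is carried by exactly one sequence."""
    count = 0
    for column in zip(*seqs):
        variants = Counter(column)
        if len(variants) > 1 and 1 in variants.values():
            count += 1
    return count

sample = ["AACGT", "AACGT", "AATGT", "GACGT", "GACGA", "GACGT"]  # invented example data
var_pd = variance_of_pairwise_differences(sample)
shannon = shannon_index(sample)
singletons = number_of_singletons(sample)
```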

Summary Statistics (500 000 iter, tol=0.02):

[Figure: estimates for each parameter. Legend: “real” data; prior distribution; MCMC based method; standard; previous + var pairwise dif; previous + Shannon’s; previous + singletons.
- Data 1 (no migration); Simulation 7: panels for Ne1, Ne2, Neanc, Tev.
- Data 2 (migration = 0.01); Simulation 9: panels for Ne1, Ne2, Neanc, Mig1, Mig2, Tev.]

Summary Statistics (7 000 000 iter, tol=0.02):

[Figure: same layout as above.
- Data 1 (no migration); Simulation 7: panels for Ne1, Ne2, Neanc, Tev.
- Data 2 (migration = 0.01); Simulation 9: panels for Ne1, Ne2, Neanc, Mig1, Mig2, Tev.]

Summary Statistics (7 000 000 iter, tol=0.02):

[Figure: MISE, two panels: no migration; migration = 0.01.]

Summary Statistics (7 000 000 iter, tol=0.02):

[Figure: adjusted R2, two panels: no migration; migration = 0.01.]


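- 11 parameters to be estimated + topology + m (mutation rate)

[Figure: three-population model. Labels: ancestral populations Popanc1 (size Neanc1) and Popanc2 (size Neanc2); event times tev1 and tev2; populations Pop1, Pop2 and Pop3 (sizes Ne1, Ne2 and Ne3); migration rates m1, m2, m3 and manc.]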

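Simulated “real” data and Prior information

| Parameter | Simulated “real” value | Prior range |
| --- | --- | --- |
| Mig1 | 0.01 | 0 to 0.05 |
| Ne1 | 1000 | 0 to 10000 |
| Ne2 | 1000 | 0 to 10000 |
| Ne3 | 1000 | 0 to 10000 |
| Neanc2 | 1000 | 0 to 10000 |
| Neanc1 | 1000 | 0 to 10000 |
| Tev1 | 1500 | 0 to 5000 |
| Mig2 | 0.01 | 0 to 0.05 |
| Mig3 | 0.01 | 0 to 0.05 |
| Miganc | 0.01 | 0 to 0.05 |
| Tev2 | 500 | 0 to 5000 |

Three Populations model (no migration):

[Figure: Data 1 (no migration); Simulation 7. Legend: free top; fixed top. Panels for Ne1, Ne2, Ne3, Neanc2, Neanc1, Tev1, Tev2. Topology: ((2,3),1).]

Three Populations model (migration = 0.01):

[Figure: Data 2 (migration = 0.01); Simulation 6. Legend: free top; fixed top. Panels for Mig1, Ne1, Ne2, Ne3, Neanc2, Neanc1, Tev1, Mig2, Mig3, Miganc, Tev2. Topology: ((1,2),3).]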

- ABC is up to 2 orders of magnitude faster for a single locus
- ABC modes are similar to MCMC, but overall precision is lower
- No substantial improvement with more summary statistics
- No substantial improvement with more iterations
- ABC is able to consider more complex scenarios, but the ability to infer parameters is reduced when considering migration

- Present Work

The user-friendly version of the program (initial stage)

- Features of the program
  - Use of heredity scalars for each locus
  - Use of different types of DNA data at the same time (microsatellite and DNA sequence)
  - Use of an unlimited number of populations within an IM model
  - Use of different combinations of 7 different summary statistics for each DNA data type

- Freeware and source code available (soon)

- Current Goals
  - Currently applying the method to a published data set (Won & Hey, 2005)
  - Continue to improve the accuracy of ABC (e.g. identify better summary statistics)
  - Obtain better estimates of MISE (e.g. using more simulated “real” data)

- Future Goals
  - Add recombination
  - Create a user-friendly interface
  - Use a variable migration rate through time
  - Improve ABC: sequential method, non-linear regression

I would like to acknowledge David Balding for helpful discussion on the methods used, and to give special thanks to Mark Beaumont for advice and comments on the work.

Support for this work was provided by EPSRC.

http://www.rdg.ac.uk/~sar05sal
