Loading in 5 sec....

Markov Chain Monte CarloPowerPoint Presentation

Markov Chain Monte Carlo

- By
**nardo** - Follow User

- 113 Views
- Uploaded on

Download Presentation
## PowerPoint Slideshow about ' Markov Chain Monte Carlo' - nardo

**An Image/Link below is provided (as is) to download presentation**

Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author.While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server.

- - - - - - - - - - - - - - - - - - - - - - - - - - E N D - - - - - - - - - - - - - - - - - - - - - - - - - -

Presentation Transcript

Motivation

- Models built to represent diverse types of genetic data tend to have a complex structure. Computation on these structures are often intractable.
- We see this, for instance, in pedigree analysis.
- Stochastic simulation techniques are good ways to obtain approximation.
- Markov Chain Monte Carlo methods have proven to be quite useful for such simulations.

S1

S2

S3

Si-1

Si

Si+1

X1

X2

X3

Xi-1

Xi

Xi+1

Markov chain- Markov chains are random processes X1,X2,…,having a discrete 1-dimensional index set (1,2,3…)
- domainD of m states {s1,…,sm} from which it takes it’s values
- Satisfying the markov property:
- The distribution of xt+1 given all the previous states xt xt-1 … depends only on the immediately preceding state xt

- Satisfying the markov property:

S1

S2

S3

Si-1

Si

Si+1

X1

X2

X3

Xi-1

Xi

Xi+1

- A markov chain also consists of
- An m dimensional initial distribution vector
- An m×m transition probabilities transition matrix P= (Aij) s.t P(i,j)=Pr[xt+1=si|xt=sj]

Notice that the transition probability between Xt and Xt+2 are also derived in the following way:

- The multiplications shown in the previous slide are equivalent to multiplying M by itself .
- We conclude that P2ik= Pr(xt+2=sk|xt=si)
more generally Pnik = Pr(xt+n=sk|xt=si)

Stationary distribution

A distribution P = (p1,…, pm) is stationary if P P = P

- When P is unique for a transition matrix P , the rows of all coinside with P.
- in other words , the long-run probability of being in state j is Pj, regardless of the initial state.

The gambler example

- a gambler wins 1 dollar with probability
- losses 1 dollar with probability 1-p
- the gambler plays 5 rounds.
- The transition matrix of that markov chain is:

- there is no unique stationary distribution for this example
- any distribution p for which p1+p6=1 will satisfy pP=p.
- For example:

- For p=0.5 we can see that
- The long run frequency of being in a given state depends on the initial state

Some properties of Markov chains

- A Markov chain is aperiodic if the chances of getting from state to state are not periodic in the number of states needed.
- States i,j communicate if state j is reachable from state i, i.e there exists a finite number n s.t pn(i,j)>0.
- A Markov chain is said to be irreducible if all its states communicate.
- A Markov chain is said to be ergodic if it is irreducible and aperiodic.
- An ergodic Markov chain always has a unique stationary distribution

Calculating an Expectatoin Value

When the probabilities iare given up to an unknown normalization factor Z.

- Examples :
- Boltzman distribution:
- where
- is the canonical Partitioning function,
- which is sometimes incalculable.

2. Constructing the likelihood function …

Prob(data| 2) = P(x11 ,x12 ,x13 ,x21 ,x22 ,x23) =

Probability of data (sum over all states of all hidden variables)

Prob(data| 2) = P(x11 ,x12 ,x13 ,x21 ,x22 ,x23) = l11m, l11f … s23f [P(l11m) P(l11f) P(x11 | l11m, l11f,) … P(s13m) P(s13f) P(s23m |s13m, 2) P(s23m |s13m, 2) ]

Method: Simulate don’t go over all space.

Calculate an estimator to the expectation value.

We want to calculate the expectation value

Instead, we calculate an estimator to the expectation value, by sampling from the space .

MCMC tells us how to carry out the sampling to create a good (unbiased, low variance) estimator.

We need to define an ergodic transition matrix

for which is a unique stationary distibution.

If P satisfies detailed balance, i.e.

for all , then is stationary for P:

MCMC: create an ergodic (=irreducible, aperiodic) Markov Chain and simulate from it.

Metropolis (1953) suggested a simulating algorithm, in which one defines P indirectly via a symmetric transition matrix Q in the following way:

If you are in state i, propose a state j with probability

Move from i to j with probability .

This assures detailed balance of P :

Hastings (1970) generalized for a non-symmetric Q :

- Start with any state . one defines P indirectly via a symmetric transition matrix Q in the following way:
- Step t: current state is .
- Propose state j with probablity .
- compute
- Accept state j if .
- else, draw uniformly from [0,1] a random number c. If , accept state j else , stay in state i.

Metropolis – Hastings Algorithm:

Simulated Annealing one defines P indirectly via a symmetric transition matrix Q in the following way:

Goal: Find the most probable state of a Markov Chain.

Use the Metropolis Algorithm with a different acceptance probability:

Where is the temperature.

Cool gradually, i.e., allow the chain to run m times with an initial temperature , then set a new temperature and allow the chain to run again. Repeat n times, until .

Simulated Annealing (Kirkpatrick,1983) one defines P indirectly via a symmetric transition matrix Q in the following way:

Goal: Find the global minimum of a multi -dimensional surface , when all states have an equal probability.

Use another variation of the Metropolis Algorithm.

Suppose you are in state i. Draw a new state j, and accept using the acceptance probabilty:

Where T is the temperature.

Cool gradually.

Parameters to play with: one defines P indirectly via a symmetric transition matrix Q in the following way:

- n (number of loops)
- m (number of iterations in each loop)
- T0 (initial temperature, shouldn’t be too high or too low)
- T (cooling speed, shouldn’t be too fast)
- k (“Boltzman constant” – scaling parameter)
- C (first step scaling parameter)
- (initial state)
- qij (how to draw the next step)

MCMC Example: Descent Graph one defines P indirectly via a symmetric transition matrix Q in the following way:

- Corresponds to a specific inheritance vector.
- Vertices: the individuals’ genes (2 genes for each individual in the pedigree).
- Edges: represent the gene flow specified by the inheritance vector. A child’s gene is connected by an edge to the parent’s gene from which it flowed.

(Taken from tutorial 6)

1 one defines P indirectly via a symmetric transition matrix Q in the following way:

2

11

12

13

14

Descent Graph

3

4

5

6

a/b

a/b

1

2

7

8

21

22

23

24

(a,b)

(a,b)

a/b

a/b

b/d

a/c

(a,b)

(a,b)

(a,c)

(b,d)

Assume that the descent graph vertices below represent the pedigree on the left.

(Taken from tutorial 6)

Descent Graph one defines P indirectly via a symmetric transition matrix Q in the following way:

3

4

5

6

1

2

7

8

(a,b)

(a,b)

(a,b)

(a,b)

(a,c)

(b,d)

- Assume that paternally inherited genes are on the left.
- Assume that non-founders are placed in increasing order.
- A ‘1’ (‘0’) is used to denote a paternally (maternally) originated gene.
- The gene flow above corresponds to the inheritance vector: v = ( 1,1; 0,0; 1,1; 1,1; 1,1; 0,0 )

(Taken from tutorial 6)

Creating a Markov Chain of descent graphs: one defines P indirectly via a symmetric transition matrix Q in the following way:

- A Markov Chain state is a (legal) descent graph.

- We need to define transition rules from one state to another.

- We will show 4 possible transition rules : T0 , T1 , T2a , T2b .

Transition rule T one defines P indirectly via a symmetric transition matrix Q in the following way:0 :

Choose parent, child.

Transition rule T one defines P indirectly via a symmetric transition matrix Q in the following way:0 :

Choose parent, child.

Swap paternal arc to maternal arc (or vice versa)

Small change in the graph slow mixing

Transition rule T one defines P indirectly via a symmetric transition matrix Q in the following way:1 :

Choose person i.

Transition rule T one defines P indirectly via a symmetric transition matrix Q in the following way:1 :

Choose person i.

Perform a T0 transition from i to each one of his/her children.

Transition rule T one defines P indirectly via a symmetric transition matrix Q in the following way:2a :

Choose a couple i.j with common children.

Transition rule T one defines P indirectly via a symmetric transition matrix Q in the following way:2a :

Choose a couple i.j with common children.

Exchange the subtree rooted at the maternal node of i with the subtree rooted at the maternal node of j. exchange also the paternally rooted subtrees.

Transition rule T one defines P indirectly via a symmetric transition matrix Q in the following way:2b :

Choose a couple i.j with common children.

Transition rule T one defines P indirectly via a symmetric transition matrix Q in the following way:2b :

Choose a couple i.j with common children.

Exchange the subtree rooted at the maternal node of i with the subtree rooted at the paternal node of j, and vice versa.

b one defines P indirectly via a symmetric transition matrix Q in the following way:

1/1

2/2

1/1

2/2

1/1

2/2

1/1

2/2

3/3

1/2

1/2

Two legal graphs of inheritance.

No way to move from a to b using allowed transitions

(the states don’t communicate).

Problem :

a

3/3

3/3

3/3

b one defines P indirectly via a symmetric transition matrix Q in the following way:

1/1

1/1

2/2

2/2

1/1

1/1

2/2

2/2

1/1

2/2

1/1

2/2

3/3

1/2

1/2

3/3

3/3

1/2

Solution: allow passage through illegal graphs (tunneling)

3/3

3/3

3/3

Applying tunneling in MCMC: one defines P indirectly via a symmetric transition matrix Q in the following way:

In order to move from state i to state j (one step of the chain),

(1) Select a transition (+person+locus) and perform it.

(2) Toss a coin. If it says “head”, stop. If it says “tail”, goto (1).

(3) If the proposed state j is legal, accept it with the Metropolis probability

This assures that the Markov Chain is irreducible.

One use of descent graphs MCMC is the calculation of the location score for location d, given marker evidence M and trait phenotypes T :

The difficult term to calculate is the conditional probability .

Assuming linkage equilibrium we can write ,where is a descent graph.

MCMC: the stationary distribution of the chain is the conditional distribution .If a sequence of legal descent graphs is generated by running the chain, then for n large enough the estimate is: . The term can be calculated quickly, using MENDEL.

The Gibbs sampler location score for location d, given marker evidence M and trait phenotypes T :

- Special case of the Metropolis - Hastings algorithm for Cartesian product state spaces
- Suppose that each sample point i=(i1,i2,i3..im) has m components, for instance, a pedigree of m individuals
- The Gibbs sampler updates one component of i at a time.
- If the component is chosen randomly and resampled conditional on the components, then the acceptance probability is 1 .

The Gibbs sampler location score for location d, given marker evidence M and trait phenotypes T :

Proof:

- Let ic be the uniformly chosen component
- Denote the remaining components by i-c
- If j is a neighbor of i reachable by changing only component ic then j-c=i-c.
- The proposal probability is
and satisfies piqij=pjqji

- The ratio appearing in the acceptance probability is

Example: location score for location d, given marker evidence M and trait phenotypes T : MCMC with ordered genotypes

- Suppose we want to generate a pedigree for which we know only the unordered genotype of 4 individuals in the last generation .
- We make the following simplifying assumptions:
- The population is in HW equilibrium
- The three loci are in linkage equilibrium
- There is no crossover interference between the three loci

1 location score for location d, given marker evidence M and trait phenotypes T :

2

11

12

13

14

21

22

23

24

2,4

1,2

1,2

2,4

1,2

2,2

3,4

1,2

1,3

1,2

1,3

1,3

- One option is to first simulate the founders genotype according to population frequencies, then “drop” genes along the tree according to Mendel’s law and the recombination fractions

2|1 location score for location d, given marker evidence M and trait phenotypes T :

1|2

1|3

1|4

3|1

2|1

1

2

11

12

13

14

2|1

1|3

1|2

4|2

2|3

2|3

1|3

3|2

3|3

2|4

2|1

3|1

21

22

23

24

2|4

1|2

1|2

2|1

1|3

1|3

2|4

1|2

2|2

4|3

1|2

1|3

- One realization of an ordered genotype consisting with the data we have is the following pedigree :

- Computing the probability of a certain realization of this pedigree is easy but even for this small tree the proposal we saw is just one of ((43)2)4x(23)12=284 options.
- It is obvious we need a better method .

The Gibbs sampler pedigree is easy but even for this small tree the proposal we saw is just one of ((4

- The Gibbs sampler updates the genotype of one individual at one locus at a time.
- This is easy given all other data due to the local nature of the dependency.
We need to compute * Pr[xim(l), xip(l)|xim(-l), xip(-l), x-i,y]

Where xim(l), xip(l) are the maternal and paternal alleles of locus l of individual i,

xim(-l), xip(-l) are the paternal and maternal alleles of all other loci of individual i,

y is the phenotype data,

and x-i denotes the data of all other individuals in the pedigree.

- This calculation involves only the phenotype of i and the genotypes of his parents and children
- under the no crossover assumption the genotype of on locus across a chromosome depends only on it’s adjacent loci
- We get the following expression to calculate *

- Lets return to the genotypes of his parents and childrenpedigree we showed earlier. Assume that the ordered genotype is the current state, and we want to update individual 13 at locus number 2.
- There are 3 genotypes consistent with the genotype configuration of other individuals in the pedigree 1|1,2|1, 1|3
- We need to calculate the probability of each one of these states according to the equation shown in the last slide

- We first compute the paternal and maternal transmission probabilities for each allele in these three genotypes.
- Then we need the transmission probabilities from individual 13 to his kids.
- Once the three have been computed ,they are normalized by dividing each one by the total sum.
- The genotype is sampled according to that distribution .

Problems : probabilities for each allele in these three genotypes.

- Updating is done a single genotype at a time. For a large pedigree this is a very slow process
- The Gibbs sampler may also be reducible, i.e. some legal states will never be reached.For example:

A|C

A|B

A|B

A|C

A|A

A|A

B|C

C|B

by Gibbs

- One solution to this problem is allowing illegal states in the chain which will later be discarded.
- Another is to identify the irreducible components called islands and to define metropolis states allowing to jump from one island to another.
- However the whole process of finding the islands and setting up proposal for the metropolis jumps is not fully automated
- the work involved for each different pedigree prevents the widespread use of this method.

Other MCMC that use the Gibbs Sampler the chain which will later be discarded.

- The L-sampler (Heath, 1997)
- A Whole Locus Gibbs sampler combining Monte Carlo simulation with single-locus peeling

Other MCMC that use the Gibbs Sampler the chain which will later be discarded.

- The M-sampler (Thompson and Heath, 1999;
Thompson, 2000a)

- A whole-meiosis Gibbs sampler which jointly updates an entire meiosis from the full conditional distribution

References the chain which will later be discarded.

- K. Lange, Mathematical and Statistical Methods for Genetic Analysis, chapter 9, Springer (1997).
- A. Bureau, T.P. Speed, Course stat 260: Statistics in Genetics, week 7.
- A.W. George, E. Thompson, Discovering Disease Genes: Multipoint Linkage Analysis via a New Markov Chain Monte Carlo Approach, Statistical Science, Vol. 18, No.4 (2003)
- Lecture notes.
- Tutorial notes.

The the chain which will later be discarded.

End!!

Download Presentation

Connecting to Server..