
# Hidden Markov Models For Genetic Linkage Analysis Lecture #4






### Hidden Markov Models For Genetic Linkage Analysis, Lecture #4


Prepared by Dan Geiger.

(Figure: an HMM as a Bayesian network — a chain of hidden states S1, S2, S3, …, Si-1, Si, Si+1, each emitting an observation R1, R2, R3, …, Ri-1, Ri, Ri+1.)

Hidden Markov Models in General

Which depicts the factorization: P(s1,…,sL, r1,…,rL) = P(s1) P(r1 | s1) Π_{j=2..L} P(sj | sj-1) P(rj | sj).


Hidden Markov Model in our case

(Figure: the linkage HMM — a chain of hidden inheritance vectors S1, S2, S3, …, Si-1, Si, Si+1; each locus emits its data X1, X2, X3, …, Xi, Xi+1, and the disease locus emits Yi-1.)

The compounded variable Si = (Si,1,…,Si,2n) is called the inheritance vector. It has 2^2n states, where n is the number of persons that have parents in the pedigree (non-founders). The compounded variable Xi = (Xi,1,…,Xi,2n) is the data regarding locus i. Similarly, for the disease locus we use Yi.

To specify the HMM we need to write down the transition matrices from Si-1 to Si and the matrices P(xi|Si). Note that these quantities have already been implicitly defined.

(Figure: an HMM with hidden states H1, H2, …, HL-1, HL and observations X1, X2, …, XL-1, XL.)

An efficient solution, assuming local probability tables (“the parameters”) are known, is called the Viterbi Algorithm.

The same problem results if we instead maximize the joint distribution p(h1,…,hL, x1,…,xL).

Queries of interest (MAP)

The Maximum A Posteriori query: (h1*,…,hL*) = argmax_{h1,…,hL} P(h1,…,hL | x1,…,xL).

An answer to this query gives the most probable inheritance vector for all locations.


Queries of interest (Belief Update): Posterior Decoding

1. Compute the posterior belief in Hi (for a specific i) given the evidence {x1,…,xL}, for each of Hi's values hi; namely, compute p(hi | x1,…,xL).

2. Do the same computation for every Hi, but without repeating the first task L times.

Local probability tables are assumed to be known. An answer to this query gives the probability distribution of inheritance vectors at an arbitrary location.


Decomposing the computation of Belief update (Posterior decoding)

Belief update: P(hi | x1,…,xL) = (1/K) P(x1,…,xL,hi)

where K = Σ_{hi} P(x1,…,xL,hi).

P(x1,…,xL,hi) = P(x1,…,xi,hi) P(xi+1,…,xL | x1,…,xi,hi) = P(x1,…,xi,hi) P(xi+1,…,xL | hi) ≡ f(hi) b(hi)

The second equality is due to Ind({xi+1,…,xL}; {x1,…,xi} | Hi).


The task: Compute f(hi) = P(x1,…,xi,hi) for i=1,…,L (namely, considering evidence up to time slot i).

P(x1, h1) = P(h1) P(x1 | h1) {Basis step}

P(x1, x2, h2) = Σ_{h1} P(x1, h1, h2, x2) {Second step}

= Σ_{h1} P(x1, h1) P(h2 | x1, h1) P(x2 | x1, h1, h2)

= Σ_{h1} P(x1, h1) P(h2 | h1) P(x2 | h2)

The last equality is due to conditional independence.

{Step i}

P(x1,…,xi,hi) = Σ_{hi-1} P(x1,…,xi-1, hi-1) P(hi | hi-1) P(xi | hi)

The forward algorithm
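The recursion above translates directly into code. The following is a minimal sketch for a generic discrete HMM; the toy two-state model and all variable names are illustrative, not from the lecture.

```python
def forward(init, trans, emit, xs):
    """f[i][h] = P(x1..x_{i+1}, H_{i+1} = h).

    init[h]     -- P(H1 = h)
    trans[g][h] -- P(H_{i+1} = h | H_i = g)
    emit[h][x]  -- P(X_i = x | H_i = h)
    """
    k = len(init)
    # Basis step: P(x1, h1) = P(h1) P(x1 | h1)
    f = [[init[h] * emit[h][xs[0]] for h in range(k)]]
    # Step i: f_i(h_i) = sum over h_{i-1} of f_{i-1}(h_{i-1}) P(h_i | h_{i-1}) P(x_i | h_i)
    for x in xs[1:]:
        prev = f[-1]
        f.append([emit[h][x] * sum(prev[g] * trans[g][h] for g in range(k))
                  for h in range(k)])
    return f

# Toy two-state model (illustrative numbers only)
init = [0.5, 0.5]
trans = [[0.7, 0.3], [0.4, 0.6]]
emit = [[0.9, 0.1], [0.2, 0.8]]
xs = [0, 1, 0]
f = forward(init, trans, emit, xs)
likelihood = sum(f[-1])  # one more step gives P(x1,...,xL)
```

Summing the final row gives the likelihood of evidence P(x1,…,xL), which the lecture returns to under "Consequence I".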


P(xL | hL-1) = Σ_{hL} P(xL, hL | hL-1) = Σ_{hL} P(hL | hL-1) P(xL | hL-1, hL)

= Σ_{hL} P(hL | hL-1) P(xL | hL) {First step}

The last equality is due to conditional independence.

{Step i}

b(hi) = P(xi+1,…,xL | hi) = Σ_{hi+1} P(hi+1 | hi) P(xi+1 | hi+1) P(xi+2,…,xL | hi+1) = Σ_{hi+1} P(hi+1 | hi) P(xi+1 | hi+1) b(hi+1)

The backward algorithm

The task: Compute b(hi) = P(xi+1,…,xL|hi) for i=L-1,…,1 (namely, considering evidence after time slot i).
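The backward recursion can be sketched the same way (the toy parameters below are ours, purely illustrative):

```python
def backward(trans, emit, xs):
    """b[i][h] = P(x_{i+2},...,x_L | H_{i+1} = h); the basis is b[L-1][h] = 1.

    trans[g][h] -- P(H_{i+1} = h | H_i = g)
    emit[h][x]  -- P(X_i = x | H_i = h)
    """
    k, L = len(trans), len(xs)
    b = [[1.0] * k for _ in range(L)]  # basis step: b(hL) = 1
    # Step i: b(h_i) = sum over h_{i+1} of P(h_{i+1}|h_i) P(x_{i+1}|h_{i+1}) b(h_{i+1})
    for i in range(L - 2, -1, -1):
        b[i] = [sum(trans[h][g] * emit[g][xs[i + 1]] * b[i + 1][g]
                    for g in range(k))
                for h in range(k)]
    return b

# Toy two-state model (illustrative numbers only)
trans = [[0.7, 0.3], [0.4, 0.6]]
emit = [[0.9, 0.1], [0.2, 0.8]]
b = backward(trans, emit, [0, 1, 0])
```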


1. To compute the posterior belief in Hi (for a specific i) given the evidence {x1,…,xL}: run the forward algorithm to compute f(hi) = P(x1,…,xi,hi), run the backward algorithm to compute b(hi) = P(xi+1,…,xL | hi); the product f(hi) b(hi) is the answer (for every possible value hi).

2. To compute the posterior belief for every Hi, simply run the forward and backward algorithms once, storing f(hi) and b(hi) for every i (and every value hi). Then compute f(hi) b(hi) for every i.
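As a sketch, the two passes combine as follows; the helper recursions mirror the forward and backward algorithms above, and the model numbers are ours:

```python
def posterior(init, trans, emit, xs):
    """p[i][h] = P(H_{i+1} = h | x1,...,xL), computed as normalized f(h_i) b(h_i)."""
    k, L = len(init), len(xs)
    # Forward pass: f(h_i) = P(x1..x_i, h_i)
    f = [[init[h] * emit[h][xs[0]] for h in range(k)]]
    for x in xs[1:]:
        prev = f[-1]
        f.append([emit[h][x] * sum(prev[g] * trans[g][h] for g in range(k))
                  for h in range(k)])
    # Backward pass: b(h_i) = P(x_{i+1}..x_L | h_i)
    b = [[1.0] * k for _ in range(L)]
    for i in range(L - 2, -1, -1):
        b[i] = [sum(trans[h][g] * emit[g][xs[i + 1]] * b[i + 1][g]
                    for g in range(k)) for h in range(k)]
    # f(h_i) b(h_i) = P(x1..xL, h_i); normalizing by K gives the posterior
    post = []
    for i in range(L):
        joint = [f[i][h] * b[i][h] for h in range(k)]
        K = sum(joint)
        post.append([v / K for v in joint])
    return post

init = [0.5, 0.5]                 # toy numbers, illustrative only
trans = [[0.7, 0.3], [0.4, 0.6]]
emit = [[0.9, 0.1], [0.2, 0.8]]
xs = [0, 1, 0]
post = posterior(init, trans, emit, xs)
```

Note that K = Σ_{hi} f(hi) b(hi) is the same for every i: it equals the likelihood of evidence.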


Consequence I: Likelihood of evidence

• 1. To compute the likelihood of evidence P(x1,…,xL), do one more step in the forward algorithm, namely,

Σ_{hL} f(hL) = Σ_{hL} P(x1,…,xL,hL)

• 2. Alternatively, do one more step in the backward algorithm, namely,

Σ_{h1} b(h1) P(h1) P(x1 | h1) = Σ_{h1} P(x2,…,xL | h1) P(h1) P(x1 | h1)

Evaluate the likelihood of evidence for all candidate locations of the disease locus, and choose the best.


Consequence II: The E-step

Recall that belief update has been computed via

P(x1,…,xL,hi) = P(x1,…,xi,hi) P(xi+1,…,xL | hi) ≡ f(hi) b(hi)

Now we wish to compute (for the E-step):

p(x1,…,xL, hi, hi+1) = p(x1,…,xi, hi) p(hi+1 | hi) p(xi+1 | hi+1) p(xi+2,…,xL | hi+1)

= f(hi) p(hi+1 | hi) p(xi+1 | hi+1) b(hi+1)
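A quick numerical check of this identity (all numbers and names below are ours, purely illustrative): the pairwise term f(hi) p(hi+1 | hi) p(xi+1 | hi+1) b(hi+1) should match direct summation of the joint over the remaining hidden variables.

```python
init = [0.6, 0.4]                 # toy model, illustrative only
trans = [[0.7, 0.3], [0.4, 0.6]]
emit = [[0.9, 0.1], [0.2, 0.8]]
xs = [0, 1, 1]
k, L = 2, 3

# Forward: f(h_i) = P(x1..x_i, h_i)
f = [[init[h] * emit[h][xs[0]] for h in range(k)]]
for x in xs[1:]:
    prev = f[-1]
    f.append([emit[h][x] * sum(prev[g] * trans[g][h] for g in range(k))
              for h in range(k)])
# Backward: b(h_i) = P(x_{i+1}..x_L | h_i)
b = [[1.0] * k for _ in range(L)]
for i in range(L - 2, -1, -1):
    b[i] = [sum(trans[h][g] * emit[g][xs[i + 1]] * b[i + 1][g] for g in range(k))
            for h in range(k)]

# E-step quantity: p(x1..xL, h_i, h_{i+1}) = f(h_i) P(h_{i+1}|h_i) P(x_{i+1}|h_{i+1}) b(h_{i+1})
i = 0  # the pair (H1, H2)
pair = [[f[i][g] * trans[g][h] * emit[h][xs[i + 1]] * b[i + 1][h]
         for h in range(k)] for g in range(k)]
```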


The Expectation Maximization algorithm

Iterate the two steps:

E-step: compute p(x1,…,xL,hi ,hi+1) where i+1 is the disease locus

M-step:

Until convergence.

Comment: use random restarts, other convergence criteria, or other stopping schemes.


Time and Space Complexity of the forward/backward algorithms

Time complexity is linear in the length of the chain, provided the number of states of each variable is a constant. More precisely, time complexity is O(k^2 L), where k is the maximum domain size of each variable.

Space complexity is also O(k^2 L).


• Recall that the query asking for the likelihood of evidence is to compute P(x1,…,xL) = Σ_{(h1,…,hL)} P(x1,…,xL, h1,…,hL)

• Now we wish to compute a similar quantity:

• P*(x1,…,xL) = MAX_{(h1,…,hL)} P(x1,…,xL, h1,…,hL)

The MAP query in HMM

P(x1,x2,x3) = Σ_{h1} P(h1)P(x1|h1) Σ_{h2} P(h2|h1)P(x2|h2) Σ_{h3} P(h3|h2)P(x3|h3)

= Σ_{h1} P(h1)P(x1|h1) Σ_{h2} P(h2|h1)P(x2|h2) b(h2)

= Σ_{h1} P(h1)P(x1|h1) b(h1)

Example: Revisiting likelihood of evidence


maximum = max_{h1} P(h1)P(x1|h1) max_{h2} P(h2|h1)P(x2|h2) max_{h3} P(h3|h2)P(x3|h3)

= max_{h1} P(h1)P(x1|h1) max_{h2} P(h2|h1)P(x2|h2) b_{h3}(h2)

= max_{h1} P(h1)P(x1|h1) b_{h2}(h1) {Finding the maximum}

{Finding the MAP assignment}

h1* = argmax_{h1} P(h1)P(x1|h1) b_{h2}(h1)

h2* = x*_{h2}(h1*); h3* = x*_{h3}(h2*)

Here x*_{h2}(h1) and x*_{h3}(h2) store the maximizing values of h2 and h3 as functions of h1 and h2, respectively.

Example: Computing the MAP assignment


Replace sums with taking maximum:

Backward phase:

b_{hL+1}(hL) = 1

For i = L-1 downto 1 do:

b_{hi+1}(hi) = MAX_{hi+1} P(hi+1 | hi) P(xi+1 | hi+1) b_{hi+2}(hi+1)

x*_{hi+1}(hi) = ARGMAX_{hi+1} P(hi+1 | hi) P(xi+1 | hi+1) b_{hi+2}(hi+1)

(Storing the best value as a function of the parent’s values.)

Forward phase (tracing the MAP assignment):

h1* = ARGMAX_{h1} P(h1) P(x1 | h1) b_{h2}(h1)

For i = 1 to L-1 do:

hi+1* = x*_{hi+1}(hi*)

Viterbi’s algorithm
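The two phases above can be sketched as follows for a generic discrete HMM (the toy model is ours, purely illustrative):

```python
def viterbi(init, trans, emit, xs):
    """Return a MAP assignment (h1*, ..., hL*) via the backward-max and
    forward-trace phases described above."""
    k, L = len(init), len(xs)
    b = [[1.0] * k for _ in range(L)]     # b(hL) = 1
    x_star = [[0] * k for _ in range(L)]  # best h_{i+1} as a function of h_i
    # Backward phase: replace sums with maxima
    for i in range(L - 2, -1, -1):
        for h in range(k):
            scores = [trans[h][g] * emit[g][xs[i + 1]] * b[i + 1][g]
                      for g in range(k)]
            x_star[i][h] = max(range(k), key=scores.__getitem__)
            b[i][h] = scores[x_star[i][h]]
    # Forward phase: trace the MAP assignment
    first = [init[h] * emit[h][xs[0]] * b[0][h] for h in range(k)]
    path = [max(range(k), key=first.__getitem__)]
    for i in range(L - 1):
        path.append(x_star[i][path[-1]])
    return path

init = [0.5, 0.5]                 # toy numbers, illustrative only
trans = [[0.7, 0.3], [0.4, 0.6]]
emit = [[0.9, 0.1], [0.2, 0.8]]
xs = [0, 0, 1]
path = viterbi(init, trans, emit, xs)
```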


Summary of HMM

• Belief update = posterior decoding

• Forward-Backward algorithm

• Maximum A Posteriori assignment

• Viterbi algorithm


{Step i}

P(x1,…,xi,hi) = Σ_{hi-1} P(x1,…,xi-1, hi-1) P(hi | hi-1) P(xi | hi)

The forward algorithm for genetic linkage analysis

Note that in Step i of the forward algorithm, we multiply a transition matrix of size 2^2n × 2^2n with vectors of size 2^2n.

The transition matrix P(hi | hi-1 ) has a special form so we can multiply it by a vector faster than for arbitrary matrices.

The vector P(xi | hi) is not given explicitly, so we will see an efficient way to compute it.

For n non-founders, the transition matrix is the n-fold Kronecker product of per-person 4 × 4 transition matrices.

The transition matrix

Recall that a single meiosis keeps its inheritance state with probability 1 − θ and switches with probability θ, where θ is the recombination fraction between the loci. Note that θ depends on i, but this dependence is omitted from the notation. In our example, where we have one non-founder (n = 1), the transition probability table has size 4 × 4 = 2^2n × 2^2n, encoding the four options of recombination/non-recombination for the two parental meioses.

So, if we start with a matrix of size 2^2n, we will need 2^2n multiplications if we had matrix A in hand. Continuing recursively, at most 2n times, yields a complexity of O(2n · 2^2n), far less than the O(2^4n) needed for regular multiplication.

With n = 10 non-founders, we drop from the non-feasible region to the feasible one.
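The saving comes from never forming the big matrix: a Kronecker-structured matrix can be applied to a vector one factor at a time. Below is a sketch for the simplest case, where every factor is the same 2 × 2 single-meiosis matrix [[1−θ, θ], [θ, 1−θ]] acting on one bit of the inheritance vector; the function and variable names are ours.

```python
def kron_apply(T, v, m):
    """Multiply v (length 2**m) by the m-fold Kronecker product of the
    2x2 matrix T, in O(m * 2**m) time instead of O(4**m)."""
    v = list(v)
    for bit in range(m):            # apply T along each bit position in turn
        step = 1 << bit
        out = [0.0] * len(v)
        for i0 in range(len(v)):
            if i0 & step == 0:
                i1 = i0 | step      # partner index with this bit flipped
                out[i0] = T[0][0] * v[i0] + T[0][1] * v[i1]
                out[i1] = T[1][0] * v[i0] + T[1][1] * v[i1]
        v = out
    return v

theta = 0.1                         # recombination fraction (illustrative)
T = [[1 - theta, theta], [theta, 1 - theta]]
# One non-founder (n = 1): inheritance vectors have 2n = 2 bits, so length-4 vectors
f_next = kron_apply(T, [0.25, 0.25, 0.25, 0.25], 2)
```

Applying the 2 × 2 factor once per bit, 2n times, matches the O(2n · 2^2n) bound above.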

(Figure: the Bayesian network "Model for locus 2" — founder alleles L21m, L21f, L22m, L22f; child alleles L23m, L23f; selectors S23m, S23f; and genotype data X21, X22, X23.)

Probability of data in one marker locus given an inheritance vector

P(x21, x22, x23 | s23m, s23f) = Σ_{l21m, l21f, l22m, l22f, l23m, l23f} P(l21m)P(l21f)P(l22m)P(l22f) P(x21 | l21m, l21f) P(x22 | l22m, l22f) P(x23 | l23m, l23f) P(l23m | l21m, l21f, s23m) P(l23f | l22m, l22f, s23f)

The five last terms are always zero-or-one, namely, indicator functions.

(Figure: the same network for locus 2, now with evidence — S23m = 1, S23f = 0, and X23 = {A1,A2}.)

Assume only individual 3 is genotyped. For the inheritance vector (s23m, s23f) = (1, 0), the founder alleles L21m and L22f are not restricted by the data, while (L21f, L22m) have only two possible joint assignments, (A1, A2) or (A2, A1):

p(x21, x22, x23 | s23m = 1, s23f = 0) = p(A1)p(A2) + p(A2)p(A1)


In general, every inheritance vector defines a subgraph of the Bayesian network above. Construct a founder graph whose vertices are the founder variables and where there is an edge between two vertices if they have a common typed descendant. The label of an edge is the constraint dictated by the common typed descendant. Now find all consistent assignments for every connected component.

(Figure: the founder graph for locus 2 — vertices L21m, L21f, L22m, L22f, with an edge labeled {A1,A2} between the two restricted founder alleles.)

A Larger Example

(Figure: a descent graph for a pedigree with individuals 1–8, where the typed individuals carry the genotypes a,b; a,b; b,d; a,b; a,c; a,b — and the corresponding founder graph, an example of a constraint satisfaction graph.)

(Figure: the founder graph — founder nodes 1–8, with edges labeled {a,b}, {a,b}, {b,d}, {a,c}, {a,b}.)

Connect two nodes if they have a common typed descendant.


The Constraint Satisfaction Problem

The number of possible consistent alleles per non-isolated node is 0, 1, or 2 — namely, the size of the intersection of its adjacent edges' labels. For example, node 2 (isolated) can take all possible alleles, node 6 can only be b, and node 3 can be assigned either a or b.

For each non-singleton connected component:

Start with an arbitrary node, pick one of its values. This dictates all other values in the component. Repeat with the other value if it has one.

So each non-singleton component yields at most two solutions.
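This propagation is easy to sketch in code. The component below is illustrative (our own encoding and node names, not the lecture's exact graph): each edge carries the unordered genotype of a common typed descendant, and fixing one node's allele forces its neighbors.

```python
from collections import deque

def component_solutions(edges, start, alleles):
    """Enumerate consistent assignments of one connected component.

    edges: dict node -> list of (neighbor, (x, y)), where (x, y) is the
    genotype of a common typed descendant shared by node and neighbor:
    the two endpoints must carry exactly x and y, in either order.
    """
    solutions = []
    for value in alleles:            # try each candidate value for the start node
        assign = {start: value}
        queue, ok = deque([start]), True
        while queue and ok:
            u = queue.popleft()
            for w, (x, y) in edges[u]:
                if assign[u] == x:
                    forced = y
                elif assign[u] == y:
                    forced = x
                else:                # u's value is not in the edge label
                    ok = False
                    break
                if w in assign:
                    ok = assign[w] == forced
                    if not ok:       # contradiction: component inconsistent
                        break
                else:
                    assign[w] = forced
                    queue.append(w)
        if ok:
            solutions.append(assign)
    return solutions

# An illustrative component with edge labels in the spirit of the example
edges = {
    3: [(5, ('a', 'b')), (6, ('a', 'b'))],
    5: [(3, ('a', 'b'))],
    6: [(3, ('a', 'b')), (4, ('b', 'b'))],
    4: [(6, ('b', 'b'))],
}
sols = component_solutions(edges, 3, ('a', 'b'))
```

Each component yields at most two solutions (one per candidate value of the start node); its likelihood factor is the sum, over the surviving solutions, of the product of founder-allele probabilities.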


Solution of the CSP

Since each non-singleton component yields at most two solutions, the likelihood is simply a product of sums, each of at most two terms; each component contributes one factor, and singleton components contribute the factor 1.

In our example: 1 · [p(a)p(b)p(a) + p(b)p(a)p(b)] · p(d)p(b)p(a)p(c).

Complexity. Building the founder graph: O(f^2 + n), where f is the number of founders and n the number of non-founders.

This is in contrast to solving general CSPs, which is NP-hard; the founder-graph CSP is tractable because of its special structure.