Association Mapping of Complex Diseases with Ancestral Recombination Graphs: Models and Efficient Al...

Yufeng Wu UC Davis RECOMB 2007 PowerPoint PPT Presentation




Presentation Transcript


Association Mapping of Complex Diseases with Ancestral Recombination Graphs: Models and Efficient Algorithms

Yufeng Wu

UC Davis

RECOMB 2007


Association Mapping of Diseases

[Figure: SNP sequences (0/1 alleles) for cases and controls. Diploid: two sequences per individual.]

Problem: Where are the (unobserved) disease mutations? This talk: a genealogy-based approach.


Genealogy: Evolutionary History of Genomic Sequences

  • Tells how individuals in a population are related

  • Helps to explain diseases: disease mutations occur on branches and all descendants carry the mutations

  • Problem: How to determine the genealogy for “unrelated” individuals?

  • Not easy with recombination

[Figure: a genealogy with a disease mutation on one branch; the leaves are individuals in the current population, labeled diseased (case) or healthy (control).]


Recombination

  • One of the principal genetic forces shaping sequence variation within species

  • Two equal-length sequences generate a third, new sequence of the same length in the genealogy

[Figure: the sequences 110001111111001 and 000110000001111 recombine at a breakpoint; the new sequence joins the prefix 11000 of the first to the suffix 0000001111 of the second.]
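The recombination event in the figure can be sketched as a single crossover between two equal-length strings. This is only an illustration: the function name `recombine` and the 0-based `breakpoint` index are assumptions, not part of the talk.

```python
def recombine(prefix_parent, suffix_parent, breakpoint):
    """Single-crossover recombination (illustrative sketch).

    The child takes sites [0, breakpoint) from prefix_parent and
    sites [breakpoint, end) from suffix_parent.
    """
    if len(prefix_parent) != len(suffix_parent):
        raise ValueError("parents must have equal length")
    if not 0 <= breakpoint <= len(prefix_parent):
        raise ValueError("breakpoint out of range")
    return prefix_parent[:breakpoint] + suffix_parent[breakpoint:]


# The two sequences from the slide, with the breakpoint after the fifth site.
child = recombine("110001111111001", "000110000001111", 5)
print(child)  # 110000000001111
```

The child has the same length as its parents, which is the property stated in the second bullet.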


Ancestral Recombination Graph (ARG)

[Figure: an ARG in which mutation and recombination events derive the leaf sequences S1 = 00, S2 = 01, S3 = 10, S4 = 11. A second set of leaf labels on the slide reads S1 = 00, S2 = 01, S3 = 10, S4 = 10.]

Assumption: At most one mutation per site
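One way to see why S1..S4 = 00, 01, 10, 11 need a recombination under the one-mutation-per-site assumption is the standard four-gamete test, which is not stated on the slide itself. If some pair of sites shows all four combinations 00, 01, 10 and 11, no tree with at most one mutation per site can explain the data. A minimal sketch, assuming binary strings of equal length (the function name is hypothetical):

```python
from itertools import combinations


def needs_recombination(sequences):
    """Four-gamete test on binary sequences (illustrative sketch).

    Returns True if some pair of sites exhibits all four gametes
    00, 01, 10 and 11, which is incompatible with a tree when each
    site mutates at most once.
    """
    n_sites = len(sequences[0])
    for i, j in combinations(range(n_sites), 2):
        gametes = {(s[i], s[j]) for s in sequences}
        if len(gametes) == 4:
            return True
    return False


print(needs_recombination(["00", "01", "10", "11"]))  # True
print(needs_recombination(["00", "01", "10", "10"]))  # False
```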


Mapping Disease Gene with Inferred Genealogy

  • “..the best information that we could possibly get about association is to know the full coalescent genealogy…” – Zollner and Pritchard, 2005

  • But we do not know the true ARG!

  • Goal: infer ARGs from sequences for association mapping

    • Not easy; approximations are often used (e.g. Zollner and Pritchard)


The ARG Approaches

  • First practical ARG association mapping method (Minichiello and Durbin, 2006)

    • Use plausible ARGs: heuristic

  • My work: generates ARGs with a provable property and works with a well-defined complex disease model

    • minARGs: Most parsimonious ARGs that use the minimum number of recombinations.

    • Uniform sampling of minARGs: generate one minARG from the space of all minARGs with equal probability. (Sampling is a scheme often used in genealogy-based approaches)


Counting minARGs by Dynamic Programming (This paper)

Assume only input sequences are generated.

Input sequences: 00000, 01000, 01100, 01101, 11100, 00010, 11011, 00011

[Figure: the input matrix and the two reduced matrices obtained by removing the row derived last. Removing 11011 (multiplier 1) leaves a matrix with N1 = 124 minARGs; removing 01101 (multiplier 2) leaves a matrix with N2 = 32 minARGs.]

Recursion: N = 124*1 + 32*2 = 188

It turns out no other row choices contribute to the minARG space.


Idea: Use counting of minARGs in selecting the order of sequences to generate.

[Figure: the input matrix (188 minARGs) with rows 00000, 01000, 01100, 01101, 11100, 00010, 11011, 00011, and the two reduced matrices: without 11011 (N1 = 124, multiplier 1) and without 01101 (N2 = 32, multiplier 2).]

Select 11011 with prob = 124/188 = 0.66, and 01101 with prob = 32*2/188 = 0.34

1. Random value Rnd = 0.3 < 0.66

2. Pick 11011 as last row to derive

3. Move to reduced matrix

Can be easily extended to weighted sampling, e.g. generate less frequent sequences later.
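A single selection step of this sampling scheme can be sketched as a weighted choice, where each candidate last row is weighted by its multiplier times the minARG count of its reduced matrix. The counts 124 and 32 are copied from the slide; the dynamic program that produces them is not shown here, and the function name and tuple layout are assumptions made for illustration.

```python
import random


def pick_last_row(candidates, rnd=None):
    """Choose the last row to derive, proportional to multiplier * count.

    candidates: list of (row, multiplier, minarg_count_of_reduced_matrix).
    rnd: a number in [0, 1); drawn at random when omitted.
    Returns (chosen_row, total_number_of_minargs).
    """
    weights = [multiplier * count for _, multiplier, count in candidates]
    total = sum(weights)
    if rnd is None:
        rnd = random.random()
    threshold = rnd * total
    cumulative = 0
    for (row, _, _), weight in zip(candidates, weights):
        cumulative += weight
        if threshold < cumulative:
            return row, total
    return candidates[-1][0], total  # guard against rounding at the top end


# Values from the slide: 124*1 + 32*2 = 188.
candidates = [("11011", 1, 124), ("01101", 2, 32)]
row, total = pick_last_row(candidates, rnd=0.3)
print(row, total)  # 11011 188
```

Repeating this step on the reduced matrix until no rows remain would give one sampled derivation order; extending to non-uniform sampling only requires changing the weights.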


ARGs Represent a Set of Marginal Trees

  • Clear separation of cases/controls: NOT expected for complex diseases!

[Figure: a marginal tree with a possible disease mutation on one branch; leaves are labeled case or control.]


Realities of Mapping Complex Diseases

Multiple disease mutations!

Incomplete penetrance

Diploid: two sequences per individual

[Figure: SNP sequences for cases and controls, with positions labeled 1 and 2.]

Trying to find one tree branch that clearly separates cases and controls may not work for complex diseases!

Solution: Inference on a well-defined disease model.


Complex Disease Model: How a Disease Affects a Population (Zollner & Pritchard, 2005)

A formal model of the complex disease is needed to assess the significance of a chosen marginal tree for real data.

Disease mutations: Poisson process

Two alleles: wild-type and mutant

Each branch carries the probability that a disease mutation occurs on it (computed from the mutation rate and the branch length).

[Figure: a tree whose branches are labeled with these probabilities: 0.02, 0.1, 0.05, 0.08, 0.03, 0.01, 0.06, 0.07.]
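The slide says only that the branch probabilities come from the mutation rate and the branch length. Under a Poisson process with rate mu per unit length, the number of mutations on a branch of length t is Poisson with mean mu*t, so the probability of at least one mutation is 1 - exp(-mu*t). The sketch below assumes that reading; the exact parameterization used in the model may differ, and the function name and example values are hypothetical.

```python
import math


def branch_mutation_prob(mutation_rate, branch_length):
    """P(at least one mutation on a branch) under a Poisson process.

    Assumes mutations arrive at mutation_rate per unit branch length,
    so the count on the branch is Poisson(mutation_rate * branch_length).
    """
    if mutation_rate < 0 or branch_length < 0:
        raise ValueError("rate and length must be non-negative")
    # -expm1(-x) equals 1 - exp(-x) but is more accurate for small x.
    return -math.expm1(-mutation_rate * branch_length)


# Hypothetical values, chosen only for illustration.
print(round(branch_mutation_prob(0.1, 0.5), 4))  # 0.0488
```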


Disease Penetrance (Zollner & Pritchard)

PA,1: probability that a mutant sequence becomes a case

PC,1 = 1.0 - PA,1

PA,0: probability that a wild-type sequence becomes a case

PC,0 = 1.0 - PA,0

Example: PA,1 = 0.8, PC,1 = 0.2; PA,0 = 0.1, PC,0 = 0.9

[Figure: the tree from the disease model, with branch probabilities 0.02, 0.1, 0.05, 0.08, 0.03, 0.01, 0.06, 0.07 and leaves labeled Case or Control.]


Phenotype Likelihood: How Likely are Phenotypes Generated on a Marginal Tree? (Zollner and Pritchard)

  • The disease model specifies a probabilistic way of assigning phenotypes for a given tree.

  • But we have many trees: on which tree do the disease mutations occur?

  • Given a tree T and the case/control phenotypes of its leaves, what is the probability of observing those phenotypes on T?

    • High phenotype likelihood: disease mutations may occur in T

    • Computable in linear time and adopted in this work

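To make the linear-time claim concrete, here is a minimal bottom-up sketch under a simplified reading of the model. The assumptions are mine: the root is wild-type, each branch mutates independently with a given probability, every descendant of a mutated branch is mutant, and each leaf's phenotype depends only on its own allele (haploid penetrance). The tree encoding and names are hypothetical, and this is not claimed to be the exact formulation of Zollner and Pritchard.

```python
def phenotype_likelihood(tree, pa1, pa0):
    """P(leaf phenotypes | tree) under the simplified model described above.

    tree: a leaf is the string "case" or "control"; an internal node is a
          list of (branch_mutation_prob, child) pairs.
    pa1:  P(case | mutant sequence);  pa0: P(case | wild-type sequence).
    """

    def visit(node):
        # Returns (likelihood if this node is wild-type,
        #          likelihood if this node is mutant).
        if isinstance(node, str):
            is_case = node == "case"
            wild = pa0 if is_case else 1.0 - pa0
            mutant = pa1 if is_case else 1.0 - pa1
            return wild, mutant
        wild, mutant = 1.0, 1.0
        for p_mut, child in node:
            child_wild, child_mutant = visit(child)
            # A wild-type parent passes on a mutation with probability p_mut.
            wild *= (1.0 - p_mut) * child_wild + p_mut * child_mutant
            # Below a mutant node everything stays mutant.
            mutant *= child_mutant
        return wild, mutant

    return visit(tree)[0]  # the root is assumed to be wild-type


# Two leaves under the root; penetrance values taken from the slides.
tree = [(0.1, "case"), (0.2, "control")]
print(round(phenotype_likelihood(tree, pa1=0.8, pa0=0.1), 4))  # 0.1292
```

Each node is visited once and does constant work per child, so the running time is linear in the size of the tree.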


This Paper: Expected Phenotype Likelihood

  • We need to assess statistical significance of computed phenotype likelihood.

    • Null model: randomly permute case/control status of leaves in the given tree.

    • P-value by permutation tests: computational bottleneck!

  • My result: an O(n^3) algorithm for computing the expected value (and variance) of the phenotype likelihood.

    • Exact, fully deterministic method.

    • But, computing P-value precisely and efficiently remains open.
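For contrast with the exact method, the permutation test named as the bottleneck can be sketched as follows. This is a generic Monte Carlo permutation test, not the O(n^3) algorithm of the paper; the toy statistic stands in for the phenotype likelihood, and all names are hypothetical.

```python
import random


def permutation_p_value(labels, statistic, n_perm=1000, rng=None):
    """Estimate a one-sided p-value by permuting case/control labels.

    labels:    list of leaf labels in a fixed leaf order.
    statistic: function mapping a label list to a number (larger = more extreme).
    Uses the add-one estimate (hits + 1) / (n_perm + 1).
    """
    if rng is None:
        rng = random.Random(0)  # fixed seed so the sketch is repeatable
    observed = statistic(labels)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if statistic(shuffled) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)


def cases_in_first_three(labels):
    """Toy statistic: number of cases among the first three leaves."""
    return sum(1 for label in labels[:3] if label == "case")


labels = ["case"] * 3 + ["control"] * 3
p = permutation_p_value(labels, cases_in_first_three)
print(0.0 < p <= 1.0)  # True
```

Every permutation requires a fresh evaluation of the statistic, which is why a closed-form expectation and variance are attractive when the statistic is a tree likelihood.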


This Paper: Diploid Penetrance Is Hard

Diploid (e.g. humans): two sequences per individual

Diploid penetrance:

PA,00: prob. that an individual with two wild-type sequences becomes a case

PA,01: prob. that an individual with one wild-type and one mutant sequence becomes a case

PA,11: …

Efficient computation of phenotype likelihood: stated but unresolved in Zollner and Pritchard

My result: computing phenotype likelihood with diploid penetrance is NP-hard

[Figure: leaves labeled Case or Control.]


Simulation Results

  • Average mapping error for 50 simulated datasets from Zollner and Pritchard

  • Average over 50 genealogies

  • Date: January, 2007

Comparison: TMARG, LATAG (Z. P.), MARGARITA (M. D.).

TMARG (my program) and MARGARITA are much faster (20 times or more) than LATAG. This is important for whole-genome scans.


Acknowledgement

  • Software available at: http://wwwcsif.cs.ucdavis.edu/~wuyu

  • I want to thank

    • Dan Gusfield

    • Dan Brown

    • Chuck Langley

    • Yun S. Song

