
Inferring Local Tree Topologies for SNP Sequences Under Recombination in a Population

Yufeng Wu

Dept. of Computer Science and Engineering

University of Connecticut, USA

MIEP 2008


Genetic Variations

  • Single-nucleotide polymorphism (SNP): a site (genomic location) where two types of nucleotides occur frequently in the population.

    • Haplotype: a binary vector of SNPs (encoded as 0/1).

  • Haplotypes offer hints on genealogy: each SNP induces a split.

DNA sequences:

AATGTAGCCGA
AATATAACCTA
AATGTAGCCGT
AATGTAACCTA
CATATAGCCGT

Haplotypes (one row per sequence, one column per SNP site):

00100
01010
00101
00010
11101
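The encoding from DNA sequences to 0/1 haplotypes can be sketched as follows. The slide does not say how the 0 and 1 states were assigned, so this sketch assumes a hypothetical ancestral sequence (`ANCESTRAL`, chosen here only so that the output matches the haplotypes on the slide); the function name `snp_haplotypes` is likewise illustrative.

```python
# Minimal sketch: extract SNP sites from aligned sequences and encode them as 0/1.
# ANCESTRAL is an assumption (not given on the slide): 0 = same as ancestral, 1 = different.

SEQS = [
    "AATGTAGCCGA",
    "AATATAACCTA",
    "AATGTAGCCGT",
    "AATGTAACCTA",
    "CATATAGCCGT",
]
ANCESTRAL = "AATGTAACCGA"  # hypothetical reference used only for the 0/1 labeling


def snp_haplotypes(seqs, ancestral):
    """Return (variable column indices, list of 0/1 haplotype strings)."""
    # A column is a SNP site if more than one nucleotide occurs in it.
    sites = [i for i in range(len(ancestral)) if len({s[i] for s in seqs}) > 1]
    haps = [
        "".join("0" if s[i] == ancestral[i] else "1" for i in sites)
        for s in seqs
    ]
    return sites, haps


sites, haps = snp_haplotypes(SEQS, ANCESTRAL)
print(sites)  # 0-based positions of the SNP columns
print(haps)
```

Each column of the resulting matrix splits the sequences into those carrying 0 and those carrying 1.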


Genealogy: Evolutionary History of Genomic Sequences

  • Tells how individuals in a population are related

  • Helps to explain diseases: disease mutations occur on branches, and all descendants carry the mutations

  • Problem: how to determine the genealogy for “unrelated” individuals?

  • Complicated by recombination

[Figure: genealogy of the individuals in the current population, diseased (case) and healthy (control), with a disease mutation.]


Recombination

  • One of the principal genetic forces shaping sequence variation within species

  • Two equal-length sequences generate a third sequence of the same length in the genealogy

    • Spatial order is important: different parts of the genome are inherited from different ancestors.

Example: the sequences 110001111111001 and 000110000001111 recombine at a breakpoint; the new sequence takes the prefix 11000 from the first and the suffix 0000001111 from the second.
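The prefix/suffix example can be reproduced with a few lines. This is a minimal sketch of a single-breakpoint recombination; the function name `recombine` and the convention that `breakpoint` counts the sites taken from the first parent are assumptions made for illustration.

```python
def recombine(prefix_parent, suffix_parent, breakpoint):
    """Join the first `breakpoint` sites of one parent with the rest of the other."""
    if len(prefix_parent) != len(suffix_parent):
        raise ValueError("parents must have equal length")
    return prefix_parent[:breakpoint] + suffix_parent[breakpoint:]


# Parents from the slide; the prefix 11000 has length 5.
child = recombine("110001111111001", "000110000001111", 5)
print(child)  # 110000000001111
```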


Ancestral Recombination Graph (ARG)

Assumption: at most one mutation per site.

[Figure: an ARG with mutation and recombination events and nodes labeled 00, 10, 01, and 11, generating the sequences S1 = 00, S2 = 01, S3 = 10, S4 = 11. A second panel lists S1 = 00, S2 = 01, S3 = 10, S4 = 10.]


Local Trees

  • An ARG represents a set of local trees.

  • Each tree corresponds to a contiguous genomic region.

  • No recombination between two sites ⇒ same local tree for the two sites

  • Local tree topology: informative and useful

[Figure: an ARG and its local trees: the local tree near sites 1 and 2, the local tree near site 2, and the local tree to the right of site 3.]


Inference of Local Tree Topologies

  • Question: given SNP haplotypes, infer local tree topologies (one tree for each SNP site, ignoring branch lengths)

  • Hein (1990, 1993)

  • Enumerate all possible tree topologies at each site

    • Song and Hein (2003,2005)

    • Parsimony-based

  • Local tree reconstruction can be formulated as inference on a hidden Markov model.
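To see why enumerating every topology at each site scales poorly, one can count the candidates. The sketch below assumes rooted, binary, leaf-labeled trees, for which the count is the standard double factorial (2n − 3)!!; the slides do not state which tree model the enumeration methods use, so treat this as an illustration of the growth rate only.

```python
def num_rooted_binary_topologies(n):
    """Number of rooted binary leaf-labeled tree topologies on n leaves: (2n - 3)!!."""
    if n < 1:
        raise ValueError("need at least one leaf")
    count = 1
    # Multiply the odd numbers 3, 5, ..., 2n - 3.
    for k in range(3, 2 * n - 2, 2):
        count *= k
    return count


for n in (4, 9, 10):
    print(n, num_rooted_binary_topologies(n))
# 4 15
# 9 2027025
# 10 34459425
```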


    Local Tree Topologies

    • Key technical difficulty

      • Brute-force enumeration of local tree topologies: not feasible when number of sequences > 9

    • Cannot enumerate all tree topologies

    • Trivial solution: create a tree for a SNP containing the single split induced by the SNP.

      • Always correct (assuming at most one mutation per site)

      • But not very informative: need more refined trees!

    Example: the SNP values A: 0, B: 0, C: 1, D: 0, E: 1, F: 0, G: 1, H: 0 induce the single split {C, E, G} | {A, B, D, F, H}.

    [Figure: tree over the leaves A to H containing this split.]
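The trivial solution amounts to reading one split off a SNP column. A minimal sketch, using the A to H example values from the slide; the function name `snp_split` and the dictionary representation are illustrative choices.

```python
def snp_split(column):
    """Return the split (carriers of 1, carriers of 0) induced by one SNP column."""
    ones = frozenset(name for name, value in column.items() if value == 1)
    zeros = frozenset(column) - ones
    return ones, zeros


snp = {"A": 0, "B": 0, "C": 1, "D": 0, "E": 1, "F": 0, "G": 1, "H": 0}
ones, zeros = snp_split(snp)
print(sorted(ones), sorted(zeros))  # ['C', 'E', 'G'] ['A', 'B', 'D', 'F', 'H']
```

A tree containing only this split groups C, E, and G together and leaves every other relationship unresolved, which is why more refined trees are needed.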


    How to do better? Neighboring Local Trees are Similar!

    • Nearby SNP sites provide hints!

      • Nearby local trees are often topologically similar

      • Recombination often alters only small parts of the trees

    • Key idea: reconstruct local trees by combining information from multiple nearby SNPs


    RENT: REfining Neighboring Trees

    • Maintain for each SNP site a (possibly non-binary) tree topology

      • Initialize to a tree containing the split induced by the SNP

    • Gradually refine the trees by adding new splits

      • Splits are found by a set of rules (described later)

      • Splits added early may be more reliable

    • Stop when the trees are binary or enough information has been recovered


    A Little Background: Compatibility

    M   1 2 3 4 5
    a   0 0 0 1 0
    b   1 0 0 1 0
    c   0 0 1 0 0
    d   1 0 1 0 0
    e   0 1 1 0 0
    f   0 1 1 0 1
    g   0 0 1 0 1

    Sites 1 and 2 are compatible, but sites 1 and 3 are incompatible.

    • Two sites (columns) p, q are incompatible if columns p and q together contain all four ordered pairs (gametes): 00, 01, 10, 11. Otherwise, p and q are compatible.

      • Easily extended to splits.

      • A split s is incompatible with tree T if s is incompatible with any one split in T. Two trees are compatible if their splits are pairwise compatible.
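The four-gamete test is short enough to check directly on the matrix M. A minimal sketch; the function names are illustrative, and a tree is represented here simply as a list of splits written as 0/1 columns, which is an assumption of this example.

```python
# Matrix M from the slide: rows a..g, columns are sites 1..5.
M = [
    (0, 0, 0, 1, 0),  # a
    (1, 0, 0, 1, 0),  # b
    (0, 0, 1, 0, 0),  # c
    (1, 0, 1, 0, 0),  # d
    (0, 1, 1, 0, 0),  # e
    (0, 1, 1, 0, 1),  # f
    (0, 0, 1, 0, 1),  # g
]


def column(matrix, site):
    """Return the column for a 1-based site number."""
    return tuple(row[site - 1] for row in matrix)


def compatible(p, q):
    """Two 0/1 columns are compatible unless all four gametes 00, 01, 10, 11 occur."""
    return len(set(zip(p, q))) < 4


def split_compatible_with_tree(split, tree_splits):
    """A split is compatible with a tree if it is compatible with every split in it."""
    return all(compatible(split, s) for s in tree_splits)


print(compatible(column(M, 1), column(M, 2)))  # True
print(compatible(column(M, 1), column(M, 3)))  # False
```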


    Fully-Compatible Region: Simple Case

    • A region of consecutive SNP sites in which the SNPs are pairwise compatible.

      • May indicate that no topology-altering recombination occurred within the region

    • Rule: for site s, add the split of every other site in the same fully compatible region to the tree at s.

      • Compatibility is a very strong property and is unlikely to arise by chance.
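Fully compatible regions can be found by a simple scan. The sketch below lists the maximal runs of consecutive, pairwise compatible sites in the matrix M from the compatibility slide; the function name and the choice to report maximal regions (rather than, say, a fixed window) are assumptions of this example, not details given in the slides.

```python
M = [
    (0, 0, 0, 1, 0),
    (1, 0, 0, 1, 0),
    (0, 0, 1, 0, 0),
    (1, 0, 1, 0, 0),
    (0, 1, 1, 0, 0),
    (0, 1, 1, 0, 1),
    (0, 0, 1, 0, 1),
]


def compatible(p, q):
    """Four-gamete test on two 0/1 columns."""
    return len(set(zip(p, q))) < 4


def maximal_compatible_regions(matrix):
    """Return maximal intervals (first_site, last_site), 1-based, of pairwise compatible sites."""
    n = len(matrix[0])
    cols = [tuple(row[j] for row in matrix) for j in range(n)]
    regions = []
    prev_end = -1
    for start in range(n):
        end = start
        # Extend to the right while the next site is compatible with all sites in the run.
        while end + 1 < n and all(
            compatible(cols[end + 1], cols[k]) for k in range(start, end + 1)
        ):
            end += 1
        # Keep the interval only if it is not contained in the previous one.
        if end > prev_end:
            regions.append((start + 1, end + 1))
            prev_end = end
    return regions


regions = maximal_compatible_regions(M)
print(regions)  # [(1, 2), (2, 4), (3, 5)]
```

Under the rule above, the tree at site 3, for example, could receive the splits of sites 2, 4, and 5, since site 3 lies in both regions (2, 4) and (3, 5).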


    Split Propagation: More General Rule

    • Three consecutive sites 1, 2, and 3. Sites 1 and 2 are incompatible. Does site 3 matter for the tree at site 1?

      • The trees at sites 1 and 2 are different.

      • Suppose site 3 is compatible with both sites 1 and 2. Then?

      • Site 3 may indicate a subtree shared by the trees at sites 1 and 2.

    • Rule: a split propagates in both directions until it reaches an incompatible tree.
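The propagation rule can be sketched on the matrix M from the compatibility slide. This is only an illustration under stated assumptions: each tree is represented as the set of sites whose splits it contains, sites are processed left to right, and a split is checked against the tree as it currently stands. The slides do not specify these details, and a different processing order may give a different result.

```python
M = [
    (0, 0, 0, 1, 0),
    (1, 0, 0, 1, 0),
    (0, 0, 1, 0, 0),
    (1, 0, 1, 0, 0),
    (0, 1, 1, 0, 0),
    (0, 1, 1, 0, 1),
    (0, 0, 1, 0, 1),
]


def compatible(p, q):
    """Four-gamete test on two 0/1 columns."""
    return len(set(zip(p, q))) < 4


def propagate_splits(matrix):
    """Return, per site (0-based), the set of sites whose splits end up in its tree."""
    n = len(matrix[0])
    cols = [tuple(row[j] for row in matrix) for j in range(n)]
    trees = [{j} for j in range(n)]  # each tree starts with its own SNP split
    for i in range(n):
        for step in (-1, 1):  # propagate left, then right
            j = i + step
            # Continue while the split of site i is compatible with every split in tree j.
            while 0 <= j < n and all(compatible(cols[i], cols[k]) for k in trees[j]):
                trees[j].add(i)
                j += step
    return trees


trees = propagate_splits(M)
for site, members in enumerate(trees, start=1):
    print(site, sorted(m + 1 for m in members))
```

In this run the split of site 2 reaches the trees at sites 1, 3, and 4 but stops before site 5, whose split is incompatible with it.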


    Unique Refinement

    • Consider the subtree with leaves 1, 2, and 3.

      • Which refinement is more likely?

      • Add the split of 1 and 2: the only split that is compatible with the neighboring tree T2.

    • Rule: refine a non-binary node by the only split that is compatible with the neighboring trees

    [Figure: an unresolved node with leaves 1, 2, and 3.]
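The unique-refinement rule can be sketched for an unresolved node with three children. The example data are hypothetical: five leaves 1 to 5, and a neighboring tree T2 containing the splits {1, 2} and {1, 2, 3} (each written as one side of the split). Splits are compared with the set form of the four-gamete test: two splits are compatible when at least one of the four side-by-side intersections is empty.

```python
from itertools import combinations


def compatible(a, b, leaves):
    """Splits a|rest and b|rest are compatible if some pairwise intersection is empty."""
    a, b = frozenset(a), frozenset(b)
    a_rest, b_rest = leaves - a, leaves - b
    return not (a & b and a & b_rest and a_rest & b and a_rest & b_rest)


def unique_refinement(children, neighbor_splits, leaves):
    """Try joining each pair of children; return the new split if exactly one fits."""
    candidates = [frozenset(x) | frozenset(y) for x, y in combinations(children, 2)]
    fitting = [
        c for c in candidates
        if all(compatible(c, s, leaves) for s in neighbor_splits)
    ]
    return fitting[0] if len(fitting) == 1 else None


leaves = frozenset({1, 2, 3, 4, 5})
t2_splits = [{1, 2}, {1, 2, 3}]  # hypothetical neighboring tree T2
result = unique_refinement([{1}, {2}, {3}], t2_splits, leaves)
print(sorted(result))  # [1, 2]
```

Joining 1 with 3, or 2 with 3, conflicts with the split {1, 2} in T2, so grouping 1 and 2 is the only refinement left.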


    One Subtree-Prune-Regraft (SPR) Event

    • Recombination: simulated by SPR.

      • The rest of the two trees (without the pruned subtrees) remains the same

    • Rule: find an identical subtree Ts in neighboring trees T1 and T2 such that the rest of T1 and T2 (with Ts removed) are compatible. Then jointly refine T1 − Ts and T2 − Ts before adding Ts back.

    • More complex rules are possible.

    [Figure: a tree with the subtree to prune marked.]


    Simulation

    • Hudson’s program MS (with known coalescent local tree topologies): 100 datasets for each setting.

      • Handles much larger data than Song and Hein’s method, and performs better or similarly on small data.

    • Local tree topology recovery is scored by Song and Hein’s shared-split measure

     = 15

     = 50


    Acknowledgement

    • Software available upon request.

    • More information available at: http://www.engr.uconn.edu/~ywu

    • I want to thank

      • Yun S. Song

      • Dan Gusfield

