Inferring Local Tree Topologies for SNP Sequences Under Recombination in a Population

Inferring Local Tree Topologies for SNP Sequences Under Recombination in a Population. Yufeng Wu, Dept. of Computer Science and Engineering, University of Connecticut, USA.


Inferring Local Tree Topologies for SNP Sequences Under Recombination in a Population

Yufeng Wu

Dept. of Computer Science and Engineering

University of Connecticut, USA

MIEP 2008


Genetic Variations

Haplotypes (rows) over SNP sites (columns):

00100
01010
00101
00010
11101

  • Single-nucleotide polymorphism (SNP): a site (genomic location) where two types of nucleotides occur frequently in the population.

    • Haplotype: a binary vector of SNPs (encoded as 0/1).

  • Haplotypes: offer hints on genealogy.

DNA sequences:

AATGTAGCCGA
AATATAACCTA
AATGTAGCCGT
AATGTAACCTA
CATATAGCCGT

Each SNP induces a split.
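The step from DNA sequences to 0/1 haplotypes and splits can be sketched as follows. This is a minimal illustration, not the author's code: the function names are hypothetical, and the choice to code the first sequence's allele as 0 is an assumption. The slide's own 0/1 labeling differs from this in one column, but the induced splits are the same because a split does not depend on which allele is called 0.

```python
def snp_haplotypes(seqs):
    """Return (site_positions, haplotypes) for aligned DNA sequences.

    A column counts as a SNP when exactly two nucleotides occur in it.
    Assumption for this sketch: the first sequence's allele is coded 0.
    """
    sites = [i for i in range(len(seqs[0]))
             if len({s[i] for s in seqs}) == 2]
    haps = ["".join("0" if s[i] == seqs[0][i] else "1" for i in sites)
            for s in seqs]
    return sites, haps


def split_of(haps, col):
    """Split induced by one SNP column: (rows carrying 0, rows carrying 1)."""
    zeros = frozenset(r for r, h in enumerate(haps) if h[col] == "0")
    ones = frozenset(r for r, h in enumerate(haps) if h[col] == "1")
    return zeros, ones


# The five DNA sequences from the slide.
dna = ["AATGTAGCCGA", "AATATAACCTA", "AATGTAGCCGT",
       "AATGTAACCTA", "CATATAGCCGT"]
sites, haps = snp_haplotypes(dna)
print(sites)              # 0-based SNP positions: [0, 3, 6, 9, 10]
print(haps)               # ['00000', '01110', '00001', '00110', '11001']
print(split_of(haps, 0))  # last sequence alone on one side of the split
```

Five SNP columns are found, matching the five-site haplotypes shown on the slide.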


Genealogy: Evolutionary History of Genomic Sequences

  • Tells how individuals in a population are related

  • Helps to explain diseases: a disease mutation occurs on a branch, and all descendants of that branch carry the mutation

  • Problem: how to determine the genealogy of “unrelated” individuals?

  • Complicated by recombination

Figure: genealogy of the individuals in the current population with a disease mutation marked; individuals are labeled diseased (case) or healthy (control).


Recombination

  • One of the principal genetic forces shaping sequence variation within species

  • Two equal-length sequences generate a third, new sequence of the same length in the genealogy

    • Spatial order is important: different parts of the genome are inherited from different ancestors.

Figure: parental sequences 110001111111001 and 000110000001111; at the breakpoint, the prefix 11000 of the first is joined to the suffix 0000001111 of the second.
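The prefix/suffix picture can be written as a one-line operation on strings. The sketch below is only an illustration of a single-breakpoint recombination; the function name and the convention that `breakpoint` counts the number of sites taken from the first parent are assumptions made here.

```python
def recombine(prefix_parent, suffix_parent, breakpoint):
    """Single-crossover recombinant: the first `breakpoint` sites come from
    prefix_parent, the remaining sites from suffix_parent."""
    if len(prefix_parent) != len(suffix_parent):
        raise ValueError("parents must have equal length")
    if not 0 <= breakpoint <= len(prefix_parent):
        raise ValueError("breakpoint out of range")
    return prefix_parent[:breakpoint] + suffix_parent[breakpoint:]


# The two 15-site sequences from the slide, with the breakpoint after site 5.
child = recombine("110001111111001", "000110000001111", 5)
print(child)  # 110000000001111 = prefix 11000 + suffix 0000001111
```

The recombinant has the same length as its parents, and each site is inherited from exactly one of them, which is why the order of sites along the sequence matters.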


Ancestral Recombination Graph (ARG)

Figure: an ARG with mutation and recombination events; node labels 00, 10, 01 and 11.

S1 = 00

S2 = 01

S3 = 10

S4 = 11

Assumption:

At most one mutation per site

S1 = 00

S2 = 01

S3 = 10

S4 = 10


Local Trees

ARG

  • ARG represents a set of local trees.

  • Each tree covers a contiguous genomic region.

  • No recombination between two sites implies the same local tree at both sites

  • Local tree topology: informative and useful

Local tree near site 2

Local tree to the right of site 3

Local tree near sites 1 and 2


Inference of Local Tree Topologies

  • Question: given SNP haplotypes, infer local tree topologies (one tree for each SNP site, ignoring branch lengths)

  • Hein (1990, 1993)

  • Enumerate all possible tree topologies at each site

    • Song and Hein (2003, 2005)

    • Parsimony-based

  • Local tree reconstruction can be formulated as inference on a hidden Markov model.


    Local Tree Topologies

    • Key technical difficulty

      • Brute-force enumeration of local tree topologies: not feasible when number of sequences > 9

    • Cannot enumerate all tree topologies

    • Trivial solution: create a tree for a SNP containing the single split induced by the SNP.

      • Always correct (assume one mutation per site)

      • But not very informative: need more refined trees!

    SNP column: A: 0, B: 0, C: 1, D: 0, E: 1, F: 0, G: 1, H: 0

    Figure: tree with leaves A, C, B, E, D, F, G, H
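The trivial solution can be made concrete with the SNP column from the slide (A: 0 through H: 0). The sketch below writes the single-split tree as a Newick string. It assumes, for illustration only, that state 0 is ancestral, so the leaves carrying 1 form one clade hanging off an otherwise unresolved root; the function name is hypothetical.

```python
def single_split_tree(column):
    """Newick string for the non-binary tree containing only the split
    induced by one SNP. `column` maps leaf name -> 0 or 1.

    Assumption: 0 is the ancestral state, so the 1-leaves form one clade.
    """
    zeros = sorted(leaf for leaf, v in column.items() if v == 0)
    ones = sorted(leaf for leaf, v in column.items() if v == 1)
    return "(" + ",".join(zeros) + ",(" + ",".join(ones) + "));"


snp = {"A": 0, "B": 0, "C": 1, "D": 0, "E": 1, "F": 0, "G": 1, "H": 0}
tree = single_split_tree(snp)
print(tree)  # (A,B,D,F,H,(C,E,G));
```

The result groups C, E and G but says nothing about how the other five leaves relate to each other, which is the sense in which the trivial tree is correct but not very informative.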


    How to do better? Neighboring Local Trees are Similar!

    • Nearby SNP sites provide hints!

      • Nearby local trees are often topologically similar

      • Recombination often only alters small parts of the trees

    • Key idea: reconstructing local trees by combining information from multiple nearby SNPs


    RENT: REfining Neighboring Trees

    • Maintain for each SNP site a (possibly non-binary) tree topology

      • Initialize to a tree containing the split induced by the SNP

    • Gradually refine the trees by adding new splits to them

      • Splits found by a set of rules (later)

      • Splits added early may be more reliable

    • Stop when the trees are binary or enough information has been recovered


    A Little Background: Compatibility

    Matrix M (rows a to g are sequences, columns 1 to 5 are sites):

          1 2 3 4 5
        a 0 0 0 1 0
        b 1 0 0 1 0
        c 0 0 1 0 0
        d 1 0 1 0 0
        e 0 1 1 0 0
        f 0 1 1 0 1
        g 0 0 1 0 1

    Sites 1 and 2 are compatible, but 1 and 3 are incompatible.

    • Two sites (columns) p and q are incompatible if, taken together, columns p and q contain all four ordered pairs (gametes): 00, 01, 10, 11. Otherwise, p and q are compatible.

      • Easily extended to splits.

      • A split s is incompatible with tree T if s is incompatible with any one split in T. Two trees are compatible if their splits are pairwise compatible.
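The pairwise test above (often called the four-gamete test) is short enough to check directly on the matrix M from this slide. The sketch below is illustrative; the function names are hypothetical, and rows are stored as strings for brevity.

```python
from itertools import combinations

# Matrix M from the slide: rows a..g, columns are sites 1..5.
M = ["00010", "10010", "00100", "10100", "01100", "01101", "00101"]


def compatible(matrix, p, q):
    """Columns p and q (0-based) are compatible unless all four gametes
    00, 01, 10, 11 appear among the rows."""
    gametes = {(row[p], row[q]) for row in matrix}
    return len(gametes) < 4


def incompatible_pairs(matrix):
    """All incompatible site pairs, reported with 1-based site numbers."""
    m = len(matrix[0])
    return [(p + 1, q + 1) for p, q in combinations(range(m), 2)
            if not compatible(matrix, p, q)]


print(compatible(M, 0, 1))    # True:  sites 1 and 2 are compatible
print(compatible(M, 0, 2))    # False: sites 1 and 3 are incompatible
print(incompatible_pairs(M))  # [(1, 3), (1, 4), (2, 5)]
```

The same test applies to any two splits over the same leaf set, since a split is just a 0/1 column.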


    Fully-Compatible Region: Simple Case

    • A region of consecutive SNP sites where these SNPs are pairwise compatible.

      • May indicate no topology-altering recombination occurred within the region

    • Rule: for site s, add to the tree at s the split of every SNP in a fully-compatible region containing s.

      • Compatibility is a very strong property and is unlikely to arise by chance.
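The slides do not say how the region around s is chosen, so the following is only one possible reading: grow a window around s greedily, one site at a time, as long as every site in the window stays pairwise compatible. The growth order (left, then right, repeated) and all names are assumptions of this sketch. The matrix is the 7-by-5 example M from the compatibility slide.

```python
M = ["00010", "10010", "00100", "10100", "01100", "01101", "00101"]


def compatible(matrix, p, q):
    """Four-gamete test on columns p and q (0-based)."""
    return len({(row[p], row[q]) for row in matrix}) < 4


def fully_compatible_region(matrix, s):
    """Greedily grow a window [lo, hi] around site s (0-based) in which all
    sites are pairwise compatible. Trying left before right is an arbitrary
    choice made for this sketch."""
    m = len(matrix[0])
    lo = hi = s
    grew = True
    while grew:
        grew = False
        if lo > 0 and all(compatible(matrix, lo - 1, j)
                          for j in range(lo, hi + 1)):
            lo -= 1
            grew = True
        if hi < m - 1 and all(compatible(matrix, hi + 1, j)
                              for j in range(lo, hi + 1)):
            hi += 1
            grew = True
    return lo, hi


print(fully_compatible_region(M, 1))  # (0, 1): sites 1-2 around site 2
print(fully_compatible_region(M, 2))  # (1, 3): sites 2-4 around site 3
```

Under this reading, the tree at site 3 would receive the splits of sites 2, 3 and 4, while site 1 is excluded because it is incompatible with site 3.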


    Split Propagation: More General Rule

    • Three consecutive sites 1,2 and 3. Sites 1 and 2 are incompatible. Does site 3 matter for tree at site 1?

      • Trees at site 1 and 2 are different.

      • Suppose site 3 is compatible with sites 1 and 2. Then?

      • Site 3 may indicate a shared subtree in both trees at sites 1 and 2.

    • Rule: a split propagates in both directions until it reaches an incompatible tree.
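A minimal sketch of the propagation rule follows, assuming each local tree is stored simply as a list of splits (0/1 columns) and that a tree is incompatible with a split when any one of its splits fails the four-gamete test. The three-site data set is made up for illustration so that it matches the scenario described: sites 1 and 2 are incompatible, and site 3 is compatible with both.

```python
def cols_compatible(a, b):
    """Four-gamete test on two splits given as equal-length 0/1 strings."""
    return len(set(zip(a, b))) < 4


def propagate(trees, origin, split):
    """Add `split` to the trees left and right of `origin`, stopping in each
    direction at the first tree that contains an incompatible split."""
    for step in (-1, 1):
        j = origin + step
        while 0 <= j < len(trees) and all(cols_compatible(split, t)
                                          for t in trees[j]):
            if split not in trees[j]:
                trees[j].append(split)
            j += step
    return trees


# Hypothetical columns over five haplotypes (one string per site).
c1, c2, c3 = "00110", "01010", "00001"
trees = [[c1], [c2], [c3]]   # each tree starts with its own SNP's split

propagate(trees, 2, c3)      # site 3's split reaches both other trees
propagate(trees, 0, c1)      # site 1's split is blocked at site 2
print(trees)
```

After both calls the trees at sites 1 and 2 still differ, but each has gained the split of site 3, which is the shared subtree the slide refers to.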


    Unique Refinement

    • Consider the subtree with leaves 1,2 and 3.

      • Which refinement is more likely?

      • Add split of 1 and 2: the only split that is compatible with neighboring T2.

    • Rule: refine a non-binary node by the only compatible split with neighboring trees

    Figure: a non-binary node with leaves 1, 3 and 2, and an unresolved refinement (?).
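The unique-refinement rule can be sketched with rooted clusters (sets of leaves below a node) instead of unrooted splits, which is a simplification assumed here: two clusters are compatible when they are disjoint or nested. Candidate refinements are restricted to merging two children of the non-binary node; the function names and the neighboring tree T2 used below are hypothetical.

```python
from itertools import combinations


def clusters_compatible(a, b):
    """Rooted clusters are compatible when they are disjoint or nested."""
    return a.isdisjoint(b) or a <= b or b <= a


def unique_refinement(children, neighbor_clusters):
    """children: leaf sets below a non-binary node. Return the one merge of
    two children that is compatible with every cluster of the neighboring
    tree, or None when no candidate or more than one candidate qualifies."""
    candidates = [a | b for a, b in combinations(children, 2)]
    ok = [c for c in candidates
          if all(clusters_compatible(c, n) for n in neighbor_clusters)]
    return ok[0] if len(ok) == 1 else None


node = [frozenset({1}), frozenset({2}), frozenset({3})]
t2 = [frozenset({1, 2}), frozenset({1, 2, 3})]   # hypothetical neighbor T2

print(unique_refinement(node, t2))                     # frozenset({1, 2})
print(unique_refinement(node, [frozenset({1, 2, 3})])) # None: not unique
```

When the neighboring tree is itself unresolved at this node, all three merges are compatible, so the rule does not fire and the node stays non-binary.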


    One Subtree-Prune-Regraft (SPR) Event

    • Recombination: simulated by SPR.

      • The rest of the two trees (without the pruned subtrees) remains the same

    • Rule: find an identical subtree Ts in neighboring trees T1 and T2, such that the rest of T1 and T2 (with Ts removed) are compatible. Then jointly refine T1 - Ts and T2 - Ts before adding Ts back.

    Subtree to prune

    More complex rules possible.


    Simulation

    • Hudson’s program MS (with known coalescent local tree topologies): 100 datasets for each setting.

      • Handles much larger data than Song and Hein’s method, and performs better or similarly on small data.

    • Test local tree topology recovery scored by Song and Hein’s shared-split measure

     = 15

     = 50


    Acknowledgement

    • Software available upon request.

    • More information available at: http://www.engr.uconn.edu/~ywu

    • I want to thank

      • Yun S. Song

      • Dan Gusfield

