Generalized tree alignment the deferred path heuristic

Generalized Tree Alignment: The Deferred Path Heuristic

Stinus Lindgreen

stinus@diku.dk


Overview:

What is a phylogeny?

The Generalized Tree Alignment problem

Sequence Graphs and their algorithms

The Deferred Path Heuristic


Phylogeny:

Describes an evolutionary model

  • Common ancestor

  • Mutations happen all the time

    • Insertions, deletions, substitutions, translocations, inversions, duplications …

    • Most mutations happen during DNA replication

    • Corrected by cell mechanisms

  • Mutations accumulate → new species diverge

  • Only mutations in sex cells are inherited (obviously)


Phylogeny:

Phylogenetic inference:

Given n sequences, build a phylogenetic tree T

Most methods base T on a multiple alignment

Likewise, multiple alignments are often based on guide trees

Can we solve both problems at the same time?


Phylogeny:

Describes the evolutionary relationship between species

Notice the root


Phylogeny:

... or within a single taxon (here, human enterovirus 71)


The Problem:

Given n sequences s1,…,sn …

Multiple Alignment:

Build an alignment A of the sequences by inserting gaps such that homologous bases are put in the same column

Phylogenetic Inference:

Build a (binary) tree T with s1,…,sn in the leaves and possible ancestors s_{n+1},…,s_{n+k} in internal nodes describing their evolutionary connection


Generalized Tree Alignment:

Combines the two. The problem we want to solve is:

Given: A set of n sequences s1,…,sn from n different species (could be DNA, RNA or protein – for simplicity we focus on DNA)

Problem: Generate an unrooted phylogenetic tree T with sequences s1,…,sn in the leaves and a multiple alignment A of these sequences

Placing the root is not trivial and is best left to biologists.


The given problem is proven to be MAX SNP-hard (Wang and Jiang, 1994)

→ No polynomial-time approximation scheme exists unless P = NP.

Exact solutions to NP-hard problems are intractable

→ In practice, the best we can hope for is a heuristic

The given algorithm runs in time O(n^2 · l^n)

  • n: The number of sequences

  • l: Their maximum length.


Sequence graphs (Hein, 1989):

Recall pairwise alignment.

Traceback "spells" possible optimal alignments:


Sequence graphs:

Make graph with alignment columns as edge labels

→ represents all optimal alignments

We will get back to that shortly …

Right now, we want to represent sequences

Let us introduce sequence graphs.

For instance, s = ACTGTA is represented by:


Sequence graphs:

More formally:

  • Directed, acyclic graph.

  • Edge labels l from the alphabet Σ. Here, Σ = {A,C,G,T,-}

  • Source s: The unique node with no incoming edges

  • Sink t: The unique node with no outgoing edges.

  • Each path from s to t spells a sequence.
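
The definition above can be sketched in a few lines of Python. This is a minimal illustration, not the presenter's implementation: the class name `SequenceGraph`, the helper `linear_graph`, and the integer node numbering are all hypothetical, and the sketch assumes that an edge labelled "-" contributes nothing to the spelled sequence.

```python
from collections import defaultdict


class SequenceGraph:
    """Directed acyclic graph whose source-to-sink paths spell sequences."""

    def __init__(self, source, sink):
        self.source = source
        self.sink = sink
        # node -> list of (label, target) pairs for its outgoing edges
        self.out_edges = defaultdict(list)

    def add_edge(self, u, label, v):
        self.out_edges[u].append((label, v))

    def sequences(self):
        """Return the set of sequences spelled by all paths from source to sink."""
        spelled = set()
        stack = [(self.source, "")]
        while stack:
            node, prefix = stack.pop()
            if node == self.sink:
                # Assumption: gap labels are dropped from the spelled sequence.
                spelled.add(prefix.replace("-", ""))
                continue
            for label, target in self.out_edges[node]:
                stack.append((target, prefix + label))
        return spelled


def linear_graph(seq):
    """Build the linear sequence graph of a single sequence."""
    graph = SequenceGraph(0, len(seq))
    for i, ch in enumerate(seq):
        graph.add_edge(i, ch, i + 1)
    return graph
```

For example, `linear_graph("ACTGTA").sequences()` yields the single sequence ACTGTA, and adding one parallel edge with `add_edge(1, "G", 2)` makes the same graph spell two sequences.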


Sequence graphs:

Represents a set of sequences given by all paths from s to t:


Sequence graphs:

Any single sequence can be represented by a linear sequence graph

Any set of k sequences can be represented by making k paths from s to t

A given sequence s’ can be represented by more than one path

We can now represent sequences – but can we align them?


Aligning sequence graphs:

Dynamic programming algorithm inspired by basic pairwise alignment.

Pairwise Alignment:

  • Given two sequences p and q

  • Move one letter in p and move through q finding the optimal "partial alignments"

Sequence Graphs:

  • Given two sequence graphs G1 and G2

  • We can have many outgoing edges to choose from


Aligning sequence graphs:

Fill in a |V1|*|V2| score matrix

For each pair of nodes i from G1 and j from G2:

Should we:

  • Align the two characters we got by following e1 into i and e2 into j?

  • Stay in G1 and only move in G2?

  • Stay in G2 and only move in G1?

  • Or have we already found a better path into i and j?
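
The recurrence described by these questions can be sketched as follows. This is an illustrative sketch under stated assumptions, not the algorithm exactly as presented: the function names `align_graphs`, `incoming`, `cost` and `linear` are hypothetical, scoring is a simple unit cost (0 for equal labels, 1 otherwise, with a "stay" move treated as aligning against "-"), and nodes are assumed to be numbered 0..n-1 in topological order with node 0 as the source and node n-1 as the sink.

```python
def cost(a, b):
    """Assumed unit cost: 0 for identical labels, 1 otherwise."""
    return 0 if a == b else 1


def incoming(edges, n):
    """Group edges (u, label, v) by their target node v."""
    table = [[] for _ in range(n)]
    for u, label, v in edges:
        table[v].append((u, label))
    return table


def align_graphs(edges1, n1, edges2, n2):
    """Fill the |V1| x |V2| score matrix; the last cell holds the optimal cost."""
    in1, in2 = incoming(edges1, n1), incoming(edges2, n2)
    inf = float("inf")
    score = [[inf] * n2 for _ in range(n1)]
    score[0][0] = 0
    for i in range(n1):
        for j in range(n2):
            best = score[i][j]
            for u, a in in1[i]:
                # Stay in G2 and only move in G1: label a faces a gap.
                best = min(best, score[u][j] + cost(a, "-"))
                for w, b in in2[j]:
                    # Follow e1 into i and e2 into j: align a with b.
                    best = min(best, score[u][w] + cost(a, b))
            for w, b in in2[j]:
                # Stay in G1 and only move in G2: label b faces a gap.
                best = min(best, score[i][w] + cost("-", b))
            score[i][j] = best
    return score


def linear(seq):
    """Edge list and node count of the linear graph for one sequence."""
    return [(k, ch, k + 1) for k, ch in enumerate(seq)], len(seq) + 1


edges1, n1 = linear("ACTGTA")
edges2, n2 = linear("ACGTA")
print(align_graphs(edges1, n1, edges2, n2)[-1][-1])  # prints 1
```

Because every edge runs from a lower-numbered node to a higher-numbered one, all cells that a cell depends on are already final when it is computed. Keeping the minimum over all incoming edge pairs is what answers the last question, whether a better path into i and j has already been found.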


Optimal Alignment Graphs:

Now we need a way to remember the optimal alignments

Recall graphs from before:

  • Directed, acyclic graphs

  • Nodes s and t defined as before

  • Edge labels of the form [la,lb] where la,lb∊Σ

Backtrack through the matrix and consider each possible combination of edges.


Optimal Alignment Graphs:

An example of an OAG:

This one represents the alignments:

We denote such a graph A*

We have to convert the OAGs back to SGs


Optimal Alignment Graphs:

This is done easily by considering the edge labels:

If la = lb: Make a single edge in the SG with label la

If la≠lb: Make two edges in the SG: One with label la and one with label lb
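
The conversion rule is small enough to write out directly. The sketch below assumes a hypothetical edge-list encoding in which an OAG edge is a triple `(u, (la, lb), v)` and an SG edge is a triple `(u, label, v)`; the function name `oag_to_sg` is likewise made up for illustration.

```python
def oag_to_sg(oag_edges):
    """Convert optimal-alignment-graph edges into sequence-graph edges."""
    sg_edges = []
    seen = set()
    for u, (la, lb), v in oag_edges:
        # Equal labels give one edge, different labels give two parallel edges.
        labels = [la] if la == lb else [la, lb]
        for label in labels:
            edge = (u, label, v)
            if edge not in seen:  # avoid duplicate parallel edges
                seen.add(edge)
                sg_edges.append(edge)
    return sg_edges
```

With this encoding, an OAG edge labelled [C,-] between nodes 1 and 2 becomes two SG edges between the same nodes, one labelled C and one labelled -.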

The graph from before turns into the SG:


Summing up Sequence Graphs:

Final graph represents all sequences giving an optimal alignment between G1 and G2

We can:

  • Represent a set of sequences by a sequence graph

  • Align two such graphs producing a new SG

We can now get on with the main algorithm


The basic idea:

  • Start by comparing all sequences

    • Find a closest pair.

  • Represent all sequences giving the optimal solution

    • Defer the choice of a single sequence

  • Repeat, but this time include the set of sequences

  • In the end: Choose a single sequence and backtrack

This shows a need for:

  • A compact representation of many sequences

  • An algorithm for aligning sets of sequences


The Deferred Path Heuristic:

Similar to Kruskal’s algorithm for finding MSTs:

From sequences s1,…,sn, initialize n SGs G1,…,Gn.

Until only two SGs remain:

  • Align all pairs and choose a closest pair Gi and Gj

  • Create A*(Gi,Gj) and convert A* into an SG Gk.

  • Replace Gi and Gj with Gk

Note that we remember all candidate sequences
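
The merging loop can be sketched independently of how graphs are aligned. In the sketch below, `deferred_path_merge` is a hypothetical name and `align` is an assumed callable returning a pair `(cost, merged)`. To keep the example runnable, `toy_align` is a deliberately crude stand-in: it stores each candidate set as an explicit frozenset of strings and keeps only the closest strings from both sides, which is not the A*-to-SG construction described in the slides.

```python
from itertools import combinations


def deferred_path_merge(graphs, align):
    """Greedily merge the closest pair until only two graphs remain.

    graphs: dict mapping a name to a graph-like object.
    align:  callable (g1, g2) -> (cost, merged_graph); assumed interface.
    Returns the two remaining graphs and the list of merges performed.
    """
    active = dict(graphs)
    merges = []
    while len(active) > 2:
        best = None
        for a, b in combinations(sorted(active), 2):
            cost, merged = align(active[a], active[b])
            if best is None or cost < best[0]:
                best = (cost, a, b, merged)
        cost, a, b, merged = best
        name = f"anc{len(merges)}"  # hypothetical naming of internal nodes
        del active[a], active[b]
        active[name] = merged
        merges.append((a, b, name, cost))
    return active, merges


def edit_distance(p, q):
    """Unit-cost edit distance between two strings."""
    prev = list(range(len(q) + 1))
    for i, a in enumerate(p, 1):
        cur = [i]
        for j, b in enumerate(q, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (a != b)))
        prev = cur
    return prev[-1]


def toy_align(set1, set2):
    """Stand-in for SG alignment: closest strings of two candidate sets."""
    best = min(edit_distance(p, q) for p in set1 for q in set2)
    merged = frozenset(
        s for p in set1 for q in set2 if edit_distance(p, q) == best for s in (p, q)
    )
    return best, merged


leaves = {f"s{k}": frozenset({seq})
          for k, seq in enumerate(["ACGT", "ACGA", "TTTT", "TTTA"], 1)}
remaining, merges = deferred_path_merge(leaves, toy_align)
print(merges)
```

The list of merges records which pair was joined at each step, which is the information needed to build T and to backtrack through the subtrees once a final alignment has been chosen. This sketch realigns every pair in every round for simplicity.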


The Deferred Path Heuristic:

When only two SGs Gi and Gj remain:

  • Align them and connect them in T

  • Choose some optimal alignment

    • This gives si and sj at the roots of the two subtrees.

  • Backtrack through the subtrees

    • At each step: Align sk to the underlying SGs.

    • Choose some optimal alignment


The Deferred Path Heuristic:

We defer our choice of actual sequences until the last moment, thereby enlarging our solution space: