Comput. Genomics, Lecture 5b Character Based Methods for Reconstructing Phylogenetic Trees: Maximum Parsimony

Based on presentations by Dan Geiger, Shlomo Moran, and Ido Wexler. Modified by Benny Chor. References: Durbin et al. 7.4, Gusfield 17.1-17.3, Setubal & Meidanis 6.1.

Presentation Transcript


  1. Comput. Genomics, Lecture 5b: Character Based Methods for Reconstructing Phylogenetic Trees: Maximum Parsimony. Based on presentations by Dan Geiger, Shlomo Moran, and Ido Wexler. Modified by Benny Chor. References: Durbin et al. 7.4, Gusfield 17.1-17.3, Setubal & Meidanis 6.1.

  2. Phylogenetic Trees - Reminder • Leaves represent objects (genes, species) being compared • Internal nodes are hypothetical ancestral objects • In a rooted tree, path from root to a node corresponds to a path in evolutionary time • An unrooted tree specifies relationships among objects, but not evolutionary time

  3. Parsimony Based Approach • Input: Character data (aligned sequences) • Goal/Output: A labeled tree (labeled internal nodes) that “explains” the data with a minimal number of changes across edges

  4. Parsimony: An Example • Various trees could explain the phylogeny of the following four sequences: AAG, AAA, GGA, AGA. [Figure: two candidate trees over these four sequences, with labeled internal nodes] • Parsimony prefers the second tree to the first because it requires fewer substitution events (three vs. four changes).
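
The count of "changes across edges" can be sketched as a sum of Hamming distances over the edges of a labeled tree. The tree below is a hypothetical labeling chosen for illustration (the transcript does not preserve the slide's figure), and `hamming` and `tree_cost` are illustrative names, not from the lecture.

```python
def hamming(a, b):
    """Number of positions at which two equal-length sequences differ."""
    assert len(a) == len(b)
    return sum(x != y for x, y in zip(a, b))


def tree_cost(edges):
    """Total number of substitutions over all (parent_label, child_label) edges."""
    return sum(hamming(parent, child) for parent, child in edges)


# Assumed labeled tree ((AAG, AAA), (GGA, AGA)):
# root AAA, left internal node AAA, right internal node AGA.
edges = [
    ("AAA", "AAA"),  # root -> left internal node
    ("AAA", "AGA"),  # root -> right internal node
    ("AAA", "AAG"),  # left internal node -> leaf AAG
    ("AAA", "AAA"),  # left internal node -> leaf AAA
    ("AGA", "GGA"),  # right internal node -> leaf GGA
    ("AGA", "AGA"),  # right internal node -> leaf AGA
]

print(tree_cost(edges))  # 3
```

Under this assumed labeling the tree explains the four sequences with three substitutions.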

  5. Big and Small Parsimony • Approaches to finding a maximum parsimony tree usually have two separate components: • A search through the space of trees (BIG parsimony) • Given a specific tree topology, find an assignment of “ancestral labels” to internal nodes so as to minimize the total number of changes across tree edges (small parsimony)

  6. Formally: Big Parsimony • Input: Character data (aligned sequences) • Goal/Output: A labeled tree (labeled internal nodes) that minimizes the number of changes across edges (over all trees and internal labelings).

  7. Formally: Small Parsimony • Input: Character data (aligned sequences) and a tree with sequences at leaves. • Goal/Output: A labeling of internal nodes that minimizes the number of changes across edges (over all internal labelings).

  8. Big, Small, and Weighted Parsimony • Small parsimony has a linear time solution (Fitch’s algorithm). • BIG parsimony is NP-hard (easy reduction from vertex cover, VC). • Weighted small parsimony also has a linear time solution (Sankoff’s algorithm, dynamic programming).

  9. Small Parsimony: Fitch’s Algorithm • Traverse tree “up”, from leaves to root, finding sets of possible ancestral states (labels) for each internal node. • Traverse tree “down”, from root to leaves, determining ancestral states (labels) for internal nodes. • Key observation: Different sites are independent. Can solve one site at a time.

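  10. Fitch’s Algorithm – Step 1 • Do a post-order (from leaves to root) traversal of the tree • Find the set of possible states $R_i$ of internal node $i$ with children $j$ and $k$: $R_i = R_j \cap R_k$ if $R_j \cap R_k \neq \emptyset$, and $R_i = R_j \cup R_k$ otherwise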

  11. Fitch’s Algorithm – Step 1 • # of changes = # of union operations [Figure: example tree with leaf states C, G, T, T, A, T and internal state sets {T}, {T}, {A,G,T}, {C,T}, {G,T}]
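
A minimal sketch of the bottom-up pass for a single site, assuming a rooted binary tree written as nested pairs with one-character strings at the leaves. The function name `fitch_up` and the example topology are assumptions made for illustration; the tree is not a reconstruction of the slide's figure.

```python
def fitch_up(tree):
    """Return (state_set, changes) for one site.

    `tree` is either a leaf state (a one-character string) or a pair
    (left_subtree, right_subtree).
    """
    if isinstance(tree, str):
        return {tree}, 0
    left, right = tree
    r_j, c_j = fitch_up(left)
    r_k, c_k = fitch_up(right)
    common = r_j & r_k
    if common:
        # Non-empty intersection: no extra change is needed here.
        return common, c_j + c_k
    # Empty intersection: take the union and count one change.
    return r_j | r_k, c_j + c_k + 1


# Hypothetical topology over six leaf states.
tree = ((("C", "T"), ("G", "T")), ("A", "T"))
root_set, changes = fitch_up(tree)
print(root_set, changes)  # {'T'} 3
```

The three changes correspond to the three union operations, at the nodes above (C, T), (G, T), and (A, T).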

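  12. Fitch’s Algorithm – Step 2 • Do a pre-order (from root to leaves) traversal of the tree • Select state $r_j$ of internal node $j$ with parent $i$: $r_j = r_i$ if $r_i \in R_j$, otherwise $r_j$ is an arbitrary state in $R_j$ (at the root, pick any state in its set)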

  13. Fitch’s Algorithm – Step 2 [Figure: the example tree from Step 1, shown at successive stages of the top-down pass as states are selected for the internal nodes]
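
The two passes can be sketched together for one site. This is an illustrative sketch under the same assumed nested-pair tree representation; `fitch_sets` and `fitch_down` are hypothetical names, and ties are broken with `min()` only to make the arbitrary choice deterministic.

```python
def fitch_sets(tree):
    """Step 1: annotate every node as (state_set, children) for one site."""
    if isinstance(tree, str):
        return ({tree}, None)
    left, right = fitch_sets(tree[0]), fitch_sets(tree[1])
    common = left[0] & right[0]
    return (common if common else left[0] | right[0], (left, right))


def fitch_down(node, parent_state=None):
    """Step 2: choose one state per node, walking from the root to the leaves."""
    states, children = node
    if parent_state in states:
        state = parent_state  # keep the parent's state when it is allowed
    else:
        state = min(states)   # arbitrary choice from the set (also used at the root)
    if children is None:
        return state
    return (state, fitch_down(children[0], state), fitch_down(children[1], state))


# Hypothetical topology; each labeled node is (state, left, right).
tree = ((("C", "T"), ("G", "T")), ("A", "T"))
labeled = fitch_down(fitch_sets(tree))
print(labeled)
```

For this tree every internal node receives T, so the only changes lie on the edges leading to the leaves C, G, and A.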

  14. Weighted Version • Instead of assuming all state changes have unit cost (are equally likely), use different costs $S(a,b)$ for different changes • The first step of the algorithm is to propagate costs up through the tree

  15. Weighted Version of Fitch’s Algorithm • Want to determine the min. cost $R_i(a)$ of assigning character $a$ to node $i$ • For leaves: $R_i(a) = 0$ if leaf $i$ has character $a$, and $R_i(a) = \infty$ otherwise

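  16. Weighted Version of Fitch’s Algorithm • Want to determine the min. cost $R_i(a)$ of assigning character $a$ to node $i$ • For internal nodes (node $i$ with children $j$ and $k$): $R_i(a) = \min_b \left( R_j(b) + S(a,b) \right) + \min_b \left( R_k(b) + S(a,b) \right)$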

  17. Weighted Version of Fitch’s Algorithm – Step 2 • Do a pre-order (from root to leaves) traversal of the tree • Select a minimal cost character for the root • For each internal node $j$, select the character that produced the minimal cost at its parent $i$
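
A minimal sketch of the weighted (Sankoff-style) dynamic program for one site, covering both the cost propagation and the top-down selection. The cost function `cost` (transitions 1, transversions 2), the alphabet, the function names, and the example tree are all assumptions made for illustration, not values from the lecture.

```python
import math

ALPHABET = "ACGT"


def cost(a, b):
    """Assumed S(a, b): 0 if equal, 1 for a transition, 2 for a transversion."""
    if a == b:
        return 0
    return 1 if {a, b} in ({"A", "G"}, {"C", "T"}) else 2


def sankoff_up(tree, S):
    """Return (R, children); R[a] is the min cost of the subtree if its root has state a."""
    if isinstance(tree, str):
        return ({a: 0 if a == tree else math.inf for a in ALPHABET}, None)
    kids = [sankoff_up(child, S) for child in tree]
    R = {
        a: sum(min(Rc[b] + S(a, b) for b in ALPHABET) for Rc, _ in kids)
        for a in ALPHABET
    }
    return (R, kids)


def sankoff_down(node, S, parent_state=None):
    """Pick a min-cost root state, then the state that achieved the min at each parent."""
    R, kids = node
    if parent_state is None:
        state = min(ALPHABET, key=lambda a: R[a])
    else:
        state = min(ALPHABET, key=lambda b: R[b] + S(parent_state, b))
    if kids is None:
        return state
    return (state,) + tuple(sankoff_down(k, S, state) for k in kids)


tree = (("A", "G"), ("C", "T"))  # hypothetical four-leaf tree
annotated = sankoff_up(tree, cost)
best_cost = min(annotated[0].values())
labeled = sankoff_down(annotated, cost)
print(best_cost, labeled)  # 4 ('A', ('A', 'A', 'G'), ('C', 'C', 'T'))
```

With unit costs (`S(a, b) = 1` whenever `a != b`), the minimum at the root equals the number of changes counted by the unweighted algorithm.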

  18. Big Parsimony: Exploring the Space of Trees • We’ve considered small parsimony: how to find the minimum number of changes for a given tree topology • To solve big parsimony, we need some search procedure for exploring the space of tree topologies • There are $(2n-5)!! = 1 \cdot 3 \cdot 5 \cdots (2n-5)$ unrooted binary trees on $n$ leaves, and $(2n-3)!!$ rooted ones
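
For very small inputs the search can simply be exhaustive. The sketch below scores the three unrooted binary topologies on four leaves, each rooted on its internal edge, by unweighted Fitch counting summed over sites. The function names are illustrative assumptions, and exhaustive search is not a practical procedure for larger inputs.

```python
def fitch_score(tree, site):
    """Return (state_set, changes) at one site; leaves are whole sequences."""
    if isinstance(tree, str):
        return {tree[site]}, 0
    (r_j, c_j), (r_k, c_k) = fitch_score(tree[0], site), fitch_score(tree[1], site)
    common = r_j & r_k
    return (common, c_j + c_k) if common else (r_j | r_k, c_j + c_k + 1)


def parsimony(tree, n_sites):
    """Sites are independent, so the total score is the sum over sites."""
    return sum(fitch_score(tree, s)[1] for s in range(n_sites))


def quartet_topologies(a, b, c, d):
    """The three unrooted binary topologies on four leaves: ab|cd, ac|bd, ad|bc."""
    return [((a, b), (c, d)), ((a, c), (b, d)), ((a, d), (b, c))]


seqs = ["AAG", "AAA", "GGA", "AGA"]
scores = {t: parsimony(t, len(seqs[0])) for t in quartet_topologies(*seqs)}
best = min(scores, key=scores.get)
print(best, scores[best])  # (('AAG', 'AAA'), ('GGA', 'AGA')) 3
```

On these four sequences the grouping AAG, AAA | GGA, AGA needs three changes, while the other two topologies need four each.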

  19. Exploring the Space of Trees
      taxa (n)    # rooted binary trees, (2n-3)!!
      4           15
      5           105
      6           945
      8           135,135
      10          34,459,425
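
The growth in the number of binary tree topologies can be checked with the standard double-factorial formulas: $(2n-3)!!$ rooted and $(2n-5)!!$ unrooted binary trees on $n$ labeled leaves. The function names below are illustrative.

```python
def odd_double_factorial(m):
    """Product 1 * 3 * 5 * ... * m for odd m; returns 1 when m < 3."""
    result = 1
    for k in range(3, m + 1, 2):
        result *= k
    return result


def num_rooted_trees(n):
    """Rooted binary trees on n >= 2 labeled leaves: (2n - 3)!!"""
    return odd_double_factorial(2 * n - 3)


def num_unrooted_trees(n):
    """Unrooted binary trees on n >= 3 labeled leaves: (2n - 5)!!"""
    return odd_double_factorial(2 * n - 5)


for n in (4, 5, 6, 8, 10):
    print(n, num_unrooted_trees(n), num_rooted_trees(n))
```

For $n = 4, 5, 6, 8$ the rooted counts are 15, 105, 945, and 135,135; for $n = 10$ the formula gives 34,459,425 rooted and 2,027,025 unrooted trees.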

  20. Does This Imply Big MP is Hard?
      taxa (n)    # rooted binary trees, (2n-3)!!
      4           15
      5           105
      6           945
      8           135,135
      10          34,459,425
      • Not necessarily: there could be some smarter way to zoom directly to the best topology. • But: we will show hardness of Big MP by a (simple) reduction from vertex cover (VC).

  21. Big MP is NP-Hard! • First, define VC and VC for triangle-free graphs. Then: • You will show a poly time reduction from VC to VC for triangle-free graphs as part of a home assignment (easy). • In class, I will show a poly time reduction from VC for triangle-free graphs to Big MP (old style, white board proof). • This establishes NP-hardness of Big MP.
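
As a reminder of the two notions involved, here is a brute-force sketch using the standard definitions: a vertex cover is a vertex set touching every edge, and a graph is triangle-free if no three vertices are pairwise adjacent. The function names and the 5-cycle example are assumptions for illustration; the sketch does not implement either reduction.

```python
from itertools import combinations


def is_vertex_cover(edges, cover):
    """True if every edge has at least one endpoint in `cover`."""
    return all(u in cover or v in cover for u, v in edges)


def min_vertex_cover(vertices, edges):
    """Smallest vertex cover by exhaustive search (exponential time)."""
    for k in range(len(vertices) + 1):
        for subset in combinations(vertices, k):
            if is_vertex_cover(edges, set(subset)):
                return set(subset)


def is_triangle_free(vertices, edges):
    """True if no three vertices are pairwise adjacent."""
    adjacent = {frozenset(e) for e in edges}
    return not any(
        {frozenset((a, b)), frozenset((b, c)), frozenset((a, c))} <= adjacent
        for a, b, c in combinations(vertices, 3)
    )


# Hypothetical example: the 5-cycle is triangle-free.
vertices = [0, 1, 2, 3, 4]
edges = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 0)]
print(is_triangle_free(vertices, edges), len(min_vertex_cover(vertices, edges)))  # True 3
```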
