
Introduction to Bioinformatics


axl

Presentation Transcript


  1. Introduction to Bioinformatics: Phylogenetics Part II, Distance-Based Methods

  2. Distance Matrix
     • (Evolutionary) distance
        • Many possible measures
        • Fraction of sites that differ between two sequences
        • Number of changes needed to convert one sequence into another (count mismatches, substitution models, …)
     • Distance matrix
        • Matrix of pairwise distances between all sequences
        • Used to generate the tree
        • Varies with the construction method and the distance metric
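As an illustration of the simplest measure listed above, the fraction of differing sites (often called the p-distance), here is a minimal Python sketch. It assumes the sequences are already aligned to equal length; the function names and sequences are hypothetical.

```python
from itertools import combinations


def p_distance(s1: str, s2: str) -> float:
    """Fraction of aligned sites that differ between two sequences."""
    if len(s1) != len(s2):
        raise ValueError("sequences must be aligned to the same length")
    mismatches = sum(a != b for a, b in zip(s1, s2))
    return mismatches / len(s1)


def distance_matrix(seqs: dict[str, str]) -> dict[tuple[str, str], float]:
    """Pairwise p-distances for every unordered pair of named sequences."""
    return {(x, y): p_distance(seqs[x], seqs[y]) for x, y in combinations(seqs, 2)}


# Hypothetical aligned sequences, for illustration only.
seqs = {"A": "ACGTACGT", "B": "ACGTTCGT", "C": "ACCTTCGA"}
print(distance_matrix(seqs))
# {('A', 'B'): 0.125, ('A', 'C'): 0.375, ('B', 'C'): 0.25}
```

Model-based measures (for example Jukes-Cantor) transform p to correct for multiple substitutions at the same site; they would replace `p_distance` while leaving the matrix construction unchanged.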

  3. Distance in Phylogenetic Tree
     • Distances are ultrametric if
        • The rate of change is the same on all branches of the tree (rare in practice)
        • All leaves are equidistant from the root
        • Also known as a "molecular clock"
     • The distance matrix must satisfy the following 3-point condition:
        • For any three leaves i, j, k with distances $d_{ij}$, $d_{ik}$, $d_{jk}$, two of the three distances are equal and ≥ the third
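The 3-point condition can be checked mechanically. The sketch below assumes the distances are given as a symmetric list-of-lists matrix; the function name, tolerance parameter, and example matrix are hypothetical.

```python
from itertools import combinations


def satisfies_three_point(d, tol=1e-9):
    """Return True if the symmetric distance matrix d is ultrametric.

    Three-point condition: for every triple of leaves i, j, k, two of
    d[i][j], d[i][k], d[j][k] are equal and at least as large as the third.
    """
    for i, j, k in combinations(range(len(d)), 3):
        _, middle, largest = sorted((d[i][j], d[i][k], d[j][k]))
        # After sorting, the condition reduces to "the two largest are equal".
        if abs(largest - middle) > tol:
            return False
    return True


# Hypothetical clock-like tree: A and B join at height 1, C joins them at height 3.
clock = [[0, 2, 6],
         [2, 0, 6],
         [6, 6, 0]]
print(satisfies_three_point(clock))  # True
```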

  4. Distance in Phylogenetic Tree
     • Distances are additive if
        • The distance between any two leaves i and j equals the sum of the lengths of the edges on the path connecting i and j
     • The distance matrix must satisfy the following 4-point condition:
        • For any four leaves i, j, k, m, two of the sums $d_{ij}+d_{km}$, $d_{ik}+d_{jm}$, $d_{im}+d_{jk}$ are equal and greater than or equal to the third
        • In fact, the difference is 2 × the length of the "bridge" edge(s) separating the two pairs of neighbors
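A similar check works for the 4-point condition, together with the "2 × bridge" relationship. This is a sketch assuming a symmetric list-of-lists matrix; the helper names and the example tree are hypothetical.

```python
from itertools import combinations


def satisfies_four_point(d, tol=1e-9):
    """Return True if the symmetric distance matrix d is additive.

    Four-point condition: for every quartet i, j, k, m, the two largest of
    d_ij + d_km, d_ik + d_jm, d_im + d_jk are equal.
    """
    for i, j, k, m in combinations(range(len(d)), 4):
        sums = sorted((d[i][j] + d[k][m], d[i][k] + d[j][m], d[i][m] + d[j][k]))
        if abs(sums[2] - sums[1]) > tol:
            return False
    return True


def bridge_length(d, i, j, k, m):
    """Length of the bridge separating neighbor pairs (i, j) and (k, m),
    assuming d is additive: half the gap between a larger sum and d_ij + d_km."""
    return ((d[i][k] + d[j][m]) - (d[i][j] + d[k][m])) / 2


# Hypothetical tree ((A:1,B:2):3,(C:1,D:2)); leaf order A, B, C, D.
d = [[0, 3, 5, 6],
     [3, 0, 6, 7],
     [5, 6, 0, 3],
     [6, 7, 3, 0]]
print(satisfies_four_point(d))       # True
print(bridge_length(d, 0, 1, 2, 3))  # 3.0
```

Here the three sums are 6, 12 and 12: the two larger ones agree, and they exceed the smallest by 6, twice the bridge length of 3.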

  5. UPGMA
     • UPGMA (Unweighted Pair Group Method using Arithmetic Averages) [Sokal & Michener 1958]
     • Algorithm
        1. Find the pair of sequences (or clusters) A, B with the smallest distance $d_{AB}$
        2. Insert a join for A, B at tree height $\frac{1}{2} d_{AB}$; A and B thus form a new cluster
        3. Update the distance from any other sequence/cluster X to the new cluster as the average over all member sequences, $\frac{|A|\,d_{AX} + |B|\,d_{BX}}{|A| + |B|}$; the plain mean $\frac{1}{2}(d_{AX} + d_{BX})$ gives the closely related WPGMA *
        4. Repeat until all sequences/clusters are joined
        5. Produces a rooted tree
     • Assumptions
        • Distances for the tree are ultrametric
        • Branch lengths to the 2 joined nodes are the same after a join
        • Distances for the tree are additive
     *: similar algorithms vary at this step
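The UPGMA steps can be sketched in Python as below. This is an illustration under stated assumptions, not a reference implementation: it uses the size-weighted cluster average at step 3 (replacing it with 0.5 * (d_ax + d_bx) would give WPGMA), and the function name and Newick output are choices made for this example.

```python
def upgma(d, labels):
    """Cluster leaves with UPGMA; returns a rooted tree as a Newick string.

    d is a symmetric square matrix (list of lists) and labels the leaf names.
    Distances to a merged cluster use the size-weighted average, so every
    original sequence counts equally.
    """
    def key(p, q):
        return (min(p, q), max(p, q))

    # Active clusters: id -> (Newick subtree, number of leaves, node height).
    clusters = {i: (name, 1, 0.0) for i, name in enumerate(labels)}
    dist = {(i, j): d[i][j] for i in range(len(labels)) for j in range(i + 1, len(labels))}
    next_id = len(labels)
    while len(clusters) > 1:
        # Step 1: closest pair of active clusters.
        a, b = min(dist, key=dist.get)
        height = dist.pop((a, b)) / 2  # Step 2: join at height d_AB / 2.
        tree_a, size_a, h_a = clusters.pop(a)
        tree_b, size_b, h_b = clusters.pop(b)
        # Step 3: size-weighted average distance from every other cluster X.
        for x in clusters:
            d_ax = dist.pop(key(a, x))
            d_bx = dist.pop(key(b, x))
            dist[(x, next_id)] = (size_a * d_ax + size_b * d_bx) / (size_a + size_b)
        # Branch length to each child = new height minus the child's height.
        new_tree = f"({tree_a}:{height - h_a:g},{tree_b}:{height - h_b:g})"
        clusters[next_id] = (new_tree, size_a + size_b, height)
        next_id += 1
    tree, _, _ = next(iter(clusters.values()))
    return tree + ";"


# Hypothetical ultrametric distances for leaves A, B, C, D.
d = [[0, 2, 6, 10],
     [2, 0, 6, 10],
     [6, 6, 0, 10],
     [10, 10, 10, 0]]
print(upgma(d, ["A", "B", "C", "D"]))  # (D:5,(C:3,(A:1,B:1):2):2);
```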

  6. UPGMA Example
     • Given sequences
     • Build distance matrix

  7. UPGMA Example
     • Form clusters
     • Next step?

  8. Transformed Distance Method
     • Weakness of UPGMA
        • Assumes a constant evolutionary rate across lineages
        • Example: consider sequences A, B, C, and D in Figure 4.5. UPGMA clusters A and C first, since $d_{AC} = 8$ is the smallest distance.
     • Transformed Distance Method [J. Farris, 1977]
        • Uses an outgroup as a reference to correct for unequal rates
        • Similar to UPGMA except for the distance matrix
     • Algorithm
        • Select an outgroup D
        • Transformed distance between ingroup taxa i and j: $d'_{ij} = \frac{d_{ij} - d_{iD} - d_{jD}}{2} + \frac{1}{n}\sum_{k} d_{kD}$, where the sum runs over the n ingroup taxa k
        • Run UPGMA on the matrix of $d'_{ij}$

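  9. Transformed Distance Method
     • Example
        • Select D as the outgroup
        • Calculate the transformed distances:
          $\frac{1}{n}\sum_{k} d_{kD} = \frac{d_{AD} + d_{BD} + d_{CD}}{3} = \frac{12 + 15 + 10}{3} = \frac{37}{3}$
          $d'_{AB} = \frac{d_{AB} - d_{AD} - d_{BD}}{2} + \frac{37}{3} = \frac{9 - 12 - 15}{2} + \frac{37}{3} = \frac{10}{3}$
          $d'_{AC} = \frac{d_{AC} - d_{AD} - d_{CD}}{2} + \frac{37}{3} = \frac{8 - 12 - 10}{2} + \frac{37}{3} = \frac{16}{3}$
          $d'_{BC} = \frac{d_{BC} - d_{BD} - d_{CD}}{2} + \frac{37}{3} = \frac{11 - 15 - 10}{2} + \frac{37}{3} = \frac{16}{3}$
        • Construct the new distance matrix
        • Run UPGMA (A and B, with the smallest transformed distance 10/3, are now joined first)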
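The arithmetic of this example can be double-checked with a short Python sketch using exact fractions. The function name and the frozenset-keyed distance table are assumptions made for illustration; the distances themselves are the ones from the example.

```python
from fractions import Fraction
from itertools import combinations


def transformed_distances(d, outgroup):
    """Transformed distances between ingroup taxa, relative to an outgroup.

    d maps frozenset({x, y}) -> distance. Implements
        d'_ij = (d_ij - d_iD - d_jD) / 2 + (sum over ingroup k of d_kD) / n
    with D the outgroup and n the number of ingroup taxa.
    """
    ingroup = sorted({x for pair in d for x in pair} - {outgroup})
    to_out = {x: Fraction(d[frozenset((x, outgroup))]) for x in ingroup}
    mean_to_out = sum(to_out.values()) / len(ingroup)
    return {
        (i, j): (d[frozenset((i, j))] - to_out[i] - to_out[j]) / 2 + mean_to_out
        for i, j in combinations(ingroup, 2)
    }


# Distances from the worked example, with D as the outgroup.
pairs = [("A", "B", 9), ("A", "C", 8), ("B", "C", 11),
         ("A", "D", 12), ("B", "D", 15), ("C", "D", 10)]
d = {frozenset((x, y)): v for x, y, v in pairs}

t = transformed_distances(d, "D")
for (i, j), v in t.items():
    print(f"d'_{i}{j} = {v}")   # 10/3, 16/3, 16/3
print(min(t, key=t.get))        # ('A', 'B'): UPGMA on d' joins A and B first
```

On the raw distances the smallest entry is $d_{AC} = 8$; after the transformation the smallest is $d'_{AB}$, which is how the outgroup corrects UPGMA's first join.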

  10. Transformed Distance Method
     • Example (cont'd)
     • How do you compute the length of a lineage?

  11. Neighbor-Joining Method
     • Goal
        • Join closest neighbors (nodes with the same parent) in the tree
        • Avoids the problem UPGMA has when rates of change differ
     • Example
        • The closest leaves are not neighbors in the correct tree, but UPGMA joins them first (see previous example)
     • Assumptions
        • Rate of change can differ
        • Branch lengths may differ after a join
        • Branch lengths for the tree are additive

  12. Neighbor-Joining Method
     • Approach
        • To find the closest pair of neighbors:
           • Reduce the branch length for a node by (approximately) the average distance of the node from all other nodes
           • Find the smallest distance between nodes (after the reduction)
     • Definitions: for all pairs of nodes A and B in the set of all nodes L, let
        • $d_{A,B}$ = distance between A and B
        • $R_X = \sum_{N \in L} d_{X,N}$ (total distance from X to all nodes N)
        • $r_X = R_X / (n - 2)$, where n = number of nodes (normalized divergence of X from all other nodes)
        • $Q_{A,B} = (n - 2)\,d_{A,B} - (R_A + R_B)$ (rate-corrected distance)
     • Key property: if the distances are additive, the 2 nodes with minimum Q are always neighbors!
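To see the key property at work, here is a minimal sketch that computes R, r and Q exactly as defined above. It assumes a symmetric list-of-lists matrix with zeros on the diagonal; the function name and the example tree are hypothetical.

```python
def nj_q_matrix(d):
    """Return R, r and Q for neighbor joining, following the definitions above.

    d is a symmetric square matrix (list of lists) over n >= 3 nodes, with
    zeros on the diagonal so that summing a row gives R_X.
    """
    n = len(d)
    R = [sum(row) for row in d]        # R_X = sum over N of d_X,N
    r = [R_x / (n - 2) for R_x in R]   # r_X = R_X / (n - 2)
    # Q[a][a] is not meaningful; only entries with a != b are used.
    Q = [[(n - 2) * d[a][b] - (R[a] + R[b]) for b in range(n)] for a in range(n)]
    return R, r, Q


# Hypothetical additive tree ((A:1,B:4):1,(C:1,D:4)); leaf order A, B, C, D.
# A and C are the closest leaves (distance 3) but are NOT neighbors.
d = [[0, 5, 3, 6],
     [5, 0, 6, 9],
     [3, 6, 0, 5],
     [6, 9, 5, 0]]
R, r, Q = nj_q_matrix(d)
pairs = [(a, b) for a in range(len(d)) for b in range(a + 1, len(d))]
best = min(pairs, key=lambda p: Q[p[0]][p[1]])
print(R)     # [14, 20, 14, 20]
print(best)  # (0, 1): A and B, which are true neighbors
```

The raw distance picks A and C, while the minimum Q (-24, shared by A,B and C,D) picks genuine neighbor pairs.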

  13. Neighbor-Joining Method
     • Algorithm [Saitou & Nei 1987, Studier & Keppler 1988]
        1. Begin with a star tree and all sequences as nodes in L
        2. Find the pair of nodes A, B ∈ L with minimum $Q_{A,B}$
        3. Create and insert a new join (node K) with branch lengths
           • $d_{A,K} = \frac{1}{2}(d_{A,B} + r_A - r_B)$
           • $d_{B,K} = \frac{1}{2}(d_{A,B} + r_B - r_A)$
        4. For the remaining nodes C ∈ L, update the distance to K as
           • $d_{K,C} = \frac{1}{2}(d_{A,C} + d_{B,C} - d_{A,B})$
        5. Insert K into L and remove A and B from L
        6. Repeat steps 2-5 until only two nodes are left, then connect them with a branch whose length is their distance
     [Figure: new node K joining A and B]
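The numbered steps can be put together into a compact sketch. It is illustrative only: the internal node names K0, K1, … are assumed not to clash with leaf labels, ties in Q are broken by taking the first pair found, and the last branch is written entirely on one side of the Newick root.

```python
def neighbor_joining(d, labels):
    """Neighbor joining following the numbered steps above.

    d is a symmetric square matrix (list of lists) and labels the leaf names
    (assumed not to clash with the internal names K0, K1, ...).
    Returns the tree as a Newick string with branch lengths.
    """
    nodes = {name: name for name in labels}  # active node -> Newick subtree
    dist = {frozenset((labels[i], labels[j])): d[i][j]
            for i in range(len(labels)) for j in range(i + 1, len(labels))}

    def D(x, y):
        return dist[frozenset((x, y))]

    k_id = 0
    while len(nodes) > 2:
        names = list(nodes)
        n = len(names)
        R = {x: sum(D(x, y) for y in names if y != x) for x in names}
        # Step 2: pair A, B with minimum Q_AB = (n - 2) d_AB - (R_A + R_B).
        a, b = min(((x, y) for i, x in enumerate(names) for y in names[i + 1:]),
                   key=lambda p: (n - 2) * D(*p) - (R[p[0]] + R[p[1]]))
        r_a, r_b = R[a] / (n - 2), R[b] / (n - 2)
        # Step 3: new node K with its two branch lengths.
        d_ak = 0.5 * (D(a, b) + r_a - r_b)
        d_bk = 0.5 * (D(a, b) + r_b - r_a)
        k = f"K{k_id}"
        k_id += 1
        # Step 4: distance from K to every remaining node C.
        for c in names:
            if c not in (a, b):
                dist[frozenset((k, c))] = 0.5 * (D(a, c) + D(b, c) - D(a, b))
        # Step 5: K replaces A and B in the active set.
        nodes[k] = f"({nodes.pop(a)}:{d_ak:g},{nodes.pop(b)}:{d_bk:g})"
    # Final two nodes: one branch of length d_xy, written on x's side.
    x, y = nodes
    return f"({nodes[x]}:{D(x, y):g},{nodes[y]}:0);"


# Hypothetical additive distances from the tree ((A:1,B:4):1,(C:1,D:4)).
d = [[0, 5, 3, 6],
     [5, 0, 6, 9],
     [3, 6, 0, 5],
     [6, 9, 5, 0]]
print(neighbor_joining(d, ["A", "B", "C", "D"]))
# ((A:1,B:4):1,(C:1,D:4):0);
```

On these additive distances the sketch recovers the original branch lengths, including the internal edge of length 1, even though A and C are the closest pair of leaves.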

  14. Neighbor-Joining Method
     • Example
