Slide 1: Phylogenetic Trees
Presenter: Michael Tung
mtung@cs.stanford.edu
Slide 2: Overview
Common definitions
Motivation
Phylogenetic Inference
Probabilistic model of evolution
The EM framework
Simultaneous Alignment and Phylogeny
Improvements to SATCHMO
Slide 3: Def: Phylogenetic Tree
N current-day (species || sequences || letters) form the leaves of a (binary) tree
N-1 internal nodes correspond to events of divergence (~ past-time species)
A phylogenetic tree (T,t) is parameterized by a topology T (simply the set of edges) and a vector t (edge lengths)
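A minimal sketch of the (T,t) parameterization, assuming a rooted tree stored as directed (parent, child) edges. The class name, node names, and edge lengths are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class PhyloTree:
    """A tree (T, t): T is the set of edges, t assigns a length to each edge."""
    lengths: dict = field(default_factory=dict)  # (parent, child) -> edge length

    def add_edge(self, parent, child, length):
        self.lengths[(parent, child)] = length

    @property
    def topology(self):
        # The topology T is simply the set of edges.
        return set(self.lengths)

    def leaves(self):
        parents = {p for p, _ in self.lengths}
        children = {c for _, c in self.lengths}
        return sorted(children - parents)

    def internal_nodes(self):
        return sorted({p for p, _ in self.lengths})


# Hypothetical example with N = 3 leaves and N - 1 = 2 internal nodes.
tree = PhyloTree()
tree.add_edge("root", "anc", 0.1)
tree.add_edge("root", "C", 0.3)
tree.add_edge("anc", "A", 0.2)
tree.add_edge("anc", "B", 0.2)
print(tree.leaves(), tree.internal_nodes())
```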
Slide 4: Motivation
Lots of sequence data!
Cost of collecting additional data is decreasing
Little understanding of evolutionary divergence
Inferring the Tree of Life (species tree)
Slide 5: Probabilistic Model of Evolution
Evolution of a single position
Standard Assumptions
Lack of Memory
Transitions can be described by a single matrix of conditionals
Reversibility
Assume a prior distribution over states
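The slide does not name a specific substitution model. As an illustration only, the sketch below assumes the Jukes-Cantor model over the letters A, C, G, T with a uniform prior, and checks the three listed assumptions numerically: a single matrix of conditionals per edge length, lack of memory, and reversibility.

```python
import math

PRIOR = [0.25] * 4  # assumed uniform prior over A, C, G, T


def jc_matrix(t):
    """Jukes-Cantor matrix of conditionals P(b | a, t) for edge length t."""
    e = math.exp(-4.0 * t / 3.0)
    same, diff = 0.25 + 0.75 * e, 0.25 - 0.25 * e
    return [[same if a == b else diff for b in range(4)] for a in range(4)]


def matmul(x, y):
    return [[sum(x[i][k] * y[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]


def close(x, y, tol=1e-12):
    return all(abs(x[i][j] - y[i][j]) < tol for i in range(4) for j in range(4))


p = jc_matrix(0.3)
# Each row is a conditional distribution, so it sums to one.
rows_ok = all(abs(sum(row) - 1.0) < 1e-12 for row in p)
# Lack of memory: evolving for 0.1 and then 0.2 equals evolving for 0.3.
memoryless = close(matmul(jc_matrix(0.1), jc_matrix(0.2)), p)
# Reversibility: prior(a) * P(b | a) == prior(b) * P(a | b).
reversible = all(abs(PRIOR[a] * p[a][b] - PRIOR[b] * p[b][a]) < 1e-12
                 for a in range(4) for b in range(4))
print(rows_ok, memoryless, reversible)
```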
Slide 6: Probabilistic Interpretation
What’s the probability of an observation given the tree?
Probability of a complete assignment of nodes:
But, we only see the leaves.
Slide 7: Probabilistic Interpretation
Notice that the tree topology enforces local probabilistic influence
A set of sequences is composed of individual letters
Problem: MSA required?
We can perform inference on the graphical model by dynamic programming
Slide 8: DP on trees
We can exploit the tree structure to compute these probabilities in linear time
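One standard way to do this dynamic program is a post-order pass that stores, for each node, the probability of the leaves below it given each possible state of that node. The sketch below assumes a Jukes-Cantor model with a uniform prior and a hypothetical nested-tuple tree encoding; neither is specified in the slides.

```python
import math

STATES = "ACGT"
PRIOR = [0.25] * 4  # assumed uniform prior at the root


def jc_matrix(t):
    """Jukes-Cantor matrix of conditionals for edge length t (an assumed model)."""
    e = math.exp(-4.0 * t / 3.0)
    same, diff = 0.25 + 0.75 * e, 0.25 - 0.25 * e
    return [[same if a == b else diff for b in range(4)] for a in range(4)]


def conditional(node):
    """Return L[a] = P(observed leaves below node | node is in state a).

    A leaf is its observed letter; an internal node is a tuple of
    (child, edge_length) pairs. Each node is visited exactly once.
    """
    if isinstance(node, str):
        return [1.0 if s == node else 0.0 for s in STATES]
    out = [1.0] * 4
    for child, t in node:
        p = jc_matrix(t)
        below = conditional(child)
        for a in range(4):
            out[a] *= sum(p[a][b] * below[b] for b in range(4))
    return out


def site_likelihood(root):
    """Sum out the root state using the prior."""
    return sum(PRIOR[a] * l for a, l in enumerate(conditional(root)))


# Hypothetical 3-leaf tree: ((A:0.2, C:0.2):0.1, G:0.3)
tree = (((("A", 0.2), ("C", 0.2)), 0.1), ("G", 0.3))
print(site_likelihood(tree))
```

Summing the result over every possible assignment of letters to the leaves gives 1, which is a quick sanity check that the hidden internal nodes are being marginalized correctly.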
Slide 9: Maximum Likelihood
We want the most likely tree (in the probabilistic sense)
The most likely tree is the one that maximizes the probability of your data occurring
Assumption: each observation is independently drawn from this distribution (true?)
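Under the independence assumption, the log-likelihood of an alignment is a sum over columns. As a small worked case, the sketch below assumes a Jukes-Cantor model and just two aligned sequences (both made up), so the only parameter is the total edge length t between them. A grid search over t should land next to the known closed-form Jukes-Cantor distance.

```python
import math


def pair_log_likelihood(x, y, t):
    """Log-likelihood of two aligned sequences at total edge length t (JC model)."""
    e = math.exp(-4.0 * t / 3.0)
    same, diff = 0.25 + 0.75 * e, 0.25 - 0.25 * e
    # Independent columns: per-column log-probabilities simply add up.
    return sum(math.log(0.25 * (same if a == b else diff)) for a, b in zip(x, y))


# Hypothetical data: 10 columns, 2 mismatches.
x = "ACGTACGTAC"
y = "ACGTACGTTT"

grid = [i / 1000 for i in range(1, 2000)]
t_hat = max(grid, key=lambda t: pair_log_likelihood(x, y, t))

# Closed-form Jukes-Cantor estimate from the mismatch fraction p.
p = sum(a != b for a, b in zip(x, y)) / len(x)
closed_form = -0.75 * math.log(1.0 - 4.0 * p / 3.0)
print(t_hat, closed_form)
```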
Slide 10: Maximum Likelihood
Now, simply find the parameters (T,t) that maximize this likelihood.
Well, not that simple.
Iterative algorithm: Expectation Maximization (EM)
Slide 11: EM
Expectation-Maximization is a framework for optimizing a model.
E-step: estimate the posterior probability of the missing data using the current model
M-step: maximize the expected log-likelihood using the posterior probabilities
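To make the two steps concrete, here is a toy sketch under strong assumptions that are not in the slides: a Jukes-Cantor model, a single hidden ancestor with two observed leaf sequences, and made-up data. The missing data is the ancestor's letter in each column. The E-step computes its posterior under the current edge lengths; the M-step re-estimates one edge length from the expected fraction of changes along that edge, which is the maximizer of the expected log-likelihood for this particular model.

```python
import math

STATES = "ACGT"


def jc(t):
    """Return (P(same letter), P(one specific different letter)) at edge length t."""
    e = math.exp(-4.0 * t / 3.0)
    return 0.25 + 0.75 * e, 0.25 - 0.25 * e


def e_step(x, y, t1, t2):
    """Posterior over the hidden ancestor's letter in each column."""
    same1, diff1 = jc(t1)
    same2, diff2 = jc(t2)
    posteriors = []
    for a, b in zip(x, y):
        joint = [0.25 * (same1 if s == a else diff1) * (same2 if s == b else diff2)
                 for s in STATES]
        z = sum(joint)
        posteriors.append([j / z for j in joint])
    return posteriors


def m_step(leaf, posteriors):
    """New length for the ancestor-to-leaf edge from expected change counts."""
    p = sum(1.0 - post[STATES.index(a)] for a, post in zip(leaf, posteriors))
    p /= len(leaf)
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


# Hypothetical data and starting edge lengths.
x = "ACGTACGTAC"
y = "ACGTACGTTT"
posteriors = e_step(x, y, 0.5, 0.5)
t1_new = m_step(x, posteriors)
t2_new = m_step(y, posteriors)
print(t1_new, t2_new)
```

In a full tree the same pattern applies per edge, with the posteriors coming from a dynamic program over the tree rather than from a direct enumeration.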
Slide 12: Expected Log-likelihood
Rewrite the likelihood
Slide 13: Edge case: Transforming a tree into an equivalent bifurcating tree
Maximum spanning tree could return non-phylogenetic trees
Simple transformation preserves likelihood
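The slide does not spell out the transformation. One piece of it can be sketched as follows, assuming the problem is a node with more than two children: the extra children are moved under new hidden nodes attached by zero-length edges. A zero-length edge forces parent and child into the same state, so the likelihood is unchanged. The dictionary encoding and the `_h` node names are hypothetical.

```python
import itertools


def binarize(tree):
    """Return a copy of `tree` where every internal node has at most two children.

    `tree` maps each internal node to a list of (child, edge_length) pairs.
    New hidden nodes are named _h0, _h1, ... (assumed not to clash with
    existing names) and are attached by zero-length edges.
    """
    fresh = (f"_h{i}" for i in itertools.count())
    out = {}
    for node, kids in tree.items():
        kids = list(kids)
        current = node
        while len(kids) > 2:
            hidden = next(fresh)
            # Keep one child here; push the rest below a zero-length edge.
            out[current] = [kids.pop(0), (hidden, 0.0)]
            current = hidden
        out[current] = kids
    return out


# Hypothetical star tree with four leaves under one node.
star = {"root": [("A", 0.1), ("B", 0.2), ("C", 0.3), ("D", 0.4)]}
binary = binarize(star)
print(binary)
```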
Slide 14: Avoiding local optima
Greedy optimization (hill-climbing) doesn’t guarantee a global optimum
One solution is Simulated Annealing
Temperature parameter
Perturbed edge weights W
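The slide lists only the ingredients, so the details below are assumptions: zero-mean Gaussian noise with standard deviation equal to the temperature, added to every entry of W, and a geometric cooling schedule. The function names and the example weights are hypothetical.

```python
import random


def perturb(weights, temperature, rng):
    """Add temperature-scaled Gaussian noise to every pairwise weight (assumed form)."""
    return {pair: w + rng.gauss(0.0, temperature) for pair, w in weights.items()}


def cooling_schedule(start, factor, floor):
    """Yield geometrically decreasing temperatures until they drop below `floor`."""
    t = start
    while t >= floor:
        yield t
        t *= factor


rng = random.Random(0)
W = {("a", "b"): 3.0, ("b", "c"): 2.0, ("a", "c"): 1.0}  # hypothetical weights
for temperature in cooling_schedule(1.0, 0.5, 0.2):
    noisy_W = perturb(W, temperature, rng)
    # A tree would be rebuilt from noisy_W here; as the temperature falls,
    # the search behaves more and more like plain hill-climbing.
```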
Slide 15: Summary of the structural EM
Start with a “good” tree topology (perhaps NJ)
Optimize edge lengths
Optimize Tree
Make T a binary tree
Perturb weights
Iterate
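The "Optimize Tree" step is not detailed on the slide. Given the earlier mentions of a maximum spanning tree and of edge weights W, one plausible reading is that the topology is rebuilt as a maximum spanning tree over pairwise weights. The sketch below shows that step alone using Kruskal's algorithm; the weights are made up and how W is computed is left out.

```python
def maximum_spanning_tree(nodes, weights):
    """Kruskal's algorithm, taking the heaviest edges first.

    `weights` maps (u, v) pairs to a weight. Returns the chosen edges.
    """
    parent = {n: n for n in nodes}

    def find(n):
        while parent[n] != n:
            parent[n] = parent[parent[n]]  # path halving
            n = parent[n]
        return n

    chosen = []
    for (u, v), w in sorted(weights.items(), key=lambda kv: kv[1], reverse=True):
        ru, rv = find(u), find(v)
        if ru != rv:  # the edge joins two components, so it creates no cycle
            parent[ru] = rv
            chosen.append(((u, v), w))
    return chosen


# Hypothetical weights over three nodes.
W = {("a", "b"): 3.0, ("b", "c"): 2.0, ("a", "c"): 1.0}
mst = maximum_spanning_tree(["a", "b", "c"], W)
print(mst)
```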
Slide 16: What's SATCHMO?
SATCHMO = Simultaneous Alignment and Tree Construction using Hidden Markov mOdels!
{set of sequences}
to
{tree with MSA at each internal node}
Slide 17:
Slide 18: However...
...it's too slow. SATCHMO is computationally prohibitive.
For ~200 seqs:
ClustalW takes around 3 minutes
SATCHMO takes 1 hour and 30 minutes
Solution:
Pre-cluster sequences to jumpstart the tree construction
Parallelization
Slide 19: Giving SATCHMO a jumpstart
In a binary tree the work increases exponentially near the leaves.
If we can cut even a small # of levels, we have done almost all of the work.
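A rough count backs this up, assuming a complete binary tree and equal cost per merge (the slides make neither assumption explicit, and real merges higher in the tree are likely more expensive). With depth d there are 2^d leaves and 2^d - 1 internal merge nodes, and the lowest k levels of merges hold 2^d - 2^(d-k) of them.

```python
def fraction_of_merges_skipped(depth, levels_cut):
    """Share of internal nodes that sit in the lowest `levels_cut` merge levels."""
    total = 2 ** depth - 1
    skipped = 2 ** depth - 2 ** (depth - levels_cut)
    return skipped / total


# Depth 8 gives 256 leaves, the nearest complete tree to ~200 sequences.
for k in (1, 2, 3):
    print(k, round(fraction_of_merges_skipped(8, k), 3))
```

Under these assumptions, starting three levels above the leaves skips 224 of the 255 merges.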
How?
Use a computationally palatable cluster
Build down using BETE
Build up using SATCHMO
Slide 20: Parallelizing SATCHMO
Idea: Let's utilize the whole cluster instead of just one CPU on one machine. Make interactive mode feasible.
Computation and data are distributed across machines. Which parallelization architecture best balances complexity, latency, caching behavior, bandwidth, paging, and load-balancing?
Parallel caveats:
Non-determinism
Fragility
Data type
Slide 21: Parallelizing the all-against-all: v1
An all-against-all computation takes place when we are computing the initial distance matrix.
This is the most compute-intensive portion of SATCHMO.
Each MSA needs to be scored against each HMM. Distribute the HMMs and MSAs. Rotate the MSAs.
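The rotation scheme can be simulated in a single process to see why it covers every pair. The sketch below assumes the HMM blocks stay on their workers while the MSA blocks move one step around a ring after each round. The function name, the round-robin block assignment, and the stand-in scoring function are all hypothetical; a real implementation would exchange blocks between machines.

```python
def ring_all_against_all(hmms, msas, workers, score):
    """Simulate `workers` ranks scoring every MSA against every HMM."""
    hmm_blocks = [hmms[r::workers] for r in range(workers)]
    msa_blocks = [msas[r::workers] for r in range(workers)]
    scores = {}
    for _ in range(workers):
        # Every rank scores its resident HMMs against the MSAs it currently holds.
        for rank in range(workers):
            for h in hmm_blocks[rank]:
                for m in msa_blocks[rank]:
                    scores[(h, m)] = score(h, m)
        # Rotate: each rank receives the MSA block of the previous rank.
        msa_blocks = [msa_blocks[(rank - 1) % workers] for rank in range(workers)]
    return scores


calls = []


def toy_score(h, m):
    """Stand-in for scoring an MSA against an HMM; records each call."""
    calls.append((h, m))
    return len(h) + len(m)


hmms = [f"hmm{i}" for i in range(5)]
msas = [f"msa{i}" for i in range(5)]
scores = ring_all_against_all(hmms, msas, 3, toy_score)
print(len(scores), len(calls))
```

After as many rounds as there are workers, each MSA block has visited each worker once, so every (HMM, MSA) pair is scored exactly once.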
Slide 22: Parallelizing the all-against-all: v2
Implementing blocked communications...
Slide 23: Parallelizing the all-against-all: v3
Implementing load-balancing...
Slide 24: Performance Results
The combination of jumpstarting and parallelization can do 200 seqs in 29.90 seconds.
v3 performs the best (as expected)
More perf testing remains to be done
Perf testing is being conducted on the NERSC Seaborg.
Speedup plot!