

Inferring phylogenetic trees

Prof. William Stafford Noble

Department of Genome Sciences

Department of Computer Science and Engineering

University of Washington

thabangh@gmail.com

One-minute responses
  • I did not understand anything in the Gibbs sampling and the second method.
  • The class was quite OK now. Understood most important things.
  • I understood 50% of the Python part. But I am a bit confused about the goal of the programs.
  • Please send us the slides immediately after lecture.
    • I put the slides on the website during the Python half of the class. Hit “refresh” on the web browser to see them.
  • I didn’t understand clearly converting scores to p-values, more especially putting 1 and 2. Otherwise everything was clear.
  • I think we should go a little bit slower.
  • I didn’t understand the EM and Gibbs.
  • The concept of EM and Gibbs sampling are really very important. Please go in depth on them.
  • Python sessions are still fine as usual.
  • These algorithms are complex. Could you please explain them with a bit of some examples?
  • I didn’t understand the second Python problem.
  • Emile must not mark our assessment on the programming part.
Revision - Gibbs
  • Randomly select motif occurrences in the sequences
  • Randomly discard one sequence
  • Build PSSM from remaining sequences
    • Counts
    • Add pseudocounts
    • Normalize
  • Scan discarded sequence with PSSM
  • Choose new occurrence according to resulting probabilities
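One Gibbs update, as outlined on this slide, might be sketched as follows. This is only an illustration under stated assumptions: the function names, the DNA alphabet, the pseudocount of 1, and the choice to weight each window by the product of its column probabilities (equivalent to a log-odds score when the background is uniform) are not taken from the lecture.

```python
import random

ALPHABET = "ACGT"  # assumption: DNA sequences


def build_probability_matrix(occurrences, pseudocount=1.0):
    """Counts -> add pseudocounts -> normalize, one column per motif position."""
    width = len(occurrences[0])
    matrix = []
    for pos in range(width):
        counts = {base: pseudocount for base in ALPHABET}
        for occ in occurrences:
            counts[occ[pos]] += 1
        total = sum(counts.values())
        matrix.append({base: counts[base] / total for base in ALPHABET})
    return matrix


def window_probabilities(sequence, matrix):
    """Scan a sequence and return a normalized probability for each start position."""
    width = len(matrix)
    weights = []
    for start in range(len(sequence) - width + 1):
        weight = 1.0
        for pos in range(width):
            weight *= matrix[pos][sequence[start + pos]]
        weights.append(weight)
    total = sum(weights)
    return [w / total for w in weights]


def gibbs_step(sequences, starts, width, rng=random):
    """Resample the motif start in one randomly discarded sequence."""
    held_out = rng.randrange(len(sequences))
    occurrences = [seq[s:s + width]
                   for i, (seq, s) in enumerate(zip(sequences, starts))
                   if i != held_out]
    matrix = build_probability_matrix(occurrences)
    probs = window_probabilities(sequences[held_out], matrix)
    # Choose the new occurrence according to the resulting probabilities.
    new_start = rng.choices(range(len(probs)), weights=probs)[0]
    return starts[:held_out] + [new_start] + starts[held_out + 1:]
```

Calling gibbs_step repeatedly, feeding each result back in as starts, gives the sampling loop; unlike taking the top-scoring window, the random choice lets the sampler leave a poor initial guess.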

Revision - EM
  • Randomly select motif occurrences in the sequences
  • Build PSSM from the motif occurrences
    • Counts
    • Add pseudocounts
    • Normalize
    • Divide by background
    • Take log2
  • Scan each sequence with PSSM
  • Take top-scoring occurrence
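The PSSM construction and one iteration of the loop on this slide could look roughly like the sketch below. The function names, the uniform background of 0.25 per base, and the pseudocount of 1 are illustrative assumptions, not values given in the lecture.

```python
import math

ALPHABET = "ACGT"  # assumption: DNA sequences


def build_log_odds_pssm(occurrences, background=None, pseudocount=1.0):
    """Counts -> add pseudocounts -> normalize -> divide by background -> log2."""
    if background is None:
        background = {base: 0.25 for base in ALPHABET}  # assumed uniform background
    width = len(occurrences[0])
    pssm = []
    for pos in range(width):
        counts = {base: pseudocount for base in ALPHABET}
        for occ in occurrences:
            counts[occ[pos]] += 1
        total = sum(counts.values())
        pssm.append({base: math.log2((counts[base] / total) / background[base])
                     for base in ALPHABET})
    return pssm


def best_occurrence(sequence, pssm):
    """Scan a sequence and return (start, score) of the top-scoring window."""
    width = len(pssm)
    best_start, best_score = None, -math.inf
    for start in range(len(sequence) - width + 1):
        score = sum(pssm[pos][sequence[start + pos]] for pos in range(width))
        if score > best_score:
            best_start, best_score = start, score
    return best_start, best_score


def em_iteration(sequences, starts, width):
    """Rebuild the PSSM from the current occurrences, then rescan every sequence."""
    occurrences = [seq[s:s + width] for seq, s in zip(sequences, starts)]
    pssm = build_log_odds_pssm(occurrences)
    return [best_occurrence(seq, pssm)[0] for seq in sequences]
```

Repeating em_iteration until the start positions stop changing gives the full procedure. Because the top-scoring window is always taken, the result depends on the random starting occurrences.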

Phylogenetic inference

(Figure: Rabbit, Dove, Lion and Donkey, joined by an unknown tree marked "?".)

Outline
  • Parsimony
  • Distance methods
    • Computing distances
    • Finding the tree
  • Maximum likelihood
Selecting a method
  • Choose set of related sequences
  • Obtain multiple sequence alignment
  • Is there strong sequence similarity?
    • Yes: Maximum parsimony methods
    • No: Is there clearly recognizable sequence similarity?
      • Yes: Distance methods
      • No: Maximum likelihood methods

Maximum parsimony

for each possible tree (enumerating these trees can take a very long time)
    compute the parsimony score (computing this score is straightforward)
return the tree with the best score

How many trees?
  • With four sequences: 3 unrooted trees.
  • With five sequences: 15 unrooted trees.
  • With seven sequences: 945 unrooted trees.

(Figure: the three unrooted trees for four sequences, numbered 1 to 4.)
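These counts follow the standard formula for unrooted bifurcating trees on n sequences, (2n - 5)!! = 3 × 5 × ... × (2n - 5). A small sketch (the function name is only illustrative) shows how quickly the number grows:

```python
def count_unrooted_trees(n_sequences):
    """Number of unrooted bifurcating trees: 3 * 5 * ... * (2n - 5)."""
    if n_sequences < 3:
        raise ValueError("need at least three sequences")
    count = 1
    for k in range(3, 2 * n_sequences - 4, 2):
        count *= k
    return count


for n in (4, 5, 7, 10):
    print(n, count_unrooted_trees(n))
```

With ten sequences there are already more than two million trees, which is why exhaustive enumeration is limited to small inputs.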

Computing parsimony scores

Scer = A

Smik = A

Spar = G

Skud = A

Computing parsimony scores

Scer = A, Smik = A, Spar = G, Skud = A

(Figure: one tree with both internal nodes labeled A.)

Score = 1

Computing parsimony scores

Scer = A, Smik = A, Spar = G, Skud = A

(Figure: the three possible unrooted trees, each with both internal nodes labeled A.)

Score = 1 for each of the three trees.

This site is uninformative, because all the trees have the same score.


Computing parsimony scores

Scer = G, Smik = A, Spar = G, Skud = T

(Figure: the three possible unrooted trees. One has internal nodes labeled G and A; the other two have both internal nodes labeled G.)

Score = 2 for each of the three trees.


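Computing parsimony scores

Scer = A, Smik = T, Spar = A, Skud = T

(Figure: the three possible unrooted trees. One has internal nodes labeled A and T; the other two have both internal nodes labeled A.)

Score = 1 for the tree that groups Scer with Spar and Smik with Skud; Score = 2 for the other two trees.

The tree with Score = 1 is best.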
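The per-site scores on these slides can be reproduced with Fitch's small-parsimony algorithm, which counts the minimum number of changes a tree needs at one site. The sketch below is an illustration rather than the lecture's own code: the nested-tuple tree encoding and all function names are assumptions. Each unrooted four-sequence tree is rooted on its internal edge, which does not change the score.

```python
def fitch_score(tree, states):
    """Return (candidate states, minimum changes) for a nested-tuple tree at one site."""
    if isinstance(tree, str):  # leaf: its observed state, no changes
        return {states[tree]}, 0
    left_set, left_cost = fitch_score(tree[0], states)
    right_set, right_cost = fitch_score(tree[1], states)
    common = left_set & right_set
    if common:  # children agree on at least one state: no extra change
        return common, left_cost + right_cost
    return left_set | right_set, left_cost + right_cost + 1


def quartet_trees(a, b, c, d):
    """The three unrooted trees on four sequences."""
    return [((a, b), (c, d)), ((a, c), (b, d)), ((a, d), (b, c))]


def site_scores(states):
    """Parsimony score of each of the three trees for one alignment column."""
    return {tree: fitch_score(tree, states)[1]
            for tree in quartet_trees(*states)}


def total_score(tree, alignment):
    """Sum the per-site scores over every column of an alignment (name -> sequence)."""
    length = len(next(iter(alignment.values())))
    return sum(fitch_score(tree, {name: seq[i] for name, seq in alignment.items()})[1]
               for i in range(length))


site = {"Scer": "A", "Smik": "T", "Spar": "A", "Skud": "T"}
for tree, score in site_scores(site).items():
    print(tree, score)
```

For this site only the tree that pairs Scer with Spar gets a score of 1. Summing the per-site scores over a whole alignment with total_score gives the kind of tree total that is compared across candidate trees.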

Computing parsimony scores

(Figure: a tree with leaves Scer, Smik, Spar and Skud.)

Total = 26

Computing parsimony scores

(Figure: a different tree with leaves Scer, Spar, Smik and Skud.)

Total = 28

Parsimony software
  • In general, the most widely used programs for phylogenetic analysis are
    • Phylip (Joe Felsenstein)
    • PAUP (Jim Swofford)
    • MacClade (David and Wayne Maddison)
  • All three do parsimony. Only Phylip is free.
Previous one-minute responses
  • How many sequences are usually analyzed by parsimony methods?
    • Exhaustively, probably tens of sequences. With heuristic search methods, you can analyze arbitrarily many, but you lose the guarantee that you’re finding the most parsimonious tree.
  • What do good parsimony scores look like?
    • It depends upon how many sequences are involved, and how divergent they are.
  • Why doesn’t the parsimony method take into account transitions versus transversions?
    • It can; I presented the simplest version.
Jukes-Cantor model
  • Assume the same probability of change at all positions and all times.
  • d_AB is the proportion of changed sites in the alignment.
  • K_AB is the distance between sequences A and B.
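For reference, the standard Jukes-Cantor correction relates the two quantities as shown below. It reproduces the value 0.823959 in Problem #1, where two of four sites differ, so the proportion of changed sites is 0.5.

```latex
K_{AB} = -\frac{3}{4} \ln\left(1 - \frac{4}{3}\, d_{AB}\right)
```

The distance grows without bound as the proportion of changed sites approaches 3/4, the level expected for unrelated sequences under this model.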
Problem #1
  • Write a program jukes-cantor.py that takes as input a pairwise sequence alignment and prints the Jukes-Cantor distance. Skip sites that contain gaps.

> cat twoseqs.txt
ACGT
ACCG
> python jukes-cantor.py twoseqs.txt
0.823959
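One possible shape for such a program is sketched below. It is not a reference solution: the function names, the treatment of "-" and "." as gap characters, and returning infinity when 75% or more of the sites differ are all assumptions.

```python
import math

GAP_CHARS = "-."  # assumption: characters that mark a gap


def jukes_cantor(seq_a, seq_b):
    """Jukes-Cantor distance between two aligned sequences, skipping gapped sites."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have the same length")
    compared = changed = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a in GAP_CHARS or b in GAP_CHARS:
            continue  # skip sites that contain gaps
        compared += 1
        if a != b:
            changed += 1
    if compared == 0:
        raise ValueError("no ungapped sites to compare")
    if changed == 0:
        return 0.0
    d = changed / compared  # proportion of changed sites
    if d >= 0.75:
        return math.inf  # the correction is undefined at or beyond saturation
    return -0.75 * math.log(1.0 - 4.0 * d / 3.0)


def print_distance(path):
    """Read the first two non-empty lines of a file and print their distance."""
    with open(path) as handle:
        first, second = [line.strip() for line in handle if line.strip()][:2]
    print(f"{jukes_cantor(first, second):.6f}")
```

Calling print_distance(sys.argv[1]) from a script, after importing sys, would give the command-line behaviour shown above.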

Problem #2
  • Generalize your previous program to work for a multiple sequence alignment.

> cat threeseqs.txt
ACGT
ACTG
ACGG
> python jukes-cantor-matrix.py threeseqs.txt
0.000 0.824 0.304
0.824 0.000 0.304
0.304 0.304 0.000
> jukes-cantor-multiple.py moreseqs.txt
0.000 0.233 0.383 0.233
0.233 0.000 0.824 0.572
0.383 0.824 0.000 0.107
0.233 0.572 0.107 0.000
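The generalization amounts to applying the pairwise distance to every pair of rows. A minimal sketch follows, with hypothetical function names and the same assumed gap characters ("-" and "."); it is not a reference solution.

```python
import math


def jukes_cantor(seq_a, seq_b, gap_chars="-."):
    """Jukes-Cantor distance between two aligned sequences, skipping gapped sites."""
    pairs = [(a, b) for a, b in zip(seq_a.upper(), seq_b.upper())
             if a not in gap_chars and b not in gap_chars]
    if not pairs:
        raise ValueError("no ungapped sites to compare")
    changed = sum(1 for a, b in pairs if a != b)
    if changed == 0:
        return 0.0
    d = changed / len(pairs)
    if d >= 0.75:
        return math.inf  # assumption: report saturation as infinity
    return -0.75 * math.log(1.0 - 4.0 * d / 3.0)


def distance_matrix(sequences):
    """Symmetric matrix of pairwise distances with zeros on the diagonal."""
    n = len(sequences)
    matrix = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            matrix[i][j] = matrix[j][i] = jukes_cantor(sequences[i], sequences[j])
    return matrix


def format_matrix(matrix):
    """Render the matrix with three decimals, one row per line."""
    return "\n".join(" ".join(f"{value:.3f}" for value in row) for row in matrix)


print(format_matrix(distance_matrix(["ACGT", "ACTG", "ACGG"])))
```

Each pair is computed once and mirrored across the diagonal, since the distance is symmetric. Gapped sites are skipped per pair, so different pairs may be compared over different numbers of sites.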