# Phylogenetic Tree Construction

Phylogenetic Tree Construction. Mark Eldridge Andrew Larsen Michael Lollis Thomas Marley Michael Smith. Intro page (overview of talk):. Tom – Intro to the topic. Andrew -- Reading in objects from a FASTA file and MUSCLE compare. Mike S. -- Getting the Matrix


### Phylogenetic Tree Construction

Mark Eldridge

Andrew Larsen

Michael Lollis

Thomas Marley

Michael Smith

• Tom – Intro to the topic.

• Andrew -- Reading in objects from a FASTA file and MUSCLE compare.

• Mike S. -- Getting the Matrix

• Mark -- Determining the Matrix

• Mike L. -- Building the Tree

• Conclusion -- some examples of our program in action.

• Q & A

### We set out to...

| Label | Sequence |
|-------|----------|
| A | GATTCCAG |
| B | GATTCTGG |
| C | GGTTCCGG |
| D | GGTTTCGG |
| E | GGCTCCGA |
| F | GGCCCCGG |

into this:

UPGMA: Unweighted Pair Group Method with Arithmetic Mean

• Construct distance matrix (pairwise between groups)

• Merge two closest groups

• Repeat the two steps above until only two groups remain

• Note: distances for merged groups are calculated by taking the arithmetic mean of distances for all members
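The steps above can be sketched in a few lines of Python. This is a minimal illustration, not the presenters' program: the function name `upgma_groups` and the dictionary layout are assumptions made here, and the input distances are the five-taxon values (A to E) that appear in the first matrix of the worked example in this transcript.

```python
from itertools import combinations

def upgma_groups(dist, stop_at=2):
    """Merge the two closest groups until `stop_at` groups remain.

    `dist` maps frozenset({a, b}) to the distance between single items a and b.
    Returns the remaining groups and the list of merges that were performed.
    (Hypothetical helper written for this sketch.)
    """
    labels = sorted({x for pair in dist for x in pair})
    groups = [(label,) for label in labels]
    merges = []

    def group_distance(g1, g2):
        # Arithmetic mean of the distances over every cross-group member pair.
        total = sum(dist[frozenset((a, b))] for a in g1 for b in g2)
        return total / (len(g1) * len(g2))

    while len(groups) > stop_at:
        g1, g2 = min(combinations(groups, 2), key=lambda p: group_distance(*p))
        merges.append((g1, g2, group_distance(g1, g2)))
        groups = [g for g in groups if g not in (g1, g2)] + [g1 + g2]
    return groups, merges

# Pairwise distances taken from the first matrix of the worked example.
pairs = {
    ("A", "B"): 4, ("A", "C"): 4, ("B", "C"): 3, ("A", "D"): 2, ("B", "D"): 1,
    ("C", "D"): 2, ("A", "E"): 3, ("B", "E"): 4, ("C", "E"): 5, ("D", "E"): 3,
}
dist = {frozenset(p): d for p, d in pairs.items()}
groups, merges = upgma_groups(dist)
```

With these inputs the merges happen at distances 1, 2.5, and 3, leaving the two groups {B, C, D} and {A, E}.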

• Format, standards, and lots of data...

• We figured out how to read in "SeqIO objects"

• Now that we have the objects what do we do with them?

• MUSCLE power.

• So now what do we have?

• A pretty ideal way to access a semi-large dataset.

• We normalized the data for later functions and computing.
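As a rough picture of what "reading in objects" means, the sketch below parses FASTA text into records that carry an ID and a sequence. It is a plain-Python stand-in so that it runs without extra packages; the slides used Biopython, where `SeqIO.parse(path, "fasta")` yields records with `.id` and `.seq`. The `Record` type, the `read_fasta` name, and the choice of uppercasing as the normalization step are all assumptions made for illustration. The MUSCLE alignment step is not shown.

```python
import io
from typing import Iterator, NamedTuple, TextIO

class Record(NamedTuple):
    """Hypothetical stand-in for a sequence record: an ID plus the sequence."""
    id: str
    seq: str

def read_fasta(handle: TextIO) -> Iterator[Record]:
    """Yield one Record per '>' header found in FASTA-formatted text."""
    seq_id, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if seq_id is not None:
                yield Record(seq_id, "".join(chunks))
            header = line[1:].split()
            # The ID is the first word of the header line.
            seq_id, chunks = (header[0] if header else ""), []
        elif seq_id is not None:
            # Assumed normalization: store every sequence in uppercase.
            chunks.append(line.upper())
    if seq_id is not None:
        yield Record(seq_id, "".join(chunks))

# Two of the sequences from the table earlier in the talk.
sample = io.StringIO(">A first sequence\nGATT\nCCAG\n>B\ngattctgg\n")
records = list(read_fasta(sample))
```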

Each object has an ID that identifies the gene, plus the sequence

MUSCLE has already aligned the sequences to be the same length

The compare function does a character-by-character comparison of two aligned sequences

Using NumPy, we created a matrix and filled it with the first run of comparisons

The matrix was then in a format suitable for successive similarity calls
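A minimal sketch of this step follows. It assumes the compare function counts mismatching positions (the worked example later picks the minimum as the closest pair, which suggests a distance rather than a similarity score), and it uses plain nested lists as a stand-in for the NumPy array mentioned in the slides. The names `count_differences` and `build_matrix` are hypothetical; the `-1` fill for unused cells mirrors the matrices shown in the talk.

```python
def count_differences(a, b):
    """Character-by-character comparison of two aligned sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(1 for x, y in zip(a, b) if x != y)

def build_matrix(seqs):
    """Fill the lower triangle with pairwise counts; other cells stay -1."""
    n = len(seqs)
    matrix = [[-1] * n for _ in range(n)]
    for i in range(n):
        for j in range(i):
            matrix[i][j] = count_differences(seqs[i], seqs[j])
    return matrix

# The six sequences A to F from the table earlier in the talk.
sequences = ["GATTCCAG", "GATTCTGG", "GGTTCCGG", "GGTTTCGG", "GGCTCCGA", "GGCCCCGG"]
matrix = build_matrix(sequences)
```

Only the cells below the diagonal are filled, because the comparison is symmetric and a sequence is never compared with itself.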

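First Matrix

|   | A | B | C | D | E |
|---|---|---|---|---|---|
| A | -1 | -1 | -1 | -1 | -1 |
| B | 4 | -1 | -1 | -1 | -1 |
| C | 4 | 3 | -1 | -1 | -1 |
| D | 2 | 1 | 2 | -1 | -1 |
| E | 3 | 4 | 5 | 3 | -1 |

Cells holding -1 (the diagonal and the upper triangle) are unused.

First List

0: 'A', 1: 'B', 2: 'C', 3: 'D', 4: 'E'

Min = 1 at (3, 1) -> (B, D)

For new matrix, append D onto B.

BD to A = (4 + 2) / 2 = 3

BD to C = (3 + 2) / 2 = 2.5

BD to E = (4 + 3) / 2 = 3.5

Second Matrix

|   | A | BD | C | E |
|---|---|---|---|---|
| A | -1 | -1 | -1 | -1 |
| BD | 3 | -1 | -1 | -1 |
| C | 4 | 2.5 | -1 | -1 |
| E | 3 | 3.5 | 5 | -1 |

Second List

0: 'A', 1: '(B, D)', 2: 'C', 3: 'E'

Min = 2.5 at (2, 1) -> (BD, C)

Matrix after merging BD and C (Initial Formula)

|   | A | BDC | E |
|---|---|---|---|
| A | -1 | -1 | -1 |
| BDC | 3.5 | -1 | -1 |
| E | 3 | 4.25 | -1 |

Matrix after merging BD and C (Weighted Formula)

|   | A | BDC | E |
|---|---|---|---|
| A | -1 | -1 | -1 |
| BDC | 3.33 | -1 | -1 |
| E | 3 | 4 | -1 |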
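The difference between the two formulas can be reproduced with a short sketch. It assumes the "Initial Formula" is a plain average of the two merged rows and the "Weighted Formula" weights each row by the number of members in its group; those readings are inferred from the numbers in the slides (3.5 and 4.25 versus 3.33 and 4), not stated there. The helper names `key` and `merge_groups` are hypothetical. In the usual literature the size-weighted version is the one called UPGMA, while the plain average is known as WPGMA.

```python
def key(x, y):
    """Order-independent lookup key for a pair of group labels."""
    return frozenset((x, y))

def merge_groups(dist, sizes, a, b, weighted):
    """Return (new_dist, new_sizes) after merging groups a and b.

    weighted=False: plain average of the two old distances.
    weighted=True:  average weighted by the member count of each group.
    """
    merged = a + b  # e.g. "BD" + "C" -> "BDC"
    others = [g for g in sizes if g not in (a, b)]
    new_dist = {}
    for i, g in enumerate(others):
        for h in others[:i]:
            new_dist[key(g, h)] = dist[key(g, h)]  # untouched pairs carry over
        wa, wb = (sizes[a], sizes[b]) if weighted else (1, 1)
        new_dist[key(merged, g)] = (
            wa * dist[key(a, g)] + wb * dist[key(b, g)]
        ) / (wa + wb)
    new_sizes = {g: sizes[g] for g in others}
    new_sizes[merged] = sizes[a] + sizes[b]
    return new_dist, new_sizes

# State after the first merge: groups A, BD, C, E with the distances shown above.
sizes = {"A": 1, "BD": 2, "C": 1, "E": 1}
dist = {
    key("A", "BD"): 3, key("A", "C"): 4, key("BD", "C"): 2.5,
    key("A", "E"): 3, key("BD", "E"): 3.5, key("C", "E"): 5,
}
initial, _ = merge_groups(dist, sizes, "BD", "C", weighted=False)
weighted, _ = merge_groups(dist, sizes, "BD", "C", weighted=True)
```

Here `initial` gives BDC to A = 3.5 and BDC to E = 4.25, while `weighted` gives 10/3 (about 3.33) and 4, matching the two versions of the matrix in the slides.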

• DendroPy is a Python library that lets the user create phylogenetic tree structures and display them.

• Phylo vs. DendroPy

• Phylo was "too powerful" and didn't allow for much "under the hood" code.

• DendroPy provided more basic functionality.

How did we build the tree?

• Built up a Newick-formatted string each time Mark's algorithm recursed.

• Draw an ASCII representation of the phylogenetic tree.
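One way the Newick string could be assembled from the merge order is sketched below. The function name `newick_from_merges` and the convention of naming a merged group by concatenating its two labels (B + D -> BD) are assumptions for this illustration, and branch lengths are left out. The merge order is the one from the five-taxon example, with a final join of the last two groups added to close the tree.

```python
def newick_from_merges(merges):
    """Build a Newick string from (left, right) merges listed in order.

    A merged group is named by concatenating its two labels, so a later
    merge can refer to an earlier one (assumed naming convention).
    """
    subtrees = {}
    last = None
    for left, right in merges:
        # Use the subtree built earlier if this label is a merged group.
        left_text = subtrees.pop(left, left)
        right_text = subtrees.pop(right, right)
        last = left + right
        subtrees[last] = f"({left_text},{right_text})"
    if last is None:
        raise ValueError("at least one merge is required")
    return subtrees[last] + ";"

merges = [("B", "D"), ("BD", "C"), ("A", "E"), ("BDC", "AE")]
newick = newick_from_merges(merges)
```

For these merges the result is `(((B,D),C),(A,E));`. If DendroPy 4 is installed, a string like this can be parsed with `dendropy.Tree.get(data=newick, schema="newick")` and drawn as text with `tree.print_plot()`.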