
# Phylogenetic Tree Construction



## Phylogenetic Tree Construction

Mark Eldridge

Andrew Larsen

Michael Lollis

Thomas Marley

Michael Smith

### Intro page (overview of talk):

• Tom: Intro to the topic.

• Andrew: Reading in objects from a FASTA file and MUSCLE compare.

• Mike S.: Getting the Matrix

• Mark: Determining the Matrix

• Mike L.: Building the Tree

• Conclusion: some examples of our program in action.

• Q & A

## We set out to...

Turn this:

| Label | Sequence |
|---|---|
| A | GATTCCAG |
| B | GATTCTGG |
| C | GGTTCCGG |
| D | GGTTTCGG |
| E | GGCTCCGA |
| F | GGCCCCGG |

into this:

[Figure: the resulting phylogenetic tree]

### How?

UPGMA: Unweighted Pair Group Method with Arithmetic Mean

1. Construct distance matrix (pairwise between groups)

2. Merge two closest groups

3. Repeat steps 1 and 2 until only two groups remain

Note: distances for merged groups are calculated by taking the arithmetic mean of distances for all members
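The loop described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the code from the talk: the function name `upgma_merge_order` and the `frozenset`-keyed distance dictionary are hypothetical choices, and the distances are the five-sequence values (A to E) from the worked example later in the talk.

```python
from itertools import combinations


def upgma_merge_order(labels, dist):
    """Return the merges chosen by a UPGMA-style loop (hypothetical helper).

    labels: list of leaf names.
    dist:   dict mapping frozenset({a, b}) -> distance between leaves a and b.
    """
    # Each cluster is a tuple of the leaf names it contains.
    clusters = [(name,) for name in labels]
    merges = []

    def cluster_distance(c1, c2):
        # Arithmetic mean of all leaf-to-leaf distances between two clusters.
        total = sum(dist[frozenset((a, b))] for a in c1 for b in c2)
        return total / (len(c1) * len(c2))

    # Stop once only two groups remain, as the slide describes.
    while len(clusters) > 2:
        c1, c2 = min(combinations(clusters, 2),
                     key=lambda pair: cluster_distance(*pair))
        merges.append((c1, c2, cluster_distance(c1, c2)))
        clusters = [c for c in clusters if c not in (c1, c2)] + [c1 + c2]
    return merges, clusters


# Pairwise distances from the talk's first matrix (A to E).
pairs = {("A", "B"): 4, ("A", "C"): 4, ("B", "C"): 3, ("A", "D"): 2,
         ("B", "D"): 1, ("C", "D"): 2, ("A", "E"): 3, ("B", "E"): 4,
         ("C", "E"): 5, ("D", "E"): 3}
dist = {frozenset(k): v for k, v in pairs.items()}

merges, remaining = upgma_merge_order(["A", "B", "C", "D", "E"], dist)
for left, right, d in merges:
    print(left, "+", right, "at distance", d)
```

Recomputing every cluster distance from the leaf distances keeps the sketch short; a matrix-updating version does the same work incrementally.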

### FASTA file and MUSCLE compare

• Format, standards, and lots of data...

• We figured out how to read the FASTA records in as "SeqIO objects" (the records returned by Biopython's SeqIO module)

• Now that we have the objects, what do we do with them?

• MUSCLE power.

• So now what do we have?

• A pretty ideal way to access a semi-large dataset.

• We normalized the data for later functions and computing.
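To make the reading step concrete, here is a small standard-library sketch of what pulling ID and sequence pairs out of FASTA text involves. It is a stand-in only: the talk used SeqIO for reading and MUSCLE for alignment, neither of which is called here. The helper name `read_fasta` is hypothetical, and upper-casing the sequence is just one assumed form of normalization; the talk does not say what its normalization did.

```python
from io import StringIO


def read_fasta(handle):
    """Yield (record_id, sequence) pairs from FASTA-formatted text."""
    record_id, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if record_id is not None:
                yield record_id, "".join(chunks)
            header = line[1:].split()
            # The ID is the first whitespace-delimited token after '>'.
            record_id, chunks = (header[0] if header else ""), []
        else:
            chunks.append(line.upper())  # assumed normalization: upper case
    if record_id is not None:
        yield record_id, "".join(chunks)


example = StringIO(">A demo record\nGATTCCAG\n>B\ngattctgg\n")
records = list(read_fasta(example))
print(records)
```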

### Getting the Matrix

Each object has an ID that identifies the gene, plus the sequence itself

MUSCLE has already aligned the sequences to be the same length

The compare function does a character-by-character comparison of two sequences

Using NumPy, we created a matrix and filled it with the first run of comparisons

It was then in a format ready for successive similarity calls
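A rough sketch of the compare-and-fill step follows. Several things here are assumptions rather than the talk's code: the function names are hypothetical, plain nested lists stand in for the NumPy matrix so the example has no dependencies, and the comparison counts mismatches (a distance) because the later merge step picks the minimum value. The -1 filler for unused cells follows the matrices shown in the talk.

```python
def count_differences(seq1, seq2):
    """Number of positions at which two aligned sequences differ."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    return sum(1 for a, b in zip(seq1, seq2) if a != b)


def lower_triangle_matrix(sequences, unused=-1):
    """Fill only the cells below the diagonal; the rest keep the filler."""
    n = len(sequences)
    matrix = [[unused] * n for _ in range(n)]
    for row in range(n):
        for col in range(row):
            matrix[row][col] = count_differences(sequences[row],
                                                 sequences[col])
    return matrix


# First three sequences from the opening example (labels A, B, C).
sequences = ["GATTCCAG", "GATTCTGG", "GGTTCCGG"]
matrix = lower_triangle_matrix(sequences)
for row in matrix:
    print(row)
```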

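### Recursive Function to Determine Next Matrix

Cells holding -1 are unused; only the lower triangle is filled.

First List

0: ‘A’, 1: ‘B’, 2: ‘C’, 3: ‘D’, 4: ‘E’

First Matrix

| | A | B | C | D | E |
|---|---|---|---|---|---|
| A | -1 | -1 | -1 | -1 | -1 |
| B | 4 | -1 | -1 | -1 | -1 |
| C | 4 | 3 | -1 | -1 | -1 |
| D | 2 | 1 | 2 | -1 | -1 |
| E | 3 | 4 | 5 | 3 | -1 |

Min = 1 at (3, 1) -> (B, D)

For new matrix, append D onto B.

BD to A = (4 + 2) / 2 = 3

BD to C = (3 + 2) / 2 = 2.5

BD to E = (4 + 3) / 2 = 3.5

Second List

0: ‘A’, 1: ‘(B, D)’, 2: ‘C’, 3: ‘E’

Second Matrix

| | A | BD | C | E |
|---|---|---|---|---|
| A | -1 | -1 | -1 | -1 |
| BD | 3 | -1 | -1 | -1 |
| C | 4 | 2.5 | -1 | -1 |
| E | 3 | 3.5 | 5 | -1 |

Min = 2.5 at (2, 1) -> (BD, C)

Initial Formula

| | A | BDC | E |
|---|---|---|---|
| A | -1 | -1 | -1 |
| BDC | 3.5 | -1 | -1 |
| E | 3 | 4.25 | -1 |

Weighted Formula

| | A | BDC | E |
|---|---|---|---|
| A | -1 | -1 | -1 |
| BDC | 3.33 | -1 | -1 |
| E | 3 | 4 | -1 |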
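The slides label the two BDC results "Initial Formula" and "Weighted Formula" without spelling out the expressions. One reading that reproduces the numbers (3.5 and 4.25 versus 3.33 and 4) is sketched below: the initial version averages the two merged groups' distances equally, while the weighted version counts each group by its number of members. Both function names are hypothetical, and this is an interpretation of the figures, not necessarily the code used.

```python
def simple_average(d1, d2):
    """Average the two merged groups' distances, ignoring group sizes."""
    return (d1 + d2) / 2


def size_weighted_average(d1, n1, d2, n2):
    """Average weighted by how many sequences each merged group holds."""
    return (n1 * d1 + n2 * d2) / (n1 + n2)


# Distances from the second matrix: BD has 2 members, C has 1.
bd_to_a, c_to_a = 3, 4
bd_to_e, c_to_e = 3.5, 5

initial = (simple_average(bd_to_a, c_to_a),
           simple_average(bd_to_e, c_to_e))
weighted = (size_weighted_average(bd_to_a, 2, c_to_a, 1),
            size_weighted_average(bd_to_e, 2, c_to_e, 1))

print("BDC to A, BDC to E (initial): ", initial)
print("BDC to A, BDC to E (weighted):", tuple(round(v, 2) for v in weighted))
```

The weighted version matches the note in the UPGMA summary that merged-group distances are the arithmetic mean over all members, since B, C, and D each contribute equally.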

### What is DendroPy and why did we use it?

• DendroPy is a Python library that lets the user create phylogenetic tree structures and display them.

• Phylo vs. DendroPy

• Phylo was "too powerful" and didn't allow for much "under the hood" code.

• DendroPy provided more basic functionality.

### How did we build the tree?

• Build upon a Newick-formatted string each time Mark's algorithm recursed.

• Draw an ASCII representation of the phylogenetic tree.
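As an illustration of growing a Newick string merge by merge, the sketch below replays the two merges from the worked example on a list of labels. The helper `merge_labels` is hypothetical and branch lengths are omitted; it only shows how nested parentheses accumulate, not how the talk's code was organized.

```python
def merge_labels(labels, i, j):
    """Replace entries i and j with one Newick-style group '(x,y)'."""
    low, high = sorted((i, j))
    merged = "(" + labels[low] + "," + labels[high] + ")"
    # The merged group takes the lower slot; the higher slot is dropped.
    return labels[:low] + [merged] + labels[low + 1:high] + labels[high + 1:]


labels = ["A", "B", "C", "D", "E"]
labels = merge_labels(labels, 3, 1)  # Min at (3, 1): B and D
labels = merge_labels(labels, 2, 1)  # Min at (2, 1): (B,D) and C

# Wrap whatever groups remain in one root and terminate with ';'.
newick = "(" + ",".join(labels) + ");"
print(newick)
```

With DendroPy installed, a string like this can be parsed with `dendropy.Tree.get(data=newick, schema="newick")` and drawn as text with the tree's `print_plot()` method; those calls are not exercised in the sketch above.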