DNA variation in Ecology and Evolution IV- Clustering methods and Phylogenetic reconstruction


Maria Eugenia D’Amato. BCB 705: Biodiversity.



Organization of the presentation

  • Phylogenetic reconstruction: Distance, ML, MP
  • Networks
  • Multivariate analysis
Characters: independent, homologous

  • Continuous
  • Discrete: binary or multistate

DNA sequence characters

Alignment = hypothesis of a homology relationship for each site

Sequence comparison: BLAST search against GenBank

  • Coding sequence: blastn, blastx
  • Non-coding DNA: blastn

Blast search results

Sequences producing significant alignments:                       Score (Bits)  E Value

gi|87299397|dbj|AB239568.1| Mantella baroni mitochondrial ND5...  101           3e-18
gi|343991|dbj|D10368.1|FRGMTURF2 Rana catesbeiana mitochondri...  97.6          5e-17
gi|14209845|gb|AF314017.1|AF314017 Rana sylvatica NADH dehydr...  93.7          8e-16

  • The lower the E-value, the better the alignment
  • Each line starts with the GenBank accession numbers for the sequence
  • The names that follow are the species that match the query

Blast search results

>gi|87299397|dbj|AB239568.1| Mantella baroni mitochondrial ND5, ND1, ND2 genes for NADH dehydrogenase subunit 5, NADH dehydrogenase subunit 1, NADH dehydrogenase subunit 2, complete cds
Length=10814

Score = 101 bits (51), Expect = 3e-18
Identities = 99/115 (86%), Gaps = 0/115 (0%)
Strand=Plus/Minus

Query  451    TTAGTTGAGGATTAAATTTTAGGATAATAACTATTCAGCCGAGGTGGCTGATGGAAGAAA  510
              |||||||||||||||||||||  ||||||| ||||||||| ||||| |  |||||||| |
Sbjct  10203  TTAGTTGAGGATTAAATTTTAAAATAATAAGTATTCAGCCCAGGTGACCAATGGAAGAGA  10144

Query  511    AAGCTAAAATTTTACGTAGTTGTGTTTGGCTAATGCCGCCTCATCCGCCTACAAG  565
              | |||| ||||||||||||||| |||||| |||| || ||||| || ||||||||
Sbjct  10143  AGGCTATAATTTTACGTAGTTGAGTTTGGTTAATACCCCCTCAACCTCCTACAAG  10089

Callouts on the slide:

  • Description of the genes contained in the sequence with this accession number (header line)
  • Strands aligned (Strand=Plus/Minus)
  • 5’ end
  • Alignment
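The identity statistics in the report can be recomputed from the two aligned rows. The following is a minimal Python sketch; the helper name `match_line` is an assumption made for illustration, and the sequences are copied from the alignment shown on the slide (which contains no gaps).

```python
# Aligned rows copied from the BLAST report (Query 451-565, Sbjct 10203-10089).
query = (
    "TTAGTTGAGGATTAAATTTTAGGATAATAACTATTCAGCCGAGGTGGCTGATGGAAGAAA"
    "AAGCTAAAATTTTACGTAGTTGTGTTTGGCTAATGCCGCCTCATCCGCCTACAAG"
)
sbjct = (
    "TTAGTTGAGGATTAAATTTTAAAATAATAAGTATTCAGCCCAGGTGACCAATGGAAGAGA"
    "AGGCTATAATTTTACGTAGTTGAGTTTGGTTAATACCCCCTCAACCTCCTACAAG"
)


def match_line(a, b):
    """Return a BLAST-style match line: '|' for identical sites, ' ' otherwise."""
    return "".join("|" if x == y else " " for x, y in zip(a, b))


matches = match_line(query, sbjct)
identities = matches.count("|")   # number of identical positions
length = len(query)               # alignment length (no gaps here)
percent = round(100 * identities / length)

print(f"Identities = {identities}/{length} ({percent}%)")
# prints: Identities = 99/115 (86%)
```

The 16 mismatching positions are the blanks in the match line; with gapped alignments the denominator would be the alignment length including gap columns.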

Phylogenetic reconstruction: Distance methods

  • Data matrix: 5 × 7 (rows 1 to 5, columns C1 to C7)
  • Distance criterion (similarity / dissimilarity criterion)
  • Distance matrix: 5 × 5
  • Dendrogram

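Distance criteria for binary data

Jaccard’s coefficient for two samples A and B:

$J = \dfrac{a}{a + b + c}$

a = bands common to A and B
b = bands exclusive to A
c = bands exclusive to B

J measures similarity; the corresponding Jaccard distance is $1 - J$.

For two points P1 = (x1, y1) and P2 = (x2, y2):

Manhattan distance

$M = |x_1 - x_2| + |y_1 - y_2|$

Euclidean distance

$E = \sqrt{(x_1 - x_2)^2 + (y_1 - y_2)^2}$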
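The three measures on this slide can be sketched in a few lines of Python. The band profiles `sample_a` and `sample_b` are hypothetical presence/absence vectors invented for illustration, and the function names are assumptions, not part of the presentation.

```python
import math


def jaccard(u, v):
    """Jaccard coefficient a / (a + b + c) for two presence/absence profiles."""
    a = sum(1 for p, q in zip(u, v) if p and q)        # bands common to both
    b = sum(1 for p, q in zip(u, v) if p and not q)    # bands exclusive to u
    c = sum(1 for p, q in zip(u, v) if q and not p)    # bands exclusive to v
    return a / (a + b + c)   # undefined if neither sample has any band


def manhattan(u, v):
    """Sum of absolute coordinate differences."""
    return sum(abs(p - q) for p, q in zip(u, v))


def euclidean(u, v):
    """Square root of the sum of squared coordinate differences."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(u, v)))


# Hypothetical band profiles (1 = band present, 0 = band absent).
sample_a = [1, 1, 0, 1, 0, 1, 0]
sample_b = [1, 0, 0, 1, 1, 1, 0]

j = jaccard(sample_a, sample_b)      # a = 3, b = 1, c = 1
print("Jaccard coefficient:", j)
print("Jaccard distance:", 1 - j)
print("Manhattan:", manhattan((1, 2), (4, 6)))
print("Euclidean:", euclidean((1, 2), (4, 6)))
```

Note that shared absences (0 in both samples) do not enter the Jaccard coefficient, which is why it is often preferred for dominant band data.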

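Distance criterion for DNA data: Models of DNA substitution

$F_{xy} = \begin{pmatrix} f_{AA} & f_{AC} & f_{AG} & f_{AT} \\ f_{CA} & f_{CC} & f_{CG} & f_{CT} \\ f_{GA} & f_{GC} & f_{GG} & f_{GT} \\ f_{TA} & f_{TC} & f_{TG} & f_{TT} \end{pmatrix} = \begin{pmatrix} a & b & c & d \\ e & f & g & h \\ i & j & k & l \\ m & n & o & p \end{pmatrix}$

p-distance: p = number of different nucleotides / total number of nucleotides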
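As an illustration of how the matrix and the p-distance relate, here is a minimal Python sketch. It assumes that each entry of the matrix is the relative frequency of sites showing one nucleotide in sequence x and another in sequence y, and that the two sequences are already aligned without gaps. The two short sequences are the visible inner segments of the first two sequences on the Maximum Likelihood slide; the function names are assumptions.

```python
BASES = "ACGT"


def divergence_matrix(x, y):
    """4 x 4 matrix of relative frequencies of each nucleotide pair (x, y)."""
    n = len(x)
    counts = {(i, j): 0 for i in BASES for j in BASES}
    for a, b in zip(x, y):
        counts[(a, b)] += 1
    # Rows follow the nucleotide in x, columns the nucleotide in y (A, C, G, T).
    return [[counts[(i, j)] / n for j in BASES] for i in BASES]


def p_distance(x, y):
    """Proportion of sites at which the two sequences differ."""
    return sum(a != b for a, b in zip(x, y)) / len(x)


seq_x = "GGACACGTTTA"
seq_y = "AGACACCTCTA"

F = divergence_matrix(seq_x, seq_y)
diagonal = sum(F[i][i] for i in range(4))   # identical sites

print("p-distance:", p_distance(seq_x, seq_y))
print("1 - diagonal sum:", 1 - diagonal)
```

Both printed values describe the same quantity: the off-diagonal entries of the matrix add up to the proportion of differing sites.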

Models of DNA substitution

$D = 1 - (a + f + k + p)$

Jukes and Cantor (equal rate)

$d_{xy} = -\tfrac{3}{4} \ln\left(1 - \tfrac{4}{3} D\right)$

F81 (unequal base freqs)

$B = 1 - (\pi_A^2 + \pi_C^2 + \pi_G^2 + \pi_T^2)$

$d_{xy} = -B \ln\left(1 - D/B\right)$

K2P

$P = c + h + i + n$ (transitions)

$Q = b + d + e + g + j + l + m + o$ (transversions)

$d_{xy} = \tfrac{1}{2} \ln\left(\dfrac{1}{1 - 2P - Q}\right) + \tfrac{1}{4} \ln\left(\dfrac{1}{1 - 2Q}\right)$
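The three corrections can be compared numerically. This is a minimal Python sketch of the formulas on this slide; the function names and the input proportions (D = 3/11, P = 2/11, Q = 1/11, equal base frequencies) are illustrative assumptions, not values from the presentation.

```python
import math


def jc69(D):
    """Jukes and Cantor distance from the proportion of differing sites D."""
    return -0.75 * math.log(1.0 - 4.0 * D / 3.0)


def f81(D, freqs):
    """F81 distance; freqs are the base frequencies of A, C, G, T."""
    B = 1.0 - sum(f * f for f in freqs)
    return -B * math.log(1.0 - D / B)


def k2p(P, Q):
    """K2P distance from transition (P) and transversion (Q) proportions."""
    return (0.5 * math.log(1.0 / (1.0 - 2.0 * P - Q))
            + 0.25 * math.log(1.0 / (1.0 - 2.0 * Q)))


# Hypothetical proportions for illustration only.
D = 3 / 11           # all differences
P, Q = 2 / 11, 1 / 11  # transitions and transversions, P + Q = D

print("uncorrected D:", D)
print("JC69:", jc69(D))
print("F81 (equal freqs):", f81(D, [0.25] * 4))
print("K2P:", k2p(P, Q))
```

With equal base frequencies B equals 3/4, so F81 reduces to Jukes and Cantor. All corrected distances exceed the raw proportion D because they account for multiple substitutions at the same site.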

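Distance criteria for diploid data

Here $x_i$ and $y_i$ are the frequencies of allele i in populations x and y, and L is the number of loci.

Nei 1972

$J_x = \sum x_i^2$

$J_y = \sum y_i^2$

$J_{xy} = \sum x_i y_i$

$I = \dfrac{J_{xy}}{\sqrt{J_x J_y}}$

$D_n = -\ln I = -\ln \dfrac{J_{xy}}{\sqrt{J_x J_y}}$

Cavalli Sforza 1967

$D_{arc} = \sqrt{\dfrac{1}{L} \sum \left(\dfrac{2\theta}{\pi}\right)^2}$

$\theta = \cos^{-1} \sum \sqrt{x_i y_i}$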
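Both distances are simple functions of allele frequencies. The Python sketch below is one possible reading of the formulas on this slide: sums for Nei's distance run over all alleles at all loci, and the arc distance averages the squared scaled angle over loci. The allele frequencies and function names are hypothetical.

```python
import math


def nei_distance(x_loci, y_loci):
    """Nei (1972) distance Dn = -ln(Jxy / sqrt(Jx * Jy)), summing over loci."""
    jx = sum(p * p for locus in x_loci for p in locus)
    jy = sum(q * q for locus in y_loci for q in locus)
    jxy = sum(p * q for lx, ly in zip(x_loci, y_loci) for p, q in zip(lx, ly))
    identity = jxy / math.sqrt(jx * jy)
    # No shared alleles gives identity 0 and an infinite distance.
    return -math.log(identity) if identity > 0 else math.inf


def arc_distance(x_loci, y_loci):
    """Cavalli-Sforza arc distance: sqrt((1/L) * sum over loci of (2*theta/pi)^2)."""
    total = 0.0
    for lx, ly in zip(x_loci, y_loci):
        # Clamp to 1.0 to guard against rounding just above 1 before acos.
        cos_theta = min(1.0, sum(math.sqrt(p * q) for p, q in zip(lx, ly)))
        theta = math.acos(cos_theta)
        total += (2.0 * theta / math.pi) ** 2
    return math.sqrt(total / len(x_loci))


# Hypothetical allele frequencies: two loci, populations x and y.
pop_x = [[0.7, 0.3], [0.5, 0.4, 0.1]]
pop_y = [[0.4, 0.6], [0.2, 0.5, 0.3]]

print("Nei 1972:", nei_distance(pop_x, pop_y))
print("Arc distance:", arc_distance(pop_x, pop_y))
```

The arc distance is bounded between 0 and 1, whereas Nei's distance grows without bound as the populations share fewer alleles.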

Phylogenetic reconstruction criterion for distance data

Ultrametric tree (UPGMA)

[Figure: tree with tips A, B, C and branches v1 to v4]

Properties

dAB = v1 + v2 + v3

dAC = v1 + v2 + v4

dBC = v3 + v4

v3 = v4

v1 = v2 + v3 = v2 + v4

Additive tree (NJ)

[Figure: tree with tips A, B, C, D and branches v1 to v5]

Properties

dAB = v1 + v2

dAC = v1 + v3 + v4

dAD = v1 + v3 + v5

dBC = v2 + v3 + v4

dBD = v2 + v3 + v5

dCD = v4 + v5
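The two properties can be checked on a distance matrix with the three-point condition (ultrametric) and the four-point condition (additive). The Python sketch below uses hypothetical branch lengths. For the additive tree it takes A and B on branches v1 and v2, the internal branch as v3, and C and D on branches v4 and v5, so that dBC = v2 + v3 + v4 and dBD = v2 + v3 + v5. Function names are assumptions.

```python
from itertools import combinations


def d(dist, x, y):
    """Look up a symmetric distance stored under a sorted (x, y) key."""
    return dist[tuple(sorted((x, y)))]


def is_ultrametric(dist, taxa, tol=1e-9):
    """Three-point condition: in every triple the two largest distances are equal."""
    for x, y, z in combinations(taxa, 3):
        s = sorted([d(dist, x, y), d(dist, x, z), d(dist, y, z)])
        if abs(s[2] - s[1]) > tol:
            return False
    return True


def is_additive(dist, taxa, tol=1e-9):
    """Four-point condition: of the three pairwise sums, the two largest are equal."""
    for w, x, y, z in combinations(taxa, 4):
        s = sorted([
            d(dist, w, x) + d(dist, y, z),
            d(dist, w, y) + d(dist, x, z),
            d(dist, w, z) + d(dist, x, y),
        ])
        if abs(s[2] - s[1]) > tol:
            return False
    return True


# Additive tree with hypothetical branch lengths v1..v5.
v1, v2, v3, v4, v5 = 2.0, 3.0, 1.0, 4.0, 2.0
additive = {
    ("A", "B"): v1 + v2,
    ("A", "C"): v1 + v3 + v4,
    ("A", "D"): v1 + v3 + v5,
    ("B", "C"): v2 + v3 + v4,
    ("B", "D"): v2 + v3 + v5,
    ("C", "D"): v4 + v5,
}

# Ultrametric tree with hypothetical lengths obeying v3 = v4 and v1 = v2 + v3.
u1, u2, u3, u4 = 3.0, 2.0, 1.0, 1.0
ultrametric = {
    ("A", "B"): u1 + u2 + u3,
    ("A", "C"): u1 + u2 + u4,
    ("B", "C"): u3 + u4,
}

print("additive tree: additive =", is_additive(additive, "ABCD"),
      "ultrametric =", is_ultrametric(additive, "ABCD"))
print("ultrametric tree: ultrametric =", is_ultrametric(ultrametric, "ABC"))
```

Every ultrametric distance matrix is also additive, but not the other way round, which is why NJ places weaker demands on the data than UPGMA.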

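Maximum Likelihood

$L_D = \Pr(D \mid H)$

[Figure: unrooted tree for sequences 1 to 4, and the tree after rooting at an internal node, with internal nodes 5 and 6]

Sequences 1 to 4, sites 1 … j … N:

  • C….GGACACGTTTA….C
  • C….AGACACCTCTA….C
  • C….GGATAAGTTAA….C
  • C….GGATAGCCTAG….C

Likelihood of site j: the sum, over the possible nucleotides at the internal nodes, of the probability of each reconstruction.

$L_j = \text{Prob}(\text{reconstruction 1}) + \text{Prob}(\text{reconstruction 2}) + \ldots$

$L = L_1 \times L_2 \times L_3 \times \ldots \times L_N = \prod_{j=1}^{N} L_j$

$\ln L = \ln L_1 + \ln L_2 + \ldots + \ln L_N = \sum_{j=1}^{N} \ln L_j$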
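To make the per-site sum concrete, the following Python sketch computes the likelihood of a four-sequence tree by brute-force summation over the states of internal nodes 5 and 6. Several things are assumptions made for illustration: the Jukes and Cantor model with equal base frequencies, the topology ((1,2),(3,4)) with node 5 joining sequences 1 and 2 and node 6 joining sequences 3 and 4, the branch lengths, and all function names. The sequences are the visible inner segments from the slide.

```python
import math
from itertools import product

BASES = "ACGT"


def jc_prob(i, j, t):
    """Jukes and Cantor probability of going from i to j along a branch of length t."""
    e = math.exp(-4.0 * t / 3.0)
    return 0.25 + 0.75 * e if i == j else 0.25 - 0.25 * e


def site_likelihood(tips, t):
    """L_j: sum over all nucleotide assignments to internal nodes 5 and 6."""
    x1, x2, x3, x4 = tips
    total = 0.0
    for s5, s6 in product(BASES, repeat=2):
        total += (0.25                                  # prior at node 5 (root)
                  * jc_prob(s5, x1, t[1]) * jc_prob(s5, x2, t[2])
                  * jc_prob(s5, s6, t["internal"])
                  * jc_prob(s6, x3, t[3]) * jc_prob(s6, x4, t[4]))
    return total


def log_likelihood(seqs, t):
    """ln L = sum over sites of ln L_j."""
    return sum(math.log(site_likelihood(column, t)) for column in zip(*seqs))


# Visible inner segments of the four sequences on the slide.
seqs = ["GGACACGTTTA", "AGACACCTCTA", "GGATAAGTTAA", "GGATAGCCTAG"]

# Hypothetical branch lengths (expected substitutions per site).
branches = {1: 0.1, 2: 0.1, 3: 0.1, 4: 0.1, "internal": 0.05}

lnL = log_likelihood(seqs, branches)
print("ln L =", lnL)
```

Real programs avoid the brute-force loop with Felsenstein's pruning algorithm and also optimize the branch lengths; the sketch only evaluates the likelihood for fixed values.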

Hypothesis testing: Likelihood ratio test

Used to test for:

  • Rate variation
  • Appropriate substitution model

$\Delta = \log L_1 - \log L_0$

$2\Delta$ follows a $\chi^2$ distribution

d.f. = N sequences in the tree − 2; or

d.f. = difference in the number of parameters between H1 and H0
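A likelihood ratio test reduces to one subtraction and one chi-square tail probability. The Python sketch below uses only the standard library; the chi-square survival function is written out for integer degrees of freedom via the usual recurrence. The log-likelihood values and the one-parameter difference are hypothetical, and the function names are assumptions.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variable with integer df >= 1."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 1:
        q, k = math.erfc(math.sqrt(half)), 1   # closed form for df = 1
    else:
        q, k = math.exp(-half), 2              # closed form for df = 2
    # Recurrence: Q(k + 2) = Q(k) + (x/2)^(k/2) * exp(-x/2) / Gamma(k/2 + 1)
    while k < df:
        q += half ** (k / 2.0) * math.exp(-half) / math.gamma(k / 2.0 + 1.0)
        k += 2
    return q


def likelihood_ratio_test(lnL0, lnL1, df):
    """Return the statistic 2 * (lnL1 - lnL0) and its chi-square p-value."""
    statistic = 2.0 * (lnL1 - lnL0)
    return statistic, chi2_sf(statistic, df)


# Hypothetical log-likelihoods: H0 = simpler model, H1 = model with one extra parameter.
lnL0, lnL1 = -1250.4, -1243.1
statistic, p_value = likelihood_ratio_test(lnL0, lnL1, df=1)
print("2 * delta =", statistic, "p =", p_value)
```

A small p-value means the extra parameters of H1 improve the fit more than expected by chance, so the simpler model H0 is rejected.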

Maximum Parsimony

Minimize tree length

To obtain rooted trees (and character polarity), use an outgroup. The ingroup is monophyletic.

  • Sequence 1: ATATT
  • Sequence 2: ATCGT
  • Sequence 3: GCAGT
  • Sequence 4: GCCGT

[Figure: tree for sequences 1 to 4 showing the first site (tip states A, A, G, G). Callouts: 1 change; 5 changes.]

Maximum Parsimony: example

[Figures: reconstructions of the changes at site 2, site 3 and site 4]

Site 5: no changes

Tree length

$L = \sum_{i=1}^{k} l_i$

where $l_i$ is the number of changes at site i and k is the number of sites.

Maximum parsimony: example

                 Sites
Tree             1  2  3  4  5  Total

((1,2),(3,4))    1  1  2  1  0  5
((1,3),(2,4))    2  2  1  1  0  6
((1,4),(2,3))    2  2  2  1  0  7

Phylogenetically informative sites: 1, 2 and 3. Sites 4 and 5 require the same number of changes on every tree.
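The table can be reproduced by counting changes site by site on each tree. The Python sketch below uses Fitch's set-based counting, which is one standard way to obtain the minimum number of changes; the slide does not say which procedure was used. The four sequences are the ones from the Maximum Parsimony slide, and the function names are assumptions.

```python
from collections import Counter

sequences = {1: "ATATT", 2: "ATCGT", 3: "GCAGT", 4: "GCCGT"}

trees = {
    "((1,2),(3,4))": ((1, 2), (3, 4)),
    "((1,3),(2,4))": ((1, 3), (2, 4)),
    "((1,4),(2,3))": ((1, 4), (2, 3)),
}


def fitch(tree, states):
    """Return (possible states at this node, changes below it) for a nested-tuple tree."""
    if not isinstance(tree, tuple):          # a leaf: sequence number
        return {states[tree]}, 0
    left_set, left_changes = fitch(tree[0], states)
    right_set, right_changes = fitch(tree[1], states)
    common = left_set & right_set
    if common:                               # no change needed at this node
        return common, left_changes + right_changes
    return left_set | right_set, left_changes + right_changes + 1


def changes_per_site(tree):
    """Minimum number of changes at each of the five sites."""
    n_sites = len(sequences[1])
    return [fitch(tree, {k: s[i] for k, s in sequences.items()})[1]
            for i in range(n_sites)]


def informative_sites():
    """Sites with at least two states that each occur in at least two sequences."""
    result = []
    for i, column in enumerate(zip(*sequences.values()), start=1):
        counts = Counter(column)
        if sum(1 for c in counts.values() if c >= 2) >= 2:
            result.append(i)
    return result


table = {name: changes_per_site(tree) for name, tree in trees.items()}
for name, row in table.items():
    print(name, row, "total =", sum(row))
print("informative sites:", informative_sites())
```

The tree with the smallest total, ((1,2),(3,4)) with 5 changes, is the most parsimonious one.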

Networks

  • Phylogenetic representation allowing reticulation
  • More appropriate for intraspecific data
  • The ancestor may still be alive
  • Sources of reticulation: hybridization, recombination, horizontal transfer, polyploidization

[Figure: example networks with nodes 1 to 7 and haplotypes agct, acat, acct]

Multivariate clustering

  • Data matrix: 5 × 7 (rows 1 to 5, columns C1 to C7)
  • Similarity criterion: correlations
  • Correlation matrix: 7 × 7
  • Calculate eigenvectors with highest eigenvalues
  • Project data onto new axes (eigenvectors): X = 1st axis, Y = 2nd axis, Z = 3rd axis
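The steps on this slide amount to a principal component analysis on the correlation matrix. The pure-Python sketch below extracts the two leading eigenvectors by power iteration with deflation, which is a substitute chosen here to avoid external libraries; statistical packages normally use a full eigendecomposition. The 5 × 7 data matrix is hypothetical and the function names are assumptions.

```python
import math


def standardize(data):
    """Centre each column and scale it to unit standard deviation."""
    n = len(data)
    cols = list(zip(*data))
    means = [sum(c) / n for c in cols]
    sds = [math.sqrt(sum((v - m) ** 2 for v in c) / (n - 1))
           for c, m in zip(cols, means)]
    return [[(v - m) / s for v, m, s in zip(row, means, sds)] for row in data]


def correlation_matrix(z):
    """Correlations between the columns of standardized data z."""
    n, p = len(z), len(z[0])
    return [[sum(row[i] * row[j] for row in z) / (n - 1) for j in range(p)]
            for i in range(p)]


def matvec(m, v):
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def leading_eigenpair(m, iterations=1000):
    """Power iteration: eigenvector with the largest eigenvalue of a symmetric matrix."""
    v = [float(i + 1) for i in range(len(m))]   # arbitrary non-zero start vector
    for _ in range(iterations):
        w = matvec(m, v)
        norm = math.sqrt(dot(w, w))
        v = [x / norm for x in w]
    return dot(v, matvec(m, v)), v


def deflate(m, value, v):
    """Remove the component already found so the next eigenvector can be extracted."""
    size = len(m)
    return [[m[i][j] - value * v[i] * v[j] for j in range(size)] for i in range(size)]


# Hypothetical data: 5 samples (rows) x 7 characters C1..C7 (columns).
data = [
    [2, 4, 1, 8, 3, 5, 7],
    [3, 5, 3, 7, 2, 6, 5],
    [5, 6, 2, 5, 4, 9, 4],
    [6, 8, 4, 3, 6, 8, 3],
    [8, 9, 5, 2, 5, 11, 1],
]

z = standardize(data)
corr = correlation_matrix(z)                       # 7 x 7
value1, axis1 = leading_eigenpair(corr)            # 1st axis
value2, axis2 = leading_eigenpair(deflate(corr, value1, axis1))  # 2nd axis

# Project each sample onto the new axes.
scores = [(dot(row, axis1), dot(row, axis2)) for row in z]
for i, (x, y) in enumerate(scores, start=1):
    print(f"sample {i}: axis 1 = {x:.3f}, axis 2 = {y:.3f}")
```

Samples that lie close together in the plot of the first two axes have similar character profiles, which is what makes the projection useful for clustering.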
