Introduction to bioinformatics
Tutorial 4: Multiple Alignment and Phylogeny


ClustalW Input

  • Alignment format

  • Fast alignment?

  • Fast alignment options

  • Scoring matrix

  • Gap scoring

  • Input sequences

  • Phylogenetic trees


ClustalW Output (1)

  • Input sequences

  • Pairwise alignment scores

  • Building alignment

  • Final score


ClustalW Output (2)

  • Sequence names

  • Sequence positions

  • Match strength in decreasing order: * : .
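As a rough illustration of the match-strength line, the sketch below marks each column of a protein alignment with "*", ":" or ".". It assumes the strong and weak residue groups listed in the Clustal documentation for protein alignments; the function names are hypothetical, and columns containing a gap are left blank.

```python
# Residue groups as listed in the Clustal documentation (protein alignments only).
STRONG = ["STA", "NEQK", "NHQK", "NDEQ", "QHRK", "MILV", "MILF", "HY", "FYW"]
WEAK = ["CSA", "ATV", "SAG", "STNK", "STPA", "SGND", "SNDEQK",
        "NDEQHK", "NEQHRK", "FVLIM", "HFY"]

def column_symbol(column):
    """Return '*', ':', '.' or ' ' for one alignment column."""
    residues = {c.upper() for c in column}
    if "-" in residues:
        return " "                      # gapped columns get no symbol
    if len(residues) == 1:
        return "*"                      # fully conserved residue
    if any(residues <= set(group) for group in STRONG):
        return ":"                      # all residues fall in one strong group
    if any(residues <= set(group) for group in WEAK):
        return "."                      # all residues fall in one weak group
    return " "

def consensus_line(aligned):
    """Build the symbol line for a list of equal-length aligned sequences."""
    return "".join(column_symbol(col) for col in zip(*aligned))

# Hypothetical toy alignment, not taken from the tutorial.
print(repr(consensus_line(["MKTAW", "MRSCW", "MKTA-"])))
```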


Phylogenetic Trees

  • Represent closeness between many entities

    • In our case, genomic or protein sequences

[Figure: tree of chimp, monkey and human. Labels: "Distance representation", "Observed entity", "Unobserved commonality".]


Rooting Trees

  • A tree can be hung from a root

    • Adds directional information

    • Requires addition of an ‘outgroup’

[Figure: tree of pig, monkey, human and chimp. Labels: "We know this is furthest", "So we hang the tree from where it joins".]


Phylogeny and Evolution

[Figure: tree annotated with "Common Ancestor", "Speciation", "Evolutionary Time" and "Number of mutations".]


Tree Reconstruction

  • Build a tree based on the organisms' sequences

  • Distance-based methods

    • Use pairwise alignment scores to build the tree

    • Ignore the sequences themselves after the initial alignments

  • Character-based methods

    • Learn a tree with intermediate sequences that minimizes the total number of mutations

    • Slower, but generally give better results
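To make the character-based idea concrete, here is a minimal sketch of counting the minimum number of mutations a given tree requires, using Fitch's small-parsimony algorithm. This is one standard way to score a tree, not necessarily the method the tutorial has in mind. The tree is assumed to be binary and written as nested tuples of leaf names; the sequences are made-up toy data.

```python
def fitch_site(tree, states):
    """Return (candidate ancestral states, minimum changes) for one column."""
    if isinstance(tree, str):                 # leaf: its observed character
        return {states[tree]}, 0
    left_set, left_cost = fitch_site(tree[0], states)
    right_set, right_cost = fitch_site(tree[1], states)
    common = left_set & right_set
    if common:                                # children agree: no extra change
        return common, left_cost + right_cost
    return left_set | right_set, left_cost + right_cost + 1

def parsimony_score(tree, seqs):
    """Sum the minimum number of changes over all alignment columns."""
    length = len(next(iter(seqs.values())))
    total = 0
    for i in range(length):
        column = {name: seq[i] for name, seq in seqs.items()}
        total += fitch_site(tree, column)[1]
    return total

# Hypothetical aligned sequences (toy data).
seqs = {"human": "ACGT", "chimp": "ACGT", "monkey": "ACTT", "pig": "GCTA"}
tree_a = ((("human", "chimp"), "monkey"), "pig")
tree_b = ((("human", "monkey"), "chimp"), "pig")
print(parsimony_score(tree_a, seqs), parsimony_score(tree_b, seqs))
```

A character-based search would compare such scores across many candidate trees and keep the lowest; here the tree grouping human with chimp needs fewer mutations than the alternative.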


Distance-based Example (1) to (3)

[Figures: three slides of a distance-based example on four nodes labeled 1, 2, 3 and 4.]


Newick Tree Format

(CFTR_SHEEP:0.01457,
(CFTR_HUMAN:0.16153,
(CFTR_MOUSE:0.70599,
(CFTR_RABIT:2.76042,
(CFTR_SQUAC:1.27192,
CFTR_XENLA:0.28818)
:3.42183)
:0.77076)
:0.65873)
:0.73937,
CFTR_BOVIN:0.00953);
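The nesting of parentheses in a Newick string mirrors the nesting of the tree, so it can be read with a short recursive parser. The sketch below is a minimal reader, assuming an unquoted string that ends with ";" and has no comments; the function names are hypothetical. Each node becomes a (name, branch length, children) tuple.

```python
def parse_newick(text):
    """Parse a simple Newick string into nested (name, length, children) tuples."""
    text = "".join(text.split())          # drop spaces and line breaks
    pos = 0

    def node():
        nonlocal pos
        children = []
        if text[pos] == "(":              # internal node: read its children
            pos += 1
            children.append(node())
            while text[pos] == ",":
                pos += 1
                children.append(node())
            pos += 1                      # consume ")"
        start = pos
        while text[pos] not in ",():;":   # optional node name
            pos += 1
        name = text[start:pos]
        length = None
        if text[pos] == ":":              # optional branch length
            pos += 1
            start = pos
            while text[pos] not in ",();":
                pos += 1
            length = float(text[start:pos])
        return name, length, children

    return node()

def leaves(tree):
    """List (name, branch length) for every leaf, left to right."""
    name, length, children = tree
    if not children:
        return [(name, length)]
    return [leaf for child in children for leaf in leaves(child)]

cftr = """(CFTR_SHEEP:0.01457,(CFTR_HUMAN:0.16153,(CFTR_MOUSE:0.70599,
(CFTR_RABIT:2.76042,(CFTR_SQUAC:1.27192,CFTR_XENLA:0.28818):3.42183)
:0.77076):0.65873):0.73937,CFTR_BOVIN:0.00953);"""
for name, length in leaves(parse_newick(cftr)):
    print(name, length)
```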


Phylodendron Input

  • Graphical style

  • Newick tree description

  • Tree size

  • Orientation


Calculation of HIV/SIV Neighbor-joining tree

Why phylogenetic analyses? Mutations accumulate in the genomes of pathogens, especially viruses, during the spread of an infection. This can be used to document the history of transmission events. Phylogenetic analysis of these mutations can be used not only to reconstruct the history of a pathogen's spread through host populations but also to make predictions about its future progress.

The unsolved HIV/SIV relationship

One interesting case where phylogenetic tree building is useful is the unsolved HIV/SIV relationship: HIV-1, HIV-2 and SIV. AIDS (acquired immunodeficiency syndrome) is caused by two different human viruses:

HIV-1, groups M and O

HIV-2, subtypes A to E

There are many related viruses in a variety of non-human primates. These related viruses are called SIV (simian immunodeficiency viruses).


  • Phylogenetic studies have shown that primate lentiviruses are all in the same clade. Within this clade there are five major lineages (the subscript denotes the host):

    • HIV-1 and SIVCPZ (Chimpanzee)

    • HIV-2, SIVSM (Sooty mangabey) and SIVMAC (Captive macaque)

    • SIVAGM (African green monkey)

    • SIVMND (Mandrill)

    • SIVSYK (Sykes' monkey)

The NJ tree in our example is based on the polyprotein sequence from HIV-1, HIV-2 and SIV, with HTLV-1 as an outgroup. HTLV-1 (human T-lymphotropic virus type 1) is another human retroviral pathogen that originated from related simian viruses.


Step by step summary:

  • Define all taxa and calculate all pairwise distances.

  • Pick two nodes in the star (i and j) for which the distance is minimal.

  • Define a new node (x) and calculate $r_i$ and $r_j$.

  • Calculate $d_{ix}$ and $d_{jx}$, thereby joining x to i and j respectively.

  • Remove i and j from the star and insert x instead.

  • Calculate $d_{xm}$ for all m in the star.

  • Continue until the star has been resolved and root the tree in a final step.
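The steps above can be sketched in Python as follows. This is a minimal illustration, not the tutorial's own implementation. It uses the formulas from the worked steps ($r_i = \sum_k d_{ik}/(L-2)$, $d_{ix} = (d_{ij} + r_i - r_j)/2$, $d_{jx} = d_{ij} - d_{ix}$) together with the usual neighbor-joining update $d_{xm} = (d_{im} + d_{jm} - d_{ij})/2$. It assumes that "minimal" refers to the rate-corrected distance $d_{ij} - r_i - r_j$, as in standard neighbor joining. The function name, the `nodeN` labels and the toy distances are hypothetical, and the final rooting step is left out.

```python
from itertools import combinations

def neighbor_joining(names, dist):
    """names: list of taxon labels; dist: {frozenset({a, b}): distance}.
    Returns (child, parent, branch_length) edges of the unrooted tree."""
    d = dict(dist)
    active = list(names)
    edges = []
    next_id = 0
    while len(active) > 2:
        L = len(active)
        # Net divergence of each node, scaled by (L - 2).
        r = {i: sum(d[frozenset((i, k))] for k in active if k != i) / (L - 2)
             for i in active}
        # Assumption: choose the pair minimizing the rate-corrected distance.
        i, j = min(combinations(active, 2),
                   key=lambda p: d[frozenset(p)] - r[p[0]] - r[p[1]])
        d_ij = d[frozenset((i, j))]
        x = f"node{next_id}"              # hypothetical label for the new node
        next_id += 1
        d_ix = (d_ij + r[i] - r[j]) / 2
        d_jx = d_ij - d_ix
        edges += [(i, x, d_ix), (j, x, d_jx)]
        # Replace i and j by x and compute distances from x to the rest.
        active = [m for m in active if m not in (i, j)]
        for m in active:
            d[frozenset((x, m))] = (d[frozenset((i, m))]
                                    + d[frozenset((j, m))] - d_ij) / 2
        active.append(x)
    a, b = active                         # join the last two nodes
    edges.append((a, b, d[frozenset((a, b))]))
    return edges

# Toy additive distances for the tree ((A:1,B:2):3,(C:4,D:5)).
toy = {frozenset(p): v for p, v in {
    ("A", "B"): 3, ("A", "C"): 8, ("A", "D"): 9,
    ("B", "C"): 9, ("B", "D"): 10, ("C", "D"): 9}.items()}
for edge in neighbor_joining(["A", "B", "C", "D"], toy):
    print(edge)
```

On additive distances such as the toy matrix, the procedure recovers the branch lengths of the tree the distances came from.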


Step 1

[Figure: the minimum is marked.]


Step 1 (cont.)

The calculation starts with the star.

The branch lengths between nodes 5 and 10 and between nodes 6 and 10 are calculated with these formulas. In this case L = 9 and the new node is x = 10.

$r_i = r_5 = \sum_k d_{5,k}/(L-2) = 3.22406/(9-2) = 0.46058$

$r_j = r_6 = \sum_k d_{6,k}/(L-2) = 3.22758/(9-2) = 0.461083$

$d_{ix} = d_{5,10} = (d_{5,6} + r_5 - r_6)/2 = (0.06088 + 0.46058 - 0.461083)/2 = 0.0301886$

$d_{jx} = d_{6,10} = d_{5,6} - d_{5,10} = 0.06088 - 0.0301886 = 0.0306914$
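The Step 1 arithmetic can be checked with a few lines of Python. The function and argument names are hypothetical; the formulas and input values are the ones given for nodes 5 and 6.

```python
def new_branch_lengths(d_ij, sum_i, sum_j, L):
    """Return (r_i, r_j, d_ix, d_jx) for joining nodes i and j into x."""
    r_i = sum_i / (L - 2)          # sum_i is the sum of d_ik over all k
    r_j = sum_j / (L - 2)
    d_ix = (d_ij + r_i - r_j) / 2
    d_jx = d_ij - d_ix
    return r_i, r_j, d_ix, d_jx

# Step 1: i = 5, j = 6, L = 9.
r5, r6, d5_10, d6_10 = new_branch_lengths(0.06088, 3.22406, 3.22758, 9)
print(round(r5, 5), round(r6, 6), round(d5_10, 7), round(d6_10, 7))
```

Calling the same function with the sums and L of a later step gives that step's branch lengths.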




Step 2

[Figure: the minimum is marked.]


Step 2 (cont.)

Calculation of the new branches. In this case L = 8 and the new node is x = 11.

$r_i = r_3 = \sum_k d_{3,k}/(L-2) = 2.715455/(8-2) = 0.452576$

$r_j = r_4 = \sum_k d_{4,k}/(L-2) = 2.50096/(8-2) = 0.416827$

$d_{ix} = d_{3,11} = (d_{3,4} + r_3 - r_4)/2 = (0.125 + 0.452576 - 0.416827)/2 = 0.080375$

$d_{jx} = d_{4,11} = d_{3,4} - d_{3,11} = 0.125 - 0.080375 = 0.044625$




Step 3

[Figure: the minimum is marked.]


Step 3 (cont.)

Calculation of the new branches. In this case L = 7 and the new node is x = 12.

$r_i = r_2 = \sum_k d_{2,k}/(L-2) = 2.252265/(7-2) = 0.450453$

$r_j = r_{11} = \sum_k d_{11,k}/(L-2) = 2.108208/(7-2) = 0.4216415$

$d_{ix} = d_{2,12} = (d_{2,11} + r_2 - r_{11})/2 = (0.109705 + 0.450453 - 0.4216415)/2 = 0.069258$

$d_{jx} = d_{11,12} = d_{2,11} - d_{2,12} = 0.109705 - 0.069258 = 0.040447$




Step 7

In this case L = 3 and the new node is x = 16.

$r_{13} = 0.843684$

$r_{15} = 0.728574$

$d_{13,16} = 0.131758$

$d_{15,16} = 0.016648$


Step 7 (cont.)

Because node 9 is the outgroup, the root will be placed between node 9 and the other nodes. The distance between node 9 and the first internal node is 0.563519.


Conclusions

HIV-1 seems to be more closely related to SIV from chimpanzee.

There must have been a cross-species transmission from chimpanzee SIV to human HIV-1.

HIV-2 (H2) is more closely related to SIV (S) from sooty mangabey than to HIV-1 (H1).

This means that HIV-1 and HIV-2 originated independently from two different SIV strains.

There also seems to have been a cross-species transmission from human to MAN/MAC.


Conclusions (cont.)

As one can see, the branch between H2-ROD A and the two SIV taxa has low support: only 56% of the trees have this topology. Therefore the transmission events from human to non-human primates are very uncertain.
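A support value like the 56% above is typically obtained by bootstrapping: alignment columns are resampled with replacement, a tree is rebuilt from each resampled alignment, and the fraction of trees containing a given grouping is reported. The sketch below shows only the resampling and counting parts under that assumption; the function names are hypothetical and the tree-building step is left to whichever method is in use.

```python
import random

def bootstrap_alignment(aligned, rng):
    """Resample alignment columns with replacement, keeping the length."""
    n_cols = len(aligned[0])
    picks = [rng.randrange(n_cols) for _ in range(n_cols)]
    return ["".join(seq[c] for c in picks) for seq in aligned]

def support(clade, replicate_clades):
    """Fraction of replicate trees that contain the clade (a frozenset of taxa)."""
    hits = sum(clade in clades for clades in replicate_clades)
    return hits / len(replicate_clades)

# Toy data: one resampled alignment, and clade sets from two imaginary replicates.
rng = random.Random(0)
replicate = bootstrap_alignment(["ACGT", "ACGA"], rng)
clade = frozenset({"H2", "S"})
print(replicate, support(clade, [{clade}, set()]))
```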


Exercise

In this exercise you will perform a phylogenetic analysis of the human globin sequences. You will compare your results with current knowledge of the globin family, according to the following summary of the globin sequences:

Myoglobin and the hemoglobins diverged from one another before the emergence of worms, about 800 million years ago.

The hemoglobins diverged into two families (the α-family and β-family) following a gene duplication about 450 million years ago, which is before the emergence of mammals.

The α-family diverged into the zeta, theta and alpha genes, and the β-family diverged into the beta, gamma_G, gamma_A, delta and epsilon genes, all following a series of gene duplications.

The most recent duplication was that of gamma_G from gamma_A, which occurred around the separation of the simians (humans, chimps, gorillas, etc.) from the prosimians (such as lemurs and lorises), about 55 million years ago.

(adapted from Graur and Li, 1999)


Exercise (cont.)

  • Reconstruct the phylogenetic tree of the human globins using neighbor joining. Make sure the tree is properly rooted (by defining an outgroup) according to the information in the above summary. Point out where the hemoglobins and myoglobin diverged, and where the α-family and β-family diverged.

  • Which of the following groups are monophyletic according to the tree you obtained?

    • (i) alpha, beta, delta

    • (ii) alpha, theta, zeta

    • (iii) epsilon, beta, delta

  • Bootstrap the tree you built with 1000 bootstrap iterations. Display the tree with the bootstrap values shown. On which branch was the lowest bootstrap value obtained? Explain what this means.

