Multiple alignments and multivariate analysis
Clustal: 1988-2006


PowerPoint Slideshow about 'multiple alignments and multivariate analysis' - Mia_John



Multiple Alignments

Human beta --------VHLTPEEKSAVTALWGKVN--VDEVGGEALGRLLVVYPWTQRFFESFGDLST

Horse beta --------VQLSGEEKAAVLALWDKVN--EEEVGGEALGRLLVVYPWTQRFFDSFGDLSN

Human alpha ---------VLSPADKTNVKAAWGKVGAHAGEYGAEALERMFLSFPTTKTYFPHF-DLS-

Horse alpha ---------VLSAADKTNVKAAWSKVGGHAGEYGAEALERMFLGFPTTKTYFPHF-DLS-

Whale myoglobin ---------VLSEGEWQLVLHVWAKVEADVAGHGQDILIRLFKSHPETLEKFDRFKHLKT

Lamprey globin PIVDTGSVAPLSAAEKTKIRSAWAPVYSTYETSGVDILVKFFTSTPAAQEFFPKFKGLTT

Lupin globin --------GALTESQAALVKSSWEEFNANIPKHTHRFFILVLEIAPAAKDLFSFLKGTSE

*: : : * . : .: * : * : .

Human beta PDAVMGNPKVKAHGKKVLGAFSDGLAHLDN-----LKGTFATLSELHCDKLHVDPENFRL

Horse beta PGAVMGNPKVKAHGKKVLHSFGEGVHHLDN-----LKGTFAALSELHCDKLHVDPENFRL

Human alpha ----HGSAQVKGHGKKVADALTNAVAHVDD-----MPNALSALSDLHAHKLRVDPVNFKL

Horse alpha ----HGSAQVKAHGKKVGDALTLAVGHLDD-----LPGALSNLSDLHAHKLRVDPVNFKL

Whale myoglobin EAEMKASEDLKKHGVTVLTALGAILKKKGH-----HEAELKPLAQSHATKHKIPIKYLEF

Lamprey globin ADQLKKSADVRWHAERIINAVNDAVASMDDT--EKMSMKLRDLSGKHAKSFQVDPQYFKV

Lupin globin VP--QNNPELQAHAGKVFKLVYEAAIQLQVTGVVVTDATLKNLGSVHVSKGVAD-AHFPV

. .:: *. : . : *. * . : .

Human beta LGNVLVCVLAHHFGKEFTPPVQAAYQKVVAGVANALAHKYH------

Horse beta LGNVLVVVLARHFGKDFTPELQASYQKVVAGVANALAHKYH------

Human alpha LSHCLLVTLAAHLPAEFTPAVHASLDKFLASVSTVLTSKYR------

Horse alpha LSHCLLSTLAVHLPNDFTPAVHASLDKFLSSVSTVLTSKYR------

Whale myoglobin ISEAIIHVLHSRHPGDFGADAQGAMNKALELFRKDIAAKYKELGYQG

Lamprey globin LAAVIADTVAAG---D------AGFEKLMSMICILLRSAY-------

Lupin globin VKEAILKTIKEVVGAKWSEELNSAWTIAYDELAIVIKKEMNDAA---

: : .: . .. . :

Phylogenetic Analysis

Secondary Structure Prediction

Homology Detection

Profile Analysis

Homology Modeling


Slide 3

VHLTPEEKSAVTALWGKVN--VDEVGGEALGRLLVVYPWTQRFFESFGDLSTPDAVMGNP

-VLSPADKTNVKAAWGKVGAHAGEYGAEALERMFLSFPTTKTYFPHF-DLS-----HGSA

* * * * * **** * * *** * * * * * *** *

KVKAHGKKVLGAFSDGLAHLDNLKGTFATLSELHCDKLHVDPENFRLLGNVLVCVLAHHF

QVKGHGKKVADALTNAVAHVDDMPNALSALSDLHAHKLRVDPVNFKLLSHCLLVTLAAHL

** ***** * ** * ** ** ** *** ** ** * ** *

GKEFTPPVQAAYQKVVAGVANALAHKYH

PAEFTPAVHASLDKFLASVSTVLTSKYR

**** * * * * * * **

  • Dynamic Programming

    • Needleman and Wunsch, 1970

    • O(L^2) algorithm

  • Maximise score (or minimise distance)

    • Gap penalties

    • Amino acid weight matrix
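A minimal Python sketch of the Needleman and Wunsch (1970) recurrence, assuming a linear gap penalty. The `score_pair` function and the `GAP` value are hypothetical stand-ins for a real amino acid weight matrix (such as a PAM or BLOSUM table) and real gap penalties. The two nested loops fill an (L+1) × (L+1) table, which is where the O(L^2) cost comes from.

```python
def score_pair(a: str, b: str) -> int:
    """Hypothetical stand-in for an amino acid weight matrix: +1 for identity, -1 otherwise."""
    return 1 if a == b else -1


GAP = -2  # hypothetical linear gap penalty, charged per gap position


def needleman_wunsch(x: str, y: str) -> tuple[int, str, str]:
    """Global alignment maximising the total score; returns (score, aligned_x, aligned_y)."""
    n, m = len(x), len(y)
    # dp[i][j] holds the best score for aligning x[:i] with y[:j]
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dp[i][0] = i * GAP
    for j in range(1, m + 1):
        dp[0][j] = j * GAP
    for i in range(1, n + 1):  # (n + 1) x (m + 1) cells: the O(L^2) cost
        for j in range(1, m + 1):
            dp[i][j] = max(
                dp[i - 1][j - 1] + score_pair(x[i - 1], y[j - 1]),  # residue against residue
                dp[i - 1][j] + GAP,  # x[i - 1] against a gap
                dp[i][j - 1] + GAP,  # y[j - 1] against a gap
            )
    # Trace back from dp[n][m] to recover one optimal alignment.
    ax, ay = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and dp[i][j] == dp[i - 1][j - 1] + score_pair(x[i - 1], y[j - 1]):
            ax.append(x[i - 1])
            ay.append(y[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and dp[i][j] == dp[i - 1][j] + GAP:
            ax.append(x[i - 1])
            ay.append("-")
            i -= 1
        else:
            ax.append("-")
            ay.append(y[j - 1])
            j -= 1
    return dp[n][m], "".join(reversed(ax)), "".join(reversed(ay))


print(needleman_wunsch("ACGT", "AGT"))  # (1, 'ACGT', 'A-GT')
```

Practical aligners usually use affine gap penalties (separate opening and extension costs), which need three DP tables instead of one but keep the same O(L^2) shape.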



Weighted Sums of Pairs: WSP
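Under the WSP objective, a multiple alignment is scored by summing the scores of every induced pairwise alignment, each multiplied by a pair weight. The sketch below assumes toy residue scores and a linear gap cost; `weighted_sum_of_pairs` and the example weights are illustrative, not taken from any particular program.

```python
from itertools import combinations


def pair_score(a: str, b: str) -> int:
    """Hypothetical column score: +1 identity, -1 mismatch, -2 residue against gap."""
    if a == "-" and b == "-":
        return 0  # gap-gap pairs drop out of the induced pairwise alignment
    if a == "-" or b == "-":
        return -2  # linear gap cost, for simplicity
    return 1 if a == b else -1


def weighted_sum_of_pairs(rows: list[str], weights: dict[tuple[int, int], float]) -> float:
    """WSP = sum over sequence pairs i < j of w_ij times the score of their induced pairwise alignment."""
    if len({len(r) for r in rows}) != 1:
        raise ValueError("aligned rows must all have the same length")
    total = 0.0
    for i, j in combinations(range(len(rows)), 2):
        w_ij = weights.get((i, j), 1.0)  # pairs without an explicit weight count once
        total += w_ij * sum(pair_score(a, b) for a, b in zip(rows[i], rows[j]))
    return total


print(weighted_sum_of_pairs(["AC-T", "ACGT", "A-GT"], {(0, 1): 2.0}))  # 1.0
```

With affine gap penalties the gap cost of each induced pairwise alignment is harder to compute, which is one reason exact WSP optimisation is so expensive.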

Time O(L^N) for N sequences of length L

Sequences   Time
2           1 second
3           150 seconds
4           6.25 hours
5           39 days
6           16 years
7           2404 years
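The timing table follows a simple model: two sequences take 1 second and each additional sequence multiplies the dynamic programming work by roughly L = 150, so time ≈ 150^(N-2) seconds. The sketch below reproduces the figures under that assumption (the L = 150 and the 1-second baseline are inferred from the table, not stated on the slide; the function names are hypothetical).

```python
SECONDS_PER_HOUR = 3_600
SECONDS_PER_DAY = 86_400
SECONDS_PER_YEAR = 365.25 * SECONDS_PER_DAY


def exact_msa_seconds(n_sequences: int, length: int = 150, seconds_for_two: float = 1.0) -> float:
    """Assumed cost model: each extra sequence multiplies the DP work by L (time ~ L**N)."""
    return seconds_for_two * length ** (n_sequences - 2)


def readable(seconds: float) -> str:
    """Express a duration in the largest convenient unit."""
    if seconds < SECONDS_PER_HOUR:
        return f"{seconds:.0f} s"
    if seconds < SECONDS_PER_DAY:
        return f"{seconds / SECONDS_PER_HOUR:.2f} h"
    if seconds < SECONDS_PER_YEAR:
        return f"{seconds / SECONDS_PER_DAY:.0f} days"
    return f"{seconds / SECONDS_PER_YEAR:.0f} years"


for n in range(2, 8):
    print(n, readable(exact_msa_seconds(n)))
```

For seven sequences the model gives about 2406 years, within rounding of the 2404 years on the slide.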


Slide 6

[Guide tree: Horse beta, Human beta, Horse alpha, Human alpha, Whale myoglobin, Lamprey cyanohaemoglobin, Lupin leghaemoglobin]


  • Progressive Alignment:

    • Feng and Doolittle, 1987

    • Barton and Sternberg, 1987

    • Willie Taylor, 1987, 1988

    • Hogeweg and Hesper, 1984
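Progressive alignment starts from a guide tree built from all pairwise distances (for example, one minus the fractional identity of each pairwise alignment) and then aligns sequences and profiles in the order the tree joins them, closest pairs first. A minimal sketch of the tree-building step using UPGMA average linkage; UPGMA is chosen here only for brevity and is not a claim about what the programs cited above use, and the function and toy distances are hypothetical.

```python
def upgma_guide_tree(names: list[str], dist: list[list[float]]) -> str:
    """Average-linkage (UPGMA) clustering; returns a Newick string without branch lengths."""
    clusters = {i: (names[i], 1) for i in range(len(names))}  # cluster id -> (newick, size)
    d = {(i, j): dist[i][j] for i in range(len(names)) for j in range(i + 1, len(names))}
    next_id = len(names)
    while len(clusters) > 1:
        a, b = min(d, key=d.get)  # closest pair of clusters (keys are stored with a < b)
        (na, sa), (nb, sb) = clusters.pop(a), clusters.pop(b)
        del d[(a, b)]
        for c in clusters:  # size-weighted average distance to the merged cluster
            dac = d.pop((min(a, c), max(a, c)))
            dbc = d.pop((min(b, c), max(b, c)))
            d[(c, next_id)] = (sa * dac + sb * dbc) / (sa + sb)
        clusters[next_id] = (f"({na},{nb})", sa + sb)
        next_id += 1
    return next(iter(clusters.values()))[0] + ";"


toy_dist = [[0, 2, 6, 10], [2, 0, 6, 10], [6, 6, 0, 10], [10, 10, 10, 0]]
print(upgma_guide_tree(["A", "B", "C", "D"], toy_dist))  # (D,(C,(A,B)));
```

Read from the inside out, the nesting gives the merge order: A and B are aligned first, C is then aligned to that profile, and D is added last.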


Clustal

  • 35000 citations

  • Clustal1-Clustal4 1988

    • Paul Sharp, Dublin

  • Clustal V 1992

    • EMBL, Heidelberg

      • Rainer Fuchs

      • Alan Bleasby

  • Clustal W 1994-2006, Clustal X 1997-2006

    • Toby Gibson, EMBL, Heidelberg

    • Julie Thompson, ICGEB, Strasbourg

  • Clustal W and Clustal X 2.0 early 2007

    • University College Dublin


Since 1994?

Benchmarks

Protein structure alignments and superpositions

  • Barton and Sternberg; Fitch and McLure

  • Dali

  • BaliBase

  • Homstrad

  • Oxbench

  • Prefab etc. etc.

  • Protein structure analysis

    • APDB: O'Sullivan O, Zehnder M, Higgins D, Bucher P, Grosdidier A, Notredame C. (2003) APDB: a novel measure for benchmarking sequence alignment methods without reference alignments. Bioinformatics 19 Suppl 1:i215-21.

  • RNA alignments

    • Bralibase (Gardner PP, Wilm A & Washietl S (2005) NAR.)


Which Method is Best?

  • Clustal W????

  • MSA (Lipman, Altschul, Kececioglu)

    • DCA (Stoye), PRRP (Gotoh), SAGA (Notredame)

  • T-Coffee (Notredame)

    • 3D-Coffee, M-Coffee

  • MAFFT (Katoh) and MUSCLE (Edgar)

  • Probcons (Do, Brudno, Batzoglu)

For Global Protein alignments!!!


Clustal W and X 2.0?

  • Jan 2007

  • Re-engineered in C++

  • Aim to increase accuracy

    • Iteration (Wallace, I. M., O'Sullivan, O. and Higgins, D. G. (2005) Evaluation of iterative alignment algorithms for multiple alignment. Bioinformatics 21:1408.)

  • Reduce run times




ADE-4

http://pbil.univ-lyon1.fr/ADE-4/

Thioulouse J., Chessel D., Dolédec S., & Olivier J.M. (1997) ADE-4: a multivariate analysis and graphical display software. Statistics and Computing, 7, 1, 75-83.



Between Group Analysis BGA

Dolédec, S. & Chessel, D. (1987) Acta Oecologica, Oecologica Generalis, 8, 3, 403-426.

Supervised Correspondence Analysis or PCA

  • MADE4

    • Culhane, A., Thioulouse, J., Perriere, G., Higgins, D.G. (2005) MADE4: an R package for multivariate analysis of gene expression data. Bioinformatics 21(11):2789-2790.

  • Co-Inertia Analysis (CIA)

    • Dolédec, S. & Chessel, D. (1994) Freshwater Biology, 31, 277-294.

    • Thioulouse, J. & Lobry, J.R. (1995) CABIOS, 11, 321-329.

    • 2 datasets; simultaneous CA or PCA
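In outline, BGA ordinates the group centroids rather than the individual samples and then projects every sample onto the resulting axes. The pure-Python sketch below does this with PCA for a single axis, weighting each centroid by its group size (an assumption) and using power iteration instead of a full eigensolver. It is an illustration only; the function names are hypothetical, and real analyses would use the R packages cited above (for example made4's `bga()`).

```python
import math
from collections import defaultdict


def column_means(X: list[list[float]]) -> list[float]:
    return [sum(col) / len(X) for col in zip(*X)]


def first_between_group_axis(X: list[list[float]], groups: list[str], iters: int = 200) -> list[float]:
    """Leading eigenvector of the size-weighted covariance of the (centred) group centroids."""
    n, p = len(X), len(X[0])
    mean = column_means(X)
    members = defaultdict(list)
    for row, g in zip(X, groups):
        members[g].append(row)
    B = [[0.0] * p for _ in range(p)]  # between-group covariance matrix
    for rows in members.values():
        c = [m_k - mu_k for m_k, mu_k in zip(column_means(rows), mean)]
        for a in range(p):
            for b in range(p):
                B[a][b] += len(rows) / n * c[a] * c[b]
    v = [1.0 / (k + 1) for k in range(p)]  # arbitrary start vector for power iteration
    for _ in range(iters):
        w = [sum(B[a][b] * v[b] for b in range(p)) for a in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break  # no between-group variation (or an unlucky start vector)
        v = [x / norm for x in w]
    return v


def bga_scores(X: list[list[float]], groups: list[str]) -> list[float]:
    """Project every individual sample, not just the centroids, onto the first BGA axis."""
    v = first_between_group_axis(X, groups)
    mean = column_means(X)
    return [sum((x - mu) * vk for x, mu, vk in zip(row, mean, v)) for row in X]


X = [[0.0, 0.0], [0.0, 1.0], [4.0, 0.0], [4.0, 1.0]]
print(bga_scores(X, ["a", "a", "b", "b"]))  # [-2.0, -2.0, 2.0, 2.0]
```

The second variable varies only within groups, so the between-group axis ignores it entirely; that is the supervised behaviour BGA is meant to provide.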


Use CA, PCA for Sequences?

PCOORD on sequence distances:

Higgins, D.G. (1992) Sequence ordinations: a multivariate analysis approach to analysing large sequence data sets. CABIOS, 8, 15-22.

PCA on dipeptide composition:

Van Heel, M. (1991) A new family of powerful multivariate statistical sequence analysis techniques. J. Mol. Biol. 220(4):877-887.

PCA on alignment columns:

Casari G, Sander C, Valencia A. (1995) A method to predict functional residues in proteins. Nat. Struct. Biol. 2(2):171-8.
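For the dipeptide-composition approach, each sequence is first turned into a fixed-length vector that PCA can work on. A minimal sketch of that encoding step, assuming overlapping dipeptides counted over the 20 standard amino acids and normalised to relative frequencies; the names are hypothetical and the cited papers may encode sequences differently.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
DIPEPTIDES = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]  # 400 features


def dipeptide_composition(seq: str) -> list[float]:
    """Relative frequency of each of the 400 dipeptides (overlapping windows); non-standard residues skipped."""
    counts = dict.fromkeys(DIPEPTIDES, 0)
    for i in range(len(seq) - 1):
        pair = seq[i:i + 2]
        if pair in counts:
            counts[pair] += 1
    total = sum(counts.values()) or 1  # avoid dividing by zero for very short sequences
    return [counts[d] / total for d in DIPEPTIDES]


v = dipeptide_composition("VHLTPEEK")
print(sum(1 for x in v if x > 0))  # 7
```

Stacking one such vector per sequence gives an N × 400 matrix, which is the input PCA would then ordinate.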


Supervised PCA or CA?

Malate Dehydrogenases

Lactate Dehydrogenases


Between Group Analysis

[Figure: samples × genes data matrix; GSVD]


Trypsin-like serine proteases

[Figure: 15 chymotrypsins, 10 elastases, 31 trypsins]




BGA With CA or PCA?

  • CA:

    • Pretty pictures

    • Sequences/residues plots

    • Finds any clear/simple patterns

      • Binary aa variables

  • PCA:

    • Use continuous variables

      • e.g. aa properties: size, charge, hydrophobicity etc.
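The two codings can be sketched for a single alignment column. Binary coding gives CA one presence/absence indicator per amino acid; property coding gives PCA continuous values. The sketch uses only two properties, the Kyte-Doolittle hydropathy index and a crude ±1 charge, where the slides use five; coding gaps as 0.0 is an assumption, and the function names are hypothetical.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Kyte-Doolittle hydropathy index.
HYDROPATHY = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5, "E": -3.5,
    "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8,
    "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}
# Crude charge at neutral pH (histidine treated as neutral): a simplification.
CHARGE = {"D": -1.0, "E": -1.0, "K": 1.0, "R": 1.0}


def binary_coding(column: str) -> list[list[int]]:
    """CA-style coding: one 20-way presence/absence indicator per sequence (a gap gives all zeros)."""
    return [[1 if c == a else 0 for a in AMINO_ACIDS] for c in column]


def property_coding(column: str) -> list[list[float]]:
    """PCA-style coding: continuous (hydropathy, charge) values per sequence; gaps coded as 0.0 here."""
    return [[HYDROPATHY.get(c, 0.0), CHARGE.get(c, 0.0)] for c in column]


print(property_coding("KD-"))  # [[-3.9, 1.0], [-3.5, -1.0], [0.0, 0.0]]
```

Binary coding keeps every residue distinct, which suits CA's search for simple patterns; property coding lets chemically similar substitutions (say I for V) sit close together, which suits PCA.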


BGA with PCA using 5 amino acid properties (A-E)

[Figure: sequence plot (15 chymotrypsins, 10 elastases, 31 trypsins) and residue weights]


BGA on Alignments

  • Focus on any split in the data

  • Binary or Property coding

    • CA or PCA

  • Sequence Weighting

  • Pseudocounts
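Sequence weighting and pseudocounts both adjust the residue counts taken from each alignment column. As one concrete possibility (not necessarily the scheme used in this work), the sketch below uses Henikoff and Henikoff's (1994) position-based sequence weights and spreads a pseudocount uniformly over the 20 amino acids; `pseudocount` is a hypothetical tuning parameter.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def henikoff_weights(rows: list[str]) -> list[float]:
    """Position-based sequence weights, normalised to sum to 1; redundant sequences share weight."""
    weights = [0.0] * len(rows)
    for col in zip(*rows):
        counts = Counter(c for c in col if c != "-")
        r = len(counts)  # number of distinct residue types in this column
        for i, c in enumerate(col):
            if c != "-":
                weights[i] += 1.0 / (r * counts[c])
    total = sum(weights) or 1.0
    return [w / total for w in weights]


def column_profile(column: str, weights: list[float], pseudocount: float = 1.0) -> dict[str, float]:
    """Weighted residue frequencies in one column, smoothed by a pseudocount spread evenly over 20 amino acids."""
    mass = {a: 0.0 for a in AMINO_ACIDS}
    for c, w in zip(column, weights):
        if c in mass:
            mass[c] += w
    total = sum(mass.values()) + pseudocount
    return {a: (mass[a] + pseudocount / len(AMINO_ACIDS)) / total for a in AMINO_ACIDS}


print(henikoff_weights(["AA", "AA", "CC"]))  # [0.25, 0.25, 0.5]
```

The two identical sequences share the weight that the single distinct sequence gets on its own, and the pseudocount keeps unseen amino acids from having zero frequency.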


Slide 29

Iteration

Benchmarking

Clustal W 2.0

Gordon Blackshields

Mark Larkin

Paul McGettigan

Iain Wallace

Clustal

Toby Gibson, EMBL

Julie Thompson, ICGEB, Strasbourg

BGA, CIA, MADE4

Aedín Culhane

Guy Perriere

Jean Thioulouse

Ian Jeffery

Ailís Fagan


Slide 31

SeqA GARFIELD THE LAST FAT CAT

SeqB GARFIELD THE FAST CAT

SeqC GARFIELD THE VERY FAST CAT

SeqD THE FAT CAT

SeqA GARFIELD THE LAST FA-T CAT
SeqB GARFIELD THE FAST CA-T ---
SeqC GARFIELD THE VERY FAST CAT
SeqD -------- THE ---- FA-T CAT


Weighted Sums of Pairs

MSA: branch and bound (Lipman, Altschul and Kececioglu, 1989)

FastMSA: tweaked MSA (Gupta, Kececioglu and Schaeffer, 1995)

DCA: divide and conquer (Stoye, Moulton and Dress, 1997)

SAGA: genetic algorithm (Notredame and Higgins, 1996)

PRRP: iteration (Gotoh, 1996)


Genetic Algorithm

Selection (WSP)

Mutation

Recombination (cross-overs)
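A toy sketch of the selection / mutation / recombination loop, written in the spirit of SAGA but not reproducing it: alignments are lists of gapped strings, fitness is an unweighted sum-of-pairs score with made-up residue scores, mutation moves a single gap, and recombination is a one-point crossover that cuts the first parent at a column and takes the remaining residues as laid out in the second parent. All names and parameters are hypothetical.

```python
import random
from itertools import combinations


def pair_score(a: str, b: str) -> int:
    """Toy scores (an assumption): +1 identity, -1 mismatch, -2 residue vs gap, 0 gap vs gap."""
    if a == "-" and b == "-":
        return 0
    if a == "-" or b == "-":
        return -2
    return 1 if a == b else -1


def sp_score(aln: list[str]) -> int:
    """Unweighted sum-of-pairs fitness."""
    return sum(pair_score(a, b) for r, s in combinations(aln, 2) for a, b in zip(r, s))


def drop_empty_columns(aln: list[str]) -> list[str]:
    keep = [k for k, col in enumerate(zip(*aln)) if any(c != "-" for c in col)]
    return ["".join(row[k] for k in keep) for row in aln]


def random_alignment(seqs: list[str], rng: random.Random, extra: int = 2) -> list[str]:
    """Pad every sequence to a common width by inserting gaps at random positions."""
    width = max(map(len, seqs)) + extra
    aln = []
    for s in seqs:
        row = list(s)
        for _ in range(width - len(s)):
            row.insert(rng.randrange(len(row) + 1), "-")
        aln.append("".join(row))
    return drop_empty_columns(aln)


def mutate(aln: list[str], rng: random.Random) -> list[str]:
    """Move one randomly chosen gap to a new random position in the same row."""
    rows_with_gaps = [i for i, r in enumerate(aln) if "-" in r]
    if not rows_with_gaps:
        return list(aln)
    i = rng.choice(rows_with_gaps)
    row = list(aln[i])
    row.pop(rng.choice([k for k, c in enumerate(row) if c == "-"]))
    row.insert(rng.randrange(len(row) + 1), "-")
    return drop_empty_columns(aln[:i] + ["".join(row)] + aln[i + 1:])


def crossover(p1: list[str], p2: list[str], rng: random.Random) -> list[str]:
    """One-point crossover that keeps every row's residues in their original order."""
    cut = rng.randrange(len(p1[0]) + 1)
    left = [row[:cut] for row in p1]
    right = []
    for lpart, row2 in zip(left, p2):
        k = sum(c != "-" for c in lpart)  # residues already used on the left
        pos = seen = 0
        while seen < k:  # skip past the first k residues of this row in p2
            if row2[pos] != "-":
                seen += 1
            pos += 1
        right.append(row2[pos:])
    width = max(map(len, right))
    return drop_empty_columns([lp + "-" * (width - len(r)) + r for lp, r in zip(left, right)])


def evolve(seqs: list[str], generations: int = 200, pop_size: int = 20, seed: int = 0) -> list[str]:
    rng = random.Random(seed)
    pop = [random_alignment(seqs, rng) for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=sp_score, reverse=True)
        parents = pop[: pop_size // 2]  # selection: the fitter half survives unchanged
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            children.append(mutate(crossover(a, b, rng), rng))
        pop = parents + children
    return max(pop, key=sp_score)


print(evolve(["GARFIELDTHELASTFATCAT", "GARFIELDTHEFASTCAT", "THEFATCAT"]))
```

Because the fitter half is carried over unchanged each generation, the best score in the population can never decrease, although nothing guarantees it reaches the global optimum.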


SAGA

  • Cedric Notredame

  • Sequence Alignment by Genetic Algorithm

  • Optimise any objective function

  • Notredame, C. and Higgins, D.G. (1996) SAGA: Sequence alignment by genetic algorithm. Nucleic Acids Research, 24:1515-1524.




Which method is best?

  • Best score?

  • Empirical tests?

    • Sets of test cases

      • Fitch and McLure

      • BaliBase

      • Homstrad

      • Oxbench

      • Prefab etc. etc.

    • APDB: O'Sullivan O, Zehnder M, Higgins D, Bucher P, Grosdidier A, Notredame C. (2003) APDB: a novel measure for benchmarking sequence alignment methods without reference alignments. Bioinformatics 19 Suppl 1:i215-21.


COFFEE

  • Consistency based Objective Function For Evaluation of Ehhhh things

  • Maximum Weight Trace (John Kececioglu)

  • Maximise similarity to a LIBRARY of residue pairs

  • Notredame, C., Holm, L. and Higgins, D.G. (1998) COFFEE: An objective function for multiple sequence alignments. Bioinformatics 14: 407-422.
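The idea of maximising agreement with a library of weighted residue pairs can be sketched as a scoring function. This is a simplified, hypothetical version (the fraction of total library weight that the alignment reproduces), not the exact normalisation used in the COFFEE paper; residues are numbered from 0 within each ungapped sequence.

```python
from itertools import combinations

# Library key: (seq_a, residue_a, seq_b, residue_b) with seq_a < seq_b.
LibraryKey = tuple[int, int, int, int]


def aligned_residue_pairs(rows: list[str]) -> set[LibraryKey]:
    """Every pair of residues that the multiple alignment places in the same column."""
    index = []  # index[s][col] = residue number of sequence s in that column, or -1 for a gap
    for row in rows:
        k, idx = 0, []
        for c in row:
            if c == "-":
                idx.append(-1)
            else:
                idx.append(k)
                k += 1
        index.append(idx)
    pairs = set()
    for a, b in combinations(range(len(rows)), 2):
        for ia, ib in zip(index[a], index[b]):
            if ia >= 0 and ib >= 0:
                pairs.add((a, ia, b, ib))
    return pairs


def consistency_score(rows: list[str], library: dict[LibraryKey, float]) -> float:
    """Fraction of the library's total weight that the alignment reproduces (1.0 = fully consistent)."""
    total = sum(library.values())
    if total == 0:
        return 0.0
    return sum(library.get(p, 0.0) for p in aligned_residue_pairs(rows)) / total


library = {(0, 0, 1, 0): 2.0, (0, 1, 1, 1): 1.0}
print(consistency_score(["AC-", "A-C"], library))  # 0.6666666666666666
```

Unlike WSP, nothing here depends on a substitution matrix: the library alone says which residues belong together, which is what lets heterogeneous sources be mixed.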


Pairs of Residues

e.g. Seq N, Residue I paired with Seq M, Residue J; Weight = w

Human beta  VHLTPEEKSAVTALWGKVN--VDEVGGEAL
Horse beta  VQLSGEEKAAVLALWDKVN--EEEVGGEAL
Human alpha -VLSPADKTNVKAAWGKVGAHAGEYGAEAL
Horse alpha -VLSAADKTNVKAAWSKVGGHAGEYGAEAL




T-Coffee

  • Heuristic approximation to COFFEE

    • Uses progressive alignment (Trees)

  • Heterogeneous data

    • Sequences

    • Structures

    • Genomes

    • ESTs

  • Notredame, C, Higgins, DG and Heringa, J. (2000) T-Coffee: A novel method for fast and accurate multiple sequence alignment. J. Mol. Biol., 302: 205-217.


T-Coffee

  • Mixed data sources

    • Primary library from

      • Lalign (SIM):

        • 10 best local alignments

      • ClustalW

        • All pairwise alignments

      • SAP (Willie Taylor, Structure Superposition)

      • Multiple alignments

  • Check library for CONSISTENCY

    • Upweight pairs of residues that agree with other pairs

Default
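The consistency check can be sketched as a library extension: a residue pair (x, y) gains extra weight for every residue z in a third sequence that is paired with both, by the smaller of the two weights. This sketch assumes each library pair appears once and always joins two different sequences; how the primary weights are derived (for example from percent identity) is left out, and the names are hypothetical.

```python
from collections import defaultdict

Residue = tuple[int, int]  # (sequence index, residue index)
Pair = tuple[Residue, Residue]


def extend_library(primary: dict[Pair, float]) -> dict[Pair, float]:
    """w'(x, y) = w(x, y) + sum over intermediate residues z of min(w(x, z), w(z, y))."""
    partners: dict[Residue, dict[Residue, float]] = defaultdict(dict)
    extended: dict[Pair, float] = defaultdict(float)
    for (x, y), w in primary.items():
        partners[x][y] = w
        partners[y][x] = w
        extended[(min(x, y), max(x, y))] += w  # direct support from the primary library
    for z, links in partners.items():  # z acts as the intermediate residue
        items = list(links.items())
        for i, (x, wx) in enumerate(items):
            for y, wy in items[i + 1:]:
                # library pairs never join residues of z's own sequence, so only
                # x and y themselves need to come from different sequences
                if x[0] != y[0]:
                    extended[(min(x, y), max(x, y))] += min(wx, wy)
    return dict(extended)


a, b, c = (0, 0), (1, 0), (2, 0)
print(extend_library({(a, b): 1.0, (a, c): 0.8, (c, b): 0.6}))
```

Pairs that are confirmed through other sequences end up heavier than pairs supported by a single pairwise alignment alone, which is the upweighting the slide describes.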


Mixing Heterogeneous Information

[Figure: local alignment, global alignment, multiple alignment, specialist and structural sources are combined by T-Coffee into a multiple sequence alignment]

Copyright Cédric Notredame, 2000, all rights reserved


Mixing Heterogeneous Information

e.g. SAP (Taylor and Orengo): structure superposition gives weighted residue pairs




Including Structures in an Alignment

3D-Coffee

[Figure: bar chart of % accuracy (axis 0 to 80) for clustalw, T_Coffee Default and T_Coffee plus all structures; the values shown are 35.24, 38.39 and 66.49]

O'Sullivan, O., Suhre, K., Abergel, C., Higgins, DG and Notredame, C (2004) J. Mol. Biol.


Recent Developments

  • 20-30 new programs in past 2 years

  • MUSCLE

    • Bob Edgar, ISMB, 2004

    • Iteration/progressive alignment

      • FAST

      • Big Alignments

  • PROBCONS

    • Tom Do, Michael Brudno, Serafim Batzoglou

    • ISMB 2004

    • “P-Coffee”

      • VERY accurate


Iteration Revisited

--------VHLTPEEKSAVTALWGKVN--VDEVGGEALGRLLVVYPWTQRFFESFGDLST

--------VQLSGEEKAAVLALWDKVN--EEEVGGEALGRLLVVYPWTQRFFDSFGDLSN

---------VLSPADKTNVKAAWGKVGAHAGEYGAEALERMFLSFPTTKTYFPHF-DLS-

---------VLSAADKTNVKAAWSKVGGHAGEYGAEALERMFLGFPTTKTYFPHF-DLS-

---------VLSEGEWQLVLHVWAKVEADVAGHGQDILIRLFKSHPETLEKFDRFKHLKT

PIVDTGSVAPLSAAEKTKIRSAWAPVYSTYETSGVDILVKFFTSTPAAQEFFPKFKGLTT

--------GALTESQAALVKSSWEEFNANIPKHTHRFFILVLEIAPAAKDLFSFLKGTSE


Iteration Revisited

--------VHLTPEEKSAVTALWGKVN--VDEVGGEALGRLLVVYPWTQRFFESFGDLST

--------VQLSGEEKAAVLALWDKVN--EEEVGGEALGRLLVVYPWTQRFFDSFGDLSN

---------VLSPADKTNVKAAWGKVGAHAGEYGAEALERMFLSFPTTKTYFPHF-DLS-

---------VLSAADKTNVKAAWSKVGGHAGEYGAEALERMFLGFPTTKTYFPHF-DLS-

---------VLSEGEWQLVLHVWAKVEADVAGHGQDILIRLFKSHPETLEKFDRFKHLKT

PIVDTGSVAPLSAAEKTKIRSAWAPVYSTYETSGVDILVKFFTSTPAAQEFFPKFKGLTT

--------GALTESQAALVKSSWEEFNANIPKHTHRFFILVLEIAPAAKDLFSFLKGTSE

--------VHLTPEEKSAVTALWGKVN--VDEVGGEALGRLLVVYPWTQRFFESFGDLST

--------VQLSGEEKAAVLALWDKVN--EEEVGGEALGRLLVVYPWTQRFFDSFGDLSN

---------VLSPADKTNVKAAWGKVGAHAGEYGAEALERMFLSFPTTKTYFPHF-DLS-

---------VLSEGEWQLVLHVWAKVEADVAGHGQDILIRLFKSHPETLEKFDRFKHLKT

PIVDTGSVAPLSAAEKTKIRSAWAPVYSTYETSGVDILVKFFTSTPAAQEFFPKFKGLTT

--------GALTESQAALVKSSWEEFNANIPKHTHRFFILVLEIAPAAKDLFSFLKGTSE

---------VLSAADKTNVKAAWSKVGGHAGEYGAEALERMFLGFPTTKTYFPHF-DLS-


Iterate:

  • Remove EACH sequence (RF)

  • Remove BEST sequence (RB)

  • Random

  • Tree based (Tree)
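The "remove each sequence" (RF) strategy can be sketched as: take each sequence out in turn, realign it optimally to a fixed profile of the rest, and keep the result only if the overall score improves, repeating until nothing changes. The sketch assumes an unweighted sum-of-pairs score with toy residue scores and a linear gap cost, under which realigning one sequence against the fixed profile can be solved exactly by dynamic programming; all names are hypothetical.

```python
from itertools import combinations


def pair_score(a: str, b: str) -> int:
    """Toy scores (an assumption): +1 identity, -1 mismatch, -2 residue vs gap, 0 gap vs gap."""
    if a == "-" and b == "-":
        return 0
    if a == "-" or b == "-":
        return -2
    return 1 if a == b else -1


def sp_score(aln: list[str]) -> int:
    return sum(pair_score(a, b) for r, s in combinations(aln, 2) for a, b in zip(r, s))


def drop_empty_columns(aln: list[str]) -> list[str]:
    keep = [k for k, col in enumerate(zip(*aln)) if any(c != "-" for c in col)]
    return ["".join(row[k] for k in keep) for row in aln]


def align_to_profile(seq: str, profile: list[str]) -> list[str]:
    """Insert seq into a fixed profile, maximising SP score; profile columns are never split or reordered."""
    cols = list(zip(*profile))
    n, m, gap_col = len(seq), len(cols), "-" * len(profile)

    def col_score(c: str, col) -> int:
        return sum(pair_score(c, x) for x in col)

    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dp[i][0] = dp[i - 1][0] + col_score(seq[i - 1], gap_col)
    for j in range(1, m + 1):
        dp[0][j] = dp[0][j - 1] + col_score("-", cols[j - 1])
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            dp[i][j] = max(
                dp[i - 1][j - 1] + col_score(seq[i - 1], cols[j - 1]),  # residue into existing column
                dp[i][j - 1] + col_score("-", cols[j - 1]),  # gap opposite an existing column
                dp[i - 1][j] + col_score(seq[i - 1], gap_col),  # residue in a new all-gap column
            )
    out_cols, i, j = [], n, m  # traceback, collecting (profile column, new character)
    while i > 0 or j > 0:
        if i > 0 and j > 0 and dp[i][j] == dp[i - 1][j - 1] + col_score(seq[i - 1], cols[j - 1]):
            out_cols.append((cols[j - 1], seq[i - 1]))
            i, j = i - 1, j - 1
        elif j > 0 and dp[i][j] == dp[i][j - 1] + col_score("-", cols[j - 1]):
            out_cols.append((cols[j - 1], "-"))
            j -= 1
        else:
            out_cols.append((gap_col, seq[i - 1]))
            i -= 1
    out_cols.reverse()
    rows = ["".join(col[r] for col, _ in out_cols) for r in range(len(profile))]
    return rows + ["".join(c for _, c in out_cols)]


def refine_remove_each(aln: list[str], max_rounds: int = 10) -> list[str]:
    """RF iteration: remove each sequence in turn, realign it to the rest, keep only improvements."""
    best, best_score = list(aln), sp_score(aln)
    for _ in range(max_rounds):
        improved = False
        for i in range(len(best)):
            rest = drop_empty_columns(best[:i] + best[i + 1:])
            realigned = align_to_profile(best[i].replace("-", ""), rest)
            candidate = realigned[:i] + [realigned[-1]] + realigned[i:-1]  # restore row order
            score = sp_score(candidate)
            if score > best_score:
                best, best_score, improved = candidate, score, True
        if not improved:
            break
    return best


print(refine_remove_each(["ACGT", "AGT-"]))  # ['ACGT', 'A-GT']
```

Only strict improvements are accepted, so each round either raises the score or stops; the RB, random and tree-based variants differ only in which sequences or groups are taken out and in what order.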


Iteration on HomStrad 184

Wallace, O'Sullivan and Higgins, 2005, Bioinformatics, 21:1408


Combining Multiple Alignment Methods

[Figure: Clustal W, T-Coffee, Probcons, Muscle and specialist methods are combined by T-Coffee into a multiple sequence alignment]



The Wisdom of Crowds (James Surowiecki)

Crowds are surprisingly good at making accurate decisions

Better than “experts”

Only if they do not form a “mob”
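The crowd argument applied to alignment: let several methods "vote" on which residue pairs belong together, and trust the pairs that most of them agree on. A minimal sketch that turns alternative alignments of the same sequences into a weighted library, each pair weighted by the fraction of alignments that contain it; the resulting library could then drive a consistency-based aligner, and all names are hypothetical.

```python
from collections import Counter
from itertools import combinations


def residue_pairs(rows: list[str]) -> set[tuple[int, int, int, int]]:
    """All (seq_a, res_a, seq_b, res_b) residue pairs placed in the same column."""
    index = []
    for row in rows:
        k, idx = 0, []
        for c in row:
            if c == "-":
                idx.append(-1)
            else:
                idx.append(k)
                k += 1
        index.append(idx)
    pairs = set()
    for a, b in combinations(range(len(rows)), 2):
        for ia, ib in zip(index[a], index[b]):
            if ia >= 0 and ib >= 0:
                pairs.add((a, ia, b, ib))
    return pairs


def consensus_library(alignments: list[list[str]]) -> dict[tuple[int, int, int, int], float]:
    """Weight each residue pair by the fraction of input alignments ("votes") that align it."""
    votes = Counter(p for aln in alignments for p in residue_pairs(aln))
    return {p: v / len(alignments) for p, v in votes.items()}


alns = [["ACGT", "A-GT"], ["ACGT", "A-GT"], ["ACGT", "AG-T"]]
print(consensus_library(alns))
```

The "mob" caveat shows up here too: if the input methods share the same biases, their votes are not independent and agreement overstates reliability.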



BaliBASE

Thompson, JD, Plewniak, F. and Poch, O. (1999) NAR and Bioinformatics

  • ICGEB Strasbourg

  • 141 manual alignments using structures

    • 5 sections

    • core alignment regions marked

1. Equidistant (82)

2. Orphan (23)

3. Two groups (12)

4. Long internal gaps (13)

5. Long terminal gaps (11)


Compare Methods

  • SAM: HMM (Hughey and Krogh, 1996)

  • Dialign: local multiple alignments (Morgenstern, 1999)

  • ClustalW: progressive alignment (Thompson, Higgins and Gibson, 1994)

  • Prrp: iterative WSP (Gotoh, 1996)

  • T-Coffee: pairwise library (Notredame, Higgins and Heringa, 2000)


% alignment columns correct

Core alignment blocks only
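The "% alignment columns correct" measure can be sketched as a column score: the percentage of reference columns (optionally only those in the core blocks) that appear, residue for residue, in the test alignment. The `core` argument and the treatment of gapped columns are assumptions; the BaliBASE scoring program has its own conventions.

```python
from typing import Optional


def column_signatures(rows: list[str]) -> list[tuple]:
    """Describe each column by the residue index each sequence contributes (None for a gap)."""
    counters = [0] * len(rows)
    sigs = []
    for col in zip(*rows):
        sig = []
        for s, c in enumerate(col):
            if c == "-":
                sig.append(None)
            else:
                sig.append(counters[s])
                counters[s] += 1
        sigs.append(tuple(sig))
    return sigs


def column_score(test: list[str], reference: list[str], core: Optional[list[int]] = None) -> float:
    """Percentage of reference columns (or only the `core` column indices) reproduced exactly in `test`."""
    ref_cols = column_signatures(reference)
    if core is not None:
        ref_cols = [ref_cols[k] for k in core]
    if not ref_cols:
        return 0.0
    test_cols = set(column_signatures(test))
    return 100.0 * sum(c in test_cols for c in ref_cols) / len(ref_cols)


print(column_score(["ACGT", "AG-T"], ["ACGT", "A-GT"]))  # 50.0
```

Restricting the score to core blocks avoids penalising methods in regions where even the structure-based reference alignment is uncertain.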


Clustal

  • Clustal, Clustal1-4 TCD

    • Higgins DG, Sharp PM. (1988) CLUSTAL: a package for performing multiple sequence alignment on a microcomputer. Gene. 73(1):237-44.

    • Higgins DG, Sharp PM. (1989) Fast and sensitive multiple sequence alignments on a microcomputer. Comput Appl Biosci. 5(2):151-3.

  • ClustalV Heidelberg

    • Higgins DG, Bleasby AJ, Fuchs R. (1992) CLUSTAL V: improved software for multiple sequence alignment. Comput Appl Biosci. 8(2):189-91.

  • ClustalW Hinxton

    • Thompson JD, Higgins DG, Gibson TJ. (1994) CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice. Nucleic Acids Res. 22(22):4673-80.

  • ClustalX UCC

    • Thompson JD, Gibson TJ, Plewniak F, Jeanmougin F, Higgins DG. (1997) The CLUSTAL_X windows interface: flexible strategies for multiple sequence alignment aided by quality analysis tools. Nucleic Acids Res. 25(24):4876-82.


Clustal re-engineering in C++

  • Problems:

    • Code has become very complex.

    • 18 code files (up to 5229 lines).

    • 400 Global variables.

    • 500 functions

  • Wish to:

    • Simplify the code.

    • Improve the structure of the code (modularisation).

    • Make functional changes easier.

    • Make the code easier to understand.

    • Improve portability.

      • Qt: cross-platform C++ GUI toolkit.


The Local Minimum Problem: Clustal is "Greedy"

[Figure: energy plotted against location, showing a local minimum and the global minimum]

