Multiple Alignments and Multivariate Analysis

Clustal: 1988-2006



Multiple Alignments

Human beta --------VHLTPEEKSAVTALWGKVN--VDEVGGEALGRLLVVYPWTQRFFESFGDLST

Horse beta --------VQLSGEEKAAVLALWDKVN--EEEVGGEALGRLLVVYPWTQRFFDSFGDLSN

Human alpha ---------VLSPADKTNVKAAWGKVGAHAGEYGAEALERMFLSFPTTKTYFPHF-DLS-

Horse alpha ---------VLSAADKTNVKAAWSKVGGHAGEYGAEALERMFLGFPTTKTYFPHF-DLS-

Whale myoglobin ---------VLSEGEWQLVLHVWAKVEADVAGHGQDILIRLFKSHPETLEKFDRFKHLKT

Lamprey globin PIVDTGSVAPLSAAEKTKIRSAWAPVYSTYETSGVDILVKFFTSTPAAQEFFPKFKGLTT

Lupin globin --------GALTESQAALVKSSWEEFNANIPKHTHRFFILVLEIAPAAKDLFSFLKGTSE

*: : : * . : .: * : * : .

Human beta PDAVMGNPKVKAHGKKVLGAFSDGLAHLDN-----LKGTFATLSELHCDKLHVDPENFRL

Horse beta PGAVMGNPKVKAHGKKVLHSFGEGVHHLDN-----LKGTFAALSELHCDKLHVDPENFRL

Human alpha ----HGSAQVKGHGKKVADALTNAVAHVDD-----MPNALSALSDLHAHKLRVDPVNFKL

Horse alpha ----HGSAQVKAHGKKVGDALTLAVGHLDD-----LPGALSNLSDLHAHKLRVDPVNFKL

Whale myoglobin EAEMKASEDLKKHGVTVLTALGAILKKKGH-----HEAELKPLAQSHATKHKIPIKYLEF

Lamprey globin ADQLKKSADVRWHAERIINAVNDAVASMDDT--EKMSMKLRDLSGKHAKSFQVDPQYFKV

Lupin globin VP--QNNPELQAHAGKVFKLVYEAAIQLQVTGVVVTDATLKNLGSVHVSKGVAD-AHFPV

. .:: *. : . : *. * . : .

Human beta LGNVLVCVLAHHFGKEFTPPVQAAYQKVVAGVANALAHKYH------

Horse beta LGNVLVVVLARHFGKDFTPELQASYQKVVAGVANALAHKYH------

Human alpha LSHCLLVTLAAHLPAEFTPAVHASLDKFLASVSTVLTSKYR------

Horse alpha LSHCLLSTLAVHLPNDFTPAVHASLDKFLSSVSTVLTSKYR------

Whale myoglobin ISEAIIHVLHSRHPGDFGADAQGAMNKALELFRKDIAAKYKELGYQG

Lamprey globin LAAVIADTVAAG---D------AGFEKLMSMICILLRSAY-------

Lupin globin VKEAILKTIKEVVGAKWSEELNSAWTIAYDELAIVIKKEMNDAA---

: : .: . .. . :

  • Phylogenetic Analysis

  • Secondary Str. Prediction

  • Homology Detection

  • Profile Analysis

  • Homology Modeling



VHLTPEEKSAVTALWGKVN--VDEVGGEALGRLLVVYPWTQRFFESFGDLSTPDAVMGNP

-VLSPADKTNVKAAWGKVGAHAGEYGAEALERMFLSFPTTKTYFPHF-DLS-----HGSA

* * * * * **** * * *** * * * * * *** *

KVKAHGKKVLGAFSDGLAHLDNLKGTFATLSELHCDKLHVDPENFRLLGNVLVCVLAHHF

QVKGHGKKVADALTNAVAHVDDMPNALSALSDLHAHKLRVDPVNFKLLSHCLLVTLAAHL

** ***** * ** * ** ** ** *** ** ** * ** *

GKEFTPPVQAAYQKVVAGVANALAHKYH

PAEFTPAVHASLDKFLASVSTVLTSKYR

**** * * * * * * **

  • Dynamic Programming

    • Needleman and Wunsch, 1970

    • O(L^2) algorithm

  • Maximise score (or minimise distance)

    • Gap penalties

    • Amino acid weight matrix
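The dynamic programming idea listed above can be sketched as follows. This is a minimal global alignment in the style of Needleman and Wunsch; the match, mismatch and linear gap scores are illustrative assumptions standing in for a real amino acid weight matrix and gap penalty scheme.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-2):
    """Return (score, aligned_a, aligned_b) for a global alignment.

    The scores are placeholders; a protein aligner would look up
    substitution scores in an amino acid weight matrix instead.
    """
    n, m = len(a), len(b)
    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + sub,  # align two residues
                              score[i - 1][j] + gap,      # gap in b
                              score[i][j - 1] + gap)      # gap in a

    # Trace back from the bottom-right cell to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            sub = match if a[i - 1] == b[j - 1] else mismatch
            if score[i][j] == score[i - 1][j - 1] + sub:
                out_a.append(a[i - 1])
                out_b.append(b[j - 1])
                i, j = i - 1, j - 1
                continue
        if i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append('-')
            i -= 1
        else:
            out_a.append('-')
            out_b.append(b[j - 1])
            j -= 1
    return score[n][m], ''.join(reversed(out_a)), ''.join(reversed(out_b))


result = needleman_wunsch("FAT", "FAST")
```

The table has (L+1) x (L+1) cells for two sequences of length L, which is where the O(L^2) cost comes from.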


Weighted Sums of Pairs: WSP

Time O(L^N)
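As a rough illustration of the weighted sums of pairs idea, the sketch below scores an existing alignment by summing a pairwise score over every pair of rows, each scaled by a pair weight. The match, mismatch and gap values and the `weights` dictionary are assumptions made for the example, not the scheme used by any particular program.

```python
from itertools import combinations


def pair_score(row_a, row_b, match=1, mismatch=-1, gap=-2):
    """Score two aligned rows column by column (illustrative scores)."""
    total = 0
    for x, y in zip(row_a, row_b):
        if x == '-' and y == '-':
            continue              # gap opposite gap costs nothing
        if x == '-' or y == '-':
            total += gap
        elif x == y:
            total += match
        else:
            total += mismatch
    return total


def weighted_sum_of_pairs(rows, weights=None):
    """Sum pair scores over all pairs of rows, scaled by optional weights.

    weights maps (i, j) with i < j to a pair weight; missing pairs get 1.0.
    """
    weights = weights or {}
    total = 0.0
    for i, j in combinations(range(len(rows)), 2):
        total += weights.get((i, j), 1.0) * pair_score(rows[i], rows[j])
    return total


rows = ["FA-T", "FAST", "FAST"]
unweighted = weighted_sum_of_pairs(rows)
downweighted = weighted_sum_of_pairs(rows, {(1, 2): 0.5})
```

Scoring a given alignment this way is cheap; the O(L^N) cost arises from searching for the alignment that maximises the score.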


Weighted Sums of Pairs: WSP

Time O(L^N)

Sequences    Time
2            1 second
3            150 seconds
4            6.25 hours
5            39 days
6            16 years
7            2404 years
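The figures in the table grow by a factor of 150 for each added sequence, which would correspond to sequences of length L = 150 if two sequences take one second. That value of L is inferred from the numbers rather than stated on the slide. A small sketch under that assumption:

```python
def wsp_time_seconds(n_sequences, length=150, base_seconds=1.0):
    """Estimated run time if 2 sequences take base_seconds and the cost
    is multiplied by `length` for every additional sequence (O(L^N))."""
    return base_seconds * length ** (n_sequences - 2)


for n in range(2, 8):
    seconds = wsp_time_seconds(n)
    hours = seconds / 3600
    days = seconds / 86400
    years = days / 365.25
    print(f"{n} sequences: {seconds:.0f} s = {hours:.2f} h = "
          f"{days:.1f} days = {years:.2f} years")
```

With these assumptions the estimates land close to the slide's values; small differences in the last rows come from rounding at intermediate steps.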


[Figure: guide tree joining Horse beta, Human beta, Horse alpha, Human alpha, Whale myoglobin, Lamprey cyanohaemoglobin and Lupin leghaemoglobin, shown with the globin alignment]

  • Progressive Alignment:

    • Feng and Doolittle, 1987

    • Barton and Sternberg, 1987

    • Willie Taylor, 1987, 1988

    • Hogeweg and Hesper, 1984
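Progressive alignment follows a guide tree: the closest sequences are aligned first and more distant ones are added later. The sketch below shows only the tree-building step, using average-linkage (UPGMA-style) clustering as an assumed, illustrative choice. The distances are hypothetical numbers made up for the example, not values from the talk.

```python
def upgma(names, dist):
    """Build a nested-tuple guide tree by average-linkage clustering.

    names: list of leaf labels
    dist:  dict mapping frozenset({a, b}) -> distance between leaves a and b
    """
    # Each cluster is (subtree, list_of_leaves_in_subtree).
    clusters = [(name, [name]) for name in names]

    def cluster_distance(c1, c2):
        # Average distance over all leaf pairs spanning the two clusters.
        pairs = [(a, b) for a in c1[1] for b in c2[1]]
        return sum(dist[frozenset((a, b))] for a, b in pairs) / len(pairs)

    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = cluster_distance(clusters[i], clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        merged = ((clusters[i][0], clusters[j][0]),
                  clusters[i][1] + clusters[j][1])
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return clusters[0][0]


names = ["Human beta", "Horse beta", "Human alpha", "Horse alpha"]
# Hypothetical distances, for illustration only.
example_dist = {
    frozenset(("Human beta", "Horse beta")): 0.17,
    frozenset(("Human alpha", "Horse alpha")): 0.12,
    frozenset(("Human beta", "Human alpha")): 0.60,
    frozenset(("Human beta", "Horse alpha")): 0.60,
    frozenset(("Horse beta", "Human alpha")): 0.60,
    frozenset(("Horse beta", "Horse alpha")): 0.60,
}
guide_tree = upgma(names, example_dist)
```

A progressive aligner would then walk this tree from the leaves upward, aligning sequences or groups of already aligned sequences at each internal node.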





Clustal

  • 35000 citations

  • Clustal1-Clustal4 1988

    • Paul Sharp, Dublin

  • Clustal V 1992

    • EMBL Heidelberg

      • Rainer Fuchs

      • Alan Bleasby

  • Clustal W 1994-2006, Clustal X 1997-2006

    • Toby Gibson, EMBL, Heidelberg

    • Julie Thompson, ICGEB, Strasbourg

  • Clustal W and Clustal X 2.0 early 2007

    • University College Dublin



Since 1994?

Benchmarks

Protein structure alignments and superpositions

  • Barton and Sternberg; Fitch and McClure

  • Dali

  • BaliBase

  • Homstrad

  • Oxbench

  • Prefab etc. etc.

  • Protein structure analysis

    • APDB: O'Sullivan O, Zehnder M, Higgins D, Bucher P, Grosdidier A, Notredame C. (2003) APDB: a novel measure for benchmarking sequence alignment methods without reference alignments. Bioinformatics. 19 Suppl 1:i215-21.

  • RNA alignments

    • Bralibase (Gardner PP, Wilm A & Washietl S (2005) NAR.)



Which Method is Best?

  • Clustal W????

  • MSA (Lipman, Altschul, Kececioglu)

    • DCA (Stoye), PRRP (Gotoh), SAGA (Notredame)

  • T-Coffee (Notredame)

    • 3D-Coffee, M-Coffee

  • MAFFT (Katoh) and MUSCLE (Edgar)

  • Probcons (Do, Brudno, Batzoglou)

For Global Protein alignments!!!



Clustal W and X 2.0?

  • Jan 2007

  • Re-engineered in C++

  • Aim to increase accuracy

    • Iteration (Wallace, I. M., O'Sullivan, O. and Higgins, D. G., 2005 Evaluation of iterative alignment algorithms for multiple alignment. Bioinformatics 21:1408.)

  • Reduce run times



Multivariate Analysis?



ADE-4

http://pbil.univ-lyon1.fr/ADE-4/

Thioulouse J., Chessel D., Dolédec S., & Olivier J.M. (1997) ADE-4: a multivariate analysis and graphical display software. Statistics and Computing, 7, 1, 75-83.


Between Group Analysis (BGA)

Dolédec, S. & Chessel, D. (1987) Acta Oecologica, Oecologica Generalis, 8, 3, 403-426.

Supervised Correspondence Analysis or PCA

  • MADE4

    • Culhane, A., Thioulouse, J., Perriere, G., Higgins, D.G. (2005) MADE4: an R package for multivariate analysis of gene expression data. Bioinformatics. 21(11):2789-2790.

  • Co-Inertia Analysis (CIA)

    • Dolédec, S. & Chessel, D. (1994) Freshwater Biology, 31, 277-294.

    • Thioulouse, J. & Lobry, J.R. (1995) CABIOS, 11, 321-329.

    • 2 datasets; simultaneous CA or PCA



Use CA, PCA for Sequences?

PCOORD on sequence distances:

Higgins, D.G. (1992) Sequence ordinations: a multivariate analysis approach to analysing large sequence data sets. CABIOS, 8, 15-22.

PCA on dipeptide composition:

Van Heel, M. (1991) A new family of powerful multivariate statistical sequence analysis techniques. J. Mol. Biol. 220(4): 877-887.

PCA on alignment columns:

Casari G, Sander C, Valencia A. (1995) A method to predict functional residues in proteins. Nat Struct Biol. 2(2):171-8.



Supervised PCA or CA?

Malate Dehydrogenases

Lactate Dehydrogenases


Between Group Analysis

[Diagram labels: samples, genes, GSVD, N]


Trypsin-like serine proteases

  • 15 Chymotrypsins

  • 10 Elastases

  • 31 Trypsins



Trypsin





BGA With CA or PCA?

  • CA:

    • Pretty pictures

    • Sequences/residues plots

    • Finds any clear/simple patterns

      • Binary aa variables

  • PCA:

    • Use continuous variables

      • e.g. aa properties: size, charge, hydrophobicity etc.


BGA with PCA using 5 amino acid properties (A-E)

[Figure: Sequences plot (15 Chymotrypsins, 31 Trypsins, 10 Elastases) and Residue weights plot]



BGA on Alignments

  • Focus on any split in the data

  • Binary or Property coding

    • CA or PCA

  • Sequence Weighting

  • Pseudocounts
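The binary coding option in the list above can be sketched like this: every (column, residue) combination becomes a 0/1 variable, and the per-group means of those variables form the table that a between-group analysis would then ordinate. The function names and the toy alignment are assumptions for illustration; the CA or PCA step itself, sequence weighting and pseudocounts are not shown.

```python
def binary_code(rows, alphabet="ACDEFGHIKLMNPQRSTVWY"):
    """One 0/1 variable per (column, residue) pair; gaps code as all zeros."""
    variables = [(col, aa) for col in range(len(rows[0])) for aa in alphabet]
    table = [[1 if row[col] == aa else 0 for col, aa in variables]
             for row in rows]
    return variables, table


def group_means(table, groups):
    """Mean of every binary variable within each group label."""
    means = {}
    for label in set(groups):
        members = [r for r, g in zip(table, groups) if g == label]
        means[label] = [sum(col) / len(members) for col in zip(*members)]
    return means


# Toy alignment with a reduced alphabet, two groups "x" and "y".
variables, table = binary_code(["AC", "AD", "GC"], alphabet="ACDG")
means = group_means(table, ["x", "x", "y"])
```

Variables whose group means differ strongly, such as residue A versus G in the first column here, are the ones that separate the groups.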


Iteration, Benchmarking, Clustal W 2.0

  • Gordon Blackshields

  • Mark Larkin

  • Paul McGettigan

  • Iain Wallace

Clustal

  • Toby Gibson, EMBL

  • Julie Thompson, ICGEB, Strasbourg

BGA, CIA, MADE4

  • Aedín Culhane

  • Guy Perriere

  • Jean Thioulouse

  • Ian Jeffery

  • Ailís Fagan



SeqA GARFIELD THE LAST FAT CAT

SeqB GARFIELD THE FAST CAT

SeqC GARFIELD THE VERY FAST CAT

SeqD THE FAT CAT

SeqA GARFIELD THE LAST FA-T CAT

SeqB GARFIELD THE FAST CA-T ---

SeqC GARFIELD THE VERY FAST CAT

SeqD -------- THE ---- FA-T CAT


Weighted Sums of Pairs

  • MSA: Branch and Bound (Lipman, Altschul and Kececioglu, 1989)

  • FastMSA: Tweaked MSA (Gupta, Kececioglu and Schaeffer, 1995)

  • DCA: Divide and Conquer (Stoye, Moulton and Dress, 1997)

  • SAGA: Genetic Algorithm (Notredame and Higgins, 1996)

  • PRRP: Iteration (Gotoh, 1996)


Genetic Algorithm

Selection (WSP)

Mutation

Recombination (cross-overs)





SAGA

  • Cedric Notredame

  • Sequence Alignment by Genetic Algorithm

  • Optimise any objective function

  • Notredame, C. and Higgins, D.G. (1996) SAGA: Sequence alignment by genetic algorithm. Nucleic Acids Research, 24:1515-1524.



Structure Test Cases

MSA

SAGA





Which method is best?

  • Best score?

  • Empirical tests?

    • Sets of test cases

      • Fitch and McClure

      • BaliBase

      • Homstrad

      • Oxbench

      • Prefab etc. etc.

    • APDB: O'Sullivan O, Zehnder M, Higgins D, Bucher P, Grosdidier A, Notredame C. (2003) APDB: a novel measure for benchmarking sequence alignment methods without reference alignments. Bioinformatics. 19 Suppl 1:i215-21.



COFFEE

  • Consistency based Objective Function For Evaluation of Ehhhh things

  • Maximum Weight Trace (John Kececioglu)

  • Maximise similarity to a LIBRARY of residue pairs

  • Notredame, C., Holm, L. and Higgins, D.G. (1998) COFFEE: An objective function for multiple sequence alignments. Bioinformatics 14: 407-422.


Pairs of Residues

e.g. Seq N, Residue I; Seq M, Residue J; Weight = w

Human beta VHLTPEEKSAVTALWGKVN--VDEVGGEAL

Horse beta VQLSGEEKAAVLALWDKVN--EEEVGGEAL

Human alpha -VLSPADKTNVKAAWGKVGAHAGEYGAEAL

Horse alpha -VLSAADKTNVKAAWSKVGGHAGEYGAEAL
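A library of weighted residue pairs can be used to score any candidate alignment: collect the residue pairs the alignment places in the same column and add up the library weights they recover. The sketch below is a simplified, COFFEE-like score written for illustration; the data layout (pairs stored as frozensets of (sequence name, residue index) tuples) and the normalisation by total library weight are assumptions, not the published definition.

```python
def aligned_pairs(names, rows):
    """Yield pairs of (sequence_name, residue_index) that share a column.

    Residue indices count from 0 within each sequence and skip gaps.
    """
    counters = [0] * len(rows)
    for col in range(len(rows[0])):
        present = []
        for k, row in enumerate(rows):
            if row[col] != '-':
                present.append((names[k], counters[k]))
                counters[k] += 1
        for a in range(len(present)):
            for b in range(a + 1, len(present)):
                yield present[a], present[b]


def coffee_like_score(names, rows, library):
    """Fraction of the total library weight recovered by the alignment."""
    total = sum(library.values())
    found = sum(library.get(frozenset(pair), 0.0)
                for pair in aligned_pairs(names, rows))
    return found / total if total else 0.0


# Hypothetical library: two pairs the alignment contains, one it does not.
library = {
    frozenset((("A", 0), ("B", 0))): 1.0,
    frozenset((("A", 2), ("B", 3))): 1.0,
    frozenset((("A", 2), ("B", 2))): 2.0,
}
score = coffee_like_score(["A", "B"], ["FA-T", "FAST"], library)
```

An optimiser such as a genetic algorithm could then search for the alignment with the highest score of this kind.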



% Match





T-Coffee

  • Heuristic approximation to COFFEE

    • Uses progressive alignment (Trees)

  • Heterogeneous data

    • Sequences

    • Structures

    • Genomes

    • ESTs

  • Notredame, C, Higgins, DG and Heringa, J. (2000) T-Coffee: A novel method for fast and accurate multiple sequence alignment. J.Mol.Biol., 302: 205-217.



T-Coffee

  • Mixed data sources

    • Primary library from

      • Lalign (SIM):

        • 10 best local alignments

      • Clustalw

        • All pairwise alignments

      • SAP (Willie Taylor, Structure Superposition)

      • Multiple alignments

  • Check library for CONSISTENCY

    • Upweight pairs of residues that agree with other pairs

Default: Lalign and Clustalw
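The consistency check can be illustrated with a triplet-style extension: a pair of residues x and y gains weight whenever both are also paired with some third residue z in the library. The sketch below assumes that each supporting triplet contributes the smaller of its two weights; the function name and data layout are hypothetical and chosen only for the example.

```python
from collections import defaultdict
from itertools import combinations


def extend_library(library):
    """Return a new library with consistency-based weights added.

    library maps frozenset({x, y}) -> weight, where x and y are
    (sequence_name, residue_index) tuples from different sequences.
    """
    # neighbours[z][x] = weight of the primary pair (z, x)
    neighbours = defaultdict(dict)
    for pair, w in library.items():
        x, y = tuple(pair)
        neighbours[x][y] = w
        neighbours[y][x] = w

    extended = dict(library)
    for z, linked in neighbours.items():
        # Every two residues linked to z support each other through z.
        for x, y in combinations(linked, 2):
            if x[0] == y[0]:
                continue  # two residues of one sequence cannot share a column
            key = frozenset((x, y))
            extended[key] = extended.get(key, 0.0) + min(linked[x], linked[y])
    return extended


primary = {
    frozenset((("A", 0), ("B", 0))): 1.0,
    frozenset((("A", 0), ("C", 0))): 2.0,
    frozenset((("B", 0), ("C", 0))): 3.0,
}
extended = extend_library(primary)
```

Pairs that agree with the rest of the library end up with higher weights than isolated pairs, which is what the progressive alignment step then exploits.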


Mixing Heterogeneous Information

[Diagram: Local Alignment, Global Alignment, Multiple Alignment, Specialist and Structural sources feed into T-Coffee, which produces the Multiple Sequence Alignment]

Copyright Cédric Notredame, 2000, all rights reserved


Mixing Heterogeneous Information

[Diagram: e.g. SAP (Taylor and Orengo): Structure Superposition converted to Weighted Residue Pairs]

Copyright Cédric Notredame, 2000, all rights reserved



Increasing Structure Numbers


Including Structures in an Alignment

3D-Coffee

[Bar chart: %accuracy (axis 0 to 80) for clustalw, T_Coffee Default, and T_Coffee plus all structures; bar value labels as extracted: 66.49, 38.39, 35.24]

O’Sullivan, O., Suhre, K., Abergel, C., Higgins, DG and Notredame, C (2004) J.Mol.Biol.



Recent Developments

  • 20-30 new programs in past 2 years

  • MUSCLE

    • Bob Edgar, ISMB, 2004

    • Iteration/progressive alignment

      • FAST

      • Big Alignments

  • PROBCONS

    • Tom Do, Michael Brudno, Serafim Batzoglou

    • ISMB 2004

    • “P-Coffee”

      • VERY accurate



Iteration Revisited

--------VHLTPEEKSAVTALWGKVN--VDEVGGEALGRLLVVYPWTQRFFESFGDLST

--------VQLSGEEKAAVLALWDKVN--EEEVGGEALGRLLVVYPWTQRFFDSFGDLSN

---------VLSPADKTNVKAAWGKVGAHAGEYGAEALERMFLSFPTTKTYFPHF-DLS-

---------VLSAADKTNVKAAWSKVGGHAGEYGAEALERMFLGFPTTKTYFPHF-DLS-

---------VLSEGEWQLVLHVWAKVEADVAGHGQDILIRLFKSHPETLEKFDRFKHLKT

PIVDTGSVAPLSAAEKTKIRSAWAPVYSTYETSGVDILVKFFTSTPAAQEFFPKFKGLTT

--------GALTESQAALVKSSWEEFNANIPKHTHRFFILVLEIAPAAKDLFSFLKGTSE





Iteration Revisited

--------VHLTPEEKSAVTALWGKVN--VDEVGGEALGRLLVVYPWTQRFFESFGDLST

--------VQLSGEEKAAVLALWDKVN--EEEVGGEALGRLLVVYPWTQRFFDSFGDLSN

---------VLSPADKTNVKAAWGKVGAHAGEYGAEALERMFLSFPTTKTYFPHF-DLS-

---------VLSAADKTNVKAAWSKVGGHAGEYGAEALERMFLGFPTTKTYFPHF-DLS-

---------VLSEGEWQLVLHVWAKVEADVAGHGQDILIRLFKSHPETLEKFDRFKHLKT

PIVDTGSVAPLSAAEKTKIRSAWAPVYSTYETSGVDILVKFFTSTPAAQEFFPKFKGLTT

--------GALTESQAALVKSSWEEFNANIPKHTHRFFILVLEIAPAAKDLFSFLKGTSE

--------VHLTPEEKSAVTALWGKVN--VDEVGGEALGRLLVVYPWTQRFFESFGDLST

--------VQLSGEEKAAVLALWDKVN--EEEVGGEALGRLLVVYPWTQRFFDSFGDLSN

---------VLSPADKTNVKAAWGKVGAHAGEYGAEALERMFLSFPTTKTYFPHF-DLS-

---------VLSEGEWQLVLHVWAKVEADVAGHGQDILIRLFKSHPETLEKFDRFKHLKT

PIVDTGSVAPLSAAEKTKIRSAWAPVYSTYETSGVDILVKFFTSTPAAQEFFPKFKGLTT

--------GALTESQAALVKSSWEEFNANIPKHTHRFFILVLEIAPAAKDLFSFLKGTSE

---------VLSAADKTNVKAAWSKVGGHAGEYGAEALERMFLGFPTTKTYFPHF-DLS-




Iterate

  • Remove EACH sequence: RF

  • Remove BEST sequence: RB

  • Random: Random

  • Tree based: Tree



Iteration on HomStrad 184

Wallace, O’Sullivan and Higgins, 2005, Bioinformatics, 21:1408


Combining Multiple Alignment Methods

[Diagram: alignments from Clustal W, T-Coffee, Probcons, Muscle and Specialist methods feed into T-Coffee, which produces the Multiple Sequence Alignment]

Copyright Cédric Notredame, 2000, all rights reserved


The Wisdom of Crowds (James Surowiecki)

Crowds are surprisingly good at accurate decisions

Better than “experts”

Only if they do not form a “mob”


M-Coffee combines 8 methods




BaliBASE

Thompson, JD, Plewniak, F. and Poch, O. (1999) NAR and Bioinformatics

  • ICGEB Strasbourg

  • 141 manual alignments using structures

    • 5 sections

    • core alignment regions marked

1. Equidistant (82)

2. Orphan (23)

3. Two groups (12)

4. Long internal gaps (13)

5. Long terminal gaps (11)


Compare Methods

  • Sam: HMM (Hughey and Krogh, 1996)

  • Dialign: Local multiple alignments (Morgenstern, 1999)

  • ClustalW: Progressive alignment (Thompson, Higgins and Gibson, 1994)

  • Prrp: Iterative WSP (Gotoh, 1996)

  • T-Coffee: Pairwise library (Notredame, Higgins and Heringa, 2000)



% alignment columns correct

Core alignment blocks only
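A column-based accuracy measure of this kind can be sketched as below: each column is described by which residue of each sequence it contains, and a reference column counts as correct only if the test alignment contains exactly the same column. The `core` argument, which restricts scoring to chosen reference columns, is an assumed stand-in for marked core blocks; this is an illustration, not the benchmark's own scoring program.

```python
def column_signatures(rows):
    """Describe each column by the residue index contributed by each
    sequence (None where that sequence has a gap)."""
    counters = [0] * len(rows)
    signatures = []
    for col in range(len(rows[0])):
        sig = []
        for k, row in enumerate(rows):
            if row[col] == '-':
                sig.append(None)
            else:
                sig.append(counters[k])
                counters[k] += 1
        signatures.append(tuple(sig))
    return signatures


def column_score(test_rows, reference_rows, core=None):
    """Percentage of reference columns reproduced exactly in the test
    alignment; `core` optionally lists the reference columns to score."""
    ref = column_signatures(reference_rows)
    if core is not None:
        ref = [ref[c] for c in core]
    if not ref:
        return 0.0
    test = set(column_signatures(test_rows))
    return 100.0 * sum(sig in test for sig in ref) / len(ref)


reference = ["FA-T", "FAST"]
test_alignment = ["FAT-", "FAST"]
all_columns = column_score(test_alignment, reference)
core_only = column_score(test_alignment, reference, core=[0, 1])
```

Restricting the score to core columns avoids penalising a method for regions where the reference itself is unreliable.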




Clustal

  • Clustal, Clustal1-4 (TCD)

    • Higgins DG, Sharp PM. (1988) CLUSTAL: a package for performing multiple sequence alignment on a microcomputer. Gene. 73(1):237-44.

    • Higgins DG, Sharp PM. (1989) Fast and sensitive multiple sequence alignments on a microcomputer. Comput Appl Biosci. 5(2):151-3.

  • ClustalV (Heidelberg)

    • Higgins DG, Bleasby AJ, Fuchs R. (1992) CLUSTAL V: improved software for multiple sequence alignment. Comput Appl Biosci. 8(2):189-91.

  • ClustalW (Hinxton)

    • Thompson JD, Higgins DG, Gibson TJ. (1994) CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice. Nucleic Acids Res. 22(22):4673-80.

  • ClustalX (UCC)

    • Thompson JD, Gibson TJ, Plewniak F, Jeanmougin F, Higgins DG. (1997) The CLUSTAL_X windows interface: flexible strategies for multiple sequence alignment aided by quality analysis tools. Nucleic Acids Res. 25(24):4876-82.



Clustal re-engineering in C++

  • Problems:

    • Code has become very complex.

    • 18 code files (up to 5229 lines).

    • 400 Global variables.

    • 500 functions

  • Wish to:

    • Simplify the code.

    • Improve structure of code (modularisation)

    • Make it easier to make functional changes.

    • Make the code easier to understand.

    • Improve portability

      • Qt Cross platform C++ GUI toolbox.


The Local Minimum Problem: Clustal is “Greedy”

[Figure: Energy plotted against Location, with a local minimum and the global minimum marked]