Multiple Sequence Alignments


Multiple Alignments

  • Generating multiple alignments

    • Web servers

  • Analyzing a multiple alignment

    • what makes a ‘good’ multiple alignment?

    • what can it tell us, why is it useful?

  • Adjusting a multiple alignment

    • Alignment editors and HowTo

    • Demonstration and practice


What is a Multiple Alignment?

  • A comparison of sequences

    • “multiple sequence alignment”

  • A comparison of equivalents:

    • Structurally equivalent positions

    • Functionally equivalent residues

    • Secondary structure elements

    • Hydrophobic regions, polar residues


Generating multiple alignments

  • Pairwise sequence alignment is easy when the sequences are sufficiently closely related.

  • Below a certain level of identity, sequence alignment may become uncertain:

    • the twilight zone for amino acid sequences lies at around 30% identity.

  • In or below the twilight zone it is good to make use of additional information, e.g. from evolution.

  • A multiple alignment of diverse sequences is more informative than a pairwise alignment:

    • residues conserved over a longer period of time are under stronger evolutionary constraints.
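
To make the twilight-zone idea concrete, here is a minimal Python sketch that computes percent identity for two already-aligned sequences and compares it with the ~30% figure quoted above. The function names and the choice to ignore gapped columns are assumptions made for illustration; several definitions of percent identity are in use.

```python
TWILIGHT_ZONE = 30.0  # approximate percent identity quoted on the slide


def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over columns where both sequences have a residue."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for x, y in zip(aligned_a, aligned_b):
        if x == "-" or y == "-":
            continue  # gapped columns are ignored (an assumption of this sketch)
        compared += 1
        if x == y:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


def in_twilight_zone(aligned_a: str, aligned_b: str) -> bool:
    """True when the pair is at or below the approximate twilight-zone threshold."""
    return percent_identity(aligned_a, aligned_b) <= TWILIGHT_ZONE
```

For a pair flagged by `in_twilight_zone`, the pairwise alignment alone should be treated with caution.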


Multiple Sequence Alignments Algorithms

  • In practice, multiple sequence alignment relies on heuristic methods:

    • With dynamic programming, computational time quickly explodes as the number of sequences increases.

  • Different methods/algorithms:

    • Segment-based (DiAlign, …).

    • Iterative (HMMs, DiAlign, PRRP, …).

    • Progressive (ClustalW, T-Coffee, MUSCLE, …).
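
The growth of exact dynamic programming can be estimated with a few lines of Python. This is a back-of-the-envelope sketch: it assumes the straightforward generalisation of pairwise dynamic programming to N sequences, where the lattice has one axis per sequence; the function names are invented for illustration.

```python
from math import prod


def dp_cells(lengths):
    """Number of cells in an N-dimensional alignment lattice."""
    return prod(n + 1 for n in lengths)


def predecessors_per_cell(n_sequences):
    """Each cell looks back along every non-empty subset of the N axes."""
    return 2 ** n_sequences - 1


# Cost estimate for 2 to 5 sequences of 300 residues each.
for n in range(2, 6):
    print(n, dp_cells([300] * n), predecessors_per_cell(n))
```

Two sequences of 300 residues need about 9 × 10^4 cells; each additional sequence multiplies that by roughly 300, which is why heuristics are used.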


Progressive Alignment

  • Step 1: Calculate all pairwise alignments and, from them, the distances for all pairs of sequences.

  • Step 2: Construct a guide tree, joining the most similar sequences first, using Neighbour Joining.
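
Steps 1 and 2 can be sketched in Python as follows. This is an illustration under stated assumptions, not the ClustalW implementation: the pairwise alignment is a plain Needleman-Wunsch with match/mismatch scores and a linear gap penalty (values chosen arbitrarily), distance is taken as 1 minus the fraction of identical columns, and instead of a full Neighbour Joining tree the sketch only finds the first pair that would be joined.

```python
from itertools import combinations


def global_align(a, b, match=1, mismatch=-1, gap=-2):
    """Needleman-Wunsch with a linear gap penalty; returns both aligned strings."""
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    for i in range(1, rows):
        score[i][0] = i * gap
    for j in range(1, cols):
        score[0][j] = j * gap
    for i in range(1, rows):
        for j in range(1, cols):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + sub,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)
    # Trace back from the bottom-right corner to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = len(a), len(b)
    while i > 0 or j > 0:
        sub = match if i > 0 and j > 0 and a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub:
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b))


def distance(a, b):
    """1 minus the fraction of alignment columns holding identical residues."""
    x, y = global_align(a, b)
    matches = sum(p == q for p, q in zip(x, y))
    return 1.0 - matches / len(x)


def distance_matrix(seqs):
    """Distances for all pairs; seqs maps a name to an unaligned sequence."""
    return {(m, n): distance(seqs[m], seqs[n])
            for m, n in combinations(sorted(seqs), 2)}


def closest_pair(seqs):
    """The pair of sequence names with the smallest distance."""
    d = distance_matrix(seqs)
    return min(d, key=d.get)
```

A real guide tree would continue from here, repeatedly joining the closest remaining nodes.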


Progressive Alignment

  • Step 3: From the tree assign weights for each sequence:

    • We want to down-weight nearly identical sequences and up-weight the most divergent ones.

  • Step 4: Align sequences, starting at the leaves of the guide tree:

    • Pairwise comparisons as well as comparison of single sequence with a group of sequences (Profile)

  • Caveat: errors introduced early cannot be corrected by subsequent information
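
The intent of the weighting step can be illustrated with a deliberately simplified Python sketch. ClustalW derives weights from branch lengths in the guide tree; the version below is a stand-in that uses only pairwise distances (weight proportional to the mean distance to all other sequences), so it is an assumption-laden illustration of the effect, not the actual scheme.

```python
def sequence_weights(names, dist):
    """Normalised weights; dist maps frozenset({a, b}) to a distance in [0, 1]."""
    raw = {}
    for n in names:
        others = [m for m in names if m != n]
        raw[n] = sum(dist[frozenset((n, m))] for m in others) / len(others)
    total = sum(raw.values())
    if total == 0:
        # All sequences identical: fall back to uniform weights.
        return {n: 1.0 / len(names) for n in names}
    return {n: raw[n] / total for n in names}


# Two near-identical sequences (a, b) and one divergent sequence (c).
example = {
    frozenset(("a", "b")): 0.1,
    frozenset(("a", "c")): 0.8,
    frozenset(("b", "c")): 0.8,
}
weights = sequence_weights(["a", "b", "c"], example)
```

In this example the divergent sequence `c` receives the largest weight, while `a` and `b` share a smaller, equal weight.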


Web servers

  • ClustalW: http://www.ebi.ac.uk/Tools/clustalw2/

  • T-Coffee: http://www.ebi.ac.uk/Tools/t-coffee/

  • MUSCLE: http://www.ebi.ac.uk/Tools/muscle/

  • DiAlign: http://dialign.gobics.de/

  • ... and more at http://helix.nih.gov/apps/bioinfo/msa.html.


ClustalW features

  • Amino acid substitution matrices are varied at different alignment stages according to the divergence of the sequences to be aligned.

  • Reduced gap penalties in hydrophilic regions encourage new gaps in potential loop regions rather than regular secondary structure.

Insertions and deletions are more common in loop regions than in the core of the protein!
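
The idea of position-specific gap penalties can be sketched in Python. The residue set, the minimum run length of five and the reduction factor below are assumptions chosen for illustration and should be checked against the ClustalW documentation before being relied on.

```python
HYDROPHILIC = set("DEGKNQPRS")  # assumed set of hydrophilic residues


def gap_open_penalties(seq, base=10.0, run_length=5, factor=2 / 3):
    """Per-position gap-opening penalty, lowered inside hydrophilic runs."""
    penalties = [base] * len(seq)
    run_start = None
    # The trailing "#" is a sentinel that closes a run ending at the last residue.
    for i, res in enumerate(seq + "#"):
        if res in HYDROPHILIC:
            if run_start is None:
                run_start = i
        else:
            if run_start is not None and i - run_start >= run_length:
                for k in range(run_start, i):
                    penalties[k] = base * factor
            run_start = None
    return penalties
```

Positions inside a long enough hydrophilic stretch become cheaper places to open a gap, which pushes insertions and deletions towards likely loop regions.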


T-Coffee features

  • More accurate than ClustalW

  • Instead of amino acid substitution matrices, uses consistency in a library of pairwise alignments

Vertices represent positions in a protein sequence. Edges represent pairwise alignments between protein sequences. If residues i and j have many common neighbours, their consistency is high.
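
The consistency idea can be sketched as a small graph computation in Python. This is a simplified illustration: a node is a hypothetical `(sequence_name, position)` tuple, the library maps a pair of nodes to a weight, and each existing pair is reinforced by the weaker edge of every path through a common neighbour. Unlike the full T-Coffee library extension, this sketch does not create new pairs that were absent from the input library.

```python
from collections import defaultdict


def extend_library(library):
    """Add, for each pair, the support coming from shared neighbours.

    library maps frozenset({node_a, node_b}) to a weight.
    """
    neighbours = defaultdict(dict)
    for pair, weight in library.items():
        a, b = tuple(pair)
        neighbours[a][b] = weight
        neighbours[b][a] = weight
    extended = {}
    for pair, weight in library.items():
        a, b = tuple(pair)
        shared = set(neighbours[a]) & set(neighbours[b])
        support = sum(min(neighbours[a][k], neighbours[k][b]) for k in shared)
        extended[pair] = weight + support
    return extended


A1, B1, B2, C1 = ("A", 1), ("B", 1), ("B", 2), ("C", 1)
library = {
    frozenset((A1, B1)): 88,
    frozenset((A1, C1)): 77,
    frozenset((C1, B1)): 100,
    frozenset((A1, B2)): 50,
}
extended = extend_library(library)
```

Here the pair A1/B1 is supported by the path through C1 and gains weight, while A1/B2 has no common neighbour and keeps its original weight.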


MUSCLE

  • Fast implementation

  • Sometimes more accurate than ClustalW or T-Coffee


Example

  • Let’s build a multiple alignment for the following sequences:

    >query
    MKNTLLKLGVCVSLLGITPFVSTISSVQAERTVEHKVIKNETGTISISQLNKNVWVHTELGYFSGEAVPSNGLVLNTSKGLVLVDSSWDDKLTKELIEMVEKKFKKRVTDVIITHAHADRIGGMKTLKERGIKAHSTALTAELAKKNGYEEPLGDLQSVTNLKFGNMKVETFYPGKGHTEDNIVVWLPQYQILAGGCLVKSASSKDLGNVADAYVNEWSTSIENVLKRYGNINLVVPGHGEVGDRGLLLHTLDLLK
    >gi|2984094
    MGGFLFFFLLVLFSFSSEYPKHVKETLRKITDRIYGVFGVYEQVSYENRGFISNAYFYVADDGVLVVDALSTYKLGKELIESIRSVTNKPIRFLVVTHYHTDHFYGAKAFREVGAEVIAHEWAFDYISQPSSYNFFLARKKILKEHLEGTELTPPTITLTKNLNVYLQVGKEYKRFEVLHLCRAHTNGDIVVWIPDEKVLFSGDIVFDGRLPFLGSGNSRTWLVCLDEILKMKPRILLPGHGEALIGEKKIKEAVSWTRKYIKDLRETIRKLYEEGCDVECVRERINEELIKIDPSYAQVPVFFNVNPVNAYYVYFEIENEILMGE
    >gi|115023|sp|P10425|
    MKKNTLLKVGLCVSLLGTTQFVSTISSVQASQKVEQIVIKNETGTISISQLNKNVWVHTELGYFNGEAVPSNGLVLNTSKGLVLVDSSWDNKLTKELIEMVEKKFQKRVTDVIITHAHADRIGGITALKERGIKAHSTALTAELAKKSGYEEPLGDLQTVTNLKFGNTKVETFYPGKGHTEDNIVVWLPQYQILAGGCLVKSAEAKNLGNVADAYVNEWSTSIENMLKRYRNINLVVPGHGKVGDKGLLLHTLDLLK
    >gi|115030|sp|P25910|
    MKTVFILISMLFPVAVMAQKSVKISDDISITQLSDKVYTYVSLAEIEGWGMVPSNGMIVINNHQAALLDTPINDAQTEMLVNWVTDSLHAKVTTFIPNHWHGDCIGGLGYLQRKGVQSYANQMTIDLAKEKGLPVPEHGFTDSLTVSLDGMPLQCYYLGGGHATDNIVVWLPTENILFGGCMLKDNQATSIGNISDADVTAWPKTLDKVKAKFPSARYVVPGHGDYGGTELIEHTKQIVNQYIESTSKP
    >gi|282554|pir||S25844
    MTVEVREVAEGVYAYEQAPGGWCVSNAGIVVGGDGALVVDTLSTIPRARRLAEWVDKLAAGPGRTVVNTHFHGDHAFGNQVFAPGTRIIAHEDMRSAMVTTGLALTGLWPRVDWGEIELRPPNVTFRDRLTLHVGERQVELICVGPAHTDHDVVVWLPEERVLFAGDVVMSGVTPFALFGSVAGTLAALDRLAELEPEVVVGGHGPVAGP
    EVIDANRDYLRWVQRLAADAVDRRLTPLQAARRADLGAFAGLLDAERLVANLHRAHEELLGGHVRDAMEIFAELVAYNGGQLPTCLA
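
Before submitting sequences to a web server it can help to check that they parse cleanly. The following is a minimal Python sketch of a FASTA reader, assuming the standard layout (a header line starting with `>` followed by one or more sequence lines); the function name is invented for illustration, and the first whitespace-delimited token of the header is used as the record name.

```python
def parse_fasta(text):
    """Return a dict mapping each record name to its full sequence."""
    records = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            tokens = line[1:].split()
            name = tokens[0] if tokens else ""
            records[name] = []
        elif name is not None:
            records[name].append(line)  # sequence may span several lines
    return {n: "".join(parts) for n, parts in records.items()}


def summarise(records):
    """Name and length of every record, useful as a quick sanity check."""
    return {name: len(seq) for name, seq in records.items()}
```

Unexpectedly short or empty records in the output of `summarise` usually point to a header and sequence that were run together on one line.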


ClustalW at EBI

  • Many options:

    • CPU mode,

    • full/fast alignment,

    • window length in fast mode,

    • gap penalties.


ClustalW at EBI

  • Automatic display of:

    • Score table

    • Alignment (optional colouring)

    • Guide tree

  • Link to Jalview alignment editor!


A note on the example

  • It is atypical:

    • It uses only three sequences.

    • One should use more in order to extract reliable information.

  • It illustrates a common mistake:

    • It uses sequences that are too closely related.

    • One should use sequences that are as divergent and diverse as possible in order to extract relevant information.


A Good Multiple Alignment?

  • Difficult to define…

  • Good ones look pretty!

    • Aligned secondary structures

    • Strongly conserved residues / regions

    • Comparison with known structure helps

  • Bad ones look chaotic and random.


A Good Multiple Alignment?

[Figure: example alignment with conservation, quality and consensus annotation rows]


Multiple Alignment Features

  • Barton (1993)

    • “The position of insertions and deletions suggests regions where surface loops exist…

    • Conserved glycine or proline suggests a β-turn…

    • Residues with hydrophobic properties conserved at i, i+2, i+4 (etc) separated by unconserved or hydrophilic residues suggests a surface β-strand…

    • A short run of hydrophobic amino acids (4 or 5 residues) suggests a buried β-strand…

    • Pairs of conserved hydrophobic amino acids separated by pairs of unconserved or hydrophilic residues suggests an α-helix with one face packed in the protein core. Similarly, an i, i+3, i+4, i+7 pattern of conserved residues.”
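
One of these rules, the i, i+2, i+4 pattern of conserved hydrophobic residues, can be searched for with a short Python sketch. The hydrophobic residue set and the reading of "conserved hydrophobic" as "every sequence has a hydrophobic residue in that column" are assumptions of this illustration.

```python
HYDROPHOBIC = set("AVLIMFWC")  # assumed hydrophobic residue set


def conserved_hydrophobic(rows):
    """One flag per column: True if every sequence is hydrophobic there."""
    return [all(res in HYDROPHOBIC for res in col) for col in zip(*rows)]


def surface_strand_candidates(flags):
    """Start columns where the flags alternate True, False, True, False, True."""
    pattern = [True, False, True, False, True]
    return [i for i in range(len(flags) - 4) if flags[i:i + 5] == pattern]


rows = ["VKIELSG",
        "LEVTIDG",
        "IRLKVNG"]
flags = conserved_hydrophobic(rows)
candidates = surface_strand_candidates(flags)
```

In this toy alignment the alternating pattern starts at column 0, which would make that stretch a candidate surface β-strand.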


Multiple Alignment Features

  • Cysteine is a rare amino acid, and is often involved in disulphide bonds (pairs of conserved cysteines).

  • Charged residues (histidine, aspartate, glutamate, lysine, arginine) and other polar residues embedded in a conserved region indicate functional importance.


Quality Assessment

  • Bad residues

    • Large distance from column consensus

  • Bad columns

    • Average distance from consensus is high – “entropy”

  • Bad regions

    • Profile scores

  • Bad quality doesn’t always mean badly aligned!
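
Column "entropy" can be computed directly from residue counts. The Python sketch below uses plain Shannon entropy and treats the gap character as just another symbol; both choices are assumptions, and alignment viewers may use similarity-weighted scores instead.

```python
from collections import Counter
from math import log2


def column_entropy(column):
    """Shannon entropy (bits) of the residues in one alignment column."""
    counts = Counter(column)
    total = len(column)
    return -sum((c / total) * log2(c / total) for c in counts.values())


def alignment_entropies(rows):
    """Entropy of every column; rows are equal-length aligned sequences."""
    return [column_entropy(col) for col in zip(*rows)]
```

A fully conserved column scores 0 bits, and the score rises as the column becomes more mixed, so high values mark candidate "bad columns" worth inspecting by eye.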


Quality Assessment

  • Profiles

    • A profile holds scores for each residue type (plus gaps) over every column of a multiple alignment

    • Concepts:

      • Consensus sequence

      • Amino acid similarity

    • Some multiple alignment programs use profiles to build or add to an alignment

    • Any alignment, or even one sequence, can be a profile (one sequence isn’t a very good one…)
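
A profile in its simplest form can be sketched in Python as a list of per-column residue frequencies. This is an illustration only: real profiles usually combine such counts with an amino acid similarity matrix, whereas the scoring below uses raw frequencies, and all function names are invented.

```python
from collections import Counter


def build_profile(rows):
    """One dict of residue frequencies (gap '-' included) per column."""
    profile = []
    for col in zip(*rows):
        counts = Counter(col)
        profile.append({res: n / len(col) for res, n in counts.items()})
    return profile


def consensus(profile):
    """Most frequent symbol in each column."""
    return "".join(max(col, key=col.get) for col in profile)


def score_against_profile(seq, profile):
    """Sum of the profile frequencies of the residues in seq, column by column."""
    return sum(col.get(res, 0.0) for res, col in zip(seq, profile))


profile = build_profile(["ACD", "ACE", "AGD"])
```

A profile built from a single sequence would contain only frequencies of 1.0, which is why one sequence "isn't a very good one".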


What can we do with a multiple alignment?

  • Identify subgroups (phylogeny)

    • Intra-group sequence conservation

    • Evolutionary relatedness (view tree)

  • Identify motifs (functionality)

    • Evolutionary signals

    • Highly conserved residues indicate functional or structural significance!

  • Widen search for related proteins

    • A multiple alignment (MA) is a better query than a single sequence

    • Consensus sequence / profile useful

RPDDWHLHLR

GGIDTHVHFI

GFTLTHEHIC

PFVEPHIHLD

PKVELHVHLD
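
The conserved positions in the five-sequence block above can be found programmatically. This minimal Python sketch reports the columns where all sequences carry the same residue; the function name is an assumption.

```python
def conserved_columns(rows):
    """Zero-based indices of columns with one identical, non-gap residue."""
    return [i for i, col in enumerate(zip(*rows))
            if len(set(col)) == 1 and col[0] != "-"]


motif_block = [
    "RPDDWHLHLR",
    "GGIDTHVHFI",
    "GFTLTHEHIC",
    "PFVEPHIHLD",
    "PKVELHVHLD",
]

print(conserved_columns(motif_block))  # [5, 7]
```

Both fully conserved columns hold histidine, giving an H-x-H motif of the kind that can seed a wider search for related proteins.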


What do we want to do?

  • Build a homology model?

    • Accuracy

  • Perform phylogenetic analysis?

    • Completeness

  • Functional analysis of a protein family?

    • Diversity


Building the initial alignment

  • Fetch related sequences and run alignment

    • Clustal, Dialign, TCoffee, Muscle …

  • Fetch a multiple alignment from a database and add sequences of interest

    • Pfam, ProDom, ADDA …

  • Start from a motif-finding procedure

    • MEME, Pratt, Gibbs Sampler …


Adjusting the alignment

  • Filter alignment:

    • Remove any redundancy

    • Remove unrelated sequences

    • Remove unwanted domains

    • Recalculate alignment if necessary

  • Look for conserved motifs, adjust any misalignments. Try different colour schemes and thresholds.

  • One step at a time…
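
Removing redundancy can be automated with a simple greedy filter. The Python sketch below keeps a sequence only if it is less than a chosen identity threshold to every sequence already kept; the 90% default and the identity definition (ungapped columns only) are assumptions, and the result depends on input order.

```python
def identity(a, b):
    """Fraction of identical residues over columns without gaps."""
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    return sum(x == y for x, y in pairs) / len(pairs) if pairs else 0.0


def remove_redundant(aligned, threshold=0.9):
    """Greedy filter; aligned maps a name to an aligned sequence."""
    kept = {}
    for name, row in aligned.items():
        if all(identity(row, other) < threshold for other in kept.values()):
            kept[name] = row
    return kept


aligned = {
    "a": "ACDEFGHIKL",
    "b": "ACDEFGHIKL",  # identical to a
    "c": "ACDEFGHIKV",  # 90% identical to a
    "d": "WCDEYGHQKV",  # 60% identical to a
}
filtered = remove_redundant(aligned)
```

After filtering, it is usually worth recalculating the alignment, since the removed sequences may have influenced gap placement.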


Jalview Alignment Editor

Clamp, M., Cuff, J., Searle, S. M. and Barton, G. J. (2004), "The Jalview Java Alignment Editor", Bioinformatics, 20, 426-7.


Colouring your alignment

  • Hydrophobic / polar: hydrophobic vs. polar

  • Buried index: buried vs. surface

  • β-strand likelihood: probable vs. unlikely

  • Helix likelihood: probable vs. unlikely


Colouring your alignment

  • By conservation thresholds:


Colouring your alignment

  • Conservation index

Amino acid property classification scheme, e.g. Livingstone & Barton (1993)


Sequence Features


Check PDB Structures

  • Load MA with sequence(s) for known PDB structure

    • View >> Feature Settings >> Fetch DAS Features (wait...) OR

    • Right-click >> Associate Structure with Sequence >> Discover PDB ids (quicker)

  • Right-click sequence name >> View PDB Entry

  • Structure opens in new window – residues acquire MA colours

  • Highlight residues by hovering mouse over alignment or structure

  • Label residues by clicking on structure


Compare Alignment to Structure

  • Crucial way of checking alignment!

  • Where are gaps / insertions / deletions?

    • In secondary structures: bad

    • In surface loops: okay

  • Where are our key / functional residues?

    • Are they in the probable active site?

    • Check they are clustered

    • Check they are accessible, not buried
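
The gap check can be sketched in Python for the case where the secondary structure of one reference sequence is known. The input format is hypothetical: `ref_ss` holds one character per reference residue (`H` helix, `E` strand, `-` loop), for example reduced from a DSSP assignment. Columns are flagged when the other sequence has a deletion opposite a helix or strand residue, or an insertion between two residues of the same element.

```python
def gaps_in_secondary_structure(aligned_ref, ref_ss, aligned_other):
    """Alignment columns where an indel falls inside a helix or strand."""
    flagged = []
    pos = 0  # index of the next ungapped reference residue in ref_ss
    for col, (ref_res, other_res) in enumerate(zip(aligned_ref, aligned_other)):
        if ref_res == "-":
            # Insertion in the other sequence: look at the flanking residues.
            before = ref_ss[pos - 1] if pos > 0 else "-"
            after = ref_ss[pos] if pos < len(ref_ss) else "-"
            if other_res != "-" and before == after and before in "HE":
                flagged.append(col)
        else:
            # Deletion in the other sequence opposite a structured residue.
            if other_res == "-" and ref_ss[pos] in "HE":
                flagged.append(col)
            pos += 1
    return flagged


ref = "MKV-LAGT"
ss = "-HHHH--"       # one character per residue of the ungapped reference
other = "MKVQL-G-"
suspicious = gaps_in_secondary_structure(ref, ss, other)
```

In this example columns 3 and 5 fall inside the helix and deserve a second look, while the gap in the final loop column is not flagged.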


Demonstration and Practice

  • Start Jalview (click here)

  • Tools >> Preferences >>

    • Visual: select Maximise Window, unselect Quality, set Font Size to 8 or 9, Colour >> Clustal, uncheck Open File

    • Editing: check Pad Gaps When Editing

  • File >> Input Alignment >> from URL (use this one)

  • Get used to the controls – selecting and deselecting sequences/groups (drag mouse), dragging sequences/groups (use shift/ctrl), selecting sequence regions, hiding sequences/groups, removing columns and regions… Then explore menus and tools.

  • Now load this alignment – I’ve messed up a good alignment, and now I’d like you to correct it! There are two groups of sequences and one single sequence to adjust.


Demonstration and Practice

  • View >> Feature Settings >> DAS Settings

    • select Uniprot, dssp, cath, Pfam, PDBsum_ligands, PDBsum_DNAbinding, then click ‘Save as default’

    • click Fetch DAS Features (then click yes at prompt) ...

    • Move mouse over alignment and read information about features

    • Move mouse over sequence names to check for PDB ids

  • Open a PDB structure (choose any)

  • View >> uncheck Show All Chains, then use up-arrow key to increase structure size.

  • Hover mouse over structure (see how residues are highlighted in the sequence), then do same for sequence. Select residues in the structure by clicking them – a label will appear. Click again to remove label.

  • Check position of insertions & deletions using this method.