Multiple Sequence Alignment

Julie Thompson

Laboratory of Integrative Bioinformatics and Genomics

IGBMC, Strasbourg, France

julie@igbmc.fr


Multiple Sequence Alignment

  • Introduction: what is a multiple alignment?
  • Multiple alignment construction
    • Traditional approaches: optimal, progressive
    • Alignment parameters
    • Iterative and co-operative approaches
  • Multiple alignment analysis
    • Quality analysis/error detection
    • Conserved/homologous regions
  • Multiple alignment applications
What is a multiple alignment?

  • a representation of a set of sequences in which equivalent residues (e.g. functionally or structurally equivalent) are aligned in rows or, more usually, columns

Example: part of an alignment of SH2 domains from 14 sequences (lnk_rat, crk1_mouse, nck_human, ht16_hydat, pip5_human, fer_human, 1ab2, 1mil, 1blj, 1shd, 1lkkA, 1csy, 1bfi, 1gri)

* conserved identical residues

: conserved similar residues
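The `*` and `:` markers can be derived column by column. The following is a minimal Python sketch. It assumes gaps are written as `-` and uses an illustrative set of similarity groups (`SIMILAR_GROUPS`, modelled on the 'strong' groups seen in Clustal-style output). These groups are an assumption, not the exact rule of any particular program.

```python
# Illustrative similarity groups (an assumption modelled on Clustal-style
# 'strong' groups); a column scores ':' if all its residues fit in one group.
SIMILAR_GROUPS = ["STA", "NEQK", "NHQK", "NDEQ", "QHRK", "MILV", "MILF", "HY", "FYW"]


def conservation_line(alignment):
    """Return an annotation line for a list of equal-length aligned rows.

    '*' = the same residue in every sequence
    ':' = all residues belong to one similarity group
    ' ' = otherwise (including any column that contains a gap)
    """
    width = len(alignment[0])
    if any(len(row) != width for row in alignment):
        raise ValueError("all aligned sequences must have the same length")
    marks = []
    for col in zip(*alignment):
        residues = set(col)
        if "-" in residues:
            marks.append(" ")
        elif len(residues) == 1:
            marks.append("*")
        elif any(residues <= set(group) for group in SIMILAR_GROUPS):
            marks.append(":")
        else:
            marks.append(" ")
    return "".join(marks)


rows = ["WLDKVE", "WIEKAE", "WVDR-E"]
print(conservation_line(rows))  # prints "*::: *"
```

In this example, the column L/I/V falls in the MILV group, D/E falls in NDEQ and K/R falls in QHRK, so each of those columns is marked `:`. The column containing a gap is left blank.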

What is a multiple alignment?

[Figure: multiple alignment annotated with conserved residues, secondary structure and a conservation profile]

Multiple Alignment Construction
  • Optimal multiple alignment

Example: MSA (Lipman et al. 1989; Gupta et al. 1995)

Optimal multiple alignment

Extension of dynamic programming from 2 sequences to $k$ dimensions, one per sequence

Example: alignment of 3 sequences

Problem: calculation time and memory requirements

Time is proportional to $N^k$ for $k$ sequences of length $N$, which limits the method to fewer than 10 sequences

Alignment of 5 sulfate-binding proteins (length 224-263 residues):

MSA         > 12 hours
OMA         62.9 min
ClustalW    0.6 sec
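To show where the cost comes from, here is a sketch of optimal alignment of 3 sequences by 3-D dynamic programming under a sum-of-pairs score. The scores (`match=1`, `mismatch=-1`, `gap=-2`) and the function names are illustrative assumptions. Programs such as MSA use substitution matrices and affine gaps, and they prune the search space. Each cell of the $(N+1)^3$ cube examines $2^3 - 1 = 7$ predecessor moves, and this cell count is why time grows as $N^k$.

```python
from itertools import product


def sp_column(col, match=1, mismatch=-1, gap=-2):
    """Sum-of-pairs score of one alignment column ('-' marks a gap)."""
    score = 0
    for i in range(len(col)):
        for j in range(i + 1, len(col)):
            a, b = col[i], col[j]
            if a == "-" and b == "-":
                continue  # gap-gap pairs are scored 0
            if a == "-" or b == "-":
                score += gap
            elif a == b:
                score += match
            else:
                score += mismatch
    return score


def align3(s1, s2, s3):
    """Optimal sum-of-pairs alignment of 3 sequences by 3-D dynamic programming."""
    n1, n2, n3 = len(s1), len(s2), len(s3)
    neg = float("-inf")
    # best[i][j][k]: best score for the prefixes s1[:i], s2[:j], s3[:k]
    best = [[[neg] * (n3 + 1) for _ in range(n2 + 1)] for _ in range(n1 + 1)]
    best[0][0][0] = 0
    back = {}
    # 7 moves: each sequence either contributes a residue (1) or a gap (0)
    moves = [m for m in product((0, 1), repeat=3) if any(m)]
    for i in range(n1 + 1):
        for j in range(n2 + 1):
            for k in range(n3 + 1):
                for d1, d2, d3 in moves:
                    pi, pj, pk = i - d1, j - d2, k - d3
                    if min(pi, pj, pk) < 0:
                        continue
                    col = (s1[pi] if d1 else "-",
                           s2[pj] if d2 else "-",
                           s3[pk] if d3 else "-")
                    cand = best[pi][pj][pk] + sp_column(col)
                    if cand > best[i][j][k]:
                        best[i][j][k] = cand
                        back[(i, j, k)] = ((d1, d2, d3), col)
    # trace back from the corner where all three sequences are used up
    rows = ["", "", ""]
    i, j, k = n1, n2, n3
    while (i, j, k) != (0, 0, 0):
        (d1, d2, d3), col = back[(i, j, k)]
        rows = [c + r for c, r in zip(col, rows)]
        i, j, k = i - d1, j - d2, k - d3
    return best[n1][n2][n3], rows


print(align3("AC", "A", "AC"))  # (0, ['AC', 'A-', 'AC'])
```

The 3 nested loops over sequence positions become $k$ nested loops for $k$ sequences. Memory also grows as $N^k$, which is the second problem listed above.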


Multiple Alignment Construction

  • Optimal multiple alignment

MSA, OMA

  • Progressive multiple alignment

ClustalW (Thompson et al. NAR. 1994)

ClustalX (Thompson et al. NAR. 1997)

Progressive multiple alignment

Idea: progressively align pairs of sequences (or groups of sequences)

Problem: which sequences should be aligned first, and how is the order of alignment decided?

  • first align the most closely related sequences

How can the similarity of the sequences be measured?

  • align all the sequences pairwise
  • calculate the similarity between each pair from its alignment

Progressive multiple alignment

1) Pairwise alignments of all sequences

The alignment can be obtained by:

- a local or global method

- dynamic programming or a heuristic method (e.g. k-tuple count)

Ex: local pairwise alignments of globin sequences

Hbb_human 3 LTPEEKSAVTALWGKV..NVDEVGGEALGRLLVVYPWTQRFFESFGDLST ...
Hba_human 2 LSPADKTNVKAAWGKVGAHAGEYGAEALERMFLSFPTTKTYFPHF.DLS. ...

Hbb_human 1 VHLTPEEKSAVTALWGKVNVDEVGGEALGRLLVVYPWTQRFFESFGDLST ...
Hbb_horse 1 VQLSGEEKAAVLALWDKVNEEEVGGEALGRLLVVYPWTQRFFDSFGDLSN ...

Hba_human 2 LSPADKTNVKAAWGKVGAHAGEYGAEALERMFLSFPTTKTYFPHF.DLSH ...
Hbb_horse 3 LSGEEKAAVLALWDKVNEE..EVGGEALGRLLVVYPWTQRFFDSFGDLSN ...
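A heuristic based on k-tuple counts avoids full dynamic programming. One simple version scores 2 sequences by the fraction of short words they share. The sketch below is a hypothetical simplification for illustration. The function `ktuple_similarity` and the default `k=2` are assumptions, and this is not the exact fast pairwise method of any program.

```python
def ktuple_similarity(a, b, k=2):
    """Fraction of distinct k-tuples (words of length k) shared by a and b,
    relative to the smaller of the two sets of distinct k-tuples."""
    if min(len(a), len(b)) < k:
        return 0.0
    words_a = {a[i:i + k] for i in range(len(a) - k + 1)}
    words_b = {b[i:i + k] for i in range(len(b) - k + 1)}
    shared = words_a & words_b
    return len(shared) / min(len(words_a), len(words_b))


# N-terminal globin fragments from the example above (gap symbols removed)
hbb_human = "VHLTPEEKSAVTALWGKVNVDEVGGEALGRLLVVYPWTQRFFESFGDLST"
hbb_horse = "VQLSGEEKAAVLALWDKVNEEEVGGEALGRLLVVYPWTQRFFDSFGDLSN"
hba_human = "LSPADKTNVKAAWGKVGAHAGEYGAEALERMFLSFPTTKTYFPHFDLSH"
print(ktuple_similarity(hbb_human, hbb_horse))
print(ktuple_similarity(hbb_human, hba_human))
```

Because the score only counts word matches, it costs roughly linear time per pair. It ignores the order of the words, which is why it is only a rough guide to similarity.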

Progressive multiple alignment

2) Construction of a distance matrix

Example in ClustalW/X: distance between 2 sequences

$d = 1 - \frac{\text{no. identical residues}}{\text{no. aligned residues}}$

Ex: 7 globin sequences

                  1     2     3     4     5     6     7
1  Hbb_human      -
2  Hbb_horse    .17     -
3  Hba_human    .59   .60     -
4  Hba_horse    .59   .59   .13     -
5  Myg_phyca    .77   .77   .75   .75     -
6  Glb5_petma   .81   .82   .73   .74   .80     -
7  Lgb2_lupla   .87   .86   .86   .88   .93   .90     -
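The distance formula can be applied directly to a pair of aligned rows. In this sketch, the function name `identity_distance` is illustrative. It assumes that "aligned residues" means columns in which neither row has a gap character (`-` or `.`, as in the globin example).

```python
def identity_distance(row1, row2, gap_chars="-."):
    """Distance = 1 - (identical residues / aligned residues).

    Columns where either row has a gap character are not counted as aligned.
    """
    if len(row1) != len(row2):
        raise ValueError("aligned rows must have the same length")
    aligned = identical = 0
    for x, y in zip(row1, row2):
        if x in gap_chars or y in gap_chars:
            continue
        aligned += 1
        if x == y:
            identical += 1
    if aligned == 0:
        raise ValueError("no aligned residue pairs")
    return 1 - identical / aligned


print(identity_distance("AC-GT", "ACTGA"))  # 0.25
```

In this example 4 columns are aligned (the gap column is skipped) and 3 of them are identical, so $d = 1 - 3/4 = 0.25$.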

[Figure: guide trees with branch lengths for the 7 globin sequences (Hbb_human, Hbb_horse, Hba_human, Hba_horse, Myg_phyca, Glb5_petma, Lgb2_lupla), numbered in alignment order 1-6: progressive alignment following a guide tree vs. progressive alignment using sequential branching]

Progressive multiple alignment

3) Decide order of alignment

  • Sequential branching
  • Construction of a ‘guide tree’:
    • Neighbor-Joining (NJ)
    • UPGMA
    • Maximum likelihood
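UPGMA is the simplest of the guide-tree methods listed, so it makes a convenient illustration. The sketch below applies it to the 7-globin distance matrix. ClustalW itself builds its guide tree with Neighbor-Joining, so this is not ClustalW's tree. The function `upgma` and its output format are illustrative assumptions.

```python
def upgma(names, matrix):
    """UPGMA clustering; returns the list of merges in the order they happen.

    names: leaf labels; matrix[i][j]: distance between leaves i and j.
    Each merge is (left_subtree, right_subtree, distance), where subtrees
    are leaf names or nested tuples.
    """
    clusters = [(name, 1) for name in names]  # (subtree, number of leaves)
    d = [row[:] for row in matrix]
    merges = []
    while len(clusters) > 1:
        # closest pair of clusters (first pair found wins ties)
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                if best is None or d[i][j] < d[best[0]][best[1]]:
                    best = (i, j)
        i, j = best
        (ti, ni), (tj, nj) = clusters[i], clusters[j]
        merges.append((ti, tj, d[i][j]))
        # UPGMA: distance to the merged cluster is the size-weighted mean
        new_row = [(ni * d[i][k] + nj * d[j][k]) / (ni + nj)
                   for k in range(len(clusters))]
        for idx in (j, i):  # delete the larger index first
            del clusters[idx]
            del d[idx]
            del new_row[idx]
            for row in d:
                del row[idx]
        for row, value in zip(d, new_row):
            row.append(value)
        d.append(new_row + [0.0])
        clusters.append(((ti, tj), ni + nj))
    return merges


names = ["Hbb_human", "Hbb_horse", "Hba_human", "Hba_horse",
         "Myg_phyca", "Glb5_petma", "Lgb2_lupla"]
lower = [[], [.17], [.59, .60], [.59, .59, .13], [.77, .77, .75, .75],
         [.81, .82, .73, .74, .80], [.87, .86, .86, .88, .93, .90]]
matrix = [[0.0] * len(names) for _ in names]
for i, row in enumerate(lower):
    for j, value in enumerate(row):
        matrix[i][j] = matrix[j][i] = value

for left, right, dist in upgma(names, matrix):
    print(f"join {left} + {right} at {dist:.3f}")
```

With these distances, the 2 alpha chains join first (0.13), then the 2 beta chains (0.17), then the alpha and beta groups. Myg_phyca, Glb5_petma and Lgb2_lupla follow in that order. The merge order is the order in which sequences or profiles would be aligned.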

Progressive multiple alignment

4) Progressive multiple alignment

The sequences are aligned progressively (global or local algorithm) :

- alignment of 2 sequences

- alignment of 1 sequence and a profile (group of sequences)

- alignment of 2 profiles (groups of sequences)

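The third case, aligning 2 profiles, can be sketched as Needleman-Wunsch over columns instead of residues. In this sketch, 2 columns are scored by the average match/mismatch score over all cross pairs. Gaps inserted into a profile are inserted into every row, so earlier alignments are preserved. The scores (`match=1`, `mismatch=-1`, `gap=-1`) and the function names are illustrative assumptions. Real programs use substitution matrices, sequence weights and refined gap penalties.

```python
def column_score(col_a, col_b, match=1, mismatch=-1):
    """Average score over all residue pairs between two profile columns.

    Pairs that involve a gap ('-') contribute 0 in this simplified sketch.
    """
    total = 0
    for x in col_a:
        for y in col_b:
            if x == "-" or y == "-":
                continue
            total += match if x == y else mismatch
    return total / (len(col_a) * len(col_b))


def align_profiles(prof_a, prof_b, gap=-1):
    """Needleman-Wunsch alignment of two profiles (lists of aligned rows)."""
    cols_a, cols_b = list(zip(*prof_a)), list(zip(*prof_b))
    n, m = len(cols_a), len(cols_b)
    F = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        F[i][0] = i * gap
    for j in range(1, m + 1):
        F[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            F[i][j] = max(F[i - 1][j - 1] + column_score(cols_a[i - 1], cols_b[j - 1]),
                          F[i - 1][j] + gap,
                          F[i][j - 1] + gap)
    # traceback: a gap column is added to every row of the other profile
    merged, i, j = [], n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and F[i][j] == F[i - 1][j - 1] + column_score(cols_a[i - 1], cols_b[j - 1]):
            merged.append(cols_a[i - 1] + cols_b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and F[i][j] == F[i - 1][j] + gap:
            merged.append(cols_a[i - 1] + ("-",) * len(prof_b))
            i -= 1
        else:
            merged.append(("-",) * len(prof_a) + cols_b[j - 1])
            j -= 1
    merged.reverse()
    return ["".join(row) for row in zip(*merged)]


print(align_profiles(["ACGT", "ACGT"], ["AGT"]))  # ['ACGT', 'ACGT', 'A-GT']
```

A single sequence is simply a profile with one row, so the same function covers all 3 cases listed above.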

Progressive multiple alignment

Tree method   Global               Local
SB            multal               SBpima
NJ            clustalx
UPGMA         multalign, pileup
ML                                 MLpima

SB - sequential branching

UPGMA - Unweighted Pair Group Method with Arithmetic mean

ML - maximum likelihood

NJ - neighbor-joining

Alignment parameters: similarity matrices

Dynamic programming methods score an alignment using residue similarity matrices, which contain a score for every possible pair of residues

For nucleotide sequences:

Transitions (A-G or C-T) are more frequent than transversions (A-C, A-T, C-G or G-T), so in this matrix transitions are penalised less than transversions:

     A   C   G   T
A    2  -2  -1  -2
C   -2   2  -2  -1
G   -1  -2   2  -2
T   -2  -1  -2   2

More complex matrices exist in which matches between ambiguous nucleotides are scored whenever there is any overlap in the sets of nucleotides they represent
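The 4x4 nucleotide matrix can be used directly to score 2 ungapped, aligned DNA sequences. The dictionary encoding and the name `ungapped_score` are illustrative choices for this sketch.

```python
# Nucleotide similarity matrix from the table above: identities +2,
# transitions (A-G, C-T) -1, transversions -2.
DNA_MATRIX = {
    "A": {"A": 2, "C": -2, "G": -1, "T": -2},
    "C": {"A": -2, "C": 2, "G": -2, "T": -1},
    "G": {"A": -1, "C": -2, "G": 2, "T": -2},
    "T": {"A": -2, "C": -1, "G": -2, "T": 2},
}


def ungapped_score(seq1, seq2, matrix=DNA_MATRIX):
    """Sum of matrix scores over the positions of two equal-length sequences."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must have the same length")
    return sum(matrix[a][b] for a, b in zip(seq1, seq2))


print(ungapped_score("ACGT", "GCGT"))  # 5: one transition (A-G)
print(ungapped_score("ACGT", "TCGT"))  # 4: one transversion (A-T)
```

A single transition costs less than a single transversion, which reflects the higher observed frequency of transitions.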

Alignment parameters: similarity matrices

For proteins, a wide variety of matrices exist:

Identity, PAM, Blosum, Gonnet, etc.

Matrices are generally constructed by observing the mutations in large sets of alignments, either sequence-based or structure-based

Matrices range from strict ones, for comparing closely related sequences, to soft ones, for very divergent sequences.

e.g. PAM250 corresponds to an evolutionary distance of 250 accepted point mutations per 100 residues, or approximately 80% residue divergence

PAM1 corresponds to about 1% divergence

Alignment parameters: similarity matrices

A single best matrix does not exist!

  • Altschul (1991) suggests PAM250 for related sequences, PAM120 when the sequences are not known to be related and PAM40 to search for short segments of highly similar sequences.
  • Henikoff and Henikoff (1993) suggest Blosum62 as a good all-round matrix, Blosum45 for more divergent sequences and Blosum100 for strongly related sequences.
  • By default, ClustalW automatically selects a suitable matrix depending on the observed pairwise % identity (ID):
    • ID > 35%: Gonnet 80
    • 25% < ID < 35%: Gonnet 250
    • ID < 25%: Gonnet 350
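The selection rule can be written as a small function. The name `clustalw_gonnet_choice` is hypothetical and is not part of ClustalW's interface. The slide does not state what happens at exactly 35% or 25%, so assigning those values to the lower band is an assumption of this sketch.

```python
def clustalw_gonnet_choice(percent_identity):
    """Pick a Gonnet matrix from observed pairwise % identity.

    Thresholds follow the list above; values of exactly 35 or 25 are
    assigned to the lower band (an assumption, not stated on the slide).
    """
    if percent_identity > 35:
        return "Gonnet 80"
    if percent_identity > 25:
        return "Gonnet 250"
    return "Gonnet 350"


for pid in (60, 30, 15):
    print(pid, clustalw_gonnet_choice(pid))
```

The strictest matrix is used for the most similar pairs, and progressively softer matrices are used as identity falls.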

Alignment parameters: gap penalties

  • A gap penalty is a cost for introducing gaps into the alignment, corresponding to insertions or deletions in the sequences

SFGDLSNPGAVMG

HF-DLS-----HG

  • proportional gap costs charge a fixed penalty for each residue aligned with a gap, so the cost of a gap is proportional to its length:

$\mathrm{GAP\_COST} = uk$, where $k$ is the length of the gap

  • ‘affine’ gap costs define a cost for introducing or ‘opening’ a gap, plus a length-dependent ‘extension’ cost:

$\mathrm{GAP\_COST} = v + uk$, where $v$ is the gap opening cost and $u$ is the gap extension cost
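Both gap cost schemes can be evaluated for the example row `HF-DLS-----HG`. In this sketch, the values `v=10` and `u=1` are arbitrary illustrative choices, and the function names are hypothetical.

```python
import re


def gap_lengths(row):
    """Lengths of the runs of consecutive gap characters ('-') in an aligned row."""
    return [len(run) for run in re.findall(r"-+", row)]


def proportional_gap_cost(row, u):
    """GAP_COST = u*k for each gap of length k."""
    return sum(u * k for k in gap_lengths(row))


def affine_gap_cost(row, v, u):
    """GAP_COST = v + u*k for each gap of length k."""
    return sum(v + u * k for k in gap_lengths(row))


row = "HF-DLS-----HG"
print(gap_lengths(row))                   # [1, 5]
print(proportional_gap_cost(row, u=1))    # 6
print(affine_gap_cost(row, v=10, u=1))    # 26
print(affine_gap_cost("HFDLS------HG", v=10, u=1))  # 16: one gap of length 6
```

Under proportional costs, 2 gaps of lengths 1 and 5 cost the same as 1 gap of length 6. Under affine costs, the single longer gap is cheaper. This is why affine costs favour fewer, longer gaps.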

Alignment parameters: gap penalties

HLTPEEKSAVTALWGKVN--VDEVGGEALGRLLVVYPWTQRFFESFGDL

QLSGEEKAAVLALWDKVN--EEEVGGEALGRLLVVYPWTQRFFDSFGDL

VLSPADKTNVKAAWGKVGAHAGEYGAEALERMFLSFPTTKTYFPHFDLS

VLSAADKTNVKAAWSKVGGHAGEYGAEALERMFLGFPTTKTYFPHFDLS

  • ClustalW uses position-specific gap penalties to make gaps more or less likely at different positions in the alignment
  • Gap penalties are lowered at existing gaps and increased near existing gaps
  • Gap penalties are lowered in hydrophilic stretches
  • Otherwise, gap opening penalties are adjusted according to the observed relative frequency of each residue type adjacent to gaps (Pascarella & Argos, 1992)

The goal is to introduce gaps in sequence segments corresponding to flexible regions of the protein structure


Multiple Alignment Construction

  • Optimal multiple alignment

MSA, OMA

  • Progressive multiple alignment

ClustalW, ClustalX

  • Iterative multiple alignment

PRRP (Gotoh, 1993)

SAGA (Notredame et al. NAR. 1996)

DIALIGN (Morgenstern et al. 1999)

HMMER (Eddy 1998), SAM (Karplus et al. 2001)

Iterative refinement

PRRP (Gotoh, 1993) refines an initial progressive multiple alignment by iteratively dividing the alignment into 2 profiles and realigning them.

[Figure: flowchart: initial alignment (global progressive) → divide sequences into 2 groups (profile 1, profile 2) → pairwise profile alignment → refined alignment → converged? If no, repeat]

Genetic Algorithms

SAGA (Notredame et al. 1996) evolves a population of alignments in a quasi-evolutionary manner, iteratively improving the fitness of the population

Segment-to-segment alignment

Dialign (Morgenstern et al. 1996) compares segments of sequences instead of single residues

1. Construct dot-plots of all possible pairs of sequences (sequence i vs. sequence j)

2. Find a maximal set of consistent diagonals in all the sequences

.......aeyVRALFDFngndeedlpfkKGDILRIrdkpeeq...............WWNAedsegkr.GMIPVPYVek..........

........nlFVALYDFvasgdntlsitKGEKLRVlgynhnge..............WCEAqtkngq..GWVPSNYItpvns.......

ieqvpqqptyVQALFDFdpqedgelgfrRGDFIHVmdnsdpn...............WWKGachgqt..GMFPRNYVtpvnrnv.....

gsmstselkkVVALYDYmpmnandlqlrKGDEYFIleesnlp...............WWRArdkngqe.GYIPSNYVteaeds......

.....tagkiFRAMYDYmaadadevsfkDGDAIINvqaideg...............WMYGtvqrtgrtGMLPANYVeai.........

..gsptfkcaVKALFDYkaqredeltfiKSAIIQNvekqegg...............WWRGdyggkkq.LWFPSNYVeemvnpegihrd

.......gyqYRALYDYkkereedidlhLGDILTVnkgslvalgfsdgqearpeeigWLNGynettgerGDFPGTYVeyigrkkisp..

3. Local alignment: residues between the diagonals are not aligned (shown in lower case in the example above)

Multiple alignment methods

                        Global               Local
Progressive   SB        multal               SBpima
              NJ        clustalx
              UPGMA     multalign, pileup
              ML                             MLpima
Iterative               prrp                 dialign
              Genetic Algo.  saga
              HMM       hmmt

Comparison of programs

League table based on the BAliBASE benchmark database

[Figure: league table ranking multalign, multal, pileup, clustalx, prrp, saga, hmmt, MLpima, SBpima and dialign (global/local, progressive/iterative) on BAliBASE Reference 1 (< 6 sequences; < 100 residues, > 400 residues, all), Reference 2 (a family with an orphan), Reference 3 (several sub-families), Reference 4 (long N/C-terminal extensions) and Reference 5 (long insertions); some entries N/A]

  • Iterative algorithms can improve alignment quality, but can be slow
  • Global algorithms work well when sequences are homologous over their full lengths, local algorithms are better for non-colinear sequences

Thompson et al. 1999


Multiple Alignment Construction

  • Optimal multiple alignment

MSA, OMA

  • Progressive multiple alignment

ClustalW, ClustalX

  • Iterative multiple alignment

PRRP, SAGA, DIALIGN, HMMER, SAM

  • Co-operative multiple alignment
    • T-COFFEE (Notredame et al. 2000) http://igs-server.cnrs-mrs.fr/Tcoffee/
    • DbClustal (Thompson et al. 2000) http://www-igbmc.u-strasbg.fr/BioInfo/
    • MAFFT (Katoh et al. 2002) http://www.biophys.kyoto-u.ac.jp/~katoh/programs/align/mafft/
    • MUSCLE (Edgar, 2004) http://www.drive5.com/muscle
    • Probcons (Do et al. 2005)
    • Kalign (Lassmann et al. 2005)
DbClustal

http://bips.u-strasbg.fr/PipeAlign/

[Figure: query sequence → Blast database search → database hits (domains A, B, C) → Ballast anchors → DbClustal alignment]

MAFFT
  • Local homologous segments are detected using a Fast Fourier Transform
  • Pairwise alignments are performed using restricted global dynamic programming
  • The multiple alignment is built up using a progressive algorithm, similar to ClustalW
  • The multiple alignment is then iteratively refined by dividing it into 2 parts and realigning them
MAFFT

Pairwise alignments

1. Fast Fourier Transform to detect local conserved segments [Figure: correlation c(k) plotted against offset k, with k = -1 and k = 2 marked]

K=2
GLWGKAAAEEEGLWLFF--
--KGVFGAEQEGLFVFFGG

K=-1
-GLWGKAAAEEEGLWLFF
KGVFGAEQEGLFVFFGG-

2. Segment-level dynamic programming to select ‘consistent’ segments

3. Fix residues at the centre of each segment pair and realign between fixed points (white regions only)
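The quantity c(k) on the slide measures how well 2 sequences match when one is shifted by an offset k. Below is a direct (non-FFT) sketch under simplifying assumptions. Residues are compared by identity only. The offset convention (`s1[i]` against `s2[i + k]`) is our own and may not match the figure's K. As we understand Katoh et al. (2002), MAFFT converts residues to physico-chemical values (volume and polarity) and uses an FFT to evaluate the correlation for all offsets at once. That is what makes this step fast compared with the $O(L_1 L_2)$ loop below.

```python
def identity_correlation(s1, s2):
    """c(k) for every offset k, comparing s1[i] with s2[i + k].

    Here c(k) counts identical residue pairs at offset k, computed directly
    in O(len(s1) * len(s2)) time.
    """
    c = {}
    for k in range(-(len(s1) - 1), len(s2)):
        c[k] = sum(1 for i in range(len(s1))
                   if 0 <= i + k < len(s2) and s1[i] == s2[i + k])
    return c


c = identity_correlation("AAGLW", "GLW")
best_k = max(c, key=c.get)
print(best_k, c[best_k])  # -2 3
```

Offsets with high c(k) indicate candidate homologous segments. These would then be passed to the segment-level dynamic programming step.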

State-of-the-art

[Figure: results of ClustalW (1994), Dialign (1996), Mafft (2002) and Probcons (2005) on BAliBASE 3: Ref 11 (< 20% ID), Ref 12 (20-40% ID), Ref 2 (orphan), Ref 3 (sub-families), Ref 4 (extensions), Ref 5 (insertions)]

  • Co-operative algorithms have led to significant improvements, but none of the methods currently available can produce high-quality alignments for all test cases

Thompson et al. 2005, 2006

RNA alignment methods
  • Comparison using ‘BRAliBASE’ RNA structure alignments (Gardner et al, 2005)
  • Some more recent methods:
    • Sequence: R-Coffee (Wilm, 2008), MAFFT (Katoh, 2008)
    • Structure: LARA (Bauer, 2007), FoldalignM (Torarinsson, 2007), SCARNA (Tabei, 2008)
  • Above 60% identity, sequence-based and structure-based approaches have similar scores
  • Algorithms incorporating structural information outperform pure sequence methods. However, these algorithms are computationally demanding which severely limits their use in practice.
DNA alignment methods
  • Complete genomes
    • Local alignments (BlastZ, MultiZ, MUMmer,…)
    • Global alignments (MGA, Multi-LAGAN, MAVID, MAUVE, MAP2, Mulan, …)

Reviewed in Dewey and Pachter, Human Molecular Genetics, 2006


Multiple alignment analysis

  • Are the sequences correctly aligned?
    • Quality analysis: alignment objective functions (SP, NorMD)
    • error detection and correction (RASCAL, Refiner)
  • Are the sequences in the alignment homologous?
    • Conserved/homologous regions (MCOFFEE, LEON)
    • Conserved (functional) residues
Objective functions

Sum-of-pairs (Carrillo & Lipman, 1988): sum of the scores for all pairs of sequences

Example (Blosum62):

     N    C
N    6   -3
C   -3    9

Sequence 1   N N N
Sequence 2   N N N
Sequence 3   N N C
Sequence 4   N C C

Seq1-2   3 pairs N-N                          3×6 = 18
Seq1-3   2 pairs N-N, 1 pair N-C              2×6 + (-3) = 9
Seq1-4   1 pair N-N, 2 pairs N-C              6 + 2×(-3) = 0
Seq2-3   2 pairs N-N, 1 pair N-C              2×6 + (-3) = 9
Seq2-4   1 pair N-N, 2 pairs N-C              6 + 2×(-3) = 0
Seq3-4   1 pair N-N, 1 pair N-C, 1 pair C-C   6 + (-3) + 9 = 12

Sum-of-pairs score = 18 + 9 + 0 + 9 + 0 + 12 = 48
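The same sum-of-pairs total can be computed programmatically. This sketch holds only the 3 Blosum62 entries used in the example. The dictionary `BLOSUM62_SUBSET` and the function `sum_of_pairs` are illustrative names, and gaps are not handled.

```python
from itertools import combinations

# The Blosum62 entries needed for this example (N-N = 6, N-C = -3, C-C = 9)
BLOSUM62_SUBSET = {("N", "N"): 6, ("N", "C"): -3, ("C", "N"): -3, ("C", "C"): 9}


def sum_of_pairs(alignment, matrix):
    """Sum of substitution scores over every pair of sequences and every column.

    Gaps are not handled in this sketch.
    """
    total = 0
    for row1, row2 in combinations(alignment, 2):
        total += sum(matrix[(a, b)] for a, b in zip(row1, row2))
    return total


print(sum_of_pairs(["NNN", "NNN", "NNC", "NCC"], BLOSUM62_SUBSET))  # 48
```

For $n$ sequences there are $n(n-1)/2$ pairs, here 6, matching the 6 rows of the worked example.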

  • Information content (Hertz et al, 1999)
    • Entropy column scores (between 0 and 1), summed over all columns in the alignment
  • norMD (Thompson et al, 2001)
    • Column scores
    • Normalisation for the sequence set to be aligned (number, length, similarity)
    • < 0.3: bad alignment
    • 0.3-0.7: some local errors
    • > 0.7: good alignment
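An entropy-based column score in the range 0 to 1 can be sketched as 1 minus the Shannon entropy of the column, divided by its maximum for a 20-letter alphabet. This normalisation and the function names are assumptions for illustration. It is neither the exact information-content formula of Hertz et al. nor norMD, both of which also take background frequencies or residue similarity into account.

```python
import math
from collections import Counter


def column_conservation(column, alphabet_size=20):
    """1 - H / log(alphabet_size), where H is the Shannon entropy of the
    residue frequencies in the column (gaps '-' are ignored).
    1.0 = fully conserved column; close to 0 = residues evenly spread."""
    residues = [r for r in column if r != "-"]
    if not residues:
        return 0.0
    n = len(residues)
    entropy = -sum((c / n) * math.log(c / n) for c in Counter(residues).values())
    return 1 - entropy / math.log(alphabet_size)


def alignment_conservation(alignment):
    """Sum of the column scores over all columns of the alignment."""
    return sum(column_conservation(col) for col in zip(*alignment))


print(alignment_conservation(["WLDK", "WIDK", "WVEK"]))
```

Fully conserved columns (W and K here) each contribute 1. Variable columns contribute less, so a higher total indicates a more conserved alignment of the same length.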
Objective functions: NorMD

[Figure: NorMD profiles (scale 0.5 to 1.0) along an alignment of archaeal/eukaryotic GluRS + GlnRS and bacterial GluRS sequences (labels 1gln, 1exd), computed with window lengths of 8 and 40; the ‘HIGH’, H8 and ‘KMSKS’ motifs are marked]

Error detection and correction

  • RASCAL (Thompson et al, 2003), Refiner (Chakrabarti et al, 2006)

RASCAL:

Define sequence groups with the Secator program (Wicker N. et al. 2001)

Define core blocks: regions with average NorMD_sw above a specified threshold

Calculate a Gribskov profile for each block in each group

Error detection and correction

  • RASCAL, errors within core blocks

[Figure: metalloprotease example with the HExxH motif]

Error detection and correction

  • RASCAL, errors between core blocks

[Figure: methyltransferase example with the DxxxG[AST]GxF[ILV] motif]

Homology detection methods

  • Sequence percent identity:
    • > 30% identity: sequences are homologous
    • 15-30% identity: ‘twilight zone’
  • Local analysis of positional conservation
    • AL2CO (Pei, Grishin, 2001), SEGID (Wang, Zu, 2003), NorMD
  • Conserved regions
    • LEON (Thompson et al, 2004), MCOFFEE (Moretti et al, 2007)
Homology analysis with LEON

  • vertical analysis: sequence clustering, intermediate sequences
  • horizontal analysis: residue conservation, motif context information
  • composition analysis: prediction of compositionally biased segments
  • Homologous regions are delineated
  • Removal of sequences non-homologous to the query
Homology analysis with LEON

Query sequence: DKK1_HUMAN

[Figure: BlastP results for DKK1_HUMAN]

[Figure: LEON analysis of the BlastP hits, showing the Pfam domains (Dickkopf N-terminal domain, Colipase, Colipase C-terminal domain) and the sequence groups dkk1, dkk2, dkk3, prokineticin/intestinal toxin and lipase protein cofactor]

Structural proteomics: target characterisation

Detection of structural homologs for targets in the SPINE (Structural Proteomics in Europe) project

For a training set of 510 potential targets:

Method                          No. of targets with at least 1 PDB neighbour
BlastP ($E < 10^{-7}$)          142 (28%)
BlastP ($E < 10^{-4}$)          166 (33%)
PipeAlign (BlastP $E < 10$)     196 (38%)
PipeAlign (PDB-Blast)           223 (44%)

Conserved residue analysis
  • Active site residues are under evolutionary pressure to maintain their functional integrity and undergo fewer mutations than less functionally important amino acids
  • Methods:
    • Evolutionary trace (Lichtarge et al, 1996): sequence conservation patterns in homologous proteins are mapped onto the protein surface to generate clusters identifying functional interfaces
Conserved residue analysis
  • Comparison of sequence-based methods
  • FRcons combines information on:
    • conservation at each site
    • amino acid distribution
    • predicted secondary structure (ss)
    • predicted relative solvent accessibility (rsa)

FRcons: Fischer et al. Bioinformatics 2008

OrdAli: Ordered Alignment Analysis

Color scheme:

  • residues conserved in all sequences in family
    • structural or functional importance: characteristic motifs
  • residues conserved within a sub-group of sequences
    • discriminant residues
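The 2 categories of the color scheme can be illustrated by labelling each column as conserved in the whole family or conserved only within a sub-group. This sketch uses residue identity only. The function `classify_columns`, its input format and the example data are hypothetical, and OrdAli's actual criteria may differ.

```python
def classify_columns(alignment, groups):
    """Label each alignment column.

    alignment: dict sequence name -> aligned row ('-' = gap)
    groups: dict group name -> list of sequence names
    Returns 'family' if every sequence has the same residue, otherwise a
    comma-separated list of the groups in which the column is fully
    conserved ('' if none).
    """
    names = list(alignment)
    width = len(alignment[names[0]])
    labels = []
    for pos in range(width):
        column = {name: alignment[name][pos] for name in names}
        residues = set(column.values())
        if len(residues) == 1 and "-" not in residues:
            labels.append("family")
            continue
        conserved_in = [
            group for group, members in groups.items()
            if len({column[m] for m in members}) == 1 and column[members[0]] != "-"
        ]
        labels.append(",".join(conserved_in))
    return labels


alignment = {"euc1": "KDA", "euc2": "KDS", "bac1": "KEA", "bac2": "KET"}
groups = {"Euc": ["euc1", "euc2"], "Bac": ["bac1", "bac2"]}
print(classify_columns(alignment, groups))  # ['family', 'Euc,Bac', '']
```

The second column is conserved within each group but differs between groups (D vs. E), which is the pattern expected of discriminant residues.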
Schematic alignment of aspartyl-tRNA synthetases

  • universal proteins that play a key role in translation

[Figure: schematic alignment of eukaryotic (Euc), archaeal (Arc) and bacterial (Bac) aspartyl-tRNA synthetases, showing the anticodon-binding domain, catalytic cores I and II, the insertion domain, motifs I, II and III and the flipping loop; residues are colored as family conserved, Archaea+Bacteria or Archaea+Eukaryote]

PipeAlign: automatic protein analysis

http://www-igbmc.u-strasbg.fr/PipeAlign/

[Figure: PipeAlign workflow linking a BlastP search for the query sequence, Ballast anchors, DbClustal alignment (MACS: Multiple Alignment of Complete Sequences), RASCAL (RASCALED MACS), NorMD quality, LEON homologous regions (LMS: local maximum segments) and Secator/DPC clusters, feeding phylogeny, conserved residues/domains, 2D/3D structure prediction and cellular location prediction]

  • Secator/DPC: automatic clustering algorithms

Plewniak et al. (2000) Bioinformatics. Thompson et al. (2000) Nucl Acids Res. Thompson et al. (2001) J Mol Biol. Thompson et al. (2003) Bioinformatics. Thompson et al. (2004) Nucl Acids Res. Wicker et al. (2001) Mol Biol Evol. Wicker et al. (2002) Nucl Acids Res.

Multiple sequence alignment editors

No automatic method is 100% reliable: manual verification and refinement are essential!

SeqLab (GCG Wisconsin Package)

SeaView (Galtier et al, 1996) http://pbil.univ-lyon1.fr/software/seaview.html UNIX/Linux, Windows 95+, Mac OS 8, 9, X

Web servers:

GeneAlign (Kurukawa) http://www.gen-info.osaka-u.ac.jp/geneweb2/genealign/

Jalview (Clamp, 1998) http://www.ebi.ac.uk/~michele/jalview/

CINEMA (Lord et al, 2002) http://www.bioinf.man.ac.uk/dbbrowser/cinema-mx


Central role of multiple alignments

[Figure: multiple alignment of eukaryotic (euk), bacterial (bac) and archaeal (arc) sequences linked to domain, structure and conserved, functional sites]

The multiple alignment is central to:
  • Phylogenetic studies
  • Comparative genomics
  • Hierarchical function annotation: homologs, domains, motifs
  • Gene identification, validation
  • Structure comparison, modelling
  • Interaction networks
  • RNA sequence, structure, function
  • Human genetics, SNPs
  • Therapeutics, drug design

[Figure: therapeutics/drug discovery example showing an insertion domain, DBD and LBD, with binding sites/mutations]

Example: protein-RNA complexes

[Figure: ASP tRNA (cloverleaf representation, Westhof et al, 1988; amino acid acceptor stem, anticodon loop and stem), ASP tRNA synthetase (global alignment of euk, arc and bac sequences: anticodon-binding domain, hinge region, catalytic domain, eukaryotic extension) and aspRS-tRNA interactions (AspRS in complex with tRNAAsp, Cavarelli et al, 1993; Ruff et al, 1991)]

Aspartate determinants are conserved in prokaryotes and eukaryotes (Becker et al, 1996)

Example: Bardet-Biedl Syndrome

Identification of new genes responsible for BBS, a rare autosomal recessive genetic disease, probably caused by a defect at the basal body of ciliated cells

Phenotypes: obesity, retinopathy, polydactyly, mental retardation, hypogonadism, renal failure

9 genes are known to be involved: BBS1-BBS9

In a comparative genomics study, Li et al. (2004) identified 688 genes implicated in cilia and flagella

  • Clinical studies have identified a candidate chromosomal region of 8 Mb with approx. 23 genes, including 4 genes from the set of 688

Multiple alignment based analysis identified a new gene (BBS10) with a chaperonin-like fold

[Figure: global alignment of BBS10 and BBS6 with chaperonin sequences (euk, arc, bac), showing insertions 1-3 and a deletion]

The BBS10 gene shows a high frequency of mutation (~20% of patients)

J. Muller et al 2006