Edward N. Trifonov

Edward N. Trifonov University of Haifa and Masaryk University, Brno - PowerPoint PPT Presentation





Edward N. Trifonov

University of Haifa and

Masaryk University, Brno

Thrill of linking polymer statistics

and sequence space

with protein structure and function

Oak Ridge, 2009


Two related sequences,

aligned

33% match

Q816J5

DVNLPKFDGFYWCRQIRHESTCPIIFISARAGEMEQIMAIESGADDYITKPFHYDVVMAKIKGQLRR

|||||-|||----|--|--|----------------------||||---|||------|-----|||

DVNLPGIDGWDLLRRLRERSSARVMMLTGHGRLTDKVRGLDLGADDFMVKPFQFPELLARVRSLLRR

Q7DCC5


Methyltransferases

LEVALALSQADIIVRDALVS  Q8UBQ7
|  | || ||| || ||||
LHAANALRQADVIVHDALVN  Q92P47
| |   |  ||||||||||
LRAQRVLMEADVIVHDALVP  Q8YEV9
||| | ||||||||||||||
LRAHRLLMEADVIVHDALVP  Q98GP6
|   ||| |||||
LKGQRLLQEADVILYADSLV  Q8DLD2
 ||||   ||||| || |||
IKGQRIVKEADVIIYAGSLV  Q8REX7
 ||||      |||||||||
VKGQRLIRQCPVIIYAGSLV  Q88HF0
| |  ||  |||  ||||||
VRGRDLIAACPVCLYAGSLV  Q8UBQ5


No-match relatives

LEVALALSQADIIVRDALVS Q8UBQ7

VRGRDLIAACPVCLYAGSLV Q8UBQ5
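
The walk above can be checked mechanically. The sketch below counts identical positions between consecutive 20-residue fragments of the methyltransferase walk and between its two end fragments. The helper name `matches` and the data layout are illustrative assumptions, not part of the original analysis.

```python
def matches(a: str, b: str) -> int:
    """Count identical residues at the same positions of two equal-length fragments."""
    if len(a) != len(b):
        raise ValueError("fragments must have the same length")
    return sum(x == y for x, y in zip(a, b))


# Methyltransferase walk as listed on the slide (accession, 20-residue fragment).
walk = [
    ("Q8UBQ7", "LEVALALSQADIIVRDALVS"),
    ("Q92P47", "LHAANALRQADVIVHDALVN"),
    ("Q8YEV9", "LRAQRVLMEADVIVHDALVP"),
    ("Q98GP6", "LRAHRLLMEADVIVHDALVP"),
    ("Q8DLD2", "LKGQRLLQEADVILYADSLV"),
    ("Q8REX7", "IKGQRIVKEADVIIYAGSLV"),
    ("Q88HF0", "VKGQRLIRQCPVIIYAGSLV"),
    ("Q8UBQ5", "VRGRDLIAACPVCLYAGSLV"),
]

# Matches for each step of the walk (neighbouring fragments).
steps = [matches(a[1], b[1]) for a, b in zip(walk, walk[1:])]

# Matches between the first and the last fragment of the walk.
end_to_end = matches(walk[0][1], walk[-1][1])

print(steps)
print(end_to_end)
```

Every step of the walk shares at least 9 of 20 positions, while the two end fragments share none.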


Response regulators

CPIIFISARAGEMEQIMAIE  Q816J5
 ||||||||   | | ||||
VPIIFISARDSDMDQVMAIE  Q97IX4
|| ||||||| | | |   |
VPVIFISARDADIDRVLGLE  O32192
||  | ||||  ||||||||
VPILFLSARDEEIDRVLGLE  Q89D26
 ||  | || || | |||||
IPIIMLTARSEEFDKVLGLE  Q8R9H7
  | ||||||   ||| |||
SRIMMLTARSRLADKVRGLE  Q88RT2
 | ||||   || ||||||
ARVMMLTGHGRLTDKVRGLD  Q7DCC5

CPIIFISARAGEMEQIMAIE  Q816J5

ARVMMLTGHGRLTDKVRGLD  Q7DCC5

No-match relatives


Even the most advanced existing

sequence alignment techniques

(e.g. BLAST)

would not be able to qualify

such fully dissimilar sequence fragments

as relatives

unless many intermediate sequences

are analyzed

(that amounts to a whole research project)


To be related, sequences do not have to be similar (up to complete mismatch)


single walk

network

(of relatives)


One can make long

walks

from fragment to fragment in the

formatted protein sequence space

(sequence fragments of the same length, 20 residues,

gathered from all or many proteomes)

Pairwise-connected matching fragments also form

networks
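
A minimal sketch of how such networks could be assembled: fragments are linked when their fraction of identical positions reaches a threshold, and each connected component is one network. The union-find bookkeeping, the function names and the 0.6 threshold are assumptions chosen for illustration; the talk does not describe its implementation.

```python
from itertools import combinations


def identity(a: str, b: str) -> float:
    """Fraction of identical positions between two equal-length fragments."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


def networks(fragments, threshold):
    """Group fragments into connected components of the 'match >= threshold' graph."""
    parent = list(range(len(fragments)))

    def find(i):
        # Follow parent links to the root, compressing the path on the way.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(range(len(fragments)), 2):
        if identity(fragments[i], fragments[j]) >= threshold:
            parent[find(i)] = find(j)

    groups = {}
    for i, frag in enumerate(fragments):
        groups.setdefault(find(i), []).append(frag)
    return sorted(groups.values(), key=len, reverse=True)


# Seven response-regulator fragments from the walk shown earlier,
# plus one methyltransferase fragment as an unrelated control.
fragments = [
    "CPIIFISARAGEMEQIMAIE", "VPIIFISARDSDMDQVMAIE", "VPVIFISARDADIDRVLGLE",
    "VPILFLSARDEEIDRVLGLE", "IPIIMLTARSEEFDKVLGLE", "SRIMMLTARSRLADKVRGLE",
    "ARVMMLTGHGRLTDKVRGLD",
    "LEVALALSQADIIVRDALVS",
]

nets = networks(fragments, threshold=0.6)
print([len(n) for n in nets])
```

The two ends of the walk land in the same network although they share no identical position, because each is linked to the other through intermediate fragments.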


Networks of fragments of aa-tRNA synthetases

at various thresholds of sequence match

Aa-tRNA synthetase

module of lepA

Panels: A tyr, trp; B met; C arg, trp; D cys; E leu; F met, leu, ile, val; G ile; H lepA


Network of GTP binding proteins

←GTP-binding

module of lepA

Sequence fragments with the same function

are found in the same network


1mh1 Rac (GTP-binding) (Homo sapiens)

2                      26
QAIKCVVVGDGAVGKTCLLISYTTN
        |    ||   |
AGDVISIIGSSGSGKSTFLRCINFL
31                     55

1b0ua ATP-binding subunit of the histidine permease {Salmonella typhimurium}


1 Putative peptidoglycan bound protein
2 Collagen adhesion protein
3 Ribosomal protein L11
4 Penicillin-binding protein 2x
5 Penicillin-binding protein 1
6 Penicillin-binding protein 2A
7 D-alanyl-D-alanine carboxypeptidase
8 Cytochrome
9 Beta-lactamase
10 Mannitol-1-phosphate 5-dehydrogenase
11 Glutaminase
12 Beta-lactamase
13 Esterase EstB

13 Esterase EstB

Fragments of the same network

have, essentially, the same structure.

Peripheral fragments may be different


New definition of sequence relatedness:

Sequence fragments of the same network

in the sequence space

are relatives

They may be rather different sequence-wise.

Yet, their functions and structures

are, essentially, the same


Every fragment is tagged (protein, species)

It is also uniquely located in its family network.

The size of the network says

how many relatives the fragment has

Thus, one can take a sequence

and for all fragments of it

find their networks and plot the sizes
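
The procedure described above can be sketched as a sliding-window lookup. The table `network_size` is a hypothetical stand-in for the pre-calculated sequence space (fragment mapped to the size of its network); its single entry and the value 7 are made up for illustration.

```python
def module_map(sequence, network_size, k=20):
    """Return, for every k-residue window of the sequence, the size of its network.

    Windows that are absent from the lookup table are given size 0.
    """
    return [network_size.get(sequence[i:i + k], 0)
            for i in range(len(sequence) - k + 1)]


# Fragment of Q816J5 shown at the start of the talk.
protein = "DVNLPKFDGFYWCRQIRHESTCPIIFISARAGEMEQIMAIESGADDYITKPFHYDVVMAKIKGQLRR"

# Hypothetical lookup table; a real one would cover the whole sequence space.
network_size = {"CPIIFISARAGEMEQIMAIE": 7}

profile = module_map(protein, network_size)

# Index of the window with the largest network (first one on ties).
peak = max(range(len(profile)), key=profile.__getitem__)
print(len(profile), peak, profile[peak])
```

Plotting `profile` against the window position would give the kind of map the talk refers to, with peaks where a window belongs to a large network.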


This generates the maps of modules

from which the protein is built

for example:

Modules of histidine permease, ATP binding subunit

(ABC transporter)


ABC transporters

GPS (Aleph)   LTA (Dalet)   LSG, LAD (Beth)   IYV (Zayin)

(36) GPSGSGKsTmL (38) fVFQqfnLiPlLTALENV (40) QLSGGQQQRVAIARAL (6) iLADEPTgALD (22) vvVTHDi (30)  1F3O

(32-72) GPSGSGKTTLL (29-41) MVFQNYALFPHLTALENV (31-42) QLSGGQQQRVAIARAL (6) LLADEPTSALD (21-22) IYVTHDQ (28-263)  consensus





When long sequences are compared

it is worth first identifying

which segments are more informative.

This is done by

mapping the modules.


Specific functions of individual module types

are largely not known yet.

Since, however, they represent widespread, conserved

and, thus, functional motifs,

their individual roles will eventually have to be elucidated.

One peculiar class of modules is the so-called "silent modules",

which have only a few relatives in the sequence space, if any.


[Figure: silent modules 1, 2 and 3, each shown together with modules A and D]

A                                 silent modules 1-3      D
IVLLVGPSGSGKTTLLRALAGLLGPDGG                              RRGIGMVFQEYALFPHLTVLENVALGL
     | ||||| | ||    |  |  |                              |    ||||   |  | ||||||
VISIIGSSGSGKSTFLRCINFLEKPSEGSIVVNGQTINLVRDKDGQLKVADKNQLRLLRTRLTMVFQHFNLWSHMTVLENVMEAP  1
     | || || | || |  || || |          || |      |  |           ||||   |  ||||  |
FMILLGPSGCGKTTTLRMIAGLEEPSRG---QIYIGDRLVADPEKGIFVPPK------DRDIAMVFQSYALYPHMTVYDNIAFPL  2
|    ||||||| | ||||||||    |          |          ||        |   |||||||||||  |  |  | |
FVVFVGPSGCGKSTLLRMIAGLETITSG---------DLFIGEKRMNDTPPA------ERGVGMVFQSYALYPHLSVAENMSFGL  3

The silent modules appear to maintain

3D structural relationships between functional modules


The list of modules revealed in the map

for a given protein sequence,

with reference to corresponding

(characterized) networks

of the pre-calculated sequence space

provides full annotation of the protein


Protein sequence characterization via networks in the sequence space

does not require

gap penalties,

nor substitution matrices,

nor statistics of alignment.

Every sequence fragment of interest may belong

to one and only one network.


Descriptive definition of protein modules:

Their sequences are represented by networks

in the protein sequence space -

a separate network (or group of related networks) for each module.

Each module has its own unique structure.

Typically, these are closed loops with a contour length of 25-30 residues.

Apart from the general activity ascribed to the protein that harbors a given module,

each module type has its own specific function.

Individual modules, even of the same type, often differ in sequence.

Their evolution from ancestral prototypes

may be traced along walks and networks in the sequence space.


Examples of

evolutionary paths


KVALVGRSGSGKTTVTSLLM

FIAVEGIDGAGKTTLAKSLS

GxxxxGKT - Walker A motif

(NTP binding)


MOST COMMON

PROTEIN SEQUENCE MODULES (PROTOTYPES)

Aleph GEIVLLVGPSGSGKTTLLRALAGLLGPDGG

Beth LSGGQRQRVAIARALALEPKLLLLDEPTSALD

Gimel DVVVIGAGGAGLAAALALARAGAKVVVVE

Dalet RRGIGMVFQEYALFPHLTVLENVALGL

Heh PVIMLTARGDEEDRVEALLEAGADDYLTKPF

Vav LLGLSKKEARERALELLELVGLEEKADRYP

Zayin LLLKLLKELGLTVLLVTHDLEEA

Berezovsky et al. 2000-2003

The underlined motifs are omnipresent


Omnipresent 6-9-mers of 15 prokaryotes from different phyla

ALEPH ATP/GTP binding

1 HVDHGKTTL

2 GPPGTGKT

3 GHVDHGKT

4 GSGKTTLL

5 IDTPGHV

6 GPSGSGK

7 PTGSGKT

8 NGSGKTT

9 GKSTLLN

10 SGSGKT

11 TGSGKS

12 PGVGKT

13 PNVGKS

14 GVGKTT

15 GTGKTT

16 DHGKST

17 GKTTLA

18 GKTTLV

19 KSTLLK

BETH ATPases of ABC

transporters

20 QRVAIARAL

21 LSGGQQQRV

22 LADEPT

23 TLSGGE

Other omni:

24 FIDEID

25 KMSKSL

26 WTTTPWT

27 NADFDGD

Omnipresence is a new measure of sequence conservation.

These elements are the most conserved ones,

presumably coming from the last common ancestor
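
As a rough sketch of the omnipresence criterion, a k-mer counts as omnipresent when it occurs in at least one protein of every proteome considered. The function names and the three tiny "proteomes" below are assumptions for illustration: each toy proteome is a single fragment taken from the slides above, not a real proteome.

```python
def kmers(seq, k):
    """All distinct k-residue words of a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def omnipresent(proteomes, k):
    """k-mers that occur in at least one protein of every proteome.

    proteomes: dict mapping a proteome name to a list of protein sequences.
    """
    common = None
    for proteins in proteomes.values():
        present = set().union(*(kmers(p, k) for p in proteins))
        common = present if common is None else common & present
    return common or set()


# Toy stand-ins for proteomes (one fragment each, taken from the slides).
toy = {
    "proteome_1": ["GEIVLLVGPSGSGKTTLLRALAGLLGPDGG"],
    "proteome_2": ["FMILLGPSGCGKTTTLRMIAGLEEPSRG"],
    "proteome_3": ["FVVFVGPSGCGKSTLLRMIAGLETITSG"],
}

print(omnipresent(toy, 4))
```

On these three fragments the only shared 4-mer is GPSG; the talk applies the same idea to 6-9-mers across 15 complete prokaryotic proteomes.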


Many of the 27 omnipresent elements

do not match one another

(e. g. WTTTPWT and QRVAIARAL)

yet, they turn out to belong to the same network.


ALEPH and BETH

reconstructed

from overlapping omnipresent motifs

turn out to be relatives,

though they do not match:

IDTPGHVDHGKTTLLN ALEPH

|

TLSGGQQQRVAIARAL BETH

They both belong to the 10% monster network.

All 27 omnipresent elements belong to the same network


10% MONSTER network (10^7 fragments)


Sequence space based

evolutionary tree of omnipresent elements


All 27 omnipresent LUCA motifs

originate from one prototype sequence, which is:

(now skipping two separate two-hour lectures)

Ala Ala Ala Ala Gly Ala Ala Gly Gly Ala Gly Gly Gly Gly

encoded in

GCC GCC GCC GCC GGC GCC GCC GGC GGC GCC GGC GGC GGC GGC

which is self-complementary:

GCC GCC GCC GCC GGC GCC GCC GGC GGC GCC GGC GGC GGC GGC

GCC GCC GCC GCC GGC GCC GCC GGC GGC GCC GGC GGC GGC GGC
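
The self-complementarity claim can be verified directly: a duplex encodes the same message in both strands when the sequence equals its own reverse complement. The sketch below uses only the two codon assignments needed here (GCC for Ala, GGC for Gly, as in the standard genetic code); the variable names are illustrative.

```python
# DNA complement table; reversing the complemented string gives the other strand read 5'->3'.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna: str) -> str:
    return dna.translate(COMPLEMENT)[::-1]


codons = "GCC GCC GCC GCC GGC GCC GCC GGC GGC GCC GGC GGC GGC GGC".split()
gene = "".join(codons)

# Only the two codons that occur in the prototype gene.
CODE = {"GCC": "Ala", "GGC": "Gly"}
peptide = [CODE[c] for c in codons]

is_self_complementary = reverse_complement(gene) == gene
print(" ".join(peptide))
print(is_self_complementary)
```

Because GCC and GGC are reverse complements of each other, reading the opposite strand swaps Ala and Gly codons and reverses their order, which for this particular arrangement reproduces the same 14-residue peptide.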


The very first gene

was a short duplex,

encoding the same thing in both strands


TO CONCLUDE:

Proteins are made

from standard size modules

of many types.

Each type has its unique structure and function,

but highly variable sequence

All current protein science turns inside out:

The protein world is a world of modules


Every breakthrough that opens new vistas

also removes the ground

from under the feet of other scientists.

The scientific joy of those who have seen the new light

is accompanied by the dismay

of those whose way of life has been changed for ever.

Fersht A, Nature Rev Mol Cell Biol, 2008


Major references:

Papers of

Igor N. Berezovsky,

Zakharia M. Frenkel,

Yehoshua Sobolevsky

and E.N.T.

2006-2009


THANKS TO

Networks - Zakharia M. Frenkel

University of Haifa

Omnipresent motifs - Yehoshua Sobolevsky

University Minas Gerais, Brazil

Modules – closed loops – Igor N. Berezovsky

University of Bergen, Norway

AND TO THE AUDIENCE

Support by:

Israeli Science Foundation,

Center of Complexity Science, and

Masaryk University, Brno


Changing gears:

Reconstruction of evolutionary history

of the triplet code (Trifonov 2000-2003) suggests that

the earliest protein sequences could be presented

in the binary alphabet of two types of amino acids –

those encoded by xYx triplets (Ala family, A) and

those encoded by xRx triplets (Gly family, G).


EVOLUTION OF THE TRIPLET CODE

E. N. Trifonov, December 2007, Chart 101

Consensus temporal order of amino acids:

Gly Ala Asp Val Ser(UCX) Pro Glu Leu(CUX) Thr Arg(CGX) Ser(AGY) TRM(UGX) Arg(AGR) Ile Gln Leu(UUY) TRM(UAX) Asn Lys His Phe Cys Met Tyr Trp Sec Pyl

Step  CONSECUTIVE ASSIGNMENT OF 64 TRIPLETS  CODON CAPTURE
1     GGC-GCC
2     GAC-GUC
3     GGA-UCC
4     GGG-CCC
5     (gag) GAG-CUC
6     GGU-ACC
7     GCG-CGC
8     GCU-AGC
9     GCA-ugc                                UGC
10    CCG-CGG
11    CCU-AGG
12    CCA-ugg                                UGG
13    UCG-CGA
14    UCU-AGA
15    UCA-UGA                                UGA
16    ACG-CGU
17    ACU-AGU
18    ACA-ugu                                UGU
19    GAU-AUC
20    GUG-cac                                CAC
21    CUG-CAG
22    aug-cau                                CAU, AUG
23    GAA-uuc                                UUC
24    GUA-uac                                UAC
25    CUA-UAG                                UAG
26    GUU-AAC
27    CUU-AAG
28    CAA-UUG
29    AUA-uau                                UAU
30    AUU-AAU
31    UUA-UAA
32    uuu-AAA                                UUU

aa "age":

17 17 16 16 15 14 13 13 12 11 10 9 8 7 6 5 4 3 2 1


The conclusion about two alphabets

is strongly supported by respective

rearrangements of substitution matrices:

Columns: A F I L M P T V | C D E G H K N Q R W Y

Ala alphabet
A  1 1 | 1 4
F  |
I  1 1 3 |
L  1 3 1 |
M  1 3 1 |
P  1 |
T  1 |
V  3 1 1 |

Gly alphabet
C  |
D  | 3 2 1
E  | 3 1 2
G  1 |
H  | 2 3 1
K  | 1 2
N  | 2 1 2 1
Q  | 1 2 3 1
R  | 1 2 1 1
W  | 1 2
Y  4 | 2

Rearranged PAM120 substitution matrix

(original matrix in Altschul SF, JMB 219, 555, 1991)


Columns: A F I L M P T V | C D E G H K N Q R W Y

Ala alphabet
A  |
F  | 1 3
I  2 1 3 |
L  2 2 1 |
M  1 2 1 |
P  |
T  |
V  3 1 1 |

Gly alphabet
C  |
D  | 2 1
E  | 2 1 2
G  |
H  | 1 2
K  | 1 1 2
N  | 1 1
Q  | 2 1 1
R  | 2 1
W  1 | 2
Y  3 | 2 2

Rearranged BLOSUM substitution matrix

(original matrix in Henikoff S, Henikoff JG, PNAS 89, 10915, 1992)


Rewriting a modern amino acid sequence in binary form

would suggest

what the ancestral form of that sequence was,

all the way back to the original alanines and glycines only
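
A minimal sketch of the rewriting step. The family membership is read off the row and column order of the rearranged matrices shown above (A F I L M P T V versus C D E G H K N Q R W Y). Serine does not appear in those matrices; leaving it unchanged is an assumption of this sketch, as are the names `ALA_FAMILY`, `GLY_FAMILY` and `to_binary`.

```python
ALA_FAMILY = set("AFILMPTV")      # amino acids encoded by xYx triplets
GLY_FAMILY = set("CDEGHKNQRWY")   # amino acids encoded by xRx triplets


def to_binary(seq: str) -> str:
    """Rewrite a protein sequence in the two-letter alphabet A (Ala family) / G (Gly family).

    Residues outside both families (e.g. Ser) are copied unchanged.
    """
    out = []
    for aa in seq:
        if aa in ALA_FAMILY:
            out.append("A")
        elif aa in GLY_FAMILY:
            out.append("G")
        else:
            out.append(aa)
    return "".join(out)


aleph = to_binary("IDTPGHVDHGKTTLLN")   # ALEPH motif from the omnipresent elements
beth = to_binary("TLSGGQQQRVAIARAL")    # BETH motif from the omnipresent elements
print(aleph)
print(beth)
```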


In binary form ALEPH and BETH are rather similar

AGAAGGAGGGGAAAAG
  ++-+-+++++++++
  AASGGGGGGAAAAGAA

Compare to

IDTPGHVDHGKTTLLN
  +
  TLSGGQQQRVAIARAL


According to the same theory

(reconstruction of evolutionary history of the triplet code)

the earliest proteins were encoded in both strands of gene duplexes,

so that the xYx codons of one strand

would be complementary to xRx codons of the other strand.

Remarkably, the above ALEPH and BETH are, indeed, complementary:

ALEPH AGAAGGAGGGGAAAAG

|||||||||||-

BETH AASGGGGGGAAAAGAA
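
One reading that reproduces the `|||||||||||-` pattern above is antiparallel pairing: the binary ALEPH string, starting at its fifth letter, is paired with the binary BETH string read backwards, with A pairing with G. The orientation and the offset of 4 are assumptions found by inspection; the slide does not state them. The names `PAIR` and `pairing_marks` are illustrative.

```python
# In the binary alphabet, an xYx codon (A) is complementary to an xRx codon (G).
PAIR = {"A": "G", "G": "A"}


def pairing_marks(top: str, bottom: str, offset: int) -> str:
    """Pair top[offset:] with the reversed bottom string, position by position.

    '|' marks a complementary pair (A with G), '-' anything else.
    """
    reversed_bottom = bottom[::-1]
    return "".join(
        "|" if PAIR.get(x) == y else "-"
        for x, y in zip(top[offset:], reversed_bottom)
    )


aleph = "AGAAGGAGGGGAAAAG"   # binary form of IDTPGHVDHGKTTLLN
beth = "AASGGGGGGAAAAGAA"    # binary form of TLSGGQQQRVAIARAL

marks = pairing_marks(aleph, beth, offset=4)
print(marks)
```

Under these assumptions 11 of the 12 paired positions are complementary.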


The two most widespread modules, ALEPH and BETH, apparently

represent the earliest duplex gene, which encoded

in the earliest past two vitally important activities

involved in energy supply (ATP binding and ATPase).

Today the module ALEPH is located in a variety of enzymes

that require ATP, including the most ancient ones:

1. ABC transporters,

2. cell division proteins (proteases),

3. initiation and

4. elongation translation factors.

Other most ancient enzymes are

5. RNA polymerase and

6. Aminoacyl-tRNA synthetase


Proteases (cell division proteins FtsH)

GPP (Aleph)   FVE   FID

(197) LLVGPPGTGKTLLARAVAGEA (7) SGSDFVELFVGVGAARVRD (9) PCIVFIDEIDAVGR (10)  2CEA

(146-463) LLVGPPGTGKTLLARAVAGEA (7) SGSDFVEMFVGVGASRVRD (9) PCIIFIDEIDAVGR (7-11)  consensus

DER   RPG

DEREQTLNQLLVEMDGF (8) MAATNRPDILDPALLRPGRFDKK (297)  2CEA

DEREQTLNQLLVEMDGF (8) IAATNRPDxLDPALLRPGRFDRQ (95-415)  consensus

- another example of the omnipresent cassette


Omnipresent cassette of RNA polymerases

FAT   NEK   NLL

(529) VDGGRFATSDLNDLYRRLINRNNRLK (12) RNEKRMLQEAVDAL (27) GKQGRFRQNLLGKRVDYSGRSVIVVGP  2A6E

(224-518) LDGGRFATSDLNDLYRRVINRNNRLK (12) RNEKRMLQEAVDAL (25-27) GKQGRFRQNLLGKRVDYSGRSVIVVGP  consensus

VLL   NAD

(62) KVVLLNRAPTLHRLGIQAF (18) AFNADFDGDQMAVH (776)  2A6E

(59-84) HPVLLNRAPTLHRLGIQAF (18) AFNADFDGDQMAVH (131-961)  consensus


60% match threshold networks:

320,000 proteins from 120 prokaryotes, ~100,000,000 fragments

The largest (monster) network: 9,368,905 sequence fragments (~10% of all)

Next largest: 2,535 fragments

Networks of sizes 120 to 2,535 fragments (several thousand, 3.8% of all fragments)

Small networks cover 86% of the space

35% of fragments are single, no relatives