Protein Analysis 5

Structure prediction

Function prediction

From sequence to structure

  • comparative modelling

  • prediction of a 1-dimensional state (class) from sequence

  • recognition of the 3-dimensional structure from a given library (fold recognition)

  • ab initio prediction of the 3-dimensional structure


The “Folding Problem”

Two parts:

(1) The “Search Problem”

Is the true structure one of my 2 million guesses?

Fragment assembly

(2) The “Discrimination Problem”

If it’s one of these 2 million, which one is it?

Empirical pseudopotential


Rosetta

(1) A stone with three ancient languages on it.

(2) A program (David Baker) that simulates the folding of a protein, using statistical energies and moves.

Fold prediction – Rosetta method

  • Knowledge based scoring function

Bayes' law:

P(structure|sequence) = P(structure) * P(sequence|structure) / P(sequence)


P(sequence|structure) = f(residue contacts in native structures)

near-native structures


sequence-consistent local structure

P(structure) = probability of a protein-like structure (no clashes, globular shape)

Simons et al. (1997)
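As a rough illustration of how the Bayesian score can rank candidate structures: P(sequence) is the same for every candidate, so comparing log P(structure) + log P(sequence|structure) is enough. The decoy names and probabilities below are invented for the example and do not come from the Rosetta scoring function.

```python
import math

def log_posterior_score(p_structure, p_seq_given_structure):
    """Unnormalized log P(structure|sequence); the constant P(sequence) is dropped."""
    return math.log(p_structure) + math.log(p_seq_given_structure)

# Hypothetical decoys: (P(structure), P(sequence|structure)), illustrative numbers only.
decoys = {
    "decoy_a": (0.20, 0.010),
    "decoy_b": (0.05, 0.100),
    "decoy_c": (0.50, 0.001),
}

# Rank decoys from most to least probable given the sequence.
ranked = sorted(decoys, key=lambda name: log_posterior_score(*decoys[name]), reverse=True)
best = ranked[0]
```

Working in log space avoids numerical underflow when many small probabilities are multiplied.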

Collection of putative backbone conformations

Protein sequence

Library of small segments




For each window of 9 residues:

look up the 25 closest (sequence) neighbours in the library



Simons et al. (1997)
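The window lookup can be sketched as follows. Plain residue identity is used here as the similarity measure, which is a simplifying assumption; the function names and the toy library are hypothetical.

```python
def identity(a, b):
    """Number of positions at which two equal-length segments carry the same residue."""
    return sum(x == y for x, y in zip(a, b))

def closest_fragments(sequence, library, window=9, k=25):
    """For each window of the sequence, return the k most similar library segments."""
    result = {}
    for start in range(len(sequence) - window + 1):
        query = sequence[start:start + window]
        ranked = sorted(library, key=lambda seg: identity(query, seg), reverse=True)
        result[start] = ranked[:k]
    return result

# Toy library of hypothetical 9-residue segments (a real library is far larger).
library = ["ACDEFGHIK", "ACDEFGHIL", "MNPQRSTVW", "ACDAAAAAA"]
hits = closest_fragments("ACDEFGHIKL", library, k=2)
```

Each window position ends up with its own list of candidate segments, which later serve as the moveset for that position.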

Intermediates are not observed, but

Folding is 2-state



Nucleation sites

something happens first...

Early folding events might be recorded in the database

Short, recurrent sequence patterns could be folding initiation sites

recurrent part


Non-homologous proteins

Nature has selected for these patterns because they speed folding.

I-sites motifs

Type-I hairpin

Diverging type-2 turn

Serine hairpin

Frayed helix

Alpha-alpha corner

Glycine helix N-cap

Proline helix C-cap

Backbone angles: ψ=green, φ=red

Amino acids arranged from non-polar to polar


Fragment insertion Monte Carlo

backbone torsion angles

  • Choose fragment from moveset

  • Change backbone angles

  • Convert angles to 3D coordinates

  • Accept or reject
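A minimal sketch of the fragment insertion move itself, assuming the conformation is stored as a list of (phi, psi) pairs and the moveset maps a window start to its candidate fragments. Both data layouts and all names are assumptions for illustration; the conversion to 3D coordinates and the accept/reject step are left out.

```python
import random

def insert_fragment(angles, fragment, start):
    """Return a copy of the (phi, psi) list with `fragment` pasted in at `start`."""
    new_angles = list(angles)
    new_angles[start:start + len(fragment)] = fragment
    return new_angles

def propose_move(angles, moveset, rng):
    """Pick a random window position and one of its candidate fragments."""
    start = rng.choice(sorted(moveset))
    fragment = rng.choice(moveset[start])
    return insert_fragment(angles, fragment, start)

# Toy chain of 5 residues in an extended conformation; hypothetical moveset.
angles = [(-120.0, 120.0)] * 5
moveset = {0: [[(-60.0, -45.0)] * 3], 2: [[(-60.0, -45.0)] * 3]}
trial = propose_move(angles, moveset, random.Random(0))
```

The move returns a new list rather than editing in place, so a rejected trial leaves the current conformation untouched.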


Backbone angles are restrained in I-sites regions

regions of high-confidence I-sites prediction

backbone torsion angles


Fragments that deviate from the paradigm (>90° in φ or ψ) are removed from the moveset.

Generally, about one-third of the sequence has an I-sites prediction with confidence > 0.75, and is restrained.
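The restraint can be sketched as a filter over the moveset. The 90° cut-off comes from the slide; the data layout, the names, and the example angles are assumptions. In practice the filter would be applied only to regions whose I-sites confidence exceeds 0.75.

```python
def angle_diff(a, b):
    """Smallest absolute difference between two angles in degrees (handles wrap-around)."""
    d = abs(a - b) % 360.0
    return min(d, 360.0 - d)

def filter_fragments(fragments, paradigm, max_dev=90.0):
    """Keep fragments whose phi and psi stay within max_dev of the paradigm at every residue."""
    kept = []
    for frag in fragments:
        ok = all(
            angle_diff(phi, p_phi) <= max_dev and angle_diff(psi, p_psi) <= max_dev
            for (phi, psi), (p_phi, p_psi) in zip(frag, paradigm)
        )
        if ok:
            kept.append(frag)
    return kept

paradigm = [(-60.0, -45.0), (-60.0, -45.0)]      # hypothetical helical paradigm
fragments = [
    [(-70.0, -40.0), (-55.0, -50.0)],            # close to the paradigm: kept
    [(-120.0, 130.0), (-60.0, -45.0)],           # psi off by 175 degrees: removed
    [(170.0, -45.0), (-60.0, -45.0)],            # phi off by 130 degrees: removed
]
kept = filter_fragments(fragments, paradigm)
```

The wrap-around handling matters because torsion angles are periodic: 170° and -170° are only 20° apart.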


Sequence dependent features

Sequence-independent features

vector representation

Probabilities from the database

Current structure

The energy score for a contact between secondary structures is summed using database statistics.

MC-SA optimization

  • for each random position

    • pick a random neighbour

    • replace backbone conformation

    • calculate probability of new structure

  • MC: Monte-Carlo

    • accept up-hill moves with a certain probability that depends on temperature

  • SA: simulated annealing

    • Gradual cooling of temperature: first allow many changes, later fewer changes

Simons et al. (1997)
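The MC-SA loop can be sketched on a toy problem. The Metropolis criterion and the geometric cooling schedule are common choices assumed here, not details given on the slide; the integer vector and its sum-of-squares energy merely stand in for a conformation and its score.

```python
import math
import random

def metropolis_accept(delta_e, temperature, rng):
    """Always accept downhill moves; accept uphill moves with probability exp(-dE/T)."""
    if delta_e <= 0:
        return True
    return rng.random() < math.exp(-delta_e / temperature)

def anneal(state, energy, propose, rng, t_start=2.0, t_end=0.01, steps=2000):
    """Simulated annealing with a geometric cooling schedule (assumed schedule)."""
    cooling = (t_end / t_start) ** (1.0 / steps)
    temperature = t_start
    current_e = energy(state)
    for _ in range(steps):
        trial = propose(state, rng)
        trial_e = energy(trial)
        if metropolis_accept(trial_e - current_e, temperature, rng):
            state, current_e = trial, trial_e
        temperature *= cooling          # gradual cooling: fewer uphill moves later
    return state, current_e

def energy(state):
    """Toy energy: sum of squares, minimal at the all-zero vector."""
    return sum(x * x for x in state)

def propose(state, rng):
    """Change the value at one random position by +1 or -1."""
    trial = list(state)
    i = rng.randrange(len(trial))
    trial[i] += rng.choice((-1, 1))
    return trial

start_state = [5, -4, 3]
final_state, final_e = anneal(start_state, energy, propose, random.Random(1))
```

At high temperature many uphill moves pass, which lets the search leave local minima; as the temperature falls the search settles into a low-energy state.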


  • Small molecules: ok

  • Proteins with mostly α-helices: ok

  • Proteins with mostly β-sheets: not so ok

Simons et al. (1997)


What needs to be fixed?


8% of the residues in the targets have φ > 0.

44% of these are at glycine residues.

7% of the residues in the predictions have φ > 0, but only 16% of these are at glycine residues.

Contact order

True structure: 0.252

Predictions: 0.119
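The slide does not define contact order. Assuming the usual relative contact order (the mean sequence separation of contacting residue pairs divided by chain length), a sketch of the calculation looks like this. The contact lists are invented to show why a prediction dominated by local contacts scores lower.

```python
def relative_contact_order(contacts, length):
    """Mean sequence separation of contacting residue pairs, divided by chain length."""
    if not contacts:
        return 0.0
    total_separation = sum(abs(i - j) for i, j in contacts)
    return total_separation / (length * len(contacts))

# Hypothetical 20-residue chains: mostly local contacts versus long-range contacts.
local_contacts = [(1, 4), (2, 5), (6, 9), (10, 13)]
long_range_contacts = [(1, 18), (2, 17), (3, 16), (6, 9)]
co_local = relative_contact_order(local_contacts, 20)
co_long = relative_contact_order(long_range_contacts, 20)
```

A low value for the predictions relative to the true structures suggests the predicted models contain too few contacts between residues far apart in sequence.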

Prediction algorithms have underlying principles

Darwin = protein evolution.

Principle: Proteins that evolved from common ancestor have the same fold.

Boltzmann = protein folding

Principle: Proteins search conformational space, minimizing the free energy (empirical pseudo-potential)

Determining gene function

  • phenotype

  • biochemical activity (in vitro)

  • expression

  • GO, Gene Ontology

    • molecular function

    • biological process

    • subcellular localization

Homology → same function?

Paralogy: the result of gene duplication

Alternative splicing: one gene, many proteins

Pleiotropy: one gene, many functions

Redundancy: one function, many genes

Heteromery: complex formation

“Crosstalk”: signalling pathways affect one another

Protein functional shifts are common

  • COG0044

    • Dihydroorotase

      • CAD (fusion protein)

    • Dihydropyrimidinase

    • D-hydantoinase

    • Allantoinase

    • Rudimentary protein (involved in developmental programs)

COG0044 functions

Urease superfamily functions

Fast evolution ~ functional shift

rat lung isoform

rat liver isoform

functional shift

CYP2 family (cytochrome P450)

“Druggable genome”

  • Property filters

    • Likelihood of functional shift

    • Degree and nature of paralogy

    • Factors reflecting pleiotropy

      • Size

      • Breadth of expression

      • Interaction potential

      • Evolutionary rates

Function transfer

  • Nearest neighbour (closest homolog)

    • e.g. BLAST search

    • Phylogenetic nearest neighbour

  • Post-genomic methods

    • independent of homology

    • Comparison of protein-protein interactions

      • Guilt By Association

      • Pattern recognition

Function transfer

  • Hypothetical sequence → function?

    • Characterized homolog

      • BLAST / PSI-BLAST

    • Phylogeny!

      • evolutionary rate depends on the family

      • multiple sequence alignment

  • Erroneous function assignments propagate in databases!

    • Wrong function

      • associated with a domain that does not occur in the query sequence

    • Wrong homology inference

    • Overly detailed function description

      • change of function during evolution

      • biochemical vs. physiological function

        • e.g. eukaryote-specific functions cannot occur in a bacterium

    • Sequence alignment

      • conservation of functional amino acids

      • example:

        • atrazine chlorohydrolase vs. melamine deaminase: 4 mutations (98 % identity)

    • E.g. GO flags the source of a function assignment

Guilt by association

  • Prediction of subcellular localization based on classification of neighbours
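One simple way to classify by neighbours is a majority vote over the annotated interaction partners. This is a sketch under that assumption; the network, the labels, and the function name are hypothetical.

```python
from collections import Counter

def predict_by_neighbours(protein, interactions, annotations):
    """Return the most common annotation among annotated interaction partners, or None."""
    labels = [annotations[p] for p in interactions.get(protein, ()) if p in annotations]
    if not labels:
        return None
    return Counter(labels).most_common(1)[0][0]

# Hypothetical interaction network and localization labels.
interactions = {"X": ["A", "B", "C", "D"], "Y": ["D"], "Z": ["Q"]}
annotations = {"A": "nucleus", "B": "nucleus", "C": "cytoplasm", "D": "membrane"}

prediction_x = predict_by_neighbours("X", interactions, annotations)
```

A protein with no annotated partners gets no prediction, which is safer than guessing from an empty neighbourhood.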

Non-homology protein identification using network context

Query pattern

Ref: Lappe M, Park J, Niggemann O, Holm L (2001) Bioinformatics Suppl 1, S149-S156

Natural selection

  • Functional coupling leads to correlations

    • E.g. co-occurrence of sets of genes in species

  • Residues required for molecular function

    • Functional conservation above general sequence divergence of a family

Pancreatic trypsin inhibitor (2ptc)


  • Evolutionary Trace

    • Lichtarge et al. 1996

  • Sequence Space

    • Casari et al. 1995

  • Ortholog / paralog discriminants

    • Mirny & Gelfand 2003

Evolutionary Trace

  • The branchpoints separating subclades of a phylogenetic tree can specify molecular speciation events, and hence evolutionary selection of amino acids

  • Map trace residues to 3D structures
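A minimal sketch of finding class-specific positions, assuming the alignment has already been split into subclades and using strict invariance within each subclade as the criterion. The sequences and names are hypothetical, and columns conserved across the whole family are deliberately not reported here.

```python
def class_specific_positions(subclades):
    """Columns invariant within every subclade but differing between subclades."""
    n_cols = len(next(iter(subclades.values()))[0])
    positions = []
    for col in range(n_cols):
        consensus = []
        for seqs in subclades.values():
            residues = {s[col] for s in seqs}
            if len(residues) != 1:
                break                        # variable inside a subclade: skip column
            consensus.append(residues.pop())
        else:
            if len(set(consensus)) > 1:      # differs between subclades
                positions.append(col)
    return positions

# Hypothetical aligned sequences, pre-split into two subclades of a tree.
subclades = {
    "clade_1": ["ACDKG", "ACDRG"],
    "clade_2": ["ACEKG", "ACEKG"],
}
positions = class_specific_positions(subclades)
```

Cutting the tree at a different rank changes the subclades and therefore the set of positions, so the analysis is usually repeated at several ranks.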

Evaluation of Evolutionary Trace

  • Trace residues determined at many ranks

    • Trace residue sets are nested

  • Test of significance of trace residue at any rank

    • Overlap with otherwise defined functional sites

      • Bound ligands in 3D structures (~20 residues)

      • Annotated sites (~4 residues)

ET assessment

  • Detects 3D clusters

  • Manual filtering and pruning of the data

    • Decide which subclades of the protein family to use in analysis

    • Exclude fragments

    • Original method was based on strict invariance within subclade

  • Automatic implementations

    • But manually optimized traces score higher

Sequence Space

  • Aligned protein sequences represented as vectors in a high-dimensional space

    • Each amino acid type at each column of the MSA is a unique point in Sequence Space

  • Dimension reduction by Principal Components Analysis

  • Cluster proteins

    • Based on their sequence identity

  • Map residues in the same space

    • Direction points to association with protein group
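The vector representation can be sketched with a one-hot encoding, one coordinate per (column, amino acid type). With this encoding the dot product of two sequence vectors counts their identical positions, which is what ties the geometry to sequence identity. The PCA step is omitted, and the alignment below is hypothetical.

```python
ALPHABET = "ACDEFGHIKLMNPQRSTVWY"

def encode(aligned_seq):
    """One-hot vector: one coordinate per (column, amino acid type); gaps stay all-zero."""
    vec = [0] * (len(aligned_seq) * len(ALPHABET))
    for col, aa in enumerate(aligned_seq):
        if aa in ALPHABET:
            vec[col * len(ALPHABET) + ALPHABET.index(aa)] = 1
    return vec

def dot(u, v):
    """Dot product; for one-hot sequence vectors this is the number of identical columns."""
    return sum(a * b for a, b in zip(u, v))

msa = ["ACDK", "ACDR", "GCEK"]               # hypothetical aligned sequences
vectors = [encode(s) for s in msa]
shared_01 = dot(vectors[0], vectors[1])
shared_02 = dot(vectors[0], vectors[2])
```

Principal components analysis would then be run on these vectors to obtain the low-dimensional axes used for clustering.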

PCA projection of the 3D object

New axes are linear combinations of original axes

1st axis represents the whole family

2nd, 3rd, …, 6th axes represent subclassifications

Subfamily-specific residues are found at the tips of a polygon

Common residues shared by several subfamilies are found along the edges of a polygon

Many unspecific residues at origin

Orthologs and paralogs

Use of model organisms: identical physiology?


  • Functional groupings of proteins

    • Phylogenetic lineage

      • Orthologs / paralogs

    • Clustering by general sequence similarity

  • Residues associated with above groupings

    • Intra-group conservation

    • Inter-group variation

    • Neutral residues behave randomly

Function = interactions

  • Protein-protein interactions

  • Co-evolution of interacting proteins

  • Comparative genomics

Experimental methods

  • Y2H = yeast-two-hybrid

    • Ex vivo, binary interactions

    • Interaction must occur in the nucleus

    • Autoactivation (5-10 % of random ORFs)

    • Posttranslational modifications

  • AP/MS = affinity purification / Mass Spectrometry

    • Purified complexes

  • PChips = protein microarrays

    • In vitro

    • Covalent attachment to solid support

    • Screening with fluorescently labelled probes (e.g. proteins or lipids)

New Scientist, 13 April 2002: David Cohen on the work by Barabási, Albert et al.

Predicting interactions

  • co-evolution

  • genome comparison

    • gene order on the chromosome

    • phylogenetic profiles

    • gene fusion

Co-evolution

  • multiple sequence alignment, look for correlated mutations

  • proteins with many interactions evolve more slowly

  • two phylogenetic trees, find the pairs
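One common way to look for correlated mutations is the mutual information between two alignment columns. The slide does not name a measure, so this choice is an assumption, and the alignment is invented.

```python
import math
from collections import Counter

def mutual_information(col_a, col_b):
    """Mutual information (in bits) between two alignment columns of equal length."""
    n = len(col_a)
    pa, pb = Counter(col_a), Counter(col_b)
    pab = Counter(zip(col_a, col_b))
    mi = 0.0
    for (a, b), count in pab.items():
        p_joint = count / n
        mi += p_joint * math.log2(p_joint / ((pa[a] / n) * (pb[b] / n)))
    return mi

# Hypothetical alignment: columns 0 and 1 change together, column 2 varies independently.
msa = ["AKL", "AKV", "GEL", "GEV"]
columns = list(zip(*msa))
mi_01 = mutual_information(columns[0], columns[1])
mi_02 = mutual_information(columns[0], columns[2])
```

High mutual information marks column pairs whose residues change together. Real analyses also have to correct for the shared phylogeny of the sequences, which this sketch ignores.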

Comparative genomics

  • Correlated genomic context between orthologous genes reveals functional couplings

    • Conserved gene order (conserved synteny)

    • Coupled gene loss / preservation (phylogenetic profiles)

    • Gene fusion events

Conserved synteny

  • Chromosomal rearrangements randomize gene order over the course of evolution

  • Groups of genes that have a similar biological function tend to remain localized in a group or cluster

  • Bacterial operons allow coordinated regulation of gene expression from a common promoter

  • Eukaryotic clusters observed, too

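Phylogenetic profiling

Genomes and the proteins they encode:

ye: P1 P2 P3 P5 P6 P8
H. influenzae (hi): P1 P4 P5
E. coli (ec): P2 P3 P4 P5 P7

Presence (1) / absence (0) profiles:

      ye  hi  ec
P1     1   1   0
P2     1   0   1
P3     1   0   1
P4     0   1   1
P5     1   1   1
P6     1   0   0
P7     0   0   1
P8     1   0   0

Profiles sorted so that identical patterns are adjacent:

      ye  hi  ec
P7     0   0   1
P4     0   1   1
P6     1   0   0
P8     1   0   0
P2     1   0   1
P3     1   0   1
P1     1   1   0
P5     1   1   1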
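Grouping proteins with identical profiles can be sketched directly from the profile table on this slide. The function name is hypothetical; the bit-vectors are the ye, hi, ec values from the slide.

```python
from collections import defaultdict

# Presence (1) / absence (0) in the genomes ye, hi, ec, as given on the slide.
profiles = {
    "P1": (1, 1, 0), "P2": (1, 0, 1), "P3": (1, 0, 1), "P4": (0, 1, 1),
    "P5": (1, 1, 1), "P6": (1, 0, 0), "P7": (0, 0, 1), "P8": (1, 0, 0),
}

def group_by_profile(profiles):
    """Map each distinct bit-vector to the sorted list of proteins that share it."""
    groups = defaultdict(list)
    for protein, bits in profiles.items():
        groups[bits].append(protein)
    return {bits: sorted(names) for bits, names in groups.items()}

groups = group_by_profile(profiles)
# Proteins sharing a profile are candidates for a functional link.
linked = sorted(names for names in groups.values() if len(names) > 1)
```

With only three genomes many profiles coincide by chance; the signal becomes specific only when the profile spans many genomes.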

Observations - phyloprofiles

  • Bit-vectors sensitive to noise in gene status assignment

  • Specific patterns generated mainly from bacterial gene loss / horizontal transfer

  • Eukaryotic species have larger genomes and large numbers of eukaryote-specific protein families

Gene fusion

Domain swapping

Some details

  • 6,809 interactions predicted for E. coli based on gene fusions

    • 321 (~5 %) overlap with predictions by the phylogenetic profile method

    • Eight times more than random

  • Promiscuous modules (SH2, SH3, etc.)

    • 5 % of domains made more than 25 links to other proteins

    • Fusions counted within remaining set of 95 %

Observations – gene fusion

  • Marcotte et al. (Science 285:751-753, 1999) predicted novel interactions for 50 % of yeast proteins using gene fusion information in any homologous proteins

  • Enright et al. (Nature 402:86-90, 1999) considered orthologs with higher signal-to-noise ratio but only 7 % coverage

Integrated predictions

  • Predictions by conserved synteny, phylogenetic profiles and gene fusion are largely additive

    • small overlap

  • Combined score

    • Calibrated against same / different KEGG map

  • STRING server

    • Predictions for about 50 % of genes from complete genomes
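The slide does not say how the combined score is computed. One simple scheme, assumed here purely for illustration, treats each method's calibrated score as an independent probability and asks how likely it is that at least one line of evidence is correct.

```python
def combined_score(scores):
    """Probability that at least one line of evidence is right, assuming independence."""
    remaining = 1.0
    for s in scores:
        remaining *= (1.0 - s)
    return 1.0 - remaining

# Hypothetical calibrated scores from synteny, phylogenetic profiles and gene fusion.
example = combined_score([0.5, 0.5, 0.0])
```

Because the three methods overlap little, most gene pairs carry evidence from one method only, and the combined score then equals that single score.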


Functional association maps

  • Noisy

  • Different types of interaction

    • Physical interaction (complex formation)

    • Transient interactions

      • Dependent on post-translational modification state, e.g. phosphorylation

    • Functional linkage

      • Successive steps of a metabolic pathway

      • Involvement in related biological processes


  • Exam: 28 April

  • Retake: 3 May, general exam day

  • The exam will include

    • Reasoning problems

    • Essay questions