Protein Analysis 5 (Proteiinianalyysi 5)

Structure prediction

Function prediction

http://www.bioinfo.biocenter.helsinki.fi/downloads/teaching/spring2005/proteiinianalyysi/


From sequence to structure

  • comparative modelling

  • prediction of a 1-dimensional state (class) from sequence

  • recognition of a 3-dimensional structure from a given library (fold recognition)

  • ab initio prediction of the 3-dimensional structure

    • ROSETTA


The “Folding Problem”

Two parts:

(1) The “Search Problem”

Is the true structure one of my 2 million guesses?

Fragment assembly

(2) The “Discrimination Problem”

If it’s one of these 2 million, which one is it?

Empirical pseudopotential


Rosetta

(1) A stone inscribed with the same text in three ancient scripts.

(2) A program (from David Baker's group) that simulates the folding of a protein, using statistical energies and moves.


Fold prediction – Rosetta method

  • Knowledge based scoring function

Bayes' law:

$$P(\text{structure} \mid \text{sequence}) = \frac{P(\text{structure}) \, P(\text{sequence} \mid \text{structure})}{P(\text{sequence})}$$

P(sequence | structure) = f(residue contacts in native structures)

P(structure) = probability of a protein-like structure (no clashes, globular shape)

Figure labels: near-native structures; protein-like structures; sequence-consistent local structure

Simons et al. (1997)


Collection of putative backbone conformations

Protein sequence

Library of small segments (sequences and structures)

For each window of 9 residues: look up the 25 closest (sequence) neighbours in the library

Simons et al. (1997)
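
The window lookup above can be sketched in a few lines. This is only an illustration: it ranks library segments by plain sequence identity, which is an assumed stand-in for whatever similarity measure the real method uses, and the names `identity` and `closest_fragments` are hypothetical.

```python
from typing import List, Tuple

WINDOW = 9          # window length given on the slide
N_NEIGHBOURS = 25   # neighbours kept per window, as given on the slide


def identity(a: str, b: str) -> int:
    """Count the positions at which two equal-length segments agree."""
    return sum(x == y for x, y in zip(a, b))


def closest_fragments(query: str, library: List[str],
                      window: int = WINDOW,
                      k: int = N_NEIGHBOURS) -> List[Tuple[int, List[str]]]:
    """For each window of the query, return the k most similar library segments."""
    # Cut every library sequence into overlapping segments of the window length.
    segments = [seq[i:i + window]
                for seq in library
                for i in range(len(seq) - window + 1)]
    result = []
    for start in range(len(query) - window + 1):
        win = query[start:start + window]
        # sorted() is stable, so segments with equal identity keep library order.
        ranked = sorted(segments, key=lambda s: -identity(win, s))
        result.append((start, ranked[:k]))
    return result
```

In a real fragment library each segment would also carry its backbone conformation; only the sequence side of the lookup is shown here.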


Intermediates are not observed, but

Folding is 2-state

Unfolded

Folded


Nucleation sites

something happens first...


Early folding events might be recorded in the database

Short, recurrent sequence patterns could be folding initiation sites

recurrent part (MQTIFF)

HDFPIEGGDSPMQTIFFWSNANAKLSHGY CPYDNIWMQTIFFNQSAAVYSVLHLIFLT IDMNPQGSIEMQTIFFGYAESAELSPVVNFLEEMQTIFFISGFTQTANSD INWGSMQTIFFEEWQLMNVMDKIPSIFNESKKKGIAMQTIFFILSGR PPPMQTIFFVIVNYNESKHALWCSVD PWMWNLMQTIFFISQQVIEIPSMQTIFFVFSHDEQMKLKGLKGA

Non-homologous proteins

Nature has selected for these patterns because they speed folding.


I-sites motifs

Type-I hairpin

Diverging type-2 turn

Serine hairpin

Frayed helix

Alpha-alpha corner

Glycine helix N-cap

Proline helix C-cap

Backbone angles: ψ = green, φ = red

Amino acids arranged from non-polar to polar


Rosetta

Fragment insertion Monte Carlo

backbone torsion angles

accept or reject

moveset

Energy function

Choose fragment from moveset

change backbone angles

Convert angles to 3D coordinates


Rosetta

Backbone angles are restrained in I-sites regions

regions of high-confidence I-sites prediction

backbone torsion angles

moveset

Fragments that deviate from the paradigm (>90° in φ or ψ) are removed from the moveset.

Generally, about one-third of the sequence has an I-sites prediction with confidence > 0.75, and is restrained.


Rosetta

Sequence dependent features


Rosetta

vector representation

Sequence-independent features

Probabilities from the database

Current structure

The energy score for a contact between secondary structures is summed using database statistics.


MC-SA optimization

  • for each random position

    • pick a random neighbour

    • replace backbone conformation

    • calculate probability of new structure

  • MC: Monte-Carlo

    • accept up-hill moves with a certain probability that depends on temperature

  • SA: simulated annealing

    • Gradual cooling of temperature: first allow many changes, later fewer changes

Simons et al. (1997)
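
The loop above can be sketched as follows. This is a generic Metropolis plus simulated-annealing skeleton, not the Rosetta implementation: the state is a list of integers standing in for the fragment chosen at each position, the energy function is supplied by the caller (for example the negative log of the structure probability), and the geometric cooling schedule and all parameter values are assumptions made for illustration.

```python
import math
import random
from typing import Callable, List, Sequence


def accept(delta: float, temperature: float, rng: random.Random) -> bool:
    """Metropolis criterion: always accept downhill moves, sometimes uphill ones."""
    if delta <= 0:
        return True
    return rng.random() < math.exp(-delta / temperature)


def anneal(state: List[int], choices: Sequence[int],
           energy: Callable[[List[int]], float],
           steps: int = 2000, t_start: float = 2.0, t_end: float = 0.01,
           seed: int = 0) -> List[int]:
    """Return the lowest-energy state seen during an annealing run."""
    rng = random.Random(seed)
    current = list(state)            # work on a copy, leave the input untouched
    e_cur = energy(current)
    best, e_best = list(current), e_cur
    for step in range(steps):
        # Gradual cooling: many changes are accepted early, fewer later.
        t = t_start * (t_end / t_start) ** (step / max(steps - 1, 1))
        pos = rng.randrange(len(current))    # random position
        old = current[pos]
        current[pos] = rng.choice(choices)   # random replacement at that position
        e_new = energy(current)
        if accept(e_new - e_cur, t, rng):
            e_cur = e_new
            if e_cur < e_best:
                best, e_best = list(current), e_cur
        else:
            current[pos] = old               # reject: restore the old value
    return best
```

Returning the best state seen, rather than the final one, is a design choice of this sketch; it guarantees the result is never worse than the starting point.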


Results

  • Small molecules: ok

  • Proteins with mostly α-helices: ok

  • Proteins with mostly β-sheets: not so ok

Simons et al. (1997)


Rosetta

What needs to be fixed?

Turns

8% of the residues in the targets have φ > 0; 44% of these are at glycine residues.

7% of the residues in the predictions have φ > 0, but only 16% of these are at glycines.

Contact order

True structure: 0.252

Predictions: 0.119
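
The slide does not define contact order. Assuming it means the usual relative contact order (the mean sequence separation of contacting residue pairs, divided by the chain length), it could be computed as in the sketch below. The function name and the representation of contacts as index pairs are illustrative choices.

```python
from typing import Iterable, Tuple


def relative_contact_order(contacts: Iterable[Tuple[int, int]], length: int) -> float:
    """Mean sequence separation of contacting residue pairs, divided by chain length."""
    pairs = list(contacts)
    if not pairs or length <= 0:
        raise ValueError("need at least one contact and a positive chain length")
    total_separation = sum(abs(i - j) for i, j in pairs)
    return total_separation / (length * len(pairs))
```

Under this definition a low value means that most contacts are local in sequence, so predictions scoring about half the true value would contain too few long-range contacts.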


Prediction algorithms have underlying principles

Darwin = protein evolution.

Principle: Proteins that evolved from a common ancestor have the same fold.

Boltzmann = protein folding

Principle: Proteins search conformational space, minimizing the free energy (empirical pseudo-potential)


Determining gene function

  • phenotype

  • biochemical activity (in vitro)

  • expression

  • GO, Gene Ontology

    • molecular function

    • biological process

    • subcellular localization


Homology ⇒ same function?

Paralogy: the result of gene duplication

Alternative splicing: one gene, many proteins

Pleiotropy: one gene, many functions

Redundancy: one function, many genes

Heteromerization: formation of complexes

“Crosstalk”: signalling pathways affect one another


Protein functional shifts are common

  • COG0044

    • Dihydroorotase

      • CAD (fusion protein)

    • Dihydropyrimidinase

    • D-hydantoinase

    • Allantoinase

    • Rudimentary protein (involved in developmental programs)


COG0044 functions

Urease superfamily functions


Fast evolution ~ functional shift

rat lung isoform

rat liver isoform,

functional shift

CYP2 family (cytochrome P450)


“Druggable genome”

  • Property filters

    • Likelihood of functional shift

    • Degree and nature of paralogy

    • Factors reflecting pleiotropy

      • Size

      • Breadth of expression

      • Interaction potential

      • Evolutionary rates


Function transfer

  • Nearest neighbour (closest homolog)

    • e.g. a BLAST search

    • Phylogenetic nearest neighbour

  • Post-genomic methods

    • independent of homology

    • Comparison of protein-protein interactions

      • Guilt By Association

      • Pattern recognition


Function transfer

  • Hypothetical sequence → function?

    • Characterized homolog

      • BLAST / PSI-BLAST

    • Phylogeny!

      • the rate of evolution depends on the family

      • multiple sequence alignment

  • Erroneous function assignments propagate in databases!

    • Wrong function

      • associated with a domain that does not occur in the query sequence

    • Wrong inference of homology

    • Overly detailed function description

      • change of function during evolution

      • biochemical vs. physiological function

        • e.g. eukaryote-specific functions cannot occur in a bacterium

    • Sequence alignment

      • conservation of functional amino acids

      • example:

        • atrazine chlorohydrolase vs. melamine deaminase: 4 mutations (98 % identity)

    • E.g. GO flags the source of a function assignment


Guilt by association

  • Prediction of subcellular localization based on classification of neighbours
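
One simple reading of this idea is a majority vote over the annotated interaction partners of a protein. The sketch below assumes that reading; the data layout (an adjacency dictionary plus a dictionary of known localizations), the function name and the alphabetical tie-break are all illustrative assumptions, not the method of a specific publication.

```python
from collections import Counter
from typing import Dict, Optional, Set


def predict_localization(protein: str,
                         interactions: Dict[str, Set[str]],
                         known: Dict[str, str]) -> Optional[str]:
    """Predict a localization as the most common one among annotated neighbours."""
    votes = Counter(known[n] for n in interactions.get(protein, ()) if n in known)
    if not votes:
        return None  # no annotated neighbours, so no prediction
    top = max(votes.values())
    # Break ties alphabetically so the result is deterministic.
    return sorted(loc for loc, count in votes.items() if count == top)[0]
```

For example, a protein with two nuclear neighbours and one cytoplasmic neighbour would be predicted to be nuclear.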


Non-homology protein identification using network context

Query pattern

Interactome

Ref: Lappe M, Park J, Niggemann O, Holm L (2001) Bioinformatics Suppl 1, S149-S156


Natural selection

  • Functional coupling leads to correlations

    • E.g. co-occurrence of sets of genes in species

  • Residues required for molecular function

    • Functional conservation above general sequence divergence of a family


Pancreatic trypsin inhibitor (2ptc)


Approaches

  • Evolutionary Trace

    • Lichtarge et al. 1996

  • Sequence Space

    • Casari et al. 1995

  • Ortholog / paralog discriminants

    • Mirny & Gelfand 2003


Evolutionary Trace

  • The branchpoints separating subclades of a phylogenetic tree can specify molecular speciation events, and hence evolutionary selection of amino acids

  • Map trace residues to 3D structures
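
A minimal sketch of the column classification behind a trace, assuming the subclades are already given as groups of aligned sequences and using strict invariance within each group. Gap handling, tree construction and the choice of rank are left out, and the labels and function name are illustrative.

```python
from typing import Dict, List


def trace_columns(groups: Dict[str, List[str]]) -> List[str]:
    """Label each alignment column as conserved, class-specific or neutral."""
    seqs = [s for members in groups.values() for s in members]
    if not seqs:
        raise ValueError("no sequences given")
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("sequences must be aligned to the same length")
    labels = []
    for col in range(length):
        # Set of residues seen in this column, one set per subclade.
        per_group = [{s[col] for s in members} for members in groups.values()]
        if any(len(residues) != 1 for residues in per_group):
            labels.append("neutral")          # varies inside some subclade
        elif len(set.union(*per_group)) == 1:
            labels.append("conserved")        # same residue in every subclade
        else:
            labels.append("class-specific")   # invariant within, different between
    return labels
```

The conserved and class-specific columns together are the candidate trace residues to be mapped onto the 3D structure.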


Evaluation of Evolutionary Trace

  • Trace residues determined at many ranks

    • Trace residue sets are nested

  • Test of significance of trace residue at any rank

    • Overlap with otherwise defined functional sites

      • Bound ligands in 3D structures (~20 residues)

      • Annotated sites (~4 residues)


ET assessment

  • Detects 3D clusters

  • Manual filtering and pruning of the data

    • Decide which subclades of the protein family to use in analysis

    • Exclude fragments

    • Original method was based on strict invariance within subclade

  • Automatic implementations

    • But manually optimized traces score higher


Sequence Space

  • Aligned protein sequences represented as vectors in a high-dimensional space

    • Each amino acid type at each column of the MSA is a unique point in Sequence Space

  • Dimension reduction by Principal Components Analysis

  • Cluster proteins

    • Based on their sequence identity

  • Map residues in the same space

    • Direction points to association with protein group
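
The vector representation can be illustrated with a one-hot encoding: every (column, amino acid type) pair gets its own coordinate. The sketch below covers only the encoding step and omits the PCA. The 20-letter alphabet order and the decision to leave gaps as all-zero blocks are assumptions of this sketch.

```python
from typing import List

ALPHABET = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids


def encode(aligned: str) -> List[int]:
    """One-hot encode an aligned sequence: 20 coordinates per alignment column."""
    vec = [0] * (len(aligned) * len(ALPHABET))
    for col, aa in enumerate(aligned):
        idx = ALPHABET.find(aa)
        if idx >= 0:  # gaps and unknown symbols stay all-zero
            vec[col * len(ALPHABET) + idx] = 1
    return vec


def squared_distance(u: List[int], v: List[int]) -> int:
    """Squared Euclidean distance between two encoded sequences."""
    return sum((a - b) ** 2 for a, b in zip(u, v))
```

For gap-free sequences the squared distance equals twice the number of mismatched columns, which is why clustering in this space reflects sequence identity.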


A 3D object


PCA projection of the 3D object

New axes are linear combinations of the original axes


Coding of amino acids


Sequence vector representation


Interpretation

1st axis represents the whole family

2nd, 3rd, …, 6th axes represent subclassifications

Subfamily-specific residues are found at the tips of a polygon

Common residues shared by several subfamilies are found along the edges of a polygon

Many unspecific residues at origin


Protein clustering


Residue clustering


Selection of residues & proteins


Orthologs and paralogs

Use of model organisms: identical physiology?


Summary

  • Functional groupings of proteins

    • Phylogenetic lineage

      • Orthologs / paralogs

    • Clustering by general sequence similarity

  • Residues associated with above groupings

    • Intra-group conservation

    • Inter-group variation

    • Neutral residues behave randomly


Function = interactions

  • Protein-protein interactions

  • Co-evolution of interacting proteins

  • Comparative genomics


Experimental methods

  • Y2H = yeast two-hybrid

    • Ex vivo, binary interactions

    • Interaction must occur in the nucleus

    • Autoactivation (5-10 % of random ORFs)

    • Posttranslational modifications

  • AP/MS = affinity purification / mass spectrometry

    • Purified complexes

  • PChips = protein microarrays

    • In vitro

    • Covalent attachment to solid support

    • Screening with fluorescently labelled probes (e.g. proteins or lipids)


Small part of an interaction network


New Scientist, 13 April 2002: David Cohen on the work by Barabási, Albert et al.


Predicting interactions

  • co-evolution

  • genome comparison

    • gene order on the chromosome

    • phylogenetic profiles

    • gene fusion


Co-evolution

  • multiple sequence alignment: look for correlated mutations

  • proteins with many interactions evolve more slowly

  • two phylogenetic trees: look for matching pairs


Comparative genomics

  • Correlated genomic context between orthologous genes reveals functional couplings

    • Conserved gene order (conserved synteny)

    • Coupled gene loss / preservation (phylogenetic profiles)

    • Gene fusion events


Conserved synteny

  • Chromosomal rearrangements randomize gene order over the course of evolution

  • Groups of genes that have a similar biological function tend to remain localized in a group or cluster

  • Bacterial operons allow coordinated regulation of gene expression from a common promoter

  • Eukaryotic clusters observed, too


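Phylogenetic profiling

Proteins present in each genome:

  yeast:          P1 P2 P3 P5 P6 P8
  H. influenzae:  P1 P4 P5
  E. coli:        P2 P3 P4 P5 P7

Profiles (1 = present, 0 = absent; ye = yeast, hi = H. influenzae, ec = E. coli):

      ye hi ec
  P1   1  1  0
  P2   1  0  1
  P3   1  0  1
  P4   0  1  1
  P5   1  1  1
  P6   1  0  0
  P7   0  0  1
  P8   1  0  0

The same profiles sorted so that identical patterns are adjacent:

      ye hi ec
  P7   0  0  1
  P4   0  1  1
  P6   1  0  0
  P8   1  0  0
  P2   1  0  1
  P3   1  0  1
  P1   1  1  0
  P5   1  1  1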
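
The profile table can be derived from the genome contents with a few lines of code. The sketch below rebuilds the presence/absence bit-vectors from the slide's example and groups proteins that share an identical profile. The function names are illustrative, and exact matching of profiles is the simplest possible comparison.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

# Genome contents as shown on the slide.
GENOMES: Dict[str, Set[str]] = {
    "yeast": {"P1", "P2", "P3", "P5", "P6", "P8"},
    "H. influenzae": {"P1", "P4", "P5"},
    "E. coli": {"P2", "P3", "P4", "P5", "P7"},
}


def profiles(genomes: Dict[str, Set[str]]) -> Dict[str, Tuple[int, ...]]:
    """Build one presence/absence bit-vector per protein, in genome order."""
    species = list(genomes)
    proteins = sorted(set().union(*genomes.values()))
    return {p: tuple(int(p in genomes[s]) for s in species) for p in proteins}


def group_by_profile(profs: Dict[str, Tuple[int, ...]]) -> Dict[Tuple[int, ...], List[str]]:
    """Collect proteins that share exactly the same profile."""
    groups: Dict[Tuple[int, ...], List[str]] = defaultdict(list)
    for protein, bits in profs.items():
        groups[bits].append(protein)
    return dict(groups)
```

With this data, P2 and P3 share the profile (1, 0, 1) and P6 and P8 share (1, 0, 0), so each pair would be predicted to be functionally linked.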


Observations - phyloprofiles

  • Bit-vectors sensitive to noise in gene status assignment

  • Specific patterns generated mainly from bacterial gene loss / horizontal transfer

  • Eukaryotic species have larger genomes and large numbers of eukaryote-specific protein families


Gene fusion

Domain swapping


Some details

  • 6,809 interactions predicted for E. coli based on gene fusions

    • 321 (~5 %) overlap with predictions by the phylogenetic profile method

    • Eight times more than random

  • Promiscuous modules (SH2, SH3, etc.)

    • 5 % of domains made more than 25 links to other proteins

    • Fusions counted within remaining set of 95 %
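
The fusion-based prediction, including the exclusion of promiscuous modules, might be sketched as follows. Each fused protein is represented only by the set of domains it contains; this representation, the function name and the domain names in the usage note are illustrative assumptions.

```python
from itertools import combinations
from typing import FrozenSet, Iterable, Set


def fusion_links(fused_proteins: Iterable[Set[str]],
                 promiscuous: Iterable[str] = ()) -> Set[FrozenSet[str]]:
    """Predict a link between two domains whenever they occur fused in one protein."""
    excluded = set(promiscuous)
    links: Set[FrozenSet[str]] = set()
    for domains in fused_proteins:
        # Drop promiscuous modules before counting fusions.
        usable = sorted(set(domains) - excluded)
        for a, b in combinations(usable, 2):
            links.add(frozenset((a, b)))
    return links
```

For example, `fusion_links([{"TrpC", "TrpF"}, {"SH3", "Kinase", "SH2"}], promiscuous={"SH2", "SH3"})` yields a single link between TrpC and TrpF, because the second protein has only one domain left after filtering.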


Observations – gene fusion

  • Marcotte et al. (Science 285:751-753, 1999) predicted novel interactions for 50 % of yeast proteins using gene fusion information in any homologous proteins

  • Enright et al. (Nature 402:86-90, 1999) considered only orthologs, which gave a higher signal-to-noise ratio but only 7 % coverage


Integrated predictions

  • Predictions by conserved synteny, phylogenetic profiles and gene fusion are largely additive

    • small overlap

  • Combined score

    • Calibrated against same / different KEGG map

  • STRING server

    • Predictions for about 50 % of genes from complete genomes

    • http://www.bork.embl-heidelberg.de/STRING/


Functional association maps

  • Noisy

  • Different types of interaction

    • Physical interaction (complex formation)

    • Transient interactions

      • Dependent on post-translational modification state, e.g. phosphorylation

    • Functional linkage

      • Successive steps of a metabolic pathway

      • Involvement in related biological processes


Exam

  • Exam on 28 April

  • Retake on 3 May (general examination day)

  • The exam will include

    • Reasoning problems

    • Essay questions

