
# Proteiinianalyysi 5 (Protein Analysis 5)

Structure prediction. Function prediction. http://www.bioinfo.biocenter.helsinki.fi/downloads/teaching/spring2005/proteiinianalyysi/


### Proteiinianalyysi 5 (Protein Analysis 5)

Structure prediction

Function prediction

From sequence to structure

• comparative modelling

• 1-dimensional state (class) prediction from sequence

• recognition of a 3-dimensional structure from a given library (fold recognition)

• ab initio prediction of 3-dimensional structure

• ROSETTA

Two parts:

(1) The “Search Problem”

Is the true structure one of my 2 million guesses?

Fragment assembly

(2) The “Discrimination Problem”

If it’s one of these 2 million, which one is it?

Empirical pseudopotential

(1) A stone with three ancient languages on it.

(2) A program (David Baker) that simulates the folding of a protein, using statistical energies and moves.

• Knowledge based scoring function

Bayes' law:

$$P(\text{structure} \mid \text{sequence}) = \frac{P(\text{structure}) \cdot P(\text{sequence} \mid \text{structure})}{P(\text{sequence})}$$

P(sequence|structure) = f(residue contacts in native structures)

near-native structures

protein-like structures

sequence-consistent local structure

P(structure) = probability of a protein-like structure (no clashes, globular shape)

Simons et al. (1997)
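The Bayesian score above can be used to rank candidate structures. Because P(sequence) is the same for every candidate, it can be dropped and candidates compared on log P(structure) + log P(sequence|structure). The following is a minimal sketch under that assumption; the decoy names and probability values are invented for illustration and are not taken from Simons et al.

```python
import math

def log_posterior_score(p_structure, p_sequence_given_structure):
    """Unnormalized log posterior: log P(structure) + log P(sequence|structure).

    P(sequence) is identical for all candidates, so it is left out.
    """
    return math.log(p_structure) + math.log(p_sequence_given_structure)

# Hypothetical decoys: (P(structure), P(sequence|structure)); values are made up.
decoys = {
    "decoy_a": (0.20, 0.010),
    "decoy_b": (0.05, 0.100),
    "decoy_c": (0.30, 0.001),
}

# Rank candidates from most to least probable given the sequence.
ranked = sorted(
    decoys,
    key=lambda name: log_posterior_score(*decoys[name]),
    reverse=True,
)
```

Note that the most protein-like candidate (`decoy_c`) does not win here: a structure must also be consistent with the sequence to score well.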

Protein sequence

Library of small segments (sequences and their structures)

For each window of 9 residues: look up the 25 closest (sequence) neighbours in the library

Simons et al. (1997)
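The window lookup can be sketched as follows. The window length (9) and the number of neighbours (25) come from the slide; the similarity measure (plain residue identity), the function names, and the toy library are assumptions made for illustration, and the real method may well score sequence similarity differently.

```python
def identity(a, b):
    """Number of positions at which two equal-length strings match."""
    return sum(x == y for x, y in zip(a, b))

def closest_fragments(query, library, window=9, n_neighbours=25):
    """For each window of the query, return the most sequence-similar library fragments.

    library: list of (sequence, structure_label) pairs, each sequence `window` long.
    Similarity is plain identity here (an assumption, see lead-in).
    """
    result = {}
    for start in range(len(query) - window + 1):
        segment = query[start:start + window]
        ranked = sorted(
            library,
            key=lambda fragment: identity(segment, fragment[0]),
            reverse=True,
        )
        result[start] = ranked[:n_neighbours]
    return result

# Toy library; sequences and structure labels are invented.
toy_library = [
    ("AAAAAAAAA", "helix"),
    ("AAAAGAAAA", "helix"),
    ("VTVTVTVTV", "strand"),
    ("GGSGGSGGS", "loop"),
]
hits = closest_fragments("AAAAAAAAAV", toy_library, n_neighbours=2)
```

Each window position ends up with its own list of candidate local structures, which together form the moveset for fragment assembly.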

Folding is 2-state

Unfolded

Folded

something happens first...

Short, recurrent sequence patterns could be folding initiation sites

recurrent part

HDFPIEGGDSPMQTIFFWSNANAKLSHGY CPYDNIWMQTIFFNQSAAVYSVLHLIFLT IDMNPQGSIEMQTIFFGYAESAELSPVVNFLEEMQTIFFISGFTQTANSD INWGSMQTIFFEEWQLMNVMDKIPSIFNESKKKGIAMQTIFFILSGR PPPMQTIFFVIVNYNESKHALWCSVD PWMWNLMQTIFFISQQVIEIPSMQTIFFVFSHDEQMKLKGLKGA

Non-homologous proteins

Nature has selected for these patterns because they speed folding.
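One simple way to look for such a recurrent part is to intersect the k-mer sets of the proteins. The sketch below uses three of the example sequences from the slide; the function names and the choice of k = 6 are assumptions for illustration, not the method used to derive the I-sites library.

```python
def kmers(sequence, k):
    """All substrings of length k in the sequence."""
    return {sequence[i:i + k] for i in range(len(sequence) - k + 1)}

def shared_kmers(sequences, k):
    """k-mers that occur in every one of the given sequences."""
    common = kmers(sequences[0], k)
    for sequence in sequences[1:]:
        common &= kmers(sequence, k)
    return common

# Three of the non-homologous example sequences shown on the slide.
proteins = [
    "HDFPIEGGDSPMQTIFFWSNANAKLSHGY",
    "CPYDNIWMQTIFFNQSAAVYSVLHLIFLT",
    "PPPMQTIFFVIVNYNESKHALWCSVD",
]
recurrent = shared_kmers(proteins, k=6)
```

For these three sequences the only shared 6-mer is `MQTIFF`, the recurrent part highlighted on the slide.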

diverging type-2 turn

Serine hairpin

Frayed helix

alpha-alpha corner

glycine helix N-cap

Proline helix C-cap

I-sites motifs

Backbone angles: ψ = green, φ = red

Amino acids arranged from non-polar to polar

Fragment insertion Monte Carlo

The state is the set of backbone torsion angles. In each cycle:

• choose a fragment from the moveset

• change the backbone angles

• convert the angles to 3D coordinates

• evaluate the energy function

• accept or reject

Backbone angles are restrained in I-sites regions

regions of high-confidence I-sites prediction

backbone torsion angles

moveset

Fragments that deviate from the paradigm (>90° in φ or ψ) are removed from the moveset.

Generally, about one-third of the sequence has an I-sites prediction with confidence > 0.75, and is restrained.
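The moveset restriction can be sketched as an angular filter. The 90° threshold is from the slide; the data layout (lists of (φ, ψ) tuples in degrees), the function names, and the example angles are assumptions for illustration.

```python
def angle_difference(a, b):
    """Smallest absolute difference between two angles in degrees (wraps at ±180)."""
    diff = abs(a - b) % 360.0
    return min(diff, 360.0 - diff)

def filter_moveset(fragments, paradigm, max_deviation=90.0):
    """Keep fragments whose phi and psi stay within max_deviation of the paradigm.

    fragments: list of fragments, each a list of (phi, psi) tuples in degrees.
    paradigm:  list of (phi, psi) tuples of the same length (the predicted motif).
    """
    kept = []
    for fragment in fragments:
        within_limits = all(
            angle_difference(phi, ref_phi) <= max_deviation
            and angle_difference(psi, ref_psi) <= max_deviation
            for (phi, psi), (ref_phi, ref_psi) in zip(fragment, paradigm)
        )
        if within_limits:
            kept.append(fragment)
    return kept

# Hypothetical helix-like paradigm and three candidate fragments.
paradigm = [(-60.0, -45.0), (-60.0, -45.0)]
fragments = [
    [(-65.0, -40.0), (-70.0, -50.0)],   # close to the paradigm
    [(-120.0, 130.0), (-60.0, -45.0)],  # psi deviates by 175 degrees
    [(-60.0, -45.0), (170.0, -45.0)],   # phi deviates by 130 degrees
]
kept = filter_moveset(fragments, paradigm)
```

The wrap-around in `angle_difference` matters because torsion angles are periodic: 170° and -170° are only 20° apart.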

Sequence dependent features

vector representation

Sequence-independent features

Probabilities from the database

Current structure

The energy score for a contact between secondary structures is summed using database statistics.

• for each random position

• pick a random neighbour

• replace backbone conformation

• calculate probability of new structure

• MC: Monte-Carlo

• accept up-hill moves with a certain probability that depends on temperature

• SA: simulated annealing

• Gradual cooling of temperature: first allow many changes, later fewer changes

Simons et al. (1997)
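The loop above (random position, random replacement, Metropolis acceptance, gradual cooling) can be sketched generically. Everything specific in this block is an assumption made for illustration: the toy "structure" is a list of integers, the toy energy is a squared distance to a fixed target, and the geometric cooling schedule and parameter values are arbitrary choices, not those of ROSETTA.

```python
import math
import random

def simulated_annealing(state, energy, propose, n_steps=2000,
                        t_start=2.0, t_end=0.01, seed=0):
    """Metropolis Monte Carlo with a geometric cooling schedule."""
    rng = random.Random(seed)
    current, current_e = state, energy(state)
    best, best_e = current, current_e
    for step in range(n_steps):
        # Cool gradually: many changes accepted early, fewer later.
        fraction = step / max(n_steps - 1, 1)
        temperature = t_start * (t_end / t_start) ** fraction
        candidate = propose(current, rng)
        candidate_e = energy(candidate)
        delta = candidate_e - current_e
        # Always accept downhill moves; accept uphill moves with
        # probability exp(-delta / T), which shrinks as T drops.
        if delta <= 0 or rng.random() < math.exp(-delta / temperature):
            current, current_e = candidate, candidate_e
            if current_e < best_e:
                best, best_e = current, current_e
    return best, best_e

# Toy stand-ins for structure and energy function (assumptions).
TARGET = [3, -1, 4, 1, -5]

def toy_energy(state):
    return sum((s - t) ** 2 for s, t in zip(state, TARGET))

def toy_propose(state, rng):
    new_state = list(state)
    position = rng.randrange(len(new_state))    # pick a random position
    new_state[position] += rng.choice((-1, 1))  # replace it with a neighbouring value
    return new_state

start = [0, 0, 0, 0, 0]
best_state, best_energy = simulated_annealing(start, toy_energy, toy_propose)
```

In the real setting, `toy_propose` would correspond to inserting a library fragment at a random window and `toy_energy` to the knowledge-based scoring function.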

• Small molecules: ok

• Proteins with mostly α-helices: ok

• Proteins with mostly β-sheets: not so ok

Simons et al. (1997)

What needs to be fixed?

Turns

8% of the residues in the targets have φ > 0.

44% of these are at glycine residues.

7% of the residues in the predictions have φ > 0,

but only 16% of these are at glycines.

Contact order

True structure: 0.252

Predictions: 0.119
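The slide does not define contact order. Assuming the commonly used relative contact order (the average sequence separation between contacting residues, divided by the chain length), it can be computed as in the sketch below. The contact lists are invented; a real calculation would derive contacts from 3D coordinates with a distance cutoff.

```python
def relative_contact_order(contacts, length):
    """Mean sequence separation of contacting residue pairs, divided by chain length.

    contacts: iterable of (i, j) residue index pairs that are in contact.
    length:   number of residues in the chain.
    """
    contacts = list(contacts)
    if not contacts:
        return 0.0
    total_separation = sum(abs(i - j) for i, j in contacts)
    return total_separation / (length * len(contacts))

# Hypothetical 20-residue chains: one with only local contacts, one with long-range contacts.
local_contacts = [(1, 4), (2, 5), (3, 6), (10, 13)]
long_range_contacts = [(1, 20), (2, 19), (3, 18), (5, 15)]

rco_local = relative_contact_order(local_contacts, length=20)
rco_long_range = relative_contact_order(long_range_contacts, length=20)
```

A lower value for the predictions than for the true structures, as on the slide, means the predicted models contain too many local contacts relative to long-range ones.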

Prediction algorithms have underlying principles

Darwin = protein evolution.

Principle: Proteins that evolved from a common ancestor have the same fold.

Boltzmann = protein folding

Principle: Proteins search conformational space, minimizing the free energy (empirical pseudo-potential)

Determining the function of a gene

• phenotype

• biochemical activity (in vitro)

• expression

• GO, Gene Ontology

• molecular function

• subcellular localization

Homology → same function?

Paralogy: the result of gene duplication

Alternative splicing: one gene, many proteins

Pleiotropy: one gene, many functions

Redundancy: one function, many genes

Heteromery: complex formation

“Crosstalk”: signalling pathways affect one another

• COG0044

• Dihydroorotase

• Dihydropyrimidinase

• D-hydantoinase

• Allantoinase

• Rudimentary protein (involved in developmental programs)

Urease superfamily functions

Fast evolution ~ functional shift

rat lung isoform

rat liver isoform,

functional shift

CYP2 family (cytochrome P450)

• Property filters

• Likelihood of functional shift

• Degree and nature of paralogy

• Factors reflecting pleiotropy

• Size

• Interaction potential

• Evolutionary rates

• Nearest neighbour (closest homolog)

• e.g. BLAST search

• Phylogenetic nearest neighbour

• Post-genomic methods

• independent of homology

• Comparison of protein-protein interactions

• Guilt By Association

• Pattern recognition

• Hypothetical sequence → function?

• Characterized homolog

• Blast / PSI-Blast

• Phylogeny!

• evolutionary rate depends on the family

• multiple sequence alignment

• Erroneous function assignments propagate in databases!

• Wrong function

• belongs to a domain that does not occur in the query sequence

• Wrong homology inference

• Overly detailed function description

• function changes during evolution

• e.g. eukaryote-specific functions cannot occur in a bacterium

• Sequence alignment

• conservation of functional amino acids

• example:

• atrazine chlorohydrolase vs. melamine deaminase: 4 mutations (98 % identity)

• E.g. GO flags the source of a function assignment

• Prediction of subcellular localization based on classification of neighbours

Interactome

Non-homology protein identification using network context

Ref: Lappe M, Park J, Niggemann O, Holm L (2001) Bioinformatics Suppl 1, S149-S156
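A minimal sketch of classification by neighbours ("guilt by association"): an unannotated protein receives the label most common among its annotated interaction partners. The protein names, interaction list, and majority-vote rule are assumptions made for illustration and are not the procedure of Lappe et al.

```python
from collections import Counter

def predict_by_neighbours(protein, interactions, annotations):
    """Majority vote over the annotated interaction partners of `protein`.

    interactions: iterable of (a, b) pairs (undirected).
    annotations:  dict protein -> label for already characterized proteins.
    Returns None when no neighbour carries an annotation.
    """
    neighbours = set()
    for a, b in interactions:
        if a == protein:
            neighbours.add(b)
        elif b == protein:
            neighbours.add(a)
    votes = Counter(annotations[n] for n in neighbours if n in annotations)
    if not votes:
        return None
    return votes.most_common(1)[0][0]

# Hypothetical interaction network and localization annotations.
interactions = [("X", "A"), ("X", "B"), ("C", "X"), ("A", "B"), ("Y", "D")]
annotations = {"A": "nucleus", "B": "nucleus", "C": "cytoplasm"}

prediction_x = predict_by_neighbours("X", interactions, annotations)
prediction_y = predict_by_neighbours("Y", interactions, annotations)
```

Protein `Y` gets no prediction because its only partner is itself uncharacterized, which illustrates how coverage depends on the density of annotated neighbours.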

• Functional coupling leads to correlations

• E.g. co-occurrence of sets of genes in species

• Residues required for molecular function

• Functional conservation above general sequence divergence of a family

Pancreatic trypsin inhibitor (2ptc)

• Evolutionary Trace

• Lichtarge et al. 1996

• Sequence Space

• Casari et al. 1995

• Ortholog / paralog discriminants

• Mirny & Gelfand 2003

• The branchpoints separating subclades of a phylogenetic tree can specify molecular speciation events, and hence evolutionary selection of amino acids

• Map trace residues to 3D structures

• Trace residues determined at many ranks

• Trace residue sets are nested

• Test of significance of trace residue at any rank

• Overlap with otherwise defined functional sites

• Bound ligands in 3D structures (~20 residues)

• Annotated sites (~4 residues)

• Detects 3D clusters

• Manual filtering and pruning of the data

• Decide which subclades of the protein family to use in analysis

• Exclude fragments

• Original method was based on strict invariance within subclade

• Automatic implementations

• But manually optimized traces score higher
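The strict-invariance version of the trace can be sketched as follows: a column counts as a trace residue when it is invariant within every subclade, and it is class-specific when the invariant residue differs between subclades. The alignment, the subclade partition, and the function name are invented for illustration; ranks, significance tests, and 3D mapping are left out.

```python
def trace_residues(alignment, subclades):
    """Classify alignment columns using strict invariance within each subclade.

    alignment: dict name -> aligned sequence (all of equal length).
    subclades: list of lists of sequence names.
    Returns (conserved, class_specific) lists of 0-based column indices.
    """
    length = len(next(iter(alignment.values())))
    conserved, class_specific = [], []
    for column in range(length):
        residues_per_group = [
            {alignment[name][column] for name in group} for group in subclades
        ]
        # Skip columns that vary (or contain a gap) within any subclade.
        if any(len(r) != 1 or "-" in r for r in residues_per_group):
            continue
        distinct = set().union(*residues_per_group)
        if len(distinct) == 1:
            conserved.append(column)       # same residue in the whole family
        else:
            class_specific.append(column)  # invariant per subclade, differs between
    return conserved, class_specific

# Hypothetical four-sequence alignment split into two subclades.
alignment = {
    "a1": "GKSTA",
    "a2": "GKSTV",
    "b1": "GRSTL",
    "b2": "GRSTI",
}
subclades = [["a1", "a2"], ["b1", "b2"]]
conserved, class_specific = trace_residues(alignment, subclades)
```

Cutting the tree into more subclades (a higher rank) can only add columns to the trace, which is why trace residue sets at successive ranks are nested.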

• Aligned protein sequences represented as vectors in a high-dimensional space

• Each amino acid type at each column of the MSA is a unique point in Sequence Space

• Dimension reduction by Principal Components Analysis

• Cluster proteins

• Based on their sequence identity

• Map residues in the same space

• Direction points to association with protein group

New axes are linear combinations of the original axes

1st axis represents the whole family

2nd, 3rd, …, 6th axes represent subclassifications

Subfamily-specific residues are found at the tips of a polygon

Common residues shared by several subfamilies are found along the edges of a polygon

Many unspecific residues at the origin
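The vector representation behind Sequence Space can be sketched with a one-hot encoding: one coordinate per (alignment column, amino acid type) pair. With this encoding, the dot product of two sequence vectors equals their number of identical positions. The toy alignment and function names are assumptions; the principal components step is omitted here and would normally be done with a linear algebra library.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def encode(aligned_sequence):
    """One-hot vector with one coordinate per (column, amino acid type) pair.

    Gaps and unknown characters leave their column all-zero.
    """
    vector = [0] * (len(aligned_sequence) * len(AMINO_ACIDS))
    for column, residue in enumerate(aligned_sequence):
        index = AMINO_ACIDS.find(residue)
        if index >= 0:
            vector[column * len(AMINO_ACIDS) + index] = 1
    return vector

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

# Hypothetical three-sequence alignment.
msa = ["GKSTA", "GKSTV", "GRSTL"]
vectors = [encode(sequence) for sequence in msa]

# Dot products count identical positions between pairs of sequences.
shared_01 = dot(vectors[0], vectors[1])
shared_02 = dot(vectors[0], vectors[2])
```

Because residues are coordinates of the same space, they can be projected onto the same principal axes as the proteins, which is what places subfamily-specific residues near the protein groups they characterize.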

Use of model organisms: identical physiology?

• Functional groupings of proteins

• Phylogenetic lineage

• Orthologs / paralogs

• Clustering by general sequence similarity

• Residues associated with above groupings

• Intra-group conservation

• Inter-group variation

• Neutral residues behave randomly

• Protein-protein interactions

• Co-evolution of interacting proteins

• Comparative genomics

• Y2H = yeast-two-hybrid

• Ex vivo, binary interactions

• Interaction must occur in the nucleus

• Autoactivation (5-10 % of random ORFs)

• Posttranslational modifications

• AP/MS = affinity purification / Mass Spectrometry

• Purified complexes

• PChips = protein microarrays

• In vitro

• Covalent attachment to solid support

• Screening with fluorescently labelled probes (e.g. proteins or lipids)

NewScientist, 13. April 2002, David Cohen about the work by Barabasi, Albert et al.

• co-evolution

• genome comparison

• gene order on the chromosome

• phylogenetic profiles

• gene fusion

• multiple sequence alignment: look for correlated mutations

• proteins with many interactions evolve more slowly

• two phylogenetic trees: look for matching pairs

• Correlated genomic context between orthologous genes reveals functional couplings

• Conserved gene order (conserved synteny)

• Coupled gene loss / preservation (phylogenetic profiles)

• Gene fusion events

• Chromosomal rearrangements randomize gene order over the course of evolution

• Groups of genes that have a similar biological function tend to remain localized in a group or cluster

• Bacterial operons allow coordinated regulation of gene expression from a common promoter

• Eukaryotic clusters observed, too

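Proteins encoded in each genome:

| Genome | Proteins |
|---|---|
| yeast | p1 p2 p3 p5 p6 p8 |
| H. influenzae | p1 p4 p5 |
| E. coli | p2 p3 p4 p5 p7 |

Phylogenetic profiles (1 = present, 0 = absent; ye = yeast, hi = H. influenzae, ec = E. coli):

| Protein | ye | hi | ec |
|---|---|---|---|
| P1 | 1 | 1 | 0 |
| P2 | 1 | 0 | 1 |
| P3 | 1 | 0 | 1 |
| P4 | 0 | 1 | 1 |
| P5 | 1 | 1 | 1 |
| P6 | 1 | 0 | 0 |
| P7 | 0 | 0 | 1 |
| P8 | 1 | 0 | 0 |

Profiles reordered so that similar rows are adjacent: P7, P4, P6, P8, P2, P3, P1, P5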
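The profile construction can be reproduced in a few lines. The genome contents below are those shown on the slide; the function names and the rule of grouping only exactly identical profiles are assumptions for illustration (real analyses tolerate some mismatches).

```python
from collections import defaultdict

# Proteins present in each genome, as shown on the slide.
genomes = {
    "yeast": {"p1", "p2", "p3", "p5", "p6", "p8"},
    "H. influenzae": {"p1", "p4", "p5"},
    "E. coli": {"p2", "p3", "p4", "p5", "p7"},
}
species_order = ["yeast", "H. influenzae", "E. coli"]

def phylogenetic_profiles(genomes, species_order):
    """Bit-vector per protein: 1 if the protein occurs in the species, else 0."""
    proteins = sorted(set().union(*genomes.values()))
    return {
        protein: tuple(int(protein in genomes[species]) for species in species_order)
        for protein in proteins
    }

def group_by_profile(profiles):
    """Proteins sharing an identical profile are candidate functional partners."""
    groups = defaultdict(list)
    for protein, profile in profiles.items():
        groups[profile].append(protein)
    return {profile: members for profile, members in groups.items() if len(members) > 1}

profiles = phylogenetic_profiles(genomes, species_order)
coupled = group_by_profile(profiles)
```

With only three genomes many profiles coincide by chance; the method gains specificity as the number of compared genomes grows.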

• Bit-vectors sensitive to noise in gene status assignment

• Specific patterns generated mainly from bacterial gene loss / horizontal transfer

• Eukaryotic species have larger genomes and large numbers of eukaryote-specific protein families

Domain swapping

• 6,809 interactions predicted for E. coli based on gene fusions

• 321 (~5 %) overlap with predictions by phylogenetic profile method

• Eight times more than random

• Promiscuous modules (SH2, SH3, etc.)

• 5 % of domains made more than 25 links to other proteins

• Fusions counted within remaining set of 95 %

• Marcotte et al. (Science 285:751-753, 1999) predicted novel interactions for 50 % of yeast proteins using gene fusion information in any homologous proteins

• Enright et al. (Nature 402:86-90, 1999) considered orthologs with higher signal-to-noise ratio but only 7 % coverage
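The gene fusion idea, including the filter for promiscuous modules, can be sketched as follows: two separate proteins in a query genome are linked when some protein in a reference genome contains both of their domains. All protein names, domain labels, and the single-domain simplification are assumptions for illustration; the published methods work on sequence similarity rather than pre-assigned domain labels.

```python
def fusion_links(query_proteins, reference_proteins, promiscuous=frozenset()):
    """Predict functional links between query proteins from fused reference proteins.

    query_proteins:     dict name -> domain label (single-domain, a simplification).
    reference_proteins: dict name -> list of domain labels.
    promiscuous:        domains excluded because they fuse with many partners.
    """
    links = set()
    names = sorted(query_proteins)
    for composite_domains in reference_proteins.values():
        domains = set(composite_domains) - set(promiscuous)
        for i, first in enumerate(names):
            for second in names[i + 1:]:
                d1, d2 = query_proteins[first], query_proteins[second]
                if d1 != d2 and d1 in domains and d2 in domains:
                    links.add((first, second))
    return links

# Hypothetical query genome (separate proteins) and reference genome (fused proteins).
query = {"protA": "D1", "protB": "D2", "adaptor": "SH3", "kinX": "D3"}
reference = {"fusedAB": ["D2", "D1"], "fusedSK": ["SH3", "D3"]}

all_links = fusion_links(query, reference)
filtered_links = fusion_links(query, reference, promiscuous={"SH3"})
```

Excluding the promiscuous domain removes the adaptor link while keeping the specific one, which mirrors the trade-off between coverage and signal-to-noise described above.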

• Predictions by conserved synteny, phylogenetic profiles and gene fusion are largely additive

• small overlap

• Combined score

• Calibrated against same / different KEGG map

• STRING server

• Predictions for about 50 % of genes from complete genomes

• http://www.bork.embl-heidelberg.de/STRING/

• Noisy

• Different types of interaction

• Physical interaction (complex formation)

• Transient interactions

• Dependent on post-translational modification state, e.g. phosphorylation

• Successive steps of a metabolic pathway

• Involvement in related biological processes

• Exam 28.4.

• Retake 3.5. (general exam day)

• The exam will include

• Reasoning problems

• Essay questions