Protein tertiary structure prediction

Protein Tertiary Structure Prediction. Structural Bioinformatics. The Different levels of Protein Structure. Primary: amino acid linear sequence. Secondary: α-helices, β-sheets and loops. Tertiary: the 3D shape of the fully folded polypeptide chain. PDB: Protein Data Bank.

Protein Tertiary Structure Prediction

Structural Bioinformatics


The Different levels of Protein Structure

Primary: amino acid linear sequence.

Secondary: α-helices, β-sheets and loops.

Tertiary: the 3D shape of the fully folded polypeptide chain.


PDB: Protein Data Bank

  • Database of molecular structures:

    proteins and nucleic acids (DNA and RNA)

  • Structures solved by

    X-ray crystallography

    NMR

    Electron microscopy


RCSB PDB – Protein Data Bank

http://www.rcsb.org/pdb/


How can we view the protein structure?

  • Download the coordinates of the structure from the PDB

    http://www.rcsb.org/pdb/

  • Launch a 3D viewer program

    For example, we will use the program PyMOL

    The program can be downloaded freely from the PyMOL homepage http://pymol.sourceforge.net/

  • Load the coordinates into the viewer
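
A coordinate file downloaded from the PDB is plain text with one fixed-width ATOM record per atom. The sketch below reads the Cα coordinates from such records. The function name parse_ca_atoms and the two sample records are made up for illustration (they are not taken from a real entry); the column positions are those of the PDB file format.

```python
from typing import List, Tuple

# Two hypothetical ATOM records in the PDB fixed-column layout.
SAMPLE_PDB = """\
ATOM      1  N   GLU A   1      26.981  53.977  40.085
ATOM      2  CA  GLU A   1      26.091  52.849  39.889
"""

def parse_ca_atoms(pdb_text: str) -> List[Tuple[str, int, float, float, float]]:
    """Return (residue name, residue number, x, y, z) for every C-alpha atom."""
    atoms = []
    for line in pdb_text.splitlines():
        # Columns 1-6 hold the record type, columns 13-16 the atom name.
        if line.startswith("ATOM") and line[12:16].strip() == "CA":
            res_name = line[17:20]       # columns 18-20
            res_num = int(line[22:26])   # columns 23-26
            x = float(line[30:38])       # columns 31-38
            y = float(line[38:46])       # columns 39-46
            z = float(line[46:54])       # columns 47-54
            atoms.append((res_name, res_num, x, y, z))
    return atoms

print(parse_ca_atoms(SAMPLE_PDB))
```

A viewer such as PyMOL does this parsing internally; the sketch only shows what "coordinates" means in practice.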


PyMOL example

  • Launch PyMOL

  • Open file “1aqb” (PDB coordinate file)

  • Display sequence

  • Hide everything

  • Show main chain / hide main chain

  • Show cartoon

  • Color by ss

  • Color red

  • Color green, resi 1:40

Help: http://pymol.sourceforge.net/newman/user/toc.html
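
The menu steps above can also be typed at the PyMOL command line. The sketch below only assembles such a command script as text; it does not call PyMOL. The mapping from each menu action to a command is an assumption, as are the function name pymol_script and the file name 1aqb.pdb. The commands themselves (load, set, hide, show, color) are standard PyMOL commands, and resi 1-40 is the current spelling of the residue range written as 1:40 above.

```python
from typing import List

def pymol_script(pdb_file: str = "1aqb.pdb") -> List[str]:
    """Return PyMOL command-line equivalents of the menu steps listed above."""
    return [
        f"load {pdb_file}",        # open the PDB coordinate file (assumed file name)
        "set seq_view, 1",         # display the sequence
        "hide everything",
        "show cartoon",
        "color red, ss h",         # color by secondary structure: helices
        "color yellow, ss s",      # color by secondary structure: strands
        "color red",               # color the whole structure red
        "color green, resi 1-40",  # color residues 1 to 40 green
    ]

print("\n".join(pymol_script()))
```

The printed lines can be saved to a file such as view.pml and run from within PyMOL with @view.pml.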


Predicting 3D Structure

An outstandingly difficult problem

Based on sequence homology

  • Comparative modeling (homology)

Based on structural homology

  • Fold recognition (threading)


Based on Sequence homology

Comparative Modeling

Similar sequences suggest similar structures


Sequence and structure alignments of two retinol-binding proteins


Structure Alignments

There are many different algorithms for structural alignment.

The outputs of a structural alignment are a superposition of the atomic coordinates and the minimal root-mean-square deviation (RMSD) between the structures. The RMSD of two aligned structures indicates how far they diverge from one another.

Low RMSD values mean similar structures.
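
As a minimal sketch of the RMSD itself, the function below takes two equally long lists of atom coordinates that are assumed to be already superimposed. It does not search for the optimal superposition, which is the part a structural alignment program solves. The toy coordinates are hypothetical.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float, float]

def rmsd(a: Sequence[Point], b: Sequence[Point]) -> float:
    """RMSD between two equally long, already superimposed coordinate lists."""
    if len(a) != len(b) or not a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    total = 0.0
    for (x1, y1, z1), (x2, y2, z2) in zip(a, b):
        total += (x1 - x2) ** 2 + (y1 - y2) ** 2 + (z1 - z2) ** 2
    return math.sqrt(total / len(a))

# Hypothetical toy coordinates: the second set is the first shifted by 1 along x.
first = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (3.0, 0.0, 0.0)]
second = [(1.0, 0.0, 0.0), (2.5, 0.0, 0.0), (4.0, 0.0, 0.0)]
print(rmsd(first, first))   # identical structures
print(rmsd(first, second))  # every atom displaced by the same amount
```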


Dali (Distance mAtrix aLIgnment)

DALI performs pairwise alignments of protein structures. The algorithm uses the three-dimensional coordinates of each protein to calculate matrices of residue-residue distances and then compares these matrices.

See Holm L and Sander C (1993) J. Mol. Biol. 233:123-138.
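
The sketch below is not the DALI algorithm. It only illustrates why distance matrices are a convenient representation: the matrix of one protein does not change when the whole molecule is moved in space, so two structures can be compared without superimposing them first. Function names and coordinates are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]

def distance_matrix(coords: Sequence[Point]) -> List[List[float]]:
    """All pairwise distances between the residues (e.g. C-alpha atoms) of one protein."""
    return [[math.dist(p, q) for q in coords] for p in coords]

def matrix_difference(m1: List[List[float]], m2: List[List[float]]) -> float:
    """Mean absolute difference between two equally sized distance matrices."""
    n = len(m1)
    return sum(abs(m1[i][j] - m2[i][j]) for i in range(n) for j in range(n)) / (n * n)

# Hypothetical coordinates; the second "protein" is the first one translated in space.
protein_a = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (3.8, 3.8, 0.0)]
protein_b = [(x + 10.0, y - 5.0, z + 2.0) for x, y, z in protein_a]

print(matrix_difference(distance_matrix(protein_a), distance_matrix(protein_b)))
```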

SALIGN http://salilab.org/DBALI/?page=tools


Based on Sequence homology

Comparative Modeling

Similar sequence suggests similar structure

Builds a protein structure model based on its alignment to one or more related protein structures in the database


Based on Sequence homology

Comparative Modeling

  • The accuracy of a comparative model is related to the sequence identity on which it is based

    >50% sequence identity = high accuracy

    30%-50% sequence identity = 90% modeled

    <30% sequence identity = low accuracy (many errors)
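
The three bands above can be applied to a pairwise alignment as in the following sketch. It assumes that identity is counted over the aligned, non-gap columns, which is only one of several definitions in use. The function names and the aligned fragments are hypothetical.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identical residues over the aligned (non-gap) columns."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    columns = [(a, b) for a, b in zip(aligned_a, aligned_b) if a != "-" and b != "-"]
    if not columns:
        return 0.0
    matches = sum(1 for a, b in columns if a == b)
    return 100.0 * matches / len(columns)

def expected_accuracy(identity: float) -> str:
    """Map percent identity to the three bands listed above."""
    if identity > 50:
        return "high accuracy"
    if identity >= 30:
        return "intermediate (90% modeled)"
    return "low accuracy (many errors)"

# Hypothetical aligned fragments; '-' marks a gap.
identity = percent_identity("MTYGF-RIPL", "MTYAFKRLPL")
print(round(identity, 1), expected_accuracy(identity))
```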


Homology Threshold for Different Alignment Lengths

[Figure: homology threshold t plotted against alignment length L]

A sequence alignment between two proteins is considered to imply structural homology if the sequence identity is equal to or above the homology threshold t in a sequence region of a given length L.

The threshold values t(L) are derived from the PDB.


Comparative Modeling

  • Similarity particularly high in core

    • Alpha helices and beta sheets preserved

    • Even near-identical sequences vary in loops


Based on Sequence homology

Comparative Modeling Methods

MODELLER (Sali, Rockefeller/UCSF)

SCWRL (Dunbrack, UCSF)

SWISS-MODEL

http://swissmodel.expasy.org//SWISS-MODEL.html


Based on Sequence homology

Comparative Modeling

Modeling of a sequence based on known structures

Consists of four major steps:

1. Finding one or more known structures related to the sequence to be modeled (templates), using sequence comparison methods such as PSI-BLAST

2. Aligning sequence with the templates

3. Building a model

4. Assessing the model
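
Step 2 can be illustrated with a textbook global alignment (Needleman-Wunsch). This is a sketch under simplifying assumptions: a fixed score for match, mismatch and gap, where real modeling tools use substitution matrices and more elaborate gap penalties. The sequences in the usage line are hypothetical.

```python
from typing import Tuple

def needleman_wunsch(a: str, b: str, match: int = 1, mismatch: int = -1,
                     gap: int = -1) -> Tuple[int, str, str]:
    """Global alignment of a and b; returns (score, aligned a, aligned b)."""
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    for i in range(1, rows):
        score[i][0] = i * gap
    for j in range(1, cols):
        score[0][j] = j * gap
    for i in range(1, rows):
        for j in range(1, cols):
            diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            score[i][j] = max(diag, score[i - 1][j] + gap, score[i][j - 1] + gap)

    # Trace back from the bottom-right corner to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = len(a), len(b)
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + (
                match if a[i - 1] == b[j - 1] else mismatch):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i -= 1
            j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return score[len(a)][len(b)], "".join(reversed(out_a)), "".join(reversed(out_b))

# Hypothetical target and template fragments.
print(needleman_wunsch("MTYGFRIPL", "MTYGRIPL"))
```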


Based on Structure homology

Fold Recognition


Based on Secondary Structure

Protein Folds: sequential and spatial arrangement of secondary structures

[Figure: example folds, hemoglobin and TIM]


Similar folds usually mean similar function

[Figure: transcription factors sharing the homeodomain fold]


The same fold can have multiple functions

[Figure: the Rossmann fold and the TIM barrel, labeled with 12 functions and 31 functions]


Based on Structure homology

Fold Recognition

  • Methods of protein fold recognition attempt to detect similarities between protein 3D structures that have no significant sequence similarity.

  • Search for folds that are compatible with a particular sequence.

  • They “turn the protein folding problem on its head”: rather than predicting how a sequence will fold, they predict how well a fold will fit a sequence.


Based on Structure homology

Basic steps in Fold Recognition :

Compare sequence against a Library of all known Protein Folds (finite number)

Query sequence

MTYGFRIPLNCERWGHKLSTVILKRP...

Goal: find the fold template that the sequence fits best

There are different ways to evaluate sequence-structure fit


Based on Secondary Structure homology

There are different ways to evaluate sequence-structure fit

[Figure: the query sequence MAHFPGFGQSLLFGYPVYVFGD... is scored against each potential fold 1) ... 56) ... n) in the library, giving fit scores -10 ... -123 ... 20.5]
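
The idea of scoring one sequence against every fold in a library can be sketched with a toy scoring function. Everything here is an assumption made for illustration: the fold library, the buried/exposed description of each template position, the hydrophobic residue set, and the convention that a higher score means a better fit (real threading programs use far richer potentials, often energies where lower is better, and allow gaps).

```python
from typing import Dict, Tuple

HYDROPHOBIC = set("AVLIMFWC")  # simplified residue classification (an assumption)

# Hypothetical fold library: each template position is buried (B) or exposed (E).
FOLD_LIBRARY: Dict[str, str] = {
    "fold_1": "BBEEBBEE",
    "fold_2": "EBEBEBEB",
    "fold_3": "EEEEBBBB",
}

def fit_score(sequence: str, environment: str) -> int:
    """+1 when a residue suits its position, -1 otherwise (ungapped threading)."""
    score = 0
    for residue, env in zip(sequence, environment):
        suits = (residue in HYDROPHOBIC) == (env == "B")
        score += 1 if suits else -1
    return score

def best_fold(sequence: str, library: Dict[str, str]) -> Tuple[str, int]:
    """Thread the sequence onto every fold and return the best-scoring one."""
    scores = {name: fit_score(sequence, env) for name, env in library.items()}
    name = max(scores, key=scores.get)
    return name, scores[name]

print(best_fold("LVKDAIST", FOLD_LIBRARY))
```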


Based on Secondary Structure homology

Programs for fold recognition

  • TOPITS (Rost 1995)

  • GenTHREADER (Jones 1999)

  • SAM-T02 (UCSC HMM)

  • 3D-PSSM http://www.sbg.bio.ic.ac.uk/~3dpssm/


Ab Initio Modeling

  • Compute molecular structure from the laws of physics and chemistry alone

    Theoretically the ideal solution

    In practice nearly impossible

    Why?

    • Exceptionally complex calculations

    • Incomplete understanding of the biophysics


Ab Initio Methods

  • Rosetta (Baker lab, Seattle)

  • Undertaker (Karplus, UCSC)


CASP - Critical Assessment of Structure Prediction

  • A competition among research groups to predict the 3D structures of proteins that are about to be solved experimentally.

  • Current state:

    • Ab initio: the weakest approach, but greatly improved in recent years.

    • Comparative modeling: performs very well when homologous sequences with known structures exist.

    • Fold recognition: performs well.


What can you do? FOLDIT: Solve Puzzles for Science

A computer game to fold proteins

http://fold.it/portal/puzzles


What’s Next

Predicting function from structure


Structural Genomics: a large scale structure determination project designed to cover all representative protein structures

ATP binding domain of protein MJ0577

Zarembinski et al., Proc. Natl. Acad. Sci. USA 95:15189 (1998)


Wanted!

Automated methods to predict function from the protein structures resulting from the structural genomics project.

As a result of the structural genomics initiative, many structures of proteins with unknown function will be solved.


Approaches for predicting function from structure

ConSurf: mapping evolutionary conservation onto the protein structure http://consurf.tau.ac.il/


Approaches for predicting function from structure

PFPlus – Identifying positive electrostatic patches on the protein structure http://pfp.technion.ac.il/


Approaches for predicting function from structure

SHARP2 – Identifying positive electrostatic patches on the protein structure http://www.bioinformatics.sussex.ac.uk/SHARP2


Machine learning approach for predicting function from structure

Find the common properties of a protein family (or any group of proteins of interest) which are unique to the group and different from all the other proteins.

Generate a model for the group and predict new members of the family which have similar properties.


Knowledge Based Approach

Basic Steps

1. Building a Model

  • Generate a dataset of proteins with a common function (e.g. DNA-binding proteins)

  • Generate a control dataset

  • Calculate the properties characteristic of the protein family of interest for all the proteins in the data (both the DNA-binding and the non-DNA-binding proteins)

  • Represent each protein in a set by a vector of calculated features and build a statistical model to split the groups
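
Representing a protein as a feature vector can be sketched as follows. The slides do not say which properties were calculated, and they are derived from the structure; the three sequence-based fractions used here are hypothetical stand-ins chosen only because they are easy to compute.

```python
from typing import List

def feature_vector(sequence: str) -> List[float]:
    """Three illustrative features: fractions of positive, negative and aromatic residues."""
    if not sequence:
        raise ValueError("sequence must not be empty")
    n = len(sequence)
    positive = sum(sequence.count(r) for r in "KR")   # lysine, arginine
    negative = sum(sequence.count(r) for r in "DE")   # aspartate, glutamate
    aromatic = sum(sequence.count(r) for r in "FWY")  # phenylalanine, tryptophan, tyrosine
    return [positive / n, negative / n, aromatic / n]

# Hypothetical six-residue peptide.
print(feature_vector("KRDEFA"))
```

Each protein in the positive and the control dataset would be turned into one such vector before the statistical model is built.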


Basic Steps

2. Predicting the function of a new protein

  • Calculate the properties for a new protein and represent them in a vector

  • Predict whether the tested protein belongs to the family


TEST CASE

Y14: a protein sequence translated from an ORF (open reading frame) obtained from the complete Drosophila genome

>Y14

PQRSVGWILFVTSIHEEAQEDEIQEKFCDYGEIKNIHLNLDRRTGFSKGYALVEYETHKQALAAKEALNGAEIMGQTIQVDWCFVKG G


Support Vector Machine (SVM)

Find a hyperplane that maximally separates the RNA-binding from the non-RNA-binding proteins, dividing them into two classes.

[Figure: a kernel function maps the input space to the feature space; RNA binding = [x1, x2, x3…], non-NA binding = [y1, y2, y3…]; the new protein structure (?) is to be classified]
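
A full SVM is beyond a short example, so the sketch below swaps in a perceptron, a simpler linear classifier. It also finds a hyperplane that separates two classes of feature vectors, but not the maximum-margin one, and it uses no kernel function. The feature vectors and labels are hypothetical.

```python
from typing import List, Sequence, Tuple

def train_perceptron(samples: Sequence[Tuple[Sequence[float], int]],
                     epochs: int = 100, rate: float = 0.1) -> Tuple[List[float], float]:
    """Learn weights w and bias b so that sign(w.x + b) matches the labels (+1 / -1)."""
    w = [0.0] * len(samples[0][0])
    b = 0.0
    for _ in range(epochs):
        errors = 0
        for x, label in samples:
            activation = sum(wi * xi for wi, xi in zip(w, x)) + b
            if label * activation <= 0:  # misclassified: move the hyperplane
                w = [wi + rate * label * xi for wi, xi in zip(w, x)]
                b += rate * label
                errors += 1
        if errors == 0:  # every training sample is on the correct side
            break
    return w, b

def predict(w: Sequence[float], b: float, x: Sequence[float]) -> int:
    """Return +1 or -1 depending on the side of the hyperplane x falls on."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else -1

# Hypothetical two-feature vectors: +1 = binding, -1 = non-binding.
training = [([0.30, 0.05], 1), ([0.25, 0.10], 1),
            ([0.05, 0.30], -1), ([0.10, 0.25], -1)]
w, b = train_perceptron(training)
print(predict(w, b, [0.28, 0.08]))
```

A new protein is classified by computing its feature vector and calling predict on it.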


>Y14

PQRSVGWILFVTSIHEEAQEDEIQEKFCDYGEIKNIHLNLDRRTGFSKGYALVEYETHKQALAAKEALNGAEIMGQTIQVDWCFVKG G

Y14 DOES NOT BIND RNA

