Protein secondary structure Prediction

Presentation Transcript
slide1

Protein secondary structure Prediction

  • The problem

Seq: RPLQGLVLDTQLYGFPGAFDDWERFMRE

Pred:CCCCCHHHHHCCCCEEEECCHHHHHHCC
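
To make the pairing of the two strings concrete, the sketch below splits the prediction into runs of identical states and reports the matching stretch of sequence. It assumes the usual three-state alphabet (H = helix, E = strand, C = coil), which the slide does not spell out; the function name `segments` is made up for this example.

```python
from itertools import groupby

SEQ = "RPLQGLVLDTQLYGFPGAFDDWERFMRE"
PRED = "CCCCCHHHHHCCCCEEEECCHHHHHHCC"


def segments(seq, pred):
    """Return (state, start, end, residues) for each run of identical states.

    start and end are 0-based; end is exclusive.
    """
    if len(seq) != len(pred):
        raise ValueError("sequence and prediction must have the same length")
    out = []
    pos = 0
    for state, run in groupby(pred):
        length = len(list(run))
        out.append((state, pos, pos + length, seq[pos:pos + length]))
        pos += length
    return out


if __name__ == "__main__":
    for state, start, end, residues in segments(SEQ, PRED):
        print(state, start, end, residues)
```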

  • Why secondary structure prediction?
slide2

Some historical landmarks

  • 1st generation – 70’s (~50-60% accuracy)
  • single residue statistics, explicit rules
  • Chou & Fasman 1974, GOR1 1978
  • 2nd generation – 80’s (~60-70% accuracy)
  • single residue statistics, nearest-neighbors, neural network (more with local interaction)
  • GOR3 1987, Levin et al. 1986, Qian & Sejnowski 1988, Holly & Karplus, 1989
  • 3rd generation – 90’s (~78% accuracy)
  • neural network with homologous sequence information
  • PHD 1993, PSIPRED 1999, SSPRO 2000
slide3

Chou-Fasman method

  • Straight statistical approach
  • Conformational propensity e.g. helical propensity
  • Categorize each amino acid
  • e.g. helix former, helix breaker, helix indifferent
  • Find nucleation sites
  • short sequence with high concentration of a category
  • Extend the nucleation sites till a threshold
  • Handle overlaps
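
The nucleate-and-extend idea above can be sketched for helices only. This is a simplified illustration, not the published procedure: the propensity values are approximate, commonly quoted Chou-Fasman helix numbers and should be checked against the original table; the window sizes and cutoffs are assumptions; strand prediction and the resolution of helix/strand overlaps are left out (overlapping helix regions are simply merged).

```python
# Approximate helix propensities (assumed values, verify before real use).
HELIX_PROPENSITY = {
    "E": 1.51, "M": 1.45, "A": 1.42, "L": 1.21, "K": 1.16,
    "F": 1.13, "Q": 1.11, "W": 1.08, "I": 1.08, "V": 1.06,
    "D": 1.01, "H": 1.00, "R": 0.98, "T": 0.83, "S": 0.77,
    "C": 0.70, "Y": 0.69, "N": 0.67, "P": 0.57, "G": 0.57,
}


def mean(values):
    return sum(values) / len(values)


def predict_helix(seq, window=6, min_formers=4, former_cutoff=1.0,
                  extend_window=4, extend_threshold=1.0):
    """Return a string of 'H' (helix) and 'C' (everything else)."""
    p = [HELIX_PROPENSITY[aa] for aa in seq]
    n = len(seq)
    is_helix = [False] * n
    for start in range(n - window + 1):
        # Nucleation: enough helix formers inside a short window.
        formers = sum(1 for v in p[start:start + window] if v > former_cutoff)
        if formers < min_formers:
            continue
        left, right = start, start + window  # half-open interval
        # Extension: grow while the flanking window still favours helix.
        while left > 0 and mean(p[left - 1:left - 1 + extend_window]) >= extend_threshold:
            left -= 1
        while right < n and mean(p[max(0, right - extend_window + 1):right + 1]) >= extend_threshold:
            right += 1
        for i in range(left, right):
            is_helix[i] = True  # overlapping regions are merged
    return "".join("H" if h else "C" for h in is_helix)


if __name__ == "__main__":
    print(predict_helix("RPLQGLVLDTQLYGFPGAFDDWERFMRE"))
```

Because each residue is scored from a fixed table and a short local window, the sketch also shows why such a method cannot use information from distant parts of the chain.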
slide4

Chou-Fasman method

Conformational parameters

(Table from Krane and Raymer’s book)

  • What is the drawback of the method?
slide5

Introduction to neural network

  • A self-learning system – using a training data set
  • A perceptron
    • An analogy – apple and orange sorter
    • Threshold unit – classify a vector of inputs
  • Weights! How do we get them?
slide6

Basics in neural network (1 unit only)

  • Modify threshold unit a little bit
    • Step function vs. continuous threshold function (a)
  • Problem about weight
  • Do not fit examples exactly - minimize an error function
slide7

Basics in neural network (1 unit only)

  • Squared error function E(w)
  • Minimize error E(w) - using gradient descent method
  • Weight update in each step
    • Learning rate 
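
A minimal sketch of the single-unit training loop described on these slides, assuming a sigmoid as the continuous threshold function and E(w) = 1/2 Σ (y − t)² as the squared error; the slides do not preserve the exact formulas, so both are assumptions, as are the toy OR data set and all function names.

```python
import math


def sigmoid(a):
    """Continuous threshold function (assumed form)."""
    return 1.0 / (1.0 + math.exp(-a))


def output(w, x):
    """Output of one unit: sigmoid of the weighted sum of its inputs."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)))


def squared_error(w, data):
    """E(w) = 1/2 * sum over examples of (output - target)^2."""
    return 0.5 * sum((output(w, x) - t) ** 2 for x, t in data)


def train(data, n_inputs, learning_rate=0.5, steps=2000):
    """Batch gradient descent: w <- w - learning_rate * dE/dw."""
    w = [0.0] * n_inputs
    for _ in range(steps):
        grad = [0.0] * n_inputs
        for x, t in data:
            y = output(w, x)
            delta = (y - t) * y * (1.0 - y)  # chain rule through the sigmoid
            for i, xi in enumerate(x):
                grad[i] += delta * xi
        w = [wi - learning_rate * gi for wi, gi in zip(w, grad)]
    return w


# Toy data: logical OR; the first input is a constant 1 acting as a bias.
OR_DATA = [((1, 0, 0), 0), ((1, 0, 1), 1), ((1, 1, 0), 1), ((1, 1, 1), 1)]

if __name__ == "__main__":
    w = train(OR_DATA, 3)
    print("weights:", w)
    print("error:", squared_error(w, OR_DATA))
```

The learning rate scales each step: too small and training is slow, too large and the error can oscillate instead of decreasing.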
slide8

Basic neural network in secondary structure prediction

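(Figure from Kneller et al., JMB 1990: a single-layer network with inputs x1 to x4, weights w11 to w14, outputs y1 to y3, and errors E1 to E3. The slide's formulas for activation a1, output y1, and error E1 were not preserved in the transcript.)
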

slide9

Multi-layer neural network

  • Complete neural network
  • - a set of continuous threshold units interconnected in a topology
  • - output of some unit is input of other units

(Figure: a layered network with input units x1 to x4, hidden units y, and output units z.)
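
The layered topology can be sketched as a forward pass in which the hidden outputs y become the inputs of the output units z. The layer sizes (4 inputs, 2 hidden, 3 outputs), the sigmoid unit, and the weight values are arbitrary assumptions for illustration, not taken from the slides.

```python
import math


def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))


def layer(inputs, weights, biases):
    """One layer of continuous threshold units; one weight row per unit."""
    return [
        sigmoid(sum(w * x for w, x in zip(row, inputs)) + b)
        for row, b in zip(weights, biases)
    ]


def forward(x, hidden_w, hidden_b, out_w, out_b):
    """Input units x -> hidden units y -> output units z."""
    y = layer(x, hidden_w, hidden_b)
    z = layer(y, out_w, out_b)  # outputs of hidden units feed the output units
    return y, z


# Arbitrary example weights (hypothetical values).
HIDDEN_W = [[0.5, -0.3, 0.8, 0.1], [-0.6, 0.2, 0.4, 0.9]]
HIDDEN_B = [0.0, -0.1]
OUT_W = [[1.0, -1.0], [-0.5, 0.5], [0.3, 0.3]]
OUT_B = [0.0, 0.0, 0.0]

if __name__ == "__main__":
    y, z = forward([1, 0, 0, 1], HIDDEN_W, HIDDEN_B, OUT_W, OUT_B)
    print("hidden:", y)
    print("output:", z)
```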

slide10

PHD method

(Rost B. & Sander C, JMB 1993)

  • Use profile of multiple sequence alignment
  • Multiple layers
  • Accuracy >70%
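
A profile summarizes each column of a multiple sequence alignment as residue frequencies. The sketch below shows only that step; how PHD actually encodes and weights the profile is not described on the slide, so the gap handling and the function name `column_profile` are assumptions.

```python
from collections import Counter


def column_profile(alignment):
    """Return, for each alignment column, a dict of residue -> frequency.

    Gap characters ('-') are ignored; an all-gap column gives an empty dict.
    """
    if not alignment:
        raise ValueError("alignment is empty")
    length = len(alignment[0])
    if any(len(row) != length for row in alignment):
        raise ValueError("all aligned sequences must have the same length")
    profile = []
    for column in zip(*alignment):
        residues = [c for c in column if c != "-"]
        counts = Counter(residues)
        total = len(residues)
        profile.append({aa: n / total for aa, n in counts.items()})
    return profile


if __name__ == "__main__":
    toy_alignment = ["RPLQG", "RPIQG", "KPL-G"]  # made-up example
    for i, column in enumerate(column_profile(toy_alignment)):
        print(i, column)
```

Feeding such frequencies to the network, instead of a single residue per position, is what lets the prediction use information from homologous sequences.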
slide11

Protein Folding Problem

  • A protein folds into a unique 3D structure under physiological conditions
  • What is the protein folding problem?
  • 3D structure is key to understanding functional mechanism
  • Rational drug design
  • 3D structure prediction
slide12

Protein Folding Problem

  • Hard?
  • Can it be done?
  • Sampling conformational space
    • SS structures offer simplicity
    • Side chain filling the space
    • May not be random search
  • Free energy (ΔG) =
    • Interaction energy – Entropic energy
slide13

Protein Folding Problem

  • Experimental finding
    • Protein does not start folding from the end
    • SS seem to fold early
    • Hydrophobic aa in the core
    • Hydrophilic aa on surface
  • Energy function approximation
    • Physics based (bond length, bond angle, pair interactions)
    • Statistics based
slide14

Scope of the problem

  • The majority of newly solved protein structures share a certain level of similarity with a known structure
  • Certain families of proteins have no or few structures solved
  • Human genes ~20k
  • Structural genomics initiative
slide15

Protein structure prediction

  • Comparative modeling
  • >30% sequence identity
  • Fold recognition – formerly known as threading
  • twilight zone <25% sequence identity
  • Ab initio
    • new fold
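
As a rough illustration of how the identity thresholds above could guide the choice of method, the sketch below computes percent identity from a pairwise alignment and maps it to an approach. The denominator (columns where neither sequence has a gap) is one of several conventions and is an assumption, as is the "borderline" label for the 25-30% range the slide leaves open. Whether a target has a new fold cannot be decided from identity alone, so ab initio is not returned here.

```python
def percent_identity(aligned_a, aligned_b):
    """Percent of gap-free aligned columns with identical residues."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    pairs = [(a, b) for a, b in zip(aligned_a, aligned_b)
             if a != "-" and b != "-"]
    if not pairs:
        return 0.0
    matches = sum(1 for a, b in pairs if a == b)
    return 100.0 * matches / len(pairs)


def suggest_approach(identity):
    """Map percent identity to an approach using the slide's thresholds."""
    if identity > 30:
        return "comparative modeling"
    if identity < 25:
        return "fold recognition"
    return "borderline"  # 25-30% is not covered by the slide


if __name__ == "__main__":
    identity = percent_identity("RPLQGLVL", "RPIQG-VL")  # made-up alignment
    print(identity, suggest_approach(identity))
```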
slide16

CASP

Compare and rank

Experimentally solved structure

Predicted structure

  • CASP –
  • e.g. Skolnick (2003) Proteins 53: 469-79
  • Ginalski (2003) Proteins 53: 410-17
    • Zhang, Y. (2007) "Template-based modeling and free modeling by I-TASSER in CASP7." Proteins 69(S8): 108-17
slide17

Comparative Modeling

  • Search for structures
  • Select templates
  • Align target sequence with structures
  • Build model
  • Evaluate model

http://www.salilab.org/~andras/watanabe/main.html

  • Sequence identity vs. structure overlap (Fig)
slide18

Comparative Modeling

  • Search for structures:
  • pair-wise sequence alignment with database
  • multiple sequence alignment -> profile
  • fold assignment / threading – use structure information in comparison
  • Select template:
    • sequence similarity, evolutionary relationship, environment, resolution
  • Sequence alignment (target and template)
  • standard methods, with tuning
slide19

Ab Initio Prediction

  • Challenge:
    • Search space
    • Energy function
  • Reduction in search space
  • use lattice
  • use simplified amino acids
  • use building blocks available in nature
  • Energy function:
  • physics
  • statistics - empirical
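
One well-known way to combine a lattice with simplified amino acids is the two-letter HP model (H = hydrophobic, P = polar) on a 2D square lattice. The slides do not name a specific model, so this is a swapped-in illustration: the energy is −1 for every pair of H residues that are lattice neighbours but not adjacent in the chain, and the search is a brute-force enumeration that is only feasible for very short chains.

```python
MOVES = [(1, 0), (-1, 0), (0, 1), (0, -1)]


def energy(seq, coords):
    """Count -1 for each non-bonded H-H contact on the lattice."""
    e = 0
    for i in range(len(seq)):
        for j in range(i + 2, len(seq)):  # skip chain neighbours
            if seq[i] == "H" and seq[j] == "H":
                dx = abs(coords[i][0] - coords[j][0])
                dy = abs(coords[i][1] - coords[j][1])
                if dx + dy == 1:
                    e -= 1
    return e


def best_fold(seq):
    """Enumerate all self-avoiding walks; return (lowest energy, coords)."""
    if not seq:
        raise ValueError("sequence is empty")
    best = [None, None]

    def grow(coords):
        if len(coords) == len(seq):
            e = energy(seq, coords)
            if best[0] is None or e < best[0]:
                best[0], best[1] = e, list(coords)
            return
        x, y = coords[-1]
        for dx, dy in MOVES:
            nxt = (x + dx, y + dy)
            if nxt not in coords:  # self-avoidance
                coords.append(nxt)
                grow(coords)
                coords.pop()

    grow([(0, 0)])
    return best[0], best[1]


if __name__ == "__main__":
    print(best_fold("HPPHPH"))
```

The number of walks grows exponentially with chain length, which is the search-space challenge named above even after this drastic simplification.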
slide20

Ab initio 3D Structure prediction

An example - ROSETTA

Simons KT, Kooperberg C, Huang E, Baker D; J Mol Biol. (1997) 268, 209-225

Schonbrun J, Wedemeyer W, Baker D; Current Opinion in Structural Biology, (2002), 12:348-54

ROSETTA

narrow search - use local structure available

statistical based energy function

one of the top few ab initio methods in CASP4.

slide21

ROSETTA – segment matching

Observations:

Analysis of 9-a.a. segments in structure database

distribution of the conformations of 9-mers

Main idea of the method

build segment conformational library

(fragment library for 3mer and 9mer)

put pieces together

better (energy function and search space)
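
The library-building step can be sketched as an index from short sequence segments to the conformations observed for them. This is not ROSETTA's implementation: conformations are represented here by per-residue label strings as a stand-in for backbone geometry, lookup requires an exact sequence match (practical methods would score similarity instead), and the toy data and function names are made up.

```python
from collections import defaultdict


def build_library(known, k):
    """Index every k-residue segment of known structures.

    known: list of (sequence, conformation) pairs of equal length.
    Returns dict: k-mer sequence -> list of observed conformations.
    """
    library = defaultdict(list)
    for seq, conf in known:
        if len(seq) != len(conf):
            raise ValueError("sequence and conformation lengths differ")
        for i in range(len(seq) - k + 1):
            library[seq[i:i + k]].append(conf[i:i + k])
    return dict(library)


def candidates(target, library, k):
    """For each position of the target, list candidate conformations."""
    return [
        (i, library.get(target[i:i + k], []))
        for i in range(len(target) - k + 1)
    ]


if __name__ == "__main__":
    toy_known = [("ACDEF", "HHHCC"), ("CDEGH", "EEECC")]  # made-up data
    lib3 = build_library(toy_known, 3)
    for position, options in candidates("ACDE", lib3, 3):
        print(position, options)
```

A segment with several observed conformations, like "CDE" in the toy data, is where the assembly step has to choose, guided by the energy function.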

slide22

Model Building

  • Assembly of rigid bodies
    • dissecting structure into core, loops and side-chains
  • Satisfy spatial constraints (Fig.)
  • derive spatial constraints, find a structure that optimizes all the constraints
  • spatial constraints generated from
  • input alignment;
  • general spatial preferences found in known structures;
  • molecular force field;
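
Finding a structure that optimizes a set of spatial constraints can be illustrated with a toy problem: points in the plane, target distances between pairs, and gradient descent on the summed squared violation. The restraint form, step size, and names are assumptions for illustration; real modeling programs combine many restraint types from the sources listed above.

```python
import math


def violation(coords, restraints):
    """Sum of squared differences between actual and target distances."""
    return sum(
        (math.dist(coords[i], coords[j]) - target) ** 2
        for i, j, target in restraints
    )


def satisfy(coords, restraints, step=0.05, iterations=500):
    """Move points by gradient descent to reduce the total violation."""
    coords = [list(p) for p in coords]
    for _ in range(iterations):
        grads = [[0.0] * len(p) for p in coords]
        for i, j, target in restraints:
            dist = math.dist(coords[i], coords[j])
            if dist == 0.0:
                continue  # direction undefined; skip this restraint
            factor = 2.0 * (dist - target) / dist
            for k in range(len(coords[i])):
                diff = coords[i][k] - coords[j][k]
                grads[i][k] += factor * diff
                grads[j][k] -= factor * diff
        for p, g in zip(coords, grads):
            for k in range(len(p)):
                p[k] -= step * g[k]
    return [tuple(p) for p in coords]


if __name__ == "__main__":
    start = [(0.0, 0.0), (1.0, 0.0), (0.5, 0.2)]
    # Three points that should end up one unit apart from each other.
    restraints = [(0, 1, 1.0), (1, 2, 1.0), (0, 2, 1.0)]
    final = satisfy(start, restraints)
    print(violation(start, restraints), violation(final, restraints))
```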