Neural Networks for Protein Structure Prediction (Brown, JMB 1999)

CS 466

Saurabh Sinha


Outline

  • The goal is to predict the “secondary structure” of a protein from its sequence

  • An artificial neural network is used for this task

  • Evaluation of prediction accuracy


What is Protein Structure?


http://academic.brooklyn.cuny.edu/biology/bio4fv/page/3d_prot.htm


http://matcmadison.edu/biotech/resources/proteins/labManual/images/220_04_114.png


Protein Structure

  • An amino acid sequence “folds” into a complex 3-D structure

  • Finding out this 3-D structure is a crucial and challenging task

  • Experimental methods (e.g., X-ray crystallography) are very tedious

  • Computational predictions are a possibility, but very difficult


What is “secondary structure”?


Figure: secondary structure elements, labeled “Strand” and “Helix” (source: http://www.wiley.com/college/pratt/0471393878/student/structure/secondary_structure/secondary_structure.gif)


Figure: a folded protein with its “Helix” and “Strand” regions labeled (source: http://www.npaci.edu/features/00/Mar/protein.jpg)


Secondary structure prediction

  • Well, the whole 3-D “tertiary” protein structure may be hard to predict from sequence

  • But can we at least predict the secondary structural elements such as “strand”, “helix” or “coil”?

  • This is what this paper does

  • ... and so do many other papers (it is a hard problem!)


A survey of structure prediction

  • The most reliable technique is “comparative modeling”

    • Find a protein P whose amino acid sequence is very similar to your “target” protein T

    • Hope that this other protein P does have a known structure

    • Predict a structure similar to that of P, after carefully considering how the sequences of P and T differ


A survey of structure prediction

  • Comparative modeling fails if we don’t have a suitable homologous “template” protein P for our protein T

  • “Ab initio” tertiary methods attempt to predict the structure without using a known protein structure as a template

    • Incorporate basic physical and chemical principles into the structure calculation

    • Gets very hairy, and highly computationally intensive

  • The other option is prediction of secondary structure only (i.e., making the goal more modest)

    • Secondary structure predictions may be used to provide constraints for tertiary structure prediction


Secondary structure prediction

  • Early methods were based on stereochemical principles

  • Later methods realized that we can do better if we use not only the one sequence T (our sequence), but also a family of “related sequences”

  • Search for sequences similar to T, build a multiple alignment of these, and predict secondary structure from the multiple alignment of sequences


What’s multiple alignment doing here?

  • Most conserved regions of a protein sequence are either functionally important or buried in the protein “core”

  • More variable regions are usually on the surface of the protein

    • There are few constraints on which types of amino acids must appear there (apart from a bias towards hydrophilic residues)

  • Multiple alignment tells us which portions are conserved and which are not
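A minimal sketch of how a multiple alignment exposes conserved and variable columns. The toy alignment, the function names, and the entropy-based score are all illustrative assumptions; the slides do not say which conservation measure is used.

```python
import math
from collections import Counter

def column_entropy(column):
    """Shannon entropy (in bits) of the residues in one alignment column."""
    counts = Counter(column)
    total = len(column)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())

def conservation_profile(alignment):
    """Entropy of every column; 0.0 means the column is perfectly conserved."""
    columns = zip(*alignment)  # transpose rows (sequences) into columns
    return [column_entropy(col) for col in columns]

# Hypothetical toy alignment: four aligned sequences of equal length, no gaps.
toy_alignment = [
    "MKVLA",
    "MRVLS",
    "MKVIT",
    "MQVLG",
]
entropies = conservation_profile(toy_alignment)
```

Low-entropy columns (here the first and third) are the conserved ones; the last column, where every sequence differs, has the highest entropy.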


Figure: the hydrophobic core of a protein (source: http://bio.nagaokaut.ac.jp/~mbp-lab/img/hpc.png)


What’s multiple alignment doing here?

  • Therefore, by looking at multiple alignment, we could predict which residues are in the core of the protein and which are on the surface (“solvent accessibility”)

  • Secondary structure is then predicted by comparing these accessibility patterns with the patterns associated with helices, strands, etc.

  • This approach (Benner & Gerloff) is mostly manual

  • Today’s paper suggests an automated method


The PSIPRED algorithm

  • Given an amino-acid sequence, predict secondary structure elements in the protein

  • Three stages:

    • Generation of a sequence profile (the “multiple alignment” step)

    • Prediction of an initial secondary structure (the neural network step)

    • Filtering of the predicted structure (another neural network step)


Generation of sequence profile

  • A BLAST-like program called “PSI-BLAST” is used for this step

  • We saw BLAST earlier -- it is a fast way to find high scoring local alignments

  • PSI-BLAST is an iterative approach

    • an initial scan of a protein database using the target sequence T

    • align all matching sequences to construct a “sequence profile”

    • scan the database using this new profile

  • Because it searches with a profile, PSI-BLAST can also pick out and align sequences that are only distantly related to our target sequence T


The sequence profile looks like this

  • Has 20 x M numbers, where M is the number of positions in the sequence

  • The numbers are the log-likelihoods of each of the 20 residues at each position
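A rough sketch of how such a profile could be computed from an alignment. It assumes a uniform background frequency of 1/20 and simple pseudocounts, and it stores the profile as M rows of 20 scores; PSI-BLAST itself uses a more elaborate weighting scheme, and the names below are hypothetical.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues

def build_profile(alignment, pseudocount=1.0):
    """Return M rows of 20 log-odds scores, one row per alignment column.

    Assumptions of this sketch: gap-free alignment, uniform background
    frequency (1/20), and a flat pseudocount added to every residue count.
    """
    background = 1.0 / len(AMINO_ACIDS)
    profile = []
    for column in zip(*alignment):
        total = len(column) + pseudocount * len(AMINO_ACIDS)
        row = []
        for aa in AMINO_ACIDS:
            freq = (column.count(aa) + pseudocount) / total
            row.append(math.log(freq / background))
        profile.append(row)
    return profile

# Hypothetical toy alignment of three sequences, M = 3 positions.
toy_alignment = ["MKV", "MRV", "MKV"]
profile = build_profile(toy_alignment)
```

A positive score means the residue is seen at that position more often than the background predicts; a negative score means it is seen less often.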


Preparing for the second step

  • Feed the sequence profile to an artificial neural network

  • But before feeding, apply a simple “scaling” to bring the numbers into the range 0 to 1
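One way to do this scaling is to pass every profile score through the logistic function. The slide does not name the scaling function, so the choice of the logistic function here is an assumption, as are the function names.

```python
import math

def logistic(x):
    """Map a real-valued score into the open interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def scale_profile(profile):
    """Apply the logistic function to every score in the profile."""
    return [[logistic(score) for score in row] for row in profile]

# Hypothetical one-position profile fragment with three scores.
scaled = scale_profile([[-3.0, 0.0, 3.0]])
```

A score of 0 maps to 0.5, strongly negative scores approach 0, and strongly positive scores approach 1.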


Intro to Neural nets (the second and third steps of PSIPRED)


Artificial Neural Network

  • Supervised learning algorithm

  • Takes training examples as input; each example has a label

    • the label is the “class” of the example, e.g., “positive” or “negative”

    • in our case: “helix”, “strand”, or “coil”

  • Learns how to predict the class of an example


Artificial Neural Network

  • Directed graph

  • Nodes or “units” or “neurons”

  • Edges between units

  • Each edge has a weight (not known a priori)


Layered Architecture

http://www.akri.org/cognition/images/annet2.gif

Input here is a four-dimensional vector. Each dimension goes into one input unit.


Layered Architecture

http://www.geocomputation.org/2000/GC016/GC016_01.GIF

The nodes in the figure are the “units”.


What a unit (neuron) does

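  • Unit i receives a total input xi from the units connected to it, and produces an output yi = fi(xi), where fi() is the “transfer function” of unit i

  • The total input is the weighted sum of the outputs of the connected units plus a constant term: xi = Σj wij yj + wi

  • wij is the weight of the edge from unit j to unit i; wi is called the “bias” of the unit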


Weights, bias and transfer function

  • Unit takes n inputs

  • Each input edge has weight wi

  • Bias b

  • Output a

  • Transfer function f()

    • Linear, sigmoidal, or other
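The computation done by a single unit can be sketched in a few lines. The function names and the example numbers are illustrative only.

```python
import math

def sigmoid(x):
    """Sigmoidal transfer function."""
    return 1.0 / (1.0 + math.exp(-x))

def unit_output(inputs, weights, bias, transfer=sigmoid):
    """Output a = f(sum_i w_i * input_i + b) for one unit."""
    total = sum(w * x for w, x in zip(weights, inputs)) + bias
    return transfer(total)

# A unit with n = 3 inputs and a sigmoidal transfer function.
a_sigmoid = unit_output([1.0, 0.0, 1.0], [0.5, -0.3, -0.5], bias=0.0)

# The same kind of unit with a linear transfer function f(x) = x.
a_linear = unit_output([2.0, 3.0], [1.0, 1.0], bias=1.0, transfer=lambda x: x)
```

Swapping the `transfer` argument is all it takes to move between a linear and a sigmoidal unit; the weighted sum is the same in both cases.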


Weights, bias and transfer function

  • Weights wij and bias wi of each unit are “parameters” of the ANN.

    • Parameter values are learned from input data

  • Transfer function is usually the same for every unit in the same layer

  • Graphical architecture (connectivity) is decided by you.

    • Could use fully connected architecture: all units in one layer connect to all units in “next” layer
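A small sketch of a fully connected, layered network and its forward pass. The layer sizes (4 inputs, 3 hidden units, 2 outputs), the random initial weights, and all names are assumptions chosen for illustration.

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def make_layer(n_in, n_out, rng):
    """One fully connected layer: n_out units, each with n_in weights and a bias."""
    return [
        {"weights": [rng.uniform(-0.5, 0.5) for _ in range(n_in)],
         "bias": rng.uniform(-0.5, 0.5)}
        for _ in range(n_out)
    ]

def forward_layer(layer, inputs):
    """Every unit in the layer sees every input (full connectivity)."""
    return [
        sigmoid(sum(w * x for w, x in zip(unit["weights"], inputs)) + unit["bias"])
        for unit in layer
    ]

def forward(network, inputs):
    """Feed the inputs through the layers in order."""
    activations = inputs
    for layer in network:
        activations = forward_layer(layer, activations)
    return activations

rng = random.Random(0)
network = [make_layer(4, 3, rng), make_layer(3, 2, rng)]
outputs = forward(network, [0.1, 0.9, 0.3, 0.5])

# Number of parameters (weights plus biases) that training has to set.
n_parameters = sum(len(unit["weights"]) + 1 for layer in network for unit in layer)
```

With full connectivity the parameter count grows as the product of adjacent layer sizes: here (4 x 3 + 3) + (3 x 2 + 2) = 23.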


Where’s the algorithm?

  • It’s in the training of parameters!

  • Given several examples and their labels: the training data

  • Search for parameter values such that output units make correct predictions on the training examples

  • “Back-propagation” algorithm

    • Read up more on neural nets if you are interested
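To make the idea of training concrete, here is a toy back-propagation sketch for a network with one hidden layer and one sigmoid output unit, trained on the logical OR function. The data, network size, learning rate, and number of epochs are arbitrary assumptions; this is not the training setup used for PSIPRED.

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def train(examples, n_hidden=3, epochs=2000, rate=0.5, seed=0):
    """Gradient descent on (half) squared error, one example at a time."""
    rng = random.Random(seed)
    n_in = len(examples[0][0])
    # Each row holds a unit's input weights followed by its bias.
    hidden = [[rng.uniform(-1, 1) for _ in range(n_in + 1)] for _ in range(n_hidden)]
    output = [rng.uniform(-1, 1) for _ in range(n_hidden + 1)]

    def forward(x):
        # zip stops at the shorter sequence, so the trailing bias is added separately.
        h = [sigmoid(sum(w * v for w, v in zip(row, x)) + row[-1]) for row in hidden]
        y = sigmoid(sum(w * v for w, v in zip(output, h)) + output[-1])
        return h, y

    def loss():
        return sum((forward(x)[1] - t) ** 2 for x, t in examples) / len(examples)

    initial_loss = loss()
    for _ in range(epochs):
        for x, target in examples:
            h, y = forward(x)
            # Error signal at the output unit.
            delta_out = (y - target) * y * (1.0 - y)
            # Propagate the error signal back to the hidden units.
            delta_hidden = [delta_out * output[j] * h[j] * (1.0 - h[j])
                            for j in range(n_hidden)]
            # Move every weight and bias against its gradient.
            for j in range(n_hidden):
                output[j] -= rate * delta_out * h[j]
            output[-1] -= rate * delta_out
            for j in range(n_hidden):
                for i in range(n_in):
                    hidden[j][i] -= rate * delta_hidden[j] * x[i]
                hidden[j][-1] -= rate * delta_hidden[j]
    return forward, initial_loss, loss()

# Toy training data: inputs and labels for logical OR.
or_examples = [([0.0, 0.0], 0.0), ([0.0, 1.0], 1.0),
               ([1.0, 0.0], 1.0), ([1.0, 1.0], 1.0)]
predict, initial_loss, final_loss = train(or_examples)
```

`predict(x)` returns the hidden activations and the output; comparing `initial_loss` with `final_loss` shows how far the parameter search has reduced the error on the training examples.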


Back to PSIPRED …


Step 2

  • Feed the sequence profile to the input layer of an ANN

  • Not the whole profile, only a window of 15 consecutive positions

  • For each position, there are 20 numbers in the profile (one for each amino acid)

  • Therefore, ~ 15 x 20 = 300 numbers are fed to the network

  • Therefore, ~ 300 “input units” in the ANN

  • 3 output units, for “strand”, “helix”, “coil”

    • each output is the confidence in that secondary structure for the central position of the window of 15
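The windowing can be sketched as follows. Zero-padding for windows that run past either end of the sequence is an assumption of this sketch (the slides do not say how the ends are handled), and the toy profile is made up.

```python
WINDOW = 15
N_RESIDUES = 20

def window_inputs(profile, center, window=WINDOW):
    """Flatten the profile rows around `center` into one input vector.

    Positions that fall off either end of the sequence are zero-padded;
    that padding scheme is an assumption, not taken from the slides.
    """
    half = window // 2
    vector = []
    for pos in range(center - half, center + half + 1):
        if 0 <= pos < len(profile):
            vector.extend(profile[pos])
        else:
            vector.extend([0.0] * N_RESIDUES)
    return vector

# Hypothetical scaled profile for a 30-residue protein (every value 0.5).
toy_profile = [[0.5] * N_RESIDUES for _ in range(30)]

# One 300-number input vector per sequence position.
inputs = [window_inputs(toy_profile, i) for i in range(len(toy_profile))]
```

The network is applied once per position, sliding the window along the sequence, so a protein of M residues yields M input vectors of 300 numbers each.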


Figure: the first network. A window of 15 profile positions feeds the input layer, which connects through a hidden layer to three output units (helix, strand, coil); the example output values shown are 0.18, 0.09 and 0.67.


Step 3

  • Feed the output of 1st ANN to the 2nd ANN

  • Each window of 15 positions gave 3 numbers from the 1st ANN

  • Take 15 successive windows’ outputs and feed them to 2nd ANN

  • Therefore, ~ 15 x 3 = 45 input units in the 2nd ANN

  • 3 output units, for “strand”, “helix”, “coil”
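A sketch of how the inputs to the second network could be assembled, and how a final label could be read off its three outputs. The zero padding at the sequence ends, the order of the labels, and the rule of picking the highest-scoring output are assumptions of this sketch.

```python
LABELS = ("helix", "strand", "coil")  # assumed ordering of the 3 output units

def second_stage_inputs(first_outputs, center, window=15):
    """Concatenate the 3 first-network outputs for `window` successive
    positions around `center` (15 x 3 = 45 numbers)."""
    half = window // 2
    vector = []
    for pos in range(center - half, center + half + 1):
        if 0 <= pos < len(first_outputs):
            vector.extend(first_outputs[pos])
        else:
            vector.extend([0.0, 0.0, 0.0])  # assumed zero padding at the ends
    return vector

def call_state(scores):
    """Pick the label whose output unit reports the highest confidence."""
    best = max(range(len(LABELS)), key=lambda k: scores[k])
    return LABELS[best]

# Hypothetical first-network outputs for a 20-residue protein.
toy_outputs = [[0.2, 0.3, 0.5] for _ in range(20)]
stage2_vector = second_stage_inputs(toy_outputs, center=10)
label = call_state((0.1, 0.7, 0.2))
```

Because the second network sees the first network's predictions for neighbouring positions, it can smooth out isolated, implausible calls.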


Test of performance


Cross-validation

  • Partition the training data into “training set” (two thirds of the examples) and “test set” (remaining one third)

  • Train PSIPRED on the training set, then make predictions on the test set and compare them with the known answers.

  • What is an answer?

    • For each position of the sequence, a prediction of which secondary structure that position is part of

    • That is, a sequence over “H/S/C” (helix/strand/coil)

  • How to compare an answer with the known answer?

    • Count the positions at which the two sequences match
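The split and the comparison can be sketched as below. The function names, the random shuffling, and the example strings are illustrative assumptions; reporting the match count as a fraction of all positions is also a choice made here.

```python
import random

def split_examples(examples, test_fraction=1 / 3, seed=0):
    """Shuffle, then hold out roughly one third of the examples for testing."""
    shuffled = list(examples)
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]  # (training set, test set)

def per_position_accuracy(predicted, known):
    """Fraction of positions where the predicted state equals the known state."""
    if len(predicted) != len(known):
        raise ValueError("sequences must have the same length")
    matches = sum(p == k for p, k in zip(predicted, known))
    return matches / len(known)

# Hypothetical prediction and known answer over H/S/C for a 10-residue protein.
accuracy = per_position_accuracy("HHHHCCSSSC", "HHHCCCSSSS")

# Hypothetical set of nine proteins, identified here only by number.
train_set, test_set = split_examples(range(9))
```

In this example 8 of the 10 positions agree, so the accuracy is 0.8.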

