Protein quaternary fold recognition using conditional graphical models

Protein Quaternary Fold Recognition Using Conditional Graphical Models. Yan Liu, Jaime Carbonell, Vanathi Gopalakrishnan (U Pitt), Peter Weigele (MIT). Language Technologies Institute, School of Computer Science, Carnegie Mellon University. IJCAI-2007, Hyderabad, India.

Presentation Transcript

Protein Quaternary Fold Recognition Using Conditional Graphical Models

Yan Liu, Jaime Carbonell

Vanathi Gopalakrishnan (U Pitt), Peter Weigele (MIT)

Language Technologies Institute

School of Computer Science

Carnegie Mellon University

IJCAI-2007 – Hyderabad, India


Snapshot of Cell Biology

[Figure: protein sequence DSCTFTTAAAAKAGKAKAG, protein structure and protein function. Image source: Nobelprize.org]


Example Protein Structures

[Figure: triple beta-spiral fold in the adenovirus fiber shaft; panels labeled "Adenovirus Fiber Shaft" and "Virus Capsid"]


Predicting Protein Structures

  • Protein Structure is a key determinant of protein function

  • Crystallography, used to resolve protein structures experimentally in vitro, is very expensive; NMR can only resolve very small proteins

  • The gap between the known protein sequences and structures:

    • 3,023,461 sequences vs. 36,247 resolved structures (1.2%)

  • Therefore, we need to predict structures in silico


Quaternary Folds and Alignments

Seq 1: APA FSVSPA … SGACGPECAESG

Seq 2: DSCTFT…TAAAAKAGKAKCSTITL

  • Protein fold

    • Identifiable regular arrangement of secondary structural elements

      • Thus far, a limited number of protein folds have been discovered (~1000)

    • Very little research on quaternary folds

      • Complex structures and little labeled data

  • Quaternary fold recognition


Previous Work

  • Sequence similarity perspective

    • Sequence similarity searches, e.g. PSI-BLAST [Altschul et al, 1997]

    • Profile HMMs, e.g. HMMER [Durbin et al, 1998] and SAM [Karplus et al, 1998]

    • Window-based methods, e.g. PSIPRED [Jones, 2001]

  • Physical forces perspective

    • Homology modeling or threading, e.g. Threader [Jones, 1998]

  • Structural biology perspective

    • Painstakingly hand-engineered methods for specific structures, e.g. αα- and ββ-hairpins, β-turn and β-helix [Efimov, 1991; Wilmot and Thornton, 1990; Bradley et al, 2001]

Fail to capture structural properties and long-range dependencies

Generative models based on rough approximations of free energy perform very poorly on complex structures

Very hard to generalize due to built-in constants and fixed features


Conditional Random Fields

  • Hidden Markov model (HMM) [Rabiner, 1989]

  • Conditional random fields (CRFs) [Lafferty et al, 2001]

    • Model conditional probability directly (discriminative models, directly optimizable)

    • Allow arbitrary dependencies in observation

    • Adaptive to different loss functions and regularizers

    • Promising results in multiple applications

    • But, need to scale up (computationally) and extend to long-distance dependencies
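
As a rough illustration of what "model the conditional probability directly" means for a linear-chain CRF, the sketch below computes P(y | x) as an unnormalized sequence score minus the log partition function, with the latter obtained by the forward algorithm. The score tables `NODE` and `TRANS` are made-up toy numbers and all function names are hypothetical; this is not the authors' model, only the standard linear-chain case.

```python
import itertools
import math

def log_sum_exp(values):
    """Numerically stable log(sum(exp(v)))."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))

def sequence_score(labels, node_scores, trans_scores):
    """Unnormalized log-score of one label sequence."""
    score = sum(node_scores[t][y] for t, y in enumerate(labels))
    score += sum(trans_scores[a][b] for a, b in zip(labels, labels[1:]))
    return score

def log_partition(node_scores, trans_scores):
    """log Z(x), summed over all label sequences by the forward algorithm."""
    n_states = len(trans_scores)
    alpha = list(node_scores[0])
    for t in range(1, len(node_scores)):
        alpha = [
            node_scores[t][b]
            + log_sum_exp([alpha[a] + trans_scores[a][b] for a in range(n_states)])
            for b in range(n_states)
        ]
    return log_sum_exp(alpha)

def conditional_log_prob(labels, node_scores, trans_scores):
    """log P(y | x) = score(y, x) - log Z(x)."""
    return sequence_score(labels, node_scores, trans_scores) - log_partition(
        node_scores, trans_scores
    )

# Toy scores (assumed): 3 positions, 2 states.
NODE = [[1.0, 0.2], [0.3, 0.9], [0.5, 0.5]]
TRANS = [[0.4, -0.1], [-0.3, 0.6]]

# Brute-force check: the probabilities of all label sequences should sum to 1.
TOTAL = sum(
    math.exp(conditional_log_prob(list(y), NODE, TRANS))
    for y in itertools.product(range(2), repeat=len(NODE))
)
```

The forward recursion costs O(n·k²) for n positions and k states, which is the kind of exact inference that stops being available once long-range edges are added to the graph.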


Our Solution: Conditional Graphical Models

Local dependency

Long-range dependency

  • Outputs Y = {M, {W_i}}, where W_i = {p_i, q_i, s_i}

  • Feature definition

    • Node feature

    • Local interaction feature

    • Long-range interaction feature
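
One way to picture the segmentation output Y = {M, {W_i}} is as a list of M segments. The slide does not spell out p_i, q_i and s_i; the sketch below assumes they are the start position, end position and state label of segment i. The class name, the validity check and the example segmentation are all hypothetical.

```python
from dataclasses import dataclass
from typing import List

@dataclass(frozen=True)
class Segment:
    start: int  # p_i (assumed: first residue index, inclusive)
    end: int    # q_i (assumed: last residue index, inclusive)
    state: str  # s_i (assumed: structural state label)

def is_valid_segmentation(segments: List[Segment], seq_len: int) -> bool:
    """True if the segments tile positions 0..seq_len-1 with no gaps or overlaps."""
    expected_start = 0
    for seg in segments:
        if seg.start != expected_start or seg.end < seg.start:
            return False
        expected_start = seg.end + 1
    return expected_start == seq_len

# Example sequence taken from the slides; the segmentation itself is invented.
SEQ = "DSCTFTTAAAAKAGKAKAG"
EXAMPLE = [
    Segment(0, 5, "B1"),
    Segment(6, 10, "T1"),
    Segment(11, len(SEQ) - 1, "B2"),
]
M = len(EXAMPLE)  # number of segments
```

Because M is part of the output, two candidate labelings of the same sequence can have different numbers of nodes, which is what later makes inference over variable graph topologies necessary.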


Linked Segmentation CRF

Joint Labels

  • Nodes: secondary structure elements and/or simple folds

  • Edges: Local interactions and long-range inter-chain and intra-chain interactions

  • L-SCRF: conditional probability of y given x is defined as
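
The equation on this slide did not survive in the transcript. As a reading aid only, a generic log-linear form for an undirected model with node set V and edge set E (local and long-range edges together) would look like the following; the symbols f_k, g_l and the split into node and edge terms are assumptions, and the authors' exact notation may differ.

```latex
P(y \mid x) = \frac{1}{Z(x)}
  \exp\Big( \sum_{v \in V} \sum_{k} \lambda_k \, f_k(x, y_v)
          + \sum_{(u,v) \in E} \sum_{l} \lambda_l \, g_l(x, y_u, y_v) \Big),
\qquad
Z(x) = \sum_{y'} \exp\Big( \sum_{v \in V'} \sum_{k} \lambda_k \, f_k(x, y'_v)
          + \sum_{(u,v) \in E'} \sum_{l} \lambda_l \, g_l(x, y'_u, y'_v) \Big)
```

Here V' and E' denote the nodes and edges induced by the candidate labeling y', since the graph itself depends on the segmentation.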


Linked Segmentation CRF (II)

  • Classification:

  • Training: learn the model parameters λ

    • Minimizing regularized negative log loss

    • Iterative search algorithms that seek the direction in which empirical feature values agree with their expectations

  • Complex graphs result in huge computational complexity
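
The training bullets can be made concrete on a toy log-linear model small enough to enumerate. In the sketch below the gradient of the regularized negative log loss is the expected feature value under the model minus the empirical value, plus the regularizer term, so the unregularized optimum is exactly where expectations match empirical values. The feature table, the L2 regularizer and plain gradient descent are assumptions chosen for illustration, not the authors' training procedure.

```python
import math

def log_sum_exp(values):
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))

def label_scores(weights, feats_by_label):
    """Linear score of every candidate label."""
    return [sum(w * f for w, f in zip(weights, feats)) for feats in feats_by_label]

def neg_log_loss(weights, feats_by_label, observed, l2):
    """L2-regularized negative log-likelihood of the observed label."""
    scores = label_scores(weights, feats_by_label)
    penalty = 0.5 * l2 * sum(w * w for w in weights)
    return log_sum_exp(scores) - scores[observed] + penalty

def gradient(weights, feats_by_label, observed, l2):
    """Expected feature values minus empirical values, plus the L2 term."""
    scores = label_scores(weights, feats_by_label)
    log_z = log_sum_exp(scores)
    probs = [math.exp(s - log_z) for s in scores]
    grad = []
    for k in range(len(weights)):
        expected = sum(p * feats[k] for p, feats in zip(probs, feats_by_label))
        empirical = feats_by_label[observed][k]
        grad.append(expected - empirical + l2 * weights[k])
    return grad

def gradient_descent(feats_by_label, observed, l2=0.1, step=0.5, iters=200):
    """Plain fixed-step gradient descent from the zero vector."""
    weights = [0.0] * len(feats_by_label[0])
    for _ in range(iters):
        grad = gradient(weights, feats_by_label, observed, l2)
        weights = [w - step * g for w, g in zip(weights, grad)]
    return weights

# Toy data (assumed): three candidate labels, two binary features each.
FEATS = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
OBSERVED = 2
```

In the full model the expectation cannot be computed by enumeration, which is why the cost of inference dominates training on complex graphs.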


Approximate Inference of L-SCRF

  • Most approximation algorithms cannot handle variable number of nodes in the graph, but we need variable graph topologies, so…

  • Reversible jump MCMC sampling [Green, 1995; Schmidler et al, 2001] with four types of Metropolis operators

    • State switching

    • Position switching

    • Segment split

    • Segment merge

  • Simulated annealing reversible jump MCMC [Andrieu et al, 2000]

    • Replace the sampler with RJ MCMC

    • Theoretically converges to the global optimum
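
The four operators and the annealing loop can be sketched on a single chain as follows. This is a deliberately simplified stand-in: it uses a plain Metropolis-style acceptance rule with a geometric cooling schedule and omits the proposal-ratio terms that a proper reversible-jump sampler needs when split and merge change the number of segments. The state names, the toy scoring function and every parameter value are assumptions.

```python
import math
import random

STATES = ["B1", "T1", "B2", "T2"]

def propose(segments, rng):
    """Apply one random move to a list of (start, end, state) segments.

    Returns a new list, or None if the chosen move is not applicable.
    """
    segs = list(segments)
    move = rng.choice(["state", "position", "split", "merge"])
    i = rng.randrange(len(segs))
    start, end, state = segs[i]
    if move == "state":  # state switching
        segs[i] = (start, end, rng.choice([s for s in STATES if s != state]))
    elif move == "position":  # shift the boundary between segments i and i+1
        if i + 1 >= len(segs):
            return None
        _, next_end, next_state = segs[i + 1]
        new_end = end + rng.choice([-1, 1])
        if new_end < start or new_end >= next_end:
            return None
        segs[i] = (start, new_end, state)
        segs[i + 1] = (new_end + 1, next_end, next_state)
    elif move == "split":  # segment split
        if end == start:
            return None
        cut = rng.randrange(start, end)  # last index of the left piece
        segs[i:i + 1] = [(start, cut, state), (cut + 1, end, rng.choice(STATES))]
    else:  # segment merge, keeping the left segment's state
        if i + 1 >= len(segs):
            return None
        segs[i:i + 2] = [(start, segs[i + 1][1], state)]
    return segs

def anneal(segments, score_fn, steps=2000, t_start=2.0, t_end=0.05, seed=0):
    """Annealed search: return the best-scoring segmentation visited."""
    rng = random.Random(seed)
    current, current_score = list(segments), score_fn(segments)
    best, best_score = current, current_score
    for step in range(steps):
        temp = t_start * (t_end / t_start) ** (step / max(1, steps - 1))
        candidate = propose(current, rng)
        if candidate is None:
            continue
        cand_score = score_fn(candidate)
        accept = cand_score >= current_score or rng.random() < math.exp(
            (cand_score - current_score) / temp
        )
        if accept:
            current, current_score = candidate, cand_score
            if current_score > best_score:
                best, best_score = current, current_score
    return best, best_score

# Toy objective (assumed): reward per-position agreement with a fixed target
# labeling and charge a small cost per segment.
TARGET = ["B1"] * 4 + ["T1"] * 3 + ["B2"] * 5

def toy_score(segments):
    matches = sum(
        1 for s, e, st in segments for pos in range(s, e + 1) if TARGET[pos] == st
    )
    return matches - 0.5 * len(segments)

INITIAL = [(0, len(TARGET) - 1, "T2")]
```

Calling `anneal(INITIAL, toy_score)` returns a segmentation that still tiles the whole sequence, because every operator preserves contiguity; only split and merge change the number of segments, which is the dimension change that motivates reversible-jump sampling in the first place.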


Experiments: Target Quaternary Fold

  • Triple beta-spirals [van Raaij et al. Nature 1999]

    • Virus fibers in adenovirus, reovirus and PRD1

  • Double barrel trimer [Benson et al, 2004]

    • Coat protein of adenovirus, PRD1, STIV, PBCV



Tertiary Fold Recognition: β-Helix Fold

  • Histogram and ranks for known β-helices against PDB-minus dataset


The chain graph model reduces the real running time of the SCRF model by a factor of about 50


Fold Alignment Prediction: β-Helix

  • Predicted alignment for known β-helices on cross-family validation


Discovery of New Potential β-Helices

  • Run the structural predictor to seek potential β-helices in the UniProt (structurally unresolved) databases

    • Full list (98 new predictions) can be accessed at www.cs.cmu.edu/~yanliu/SCRF.html

  • Verification on 3 proteins with later experimentally resolved structures from different organisms

    • 1YP2: Potato Tuber ADP-Glucose Pyrophosphorylase

    • 1PXZ: The Major Allergen From Cedar Pollen

    • GP14 of Shigella bacteriophage as a β-helix protein

    • Not a single false positive!


Experiment Results: Fold Recognition

Triple beta-spirals

Double barrel-trimer


Experiment Results: Alignment Prediction

Triple beta-spirals

Four states: B1, B2, T1 and T2

Correct alignment: B1: i-o; B2: a-h

[Figure: predicted alignment for B1 and B2]


Experiment Results: Discovery of New Membership Proteins

  • Predicted membership proteins of triple beta-spirals can be accessed at

    http://www.cs.cmu.edu/~yanliu/swissprot_list.xls

  • Membership proteins of double barrel-trimer suggested by biologists [Benson, 2005] compared with L-SCRF predictions


Conclusion

  • Conditional graphical models for protein structure prediction

    • Effective representation for protein structural properties

    • Ability to incorporate different kinds of informative features

    • Efficient inference algorithms for large-scale applications

  • A major extension compared with previous work

    • Knowledge representation through graphical models

    • Ability to handle long-range interactions within one chain and between chains

  • Future work

    • Automatic learning of graph topology

    • Applications to other domains


Graphical Models

  • A graphical model is a graph representation of probabilistic dependencies [Pearl 1993; Jordan 1999]

    • Nodes: random variables

    • Edges: dependency relations

  • Directed graphical model (Bayesian networks)

  • Undirected graphical model (Markov random fields)
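
For reference, the two families factorize the joint distribution differently. The notation below (pa for parents, ψ_C for clique potentials) is the standard textbook form rather than anything taken from the slides.

```latex
% Directed graphical model (Bayesian network):
% a product of local conditional distributions, one per node.
P(x_1, \dots, x_n) = \prod_{i=1}^{n} P\big(x_i \mid \mathrm{pa}(x_i)\big)

% Undirected graphical model (Markov random field):
% a normalized product of non-negative potentials over cliques C.
P(x_1, \dots, x_n) = \frac{1}{Z} \prod_{C \in \mathcal{C}} \psi_C(x_C),
\qquad
Z = \sum_{x} \prod_{C \in \mathcal{C}} \psi_C(x_C)
```

A conditional random field is the undirected case with every potential conditioned on the observed sequence, so the normalizer becomes a function of the observation.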

