Dynameomics

Background

HT MD – Target Selection – Database

Mining Native DB

Reference unfolded peptide DB

Mining Unfolding Protein DB

Prion Protein and amyloid DB

Dynameomics

Valerie Daggett

Bioengineering Department

Biomedical and Health Informatics

University of Washington

Seattle, WA


Central dogma of biology

DNA → transcription → RNA → translation → Protein

Genomes

…AAAGTCCAGGCAGAATATAATTCTATAAAG
GGAACTCCTTCAGAGGCTGAAATCTTT…

DNA: information to make protein

RNA: template to make protein

Protein: …LEVVAATPTSLLISWDAPAVTVRYYTYGETGGNSPVQEFTVPGS…

Protein: function, phenotype

Life


Motion critical


Dynamic cleft discovered through MD

Cytochrome b5

Storch et al., Biochem, 1995, 1999a,b, 2000


Protein folding embedded

DNA → transcription → RNA → translation → Protein

Genomes

Protein un/folding problem

D, denatured (biologically inactive) → ? (process or pathway) → N, native (biologically active)

Life


Unfolding pathway of CI2 in water

[Simulation contains 500,000 structures]

373 K

N (94 ns)

TS (21 ns)

D (30 ns)

D (94 ns)

  • MD unfolding process in good agreement with experiment

  • TS in quantitative agreement with experiment (prediction)

  • Residual structure in D verified experimentally

  • Atomic-level characterization of transition, intermediate and denatured state ensembles

Daggett and Fersht, TIBS, PNAS, +


Conformational ensembles in folding

N

TS

D

Day and Daggett, PNAS, 2005

100 simulations


Refolding by quenching TS

[Plot: Cα RMSD (Å), 0 to 8, versus time (ns), 0 to 3; traces labeled ‘D’, TS, and Control, N]

Brute force MD can refold proteins from the TS

Plan: predict TS structures, perform MD simulations and solve protein folding problem

But we need info to predict TS (TS easier than D)

DeJong et al., JMB 2002
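
The Cα RMSD on the plot axis can be illustrated with a minimal sketch. It assumes the two structures are already superimposed and list the same Cα atoms in the same order; the slides do not say how superposition was done. The function name ca_rmsd and the toy coordinates are illustrative, not taken from the presentation.

```python
import math

def ca_rmsd(coords_a, coords_b):
    """Root-mean-square deviation (Å) between two equal-length lists of
    (x, y, z) Cα coordinates. Assumes the structures are already superimposed."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    total = 0.0
    for a, b in zip(coords_a, coords_b):
        # Squared distance between matching atoms
        total += sum((p - q) ** 2 for p, q in zip(a, b))
    return math.sqrt(total / len(coords_a))

# Toy example: three Cα atoms; the second structure is shifted 1 Å along x.
native = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]
shifted = [(x + 1.0, y, z) for x, y, z in native]
print(ca_rmsd(native, shifted))
```

Applied frame by frame against the native structure, a function like this would give an RMSD-versus-time trace of the kind shown on the slide.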


  • Reversible folding and unfolding

  • 348 K in water, the Tm of the protein

  • And, refolding = unfolding: detailed pathway reversed

  • A16/I20 orientation maintained

[Figure: snapshots at 5 ns, 25.6 ns and 200 ns with residues A16, I20, L49 and I57 labeled; distances 4.8 Å, 4.0 Å and 8.9 Å]

  • Day and Daggett, JMB, 2007

  • McCully et al., Biochem in press (EnHD)


Reverse central dogma of biology

Determine pathways for many proteins, ascertain general features

D, denatured (biologically inactive) → ? (process or pathway) → N, native (biologically active)

DNA, RNA, Protein

Decode genomes


Proteins

  • Proteins are life’s machines, tools and structures

    • Many jobs, many shapes, many sizes


Dynameomics

Goals:

  1. Perform HT MD simulations of representatives of all folds (41,000 structures in PDB → 1130 fold families)

  2. Construct a novel relational/multidimensional database to house these data and facilitate discovery

    • Native state – information relevant to disease and drug design targets, SNPs

    • Unfolding – disease and solution to protein folding problem

[Figure labels: NERSC, DOE, Unix, The Wall, Windows, Athena @ MS]

Beck et al., Prot Eng Des Sel, 2008


Fold space

[Plots: population (0 to 700) and coverage (0 to 1.0) versus fold rank]

30 folds represent ~50% of known protein structures

  • Divide protein structures into folds

    • Consensus of SCOP, CATH and Dali

  • Rank folds based on population

  • Choose a representative protein from each fold

Day et al., Prot. Sci., 2003
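
The ranking and coverage steps above can be sketched as follows. The populations are hypothetical numbers chosen for illustration, not the SCOP/CATH/Dali consensus data, and the function names are not from the presentation.

```python
from itertools import accumulate

def coverage_by_rank(populations):
    """Cumulative fraction of structures covered by the top-k folds,
    for k = 1..len(populations), after ranking folds by population."""
    ranked = sorted(populations, reverse=True)
    total = sum(ranked)
    return [running / total for running in accumulate(ranked)]

def folds_needed(populations, target):
    """Smallest number of top-ranked folds whose coverage reaches target."""
    for rank, cov in enumerate(coverage_by_rank(populations), start=1):
        if cov >= target:
            return rank
    return len(populations)

# Hypothetical populations (structures per fold), for illustration only.
toy = [5, 40, 10, 25, 20]
print(coverage_by_rank(toy))
print(folds_needed(toy, 0.5))
```

With the real fold populations, folds_needed(populations, 0.5) would be the calculation behind a statement such as "30 folds represent ~50% of known protein structures".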


Target selection

  • Selection criteria

    • Structure quality

    • Protein size

    • Experimental data available

    • Biomedical relevance

    • 1st globular then membrane

CheY [PDB:3chy]

Example: Rank 2, population 424

Amanda Jonsson


Targets with biomedical relevance

  • Amyloid-β precursor protein: Alzheimer’s disease

  • HIV-1 protease: HIV

  • Glutathione S-transferase: chemotherapy resistance

  • Triosephosphate isomerase: neurodegeneration

  • MAP30: HIV and cancer

  • Serum amyloid P component: amyloidosis


Top 30 folds

Represent ~50% of all known protein structures

Data and metadata for ‘Top 30’ at www.dynameomics.org


Dynameomics protocol

  • One 298 K native state simulation (21-60 ns, <26 ns>)

  • At least three 310 K native simulations (some targets)

  • At least five 498 K unfolding simulations

    • Two long simulations (at least 31 ns, <36 ns>)

    • At least three short simulations (2 ns, <14 ns>)

    • (5 simulations ~ 100 simulations)

Trade-off: sampling of different folds and different sequences, as opposed to more thorough sampling of an individual protein (~400 simulations of PrP)


Validation of trajectories

  • Computational checks: energy conservation

  • Native state: NOEs, S² order parameters from NMR relaxation experiments, etc.

  • Unfolding process: Φ values, residual structure in denatured state, intermediates

David Beck


Native State Simulations: Ubiquitin

  • NOEs (2727)

  • MD: 95.2 %

  • XTAL: 94.4%

  • Proton Chemical shifts: R=0.98
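
A percentage of satisfied NOEs like the one quoted above could be computed as in the sketch below. The slides do not state the criterion used. This example assumes a common convention: each interproton distance is r^-6 averaged over the trajectory and counted as satisfied if the average is within the experimental upper bound. The restraint names, distances and bounds are made up.

```python
def r6_average(distances):
    """<r^-6>^(-1/6) average of a distance time series (Å)."""
    mean_inv6 = sum(r ** -6 for r in distances) / len(distances)
    return mean_inv6 ** (-1.0 / 6.0)

def percent_noes_satisfied(trajectory_distances, upper_bounds):
    """Percentage of restraints whose r^-6-averaged distance is within the
    upper bound. trajectory_distances maps a restraint id to its per-frame
    distances (Å); upper_bounds maps the same id to a bound (Å)."""
    satisfied = sum(
        1 for key, series in trajectory_distances.items()
        if r6_average(series) <= upper_bounds[key]
    )
    return 100.0 * satisfied / len(trajectory_distances)

# Hypothetical restraints and per-frame distances, for illustration only.
distances = {
    "HA_5-HN_6": [2.4, 2.6, 2.5],
    "HB_12-HD_40": [4.0, 6.5, 7.0],  # one short contact dominates the average
    "HN_20-HN_21": [5.8, 6.1, 6.0],
}
bounds = {"HA_5-HN_6": 3.0, "HB_12-HD_40": 5.0, "HN_20-HN_21": 5.0}
print(percent_noes_satisfied(distances, bounds))
```

The r^-6 weighting means a restraint can be satisfied by a trajectory that only occasionally samples a short distance. This is one reason an MD ensemble can satisfy slightly more NOEs than a single static structure.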


Comparison with available NMR

  • The 27 proteins with available data (by PDB code) are: 1aa3, 1c06, 1d1r, 1gle, 1kjs, 2ife, 3gcc, 1bf0, 1cmz, 1cok, 1cz4, 1d1n, 1d8v, 1enh, 1fad, 1fvl, 1fzt, 1ght, 1i11, 1iyu, 1ldl, 1mut, 1sso, 1tfb, 1ubq, 1uxc, 3chy.

  • Proton chemical shifts from MD structures were calculated with SHIFTS (Osapay and Case, 1991). The 15 proteins with data available (by PDB code): 1mjc, 1hcc, 1ubq, 1baz, 1cz4, 1a2p, 1e65, 1ill, 3chy, 1ght, 1cmz, 1gpr, 1byl, 1fzt, 1b10.


Dynameomics status

  • Dataset includes over 500 proteins and nearly 4000 simulations for a total of >60 μs of simulation time, >65M structures

  • > 64 TB

Not including 637 amyloid simulations


Comprehensive data/metadata

In theory,

build a warehouse

Andrew Simms


Build a data warehouse (not so easy)

  • The data set is large… (~6 months to load protein coordinates)

    • Storing protein data only, no solvent data

    • Only single simulations per table (10M – 90M rows)

    • 4000 simulations x 10 analyses right now (40K tables)

    • And we are growing at a rate of ~2000 simulations per year (10K tables)

  • Approach for scaling...

    • Multiple servers

    • Multiple databases per server

    • 100 targets per database

  • Andrew Simms

  • Simms et al., Prot Eng Des Sel, 2008
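
The scaling approach above (multiple servers, multiple databases per server, 100 targets per database, one table per simulation per analysis) can be sketched as a simple lookup. Only the figure of 100 targets per database comes from the slide. The number of databases per server, the contiguous block assignment and the table naming scheme are assumptions made for illustration.

```python
TARGETS_PER_DATABASE = 100  # from the slide
DATABASES_PER_SERVER = 4    # assumption for illustration; not stated in the slides

def locate_target(target_id):
    """Map a 0-based target id to (server index, database index).
    Assumes targets are assigned to databases in contiguous blocks."""
    if target_id < 0:
        raise ValueError("target_id must be non-negative")
    database = target_id // TARGETS_PER_DATABASE
    server = database // DATABASES_PER_SERVER
    return server, database

def table_name(simulation_id, analysis):
    """Hypothetical naming scheme: one table per simulation per analysis."""
    return f"sim{simulation_id:05d}_{analysis}"

print(locate_target(0))
print(locate_target(457))
print(table_name(42, "rmsd"))
print(4000 * 10)  # simulations x analyses = number of tables
```

A fixed arithmetic mapping like this keeps lookups cheap and lets new databases be added as the number of targets grows, without moving existing tables.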


Multi-D cubes for complex data analysis

Though our data set may be large, our requirements are typical in the scientific world

  • Large, complex and often multidimensional data sets

  • Analytical rather than transactional processing

  • Need for performance and storage efficiency

On-line analytical processing – OLAP

MOLAP – multidimensional OLAP

Catherine Kehl
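
The idea behind an analytical cube can be shown with a small roll-up. This is a sketch in plain Python, not the MOLAP engine used for Dynameomics. The dimensions (protein, temperature, time bin), the measure and all values are hypothetical. A real MOLAP system would store pre-aggregated multidimensional arrays rather than scan a list of facts.

```python
from collections import defaultdict

# Each fact: (protein, temperature_K, time_bin_ns, measure), where the measure
# could be a Cα RMSD in Å. All values are made up for illustration.
facts = [
    ("3chy", 298, 0, 1.0), ("3chy", 298, 1, 1.2),
    ("3chy", 498, 0, 2.0), ("3chy", 498, 1, 6.0),
    ("1ubq", 298, 0, 0.8), ("1ubq", 498, 0, 3.0),
]
DIMENSIONS = ("protein", "temperature", "time_bin")

def roll_up(facts, keep):
    """Average the measure over every dimension not listed in keep."""
    idx = [DIMENSIONS.index(name) for name in keep]
    sums = defaultdict(float)
    counts = defaultdict(int)
    for *dims, value in facts:
        key = tuple(dims[i] for i in idx)
        sums[key] += value
        counts[key] += 1
    return {key: sums[key] / counts[key] for key in sums}

print(roll_up(facts, ["temperature"]))
print(roll_up(facts, ["protein", "temperature"]))
```

The same facts answer different questions depending on which dimensions are kept. This is the analytical, rather than transactional, access pattern the slide describes.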


Molecular dynamics

  • MD provides atomic resolution of native dynamics

3chy, waters and hydrogens hidden


native state simulation of 3chy at 298 K, Asp 57


Native-state dynamics: helix motion

CheY at 298 K

[Figure: standard deviation of helix angle (degrees) for helix pairs α2:α3 and α3:α4; snapshots at 0, 5, 10, 15 and 20 ns with helices α2 to α5 labeled]

α2 and α3 dynamic, α4 and α5 stable structural scaffold


CheY – Binding partners

Structures of CheY complexes show binding to α4 and α5

[Figure: α4:α5 distances between ends of helices; CheY at 20 ns with helices α2 to α5 labeled; complexes CheY-CheZ, CheY-CheA and CheY-FliM]

  • Functionally important face of protein stable

  • Asp 57, phosphorylation

  • Motion in α2 and α3 does not disrupt function, entropy sink?

Rudesh Toofanny


Catechol O-methyltransferase

CheY

COMT

  • Both proteins: Rank 2 Rossmann fold

  • COMT polymorphism: Val108 → Met

  • 108M - increased risk for diseases such as breast cancer and OCD

  • Improved memory

  • MD

    • α6 and α7 mobile in COMT, too

    • In 108M movement of α6 propagated 16 Å and disrupts the active site

[Figure: 108M snapshots at 15 ns and 30 ns]

Rutherford et al., Biochem. 2006

Importance of characterizing dynamics


SNP-induced changes in COMT

  • Native-like

  • Intermediate

[Figure: 108V and 108M structures with helices α6, α7 and α8 labeled]

Mutation to Met leads to loosening of the active site

Followed up with CD, NMR, crystallography, fluorescence

Rutherford et al., BBA, JMB, JMB, Biochem, 2008


SNP leads to broader conformational ensemble at 310 K

[Figure: Cα-RMSD distributions (Å) for 108V COMT and 108M COMT, for the starting structure and at 25 °C, 37 °C and 50 °C]


SNP-omics

COMT – SNP leads to subtle differences in packing near the mutation site that propagate to the active site

Similar behavior now seen in 4 other members of this methyltransferase family (fold rank 2)

Effects NOT apparent in static structures

Large scale effort to investigate dynamic effects of SNPs, starting with 80 proteins: Dynameomics protocol plus multiple 310 K simulations


SLIRP

  • Structural Library of Intrinsic Residue Propensities (SLIRP) to determine structural propensities for design

    • GGXGG peptides in water at 298 K and 498 K and in 8 M urea at 298 K (multiple simulations, 100 ns)

      • Unbiased coil library, main chain and side chain, exhaustive sampling

    • Dynamic protein side chain rotamer library

      • Rotamer populations, improved over static from crystal structures

      • S² axis order parameters, waiting times between rotamers


“Random Coil” Peptides: Ala

[Figure: Φ (°) distributions for GGAGG, Protein-MD and Protein-PDB; population labels 16%, 26%, 4%, 26%, 24%]

Chemical shifts of HN, Hα, Hβ, NH, Cα, Cβ, and C’ for GGAGG are very close to the corresponding experimentally derived values (R = 0.999 over 28 points, 7 atoms × 4 independent simulations).


Chemical shifts for GGXGG: MD and Expt

Predictions calculated with ShiftX v1.0 (Neal et al., 2003, J Biomol NMR). Experimental data taken from Schwarzinger et al., J. Biomol NMR, 2000.
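
The R values quoted for chemical shifts are correlation coefficients between predicted and experimental values. A minimal sketch of that calculation follows, assuming the Pearson coefficient is meant (the slides do not say which). The shift values are made up, not data from the presentation.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Made-up shifts (ppm) for illustration; not values from the slides.
predicted = [8.30, 4.35, 1.40, 52.5, 19.1]
experimental = [8.35, 4.32, 1.39, 52.8, 19.3]
print(round(pearson_r(predicted, experimental), 4))
```

Because proton and carbon shifts span very different ppm ranges, pooling them in one correlation tends to push R toward 1. A value like 0.999 is therefore best read alongside per-nucleus errors.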


“Random Coil” Peptides vs. Protein: Ala

[Figure: Φ (°) distributions for GGAGG, Protein-MD and Protein-PDB]

Ala in protein MD distributions (188 proteins) similar to PDB

Ala in GGXGG different

GGAGG vs experimental helix propensities, R = 0.28

Protein MD vs helix propensities, R = 0.92

Host-guest studies reflecting the host more than the guest


Mining the database

  • SLIRP to determine structural propensities for design

  • Dynamic area conserved in members of protein family. In one case critical for biological function and in another mutation at the region leads to disease

  • Inflexible region across 188 proteins, identified novel structural elements associated with loop structure (antifreeze)

Rudesh Toofanny

Noah Benson


Solving the protein folding problem?

[Diagram: unfolding runs N, TS, D; refolding runs between D, TS and N, with two steps marked “?”]

  • Data mining of the Dynameomics database for information to predict TS structures

  • Bootstrapping to native state prediction by refolding from predicted transition state structures

Dustin Schaeffer


Contact analysis

  • Determined contact probabilities by amino acid and separation between the amino acids from mining of Dynameomics DB

[Figure: contact probabilities for i → i+x contacts, indexed by residue type 1, residue type 2 and residue separation (i→i+1, i→i+2, i→i+3); example: Leu-Leu i → i+3]
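
Contact probabilities of this kind can be tallied as in the sketch below. It is a simplification under stated assumptions: each residue is represented by a single point, and two residues count as in contact when those points are within a cutoff. The slides do not give the contact definition, so the 6.0 Å cutoff, the function name and the toy frames are all hypothetical.

```python
import math
from collections import Counter

CUTOFF = 6.0  # Å; assumed cutoff, the slides do not give the contact definition

def contact_probabilities(sequence, frames, max_sep=3, cutoff=CUTOFF):
    """Fraction of observations in contact, keyed by
    (residue type 1, residue type 2, sequence separation).
    Each frame holds one (x, y, z) point per residue."""
    seen = Counter()
    in_contact = Counter()
    for coords in frames:
        for i in range(len(sequence)):
            for sep in range(1, max_sep + 1):
                j = i + sep
                if j >= len(sequence):
                    break
                key = (sequence[i], sequence[j], sep)
                seen[key] += 1
                if math.dist(coords[i], coords[j]) <= cutoff:
                    in_contact[key] += 1
    return {key: in_contact[key] / seen[key] for key in seen}

# Toy trajectory: the two Leu residues are close in frame 1, far apart in frame 2.
sequence = ["Leu", "Ala", "Gly", "Leu"]
frames = [
    [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (5.0, 3.0, 0.0), (4.0, 3.0, 0.0)],
    [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (11.4, 0.0, 0.0)],
]
probs = contact_probabilities(sequence, frames)
print(probs[("Leu", "Leu", 3)])
```

Accumulated over many proteins and trajectories, a table keyed this way corresponds to the residue type 1 × residue type 2 × separation layout described for the figure.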


Coordinates from contacts

Most probable contacts → DG → Protein structure

A set of distances for a particular sequence can be converted into coordinates by singular value decomposition (SVD) of a distance matrix (distance geometry, DG)
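
The conversion from distances to coordinates can be sketched with the classical metric-matrix form of distance geometry: square the distances, double-center them to obtain a Gram matrix, and take its SVD. This is one standard formulation, assumed here for illustration; the slides do not give the exact procedure. The sketch uses NumPy and a made-up set of points, and it requires a complete, consistent distance matrix.

```python
import numpy as np

def coordinates_from_distances(dist, dims=3):
    """Embed points in `dims` dimensions from a full symmetric distance
    matrix via SVD of the double-centered Gram (metric) matrix."""
    dist = np.asarray(dist, dtype=float)
    n = dist.shape[0]
    centering = np.eye(n) - np.ones((n, n)) / n
    gram = -0.5 * centering @ (dist ** 2) @ centering
    u, s, _ = np.linalg.svd(gram)  # singular values come back in descending order
    return u[:, :dims] * np.sqrt(s[:dims])

# Round trip on made-up coordinates: distances -> coordinates -> distances.
rng = np.random.default_rng(0)
original = rng.normal(size=(6, 3))
dist = np.linalg.norm(original[:, None, :] - original[None, :, :], axis=-1)
rebuilt = coordinates_from_distances(dist)
rebuilt_dist = np.linalg.norm(rebuilt[:, None, :] - rebuilt[None, :, :], axis=-1)
print(np.allclose(dist, rebuilt_dist))
```

The recovered coordinates match the originals only up to rotation, translation and reflection, so the check compares distances, not coordinates. Distances derived from most-probable contacts would generally be incomplete and mutually inconsistent, so a practical pipeline would need bound smoothing or refinement that this sketch omits.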


TS predictions for Fyn SH3

Prediction from mined data via distance geometry (too compact)

RMSD = 3.8 ± 0.37 Å

MD-generated TS ensemble


Solving the folding problem with MD

Sequence → TS structure → N structure

DB info + DG

We have TSs for 80% of known protein structures

MD

We have refolded from TS

High-throughput structure prediction should be possible by refolding from transition states


Dynameomics conclusions

  • Native state simulations to probe protein function, for drug design, SNP-omics

  • Unfolding simulations for structure prediction, protein design/redesign, unfolding diseases

  • SLIRP (Structural Library of Intrinsic Residue Propensities): intrinsic mainchain conformations, dynamic side chain rotamer library, coil library

  • Dynameomics.org

  • Noah Benson