Massively Distributed Computing and An NRPGM Project on Protein Structure and Function

Computational Biology Lab, Physics Dept & Life Science Dept, National Central University

About Protein
  • Function
    • Storage, Transport, Messengers, Regulation… Everything that sustains life
    • Structure: shell, silk, spider-silk, etc.
  • Structure
    • String of amino acids with a 3D structure
    • Homology and Topology
  • Importance
    • Science, Health & Medicine
    • Industry – enzyme, detergent, etc.
  • An example – 3hvt.pdb
Problem: Structure & Function
  • Primary sequence → native state with 3D structure
    • Structure → function
    • Expensive and time-consuming
  • Misfolding means malfunction
    • Mad cow disease (misfolded “prion” protein)
The Folding Problem
  • The complexity of the folding mechanism and pathway is a huge challenge to science and to computational technology
Molecular Dynamics (MD)
  • A molecule’s behavior is determined by
    • Ensemble statistics
    • Newtonian mechanics
  • Experiment in silico
  • All-atom with water
    • Huge number of particles
  • Super-heavy-duty computation
  • Software for macromolecular MD available
Basic Statistics for Protein MD Simulation
  • Atoms in a small protein plus surrounding water (N): 32,000
  • Approximate number of interactions in force calculation (N²/2): 0.5×10⁹
  • Machine instructions per force calculation: 1000
  • Machine time per calculation (CPU: 3 GHz): 160 sec
  • Typical time-step size: 0.5×10⁻¹⁵ sec
  • Total number of steps for 1 ms folding: 0.5×10⁹ steps
  • Total machine time (160 sec × 0.5×10⁹): 10⁶ days
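The estimate above is plain arithmetic, and a short Python sketch reproduces it from the slide's inputs. It assumes the CPU retires one instruction per clock cycle at 3 GHz (an assumption; the slide only says "CPU: 3G"), and it takes the step count directly from the slide.

```python
SECONDS_PER_DAY = 86_400


def md_cost(n_atoms, instr_per_interaction, cpu_instr_per_sec, n_steps):
    """Back-of-envelope cost of all-pairs MD, following the slide's assumptions."""
    interactions = n_atoms ** 2 / 2                      # N^2/2 pair interactions per step
    sec_per_step = interactions * instr_per_interaction / cpu_instr_per_sec
    total_days = sec_per_step * n_steps / SECONDS_PER_DAY
    return interactions, sec_per_step, total_days


pairs, step_s, days = md_cost(32_000, 1_000, 3e9, 0.5e9)
print(f"interactions per step: {pairs:.2e}")   # about 0.5 x 10^9
print(f"seconds per step:      {step_s:.0f}")   # same order as the slide's 160 s
print(f"total machine days:    {days:.1e}")     # on the order of 10^6 days
```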
How to overcome the factor of 1 million
  • A two-pronged approach
    • Faster or more CPUs
      • The bottleneck in protein folding is dictated by the Boltzmann distribution and can be overcome with large statistics (parallel computing NOT needed)
      • Our solution: Massively distributed computing
      • We seek a factor of ~10,000
      • Note: IBM’s solution is the Blue Gene machine with 10⁶ CPUs
    • Shorten computation time
      • Many simulation steps are needed because the fast (vibrational) modes have a short time-scale of ~10 fs
      • But the time-scale of folding motion is slow, ~1 ns
      • Ideal solution: bypass or smooth out the fast modes
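Why the fast modes dictate the step size can be seen with a toy integrator. The sketch below assumes the velocity Verlet scheme (common in MD codes; the slides do not name an integrator) applied to a single harmonic "bond" in reduced units. Total energy stays nearly constant with 20 steps per vibration period, but the run blows up once the step exceeds this scheme's stability limit of T/π.

```python
import math


def velocity_verlet(x, v, force, mass, dt, n_steps):
    """Integrate m x'' = F(x); returns the list of (x, v) including the start."""
    traj = [(x, v)]
    f = force(x)
    for _ in range(n_steps):
        v_half = v + 0.5 * dt * f / mass   # half kick
        x = x + dt * v_half                # drift
        f = force(x)                       # force at the new position
        v = v_half + 0.5 * dt * f / mass   # second half kick
        traj.append((x, v))
    return traj


def max_energy_drift(steps_per_period, n_steps, k=1.0, mass=1.0):
    """Spread of total energy for V = k x^2 / 2, started at x = 1, v = 0 (reduced units)."""
    period = 2.0 * math.pi * math.sqrt(mass / k)
    dt = period / steps_per_period
    traj = velocity_verlet(1.0, 0.0, lambda x: -k * x, mass, dt, n_steps)
    energies = [0.5 * mass * v * v + 0.5 * k * x * x for x, v in traj]
    return max(energies) - min(energies)


print(max_energy_drift(20, 200))   # dt = T/20: small energy fluctuation
print(max_energy_drift(2, 50))     # dt = T/2 > T/pi: energy explodes
```

With fast modes of ~10 fs, this is why steps of ~1 fs are used, and why removing the fastest vibrations is what would allow larger steps.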
Protein Studies by Massively Distributed Computing: A Project in the National Research Program on Genomic Medicine
  • Scientific
    • Protein folding, structure, function, protein-molecule interaction
    • Algorithm, force-field
  • Computing
    • Massive distributive computing
  • Education
    • Everyone and anyone with a personal PC can take part
  • Industry – collaborative development
Distributive Computing
  • Concept
    • Computation through the internet
    • Utilize idle PC power (through a screen saver)
  • Advantage
    • Cheap way to acquire huge computation power
    • Perfectly suited to the task
      • Huge number of runs needed to gather statistics
      • Parallel computation NOT needed
  • Massive data: good data management necessary
  • Public education: anyone with a PC can take part
Hardware Strategies
  • Parallel computation (not our approach)
    • PC cluster
    • IBM Blue Gene, 10⁶ CPUs
  • Massive distributive computing
    • Grid computing (formal; in the future)
    • Server to individual clients (available now; inexpensive)
      • Examples: SETI@home, Folding@home, Genome@home
      • Our project: protein@CBL
Software Components
  • Dynamics of macromolecules
    • Molecular dynamics, all-atom or mean-field solvent
    • Computer codes
      • GROMACS (for distributive computing; freeware)
      • AMBER and others (for in-house computing; licensed)
  • Distributed Computing
    • COSM - a stable, reliable, and secure system for large scale distributed processing (freeware)
Structure of COSM (network dist’n)


  • System tests (test all COSM functions)
  • Connect to server
  • Send Request (Packet Request)
  • Recv Assignment (Packet Assignment)
  • Running Simulation
  • Put Result (Packet Result)
  • Get Accept (Packet Accept)
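The client cycle above can be sketched as an in-process exchange of the four packet types. This is a hypothetical illustration: the class and function names (`Server`, `handle_request`, `handle_result`, `run_client`) are invented for the example and are not COSM's API, no networking is shown, and the "MD run" is a placeholder function.

```python
from dataclasses import dataclass, field
from itertools import count


@dataclass
class Request:          # client -> server: "give me work"
    client_id: str


@dataclass
class Assignment:       # server -> client: one peptide at one temperature
    job_id: int
    peptide: str
    temperature: float


@dataclass
class Result:           # client -> server: outcome of the MD run
    job_id: int
    client_id: str
    energy: float


@dataclass
class Accept:           # server -> client: result stored (or rejected)
    job_id: int
    ok: bool


@dataclass
class Server:
    queue: list                                      # pending (peptide, temperature) work units
    results: dict = field(default_factory=dict)      # job_id -> returned energy
    outstanding: dict = field(default_factory=dict)  # job_id -> client_id
    job_ids: count = field(default_factory=count)

    def handle_request(self, req):
        if not self.queue:
            return None                              # nothing to hand out
        peptide, temp = self.queue.pop(0)
        job = Assignment(next(self.job_ids), peptide, temp)
        self.outstanding[job.job_id] = req.client_id
        return job

    def handle_result(self, res):
        # Accept only results for jobs actually assigned to this client
        ok = self.outstanding.pop(res.job_id, None) == res.client_id
        if ok:
            self.results[res.job_id] = res.energy
        return Accept(res.job_id, ok)


def run_client(client_id, server, simulate):
    """One client cycle: Request -> Assignment -> run -> Result -> Accept."""
    job = server.handle_request(Request(client_id))
    if job is None:
        return None
    energy = simulate(job.peptide, job.temperature)  # stand-in for the MD run
    return server.handle_result(Result(job.job_id, client_id, energy))


server = Server(queue=[("1SOL", 300.0), ("1SOL", 320.0)])
fake_md = lambda peptide, temp: -temp / 10     # hypothetical placeholder energy
print(run_client("pc-1", server, fake_md))     # Accept(job_id=0, ok=True)
```

Keeping an `outstanding` table lets the server notice jobs whose clients never report back, for example after a crash on the client side.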

Structure at Server end
  • Temporary databank
  • Job analysis
  • Automatic temperature swaps by parallel tempering
  • Human intervention
Structure at Client end
[Diagram: MD run → return result → delete files, with an “If crash” branch]
Multi-temperature Annealing
  • Project suited for multi-temperature runs: parallel tempering
  • Two configurations with energy and temperature $(E_1, T_1)$ and $(E_2, T_2)$

Temperatures are swapped with probability

$$P = \min\left\{1, \exp\left[-(E_2 - E_1)\left(\frac{1}{kT_1} - \frac{1}{kT_2}\right)\right]\right\}$$

  • Mode of operation
    • Send the same peptide at different temperatures to many clients; let them run; collect the results; swap T’s by multiple parallel tempering; randomly redistribute the peptides with their new T’s to clients
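A minimal sketch of the temperature-swap step follows. It assumes the standard scheme of attempting swaps between neighbouring temperatures, with energies in kJ/mol (the slides do not say exactly how "multiple parallel tempering" pairs the peptides). `swap_probability` implements the formula above; redistributing the peptides to clients is not shown.

```python
import math
import random

K_B = 0.0083144626  # Boltzmann constant in kJ/(mol K); energy units are an assumption


def swap_probability(e1, t1, e2, t2, k=K_B):
    """P = min{1, exp[-(E2 - E1)(1/kT1 - 1/kT2)]}."""
    x = -(e2 - e1) * (1.0 / (k * t1) - 1.0 / (k * t2))
    return 1.0 if x >= 0 else math.exp(x)


def swap_sweep(energies, temps, rng):
    """One sweep of neighbour swaps along the temperature ladder.

    energies[i] and temps[i] belong to peptide i (as returned by the clients).
    Configurations stay put; only temperatures move. Returns each peptide's
    new temperature.
    """
    owner = sorted(range(len(temps)), key=lambda i: temps[i])  # peptide on each rung
    ladder = [temps[i] for i in owner]                          # rung temperatures, ascending
    for j in range(len(ladder) - 1):
        p, q = owner[j], owner[j + 1]
        if rng.random() < swap_probability(energies[p], ladder[j], energies[q], ladder[j + 1]):
            owner[j], owner[j + 1] = q, p
    new_temps = [0.0] * len(temps)
    for rung, peptide in enumerate(owner):
        new_temps[peptide] = ladder[rung]
    return new_temps


rng = random.Random(1)
energies = [-510.0, -530.0, -505.0, -520.0]   # hypothetical energies (kJ/mol)
temps = [300.0, 310.0, 320.0, 330.0]
print(swap_sweep(energies, temps, rng))
```

A swap that moves a lower-energy configuration to the lower temperature is always accepted; the reverse move is accepted with probability below one.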
Multi-temperature Annealing (II)
[Diagram: old temperatures → temperatures swapped among multiple “peptides” → new temperatures]
Potential of Massive Distributive Computing
  • Simulation of folding a small peptide for 100ns
    • Each run (10⁵ simulation steps; 100 ps): ~100 min PC time
    • 1000 runs (100 ns) per “fold”: ~10⁵ min
    • Approx. 70 days on a single PC running 24 h/day
  • An ideal client contributes 8 h/day
    • 100 clients: 70×3/100 ≈ 2 days per fold
    • 10,000 clients: ~50 folds/day (small peptide)
  • Mid-sized protein needs > 1 ms to fold
    • 10⁶ days on a single PC
    • 10,000 clients: ~300 days
    • 10⁶ clients (!!): ~3 days
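The client arithmetic above can be checked with a short sketch. The function names are hypothetical; the inputs (100 min of PC time per 100 ps run, 8 h/day per client, 10⁶ PC-days for a mid-sized protein) are the slide's figures.

```python
MIN_PER_DAY = 24 * 60


def pc_days_per_fold(sim_ns, pc_min_per_run=100, ps_per_run=100):
    """Single-PC days (running 24 h/day) to simulate sim_ns nanoseconds."""
    runs = sim_ns * 1000 / ps_per_run
    return runs * pc_min_per_run / MIN_PER_DAY


def wall_days(pc_days, n_clients, hours_per_day=8):
    """Wall-clock days when n_clients each donate hours_per_day."""
    return pc_days / (n_clients * hours_per_day / 24)


small = pc_days_per_fold(100)                   # 100 ns small-peptide fold
print(f"single PC: {small:.0f} days")           # ~70 days
print(f"100 clients: {wall_days(small, 100):.1f} days per fold")
print(f"10,000 clients: {1 / wall_days(small, 10_000):.0f} folds/day")
print(f"mid-sized, 10,000 clients: {wall_days(1e6, 10_000):.0f} days")
print(f"mid-sized, 10^6 clients: {wall_days(1e6, 1_000_000):.0f} days")
```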
  • Launched – August 2002
  • Small PC-cluster – October 2002
    • In-house runs to learn codes
  • Infrastructure for Distributive Computation
    • Installation of GROMACS & COSM – January-March 2003
  • Test runs
    • Intra-laboratory test run – March-October 2003
    • NCU test run – July-October 2003
  • Launched on WWW – November 20 2003
  • Scientific studies
    • Getting familiar w/ MD and folding of peptides
    • Looking for ways to increase MD time step
Current status of PAC
  • Last beta version Pac v0.9
    • Released on July 15
    • To CBL lab members & physics dept
    • About 25 clients
  • First alpha version Pac v1.0 released October 1 2003
  • Current version Pac v1.2
    • Released to the public on 20 November 2003
    • In search of clients
      • Portal in “Educities”: 700 downloads, ~700 active clients
      • PC’s in university administrative units
      • City halls and county government offices
      • Talks and visits to universities and high schools
Some current Simulations
  • 1L2Y (20 res.): NMR Structure Of Trp-Cage Miniprotein Construct Tc5B; synthetic.
  • 1SOL (20 res.): A PIP2 and F-Actin-Binding Site Of Gelsolin, Residues 150-169. One helix.
  • 1ZDD (35 res.): Disulfide-Stabilized Mini Protein A Domain. Two helices.
A small test case – 1SOL
  • Target peptide – 1SOL.pdb
    • 20 amino acids; 3-loop helix and 1 hairpin; 352 atoms; ~4000 bond interactions
    • Unit time step = 1 fs
  • Compare constant temperature and parallel-tempering
    • Constant T @ 300K
    • Parallel-tempering with about 20 peptides; results returned to the server for swapping after each “run” of 10⁵ time steps (100 ps)
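The slides do not give the temperatures used for the ~20 peptides. A common heuristic is a geometric ladder, in which neighbouring temperatures have a constant ratio so that swap rates between neighbours come out roughly even. The sketch below assumes a hypothetical 300 K to 450 K range.

```python
def geometric_ladder(t_min, t_max, n):
    """n temperatures from t_min to t_max with a constant ratio between neighbours."""
    if n < 2:
        return [t_min]
    ratio = (t_max / t_min) ** (1.0 / (n - 1))
    return [t_min * ratio ** i for i in range(n)]


ladder = geometric_ladder(300.0, 450.0, 20)   # hypothetical range for ~20 peptides
print([round(t, 1) for t in ladder])
```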

Parallel-tempering (1SOL)
[Plot: temperature (K) vs. number of runs (in units of 100 ps)]

Preliminary result on 1SOL
[Structures: initial structure; native conformation; constant temperature (20 ns); parallel tempering (1.6 ns)]

A second test case – 1L2Y
  • Simulation target: Trp-Cage
  • 20 amino acids, 2 helical loops
  • A short, artificial, self-folding peptide
  • Has been simulated with AMBER
  • Folding mechanism not well understood
A case in swap History (1L2Y)
[Plot: temperature (K) vs. number of runs (in units of 100 ps)]

Preliminary result on 1L2Y (11 peptides)
[Structures: native state; initial state; PAC 6 ns]

Speeding up simulation - Separating the fast from slow modes
  • Fast modes associated with bonded interactions
    • Bond-stretching vibrations ~ 10-20 fs
    • Bond-angle bending vibrations ~ 20-40 fs
  • Slow modes associated with dihedral angles
    • Of the order of 0.1 ns
    • Alpha-helix folds in ~1-10 ns
    • Beta-sheets fold in ~10-100 ns
    • Native structure ~1 ms-1 s
Bonded interactions
  • Bond stretching
  • Harmonic angle potential
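The two bonded terms can be written out explicitly. The sketch assumes the harmonic forms used in GROMACS, V = ½ k_b (r − b₀)² for bonds and V = ½ k_θ (θ − θ₀)² for angles; the force constants and reference values in the usage lines are hypothetical.

```python
import math


def bond_energy(r_i, r_j, k_b, b0):
    """Harmonic bond stretching, V = 1/2 k_b (|r_ij| - b0)^2."""
    return 0.5 * k_b * (math.dist(r_i, r_j) - b0) ** 2


def bond_angle(r_i, r_j, r_k):
    """Angle i-j-k in radians, with j the central atom."""
    u = [a - b for a, b in zip(r_i, r_j)]
    w = [a - b for a, b in zip(r_k, r_j)]
    cos_t = sum(x * y for x, y in zip(u, w)) / (math.hypot(*u) * math.hypot(*w))
    return math.acos(max(-1.0, min(1.0, cos_t)))  # clamp against round-off


def angle_energy(r_i, r_j, r_k, k_theta, theta0):
    """Harmonic angle bending, V = 1/2 k_theta (theta - theta0)^2."""
    return 0.5 * k_theta * (bond_angle(r_i, r_j, r_k) - theta0) ** 2


# Hypothetical parameters (nm, kJ/mol): a slightly stretched bond and a 90-degree angle
print(bond_energy((0.0, 0.0, 0.0), (0.12, 0.0, 0.0), k_b=1000.0, b0=0.10))
print(angle_energy((1.0, 0.0, 0.0), (0.0, 0.0, 0.0), (0.0, 1.0, 0.0),
                   k_theta=400.0, theta0=math.radians(109.5)))
```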
Bond-stretching vibrations

Bond-stretching vibrations have an approximate oscillation or relaxation time ζ ≈ 10 fs for bonds involving a hydrogen atom (C-H).
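The ≈10 fs figure follows from treating the C-H bond as a harmonic oscillator with period T = 2π√(μ/k), where μ is the reduced mass. The sketch assumes an AMBER-style C-H force constant K = 340 kcal mol⁻¹ Å⁻² (AMBER writes V = K(r − r₀)², so the spring constant is k = 2K); with these inputs it gives roughly 11 fs.

```python
import math

AVOGADRO = 6.02214076e23      # 1/mol
AMU = 1.66053906660e-27       # kg
KCAL = 4184.0                 # J


def bond_period_fs(k_kcal_per_mol_A2, m1_amu, m2_amu):
    """Classical vibration period of a harmonic bond, in femtoseconds.

    Assumes the AMBER convention V = K (r - r0)^2, i.e. spring constant k = 2K.
    """
    k_si = 2.0 * k_kcal_per_mol_A2 * KCAL / AVOGADRO / 1e-20   # N/m (1 A^2 = 1e-20 m^2)
    mu = m1_amu * m2_amu / (m1_amu + m2_amu) * AMU             # reduced mass, kg
    omega = math.sqrt(k_si / mu)                               # angular frequency, rad/s
    return 2.0 * math.pi / omega * 1e15


print(f"C-H period: {bond_period_fs(340.0, 12.011, 1.008):.1f} fs")   # roughly 11 fs
```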

Bond-stretching vibrations (II)

The standard deviation of the bond length is < 0.03 Å, very small compared with the tolerance in structure. Most codes, including GROMACS and AMBER, have an option to freeze out this degree of freedom.
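Freezing a bond is done with a constraint algorithm. Below is a minimal sketch of SHAKE for a single bond (GROMACS also offers LINCS; the function name and test coordinates are hypothetical): after an unconstrained step, both atoms are moved along the old bond vector, weighted by inverse mass, until the bond length is restored.

```python
import math


def shake_bond(ri_old, rj_old, ri_new, rj_new, mi, mj, d, tol=1e-10, max_iter=100):
    """Restore |ri - rj| = d after an unconstrained MD step (SHAKE, one bond).

    Corrections act along the old bond vector and are weighted by inverse
    mass, so the pair's centre of mass does not move.
    """
    r_old = [a - b for a, b in zip(ri_old, rj_old)]
    ri, rj = list(ri_new), list(rj_new)
    inv_mass = 1.0 / mi + 1.0 / mj
    for _ in range(max_iter):
        s = [a - b for a, b in zip(ri, rj)]
        diff = d * d - sum(x * x for x in s)
        if abs(diff) < tol:
            break
        # First-order solution of |s + g*inv_mass*r_old|^2 = d^2 for g
        g = diff / (2.0 * inv_mass * sum(x * y for x, y in zip(s, r_old)))
        ri = [a + g * c / mi for a, c in zip(ri, r_old)]
        rj = [b - g * c / mj for b, c in zip(rj, r_old)]
    return ri, rj


# Hypothetical C-H pair (nm, amu): bond 0.109 nm before the step, stretched after it
ri, rj = shake_bond((0.0, 0.0, 0.0), (0.109, 0.0, 0.0),
                    (0.001, 0.002, 0.0), (0.115, 0.010, 0.0),
                    mi=12.011, mj=1.008, d=0.109)
print(math.dist(ri, rj))
```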

Bond-angle bending vibrations

Bond-angle bending vibrations have ζ ≈ 20 fs for bond angles involving a hydrogen atom (H-N-C).

Bond-angle bending vibrations (II)

Each bond angle has a well-defined value with a relatively small standard deviation (~3-5 degrees), but it cannot be frozen; we are looking for ways to “half-freeze” it.


Current and future efforts

  • Computing facility
    • expand the base of PAC clients; target 10,000
  • Data management
    • efficient server-client protocol
    • efficient management and analysis of data when client number is large
  • Running simulations
    • optimum implementation of parallel tempering
    • reduce the size of the water box
  • Dealing with fast modes
    • freeze bond stretching
    • isolate bond-angle bending degrees of freedom for special treatment; requires new (heavy) code-writing
    • target time-step: > 20 fs; ultimately 100 fs
The Team
  • Funded by NRPGM/NSC
  • Computational Biology Laboratory, Physics Dept & Life Sciences Dept, National Central University

    • PI: Professor HC Lee (Phys & LS/NCU)
    • Jia-Lin Lo (PhD student)
    • Jun-Ping Yiu (MSc Res. Assistant)
    • Chien-Hao Wei (MSc RA)
    • Engin Lee (MSc student)
    • Dr. Richard Tseng (PDF, since May 2004)
    • Visiting scientist: physicist/computer specialist (TBA)
Please visit

and let your PC take part in this project while you sleep

Thank you