Building a Massive Virtual Screening using Grid Infrastructure

Presentation Transcript


  1. Building a Massive Virtual Screening using Grid Infrastructure
     Chak Sangma, Centre for Cheminformatics, Kasetsart University
     Putchong Uthayopas, High Performance Computing and Networking Center, Kasetsart University

  2. Motivation
     • Thailand’s medicinal plants are important to Thai society
       • Over 1,000 species
       • Over 200,000 compounds
       • Multiple disease targets
     • Problem
       • No complete compound database exists
       • Practice still relies mostly on local knowledge and conventional wisdom
       • Lack of systematic verification by scientific methods
     Image labels: Asiatic pennywort; Bariena lunulina Linae

  3. Kasetsart University Thai Medicinal Plants Effort
     • Led by the Center for Cheminformatics, Kasetsart University (Dr. Chak Sangma)
     • Goal
       • Establish a Thai medicinal plant knowledge base by building a 3D molecular database
       • Employ virtual screening to verify the active compounds suggested by conventional knowledge

  4. Reports and literature → 2D structures → approximated 3D structures → optimized 3D structures with GAMESS → calculated binding energy with Autodock 3.0 → structure within 0.5 Å of binding site → SOM neural network map → results
     Diagram note: Compute intensive!
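The workflow on this slide can be read as a chain of stages applied to every compound. The following is only a minimal sketch under stated assumptions: the stage functions `build_3d`, `optimize`, and `dock` are hypothetical callables standing in for the real tools (GAMESS and Autodock are not invoked here), and treating the 0.5 Å figure as a filter on the docked structure's distance from the binding site is our interpretation of the diagram, not something the slide states.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Tuple


@dataclass
class DockingResult:
    compound_id: str
    binding_energy: float    # value reported by the docking stage
    distance_to_site: float  # distance from the binding site, in angstroms


def screen(
    compounds: Iterable[Tuple[str, object]],
    build_3d: Callable[[object], object],
    optimize: Callable[[object], object],
    dock: Callable[[object], Tuple[float, float]],
    max_distance: float = 0.5,
) -> List[DockingResult]:
    """Run each (id, 2D structure) pair through the stages and keep close hits.

    All three stage callables are hypothetical placeholders:
    build_3d -> approximate 3D structure, optimize -> optimized structure,
    dock -> (binding energy, distance to binding site).
    """
    hits = []
    for compound_id, structure_2d in compounds:
        approximate = build_3d(structure_2d)
        optimized = optimize(approximate)
        energy, distance = dock(optimized)
        # Assumed filter: keep only structures docked near the binding site.
        if distance <= max_distance:
            hits.append(DockingResult(compound_id, energy, distance))
    # Lower (more negative) binding energy first.
    return sorted(hits, key=lambda result: result.binding_energy)
```

Passing the stages in as callables keeps the sketch independent of how the chemistry codes are actually launched; the surviving results would then go on to the SOM analysis step.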

  5. ThaiGrid Drug Design Portal
     • Partners
       • High Performance Computing and Networking Center, KU
       • Center for Cheminformatics, KU
       • IBM Thailand
     • Goal
       • Build a virtual screening infrastructure on the ThaiGrid system
       • Start from the KU campus grid and extend to other ThaiGrid partner universities later
     • Links
       • http://tgcc.cpe.ku.ac.th
       • http://www.thaigrid.net

  6. Challenge
     • Recent project for the National Center for Genetic Engineering and Biotechnology, Thailand
       • Screen 3,000 compounds in 3 months
     • Computation time on a 2.4 GHz Pentium 4 system
       • Over 30 minutes per optimized structure
       • Over 30 minutes per docking
     • Estimated computing time on a single processor
       • (3,000 × 30) + (3,000 × 30) = 180,000 minutes
       • 3,000 hours
       • 125 days
       • Just over 4 months
     • Not fast enough!
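The single-processor estimate on this slide is plain arithmetic, reproduced below as a small sketch. The function names are ours, and `ideal_parallel_days` assumes perfect scaling with no scheduling or transfer overhead, which a real grid will not achieve.

```python
def serial_estimate(n_compounds: int, optimize_min: float, dock_min: float):
    """Return (minutes, hours, days) for running every step on one processor."""
    total_minutes = n_compounds * optimize_min + n_compounds * dock_min
    hours = total_minutes / 60
    days = hours / 24
    return total_minutes, hours, days


def ideal_parallel_days(serial_days: float, n_cpus: int) -> float:
    """Best case only: assumes the work divides evenly with zero overhead."""
    return serial_days / n_cpus


total_minutes, hours, days = serial_estimate(3000, 30, 30)
print(total_minutes, hours, days)  # 180000 3000.0 125.0
```

Since 125 days exceeds the 3-month target, the serial run misses the deadline even before counting the "over" in "over 30 minutes".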

  7. Key Technologies
     • Three key technologies must be combined to provide the solution
       • Cluster computing
       • Grid computing
       • Portal technology

  8. What do we want to do?
     Hide the complexity of the Grid and of computational chemistry software from scientists while providing the massive computational power they need

  9. Infrastructure
     • ThaiGrid infrastructure is used
     • 10 clusters from 6 organizations
       • AMATA – KU
       • GASS – KU
       • MAEKA – KU
       • WARINE – KU
       • CAMETA – SUT
       • OPTIMA – AIT
       • ENQUEUE – KMUTNB
       • PALM – KMUTNB
       • SPIRIT – CU
       • INCA – KMUTT
     • 158 CPUs on 110 nodes

  10. Software Architecture
      Diagram labels: Portal, SQMS/G, SCMSWeb, Globus 2.4, SQMS (×4), AMATA, Warine, GASS, Maeka, KU Gigabit Campus Network
      • Each cluster has a local scheduler
        • SGE, OpenPBS, or Condor can be used
        • We use our own SQMS scheduler
      • Globus 2.4 is used as middleware
        • Resource control and security (GSI)
      • A grid-level scheduler controls multi-cluster job submission
        • We use KU's own SQMS/G
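To illustrate what a grid-level scheduler has to decide, here is a minimal sketch that spreads tasks over clusters in proportion to their CPU counts. It is not the SQMS/G algorithm, which the slides do not describe; the greedy least-loaded policy and the per-cluster CPU counts are assumptions made purely for illustration.

```python
import heapq
from typing import Dict, Iterable, List


def assign_tasks(tasks: Iterable[str], cluster_cpus: Dict[str, int]) -> Dict[str, List[str]]:
    """Greedy least-loaded assignment: each task goes to the cluster whose
    assigned-tasks-per-CPU ratio is currently lowest."""
    assigned = {name: 0 for name in cluster_cpus}
    plan: Dict[str, List[str]] = {name: [] for name in cluster_cpus}
    # Heap entries are (current load ratio, cluster name).
    heap = [(0.0, name) for name in sorted(cluster_cpus)]
    heapq.heapify(heap)
    for task in tasks:
        _, name = heapq.heappop(heap)
        plan[name].append(task)
        assigned[name] += 1
        heapq.heappush(heap, (assigned[name] / cluster_cpus[name], name))
    return plan


# Hypothetical CPU counts; the slides give only the grid-wide total.
example_plan = assign_tasks(
    [f"dock-{i}" for i in range(12)],
    {"AMATA": 8, "GASS": 4},
)
print({name: len(jobs) for name, jobs in example_plan.items()})
```

In the architecture above, each cluster's local scheduler would then queue the tasks it receives onto its own nodes.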

  11. The Portal
      • Roles
        • User interface
        • Automate execution flow
        • File access and management
      • Features
        • Create project
        • Add ligand, enzyme
        • Submit screening job, monitor job status
        • Download output
      • Current portal is built using Plone
        • http://www.plone.org/
        • Python-based web content management
        • Flexible and extensible

  12. How things work!
      Diagram labels: Portal, Resource Broker (SQMS/G), Grid Middleware Globus 2.4, Monitor, Task (×5), Compute Resource (×5), KU Campus network

  13. Results
      • First version of the compound database (around 3,000 compounds)
      • 3,000 compounds screened (30 high-potential compounds found)
      • 4 drug targets (Influenza, HIV-RT, HIV-PR, HIV-IN)
      Image label: XK-263

  14. Experiences
      • Some files, such as enzyme structures and output, are very large
        • Good bandwidth between sites is required
      • Some simple optimization techniques can help
        • Caching the enzyme structure file at target hosts substantially reduces the number of transfers needed
      • A batch scheduling approach works well if the systems are very homogeneous
      • Allow dynamic staging of execution code to the target host without installation or recompilation
      • Many script tools must be developed to
        • Streamline the execution
        • Handle data and code staging
        • Clean up after execution
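The caching of the enzyme structure file mentioned on this slide can be sketched as a check-before-transfer helper. This is an illustrative sketch, not the project's scripts: `cached_fetch`, the `fetch` callable, and the file name `enzyme.pdb` are all hypothetical, and the demo substitutes a stand-in for the real cross-site transfer.

```python
import tempfile
from pathlib import Path
from typing import Callable


def cached_fetch(name: str, cache_dir: Path, fetch: Callable[[str], bytes]) -> Path:
    """Return a local copy of `name`, calling `fetch` only on a cache miss."""
    cache_dir.mkdir(parents=True, exist_ok=True)
    target = cache_dir / name
    if not target.exists():
        # Write under a temporary name first so an interrupted transfer
        # is never mistaken for a complete cached file.
        partial = target.with_name(target.name + ".part")
        partial.write_bytes(fetch(name))
        partial.replace(target)
    return target


# Demo with a stand-in for the real transfer (hypothetical name and contents).
transfers = []


def fake_transfer(name: str) -> bytes:
    transfers.append(name)
    return b"placeholder enzyme structure data"


with tempfile.TemporaryDirectory() as tmp:
    first = cached_fetch("enzyme.pdb", Path(tmp), fake_transfer)
    second = cached_fetch("enzyme.pdb", Path(tmp), fake_transfer)

print(len(transfers))  # 1: the second call is served from the cache
```

Because every docking job against the same target reuses one enzyme file, a per-host cache like this turns many transfers into one per host.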

  15. Next Generation Massive Screening on Grid
      • Move to a service-oriented Grid
        • Use Grid and Web services to encapsulate key applications
        • Build broker and service discovery infrastructure
        • Rely heavily on OGSA and GT3.X, 4.X
      • Portlet-based portal
        • JSR 168 (Portlet Specification) compliance
        • More modular, customizable, flexible
        • Plan to adopt GridSphere from GridLab (www.gridlab.org)
      • Use a database as the backend instead of files
        • OGSA DAI might be used for data access

  16. Progress
      • We are working on
        • New portal using GridSphere technology (done, testing)
        • Service wrapper for legacy code
          • GAMESS, Autodock (done, testing)
        • MMJFS interface (in progress)
        • OGSA DAI integration (in progress)
        • Service registration and discovery (partial)
        • Broker system (design)
        • New monitoring (done)
      • Schedule
        • Finish and test: Jan-Feb 2005
        • Deploy: March 2005

  17. Diagram labels (in extraction order): File Server, Molecular DB, GridFTP, GAMESS, Scheduler, OGSA DAI, MMJFS, GAMESS Service, Portlet, GAMESS, Portal, Registration Server, Broker Server, Backend DB

  18. Design Choices
      • Mass data transportation across sites
        • A central FTP server is used to store data and the database
        • Each compute node can pull required data from this FTP server
        • Ad hoc: ftp, wget/http (firewall friendly)
        • Next: GridFTP
      • Cluster / single server
        • Gridify using a service wrapper that exposes the legacy application to the grid as a grid service
        • This does not work for clusters, since compute nodes are hidden behind the head node
        • Fall back to an MMJFS interface that talks to the local scheduler
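The pull model on this slide, where each compute node fetches what it needs from the central server, can be sketched with the Python standard library as a stand-in for wget. The function name `pull` and the file names are hypothetical, and the demo uses a local `file://` URL so that it runs without any server; in practice the URL would point at the central ftp/http server.

```python
import shutil
import tempfile
import urllib.request
from pathlib import Path


def pull(url: str, dest: Path, timeout: float = 60.0) -> Path:
    """Download `url` to `dest`, streaming so large files are not held in memory."""
    dest.parent.mkdir(parents=True, exist_ok=True)
    with urllib.request.urlopen(url, timeout=timeout) as response, open(dest, "wb") as out:
        shutil.copyfileobj(response, out)
    return dest


# Demo: a temporary file stands in for the central server (hypothetical names).
with tempfile.TemporaryDirectory() as tmp:
    source = Path(tmp) / "server" / "ligands.tar"
    source.parent.mkdir()
    source.write_bytes(b"placeholder ligand data")
    local = pull(source.as_uri(), Path(tmp) / "node" / "ligands.tar")
    pulled = local.read_bytes()

print(pulled == b"placeholder ligand data")  # True
```

Having the node initiate the connection is what makes this approach firewall friendly: only outbound requests from the compute side are needed.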

  19. Design Choices
      • Service discovery mechanism
        • Publish/subscribe model
        • Service advertising interface/protocol
        • Backend database shared between the registration service component and the broker component
      • Adoption of the Grid Notification service and model
        • Available from the myGrid project; seems useful for more dynamic environments
        • Scalability…
      Diagram labels: Broker Service Registration Service Discovery (SQL)
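A shared backend database queried with SQL, as described on this slide, can be sketched with SQLite. This is an assumption-laden illustration, not the project's schema: the table layout, the function names, and the example endpoints are all hypothetical, and the actual database engine is not named in the slides.

```python
import sqlite3
from typing import List


def open_registry(path: str = ":memory:") -> sqlite3.Connection:
    """Open the database shared by the registration and broker components."""
    db = sqlite3.connect(path)
    db.execute(
        "CREATE TABLE IF NOT EXISTS services ("
        "endpoint TEXT PRIMARY KEY, name TEXT NOT NULL, kind TEXT NOT NULL)"
    )
    return db


def register(db: sqlite3.Connection, name: str, kind: str, endpoint: str) -> None:
    """Registration side: advertise a service, replacing any stale entry."""
    db.execute(
        "INSERT OR REPLACE INTO services (endpoint, name, kind) VALUES (?, ?, ?)",
        (endpoint, name, kind),
    )
    db.commit()


def discover(db: sqlite3.Connection, kind: str) -> List[str]:
    """Broker side: look up all endpoints offering a given kind of service."""
    rows = db.execute(
        "SELECT endpoint FROM services WHERE kind = ? ORDER BY endpoint", (kind,)
    )
    return [row[0] for row in rows]


# Hypothetical endpoints for illustration only.
registry = open_registry()
register(registry, "gamess-a", "optimizing", "http://cluster-a.example/gamess")
register(registry, "autodock-a", "docking", "http://cluster-a.example/autodock")
register(registry, "autodock-b", "docking", "http://cluster-b.example/autodock")
print(discover(registry, "docking"))
```

A polling design like this is simple but leaves the broker unaware of changes between queries, which is one reason a notification model is attractive for more dynamic environments.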

  20. Job Submission; Job Status; Result Visualization

  21. Performance Record; System Status; Job Queue Monitoring

  22. Service Discovery

  23. Conclusion
      • Grid and cluster computing are key technologies that can give us the power. The Grid works if used wisely!
      • Challenges
        • Grid standards are still evolving rapidly
          • Things change before you can finish!
        • Difficult to configure and maintain; some parts are still unstable
        • Firewall and security concerns
        • Lack of manpower with expertise
      • Opportunity
        • Secure infrastructure
        • Cost reduction by integrating networked resources on demand

  24. Acknowledgement
      • HPCNC Team
        • Somsak Sriprayoonsakul
        • Nuttaphon Thangkittisuwan
        • Thanakit Petchprasan
        • Isiriya Paireepairit

  25. The End

  26. Backup

  27. Process
      Diagram labels: GRID, 2D Structure, 3D Structure, GAMESS (×5), Molecular Structure Database, Optimized 3D Structure, Autodock (×4), Enzyme Grid, Enzyme, SOM Neural Network Analysis, Results

  28. Diagram labels (in extraction order): Workflow Engine, Grid Portal, Portlet (×4), Grid Middleware (OGSA), OGSA DAI, Optimizing Services, Docking Services, Broker Services, Molecule Database, Resources (Computer, Network), Monitoring Services
