eScience and Grid: Tools and techniques for the next generation scientist

Professor Brian Vinter, Head of the Copenhagen eScience Center.


Presentation Transcript


  1. eScience and Grid: Tools and techniques for the next generation scientist. Professor Brian Vinter, Head of the Copenhagen eScience Center

  2. eScience «The next 10 to 20 years will see computational science firmly embedded in the fabric of science – the most profound development in the scientific method in over three centuries.» US Department of Energy 2003.

  3. Mega-Science • The next scientific period will be dominated by Mega-Science projects • 10^4 researchers on a single project • Extreme data production • Highly integrated collaboration between different groups of scientists • Examples • CERN LHC • ALMA • Mars project

  4. Data Production • 1 Exabyte = 1000 Petabytes • 1 Petabyte = 1000 Terabytes • 1 Terabyte = 1000 Gigabytes • 1 Gigabyte = 1000 Megabytes • 1997: Total data worldwide approx. 12 exabytes (incl. documents, film, TV, pictures, …) [1] • 1999: 2-3 exabytes of data produced [2] • 2002: Approx. 5 exabytes of data produced [2] • Global data availability doubles every 4-5 years. [1] http://www.lesk.com/mlesk/ksg97/ksg.html [2] http://www.sims.berkeley.edu/research/projects/how-much-info-2003/
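The unit ladder and the doubling claim on this slide can be checked with a few lines of arithmetic. The sketch below is illustrative only: the function names are made up for this example, and the projection assumes steady exponential growth, which the slide states only as a rule of thumb.

```python
# Decimal (SI) storage units, as defined on the slide.
UNITS_IN_BYTES = {
    "MB": 10**6,
    "GB": 10**9,
    "TB": 10**12,
    "PB": 10**15,
    "EB": 10**18,
}


def convert(value, from_unit, to_unit):
    """Convert a data volume between two of the units above."""
    return value * UNITS_IN_BYTES[from_unit] / UNITS_IN_BYTES[to_unit]


def projected_volume(start_volume, years, doubling_time_years):
    """Volume after `years`, assuming it doubles every `doubling_time_years`."""
    return start_volume * 2 ** (years / doubling_time_years)


# One exabyte expressed in terabytes.
print(convert(1, "EB", "TB"))

# Illustrative projection: 12 EB growing for 10 years,
# using the slow and fast ends of the "4-5 years" doubling range.
print(projected_volume(12, 10, 5), projected_volume(12, 10, 4))
```

With a 5-year doubling time the volume quadruples in a decade; with a 4-year doubling time it grows by a factor of about 5.7.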

  5. eScience Components • Modeling and simulation

  6. eScience Components • Modeling and simulation • Data acquisition and handling

  7. eScience Components • Modeling and simulation • Data acquisition and handling • Visualization

  8. eScience Components • Modeling and simulation • Data acquisition and handling • Visualization • HPC and Grid

  9. Why is it getting more difficult? 54 molecules 442 molecules 1372 molecules

  10. System sizes and time scales

  11. Nano-modeling • Extremely CPU- and data-intensive algorithms • Complex structure calculations • Multiple days of execution even on a supercomputer • Runs on both PCs and supercomputers

  12. eScience and Bio/Med • We expect very good results from eScience in biology and medicine • The foremost advantages will come from introducing a mathematical causal understanding of biological systems • Bioinformatics is already doing this • An emerging field: Systems Biology • Systems Medicine is also starting internationally

  13. Calculations in treatment • Computational methods are already important in medical planning • Radiation planning • Bypass flow modeling • Robotic surgery • …

  14. Personalized medicine • Every human is unique • Also at the genetic level • In our genome, which is written with the alphabet ACGT, we have a number of micro mutations, called single nucleotide polymorphisms (SNPs) • These SNPs are often without consequence, but • Some make us sick • Some are indicators of a faulty gene • Others influence how we respond to a drug • This last complication makes it very hard to make drugs for the general population • We want to move from commodity medicine to custom-tailored drugs

  15. An example • Approx. 60% of today's medicines are metabolized by cytochrome P450 enzymes • Some people have highly efficient P450 while others have very slow and inefficient P450 • Knowledge of a patient's P450 level will allow us to dose medicine to the individual much more efficiently • This is already in early use

  16. And this is eScience how? • Developing a drug is not a linear process • The human genome is written with billions of letters • Any person has millions of SNP mutations • Finding the SNP that has an effect is a highly complex computational task
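To make the notion of an SNP concrete, the sketch below lists the positions where two already-aligned sequences differ by a single letter. It is a toy illustration with made-up sequences and a made-up function name. It only finds differences; deciding which of millions of such differences has an effect is the hard computational task the slide refers to.

```python
def snp_positions(reference, sample):
    """Return the 0-based positions where two aligned sequences differ."""
    if len(reference) != len(sample):
        raise ValueError("sequences must be aligned and of equal length")
    return [
        index
        for index, (ref_base, sample_base) in enumerate(zip(reference, sample))
        if ref_base != sample_base
    ]


# Toy sequences over the alphabet ACGT (not real genomic data).
reference = "ACGTACGTAC"
sample = "ACGTTCGTAA"
print(snp_positions(reference, sample))
```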

  17. eScience and geology • Geology and hydrology have also been using computational methods for a long time • There are very interesting aspects in combining different methods • e.g. including biological systems in the models • Inverse mapping of seismic data • It turns out that we use the same techniques in medicine • And soon in industry

  18. Grid Minimum intrusion Grid

  19. Minimum intrusion Grid [Diagram: users and resources, each connected through the Grid]

  20. Processing plants • Like the power grid, the computing Grid has many types of power producers • High-yield power plants (fossil fuel, nuclear, …) • Supercomputers and large farms • Low-yield producers (windmills, etc.) • Individual PCs and game consoles • Very low-yield producers (solar panels, etc.) • Web browsers

  21. One Click

  22. Interactive Applications

  23. VGrids • Best thing since sliced bread • VGrids are Virtual Organizations in MiG (the Minimum intrusion Grid) • They are a dead easy way to create collaborations • Share files • Share resources • Private entry page • Public Web page

  24. Portals • VOs can generate their own private entry pages, including application portals

  25. Files in VGrids • A user keeps her personal home directory regardless of which VGrid she works in • But each VGrid has a common directory that only members of the VGrid may access • These are represented as directories in the user's home directory • VGrid owners can create sub-VGrids
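The layout described on this slide can be pictured as follows. This is a hypothetical sketch, not MiG's actual implementation: the `/home` prefix, the membership table, the VGrid names, and both function names are assumptions made for illustration, and sub-VGrids are simply modeled as nested names.

```python
from pathlib import PurePosixPath


def visible_directories(user, memberships):
    """Home directory plus one shared directory per VGrid the user belongs to."""
    home = PurePosixPath("/home") / user
    vgrids = sorted(memberships.get(user, set()))
    return [home] + [home / vgrid for vgrid in vgrids]


def may_access(user, vgrid, memberships):
    """Only members of a VGrid may enter its shared directory."""
    return vgrid in memberships.get(user, set())


# Hypothetical membership table; "GeneRecon/analysis" models a sub-VGrid.
memberships = {
    "alice": {"GeneRecon", "GeneRecon/analysis"},
    "bob": {"Hydrology"},
}

for directory in visible_directories("alice", memberships):
    print(directory)
print(may_access("bob", "GeneRecon", memberships))
```

Whether membership of a parent VGrid implies access to its sub-VGrids is not stated on the slide, so the sketch requires explicit membership of each.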

  26. Examples eScience on Grid

  27. GeneRecon • GeneRecon seeks to identify genetic factors behind hereditary diseases • The overall idea is to compare two sets of genomes • One where the disease is observed • One where the disease is not observed • Approx. 1000 individuals in each set • GeneRecon is developed at the Bioinformatics Research Center, Århus University

  28. GeneRecon • The algorithm is a Markov-chain Monte Carlo method • A test run consists of approx. 30,000 individual tests • One test runs from 1 to 10 days on a PC • In total no less than 82 CPU years • MiG hosted the execution on Grid and got the execution down below a month
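The "no less than 82 CPU years" figure is consistent with a simple lower bound in which every one of the 30,000 tests takes the minimum of one day. The sketch below reproduces that arithmetic; the 365-day year and the function name are assumptions made for this example.

```python
TESTS = 30_000        # approximate number of individual tests (from the slide)
DAYS_PER_YEAR = 365   # assumption: calendar year without leap days


def cpu_years(num_tests, days_per_test):
    """Total sequential compute time in years on a single CPU."""
    return num_tests * days_per_test / DAYS_PER_YEAR


lower_bound = cpu_years(TESTS, 1)    # every test takes 1 day
upper_bound = cpu_years(TESTS, 10)   # every test takes 10 days

print(f"lower bound: {lower_bound:.1f} CPU years")
print(f"upper bound: {upper_bound:.1f} CPU years")
```

The lower bound comes out just above 82 CPU years, which matches the slide's wording.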

  29. Statistics • 1315 jobs were submitted to Grid at the same time • 0 jobs were lost • First result: 2:04:44 • Last result: 28 days, 5:42:54

  30. Groundwater modeling on Funen • Calibration of the Assens model • 1 model evaluation = 30 min • 920 model evaluations = 19 days
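The 19-day figure follows directly from running the 920 evaluations one after another. A minimal check, with a function name chosen for this example:

```python
MINUTES_PER_DAY = 60 * 24


def serial_days(evaluations, minutes_each):
    """Wall-clock days needed when evaluations run one at a time."""
    return evaluations * minutes_each / MINUTES_PER_DAY


days = serial_days(920, 30)
print(f"{days:.1f} days")
```

This gives a little over 19 days, in line with the slide.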

  31. Days to hours [Diagram: a master distributing work to four clients; labels: AUTOCAL, OfficeGRID]
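The master/client arrangement can be sketched as a master that hands independent model evaluations to a pool of clients and keeps the best result. This is an illustration of the general pattern only, not the AUTOCAL or OfficeGRID implementation: `evaluate_model` is a toy stand-in for a 30-minute model run, local threads stand in for remote clients, and the sketch assumes the evaluations are independent of one another.

```python
from concurrent.futures import ThreadPoolExecutor


def evaluate_model(params):
    """Toy stand-in for one model run; returns a misfit score (lower is better)."""
    a, b = params
    return (a - 2.0) ** 2 + (b + 1.0) ** 2


def master(candidates, num_clients=4):
    """Farm evaluations out to the clients and return the best candidate."""
    with ThreadPoolExecutor(max_workers=num_clients) as pool:
        scores = list(pool.map(evaluate_model, candidates))
    best_index = min(range(len(scores)), key=scores.__getitem__)
    return candidates[best_index], scores[best_index]


# Hypothetical parameter grid for the calibration.
candidates = [(a * 0.5, b * 0.5) for a in range(0, 9) for b in range(-4, 5)]
best_params, best_score = master(candidates)
print(best_params, best_score)
```

With N clients and independent evaluations, wall-clock time drops by roughly a factor of N, which is how a run measured in days can shrink to hours.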

  32. Drug Design • Molecular docking is a time-consuming calculation process, which this project performs in two steps • The first step is a coarse calculation that can eliminate molecules that won't dock • This step can run on PCs and PS3s; a lot of work is being done towards efficient utilization of the CELL CPU for molecular docking • The molecules that survive the first step are then modeled more precisely at quantum level on classic supercomputers and clusters
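The two-step structure can be sketched as a cheap filter followed by an expensive scoring pass over the survivors. Everything concrete below is hypothetical: the function name, the toy molecules, and both scoring rules are placeholders, and neither resembles a real docking or quantum-level calculation.

```python
def two_stage_screen(molecules, might_dock, precise_score):
    """Run a cheap filter first, then score only the survivors precisely."""
    # Step 1: coarse elimination (cheap, suited to PCs and consoles).
    survivors = [molecule for molecule in molecules if might_dock(molecule)]
    # Step 2: precise scoring (expensive, suited to supercomputers and clusters).
    scored = [(precise_score(molecule), molecule) for molecule in survivors]
    scored.sort(key=lambda pair: pair[0])  # lower score = better fit here
    return scored


# Toy data: (name, size). The size limit and target value are invented.
molecules = [("A", 3.0), ("B", 9.0), ("C", 5.0)]
ranked = two_stage_screen(
    molecules,
    might_dock=lambda m: m[1] <= 6.0,
    precise_score=lambda m: abs(m[1] - 4.5),
)
print([name for _, (name, _) in ranked])
```

The point of the split is that the expensive step only ever sees the molecules the cheap step could not rule out.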

  33. SeGrid • Still a proposal • The idea is to share sensitive data through Grid and use the Grid technology to manage access control and automatic anonymization
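Since SeGrid is described only as a proposal, the following is a speculative sketch of what one automatic step might look like: replacing a direct identifier with a keyed pseudonym and dropping named fields before a record is shared. The field names, the key, and the function are all assumptions. Keyed pseudonymization of this kind is weaker than full anonymization, because the remaining fields may still identify a person.

```python
import hashlib
import hmac


def pseudonymize(record, secret_key, id_field="patient_id", drop_fields=("name",)):
    """Replace the identifier with a keyed pseudonym and drop direct identifiers."""
    identifier = str(record[id_field]).encode("utf-8")
    token = hmac.new(secret_key, identifier, hashlib.sha256).hexdigest()
    cleaned = {
        key: value
        for key, value in record.items()
        if key != id_field and key not in drop_fields
    }
    cleaned["pseudonym"] = token[:16]  # shortened for readability
    return cleaned


# Hypothetical record and key, for illustration only.
key = b"example-secret-key"
record = {"patient_id": 1234, "name": "Jane Doe", "snp_count": 42}
shared = pseudonymize(record, key)
print(sorted(shared))
```

The same identifier always maps to the same pseudonym under one key, so records from one person can still be linked without revealing who that person is.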

  34. More information • www.eScience.dk • Portal for KU's eScience activities • www.migrid.org • Portal for the Minimum intrusion Grid • www.rcuk.ac.uk/escience/ • The very ambitious UK eScience program
