eScience and Grid Tools and techniques for the next generation scientist

eScience and Grid: Tools and techniques for the next generation scientist • Professor Brian Vinter, Head of the Copenhagen eScience Center

eScience «The next 10 to 20 years will see computational science firmly embedded in the fabric of science – the most profound development in the scientific method in over three centuries.» US Department of Energy 2003.

Mega-Science • The next scientific period will be dominated by Mega-Science projects • 10^4 researchers on a single project • Extreme data production • Highly integrated collaboration between different groups of scientists • Examples • CERN LHC • ALMA • Mars project

Data Production • 1 Exabyte = 1000 Petabytes • 1 Petabyte = 1000 Terabytes • 1 Terabyte = 1000 Gigabytes • 1 Gigabyte = 1000 Megabytes • 1997: Total data worldwide approx. 12 exabytes (incl. documents, film, TV, pictures, …) [1] • 1999: 2-3 exabytes of data produced [2] • 2002: Approx. 5 exabytes of data produced [2] • Global data availability doubles every 4-5 years • [1] http://www.lesk.com/mlesk/ksg97/ksg.html • [2] http://www.sims.berkeley.edu/research/projects/how-much-info-2003/
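The doubling claim on this slide can be turned into a yearly growth rate with a few lines of arithmetic. The sketch below is only an illustration: the function names are made up for this example, and the projection assumes the doubling time stays constant, which the slide does not claim.

```python
def annual_growth_rate(doubling_years: float) -> float:
    """Yearly growth rate implied by a fixed doubling time (0.19 means 19 %)."""
    return 2 ** (1 / doubling_years) - 1


def projected_volume(start_exabytes: float, years: float, doubling_years: float) -> float:
    """Volume after `years`, assuming (hypothetically) a constant doubling time."""
    return start_exabytes * 2 ** (years / doubling_years)


if __name__ == "__main__":
    for doubling in (4, 5):
        rate = annual_growth_rate(doubling)
        print(f"doubling every {doubling} years = {rate:.1%} growth per year")
```

A doubling time of 4 to 5 years corresponds to roughly 15 to 19 % growth per year.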

eScience Components • Modeling and simulation • Data acquisition and handling • Visualization • HPC and Grid

Why is it getting more difficult? [Figure: model systems of 54, 442, and 1372 molecules]

System sizes and time scales

Nano-modeling • Extremely CPU- and data-intensive algorithms • Complex structure calculations • Multiple days of execution even on a supercomputer • Runs on both PCs and supercomputers

eScience and Bio/Med • We expect very good results from eScience in biology and medicine • The foremost advantages will come from introducing a mathematical causal understanding of biological systems • Bioinformatics is already doing this • An emerging field: Systems Biology • Systems Medicine is also starting internationally

Calculations in treatment • Computational methods are already important in medical planning • Radiation planning • Bypass flow modeling • Robotic surgery • …

Personalized medicine • Every human is unique • Also at the genetic level • In our genome, which is written with the alphabet ACGT, we have a number of micro mutations called single nucleotide polymorphisms (SNPs) • These SNPs are often without consequence, but • Some make us sick • Some are indicators of a faulty gene • Others influence our response to a drug • This last complication makes it very hard to make drugs for the general population • We want to move from commodity medicine to custom-tailored drugs

An example • Approx. 60% of today's medicines are metabolized by cytochrome P450 enzymes • Some people have highly efficient P450 while others have very slow and inefficient P450 • Knowledge of a patient's P450 level will allow us to dose medicine to the individual much more efficiently • This is already in early use

And this is eScience how? • Developing a drug is not a linear process • The human genome is written with billions of letters • Any person has millions of SNP mutations • Finding the SNP that has an effect is a highly complex computational task
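To give a feel for why the SNP search is computational, here is a minimal sketch of one common building block: scoring each SNP by how differently an allele is distributed between a group with an effect and a group without it. This is an assumption made for illustration, not the method used in the work presented here; the function names and the allele counts are invented, and a real study has to repeat this over millions of SNPs and correct for multiple testing.

```python
def chi_square_2x2(a: int, b: int, c: int, d: int) -> float:
    """Pearson chi-square statistic for the 2x2 table [[a, b], [c, d]]."""
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 0.0  # a row or column is empty: no evidence either way
    return n * (a * d - b * c) ** 2 / denom


def rank_snps(case_counts: dict, control_counts: dict) -> list:
    """Rank SNPs by association strength.

    Both arguments map a SNP id to (minor_allele_count, major_allele_count).
    """
    scores = {
        snp: chi_square_2x2(*case_counts[snp], *control_counts[snp])
        for snp in case_counts
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Invented toy counts, for illustration only.
    cases = {"snp1": (60, 40), "snp2": (50, 50)}
    controls = {"snp1": (30, 70), "snp2": (50, 50)}
    for snp, score in rank_snps(cases, controls):
        print(snp, round(score, 2))
```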

eScience and geology • Geology and hydrology have also been using computational methods for a long time • There are very interesting aspects in combining different methods • e.g. including biological systems in the models • Inverse mapping of seismic data • It turns out that we use the same techniques in medicine • And soon in industry

Grid Minimum intrusion Grid

Minimum intrusion Grid [Diagram: users and resources connected through the Grid]

Processing plants • Like the power grid, the computing Grid has many types of power producers • High yield power plants (fossil fuel, nuclear, …) • Supercomputers and large farms • Low yield producers (windmills, etc.) • Individual PCs and games consoles • Very low yield producers (solar panels, etc.) • Web browsers

One Click

Interactive Applications

VGrids • Best thing since sliced bread • VGrids are Virtual Organizations (VOs) in MiG • They are a dead easy way to create collaborations • Share files • Share resources • Private entry page • Public web page

Portals • VOs can generate their own private entry pages, including application portals

Files in VGrids • A user keeps her personal home directory regardless of which VGrid she works in • But each VGrid has a common directory that only members of the VGrid may access • These are represented as directories in the user's home directory • VGrid owners can create sub-VGrids

Examples eScience on Grid

GeneRecon • GeneRecon seeks to identify genetic factors behind hereditary diseases • The overall idea is to compare two sets of genomes • One where the disease is observed • One where the disease is not observed • Approx. 1000 individuals in each set • GeneRecon is developed at the Bioinformatics Research Center, Århus University

GeneRecon • The algorithm is a Markov-chain Monte Carlo method • A test run consists of approx. 30,000 individual tests • One test runs from 1 to 10 days on a PC • In total no less than 82 CPU years • MiG hosted the execution on Grid and got the execution down below a month
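For readers who have not met Markov-chain Monte Carlo before, the sketch below shows the basic idea with a random-walk Metropolis sampler on a toy target. It is not GeneRecon's model or code: the target density (a standard normal), the step size, and all names are assumptions chosen to keep the example short.

```python
import math
import random


def metropolis(log_density, start, n_samples, step=1.0, seed=0):
    """Random-walk Metropolis sampler for a one-dimensional target.

    `log_density` returns the log of the target density up to a constant.
    """
    rng = random.Random(seed)
    x = start
    log_p = log_density(x)
    samples = []
    for _ in range(n_samples):
        proposal = x + rng.gauss(0.0, step)
        log_p_new = log_density(proposal)
        # Accept with probability min(1, p(proposal) / p(x)).
        if rng.random() < math.exp(min(0.0, log_p_new - log_p)):
            x, log_p = proposal, log_p_new
        samples.append(x)  # a rejected proposal repeats the current state
    return samples


def standard_normal_log_density(x):
    """Toy target for the illustration: log density of N(0, 1), up to a constant."""
    return -0.5 * x * x


if __name__ == "__main__":
    draws = metropolis(standard_normal_log_density, start=0.0, n_samples=20000)
    mean = sum(draws) / len(draws)
    print(f"sample mean = {mean:.3f}")
```

Each chain is sequential, but independent tests can run as separate jobs, which is what makes this kind of workload a good fit for a Grid.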

Statistics • 1315 jobs were submitted to Grid at the same time • 0 jobs were lost • First result: 2:04:44 • Last result: 28 days, 5:42:54
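The GeneRecon figures can be combined into a rough effective speedup. The calculation below is a back-of-the-envelope sketch: it assumes the 82 CPU years is the work actually performed and uses a 365-day year, so the result is an estimate rather than a measured number.

```python
from datetime import timedelta

# Figures taken from the slides.
cpu_work = timedelta(days=82 * 365)  # "no less than 82 CPU years"
wall_clock = timedelta(days=28, hours=5, minutes=42, seconds=54)  # last result

# Dividing two timedeltas gives a plain float: how many CPUs would have to be
# busy on average to finish that much work in that wall-clock time.
effective_speedup = cpu_work / wall_clock

if __name__ == "__main__":
    print(f"effective speedup = {effective_speedup:.0f}x")
```

Under these assumptions the Grid run corresponds to roughly a thousand CPUs working continuously.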

Groundwater modeling on Funen • Calibration of the Assens model • 1 model evaluation = 30 min • 920 model evaluations = 19 days

AUTOCAL on OfficeGRID • [Diagram: one Master connected to four Clients] • Days to hours
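The master/client pattern behind the groundwater calibration can be sketched as follows. This is an illustration under stated assumptions, not AUTOCAL's algorithm or OfficeGRID's API: `evaluate_model` is a hypothetical stand-in for one 30-minute model run, the candidate parameter grid is invented, and local threads stand in for client machines.

```python
from concurrent.futures import ThreadPoolExecutor


def evaluate_model(params):
    """Hypothetical stand-in for one model evaluation: returns a misfit value."""
    a, b = params
    return (a - 3.0) ** 2 + (b + 1.0) ** 2


def calibrate(candidates, n_clients=4):
    """Master: hand candidate parameter sets to clients and keep the best one."""
    # Threads only illustrate the pattern; on a Grid each client is a machine.
    with ThreadPoolExecutor(max_workers=n_clients) as pool:
        misfits = list(pool.map(evaluate_model, candidates))
    best = min(range(len(candidates)), key=misfits.__getitem__)
    return candidates[best], misfits[best]


def wall_clock_days(n_evaluations, minutes_each, n_clients):
    """Ideal wall-clock time when evaluations run in rounds of n_clients."""
    rounds = -(-n_evaluations // n_clients)  # ceiling division
    return rounds * minutes_each / (60 * 24)


if __name__ == "__main__":
    grid = [(a, b) for a in range(6) for b in range(-3, 3)]
    print(calibrate(grid))
    print(f"1 client:  {wall_clock_days(920, 30, 1):.1f} days")
    print(f"4 clients: {wall_clock_days(920, 30, 4):.1f} days")
```

With the slide's numbers, 920 evaluations of 30 minutes take about 19 days on one machine; the ideal time shrinks in proportion to the number of clients, provided the evaluations are independent of each other.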

Drug Design • Molecular docking is a time-consuming calculation process, which this project does in two steps • The first step is a coarse calculation that can eliminate molecules that won't dock • This process can run on PCs and PS3s; a lot of work is being done towards efficient utilization of the CELL CPU for molecular docking • The molecules that survive the first step are then modeled more precisely at quantum level on classic supercomputers and clusters
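The two-step screening described on this slide has a simple shape: a cheap filter first, then an expensive calculation only for the survivors. The sketch below shows that shape only. The scoring functions are hypothetical placeholders supplied by the caller, the scores are invented, and the convention that a lower score means a better fit is an assumption; no real docking or quantum chemistry is performed.

```python
def two_stage_screen(molecules, coarse_score, precise_score, coarse_cutoff):
    """Cheap filter first, expensive scoring only on what survives.

    Lower scores are assumed to be better (an assumption of this sketch).
    """
    # Step 1: coarse calculation, cheap enough for PCs and consoles.
    survivors = [m for m in molecules if coarse_score(m) <= coarse_cutoff]
    # Step 2: precise calculation, run only for the surviving molecules.
    ranked = sorted(survivors, key=precise_score)
    return survivors, ranked


if __name__ == "__main__":
    # Invented toy scores, for illustration only.
    coarse = {"mol_a": -2.0, "mol_b": 5.0, "mol_c": -1.0, "mol_d": 0.5}
    precise = {"mol_a": -7.5, "mol_c": -9.1}  # only survivors are ever scored
    survivors, ranked = two_stage_screen(
        list(coarse), coarse.get, precise.get, coarse_cutoff=0.0
    )
    print(survivors)
    print(ranked)
```

The design point is that the expensive step never sees the molecules the coarse step has already ruled out, so the supercomputer time is spent only where it can matter.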

SeGrid • Still a proposal • The idea is to share sensitive data through Grid and use the Grid technology to manage access control and automatic anonymization

More information • www.eScience.dk • Portal for KU's eScience activities • www.migrid.org • Portal for the Minimum intrusion Grid • www.rcuk.ac.uk/escience/ • The very ambitious UK eScience program
