
The Grid and enabling applications for it

CCPN/TEMBLOR Workshop, Hinxton, 19th May 2004. Mark Hayes, Technical Director, Cambridge eScience Centre.


Presentation Transcript


  1. The Grid and enabling applications for it CCPN/TEMBLOR Workshop, Hinxton, 19th May 2004 Mark Hayes, Technical Director, Cambridge eScience Centre

  2. In the beginning… “The collection of people, hardware, and software... will become a node in a geographically distributed computer network... Through the network... all the large computers can communicate with one another. And through them, all the members of the community can communicate with other people, with programs, with data, or with a selected combination of those resources.” J.C.R. Licklider, “The Computer as a Communication Device”, Science and Technology, April 1968. [Image: the ARPAnet in 1970]

  3. International connectivity - 1991

  4. International connectivity - 1997

  5. International bandwidth From “3D geographic network displays” - Cox et al, ACM Sigmod Record - December 1996

  6. What does the Internet look like? http://www.cybergeography.org/

  7. The World Wide Web Invented at CERN by Tim Berners-Lee in 1989 as a tool for collaboration and information sharing in the particle physics community.

  8. The Grid - 1998 Editors: Foster & Kesselman. 700 pages, 22 chapters, 40 authors. “A computational Grid is a hardware and software infrastructure that provides dependable, consistent, pervasive and inexpensive access to high-end computational capabilities.” Analogy with the electrical power grid - just plug in.

  9. The Grid - 2003 Editors: Berman, Hey, Fox. 1000 pages, 43 chapters, 116 authors. Applications, data sharing and virtual communities.

  10. 4 types of Grid • CPU intensive cycle scavenging (SETI@home) • Data sharing • Application provision • Human-human interaction (e.g. Access Grid)

  11. Early distributed computing • 1.2 million CPU years so far... • Brute force attempt to crack strong encryption • Protein folding

  12. It’s not just compute cycles... An exponential growth in data from many areas of science.

  13. The data explosion - some big numbers • CFD turbulence simulations - 100TB • BaBar particle physics experiment - 1TB/day • CERN LHC will generate 1GB/s or 10PB/year • VLBA radio telescope generates 1GB/s today • NCBI/EMBL database is “only 0.5TB” but doubling each year • brain imaging - 4TB/brain at full colour, 10µm resolution (4PB/brain at 1µm, i.e. cellular resolution) • Pixar - 100TB/movie. “FTP and GREP are not adequate” (Jim Gray)

  14. Application provision • Google - 10K CPUs, 2PB database (2 years ago) • free email services - HotMail, Yahoo! 2-10PB storage • netsolve - numerical algorithms on demand, with Matlab & Mathematica plugins • renderfarm.net - graphics rendering on demand

  15. The Access Grid High end video conferencing and collaboration technology. O(100) nodes world wide. “...one of the most compelling glimpses into the future I’ve seen since I first saw NCSA Mosaic.” Larry Smarr [Diagram labels: presenter mic, presenter camera, ambient mic (tabletop), audience camera]

  16. The Grid in the UK Pilot projects in particle physics, astronomy, medicine, bioinformatics, environmental sciences... Contributing to international Grid software development efforts 10 regional “eScience Centres”

  17. Some UK Grid resources • Daresbury - loki - 64 proc Alpha cluster • Manchester - green - 512 proc SGI Origin 3800 • Imperial - saturn - large SMP Sun • Southampton - iridis - 400 proc Intel Linux cluster • Rutherford Appleton Lab - hrothgar - 32 proc Intel Linux • Cambridge - herschel - 32 proc Intel Linux cluster • ... • coming soon: 4x >64 CPU JISC clusters, HPC(X)

  18. Applications on the UK Grid Ion diffusion through radiation damaged crystal structures (Mark Calleja, Earth Sciences, Cambridge) • Monte Carlo simulation: lots of independent runs • small input & output • more CPU -> higher temperatures, better stats • access to ~100 CPUs on the UK Grid • Condor-G client tool for farming out jobs
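
The "lots of independent runs, small input & output" pattern on this slide can be sketched in a few lines. This is a toy illustration only: the 1-D random walk stands in for the real ion diffusion code (which the slides do not show), and the function names `run_walk` and `farm_out` are hypothetical. Each run needs nothing but a seed as input and returns a single number, which is the property that makes such jobs easy to farm out.

```python
import random
import statistics


def run_walk(seed: int, n_steps: int = 100) -> int:
    """One independent run: a 1-D random walk, returning its squared displacement."""
    rng = random.Random(seed)  # private generator, so runs never share state
    position = 0
    for _ in range(n_steps):
        position += rng.choice((-1, 1))
    return position * position


def farm_out(n_runs: int, n_steps: int = 100) -> tuple[float, float]:
    """Run n_runs independent walks and return (mean, standard error of the mean)."""
    # Each seed is a complete job description: tiny input, tiny output.
    # Here the jobs run serially; a scheduler could run them anywhere, in any order.
    results = [run_walk(seed, n_steps) for seed in range(n_runs)]
    mean = statistics.fmean(results)
    stderr = statistics.stdev(results) / n_runs ** 0.5
    return mean, stderr


if __name__ == "__main__":
    for n_runs in (10, 100, 1000):
        mean, stderr = farm_out(n_runs)
        print(f"{n_runs:5d} runs: mean squared displacement {mean:7.1f} +/- {stderr:.1f}")
```

Because the runs share no state, adding CPUs simply adds samples, and the standard error of the averaged result falls roughly as one over the square root of the number of runs.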

  19. Applications on the UK Grid Reality Grid (Stephen Pickles, Robin Pinning - Manchester) • Fluid dynamics of complex mixtures, e.g. oil, water and solid particles (mud) • Used CPU at London, Cambridge • Remote visualisation using SGI Onyx in Manchester (from a laptop in Sheffield) • Computational steering

  20. Applications on the UK Grid GENIE - Grid Enabled Integrated Earth system model (Steven Newhouse, Murtaza Gulamali - Imperial) • Ocean-atmosphere modelling • How does moisture transport from the atmosphere affect ocean circulation? • ~1000 independent 4000-year runs (3 days real time!) on ~200 CPUs • Flocked Condor pools at London & Southampton • Coupled modelling

  21. £1 buys... • 1 day of cpu time • 4 GB ram for a day • 1 GB of network bandwidth • 1 GB of disk storage • 10 M database accesses • 10 TB of disk access (sequential) • 10 TB of LAN bandwidth (bulk)

  22. How do you move a terabyte?

      Context     Speed (Mbps)   Rent ($/month)   $/Mbps   $/TB sent   Time/TB
      Home phone          0.04               40    1,000       3,086   6 years
      Home DSL             0.6               70      117         360   5 months
      T1                   1.5            1,200      800       2,469   2 months
      T3                    43           28,000      651       2,010   2 days
      OC3                  155           49,000      316         976   14 hours
      OC 192              9600        1,920,000      200         617   14 minutes
      100 Mbps             100                -        -           -   1 day
      Gbps                1000                -        -           -   2.2 hours

      Source: Terascale SneakerNet, Jim Gray et al
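
The derived columns of this table follow from the link speed and the monthly rent. The sketch below recomputes them under two assumptions that are not stated on the slide but reproduce its figures: 1 TB is taken as 10^12 bytes, and a billing month as 30 days. The function names are illustrative.

```python
BITS_PER_TB = 8 * 10**12            # assumption: 1 TB = 10^12 bytes
SECONDS_PER_MONTH = 30 * 24 * 3600  # assumption: 30-day billing month


def seconds_per_tb(speed_mbps: float) -> float:
    """Time to push one terabyte through a link running flat out."""
    return BITS_PER_TB / (speed_mbps * 10**6)


def dollars_per_tb(speed_mbps: float, rent_per_month: float) -> float:
    """Rent paid for the fraction of a month the transfer occupies the link."""
    return seconds_per_tb(speed_mbps) / SECONDS_PER_MONTH * rent_per_month


# (speed in Mbps, rent in $/month) taken from the table above
LINKS = {
    "Home phone": (0.04, 40),
    "Home DSL": (0.6, 70),
    "T1": (1.5, 1_200),
    "T3": (43, 28_000),
    "OC3": (155, 49_000),
    "OC 192": (9600, 1_920_000),
}

if __name__ == "__main__":
    for name, (speed, rent) in LINKS.items():
        days = seconds_per_tb(speed) / 86400
        print(f"{name:10s} {days:10.2f} days/TB  ${dollars_per_tb(speed, rent):8.0f}/TB")
```

Even on the fastest link listed, the rent attributable to one terabyte is several hundred dollars, which is the basis for the comparison with shipping disks.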

  23. Some consequences Compute cycles are (almost) free... by comparison with network costs. The cheapest and fastest way to move 1TB of data out from CERN is still by FedEx. This considers only bandwidth; low latency networks are even more expensive! (MPI over WAN doesn’t work well.)

  24. What makes a good Grid application? A distributed community of users. Tiny network input & output, huge compute requirement. Database access & storage is also expensive, therefore put the computation near the data.
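
The "tiny network input & output, huge compute" criterion can be made concrete with the unit prices from the "£1 buys..." slide (1 day of CPU time or 1 GB of network bandwidth per pound). The sketch below treats those as linear rates, which is a simplifying assumption, and the two example jobs are hypothetical.

```python
# Unit prices read off the "£1 buys..." slide, treated as linear rates (an assumption).
POUNDS_PER_CPU_DAY = 1.0
POUNDS_PER_WAN_GB = 1.0


def job_cost(cpu_days: float, wan_gb: float) -> float:
    """Total cost in pounds of a job that computes and moves data over the WAN."""
    return cpu_days * POUNDS_PER_CPU_DAY + wan_gb * POUNDS_PER_WAN_GB


def network_share(cpu_days: float, wan_gb: float) -> float:
    """Fraction of the job's cost spent on moving data rather than computing."""
    total = job_cost(cpu_days, wan_gb)
    return (wan_gb * POUNDS_PER_WAN_GB) / total if total else 0.0


if __name__ == "__main__":
    # Hypothetical jobs: a CPU-bound parameter sweep versus a bulk data transfer.
    for label, cpu_days, wan_gb in [("parameter sweep", 100, 1), ("bulk transfer", 1, 1000)]:
        share = network_share(cpu_days, wan_gb)
        print(f"{label:16s} costs £{job_cost(cpu_days, wan_gb):7.0f}, {share:6.1%} on network")
```

A job whose network share is close to zero is a good candidate for running on remote CPUs; one whose share is close to one is better served by moving the computation to the data.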

  25. Web services • A web service is a network-accessible application • identified by a URI, e.g. http://terraservice.net/TerraService.asmx?op=GetTile • with an interface defined in terms of XML based messages • these messages transported by internet protocols (usually HTTP) • The application & its interface definition should be ‘discoverable’ by other applications and independent of OS platform & programming language. • W3C standards body: http://www.w3c.org/

  26. Acronym soup • XML - eXtensible Markup Language • XSLT - eXtensible Stylesheet Language Transformations • SOAP - Simple Object Access Protocol • WSDL - Web Services Description Language • UDDI - Universal Description, Discovery & Integration protocol • BPEL - Business Process Execution Language • WSIF - Web Services Invocation Framework • ...
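
To show what "an interface defined in terms of XML based messages" looks like in practice, here is a minimal sketch that builds and reads a SOAP 1.1 request using only the Python standard library. Nothing is sent over the network. The operation name `GetTile` is borrowed from the URI on the previous slide, but the service namespace `http://example.org/tiles` and the parameter names are hypothetical and do not describe the real TerraService interface.

```python
import xml.etree.ElementTree as ET

SOAP_ENV = "http://schemas.xmlsoap.org/soap/envelope/"  # SOAP 1.1 envelope namespace


def build_request(operation: str, service_ns: str, params: dict[str, str]) -> bytes:
    """Wrap an operation call and its parameters in a SOAP envelope."""
    envelope = ET.Element(f"{{{SOAP_ENV}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_ENV}}}Body")
    call = ET.SubElement(body, f"{{{service_ns}}}{operation}")
    for name, value in params.items():
        ET.SubElement(call, f"{{{service_ns}}}{name}").text = str(value)
    return ET.tostring(envelope, encoding="utf-8", xml_declaration=True)


def read_request(message: bytes) -> tuple[str, dict[str, str]]:
    """Recover the operation name and parameters, as a receiving service would."""
    envelope = ET.fromstring(message)
    call = envelope.find(f"{{{SOAP_ENV}}}Body")[0]

    def local(tag: str) -> str:
        return tag.split("}", 1)[-1]  # drop the "{namespace}" prefix

    return local(call.tag), {local(child.tag): child.text for child in call}


if __name__ == "__main__":
    # Hypothetical namespace and parameters, for illustration only.
    message = build_request("GetTile", "http://example.org/tiles", {"x": "1", "y": "2"})
    print(message.decode("utf-8"))
    print(read_request(message))
```

Because the message is plain XML carried over HTTP, the sender and receiver can be written in different languages on different platforms; in a real deployment the message layout would come from the service's WSDL description rather than being written by hand.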

  27. terraservice.net Web service interface to http://terraserver.microsoft.com/ Example app: US Department of Agriculture have a database of soil properties, ‘federated’ with terraservice.net to provide geographical & topographic detail.

  28. Databases available as Web Services • Google • Amazon • SDSS SkyServer • EMBL • EBI-MSD • EBI Open Bibliographic Query Service • ... • http://www.escience.cam.ac.uk/services/dblist.html

  29. Radar scattering from aircraft Aim: increase the efficiency of the aircraft design engineering process & the scale of radar scattering simulations to otherwise intractable objects (i.e. whole aircraft) A collaboration between the University of Cambridge Department of Applied Mathematics & Theoretical Physics (DAMTP) and the BAE Advanced Technology Centre at Filton. Mark Spivack (PI), Andrew Usher (visualisation programmer), Xiaobo Yang (scientific programmer), CC-HPCF, new cluster, BAE input: expertise, data,...

  30. Workflow [Diagram labels: BAE, Cambridge, Portal, HPCF, reflection data, visualisation, CAD design]

  31. Visualisation tools Based on the Visualisation Toolkit - open source C++ library, cross platform, extendable, large user base - http://www.vtk.org Surface currents, virtual fly through, looking for “hot-spots”

  32. Increasing efficiency The calculation can be split into a two-stage process: • Initial long-running, high fidelity calculation of induced surface currents on the HPCF. • 3D electromagnetic fields can be calculated on a cluster. • Using an approximation technique currently under development, subsequent small changes can be re-calculated on the cluster. • In theory, this would allow interactive design of the aircraft without the need for scheduling long-running jobs.

  33. Tying it all together… with Web Services

  34. Questions?
