
Using Grid Computing

Using Grid Computing. David Groep, NIKHEF, 2002-07-15. The Grid, But Why? Physics @ CERN: LHC particle accelerator, operational in 2007, 5-10 Petabyte per year, 150 countries, > 10000 users, lifetime ~ 20 years. 40 MHz (40 TB/sec): level 1 - special hardware.


Presentation Transcript


  1. Using Grid Computing • David Groep, NIKHEF • 2002-07-15

  2. The Grid, But Why? • Physics @ CERN • LHC particle accelerator • operational in 2007 • 5-10 Petabyte per year • 150 countries • > 10000 users • lifetime ~ 20 years
     Event rates through the trigger chain:
     • 40 MHz (40 TB/sec): level 1 - special hardware
     • 75 kHz (75 GB/sec): level 2 - embedded
     • 5 kHz (5 GB/sec): level 3 - PCs
     • 100 Hz (100 MB/sec): data recording & offline analysis
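The reduction between stages follows directly from the rates on the slide. A small sketch of that arithmetic, assuming decimal prefixes (1 MB = 10^6 bytes); the short stage labels and function names are chosen here for illustration:

```python
# Data rates quoted on the slide, in bytes per second.
# Assumption: decimal prefixes (1 MB = 10**6 bytes, 1 TB = 10**12 bytes).
RATES = [
    ("level 1", 40e12),    # 40 TB/sec
    ("level 2", 75e9),     # 75 GB/sec
    ("level 3", 5e9),      # 5 GB/sec
    ("recording", 100e6),  # 100 MB/sec
]


def reduction_factors(rates):
    """Return (from_stage, to_stage, factor) for each consecutive pair of stages."""
    return [
        (a_name, b_name, a_rate / b_rate)
        for (a_name, a_rate), (b_name, b_rate) in zip(rates, rates[1:])
    ]


def overall_reduction(rates):
    """Ratio between the first and the last data rate in the chain."""
    return rates[0][1] / rates[-1][1]


if __name__ == "__main__":
    for src, dst, factor in reduction_factors(RATES):
        print(f"{src} -> {dst}: factor {factor:.0f}")
    print(f"overall: factor {overall_reduction(RATES):.0f}")
```

The three stages reduce the data rate by factors of roughly 533, 15, and 50, which together is a factor of 400,000 between the detector and what is recorded.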

  3. Estimated CPU capacity required at CERN (CPU & Data Requirements) [Chart: estimated CPU capacity at CERN in K SI95, 0 to 5,000, by year 1998-2010; series: LHC experiments, other experiments] • Moore’s law: some measure of the capacity technology advances provide for a constant number of processors or investment • Jan 2000: 3.5K SI95 • < 50% of the main analysis capacity will be at CERN

  4. More Reasons Why: ENVISAT • 3500 MEuro programme cost • 10 instruments on board • 200 Mbps data rate to ground • 400 Tbytes data archived/year • ~100 ‘standard’ products • 10+ dedicated facilities in Europe • ~700 approved science user projects

  5. And More … Bio-informatics • For access to data: • Large network bandwidth to access computing centers • Support for data bank replicas (easier and faster mirroring) • Distributed data banks • For interpretation of data: • Grid-enabled algorithms: BLAST on distributed data banks, distributed data mining

  6. Common Ground • Large amounts of data • Distributed, ad-hoc user community • Problems are distributable • Need for resources grows faster than market • Network grows faster than the application needs • Willingness to share resources … • … if security and integrity are guaranteed

  7. The One-Liner • Resource sharing and coordinated problem solving in dynamic multi-institutional virtual organisations

  8. What is Grid computing? • Dependable, consistent and pervasive access • Combining resources from various organizations • ‘Virtual Organizations’: user-based view on Grid • Technical challenges: • transparent decisions for the user • uniformity in access methods • secure & crack resistant • authentication, authorization, accounting (AAA) & quota

  9. Grid Middleware • Globus Project started 1997 • de facto standard • Reference implementation of Grid Forum standards • Large community effort • Basis of several projects, including EU-DataGrid • Toolkit ‘bag-of-services’ approach • Successful test beds, with single sign-on, etc…

  10. In The Beginning • Distributed Computing: synchronous processing • High-Throughput Computing: asynchronous processing • On-Demand Computing: dynamic resources • Data-Intensive Computing: databases • Collaborative Computing: science • Source: Ian Foster and Carl Kesselman, editors, “The Grid: Blueprint for a New Computing Infrastructure,” Morgan Kaufmann, 1999

  11. Grid Architecture • Make all resources talk standard protocols • Promote interoperability of application toolkits, similar to interoperability of networks by Internet standards
      Layers, top to bottom:
      • Applications
      • Application Toolkits: Condor-G, DUROC, MPICH-G2, VLAM-G
      • Grid Services: MDS, ReplicaSrv, GridFTP, GRAM
      • Grid Security Infrastructure (GSI)
      • Grid Fabric: Condor, MPI, PBS, SUN, Internet, Linux

  12. OGSA: new directions • Looks superficially like ‘web services’ • Based on common standards: WSDL, SOAP, UDDI • Adds: • Transient services • State of distributed activities • Workflow, videoconf, distributed data analysis • Management of service instances • Grid Security Infrastructure

  13. Looking for Resources • Resource Brokerage based on matchmaking (Condor) • Information Services Mesh • Meta-computing directory • Replica Catalogues • DataGrid: http://marianne.in2p3.fr/
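Matchmaking pairs what a job requires with what resources advertise about themselves, then ranks the candidates. A minimal sketch of the idea follows. It is not Condor's ClassAd language or API, and every attribute name (`os`, `free_cpus`, `memory_mb`), host name, and class here is hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional

# A resource advertisement: a plain dictionary of attributes (hypothetical names).
Ad = Dict[str, object]


@dataclass
class Job:
    name: str
    requirements: Callable[[Ad], bool]  # must hold for a resource to qualify
    rank: Callable[[Ad], float]         # higher is better among qualifying resources


def match(job: Job, resources: List[Ad]) -> Optional[Ad]:
    """Return the best-ranked resource that satisfies the job, or None."""
    candidates = [r for r in resources if job.requirements(r)]
    if not candidates:
        return None
    return max(candidates, key=job.rank)


# Illustrative advertisements; the hosts do not exist.
RESOURCES: List[Ad] = [
    {"host": "node-a.example.org", "os": "linux", "free_cpus": 4, "memory_mb": 512},
    {"host": "node-b.example.org", "os": "solaris", "free_cpus": 16, "memory_mb": 2048},
    {"host": "node-c.example.org", "os": "linux", "free_cpus": 8, "memory_mb": 1024},
]

job = Job(
    name="mc-simulation",
    requirements=lambda r: r["os"] == "linux" and r["memory_mb"] >= 512,
    rank=lambda r: r["free_cpus"],
)

if __name__ == "__main__":
    best = match(job, RESOURCES)
    print(best["host"] if best else "no match")
```

Keeping requirements and rank as separate functions mirrors the two questions a broker asks: which resources are acceptable at all, and which acceptable one is preferred.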

  14. Submitting a Job

  15. Locating a Replica • Grid Data Mirror Package • Moves data across sites • Replicates both files and individual objects • Catalogue used by Broker • Replica Location Service (Giggle) • Read-only copies “owned” by the Replica Manager • http://cmsdoc.cern.ch/cms/grid
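A replica catalogue maps one logical file name to the physical copies held at different sites, so a broker can pick a copy close to the job. A minimal sketch of that lookup, assuming a plain in-memory mapping; the class, file names, and site names are hypothetical and do not reflect the GDMP or Giggle interfaces.

```python
from collections import defaultdict
from typing import Dict, List, Optional


class ReplicaCatalogue:
    """Maps a logical file name (LFN) to physical file names (PFNs) per site."""

    def __init__(self) -> None:
        self._replicas: Dict[str, Dict[str, str]] = defaultdict(dict)

    def register(self, lfn: str, site: str, pfn: str) -> None:
        """Record that `site` holds a copy of `lfn` at location `pfn`."""
        self._replicas[lfn][site] = pfn

    def sites(self, lfn: str) -> List[str]:
        """All sites holding a copy of `lfn`, in alphabetical order."""
        return sorted(self._replicas.get(lfn, {}))

    def locate(self, lfn: str, preferred_sites: List[str]) -> Optional[str]:
        """Return the copy at the first preferred site that has one.

        Falls back to the alphabetically first site, or None if no copy exists.
        """
        copies = self._replicas.get(lfn, {})
        for site in preferred_sites:
            if site in copies:
                return copies[site]
        if copies:
            return copies[sorted(copies)[0]]
        return None


if __name__ == "__main__":
    catalogue = ReplicaCatalogue()
    # Hypothetical logical name and storage hosts.
    catalogue.register("lfn:run42.root", "site-a", "gsiftp://se.site-a.example.org/data/run42.root")
    catalogue.register("lfn:run42.root", "site-b", "gsiftp://se.site-b.example.org/data/run42.root")
    print(catalogue.locate("lfn:run42.root", ["site-b", "site-a"]))
```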

  16. Sending Your Data • Tape robots, disks, etc. share GridFTP interface • Supports single sign-on and confidentiality • Optimized for high-speed >1Gbit/s networks • In the future: automatic optimizations, bandwidth reservations, directory-enabled networking, …

  17. Grid-enabled Databases? • SpitFire: uniform access to persistent storage on the Grid • Multiple roles support • Compatible with GSI (single sign-on) through CoG • Uses standard stuff: JDBC, SOAP, XML • Supports various back-end databases • http://hep-proj-spitfire.web.cern.ch/hep-proj-spitfire/

  18. DataGrid Test Bed 1 • DataGrid TB1: • 14 countries • 21 major sites • Growing rapidly • Submitting Jobs: • Login only once, run everywhere • Cross administrative boundaries in a secure and trusted way • Mutual authorization

  19. DutchGrid Platform [Map of sites: Amsterdam, Leiden, Enschede, KNMI, Utrecht, Delft, Nijmegen] • DutchGrid: • Test bed coordination • PKI security • Participation by • NIKHEF: FOM, VU, UvA, Utrecht, Nijmegen • KNMI, SARA • AMOLF • DAS-2 (ASCI): TU Delft, Leiden, VU, UvA, Utrecht • Telematics Institute

  20. And now for some Technical Details For Users

  21. Start using the grid • All the necessary “client tools” are on all Linux and Solaris systems • You just need: • Credentials/tokens for the Grid (see next slides) • Authorization to use resources (you get all NIKHEF resources by default) • Information on which resources to use effectively

  22. Your Grid Credentials • You will use resources across several domains • You may not care about security and authorization • But the remote site admin will! • All communications are authenticated using X.509 “Public Key” Certificates • The technology used to secure credit card transactions on the web (https://……) • Uniquely binds name/affiliation to a digital token

  23. Certification Authorities • CAs act as trusted third parties • Remote sites trust the CA for a proper binding • They will not do authentication again, so only authorization is left • CAs are highly valuable: crack one to impersonate others on the Grid (and abuse resources) • Registration Authorities do in-person ID checks

  24. CAs in DataGrid • 10 National CAs (one per EU country) • Each one has a detailed policy and practice statement • NIKHEF operates the CA for DutchGrid. See http://www.dutchgrid.nl/ca • Get a “certificate” from the DutchGrid CA before you can start using the Grid • It’s valuable, protect it with a pass phrase • One cert valid for all DataGrid sites

  25. The Proxy • A ‘proxy certificate’ is a limited-lifetime delegation without a pass phrase to protect it • Implements the single sign-on for Grid • Valid for 12 hours (by default) • Use it to: • Run your jobs • Get access to your data • Get it by running grid-proxy-init
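Because a proxy expires, it is worth checking that it will outlive a job before submitting it. A sketch of that check, assuming the 12-hour default lifetime from the slide; the function names and the 30-minute safety margin are illustrative, and the code does not read a real proxy file.

```python
from datetime import datetime, timedelta

# Default proxy lifetime as stated on the slide.
DEFAULT_PROXY_LIFETIME = timedelta(hours=12)


def proxy_expiry(created: datetime,
                 lifetime: timedelta = DEFAULT_PROXY_LIFETIME) -> datetime:
    """Moment at which a proxy created at `created` stops being valid."""
    return created + lifetime


def proxy_covers_job(created: datetime,
                     now: datetime,
                     expected_runtime: timedelta,
                     lifetime: timedelta = DEFAULT_PROXY_LIFETIME,
                     margin: timedelta = timedelta(minutes=30)) -> bool:
    """True if a job started `now` should finish, with margin, before expiry.

    The margin is an arbitrary illustrative allowance for queueing time.
    """
    return now + expected_runtime + margin <= proxy_expiry(created, lifetime)


if __name__ == "__main__":
    created = datetime(2002, 7, 15, 9, 0)
    now = datetime(2002, 7, 15, 10, 0)
    print(proxy_covers_job(created, now, timedelta(hours=10)))
    print(proxy_covers_job(created, now, timedelta(hours=11)))
```

If the check fails, the options are to create a fresh proxy or to request a longer lifetime when creating it.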

  26. Now see for yourself

  27. Getting a Certificate • Initialize your environment for the Grid • Use the Globus local guide from http://www.dutchgrid.nl/Support/ • Send the result to ca@nikhef.nl; you will be contacted by phone • Put the certificate (sent by mail) in your $HOME/.globus/usercert.pem • Or use the Web at http://certificate.nikhef.nl/userhelp.html

  28. Using the Grid • Request authorization: grid.support@nikhef.nl • Look what is out there using grid-info-search or http://marianne.in2p3.fr/datagrid/giis/giis-browse.html • Try some local hosts: bilbo, kilogram, triangel
      kilogram:davidg:1009$ globus-job-run dommel.wins.uva.nl /usr/ucb/quota -v
      Disk quotas for random (uid 12xxx):
      Filesystem      usage    quota    limit  timeleft  files  quota  limit  timeleft
      /home/random    13067  1500000  2000000                0      0      0
      kilogram:davidg:1010$
      • Start running your analysis/MC/other jobs
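The same kind of remote command can be scripted. A sketch that builds a `globus-job-run` command line in the form used in the session on this slide (host, executable, arguments); the Python helper names are hypothetical, and actually running a job assumes the Globus client tools are installed and a valid proxy exists.

```python
import subprocess
from typing import List


def build_job_command(host: str, executable: str, *args: str) -> List[str]:
    """Build the argument list: globus-job-run <host> <executable> [args...]."""
    if not host or not executable:
        raise ValueError("host and executable are required")
    return ["globus-job-run", host, executable, *args]


def run_job(host: str, executable: str, *args: str) -> str:
    """Run the command and return its standard output.

    Assumption: globus-job-run is on PATH and a proxy has been created;
    a non-zero exit status raises subprocess.CalledProcessError.
    """
    cmd = build_job_command(host, executable, *args)
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return result.stdout


if __name__ == "__main__":
    # Only prints the command line; it does not contact any host.
    print(" ".join(build_job_command("dommel.wins.uva.nl", "/usr/ucb/quota", "-v")))
```

Passing the command as a list rather than a single string avoids shell quoting problems when arguments contain spaces.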

  29. GridFTP • Universal high-performance file transfer • Extends the FTP protocol with: • Single sign-on (GSI, GSSAPI, RFC2228) • Parallel streams for speed-up • Striped access (ftp from multiple sites to be faster) • Clients: gsincftp, globus-url-copy.
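Parallel streams speed up a transfer by moving different parts of a file at the same time. A sketch of how a file could be divided into byte ranges, one per stream; it illustrates the partitioning idea only and is not the GridFTP protocol or the behaviour of any particular client.

```python
from typing import List, Tuple


def split_ranges(size: int, streams: int) -> List[Tuple[int, int]]:
    """Divide `size` bytes into at most `streams` contiguous (offset, length) ranges.

    Ranges differ in length by at most one byte; empty ranges are omitted,
    so a very small file may use fewer streams than requested.
    """
    if size < 0 or streams < 1:
        raise ValueError("size must be >= 0 and streams must be >= 1")
    base, extra = divmod(size, streams)
    ranges: List[Tuple[int, int]] = []
    offset = 0
    for i in range(streams):
        # The first `extra` streams carry one additional byte each.
        length = base + (1 if i < extra else 0)
        if length == 0:
            break
        ranges.append((offset, length))
        offset += length
    return ranges


if __name__ == "__main__":
    for offset, length in split_ranges(10, 3):
        print(f"stream reads bytes {offset}..{offset + length - 1}")
```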

  30. What’s Next? • Some of the nice user-features to come: • Finding data files by characteristics (give me all golden decays) • Moving your job to where the data is • Automatic partitioning of jobs • Support for truly interactive work • Better network utilisation (faster access to data) • ……… • If you are in the DataGrid project, ask your WP leader for authorization in TB1
