

  1. CERN Helpdesk Grid Educational Seminar, 16th June 2005. Global Grid User Support and the CERN Helpdesk. Flavia Donno, LCG/GGUS User Support, CERN & INFN – Pisa. www.eu-egee.org. EGEE is a project funded by the European Union under contract IST-2003-508833.

  2. Why this seminar? - Outline
  • The Grid: what is it?
  • Why the Grid at CERN?
  • A bit of history: Globus
  • How does it work? Authentication and Certificates
  • What are the Grid projects at CERN?
  • How are operations organized?
  • GGUS: the Grid support infrastructure for users and site admins
  • The GGUS portal
  • What is going to be available in the next months
  • Shall we play?
  CERN Helpdesk Educational Seminar – June 16, 2005 - 2

  3. The Grid: what is it?
  • Researchers perform their activities regardless of geographical location, interact with colleagues, and share and access data.
  • Scientific instruments and experiments produce huge amounts of data.
  • The Grid: networked data processing centres, with "middleware" software as the "glue" binding the resources together.
  • Many definitions:
  • It is an aggregation of geographically dispersed computing, storage, and network resources, coordinated to deliver improved performance, higher quality of service, better utilization, and easier access to data.
  • It enables virtual, collaborative organizations, sharing applications and data in an open, heterogeneous environment.

  4. The Grid: compute and data grids
  • A compute grid is essentially a collection of distributed computing resources, within or across locations, which are aggregated to act as a unified processing resource or virtual supercomputer.
  • Collecting these resources into a unified pool involves coordinated usage policies, job scheduling and queuing characteristics, grid-wide security, and user authentication.
  • A data grid provides wide-area, secure access to current data. Data grids enable users and applications to manage and efficiently use database information from distributed locations. Much like compute grids, data grids also rely on software for secure access and usage policies. Data grids can be deployed within one administrative domain or across multiple domains.
  [Diagram: a compute grid of Computing Elements alongside a data grid.]

  5. The Grid: clusters, intra-grids, extra-grids
  Once upon a time... [Diagram: evolution from the mainframe to the microcomputer, the minicomputer and the cluster.]

  6. The Grid: clusters, intra-grids, extra-grids
  ...and today. [Diagram: today's landscape of clusters, intra-grids and extra-grids.]

  7. The Grid: clusters, intra-grids, extra-grids

  8. Why the Grid @ CERN
  • Scale of the problems: frontier research in many different fields today requires world-wide collaborations (i.e. multi-domain access to distributed resources).
  • Grids provide access to large data processing power and huge data storage possibilities.
  • As the grid grows, its usefulness increases (more resources available).
  • Large communities of possible Grid users: high energy physics; environmental studies (earthquake forecasting, geologic and climate changes, ozone monitoring); biology, genetics, Earth observation; astrophysics; new composite materials research; astronautics; etc.

  9. Why the Grid @ CERN
  The LHC experiments (CMS, ATLAS, LHCb): ~6-8 PetaBytes/year, ~10^8 events/year, ~10^3 batch and interactive users.

  10. Why the Grid @ CERN
  • High-throughput computing (based on reliable "commodity" technology).
  • More than 1000 (dual-processor) PCs with Linux.
  • More than 1 Petabyte of data (on disk and tapes).
  Nowhere near enough!

  11. Why the Grid @ CERN
  • Europe: 267 institutes, 4603 users. Elsewhere: 208 institutes, 1632 users.
  • Problem: CERN alone can provide only a fraction of the necessary resources.
  • Solution: computing centres, which were isolated in the past, should now be connected, uniting the computing resources of particle physicists around the world!

  12. A bit of history: Globus
  • The idea starts with the publication of the book "The Grid: Blueprint for a New Computing Infrastructure" – July 1998.
  • Flexible, secure, coordinated resource sharing among dynamic collections of individuals, institutions, and resources. Enable communities ("virtual organizations") to share geographically distributed resources as they pursue common goals -- assuming the absence of: central location, central control, omniscience, existing trust relationships.
  • The Globus Toolkit implements the basic building blocks (Ian Foster, Carl Kesselman, Steve Tuecke).

  13. The Globus Toolkit
  [Architecture diagram: the client uses MDS client API calls to locate resources through the Grid Index Info Server and to get resource info from the Resource Info Server, and GRAM client API calls to request resource allocation and process creation. Inside the site boundary, the Gatekeeper (protected by the Grid Security Infrastructure) receives the request, the RSL library parses the job description, and the Job Manager allocates and creates processes through the Local Resource Manager, monitors and controls them, and sends state-change callbacks back to the GRAM client.]

  14. The Globus Toolkit: a bit slower ;-)
  Local Resource Management System (LRMS)
  • Manages the local computing resources in the Computing Element.
  • Often, batch systems are used: PBS (Portable Batch System), LSF (Load Sharing Facility).
  • The end user interacts with the local batch system through the native batch client, for example: qsub my.executable

  15. The Globus Toolkit: a bit slower ;-)
  Globus Resource Management
  • GRAM (Grid Resource Allocation Manager): a service that provides a Grid interface to the Local Resource Management System. Also referred to as the "Gatekeeper".
  • Provides a general interface to different batch systems like PBS and LSF.
  • The user needs to specify the exact hostname of the Computing Element, for example: globus-job-submit host1.cern.ch /bin/ls

  16. The Globus Toolkit: a bit slower ;-)
  [Diagram: the same Grid-interface/LRMS picture, now highlighting single sign-on. See later: authentication on the Grid.]

  17. The Globus Toolkit: other services
  [Diagram: the Globus Replica Catalog (RLS, Replica Location Service) registers data; GridFTP handles data movement and replication between Site A and Site B, which provide data storage and access.]

  18. The Globus Toolkit: the Monitoring and Directory Service (MDS)
  • Information providers publish information to a local LDAP server known as a Grid Resource Information Server (GRIS).
  • Each site has a Grid Information Index Server (GIIS) which acts as a single point of contact for all of the site's resources. The GRISes register with their site GIIS.
  • Each country has a GIIS with which all of the site GIISes register.
  • There is a top-level datagrid GIIS with which all of the country GIISes register.
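The registration hierarchy above can be sketched as a toy tree of index servers. This is illustrative Python only: class and method names are made up, and the real MDS is an LDAP-based service, not this API.

```python
class IndexServer:
    """Toy stand-in for a GRIS/GIIS: lower-level servers register here."""

    def __init__(self, name):
        self.name = name
        self.children = []   # servers registered with this index
        self.records = {}    # info published directly (GRIS level)

    def register(self, child):
        self.children.append(child)

    def publish(self, resource, info):
        self.records[resource] = info

    def query(self, resource):
        # Walk the hierarchy top-down until the resource is found.
        if resource in self.records:
            return self.records[resource]
        for child in self.children:
            found = child.query(resource)
            if found is not None:
                return found
        return None


# Build the hierarchy from the slide: datagrid -> country -> site -> GRIS.
top = IndexServer("datagrid")
country_a = IndexServer("countryA")
site_a = IndexServer("siteA")
gris = IndexServer("gris.siteA")
top.register(country_a)
country_a.register(site_a)
site_a.register(gris)
gris.publish("ce01.siteA", {"freeCPUs": 12})

print(top.query("ce01.siteA"))
```

A query against the top-level index recursively reaches the GRIS that published the record, which is the point of the single-point-of-contact design.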

  19. How does it work at CERN? Main components
  • User Interface (UI): the place where users log on to the Grid.
  • Resource Broker (RB): matches the user requirements with the available resources on the Grid.
  • Computing Element (CE): a batch queue on a farm of computers where the user job gets executed.
  • Storage Element (SE): a storage server where Grid files are stored (read/write/copy) or replicated.
  • Catalogues (MDS/RLS): information services that keep track of the available resources (MDS) and of the locations of Grid files and their replicas (RLS).

  20. Job submission
  [Diagram: the UI talks to the Network Server on the RB node; the Workload Manager consults the Information Service and the RLS for the characteristics and status of CEs and SEs; the Job Controller (CondorG) submits to the Computing Element, which works alongside the Storage Element.]

  21. Job submission (job status: submitted)
  UI: allows Alistair to access the functionalities of the WMS (via command line, GUI, C++ and Java APIs).

  22. Job submission (job status: submitted)
  The Job Description Language (JDL) specifies job characteristics and requirements. The job is submitted from the UI with: edg-job-submit myjob.jdl
  Myjob.jdl:
    JobType = "Normal";
    Executable = "$(CMS)/exe/sum.exe";
    InputSandbox = {"/home/user/WP1testC", "/home/file*", "/home/user/DATA/*"};
    OutputSandbox = {"sim.err", "test.out", "sim.log"};
    Requirements = other.GlueHostOperatingSystemName == "linux" && other.GlueHostOperatingSystemRelease == "Red Hat 6.2" && other.GlueCEPolicyMaxWallClockTime > 10000;
    Rank = other.GlueCEStateFreeCPUs;
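The JDL above is just attribute = value pairs, lists in braces, and unquoted ClassAd expressions for Requirements and Rank. A hypothetical helper can render that structure, purely to illustrate the syntax (this is not part of the real edg-job-submit tooling):

```python
def render_jdl(attrs):
    """Render a dict of job attributes into JDL-style text (illustrative)."""
    lines = []
    for key, value in attrs.items():
        if isinstance(value, list):
            # Lists become brace-enclosed, comma-separated quoted strings.
            items = ", ".join(f'"{v}"' for v in value)
            lines.append(f"{key} = {{{items}}};")
        elif isinstance(value, str) and not key.startswith(("Requirements", "Rank")):
            lines.append(f'{key} = "{value}";')
        else:
            # Requirements/Rank hold ClassAd expressions, emitted unquoted.
            lines.append(f"{key} = {value};")
    return "\n".join(lines)


jdl = render_jdl({
    "JobType": "Normal",
    "Executable": "$(CMS)/exe/sum.exe",
    "InputSandbox": ["/home/user/WP1testC", "/home/file*", "/home/user/DATA/*"],
    "OutputSandbox": ["sim.err", "test.out", "sim.log"],
    "Requirements": 'other.GlueHostOperatingSystemName == "linux"',
    "Rank": "other.GlueCEStateFreeCPUs",
})
print(jdl)
```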

  23. Job submission (job status: submitted → waiting)
  NS: the network daemon responsible for accepting incoming requests. The job and its Input Sandbox files are copied to the RB storage.

  24. Job submission (job status: waiting)
  WM: responsible for taking the appropriate actions to satisfy the request.

  25. Job submission (job status: waiting)
  Where must this job be executed? The Workload Manager asks the Matchmaker/Broker.

  26. Job submission (job status: waiting)
  Matchmaker: responsible for finding the "best" CE to which to submit the job.

  27. Job submission (job status: waiting)
  The Matchmaker asks the RLS where (on which SEs) the needed data are, and the Information Service for the status of the Grid.

  28. Job submission (job status: waiting)
  The Matchmaker returns its CE choice to the Workload Manager.

  29. Job submission (job status: waiting)
  JA (Job Adapter): responsible for the final "touches" to the job before performing submission (e.g. creation of the wrapper script, etc.).

  30. Job submission (job status: waiting → ready)
  JC (Job Controller): responsible for the actual job management operations (done via CondorG).

  31. Job submission (job status: ready → scheduled)
  The job and its Input Sandbox files are transferred to the Computing Element.

  32. Job submission (job status: scheduled → running)
  The job runs on the Computing Element, performing "Grid-enabled" data transfers/accesses to the Storage Element.

  33. Job submission (job status: running → done)
  When the job finishes, the Output Sandbox files are transferred back to the RB storage.

  34. Job submission (job status: done)
  The user retrieves the output from the UI with: edg-job-get-output <dg-job-id>

  35. Job submission (job status: done → cleared)
  The Output Sandbox files are delivered to the UI and the job is cleared from the RB.

  36. Job monitoring
  From the UI: edg-job-status <dg-job-id> and edg-job-get-logging-info <dg-job-id>
  LB (Logging & Bookkeeping): receives and stores job events, and computes the corresponding job status.
  LM (Log Monitor): parses the CondorG log file (where CondorG logs information about jobs) and notifies the LB.
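The job life cycle traced in slides 20-35 is a simple linear state machine. The transition table below is taken from the slides; the class itself is an illustrative sketch, not WMS code.

```python
# submitted -> waiting -> ready -> scheduled -> running -> done -> cleared
TRANSITIONS = {
    "submitted": "waiting",    # NS accepted the request
    "waiting":   "ready",      # WM/Matchmaker chose a CE, JA prepared the job
    "scheduled": "running",    # the job started on the CE
    "ready":     "scheduled",  # JC/CondorG handed the job to the CE
    "running":   "done",       # execution finished, output back on RB storage
    "done":      "cleared",    # output retrieved with edg-job-get-output
}


class GridJob:
    def __init__(self):
        self.state = "submitted"
        self.history = [self.state]

    def advance(self):
        if self.state not in TRANSITIONS:
            raise RuntimeError(f"job already {self.state}")
        self.state = TRANSITIONS[self.state]
        self.history.append(self.state)


job = GridJob()
while job.state != "cleared":
    job.advance()
print(job.history)
```

This is what edg-job-status reports back: the current position of the job in this chain, as recorded by the Logging & Bookkeeping service.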

  37. The Grid: authentication and authorization
  • Users need to "log on" only once, on the Grid UI.
  • Access to the Grid is granted through the use of X.509 certificates.
  • A service is presented with the certificate of the user and, if the certificate is valid, the user is granted access to the Grid.
  • Certificates are used everywhere in the Grid whenever a user needs to be identified. For instance, the web portal for Grid support will allow access to users with valid Grid certificates.
  Let's see how it works!

  38. Cryptography
  • Mathematical algorithms that provide important building blocks for the implementation of a security infrastructure.
  • Symbology:
  • Plaintext: M
  • Ciphertext: C
  • Encryption with key K1: E_K1(M) = C
  • Decryption with key K2: D_K2(C) = M
  • Algorithms:
  • Symmetric: K1 = K2
  • Asymmetric: K1 ≠ K2

  39. Symmetric Algorithms
  • The same key is used for encryption and decryption.
  • Advantages: fast.
  • Disadvantages: how to distribute the keys? The number of keys is O(n^2).
  • Examples: DES, 3DES, Rijndael (AES), Blowfish, Kerberos (a protocol built on symmetric cryptography).
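A toy way to see "the same key encrypts and decrypts" is a one-time-pad-style XOR cipher, since XOR is its own inverse. This sketch only illustrates the K1 = K2 property; real systems use ciphers like DES, 3DES or AES, not this.

```python
import secrets


def xor_cipher(data: bytes, key: bytes) -> bytes:
    # XOR is self-inverse, so encryption and decryption are the same operation:
    # E_K(M) = M xor K = C, and D_K(C) = C xor K = M.
    return bytes(b ^ k for b, k in zip(data, key))


message = b"ciao"
key = secrets.token_bytes(len(message))   # shared secret key K

ciphertext = xor_cipher(message, key)     # Paul encrypts with K
recovered = xor_cipher(ciphertext, key)   # John decrypts with the SAME K
assert recovered == message
```

The key-distribution disadvantage is visible here too: Paul and John must somehow already share the secret key before they can communicate.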

  40. Public Key Algorithms
  • Every user has two keys: one private and one public.
  • It is impossible to derive the private key from the public one.
  • A message encrypted with one key can be decrypted only with the other one.
  • No exchange of secrets is necessary: the sender encrypts using the public key of the receiver; the receiver decrypts using his private key; the number of keys is O(n).
  • Examples: Diffie-Hellman (1976), RSA (1978).
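Textbook RSA with tiny primes shows the public/private key relationship concretely. These key sizes are utterly insecure and there is no padding; this is a sketch of the mathematics, not usable cryptography.

```python
# Toy RSA key generation with small primes.
p, q = 61, 53
n = p * q                      # modulus, part of both keys
phi = (p - 1) * (q - 1)        # Euler's totient of n
e = 17                         # public exponent, coprime with phi
d = pow(e, -1, phi)            # private exponent: modular inverse of e mod phi


def encrypt(m, pub=(e, n)):
    # The sender uses the receiver's PUBLIC key.
    return pow(m, pub[0], pub[1])


def decrypt(c, priv=(d, n)):
    # The receiver uses his PRIVATE key.
    return pow(c, priv[0], priv[1])


m = 65                         # a small "message" (must be < n here)
c = encrypt(m)
assert decrypt(c) == m         # only the private key recovers m
```

Encrypting with one key and decrypting with the other is exactly the property the slide states; swapping the roles of e and d is what digital signatures (next slides) exploit.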

  41. One-Way Hash Functions
  • Functions H that, given as input a variable-length message M, produce as output a string h of fixed length.
  • The length of h must be at least 128 bits (to avoid birthday attacks).
  • Given M, it must be easy to calculate H(M) = h.
  • Given h, it must be difficult to calculate M = H^-1(h).
  • Given M, it must be difficult to find M' ≠ M such that H(M') = H(M).
  • Examples: SNEFRU (hash of 128 or 256 bits); MD4/MD5 (hash of 128 bits); SHA (FIPS standard, hash of 160 bits).
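Several of the hash functions listed are available in Python's hashlib, and the digest lengths match the slide: MD5 produces 128 bits, SHA-1 produces 160 bits.

```python
import hashlib

m = b"This is some message"

md5_digest = hashlib.md5(m).digest()
sha1_digest = hashlib.sha1(m).digest()

# Fixed output length regardless of input length:
assert len(md5_digest) * 8 == 128
assert len(sha1_digest) * 8 == 160

# Changing even one character of M yields a completely different hash,
# which is why the hash can stand in for the message in a signature.
assert hashlib.md5(b"This is some messagE").digest() != md5_digest
```

(MD5 and SHA-1 were standard in 2005 but are considered broken today; the same API serves SHA-256 and later functions.)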

  42. Digital Signature
  • Paul calculates the hash of the message.
  • Paul encrypts the hash using his private key: the encrypted hash is the digital signature.
  • Paul sends the signed message to John.
  • John calculates the hash of the message and compares it with the signature he received, deciphered with Paul's public key.
  • If the hashes are equal: the message wasn't modified, and Paul cannot repudiate it.
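The steps above can be sketched by combining a hash function with the toy RSA from slide 40: sign by "encrypting" the hash with the private key, verify by "decrypting" with the public key. Again, toy primes and no padding scheme, so this illustrates the flow, not a real signature algorithm.

```python
import hashlib

# Paul's toy RSA key pair (same construction as the previous sketch).
p, q = 61, 53
n, phi = p * q, (p - 1) * (q - 1)
e = 17                    # Paul's PUBLIC key is (e, n)
d = pow(e, -1, phi)       # Paul's PRIVATE key is (d, n)


def sign(message: bytes) -> int:
    # 1-2. Hash the message, then encrypt the hash with the private key.
    h = int.from_bytes(hashlib.sha256(message).digest(), "big") % n
    return pow(h, d, n)


def verify(message: bytes, signature: int) -> bool:
    # 4. Recompute the hash and compare it with the signature
    #    deciphered using the signer's public key.
    h = int.from_bytes(hashlib.sha256(message).digest(), "big") % n
    return pow(signature, e, n) == h


msg = b"This is some message"
sig = sign(msg)           # 3. Paul sends (msg, sig) to John
assert verify(msg, sig)   # 5. hashes match: unmodified, non-repudiable
```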

  43. Digital Certificates
  • Paul's digital signature is safe if: Paul's private key is not compromised, and John knows Paul's public key.
  • How can John be sure that Paul's public key is really Paul's public key and not someone else's?
  • A third party guarantees the correspondence between the public key and the owner's identity. Both Paul and John must trust this third party.
  • Two models: X.509 (hierarchical organization); PGP ("web of trust").

  44. X.509
  The "third party" is called a Certification Authority (CA).
  • CAs issue digital certificates for users, programs and machines.
  • They check the identity and the personal data of the requestor; Registration Authorities (RAs) do the actual validation.
  • CAs periodically publish a list of compromised certificates: Certificate Revocation Lists (CRLs) contain all the revoked certificates yet to expire.
  • CA certificates are self-signed.
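The CRL check a service performs is conceptually simple: reject a certificate if it has expired or if its serial number appears on the CA's revocation list. A minimal sketch, with made-up data (the serial number is borrowed from the example certificate on the next slide):

```python
from datetime import datetime, timezone

# Serial numbers the CA has revoked (a real CRL is a signed structure).
revoked_serials = {625, 1042}


def certificate_is_acceptable(serial: int, not_after: datetime,
                              now: datetime) -> bool:
    if now > not_after:
        return False          # certificate has expired
    if serial in revoked_serials:
        return False          # certificate was revoked by the CA
    return True


expiry = datetime(2005, 8, 26, 8, 8, 14, tzinfo=timezone.utc)
seminar_day = datetime(2005, 6, 16, tzinfo=timezone.utc)

# Serial 625 is on the toy CRL, so it is rejected even though unexpired;
# an unrevoked serial with the same expiry is accepted.
assert not certificate_is_acceptable(625, expiry, seminar_day)
assert certificate_is_acceptable(999, expiry, seminar_day)
```

This also explains why CRLs only list certificates "yet to expire": once a certificate expires, the date check alone rejects it and the CA can drop it from the list.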

  45. X.509 Certificates
  An X.509 certificate contains:
  • the owner's public key;
  • the identity of the owner;
  • info on the CA;
  • the time of validity;
  • a serial number;
  • the digital signature of the CA.
  Structure of an X.509 certificate (example):
    Subject: C=CH, O=CERN, OU=GRID, CN=Andrea Sciaba 8968
    Issuer: C=CH, O=CERN, OU=GRID, CN=CERN CA
    Expiration date: Aug 26 08:08:14 2005 GMT
    Serial number: 625 (0x271)
    Public key; CA digital signature
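The Subject and Issuer lines above are distinguished names (DNs): comma-separated attribute=value pairs (C = country, O = organization, OU = organizational unit, CN = common name). A tiny parser, for illustration only (real DN parsing must handle escaping and multi-valued attributes):

```python
def parse_dn(dn: str) -> dict:
    """Split a simple 'C=CH, O=CERN, ...' distinguished name into a dict."""
    return dict(part.strip().split("=", 1) for part in dn.split(","))


subject = parse_dn("C=CH, O=CERN, OU=GRID, CN=Andrea Sciaba 8968")
issuer = parse_dn("C=CH, O=CERN, OU=GRID, CN=CERN CA")

# The subject identifies the owner; the issuer identifies the CA.
print(subject["CN"], "/ issued by:", issuer["CN"])
```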

  46. Certificate Request
  • The user generates a public/private key pair; the private key is kept encrypted on the local disk.
  • The user sends the public key to the CA, in a certificate request, along with proof of identity.
  • The CA confirms the identity, signs the certificate and sends it back to the user.
  [The slide illustrates the idea with a State of Illinois ID card.]

  47. The Grid Projects at CERN: LCG
  • The LCG (LHC Computing Grid) project started in 2002.
  • Its goal is to build a world-wide computing infrastructure based on Grid middleware to offer a computing platform for the LHC experiments. http://www.cern.ch/lcg
  • More than 23,000 HEP jobs run concurrently in a day.

  48. The Grid Projects at CERN: EGEE goals
  • EU-funded project (4/2004 - 3/2006); EGEE-II will probably follow.
  • Create a world-wide, production-quality Grid infrastructure on top of present and future EU infrastructure.
  • Provide distributed European research communities with "round-the-clock" access to major computing resources, independent of geographic location.
  • Change of emphasis from grid development to grid deployment.
  • Support many application domains with one large-scale infrastructure that will attract new resources over time.
  • Provide training and support for end-users.

  49. The Grid Projects at CERN: EGEE strategy
  • Leverage current and planned national and regional Grid programmes, building on the results of existing projects such as DataGrid and others, and on the EU research network Geant; work closely with relevant industrial Grid developers and NRENs.
  • Support Grid computing needs common to the different communities: integrate the computing infrastructures and agree on common access policies.
  • Exploit international connections (US and AP).
  • Provide interoperability with other major Grid initiatives, such as the US NSF Cyberinfrastructure, establishing a worldwide Grid infrastructure.

  50. The Grid Projects at CERN: EGEE partners
  • Leverage national resources in a more effective way for broader European benefit.
  • 70 leading institutions in 27 countries, organized into regional federations.
