
The LHC Grid

The LHC Grid. Peter Kacsuk, MTA SZTAKI. EGEE is funded by the European Union under contract IST-2003-508833. Acknowledgement: this tutorial is based on the work of many people, including Fabrizio Gagliardi, Flavia Donno and Peter Kunszt (CERN), the EDG developer team and the EDG training team.


Presentation Transcript


  1. The LHC Grid. Peter Kacsuk, MTA SZTAKI. EGEE is funded by the European Union under contract IST-2003-508833. DAPSYS Tutorial: LCG-2 Overview – Sep 19th, 2004

  2. Acknowledgement
     • This tutorial is based on the work of many people:
        • Fabrizio Gagliardi, Flavia Donno and Peter Kunszt (CERN)
        • the EDG developer team
        • the EDG training team
        • the NeSC training team
        • the SZTAKI training team

  3. What is LHC Grid?
     • LHC stands for Large Hadron Collider, to be built by CERN: http://lhc-new-homepage.web.cern.ch/lhc-new-homepage/
     • The LHC will be put into operation in 2007, with many experiments collecting 5-6 petabytes of data per year
     • The LHC Grid was built by CERN to provide storage and computing capacity for processing this huge data set
     • The current version of the LHC Grid is called LCG-2
     • It is based on the software developed by the European DataGrid (EDG) project and by the US GriPhyN project
     • LCG-2 is now the first EGEE infrastructure

  4. What is LHC Grid?
     • The first EGEE infrastructure and the largest functioning Grid:
        • more than 70 sites, over 5,000 CPUs, 4,000 TB
        • 5,000 simultaneous jobs

  5. What is EGEE? (I)
     • EGEE (Enabling Grids for E-science in Europe) is a seamless Grid infrastructure for the support of scientific research, which:
        • integrates current national, regional and thematic Grid efforts, especially in HEP (High Energy Physics)
        • provides researchers in academia and industry with round-the-clock access to major computing resources, independent of geographic location
     • Diagram layers: Applications, Grid infrastructure, GEANT network

  6. What is EGEE? (II)
     • 70 leading institutions in 27 countries, federated in regional Grids
     • 32 M Euros EU funding (2004-5), O(100 M) total budget
     • Aiming for a combined capacity of over 20,000 CPUs (the largest international Grid infrastructure ever assembled)
     • ~300 dedicated staff

  7. EGEE Community

  8. EGEE infrastructure
     • Access to networking services provided by GEANT and the NRENs
     • Production Service:
        • in place (based on HEP LCG-2)
        • for production applications
        • MUST run reliably; runs only proven, stable, debugged middleware and services
        • will continue adding new sites in EGEE federations
     • Pre-production Service:
        • for middleware re-engineering
        • Certification and Training/Demo testbeds

  9. What do we expect from the Grid?
     • Access to a world-wide virtual computing laboratory with almost infinite resources
     • Possibility to organize distributed scientific communities in Virtual Organizations (VOs)
     • Transparent access to distributed data and easy workload management
     • Easy-to-use application interfaces

  10. What are the characteristics of a Grid system?
     • Numerous Resources
     • Ownership by Mutually Distrustful Organizations & Individuals
     • Connected by Heterogeneous, Multi-Level Networks
     • Different Security Requirements & Policies Required
     • Different Resource Management Policies
     • Potentially Faulty Resources
     • Geographically Separated
     • Resources are Heterogeneous

  11. The LCG-2 Architecture (layered diagram; side labels: Apps, Mware, Globus)
     • Local Computing: Local Application, Local Database
     • Grid Application Layer: Data Management, Metadata Management, Job Management
     • Collective Services: Information & Monitoring, Replica Manager, Grid Scheduler
     • Underlying Grid Services: Computing Element Services, Storage Element Services, Replica Catalog, Authorization Authentication & Accounting, Logging & Book-keeping, Database Services
     • Grid Fabric services: Fabric Monitoring and Fault Tolerance, Node Installation & Management, Fabric Storage Management, Resource Management, Configuration Management

  12. Main Logical Machine Types (Services) in LCG-2
     • User Interface (UI)
     • Information Service (IS)
     • Computing Element (CE) Frontend Node
     • Worker Nodes (WN)
     • Storage Element (SE)
     • Replica Catalog (RC, RLS)
     • Resource Broker (RB)

  13. User Interface
     • The initial point of access to the LCG-2 Grid is the User Interface (UI)
     • This is a machine where
        • LCG users have a personal account
        • the user’s certificate is installed
     • The UI is the gateway to Grid services
     • It provides a Command Line Interface to perform the following basic Grid operations:
        • submit a job for execution on a Computing Element;
        • list all the resources suitable to execute a given job;
        • replicate and copy files;
        • cancel one or more jobs;
        • retrieve the output of one or more finished jobs;
        • show the status of one or more submitted jobs.
     • One or more UIs are available at each site that is part of the LCG-2 Grid

  14. Main Logical Machine Types (Services) in LCG-2 (overview diagram of slide 12 repeated)

  15. Computing Element (CE)
     • Defined as a Grid batch queue and identified by a pair <hostname>:<port>/<batch queue name>
     • Several queues defined for the same hostname are considered different CEs. For example:
        • adc0015.cern.ch:2119/jobmanager-lcgpbs-long
        • adc0015.cern.ch:2119/jobmanager-lcgpbs-short
     • A Computing Element is built on a homogeneous farm of computing nodes (called Worker Nodes)
     • One node acts as a Grid Gate (GG), or front-end to the Grid, and runs:
        • a Globus gatekeeper
        • the Globus GRAM (Globus Resource Allocation Manager)
        • the master server of a Local Resource Management System, which can be PBS, LSF or Condor
        • a local Logging and Bookkeeping server
     • Each LCG-2 site runs at least one CE and a farm of WNs behind it.
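The CE identifier format on this slide can be illustrated with a small parser. This is only a sketch: the `ComputingElementId` type and the `parse_ce_id` function are hypothetical names introduced here, not part of any LCG-2 tool.

```python
from typing import NamedTuple


class ComputingElementId(NamedTuple):
    """Parts of a CE identifier (hypothetical helper type)."""
    hostname: str
    port: int
    queue: str


def parse_ce_id(ce_id: str) -> ComputingElementId:
    """Split '<hostname>:<port>/<batch queue name>' into its parts."""
    endpoint, slash, queue = ce_id.partition("/")
    hostname, colon, port = endpoint.partition(":")
    if not (slash and colon and hostname and queue and port.isdigit()):
        raise ValueError(f"not a valid CE identifier: {ce_id!r}")
    return ComputingElementId(hostname, int(port), queue)


# The two example queues from the slide share a host but are different CEs.
long_queue = parse_ce_id("adc0015.cern.ch:2119/jobmanager-lcgpbs-long")
short_queue = parse_ce_id("adc0015.cern.ch:2119/jobmanager-lcgpbs-short")
same_host = long_queue.hostname == short_queue.hostname
different_ce = long_queue != short_queue
```

Comparing the parsed tuples shows why two queues on the same Grid Gate count as two CEs: the hostname and port match, but the queue name differs.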

  16. Computing Element
     • Computing Element: entry point into a queue of a batch system
        • the information associated with a Computing Element is limited to information relevant to the queue
        • resource details relate to the system
     • Diagram: a Grid Gate node (infoService, gatekeeper, batch server) in front of four worker nodes, two with CPU: PIV, RAM: 2GB, OS: Linux and two with CPU: PIII, RAM: 0.5GB, OS: Linux; in the example the red queue is assigned to two hosts

  17. Main Logical Machine Types (Services) in LCG-2 (overview diagram of slide 12 repeated)

  18. Storage Element (SE)
     • A Storage Element (SE) provides uniform access and services to large storage spaces.
     • Each site includes at least one SE
     • SEs use two protocols:
        • GSIFTP for file transfer
        • Remote File Input/Output (RFIO) for file access

  19. Storage Resource Management (SRM)
     • Data are stored on disk pool servers or Mass Storage Systems
     • Storage resource management needs to take into account:
        • transparent access to files (migration to/from disk pool)
        • space reservation
        • file status notification
        • lifetime management
     • SRM (Storage Resource Manager) takes care of all these details
     • SRM is a Grid Service that takes care of local storage interaction and provides a Grid interface to the outside world

  20. Storage Resource Management
     • Support for local policy
        • Each storage resource can be managed independently
        • Internal priorities are not sacrificed by data movement between Grid agents
        • Disk and tape resources are presented as a single element
     • Reservation on demand and advance reservation
        • Space can be reserved for registering a new file
        • Plan the storage system usage
     • File status and estimates for planning
        • Provides information on file status
        • Provides estimates on space availability/usage

  21. A Simple Configuration
     • Diagram: a User Interface, Resource Broker, Replica Catalog and Information Service serving two sites; Computing Element 1 is “CLOSE” to Storage Element 1, and Computing Element 2 is “CLOSE” to Storage Element 2

  22. SZTAKI’s LCG-2 system
     • LCG-2 local configuration
     • Grid Gate n31.hpcc.sztaki.hu (512MB, Intel Pentium 4 2.53GHz) hosts:
        • User Interface
        • Computing Element
        • Storage Element (69GB)
        • Resource Broker
        • Replica Manager
     • Worker nodes n27.hpcc.sztaki.hu and n28.hpcc.sztaki.hu: Workernode #1 and Workernode #2 (default), each 128MB, Genuine Intel Pentium III dual processor, 2x500MHz

  23. Main Logical Machine Types (Services) in LCG-2 (overview diagram of slide 12 repeated)

  24. Information System (IS)
     • The Information System (IS) provides information about the LCG-2 Grid resources and their status
     • The current IS is based on LDAP: a directory service infrastructure, which is a specialized database optimized for
        • reading,
        • browsing and
        • searching information.
     • The LDAP schema used in LCG-2 implements the GLUE (Grid Laboratory for a Uniform Environment) Schema

  25. How to store Information?
     • The LDAP information model is based on entries.
     • An entry usually describes an object such as a person, a computer, a server, and so on.
     • Each entry contains one or more attributes that describe the entry.
     • Each attribute has a type and one or more values.
     • Each entry has a name called a Distinguished Name (DN) that uniquely identifies it.
     • A DN is formed by a sequence of attributes and values.
     • Example: the DN of a particular CE entry would consist of
        • an attribute identifying the site (site_ID=cern) and
        • an attribute identifying the CE (CE_ID=lxn1102.cern.ch),
        • so the complete DN would be: CE_ID=lxn1102.cern.ch,site_ID=cern.

  26. The Directory Information Tree
     • Based on their DNs, the entries can be arranged into a hierarchical tree-like structure.
     • This tree of directory entries is called the Directory Information Tree (DIT).
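How DNs induce a tree can be sketched in a few lines. This is an illustrative sketch only: `parse_dn` and `build_dit` are hypothetical helpers, the parser ignores escaped commas that real LDAP DNs may contain, and every DN other than the slide's `CE_ID=lxn1102.cern.ch,site_ID=cern` is made up for the example.

```python
from typing import Dict, Iterable, List, Tuple


def parse_dn(dn: str) -> List[Tuple[str, str]]:
    """Split a DN into (attribute, value) pairs, most specific first."""
    pairs = []
    for component in dn.split(","):
        attribute, _, value = component.strip().partition("=")
        pairs.append((attribute, value))
    return pairs


def build_dit(dns: Iterable[str]) -> Dict[str, dict]:
    """Arrange entries into a nested dict keyed by 'attribute=value'."""
    tree: Dict[str, dict] = {}
    for dn in dns:
        node = tree
        # Walk from the least specific component (the site) down to the entry.
        for attribute, value in reversed(parse_dn(dn)):
            node = node.setdefault(f"{attribute}={value}", {})
    return tree


dit = build_dit([
    "CE_ID=lxn1102.cern.ch,site_ID=cern",   # DN from the slide
    "CE_ID=lxn1104.cern.ch,site_ID=cern",   # hypothetical second CE
    "CE_ID=ce.example.org,site_ID=example", # hypothetical second site
])
```

Entries that share the trailing `site_ID=cern` component end up under the same parent, which is the hierarchical arrangement the slide describes.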

  27. Information System (IS)
     • The IS is a hierarchical system with 3 levels, from the bottom up:
        • Grid Resource Information Server (GRIS) level (CE and SE level)
        • Grid Index Information Server (GIIS) level (site level)
        • Top, centralized level (Grid level)
     • The Globus Monitoring and Discovery Service (MDS) mechanism has been adopted at the GRIS level
     • The other two levels use the Berkeley DB Information Index (BDII) mechanism

  28. LCG-2 hierarchical Info system (diagram legend)
     • BDII: Berkeley DB Information Index
     • GIIS: Grid Index Information Server
     • GRIS: Grid Resource Information Server
     • CE: Computing Element
     • SE: Storage Element

  29. How to collect and store information?
     • All services are allowed to enter information into the IS
     • The BDII at the top
        • queries every GIIS every 2 minutes and
        • acts as a cache, storing information about the Grid status in its LDAP database
     • The BDII at the GIIS
        • collects information from every GRIS every 2 minutes and
        • acts as a cache, storing information about the site status in its LDAP database
     • The GRIS updates information according to the MDS protocol
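The two-level caching described on this slide can be sketched as a generic index that merges its sources at most once per interval. This is not the BDII implementation: `CachingIndex` is a hypothetical class, plain dictionaries stand in for the LDAP database, and the sketch refreshes lazily on query rather than polling on a timer.

```python
import time
from typing import Callable, Dict, Iterable, Optional

Entries = Dict[str, dict]


class CachingIndex:
    """Caches the merged output of its sources for `interval` seconds."""

    def __init__(self, sources: Iterable[Callable[[], Entries]],
                 interval: float = 120.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._sources = list(sources)
        self._interval = interval          # 2 minutes, as on the slide
        self._clock = clock                # injectable for testing
        self._cache: Entries = {}
        self._last_refresh: Optional[float] = None

    def _refresh(self) -> None:
        merged: Entries = {}
        for fetch in self._sources:
            merged.update(fetch())
        self._cache = merged
        self._last_refresh = self._clock()

    def query(self) -> Entries:
        """Return cached entries, refreshing first if the cache is stale."""
        now = self._clock()
        if self._last_refresh is None or now - self._last_refresh >= self._interval:
            self._refresh()
        return dict(self._cache)


def example_gris() -> Entries:
    """Hypothetical resource-level source reporting one CE."""
    return {"CE_ID=lxn1102.cern.ch,site_ID=cern": {"free_cpus": 4}}


site_index = CachingIndex([example_gris])     # plays the site-level role
top_index = CachingIndex([site_index.query])  # plays the Grid-level role
grid_view = top_index.query()
```

Chaining `site_index.query` as a source of `top_index` mirrors the hierarchy: the top level never contacts a GRIS directly, it only reads the site-level caches.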

  30. How to obtain Information?
     • All users can browse the catalogues
     • To obtain the information, the client should:
        • ask the BDII about possible GIIS/GRIS
        • directly query the GIIS/GRIS
        • or use the BDII cache
     • The IS scales to ~1000 sites (MDS much less: ~100)

  31. Main Logical Machine Types (Services) in LCG-2 (overview diagram of slide 12 repeated)

  32. Data Management
     • The Data Management services are provided by
        • the Replica Management System (RMS) of EDG
        • and the LCG Data Management client tools
     • In LCG, the data files are replicated:
        • on a temporary basis,
        • to many different sites, depending on where the data is needed.
     • The users or applications do not need to know where the data is located; they use logical file names
     • The Data Management services are responsible for locating and accessing the data.

  33. File Management Motivation
     • Replica Manager: ‘atomic’ replication operation, single client interface, orchestrator
     • Replica Catalog: map logical to site files
     • Replica Selection: get the ‘best’ file
     • Security
     • Pre- and post-processing: prepare files for transfer, validate files after transfer
     • Replication Automation: data source subscription
     • Load balancing: replicate based on usage
     • Metadata: LFN metadata, transaction information, access patterns
     • Diagram: file transfer between Storage Element A at Site A and Storage Element B at Site B (files shown: File A, File B, File C, File D, File X, File Y)

  34. Data Management Tools
     • Tools for
        • locating data
        • copying data
        • managing and replicating data
        • metadata management
     • In LCG-2 you have
        • Replica Manager (RM)
        • Replica Location Service (RLS)
        • Replica Metadata Catalog (RMC)

  35. Replication Services: Basic Functionality
     • Each file has a unique Grid ID (GUID). Locations corresponding to the GUID are kept in the Replica Location Service.
     • Users may assign aliases to the GUIDs. These are kept in the Replica Metadata Catalog.
     • Files have replicas stored at many Grid sites on Storage Elements.
     • The Replica Manager provides atomicity for file operations, assuring consistency of SE and catalog contents.
     • Diagram components: Replica Metadata Catalog, Replica Location Service, Replica Manager, two Storage Elements

  36. Interactions with other Grid components
     • Applications and users interface to data through the Replica Manager, either directly or through the Resource Broker.
     • Management calls should never go directly to the SRM.
     • Diagram components: Virtual Organization Membership Service, User Interface or Worker Node, Resource Broker, Information Service, Replica Metadata Catalog, Replica Location Service, Replica Manager, two Storage Elements

  37. Simplified Interaction Replica Manager – Storage Resource Manager
     • Diagram components: Replica Manager client, Replica Catalog, SRM, Storage (interactions numbered 1 to 6)
     • The client asks a catalog to provide the location of a file
     • The catalog responds with the name of an SRM
     • The client asks the SRM for the file
     • The SRM asks the storage system to provide the file
     • The storage system sends the file to the client through the SRM or directly
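The request flow on this slide can be mimicked with three toy objects. Everything here is a stand-in: `ToyStorage`, `ToySRM`, `ToyReplicaCatalog` and `fetch` are hypothetical names, the file identifier and host are made up, and the sketch covers only the "through the SRM" delivery path.

```python
from typing import Dict, Tuple


class ToyStorage:
    """Stands in for a disk pool or mass storage system."""

    def __init__(self, files: Dict[str, bytes]) -> None:
        self._files = files

    def read(self, path: str) -> bytes:
        return self._files[path]


class ToySRM:
    """Front end that hides the storage system from the client."""

    def __init__(self, storage: ToyStorage) -> None:
        self._storage = storage

    def get(self, path: str) -> bytes:
        # The SRM asks the storage system for the file and relays it.
        return self._storage.read(path)


class ToyReplicaCatalog:
    """Maps a file identifier to (SRM name, path on that SRM)."""

    def __init__(self) -> None:
        self._locations: Dict[str, Tuple[str, str]] = {}

    def register(self, file_id: str, srm_name: str, path: str) -> None:
        self._locations[file_id] = (srm_name, path)

    def locate(self, file_id: str) -> Tuple[str, str]:
        return self._locations[file_id]


def fetch(file_id: str, catalog: ToyReplicaCatalog, srms: Dict[str, ToySRM]) -> bytes:
    """Client side: ask the catalog, then ask the SRM it names."""
    srm_name, path = catalog.locate(file_id)  # catalog answers with an SRM
    return srms[srm_name].get(path)           # SRM returns the file


catalog = ToyReplicaCatalog()
catalog.register("lfn:example-file", "srm.example.org", "/pool/example-file")
srms = {"srm.example.org": ToySRM(ToyStorage({"/pool/example-file": b"payload"}))}
data = fetch("lfn:example-file", catalog, srms)
```

The client never touches `ToyStorage` directly, which is the point of the design: local storage details stay behind the SRM interface.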

  38. Replica Manager (RM)
     • High-level data management on the Grid; takes care of:
        • location of data
        • replication of data
        • efficient access to data
     • Hides the SRM (Storage Resource Manager):
        • the user cannot access the SRM directly, only through the RM
     • Coordinates the use of
        • the Replica Location Service
        • the Replica Metadata Catalog

  39. Interaction of the Replica Manager (RM) with other Grid services
     • The RM presents a single interface to the user or other services
     • Some of the RM functionality has been replaced by a new, faster interface: the LCG Data Management client tools.

  40. File References and Replica Catalogs
     • The files in the Grid are referenced by different names:
        • Grid Unique IDentifier (GUID)
        • Logical File Name (LFN)
        • Storage URL (SURL)
        • Transport URL (TURL)
     • The GUID and the LFN refer to files, not replicas, and say nothing about locations
     • The SURLs and TURLs give information about where a physical replica is located
     • Diagram legend: RMC: Replica Metadata Catalog; LRC: Local Replica Catalog

  41. Abstract file names
     • GUID
        • A file can always be identified by its GUID
        • The GUID is assigned at data registration time
        • The GUID is based on the UUID standard to guarantee unique IDs
        • A GUID is of the form: guid:<unique string>
        • All the replicas of a file share the same GUID
     • LFN
        • To locate a Grid-accessible file, the human user will normally use an LFN
        • LFNs are human-readable strings, allocated by the user as GUID aliases
        • An LFN is of the form: lfn:<any alias>

  42. Physical file names
     • SURL
        • used by the RMS to find where a replica is physically stored, and by the SE to locate the file
        • SURLs are of the form: sfn:<SE hostname>/<local string>
        • where <local string> is used internally by the SE to locate the file
     • TURL
        • a TURL gives the necessary information to retrieve a physical replica, including hostname, path, protocol and port (as in any conventional URL)
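The four name forms can be told apart by their prefix, as the following sketch shows. The helper names (`classify`, `parse_surl`, `parse_turl`) and the host `se.example.org` are hypothetical; the SURL parser follows the `sfn:<SE hostname>/<local string>` form given on the slide and, as an assumption, also tolerates a `//` after the scheme.

```python
from typing import Dict, Optional, Tuple, Union
from urllib.parse import urlsplit


def classify(name: str) -> str:
    """Return 'GUID', 'LFN', 'SURL' or 'TURL' based on the name's prefix."""
    scheme, colon, rest = name.partition(":")
    if not colon or not rest:
        raise ValueError(f"not a Grid file name: {name!r}")
    abstract_or_storage = {"guid": "GUID", "lfn": "LFN", "sfn": "SURL"}
    if scheme in abstract_or_storage:
        return abstract_or_storage[scheme]
    if rest.startswith("//"):
        return "TURL"  # any other protocol://host... form
    raise ValueError(f"unrecognised name form: {name!r}")


def parse_surl(surl: str) -> Tuple[str, str]:
    """Split 'sfn:<SE hostname>/<local string>' into (hostname, local string)."""
    if classify(surl) != "SURL":
        raise ValueError(f"not a SURL: {surl!r}")
    rest = surl[len("sfn:"):].lstrip("/")  # assumption: accept 'sfn://' too
    hostname, _, local = rest.partition("/")
    return hostname, local


def parse_turl(turl: str) -> Dict[str, Union[str, int, None]]:
    """Extract protocol, hostname, port and path like any conventional URL."""
    parts = urlsplit(turl)
    port: Optional[int] = parts.port
    return {"protocol": parts.scheme, "hostname": parts.hostname,
            "port": port, "path": parts.path}


surl_parts = parse_surl("sfn:se.example.org/data/run1/file1")
turl_parts = parse_turl("gsiftp://se.example.org:2811/data/run1/file1")
```

Only the SURL and TURL carry a hostname; a GUID or LFN passed to `parse_surl` is rejected, reflecting that abstract names say nothing about location.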

  43. Replica Location Service (RLS)
     • The RLS maintains information about the physical location of the replicas (mapping with the GUIDs).
     • It is composed of several Local Replica Catalogs (LRCs), which hold the information of replicas for a single VO.

  44. Replica Metadata Catalog (RMC)
     • The RMC stores the mapping between GUIDs and their respective aliases (LFNs)
     • It also maintains other metadata information (sizes, dates, ownerships...)

  45. User Interfaces for Data Management
     • Users are mainly expected to use the interface of the Replica Manager client:
        • Management commands
        • Catalog commands
        • File Transfer commands
     • The RLS and RMC services provide additional user interfaces
        • mainly for additional catalog operations
        • additional server administration commands
        • should mainly be used by administrators
        • can also be used to check the availability of a service

  46. The Replica Manager Interface: Management Commands
     • copyAndRegisterFile (args: source, dest, lfn, protocol, streams)
        • Copy a file into grid-aware storage and register the copy in the Replica Catalog as an atomic operation.
     • replicateFile (args: source/lfn, dest, protocol, streams)
        • Replicate a file between grid-aware stores and register the replica in the Replica Catalog as an atomic operation.
     • deleteFile (args: source/seHost, all)
        • Delete a file from storage and unregister it.
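What "copy and register as an atomic operation" means can be shown with a copy that is rolled back when registration fails. This is a toy model and not the Replica Manager's code: `ToyCatalog` and `copy_and_register` are hypothetical, a dictionary stands in for the Storage Element, and the SURLs are invented.

```python
import uuid
from typing import Dict, Set


class ToyCatalog:
    """In-memory stand-in for the Replica Catalog."""

    def __init__(self) -> None:
        self.replicas: Dict[str, Set[str]] = {}  # GUID -> SURLs
        self.aliases: Dict[str, str] = {}        # LFN -> GUID

    def register(self, guid: str, surl: str, lfn: str) -> None:
        # Check first so that a failed call leaves the catalog unchanged.
        if lfn in self.aliases:
            raise ValueError(f"alias already in use: {lfn}")
        self.replicas.setdefault(guid, set()).add(surl)
        self.aliases[lfn] = guid


def copy_and_register(data: bytes, storage: Dict[str, bytes],
                      catalog: ToyCatalog, dest_surl: str, lfn: str) -> str:
    """Store the data and register it; undo the copy if registration fails."""
    if dest_surl in storage:
        raise FileExistsError(dest_surl)
    guid = f"guid:{uuid.uuid4()}"
    storage[dest_surl] = data              # step 1: copy into storage
    try:
        catalog.register(guid, dest_surl, lfn)  # step 2: register the copy
    except Exception:
        del storage[dest_surl]             # roll back step 1
        raise
    return guid


storage: Dict[str, bytes] = {}
catalog = ToyCatalog()
guid = copy_and_register(b"events", storage, catalog,
                         "sfn:se.example.org/data/events", "lfn:events")
```

After any call, either both the stored copy and the catalog entry exist or neither does, so the storage and catalog contents stay consistent.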

  47. The Replica Manager Interface: Catalog Commands (1)
     • registerFile (args: source, lfn)
        • Register a file in the Replica Catalog that is already stored on a Storage Element.
     • unregisterFile (args: source, guid)
        • Unregister a file from the Replica Catalog.
     • listReplicas (args: lfn/surl/guid)
        • List all replicas of a file.
     • registerGUID (args: surl, guid)
        • Register an SURL with a known GUID in the Replica Catalog.
     • listGUID (args: lfn/surl)
        • Print the GUID associated with an LFN or SURL.

  48. The Replica Manager Interface: Catalog Commands (2)
     • addAlias (args: guid, lfn)
        • Add a new alias (LFN) to a GUID mapping.
     • removeAlias (args: guid, lfn)
        • Remove an alias (LFN) from a known GUID.
     • printInfo()
        • Print the information needed by the Replica Manager to the screen or to a file.
     • getVersion()
        • Get the version of the Replica Manager client.
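The semantics of the catalog commands can be modelled with two dictionaries, one playing the role of the RLS (GUID to SURLs) and one the role of the RMC (LFN to GUID). The class below is a hypothetical teaching model whose method names merely echo the commands; it is not the Replica Manager client API, and the GUID and SURLs are invented.

```python
from typing import Dict, List


class ToyReplicaCatalog:
    """Toy model of the GUID/LFN/SURL mappings behind the catalog commands."""

    def __init__(self) -> None:
        self._surls: Dict[str, List[str]] = {}  # GUID -> SURLs (RLS role)
        self._aliases: Dict[str, str] = {}      # LFN -> GUID (RMC role)

    def register_guid(self, surl: str, guid: str) -> None:
        """Register an SURL with a known GUID."""
        replicas = self._surls.setdefault(guid, [])
        if surl not in replicas:
            replicas.append(surl)

    def add_alias(self, guid: str, lfn: str) -> None:
        """Add a new LFN alias for an existing GUID."""
        if guid not in self._surls:
            raise KeyError(guid)
        self._aliases[lfn] = guid

    def remove_alias(self, guid: str, lfn: str) -> None:
        """Remove an LFN alias from a known GUID."""
        if self._aliases.get(lfn) != guid:
            raise KeyError(lfn)
        del self._aliases[lfn]

    def list_guid(self, name: str) -> str:
        """Return the GUID associated with an LFN or SURL."""
        if name.startswith("lfn:"):
            return self._aliases[name]
        for guid, replicas in self._surls.items():
            if name in replicas:
                return guid
        raise KeyError(name)

    def list_replicas(self, name: str) -> List[str]:
        """List all replicas of a file given its LFN, SURL or GUID."""
        guid = name if name.startswith("guid:") else self.list_guid(name)
        return list(self._surls[guid])


toy = ToyReplicaCatalog()
toy.register_guid("sfn:se1.example.org/data/f", "guid:0001")
toy.register_guid("sfn:se2.example.org/data/f", "guid:0001")
toy.add_alias("guid:0001", "lfn:my-file")
replicas = toy.list_replicas("lfn:my-file")
```

Looking up by LFN goes through the alias table first and then the location table, which is why removing an alias leaves the replicas themselves untouched.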

  49. The Replica Manager Interface: File Transfer Commands
     • copyFile (args: source, dest)
        • Copy a file to a non-grid destination.
     • listDirectory (args: dir)
        • List the directory contents on an SRM or a GridFTP server.

  50. Main Logical Machine Types (Services) in LCG-2 (overview diagram of slide 12 repeated)
