
Loon on GRID

Presentation Transcript


  1. Loon on GRID – Nick West

  2. The GRID: Why Bother?
  • After all, we have the resources we need
  • Farms at FNAL, RAL, Cambridge, …
  • Why complicate things?
  • Because it is the future! Like it or not.
  • Farms are merging with the GRID.
  • And the future is getting closer!
  • RAL claim GRID-only access to their farm by the end of this year.
  • In the UK we don't have a choice
  • Not so much jumping, more like being pushed!

  3. The GRID is …
  • Exciting!
  • The prospect of huge resources around the world…
  • Frustrating!
  • Not a single coherent entity.
  • Instead multiple, overlapping, conflicting, evolving systems, with documentation to match!
  • As a group manager I still have a very poor idea of what my responsibilities are, who provides support and what support is needed.
  • As a GRID user, when things go wrong (and they do) I still have little idea how to figure out what has gone wrong and who I should report it to.
  • A quote from a GRID expert: “Welcome to the Grid. :-/ There are a lot of things in grid-dom that one would assume should be automatic, and one would be wrong. Just remember the axiom: if it sounds too good to be true, it probably is.”
  • Acronym Heaven! Just a few: BDII CA Castor CE dCache DPM GIIS GLUE GOC GRIS GSIDCAP GSIFTP GSIRFIO GUID IS LFC LDAP LFN MDS Proxy RA RB RFIO R-GMA SE SRM SURL TURL UI VO VOMS

  4. A Bit of GRID History (and some more acronyms)
  • Globus
  • The earliest Grid tools: basic security (digital certificates), remote job submission and distributed information about resources. Still the foundation of the current Grid.
  • Condor
  • Originally allowed the use of otherwise idle time on machines. Now used in job submission systems.
  • Global Grid Forum (GGF)
  • Worldwide body aiming to standardise GRID services and protocols. Work in progress!
  • European DataGrid (EDG)
  • EU-funded project based at CERN which ran from 2001-04. Produced a useful, widespread Grid. Strongly coupled to HEP.
  • Enabling Grids for E-sciencE (EGEE)
  • Much larger successor to EDG for scientific research in Europe, with links to projects in many other countries. Runs 2004-08. Likely to be succeeded by a permanent organisation to operate the Grid.
  • The LHC Computing Grid (LCG/LCG-2)
  • CERN project for the LHC experiments, which are due to start taking data in 2007. Now in its second phase, LCG-2.
  • There is a strong, complex and often confusing relationship between LCG and EGEE. Both are based at CERN, with much overlap but distinct goals: LCG extends support in HEP while EGEE serves a wider community.
  • The Open Science Grid (OSG)
  • US equivalent of EGEE, and provides most of the US computing resources for LCG. It runs much of the same middleware as EGEE, and many things are similar, but there are also substantial differences. There is currently a lot of work in progress to make the two systems interoperable in a transparent way.
  • Much of this is cribbed from: http://www.gridpp.ac.uk/deployment/users/basics.html

  5. VO/VOMS
  • Virtual Organisation (VO)
  • The key to resource allocation: resources are allocated to VOs. Individuals join a VO and can then store data and run jobs.
  • Virtual Organisation Membership Service (VOMS)
  • An authentication service: the list of VO users authorised to use VO resources comes from the VOMS and is propagated to the resources.
  • The MINOS VO
  • Currently MINOS uses the LCG VOMS server at Manchester, with two sites: RAL Tier 1 and Tier 2.
  • Only one active member so far – guess who?
  • A couple of weeks ago, out of the blue, got a third site: Department of Physics, Purdue University, West Lafayette, IN. Who ordered that?
  • It turns out that Fermilab runs an OSG VOMS and has a MINOS VO!
  • There's almost nothing to stop anyone from using any name!
  • We are trying to reconcile. The plan is to switch to Fermilab.
  • Will see how well the LCG and OSG VOMS interoperate!
  • In order to control resource access we have two subgroups: ukminos and usminos.
  • Allows us to limit access, site by site, to one or both groups (see the sketch below).
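
  A minimal sketch of how a user would pick up the group-level rights described above, assuming the standard LCG user tools are installed on the UI. The group paths /minos/ukminos and /minos/usminos simply follow the subgroup names on this slide and would need to be checked against the real VOMS configuration.

    # Create a VOMS proxy carrying the ukminos group
    # (group path assumed from the slide, not confirmed against the server).
    voms-proxy-init -voms minos:/minos/ukminos

    # Inspect the proxy: the attributes listed should include the requested group.
    voms-proxy-info -all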

  6. Grid: Some Key Concepts (slide shows a diagram of UI, RB, FC, SE and CE with the main data flows)
  • User Interface (UI): GRID job submission; the Job Description File starts here.
  • Resource Broker (RB): determines the best CE and submits the job to it; looks up data locations in the File Catalogue.
  • File Catalogue (FC): holds the location of data files.
  • Storage Element (SE): mass storage + protocols for secure access.
  • Computing Element (CE): head node with installed software, plus Worker Nodes (WN) and a batch system.
  • Main data flows run between the SE and the worker nodes; log files are returned to the user. (See the job submission sketch below.)
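
  To make the flow above concrete, here is a minimal sketch of a submission from the UI using the LCG-2 era edg-job-* commands. The file names run_loon.sh and loon.jdl are illustrative, and the JDL is the bare minimum rather than the exact files used for the tests described later.

    # loon.jdl - a minimal Job Description File (names are placeholders).
    cat > loon.jdl <<'EOF'
    Executable          = "run_loon.sh";
    StdOutput           = "loon.out";
    StdError            = "loon.err";
    InputSandbox        = {"run_loon.sh"};
    OutputSandbox       = {"loon.out", "loon.err"};
    VirtualOrganisation = "minos";
    EOF

    # Submit via the Resource Broker, which picks a matching CE;
    # the job identifier is stored in the file 'jobids'.
    edg-job-submit -o jobids loon.jdl

    # Follow the job and, once it has finished, retrieve the
    # OutputSandbox (the log files shown in the diagram).
    edg-job-status -i jobids
    edg-job-get-output -i jobids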

  7. Installing minossoft on the GRID
  • The Standard Model
  • Each CE (Computing Element) has:
  • A head node on which software is installed
  • WNs (worker nodes) that can access it
  • Installation should be done remotely using standard job submission.
  • RSD (Remote Software Deployment)
  • A system I have written (perl + sh scripts); see the sketch below.
  • ASSEMBLE a set of tar files, including RSD itself, in a web-visible directory.
  • LAUNCH a job from the UI.
  • INSTALL on the remote head node by bootstrapping first RSD and then all the software from the web.
  • Have used it to install R1.18 and R1.21 on the RAL Tier 1 and to run simple loon jobs.
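
  RSD itself is not documented in these slides, so the following is only a sketch of the bootstrap idea under stated assumptions: the job's executable is a small shell script that pulls RSD from the web-visible directory and lets RSD fetch the rest. The URL and file names are placeholders, not the real RSD layout.

    #!/bin/sh
    # Hypothetical bootstrap executable, run on the CE head node via
    # normal GRID job submission. All names and URLs are placeholders.
    RSD_URL="http://www.example.ac.uk/minos/rsd"   # assumed web-visible directory

    # 1. Bootstrap: fetch and unpack the RSD tar file itself.
    wget -q "$RSD_URL/rsd.tar.gz"
    tar xzf rsd.tar.gz

    # 2. RSD (perl + sh scripts) then pulls the remaining tar files for the
    #    chosen release (e.g. R1.21) from the same directory and installs
    #    them on the head node, where the worker nodes can access them.
    #    (The actual RSD invocation is omitted; its interface is not shown here.)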

  8. Grid Data Files
  • Grid Data Files are:
  • Registered in the LFC (LCG File Catalogue)
  • Stored in SEs (Storage Elements)
  • May be replicated, so that data for a CE can come from a nearby SE.
  • Access via the LFC
  • Files have an LFN (Logical File Name) – a unix-like directory structure.
  • From the LFN one can get a SURL (Storage URL) for each replica.
  • Can copy the file out, or access it directly if the SE supports it.
  • MINOS SE at RAL
  • dCache with 0.1 TB disk and 10 TB tape.
  • Have run a GRID loon job passing an LFN data file (see the sketch below):
  • Used the LFC to map LFN → SURL → dCache URL.
  • Directly accessed from loon using the dcap library (not ROOT's TDCacheFile – still working on that).
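
  A sketch of the LFN → SURL → dCache URL chain described above, using standard LCG data-management commands. The LFC host, the LFN, the srm-to-dcap rewrite and the loon command line are all assumptions: the actual values and the port/path mapping depend on the site.

    # Point the client tools at the file catalogue (host name is a placeholder).
    export LFC_HOST=lfc.example.ac.uk

    # List the replicas of a logical file name; each replica is a SURL (srm://...).
    lcg-lr --vo minos lfn:/grid/minos/far/R1.21/some_file.root

    # For a dCache SE the SURL can typically be rewritten to a dcap URL, e.g.
    #   srm://se.example.ac.uk/pnfs/...  ->  dcap://se.example.ac.uk:22125/pnfs/...
    # (the exact port and path mapping are site dependent).

    # Pass the dcap URL straight to loon, which opens it with the dcap library
    # (command line is illustrative only).
    loon -bq mymacro.C dcap://se.example.ac.uk:22125/pnfs/example.ac.uk/data/minos/some_file.root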

  9. Database Access
  • At least for now, MySQL access is fine.
  • The standard operating system is Scientific Linux, so the MySQL client is available.
  • At RAL CEs (Tier 1 and Tier 2)
  • Within the RAL firewall, so can access the RAL DB.
  • At the FNAL CE
  • Use the FNAL DB.
  • What about other sites? (See the sketch below.)
  • For low demand
  • In Europe: RAL would consider allowing off-site read access.
  • In the US: FNAL already allows off-site read access.
  • For high demand
  • Request access to a local server and use DBMauto.
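
  A minimal sketch of the database end of a GRID job, assuming the usual MINOS DatabaseInterface environment variables (ENV_TSQL_URL, ENV_TSQL_USER, ENV_TSQL_PSWD); the host names, account and variable names are placeholders/assumptions and should be checked against the offline software documentation.

    # Quick connectivity check from a worker node, using the MySQL client
    # that ships with Scientific Linux (host and account are placeholders).
    mysql -h dbserver.example.ac.uk -u reader -p -e 'SHOW DATABASES;'

    # Point the job at the nearest database server before running loon
    # (variable names assumed from MINOS DatabaseInterface conventions).
    export ENV_TSQL_URL="mysql://dbserver.example.ac.uk/offline"
    export ENV_TSQL_USER="reader"
    export ENV_TSQL_PSWD="readerpass"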
