
Caltech and CMS Grid Work Overview Koen Holtman Caltech/CMS May 22, 2002

Presentation Transcript


  1. Caltech and CMS Grid Work Overview Koen Holtman Caltech/CMS May 22, 2002

  2. CMS distributed computing • CMS wanted to build a distributed computing system all along! • CMS CTP (Dec 1996): • One integrated computing system with a single global view of the data • Used by the 1000s of CMS collaborators around the world • We now call this the `CMS Data Grid System'

  3. PPDG: Mission-Oriented Pragmatic Methodology • End-to-end integration and deployment of experiment applications using existing and emerging Grid services • Deployment of Grid technologies and services in production (24x7) environments • With stressful performance needs • Collaborative development of Grid middleware and extensions between application and middleware groups • Leading to pragmatic and acceptable-risk solutions. • HENP experiments extend their adoption of common infrastructures to higher layers of their data analysis and processing applications. • Much attention to integration, coordination, interoperability and interworking • With emphasis on incremental deployment of increasingly functional working systems

  4. CMS Grid Requirements • Major Grid requirements effort completed • Document writing by Caltech group • Catania CMS week Grid workshop (June 2001, about 12 hours over various sessions) • CMS consensus on many strategic issues • Division of labor between Grid projects and CMS Computing group • Needed for planning, manpower estimates • Grid job execution model • Grid data model, replication model • Object handling and the Grid • Main Grid Requirements Document (2003 CMS data grid system vision, 28 pages): CMS Data Grid System Overview and Requirements, CMS Note 2001/037, http://kholtman.home.cern.ch/kholtman/cmsreqs.pdf • Additional documents on object views, hardware sizes, workload model, data model (K. Holtman), CMS Note 2001/047

  5. Objects and Files in the Grid • CMS computing is object-oriented and database-oriented • Fundamentally we have a persistent data model with 1 object = 1 piece of physics data (KB-MB size) • Much of the thinking in the Grid projects and Grid community is file-oriented • `Computer center' view of large applications • Do not look inside application code • Think about application needs in terms of CPU batch queues, disk space for files, file staging and migration • How to reconcile this? • CMS requirements 2001-2003: • Grid project components do not need to deal with objects directly • Specify file handling requirements in such a way that a CMS layer for object handling can be built on top (see the sketch below) • LCG Project (SC2, PEB) has started to develop a new object handling layer
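
A minimal sketch (in Python, with invented names, not CMS code) of the kind of object-to-file mapping a CMS persistency layer could keep on top of purely file-oriented Grid services: objects are addressed by (dataset, event), and the only thing exposed to the Grid layer is the list of files that hold them.

```python
# Hypothetical object catalog: maps logical objects onto Grid-managed files.
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

@dataclass
class ObjectCatalog:
    """(dataset, event number) -> (logical file name, key inside the file)."""
    index: Dict[Tuple[str, int], Tuple[str, str]] = field(default_factory=dict)

    def register(self, dataset: str, event: int, lfn: str, key: str) -> None:
        self.index[(dataset, event)] = (lfn, key)

    def files_for_events(self, dataset: str, events: List[int]) -> List[str]:
        """All the Grid layer needs to know: which files to stage or replicate."""
        return sorted({self.index[(dataset, e)][0] for e in events})

catalog = ObjectCatalog()
catalog.register("jets_1033", 1, "lfn://cms/jets_1033/run01.db", "evt0001")
catalog.register("jets_1033", 2, "lfn://cms/jets_1033/run01.db", "evt0002")
print(catalog.files_for_events("jets_1033", [1, 2]))
# -> ['lfn://cms/jets_1033/run01.db']
```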

  6. Grid Services for CMS: Division of Labor (CMS Week, June 2001)
  Provided by CMS: • Mapping between objects and files (persistency layer) • Local and remote extraction and packaging of objects to/from files • Consistency of software configuration for each site • Configuration meta-data for each sample • Aggregation of sub-jobs • Policy for what we want to do (e.g. priorities for what to run first, the production manager) • Some error recovery too
  Not needed by 2003: • Auto-discovery of arbitrary identical/similar samples
  Needed from somebody: • Tool to implement common CMS configuration on remote sites?
  Provided by the Grid: • Distributed job scheduler: if a file is remote the Grid will run appropriate CMS software (often remotely; split over systems) • Resource management, monitoring, and accounting tools and services • Query estimation tools (to what depth?) • Resource optimisation with some user hints / control (coherent management of local copies, replication, caching) • Transfer of collections of data • Error recovery tools (from e.g. job/disk crashes) • Location information of Grid-managed files • File management such as creation, deletion, purging, etc. • Remote virtual login and authentication / authorisation
  (An interface sketch of this split follows below.)
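
To make the split concrete, here is a hedged interface sketch (hypothetical names, reusing the toy ObjectCatalog from the previous sketch): the Grid side exposes only file-level operations, and the CMS layer does all object-level work before and after calling them.

```python
# Illustrative boundary between CMS-provided and Grid-provided services.
from typing import List, Protocol

class GridFileServices(Protocol):      # the Grid projects' side of the split
    def locate(self, lfn: str) -> List[str]: ...            # replica locations
    def replicate(self, lfn: str, dest_site: str) -> None: ...
    def submit_job(self, site: str, command: str) -> str: ...

class CMSObjectLayer:                  # the CMS side of the split
    def __init__(self, grid: GridFileServices, catalog) -> None:
        self.grid, self.catalog = grid, catalog

    def analyze(self, dataset: str, events: List[int], site: str) -> str:
        # 1. CMS decides which files hold the requested objects (persistency layer).
        lfns = self.catalog.files_for_events(dataset, events)
        # 2. Grid services locate/replicate files and run the job near the data.
        for lfn in lfns:
            if site not in self.grid.locate(lfn):
                self.grid.replicate(lfn, site)
        # 3. Object extraction and packaging happen inside the CMS executable.
        return self.grid.submit_job(site, f"cms_analyze --dataset {dataset}")
```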

  7. GriPhyN/PPDG Architecture (Ian Foster, Carl Kesselman, Mike Wilde, others) • Layer diagram with components: Application; Planner; Executor; Catalog Services (MCAT, GriPhyN catalogs); Monitoring; Info Services (MDS); Replica Management; Policy/Security (GSI, CAS); Reliable Transfer Service; Compute Resource; Storage Resource • Legend marks components for which an initial solution is operational

  8. CMS Production: Worldwide Production at 21 Sites • Per-site status table for the common production tools (IMPALA), Simulation, Digitization (no pile-up / with pile-up), and GDMP, with each entry marked fully operational or in progress • Sites shown: CERN, FNAL, Moscow, INFN (10), Caltech, UCSD, UFL, Imperial College, Bristol, Wisconsin, IN2P3, Helsinki

  9. Data Produced in 2001
  Typical event sizes • Simulated: 1 CMSIM event = 1 OOHit event = 1.4 MB • Reconstructed: 1 “1033” event = 1.2 MB, 1 “2x1033” event = 1.6 MB, 1 “1034” event = 5.6 MB
  Objectivity data (total = 29 TB): CERN 14 TB, FNAL 12 TB, Caltech 0.60 TB, Moscow 0.45 TB, INFN 0.40 TB, Bristol/RAL 0.22 TB, UCSD 0.20 TB, IN2P3 0.10 TB, UFL 0.08 TB, Wisconsin 0.05 TB, Helsinki -
  Simulated events (total = 8.4 M): Caltech 2.50 M, FNAL 1.65 M, Bristol/RAL 1.27 M, CERN 1.10 M, INFN 0.76 M, Moscow 0.43 M, IN2P3 0.31 M, Helsinki 0.13 M, Wisconsin 0.07 M, UCSD 0.06 M, UFL 0.05 M

  10. GDMP • Tool to transfer and manage files in production • Easy to handle this manually with a few centers, impossible with lots of data at many centers • GDMP is based around Globus middleware and a flexible architecture • Globus Replica Catalogue • Integration with ENSTORE, HPSS, Castor tape systems • Provided an early model of collaboration between HEP and Grid middleware providers • Successfully used to replicate > 1 TB of CMS data • Now a PPDG/EU DataGrid joint project • Authors: Caltech, CERN/CMS, FNAL, CERN/IT; PPDG, GriPhyN, EU DataGrid WP2 • Reference: Grid Data Management Pilot (GDMP): A Tool for Wide Area Replication, Applied Informatics Conference (AI2001), Innsbruck, Austria, 2/2001 • (A simplified replication sketch follows below.)
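
The following is a simplified, illustrative sketch of the publish/subscribe replication cycle GDMP performs around a replica catalogue; it is not the real GDMP API, and the no-op transfer stands in for a GridFTP transfer.

```python
# Toy replica catalogue plus a one-shot subscriber synchronization pass.
from typing import Dict, List, Set

class ReplicaCatalog:
    """Logical file name -> set of sites holding a physical copy."""
    def __init__(self) -> None:
        self.entries: Dict[str, Set[str]] = {}

    def publish(self, lfn: str, site: str) -> None:
        self.entries.setdefault(lfn, set()).add(site)

def synchronize(catalog: ReplicaCatalog, subscriber_site: str, transfer) -> List[str]:
    """Pull every published file the subscriber site does not yet hold."""
    fetched = []
    for lfn, sites in list(catalog.entries.items()):
        if subscriber_site not in sites:
            source = next(iter(sites))
            transfer(lfn, source, subscriber_site)   # e.g. a GridFTP copy
            catalog.publish(lfn, subscriber_site)    # record the new replica
            fetched.append(lfn)
    return fetched

# Usage: a production site publishes a new file, a Tier-2 subscriber pulls it.
rc = ReplicaCatalog()
rc.publish("lfn://cms/2001/oohits_00123.db", "CERN")
print(synchronize(rc, "Caltech", transfer=lambda f, src, dst: None))
# -> ['lfn://cms/2001/oohits_00123.db']
```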

  11. PPDG MOP system • PPDG developed the MOP system • Allows submission of CMS production jobs from a central location, running them at remote locations, and returning the results • Relies on GDMP for replication • Globus GRAM • Condor-G and local queuing systems for job scheduling • IMPALA for job specification • Shown in the SC2001 demo • Now being deployed in the USCMS testbed • Proposed as basis for the next CMS-wide production infrastructure • (A submission-flow sketch follows below.)
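
A rough sketch of the submission flow, with hypothetical helpers (run_cmsim.sh, the gatekeeper name, and the event counts are all invented): an IMPALA-like step splits an assignment into sub-jobs, and each sub-job is handed to Condor-G, which forwards it to a remote Globus GRAM gatekeeper. The submit-file syntax approximates the Condor-G "globus universe" of that era; results would then be published back with GDMP as in the previous sketch.

```python
# MOP-style flow sketch: split an assignment, submit each piece via Condor-G.
import subprocess
from pathlib import Path

def make_subjobs(n_events: int, events_per_job: int = 500):
    """IMPALA's role in this sketch: chop the assignment into sub-jobs."""
    n_jobs = (n_events + events_per_job - 1) // events_per_job
    return [(i, min(events_per_job, n_events - i * events_per_job)) for i in range(n_jobs)]

def write_submit_file(job_id: int, n_events: int, gatekeeper: str) -> Path:
    text = "\n".join([
        "universe        = globus",
        f"globusscheduler = {gatekeeper}",   # e.g. host/jobmanager-condor at the remote site
        "executable      = run_cmsim.sh",    # hypothetical wrapper around CMSIM
        f"arguments       = --job {job_id} --events {n_events}",
        f"output = job_{job_id}.out",
        f"error  = job_{job_id}.err",
        f"log    = job_{job_id}.log",
        "queue", ""])
    path = Path(f"job_{job_id}.sub")
    path.write_text(text)
    return path

for job_id, n_events in make_subjobs(2000):
    sub = write_submit_file(job_id, n_events, "tier2.example.edu/jobmanager-condor")
    subprocess.run(["condor_submit", str(sub)], check=True)  # Condor-G hands the job to GRAM
```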

  12. US CMS Prototypes and Test-beds • All U.S. CMS S&C Institutions are involved in DOE and NSF Grid Projects • Integrating Grid software into CMS systems • Bringing CMS Production onto the Grid • Understanding the operational issues • MOP used as first pilot application • MOP system got an official CMS production assignment of 200K CMSIM events • 50K have been produced and registered already

  13. Installing middleware • Virtual Data Toolkit • Globus 2.0 beta • Essential Grid Tools • Essential Grid Services I & II • Grid API • Condor-G 6.3.1 • Condor 6.3.1 • ClassAds 0.9 • GDMP 3.0 alpha 3 • We found the VDT to be very easy to install, but a little bit more challenging to configure

  14. Prototype VDG System (production) • Architecture diagram of the prototype virtual data grid system for production, showing: Abstract Planner; Concrete Planner / WP1; Executor (MOP / WP1); Local Tracking DB (BOSS); wrapper scripts around CMKIN, CMSIM and ORCA/COBRA; Compute Resource; Storage Resource (Local Grid Storage); Catalog Services: Virtual Data Catalog (RefDB), Materialized Data Catalog (Objectivity metadata catalog), Replica Catalog; Replica Management (GDMP); User

  15. Prototype VDG System (production), annotated • Same component diagram as the previous slide, with a legend marking each component as: no code yet / existing / implemented using MOP • (A planning sketch for the virtual-data step follows below.)
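
A conceptual sketch of the virtual-data planning step in such a system (names and interfaces invented, not the prototype's actual code): an abstract request for a dataset becomes a concrete plan that either fetches an existing replica from the materialized data catalog or re-runs the recorded derivation from the virtual data catalog.

```python
# Toy abstract-to-concrete planning over materialized and virtual data catalogs.
from dataclasses import dataclass
from typing import Dict, List

@dataclass
class Derivation:
    transformation: str            # e.g. "CMKIN" or "CMSIM + digitization"
    inputs: List[str]              # logical datasets the transformation consumes

materialized: Dict[str, List[str]] = {"minbias_hits": ["CERN", "FNAL"]}
virtual: Dict[str, Derivation] = {
    "jets_gen": Derivation("CMKIN", []),
    "jets_digis": Derivation("CMSIM + digitization", ["jets_gen"]),
}

def concrete_plan(dataset: str) -> List[str]:
    """Use a replica if one exists; otherwise recursively derive the data."""
    sites = materialized.get(dataset)
    if sites:
        return [f"fetch {dataset} from {sites[0]}"]
    step = virtual[dataset]
    plan: List[str] = []
    for needed in step.inputs:          # materialize the inputs first
        plan += concrete_plan(needed)
    plan.append(f"run {step.transformation} to materialize {dataset}")
    return plan

print(concrete_plan("jets_digis"))
# -> ['run CMKIN to materialize jets_gen',
#     'run CMSIM + digitization to materialize jets_digis']
```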

  16. Analysis part • Physics data analysis will be done by 100s of users • Caltech is taking responsibility for developing the analysis part of the vertically integrated system • The analysis part is connected to the same catalogs • Maintain a global view of all data • Big analysis jobs can use the production job handling mechanisms • Analysis services based on tags

  17. Optimization of “Tag” Databases • Tags are small (~0.2-1 kB) summary objects for each event • Crucial for fast selection of interesting event subsets; this will be an intensive activity • Past work concentrated in three main areas: • Integration of CERN’s “HepODBMS” generic Tag system with the CMS “COBRA” framework • Investigations of Tag bitmap indexing to speed up queries • Comparisons of OO and traditional databases (SQL Server, soon Oracle 9i) as efficient stores for Tags • New work concentrates on tag-based analysis services (a bitmap-selection sketch follows below)
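
As an illustration of why bitmap-style indexing over tags is attractive, here is a small sketch using NumPy boolean arrays as stand-in bitmaps (the tag fields and cut values are invented for the example): each cut is a one-bit-per-event array, and combining cuts is a cheap bitwise AND that yields the event ids to fetch from the full store.

```python
# Tag-based event selection with bitmap-style boolean indexes (toy example).
import numpy as np

n_events = 1_000_000
rng = np.random.default_rng(0)
tags = {                                   # ~0.2-1 kB of summary data per event
    "n_jets": rng.integers(0, 10, n_events),
    "missing_et": rng.exponential(20.0, n_events),
    "trigger_bit": rng.integers(0, 2, n_events).astype(bool),
}

# One "bitmap" per cut: a boolean array with one entry per event.
bm_jets = tags["n_jets"] >= 4
bm_met = tags["missing_et"] > 50.0
bm_trig = tags["trigger_bit"]

selected = np.flatnonzero(bm_jets & bm_met & bm_trig)   # combine cuts bitwise
print(f"{selected.size} events pass; first ids: {selected[:5]}")
# Only these event ids need to be read back from the full event store.
```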

  18. CLARENS: a Portal to the Grid • Grid-enabling the working environment for non-specialist physicists' data analysis • Clarens consists of a server communicating with various clients via the commodity XML-RPC protocol; this ensures implementation independence • The server is implemented in C++ to give access to the CMS OO analysis toolkit • The server will provide a remote API to Grid tools: • Security services provided by the Grid (GSI) • The Virtual Data Toolkit: object collection access • Data movement between Tier centers using GSI-FTP • CMS analysis software (ORCA/COBRA) • Current prototype is running on the Caltech proto-Tier2 • More information at http://clarens.sourceforge.net, along with a web-based demo • (A minimal XML-RPC client sketch follows below.)
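
A minimal client-side illustration of the commodity XML-RPC pattern Clarens relies on, using Python's standard library; the server URL and the catalog method are hypothetical (the real Clarens API and its GSI-based authentication are not shown here).

```python
# Hypothetical XML-RPC client talking to a Clarens-like analysis server.
import xmlrpc.client

server = xmlrpc.client.ServerProxy("http://proto-tier2.example.edu:8080/clarens")

# Standard XML-RPC introspection (works if the server enables it).
print(server.system.listMethods())

# Invented remote call standing in for an object-collection/catalog query.
print(server.catalog.list_datasets("jets"))
```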

  19. Globally Scalable Monitoring Service (CMS: Caltech and Pakistan) • Diagram: Farm Monitors register with Lookup Services (via discovery, registration, and a proxy); a client or other service locates the RC Monitor Service through lookup • RC Monitor Service provides: Component Factory, GUI marshaling, Code Transport, RMI data access • Farm data gathered via push & pull, rsh & ssh, existing scripts, SNMP

  20. Current events • GDMP and MOP just had very favorable internal reviews in PPDG • Testbed: currently MOP deployment under way • Stresses the Grid middleware in new ways: new issues and bugs being discovered in Globus, Condor • Testbed MOP production request: • 200K CMSIM events requested, now 50K (~10 GB) finished and validated. • New fully integrated system: first versions expected by summer • System will be the basis for demos at SC2002 • Upcoming: CMS workshop on Grid based production (CERN) • Upcoming: PPDG analysis workshop (Berkeley)

  21. 2000 - 2001 • Main `Grid task' activities in 2000 - 2001: • Ramp-up of Grid projects, establish a new mode of working • Grid project requirements documents, architecture • GDMP • Started as a griddified package for data transport in CMS production, is now a more generic project • Used widely in 2001 production • Also a demo of the mode of working • MOP • Vertical integration of CMS production software, GDMP, Condor • Both GDMP and MOP just had very successful internal reviews in PPDG

  22. 2002 • Grid task main activities (in US) in 2002: • Build USCMS test grid • Deploy Globus 2.0, EU DataGrid components • Use MOP as a basis for developing a larger vertically integrated system with • Virtual data features • Central catalogs and a global view of data • Production facilities • Participate in real CMS production with non-trivial jobs • Analysis facilities • Caltech team's main role is towards analysis facilities

  23. Summary: 2000 - 2002 • Main `Grid task' activities in 2000 - 2001: • Grid project requirements documents, architecture • GDMP • MOP • Main `Grid task' activities (in US) in 2002: • Build USCMS test grid • Deploy Globus 2.0, EU DataGrid components • Use MOP as a basis for developing a larger vertically integrated system with • Virtual data features • Central catalogs and a global view of data • Production facilities • Participate in real CMS production • Analysis facilities
