
Virtual Data In CMS Production

Virtual Data In CMS Production. A. Arbree, P. Avery, D. Bourilkov, R. Cavanaugh, S. Katageri, G. Graham, J. Rodriguez, J. Voeckler, M. Wilde. CMS & GriPhyN, Conference in High Energy Physics 2003, UC San Diego.


Presentation Transcript


  1. Virtual Data In CMS Production
     A. Arbree, P. Avery, D. Bourilkov, R. Cavanaugh, S. Katageri, G. Graham, J. Rodriguez, J. Voeckler, M. Wilde
     CMS & GriPhyN, Conference in High Energy Physics 2003, UC San Diego

  2. Virtual Data Motivations in Production
     • Data track-ability and result audit-ability
       - universally sought by scientists
     • Facilitates tool and data sharing and collaboration
       - data can be sent along with its recipe
       - the recipe is useful in searching for data
     • Workflow management
       - a new, structured paradigm for organizing, locating, and specifying data products
     • Performance optimizations
       - ability to delay execution planning until as late as possible
     CHEP 2003

  3. Initial CMS Production tests using the Chimera Virtual Data System
     • Motivation
       - simplify CMS production in a Grid environment
       - evaluate the current state of Virtual Data technology
       - understand issues related to the provenance of CMS data
     • Use-case
       - implement a simple 5-stage CMS production pipeline on the US CMS Test Grid
     • Solution
       - wrote an interface between Chimera and the CMS production software
       - wrote a simple grid scheduler
       - ran sample simulations to evaluate the system

  4. What is a DAG?
     A DAG (Directed Acyclic Graph) is the data structure used to represent job dependencies (e.g. Job A, Job B, Job C, Job D with edges between them).
     • Each job is a "node" in the DAG.
     • Each node can have any number of "parent" or "child" nodes, as long as there are no loops!
     • We usually talk about workflow in units of "DAGs".
     (Picture taken from Peter Couvares)
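The "no loops" rule above is what makes a DAG executable at all: a valid run order exists exactly when the dependency graph is acyclic. A minimal sketch in Python (job names and the diamond-shaped A/B/C/D structure are taken from the slide; the function name is illustrative, not part of any CMS tool):

```python
# Kahn's algorithm: produce a runnable job order, detecting loops.
from collections import deque

def topo_order(children):
    """Return an execution order for the DAG, or raise on a cycle."""
    indeg = {job: 0 for job in children}
    for kids in children.values():
        for k in kids:
            indeg[k] += 1
    ready = deque(j for j, d in indeg.items() if d == 0)  # jobs with no parents
    order = []
    while ready:
        job = ready.popleft()
        order.append(job)
        for k in children[job]:
            indeg[k] -= 1          # one parent of k has finished
            if indeg[k] == 0:
                ready.append(k)    # all parents done: k is runnable
    if len(order) != len(children):
        raise ValueError("dependency loop: not a DAG")
    return order

# A -> B, A -> C, B -> D, C -> D, as in the slide's picture.
dag = {"A": ["B", "C"], "B": ["D"], "C": ["D"], "D": []}
print(topo_order(dag))  # ['A', 'B', 'C', 'D']
```

DAGMan performs essentially this bookkeeping for real jobs: a node is submitted only once all of its parents have completed.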

  5. Example CMS Data/Workflow
     [Diagram: data flows from Generator through Formator, Simulator, and Digitiser, then through writeESD/writeAOD/writeTAG steps into the ODBMS, which the Analysis Scripts read; a Calibration DB feeds the chain, and a second writeESD/writeAOD/writeTAG pass appears for reprocessing.]

  6. Data/workflow is a collaborative endeavour!
     [Same diagram, annotated with the teams responsible for each part: the MC Production Team (generation and simulation), the Online Teams, a (Re)processing Team around the ODBMS, and the Physics Groups running the Analysis Scripts.]

  7. A Simple CMS Production 5-Stage Workflow Use-case
     • CMKIN: events are generated (Pythia), producing .ntpl files.
     • CMSIM: the detector's response is simulated for each event (GEANT3), producing .fz files.
     • OOHITS: events are reformatted and written into a database (Event Database).
     • OODIGI: the original events are digitised and reconstructed.
     • NTUPLE: the reconstructed data is reduced and written to a flat .ntpl file.

  8. 2-stage DAG Representation of the 5-stage Use-case
     Responsibility of a Workflow Generator: creates the abstract plan.
     • The Fortran job wraps the CMKIN and CMSIM stages (.ntpl in, .fz out).
     • The DB job wraps the OOHITS, OODIGI, and NTUPLE stages (Event DB in, .ntpl out).
     Initially a simple script was used to generate Virtual Data Language (VDL); McRunJob is now used to generate the workflow in VDL (see the talk by G. Graham).
     This structure was used to enforce policy constraints on the workflow (e.g. an Objectivity/DB licence is required for the DB stages).
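The point of wrapping five stages into two DAG nodes is that a policy constraint can be attached to a node as a unit. A hedged sketch of that grouping (all names and the dictionary layout are illustrative; this is not the real McRunJob or VDL interface):

```python
# Group the 5 pipeline stages into the 2 DAG nodes from the slide,
# attaching the licence requirement to the database node as a whole.
STAGES = ["CMKIN", "CMSIM", "OOHITS", "OODIGI", "NTUPLE"]

def group_stages(stages):
    fortran = [s for s in stages if s in ("CMKIN", "CMSIM")]
    db = [s for s in stages if s not in fortran]
    return {
        "Fortran": {"stages": fortran, "requires": []},
        "DB": {"stages": db,
               "requires": ["objectivity_licence"],  # policy constraint
               "parents": ["Fortran"]},              # DB runs after Fortran
    }

plan = group_stages(STAGES)
print(plan["DB"]["stages"])  # ['OOHITS', 'OODIGI', 'NTUPLE']
```

A planner can then restrict the DB node to sites holding the licence, without inspecting the individual stages inside it.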

  9. Mapping Abstract Workflows onto Concrete Environments
     • Abstract DAGs (the virtual workflow), "build" style:
       - resource locations unspecified
       - file names are logical
       - data destinations unspecified
     • Concrete DAGs (ready for submission), "make" style:
       - resource locations determined
       - physical file names specified
       - data delivered to and returned from physical locations
     Pipeline: VDL, via XML, into the VDC; an abstract plan (logical DAX); the Replica Catalog (RC) and Concrete Planner; a physical DAG handed to DAGMan.
     In general there is a range of planning steps between abstract workflows and concrete workflows.
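The abstract-to-concrete step can be sketched as a lookup against a replica catalog: logical file names are resolved to physical locations, and in "make" style a derivation whose outputs already exist is pruned rather than re-run. All names (catalog contents, URLs, the `concretize` function) are invented for illustration:

```python
# Toy replica catalog: logical file name -> physical location.
replica_catalog = {"run42.ntpl": "gsiftp://host.example/store/run42.ntpl"}

def concretize(node, outputs, style="make"):
    """Return a concrete node dict, or None if the node can be pruned."""
    existing = [f for f in outputs if f in replica_catalog]
    if style == "make" and len(existing) == len(outputs):
        return None  # all outputs already have replicas: skip re-derivation
    return {"node": node,
            "outputs": {f: replica_catalog.get(f, f"file:///scratch/{f}")
                        for f in outputs}}

print(concretize("CMKIN", ["run42.ntpl"]))           # pruned -> None
print(concretize("CMKIN", ["run42.ntpl"], "build"))  # "build": always re-derive
```

This is the sense in which "build" recomputes everything from the recipe, while "make" reuses whatever derived data the catalog already knows about.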

  10. Concrete DAG Representation of the CMS Pipeline Use-case
      Responsibility of the Concrete Planner:
      • binds the job nodes (Fortran, DB) to physical grid sites
      • queries the Replica and Transformation Catalogs for file existence and location
      • dresses job nodes with stage-in/stage-out nodes
      Each job node becomes the chain: Stage File In, Execute Job, Stage File Out, Register File.
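The "dressing" step above expands one logical job into a small chain of grid operations. A minimal sketch, with hypothetical node labels matching the slide's diagram:

```python
# Expand one job node into stage-in / execute / stage-out / register nodes.
def dress(job, inputs, outputs):
    """Return the linear chain of grid operations for one job node."""
    chain = []
    chain += [("stage_in", f) for f in inputs]    # fetch input replicas
    chain.append(("execute", job))                # run the wrapped job
    chain += [("stage_out", f) for f in outputs]  # ship results off the site
    chain += [("register", f) for f in outputs]   # record new replicas
    return chain

print(dress("DB", ["events.fz"], ["run42.ntpl"]))
```

In the real system these extra nodes are themselves DAG nodes, so a failed transfer can be retried independently of the (expensive) execute step.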

  11. Default middleware configuration (from the Virtual Data Toolkit)
      • Submit host: Chimera, Condor-G, and DAGMan
      • Remote host: gatekeeper, gahp_server, a local scheduler (Condor, PBS, etc.), and the compute machines

  12. Modified middleware configuration (to enable massive CMS production workflows)
      • Submit host: McRunJob (generic workflow generator) with RefDB, plus Chimera, WorkRunner, Condor-G, and DAGMan
      • Remote host: gatekeeper, gahp_server, a local scheduler (Condor, PBS, etc.), and the compute machines

  13. Modified middleware configuration (continued)
      RefDB, the CMS metadata catalog:
      • contains parameter/cards files
      • contains production requests
      • contains production status
      • etc.
      (See Veronique Lefebure's talk on RefDB.)

  14. Modified middleware configuration (continued)
      McRunJob, the CMS workflow generator (Linker, VDL Generator, VDL Config, and RefDB modules):
      • constructs the production workflow from a request in the RefDB
      • writes the workflow description in VDL (via ScriptGen)
      (See Greg Graham's talk on McRunJob.)

  15. Modified middleware configuration (continued)
      WorkRunner, the workflow grid scheduler (Chimera interface, Condor-G monitor, and job-tracking modules):
      • a very simple placeholder (due to the lack of an interface to a resource broker)
      • submits Chimera workflows based on simple job-monitoring information from Condor-G

  16. Modified middleware configuration (complete)
      The full submit-host chain: RefDB feeds McRunJob, which writes VDL for Chimera; Condor-G and DAGMan submit the resulting DAGs, with WorkRunner driving submission. The remote host runs the gatekeeper, gahp_server, a local scheduler (Condor, PBS, etc.), and the compute machines.

  17. Initial Results
      • Production test results:
        - 678 DAGs (250 events each)
        - 167,500 test events computed (not delivered to CMS)
        - 350 CPU-days on 25 dual-processor Pentium (1 GHz) machines over 2 weeks of wall-clock time
        - 200 GB of simulated data
      • Problems: 8 failed DAGs
      • Cause: pre-emption by another user

  18. Initial Results (continued)
      • Scheduling test results:
        - 5954 DAGs (1 event each, not used by CMS)
        - 300 CPU-days on 145 CPUs across 6 sites:
          University of Florida: USCMS cluster (8), HCS cluster (64), GriPhyN cluster (28);
          University of Wisconsin, Milwaukee: CS Dept. cluster (30);
          University of Chicago: CS Dept. cluster (5);
          Argonne National Lab: DataGrid cluster (10)
      • Problems: 395 failed DAGs
      • Causes:
        - failure to post final data from the UF GriPhyN cluster (200-300 DAGs)
        - a Globus bug: 1 DAG in 50 fails when communication is lost
      • Primarily limited by the performance of lower-level grid middleware

  19. The Value of Virtual Data
      • Provides full reproducibility (fault tolerance) of one's results:
        - tracks ALL dependencies between transformations and their derived data products
        - something like a "virtual logbook"
        - records the provenance of data products
      • Provides transparency with respect to location and existence; the user need not know:
        - where the data is located
        - how many files are in a data set
        - whether the requested derived data already exists
      • Allows for optimal performance in planning. Should the derived data be:
        - staged in from a remote site (send the job to the data, or send the data to the job)?
        - re-created locally on demand?
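The stage-in versus re-create choice in the last bullet is, at bottom, a cost comparison. A deliberately toy sketch of that decision (the cost model, function name, and numbers are all assumptions, not anything from the CMS system):

```python
# Compare estimated transfer time against estimated re-derivation time.
def plan_access(size_gb, bandwidth_gb_per_h, cpu_hours_to_derive,
                exists_remotely=True):
    """Decide how to obtain a derived data product."""
    if not exists_remotely:
        return "re-create locally"           # nothing to transfer
    transfer_h = size_gb / bandwidth_gb_per_h
    if transfer_h < cpu_hours_to_derive:
        return "stage in from remote site"   # moving the data is cheaper
    return "re-create locally"               # re-running the recipe is cheaper

# 200 GB at 10 GB/h (20 h transfer) vs. a very expensive derivation:
print(plan_access(200, 10, 8400))  # 'stage in from remote site'
print(plan_access(200, 10, 1))     # 're-create locally'
```

Because virtual data records the full recipe, "re-create locally" is always a legal answer, which is exactly what gives the planner this freedom.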

  20. Summary: Grid Production of CMS Simulated Data
      • CMS production of simulated data (to date): O(10) sites, O(1000) CPUs, O(100) TB of data, O(10) production managers
      • The goal is to double every year, without increasing the number of production managers, so more automation will be needed for the upcoming Data Challenges!
      • Virtual Data provides:
        - part of the abstraction required for automation and fault tolerance
        - mechanisms for data provenance (important for search engines)
      • Virtual Data technology is "real" and maturing, but still in its childhood:
        - much of the functionality already exists
        - it still requires placeholder components for intelligent planning and optimisation
      CHEP 2003
