Federica Fanzago, INFN Padova
CRAB: a tool for CMS distributed analysis in grid environment
Introduction
CMS (“Compact Muon Solenoid”) is one of the four particle physics experiments that will collect data at the LHC (“Large Hadron Collider”), starting in 2007 at CERN
CMS will produce a large amount of data (events) that must be made available for analysis to physicists distributed worldwide
A “bunch crossing” every 25 ns
100 “triggers” per second
Each triggered event ~1 MB in size
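As a rough back-of-the-envelope check, the figures above can be turned into rates. This is only a sketch: it takes 1 MB as 10^6 bytes, and the 10^7 seconds of data taking per year is an assumption made for illustration, not a number from this talk.

```python
# Figures quoted on the slide
BUNCH_SPACING_S = 25e-9   # one bunch crossing every 25 ns
TRIGGER_RATE_HZ = 100     # triggered events kept per second
EVENT_SIZE_MB = 1.0       # approximate size of one triggered event

# ASSUMPTION (not from the talk): seconds of data taking per year
LIVE_SECONDS_PER_YEAR = 1e7

crossing_rate_hz = 1.0 / BUNCH_SPACING_S              # crossings per second
kept_fraction = TRIGGER_RATE_HZ / crossing_rate_hz    # fraction of crossings kept
data_rate_mb_s = TRIGGER_RATE_HZ * EVENT_SIZE_MB      # MB written per second
yearly_volume_pb = data_rate_mb_s * LIVE_SECONDS_PER_YEAR / 1e9  # MB -> PB

print(f"crossing rate : {crossing_rate_hz:.3g} Hz")
print(f"kept fraction : {kept_fraction:.3g}")
print(f"data rate     : {data_rate_mb_s:.0f} MB/s")
print(f"yearly volume : {yearly_volume_pb:.2f} PB (under the assumed live time)")
```

With these inputs the crossing rate is about 40 MHz and the triggered data rate is about 100 MB/s; the yearly volume scales linearly with whatever live time is assumed.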
How to manage and where to store this huge quantity of data?
How to assure data access to physicists of CMS collaboration?
How to have enough computing power for processing and data analysis?
How to ensure resources and data availability?
How to define local and global policy about data access and resources?
CMS will use a distributed architecture based on grid infrastructure
Tools for accessing distributed data and resources are provided by WLCG (Worldwide LHC Computing Grid), in two main flavours:
LCG/gLite in Europe, OSG in the US
[Figure: tiered computing model, including the CERN computer center]
The CMS offline computing system is arranged in four Tiers and is geographically distributed
The user writes his own analysis code and configuration parameter card
Starting from CMS-specific analysis software
Builds executable and libraries
He applies the code to a given number of events, whose location is known, splitting the load over many jobs
But generally he is allowed to access only local data
He writes wrapper scripts and uses a local batch system to exploit all the computing power
This is comfortable as long as the data you are looking for sit right beside you
Then he submits everything by hand and checks the status and overall progress
Finally he collects all output files and stores them somewhere
Distributed analysis is a more complex computing task because it assumes knowledge of:
which data are available
where data are stored and how to access them
which resources are available and are able to comply with analysis requirements
grid and CMS infrastructure details
But users do not want to deal with these kinds of problems
Users want to analyze data in “a simple way”, as in a local environment
To allow analysis in a distributed environment, the CMS collaboration is developing tools interfaced with grid services, which include:
Installation of CMS software via grid on remote resources
Data transfer service: to move and manage a large flow of data among tiers
Data validation system: to ensure data consistency
Data location system: to keep track of the data available at each site and to allow data discovery, composed of
Central database (RefDB) that knows which data (datasets) have been produced in each Tier
Local database (PubDB) in each Tier, with info about where data are stored and their access protocol
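The two-level lookup described above can be sketched as follows. This is a minimal illustration only: the dictionaries stand in for the real RefDB and PubDB databases, and every dataset name, site name, protocol and path below is a made-up placeholder.

```python
# Hypothetical stand-in for the central RefDB: dataset -> sites that produced it
REFDB = {
    "dataset_A": ["site_x", "site_y"],
    "dataset_B": ["site_y"],
}

# Hypothetical stand-in for the per-site PubDBs: site -> dataset -> access info
PUBDBS = {
    "site_x": {"dataset_A": {"protocol": "example-protocol", "path": "/store/dataset_A"}},
    "site_y": {"dataset_B": {"protocol": "example-protocol", "path": "/data/dataset_B"}},
}


def discover(dataset, refdb, pubdbs):
    """Return {site: access_info} for every site that actually publishes the dataset."""
    locations = {}
    # Step 1: ask the central catalogue which sites should hold the dataset
    for site in refdb.get(dataset, []):
        # Step 2: ask that site's local catalogue where the data are and how to read them
        info = pubdbs.get(site, {}).get(dataset)
        if info is not None:
            locations[site] = info
    return locations


print(discover("dataset_A", REFDB, PUBDBS))
```

In this toy data, site_y is listed centrally for dataset_A but does not publish it locally, so it is dropped from the result; this is the kind of mismatch a two-step lookup can expose.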
CRAB: CMS Remote Analysis Builder
CRAB is a user-friendly tool that aims to simplify the work of users with no knowledge of grid infrastructure when they create, submit and manage analysis jobs in grid environments.
Written in Python and installed on the UI (grid user access point)
Users have to develop their analysis code in an interactive environment and decide which data to analyse.
They have to provide to CRAB:
Dataset name, number of events
Analysis code and parameter card
Output files and handling policy
CRAB handles data discovery, resource availability, job creation and submission, status monitoring and output retrieval
Job creation: crab -create N (or all)
data discovery: sites storing data are found by querying RefDB and local PubDBs
packaging of user code: creation of a tgz archive with user code (bin, lib and data)
wrapper script (sh) for the real user executable
JDL file, script which drives the real job towards the “grid”
splitting: according to user request (number of events per job and in total)
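The splitting step above can be sketched as a small helper that turns "total events" and "events per job" into per-job event ranges. This is an illustrative sketch, not CRAB's actual code; the function name and the `first_event` parameter are hypothetical.

```python
def split_events(total_events, events_per_job, first_event=0):
    """Split a range of events into consecutive (start, count) chunks, one per job.

    The last job receives the remainder when total_events is not a
    multiple of events_per_job.
    """
    if total_events < 0:
        raise ValueError("total_events must be non-negative")
    if events_per_job <= 0:
        raise ValueError("events_per_job must be positive")

    jobs = []
    start = first_event
    end = first_event + total_events
    while start < end:
        count = min(events_per_job, end - start)
        jobs.append((start, count))
        start += count
    return jobs


# Example: 2500 events in jobs of at most 1000 events each
for job_number, (start, count) in enumerate(split_events(2500, 1000), start=1):
    print(f"job {job_number}: first event {start}, {count} events")
```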
Job submission: crab -submit N (or all) -c
jobs are submitted to the Resource Broker using BOSS, the submitter and tracking tool interfaced with CRAB
jobs are sent to those sites which host data
Job monitoring: crab -status (n_of_job)
the status of all submitted jobs is checked using BOSS
Job output management: crab -getoutput (n_of_job)
at the user's request, CRAB can
copy the output files back to the UI ...
... or copy them to a Storage Element
Job resubmission: crab -resubmit n_of_job
if a job suffers a grid failure (aborted or cancelled status)
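The resubmission rule above amounts to a filter on job status. A minimal sketch, assuming the statuses are available as plain strings keyed by job number (the exact status strings reported by BOSS or the grid may differ, and the function name is hypothetical):

```python
# Statuses treated as grid failures, as named on the slide.
# ASSUMPTION: lower-case spellings; real status strings may differ.
GRID_FAILURE_STATES = {"aborted", "cancelled"}


def jobs_to_resubmit(statuses):
    """Return the sorted job numbers whose status indicates a grid failure.

    statuses: dict mapping job number -> status string
    """
    return sorted(
        job
        for job, status in statuses.items()
        if status.strip().lower() in GRID_FAILURE_STATES
    )


example = {1: "Done", 2: "Aborted", 3: "Running", 4: "Cancelled"}
print(jobs_to_resubmit(example))
```

Each job number returned here would then be passed to the resubmit command shown above.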
Used by tens of users to access remote MC data for Physics TDR analysis
~7000 Datasets available for O(10^8) total events, full MC production
CMS users, via CRAB, use two dedicated Resource Brokers (at CERN and at CNAF) that know all CMS sites
CRAB proves that CMS users are able to use available grid services and that the full analysis chain works in a distributed environment!
[Figures: top 20 CEs; top 20 dataset/owner pairs requested by users]
CRAB is currently used to analyse data for the CMS Physics TDR (being written now…)
More than 300,000 jobs were submitted to the grid using CRAB during the second half of last year, by 40-50 users.
CRAB was born in April ’05
A big effort has been made to understand user needs and how best to use the services provided by the grid
A lot of work has been done to make it robust, flexible and reliable
Users appreciate the tool and are asking for further improvements
CRAB has been used by many CMS collaborators to analyze remote data for CMS Physics TDR, otherwise not accessible
CRAB is used to continuously test the CMS Tiers, to prove the robustness of the whole infrastructure
The use of CRAB proves that the complete computing chain for distributed analysis works for a generic CMS user!
[Plot: weekly rate of the CRAB job flow (# of jobs), from 10-07-05 to 22-01-06]
[Plot: % of jobs which arrive at the WN (remote CE) and run]