
XXII International Symposium on Nuclear Electronics and Computing, Varna, Sep 6-13, 2009

ATLAS Distributed Computing

Computing Model, Data Management, Production System, Distributed Analysis, Information System, Monitoring

Alexei Klimentov

Brookhaven National Laboratory


Introduction

  • The title that Vladimir gave me cannot be covered in 20 minutes.

  • I’ll talk about the Distributed Computing components, but I am certainly biased, as any operations person is.



ATLAS Collaboration

  • 6 continents
  • 37 countries
  • 169 institutions
  • 2800 physicists
  • 700 students
  • >1000 technical and support staff

Albany, Alberta, NIKHEF Amsterdam, Ankara, LAPP Annecy, Argonne NL, Arizona, UT Arlington, Athens, NTU Athens, Baku, IFAE Barcelona, Belgrade, Bergen, Berkeley LBL and UC, HU Berlin, Bern, Birmingham, Bogotá, Bologna, Bonn, Boston, Brandeis, Bratislava/SAS Kosice, Brookhaven NL, Buenos Aires, Bucharest, Cambridge, Carleton, Casablanca/Rabat, CERN, Chinese Cluster, Chicago, Chilean Cluster (Santiago+Valparaiso), Clermont-Ferrand, Columbia, NBI Copenhagen, Cosenza, AGH UST Cracow, IFJ PAN Cracow, DESY, Dortmund, TU Dresden, JINR Dubna, Duke, Frascati, Freiburg, Geneva, Genoa, Giessen, Glasgow, Göttingen, LPSC Grenoble, Technion Haifa, Hampton, Harvard, Heidelberg, Hiroshima, Hiroshima IT, Indiana, Innsbruck, Iowa SU, Irvine UC, Istanbul Bogazici, KEK, Kobe, Kyoto, Kyoto UE, Lancaster, UN La Plata, Lecce, Lisbon LIP, Liverpool, Ljubljana, QMW London, RHBNC London, UC London, Lund, UA Madrid, Mainz, Manchester, Mannheim, CPPM Marseille, Massachusetts, MIT, Melbourne, Michigan, Michigan SU, Milano, Minsk NAS, Minsk NCPHEP, Montreal, McGill Montreal, FIAN Moscow, ITEP Moscow, MEPhI Moscow, MSU Moscow, Munich LMU, MPI Munich, Nagasaki IAS, Nagoya, Naples, New Mexico, New York, Nijmegen, BINP Novosibirsk, Ohio SU, Okayama, Oklahoma, Oklahoma SU, Oregon, LAL Orsay, Osaka, Oslo, Oxford, Paris VI and VII, Pavia, Pennsylvania, Pisa, Pittsburgh, CAS Prague, CU Prague, TU Prague, IHEP Protvino, Regina, Ritsumeikan, UFRJ Rio de Janeiro, Rome I, Rome II, Rome III, Rutherford Appleton Laboratory, DAPNIA Saclay, Santa Cruz UC, Sheffield, Shinshu, Siegen, Simon Fraser Burnaby, SLAC, Southern Methodist Dallas, PNPI St.Petersburg, Stockholm, KTH Stockholm, Stony Brook, Sydney, AS Taipei, Tbilisi, Tel Aviv, Thessaloniki, Tokyo ICEPP, Tokyo MU, Toronto, TRIUMF, Tsukuba, Tufts, Udine/ICTP, Uppsala, Urbana UI, Valencia, UBC Vancouver, Victoria, Washington, Weizmann Rehovot, FH Wiener Neustadt, Wisconsin, Wuppertal, Yale, Yerevan


Necessity of Distributed Computing?

  • ATLAS will collect RAW data at 320 MB/s for 50k seconds/day and ~100 days/year

    • RAW data: 1.6 PB/year (see the arithmetic sketch after this list)

  • Processing (and re-processing) these events will require ~10k CPUs full time in the first year of data-taking, and considerably more in later years as data accumulate

  • Reconstructed events will also be large, as people want to study detector performance as well as do physics analysis using the output data

    • ESD data: 1.0 PB/year, AOD data: 0.1 PB/year

  • At least 10k CPUs are also needed for continuous simulation production, at a rate of at least 30% of the real data rate, and for analysis

  • There is no way to concentrate all needed computing power and storage capacity at CERN

    • The LEP model will not scale to this level

  • The idea of distributed computing, and later of the computing grid, became fashionable at the turn of the century and looked promising when applied to HEP experiments’ computing needs
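As a sanity check on the data-volume figures above, a minimal back-of-the-envelope sketch in Python (decimal units assumed, i.e. 1 PB = 10^9 MB):

# Reproduce the 1.6 PB/year RAW figure from the nominal slide numbers:
# 320 MB/s, 50k live seconds per day, ~100 days of data-taking per year.
RAW_RATE_MB_S = 320
LIVE_SECONDS_PER_DAY = 50_000
DAYS_PER_YEAR = 100

raw_mb_per_year = RAW_RATE_MB_S * LIVE_SECONDS_PER_DAY * DAYS_PER_YEAR
raw_pb_per_year = raw_mb_per_year / 1e9   # 1 PB = 1e9 MB (decimal units)

print(f"RAW: {raw_pb_per_year:.1f} PB/year")   # -> RAW: 1.6 PB/year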


Computing Model : Main Operations

Tier-0 (CERN):
  • Copy RAW data to the CERN Castor Mass Storage System tape for archival
  • Copy RAW data to Tier-1s for storage and reprocessing
  • Run first-pass calibration/alignment (within 24 hrs)
  • Run first-pass reconstruction (within 48 hrs)
  • Distribute reconstruction output (ESDs, AODs & TAGs) to Tier-1s

Tier-1s:
  • Archive a fraction of RAW data
  • (Re)run calibration and alignment
  • Re-process data with better calibration/alignment and/or algorithms
  • Distribute derived data (ESD, AOD, TAG) to Tier-2s
  • Run HITS reconstruction and large-scale event selection and analysis jobs

Tier-2s (36 Tier-2s, ~80 sites):
  • Run MC simulation
  • Keep AOD and TAG for the analysis
  • Run analysis jobs
  • Calibration at 5 dedicated sites in Europe and the US

Tier-3s (O(100) sites worldwide):
  • Contribute to MC simulation
  • User analysis

Incomplete list of data formats:
  • ESD : Event Summary Data
  • AOD : Analysis Object Data
  • DPD : Derived Physics Data
  • TAG : event meta-information
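The tier roles above imply a simple placement policy for the main data formats. A rough illustrative sketch (the mapping and helper are mine, not an ATLAS tool):

# Which tiers are expected to keep a copy of each format, per the model above.
PLACEMENT = {
    "RAW": ["Tier-0 tape", "Tier-1 (archived fraction)"],
    "ESD": ["Tier-1"],
    "AOD": ["Tier-1", "Tier-2"],
    "TAG": ["Tier-1", "Tier-2"],
}

def destinations(fmt):
    """Return the tiers expected to hold a copy of the given data format."""
    return PLACEMENT.get(fmt.upper(), [])

print(destinations("AOD"))   # ['Tier-1', 'Tier-2']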


ATLAS Grid Sites and Data Distribution

  • 3 Grids, 10 Tier-1s, ~80 Tier-2(3)s
  • A Tier-1 and its associated Tier-n sites form a cloud; ATLAS clouds have from 2 to 15 sites. We also have T1-T1 associations.
  • ATLAS Tier-1 data shares, per MoU and Computing Model (examples):
    • IN2P3 : RAW, ESD 15%; AOD, DPD, TAG 100%
    • BNL : RAW 24%; ESD, AOD, DPD, TAG 100%
    • FZK : RAW, ESD 10%; AOD, DPD, TAG 100%
  • [World map: the Tier-0 at CERN and the Tier-1s (e.g. ASGC, IN2P3, BNL, FZK), with the US Tier-2s (AGLT2, MWT2, NET2, SWT2, SLAC); input rate estimation for Tier-1s; data export from CERN; reprocessed and MC data distribution]
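To make the cloud notion concrete, a small illustrative sketch; the site lists are partial examples taken from the slide, and the real cloud definitions live in the ATLAS configuration:

# A "cloud" = a Tier-1 plus its associated Tier-2/3 sites (illustrative data).
CLOUDS = {
    "US": {"tier1": "BNL",   "tier2": ["AGLT2", "MWT2", "NET2", "SWT2", "SLAC"]},
    "FR": {"tier1": "IN2P3", "tier2": []},   # associated Tier-2s omitted here
    "DE": {"tier1": "FZK",   "tier2": []},
    "TW": {"tier1": "ASGC",  "tier2": []},
}

def cloud_of(site):
    """Return the cloud a site belongs to (as Tier-1 or associated Tier-2)."""
    for name, cloud in CLOUDS.items():
        if site == cloud["tier1"] or site in cloud["tier2"]:
            return name
    return None

print(cloud_of("MWT2"))   # US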


Ubiquitous Wide Area Network Bandwidth

  • The first Computing TDRs assumed that network bandwidth would be insufficient

  • The MONARC project proposed the multi-Tier model with this in mind

  • Today network bandwidth is the least of our problems

  • But we still have the Tier model in the LHC experiments

  • The network is not yet ideal in all parts of the world (the “last mile”)

  • LHCOPN provides an excellent backbone for the Tier-0 and the Tier-1s

  • Each LHC experiment has adopted it differently

K.Bos. “Status and Prospects of The LHC Experiments Computing”. CHEP’09


Distributed Computing Components

  • The ATLAS Grid architecture is based on:

    • Distributed Data Management (DDM)

    • Distributed Production System (ProdSys, PanDA)

    • Distributed Analysis (DA), GANGA, PanDA

    • Monitoring

    • Grid Information System

    • Accounting

    • Networking

    • Databases


ATLAS Distributed Data Management. 1/2

  • DQ2 is the second generation of the ATLAS DDM system

  • DQ2 is built on top of Grid data transfer tools

    • Moved to a dataset-based approach

      • Datasets : an aggregation of files plus associated DDM metadata

      • A dataset is the unit of storage and replication

      • Automatic data transfer mechanisms using distributed site services

        • Subscription system

        • Notification system

  • Technicalities:

    • Global services

      • dataset repository

      • dataset location catalog

      • logical file names only, no global physical file catalog

    • Local site services (Local File Catalog)

      • It provides the logical-to-physical file name mapping (a toy sketch follows below).
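A minimal toy sketch of the DQ2 concepts just described (my own classes, not the real DQ2 API): datasets as the unit of replication, a central location catalog, and a per-site catalog resolving logical to physical file names:

from dataclasses import dataclass, field

@dataclass
class Dataset:
    name: str
    files: list                              # logical file names (LFNs)
    metadata: dict = field(default_factory=dict)

class LocationCatalog:
    """Central catalog: which sites hold replicas of a dataset."""
    def __init__(self):
        self._locations = {}
    def add_replica(self, dataset, site):
        self._locations.setdefault(dataset, set()).add(site)
    def sites(self, dataset):
        return self._locations.get(dataset, set())

class LocalFileCatalog:
    """Per-site catalog: logical file name (LFN) -> physical file name (PFN)."""
    def __init__(self, site):
        self.site = site
        self._pfn = {}
    def register(self, lfn, pfn):
        self._pfn[lfn] = pfn
    def resolve(self, lfn):
        return self._pfn[lfn]

# A dataset "subscription" to a site would trigger the site services to
# transfer the missing files and then register the new replica centrally:
ds = Dataset("data09.cosmics.RAW", files=["RAW._0001.data", "RAW._0002.data"])
catalog = LocationCatalog()
catalog.add_replica(ds.name, "BNL_DATADISK")
print(catalog.sites(ds.name))   # {'BNL_DATADISK'}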


ATLAS Distributed Data Management. 2/2

  • [Plot: data export from CERN to the Tiers, daily average throughput in MB/s vs days of running, with the STEP09 exercise visible]
  • [Plot: replication of reprocessed datasets between Tier-1s, latency ΔT [hours] = T_last_file_transfer − T_subscription]
    • 99% of the data were transferred within 4 hours
    • One dataset was not replicated after 3 days (latency in reprocessing or a site issue)
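A minimal sketch of how the latency metric above could be computed; the timestamps are invented and the code is illustrative, not the DDM monitoring code:

from datetime import datetime, timedelta

# (dataset, subscription time, last file transfer time) -- invented values
transfers = [
    ("ds1", datetime(2009, 6, 1, 10, 0), datetime(2009, 6, 1, 11, 30)),
    ("ds2", datetime(2009, 6, 1, 10, 0), datetime(2009, 6, 1, 16, 45)),
]

# dT = T_last_file_transfer - T_subscription, per dataset
latencies = {name: done - subscribed for name, subscribed, done in transfers}
within_4h = sum(dt <= timedelta(hours=4) for dt in latencies.values())
print(f"{100 * within_4h / len(latencies):.0f}% replicated within 4 hours")   # 50%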


ATLAS Production System 1/2

  • Manages ATLAS simulation (full chain) and reprocessing jobs on the wLCG

    • Task request interface to define a related group of jobs

    • Input : DQ2 dataset(s) (with the exception of some event generation)

    • Output : DQ2 dataset(s) (the jobs are done only when the output is at the Tier-1)

    • To cope with temporary site problems, jobs are allowed several attempts (see the retry sketch after this list)

    • Job definition and attempt state are stored in Production Database (Oracle DB)

    • Jobs are supervised by ATLAS Production System

  • The system consists of many components:

    • DDM/DQ2 for data management

    • PanDA task request interface and job definitions

    • PanDA for job supervision

    • ATLAS Dashboard and PanDA monitor for monitoring

    • Grid middleware

    • ATLAS software
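A toy sketch of the retry behaviour referred to above (my own code, not ProdSys): each attempt is recorded with the job definition, and a job is only done once its output dataset is at the Tier-1:

from dataclasses import dataclass, field

MAX_ATTEMPTS = 3    # assumed limit; the real value is an operational choice

@dataclass
class Job:
    job_id: int
    input_dataset: str
    output_dataset: str
    attempts: list = field(default_factory=list)   # outcomes of past attempts
    state: str = "defined"

def record_attempt(job, outcome, output_at_tier1):
    """Update the job state after one attempt on a site."""
    job.attempts.append(outcome)
    if outcome == "finished" and output_at_tier1:
        job.state = "done"            # output dataset is at the Tier-1
    elif len(job.attempts) >= MAX_ATTEMPTS:
        job.state = "aborted"
    else:
        job.state = "retry"           # will be resubmitted, possibly elsewhere

job = Job(1, "mc09.EVNT", "mc09.HITS")
record_attempt(job, "failed", output_at_tier1=False)
print(job.state)   # retry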


ATLAS Production System 2/2

  • Job brokering is done by the PanDA service (bamboo) according to input data and site availability
  • The Production Database holds job definitions, job states and metadata
  • Task request interface: task inputs and outputs are DQ2 datasets; task states are tracked
  • Jobs run on 3 Grids / 10 clouds / 90+ production sites
  • Sites, tasks and jobs are monitored

[Architecture diagram, A.Read, Mar09]
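A hedged sketch of the brokering idea just described, not PanDA's actual algorithm: prefer sites that are available and already hold the input data.

def broker(input_dataset, sites):
    """Pick a site that is online and already holds the input dataset.

    sites: {site name: {"online": bool, "datasets": set of dataset names}}
    """
    candidates = [
        name for name, info in sites.items()
        if info["online"] and input_dataset in info["datasets"]
    ]
    return min(candidates) if candidates else None   # trivial tie-break

sites = {
    "BNL": {"online": True,  "datasets": {"mc09.EVNT"}},
    "FZK": {"online": False, "datasets": {"mc09.EVNT"}},
}
print(broker("mc09.EVNT", sites))   # BNL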


Data Processing Cycle

  • Data processing at CERN (Tier-0 processing)

    • First-pass processing of the primary event stream

    • The derived datasets (ESD, AOD, DPD, TAG) are distributed from the Tier-0 to the Tier-1s

    • RAW data (received from the Event Filter farm) are exported within 24 h; this is why first-pass processing can also be done by the Tier-1s (though this possibility was not used during the LHC beam and cosmic-ray runs)

  • Data reprocessing at Tier-1s

    • 10 Tier-1 centers worldwide; each takes a subset of the RAW data (Tier-1 shares range from 5% to 25%). The ATLAS production facilities at CERN can be used in case of emergency.

    • Each Tier-1 reprocesses its share of RAW data. The derived datasets are distributed ATLAS-wide.

See P.Nevski’s talk “LHC Computing” at NEC2009


ATLAS Data Simulation and Reprocessing

  • [Plot: running jobs, Sep 08 – Sep 09, with reprocessing campaigns marked]
  • The Production System is in continuous operation
  • 10 clouds use LFC as the file catalog and PanDA as the job executor
  • CPUs are underutilized on average; the peak rate is 33k jobs/day
  • ProdSys can produce 100 TB/week of MC
  • The average walltime efficiency is over 90%
  • The system does data simulation and data reprocessing


ATLAS Distributed Analysis

  • ATLAS jobs go to the data
  • Probably the most important area at this point
  • It depends on a functional data management and job management system
  • Two widely used distributed analysis tools (Ganga and pathena) capture the great majority of users
  • We expect the usage to grow substantially in the preparation for, and especially during, the 2009/10 run
  • Present/traditional use cases: AOD/DPD analysis is clearly very important
    • But jobs also run over selected RAW data (for detector debugging, studies, etc.)

J.Elmsheuser, Sep09


ATLAS Grid Information System (AGIS)

  • The overall purpose of the ATLAS Grid Information System is to store and expose the static, dynamic and configuration parameters needed by ATLAS Distributed Computing (ADC) applications. AGIS is a database-oriented system.

  • The first AGIS proposal came from G.Poulard; the pioneering work was done by R.Pezoa and R.Rocha in summer 2008, together with the definition of the basic design principles implemented in the ‘dashboards’. Development is now led by the ATLAS BINP group.

  • Today’s situation :

    • Various configuration parameters and information about available resources and services, their status and properties, are extracted from different sources or defined in different configuration files (sometimes Grid information is even hard-coded in application programs).


AGIS Architecture Overview

  • The system architecture should make it possible to add new classes of information or site configuration parameters, to reconfigure the ATLAS cloud topology and production queues, and to add or modify user information.

  • AGIS is an Oracle-based information system.

  • AGIS stores, as read-only, data extracted from external databases (e.g. OIM, GOCDB, BDII), together with ADC configuration information, which can be modified.

  • The synchronization of AGIS content with the external sources will be done by agents (data providers); the agents will access the databases via standard interfaces.
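An assumed sketch of such a data-provider agent; the table layout, function names and the SQLite stand-in for Oracle are illustrative only:

import sqlite3    # stand-in for the real Oracle backend

def fetch_external_sites():
    """Placeholder for a query to an external source (OIM/GOCDB/BDII-like)."""
    return [("CERN-PROD", "active"), ("BNL-ATLAS", "active")]

def sync_sites(conn):
    """Upsert externally provided site records into a read-only AGIS-like table."""
    conn.execute("CREATE TABLE IF NOT EXISTS site (name TEXT PRIMARY KEY, status TEXT)")
    for name, status in fetch_external_sites():
        conn.execute(
            "INSERT INTO site(name, status) VALUES(?, ?) "
            "ON CONFLICT(name) DO UPDATE SET status = excluded.status",
            (name, status),
        )
    conn.commit()

conn = sqlite3.connect(":memory:")
sync_sites(conn)
print(conn.execute("SELECT * FROM site").fetchall())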


AGIS Components

  • [Component diagram showing AGIS and the services it connects to, including ATP and the Logging Service]

A.Anisenkov, D.Krivashin, Sep09


AGIS Information

  • ATLAS clouds, tiers and sites

    • Topology : clouds, tiers, site specifics (e.g. geography, names, etc.)

  • Site Resources and Services information

    • list of resources and services (FTS servers, SRM, LFC)

    • site service properties (name, status, type, endpoints)

  • Site information and configurations

    • available CE and SE information (CPU and disk information, status, available resources)

    • availability and various status information, such as the site status in ATLAS data distribution, Monte Carlo production and Functional Tests, as well as site downtime periods

    • relations to currently running/planned tests, tasks or runs

  • Data replication: site shares and pairing

    • List of activities (e.g. reprocessing), activity start and end time
  • Global configuration parameters needed by ADC applications

  • User-related information (privileges, roles, account info)
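As a rough illustration of the kinds of records listed above, a hypothetical schema sketch (class and field names are assumptions, not the real AGIS schema):

from dataclasses import dataclass, field
from datetime import datetime

@dataclass
class Service:
    kind: str                 # e.g. "FTS", "SRM", "LFC"
    endpoint: str
    status: str = "ok"

@dataclass
class Site:
    name: str
    cloud: str
    tier: int
    services: list = field(default_factory=list)
    downtimes: list = field(default_factory=list)   # (start, end) pairs

@dataclass
class Activity:
    name: str                 # e.g. "reprocessing"
    start: datetime
    end: datetime

site = Site("EXAMPLE-T1", cloud="XX", tier=1,
            services=[Service("SRM", "srm://srm.example.org")])
print(site.cloud, [s.kind for s in site.services])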



ATLAS Distributed Computing Monitoring (Next)

  • Simplify into one monitoring application (where possible)
  • Standardize monitoring messages
    • https://svnweb.cern.ch/trac/dashboard/wiki/WorkInProgress
    • HTTP for transport
    • JSON for data serialization
  • Attempt to have a common (single) dashboard client application
    • Built using the Google Web Toolkit (GWT)
  • Source data exposed directly from its source (like the PanDA database)
    • Avoid aggregation databases like we have today
    • Server-side technology left open

R.Rocha, Sep09
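As an illustration of the standardized message format described above (a JSON payload sent over HTTP), a minimal sketch; the endpoint URL and message fields are invented:

import json
import urllib.request

# A monitoring message serialized as JSON; field names are illustrative.
message = {
    "source": "panda",
    "site": "EXAMPLE-SITE",
    "metric": "running_jobs",
    "value": 1234,
    "timestamp": "2009-09-10T12:00:00Z",
}

request = urllib.request.Request(
    "http://dashboard.example.org/api/messages",   # hypothetical endpoint
    data=json.dumps(message).encode("utf-8"),
    headers={"Content-Type": "application/json"},
    method="POST",
)
# urllib.request.urlopen(request)  # not executed here: the endpoint is fictitious
print(json.dumps(message, indent=2))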


Summary & Conclusions

  • The ATLAS Collaboration has developed a set of software and middleware tools that enable access to data for physics analysis for all members of the collaboration, independently of their geographical location.

  • The main building blocks of this infrastructure are
    • the Distributed Data Management system;
    • Ganga and pathena for distributed analysis on the Grid;
    • the Production System to (re)process and to simulate ATLAS data.

  • Almost all required functionality is already provided and is extensively used for simulated data as well as real data from beam and cosmic-ray events.

  • The Grid Information System technical proposal is finalized, and the system must be in production by the end of the year.

  • Monitoring system standardization is in progress.


МНОГО БЛАГОДАРЮ

(Thank you very much)


Acknowledgements

  • Thanks to A.Anisenkov, D.Barberis, K.Bos, M.Branco, S.Campana, A.Farbin, J.Elmsheuser, D.Krivashin, A.Read, R.Rocha, A.Vaniachine, T.Wenaus, … for pictures and slides used in this presentation.