
ATLAS Computing


Presentation Transcript


  1. ATLAS Computing. XXI International Symposium on Nuclear Electronics & Computing, Varna, Bulgaria, 10-17 September 2007. Alexandre Vaniachine, invited talk for the ATLAS Collaboration.

  2. Outline
  • ATLAS Computing Model
  • Distributed computing facilities:
    • Tier-0 plus Grids: EGEE, OSG, and NDGF
  • Components:
    • Production system (jobs)
    • Data management (files)
    • Databases to keep track of jobs, files, etc.
  • Status:
    • Transition from development to operations
    • Commissioning at the pit and at Tier-0 with cosmics data
    • Commissioning with simulation data by the grid operations teams
    • Distributed analysis of these data by physicists
  • The M4 cosmics run in August validated all of these separate operations

  3. Credits
  • I wish to thank the Symposium organizers for their invitation and hospitality
  • This overview is biased by my own opinions
  • CHEP07 last week simplified my task:
    • I wish to thank my ATLAS collaborators for their contributions
    • I have added references to their CHEP07 contributions
  • All ATLAS CHEP contributions will be published as ATLAS Computing Notes and will serve as a foundation for a collaboration paper on ATLAS Computing, to be prepared toward the end of 2007
    • This paper will provide updates to the ATLAS Computing TDR: http://atlas-proj-computing-tdr.web.cern.ch/atlas-proj-computing-tdr/Html/Computing-TDR.htm

  4. ATLAS Multi-Grid Infrastructure
  • ATLAS computing operates uniformly on three Grids with different interfaces
  • Focus is shifting to physics analysis performance:
    • testing, integration, validation, deployment, documentation
    • then operations!

  5. ATLAS Computing Model: Roles of Distributed Tiers [A. Farbin, CHEP07 id 83]
  • 40+ sites worldwide
  • Reprocessing of the full data a few months after data taking, as soon as improved calibration and alignment constants are available
  • Managed tape access: RAW, ESD
  • Disk access: AOD, a fraction of ESD
  • (a toy lookup-table sketch of this placement policy follows below)
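A minimal illustrative sketch, not ATLAS code: the Tier-1 placement policy from this slide expressed as a simple lookup table. All names are invented for illustration.

```python
# Toy lookup table for the Tier-1 placement policy described on the slide.
TIER1_STORAGE = {
    "RAW": "managed tape access",
    "ESD": "managed tape access, with a fraction on disk",
    "AOD": "disk access",
}

def storage_class(data_format):
    """Return where a Tier-1 keeps a given event-data format."""
    return TIER1_STORAGE.get(data_format, "unknown format")

for fmt in ("RAW", "ESD", "AOD"):
    print(f"{fmt}: {storage_class(fmt)}")
```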

  6. Ongoing Transition from Development to Operations
  • To facilitate operational efficiency, ATLAS computing separates development activities from operations
  • No more “clever developments”: pragmatic addition of vital residual services to support operations
    • such as monitoring tools (critical to operations)
  • ATLAS CHEP07 contributions covered both development reports and operational experience in detail

  7. Keeping Track of Jobs and Files Needs Databases
  • To achieve robust operations on the grids, ATLAS splits data processing tasks over petabytes of event data into smaller units: jobs and files
  • Job: the unit of data-processing workflow management in grid computing, managed by the ATLAS Production System
    • ATLAS job configuration and job completion are stored in prodDB
    • ATLAS jobs are grouped in Tasks: the AKTR DB holds physics tasks
  • File: the unit of data management in grid computing, managed by the ATLAS Distributed Data Management (DDM) System
    • ATLAS files are grouped in Datasets: DDM Central Catalogs
    • AMI, the ATLAS Metadata Information DB, answers “Where is my dataset?”
  • Database Operations will be covered in a separate talk later in this session
  • (a toy sketch of the task/job and dataset/file decomposition follows below)
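A toy sketch, not the ATLAS schema, of the decomposition described above: a task splits into jobs (tracked in prodDB), and output files are grouped into datasets (tracked by DDM). All class and field names are invented.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class File:
    guid: str            # unique file identifier, catalogued by DDM
    dataset: str         # DDM groups files into datasets

@dataclass
class Job:
    job_id: int          # configuration and completion live in prodDB
    status: str = "defined"
    outputs: List[File] = field(default_factory=list)

@dataclass
class Task:
    task_id: int         # physics tasks are defined in the AKTR DB
    jobs: List[Job] = field(default_factory=list)

    def split(self, n_jobs):
        """Split one large task into independently schedulable grid jobs."""
        self.jobs = [Job(job_id=i) for i in range(n_jobs)]

task = Task(task_id=1)
task.split(1000)         # a petabyte-scale task becomes many small jobs
print(len(task.jobs), "jobs defined")
```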

  8. ATLAS Multi-Grid Operations: Architecture and Results
  • Leveraging the underlying database infrastructure, the ATLAS Production System and ATLAS DDM successfully manage the simulations workflow on three production grids: EGEE, OSG, and NDGF
  • Statistics of success come from the ATLAS production database (a toy summary sketch follows below)
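An illustrative sketch of the kind of "statistics of success" summary the production database can answer. The per-grid numbers below are placeholders, not the real 2007 production figures.

```python
job_counts = {            # grid -> (finished, failed); invented numbers
    "EGEE": (120_000, 15_000),
    "OSG":  (80_000,  9_000),
    "NDGF": (40_000,  4_000),
}

for grid, (done, failed) in job_counts.items():
    total = done + failed
    print(f"{grid}: {total} jobs, {done / total:.0%} success")
```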

  9. Leveraging Growing Resources on Three Grids
  • Latest snapshot of ATLAS resources [Yuri Smirnov, CHEP07 talk 184]

  10. CERN to Tier-1 Transfer Rates
  • ATLAS has the largest nominal CERN to Tier-1 transfer rate
  • Tests this spring reached ~75% of the nominal target (see the arithmetic sketch below)
  • Successful use of all ten ATLAS Tier-1 centers
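Illustrative arithmetic behind the "~75% of the nominal target" figure. The nominal aggregate rate used here is a placeholder, not the real ATLAS number.

```python
NOMINAL_MB_S = 1000.0          # assumed aggregate CERN -> Tier-1 target
N_TIER1 = 10                   # all ten ATLAS Tier-1 centers

achieved = 0.75 * NOMINAL_MB_S
per_site = achieved / N_TIER1  # if shared evenly (a simplifying assumption)
print(f"{achieved:.0f} MB/s aggregate, ~{per_site:.0f} MB/s per Tier-1")
```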

  11. Validating the Computing Model with Realistic Data Rates
  • M3 Cosmics Run (mid-July):
    • Cosmics produced about 100 TB in 2 weeks (see the rate arithmetic below)
    • Stressed offline computing by running at 4 times the nominal rate
    • LAr 32-samples test
  • M4 Cosmics Run: August 23 – September 3
    • Metrics for success:
      • full-rate Tier-0 processing
      • data exported to 5 of 10 Tier-1s and stored
      • for 2 of 5 Tier-1s, exports to at least two Tier-2s
      • quasi-real-time analysis in at least one Tier-2
      • reprocessing in September in at least one Tier-1
  • M5 Cosmics Run scheduled for October 16-23
  • M6 Cosmics Run will run from the end of December until real data taking
    • Incremental goals, with reprocessing between runs
    • Will run close to the nominal rate
    • Maybe ~420 TB by the start of the run, plus Monte Carlo
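Illustrative arithmetic for the M3 figure of "about 100 TB in 2 weeks":

```python
volume_tb = 100.0
days = 14.0

tb_per_day = volume_tb / days
mb_per_s = volume_tb * 1e6 / (days * 24 * 3600)  # taking 1 TB = 1e6 MB
print(f"{tb_per_day:.1f} TB/day, roughly {mb_per_s:.0f} MB/s sustained")
```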

  12. Validating the Computing Model with Realistic Data Rates
  • M4 Cosmics Run: August 23 – September 3
  • Raw data distributed in real time from online to Tier-0 and to all Tier-1s
  • The full chain worked with all ten Tier-1s at the target rate
  [Plot: throughput in MB/s from Tier-0 to all Tier-1s; daily rates ramped up toward the expected maximum, reached on the last day of the run]

  13. Real-time M4 Data Analysis
  • Tracks seen in the muon chambers and in the TRT [event displays]
  • Analysis done simultaneously at European and US Tier-1/Tier-2 sites

  14. An Important Milestone
  • Metrics for success:
    • Full-rate Tier-0 processing: OK
    • Data exported to 5 of 10 Tier-1s and stored: OK, and did more!
    • For 2 of 5 Tier-1s, exports to at least 2 Tier-2s: OK
    • Quasi-real-time analysis in at least 1 Tier-2: OK, and did more!
    • Reprocessing in September in at least 1 Tier-1: in preparation
  • Last week ATLAS showed, for the first time, mastery of the whole data chain: from the measurement of a real cosmic-ray muon in the detector to almost real-time analysis at sites in Europe and the US, with all steps in between

  15. ATLAS Event Data Model Alexandre Vaniachine

  16. ATLAS Full Dress Rehearsal
  • Simulated events injected into the TDAQ
  • Realistic physics mix in bytestream format, including luminosity blocks
  • Real data file and dataset sizes, trigger tables, data streaming
  • Tier-0/Tier-1 data quality, express line, calibration running
  • Use of the Conditions DB
  • Tier-0 reconstruction: ESD, AOD, TAG, DPD
  • Data exports to Tier-1s and Tier-2s
  • Remote analysis:
    • at the Tier-1s:
      • Reprocessing from RAW → ESD, AOD, DPD, TAG
      • Remake AOD from ESD
      • Group-based analysis DPD
    • at the Tier-2s and Tier-3s:
      • ROOT-based analysis
      • Trigger-aware analysis with the Conditions and Trigger DBs
      • No MC truth, user analysis
      • MC/reconstruction production in parallel

  17. FDR Schedule
  • Round 1:
    • Data streaming tests: DONE
    • Sept/Oct 07: Data preparation STARTS SOON
    • End Oct 07: Tier-0 operations tests
    • Nov 07 – Feb 08: Reprocess at Tier-1s, make group DPDs
  • Round 2 (assuming new Geant4):
    • Dec 07 – Jan 08: New data production for final round
    • Feb 08: Data preparation for final round
    • Mar 08: Final-round reconstruction (assuming SRM v2.2)
    • Apr 08: DPD production at Tier-1s
    • Apr 08: More simulated data production in preparation for first data
    • May 08: Final FDR
  • First-pass production should be validated by year-end
  • Reprocessing will be validated months later
  • Analysis roles will be validated

  18. Ramping Up Computing Resources for LHC Data Taking
  • The change in the LHC schedule makes little change to the resource profile
  • Recall that the early data are for calibration and commissioning
  • These are needed whether from collisions or from cosmics

  19. ATLAS Analysis Model [A. Farbin, CHEP07 id 83]
  • Basic principle: smaller data can be read faster
    • Skimming: keep interesting events
    • Thinning: keep interesting objects in events
    • Slimming: keep interesting info in objects
    • Reduction: build higher-level data
  • Derived Physics Data (DPD):
    • Share the schema with objects in the AOD/ESD
    • Can be analyzed interactively
  • (a toy sketch of the four reduction operations follows below)
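A toy sketch of the four reduction operations named above, on a dict-based event model. The real ATLAS tools operate on AOD/ESD objects; everything here is invented for illustration.

```python
def skim(events, is_interesting):
    """Skimming: keep only interesting events."""
    return [e for e in events if is_interesting(e)]

def thin(event, keep_object):
    """Thinning: keep only interesting objects within an event."""
    return {**event, "objects": [o for o in event["objects"] if keep_object(o)]}

def slim(obj, keep_fields):
    """Slimming: keep only interesting fields of each object."""
    return {k: v for k, v in obj.items() if k in keep_fields}

def reduce_to_dpd(event):
    """Reduction: build a higher-level Derived Physics Data record."""
    return {"n_objects": len(event["objects"])}

events = [{"objects": [{"pt": 42.0, "eta": 0.5}]},
          {"objects": []}]
dpd = [reduce_to_dpd(e) for e in skim(events, lambda e: e["objects"])]
print(dpd)   # [{'n_objects': 1}]
```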

  20. Analysis: Grid Tools and Experiences
  • On the EGEE and NDGF infrastructures, ATLAS uses direct submission to the middleware via GANGA:
    • EGEE: LCG RB and gLite WMS
    • NDGF: ARC middleware
  • On OSG: the PanDA system
    • A pilot-based system
    • Also available at some EGEE sites
  • Many users have been exposed to the grid
    • Work is getting done
  • A simple user interface is essential to simplify usage
    • But experts are required to understand the problems
    • Sometimes users have the impression that they are debugging the grid
  • (a hedged GANGA submission sketch follows below)
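A minimal sketch of direct submission through GANGA's GPI, assuming the GangaAtlas plugins (Athena application, DQ2Dataset, LCG backend) are configured. It is meant to run inside a `ganga` session, not as a plain script, and the attribute names are from memory and may differ between GANGA versions; the job-options file and dataset names are placeholders.

```python
j = Job()
j.name = "cosmics-analysis"                      # hypothetical job name
j.application = Athena()                         # ATLAS offline software
j.application.option_file = "MyAnalysis_jobOptions.py"  # hypothetical file
j.inputdata = DQ2Dataset()                       # input resolved via DDM
j.inputdata.dataset = "some.dataset.name"        # placeholder dataset
j.backend = LCG()                                # EGEE: LCG RB / gLite WMS
j.submit()
```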

  21. Conclusions
  • ATLAS computing is addressing unprecedented challenges
    • we are in the final stages of mastering how to handle those challenges
  • The ATLAS experiment has mastered a complex multi-grid computing infrastructure at a scale close to that expected for running conditions:
    • resource utilization for simulated event production
    • transfers from CERN
  • A coordinated shift from development to operations/services is happening in the final year of preparation
  • An increase in scale is expected in the facility infrastructure, along with the corresponding ability to use new capacities effectively
  • User analysis activities are ramping up
