
CHEP’06 Highlights



  1. CHEP’06 Highlights Tony Chan

  2. CHEP’06 Highlights • 478 registered participants • 467 submitted abstracts • Address by the President of India • Warm temperatures (90+ °F) • Traveler’s diarrhea, mosquitoes, etc.

  3. CHEP’06 Highlights • LHC status • Status of various computer facilities • Grid Middleware reports • Distributed computing models • Other interesting reports

  4. Barrel Toroid installation status The mechanical installation is complete, electrical and cryogenic connections are being made now, for a first in-situ cool-down and excitation test in spring 2006

  5. LCG: Building the Service. Timeline: 2005 (today), 2006 cosmics, 2007 first beams and first physics, 2008 full physics run • SC1 (Nov 04-Jan 05): data transfer between CERN and three Tier-1s (FNAL, NIKHEF, FZK) • SC2 (Apr 05): data distribution from CERN to 7 Tier-1s, 600 MB/s sustained for 10 days (one third of final nominal rate) • SC3 (Sep-Dec 05): demonstrate reliable basic service with most Tier-1s and some Tier-2s; push Tier-1 data rates up to 150 MB/s (60 MB/s to tape) • SC4 (May-Aug 06): demonstrate full service with all Tier-1s and major Tier-2s; full set of baseline services; data distribution and recording at nominal LHC rate (1.6 GB/s) • LHC service in operation (Sep 06): ramp up to full operational capacity and performance over the following six months • LHC service commissioned (Apr 07)
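  A quick cross-check of the rates quoted on this slide. All figures come from the slide itself; the decimal units and the Python arithmetic below are the only additions:

    # Back-of-the-envelope check of the service-challenge rates above.
    MB, GB, TB = 1e6, 1e9, 1e12   # decimal units, as usual for data rates
    DAY = 86400                   # seconds

    sc2_rate = 600 * MB                       # SC2 sustained rate
    print(f"SC2, 10 days sustained: {sc2_rate * 10 * DAY / TB:.0f} TB moved")

    nominal = 1.6 * GB                        # nominal LHC rate (SC4 target)
    print(f"SC2 rate vs nominal: {sc2_rate / nominal:.0%}")
    print(f"One day at nominal rate: {nominal * DAY / TB:.0f} TB")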

  6. Conclusions • The LHC project (machine, detectors, LCG) is well underway for physics in 2007. • Detector construction is generally proceeding well, although not without concerns in some cases; an enormous integration/installation effort is ongoing; schedules are tight but are also taken very seriously. • LCG (like the machine and detectors, at a technological level that defines the new ‘state of the art’) needs to fully develop the required functionality; a new ‘paradigm’. • Large potential for exciting physics.

  7. Status of FNAL Tier 1 • Sole Tier 1 in the Americas for CMS • 2006 is the first year of a 3-year procurement ramp-up • Currently have 1 MSI2K, 100 TB dCache storage, a single 10 Gb link • Expect to have by 2008: • 4.3 MSI2K (2000 CPUs) • 2 PB storage (200 servers, 1600 MB/s I/O) • 15 Gb/s between FNAL and CERN • 30 FTE

  8. Status of FNAL Tier 1 (cont.) • Supports both LCG and OSG • 50% usage by local (450+) users, 50% by grid • Batch switched to Condor in 2005 – scaling well so far • Enstore/dCache deployed • dCache performed well in stress test (2-3 GB/s, 200 TB/day) • SRM v.2 to be deployed for dCache storage element in early 2006
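  A quick consistency check of the dCache stress-test and 2008 I/O figures above (numbers taken from slides 7-8; the arithmetic is an illustration only):

    # 2-3 GB/s sustained should indeed correspond to roughly 200 TB/day.
    GB, TB, DAY = 1e9, 1e12, 86400
    for rate_gb_s in (2.0, 3.0):
        print(f"{rate_gb_s} GB/s for one day = {rate_gb_s * GB * DAY / TB:.0f} TB")

    # Average I/O per storage server implied by the 2008 target
    # (1600 MB/s aggregate over 200 servers).
    print(f"Average per-server I/O: {1600 / 200:.0f} MB/s")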

  9. ATLAS Canada Tier 1

  10. ATLAS Canada Tier 1 (cont.)

  11. ATLAS Canada Tier 1 (cont.)

  12. ATLAS Canada Tier 1 (cont.)

  13. Other Facilities • Tier 2 center in Manchester → scalable remote cluster management, monitoring and provisioning software (nagios, cfengine, kickstart) • Indiana/Chicago USATLAS Tier 2 center • RAL Tier 1 center
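  To illustrate the monitoring side mentioned above (this is not the Manchester code): Nagios checks are small programs that print one status line and exit with 0, 1, 2 or 3 for OK, WARNING, CRITICAL or UNKNOWN. A minimal sketch in Python, with a hypothetical mount point and hypothetical thresholds:

    #!/usr/bin/env python3
    # Minimal Nagios-style disk check (illustration only).
    import shutil
    import sys

    PATH = "/data"            # hypothetical mount point to watch
    WARN, CRIT = 0.80, 0.90   # hypothetical usage thresholds

    try:
        du = shutil.disk_usage(PATH)
    except OSError as err:
        print(f"DISK UNKNOWN - cannot stat {PATH}: {err}")
        sys.exit(3)

    used_frac = du.used / du.total
    status = f"{used_frac:.0%} of {PATH} used"
    if used_frac >= CRIT:
        print(f"DISK CRITICAL - {status}")
        sys.exit(2)
    if used_frac >= WARN:
        print(f"DISK WARNING - {status}")
        sys.exit(1)
    print(f"DISK OK - {status}")
    sys.exit(0)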

  14. Multi Core CPUs & ROOT http://www.intel.com/technology/computing/archinnov/platform2015/ This is going to affect the evolution of ROOT in many areas

  15. Moore’s law revisited • Your laptop in 2016: 32 processors, 16 Gbytes RAM, 16 Tbytes disk • More than 50× today’s laptop
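  What the "> 50×" projection implies arithmetically (the factor and time span come from the slide; the doubling-time calculation is the only addition):

    # A factor of 50 in ten years corresponds to roughly 5.6 doublings.
    import math

    factor, years = 50, 10                  # 2006 -> 2016, from the slide
    doublings = math.log2(factor)
    print(f"{doublings:.1f} doublings, one every {years / doublings:.1f} years")

    # For comparison, the classic 18-month doubling over the same period:
    print(f"Doubling every 1.5 years for {years} years gives {2 ** (years / 1.5):.0f}x")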

  16. Impact on ROOT • There are many areas in ROOT that can benefit from a multi-core architecture. Because the hardware is becoming available on commodity laptops, it is urgent to implement the most obvious ones as soon as possible. • Multi-core often implies multi-threading. Several areas must be made not only thread-safe but also thread-aware. • PROOF is the obvious candidate. By default a ROOT interactive session should run in PROOF mode. It would be nice if this could be made totally transparent to the user. • Speed up I/O with multi-threaded I/O and read-ahead • Buffer compression in parallel • Minimization functions in parallel • Interactive compilation with ACLiC in parallel • etc.
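  The "buffer compression in parallel" item can be illustrated with a small sketch (in Python rather than ROOT’s C++, and with purely hypothetical buffers) showing how independent buffers map naturally onto a pool of worker processes:

    # Sketch of parallel buffer compression with a process pool (not ROOT code).
    import multiprocessing as mp
    import zlib

    def compress(buf: bytes) -> bytes:
        # Each buffer stands in for an independent I/O buffer (ROOT basket).
        return zlib.compress(buf, 6)

    if __name__ == "__main__":
        # Hypothetical workload: 64 compressible 1 MB buffers.
        buffers = [bytes(range(256)) * 4096 for _ in range(64)]
        with mp.Pool(processes=mp.cpu_count()) as pool:
            compressed = pool.map(compress, buffers)
        ratio = sum(map(len, compressed)) / sum(map(len, buffers))
        print(f"{len(buffers)} buffers on {mp.cpu_count()} cores, ratio {ratio:.3f}")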

  17. Gridview Project Goal • Provide a high-level view of the various Grid resources and functional aspects of the LCG • Central Archival, Analysis, Summarization, Graphical Presentation and Pictorial Visualization of data from various LCG sites and monitoring tools • Useful in GOCs/ROCs and to site admins/VO admins

  18. Gridview Architecture • Loosely coupled components with independent sensors, transport, archival, analysis and visualization components. • Sensors are the various LCG information providers and monitoring tools at sites • Transport used is R-GMA • Gridview provides Archival, Analysis and Visualization

  19. On-Going work in Gridview • Service Availability Monitoring • Being interfaced with SFT (Site Functional Tests) for monitoring the availability of various services such as CE, SE, RB, BDII, etc. • Rating of sites according to average resource availability and acceptable thresholds • Service availability metrics such as MTTR, uptime and failure rate to be computed and visualised • gLite FTS • Gridview to be adapted to monitor file transfer statistics such as successful transfers and failure rates for FTS channels across grid sites • Enhancement of GUI & Visualisation module to function as a full-fledged dashboard for LCG
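  The availability metrics named above (uptime, MTTR, failure rate) reduce to simple arithmetic over recorded outage intervals. A minimal sketch, not Gridview’s implementation, with hypothetical outage data:

    # Availability, MTTR and failure rate from a list of outage intervals.
    from datetime import datetime

    def service_metrics(period_start, period_end, outages):
        period = (period_end - period_start).total_seconds()
        downtime = sum((end - start).total_seconds() for start, end in outages)
        return {
            "availability": 1 - downtime / period,
            "mttr_hours": downtime / len(outages) / 3600 if outages else 0.0,
            "failures_per_day": len(outages) / (period / 86400),
        }

    # Hypothetical month with two outages on one service.
    metrics = service_metrics(
        datetime(2006, 1, 1), datetime(2006, 2, 1),
        [(datetime(2006, 1, 5, 3), datetime(2006, 1, 5, 9)),       # 6 h outage
         (datetime(2006, 1, 20, 12), datetime(2006, 1, 20, 14))],  # 2 h outage
    )
    print(metrics)   # availability ~0.989, MTTR 4 h, ~0.065 failures/day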

  20. JobMon

  21. JobMon (cont.)

  22. JobMon (cont.)

  23. Introduction (Terapaths) • The problem: support efficient/reliable/predictable peta-scale data movement in modern high-speed networks • Multiple data flows with varying priority • Default “best effort” network behavior can cause performance and service disruption problems • Solution: enhance network functionality with QoS features to allow prioritization and protection of data flows

  24. The TeraPaths Project • The TeraPaths project investigates the integration and use of LAN QoS and MPLS/GMPLS-based differentiated network services in the ATLAS data-intensive distributed computing environment in order to manage the network as a critical resource • DOE: the collaboration includes BNL and the University of Michigan, as well as OSCARS (ESnet), LambdaStation (FNAL), and DWMI (SLAC) • NSF: BNL participates in UltraLight to provide the network advances required in enabling petabyte-scale analysis of globally distributed data • NSF: BNL participates in a new network initiative: PLaNetS (Physics Lambda Network System), led by Caltech
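  The LAN QoS prioritization described on these two slides ultimately relies on marking packets with a DSCP value that QoS-enabled switches and routers honour. A minimal sketch of such marking from an application, not TeraPaths code, assuming Linux and a hypothetical destination host:

    # Mark a TCP socket's traffic as Expedited Forwarding (DSCP 46).
    import socket

    DSCP_EF = 46              # Expedited Forwarding code point
    TOS_EF = DSCP_EF << 2     # DSCP occupies the upper 6 bits of the TOS byte

    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_TOS, TOS_EF)
    # The socket is then connected and used normally, e.g. (hypothetical host):
    # sock.connect(("data-mover.example.org", 2811))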

  25. dCache • New version (availability unknown?) • Features • Resilient dCache (n < copies < m) • SRM v2 • Partitioning (one instance, multiple pool configurations) • Support for xrootd protocol • Performance • multiple I/O queues • multiple file system servers

  26. Computing Resources (ATLAS) • Computing Model fairly well evolved, documented in the C-TDR • Externally reviewed • http://doc.cern.ch//archive/electronic/cern/preprints/lhcc/public/lhcc-2005-022.pdf • There are (and will remain for some time) many unknowns • Calibration and alignment strategy is still evolving • Physics data access patterns MAY be exercised from June • Unlikely to know the real patterns until 2007/2008! • Still uncertainties on the event sizes and reconstruction time • Lesson from the previous round of experiments at CERN (LEP, 1989-2000): • Reviews in 1988 underestimated the computing requirements by an order of magnitude!

  27. ATLAS Facilities • Event Filter Farm at CERN • Located near the experiment, assembles data into a stream to the Tier 0 Center • Tier 0 Center at CERN • Raw data → mass storage at CERN and to Tier 1 centers • Swift production of Event Summary Data (ESD) and Analysis Object Data (AOD) • Ship ESD, AOD to Tier 1 centers → mass storage at CERN • Tier 1 Centers distributed worldwide (10 centers) • Re-reconstruction of raw data, producing new ESD, AOD • Scheduled, group access to full ESD and AOD • Tier 2 Centers distributed worldwide (approximately 30 centers) • Monte Carlo simulation, producing ESD, AOD; ESD, AOD → Tier 1 centers • On-demand user physics analysis • CERN Analysis Facility • Analysis • Heightened access to ESD and RAW/calibration data on demand • Tier 3 Centers distributed worldwide • Physics analysis

  28. Processing • Tier-0: • Prompt first pass processing on express/calibration physics stream • 24-48 hours later, process full physics data stream with reasonable calibrations • Implies large data movement from T0 →T1s • Tier-1: • Reprocess 1-2 months after arrival with better calibrations • Reprocess all resident RAW at year end with improved calibration and software • Implies large data movement from T1↔T1 and T1 → T2

  29. ATLAS ProdSys (architecture diagram): executors (Dulcinea, Lexor, CondorG, PANDA) take jobs from the production database (ProdDB) and submit them through resource brokers (RB) and Condor-G to the computing elements (CE) of the grid sites.

  30. Analysis model The analysis model is broken into two components • Scheduled central production of augmented AOD, tuples & TAG collections from ESD • Derived files moved to other T1s and to T2s • Chaotic user analysis of augmented AOD streams, tuples, new selections etc. and individual user simulation and CPU-bound tasks matching the official MC production • Modest job traffic between T2s

  31. Initial experiences • PANDA on OSG • Analysis with the Production System • GANGA

  32. Summary • Systems have been exposed to selected users • Positive feedback • Direct contact to the experts still essential • For this year – power users and grid experts … • Main issues • Data distribution → New DDM • Scalability → New Prodsys/PANDA/gLite/CondorG • Analysis in parallel to Production → Job Priorities

  33. ATLAS T0 Resources

  34. ATLAS T1 Resources

  35. ATLAS T2 Resources

  36. DIAL Performance • The reference dataset was run as a single job • Athena clock time was 70 minutes, i.e. 43 ms/event, 3.0 MB/s • Actual data transfer is about half that value • Some of the event data is not read • The following figure shows results for the local fast queue (LSF, green squares), the local short queue (Condor preemptive, blue triangles), Condor-G to the local fast queue (red diamonds) and PANDA (violet circles)
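  The quoted numbers are mutually consistent; a quick check using only the figures on the slide:

    # 70 minutes at 43 ms/event and 3.0 MB/s read.
    wall_s = 70 * 60
    events = wall_s / 0.043                  # ~98k events
    data_gb = wall_s * 3.0 / 1000            # ~12.6 GB read
    print(f"~{events:,.0f} events, ~{data_gb:.1f} GB, "
          f"~{data_gb * 1e6 / events:.0f} kB/event")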

  37. CMS Distributed Computing • Distributed model for computing in CMS • Cope with computing requirements for storage, processing and analysis of data provided by the experiment • Computing resources are geographically distributed, interconnected via high-throughput networks and operated by means of Grid software • Running expectations • Beam time: 2-3×10^6 secs in 2007, 10^7 secs in 2008, 2009 and 2010 • Detector output rate: ~250 MB/s → 2.5 petabytes of raw data in 2008 • Aggregate computing resources required • CMS computing model document (CERN-LHCC-2004-035) • CMS computing TDR released in June 2005 (CERN-LHCC-2005-023)
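  The quoted raw-data volume follows directly from the output rate and beam time (both figures from the slide):

    # 250 MB/s for 1e7 live seconds.
    rate_bytes_s = 250e6
    live_seconds = 1e7
    print(f"{rate_bytes_s * live_seconds / 1e15:.1f} PB of raw data in 2008")  # 2.5 PB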

  38. Resources and data flows in 2008 (flattened diagram; per-tier resources and the labelled flow rates) • Tier 0: 4.6 MSI2K, 0.4 PB disk, 4.9 PB tape, 5 Gbps WAN; 225 MB/s (RAW) to tape and to the Tier-0 worker nodes • Tier 1: 2.5 MSI2K, 0.8 PB disk, 2.2 PB tape, 10 Gbps WAN; 280 MB/s (RAW, RECO, AOD) from Tier-0, 40 MB/s (RAW, RECO, AOD) exchanged with other Tier-1s, 48 MB/s (MC) from Tier-2s; 900 MB/s to worker nodes (AOD skimming, data reprocessing) • Tier 2: 0.9 MSI2K, 0.2 PB disk, 1 Gbps WAN; 60 MB/s (skimmed AOD, some RAW+RECO) per Tier-2, 240 MB/s from a Tier-1 to its Tier-2s; 12 MB/s (MC) back to a Tier-1; up to 1 GB/s to worker nodes (AOD analysis, calibration)

  39. FNAL 64 bit Tests • Benchmark tests of single/dual cores (32- and 64-bit OS/applications) • Dual cores provide a 2x improvement over single core (same as BNL tests) • Better performance with 64/64 (application dependent) • Dual cores provide a 2x improvement in performance/watt compared to single core

  40. Network Infrastructure • Harvey Newman’s talk • 10 Gb/s backbones becoming widespread, moving to 10s (100s?) of Gb/s in the LHC era • PCs moving in a similar direction • Digital divide (Europe/US/Japan compared to the rest of the world) • Next CHEP in Victoria, BC (Sep. 07)
