CHEP’06 Highlights
Tony Chan
CHEP’06 Highlights
  • 478 registered participants
  • 467 submitted abstracts
  • President of India address
  • Warm temperatures (90+ degrees)
  • Traveler’s diarrhea, mosquitoes, etc.
CHEP’06 Highlights
  • LHC status
  • Status of various computer facilities
  • Grid Middleware reports
  • Distributed computing models
  • Other interesting reports

Barrel Toroid installation status

The mechanical installation is complete; electrical and cryogenic connections are being made now, for a first in-situ cool-down and excitation test in spring 2006.

Building the Service

[Timeline figure: first beams, first physics, full physics]

  • SC1 (Nov 04 - Jan 05): data transfer between CERN and three Tier-1s (FNAL, NIKHEF, FZK)
  • SC2 (Apr 05): data distribution from CERN to 7 Tier-1s; 600 MB/sec sustained for 10 days (one third of the final nominal rate)
  • SC3 (Sep-Dec 05): demonstrate reliable basic service with most Tier-1s and some Tier-2s; push Tier-1 data rates up to 150 MB/sec (60 MB/sec to tape)
  • SC4 (May-Aug 06): demonstrate full service with all Tier-1s and major Tier-2s; full set of baseline services; data distribution and recording at the nominal LHC rate (1.6 GB/sec; see the volume check below)
  • LHC Service in operation (Sep 06): ramp up over the following six months to full operational capacity and performance
  • LHC service commissioned (Apr 07)
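
For scale, the sketch below (an illustration added here, not part of the talk) converts the sustained rates quoted above into daily volumes:

```python
# Illustrative only: convert the Service Challenge target rates quoted above
# into sustained daily volumes (rates in MB/s as given on the slide).
RATES_MB_S = {
    "SC2 sustained": 600,        # CERN -> 7 Tier-1s for 10 days
    "SC3 per Tier-1": 150,       # of which 60 MB/s to tape
    "SC4 / nominal LHC": 1600,   # 1.6 GB/s data distribution and recording
}

SECONDS_PER_DAY = 86_400

for name, rate_mb_s in RATES_MB_S.items():
    tb_per_day = rate_mb_s * SECONDS_PER_DAY / 1e6   # MB -> TB
    print(f"{name:>18}: {rate_mb_s:5d} MB/s  ~ {tb_per_day:6.1f} TB/day")
```

At the nominal rate this works out to roughly 140 TB of data out of CERN per day.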


The LHC project (machine; detectors; LCG) is well underway for physics in 2007

Detector construction is generally proceeding well, although not without concerns in some cases; an enormous integration/installation effort is ongoing – schedules are tight but are also taken very seriously.

LCG (which, like the machine and detectors, is at a technological level that defines the new ‘state of the art’) still needs to fully develop the required functionality; it is a new ‘paradigm’.

Large potential for exciting physics.

Status of FNAL Tier 1
  • Sole Tier 1 in the Americas for CMS
  • 2006 is first year of 3-year procurement ramp-up
  • Currently have 1MSI2K, 100 TB dCache storage, single 10 Gb link
  • Expect to have by 2008 (ramp-up factors sketched below):
    • 4.3 MSI2K (2,000 CPUs)
    • 2 PB storage (200 servers, 1600 MB/s I/O)
    • 15 Gb/s between FNAL and CERN
    • 30 FTE
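
To make the ramp-up concrete, here is a small illustrative calculation (not from the talk) of the growth factors implied by the 2006 and 2008 figures above:

```python
# Illustrative only: growth factors implied by the FNAL Tier-1 figures quoted
# above (2006 baseline versus the 2008 targets).
baseline_2006 = {"CPU (MSI2K)": 1.0, "storage (TB)": 100, "CERN link (Gb/s)": 10}
target_2008   = {"CPU (MSI2K)": 4.3, "storage (TB)": 2000, "CERN link (Gb/s)": 15}

for resource, now in baseline_2006.items():
    then = target_2008[resource]
    print(f"{resource:>17}: {now:>6} -> {then:>6}  (x{then / now:.1f})")
```

Storage grows by the largest factor (20x), CPU by about 4x and the CERN link by 1.5x.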
Status of FNAL Tier 1 (cont.)
  • Supports both LCG and OSG
  • 50% usage by local (450+) users, 50% by grid
  • Batch switched to Condor in 2005 – scaling well so far
  • Enstore/dCache deployed
  • dCache performed well in stress test (2-3 GB/s, 200 TB/day)
  • SRM v.2 to be deployed for dCache storage element in early 2006
Other Facilities
  • Tier 2 center in Manchester → scalable remote cluster management, monitoring and provisioning software (Nagios, cfengine, kickstart)
  • Indiana/Chicago USATLAS Tier 2 center
  • RAL Tier 1 center
Multi Core CPUs & ROOT

The move to multi-core CPUs is going to affect the evolution of ROOT in many areas.

Moore’s law revisited

  • Your laptop in 2016:
    • 32 processors
    • 16 Gbytes RAM
    • 16 Tbytes disk
    • > 50 × today’s laptop


Impact on ROOT

  • There are many areas in ROOT that can benefit from a multi-core architecture. Because the hardware is becoming available on commodity laptops, it is urgent to implement the most obvious ones as soon as possible.
  • Multi-core often implies multi-threading: several areas must be made not only thread-safe but also thread-aware.
    • PROOF is an obvious candidate. By default, a ROOT interactive session should run in PROOF mode; it would be nice if this could be made totally transparent to the user.
    • Speed up I/O with multi-threaded I/O and read-ahead
    • Buffer compression in parallel (see the sketch after this list)
    • Minimization functions in parallel
    • Interactive compilation with ACLiC in parallel
    • etc.
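
Of these, parallel buffer compression is perhaps the simplest to picture. The sketch below is not ROOT code; it is a minimal Python illustration of the idea, compressing a batch of in-memory buffers across worker processes (buffer sizes and contents are invented):

```python
# Minimal Python illustration of "buffer compression in parallel" (not ROOT
# code): compress a batch of in-memory buffers across worker processes.
import os
import zlib
from multiprocessing import Pool

def compress(buffer: bytes) -> bytes:
    # Each worker process compresses a subset of the buffers independently.
    return zlib.compress(buffer, 6)

if __name__ == "__main__":
    # Invented stand-ins for ROOT baskets: half random, half compressible.
    buffers = [os.urandom(256 * 1024) + b"\x00" * (256 * 1024) for _ in range(32)]

    with Pool() as pool:                       # one worker per core by default
        compressed = pool.map(compress, buffers)

    ratio = sum(len(c) for c in compressed) / sum(len(b) for b in buffers)
    print(f"compressed {len(buffers)} buffers, overall size ratio {ratio:.2f}")
```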
Gridview Project Goal
  • Provide a high level view of the various Grid resources and functional aspects of the LCG
  • Central Archival, Analysis, Summarization, Graphical Presentation and Pictorial Visualization of Data from various LCG sites and monitoring tools
  • Useful in GOCs/ROCs and to site admins/VO admins
Gridview Architecture
  • Loosely coupled design with independent sensor, transport, archival, analysis and visualization components (a minimal sketch follows this list)
  • Sensors are the various LCG information providers and monitoring tools at sites
  • Transport used is R-GMA
  • Gridview provides Archival, Analysis and Visualization
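
A minimal way to picture this loose coupling (purely illustrative; the component and record names are invented, and an in-process queue stands in for the R-GMA transport):

```python
# Purely illustrative sketch of the loose coupling: a sensor publishes
# monitoring records to a transport (R-GMA in the real system, a simple
# in-process queue here) and the archival component consumes them on its own.
from dataclasses import dataclass
from queue import Queue

@dataclass
class MonitoringRecord:        # invented record layout for the example
    site: str
    service: str
    status: str

transport = Queue()            # stand-in for the R-GMA transport

def sensor_publish(site, service, status):
    transport.put(MonitoringRecord(site, service, status))

def archiver_drain(archive):
    while not transport.empty():
        archive.append(transport.get())

sensor_publish("CERN", "CE", "ok")
sensor_publish("FNAL", "SE", "degraded")

archive = []
archiver_drain(archive)
print(f"archived {len(archive)} records")
```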
On-Going work in Gridview
  • Service Availability Monitoring
    • Being interfaced with SFT (Site Functional Tests) for monitoring availability of various services such as CE, SE, RB, BDII etc.
    • Rating of sites according to average resource availability and acceptable thresholds
    • Service availability metrics such as MTTR, uptime and failure rate to be computed and visualised (see the sketch below)
  • gLite FTS
    • Gridview to be adapted to monitor file transfer statistics such as successful transfers, failure rates, etc., for FTS channels across grid sites
  • Enhancement of GUI & Visualisation module to function as full-fledged dashboard for LCG
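
The availability metrics mentioned above reduce to simple arithmetic over outage records; a sketch with invented inputs:

```python
# Illustrative only: availability, MTTR and failure rate for one service from
# a list of outage durations over a monitoring window. The inputs are invented.
window_hours = 30 * 24                   # one month of monitoring
outage_hours = [2.0, 0.5, 6.0]           # hypothetical CE outages at one site

downtime = sum(outage_hours)
availability = 1 - downtime / window_hours
mttr = downtime / len(outage_hours)      # mean time to repair
failure_rate = len(outage_hours) / window_hours

print(f"availability : {availability:.3%}")
print(f"MTTR         : {mttr:.1f} h")
print(f"failure rate : {failure_rate:.4f} failures/hour")
```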
Introduction (Terapaths)
  • The problem: support efficient/reliable/predictable peta-scale data movement in modern high-speed networks
    • Multiple data flows with varying priority
    • Default “best effort” network behavior can cause performance and service disruption problems
  • Solution: enhance network functionality with QoS features to allow prioritization and protection of data flows (a marking sketch follows)
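
One common LAN-level mechanism for prioritizing a flow is DSCP marking of its packets. The sketch below is not TeraPaths code, just an illustration of marking a transfer socket as Expedited Forwarding so that QoS-aware equipment can treat it preferentially (the endpoint is a placeholder):

```python
# Illustrative only (not TeraPaths code): mark a transfer socket with a DSCP
# value so that QoS-enabled switches and routers can prioritize its packets.
import socket

EF_DSCP = 46                    # Expedited Forwarding code point
tos = EF_DSCP << 2              # DSCP occupies the upper six bits of the TOS byte

sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
sock.setsockopt(socket.IPPROTO_IP, socket.IP_TOS, tos)

# Traffic sent over `sock` now carries the EF marking; what priority that
# marking actually receives is decided by the network equipment along the path.
sock.connect(("example.org", 80))        # placeholder endpoint for the example
sock.close()
```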
The TeraPaths Project
  • The TeraPaths project investigates the integration and use of LAN QoS and MPLS/GMPLS-based differentiated network services in the ATLAS data-intensive distributed computing environment, in order to manage the network as a critical resource
  • DOE: The collaboration includes BNL and the University of Michigan, as well as OSCARS (ESnet), LambdaStation (FNAL), and DWMI (SLAC)
  • NSF: BNL participates in UltraLight to provide the network advances required in enabling petabyte-scale analysis of globally distributed data
  • NSF: BNL participates in a new network initiative, PLaNetS (Physics Lambda Network System), led by Caltech

dCache
  • New version (availability unknown?)
  • Features
    • Resilient dCache (n < copies < m)
    • SRM v2
    • Partitioning (one instance, multiple pool configurations)
    • Support for xrootd protocol
  • Performance
    • multiple I/O queues
    • multiple file system servers
Computing Resources (ATLAS)
  • Computing Model fairly well evolved, documented in C-TDR
    • Externally reviewed
  • There are (and will remain for some time) many unknowns
    • Calibration and alignment strategy is still evolving
    • Physics data access patterns MAY be exercised from June
      • Unlikely to know the real patterns until 2007/2008!
    • Still uncertainties in the event sizes and reconstruction time
  • Lesson from the previous round of experiments at CERN (LEP, 1989-2000)
    • Reviews in 1988 underestimated the computing requirements by an order of magnitude!
ATLAS Facilities
  • Event Filter Farm at CERN
    • Located near the Experiment, assembles data into a stream to the Tier 0 Center
  • Tier 0 Center at CERN
    • Raw data → Mass storage at CERN and to Tier 1 centers
    • Swift production of Event Summary Data (ESD) and Analysis Object Data (AOD)
    • Ship ESD, AOD to Tier 1 centers → Mass storage at CERN
  • Tier 1 Centers distributed worldwide (10 centers)
    • Re-reconstruction of raw data, producing new ESD, AOD
    • Scheduled, group access to full ESD and AOD
  • Tier 2 Centers distributed worldwide (approximately 30 centers)
    • Monte Carlo Simulation, producing ESD, AOD; ESD, AOD → Tier 1 centers
    • On demand user physics analysis
  • CERN Analysis Facility
    • Analysis
    • Heightened access to ESD and RAW/calibration data on demand
  • Tier 3 Centers distributed worldwide
    • Physics analysis
  • Tier-0:
    • Prompt first pass processing on express/calibration physics stream
    • 24-48 hours later, process full physics data stream with reasonable calibrations
      • Implies large data movement from T0 → T1s
  • Tier-1:
    • Reprocess 1-2 months after arrival with better calibrations
    • Reprocess all resident RAW at year end with improved calibration and software
      • Implies large data movement from T1↔T1 and T1 → T2

ATLAS Prodsys

[Figure: ATLAS Prodsys (production system) diagram]
Analysis model

The analysis model is broken into two components:

  • Scheduled central production of augmented AOD, tuples and TAG collections from ESD
    • Derived files moved to other T1s and to T2s
  • Chaotic user analysis of augmented AOD streams, tuples, new selections, etc., plus individual user simulation and CPU-bound tasks matching the official MC production
    • Modest job traffic between T2s
Initial experiences
  • PANDA on OSG
  • Analysis with the Production System
  • Systems have been exposed to selected users
    • Positive feedback
    • Direct contact to the experts still essential
    • For this year – power users and grid experts …
  • Main issues
    • Data distribution → New DDM
    • Scalability → New Prodsys/PANDA/gLite/CondorG
    • Analysis in parallel to Production → Job Priorities
DIAL Performance
  • The reference dataset was run as a single job
    • Athena clock time was 70 minutes
      • i.e. 43 ms/event, 3.0 MB/s (see the check after this list)
      • Actual data transfer is about half that value
        • Some of the event data is not read
  • Following figure shows results
    • Local fast queue (LSF)
      • Green squares
    • Local short queue (Condor preemptive)
      • Blue triangles
    • Condor-G to local fast
      • Red diamonds
    • PANDA
      • Violet circles
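
The timing numbers above are easy to cross-check; an illustrative calculation (not part of the talk) using 70 minutes of clock time at 43 ms/event and 3.0 MB/s:

```python
# Cross-check of the DIAL reference-job numbers quoted above.
wall_s = 70 * 60                 # 70 minutes of Athena clock time
ms_per_event = 43
read_rate_mb_s = 3.0

events = wall_s / (ms_per_event / 1000)       # ~98,000 events processed
nominal_gb = wall_s * read_rate_mb_s / 1000   # ~12.6 GB nominally read

print(f"events processed : {events:,.0f}")
print(f"data read        : {nominal_gb:.1f} GB nominal "
      f"(about half actually transferred, per the slide)")
```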
CMS Distributed Computing
  • Distributed model for computing in CMS
    • Cope with computing requirements for storage, processing and analysis of data provided by the experiment
    • Computing resources are geographically distributed, interconnected via high throughput networks and operated by means of Grid software
      • Running expectations
        • Beam time: 2-3 × 10^6 s in 2007; 10^7 s each in 2008, 2009 and 2010
        • Detector output rate: ~250 MB/s → 2.5 petabytes of raw data in 2008 (checked in the sketch below)
      • Aggregate computing resources required
    • CMS computing model document (CERN-LHCC-2004-035)
    • CMS computing TDR released in June 2005 (CERN-LHCC-2005-023)
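
The quoted output rate and 2008 beam time are consistent with the 2.5 PB figure; a quick illustrative check:

```python
# Cross-check of the CMS figures quoted above: detector output rate times the
# 2008 beam time should reproduce the quoted raw-data volume.
rate_mb_s = 250
beam_seconds_2008 = 1e7

raw_pb = rate_mb_s * beam_seconds_2008 / 1e9   # MB -> PB
print(f"raw data in 2008: ~{raw_pb:.1f} PB")   # ~2.5 PB, matching the slide
```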
Resources and data flows in 2008

[Data-flow diagram; recoverable per-tier figures:]
  • Tier 0: 4.6 MSI2K, 0.4 PB disk, 4.9 PB tape, 5 Gbps WAN
  • Tier 1: 2.5 MSI2K, 0.8 PB disk, 2.2 PB tape, 10 Gbps WAN
  • Tier 2: 0.9 MSI2K, 0.2 PB disk, 1 Gbps WAN
  • Inter-tier transfer rates on the diagram range from about 12 MB/s to 900 MB/s (up to 1 GB/s for AOD analysis), labelled with skimmed-AOD distribution, AOD skimming and data reprocessing flows
FNAL 64 bit Tests
  • Benchmark tests of single/dual cores (32 and 64 bit OS/applications)
  • Dual cores provide 2x improvement over single core (same as BNL tests)
  • Better performance with 64/64 (app dependent)
  • Dual cores provide a 2x improvement in performance/watt compared to single core
Network Infrastructure
  • Harvey Newman’s talk
  • 10 Gb/s backbones becoming widespread; moving to 10s (100s?) of Gb/s in the LHC era
  • PCs moving in a similar direction
  • Digital divide (Europe/US/Japan compared to the rest of the world)
  • Next CHEP in Victoria, BC (Sep. 07)