CHEP’06 Highlights
Tony Chan

CHEP’06 Highlights
  • 478 registered participants
  • 467 submitted abstracts
  • Address by the President of India
  • Warm temperatures (90+ °F)
  • Traveler’s diarrhea, mosquitoes, etc.
CHEP’06 Highlights
  • LHC status
  • Status of various computer facilities
  • Grid Middleware reports
  • Distributed computing models
  • Other interesting reports
Barrel Toroid installation status

The mechanical installation is complete; electrical and cryogenic connections are being made now, for a first in-situ cool-down and excitation test in spring 2006.

Building the Service

LCG timeline: building the service through 2005 (today) and 2006 (cosmics); first beams and first physics in 2007; full physics run in 2008.

  • SC1 (Nov 04 - Jan 05): data transfer between CERN and three Tier-1s (FNAL, NIKHEF, FZK)
  • SC2 (Apr 05): data distribution from CERN to 7 Tier-1s; 600 MB/s sustained for 10 days (one third of final nominal rate)
  • SC3 (Sep-Dec 05): demonstrate reliable basic service; most Tier-1s, some Tier-2s; push Tier-1 data rates up to 150 MB/s (60 MB/s to tape)
  • SC4 (May-Aug 06): demonstrate full service; all Tier-1s, major Tier-2s; full set of baseline services; data distribution and recording at nominal LHC rate (1.6 GB/s)
  • LHC Service in operation (Sep 06): ramp up to full operational capacity and performance over the following six months
  • LHC service commissioned: Apr 07
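For scale, a short sketch (assuming decimal units, 1 GB = 10^9 bytes) turning the sustained Service Challenge rates above into transferred volumes:

```python
# Rough volumes implied by the Service Challenge rates quoted above
# (decimal units assumed: 1 MB = 1e6 bytes, 1 GB = 1e9 bytes).
DAY = 86_400  # seconds per day

sc2_rate = 600e6                       # SC2: 600 MB/s sustained
sc2_volume = sc2_rate * 10 * DAY       # sustained for 10 days
print(f"SC2, 10 days at 600 MB/s: {sc2_volume / 1e15:.2f} PB")       # ~0.52 PB

sc4_rate = 1.6e9                       # SC4: nominal LHC rate, 1.6 GB/s
print(f"SC4, one day at 1.6 GB/s: {sc4_rate * DAY / 1e12:.0f} TB")   # ~138 TB
```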

Conclusions

The LHC project (machine, detectors, LCG) is well underway for physics in 2007.

Detector construction is generally proceeding well, although not without concerns in some cases; an enormous integration/installation effort is ongoing. Schedules are tight but are also taken very seriously.

LCG (like the machine and detectors, at a technological level that defines the new ‘state of the art’) still needs to fully develop the required functionality; it represents a new ‘paradigm’.

Large potential for exciting physics.

Status of FNAL Tier 1
  • Sole Tier 1 in the Americas for CMS
  • 2006 is first year of 3-year procurement ramp-up
  • Currently have 1MSI2K, 100 TB dCache storage, single 10 Gb link
  • Expect to have by 2008:
    • 4.3 MSI2K (2000 CPUs)
    • 2 PB storage (200 servers, 1600 MB/s I/O)
    • 15 Gb/s between FNAL and CERN
    • 30 FTE
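A quick sketch of the ramp factors implied by those 2006-to-2008 numbers (figures taken from the slide; the per-year factor assumes a uniform ramp over the two years):

```python
# Implied growth factors for the FNAL Tier 1 ramp-up (2006 baseline -> 2008 target),
# using the figures quoted above and assuming a uniform two-year ramp.
targets = {
    "CPU (MSI2K)":  (1.0, 4.3),    # 1 MSI2K now -> 4.3 MSI2K by 2008
    "Storage (PB)": (0.1, 2.0),    # 100 TB dCache now -> 2 PB by 2008
}
years = 2
for name, (start, end) in targets.items():
    overall = end / start
    per_year = overall ** (1 / years)
    print(f"{name}: x{overall:.1f} overall, ~x{per_year:.1f} per year")
# CPU (MSI2K): x4.3 overall, ~x2.1 per year
# Storage (PB): x20.0 overall, ~x4.5 per year
```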
Status of FNAL Tier 1 (cont.)
  • Supports both LCG and OSG
  • 50% usage by local (450+) users, 50% by grid
  • Batch switched to Condor in 2005 – scaling well so far
  • Enstore/dCache deployed
  • dCache performed well in stress test (2-3 GB/s, 200 TB/day; see the consistency check after this list)
  • SRM v.2 to be deployed for dCache storage element in early 2006
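A minimal consistency check on the stress-test figures above, assuming decimal units; the sustained rate and the daily volume agree:

```python
# Consistency check: the sustained dCache stress-test rate vs. the quoted daily
# volume (decimal units assumed: 1 GB = 1e9 bytes, 1 TB = 1e12 bytes).
DAY = 86_400  # seconds per day
for rate_gb_s in (2.0, 2.5, 3.0):
    tb_per_day = rate_gb_s * 1e9 * DAY / 1e12
    print(f"{rate_gb_s} GB/s sustained -> {tb_per_day:.0f} TB/day")
# 2.0 -> 173, 2.5 -> 216, 3.0 -> 259 TB/day, consistent with ~200 TB/day
```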
Other Facilities
  • Tier 2 center in Manchester → scalable remote cluster management & monitoring and provisioning software (Nagios, cfengine, Kickstart)
  • Indiana/Chicago USATLAS Tier 2 center
  • RAL Tier 1 center
Multi Core CPUs & ROOT

http://www.intel.com/technology/computing/archinnov/platform2015/

This is going to affect the evolution of ROOT in many areas

Moore’s law revisited

Your laptop in 2016:
  • 32 processors
  • 16 GB RAM
  • 16 TB disk
  • > 50x today’s laptop
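A back-of-the-envelope check of that extrapolation (a sketch assuming a doubling time of roughly 18 months, which is not stated on the slide):

```python
# Back-of-the-envelope Moore's-law factor for 2006 -> 2016, assuming a doubling
# time of roughly 1.5 years (the doubling time is an illustrative assumption).
years = 10
doubling_time = 1.5
factor = 2 ** (years / doubling_time)
print(f"Growth over {years} years: ~x{factor:.0f}")
# ~x100, comfortably above the "> 50x today's laptop" claim
```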


Impact on ROOT

  • There are many areas in ROOT that can benefit from a multi-core architecture. Because the hardware is becoming available on commodity laptops, it is urgent to implement the most obvious ones as soon as possible.
  • Multi-core often implies multi-threading. Several areas must be made not only thread-safe but also thread-aware:
    • PROOF is the obvious candidate. By default, a ROOT interactive session should run in PROOF mode; ideally this would be totally transparent to the user.
    • Speed up I/O with multi-threaded I/O and read-ahead
    • Buffer compression in parallel (see the sketch after this list)
    • Minimization functions in parallel
    • Interactive compilation with ACLiC in parallel
    • etc.
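As a conceptual illustration of the parallel buffer compression item only (a sketch in plain Python with zlib and a process pool, not ROOT's implementation), independent buffers can be compressed concurrently, one worker per core:

```python
# Conceptual sketch of parallel buffer compression (not ROOT's internals):
# independent in-memory buffers are compressed concurrently, one worker per core.
import zlib
from multiprocessing import Pool

def compress(buffer: bytes) -> bytes:
    return zlib.compress(buffer, 6)

if __name__ == "__main__":
    # Hypothetical stand-ins for the buffers a tree would flush to disk.
    buffers = [bytes(range(256)) * 4096 for _ in range(32)]
    with Pool() as pool:                       # defaults to one worker per core
        compressed = pool.map(compress, buffers)
    print(sum(len(c) for c in compressed), "bytes after compression")
```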
Gridview Project Goal
  • Provide a high level view of the various Grid resources and functional aspects of the LCG
  • Central Archival, Analysis, Summarization, Graphical Presentation and Pictorial Visualization of Data from various LCG sites and monitoring tools
  • Useful in GOCs/ROCs and to site admins/VO admins
Gridview Architecture
  • Loosely coupled architecture with independent sensor, transport, archival, analysis and visualization components
  • Sensors are the various LCG information providers and monitoring tools at sites
  • Transport used is R-GMA
  • Gridview provides Archival, Analysis and Visualization
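A minimal conceptual sketch of this loosely coupled layout (plain Python queues standing in for R-GMA; component names and record fields are illustrative, not Gridview's actual interfaces):

```python
# Conceptual sketch of the loosely coupled layout described above: sensors publish
# records, a transport carries them (R-GMA in Gridview), and the archiver consumes
# them independently. Component names and record fields are illustrative only.
import queue

transport = queue.Queue()  # stand-in for the R-GMA transport

def sensor_publish(site: str, metric: str, value: float) -> None:
    """A site-side information provider pushing one monitoring record."""
    transport.put({"site": site, "metric": metric, "value": value})

def archiver_drain(archive: list) -> None:
    """The central archival component pulling whatever the transport holds."""
    while not transport.empty():
        archive.append(transport.get())

archive = []
sensor_publish("CERN-PROD", "transfer_rate_MBps", 600.0)
sensor_publish("FNAL", "transfer_rate_MBps", 150.0)
archiver_drain(archive)
print(len(archive), "records archived")
```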
On-Going work in Gridview
  • Service Availability Monitoring
    • Being interfaced with SFT (Site Functional Tests) for monitoring availability of various services such as CE, SE, RB, BDII etc.
    • Rating of sites according to average resource availability and acceptable thresholds
    • Service availability metrics such as MTTR, uptime and failure rate to be computed and visualised (see the sketch after this list)
  • gLite FTS
    • Gridview to be adapted to monitor file transfer statistics such as successful transfers, failure rates, etc. for FTS channels across grid sites
  • Enhancement of GUI & Visualisation module to function as full-fledged dashboard for LCG
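For reference, the standard definitions behind those availability metrics, computed over a set of downtime incidents (a sketch with made-up example numbers, not Gridview code):

```python
# Standard definitions behind the availability metrics listed above, computed over
# one month of observation. The downtime incidents are made-up example numbers.
observation_hours = 30 * 24
downtimes_hours = [2.0, 0.5, 6.0]        # illustrative outages during the month

total_downtime = sum(downtimes_hours)
failures = len(downtimes_hours)
uptime = observation_hours - total_downtime

availability = uptime / observation_hours
mttr = total_downtime / failures          # mean time to repair
mtbf = uptime / failures                  # mean time between failures
failure_rate = failures / observation_hours

print(f"availability {availability:.2%}, MTTR {mttr:.1f} h, "
      f"MTBF {mtbf:.0f} h, failure rate {failure_rate:.4f}/h")
```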
Introduction (Terapaths)
  • The problem: support efficient/reliable/predictable peta-scale data movement in modern high-speed networks
    • Multiple data flows with varying priority
    • Default “best effort” network behavior can cause performance and service disruption problems
  • Solution: enhance network functionality with QoS features to allow prioritization and protection of data flows (illustrated in the sketch below)
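As a flavor of what flow prioritization means at the host level (a conceptual sketch only, not TeraPaths software: the DSCP class is an illustrative choice, and socket.IP_TOS is available on Linux):

```python
# Conceptual sketch of flow prioritization at the host: mark a socket's DSCP code
# point so that QoS-enabled switches/routers can schedule the flow ahead of
# best-effort traffic. Not TeraPaths software; the DSCP class is illustrative.
import socket

DSCP_AF41 = 34                 # an "assured forwarding" class, illustrative choice
TOS = DSCP_AF41 << 2           # DSCP occupies the upper 6 bits of the TOS byte

sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
sock.setsockopt(socket.IPPROTO_IP, socket.IP_TOS, TOS)
# Packets from this socket now carry the AF41 marking; a QoS-enabled LAN/MPLS path
# can protect or prioritize the flow relative to default best-effort traffic.
sock.close()
```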
The TeraPaths Project
  • The TeraPaths project investigates the integration and use of LAN QoS and MPLS/GMPLS-based differentiated network services in the ATLAS data intensive distributed computing environment in order to manage the network as a critical resource
  • DOE: The collaboration includes BNL and the University of Michigan, as well as OSCARS (ESnet), LambdaStation (FNAL), and DWMI (SLAC)
  • NSF: BNL participates in UltraLight to provide the network advances required in enabling petabyte-scale analysis of globally distributed data
  • NSF: BNL participates in a new network initiative: PLaNetS (Physics Lambda Network System), led by Caltech
dCache
  • New version (availability unknown?)
  • Features
    • Resilient dCache (n < copies < m)
    • SRM v2
    • Partitioning (one instance, multiple pool configurations)
    • Support for xrootd protocol
  • Performance
    • multiple I/O queues
    • multiple file system servers
Computing Resources (ATLAS)
  • Computing Model fairly well evolved, documented in C-TDR
    • Externally reviewed
    • http://doc.cern.ch//archive/electronic/cern/preprints/lhcc/public/lhcc-2005-022.pdf
  • There are (and will remain for some time) many unknowns
    • Calibration and alignment strategy is still evolving
    • Physics data access patterns MAY be exercised from June
      • Unlikely to know the real patterns until 2007/2008!
    • Still uncertainties in the event sizes and reconstruction time
  • Lesson from the previous round of experiments at CERN (LEP, 1989-2000)
    • Reviews in 1988 underestimated the computing requirements by an order of magnitude!
ATLAS Facilities
  • Event Filter Farm at CERN
    • Located near the Experiment, assembles data into a stream to the Tier 0 Center
  • Tier 0 Center at CERN
    • Raw data → mass storage at CERN and to Tier 1 centers
    • Swift production of Event Summary Data (ESD) and Analysis Object Data (AOD)
    • Ship ESD, AOD to Tier 1 centers → mass storage at CERN
  • Tier 1 Centers distributed worldwide (10 centers)
    • Re-reconstruction of raw data, producing new ESD, AOD
    • Scheduled, group access to full ESD and AOD
  • Tier 2 Centers distributed worldwide (approximately 30 centers)
    • Monte Carlo Simulation, producing ESD, AOD; ESD, AOD → Tier 1 centers
    • On demand user physics analysis
  • CERN Analysis Facility
    • Analysis
    • Heightened access to ESD and RAW/calibration data on demand
  • Tier 3 Centers distributed worldwide
    • Physics analysis
Processing
  • Tier-0:
    • Prompt first pass processing on express/calibration physics stream
    • 24-48 hours later, process full physics data stream with reasonable calibrations
      • Implies large data movement from T0 → T1s
  • Tier-1:
    • Reprocess 1-2 months after arrival with better calibrations
    • Reprocess all resident RAW at year end with improved calibration and software
      • Implies large data movement from T1↔T1 and T1 → T2
ATLAS Prodsys

[Diagram: the production database (ProdDB) feeds the ATLAS Prodsys executors (Dulcinea, Lexor, CondorG-based, PANDA), which submit jobs through resource brokers (RB) or Condor-G to computing elements (CE).]
Analysis model

Analysis model broken into two components

  • Scheduled central production of augmented AOD, tuples & TAG collections from ESD
    • Derived files moved to other T1s and to T2s
  • Chaotic user analysis of augmented AOD streams, tuples, new selections etc. and individual user simulation and CPU-bound tasks matching the official MC production
    • Modest job traffic between T2s
Initial experiences
  • PANDA on OSG
  • Analysis with the Production System
  • GANGA
Summary
  • Systems have been exposed to selected users
    • Positive feedback
    • Direct contact to the experts still essential
    • For this year – power users and grid experts …
  • Main issues
    • Data distribution → New DDM
    • Scalability → New Prodsys/PANDA/gLite/CondorG
    • Analysis in parallel to Production → Job Priorities
DIAL Performance
  • The reference dataset was run as a single job
    • Athena clock time was 70 minutes
      • i.e. 43 ms/event, 3.0 MB/s (see the consistency check after this list)
      • Actual data transfer is about half that value
        • Some of the event data is not read
  • Following figure shows results
    • Local fast queue (LSF)
      • Green squares
    • Local short queue (Condor preemptive)
      • Blue triangles
    • Condor-G to local fast
      • Red diamonds
    • PANDA
      • Violet circles
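A quick consistency check on those single-job numbers, using only the figures quoted above:

```python
# Consistency check on the single-job DIAL reference run quoted above.
wall_s = 70 * 60                       # 70 minutes of Athena clock time
ms_per_event = 43
rate_mb_s = 3.0

events = wall_s / (ms_per_event / 1000)      # ~97,700 events
dataset_mb = rate_mb_s * wall_s              # ~12,600 MB read at 3.0 MB/s
kb_per_event = 1000 * dataset_mb / events    # ~130 kB read per event

print(f"~{events:,.0f} events, ~{dataset_mb / 1000:.1f} GB read, "
      f"~{kb_per_event:.0f} kB/event")
```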
CMS Distributed Computing
  • Distributed model for computing in CMS
    • Cope with computing requirements for storage, processing and analysis of data provided by the experiment
    • Computing resources are geographically distributed, interconnected via high throughput networks and operated by means of Grid software
      • Running expectations
        • Beam time: 2-3 x 10^6 s in 2007; 10^7 s in each of 2008, 2009 and 2010
        • Detector output rate: ~250 MB/s → 2.5 petabytes of raw data in 2008 (see the sketch after this list)
      • Aggregate computing resources required
    • CMS computing model document (CERN-LHCC-2004-035)
    • CMS computing TDR released in June 2005 (CERN-LHCC-2005-023)
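The 2.5 PB figure is simply the product of the quoted beam time and detector output rate (decimal units assumed):

```python
# The 2008 raw-data volume follows from the running expectations quoted above
# (decimal units assumed: 1 MB = 1e6 bytes, 1 PB = 1e15 bytes).
beam_time_s = 1e7            # ~10^7 s of beam in 2008
output_rate_b_s = 250e6      # ~250 MB/s detector output
raw_bytes = beam_time_s * output_rate_b_s
print(f"~{raw_bytes / 1e15:.1f} PB of raw data in 2008")   # ~2.5 PB
```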
Resources and data flows in 2008

[Diagram: CMS resources and data flows in 2008 among Tier-0, Tier-1s, Tier-2s and worker nodes (WNs).
Per-site resources:
  • Tier 0: 4.6 MSI2K, 0.4 PB disk, 4.9 PB tape, 5 Gbps WAN
  • Tier 1: 2.5 MSI2K, 0.8 PB disk, 2.2 PB tape, 10 Gbps WAN
  • Tier 2: 0.9 MSI2K, 0.2 PB disk, 1 Gbps WAN
Flows shown include 225 MB/s (RAW), 280 MB/s and 40 MB/s (RAW, RECO, AOD), 48 MB/s and 12 MB/s (MC), 240 MB/s and 60 MB/s (skimmed AOD, some RAW+RECO), 900 MB/s (AOD skimming, data reprocessing), and up to 1 GB/s (AOD analysis, calibration).]
FNAL 64 bit Tests
  • Benchmark tests of single/dual cores (32 and 64 bit OS/applications)
  • Dual cores provide 2x improvement over single core (same as BNL tests)
  • Better performance with 64/64 (app dependent)
  • Dual cores provide 2x improvement in performance/watt compared to single core
Network Infrastructure
  • Harvey Newman’s talk
  • 10 Gb/s backbones becoming widespread; moving to 10s (100s?) of Gb/s in the LHC era
  • PCs moving in a similar direction
  • Digital divide (Europe/US/Japan compared to the rest of the world)
  • Next CHEP in Victoria, BC (Sep. 07)