
Focusing on the first beams

CHEP 2006

TIFR, Mumbai

15 February 2006

Jamie Shiers on Monday gave a view of how ready we are for the start-up of LHC

This talk puts the LCG service into the context of the evolving HEP and scientific computing environment ..

.. looks at where our expectations were and were not fulfilled

.. and outlines where we need to focus our efforts now as we prepare for the first beams

Mission of LCG

Prepare and deploy the LHC computing environment to help the experiments analyse the data coming from the detectors

With a significant Funding Principle – LHC computing resources will NOT be centralised at CERN

And a few external constraints

[Diagram: the tier model – CERN, Tier-1 and Tier-2 centres]
A bit of history
  • 1999 – the MONARC project
    • A straightforward distributed model
    • An inverted tree with data flowing out along the branches
    • Gave us the Tier nomenclature
  • 2000 - CHEP Padova – growing interest in grid technology
    • HEP community main driver in launching the DataGrid project in Europe
    • PPDG & GriPhyN in the US
    • middleware & testbeds for operational grids
  • 2001 - CHEP Beijing
    • Saw HEP infrastructure projects being prepared for launch -- LCG, national projects
  • 2003 – CHEP San Diego - production grids
    • LCG-1 – integrating a number of national grid infrastructures
    • Grid3 growing out of a Supercomputing demo
  • 2004 – CHEP Interlaken - expanding to other communities and sciences
    • EU EGEE project with major EU funding - starts from the LCG grid
    • Open Science Grid
The Worldwide LCG Collaboration
  • Members
    • The experiments
    • The computing centres – Tier-0, Tier-1, Tier-2
  • Memorandum of understanding
    • Resources, services, defined service levels
    • Resource commitments pledged for the next year, with a 5-year forward look
LCG services are built on two major science grid infrastructures:

EGEE - Enabling Grids for E-SciencE

OSG - US Open Science Grid

Enabling Grids for E-SciencE
  • EU supported project
  • Develop and operate a multi-science grid
  • Assist scientific communities to embrace grid technology
  • First phase concentrated on operations and technology
  • Second phase (2006–08): emphasis on extending the scientific, geographical and industrial scope
  • World-wide Grid infrastructure
  • International collaboration
  • In phase 2 will have > 90 partners in 32 countries
Applications

>20 applications from 7 domains

  • High Energy Physics
  • Biomedicine
  • Earth Sciences
  • Computational Chemistry
  • Astronomy
  • Geo-Physics
  • Financial Simulation

Another 8 applications from 4 domains are at the evaluation stage

Sustainability: Beyond EGEE-II
  • Need to prepare for permanent Grid infrastructure
    • Maintain Europe’s leading position in global science Grids
    • Ensure a reliable and adaptive support for all sciences
    • Independent of project funding cycles
    • Modelled on success of GÉANT
      • Infrastructure managed centrally in collaboration with National Grid Initiatives
  • Proposal: European Grid Organisation (EGO)
Open Science Grid
  • Multi-disciplinary Consortium
    • Running physics experiments: CDF, D0, LIGO, SDSS, STAR
    • US LHC Collaborations
    • Biology, Computational Chemistry
    • Computer Science research
    • Condor and Globus
    • DOE Laboratory Computing Divisions
    • University IT Facilities
  • OSG today
    • 50 Compute Elements
    • 6 Storage Elements
    • VDT 1.3.9
    • 23 VOs
OSG Funding Situation
  • Core middleware: Condor and Globus supported for 5 more years from NSF.
  • OSG Proposal: 5 year program of work being submitted to multiple program offices in NSF and DOE. Expect to know by summer 2006. Three Thrusts:
    • OSG Facility
    • Education, Training and Outreach
    • Science Driven Extensions.
  • Cooperating Proposals in Parallel being submitted to SciDAC-2 and NSF:
    • dCache extensions within the dCache collaboration.
    • Advanced networks and monitoring.
    • Distributed systems and cybersecurity.
    • Data and storage management.
    • Petastore data analysis systems.
Open Science Grid as part of the Worldwide LHC Computing Grid
  • Directly through the CMS & ATLAS US Tier-1 Facilities.
  • Collaborating with EGEE on interoperability of services, operations etc.
  • LCG VOs can be registered on OSG - ATLAS, CMS, Geant3, DTEAM.
  • OSG roadmap and baselines services defined to meet LHC needs and schedule.
No thought of GRID in 1996!

Processing & Storage Technology – Expectation v. Reality

Some of the predictions of PASTA I – 1996

  • Processor Conclusions
    • Processor performance in 2005: 4,000 SPECint92 = 1,000 SPECint2000
    • “SMPs with modest number of processors will provide excellent price/performance and will be the basic building block ..”
  • Storage Conclusions
    • “It does not seem likely that alternative technologies such as optical disk, flash memories or holographic storage will provide serious competition for magnetic disk in the LHC time-frame.”
    • “Cheap disk could change the way in which tape based storage is used.”
    • Tape drive performance: “.. a conservative estimate for standard drives would be 50 MB/sec ..”
    • “There may not be a suitable [storage management] product - it may be necessary to implement an HEP solution. “
    • “The use of object databases could simplify the problem [of storage management]”

Wide Area Networks – Expectation v. Reality
  • Monarc Phase 2 Report – March 2000

“.. it should be possible to build a useful distributed architecture computing system provided the availability of CERN->Tier-1 Regional Centre network bandwidth is of the order of 622 Mbps per Regional Centre. This is an important result, as all the projections for the future indicate that such connections should be commonplace in 2005.”

  • The reality in most of the countries involved in LCG is well ahead of the expectations for bandwidth & cost
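The scale of that shift is easy to see with some simple transfer-time arithmetic, sketched below. The 622 Mbps figure is MONARC's assumed CERN-to-Tier-1 bandwidth; 10 Gbps is the dedicated-link figure quoted for the Tier-1s. The 1 TB payload is an illustrative number, not from the talk.

```python
# Transfer-time arithmetic for the bandwidth figures quoted above.
# 622 Mbps: MONARC (2000) assumption per Regional Centre.
# 10 Gbps: dedicated Tier-1 optical links in place by 2006.

def transfer_hours(payload_bytes: float, link_bps: float) -> float:
    """Hours to move `payload_bytes` over a link of `link_bps` bits/s,
    assuming (optimistically) the full link rate is usable."""
    return payload_bytes * 8 / link_bps / 3600

one_tb = 1e12  # bytes, illustrative payload

print(f"1 TB at 622 Mbps: {transfer_hours(one_tb, 622e6):.1f} h")
print(f"1 TB at 10 Gbps:  {transfer_hours(one_tb, 10e9) * 60:.0f} min")
```

So a dataset that took an afternoon to ship under the MONARC assumption moves in minutes on the 2006 links, which is why reality ran "well ahead of the expectations".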

Wide Area Network

[Diagram: LCG wide-area network – Tier-1 centres (Nordic, GridKa, IN2P3, TRIUMF, Brookhaven, ASCC, Fermilab, RAL, CNAF, PIC, SARA) linked to CERN by dedicated 10 Gbit links, with Tier-2s around them]

  • T2s and T1s are inter-connected by the general purpose research networks
  • Any Tier-2 may access data at any Tier-1
  • Each T1 with ~10 Gbps to the local NREN

Affordable heterogeneous cluster management tools

[Diagram: node configuration management and node management components]

  • A major concern for CERN was automation of the management of the very large clusters needed for LHC
  • ELFms - Extremely Large Fabric Management System
Grid Expectations v. Reality
  • Grid technology is not the panacea that some of us hoped for in 2000 at CHEP in Padova – the off-the-shelf tools to implement a flexible Monarc model
  • The grid projects have generated wide interest, helped build an active community of service providers, made new computing resources available, and paid for a good deal of HEP operation – largely with non-HEP funding
  • But

.. it has taken longer than we expected to get to a basic computing service that is reasonably reliable

But if we were realists – like the Gartner Group analysts – that is exactly what we would have expected

[Chart: Gartner Group hype cycle, with the HEP Grid placed on the CHEP timeline – Padova, Beijing, San Diego, Interlaken, Mumbai, Victoria?]

Production Grids – What has been achieved
  • Basic middleware
  • A set of baseline services agreed and initial versions in production
  • All major LCG sites active
  • 1 GB/sec distribution data rate mass storage to mass storage, > 50% of the nominal LHC data rate
  • Grid job failure rate 5-10% for most experiments,down from ~30% in 2004
  • Sustained 10K jobs per day
  • > 10K simultaneous jobs during prolonged periods
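The drop in the job failure rate from ~30% to 5-10% matters more than it looks, as a back-of-envelope retry calculation shows. This sketch assumes, simplistically, that attempts fail independently; the function name and the choice of 3 attempts are illustrative, not from the talk.

```python
# Effect of the quoted per-attempt failure rates on jobs that are
# resubmitted on failure, assuming independent attempts.

def success_within(attempts: int, p_fail: float) -> float:
    """Probability a job succeeds in at most `attempts` tries."""
    return 1 - p_fail ** attempts

for p in (0.30, 0.075):  # ~2004 rate vs the 5-10% rate quoted above
    print(f"p_fail={p}: success within 3 attempts = {success_within(3, p):.4f}")
```

At the 2004 rate, roughly 3% of jobs still fail after three tries; at the current rate the residual loss is negligible, which is what makes sustained 10K-job days feasible.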
Operations Process & Infrastructure
  • EGEE operation
    • Started November 2004
    • There is no “central” operation – but 6 teams working in weekly rotation
      • CERN, IN2P3, CNAF, RAL, Russia, Taipei
    • Monitoring and alarm management tools
    • Crucial in improving site stability and management
  • OSG Operations Centre in Indiana
  • Joint workshops EGEE/OSG examining common procedures
[Chart: total grid sites vs. number of sites passing the SFT tests]
Service Metrics
  • Grid level accounting in place
  • Site Functional Test (SFT) framework
    • Regular testing of basic services, .. VO-specific tests
  • Framework for service level monitoring (MoU)
  • Marked improvement in site availability since introduced
  • Investigating using SFT framework in OSG
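The SFT pass/fail logic can be pictured with a short sketch: run every basic-service test against each site, count a site as available only if all tests pass, and report the passing fraction. This is hypothetical illustration code, not the real SFT framework; the `Site` type and test names are invented.

```python
# Minimal sketch of an SFT-style availability check (hypothetical:
# the real Site Functional Test framework is not reproduced here).
from dataclasses import dataclass, field
from typing import Callable, Dict, List

@dataclass
class Site:
    name: str
    results: Dict[str, bool] = field(default_factory=dict)

def run_tests(site: Site, tests: Dict[str, Callable[[Site], bool]]) -> bool:
    """A site 'passes' only if every basic-service test succeeds."""
    for name, test in tests.items():
        site.results[name] = test(site)
    return all(site.results.values())

def availability(sites: List[Site], tests) -> float:
    """Fraction of sites currently passing all tests."""
    return sum(run_tests(s, tests) for s in sites) / len(sites)

# Toy checks standing in for real ones (job submission, storage access, ...)
tests = {
    "job_submission": lambda s: s.name != "broken-site",
    "storage_access": lambda s: True,
}
sites = [Site("cern"), Site("ral"), Site("broken-site"), Site("cnaf")]
print(f"availability: {availability(sites, tests):.0%}")  # 75%
```

Publishing this per-site pass fraction is what made the "marked improvement in site availability" visible and actionable.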
LCG Service Deadlines

  • 2006 – Service Challenge 4; Pilot Services – stable service from 1 June 06
  • LHC Service in operation – 1 Oct 06; over the following six months, ramp up to full operational capacity & performance
  • 2007 – cosmics, first physics; LHC service commissioned – 1 Apr 07
  • 2008 – full physics run
Priorities
  • There are a lot of energetic and imaginative people involved in this enterprise -- there is a real risk that we are still trying to do things that are too complicated, while simple and robust models have not yet been demonstrated
  • There are very few people with the right knowledge and in the right place to work on some of the really important things
  • Operating reliable services is more difficult than it looks – after the hard work of debugging & automation has been done .. and operating distributed services is even more difficult

Now – 2006 – we must be very clear about the priorities

Service Challenges
  • Purpose
    • Understand what it takes to operate a real grid service – run for days/weeks at a time (not just limited to experiment Data Challenges)
    • Trigger and verify Tier-1 & large Tier-2 planning and deployment – tested with realistic usage patterns
    • Get the essential grid services ramped up to target levels of reliability, availability, scalability, end-to-end performance
  • Four progressive steps from October 2004 thru September 2006
    • End 2004 - SC1 – data transfer to subset of Tier-1s
    • Spring 2005 – SC2 – include mass storage, all Tier-1s, some Tier-2s
    • 2nd half 2005 – SC3 – Tier-1s, >20 Tier-2s –first set of baseline services
    • Jun-Sep 2006 – SC4 – pilot service
SC4 – the Pilot LHC Service from June 2006
  • Must be able to support a demonstration of the complete chain
  • DAQ → Tier-0 → Tier-1: data recording, calibration, reconstruction
  • Simulation, batch and end-user analysis; Tier-1 ↔ Tier-2 data exchange
  • Service metrics → MoU service levels
  • Extension of the service to most Tier-2 sites
SC4 Planning
  • 3-day workshop just prior to CHEP to finalise the planning
  • We have just about enough now to underlie a basic physics service
    • Functionality - modest evolution from current service
    • Deploying software that is already in the hands of the integration and test team
    • Focus on reliability, performance
  • Some functions still have to be provided by the experiments
  • In the longer term the evolution must continue, with additional services and enhancements
  • But now is the time to concentrate on what may be the hardest part of any complicated distributed system – making it work
Medium Term

[Diagram: medium-term plan]
  • SC4 – SRM 2 test and deployment – plan being elaborated
  • 3D distributed database services – development, test; October?
  • Additional planned functionality – to be agreed & completed in the next few months, then tested and deployed
  • New functionality – evaluation & development cycles; possible components for later years, subject to progress & experience
Summary
  • Two grid infrastructures are now in operation, on which we are able to complete the computing services for LHC
  • Reliability and performance have improved significantly over the past year
  • The focus of Service Challenge 4 is to demonstrate a basic but reliable service that can be scaled up by April 2007 to the capacity and performance needed for the first beams.
  • Development of new functionality and services must continue, but we must be careful that this does not interfere with the main priority for this year –

reliable operation of the baseline services