CMS Software & Computing

C. Charlot / LLR-École Polytechnique, CNRS & IN2P3

for the CMS collaboration

ACAT02, 24-28 June 2002, Moscow

The Context

  • LHC challenges
  • Data Handling & Analysis
  • Analysis environments
  • Requirements & constraints

Challenges: Complexity

Detector:

  • ~2 orders of magnitude more channels than today
  • Triggers must choose correctly only 1 event in every 400,000 (see the sketch below)
  • Level 2 & 3 triggers are software-based (reliability)
  • Computer resources will not be available in a single location
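
Where a 1-in-400,000 selectivity comes from can be checked with back-of-the-envelope arithmetic; the sketch below assumes the nominal 40 MHz LHC bunch-crossing rate and an O(100 Hz) rate to permanent storage, numbers that are not quoted on this slide.

```python
# Back-of-the-envelope trigger selectivity (assumed inputs, not from the slide):
crossing_rate_hz = 40e6    # assumed nominal LHC bunch-crossing rate
storage_rate_hz = 100.0    # assumed order-of-magnitude rate to permanent storage

rejection_factor = crossing_rate_hz / storage_rate_hz
print(f"Trigger must keep ~1 event in {rejection_factor:,.0f}")   # ~1 in 400,000
```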


Challenges: Geographical Spread

  • 1700 physicists, 150 institutes, 32 countries
  • CERN Member States: 55 %, Non-Member States: 45 %
  • ~500 physicists analysing data in 20 physics groups

  • Major challenges associated with:
    • Communication and collaboration at a distance
    • Distribution of existing/future computing resources
    • Remote software development and physics analysis


HEP Experiment-Data Analysis

[Data-flow diagram: the Event Filter and Object Formatter store event data in a persistent object store managed by a database management system; quasi-online reconstruction stores reconstructed objects and calibrations; environmental data from detector control and online monitoring are stored alongside. Data quality, calibration, simulation, group analysis and user analysis tasks each request parts of events from the store on demand, leading eventually to the physics paper.]


Data handling baseline

CMS data model for computing in year 2007:

  • typical objects 1 KB-1 MB
  • 3 PB of storage space
  • 10,000 CPUs
  • hierarchy of sites: 1 Tier-0 + 5 Tier-1 + 25 Tier-2, spread all over the world
  • network bandwidth between sites: 0.6-2.5 Gbit/s


Analysis environments

Real Time Event Filtering and Monitoring

    • Data-driven pipeline
    • Emphasis on efficiency (keep up with the rate!) and reliability

Simulation, Reconstruction and Event Classification

    • Massively parallel batch-sequential processing
    • Emphasis on automation, bookkeeping, error recovery and rollback mechanisms

Interactive Statistical Analysis

    • Rapid Application Development environment
    • Efficient visualization and browsing tools
    • Ease of use for every physicist

  • Boundaries between environments are fuzzy
    • e.g. physics analysis algorithms will migrate to the online to make the trigger more selective


Architecture Overview

[Architecture diagram: a coherent set of basic tools and mechanisms built on the GRID distributed data store & computing infrastructure and an ODBMS, with the CMS software layers COBRA, ORCA, OSCAR and FAMOS on top. Surrounding CMS and ODBMS tools include a data browser, generic analysis tools, analysis job wizards, federation wizards, a detector/event display, software development and installation tools, and a consistent user interface.]


TODAY

  • Data production and analysis challenges
  • Transition to ROOT/IO
  • Ongoing work on baseline software

CMS Production stream


Production 2002: the scales


Production center setup

[Diagram: ~5 client nodes, each reading at ~12 MB/s, attached to a pile-up server hosting the pile-up DB and serving ~60 MB/s.]

Most critical task is digitization:

  • 300 KB per pile-up event
  • 200 pile-up events per signal event → 60 MB (see the arithmetic sketch below)
  • 10 s to digitize 1 full event on a 1 GHz CPU
  • 6 MB/s per CPU (12 MB/s per dual-processor client)
  • up to ~5 clients per pile-up server (~60 MB/s on its Gigabit network card)
  • fast disk access
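
The bandwidth figures follow directly from the numbers quoted above; a minimal sketch of the arithmetic:

```python
# Digitization I/O budget, reproducing the numbers quoted on this slide.
pileup_event_kb = 300        # KB read per pile-up event
pileup_per_signal = 200      # pile-up events mixed into each signal event
digitization_time_s = 10     # seconds to digitize one full event on a 1 GHz CPU

data_per_event_mb = pileup_event_kb * pileup_per_signal / 1000     # ~60 MB
rate_per_cpu = data_per_event_mb / digitization_time_s             # ~6 MB/s per CPU
rate_per_client = 2 * rate_per_cpu                                 # ~12 MB/s per dual-CPU client
server_rate = 5 * rate_per_client                                  # ~60 MB/s on the server's Gigabit NIC

print(data_per_event_mb, rate_per_cpu, rate_per_client, server_rate)   # 60.0 6.0 12.0 60.0
```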


Spring02: production summary

[Plots of the number of events requested and produced vs time:
CMSIM — 6M events requested, produced at a sustained rate of 1.2 seconds/event for 4 months (February 8 to May 31);
high-luminosity (10^34) digitization — 3.5M events requested, produced at 1.4 seconds/event for 2 months (April 19 to June 7).]


Production processing

[Workflow diagram: the production manager coordinates the distribution of tasks (e.g. "Produce 100000 events, dataset mu_MB2mu_pt4") from the production database "RefDB" to the Regional Centers. At each Regional Center, IMPALA decomposes the request into job scripts and monitors them; the jobs are submitted to the RC farm through the production interface and tracked in the BOSS DB; output goes to the farm storage, the data location is recorded through the production DB, and a request summary file is returned to RefDB.]


RefDB Assignment Interface
  • Selection of a set of Requests and their Assignment to an RC
      • the RC contact persons get an automatic email with the assignment ID to be used as argument to the IMPALA scripts (“DeclareCMKINJobs.sh -a “)
  • Re-assignment of a Request to another RC or production site
  • List and Status of Assignments


IMPALA
  • The data product is a DataSet (typically a few hundred jobs)
  • IMPALA performs production task decomposition and script generation
    • Each step in the production chain is split into 3 sub-steps
    • Each sub-step is factorized into customizable functions (see the sketch below)
      • JobDeclaration: search for something to do
      • JobCreation: generate jobs from templates
      • JobSubmission: submit jobs to the scheduler
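
A minimal sketch of how such a decomposition into customizable sub-steps might look; the function and hook names are illustrative, not IMPALA's actual interface:

```python
# Illustrative sketch of one production step split into three sub-steps,
# each factorized into functions a site manager can override.
# All names are hypothetical; they do not reproduce IMPALA's real code.

def declare_jobs(todo_finder):
    """JobDeclaration: search for something to do (e.g. outputs of the previous step)."""
    return todo_finder()

def create_jobs(todo_list, template):
    """JobCreation: generate concrete job scripts from a template."""
    return [template.format(job=item) for item in todo_list]

def submit_jobs(scripts, submit_hook):
    """JobSubmission: hand each script to the site-specific scheduler hook."""
    for script in scripts:
        submit_hook(script)

# Example wiring with trivially fake site-specific hooks:
todo = declare_jobs(lambda: ["run0001", "run0002"])
scripts = create_jobs(todo, template="cmsim.sh {job}")
submit_jobs(scripts, submit_hook=print)   # a real site would call its batch system here
```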


Job declaration, creation, submission
  • Jobs to do are automatically discovered:
    • by looking for the output of the previous step in a predefined directory, for the Fortran steps
    • by querying the Objectivity/DB federation, for Digitization, Event Selection and Analysis
  • Once the to-do list is ready, the site manager can actually generate instances of jobs starting from a template
  • Job execution includes validation of the produced data
  • Thanks to the decomposition of sub-steps into customizable functions, site managers can:
    • define the local actions to be taken to submit the job (local job scheduler specificities, queues, ...)
    • define the local actions to be taken before and after the start of the job (staging input and output from/to the MSS)
  • Auto-recovery of crashed jobs (see the sketch below)
    • input parameters are automatically changed to restart the job at the crash point
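
A minimal sketch of the crash-recovery idea, assuming a job whose starting event number can be rewritten before resubmission; the parameter names are hypothetical, not IMPALA's:

```python
# Illustrative auto-recovery sketch: if a job crashed after some events were
# processed and validated, regenerate it with its first-event parameter moved
# past the crash point. Parameter names (first_event, n_events) are hypothetical.

def recover_job(params, events_done):
    """Return new job parameters restarting right after the last good event."""
    recovered = dict(params)
    recovered["first_event"] = params["first_event"] + events_done
    recovered["n_events"] = params["n_events"] - events_done
    return recovered

original = {"dataset": "mu_MB2mu_pt4", "first_event": 1, "n_events": 500}
restart = recover_job(original, events_done=137)   # job crashed after 137 good events
print(restart)   # {'dataset': 'mu_MB2mu_pt4', 'first_event': 138, 'n_events': 363}
```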


BOSS job monitoring

[Diagram: the user commands boss submit, boss query and boss kill talk to BOSS, which sits in front of the local scheduler and the BOSS DB; the wrapped jobs run on the farm nodes.]

  • Accepts job submission from users
  • Stores info about the job in a DB
  • Builds a wrapper around the job (BossExecuter)
  • Sends the wrapper to the local scheduler
  • The wrapper sends info about the job back to the DB


Getting info from the job
  • A registered job has scripts associated with it that are able to understand the job output (sketched below)
  • BossExecuter (the wrapper around the user's executable): get the job info from the DB; create and go to the work directory; run the preprocess script and update the DB; fork the user's executable and fork the monitor; wait for the user executable to finish; kill the monitor; run the postprocess script, update the DB and exit
  • BossMonitor: get the job info from the DB; while the user executable is running, run the runtime-process script, update the DB and wait some time; then exit
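
A minimal sketch of this wrapper/monitor pattern; it is a simplified illustration of the flow above, not BOSS's actual implementation, and update_db() merely stands in for writes to the BOSS DB:

```python
# Simplified sketch of the BossExecuter / BossMonitor flow described above.
# All names, and the update_db() stand-in, are illustrative, not BOSS code.
import subprocess, time

def update_db(job_id, **info):
    print(f"[DB] job {job_id}: {info}")            # stand-in for a write to the BOSS DB

def boss_monitor(job_id, proc, runtime_filter, poll_s=5):
    """While the user executable runs, apply the registered runtime filter
    (a script that understands the job output) and update the DB."""
    while proc.poll() is None:
        update_db(job_id, **runtime_filter("job.log"))
        time.sleep(poll_s)

def boss_executer(job_id, command, preprocess, postprocess, runtime_filter):
    """Wrap the user executable, run the pre/post steps, keep the DB informed."""
    update_db(job_id, **preprocess())
    with open("job.log", "w") as log:
        proc = subprocess.Popen(command, stdout=log, stderr=subprocess.STDOUT)
        boss_monitor(job_id, proc, runtime_filter)  # loops until the executable exits
        proc.wait()
    update_db(job_id, **postprocess(proc.returncode))
```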


CMS transition to ROOT/IO
  • CMS has worked up to now with Objectivity
    • We managed to make it work, at least for production
      • Painful to operate, a lot of human intervention needed
    • Now being phased out, to be replaced by LCG software
  • Hence CMS is in a major transition phase
    • Prototypes using a ROOT+RDBMS layer are being worked on
    • This is done within the LCG context (persistency RTAG)
    • The aim is to start testing the new system as it becomes available
      • Target: early 2003 for first realistic tests


OSCAR: Geant4 simulation
  • The CMS plan is to replace CMSIM (Geant3) by OSCAR (Geant4)
  • A lot of work since last year
    • Many problems on the Geant4 side have been corrected
    • Now integrated in the analysis chain:
      Generator -> OSCAR -> ORCA, using COBRA persistency
    • Under geometry & physics validation; overall agreement is rather good
  • Still more to do before using it in production

[Validation plots comparing SimTrack and HitsAssoc distributions between CMSIM 122 and OSCAR 1_3_2_pre03.]


OSCAR: Track Finding
  • Plots of the number of RecHits and SimHits per track vs eta


Detector Description Database
  • Several applications (simulation, reconstruction, visualization) need geometry services
    • Use a common interface to all services
  • On the other hand, several detector description sources are currently in use
    • Use a unique internal representation derived from the sources (see the sketch below)
  • A prototype now exists
    • co-works with OSCAR
    • co-works with ORCA (Tracker, Muons)
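
A minimal sketch of this design (several sources feeding one internal representation behind a common interface); the class and method names are illustrative, not the actual Detector Description Database API:

```python
# Illustrative sketch only: multiple detector-description sources are parsed into
# one internal representation, exposed to all clients (simulation, reconstruction,
# visualization) through a common interface. No names here come from the real DDD.
from abc import ABC, abstractmethod

class GeometrySource(ABC):
    """One of several description sources (e.g. an XML file or a legacy text format)."""
    @abstractmethod
    def load(self) -> dict:
        """Return volumes as {name: {'material': ..., 'dimensions': ...}}."""

class XMLSource(GeometrySource):
    def __init__(self, path):
        self.path = path
    def load(self):
        # A real implementation would parse self.path; here we fake one volume.
        return {"TrackerBarrel": {"material": "Si", "dimensions": (1.2, 2.4)}}

class DetectorDescription:
    """Unique internal representation plus the common query interface."""
    def __init__(self, sources):
        self.volumes = {}
        for src in sources:
            self.volumes.update(src.load())
    def material(self, volume):
        return self.volumes[volume]["material"]

dd = DetectorDescription([XMLSource("tracker.xml")])
print(dd.material("TrackerBarrel"))   # -> "Si"
```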


ORCA Visualization
  • IGUANA framework for visualization
  • 3D visualization
    • multiple views, slices, 2D projections, zoom
  • Co-works with ORCA
    • Interactive 3D detector geometry for sensitive volumes
    • Interactive 3D representations of reconstructed and simulated events, including display of physics quantities
    • Access event by event or by automatically fetching events
    • Display of event and run numbers


TOMORROW

  • Deployment of a distributed data system
  • Evolve the software framework to match the LCG components
  • Ramp up the computing systems

Toward ONE Grid
  • Build a unique CMS-GRID framework (EU+US)
  • EU and US grids not interoperable today. Need for help from the various Grid projects and middleware experts
    • Work in parallel in EU and US
  • Main US activities:
    • PPDG/GriPhyN grid projects
    • MOP
    • Virtual Data System
    • Interactive Analysis: Clarens system
  • Main EU activities:
    • EDG project
    • Integration of IMPALA with EDG middleware
    • Batch Analysis: user job submission & analysis farm


PPDG MOP system

PPDG developed the MOP production system:

  • Allows submission of CMS production jobs from a central location, running them at remote locations and returning the results
  • Relies on GDMP for replication
  • Globus GRAM, Condor-G and local queuing systems for job scheduling
  • IMPALA for job specification
  • DAGMan for management of dependencies between jobs
  • Being deployed in the USCMS testbed


CMS EU Grid Integration

CMS EU developed the integration of the production tools with EDG middleware:

  • Allows submission of CMS production jobs using the WP1 JSS from any site that has the client part (UI) installed
  • Relies on GDMP for replication
  • WP1 for job scheduling
  • IMPALA for job specification
  • Being deployed in the CMS/DataTAG testbed (UK, France, INFN, Russia)


CMS EDG Production prototype

[Architecture diagram: the Reference DB holds all the information needed by IMPALA to generate a dataset. On the User Interface, IMPALA gets the request for a production and creates location-independent jobs; BOSS builds a tracking wrapper around each job, records job-specific information in its tracking DB and submits the jobs to the Job Submission Service. The Resource Broker, using the Information Services (LDAP server with resource information), finds a suitable location for execution and dispatches the job via Condor-G/GRAM to the local scheduler of a Computing Element, where CMSIM and ORCA run on the worker nodes and read from / write to the tracking DB. Output goes to a Storage Element hosting the local Objectivity federation (FDDB) and local storage.]


GriPhyN/PPDG VDT Prototype

[Architecture diagram (legend: components marked as "no code", "existing", or "implemented using MOP"): the user's request goes to an Abstract Planner (IMPALA), which produces an Abstract DAG; the Concrete Planner / WP1 turns it into a Concrete DAG for the Executor (MOP/DAGMan, WP1, BOSS with its local tracking DB). The Compute Resource runs the wrapper scripts for CMKIN, CMSIM and ORCA/COBRA; the Storage Resource is the local Grid storage, etc. Catalog Services comprise Replica Management (GDMP), the RefDB, a Virtual Data Catalog, a Materialized Data Catalog, a Replica Catalog and the Objectivity Federation Catalog.]


CLARENS: a Portal to the Grid
  • Grid-enabling environment for remote data analysis
  • Clarens is a simple way to implement web services on the server
  • No Globus needed on the client side, only a certificate
  • The server will provide a remote API to Grid tools (a client-side sketch follows this list):
    • Security services provided by the Grid (GSI)
    • The Virtual Data Toolkit: object collection access
    • Data movement between Tier centres using GSI-FTP
    • Access to CMS analysis software (ORCA/COBRA)
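
As an illustration of what calling such a portal could look like from the client side, here is a minimal sketch of a client talking to a hypothetical Clarens-style web service over XML-RPC; the endpoint URL, the file.ls call and the dataset path are invented for the example and do not describe the actual Clarens API:

```python
# Hypothetical client-side sketch of talking to a Clarens-style analysis portal.
# The URL and the file.ls method are illustrative only; certificate handling is omitted.
import xmlrpc.client

# No Globus installation is needed on the client, only a certificate for authentication.
server = xmlrpc.client.ServerProxy("https://clarens.example.org:8443/clarens")

print(server.system.listMethods())                 # standard XML-RPC introspection call
# files = server.file.ls("/store/mu_MB2mu_pt4")    # hypothetical remote API call
```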


Conclusions
  • CMS has performed large scale distributed production of Monte Carlo events
  • Baseline software is progressing and this is done now within the new LCG context
  • Grid is the enabling technology for the deployment of a distributed data analysis
  • CMS is engaged in testing and integrating grid tools in its computing environment
  • Much work to be done to be ready for a distributed data analysis at LHC startup
