Next Generation of Event Data Processing Frameworks

Preliminary Ideas for a New Project Proposal

P. Mato/CERN

Outline

  • Motivation
  • Vision
  • More details
  • Impact for Geant4
  • Project and Timeline

Motivation

  • For the last 40 years, HEP event processing frameworks have had the same structure (see the sketch after this list)
    • initialize; loop events {loop modules {…} }; finalize
    • Object orientation has not added anything substantial
    • It is simple, intuitive, easy to manage, and scalable
  • Current frameworks were designed in the late 1990s
    • We now know better what is really needed
    • Unnecessary complexity impacts performance
  • Inadequate for the many-core era
    • Multi-process, multi-threading, GPUs, vectorization, etc.
    • The one-job-per-core approach scales well but requires too much memory and sequential file merging
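A minimal sketch of this classic structure, with a hypothetical Module interface (the names are illustrative, not those of any specific framework):

    #include <memory>
    #include <vector>

    struct Event { /* per-event data store */ };

    // Hypothetical module interface: every algorithm implements these hooks.
    struct Module {
      virtual ~Module() = default;
      virtual void initialize() {}
      virtual void execute(Event& event) = 0;   // process one event
      virtual void finalize() {}
    };

    // The 40-year-old structure: initialize; loop events { loop modules {…} }; finalize.
    void run(const std::vector<std::unique_ptr<Module>>& modules, long nEvents) {
      for (auto& m : modules) m->initialize();
      for (long i = 0; i < nEvents; ++i) {
        Event event;                                // read/build the next event
        for (auto& m : modules) m->execute(event);  // strictly sequential
      }
      for (auto& m : modules) m->finalize();
    }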
Vision

  • Same framework for simulation, reconstruction, analysis, and high-level trigger applications
  • Common framework for use by any experiment
  • Decomposition of the processing of each event into ‘chunks’ that can be executed concurrently
  • Ability to process several [many] events concurrently
  • Optimized scheduling and associated data structures
  • Minimize any processing requiring exclusive access to resources, because it breaks concurrency
  • Support for various hardware/software technologies
  • Facilitate the integration of existing LHC application code (the algorithmic part)
  • Quick delivery of running prototypes: the opportunity of the 18-month LHC shutdown

Universal Framework

  • The current frameworks used by the LHC experiments support all data processing applications
    • High-level trigger, reconstruction, analysis, etc.
    • Nothing really new here
  • But simulation applications are designed around one big ‘chunk’ in which all the Geant4 processing happens
    • We want to improve the full and fast simulation using the common set of services and infrastructure
    • See the implications for Geant4 later
  • Running on the major platforms
    • Linux, Mac OS X, Windows
Common Framework

  • Frameworks can be shared between experiments
    • E.g. Gaudi is used by LHCb, ATLAS, HARP, MINERVA, GLAST, BES3, etc.
  • We can do better this time :-)
    • Expect to work closely with the LHC experiments
    • Aim to support at least ATLAS and CMS
  • Special emphasis on requirements from:
    • New experiments
      • E.g. Linear Collider, SuperB, etc.
    • Different processing paradigms
      • E.g. fixed-target experiments, astroparticle physics
Concurrent ‘Chunk’ Processing

[Diagram: Input, Processing and Output ‘chunks’ of an event laid out along a Time axis]

  • Framework with the ability to schedule concurrent tasks (see the sketch after this list)
    • Full data dependency analysis would be required (no global data or hidden dependencies)
  • Not much gain is expected with today’s ‘chunks’ as currently designed
    • See the CMS estimates at CHEP’10 (*)
    • Algorithm decomposition can be influenced by the framework capabilities
  • ‘Chunks’ could be processed by different hardware/software
    • CPU, GPU, threads, processes, etc.
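A minimal sketch of how a ‘chunk’ could declare its data dependencies so the framework can analyze them (the Chunk type and its fields are hypothetical, not from an existing framework):

    #include <functional>
    #include <string>
    #include <vector>

    // Hypothetical schedulable ‘chunk’: the framework only needs to know
    // what it reads and what it writes in order to schedule it safely.
    struct Chunk {
      std::string name;
      std::vector<std::string> inputs;    // data products this chunk consumes
      std::vector<std::string> outputs;   // data products this chunk produces
      std::function<void()> work;         // the algorithmic payload
    };

    // Two chunks may run concurrently if neither consumes the other’s outputs
    // (a full check would also reject chunks writing the same product).
    bool independent(const Chunk& a, const Chunk& b) {
      auto reads = [](const Chunk& c, const std::string& key) {
        for (const auto& in : c.inputs) if (in == key) return true;
        return false;
      };
      for (const auto& out : a.outputs) if (reads(b, out)) return false;
      for (const auto& out : b.outputs) if (reads(a, out)) return false;
      return true;
    }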
Many Concurrent Events

  • Need to deal with the tails of sequential processing
    • See Rene’s presentation (*)
  • Introducing pipeline processing (see the sketch after this list)
    • Never tried before!
    • Exclusive access to resources can be pipelined, e.g. file writing
  • Need to design a very powerful scheduler
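A minimal sketch of pipelining an exclusive resource, using standard C++ threading primitives (all names hypothetical): many events are processed concurrently, while a single writer thread drains the output queue.

    #include <condition_variable>
    #include <mutex>
    #include <queue>

    struct ProcessedEvent { long id; /* ... results ... */ };

    std::queue<ProcessedEvent> outputQueue;   // feeds the single writer
    std::mutex qMutex;
    std::condition_variable qCond;
    bool done = false;   // set under qMutex (and qCond notified) when producers finish

    // Many producer threads process events concurrently and enqueue results.
    void produce(long id) {
      ProcessedEvent ev{id};
      // ... run the event’s chunks here ...
      std::lock_guard<std::mutex> lock(qMutex);
      outputQueue.push(ev);
      qCond.notify_one();
    }

    // The exclusive resource (the output file) is pipelined: exactly one
    // thread writes, so the file itself never needs a lock.
    void writerLoop() {
      std::unique_lock<std::mutex> lock(qMutex);
      while (!done || !outputQueue.empty()) {
        qCond.wait(lock, [] { return done || !outputQueue.empty(); });
        while (!outputQueue.empty()) {
          ProcessedEvent ev = outputQueue.front();
          outputQueue.pop();
          // write(ev): serialized file writing happens here
        }
      }
    }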
Concurrent Data Processing (i)

[Diagram, from Markus Frank’s design ideas: a sea of ‘chunks’ (Input Module, Processor 1, 2, 3, Histogram 1, …) connected through their In/Out ports]
  • Start with a sea of ‘chunks’
  • Then combine them according to their required inputs/outputs
    • Inputs/outputs define dependencies => solve them (see the sketch below)
  • Organize the ‘chunks’ in queues according to their ‘state’
    • Running, Ready, Waiting, etc.

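A minimal sketch of the state queues and dependency solving described above, reusing the hypothetical Chunk declaration from the earlier sketch:

    #include <deque>
    #include <set>
    #include <string>
    #include <vector>

    // (Chunk as sketched earlier: declared inputs/outputs plus a payload.)
    struct Chunk {
      std::vector<std::string> inputs, outputs;
    };

    struct EventState {
      std::set<std::string> produced;             // data products available so far
      std::deque<Chunk> waiting, ready, running;  // the per-‘state’ queues
    };

    // A chunk is Ready once every declared input has been produced.
    bool isReady(const Chunk& c, const EventState& s) {
      for (const auto& in : c.inputs)
        if (!s.produced.count(in)) return false;
      return true;
    }

    // Solve dependencies: whenever new outputs appear, promote chunks
    // from the Waiting queue to the Ready queue.
    void promote(EventState& s) {
      for (auto it = s.waiting.begin(); it != s.waiting.end();) {
        if (isReady(*it, s)) {
          s.ready.push_back(*it);
          it = s.waiting.erase(it);
        } else {
          ++it;
        }
      }
    }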

Concurrent Data Processing (ii)

[Diagram, from Markus Frank’s design ideas: a Manager (Scheduler) moves Workers between an idle queue and a busy queue, attaching Tasks from the waiting work and steering the dataflow]

  • Task: a formal workload to be given to a worker thread
  • To schedule a Task (see the sketch after this list):
    • Acquire a worker from the idle queue
    • Attach the task to the worker
    • Start the worker
  • Once finished:
    • Put the worker back into the idle queue
    • Put the task back into the “sea”
    • Check the work queue for rescheduling
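A minimal sketch of these scheduling steps (hypothetical Worker and Task types; a real scheduler would also manage the dataflow shown in the diagram):

    #include <deque>
    #include <functional>
    #include <utility>

    struct Task { std::function<void()> run; };

    struct Worker {
      Task task;
      void attach(Task t) { task = std::move(t); }
      void start() { task.run(); }   // in reality: wake the worker’s thread
    };

    struct Scheduler {
      std::deque<Worker*> idle;      // workers with nothing to do
      std::deque<Task> waiting;      // work with no free worker yet

      // To schedule a Task: acquire a worker from the idle queue,
      // attach the task to it, and start the worker.
      void schedule(Task t) {
        if (idle.empty()) { waiting.push_back(std::move(t)); return; }
        Worker* w = idle.front();
        idle.pop_front();
        w->attach(std::move(t));
        w->start();
      }

      // Once finished: put the worker back into the idle queue and
      // check the waiting queue for rescheduling.
      void finished(Worker* w) {
        idle.push_back(w);
        if (!waiting.empty()) {
          Task t = std::move(waiting.front());
          waiting.pop_front();
          schedule(std::move(t));
        }
      }
    };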
Exclusive Access to Resources

  • Resource ‘locking’ can strongly limit parallelism
  • Need to restrict/limit access to some resources (see the sketch after this list)
    • E.g. database connections, files opened for writing, shared memory for writing, etc.
  • Blocking time should be reduced to the bare minimum
  • Best would be to have only one processing instance accessing each such resource
    • The ‘only-one-writer’ rule
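A minimal sketch of limiting concurrent access to a resource, assuming C++20 for std::counting_semaphore (the database-connection scenario is illustrative):

    #include <semaphore>

    // At most 4 concurrent users of the database connections; everything
    // else proceeds in parallel, so blocking is kept to the bare minimum.
    std::counting_semaphore<4> dbSlots(4);

    void queryConditions(/* ... */) {
      dbSlots.acquire();   // block only if all 4 slots are taken
      // ... use one database connection ...
      dbSlots.release();
    }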
Why Should the Framework Manage the Concurrency?

  • Concrete algorithms can be parallelized with some effort
    • Making use of threads, OpenMP, MPI, GPUs, etc.
    • But it is difficult to integrate them into a complete application
      • E.g. MT-G4 with Parallel Gaudi
    • Performance-wise, it only makes sense to parallelize the complete application, not only parts of it
  • Developing and validating parallel code is difficult
    • ‘Physicists’ should be spared from this
    • In any case, concurrency will limit what can and cannot be done in algorithmic code
  • At the framework level you have the overall view and control of the application
Impact of Vision for Geant4 (i)

  • Re-engineer Geant4 to use the new framework
    • Make use of the common set of foundation packages (math, vectors, utility classes, etc.)
    • With this we get effortless integration with non-G4-core functionality (visualization, I/O, configurability, interactivity, analysis, etc.)
  • Concurrent processing of sets of ‘tracks’ and ‘events’
    • Development of Rene’s ideas of ‘baskets’ of particles organized by particle type, volume shape, etc. (see the sketch after this list)
    • Need to develop an efficient summing (‘reduce’) of the results
    • Study the reproducibility of results (random number sequences)
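A minimal sketch of the ‘basket’ idea and the ‘reduce’ step, with hypothetical Track and Histogram types (the grouping criteria follow the bullets above):

    #include <cstddef>
    #include <map>
    #include <string>
    #include <utility>
    #include <vector>

    struct Track { std::string particle; int volumeId; /* kinematics ... */ };

    // Group tracks into ‘baskets’ keyed by (volume, particle type): each
    // basket is homogeneous work a worker (or GPU kernel) can vectorize.
    std::map<std::pair<int, std::string>, std::vector<Track>>
    makeBaskets(const std::vector<Track>& tracks) {
      std::map<std::pair<int, std::string>, std::vector<Track>> baskets;
      for (const auto& t : tracks)
        baskets[{t.volumeId, t.particle}].push_back(t);
      return baskets;
    }

    // Each worker fills its own partial histogram; the ‘reduce’ step sums
    // them, so no locks are needed on a shared result during transport.
    struct Histogram { std::vector<double> bins; };

    Histogram reduce(const std::vector<Histogram>& partials) {
      if (partials.empty()) return {};
      Histogram total{std::vector<double>(partials.front().bins.size(), 0.0)};
      for (const auto& h : partials)
        for (std::size_t i = 0; i < h.bins.size(); ++i) total.bins[i] += h.bins[i];
      return total;
    }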
Impact of Vision for Geant4 (ii)

  • Major cleanup of obsolete physics and functionality
    • Needed in any case for 15-year-old software
  • Ability to run full and fast MC together using common infrastructure (e.g. geometry, conditions, etc.)
    • Today’s frameworks allow running, e.g., different ‘tracking algorithms’ in the same program
    • Defining clearly the input and output types
Project

  • Collaboration of CERN with FNAL, DESY and possibly other labs
    • Start with a small number of people (at the beginning)
    • Open to people willing to collaborate
    • Strong interaction with ATLAS and CMS (and others)
      • E.g. instrumentation of existing applications to provide requirements
    • Strong collaboration with the Geant4 team
  • Quick delivery of running prototypes (I and II)
    • First prototype in 12 months :-)
  • Agile project management with ‘short’ cycles
    • Weekly meetings to review progress and update plans
R&D Activities

  • We need to evaluate some of the existing technologies and design partial prototypes of critical parts
    • Examples: OpenCL, impact of vectorization, transactional memory, fast scheduling algorithms, etc.
  • The idea is to organize these R&D activities in short cycles
    • Coordinating the interested people to cover all aspects
    • Coming to conclusions (yes/no) within a few months
Project Timeline

  • Today: Project definition
  • 2011-2012: R&D, technology evaluation and design of critical parts
  • 2012: Complete Prototype I; initial adaptation of LHC and Geant4 applications
  • 2013 (LHC shutdown): Complete Prototype II, with the experience of porting LHC applications
  • 2014: First production-quality release