The PH-SFT Group Mandate, Organization, Achievements Software Components Software Services Summary
SFT Group Mandate • The group develops and maintains common scientific software for the physics experiments in close collaboration with the PH experimental groups, the IT department and external HEP institutes • The majority of the group is directly involved in projects organized as part of the Applications Area of the LHC Computing Grid Project (LCG-AA) • In addition, several group members have direct responsibilities in the software projects of the LHC experiments http://sftweb.cern.ch
Activity Organization • Group activities are organized around the main products and services • Baseline projects: • Simulation: Geant4, Physics Validation and Event Generators • ROOT: ROOT Core, ROOT Analysis, PROOF • SPI: builds, tests, externals, web, Savannah • R&D projects: • Multicore: parallel frameworks, performance studies • CernVM: image production, file system • Geant: investigation of new approaches in event simulation • We meet regularly with the leaders of the core software projects of the LHC experiments • The Architects Forum (AF), with representation from each of the experiments and IT, regularly monitors and steers these projects http://lcgapp.cern.ch/project/mgmt/af.html
Applications Area Organization [Organization chart: the LCG AA projects (ROOT, POOL, SIMULATION, SPI), each with its work packages and subprojects, are steered by the AA Manager and the Architects Forum, in which the experiments (Alice, Atlas, CMS, LHCb) and IT (GD, FIO) are represented; work plans, quarterly reports, reviews and resources flow up to the MB and LHCC; decisions are taken in the Application Area Meeting; external collaborations include Geant4, EGEE and ROOT.]
Simulation Project • Development and validation of simulation software components. • Bringing together developers with physicists working on simulating the real LHC detectors • Geant4. Development, maintenance and support of several modules, including the geometry and a number of physics models. Infrastructure for the collaboration for testing and releasing the software. • Generator Services (Genser). Provides validated MC generators for the theoretical and experimental communities at the LHC. • Physics validation. Comparisons of the main detector simulation engines for LHC (Geant4 and FLUKA) with experimental data. • Meetings with the experiments: • Physics Validation meetings • MC Generator meetings
ROOT Project • Development and maintenance of the ROOT framework and set of foundation and utility class libraries that are used as a basis for developing HEP application codes • The main components are: • Utilities and extensions to C++, I/O system, interpreters (CINT, Python), data visualization (2D & 3D), graphical user interface, mathematical and data analysis libraries, etc. • PROOF enables distributed data sets to be analyzed in a transparent way.
Software Process & Infrastructure (SPI) Project • Provides a common software development infrastructure and delivers complete and validated set of software components to the LHC experiments • Software configuration • External libraries (~100 open-source and public domain libraries) • Software development tools and services • Nightly build, testing and integration service • Documentation tools (doxygen, opengrok) • Quality assurance (coverity, qmtest) • Collaborating tools (Savannah) • Infrastructure in general (build servers, web, etc.)
R&D • Multi-core - Parallelization of Software Frameworks to exploit multi-core Processors • Investigate current and future multi-core architectures. Measure and analyze performance of current LHC application software • Investigate solutions to parallelize current LHC physics software at the application framework level, as well as at the algorithm level • Virtualization - Portable Analysis Environment using Virtualization Technology • Development of the “CernVM” virtual appliance common to all the experiments • Deployment of a read-only distributed file system with an aggressive caching scheme, as well as the pilot infrastructure to serve software installations on demand
Main Achievements 2010 • ROOT • Consolidation and speed-up of the ROOT I/O system • EVE event display used by Alice, CMS and a number of non-LHC experiments • PROOF in production in Alice, tests in ATLAS and CMS. PROOF-Lite widely used on multi-core machines • Development of RooStats in collaboration with ATLAS and CMS • SIMULATION • Comparisons of 2010 MC productions and the first LHC real data show excellent agreement • Implementations of the FTFP and CHIPS models have been tested by ATLAS and CMS and will be production quality in the 9.4 release • These physics lists give a smoother response in the transition region • Alternative to parameterizations for anti-protons, kaons, hyperons, etc.
Main Achievements 2010 (2) • R&D • Finalized Gaudi Parallel framework (ATLAS and LHCb) • Performance instrumentation of CMSSW, Gaudi and Geant4 • CernVM virtual platform being taken up by ATLAS, LHCb and CMS • CernVM-FS has generated a lot of interest from Tier-1, 2 and 3 sites • User Support and Training • Answering on average 40 issues/requests per working day from users • Provided 12 lecturers to 4 schools and tutorials (CSC, G4, INFN, etc.) • Supporting Experiments' Software Development • Provided complete releases and a service for continuous integration and testing to ATLAS, CMS, LHCb and all AA projects • Deployment of AA software for the CERN Theory group • Identified and deployed a tool that is helping the experiments to eradicate many thousands of “defects” in their codes
Simplified Software Structure • Applications: built on top of frameworks, implementing the required algorithms (e.g. simulation, reconstruction, analysis, trigger) • Experiment Framework: every experiment has a framework for basic services plus various specialized frameworks: event model, detector description, visualization, persistency, interactivity, simulation, calibration, data management, distributed analysis • Specialized domains that are common among the experiments (e.g. Geant4, COOL, Generators, etc.) • Core Libraries: core libraries and services that are widely used and provide basic functionality (e.g. ROOT, HepMC, …) • Non-HEP specific software packages: many non-HEP libraries widely used (e.g. Xerces, GSL, Boost, etc.)
Software Components • One or more implementations of each component exist for LHC: • Foundation Libraries: basic types, utility libraries, system isolation libraries • Mathematical Libraries: special functions, minimization, random numbers • Data Organization: event data, event metadata (event collections), detector description, detector conditions data • Data Management Tools: object persistency, data distribution and replication • Simulation Toolkits: event generators, detector simulation • Statistical Analysis Tools: histograms, N-tuples, fitting • Interactivity and User Interfaces: GUI, scripting, interactive analysis • Data Visualization and Graphics: event and geometry displays • Distributed Applications: parallel processing, grid computing • Products covering these components include genser, geant4, COOL, ROOT (core, math, gui, geom, io, evo), PyROOT, CINT, PROOF, xrootd and Ganga
Programming Languages • C++ used almost exclusively by all LHC Experiments • LHC experiments with an initial FORTRAN code base completed the migration to C++ a long time ago • Large common software projects in C++ have been in production for many years • ROOT, Geant4, … • FORTRAN still in use mainly by the MC generators • Large development efforts are going into the migration to C++ (Pythia8, Herwig++, Sherpa, …) • Java is almost non-existent for LHC • The exception is the ATLAS event display ATLANTIS
Scripting Languages • Scripting has been an essential component of HEP analysis software for decades • PAW macros (kumac) in the FORTRAN era • C++ interpreter (CINT) in the C++ era • Python is widely used by 3 out of 4 LHC experiments • Most of the statistical data analysis and final presentation is done with scripts • Interactive analysis • Rapid prototyping to test new ideas • Scripts are also used to “configure” complex C++ programs developed and used by the experiments • “Simulation” and “Reconstruction” programs with hundreds or thousands of options to configure
Role of Python • Python language is really interesting for two main reasons: • High level programming language • Simple, elegant, easy to learn language • Ideal for rapid prototyping • Used for scientific programming (www.scipy.org) • Framework to “glue” different functionalities • Any two pieces of software can be glued at runtime if they offer a “Python interface” • With PyROOT any C++ class can be easily used from Python
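The “glue” role can be sketched without any HEP software: below, two hypothetical components (a toy generator and a toy histogram, stand-ins for C++ classes that PyROOT would wrap) know nothing about each other and are connected at runtime purely through their Python interfaces.

```python
# A sketch of Python-as-glue: two unrelated components interoperate at
# runtime because each exposes a plain Python interface. All names here
# are illustrative, not part of any SFT package.
import random

class ToyGenerator:
    """Stand-in for a wrapped MC generator."""
    def __init__(self, seed=42):
        self._rng = random.Random(seed)
    def events(self, n):
        for _ in range(n):
            yield {"energy": self._rng.expovariate(1.0)}

class ToyHistogram:
    """Stand-in for a wrapped C++ histogram class."""
    def __init__(self, nbins, lo, hi):
        self.bins = [0] * nbins
        self.lo, self.hi, self.nbins = lo, hi, nbins
    def fill(self, x):
        if self.lo <= x < self.hi:
            i = int((x - self.lo) / (self.hi - self.lo) * self.nbins)
            self.bins[i] += 1

# The glue: a few lines of Python connect the two components.
gen = ToyGenerator()
hist = ToyHistogram(10, 0.0, 5.0)
for event in gen.events(1000):
    hist.fill(event["energy"])
```

With PyROOT the same pattern applies to real C++ classes: any class with a dictionary can be instantiated and driven from a Python session in the same way.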
Standard Data Formats • HepMC- Event record written in C++ for HEP MC Generators • Many extensions from HEPEVT (the Fortran HEP common block) • Agreed between MC authors and clients (LHC exp., Geant4, …) • I/O support (ASCII files and ROOT I/O) • GDML - Geometry Description Markup Language (XML) • Low level (materials, shapes, volumes and placements) • Quite verbose to edit directly • Directly understood by Geant4 and ROOT • Standard Event Data models? • Within an experiment the Event model spans all applications • Algorithms can be easily re-used between reconstruction, high-level trigger, simulation, etc. • Sharing of Event Data models between LHC expts. has not happened • On the contrary, LCIO is a very successful Event Model (and I/O system) for the ILC community
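To illustrate the low-level structure (materials, solids, volumes, placements) mentioned above, the following sketch parses a hand-written, GDML-like fragment with Python's standard library; the fragment is illustrative and not a complete, schema-valid GDML document.

```python
# Toy GDML-like fragment parsed with Python's stdlib, showing the
# low-level building blocks: materials, solids, and volumes that
# reference them. Illustrative only; real GDML files are more verbose.
import xml.etree.ElementTree as ET

gdml = """
<gdml>
  <materials>
    <material name="Air"/>
  </materials>
  <solids>
    <box name="WorldBox" x="100" y="100" z="100" lunit="cm"/>
  </solids>
  <structure>
    <volume name="World">
      <materialref ref="Air"/>
      <solidref ref="WorldBox"/>
    </volume>
  </structure>
</gdml>
"""

root = ET.fromstring(gdml)
# Map each volume name to the solid it references.
volumes = {v.get("name"): v.find("solidref").get("ref")
           for v in root.iter("volume")}
```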
ROOT I/O • Highly optimized (speed & size) platform independent I/O system developed for more than 10 years • Able to write/read any C++ object (event model independent) • Almost no restrictions (default constructor needed) • Make use of ‘dictionaries’ • Self-describing files • Support for automatic and complex ‘schema evolution’ • Usable without ‘user libraries’ • All the LHC experiments will rely on ROOT I/O for years to come
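By loose analogy only, the idea of persisting an arbitrary object graph without writing any format definition can be sketched with Python's pickle; note the key difference that ROOT files are self-describing (the dictionaries travel with the file), whereas pickle needs the class definition available when reading and has no schema evolution.

```python
# Loose Python analogy for object persistency: serialize an arbitrary
# object graph and read it back. Unlike ROOT I/O, pickle is not
# self-describing and offers no schema evolution; the Track class here
# is purely illustrative.
import io
import pickle

class Track:
    def __init__(self, pt, hits):
        self.pt = pt
        self.hits = hits

buf = io.BytesIO()
pickle.dump([Track(12.5, [1, 2, 3]), Track(3.1, [4])], buf)

buf.seek(0)
tracks = pickle.load(buf)  # objects reconstructed without user I/O code
```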
Data Processing Frameworks • Experiments have developed Application Frameworks • General architecture of any event processing applications (simulation, trigger, reconstruction, analysis, etc.) • To achieve coherency and to facilitate software re-use • Hide technical details to the end-user Physicists • Help the Physicists to focus on their physics algorithms • Applications are developed by customizing the Framework • By the “composition” of elemental Algorithms to form complete applications • Using third-party components wherever possible and configuring them • ALICE: AliROOT; ATLAS+LHCb: Athena/Gaudi; CMS: CMSSW
Example: GAUDI Framework • GAUDI is a mature software framework for event data processing used by several HEP experiments • ATLAS, LHCb, HARP, Fermi, Daya Bay, Minerva, BES III, LBNE/WCD • The same framework is used for all applications • All applications behave the same way (configuration, logging, control, etc.) • Re-use of ‘Services’ (e.g. detector description) • Re-use of ‘Algorithms’ (e.g. reconstruction -> HLT)
Software Configurations • An LCG Configuration (e.g. LCG 60) bundles: • LCG/AA projects: CORAL, COOL, POOL, RELAX, ROOT • LCG/AA external software: ~70 packages (Xerces, Python, Boost, GSL, Qt, valgrind, Grid middleware, Java, …) • Platforms: Linux (slc4, slc5), Mac OSX (10.5, 10.6), Windows (XP), in 32 and 64 bit • Compilers: gcc 3.4 / 4.0 / 4.3, icc 11, llvm 2.4, vc 7.1 / vc 9 • = ~20 different platform combinations • The LHC experiment software (AliRoot, Gaudi, Athena, CMSSW) is built on top of these configurations
Continuous Integration & Testing • All LCG/AA projects are built and tested every day, on different platforms and in different configurations, and a test history is kept
Benefits • Self-consistent sets of basic software packages • Use of recent packages / tools • HEP specific patches when needed • Tested in complete configurations • Several deployment methods possible • Virtual Machine, LCG/AA binaries or recompilation • Multi platform / architecture / compiler • Continuous performance / unit / integration testing • Adding to overall software stability
Tools and Services http://sftweb.cern.ch/devtools
Coverity • Coverity is a professional, high-quality tool that finds problems in C++ code by simply looking at that code (static code analysis) • It is used by several projects within or connected to PH-SFT, as well as by most of the LHC experiments, to track down bugs in code before anybody ever runs it.
Summary • The group is developing a number of software components, mainly in the areas of Simulation and Data Analysis • Report problems, feature requests and special needs using the information channels in place (Savannah, meetings, AF) • Good standardization in the use of tools • With the LHC experiments we have managed to keep diversity rather low while remaining open to evolution and new suggestions • Using common tools and libraries reduces the effort that the experiments have to invest in the long term • Contact us if you need any advice on packages or tools
Features of an ideal Framework • Predefined component ‘vocabulary’ • E.g. ‘Algorithm’, ‘Tool’, ‘Service’, ‘Auditor’, ‘DataObject’, ‘Property’, ‘DetectorCondition’, etc. • Separation of interfaces & implementations • Allowing for evolution of implementations • Plug-in based (dynamic loading) • Homogeneous configuration, logging and error reporting • Built-in profiling, monitoring, utilities, etc. • Interoperable with other languages (e.g. Java, Python, etc.)
Gaudi: Principal Design Choices • Separation between “data” and “algorithms” • Three basic categories of “data” • event data, detector data, statistical data • Separation between “transient” and “persistent” representations of data • Data store-centered (“black-board”) architectural style • “User code” encapsulated in few specific places • Well defined component “interfaces” with plug-in capabilities
Gaudi: Algorithms & Transient Store [Diagram: Algorithms communicate only through the Transient Event Data Store. Algorithm A reads Data T1 and produces Data T2, T3; Algorithm B reads Data T2 and produces Data T4; Algorithm C reads Data T3, T4 and produces Data T5. The apparent dataflow is A → B → C; the real dataflow goes through the store.]
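A minimal pure-Python sketch of this blackboard style, using the data names T1-T5 from the diagram; the classes and the arithmetic are illustrative, not the Gaudi API.

```python
# Blackboard sketch: algorithms declare inputs/outputs and communicate
# only through a transient event store, never by calling each other.

class TransientStore(dict):
    """Stand-in for the transient event data store."""

class Algorithm:
    inputs, outputs = (), ()
    def execute(self, store):
        raise NotImplementedError

class AlgA(Algorithm):
    inputs, outputs = ("T1",), ("T2", "T3")
    def execute(self, store):
        store["T2"] = store["T1"] * 2
        store["T3"] = store["T1"] + 1

class AlgB(Algorithm):
    inputs, outputs = ("T2",), ("T4",)
    def execute(self, store):
        store["T4"] = store["T2"] + 10

class AlgC(Algorithm):
    inputs, outputs = ("T3", "T4"), ("T5",)
    def execute(self, store):
        store["T5"] = store["T3"] + store["T4"]

store = TransientStore(T1=1)
for alg in (AlgA(), AlgB(), AlgC()):   # apparent dataflow: A -> B -> C
    alg.execute(store)                 # real dataflow: via the store
```

Because algorithms only touch the store, they can be reordered, replaced, or rescheduled without changing each other's code, which is the point of the design.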
Gaudi: Control Sequences • Concept of sequences of Algorithms to allow processing based on physics signature • Avoid re-calling the same algorithm on the same event • Different instances of the same algorithm possible • Event filtering • Avoid passing all the events through the whole processing chain [Diagram: event input/output, filter decisions, single algorithm instances]
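The sequencing and filtering ideas above can be sketched in a few lines of illustrative Python (not Gaudi API): a sequence stops processing an event as soon as a filter algorithm rejects it, and an algorithm instance is not re-run on the same event.

```python
# Control-sequence sketch: run algorithms in order, honour filter
# decisions, and never call the same instance twice per event.

class Sequence:
    def __init__(self, algorithms):
        self.algorithms = algorithms
    def process(self, event):
        executed = set()
        for alg in self.algorithms:
            if id(alg) in executed:       # avoid re-calling on same event
                continue
            executed.add(id(alg))
            if alg(event) is False:       # filter decision: reject event
                return False
        return True

def require_energy(ev):                   # a filter algorithm
    return ev["energy"] > 10

def tag(ev):                              # a normal algorithm
    ev.setdefault("tags", []).append("selected")
    return True

seq = Sequence([require_energy, tag])
events = [{"energy": 5}, {"energy": 20}]
accepted = [ev for ev in events if seq.process(ev)]
```

The soft event never reaches `tag`, which is exactly the "avoid passing all the events through the whole chain" behaviour.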
Gaudi: Data On Demand • Typically the execution of Algorithms is explicitly specified by the initial sequence and sub-sequences • Avoids too-late loading of components (HLT) • Easier to debug • For some use-cases it is necessary to trigger the execution of a given Algorithm by accessing an Object in the Transient Store • The DataOnDemand Service can be configured to provide this functionality
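The data-on-demand behaviour can be sketched with a dict subclass whose missing-key hook runs a registered producer; this is illustrative and not the Gaudi implementation.

```python
# Data-on-demand sketch: accessing a missing path in the transient
# store triggers the algorithm registered to produce it.

class OnDemandStore(dict):
    def __init__(self, producers):
        super().__init__()
        self._producers = producers      # store path -> producer algorithm
    def __missing__(self, key):
        self._producers[key](self)       # run producer; it fills the store
        return self[key]

def make_clusters(store):
    """Hypothetical clustering algorithm filling the store."""
    store["Rec/Clusters"] = ["c1", "c2"]

store = OnDemandStore({"Rec/Clusters": make_clusters})
clusters = store["Rec/Clusters"]         # lazily triggers make_clusters
```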
Other Gaudi Services • JobOptions Service • Message Service • Particle Properties Service • Event Data Service • Histogram Service • N-tuple Service • Detector Data Service • Magnetic Field Service • Tracking Material Service • Random Number Generator • Chrono Service • (Persistency Services) • (User Interface & Visualization Services) • (Geant4 Services)
Gaudi: Configuring the Application • Each Framework component can be configured by a set of ‘properties’ (name/value pairs) • In total, thousands of parameters need to be specified to fully configure a complex HEP application • Python is used to facilitate the task • Python “configurables” generated from the C++ components • Built-in type checking
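The property mechanism can be sketched as follows; the `EventCounter` component and its properties are hypothetical, and real Gaudi configurables are generated automatically from the C++ components rather than written by hand.

```python
# Configurable sketch: typed name/value properties checked at
# assignment time, with declared defaults. Illustrative, not Gaudi API.

class Configurable:
    __slots__ = ("_props",)
    properties = {}                          # name -> (type, default)
    def __init__(self, **kwargs):
        object.__setattr__(self, "_props",
                           {n: d for n, (t, d) in self.properties.items()})
        for name, value in kwargs.items():
            setattr(self, name, value)
    def __setattr__(self, name, value):
        expected, _ = self.properties[name]  # KeyError -> unknown property
        if not isinstance(value, expected):
            raise TypeError(f"{name} expects {expected.__name__}")
        self._props[name] = value
    def __getattr__(self, name):
        return self._props[name]

class EventCounter(Configurable):            # hypothetical component
    properties = {"OutputLevel": (int, 3), "Frequency": (int, 100)}

alg = EventCounter(Frequency=1000)           # override one default
```

Typos and type mismatches are caught when the option file is loaded, long before the C++ application runs, which matters when thousands of parameters are involved.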
Gaudi Parallel Configuration [Diagram: a Reader reads the input event data and fills an Event Input Queue; N Workers, each with its own Transient Event Store running the Algorithms, consume events and fill an Event Output Queue; a Writer, with its own Transient Event Store and OutputStream, produces the output.] gaudirun --parallel=N optionfile.py
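A thread-based sketch of the reader/worker/writer layout above; real Gaudi Parallel uses separate processes, each with its own transient event store, so the threads and in-memory queues here only keep the illustration short.

```python
# Reader/worker/writer sketch with queues. Sentinels (None) tell each
# worker, and then the writer, when the stream is exhausted.
import queue
import threading

def reader(events, in_q, n_workers):
    for ev in events:
        in_q.put(ev)
    for _ in range(n_workers):      # one end-of-input sentinel per worker
        in_q.put(None)

def worker(in_q, out_q):
    while (ev := in_q.get()) is not None:
        out_q.put(ev * 2)           # stand-in for running the algorithms
    out_q.put(None)                 # tell the writer this worker is done

def run(events, n_workers=2):
    in_q, out_q = queue.Queue(), queue.Queue()
    threads = [threading.Thread(target=reader, args=(events, in_q, n_workers))]
    threads += [threading.Thread(target=worker, args=(in_q, out_q))
                for _ in range(n_workers)]
    for t in threads:
        t.start()
    results, done = [], 0
    while done < n_workers:         # writer: drain until all workers finish
        item = out_q.get()
        if item is None:
            done += 1
        else:
            results.append(item)
    for t in threads:
        t.join()
    return results

results = run([1, 2, 3, 4])
```

Because workers are independent, results may arrive out of order; the real Writer has to restore event order before writing the output stream.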