
Simulation in a Distributed Computing Environment

Simulation in a Distributed Computing Environment. S. Guatelli (1), A. Mantero (1), P. Mendez Lorenzo (2), J. Moscicki (2), M.G. Pia (1). (1) INFN Genova, Italy; (2) CERN, Geneva, Switzerland. CHEP 2006, Mumbai, 13-17 February 2006.


Presentation Transcript


  1. Simulation in a Distributed Computing Environment. S. Guatelli (1), A. Mantero (1), P. Mendez Lorenzo (2), J. Moscicki (2), M.G. Pia (1). (1) INFN Genova, Italy; (2) CERN, Geneva, Switzerland. CHEP 2006, Mumbai, 13-17 February 2006

  2. Speed of Monte Carlo simulation. Speed of execution is often a concern in Monte Carlo simulation, and there is often a trade-off between the precision of the simulation and the speed of execution. Typical use cases: • Semi-interactive response: detector design, optimisation, oncological radiotherapy • Very long execution time: high statistics simulation, high precision simulation. Methods for faster simulation response: • Fast simulation • Variance reduction techniques (event biasing) • Inverse Monte Carlo methods • Parallelisation

  3. Features of this study • Geant4 application in a distributed computing environment • Architecture • Implications on simulation applications • Environments • PC farm • GRID • Two use cases: Geant4 Advanced Examples • semi-interactive response (brachytherapy) • high statistics (medical_linac) • By-product: results for Geant4 medical application (technology transfer) • Quantitative study • results to be submitted for publication

  4. Requirements. Architectural requirements: • Transparent execution in sequential/parallel mode • Transparent execution on a PC farm and on the Grid. High statistics simulation: • Geant4 medical_linac • Execution time for 10^9 events: ~10 days • Goal: execution time of a few hours. Semi-interactive simulation: • Geant4 brachytherapy • Execution time for 20 M events: 5 hours • Goal: execution time of a few minutes. Reference: sequential mode on a Pentium IV, 3 GHz

  5. Parallel mode: local cluster / GRID • Both applications have the same computing model • a job consists of a number of independent tasks which may be executed in parallel • result of each task is a small data packet (few kb), which is merged as the job runs • In a cluster • computing resources are used for parallel execution • user connects to a possibly remote cluster • input data for the job must be available on the site • typically there is a shared file system and a queuing system • network is fast • GRID computing uses resources from multiple computing centres • typically there is no shared file system • (parts of) input data must be replicated in remote sites • network connection is slower than within a cluster

  6. Overview • Architectural issues • DIANE • How to dianize a Geant4 application • Performance tests • On a single CPU • On clusters • On the GRID • Conclusions • Lessons learned • Outlook Quantitative, documented results Publicly distributed: DIANE Geant4 application code

  7. DIANE (http://cern.ch/DIANE): prototype for an intermediate layer between applications and the GRID. Developed by J. Moscicki, CERN/IT. Hide complex details of underlying technology • R&D project • started in 2001 in CERN/IT with very limited resources • collaboration with Geant4 groups at CERN, INFN, ESA • successful prototypes running on LSF and EDG. Master-Worker architectural pattern. Parallel cluster processing • make fine tuning and customisation easy • transparently using GRID technology • application independent

  8. Practical example: Geant4 simulation with analysis • Each task produces a file with histograms • The job result is the sum of histograms produced by tasks • Master-worker model • client starts a job • workers perform tasks and produce histograms • master integrates the results • Distributed Processing for Geant4 Applications • task = N events • job = M tasks • tasks may be executed in parallel • tasks produce histograms/ntuples • task output is automatically combined (add histograms, append ntuples) • Master-Worker Model • Master steers the execution of job, automatically splits the job and merges the results • Worker initializes the Geant4 application and executes macros • Client gets the results
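The job/task model on this slide can be sketched in a few lines of Python. This is only an illustration of the master-worker pattern, not DIANE code: the names `run_task`, `merge` and `run_job` are hypothetical, a thread pool stands in for the remote workers, and a toy random histogram stands in for the Geant4 output.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
import random

N_BINS = 10  # toy histogram size (assumption for illustration)

def run_task(task_id, n_events, base_seed=12345):
    """Worker side: process n_events and return a small histogram."""
    rng = random.Random(base_seed + task_id)  # distinct seed per task
    hist = [0] * N_BINS
    for _ in range(n_events):
        hist[int(rng.random() * N_BINS)] += 1
    return hist

def merge(total, partial):
    """Master side: add a task histogram to the running job result."""
    return [a + b for a, b in zip(total, partial)]

def run_job(n_tasks, events_per_task, n_workers=4):
    """Split the job into independent tasks and merge results as they arrive."""
    total = [0] * N_BINS
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        futures = [pool.submit(run_task, t, events_per_task)
                   for t in range(n_tasks)]
        for future in as_completed(futures):
            total = merge(total, future.result())
    return total

if __name__ == "__main__":
    result = run_job(n_tasks=8, events_per_task=1000)
    print(sum(result))  # total number of events in the merged histogram
```

Because histogram addition is commutative, the merged result does not depend on the order in which the workers finish.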

  9. UML Deployment Diagram for Geant4 applications simulation with DIANE • Completely transparent to the user: same Geant4 application code • G4Simulation class is responsible for managing the simulation • manage random number seeds • Geant4 initialisation • macros to be executed in batch mode • termination
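The four responsibilities listed for the G4Simulation class can be pictured with a small stand-in. The class and method names below are hypothetical and do not reproduce the actual interface; the sketch only records what a worker would ask the simulation to do, and the only Geant4-specific element is the standard `/run/beamOn` macro command.

```python
from dataclasses import dataclass, field

@dataclass
class SimulationSketch:
    """Hypothetical stand-in for an interface class wrapping a simulation."""
    seed: int = 0
    initialized: bool = False
    executed: list = field(default_factory=list)

    def set_seed(self, seed):
        # Each task needs its own seed so that tasks are statistically independent.
        self.seed = seed

    def initialize(self):
        # In a real application this would build geometry, physics, etc.
        self.initialized = True

    def execute_macro(self, commands):
        # Batch-mode macro commands are executed only after initialisation.
        if not self.initialized:
            raise RuntimeError("simulation not initialised")
        self.executed.extend(commands)

    def finish(self):
        self.initialized = False

def run_task(sim, task_id, n_events, base_seed=1000):
    """Worker-side sequence: seed, initialise, run macro, terminate."""
    sim.set_seed(base_seed + task_id)
    sim.initialize()
    sim.execute_macro([f"/run/beamOn {n_events}"])
    sim.finish()
    return sim

if __name__ == "__main__":
    print(run_task(SimulationSketch(), task_id=3, n_events=500))
```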

  10. Development costs • Strategy to minimise the cost of migrating a Geant4 simulation to a distributed environment • DIANE Active Workflow framework • provides automatic communication/synchronization mechanisms • application is “glued” to the framework using a small Python module • in most cases no code changes to the original application are required • load balancing and error recovery policies may be plugged in in the form of simple Python functions • Transparent adaptation for Clusters/GRIDs, shared/local file systems, shared/private queues • Development/modification of application code • original source code unmodified • addition of an interface class which binds together application and M-W framework. The application developer is shielded from the complexity of underlying technology via DIANE
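To give a feel for what "policies plugged in as simple Python functions" might look like, here is a generic sketch of an error recovery policy passed to a task loop. It does not use or reproduce the DIANE API: `retry_policy`, `run_with_policy` and their signatures are assumptions made purely for illustration.

```python
def retry_policy(task_id, attempts, max_attempts=3):
    """Hypothetical policy: retry a failed task up to max_attempts times."""
    return attempts < max_attempts

def run_with_policy(tasks, execute, should_retry):
    """Run each task, consulting the plugged-in policy after every failure."""
    results, failed = {}, []
    for task_id in tasks:
        attempts = 0
        while True:
            attempts += 1
            try:
                results[task_id] = execute(task_id)
                break
            except RuntimeError:
                if not should_retry(task_id, attempts):
                    failed.append(task_id)  # give up on this task
                    break
    return results, failed

if __name__ == "__main__":
    seen = {}

    def flaky(task_id):
        # Toy task: task 1 fails on its first attempt only.
        seen[task_id] = seen.get(task_id, 0) + 1
        if task_id == 1 and seen[task_id] < 2:
            raise RuntimeError("worker set-up problem")
        return task_id * 10

    print(run_with_policy([0, 1, 2], flaky, retry_policy))
```

The point of the design is that the task loop stays unchanged while the policy function is swapped, which is the kind of separation the slide describes.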

  11. Test results. Performance of the execution of the dianized Brachytherapy example • Test on a single CPU • Test on a dedicated farm (60 CPUs) • Test on a farm shared with other users (LSF, CERN) • Test on the GRID (LCG). Tools and libraries: • Simulation toolkit: Geant4 7.0.p01 • Analysis tools: AIDA 3.2.1 and PI 1.3.3 • DIANE: DIANE 1.4.2 • CLHEP: 1.9.1.2 • G4EMLOW 2.3

  12. Overhead at initialisation/termination • Test on a single dedicated CPU (Intel® Pentium IV, 3.00 GHz) • Study execution via DIANE w.r.t. sequential execution • run 1 event. Overhead: ~5 s, negligible in a high statistics job
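A short calculation shows why a ~5 s overhead matters for a single event but not for a production run. It combines the overhead quoted here with the sequential time for 20 M events quoted later in the transcript (16646 s), and it assumes the overhead is a fixed cost independent of the number of events, which is a simplification.

```python
def overhead_fraction(overhead_s, compute_s):
    """Fraction of total wall-clock time spent in fixed overhead."""
    return overhead_s / (overhead_s + compute_s)

OVERHEAD_S = 5.0          # from the slide: about 5 s
SEQUENTIAL_S = 16646.0    # from the transcript: 20 M events, sequential mode
N_EVENTS = 20_000_000

time_per_event = SEQUENTIAL_S / N_EVENTS  # average time per event, in seconds

single_event = overhead_fraction(OVERHEAD_S, time_per_event)
full_job = overhead_fraction(OVERHEAD_S, SEQUENTIAL_S)

if __name__ == "__main__":
    print(f"1 event:     {single_event:.2%} of the time is overhead")
    print(f"20 M events: {full_job:.4%} of the time is overhead")
```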

  13. Overhead due to DIANE • Test on a single dedicated CPU (Intel® Pentium IV, 3.00 GHz) • Study execution via DIANE w.r.t. sequential execution. The overhead of DIANE is negligible in high statistics jobs. [Figures: execution time vs. number of events in the job; ratio vs. number of events]

  14. Farm: execution time and efficiency • Dedicated farm : 30 identical bi-processors (Pentium IV, 3 GHz) • Thanks to Regional Operation Centre (ROC) Team, Taiwan • Thanks to Hurng-Chun Lee (Academia Sinica Grid Computing Center, Taiwan) • Load balancing: optimisation of the number of tasks and workers

  15. Optimizing the number of tasks • The job ends when all the tasks have been executed by the workers • If the job is split into a higher number of tasks, the chance that the workers finish their tasks at the same time is higher • Note: the overall time of the job is determined by the last worker to finish the last task. [Figures: time (seconds) vs. worker number, for an example of a job that can be improved from a performance point of view and for an example of good job balancing]
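The effect of task granularity can be reproduced with a toy scheduler in which each free worker pulls the next task from a common queue. The worker speeds, the event count and the function name `makespan` are invented for illustration and are not measurements from this study.

```python
import heapq

def makespan(total_events, n_tasks, speeds, per_task_overhead=0.0):
    """Time at which the last worker finishes, with pull-based scheduling.

    speeds: events per second for each worker (toy values).
    per_task_overhead: fixed cost in seconds added to every task.
    """
    events_per_task = total_events / n_tasks
    free_at = [(0.0, w) for w in range(len(speeds))]  # (time free, worker id)
    heapq.heapify(free_at)
    finish = 0.0
    for _ in range(n_tasks):
        t, w = heapq.heappop(free_at)  # earliest available worker takes the task
        done = t + per_task_overhead + events_per_task / speeds[w]
        finish = max(finish, done)
        heapq.heappush(free_at, (done, w))
    return finish

if __name__ == "__main__":
    speeds = [1.0, 4.0]  # one slow and one fast worker
    print(makespan(1200, 2, speeds))   # 600.0: the slow worker holds up the job
    print(makespan(1200, 12, speeds))  # 300.0: the fast worker absorbs more tasks
```

With only two tasks the slow worker determines the job time; with twelve tasks the fast worker takes most of the load. Setting `per_task_overhead` above zero shows the opposite pressure: every additional task then carries a fixed cost, so splitting cannot be made arbitrarily fine.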

  16. Farm shared with other users. Real-life case: execution in parallel mode on 5 workers of CERN LSF, with DIANE used as intermediate layer. Preliminary! • The load of the cluster changes quickly in time • The conditions of the test are not reproducible • Highly variable performance

  17. Parallel execution in a PC farm • Required production of Brachytherapy: 20 M events • 20 M events in sequential mode: 16646 s (~ 4h and 38 minutes) on an Intel® Pentium IV, 3.00 GHz • The same simulation runs in 5 minutes in parallel on 56 CPUs • appropriate for clinical usage • Similar results for Geant4 medical_linac Advanced Example • production can become compatible with usage for the verification of IMRT treatment planning • sequential execution requires ~ 10 days to obtain significant results
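The numbers on this slide translate into a speed-up and a parallel efficiency as follows. The parallel time is taken as 300 s because the slide only quotes "5 minutes", so the resulting efficiency is indicative rather than a measured value.

```python
def speedup(t_sequential, t_parallel):
    """Ratio of sequential to parallel execution time."""
    return t_sequential / t_parallel

def efficiency(t_sequential, t_parallel, n_cpus):
    """Speed-up divided by the number of CPUs (1.0 would be ideal scaling)."""
    return speedup(t_sequential, t_parallel) / n_cpus

T_SEQ = 16646.0   # s, sequential mode (from the slide)
T_PAR = 300.0     # s, "5 minutes" on the slide, rounded (assumption)
N_CPUS = 56

if __name__ == "__main__":
    print(f"speed-up   ~ {speedup(T_SEQ, T_PAR):.1f}")
    print(f"efficiency ~ {efficiency(T_SEQ, T_PAR, N_CPUS):.2f}")
```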

  18. Running on the Grid (LCG) • G4Brachy executed on the GRID (LCG) • nodes located in Spain, Russia, Italy, Germany, Switzerland Conditions of the test • The load of the GRID changes quickly in time • The conditions of the test are not reproducible Efficiency • The evaluation of the efficiency with the same criterion as in a dedicated farm does not make much sense in this context • Study the “efficiency” of DIANE as automated job management w.r.t. manual submission through simple scripts

  19. Test results. [Figures: time (seconds) vs. worker number, for execution on the GRID through DIANE (20 M events, 180 tasks, 30 workers) and for execution on the GRID without DIANE] Through DIANE: • All the tasks are executed successfully on 22 workers • Not all the workers are initialized and used: on-going investigation. Without DIANE: • 2 jobs not successfully executed due to set-up problems of the workers

  20. Execution time of Brachytherapy in two different conditions of the GRID, with DIANE used as intermediate layer: how the GRID load changes. 20 M events, 60 workers initialized, 360 tasks. [Figures: time (seconds) vs. worker number in the two conditions] Very different result!

  21. Farm/GRID execution Brachy, 20 M events, 180 tasks • Taipei cluster: • 29 machines, 734 s ~ 12 minutes • GRID: • 27 machines, 1517 s ~ 25 minutes Preliminary indication The conditions are not reproducible

  22. Lessons learned • DIANE as intermediate layer • Transparency • Good separation of the subsystems • Good management of CPU resources • Negligible overhead • Load balancing • A relatively large number of tasks increases the efficiency of parallel execution in a farm • Trade-off between optimisation of task splitting and overhead introduced • Controlled and real-life situations are quite different in a farm • need dedicated farm for critical usage (e.g. hospital) • Grid • highly variable environment • not mature yet for critical usage • automated management through a smart system is mandatory • work in progress, details still to be understood quantitatively

  23. Outlook • Work in progress • A quantitative analysis of all the performance results is still on-going • Generalize job splitting optimization • Better characterize the performance on the Grid quantitatively • Improve DIANE • To be submitted for publication in IEEE Trans. Nucl. Sci.

  24. Conclusions • General approach to the execution of Geant4 simulation in a distributed computing environment • transparent sequential/parallel application • transparent execution on a local farm or on the Grid • user code is the same • Quantitative, documented results • reference for users and for further improvement • on-going work to understand details • Acknowledgments to: • M. Lamanna (CERN), Hurng-Chun Lee (ASGC, Taiwan), L. Moneta (CERN), A. Pfeiffer (CERN) • the LCG teams at CERN and the Regional Operation Centre Team of Taiwan • no support from INFN GRID team

  25. IEEE Transactions on Nuclear Science http://ieeexplore.ieee.org/xpl/RecentIssue.jsp?puNumber=23 • Prime journal on technology in particle/nuclear physics • Review process reorganized about one year ago • Associate Editor dedicated to computing papers • Various papers associated to CHEP 2004 published on IEEE TNS. Papers associated to CHEP 2006 are welcome. Manuscript submission: http://tns-ieee.manuscriptcentral.com/ Papers submitted for publication will be subject to the regular review process. Publications in refereed journals are beneficial not only to authors, but to the whole community of computing-oriented physicists. Our “hardware colleagues” have better established publication habits… Further info: Maria.Grazia.Pia@cern.ch
