
How to port an application on GRID: available tools and tricks of the trade

Patricia Méndez Lorenzo, CERN (IT-PSS/ED). Trieste, 10th February 2006. ICTP/INFM-Democritos Workshop on Porting Scientific Applications on Computational GRIDs.


Presentation Transcript


  1. How to port an application on GRID: available tools and tricks of the trade Patricia Méndez Lorenzo CERN (IT-PSS/ED) Trieste, 10th February 2006 ICTP/INFM-Democritos Workshop on Porting Scientific Applications on Computational GRIDs

  2. Outlook ◘ We will now see two examples of new gridifications inside LCG/EGEE ➸ Geant4 ➸ UNOSAT ◘ I would like to emphasize that both are CERN partner communities ➸ How they became known to LCG is therefore obvious ➸ But the gridification procedure is the same for any other community

  3. The Geant4 Toolkit Generic toolkit for Monte Carlo simulation of particle interactions with matter (i.e. detectors) ◘ Application domains: ➙ High-Energy Physics: ATLAS, CMS and LHCb (LHC), BaBar (SLAC), etc. ➙ Space Radiation: ESA ➙ Medical Physics: proton therapy, brachytherapy, etc. ◘ Object-oriented (C++) project, modular and extensible. Significantly improved with respect to its predecessor, Geant3, not only in software structure but mainly in physics coverage ◘ The electromagnetic physics of Geant4, and even more so the hadronic physics, are complex fields. It is fundamental to test their models over the widest possible range of particles, materials and energies. This is where the Grid contributes

  4. Geant4 Toolkit and the GRID Environment ◘ Electromagnetic and hadronic physics are fundamental features to be properly simulated in High-Energy Physics and medical applications. However, they are extremely CPU demanding ▪ The CPU time depends on the number of events and on the energy: 1 event at 1 GeV ~ 0.03 s (2.4 GHz CPU); 1 event at 300 GeV ~ 9-10 s. Geant4 wants to use the LCG environment to validate the software it provides to its users twice per year ● Two large productions per year ◘ Goal of the software validation: compare some shower observables between two different Geant4 versions and check for statistically significant changes ● Small productions (a few thousand jobs) throughout the year

  5. Geant4 Toolkit and the Grid Environment ◘ Geant4 validates its software over a wide range of parameters: ➸ 7 simplified detectors ► FeSci, CuSci, PbSci, CuLAr, PbLAr, WLAr, PbWO4 ➸ 8 different particles ► e-, pi+, pi-, k+, k-, k0L, p, n ➸ 23 different beam energies (GeV) ► 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 20, 30, 40, 50, 60, 80, 100, 120, 150, 180, 200, 250, 300, 1000 ➸ 5 physics lists ► LHEP, QGSP, QGSC, QGSP_BIC, QGSP_BERT. The combinations of all these parameters define the possible scenarios for checking the software
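
As a rough illustration of how the parameter combinations above turn into a job list, the following sketch enumerates one scenario per (physics list, detector, particle, energy) tuple. It assumes the count of 23 beam energies stated on the slide and uses plain indices in place of the energy values; the variable names are illustrative and not part of the Geant4 production tools.

```python
from itertools import product

# Parameter values as listed on the slide.
detectors = ["FeSci", "CuSci", "PbSci", "CuLAr", "PbLAr", "WLAr", "PbWO4"]
particles = ["e-", "pi+", "pi-", "k+", "k-", "k0L", "p", "n"]
physics_lists = ["LHEP", "QGSP", "QGSC", "QGSP_BIC", "QGSP_BERT"]

# Assumption: the stated count of 23 energies; indices stand in for the values.
N_ENERGIES = 23

# One scenario (and hence one job) per combination of the four parameters.
scenarios = list(product(physics_lists, detectors, particles, range(N_ENERGIES)))
per_physics_list = len(scenarios) // len(physics_lists)

print("total scenarios:", len(scenarios))
print("scenarios per physics list:", per_physics_list)
```

Under these assumptions the totals agree with the figures quoted elsewhere in the talk: 6440 jobs for a production and bunches of 1288 jobs per physics list.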

  6. Geant4 Toolkit and the GRID Environment ◘ Geant4 is an international project, but its management is based at CERN ➸ It is very well known by LCG ➸ It is a “lower-level tool” for many HEP experiments ➸ A well-validated Geant4 release will help during the experiment productions ◘ Its software is stable, feasible and very well tested on many platforms and systems ➸ This is attractive for LCG ➸ We would like to include it in our own software tests ◘ They always arrive short of time ➸ They produce the complete tar file 2 weeks before the release, so we have 2 weeks to perform the whole production ◘ We have to provide them with tools ➸ To perform fast and reliable productions ➸ Automatic job submission ➸ Easy monitoring tools

  7. Geant4 History ◘ Geant4 contacted LCG in December 2004 ◘ At that moment, a support person was assigned to the group ➸ To teach them the project ➸ To run the productions with them ➸ To be the contact person with the deployment team and with the sites ◘ Between the 2 yearly productions, we have to keep working with them ➸ To improve the tools and the software provided to them ► The tools for the LCG implementation have been developed and provided by us ➸ To fully involve them in the LCG infrastructure ► Make them become a new VO

  8. Geant4 History Geant4 has already run 3 productions with us ◘ First Production ➸ During the event production phase, 5635 jobs had to be run for each Geant4 version: 11270 jobs in total ➸ Finally, the statistical test suite was used to compare the parallel Geant4 outputs from each version (this part ran outside the LCG resources) ◘ Second Production ➸ During this phase 6440 jobs had to be run ➸ This time each job contained the event production for each Geant4 version and the statistical test suite ➸ The whole production and analysis was done within a single job ◘ Third Production ➸ Same strategy as in the 2nd production

  9. Stages of the Geant4 Production 1. Software installation: installation of the Geant4 packages (with all the required external packages) ➸ Software provided via a tar file (copied and registered in the GRID) ➸ Installation performed using GRID jobs ➸ The installation is validated through a small production ➸ After this, a tag is published in the Information System ➸ Fundamental request to the sites: a shared area between WNs and a precise definition of the software installation region (I come to this point immediately) 2. Event production: ➸ Jobs sent in bunches of 1288, one bunch per physics list ➸ 5000 events were produced per job 3. Analysis (performed outside the GRID): ➸ Statistical tests to compare the two Geant4 versions
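
To give a feeling for what 5000 events per job means in CPU time, the sketch below combines that figure with the per-event timings quoted earlier in the talk (about 0.03 s at 1 GeV and 9-10 s at 300 GeV on a 2.4 GHz CPU). It is a back-of-the-envelope estimate only; timings for other energies are not given in the talk and are not extrapolated here.

```python
EVENTS_PER_JOB = 5000  # from the event production stage above

# Per-event CPU time in seconds (low, high), as quoted for a 2.4 GHz CPU.
SECONDS_PER_EVENT = {
    "1 GeV": (0.03, 0.03),
    "300 GeV": (9.0, 10.0),
}


def job_hours(seconds_per_event, events=EVENTS_PER_JOB):
    """CPU hours needed by one job at the given per-event cost."""
    return seconds_per_event * events / 3600.0


for label, (low, high) in SECONDS_PER_EVENT.items():
    print(f"{label}: {job_hours(low):.2f} to {job_hours(high):.2f} CPU hours per job")
```

At 300 GeV a single job therefore needs roughly half a day of CPU time, while a 1 GeV job finishes in a few minutes.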

  10. Why a shared area? ◘ The experiment-specific software needed consists of huge packages ➸ Executables, input data, compilers, external libraries, etc. ◘ It is not efficient to ship the whole experiment software with each job ➸ And this for each user of the VO ➸ Solution: pre-installation of the experiment-specific software before each production ◘ So what the VO production managers do: ➸ Pre-install the software at the site, on just one WN, through Grid jobs ➸ In a certain region visible from the WNs (no matter where) ➸ Region normally mounted via NFS (shared among all WNs) ➸ Accessed from the WN through an environment variable ➸ Only sgm users are allowed to do it
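
A job running on a WN could locate the pre-installed software along the following lines. This is a minimal sketch, not the code used in the productions: the variable name pattern `VO_<VO>_SW_DIR` is an assumption based on the usual LCG convention, and the function name and the package directory layout are hypothetical.

```python
import os
from pathlib import Path
from typing import Optional


def find_software_area(vo: str, package_dir: str) -> Optional[Path]:
    """Return the installed package directory in the shared area, or None.

    Assumption: the site exports the shared area as VO_<VO>_SW_DIR.
    """
    variable = f"VO_{vo.upper()}_SW_DIR"
    root = os.environ.get(variable)
    if not root:
        # The site does not publish a shared software area for this VO.
        return None
    candidate = Path(root) / package_dir
    # The area may exist without the package having been installed yet.
    return candidate if candidate.is_dir() else None


if __name__ == "__main__":
    area = find_software_area("geant4", "geant4")  # hypothetical directory name
    print("software area:", area if area else "not available on this node")
```

Checking for the directory before starting the application lets a job fail fast on sites where the shared area is missing or not mounted.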

  11. Strategy inside LCG ◘ VO Configuration ➸ 1st Production: dteam (6 certificates, one as dteamsgm) ► This is a problem for the sites because of the SFT tests ► These are tests performed by the LCG team to test the sites ► Normally sites dedicate 1 CPU to these tests. If we take it for other purposes, the sites will not get a good result ➸ 2nd Production: alice (2 certificates, one as alicesgm) ► We cannot count on it anymore: ALICE and the rest of the experiments are in full production ➸ 3rd Production: geant4 (2 certificates, one as geantsgm) ► First production with their own VO ◘ Resources ➸ 1st Production: own RB+BDII+UI at CERN ➸ 2nd and 3rd Productions: lxplus resources and 2 RBs ◘ Outputs ➸ 1st Production: about 30 GB stored at CERN (lxn1183) ➸ 2nd and 3rd Productions: comparable quantity stored at CERN

  12. Geant4 requirements ◘ Each production takes about 3-4 years of CPU time ◘ Very small output for the whole production: 15-20 GB in total (fully retrieved to CERN for analysis) ◘ As explained before, it is a CERN community ➸ CERN also supported them as a site ◘ We ask (as support), we provide (as CERN) ➸ Access to a UI (provided at CERN) ➸ VO = Geant4 (provided at CERN, but sites should recognize it) ➸ RB access (provided at CERN) ➸ CE (dedicated long queues at each site) ➸ SE (provided at CERN) ➸ Software area (2 GB at each site) ➸ Access to the LFC catalog (centralized at CERN)

  13. Tools developed for new Gridifications A general framework consisting of 2 major tools: 1. Tool to perform the automatic job submission 2. Tool to retrieve and handle the corresponding output 1. Automatic job submission ◘ Given a user's jdl, this tool performs the following actions: ➸ It lists all sites able to run the jdl provided by the user ➸ It automatically creates a jdl file based on the one provided by the user ➸ It submits the newly created jdl containing the user application(s) ➸ Moreover, it creates a subdirectory (defined by the user) containing the list of sites where the jobs have been submitted, the corresponding jdls and the job IDs

  14. Tools developed for new Gridifications ◘ Additional Features: ➸ The user can define the queues where the jobs are submitted. These queues are checked to see whether they fit the job requirements ➸ Requested LFN files can be included. The corresponding TURLs are looked up and written to a file passed to the WN in the InputSandbox ◘ Applications ➸ This tool has been used for the 1st and 2nd phases of the production: software installation and event production ◘ Usage: ./submitter_general -vo geant4 -jdl jdlexample -jobfile G4_PROD -data /grid/geant4/production_software ➸ -vo: mandatory ➸ -jdl: mandatory. Give a jdl example ➸ -jobfile: mandatory. It stores the created jdl, the job IDs and the list of used CEs ➸ -data: not mandatory. Only in case this LFN is required
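
The bookkeeping part of the submission tool described above (one jdl per site, all stored in a user-defined directory) might be sketched as follows. This is not the source of `submitter_general`: the function names, the file names and the example CE identifiers are hypothetical, the list of matching CEs is taken as an input rather than queried from the Grid, and the actual submission step is left out. The sketch also assumes a flat user jdl that does not already define `Requirements`.

```python
from pathlib import Path
from typing import List


def make_site_jdl(user_jdl: str, ce_id: str) -> str:
    """Return the user's jdl with a Requirements line pinning it to one CE."""
    requirement = f'Requirements = other.GlueCEUniqueID == "{ce_id}";'
    return user_jdl.rstrip() + "\n" + requirement + "\n"


def prepare_submission(user_jdl: str, ce_ids: List[str], jobdir: Path) -> List[Path]:
    """Write one jdl per CE plus the list of sites into the job directory."""
    jobdir.mkdir(parents=True, exist_ok=True)
    written = []
    for index, ce_id in enumerate(ce_ids):
        path = jobdir / f"job_{index:04d}.jdl"
        path.write_text(make_site_jdl(user_jdl, ce_id))
        written.append(path)
    # Keep the list of targeted sites next to the generated jdl files.
    (jobdir / "sites.txt").write_text("\n".join(ce_ids) + "\n")
    return written


if __name__ == "__main__":
    example_jdl = 'Executable = "run_geant4.sh";\nStdOutput = "std.out";'
    example_ces = ["ce.example.org:2119/jobmanager-torque-geant4"]  # hypothetical CE
    for jdl_path in prepare_submission(example_jdl, example_ces, Path("G4_PROD")):
        print("prepared", jdl_path)
```

Keeping the generated jdl files and the site list together in one directory is what allows a second tool to find and monitor the jobs later.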

  15. Tools developed for new Gridifications 2. Retrieval and handling of the outputs ➸ The 2nd tool checks the status of the jobs using the job IDs stored in the directory given by the user ◘ Usage: ./get_output -jobfile G4_PROD -dest G4_PROD/outputs ➸ -jobfile: mandatory. Directory holding the jobs to monitor ➸ -dest: mandatory. Where to put the output ◘ Output: The job run in ramses.dcic.ups.es:2119/jobmanager-torque-dteam is in status: Scheduled The job run in grid01.phy.ncu.edu.tw:2119/jobmanager-torque-dteam is in status: running The job run in scaic10.scai.frauhofer.de:2119/jobmanager-torque-dteam is in status: over

  16. Tools developed for new Gridifications ◘ Additional Features: ➸ The outputs can be viewed on the web ➸ An HTML report is provided showing the files selected by the user
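
The status lines printed by the output tool have a regular shape ("The job run in ... is in status: ..."), so a production manager could summarize them with a few lines of script. The following sketch is an illustration built only on the three sample lines shown in the talk; it is not part of `get_output`, and the function name is hypothetical.

```python
import re
from collections import Counter

# Pattern matching the status lines shown in the sample output.
STATUS_LINE = re.compile(r"^The job run in (?P<ce>\S+) is in status: (?P<status>\w+)$")


def tally_statuses(lines):
    """Count jobs per status, ignoring case and any non-matching lines."""
    counts = Counter()
    for line in lines:
        match = STATUS_LINE.match(line.strip())
        if match:
            counts[match.group("status").lower()] += 1
    return counts


sample = [
    "The job run in ramses.dcic.ups.es:2119/jobmanager-torque-dteam is in status: Scheduled",
    "The job run in grid01.phy.ncu.edu.tw:2119/jobmanager-torque-dteam is in status: running",
    "The job run in scaic10.scai.frauhofer.de:2119/jobmanager-torque-dteam is in status: over",
]

for status, count in sorted(tally_statuses(sample).items()):
    print(f"{status}: {count}")
```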

  17. Results and Discussion ◘ 1st Production (as dteam) ➸ We were learning how to bring Geant4 into LCG ➸ The software was successfully installed at 28 sites ➸ Efficiency around 70% ◘ 2nd Production (as alice) ➸ The software was successfully installed at 35 sites ➸ Efficiency around 70% ◘ 3rd Production (as geant4) ➸ The software was successfully installed at 5 sites ➸ Efficiency 99% ➸ At this moment we already have 11 sites, and OSG is getting involved

  18. Results and Discussion Strange results? Not really. Main problems: ◘ Sites without a shared area, or with the area not even mounted ➸ It is not required for dteam (this was the largest problem during the 1st production) ➸ We have forced the sites to include this region ◘ Unstable sites ➸ It is difficult to keep 28 or 35 sites under control ➸ During the 3rd production (5 sites), we helped the sites set up the Geant4 VO and the contact with them was great ◘ Lack of time ➸ We have a short period of time ➸ Resubmissions not possible ➸ Good follow-up of the sites not possible

  19. Next Production with DIANE ◘ Resource optimization layer that exploits a pull model via a direct communication channel between Master and Workers ◘ Implemented for the next Geant4 production ◘ The user runs the MASTER on his PC ◘ The MASTER submits slaves, NOT the jobs ➸ Slaves are normal GRID jobs ◘ Slaves begin to pull jobs from the MASTER [Diagram labels: MASTER, SLAVES, SITE, CE, WN]

  20. DIANE: EXAMPLE
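
The pull model can be illustrated with a small, self-contained sketch in which the master only holds a queue of tasks and each worker asks for the next task whenever it is free. This is not DIANE's API: local threads stand in for the slave jobs running on WNs, the function name is hypothetical, and squaring a number stands in for the real application.

```python
import queue
import threading


def run_master(tasks, n_workers):
    """Let n_workers pull tasks from a shared queue until it is empty."""
    pending = queue.Queue()
    for task in tasks:
        pending.put(task)

    results = []
    results_lock = threading.Lock()

    def worker(worker_id):
        while True:
            try:
                # The worker pulls its next task; nothing is pushed to it.
                task = pending.get_nowait()
            except queue.Empty:
                return  # no work left, the worker terminates
            outcome = (worker_id, task, task * task)  # placeholder computation
            with results_lock:
                results.append(outcome)

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(n_workers)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return results


if __name__ == "__main__":
    done = run_master(range(10), n_workers=3)
    print(f"{len(done)} tasks completed by {len({w for w, _, _ in done})} worker(s)")
```

Because free workers fetch work themselves, fast or early-starting workers naturally process more tasks, and a worker that never starts simply takes none.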

  21. UNOSAT: satellite-imagery-based web mapping service ◘ Objectives ➸ Easy access to quality geoinformation services ➸ Organize the demand for geoinformation ➸ Ensure cost-effective and timely products ◘ Core Services ➸ Humanitarian Mapping ➸ Image Processing [Image labels: VEGETATION, 1 km; IKONOS, 1 m]

  22. UNOSAT [Diagram labels: Ground station, Data suppliers, UNOSAT Central Unit, WWW, USER]

  23. Relief Projects of UNOSAT ◘ Case Study: Indian Ocean Tsunami Relief and Development ◘ 29th Dec 2004: first map distributed online to field users ➸ 14th Jan 2005: Imagery Bank online: ► 100 tsunami-related maps (pre and post) ► 670 raw satellite images ➸ January: 200,000 tsunami maps downloaded in total ◘ UNOSAT has a huge amount of data to store ◘ CERN has provided a good amount of space for this purpose ◘ The collaboration with the GRID began in summer 2005 ◘ Running and storing data in LCG/EGEE can certainly help UNOSAT in its work

  24. First step: UNOSAT and CERN ◘ UNOSAT has been a CERN partner since 2002 ◘ CERN supports them with network facilities, computing infrastructure and human (support) resources ◘ Asian Tsunami example: ➸ Central Web Services at CERN under considerable strain ➸ Solution quickly found by CERN’s Internet Services Group ➸ Result: UNOSAT data remained continuously available ◘ UNOSAT provides its users with a web interface for finding image files by clicking on images of the Earth ➸ Attractive, easy... ◘ We want to do something similar with the GRID ➸ It means dealing with certificates, but it is possible

  25. One step further: GRID ◘ Potential bottlenecks: ➸ Limited capacity and processing power ➸ Multiple satellites being launched ➸ Can the Grid help? ◘ In summer 2005 we provided a whole structure at CERN for UNOSAT ➸ UNOSAT Virtual Organization (VO) ➸ 3.5 TB in CASTOR ➸ Computing Elements, Resource Brokers ➸ Collaboration with the ARDA group ➸ AFS area of 5 GB ◘ We have run some UNOSAT tests (image compression) inside the GRID environment (quite successfully) ◘ The framework developed for Geant4 has been adapted to UNOSAT needs ◘ We have provided the whole GRID infrastructure at CERN

  26. A GRID Metadata Catalogue ◘ LFC Catalogue ➸ Mapping of LFN to PFN ◘ UNOSAT requires ➸ The user gives certain coordinates as input ➸ As output, the user wants the PFN for downloading ◘ The ARDA group is helping us set up the AMGA tool for UNOSAT [Diagram labels: APP, Metadata (x,y,z), ARDA, Oracle DB, LFN, LFC, PFN, SRM, CASTOR]
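
The two-step lookup that UNOSAT needs (coordinates to LFN through the metadata catalogue, then LFN to PFN through the file catalogue) can be sketched with two in-memory tables. This is only an illustration of the data flow: it does not use the AMGA or LFC client interfaces, and every coordinate, file name and host name below is hypothetical.

```python
from typing import Dict, List, Tuple

Coordinate = Tuple[int, int, int]

# Stand-in for the metadata catalogue: coordinates (x, y, z) -> LFN.
METADATA: Dict[Coordinate, str] = {
    (10, 20, 1): "/grid/unosat/images/tile_10_20_1.img",  # hypothetical entry
}

# Stand-in for the file catalogue: LFN -> list of PFNs (replicas).
REPLICAS: Dict[str, List[str]] = {
    "/grid/unosat/images/tile_10_20_1.img": [
        "srm://se.example.org/castor/unosat/tile_10_20_1.img",  # hypothetical PFN
    ],
}


def pfns_for(coordinate: Coordinate) -> List[str]:
    """Resolve coordinates to the physical file names available for download."""
    lfn = METADATA.get(coordinate)
    if lfn is None:
        return []  # no image registered for these coordinates
    return REPLICAS.get(lfn, [])


if __name__ == "__main__":
    print(pfns_for((10, 20, 1)))
    print(pfns_for((0, 0, 0)))
```

Separating the two tables mirrors the split between metadata and file location: images can be replicated or moved without touching the coordinate index.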

  27. Future Plans with UNOSAT ◘ Collaboration between UNOSAT, ARDA and GD ➸ 1 (2) ARDA and 2 UNOSAT students ➸ Still many discussions needed ➸ Support from other sites foreseen? ◘ The user can get the information on his laptop too ◘ The AMGA application is fundamental [Diagram labels: (x,y,z), GRID WORLD]

  28. Summary ◘ The Support team is ready to help projects beyond the HEP communities get involved in the GRID ➸ We have different applications and frameworks ready to give such support ◘ Two different communities are already fully involved in the environment ➸ Geant4 and UNOSAT, using ARDA applications normally used by the large HEP experiments ◘ Together with these communities we are gaining confidence in the procedure ➸ Now we can say we have a structure ready to do it
