
Enabling Grid Computer for HEP

Enabling Grid Computer for HEP. Babar Team at University of Manchester. Resources: www.hep.man.ac.uk/u/jamwer. Human resource strategy. * Jobs with 5 events instead of millions. Resources Strategy. Grid Test Bed.



Presentation Transcript


  1. Enabling Grid Computer for HEP Babar Team at University of Manchester Resources: www.hep.man.ac.uk/u/jamwer jamwer@hep.man.ac.uk

  2. Human resource strategy * Jobs with 5 events instead of millions.

  3. Resources Strategy

  4. Grid Test Bed

  5. jamwer@hep.man.ac.uk

  6. Software: 850 packages. Tau datasets: each ranges between 60 files of 1 GB and 150 files of 1 GB. Total: 4,000 GB (~10,000 files).

  7. Analysis Submission to Grid (Prototype)
     • Single command: ./easygrid dataset_name
     • Performs handler management and submission.
     • Software based on a state machine:
       • Verify that skimData is available:
         • If not available, run BbkDatasetTCL to generate skimData. Each file will be a job.
       • Verify whether there are handlers pending:
         • If not, generate the script (gera.c) with edg-job-submit and ClassAds, then execute it. Nest for submission policy and optimisation.
         • If yes, verify job status. When all jobs have ended, recover the results into the user folder.
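The decision flow on this slide can be sketched as a small state machine. This is only an illustration of the described logic, not the prototype's code (which the slide says is generated via gera.c); the state names and the three boolean inputs are hypothetical stand-ins for the real checks.

```python
from enum import Enum, auto


class State(Enum):
    # Hypothetical state names; the slides do not name the states.
    CHECK_SKIMDATA = auto()
    GENERATE_SKIMDATA = auto()
    CHECK_HANDLERS = auto()
    SUBMIT = auto()
    CHECK_STATUS = auto()
    RECOVER = auto()
    WAIT = auto()


# States at which a single invocation of the command stops.
FINAL = (State.SUBMIT, State.RECOVER, State.WAIT)


def next_state(state, skimdata_ready, handlers_pending, all_done):
    """Return the state that follows `state` given the three checks."""
    if state is State.CHECK_SKIMDATA:
        return State.CHECK_HANDLERS if skimdata_ready else State.GENERATE_SKIMDATA
    if state is State.GENERATE_SKIMDATA:
        return State.CHECK_HANDLERS
    if state is State.CHECK_HANDLERS:
        return State.CHECK_STATUS if handlers_pending else State.SUBMIT
    if state is State.CHECK_STATUS:
        return State.RECOVER if all_done else State.WAIT
    return state  # final states do not move


def run(skimdata_ready, handlers_pending, all_done):
    """Walk the machine from the start and return the visited states."""
    state = State.CHECK_SKIMDATA
    trace = [state]
    while state not in FINAL:
        state = next_state(state, skimdata_ready, handlers_pending, all_done)
        trace.append(state)
    return trace


if __name__ == "__main__":
    # First invocation: skimData present, no handlers yet -> submit.
    print([s.name for s in run(True, False, False)])
```

Running the same command repeatedly then corresponds to calling `run` again with updated inputs: a first call ends in `SUBMIT`, later calls end in `WAIT` until every job has finished, and the last one ends in `RECOVER`.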

  8. Generation and submission
     [jamwer@bfb babar]$ ./easygrid SP-1005-Tau11-R14
     Invalid configuration filename: /opt/edg/etc/vomses
     Your identity: /C=UK/O=eScience/OU=Manchester/L=HEP/CN=james werner
     Enter GRID pass phrase for this identity:
     Creating temporary proxy ......................................................... Done
     Creating proxy .................................................... Done
     Searching pre selected skimdata.
     Searching previous handlers.
     Handlers not found.
     Submiting to GRID .
     Wait end of process...

  9. Job Status
     [jamwer@bfb babar]$ ./easygrid SP-1005-Tau11-R14
     Invalid configuration filename: /opt/edg/etc/vomses
     Your identity: /C=UK/O=eScience/OU=Manchester/L=HEP/CN=james werner
     Enter GRID pass phrase for this identity:
     Creating temporary proxy ............................ Done
     Creating proxy ............................... Done
     Searching pre selected skimdata.
     Searching previous handlers.
     Checking if jobs finished.
     ### Handle -> https://lcgrb01.gridpp.rl.ac.uk:9000/foRHhWyeDBnbqA9JkDADLg
     Current Status: Scheduled
     https://lcgrb01.gridpp.rl.ac.uk:9000/foRHhWyeDBnbqA9JkDADLg still pendent.
     ### Handle -> https://lxn1188.cern.ch:9000/8DdK3xruxtevNpei3zZbaA
     Current Status: Scheduled
     https://lxn1188.cern.ch:9000/8DdK3xruxtevNpei3zZbaA still pendent.
     4 jobs did not finished ! Try again later.

  10. Job Status and recovery
      [jamwer@bfb babar]$ ./easygrid SP-1005-Tau11-R14
      Invalid configuration filename: /opt/edg/etc/vomses
      Your identity: /C=UK/O=eScience/OU=Manchester/L=HEP/CN=james werner
      Enter GRID pass phrase for this identity:
      Creating temporary proxy .......................................... Done
      Creating proxy ........................................................... Done
      Searching pre selected skimdata.
      Searching previous handlers.
      Checking if jobs finished.
      ### Handle -> https://lcgrb01.gridpp.rl.ac.uk:9000/foRHhWyeDBnbqA9JkDADLg
      Current Status: Done
      Exit code: 0
      ### Handle -> https://lxn1188.cern.ch:9000/8DdK3xruxtevNpei3zZbaA
      Current Status: Done
      Exit code: 0
      0 jobs did not finished ! Try again later.
      All jobs done. Recovering results in your folder.
      Results in the following folders:
      /home/jamwer/grid_sub/babar/jamwer_foRHhWyeDBnbqA9JkDADLg
      /home/jamwer/grid_sub/babar/jamwer_8DdK3xruxtevNpei3zZbaA
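The status listings on the two slides above follow a regular "### Handle -> ... Current Status: ..." pattern, so the pending count can be derived mechanically. The sketch below parses that text; the function names are hypothetical, and treating every status other than "Done" as still pending is an assumption, not something the slides state.

```python
import re

# Matches one "### Handle -> <url> Current Status: <word>" entry, whether the
# status sits on the same line or on the following one.
HANDLE_RE = re.compile(r"### Handle -> (\S+)\s+Current Status: (\w+)")


def parse_status(output):
    """Map each job handle URL to the status reported for it."""
    return dict(HANDLE_RE.findall(output))


def pending(statuses):
    """Return handles whose status is not 'Done' (assumed to mean unfinished)."""
    return [handle for handle, status in statuses.items() if status != "Done"]


# Excerpt adapted from the job status listings in this transcript.
SAMPLE = """\
### Handle -> https://lcgrb01.gridpp.rl.ac.uk:9000/foRHhWyeDBnbqA9JkDADLg
Current Status: Scheduled
### Handle -> https://lxn1188.cern.ch:9000/8DdK3xruxtevNpei3zZbaA
Current Status: Done
Exit code: 0
"""

if __name__ == "__main__":
    statuses = parse_status(SAMPLE)
    print(len(pending(statuses)), "job(s) still pending")
```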

  11. Monte Carlo Submission to Grid (Prototype)
      • Single command: ./mcgrid JobName num_copies
      • Performs handler management and submission.
      • Software based on a state machine:
        • Verify whether there are handlers pending:
          • If not, generate the script (geramc.c) with edg-job-submit and ClassAds for each copy, then execute it. Nest for submission policy and optimisation.
          • If yes, verify job status. When all jobs have ended, recover the results into the user folder.
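To make "ClassAds for each copy" concrete, the sketch below writes one job description per copy using standard EDG JDL attributes (Executable, Arguments, StdOutput, StdError, InputSandbox, OutputSandbox). It is not the geramc.c generator from the slide: the wrapper script name `run_mc.sh`, the argument layout, and the file naming are all hypothetical.

```python
def make_jdl(job_name, copy_index, executable="run_mc.sh"):
    """Build the JDL text for one copy of a Monte Carlo job.

    `executable` is a hypothetical wrapper script; the slides do not say
    what the real jobs run.
    """
    stem = f"{job_name}_{copy_index}"
    lines = [
        f'Executable = "{executable}";',
        f'Arguments = "{job_name} {copy_index}";',
        f'StdOutput = "{stem}.out";',
        f'StdError = "{stem}.err";',
        f'InputSandbox = {{"{executable}"}};',
        f'OutputSandbox = {{"{stem}.out", "{stem}.err"}};',
    ]
    return "\n".join(lines) + "\n"


def make_jobs(job_name, num_copies):
    """Return a {filename: JDL text} mapping, one entry per copy."""
    return {
        f"{job_name}_{i}.jdl": make_jdl(job_name, i)
        for i in range(1, num_copies + 1)
    }


if __name__ == "__main__":
    for filename, text in make_jobs("MCteste", 3).items():
        print(f"--- {filename}")
        print(text)
```

Each generated file would then be handed to `edg-job-submit` in turn, which is where a submission policy could be applied.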

  12. MC Submission
      [jamwer@bfb mcgrid1]$ ./mcgrid MCteste 3
      Invalid configuration filename: /opt/edg/etc/vomses
      Your identity: /C=UK/O=eScience/OU=Manchester/L=HEP/CN=james werner
      Enter GRID pass phrase for this identity:
      Creating temporary proxy ................................. Done
      Creating proxy ....................................................... Done
      Searching previous handlers.
      Handlers not found.
      Submiting to GRID .
      Wait end of process...

  13. Job Status
      [jamwer@bfb mcgrid1]$ ./mcgrid MCteste 3
      Invalid configuration filename: /opt/edg/etc/vomses
      Your identity: /C=UK/O=eScience/OU=Manchester/L=HEP/CN=james werner
      Enter GRID pass phrase for this identity:
      Creating temporary proxy ........................................ Done
      Creating proxy ....................................... Done
      Searching previous handlers.
      Checking if jobs finished.
      ### Handle -> https://lxn1188.cern.ch:9000/9WzceoIMEQoTK24a-UvOmw
      Current Status: Scheduled
      https://lxn1188.cern.ch:9000/9WzceoIMEQoTK24a-UvOmw still pendent.
      ### Handle -> https://lcgrb01.gridpp.rl.ac.uk:9000/c4iCB8vioozaGteI9hybIg
      Current Status: Ready
      https://lcgrb01.gridpp.rl.ac.uk:9000/c4iCB8vioozaGteI9hybIg still pendent.
      ### Handle -> https://lcgrb01.gridpp.rl.ac.uk:9000/L5BD1OE--eckTm5RXkp2nA
      Current Status: Ready
      https://lcgrb01.gridpp.rl.ac.uk:9000/L5BD1OE--eckTm5RXkp2nA still pendent.
      3 jobs did not finished ! Try again later.

  14. Job status and recovery
      [jamwer@bfb mcgrid1]$ ./mcgrid MCteste 3
      Invalid configuration filename: /opt/edg/etc/vomses
      Your identity: /C=UK/O=eScience/OU=Manchester/L=HEP/CN=james werner
      Enter GRID pass phrase for this identity:
      Creating temporary proxy .................................................. Done
      Creating proxy .................................................... Done
      Searching previous handlers.
      Checking if jobs finished.
      ### Handle -> https://lxn1188.cern.ch:9000/9WzceoIMEQoTK24a-UvOmw
      Current Status: Done
      Exit code: 0
      ### Handle -> https://lcgrb01.gridpp.rl.ac.uk:9000/c4iCB8vioozaGteI9hybIg
      Current Status: Done
      Exit code: 0
      0 jobs did not finished ! Try again later.
      All jobs done. Recovering results in your folder.
      Results in the following folders:
      /home/jamwer/grid_sub/mcgrid1/jamwer_9WzceoIMEQoTK24a-UvOmw
      /home/jamwer/grid_sub/mcgrid1/jamwer_c4iCB8vioozaGteI9hybIg
      /home/jamwer/grid_sub/mcgrid1/jamwer_L5BD1OE--eckTm5RXkp2nA
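In the listing above, each results folder appears to be named after the user plus the last path component of the job handle. A minimal sketch of that mapping follows; the rule is inferred from the sample output only, and the function name and parameters are hypothetical.

```python
import posixpath
from urllib.parse import urlparse


def result_folder(handle, user, base_dir):
    """Derive the results folder for a job handle.

    Assumes the naming pattern <base_dir>/<user>_<job id> seen in the
    sample output, where the job id is the last path component of the URL.
    """
    job_id = posixpath.basename(urlparse(handle).path)
    return posixpath.join(base_dir, f"{user}_{job_id}")


if __name__ == "__main__":
    print(
        result_folder(
            "https://lxn1188.cern.ch:9000/9WzceoIMEQoTK24a-UvOmw",
            "jamwer",
            "/home/jamwer/grid_sub/mcgrid1",
        )
    )
```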

  15. Testing Submission Script
      • Load range: worker load x number of files
        • 16 x 60 files = 960 pending jobs
        • 16 x 150 files = 2400 pending jobs
      • Test with submission script:
        * sslv3 alert handshake failure
        ** Please wait job enter the “Done” status. This never happens!
      The Resource Broker is not reliable or robust. Sometimes it fails 3 days a week, or takes hours to submit/dispatch to a CE (empty!).
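The load range on this slide is a simple product, shown here as a worked check. Reading 16 as the "worker load" multiplier and 60 to 150 as the files per dataset follows the slide's own labels; the function name is hypothetical.

```python
def pending_jobs(worker_load, files_per_dataset):
    """Pending jobs, since each file becomes one job per unit of worker load."""
    return worker_load * files_per_dataset


if __name__ == "__main__":
    for files in (60, 150):
        print(f"16 x {files} files = {pending_jobs(16, files)} pending jobs")
```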

  16. Pending Infrastructure => Course of action
      • Babar software know-how is not available at Manchester => Web page & network skills.
      • Quality assurance => We are OK! from benchmark (E x P)
      • A real application to perform the complete cycle, acquire know-how, and provide a grid proof-of-concept is missing => Partnership with physicists
      • CERN does NOT recognise the Babar community => Let's reduce their priority!
      • RB at Manchester => 60 MB binaries and freedom over policies.
      • SE/RC at Manchester => freedom over policies and job submission.
      • Mass storage (10 TB) for Babar purposes => CAP!
      • UI in the AFS => wide access to Manchester farms.
      • Apprenticeship at RAL and later at SLAC – production and experiment => improve where others fail
      • Configuration for optimal job performance/submission at Tier 2 (1 CE x 50 WN? Performance of dCache with Babar software? Why 10 TB if Liverpool bought 80 TB? Electricity bill?) => analyse procedures to improve QoS and achieve a better site configuration
      • Update (software and data) and operational policies => operational standards to achieve high QoS

  17. Aimed Hardware Architecture (Redundant RB with alternate access)

  18. Aimed Software Architecture

  19. Production Job Submission Package
      • Operational policies/integration with RB (application level).
      • Recovery of aborted status.
      • Resources optimisation.
      • Integration with RC (application level) for replicas policies development.
      • Interactive data visualisation (Useful?)
      • Integration with GridSite (data visualisation, analysis, performance monitor, and submission)
      • Professional version.

  20. Summary
      Integrate LCG2 and job submission with Babar/CM2 at the University of Manchester for Tau physics modelling, analysis and MC generation. We aim to soon be:
      • The largest site in the UK.
      • A leader in grid computing and HEP.

  21. Conclusion
      Babar CM2 is running at Manchester! The LCG2 Grid is running with a real-world experiment! The Babar submission prototype to the Grid is running! LCG is not LHC software only! It is Babar’s. We are doing today what will take you years to achieve. Let's work together!
