
AliEn v2-20

AliEn v2-20. A. Abramyan, L. Betev, D. Goyal, A. Grigoras, C. Grigoras, M. Litmaath, N. Manukyan, M. Martinez, J. Porter, P. Saiz, S. Sankar, S. Schreiner. Content: New features on v2.20, TaskQueue, Catalogue, Service communication, Deployment, Summary.


Presentation Transcript


  1. AliEn v2-20 A. Abramyan, L. Betev, D. Goyal, A. Grigoras, C. Grigoras, M. Litmaath, N. Manukyan, M. Martinez, J. Porter, P. Saiz, S. Sankar, S. Schreiner

  2. Content • New features on v2.20 • TaskQueue • Catalogue • Service communication • Deployment • Summary

  3. Database Layout • Single DB • InnoDB tables • Row locking • Foreign keys • Transactions • not used… • Lookup tables • 2 JDLs per job • JDL fields mapped to columns • Link to full graph
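As a rough illustration of the layout above, the sketch below builds a small schema with a lookup table, a queue table whose columns mirror a few JDL fields, and a table holding two JDLs per job. It uses Python's built-in `sqlite3` as a stand-in for the InnoDB database. All table and column names (`job_status`, `job_queue`, `job_jdl`, `original_jdl`, `derived_jdl`) are hypothetical and not taken from the AliEn schema.

```python
import sqlite3

# Hypothetical schema: names are illustrative, not the real AliEn tables.
SCHEMA = """
CREATE TABLE job_status (            -- lookup table for status names
    status_id INTEGER PRIMARY KEY,
    name      TEXT NOT NULL UNIQUE
);
CREATE TABLE job_queue (             -- JDL fields mapped to plain columns
    job_id    INTEGER PRIMARY KEY,
    status_id INTEGER NOT NULL REFERENCES job_status(status_id),
    ttl       INTEGER,
    package   TEXT
);
CREATE TABLE job_jdl (               -- two JDLs kept per job
    job_id       INTEGER PRIMARY KEY REFERENCES job_queue(job_id),
    original_jdl TEXT NOT NULL,
    derived_jdl  TEXT
);
"""

def build_schema(conn):
    # SQLite only enforces foreign keys when this pragma is switched on.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(SCHEMA)

def insert_job(conn, job_id, status, ttl, package, jdl):
    # One transaction: either both rows are written or neither is.
    with conn:
        (status_id,) = conn.execute(
            "SELECT status_id FROM job_status WHERE name = ?", (status,)
        ).fetchone()
        conn.execute(
            "INSERT INTO job_queue VALUES (?, ?, ?, ?)",
            (job_id, status_id, ttl, package),
        )
        conn.execute(
            "INSERT INTO job_jdl (job_id, original_jdl) VALUES (?, ?)",
            (job_id, jdl),
        )
```

With foreign keys enabled, inserting a job that points at an unknown status is rejected by the database itself rather than by application code.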

  4. Brokering • Avoid ClassAd matching • Fewer fields to parse • Match in a single SQL statement • Two attempts at matching: • With packages already installed • With any packages • (Add a third attempt with remote data??)
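The two-attempt matching described above could look roughly like the following sketch, where each attempt is a single SQL statement. It assumes a hypothetical `waiting_jobs` table with `job_id`, `package`, `ttl` and `priority` columns, and it simplifies each job to a single required package. The real AliEn query is not shown in the slides.

```python
import sqlite3

def match_job(conn, agent_ttl, installed_packages):
    """Return (job_id, needs_install) for the best match, or None.

    Attempt 1 only considers jobs whose package is already installed.
    Attempt 2 accepts any package, so the agent would have to install it.
    """
    if installed_packages:
        marks = ",".join("?" * len(installed_packages))
        row = conn.execute(
            f"SELECT job_id FROM waiting_jobs "
            f"WHERE ttl <= ? AND package IN ({marks}) "
            f"ORDER BY priority DESC, job_id LIMIT 1",
            (agent_ttl, *installed_packages),
        ).fetchone()
        if row:
            return row[0], False
    row = conn.execute(
        "SELECT job_id FROM waiting_jobs WHERE ttl <= ? "
        "ORDER BY priority DESC, job_id LIMIT 1",
        (agent_ttl,),
    ).fetchone()
    return (row[0], True) if row else None
```

Because the JDL requirements are stored as columns, the database can filter and order candidates by itself, and no expression has to be parsed per job.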

  5. File brokering • Current schema: submit 4 jobs (diagram: File1, File 4 / File2 / File3 / File 5) • Broker per file: submit 3 empty subjobs • When a job starts, analyze as much as possible (diagram: File1,2,4,5 / File 3) • If nothing left, just exit
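One possible reading of the per-file brokering above is sketched below: subjobs are submitted empty, and each one claims every pending file it can reach when it starts, exiting if nothing is left. The function name `claim_files`, the replica map and the site names are hypothetical. A real implementation would also need to make the claim atomic, for example inside a database transaction.

```python
def claim_files(pending, replicas, site):
    """Claim all pending files that have a replica at `site`.

    pending  -- set of file names not yet assigned to any subjob (mutated)
    replicas -- dict mapping file name to the set of sites holding a copy
    Returns the sorted list of claimed files; an empty list means the
    subjob has nothing to do and can simply exit.
    """
    claimed = [f for f in sorted(pending) if site in replicas[f]]
    pending.difference_update(claimed)
    return claimed
```

For example, if File1, File2, File4 and File5 are reachable from the site where the first subjob starts, that subjob takes all four, the next subjob takes File3 elsewhere, and a third subjob finds nothing and exits.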

  6. More TaskQueue • MaxWaitingTime: maximum time that a job can stay in ‘WAITING’ • If the time is exceeded, the job ends up in error • New state: ERROR_EW (Expired Waiting) • Retrial: • Number of times that a single job can be resubmitted • Resubmission done by central services • Reusing JobId in resubmission • Direct removal of KILLED jobs
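A minimal sketch of how MaxWaitingTime and the retry counter might be enforced centrally is shown below. The `jobs` table and its columns are hypothetical, `sqlite3` stands in for the real database, and the sketch assumes that any job in an error state is eligible for resubmission, which the slides do not specify.

```python
import sqlite3

# Hypothetical table; times are plain seconds for simplicity.
SCHEMA = """
CREATE TABLE jobs (
    job_id           INTEGER PRIMARY KEY,
    status           TEXT NOT NULL,
    waiting_since    INTEGER NOT NULL,
    max_waiting_time INTEGER,           -- NULL means no limit
    retries_left     INTEGER NOT NULL
)
"""

def expire_waiting(conn, now):
    """Move jobs that waited too long to ERROR_EW; return how many."""
    cur = conn.execute(
        "UPDATE jobs SET status = 'ERROR_EW' "
        "WHERE status = 'WAITING' AND max_waiting_time IS NOT NULL "
        "AND ? - waiting_since > max_waiting_time",
        (now,),
    )
    return cur.rowcount

def resubmit(conn, job_id, now):
    """Put a failed job back in WAITING under the same job_id.

    Returns False when the job is not in an error state or has no
    retries left.
    """
    cur = conn.execute(
        "UPDATE jobs SET status = 'WAITING', waiting_since = ?, "
        "retries_left = retries_left - 1 "
        "WHERE job_id = ? AND status LIKE 'ERROR%' AND retries_left > 0",
        (now, job_id),
    )
    return cur.rowcount == 1
```

Updating the existing row, rather than inserting a new one, is what keeps the job identifier stable across resubmissions.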

  7. Some results… • DB time to insert a job and change its status 8 times • Time to process all 230M ALICE jobs: 4.8 days
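To put the figure above in perspective, the short calculation below converts it into per-job and per-second numbers. It assumes the 4.8 days is pure database time spent sequentially, and that each job costs one insert plus eight status changes. These derived values are not stated in the slides.

```python
JOBS = 230_000_000          # all ALICE jobs, from the slide
DAYS = 4.8                  # total DB time, from the slide
OPS_PER_JOB = 1 + 8         # one insert plus eight status changes

seconds = DAYS * 86_400
jobs_per_second = JOBS / seconds
ops_per_second = jobs_per_second * OPS_PER_JOB
ms_per_job = 1000 * seconds / JOBS

print(f"{jobs_per_second:.0f} jobs/s, {ops_per_second:.0f} DB operations/s, "
      f"{ms_per_job:.2f} ms of DB time per job")
```

Under those assumptions the database sustains roughly 555 jobs per second, or about 1.8 ms of database time per job.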

  8. Service communication • Replacing SOAP with JSON • Less overhead (no XML encoding) • Easier to interact with other clients • And even from a web browser • Backward incompatible change
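The overhead difference can be illustrated by encoding the same call both ways. The sketch below is not the AliEn wire format: the method name, the parameter and the shape of both messages are hypothetical, and the SOAP envelope is reduced to the bare minimum.

```python
import json
from xml.sax.saxutils import escape

def encode_json(method, params):
    # Compact JSON message: method name plus positional parameters.
    return json.dumps({"method": method, "params": params},
                      separators=(",", ":"))

def encode_soap(method, params):
    # Minimal SOAP 1.1 envelope carrying the same information.
    args = "".join(
        f"<arg{i}>{escape(str(p))}</arg{i}>" for i, p in enumerate(params)
    )
    return (
        '<?xml version="1.0" encoding="UTF-8"?>'
        '<soap:Envelope '
        'xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">'
        f"<soap:Body><{method}>{args}</{method}></soap:Body>"
        "</soap:Envelope>"
    )

json_msg = encode_json("getJobStatus", [12345])   # hypothetical call
soap_msg = encode_soap("getJobStatus", [12345])
print(len(json_msg), "bytes as JSON,", len(soap_msg), "bytes as SOAP")
```

The JSON message is also directly usable from a browser, since it parses with the built-in `JSON.parse` without an XML toolkit.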

  9. SOAP vs JSON • Apache web server • 32 hosts for clients • 16 cores • 8000 calls per client

  10. Catalogue • InnoDB tables • Row locking • Transactions • Foreign keys

  11. Deployment • All the features already deployed on ALICE_TEST • Instead of one single big-bang release, divide it into three: • TaskQueue • JSON • Catalogue • Reduces amount of downtime • Increases complexity of deployment…

  12. [Diagram: current deployment] • Central Services: 8 machines (AliEn v2-19**), 12 machines (AliEn v2-19**, v2-17), 3 machines (+1 slave, backups) (AliEn v2-17) • 80 sites (AliEn v2-19.(80-163)) • 40,000 wn (AliEn v2-19.(80-163)) • Components shown: vobox, catalogue, aliensh, Api, TaskQueue, Transfers, ROOT, LDAP, BACKUP, JA

  13. Deployment of TaskQueue • Only needed on the central services • Database migration of 1 hour (24 GB) • Already done! • Monday, 1st Oct • Downtime of 12 hours • Method: • Install new version • Stop services • Convert DB • Start services

  14. Deployment of JSON • Full deployment • Once Central Services are updated, old installations won’t be able to connect • No database migration • Plan: • Install new version everywhere • Stop all services • Restart everything with new version • When: • ?

  15. Deployment of catalogue • Only needed on central services • Very delicate operation • Database migration of 24 hours • 430 GB, 290 big tables • Plan: • Prepare a hybrid version • Install v2-20 and hybrid • Restart services with hybrid • Convert DB • Restart services with v2-20 • When: • ?

  16. Summary • Parts of AliEn v2.20 already deployed! • TaskQueue speed improved drastically • Insertion rate 40 times higher • Resubmission 20 times faster • Improved concurrency • Need to schedule 2 more upgrades • JSON: Improve service communication • New catalogue layout
