
Scalability Tests With CMS, Boss and R-GMA



  1. Scalability Tests With CMS, Boss and R-GMA EDG WP3 Meeting 14/01/04 – 16/01/04

  2. Boss Overview • CMS jobs are submitted via a UI node for data analysis. • Individual jobs are wrapped in a BOSS executable. • Jobs are then delegated to a local batch farm managed by (for example) PBS. • When a job is executed, the BOSS wrapper spawns a separate process to catch the input, output and error streams. • Job status is then redirected to a local DB.
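The wrapper behaviour described above can be sketched in a few lines of Python. All names here are invented for illustration (the real BOSS wrapper is a compiled executable writing to a MySQL DB); a dict stands in for the local status DB:

```python
import subprocess

def run_wrapped_job(cmd, status_db):
    # Hypothetical sketch of the BOSS wrapper: run the real job as a
    # child process, catch its output and error streams, and record
    # the job status in a local "DB" (here just a dict).
    status_db["status"] = "RUNNING"
    proc = subprocess.Popen(cmd, stdout=subprocess.PIPE,
                            stderr=subprocess.PIPE, text=True)
    out, err = proc.communicate()  # the spawned reader catches both streams
    status_db["stdout"] = out
    status_db["stderr"] = err
    status_db["status"] = "DONE" if proc.returncode == 0 else "FAILED"
    return proc.returncode

db = {}
run_wrapped_job(["echo", "hello"], db)
```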

  3. [Architecture diagram: CMS production on EDG — the UI runs IMPALA/BOSS with RefDB and the BOSS DB; jobs go through the Workload Management System to CEs and WNs carrying CMS software and CE parameters; SEs, the Replica Manager and JDL data registration handle input data location; WNs push data or info, while job output filtering and runtime monitoring pull info back.]

  4. Where R-GMA Fits In • BOSS was designed for use within a local batch farm. • If a job is scheduled on a remote compute element, status info needs to be sent back to the submitter’s site. • Within a grid environment we want to collate job info from potentially many different farms. • Job status info should be filtered depending on where the user is located – the use of predicates would therefore be ideal. • R-GMA fits in nicely. • The BOSS wrapper makes use of the R-GMA API. • Job status updates are then published using a stream producer. • An archiver positioned at each UI node can then scoop up relevant job info and dump it into the locally running BOSS DB. • Users can then access job status info from the UI node.
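The predicate idea above can be illustrated with a minimal sketch. In real R-GMA the consumer supplies an SQL-style WHERE clause; here a plain function and invented field names (`ui_host`, `job_id`) stand in:

```python
def archive_relevant(tuples, predicate):
    # Sketch of an archiver scooping up only the tuples that match a
    # site-local predicate, as the real R-GMA archiver would via an
    # SQL predicate (field names here are hypothetical).
    return [t for t in tuples if predicate(t)]

published = [
    {"job_id": 1, "ui_host": "ui.site-a.example", "status": "RUNNING"},
    {"job_id": 2, "ui_host": "ui.site-b.example", "status": "DONE"},
]
# A UI node at site A only collects status for its own jobs.
local = archive_relevant(published,
                         lambda t: t["ui_host"] == "ui.site-a.example")
```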

  5. Use of R-GMA in BOSS — [Diagram: on the WN, the BOSS wrapper runs the job and tees its output both to an output file and to the R-GMA API; receiver servlets on the CE/GK and the registry route the status stream back to the BOSS DB behind IMPALA/BOSS on the UI, with the job sandbox carried alongside.]

  6. Test Motivation • Want to ensure R-GMA can cope with the volume of expected traffic and is scalable. • CMS production load is estimated at around 2000 jobs. • Initial tests with v3-3-28 only managed about 400 – could do better.

  7. Test Design • A simulation of the CMS production system was created. • An MC simulation was designed to represent a typical job. • Each job creates a stream producer. • Each job publishes a number of tuples depending on the job phase. • Each job contains 3 phases with varying time delays. • An Archiver scoops up the published tuples. • The Archiver DB used is a representation of the BOSS DB. • Archived tuples are compared with published tuples to verify the test outcome.
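The job model above can be sketched as follows. The per-phase tuple counts are invented for illustration (the real test averaged about 3.8 tuples per job: 7600 tuples over 2000 jobs), and the time delays between phases are omitted:

```python
def simulate_job(job_id, tuples_per_phase=(1, 2, 1)):
    # One simulated CMS job: publish a few status tuples in each of
    # its 3 phases (counts here are an assumption, not the real ones).
    published = []
    for phase, count in enumerate(tuples_per_phase, start=1):
        for seq in range(count):
            published.append({"job_id": job_id, "phase": phase, "seq": seq})
    return published

def run_simulation(n_jobs):
    # The MC sim: every job publishes its tuples; an archiver on the
    # other side would collect them for comparison against this list.
    all_tuples = []
    for j in range(n_jobs):
        all_tuples.extend(simulate_job(j))
    return all_tuples

tuples = run_simulation(2000)
```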

  8. Topology — [Diagram: an MC Sim client feeds an SP Mon Box; via the IC, an Archiver Mon Box and Archiver Client collect the tuples into the Boss DB, which is used for test verification and test output.]

  9. Topology — [Diagram: the same arrangement scaled out — several MC Sims feed multiple SP Mon Boxes, with one Archiver Mon Box and Archiver Client collecting via the IC into the Boss DB for test verification and test output.]

  10. Test Setup • Archiver & SP mon box set up at Imperial. • SP mon box & IC set up at Brunel. • Archiver and MC sim clients positioned at various nodes within both sites. • Tried 1 MC sim and Archiver with a variable number of job submissions. • Also set up a similar test on the WP3 testbed using 2 MC sims and 1 Archiver.

  11. Results • 1 MC sim creating 2000 jobs and publishing 7600 tuples was proven to work without a glitch. • Demonstrated 2 MC sims each running 4000 jobs (with 15200 published tuples) on the WP3 testbed.

  12. Pitfalls Encountered • Lots of fun integration problems. • Firewall access between Imperial and Brunel was initially blocked for streaming data (port 8088). • Limitation on the number of open streaming sockets – 1K. • Discovered lots of OutOfMemoryErrors. • Various configuration problems at both the Imperial and Brunel sites. • These caused many test runs to fail. • Probably explained the poor initial performance.

  13. Weaknesses • Test is time-consuming to set up. • MC and Archiver are manually started. • Analysis bash script takes forever to complete. • Requires continual attention while running, to ensure the MC simulation is working correctly. • Test takes considerable time for a large number of jobs (i.e. > 1000). • Need to run multiple MC sims to generate a more realistic load. • How do we account for ‘other’ R-GMA traffic?

  14. Improving the Design • Need to automate! • Improvements made so far: • The CMS test comes as part of the ‘performance’ R-GMA RPM. • The MC simulation can be configured to run repeatedly with random job ids. • Rather than monitoring the STDOUT from the MC Sim, test results are published to an R-GMA table. • The analysis script is now implemented in Java; verification of data now takes seconds rather than hours! • The CMS test can now be used as part of the WP3 testing framework and is ready for Nagios integration.
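The seconds-not-hours verification essentially reduces to a set comparison of published versus archived tuples. A minimal sketch (the real check reads both sides from R-GMA tables; the tuple key fields are an assumption):

```python
def verify(published, archived):
    # Compare published tuples with archived tuples and report what is
    # missing or unexpected. Keyed on (job_id, phase, seq) -- a
    # hypothetical key, chosen here to make each tuple unique.
    pub = {(t["job_id"], t["phase"], t["seq"]) for t in published}
    arc = {(t["job_id"], t["phase"], t["seq"]) for t in archived}
    return {"missing": pub - arc, "unexpected": arc - pub}

published = [{"job_id": 1, "phase": p, "seq": 0} for p in (1, 2, 3)]
archived = published[:2]  # simulate one lost tuple
report = verify(published, archived)
```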

  15. Measuring Performance • Need a nifty way of measuring performance! • Things to measure: • Average time taken for a tuple published via the MC simulation to arrive at the Archiver. • The tuple throughput of the CMS test (e.g. how do we accurately measure the rate at which tuples are archived?). • Would be nice to measure the number of jobs each mon box can carry before hitting memory limits. • Need to define a hardware spec that satisfies a given level of performance.
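The first two measurements could be computed like this, assuming each tuple carries a publish timestamp and the archiver records an arrival timestamp (neither field exists in the test as described; both are assumptions):

```python
def performance(records):
    # records: list of (publish_ts, archive_ts) pairs, in seconds.
    # Returns the mean publish-to-archive latency and the overall
    # archive rate (tuples per second over the arrival window).
    latencies = [arc - pub for pub, arc in records]
    mean_latency = sum(latencies) / len(latencies)
    arrivals = [arc for _, arc in records]
    span = max(arrivals) - min(arrivals)
    throughput = len(records) / span if span > 0 else float("inf")
    return mean_latency, throughput

recs = [(0.0, 0.5), (1.0, 1.7), (2.0, 2.4)]
lat, rate = performance(recs)
```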

  16. Summary • After initial configuration problems, the tests were successful. • But the scalability of the test is largely dependent on the specs of the Stream Producer/Archiver mon box. • Possible to keep increasing the number of submitted jobs, but we will eventually hit an upper memory limit. • Need more accurate ways to record performance, particularly for data- and time-related measurements. • Need to pin down exact performance requirements!
