
Garuda: A Cloud-based Job Scheduler



Presentation Transcript


  1. Garuda: A Cloud-based Job Scheduler
     Ashish Patro, MinJae Hwang, Thanumalayan S., Thawan Kooburat

  2. Agenda
     • Overview
     • Job Scheduler Characteristics
     • Scheduling Prototypes
     • Performance Data

  3. Introduction
     • Key Idea
       • Centralized job scheduler in the Cloud
     • Benefit and Motivation
       • Simplify deployment and maintenance
         • Deploy only the worker daemon
       • Scalability of the Cloud
         • Infinite scalability and pay-as-you-go model
       • Simplify system design
         • Reliable services

  4. Overview

  5. Platform Choice
     • Amazon – EC2
       • On-demand VMs
     • Google App Engine
       • Web application hosting platform
       • Reliable and scalable storage: DataStore
       • Automatic load balancing
       • Other services: Memcached, Instant Messaging, Email, Cron Jobs, …

  6. Job Scheduler Characteristics
     • Condor Job Scheduling (revisit)
     [Diagram: Central Manager (Match Maker, ClassAd Storage, Negotiator, Collector); Schedd with Job Queue; Startd on Worker Node]
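The matchmaking role of the Central Manager can be illustrated with a minimal sketch. This is not Condor's ClassAd expression language: ads are plain dictionaries, each `requirements` entry is a Python callable evaluated against the other party's ad, and every attribute name (`memory_mb`, `arch`, `owner`) is hypothetical.

```python
# Simplified two-way matchmaking. Real ClassAds use their own expression
# language; the dictionaries and attribute names here are assumptions.

def matches(job_ad, machine_ad):
    """A job and a machine match only if each side accepts the other."""
    return job_ad["requirements"](machine_ad) and machine_ad["requirements"](job_ad)


def negotiate(job_ads, machine_ads):
    """Greedily pair each job with the first free machine that matches."""
    free = list(machine_ads)  # copy, so the caller's list is not modified
    pairs = []
    for job in job_ads:
        for machine in free:
            if matches(job, machine):
                pairs.append((job["name"], machine["name"]))
                free.remove(machine)
                break
    return pairs


machines = [
    {"name": "node1", "arch": "x86_64", "memory_mb": 512,
     "requirements": lambda job: job["owner"] != "blocked"},
    {"name": "node2", "arch": "x86_64", "memory_mb": 4096,
     "requirements": lambda job: True},
]
jobs = [
    {"name": "job1", "owner": "alice",
     "requirements": lambda m: m["memory_mb"] >= 2048},
    {"name": "job2", "owner": "bob",
     "requirements": lambda m: m["arch"] == "x86_64"},
]

print(negotiate(jobs, machines))
```

Here `job1` skips `node1` because of its memory requirement and lands on `node2`, which leaves `node1` for `job2`.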

  7. Job Scheduler Characteristics
     • A job scheduler needs to process a large amount of data
     [Diagram: Job Request, Daemon Process, Google App Engine (Servlet, Memcache, Datastore)]

  8. Memory Hierarchy
     [Table: storage tiers Local memory (static variable), Memcached, and DataStore compared on Volatile, Serialization Cost, Query, and Global Namespace; cell values not preserved in the transcript]
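One common way to use a hierarchy like this is a read-through lookup that tries the fastest tier first. The sketch below is a generic pattern, not necessarily what Garuda does; the three tiers are stand-in dictionaries and the class name `TieredStore` is hypothetical.

```python
class TieredStore:
    """Read-through lookup over three tiers, fastest first (illustrative only)."""

    def __init__(self):
        self.local = {}      # stands in for a static variable inside one JVM
        self.memcache = {}   # stands in for the shared cache
        self.datastore = {}  # stands in for persistent storage

    def put(self, key, value):
        # Write to every tier so later reads can stop at the fastest one.
        self.datastore[key] = value
        self.memcache[key] = value
        self.local[key] = value

    def get(self, key):
        """Return (value, tier_name); refill the faster tiers on a miss."""
        if key in self.local:
            return self.local[key], "local"
        if key in self.memcache:
            self.local[key] = self.memcache[key]
            return self.local[key], "memcache"
        if key in self.datastore:
            value = self.datastore[key]
            self.memcache[key] = value
            self.local[key] = value
            return value, "datastore"
        return None, "miss"


store = TieredStore()
store.put("machine42", {"memory_mb": 2048})
store.local.clear()            # simulate a process restart losing local state
print(store.get("machine42"))  # served from the shared cache
print(store.get("machine42"))  # local memory has been refilled
```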

  9. Scheduling Prototypes
     [Diagram: four prototypes (Batch ClassAd, Online ClassAd, Online + Batch, DB Query) laid out across the storage tiers Local memory (static variable), Memcached, and DataStore]

  10. DataStore
     • DataStore API calls take a lot of CPU cycles
       • Easy to reach the hard DataStore limit (20 CPU-sec/sec)
       • Storing each ClassAd takes 0.22 CPU-sec
     • DataStore rejects requests on high-contention cells
     • DataStore is faster at retrieving a large amount of data
     • Query predicates have to match pre-defined indices
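The two figures on this slide imply a ceiling on write throughput. The arithmetic below is a rough sketch that assumes ClassAd writes are the only consumer of the 20 CPU-sec/sec budget; the function names are hypothetical.

```python
DATASTORE_CPU_BUDGET = 20.0   # CPU-seconds available per wall-clock second (from the slide)
COST_PER_CLASSAD = 0.22       # CPU-seconds to store one ClassAd (from the slide)


def max_classad_writes_per_sec(budget=DATASTORE_CPU_BUDGET, cost=COST_PER_CLASSAD):
    """Upper bound on ClassAd writes per second before hitting the CPU limit."""
    return budget / cost


def seconds_to_store(num_classads, budget=DATASTORE_CPU_BUDGET, cost=COST_PER_CLASSAD):
    """Minimum wall-clock time to store num_classads at the CPU limit."""
    return num_classads * cost / budget


print(round(max_classad_writes_per_sec(), 1))  # about 90.9 writes per second
print(round(seconds_to_store(10_000)))         # about 110 seconds for 10,000 ClassAds
```

Under that assumption, a pool that refreshes more than roughly 90 ClassAds per second cannot persist every update to the DataStore.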

  11. Memcache
     • Memcached size limit
       • 10K entries of 5K ClassAd
     • Memcached latency
       • Retrieving 1.5K entries of 5K ClassAd takes 30 secs
     • Memcached parallelism
       • Multiple concurrent requests to memcached do not degrade performance
     • Only provides a get/set interface
       • Cannot traverse/query memcached
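Because the cache exposes only get and set, enumerating the stored ClassAds requires keeping the list of keys somewhere. One generic workaround, sketched below, stores an index entry under a well-known key. This is not necessarily what Garuda does; `GetSetCache`, `INDEX_KEY`, and the `classad:` key prefix are hypothetical.

```python
class GetSetCache:
    """Stand-in for a cache that offers only get and set."""

    def __init__(self):
        self._data = {}

    def get(self, key):
        return self._data.get(key)

    def set(self, key, value):
        self._data[key] = value


INDEX_KEY = "classad_index"  # hypothetical well-known key holding all names


def store_classad(cache, name, ad):
    """Store one ClassAd and record its name in the index entry."""
    cache.set("classad:" + name, ad)
    index = cache.get(INDEX_KEY) or []
    if name not in index:
        # Read-modify-write: not atomic, so concurrent writers could lose updates.
        cache.set(INDEX_KEY, index + [name])


def all_classads(cache):
    """Emulate traversal by walking the index; skip entries that were evicted."""
    result = {}
    for name in cache.get(INDEX_KEY) or []:
        ad = cache.get("classad:" + name)
        if ad is not None:
            result[name] = ad
    return result


cache = GetSetCache()
store_classad(cache, "node1", {"memory_mb": 512})
store_classad(cache, "node2", {"memory_mb": 4096})
print(sorted(all_classads(cache)))
```

The index update is the weak point: on a shared cache it would need some form of atomic update, and each traversal still costs one get per entry.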

  12. Hosting Platform
     • GAE dynamically spawns JVM processes
       • Spawns a new one only when all processes are busy
       • Each JVM has only 1 thread
       • Maximum of 10 JVM processes for a free account
         • 10 requests at a time
       • Memory limit per JVM is around 110 MB
     • JVM processes are short-lived
       • Killed after 110 seconds idle
       • Use a Cron job to keep the JVM process alive
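The keep-alive idea can be checked with a small simulation. It is a sketch under two assumptions: the instance receives no traffic other than the periodic ping, and it is killed once the idle gap reaches the 110-second limit from the slide. The function names are hypothetical.

```python
IDLE_TIMEOUT_SEC = 110     # from the slide
MAX_INSTANCES = 10         # free-account limit from the slide
THREADS_PER_INSTANCE = 1   # from the slide


def max_concurrent_requests(instances=MAX_INSTANCES, threads=THREADS_PER_INSTANCE):
    """Requests that can be served at once with single-threaded instances."""
    return instances * threads


def stays_alive(ping_interval_sec, duration_sec, idle_timeout_sec=IDLE_TIMEOUT_SEC):
    """Simulate an instance that only ever receives keep-alive pings."""
    last_request = 0
    t = ping_interval_sec
    while t <= duration_sec:
        if t - last_request >= idle_timeout_sec:
            return False  # killed before this ping arrived
        last_request = t
        t += ping_interval_sec
    return duration_sec - last_request < idle_timeout_sec


print(max_concurrent_requests())
print(stays_alive(60, 3600))   # pinging every minute keeps the instance warm
print(stays_alive(120, 3600))  # a two-minute interval is too slow
```

Any ping interval shorter than the idle timeout keeps the process, and whatever it holds in local memory, alive.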

  13. Other Useful Services
     • Instant messaging: XMPP
       • Convenient communication protocol
         • Between CM and worker
         • Between CM and users (Google Talk)
     • Email
       • Offline notification

  14. Conclusion

  15. Testimonial
     • Google App Engine is a crippled J2EE + RDBMS
     • Scales with money
     • Good testing platform
     • Looks promising in the documentation

  16. [Diagram: Job Request, Daemon Process, Google App Engine (Servlet, Memcache, Datastore)]

  17. [Diagram: Central Manager (Match Maker, ClassAd Storage, Negotiator, Collector); Schedd with Job Queue; Startd on Worker Node]
