
Integrated Resource Management for Cluster-based Internet Services

Kai Shen, Dept. of Computer Science, Univ. of Rochester; Hong Tang, Tao Yang*, Lingkun Chu, Dept. of Computer Science, Univ. of California, Santa Barbara (*: Ask Jeeves, Inc.).


Presentation Transcript


  1. Integrated Resource Management for Cluster-based Internet Services
Kai Shen, Dept. of Computer Science, Univ. of Rochester
Hong Tang, Tao Yang*, Lingkun Chu, Dept. of Computer Science, Univ. of California, Santa Barbara
*: Ask Jeeves, Inc.

  2. Background
• Large-scale, resource-intensive Internet services hosted on server clusters:
  • Yahoo, MSN, Google, Teoma/Ask Jeeves …
• Challenges/requirements for resource management:
  • Scalability and robustness;
  • Online users require interactive responses;
  • Resource (CPU, I/O)-hungry service processing and large user traffic require efficient resource utilization;
  • Fluctuating user traffic requires adaptive management;
  • Supporting differentiated services for different types of user requests.
OSDI 2002

  3. Architecture of Targeted Services: Document Search Engine
[Figure: components are a Firewall/Web switch, Web servers/query handlers, query caches, index servers (partitions 1, 2, and 3), and doc servers, connected by a local-area network.]

  4. “Neptune” Project Overview
• Programming and runtime support to aggregate and replicate stand-alone service components.
• Building blocks for scalable and robust service construction:
  • Functionally symmetric clustering architecture;
  • Integrated resource management – quality, efficiency, and differentiation;
  • Replication management.

  5. Architecture of Targeted Services: Document Search Engine
[Figure: the same architecture as slide 3 (Firewall/Web switch, Web servers/query handlers, query cache, index servers in partitions 1, 2, and 3, doc servers, local-area network), annotated with "Neptune runtime" and "SAP" labels.]

  6. Neptune Deployments
• Service deployments:
  • Web document searching;
  • BLAST – protein sequence similarity matching;
  • Prototype database services – online discussion group, auction.
• Production system at the Teoma/Ask Jeeves search engines since 2000:
  • search indexes of more than 450M Web documents;
  • over 800 multiprocessor servers;
  • tens of millions of search queries per day.

  7. Outline
• Project Overview
• Integrated Resource Management
  • Multiple Resource Management Objectives
  • Two-level Mechanism
• Trace-driven Performance Evaluation on a Linux Cluster
• Related Work and the Conclusion

  8. Quality-aware Resource Utilization Efficiency
• Throughput: measures resource utilization efficiency.
• Service response time: measures client-perceived service quality.
• Aggregate service yield: measures quality-aware resource utilization efficiency.
  • Fulfilling each service request generates a quality-aware service yield, which is a function of the service response time.
  • The service yield function is specified by service providers (flexibility).
• System goal: maximizing the aggregate service yield.
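The system goal can be made concrete with a small sketch. It assumes, as the slide's wording suggests, that the aggregate service yield is the sum of the yields of individual requests, each computed from that request's response time by its service class's yield function. The `Request` type, the class names, and the yield values are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable


@dataclass
class Request:
    service_class: str    # hypothetical class label, e.g. "Gold" or "Bronze"
    response_time: float  # seconds from arrival to completion


def aggregate_yield(completed: Iterable[Request],
                    yield_fns: Dict[str, Callable[[float], float]]) -> float:
    """Sum the per-request yields; each service class has its own yield function."""
    return sum(yield_fns[r.service_class](r.response_time) for r in completed)


# Hypothetical provider-specified yield functions: constant yield within a 1 s deadline.
yield_fns = {
    "Gold": lambda t: 2.0 if t <= 1.0 else 0.0,
    "Bronze": lambda t: 1.0 if t <= 1.0 else 0.0,
}
served = [Request("Gold", 0.5), Request("Gold", 1.5), Request("Bronze", 0.2)]
print(aggregate_yield(served, yield_fns))  # 3.0
```

Under this model, a scheduler that maximizes the sum prefers finishing a Gold request on time over a Bronze one when it cannot do both.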

  9. Sample Service Yield Functions
[Figure: QoS (service) yield vs. response time, three panels. <A> Maximizing throughput (with a deadline): constant yield up to the deadline. <B> Minimizing mean response time (with a deadline): full yield, decreasing with response time until the deadline. <C> A hybrid metric: full yield until the full-yield deadline, decreasing until the deadline; dropped requests incur a drop penalty.]
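The three sample shapes can be written as plain functions of response time. This is a sketch that assumes linear declines in <B> and <C> and represents a dropped request as `t=None`; the function and parameter names are hypothetical, not Neptune's API.

```python
from typing import Optional


def throughput_yield(t: float, deadline: float, constant: float = 1.0) -> float:
    """<A>: constant yield if the request completes by the deadline, zero otherwise."""
    return constant if t <= deadline else 0.0


def response_time_yield(t: float, deadline: float, full: float = 1.0) -> float:
    """<B>: full yield at t = 0, assumed to fall linearly to zero at the deadline."""
    if t > deadline:
        return 0.0
    return full * (1.0 - t / deadline)


def hybrid_yield(t: Optional[float], full_yield_deadline: float, deadline: float,
                 full: float = 1.0, drop_penalty: float = 0.0) -> float:
    """<C>: full yield up to the full-yield deadline, an assumed linear decline to
    zero at the deadline, and a negative drop penalty for a dropped request (t is None)."""
    if t is None:
        return -drop_penalty
    if t <= full_yield_deadline:
        return full
    if t > deadline:
        return 0.0
    return full * (deadline - t) / (deadline - full_yield_deadline)


for t in (0.2, 0.75, 1.2):
    print(t, throughput_yield(t, 1.0), response_time_yield(t, 1.0),
          hybrid_yield(t, 0.5, 1.0))
print(hybrid_yield(None, 0.5, 1.0, drop_penalty=0.5))
```

Because the yield function is just a parameter, a provider can swap shapes per service class without changing the scheduler.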

  10. Service Differentiation
• Service class: a category of service accesses that receive the same level of QoS support, e.g., based on:
  • Client identities: paid vs. unpaid, consumers vs. corporate partners;
  • Service types or data partitions: order placement vs. catalog browsing.
• Service differentiation in Neptune:
  • Differentiated service yield functions;
  • Proportional resource allocation guarantees.

  11. Two-level Resource Management

  12. Cluster-level: Partitioning or Not?
• Periodic Server Partitioning [Zhu2001]:
  • Determine resource allocation at each epoch.
  • Partition the server pool among service classes.
• Neptune does not partition servers at the cluster level:
  • Random polling-based load balancing evenly distributes the requests of each service class across all nodes, so service differentiation happens inside each node.
• Advantages:
  • Functional symmetry and decentralization lead to robustness and scalability.
  • Better handling of system state changes: demand spikes and node failures.
• Disadvantage:
  • Less isolation for misbehaving service classes.
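A minimal sketch of random polling-based load balancing, assuming the common form of the technique: poll a few randomly chosen nodes for their current queue length and send the request to the least-loaded one. `choose_node`, `poll_size`, and the load-lookup callback are hypothetical names; how Neptune actually obtains and caches load information is not shown on the slide.

```python
import random
from typing import Callable, Optional, Sequence


def choose_node(nodes: Sequence[str],
                queue_length: Callable[[str], int],
                poll_size: int = 3,
                rng: Optional[random.Random] = None) -> str:
    """Poll `poll_size` randomly chosen nodes and return the one with the shortest queue."""
    rng = rng or random.Random()
    polled = rng.sample(list(nodes), min(poll_size, len(nodes)))
    return min(polled, key=queue_length)


# Hypothetical per-node queue lengths; every node serves every service class.
loads = {"node1": 5, "node2": 1, "node3": 3, "node4": 4}
print(choose_node(list(loads), lambda n: loads[n], poll_size=2, rng=random.Random(7)))
```

Because any node can serve any class, no central partitioning decision is needed, and polling only a few nodes keeps per-request overhead small.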

  13. Node-level Request Scheduling
[Flowchart:]
• Drop requests likely to generate zero yield.
• Search for an under-allocated service class.
• If one is found, schedule the under-allocated service class.
• Otherwise, schedule for high aggregate yield.
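The flowchart can be sketched as a single scheduling decision. The sketch assumes that (1) "likely to generate zero yield" means the request cannot finish before its deadline given an estimated service time, (2) a class is under-allocated when its share of recently consumed resources is below its guaranteed share, and (3) "schedule for high aggregate yield" is approximated by picking the highest yield per unit of service time. These criteria, and the names `Req` and `schedule_next`, are hypothetical stand-ins rather than Neptune's actual policies.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class Req:
    cls: str             # service class, e.g. "Gold", "Silver", "Bronze"
    arrival: float       # arrival time (s)
    deadline: float      # absolute time after which yield is assumed to be zero
    service_time: float  # estimated processing time (s), assumed known and > 0
    value: float         # yield earned if the request completes in time


def schedule_next(queue: List[Req], now: float,
                  guaranteed: Dict[str, float],
                  used: Dict[str, float]) -> Tuple[Optional[Req], List[Req]]:
    """Pick the next request to run; mutates `queue` and returns (pick, dropped)."""
    # Step 1: drop requests that cannot finish before their deadline.
    live: List[Req] = []
    dropped: List[Req] = []
    for r in queue:
        if now + r.service_time > r.deadline:
            dropped.append(r)
        else:
            live.append(r)
    queue[:] = live
    if not queue:
        return None, dropped

    # Step 2: find classes whose recent resource share is below their guaranteed share.
    total = sum(used.values()) or 1.0

    def deficit(c: str) -> float:
        return guaranteed.get(c, 0.0) - used.get(c, 0.0) / total

    under = [c for c in sorted({r.cls for r in queue}) if deficit(c) > 0]

    if under:
        # Step 3: serve the most under-allocated class, first come first served.
        target = max(under, key=deficit)
        pick = min((r for r in queue if r.cls == target), key=lambda r: r.arrival)
    else:
        # Step 4: greedy stand-in for "high aggregate yield": best yield per second.
        pick = max(queue, key=lambda r: r.value / r.service_time)
    queue[:] = [r for r in queue if r is not pick]
    return pick, dropped


q = [Req("Gold", 0.0, 10.0, 1.0, 4.0),
     Req("Bronze", 0.5, 10.0, 1.0, 1.0),
     Req("Silver", 0.0, 1.5, 1.0, 2.0)]
pick, dropped = schedule_next(q, now=1.0, guaranteed={"Bronze": 0.2},
                              used={"Gold": 9.0, "Bronze": 1.0})
print(pick.cls, [r.cls for r in dropped])  # Bronze ['Silver']
```

Checking the guarantee before the yield-driven choice is what lets a low-priority class keep its proportional share even when higher-yield requests are waiting.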

  14. Scheduling for High Aggregate Yield
• Offline optimal scheduling is NP-complete.
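To see why heuristics are needed, consider a toy single-processor model (an assumption for illustration only): all requests are queued at time 0, and each has a service time, a deadline, and a constant yield earned only if it finishes by its deadline. Exhaustive search over all n! orders is exact but exponential, while a greedy "highest yield per unit time first" rule is fast but can be far from optimal. All names and numbers are hypothetical.

```python
from itertools import permutations
from typing import List, Sequence, Tuple

Job = Tuple[float, float, float]  # (service_time, deadline, value), hypothetical


def schedule_yield(order: Sequence[int], jobs: List[Job]) -> float:
    """Total yield when jobs run back to back on one processor in `order`.
    A job earns its value only if it finishes by its deadline; in this toy
    model late jobs still consume processor time."""
    t, total = 0.0, 0.0
    for j in order:
        service, deadline, value = jobs[j]
        t += service
        if t <= deadline:
            total += value
    return total


def optimal_offline(jobs: List[Job]) -> Tuple[int, ...]:
    """Exact, but examines all n! orders."""
    return max(permutations(range(len(jobs))), key=lambda o: schedule_yield(o, jobs))


def greedy_by_density(jobs: List[Job]) -> List[int]:
    """Heuristic: highest value per unit of service time first."""
    return sorted(range(len(jobs)), key=lambda j: jobs[j][2] / jobs[j][0], reverse=True)


jobs = [(2.0, 2.0, 3.0), (1.0, 3.0, 2.0)]
print(schedule_yield(optimal_offline(jobs), jobs))    # 5.0
print(schedule_yield(greedy_by_density(jobs), jobs))  # 2.0
```

Here the greedy rule runs the denser short job first, which makes the long job miss its deadline; the exhaustive search finds the order that meets both.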

  15. Evaluation Settings
• Evaluation platform:
  • A cluster of Linux servers connected by switched Ethernet.
• Workload I (trace-driven):
  • Document search on a 2.5 GB memory-mapped search index.
  • Based on 1.5M search queries selected from a one-week access trace of the Ask Jeeves search engine in January 2002.
  • “Service yield”-based priority order: Gold > Silver > Bronze.
• Workload II:
  • CPU-spinning micro-benchmark.
  • Poisson process arrivals; exponentially distributed service processing time.
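Workload II's arrival process can be reproduced with exponential inter-arrival gaps (which is exactly what makes arrivals a Poisson process) and exponentially distributed service times. The function name, parameters, and seed are hypothetical; the offered demand on one processor is roughly `rate * mean_service`, so the example targets about 100% demand.

```python
import random
from typing import List, Tuple


def poisson_workload(rate: float, mean_service: float, duration: float,
                     seed: int = 0) -> List[Tuple[float, float]]:
    """Return (arrival_time, service_time) pairs over [0, duration).

    Inter-arrival gaps are exponential with mean 1/rate, so arrivals form a
    Poisson process; service times are exponential with mean `mean_service`."""
    rng = random.Random(seed)
    jobs: List[Tuple[float, float]] = []
    t = 0.0
    while True:
        t += rng.expovariate(rate)
        if t >= duration:
            return jobs
        jobs.append((t, rng.expovariate(1.0 / mean_service)))


# Hypothetical parameters: 100 requests/s, 10 ms mean CPU time -> about 100% demand.
jobs = poisson_workload(rate=100.0, mean_service=0.010, duration=60.0)
offered = sum(s for _, s in jobs) / 60.0
print(len(jobs), round(offered, 2))
```

Scaling `rate` while holding `mean_service` fixed sweeps the arrival demand across the underload and overload ranges.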

  16. Evaluation on Scheduling Policies (16 nodes, aggregate)
[Figure: (A) Underload (arrival demand 0% to 100%) and (B) Overload (arrival demand 100% to 200%); performance metrics are aggregate yield (normalized) and loss percent for the EDF, YID, Greedy, and Adaptive policies.]
• EDF and YID perform better than Greedy when the system is underloaded; Greedy performs better during overload.
• Adaptive dynamically switches between YID and Greedy to achieve good performance in both situations.
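The slide does not say how Adaptive decides when to switch. One plausible sketch, entirely an assumption: track an exponentially weighted moving average (EWMA) of arriving work relative to capacity, use YID while the estimate stays below 1 and Greedy above it, with a small hysteresis band to avoid flapping. The class name, smoothing weight, and thresholds are hypothetical, and YID and Greedy are only named here, not implemented.

```python
class AdaptivePolicySelector:
    """Hypothetical switch between YID (underload) and Greedy (overload)."""

    def __init__(self, alpha: float = 0.3, high: float = 1.05, low: float = 0.95):
        self.alpha = alpha  # EWMA weight given to the newest load sample
        self.high = high    # switch to Greedy above this smoothed load
        self.low = low      # switch back to YID below this smoothed load
        self.load = 0.0     # smoothed ratio of arriving work to capacity
        self.policy = "YID"

    def observe(self, arrived_work: float, capacity: float) -> str:
        """Feed the work that arrived in the last interval; return the policy to use."""
        sample = arrived_work / capacity
        self.load = self.alpha * sample + (1.0 - self.alpha) * self.load
        if self.policy == "YID" and self.load > self.high:
            self.policy = "Greedy"
        elif self.policy == "Greedy" and self.load < self.low:
            self.policy = "YID"
        return self.policy


selector = AdaptivePolicySelector()
for demand in [0.6] * 5 + [1.5] * 5 + [0.5] * 10:
    print(demand, selector.observe(arrived_work=demand, capacity=1.0))
```

The hysteresis band means a demand level hovering right at capacity does not cause the node to alternate between policies on every interval.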

  17. Service Differentiation during a Demand Spike and a Node Failure (8 nodes)
[Figure: CPU demand and acquisition for the Gold, Silver, and Bronze classes, as a percentage of total system resources (0% to 100%), over a timeline of 0 to 300 seconds.]
• “Service yield”-based priority order: Gold > Silver > Bronze.
• 20% proportional resource guarantee for the low-priority Bronze class.
• Demand spike for the Silver class between 50 s and 150 s.
• One node fails at 200 s and recovers at 250 s.

  18. Performance Scalability
[Figure: aggregate yield (normalized, 0 to 20) vs. number of service nodes (0 to 20) at demand levels of 75%, 125%, and 200%; panels <A> Differentiated Search and <B> Micro-benchmark.]

  19. Related Work
• Software infrastructure for cluster-based Internet services: TACC [Fox1997], MultiSpace [Gribble1999], Porcupine [Saito1999], Ninja [von Behren2002].
• QoS and service differentiation in computer networks: Weighted Fair Queuing [Demers1990; Parekh1993], Leaky Bucket, LIRA [Stoica1998], [Dovrolis1999].
• QoS or real-time scheduling at the single-host level: [Huang1989], [Haritsa1993], [Waldspurger1994], [Mogul1996], LRP [Druschel1996], [Jones1997], Eclipse [Bruno1998], Resource Containers [Banga1999], [Steere1999].
• Resource management and QoS for Web servers: [Almeida1998], [Pandey1998], [Abdelzaher1999], [Bhatti1999], [Chandra2000], [Li2000], [Voigt2001].
• Resource management for clustered servers: LARD [Pai1998], Cluster Reserves [Aron2000], [Sullivan2000], DDSD [Zhu2001], [Chase2001].

  20. Conclusion
• Multiple resource management objectives:
  • quality-aware resource utilization efficiency;
  • service differentiation.
• Two-level resource management mechanism:
  • non-partitioning at the cluster level;
  • adaptive scheduling at the node level.
• Trace-driven evaluations.
• Future work: other types of service quality.
