
Locality-Aware Request Distribution in Cluster-based Network Servers

Presented by: Kevin Boos. Authors: Vivek S. Pai, Mohit Aron, et al., Rice University. ASPLOS 1998. Figures adapted from original presentation.



  1. Locality-Aware Request Distribution in Cluster-based Network Servers • Presented by: Kevin Boos • Authors: Vivek S. Pai, Mohit Aron, et al. • Rice University • ASPLOS 1998 • Figures adapted from original presentation

  2. Time Warp to 1998 • Rapid Internet growth • Bandwidth limitations • “Cheap” PCs and “fast” LANs • Need for increased throughput

  3. Clustered Servers [Figure: Clients, Front-End Node, LAN (Switch), Back-End Nodes]

  4. Weighted Round Robin (WRR)

  5. Pure Locality-Based Distribution

  6. Motivation for Change • Weighted Round Robin • Disregards content on back-end nodes • Many cache misses • Limited by disk performance • Pure Locality-Based Distribution • Disregards current load on back-end nodes • Uneven load distribution • Inefficient use of resources
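The two baseline strategies contrasted above can be sketched as follows. This is a minimal illustration, not the authors' code: the class names, the static integer weights, and the use of a CRC32 hash of the target name to pick a node are all assumptions made for the example.

```python
import itertools
import zlib


class WeightedRoundRobin:
    """Visit nodes in proportion to their weights; the requested target is ignored."""

    def __init__(self, weights):
        # weights: dict mapping node name -> positive integer weight (hypothetical values)
        schedule = [node for node, w in weights.items() for _ in range(w)]
        self._cycle = itertools.cycle(schedule)

    def pick(self, target):
        return next(self._cycle)


class LocalityBased:
    """Always send a given target to the same node; current load is ignored."""

    def __init__(self, nodes):
        self.nodes = list(nodes)

    def pick(self, target):
        # Assumption: a stable hash of the target name selects the node.
        return self.nodes[zlib.crc32(target.encode()) % len(self.nodes)]


wrr = WeightedRoundRobin({"b1": 2, "b2": 1})
lb = LocalityBased(["b1", "b2"])
wrr_picks = [wrr.pick("/index.html") for _ in range(6)]
lb_picks = [lb.pick("/index.html") for _ in range(6)]
```

Under the round-robin sketch the same target ends up cached on every node, while under the locality-based sketch a popular target can overload a single node.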

  7. LARD Concepts • Locality-Aware Request Distribution • Goal: improve performance • Higher throughput • Higher cache hit rates • Reduced disk access • Even load distribution + content-based distribution • The best of both algorithms

  8. Outline • Basic LARD Algorithm • Improvements to LARD • TCP Handoff Protocol • Simulation and Results • Prototype Implementation and Testing

  9. Outline • Basic LARD Algorithm • Improvements to LARD • TCP Handoff Protocol • Simulation and Results • Prototype Implementation and Testing

  10. Basic LARD Algorithm • Front-end maps target content to back-end nodes • 1-to-1 mapping • First request for each target is assigned to the least-loaded back-end node • Subsequent requests are distributed to the same back-end node based on target content mapping • Unless overloaded… • Re-assigns target content to a new back-end node

  11. Flow of Basic LARD [Figure: Client, Front-End, and requests for targets A and a]

  12. Determining Load in Basic LARD • Ask the server? • Introduces unnecessary communication • Current load = number of open connections • Tracked in the front-end node • Use thresholds to determine when to re-balance • Low, High, and Limit • Re-balance when (load > Tlimit) or (load > Thigh and there is a “free” node with load < Tlow)
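The mapping and re-balancing rule from the last two slides can be sketched as below. This is a hedged illustration under stated assumptions: the class and method names are invented for the example, the threshold values in the usage lines are hypothetical, and load is simply the count of open connections that the front-end tracks itself.

```python
class BasicLARD:
    """Front-end dispatcher: one back-end node per target, re-assigned on overload."""

    def __init__(self, nodes, t_low, t_high, t_limit):
        self.load = {n: 0 for n in nodes}  # open connections per back-end node
        self.server = {}                   # target -> back-end node (1-to-1 mapping)
        self.t_low, self.t_high, self.t_limit = t_low, t_high, t_limit

    def _least_loaded(self):
        return min(self.load, key=self.load.get)

    def _should_rebalance(self, node):
        # Rule from the slide: load > Tlimit, or load > Thigh while a "free"
        # node with load < Tlow exists.
        load = self.load[node]
        free_node_exists = any(l < self.t_low for l in self.load.values())
        return load > self.t_limit or (load > self.t_high and free_node_exists)

    def dispatch(self, target):
        node = self.server.get(target)
        if node is None or self._should_rebalance(node):
            node = self._least_loaded()
            self.server[target] = node
        self.load[node] += 1  # a connection opens
        return node

    def finish(self, node):
        self.load[node] -= 1  # a connection closes


# Hypothetical thresholds, chosen small so the re-assignment is easy to follow.
lard = BasicLARD(["b1", "b2"], t_low=2, t_high=4, t_limit=8)
picks = [lard.dispatch("A") for _ in range(6)]
```

In this run the first five requests for target A stay on one node; the sixth finds that node above the high threshold while another node is below the low threshold, so A is re-assigned.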

  13. Outline • Basic LARD Algorithm • Improvements to LARD • TCP Handoff Protocol • Simulation and Results • Prototype Implementation and Testing

  14. LARD Needs Improvement • Only one back-end node per target content • Working set is a single node • Front-end must limit total connections • Still need to increase throughput • One node per content type is unrealistic • …add more back-end nodes?

  15. LARD/R • LARD with Replication • Maps target content to a set of back-end nodes • Working set is several nodes with similar cache content • Sends new requests to least-loaded node in set • Moves nodes to/from sets based on load imbalance • Idle nodes in a low-load set are moved to higher-load set
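A rough sketch of the replication idea follows. It is an assumption-laden illustration rather than the authors' algorithm: the growth rule reuses the Low/High/Limit overload test from basic LARD, the shrink rule (drop the most-loaded node once a set has been unchanged for `shrink_after` time units) is one plausible reading of "moves nodes to/from sets", and all names and threshold values are hypothetical.

```python
class LARDR:
    """Front-end dispatcher that maps each target to a set of back-end nodes."""

    def __init__(self, nodes, t_low, t_high, t_limit, shrink_after):
        self.load = {n: 0 for n in nodes}  # open connections per back-end node
        self.server_set = {}               # target -> list of nodes serving it
        self.last_change = {}              # target -> time its set last changed
        self.t_low, self.t_high, self.t_limit = t_low, t_high, t_limit
        self.shrink_after = shrink_after   # assumed idle interval before shrinking

    def _overloaded(self, node):
        load = self.load[node]
        free_node_exists = any(l < self.t_low for l in self.load.values())
        return load > self.t_limit or (load > self.t_high and free_node_exists)

    def dispatch(self, target, now):
        nodes = self.server_set.setdefault(target, [])
        if not nodes:
            # First request for this target: start with the least-loaded node.
            nodes.append(min(self.load, key=self.load.get))
            self.last_change[target] = now
        else:
            # Shrink a set that has been stable for a while.
            if len(nodes) > 1 and now - self.last_change[target] > self.shrink_after:
                nodes.remove(max(nodes, key=self.load.get))
                self.last_change[target] = now
            # Grow the set when even its least-loaded member is overloaded.
            best = min(nodes, key=self.load.get)
            if self._overloaded(best):
                outside = [n for n in self.load if n not in nodes]
                if outside:
                    nodes.append(min(outside, key=self.load.get))
                    self.last_change[target] = now
        node = min(nodes, key=self.load.get)
        self.load[node] += 1
        return node


lardr = LARDR(["b1", "b2", "b3"], t_low=2, t_high=4, t_limit=8, shrink_after=10)
early = [lardr.dispatch("A", now=0) for _ in range(7)]
set_after_growth = list(lardr.server_set["A"])
late = lardr.dispatch("A", now=20)
set_after_shrink = list(lardr.server_set["A"])
```

Passing `now` explicitly keeps the sketch deterministic; a real front-end would presumably read a clock instead.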

  16. Flow of LARD/R [Figure: Client, Front-End, and requests for targets A and a]

  17. LARD Outline • Basic LARD Algorithm • Improvements to LARD • TCP Handoff Protocol • Simulation and Results • Prototype Implementation and Testing

  18. Determining Content Type • How do we determine content in the front-end? • Front-end must see network traffic • Standard TCP Assumptions • Requests are small and light • Responses are big and heavy • How do we forward requests?

  19. Potential TCP Solutions • Simple TCP Proxy • Everything must flow through front-end node • Can inspect all incoming content • Cannot respond directly from back-end to client • But front-end can also inspect all outgoing content • Better for persistent connections

  20. TCP Connection Handoff • Front-end connects to client • Inspects content • Forwards request to back-end node • Returned directly back to client from back-end node
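A back-of-the-envelope comparison shows why handoff matters when requests are small and responses are large. The function name and the request and response sizes below are hypothetical, and the sketch ignores TCP headers, acknowledgements, and the handoff messages themselves.

```python
def front_end_bytes(requests, request_size, response_size, handoff):
    """Payload bytes that pass through the front-end node.

    With a simple proxy, both requests and responses cross the front-end.
    With connection handoff, responses go from the back-end straight to the client.
    """
    per_request = request_size if handoff else request_size + response_size
    return requests * per_request


# Hypothetical workload: 1,000 requests of 300 bytes, each answered with 10,000 bytes.
proxy_bytes = front_end_bytes(1000, 300, 10_000, handoff=False)
handoff_bytes = front_end_bytes(1000, 300, 10_000, handoff=True)
```

Under these assumed sizes the proxy front-end carries 10,300,000 payload bytes, against 300,000 with handoff.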

  21. LARD Outline • Basic LARD Algorithm • Improvements to LARD • TCP Handoff Protocol • Simulation and Results • Prototype Implementation and Testing

  22. Evaluation Goals • Throughput • Requests/second served by entire cluster • Hit rate • (Requests that hit memory cache) / (total requests) • Underutilization time • Time that a node’s load is ≤ 40% of Tlow
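The three metrics can be written down directly. This is a sketch under assumptions: the function names are invented, and underutilization is estimated from equally spaced load samples, which is only one possible way to measure "time that a node's load is ≤ 40% of Tlow".

```python
def throughput(requests_served, seconds):
    """Requests per second served by the entire cluster."""
    return requests_served / seconds


def hit_rate(cache_hits, total_requests):
    """(Requests that hit the memory cache) / (total requests)."""
    return cache_hits / total_requests


def underutilized_fraction(load_samples, t_low, fraction=0.4):
    """Share of equally spaced load samples at or below fraction * t_low."""
    cutoff = fraction * t_low
    return sum(1 for load in load_samples if load <= cutoff) / len(load_samples)


# Hypothetical numbers, for illustration only.
example_throughput = throughput(3000, 2)
example_hit_rate = hit_rate(75, 100)
example_idle = underutilized_fraction([0, 5, 12, 30], t_low=25)
```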

  23. Simulation Model • 300MHz Pentium II • 32MB Memory (cache) • 100Mbps Ethernet • Traces from web servers at Rice and IBM

  24. Simulation Results – Prior Work • Weighted Round Robin • Lowest throughput • Highest cache miss ratio • But lowest idle time • Pure Locality-Based • An increase in nodes → decrease in cache miss ratio • But idle time increases (unbalanced load) • Only minor improvement over WRR

  25. Simulation Results – LARD & LARD/R • Throughput ~4x better (8 nodes) • WRR would need nodes with a 10x larger cache size • CPU bound after 8 nodes • Cache miss rate decreases • Only 1% idle time on average

  26. Simulation Results – Throughput

  27. Simulation Results – Cache Misses

  28. Simulation Results – Idle Time

  29. What Affects Performance? • WRR is disk-bound, LARD/R is CPU bound • Increasing CPU speed improves LARD/R, not WRR • Adding more disks improves WRR, not LARD/R • LARD/R shows no improvement if a node has > 2 disks • WRR is not scalable

  30. LARD Outline • Basic LARD Algorithm • Improvements to LARD • TCP Handoff Protocol • Simulation and Results • Prototype Implementation and Testing

  31. Prototype Implementation • One front-end PC • 300MHz Pentium II, 128MB RAM • 6 back-end PCs • 7 client PCs • 166MHz Pentium Pro, 64MB RAM • 100Mbps Ethernet, 24-port switch

  32. Prototype Testing Results

  33. Evaluation Shortcomings • What influences the results more? • LARD/R protocol? • TCP handoff protocol?

  34. Conclusion • LARD and LARD/R significantly better than WRR • Higher throughput • Better CPU utilization • More frequent cache hits • Reduced disk access • Benefits of Locality-Based and Load-Balanced • Scalable at low cost
