
Locality-Aware Request Distribution in Cluster-based Network Servers

Presented by: Kevin Boos

Authors: Vivek S. Pai, Mohit Aron, et al.
Rice University, ASPLOS 1998
*** Figures adapted from original presentation ***

Time Warp to 1998
  • Rapid Internet growth
  • Bandwidth limitations
  • “Cheap” PCs and “fast” LANs
  • Need for increased throughput
Clustered Servers

[Figure: clients connect to a single front-end node, which distributes their requests over a LAN switch to several back-end nodes]

Motivation for Change
  • Weighted Round Robin
    • Disregards content on back-end nodes
    • Many cache misses
    • Limited by disk performance
  • Pure Locality-Based Distribution
    • Disregards current load on back-end nodes
    • Uneven load distribution
    • Inefficient use of resources
LARD Concepts
  • Locality-Aware Request Distribution
  • Goal: improve performance
    • Higher throughput
    • Higher cache hit rates
    • Reduced disk access
  • Even load distribution + content-based distribution
    • The best of both algorithms
Outline
  • Basic LARD Algorithm
  • Improvements to LARD
  • TCP Handoff Protocol
  • Simulation and Results
  • Prototype Implementation and Testing
Basic LARD Algorithm
  • Front-end maps target content to back-end nodes
    • 1-to-1 mapping
  • First request for each target is assigned to the least-loaded back-end node
  • Subsequent requests are distributed to the same back-end node based on target content mapping
    • Unless overloaded…
    • Re-assigns target content to a new back-end node
Flow of Basic LARD

[Figure: the front-end forwards every request for target A to the back-end node already serving A, so repeated requests hit that node's cache]

Determining Load in Basic LARD
  • Ask the server?
    • Introduces unnecessary communication
  • Current load = number of open connections
    • Tracked in the front-end node
  • Use thresholds to determine when to re-balance
    • Low, High, and Limit
    • Re-balance when (load > Tlimit) or (load > Thigh and there is a “free” node with load < Tlow)
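The dispatch rule and thresholds above can be sketched in Python. This is a minimal illustration, not the paper's implementation: the node names and threshold values are assumptions, and load is tracked exactly as the slide says, as the number of open connections per back-end.

```python
# Illustrative thresholds (open-connection counts); the paper tunes these.
T_LOW, T_HIGH, T_LIMIT = 25, 65, 130

class Frontend:
    """Basic LARD: 1-to-1 mapping from target content to a back-end node."""

    def __init__(self, nodes):
        self.load = {n: 0 for n in nodes}   # open connections per back-end
        self.server = {}                    # target content -> back-end node

    def least_loaded(self):
        return min(self.load, key=self.load.get)

    def dispatch(self, target):
        node = self.server.get(target)
        if node is None:
            # First request for this target: assign the least-loaded node.
            node = self.least_loaded()
        elif (self.load[node] > T_LIMIT or
              (self.load[node] > T_HIGH and
               min(self.load.values()) < T_LOW)):
            # Re-balance: reassign the target to the least-loaded node.
            node = self.least_loaded()
        self.server[target] = node
        self.load[node] += 1                # connection opened
        return node

    def done(self, node):
        self.load[node] -= 1                # connection closed

fe = Frontend(["be1", "be2", "be3"])
first = fe.dispatch("/index.html")
again = fe.dispatch("/index.html")
assert first == again                       # locality: same node both times
```

Under light load the mapping is sticky, which is what produces the cache-hit benefit; the two threshold tests are the only points where locality is sacrificed for balance.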
Outline
  • Basic LARD Algorithm
  • Improvements to LARD
  • TCP Handoff Protocol
  • Simulation and Results
  • Prototype Implementation and Testing
LARD Needs Improvement
  • Only one back-end node per target content
    • A target's working set must fit in a single node's cache
    • Front-end must limit total connections
  • Still need to increase throughput
    • One node per content type is unrealistic
    • …add more back-end nodes?
LARD/R
  • LARD with Replication
  • Maps target content to a set of back-end nodes
    • Working set is several nodes with similar cache content
  • Sends new requests to least-loaded node in set
  • Moves nodes to/from sets based on load imbalance
    • Idle nodes in a low-load set are moved to higher-load set
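The replicated mapping can be sketched the same way. This is a simplified illustration under assumed thresholds: the rule used here for growing a target's server set (grow when even the set's least-loaded member is overloaded) stands in for the paper's fuller imbalance test, and set shrinking is omitted.

```python
T_HIGH = 65  # illustrative open-connection threshold

class FrontendR:
    """LARD/R: each target content maps to a *set* of back-end nodes."""

    def __init__(self, nodes):
        self.load = {n: 0 for n in nodes}   # open connections per back-end
        self.server_set = {}                # target -> set of back-end nodes

    def dispatch(self, target):
        members = self.server_set.setdefault(target, set())
        if not members:
            # First request: seed the set with the least-loaded node.
            members.add(min(self.load, key=self.load.get))
        # Send the request to the least-loaded member of the set.
        node = min(members, key=self.load.get)
        outside = set(self.load) - members
        if outside and self.load[node] > T_HIGH:
            # Whole set is busy: grow it with the least-loaded outside node.
            members.add(min(outside, key=self.load.get))
            node = min(members, key=self.load.get)
        self.load[node] += 1
        return node
```

Because every member of a target's set has served that target before, the set acts as a small group of nodes with similar cache contents, matching the bullet above.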
Flow of LARD/R

[Figure: with replication, requests for target A are spread across a set of back-end nodes that each cache A; each new request goes to the least-loaded member of the set]

LARD Outline
  • Basic LARD Algorithm
  • Improvements to LARD
  • TCP Handoff Protocol
  • Simulation and Results
  • Prototype Implementation and Testing
Determining Content Type
  • How do we determine content in the front-end?
    • Front-end must see network traffic
  • Standard TCP Assumptions
    • Requests are small and light
    • Responses are big and heavy
  • How do we forward requests?
Potential TCP Solutions
  • Simple TCP Proxy
    • Everything must flow through the front-end node
      • Can inspect all incoming content
    • Back-end cannot respond directly to the client
      • But the front-end can also inspect all outgoing content
    • Better suited to persistent connections
TCP Connection Handoff
  • Front-end accepts the client's TCP connection
  • Inspects the request content
  • Hands the connection off to a back-end node
  • Response is returned directly to the client by the back-end node
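The handoff steps hinge on the front-end reading only the start of each request to learn its target. A user-level sketch of just that inspection step (the real handoff protocol then transfers the established TCP state into the chosen back-end's kernel, so the back-end replies to the client directly):

```python
def request_target(raw: bytes) -> str:
    """Extract the target path from an HTTP request line,
    e.g. b'GET /index.html HTTP/1.0' -> '/index.html'."""
    request_line = raw.split(b"\r\n", 1)[0].decode("ascii")
    method, target, _version = request_line.split(" ")
    return target

raw = b"GET /cgi/figures.ps HTTP/1.0\r\nHost: www.example.org\r\n\r\n"
assert request_target(raw) == "/cgi/figures.ps"
```

Only these first few bytes pass through the front-end, which is why the "requests are small, responses are big" assumption above makes handoff cheaper than proxying.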
LARD Outline
  • Basic LARD Algorithm
  • Improvements to LARD
  • TCP Handoff Protocol
  • Simulation and Results
  • Prototype Implementation and Testing
Evaluation Goals
  • Throughput
    • Requests/second served by entire cluster
  • Hit rate
    • (Requests that hit memory cache) / (total requests)
  • Underutilization time
    • Time that a node’s load is ≤ 40% of Tlow
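The three metrics above can be computed from a request trace. The function and field names below are illustrative assumptions; underutilization is shown as a sum over periodic load samples.

```python
def throughput(n_requests, seconds):
    """Requests/second served by the entire cluster."""
    return n_requests / seconds

def hit_rate(cache_hits, total_requests):
    """(Requests that hit the memory cache) / (total requests)."""
    return cache_hits / total_requests

def underutilization(load_samples, t_low, interval):
    """Total time a node's load stayed at or below 40% of T_low,
    given load samples taken every `interval` seconds."""
    return sum(interval for load in load_samples if load <= 0.4 * t_low)

assert hit_rate(80, 100) == 0.8
# With T_low = 25, the cutoff is 10 connections: samples 2, 3, 1 qualify.
assert underutilization([2, 3, 30, 1], t_low=25, interval=1.0) == 3.0
```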
Simulation Model
  • 300MHz Pentium II
  • 32MB Memory (cache)
  • 100Mbps Ethernet
  • Traces from web servers at Rice and IBM
Simulation Results – Prior Work
  • Weighted Round Robin
    • Lowest throughput
    • Highest cache miss ratio
    • But lowest idle time
  • Pure Locality-Based
    • An increase in nodes → a decrease in cache miss ratio
    • But idle time increases (unbalanced load)
    • Only minor improvement over WRR
Simulation Results – LARD & LARD/R
  • Throughput ~4x better (8 nodes)
    • WRR would need nodes with a 10x larger cache size
  • CPU bound after 8 nodes
  • Cache miss rate decreases
  • Only 1% idle time on average
What Affects Performance?
  • WRR is disk-bound; LARD/R is CPU-bound
  • Increasing CPU speed improves LARD/R, not WRR
  • Adding more disks improves WRR, not LARD/R
    • LARD/R shows no improvement if a node has > 2 disks
  • WRR is not scalable
LARD Outline
  • Basic LARD Algorithm
  • Improvements to LARD
  • TCP Handoff Protocol
  • Simulation and Results
  • Prototype Implementation and Testing
Prototype Implementation
  • One front-end PC
    • 300MHz Pentium II, 128MB RAM
  • 6 back-end PCs
  • 7 client PCs
    • 166MHz Pentium Pro, 64MB RAM
  • 100Mb Ethernet, 24-port switch
Evaluation Shortcomings
  • Which factor influences the results more?
    • The LARD/R distribution strategy?
    • The TCP handoff protocol?
Conclusion
  • LARD and LARD/R significantly better than WRR
    • Higher throughput
    • Better CPU utilization
    • More frequent cache hits
    • Reduced disk access
  • Combines the benefits of locality-based and load-balanced distribution
  • Scalable at low cost