Request Distribution in Server Clusters

Web site infrastructure
Clustered, multi-tiered architectures
• Example: e-Shopping
  • Open the portal home page
  • Login
  • View items, prices, availability
  • Select an item type
  • Specify the number of items
  • Confirm by entering the credit card number
  • Logout

WS vs. AS
• Web servers
  • Do well-defined and quantifiable local work
  • e.g., processing HTTP headers, serving static content
• Application servers
  • Run multi-layer programs
  • e.g., scripts involving calls to backends

ReDAL
In clustered, multi-tiered architectures, there are two request distribution points:
• Web Server Request Distribution (WSRD): the Web switch distributes requests to the web server cluster
• Application Server Request Distribution (ASRD): the web server distributes requests requiring business logic to the application server cluster
ReDAL: Request Distribution for the Application Layer
• An approach for efficient distribution of requests across a cluster of application servers

Web Server Request Distribution
Many policies: Random, Round Robin (RR), Weighted Round Robin (WRR), Least Connections
• Several of these policies are commercially implemented (e.g., Cisco's LocalDirector and F5's BIG-IP)
Two improvements:
• Session Affinity
  • Consecutive requests in a given user session are served faster if they are handled by the same server
• Locality-Aware Request Distribution (LARD)
  • Attempts to exploit locality of working sets on different servers
  • Not applicable to dynamically generated content
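The policies named above can be sketched in a few lines each. This is a minimal illustration, not how any commercial Web switch implements them; the server names, weights, and connection counts are made up for the example.

```python
import itertools
import random


def round_robin(servers):
    """Yield servers in a fixed cyclic order."""
    return itertools.cycle(servers)


def weighted_round_robin(weights):
    """Yield servers cyclically, each repeated according to its integer weight."""
    expanded = [name for name, weight in weights.items() for _ in range(weight)]
    return itertools.cycle(expanded)


def least_connections(active):
    """Pick the server with the fewest active connections (first one on ties)."""
    return min(active, key=active.get)


def pick_random(servers, rng=random):
    """Pick any server with equal probability."""
    return rng.choice(servers)


# Hypothetical web server names, used only for illustration.
servers = ["ws1", "ws2", "ws3"]

rr = round_robin(servers)
rr_picks = [next(rr) for _ in range(5)]

wrr = weighted_round_robin({"ws1": 2, "ws2": 1})
wrr_picks = [next(wrr) for _ in range(6)]

lc_pick = least_connections({"ws1": 4, "ws2": 1, "ws3": 3})
```

None of these policies looks at what a request will cost once it reaches the server, which is why they fit web servers doing well-defined local work better than application servers.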

Application Server Request Distribution
Dynamic scheduling techniques usually presuppose some knowledge of the task (e.g., duration, weight) and/or the resource (e.g., queue sizes, service times)
• In ASRD, both tasks and resources are highly dynamic
• So, ASRD techniques are adaptations of WSRD techniques
Most common technique: combination of RR and Session Affinity
• Requests starting new sessions are dispatched according to RR
• Subsequent requests in a session are routed to the server where the session's previous request was served, i.e., where the session object resides
=> frequently results in load imbalances
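A minimal sketch of the RR plus Session Affinity combination described above. The class name, session ids, and server names are hypothetical; a real dispatcher would also need session timeouts and failure handling.

```python
import itertools


class RRWithAffinity:
    """Round robin for new sessions, sticky routing for existing ones."""

    def __init__(self, servers):
        self._next_server = itertools.cycle(servers)
        self._session_home = {}  # session id -> server holding the session object

    def route(self, session_id):
        # A new session takes the next server in RR order;
        # a known session always returns to the server it started on.
        if session_id not in self._session_home:
            self._session_home[session_id] = next(self._next_server)
        return self._session_home[session_id]

    def end_session(self, session_id):
        self._session_home.pop(session_id, None)


dispatcher = RRWithAffinity(["A1", "A2"])
routes = [dispatcher.route(s) for s in ["s1", "s2", "s1", "s3", "s2"]]
```

Because the routing decision is fixed when the session starts, the dispatcher cannot react if one server ends up holding most of the long-running sessions.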

ReDAL: Motivation
Request distribution combining RR and Session Affinity
Short (S) and long (L) sessions arrive at one-minute intervals: S S L S S L S L L S
[Figure: number of active sessions versus time (minutes, 1 to 11) on application servers A1 and A2, with load imbalances marked]
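The imbalance can be reproduced with a small simulation of the arrival pattern above. The session lengths (1 minute for short, 4 minutes for long) are assumptions chosen for illustration; the slide does not state them.

```python
import itertools

ARRIVALS = "SSLSSLSLLS"      # from the slide: one arrival per minute
DURATION = {"S": 1, "L": 4}  # assumed session lengths in minutes
SERVERS = ["A1", "A2"]


def simulate(arrivals, servers, duration, horizon):
    """Return {server: [active sessions at minute 0..horizon-1]}."""
    active = {name: [0] * horizon for name in servers}
    rr = itertools.cycle(servers)
    for start, kind in enumerate(arrivals):
        home = next(rr)  # affinity: the session stays here until it ends
        end = min(start + duration[kind], horizon)
        for minute in range(start, end):
            active[home][minute] += 1
    return active


active = simulate(ARRIVALS, SERVERS, DURATION, horizon=12)
for name in SERVERS:
    print(name, active[name])
```

Under these assumptions both servers receive the same number of sessions, yet at individual minutes one server carries two active sessions while the other carries none.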

ReDAL Objective
[Figure: throughput (transactions per second) versus number of users, rising through a lightly loaded region to a peak at the peak load, then falling in the heavily loaded region]
Distribute requests across a cluster of application servers such that:
• Load on each application server is kept below a certain threshold
• Session affinity is preserved where possible

ReDAL Components
• Application Analyzer
  • Characterizes the behavior of the application server
  • Runs in an offline phase to record peak throughput/load values, which are used at runtime by the Request Dispatcher
• Request Dispatcher
  • Routes requests to a set of application servers
  • Monitors expected and actual load on each application server
  • Routes a given request to the affined server if it is lightly loaded, else to the application server having the lowest expected load
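One plausible reading of the Application Analyzer's offline step is to sweep the load, record throughput, and keep the load level at which throughput peaks. The sketch below assumes that reading; the function names are hypothetical and the measurements are made-up numbers, not results from the study.

```python
def find_peak(samples):
    """samples: list of (concurrent_users, transactions_per_sec) pairs.

    Returns the pair with the highest throughput (first one on ties).
    """
    return max(samples, key=lambda pair: pair[1])


def is_lightly_loaded(current_load, peak_load):
    """A server below its recorded peak load is treated as lightly loaded."""
    return current_load < peak_load


# Made-up offline measurements, for illustration only.
samples = [(10, 40.0), (20, 75.0), (30, 90.0), (40, 82.0), (50, 60.0)]
peak_load, peak_throughput = find_peak(samples)
```

The dispatcher would then compare each server's current load against `peak_load` at runtime.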

ReDAL Algorithm
Based on a key observation: think time (view time) on a page is predictable from past behavior
Reference: Jeffrey Heer and Ed H. Chi (Xerox Palo Alto Research Center), “Mining the Structure of User Activity using Cluster Stability”, Proceedings of the Web Analytics Workshop, SIAM Conference on Data Mining (2002)

ReDAL: Capacity Reservation
• Consider a finite lookahead period partitioned into discrete time periods, or slices
[Figure: timeline divided into Slice 0, Slice 1, and Slice 2; request r1 arrives at the current time t1 in Slice 0, and after the think time request r2 is expected at t2 in Slice 2]
Load metrics:
• Actual Load = number of requests in a time slice
• Expected Load = number of requests expected in a time slice based on think time, i.e., the time between subsequent requests in a session
  • e.g., capacity is reserved for request r2 on this application server during time slice 2
• Modified Load = Actual Load + α × Expected Load (0 ≤ α ≤ 1)
  • α accounts for prediction errors
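The load metrics above can be sketched as follows. This is an illustration under stated assumptions: the weight on the expected load is written `alpha`, slices are numbered from the current time, and the class and function names are hypothetical.

```python
def slice_index(offset_seconds, slice_seconds):
    """Index of the lookahead slice containing a time offset from now."""
    return int(offset_seconds // slice_seconds)


def modified_load(actual, expected, alpha):
    """Modified Load = Actual Load + alpha * Expected Load, with 0 <= alpha <= 1."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return actual + alpha * expected


class ServerLoad:
    """Per-slice load counters for one application server."""

    def __init__(self, num_slices):
        self.actual = [0] * num_slices    # requests seen per slice
        self.expected = [0] * num_slices  # reservations per slice

    def reserve(self, think_time, slice_seconds):
        """Reserve capacity for the session's next request, think_time from now."""
        idx = slice_index(think_time, slice_seconds)
        if idx < len(self.expected):  # ignore requests beyond the lookahead
            self.expected[idx] += 1
        return idx


load = ServerLoad(num_slices=3)
reserved_slice = load.reserve(think_time=25, slice_seconds=10)
```

With a 10-second slice and a 25-second think time, the next request is expected in slice 2, matching the r2 example. A smaller `alpha` discounts reservations more heavily when think-time predictions are unreliable.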

ReDAL: Algorithm Overview
Inputs: Request in a session, Think time, Time slice duration, α
Output: Assignment of request to application server A

A = NULL
A = SessionAffinity()
If A is NULL
    A = LeastLoaded()
UpdateLoadMetrics()
AdvanceTimeSlice()
Return A

SessionAffinity
    If ActualLoad() < PeakLoad()
        Return AffinedServer()

LeastLoaded
    If request is part of new session
        A = LeastLoaded(modified)
    Else
        A = LeastLoaded(actual)
    Return A
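The pseudocode above can be turned into a runnable sketch. Several details are assumptions, not taken from the slides: the modified load uses only the reservations for the current slice, every dispatched request reserves capacity for its session's next request, and the caller invokes `advance_time_slice()` when a slice boundary passes. All class and method names are hypothetical.

```python
class Server:
    def __init__(self, name, peak_load, num_slices=3):
        self.name = name
        self.peak_load = peak_load         # recorded offline
        self.actual = 0                    # requests in the current slice
        self.expected = [0] * num_slices   # reservations per lookahead slice

    def modified(self, alpha):
        # Assumption: only the current slice's reservations are counted.
        return self.actual + alpha * self.expected[0]


class Dispatcher:
    def __init__(self, servers, alpha, slice_seconds):
        self.servers = servers
        self.alpha = alpha
        self.slice_seconds = slice_seconds
        self.affinity = {}  # session id -> affined Server

    def dispatch(self, session_id, think_time):
        is_new = session_id not in self.affinity
        server = self._session_affinity(session_id)
        if server is None:
            server = self._least_loaded(is_new)
        self._update_load_metrics(server, session_id, think_time)
        return server

    def _session_affinity(self, session_id):
        # Use the affined server only while it is below its peak load.
        server = self.affinity.get(session_id)
        if server is not None and server.actual < server.peak_load:
            return server
        return None

    def _least_loaded(self, is_new):
        # New sessions compare modified load; existing ones compare actual load.
        if is_new:
            return min(self.servers, key=lambda s: s.modified(self.alpha))
        return min(self.servers, key=lambda s: s.actual)

    def _update_load_metrics(self, server, session_id, think_time):
        server.actual += 1
        self.affinity[session_id] = server
        idx = int(think_time // self.slice_seconds)
        if idx < len(server.expected):
            server.expected[idx] += 1  # reserve capacity for the next request

    def advance_time_slice(self):
        for s in self.servers:
            s.actual = 0
            s.expected = s.expected[1:] + [0]


a1, a2 = Server("A1", peak_load=2), Server("A2", peak_load=2)
dispatcher = Dispatcher([a1, a2], alpha=0.5, slice_seconds=10)
names = [dispatcher.dispatch(s, think_time=25).name for s in ["s1", "s2", "s1", "s1"]]
```

In this run the third request stays on its affined server, while the fourth is moved because that server has reached its peak load. This shows how affinity is given up only when the threshold would be exceeded.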

Consistent global view of metadata
• Changed load information is multicast by the web server (WS) request dispatcher
• Session objects are virtualized in a shared database
• The web server records the time of response in a cookie
  • Useful for estimating think times in web server clusters
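A sketch of how a response timestamp stored in a cookie could yield think-time estimates on any web server in the cluster. The cookie name `last_response`, the per-page averaging, and the class name are assumptions made for illustration; the slides do not specify the cookie format or the prediction method.

```python
from collections import defaultdict
from http.cookies import SimpleCookie

COOKIE_NAME = "last_response"  # hypothetical cookie name


def think_time_from_cookie(cookie_header, arrival_time):
    """Think time = arrival of this request - time of the previous response.

    Returns None when the cookie is absent (e.g., first request of a session).
    """
    cookie = SimpleCookie()
    cookie.load(cookie_header)
    if COOKIE_NAME not in cookie:
        return None
    return arrival_time - float(cookie[COOKIE_NAME].value)


class ThinkTimePredictor:
    """Predicts think time per page as the mean of past observations."""

    def __init__(self):
        self._total = defaultdict(float)
        self._count = defaultdict(int)

    def observe(self, page, think_time):
        self._total[page] += think_time
        self._count[page] += 1

    def predict(self, page, default=0.0):
        if self._count[page] == 0:
            return default
        return self._total[page] / self._count[page]


observed = think_time_from_cookie("last_response=1000.5", arrival_time=1012.5)
predictor = ThinkTimePredictor()
predictor.observe("/cart", 10.0)
predictor.observe("/cart", 20.0)
```

Carrying the timestamp in the cookie means the estimate does not depend on which web server handled the previous request.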

ReDAL: Evaluation
• HJ (Hwang and Jung, 2002) uses a “least-active-requests” routing policy; it is not applicable to stateful applications
• ReDAL, RR, and HJ were implemented as Apache Web Server plug-ins
• The load generator simulates a varying number of simultaneous user sessions, each session submitting a stream of requests
• Each request is chosen from a uniform distribution across the high-load and low-load transaction requests
• Load generator (LoadRunner 6), Web server (Apache), 10 application server instances (WebLogic 7.1), and session repository (Oracle 8), each running on separate hardware
• Machine configuration: single CPU (900 MHz), 1 GB RAM, 20 GB disk, running Windows 2000 Advanced Server (SP3)

ReDAL: Experimental Results
Performance Metrics:
• Average Throughput per Application Server (ATAS): average number of transactions per second that an application server in the cluster provides
• Average Response Time (ART): average response time provided by the application servers, measured from the end-user perspective
• Web Server CPU Utilization (WSCU): percentage CPU utilization on the web server, measured by OS utilities
• Peak % CPU on the Application Servers: peak percentage CPU usage among a cluster of application servers, measured by OS utilities
• Scaling with Application Servers: percentage CPU usage on the web server for various numbers of application servers in the application server cluster
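The first two metrics reduce to simple averages. The sketch below shows one way to compute them from raw counts; the function names and the sample values are made up for illustration and are not measurements from the evaluation.

```python
from statistics import mean


def atas(transactions_per_server, duration_seconds):
    """Average Throughput per Application Server, in transactions per second."""
    return mean(count / duration_seconds for count in transactions_per_server)


def art(response_times_ms):
    """Average Response Time in milliseconds, as seen by end users."""
    return mean(response_times_ms)


# Made-up sample values, for illustration only.
throughput = atas([600, 300], duration_seconds=60)
response = art([100, 200, 300])
```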

Throughput Performance
• ReDAL (0.9) is the ReDAL algorithm with α = 0.9
• ReDAL (0.5) is the ReDAL algorithm with α = 0.5
• The ReDAL (α = 0.9) case has the highest throughput

Response Time Performance
• The ReDAL (α = 0.9) case has the best response time

CPU Overhead on the Web Server
• Additional overhead of the ReDAL algorithm is 1.5% or less

Peak CPU Utilization on Application Servers
• Highest in the RR case and lowest in the ReDAL (α = 0.9) case

Scaling with Application Servers
[Figure: WSCU (%) versus number of simultaneous sessions (0 to 100) for 5, 10, and 20 application servers]
• Overhead of the ReDAL algorithm is at or below 15% for 100 concurrent sessions

Real World Evaluation
[Figure: ART (ms) versus number of simultaneous sessions (0 to 1000) for ReDAL-0.8, HJ, and RR]
• Online credit card application
• 30 WebLogic application servers on Red Hat Linux 9.0
• Apache Web Server on Red Hat Linux 9.0
• Machine hardware configuration: 1 GB RAM, 2.2 GHz dual processors
• Load was simulated by replaying web logs collected at various times over a day
• At a peak load of 1000 simultaneous sessions, ReDAL improved the response time of RR by 100%

Summary
ReDAL: application server load distribution
• Maximizes affinity
• Exploits application characteristics
• Practical and scalable
