Evaluation of Data and Request Distribution Policies in Clustered Servers Adnan Khaleel and A. L. Narasimha Reddy Texas A&M University adnan,firstname.lastname@example.org
Introduction • Internet use has skyrocketed • 74MB/month in ‘92, several gigabytes/hour today • Trend can be expected to grow in coming years • Increasing load has placed burdens on hardware and software beyond their original designs
Introduction (cont’d) • Clustered Servers are viable solutions
Issues in Clustered Servers • Need to present a single server image • DNS aliasing, magic routers, etc. • Multiplicity in Back-End Servers: • How should data be organized on the back-ends? • How should incoming requests be distributed amongst the back-end servers?
Issues in Clustered Servers (cont’d) • Data Organization • Disk Mirroring • Identical data maintained on all back-end servers • Every machine able to service requests without having to access files on other machines. • Several redundant machines present, good system reliability • Disadvantages • Inefficient use of disk space • Data cached on several nodes simultaneously
Issues in Clustered Servers (cont’d) • Data Organization (cont’d) • Disk Striping • Borrowed from Network File Servers • Entire data space divided over all the back-end servers • Portion of file may reside on several machines • Improve reliability through parity protection • For large file accesses, automatic load distribution • Better access times
Issues in Clustered Servers (cont’d) • Locality • Taking advantage of files already cached in a back-end server’s memory • For a clustered server system • Requests accessing the same data should be sent to the same set of servers
Issues in Clustered Servers (cont’d) • Distribution vs. Locality? • Load balanced system • Distribute requests evenly among back-end servers • Improve hit-rate and response time • Maximize locality • Current studies focus only on one aspect and ignore the other
Request Distribution Schemes • Round Robin Request Distribution
Request Distribution Schemes (cont’d) • Round Robin Request Distribution (cont’d) • Requests distributed in a sequential manner • Results in ideal distribution • Does not take server loading into account • Weighted Round Robin • Two Tier Round Robin • Cache Hits purely coincidental
Request Distribution Schemes (cont’d) • Round Robin Request Distribution (cont’d) • Every back-end server has to cache the entire content of the Server • Unnecessary duplication of files in cache • Inefficient use of cache space • Back-ends may see different queuing times due to uneven hit rates
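The round-robin family above can be sketched briefly; the server names and weights here are hypothetical, and weighted round robin simply repeats a server in the rotation in proportion to its capacity:

```python
from itertools import cycle

# Minimal sketch of weighted round-robin dispatch: each back-end appears
# in the rotation in proportion to its weight, so a weight-2 server
# receives twice the requests of a weight-1 server. Plain round robin is
# the special case where all weights are 1.
def weighted_round_robin(servers, weights):
    """Yield server names cyclically; higher weight -> more requests."""
    rotation = [s for s, w in zip(servers, weights) for _ in range(w)]
    return cycle(rotation)

dispatch = weighted_round_robin(["be0", "be1", "be2"], [2, 1, 1])
order = [next(dispatch) for _ in range(8)]
# "be0" serves half the requests, "be1" and "be2" a quarter each
```

Note that no request property (file or client) influences the choice, which is why any cache hits under this policy are purely coincidental.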
Request Distribution Schemes (cont’d) • File Based Request Distribution
Request Distribution Schemes (cont’d) • File Based Request Distribution (cont’d) • Locality based distribution • Partition file-space and assign a partition to each back-end server • Advantages • Does not suffer from duplicated data on cache • Based on access patterns, can yield high hit rates
Request Distribution Schemes (cont’d) • File Based Request Distribution (cont’d) • Disadvantages • How to determine the file-space partitioning? • Difficult to partition so that requests load the back-ends evenly • Dependent on client access patterns, so no one partitioning scheme can satisfy all cases • Some files will always be requested more than others • Locality is of primary concern, distribution is ignored • Hope is that the partitioning achieves the distribution
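One simple way to realize the file-space partitioning above is to hash the requested path; this is an illustrative sketch, not the paper's exact scheme:

```python
import hashlib

# Hypothetical static file-space partition: hash the requested path to
# pick a back-end, so repeat requests for the same file always land on
# the same server (good locality, cache-friendly).
def assign_backend(path: str, n_servers: int) -> int:
    digest = hashlib.md5(path.encode()).digest()
    return int.from_bytes(digest[:4], "big") % n_servers
```

The load balance depends entirely on which files clients happen to request: a few hot files can overload one back-end while others sit idle, which is exactly the distribution problem the slide describes.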
Request Distribution Schemes (cont’d) • Client Based Request Distribution
Request Distribution Schemes (cont’d) • Client Based Request Distribution (cont’d) • Also locality based • Partition client-space and assign a partition to each back-end server • Advantages and disadvantages similar to file-based • Difficult to find ideal partitioning scheme • Ignores distribution
Request Distribution Schemes (cont’d) • Client Based Request Distribution (cont’d) • Slightly modified from the DNS scheme used in the Internet • Allows flexibility in client-server mapping • TTL set during first resolution • On expiration, client expected to re-resolve name • Possibly different TTLs could be used for different workload characteristics • However, clients ignore the TTL • Hence a STATIC scheme
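The static client-space partition can be sketched by hashing the client address (an illustrative assumption, analogous to the file-based hash):

```python
import ipaddress

# Hypothetical static client-space partition: map each client address to
# a fixed back-end, so all of one client's requests go to the same
# server (locality), with no regard for current server load.
def client_backend(client_ip: str, n_servers: int) -> int:
    return int(ipaddress.ip_address(client_ip)) % n_servers
```

Because the mapping never changes once assigned (clients ignore the TTL), a burst of activity from clients in one partition cannot be shifted to idle servers, which is the distribution weakness noted above.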
Request Distribution Schemes (cont’d) • Locality Aware Request Distribution (LARD) • Broadly based on file-based scheme • Addresses the issue of load balancing • Each file assigned a dynamic set of servers instead of just one server
Request Distribution Schemes (cont’d) • LARD (cont’d) • Technique • On first request for a file, assign least loaded back-end • On subsequent requests for the same file • Determine Max/Min loaded servers in assigned set • If (Max loaded server > High Threshold OR a server exists in cluster with load < Low Threshold ) then add the new least loaded server to set and assign to service request • Else assign Min loaded server in set to service request • If any server in set inactive > time T, remove from set
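The LARD technique above can be sketched as follows; the threshold values and load metric (queue length) are illustrative assumptions, and the inactivity-timeout step is omitted for brevity:

```python
# Sketch of the LARD-style assignment described above. T_HIGH/T_LOW are
# assumed threshold values; loads[i] is server i's current queue length.
T_HIGH, T_LOW = 10, 2

server_set = {}   # file path -> set of back-end indices assigned to it

def lard_assign(path, loads):
    """Return the back-end chosen to service a request for `path`."""
    if path not in server_set:
        # first request for this file: assign the least loaded back-end
        target = min(range(len(loads)), key=lambda s: loads[s])
        server_set[path] = {target}
        return target
    assigned = server_set[path]
    max_s = max(assigned, key=lambda s: loads[s])
    min_s = min(assigned, key=lambda s: loads[s])
    cluster_min = min(range(len(loads)), key=lambda s: loads[s])
    if loads[max_s] > T_HIGH or loads[cluster_min] < T_LOW:
        # grow the set with the cluster's least loaded server
        assigned.add(cluster_min)
        return cluster_min
    return min_s
```

The `server_set` dictionary is what the front-end must maintain per file, which makes concrete the memory and processing cost listed as a disadvantage on the next slide.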
Request Distribution Schemes (cont’d) • LARD (cont’d) • File-space partitioning done on the fly • Disadvantages • Large amounts of processing needs to be performed by the front-end • Large amount of memory needed to maintain information on each individual file • Possible bottleneck as system is scaled
Request Distribution Schemes (cont’d) • Dynamic Client Based Request Distribution • Based on the premise that file reuse among clients is high • Complete ignorance of server loads • Propose a modification to the static client based distribution to make it actively modify distribution based on back-end loads.
Request Distribution Schemes (cont’d) • Dynamic Client Based (cont’d) • Use of time-to-live (TTL) for server mappings within cluster - TTL is continuously variable • In heavily loaded systems • RR type distribution preferable as queue times predominate • TTL values should be small • In lightly loaded systems • TTL values should be large in order to maximize benefits of locality
Request Distribution Schemes (cont’d) • Dynamic Client Based (cont’d) • On TTL expiration, assign client partition to the least loaded back-end server in cluster • If more than one server has the same low load - choose randomly from that set • Allows server using an IPRP type protocol to redirect client to another server if it aids load balancing • Unlike DNS, clients cannot avoid this mechanism • Hence - Dynamic
Request Distribution Schemes (cont’d) • Dynamic Client Based (cont’d) • Trend in server load essential to determine if TTL is to be increased or decreased • Need to average out the requests to smooth out transient activity • Moving Window Averaging Scheme • Only requests that come within the window period actively contribute towards load calculation
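A minimal sketch of the moving-window load average and the TTL adaptation above; the window length, TTL bounds, and halving/doubling step are assumptions, not values from the study:

```python
from collections import deque

WINDOW = 10.0            # seconds of request history kept (assumed)
TTL_MIN, TTL_MAX = 1.0, 60.0   # assumed TTL bounds, in seconds

class LoadWindow:
    """Moving-window averaging: only requests arriving within the last
    WINDOW seconds contribute to the load estimate, smoothing transients."""
    def __init__(self):
        self.arrivals = deque()          # request timestamps

    def record(self, t):
        self.arrivals.append(t)
        while self.arrivals and self.arrivals[0] < t - WINDOW:
            self.arrivals.popleft()      # expire requests outside window

    def load(self):
        return len(self.arrivals) / WINDOW   # requests/sec in window

def adjust_ttl(ttl, prev_load, cur_load):
    """Heavier load -> shorter TTL (RR-like behaviour, queue times dominate);
    lighter load -> longer TTL (maximize locality benefits)."""
    if cur_load > prev_load:
        return max(TTL_MIN, ttl / 2)
    return min(TTL_MAX, ttl * 2)
```

The direction of the load trend, not its absolute value, drives the TTL, matching the slide's point that the trend is what determines whether TTL should grow or shrink.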
Simulation Model • Trace-driven simulation model • Based on CSIM • Modelled on an IBM OS/2 server for various hardware parameters • Several parameters could be modified • # of servers, memory size, CPU capacity in MIPS (50), disk access times, network communication time/packet, data organization - disk mirror or stripe
Simulation Model (cont’d) • In disk mirror and disk striping, data cached at request servicing nodes • In disk striping, data is also cached at disk-end nodes
Simulation Model (cont’d) • Traces • Representative of two arenas where clustered servers are currently used • World Wide Web (WWW) Servers • Network File (NFS) Servers
Simulation Model (cont’d) • WEB Trace • ClarkNet WWW Server - ISP for the Metro Baltimore - Washington DC area • Collected over a period of two weeks • Original trace had 3 million records • Weeded out non-HTTP-related records like CGI, ftp • Resulting trace had 1.4 million records • Over 90,000 clients • Over 24,000 files that had a total occupancy of slightly under 100 MBytes
Simulation Model (cont’d) • WEB Trace (cont’d) • Records had timestamps with 1 second resolution • Did not accurately represent real manner of request arrivals • Requests that arrived in the same second were augmented with a randomly generated microsecond extension
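The timestamp augmentation described above might be sketched like this (a plausible reconstruction, not the authors' actual preprocessing code):

```python
import random

# Requests sharing the same 1-second timestamp are spread out by adding
# a randomly generated microsecond-resolution offset, better modelling
# the real manner of request arrivals.
def spread_timestamps(seconds):
    """seconds: list of integer arrival times (1 s resolution) from the trace."""
    out = [t + random.randrange(1_000_000) / 1_000_000 for t in seconds]
    return sorted(out)
```

Sorting after augmentation keeps the arrival stream monotonic so the simulator can replay it in order.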
Simulation Model (cont’d) • NFS Trace • Obtained from Auspex file server at UC Berkeley • Consists of post client-cache misses • Collected over a period of one week • Had 231 clients, over 68,000 files that had a total occupancy of 1,292 MBytes
Simulation Model (cont’d) • NFS Trace (cont’d) • Original trace had a large amount of backup data at night and over weekends, only daytime records used in simulation • Records had timestamps with microsecond resolution • Cache allowed to WARM-UP prior to any measurements being made
Results - Effects of Memory Size • NFS Trace, Disk Stripe • Increased memory = increased cache space Response time for 4 back-end servers.
Results - Effects of Memory Size • NFS trace, Disk Stripe • FB better at extracting locality • RR hits are purely probabilistic Cache-hit ratio for 4 back-end servers.
Results - Effects of Memory Size • WEB trace, Disk Stripe • WEB trace has a smaller working set • Increase in memory has less of an effect Response time for 4 back-end servers.
Results - Effects of Memory Size • WEB trace, Disk Stripe • Extremely high hit rates, even at 32 MBytes • FB able to extract maximum locality • Distribution scheme has less of an effect on response time • Load distribution was acceptable for all schemes; best: RR, worst: FB Cache hit rates for 4 back-end system.
Results - Effects of Memory Size • WEB Trace, Disk Mirror • Very similar to DS • With smaller memory, hit rates slightly lower as no disk-end caching Disk stripe vs. disk mirror.
Results - Scalability Performance • NFS trace, Disk Stripe • RR shows least benefit • Due to probabilistic cache hits Number of servers on response time (128MB memory).
Results - Scalability Performance • NFS Trace, Disk Stripe • ROUND ROBIN • Drop in hit rates with more servers • Less “probabilistic” locality Cache hit rate vs. memory size and number of back-end servers.
Results - Scalability Performance • NFS Trace, Disk Mirror • RR performance worsens with more servers • All other schemes perform similar to Disk Striping Number of servers on response time (128MB).
Results - Scalability Performance • NFS Trace, Disk Mirror • For RR, lower hit rates with more servers - higher response time • For RR, disk-end caching offers better hit rates in disk striping than in disk mirror Cache hit rates for RR under disk striping vs. mirroring (128MB).
Results - Effects of Memory Size • NFS trace, Disk Mirror • Similar effect of more memory • Stagnation of hit rates in FB, DM does better than DS due to caching of data at disk end • RR exhibits better hit rates with DS than DM, greater variety of files in cache Cache hit rates with disk mirror and disk striping.
Results - Disk Stripe vs. Disk Mirror • Implicit distribution of load in disk striping produces low disk queues Queueing time in disk stripe and disk mirror. NFS trace with a 4 back-end system used.
Conclusion & Future Work • RR: ideal distribution, but poor response times due to the probabilistic nature of cache hits • File-based was the best at extracting locality, but completely ignored server loads, giving poor load distribution • LARD: similar to FB but with better load distribution • For the WEB trace, cache hit rates were so high that distribution did not play a role in determining response time
Conclusion & Future Work • Dynamic CB addressed the problem of server load ignorance of static CB, better distribution in NFS trace, better hit rates in WEB Trace • Disk Striping distributed requests over several servers, relieved disk queues but increased server queues • In the process of evaluating a flexible caching approach with Round Robin distribution that can exploit the file-based caching methodology • Throughput comparisons of various policies • Impact of faster processors • Impact of Dynamically generated web page content