
Cluster Load Balancing for Fine-grain Network Services

This paper discusses a cluster-based load balancing architecture for fine-grain network services, focusing on scalability and availability. Traces and experimental evaluations are presented to validate the effectiveness of the proposed load balancing policies.

Presentation Transcript


  1. Cluster Load Balancing for Fine-grain Network Services Kai Shen, Tao Yang, and Lingkun Chu Department of Computer Science University of California at Santa Barbara http://www.cs.ucsb.edu/projects/neptune

  2. Cluster-based Network Services • Emerging deployment of large-scale complex clustered services. • Google: 150M searches per day; index of more than 2B pages; thousands of Linux servers. • Teoma search (powering Ask Jeeves search): a Sun/Solaris cluster of hundreds of processors. • Web portals: Yahoo!, MSN, AOL, etc. • Key requirements: availability and scalability. IPDPS 2002

  3. Architecture of a Clustered Service: Search Engine [Diagram: client requests pass through a firewall/Web switch to Web servers/query handlers, which reach index servers (partitions 1 and 2) and doc servers over a local-area network.] IPDPS 2002

  4. “Neptune” Project http://www.cs.ucsb.edu/projects/neptune • A scalable cluster-based software infrastructure to shield clustering complexities from service authors. • Scalable clustering architecture with load-balancing support. • Integrated resource management. • Service replication – replica consistency and performance scalability. • Deployment: • At the Internet search engine Teoma (www.teoma.com) for more than a year. • Serving Ask Jeeves search (www.ask.com) since December 2001. (Serving 6-7M searches per day as of January 2002.) IPDPS 2002

  5. Neptune Clustering Architecture – Inside a Node [Diagram of the components inside one node: a Service Access Point, Service Availability Directory, Service Load-balancing Subsystem, Service Availability Subsystem, Service Availability Publishing, and the Service Runtime, linking local service consumers and hosted services to the network to the rest of the cluster.] IPDPS 2002

  6. Cluster Load Balancing • Design goals: • Scalability – scalable performance; non-scaling overhead. • Availability – no centralized node/component. • For fine-grain services: • Already widespread. • Additional challenges: • Severe system state fluctuation → more sensitive to load information delay. • More frequent service requests → low per-request load balancing overhead. IPDPS 2002

  7. Evaluation Traces • Traces of two service cluster components from the Internet search engine Teoma; collected during one week in July 2001; the peak-time portion is used. IPDPS 2002

  8. Broadcast Policy • Broadcast policy: • An agent at each node collects the local load index and broadcasts it at various intervals. • Another agent listens to broadcasts from other nodes and maintains a directory locally. • Each service request is directed to the node with the lightest load index in the local directory. • Load index – number of active service requests. • Advantages: • Requires no centralized component; • Very low per-request overhead. IPDPS 2002
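Below is a minimal sketch of how such a broadcast-fed directory might look. It is an illustration of the policy above, not Neptune's implementation: the names LoadDirectory, broadcast_loop, and active_requests are assumptions, and an in-process call stands in for the actual network broadcast.

```python
# Illustrative sketch of the broadcast policy; LoadDirectory and
# broadcast_loop are hypothetical names, not Neptune's actual API.
import threading
import time


class LoadDirectory:
    """Local directory of load indices learned from other nodes' broadcasts."""

    def __init__(self):
        self._lock = threading.Lock()
        self._load = {}  # node id -> most recently broadcast load index

    def on_broadcast(self, node, load_index):
        # Called by the listening agent whenever a broadcast arrives.
        with self._lock:
            self._load[node] = load_index

    def lightest_node(self):
        # Direct the next request to the node with the lightest known load index.
        with self._lock:
            return min(self._load, key=self._load.get) if self._load else None


def broadcast_loop(directory, node_id, active_requests, interval_s=0.125):
    """Agent on a service node: broadcast the local load index (the number of
    active requests, returned by the active_requests callable) at fixed
    intervals. The network send is simulated by a direct call here."""
    while True:
        directory.on_broadcast(node_id, active_requests())
        time.sleep(interval_s)
```

Because a consumer only sees each node's load index as of the last broadcast it received, the directory can be stale by up to one broadcast interval, which is the sensitivity the next slide quantifies.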

  9. Broadcast Policy with Varying Broadcast Frequency (16-node) [Figure: normalized mean response time versus mean broadcast interval (31.25–1000 ms) for the MediumGrain and FineGrain traces against the Centralized baseline; panel A with servers 50% busy, panel B with servers 90% busy.] • Too dependent on frequent broadcasts for fine-grain services at high load. • Reasons: load index staleness, flocking effect. IPDPS 2002

  10. Random Polling Policy • For each service request, a polling agent on the service consumer node • randomly polls a certain number (the poll size) of service nodes for load information; • picks the node responding with the lightest load. • Random polling with a small poll size: • Requires no centralized components; • Per-request overhead is limited by the poll size; • Small load information delay due to just-in-time polling. IPDPS 2002
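The selection step of this policy fits in a few lines. The sketch below is illustrative only: pick_node and the poll_load callback are invented names, and canned load values stand in for live poll replies.

```python
# Illustrative sketch of the random polling policy; pick_node and
# poll_load are hypothetical names, not Neptune's actual API.
import random


def pick_node(service_nodes, poll_load, poll_size=3):
    """Poll `poll_size` randomly chosen service nodes just in time and
    return the one reporting the lightest load index."""
    polled = random.sample(service_nodes, min(poll_size, len(service_nodes)))
    return min(polled, key=poll_load)


# Example with canned load indices standing in for live poll replies.
loads = {"n1": 5, "n2": 2, "n3": 7, "n4": 1}
# Prints "n4" unless that node is left out of the 3-node sample, then "n2".
print(pick_node(list(loads), loads.get, poll_size=3))
```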

  11. Is a Small Poll Size Enough? (Service nodes are kept 90% busy on average.) [Figure: mean response time (in ms) versus the number of service nodes (up to 100) for the Random, Polling 2, Polling 3, Polling 4, and Centralized policies; panel A shows the MediumGrain trace, panel B the FineGrain trace.] In principle, this matches the analytical results on the supermarket model. [Mitzenmacher96] IPDPS 2002

  12. System Implementation of Random Polling Policies • Configuration: • 30 dual-processor Linux servers connected by a Fast Ethernet switch. • Implementation: • Service availability announcements are made through IP multicast; • Application-level services are loaded into the Neptune runtime module as dynamically linked libraries and run as threads; • For each service request, polls are made concurrently over UDP. IPDPS 2002
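As a rough sketch of the multicast announcement path mentioned above (the group address 239.1.1.1, the port, and the message format are assumptions for illustration, not Neptune's actual wire protocol):

```python
# Illustrative sketch of availability announcements over IP multicast;
# group address, port, and message format are assumptions.
import socket
import struct
import time

MCAST_GROUP = "239.1.1.1"   # assumed administratively scoped multicast group
MCAST_PORT = 9001           # assumed announcement port


def announce_availability(node_id, services, interval_s=1.0):
    """Periodically multicast which services this node currently offers."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_MULTICAST_TTL, 1)
    while True:
        msg = f"{node_id}:{','.join(services)}".encode()
        sock.sendto(msg, (MCAST_GROUP, MCAST_PORT))
        time.sleep(interval_s)


def listen_for_announcements():
    """Join the multicast group and yield (node, services) announcements."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind(("", MCAST_PORT))
    mreq = struct.pack("4sl", socket.inet_aton(MCAST_GROUP), socket.INADDR_ANY)
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP, mreq)
    while True:
        data, _addr = sock.recvfrom(256)
        node, _, svc = data.decode().partition(":")
        yield node, svc.split(",")
```

Multicast keeps the availability subsystem free of any centralized registry: every consumer node can join the group and build its local availability directory from the same announcements.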

  13. Experimental Evaluation of Random Polling Policy (16-node) [Figure: mean response time (in ms) versus server load level (50%–90%) for the Random, Polling 2, Polling 3, Polling 4, Polling 8, and Centralized policies; panel A shows the MediumGrain trace, panel B the FineGrain trace.] • For the FineGrain trace, a large poll size performs even worse due to excessive polling overhead and long polling delay. IPDPS 2002

  14. Discarding Slow-responding Polls • Polling delay with a poll size of 3: • 290 µs polling delay when service nodes are idle. • In a typical run when service nodes are 90% busy: • Mean polling delay – 3 ms; • 8.1% of polls are not returned within 10 ms → significant for fine-grain services (service time in tens of ms). • Discarding slow-responding polls shortens the polling delay → 8.3% reduction in mean response time. IPDPS 2002
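One way to realize the discard is to impose a hard deadline on each poll round and dispatch using only the replies that arrive in time. The sketch below assumes a simple UDP request/reply exchange; the port, the packet format, and the 10 ms deadline (chosen to mirror the figure above) are illustrative, not Neptune's actual protocol.

```python
# Illustrative sketch of UDP polling with a deadline; slow-responding
# polls are simply ignored. Port and packet format are assumptions.
import select
import socket
import time


def poll_with_deadline(nodes, poll_port=9000, deadline_s=0.010):
    """Send a load poll to each chosen node, collect replies until the
    deadline, and discard any poll that has not answered by then."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.setblocking(False)
    for host in nodes:
        sock.sendto(b"LOAD?", (host, poll_port))  # all polls go out back to back

    replies = {}
    deadline = time.monotonic() + deadline_s
    while len(replies) < len(nodes):
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            break                                  # deadline passed: drop stragglers
        ready, _, _ = select.select([sock], [], [], remaining)
        if not ready:
            break                                  # nothing arrived in time
        data, (host, _port) = sock.recvfrom(64)
        replies[host] = int(data)                  # reply body carries the load index
    sock.close()
    # Dispatch to the lightest-loaded node among those that answered in time.
    return min(replies, key=replies.get) if replies else None
```

The trade-off is that a discarded poll shrinks the effective poll size for that request, but for fine-grain services the shorter polling delay outweighs the lost load information.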

  15. Related Work • Clustering middleware and distributed systems – Neptune, WebLogic/Tuxedo, COM/DCOM, MOSIX, TACC, MultiSpace. • HTTP switching – Alteon, ArrowPoint, Foundry, Network Dispatcher. • Load-balancing for distributed systems – [Mitzenmacher96], [Goswami93], [Kunz91], MOSIX, [Zhou88], [Eager86], [Ferrari85]. • Low-latency network architecture – VIA, InfiniBand. IPDPS 2002

  16. Conclusions • Random-polling-based load balancing policies are well suited for fine-grain network services. • A small poll size provides sufficient information for load balancing, while an excessively large poll size may even degrade performance. • Discarding slow-responding polls can further improve system performance. http://www.cs.ucsb.edu/projects/neptune IPDPS 2002
