Components of a Scalable Distributed Relational Information Service

This thesis explores the architecture and components of a scalable distributed relational information service, focusing on size-based scheduling, fairness and efficiency, and other applications beyond RGIS. It also discusses the DualPats algorithm for characterizing and predicting TCP throughput on the Wide Area Network.

Presentation Transcript


  1. Components of a Scalable Distributed Relational Information Service Dong Lu June 14, 2005

  2. Outline • Bird’s Eye View • What is RGIS? • Architecture • What components are studied in the thesis? • Size-Based Scheduling With Inaccurate Info • Fairness and efficiency as function of correlation • Other applications: beyond RGIS • DualPats: Characterizing and Predicting TCP Throughput on the Wide Area Network • Why TCP throughput prediction? • Flow size / TCP throughput correlation • Issues with simple benchmarking • DualPats algorithm and dynamic rate adjustment • Thesis Contributions

  3. RGIS • Grid computing • Providing dependable, reliable, consistent, pervasive and unlimited computing resources • RGIS: Relational Grid Information Service • Represents globally distributed resources, including the network • Relational Model allows complex compositional queries • Relational Model is well studied; large user population • RGIS servers distributed among multiple organizations and sites

  4. Query and Update Example • A query example • Find a set of 16 Linux machines on the same LAN, each with more than 1 GB of memory and a link capacity above 100Mb, with a total memory of at least 32 GB • An update example • Host A has added 1 GB of memory and will be available from 1:00 PM to 6:00 PM Central Time

  5. RGIS Architecture (diagram labels) • Users • Applications • Web Interface • Canned Approximate Queries • Canned Queries • SOAP Interface • Authenticated Direct Interface • Scoping Rewrite • Content Delivery Network Interface, for loose consistency • Query Manager and Rewriter • Update Manager • Nondeterminism Rewrite • Time Bounding (and Iteration of Query) • Updates are encrypted using asymmetric cryptography on the network; only those with appropriate keys have access • Oracle 9i Front End: transactional inserts and updates using stored procedures, queries using select statements (uses the database’s access control) • RDBMS • Oracle 9i Back End • Windows, Linux, Parallel Server, etc. • site-to-site • Schema, type hierarchy, indices, PL/SQL stored procedures for each object

  6. RGIS Web Interface

  7. RGIS Architecture (diagram labels) • Users • Applications • Web Interface • Canned Approximate Queries • Canned Queries • SOAP Interface • Authenticated Direct Interface • Scoping Rewrite • Content Delivery Network Interface, for loose consistency • Query Manager and Rewriter • Update Manager • Nondeterminism Rewrite • Time Bounding (and Iteration of Query) • Updates are encrypted using asymmetric cryptography on the network; only those with appropriate keys have access • Oracle 9i Front End: transactional inserts and updates using stored procedures, queries using select statements (uses the database’s access control) • RDBMS • Oracle 9i Back End • Windows, Linux, Parallel Server, etc. • site-to-site • Schema, type hierarchy, indices, PL/SQL stored procedures for each object

  8. Query Components • GridG: the first synthetic grid generator • Topology [Sigmetrics Performance Evaluation Review, Vol 30, No. 4, 2003] • Annotation [SC’03-1] • Query rewriting techniques to trade off query time and the result set size • Nondeterministic query [SC’03-2] • Scoped and approximate queries [GRID’03]

  9. Update and CDN Components • Size-Based Scheduling with inaccurate info to minimize mean update time • Fairness and efficiency as function of correlation [MASCOTS’04-1] • P2P scheduling [LCR’04], one in submission • Web server scheduling, in submission • Other applications [MASCOTS’04-2] • Characterizing and predicting TCP throughput on the WAN to determine update transfer time • [ICDCS’05]

  10. Update and CDN Components • Modeling and taming parallel TCP on the WAN to transfer updates faster • [IPDPS’05] • Fat-tree based end-system multicast to disseminate updates scalably • [WCW’04], one in submission

  11. Outline • Bird’s Eye View • What is RGIS? • Architecture • What components are studied in the thesis? • Size-Based Scheduling With Inaccurate Info • Fairness and efficiency as function of correlation • Other applications: beyond RGIS • DualPats: Characterizing and Predicting TCP Throughput on the Wide Area Network • Why TCP throughput prediction? • Flow size / TCP throughput correlation • Issues with simple benchmarking • DualPats algorithm and dynamic rate adjustment • Thesis Contributions

  12. Scheduling Section Outline • Review of Size-Based Scheduling • Motivation • Simulation Setup • Simulation Results • New Applications

  13. The scheduling problem • Scheduling: a general problem • Goal: minimize the mean response time; be fair • Diagram: updates come from the CDN to the scheduler, which feeds the database • Queued updates: 10K, 8K, 6K, 3K. Which update to run next? • Response time: the time from job arrival to its completion

  14. Review of Non-size-based scheduling • FCFS, PS, etc. • FCFS: First Come, First Served • Intuitive • Easiest to implement • PS: Processor Sharing • Fair: all jobs receive equal resources • Also easy to implement • Problem: these policies are unaware of job size information, which results in high mean response time

  15. Review of size-based scheduling • SRPT, FSP, etc. • Use the job size (processing time, service time) information for scheduling • Optimal in mean response time • Fair? • Easy to implement? We use Job Size to refer to the Processing Time (Service Time) of the job

  16. Shortest Remaining Processing Time (SRPT) • Always serve the job with the minimum remaining processing time first; preemptive scheduling • Yields minimum mean response time [Schrage, Operations Research, 1968] • Surprisingly, it is fair for heavy-tailed job size distributions [Bansal and Harchol-Balter, Sigmetrics ’01] • Easy to implement? • With accurate a priori job size information, YES • Otherwise, NO
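
The SRPT rule on this slide can be illustrated with a minimal single-server sketch. It assumes exact job sizes, a unit service rate, and jobs given as hypothetical `(arrival_time, size)` pairs; the function names are illustrative and are not taken from the thesis simulator. An FCFS baseline is included for comparison.

```python
import heapq

def fcfs_mean_response(jobs):
    """Mean response time when jobs run to completion in arrival order."""
    t = 0.0
    total = 0.0
    # Stable sort on arrival time only, so ties keep their listed order.
    for arrival, size in sorted(jobs, key=lambda j: j[0]):
        t = max(t, arrival) + size
        total += t - arrival
    return total / len(jobs)

def srpt_mean_response(jobs):
    """Mean response time under preemptive SRPT with exact job sizes."""
    jobs = sorted(jobs, key=lambda j: j[0])
    heap = []          # entries are (remaining_time, arrival_time)
    t = 0.0
    i = 0
    total = 0.0
    n = len(jobs)
    while i < n or heap:
        if not heap:
            # Server is idle: jump to the next arrival.
            t = max(t, jobs[i][0])
        # Admit every job that has arrived by time t.
        while i < n and jobs[i][0] <= t:
            heapq.heappush(heap, (jobs[i][1], jobs[i][0]))
            i += 1
        remaining, arrival = heapq.heappop(heap)
        next_arrival = jobs[i][0] if i < n else float("inf")
        # Run the shortest job until it finishes or a new job arrives.
        run = min(remaining, next_arrival - t)
        t += run
        if run < remaining:
            heapq.heappush(heap, (remaining - run, arrival))  # preempted
        else:
            total += t - arrival                              # completed
    return total / n

# Four jobs of size 10, 8, 6 and 3 that all arrive at time 0.
queue = [(0, 10), (0, 8), (0, 6), (0, 3)]
print(fcfs_mean_response(queue))  # 19.75
print(srpt_mean_response(queue))  # 14.0
```

With everything queued at time 0, SRPT reduces to shortest-job-first; preemption only matters when a shorter job arrives while a longer one is running.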

  17. Fair Sojourn Protocol (FSP) • Combines SRPT with PS; preemptive scheduling • Mean response time is close to that of SRPT, and it is fairer than both SRPT and PS [Friedman, et al, Sigmetrics ’03] • Easy to implement? • With accurate a priori job size information, YES • Otherwise, NO

  18. Scheduling Section Outline • Review of Size-Based Scheduling • Motivation • Simulation Setup • Simulation Results • New Applications

  19. Motivation • Size-based scheduling requires accurate knowledge of job sizes • In practice, a priori job size information is not always available • All the previous work assumes perfect knowledge of job sizes a priori • How does performance depend on quality of job size information?

  20. Correlation • We study the performance of size-based schedulers as a function of the correlation coefficient (Pearson’s R) between actual job sizes and estimated job sizes.
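
As a reminder of what Pearson's R measures here, the following is a minimal sketch of the standard sample correlation coefficient between two equal-length lists. The function name `pearson_r` and the sample values are illustrative only.

```python
import math

def pearson_r(xs, ys):
    """Sample Pearson correlation between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

actual = [1, 2, 3]
print(pearson_r(actual, [2, 4, 6]))  # 1.0  (estimates are a linear function of actual sizes)
print(pearson_r(actual, [3, 2, 1]))  # -1.0 (estimates reverse the ordering)
```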

  21. Scheduling Section Outline • Review of Size-Based Scheduling • Motivation • Simulation Setup • Simulation Results • New Applications

  22. Trace generator • Inputs: correlation (Pearson’s R), Distribution A, Distribution B • Output: a table of correlated random pairs of X and Y (first row shown: 100, 300) • X has distribution A • Y has distribution B • X and Y are correlated with coefficient R

  23. Trace generator algorithm • Algorithm: “Normal-To-Anything” • First developed by Cario and Nelson, in INFORMS Journal on Computing 10, 1 (1998). • We simplified the algorithm and were the first to introduce it into simulation studies of computer systems
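
The core idea of Normal-To-Anything can be sketched as follows: draw a pair of correlated standard normals, map each to a uniform through the normal CDF, then push each uniform through the inverse CDF of its target distribution. This is a generic illustration, not the simplified variant the slide refers to. The names `norta_pairs`, `exponential_inv_cdf` and `pareto_inv_cdf` are hypothetical, and `rho_z` is the correlation of the underlying normals, which generally differs from the Pearson's R of the resulting pairs; a complete generator would also search for the `rho_z` that yields the requested R.

```python
import math
import random
from statistics import NormalDist

def norta_pairs(n, rho_z, inv_cdf_x, inv_cdf_y, seed=0):
    """Generate n correlated (x, y) pairs with the given marginal distributions."""
    rng = random.Random(seed)
    phi = NormalDist().cdf      # standard normal CDF
    eps = 1e-12                 # keep uniforms strictly inside (0, 1)
    pairs = []
    for _ in range(n):
        z1 = rng.gauss(0.0, 1.0)
        # z2 is standard normal with correlation rho_z to z1.
        z2 = rho_z * z1 + math.sqrt(1.0 - rho_z ** 2) * rng.gauss(0.0, 1.0)
        u1 = min(max(phi(z1), eps), 1.0 - eps)
        u2 = min(max(phi(z2), eps), 1.0 - eps)
        pairs.append((inv_cdf_x(u1), inv_cdf_y(u2)))
    return pairs

def exponential_inv_cdf(mean):
    """Inverse CDF of an exponential distribution with the given mean."""
    return lambda u: -mean * math.log(1.0 - u)

def pareto_inv_cdf(x_min, alpha):
    """Inverse CDF of a Pareto distribution (heavy-tailed for small alpha)."""
    return lambda u: x_min / (1.0 - u) ** (1.0 / alpha)

# Example: exponential "estimated" sizes paired with Pareto "actual" sizes.
trace = norta_pairs(1000, 0.8, exponential_inv_cdf(2.0), pareto_inv_cdf(1.0, 1.5), seed=42)
```

Because each marginal is produced by its own inverse CDF, X and Y keep distributions A and B exactly, whatever value of `rho_z` is used.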

  24. Scatter plot of example traces • Two panels plotting Y against X: R=0.78 and R=0.13

  25. Performance metrics • Mean response time: also called sojourn time or turnaround time • Slowdown: the ratio of a job’s response time to its size; used as the fairness metric
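
A small worked example may help make the slowdown metric concrete. It assumes four jobs of size 10, 8, 6 and 3 that all arrive at time 0 on a unit-rate server; the completion times below were worked out by hand for two service orders, and the helper name `slowdown` is illustrative.

```python
def slowdown(arrival, completion, size):
    """Slowdown = response time divided by job size (1.0 means no waiting)."""
    return (completion - arrival) / size

sizes = [10, 8, 6, 3]

# Served in listed order (FCFS): completions at 10, 18, 24, 27.
fcfs = [slowdown(0, c, s) for c, s in zip([10, 18, 24, 27], sizes)]

# Served shortest first (SRPT): sizes 3, 6, 8, 10 complete at 3, 9, 17, 27.
srpt = [slowdown(0, c, s) for c, s in zip([27, 17, 9, 3], sizes)]

print(max(fcfs))  # 9.0: the size-3 job waits behind all the larger jobs
print(max(srpt))  # 2.7: the largest job is the most delayed, but far less so
```

Reporting slowdown per job size, rather than only the mean, shows whether a policy penalizes small or large jobs disproportionately.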

  26. Simulator • Simulator • Supports M/G/1 and G/G/n/m queuing model • Simulator validation • Little’s law • Repeat the simulations in the FSP paper [Friedman, et al, Sigmetrics ‘03] • Compare with available theoretical results [Bansal and Harchol-Balter, Sigmetrics ‘01]

  27. Scheduling Policies • PS: Processor sharing • Size-based scheduling policies • SRPT: Ideal SRPT scheduler • SRPT-E: SRPT scheduler using estimated job size • FSP: Ideal Fair Sojourn Protocol • FSP-E: FSP scheduler using estimated job size Each simulation is repeated 20 times and we present the average

  28. Scheduling Section Outline • Review of Size-Based Scheduling • Motivation • Simulation Setup • Simulation Results • New Applications

  29. Mean response time as function of R

  30. Slowdown (R=0.0224)

  31. Slowdown (R=0.239)

  32. Slowdown (R=0.4022)

  33. Slowdown (R=0.5366)

  34. Slowdown (R=0.7322)

  35. Slowdown (R=0.9779)

  36. Simulation Results: Conclusions • Performance heavily depends on correlation • SRPT-E and FSP-E can outperform PS given an effective job size estimator • Crossover point of performance metrics is a function of correlation • Also of job size distributions (See TR NWU-CS-04-33)

  37. Scheduling Section Outline • Review of Size-Based Scheduling • Motivation • Simulation Setup • Simulation Results • New Applications

  38. New Applications: Web server scheduling (TR NWU-CS-04-33) • Is file size a good estimator of a job’s service time (processing time)? Not really (R ≈ 0.14) • Figure axes: File Size, Service time (wall clock time)

  39. New Applications: Web server scheduling • Domain-based estimator: much more accurate prediction of the service time at low overhead

  40. New Applications: P2P server side scheduling (LCR ’04) • The “server side” of current file sharing P2P applications is superficially similar to a web server • Both send back files upon request. • However, a P2P application can’t even know the file size accurately a priori • Partial downloads • Our ongoing work shows that SRPT-E performs well using our time-series based job size estimators.

  41. Scheduling Section Summary • Performance of size-based scheduling policies depends on correlation between size estimates and actual sizes • Fairness, mean response time, etc. • Estimator must preserve ordering of job sizes for high performance • Performance degrades as correlation degrades • Effective new estimators for Web and P2P

  42. Outline • Bird’s Eye View • What is RGIS? • Architecture • What components are studied in the thesis? • Size-Based Scheduling With Inaccurate Info • Fairness and efficiency as function of correlation • Other applications: beyond RGIS • DualPats: Characterizing and Predicting TCP Throughput on the Wide Area Network • Why TCP throughput prediction? • Flow size / TCP throughput correlation • Issues with simple benchmarking • DualPats algorithm and dynamic rate adjustment • Thesis Contributions

  43. DualPats Overview • Algorithm for predicting the TCP throughput as function of flow size • Minimal active probing • Dynamic probe rate adjustment • Explaining flow size / throughput correlation • Explaining why simple active probing fails • Large scale empirical study

  44. DualPats Section Outline • Why TCP Throughput Prediction? • Particulars of Study • Flow Size / TCP Throughput Correlation • Issues with Simple Benchmarking • DualPats Algorithm • Stability and Dynamic Rate Adjustment

  45. Goal • A library call: BW = PredictTransfer(src,dst,numbytes); Expected Time = numbytes/BW; • Ideally, we want a confidence interval: (BWLow,BWHigh) = PredictTransfer(src,dst,numbytes,p);
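
The arithmetic behind this slide can be sketched as below. The predictor itself is not shown: the throughput values are passed in as plain numbers, standing in for whatever a `PredictTransfer`-style call would return, and the helper names are hypothetical. Units are assumed to be bytes and bytes per second.

```python
def expected_transfer_time(numbytes, bw):
    """Expected Time = numbytes / BW, with BW in bytes per second."""
    if bw <= 0:
        raise ValueError("predicted throughput must be positive")
    return numbytes / bw

def expected_time_interval(numbytes, bw_low, bw_high):
    """Turn a throughput interval (BWLow, BWHigh) into a transfer-time interval.

    Higher throughput means a shorter transfer, so the bounds swap places.
    """
    return (expected_transfer_time(numbytes, bw_high),
            expected_transfer_time(numbytes, bw_low))

# A 100 MB transfer with an assumed prediction of 10 MB/s,
# and an assumed interval of 5 to 20 MB/s.
print(expected_transfer_time(100e6, 10e6))       # 10.0
print(expected_time_interval(100e6, 5e6, 20e6))  # (5.0, 20.0)
```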

  46. Available Bandwidth • Maximum rate a path can offer a flow without slowing other flows • pathchar, cprobe, nettimer, delphi, IGI, pathchirp, pathload … • mainly for traffic engineering • Available bandwidth can differ significantly from TCP throughput • Not real time, takes at least tens of seconds to run

  47. Simple TCP Benchmarking • Benchmark paths with a single small probe • BW = ProbeSize/Time • Widely used: Network Weather Service (NWS) and others (Remos benchmarking collector) • Not accurate for large transfers on the current high-speed Internet • Numerous papers show this and attempt to fix it

  48. Fixing Simple TCP Benchmarking • Logs [Sundharshan]: correlate real transfer measurements with benchmarking measurements • Recent transfers needed • Similar size transfers needed • Measurements at application chosen times • CDF-matching [Swany]: correlate CDF of real transfer measurements with CDF of benchmarking measurements • Recent transfers still needed • Measurements at application chosen times

  49. Analysis of TCP • Extensive research on TCP throughput modeling in the networking community • Really intended to build better TCPs • Difficult to use the models online because of hard-to-measure parameters • Future loss rate and RTT

  50. DualPats Section Outline • Why TCP Throughput Prediction? • Particulars of Study • Flow Size / TCP Throughput Correlation • Issues with Simple Benchmarking • DualPats Algorithm • Stability and Dynamic Rate Adjustment
