
Network Weather Service



Presentation Transcript


  1. Network Weather Service Sathish Vadhiyar • Sources / Credits: • NWS web site: http://nws.cs.ucsb.edu • NWS papers

  2. Introduction • “NWS provides accurate forecasts of dynamically changing performance characteristics from a distributed set of metacomputing resources” • What will be the future load (not current load) when a program is executed? • Producing short-term performance forecasts based on historical performance measurements • The forecasts can be used by dynamic scheduling agents

  3. Introduction • Resource allocation and scheduling decisions must be based on predictions of resource performance during a timeframe • NWS takes periodic measurements of performance and using numerical models, forecasts resource performance

  4. NWS Goals • Components • Persistent state • Name server • Sensors • Passive (CPU availability) • Active (Network measurements) • Forecaster

  5. Architecture

  6. Architecture

  7. Performance measurements • Using sensors • CPU sensors • Measures CPU availability • Uses • uptime • vmstat • Active probes • Network sensors • Measures latency and bandwidth • Each host maintains • Current data • One-step ahead predictions • Time series of data

  8. Network Measurements

  9. Issues with Network Sensors • Appropriate transfer size for measuring throughput • Collision of network probes • Solutions • Tokens and hierarchical trees with cliques

  10. Available CPU measurement

  11. Available CPU measurement • The formula shown does not take job priorities into account • Hence an active probe is periodically run to adjust the estimates
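As a rough illustration (the slide's actual formula appeared as an image, so the 1/(L+1) estimate below is an assumption on my part, based on the common uptime-style model): a host whose load average is L can be expected to give a newly arriving job about 1/(L+1) of the CPU.

```python
def estimate_cpu_availability(load_avg: float) -> float:
    """Estimate the CPU fraction a newly arriving job would receive.

    With `load_avg` runnable processes already competing for the CPU,
    a new job would be one of (load_avg + 1) contenders, so it gets
    roughly 1 / (load_avg + 1) of the processor.  This ignores job
    priorities, which is why an active probe is periodically run to
    correct the passive estimate.
    """
    return 1.0 / (load_avg + 1.0)

# Example: a one-minute load average of 1.0 (as reported by `uptime`
# or vmstat) suggests a new job would get about half the CPU.
print(estimate_cpu_availability(1.0))  # → 0.5
```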

  12. Predictions • To generate a forecast, the forecaster requests persistent state data • When a forecast is requested, the forecaster makes predictions for the existing measurements using different forecast models • Dynamic choice of forecast model based on the best Mean Absolute Error, Mean Square Prediction Error, or Mean Percentage Prediction Error • Forecasts requested by: • InitForecaster() • RequestForecasts() • Forecasting methods • Mean-based • Median-based • Autoregressive
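The dynamic-selection step can be sketched as follows; the predictor set, function names, and scoring loop here are illustrative, not the actual NWS API (which exposes InitForecaster() and RequestForecasts()). Each candidate model is scored postmortem over the measurement history, and the one with the lowest MAE makes the forecast.

```python
from statistics import mean, median

# Candidate one-step-ahead predictors (names are illustrative):
# each maps a history of measurements to a forecast.
PREDICTORS = {
    "last":         lambda h: h[-1],
    "running_mean": lambda h: mean(h),
    "median":       lambda h: median(h),
    "window_mean":  lambda h: mean(h[-5:]),   # sliding window of 5
}

def mae(predict, history):
    """Mean absolute error of one-step-ahead predictions over history."""
    errors = [abs(predict(history[:t]) - history[t])
              for t in range(1, len(history))]
    return mean(errors)

def forecast(history):
    """Score every predictor postmortem; forecast with the best one."""
    best = min(PREDICTORS, key=lambda name: mae(PREDICTORS[name], history))
    return best, PREDICTORS[best](history)

series = [10.0, 10.5, 9.8, 10.2, 10.1, 10.4, 9.9, 10.0]
name, value = forecast(series)
print(name, round(value, 2))
```

The same scoring loop extends directly to the other error metrics (MSE, MPE) by swapping the error function.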

  13. Forecasting Methods • The notation and the prediction method were given as equations (images) on the original slide • Prediction accuracy: the absolute one-step-ahead prediction error is recorded at each step, and the Mean Absolute Error (MAE) is the average of these errors

  14. Forecasting Methods – Mean-based • Predictors 1–3 were given as equations (images) on the original slide

  15. Forecasting Methods – Mean-based • Predictors 4–5 were given as equations (images) on the original slide

  16. Forecasting Methods – Median-based • Predictors 1–3 were given as equations (images) on the original slide

  17. Autoregression • The AR predictor (equation 1, an image on the original slide) uses coefficients a_i chosen to minimize the overall error • r_i,j is the autocorrelation function for the series of N measurements
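A minimal sketch of the idea for order p = 1 (the NWS forecaster fits a higher-order AR(p) model from the autocorrelation function; this simplified least-squares fit is my own illustration):

```python
def ar1_forecast(series):
    """One-step-ahead AR(1) forecast, fit by least squares.

    On the mean-removed series x, the coefficient a minimising
    sum_t (x_t - a*x_{t-1})^2 is  sum_t x_t*x_{t-1} / sum_t x_{t-1}^2,
    a lag-1 analogue of the autocorrelation r_i,j used to fit the
    full AR(p) model.
    """
    m = sum(series) / len(series)
    x = [v - m for v in series]          # remove the mean
    num = sum(x[t] * x[t - 1] for t in range(1, len(x)))
    den = sum(x[t - 1] ** 2 for t in range(1, len(x)))
    a = num / den
    return m + a * x[-1]                 # forecast, mean added back

print(ar1_forecast([10.0, 10.5, 9.8, 10.2, 10.1, 10.4, 9.9, 10.0]))
```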

  18. Forecasting Methodology

  19. Forecast Results

  20. Forecasting Complexity vs Accuracy • Semi Non-parametric Time Series Analysis (SNP) – an accurate but complicated model • Model fit using iterative search • Calculation of conditional expected value using conditional probability density

  21. Sensor Control • Each sensor connects to every other sensor and performs measurements: O(N²) probes • To reduce this complexity, sensors are organized in a hierarchy of cliques • To avoid collisions, tokens are used • Adaptive control using adaptive token timeouts • Adaptive time-out discovery and a distributed leader-election protocol
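The reduction can be illustrated with a simple probe count; the two-level grouping below is my own simplification of the NWS clique hierarchy.

```python
def full_mesh_probes(n):
    """Every host probing every other host: n*(n-1)/2 pairs, O(N^2)."""
    return n * (n - 1) // 2

def two_level_clique_probes(n, k):
    """Illustrative two-level hierarchy: hosts are grouped into cliques
    of size k that probe internally, and one representative per clique
    probes the other representatives (assumes k divides n)."""
    cliques = n // k
    return cliques * full_mesh_probes(k) + full_mesh_probes(cliques)

print(full_mesh_probes(64))            # → 2016
print(two_level_clique_probes(64, 8))  # → 252
```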

  22. Synchronizing network probes • Goals: consistent periodicity and mutual exclusion • The token carries: • List of hosts to probe • Periodicity of the probe • Parameters to the probe • Sequence number • The leader initiates the token • A host, after receiving the token: • Conducts probes with the other hosts in the token • Passes the token to the next host • The token is passed back to the leader

  23. Contd… • The leader notes the token circuit time and calculates the next token initiation time as (desired periodicity – token circuit time) • To avoid long delays in token circulation and to provide fault tolerance: • Each host maintains a timer • When the timer expires, the host declares itself the leader and initiates a new token • When a host encounters two tokens, the older token is destroyed • Calculation of time-outs: • Each host records the token circuit time and its variance • Uses the NWS forecasting models to predict the next token arrival time
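A sketch of the two timing rules above; the mean-plus-deviation timeout is a stand-in for the NWS forecasting models, and the function names and slack factor are illustrative.

```python
from statistics import mean, pstdev

def next_initiation_delay(desired_period, circuit_time):
    """The leader waits (desired periodicity - token circuit time)
    before launching the next token, keeping probes on the desired
    period; a circuit slower than the period means no extra wait."""
    return max(0.0, desired_period - circuit_time)

def token_timeout(circuit_times, slack=3.0):
    """Per-host timeout before declaring itself leader: predict the
    next circuit time as mean plus a few standard deviations (the
    real NWS reuses its own forecasting models here)."""
    return mean(circuit_times) + slack * pstdev(circuit_times)

history = [42.0, 45.0, 41.0, 44.0]   # seconds per token circuit
print(next_initiation_delay(240.0, history[-1]))  # → 196.0
print(token_timeout(history))
```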

  24. New Protocol • A compromise between periodicity and mutual exclusion • The NWS administrator specifies the desired periodicity and an acceptable upper range for it • If network conditions are stable and tokens are received within the upper range, mutual exclusion is guaranteed • If not, hosts time out and start conducting probes with possible collisions • Thus the protocol switches between good and bad phases

  25. Illustration

  26. Comparison of 2 protocols – Experimental setup • 4 machines – 2 in Lyon, France and 2 in Tennessee, USA • 240 second periodicity • 5 second range

  27. Comparison - Periodicity

  28. Comparison – Mutual exclusion

  29. Use of NWS: Scheduling a Jacobi application The problem: choose a partitioning strategy that balances processor efficiencies against communication overheads, i.e. derive the partitions from forecast resource performance

  30. Deriving Partitions for Jacobi • Notations • Per-processor execution time • The goal

  31. Deriving Partitions for Jacobi • Communication time • Solution: a system of linear equations, solved by Gaussian elimination
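Ignoring the communication terms (so the slide's full linear system collapses to a closed form), the balance condition alone gives each processor work proportional to its forecast speed. This simplification is mine; the actual derivation keeps the communication terms and therefore needs Gaussian elimination.

```python
def balanced_partition(total_rows, speeds):
    """Equalise per-processor execution time without communication:
    t_i = rows_i / speed_i for all i, with sum(rows_i) = total_rows,
    gives rows_i = total_rows * speed_i / sum(speeds).
    `speeds` would come from NWS CPU-availability forecasts."""
    s = sum(speeds)
    return [total_rows * sp / s for sp in speeds]

# One fast processor and two at half its forecast speed.
rows = balanced_partition(1000, [1.0, 0.5, 0.5])
print(rows)  # → [500.0, 250.0, 250.0]
```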

  32. NWS in Jacobi

  33. Resource Selection and Scheduling

  34. Resource Selection and Scheduling

  35. References • Implementing a Performance Forecasting System for Metacomputing: The Network Weather Service. Rich Wolski, Neil Spring, Chris Peterson, in Proceedings of SC97, November, 1997. • Dynamically Forecasting Network Performance Using the Network Weather Service. Rich Wolski, in Journal of Cluster Computing, Volume 1, pp. 119-132, January, 1998. • The Network Weather Service: A Distributed Resource Performance Forecasting Service for Metacomputing. Rich Wolski, Neil Spring, and Jim Hayes, Journal of Future Generation Computing Systems, Volume 15, Numbers 5-6, pp. 757-768, October, 1999.

  36. References • Synchronizing Network Probes to avoid Measurement Intrusiveness with the Network Weather Service. B. Gaidioz, R. Wolski, and B. Tourancheau, Proceedings of the 9th IEEE High-performance Distributed Computing Conference, August, 2000, pp. 147-154. • Experiences with Predicting Resource Performance On-line in Computational Grid Settings. Rich Wolski, ACM SIGMETRICS Performance Evaluation Review, Volume 30, Number 4, pp. 41-49, March, 2003.

  37. Forecasting Methods Summary

  38. Prediction Accuracy
