
Validating an Access Cost Model for Wide Area Applications

Validating an Access Cost Model for Wide Area Applications. Louiqa Raschid, University of Maryland. CoopIS 2001. Co-authors: V. Zadorozhny, T. Zhan, and L. Bright.


Presentation Transcript


  1. Validating an Access Cost Model for Wide Area Applications. Louiqa Raschid, University of Maryland. CoopIS 2001. Co-authors: V. Zadorozhny, T. Zhan, and L. Bright.

  2. Scalable Wide-Area Applications
     Problems:
     • Wide area environment is dynamic (noisy)
     • Wide variability in latency (end-to-end delay)
     • Network and server workloads are unknown
     • Time and Day dependencies impact latency
     • Dynamic environment: constantly monitored
     Research Objective: Use query feedback to monitor and learn behavior, and to predict access cost distributions that may be Time and Day dependent.
     L. Raschid, University of Maryland, CoopIS01

  3. Talk Outline
     • Architecture for Wide Area Applications
     • WebPT: Tool to predict access costs
     • WebPT based Access Cost Catalog
     • Grouping of WebSources based on observable WebSource characteristics
     • Hypothesis to test WebPT based Catalog: High Prediction Accuracy versus Low Prediction Accuracy
     • Validation based on experimental case study

  4. Architecture for WebPT based Catalog

  5. Predicting Response Times for Accessing WebSources
     Problem: Difficulty in determining evaluation costs
     • Physical implementation details unknown
     • Load on network and WebSource unknown
     Objective:
     • Use query feedback to learn access costs
     • Exploit Time of day, Day of week, etc., to predict costs
     • Identify easily observable WebSource characteristics
     • Determine prediction accuracy for WebSources based on WebSource characteristics

  6. Metrics in WebPT Access Cost Model
     • WebSource and Network Costs
     • Query Processing at WebSource
     • Downloading data from WebSource (extraction cost)
     • Wrapper Statistics
     • Number of Pages Accessed
     • Cardinality of Result
     • Statistics may be dependent on value of query binding
     • WebPT: a tool for learning using query feedback and predicting access cost based on parameters such as Day, Time, Quantity of data, Cardinality, etc.
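To make the idea of learning from query feedback concrete, here is a minimal sketch in Python. It is not WebPT itself: the `Feedback` record, the `BucketPredictor` class, and the fixed six-hour time buckets are all assumptions made for illustration. It simply averages observed response times per (day, time bucket) and falls back to the overall mean when a bucket has no data.

```python
from collections import defaultdict
from dataclasses import dataclass
from statistics import mean


@dataclass
class Feedback:
    """One query-feedback observation (field names are hypothetical)."""
    day: str            # day of week, e.g. "Mon"
    hour: int           # hour of day, 0-23
    quantity_kb: float  # quantity of data downloaded
    cardinality: int    # cardinality of the result
    response_s: float   # observed access cost in seconds


class BucketPredictor:
    """Averages observed response times per (day, time bucket)."""

    def __init__(self, hours_per_bucket=6):
        self.hours_per_bucket = hours_per_bucket
        self.samples = defaultdict(list)

    def _key(self, day, hour):
        return (day, hour // self.hours_per_bucket)

    def learn(self, fb):
        self.samples[self._key(fb.day, fb.hour)].append(fb.response_s)

    def predict(self, day, hour):
        observed = self.samples.get(self._key(day, hour))
        if observed:
            return mean(observed)
        # No data for this bucket: fall back to the mean over all feedback.
        everything = [x for xs in self.samples.values() for x in xs]
        return mean(everything) if everything else None
```

A real catalog would also use quantity and cardinality as predictors; they are recorded here but not used, to keep the sketch short.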

  7. WebPT Learning

  8. WebPT based Prediction
     • WebPT is configured for some hierarchy of dimensions: Quantity, Day, Time, Cardinality
     • WebPT Learning algorithm
     • Cell splitting
     • Smoothing
     • Estimate response time and confidence
     • Similar to CART (regression versus heuristics)
     • Cell merging
     • Heuristics used in calibration of each cell
     • Dimension: min / max / scale
     • Allowed deviation
     • Confidence window
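The slide names cell splitting but does not give the algorithm, so the following Python sketch shows one plausible reading over a single dimension, under stated assumptions. The `Cell` class, the split-at-midpoint rule, the minimum of four observations, and the `window` of recent observations are all hypothetical choices; smoothing and cell merging are omitted. A cell splits in two when the standard deviation of its observed response times exceeds `allowed_dev`, and a prediction returns the mean together with the standard deviation as a crude confidence indicator.

```python
from statistics import mean, pstdev


class Cell:
    """A range [lo, hi) of one dimension, e.g. hour of day (hypothetical)."""

    def __init__(self, lo, hi, allowed_dev, min_width, window=20):
        self.lo, self.hi = lo, hi
        self.allowed_dev = allowed_dev  # split when the deviation exceeds this
        self.min_width = min_width      # never create cells narrower than this
        self.window = window            # keep only the most recent observations
        self.obs = []                   # (x, response) pairs
        self.children = None
        self.fallback = None            # summary saved at split time

    def _child_for(self, x):
        mid = (self.lo + self.hi) / 2
        return self.children[0] if x < mid else self.children[1]

    def _summary(self):
        rs = [r for _, r in self.obs]
        return (mean(rs), pstdev(rs)) if rs else None

    def learn(self, x, response):
        if self.children:
            self._child_for(x).learn(x, response)
            return
        self.obs = (self.obs + [(x, response)])[-self.window:]
        wide_enough = self.hi - self.lo >= 2 * self.min_width
        if len(self.obs) >= 4 and wide_enough:
            if pstdev([r for _, r in self.obs]) > self.allowed_dev:
                self._split()

    def _split(self):
        mid = (self.lo + self.hi) / 2
        args = (self.allowed_dev, self.min_width, self.window)
        self.fallback = self._summary()
        self.children = (Cell(self.lo, mid, *args), Cell(mid, self.hi, *args))
        for x, r in self.obs:  # hand existing observations to the children
            self._child_for(x).obs.append((x, r))
        self.obs = []

    def predict(self, x):
        """Return (estimated response time, deviation), or None if no data."""
        if self.children:
            return self._child_for(x).predict(x) or self.fallback
        return self._summary()
```

For example, feeding a cell over hours 0 to 24 with fast morning responses and slow evening responses makes it split at hour 12, after which each half predicts its own mean.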

  9. Prediction Accuracy of WebPT based Cost Model is strongly correlated with the following:
     • Observable WebSource Characteristics
     • Significance of Time and Day in predicting workload at the server and on the network
     • Variance (noise) in accessing server
     • Quality of available statistics (cardinality)
     • Random bindings: large variance in cardinality
     • Fixed bindings: better estimation of cardinality

  10. Case Study: Data gathering and Experiment
      • 6 data sources in the public domain
      • Data gathered for several weeks in 1999 and 2000
      • Queries submitted to WebSources periodically
      • Recorded TTF and TTL
      • Query bindings affected result cardinality
      • Random bindings: more than 50 bindings
      • Fixed bindings: 2 bindings each for [S, M, L]
      • Mediator queries: simple scan to complex 5-way join over data in 5 WebSources (not reported)
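The slide does not expand TTF and TTL. Assuming they stand for the time to the first and the time to the last piece of a result, a measurement harness could look like the Python sketch below. The function name `time_stream` and its `clock` parameter are hypothetical; `chunks` is any iterable that yields parts of a result as they arrive.

```python
import time


def time_stream(chunks, clock=time.perf_counter):
    """Return (ttf, ttl, count) for an iterable of result chunks.

    ttf: seconds until the first chunk arrived (None if the stream was empty)
    ttl: seconds until the stream was exhausted
    """
    start = clock()
    ttf = None
    count = 0
    for _ in chunks:
        if ttf is None:
            ttf = clock() - start
        count += 1
    ttl = clock() - start
    return ttf, ttl, count
```

The `clock` parameter exists so the function can be exercised with a fake clock; in normal use the default `time.perf_counter` is enough.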

  11. Observable WebSource Characteristics

  12. Grouping of WebSources based on Characteristics
      • G1: T and D significant; Noise can vary
      • G2: Noise High
      • G3: T, D not significant; Noise Low (EMPTY)
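The grouping rule above can be written as a small Python function. This is a sketch under the assumption that each WebSource has already been reduced to two boolean characteristics; the function and parameter names are hypothetical. Note that G1 and G2 overlap: a source with significant Time and Day effects and high noise belongs to both.

```python
def groups(time_day_significant, noise_high):
    """Return the set of groups (G1, G2, G3) a WebSource falls into."""
    result = set()
    if time_day_significant:
        result.add("G1")  # T and D significant; noise can vary
    if noise_high:
        result.add("G2")  # noise high
    if not time_day_significant and not noise_high:
        result.add("G3")  # T, D not significant; noise low
    return result
```

Every combination of the two flags lands in at least one group, and only the combination "not significant, low noise" lands in G3.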

  13. Hypothesis to test WebPT based Access Cost Catalog
      • H1: High prediction accuracy for the following
        • T, D are significant and Low Noise
        • Sources are in G1 but not in G2
      • H2: Catalog will improve prediction accuracy for the following WebSources
        • T, D are significant independent of noise
        • Group G1
      • H3: Statistics may be dependent on value of query binding
        • Prediction accuracy improves with learning on fixed bindings
        • Sources in both groups

  14. Prediction Accuracy for WebSources: WebPT(Lo), Random bindings

  15. WebSource Characteristics and Correlation With Prediction Accuracy

  16. Groupings of WebSources and Correlation with Prediction Accuracy
      • G1: T and D significant
      • G2: Noise High
      • GNIS: High Pred Accuracy; in G1 AND G2
      • FAA, FishBase: Low Pred Accuracy while in G1; Noisy

  17. Quantile Plots of Relative Error of Prediction for ACM, Aircraft

  18. Quantile Plot of Relative Error of Prediction for FAA, GNIS
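The plots themselves are not in the transcript, and the slides do not define the error measure. As a sketch, assuming relative error means |predicted - observed| / observed, the points of a quantile plot can be computed in Python as follows. Both function names are hypothetical.

```python
def relative_errors(predicted, observed):
    """Relative error per query, assuming |predicted - observed| / observed."""
    return [abs(p - o) / o for p, o in zip(predicted, observed)]


def quantile_points(errors):
    """Return (quantile, error) pairs: the i-th smallest of n errors is
    paired with the fraction (i + 1) / n of queries at or below it."""
    ordered = sorted(errors)
    n = len(ordered)
    return [((i + 1) / n, e) for i, e in enumerate(ordered)]
```

Reading such a plot, a point (0.75, 0.5) says that 75% of the predictions had a relative error of at most 0.5.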

  19. Summary + Impact
      • Unique Case Study: WebPT based Access Cost Catalog and Cost distributions
      • Grouping of WebSources based on observable WebSource characteristics
      • High Prediction Accuracy for some sources in G1 (T, D significant) with low noise
      • High Prediction Accuracy for some sources in G1 and in G2 (High Noise)
      • Similar results for Mediator cost model and complex N-way joins over multiple WebSources
