Benchmarking DBMS’s for Communication Cost Analysis

A Work Term Report Presentation by Tony Young, M.Math Candidate, May 27th, 2005.

Presentation Transcript


  1. Benchmarking DBMS’s for Communication Cost Analysis A Work Term Report Presentation Tony Young M.Math Candidate May 27th, 2005

  2. Introduction • What is a federated system? • Travelocity • Remote searches of airline databases • Performs bookings, adds payment details, etc. • Google Scholar • Remote searches of ACM, IEEE, etc. databases • Presents consolidated view of papers matching common search criteria

  3. Outline • Introduction • Organization • Optimization • Global Cost Modeling • Experiments • Experimental Procedure • Results • Conclusion • Future Work

  4. Organization • Multidatabase Language Approach • Pass-through Querying • Global Schema Approach

  5. Organization • Global schema approach • Burden of integration is on global DBA • Logical global schema • Functional compensation • Possibly high maintenance

  6. Organization • Global Schema Approach • [Figure: physical organization vs. logical organization]

  7. Optimization • Optimization challenges for the FDBS • Remote site autonomy • Remote parameters • Translation • Heterogeneous capabilities • Additional costs • From the perspective of the remote source, the FDBS is just another application requesting data!

  8. Optimization • Omni module in iAnywhere Adaptive Server Anywhere (ASA) • Supports GS approach and pass-through querying • Global queries do not perform as well as local queries

  9. Global Cost Modeling • Many factors must be taken into account • Optimization Cost (OPT) • Communication Cost (COMM) • Execution Cost (EXEC) • Sub-query/Method Call Costs (SM) • Reformatting Costs (RF) • Working Cost Model
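
The slide lists the cost components but does not say how they combine. Assuming, purely for illustration, that they are independent and additive, the estimated cost of a global query could be written as below; interactions between components (for example, prefetching overlapping communication with execution) would break this simple sum.

```latex
\mathrm{Cost}_{\mathrm{global}} = \mathrm{OPT} + \mathrm{COMM} + \mathrm{EXEC} + \mathrm{SM} + \mathrm{RF}
```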

  10. Global Cost Modeling • Interest for this project is communication cost • LS = Link Speed • S = Source/DBMS • DS = Data Size • DT = Data Type • PF = Prefetch Status • PS = Packet Size • R = Processor Speed
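
As a point of comparison for the factors above, a purely network-level estimate might use only LS, DS and PS. The sketch below is a hypothetical baseline, not the report's model: the function `naive_comm_cost` and its per-packet overhead parameter are assumptions, and it deliberately ignores S, DT, PF and R, which is exactly the simplification the experiments set out to test.

```python
import math


def naive_comm_cost(data_size_bytes: int, link_speed_bps: float,
                    packet_size_bytes: int, per_packet_overhead_s: float) -> float:
    """Hypothetical network-only estimate of transfer time, in seconds.

    Uses only DS (data size), LS (link speed) and PS (packet size); the DBMS (S),
    data type (DT), prefetch status (PF) and processor speed (R) are ignored.
    """
    if packet_size_bytes <= 0 or link_speed_bps <= 0:
        raise ValueError("packet size and link speed must be positive")
    packets = math.ceil(data_size_bytes / packet_size_bytes)
    wire_time = data_size_bytes * 8 / link_speed_bps  # serialization time on the link
    return wire_time + packets * per_packet_overhead_s  # plus a fixed cost per packet


# 1 MB over a 100 Mbit/s link in 4 KB packets, assuming 0.1 ms overhead per packet.
print(naive_comm_cost(1_000_000, 100e6, 4096, 1e-4))
```

If measured DBMS transfer times diverge from a baseline like this, the remaining factors (S, DT, PF, R) are doing real work.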

  11. Experiments • Goal • Determine if communication cost can be modeled using simple network applications • Determine what factors affect communication cost • Two sets of experiments • Pure network benchmarking • DBMS benchmarking • Varied each factor mentioned previously, one at a time
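
Varying each factor one at a time amounts to a one-factor-at-a-time design: start from a baseline configuration and change exactly one factor per run. A minimal sketch follows; the function name and the factor levels are hypothetical, since the actual values used in the experiments are not given here.

```python
from typing import Dict, Iterator, List, Tuple

Config = Dict[str, object]


def one_factor_at_a_time(baseline: Config,
                         levels: Dict[str, List[object]]) -> Iterator[Tuple[str, Config]]:
    """Yield (varied_factor, config) pairs for a one-factor-at-a-time design."""
    # The baseline configuration is run once as the reference point.
    yield "baseline", dict(baseline)
    for factor, values in levels.items():
        for value in values:
            if value == baseline[factor]:
                continue  # already covered by the baseline run
            config = dict(baseline)
            config[factor] = value  # change exactly one factor
            yield factor, config


# Hypothetical factor levels for illustration only.
baseline = {"LS": "100 Mbit/s", "DS": 1_000, "PF": "on"}
levels = {"LS": ["10 Mbit/s", "100 Mbit/s"], "DS": [1_000, 100_000], "PF": ["on", "off"]}
for factor, config in one_factor_at_a_time(baseline, levels):
    print(factor, config)
```

The design keeps the number of runs linear in the number of factor levels, at the price of not revealing interactions between factors.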

  12. Experimental Procedure • Hot cache • 30 trials • Experimental error below 5% • Parameters varied during both sets of experiments • Semantics of prefetching for network benchmarking
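
The slide does not define "experimental error". One common reading, assumed here, is the half-width of a 95% confidence interval for the mean, relative to the mean. The sketch below checks a 30-trial measurement against the 5% limit; the function names are hypothetical.

```python
import math
import statistics
from typing import Sequence

# Two-sided 95% Student t critical value for 29 degrees of freedom (i.e., 30 trials).
T_CRIT_95_DF29 = 2.045


def relative_error(trials: Sequence[float], t_crit: float = T_CRIT_95_DF29) -> float:
    """Confidence-interval half-width for the mean, divided by the mean.

    The default t_crit is only correct for exactly 30 trials.
    """
    mean = statistics.mean(trials)
    half_width = t_crit * statistics.stdev(trials) / math.sqrt(len(trials))
    return half_width / mean


def within_tolerance(trials: Sequence[float], limit: float = 0.05) -> bool:
    """True if a 30-trial measurement meets the relative error limit (5% by default)."""
    if len(trials) != 30:
        raise ValueError("the default t critical value assumes exactly 30 trials")
    return relative_error(trials) < limit
```

A measurement that fails the check would simply be rerun, or run with more trials and the matching t value.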

  13. Experimental Procedure • Applications • DBCreate • NetBench • DBBench • ResultParse

  14. Experimental Procedure • Recall the working cost model • Used two types of queries • ROW: SELECT * • MAX: SELECT MAX(COLUMN) • Ensure no indexes were created • Determining communication cost

  15. Experimental Procedure • Recording query execution time

  16. Experimental Procedure • Many ways to calculate communication cost • Similar overhead in both types of queries • Assumptions • Hot cache • Transfer of max() value negligible • Loop evaluation negligible
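
Under the listed assumptions, the ROW and MAX queries scan the same data with similar overhead, so one calculation consistent with these slides is to subtract the MAX timing from the ROW timing, leaving roughly the cost of shipping every row. The sketch below is a hedged illustration of that idea; the function name is hypothetical and the report may use a different calculation.

```python
import statistics
from typing import Sequence


def estimate_comm_cost(row_times: Sequence[float], max_times: Sequence[float]) -> float:
    """Estimate communication cost as mean(T_ROW) - mean(T_MAX).

    Assumes a hot cache, a negligible cost for returning the single max() value,
    and negligible client-side loop overhead, so the two queries differ mainly
    in the cost of sending every row to the client.
    """
    if not row_times or not max_times:
        raise ValueError("need at least one timing for each query type")
    comm = statistics.mean(row_times) - statistics.mean(max_times)
    if comm < 0:
        # A negative difference suggests the assumptions above do not hold.
        raise ValueError("ROW query was faster than MAX query")
    return comm
```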

  17. Results • Results Table • DBMS (S)

  18. Results • Link Speed (LS)

  19. Results • Link Speed (LS)

  20. Results • Data Size (DS)

  21. Results • Data Type (DT)

  22. Results • Prefetch Status (PF)

  23. Results • Packet Size (PS)

  24. Results • Server CPU Speed (CPU)

  25. Results • Other notes • Dominant Factors • Consistency • Efficiency of Link Usage

  26. Conclusion • Many factors need to be included in cost models • Dominant Factors • Affecting Factors • Communication cost is not a pure networking problem

  27. Conclusion • Each DBMS is different in added overhead • Systems are consistent in overhead • Efficiency of link use could improve • Ease of control of the factors • Easily controllable • Not easily controllable • Much work still to be done!

  28. Future Work • Collection of additional data • Generation and testing of a communication cost model • Gathering and analysis of other global cost model parameters

  29. Acknowledgements • iAnywhere for their support • Glenn and Ivan • Support and countless questions • Mike, Anil, Ani, Dan, Matthew • Help and guidance • Mark, Scott and Dave • Hardware loans • Karim, Graham and Ian • Software help • Frank • Arranging the work term and help with the report and talk

  30. Want More? • Check out the work term report at http://www.tonyyoung.ca/wtr.pdf

  31. Optimization • Semijoin algorithm • Site selection • Remote reduction • Global reduction • Assembly • Minimizes communication costs • Exploits heterogeneous capabilities
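
To show why a semijoin reduces communication, the sketch below implements the classic two-site semijoin: ship only the join keys to the remote site, let it reduce its relation, and ship back only matching rows. This is a textbook simplification, not necessarily the exact multi-step algorithm on the slide; the function name, row layout, and the "one item per key or row" cost unit are all assumptions.

```python
from typing import Dict, List, Tuple

Row = Tuple[int, str]  # hypothetical row layout: (join_key, payload)


def semijoin(local: List[Row], remote: List[Row]) -> Tuple[List[Tuple[int, str, str]], int]:
    """Join local and remote rows on key; return (result, items shipped)."""
    # 1. Ship only the distinct join keys of the local relation to the remote site.
    keys = {k for k, _ in local}
    shipped = len(keys)
    # 2. Remote reduction: the remote site keeps only rows whose key matches.
    reduced = [row for row in remote if row[0] in keys]
    shipped += len(reduced)
    # 3. Assembly: join locally against the reduced remote rows.
    index: Dict[int, List[str]] = {}
    for k, payload in reduced:
        index.setdefault(k, []).append(payload)
    result = [(k, lp, rp) for k, lp in local for rp in index.get(k, [])]
    return result, shipped


local = [(1, "a"), (2, "b")]
remote = [(1, "x"), (3, "y"), (4, "z"), (2, "w"), (5, "v")]
print(semijoin(local, remote))
```

Here 4 items cross the network instead of all 5 remote rows; the saving grows as the join becomes more selective.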

  32. Optimization • Replicate algorithm • Site selection • Data transfer • Query execution • Assembly • Minimizes query response time • Exploits varying hardware configurations
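
The site-selection step of a replicate strategy can be sketched as choosing the site whose estimated response time (time to receive the data it lacks plus time to execute) is lowest. This covers only site selection, not data transfer, execution, or assembly; the `Site` fields, the single shared link speed, and the abstract "work units" are hypothetical simplifications.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Site:
    name: str
    cpu_speed: float   # relative processing speed (hypothetical units)
    local_bytes: int   # bytes of the query's input already stored at this site


def replicate_site(sites: List[Site], work_units: float, link_bps: float) -> str:
    """Pick the execution site with the lowest estimated response time."""
    total = sum(s.local_bytes for s in sites)

    def response_time(s: Site) -> float:
        transfer = (total - s.local_bytes) * 8 / link_bps  # ship all other data here
        execute = work_units / s.cpu_speed                 # faster hardware runs sooner
        return transfer + execute

    return min(sites, key=response_time).name


sites = [Site("A", 1.0, 1_000_000), Site("B", 4.0, 10_000)]
print(replicate_site(sites, work_units=1.0, link_bps=100e6))
```

On a fast link the faster machine wins despite receiving more data; on a slow link the site that already holds most of the data wins, which is the hardware-versus-transfer trade-off the slide alludes to.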

  33. Optimization • Difference between semijoin and replicate • Assumptions made • Execution location

  34. Optimization • Garlic • Fire access STAR’s • Fire join STAR’s • Fire FinishRoot STAR • Hybrid of semijoin and replicate algorithms • Large amount of overhead

  35. Motivation • Proliferation of heterogeneous DBMS’s • Data sharing within organizations • Differing rates of technology adoption • Mergers and acquisitions • Geographic separation of teams

  36. Want More? • Check out the work term report at http://www.tonyyoung.ca/wtr.pdf
