1 / 43

Tutorial of Course Project: Distributed Query Engine

Tutorial of Course Project: Distributed Query Engine. Jun Wang( 王军 ) East Main Building 9-216 18901291504 wjun09@mails.tsinghua.edu.cn. Outline. Requirements Benchmark Discussion of Design & Implementation Demo Assignment Q&A. Outline. Requirements Benchmark

Download Presentation

Tutorial of Course Project: Distributed Query Engine

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Tutorial of Course Project:Distributed Query Engine Jun Wang(王军) East Main Building 9-216 18901291504 wjun09@mails.tsinghua.edu.cn

  2. Outline • Requirements • Benchmark • Discussion of Design & Implementation • Demo • Assignment • Q&A DDB

  3. Outline • Requirements • Benchmark • Discussion of Design & Implementation • Demo • Assignment • Q&A DDB

  4. Database Management • Compulsory Commands • SELECT • Fragmentation • Horizontal Fragmentation • Vertical Fragmentation DDB

  5. Architecture • P2P Architecture DDB

  6. Query Processing • SELECT statement • One table & multi-tables (JOIN) • Types of operator in the predicate: >,<,= • Command Parsing • Query Processing • General query tree • Query tree optimization and reduction • Network traffic optimization DDB

  7. User Interface • The user should be able to use the interface to interact with your Distributed Query Engine • Any type of interface • Command Line Interface • Application-based Interface • Web-based Interface • Note: DO NOT focus on the interface design. The interface meets the requirements if: • Let users input the commands • Display the results and additional evaluation metrics DDB

  8. System Outputs • The size of query result set • The optimized query tree • The time cost of query • The communication cost of query DDB

  9. Documentation and Report • Mid-term presentation • Design of the distributed database query engine • Project work plan • Final report • Architecture • Query optimization method • Implementation of communication protocols • System operation specification • Instruction of installation, configuration, and operation of the query engine DDB

  10. System Evaluation • Demonstration Time • 16th Week • SystemTest Environment • Operating system: Windows • Local DBMS: MySQL DDB

  11. Outline • Requirements • Benchmark • Discussion of Design & Implementation • Demo • Assignment • Q&A DDB

  12. Dataset • We simulate a scenario of using distributed database systems. • In general, the followings are provided: • The schema of a database (global tables) • The fragmentation schemes • The allocation DDB

  13. Dataset DDB

  14. Fragmentation • Horizontal Fragmentation • Vertical Fragmentation DDB

  15. Allocation DDB

  16. Overview of Query Processing • Decomposition and Localization • Rewriting: a query  an Algebra tree • Reduction • Optimization • Optimize the cost of data transfer • Execution • Intermediate table storage and access • The TOTAL response time after the user issues a query DDB

  17. Decomposition and Localization • Evaluation Points: • The elimination of useless fragmentations and joins • The global optimization of algebra tree • Example: DDB

  18. Decomposition and Localization DDB

  19. Optimization • Evaluation metric: The amounts (Bytes) of data transfer • You should provide the following information: • The execution plan, where all operations as well as data transfers should be listed in sequence. • The amounts of each data transfer and the sum of amounts of all transfers. Note that the amounts of data transfer is measured by data BYTES before compression (you can compress the transferred data if it is necessary). DDB

  20. Execution • Evaluation metric: total response time • Total response time is the sum of • Time of input receiving • Time of query processing (decomposition, localization and optimization) • Time of result display DDB

  21. Outline • Requirements • Benchmark • Discussion of Design & Implementation • Demo • Assignment • Q&A DDB

  22. Communication Protocols • Access Level • Client-Server Protocols • Server-Server Protocols • How to Design Communication Protocols • Sync vs. Async • Design of commands and responses • How to implement Communication Protocols • Strong vs. Economy • Techniques DDB

  23. Database Management • Global vs. Local • Global Management • Local Management • GDD • Global Information of DDB • Storage Issues • Local DBMS Recommendation • MySQL DDB

  24. Query Processing • Master site • Optimize the query • Formulate execution plan • Broadcast the plan • All sites • Execute commands from Master site • Return results B commands A Client C D • The Crucial Points • Global Optimization • Global Execution Formulation DDB

  25. Other Issues • SQL Statement Parser • Multi-Thread Mechanism • Query Tree Layout and Visualization DDB

  26. Outline • Requirements • Benchmark • Discussion of Design & Implementation • Demo • Assignment • Q&A DDB

  27. Demo For References Only • Authors: • Shoubin Kong 孔守斌 • Jun Wang 王 军 • FangQiang Yu 余芳强 DDB

  28. Implementation Details • Programming Language: Java • Local DBMS: MySQL • Protocol: RMI DDB

  29. An Overview of the System Client End Client User System Communication Protocols Server End DDBMS DDB

  30. Deployment • Client: 127.0.0.1 • Site server 1: 127.0.0.1:40001 • Site server 2: 127.0.0.1:40002 • Site server 3: 127.0.0.1:40003 • Site server 4: 127.0.0.1:40004 DDB

  31. Database Initialization • Use your self-defined commands to initialize the database: • Define the 4 sites over 4 servers • Create the database • Create the tables • Fragment the tables • Allocation each fragmentation to sites 2014/12/1 DDB 31

  32. Commands Define site Create table Fragment Allocate Import Insert / Delete Select 2014/12/1 DDB 32

  33. Summaries • Requirement Driven • Perfect vs. Good Enough • Comparative Advantage • A Central Management Scheme to a Distributed Project DDB

  34. Outline • Requirements • Benchmark • Discussion of Design & Implementation • Demo • Assignment • Q&A DDB

  35. Assignment : Fragmentation • Q1: Select SNO from PARTA, SUPPLY Where PARTS.PNO = SUPPLY.PNO and PARTS.PRICE<6000 • Q2: Select SNAME, PNO from SUPPLIER, SUPPLY Where SUPPLIER.SNO = SUPPLY.SNO and SUPPLIER.COUNTRY = “USA” • Q3: Select SNO, SNAME, COUNT(*) FROM SUPPLIER, SUPPLY Where SUPPLIER.SNO = SUPPLY.SNO group by SUPPLIER.SNO DDB

  36. Assignment : Fragmentation • The Set of Complete and Minimal Simple Predicates {PRICE < 6000, PRICE ≥ 6000, COUNTRY = “USA”, COUNTRY ≠ “USA” } DDB

  37. Assignment : Fragmentation • PART – Horizontal Fragmentation • PARTS1 = σprice<6000PARTS • PARTS2 = σprice≥6000PARTS DDB

  38. Assignment : Fragmentation • SUPPLIER – Horizontal Fragmentation • SUPPLIER1 = σcountry=“USA”SUPPLIER • SUPPLIER2 = σ country≠ “USA” SUPPLIER DDB

  39. Assignment : Fragmentation • SUPPLY – Derived Fragmentation • SUPPLY 1 = (SUPPLY SUPPLIER1) PARTS1 • SUPPLY 2 = (SUPPLY SUPPLIER1) PARTS2 • SUPPLY 3 = (SUPPLY SUPPLIER2) PARTS1 • SUPPLY 4 = (SUPPLY SUPPLIER2) PARTS2 DDB

  40. Assignment : Allocation • a) Solution1 DDB

  41. Assignment : Allocation • a) Solution2 DDB

  42. Assignment : Allocation • b) Solution DDB

  43. Q & AThank You! CLUE

More Related