R*: An Overview of the Architecture

R*: An Overview of the Architecture. R. Williams et al., IBM Almaden Research Center. Outline: Environment and Data Definitions; Object Naming; Distributed Catalogs; Transaction Management and Commit Protocols; Query Preparation; Query Execution; SQL Additions and Changes.


Presentation Transcript


  1. R*: An Overview of the Architecture R. Williams, et al IBM Almaden Research Center

  2. Outline • Environment and Data Definitions • Object Naming • Distributed Catalogs • Transaction Management and Commit Protocols • Query Preparation • Query Execution • SQL Additions and Changes

  3. Environment and Data Definitions • CICS as the underlying communication model • Data distribution: • Dispersed • Replicated • Partitioned: horizontal or vertical • Snapshot
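The two partitioning options can be illustrated on an in-memory table. This is a minimal sketch, not R* storage code: it assumes rows are Python dicts and that every vertical fragment keeps a hypothetical key column (`id` in the usage) so the original rows can be rebuilt by a join. Horizontal fragments are rebuilt by a union.

```python
from typing import Callable

Row = dict[str, object]


def horizontal_partition(rows: list[Row], predicate: Callable[[Row], bool]) -> tuple[list[Row], list[Row]]:
    """Split whole rows into two fragments by a predicate (e.g. one fragment per site)."""
    inside = [r for r in rows if predicate(r)]
    outside = [r for r in rows if not predicate(r)]
    return inside, outside


def vertical_partition(rows: list[Row], key: str, columns: list[str]) -> list[Row]:
    """Project rows onto a column subset, always keeping the key so fragments can be rejoined."""
    return [{key: r[key], **{c: r[c] for c in columns}} for r in rows]


def rejoin_vertical(left: list[Row], right: list[Row], key: str) -> list[Row]:
    """Rebuild the original rows by joining two vertical fragments on the key."""
    by_key = {r[key]: r for r in right}
    return [{**r, **by_key[r[key]]} for r in left]
```

For example, splitting an employee table by a hypothetical `site` column gives horizontal fragments, while separating `name` from `site` and `salary` (both carrying `id`) gives vertical fragments that `rejoin_vertical` puts back together.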

  4. Figure 1 from paper

  5. Figure 21.4 from CS 432 text

  6. Object Naming • System Wide Names (SWN): • USER @ USER_SITE.OBJECT_NAME @ BIRTH_SITE
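A system-wide name packs four components into one string. The sketch below parses and re-formats it, assuming the compact form without the spaces the slide uses for readability and assuming no component itself contains `@` or `.`. The class and function names, and the sample name `BRUCE@SJ.EMP@NY`, are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SystemWideName:
    user: str         # identifier of the user who created the object
    user_site: str    # site qualifying that user identifier
    object_name: str  # name the user gave the object
    birth_site: str   # site where the object was created

    def __str__(self) -> str:
        return f"{self.user}@{self.user_site}.{self.object_name}@{self.birth_site}"


def parse_swn(text: str) -> SystemWideName:
    """Parse USER@USER_SITE.OBJECT_NAME@BIRTH_SITE (assumes no '@' or '.' inside components)."""
    parts = text.split("@")
    if len(parts) != 3 or parts[1].count(".") != 1:
        raise ValueError(f"not a system-wide name: {text!r}")
    user, middle, birth_site = parts
    user_site, object_name = middle.split(".")
    if not all([user, user_site, object_name, birth_site]):
        raise ValueError(f"empty component in: {text!r}")
    return SystemWideName(user, user_site, object_name, birth_site)
```

Because the birth site is part of the name, the name stays valid even if the object is later stored elsewhere; the parser only checks syntax and does not resolve anything.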

  7. Distributed Catalogs • Each site maintains the catalog entries for the objects stored in its own database • Catalog entries may be cached at other sites • Entries are versioned
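Versioning is what makes cached catalog entries safe to use. The sketch below is one possible reading, under stated assumptions: a remote site compiles against its cached copy and remembers the version it used, and the owning site later compares that version with its current one, so a mismatch discards the stale copy and forces a re-fetch. All class and method names are hypothetical, and the dictionary lookup on the owner stands in for a remote message.

```python
from dataclasses import dataclass, field


@dataclass
class OwnerCatalog:
    """Authoritative catalog at the site that stores the objects."""
    entries: dict[str, tuple[int, str]] = field(default_factory=dict)  # name -> (version, description)

    def update(self, name: str, description: str) -> None:
        # Every change to an entry bumps its version number.
        version = self.entries.get(name, (0, ""))[0] + 1
        self.entries[name] = (version, description)

    def current_version(self, name: str) -> int:
        return self.entries[name][0]


@dataclass
class CachingSite:
    """Another site that caches catalog entries and revalidates them by version."""
    owner: OwnerCatalog
    cache: dict[str, tuple[int, str]] = field(default_factory=dict)
    remote_fetches: int = 0

    def lookup(self, name: str) -> tuple[int, str]:
        # Compile time: use the cached copy if present, otherwise fetch from the owner.
        if name not in self.cache:
            self.cache[name] = self.owner.entries[name]
            self.remote_fetches += 1
        return self.cache[name]

    def validate(self, name: str, version_used: int) -> bool:
        # Execution time: a version mismatch means the plan was built on a stale entry.
        if self.owner.current_version(name) != version_used:
            self.cache.pop(name, None)
            return False
        return True
```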

  8. Transaction Management and Commit Protocol • Transaction number: • SITE.SEQ_NUM (or SITE.TIME) • Two-phase commit (2PC)
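Both ideas on this slide fit in a few lines. A transaction number built from the originating site's name plus a local counter is unique system-wide as long as site names are unique. The commit sketch shows only the voting logic of two-phase commit: it keeps everything in memory and leaves out the log writes, timeouts, and failure recovery that a real 2PC implementation, including R*'s, must handle. Class and function names are hypothetical.

```python
import itertools
from dataclasses import dataclass
from typing import Callable


class TransactionNumberer:
    """Issues SITE.SEQ_NUM identifiers for transactions started at one site."""

    def __init__(self, site: str) -> None:
        self.site = site
        self._seq = itertools.count(1)

    def next_id(self) -> str:
        return f"{self.site}.{next(self._seq)}"


@dataclass
class Participant:
    name: str
    can_commit: Callable[[str], bool]  # local decision used as the phase-1 vote
    outcome: str = "active"

    def prepare(self, txn: str) -> bool:
        vote = self.can_commit(txn)
        self.outcome = "prepared" if vote else "aborted"
        return vote

    def finish(self, txn: str, commit: bool) -> None:
        self.outcome = "committed" if commit else "aborted"


def two_phase_commit(txn: str, participants: list[Participant]) -> bool:
    # Phase 1: ask every participant to prepare and collect the votes.
    votes = [p.prepare(txn) for p in participants]
    decision = all(votes)  # a single "no" vote forces an abort
    # Phase 2: send the same decision to every participant.
    for p in participants:
        p.finish(txn, decision)
    return decision
```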

  9. Query Preparation • Name resolution • Authorization check • Distributed compilation • Global plan generation/optimization • Local access path selection • Local optimization • Local view materialization

  10. Figure 2 from paper

  11. Cost Model • 3 weighted components: • I/O • CPU • Message: • # of messages sent • # of bytes sent
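Written as a formula, the estimated cost of a plan is a weighted sum with one term per component, where the message component is split into a per-message and a per-byte term as on the slide. The symbol names are chosen for this sketch; each $W$ is a weight and each $N$ is the estimated count for the plan.

```latex
C_{\text{total}} = W_{\text{CPU}} \cdot N_{\text{inst}}
                 + W_{\text{I/O}} \cdot N_{\text{I/O}}
                 + W_{\text{MSG}} \cdot N_{\text{msg}}
                 + W_{\text{BYTE}} \cdot N_{\text{byte}}
```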

  12. Query Execution • Synchronous vs asynchronous execution • Distributed concurrency control • Deadlock detection and resolution • Crash recovery
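A global deadlock can exist even when no single site sees one: site A may record that T1 waits for T2 while site B records that T2 waits for T1. The sketch below shows only the core cycle test on a wait-for graph. It is a centralized simplification that merges all local graphs in one place, not R*'s distributed detection algorithm; the function name and the graph encoding (transaction mapped to the set of transactions it waits for) are assumptions.

```python
from __future__ import annotations


def find_deadlock(local_graphs: list[dict[str, set[str]]]) -> list[str] | None:
    """Merge per-site wait-for edges and return one cycle of transactions, if any."""
    graph: dict[str, set[str]] = {}
    for local in local_graphs:
        for txn, waits_on in local.items():
            graph.setdefault(txn, set()).update(waits_on)

    WHITE, GREY, BLACK = 0, 1, 2  # unvisited, on the DFS stack, finished
    color = {t: WHITE for t in graph}
    stack: list[str] = []

    def dfs(t: str) -> list[str] | None:
        color[t] = GREY
        stack.append(t)
        for u in sorted(graph.get(t, ())):
            state = color.get(u, WHITE)
            if state == GREY:  # back edge: everything on the stack from u to t is a cycle
                return stack[stack.index(u):]
            if state == WHITE and u in graph:  # nodes with no outgoing edges cannot close a cycle
                found = dfs(u)
                if found:
                    return found
        stack.pop()
        color[t] = BLACK
        return None

    for t in sorted(graph):
        if color[t] == WHITE:
            cycle = dfs(t)
            if cycle:
                return cycle
    return None
```

Resolution would then pick a victim from the returned cycle and abort it; which victim to choose is a policy question this sketch does not address.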

  13. Figure 3 from paper

  14. SQL Additions and Changes • DEFINE SYNONYM • DISTRIBUTE TABLE • HORIZONTALLY • VERTICALLY • REPLICATED • DEFINE SNAPSHOT • REFRESH SNAPSHOT • MIGRATE TABLE

  15. R* Optimizer Validation and Performance Evaluation for Distributed Queries. Lothar F. Mackert, Guy M. Lohman, IBM Almaden Research Center

  16. Outline • Distributed Compilation/Optimization • Instrumentation • Experiments and Results

  17. Distributed Compilation/Optimization • Issues: • Join site • Transfer methods: • ship whole • fetch matches • Cost model
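The two transfer methods trade data volume against round trips. This toy model counts messages and shipped inner tuples for each; it assumes that shipping the whole inner table is one bulk transfer and that fetch matches costs one request plus one reply per outer tuple. Real R* packs tuples into messages and uses its own cost formulas, so the counts only show the shape of the trade-off. Rows are hypothetical dicts joined on a key column.

```python
from collections import defaultdict


def ship_whole(outer: list[dict], inner: list[dict], key: str):
    """Send the entire inner table to the outer site once, then join locally."""
    messages = 1           # modeled as a single bulk transfer (assumption)
    shipped = len(inner)   # every inner tuple crosses the network
    index = defaultdict(list)
    for row in inner:
        index[row[key]].append(row)
    result = [(o, i) for o in outer for i in index[o[key]]]
    return result, messages, shipped


def fetch_matches(outer: list[dict], inner: list[dict], key: str):
    """For each outer tuple, send its join value to the inner site and fetch the matches."""
    messages = shipped = 0
    result = []
    for o in outer:
        matches = [i for i in inner if i[key] == o[key]]  # work done at the inner site
        messages += 2                                     # request + reply
        shipped += len(matches)
        result.extend((o, i) for i in matches)
    return result, messages, shipped
```

In this model, fetch matches ships fewer tuples when few inner tuples match, but its message count grows with the size of the outer table, which is why the choice belongs in the cost model rather than being fixed.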

  18. Weights Estimation • CPU: inverse of MIPS • I/O: avg seek, latency, transfer time • MSG: # of instructions per message • BYTE: effective transmission speed of network
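The weight estimates above can be turned into numbers that plug into the weighted-sum cost model. This sketch assumes every weight is expressed in seconds so the terms can be added: the CPU weight is seconds per instruction, the I/O weight is seek plus latency plus transfer time per page, the message weight is instructions per message converted to seconds via the CPU weight, and the byte weight is the inverse of the effective network speed. The class, function names, and any hardware figures used with them are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class HardwareProfile:
    mips: float                  # processor speed, millions of instructions per second
    avg_seek_s: float            # average seek time, seconds
    avg_latency_s: float         # average rotational latency, seconds
    page_transfer_s: float       # time to transfer one page, seconds
    instructions_per_msg: float  # CPU instructions spent per message
    bytes_per_s: float           # effective network transmission speed, bytes/second


def weights(hw: HardwareProfile) -> dict[str, float]:
    w_cpu = 1.0 / (hw.mips * 1e6)  # seconds per instruction
    return {
        "CPU": w_cpu,
        "IO": hw.avg_seek_s + hw.avg_latency_s + hw.page_transfer_s,  # seconds per I/O
        "MSG": hw.instructions_per_msg * w_cpu,                       # seconds per message
        "BYTE": 1.0 / hw.bytes_per_s,                                 # seconds per byte
    }


def total_cost(w: dict[str, float], inst: float, ios: float, msgs: float, nbytes: float) -> float:
    """Weighted sum of the four estimated counts for one plan."""
    return w["CPU"] * inst + w["IO"] * ios + w["MSG"] * msgs + w["BYTE"] * nbytes
```

With a hypothetical 1 MIPS processor, 32 ms per page I/O, 10,000 instructions per message, and a 1 MB/s network, a plan estimated at 10^6 instructions, 10 I/Os, 4 messages, and 10^5 bytes costs about 1.46 seconds, with CPU time dominating.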

  19. Figure 2 from paper

  20. Instrumentation • Distributed EXPLAIN • Distributed COLLECT COUNTERS • Force optimizer

  21. Experiment I • Transfer method • Merge-scan join of 2 tables: • 500 tuples in each table • 50% projection on both tables • 100 different values for join attribute • Join result: 2477 tuples

  22. Figure 4 from paper

  23. Figure 3 from paper

  24. Experiment II • Distributed vs local join • Join of 2 tables: • 1000 tuples in each table • 50% projection on both tables • 3000 different values for join attribute

  25. Figure 5 from paper

  26. Figure 6 from paper

  27. Experiment III • Relative importance of cost components

  28. Figure 7, 8, 9, 10 from paper

  29. Experiment IV • Optimizer evaluation • Accurate estimates of # of msgs and bytes sent (<2% difference) • Better estimates when tables are more distributed

  30. Experiment V • Alternative distributed join methods: • Dynamically created indexes • Semijoins • Bloomjoins • 2 tables: • 1000 tuples in the outer table • Inner table varies from 100 to 6000 tuples
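Semijoins and bloomjoins both reduce the inner table before it is shipped; they differ in what the outer site sends first. In the sketch, a semijoin sends the set of distinct outer join values, and the inner site returns only exact matches. A bloomjoin sends an m-bit Bloom filter instead, and the inner site returns every tuple whose hashed bits are all set, which may include false positives that the final join discards. The filter size `m`, the number of hash functions `k`, and the SHA-256-based hashing are assumptions for illustration, not the parameters used in the experiments.

```python
import hashlib


def _bit_positions(value: object, m: int, k: int) -> list[int]:
    # k deterministic positions in an m-bit vector, derived from SHA-256 digests
    return [int.from_bytes(hashlib.sha256(f"{i}:{value!r}".encode()).digest()[:8], "big") % m
            for i in range(k)]


def semijoin_reduce(outer_keys: set, inner: list[dict], key: str) -> list[dict]:
    """Inner site keeps only tuples whose join value was shipped from the outer site."""
    return [row for row in inner if row[key] in outer_keys]


def bloom_filter(outer_keys: set, m: int = 64, k: int = 3) -> int:
    """Outer site builds an m-bit filter (held in an int) over its join values."""
    bits = 0
    for v in outer_keys:
        for pos in _bit_positions(v, m, k):
            bits |= 1 << pos
    return bits


def bloom_reduce(bits: int, inner: list[dict], key: str, m: int = 64, k: int = 3) -> list[dict]:
    """Inner site keeps tuples whose bits are all set; false positives are possible."""
    return [row for row in inner
            if all((bits >> pos) & 1 for pos in _bit_positions(row[key], m, k))]


def final_join(outer: list[dict], reduced_inner: list[dict], key: str) -> list[tuple[dict, dict]]:
    """Back at the outer site: an exact join removes any false positives."""
    return [(o, i) for o in outer for i in reduced_inner if o[key] == i[key]]
```

The filter's size is fixed at m bits regardless of how many distinct join values the outer table has, while the semijoin's first shipment grows with that count.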

  31. Figure 11, 12 from paper

  32. Other Experiments • Clustered index: • Bloomjoins < Semijoins < R* • 50% Projection: • Site 1: Bloomjoins < Semijoins < R* • Site 2: Bloomjoins < R* << Semijoins • Wider join column: • Bloomjoins < R* << Semijoins
