R*: An Overview of the Architecture • R. Williams, et al. • IBM Almaden Research Center
Outline • Environment and Data Definitions • Object Naming • Distributed Catalogs • Transaction Management and Commit Protocols • Query Preparation • Query Execution • SQL Additions and Changes
Environment and Data Definitions • CICS as the underlying communication model • Data distribution: • Dispersed • Replicated • Partitioned • Horizontal • Vertical • Snapshot
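A minimal Python sketch (with a hypothetical EMP table and column split) of the difference between horizontal and vertical partitioning of a table across sites:

```python
# Hypothetical EMP table: rows of (emp_no, name, dept, salary).
emp = [
    (1, "Ada",   "dev",   120),
    (2, "Grace", "dev",   130),
    (3, "Edgar", "sales",  90),
]

# Horizontal partitioning: each site stores a disjoint subset of the rows
# (here split on the dept column).
site_a = [row for row in emp if row[2] == "dev"]
site_b = [row for row in emp if row[2] == "sales"]

# Vertical partitioning: each site stores a subset of the columns,
# keyed by emp_no so the fragments can be rejoined.
site_c = [(emp_no, name) for emp_no, name, _, _ in emp]
site_d = [(emp_no, dept, salary) for emp_no, _, dept, salary in emp]
```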
Object Naming • System Wide Names (SWN): • USER@USER_SITE.OBJECT_NAME@BIRTH_SITE
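For illustration only (the user, site, and table names below are made up), a system-wide name and how it decomposes into its parts:

```python
# Hypothetical SWN: user BOB, registered at site SJ, owns table EMP whose birth site is also SJ.
swn = "BOB@SJ.EMP@SJ"

user_part, object_part = swn.split(".", 1)        # "BOB@SJ", "EMP@SJ"
user, user_site = user_part.split("@")            # "BOB", "SJ"
object_name, birth_site = object_part.split("@")  # "EMP", "SJ"
```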
Distributed Catalogs • Each site's catalog maintains entries for the objects stored in its local database • Catalog entries may be cached at remote sites • Entries are versioned, so stale cached copies can be detected
Transaction Management and Commit Protocol • Transaction number: • SITE.SEQ_NUM (or SITE.TIME) • Two-phase commit (2PC)
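A minimal sketch of the two-phase commit flow from the coordinator's side; the participant interface (prepare/commit/abort) is assumed for illustration and is not R*'s actual message format:

```python
def two_phase_commit(coordinator_log, participants):
    # Phase 1: ask every participant to prepare (force its log and vote yes/no).
    votes = [p.prepare() for p in participants]

    if all(votes):
        # Unanimous yes: force the commit decision to stable storage, then notify all sites.
        coordinator_log.write("COMMIT")
        for p in participants:
            p.commit()
        return "committed"

    # Any no vote (or a timeout treated as a no): abort the transaction everywhere.
    coordinator_log.write("ABORT")
    for p in participants:
        p.abort()
    return "aborted"
```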
Query Preparation • Name resolution • Authorization check • Distributed compilation • Global plan generation/optimization • Local access path selection • Local optimization • Local view materialization
Cost Model • 3 weighted components: • I/O • CPU • Message • # of messages sent • # of bytes sent
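A sketch of how these components might be combined into a single plan cost; the weight names and function signature are illustrative, not R*'s actual code:

```python
def plan_cost(ios, instructions, messages, bytes_sent, w):
    """Weighted-sum cost: I/O + CPU + messages, with the message term
    split into a per-message and a per-byte part."""
    return (w["io"] * ios
            + w["cpu"] * instructions
            + w["msg"] * messages       # per-message overhead
            + w["byte"] * bytes_sent)   # per-byte transmission cost
```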
Query Execution • Synchronous vs asynchronous execution • Distributed concurrency control • Deadlock detection and resolution • Crash recovery
SQL Additions and Changes • DEFINE SYNONYM • DISTRIBUTE TABLE • HORIZONTALLY • VERTICALLY • REPLICATED • DEFINE SNAPSHOT • REFRESH SNAPSHOT • MIGRATE TABLE
R* Optimizer Validation and Performance Evaluation for Distributed Queries • Lothar F. Mackert, Guy M. Lohman • IBM Almaden Research Center
Outline • Distributed Compilation/Optimization • Instrumentation • Experiments and Results
Distributed Compilation/Optimization • Issues: • Join site • Transfer methods: • Ship whole • Fetch matches • Cost model
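A rough contrast of the two transfer methods; the site objects and their scan/read_table/fetch methods are hypothetical stand-ins for inter-site messages:

```python
def ship_whole(outer_site, inner_site, join_col):
    """Ship the entire inner table to the join site once, then join locally."""
    inner_rows = inner_site.read_table()          # one bulk transfer
    index = {}
    for row in inner_rows:
        index.setdefault(row[join_col], []).append(row)
    for outer_row in outer_site.scan():
        for inner_row in index.get(outer_row[join_col], []):
            yield outer_row, inner_row

def fetch_matches(outer_site, inner_site, join_col):
    """For each outer tuple, fetch only the matching inner tuples
    (a message exchange per outer tuple, but no unneeded data shipped)."""
    for outer_row in outer_site.scan():
        for inner_row in inner_site.fetch(join_col, outer_row[join_col]):
            yield outer_row, inner_row
```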
Weights Estimation • CPU: inverse of the MIPS rate • I/O: average seek, latency, and transfer time • MSG: # of instructions per message • BYTE: effective transmission speed of the network
Instrumentation • Distributed EXPLAIN • Distributed COLLECT COUNTERS • Force optimizer (to evaluate specific plans)
Experiment I • Transfer method • Merge-scan join of 2 tables: • 500 tuples in each table • Project both tables – 50% • 100 different values for the join attribute • Join result: 2477 tuples
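For reference, a minimal sketch of the merge-scan (sort-merge) join used in these experiments, assuming both inputs are already sorted on the join attribute; the row and key handling is illustrative:

```python
def merge_scan_join(left, right, key):
    """Join two lists of rows, each sorted ascending on `key`."""
    i = j = 0
    while i < len(left) and j < len(right):
        if left[i][key] < right[j][key]:
            i += 1
        elif left[i][key] > right[j][key]:
            j += 1
        else:
            # Pair every left row in this equal-key run with every right row in it.
            k = j
            while k < len(right) and right[k][key] == left[i][key]:
                k += 1
            while i < len(left) and left[i][key] == right[j][key]:
                for m in range(j, k):
                    yield left[i], right[m]
                i += 1
            j = k
```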
Experiment II • Distributed vs local join • Join of 2 tables: • 1000 tuples in each table • Project both tables – 50% • 3000 different values for the join attribute
Experiment III • Relative importance of cost components
Experiment IV • Optimizer evaluation • Accurate estimates of # of msgs and bytes sent (<2% difference) • Better estimates when tables are more distributed
Experiment V • Alternative distributed join methods: • Dynamically created indexes • Semijoins • Bloomjoins • 2 tables: • 1000 tuples for the outer • Vary the inner from 100 to 6000 tuples
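A simplified Bloomjoin sketch; the hash function, number of hashes, and bit-vector size are illustrative choices, and the "shipping" between sites is just a local call here:

```python
import hashlib

def _bit_positions(value, m, k=3):
    # k hash positions in a bit vector of size m (illustrative hashing, not R*'s).
    digest = hashlib.sha256(str(value).encode()).digest()
    return [int.from_bytes(digest[4 * i:4 * i + 4], "big") % m for i in range(k)]

def bloomjoin(outer, inner, key, m=8192):
    """Build a Bloom filter over the outer join column, 'ship' it to the inner
    site, ship back only inner tuples that pass the filter, then join locally."""
    # 1. Outer site: set the filter bits for every outer join value.
    bits = [False] * m
    for row in outer:
        for pos in _bit_positions(row[key], m):
            bits[pos] = True

    # 2. Inner site: return only tuples whose join value passes the filter.
    candidates = [row for row in inner
                  if all(bits[pos] for pos in _bit_positions(row[key], m))]

    # 3. Outer site: final join; any Bloom-filter false positives drop out here.
    index = {}
    for row in candidates:
        index.setdefault(row[key], []).append(row)
    return [(o, c) for o in outer for c in index.get(o[key], [])]
```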
Other Experiments ("<" = lower cost) • Clustered index: • Bloomjoins < Semijoins < R* • 50% Projection: • Site 1: Bloomjoins < Semijoins < R* • Site 2: Bloomjoins < R* << Semijoins • Wider join column: • Bloomjoins < R* << Semijoins