
Gamma DBMS (Part 2): Failure Management and Query Processing


Presentation Transcript


  1. Gamma DBMS (Part 2): Failure Management and Query Processing. Shahram Ghandeharizadeh, Computer Science Department, University of Southern California

  2. Failure Management Techniques • Teradata’s Interleaved Declustering • A partitioned table has a primary and a backup copy. • The primary copy is constructed using one of the partitioning techniques. • The secondary copy is constructed by: • Dividing the nodes into clusters (cluster size is 4), and • Partitioning a primary fragment (e.g., R0 on node 0) across the remaining nodes of its cluster (1, 2, and 3), realizing sub-fragments r0.0, r0.1, and r0.2 (see the sketch below).
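
The placement rule can be made concrete with a small Python sketch. The cluster size of 4 follows the slide, while the function name, numbering scheme, and round-robin interleaving below are illustrative assumptions rather than Teradata's actual implementation:

```python
# Sketch of the interleaved-declustering placement rule; cluster size of 4
# follows the slide, but the helper names and numbering are assumptions.

CLUSTER_SIZE = 4

def backup_placement(num_nodes, cluster_size=CLUSTER_SIZE):
    """Map each backup sub-fragment ri.j to the node of i's cluster that stores it."""
    placement = {}
    for i in range(num_nodes):
        start = (i // cluster_size) * cluster_size      # first node of i's cluster
        offset = i - start                              # i's position within its cluster
        placement[f"R{i}"] = {
            # Sub-fragment ri.j goes to the (j+1)-th node after node i, wrapping
            # around within the cluster; node i never stores its own backup.
            f"r{i}.{j}": start + (offset + j + 1) % cluster_size
            for j in range(cluster_size - 1)
        }
    return placement

if __name__ == "__main__":
    # R0 is backed up as r0.0, r0.1, r0.2 on nodes 1, 2, 3; R1 as r1.0, r1.1, r1.2
    # on nodes 2, 3, 0, matching the fragments read during recovery on the slides.
    print(backup_placement(num_nodes=4))
```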

  3. Teradata’s Interleaved Declustering • When a node (say node 1) fails, its backup copy processes the requests directed towards the primary copy of R1. • Its three backup sub-fragments r1.2, r1.0, and r1.1 reside on nodes 0, 2, and 3. • Note that the load of R1 is distributed across the remaining nodes of the cluster.

  4. Teradata’s Interleaved Declustering • MTTR involves: • Replacing the failed node with a new one. • Reconstructing the primary copy of the fragment assigned to the failed node, R1. • By reading r1.2, r1.0, and r1.1 from Nodes 0, 2, and 3. • Reconstructing the backup fragments assigned to the failed node: r0.0, r2.2, and r3.1.

  5. Teradata’s Interleaved Declustering • When does data become unavailable?

  6. Teradata’s Interleaved Declustering • When does data become unavailable? • When a second node in a cluster fails prior to repair of the first failed node in that cluster. • Note that it is a bit more complex than the discussion here.

  7. Teradata’s Interleaved Declustering • What is the advantage of making the cluster size equal to 8?

  8. Teradata’s Interleaved Declustering • What is the advantage of making the cluster size equal to 8? • Better distribution of the workload across the nodes in the presence of a failure.

  9. Teradata’s Interleaved Declustering • What is the advantage of making the cluster size equal to 8? • Better distribution of the workload across the nodes in the presence of a failure. • What is the disadvantage of making the cluster size equal to 8?

  10. Teradata’s Interleaved Declustering • What is the advantage of making the cluster size equal to 8? • Better distribution of the workload across the nodes in the presence of a failure. • What is the disadvantage of making the cluster size equal to 8? • Higher likelihood of data becoming unavailable.

  11. Teradata’s Interleaved Declustering • What is the advantage of making the cluster size equal to 8? • Better distribution of the workload across the nodes in the presence of a failure. • What is the disadvantage of making the cluster size equal to 8? • Higher likelihood of data becoming unavailable. • Tradeoff between load-balancing (in the presence of a failure) and data availability.

  12. Gamma’s Chained Declustering • Nodes are divided into disjoint groups called relation clusters. • A relation is assigned to one relation cluster and its records are declustered across the nodes of that relation cluster using a partitioning strategy (Range, Hash). • Given a primary fragment Ri, its backup copy is assigned to node (i+1) mod M (M is the number of nodes in the relation cluster).
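
A minimal Python sketch of this assignment rule, assuming nodes within a relation cluster are numbered 0 through M-1:

```python
# Chained declustering within one relation cluster of M nodes: primary fragment
# Ri lives on node i, and its backup copy ri lives on node (i + 1) mod M.
# A small sketch, not Gamma's actual code.

def chained_placement(M):
    """For each node of the relation cluster, list the fragments it stores."""
    nodes = {i: {"primary": f"R{i}", "backup": None} for i in range(M)}
    for i in range(M):
        nodes[(i + 1) % M]["backup"] = f"r{i}"
    return nodes

if __name__ == "__main__":
    for node, frags in chained_placement(4).items():
        print(node, frags)   # e.g., node 1 holds primary R1 and the backup r0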

  13. Gamma’s Chained Declustering • During the normal mode of operation: • Read requests are directed to the fragments of the primary copy, • Write requests update both primary and backup copies.

  14. Gamma’s Chained Declustering • In the presence of a failure: • Both primary and backup fragments are used for read operations, • Objective: Balance the load and avoid bottlenecks! • Write requests update both primary and backup copies. • Note: • The load of R1 (on node 1) is pushed to node 2 in its entirety. • A fraction of the read requests directed to each node is then pushed to the others, so that each node incurs only a 1/8 load increase attributable to node 1’s failure.

  15. Gamma’s Chained Declustering • MTTR involves: • Replace node 1 with a new node, • Reconstruct R1 (from r1 on node 2) on node 1, • Reconstruct backup copy of R0 (i.e., r0) on node 1. • Note: • Once Node 1 becomes operational, primary copies are used to process read requests.

  16. Gamma’s Chained Declustering • Failure of any two non-adjacent nodes in a relation cluster does not result in data unavailability. • Two adjacent nodes must fail in order for data to become unavailable.

  17. Gamma’s Chained Declustering • Re-assignment of active fragments incurs neither disk I/O nor data movement.

  18. Join • Hash-join • A data-flow execution paradigm

  19. Example: Join of Emp and Dept • Emp join Dept (using dno). [Figure: sample Emp and Dept tables]

  20. Hash-Join: 1 Node • Join of Tables A and B using attribute j (A.j = B.j) consists of two phases: • Build phase: Build a main-memory hash table on Table A using the join attribute j, e.g., build a hash table on the Dept table using dno as the key of the hash table. • Probe phase: Scan Table B one record at a time and use its attribute j to probe the hash table constructed on Table A, e.g., probe the hash table using the rows of the Emp table.
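
A minimal single-node sketch of the two phases in Python, assuming both tables fit in memory; the rows and column names below are illustrative:

```python
# In-memory hash join of Dept (build side) and Emp (probe side) on dno.

def hash_join(dept_rows, emp_rows):
    # Build phase: hash table on the inner table (Dept), keyed by the join attribute.
    hash_table = {}
    for d in dept_rows:
        hash_table.setdefault(d["dno"], []).append(d)

    # Probe phase: scan the outer table (Emp) one record at a time and probe.
    results = []
    for e in emp_rows:
        for d in hash_table.get(e["dno"], []):
            results.append({**d, **e})          # emit a joined record on each match
    return results

if __name__ == "__main__":
    dept = [{"dno": 1, "dname": "Toy"}, {"dno": 2, "dname": "Shoe"}]
    emp = [{"name": "Ann", "dno": 1}, {"name": "Bob", "dno": 2}, {"name": "Cat", "dno": 1}]
    print(hash_join(dept, emp))
```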

  21. Hash-Join: Build • Read the rows of the Dept table one at a time and place them in a main-memory hash table (hash function: dno % 7).

  22. Hash-Join: Probe • Read the rows of the Emp table and probe the hash table.

  23. Hash-Join: Probe • Read the rows of the Emp table, probe the hash table, and produce a result when a match is found.

  24. Hash-Join: Probe • The termination condition is when all rows of the Emp table have been processed!

  25. Hash-Join • Key challenge:

  26. Hash-Join • Key challenge: The table used to build the hash table does not fit in main memory! • Solution:

  27. Hash-Join • Key challenge: The table used to build the hash table does not fit in main memory! • A divide-and-conquer approach (a simplified sketch follows this list): • Use the inner table (Dept) to construct n memory buckets where each bucket is a hash table. • Every time memory is exhausted, spill a fixed number of buckets to disk. • The build phase terminates with a set of in-memory buckets and a set of disk-resident buckets. • Read the outer relation (Emp) and probe the in-memory buckets for joining records. Records that map onto the disk-resident buckets are streamed and stored to disk. • Discard the in-memory buckets to free memory space. • While disk-resident buckets of the inner relation exist: • Read as many (say i) of the disk-resident buckets of the inner relation into memory as possible. • Read the corresponding buckets of the outer relation (Emp) and probe the in-memory buckets for joining records. • Discard the in-memory buckets to free memory space. • Delete the i buckets of the inner and outer relations.
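
A simplified Python sketch of this divide-and-conquer scheme: it partitions both tables up front into a fixed number of buckets, keeps one Dept bucket in memory, and uses plain lists to stand in for disk-resident buckets; the on-demand spilling, reloading several buckets at a time, and actual disk I/O described above are omitted.

```python
N_BUCKETS = 2

def bucket_of(row):
    return row["dno"] % N_BUCKETS

def partitioned_hash_join(dept_rows, emp_rows):
    # Build: bucket the inner table; bucket 0 becomes an in-memory hash table.
    dept_buckets = [[] for _ in range(N_BUCKETS)]
    for d in dept_rows:
        dept_buckets[bucket_of(d)].append(d)
    in_memory = {}
    for d in dept_buckets[0]:
        in_memory.setdefault(d["dno"], []).append(d)

    # Probe: Emp rows mapping to bucket 0 probe immediately; the rest are
    # streamed to their (simulated) disk-resident bucket.
    results = []
    emp_buckets = [[] for _ in range(N_BUCKETS)]
    for e in emp_rows:
        b = bucket_of(e)
        if b == 0:
            results += [{**d, **e} for d in in_memory.get(e["dno"], [])]
        else:
            emp_buckets[b].append(e)

    # While disk-resident buckets remain: rebuild a hash table for each Dept
    # bucket and probe it with the corresponding Emp bucket.
    for b in range(1, N_BUCKETS):
        table = {}
        for d in dept_buckets[b]:
            table.setdefault(d["dno"], []).append(d)
        results += [{**d, **e} for e in emp_buckets[b] for d in table.get(e["dno"], [])]
    return results

if __name__ == "__main__":
    dept = [{"dno": 1, "dname": "Toy"}, {"dno": 2, "dname": "Shoe"}]
    emp = [{"name": "Ann", "dno": 1}, {"name": "Bob", "dno": 2}, {"name": "Cat", "dno": 1}]
    print(partitioned_hash_join(dept, emp))
```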

  28. Hash-Join: Build • Two buckets of the Dept table: one in memory and the second disk-resident.

  29. Hash-Join: Probe • Read the Emp table and probe the hash table for joining records when dno = 1; rows with dno = 2 are streamed to disk.

  30. Hash-Join: Probe • The rows of the Emp table with dno = 1 probe the hash table and produce 3 joining records.

  31. Hash-Join: While loop • Read the disk-resident bucket of Dept into memory.

  32. Hash-Join: While loop • Read the disk-resident bucket of Dept into memory. • Probe it with the disk-resident bucket of the Emp table to produce the remaining two joining records.

  33. Parallelism and Hash-Join • Each node may perform the hash-join independently when: • The join attribute is the declustering attribute of the tables participating in the join operation. • The participating tables are declustered across the same number of nodes using the same declustering strategy. • Even then, the system may re-partition the tables (see the next bullet) if its aggregate memory exceeds that of the nodes the tables are currently declustered across. • Otherwise, the data must be re-partitioned to perform the join operation correctly (a sketch follows). • Show an example!
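
A minimal sketch of the re-partitioning step: each row of either table is routed to the node that owns the hash of its join attribute, so joining rows of both tables meet on the same node. The node count, attribute name, and row format are illustrative assumptions:

```python
NUM_NODES = 4

def destination(row, join_attr="dno"):
    # Hash the join attribute to pick the receiving node.
    return hash(row[join_attr]) % NUM_NODES

def repartition(rows, join_attr="dno"):
    """Group rows by the node that should receive them."""
    per_node = {n: [] for n in range(NUM_NODES)}
    for row in rows:
        per_node[destination(row, join_attr)].append(row)
    return per_node

# After re-partitioning, each node runs the single-node hash join of the earlier
# slides on the R and S rows it received; identical join-attribute values always
# land on the same node, so no joining pair is missed.
```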

  34. Parallelism and Hash-Join (Cont…) • R join S where R is the inner table.

  35. Data Flow Execution Paradigm • Retrieve all those employees working for the Toy department: SELECT * FROM Dept d, Emp e WHERE d.dno = e.dno AND d.dname = 'Toy'

  36. Data Flow Execution Paradigm • Producer/Consumer relationship where consumers are activated in advance of the producers.

  37. Data Flow Execution Paradigm • A “Split Table” contains routing information for the records. • The consumers must be set up in order to activate the producers (a sketch follows).
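
A minimal sketch of the idea, with hypothetical names standing in for Gamma's actual operator plumbing: the split table maps a hash of the partitioning attribute to a consumer that must already be set up before the producer starts pushing records.

```python
# Sketch of a split table: routing information that maps a record (via a hash of
# its partitioning attribute) to the consumer that should receive it. In Gamma
# the consumers are operator processes on other nodes and the split table drives
# network routing; here plain in-memory queues stand in for them.

class SplitTable:
    def __init__(self, consumers, attr):
        self.consumers = consumers           # consumer queues, set up before producers run
        self.attr = attr                     # partitioning attribute used for routing

    def route(self, record):
        dest = hash(record[self.attr]) % len(self.consumers)
        self.consumers[dest].append(record)  # producer pushes the record downstream

if __name__ == "__main__":
    queues = [[], []]                        # two consumers, activated in advance
    split = SplitTable(queues, attr="dno")
    for rec in ({"dno": 1, "name": "Ann"}, {"dno": 2, "name": "Bob"}):
        split.route(rec)
    print(queues)
```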
