
CSG131 Summary



  1. CSG131 Summary Donghui Zhang

  2. What we learned • Storage (Chp 8 & 9) • B+-tree (Chp 10) • Hash index (Chp 11) • External sort (Chp 13) • Query evaluation • Concurrency control (Chp 17) • Recovery (Chp 18) • Project: simple DBMS implementation, NEUStore • SB-tree, R-tree.

  3. Components of a Disk [Diagram: platters on a spindle, disk heads on a movable arm assembly, tracks and sectors on each platter surface.] • The platters spin (say, 90 rps). • The arm assembly is moved in or out to position a head on a desired track. Tracks under the heads make a cylinder (imaginary!). • Only one head reads/writes at any one time. • Block size is a multiple of sector size (which is fixed).

  4. Accessing a Disk Page • Time to access (read/write) a disk block: • seek time (moving arms to position disk head on track) • rotational delay (waiting for block to rotate under head) • transfer time (actually moving data to/from disk surface) • Seek time and rotational delay dominate. • Seek time varies from about 1 to 20 msec • Rotational delay varies from 0 to 10 msec • Transfer rate is about 1 msec per 4 KB page • Key to lower I/O cost: reduce seek/rotation delays! Hardware vs. software solutions?
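As a rough illustration of why seek and rotation dominate, here is a back-of-the-envelope sketch (in Java, used for all sketches in this summary); the constants are illustrative midpoints of the ranges quoted on the slide, not measured values.

```java
// Rough disk-access model: access time = seek + rotational delay + transfer.
// The constants are illustrative midpoints of the ranges quoted on the slide.
public class DiskAccessTime {
    static final double SEEK_MS = 10.0;              // assumed average seek (1-20 msec range)
    static final double ROTATION_MS = 5.0;           // assumed average rotational delay (0-10 msec)
    static final double TRANSFER_MS_PER_PAGE = 1.0;  // ~1 msec per 4 KB page

    // Reading `pages` 4 KB pages scattered randomly: pay seek + rotation every time.
    static double randomReadMs(int pages) {
        return pages * (SEEK_MS + ROTATION_MS + TRANSFER_MS_PER_PAGE);
    }

    // Reading `pages` pages laid out sequentially: one seek, one rotational delay,
    // then (roughly) pure transfer -- this is why reducing seek/rotation matters.
    static double sequentialReadMs(int pages) {
        return SEEK_MS + ROTATION_MS + pages * TRANSFER_MS_PER_PAGE;
    }

    public static void main(String[] args) {
        System.out.printf("1000 random 4 KB reads:     %.0f ms%n", randomReadMs(1000));
        System.out.printf("1000 sequential 4 KB reads: %.0f ms%n", sequentialReadMs(1000));
    }
}
```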

  5. Index Types • A primary index is an index which controls the actual storage of a table. Typically this index is built using the primary key of the table. Here, a data entry is one record of the table. • A secondary index is an index which is built using some other attribute(s). Here, a data entry contains a set of RIDs.

  6. An Example [Diagram: a primary index storing the records on pages 10, 20, and 30, and a secondary index with data entries 21: {10, 20}, 33: {10, 20, 30}, 55: {10, 30}, each key paired with the set of pages holding matching records.]

  7. Buffer Management in a DBMS [Diagram: page requests from higher levels are served from a buffer pool of frames in main memory; pages are read from disk into free frames, and the choice of frame to evict is dictated by the replacement policy.] • Data must be in RAM for the DBMS to operate on it! • For each page in the pool, keep a dirty bit and a pin count.

  8. Page Formats: Variable Length Records [Diagram: slotted page i with a slot directory at its end; each slot stores a record offset (e.g. 20, 16, 24), and the directory also holds the number of slots N and a pointer to the start of free space. Rid = (page id, slot #), e.g. (i,1), (i,2), (i,N).] • Can move records on the page without changing the rid; so, attractive for fixed-length records too.

  9. What we learned • Storage (Chp 8 & 9) • B+-tree (Chp 10) • Hash index (Chp 11) • External sort (Chp 13) • Query evaluation • Concurrency control (Chp 17) • Recovery (Chp 18) • Project: simple DBMS implementation, NEUStore • SB-tree, R-tree

  10. Example B+ Tree [Diagram: an example B+ tree; the root and index nodes hold keys 5, 13, 17, 24, 30, and the leaf level holds data entries 2*, 3*, 5*, 7*, 8*, 14*, 16*, 19*, 20*, 22*, 24*, 27*, 29*, 33*, 34*, 38*, 39*.]

  11. B+ Trees in Practice • Typical order: 100. Typical fill-factor: 67%. • average fanout = 133 • Can often hold top levels in buffer pool: • Level 1 = 1 page = 8 KB • Level 2 = 133 pages = 1 MB • Level 3 = 17,689 pages = 145 MB • Level 4 = 2,352,637 pages = 19 GB • With 1 MB buffer, can locate one record in 19 GB (or 0.3 billion records) in two I/Os!
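The per-level numbers above can be reproduced with a few lines; this sketch assumes only the slide's figures (fanout 133, 8 KB pages).

```java
// Reproduces the back-of-the-envelope B+ tree sizing on this slide:
// with average fanout 133 and 8 KB pages, level i has 133^(i-1) pages.
public class BPlusTreeSizing {
    public static void main(String[] args) {
        final long fanout = 133;          // order 100 at ~67% fill gives ~133 children/node
        final long pageBytes = 8 * 1024;
        long pages = 1;                   // level 1 is just the root
        for (int level = 1; level <= 4; level++) {
            double sizeMB = pages * pageBytes / (1024.0 * 1024.0);
            System.out.printf("Level %d: %,d pages (~%.1f MB)%n", level, pages, sizeMB);
            pages *= fanout;
        }
        // With levels 1 and 2 (about 1 MB) pinned in the buffer pool, reaching a
        // level-4 leaf entry costs two I/Os: one level-3 page plus one leaf page.
    }
}
```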

  12. B+-tree index • Structure • Search • Insert • Delete • Bulk-loading a B+-tree • Aggregation Query • SB-tree

  13. What we learned • Storage (Chp 8 & 9) • B+-tree (Chp 10) • Hash index (Chp 11) • External sort (Chp 13) • Query evaluation • Concurrency control (Chp 17) • Recovery (Chp 18) • Project: simple DBMS implementation, NEUStore • SB-tree, R-tree

  14. Extendible Hashing [Diagram: a directory of 4 entries (global depth 2) whose slots 00, 01, 10, 11 point to data pages Bucket A {32*, 16*, 4*, 12*}, Bucket B {1*, 5*, 21*, 13*}, Bucket C {10*}, Bucket D {15*, 7*, 19*}, each with local depth 2.] • Directory is an array of size 4. • To find the bucket for r, take the last `global depth' # bits of h(r); we denote r by h(r). • If h(r) = 5 = binary 101, it is in the bucket pointed to by 01. • Insert: If the bucket is full, split it (allocate a new page, re-distribute). • If necessary, double the directory. (As we will see, splitting a bucket does not always require doubling; we can tell by comparing global depth with local depth for the split bucket.)
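A minimal sketch of the directory lookup just described (take the last global-depth bits of h(r) as the directory index); the Bucket type and hash function are stand-ins, not NEUStore classes.

```java
import java.util.List;

// Minimal extendible-hashing directory lookup: use the last `globalDepth` bits
// of h(key) as an index into the directory. Bucket is a stand-in type, and the
// hash function is the identity, matching the slide's "we denote r by h(r)".
public class ExtendibleDirectory {
    static class Bucket {
        int localDepth;
        List<Integer> entries;
    }

    private Bucket[] directory;   // length is always 2^globalDepth
    private int globalDepth;

    int hash(int key) { return key; }

    Bucket bucketFor(int key) {
        int mask = (1 << globalDepth) - 1;    // keep only the last globalDepth bits
        return directory[hash(key) & mask];   // e.g. h(r) = 5 = 101b, depth 2 -> entry 01
    }

    // On an overflow, doubling the directory is needed only when the overflowing
    // bucket's local depth already equals the global depth.
    boolean splitRequiresDoubling(Bucket b) {
        return b.localDepth == globalDepth;
    }
}
```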

  15. Insert h(r)=20 (Causes Doubling) [Diagram, before and after: inserting 20 overflows Bucket A {32*, 16*, 4*, 12*}; since A's local depth equals the global depth (2), the directory doubles from 4 entries to 8 (global depth 3). Bucket A splits into Bucket A {32*, 16*} and its `split image' Bucket A2 {4*, 12*, 20*}, both with local depth 3; Buckets B {1*, 5*, 21*, 13*}, C {10*}, and D {15*, 7*, 19*} keep local depth 2.]

  16. Example of Linear Hashing • Let hLevel = (a*key+b) % 4, hLevel+1 = (a*key+b) % 8. • On split, hLevel+1 is used to re-distribute entries. [Diagram, Level=0, N=4: the left picture (for illustration only) shows the hash values; the right picture shows the actual contents of the linear hashed file. Before: Next=0, primary pages 00 {32*, 44*, 36*}, 01 {9*, 25*, 5*}, 10 {30*, 10*, 14*, 18*}, 11 {31*, 35*, 7*, 11*} with overflow page {43*}. If 22 is inserted (22 mod 4 = 2, so it belongs in the full bucket 10), bucket Next=0 is split using hLevel+1: 32* stays in bucket 000, 44* and 36* move to the new bucket 100, and Next advances to 1.]

  17. Example of Linear Hashing (cont.) • Let hLevel = (a*key+b) % 4, hLevel+1 = (a*key+b) % 8. [Diagram, Level=0: after further inserts and splits, Next=2; primary pages 000 {32*}, 001 {9*, 25*}, 010 {30*, 10*, 14*, 18*} plus overflow {22*}, 011 {31*, 35*, 7*, 11*} plus overflow {43*}, 100 {44*, 36*}, 101 {5*}.] • Important notice: after buckets 010 and 011 are split, Next moves back to 000!
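A small sketch of bucket selection in a linear-hashed file, following the rule these two slides illustrate: apply hLevel, and if the bucket number falls before Next (already split), re-hash with hLevel+1. The choice a=1, b=0 is an assumption used only for the demo values.

```java
// Bucket selection in linear hashing, following the slide's two hash functions
// h_Level = (a*key+b) % N and h_{Level+1} = (a*key+b) % 2N.
public class LinearHashing {
    final int a = 1, b = 0;   // illustrative choice; the slide leaves a and b abstract
    int level = 0;
    int n = 4;                // N: number of primary buckets when Level 0 started
    int next = 0;             // Next: the next bucket to split

    int hLevel(int key)      { return (a * key + b) % (n << level); }
    int hLevelPlus1(int key) { return (a * key + b) % (n << (level + 1)); }

    // A bucket numbered below Next has already been split at this level,
    // so its entries were re-distributed using h_{Level+1}.
    int bucketFor(int key) {
        int bucket = hLevel(key);
        if (bucket < next) bucket = hLevelPlus1(key);
        return bucket;
    }

    // After splitting bucket Next, advance it; once every bucket of this level
    // has been split, Next moves back to 0 and a new level begins.
    void finishSplit() {
        next++;
        if (next == (n << level)) { next = 0; level++; }
    }

    public static void main(String[] args) {
        LinearHashing lh = new LinearHashing();
        lh.next = 2;                          // the state shown on slide 17
        System.out.println(lh.bucketFor(5));  // 5 % 4 = 1 < Next, so re-hash: 5 % 8 = 5 (bucket 101)
        System.out.println(lh.bucketFor(22)); // 22 % 4 = 2, not yet split -> bucket 10
    }
}
```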

  18. What we learned • Storage (Chp 8 & 9) • B+-tree (Chp 10) • Hash index (Chp 11) • External sort (Chp 13) • Query evaluation • Concurrency control (Chp 17) • Recovery (Chp 18) • Project: simple DBMS implementation, NEUStore • SB-tree, R-tree

  19. General External Merge Sort • More than 3 buffer pages. How can we utilize them? • To sort a file with N pages using B buffer pages: • Pass 0: use B buffer pages. Produce ⌈N/B⌉ sorted runs of B pages each. • Pass 1, 2, …, etc.: merge B-1 runs at a time. [Diagram: B-1 input buffers and 1 output buffer in main memory, streaming runs from disk and writing the merged run back to disk.]

  20. Cost of External Merge Sort • Number of passes: 1 + ⌈ log_(B-1) ⌈N/B⌉ ⌉ • Cost = 2N * (# of passes) • E.g., with 5 buffer pages, to sort a 108-page file: • Pass 0: ⌈108/5⌉ = 22 sorted runs of 5 pages each (last run is only 3 pages) • Pass 1: ⌈22/4⌉ = 6 sorted runs of 20 pages each (last run is only 8 pages) • Pass 2: 2 sorted runs, 80 pages and 28 pages • Pass 3: sorted file of 108 pages • Note: for merge join, no need to write the result.
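The pass count and total I/O cost can be computed directly from N and B; this sketch reproduces the 108-page, 5-buffer example above.

```java
// I/O cost of external merge sort: pass 0 builds ceil(N/B) sorted runs, each later
// pass merges B-1 runs at a time, and every pass reads and writes N pages.
public class ExternalSortCost {
    static int ceilDiv(int x, int y) { return (x + y - 1) / y; }

    static int numPasses(int n, int b) {
        int runs = ceilDiv(n, b);   // after pass 0
        int passes = 1;
        while (runs > 1) {          // each merge pass shrinks the run count by a factor of B-1
            runs = ceilDiv(runs, b - 1);
            passes++;
        }
        return passes;
    }

    public static void main(String[] args) {
        int n = 108, b = 5;
        int passes = numPasses(n, b);                           // 4, as on the slide
        System.out.println("passes   = " + passes);
        System.out.println("I/O cost = " + (2 * n * passes));   // 2 * 108 * 4 = 864
    }
}
```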

  21. What we learned • Storage (Chp 8 & 9) • B+-tree (Chp 10) • Hash index (Chp 11) • External sort (Chp 13) • Query evaluation • Concurrency control (Chp 17) • Recovery (Chp 18) • Project: simple DBMS implementation, NEUStore • SB-tree, R-tree

  22. Query Compiling / Query Optimization / Query Execution [Flowchart: SQL query → parse → parse tree → convert → logical query plan → apply laws → "improved" l.q.p → estimate result sizes → l.q.p. + sizes → consider physical plans → {P1, P2, …} → estimate costs → {(P1,C1), (P2,C2), …} → pick best → Pi → execute → answer; statistics feed the size and cost estimation steps. Parsing and conversion form query compiling; applying laws through picking the best plan form query optimization; running the chosen plan is query execution.]

  23. Query Compiling → Convert • SELECT E.Name FROM Emp E WHERE E.SSN<5000 AND E.Age>50 • Input: a parse tree. • Output: a logical query plan. • Algorithm: σ followed by π. • πE.Name(σE.SSN<5000 AND E.Age>50(E)) • Alternatively, an l.q.p tree: πE.Name over σE.SSN<5000 AND E.Age>50 over Emp E.

  24. Query Optimization → Consider Physical Plans • SELECT E.Name FROM Emp E WHERE E.SSN<5000 AND E.Age>50 • Associate each RA operator with an implementation scheme. • Multiple implementation schemes? Enumerate all. • Plan 1 (always works!): on-the-fly π and σ over a scan of Emp E.

  25. Query Optimization → Consider Physical Plans • SELECT E.Name FROM Emp E WHERE E.SSN<5000 AND E.Age>50 • Plan 2: on-the-fly π, with the selection evaluated as a range search in the SSN index.

  26. Query Optimization → Consider Physical Plans • SELECT E.Name FROM Emp E WHERE E.SSN<5000 AND E.Age>50 • Plan 3: on-the-fly π, with a range search in the Age index, following the pointers into the SSN (primary) index.

  27. Query Optimization → Estimate Costs: sample input • SELECT E.Name FROM Emp E WHERE E.SSN<5000 AND E.Age>50 • Assume for table E: • Schema = (SSN: int, Name: string, Age: int, Salary: int) • T(E) = 100 tuples. • For attribute SSN: V(E, SSN)=100, min(E, SSN)=0000, max(E, SSN)=9999 • For attribute Age: V(E, Age)=20, min(E, Age)=21, max(E, Age)=60 • Primary index on SSN: 3-level B+-tree, 50 leaf nodes. • Secondary index on Age: 2-level B+-tree, 10 leaf nodes, every leaf entry points to 3.5 pageIDs (on average). • Assumptions: all B+-tree roots are pinned; we can reach the first leaf page of a B+-tree directly. • Memory buffer size: 2 pages.

  28. Query Optimization → Estimate Costs • SELECT E.Name FROM Emp E WHERE E.SSN<5000 AND E.Age>50 • Plan 1 (always works!): on-the-fly π and σ over a scan of Emp E. • Cost = 50. (The primary index has 50 leaf nodes; assume we can reach the first leaf page of a B+-tree directly.)

  29. Query Optimization → Estimate Costs • SELECT E.Name FROM Emp E WHERE E.SSN<5000 AND E.Age>50 • Plan 2: on-the-fly π, range search in the SSN index. • Cost = 25. SSN<5000 selects half of the employees, so 50/2 = 25 leaf nodes. • Note: if the condition were E.SSN>5000, it would need 1 more I/O.

  30. Query Optimization → Estimate Costs • SELECT E.Name FROM Emp E WHERE E.SSN<5000 AND E.Age>50 • Plan 3: on-the-fly π, range search in the Age index, following pointers into the SSN index. • Cost = ⌈10/4⌉ + ⌈(20/4) * 3.5⌉ = 3 + 18 = 21. (The first term is the # of I/Os in the Age index: Age>50 covers 1/4 of the Age range, i.e. 1/4 of the 10 leaf nodes. The second term is the # of I/Os in the SSN index: 20/4 = 5 qualifying Age values, each leaf entry pointing to 3.5 pageIDs on average.)

  31. Query Optimization → Pick Best • SELECT E.Name FROM Emp E WHERE E.SSN<5000 AND E.Age>50 • Pick the plan with the lowest estimated cost: Plan 3 (cost 21), versus Plan 2 (25) and Plan 1 (50).
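A compact sketch that recomputes the three plan costs from the slide-27 statistics and picks the minimum; the formulas simply transcribe the hand calculation on slides 28-31.

```java
// Recomputes the example plan costs from the slide-27 statistics and picks the
// cheapest, mirroring the hand calculation on slides 28-31.
public class PickBestPlan {
    public static void main(String[] args) {
        int ssnLeaves = 50;           // primary index on SSN: 50 leaf nodes
        int ageLeaves = 10;           // secondary index on Age: 10 leaf nodes
        int vAge = 20;                // V(E, Age): number of distinct Age values
        double ridsPerEntry = 3.5;    // pageIDs per Age leaf entry (average)

        double ssnSel = 0.5;                 // SSN < 5000 covers half of [0000, 9999]
        double ageSel = (60 - 50) / 40.0;    // Age > 50 covers 1/4 of [21, 60]

        int plan1 = ssnLeaves;                                      // full leaf scan: 50
        int plan2 = (int) Math.ceil(ssnLeaves * ssnSel);            // 25
        int plan3 = (int) Math.ceil(ageLeaves * ageSel)             // 3 I/Os in the Age index
                  + (int) Math.ceil(vAge * ageSel * ridsPerEntry);  // + 18 in the SSN index = 21

        System.out.printf("Plan 1: %d, Plan 2: %d, Plan 3: %d -> pick %d%n",
                plan1, plan2, plan3, Math.min(plan1, Math.min(plan2, plan3)));
    }
}
```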

  32. Query Compiling → Convert • SELECT E.Name, D.Dname FROM Emp E, Dept D WHERE E.Did=D.Did AND E.SSN<5000 AND D.budget=1000 • Algorithm: × then σ then π. • πE.Name, D.Dname(σE.Did=D.Did AND E.SSN<5000 AND D.budget=1000(E × D)) • The l.q.p tree: π over σ over (Emp E × Dept D).

  33. Query Compiling → Apply Laws • SELECT E.Name, D.Dname FROM Emp E, Dept D WHERE E.Did=D.Did AND E.SSN<5000 AND D.budget=1000 • Always, always: (try to) replace × with a join ⋈! • Also, push σ down. [Resulting l.q.p tree: πE.Name, D.Dname over (σE.SSN<5000(Emp E) ⋈E.Did=D.Did σD.budget=1000(Dept D)).]

  34. Four Join Algorithms • Iteration join (nested loop join) • Merge join • Hash join • Join with index

  35. Example: E ⋈ D over common attribute Did • E: • T(E)=10,000 • primary index on SSN, 3 levels. • |E| = 1,000 leaf nodes. • D: • T(D)=5,000 • primary index on Did, 3 levels. • |D| = 500 leaf nodes. • Memory available = 101 blocks

  36. Iteration Join • Use our memory: (1) read 100 blocks of D; (2) read all of E (using 1 block) + join; (3) repeat until done. • I/O cost = |D| + |D|/100 * |E| = 500 + 5*1000 = 5,500.

  37. Merge Join Cost • Recall that |E|=1000, |D|=500, and D is already sorted on Did. • External sort E: pass 0, by reading and writing E, produces a file with 10 sorted runs. Another read is enough. • No need to write! Can pipeline to the join operator. • Cost = 3*1000 + 500 = 3,500.

  38. Simple example • Hash function: even/odd. • R1 = {2, 4, 3, 5, 8, 9}, R2 = {5, 4, 12, 3, 13, 8, 11, 14}. • Even bucket: R1 {2, 4, 8}, R2 {4, 12, 8, 14}. • Odd bucket: R1 {3, 5, 9}, R2 {5, 3, 13, 11}.

  39. Hash Join Cost • Read + write both E and D for partitioning, then read to join. • Cost = 3 * (1000 + 500) = 4,500.

  40. Join with index • (Conceptually) For each r ∈ E do: find the corresponding D tuple by probing the index. • Assuming the root is pinned in memory, Cost = |E| + T(E)*2 = 1,000 + 10,000*2 = 21,000.
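The four I/O-cost formulas from slides 36-40 side by side, for the E ⋈ D example (|E|=1,000, |D|=500, T(E)=10,000, 101 memory blocks):

```java
// I/O cost formulas for the four join algorithms on slides 36-40, using the
// example statistics |E| = 1000, |D| = 500, T(E) = 10000, memory = 101 blocks.
public class JoinCosts {
    public static void main(String[] args) {
        int pagesE = 1000, pagesD = 500, tuplesE = 10000, mem = 101;

        // Iteration (block nested-loop) join: read D in chunks of mem-1 blocks,
        // scanning all of E once per chunk.
        int chunks = (int) Math.ceil(pagesD / (double) (mem - 1));
        int iteration = pagesD + chunks * pagesE;        // 500 + 5 * 1000 = 5,500

        // Merge join: D is already sorted on Did; E needs pass 0 (read + write)
        // plus one more read that pipelines straight into the merge.
        int merge = 3 * pagesE + pagesD;                 // 3,500

        // Hash join: read + write both relations to partition them, then read both.
        int hash = 3 * (pagesE + pagesD);                // 4,500

        // Join with index: scan E, probe D's 3-level Did index per tuple (root pinned).
        int index = pagesE + tuplesE * 2;                // 21,000

        System.out.printf("iteration=%d  merge=%d  hash=%d  index=%d%n",
                iteration, merge, hash, index);
    }
}
```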

  41. What we learned • Storage (Chp 8 & 9) • B+-tree (Chp 10) • Hash index (Chp 11) • External sort (Chp 13) • Query evaluation • Concurrency control (Chp 17) • Recovery (Chp 18) • Project: simple DBMS implementation, NEUStore • SB-tree, R-tree

  42. ACID Properties of Transactions • Atomicity: all actions are carried out, or none. • Consistency: each transaction preserves the consistency of the database if executed by itself. (The users make sure of this, e.g. when transferring money.) • Isolation: transactions are isolated from the effects of other concurrently scheduled transactions. • Durability: the effect of a committed transaction should last, even if the system crashes before all changes are flushed to disk.

  43. Scheduling Transactions • Serial schedule: a schedule that does not interleave the actions of different transactions. • Equivalent schedules: for any database state, the effect (on the set of objects in the database) of executing the first schedule is identical to the effect of executing the second schedule. • Serializable schedule: a schedule that is equivalent to some serial execution of the transactions. (Note: if each transaction preserves consistency, every serializable schedule preserves consistency.)

  44. Example • Conflict serializable ⟹ serializable. • Serializable does not imply conflict serializable; e.g. T1: R(A), W(A), Commit; T2: W(A), Commit; T3: W(A), Commit. • Precedence graph: a node for each transaction; an arc from Ti to Tj if an action in Ti precedes and conflicts with an action in Tj. [Diagram: precedence graph over T1, T2, T3 for the schedule above.]
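A minimal sketch of the precedence-graph test: add an arc for every pair of conflicting actions from different transactions, then look for a cycle. The action encoding and the particular interleaving of T1, T2, T3 are illustrative assumptions.

```java
import java.util.*;

// Conflict-serializability test: build the precedence graph (an arc Ti -> Tj
// whenever an action of Ti precedes and conflicts with a later action of Tj,
// i.e. both touch the same object and at least one is a write), then look for a cycle.
public class PrecedenceGraph {
    record Action(int txn, char kind, String obj) {}   // kind is 'R' or 'W'

    static boolean conflictSerializable(List<Action> schedule) {
        Map<Integer, Set<Integer>> arcs = new HashMap<>();
        for (int i = 0; i < schedule.size(); i++) {
            for (int j = i + 1; j < schedule.size(); j++) {
                Action a = schedule.get(i), b = schedule.get(j);
                boolean conflict = a.txn() != b.txn() && a.obj().equals(b.obj())
                        && (a.kind() == 'W' || b.kind() == 'W');
                if (conflict) {
                    arcs.computeIfAbsent(a.txn(), k -> new HashSet<>()).add(b.txn());
                }
            }
        }
        for (Integer start : arcs.keySet()) {   // conflict serializable iff the graph is acyclic
            if (reaches(arcs, start, start, new HashSet<>())) return false;
        }
        return true;
    }

    static boolean reaches(Map<Integer, Set<Integer>> arcs, Integer from, Integer target,
                           Set<Integer> seen) {
        for (Integer next : arcs.getOrDefault(from, Set.of())) {
            if (next.equals(target)) return true;
            if (seen.add(next) && reaches(arcs, next, target, seen)) return true;
        }
        return false;
    }

    public static void main(String[] args) {
        // One plausible interleaving of the slide's transactions:
        // R1(A), W2(A), W1(A), W3(A) -- serializable but not conflict serializable.
        List<Action> s = List.of(new Action(1, 'R', "A"), new Action(2, 'W', "A"),
                new Action(1, 'W', "A"), new Action(3, 'W', "A"));
        System.out.println(conflictSerializable(s));   // false: the graph has T1 -> T2 -> T1
    }
}
```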

  45. Deadlock Prevention • Assign priorities based on timestamps. Assume Ti wants a lock that Tj holds. Two policies are possible: • Wait-Die: If Ti has higher priority, Ti waits for Tj; otherwise Ti aborts. • Wound-Wait: If Ti has higher priority, Tj aborts; otherwise Ti waits. • If a transaction re-starts, make sure it has its original timestamp.
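A tiny sketch of the two policies, assuming the usual convention that an older (smaller) timestamp means higher priority; the enum and method names are mine, not from any library.

```java
// Wait-Die vs. Wound-Wait, for a transaction Ti requesting a lock held by Tj.
// Priority is by timestamp: an older (smaller) timestamp means higher priority.
public class DeadlockPrevention {
    enum Outcome { TI_WAITS, TI_ABORTS, TJ_ABORTS }

    static Outcome waitDie(long tsI, long tsJ) {
        return tsI < tsJ ? Outcome.TI_WAITS     // Ti has higher priority: it waits
                         : Outcome.TI_ABORTS;   // otherwise Ti "dies" (aborts)
    }

    static Outcome woundWait(long tsI, long tsJ) {
        return tsI < tsJ ? Outcome.TJ_ABORTS    // Ti has higher priority: it "wounds" Tj
                         : Outcome.TI_WAITS;    // otherwise Ti waits
    }

    public static void main(String[] args) {
        System.out.println(waitDie(1, 2));     // TI_WAITS
        System.out.println(woundWait(2, 1));   // TI_WAITS
        // A restarted transaction keeps its original timestamp, so it eventually
        // becomes the oldest and cannot starve.
    }
}
```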

  46. Multiple-Granularity Locks [Hierarchy: Database contains Tables, which contain Pages, which contain Tuples.] • Why? If a transaction needs to scan all records in a table, do we really want to have a lock on all tuples individually? Significant locking overhead! • Put a single lock on the table! A lock on a node implicitly locks all descendants.

  47. Allowed Sharings (lock compatibility matrix) • -- (no lock) is compatible with every mode. • IS is compatible with --, IS, IX, and S (not X). • IX is compatible with --, IS, and IX. • S is compatible with --, IS, and S. • X is compatible only with --.

  48. Tree Locking Algorithm • Search: Start at root and go down; repeatedly, S lock child then unlock parent. • Insert/Delete: Start at root and go down, obtaining X locks as needed. Once child is locked, check if it is safe: • If child is safe, release all locks on ancestors. • Safe node: Node such that changes will not propagate up beyond this node. • Inserts: Node is not full. • Deletes: Node is not half-empty.
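A sketch of the insert-side lock crabbing described above: X-lock the child, and once the child is safe (not full), release all ancestor locks. The Node type and its fields are hypothetical placeholders, not NEUStore classes.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.concurrent.locks.ReentrantLock;

// Lock "crabbing" for B+ tree inserts: take X locks going down, and release all
// ancestor locks as soon as the child is safe (an insert cannot split a non-full
// node, so no change can propagate above it).
public class TreeLocking {
    static class Node {                       // hypothetical node type, not a NEUStore class
        boolean leaf = true;                  // placeholder: a fresh Node is a leaf
        int numEntries, maxEntries;
        Node[] children = new Node[0];        // placeholder fan-out
        final ReentrantLock lock = new ReentrantLock();   // stands in for an X lock
        Node childFor(int key) { return children[0]; }    // placeholder navigation
    }

    // Safe for insert: the node is not full, so a split cannot propagate past it.
    static boolean safeForInsert(Node n) { return n.numEntries < n.maxEntries; }

    static Node descendForInsert(Node root, int key) {
        Deque<Node> locked = new ArrayDeque<>();
        root.lock.lock();
        locked.push(root);
        Node cur = root;
        while (!cur.leaf) {
            Node child = cur.childFor(key);
            child.lock.lock();                // lock the child before releasing ancestors
            locked.push(child);
            if (safeForInsert(child)) {       // child is safe: unlock everything above it
                while (locked.size() > 1) {
                    locked.removeLast().lock.unlock();   // highest remaining ancestor first
                }
            }
            cur = child;
        }
        return cur;   // caller inserts into this leaf, then releases the remaining locks
    }
}
```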

  49. Example • Do: 1) Search 38*; 2) Delete 38*; 3) Insert 45*; 4) Insert 25*. [Diagram: an example B+ tree with root A, internal nodes, and leaves labeled B through I; index keys 20, 35, 23, 38, 44 and leaf entries 20*, 22*, 23*, 24*, 35*, 36*, 38*, 41*, 44*.]

  50. Kung-Robinson Model • Xacts have three phases: • READ: Xacts read from the database, but make changes to private copies of objects. • VALIDATE: Check for conflicts. • WRITE: Make local copies of changes public. [Diagram: the ROOT points to the old objects; modified objects are private copies that become the new versions at write time.]
