CSCE 520 Test 2 Info Indexing

CSCE 520 Test 2 InfoIndexing Modified from slides of Hector Garcia-Molina and Jeff Ullman

Physical Storage Media • Speed of data access • Cost per unit of data • Reliability • Data loss (power failure or system crash) • Physical failure (storage device) • Storage types • Volatile storage • Non-volatile storage

Memory Hierarchy Programs, Main Memory DBMS Tertiary Storage DBMS Disk Virtual Memory File System Main Memory Cache

Disk Access Characteristics • Move data to main memory: • Position head on cylinder • Find and access sector • Steps of reading a block: • Processor and disk controller processes the request • Seek time: position the head • Rotation latency: rotate the sector under the head • Transfer time: sector/block read by the head

Disk Access Characteristics • Steps of writing a block: • Read the block into the main memory • Change main memory copy of block • Write new content back on disk • Verify correctness of write

How to find records efficiently? • Primary key – sequential organization • Search key? • High I/O cost  INDEXING

Cost of Indexing • Where the time spent on answering a query • Fast: processing in memory • Slow: fetching from secondary storage • Cost of indexing: • Index on several attributes: fast retrieval but slow writes (maintain index structure)

Topics • Conventional indexes • B-trees • Hashing schemes (read only)

10 30 50 70 90 20 40 60 80 100 Sequential File

70 50 30 10 110 90 20 40 100 80 120 60 70 50 30 10 90 80 60 40 20 100 Sequential File Dense Index

130 90 50 10 210 170 30 70 190 150 230 110 70 50 30 10 90 80 60 40 20 100 Sequential File Sparse Index

210 50 130 10 170 490 90 170 330 10 110 30 570 70 410 150 190 90 230 250 30 90 70 50 10 100 80 60 40 20 Sequential File Sparse 2nd level

Sparse vs. Dense Tradeoff • Sparse: Less index space per record can keep more of index in memory • Dense: Can tell if any record exists without accessing file

Terms • Index sequential file • Search key ( primary key) • Primary index (on Sequencing field) • Secondary index • Dense index (all Search Key values in) • Sparse index • Multi-level index

Next: • Duplicate keys • Deletion/Insertion • Secondary indexes

10 10 20 30 40 10 20 30 30 45 Duplicate keys

10 40 10 20 30 30 40 20 10 10 45 20 30 10 30 30 30 45 10 20 Duplicate keys Dense index, one way to implement? 10 10 10 10 10 10 20 20 20 20 30 30 30 30 30 30

10 10 20 30 40 10 20 30 30 45 Duplicate keys Dense index, better way? 10 20 30 40

10 40 10 30 20 30 30 20 10 45 careful if looking for 20 or 30! Duplicate keys Sparse index, one way? 10 10 20 30

10 40 10 30 20 30 30 20 10 45 should this be 40? Duplicate keys Sparse index, another way? • place first new key from block 10 20 30 30

Duplicate values, primary index Summary • Index may point to first instance of each value only File Index a a a . . b

70 10 30 50 20 40 60 80 Deletion from sparse index 10 30 50 70 90 110 130 150

70 10 30 50 20 40 60 80 Deletion from sparse index • delete record 40 10 30 50 70 90 110 130 150

10 30 50 70 20 40 60 80 40 40 Deletion from sparse index • delete record 30 10 30 50 70 90 110 130 150

10 30 50 70 20 40 60 80 50 70 Deletion from sparse index • delete records 30 & 40 10 30 50 70 90 110 130 150

70 10 30 50 20 40 60 80 Deletion from dense index 10 20 30 40 50 60 70 80

10 70 50 20 60 80 40 40 Deletion from dense index • delete record 30 10 20 30 30 40 40 50 60 70 80

10 30 40 60 20 50 Insertion, sparse index case 10 30 40 60

10 30 40 60 20 50 34 • our lucky day! • we have free space • where we need it! Insertion, sparse index case • insert record 34 10 30 40 60

10 30 40 60 20 50 15 20 20 30 Insertion, sparse index case • insert record 15 10 30 40 60 • Illustrated: Immediate reorganization • Variation: • insert new block (chained file) • update index

10 30 40 60 25 20 50 overflow blocks (reorganize later...) Insertion, sparse index case • insert record 25 10 30 40 60

Insertion, dense index case • Similar • Often more expensive . . .

Summary so far • Conventional index • Basic Ideas: sparse, dense, multi-level… • Duplicate Keys • Deletion/Insertion • Secondary indexes

Conventional indexes Advantage: - Simple - Index is sequential file good for scans Disadvantage: - Inserts expensive, and/or - Lose sequentiality & balance

NEXT: Another type of index • Give up on sequentiality of index • Try to get “balance”

B+Tree Example n=3 100 Root 120 150 180 30 3 5 11 120 130 180 200 100 101 110 150 156 179 30 35

Sample non-leaf 57 81 95 to keys to keys to keys to keys < 57 57 k<81 81k<95 95

Sample leaf node: From non-leaf node to next leaf in sequence 57 81 95 To record with key 57 To record with key 81 To record with key 85

Size of nodes: n+1 pointers n keys (fixed)

Don’t want nodes to be too empty • Use at least Non-leaf: (n+1)/2 pointers Leaf: (n+1)/2 pointers to data

n=3 Full node min. node Non-leaf Leaf 120 150 180 30 3 5 11 30 35 counts even if null

B+tree rules tree of order n (1) All leaves at same lowest level (balanced tree) (2) Pointers in leaves point to records except for “sequence pointer”

(3) Number of pointers/keys for B+tree Max Max Min Min ptrs keys ptrsdata keys Non-leaf (non-root) n+1 n (n+1)/2 (n+1)/2- 1 Leaf (non-root) n+1 n (n+1)/2 (n+1)/2 Root n+1 n 1 1

Insert into B+tree (read only) (a) simple case • space available in leaf (b) leaf overflow (c) non-leaf overflow (d) new root

32 n=3 100 (a) Insert key = 32 30 3 5 11 30 31

7 3 5 7 n=3 100 (a) Insert key = 7 30 3 5 11 30 31

Deletion from B+tree (a) Simple case - no example (b) Coalesce with neighbor (sibling) (c) Re-distribute keys (d) Cases (b) or (c) at non-leaf

40 n=4 (b) Coalesce with sibling • Delete 50 10 40 100 10 20 30 40 50

35 35 n=4 (c) Redistribute keys • Delete 50 10 40 100 10 20 30 35 40 50

B+tree deletions in practice • Often, coalescing is not implemented • Too hard and not worth it!

CSCE 520 Test 2 Info Indexing

CSCE 520 Test 2 Info Indexing

Presentation Transcript

Indexing

Indexing:

CSCE 181 Test Automation at Schlumberger

Indexing

USEFUL INFO PROGRESS TEST 2 ORAL EXAM

Indexing

Indexing

Indexing

Indexing

Indexing

Indexing

USA Test Prep Info

Info for the Test

Test Info Matter-Volume

BCP-520 Test Questions and Answers

Hair Test Info Sheet

Indexing

Info for the Test

Info for the Test

Indexing

Indexing

Indexing