CS4432: Database Systems II

CS4432: Database Systems II Lecture #8 (Basic indexing) Professor Elke A. Rundensteiner lecture #9

Indexing (Chapter 14 ) Indexing : helps to retrieve data quicker for certain queries value= 1,000,000 Select * FROM Emp WHERE salary = 1,000,000; record ? value lecture #9

Topics • Sequential Index Files • Secondary Indexes lecture #9

10 30 50 70 90 20 40 60 80 100 Sequential File lecture #9

110 90 70 50 30 10 80 60 40 20 120 100 10 30 90 70 50 80 60 40 20 100 Sequential File Dense Index Every record is in index. lecture #9

210 170 130 90 50 10 150 110 70 30 230 190 10 30 90 70 50 80 60 40 20 100 Sequential File Sparse Index Only first record per block in index. lecture #9

Sequential File Sparse 2nd level 330 170 10 90 130 10 50 490 210 170 70 110 30 570 230 410 150 250 90 190 90 70 50 30 10 100 80 60 40 20 lecture #9

Note : DATA FILE or INDEX can be both “ordered files”. Question: How would we lay them out on disk ? - contiguous layout on disk ? - block-chained layout on disk ? lecture #9

50 170 10 210 170 130 90 10 330 490 190 410 230 30 90 570 70 110 150 250 90 70 30 10 50 60 20 40 100 80 Questions: • Do we want to build a dense 2nd-level index for a dense index? • Can we even do this ? 1st level? Sequential File 2nd level? lecture #9

Notes on pointers: • Block pointer (used in sparse index) can be smaller than record pointer (used in dense index) BP RP lecture #9

R2 R1 R3 R4 Note : If file is contiguous, then we can omit pointers K1 say: 1024 B per block K2 K3 K4 • if we want K3 block: • get it at offset (3-1)*1024 = 2048 bytes lecture #9

Sparse vs. Dense Tradeoff • Sparse: Less index space per record can keep more of index in memory (Later: sparse better for insertions) • Dense: Can tell if any record exists without accessing file (Later: dense needed for secondary indexes) lecture #9

Terms • Index sequential file • Search key ( primary key) • Primary index (on sequencing field) • Secondary index • Dense index (contains all search key values) • Sparse index • Multi-level index lecture #9

Next: • Duplicate keys • Deletion/Insertion • Secondary indexes lecture #9

10 10 20 30 40 10 20 30 30 45 Duplicate keys lecture #9

40 10 30 10 20 20 30 10 40 10 10 20 30 30 10 30 20 30 45 45 10 10 10 10 10 10 20 20 20 20 30 30 30 30 30 30 Duplicate keys Dense index ! Point to each value ! lecture #9

30 10 10 20 40 45 30 30 10 20 10 20 30 40 Duplicate keys Dense index. Point to each distinct value! lecture #9

30 10 10 20 40 45 30 30 10 20 careful if looking for 20 or 30! Duplicate keys Sparse index: point to start of block ! 10 10 20 30 lecture #9

10 10 20 30 40 30 20 30 45 10 should this be 40? Duplicate keys Sparse index, another way ? • place first new key from block 10 20 30 30 lecture #9

Duplicate values, primary index Summary • Index may point to first instance of each value only File Index a a a . . b lecture #9

10 30 50 70 80 20 40 60 Deletion from sparse index 10 30 50 70 90 110 130 150 lecture #9

70 10 30 50 20 40 60 80 Deletion from sparse index • delete record 40 10 30 50 70 90 110 130 150 lecture #9

10 30 50 70 20 40 60 80 40 40 Deletion from sparse index • delete record 30 10 30 50 70 90 110 130 150 lecture #9

10 30 50 70 20 40 60 80 50 70 Deletion from sparse index • delete records 30 & 40 10 30 50 70 90 110 130 150 lecture #9

10 30 50 70 80 20 40 60 Deletion from dense index 10 20 30 40 50 60 70 80 lecture #9

10 50 70 20 60 80 40 40 Deletion from dense index • delete record 30 10 20 30 30 40 40 50 60 70 80 lecture #9

10 30 40 60 20 50 Insertion, sparse index case 10 30 40 60 lecture #9

10 30 40 60 20 50 34 • our lucky day! • we have free space • where we need it! Insertion, sparse index case • insert record 34 10 30 40 60 lecture #9

10 30 40 60 20 50 15 20 20 30 Insertion, sparse index case • insert record 15 10 30 40 60 • Immediate reorganization • Other variations? lecture #9

Just Illustrated: • Immediate reorganization • Now Variation: • insert new block (chained file) • otherwise leave data file • update index only lecture #9

10 30 40 60 25 20 50 Insertion, sparse index case • insert record 25 10 30 overflow blocks (reorganize later...) 40 60 lecture #9

Insertion, dense index case • Similar • Often more expensive . . . lecture #9

30 20 80 100 90 50 70 40 10 60 Secondary indexes Sequence field Can I make a secondary index sparse ? lecture #9

30 20 80 100 90 40 70 60 10 50 30 90 20 ... 80 100 Secondary indexes Sequence field • Sparse index lecture #9

100 90 80 30 20 60 40 10 50 70 90 30 ... 20 80 100 Secondary indexes Sequence field • Sparse index ? lecture #9

100 90 80 30 20 60 40 10 50 70 90 30 ... 20 80 100 Secondary indexes Sequence field • Sparse index does not make sense! lecture #9

90 100 20 30 80 70 40 10 60 50 10 10 50 60 20 50 70 90 30 ... ... 40 sparse high level allowed? Secondary indexes Sequence field • Must be dense index ! lecture #9

Also: Pointers are record pointers (not block pointers; nor off-sets) Reminder : With secondary indexes: • Lowest level is dense • Other levels are sparse lecture #9

20 20 10 10 30 10 40 40 40 40 Duplicate values & secondary indexes lecture #9

30 20 20 10 10 10 40 40 40 40 20 10 40 10 40 30 10 40 ... 20 40 Duplicate values & secondary indexes one option... • Problem: • excess overhead! • disk space • search time lecture #9

20 20 10 10 30 40 40 40 40 10 20 40 30 Duplicate values & secondary indexes another option... 10 Problem: variable size records in index! lecture #9

30 20 20 10 10 10 40 40 40 40 50 10 60 20 30 ... 40 Duplicate values & secondary indexes   Another idea :Chain records with same key !   • Problems: • Need to add fields to data records for each index • Need to follow chain to know records lecture #9

Summary : Indexing Basics • Basic Ideas: sparse, dense, multi-level… • Duplicate Keys • Deletion/Insertion • Secondary Indexes lecture #9

CS4432: Database Systems II