Today

Advanced Database TechnologyAnna Östlin Pagh and Rasmus PaghIT University of CopenhagenSpring 2004February 19, 2004INDEXING ILecture based on [GUW, 13.0-13.2] Slides based onNotes 04: Indexingfor Stanford CS 245, fall 2002by Hector Garcia-Molina

Today • Why indexing? • Conventional indexes (dense/sparse) • Multi-level indexes • Secondary indexes • Next time: B-tree and hash indexes

Why indexing? • Common queries involve conditions on the values of attributes, e.g.SELECT * FROM R WHERE a=11SELECT * FROM R WHERE 0<=b and b<42 • indexing an attribute (or set of attributes) speeds up finding tuples with specific values. (And gives other speed-ups as well.) • Conceptually similar to index in a book.

Problem session • Consider an index data structure "similar to" the index in a book. • How many steps does it take to find the occurrences of a specific term? • What about the number of I/Os? • Does your encyclopaedia have an index?

10 20 30 40 50 60 70 80 90 100 Sequential File

10 10 20 20 30 30 40 40 50 50 60 60 70 80 70 80 90 100 90 100 110 120 Sequential File Dense Index

10 10 20 30 50 30 70 40 90 50 110 60 130 150 70 80 170 190 90 100 210 230 Sequential File Sparse Index

10 10 10 20 90 30 170 50 30 250 70 40 90 330 50 110 410 60 130 490 150 570 70 80 170 190 90 100 210 230 Sequential File Sparse 2nd level

Sparse vs. Dense Trade-off • Sparse: Less index space per record can keep more of index in memory • Dense: Can tell if any record exists without accessing file (Later: • sparse better for insertions • dense needed for secondary indexes)

Summary of terms • Index on sequential file • Search key (can be¹ primary key) • Primary index (on sequencing field) -secondary index works on other fields • Dense index (all search key values in) • Sparse index (one search key/ block) • Multi-level index (index on index)

Next: • Duplicate keys • Deletion/Insertion • Secondary indexes

10 10 10 20 20 30 30 30 40 45 Duplicate keys

Duplicate keys 10 10 10 10 10 10 10 10 10 10 10 10 20 20 20 20 20 20 20 20 30 30 30 30 30 30 30 30 30 30 30 30 40 40 45 45 Dense index, one way to implement?

Duplicate keys Dense index, better way? 10 10 10 20 10 30 20 40 20 30 30 30 40 45 What assumption is made here?

Duplicate keys Sparse index, one way? 40 10 10 30 20 10 20 30 45 30 careful if looking for 20 or 30! 10 10 20 30

Duplicate keys Sparse index, another way? 40 10 10 30 20 10 20 30 45 30 should this be 40? • place first new key from block 10 20 30 30

70 10 30 50 20 40 60 80 Deletion from sparse index • delete record 40 10 30 50 70 90 110 130 150

10 30 50 70 20 40 60 80 40 40 Deletion from sparse index • delete record 30 10 30 50 70 90 110 130 150

70 10 30 50 20 40 60 80 50 70 Deletion from sparse index • delete records 30 & 40 10 30 50 70 90 110 130 150

70 10 50 20 80 60 40 40 Deletion from dense index • delete record 30 10 20 30 30 40 40 50 60 70 80

Insertion (sparse index case) 10 30 40 60 20 50 34 • our lucky day! • we have free space • where we need it! • insert record 34 10 30 40 60

Insertion - using overflow blocks 10 30 40 60 25 20 50 overflow blocks (reorganize later...) • insert record 25 10 30 40 60 Problem: Overflow blocks take longer to access

Insertion - immediate reorganization 10 30 40 60 20 50 15 20 20 30 • insert record 15 10 30 40 60 • In general: Use same technique as for inserting in linked list.

Secondary indexes 90 100 30 80 20 40 70 50 60 10 30 90 ... 20 80 100 does not make sense! Sequence field • Sparse index

Secondary indexes 10 30 20 50 30 10 40 20 50 70 90 50 ... 80 60 40 sparse high level 70 ... 100 10 90 60 Sequence field • Dense index

20 10 10 10 10 20 20 40 20 10 30 40 40 40 10 40 40 40 30 ... 40 Duplicate values & secondary indexes one option... Problem: Uses more space than necessary

20 10 10 20 20 30 40 40 10 50 40 60 ... 10 40 30 40 Duplicate values & secondary indexes buckets

Summary • Indexes allow finding a particular attribute value in a few I/Os. • Unresolved problems regarding insertions and deletions. • Next time: Also obtaining efficient updates (using B-trees). Hash indexes - sometimes more efficient.

Today

Today

Presentation Transcript

Today

Today…..

Today

Today

Today

Today

TODAY! TODAY! TODAY!

Today

Today

TODAY

Today

Today

Today

Today

TODAY

Today…

Today

TODAY

Today:

Today

Today

Today