1 / 28

Today

Advanced Database Technology Anna Östlin Pagh and Rasmus Pagh IT University of Copenhagen Spring 2004 February 19, 2004 INDEXING I Lecture based on [GUW, 13.0-13.2] Slides based on Notes 04: Indexing for Stanford CS 245, fall 2002 by Hector Garcia-Molina. Today. Why indexing?

tadita
Download Presentation

Today

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Advanced Database TechnologyAnna Östlin Pagh and Rasmus PaghIT University of CopenhagenSpring 2004February 19, 2004INDEXING ILecture based on [GUW, 13.0-13.2] Slides based onNotes 04: Indexingfor Stanford CS 245, fall 2002by Hector Garcia-Molina

  2. Today • Why indexing? • Conventional indexes (dense/sparse) • Multi-level indexes • Secondary indexes • Next time: B-tree and hash indexes

  3. Why indexing? • Common queries involve conditions on the values of attributes, e.g.SELECT * FROM R WHERE a=11SELECT * FROM R WHERE 0<=b and b<42 • indexing an attribute (or set of attributes) speeds up finding tuples with specific values. (And gives other speed-ups as well.) • Conceptually similar to index in a book.

  4. Problem session • Consider an index data structure "similar to" the index in a book. • How many steps does it take to find the occurrences of a specific term? • What about the number of I/Os? • Does your encyclopaedia have an index?

  5. 10 20 30 40 50 60 70 80 90 100 Sequential File

  6. 10 10 20 20 30 30 40 40 50 50 60 60 70 80 70 80 90 100 90 100 110 120 Sequential File Dense Index

  7. 10 10 20 30 50 30 70 40 90 50 110 60 130 150 70 80 170 190 90 100 210 230 Sequential File Sparse Index

  8. 10 10 10 20 90 30 170 50 30 250 70 40 90 330 50 110 410 60 130 490 150 570 70 80 170 190 90 100 210 230 Sequential File Sparse 2nd level

  9. Sparse vs. Dense Trade-off • Sparse: Less index space per record can keep more of index in memory • Dense: Can tell if any record exists without accessing file (Later: • sparse better for insertions • dense needed for secondary indexes)

  10. Summary of terms • Index on sequential file • Search key (can be¹ primary key) • Primary index (on sequencing field) -secondary index works on other fields • Dense index (all search key values in) • Sparse index (one search key/ block) • Multi-level index (index on index)

  11. Next: • Duplicate keys • Deletion/Insertion • Secondary indexes

  12. 10 10 10 20 20 30 30 30 40 45 Duplicate keys

  13. Duplicate keys 10 10 10 10 10 10 10 10 10 10 10 10 20 20 20 20 20 20 20 20 30 30 30 30 30 30 30 30 30 30 30 30 40 40 45 45 Dense index, one way to implement?

  14. Duplicate keys Dense index, better way? 10 10 10 20 10 30 20 40 20 30 30 30 40 45 What assumption is made here?

  15. Duplicate keys Sparse index, one way? 40 10 10 30 20 10 20 30 45 30 careful if looking for 20 or 30! 10 10 20 30

  16. Duplicate keys Sparse index, another way? 40 10 10 30 20 10 20 30 45 30 should this be 40? • place first new key from block 10 20 30 30

  17. 70 10 30 50 20 40 60 80 Deletion from sparse index • delete record 40 10 30 50 70 90 110 130 150

  18. 10 30 50 70 20 40 60 80 40 40 Deletion from sparse index • delete record 30 10 30 50 70 90 110 130 150

  19. 70 10 30 50 20 40 60 80 50 70 Deletion from sparse index • delete records 30 & 40 10 30 50 70 90 110 130 150

  20. 70 10 50 20 80 60 40 40 Deletion from dense index • delete record 30 10 20 30 30 40 40 50 60 70 80

  21. Insertion (sparse index case) 10 30 40 60 20 50 34 • our lucky day! • we have free space • where we need it! • insert record 34 10 30 40 60

  22. Insertion - using overflow blocks 10 30 40 60 25 20 50 overflow blocks (reorganize later...) • insert record 25 10 30 40 60 Problem: Overflow blocks take longer to access

  23. Insertion - immediate reorganization 10 30 40 60 20 50 15 20 20 30 • insert record 15 10 30 40 60 • In general: Use same technique as for inserting in linked list.

  24. Secondary indexes 90 100 30 80 20 40 70 50 60 10 30 90 ... 20 80 100 does not make sense! Sequence field • Sparse index

  25. Secondary indexes 10 30 20 50 30 10 40 20 50 70 90 50 ... 80 60 40 sparse high level 70 ... 100 10 90 60 Sequence field • Dense index

  26. 20 10 10 10 10 20 20 40 20 10 30 40 40 40 10 40 40 40 30 ... 40 Duplicate values & secondary indexes one option... Problem: Uses more space than necessary

  27. 20 10 10 20 20 30 40 40 10 50 40 60 ... 10 40 30 40 Duplicate values & secondary indexes buckets

  28. Summary • Indexes allow finding a particular attribute value in a few I/Os. • Unresolved problems regarding insertions and deletions. • Next time: Also obtaining efficient updates (using B-trees). Hash indexes - sometimes more efficient.

More Related