Chapter 11: Indexing and Hashing

1 / 30

# Chapter 11: Indexing and Hashing - PowerPoint PPT Presentation

Chapter 11: Indexing and Hashing. Indexing Basic Concepts Ordered Indices B+-Tree Index Files Hashing Static Dynamic Hashing. Hashing. Static hashing Dynamic hashing. Hashing. key  h(key). <key>. Buckets (typically 1 disk block). Example hash function.

I am the owner, or an agent authorized to act on behalf of the owner, of the copyrighted work described.

## Chapter 11: Indexing and Hashing

Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author.While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server.

- - - - - - - - - - - - - - - - - - - - - - - - - - E N D - - - - - - - - - - - - - - - - - - - - - - - - - -
Presentation Transcript
1. Chapter 11: Indexing and Hashing • Indexing • Basic Concepts • Ordered Indices • B+-Tree Index Files • Hashing • Static • Dynamic Hashing

2. Hashing • Static hashing • Dynamic hashing

3. Hashing key  h(key) <key> Buckets (typically 1 disk block) . . .

4. Example hash function • Key = ‘x1 x2 … xn’ n byte character string • Have b buckets • h: add x1 + x2 + ….. xn • compute sum modulo b

5.  This may not be best function … Good hash  Expected number of function: keys/bucket is the same for all buckets

6. Within a bucket: • Do we keep keys sorted? • Yes, if CPU time critical & Inserts/Deletes not too frequent

7. d a e c b EXAMPLE 2 records/bucket INSERT: h(a) = 1 h(b) = 2 h(c) = 1 h(d) = 0 0 1 2 3 h(e) = 1

8. d maybe move “g” up EXAMPLE: deletion Delete:ef 0 1 2 3 a b d c c e f g

9. Rule of thumb: • Try to keep space utilization between 50% and 80% Utilization = # keys used total # keys that fit • If < 50%, wasting space • If > 80%, overflows significant depends on how good hash function is & on # keys/bucket

10. How do we cope with growth? • Overflows and reorganizations • Dynamic hashing • Extensible • Linear

11. Extensible hashing: two ideas (a) Use i of b bits output by hash function b h(K)  use i grows over time…. 00110101

12. (b) Use directory h(K)[i ] to bucket . . . . . .

13. i = 2 00 01 10 11 1 1 2 1010 New directory 2 1100 Example: h(k) is 4 bits; 2 keys/bucket 1 i = 0001 1 1001 1100 Insert 1010

14. 2 0000 0001 2 0111 2 2 Example continued i = 2 00 01 10 11 1 0001 0111 1001 1010 Insert: 0111 0000 1100

15. i = 3 000 001 010 011 100 101 110 111 3 1001 1001 2 1001 1010 3 1010 1100 2 Example continued 0000 2 0001 i = 2 00 01 10 11 0111 2 Insert: 1001

16. Extensible hashing: deletion • No merging of blocks • Merge blocks and cut directory if possible (Reverse insert procedure)

17. Deletion example: • Run thru insert example in reverse!

18. Indirection (Not bad if directory in memory) Directory doubles in size (Now it fits, now it does not) - - Extensible hashing Summary Can handle growing files - with less wasted space - with no full reorganizations +

19. Advanced indexing • Multiple attributes • Bitmap indexing

20. Multiple-Key Access • Use multiple indices for certain types of queries. • Example: select account-number from account where branch-name = “Perryridge” and balance = 1000 • Possible strategies?

21. Indices on Multiple Attributes • where branch-name = “PP” andbalance = 1000 Suppose we have an index on combined search-key (branch-name, balance). BB,1000 CC,200 PP,800 PP,1500 AB,200 AA,2000 AA,2300 AA,2500 CC,200 DD,200 DD,300 PP,800 PP,1000 PP,1300 PP,1500 PP,1560 AB,200 AC,200 CC,200 PP,300

22. Suppose we have an index on combined search-key (branch-name, balance). • where branch-name = “PP” and balance < 1000 search pp,0 BB,1000 CC,200 PP,800 PP,1500 AB,200 AA,2000 AA,2300 AA,2500 CC,200 DD,200 DD,300 PP,800 PP,1000 PP,1300 PP,1500 PP,1560 AB,200 AC,200 CC,200 PP,300 search pp,1000

23. Suppose we have an index on combined search-key (branch-name, balance). NO! • where branch-name < “PP” andbalance = 1000? BB,1000 CC,200 PP,800 PP,1500 AB,200 AA,2000 AA,2300 AA,2500 CC,200 DD,200 DD,300 PP,800 PP,1000 PP,1300 PP,1500 PP,1560 AB,200 AC,200 CC,200 PP,300

24. Bitmap Indices • An index designed for multiple valued search keys

25. Bitmap Indices (Cont.) The income-level value of record 3 is L1 Bitmap(size = table size) Unique values of gender Unique values of income-level

26. Bitmap Indices (Cont.) • Some properties of bitmap indices • Number of bitmaps for each attribute? • Size of each bitmap? • When is the bitmap matrix sparse and what attributes are good for bitmap indices?

27. Bitmap Indices (Cont.) • Bitmap indices generally very small compared with relation size • E.g. if record is 100 bytes, space for a single bitmap is 1/800 of space used by relation. • If number of distinct attribute values is 8, bitmap is only 1% of relation size • What about insertion? • Deletion?

28. Bitmap Indices Queries Sample query: Males with income level L1 even faster! 10010 AND 10100 = 10000 What about the number of males with income level L1?

29. Bitmap Indices Queries • Queries are answered using bitmap operations • Intersection (and) • Union (or) • Complementation (not)