carnegie mellon univ dept of computer science 15 415 database applications n.
Download
Skip this Video
Loading SlideShow in 5 Seconds..
Carnegie Mellon Univ. Dept. of Computer Science 15-415 - Database Applications PowerPoint Presentation
Download Presentation
Carnegie Mellon Univ. Dept. of Computer Science 15-415 - Database Applications

Loading in 2 Seconds...

play fullscreen
1 / 57

Carnegie Mellon Univ. Dept. of Computer Science 15-415 - Database Applications - PowerPoint PPT Presentation


  • 80 Views
  • Uploaded on

Carnegie Mellon Univ. Dept. of Computer Science 15-415 - Database Applications. C. Faloutsos Indexing and Hashing – part II. General Overview - rel. model. Relational model - SQL Formal & commercial query languages Functional Dependencies Normalization Physical Design Indexing.

loader
I am the owner, or an agent authorized to act on behalf of the owner, of the copyrighted work described.
capcha
Download Presentation

PowerPoint Slideshow about 'Carnegie Mellon Univ. Dept. of Computer Science 15-415 - Database Applications' - zulema


An Image/Link below is provided (as is) to download presentation

Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author.While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server.


- - - - - - - - - - - - - - - - - - - - - - - - - - E N D - - - - - - - - - - - - - - - - - - - - - - - - - -
Presentation Transcript
carnegie mellon univ dept of computer science 15 415 database applications

Carnegie Mellon Univ.Dept. of Computer Science15-415 - Database Applications

C. Faloutsos

Indexing and Hashing – part II

general overview rel model
General Overview - rel. model
  • Relational model - SQL
    • Formal & commercial query languages
  • Functional Dependencies
  • Normalization
  • Physical Design
  • Indexing

15-415 - C. Faloutsos

indexing overview
Indexing- overview
  • ISAM and B-trees
  • hashing
  • Hashing vs B-trees
  • Indices in SQL
  • Advanced topics:
    • dynamic hashing
    • multi-attribute indexing

15-415 - C. Faloutsos

static hashing
(Static) Hashing

Problem: “find EMP record with ssn=123”

What if disk space was free, and time was at premium?

15-415 - C. Faloutsos

hashing
Hashing

A: Brilliant idea: key-to-address transformation:

#0 page

123; Smith; Main str

#123 page

#999,999,999

15-415 - C. Faloutsos

hashing1

#0 page

123; Smith; Main str

#123 page

#999,999,999

Hashing

Since space is NOT free:

  • use M, instead of 999,999,999 slots
  • hash function: h(key) = slot-id

15-415 - C. Faloutsos

hashing2

#0 page

123; Smith; Main str

#h(123)

M

Hashing

Typically: each hash bucket is a page, holding many records:

15-415 - C. Faloutsos

hashing3

#0 page

123; Smith; Main str.

#h(123)

M

Hashing

Notice: could have clustering, or non-clustering versions:

15-415 - C. Faloutsos

hashing4

EMP file

...

#0 page

...

234; Johnson; Forbes ave

123

#h(123)

123; Smith; Main str.

...

M

345; Tompson; Fifth ave

...

Hashing

Notice: could have clustering, or non-clustering versions:

15-415 - C. Faloutsos

indexing overview1
Indexing- overview
  • ISAM and B-trees
  • hashing
    • hashing functions
    • size of hash table
    • collision resolution
  • Hashing vs B-trees
  • Indices in SQL
  • Advanced topics:

15-415 - C. Faloutsos

design decisions
Design decisions

1) formula h() for hashing function

2) size of hash table M

3) collision resolution method

15-415 - C. Faloutsos

design decisions functions
Design decisions - functions
  • Goal: uniform spread of keys over hash buckets
  • Popular choices:
    • Division hashing
    • Multiplication hashing

15-415 - C. Faloutsos

division hashing
Division hashing

h(x) = (a*x+b) mod M

  • eg., h(ssn) = (ssn) mod 1,000
    • gives the last three digits of ssn
  • M: size of hash table - choose a prime number, defensively (why?)

15-415 - C. Faloutsos

division hashing1
eg., M=2; hash on driver-license number (dln), where last digit is ‘gender’ (0/1 = M/F)

in an army unit with predominantly male soldiers

Thus: avoid cases where M and keys have common divisors - prime M guards against that!

Division hashing

15-415 - C. Faloutsos

multiplication hashing
h(x) = [ fractional-part-of ( x * φ ) ] * M

φ: golden ratio ( 0.618... = ( sqrt(5)-1)/2 )

in general, we need an irrational number

advantage: M need not be a prime number

but φ must be irrational

Multiplication hashing

15-415 - C. Faloutsos

other hashing functions
quadratic hashing (bad)

...

conclusion: use division hashing

Other hashing functions

15-415 - C. Faloutsos

design decisions1
Design decisions

1) formula h() for hashing function

2) size of hash table M

3) collision resolution method

15-415 - C. Faloutsos

size of hash table
eg., 50,000 employees, 10 employee-records / page

Q: M=?? pages/buckets/slots

Size of hash table

15-415 - C. Faloutsos

size of hash table1
eg., 50,000 employees, 10 employees/page

Q: M=?? pages/buckets/slots

A: utilization ~ 90% and

M: prime number

Eg., in our case: M= closest prime to50,000/10 / 0.9 = 5,555

Size of hash table

15-415 - C. Faloutsos

design decisions2
Design decisions

1) formula h() for hashing function

2) size of hash table M

3) collision resolution method

15-415 - C. Faloutsos

collision resolution
Collision resolution
  • Q: what is a ‘collision’?
  • A: ??

15-415 - C. Faloutsos

collision resolution1
Collision resolution

#0 page

FULL

#h(123)

123; Smith; Main str.

M

15-415 - C. Faloutsos

collision resolution2
Collision resolution
  • Q: what is a ‘collision’?
  • A: ??
  • Q: why worry about collisions/overflows? (recall that buckets are ~90% full)
  • A: ‘birthday paradox’

15-415 - C. Faloutsos

collision resolution3
Collision resolution
  • open addressing
    • linear probing (ie., put to next slot/bucket)
    • re-hashing
  • separate chaining (ie., put links to overflow pages)

15-415 - C. Faloutsos

collision resolution4
Collision resolution

linear probing:

#0 page

FULL

#h(123)

123; Smith; Main str.

M

15-415 - C. Faloutsos

collision resolution5
Collision resolution

re-hashing

#0 page

h1()

FULL

#h(123)

123; Smith; Main str.

h2()

M

15-415 - C. Faloutsos

collision resolution6
Collision resolution

separate chaining

FULL

123; Smith; Main str.

15-415 - C. Faloutsos

design decisions conclusions
Design decisions - conclusions
  • function: division hashing
    • h(x) = ( a*x+b ) mod M
  • size M: ~90% util.; prime number.
  • collision resolution: separate chaining
    • easier to implement (deletions!);
    • no danger of becoming full

15-415 - C. Faloutsos

indexing overview2
Indexing- overview
  • ISAM and B-trees
  • hashing
  • Hashing vs B-trees
  • Indices in SQL
  • Advanced topics:
    • dynamic hashing
    • multi-attribute indexing

15-415 - C. Faloutsos

hashing vs b trees
Hashing vs B-trees:

Hashing offers

  • speed ! ( O(1) avg. search time)

..but:

15-415 - C. Faloutsos

hashing vs b trees1
Hashing vs B-trees:

..but B-trees give:

  • key ordering:
    • range queries
    • proximity queries
    • sequential scan
  • O(log(N)) guarantees for search, ins./del.
  • graceful growing/shrinking

15-415 - C. Faloutsos

hashing vs b trees2
Hashing vs B-trees:

thus:

  • B-trees are implemented in most systems

footnotes:

  • hashing is not (why not?)
  • ‘dbm’ and ‘ndbm’ of UNIX: offer one or both

15-415 - C. Faloutsos

indexing overview3
Indexing- overview
  • ISAM and B-trees
  • hashing
  • Hashing vs B-trees
  • Indices in SQL
  • Advanced topics:
    • dynamic hashing
    • multi-attribute indexing

15-415 - C. Faloutsos

indexing in sql
Indexing in SQL
  • create index <index-name> on <relation-name> (<attribute-list>)
  • create unique index <index-name> on <relation-name> (<attribute-list>)
  • drop index <index-name>

15-415 - C. Faloutsos

indexing in sql1
Indexing in SQL
  • eg.,

create index ssn-index

on STUDENT (ssn)

  • or (eg., on TAKES(ssn,cid, grade) ):

create index sc-index

on TAKES (ssn, c-id)

15-415 - C. Faloutsos

indexing overview4
Indexing- overview
  • ISAM and B-trees
  • hashing
  • Hashing vs B-trees
  • Indices in SQL
  • Advanced topics: (theoretical interest)
    • dynamic hashing
    • multi-attribute indexing

15-415 - C. Faloutsos

problem with static hashing
Problem with static hashing
  • problem: overflow?
  • problem: underflow? (underutilization)

15-415 - C. Faloutsos

solution dynamic extendible hashing
Solution: Dynamic/extendible hashing
  • idea: shrink / expand hash table on demand..
  • ..dynamic hashing

Details: how to grow gracefully, on overflow?

Many solutions - One of them: ‘extendible hashing’

15-415 - C. Faloutsos

extendible hashing
Extendible hashing

#0 page

FULL

#h(123)

123; Smith; Main str.

M

15-415 - C. Faloutsos

extendible hashing1
Extendible hashing

#0 page

solution:

split the bucket in two

FULL

#h(123)

123; Smith; Main str.

M

15-415 - C. Faloutsos

extendible hashing2
in detail:

keep a directory, with ptrs to hash-buckets

Q: how to divide contents of bucket in two?

A: hash each key into a very long bit string; keep only as many bits as needed

Eventually:

Extendible hashing

15-415 - C. Faloutsos

extendible hashing3
Extendible hashing

directory

0001...

0111...

00...

01...

10101...

10...

10011...

10110...

11...

1101...

101001...

15-415 - C. Faloutsos

extendible hashing4
Extendible hashing

directory

0001...

0111...

00...

01...

10101...

10...

10011...

10110...

11...

1101...

101001...

15-415 - C. Faloutsos

extendible hashing5
Extendible hashing

directory

0001...

0111...

00...

01...

10101...

10...

split on 3-rd bit

10011...

10110...

11...

101001...

1101...

15-415 - C. Faloutsos

extendible hashing6
Extendible hashing

directory

0001...

0111...

00...

01...

new page / bucket

10...

10011...

10101...

11...

101001...

10110...

1101...

15-415 - C. Faloutsos

extendible hashing7

0001...

0111...

000...

001...

010...

10011...

10101...

011...

101001...

10110...

100...

1101...

101...

110...

111...

Extendible hashing

directory (doubled)

new page / bucket

15-415 - C. Faloutsos

extendible hashing8

0001...

0111...

000...

00...

001...

01...

10101...

010...

10...

10101...

10011...

10110...

101001...

11...

011...

100...

101001...

10110...

101...

1101...

110...

111...

Extendible hashing

BEFORE

AFTER

0001...

0111...

10011...

1101...

15-415 - C. Faloutsos

extendible hashing9
Extendible hashing
  • Summary: directory doubles on demand
  • or halves, on shrinking files
  • needs ‘local’ and ‘global’ depth (see book)
  • Mainly, of theoretical interest - same for
    • ‘linear hashing’ of Litwin
    • ‘order preserving’
    • ‘perfect hashing’ (no collisions!)

15-415 - C. Faloutsos

indexing overview5
Indexing- overview
  • ISAM and B-trees
  • hashing
  • Hashing vs B-trees
  • Indices in SQL
  • Advanced topics:
    • dynamic hashing
    • multi-attribute indexing

15-415 - C. Faloutsos

multiple key access
multiple-key access
  • how to support queries on multiple attributes, like
    • grade>=3 and course=‘415’
  • major motivation: Geographic Information systems (GIS)

15-415 - C. Faloutsos

multiple key access1
multiple-key access

y

x

15-415 - C. Faloutsos

multiple key access2
multiple-key access

Typical query:

  • Find cities within x miles from Pittsburgh

thus, we want to store nearby cities on the same disk page:

15-415 - C. Faloutsos

multiple key access3
multiple-key access

y

x

15-415 - C. Faloutsos

multiple key access4
multiple-key access

y

x

15-415 - C. Faloutsos

multiple key access r trees
multiple-key access - R-trees

y

x

15-415 - C. Faloutsos

multiple key access r trees1
multiple-key access - R-trees
  • R-trees: very successful for GIS
  • (along with ‘z-ordering’)
  • more details: at ‘advanced topics’, later
  • even more details: in 15-826

15-415 - C. Faloutsos

indexing overview6
Indexing- overview

industry workhorse

  • ISAM and B-trees
  • hashing
  • Hashing vs B-trees
  • Indices in SQL
  • Advanced topics:
    • dynamic hashing
    • multi-attribute indexing

15-415 - C. Faloutsos