Cosc 2007 data structures ii
This presentation is the property of its rightful owner.
Sponsored Links
1 / 45

COSC 2007 Data Structures II PowerPoint PPT Presentation


  • 57 Views
  • Uploaded on
  • Presentation posted in: General

COSC 2007 Data Structures II. Chapter 13 Advanced Implementation of Tables III. Topics. Hashing Definition Hash function Key Hash value collision Open hashing. Common Problem. A common pattern in many programs is to store and look up data Find student record, given ID#

Download Presentation

COSC 2007 Data Structures II

An Image/Link below is provided (as is) to download presentation

Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author.While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server.


- - - - - - - - - - - - - - - - - - - - - - - - - - E N D - - - - - - - - - - - - - - - - - - - - - - - - - -

Presentation Transcript


Cosc 2007 data structures ii

COSC 2007 Data Structures II

Chapter 13

Advanced Implementation of Tables III


Topics

Topics

  • Hashing

    • Definition

      • Hash function

      • Key

      • Hash value

      • collision

  • Open hashing


Common problem

Common Problem

  • A common pattern in many programs is to store and look up data

    • Find student record, given ID#

    • Find person address, given phone #

  • Because it is so common, many data structures for it have been investigated

    • How?


Phone number problem

Phone Number Problem

  • Problem: phone company wants to implement caller ID.

    • given a phone number (the key), look up person’s name or address(the data)

    • lots of phone numbers (P=107-1) in a given area code

    • only a small fraction of them are in use

      • Nobody has a phone number :0000000 or 0000001


Comparison of time complexity average

Comparison of Time Complexity (average)

Operation Insertion Deletion Search

Unsorted ArrayO(1)O(n) O(n)

Unsorted reference O(1)O(n) O(n)

Sorted Array O(n)O(n) O(logn)

Sorted reference O(n)O(n) O(n)

BST O(logn)O(logn) O(logn)

Can we do better than O(logn)?


Can we do better than o log n

Can we do better than O(log N)?

  • All previous searching techniques require a specified amount of time (O(logn) or O(n))

  • Time usually depends on number of elements (n) stored in the table

  • In some situations searching should be almost instantaneous -- how?

    • Examples

      • 911 emergency system

      • Air-traffic control system


Can we do better than o log n1

•••

Null

Sub

Null

Null

Null

Null

Xu

Null

•••

•••

•••

000-0000

000-0001

000-0002

259-1623

263-3049

Can we do better than O(log N)?

  • Answer: Yes … sort of, if we're lucky.

  • General idea: take the key of the data record you’re inserting, and use that number directly as the item number in a list (array).

  • Search is O(1), but huge amount of space wasted. – how to solve this?


Hashing

Hashing

  • Basic idea:

    • Don't use the data value directly.

    • Given an array of size B, use a hash function, h(x), which maps the given data record x to some (hopefully) unique index (“bucket”) in the array.

0

1

h

x

h(x)

B-1


What is hash table

What is Hash Table?

  • The simplest kind of hash table is an array of records.

  • This example has 101 records.

[ 0 ]

[ 1 ]

[ 2 ]

[ 3 ]

[ 4 ]

[ 5 ]

[100]

. . .

An array of records


What is hash table1

[ 4 ]

What is Hash Table?

Number 256-2879

8888 Queen St.

Linda Kim

  • Each record has a special

    field, called its key.

  • In this example, the key

    is a long integer field

    called Number.

[ 0 ]

[ 1 ]

[ 2 ]

[ 3 ]

[ 4 ]

[ 5 ]

[100]

. . .

An array of records


What is hash table2

[ 4 ]

What is Hash Table?

Number 256-2879

  • The number is person's

    phone number,

    and the rest is

    person name or address.

[ 0 ]

[ 1 ]

[ 2 ]

[ 3 ]

[ 4 ]

[ 5 ]

[100]

. . .

An array of records


What is hash table3

Number 281942902

Number 233667136

Number 506643548

Number 155778322

What is Hash Table?

  • When a hash table is in use, some spots contain valid records, and other spots are "empty".

[ 0 ]

[ 1 ]

[ 2 ]

[ 3 ]

[ 4 ]

[ 5 ]

[100]

. . .

An array of records


Inserting a new record

Number 281942902

Number 233667136

Number 506643548

Number 155778322

Inserting a New Record?

Number 265-1556

  • In order to insert a new record,

    the key must somehow be

    converted toan array index.

  • The index is called the

    hash valueof the key.

[ 0 ]

[ 1 ]

[ 2 ]

[ 3 ]

[ 4 ]

[ 5 ]

[100]

. . .

An array of records


Inserting a new record1

Number 281942902

Number 233667136

Number 506643548

Number 155778322

Inserting a New Record?

Number 265-1556

  • Typical way to create a hash value:

(Number mod 101)

What is (265-1556 mod 101) ?

[ 0 ]

[ 1 ]

[ 2 ]

[ 3 ]

[ 4 ]

[ 5 ]

[100]

. . .

An array of records


Inserting a new record2

Number 281942902

Number 233667136

Number 506643548

Number 155778322

Inserting a New Record?

Number 265-1556

  • Typical way to create a hash value:

(Number mod 101)

3

What is (2651556 mod 101) ?

[ 0 ]

[ 1 ]

[ 2 ]

[ 3 ]

[ 4 ]

[ 5 ]

[100]

. . .

An array of records


Inserting a new record3

[3]

Number 281942902

Number 233667136

Number 506643548

Number 155778322

Number 265-1556

Inserting a New Record?

  • The hash value is used for

    the location of the

    new record.

[ 0 ]

[ 1 ]

[ 2 ]

[ 3 ]

[ 4 ]

[ 5 ]

[100]

. . .

An array of records


Inserting a new record4

Number 281942902

Number 233667136

Number 580625685

Number 506643548

Number 155778322

Inserting a New Record?

  • The hash value is used for the location of the new record.

[ 0 ]

[ 1 ]

[ 2 ]

[ 3 ]

[ 4 ]

[ 5 ]

[100]

. . .

An array of records


What is hashing

What is Hashing?

  • What is hashing?

    • Each item has a unique key.

    • Use a large array called a Hash Table.

    • Use a Hash Function.

  • Hashing is like indexing in that it involves associating a key with a relative record address.

  • Hashing, however, is different from indexing in two important ways:

    • With hashing, there is no obvious connection between the key and the location.

    • With hashing two different keys may be transformed to the same address.

  • A Hash functionis a function h(K) which transforms a key K into an address.


  • What is hashing1

    0

    Array

    (Hash table)

    Search key

    Address Calculator

    (Hash function)

    N-1

    What is Hashing?

    • An address calculator (hashing function) is used to determine the location of the item


    What can be hashed

    What Can Be Hashed?

    • Anything!

    • Can hash on numbers, strings, structures, etc.

    • Java defines a hashing method for general objects which returns an integer value.


    Where do we use hashing

    Where do we use Hashing?

    • Databases (phone book, student name list).

    • Spell checkers.

    • Computer chess games.

    • Compilers.


    Hashing and tables

    Hashing and Tables

    • Hashing gives us another implementation of Table ADT

    • Hashing operations

      • Initialize

        • all locations in Hash Table are empty.

      • Insert

      • Search

      • Delete

    • Hash the key; this gives an index; use it to find the value stored in the table in O(1)

      • Great improvement over Log N.


    Hashing1

    Hashing

    • Insert pseudocode

      tableInsert (newItem)

      i = the array index that the address calculator gives you for the new item’s search key

      table[i]=newItem

    • Retrieval pseudocode

      tableRerieve (searchKey)

      i = array index for searchKey given by the hash function

      if (table[i].getKey( ) == searchKey)

      return table[i]

      else

      return null


    Hashing2

    Hashing

    • Deletion pseudocode

      tableDelete (searchKey)

      i = array index for searchKey given by the hash function

      success=(tabke[I].getKey() equals searchKey

      if (success)

      Delete the item from table[i]

      Return success


    Hash tables

    Hash Tables

    Table size

    Entries are numbered 0 to TSIZE-1

    Mapping

    Simple to compute

    Ideally 1-1: not possible

    Even distribution

    Main problems

    Choosing table size

    Choosing a good hash function

    What to do on collisions


    How to choose the table size

    TSIZE = 11

    110

    0

    110

    210

    320

    460

    520

    600

    110

    210

    320

    460

    520

    600

    0

    210,320

    1

    0

    15

    20

    22

    26

    49

    54

    20

    1

    2

    1

    520

    2

    3

    22

    2

    4

    3

    3

    5

    4

    4

    54

    5

    6

    5

    600

    15

    6

    7

    6

    26

    7

    8

    7

    8

    9

    8

    460

    9

    10

    9

    49

    How to choose the Table Size?

    H (Key) = Key mod TSIZE

    TSIZE = 10


    How to choose a hashing function

    How to choose a Hashing Function?

    • The hash function we choose depends on the type of the key field (the key we use to do our lookup).

      • Finding a good one can be hard

    • Rule

      • Be easy to calculate.

      • Use all of the key.

      • Spread the keys uniformly.


    How to choose a hashing function1

    How to choose a Hashing Function?

    • Example:

      • Student Ids (integers)

        h(idNumber) = idNumber % B

        eg. h(678921) = 678921 % 100 = 21

      • Names (char strings)

        h(name) = (sum over the ascii values) % B

        eg. h(“Bill”) = (66+105+108+108) % 101 = 86


    Collision

    Number 281942902

    Number 233667136

    Number 580625685

    Number 506643548

    Number 155778322

    Number 2641455

    Collision

    • Here is another new record to

      insert, with a hash value of 2.

    My hash

    value is [2].

    [ 0 ]

    [ 1 ]

    [ 2 ]

    [ 3 ]

    [ 4 ]

    [ 5 ]

    [100]

    . . .

    An array of records


    What to do on collisions

    What to do on collisions?

    Open hashing (separate chaining)

    Close hashing (open address)

    Linear Probing

    Quadratic Probing

    Double hashing


    Open hashing separate chaining

    0

    49

    4

    81

    1

    0

    25

    9

    16

    36

    64

    1

    2

    3

    4

    5

    6

    7

    8

    9

    Open hashing (separate chaining)

    • Keep a list of all elements that hash to the same value.

    0

    1

    4

    9

    16

    25

    36

    49

    64

    81


    Open hashing separate chaining1

    0

    9

    16

    49

    36

    64

    1

    81

    4

    25

    0

    1

    2

    3

    4

    5

    6

    7

    8

    9

    Open hashing (separate chaining)

    • Secondary Data Structure

      • List

      • Search tree

      • another hash table

    • We expect small collision

      • List

        • Simple

        • Small overhead


    Operations with chaining

    Operations with Chaining

    • Insert with chaining

      • Apply hash function to get a position.

      • Insert key into the Linked List at this position.

    • Search with chaining

      • Apply hash function to get a position.

      • Search the Linked List at this position.


    Open hashing separate chaining2

    Open hashing (separate chaining)

    public class ChainNode

    {

    Private KeyedItem item;

    private ChainNode next;

    public ChainNode(KeyedItem newItem, ChainNode nextNode) {

    item = newItem;

    next= nextNode;

    // set and get methods

    }

    } // end of ChainNode


    Open hashing separate chaining3

    Open hashing (separate chaining)

    public class HashTable

    {

    private final int HASH_TABLE_SIZE = 101; // size of hash table

    private ChainNode [] table; //hash table

    private int size; //size of hash table

    public HashTable() {

    table = new ChainNode [HASH_TABLE_SIZE];

    size =0;

    }

    public bool tableIsEmpty() { return size ==0;}

    public int tableLength() { return size;}

    public void tableInsert(KeyedItem newItem) throws HashException {}

    public boolean tableDelete(Comparable searchKey) {}

    public KeyedIten tableRetrieve(Comparable searchKey) {}

    } // end of hashtable


    Open hashing separate chaining4

    Open hashing (separate chaining)

    tableInsert(newItem)

    if (table is not full) {

    searchKey= the search key of newItem

    i = hashIndex (searchKey)

    node= reference to a new node containing newItem

    node.setNext (table[I]);

    table[I] = node

    }

    else //table full

    throw new HashException ()


    Open hashing separate chaining5

    Open hashing (separate chaining)

    tableRetrieve (searchKey)

    i = hashIndex (searchKey)

    node= table [I];

    while ((node !=null)&& node.getItem().getKey()!= searchKey )

    node=getNext ()

    if (node !=null)

    return node.getITem()

    else

    return null


    Evaluation of chaining

    Evaluation of Chaining

    • Disadvantages of Chaining

      • More complex to implement.

      • Search and Delete are harder. We need to know: The number of elements in the table (N); the number of buckets (B); the quality of the hash function

        • Worse case (O(n)) for searching

    • Advantage of Chaining

      • Insertions is easy and quick.

      • Allows more records to be stored.

        • The size of table is dynamic


    Review

    Review

    • A(n) ______ maps the search key of a table item into a location that will contain the item.

      • hash function

      • hash table

      • AVL tree

      • heap


    Review1

    Review

    • A hash table is a(n) ______.

      • stack

      • queue

      • array

      • list


    Review2

    Review

    • The condition that occurs when a hash function maps two or more distinct search keys into the same location is called a(n) ______.

      • disturbance

      • collision

      • Rotation

      • congestion


    Review3

    Review

    • ______ is a collision-resolution scheme that searches the hash table sequentially, starting from the original location specified by the hash function, for an unoccupied location.

      • Linear probing

      • Quadratic probing

      • Double hashing

      • Separate chaining


    Review4

    Review

    • ______ is a collision-resolution scheme that searches the hash table for an unoccupied location beginning with the original location that the hash function specifies and continuing at increments of 12, 22, 32, and so on.

      • Linear probing

      • Double hashing

      • Quadratic probing

      • Separate chaining


    Review5

    Review

    • ______ is a collision-resolution scheme that uses an array of linked lists as a hash table.

      • Linear probing

      • Double hashing

      • Quadratic probing

      • Separate chaining


    Review6

    Review

    • The load factor of a hash table is calculated as ______.

      • table size + current number of table items

      • table size – current number of table items

      • current number of table items * table size

      • current number of table items / table size


  • Login