Searching and Hashing

Searching and Hashing

Concepts This Lecture • Searching an array • Linear search • Binary search • Comparing algorithm performance

Searching • Searching = looking for something • Searching an array is particularly common • Goal: determine if a particular value is in the array • We'll see that more than one algorithm will work

Searching Algorithms • The algorithm used to find a number in a phone book is practical and efficient for human but not so good for computers • It's not precise • It's not consistent • Let's imagine another scenario. Suppose that you have • A pile of cards containing names of customers • They are not organized in any particular way • You want to find the card with name Sarah (your key) • The procedure you'll will use is likely to be: look a each card's key (one by one) until one matches your target • This is an algorithm and is called Linear Search

Searching as a Function • Specification: • Let b be the array to be searched, n is the size of the array, and b is x is value being search for. • If x appears in b[0..n-1], return its index, i.e., • return k such that b[k]==x. If x not found, return –1 • None of the parameters are changed by the function • Function outline: void Lookup ((const int vec[ ], int vSize, int key, Boolean& found, int& loc) { ... }

Linear Search • Algorithm: start at the beginning of the array and examine each element until x is found, or all elements have been examined void Lookup (const int vec[ ], int vSize, int key, Boolean& found, int& loc) { loc = 0; while (loc < vSize && vec[loc] != key) loc++; found = (loc < vSize); }

Found It! Linear Search • Test: • search(v, 8, 6) 3 12 -5 6 142 21 -17 45 b

Ran off the end! Not found. Linear Search • Test: • search(v, 8, 15) 3 12 -5 6 142 21 -17 45 b

Linear Search • Note: The loop condition is written so vec[loc] is not accessed if loc >= vSize. • while ( loc < vSize && vec[loc] != key ) • (Why is this true? Why does it matter?) 3 12 -5 6 142 21 -17 45 b

NodeType linearSearch(NodeType *start, int target) { if (start->key == target) return *start; if (start == NULL) return NULL; else return LinearSearch(start->next, target); } Write a Recursive Linear Search

Linear Search-Linked List for each item in the list if the item's key match the target stop and report "success" report failure

Linear Search (target = 9) head 5 12 9 / head

Linear Search (target = n) NodeType linearSearch(NodeType *start, int target) { NodeType *temp = start; while (temp != NULL) { if (temp->key == target) return *temp; temp = temp->next; } return NULL; }

Analyzing Linear Search • Best case analysis • The element is always found in the first position of the list, which means that we do one comparison: O(1) • Worst case analysis • The element is never present in the list. This means that we are going to do n comparisons where n is the size of the list • we have to go through the whole list to be sure whether the element is present: O(N) • Average case analysis • The search key can be found anywhere in the list • If we "run" the algorithm for each possibility where the key may appear we get: 1+2+….+vSize/vSize => (vSize*(vSize+1)/2)/vSize = (vSize+1)/2 = O(N)

Can we do better? • Time needed for linear search is proportional to the size of the array. • An alternate algorithm, "Binary search," works if the array is sorted • 1. Look for the target in the middle. • 2. If you don't find it, you can ignore half of the array, and repeat the process with the other half. • Example: Find first page of pizza listings in the yellow pages

Binary Search • In some cases, you get a list which is already ordered. • In this case we can use algorithms that take this into consideration • The idea of binary search is • Split the list in two halves and compare the target with the key in the middle of the list • Based on this comparison we can tell which half of the list may contain the target • Binary search eliminates half of the list at each iteration • It requires direct access to the list elements

0 L R n <= x > x b 0 L R n <= x ? > x b Binary Search Strategy • What we want: Find split between values larger and smaller than x: Situation while searching Step: Look at b[(L+R)/2]. Move L or R to the middle depending on test.

0 L R n <= x ? > x b Binary Search Strategy • More precisely • Values in b[0..L] <= x • Values in b[R..n-1] > x • Values in b[L+1..R-1] are unknown

0 L R n b <= x ? > x Binary SearchIterative Approach /* If x appears in b[0..n-1], return its location, i.e., return k so that b[k]==x. If x not found, return -1 */ NodeType binarySearch(NodeType list[], int size, int target){ int front, back, mid; ___________________ ; while ( _______________ ) { } _________________ ; }

0 L R n b <= x ? > x Binary SearchIterative Approach /* If x appears in b[0..n-1], return its location, i.e., return k so that b[k]==x. If x not found, return -1 */ NodeType binarySearch(NodeType list[], int size, int target){ int front, back, mid; ___________________ ; while ( _______________ ) { mid = (front+back)/2; if (target == list[mid].key) return list[mid]; else if (target < list[mid].key) back = mid-1; } _________________ ; }

0 L R n b <= x ? > x Loop Termination /* If x appears in b[0..n-1], return its location, i.e., return k so that b[k]==x. If x not found, return -1 */ NodeType binarySearch(NodeType list[], int size, int target){ int front, back, mid; ___________________ ; while (front <= back) { mid = (front+back)/2; if (target == list[mid].key) return list[mid]; else if (target < list[mid].key) back = mid-1; else front = mid+1; } _________________ ; }

0 L R n b <= x > x Initialization /* If x appears in b[0..n-1], return its location, i.e., return k so that b[k]==x. If x not found, return -1 */ NodeType binarySearch(NodeType list[], int size, int target) { int front(0); int back(size-1); int mid; while (front <= back) { mid = (front+back)/2; if (target == list[mid].key) return list[mid]; else if (target < list[mid].key) back = mid-1; else front = mid+1; } _________________ ; }

0 L R n b <= x > x Return Result NodeType binarySearch(NodeType list[], int size, int target) { int front(0); int back(size-1); int mid; while (front <= back) { mid = (front+back)/2; if (target == list[mid].key) return list[mid]; else if (target < list[mid].key) back = mid-1; else front = mid+1; } return NULL; \\ Indicates target was not found; }

L L L R R mid mid mid Binary Search 0 1 2 3 4 5 6 7 • Test: bsearch(v,8,3); -17 -5 3 6 12 21 45 142 b while (front <= back) { mid = (front+back)/2; if (target == list[mid].key) return list[mid]; else if (target < list[mid].key) back = mid-1; else front = mid+1;

L L L R R mid mid mid Binary Search 0 1 2 3 4 5 6 7 • Test: bsearch(v,8,17); -17 -5 3 6 12 21 45 142 b while (front <= back) { mid = (front+back)/2; if (target == list[mid].key) return list[mid]; else if (target < list[mid].key) back = mid-1; else front = mid+1;

L L L L L R mid mid mid mid Binary Search 0 1 2 3 4 5 6 7 • Test: bsearch(v,8,143); -17 -5 3 6 12 21 45 142 b while (front <= back) { mid = (front+back)/2; if (target == list[mid].key) return list[mid]; else if (target < list[mid].key) back = mid-1; else front = mid+1;

L R R R R mid mid mid Binary Search 0 1 2 3 4 5 6 7 • Test: bsearch(v,8,-143); -17 -5 3 6 12 21 45 142 b while (front <= back) { mid = (front+back)/2; if (target == list[mid].key) return list[mid]; else if (target < list[mid].key) back = mid-1; else front = mid+1;

Binary Search (target = n) NodeType binarySearch(NodeType list[], int size, int target) { int front(0); int back(size-1); int mid; while (front <= back) { mid = (front+back)/2; if (target == list[mid].key) return list[mid]; else if (target < list[mid].key) back = mid-1; else front = mid+1; } return NULL; \\ Indicates target was not found; }

Binary Search (target = 7) 4 6 7 12 18 22 23 28 30 back(8) front(0)

Binary Search (target = 7) 4 6 7 12 18 22 23 28 30 front(0) back(8) mid(4)

Is it worth the trouble? • Suppose you had 1000 elements • Ordinary search would require maybe 500 comparisons on average • Binary search • after 1st compare, throw away half, leaving 500 elements to be searched. • after 2nd compare, throw away half, leaving 250. Then 125, 63, 32, 16, 8, 4, 2, 1 are left. • After at most 10 steps, you're done! • What if you had 1,000,000 elements??

How Fast Is It? • Another way to look at it: How big an array can you search if you examine a given number of array elements?

The table shows that the number of iterations grows proportionally to the logarithm base 2 of the size of the list O(log n) Analyzing Binary Search • We only need to concentrate in the main loop • The loop is different from the linear search because its number of executions is not a multiple of n (list size) • We can easily see that the size of the input is halved in each interaction. This should already give a "hint" of each function describes this algorithm, but let's use a table

Time for Binary Search • Key observation: for binary search: size of the array n that can be searched with k comparisons: n ~ 2k • Number of comparisons k as a function of array size n: k ~ log2n • This is fundamentally faster than linear search (where k ~ n)

Write a Recursive Binary SeachFunction BinarySearch( ) • BinarySearch takes sorted array vec, and two subscripts, fromLoc and toLoc, and key as arguments. It returns false if key is not found in the elements vec[fromLoc…toLoc]. Otherwise, it returns true. • BinarySearch is O(log2N).

found = BinarySearch(vec, 25, 0, 14 ); key fromLoc toLoc indexes 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 vec 0 2 4 6 8 10 12 14 16 18 20 22 24 26 28 16 18 20 22 24 26 28 24 26 28 24 NOTE: denotes element examined

Recursive Binary Seach-- basic idea • This is an example of a recursive function where arguments are halved. Given: a sorted array a of values (integers, strings, ..) from range [s,t] Task: search if a value x is in the array. If yes, return position, otherwise -1.

Recursive Binary Seach-- basic idea • Consider how you search for a name in a phone book: you don't use algorithm 1 (otherwise it would take ages to find a name starting with Z). • instead, you open the book somewhere, and then continue searching in the half that contains the name then open up somewhere in that half, and continue searching in the portion that contains the name, etc.

Recursive Binary Seach-- basic idea Now let's do this for a sorted array of integers, but let's always check the middle of the remaining range. Example: search for 7 in the following array 0 1 2 3 4 5 6 7 8 9 2 5 7 11 17 24 31 38 40 41 low high mid: (0+9)/2 = 4, 7< a[4], so look in lower half 2 5 7 11 low high mid = (0+3)/2 == 1, 7> a[1], so look in upper half 7 11 low high mid = (2+3)/2 == 2, 7 == a[2], found!

Recursive Binary Seach-- basic idea • Example: array contains 3,5. Search for 4. (0+1)/2 is 0 (integer div). so if we don't exclude mid, the sub array starts again at index 0 and ends at 1. => infinite number of recursive calls in the code on the next page, mid is excluded from the subarray to prevent this.

Recursive Binary Seach-- basic idea • Let's think about the design of the recursive fct before coding it: 1. recursive calls: call function with that half of the current subrange that contains x Define subrange with start and end index 2. base case: when should the recursive calls stop: when we find x • what if x is not in the array? -- stop if a single cell that does not contain x check: does the (start + end)/2 procedure always end in an array of length 1? A: depends on how you implement it. You must ensure that array gets at least smaller by 1.

Recursive Binary Seach Boolean BinarySearch ( int vec[ ] , int key , int fromLoc , int toLoc ) // PRE: vec [ fromLoc . . toLoc ] sorted in ascending order // POST: FCTVAL == ( key in vec [ fromLoc . . toLoc] ) { int mid ; if ( fromLoc > toLoc ) // base case -- not found return false ; else { mid = ( fromLoc + toLoc ) / 2 ; if ( vec [ mid ] == key ) // base case-- found at mid return true ; else if ( key < vec [ mid ] ) // search lower half return BinarySearch ( vec, key, fromLoc, mid-1 ) ; else // search upper half return BinarySearch( vec, key, mid + 1, toLoc ) ; } } 49

#include <stdio.h> /* prototype */ int binSearch(int array[], int first, int last, int N); void main(void){ int index; int value; int list[] = {1,2,3,5,6}; printf(“Enter a search value:”); scanf(“%i”,&value);/* the function binSearch returns the index of the array */ /* where the match is found, otherwise a –1 */ index = binSearch(list,0,4,value); if (index == -1) printf(“Value not found!\n”); else printf(“Value matches the %i element in the array!\n”,++index); } /* code continued on next slide */

Searching and Hashing

Searching and Hashing

Presentation Transcript

Hashing

Indexing and Hashing

Hashing

Hashing

Indexing and Hashing

Similarity Searching in High Dimensions via Hashing

Hashing

Maps and Hashing

Searching / Hashing

Maps and hashing

Searching, Maps, Hashing

Hashing

Hashing

Hashing

Hashing

HASHING

Hashing

Similarity Searching in High Dimensions via Hashing

Hashing, Hashing Tables

Hashing for searching