
Binary Searching

Binary search: searching an ordered list. First compare the target to the key in the center of the list. If it is smaller, restrict the search to the left half; otherwise restrict the search to the right half, and repeat.

galena

Presentation Transcript


  1. Binary Searching

  2. Binary Search • Searching an ordered list. • First compare the target to the key in the center of the list. • If it is smaller, restrict the search to the left half; otherwise restrict the search to the right half, and repeat. • In this way, at each step we reduce the length of the list to be searched by half. • Roughly, we need lg(n) comparisons, compared to n in sequential search.

  3. Binary Search vs. Sequential • Roughly, we need log2(n) comparisons, compared to n in sequential search.

  4. Algorithm Development • Our binary search algorithm will use two indices, top and bottom, to enclose the part of the list in which we are looking for the target key. • At each iteration, we shall reduce the size of this part of the list by about half. • To keep track of the progress of the algorithm, the following assertion needs to remain true: "The target key, provided it is present, will be found between the indices bottom and top, inclusive." • We establish the initial correctness of this assertion by setting bottom to 0 and top to the_list.size( ) - 1. • To do binary search, we first calculate the index mid halfway between bottom and top as mid = (bottom + top)/2. • Next, we compare the target key against the key at position mid, and then we change the appropriate one of the indices top or bottom so as to reduce the list to either its bottom or top half.

  5. Algorithm Development • Next, we note that binary search should terminate when top <= bottom; that is, when the remaining part of the list contains at most one item, providing that we have not terminated earlier by finding the target. • Finally, we must make progress toward termination by ensuring that the number of items remaining to be searched, top - bottom + 1, strictly decreases at each iteration of the process. • Several slightly different algorithms for binary search can be written.

  6. Algorithm • Initialization: Set bottom = 0; top = the_list.size( ) - 1; • Compare target with the Record at the midpoint, mid = (bottom + top)/2; • Change the appropriate index top or bottom to restrict the search to the appropriate half of the list. • Loop terminates when top <= bottom, if it has not terminated earlier by finding the target.

  7. Different versions of binary search • The Forgetful Version: binary_search_1( ) forgets the possibility that the Key target might be found quickly and continues to subdivide the list until what remains has length 1. • Recognizing Equality: binary_search_2( ). The forgetful version will often make unnecessary iterations, because it fails to recognize that it has found the target before continuing to iterate. Thus we might hope to save computer time with a variation that checks at each stage to see if it has found the target.

  8. recursive_binary_1( )

         Error_code recursive_binary_1(const Ordered_list &the_list, const Key &target,
                                       int bottom, int top, int &position)
         {
            Record data;
            if (bottom < top) {              // List has more than one entry.
               int mid = (bottom + top)/2;
               the_list.retrieve(mid, data);
               if (data < target)            // Reduce to top half of list.
                  return recursive_binary_1(the_list, target, mid + 1, top, position);
               else                          // Reduce to bottom half of list.
                  return recursive_binary_1(the_list, target, bottom, mid, position);
            }
            else if (top < bottom)
               return not_present;           // List is empty.
            else {                           // List has exactly one entry.
               position = bottom;
               the_list.retrieve(bottom, data);
               if (data == target) return success;
               else return not_present;
            }
         }  // end of recursive_binary_1( )

  9. recursive_binary_1( ) (outline)

         recursive_binary_1(the_list, target, bottom, top, position)
         {
            if (bottom < top) {              // List has more than one entry.
               Calculate mid;                // int mid = (bottom + top)/2;
               get data;                     // the_list.retrieve(mid, data);
               if (data < target)            // Reduce to top half of list.
                  call recursive_binary_1(the_list, target, mid + 1, top, position);
               else                          // Reduce to bottom half of list.
                  call recursive_binary_1(the_list, target, bottom, mid, position);
            }
            else if (top < bottom)
               return not_present;           // List is empty.
            else {                           // top == bottom; list has exactly one entry.
               position = bottom;
               get data;                     // the_list.retrieve(bottom, data);
               if (data == target)           // Only once we have arrived at one entry
                  return success;            // do we make the equality comparison.
               else return not_present;
            }
         }  // end of recursive_binary_1( )

  10. binary_search_1( )

          Error_code binary_search_1(const Ordered_list &the_list, const Key &target, int &position)
          {
             Record data;
             int bottom = 0, top = the_list.size( ) - 1;
             while (bottom < top) {
                int mid = (bottom + top)/2;
                the_list.retrieve(mid, data);
                if (data < target) bottom = mid + 1;
                else top = mid;
             }
             if (top < bottom) return not_present;
             else {
                position = bottom;
                the_list.retrieve(bottom, data);
                if (data == target) return success;
                else return not_present;
             }
          }

      Every time we fetch the data at mid, we compare it with the target only once (data < target); that is, we perform only one comparison per midpoint. The equality check is made only at the end, when one element remains.
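The slide's code depends on the Ordered_list, Record, Key, and Error_code types, which are not shown in this transcript. As a self-contained sketch of the same forgetful strategy, the version below assumes a sorted std::vector<int> instead; the name `forgetful_search`, the -1 return value (standing in for not_present), and the comparison counter are illustrative additions.

```cpp
#include <vector>

// Forgetful binary search over a sorted std::vector<int> (an assumed
// stand-in for Ordered_list). Returns the index of target, or -1 if it
// is absent. key_comparisons receives the number of key comparisons made.
int forgetful_search(const std::vector<int>& keys, int target, int& key_comparisons) {
    key_comparisons = 0;
    int bottom = 0;
    int top = static_cast<int>(keys.size()) - 1;
    while (bottom < top) {                    // more than one entry remains
        int mid = bottom + (top - bottom) / 2;
        ++key_comparisons;                    // the single comparison per midpoint
        if (keys[mid] < target) bottom = mid + 1;
        else top = mid;                       // target, if present, is in bottom..mid
    }
    if (top < bottom) return -1;              // the list was empty
    ++key_comparisons;                        // final equality check on the one entry left
    return keys[bottom] == target ? bottom : -1;
}
```

With the ten keys 1, 3, 5, ..., 19, searching for 7 returns index 3 after 4 key comparisons, and searching for the absent key 8 also takes 4 comparisons before reporting failure.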

  11. binary_search_1( ) The division of the list into sublists is described in the following diagram [diagram not reproduced in the transcript]. For each mid: if (data < target) bottom = mid + 1; else top = mid;

  12. recursive_binary_2( )

          Error_code recursive_binary_2(const Ordered_list &the_list, const Key &target,
                                        int bottom, int top, int &position)
          {
             Record data;
             if (bottom <= top) {
                int mid = (bottom + top)/2;
                the_list.retrieve(mid, data);
                if (data == target) {
                   position = mid;
                   return success;
                }
                else if (data < target)
                   return recursive_binary_2(the_list, target, mid + 1, top, position);
                else
                   return recursive_binary_2(the_list, target, bottom, mid - 1, position);
             }
             else return not_present;
          }

      Every time we fetch the data at mid, we first compare it for equality with the target. We therefore perform up to two comparisons per midpoint.

  13. binary_search_2( )

          Error_code binary_search_2(const Ordered_list &the_list, const Key &target, int &position)
          {
             Record data;
             int bottom = 0, top = the_list.size( ) - 1;
             while (bottom <= top) {
                position = (bottom + top)/2;
                the_list.retrieve(position, data);
                if (data == target) return success;
                if (data < target) bottom = position + 1;
                else top = position - 1;
             }
             return not_present;
          }

      Every time we fetch the data at mid, we first compare it for equality with the target. We therefore perform up to two comparisons per midpoint.
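For comparison, here is a self-contained sketch of the equality-recognizing strategy, again assuming a sorted std::vector<int> in place of Ordered_list. The name `equality_search`, the -1 return value, and the counter are illustrative and not part of the slides.

```cpp
#include <vector>

// Binary search that checks for equality at every midpoint, over a sorted
// std::vector<int> (an assumed stand-in for Ordered_list). Returns the
// index of target, or -1 if absent; key_comparisons counts key comparisons.
int equality_search(const std::vector<int>& keys, int target, int& key_comparisons) {
    key_comparisons = 0;
    int bottom = 0;
    int top = static_cast<int>(keys.size()) - 1;
    while (bottom <= top) {
        int mid = bottom + (top - bottom) / 2;
        ++key_comparisons;                    // first comparison: equality
        if (keys[mid] == target) return mid;
        ++key_comparisons;                    // second comparison: ordering
        if (keys[mid] < target) bottom = mid + 1;
        else top = mid - 1;
    }
    return -1;                                // the interval became empty
}
```

On the ten keys 1, 3, 5, ..., 19, the middle key 9 is found after a single comparison, while 7 needs 7 comparisons: two at each of three midpoints, then one at the midpoint where it is found.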

  14. binary_search_2( ) The division of the list into sublists is described in the following diagram [diagram not reproduced in the transcript]. For each mid: if (data == target) return success; if (data < target) bottom = position + 1; else top = position - 1;

  15. Comparison Trees The comparison tree of an algorithm is obtained by tracing the action of the algorithm, representing each comparison of keys by a vertex of the tree (which we draw as a circle). Inside the circle we put the index of the key against which we are comparing the target key. The number of comparisons done by an algorithm in a particular search is the number of internal vertices traversed in going from the top of the tree, called its root, down the appropriate path to a leaf.

  16. Comparison Tree [Figure: a comparison tree with its root, a branch, the internal vertices, and a leaf labeled.]

  17. Level & height • The number of branches traversed to reach a vertex from the root is called the level of the vertex. Thus the root itself has level 0, the vertices immediately below it have level 1, and so on. • The largest level that occurs is called the height of the tree.

  18. Path length • The external path length of a tree is the sum of the number of branches traversed in going from the root once to every leaf in the tree. • The internal path length is defined to be the sum, over all vertices that are not leaves, of the number of branches from the root to the vertex. • We call the vertices immediately below a vertex v the children of v and the vertex immediately above v the parent of v.
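These definitions can be made concrete with a small sketch. The code below assumes a tree in which every vertex is either a leaf or has exactly two children; the names `Vertex`, `PathLengths`, `measure`, `leaf`, and `node` are hypothetical and chosen only for this illustration.

```cpp
#include <memory>
#include <utility>

// A vertex with either no children (a leaf) or exactly two children.
struct Vertex {
    std::unique_ptr<Vertex> left, right;
    bool is_leaf() const { return !left && !right; }
};

struct PathLengths {
    int external = 0;  // sum of levels of all leaves
    int internal = 0;  // sum of levels of all non-leaves
    int height = 0;    // largest level that occurs
};

// Walk the tree below v, whose own level is `level`, accumulating into out.
void measure(const Vertex& v, int level, PathLengths& out) {
    if (level > out.height) out.height = level;
    if (v.is_leaf()) { out.external += level; return; }
    out.internal += level;
    measure(*v.left, level + 1, out);
    measure(*v.right, level + 1, out);
}

// Convenience builders for the example.
std::unique_ptr<Vertex> leaf() { return std::make_unique<Vertex>(); }
std::unique_ptr<Vertex> node(std::unique_ptr<Vertex> l, std::unique_ptr<Vertex> r) {
    auto v = std::make_unique<Vertex>();
    v->left = std::move(l);
    v->right = std::move(r);
    return v;
}
```

For the tree built by `node(node(leaf(), leaf()), leaf())`, calling `measure(*tree, 0, p)` on a fresh `PathLengths p` gives height 2, external path length 2 + 2 + 1 = 5, and internal path length 0 + 1 = 1.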

  19. Comparison tree for sequential search In an unsuccessful search the number of comparisons is n, which is one more than the level of the deepest internal vertex of the tree.

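  20. Comparison tree for binary_1( ), Forgetful Version, n = 10 [Figure: the root compares the target with key 5; its branches are labeled <= and >. Target less than or equal to key at mid: search from 1 to mid. Target greater than key at mid: search from mid + 1 to top.]

  21. Comparison tree for binary_1( ), Forgetful Version [Figure: at the root, if (target > data at mid) the search continues in the upper half, with mid = (6+10)/2; otherwise it continues in the lower half, with mid = (1+5)/2.]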

  22. Comparison tree for binary_2( ), n = 10

  23. Comparison tree for binary_2( ), n = 10

  24. Comparison Count for binary_search_1; n = 10 • In binary_search_1, every search terminates at a leaf; to obtain the average number of comparisons for both successful and unsuccessful searches, we need what is called the external path length of the tree: the sum of the number of branches traversed in going from the root once to every leaf in the tree. The external path length is (4 x 5) + (6 x 4) + (4 x 5) + (6 x 4) = 88. • Half the leaves correspond to successful searches, and half to unsuccessful searches. • Hence, with 2n = 20 leaves in all, the average number of comparisons needed for either a successful or unsuccessful search by binary_search_1( ) is 88/20 = 44/10 = 4.4 when n = 10.
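The count of 88 can be checked without drawing the tree. The sketch below follows the way binary_search_1 splits a list (bottom..mid and mid+1..top) and adds up the levels of the leaves; the function name `external_path_length_1` is hypothetical.

```cpp
// External path length of the comparison tree of binary_search_1 for a
// sublist of `size` >= 1 entries whose topmost vertex sits at `level`.
int external_path_length_1(int size, int level = 0) {
    if (size == 1) {
        // One equality comparison at this vertex, then two leaves
        // (success and failure) one level further down.
        return 2 * (level + 1);
    }
    int bottom_half = (size + 1) / 2;    // entries bottom..mid
    int top_half = size - bottom_half;   // entries mid+1..top
    return external_path_length_1(bottom_half, level + 1)
         + external_path_length_1(top_half, level + 1);
}
```

For n = 10 this gives 88, and dividing by the 2n = 20 leaves reproduces the average of 4.4 comparisons.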

  25. External path length for binary_1( ), Forgetful Version [Figure: the comparison tree annotated with the leaf counts (4 x 5) and (6 x 4) for each half of the tree.]

  26. Comparison Count for binary_search_2; n = 10 • In binary_search_2, all the leaves correspond to unsuccessful searches; hence the external path length leads to the number of comparisons for an unsuccessful search. The external path length is (5 x 3) + (6 x 4) = 39. • We shall assume for unsuccessful searches that the n + 1 intervals (less than the first key, between a pair of successive keys, or greater than the largest) are all equally likely; for the diagram we therefore assume that all 11 failure leaves are equally likely. Since two comparisons are made at each internal vertex, the average number of comparisons for an unsuccessful search is (2 x 39)/11, or about 7.1.

  27. External path length for binary_2( ), Equality Version [Figure: the comparison tree annotated with the leaf counts (5 x 3) and (6 x 4).]

  28. Comparison Count for binary_search_2; n = 10 • For successful searches, we need the internal path length, which is defined to be the sum, over all vertices that are not leaves, of the number of branches from the root to the vertex. The internal path length (number of branches) is 0 + 1 + 2 + 2 + 3 + 1 + 2 + 3 + 2 + 3 = 19. • Recall that binary_search_2 does two comparisons for each non-leaf except for the vertex that finds the target, and note that the number of internal vertices traversed is one more than the number of branches (for each of the n = 10 internal vertices). We thereby obtain the average number of comparisons for a successful search to be 2 x (19/10 + 1) - 1 = 4.8. • The subtraction of 1 corresponds to the fact that one fewer comparison is made when the target is found.
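Both path lengths for binary_search_2 can be recomputed from the way the algorithm splits a list (bottom..mid-1 and mid+1..top). This is a minimal sketch; `TreeStats` and `tally_2` are hypothetical names.

```cpp
struct TreeStats {
    int external = 0;    // sum of levels of the leaves (failed searches)
    int internal = 0;    // sum of levels of the non-leaves (keys found)
    int leaves = 0;
    int non_leaves = 0;
};

// Accumulate statistics for the comparison tree of binary_search_2 on a
// sublist of `size` entries whose topmost vertex sits at `level`.
void tally_2(int size, int level, TreeStats& s) {
    if (size == 0) {                 // empty interval: a failure leaf
        s.external += level;
        ++s.leaves;
        return;
    }
    s.internal += level;             // the vertex that compares against mid
    ++s.non_leaves;
    int below = (size - 1) / 2;      // entries bottom..mid-1
    int above = size - 1 - below;    // entries mid+1..top
    tally_2(below, level + 1, s);
    tally_2(above, level + 1, s);
}
```

Calling `tally_2(10, 0, s)` on a fresh `TreeStats s` yields 11 leaves with external path length 39 and 10 non-leaves with internal path length 19, matching the counts on the slides.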

  29. Generalization • What happens when n is larger than 10? • For longer lists, it may be impossible to draw the complete comparison tree, but from the examples with n = 10, we can make some observations that will always be true.

  30. 2-Trees A 2-tree is a tree in which every vertex except the leaves has exactly two children. • Lemma 7.1 The number of vertices on each level of a 2-tree is at most twice the number on the level immediately above. Hence, in a 2-tree, the number of vertices on level t is at most $2^t$ for $t \ge 0$. • Lemma 7.2 If a 2-tree has k vertices on level t, then $t \ge \lg k$, where lg denotes a logarithm with base 2.

  31. Floor & ceiling The floor of a real number x is the largest integer less than or equal to x, and the ceiling of x is the smallest integer greater than or equal to x. We denote the floor of x by $\lfloor x \rfloor$ and the ceiling of x by $\lceil x \rceil$. E.g., for x = 5.46, the floor is 5 and the ceiling is 6.
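In C++ these two functions are available as std::floor and std::ceil in <cmath>. The thin wrappers below, with the hypothetical names `floor_of` and `ceiling_of`, only convert the results to int for convenience.

```cpp
#include <cmath>

// Largest integer less than or equal to x.
int floor_of(double x) { return static_cast<int>(std::floor(x)); }

// Smallest integer greater than or equal to x.
int ceiling_of(double x) { return static_cast<int>(std::ceil(x)); }
```

For the slide's example, `floor_of(5.46)` is 5 and `ceiling_of(5.46)` is 6. For negative values the floor moves away from zero, so `floor_of(-5.46)` is -6, which differs from simply truncating to -5.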

  32. Analysis of binary_search_1 We can now turn to the general analysis of binary_search_1 on a list of n entries. The final step done in binary_search_1 is always a check for equality with the target; hence both successful and unsuccessful searches terminate at leaves, and so there are exactly 2n leaves altogether. As illustrated in Figure 7.3 for n = 10, all these leaves must be on the same level or on two adjacent levels. (This observation can be proved by using mathematical induction to establish the following stronger statement: If T1 and T2 are the comparison trees of binary_search_1 operating on lists L1 and L2 whose lengths differ by at most 1, then all leaves of T1 and T2 are on the same or adjacent levels. The statement is clearly true when L1 and L2 are lists with length at most 2 ( L1=1 & L2=2). Moreover, if binary_search_1 divides two larger lists whose sizes differ by at most one, the sizes of the four halves also differ by at most 1, and the induction hypothesis shows that their leaves are all on the same or adjacent levels.)

  33. Analysis of binary_search_1 From Lemma 7.2 it follows that the maximum level t of leaves in the comparison tree satisfies $t = \lceil \lg 2n \rceil$. Since one comparison of keys is done at the root (which is level 0), but no comparisons are done at the leaves (level t), it follows that the maximum number of key comparisons is also $t = \lceil \lg 2n \rceil$. Furthermore, the maximum number is at most one more than the average number, since all leaves are on the same or adjacent levels. Because $\lg 2n = \lg n + \lg 2 = \lg n + 1$, the worst case is about $\lg n + 1$ comparisons. When n is a power of 2, all leaves are on the same level and every search makes exactly $\lg n + 1$ comparisons.
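The worst-case bound, the ceiling of lg 2n, is easy to evaluate with integer arithmetic. In this sketch the names `ceil_lg` and `max_comparisons_1` are hypothetical.

```cpp
// Smallest t with 2^t >= k, that is, the ceiling of lg k, for k >= 1.
int ceil_lg(long long k) {
    int t = 0;
    long long power = 1;
    while (power < k) {
        power *= 2;
        ++t;
    }
    return t;
}

// Worst-case number of key comparisons made by binary_search_1
// on a list of n >= 1 entries, according to the bound ceil(lg 2n).
int max_comparisons_1(long long n) {
    return ceil_lg(2 * n);
}
```

For n = 10 the bound is 5, which agrees with the deepest leaves of the n = 10 comparison tree being on level 5; for n = 1,000,000 it is 21.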

  34. binary search 1

  35. Analysis of binary_search_2, Unsuccessful Search • To count the comparisons made by binary_search_2 for a general value of n for an unsuccessful search, we shall examine its comparison tree. • For reasons similar to those given for binary_search_1, this tree is again full at the top, with all its leaves on at most two adjacent levels at the bottom. • For binary_search_2, all the leaves correspond to unsuccessful searches, so there are exactly n + 1 leaves, corresponding to the n + 1 unsuccessful outcomes: less than the smallest key, between a pair of keys, and greater than the largest key.

  36. Analysis of binary_search_2, Unsuccessful Search • Since these leaves are all at the bottom of the tree, Lemma 7.1 implies that the number of leaves is approximately $2^h$, where h is the height of the tree: $2^h \approx n + 1$. Taking (base 2) logarithms, we obtain that $h \approx \lg(n + 1)$. This value is the approximate distance from the root to one of the leaves. • Since, in binary_search_2, two comparisons of keys are performed for each internal vertex, the number of comparisons done in an unsuccessful search is approximately $2\lg(n + 1)$.

  37. Binary search 2, Unsuccessful Search

  38. Analysis of binary_search_2, Successful Search • In the comparison tree of binary_search_2, the distance to the leaves is about lg(n + 1), as we have seen. The number of leaves is n + 1, so the external path length E is about (number of leaves) x (height): E = (n + 1) lg(n + 1). • Theorem 7.3 (E = I + 2q, that is, I = E - 2q, with q = n non-leaves) then shows that the internal path length is about I = (n + 1) lg(n + 1) - 2n. • To obtain the average number of comparisons done in a successful search, we must first divide by n (the number of non-leaves) and then add 1 and double, since two comparisons were done at each internal node. Finally, we subtract 1, since only one comparison is done at the node where the target is found.

  39. Analysis of binary_search_2, Successful Search Average number of comparisons = 2 x (internal path length / number of non-leaves + 1) - 1 [the + 1 is because a path of x branches passes through x + 1 vertices]: $2\left(\frac{(n+1)\lg(n+1) - 2n}{n} + 1\right) - 1 = \frac{2(n+1)}{n}\lg(n+1) - 4 + 2 - 1 = \frac{2(n+1)}{n}\lg(n+1) - 3$. If n is big, the average number of comparisons is approximately $2\lg n - 3$.
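To get a feel for the closed form (2(n + 1) lg(n + 1))/n - 3, it can be evaluated numerically. The function name `average_successful_2` is hypothetical; std::log2 is the base-2 logarithm from <cmath>.

```cpp
#include <cmath>

// Approximate average number of key comparisons for a successful search
// by binary_search_2 on n entries: (2(n + 1) lg(n + 1)) / n - 3.
double average_successful_2(double n) {
    return 2.0 * (n + 1.0) / n * std::log2(n + 1.0) - 3.0;
}
```

For n = 10 the formula gives about 4.61, close to the exact value 4.8 counted from the comparison tree; the gap reflects the approximations made in estimating the path lengths. For n = 1,000,000 it gives about 36.9, essentially 2 lg n - 3.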

  40. Binary search 2

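  41. The Path-Length Theorem • Theorem 7.3 Denote the external path length of a 2-tree by E, the internal path length by I, and let q be the number of vertices that are not leaves. Then E = I + 2q.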

  42. Proof • To prove the theorem we use the method of mathematical induction, using the number of vertices in the tree to do the induction. • If the tree contains only its root, and no other vertices, then E = I = q = 0, and the base case of the theorem is trivially correct. • Now take a larger tree, and let v be some vertex that is not a leaf, but for which both the children of v are leaves. • Let k be the number of branches on the path from the root to v. See Figure 7.6.

  43. Proof • Now let us delete the two children of v from the 2-tree. Since v is not a leaf but its children are, the number of non-leaves goes down from q to q - 1. • The internal path length I is reduced by the distance to v (= k); that is, to I - k. The distance to each child of v is k + 1, so the external path length is reduced from E to E - 2(k + 1), • but v is now a leaf, so its distance, k, must be added, giving a new external path length of E - 2(k + 1) + k = E - k - 2. • Since the new tree has fewer vertices than the old one, by the induction hypothesis we know that E - k - 2 = (I - k) + 2(q - 1). • Rearrangement of this equation gives the desired result. End of proof.
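The final rearrangement, written out step by step with the symbols used in the proof (E, I, q, and k):

```latex
\begin{align*}
E - k - 2 &= (I - k) + 2(q - 1) \\
E - k - 2 &= I - k + 2q - 2 \\
E &= I + 2q
\end{align*}
```

The last line follows by adding k + 2 to both sides, which is exactly the statement E = I + 2q for the original, larger tree.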

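  44. Comparison of Methods The number of comparisons of keys done by binary searches in searching a list of n items is approximately: • Search-1 (binary_search_1): successful $\lg n + 1$; unsuccessful $\lg n + 1$. • Search-2 (binary_search_2): successful $\frac{2(n+1)}{n}\lg(n+1) - 3$; unsuccessful $2\lg(n+1)$. • For large values of n, Search-2 needs about $2\lg n - 3$ comparisons (successful) and $2\lg n$ comparisons (unsuccessful).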

  45. Comparison of Methods for even higher value of n:

  46. Interpolation search • To find the target key target, interpolation search estimates, according to the magnitude of the number target relative to the first and last entries of the list, about where target would be in the list and looks there. It then reduces the size of the list according to whether target is less than or greater than the key examined. • It can be shown that on average, with uniformly distributed keys, interpolation search will take about $\lg \lg n$ comparisons of keys, which, for large n, is somewhat fewer than binary search requires. • If, for example, n = 1,000,000, then binary_search_1 will require about $\lg 10^6 + 1 \approx 21$ comparisons, while interpolation search may need only about $\lg \lg 10^6 \approx 4.32$ comparisons.
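The slides give no code for interpolation search. The following is one common formulation, offered as a sketch only: it assumes integer keys in a sorted std::vector<long long>, and the name `interpolation_search` and the -1 return value are illustrative choices.

```cpp
#include <vector>

// Interpolation search over a sorted std::vector<long long>.
// Returns the index of target, or -1 if it is absent.
// Assumes (target - first key) * (list length) fits in a long long.
int interpolation_search(const std::vector<long long>& keys, long long target) {
    int bottom = 0;
    int top = static_cast<int>(keys.size()) - 1;
    // Keep searching only while target lies within the remaining key range.
    while (bottom <= top && keys[bottom] <= target && target <= keys[top]) {
        if (keys[bottom] == keys[top]) {
            // All remaining keys are equal and target lies between them,
            // so target equals that key.
            return bottom;
        }
        // Estimate the position from the magnitude of target relative to
        // the first and last keys of the remaining range.
        long long offset = (target - keys[bottom]) * (top - bottom)
                         / (keys[top] - keys[bottom]);
        int probe = bottom + static_cast<int>(offset);
        if (keys[probe] == target) return probe;
        if (keys[probe] < target) bottom = probe + 1;
        else top = probe - 1;
    }
    return -1;
}
```

With the evenly spaced keys 10, 20, 30, 40, 50, a search for 40 probes index 3 first and finds the key immediately. The lg lg n average quoted above applies to uniformly distributed keys; on skewed key distributions this sketch can need many more probes.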
