
Disjoint sets



  1. Disjoint sets

  2. Outline In this topic, we will cover disjoint sets, including: a review of equivalence relations; the definition of a disjoint set; an efficient data structure (a general tree); an optimization which results in worst-case O(ln(n)) height, average-case O(α(n)) height, and best-case Θ(1) height; and a few examples and applications.

  3. Definitions Recall the properties of an equivalence relation: a ~ a for all a; a ~ b if and only if b ~ a; and if a ~ b and b ~ c, it follows that a ~ c. An equivalence relation partitions a set into distinct equivalence classes. Each equivalence class may be represented by a single object: the representative object. Another descriptive term for the sets in such a partition is disjoint sets.

  4. Implicitly Defined Relations For example, big-Θ defines an equivalence relation on functions: each equivalence class contains the functions which grow at the same rate. We choose a single function to represent the class, e.g., we represent all functions with quadratic growth with n². Another example: partition the numbers from 1 to 20 according to the relation a ~ b if a and b share the same factors of 2 (that is, the same largest power of 2 divides both): {1, 3, 5, 7, 9, 11, 13, 15, 17, 19}, {2, 6, 10, 14, 18}, {4, 12, 20}, {8}, {16}. These equivalence classes are implicitly defined by the relation.
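The second example above can be checked with a short sketch. The function names power_of_two_part and partition_by_twos are hypothetical, chosen for illustration; the largest power of 2 dividing each number serves as the representative of its class.

```cpp
#include <cassert>
#include <map>
#include <vector>

// Largest power of 2 dividing a (a > 0); used here as the
// representative of the equivalence class containing a.
int power_of_two_part( int a ) {
    assert( a > 0 );
    return a & -a;   // isolates the lowest set bit of a
}

// Group the integers 1, ..., n by their representative.
std::map<int, std::vector<int>> partition_by_twos( int n ) {
    std::map<int, std::vector<int>> classes;

    for ( int a = 1; a <= n; ++a ) {
        classes[power_of_two_part( a )].push_back( a );
    }

    return classes;
}
```

Calling partition_by_twos( 20 ) yields five classes keyed by 1, 2, 4, 8 and 16, matching the five sets listed on the slide.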

  5. Explicitly Defined Disjoint Sets Alternatively, a partition or collection of disjoint sets may be used to explicitly define an equivalence relation: a ~ b if and only if a and b are in the same set of the partition. For example, the 10 numerals 1, 2, 3, 4, 5, 6, 7, 8, 9, 0 can be partitioned into the three sets {1, 2, 3, 5, 7}, {4, 6, 9, 0}, {8}. Therefore, 1 ~ 2, 2 ~ 3, etc.

  6. Explicitly Defined Disjoint Sets Consider simulating a device and tracking the connected components in a circuit. This forms an equivalence relation: a ~ b if a and b are connected. http://www.morphet.org.uk/ferro/s6mono.html

  7. Operations on Disjoint Sets There are two operations we would like to perform on disjoint sets: determine if two elements are in the same disjoint set, and take the union of two disjoint sets, creating a single set. We will determine if two objects are in the same disjoint set by defining a function which finds the representative object of the disjoint set containing a given object. If the representative objects are the same, the objects are in the same disjoint set.

  8. Implementation Given two elements a and b, we will say that they are in the same set if find( a ) == find( b ). What find returns is irrelevant so long as: if a and b are in the same set, find( a ) == find( b ); and if a and b are not in the same set, find( a ) != find( b ). We will have find return an integer.

  9. Implementation Here is a poor implementation: have two arrays, where the second array stores the representative object of the corresponding entry in the first. Finding the representative object is Θ(1). However, taking the union of two sets is Θ(n): it would be necessary to check each array entry.
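A minimal sketch of this poor implementation follows. It assumes the elements are the integers 0, ..., n – 1, so that the first array is implicit and only the array of representatives is stored; the class name Quick_find is hypothetical.

```cpp
#include <cassert>
#include <vector>

// Sketch of the array-based approach: representative[i] stores the
// representative of the set containing i (elements assumed to be 0..n-1).
class Quick_find {
    public:
        explicit Quick_find( int n ) : representative( n ) {
            for ( int i = 0; i < n; ++i ) {
                representative[i] = i;   // each element starts in its own set
            }
        }

        // Theta(1): a single array lookup
        int find( int i ) const {
            assert( 0 <= i && i < static_cast<int>( representative.size() ) );
            return representative[i];
        }

        // Theta(n): every entry must be checked and possibly relabelled
        void set_union( int i, int j ) {
            int const ri = find( i );
            int const rj = find( j );

            if ( ri == rj ) {
                return;
            }

            for ( int &r : representative ) {
                if ( r == rj ) {
                    r = ri;
                }
            }
        }

    private:
        std::vector<int> representative;
};
```

The loop in set_union is what makes the union linear: there is no way to know which entries carry the old representative without scanning all of them.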

  10. Implementation As an alternate implementation, let each disjoint set be represented by a general tree. The root of the tree is the representative object. To take the union of two such sets, we will simply attach one tree to the root of the other. Find and union are now both O(h), where h is the height of the tree.

  11. Implementation Normally, a node points to its children. We are only interested in the root; therefore, our interest is in storing the parent of each node.

  12. Implementation For simplicity, we will assume we are creating disjoint sets of the n integers 0, 1, 2, ..., n – 1. We will define an array: parent = new int[n]; for ( int i = 0; i < n; ++i ) { parent[i] = i; } If parent[i] == i, then i is a root node. Initially, each integer is in its own set.

  13. Implementation We will define the function: int Disjoint_set::find( int i ) const { while ( parent[i] != i ) { i = parent[i]; } return i; } The run time is T_find(n) = O(h).

  14. Implementation Initially, you will note that find( i ) != find( j ) for i != j, and therefore, we begin with each integer being in its own set. We must next look at the union operation: how to join two disjoint sets into a single set.

  15. Implementation This function is also easy to define: void set_union( int i, int j ) { i = find( i ); j = find( j ); if ( i != j ) { // slightly sub-optimal... parent[j] = i; } } The run time is T_set_union(n) = 2 T_find(n) + Θ(1) = O(h). The function is named set_union because the keyword union is reserved in C++.
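Putting the array, find and set_union from the preceding slides together gives the following self-contained sketch. Using std::vector instead of new int[n] and adding a bounds check are choices made here for safety, not something the slides specify.

```cpp
#include <cassert>
#include <vector>

class Disjoint_set {
    public:
        // Initially, each integer 0..n-1 is the root of its own tree
        explicit Disjoint_set( int n ) : parent( n ) {
            for ( int i = 0; i < n; ++i ) {
                parent[i] = i;
            }
        }

        // Follow parent links until a root is reached: O(h)
        int find( int i ) const {
            assert( 0 <= i && i < static_cast<int>( parent.size() ) );

            while ( parent[i] != i ) {
                i = parent[i];
            }

            return i;
        }

        // Attach the root of j's tree to the root of i's tree: O(h)
        void set_union( int i, int j ) {
            i = find( i );
            j = find( j );

            if ( i != j ) {
                parent[j] = i;   // slightly sub-optimal: ignores tree heights
            }
        }

    private:
        std::vector<int> parent;
};
```

Two elements a and b are in the same set exactly when find( a ) == find( b ).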

  16. Example Consider the following disjoint set on the ten decimal digits:

  17. Example If we take the union of the sets containing 1 and 3, set_union(1, 3), we perform a find on both entries and update the second.

  18. Example Now, find(1) and find(3) will both return the integer 1

  19. Example Next, take the union of the sets containing 3 and 5, set_union(3, 5); we perform a find on both entries and update the second

  20. Example Now, if we take the union of the sets containing 5 and 7, set_union(5, 7), we update the value stored at index find(7) with the value find(5):

  21. Example Taking the union of the sets containing 6 and 8, set_union(6, 8), we update the value stored at index find(8) with the value find(6):

  22. Example Taking the union of the sets containing 8 and 9, set_union(8, 9), we update the value stored at index find(9) with the value find(8):

  23. Example Taking the union of the sets containing 4 and 8, set_union(4, 8), we update the value stored at index find(8) with the value find(4):

  24. Example Finally, if we take the union of the sets containing 5 and 6, set_union(5, 6), we update the value stored at index find(6) with the value find(5):
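The sequence of unions in this example can be replayed with the sketch below. It assumes the ten digits start as ten singleton sets (the original figure is not reproduced in this transcript), and the helper run_example and the struct Example_set are hypothetical names.

```cpp
#include <cassert>
#include <vector>

// Same find and set_union as defined earlier, with the parent
// array exposed so the result can be inspected.
struct Example_set {
    std::vector<int> parent;

    explicit Example_set( int n ) : parent( n ) {
        for ( int i = 0; i < n; ++i ) {
            parent[i] = i;
        }
    }

    int find( int i ) const {
        assert( 0 <= i && i < static_cast<int>( parent.size() ) );

        while ( parent[i] != i ) {
            i = parent[i];
        }

        return i;
    }

    void set_union( int i, int j ) {
        i = find( i );
        j = find( j );

        if ( i != j ) {
            parent[j] = i;
        }
    }
};

// Replay the seven unions of the example; returns the final parent array.
std::vector<int> run_example() {
    Example_set digits( 10 );

    digits.set_union( 1, 3 );   // parent[3] = 1
    digits.set_union( 3, 5 );   // parent[5] = 1
    digits.set_union( 5, 7 );   // parent[7] = 1
    digits.set_union( 6, 8 );   // parent[8] = 6
    digits.set_union( 8, 9 );   // parent[9] = 6
    digits.set_union( 4, 8 );   // parent[6] = 4
    digits.set_union( 5, 6 );   // parent[4] = 1

    return digits.parent;
}
```

Under that assumption the final array is {0, 1, 2, 1, 1, 1, 4, 1, 6, 6}: the digits 0 and 2 remain alone, and every other digit reaches the root 1, with 9 the deepest node at depth 3.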

  25. Optimizations To optimize both find and set_union, we must minimize the height of the tree. Therefore, point the root of the shorter tree to the root of the taller tree. The height of the taller tree will increase if and only if the trees are equal in height.
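A sketch of this optimization is given below. Storing the height of each tree in a second array, named tree_height here, is one straightforward way to do it and is an assumption rather than the slides' own code.

```cpp
#include <cassert>
#include <vector>

class Disjoint_set {
    public:
        explicit Disjoint_set( int n ) : parent( n ), tree_height( n, 0 ) {
            for ( int i = 0; i < n; ++i ) {
                parent[i] = i;
            }
        }

        int find( int i ) const {
            assert( 0 <= i && i < static_cast<int>( parent.size() ) );

            while ( parent[i] != i ) {
                i = parent[i];
            }

            return i;
        }

        // Height of the tree containing i
        int height( int i ) const {
            return tree_height[find( i )];
        }

        // Point the root of the shorter tree to the root of the taller tree
        void set_union( int i, int j ) {
            i = find( i );
            j = find( j );

            if ( i == j ) {
                return;
            }

            if ( tree_height[i] < tree_height[j] ) {
                parent[i] = j;              // i's tree is shorter
            } else {
                parent[j] = i;              // j's tree is shorter or equal

                // The height grows only when the trees were equal in height
                if ( tree_height[i] == tree_height[j] ) {
                    ++tree_height[i];
                }
            }
        }

    private:
        std::vector<int> parent;
        std::vector<int> tree_height;   // meaningful only at root entries
};
```

Only the entries of tree_height at roots are kept up to date, which is all that set_union ever reads.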

  26. Worst-Case Scenario Let us consider creating the worst-case disjoint set. As we are always attaching the tree with less height to the root of the tree with greater height, the worst case must occur when both trees are equal in height.

  27. Worst-Case Scenario Thus, building on this, we take the union of two sets with one element. We will keep track of the number of nodes at each depth: 1, 1.

  28. Worst-Case Scenario Next, we take the union of two such sets, that is, we join two worst-case sets of height 1: 1, 2, 1.

  29. Worst-Case Scenario And continue, taking the union of two worst-case trees of height 2: 1, 3, 3, 1.

  30. Worst-Case Scenario Taking the union of two worst-case trees of height 3: 1, 4, 6, 4, 1.

  31. Worst-Case Scenario And of height 4: 1, 5, 10, 10, 5, 1.

  32. Worst-Case Scenario And finally, take the union of two worst-case trees of height 5: 1, 6, 15, 20, 15, 6, 1. These are binomial trees.

  33. Worst-Case Scenario From the construction, it should be clear that this would define Pascal’s triangle, that is, the binomial coefficients. Rows 0 through 6: (1); (1, 1); (1, 2, 1); (1, 3, 3, 1); (1, 4, 6, 4, 1); (1, 5, 10, 10, 5, 1); (1, 6, 15, 20, 15, 6, 1).

  34. Worst-Case Scenario Thus, suppose we have a worst-case tree of height h. We need the number of nodes and the average depth of a node. Using Maple, > sum( binomial( h, k ), k = 0..h ); > sum( k*binomial( h, k ), k = 0..h ); we get 2^h and h 2^(h – 1), respectively. Therefore, with n = 2^h nodes, the average depth is h 2^(h – 1)/2^h = h/2 = lg(n)/2. The height and average depth of the worst case are O(ln(n)).
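The worst-case construction can be reproduced numerically. The sketch below builds the tree of height h by repeatedly joining pairs of equal-height trees, then counts the nodes at each depth; the function names and the numbering of the nodes are assumptions made for illustration.

```cpp
#include <cassert>
#include <vector>

// Build the worst-case tree on n = 2^h nodes and return the
// number of nodes at each depth 0, ..., h.
std::vector<long> worst_case_depth_counts( int h ) {
    assert( 0 <= h && h < 20 );

    int const n = 1 << h;
    std::vector<int> parent( n );

    for ( int i = 0; i < n; ++i ) {
        parent[i] = i;
    }

    // In each round, i and i + step are roots of trees of equal
    // height; attaching one to the other increases the height by 1.
    for ( int step = 1; step < n; step *= 2 ) {
        for ( int i = 0; i + step < n; i += 2*step ) {
            parent[i + step] = i;
        }
    }

    std::vector<long> counts( h + 1, 0 );

    for ( int i = 0; i < n; ++i ) {
        int depth = 0;

        for ( int k = i; parent[k] != k; k = parent[k] ) {
            ++depth;
        }

        ++counts[depth];
    }

    return counts;
}

// Average depth of a node, given the number of nodes at each depth.
double average_depth( std::vector<long> const &counts ) {
    long nodes = 0;
    long total = 0;

    for ( std::size_t k = 0; k < counts.size(); ++k ) {
        nodes += counts[k];
        total += static_cast<long>( k )*counts[k];
    }

    return static_cast<double>( total )/nodes;
}
```

For h = 6 the counts are 1, 6, 15, 20, 15, 6, 1, the seventh row of Pascal's triangle, and the average depth is 6/2 = 3, in agreement with the sums above.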

  35. Best-Case Scenario In the best case, all elements point to the same entry with a resulting height of Θ(1).

  36. Average-Case Scenario What is the average case? Could it be any better than O(ln(n))? To answer this, I created a program which, given the integers from 0 to 2^n – 1, continued to randomly choose numbers and take the union of their sets until all entries were in a single large set. For each n, I did this multiple times and found the mean (average) height.

  37. Average-Case Scenario The resulting graph shows the average height of a randomly generated disjoint set data structure with 2^n elements. This suggests that the average height of such a tree is o(ln(n)) (or better!). See reference [1] for a detailed analysis.
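The author's program is not shown; the following is only a sketch of an experiment of this kind. It assumes union by height, pairs drawn uniformly at random, and a fixed seed per trial so that runs are repeatable; random_tree_height and mean_height are hypothetical names.

```cpp
#include <cassert>
#include <random>
#include <utility>
#include <vector>

// Take unions of randomly chosen pairs from 0..n-1 until a single set
// remains, and return the height of the resulting tree.
int random_tree_height( int n, unsigned seed ) {
    assert( n >= 1 );

    std::vector<int> parent( n );
    std::vector<int> height( n, 0 );

    for ( int i = 0; i < n; ++i ) {
        parent[i] = i;
    }

    auto find = [&parent]( int i ) {
        while ( parent[i] != i ) {
            i = parent[i];
        }

        return i;
    };

    std::mt19937 gen( seed );
    std::uniform_int_distribution<int> pick( 0, n - 1 );

    int sets = n;

    while ( sets > 1 ) {
        int i = find( pick( gen ) );
        int j = find( pick( gen ) );

        if ( i == j ) {
            continue;   // already in the same set
        }

        if ( height[i] < height[j] ) {
            std::swap( i, j );   // make i the root of the taller tree
        }

        parent[j] = i;

        if ( height[i] == height[j] ) {
            ++height[i];
        }

        --sets;
    }

    return height[find( 0 )];
}

// Mean height over a number of trials, using the trial index as the seed.
double mean_height( int n, int trials ) {
    assert( trials > 0 );

    double total = 0.0;

    for ( int t = 0; t < trials; ++t ) {
        total += random_tree_height( n, static_cast<unsigned>( t ) );
    }

    return total/trials;
}
```

Because union by height is used, the height returned can never exceed lg(n); the question the experiment addresses is how far below that bound the mean typically falls.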

  38. Average-Case Scenario The actual asymptotic behaviour is O(α(n)) where α(n) is the inverse of the function A(n, n) where A(m, n) is the Ackermann function. The first values are: A(0, 0) = 1, A(1, 1) = 3, A(2, 2) = 7, A(3, 3) = 61.
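The definition of A(m, n) did not survive in this transcript. The sketch below assumes the common two-argument (Ackermann–Péter) form, which reproduces the four values listed: A(0, n) = n + 1, A(m, 0) = A(m – 1, 1), and A(m, n) = A(m – 1, A(m, n – 1)) otherwise.

```cpp
#include <cassert>
#include <cstdint>

// Two-argument Ackermann function (assumed Ackermann-Peter form).
// The guard restricts m because the values and the recursion depth
// grow far too quickly for m >= 4.
std::uint64_t ackermann( std::uint64_t m, std::uint64_t n ) {
    assert( m <= 3 );

    if ( m == 0 ) {
        return n + 1;
    }

    if ( n == 0 ) {
        return ackermann( m - 1, 1 );
    }

    return ackermann( m - 1, ackermann( m, n - 1 ) );
}
```

With this definition, A(3, n) = 2^(n + 3) – 3, so A(3, 3) = 2^6 – 3 = 61.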

  39. Average-Case Scenario However, A(4, 4) = 2^(A(4, 3) + 3) – 3 where A(4, 3) = 2^(2^65536) – 3, and 2^65536 is itself a 19729-decimal-digit number. Thus, A(4, 4) + 3, in binary, is 1 followed by A(4, 3) + 3 zeros.... http://xkcd.com/207/

  40. Average-Case Scenario Therefore, we (as engineers) can, in clear conscience, state that the average run-time is Θ(1), as there are no physical circumstances where the average depth could be anything more than 4.

  41. Optimizations Another optimization is that, whenever find is called, each node visited is updated to point directly to the root: int Disjoint_set::find( int n ) { if ( parent[n] == n ) { return n; } else { parent[n] = find( parent[n] ); return parent[n]; } } The next call to find(n) is Θ(1); the cost is O(h) memory for the recursive calls.
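The O(h) memory comes from the recursion. As an alternative sketch, not taken from the slides, the same compression can be done with two loops and Θ(1) additional memory: the first loop locates the root and the second re-points every node on the path. The accessor parent_of is a hypothetical helper added so the effect can be observed.

```cpp
#include <cassert>
#include <vector>

class Disjoint_set {
    public:
        explicit Disjoint_set( int n ) : parent( n ) {
            for ( int i = 0; i < n; ++i ) {
                parent[i] = i;
            }
        }

        // Find with path compression, written iteratively
        int find( int i ) {
            assert( 0 <= i && i < static_cast<int>( parent.size() ) );

            // First pass: locate the root
            int root = i;

            while ( parent[root] != root ) {
                root = parent[root];
            }

            // Second pass: point every node on the path at the root
            while ( parent[i] != root ) {
                int const next = parent[i];
                parent[i] = root;
                i = next;
            }

            return root;
        }

        void set_union( int i, int j ) {
            i = find( i );
            j = find( j );

            if ( i != j ) {
                parent[j] = i;
            }
        }

        // Hypothetical accessor, for inspection only
        int parent_of( int i ) const {
            return parent[i];
        }

    private:
        std::vector<int> parent;
};
```

Note that find can no longer be declared const, since it modifies the parent array.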

  42. Implementation Summary We now have two exceptionally fast operations: both find and set_union will run in Θ(1) time on average, and O(ln(n)) in the worst-case scenario.

  43. Application: Image Processing One common application is in image processing. Suppose you are attempting to recognize similar features within an image. Within a photograph, the same object may be separated by an obstruction: a road may be split in two in an image by a telephone pole, or a road on an aerial photograph may be separated by an overpass.

  44. Application: Image Processing Consider the following image of the author climbing up the Niagara Escarpment at Rattlesnake Point. Suppose we have a program which recognizes skin tones.

  45. Application: Image Processing A first algorithm may make an initial pass and recognize five different regions which are recognized as exposed skin; the left arm and hand are separated by a watch. Each region would be represented by a separate disjoint set.

  46. Application: Image Processing Next, a second algorithm may take sets which are close in proximity and attempt to determine if they are from the same person. In this case, the algorithm may take the union of: the red and yellow regions, and the dark and light blue regions.

  47. Application: Image Processing Finally, a third algorithm may take more distant sets and, depending on skin tone and other properties, may determine that they come from the same individual. In this example, the third pass may, if successful, take the union of the red, blue, and green regions.
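The three passes can be mimicked with the disjoint set defined earlier. In the sketch below the five regions are simply numbered through a hypothetical enumeration, and the two merging functions hard-code the unions described above; a real system would decide them from the image data.

```cpp
#include <cassert>
#include <vector>

// Hypothetical labels for the five regions found by the first pass
enum Region { RED, YELLOW, DARK_BLUE, LIGHT_BLUE, GREEN, REGION_COUNT };

class Disjoint_set {
    public:
        explicit Disjoint_set( int n ) : parent( n ) {
            for ( int i = 0; i < n; ++i ) {
                parent[i] = i;
            }
        }

        int find( int i ) const {
            assert( 0 <= i && i < static_cast<int>( parent.size() ) );

            while ( parent[i] != i ) {
                i = parent[i];
            }

            return i;
        }

        void set_union( int i, int j ) {
            i = find( i );
            j = find( j );

            if ( i != j ) {
                parent[j] = i;
            }
        }

    private:
        std::vector<int> parent;
};

// Second pass: join regions which are close in proximity
void merge_nearby( Disjoint_set &regions ) {
    regions.set_union( RED, YELLOW );
    regions.set_union( DARK_BLUE, LIGHT_BLUE );
}

// Third pass: join more distant regions judged to be the same person
void merge_distant( Disjoint_set &regions ) {
    regions.set_union( RED, DARK_BLUE );
    regions.set_union( RED, GREEN );
}
```

Starting from Disjoint_set regions( REGION_COUNT ), the second pass leaves three sets and the third pass leaves one.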

  48. Application: Maze Generation Another fun application is in the generation of mazes. Impress your (non-engineering) friends. They’ll never guess how easy this is...

  49. Application: Maze Generation Here we have a maze which spans a 500 × 500 grid of squares where: there is one unique solution, and each point can be reached by one unique path from the start. Ref: Lance Hampton http://littlebadwolf.com/mazes/

  50. Application: Maze Generation Zooming in on the maze, you will note that it is rather complex and seemingly random. Ref: Lance Hampton http://littlebadwolf.com/mazes/
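The slides up to this point do not give the algorithm, so the following is a sketch of one common approach, assumed here: treat each square as its own set, visit the interior walls in random order, and remove a wall only if the two squares it separates are in different sets. The function generate_maze and its cell numbering (row*cols + column) are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <numeric>
#include <random>
#include <utility>
#include <vector>

// Returns the list of removed walls, each given as the pair of
// adjacent cells it separated.
std::vector<std::pair<int, int>> generate_maze( int rows, int cols, unsigned seed ) {
    assert( rows > 0 && cols > 0 );

    // Each cell starts in its own disjoint set
    std::vector<int> parent( rows*cols );
    std::iota( parent.begin(), parent.end(), 0 );

    auto find = [&parent]( int i ) {
        while ( parent[i] != i ) {
            i = parent[i];
        }

        return i;
    };

    // List every interior wall: one to the right of and one below each cell
    std::vector<std::pair<int, int>> walls;

    for ( int r = 0; r < rows; ++r ) {
        for ( int c = 0; c < cols; ++c ) {
            int const cell = r*cols + c;

            if ( c + 1 < cols ) {
                walls.emplace_back( cell, cell + 1 );
            }

            if ( r + 1 < rows ) {
                walls.emplace_back( cell, cell + cols );
            }
        }
    }

    std::mt19937 gen( seed );
    std::shuffle( walls.begin(), walls.end(), gen );

    // Remove a wall only if it joins two cells not yet connected
    std::vector<std::pair<int, int>> removed;

    for ( auto const &wall : walls ) {
        int const a = find( wall.first );
        int const b = find( wall.second );

        if ( a != b ) {
            parent[b] = a;
            removed.push_back( wall );
        }
    }

    return removed;
}
```

Exactly rows × cols – 1 walls are removed, so the passages form a spanning tree of the grid, which is why each square is reachable by one unique path. For a grid as large as 500 × 500, the height optimization described earlier would be worth adding to this sketch.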
