
Disjoint Set Structures for Operations over Sets

This recitation discusses the implementation of disjoint set structures for set operations such as finding which set an object belongs to and merging two sets. It includes performance analysis and different approaches to improve efficiency.


Presentation Transcript


  1. Disjoint Set Structures for Operations over Sets (Reference: textbook, pp. 175-180) CS2223 Recitation 3, March 30, 2005, Song Wang

  2. Problem Description • Given: • A set S with N objects, identified by the numbers 1 to N. • Disjoint partitions (subsets) of S: every item belongs to exactly one partition, and no item belongs to more than one. • What to do: • Find: given an object, find which set contains it. • Merge: given two sets, combine them into one set. • Why: • These are basic, frequently used building blocks for set operations such as union and intersection. • Consequently, they matter for many other algorithms, such as finding a minimum spanning tree. (A sketch of what the two operations mean follows below.)
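
The following is a minimal, illustration-only sketch (in Python, which is an assumption; the slides themselves use pseudocode) of what Find and Merge mean, using an explicit list of sets. The example partition Set 1 = {1, 7, 8, 9}, Set 2 = {2, 5}, Set 3 = {3, 4, 6} is the one implied by the arrays on slides 3-6; the remaining slides replace this naive representation with an array/tree representation.

    partitions = [{1, 7, 8, 9}, {2, 5}, {3, 4, 6}]

    def find(x):
        """Return the partition (as a Python set) that contains item x."""
        for p in partitions:
            if x in p:
                return p
        raise ValueError(f"{x} is not in any partition")

    def merge(a, b):
        """Replace the two partitions containing items a and b by their union."""
        pa, pb = find(a), find(b)
        if pa is not pb:                 # already in the same partition: nothing to do
            partitions.remove(pa)
            partitions.remove(pb)
            partitions.append(pa | pb)

    print(find(7))     # the set containing 7, i.e. {1, 7, 8, 9}
    merge(7, 2)        # unite {1, 7, 8, 9} with {2, 5}
    print(find(2))     # now {1, 2, 5, 7, 8, 9}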

  3. Preliminaries • Data structure for a set: a tree. • Each set is represented by its root (parent) node; one natural choice is to use the smallest object in the set as the root. [Figure: items 1-9 drawn as three trees, Set 1, Set 2, and Set 3.]

  4. Preliminaries II • A "degraded" linked list: a single array, indexed 1 to 9, that records only the parent (set label) of each item. [Figure: the array before and after this adaptation.] (A concrete instance of the array is sketched below.)
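
As a concrete illustration (a sketch only; storing the items 1-based with an unused slot 0 is my assumption for readability), the partition Set 1 = {1, 7, 8, 9}, Set 2 = {2, 5}, Set 3 = {3, 4, 6} becomes a single array:

    # set_[i] holds the label of the set containing item i (index 0 unused).
    #   item:     1  2  3  4  5  6  7  8  9
    set_ = [0,    1, 2, 3, 3, 2, 3, 1, 1, 1]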

  5. Solution 1: find1() • The array stores, for each item, the label of its set: Index: 1 2 3 4 5 6 7 8 9 / Array: 1 2 3 3 2 3 1 1 1 • Examples: find1(7) = 1 (item 7 belongs to set 1); find1(2) = 2 (item 2 belongs to set 2).
     Function find1(x)
         return set[x]
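
A runnable sketch of find1 in Python (an illustration under the same assumptions as above, not the textbook's code): since the set label is stored directly, a find is a single array access.

    def find1(set_, x):
        return set_[x]              # the label of x's set is stored at index x

    set_ = [0, 1, 2, 3, 3, 2, 3, 1, 1, 1]   # index 0 unused
    print(find1(set_, 7))   # 1: item 7 belongs to set 1
    print(find1(set_, 2))   # 2: item 2 belongs to set 2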

  6. Solution 1: merge1() • Merge set 1 and set 2 by scanning the whole array and relabeling every item of the larger-numbered set: Before: 1 2 3 3 2 3 1 1 1 (Index: 1 2 3 4 5 6 7 8 9) / After: 1 1 3 3 1 3 1 1 1.
     Procedure merge1(a, b)
         i <- min(a, b)
         j <- max(a, b)
         for k <- 1 to N do
             if set[k] = j then set[k] <- i
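
A runnable sketch of merge1 in Python (same illustrative conventions as above): every occurrence of the larger label is rewritten, which is why a merge costs a full scan.

    def merge1(set_, a, b):
        i, j = min(a, b), max(a, b)
        for k in range(1, len(set_)):   # scan items 1..N
            if set_[k] == j:            # member of the larger-numbered set:
                set_[k] = i             # relabel it with the smaller label

    set_ = [0, 1, 2, 3, 3, 2, 3, 1, 1, 1]
    merge1(set_, 1, 2)
    print(set_[1:])    # [1, 1, 3, 3, 1, 3, 1, 1, 1]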

  7. Performance Analysis of find1() and merge1() • Case study: a sequence of n finds and at most N-1 merges (n comparable to N). • find1 takes constant time: Θ(1). • merge1 takes linear time: Θ(N). • Total: n·Θ(1) + (N-1)·Θ(N) = Θ(N²), i.e. Θ(n²) since n is comparable to N.

  8. Can We Do Better? • Merge set 1 and set 2 by linking one tree to the other, instead of relabeling every element. [Figure: the trees for Set 1, Set 2, and Set 3, before and after merging Set 1 and Set 2.]

  9. Solution 2: merge2() • Merge set 1 and set 2 with a single pointer change: Before: 1 2 3 3 2 3 1 1 1 (Index: 1 2 3 4 5 6 7 8 9) / After: 1 1 3 3 2 3 1 1 1 (item 5 now reaches root 1 through 2).
     Procedure merge2(a, b)
         if a < b then set[b] <- a
         else set[a] <- b
     This guarantees that the root of the merged tree is the smallest label.
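
A runnable sketch of merge2 in Python (illustration only; a and b are assumed to be the roots of the two trees being merged):

    def merge2(set_, a, b):
        # Attach the tree with the larger root label under the smaller root.
        if a < b:
            set_[b] = a
        else:
            set_[a] = b

    set_ = [0, 1, 2, 3, 3, 2, 3, 1, 1, 1]
    merge2(set_, 1, 2)
    print(set_[1:])    # [1, 1, 3, 3, 2, 3, 1, 1, 1]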

  10. Solution 2: find2() • Example: find2(5) = 1, but we must traverse the whole path from node 5 up to the root node 1 (5 -> 2 -> 1). [Figure: the merged tree for Set 1.]
     Function find2(x)
         r <- x
         while set[r] != r do r <- set[r]
         return r
     (Only for a root does r = set[r] hold.)
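
A runnable sketch of find2 in Python (same conventions as the previous sketches): follow parent pointers until reaching a node that is its own parent.

    def find2(set_, x):
        r = x
        while set_[r] != r:     # only a root satisfies set_[r] == r
            r = set_[r]
        return r

    set_ = [0, 1, 1, 3, 3, 2, 3, 1, 1, 1]   # the state after merge2(set_, 1, 2)
    print(find2(set_, 5))   # 1: the path followed is 5 -> 2 -> 1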

  11. Performance Analysis of find2() and merge2() • Case study: a sequence of n finds and at most N-1 merges (n comparable to N). • find2 takes linear time in the worst case: Θ(N). • merge2 takes constant time: Θ(1). • Total: n·Θ(N) + (N-1)·Θ(1) = Θ(N²), i.e. Θ(n²). • No improvement!

  12. What Is the Problem? • The worst case is a linear tree: the sequence merge2(5,6), merge2(4,5), ..., merge2(1,2) produces the chain 6 -> 5 -> 4 -> 3 -> 2 -> 1, so find2(6) must follow N-1 parent links. • The height of the tree is what determines performance. [Figure: the chain of trees produced by this merge sequence.] (The sketch below reproduces this worst case.)
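
A small self-contained Python sketch (merge2 is repeated here so the snippet runs on its own) that builds this worst-case chain and counts the links a find2(6) would follow:

    def merge2(set_, a, b):
        if a < b:
            set_[b] = a
        else:
            set_[a] = b

    N = 6
    set_ = list(range(N + 1))        # every item starts as its own root
    for k in range(N - 1, 0, -1):    # merge2(5,6), merge2(4,5), ..., merge2(1,2)
        merge2(set_, k, k + 1)

    r, links = 6, 0                  # walk the chain as find2(6) would
    while set_[r] != r:
        r, links = set_[r], links + 1
    print(r, links)                  # 1 5: the root, reached after N-1 = 5 links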

  13. How to Avoid a Bad Merge Tree • Merge(1,4): the height of the result depends on which tree becomes a subtree of the other. [Figure: two ways of merging the trees rooted at 1 and 4.]

  14. Who’s whose subtree? • Tree t1 has height h1 and tree t2 has height h2. • If h1 < h2: t1 becomes a subtree of t2 and the merged tree's height is h2. • If h1 > h2: t2 becomes a subtree of t1 and the merged tree's height is h1. • If h1 == h2: either tree becomes a subtree of the other and the merged tree's height is h1 + 1. • Note: the root of the tree is no longer always the smallest node!

  15. Theorem 5.9.1, p. 177 • A tree containing k nodes, built by the merge rule above, has height at most ⌊lg k⌋. • Proof by induction (sketched below).
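
A brief sketch of the induction argument (my paraphrase of the standard proof, not necessarily the textbook's wording): for k = 1 the height is 0 = ⌊lg 1⌋. For k > 1, consider the last merge that formed the tree, joining a tree of a nodes and height h_a with a tree of b nodes and height h_b, where a + b = k and, by induction, h_a ≤ ⌊lg a⌋ and h_b ≤ ⌊lg b⌋. If the heights differ, the merged height is max(h_a, h_b) ≤ ⌊lg max(a, b)⌋ ≤ ⌊lg k⌋. If the heights are equal, the merged height is h_a + 1 ≤ ⌊lg min(a, b)⌋ + 1 = ⌊lg(2·min(a, b))⌋ ≤ ⌊lg(a + b)⌋ = ⌊lg k⌋.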

  16. Solution 3: merge3() (merge by height)
     Procedure merge3(a, b)
         if height[a] = height[b] then
             height[a] <- height[a] + 1
             set[b] <- a
         else if height[a] > height[b] then
             set[b] <- a
         else
             set[a] <- b
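
A runnable sketch of merge3 in Python (illustration only; it assumes a and b are roots and that a separate height array, initialized to 0 for singleton sets, is maintained alongside set_):

    def merge3(set_, height, a, b):
        # Union by height: the shorter tree goes under the taller one, so the
        # height grows only when the two trees have equal height.
        if height[a] == height[b]:
            height[a] += 1
            set_[b] = a
        elif height[a] > height[b]:
            set_[b] = a
        else:
            set_[a] = b

    N = 9
    set_ = list(range(N + 1))        # every item starts as its own root ...
    height = [0] * (N + 1)           # ... in a tree of height 0
    merge3(set_, height, 1, 2)       # equal heights: 2 goes under 1, height[1] becomes 1
    merge3(set_, height, 1, 3)       # 1 is taller: 3 goes under 1, height unchanged
    print(set_[1:4], height[1])      # [1, 1, 1] 1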

  17. Performance Analysis of find2() and merge3() • Case study: a sequence of n finds and at most N-1 merges (n comparable to N). • find2 now takes sub-linear time: Θ(log N) in the worst case, by Theorem 5.9.1. • merge3 takes constant time: Θ(1). • Total: n·Θ(log N) + (N-1)·Θ(1) = Θ(n log n). • Some improvement.

  18. Path Compression in find3() • Intuitive explanation: the more fan-out each node has, the smaller the height of the tree. • Example: find3(20) reattaches every node on the path from 20 to the root directly to the root. [Figure: the tree before and after find3(20).]

  19. Solution 3: find3() (find with path compression)
     Function find3(x)
         r <- x
         while set[r] != r do r <- set[r]
         i <- x
         while i != r do
             j <- set[i]
             set[i] <- r
             i <- j
         return r
     (The first loop traverses the path to find the root; the second loop traverses
     the path again and connects every node on it directly to the root.)
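
A runnable Python sketch of find3 (same array conventions as before). Note that compressing paths does not update the stored heights; after compression they are only upper bounds on the true heights, which is still sufficient for merge3's decisions.

    def find3(set_, x):
        r = x
        while set_[r] != r:     # first traversal: locate the root
            r = set_[r]
        i = x
        while i != r:           # second traversal: point every node on the
            j = set_[i]         # path directly at the root
            set_[i] = r
            i = j
        return r

    # The worst-case chain from slide 12: 6 -> 5 -> 4 -> 3 -> 2 -> 1.
    set_ = [0, 1, 1, 2, 3, 4, 5]
    print(find3(set_, 6))   # 1
    print(set_[1:])         # [1, 1, 1, 1, 1, 1]: the path is now flattened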

  20. Performance Analysis of find3() and merge3() • Case study: a sequence of n finds and at most N-1 merges (n comparable to N). • find3 takes only slightly more than constant time per call, on average over the sequence. • merge3 takes constant time: Θ(1). • Total: close to Θ(n). • The best combination!

  21. Summary • find1() and merge1(): best for find, worst for merge (height = 1, always). • find2() and merge2(): best for merge, worst for find (height = N in the worst case). • find2() and merge3(): a mix of the above (height = lg N in the worst case). • find3() and merge3(): best for both (height stays close to 1).
