
HEURISTIC & SPECIAL CASE ALGORITHMS FOR DISPERSION PROBLEMS - RAVI, ROSENKRANTZ, TAYI



  1. HEURISTIC & SPECIAL CASE ALGORITHMS FOR DISPERSION PROBLEMS - RAVI, ROSENKRANTZ, TAYI • ROB CHURCHILL • (THANKS TO BEHZAD)

  2. Problem: • given V = {v1, v2, …, vn}, find a subset of p nodes (2 ≤ p ≤ n) such that some distance function over the chosen nodes is maximized • My first reaction: this sounds like the Max k-Cover problem, except that instead of maximizing coverage we are maximizing distances

  3. Max-Min Facility Dispersion (MMFD) • Given a non-negative, symmetric distance function w(x, y), where x, y ∈ V • Find a subset P = {vi1, vi2, …, vip} of V with |P| = p such that f(P) = min_{x,y ∈ P} w(x, y) is maximized.

  4. Max-Avg Facility Dispersion (MAFD) • Given a non-negative, symmetric distance function w(x, y), where x, y ∈ V • Find a subset P = {vi1, vi2, …, vip} of V with |P| = p such that f(P) = [2 / (p(p−1))] · Σ_{x,y ∈ P} w(x, y) is maximized.

  5. MMFD & MAFD are NP-hard • Even when the distance function is a metric • Shown by a reduction from the NP-complete problem CLIQUE • CLIQUE asks whether a given graph G = (V, E) contains a clique of size ≥ J

  6. Reduction • w(x, y) = 1 if x and y are adjacent in G, 0 otherwise • Set p = J • For MAFD, the optimal value equals 1 iff G contains a clique of size J; if it is less than 1, no clique of size J exists • For MMFD, the optimal value is 1 if G contains a clique of size J and 0 otherwise
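To make the construction concrete, here is a minimal Python sketch of the reduction; the function names and the graph representation (vertex list plus edge list) are illustrative, not from the paper.

```python
# Sketch of the CLIQUE-to-dispersion reduction (illustrative names).
def clique_to_dispersion(vertices, edges, J):
    edge_set = {frozenset(e) for e in edges}
    def w(x, y):
        # Unit distance for edges of G, zero otherwise.
        return 1 if frozenset((x, y)) in edge_set else 0
    p = J
    return vertices, p, w

# A set P of p nodes has min (and average) pairwise distance 1 exactly
# when every pair in P is an edge of G, i.e. when P is a clique of size J.
```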

  7. How do we solve these? • If we can’t get an optimal solution, we will settle for a good approximation • There are no absolute approximation algorithms for MMFD or MAFD unless P = NP • We want a relative approximation algorithm

  8. Use Greedy Algorithms “Greed is good.” - Gordon Gekko

  9. Max-Min Greedy Algorithm
  Step 1. Let vi and vj be the endpoints of an edge of maximum weight.
  Step 2. P ← {vi, vj}.
  Step 3. while (|P| < p) do
  begin
    a. Find a node v ∈ V \ P such that min_{v′ ∈ P} w(v, v′) is maximum among the nodes in V \ P.
    b. P ← P ∪ {v}
  end
  Step 4. Output P.
  • Provides a 2-approximation to the optimal value when w satisfies the triangle inequality
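A minimal Python sketch of the greedy loop above, assuming the distance function w is given as a symmetric Python callable; the quadratic scan for a maximum-weight edge keeps the sketch short rather than fast.

```python
from itertools import combinations

def greedy_max_min(V, p, w):
    # Steps 1-2: seed with the endpoints of a maximum-weight edge.
    vi, vj = max(combinations(V, 2), key=lambda e: w(*e))
    P = [vi, vj]
    # Step 3: repeatedly add the node whose distance to its nearest
    # already-chosen node is largest.
    while len(P) < p:
        v = max((u for u in V if u not in P),
                key=lambda u: min(w(u, c) for c in P))
        P.append(v)
    return P
```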

  10. Max-Avg Greedy Algorithm
  Step 1. Let vi and vj be the endpoints of an edge of maximum weight.
  Step 2. P ← {vi, vj}.
  Step 3. while (|P| < p) do
  begin
    a. Find a node v ∈ V \ P such that Σ_{v′ ∈ P} w(v, v′) is maximum among the nodes in V \ P.
    b. P ← P ∪ {v}
  end
  Step 4. Output P.
  • Provides a 4-approximation of the optimal value when w satisfies the triangle inequality
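The same skeleton works here; only the selection rule changes, from the minimum distance to the nearest chosen node to the total (equivalently, average) distance to the chosen set.

```python
from itertools import combinations

def greedy_max_avg(V, p, w):
    vi, vj = max(combinations(V, 2), key=lambda e: w(*e))
    P = [vi, vj]
    while len(P) < p:
        # Pick the node maximizing its total distance to the chosen set.
        v = max((u for u in V if u not in P),
                key=lambda u: sum(w(u, c) for c in P))
        P.append(v)
    return P
```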

  11. Special Cases • For one-dimensional data points, MMFD & MAFD can be solved optimally in polynomial time • For two-dimensional data points, MAFD can be approximated more tightly than by the greedy algorithm (π/2 instead of 4) in polynomial time • 2-D MMFD is NP-hard; the complexity of exact 2-D MAFD is open

  12. 1-D MAFD & MMFD • Restricting the points to 1-D allows for a dynamic programming optimal solution in polynomial time • O(max{n log n, pn}) • V = {x1, x2, …, xn}

  13. How it works • Sort the points in V (n log n time) • w(x, y) = distance from x to y • OPT(j, k) = the solution value with k points picked from x1, …, xj • OPT(n, p) = optimal solution for the whole set

  14. Recursive Statement • OPT(j, k) = max{ OPT(j−1, k), value of OPT(j−1, k−1) extended with xj } • i.e., either xj is not in the solution, or it is, and the best choice of the other k−1 points comes from x1, …, xj−1

  15. Runtime MAFD • OPT(j−1, k) and OPT(j−1, k−1) are constant-time lookups • Store the mean (the "representative") of the points in OPT(j−1, k−1) in μ(j−1, k−1) • Since the points are sorted, xj lies to the right of every point in OPT(j−1, k−1), so the sum of its distances to them is (k−1) · w(xj, μ(j−1, k−1)) • Extending OPT(j−1, k−1) with xj is therefore constant time; the new average distance is [OPT(j−1, k−1) · (k−1)(k−2)/2 + (k−1) · w(xj, μ(j−1, k−1))] / [k(k−1)/2]

  16. Runtime MMFD • Store the most recently picked (rightmost) element of the optimal solution in f(j−1, k−1) • Since the points are sorted, the chosen point nearest to xj is f(j−1, k−1), which gives a constant-time computation of extending OPT(j−1, k−1) with xj: min{ OPT(j−1, k−1), w(xj, f(j−1, k−1)) }

  17. Runtime • Both are O(n log n + pn), since the computation per table entry is constant once the right information is stored

  18. The Dynamic Programming Algorithm (*-- In the following, array F represents the function f in the formulation. --*)
  Step 1. Sort the given points, and let {x1, x2, …, xn} denote the points in increasing order.
  Step 2. for j := 1 to n do F[0, j] ← 0;
  Step 3. F[1, 1] ← 0.
  Step 4. (*-- Compute the value of an optimal placement --*)
  for j := 2 to n do
    for k := 1 to min(p, j) do
    begin
      t1 ← F[k, j−1] + k(p−k)(xj − xj−1);
      t2 ← F[k−1, j−1] + (k−1)(p−k+1)(xj − xj−1);
      if t1 > t2 then (*-- do not include xj --*)
        F[k, j] ← t1;
      else (*-- include xj --*)
        F[k, j] ← t2;
    end;

  19. The Algorithm cont.
  Step 5. (*-- Construct an optimal placement --*)
  P ← {x1}; k ← p; j ← n;
  while k > 1 do
  begin
    if F[k, j] = F[k−1, j−1] + (k−1)(p−k+1)(xj − xj−1) then
      (*-- xj to be included in optimal placement --*)
      begin
        P ← P ∪ {xj}; k ← k − 1;
      end;
    j ← j − 1;
  end;
  Step 6. Output P.
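Below is a hedged Python rendering of Steps 1-6. It follows the gap-counting recurrence literally: F[k][j] accumulates, for each gap xj − xj−1, the number of chosen pairs that straddle it, so F[p][n] is the maximum total pairwise distance (divide by p(p−1)/2 for the average). Seeding unreachable states with −inf is an implementation detail the slides leave implicit.

```python
import math

def mafd_1d(points, p):
    """1-D MAFD via the gap-counting DP (Steps 1-6 above).
    F[k][j]: with k of the first j sorted points chosen (and p chosen
    overall), the max total pairwise distance accumulated so far."""
    xs = sorted(points)                      # Step 1
    n = len(xs)
    F = [[-math.inf] * (n + 1) for _ in range(p + 1)]
    for j in range(1, n + 1):                # Step 2
        F[0][j] = 0.0
    F[1][1] = 0.0                            # Step 3
    for j in range(2, n + 1):                # Step 4
        gap = xs[j - 1] - xs[j - 2]
        for k in range(1, min(p, j) + 1):
            # A pair straddling this gap has one endpoint among the k
            # (or k-1) chosen so far and one among the rest of the p.
            t1 = F[k][j - 1] + k * (p - k) * gap                # skip xj
            t2 = F[k - 1][j - 1] + (k - 1) * (p - k + 1) * gap  # take xj
            F[k][j] = max(t1, t2)
    P, k, j = [xs[0]], p, n                  # Step 5: trace back
    while k > 1:
        gap = xs[j - 1] - xs[j - 2]
        if F[k][j] == F[k - 1][j - 1] + (k - 1) * (p - k + 1) * gap:
            P.append(xs[j - 1])
            k -= 1
        j -= 1
    return sorted(P), F[p][n]                # Step 6 (value is the sum)
```

For example, mafd_1d([0, 1, 2], 2) returns ([0, 2], 2.0): the extreme points, with total pairwise distance 2.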

  20. 2-D MAFD Heuristic • Uses the 1-D MAFD algorithm as its base • Gives a π/2-approximation

  21. How it works • Given V = {v1, v2, …, vn} • vi = (xi, yi) (its coordinates) • p ≤ n = |V|

  22. The Algorithm • Step 1. Obtain the projections of the given set V of points on each of the four axes defined by the equations • y = 0, y = x, x = 0, and y = -x • Step 2. Find optimal solutions to each of the four resulting instances of 1-D MAFD. • Step 3. Return the placement corresponding to the best of the four solutions found in Step 2.
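A sketch of the three steps in Python, reusing mafd_1d from the DP sketch above. Reading "best of the four solutions" as the placement with the largest true 2-D average pairwise distance is an assumption; comparing the four 1-D objective values directly would also be consistent with the slide.

```python
import math

def mafd_2d_heuristic(points, p):
    """points: list of (x, y) tuples. Projects onto the four axes
    y = 0, y = x, x = 0, y = -x, solves each 1-D instance optimally,
    and returns the best placement found."""
    s = 1.0 / math.sqrt(2.0)
    directions = [(1.0, 0.0), (s, s), (0.0, 1.0), (-s, s)]  # unit vectors

    def avg_pairwise(chosen):
        total = sum(math.dist(a, b)
                    for i, a in enumerate(chosen) for b in chosen[i + 1:])
        return 2.0 * total / (p * (p - 1))

    best, best_val = None, -1.0
    for dx, dy in directions:
        # Step 1: project onto this axis (sorted by projection value).
        ranked = sorted(points, key=lambda q: q[0] * dx + q[1] * dy)
        proj = [q[0] * dx + q[1] * dy for q in ranked]
        # Step 2: solve the 1-D MAFD instance optimally.
        chosen_coords, _ = mafd_1d(proj, p)
        # Map the chosen 1-D coordinates back to original points.
        chosen, used = [], set()
        for c in chosen_coords:
            i = next(i for i, v in enumerate(proj)
                     if v == c and i not in used)
            used.add(i)
            chosen.append(ranked[i])
        # Step 3: keep the best of the four placements.
        val = avg_pairwise(chosen)
        if val > best_val:
            best, best_val = chosen, val
    return best, best_val
```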

  23. Relation to Study Group Formation & High-Variance Clusters • These algorithms create one maximum-dispersion group, not k maximum-dispersion groups • If you want k high-variance clusters, set p = n/k and run the algorithm (whichever you choose) k−1 times (the last n/k points form the last cluster) • This can guarantee that the first few groups have high variance, but not the later ones

  24. Study group formation • Most study groups study only one subject • If you wanted to assign students one study group per subject, you could reduce their attributes to one dimension per subject and solve each subject optimally • Instead of the exact algorithm described, minimize the distance from the mean while staying on the opposite side of the mean from the teacher node • Perhaps use positive & negative distances to reflect which side of the mean a point is on • This would ensure that people who would learn (below the mean) are picked before people who would not • You want multiple study groups and the highest amount of learning • Not sure how to do this…

  25. References • S. S. Ravi, D. J. Rosenkrantz, and G. K. Tayi. Heuristic and special case algorithms for dispersion problems. Operations Research 42(2):299–310, 1994.
