
A Lock-free Multi-threaded Algorithm for the Max-flow Problem




  1. A Lock-free Multi-threaded Algorithm for the Max-flow Problem
     Bo Hong, Electrical and Computer Engineering Department, Drexel University
     bohong@coe.drexel.edu
     http://www.ece.drexel.edu/faculty/bohong

  2. The Max-flow Problem
     [Figure: example network with vertices s, a, b, c, d, t and edge capacities 4, 6, 5, 8, 4, 7, 6, 4, 3]
     • Find:
        • the maximum flow from s to t
     • Subject to:
        • edge capacity constraints
        • zero net flow at every u ∈ V - {s,t}
     Bo Hong
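A minimal Python sketch of the two constraints on this slide, assuming a flow is stored as a dictionary mapping each edge (u, v) to f(u, v). The network `CAP` and the flow `FLOW` are hypothetical four-vertex examples, not the network in the figure (its edge layout is not recoverable from this transcript), and the function names are illustrative.

```python
# Hypothetical network (not the one in the slide's figure): capacities c(u, v).
CAP = {("s", "a"): 4, ("s", "b"): 2, ("a", "b"): 1, ("a", "t"): 3, ("b", "t"): 3}

# A candidate flow f(u, v) on the same edges.
FLOW = {("s", "a"): 4, ("s", "b"): 2, ("a", "b"): 1, ("a", "t"): 3, ("b", "t"): 3}


def is_feasible(cap, flow, s="s", t="t"):
    """Check the capacity and conservation constraints of the max-flow problem."""
    # Capacity constraint: 0 <= f(u, v) <= c(u, v) on every edge.
    for edge, f in flow.items():
        if not 0 <= f <= cap.get(edge, 0):
            return False
    # Conservation: zero net flow at every u in V - {s, t}.
    vertices = {x for edge in cap for x in edge}
    for u in vertices - {s, t}:
        inflow = sum(f for (_, v), f in flow.items() if v == u)
        outflow = sum(f for (w, _), f in flow.items() if w == u)
        if inflow != outflow:
            return False
    return True


def flow_value(flow, s="s"):
    """Value of the flow: net flow out of the source s."""
    out_s = sum(f for (w, _), f in flow.items() if w == s)
    in_s = sum(f for (_, v), f in flow.items() if v == s)
    return out_s - in_s
```

Here `is_feasible(CAP, FLOW)` holds and `flow_value(FLOW)` is 6, which is the maximum for this network because the two edges into t have total capacity 6.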

  3. Existing algorithms
     • Sequential algorithms
        • Augmenting path
           • Ford-Fulkerson, pseudo-polynomial
           • Edmonds and Karp, O(|V|∙|E|²)
           • Dinitz, O(|V|²∙|E|)
        • Preflow push
           • Karzanov, O(|V|³)
        • Push-relabel
           • Goldberg, O(|V|²∙|E|); with dynamic trees, O(|V|∙|E|∙log(|V|²/|E|))
     • Parallel algorithms
        • Shiloach et al., O(|V|²∙log|V|) with a |V|-processor PRAM
        • Goldberg, O(|V|²∙log|V|) with a |V|-processor PRAM
        • Anderson et al., global relabeling
        • Bader et al., gap relabeling

  4. Two vertex properties
     [Figure: example network with vertices s, a, b, c, d, t]
     • Excess flow: the net flow into a vertex, e.g. e(c) = 5
     • Height: every vertex has an integer-valued height, e.g. h(c) = 2
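As a small illustration, assume a preflow is stored as a dictionary from edges to flow values; e(u) is then total inflow minus total outflow, while a height is just one integer per vertex. The preflow below and the helper names are hypothetical.

```python
def excess(flow, u):
    """e(u): net flow into u (total inflow minus total outflow)."""
    inflow = sum(f for (_, v), f in flow.items() if v == u)
    outflow = sum(f for (w, _), f in flow.items() if w == u)
    return inflow - outflow


def active_vertices(flow, vertices, s="s", t="t"):
    """Vertices other than s and t that still hold positive excess."""
    return sorted(u for u in vertices if u not in (s, t) and excess(flow, u) > 0)


# Hypothetical preflow: s sent 4 units to a and 2 to b; a forwarded 1 unit to b.
PREFLOW = {("s", "a"): 4, ("s", "b"): 2, ("a", "b"): 1}
```

With this preflow, e(a) = 4 - 1 = 3, e(b) = 2 + 1 = 3, and e(s) = -6, so both a and b are still active.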

  5. Existing parallel algorithms
     [Figure: example network with vertices s, a, b, c, d, t]
     • Push:
        • applicable when e(a) > 0 and there exists v with cf(a,v) > 0 and h(v) = h(a) - 1
        • Actions:
           • Lock a and v
           • Check that a->v is still pushable
           • d = min(e(a), cf(a,v))
           • e(a) = e(a) - d
           • e(v) = e(v) + d
           • cf(a,v) = cf(a,v) - d
           • cf(v,a) = cf(v,a) + d
           • Unlock a and v
     • Lift:
        • applicable when e(c) > 0 and cf(c,x) > 0 implies h(x) ≥ h(c) for every x
        • Actions:
           • Lock c
           • v = the lowest such vertex x
           • h(c) = h(v) + 1
           • Unlock c
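A minimal Python sketch of lock-based push and lift in the style of this slide, assuming residual capacities in an adjacency matrix and one `threading.Lock` per vertex. The class name `LockedGraph` is hypothetical. Acquiring the two push locks in index order is a standard deadlock-avoidance choice added here, and `lift` locks only the vertex being lifted, a simplification.

```python
import threading


class LockedGraph:
    """Lock-based push and lift (hypothetical class, adjacency-matrix storage)."""

    def __init__(self, cap):
        self.n = len(cap)
        self.cf = [row[:] for row in cap]          # residual capacities cf(u, v)
        self.e = [0] * self.n                       # excess e(u)
        self.h = [0] * self.n                       # height h(u)
        self.locks = [threading.Lock() for _ in range(self.n)]

    def push(self, a, v):
        """Push from a to v under both vertex locks; returns the amount pushed."""
        if a == v:
            return 0
        first, second = sorted((a, v))              # fixed lock order avoids deadlock
        with self.locks[first], self.locks[second]:
            # Re-check under the locks: another thread may have changed e, cf, or h.
            if self.e[a] > 0 and self.cf[a][v] > 0 and self.h[v] == self.h[a] - 1:
                d = min(self.e[a], self.cf[a][v])
                self.e[a] -= d
                self.e[v] += d
                self.cf[a][v] -= d
                self.cf[v][a] += d
                return d
        return 0

    def lift(self, c):
        """Lift c under its own lock if every residual neighbor is at least as high."""
        with self.locks[c]:
            nbrs = [x for x in range(self.n) if self.cf[c][x] > 0]
            if self.e[c] > 0 and nbrs and all(self.h[x] >= self.h[c] for x in nbrs):
                self.h[c] = min(self.h[x] for x in nbrs) + 1
                return True
        return False
```

Every push pays for two lock acquisitions even when no other thread touches a or v, which is the overhead the next slide measures.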

  6. Impact of locking
     • Locks protect shared accesses, but locks are expensive
     • P1: Lock; x ← x+1; Unlock
     • P2: Lock; x ← x+1; Unlock
     [Figure: timelines of P1 and P2 each performing "read x, increase 1, update x", actual vs. ideal]
     [Figure: lock acquisition time (µs) vs. number of processors]
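To see why the lock on this slide is needed at all, the Python sketch below replays the three steps (read x, increase 1, update x) of P1 and P2 in a chosen order, with each processor keeping the value it read in a private register. The schedules are hypothetical: serializing the steps, as a lock does, yields 2, while one unlocked interleaving loses an update and yields 1. That serialization is also where the lock's cost comes from.

```python
def run(schedule):
    """Replay (processor, step) pairs against a shared variable x, starting at 0."""
    x = 0
    register = {"P1": 0, "P2": 0}    # value each processor read from x
    for proc, step in schedule:
        if step == "read":
            register[proc] = x
        elif step == "increase":
            register[proc] += 1
        elif step == "update":
            x = register[proc]
    return x


STEPS = ("read", "increase", "update")
# With a lock, one processor's read-increase-update completes before the other's starts.
LOCKED = [("P1", step) for step in STEPS] + [("P2", step) for step in STEPS]
# Without a lock, the steps may interleave; here both processors read x = 0.
RACY = [("P1", "read"), ("P2", "read"), ("P1", "increase"),
        ("P2", "increase"), ("P1", "update"), ("P2", "update")]
```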

  7. New lock-free algorithm: model of the architecture
     • SMP computer with multiple processors sharing the memory
        • Multi-processor systems
        • Multi-core systems
     • Supports an atomic 'fetch-and-add' instruction
     • Supports sequential consistency
     • P1: x ← x+c1; …; x ← x+c2
     • P2: x ← x+c3; …; x ← x+c4
     • Eventual result: x ← x+c1+c2+c3+c4, no matter how exactly the instructions were interleaved
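The interleaving claim can be illustrated with a small Python simulation, assuming each `x ← x+c` is one atomic fetch-and-add. Python does not expose that instruction, so `fetch_and_add` below is a hypothetical model that is atomic only because the simulation is single-threaded; a seeded random scheduler picks which processor executes next.

```python
import random


def fetch_and_add(mem, key, delta):
    """Model of an atomic fetch-and-add: store old + delta and return the old value.

    In this single-threaded simulation every call is trivially atomic; on the
    target machine the hardware instruction provides that guarantee.
    """
    old = mem[key]
    mem[key] = old + delta
    return old


def interleave(programs, seed):
    """Run each processor's list of increments to x in a random interleaving."""
    rng = random.Random(seed)
    mem = {"x": 0}
    pc = [0] * len(programs)                  # next instruction of each processor
    while True:
        ready = [i for i, prog in enumerate(programs) if pc[i] < len(prog)]
        if not ready:
            return mem["x"]
        i = rng.choice(ready)                 # the scheduler picks any processor
        fetch_and_add(mem, "x", programs[i][pc[i]])
        pc[i] += 1


# Hypothetical values for c1, c2 (run by P1) and c3, c4 (run by P2).
P1_ADDS = [3, 5]
P2_ADDS = [-2, 7]
```

Every seed produces 13 = 3 + 5 - 2 + 7: the order of the atomic additions does not affect the eventual result.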

  8. New algorithm: two basic lock-free operations
     [Figure: example network with vertices s, a, b, c, d, t]
     • Push:
        • applicable when e(a) > 0 and there exists x with cf(a,x) > 0 and h(x) < h(a)
        • Actions:
           • v = the lowest such vertex x
           • d = min(e(a), cf(a,v))
           • e(a) = e(a) - d
           • e(v) = e(v) + d
           • cf(a,v) = cf(a,v) - d
           • cf(v,a) = cf(v,a) + d
     • Lift:
        • applicable when e(c) > 0 and cf(c,x) > 0 implies h(x) ≥ h(c) for every x
        • Actions:
           • v = the lowest such vertex x
           • h(c) = h(v) + 1
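A sequential Python sketch of the two lock-free operations, assuming residual capacities `cf` in an adjacency matrix and lists `e` and `h`. On the target machine each of the four updates in `push` would be a separate atomic fetch-and-add; this sketch only shows the logic of one operation in isolation. The function names are hypothetical.

```python
def lowest_residual_neighbor(cf, h, u):
    """Return the x with cf(u, x) > 0 and the smallest h(x), or None if none exists."""
    best = None
    for x, c in enumerate(cf[u]):
        if c > 0 and (best is None or h[x] < h[best]):
            best = x
    return best


def push(cf, e, h, a):
    """Push from a to its lowest residual neighbor v if e(a) > 0 and h(v) < h(a).

    Returns the amount pushed (0 if the operation is not applicable).
    """
    v = lowest_residual_neighbor(cf, h, a)
    if e[a] <= 0 or v is None or h[v] >= h[a]:
        return 0
    d = min(e[a], cf[a][v])
    e[a] -= d        # on the target machine, each of these four lines
    e[v] += d        # would be a separate atomic fetch-and-add
    cf[a][v] -= d
    cf[v][a] += d
    return d


def lift(cf, e, h, c):
    """Lift c to one above its lowest residual neighbor if e(c) > 0 and no
    residual neighbor is lower than c. Returns True if the lift was applied."""
    v = lowest_residual_neighbor(cf, h, c)
    if e[c] <= 0 or v is None or h[v] < h[c]:
        return False
    h[c] = h[v] + 1
    return True
```

Unlike the lock-based push of slide 5, which requires h(v) = h(a) - 1, this push targets the lowest residual neighbor whenever it is lower than a.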

  9. The algorithm
     [Figure: example network with vertices s, a, b, c, d, t]
     • Initialize h(u), e(u), and f(u,v):
        • h(s) = |V|
        • h(u) = 0 for u ∈ V - {s}
        • f(s,u) = c(s,u) and e(u) = c(s,u) for every edge (s,u)
        • f(u,v) = 0 otherwise
     • While there exist applicable push or lift operations, execute them asynchronously
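The initialization can be sketched in Python on an adjacency matrix `cap[u][v]`, storing residual capacities rather than f directly. Besides the slide's assignments, this sketch also records e(s) as the negative of the net flow out of s, an added convention that makes the termination test of slide 15 easy to express. The function name and the four-vertex example are hypothetical.

```python
def init_preflow(cap, s):
    """Initialization on an adjacency matrix cap[u][v].

    Returns residual capacities cf, excesses e, and heights h.
    """
    n = len(cap)
    cf = [row[:] for row in cap]      # f(u, v) = 0 everywhere, so cf = c
    e = [0] * n
    h = [0] * n                        # h(u) = 0 for u in V - {s}
    h[s] = n                           # h(s) = |V|
    for u in range(n):
        d = cap[s][u]
        if d > 0:                      # f(s, u) = c(s, u), e(u) = c(s, u)
            cf[s][u] -= d
            cf[u][s] += d
            e[u] += d
            e[s] -= d                  # added: e(s) = -(net flow out of s)
    return cf, e, h


# Hypothetical 4-vertex network: s = 0, a = 1, b = 2, t = 3.
EXAMPLE_CAP = [[0, 4, 2, 0],
               [0, 0, 1, 3],
               [0, 0, 0, 3],
               [0, 0, 0, 0]]
```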

  10. Asynchronous execution of the basic operations
     • P1 and P2 each execute the following loop concurrently:
        while e(u) > 0
            e' = e(u)
            h' = ∞
            for each (u,v) s.t. cf(u,v) > 0
                if h(v) < h'
                    h' = h(v)
                    v' = v
            if h(u) > h'
                d = min(e', cf(u,v'))
                cf(u,v') = cf(u,v') - d
                cf(v',u) = cf(v',u) + d
                e(u) = e(u) - d
                e(v') = e(v') + d
            else
                h(u) = h' + 1
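A Python sketch of this loop, under two loudly stated assumptions: each vertex is owned by one thread (vertex u belongs to thread u mod k, a hypothetical static assignment), and threads are interleaved round-robin at the granularity of whole loop iterations. That second point is a simplification; real threads interleave individual instructions, which is the situation slides 11 and 12 argue about. The stopping test is the termination condition of slide 15, and the six-vertex test network, whose maximum flow is 23, is a hypothetical choice. `discharge_once` and `run_threads` are illustrative names.

```python
def discharge_once(u, cf, e, h):
    """One pass through the body of the loop for vertex u."""
    if e[u] <= 0:
        return "idle"                      # more flow may still arrive later
    e_snap = e[u]                          # e' = e(u)
    h_min, v_min = float("inf"), None      # h' = infinity
    for v, c in enumerate(cf[u]):          # lowest v with cf(u, v) > 0
        if c > 0 and h[v] < h_min:
            h_min, v_min = h[v], v
    if h[u] > h_min:                       # push d units from u to v'
        d = min(e_snap, cf[u][v_min])
        cf[u][v_min] -= d
        cf[v_min][u] += d
        e[u] -= d
        e[v_min] += d
        return "push"
    h[u] = h_min + 1                       # lift
    return "lift"


def run_threads(cap, s, t, n_threads=2):
    """Round-robin simulation of n_threads threads; returns the flow into t."""
    n = len(cap)
    cf = [row[:] for row in cap]
    e, h = [0] * n, [0] * n
    h[s] = n                               # initialization from slide 9
    for u in range(n):
        d = cap[s][u]
        cf[s][u] -= d
        cf[u][s] += d
        e[u] += d
        e[s] -= d
    # Hypothetical static assignment: thread k owns vertices u with u % n_threads == k.
    owned = [[u for u in range(n) if u % n_threads == k and u not in (s, t)]
             for k in range(n_threads)]
    # Termination test: net flow into t equals net flow out of s.
    while e[t] != -e[s]:
        for k in range(n_threads):
            for u in owned[k]:
                discharge_once(u, cf, e, h)
    return e[t]


# Hypothetical test network: s = 0, t = 5, maximum flow 23.
TEST_NET = [[0, 16, 13, 0, 0, 0],
            [0, 0, 0, 12, 0, 0],
            [0, 4, 0, 0, 14, 0],
            [0, 0, 9, 0, 0, 20],
            [0, 0, 0, 7, 0, 4],
            [0, 0, 0, 0, 0, 0]]
```

Changing `n_threads` changes the order of operations but not the resulting flow value.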

  11. Seems rather chaotic? … Not really
     [Figure: P1 and P2 executing the loop from slide 10, shown on two alternative timelines]

  12. An invariant property of the algorithm
     • As long as cf(u,v) and e(u) are updated atomically, h(u) ≤ h(v) + 1 holds for every (u,v) with cf(u,v) > 0, no matter how the threads are interleaved.
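The invariant is easy to state as a check over the residual graph. The Python sketch below assumes residual capacities in an adjacency matrix; the state `CF_INIT`/`H_INIT` is the slide-9 initialization of a hypothetical four-vertex network (s = 0, a = 1, b = 2, t = 3). It satisfies the invariant because every edge out of s is saturated, so s has no residual out-edges, and all other heights are 0.

```python
def invariant_holds(cf, h):
    """True if h(u) <= h(v) + 1 for every residual edge (u, v), i.e. cf(u, v) > 0."""
    n = len(cf)
    return all(h[u] <= h[v] + 1
               for u in range(n) for v in range(n) if cf[u][v] > 0)


# Residual capacities and heights right after initialization (hypothetical network).
CF_INIT = [[0, 0, 0, 0],
           [4, 0, 1, 3],
           [2, 0, 0, 3],
           [0, 0, 0, 0]]
H_INIT = [4, 0, 0, 0]
```

Raising h(a) to 3 while the residual edge a->b still exists (h(b) = 0) would break the invariant; h(a) = 1 would not.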

  13. Optimality of the algorithm
     • If any e(u) > 0, the algorithm will not terminate
        • Follows from the definitions of the push and lift operations
     • If the algorithm terminates, there is no path from s to t in the residual graph
        • Proof by contradiction: such a path has at most |V| - 1 edges, so applying the invariant along it gives h(s) ≤ h(t) + |V| - 1. Since s and t are never lifted, h(s) = |V| and h(t) = 0, so the invariant on the height function h would have to be broken
     • If the algorithm terminates, it has found a maximum flow
        • Termination implies e(u) = 0 for all u ∈ V - {s,t}, so the flow is feasible
        • There is no path from s to t in the residual graph, so by the max-flow min-cut theorem the flow is maximum
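The argument on this slide suggests a simple certificate, sketched below in Python under the same adjacency-matrix assumption: search the residual graph from s; if t is unreachable, the reachable set S defines an s-t cut whose capacity equals the flow value, as the max-flow min-cut theorem states. `CAP4` and `CF_FINAL` are a hypothetical four-vertex network (s = 0, a = 1, b = 2, t = 3) and a final residual state carrying 6 units of flow; the function names are illustrative.

```python
from collections import deque


def residual_reachable(cf, s):
    """Vertices reachable from s along edges with cf(u, v) > 0 (breadth-first search)."""
    seen = {s}
    queue = deque([s])
    while queue:
        u = queue.popleft()
        for v, c in enumerate(cf[u]):
            if c > 0 and v not in seen:
                seen.add(v)
                queue.append(v)
    return seen


def certify_max_flow(cap, cf, s, t):
    """Return the capacity of the cut (S, V - S) if t is unreachable from s in the
    residual graph, or None if an augmenting path still exists."""
    S = residual_reachable(cf, s)
    if t in S:
        return None
    n = len(cap)
    return sum(cap[u][v] for u in S for v in range(n) if v not in S)


# Hypothetical network and its residual capacities after a maximum flow of 6.
CAP4 = [[0, 4, 2, 0],
        [0, 0, 1, 3],
        [0, 0, 0, 3],
        [0, 0, 0, 0]]
CF_FINAL = [[0, 0, 0, 0],
            [4, 0, 0, 0],
            [2, 1, 0, 0],
            [0, 3, 3, 0]]
```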

  14. Convergence of the algorithm (complexity bound)
     • For any u s.t. e(u) > 0, there exists a path from u to s in the residual graph
        • Property of network flow
     • The height of any vertex is at most 2|V| - 1
        • A simple path has at most |V| vertices, so by the invariant h(u) ≤ h(s) + |V| - 1 = 2|V| - 1
     • The total number of lift operations is bounded by 2|V|² - |V|
        • Follows from the bound on vertex heights
     • The total number of saturated pushes is bounded by (2|V| - 1)∙|E|
        • Follows from the bound on lift operations
     • The total number of unsaturated pushes is bounded by 4|V|²∙|E|
        • Follows from the bounds on lifts and saturated pushes
     • Therefore the algorithm terminates after O(|V|²∙|E|) operations
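The bounds can be checked on a concrete run. The Python sketch below runs a sequential push/lift loop (a single-threaded stand-in for the asynchronous algorithm, assuming an adjacency matrix), counts lifts and saturated and unsaturated pushes, and compares the counts with the bounds stated on this slide. The six-vertex test network, whose maximum flow is 23, and the function names are hypothetical.

```python
def push_relabel_counts(cap, s, t):
    """Sequential push/lift run that also counts each kind of operation."""
    n = len(cap)
    cf = [row[:] for row in cap]
    e, h = [0] * n, [0] * n
    h[s] = n
    for u in range(n):                     # saturate every edge out of s
        d = cap[s][u]
        cf[s][u] -= d
        cf[u][s] += d
        e[u] += d
        e[s] -= d
    counts = {"max_height": n, "lift": 0, "saturating": 0, "unsaturated": 0}
    active = [u for u in range(n) if u not in (s, t) and e[u] > 0]
    while active:
        u = active.pop()
        while e[u] > 0:
            # Lowest residual neighbor (one exists whenever e(u) > 0).
            v = min((x for x in range(n) if cf[u][x] > 0), key=lambda x: h[x])
            if h[v] < h[u]:                # push
                d = min(e[u], cf[u][v])
                counts["saturating" if d == cf[u][v] else "unsaturated"] += 1
                cf[u][v] -= d
                cf[v][u] += d
                e[u] -= d
                e[v] += d
                if v not in (s, t) and v not in active:
                    active.append(v)
            else:                          # lift
                h[u] = h[v] + 1
                counts["lift"] += 1
                counts["max_height"] = max(counts["max_height"], h[u])
    return e[t], counts


def slide_bounds(n, m):
    """Upper bounds stated on the slide for |V| = n and |E| = m."""
    return {"max_height": 2 * n - 1, "lift": 2 * n * n - n,
            "saturating": (2 * n - 1) * m, "unsaturated": 4 * n * n * m}


# Hypothetical test network: |V| = 6, |E| = 9, s = 0, t = 5.
TEST_NET = [[0, 16, 13, 0, 0, 0],
            [0, 0, 0, 12, 0, 0],
            [0, 4, 0, 0, 14, 0],
            [0, 0, 9, 0, 0, 20],
            [0, 0, 0, 7, 0, 4],
            [0, 0, 0, 0, 0, 0]]
```

For |V| = 6 and |E| = 9, the slide's bounds are a height of 11, 66 lifts, 99 saturated pushes, and 1296 unsaturated pushes; the observed counts on this network fall well below them.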

  15. Lock-free termination detection
     • The algorithm terminates when e(u) = 0 for all u ∈ V - {s,t}
     • A thread that sees e(u) = 0 at its own vertex cannot terminate yet: other threads may still push flow into u
     • An elegant solution:
        • The net flow out of source s decreases monotonically
        • The net flow into sink t increases monotonically
        • When the two values become equal, e(u) = 0 for all u ∈ V - {s,t}: every such e(u) is nonnegative and together they account for the difference between the two values, so equality is a necessary and sufficient termination condition.
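A minimal Python sketch of the two counters, assuming they start from the slide-9 initialization (net outflow of s is the sum of c(s,u), net inflow of t is c(s,t)) and that the net inflow of t never exceeds the net outflow of s. Python has no atomic add, so a lock stands in for fetch-and-add on the updates only; the class and method names are hypothetical.

```python
import threading


class FlowCounters:
    """Shared counters for the termination test (hypothetical class name).

    out_s: net flow out of s; starts at the sum of c(s, u) and only decreases.
    in_t:  net flow into t; starts at c(s, t) and only increases.
    """

    def __init__(self, out_s, in_t=0):
        self.out_s = out_s
        self.in_t = in_t
        # Stand-in for fetch-and-add: Python exposes no atomic add instruction.
        self._add_lock = threading.Lock()

    def returned_to_source(self, d):
        """Record a push of d units from some vertex back into s."""
        with self._add_lock:
            self.out_s -= d

    def reached_sink(self, d):
        """Record a push of d units into t."""
        with self._add_lock:
            self.in_t += d

    def done(self):
        """True once the net flow out of s equals the net flow into t.

        out_s is read before in_t: out_s only shrinks, in_t only grows, and
        in_t never exceeds out_s, so if the two reads match, the counters
        were equal at the moment in_t was read.
        """
        out_s = self.out_s
        in_t = self.in_t
        return out_s == in_t
```

No thread has to inspect every vertex's excess: one comparison of two monotone counters replaces a global scan.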

  16. Experimental results
     • Execution results on a 2-way SMP with 3.2 GHz Intel Xeon processors
     • 4-thread results were obtained with hyper-threading enabled
     [Figure: Scalability of the Lock-Free Algorithm]
     [Figure: Comparison Against Classical Lock-Based Algorithm]

  17. Summary and future work
     • Developed a lock-free multi-threaded algorithm for the max-flow problem
        • having the same complexity bound as existing parallel algorithms
        • eliminating lock usage, thereby improving thread-level parallelism
        • 20% improvement over existing lock-based parallel algorithms
     • Results indicate the effectiveness of algorithmic methods in reducing synchronization overheads
     • Future work
        • Load balancing across the threads: vertex-to-thread assignment, static, dynamic, or hybrid?
        • Optimize cache usage
        • Reduce the number of operations via global and gap relabeling
        • What if edge capacities are floating-point?
