
Bulk-Synchronous Parallel ML Implementation of the Parallel Superposition

Frédéric Gava. Bulk-Synchronous Parallel ML Implementation of the Parallel Superposition. Background. Implicit. Explicit. BSML. Automatic parallelization. skeletons. Data-parallelism. Parallel extensions. Concurrent programming. Parallel programming. Projects. 2002-2004


Presentation Transcript


  1. Frédéric Gava Bulk-Synchronous Parallel ML Implementation of the Parallel Superposition

  2. Background [Diagram: parallel programming approaches ordered from implicit to explicit: automatic parallelization, skeletons, data-parallelism, parallel extensions, concurrent programming; BSML is positioned on this scale]

  3. Projects • 2002-2004 • ACI Grid • LIFO, LACL, PPS, INRIA • Design of parallel and Grid libraries for OCaml. • 2004-2007 • ACI « Young researchers » • LIFO, LACL • Production of a programming environment in which certified parallel programs can be written and safely executed.

  4. Outline • The BSML language • Multi-programming (superposition) • Implementation of the superposition • Conclusion and future works

  5. The BSML language

  6. The BSML « spirit » • Bugs grow faster than Moore’s law. (G. Berry) • High-level language → fewer lines of code → fewer bugs • Certified library → fewer bugs • Small is beautiful. (R. H. Bisseling) • BSML only uses 5 primitives… • Who would drive a non-deterministic car? (G. Berry) • Confluence property of the BSML semantics • French proverb: « All roads lead to Rome », but it is better to choose the shortest one • One can give BSP costs to BSML programs • Differences from concurrent programming: cost and confluence

  7. The BSP model [Diagram: processor/memory pairs (P/M) connected by a network, with a unit of synchronization] BSP architecture, characterized by: • p: number of processors • r: processor speed • L: cost of a global synchronization • g: cost of a communication phase in which each processor sends or receives at most 1 word

  8. Model of execution [Diagram: timeline of super-steps i and i+1, each made of local work w_i, communication g·h_i and synchronization L] • Beginning of super-step i • Local computing on each processor • Global (collective) communications between processors • Global synchronization: exchanged data are available for the next super-step • $\mathrm{Cost}(i) = \left(\max_{0 \le x < p} w_x^i\right) + h_i \times g + L$
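The cost formula of a super-step can be checked on small numbers. The following is a minimal OCaml sketch, not part of the BSML library: the record type `bsp` and the functions `superstep_cost` and `program_cost` are names invented here for illustration, and the field `l` stands for the slide's L because OCaml record fields must be lowercase.

```ocaml
(* BSP machine parameters named as on the slide: p, g and L (field l).
   The type and function names are hypothetical, for illustration only. *)
type bsp = { p : int; g : float; l : float }

(* Cost of one super-step: (max over processors of local work) + h * g + L.
   [w] holds the local work of each processor, [h] the size of the
   communication phase. Work is assumed to be non-negative. *)
let superstep_cost (m : bsp) (w : float array) (h : float) : float =
  assert (Array.length w = m.p);
  let wmax = Array.fold_left max 0.0 w in
  wmax +. h *. m.g +. m.l

(* Cost of a program: the sum of the costs of its super-steps. *)
let program_cost (m : bsp) (steps : (float array * float) list) : float =
  List.fold_left (fun acc (w, h) -> acc +. superstep_cost m w h) 0.0 steps
```

With p = 4, g = 2 and L = 10, a super-step whose local works are 1, 5, 3, 2 and whose h is 3 costs 5 + 3 × 2 + 10 = 21.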

  9. Example: broadcast • Direct broadcast (one super-step): BSP cost = $p \times n \times g + L$ • Broadcast with 2 super-steps: BSP cost = $2 \times n \times g + 2 \times L$
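The two broadcast costs can be compared numerically. The sketch below simply encodes the two formulas of the slide in OCaml; the function names are invented for illustration, and n is assumed to be the size (in words) of the value being broadcast, which the slide does not state explicitly.

```ocaml
(* Direct broadcast, one super-step: p * n * g + L. *)
let direct_cost ~p ~n ~g ~l = float_of_int (p * n) *. g +. l

(* Two-super-step broadcast: 2 * n * g + 2 * L. *)
let two_step_cost ~n ~g ~l = 2.0 *. float_of_int n *. g +. 2.0 *. l

(* True when the two-super-step variant is predicted to be cheaper. *)
let two_step_wins ~p ~n ~g ~l =
  two_step_cost ~n ~g ~l < direct_cost ~p ~n ~g ~l
```

For instance, with p = 8, g = 1 and L = 100, a value of size 1000 costs 8100 with the direct method and 2200 with two super-steps, whereas for a value of size 1 on 4 processors the extra synchronization makes the direct method cheaper (104 against 202).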

  10. The BSML language [Diagram relating the λ-calculus, ML, the BSλ-calculus and BSML through parallel constructions and parallel primitives] • Structured parallelism as an explicit parallel extension of ML • Functional language with BSP cost predictions • Allows the implementation of skeletons • Implemented as a parallel library for the "Objective Caml" language • Using a parallel data structure called parallel vector

  11. A BSML program [Diagram: a program made of a sequential part, a replicated part, and a parallel part holding the vectors f_0, f_1, …, f_{p-1} and g_0, g_1, …, g_{p-1}]

  12. Parallel primitives of BSML • Asynchronous primitives: • Creation of a vector (creation of local values) mkpar : (int → α) → α par • Parallel pointwise application apply : (α → β) par → α par → β par • Synchronous and communication primitives: • Communications put : (int → α) par → (int → α) par • Projection of local values (to be replicated) proj : α par → (int → α)
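To make the meaning of the four primitives concrete, here is a purely sequential OCaml model in which a parallel vector is an array with one slot per process. This is not the BSML library: the fixed `p = 4`, the array representation and the helper `shift_right` are assumptions made for the example, and `put` is given the type `(int -> 'a) par -> (int -> 'a) par` as an assumed reading of the slide.

```ocaml
(* Sequential model only: one array slot per process. *)
let p = 4                      (* assumed number of processes *)
type 'a par = 'a array

(* mkpar f builds the vector <f 0, ..., f (p-1)>. *)
let mkpar (f : int -> 'a) : 'a par = Array.init p f

(* apply applies the function held by process i to the value held by i. *)
let apply (fs : ('a -> 'b) par) (vs : 'a par) : 'b par =
  Array.init p (fun i -> fs.(i) vs.(i))

(* put: at process i, fs.(i) j is the value i sends to j. After the
   exchange, the function held by j maps a sender i to what i sent to j. *)
let put (fs : (int -> 'a) par) : (int -> 'a) par =
  Array.init p (fun j -> fun i -> fs.(i) j)

(* proj turns a vector into a replicated function from pids to values. *)
let proj (v : 'a par) : int -> 'a = fun i -> v.(i)

(* Example: every process receives the value of its left neighbour. *)
let shift_right (v : 'a par) : 'a par =
  let msgs = put (apply (mkpar (fun _ x -> fun _dst -> x)) v) in
  apply (mkpar (fun j recv -> recv ((j + p - 1) mod p))) msgs
```

In this model `shift_right` applied to the vector <10, 20, 30, 40> yields <40, 10, 20, 30>. In a real BSP execution, `mkpar` and `apply` need no communication, while `put` and `proj` each end a super-step.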

  13. Semantics Small-steps semantics Distributed semantics Programming model Easy for proofs (Coq) Natural semantics Easy for costs Execution model Makes asynchronous steps appear Close to a real implementation

  14. Natural semantics • Semantics = set of axioms and inference rules • Easy to understand, makes proofs easier • Example:

  15. Small-step semantics • Semantics = set of rewriting rules • Using contexts for the strategy • Easier understanding of costs and errors • Example: [Figure: rewriting example annotated with local costs and the global cost]

  16. Distributed semantics • Semantics = set of parallel rewriting rules • SPMD style: every process runs the same program (Prog) on its own part of the parallel vector (distributed evaluation) • Program shown on each process:

      let scan op vec =
        let rec scan' fst lst op vec =
          if fst >= lst then vec
          else
            let mid = (fst + lst) / 2 in
            let vec' = mix mid (super (fun () -> scan' fst mid op vec)
                                      (fun () -> scan' (mid+1) lst op vec)) in
            let com = ... (* send w_m to processes m+1 … p-1 *) in
            let op' = ... (* applies op to w_m and w_i, m < i < p *) in
            parfun2 op' com vec'
        in scan' 0 (bsp_p () - 1) op vec

  17. Multi-programming

  18. Parallel composition • Several programs on the same machine • Primitive of parallel composition: Superposition • Divide-and-conquer BSP algorithms

  19. Parallel Superposition • super : (unit → α) → (unit → β) → α * β • super E1 E2 = (E1 (), E2 ()) • Fusion of communications/synchronisations using super-threads • Keeps the BSP model • Pure functional semantics
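The functional meaning of the superposition, a pair of the two results, can be modelled sequentially. The OCaml sketch below captures only that meaning: it does not fuse communications or synchronizations as the real primitive is described to do, and `dc_sum` is a hypothetical divide-and-conquer example added for illustration.

```ocaml
(* Sequential model of the result of super: both thunks are evaluated and
   their results paired. The real primitive is described as running them as
   super-threads sharing super-steps; that aspect is not modelled here. *)
let super (f : unit -> 'a) (g : unit -> 'b) : 'a * 'b =
  let a = f () in
  let b = g () in
  (a, b)

(* Hypothetical divide-and-conquer use: sum a.(lo) .. a.(hi) by superposing
   the computations on the two halves. *)
let rec dc_sum (a : int array) (lo : int) (hi : int) : int =
  if lo = hi then a.(lo)
  else
    let mid = (lo + hi) / 2 in
    let (left, right) =
      super (fun () -> dc_sum a lo mid) (fun () -> dc_sum a (mid + 1) hi)
    in
    left + right
```

Because the two thunks are pure, evaluating them one after the other gives the same pair as any interleaving, which is what lets the primitive keep a pure functional semantics.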

  20. Parallel Superposition

  21. Implementation of the superposition

  22. Semantics (1) • Natural semantics : • Small-step semantics: • Solution, the super-threads :

  23. Semantics (2) • Management of the communications : • Management of the superposition :

  24. Semantics-based implementation • The semantics reveals 3 low-level primitives: • Send, to send the data of the environment of communications • Rcv, to receive them • Wait, to let a super-thread wait for its sibling • BSML primitives are thus simple calls to them (as in the small-step semantics) • Super-threads could be implemented using threads • A scheduler for these threads is thus needed for the special management of our super-threads • The environment of communications is just a hash table with the pids of super-threads as keys
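The idea of an environment of communications keyed by super-thread pids can be sketched with OCaml's standard `Hashtbl` module. Everything below is a hypothetical model written for illustration: the names `post`, `ready` and `flush`, the message representation, and the rule that communication starts once every live super-thread has posted are assumptions, not the actual BSML implementation.

```ocaml
(* Hypothetical model: each super-thread pid maps to the messages it wants
   to send, as (destination process, payload) pairs. *)
type pid = int
type 'a env = (pid, (int * 'a) list) Hashtbl.t

let create () : 'a env = Hashtbl.create 16

(* A super-thread records its outgoing messages before waiting. *)
let post (env : 'a env) (id : pid) (msgs : (int * 'a) list) : unit =
  Hashtbl.replace env id msgs

(* Assumed scheduling rule: the single fused communication phase may start
   only when every live super-thread has posted. *)
let ready (env : 'a env) (live : pid list) : bool =
  List.for_all (fun id -> Hashtbl.mem env id) live

(* Empty the environment and return its content, sorted by pid, as the
   material for one communication phase. *)
let flush (env : 'a env) : (pid * (int * 'a) list) list =
  let all = Hashtbl.fold (fun id msgs acc -> (id, msgs) :: acc) env [] in
  Hashtbl.reset env;
  List.sort compare all
```

A scheduler in this model would run each super-thread until it posts, test `ready`, and then perform one exchange for the whole content returned by `flush`, so that several superposed computations share a single synchronization.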

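  25. Example: prefix computation • scan : (α → α → α) → α par → α par • $\mathrm{scan}\,(+)\,\langle v_0, \ldots, v_{p-1} \rangle = \langle v_0,\ v_0+v_1,\ \ldots,\ v_0+v_1+\cdots+v_{p-1} \rangle$ • Left half: $\mathrm{scan}\,(+)\,\langle v_0, \ldots, v_m, \ldots \rangle = \langle w_0, \ldots, w_m, \ldots \rangle$ • Right half: $\mathrm{scan}\,(+)\,\langle \ldots, v_{m+1}, \ldots, v_{p-1} \rangle = \langle \ldots, w_{m+1}, \ldots, w_{p-1} \rangle$ • Combination: $\langle w_0, \ldots, w_m,\ w_m+w_{m+1},\ \ldots,\ w_m+w_{p-1} \rangle = \langle v_0,\ v_0+v_1,\ \ldots,\ v_0+\cdots+v_m,\ v_0+\cdots+v_{m+1},\ \ldots,\ v_0+\cdots+v_{p-1} \rangle$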
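The divide-and-conquer scheme of this slide can be tried out sequentially. The OCaml sketch below models the parallel vector as an array and runs the two halves one after the other; in BSML the two recursive calls would be superposed and the combination step would need a communication. The function name `scan` follows the slide, while the array representation and the local helper `go` are assumptions of this sketch.

```ocaml
(* Sequential model of the divide-and-conquer prefix computation.
   w.(i) ends up holding v.(0) op v.(1) op ... op v.(i). *)
let scan (op : 'a -> 'a -> 'a) (v : 'a array) : 'a array =
  let w = Array.copy v in
  let rec go fst lst =
    if fst < lst then begin
      let mid = (fst + lst) / 2 in
      go fst mid;              (* prefixes of the left half *)
      go (mid + 1) lst;        (* prefixes of the right half, independent *)
      let wm = w.(mid) in      (* last prefix of the left half *)
      for i = mid + 1 to lst do
        w.(i) <- op wm w.(i)   (* combine: w_m op w_i for m < i <= lst *)
      done
    end
  in
  go 0 (Array.length w - 1);
  w
```

Applied to addition and the vector <1, 2, 3, 4, 5> this gives <1, 3, 6, 10, 15>. The left operand of `op` is always the left-hand prefix, so the sketch also works for associative but non-commutative operators such as string concatenation.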

  26. Benchmarks [Figure: time (s) against the size of the polynomials for three methods: direct method (BSML+MPI), D-a-C method with superposition, D-a-C method with juxtaposition]

  27. Conclusion and future works

  28. Conclusion • BSML = BSP + ML • Superposition = primitive of parallel composition • Small-step semantics of the superposition • Distributed semantics, defined as a small-step one • Superposition implemented using threads as in the small-step semantics

  29. Future works • Implementation using continuations (transformation of the source code with the help of a type checker) and proof of equivalence using our semantics • Implementation of bigger algorithms for better benchmarks of BSML and its superposition • Implementation of parallel skeletons (management of tasks) using the superposition? • BSP model-checking of high-level Petri nets (M-nets). The main difficulty: finding a non-trivial algorithm, as the concurrent programming community does. Possible, but needs more theoretical optimisations…

  30. Thanks for your attention
