Bulk-Synchronous Parallel ML Semantics and Implementation of the Parallel Juxtaposition

Frédéric Gava

Presentation Transcript


  1. Frédéric Gava Bulk-Synchronous Parallel ML Semantics and Implementation of the Parallel Juxtaposition

  2. Background [Figure: approaches to parallel programming, ranging from implicit to explicit: automatic parallelization, skeletons, data-parallelism, parallel extensions, concurrent programming; BSML is positioned on this scale]

  3. Projects • 2002-2004: ACI Grid (LIFO, LACL, PPS, INRIA): design of parallel and Grid libraries for OCaml • 2004-2007: ACI « Young researchers » (LIFO, LACL): production of a programming environment in which certified parallel programs can be written and safely executed

  4. Outline • The BSML language • Parallel compositions • Superposition: types and semantics • Juxtaposition: types and semantics • Implementation of the juxtaposition • Conclusion and future work

  5. The BSML language

  6. The BSP model [Figure: BSP architecture, p processor/memory pairs (P/M) connected by a network, together with a unit of synchronization] • A BSP architecture is characterized by: • p: number of processors • r: processor speed • L: time of a global synchronization • g: time of a communication phase in which each processor sends or receives at most 1 word

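  7. Model of execution: the time of a superstep s is $T(s) = \max_{0 \le i < p} w_i + h\,g + L$, where $w_i$ is the local computation time of processor $i$ and $h$ is the largest number of words sent or received by a processor during the superstep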

  8. Example: broadcast of a value of n words • Direct broadcast: cost = $p\,n\,g + L$ • Broadcast with 2 phases: cost = $2\,n\,g + 2L$
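The cost formulas of slides 7 and 8 can be checked numerically. The OCaml sketch below is a hypothetical helper (the names `bsp_params`, `superstep_cost`, `direct_broadcast` and `two_phase_broadcast` are illustrative, not part of BSML) that evaluates $T(s)$ for one superstep and the two broadcast costs; it assumes non-negative local work and ignores local computation in the broadcasts.

```ocaml
(* Hypothetical BSP cost helpers; parameter values are for illustration only. *)
type bsp_params = {
  p : int;   (* number of processors *)
  g : float; (* time per word in a communication phase *)
  l : float; (* time of a global synchronization *)
}

(* T(s) = max_i w_i + h * g + L, where work.(i) is the local work w_i of
   processor i and h the largest number of words sent or received by a
   processor.  Assumes all w_i >= 0. *)
let superstep_cost prm (work : float array) (h : int) =
  let w_max = Array.fold_left max 0.0 work in
  w_max +. float_of_int h *. prm.g +. prm.l

(* Direct broadcast of n words: the root sends n words to every processor,
   so h = p * n, giving p*n*g + L. *)
let direct_broadcast prm n =
  superstep_cost prm (Array.make prm.p 0.0) (prm.p * n)

(* Two-phase broadcast: a scatter followed by a total exchange, each phase
   bounded by h = n, giving 2*n*g + 2*L. *)
let two_phase_broadcast prm n =
  2.0 *. superstep_cost prm (Array.make prm.p 0.0) n
```

Comparing the two formulas, the two-phase broadcast is cheaper as soon as $(p-2)\,n\,g > L$, i.e. for large enough values.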

  9. The BSML language [Figure: ML with parallel primitives (BSML) alongside the λ-calculus with parallel constructions (BSλ-calculus)] • Structured parallelism as an explicit parallel extension of ML • Functional language with BSP cost predictions • Allows the implementation of skeletons • Implemented as a parallel library for the "Objective Caml" language • Using a parallel data structure called parallel vector

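  10. A BSML program [Figure: the structure of a BSML program, with its replicated part, its parallel part (components f0, g0, f1, g1, …, fp-1, gp-1, one per processor) and its sequential part]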

  11. Parallel primitives of BSML • Asynchronous primitives: • Creation of a vector: mkpar : (int -> 'a) -> 'a par • Parallel pointwise application: apply : ('a -> 'b) par -> 'a par -> 'b par • Synchronous and communication primitives: • Communications: put : (int -> 'a option) par -> (int -> 'a option) par • Projection of values: proj : 'a option par -> (int -> 'a option)
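To make the four types concrete, here is a sequential simulation in OCaml. It assumes that a parallel vector of width p is simply an array (cell i is the value held by processor i) and fixes p = 4; it is an illustrative model of the primitives' behaviour, not the BSML library itself, and the helper `bcast` is hypothetical.

```ocaml
(* Hypothetical sequential model: cell i is the value held by processor i. *)
type 'a par = 'a array

let p = 4 (* assumed number of processors *)
let bsp_p () = p

(* mkpar f = <f 0, ..., f (p-1)> *)
let mkpar (f : int -> 'a) : 'a par = Array.init p f

(* Pointwise application: processor i applies its own function to its value. *)
let apply (fs : ('a -> 'b) par) (vs : 'a par) : 'b par =
  Array.init p (fun i -> fs.(i) vs.(i))

(* put: processor j holds d_j, and d_j i = Some v means "send v to i".
   Afterwards processor i holds the function mapping j to the message it
   received from j (None if nothing was sent). *)
let put (ds : (int -> 'a option) par) : (int -> 'a option) par =
  (* msgs.(j).(i) is the message from j to i *)
  let msgs = Array.init p (fun j -> Array.init p (fun i -> ds.(j) i)) in
  Array.init p (fun i j -> if 0 <= j && j < p then msgs.(j).(i) else None)

(* proj: a replicated function giving access to every component v_i. *)
let proj (vs : 'a option par) : int -> 'a option =
  let copy = Array.copy vs in
  fun i -> if 0 <= i && i < p then copy.(i) else None

(* Hypothetical usage: broadcast the value held by processor [root]. *)
let bcast (root : int) (v : 'a par) : 'a par =
  let to_send =
    apply (mkpar (fun j x _dst -> if j = root then Some x else None)) v in
  let received = put to_send in
  apply (mkpar (fun _ f -> match f root with Some x -> x | None -> assert false))
    received
```

In this model `put` and `proj` are the only functions that read other processors' cells, matching their classification as the synchronous, communicating primitives.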

  12. Semantics • Programming model: natural semantics and small-step semantics (easy for proofs in Coq; easy for costs) • Execution model: distributed semantics (makes asynchronous steps appear; close to a real implementation)

  13. Parallel compositions

  14. Multi-programming • Several programs on the same machine • New primitives of parallel composition: • Superposition • Juxtaposition (implemented with the superposition) • Divide-and-conquer BSP algorithms

  15. Parallel superposition • super : (unit -> 'a) -> (unit -> 'b) -> 'a * 'b • super E1 E2 = (E1 (), E2 ()) • Fusion of communications/synchronisations using super-threads • Keeps the BSP model • Pure functional semantics
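The equation super E1 E2 = (E1 (), E2 ()) can be read directly as OCaml. The sketch below assumes that parallel vectors are plain arrays of a fixed width p = 4 (an assumption for illustration); it captures only the functional result, so it says nothing about the fusion of communication phases that gives super its BSP cost.

```ocaml
(* Hypothetical sequential reading of the superposition: it reproduces the
   result super e1 e2 = (e1 (), e2 ()) but not the super-threads that fuse
   the communication and synchronisation phases of e1 and e2. *)
let super (e1 : unit -> 'a) (e2 : unit -> 'b) : 'a * 'b =
  let a = e1 () in (* evaluation order fixed explicitly: e1 first *)
  let b = e2 () in
  (a, b)

(* Assumed model: a parallel vector is an array of width p = 4. *)
let p = 4
let mkpar f = Array.init p f

(* Both superposed computations see the whole machine of p processors. *)
let (squares, evens) =
  super (fun () -> mkpar (fun i -> i * i))
        (fun () -> mkpar (fun i -> i mod 2 = 0))
```

Unlike the juxtaposition, neither component is restricted to a sub-machine: both vectors have p cells.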

  16. Parallel Superposition

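  17. Parallel juxtaposition • If E1 evaluates to $\langle v_0, v_1, \ldots, v_i, \ldots, v_{m-1} \rangle$ on a sub-machine of m processors and E2 to $\langle v'_0, v'_1, \ldots, v'_j, \ldots, v'_{p-1-m} \rangle$ on the remaining p-m processors, then $\mathrm{juxta}\ m\ E_1\ E_2 = \langle v_0, \ldots, v_{m-1}, v'_0, \ldots, v'_{p-1-m} \rangle$ • juxta : int -> (unit -> 'a par) -> (unit -> 'a par) -> 'a par • Fusion of communications/synchronisations on each sub-machine • Keeps the BSP model • Side effect on the number of processors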
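The concatenation equation of slide 17 and the side effect on the number of processors can be modelled sequentially. In this hypothetical OCaml sketch a vector is an array sized by the current sub-machine, and a global reference `cur_p` (an assumption, not BSML's actual mechanism) stands for the value returned by `bsp_p ()`; `Fun.protect` needs OCaml 4.08 or later.

```ocaml
(* Hypothetical sequential model of the juxtaposition's semantics.
   A vector has one cell per processor of the current (sub-)machine,
   and cur_p plays the role of bsp_p (). *)
let physical_p = 5 (* assumed size of the whole machine *)
let cur_p = ref physical_p
let bsp_p () = !cur_p
let mkpar f = Array.init (bsp_p ()) f

(* juxta m e1 e2: e1 runs on a sub-machine made of the first m processors,
   e2 on the remaining p - m; the two vectors are concatenated.  The side
   effect on the number of processors shows up as bsp_p () changing. *)
let juxta m (e1 : unit -> 'a array) (e2 : unit -> 'a array) : 'a array =
  let p = bsp_p () in
  if m <= 0 || m >= p then invalid_arg "juxta: need 0 < m < bsp_p ()";
  let run size e =
    cur_p := size;
    (* restore the enclosing machine size even if e raises *)
    Fun.protect ~finally:(fun () -> cur_p := p) e
  in
  let v1 = run m e1 in
  let v2 = run (p - m) e2 in
  Array.append v1 v2

(* Each processor records the size of its sub-machine and its virtual PID. *)
let v = juxta 2 (fun () -> mkpar (fun i -> (bsp_p (), i)))
                (fun () -> mkpar (fun i -> (bsp_p (), i)))
(* v = [| (2, 0); (2, 1); (3, 0); (3, 1); (3, 2) |] *)
```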

  18. Parallel juxtaposition [Figure: E3 = (juxta 3 E1 E2); E1 and E2 each alternate communication and synchronisation phases on their own sub-machine, and in E3 these phases are fused into common communication and synchronisation phases]

  19. Distributed semantics [Figure: the parallel vector is split into parts, one per processor, and every processor runs its own copy (Prog) of the same program, for instance the following scan]

    let scan op vec =
      let rec scan' fst lst op vec =
        if fst >= lst then vec
        else
          let mid = (fst + lst) / 2 in
          let vec' = mix mid (super (fun () -> scan' fst mid op vec)
                                    (fun () -> scan' (mid + 1) lst op vec)) in
          let com = ... (* send wm to processes m+1…p+1 *) in
          let op' = ... (* applies op to wm and wi, m<i<p *) in
          parfun2 op' com vec'
      in
      scan' 0 (bsp_p () - 1) op vec

  • Semantics = set of parallel rewriting rules • SPMD style: parallel vector → distributed evaluation • Confluent • Equivalent to the natural semantics

  20. Implementation of the juxtaposition

  21. Use of the superposition • Two references: one holds the number of processors of the current sub-machine, the other the real PID of its virtual processor 0 • Creation of incomplete vectors • Each sub-machine runs in a super-thread
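The implementation idea of slide 21 can also be sketched sequentially. The code below is a hypothetical model, not the BSML source: vectors keep one cell per physical processor, the references `nprocs` and `first` play the roles of the two references described on the slide, a sub-machine builds an incomplete vector (`None` outside its own processors), and a sequential `super` stands in for the superposition. `on_submachine` is an illustrative name, and `mix` is defined here only by analogy with the `mix` call in the scan code; its exact behaviour in BSML is an assumption.

```ocaml
(* Hypothetical sequential sketch of juxta implemented with super. *)
let physical_p = 5 (* assumed size of the whole machine *)

(* The two references: width of the current sub-machine and real PID of
   its virtual processor 0. *)
let nprocs = ref physical_p
let first = ref 0

let bsp_p () = !nprocs

(* Incomplete vector: one cell per physical processor, None outside the
   current sub-machine; f receives the virtual PID (real PID - !first). *)
let mkpar f =
  let lo = !first and n = !nprocs in
  Array.init physical_p (fun i ->
      if lo <= i && i < lo + n then Some (f (i - lo)) else None)

(* Sequential stand-in for the superposition. *)
let super e1 e2 = let a = e1 () in let b = e2 () in (a, b)

(* Run e on the sub-machine of n processors starting at real PID lo,
   restoring both references afterwards. *)
let on_submachine lo n e () =
  let saved_first = !first and saved_n = !nprocs in
  first := lo;
  nprocs := n;
  Fun.protect ~finally:(fun () -> first := saved_first; nprocs := saved_n) e

(* Merge two incomplete vectors: cells below real PID [boundary] come
   from v1, the others from v2. *)
let mix boundary (v1, v2) =
  Array.init physical_p (fun i -> if i < boundary then v1.(i) else v2.(i))

let juxta m e1 e2 =
  let lo = !first and n = !nprocs in
  if m <= 0 || m >= n then invalid_arg "juxta: need 0 < m < bsp_p ()";
  mix (lo + m)
    (super (on_submachine lo m e1) (on_submachine (lo + m) (n - m) e2))

let v = juxta 2 (fun () -> mkpar (fun i -> (bsp_p (), i)))
                (fun () -> mkpar (fun i -> (bsp_p (), i)))
(* v = [| Some (2, 0); Some (2, 1); Some (3, 0); Some (3, 1); Some (3, 2) |] *)
```

Because the references are saved and restored around each sub-machine, juxtapositions can be nested, and the virtual PIDs seen inside a sub-machine always start at 0.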

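  22. Example: parallel prefixes [Figure: divide-and-conquer computation of the prefixes on eight processors holding a, b, c, d, e, f, g, h, combining values with op at each level] • scan : ('a -> 'a -> 'a) -> 'a par -> 'a par • $\mathrm{scan}\ (+)\ \langle v_0, \ldots, v_{p-1} \rangle = \langle v_0, v_0+v_1, \ldots, v_0+v_1+\cdots+v_{p-1} \rangle$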
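The divide-and-conquer structure of `scan'` (slide 19) and the specification above can be checked with a purely sequential version. In this hypothetical sketch a parallel vector is an array, the two recursive calls that `super` or `juxta` would run side by side are simply run one after the other, and the elided `com`/`op'` step is modelled as processor `mid` sending its prefix to processors `mid+1 … lst`, an assumption based on the comments in the slide's code.

```ocaml
(* Hypothetical sequential model: cell i of the array is processor i's value. *)
let scan (op : 'a -> 'a -> 'a) (vec : 'a array) : 'a array =
  (* scan' fst lst v: computes the prefixes of v restricted to [fst, lst] *)
  let rec scan' fst lst vec =
    if fst >= lst then vec
    else
      let mid = (fst + lst) / 2 in
      (* the two halves are independent: they touch disjoint index ranges,
         which is what allows super/juxta to run them side by side *)
      let vec = scan' fst mid vec in
      let vec = scan' (mid + 1) lst vec in
      (* processor mid sends its prefix w_mid to processors mid+1 .. lst,
         each of which combines it with its own partial prefix *)
      let w_mid = vec.(mid) in
      Array.mapi (fun i w -> if mid < i && i <= lst then op w_mid w else w) vec
  in
  scan' 0 (Array.length vec - 1) vec
```

Each recursion level adds one communication phase, so a BSP version following this scheme would need about $\lceil \log_2 p \rceil$ of them. Since the earlier prefix is always the left operand of `op`, the sketch also works for associative but non-commutative operators such as string concatenation.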

  23. Juxta versus super • Code of a direct method: 12 lines • Code with superposition: 8 lines • Code with juxtaposition: 6 lines

  24. Performance [Figure: running time (s) as a function of the size of the polynomials, for the direct method (BSML+MPI), the divide-and-conquer method with superposition and the divide-and-conquer method with juxtaposition]

  25. Conclusion and future work

  26. Conclusion • BSML = BSP + ML • Superposition: a primitive of parallel composition • Juxtaposition is easier to use for divide-and-conquer algorithms • Distributed semantics of the juxtaposition • Juxtaposition implemented using the superposition • Similar performance

  27. Future work • Proofs of the implementation using the semantics • Implementation of bigger algorithms • BSP model-checking of high-level Petri nets (M-nets)

  28. Thanks for your attention
