
Optimistic and Pessimistic Parallelization



  1. Optimistic and Pessimistic Parallelization

  2. Dependences • Dependences: ordering constraints on program execution • “At runtime, this instruction/statement must complete before that one can begin” • Dependence analysis: compile-time analysis to determine dependences in program • Needed to generate parallel code • Also needed to correctly transform a program by reordering statements • Safe approximation: it is OK for the compiler to assume more ordering constraints than are actually needed for correct execution • At the worst, this prevents you from doing some transformation or parallelization that would have been correct • In contrast, leaving out dependences would be unsafe

  3. Two kinds of dependences • Data dependences • Arise from reads and writes to memory locations • Classified into flow, anti, and output dependences • Control dependences • Arise from the flow of control • (e.g.) statements on the two sides of an if-then-else are control-dependent on its predicate • We will not worry too much about control dependences in this course

  4. Data Dependence Example • S1: x = 5; • S2: y = x; • S3: x = 3;

  5. Flow Dependence • S1: x = 5; • S2: y = x; • S3: x = 3; (i) S1 is executed before S2 (ii) S1 must write to x before S2 reads from it. Flow dependence S1 → S2

  6. Anti-dependence • S1: x = 5; • S2: y = x; • S3: x = 3; (i) S2 is executed before S3 (ii) S2 must read variable x before S3 overwrites it. Anti-dependence S2 → S3

  7. Output Dependence • S1: x = 5; • S2: y = x; • S3: x = 3; (i) S1 is executed before S3 (ii) S1 must write to x before S3 overwrites it. Output dependence S1 → S3

  8. Summary • Flow dependence Si → Sj • Si is executed before Sj • Si writes to a location that is read by Sj • Anti-dependence Si → Sj • Si is executed before Sj • Si reads from a location that is overwritten by Sj • Output dependence Si → Sj • Si is executed before Sj • Si writes to a location that is overwritten by Sj

  9. Are output dependences needed? • Goal of computing program dependences in a compiler is to determine partial order on program statement executions • From example, it seems that output dependence constraint is covered by transitive closure of flow and anti-dependences • So why do we need output dependences? • Answer: aliases

  10. Aliases • Aliases: two or more program names that may refer to the same location • indirect array references (sparse matrices) • …… • A [X[I]] := …. • A [X[J]] := …. • assignment through pointers (trees, graphs) • …….. • *p1 := … • *p2 := … • call by reference • procedure foo(var x, var y) //call by reference • x : = 2; • y : = 3; • May-aliases: two names that may or may not refer to the same location • Must-aliases: two names that we know for sure must refer to the same location every time the statements are executed
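
A minimal runnable sketch (class and variable names are ours, not the slides') of why A[X[I]] and A[X[J]] are may-aliases: whether they name the same location depends on the run-time contents of X, which the compiler generally cannot know.

    public class AliasDemo {
        public static void main(String[] args) {
            int[] A = new int[10];
            int[] X = {3, 3, 7};       // X[0] == X[1], so the two names collide
            int I = 0, J = 1;
            A[X[I]] = 5;               // writes A[3]
            A[X[J]] = 9;               // also writes A[3]; the two statements
                                       // must stay ordered (output dependence)
            System.out.println(A[3]);  // prints 9 only if the order is preserved
        }
    }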

  11. Dependence analysis and aliasing • What constraints should we assume between these program statements? • …… • A [X[I]] := …. • A [X[J]] := …. • Answer: if the two names are aliases, we must order the two statement executions by a dependence to be safe • What kind of dependence makes sense? • output dependence

  12. Dependence analysis in the presence of aliasing • Flow dependence Si → Sj • Si is executed before Sj • Si may write to location that may be read by Sj • Anti-dependence Si → Sj • Si is executed before Sj • Si may read from location that may be overwritten by Sj • Output dependence Si → Sj • Si is executed before Sj • Si may write to location that may be overwritten by Sj

  13. Dependences in loops • for (i = 0; i < 5; i++) { • S1: t = i * A[i]; • S2: A[i] = 3 * t; • } • S1 and S2 are executed many times • What does it mean to have a dependence S1 → S2? • Answer: by convention, this means that there is a dependence between some two instances of S1 and S2

  14. Dependence analysis in the presence of loops and aliasing • Flow dependence Si → Sj • Instance of Si is executed before instance of Sj • That instance of Si may write to a location that may be read by that instance of Sj • Anti-dependence Si → Sj • Instance of Si is executed before instance of Sj • That instance of Si may read from a location that may be overwritten by that instance of Sj • Output dependence Si → Sj • Instance of Si is executed before instance of Sj • That instance of Si may write to a location that may be overwritten by that instance of Sj

  15. Dependence Example • for (i = 0; i < 5; i++) { • S1: t = i * A[i]; • S2: A[i] = 3 * t; • } [Figure: dependence graph over S1 and S2 with output, flow, and anti edges] If we think of A as a single monolithic location, there would be an output dependence S2 → S2. More refined picture: treat each element of A as a different location ⇒ no output dependence S2 → S2

  16. Parallel execution of loop • for (i = 0; i < 5; i++) { • S1: t = i * A[i]; • S2: A[i] = 3 * t; • } [Figure: dependence graph over S1 and S2 with output, flow, and anti edges] Can we execute loop iterations in parallel? A dependence inhibits parallel execution only if the dependent source and destination statement instances are in different loop iterations. In this example, the dependences S1 → S2 do not prevent parallel execution of loop iterations, but the other two dependences do.

  17. Loop-carried vs loop-independent dependence • for (i = 0; i < 5; i++) { • S1: t = i * A[i]; • S2: A[i] = 3 * t; • } [Figure: dependence graph over S1 and S2 with output, flow, and anti edges] If the source and destination of a dependence are in different iterations, we say the dependence is loop-carried. Otherwise it is loop-independent. In this example, the dependences S1 → S2 are loop-independent. Only loop-carried dependences inhibit inter-iteration parallelism.

  18. Transformations to enhance parallelism • for (i = 0; i < 5; i++) { • S1: t[i] = i * A[i]; • S2: A[i] = 3 * t[i]; • } [Figure: dependence graph with a flow edge S1 → S2] In many programs, we can perform transformations to enhance parallelism. In this example, all dependences are loop-independent, so all iterations can be executed in parallel. This is called a DO-ALL loop. To get this parallel version, we expanded the variable t into an array. Transformation: scalar expansion
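
As an illustration only (not from the slides), here is how the scalar-expanded loop could be run as a DO-ALL loop using Java parallel streams; each iteration touches only its own t[i] and A[i], so no loop-carried dependence remains.

    import java.util.stream.IntStream;

    public class ScalarExpansionDemo {
        public static void main(String[] args) {
            double[] A = {1, 2, 3, 4, 5};
            double[] t = new double[A.length];   // expanded scalar: one t per iteration

            IntStream.range(0, A.length).parallel().forEach(i -> {
                t[i] = i * A[i];                 // S1: writes only this iteration's t[i]
                A[i] = 3 * t[i];                 // S2: writes only this iteration's A[i]
            });
        }
    }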

  19. Example • for (i = 1; i <= N; i++) • for (j = 1; j <= N; j++) • for (k = 1; k <= N; k++) • C[i,j] = C[i,j] + A[i,k]*B[k,j] What is the dependence graph? Notice that the two outer loops are parallel. The inner loop is not parallel (unless you allow the additions to be done in any order). The notion of loop-carried/loop-independent dependence must be generalized for the multiple-loop case: dependence vectors
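
A sketch (0-based indexing, method name ours) of exploiting one of the parallel outer loops of this nest: iterations of i are independent, while the k loop is a sequential reduction into C[i][j].

    import java.util.stream.IntStream;

    public class MatMulDemo {
        static void multiply(double[][] A, double[][] B, double[][] C, int N) {
            IntStream.range(0, N).parallel().forEach(i -> {   // parallel i loop
                for (int j = 0; j < N; j++) {                 // j loop is also parallel
                    for (int k = 0; k < N; k++) {             // sequential: accumulates into C[i][j]
                        C[i][j] += A[i][k] * B[k][j];
                    }
                }
            });
        }
    }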

  20. Questions • How do we compute the dependence relation of a loop nest? • What transformations should we perform to enhance parallelism in loop nests? • What abstractions of the dependence relation are useful for parallelization and transformation of loop nests? • Answers are very dependent on data structures used in the code • We know a lot about array programs • Problem is much harder for programs that manipulate irregular data structures like graphs

  21. Parallelizing Irregular Problems

  22. Delaunay Meshes • Meshes useful for • Finite element method for solving PDEs • Graphics rendering • Delaunay meshes (2-D) • Triangulation of a surface, given vertices • Delaunay property: circumcircle of any triangle does not contain another point in the mesh • Related to Voronoi diagrams
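
For concreteness, here is the standard in-circumcircle test used to check the Delaunay property (method and parameter names are ours): for a counter-clockwise triangle (a, b, c), point d lies strictly inside its circumcircle exactly when the 3x3 determinant below is positive. Robust mesh generators use exact arithmetic for this predicate; plain doubles are only a sketch.

    public class DelaunayPredicate {
        static boolean inCircumcircle(double ax, double ay, double bx, double by,
                                      double cx, double cy, double dx, double dy) {
            double adx = ax - dx, ady = ay - dy;
            double bdx = bx - dx, bdy = by - dy;
            double cdx = cx - dx, cdy = cy - dy;
            double ad = adx * adx + ady * ady;   // squared distances from d
            double bd = bdx * bdx + bdy * bdy;
            double cd = cdx * cdx + cdy * cdy;
            // determinant of | adx ady ad ; bdx bdy bd ; cdx cdy cd |
            double det = adx * (bdy * cd - bd * cdy)
                       - ady * (bdx * cd - bd * cdx)
                       + ad  * (bdx * cdy - bdy * cdx);
            return det > 0;
        }
    }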

  23. Delaunay Mesh Refinement • Want all triangles in mesh to meet quality constraints • (e.g.) no angle < 30° • Mesh refinement: fix bad triangles through iterative refinement

  24. Iterative Refinement • Choose “bad” triangle

  25. Mesh Refinement • Choose “bad” triangle • Add new vertex at center of circumcircle

  26. Mesh Refinement • Choose “bad” triangle • Add new vertex at center of circumcircle • Gather all triangles that no longer satisfy Delaunay property into cavity

  27. Mesh Refinement • Choose “bad” triangle • Add new vertex at center of circumcircle • Gather all triangles that no longer satisfy Delaunay property into cavity • Re-triangulate affected region, including new point

  28. Mesh Refinement • Choose “bad” triangle • Add new vertex at center of circumcircle • Gather all triangles that no longer satisfy Delaunay property into cavity • Re-triangulate affected region, including new point • Add newly created bad triangles to worklist • Iterate till no more bad triangles • Final mesh depends on the order of processing of bad triangles but any order will give you a good mesh at the end

  29. Refinement Example [Figures: Original Mesh, Refined Mesh]

  30. Program
  Mesh m = /* read in mesh */
  WorkQueue wq;
  wq.enqueue(m.badTriangles());
  while (!wq.empty()) {
    Triangle t = wq.dequeue();      // choose bad triangle
    Cavity c = new Cavity(t);       // determine new vertex
    c.expand();                     // determine affected triangles
    c.retriangulate();              // re-triangulate region
    m.update(c);                    // update mesh
    wq.enqueue(c.badTriangles());   // add new bad triangles to queue
  }

  31. Parallelization Opportunities Bad triangles with non-overlapping cavities can be processed in parallel.

  32. Parallelization Study • Estimated available parallelism for a mesh of 1M triangles • Actual ability to exploit parallelism depends on the scheduling of processing • C. Antonopoulos, X. Ding, A. Chernikov, F. Blagojevic, D. Nikolopoulos, and N. Chrisochoides, "Multigrain Parallel Delaunay Mesh Generation," ICS 2005

  33. Problem • Identifying dependences at compile time is tractable for scalars and arrays • Second half of course... • What if we cannot determine statically what the dependences are? • Pointer-based data structures are hard to analyze • Dependences may be input-dependent • This is the case for mesh generation

  34. Solution: optimistic parallelization • Like speculative execution in microprocessors • Execute speculatively, correct if mistake is made • Idea: Execute code in parallel speculatively • Perform dynamic checks for dependence between parallel code • If dependence is detected, parallel execution was not correct • Roll back execution and try again!
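
A minimal sketch of the optimistic pattern in miniature (our own class, not the course's system): do the work on a private copy, then try to commit; if another thread committed first, the check fails, the speculative result is discarded (the "roll back"), and the work is retried.

    import java.util.concurrent.atomic.AtomicReference;
    import java.util.function.UnaryOperator;

    public class OptimisticCell<T> {
        private final AtomicReference<T> state;

        public OptimisticCell(T initial) { state = new AtomicReference<>(initial); }

        public void update(UnaryOperator<T> work) {
            while (true) {
                T snapshot = state.get();            // speculate: read current state
                T result = work.apply(snapshot);     // compute on a private copy
                if (state.compareAndSet(snapshot, result)) {
                    return;                          // no conflict: commit succeeds
                }
                // conflict detected: another thread committed first;
                // roll back by discarding 'result' and retrying
            }
        }
    }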

  35. Program Transformation: use the right abstractions • Using a queue introduces dependences that have nothing to do with the problem • Abstractly, all we need is a set of bad triangles • Queue is over-specification • WorkSet abstraction • getAny operation • Does not make ordering guarantees • Removes dependence between iterations • In the absence of cavity interference, iterations can execute in parallel • Replace WorkQueue with WorkSet • compare this with scalar expansion in array case • similar in spirit
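
One possible shape of the WorkSet abstraction (class and method names are ours): getAny() deliberately promises nothing about order, which is what removes the artificial dependence the queue imposed between iterations.

    import java.util.Collection;
    import java.util.HashSet;
    import java.util.Iterator;
    import java.util.Set;

    public class WorkSet<T> {
        private final Set<T> items = new HashSet<>();

        public synchronized void add(Collection<? extends T> work) { items.addAll(work); }

        public synchronized boolean empty() { return items.isEmpty(); }

        public synchronized T getAny() {      // remove and return an arbitrary element
            Iterator<T> it = items.iterator();
            T t = it.next();
            it.remove();
            return t;
        }
    }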

  36. Rewritten Program
  Mesh m = /* read in mesh */
  WorkSet ws;
  ws.add(m.badTriangles());
  while (!ws.empty()) {
    Triangle t = ws.getAny();       // choose any bad triangle
    Cavity c = new Cavity(t);       // determine new vertex
    c.expand();                     // determine affected triangles
    c.retriangulate();              // re-triangulate region
    m.update(c);                    // update mesh
    ws.add(c.badTriangles());       // add new bad triangles to set
  }

  37. Optimistic parallelization • Can now exploit “getAny” to parallelize loop • Can try to expand cavities in parallel • Expansions can still conflict • In practice, most cavities can be expanded in parallel safely • No way to know this a priori • Only guaranteed safe approach is serialization • What if we perform parallelization without prior guarantee of safety?
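
A sketch of what one speculative worker iteration might look like, written against the slide's pseudocode types; Triangle.tryLock(), Triangle.unlock(), and Cavity.triangles() are assumed helpers, not part of the original program. If any triangle in the cavity is already owned by another worker, the expansion is rolled back and retried later.

    boolean tryRefine(Mesh m, WorkSet<Triangle> ws, Triangle t) {
        Cavity c = new Cavity(t);
        c.expand();                                   // speculative: no locks held yet
        java.util.List<Triangle> locked = new java.util.ArrayList<>();
        for (Triangle u : c.triangles()) {
            if (!u.tryLock()) {                       // conflict with another cavity
                for (Triangle v : locked) v.unlock(); // roll back: release what we took
                ws.add(java.util.List.of(t));         // retry this bad triangle later
                return false;
            }
            locked.add(u);
        }
        c.retriangulate();                            // safe: we own the whole cavity
        m.update(c);
        ws.add(c.badTriangles());                     // new bad triangles go back in the set
        for (Triangle v : locked) v.unlock();
        return true;
    }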

  38. Parallelization Issues • Must ensure that run-time checks are efficient • How do we perform roll-backs? • Would like to minimize conflicts • Scheduling becomes important • Number of available cavities for expansion exceeds computational resources • Choose cavities to expand so as to minimize conflicts • Empirical testing: ~30% of cavity expansions conflict

  39. Questions • What are the right abstractions for writing irregular programs? • WorkQueue vs. WorkSet • How do we determine where to apply optimistic parallelization? • How do we perform dependence checks dynamically w/o too much overhead? • Can hardware support help? • How do we implement roll-backs? • How do we reduce the probability of roll-backs?
