
Implementing a Task Decomposition

  1. Implementing a Task Decomposition Introduction to Parallel Programming – Part 9

  2. Review & Objectives • Previously: • Described how the OpenMP task pragma is different from the for pragma • Showed how to code task decomposition solutions for while-loop and recursive tasks with the OpenMP task construct • At the end of this part you should be able to: • Design and implement a task decomposition solution
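
As a refresher (this sketch is not from the slides), the contrast can be shown in a few lines: the for pragma divides the iterations of a counted loop among the threads of a team, while the task pragma lets one thread package arbitrary units of work, such as linked-list nodes, for any thread to execute. The struct node type and the process()/process_node() functions are hypothetical placeholders.

      #include <omp.h>

      struct node { int value; struct node *next; };   /* hypothetical list node    */
      void process(double x);                          /* hypothetical work (array) */
      void process_node(struct node *p);               /* hypothetical work (list)  */

      void review_sketch(int n, double a[], struct node *head)
      {
          #pragma omp parallel
          {
              /* for pragma: the n iterations of a counted loop are divided
                 among the threads of the team */
              #pragma omp for
              for (int i = 0; i < n; i++)
                  process(a[i]);

              /* task pragma: one thread walks a list of unknown length and
                 creates a task per node; idle threads execute the tasks */
              #pragma omp single
              for (struct node *p = head; p != NULL; p = p->next) {
                  #pragma omp task firstprivate(p)
                  process_node(p);
              }
          }
      }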

  3. Case Study: The N Queens Problem Is there a way to place N queens on an N-by-N chessboard such that no queen threatens another queen?

  4. A Solution to the 4 Queens Problem

  5. Exhaustive Search

  6. Design #1 for Parallel Search • Create threads to explore different parts of the search tree simultaneously • If a node has children • The thread creates child nodes • The thread explores one child node itself • Thread creates a new thread for every other child node
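
The slides describe Design #1 only at the diagram level; a rough sketch of the idea follows, assuming nested parallelism has been enabled (for example with omp_set_max_active_levels) and using a hypothetical struct tree_node plus the hypothetical helpers is_leaf(), record_solution(), num_children(), and child(), none of which appear in the original deck.

      /* Design #1 sketch: every interior node opens a nested parallel region
         with one thread per child; the current thread explores one child
         itself.  The thread creation at every node is exactly the overhead
         the later pros-and-cons slide criticizes. */
      void explore(struct tree_node *nd)
      {
          if (is_leaf(nd)) {                 /* hypothetical leaf test        */
              record_solution(nd);           /* hypothetical solution handler */
              return;
          }
          int k = num_children(nd);          /* hypothetical child count      */
          #pragma omp parallel for num_threads(k)
          for (int i = 0; i < k; i++)
              explore(child(nd, i));         /* hypothetical child accessor   */
      }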

  7. Design #1 for Parallel Search (diagram: thread W expands a node, explores one child itself, and creates new threads X, Y, and Z for the remaining children)

  8. Pros and Cons of Design #1 • Pros • Simple design, easy to implement • Balances work among threads • Cons • Too many threads created • Lifetime of threads too short • Overhead costs too high

  9. Design #2 for Parallel Search • One thread created for each subtree rooted at a particular depth • Each thread sequentially explores its subtree
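
A sketch of what Design #2 might look like for the N queens problem (not shown in the slides): each of the n first-row placements roots one subtree, and the subtrees are statically divided among the threads. It assumes the board struct from slide 18 and the sequential search() function from slide 23; each thread accumulates into its own copy of num_solutions through the reduction clause, so no critical section is needed.

      /* Design #2 sketch: one subtree per first-row column, assigned to
         threads statically; each subtree is explored sequentially. */
      int count_solutions(int n)
      {
          int num_solutions = 0;
          #pragma omp parallel for schedule(static) reduction(+:num_solutions)
          for (int i = 0; i < n; i++) {
              struct board b;                 /* root of subtree i            */
              b.pieces = 1;
              b.places[0] = i;                /* queen in column i of row 0   */
              search(n, &b, &num_solutions);  /* sequential search (slide 23) */
          }
          return num_solutions;
      }

Because the subtree sizes differ widely, schedule(static) makes the imbalance discussed on the next slides easy to observe; the same loop with schedule(dynamic) already behaves much like the work pool of Design #3.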

  10. Design #2 in Action (diagram: threads 1, 2, and 3 each sequentially explore one subtree rooted at the chosen depth)

  11. Pros and Cons of Design #2 • Pros • Thread creation/termination time minimized • Cons • Subtree sizes may vary dramatically • Some threads may finish long before others • Imbalanced workloads lower efficiency

  12. Design #3 for Parallel Search • Main thread creates work pool—list of subtrees to explore • Main thread creates finite number of co-worker threads • Each subtree exploration is done by a single thread • Inactive threads go to pool to get more work

  13. Work Pool Analogy • More rows than workers • Each worker takes an unpicked row and picks the crop • After completing a row, the worker takes another unpicked row • Process continues until all rows have been harvested

  14. Design #3 in Action (diagram: subtrees are handed out from the work pool to threads 1, 2, 3, 3, and 1 in turn; threads that finish early return to the pool)

  15. Pros and Cons of Strategy #3 • Pros • Thread creation/termination time minimized • Workload balance better than strategy #2 • Cons • Threads need exclusive access to data structure containing work to be done, a sequential component • Workload balance worse than strategy #1 • Conclusion • Good compromise between designs 1 and 2

  16. Implementing Strategy #3 for N Queens • Work pool consists of N boards representing N possible placements of queen on first row

  17. Parallel Program Design • One thread creates list of partially filled-in boards • Fork: Create one thread per core • Each thread repeatedly gets a board from the list, searches for solutions, and adds to the solution count, until no boards remain on the list • Join: Occurs when list is empty • One thread prints number of solutions found

  18. Search Tree Node Structure

      /* The 'board' struct contains information about a node in the search
         tree, i.e., a partially filled-in board.  The work pool is a singly
         linked list of 'board' structs. */
      struct board {
          int pieces;           /* # of queens on board         */
          int places[MAX_N];    /* Queen's position in each row */
          struct board *next;   /* Next search tree node        */
      };

  19. Key Code in main Function

      struct board *stack;
      ...
      stack = NULL;
      for (i = 0; i < n; i++) {
          initial = (struct board *) malloc(sizeof(struct board));
          initial->pieces = 1;
          initial->places[0] = i;
          initial->next = stack;
          stack = initial;
      }
      num_solutions = 0;
      search_for_solutions (n, stack, &num_solutions);
      printf ("The %d-queens puzzle has %d solutions\n", n, num_solutions);

  20. Insertion of OpenMP Code

      struct board *stack;
      ...
      stack = NULL;
      for (i = 0; i < n; i++) {
          initial = (struct board *) malloc(sizeof(struct board));
          initial->pieces = 1;
          initial->places[0] = i;
          initial->next = stack;
          stack = initial;
      }
      num_solutions = 0;
      #pragma omp parallel
      search_for_solutions(n, stack, &num_solutions);
      printf ("The %d-queens puzzle has %d solutions\n", n, num_solutions);

  21. Original C Function to Get Work

      void search_for_solutions (int n,
          struct board *stack, int *num_solutions)
      {
          struct board *ptr;
          void search (int, struct board *, int *);

          while (stack != NULL) {
              ptr = stack;
              stack = stack->next;
              search (n, ptr, num_solutions);
              free (ptr);
          }
      }

  22. C/OpenMP Function to Get Work

      void search_for_solutions (int n,
          struct board *stack, int *num_solutions)
      {
          struct board *ptr;
          void search (int, struct board *, int *);

          while (stack != NULL) {
              #pragma omp critical
              { ptr = stack; stack = stack->next; }
              search (n, ptr, num_solutions);
              free (ptr);
          }
      }

  23. Original C Search Function

      void search (int n, struct board *ptr,
          int *num_solutions)
      {
          int i;
          int no_threats (struct board *);

          if (ptr->pieces == n) {
              (*num_solutions)++;
          } else {
              ptr->pieces++;
              for (i = 0; i < n; i++) {
                  ptr->places[ptr->pieces-1] = i;
                  if (no_threats(ptr))
                      search (n, ptr, num_solutions);
              }
              ptr->pieces--;
          }
      }

  24. C/OpenMP Search Function

      void search (int n, struct board *ptr,
          int *num_solutions)
      {
          int i;
          int no_threats (struct board *);

          if (ptr->pieces == n) {
              #pragma omp critical
              (*num_solutions)++;
          } else {
              ptr->pieces++;
              for (i = 0; i < n; i++) {
                  ptr->places[ptr->pieces-1] = i;
                  if (no_threats(ptr))
                      search (n, ptr, num_solutions);
              }
              ptr->pieces--;
          }
      }
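
Both versions of search() declare and call no_threats(), but the slides never show its body. A plausible sketch follows, assuming (as slide 18 suggests) that places[r] holds the column of the queen in row r and that only the most recently placed queen, in row pieces-1, needs to be checked against the earlier ones:

      /* Hypothetical no_threats(): returns 1 if the newest queen (row
         pieces-1) attacks no earlier queen, 0 otherwise.  Queens occupy
         distinct rows by construction, so only columns and diagonals
         need checking. */
      int no_threats (struct board *bp)
      {
          int last = bp->pieces - 1;                 /* row of newest queen */
          for (int r = 0; r < last; r++) {
              int dcol = bp->places[last] - bp->places[r];
              if (dcol == 0)
                  return 0;                          /* same column   */
              if (dcol == last - r || dcol == r - last)
                  return 0;                          /* same diagonal */
          }
          return 1;
      }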

  25. Only One Problem: It Doesn’t Work! • The OpenMP program crashes with a memory access error • Culprit: the variable stack (diagram: each thread's stack pointer and the heap-allocated list)

  26. Problem Site

      int main ()
      {
          struct board *stack;
          ...
          #pragma omp parallel
          search_for_solutions(n, stack, &num_solutions);
          ...
      }

      void search_for_solutions (int n,
          struct board *stack, int *num_solutions)
      {
          ...
          while (stack != NULL) ...

  27. Step 1: Both Threads Point to Top (diagram: Thread 1's stack and Thread 2's stack both point to the first board on the list)

  28. Step 2: Thread 1 Grabs First Element (diagram: Thread 1's ptr takes the first board and its stack advances to the second; Thread 2's stack still points to the first board)

  29. Step 3: Thread 2 Grabs “Next” Element. Error #1: Thread 2 grabs the same element (diagram: both threads' ptr point to the same board)

  30. Step 4: Thread 1 Deletes Element. Error #2: Thread 2's stack pointer dangles (diagram: Thread 2's stack points to the freed board)

  31. Demonstrate Error #2: Thread 1 enters the critical region and reads stack

  32. Demonstrate Error #2: Thread 1 copies stack to ptr

  33. Demonstrate Error #2: Thread 1 advances stack

  34. Demonstrate Error #2: Thread 1 exits the critical region

  35. Demonstrate Error #2: Thread 1 frees ptr; Thread 2's stack now points to an undefined value

  36. Remedy 1: Make stack Static (diagram: a single shared stack variable)

  37. Remedy 1: Make stack Static. Why would this work? (diagram: one shared stack; each thread has its own ptr)

  38. Remedy 1: Make stack Static. Thread 1 enters the critical region

  39. Remedy 1: Make stack Static. Thread 1 copies stack to ptr

  40. Remedy 1: Make stack Static. Thread 1 advances stack

  41. Remedy 1: Make stack Static. Thread 1 exits the critical region

  42. Remedy 1: Make stack Static. Thread 1 frees ptr; no dangling pointer remains
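
The slides illustrate Remedy 1 only with diagrams. A minimal sketch of the code it implies is given below, with stack moved to file scope (static) so that every thread advances the same pointer; the re-check of stack inside the critical section and the NULL test on ptr are defensive additions, since another thread may empty the pool between the while test and the critical section.

      /* Remedy 1 sketch: the work-pool pointer is a file-scope static
         variable shared by all threads, rather than a by-value parameter. */
      static struct board *stack = NULL;

      void search_for_solutions (int n, int *num_solutions)
      {
          struct board *ptr;
          void search (int, struct board *, int *);

          while (stack != NULL) {
              #pragma omp critical
              {
                  ptr = stack;              /* may be NULL if the pool emptied */
                  if (ptr != NULL)
                      stack = stack->next;
              }
              if (ptr == NULL)
                  break;
              search (n, ptr, num_solutions);
              free (ptr);
          }
      }

The deck instead recommends Remedy 2 (indirection), shown on slides 43 through 45, because it keeps the work pool out of a global or static variable.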

  43. Remedy 2: Use Indirection (best choice). Each thread receives &stack, the address of the shared pointer, so the work pool stays encapsulated inside the function calls and there is no global or static variable to be overwritten (diagram: Thread 1 and Thread 2 both hold &stack)

  44. Corrected main Function

      struct board *stack;
      ...
      stack = NULL;
      for (i = 0; i < n; i++) {
          initial = (struct board *) malloc(sizeof(struct board));
          initial->pieces = 1;
          initial->places[0] = i;
          initial->next = stack;
          stack = initial;
      }
      num_solutions = 0;
      #pragma omp parallel
      search_for_solutions(n, &stack, &num_solutions);
      printf ("The %d-queens puzzle has %d solutions\n", n, num_solutions);

  45. Corrected Stack Access Function

      void search_for_solutions (int n,
          struct board **stack, int *num_solutions)
      {
          struct board *ptr;
          void search (int, struct board *, int *);

          while (*stack != NULL) {
              #pragma omp critical
              { ptr = *stack;
                *stack = (*stack)->next; }
              search (n, ptr, num_solutions);
              free (ptr);
          }
      }

  46. References • Rohit Chandra, Leonardo Dagum, Dave Kohr, Dror Maydan, Jeff McDonald, and Ramesh Menon, Parallel Programming in OpenMP, Morgan Kaufmann (2001). • Barbara Chapman, Gabriele Jost, and Ruud van der Pas, Using OpenMP: Portable Shared Memory Parallel Programming, MIT Press (2008). • Michael J. Quinn, Parallel Programming in C with MPI and OpenMP, McGraw-Hill (2004).
