
Concurrency Idea


Presentation Transcript


  1. Concurrency Idea

  2. Concurrency Idea. Challenge: print the primes from 1 to 10^10. Given: a ten-processor multiprocessor, one thread per processor. Goal: get a ten-fold speedup (or close).

  3. Load Balancing. Split the work evenly: each thread tests a range of 10^9 numbers. (Figure: P0 takes 1 to 10^9, P1 takes 10^9+1 to 2·10^9, …, P9 takes the last block up to 10^10.)

  4. Procedure for Thread i
     void primePrint() {
       int i = ThreadID.get(); // IDs in {0..9}
       for (long j = i*10^9 + 1; j < (i+1)*10^9; j++) {
         if (isPrime(j)) print(j);
       }
     }
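The slide code is pseudocode: ThreadID.get(), isPrime, and print are assumed helpers, and 10^9 is mathematical notation rather than Java. A minimal runnable sketch of the same static split, scaled down to a small bound so it finishes quickly (the class name, BOUND, and the naive isPrime are mine, not from the slides):

     import java.util.ArrayList;
     import java.util.List;

     // Runnable sketch of slide 4's static split. BOUND stands in for 10^10 and
     // the loop index id plays the role of ThreadID.get().
     public class StaticSplitPrimes {
         static final long BOUND = 10_000;
         static final int THREADS = 10;             // one thread per processor
         static final long RANGE = BOUND / THREADS; // each thread tests an equal block

         static boolean isPrime(long n) {           // naive trial division, for illustration
             if (n < 2) return false;
             for (long d = 2; d * d <= n; d++)
                 if (n % d == 0) return false;
             return true;
         }

         public static void main(String[] args) throws InterruptedException {
             List<Thread> threads = new ArrayList<>();
             for (int id = 0; id < THREADS; id++) {
                 final int i = id;
                 Thread t = new Thread(() -> {
                     for (long j = i * RANGE + 1; j <= (i + 1) * RANGE; j++)
                         if (isPrime(j)) System.out.println(j);
                 });
                 threads.add(t);
                 t.start();
             }
             for (Thread t : threads) t.join();
         }
     }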

  5. Issues. Higher ranges have fewer primes, yet larger numbers are harder to test. Thread workloads are uneven and hard to predict.

  6. Issues. Higher ranges have fewer primes, yet larger numbers are harder to test. Thread workloads are uneven and hard to predict. We need dynamic load balancing; the static split is rejected.

  7. Shared Counter. Each thread takes a number from a shared counter. (Figure: threads draw the successive values 17, 18, 19.)

  8. Procedure for Thread i
     Counter counter = new Counter(1);
     void primePrint() {
       long j = 0;
       while (j < 10^10) {
         j = counter.getAndIncrement();
         if (isPrime(j)) print(j);
       }
     }

  9. Procedure for Thread i
     Counter counter = new Counter(1); // shared counter object
     void primePrint() {
       long j = 0;
       while (j < 10^10) {
         j = counter.getAndIncrement();
         if (isPrime(j)) print(j);
       }
     }
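As with slide 4, a runnable sketch of the shared-counter loop under stated assumptions: java.util.concurrent.atomic.AtomicLong stands in for the slides' Counter class (its getAndIncrement is already atomic), and BOUND, isPrime, and the class name are mine.

     import java.util.concurrent.atomic.AtomicLong;

     // Sketch of slides 8-9: all ten threads pull numbers from one shared counter.
     public class SharedCounterPrimes {
         static final long BOUND = 10_000;                    // stands in for 10^10
         static final AtomicLong counter = new AtomicLong(1); // the shared counter object

         static void primePrint() {
             long j = 0;
             while (j < BOUND) {
                 j = counter.getAndIncrement(); // take the next untested number
                 if (isPrime(j)) System.out.println(j);
             }
         }

         static boolean isPrime(long n) {       // naive trial division, for illustration
             if (n < 2) return false;
             for (long d = 2; d * d <= n; d++)
                 if (n % d == 0) return false;
             return true;
         }

         public static void main(String[] args) throws InterruptedException {
             Thread[] threads = new Thread[10];
             for (int i = 0; i < threads.length; i++)
                 (threads[i] = new Thread(SharedCounterPrimes::primePrint)).start();
             for (Thread t : threads) t.join();
         }
     }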

  10. Where Things Reside. (Figure: each processor runs the primePrint code from its own cache, with its own local variables; the single shared counter lives in shared memory, reached over the bus.)

  11. Procedure for Thread i
     Counter counter = new Counter(1);
     void primePrint() {
       long j = 0;
       while (j < 10^10) {   // stop when every value has been taken
         j = counter.getAndIncrement();
         if (isPrime(j)) print(j);
       }
     }

  12. Procedure for Thread i
     Counter counter = new Counter(1);
     void primePrint() {
       long j = 0;
       while (j < 10^10) {
         j = counter.getAndIncrement(); // increment & return each new value
         if (isPrime(j)) print(j);
       }
     }

  13. Counter Implementation
     public class Counter {
       private long value;
       public long getAndIncrement() {
         return value++;
       }
     }

  14. Counter Implementation
     public class Counter {
       private long value;
       public long getAndIncrement() {
         return value++;
       }
     }
     OK for a single thread, but not for concurrent threads.
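A small experiment (mine, not from the slides) that makes the problem visible: two threads hammer the unsynchronized counter, and the final total usually comes up short because increments are lost.

     // Demonstration that the unsynchronized Counter loses increments under contention.
     public class BrokenCounterDemo {
         static class Counter {
             private long value;
             long getAndIncrement() { return value++; } // read + add + write, not atomic
         }

         public static void main(String[] args) throws InterruptedException {
             Counter counter = new Counter();
             Runnable work = () -> {
                 for (int i = 0; i < 100_000; i++) counter.getAndIncrement();
             };
             Thread a = new Thread(work), b = new Thread(work);
             a.start(); b.start();
             a.join();  b.join();
             // Expected 200000; typically prints a smaller number.
             System.out.println("final value = " + counter.value);
         }
     }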

  15. What It Means
     public class Counter {
       private long value;
       public long getAndIncrement() {
         return value++;
       }
     }

  16. What It Means
     public class Counter {
       private long value;
       public long getAndIncrement() {
         return value++;
       }
     }
     value++ is really three steps:
       temp = value;
       value = temp + 1;
       return temp;

  17. Not so good… (Timeline figure: the counter's value goes 1, 2, 3, and then back to 2. One thread reads 1 and writes 2; a second reads 2 and writes 3; a delayed thread that had also read 1 then writes 2, so an increment is lost.)

  18. Is this problem inherent? (Figure: in the bad interleaving, both threads read before either one writes.) If we could only glue reads and writes together…
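One standard way to do that gluing, shown here as a sketch rather than as the slides' solution, is a compare-and-set retry loop using java.util.concurrent.atomic.AtomicLong.compareAndSet: the write only takes effect if the value is still the one that was read.

     import java.util.concurrent.atomic.AtomicLong;

     // Read-modify-write built from compare-and-set: retry until no other
     // thread slipped in between our read and our write.
     public class CasCounter {
         private final AtomicLong value = new AtomicLong(1);

         public long getAndIncrement() {
             while (true) {
                 long temp = value.get();                    // read
                 if (value.compareAndSet(temp, temp + 1))    // write only if unchanged
                     return temp;                            // the pair took effect atomically
             }
         }
     }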

  19. Challenge
     public class Counter {
       private long value;
       public long getAndIncrement() {
         temp = value;
         value = temp + 1;
         return temp;
       }
     }

  20. Challenge
     public class Counter {
       private long value;
       public long getAndIncrement() {
         temp = value;
         value = temp + 1;
         return temp;
       }
     }
     Make these steps atomic (indivisible).

  21. Hardware Solution
     public class Counter {
       private long value;
       public long getAndIncrement() {
         temp = value;
         value = temp + 1;
         return temp;
       }
     }
     ReadModifyWrite() instruction.
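In Java, the hardware read-modify-write operations are exposed through java.util.concurrent.atomic; for example, AtomicLong.getAndIncrement() performs the read, the add, and the write as one indivisible step, so the counter can be written as the sketch below (the class name is mine).

     import java.util.concurrent.atomic.AtomicLong;

     // Counter backed by an atomic read-modify-write primitive.
     public class AtomicCounter {
         private final AtomicLong value = new AtomicLong(1);

         public long getAndIncrement() {
             return value.getAndIncrement(); // one indivisible read-modify-write
         }
     }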

  22. An Aside: Java™
     public class Counter {
       private long value;
       public long getAndIncrement() {
         long temp;
         synchronized (this) {
           temp = value;
           value = temp + 1;
         }
         return temp;
       }
     }

  23. An Aside: Java™
     public class Counter {
       private long value;
       public long getAndIncrement() {
         long temp;
         synchronized (this) {   // synchronized block
           temp = value;
           value = temp + 1;
         }
         return temp;
       }
     }

  24. An Aside: Java™
     public class Counter {
       private long value;
       public long getAndIncrement() {
         long temp;
         synchronized (this) {   // mutual exclusion
           temp = value;
           value = temp + 1;
         }
         return temp;
       }
     }
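Equivalently (a standard Java idiom, not shown on the slides), the whole method can be declared synchronized, which acquires the lock on this for the duration of the call:

     // Same counter with a synchronized method; the lock on `this` is held from
     // entry to return, so the read and the write cannot be interleaved with
     // another thread's.
     public class SynchronizedCounter {
         private long value;

         public synchronized long getAndIncrement() {
             long temp = value;
             value = temp + 1;
             return temp;
         }
     }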

  25. Why do we care? We want as much of the code as possible to execute concurrently (in parallel). A larger sequential part implies reduced performance. Amdahl's law: this relation is not linear…

  26. Amdahl's Law. Speedup = (execution time on 1 processor) / (execution time on n processors): the speedup of a computation given n CPUs instead of 1.

  27. Amdahl's Law. Speedup = 1 / ((1 - p) + p/n)

  28. Amdahl's Law. p is the parallel fraction of the computation.

  29. Amdahl's Law. 1 - p is the sequential fraction.

  30. Amdahl's Law. n is the number of processors.
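The formula is easy to sanity-check in code; the following sketch (method and class names are mine) reproduces the numbers on the example slides that follow.

     // Amdahl's law: p = parallel fraction, n = number of processors.
     public class Amdahl {
         static double speedup(double p, int n) {
             return 1.0 / ((1.0 - p) + p / n);
         }

         public static void main(String[] args) {
             System.out.printf("60%% parallel: %.2f%n", speedup(0.60, 10)); // ~2.17
             System.out.printf("80%% parallel: %.2f%n", speedup(0.80, 10)); // ~3.57
             System.out.printf("90%% parallel: %.2f%n", speedup(0.90, 10)); // ~5.26
             System.out.printf("99%% parallel: %.2f%n", speedup(0.99, 10)); // ~9.17
         }
     }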

  31. Example. Ten processors; 60% concurrent, 40% sequential. How close to 10-fold speedup?

  32. Example. Ten processors; 60% concurrent, 40% sequential. Speedup = 1 / (0.4 + 0.6/10) = 2.17

  33. Example. Ten processors; 80% concurrent, 20% sequential. How close to 10-fold speedup?

  34. Example. Ten processors; 80% concurrent, 20% sequential. Speedup = 1 / (0.2 + 0.8/10) = 3.57

  35. Example. Ten processors; 90% concurrent, 10% sequential. How close to 10-fold speedup?

  36. Example. Ten processors; 90% concurrent, 10% sequential. Speedup = 1 / (0.1 + 0.9/10) = 5.26

  37. Example. Ten processors; 99% concurrent, 1% sequential. How close to 10-fold speedup?

  38. Example. Ten processors; 99% concurrent, 1% sequential. Speedup = 1 / (0.01 + 0.99/10) = 9.17

  39. Back to the Real World: multicore scaling. (Figure: measured speedup of user code on multicore hardware is only about 1.8x, 2x, 2.9x.) Adding cores does not help if we are not reducing the sequential % of the code.

  40. Fine-grained parallelism has a huge performance benefit; coarse-grained access to shared data is the reason we get only 2.9x speedup. (Figure: many cores accessing shared data structures, fine-grained vs. coarse-grained; in both cases roughly 25% of the work is shared and 75% unshared.)

  41. Multiprocessor Programming. This is what this course is about: the % of the code that is not easy to make concurrent, yet may have a large impact on the overall speedup.
