
Sound and Precise Analysis of Parallel Programs through Schedule Specialization

Jingyue Wu, Yang Tang, Gang Hu, Heming Cui, Junfeng Yang. Columbia University.


Presentation Transcript


  1. Sound and Precise Analysis of Parallel Programs through Schedule Specialization. Jingyue Wu, Yang Tang, Gang Hu, Heming Cui, Junfeng Yang. Columbia University

  2. Motivation • Analyzing parallel programs is difficult. [Diagram: precision vs. soundness axes, where soundness = # of analyzed schedules / # of total schedules; the schedules analyzed by dynamic analysis and by static analysis are compared against the total schedule space.]

  3. Schedule Specialization • Precision: Analyze the program over a small set of schedules. • Soundness: Enforce these schedules at runtime. [Diagram: the precision/soundness plot from slide 2, with schedule specialization added; its enforced schedules coincide with its analyzed schedules.]

  4. Enforcing Schedules Using Peregrine • Deterministic multithreading • e.g. DMP (ASPLOS ’09), Kendo (ASPLOS ’09), CoreDet (ASPLOS ’10), Tern (OSDI ’10), Peregrine (SOSP ’11), DTHREADS (SOSP ’11) • Performance overhead • e.g. Kendo: 16%, Tern & Peregrine: 39.1% • Peregrine • Record schedules, and reuse them on a wide range of inputs. • Represent schedules explicitly.

  5. Schedule Specialization • Precision: Analyze the program over a small set of schedules. • Soundness: Enforce these schedules at runtime. [Diagram repeated from slide 3.]

  6. Framework • Extract the control flow and data flow enforced by a set of schedules. [Diagram: a program (a C/C++ program with Pthreads) and a schedule (a total order of synchronizations) feed into schedule specialization, which produces a specialized program with extra def-use chains.]

  7. Outline • Example • Control-Flow Specialization • Data-Flow Specialization • Results • Conclusion

  8. Running Example

int results[p_max];
int global_id = 0;

int main(int argc, char *argv[]) {
  int i;
  int p = atoi(argv[1]);
  for (i = 0; i < p; ++i)
    pthread_create(&child[i], 0, worker, 0);
  for (i = 0; i < p; ++i)
    pthread_join(child[i], 0);
  return 0;
}

void *worker(void *arg) {
  pthread_mutex_lock(&global_id_lock);
  int my_id = global_id++;
  pthread_mutex_unlock(&global_id_lock);
  results[my_id] = compute(my_id);
  return 0;
}

[Diagram: a schedule in which Thread 0 creates Threads 1 and 2; each worker performs lock and unlock; Thread 0 then joins both. Caption: "Race-free?"]
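For reference, a self-contained, compilable version of the running example. The slide elides the declarations of `child` and `p_max` and the body of `compute`; the definitions below (`P_MAX`, `compute` returning `id * id`, and the `run` wrapper replacing `main`) are assumptions for illustration only.

```c
#include <pthread.h>

#define P_MAX 8  /* assumed bound; the slide calls it p_max */

static int results[P_MAX];
static int global_id = 0;
static pthread_mutex_t global_id_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_t child[P_MAX];

/* stand-in for the slide's elided compute() */
static int compute(int id) { return id * id; }

static void *worker(void *arg) {
    (void)arg;
    /* the lock makes each increment atomic, so every thread gets a unique id */
    pthread_mutex_lock(&global_id_lock);
    int my_id = global_id++;
    pthread_mutex_unlock(&global_id_lock);
    /* my_id is unique per thread, so each thread writes a distinct slot */
    results[my_id] = compute(my_id);
    return 0;
}

/* wrapper standing in for main(): create p workers, then join them all */
int run(int p) {
    for (int i = 0; i < p; ++i)
        pthread_create(&child[i], 0, worker, 0);
    for (int i = 0; i < p; ++i)
        pthread_join(child[i], 0);
    return 0;
}
```

Note that whichever order the workers acquire the lock, each ends up with a distinct `my_id` and fills a distinct `results` slot, which is why the slide can ask "Race-free?" with a straight face.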

  9. Control-Flow Specialization [Diagram: main() from slide 8 shown beside its control-flow graph (atoi, i = 0, i < p, ++i, create, join, return); the first create in the schedule pins down the first loop iteration.]

  10. Control-Flow Specialization [Diagram: the unrolling continues; a second copy of the loop header and body is added for the second create in the schedule.]

  11. Control-Flow Specialization [Diagram: the create loop is now fully unrolled for the two creates in the schedule.]

  12. Control-Flow Specialization [Diagram: the join loop is unrolled in the same way, yielding a fully unrolled control-flow graph ending in return.]

  13. Control-Flow Specialized Program

int main(int argc, char *argv[]) {
  int i;
  int p = atoi(argv[1]);
  i = 0;  // i < p == true
  pthread_create(&child[i], 0, worker.clone1, 0);
  ++i;    // i < p == true
  pthread_create(&child[i], 0, worker.clone2, 0);
  ++i;    // i < p == false
  i = 0;  // i < p == true
  pthread_join(child[i], 0);
  ++i;    // i < p == true
  pthread_join(child[i], 0);
  ++i;    // i < p == false
  return 0;
}

[Diagram: the corresponding fully unrolled control-flow graph.]

  14. More Challenges on Control-Flow Specialization • Ambiguity [Diagram: a caller with two call sites, S1 and S2, both reaching the same callee via call and ret edges.] • A schedule has too many synchronizations.
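The ambiguity bullet can be made concrete with a minimal sketch (hypothetical names, not code from the paper): when two call sites reach the same callee, a schedule that records only synchronization events does not say which lock/unlock pair belongs to which site.

```c
#include <pthread.h>

static pthread_mutex_t m = PTHREAD_MUTEX_INITIALIZER;
static int lock_events = 0;

/* The callee contains the synchronization that the schedule records. */
static void callee(void) {
    pthread_mutex_lock(&m);
    ++lock_events;
    pthread_mutex_unlock(&m);
}

/* Two distinct call sites, S1 and S2, reach the same lock operation. */
static int run_both_sites(void) {
    callee();  /* call site S1 */
    callee();  /* call site S2 */
    /* The recorded schedule is just: lock, unlock, lock, unlock.
       By itself it cannot say which pair came from S1 and which
       from S2, so specialization must disambiguate call sites by
       other means (e.g., matching call/ret events). */
    return lock_events;
}
```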

  15. Data-Flow Specialization

int global_id = 0;

void *worker.clone1(void *arg) {
  pthread_mutex_lock(&global_id_lock);
  int my_id = global_id++;
  pthread_mutex_unlock(&global_id_lock);
  results[my_id] = compute(my_id);
  return 0;
}

void *worker.clone2(void *arg) {
  pthread_mutex_lock(&global_id_lock);
  int my_id = global_id++;
  pthread_mutex_unlock(&global_id_lock);
  results[my_id] = compute(my_id);
  return 0;
}

[Diagram: the enforced schedule: global_id = 0; Thread 0 creates Threads 1 and 2; Thread 1 performs lock, my_id = global_id, global_id++, unlock; then Thread 2 performs the same sequence; Thread 0 joins both.]

  16. Data-Flow Specialization [Slide 15 repeated as a build step.]

  17. Data-Flow Specialization [Diagram: constants propagated along the enforced schedule: in Thread 1, my_id = 0 and global_id = 1; Thread 2's values are not yet resolved. The worker code is as on slide 15.]

  18. Data-Flow Specialization [Diagram: propagation continues into Thread 2: my_id = 1 and global_id = 2. The worker code is as on slide 15.]

  19. Data-Flow Specialization

int global_id = 0;

void *worker.clone1(void *arg) {
  pthread_mutex_lock(&global_id_lock);
  global_id = 1;
  pthread_mutex_unlock(&global_id_lock);
  results[0] = compute(0);
  return 0;
}

void *worker.clone2(void *arg) {
  pthread_mutex_lock(&global_id_lock);
  global_id = 2;
  pthread_mutex_unlock(&global_id_lock);
  results[1] = compute(1);
  return 0;
}

[Diagram: the enforced schedule annotated with the propagated values: my_id = 0, global_id = 1 in Thread 1; my_id = 1, global_id = 2 in Thread 2.]
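The fully specialized program on this slide can be rendered as compilable C. The clone names are adjusted to valid identifiers (`worker_clone1`/`worker_clone2`), `compute` and `P_MAX` are the same hypothetical stand-ins as before, and `run_specialized` replaces the unrolled `main` from slide 13. Note one caveat: without Peregrine actually enforcing the schedule, the two clones may acquire the lock in either order, so the final value of `global_id` is order-dependent here; the `results` writes, however, target distinct constant indices either way, which is exactly what the specialization exposes.

```c
#include <pthread.h>

#define P_MAX 8  /* assumed bound (the slide's p_max) */

static int results[P_MAX];
static int global_id = 0;
static pthread_mutex_t global_id_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_t child[2];

/* stand-in for the slide's elided compute() */
static int compute(int id) { return id * id; }

/* worker.clone1 after data-flow specialization:
   my_id is folded to 0, so global_id++ becomes global_id = 1 */
static void *worker_clone1(void *arg) {
    (void)arg;
    pthread_mutex_lock(&global_id_lock);
    global_id = 1;
    pthread_mutex_unlock(&global_id_lock);
    results[0] = compute(0);
    return 0;
}

/* worker.clone2 after data-flow specialization: my_id folded to 1 */
static void *worker_clone2(void *arg) {
    (void)arg;
    pthread_mutex_lock(&global_id_lock);
    global_id = 2;
    pthread_mutex_unlock(&global_id_lock);
    results[1] = compute(1);
    return 0;
}

/* main after control-flow specialization: both loops unrolled for p == 2 */
int run_specialized(void) {
    pthread_create(&child[0], 0, worker_clone1, 0);
    pthread_create(&child[1], 0, worker_clone2, 0);
    pthread_join(child[0], 0);
    pthread_join(child[1], 0);
    return 0;
}
```

Because the indices 0 and 1 are now literal constants, a static race detector can see that the two `results` writes never alias, which is the payoff of the data-flow step.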

  20. More Challenges on Data-Flow Specialization • Must/may alias analysis (e.g., global_id) • Reasoning about integers (e.g., results[0] = compute(0) vs. results[1] = compute(1)) • Many def-use chains

  21. Evaluation • Applications • Static race detector • Alias analyzer • Path slicer • Programs • PBZip2 1.1.5 • aget 0.4.1 • 8 programs in SPLASH2 • 7 programs in PARSEC

  22-25. Static Race Detector [Chart: # of false positives per benchmark, revealed incrementally across these slides.]

  26. Static Race Detector: Harmful Races Detected • 4 in aget • 2 in radix • 1 in fft

  27-29. Precision of Schedule-Aware Alias Analysis [Chart: alias-analysis precision per benchmark, revealed incrementally across these slides.]

  30. Conclusion and Future Work • Designed and implemented schedule specialization framework • Analyzes the program over a small set of schedules • Enforces these schedules at runtime • Built and evaluated three applications • Easy to use • Precise • Future work • More applications • Similar specialization ideas on sequential programs

  31. Related Work • Program analysis for parallel programs • Chord (PLDI ’06), RADAR (PLDI ’08), FastTrack (PLDI ’09) • Slicing • Horgan (PLDI ’90), Bouncer (SOSP ’07), Jhala (PLDI ’05), Weiser (PhD thesis), Zhang (PLDI ’04) • Deterministic multithreading • DMP (ASPLOS ’09), Kendo (ASPLOS ’09), CoreDet (ASPLOS ’10), Tern (OSDI ’10), Peregrine (SOSP ’11), DTHREADS (SOSP ’11) • Program specialization • Consel (POPL ’93), Gluck (ISPL ’95), Jørgensen (POPL ’92), Nirkhe (POPL ’92), Reps (PDSPE ’96)

  32. Backup Slides

  33. Specialization Time

  34. Handling Races • We do not assume data-race freedom. • We could if our only goal is optimization.

  35. Input Coverage • Use runtime verification for the inputs not covered • A small set of schedules can cover a wide range of inputs
