Using Loop Perforation to Dynamically Adapt Application Behavior to Meet Real-Time Deadlines
Presentation Transcript

  1. Using Loop Perforation to Dynamically Adapt Application Behavior to Meet Real-Time Deadlines
     Henry Hoffmann, Sasa Misailovic, Stelios Sidiroglou, Anant Agarwal, and Martin Rinard
     CSAIL, Massachusetts Institute of Technology, Cambridge, MA 02139

  2. Outline
     • Introduction/Motivation
       • Problem
       • Solution: Loop Perforation
     • Loop Perforation
       • Finding Loops to Perforate
       • Controlling Perforation Dynamically
     • Experiments
       • Using Perforation to Adapt to Faults
     • Conclusion

  3. Problem • Program is too slow • Misses real-time deadlines

  4. Solution: Loop Perforation
     Perforate: to make a hole through an object or structure
     • Loop Perforation:
       • Do not execute all iterations
       • Skip some instead
     • Process: profile program, find loops that take the most time, perforate those loops
     • Original loop:
         for (i = 0; i < n; i++) { … }
     • Perforated loop:
         for (i = 0; i < n; i += 2) { … }
     • A Perforated Program:
       • Consumes fewer computational resources
       • Runs faster, consumes less energy, or both
       • Can meet its real-time deadlines!
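The stride change on this slide can be tried on a small reduction. The following is a minimal sketch, not code from the talk: `perforated_mean` and `iterations_executed` are hypothetical names, and the mean is chosen only because it is a computation whose result degrades gracefully when iterations are skipped.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical example: mean of an array computed with a perforated loop.
 * stride == 1 runs every iteration (the original loop);
 * stride == 2 skips every other iteration, as in the slide's perforated loop.
 */
double perforated_mean(const double *x, size_t n, size_t stride)
{
    assert(stride > 0);             /* a zero stride would never terminate */
    if (n == 0) {
        return 0.0;                 /* nothing to average */
    }
    double sum = 0.0;
    size_t visited = 0;
    for (size_t i = 0; i < n; i += stride) {   /* perforation: i += stride */
        sum += x[i];
        visited++;
    }
    return sum / (double)visited;   /* normalize by iterations actually run */
}

/* Number of iterations a loop over 0..n-1 executes with the given stride. */
size_t iterations_executed(size_t n, size_t stride)
{
    assert(stride > 0);
    return (n + stride - 1) / stride;          /* ceil(n / stride) */
}
```

With stride 2 the loop does about half the work, and the returned mean is an approximation whose error depends on the data. That difference is the QoS loss the rest of the talk is concerned with bounding.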

  5. Loop Perforation (cont’d)
     Q: Won’t perforation change the result?
     A: Yes, so we target applications that have a range of acceptable outputs
     [Diagram labels: Maintain Acceptable Quality of Service; Increase Speed; Perforate; Don’t Perforate]

  6. Static vs. Dynamic Perforation
     • Static loop perforation
       • Speeds up an application for some QoS loss
       • Allows applications to be repurposed
         • E.g., a broadcast video encoder can be transitioned to video conferencing
     • Dynamic loop perforation
       • Allows full QoS unless something bad happens
       • When something bad happens, the system adapts to maintain speed
     • Determine which loops to perforate using profiling
     • Our implemented system supports both static and dynamic perforation; this talk focuses on dynamic perforation

  7. Outline
     • Introduction/Motivation
       • Problem
       • Solution: Loop Perforation
     • Loop Perforation
       • Finding Loops to Perforate
       • Controlling Perforation Dynamically
     • Experiments
       • Using Perforation to Adapt to Faults
     • Conclusion

  8. A Perforating Compiler
     • Responsibility of User (provided as input to the perforating compiler)
       • C/C++ Program
       • Representative Inputs
       • QoS Metric & Bound (QoS bound: the maximum acceptable loss of QoS)
     • Perforating Compiler
       • Maximizes speedup for QoS bound
       • Discards loops which cause:
         • Slowdown
         • Unacceptable QoS loss
         • Dynamic errors in Valgrind
     • Result
       • Set of Perforatable Loops
       • Speeds up application given QoS bound
       • Perforation may be dynamic
     [Diagram labels: Find costly loops; Profile Program; Analyze QoS; Perforate; Perforatable Loops]
     This process is discussed in detail in: Misailovic, Sidiroglou, Hoffmann, Rinard. Quality of Service Profiling. To Appear, ICSE 2010
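The three discard rules on this slide amount to a per-loop filter. The sketch below is an illustration under stated assumptions, not the compiler's actual algorithm: the `loop_profile` struct, its fields, and both function names are hypothetical, and the search that maximizes overall speedup across combinations of loops is not shown.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-loop measurements gathered on the training inputs. */
typedef struct {
    double speedup;        /* baseline time / perforated time             */
    double qos_loss;       /* loss under the user's QoS metric, 0 = none  */
    bool   dynamic_error;  /* e.g. a memory error reported by Valgrind    */
} loop_profile;

/* Apply the slide's discard rules to one loop. */
bool is_perforatable(const loop_profile *p, double qos_bound)
{
    if (p->dynamic_error)        return false;  /* dynamic errors        */
    if (p->speedup < 1.0)        return false;  /* slowdown              */
    if (p->qos_loss > qos_bound) return false;  /* unacceptable QoS loss */
    return true;
}

/*
 * Write the indices of the surviving loops into out (capacity n)
 * and return how many survived.
 */
size_t select_perforatable(const loop_profile *loops, size_t n,
                           double qos_bound, size_t *out)
{
    assert(qos_bound >= 0.0);
    size_t kept = 0;
    for (size_t i = 0; i < n; i++) {
        if (is_perforatable(&loops[i], qos_bound)) {
            out[kept++] = i;
        }
    }
    return kept;
}
```

Each loop is judged in isolation here. Perforating several loops together can interact, which is one reason a real system has to re-measure QoS on the combined result.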

  9. Use PARSEC Benchmarks to Test Approach
     • PARSEC Benchmarks* represent emerging workloads
     • We pick seven benchmark applications for which we can define a QoS metric
       • x264 (H.264 video encoding)
       • bodytrack (human movement tracking)
       • swaptions (financial analysis)
       • ferret (content-based similarity search)
       • canneal (engineering: circuit place & route)
       • blackscholes (financial analysis)
       • streamcluster (online approx. of k-means)
     • We augment the benchmark suite with additional data sets and divide the inputs into
       • Training (about 25% of inputs)
       • Production (remaining 75% of inputs)
     *http://parsec.cs.princeton.edu/

  10. Performance/QoS Tradeoffs for PARSEC Benchmarks

  11. Dynamically Controlling Perforation
      • Application registers a heartbeat using Application Heartbeats API*
      • Runtime monitors heartbeat
      • Heartbeat too slow?
        • Increase perforation to trade QoS for increased performance
      • Heartbeat too fast?
        • Decrease perforation to reclaim QoS
      [Diagram labels: Application (Loop 1, Loop 2, Loop i); Heartbeat API; Runtime Monitor; Perforation Selection]
      *Hoffmann, Eastep, Santambrogio, Miller, Agarwal. Application Heartbeats for Software Performance and Health. PPoPP 2010
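The too-slow/too-fast rule can be sketched as a small feedback step. This is a sketch under assumptions and does not use the real Application Heartbeats API: the `perforation_controller` struct and `controller_update` are hypothetical, the observed heart rate is simply passed in by the caller, and moving one perforation level per update is an assumed policy. The `low` and `high` thresholds are parameters; the usage note borrows the 0.95 to 1.1 band that the evaluation slide states as its goal.

```c
#include <assert.h>

/* Hypothetical controller state; not part of the Heartbeats API. */
typedef struct {
    int    level;        /* current perforation level, 0 = no perforation */
    int    max_level;    /* most aggressive perforation allowed           */
    double target_rate;  /* desired heartbeats per second                 */
    double low;          /* below low * target_rate counts as too slow    */
    double high;         /* above high * target_rate counts as too fast   */
} perforation_controller;

/*
 * One control step: compare the observed heart rate to the target and
 * move the perforation level by at most one. Returns the new level.
 */
int controller_update(perforation_controller *c, double observed_rate)
{
    assert(c->target_rate > 0.0 && c->max_level >= 0);
    double ratio = observed_rate / c->target_rate;
    if (ratio < c->low && c->level < c->max_level) {
        c->level++;      /* too slow: trade QoS for performance */
    } else if (ratio > c->high && c->level > 0) {
        c->level--;      /* too fast: reclaim QoS               */
    }
    return c->level;
}
```

For example, a controller initialized as `{0, 3, 30.0, 0.95, 1.10}` raises its level when the observed rate falls to 20 beats per second and lowers it again once the rate exceeds 33. How a level maps to concrete loop strides is left to the application.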

  12. Outline
      • Introduction/Motivation
        • Problem
        • Solution: Loop Perforation
      • Loop Perforation
        • Finding Loops to Perforate
        • Controlling Perforation Dynamically
      • Experiments
        • Using Perforation to Adapt to Faults
      • Conclusion

  13. Evaluation Methodology
      • Two applications (from PARSEC benchmark suite):
        • x264 (media application that performs H.264 video encoding)
        • bodytrack (computer vision application that tracks a body through a scene)
      • Two changing environments:
        • Core Failure: during execution, 3 of 8 cores fail
        • Frequency Scaling: during execution, clock frequency rises and falls
      • For each app and scenario:
        • Goal: keep performance within 0.95x to 1.1x that of the system with no failures
        • Measure:
          • Baseline performance (no failure)
          • Performance with failure and no perforation
          • Performance with failure and dynamic perforation

  14. x264 Core Loss Experiment Lose 3 of 8 cores

  15. bodytrack Core Loss Experiment Lose 3 of 8 cores

  16. bodytrack Results (Core Failure) • Maintains track on head, chest, and legs despite loss of 37.5% of compute

  17. x264 Frequency Scaling Experiment Frequency Rises (1.6 GHz → 2.53 GHz) Frequency Drops (2.53 GHz → 1.6 GHz)

  18. bodytrack Frequency Scaling Experiment Frequency Rises (1.6 GHz → 2.53 GHz) Frequency Drops (2.53 GHz → 1.6 GHz)

  19. bodytrack Results (Frequency Scaling) • Perforation allows app to maintain track while frequency is low. • When frequency rises again, high-quality track is reestablished.

  20. Conclusion
      • Presented loop perforation
        • Speed up programs by making performance/QoS tradeoffs
        • Showed as much as 2x speedup for 5% degradation in QoS
      • Presented dynamic loop perforation
        • Allow system to detect performance loss and respond by perforating loops
        • Maintain performance in changing environment
        • Can respond to any environmental change that affects performance
      More detail on dynamic perforation available in: Hoffmann, Misailovic, Sidiroglou, Agarwal, Rinard. Using Code Perforation to Improve Performance, Reduce Energy Consumption, and Respond to Failures. MIT-CSAIL-TR-2009-042. August, 2009.

  21. Backup

  22. Perforatable Loops in PARSEC Benchmarks Number of loops

  23. x264, Training

  24. x264, Production

  25. x264
      Uncompressed Video Frame Sequence → Encoder → Compressed Video Stream

  26. Motion Estimation
      [Diagram labels: Reference Frame; Current Frame; ?]
      All Perforated Loops Are In Motion Estimation Computation

  27. x264 Loop Nest
      Sum of Hadamard transformed differences loop nest (computes match metric between cur and ref blocks)
        short temp[4][4];
        for (i = 0; i < h; i += 4) {
          for (j = 0; j < w; j += 4) {
            element_wise_subtract(temp, cur, ref, cs, rs);
            hadamard_transform(temp, 4);
            value += sum_abs_matrix(temp, 4);
          }
          cur += 4*cs; ref += 4*rs;
        }
        return value;

  28. Perforated x264 Loop Nest
      • Perforation Effect
        • New block match metric
        • Uses block with best match (as measured by metric)
        • New metric works fine
      Sum of Hadamard transformed differences loop nest (computes match metric between cur and ref blocks)
        short temp[4][4];
        for (i = 0; i < h; i += 8) {
          for (j = 0; j < w; j += 8) {
            element_wise_subtract(temp, cur, ref, cs, rs);
            hadamard_transform(temp, 4);
            value += sum_abs_matrix(temp, 4);
          }
          cur += 4*cs; ref += 4*rs;
        }
        return value;
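A self-contained version of this loop nest makes the effect of the stride change easy to try. The code below is a sketch, not x264's implementation: `hadamard4x4` and `satd` are hypothetical names, pixels are assumed to be 8-bit, and blocks are addressed directly by (i, j) rather than by the pointer bumping on the slide, so a `step` of 4 visits every 4x4 block and a `step` of 8 visits only every other block in each dimension.

```c
#include <assert.h>
#include <stdlib.h>

/* In-place, unnormalized 4x4 Hadamard transform: rows first, then columns. */
static void hadamard4x4(int t[4][4])
{
    for (int r = 0; r < 4; r++) {
        int a = t[r][0] + t[r][1], b = t[r][0] - t[r][1];
        int c = t[r][2] + t[r][3], d = t[r][2] - t[r][3];
        t[r][0] = a + c; t[r][1] = b + d; t[r][2] = a - c; t[r][3] = b - d;
    }
    for (int k = 0; k < 4; k++) {
        int a = t[0][k] + t[1][k], b = t[0][k] - t[1][k];
        int c = t[2][k] + t[3][k], d = t[2][k] - t[3][k];
        t[0][k] = a + c; t[1][k] = b + d; t[2][k] = a - c; t[3][k] = b - d;
    }
}

/*
 * Sum of absolute Hadamard-transformed differences between cur and ref.
 * cs and rs are the row strides of cur and ref; w and h are the region size.
 * step == 4 is the original loop nest; step == 8 is the perforated one.
 */
int satd(const unsigned char *cur, const unsigned char *ref,
         int cs, int rs, int w, int h, int step)
{
    assert(step > 0 && step % 4 == 0 && w % 4 == 0 && h % 4 == 0);
    int value = 0;
    for (int i = 0; i < h; i += step) {
        for (int j = 0; j < w; j += step) {
            int temp[4][4];
            /* element-wise difference of the 4x4 blocks at (i, j) */
            for (int r = 0; r < 4; r++)
                for (int c = 0; c < 4; c++)
                    temp[r][c] = cur[(i + r) * cs + j + c]
                               - ref[(i + r) * rs + j + c];
            hadamard4x4(temp);
            for (int r = 0; r < 4; r++)
                for (int c = 0; c < 4; c++)
                    value += abs(temp[r][c]);
        }
    }
    return value;
}
```

Because every block contributes a non-negative amount, the perforated value never exceeds the full one. It is a different metric rather than a scaled copy of the original, which matches the slide's point that motion estimation simply picks the best match under the new metric.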

  29. Why Not Just Skip Motion Estimation?
      Runs 6.8 times faster, but encoded video is 3.55 times bigger!

  30. bodytrack Training

  31. bodytrack Production

  32. bodytrack • Particle method • Annealing layers • Dispersed particles • Compute with particles

  33. bodytrack • Next annealing layer • Particle dispersion affected by previous layer • Continue until done with annealing layers

  34. bodytrack Loop
        for (i = 0; i < layers; i++) {
          disperse particles for layer
          do particle computation
        }

  35. Perforated bodytrack Loop
      • Perforation Effect
        • Perform fewer annealing layers
        • Perform less work, finish faster
        for (i = 0; i < layers; i += 2) {
          disperse particles for layer
          do particle computation
        }

  36. Other Perforated Loops in bodytrack
      • Concepts
        • bodytrack maintains probabilistic model of where body parts are in previous frame
        • Reads image data from 4 cameras
        • Performs image processing to get information about where it thinks body is in current frame
        • Computes probabilistic model for current frame
      • Many perforated loops in error calculations
        • Between probabilistic model from previous frame
        • And image data from current frame
        • Used to obtain probabilistic model for current frame

  37. Perforated Image Quality Panning camera