
Hardware-based Job Queue Management for Manycore Architectures and OpenMP Environments

Junghee Lee, Chrysostomos Nicopoulos, Yongjae Lee, Hyung Gyu Lee and Jongman Kim. Presented by Junghee Lee.


Presentation Transcript


  1. Hardware-based Job Queue Management for Manycore Architectures and OpenMP Environments • Junghee Lee, Chrysostomos Nicopoulos, Yongjae Lee, Hyung Gyu Lee and Jongman Kim • Presented by Junghee Lee

  2. Introduction • Manycore systems • Number of cores is increasing • Challenges in scalability • Memory • Power consumption • Cache coherence protocol • Load balancing

  3. Contents • Introduction • Background • Programming models • Motivation • IsoNet • Fault-tolerance • Evaluation • Conclusion

  4. Programming Models • Parallel programming models • MPI • OpenMP • Fine-grained parallelism • Emerging applications: Recognition, Mining and Synthesis • Each computation kernel has a very short execution time but abundant parallelism • Excessive overhead in multithreading

  5. Job Queuing • Creates jobs instead of threads • One thread per core is created • Thread: a set of instructions and states of execution • Job: a set of data that is processed by a thread • Job queue • Manages the list of jobs • Maintains load balance • [Figure: diagram with Job, Thread and CPU blocks]
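The job-queuing model on slide 5 (a fixed set of threads pulling jobs from a shared queue) can be sketched as follows. This is a minimal software illustration, not the hardware mechanism the talk proposes; `run_jobs`, `work` and `num_workers` are hypothetical names, and Python threads are used only to show the structure, not to obtain real parallel speedup.

```python
import os
import queue
import threading


def run_jobs(jobs, work, num_workers=None):
    """Process every job with a fixed pool of threads (one per core by default)."""
    num_workers = num_workers or os.cpu_count() or 1
    job_queue = queue.Queue()
    for job in jobs:
        job_queue.put(job)          # a job is just the data to be processed

    results = []
    results_lock = threading.Lock()

    def worker():
        # Each thread repeatedly takes the next job until the queue is empty.
        while True:
            try:
                job = job_queue.get_nowait()
            except queue.Empty:
                return
            out = work(job)
            with results_lock:
                results.append(out)

    threads = [threading.Thread(target=worker) for _ in range(num_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

Because all threads share one queue, a core that finishes early simply takes the next job, which is how the queue maintains load balance. The shared queue is also the point of contention that the following slide discusses.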

  6. Conflicts in Job Queue • Chance of conflicts increases as: • The number of cores increases • The time taken to update the job queue increases • The job queue is accessed more frequently (job is short) • Previous approaches • Distributed queues • Load balance is maintained by job-stealing • The chance of conflicts in one local queue is decreased • Hardware implementation • Time spent on updating the queue is reduced
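The distributed-queue approach with job-stealing mentioned on slide 6 can be illustrated with a small deterministic simulation. This is a sketch under assumptions: the round-robin schedule, the "steal from the most loaded queue" policy and the name `simulate_job_stealing` are all hypothetical choices made for illustration, not details from the talk.

```python
from collections import deque


def simulate_job_stealing(local_queues):
    """Round-robin simulation; returns (jobs done per core, number of steals)."""
    queues = [deque(q) for q in local_queues]
    done = [0] * len(queues)
    steals = 0
    while any(queues):
        for core, own in enumerate(queues):
            if own:
                own.pop()                    # take a job from the local queue
                done[core] += 1
                continue
            victim = max(queues, key=len)    # assumed policy: most loaded queue
            if victim:
                victim.popleft()             # steal from the other end
                done[core] += 1
                steals += 1
    return done, steals
```

With all eight jobs initially on core 0 of four cores, the work ends up evenly spread, but six of the eight jobs are obtained by stealing. Each steal touches another core's queue, which is where conflicts arise as the core count grows.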

  7. Profile of SMVM • [Figure: ratio of execution time (0 to 1.0) versus number of cores (4 to 256); legend: Conflicts, Stealing job, Processing job]

  8. Objectives • Requirements of load balancer • Scalability: conflict-free • Fault-tolerance • The probability of faults increases exponentially as technology scales • Contributions of this paper • Lightweight micro-network for load balancing • Scalable even with more than a thousand cores • Comprehensive fault-tolerance support

  9. Contents • Introduction • Background • IsoNet • Architecture • Implementation • Fault-tolerance • Evaluation • Conclusion

  10. System View • [Figure: grid of CPUs, each paired with blocks labeled I and R]

  11. Microarchitecture of IsoNet Node • [Figure: block diagram with Job Count inputs, MUXes, Max Selector, Min Selector, comparators (Comp), Switch, DEMUX and Dual Clock Stack]

  12. How It Works • [Figure: grid of nodes annotated with values 0, 1 and 2] • Tree-based routing: for fault-tolerance
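The components named on slide 11 (job counts, a max selector, a min selector and a switch) suggest a balancing step that moves work from the most loaded node to the least loaded one. The sketch below is a software model under assumptions: moving one job per step and stopping when the gap is at most one job are hypothetical policies, and `balance_step` and `balance` are illustrative names, not part of IsoNet.

```python
def balance_step(job_counts):
    """Move one job from the fullest node to the emptiest; return True if moved."""
    indices = range(len(job_counts))
    src = max(indices, key=job_counts.__getitem__)   # role of the max selector
    dst = min(indices, key=job_counts.__getitem__)   # role of the min selector
    if job_counts[src] - job_counts[dst] <= 1:
        return False                                 # already balanced
    job_counts[src] -= 1                             # role of the switch:
    job_counts[dst] += 1                             # transfer one job
    return True


def balance(job_counts):
    """Repeat balancing steps on a copy; return (final counts, steps taken)."""
    counts = list(job_counts)
    steps = 0
    while balance_step(counts):
        steps += 1
    return counts, steps
```

For example, `balance([4, 0, 1, 3])` returns `([2, 2, 2, 2], 3)`: three single-job transfers equalize the four nodes without any node contending for a shared queue.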

  13. Single Cycle Implementation • Estimated critical path delay • 11.38 ns (87.8 MHz) • By Elmore delay model • Single cycle implementation offers low hardware cost • [Figure: path from Src node to Dest node through switches (Swt), leaf, intermediate (Int.) and root nodes]
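The clock frequency quoted on slide 13 follows directly from the critical path delay, since a single-cycle design must fit the whole path in one clock period. The symbols $f_{\max}$ and $t_{\text{crit}}$ are introduced here for illustration:

```latex
f_{\max} = \frac{1}{t_{\text{crit}}}
         = \frac{1}{11.38 \times 10^{-9}\,\text{s}}
         \approx 87.87 \times 10^{6}\,\text{Hz}
         = 87.87\,\text{MHz}
```

The slide's figure of 87.8 MHz is this value truncated to one decimal place.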

  14. Hardware Cost Estimation • 674.50 × 240 × 4 = 647.52 K, about 0.046% of 1.4 B (NVIDIA GTX285)
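The arithmetic on slide 14 can be checked with a few lines. The slide does not label its operands, so the parameter names below (`per_node`, `nodes`, `factor`, `chip_total`) are assumptions about what each number represents; only the numeric values come from the slide.

```python
def isonet_cost_fraction(per_node=674.50, nodes=240, factor=4, chip_total=1.4e9):
    """Return (total cost, fraction of the chip) for the slide's estimate."""
    total = per_node * nodes * factor    # 674.50 * 240 * 4
    return total, total / chip_total     # compared against 1.4 B


total, fraction = isonet_cost_fraction()
print(f"total = {total / 1e3:.2f} K, share = {fraction * 100:.3f}%")
```

The total is 647,520 (647.52 K), and the share of 1.4 B is roughly 0.046%, which agrees with the slide.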

  15. Contents • Introduction • Background • IsoNet • Fault-tolerance • Transparent mode • Reconfiguration mode • Evaluation • Conclusion

  16. Supporting Fault-Tolerance • Transparent mode • For faulty CPUs • Bypass the corresponding IsoNet node • Reconfiguration mode • For faulty IsoNet node • Operation • When a fault is detected, all IsoNet nodes go into the reconfiguration mode • Reconfigure the topology of IsoNet so that the faulty node is excluded • Assign a new root node if the root node fails
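The reconfiguration mode on slide 16 (exclude the faulty node, assign a new root if the root failed) can be sketched as rebuilding a spanning tree over the healthy nodes. Everything specific here is an assumption made for illustration: the 2D mesh layout, the breadth-first construction, the "smallest coordinate" rule for picking a replacement root, and the name `rebuild_tree`. The talk does not state how IsoNet performs these steps.

```python
from collections import deque


def rebuild_tree(width, height, faulty, root):
    """Return (root, parent map) of a BFS tree over healthy nodes of a mesh."""
    all_nodes = {(x, y) for x in range(width) for y in range(height)}
    healthy = all_nodes - set(faulty)
    if not healthy:
        raise ValueError("no healthy nodes left")
    if root not in healthy:
        root = min(healthy)              # hypothetical replacement-root policy
    parent = {root: None}
    frontier = deque([root])
    while frontier:
        x, y = frontier.popleft()
        for nxt in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
            # Faulty nodes are never in `healthy`, so the tree routes around them.
            if nxt in healthy and nxt not in parent:
                parent[nxt] = (x, y)
                frontier.append(nxt)
    return root, parent
```

For a 3×3 mesh whose center node `(1, 1)` was the root and has failed, `rebuild_tree(3, 3, {(1, 1)}, (1, 1))` picks `(0, 0)` as the new root and returns a tree covering the remaining eight nodes. Healthy nodes that are cut off from the root by faults would be missing from the parent map.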

  17. Reconfiguration • [Figure: grid of nodes annotated with values 0 to 3; legend: Root Node Candidate]

  18. Contents • Introduction • Background • IsoNet • Fault-tolerance • Evaluation • Experimental setup • Results • Conclusion

  19. Experimental Setup • Simulation framework • Wind River’s Simics full-system simulator • CMP with 4 to 64 x86-compatible cores • Fedora 12 with kernel 2.6.33 • Benchmarks from recognition, mining and synthesis applications • GS: Gauss-Seidel • MMM: Dense Matrix-Matrix Multiply • SVA: Scaled Vector Addition • MVM: Dense Matrix Vector Multiply • SMVM: Sparse Matrix Vector Multiply

  20. Results • [Figure: two panels, MMM (6,473 instructions) and SMVM (2,872 instructions); axes: execution time (10^7 cycles) and speedup versus number of cores (4 to 64); legend: Job stealing, Carbon, IsoNet, IsoNet speedup, Carbon speedup]

  21. Beyond a Hundred Cores • MMM (6,473 instructions) • [Figure: relative execution time (0 to 1.0) versus number of cores (4 to 1024); legend: Carbon, IsoNet]

  22. Profile of IsoNet • [Figure: ratio of execution time (0 to 1.0) versus number of cores (4 to 64); legend: Conflicts, Stealing job, Processing job]

  23. Conclusion • Scalability is one of the key challenges in the manycore domain • Scalability in load balancing is critical to utilizing a large number of processing elements • This paper proposes a novel hardware-based dynamic load distributor and balancer, called IsoNet • IsoNet also provides comprehensive fault-tolerance support • Experimental results in a full-system simulation with real applications demonstrate that IsoNet scales better than alternative techniques

  24. Questions? • Contact info • Junghee Lee • junghee.lee@gatech.edu • Electrical and Computer Engineering, Georgia Institute of Technology

  25. Thank you!
