
An Analytical Model to Exploit Memory Task Scheduling






Presentation Transcript


  1. An Analytical Model to Exploit Memory Task Scheduling Hsiang-Yun Cheng, Jian Li, and Chia-Lin Yang Dept. of Computer Science & Information Engineering National Taiwan University IBM Austin Research Laboratory

  2. Motivation • Off-chip bandwidth on CMPs is a precious resource • If too many cores execute memory operations simultaneously • Bandwidth contention ↑ → memory access latency ↑

  3. Objective • Software task scheduling to reduce bandwidth contention and improve system performance • Utilize stream programming properties to decouple threads into memory and compute tasks • Avoid too many concurrent memory tasks • Challenge: how many concurrent memory tasks should be allowed to maximize system performance?

  4. Stream Programming Style • Decouple computation and memory access • Gather → Compute → Scatter • Example
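The example on this slide was a figure; a minimal sketch of the gather → compute → scatter pattern it describes (the functions and data below are illustrative, not from the slides):

```python
def gather(src, indices):
    """Memory task: fetch the needed elements into a local (on-chip-like) buffer."""
    return [src[i] for i in indices]

def compute(buf):
    """Compute task: touches only the gathered buffer, so no further fetches."""
    return [x * x + 1 for x in buf]

def scatter(dst, indices, results):
    """Memory task: write results back to their home locations."""
    for i, r in zip(indices, results):
        dst[i] = r

src = list(range(10))
dst = [0] * 10
idx = [1, 3, 5]
scatter(dst, idx, compute(gather(src, idx)))
print(dst)  # positions 1, 3, 5 hold 2, 10, 26
```

The point of the split is that `compute` never misses in cache: all off-chip traffic is confined to `gather` and `scatter`, which is what makes the memory/compute task division on the next slide possible.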

  5. Exploiting Stream Programming Properties • Task division according to stream programming property • Memory tasks • Fetch data from off-chip memory to on-chip caches • Compute tasks • Directly access data from on-chip cache without cache misses

  6. Memory Task Scheduling • Main Idea • Restrict Memory Task Limit (MTL) to reduce memory bandwidth contention • MTL: number of memory tasks that can be scheduled simultaneously • MTL ↓ → bandwidth contention ↓ → memory access latency ↓ • MTL ↓ → scheduling constraint ↑ → CPU may unnecessarily stay idle

  7. Memory Task Scheduling • Applications with different characteristics (memory-to-compute ratios) may perform best under different MTLs • Example: MTL=1 performs best

  8. Memory Task Scheduling • Applications with different characteristics (memory-to-compute ratios) may perform best under different MTLs • Example: MTL=2 performs best

  9. Performance Modeling for Different MTLs • Develop an analytical model to analyze performance speedup under different MTL constraints • Given Tmk, Tc, MTL=k, n, t • Tmk: average execution time of memory tasks under MTL=k • Tc: average execution time of compute tasks • n: number of processor cores • t: number of memory tasks • Estimate performance speedup under MTL=k

  10. Would CPU Idle under MTL=k? • If [condition shown in figure] then CPU always busy • If [condition shown in figure] then CPU sometimes idle • Example: n=4 • MTL=1 → CPU won't idle if [condition] • MTL=2 → CPU won't idle if [condition] • MTL=3 → CPU won't idle if [condition]
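The conditions on this slide were embedded in figures and did not survive extraction. One plausible reconstruction (an assumption from steady-state throughput balance, not the authors' stated formula): with MTL = k, up to k cores fetch while the remaining n − k compute, so compute tasks are produced at rate k/Tmk and consumed at rate (n − k)/Tc, and the cores never idle when production keeps up with consumption:

```latex
% Reconstruction from throughput balance -- NOT taken from the slides:
% k cores fetching produce compute work at rate k / T_{m_k};
% n - k cores consume it at rate (n - k) / T_c.
\frac{k}{T_{m_k}} \;\ge\; \frac{n-k}{T_c}
\quad\Longleftrightarrow\quad
T_c \;\ge\; \frac{n-k}{k}\, T_{m_k}
```

For n = 4 this reading gives Tc ≥ 3·Tm1 for MTL=1, Tc ≥ Tm2 for MTL=2, and Tc ≥ Tm3/3 for MTL=3, one condition per bullet above.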

  11. Performance Model • If [condition shown in figure]: CPU always busy • If [condition shown in figure]: CPU sometimes idle
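The two regimes above can be explored with a small discrete-event simulation. This is an illustrative sketch, not the authors' model: the greedy scheduler policy and the one-compute-task-per-memory-task pairing are assumptions, and the task times passed in are made up.

```python
import heapq

def makespan(n, k, t, Tm, Tc):
    """Finish time for t memory tasks (each Tm) plus their t compute tasks
    (each Tc) on n cores, with at most k memory tasks in flight (MTL = k).
    A compute task becomes ready once its memory task has fetched its data."""
    time, free = 0.0, n
    mem_pending, mem_running, comp_ready = t, 0, 0
    events = []  # min-heap of (finish_time, task_kind)
    while mem_pending or mem_running or comp_ready or events:
        # Greedy dispatch: start memory tasks while under the MTL cap,
        # otherwise run any ready compute task.
        while free and ((mem_pending and mem_running < k) or comp_ready):
            if mem_pending and mem_running < k:
                heapq.heappush(events, (time + Tm, "mem"))
                mem_pending -= 1
                mem_running += 1
            else:
                heapq.heappush(events, (time + Tc, "comp"))
                comp_ready -= 1
            free -= 1
        time, kind = heapq.heappop(events)  # advance to next completion
        free += 1
        if kind == "mem":
            mem_running -= 1
            comp_ready += 1  # its compute task is now ready
    return time

# In practice Tm grows with k (more bandwidth contention); the slides'
# model captures this by making T_{m_k} a function of the MTL k.
print(makespan(4, 1, 4, 1.0, 3.0))  # 7.0
print(makespan(4, 4, 4, 1.0, 3.0))  # 4.0
```

With Tm held constant a larger MTL can only help, as in the two runs above; the MTL trade-off on slide 6 appears once Tmk rises with k.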

  12. Performance Trend • Comparing workloads with same Tmk, same optimal MTL, but different Tc • Optimal MTL: MTL that achieves the best speedup

  13. Experimental Setup • Workloads • Experimental environment

  14. Experimental Result – Performance Speedup

  15. Experimental Result – Performance Speedup

  16. Experimental Result – Reduction of Bandwidth Contention

  17. Experimental Result – Real Workload

  18. Thank You Hsiang-Yun Cheng r96027@csie.ntu.edu.tw
