
An Analytical Model to Exploit Memory Task Scheduling






Presentation Transcript


  1. An Analytical Model to Exploit Memory Task Scheduling Hsiang-Yun Cheng, Jian Li, and Chia-Lin Yang Dept. of Computer Science & Information Engineering National Taiwan University IBM Austin Research Laboratory

  2. Motivation • Off-chip bandwidth on CMPs is a precious resource • If too many cores execute memory operations simultaneously • Bandwidth contention ↑ → memory access latency ↑

  3. Objective • Software task scheduling to reduce bandwidth contention and improve system performance • Utilize stream programming properties to decouple threads into memory and compute tasks • Avoid too many concurrent memory tasks • Challenge: how many concurrent memory tasks should be allowed to maximize system performance?

  4. Stream Programming Style • Decouple computation and memory access • Gather → Compute → Scatter • Example
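The example on this slide was a figure; a minimal sketch of the gather → compute → scatter pattern it describes (the functions and data below are illustrative, not from the slides):

```python
def gather(src, indices):
    """Memory task: fetch the needed elements into a local (on-chip-like) buffer."""
    return [src[i] for i in indices]

def compute(buf):
    """Compute task: touches only the gathered buffer, so no further fetches."""
    return [x * x + 1 for x in buf]

def scatter(dst, indices, results):
    """Memory task: write results back to their home locations."""
    for i, r in zip(indices, results):
        dst[i] = r

src = list(range(10))
dst = [0] * 10
idx = [1, 3, 5]
scatter(dst, idx, compute(gather(src, idx)))
print(dst)  # positions 1, 3, 5 hold 2, 10, 26
```

The point of the split is that `compute` never misses in cache: all off-chip traffic is confined to `gather` and `scatter`, which is what makes the memory/compute task division on the next slide possible.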

  5. Exploiting Stream Programming Properties • Task division according to stream programming property • Memory tasks • Fetch data from off-chip memory to on-chip caches • Compute tasks • Directly access data from on-chip cache without cache misses

  6. Memory Task Scheduling • Main Idea • Restrict Memory Task Limit (MTL) to reduce memory bandwidth contention • MTL: number of memory tasks that can be scheduled simultaneously • MTL ↓ → bandwidth contention ↓ → memory access latency ↓ • MTL ↓ → scheduling constraint ↑ → CPU may unnecessarily stay idle

  7. Memory Task Scheduling • Applications with different characteristics (memory-to-compute ratios) may perform best under different MTLs • Example: MTL=1 performs best

  8. Memory Task Scheduling • Applications with different characteristics (memory-to-compute ratios) may perform best under different MTLs • Example: MTL=2 performs best

  9. Performance Modeling for Different MTLs • Develop an analytical model to analyze performance speedup under different MTL constraints • Given Tmk, Tc, MTL=k, n, t • Tmk: average execution time of memory tasks under MTL=k • Tc: average execution time of compute tasks • n: number of processor cores • t: number of memory tasks • Estimate performance speedup under MTL=k

  10. Would CPU Idle under MTL=k? • If [condition shown in figure] then CPU always busy • If [condition shown in figure] then CPU sometimes idle • Example: n=4 • MTL=1 → CPU won't idle if [condition] • MTL=2 → CPU won't idle if [condition] • MTL=3 → CPU won't idle if [condition]
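The conditions on this slide were embedded in figures and did not survive extraction. One plausible reconstruction (an assumption from steady-state throughput balance, not the authors' stated formula): with MTL = k, up to k cores fetch while the remaining n − k compute, so compute tasks are produced at rate k/Tmk and consumed at rate (n − k)/Tc, and the cores never idle when production keeps up with consumption:

```latex
% Reconstruction from throughput balance -- NOT taken from the slides:
% k cores fetching produce compute work at rate k / T_{m_k};
% n - k cores consume it at rate (n - k) / T_c.
\frac{k}{T_{m_k}} \;\ge\; \frac{n-k}{T_c}
\quad\Longleftrightarrow\quad
T_c \;\ge\; \frac{n-k}{k}\, T_{m_k}
```

For n = 4 this reading gives Tc ≥ 3·Tm1 for MTL=1, Tc ≥ Tm2 for MTL=2, and Tc ≥ Tm3/3 for MTL=3, one condition per bullet above.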

  11. Performance Model • If [condition shown in figure]: CPU always busy • If [condition shown in figure]: CPU sometimes idle
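The two regimes above can be explored with a small discrete-event simulation. This is an illustrative sketch, not the authors' model: the greedy scheduler policy and the one-compute-task-per-memory-task pairing are assumptions, and the task times passed in are made up.

```python
import heapq

def makespan(n, k, t, Tm, Tc):
    """Finish time for t memory tasks (each Tm) plus their t compute tasks
    (each Tc) on n cores, with at most k memory tasks in flight (MTL = k).
    A compute task becomes ready once its memory task has fetched its data."""
    time, free = 0.0, n
    mem_pending, mem_running, comp_ready = t, 0, 0
    events = []  # min-heap of (finish_time, task_kind)
    while mem_pending or mem_running or comp_ready or events:
        # Greedy dispatch: start memory tasks while under the MTL cap,
        # otherwise run any ready compute task.
        while free and ((mem_pending and mem_running < k) or comp_ready):
            if mem_pending and mem_running < k:
                heapq.heappush(events, (time + Tm, "mem"))
                mem_pending -= 1
                mem_running += 1
            else:
                heapq.heappush(events, (time + Tc, "comp"))
                comp_ready -= 1
            free -= 1
        time, kind = heapq.heappop(events)  # advance to next completion
        free += 1
        if kind == "mem":
            mem_running -= 1
            comp_ready += 1  # its compute task is now ready
    return time

# In practice Tm grows with k (more bandwidth contention); the slides'
# model captures this by making T_{m_k} a function of the MTL k.
print(makespan(4, 1, 4, 1.0, 3.0))  # 7.0
print(makespan(4, 4, 4, 1.0, 3.0))  # 4.0
```

With Tm held constant a larger MTL can only help, as in the two runs above; the MTL trade-off on slide 6 appears once Tmk rises with k.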

  12. Performance Trend • Comparing workloads with same Tmk, same optimal MTL, but different Tc • Optimal MTL: MTL that achieves the best speedup

  13. Experimental Setup • Workloads • Experimental environment

  14. Experimental Result – Performance Speedup

  15. Experimental Result – Performance Speedup

  16. Experimental Result – Reduction of Bandwidth Contention

  17. Experimental Result – Real Workload

  18. Thank You Hsiang-Yun Cheng r96027@csie.ntu.edu.tw
