Application-to-Core Mapping Policies to Reduce Memory System Interference

Reetuparna Das*  Rachata Ausavarungnirun$  Onur Mutlu$  Akhilesh Kumar§  Mani Azimi§

*University of Michigan $Carnegie Mellon University §Intel

Multi-Core to Many-Core

[Figure: a multi-core chip versus a many-core chip]

Many-Core On-Chip Communication

[Figure: light and heavy applications on a many-core chip sharing the on-chip network, shared cache banks ($), and memory controllers]

Task Scheduling
  • Traditional

When to schedule a task? – Temporal

  • Many-Core

When to schedule a task? – Temporal

+ Where to schedule a task? – Spatial

  • Spatial scheduling impacts performance of memory hierarchy
    • Latency and interference in interconnect, memory, caches
Problem: Spatial Task Scheduling

Applications

Cores

How to map applications to cores?

Challenges in Spatial Task Scheduling

Applications

Cores

How to reduce communication distance?

How to reduce destructive interference between applications?

How to prioritize applications to improve throughput?

Application-to-Core Mapping

Goals: improve locality, reduce interference, and improve bandwidth utilization

Step 1 — Clustering

[Figure: memory controllers and cache banks spread across the chip]

Inefficient data mapping to memory and caches

Step 1 — Clustering

[Figure: the chip partitioned into Cluster 0, Cluster 1, Cluster 2, and Cluster 3]

Improved Locality

Reduced Interference

Step 1 — Clustering
  • Clustering memory accesses
    • Locality-aware page replacement policy (cluster-CLOCK); see the sketch after the citation below
      • When allocating a free page, give preference to pages belonging to the cluster's memory controllers (MCs)
      • Look ahead “N” pages beyond the default replacement candidate to find a page belonging to the cluster's MC
  • Clustering cache accesses
    • Private caches automatically enforce clustering
    • Shared caches can use the Dynamic Spill-Receive* mechanism

*Qureshi et al., HPCA 2009
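A minimal sketch of the cluster-CLOCK look-ahead in Python, under stated assumptions: frames carry a CLOCK reference bit, and a hypothetical home_cluster() address-interleaving function decides which memory controller (and hence cluster) a physical page belongs to. The names and the interleaving are illustrative, not the actual kernel implementation.

```python
from dataclasses import dataclass

@dataclass
class Frame:
    phys_addr: int     # physical address of the frame
    referenced: bool   # CLOCK reference bit

# Hypothetical mapping of physical pages to memory controllers: assume pages
# are statically interleaved across 4 MCs, one per cluster.
def home_cluster(phys_addr: int, num_clusters: int = 4) -> int:
    return (phys_addr >> 12) % num_clusters

def pick_victim(frames, hand, cluster, lookahead_n):
    """cluster-CLOCK: run the normal CLOCK sweep, then look ahead up to
    lookahead_n frames for a victim whose page is homed at the requesting
    cluster's memory controller; otherwise fall back to the default victim."""
    n = len(frames)

    # Standard CLOCK sweep: clear reference bits until an unreferenced frame is found.
    while frames[hand].referenced:
        frames[hand].referenced = False
        hand = (hand + 1) % n
    default_victim = hand

    # Cluster-aware look-ahead beyond the default replacement candidate.
    idx = default_victim
    for _ in range(lookahead_n):
        if not frames[idx].referenced and home_cluster(frames[idx].phys_addr) == cluster:
            return idx
        idx = (idx + 1) % n

    return default_victim
```

Allocating the freed frame to the faulting application then keeps its new page within the cluster's memory controllers, which is what improves locality.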

Step 2 — Balancing

Applications

Cores

Heavy

Light

Too much load in clusters with heavy applications

Step 2 — Balancing

Applications

Cores

Heavy

Light

Better bandwidth utilization

Is this the best we can do? Let’s take a look at application characteristics
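A minimal sketch of the balancing step, assuming each application is summarized by its MPKI and that every cluster has the same number of cores; the greedy heaviest-first placement is an illustration of the idea, not necessarily the exact algorithm from the paper.

```python
def balance(apps_mpki, num_clusters, cores_per_cluster):
    """Spread memory load: each application goes to the cluster with the
    lowest aggregate MPKI that still has a free core."""
    load = [0.0] * num_clusters
    assignment = {c: [] for c in range(num_clusters)}

    # Place the heaviest applications first so no cluster ends up with all of them.
    for app, mpki in sorted(apps_mpki.items(), key=lambda kv: -kv[1]):
        candidates = [c for c in range(num_clusters)
                      if len(assignment[c]) < cores_per_cluster]
        target = min(candidates, key=lambda c: load[c])
        assignment[target].append(app)
        load[target] += mpki
    return assignment

# Example: two heavy and two light applications spread over two 2-core clusters.
print(balance({"mcf": 40.0, "lbm": 35.0, "povray": 0.5, "namd": 0.2},
              num_clusters=2, cores_per_cluster=2))
```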


Application Types

Applications

  • Light — low miss rate
  • Medium — medium miss rate, high MLP
  • Heavy — high miss rate, high MLP
  • Sensitive — high miss rate, low MLP

  • Identify and isolate sensitive applications while ensuring load balance (a classification sketch follows below)
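A minimal sketch of this classification, assuming per-application MPKI and relative stall-time-per-miss (STPM) counters are available; the threshold values are illustrative placeholders, since the slides do not give exact numbers.

```python
def classify(mpki, rel_stpm, low_mpki=5.0, high_mpki=25.0, high_stpm=0.8):
    """Classify an application from misses per kilo-instruction (MPKI) and
    relative stall cycles per miss (rel_stpm, a proxy for low MLP)."""
    if mpki < low_mpki:
        return "light"        # low miss rate
    if mpki >= high_mpki and rel_stpm >= high_stpm:
        return "sensitive"    # many misses, poorly overlapped (low MLP)
    if mpki >= high_mpki:
        return "heavy"        # many misses, well overlapped (high MLP)
    return "medium"           # medium miss rate, high MLP
```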

[PHD Comics analogy: a thesis committee also has distinct member types — the advisor (the sensitive one), the guru (there for cookies), the adversary (bitter rival), the nice guy (no opinions), and the assistant professor]

(c) PHD Comics

Step 3 — Isolation

Applications

Cores

Sensitive

Light

Medium

Heavy

Isolate sensitive applications to a cluster

Balance load for remaining applications across clusters

Step 3 — Isolation
  • How to estimate sensitivity?
    • High miss rate — high misses per kilo-instruction (MPKI)
    • Low MLP — high relative stall cycles per miss (STPM)
    • Sensitive if MPKI > threshold and relative STPM is high
  • Whether or not to allocate a cluster to sensitive applications?
  • How to map sensitive applications to their own cluster?
    • Knapsack algorithm (see the sketch below)
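A minimal sketch of the knapsack formulation, under assumptions: the dedicated cluster has an aggregate MPKI budget (the knapsack capacity), each sensitive application's weight is its MPKI rounded to an integer, and its value is its relative STPM. The budget and value function are illustrative choices, not the exact formulation from the slides.

```python
def pick_sensitive_for_cluster(apps, mpki_budget):
    """0/1 knapsack: choose the subset of sensitive applications to place in
    the dedicated cluster so that total sensitivity (relative STPM) is
    maximized while their aggregate MPKI stays within the cluster's budget.

    apps: list of (name, mpki, rel_stpm) tuples, MPKI in integer units.
    Returns (total sensitivity, chosen application names).
    """
    # dp[b] = (best total sensitivity, chosen apps) within MPKI budget b.
    dp = [(0.0, [])] * (mpki_budget + 1)
    for name, mpki, stpm in apps:
        new_dp = dp[:]
        for b in range(mpki, mpki_budget + 1):
            cand_val = dp[b - mpki][0] + stpm
            if cand_val > new_dp[b][0]:
                new_dp[b] = (cand_val, dp[b - mpki][1] + [name])
        dp = new_dp
    return dp[mpki_budget]

# Example with hypothetical numbers: a budget of 60 MPKI for the sensitive cluster.
apps = [("omnetpp", 25, 0.9), ("xalancbmk", 20, 0.8),
        ("soplex", 30, 0.7), ("sphinx3", 15, 0.85)]
print(pick_sensitive_for_cluster(apps, mpki_budget=60))
```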
Step 4 — Radial Mapping

Applications

Cores

Sensitive

Light

Medium

Heavy

Map the applications that benefit most from being close to the memory controllers onto the cores closest to these resources

Step 4 — Radial Mapping
  • Which applications benefit most from being close to the memory controller?
    • High memory bandwidth demand
    • Also affected by network performance
    • Metric: stall time per thousand instructions (see the sketch below)
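A minimal sketch of radial mapping within a single cluster, assuming each core's hop distance to the cluster's memory controller and each application's profiled stall time per thousand instructions are known; function and variable names are illustrative.

```python
def radial_map(apps_stpki, core_distance_to_mc):
    """Give the cores closest to the memory controller to the applications
    that stall the most on memory.

    apps_stpki: dict app -> stall time per thousand instructions (STPKI).
    core_distance_to_mc: dict core id -> hop distance to the cluster's MC.
    Returns a dict app -> core id.
    """
    cores_near_first = sorted(core_distance_to_mc, key=core_distance_to_mc.get)
    apps_needy_first = sorted(apps_stpki, key=apps_stpki.get, reverse=True)
    return dict(zip(apps_needy_first, cores_near_first))
```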
Putting It All Together
  • Inter-Cluster Mapping
    • Balancing
    • Isolation
  • Intra-Cluster Mapping
    • Radial Mapping
    • Clustering

Improve Locality

Reduce Interference

Improve Shared Resource Utilization
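Putting the pieces together, a minimal end-to-end sketch that chains the illustrative helpers defined above (classify, pick_sensitive_for_cluster, balance, radial_map); the orchestration details, data layout, and the choice of which cluster hosts the sensitive applications are assumptions, not the exact A2C algorithm. Clustering of memory accesses itself is enforced separately by the cluster-CLOCK page allocator.

```python
def a2c_map(profile, clusters, cores_per_cluster, mpki_budget):
    """profile:  dict app -> {"mpki": .., "rel_stpm": .., "stpki": ..}
    clusters: dict cluster id -> {core id: hop distance to that cluster's MC}
    Returns a dict app -> core id."""
    # Isolation: pick the sensitive applications and give them one cluster.
    sensitive = [(a, int(p["mpki"]), p["rel_stpm"]) for a, p in profile.items()
                 if classify(p["mpki"], p["rel_stpm"]) == "sensitive"]
    _, isolated = pick_sensitive_for_cluster(sensitive, mpki_budget)
    sensitive_cluster = next(iter(clusters))      # arbitrary choice in this sketch

    # Balancing: spread the remaining applications over the other clusters.
    rest = {a: p["mpki"] for a, p in profile.items() if a not in isolated}
    other_clusters = [c for c in clusters if c != sensitive_cluster]
    balanced = balance(rest, len(other_clusters), cores_per_cluster)

    # Radial mapping: inside each cluster, the most memory-intensive apps get
    # the cores nearest the memory controller.
    mapping = radial_map({a: profile[a]["stpki"] for a in isolated},
                         clusters[sensitive_cluster])
    for i, c in enumerate(other_clusters):
        mapping.update(radial_map({a: profile[a]["stpki"] for a in balanced[i]},
                                  clusters[c]))
    return mapping
```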

Evaluation Methodology
  • 60-core system
    • x86 processor model based on Intel Pentium M
    • 2 GHz processor, 128-entry instruction window
    • 32 KB private L1 and 256 KB per-core private L2 caches
    • 4 GB DRAM, 160-cycle access latency, 4 on-chip DRAM controllers
    • CLOCK page replacement algorithm
  • Detailed Network-on-Chip model
    • 2-stage routers (with speculation and lookahead routing)
    • Wormhole switching (4-flit data packets)
    • Virtual channel flow control (4 VCs, 4-flit buffer depth)
    • 8x8 mesh (128-bit bi-directional channels)
Configurations
  • Evaluated configurations
    • BASE — random core mapping
    • BASE+CLS — baseline with clustering
    • A2C — the proposed application-to-core mapping policy
  • Benchmarks
    • Scientific, server, and desktop benchmarks (35 applications)
    • 128 multi-programmed workloads
    • 4 categories based on aggregate workload MPKI
      • MPKI500, MPKI1000, MPKI1500, MPKI2000
System Performance

A2C improves system performance by 17% on average over the BASE mapping

Network Power

A2C reduces average network power consumption by 52% relative to the BASE mapping

Summary of Other Results
  • A2C can reduce page faults
  • Dynamic A2C also improves system performance (see the sketch below)
    • Continuous “Profiling” + “Enforcement” intervals
    • Retains clustering benefits
    • Migration overheads are minimal
  • A2C complements application-aware packet prioritization* in NoCs
  • A2C is effective for a variety of system parameters
    • Number and placement of memory controllers
    • Size and organization of the last-level cache

*Das et al., MICRO 2009
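A minimal sketch of the Dynamic A2C loop, alternating profiling and enforcement intervals; the interval length and the profile_apps()/migrate() callbacks are hypothetical placeholders, and a2c_map() is the illustrative helper sketched earlier.

```python
import time

def dynamic_a2c(profile_apps, migrate, clusters, cores_per_cluster,
                mpki_budget, interval_s=0.005):
    """Alternate profiling and enforcement intervals.

    profile_apps(): hypothetical callback returning per-app counters
                    ({"mpki", "rel_stpm", "stpki"}) gathered over the interval.
    migrate(mapping): hypothetical callback that migrates threads to the
                      cores chosen by a2c_map().
    """
    while True:
        # Profiling interval: let applications run while counters accumulate.
        time.sleep(interval_s)
        profile = profile_apps()

        # Enforcement interval: recompute the mapping and migrate threads;
        # the slide reports that migration overheads are minimal.
        mapping = a2c_map(profile, clusters, cores_per_cluster, mpki_budget)
        migrate(mapping)
```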

Conclusion
  • Problem: spatial scheduling for many-core processors
    • Develop fundamental insights for core mapping policies
  • Solution: Application-to-Core (A2C) mapping policies — Clustering, Balancing, Isolation, and Radial mapping
  • A2C significantly improves system performance and system fairness, and significantly reduces network power

Application-to-Core Mapping Policies to Reduce Memory System Interference

Reetuparna Das*  Rachata Ausavarungnirun$  Onur Mutlu$  Akhilesh Kumar§  Mani Azimi§

*University of Michigan $Carnegie Mellon University §Intel