Power Savings in Embedded Processors through Decode Filter Cache

Presentation Transcript


Power Savings in Embedded Processors through Decode Filter Cache

Weiyu Tang, Rajesh Gupta, Alex Nicolau


Overview

  • Introduction

  • Related Work

  • Decode Filter Cache

  • Results and Conclusion


Introduction

  • Instruction delivery is a major power consumer in embedded systems

    • Instruction fetch

      • 27% of processor power in the StrongARM

    • Instruction decode

      • 18% of processor power in the StrongARM

  • Goal

    • Reduce power in instruction delivery with minimal performance penalty


Related Work

  • Architectural approaches to reduce instruction fetch power

    • Store instructions in small, power-efficient storage structures

    • Examples:

      • Line buffers (see the sketch after this list)

      • Loop cache

      • Filter cache
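
As an illustration of the simplest of these structures, the sketch below (not from the paper; Python, with addresses treated as instruction indices and an assumed line size) shows a line buffer that holds the most recently fetched I-cache line, so sequential fetches within that line avoid re-reading the much larger instruction cache.

    WORDS_PER_ICACHE_LINE = 8        # assumed I-cache line size, in instructions


    class LineBuffer:
        def __init__(self):
            self.line_addr = None                       # which I-cache line is buffered
            self.words = [None] * WORDS_PER_ICACHE_LINE

        def lookup(self, pc):
            """Return the raw instruction if pc falls in the buffered line, else None."""
            if self.line_addr == pc // WORDS_PER_ICACHE_LINE:
                return self.words[pc % WORDS_PER_ICACHE_LINE]
            return None                                 # miss: the I-cache must be read

        def refill(self, pc, line_words):
            """Install the line just fetched from the I-cache."""
            self.line_addr = pc // WORDS_PER_ICACHE_LINE
            self.words = list(line_words)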


Related Work

  • Architectural approaches to reduce instruction decode power

    • Avoid unnecessary decoding by saving decoded instructions in a separate cache

    • Trace cache

      • Store decoded instructions in execution order

      • Fixed cache access order

        • Instruction cache is accessed on trace cache misses

      • Targeted for high-performance processors

        • Increase fetch bandwidth

        • Require sophisticated branch prediction mechanisms

      • Drawbacks

        • Not power efficient as the cache size is large


Related Work

  • Micro-op cache

    • Store decoded instructions in program order

    • Fixed cache access order

      • Instruction cache and micro-op cache are accessed in parallel to minimize micro-op cache miss penalty

    • Drawbacks

      • Need an extra pipeline stage, which increases the misprediction penalty

      • Require a branch predictor

      • Per-access power is large

        • Micro-op cache size is large

        • Power consumption from both micro-op cache and instruction cache


Decode Filter Cache

  • Targeted processors

    • Single-issue, in-order execution

  • Research goal

    • Use a small (and power efficient) cache to save decoded instructions

    • Reduce instruction fetch power and decode power simultaneously

    • Reduce power without sacrificing performance

  • Problems to deal with

    • What kind of cache organization to use

    • Where to fetch instructions as instructions can be provided from multiple sources

    • How to minimize decode filter cache miss latency


Decode Filter Cache

[Figure: the processor pipeline (fetch, decode, execute, mem, writeback) with instructions delivered from one of three sources: the line buffer, the I-cache, or the decode filter cache, selected by a predictor driven by the fetch address.]


Decode Filter Cache

  • Decode filter cache organization

    • Problems with traditional cache organization

      • The decoded instruction width varies

      • Saving all decoded instructions would waste cache space

    • Our approach (see the sketch after this list)

      • Instruction classification

        • Classify instructions into cacheable and uncacheable depending on instruction width distribution

        • Use a “cacheable ratio” to balance the cache utilization vs. the number of instructions that can be cached

      • Sectored cache organization

        • Each instruction can be cached independently of neighboring lines

        • Neighboring lines share a tag to reduce cache tag store cost
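
A minimal sketch of these two ideas, with illustrative sizes that are not taken from the paper (addresses treated as instruction indices): decoded instructions are stored only if their decoded form fits a fixed cacheable width, and each decoded entry keeps its own valid bit while neighboring entries share a single tag.

    INSTS_PER_LINE = 4           # decoded entries sharing one tag (assumed)
    NUM_LINES = 32               # number of sectored lines (assumed)
    CACHEABLE_WIDTH_BITS = 32    # assumed width cutoff from instruction classification


    class SectoredDecodeFilterCache:
        def __init__(self):
            self.tags = [None] * NUM_LINES
            self.valid = [[False] * INSTS_PER_LINE for _ in range(NUM_LINES)]
            self.data = [[None] * INSTS_PER_LINE for _ in range(NUM_LINES)]

        def _index(self, pc):
            """Split an instruction index into line, tag and offset."""
            line = (pc // INSTS_PER_LINE) % NUM_LINES
            tag = pc // (INSTS_PER_LINE * NUM_LINES)
            return line, tag, pc % INSTS_PER_LINE

        def lookup(self, pc):
            """Return the decoded instruction on a hit, else None."""
            line, tag, off = self._index(pc)
            if self.tags[line] == tag and self.valid[line][off]:
                return self.data[line][off]
            return None

        def fill(self, pc, decoded, decoded_width_bits):
            """Cache a decoded instruction only if it was classified cacheable."""
            if decoded_width_bits > CACHEABLE_WIDTH_BITS:
                return                               # uncacheable: storing it would waste space
            line, tag, off = self._index(pc)
            if self.tags[line] != tag:               # new tag: invalidate the whole line
                self.tags[line] = tag
                self.valid[line] = [False] * INSTS_PER_LINE
            self.data[line][off] = decoded
            self.valid[line][off] = True

Because a fill sets only a single valid bit, one decoded instruction can be cached without fetching or decoding the rest of its line, which is what keeps the small cache well utilized under the cacheable-ratio classification.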


Decode Filter Cache

  • Where to fetch instructions

    • Instructions can be provided from one of the following sources

      • Line buffer

      • Decode filter cache

      • Instruction cache

    • Predictive order for instruction fetch (see the sketch after this list)

      • For power efficiency, either the decode filter cache or the line buffer is accessed first when an instruction is likely to hit

      • To minimize decode filter cache miss penalty, the instruction cache is accessed directly when the decode filter cache is likely to miss
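
A minimal sketch of one fetch cycle under the predicted source; the dfc, line_buffer, icache and decoder arguments are assumed, duck-typed helpers (for example the sketches above), not interfaces from the paper.

    LINE_BUFFER, DECODE_FILTER_CACHE, ICACHE = "line_buffer", "dfc", "icache"

    def fetch_one(pc, predicted_source, dfc, line_buffer, icache, decoder):
        if predicted_source == DECODE_FILTER_CACHE:
            decoded = dfc.lookup(pc)
            if decoded is not None:
                return decoded              # hit: no I-cache access and no decode
        elif predicted_source == LINE_BUFFER:
            raw = line_buffer.lookup(pc)
            if raw is not None:
                return decoder(raw)         # hit: no I-cache access
        raw = icache.fetch(pc)              # predicted I-cache path, or fallback on a miss
        return decoder(raw)

A correct decode filter cache prediction therefore avoids both the I-cache read and the decoder, while a wrong one costs only the fall-back I-cache access.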


Decode Filter Cache

  • Prediction mechanism (see the sketch after this list)

    • When the next fetch address and the current address map to the same cache line

      • If the current fetch source is the line buffer, the next fetch source remains the same

      • If the current fetch source is the decode filter cache and the corresponding instruction is valid, the next fetch source remains the same

      • Otherwise, the next fetch source is the instruction cache

    • When the next fetch address and the current address map to different cache lines

      • Predict based on the next fetch prediction table, which utilizes control flow predictability

      • If the tag of the current fetch address and the tag of the predicted next fetch address are the same, the next fetch source is the decode filter cache

      • Otherwise, the next fetch source is the instruction cache
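
A minimal sketch of these rules, with illustrative names: it reuses INSTS_PER_LINE, NUM_LINES and the fetch-source constants from the sketches above, and assumes the next-fetch prediction table maps the current line to a predicted next fetch address.

    def line_of(pc):
        return (pc // INSTS_PER_LINE) % NUM_LINES

    def tag_of(pc):
        return pc // (INSTS_PER_LINE * NUM_LINES)

    def same_cache_line(a, b):
        return a // INSTS_PER_LINE == b // INSTS_PER_LINE

    def predict_next_source(cur_pc, next_pc, cur_source, dfc, next_fetch_table):
        if same_cache_line(cur_pc, next_pc):
            # Sequential fetch within the current line: keep the cheap source
            # as long as it can still supply the next instruction.
            if cur_source == LINE_BUFFER:
                return LINE_BUFFER
            if cur_source == DECODE_FILTER_CACHE and dfc.lookup(next_pc) is not None:
                return DECODE_FILTER_CACHE
            return ICACHE
        # Crossing a line boundary: consult the next-fetch prediction table.
        predicted_next_pc = next_fetch_table.get(line_of(cur_pc))
        if predicted_next_pc is not None and tag_of(cur_pc) == tag_of(predicted_next_pc):
            return DECODE_FILTER_CACHE
        return ICACHE

Predicting the instruction cache whenever a decode filter cache hit is doubtful is what keeps the performance penalty low: the I-cache is accessed directly instead of after a filter cache miss.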


Results

  • Simulation setup

    • Media Benchmark

    • Cache size

      • 512B decode filter cache, 16KB instruction cache, 8KB data cache.

    • Configurations investigated


Results: % reduction in I-cache fetches


Results: % reduction in instruction decodes


Results: normalized delay


Results: % reduction in processor power


Conclusion

  • There is a basic tradeoff between

    • the number of instructions that can be cached (as in instruction caches), and

    • greater power savings from reduced fetch and decode work (as in decode caches).

  • We tip this balance in favor of the decode cache through the coordinated operation of

    • instruction classification/selective decoding (into smaller widths)

    • sectored caches built around this classification

  • The results show

    • Average 34% reduction in processor power

      • 50% more effective in power savings than an instruction filter cache

    • Less than 1% performance degradation, thanks to the effective prediction mechanism

