Design Space Exploration with SimpleScalar


SimpleScalar ISA
  • clean and simple instruction set architecture:
  • MIPS/DLX, plus more addressing modes, minus delay slots
  • 64-bit instruction encoding facilitates instruction set research
    • 16-bit space for hints, new instructions, and annotations
    • four operand instruction format, up to 256 registers
Out of order simulator
  • Configurable set of functional units (FUs)

Configurable Memory Hierarchy
  • All caches and TLB configurations specified with same format:

<nsets>:<bsize>:<assoc>:<repl>

  • Block replacement policy (<repl>):
    • l for LRU
    • f for FIFO
    • r for RANDOM
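
The configuration format above can be unpacked with a few lines of Python. This is a minimal sketch, not part of SimpleScalar: the names `CacheConfig` and `parse_cache_config` are hypothetical, and the capacity computation assumes that the block size is given in bytes.

```python
from dataclasses import dataclass

# Replacement-policy letters as listed on the slide.
POLICIES = {"l": "LRU", "f": "FIFO", "r": "RANDOM"}


@dataclass(frozen=True)
class CacheConfig:
    nsets: int  # number of sets
    bsize: int  # block size (assumed to be in bytes)
    assoc: int  # associativity (ways per set)
    repl: str   # replacement policy letter: l, f or r

    @property
    def capacity(self) -> int:
        """Total data capacity: sets * block size * ways."""
        return self.nsets * self.bsize * self.assoc


def parse_cache_config(text: str) -> CacheConfig:
    """Parse a '<nsets>:<bsize>:<assoc>:<repl>' string."""
    nsets, bsize, assoc, repl = text.strip().split(":")
    repl = repl.lower()
    if repl not in POLICIES:
        raise ValueError(f"unknown replacement policy: {repl!r}")
    return CacheConfig(int(nsets), int(bsize), int(assoc), repl)


cfg = parse_cache_config("128:32:4:l")
print(POLICIES[cfg.repl], cfg.capacity)  # LRU 16384
```

For example, `128:32:4:l` describes 128 sets of 4 blocks of 32 bytes each, i.e. a 16 KB cache with LRU replacement.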

Design Space Exploration
  • Metric definition
    • Energy*Delay
    • Area*Delay
  • Design space definition
    • L1 and L2 caches, number of ALUs, ...
  • Embedded Application Definition
  • Metric minimization
    • Exhaustive search
    • Greedy search
    • Gradient search
    • Simulated Annealing and so on
Design Space Exploration: A Case Study
  • Metric Defined:

Price over Performance (PoP) = area * CPI

  • Design space:
    • number of sets, block size, associativity and replacement policy for each cache;
    • number of integer ALUs;
    • number of integer multipliers;
    • number of floating-point ALUs;
    • number of floating-point multipliers.

Design space exploration performed by F. Cassoli and A. Ferrante @ ALARI

Design Space Definition
  • Ranges for each parameter
    • DL1:128:{32, 64}:4:L
    • IL1:{256, 512}:32:1:L
    • UL2:{1024, 2048}:{64, 128}:4:{L, F}
    • IALU:{2, 4}
    • IMULT:{1, 2, 4}
    • FPALU:{1, 4}
    • FPMULT:{1, 2}
  • 768 different cases
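
The count of 768 follows from multiplying the number of options per parameter (2 × 2 × 8 × 2 × 3 × 2 × 2). A short sketch that enumerates the ranges above and checks the total; the dictionary layout and the name `enumerate_configs` are illustrative choices, not taken from the slides:

```python
from itertools import product

# Parameter ranges copied from the slide; each entry lists the allowed values.
DESIGN_SPACE = {
    "DL1": [f"128:{b}:4:L" for b in (32, 64)],
    "IL1": [f"{s}:32:1:L" for s in (256, 512)],
    "UL2": [f"{s}:{b}:4:{r}"
            for s in (1024, 2048) for b in (64, 128) for r in ("L", "F")],
    "IALU": [2, 4],
    "IMULT": [1, 2, 4],
    "FPALU": [1, 4],
    "FPMULT": [1, 2],
}


def enumerate_configs(space):
    """Yield every point of the design space as a dict {parameter: value}."""
    names = list(space)
    for values in product(*(space[name] for name in names)):
        yield dict(zip(names, values))


configs = list(enumerate_configs(DESIGN_SPACE))
print(len(configs))  # 768
```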
Embedded Application
  • EPIC decoder (Efficient Pyramid Image deCoder)
    • Image data compression utility written in C.
    • Free source code, from the MediaBench suite.
    • Based on wavelet decomposition and a Huffman entropy (de)coder.
Cost Function

F(x) = A(x) * D(x)

  • A(x): area of x (sum of the equivalent gates of each module), using models found in the literature.
  • D(x): delay of x (computed through simulation of EPIC on architecture x).
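
Read together with the case-study metric (Price over Performance = area * CPI), the minimization problem can be written out as below. The symbols G_m (equivalent-gate count of module m), M (the set of modules) and X (the design space) are notation introduced here for illustration, and taking D(x) to be the CPI measured on EPIC is an assumption based on that metric.

```latex
\[
F(x) = A(x)\,D(x), \qquad
A(x) = \sum_{m \in M} G_m(x), \qquad
D(x) = \mathrm{CPI}_{\mathrm{EPIC}}(x), \qquad
x^{*} = \arg\min_{x \in X} F(x)
\]
```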
Optimal Configuration
  • The lowest value of the PoP is 998’732.31, obtained with:
    • DL1: 128:32:4:L
    • IL1: 256:32:1:L
    • UL2: 1024:64:4:F
    • IALU: 4
    • IMULT: 2
    • FPALU: 4
    • FPMULT: 2
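
As a rough illustration, a design point such as the optimal configuration above could be translated into simulator options as follows. The helper `sim_outorder_args` is hypothetical, and the option names (`-cache:dl1`, `-res:ialu`, and so on, with the cache name prefixed to each cache string) follow the usual sim-outorder conventions but are an assumption here; check them against the help output of the installed simulator.

```python
# Optimal design point as reported on the slide.
OPTIMAL = {
    "DL1": "128:32:4:L",
    "IL1": "256:32:1:L",
    "UL2": "1024:64:4:F",
    "IALU": 4,
    "IMULT": 2,
    "FPALU": 4,
    "FPMULT": 2,
}


def sim_outorder_args(cfg):
    """Map a design point to sim-outorder-style options (names assumed)."""
    return [
        "-cache:dl1", "dl1:" + cfg["DL1"].lower(),
        "-cache:il1", "il1:" + cfg["IL1"].lower(),
        "-cache:dl2", "ul2:" + cfg["UL2"].lower(),
        "-cache:il2", "dl2",  # assumed: instruction L2 shares the data L2 (unified)
        "-res:ialu", str(cfg["IALU"]),
        "-res:imult", str(cfg["IMULT"]),
        "-res:fpalu", str(cfg["FPALU"]),
        "-res:fpmult", str(cfg["FPMULT"]),
    ]


print(" ".join(sim_outorder_args(OPTIMAL)))
```

The benchmark binary and its arguments would be appended after these options.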

Cost Function Properties
  • The difference between the PoPs for a DL1 cache with a block size of 32 and of 64 is very small.
  • The difference between the PoPs for an IL1 cache of 256 and of 512 sets is very small.
Cost Function Properties
  • Increasing the number of sets of UL2 increases the PoP (on average).
  • Increasing the block size of the UL2 cache always leads to an abrupt growth of the PoP.
  • The L2 cache grows so much that it becomes significantly larger than the rest of the system.
Conclusions
  • Doubling the number of integer ALUs reduces the PoP: a large benefit for a small area increase.
  • The optimal configuration has IMULT = 2 (not 1 or 4, because EPIC does not expose much parallelism).
  • However, FPALU = 4 leads to better results than FPALU = 1.
  • The FIFO policy for L2 outperforms LRU.
  • Adding an FPMULT brings the same benefits.
Conclusions
  • A greedy algorithm has also been applied to minimize the cost function.
  • Starting from different points
    • average number of simulations required = 49
    • minimum number of simulations required = 11
    • maximum number of simulations required = 83
  • Full search optimum always reached
  • Since an exhaustive search needs 768 simulations, the greedy search reduces the number of simulations by about 93.6% on average (49 instead of 768).
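
The slides do not say which greedy variant was used. One plausible reading is a coordinate-wise descent: change one parameter at a time, keep any change that lowers the cost, and stop when no single change helps. The sketch below follows that assumption; `greedy_search` is a hypothetical helper, and `toy_cost` and its small three-parameter space are made-up stand-ins for the area * CPI evaluation, which in the case study requires a simulation per design point.

```python
def greedy_search(space, cost, start):
    """Coordinate-wise greedy descent.

    Returns (best point, best cost, number of distinct points evaluated).
    """
    evaluated = {}

    def evaluate(point):
        key = tuple(point[name] for name in space)
        if key not in evaluated:  # each distinct point is "simulated" once
            evaluated[key] = cost(point)
        return evaluated[key]

    current = dict(start)
    best = evaluate(current)
    improved = True
    while improved:
        improved = False
        for name, values in space.items():
            for value in values:
                if value == current[name]:
                    continue
                candidate = {**current, name: value}
                candidate_cost = evaluate(candidate)
                if candidate_cost < best:  # keep any improving move
                    current, best, improved = candidate, candidate_cost, True
    return current, best, len(evaluated)


# Toy stand-in for the design space and cost; NOT the model from the slides.
space = {"a": [1, 2, 4], "b": [2, 4], "c": [32, 64]}


def toy_cost(p):
    return (p["a"] - 2) ** 2 + (p["b"] - 4) ** 2 + p["c"] / 32


point, value, n_sims = greedy_search(space, toy_cost, {"a": 4, "b": 2, "c": 64})
print(point, value, n_sims)

# Average saving reported on the slide: 49 simulations instead of 768.
saving = 1 - 49 / 768
print(f"{saving:.1%}")  # 93.6%
```

Caching evaluated points matters here because each evaluation stands for a full simulation run; the count returned is the number of distinct design points visited, which is the quantity compared against the 768 of the exhaustive search. A greedy descent is not guaranteed to find the global optimum in general, even though it did so for every starting point in the case study.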