Design Space Exploration with SimpleScalar

The SimpleScalar Toolset

Simulation Suite

SimpleScalar ISA

  • Clean and simple instruction set architecture:

    • MIPS/DLX plus more addressing modes, minus delay slots

  • 64-bit instruction encoding facilitates instruction set research:

    • 16-bit space for hints, new instructions, and annotations

    • Four-operand instruction format, up to 256 registers
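The 16-bit annotation space and the 8-bit register fields can be illustrated with a small pack/unpack sketch. The bit positions below are an assumed layout chosen for illustration, not a transcription of the SimpleScalar PISA encoding tables; `Inst`, `encode` and `decode` are hypothetical names.

```python
from typing import NamedTuple

# ASSUMED layout of a 64-bit instruction word (illustrative only):
#   bits 63..48  annotation (16 bits for hints, new instructions, annotations)
#   bits 47..32  opcode     (16 bits)
#   bits 31..0   four 8-bit operand fields, i.e. register numbers 0..255


class Inst(NamedTuple):
    annotation: int
    opcode: int
    operands: tuple[int, int, int, int]


def encode(inst: Inst) -> int:
    """Pack the fields into one 64-bit integer."""
    if not 0 <= inst.annotation < 1 << 16 or not 0 <= inst.opcode < 1 << 16:
        raise ValueError("annotation and opcode must fit in 16 bits")
    word = (inst.annotation << 48) | (inst.opcode << 32)
    for i, reg in enumerate(inst.operands):
        if not 0 <= reg < 256:  # an 8-bit field addresses at most 256 registers
            raise ValueError(f"operand {i} out of range: {reg}")
        word |= reg << (24 - 8 * i)  # first operand goes in the highest byte
    return word


def decode(word: int) -> Inst:
    """Unpack a 64-bit integer into its fields."""
    operands = tuple((word >> (24 - 8 * i)) & 0xFF for i in range(4))
    return Inst((word >> 48) & 0xFFFF, (word >> 32) & 0xFFFF, operands)
```

Round-tripping `Inst(0x0001, 0x0042, (1, 2, 3, 255))` through `encode` and `decode` returns the same fields; an operand of 256 is rejected because it does not fit in 8 bits.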

SimpleScalar Architected State

Out-of-Order Simulator

Configurable set of


Configurable Memory Hierarchy

  • All cache and TLB configurations are specified with the same format:

    <name>:<nsets>:<bsize>:<assoc>:<repl>

  • Block replacement policy (<repl>)

    l - for LRU

    f - for FIFO

    r - for RANDOM
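A minimal sketch of reading such a configuration string and deriving the cache capacity. It assumes the string is prefixed by the cache name as in the simulator's options (e.g. `dl1:128:32:4:l`) and that `bsize` is in bytes; `CacheConfig` and `parse_cache_config` are hypothetical helpers, not part of SimpleScalar.

```python
from typing import NamedTuple

# Replacement-policy letters as listed on the slide.
REPLACEMENT = {"l": "LRU", "f": "FIFO", "r": "RANDOM"}


class CacheConfig(NamedTuple):
    name: str
    nsets: int
    bsize: int  # block size, assumed to be in bytes
    assoc: int
    repl: str

    @property
    def capacity(self) -> int:
        """Total capacity in bytes: sets x block size x ways."""
        return self.nsets * self.bsize * self.assoc


def parse_cache_config(spec: str) -> CacheConfig:
    """Parse '<name>:<nsets>:<bsize>:<assoc>:<repl>', e.g. 'dl1:128:32:4:l'."""
    parts = spec.split(":")
    if len(parts) != 5:
        raise ValueError(f"expected 5 colon-separated fields, got {spec!r}")
    name, nsets, bsize, assoc, repl = parts
    repl = repl.lower()  # also accept the upper-case L/F used on later slides
    if repl not in REPLACEMENT:
        raise ValueError(f"unknown replacement policy {repl!r}")
    return CacheConfig(name, int(nsets), int(bsize), int(assoc), REPLACEMENT[repl])
```

For example, `parse_cache_config("dl1:128:32:4:l").capacity` is 16384 bytes (16 KiB) with LRU replacement.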

Design Space Exploration

  • Metric definition

    • Energy*Delay

    • Area*Delay

  • Design space definition

    • L1 and L2 caches, number of ALUs, ...

  • Embedded application definition

  • Metric minimization

    • Exhaustive search

    • Greedy search

    • Gradient search

    • Simulated annealing, etc.
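Exhaustive search, the simplest of the minimization strategies listed above, evaluates every point of the design space. A minimal sketch, assuming the space is given as a dict of discrete parameter ranges and that `cost` is a caller-supplied function (in practice one simulation per call); `exhaustive_search` is a hypothetical name.

```python
import itertools
from typing import Callable, Mapping, Sequence

Config = dict[str, object]


def exhaustive_search(
    space: Mapping[str, Sequence[object]],
    cost: Callable[[Config], float],
) -> tuple[Config, float, int]:
    """Evaluate every combination; return (best config, best cost, #evaluations)."""
    names = list(space)
    best_cfg: Config = {}
    best_cost = float("inf")
    evaluations = 0
    for values in itertools.product(*(space[n] for n in names)):
        cfg = dict(zip(names, values))
        c = cost(cfg)
        evaluations += 1
        if c < best_cost:
            best_cfg, best_cost = cfg, c
    return best_cfg, best_cost, evaluations
```

The number of evaluations is always the product of the range sizes, which is what makes the other strategies attractive when each evaluation is a full simulation.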

Design Space Exploration: A Case Study

  • Metric defined:

    Price over Performance (PoP) = area*CPI

  • Design space:

    • Number of sets, block size, associativity, and replacement policy for each cache;

    • number of integer ALUs;

    • number of integer multipliers;

    • number of floating-point ALUs;

    • number of floating-point multipliers.

Design space exploration performed by F. Cassoli and A. Ferrante @ ALARI

Design Space Definition

  • Ranges for each parameter

    • DL1:128:{32, 64}:4:L

    • IL1:{256, 512}:32:1:L

    • UL2:{1024, 2048}:{64, 128}:4:{L, F}

    • IALU:{2, 4}

    • IMULT:{1, 2, 4}

    • FPALU:{1, 4}

    • FPMULT:{1, 2}

  • 768 different configurations: 2 (DL1) × 2 (IL1) × 8 (UL2) × 2 (IALU) × 3 (IMULT) × 2 (FPALU) × 2 (FPMULT)
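The ranges above can be enumerated mechanically, turning each point into a simulator argument list. This sketch assumes sim-outorder's `-cache:dl1`, `-cache:il1`, `-cache:dl2` (with `-cache:il2 dl2` for a unified L2) and `-res:ialu`, `-res:imult`, `-res:fpalu`, `-res:fpmult` options; verify them against the help output of your build. `design_points` is a hypothetical helper.

```python
import itertools

# Parameter ranges from the "Design Space Definition" slide.
DL1_BSIZE = [32, 64]
IL1_NSETS = [256, 512]
UL2_NSETS = [1024, 2048]
UL2_BSIZE = [64, 128]
UL2_REPL = ["l", "f"]  # LRU, FIFO
IALU = [2, 4]
IMULT = [1, 2, 4]
FPALU = [1, 4]
FPMULT = [1, 2]


def design_points() -> list[list[str]]:
    """Return one sim-outorder argument list per configuration (assumed flags)."""
    points = []
    for dl1_b, il1_s, ul2_s, ul2_b, ul2_r, ialu, imult, fpalu, fpmult in itertools.product(
        DL1_BSIZE, IL1_NSETS, UL2_NSETS, UL2_BSIZE, UL2_REPL, IALU, IMULT, FPALU, FPMULT
    ):
        points.append([
            "-cache:dl1", f"dl1:128:{dl1_b}:4:l",
            "-cache:il1", f"il1:{il1_s}:32:1:l",
            "-cache:dl2", f"ul2:{ul2_s}:{ul2_b}:4:{ul2_r}",
            "-cache:il2", "dl2",  # instruction-side L2 accesses share the unified dl2
            "-res:ialu", str(ialu),
            "-res:imult", str(imult),
            "-res:fpalu", str(fpalu),
            "-res:fpmult", str(fpmult),
        ])
    return points
```

`len(design_points())` is 768. Each list would be followed by the EPIC binary and its inputs, whose command line the slides do not give.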

Embedded Application

  • EPIC decoder (Efficient Pyramid Image deCoder)

    • Image data compression utility written in C.

    • Freely available source from the MediaBench suite.

    • Based on wavelet decomposition and a Huffman entropy (de)coder.

Cost Function

F(x) = A(x)*D(x), where:

  • A(x) is the area of configuration x (the sum of the equivalent gates of each module), estimated with models from the literature.

  • D(x) is the delay of x, obtained by simulating EPIC on architecture x.
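A minimal sketch of evaluating F(x), assuming the per-module equivalent-gate counts come from an area model and the delay term is the CPI reported by one simulation run, matching the case study's PoP = area*CPI metric. The function names and the numbers in the usage line are placeholders, not values from the study.

```python
from typing import Mapping


def area(gates_per_module: Mapping[str, float]) -> float:
    """A(x): sum of the equivalent gates of each module in configuration x."""
    return sum(gates_per_module.values())


def price_over_performance(gates_per_module: Mapping[str, float], cpi: float) -> float:
    """F(x) = A(x) * D(x), with the delay D(x) taken as the simulated CPI."""
    if cpi <= 0:
        raise ValueError("CPI must be positive")
    return area(gates_per_module) * cpi
```

With hypothetical inputs `{"ialu": 1000.0, "dl1": 5000.0}` and a CPI of 1.5, the cost is 6000 × 1.5 = 9000.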

Result of the Exhaustive Search

Optimal Configuration

  • The lowest PoP value, 998,732.31, is obtained with:







    FPMULT: 2

Cost Function Properties

  • The PoP differs very little between DL1 block sizes of 32 and 64 bytes (the DL1 always has 128 sets).

  • The PoP differs very little between IL1 caches with 256 and with 512 sets.

Cost Function Properties

  • On average, increasing the number of UL2 sets increases the PoP.

  • Increasing the UL2 block size always causes an abrupt growth of the PoP.

  • The L2 cache then grows so much that it becomes significantly larger than the rest of the system.
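A rough capacity calculation over the explored ranges shows the imbalance, assuming capacity = sets × block size × associativity with the block size in bytes:

```latex
\begin{aligned}
C &= n_{\text{sets}} \cdot b_{\text{size}} \cdot a \\
C_{\text{IL1}} &= \{256, 512\} \cdot 32 \cdot 1 = 8\ \text{KiB or } 16\ \text{KiB} \\
C_{\text{DL1}} &= 128 \cdot \{32, 64\} \cdot 4 = 16\ \text{KiB or } 32\ \text{KiB} \\
C_{\text{UL2,min}} &= 1024 \cdot 64 \cdot 4 = 262\,144\ \text{B} = 256\ \text{KiB} \\
C_{\text{UL2,max}} &= 2048 \cdot 128 \cdot 4 = 1\,048\,576\ \text{B} = 1\ \text{MiB}
\end{aligned}
```

The largest UL2 therefore stores 32 times as many bytes as the largest L1 cache.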

Area – CPI scatter plot



  • Doubling the number of integer ALUs reduces the PoP: a large benefit for a small increase in area.

  • The optimal configuration has IMULT = 2 rather than 1 or 4, because EPIC does not expose much parallelism.

  • However, FPALU = 4 leads to better results than FPALU = 1.

  • For the L2 cache, the FIFO replacement policy outperforms LRU.

  • Adding a second FPMULT brings similar benefits.



  • A greedy algorithm was also applied to minimize the cost function.

  • Starting from different points:

    • average number of simulations required: 49

    • minimum number of simulations required: 11

    • maximum number of simulations required: 83

  • The full-search optimum was always reached.

  • Since an exhaustive search needs 768 simulations, the average greedy run reduces simulation time by about 93.6% (1 − 49/768 ≈ 0.936).
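The slides do not detail the greedy algorithm, so the following is only one plausible variant: a steepest-descent walk that, from the current point, tries changing one parameter at a time to each of its other values, moves to the best improving neighbor, and stops when no single change helps. Costs are cached so each configuration is simulated at most once, and the returned count is the number of distinct simulations. `greedy_search` and the toy cost below are hypothetical.

```python
from typing import Callable, Mapping, Sequence


def greedy_search(
    space: Mapping[str, Sequence[object]],
    cost: Callable[[dict[str, object]], float],
    start: Mapping[str, object],
) -> tuple[dict[str, object], float, int]:
    """Steepest descent over one-parameter moves; returns (config, cost, #simulations)."""
    cache: dict[tuple, float] = {}
    names = list(space)

    def evaluate(cfg: dict[str, object]) -> float:
        key = tuple(cfg[n] for n in names)
        if key not in cache:  # simulate each configuration only once
            cache[key] = cost(cfg)
        return cache[key]

    current = dict(start)
    current_cost = evaluate(current)
    while True:
        best_move, best_cost = None, current_cost
        for name in names:
            for value in space[name]:
                if value == current[name]:
                    continue
                neighbor = {**current, name: value}
                c = evaluate(neighbor)
                if c < best_cost:
                    best_move, best_cost = neighbor, c
        if best_move is None:  # local minimum: no single change lowers the cost
            return current, current_cost, len(cache)
        current, current_cost = best_move, best_cost
```

On a separable toy cost such as `(a - 3)**2 + (b - 1)**2` the walk always reaches the global minimum; on a real PoP landscape it can stop at a local minimum, which is why comparing against the exhaustive optimum is a useful check.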

  • Login