Design Space Exploration with SimpleScalar






SimpleScalar ISA

  • Clean and simple instruction set architecture:

  • MIPS/DLX, plus more addressing modes, minus delay slots

  • 64-bit instruction encoding facilitates instruction set research

    • 16-bit space for hints, new instructions, and annotations

    • four-operand instruction format, up to 256 registers



Out-of-Order Simulator

  • Configurable set of functional units (FUs)


Configurable Memory Hierarchy

  • All cache and TLB configurations are specified with the same format:

    <nsets>:<bsize>:<assoc>:<repl>

  • Block replacement policy

    l - for LRU

    f - for FIFO

    r - for RANDOM
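As an illustration of the format above, here is a minimal Python sketch that parses a `<nsets>:<bsize>:<assoc>:<repl>` string. The names `CacheConfig` and `parse_cache_config` are hypothetical helpers, not part of SimpleScalar, and the capacity formula (sets × block size × associativity) is the usual one for a set-associative cache, expressed in whatever unit `bsize` uses.

```python
from dataclasses import dataclass

# Replacement-policy codes from the slide. Upper-case letters are accepted
# as well, because the later slides write "L" and "F".
POLICIES = {"l": "LRU", "f": "FIFO", "r": "RANDOM"}


@dataclass(frozen=True)
class CacheConfig:
    """Hypothetical container for one cache/TLB configuration."""
    nsets: int
    bsize: int
    assoc: int
    repl: str

    @property
    def capacity(self) -> int:
        """Total capacity: sets * block size * associativity."""
        return self.nsets * self.bsize * self.assoc


def parse_cache_config(spec: str) -> CacheConfig:
    """Parse '<nsets>:<bsize>:<assoc>:<repl>' into a CacheConfig."""
    fields = spec.strip().split(":")
    if len(fields) != 4:
        raise ValueError(f"expected 4 ':'-separated fields, got {spec!r}")
    nsets, bsize, assoc = (int(f) for f in fields[:3])
    repl = fields[3].strip().lower()
    if repl not in POLICIES:
        raise ValueError(f"unknown replacement policy {fields[3]!r}")
    return CacheConfig(nsets, bsize, assoc, POLICIES[repl])


cfg = parse_cache_config("128:32:4:l")
print(cfg.repl, cfg.capacity)
```

Parsing into a small immutable record makes it easy to derive quantities such as capacity when comparing configurations.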



Design Space Exploration

  • Metric definition

    • Energy*Delay

    • Area*Delay

  • Design space definition

    • L1 and L2 caches, number of ALUs, ...

  • Embedded Application Definition

  • Metric minimization

    • Exhaustive search

    • Greedy search

    • Gradient search

    • Simulated annealing, and so on


Design Space Exploration: A Case Study

  • Metric Defined:

    Price over Performance (PoP) = area * CPI

  • Design space:

    • Sets, block size, associativity, and replacement policy for each cache;

    • number of integer ALUs;

    • number of integer multipliers;

    • number of floating-point ALUs;

    • number of floating-point multipliers.

Design space exploration performed by F. Cassoli and A. Ferrante @ ALARI


Design Space Definition

  • Ranges for each parameter

    • DL1:128:{32, 64}:4:L

    • IL1:{256, 512}:32:1:L

    • UL2:{1024, 2048}:{64, 128}:4:{L, F}

    • IALU:{2, 4}

    • IMULT:{1, 2, 4}

    • FPALU:{1, 4}

    • FPMULT:{1, 2}

  • 768 different cases
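The count of 768 follows from multiplying the number of choices per parameter: 2 (DL1) × 2 (IL1) × 8 (UL2) × 2 × 3 × 2 × 2. A minimal Python sketch that enumerates the space and checks the count; the dictionary keys and the `enumerate_designs` helper are hypothetical names chosen for illustration.

```python
from itertools import product

# Free parameters taken from the ranges above; fixed fields (e.g. DL1 sets = 128)
# are filled in when the configuration strings are built.
DESIGN_SPACE = {
    "dl1_bsize": [32, 64],
    "il1_nsets": [256, 512],
    "ul2_nsets": [1024, 2048],
    "ul2_bsize": [64, 128],
    "ul2_repl": ["L", "F"],
    "ialu": [2, 4],
    "imult": [1, 2, 4],
    "fpalu": [1, 4],
    "fpmult": [1, 2],
}


def enumerate_designs(space):
    """Yield every point of the design space as a dict (Cartesian product)."""
    names = list(space)
    for values in product(*(space[n] for n in names)):
        yield dict(zip(names, values))


def cache_strings(design):
    """Render the three cache configurations in the slide's notation."""
    return {
        "DL1": f"128:{design['dl1_bsize']}:4:L",
        "IL1": f"{design['il1_nsets']}:32:1:L",
        "UL2": f"{design['ul2_nsets']}:{design['ul2_bsize']}:4:{design['ul2_repl']}",
    }


designs = list(enumerate_designs(DESIGN_SPACE))
print(len(designs))
print(cache_strings(designs[0]))
```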


Embedded Application

  • EPIC decoder (Efficient Pyramid Image deCoder)

    • Image data compression utility written in C.

    • Source freely available in the MediaBench suite.

    • Based on wavelet decomposition and a Huffman entropy (de)coder.


Cost Function

F(x) = A(x) * D(x)

  • A(x): area of x (sum of the equivalent gates of each module), using models found in the literature.

  • D(x): delay of x (computed by simulating EPIC on architecture x).
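The product form of the cost function can be sketched in Python as below. Everything other than the formula F(x) = A(x) * D(x) is an assumption for illustration: the gate counts in `GATES_PER_UNIT` are placeholder numbers, not the literature models used in the study, and `toy_delay` is a stand-in for the delay that the study obtains by simulation.

```python
from typing import Callable, Mapping

Design = Mapping[str, int]


def make_cost_function(
    area: Callable[[Design], float],
    delay: Callable[[Design], float],
) -> Callable[[Design], float]:
    """Return F(x) = A(x) * D(x) for the given area and delay models."""
    def cost(x: Design) -> float:
        return area(x) * delay(x)
    return cost


# Placeholder equivalent-gate counts per functional unit (NOT real data).
GATES_PER_UNIT = {"ialu": 1000.0, "imult": 3000.0}


def toy_area(x: Design) -> float:
    """Area as the sum of the equivalent gates of each module."""
    return sum(gates * x[unit] for unit, gates in GATES_PER_UNIT.items())


def toy_delay(x: Design) -> float:
    """Stand-in for a simulated delay; a real flow would run the simulator."""
    return 1.0 + 1.0 / x["ialu"]


pop = make_cost_function(toy_area, toy_delay)
print(pop({"ialu": 2, "imult": 1}))
```

Keeping the area and delay models as separate callables lets the simulator-backed delay be swapped in without touching the search code.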



Optimal Configuration

  • The lowest PoP value is 998,732.31, obtained with:

    DL1:128:32:4:L

    IL1:256:32:1:L

    UL2:1024:64:4:F

    IALU: 4

    IMULT: 2

    FPALU: 4

    FPMULT: 2


Cost Function Properties

  • The difference between the PoPs for DL1 block sizes of 32 and 64 is very small.

  • The difference between the PoPs for an IL1 cache of 256 and of 512 sets is very small.


Cost Function Properties

  • Increasing the number of UL2 sets increases the PoP (on average).

  • Increasing the block size of the UL2 cache always leads to an abrupt growth of the PoP.

  • The L2 cache grows so much that it becomes significantly larger than the rest of the system.
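To make the growth concrete, the following Python sketch computes the UL2 capacity for each combination in the explored range (sets in {1024, 2048}, block size in {64, 128}, associativity 4). It assumes the block size is given in bytes, as is conventional for SimpleScalar cache specifications; the function name is hypothetical.

```python
from itertools import product

UL2_NSETS = [1024, 2048]
UL2_BSIZE = [64, 128]   # assumed to be bytes
UL2_ASSOC = 4


def ul2_capacities():
    """Map (nsets, bsize) to capacity = nsets * bsize * assoc."""
    return {
        (nsets, bsize): nsets * bsize * UL2_ASSOC
        for nsets, bsize in product(UL2_NSETS, UL2_BSIZE)
    }


for (nsets, bsize), cap in ul2_capacities().items():
    print(f"UL2 {nsets}:{bsize}:{UL2_ASSOC} -> {cap // 1024} KiB")
```

Under that assumption the largest UL2 in the range holds four times as much as the smallest, and each doubling of sets or block size doubles the capacity.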






Conclusions

  • The PoP decreases when the number of integer ALUs is doubled: a large benefit for a small area increase.

  • The optimal configuration has IMULT = 2 (not 1 or 4), because EPIC does not expose much parallelism.

  • However, FPALU = 4 leads to better results than FPALU = 1.

  • The L2 FIFO policy outperforms LRU.

  • Adding an FPMULT brings the same benefits.


Conclusions

  • A greedy algorithm has also been applied to minimize the cost function.

  • Starting from different points:

    • average number of simulations required = 49

    • minimum number of simulations required = 11

    • maximum number of simulations required = 83

  • The full-search optimum was always reached.

  • Since an exhaustive search needs 768 simulations, the greedy search reduces the simulation time by about 93.6% on average (49 instead of 768 simulations).
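The slides do not specify which greedy variant was used. The Python sketch below shows one common choice, a coordinate-wise greedy descent that changes one parameter at a time and counts distinct cost evaluations (each would be one simulation). The toy design space and toy cost function are hypothetical and unrelated to the study's data. The last line reproduces the reported saving from the figures above.

```python
from itertools import product


def greedy_search(space, cost, start):
    """Coordinate-wise greedy descent.

    Returns the best design found and the number of distinct cost
    evaluations (i.e. simulations) that were needed.
    """
    evaluated = {}

    def evaluate(design):
        key = tuple(design[name] for name in space)
        if key not in evaluated:          # never simulate the same point twice
            evaluated[key] = cost(design)
        return evaluated[key]

    current = dict(start)
    improved = True
    while improved:
        improved = False
        for name, values in space.items():
            for value in values:
                candidate = {**current, name: value}
                if evaluate(candidate) < evaluate(current):
                    current = candidate
                    improved = True
    return current, len(evaluated)


# Hypothetical toy problem: 3 * 2 * 3 = 18 design points.
TOY_SPACE = {"a": [1, 2, 4], "b": [1, 2], "c": [8, 16, 32]}


def toy_cost(x):
    return (x["a"] - 2) ** 2 + (x["b"] - 2) ** 2 + x["c"]


best, n_sims = greedy_search(TOY_SPACE, toy_cost, {"a": 4, "b": 1, "c": 32})
print(best, n_sims)

# Saving reported on the slide: 49 simulations on average instead of 768.
reduction = 1 - 49 / 768
print(f"{reduction:.1%}")
```

On a separable toy cost like this one, the coordinate-wise search is guaranteed to find the exhaustive optimum; on a real cost function that is not guaranteed in general, which is why the slide reports it as an empirical result.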

