Exploring Design Space of VLIW Architectures

Giuseppe Ascia, Vincenzo Catania, Maurizio Palesi and Davide Patti. Università di Catania, Dipartimento di Ingegneria Informatica e delle Telecomunicazioni (DIIT), University of Catania, Italy.


Presentation Transcript


  1. Università di Catania Dipartimento di Ingegneria Informatica e delle Telecomunicazioni Exploring Design Space of VLIW Architectures Giuseppe Ascia, Vincenzo Catania, Maurizio Palesi and Davide Patti DIIT - University of Catania, Italy

  2. Outline • Introduction • VLIW in past & future • Design Exploration Framework • ILP oriented compilation • Genetic Design Space Exploration • Conclusions

  3. Instruction Level Parallelism • High-performance processors in the 1980s: maximize ILP • Issue more than one instruction in a given clock cycle • Who decides which instructions can be executed in parallel? • Two different philosophies: • Superscalar • Very Long Instruction Word (VLIW)

  4. ILP philosophy: Superscalar • Hide the process of finding ILP • ILP is discovered dynamically at run-time by the control hardware of the processor [Figure: Foo.c, compiler, instruction stream Op1 Op2 Op3 Op4 Op5 …, HW; at run-time the operations are grouped as Op1,Op2 | Op3 | Op4,Op5 …]

  5. ILP philosophy: VLIW • Hardware resources are architecturally visible to the compiler • The compiler creates a sequence of Very Long Instructions that defines the plan of execution • The HW simply executes the plan [Figure: Foo.c and the hardware resources configuration, compiler, plan of execution Op1,Op2 | Op3 | Op4,Op5, HW at run-time]

  6. VLIW past & future • Decline of VLIWs for general purpose systems: • Couldn't be integrated in a single chip • Lack of binary compatibility between implementations • Rediscovery of VLIW in embedded systems • No more integrability issues • Binary incompatibility not relevant • Advantages of VLIW: • Simplified hardware • The architecture can be optimized ad hoc to achieve ILP

  7. Reference architecture (HPL-PD) [Figure: block diagram with Prefetch Cache, Fetch Unit, Instruction Queue, Decode and Control Logic; L1 Instruction Cache, L1 Data Cache, Prefetch Unit, L2 Unified Cache; Predicate, Branch, General Purpose, Floating Point and Control Registers; Branch Unit, Integer Unit, Floating Point Unit, Load/Store Unit]

  8. Configuration Space Three main parameter categories: • VLIW core: • Number of registers in each register file (from 16 to 256) • Number of instances of functional units of each type (from 1 to 6) • Memory hierarchy: • Size, block size and associativity for each of the caches (L1 Instruction, L1 Data, L2) • Compiler: • Conservative compilation strategy (basic blocks) • Aggressive ILP-oriented compilation strategy (hyperblocks) Total space size: $1.47 \times 10^{13}$ configurations!

  9. Required Tools • High level estimation models • Design Space Exploration strategy [Figure: blocks Application.c, Configuration, Compiler, Simulator, Estimator, Exploration Algorithm; intermediate results Performances, Power, …; output Pareto configurations]

  10. An Open Platform: EPIC Explorer • Interfaces to the Trimaran framework, which provides a VLIW compiler and a simulator for dynamic statistics • Estimator component implementing high level models • Explorer component implementing multi-objective design space exploration algorithms

  11. The Exploration Data Flow [Figure: data-flow diagram with blocks Foo.c, IMPACT, ELCOR, Emulib, foo.exe, Execution statistics, Estimator (Processor, Memory), Explorer, System configuration; estimated quantities Cycles, Energy, Power]

  12. Energy estimation • Subdivide the architecture into Functional Block Units (FBUs) • Instruction decode logic, integer units, floating point units, register files • For each FBU (from ST Microelectronics LX) • Active power: average power dissipated when the FBU is used • Inactive power: average power dissipated when the FBU is not used • From the execution statistics, we know how many cycles each FBU has been active/inactive • $E_{FBU} = (P_{active} \cdot cycles_{active} + P_{inactive} \cdot cycles_{inactive}) \cdot T_{clock}$ • Moderate degree of accuracy (about 25%) • Investigate relative power savings between designs
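
The per-FBU energy formula on this slide can be sketched in a few lines of Python. This is only an illustration: the `FBU` class, the function name and all numeric values are hypothetical and are not taken from EPIC Explorer or from the ST Microelectronics LX data.

```python
from dataclasses import dataclass


@dataclass
class FBU:
    """One Functional Block Unit with its two average power figures (hypothetical values)."""
    name: str
    p_active_w: float    # average power when the FBU is used, in watts
    p_inactive_w: float  # average power when the FBU is not used, in watts


def fbu_energy(fbu: FBU, cycles_active: int, cycles_inactive: int, t_clock_s: float) -> float:
    """E_FBU = (P_active * cycles_active + P_inactive * cycles_inactive) * T_clock, in joules."""
    return (fbu.p_active_w * cycles_active
            + fbu.p_inactive_w * cycles_inactive) * t_clock_s


# Made-up numbers: a 200 MHz clock (5 ns period) and a 1M-cycle run.
int_unit = FBU("integer_unit", p_active_w=0.020, p_inactive_w=0.002)
energy = fbu_energy(int_unit, cycles_active=600_000, cycles_inactive=400_000, t_clock_s=5e-9)
print(f"{int_unit.name}: {energy:.3e} J")
```

Summing `fbu_energy` over all FBUs would give a total core energy under the same assumptions.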

  13. Reference Application Set • Chosen from MediaBench suite

  14. Exploration Methodology • Preliminary analysis of compilation • Impact of ILP-oriented code transformations • Predict the right compilation strategy: • Basic Blocks (conservative) • Hyperblocks (aggressive, ILP-oriented) • Multi-objective Design Space Exploration • Extract Pareto Set
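
As a minimal sketch of what extracting a Pareto set means, the following Python filters out every design point that is dominated by another one. It assumes all objectives are to be minimized; the (cycles, power) tuples are invented for illustration.

```python
def dominates(a, b):
    """True if a is no worse than b in every objective and strictly better in at least one
    (all objectives are minimized)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def pareto_set(points):
    """Keep only the points that no other point dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Hypothetical (cycles, power) pairs for five configurations.
designs = [(100, 5.0), (120, 3.0), (110, 6.0), (150, 2.5), (120, 3.5)]
front = pareto_set(designs)
print(front)
```

This quadratic filter is fine for small sets; it is not necessarily how the Explorer component does it.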

  15. Preliminary Analysis (1/3) • For each objective, an unpaired two-sample t-test estimates the average effect of hyperblock formation • Is the mean effect on the objective significant with respect to the chosen critical difference? [Figure: random subsets of n configurations are drawn from the Configuration Space and compiled with (H) and without (N) hyperblock formation; the resulting CN, ON and CH, OH feed the t-test]
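
The comparison can be sketched with the Python standard library. Note the assumptions: this uses Welch's unequal-variance form of the unpaired two-sample t-test, which may differ from the variant used by the authors, and the sample values are invented.

```python
import math
from statistics import mean, variance


def welch_t(sample_n, sample_h):
    """Unpaired two-sample t statistic (Welch form) and its approximate degrees of freedom.

    sample_n: objective values without hyperblock formation (N)
    sample_h: objective values with hyperblock formation (H)
    """
    m_n, m_h = mean(sample_n), mean(sample_h)
    # statistics.variance is the sample variance (n - 1 denominator).
    se_n = variance(sample_n) / len(sample_n)
    se_h = variance(sample_h) / len(sample_h)
    t = (m_h - m_n) / math.sqrt(se_n + se_h)
    # Welch-Satterthwaite approximation of the degrees of freedom.
    df = (se_n + se_h) ** 2 / (
        se_n ** 2 / (len(sample_n) - 1) + se_h ** 2 / (len(sample_h) - 1)
    )
    return t, df


# Hypothetical execution times (in millions of cycles) on four random configurations.
without_hyperblocks = [10.0, 12.0, 11.0, 13.0]
with_hyperblocks = [8.0, 9.0, 7.0, 8.0]
t_stat, dof = welch_t(without_hyperblocks, with_hyperblocks)
print(f"t = {t_stat:.3f}, df = {dof:.2f}")
```

A negative `t_stat` here means the objective is lower on average with hyperblock formation; whether the difference counts as significant still depends on the chosen critical difference.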

  16. Preliminary Analysis (2/3) • Example of a metric for critical difference in means: d > 50% M

  17. Preliminary Analysis (3/3) ILP-oriented compilation impact (positive, negative)

  18. DSE: Genetic Mapping [Figure: chromosome layout with genes labelled Size, BSize, Assoc, Func units, Register Files, grouped under Mem Cache, VLIW core and Bus ctrl]

  19. DSE: Genetic Iteration [Figure: genetic loop with blocks Current Population, Individual (Architecture configuration), Simulation, Estimation, Performance, Power, Fitness Evaluation, Selected?, Crossover, Mutation, Descendant (New Architecture configuration)]

  20. DSE: Experimental Results • Parameters: • Initial population: 30 individuals • Crossover probability: 0.8 • Mutation probability: 0.1 • Generations: 50 • Example of two different scenarios: • G721 encode: the exploration should also cover the compilation strategy • GSM-encode: hyperblock formation is predicted to be the better choice
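
A toy genetic loop using the parameters listed on this slide (population 30, crossover 0.8, mutation 0.1, 50 generations) might look as follows. Everything else is an assumption made for illustration: the three-parameter space, the scalar `toy_cost` standing in for simulation plus estimation, tournament selection, one-point crossover, per-gene mutation and elitism. The presentation itself uses a multi-objective, Pareto-based fitness rather than a single scalar.

```python
import random

# Hypothetical parameter space; the real one is far larger.
PARAMS = {
    "gpr_size": [16, 32, 64, 128, 256],
    "int_units": [1, 2, 3, 4, 5, 6],
    "l1d_size_kb": [1, 2, 4, 8, 16, 32],
}
KEYS = list(PARAMS)


def decode(ind):
    """Map a chromosome (list of value indices) to a configuration dict."""
    return {k: PARAMS[k][g] for k, g in zip(KEYS, ind)}


def random_individual(rng):
    return [rng.randrange(len(PARAMS[k])) for k in KEYS]


def toy_cost(ind):
    """Made-up stand-in for simulation + estimation; lower is better."""
    cfg = decode(ind)
    cycles = 1000.0 / cfg["int_units"] + 2000.0 / cfg["l1d_size_kb"] + 4000.0 / cfg["gpr_size"]
    power = 0.5 * cfg["int_units"] + 0.1 * cfg["l1d_size_kb"] + 0.01 * cfg["gpr_size"]
    return cycles * power


def tournament(pop, rng):
    """Binary tournament: pick two distinct individuals, keep the cheaper one."""
    a, b = rng.sample(pop, 2)
    return a if toy_cost(a) < toy_cost(b) else b


def evolve(pop_size=30, p_cross=0.8, p_mut=0.1, generations=50, seed=0):
    rng = random.Random(seed)
    pop = [random_individual(rng) for _ in range(pop_size)]
    for _ in range(generations):
        children = [min(pop, key=toy_cost)]  # elitism: carry the best individual over
        while len(children) < pop_size:
            p1, p2 = tournament(pop, rng), tournament(pop, rng)
            child = list(p1)
            if rng.random() < p_cross:  # one-point crossover
                cut = rng.randrange(1, len(KEYS))
                child = p1[:cut] + p2[cut:]
            for i, k in enumerate(KEYS):  # per-gene mutation
                if rng.random() < p_mut:
                    child[i] = rng.randrange(len(PARAMS[k]))
            children.append(child)
        pop = children
    best = min(pop, key=toy_cost)
    return decode(best), toy_cost(best)


best_cfg, best_cost = evolve()
print(best_cfg, round(best_cost, 1))
```

Because of the elitism step, the best cost never gets worse from one generation to the next; in a real exploration each `toy_cost` call would be a full compile, simulate and estimate cycle.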

  21. Pareto Set (G721 encode)

  22. Pareto Set (GSM-encode)

  23. Conclusions • Open platform for VLIW space exploration • Estimate Power, Energy and Performance • Preliminary analysis of ILP-oriented compilation • Genetic multi-objective design space exploration • Future developments • Clustered VLIW • Network-on-chip multiprocessors • Open source: http://epic-explorer.sourceforge.net

  24. Thanks for your attention !

  25. Appendix • Bus Power Estimation • Implemented Algorithms • Multiobjective Fitness assignment • How Many Generations?

  26. Summarizing Table

  27. Power Estimation (buses) • Bus line transitions computed from the list of data/address memory accesses • $P_{bus} = 0.5 \cdot V_{dd}^2 \cdot \alpha \cdot f \cdot C_l$ • $V_{dd}$: supply voltage • $\alpha$: switching activity • $f$: clock frequency • $C_l$: capacitance of a bus line
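
One way to picture this estimate is to count bit toggles between consecutive values placed on the bus and plug the resulting switching activity into the power formula. The sketch below assumes that switching activity means the average fraction of lines toggling per transfer and that the result is a per-line power; the function names, the 4-bit bus and all electrical values are hypothetical.

```python
def switching_activity(words, width):
    """Average fraction of bus lines that toggle between consecutive transfers."""
    mask = (1 << width) - 1
    transfers = len(words) - 1
    if transfers <= 0:
        return 0.0
    # XOR of consecutive words marks the lines that changed; count the set bits.
    toggles = sum(bin((a ^ b) & mask).count("1") for a, b in zip(words, words[1:]))
    return toggles / (transfers * width)


def bus_line_power(vdd, alpha, f_hz, c_line_f):
    """P_bus = 0.5 * Vdd^2 * alpha * f * C_l, in watts."""
    return 0.5 * vdd ** 2 * alpha * f_hz * c_line_f


# Hypothetical 4-bit address trace and electrical parameters.
trace = [0x0, 0xF, 0x0, 0x3]
alpha = switching_activity(trace, width=4)
p_line = bus_line_power(vdd=2.0, alpha=alpha, f_hz=100e6, c_line_f=1e-12)
print(f"alpha = {alpha:.4f}, P = {p_line:.3e} W")
```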

  28. Design Space Exploration Implemented Algorithms: • Exhaustive: intuitive, simple and … infeasible • Dependency analysis (dep), Givargis et al., [TVLSI'02] • GA-based DSE (ga), Palesi et al., [CODES'01] • Sensitivity Analysis, Fornaciari et al., [DAES'02] • Pareto-based Sensitivity Analysis (pbsa), Palesi et al., [VLSI-SOC'01]

  29. Multiobjective Fitness assignment • Strength Pareto Approach [Zitzler, Thiele] • From the current population P, an external set P* is extracted, containing the nondominated configurations of P • Fitness of P* element j: $f_j = n/(N+1)$ • N = total size of P • n = number of P configurations dominated by j • Fitness of P element i: $1/S$ • S is the sum of the fitness values of the P* elements that dominate i
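
The fitness rules on this slide can be written out as a short Python sketch. It follows the formulas exactly as stated here (including 1/S for the dominated individuals), assumes minimized objectives, takes N as the size of the population before the nondominated points are moved to P*, and uses invented objective tuples. Duplicate points would collapse into one dictionary key, which is acceptable for an illustration.

```python
def dominates(a, b):
    """a dominates b: no worse in every (minimized) objective, strictly better in one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def strength_pareto_fitness(population):
    """Return (fitness of P* elements, fitness of the remaining P elements)."""
    external = [p for p in population if not any(dominates(q, p) for q in population)]
    rest = [p for p in population if p not in external]
    big_n = len(population)  # assumption: N counts the whole population

    # f_j = n / (N + 1), with n = number of configurations dominated by j.
    external_fitness = {
        j: sum(dominates(j, i) for i in population) / (big_n + 1) for j in external
    }

    # Fitness of a dominated element i is 1 / S, where S sums the fitness of the
    # P* elements dominating i. Every dominated point has at least one such
    # element, so S > 0.
    rest_fitness = {}
    for i in rest:
        s = sum(external_fitness[j] for j in external if dominates(j, i))
        rest_fitness[i] = 1.0 / s
    return external_fitness, rest_fitness


# Hypothetical (delay, power) objective pairs.
pop = [(1, 4), (2, 2), (4, 1), (3, 3), (4, 4)]
ext_fit, pop_fit = strength_pareto_fitness(pop)
print(ext_fit)
print(pop_fit)
```

In this example the point dominated by a single P* element ends up with a larger value than the point dominated by all three.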

  30. How Many Generations? • Fixed number of generations • Autostop criteria • Based on convergence delay power
