Presentation Transcript


  1. CAPS team Compilation et Architecture pour les Processeurs Superscalaires et Spécialisés Compiler and Architecture for superscalar and embedded processors

  2. CAPS members • 2 INRIA researchers: A. Seznec, P. Michaud • 2 professors: F. Bodin, J. Lenfant • 11 Ph.D. students: R. Amicel, R. Dolbeau, A. Monsifrot, L. Bertaux, K. Heydemann, L. Morin, G. Pokam, A. Djabelkhir, A. Fraboulet, O. Rochecouste, E. Toullec • 3 engineers: S. Bihan, P. Villalon, J. Simonnet

  3. CAPS themes • Two interacting activities • High performance microprocessor architecture • Performance oriented compilation

  4. CAPS Grail • Performance at the best cost • Progress in computer science and applications is driven by performance

  5. CAPS path to the Grail • Defining the tradeoffs between: • what should be done through hardware • what can be done by the compiler • for maximum performance • or for minimum cost • or for minimum size, power ..

  6. Need for high-performance processors • Current applications • general purpose: scientific, multimedia, databases, … • embedded systems: cell phones, automotive, set-top boxes, … • Future applications • don’t worry: users have a lot of imagination! • New software engineering techniques are CPU hungry: • reusability, generality • portability, extensibility (indirections, virtual machines) • safety (run-time verifications) • encryption/decryption

  7. CAPS (ancient) background • « ancient » background in hardware and software management of ILP • decoupled pipeline architectures • OPAC, a hardware matrix floating-point coprocessor • software pipelining for LIW architectures • « Supercomputing » background • interleaved memories • Fortran-S

  8. CAPS background in architecture • Solid knowledge in microprocessor architecture • technology watch on microprocessors • A. Seznec worked with the Alpha Development Group in 1999-2000 • Research on cache architectures • Research on branch prediction mechanisms

  9. CAPS background in compilers • Software optimizations for cache memories • Numerical algorithms on dense structures • Optimizing data layout • Many prototype environments for parallel compilers: • CT++ (with CEA): image processing C++ library for a SIMD architecture • Menhir: a parallel compiler for MATLAB • IPF (with Thomson-LER): Fortran compiler for image processing on the MasPar • Sage (with Indiana): infrastructure for source-level transformation
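
A minimal C sketch of the kind of cache-oriented source optimization listed above: loop tiling (blocking) applied to a dense matrix multiply so that each block stays cache-resident. The tile size TB is an illustrative assumption to be tuned per target, not a value taken from the CAPS work.

    #define N  512
    #define TB 64          /* tile size: assumed to fit the target's cache */

    /* Tiled (blocked) matrix multiply: a TB x TB block of each operand is
     * reused while it is cache-resident, instead of streaming whole rows.
     * C is assumed to be zero-initialized by the caller. */
    void matmul_tiled(double A[N][N], double B[N][N], double C[N][N])
    {
        for (int ii = 0; ii < N; ii += TB)
            for (int kk = 0; kk < N; kk += TB)
                for (int jj = 0; jj < N; jj += TB)
                    for (int i = ii; i < ii + TB; i++)
                        for (int k = kk; k < kk + TB; k++)
                            for (int j = jj; j < jj + TB; j++)
                                C[i][j] += A[i][k] * B[k][j];
    }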

  10. We build on • SALTO: System for Assembly-Language Transformations and Optimizations • retargetable assembly source-to-source preprocessor • Erven Rohou’s Ph.D. • TSF: • scripting language for program transformation on top of ForeSys (Simulog) • Yann Mevel’s Ph.D.

  11. SALTO overview • (Diagram: assembly language flows into SALTO, which is driven by a machine description and by a transformation tool written against its C++ interface, and produces assembly language as output) • Assembly source-to-source preprocessor • Fine-grain machine description • Independent from compilers

  12. Compiler activities • Code optimizations for embedded applications • infrastructures rather than compilers • optimizing compiler strategies rather than new code optimizations • Global constraints • performance / code size / low power (starting) • Focus on interactive rather than automatic tools • code tuning • case-based reasoning • assembly code optimizations

  13. Computer-aided hand tuning • Automatic optimization has many shortcomings • rather, provide the user with a testbed to hand-tune applications • Target applications • Fortran codes and embedded C applications • Our approach • case-based reasoning • static code analysis and pattern matching • profiling • learning techniques • the user remains ultimately responsible

  14. CAHT • Prototype built on • Foresys: Fortran interactive front-end (from Simulog) • TSF: scripting language for program transformation • Sage++: infrastructure for source-level transformation

  15. Analysis and Tuning tool for Low Level Assembly and Source code (with Thomson Multimedia) • ATLLAS objectives: • Has the compiler done a good job? • Try to match source and optimized assembly at fine grain • Development/analysis environment: • models for both source and assembly • global and local analysis (WCET, …) at both levels • interactive environment for code visualization and manual/automatic analysis and optimization • Built using SALTO and Sage++: • retargetable across compilers and architectures

  16. ATLLAS - Analysis and Tuning tool for Low Level Assembly and Source code: tuning method • (Flow diagram: source code is compiled to assembly code and profiled; ATLLAS processing support performs code matching, analysis and evaluations with a graphic display of assembly and source code; if the result is not yet good, half-automatic or manual source optimisations and assembly optimisations are applied, followed by post-processing, and the cycle repeats until it is.)

  17. Assembly Level Infrastructure for Software Enhancement (with STMicroelectronics) • ALISE • enhanced SALTO for code optimization: • better integration with code generation • interface with the front-end • interface for profiling data • targets global optimization • based on component software optimization engines • Answer to a real need from industry: • a retargetable infrastructure

  18. ALISE • Environment for: • global assembly code optimization • providing optimization alternatives • Support for new embedded processors • ISAs with ILP support (VLIW, EPIC) • Predicated instructions • Functional unit clusters, ..

  19. ALISE architecture • (Diagram: text input is parsed (P to IR) into the intermediate representation; an architecture description is turned (D to M) into the architecture model; optimization components Opt 1 … Opt n operate on the intermediate code through a high-level API; the optimized program is emitted back to assembly (IR to Ass / Emit); interfaces connect the user interface / G.U.I. and external infrastructures.)

  20. Preprocessor for media processors (MEDEA+ Mesa project) • Multimedia instructions exist on embedded and general-purpose processors, but: • no consensus on multimedia instructions among manufacturers: • saturated arithmetic or not, different instructions, … • Multimedia instructions are not well handled by compilers: • yet performance depends heavily on them

  21. Preprocessor for media processors: our approach • C source-to-source preprocessor • user-oriented idiom recognition: • easy to retarget • target-dedicated recognition • exploiting loop parallelism • vectorization techniques • multiprocessor systems • available soon • Collaboration with STMicroelectronics
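
As an illustration of the kind of source idiom such a preprocessor is meant to recognize, here is a small portable C sketch of saturated 8-bit addition; a target-dedicated recognizer could map this loop to a single SIMD saturated-add instruction where the ISA provides one (no particular target instruction is assumed here).

    #include <stdint.h>

    /* Saturated unsigned 8-bit addition, the classic multimedia idiom:
     * widen, add, then clamp to 255 instead of wrapping around. */
    void add_saturated_u8(uint8_t *dst, const uint8_t *a, const uint8_t *b, int n)
    {
        for (int i = 0; i < n; i++) {
            int sum = a[i] + b[i];                       /* widen to avoid wrap-around */
            dst[i] = (sum > 255) ? 255 : (uint8_t)sum;   /* the saturation clamp       */
        }
    }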

  22. Iterative compilation • Embedded systems: • compile time is not critical • performance/code size/power are critical • one can often rely on profiling • Classical compilers apply local optimizations • but the constraints are GLOBAL • Proof of concept for code size (Rohou’s Ph.D.) • new Ph.D. starting in September 2000
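
A minimal sketch of what an iterative compilation driver can look like, under stated assumptions: the "cc" invocation, the flag sets and the "app.c" file name are illustrative, and a real driver would also run the profiled workload and check performance and power budgets rather than only code size.

    #include <stdio.h>
    #include <stdlib.h>
    #include <sys/stat.h>

    static long binary_size(const char *path)
    {
        struct stat st;
        return (stat(path, &st) == 0) ? (long)st.st_size : -1;
    }

    int main(void)
    {
        const char *flag_sets[] = { "-O1", "-O2", "-O3", "-Os", "-O2 -funroll-loops" };
        const char *best_flags = NULL;
        long best_size = -1;
        char cmd[256];

        for (int i = 0; i < (int)(sizeof flag_sets / sizeof flag_sets[0]); i++) {
            /* Recompile the whole application with one candidate configuration. */
            snprintf(cmd, sizeof cmd, "cc %s -o app app.c", flag_sets[i]);
            if (system(cmd) != 0)
                continue;
            long size = binary_size("app");
            if (size > 0 && (best_size < 0 || size < best_size)) {
                best_size  = size;            /* keep the globally best configuration */
                best_flags = flag_sets[i];
            }
        }
        printf("best flags: %s (%ld bytes)\n", best_flags ? best_flags : "none", best_size);
        return 0;
    }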

  23. High performance instruction set simulation • Embedded processors: • parallel development of silicon, ISA, compiler and applications • Need for flexible instruction set simulation: • high performance • simulation of large codes • debugging • retargetable, to experiment with: • new ISAs • various microarchitecture options • First results: up to 50x faster than an ad-hoc simulator
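
The core idea of compiled (as opposed to interpreted) instruction set simulation can be sketched as follows; the three-instruction target fragment and the register-file layout are illustrative, not the internals of the tool described on the next slide. Each target instruction is translated ahead of time into host C code, so the simulator runs generated code at host speed instead of decoding instructions in an interpreter loop.

    #include <stdint.h>
    #include <stdio.h>

    static int32_t reg[32];              /* simulated target register file */

    /* Host code generated once for the target fragment:
     *     addi r1, r0, 5
     *     addi r2, r0, 7
     *     add  r3, r1, r2                                                 */
    static void simulated_block(void)
    {
        reg[1] = reg[0] + 5;
        reg[2] = reg[0] + 7;
        reg[3] = reg[1] + reg[2];
    }

    int main(void)
    {
        simulated_block();
        printf("r3 = %d\n", reg[3]);     /* prints 12 */
        return 0;
    }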

  24. ABSCISS: Assembly Based System for Compiled Instruction Set Simulation • (Tool-chain diagram: C source is compiled by tmcc into TriMedia assembly; on one path, tmas produces a TriMedia binary run by tmsim; on the other, ABSCISS, driven by an architecture description, turns the assembly into C/C++ source that gcc builds into a compiled simulator.)

  25. Enabling superscalar processor simulation • Complete out-of-order microprocessor simulation: • 10,000-100,000x slower than real hardware • cannot simulate realistic applications in full, only slices • even fast-mode emulation is slow (50-100x): • simulation is generally limited to slices at the beginning of the application • representativeness? • Calvin2 + DICE: • combines direct execution with simulation • really fast mode: 1-2x slowdown • enables simulating slices distributed over the whole application
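
A sketch of the sampling scheme this enables, with the caveat that run_native and simulate_detailed are illustrative placeholders rather than the Calvin2/DICE interface, and the interval and slice lengths are arbitrary example values: most intervals execute directly at near-native speed, and detailed simulation is switched on only for short slices spread over the whole run.

    #define INTERVAL  100000000L    /* instructions executed per interval             */
    #define SLICE       1000000L    /* instructions simulated in detail per sample    */
    #define PERIOD           10     /* take one detailed slice every PERIOD intervals */

    static void run_native(long n)        { (void)n; /* direct execution        */ }
    static void simulate_detailed(long n) { (void)n; /* cycle-level simulation  */ }

    void run_program(long total_instrs)
    {
        long done = 0;
        for (int interval = 0; done < total_instrs; interval++) {
            if (interval % PERIOD == 0) {
                simulate_detailed(SLICE);        /* switch to detailed simulation */
                run_native(INTERVAL - SLICE);    /* switch back to fast mode      */
            } else {
                run_native(INTERVAL);
            }
            done += INTERVAL;
        }
    }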

  26. Calvin2 + DICE • (Diagram: the Calvin2 static code annotation tool instruments SPARC V9 assembly code and links it with user analysis routines; at run time, switching events toggle between direct execution of the original code and emulation mode under the DICE host-ISA emulator, with checkpoints taken along the way.)

  27. Moving tools to IA64 • New 64-bit ISA from Intel/HP: • Explicitly Parallel Instruction Computing • predicated execution • advanced loads (i.e. speculative) • A very interesting platform for research!! • Porting the SALTO and Calvin2+DICE approach to IA64 • Exploring new trade-offs enabled by the instruction set: • predicting the predicates? • advanced loads against predicting dependencies • ultimate out-of-order execution against the compiler
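
To illustrate what predicated execution buys, here is a C-level sketch of if-conversion, shown in plain C rather than IA-64 assembly and with illustrative function names: both paths are computed unconditionally and a predicate selects the result, so the hardware never has to predict the branch.

    /* Branchy version: the hardware must predict the test. */
    int abs_branchy(int x)
    {
        if (x < 0)
            return -x;
        return x;
    }

    /* If-converted version: compute both candidates, then select on a
     * predicate; a predicated or conditional-move ISA can execute this
     * without any branch. */
    int abs_predicated(int x)
    {
        int p   = (x < 0);      /* the predicate                                */
        int neg = -x;           /* executed unconditionally                     */
        return p ? neg : x;     /* selection, not a branch at the machine level */
    }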

  28. Low power, compilation, architecture, … (just beginning :-) • Power consumption becomes a major issue: • embedded and general purpose • Compilation (setting up a collaboration with STMicroelectronics/Stanford/Milan): • Is it different from performance optimization? • global constraint optimization • instruction set architecture support? • Architecture: • high-order bits are generally null, … • registers and memory • ALUs

  29. Caches and branch predictors • International CAPS visibility in architecture = • skewed associative cache • + decoupled sectored cache • + multiple block ahead branch prediction • + skewed branch predictor • Continue recurrent work on these topics: • multiple block ahead • complexity/accuracy tradeoffs
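
A minimal C sketch of the skewed-associative idea, using two simple XOR-based index functions as an assumption (the skewing functions published for the skewed-associative cache are chosen more carefully): each way indexes the same address with a different function, so lines that conflict in one way usually do not conflict in the other.

    #include <stdint.h>

    #define SETS 256                       /* sets per way */

    typedef struct { uint32_t tag; int valid; } line_t;
    static line_t way0[SETS], way1[SETS];

    /* Each way uses its own index function of the line address. */
    static unsigned index_way0(uint32_t addr) { return (addr >> 6) & (SETS - 1); }
    static unsigned index_way1(uint32_t addr) { return ((addr >> 6) ^ (addr >> 14)) & (SETS - 1); }

    int lookup(uint32_t addr)
    {
        uint32_t tag = addr >> 6;          /* 64-byte lines: tag = line address */
        unsigned i0 = index_way0(addr), i1 = index_way1(addr);
        return (way0[i0].valid && way0[i0].tag == tag) ||
               (way1[i1].valid && way1[i1].tag == tag);
    }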

  30. Simultaneous Multithreading • Sharing functional units among several processes • Among the first groups working on this topic (S. Hily’s Ph.D.) • SMT behavior is well understood for independent threads • now, focus on parallel threads from a single application • Current research directions: • speculative multithreading • ultimate performance with a single thread through predicting threads • performance/complexity tradeoffs: SMT/CMP/hybrid

  31. « Enlarging » the instruction window (supported by Intel) • In an O-O-O processor, fireable instructions are chosen in a window of a few tens of RISC-like instructions. • Limitations are: • size of the window • number of physical registers • Prescheduling: • separate data flow scheduling from resource arbitration. • coarser units of work ? • Reducing the number of physical registers: • how to detect when a physical register is dead ? • Per group validation ? revisiting CISC/RISC war ?
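
For the "when is a physical register dead?" question, the classical answer can be sketched as follows (the data structures are illustrative, not a specific processor's): at rename time the old mapping of the architectural destination is recorded, and that old physical register is returned to the free list only when the renaming instruction commits.

    #define ARCH_REGS 32
    #define PHYS_REGS 80

    static int map[ARCH_REGS];             /* architectural -> physical mapping */
    static int free_list[PHYS_REGS];       /* stack of free physical registers  */
    static int free_top;

    typedef struct {
        int arch_dest;                     /* architectural destination register */
        int new_phys;                      /* physical register given at rename  */
        int old_phys;                      /* previous mapping, freed at commit  */
    } rob_entry_t;

    /* Start-up: identity mapping, remaining physical registers are free. */
    void init_rename(void)
    {
        for (int a = 0; a < ARCH_REGS; a++)
            map[a] = a;
        for (int p = ARCH_REGS; p < PHYS_REGS; p++)
            free_list[free_top++] = p;
    }

    /* Rename: allocate a fresh physical register, remember the old mapping. */
    void rename_dest(rob_entry_t *e, int arch_dest)
    {
        e->arch_dest   = arch_dest;
        e->old_phys    = map[arch_dest];
        e->new_phys    = free_list[--free_top];   /* assumes one is available */
        map[arch_dest] = e->new_phys;
    }

    /* Commit: only now is the old physical register known to be dead. */
    void commit(rob_entry_t *e)
    {
        free_list[free_top++] = e->old_phys;
    }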

  32. Unwritten rule on superscalar processor designs • For general purpose registers: Any physical register can be the source or the result of any instruction executed on any functional unit

  33. 4-cluster WSRS architecture (supported by Intel) • (Diagram: four clusters C0-C3, each paired with register file slices S0-S3) • Half the read ports, one fourth the write ports • Register file: • silicon area x 1/8 • power x 1/2 • access time x 0.6 • Gains on: • bypass network • selection logic

  34. Multiprocessor on a chip • Not just replicating board-level solutions! • A way to manage a large on-chip cache capacity: • how can a sequential application efficiently use a distributed cache? • architectural support for distributing a sequential application over several processors? • how should instructions and data be distributed?

  35. HIPSOR: HIgh Performance SOftware Random number generation • Need for unpredictable random number generation: sequences that cannot be reproduced • State of the art: • < 100 bit/s using the operating system • 75 Kbit/s using the hardware generator on the Pentium III • The internal state of a superscalar processor cannot be reproduced • use this state to generate unpredictable random numbers

  36. HIPSOR (2) (ARC INRIA with CODES) • 1000’s of unmonitorable states modified by OS interrupts • hardware clock counter used to indirectly probe these states • combined with in-line pseudo-random number generation • 100 Mbit/s of unpredictable random numbers
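
A hedged sketch of the principle, assuming an x86 host where the hardware clock counter is read with the __rdtsc intrinsic; the xorshift mixing step is an illustrative stand-in for HIPSOR's actual in-line pseudo-random generation. Hard-to-reproduce timing state, perturbed by interrupts and microarchitectural activity, is folded into a software generator so the output sequence cannot be replayed.

    #include <stdint.h>
    #include <stdio.h>
    #include <x86intrin.h>

    static uint64_t state = 0x9e3779b97f4a7c15ULL;   /* arbitrary non-zero seed */

    static uint64_t hipsor_like(void)
    {
        state ^= __rdtsc();      /* inject timing state perturbed by interrupts,
                                    caches, branch predictors, ...             */
        state ^= state << 13;    /* in-line pseudo-random mixing (xorshift)     */
        state ^= state >> 7;
        state ^= state << 17;
        return state;
    }

    int main(void)
    {
        for (int i = 0; i < 4; i++)
            printf("%016llx\n", (unsigned long long)hipsor_like());
        return 0;
    }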
