Architecture and Code optimization at INRIA March 2005 André Seznec

Presentation Transcript


  1. Architecture and Code optimization at INRIA, March 2005. André Seznec • Alchemy: Paris • CAPS: Rennes • COMPSYS: Lyon • R2D2: Rennes

  2. INRIA: Institut National de Recherche en Informatique et en Automatique • French research institute in computer science and applied mathematics: • 500 senior research scientists • 8 locations: • Joint centers with universities and CNRS • About 1500 scientists in total (including Ph.D. students) • About 100 teams: • Covering all topics in computer science

  3. Four research groups in architecture/compilers • CAPS: Compilation and Architectures for Superscalar and Special purpose processors • ALCHEMY: Architectures, Languages and Compilers to Harness the End of Moore Years • COMPSYS: Compiling Systems on Silicium • R2D2: Reconfigurable and Retargetable Digital Devices • About 20 researchers or faculty members + about 20 Ph.D. students

  4. COMPSYS (created 2003) • T. Risset, P. Feautrier, A. Darte: background in automatic parallelization (polyhedral model), systolic networks, software pipelining • Applying parallel compiler techniques to embedded systems: • Compilers • “From the algorithm to a silicon system”

  5. R2D2 team (Rennes): Reconfigurable and Retargetable Digital Devices • [Figure: design space spanning ASIC, processor, DSP, FPGA, reconfigurable hardware (DART) and memory] • Goal: search for the best compromise between high performance, power consumption and flexibility, using reconfigurable hardware for embedded systems

  6. R2D2 research fields • Compilation and synthesis targeting reconfigurable architectures • High-level synthesis from high-level specifications • Retargetable compilation and processor core modelling • Floating-point to fixed-point conversion methodology targeting software (DSP) and hardware (FPGA) • New architectures and technologies • Coarse-grained reconfigurable architecture (DART reconfigurable data path) • Multiple-valued logic (MVL) architectures and circuits • Prototyping of applications on reconfigurable platforms • 3G and 4G mobile application prototyping • Contacts: F. Charot and O. Sentieys
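The floating-point to fixed-point conversion mentioned above comes down, for each variable, to choosing a binary point from the value's dynamic range and then quantizing. A minimal Python sketch, assuming 16-bit two's-complement words, round-to-nearest (Python's `round`, ties to even) and saturation; these choices and the helper names are hypothetical illustrations, not R2D2's methodology:

```python
import math


def frac_bits_for(max_abs: float, word_bits: int = 16) -> int:
    """Fractional bits left after reserving one sign bit and enough integer
    bits that every value with magnitude <= max_abs fits (hypothetical helper)."""
    int_bits = max(0, math.floor(math.log2(max_abs)) + 1) if max_abs > 0 else 0
    return word_bits - 1 - int_bits


def to_fixed(x: float, frac_bits: int, word_bits: int = 16) -> int:
    """Quantize x: scale by 2**frac_bits, round to nearest, saturate to the word."""
    q = round(x * (1 << frac_bits))
    lo, hi = -(1 << (word_bits - 1)), (1 << (word_bits - 1)) - 1
    return max(lo, min(hi, q))


def to_float(q: int, frac_bits: int) -> float:
    """Convert a fixed-point integer back to a float for checking."""
    return q / (1 << frac_bits)


def fixed_mul(a: int, b: int, frac_bits: int) -> int:
    """Multiply two values in the same format. The raw product carries
    2*frac_bits fractional bits, so shift right (floor) to renormalize.
    No saturation is applied to the result in this sketch."""
    return (a * b) >> frac_bits
```

For a signal bounded by 0.9, `frac_bits_for(0.9)` gives 15 fractional bits (Q15), and multiplying the Q15 encodings of 0.75 and 0.5 with `fixed_mul` yields exactly 0.375. A real conversion flow would also have to bound accumulated rounding noise and intermediate overflow, which this sketch ignores.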

  7. ALCHEMY (Saclay): Architectures, Languages and Compilers to Harness the End of Moore Years • Main topic: long-term performance scalability of architectures • 4 faculty members • A. Cohen: compilers, automatic parallelization • C. Eisenbeis: software pipelining, software cache optimization • F. Gruau: futuristic architecture model • O. Temam: microarchitecture, software cache management • 10 Ph.D. students

  8. ALCHEMY research map [diagram] • New technologies? • BLOB computing (model) • Complex systems • Language + architecture (domain-specific, e.g., video processing) • Bio-inspired systems • Language + architecture (general-purpose) • Scalable processor architectures • Compiler optimizations • Simulators: development and execution (MicroLib) • Methodology (comparison) • Processor simulation • Compositions of transformations (polyhedral model) • Architecture optimizations • Manual optimizations (decision tree) • Influence of data sets • Architecture-inspired software optimizations (VHC) • Iterative optimization (environment and COP) • Algorithm selection • Customization (functions) • Software optimizations compatible with complex processor architectures and applications • Axes: Performance / Usage?

  9. Symbiotic Processing • The clock race is ending (see Intel) • Goal: easily translate on-chip space into performance • Focus on on-chip parallelism • Constraints • Application • let the user extract parallelism effortlessly through a proper programming paradigm • Architecture • no central control (the space is too large), local control only • avoid complex architectures • Spatial computing • break the program down into independent/interacting objects • only local actions • the architecture manages objects • the architecture exploits parallelism (object execution and replication) depending on available resources/space • Application to future SMTs and CMPs

  10. MicroLib http://www.microlib.org • Change the way people do research in the domain: facilitate exchange, reuse and comparison of ideas • A library of simulator components (cache, branch predictor, functional units…) for complex processors • Open library (open-source, anyone can participate) • Already heavily used • Processor architecture components • cache, branch prediction, scheduling… • Research modules • 12 data cache mechanisms (MICRO 2004) • Full processor simulators • PowerPC 750 (~15% accurate) • 2000+ downloads in 18 months • OoOSysC (generic superscalar) • simple MIPS 2000-like RISC

  11. CAPS (Rennes): Compiler and Architecture for Superscalar and Special purpose processors • Two interacting activities • Microprocessor architecture (A. Seznec, P. Michaud) • High performance • Migrating high-performance concepts to embedded systems • Performance-oriented compilation (F. Bodin) • High performance • Embedded processors • Recently: worst-case execution time analysis (I. Puaut)

  12. CAPS background • Parallel memory systems • Pipeline structure • Caches: • Skewed-associative caches • Decoupled sectored caches • Simultaneous multithreading • Software cache management • Software pipelining • Special-purpose compilers • WCET analysis
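A skewed-associative cache, listed above, indexes each way with a different function of the address, so blocks that conflict in one way are usually spread apart in the other. A toy Python model, assuming two ways, whole block addresses stored as tags, and a hypothetical XOR skewing function (the published designs use their own hashing functions); replacement here simply evicts the older of the two candidate lines:

```python
class SkewedAssocCache:
    """Toy two-way skewed-associative cache over block addresses (a sketch)."""

    def __init__(self, sets_per_way: int):
        assert sets_per_way > 0 and sets_per_way & (sets_per_way - 1) == 0
        self.n = sets_per_way
        self.bits = sets_per_way.bit_length() - 1
        # Each entry is (block_address, last_use_time), or None when invalid.
        self.ways = [[None] * sets_per_way for _ in range(2)]
        self.clock = 0

    def index(self, way: int, block: int) -> int:
        low = block & (self.n - 1)
        if way == 0:
            return low  # way 0: conventional modulo indexing
        high = (block >> self.bits) & (self.n - 1)
        return low ^ high  # way 1: hypothetical XOR skewing function

    def access(self, block: int) -> bool:
        """Return True on a hit; on a miss, allocate the block and return False."""
        self.clock += 1
        slots = [self.index(w, block) for w in range(2)]
        for w, s in enumerate(slots):
            entry = self.ways[w][s]
            if entry is not None and entry[0] == block:
                self.ways[w][s] = (block, self.clock)  # refresh recency
                return True

        # Miss: fill an invalid candidate if any, else evict the older candidate.
        def age(w: int) -> int:
            entry = self.ways[w][slots[w]]
            return -1 if entry is None else entry[1]

        victim = min(range(2), key=age)
        self.ways[victim][slots[victim]] = (block, self.clock)
        return False
```

With 4 sets per way, cycling through blocks 0, 4 and 8 misses only on the first pass; every later access hits, because way 1 places the three blocks in different sets. In a conventional 2-way set-associative cache with 4 sets and LRU replacement, the same three blocks all map to set 0 and the loop misses on every access.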

  13. Processor architecture: recent contributions • Global history branch predictors and instruction fetch front-end • 2bcgskew used in Compaq EV8 • Pipelining the I-fetch front-end • O-GEHL branch predictor • Limiting hardware complexity on superscalar processors • Dataflow prescheduling: instruction window • WSRS architecture: register file, bypass network and issue logic • Thread parallelism and single-chip parallelism: • CASH: CMP and SMT hybrid • Execution migration: migrating a single thread across a multicore to use all the cache space
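The O-GEHL predictor named above combines several tables of signed counters indexed with global branch histories of geometrically increasing length, predicts from the sign of the counter sum, and trains only on a misprediction or when the sum's magnitude is below a threshold. The Python sketch below follows that outline in simplified form; the table count and size, the history lengths, the 4-bit counters, the fixed threshold and the folding hash are all our assumptions (O-GEHL itself additionally adapts its threshold and history lengths at run time):

```python
class GEHLPredictor:
    """Simplified GEHL-style global-history predictor (a sketch, not O-GEHL)."""

    def __init__(self, num_tables=4, log_size=10, min_hist=2, max_hist=32):
        assert num_tables >= 3
        self.m = num_tables
        self.log_size = log_size
        self.mask = (1 << log_size) - 1
        # Geometric series of history lengths; table 0 is indexed by the PC only.
        ratio = max_hist / min_hist
        self.lengths = [0] + [
            int(min_hist * ratio ** ((i - 1) / (num_tables - 2)) + 0.5)
            for i in range(1, num_tables)
        ]
        self.tables = [[0] * (1 << log_size) for _ in range(num_tables)]
        self.cmin, self.cmax = -8, 7  # 4-bit signed counters (assumption)
        self.theta = num_tables       # fixed training threshold (assumption)
        self.ghist = 0                # global history, bit 0 = most recent outcome

    def _index(self, i, pc):
        # Hypothetical hash: fold the last lengths[i] history bits into log_size bits.
        h = self.ghist & ((1 << self.lengths[i]) - 1)
        x = pc ^ (pc >> self.log_size)
        while h:
            x ^= h & self.mask
            h >>= self.log_size
        return x & self.mask

    def _sum(self, pc):
        idx = [self._index(i, pc) for i in range(self.m)]
        # Bias term M/2 plus one counter per table.
        s = self.m // 2 + sum(self.tables[i][idx[i]] for i in range(self.m))
        return s, idx

    def predict(self, pc):
        s, _ = self._sum(pc)
        return s >= 0

    def update(self, pc, taken):
        """Predict, train on the actual outcome, shift history; return the prediction."""
        s, idx = self._sum(pc)
        pred = s >= 0
        if pred != taken or abs(s) <= self.theta:
            for i in range(self.m):
                c = self.tables[i][idx[i]]
                self.tables[i][idx[i]] = (
                    min(self.cmax, c + 1) if taken else max(self.cmin, c - 1)
                )
        self.ghist = ((self.ghist << 1) | int(taken)) & ((1 << max(self.lengths)) - 1)
        return pred
```

With the default parameters the history lengths are 0, 2, 8 and 32. On a loop-like branch taken three times and then not taken, the 8- and 32-bit histories give each iteration its own counter, so after a short warm-up the sum separates the four cases and mispredictions stop.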

  14. Architecture/compiler interaction • ISA simulation: • ABSCISS: ISA- and architecture-retargetable high-speed simulator for VLIW processors • IATO: simulation of out-of-order IA64 execution • Low power and architecture configurability: • Cache reconfiguration at software level on a per-phase basis • Hardware/software speculative management of data path and register file width • SWARP: retargetable C-to-C preprocessor to enhance multimedia instruction use

  15. Compiler and software environments: recent contributions • Artificial intelligence in performance tuning • CAHT: case-based reasoning for assisting performance tuning • Automatic derivation of compiler heuristics: using machine learning to derive compiler heuristics • Performance/code size tradeoffs: • Iterative compilation • Mixing interpretation of compressed code with native execution

  16. Aware of (real) hardware implementation issues • A. Seznec’s sabbatical with the Compaq Alpha Development Group (1999-2000): • EV8 branch predictor directly derived from CAPS project-team research • Parallel access scheme to strided vectors in caches in the Tarantula vector processor project, directly derived from the “old” CAPS vector background • P. Michaud’s sabbatical with Intel (2001-2002): • Still covered by NDA

  17. Many (mature) software developments • ABSCISS: retargetable processor simulation • SALTO: System for Assembly Languages Transformation and Optimization • SWARP: C-to-C retargetable preprocessor for multimedia instructions • Menhir: Matlab to C parallel code generator • PACCMAN compiler/simulator • HAVEGE random number generator • IATO toolkit: IA64 simulation • Status labels on the slide: transferred to industry; distributed on demand, also transferred to industry; transferred to industry; transferred to industry; maintenance by industry; distributed for non-commercial use; GPL

  18. Aware of (real) software issues: set-up of the start-up CAPS Entreprise (2003) • Software tools for high-performance and embedded systems: simulation, code transformation, worst-case execution time • Custom consulting services: performance analysis, instruction set evaluation... • Awarded as an innovative company by the Ministry of Industry • Currently 6 employees, including 5 former CAPS project-team members

  19. CAPS future objectives • Leverage the current expertise of the CAPS core team • Microarchitecture: • From “ultimate performance” to “maintaining performance at lower cost” • Migrating “high-end” concepts to embedded processors • SMT/CMP: tradeoffs, sharing, synchronization • Compiler/code generation: • Performance often means unpredictability and instability • Dimensioning a system? Real-time constraints? • Predictable/stable performance-oriented code generation/architecture
