Architecture and Code optimization at INRIA March 2005 André Seznec

Architecture and Code optimization at INRIA March 2005 André Seznec • Alchemy: Paris • CAPS: Rennes • COMPSYS: Lyon • R2D2: Rennes

INRIA: Institut National de Recherche en Informatique et en Automatique • French research institute in computer science and applied mathematics: • 500 senior research scientists • 8 locations: • Joint centers with universities and CNRS • About 1500 scientists in total (including Ph.D. students) • About 100 teams: • Cover all topics in computer science

Four research groups in architecture/compiler • CAPS: Compilation and Architectures for Superscalar and Special purpose processors • ALCHEMY: Architectures, Languages and Compilers to Harness the End of Moore Years • COMPSYS: Compiling Systems on Silicium • R2D2: Reconfigurable and Retargetable Digital Devices • About 20 researchers or faculty members + about 20 Ph.D. students

COMPSYS (created 2003) • T. Risset, P. Feautrier, A. Darte: background in automatic parallelization (polyhedral model), systolic networks, software pipelining • Applying parallel compiler techniques to embedded systems: • Compilers • "From the algorithm to a silicon system"

R2D2 team (Rennes): Reconfigurable and Retargetable Digital Devices • Goal: search for the best compromise between high performance, power consumption and flexibility, using reconfigurable hardware for embedded systems • [Figure labels: ASIC, Processor, Memory, Reconfigurable DART, Reconfigurable FPGA, DSP]

R2D2 research fields • Compilation, synthesis targeting reconfigurable architectures • High-level synthesis from high-level specifications • Retargetable compilation and processor core modelling • Floating point to fixed-point conversion methodology targeting software (DSP) and hardware (FPGA) • New architectures and technologies • Coarse-grained reconfigurable architecture (DART reconfigurable data path) • Multiple-valued logic (MVL) architectures and circuits • Prototyping of applications on reconfigurable platforms • 3G and 4G mobile application prototyping • Contacts F. Charot and O. Sentieys
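The floating-point to fixed-point conversion item above can be illustrated with a minimal sketch. It is not the R2D2 methodology: it only shows the generic idea of quantizing a value to a signed fixed-point word and multiplying in that format. The Q15 format, the 16-bit word size and the helper names `to_fixed`, `to_float` and `fixed_mul` are all assumptions made for illustration.

```python
def to_fixed(x, frac_bits=15, word_bits=16):
    """Quantize a float to a signed fixed-point integer, saturating on overflow."""
    scaled = round(x * (1 << frac_bits))
    lo = -(1 << (word_bits - 1))        # most negative representable value
    hi = (1 << (word_bits - 1)) - 1     # most positive representable value
    return max(lo, min(hi, scaled))


def to_float(q, frac_bits=15):
    """Convert a fixed-point integer back to a float."""
    return q / (1 << frac_bits)


def fixed_mul(a, b, frac_bits=15):
    """Multiply two fixed-point values and rescale the product to the same format."""
    return (a * b) >> frac_bits


half = to_fixed(0.5)
quarter = fixed_mul(half, half)
print(to_float(quarter))
```

A real conversion flow would also have to choose the word length and the position of the binary point for each variable, which this sketch leaves fixed.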

ALCHEMY (Saclay): Architectures, Languages and Compilers to Harness the End of Moore Years • Main topic: long-term performance scalability of architectures • 4 faculty • A. Cohen: compiler, automatic parallelization • C. Eisenbeis: software pipelining, software cache optimization • F. Gruau: futuristic architecture model • O. Temam: microarchitecture, software cache management • 10 PhDs

ALCHEMY [research map diagram; labels in extraction order] • New technologies? • BLOB Computing (model) • Complex systems • Language + Architecture (domain-specific, e.g., video processing) • Bio-inspired systems • Language + Architecture (general-purpose) • Scalable processor architectures • Compiler optimizations • Simulators: Development & Execution (MicroLib) • Methodology (comparison) • Processor simulation • Compositions of transformations (polyhedral model) • Architecture optimizations • Manual optimizations (decision tree) • Influence of data sets • Architecture-inspired software optimizations (VHC) • Iterative optimization (environment & COP) • Algorithm selection • Customization (functions) • Software optimizations compatible with complex processor architectures & applications • Performance • Usage?

Symbiotic Processing • The clock race is ending (see Intel) • Easily translate on-chip space into performance • Focus on on-chip parallelism • Constraints • Application • let the user extract parallelism effortlessly through a proper programming paradigm • Architecture • no central control (too large a space), local control only • avoid complex architectures • Spatial computing • break down the program into independent/interacting objects • only local actions • architecture manages objects • architecture exploits parallelism (object execution & replication) depending on available resources/space • Application to future SMTs & CMPs

MicroLib http://www.microlib.org • Change the way people do research in the domain: facilitate exchange, reuse and comparison of ideas • A library of simulator components (cache, BP, FU…) for complex processors • Open library (open-source, anyone can participate) • Already heavily used • processor architecture components • cache, branch prediction, scheduling… • research modules • 12 data cache mechanisms (MICRO 2004) • full processor simulators • PowerPC 750 (~15% accurate) • 2000+ downloads in 18 months • OoOSysC (generic superscalar) • simple MIPS 2000-like RISC

CAPS (Rennes): Compiler and Architecture for Superscalar and Special purpose processors • Two interacting activities • Microprocessor architecture (A. Seznec, P. Michaud) • High performance • Migrating high performance concepts to embedded systems • Performance oriented compilation (F. Bodin) • High performance • Embedded processors • Recently: worst case execution time analysis (I. Puaut)

CAPS background • Parallel memory systems • Pipeline structure • Caches: • Skewed associative caches • Decoupled sectored caches • Simultaneous Multithreading • Software cache management • Software pipeline • Special purpose compilers • WCET analysis

Processor architecture: recent contributions • Global history branch predictors and instruction fetch front-end • 2bcgskew used in Compaq EV8 • Pipelining the I-fetch front end • O-GEHL branch predictor • Limiting hardware complexity on superscalar processors • Dataflow prescheduling: instruction window • WSRS architecture: register file, bypass network and issue logic • Thread parallelism and single chip parallelism: • CASH: CMP and SMT hybrid • Execution migration: single thread on a multicore, to use all the cache space
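To give a feel for what a global history branch predictor does, here is a minimal sketch of a generic gshare-style scheme: a table of 2-bit saturating counters indexed by the branch address XORed with a register of recent branch outcomes. This is a textbook illustration only, not 2bcgskew or O-GEHL, and the class name `GsharePredictor`, the table size and the `accuracy` helper are assumptions.

```python
class GsharePredictor:
    """Generic gshare-style predictor (illustrative; not 2bcgskew or O-GEHL)."""

    def __init__(self, index_bits=10):
        self.mask = (1 << index_bits) - 1
        self.history = 0                          # global history of recent outcomes
        self.counters = [1] * (1 << index_bits)   # 2-bit counters, weakly not-taken

    def _index(self, pc):
        # Hash the branch address with the global history to select a counter.
        return (pc ^ self.history) & self.mask

    def predict(self, pc):
        return self.counters[self._index(pc)] >= 2

    def update(self, pc, taken):
        i = self._index(pc)
        if taken:
            self.counters[i] = min(3, self.counters[i] + 1)
        else:
            self.counters[i] = max(0, self.counters[i] - 1)
        # Shift the actual outcome into the global history register.
        self.history = ((self.history << 1) | int(taken)) & self.mask


def accuracy(predictor, trace):
    """Fraction of correct predictions over a list of (pc, taken) pairs."""
    correct = 0
    for pc, taken in trace:
        if predictor.predict(pc) == taken:
            correct += 1
        predictor.update(pc, taken)
    return correct / len(trace)


# A single branch that alternates taken / not-taken.
trace = [(0x40, i % 2 == 0) for i in range(1000)]
print(accuracy(GsharePredictor(), trace))
```

A per-branch 2-bit counter alone handles such an alternating branch poorly, whereas the history-indexed table separates the two contexts after a short warm-up.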

architecture/compiler interaction • ISA simulation: • ABSCISS: ISA and architecture retargetable high speed simulator for VLIW processor • IATO: simulation of out-of-order execution IA64 • Low power and architecture configurability: • Cache reconfiguration at software level on phase basis • Hardware/software speculative management of data path and register file width • SWARP: retargetable C-to-C preprocessor to enhance multimedia instruction use

Compiler and software environments: recent contributions • Artificial intelligence in performance tuning • CAHT: case based reasoning for assisting performance tuning • Automatic derivation of compiler heuristics: using machine learning to derive compiler heuristics • Performance/code size tradeoffs: • Iterative compilation • Mixing interpretation on compressed code and native execution
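Iterative compilation, mentioned above, can be sketched as a search loop: try several combinations of optimizations, measure each resulting program, and keep the best one. The sketch below is a simplified illustration, not the CAPS tooling. The flag names and the `toy_measure` cost model are invented stand-ins for a real compile-and-run measurement.

```python
import itertools


def iterative_search(flags, measure):
    """Exhaustively try every subset of flags and keep the cheapest one."""
    best_config, best_cost = None, float("inf")
    for r in range(len(flags) + 1):
        for config in itertools.combinations(flags, r):
            cost = measure(config)          # in practice: compile, run, time
            if cost < best_cost:
                best_config, best_cost = config, cost
    return best_config, best_cost


def toy_measure(config):
    """Made-up cost model standing in for a real measurement (assumption)."""
    cost = 100.0
    if "unroll" in config:
        cost -= 20
    if "inline" in config:
        cost -= 10
    if "unroll" in config and "vectorize" in config:
        cost -= 15                          # the two transformations help each other
    if "unroll" in config and "inline" in config:
        cost += 25                          # hypothetical code-size penalty
    return cost


best, cost = iterative_search(["unroll", "inline", "vectorize"], toy_measure)
print(best, cost)
```

Exhaustive search only works for a handful of options; with realistic option spaces the loop would need sampling or a guided search instead.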

Aware of (real) hardware implementation issues • A. Seznec’s sabbatical with the Compaq Alpha Development group (1999-2000): • EV8 branch predictor derived directly from CAPS project-team research • Parallel access scheme to strided vectors in caches in the Tarantula vector processor project, derived directly from “old” vector CAPS background • P. Michaud’s sabbatical with Intel (2001-2002): • Still covered by NDA

Many (mature) software developments • ABSCISS: retargetable processor simulation (transferred to industry) • SALTO: System for Assembly Languages Transformation and Optimization (distributed on demand, also transferred to industry) • SWARP: C-to-C retargetable preprocessor for multimedia instructions (transferred to industry) • Menhir: Matlab to C parallel code generator (transferred to industry) • PACCMAN compiler/simulator (maintenance by industry) • HAVEGE random number generator (distributed for non-commercial use) • IATO toolkit: IA64 simulation (GPL)

Aware of (real) software issues: set-up of the start-up CAPS Entreprise (2003) • Software tools for high performance and embedded systems: simulation, code transformation, worst-case execution time • Custom consulting services: performance analysis, instruction set evaluations, .. • Recognized as an innovative company by the Ministry of Industry • Currently 6 employees, including 5 former CAPS project-team members

CAPS future objectives • Leverage current expertise of CAPS core team • Microarchitecture: • From “ultimate performance” to “maintaining performance at lower cost” • Migrating “high-end” concepts to embedded processors • SMT/CMP: tradeoffs, sharing, synchronization • Compiler/code generation: • performance = (often) unpredictability and instability • Dimensioning a system? Real time constraints? • Predictable/stable performance oriented code generation/architecture
