By Trevor Tonn CS147 Spring 2009 SPARC Architecture
Outline • RISC • The SPARC Architecture • SPARC in the Marketplace • SPARC Today and Tomorrow
RISC: Definition & Background • Reduced Instruction Set Computing, also called “load-store architecture” (because memory is accessed only through load and store instructions). • In the 1970s, researchers noticed that a promising alternative to providing a large instruction set in the ISA (instruction set architecture) would be to support only the most frequently used instructions, leaving the rarely used ones to be implemented as sequences of those simpler instructions.
RISC: Which instructions are used most? Developers at IBM in the 1970s found that 80% of a typical program's computations required only about 20% of the instructions available in the processor's ISA. Focus: few, well-chosen, simple instructions and an optimizing compiler.
RISC: Design Principles • Simple instructions and few addressing modes • Leaving out rarely needed features frees chip area for other uses • Instructions conform to a simple format • Reduces decoding delays • Load-store design • All operations work on registers only: load from memory into the large register set first, manipulate the registers, then store to memory when done.
RISC: Design Principles (cont) • Hardwired control, no microcode • Machine instructions are not translated into microinstructions, which saves the CPU cycles otherwise spent on that translation. • Also frees up chip space. • Goal: execute one instruction per clock cycle • Uses pipelining and the other principles described here to get there. • Simplicity facilitates higher clock frequencies
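As a rough illustration of the one-instruction-per-cycle goal, the sketch below counts cycles for an idealized pipeline with no stalls or hazards. The five-stage depth and the function names are assumptions chosen for illustration, not figures from these slides.

```python
def pipelined_cycles(n_instr, n_stages):
    """Cycles for an ideal pipeline: fill it once, then retire one instruction per cycle."""
    if n_instr <= 0:
        return 0
    return n_stages + (n_instr - 1)


def unpipelined_cycles(n_instr, n_stages):
    """Cycles if each instruction must pass through every stage before the next starts."""
    return n_instr * n_stages


def cycles_per_instruction(n_instr, n_stages):
    """Average CPI of the ideal pipeline; approaches 1 as n_instr grows."""
    return pipelined_cycles(n_instr, n_stages) / n_instr


if __name__ == "__main__":
    stages = 5  # assumed pipeline depth, for illustration only
    for n in (1, 10, 1000):
        print(n, pipelined_cycles(n, stages), unpipelined_cycles(n, stages),
              round(cycles_per_instruction(n, stages), 3))
```

Real pipelines fall short of this ideal because of branches, data dependencies, and memory stalls, which is why the other design principles matter.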
RISC: Advantages • The smaller number of instructions requires relatively little on-chip control logic, leaving space on the chip for other functions that enhance the performance & versatility of the processor. • One example is the use of a large register file (the array of registers defined by the ISA) to allow for the register-window approach used in SPARC; lots of registers are available. • Fast cycle time coupled with a high-performance memory hierarchy can yield very high processing power.
RISC: Advantages (cont) The execution time $T_p$ for a large, compute-bound program can be expressed as the product of three terms: $T_p = I_p \times C_p \times T$. A simplified version of the same relation is $T_p = I_p / (\mathrm{MIPS}_p \times 10^6)$, where: $I_p$ = number of instructions executed by the program; $C_p$ = average number of CPU clock cycles per instruction executed by the program; $T$ = time per clock cycle (usually 1 / clock frequency); $\mathrm{MIPS}_p$ = millions of instructions executed per second.
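The execution-time relation on this slide (instruction count times average cycles per instruction times cycle time) can be worked through numerically. The sketch below is only an illustration; the function names and the sample workload figures are assumptions, not measurements from any SPARC processor.

```python
def execution_time(instr_count, cycles_per_instr, clock_hz):
    """Execution time in seconds: Ip * Cp * T, with T = 1 / clock frequency."""
    return instr_count * cycles_per_instr / clock_hz


def mips_rating(instr_count, seconds):
    """Millions of instructions executed per second for a given run."""
    return instr_count / (seconds * 1e6)


def execution_time_from_mips(instr_count, mips):
    """The simplified form: execution time = Ip / (MIPS * 10^6)."""
    return instr_count / (mips * 1e6)


if __name__ == "__main__":
    # Hypothetical workload: 2 billion instructions, CPI of 1.25, 1 GHz clock.
    ip, cp, hz = 2_000_000_000, 1.25, 1_000_000_000
    t = execution_time(ip, cp, hz)
    rating = mips_rating(ip, t)
    print(t, rating, execution_time_from_mips(ip, rating))
```

Both forms give the same time; the MIPS form simply folds cycles per instruction and cycle time into a single rate.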
RISC: Disadvantages • Lacks the more powerful instructions of CISC; executing the many simple instructions that make up an equivalent instruction sequence can take many clock cycles. • Executing many small instructions generates more instruction traffic than CISC.
RISC: Solutions & Impure RISC • Instead of focusing only on the most commonly used operations, add some complex instructions whose equivalent instruction sequences would otherwise bring RISC to its knees. • For example, if you need fast floating-point performance, add some complex floating-point instructions. • Use free chip space for techniques that ease the instruction traffic problem. • Register-window technique used in SPARC.
RISC: Register-window • Each procedure has a “window” of registers available to it, with an area of overlap between the procedure and its caller to allow efficient parameter passing, with no need to save or reload registers. • The windows change dynamically on procedure entry/exit. • Reduces instruction traffic.
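The overlap between caller and callee windows can be sketched with a small simulation. The register numbering used here (r0-r7 globals, r8-r15 outs, r16-r23 locals, r24-r31 ins) follows the SPARC architecture manuals rather than these slides, and the class and method names are hypothetical. Window overflow and underflow traps are deliberately not modeled.

```python
class WindowedRegisterFile:
    """Toy model of SPARC-style overlapping register windows (no trap handling)."""

    def __init__(self, nwindows):
        self.nwindows = nwindows
        self.globals = [0] * 8
        # Each window contributes 16 unique registers (8 outs + 8 locals);
        # its ins are physically the outs of the neighbouring window.
        self.windowed = [0] * (16 * nwindows)
        self.cwp = 0  # current window pointer

    def _index(self, r):
        size = len(self.windowed)
        if 8 <= r <= 15:   # outs
            return (16 * self.cwp + (r - 8)) % size
        if 16 <= r <= 23:  # locals
            return (16 * self.cwp + 8 + (r - 16)) % size
        if 24 <= r <= 31:  # ins: alias the outs of window cwp + 1
            return (16 * (self.cwp + 1) + (r - 24)) % size
        raise ValueError("register number out of range")

    def read(self, r):
        if r == 0:
            return 0  # r0 always reads as zero
        if r < 8:
            return self.globals[r]
        return self.windowed[self._index(r)]

    def write(self, r, value):
        if r == 0:
            return  # writes to r0 are discarded
        if r < 8:
            self.globals[r] = value
        else:
            self.windowed[self._index(r)] = value

    def save(self):
        """Procedure entry: move to a new window (CWP decrements)."""
        self.cwp = (self.cwp - 1) % self.nwindows

    def restore(self):
        """Procedure exit: return to the caller's window."""
        self.cwp = (self.cwp + 1) % self.nwindows


if __name__ == "__main__":
    rf = WindowedRegisterFile(8)
    rf.write(8, 42)     # caller places an argument in its first out register
    rf.save()           # callee gets a fresh window
    print(rf.read(24))  # the same value appears in the callee's first in register
    rf.write(24, 7)     # callee leaves a return value there
    rf.restore()
    print(rf.read(8))   # caller reads the result from its out register
```

No register is copied on the call or return: the parameter and the result are passed purely by changing which window is current.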
SPARC: What is it? • Scalable Processor ARChitecture • Designed by Sun Microsystems 1984-1987. • Based on RISC work done at UC Berkeley in 1980-82. • An architecture with many families of processors created by several companies.
SPARC: Brief History • 3 major revisions to the architecture • SPARC-V7, 32-bit, 1986 • SPARC-V8, 32-bit, 1990 • SPARC-V9, 64-bit, 1993 • UltraSPARC extension, 1995 • Backward binary compatibility across all revisions
SPARC: Brief History (cont) • V9 greatly improves upon V8: • 64bit integer mul & div instructions • load/store floating-point quadword instructions • Load & store 128bits at a time • Software-settable branch prediction • Branch on register value • Reduces total number of instructions to execute • Conditional move instruction • Allows you to remove branch instructions. • Improved support for very large-scale multiprocessors • Relaxed memory ordering model
SPARC: Architecture Features • Integer unit (IU), floating-point unit (FPU), and an optional implementation-defined coprocessor (CP), each with its own set of registers. • Allows for maximum concurrency between integer, floating-point & coprocessor instructions. • All IU & FPU registers are 32 bits wide (V8; V9 widens the integer registers to 64 bits) • Instructions operate on single registers, pairs, and quads of registers.
SPARC: Integer & Floating-point Units • The IU may contain between 40 and 520 general-purpose registers; the FPU has 32 registers. • Organized as 2 to 32 overlapping register windows • Register windows perform well with LISP and OO languages like Smalltalk. • No direct path between FPU & IU: data must move between them through memory using load/store instructions. • FPU can have several multipliers & adders • Implementation dependent • FPU: concurrent execution of add/mul & load/store
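The 40 and 520 figures follow from the window count: each window adds 16 unique registers (8 locals plus 8 outs, with the ins shared with a neighbouring window) on top of 8 globals. That breakdown comes from the SPARC architecture manuals rather than this slide, and the function name below is hypothetical.

```python
GLOBAL_REGISTERS = 8
UNIQUE_REGISTERS_PER_WINDOW = 16  # 8 locals + 8 outs; ins overlap the next window's outs


def total_integer_registers(nwindows):
    """Number of IU general-purpose registers for a given number of windows."""
    if not 2 <= nwindows <= 32:
        raise ValueError("the number of windows is expected to be between 2 and 32")
    return GLOBAL_REGISTERS + UNIQUE_REGISTERS_PER_WINDOW * nwindows


if __name__ == "__main__":
    for n in (2, 8, 32):
        print(n, total_integer_registers(n))
```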
SPARC: Multiprocessor instructions • Two special instructions support tightly coupled multiprocessors: • swap • Exchanges the contents of an IU register with a word from memory while preventing other memory accesses from intervening on the memory or I/O bus. • Can be used with a CP to perform other synchronization techniques. • ldstub • Atomically loads an unsigned byte from memory and sets that byte to all ones. • Can be used to create semaphores.
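To show how an atomic load-and-set instruction such as ldstub can build a lock, here is a minimal sketch in Python. The `Memory` class and its methods are hypothetical stand-ins: a `threading.Lock` emulates the hardware guarantee that no other access intervenes, which real SPARC hardware provides directly on the bus.

```python
import threading
import time


class Memory:
    """Toy byte-addressable memory with an emulated atomic ldstub."""

    def __init__(self, size):
        self._bytes = bytearray(size)
        self._bus = threading.Lock()  # stands in for atomic bus access in hardware

    def ldstub(self, addr):
        """Atomically return the old byte and set it to all ones (0xFF)."""
        with self._bus:
            old = self._bytes[addr]
            self._bytes[addr] = 0xFF
            return old

    def store_byte(self, addr, value):
        with self._bus:
            self._bytes[addr] = value & 0xFF


def acquire(mem, addr):
    """Spin until ldstub observes 0, meaning this caller took the lock."""
    while mem.ldstub(addr) != 0:
        time.sleep(0)  # yield to other threads while spinning


def release(mem, addr):
    """Clear the lock byte so another caller can acquire it."""
    mem.store_byte(addr, 0)


def run_demo(nthreads=4, iterations=200):
    """Increment a shared counter under the lock and return its final value."""
    mem = Memory(1)
    counter = [0]

    def worker():
        for _ in range(iterations):
            acquire(mem, 0)
            counter[0] += 1
            release(mem, 0)

    threads = [threading.Thread(target=worker) for _ in range(nthreads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter[0]


if __name__ == "__main__":
    print(run_demo())
```

The lock byte acts as a binary semaphore: a value of 0 means free, and 0xFF means held.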
SPARC: Firsts... • Register windows (1987) • 32-way server on a chip (UltraSPARC T1, 2005) • 64-way SuperServer (SuperSPARC XDBus, 1994) • Major 64-bit architecture (UltraSPARC, 1995) • For comparison: IBM POWER in 1998, x86_64 in 2003 • Many more...
SPARC in the Marketplace • Sun developed Solaris (a UNIX variant) and sold it together with its own hardware. • Implementations licensed to other companies such as Fujitsu, LSI Logic, and Texas Instruments • Open-sourced the design of UltraSPARC, allowing grass-roots development and new implementations: www.opensparc.net • OpenSPARC T1 and T2 processors, 2005 & 2008 respectively • “open source processors”
TODAY: Architecture • V9 JPS1 • Although SPARC V9 allows its implementations freedom in their MMU (memory management unit) designs, SPARC JPS1 defines a common MMU architecture with some specifics left to implementations • Current products like the Sun SPARC64 series are based on this specification. • Still binary compatible with V8 implementations.
TODAY: Current Features • Characteristics of current V9 implementations: • Multi-core • Ex: Fujitsu SPARC Enterprise M9000 • Quad-core SPARC64 VII processors, scalable to 256 cores (64 processors) • 2.52 GHz maximum clock frequency • SPARC V9/JPS2 architecture implementation • Multiple threads per core • Multi-threading technology minimizes CPU core wait times and increases CPU core utilization. • SMT (Simultaneous Multithreading) enables two threads to run in parallel.
TOMORROW: Future of SPARC • Sun's involvement is uncertain, as they recently claimed “4 years to go” in their transition to x86. • Sun is still developing processors based on the V9 architecture and its updates • 'Rock' processor to be released in 2009 • OpenSPARC continues to build on the UltraSPARC architecture, which is a descendant of, and fully compatible with, V9. • Active community involvement, new “open” processor implementations of the architecture.