
Presented by: Nick Kirchem Feb 13, 2004



Presentation Transcript


  1. Piranha: A Scalable Architecture Based on Single-Chip Multiprocessing. Luiz A. Barroso et al. (Compaq Computer Corporation). Presented by: Nick Kirchem, Feb 13, 2004

  2. Target and Motivation
     • Commercial applications (databases, OLTP)
       • Most important market for high performance servers
       • Data dependent computation (low ILP)
       • Little gained by complex multiple issue out-of-order processors
     • Complexity of current processors
       • Long design times
       • High development costs
     • Better use of transistors?

  3. Project Goals
     • Design a Chip Multiprocessing (CMP) System
       • Integrate 8 simple processor cores on a single chip
       • Exploit thread-level parallelism instead of ILP
     • High performance, Low Cost
       • Achieve superior performance on commercial workloads
       • Small team, modest investment, short design time

  4. Architecture Overview

  5. Architecture Elements
     • Simple Processors (500 MHz, In-Order)
     • No I/O capability on chip (separate I/O nodes)
     • Up to 1024 nodes in a system
     • Individual L1 Caches (64KB, 2-way set-assoc)
     • One Logical L2 Cache, interleaved, 1MB
     • Intra-Chip Switch
       • Unidirectional crossbar
       • Transaction based, atomic transfers
       • Bandwidth ~3x memory bandwidth
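To make "one logical L2 cache, interleaved" concrete, here is a minimal Python sketch of how an address could be mapped to an L2 bank. The slide gives only the 1MB total. The 64-byte line size, the 8 banks, and the choice of low-order line-address bits are assumptions made for illustration, as is the function name `l2_bank`.

```python
# Sketch of address interleaving across L2 banks.
# Assumptions (not on the slide): 64-byte lines, 8 banks,
# bank chosen by the low-order bits of the line address.
LINE_BYTES = 64
NUM_BANKS = 8
L2_BYTES = 1 << 20                    # 1 MB logical L2, from the slide
BANK_BYTES = L2_BYTES // NUM_BANKS    # capacity of each bank under these assumptions


def l2_bank(addr: int) -> int:
    """Return the index of the bank that would hold the line containing addr."""
    line_address = addr // LINE_BYTES
    return line_address % NUM_BANKS


for addr in (0x0000, 0x0040, 0x0080, 0x01C0, 0x0200):
    print(hex(addr), "-> bank", l2_bank(addr))
```

Under these assumptions consecutive cache lines land in consecutive banks, so requests from different cores tend to spread across the banks instead of queuing on one.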

  6. Intra-Chip Cache Coherence
     • MESI protocol
     • No Inclusion (1 MB aggregate L1, 1MB L2)
     • But, L2 holds copy of L1 tags and state (no snooping required at L1)
     • L1 filled directly from memory (L2 = victim cache)
     • Coherence handled by L2 controllers
       • Can service request directly, forward to owner L1, forward to protocol engine, obtain from Memory
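The four outcomes listed for the L2 controller can be sketched as a small decision function. This is only an illustration. The order of the checks, the three inputs, and the names `Action` and `route_l1_miss` are assumptions, not details taken from the slides.

```python
from enum import Enum
from typing import Optional


class Action(Enum):
    SERVICE_FROM_L2 = "service request directly"
    FORWARD_TO_OWNER_L1 = "forward to owner L1"
    FETCH_FROM_MEMORY = "obtain from memory"
    FORWARD_TO_PROTOCOL_ENGINE = "forward to protocol engine"


def route_l1_miss(hit_in_l2: bool,
                  owner_l1: Optional[int],
                  home_is_local: bool) -> Action:
    """Pick one of the four outcomes for an L1 miss (hypothetical ordering).

    hit_in_l2:     the L2 bank holds the line
    owner_l1:      id of an on-chip L1 that owns the line according to the
                   duplicate L1 tags kept at the L2, or None
    home_is_local: the line's home memory is attached to this node
    """
    if hit_in_l2:
        return Action.SERVICE_FROM_L2
    if owner_l1 is not None:
        # The duplicate tags identify the owner, so no L1 has to be snooped.
        return Action.FORWARD_TO_OWNER_L1
    if home_is_local:
        return Action.FETCH_FROM_MEMORY
    # The line lives on another node: hand off to the inter-node protocol.
    return Action.FORWARD_TO_PROTOCOL_ENGINE


print(route_l1_miss(False, 3, True).value)
```

The point of the sketch is that every decision is made at the L2 controller from state it already holds, which is why the L1s never need to be snooped.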

  7. Inter-Node Coherence
     • Protocol Engines (microprogrammable controllers)
       • Home: exports local memory
       • Remote: imports remote memory
     • Directory Storage
       • Compute ECC at coarse granularity, use extra bits for directory info → no memory space overhead
       • Directory granularity = 1 node (not individual processor)
     • Interconnect: I/O queues, router (point-to-point, 4 links)
     • No NAKs – avoid deadlock by sufficient buffering, and guarantee forwarded requests can be serviced
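The directory-storage trick is easier to see with numbers. The Python sketch below counts the check bits a standard SECDED (single-error-correct, double-error-detect) Hamming code needs at two granularities and reports how many bits per line are freed by the coarser one. The slide gives no figures, so the 64-byte line and the 64-bit versus 256-bit ECC granularities are assumptions chosen for illustration, and the function names are hypothetical.

```python
def secded_check_bits(data_bits: int) -> int:
    """Check bits for a SECDED Hamming code over data_bits of data.

    A Hamming code needs the smallest r with 2**r >= data_bits + r + 1;
    one extra overall parity bit adds double-error detection.
    """
    r = 1
    while (1 << r) < data_bits + r + 1:
        r += 1
    return r + 1


def freed_bits_per_line(line_bits: int, fine_bits: int, coarse_bits: int) -> int:
    """ECC bits saved per cache line by moving from fine to coarse granularity."""
    fine_total = (line_bits // fine_bits) * secded_check_bits(fine_bits)
    coarse_total = (line_bits // coarse_bits) * secded_check_bits(coarse_bits)
    return fine_total - coarse_total


# Assumed parameters: 64-byte (512-bit) line, ECC over 64 bits vs. 256 bits.
print(secded_check_bits(64), secded_check_bits(256))
print(freed_bits_per_line(512, 64, 256))
```

Because the ECC storage is already present in the memory modules, whatever bits the coarser code frees can hold directory state without reserving any ordinary memory for it.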

  8. Performance Evaluation
     • OLTP and DSS workloads: TPC-B/D, Oracle database
     • SimOS-Alpha environment
     • Compared:
       • Piranha (P8) @ 500 MHz and Full-Custom (P8F) @ 1.25 GHz
       • Next-generation Microprocessor (OOO) @ 1 GHz
     • Single Chip Evaluation
       • OOO outperforms P1 (individual proc) by 2.3x
       • P8 outperforms OOO by 3x
       • Speedup of P8 over P1 = 7x
     • Multi-chip Configurations
       • Four chips (only 4 CPUs per chip ?!)
       • Results show that Piranha scales better than OOO
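The three single-chip figures can be checked against each other with a few lines of Python. This is only arithmetic on the numbers quoted on the slide. The variable names are made up, and the per-core efficiency line assumes the 8 cores stated in the project goals.

```python
# Speedups quoted on the slide (same workload assumed for all three).
ooo_over_p1 = 2.3          # OOO vs. a single Piranha core (P1)
p8_over_ooo = 3.0          # 8-core Piranha (P8) vs. OOO
p8_over_p1_reported = 7.0  # P8 vs. P1 as reported

# Chaining the first two ratios should roughly reproduce the third.
p8_over_p1_implied = ooo_over_p1 * p8_over_ooo

# Fraction of ideal 8x scaling achieved by the 8 cores.
parallel_efficiency = p8_over_p1_reported / 8

print(f"implied P8/P1 speedup: {p8_over_p1_implied:.1f}x")
print(f"parallel efficiency:   {parallel_efficiency:.3f}")
```

The implied and reported speedups agree to within rounding, so the three numbers are mutually consistent.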

  9. Questions/Concerns
     • Would the Piranha design be worthwhile if there were a well-designed SMT processor (with 4 or 8 threads)?
     • Reliability better or worse with multiple chips per processor?
     • Power consumption?
