
Introduction to the Cell Multiprocessor

J. A. Kahle, M. N. Day, H. P. Hofstee, C. R. Johns, T. R. Maeurer, and D. Shippy, IBM Systems and Technology Group. IBM Journal of Research and Development, Vol. 49, No. 4/5, p. 589 (Jul-Sep 2005). Presented by John Ingalls, ECE 259, April 8, 2010.


Presentation Transcript


  1. Introduction to the Cell Multiprocessor
     J. A. Kahle, M. N. Day, H. P. Hofstee, C. R. Johns, T. R. Maeurer, and D. Shippy, IBM Systems and Technology Group. IBM Journal of Research and Development, Vol. 49, No. 4/5, p. 589 (Jul-Sep 2005). Presented by John Ingalls, ECE 259, April 8, 2010.

  2. Design Summary: PPE
     • ISA: 64-bit IBM Power Architecture with SIMD.
     • 1 PPE, 8 SPEs, 1 memory controller, and 1 I/O controller, all on a coherent bus (single address space).
     • PowerPE: 2-issue, in-order, 2-thread SMT; 32KB L1 I$ and D$; 512KB L2$ with software management hooks; 128-bit total SIMD width; Vector/SIMD issue queue separate from scalar execute.

  3. Design Summary: SPE
     • SynergisticPE: in-order SIMD, 128-bit total width, like the PPE.
     • Local Store (LS): 256KB, with a single port used either for 128-bit SIMD-word accesses or for 128-byte instruction fetches or DMA I/O.
     • 128-entry register file to support static (compiler) instruction reordering.
     • Area efficient: 15% control; the rest is execution units and Local Store.
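Because the Local Store is small and software managed, SPE code typically has to stage data through it in pieces. The following is a minimal sketch of that idea in portable C, not code from the paper or the Cell SDK: `fake_dma_get`, `fake_dma_put`, `scale_via_local_store`, and the 16 KB tile size are all hypothetical names and values chosen for illustration, and `memcpy` stands in for the asynchronous DMA transfers that real hardware would use.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical tile size for this sketch: 4096 floats = 16 KB, a small
   part of the 256 KB Local Store (which must also hold code and stack). */
#define TILE_ELEMS 4096

/* Stand-in for a DMA "get" from main memory into the Local Store.
   Real hardware would issue an asynchronous DMA request; memcpy is an
   assumption made so the sketch runs on any machine. */
static void fake_dma_get(float *ls_dst, const float *mem_src, size_t n) {
    memcpy(ls_dst, mem_src, n * sizeof(float));
}

/* Stand-in for a DMA "put" from the Local Store back to main memory. */
static void fake_dma_put(float *mem_dst, const float *ls_src, size_t n) {
    memcpy(mem_dst, ls_src, n * sizeof(float));
}

/* Scale an array that lives in "main memory" by staging it through a
   fixed-size, software-managed buffer one tile at a time. */
void scale_via_local_store(float *data, size_t n, float factor) {
    static float local_store[TILE_ELEMS];
    assert(data != NULL || n == 0);

    for (size_t base = 0; base < n; base += TILE_ELEMS) {
        size_t left = n - base;
        size_t count = left < TILE_ELEMS ? left : TILE_ELEMS;

        fake_dma_get(local_store, data + base, count);   /* fetch tile   */
        for (size_t i = 0; i < count; i++) {
            local_store[i] *= factor;                    /* compute      */
        }
        fake_dma_put(data + base, local_store, count);   /* write back   */
    }
}
```

In this sketch the fetch, compute, and write-back steps run one after another. A real SPE program would more likely use two buffers so that the transfer of the next tile overlaps with computation on the current one.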

  4. Other Features
     • I/O supports direct connection to another Cell to easily build a cache-coherent multiprocessor.
     • Native binary compatibility with Power-ISA apps.
     • Modular design, but still fully custom.
     • Extensive test and monitoring circuitry.

  5. Programming
     Challenges:
     • SPE Local Store is software managed.
     • Each SPE supports one thread context, and context switches are expensive.
     Models:
     • Function Offload: the PPE calls a function that runs on an SPE.
     • Device Extension: the SPE is isolated and behaves like a device.
     • Compute Acceleration: the PPE aggregates SPE results.
     • Streaming: each SPE is a step in a software pipeline.
     • Shared Memory Multiprocessor: the conventional model.
     • Asymmetric Thread Runtime: pthreads.
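The Streaming model listed above can be sketched as a chain of stages through which blocks of data flow. This is only an illustration under stated assumptions: the stage functions, `run_stream`, and `NUM_STAGES` are hypothetical, and the stages here run sequentially on one core, whereas on Cell each stage would sit on its own SPE and work on a different block at the same time.

```c
#include <assert.h>
#include <stddef.h>

#define NUM_STAGES 3  /* hypothetical; the chip offers up to 8 SPEs */

/* One pipeline step: transforms a block of data in place. */
typedef void (*stage_fn)(int *block, size_t n);

static void stage_offset(int *b, size_t n) {
    for (size_t i = 0; i < n; i++) b[i] += 1;
}

static void stage_scale(int *b, size_t n) {
    for (size_t i = 0; i < n; i++) b[i] *= 2;
}

static void stage_clamp(int *b, size_t n) {
    for (size_t i = 0; i < n; i++) {
        if (b[i] > 100) b[i] = 100;
    }
}

/* Each entry stands for the work assigned to one SPE. */
static const stage_fn pipeline[NUM_STAGES] = {
    stage_offset, stage_scale, stage_clamp
};

/* Push the data through every stage, one block at a time.
   On real hardware the stages would overlap across SPEs; this
   sequential loop only shows the order of the data flow. */
void run_stream(int *data, size_t n, size_t block) {
    assert(block > 0);
    for (size_t base = 0; base < n; base += block) {
        size_t left = n - base;
        size_t count = left < block ? left : block;
        for (size_t s = 0; s < NUM_STAGES; s++) {
            pipeline[s](data + base, count);
        }
    }
}
```

Calling `run_stream` on an array applies offset, scale, and clamp to each element in that order, regardless of the block size chosen.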

  6. Good:
     • The paper is easy to follow and does not overload the reader with detail.
     • Built and shipped on time by a joint venture of IBM, Sony, and Toshiba.
     • Many applications in media and supercomputing.
     Bad:
     • The authors repeatedly present static limitations imposed by their models, such as explicitly managed caches, as advantages.
     • No hard performance data or comparison to the competition. Only “anecdotal evidence” shows that it is possible to fully utilize Cell.

  7. Conclusion / Questions
     Keywords:
     • Heterogeneous multi-core SIMD processor.
     • Single address space across all cores on the chip.
     • 1 conventional PPE for control.
     • 8 SPEs for streaming SIMD, which are very fast and power efficient if they are used.
     • Several programming models are feasible.
     Questions:
     • How could the programming models be made easier?
     • In what direction should this architecture grow?
