
Programming the Cell Multiprocessor




  1. Programming the Cell Multiprocessor Işıl ÖZ

  2. Outline • Cell processor • Objectives • Design and architecture • Programming the cell • Programming models • CellSs

  3. Cell Processor • Cell Broadband Engine Architecture • Cell BE • Developed by STI (SCEI-Toshiba-IBM) design center • STI formed in 2000 • STI design center opened in 2001 • Introduced in 2005 • 65 nm in 2007, 45 nm in 2008

  4. Cell Processor Objectives • Outstanding performance especially on game/multimedia applications • Memory latency • Power efficiency • Processor frequency and pipeline depth • Real time response to the user and the network • Applicable to a wide range of platforms • Support for introduction in 2005

  5. Cell Architecture • a 64-bit Power processor element (PPE) • 8 synergistic processor elements (SPE) • Memory controller • Bus-interface controller • Element interconnect bus

  6. Power Processor Elements • PPE • Power core • First level cache L1 • Second level cache L2

  7. PPE Major Units

  8. Synergistic Processor Elements • SPEs • DMA (Direct Memory Access Unit) • LS (Local Store Memory) • SXUs (Execution Units)

  9. SPE Organization

  10. Controllers • Memory Interface Controller • interfaces to the Rambus XDR I/O unit, which communicates directly with the DRAM modules • Bus Interface Controller • interfaces to the Rambus FlexIO, which handles communication with other system components

  11. Element Interconnect Bus • EIB • Coherent, on-chip bus • Connects the processing elements, memory and I/O devices

  12. Programming the Cell • Local store memory in SPEs (256 KB) • SIMD nature of dataflows • The size of the register file (128 registers, each 128 bits wide) • Single program context
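
To make the local store constraint concrete, here is a minimal C sketch that works out the largest square block of single-precision floats an SPE task could hold at once. The 256 KB figure comes from the slide. The 64 KB reserved for code and stack (`CODE_AND_STACK_BYTES`) and the working set of three blocks are assumptions chosen only for illustration. The block edge is kept a multiple of 4 because one 128-bit SIMD register holds four 32-bit floats.

```c
#include <assert.h>
#include <stddef.h>

#define LOCAL_STORE_BYTES    (256u * 1024u) /* per-SPE local store, from the slide */
#define CODE_AND_STACK_BYTES (64u * 1024u)  /* assumption: space kept for code and stack */
#define FLOATS_PER_VECTOR    4u             /* 128-bit register / 32-bit float */

/* Largest block edge n (a multiple of FLOATS_PER_VECTOR) such that
 * `blocks` square n x n float blocks fit in `budget_bytes`. */
unsigned max_block_edge(size_t budget_bytes, unsigned blocks)
{
    assert(blocks > 0);
    unsigned n = 0;
    for (;;) {
        size_t next = (size_t)n + FLOATS_PER_VECTOR;
        /* Stop as soon as one more vector width per edge would overflow the budget. */
        if ((size_t)blocks * next * next * sizeof(float) > budget_bytes)
            break;
        n = (unsigned)next;
    }
    return n;
}
```

With these assumed numbers, `max_block_edge(LOCAL_STORE_BYTES - CODE_AND_STACK_BYTES, 3)` returns 128: three 128 x 128 float blocks occupy exactly 192 KB. Anything larger has to be streamed through the local store in pieces by DMA.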

  13. Programming Models • Function offload model • Device extension model • Computational acceleration model • Streaming models • Shared-memory multiprocessor model • Asymmetric thread runtime model

  14. A programming model: CellSs • Cell superscalar • Simple and flexible • Automatic parallelization of sequential programs • Task scheduling and data handling

  15. CellSs Structure • Based on • code annotations • C language • Composed of • Source compiler • Runtime library

  16. CellSs Compilation Environment

  17. CellSs Compiler • Source-to-source compiler • Functions (tasks) to be executed on the SPEs • Function parameter directions • Parameters that are arrays and their lengths • No pointers!

  18. Parallelism on CellSs • Annotated code • Generated code for the SPE • Generated code for the PPE

  19. CellSs Syntax • Three types of pragmas • initialization and finalization • css start and css finish • task • css task [input inout output] • synchronization • css wait

  20. Example CellSs Source Code • task • start/finish • wait for task
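
The code figure from this slide is not in the transcript. As a stand-in, here is a hedged sketch of what CellSs-annotated C might look like for a blocked matrix multiply. The pragma names follow the syntax slide. The exact clause spelling, the `on(...)` argument of `css wait`, and the names `NB`, `BS`, `block_addmultiply` and `run_blocked_matmul` are assumptions for illustration. An ordinary C compiler ignores the unknown pragmas, so the sketch also runs as a plain sequential program.

```c
#include <assert.h>

#define NB 4 /* blocks per matrix edge (assumed for illustration) */
#define BS 8 /* elements per block edge (assumed for illustration) */

/* Task: c += a * b on one block. Under CellSs this function would be
 * offloaded to an SPE; a and b are read only, c is read and written. */
#pragma css task input(a, b) inout(c)
void block_addmultiply(float a[BS][BS], float b[BS][BS], float c[BS][BS])
{
    for (int i = 0; i < BS; i++)
        for (int j = 0; j < BS; j++)
            for (int k = 0; k < BS; k++)
                c[i][j] += a[i][k] * b[k][j];
}

static float A[NB][NB][BS][BS], B[NB][NB][BS][BS], C[NB][NB][BS][BS];

/* Set every element of a blocked matrix to the same value. */
static void fill(float m[NB][NB][BS][BS], float value)
{
    for (int i = 0; i < NB; i++)
        for (int j = 0; j < NB; j++)
            for (int r = 0; r < BS; r++)
                for (int c = 0; c < BS; c++)
                    m[i][j][r][c] = value;
}

/* Multiplies an all-ones matrix by an all-twos matrix, block by block,
 * and returns one element of the result. */
float run_blocked_matmul(void)
{
    fill(A, 1.0f);
    fill(B, 2.0f);
    fill(C, 0.0f);

#pragma css start
    for (int i = 0; i < NB; i++)
        for (int j = 0; j < NB; j++)
            for (int k = 0; k < NB; k++)
                block_addmultiply(A[i][k], B[k][j], C[i][j]); /* one task per call */

    /* Assumed form: block until the tasks that produce this block are done. */
#pragma css wait on(C[0][0])
    float sample = C[0][0][0][0];
#pragma css finish

    /* Each result element sums NB * BS products of 1.0 * 2.0. */
    assert(sample == 2.0f * NB * BS);
    return sample;
}
```

The main program stays sequential: the loop nest only calls the task function. Running the calls in parallel is left to the runtime, which relies on the declared parameter directions.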

  21. CellSs Runtime • Execute function • Add a node in task graph • Data dependency analysis (RaW, WaR, WaW) • Parameter renaming • Task submission
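
The dependency analysis step can be sketched in C as follows. This is not the CellSs runtime's actual code. The types `task_t` and `param_t`, the function names, and the three example tasks over buffers `x`, `y`, `z` are all hypothetical. The sketch compares the parameter directions of two tasks that touch the same address and reports which hazards exist. It also shows why renaming matters: once the later task writes to a fresh copy, only read-after-write hazards still force an ordering edge in the task graph.

```c
#include <assert.h>
#include <stddef.h>

typedef enum { DIR_INPUT, DIR_OUTPUT, DIR_INOUT } direction_t;
typedef enum { DEP_NONE = 0, DEP_RAW = 1, DEP_WAR = 2, DEP_WAW = 4 } dep_t;

#define MAX_PARAMS 4

typedef struct { const void *addr; direction_t dir; } param_t;
typedef struct { const char *name; size_t nparams; param_t params[MAX_PARAMS]; } task_t;

static int reads(direction_t d)  { return d == DIR_INPUT  || d == DIR_INOUT; }
static int writes(direction_t d) { return d == DIR_OUTPUT || d == DIR_INOUT; }

/* Bitmask of hazards that `later` has with respect to `earlier`,
 * found by matching parameters that refer to the same address. */
unsigned classify_dependencies(const task_t *earlier, const task_t *later)
{
    assert(earlier != NULL && later != NULL);
    unsigned mask = DEP_NONE;
    for (size_t i = 0; i < earlier->nparams; i++) {
        for (size_t j = 0; j < later->nparams; j++) {
            if (earlier->params[i].addr != later->params[j].addr)
                continue;
            direction_t e = earlier->params[i].dir, l = later->params[j].dir;
            if (writes(e) && reads(l))  mask |= DEP_RAW; /* true dependency */
            if (reads(e)  && writes(l)) mask |= DEP_WAR; /* removable by renaming */
            if (writes(e) && writes(l)) mask |= DEP_WAW; /* removable by renaming */
        }
    }
    return mask;
}

/* With renaming applied, only a RaW hazard needs an edge in the task graph. */
int needs_edge_after_renaming(const task_t *earlier, const task_t *later)
{
    return (classify_dependencies(earlier, later) & DEP_RAW) != 0;
}

/* Hypothetical example tasks: t1 reads x, writes y; t2 reads y, writes z;
 * t3 reads z, writes x; t4 only writes y. */
static float x[4], y[4], z[4];
static const task_t t1 = { "t1", 2, { { x, DIR_INPUT }, { y, DIR_OUTPUT } } };
static const task_t t2 = { "t2", 2, { { y, DIR_INPUT }, { z, DIR_OUTPUT } } };
static const task_t t3 = { "t3", 2, { { z, DIR_INPUT }, { x, DIR_OUTPUT } } };
static const task_t t4 = { "t4", 1, { { y, DIR_OUTPUT } } };
```

In this example `t2` must wait for `t1` (RaW on `y`). `t3` only conflicts with `t1` through a WaR hazard on `x`, and `t4` only through a WaW hazard on `y`, so neither needs an edge to `t1` once its output is renamed.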

  22. CellSs Runtime Behavior

  23. Middleware for the Cell • Task scheduling • task control buffer • task grouping • dynamic scheduling

  24. Locality Aware Task Scheduling

  25. Tracing • Generates Paraver trace files by a tracing component embedded in the CellSs runtime • when the main program enters or exits • when an annotated function is called in the main program • when a task is started or finished

  26. Performance Analysis • Matmul • Block matrix multiplication • TSP • Recursive implementation of Traveling Salesman Problem • Cholesky • Block matrix Cholesky factorization

  27. Performance Analysis • TSP • No data dependency • Cholesky • Highly connected data dependency graph

  28. Performance Analysis • x-axis : timeline • y-axis : a thread of the application • green : events • yellow : communications

  29. Performance Analysis • yellow : SPE thread DMA transfer • brown : SPE executing the task

  30. Pros and Cons • annotations • simple • but limited • data transfer is transparent to the user code • task dependency analysis • relies on other compilers for • code vectorization (SPE performance) • lower-level code optimization

  31. Related Work • OpenMP • Accelerated Library Framework (ALF) • Thread level synchronization • Sequoia • Rapidmind • Ohara • Graphics Processor Units (GPUs)

  32. References • J. A. Kahle, M. N. Day, H. P. Hofstee, C. R. Johns, T. R. Maeurer, D. Shippy, “Introduction to the Cell multiprocessor”, IBM J. Res. & Dev., Vol. 49, No. 4/5, July/September 2005. • Pieter Bellens, Josep M. Perez, Rosa M. Badia and Jesus Labarta, “CellSs: a Programming Model for the Cell BE Architecture”, Supercomputing Conference, 2006. • M. W. Riley, J. D. Warnock, D. F. Wendel, “Cell Broadband Engine processor: Design and implementation”, IBM J. Res. & Dev., Vol. 51, No. 5, September 2007. • J. M. Perez, P. Bellens, R. M. Badia, J. Labarta, “CellSs: Making it easier to program the Cell Broadband Engine processor”, IBM J. Res. & Dev., Vol. 51, No. 5, September 2007. • http://www.ibm.com/developerworks/power/cell/ • www.bsc.es/cellsuperscalar
