
Survey of multicore architectures

Survey of multicore architectures. Marko Bertogna Scuola Superiore S.Anna, ReTiS Lab, Pisa, Italy. Summary. CELL processor Reconfigurable devices Software-Hardware co-design Parallel programming problems data dependencies process synchronization memory barriers locking mechanisms


Presentation Transcript


  1. Survey of multicore architectures. Marko Bertogna, Scuola Superiore S.Anna, ReTiS Lab, Pisa, Italy

  2. Summary
     • CELL processor
     • Reconfigurable devices
     • Software-Hardware co-design
     • Parallel programming problems: data dependencies, process synchronization, memory barriers, locking mechanisms
     • Language extensions for parallel programming
     • Real-time multiprocessor scheduling

  3. Cell processor A Cell Processor

  4. Cell History

  5. Cell basic concepts

  6. Cell synergy

  7. Cell Chip

  8. Cell features

  9. Cell Processor Components

  10. Cell Processor Components

  11. Cell Processor Components

  12. Cell Processor Components

  13. Synergistic Processor Element (SPE)

  14. SPE

  15. SPE details

  16. Element Interconnect Bus (EIB)

  17. EIB: Data topology

  18. Example: 8 concurrent transactions

  19. Theoretical peak operations

  20. Cell BE performance

  21. Why is Cell Processor so fast?

  22. CELL software environment

  23. System Level Simulator

  24. SPE management library

  25. CELL parallelism

  26. Typical CELL sw development flow

  27. ARM’s MPcore

  28. PicoArray (by PicoChip)

  29. PicoArray scaling

  30. FPGA and Reconfigurable devices

  31. Field Programmable Gate Arrays
     • SRAM-based matrix of integrated elements whose interconnections can be programmed statically or even dynamically
     • Basic block is the Logic Element (LE)
     • Chip capacities from 1k to 1000k LEs
     • Each LE is typically composed of logic gates, LUTs, flip-flops and latches
     • Need for optimized CAD tools or pre-bound design libraries
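To make the role of the LUT inside a Logic Element concrete, here is a minimal C sketch of one LE modeled as a 4-input look-up table followed by a flip-flop. The 4-input width, the `logic_element` type and the `le_eval` / `le_clock` names are assumptions chosen for illustration; they do not describe any specific vendor's LE.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of one logic element: a 4-input LUT plus a flip-flop. */
typedef struct {
    uint16_t lut; /* truth table: bit i is the output for input pattern i */
    uint8_t  ff;  /* registered output, 0 or 1 */
} logic_element;

/* Combinational path: the 4 input bits select one bit of the truth table. */
static uint8_t le_eval(const logic_element *le, uint8_t inputs)
{
    assert(inputs < 16); /* only 4 input bits are meaningful */
    return (uint8_t)((le->lut >> inputs) & 1u);
}

/* Clock edge: latch the current LUT output into the flip-flop. */
static void le_clock(logic_element *le, uint8_t inputs)
{
    le->ff = le_eval(le, inputs);
}
```

"Programming" the element amounts to writing the 16-bit table: `0x8000` would make it a 4-input AND, while `0x6996` would make it a 4-input parity (XOR) function. The same silicon implements either function depending only on the stored configuration bits.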

  32. FPGA CSL organization: Basic Logic Element:

  33. Altera’s Stratix IV basic block Adaptive Logic Module (ALM)

  34. Flexibility vs efficiency

  35. Reconfigurable devices advantages
     • Efficiency AND flexibility
     • Time to market
     • Easier upgrade
     • Lower cost (in volume production)
     • Reusable IP
     • Customizable interface

  36. Reconfigurable devices parameters
     • Block granularity: coarse grained (functional units, processor cores, memory tiles) or fine grained (gate and register level)
     • Density
     • Reconfiguration time: Compile-Time Reconfiguration (CTR) or Run-Time Reconfiguration (RTR)
     • Partial or total reprogramming

  37. Triscend’s A7S chip

  38. Example: multiplier on Altera’s Stratix IV

  39. Typical FPGA software development environment
     • FPGA optimized module library
     • IO Editor
     • Generate file.h
     • Bind (placement and route) → file.csl
     • Config → file.cfg
     • Download

  40. Typical FPGA module library

  41. Altera’s Nios II
     • Nios II is a soft-core processor IP that can be downloaded into an Altera FPGA, providing the functionality of a real RISC CPU
     • Logic elements are programmed so as to behave like the gates of a classic ASIC processor
     • Different Nios II versions are available: fast and with full functionality (but larger), medium sized, and compact (but slower and with limited functionality)

  42. Nios II core

  43. Selecting Nios II e/s/f

  44. Example of a Nios II Processor system

  45. Final global layout

  46. Soft-core processors and FPGAs
     • Possible to have multiple cores on a single chip
     • Customizable hardware can be used to coordinate the various cores
     • Build and test a whole multicore system in less time
     • Detect and solve bottlenecks without needing to repeatedly return to the integration phase

  47. Co-design problems with FPGAs
     • A task may be executed by a (soft-core or ASIC) processor or may be entirely implemented in hardware on the reconfigurable logic
     • “Programming in Space” versus “Programming in Time”
     • Centralized vs distributed computing
     • Sequential vs parallel programming
     • Interconnect network
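One way to read “Programming in Space” versus “Programming in Time” is sketched below in C, under the assumption that the task is summing four values. The function names are hypothetical. The first version reuses a single adder over successive steps, as a processor would. The second writes out the dataflow as three separate adders arranged in a two-level tree, which is the shape a synthesis tool could lay out on reconfigurable logic; C itself still runs it sequentially, so this only models the structure.

```c
#include <stdint.h>

/* "In time": one adder, reused over four sequential steps. */
uint32_t sum4_in_time(const uint32_t x[4])
{
    uint32_t acc = 0;
    for (int i = 0; i < 4; i++) {
        acc += x[i];
    }
    return acc;
}

/* "In space": three adders as a tree. s01 and s23 do not depend on
 * each other, so in hardware they could be computed concurrently. */
uint32_t sum4_in_space(const uint32_t x[4])
{
    uint32_t s01 = x[0] + x[1]; /* first-level adder  */
    uint32_t s23 = x[2] + x[3]; /* first-level adder  */
    return s01 + s23;           /* second-level adder */
}
```

Both functions return the same value; the difference is whether the cost is paid in steps (time) or in replicated functional units (area).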

  48. What is a task in hardware?
     • Software programming: c=a+b; result=c/2;
     • Assembler expansion: ldr r0,a; ldr r1,b; add r0,r0,r1; mov r0,r0,LSR #1; str r0,result (5 operations)
     • Hardware implementation: a and b feed an adder (+) producing c, and a shifter turns c into result. All in one clock cycle!
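The slide's two-statement example can be written out as a small C sketch. It assumes unsigned 32-bit operands, which the slide does not specify, and the function names are hypothetical. For unsigned values, dividing by two and shifting right by one bit give the same result, which is why the hardware version needs only an adder followed by a shifter.

```c
#include <stdint.h>

/* Software view: the two statements from the slide, executed in sequence. */
uint32_t half_sum_sw(uint32_t a, uint32_t b)
{
    uint32_t c = a + b;
    uint32_t result = c / 2;
    return result;
}

/* Datapath view: an adder feeding a one-bit right shifter.
 * With a fixed shift amount the shifter is essentially wiring,
 * so the whole expression is a single combinational path. */
uint32_t half_sum_hw(uint32_t a, uint32_t b)
{
    return (a + b) >> 1;
}
```

On a load/store processor the first form expands into the five instructions listed on the slide, while the second form mirrors the adder-plus-shifter circuit that can settle within a single clock cycle.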

  49. Conclusions
     • FPGAs are interesting devices for multicore system developers
     • Valid benchmark upon which to compare classic serial programming methods and parallel computing approaches
     • Allow reducing time-to-market for next-generation multicore systems
     • Provide common platforms that can easily reproduce any architecture (given a proper VHDL/Verilog description)
