Explore the concept of platform-based design for MPSoCs, addressing memory coherence, message passing, and design choices for shared and parallel systems. Discover how platforms enhance design efficiency, optimize performance, and reduce time-to-market, featuring tools like RTOS and libraries. Learn about platform characteristics, parallel architectures, power efficiency through parallelism, and the distinction between homogeneous and heterogeneous systems. Gain insights into the benefits and challenges of parallel processing in modern system design.
Platform Design: Multi-Processor Systems-on-Chip (MPSoC), TU/e 5kk70, Henk Corporaal, Bart Mesman
Overview • What is a platform, and why platform-based design? • Why parallel platforms? • A first classification of parallel systems • Design choices for parallel systems • Shared memory systems • Memory Coherency, Consistency, Synchronization, Mutual exclusion • Message passing systems • Further decisions
Design & Product requirements? • Short time-to-market • Reuse / standards • Short design time • Flexible solution • Reduces design time • Extends product lifetime; remote inspection and debug, … • Scalability • High performance and low power • Memory bottleneck, wiring bottleneck • Low cost • High quality, reliability, dependability • RTOS and libraries • Good programming environment
Solution? • Platforms • Programmable • One or more processor cores • Reconfigurable • Scalable and flexible • Memory hierarchy • Exploit locality • Separate local and global wiring • HW and SW IP reuse • Standardization (of SW and HW interfaces) • Raising the design abstraction level • Reliable • Cheaper • Advanced design flow for platforms
What is a platform? Definition: a platform is a generic, but domain-specific, information processing (sub-)system. Generic means that it is flexible, containing programmable component(s). Platforms are meant to quickly realize your next system (in a certain domain). Single chip?
Example Platform: Sanyo Camera
Platform example: TI OMAP. On-chip memories (from the block diagram): 192 KB shared SRAM; 8 KB data cache (2-way, 512 lines of 16 bytes) with a 17-element write buffer; two 16 KB (2-way) caches; 8 KB memory (2x 4K); 64 KB dual-port RAM (8x 4K x 16b); 96 KB single-port RAM (12x 4K x 16b); 32 KB ROM. Up to 192 MB off-chip memory.
Platform and platform design: a layered view. Applications are mapped onto the platform using system design technology (SDT); the platform itself is built on the enabling technologies using platform design technology (PDT).
Why parallel processing? • Performance drive • Diminishing returns for exploiting ILP and OLP • Multiple processors fit easily on a chip • Cost effective (just connect existing processors or processor cores) • Low power: parallelism may allow lowering Vdd However: • Parallel programming is hard
Low power through parallelism • Sequential processor • Switching capacitance C • Frequency f • Voltage V • P = f·C·V² • Parallel processor (two times the number of units) • Switching capacitance 2C • Frequency f/2 • Voltage V' < V • P = (f/2)·2C·V'² = f·C·V'²
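A short worked version of this argument; the voltage scaling factor of 0.7 below is an assumed value for illustration, not taken from the slides:

```latex
P_{\mathrm{seq}} = f\,C\,V^2, \qquad
P_{\mathrm{par}} = \frac{f}{2}\cdot 2C\cdot V'^2 = f\,C\,V'^2
\quad\Rightarrow\quad
\frac{P_{\mathrm{par}}}{P_{\mathrm{seq}}} = \left(\frac{V'}{V}\right)^2
\;\approx\; 0.5 \quad \text{for } V' = 0.7\,V
```

The two half-speed units deliver the same throughput as the single full-speed processor, but at roughly half the power once the lower clock frequency allows the supply voltage to be reduced.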
Power efficiency: compare 2 examples • Intel Pentium-4 (Northwood) in 0.13 micron technology • 3.0 GHz • 20 pipeline stages • Aggressive buffering to boost clock frequency • 13 nanojoule / instruction • Philips TriMedia “Lite” in 0.13 micron technology • 250 MHz • 8 pipeline stages • Relaxed buffering, focus on instruction parallelism • 0.2 nanojoule / instruction • TriMedia uses about 65x less energy per instruction than the Pentium
Parallel Architecture • A parallel architecture extends traditional computer architecture with a communication network • abstractions (HW/SW interface) • organizational structure to realize the abstraction efficiently • Structure: multiple processing nodes connected by a communication network
Platform characteristics • System level • Processor level • Communication network • Memory system • Tooling
System level characteristics • Homogeneous vs. heterogeneous • Granularity of processing elements • Type of supported parallelism: TLP, DLP • Runtime mapping support?
Homogeneous or Heterogeneous • Homogeneous: • replication effect • memory dominated anyway • solve realization issues once and for all • less flexible • Typically: • data level parallelism • shared memory • dynamic task mapping
Example: Philips Wasabi • Homogeneous multiprocessor for media applications • Two-level communication hierarchy • Top: scalable message passing network plus tiles • Tile: shared memory plus processors (ARM, multiple TriMedia (TM) cores) and accelerators (pixel SIMD, video scale, picture improve) • Fully cache coherent to support data parallelism
Homogeneous or Heterogeneous • Heterogeneous • better fit to application domain • smaller increments • Typically: • task level parallelism • message passing • static task mapping
Example: Viper2 • Heterogeneous • Platform based • >60 different cores (blocks include MIPS PR4450, two TM3260 TriMedia cores, MBS, VMPG, TDCS, VIP, MSP, MDCS, QVCP5L, QVCP2L) • Task parallelism • Sync with interrupts • Streaming communication • Semi-static application graph • 50 M transistors • 120 nm technology • Powerful, efficient
Homogeneous or Heterogeneous • Middle-of-the-road approach • Flexible tiles • Fixed tile structure at top level
Types of parallelism • TLP (program/thread level): heterogeneous; multi-threaded / MIMD • DLP (module level): homogeneous; SIMD / vector • ILP (kernel level): heterogeneous; VLIW / superscalar / dataflow architectures
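As a quick illustration of where these levels show up in code, here is a minimal C sketch; the function names and the image-scaling kernel are made up for this example and are not from the slides:

```c
#include <stddef.h>
#include <stdio.h>

/* DLP: every iteration applies the same operation to different data,
 *      so a SIMD/vector unit can process several elements at once.
 * ILP: the independent load, multiply and store inside one iteration
 *      can be scheduled in parallel on a VLIW/superscalar core.      */
static void scale_image(float *img, size_t n, float gain)
{
    for (size_t i = 0; i < n; i++)
        img[i] *= gain;
}

/* TLP: each image is an independent task; the iterations of this loop
 *      could be mapped onto different processors or threads.         */
static void scale_all(float **imgs, size_t k, size_t n, float gain)
{
    for (size_t j = 0; j < k; j++)
        scale_image(imgs[j], n, gain);
}

int main(void)
{
    float a[4] = {1, 2, 3, 4}, b[4] = {5, 6, 7, 8};
    float *imgs[2] = {a, b};
    scale_all(imgs, 2, 4, 0.5f);
    printf("%.1f %.1f\n", a[0], b[3]);   /* prints 0.5 4.0 */
    return 0;
}
```

The inner loop is where a SIMD or VLIW machine finds DLP and ILP; the independent images are the natural unit of TLP on a multi-processor.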
Processor level characteristics. A processor consists of • Instruction engine (control processor, I-fetch unit) • Processing element (PE): register file, function unit(s), L1 DMem Design choices: • Single PE vs. multiple PEs (as in SIMD) • Single FU/PE vs. multiple FUs/PE (as in VLIW) • Granularity of PEs, FUs • Specialized vs. generic • Interruptible, pre-emption support • Multithreading support (fast context switches) • Clustering of PEs; clustering of FUs • Type of inter-PE and inter-FU communication network • Others: MMU / virtual memory, …
Generic or Specialized? Intrinsic computational efficiency
General processor organization. [Figure: classic five-stage pipelined datapath with IF/ID, ID/EX, EX/MEM and MEM/WB pipeline registers, PC, instruction memory, register file, ALU, sign extend and data memory.] • Instruction fetch / control • PE: processing engine • FU: function unit
(Linear) SIMD Architecture • A control processor with instruction memory (IMem) drives an array of PEs (PE1 … PEn), each containing a function unit (FU), a register file (RF) and a local data memory (DMem) • To be added: • inter-PE communication • communication from PEs to the control processor • input and output
Communication network • Bus (single all2all connection) vs. crossbar vs. NoC with point-to-point connections • Topology, router degree • Routing • path, path control, collision resolution, network support, deadlock handling, livelock handling • virtual layer support • flow control and buffering • error handling • Inter-chip network support • Guarantees • TDMA • GT vs. BE traffic • etc., etc.
Comm. network: performance metrics • Network bandwidth • Need high bandwidth in communication • How does it scale with the number of nodes? • Communication latency • Affects performance, since the processor may have to wait • Affects ease of programming, since it requires more thought to overlap communication and computation • Latency hiding • Global memory access can take hundreds of cycles • How can a mechanism help hide latency? • Examples: • overlap message send with computation • prefetch data • switch to other tasks
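To make the "overlap message send with computation" example concrete, here is a hedged C sketch using standard non-blocking MPI calls; MPI is only a familiar stand-in here, not the communication API of the platforms in these slides, and it assumes MPI_Init has already been called:

```c
#include <mpi.h>

/* Latency hiding: start the send, do independent computation, then
 * wait for completion, so the transfer overlaps with useful work.   */
void send_and_compute(double *msg, int n, int dest,
                      double *local, int m)
{
    MPI_Request req;
    MPI_Isend(msg, n, MPI_DOUBLE, dest, /*tag=*/0, MPI_COMM_WORLD, &req);

    for (int i = 0; i < m; i++)        /* work that does not touch msg */
        local[i] = local[i] * 2.0;

    MPI_Wait(&req, MPI_STATUS_IGNORE); /* msg may only be reused after this */
}
```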
How good is your network? The topology determines: • Degree = number of links from a node • Diameter = maximum number of links crossed between any two nodes • Average distance = average number of links to a random destination • Bisection = minimum number of links that separate the network into two halves • Bisection bandwidth = link bandwidth x bisection
Metrics for common topologies (N = number of nodes, n = dimension)
Type       | Degree | Diameter        | Avg. distance  | Bisection
1D mesh    | 2      | N-1             | N/3            | 1
2D mesh    | 4      | 2(N^(1/2) - 1)  | 2 N^(1/2) / 3  | N^(1/2)
3D mesh    | 6      | 3(N^(1/3) - 1)  | 3 N^(1/3) / 3  | N^(2/3)
nD mesh    | 2n     | n(N^(1/n) - 1)  | n N^(1/n) / 3  | N^((n-1)/n)
Ring       | 2      | N/2             | N/4            | 2
2D torus   | 4      | N^(1/2)         | N^(1/2) / 2    | 2 N^(1/2)
Hypercube  | log2 N | n = log2 N      | n/2            | N/2
2D tree    | 3      | 2 log2 N        | ~2 log2 N      | 1
Crossbar   | N-1    | 1               | 1              | N^2 / 2
More topology metrics: compare a hypercube, a 2D grid/mesh and a 2D torus, assuming 64 nodes.
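A small C sketch that plugs N = 64 into the formulas from the table above for the three topologies being compared (link with -lm):

```c
#include <math.h>
#include <stdio.h>

/* Evaluate the topology metrics of the previous table for N = 64. */
int main(void)
{
    double N = 64.0, s = sqrt(N), n = log2(N);

    /* 2D mesh:  degree 4, diameter 2(sqrt(N)-1), avg 2*sqrt(N)/3, bisection sqrt(N) */
    printf("2D mesh:   deg 4, diam %.0f, avg %.1f, bisection %.0f\n",
           2 * (s - 1), 2 * s / 3, s);

    /* 2D torus: degree 4, diameter sqrt(N), avg sqrt(N)/2, bisection 2*sqrt(N)      */
    printf("2D torus:  deg 4, diam %.0f, avg %.1f, bisection %.0f\n",
           s, s / 2, 2 * s);

    /* Hypercube: degree log2(N), diameter log2(N), avg log2(N)/2, bisection N/2     */
    printf("Hypercube: deg %.0f, diam %.0f, avg %.1f, bisection %.0f\n",
           n, n, n / 2, N / 2);
    return 0;
}
```

For 64 nodes this gives a diameter of 14 for the 8x8 mesh, 8 for the torus and 6 for the hypercube, with the hypercube paying for its short distances with a higher router degree and a larger bisection to wire.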
How to make a bigger butterfly network? Combine two N/2 butterflies with one extra switch stage. Multi-stage network: butterfly or Omega • All paths have equal length • Unique path from any input to any output • Try to avoid conflicts! [Figure: 8 x 8 butterfly switch]
Multistage fat tree • A multistage fat tree (as in the CM-5) avoids congestion at the root node • Randomly assign packets to different paths on the way up to spread the load • Increase the degree near the root to decrease congestion
What did architects design in the '90s? Old (off-chip) MP networks (link and bisection bandwidth in MBytes/s)
Name      | Number  | Topology | Bits | Clock   | Link | Bis. BW | Year
nCube/ten | 1-1024  | 10-cube  | 1    | 10 MHz  | 1.2  | 640     | 1987
iPSC/2    | 16-128  | 7-cube   | 1    | 16 MHz  | 2    | 345     | 1988
MP-1216   | 32-512  | 2D grid  | 1    | 25 MHz  | 3    | 1,300   | 1989
Delta     | 540     | 2D grid  | 16   | 40 MHz  | 40   | 640     | 1991
CM-5      | 32-2048 | fat tree | 4    | 40 MHz  | 20   | 10,240  | 1991
CS-2      | 32-1024 | fat tree | 8    | 70 MHz  | 50   | 50,000  | 1992
Paragon   | 4-1024  | 2D grid  | 16   | 100 MHz | 200  | 6,400   | 1992
T3D       | 16-1024 | 3D torus | 16   | 150 MHz | 300  | 19,200  | 1993
No standard topology! However, for on-chip networks, mesh and torus are in favor!
Memory hierarchy • Number of memory levels: 1, 2, 3, 4 • HW- vs. SW-controlled level 1 • Cache vs. scratchpad memory at L1 • Central vs. distributed memory • Shared vs. distributed memory address space • Intelligent DMA support: communication assist • For shared memory: • coherency • consistency • synchronization
Intermezzo: what's the problem with memory? The processor-memory performance gap grows ~50% per year: µProc performance improves ~55%/year ("Moore's Law"), DRAM only ~7%/year [Patterson]. Memories can also be big power consumers!
Multiple levels of memory. Architecture concept: at level 0, CPUs, accelerators and reconfigurable HW blocks share a local communication network with memory and I/O; these tiles are in turn connected through level-1 communication networks with their own memory and I/O, and so on up to level N.
Communication models: Shared Memory • Processes P1 and P2 communicate by reading and writing a shared memory • Coherence problem • Memory consistency issue • Synchronization problem
Communication models: Shared memory • Shared address space • Communication primitives: • load, store, atomic swap Two varieties: • Physically shared => Symmetric Multi-Processors (SMP) • usually combined with local caching • Physically distributed => Distributed Shared Memory (DSM)
SMP: Symmetric Multi-Processor • Several processors, each with one or more cache levels, connected by a bus to a centralized main memory and I/O system • Memory: centralized with uniform access time (UMA) and bus interconnect, I/O • Examples: Sun Enterprise 6000, SGI Challenge, Intel
DSM: Distributed Shared Memory • Each processor has its own cache and a local memory; the nodes (plus I/O) are connected by a scalable interconnection network • Non-uniform access time (NUMA) and scalable interconnect (distributed memory)
Shared Address Model Summary • Each processor can name every physical location in the machine • Each process can name all data it shares with other processes • Data transfer via load and store • Data size: byte, word, ... or cache blocks • The memory hierarchy model applies: • communication moves data into the local processor cache
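A minimal sketch of this model in C11 on a cache-coherent shared-memory machine; the producer/consumer naming and the flag protocol are illustrative, not from the slides. Data is communicated with plain loads and stores, and the atomic swap mentioned above acts as the synchronization primitive:

```c
#include <stdatomic.h>
#include <pthread.h>
#include <stdio.h>

static int shared_data;          /* communicated by ordinary load/store */
static atomic_int flag = 0;      /* signalled with an atomic exchange   */

static void *producer(void *arg)
{
    (void)arg;
    shared_data = 42;            /* plain store into the shared space   */
    atomic_exchange(&flag, 1);   /* atomic swap: mark the data as ready */
    return NULL;
}

static void *consumer(void *arg)
{
    (void)arg;
    while (atomic_load(&flag) == 0)   /* spin until the producer is done */
        ;
    printf("got %d\n", shared_data);  /* plain load sees the new value   */
    return NULL;
}

int main(void)
{
    pthread_t p, c;
    pthread_create(&c, NULL, consumer, NULL);
    pthread_create(&p, NULL, producer, NULL);
    pthread_join(p, NULL);
    pthread_join(c, NULL);
    return 0;
}
```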
Communication models: Message Passing • Processes P1 and P2 communicate through send and receive operations over FIFO channels • Communication primitives • e.g., send, receive library calls • Note that MP can be built on top of SM and vice versa
Message Passing Model • Explicit message send and receive operations • Send specifies local buffer + receiving process on remote computer • Receive specifies sending process on remote computer + local buffer to place data • Typically blocking communication, but may use DMA • Message structure: header, data, trailer
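For comparison with the shared-memory sketch above, a minimal blocking message-passing example in C using the standard MPI send/receive calls; MPI is again only a stand-in API, and the rank numbers and buffer contents are illustrative:

```c
#include <mpi.h>
#include <stdio.h>

/* Blocking message passing: the send names a local buffer and the
 * receiving process (rank); the receive names the sending process
 * and the local buffer where the data must be placed.              */
int main(int argc, char **argv)
{
    int rank, buf[4] = {1, 2, 3, 4};
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank == 0)            /* P1: send local buffer to process 1  */
        MPI_Send(buf, 4, MPI_INT, 1, 0, MPI_COMM_WORLD);
    else if (rank == 1) {     /* P2: receive from process 0 into buf */
        MPI_Recv(buf, 4, MPI_INT, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        printf("received %d %d %d %d\n", buf[0], buf[1], buf[2], buf[3]);
    }
    MPI_Finalize();
    return 0;
}
```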
Message passing communication • Each node contains a processor, cache and memory plus a network interface with DMA; the nodes are connected by an interconnection network
Communication Models: Comparison • Shared memory • Compatibility with well-understood (language) mechanisms • Ease of programming for complex or dynamic communication patterns • Shared-memory applications; sharing of large data structures • Efficient for small items • Supports hardware caching • Message passing • Simpler hardware • Explicit communication • Improved synchronization
Challenges of parallel processing • Q1: can we get linear speedup? Suppose we want a speedup of 80 with 100 processors. What fraction of the original computation can be sequential (i.e. non-parallel)? • Q2: how important is communication latency? Suppose 0.2% of all accesses are remote and require 100 cycles, on a processor with base CPI = 0.5. What is the communication impact?
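A worked sketch of both questions; Q2 assumes one memory access per instruction, which the slide does not state:

```latex
% Q1 (Amdahl's law): solve 80 = 1 / ((1 - f) + f/100) for the parallel fraction f
(1 - f) + \frac{f}{100} = \frac{1}{80} = 0.0125
\;\Rightarrow\; f \approx 0.9975
\;\Rightarrow\; \text{at most } \approx 0.25\% \text{ of the work may be sequential}

% Q2: effective CPI with 0.2% remote accesses of 100 cycles each
\mathrm{CPI}_{\mathrm{eff}} = 0.5 + 0.002 \times 100 = 0.7
\;\Rightarrow\; 0.7 / 0.5 = 1.4\times \text{ slower due to remote accesses}
```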
Three fundamental issues for shared memory multiprocessors • Coherence, about: do I see the most recent data? • Consistency, about: when do I see a written value? • e.g. do different processors see writes at the same time (w.r.t. other memory accesses)? • Synchronization: how to synchronize processes? • how to protect access to shared data?
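A minimal sketch of the synchronization/mutual-exclusion point in C with POSIX threads; the shared counter is illustrative, and on an MPSoC the lock would typically be built on a hardware atomic operation or a semaphore peripheral rather than pthreads:

```c
#include <pthread.h>
#include <stdio.h>

/* Mutual exclusion: only one thread at a time may update the shared
 * counter, otherwise concurrent increments can be lost.             */
static long counter = 0;
static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

static void *worker(void *arg)
{
    (void)arg;
    for (int i = 0; i < 100000; i++) {
        pthread_mutex_lock(&lock);     /* enter critical section  */
        counter++;                     /* protected shared access */
        pthread_mutex_unlock(&lock);   /* leave critical section  */
    }
    return NULL;
}

int main(void)
{
    pthread_t t[4];
    for (int i = 0; i < 4; i++) pthread_create(&t[i], NULL, worker, NULL);
    for (int i = 0; i < 4; i++) pthread_join(t[i], NULL);
    printf("%ld\n", counter);   /* 400000, only because of the mutex */
    return 0;
}
```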
Coherence problem, in a single-CPU system. [Figure: initially the cache holds a' = 100, b' = 200 and memory holds a = 100, b = 200. After the CPU writes a (cache: a' = 550) the memory still holds a = 100; after an I/O device writes b in memory (b = 440) the cache still holds b' = 200. Cache and memory disagree.]
Coherence problem, in a multi-processor system. [Figure: CPU-1's cache holds a' = 550, b' = 200 while CPU-2's cache holds a'' = 100, b'' = 200 and memory holds a = 100, b = 200: after CPU-1 writes a, CPU-2 still sees the stale value.]
What Does Coherency Mean? • Informally: • “Any read must return the most recent write” • Too strict and too difficult to implement • Better: • “Any write must eventually be seen by a read” • All writes are seen in proper order (“serialization”)
Two rules to ensure coherency • “If P writes x and P1 reads it, P's write will be seen by P1 if the read and write are sufficiently far apart in time” • Writes to a single location are serialized: seen in one order • The latest write will be seen • Otherwise writes could be seen in an illogical order (an older value observed after a newer one)