
Introduction to Multiprocessor System-on-Chip

Prof. Jan Madsen, Informatics and Mathematical Modeling, Technical University of Denmark, Richard Petersens Plads, Building 321, DK-2800 Lyngby, Denmark.


Presentation Transcript


  1. Introduction to Multiprocessor System-on-Chip
     Prof. Jan Madsen, Informatics and Mathematical Modeling, Technical University of Denmark, Richard Petersens Plads, Building 321, DK-2800 Lyngby, Denmark

  2. Embedded systems
     [Figure: CPU with mem, rom and io; program text (func, if ... then ... else ..., for { ... }) shown alongside its bit-pattern]
     (c) Jan Madsen

  3. Embedded systems
     • Systems which use a computer to perform a specific function, but are neither used nor perceived as a computer
     • They are embedded within larger electronic devices
     • They repeatedly carry out a particular function
     • They are often completely unrecognized by the device's user

  4. Embedded systems design
     • Several design groups
     • Separated validations
     • Prototype realization
     • Problems arise at a very late point in the design process
     [Figure: two parallel flows. Hardware: hardware model, validation, hardware prototype. Software: software model, validation, software prototype]

  5. Principles of Codesign
     [Figure: the specification below feeds SW synthesis (CPU), HW synthesis (ASIC) and interface synthesis]
     void UnitControl() {
       up = down = 0; open = 1;
       while (1) {
         while (req == floor);      /* wait for a request for another floor */
         open = 0;
         if (req > floor) { up = 1; }
         else { down = 1; }
         while (req != floor);      /* wait until the requested floor is reached */
         open = 1;
         delay(10);
       }
     }

  6. Overview
     • Technology
        • Processors
        • IC fabric
     • Codesign for speed-up
        • Component execution timing (SW and HW)
     • Building sub-system
        • Hardware/software partitioning
     • Building system
        • System-level issues of codesign

  7. Software
     [Figure: program text (func, if ... then ... else ..., for { ... }) next to a processing element (pe)]
     • Elements of computation
        • Store data
        • Transform data
        • Move data

  8. Processor
     [Figure: program text (func, if ... then ... else ..., for { ... })]
     • Architecture components
        • Processing elements – transform data
        • Memories – store data
        • Interconnect – move data

  9. Processor: General Purpose
     [Figure: block diagram with inst mem, controller, datapath and data mem; labels ir, cu, pc, func, reg, *, +/-]
     • Availability
     • Low cost (mass production)
     • Simple design flow
     • High flexibility

  10. Processor: General Purpose – example
      [Figure: general-purpose processor (inst mem, controller, datapath, data mem) with operands A[i] and p1]
      • x = x + A[i] * p1
      • 5 cycles

  11. Processor: Custom (ASIC)
      [Figure: block diagram with controller (cu), datapath and mem; labels +, *, +/-]
      • High performance
      • Low power
      • Complex design flow
      • No flexibility

  12. Processor: Custom (ASIC) – example
      [Figure: custom processor (controller, datapath, mem) with operands A[i] and p1]
      • x = x + A[i] * p1
      • 1 cycle

  13. Processor: Semicustom (ASIP)
      [Figure: block diagram with inst mem, controller, datapath and data mem; labels ir, cu, pc, func, reg, +, *, +/-]
      • Customized datapath – 16, 8 or 4 bit
      • Optimized for a particular class of programs, e.g. MACC (multiply-accumulate)
      • "Simple" design flow
      • High flexibility

  14. Processor: Semicustom – example
      [Figure: semicustom processor (inst mem, controller, datapath, data mem) with operands A[i] and p1]
      • x = x + A[i] * p1
      • 2 cycles
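
The three example slides evaluate the same statement, x = x + A[i] * p1, at 5 cycles (general purpose), 2 cycles (semicustom) and 1 cycle (custom). The sketch below turns those per-statement figures into a rough loop estimate. It assumes the statement runs once per array element and that loop overhead is ignored; neither assumption comes from the slides, and the names `mac_loop` and `mac_loop_cycles` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Cycles per evaluation of x = x + A[i] * p1, as quoted on the slides. */
enum { CYCLES_GPP = 5, CYCLES_ASIP = 2, CYCLES_ASIC = 1 };

/* The statement itself, applied to n array elements (hypothetical helper). */
long mac_loop(const int *A, size_t n, int p1)
{
    long x = 0;
    for (size_t i = 0; i < n; i++) {
        x = x + (long)A[i] * p1;   /* one multiply-accumulate per element */
    }
    return x;
}

/* Estimated cycles for n evaluations; loop overhead is ignored (assumption). */
unsigned long mac_loop_cycles(unsigned cycles_per_mac, size_t n)
{
    assert(cycles_per_mac > 0);
    return (unsigned long)cycles_per_mac * n;
}
```

Under these assumptions, `mac_loop_cycles(CYCLES_GPP, 1000)` gives 5000 cycles, against 2000 for the ASIP and 1000 for the ASIC figure.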

  15. IC fabrics
      • An IC is an interconnection of transistors following one of several possible styles – fabrics
      • The fabric defines how and when transistors are composed
      • "The material of processors"
      • IC fabrics differ in terms of customizability and generality

  16. IC fabrics: Custom
      • Exact implementation of processor components
      • High NRE (non-recurring engineering) cost – mask set ~ $1M

  17. IC fabrics: Semicustom
      • Several semicustom fabrics
         • Library of standard cells
         • Cell arrays (sea-of-gates)
      • Most processing steps are premanufactured (high volume)

  18. IC fabrics: Programmable
      • Set of interconnected modules
      • Set of modules programmed to implement different components
      • FPGA
         • Programmable logic modules, storage and interconnect

  19. Chips: Implementing IC fabric

  20. Hardware/software codesign?
      [Figure: program text (func, if ... then ... else ..., for { ... })]
      • Many possible mappings
      • Processor may not exist yet!
      • Exploring the design space
      • Need to estimate

  21. Hardware/Software Codesign
      • Optimizing
         • Timing (high performance, hard deadlines)
         • Area (cost)
         • Power consumption
         • Flexibility
         • Reliability
         • ...
      • We will focus on timing

  22. Processing element timing
      [Figure: program text (func, if ... then ... else ..., for { ... })]
      • Execution path
         • Control data dependent
         • Input data dependent
      • Function implementation
         • Component architecture
         • Compiler or synthesis

  23. Formal execution path timing analysis
      $t_{pe}(F, pe_j) = \sum_{i \in I} t_{pe}(b_i, pe_j) \times c(b_i)$
      • $b_i$: basic block or program segment
      • $t_{pe}(b_i, pe_j)$: execution time of $b_i$ on processing element $pe_j$
      • $c(b_i)$: execution frequency of $b_i$
      • Worst/best case timing bounds
      [Figure: control flow split into basic blocks b1 to b4 over if ... then ... else { ... } and for { ... }]
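
The execution path formula on slide 23 can be evaluated directly once each basic block has a time estimate and an execution count. The sketch below is a minimal illustration for one processing element, computing both a best-case and a worst-case total; the struct layout, the use of cycles as the time unit and the name `path_time` are assumptions made for this example, not part of the slides.

```c
#include <assert.h>
#include <stddef.h>

/* One basic block b_i, characterised for a single processing element pe_j. */
struct basic_block {
    unsigned long t_best;   /* best-case execution time of b_i, in cycles  */
    unsigned long t_worst;  /* worst-case execution time of b_i, in cycles */
    unsigned long count;    /* c(b_i): how often b_i executes              */
};

struct time_bounds {
    unsigned long best;
    unsigned long worst;
};

/* t_pe(F, pe_j) = sum over i of t_pe(b_i, pe_j) * c(b_i), for both bounds. */
struct time_bounds path_time(const struct basic_block *blocks, size_t n)
{
    struct time_bounds total = { 0, 0 };
    assert(blocks != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        total.best  += blocks[i].t_best  * blocks[i].count;
        total.worst += blocks[i].t_worst * blocks[i].count;
    }
    return total;
}
```

A block that never executes (count 0) contributes nothing, which is how an untaken branch drops out of the sum.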

  24. Formal execution path timing analysis
      [Figure: $t_{pe}(b_i, pe_j)$ for block b2 (then ...), derived from a software model and a hardware model, each drawn as a graph of +, * and - operations]

  25. Memory models
      [Figure: PE with D$ and I$ caches, connected to SDRAM, Flash and RAM]
      • Access time
         • Control overhead
         • Burst access (packets)
      • Cache
         • Hit/miss time overhead
         • Based on execution history
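
Cache timing depends on execution history because whether an access hits is decided by the accesses that came before it. The sketch below replays an address trace through a small direct-mapped cache and counts hits and misses. The cache organisation (4 lines of 16 bytes), the cost model in `access_cycles` and all names are hypothetical choices for illustration; the slides do not specify a cache model.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NUM_LINES  4u    /* hypothetical cache size: 4 lines          */
#define LINE_BYTES 16u   /* hypothetical line size: 16 bytes per line */

struct cache_stats {
    unsigned long hits;
    unsigned long misses;
};

/* Replay an address trace through a direct-mapped cache, counting hits and misses. */
struct cache_stats replay_trace(const uint32_t *trace, size_t n)
{
    uint32_t tags[NUM_LINES] = { 0 };
    int valid[NUM_LINES] = { 0 };
    struct cache_stats s = { 0, 0 };

    assert(trace != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        uint32_t block = trace[i] / LINE_BYTES;  /* memory block number   */
        uint32_t index = block % NUM_LINES;      /* cache line it maps to */
        uint32_t tag   = block / NUM_LINES;      /* identifies the block  */

        if (valid[index] && tags[index] == tag) {
            s.hits++;
        } else {
            s.misses++;          /* fetch the line, evicting the old one */
            valid[index] = 1;
            tags[index] = tag;
        }
    }
    return s;
}

/* Total access time in cycles for a given hit time and miss penalty. */
unsigned long access_cycles(struct cache_stats s,
                            unsigned long hit_time, unsigned long miss_penalty)
{
    return s.hits * hit_time + s.misses * (hit_time + miss_penalty);
}
```

In the trace 0, 4, 64, 0, 16, 20 the second access to address 0 misses because address 64 evicted its line in between: the same address costs a different amount depending on what ran before it.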

  26. Advanced architectures
      • Modern high-performance processors include architectural features which complicate timing analysis
         • Dynamic instruction scheduling
         • Speculative execution
      • Though fast, these features make
         • the processor very power hungry
         • tight bounds on timing very difficult
      • Computation less predictable
      • Issues which are important for embedded systems

  27. Building sub-systems
      [Figure: program text (func, if ... then ... else ..., for { ... }) mapped onto a processor and an ASIC]
      • Initial codesign problem
         • Hardware/software partitioning
      • The LYCOS cosynthesis tool
         • Automatic partitioning from C (subset) and VHDL (single process)
         • Developed at DTU

  28. Architectural choices
      • Which processor should be selected and how fast should it be?
      • Which ASIC technology should be chosen and how fast should the ASIC be?
      • How large an ASIC can we afford and which functions should it execute?
      • How should the processor and ASIC communicate?

  29. Partitioning Model
      [Figure: Specification, Model (BB), SW and HW]
      • Determines granularity and simplifying assumptions w.r.t. communication, HW sharing, etc.

  30. Estimation
      [Figure: SW, HW and Com estimators, each with its own library (Lib), producing the estimates $t_S, a_S$ (SW), $t_H, a_H$ (HW) and $t_C, a_C$ (Com)]

  31. Process communication
      • $s(b_i)$: sent data in $b_i$
      • $r(b_i)$: received data in $b_i$
      • $c(b_i)$: execution frequency of $b_i$
      • Communication time for $s(b_i)$ and $r(b_i)$ is determined by
         • Data volume
         • Data encoding
         • Communication protocol
      [Figure: control flow with basic blocks b1 to b4 over if ... then ... else { send(...); receive(...); ... } and for { ... }]

  32. Solving the Partitioning Problem
      [Figure: blocks 1 to 6, each to be assigned to SW or HW]
      • Just try all combinations...
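
"Just try all combinations" can be written down as an exhaustive search over all 2^n SW/HW assignments. The sketch below is one way to do that under deliberately simple assumptions that are not taken from the slides: blocks execute sequentially, hardware areas are additive, communication costs nothing, and the goal is minimum total time within an area budget. The struct `block_cost` and the function `best_partition` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

struct block_cost {
    unsigned t_sw;   /* execution time if mapped to software */
    unsigned t_hw;   /* execution time if mapped to hardware */
    unsigned a_hw;   /* hardware area if mapped to hardware  */
};

/*
 * Try all 2^n SW/HW assignments. Bit i of the returned mask is 1 when
 * block i goes to hardware. Assumes sequential execution, additive areas
 * and no communication cost (simplifications for this sketch).
 */
unsigned best_partition(const struct block_cost *b, size_t n,
                        unsigned area_budget, unsigned *best_time)
{
    unsigned best_mask = 0;
    unsigned best = 0;
    int found = 0;

    assert(n < 8 * sizeof(unsigned));   /* mask must fit in an unsigned */
    for (unsigned mask = 0; mask < (1u << n); mask++) {
        unsigned time = 0, area = 0;
        for (size_t i = 0; i < n; i++) {
            if (mask & (1u << i)) {
                time += b[i].t_hw;
                area += b[i].a_hw;
            } else {
                time += b[i].t_sw;
            }
        }
        if (area <= area_budget && (!found || time < best)) {
            best = time;
            best_mask = mask;
            found = 1;
        }
    }
    if (best_time != NULL) {
        *best_time = best;
    }
    return best_mask;
}
```

The outer loop doubles with every added block: 6 blocks need 64 evaluations, 30 blocks need about a billion, so this approach only works for very small specifications.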

  33. Solving the Partitioning Problem
      [Figure: three SW/HW partitioning examples over numbered blocks]
      • Interleaved communication, additive areas
      • Parallel execution, non-additive areas
      • No communication, interleaved exec., additive areas
      • Knapsack Stuffing
      • Large scale linear/nonlinear integer programming
      • Heuristics needed!
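
Slide 33 names knapsack stuffing as one formulation of the partitioning problem. The sketch below is the textbook 0/1 knapsack solved by dynamic programming, applied to partitioning under the assumptions of no communication cost and additive areas: each block moved to hardware saves some time and consumes some area, and the total area must fit a budget. It is an illustration of the knapsack view only, not a description of the LYCOS algorithm; `max_saving` and `MAX_AREA` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_AREA 256   /* hypothetical upper limit on the area budget */

/*
 * 0/1 knapsack by dynamic programming: choose blocks to move to hardware
 * so that the total time saved is maximal and the summed area fits the budget.
 * saving[i] = t_sw - t_hw of block i, area[i] = its hardware area.
 */
unsigned max_saving(const unsigned *saving, const unsigned *area,
                    size_t n, unsigned budget)
{
    unsigned best[MAX_AREA + 1] = { 0 };  /* best[a]: max saving within area a */

    assert(budget <= MAX_AREA);
    for (size_t i = 0; i < n; i++) {
        /* Walk the budget downwards so each block is used at most once. */
        for (unsigned a = budget; a >= area[i]; a--) {
            unsigned with_block = best[a - area[i]] + saving[i];
            if (with_block > best[a]) {
                best[a] = with_block;
            }
            if (a == 0) {
                break;   /* avoid unsigned wrap-around when area[i] == 0 */
            }
        }
    }
    return best[budget];
}
```

The running time grows with the number of blocks times the area budget rather than with 2^n, but only because of the simplifying assumptions. Once communication or hardware sharing makes costs non-additive, the problem no longer has this simple knapsack structure.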

  34. LYCOS Design Flow
      [Figure: design flow. Inputs: Specification, Functional Require; steps: Translate, Analysis, CDFG; SW, HW and Comm. estimation models feeding Partitioning; partitioned CDFG passed to SW, Comm. and HW synthesis; outputs: Assembler, SW/HW Netlist]

  35. Building Systems
      [Figure: platform with processors (P), a DSP, a co-processor (CoP) and memories (M)]
      • Platform architectures are heterogeneous
         • Different processing element types
         • Different interconnection networks and communication protocols
         • Different memory types
         • Different scheduling and synchronization strategies

  36. Managing HW platform complexity
      • Development of APIs to hide complexity from the application programmer and improve portability
      • Specialized RTOS to control resource sharing and interfaces
      → Complex multi-level HW/SW architecture

  37. Software architecture
      [Figure: layered HW/SW platform. Software: application, RTOS, RTOS-APIs, drivers, marked private or shared. Hardware: CPU, timers, periphery, I/O, Int, Bus-CTRL, cache, bus, mem; labels pe1 and ce1]

  38. Platform design challenges
      • Integration
         • Design process integration
         • Heterogeneous component and language integration
      • Design space exploration and optimization
      • Verification

  39. Complex run-time interdependencies
      [Figure: two PEs and a CoP]
      • Run-time dependencies of independent components via communication
         • Influence on timing and power
      • Need to handle resource sharing
         • Process/task scheduling
         • Communication scheduling
         • Scheduling strategies (static, dynamic, time or priority driven)

  40. Interdependency example
      [Figure: PE]
      • Complex non-functional interdependencies
      • Periodic task executing on PE
      • Task writes to bus at the end of each periodic execution
      • Short execution time → high bus load
      • Long execution time → low bus load
      • Local decision on improving performance may impact the global system performance

  41. System-on-Chip challenge
      [Figure: chip with io, router, processor and memory blocks]

  42. Network-on-Chip
      [Figure: network with nodes a, b, c, d and memories (M)]
      • Multi-hop
         • Segmented communication
      • Concurrency
         • Multiple simultaneous communications

  43. Network-on-Chip
      [Figure: network with nodes a, b, c, d and memories (M)]
      • Multi-hop
         • Segmented communication
      • Concurrency
         • Multiple simultaneous communications
      • Sharing
         • Quasi-simultaneous resource usage
         • Multiple communication events occupying some or all resources in an interleaved fashion

  44. Platform-based design
      [Figure: platform design, specification, IP, mapping, platform re-design, re-configure]
      • New design paradigm ...

  45. Thank you!
