Embedded Systems in Silicon TD5102 Introduction and overview - PowerPoint PPT Presentation

Presentation Transcript

  1. Embedded Systems in Silicon, TD5102: Introduction and overview. Henk Corporaal, http://www.ics.ele.tue.nl/~heco/courses/EmbSystems, Technical University Eindhoven, DTI / NUS Singapore, 2005/2006

  2. Contents • Trends • Platforms • Application mapping • Design flow • Summary H.C. TD5102

  3. Observation 1: The 3 Cs • Convergence of 3 Cs: computers, communications and consumer electronics • The computer enters the 3rd phase: computing power - networking - intelligent processing • The world is one network: wherever, whenever, all information and communication available • We get a smart environment

  4. Observation 2: Current design practice • Y-Chart (Gajski-Kuhn) [figure: axes Behaviour, Structure, Physical; levels System, Algorithm, R/T, Logic, Circuit] • The design flow is a path in the Y-chart • Down to RT level the flow is largely manual

  5. Observation 3: Informal system specification [figure: system people define tasks and hand a paper spec to hardware people (VHDL, Verilog) and software people (C, ASM); integration follows]

  6. Observation 4: Design productivity [figure: complexity (log scale, 10^1 to 10^3) versus year (4 to 16); process technology +58%, HW design productivity +21%, SW productivity +8%; the differences are labelled HW gap and SW gap] • Yes, we can fabricate the ICs, but … • Can we design them? • Can we program them?
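
The widening gaps on this slide can be illustrated with a little arithmetic. The sketch below assumes the three percentages are annual growth rates that compound (the transcript does not state the period), that all curves start from the same value, and that the SW gap is measured against HW design productivity; the names `growth_after` and `RATES` are ours.

```python
def growth_after(years: int, annual_rate: float) -> float:
    """Compound growth factor after `years` at `annual_rate` (0.58 means +58%)."""
    return (1.0 + annual_rate) ** years

# Growth rates read from the slide; treating them as per-year is an assumption.
RATES = {
    "process technology": 0.58,
    "HW design productivity": 0.21,
    "SW productivity": 0.08,
}

for years in (4, 8, 12, 16):
    factors = {name: growth_after(years, rate) for name, rate in RATES.items()}
    hw_gap = factors["process technology"] / factors["HW design productivity"]
    sw_gap = factors["HW design productivity"] / factors["SW productivity"]
    print(f"year {years:2d}: HW gap x{hw_gap:.1f}, SW gap x{sw_gap:.1f}")
```

Under these assumptions both gaps grow exponentially, which is the point of the slide: fabrication capability outruns design and programming productivity.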

  7. Observation 5: More dynamic applications • Video [figure: load (0% to 100%) versus frame (0 to 300, IPPP ...) for sequence: weather, VO1, binary shape, 10 Hz, 112 kbit/s, QCIF; variation of a factor 2] P. Kuhn, G. Diebel, “Complexity Analysis of the MPEG-4 VM 8.0,” ISO/IEC JTC1/SC29/WG11/MPEG97/m2862, Fribourg, October 1997 • 3D [figure: relative CPU load for 15 fps (0% to 1200%); variation of an order of magnitude]

  8. Observation 6: Memory problem • Processor-memory performance gap (grows 50% / year) [figure: performance (log scale, 1 to 1000) versus time (1980 to 2000); CPU (µProc): 55%/year, “Moore’s Law”; DRAM: 7%/year; source: Patterson]
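
As a quick check on the slide's numbers, the relative gap between two exponentially growing curves grows by the ratio of their yearly factors. The sketch below uses only the two rates given on the slide; the variable names are ours.

```python
CPU_RATE = 0.55   # µProc performance growth per year (from the slide)
DRAM_RATE = 0.07  # DRAM performance growth per year (from the slide)

# The gap is the ratio CPU/DRAM, so its yearly growth is the ratio of the factors.
gap_growth = (1 + CPU_RATE) / (1 + DRAM_RATE) - 1
print(f"gap grows about {gap_growth:.0%} per year")

# Gap accumulated over the 20 years shown on the time axis (1980 to 2000).
gap_after_20_years = (1 + gap_growth) ** 20
print(f"after 20 years the gap is roughly x{gap_after_20_years:.0f}")
```

The computed rate is a little under the 50% per year quoted in the slide title, which appears to be a rounded figure.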

  9. What do we learn from these observations? We need: • Short time-to-market (reuse, short design time) • Flexible solution (programmability, reconfigurability) • Scalability • Low power • Low cost • QoS control. At sufficient performance!

  10. Solution? • Platforms: HW and SW IP reuse; standardization (interfaces); QoS (quality of service) hooks • Advanced design flow for platforms: raise abstraction level; tool support; modeling of power, cost, performance; predictable design

  11. Lecture 1: Introduction • Trends • Platforms • Application mapping • Design flow • Summary

  12. What is a platform? A platform is a generic, but domain-specific, information processing (sub-)system. In the future it will be available as a single chip (SoC) or package (SiP).

  13. What is a platform? • HW properties: one or more programmable processors; advanced memory organization; programmable communication network; I/O (highly domain dependent) • Possible extra HW features: reconfigurable logic; domain-specific accelerators

  14. What is a platform? • SW components: standardized RTOS; proper tooling for platform system design (compilers, models, exploration, debugging, simulation, …) • Possible extra SW features: middleware layer on top of the OS for features like QoS, domain-specific protocols, domain-specific SW interfaces, control of reconfigurable logic, library components, distributed / active network processing, billing, security

  15. Example platform: Philips Nexperia [figure: Philips Nexperia] • Available in the billion transistor era • E.g. TI OMAP, Sony Cell, Philips Nexperia, TRIPS, Xilinx Virtex-4 Pro, …

  16. Future platforms. Example: Smart Networked Devices [figure: layered stack with the labels active packets, Virtual Machine, Protocols, Multimedia (MPEG 21), Network OS, library, accelerator hardware, reconfig. hardware, programmable hardware, radio]

  17. Future platform: architecture concept [figure: groups of CPUs, accelerators and reconfigurable HW blocks with memory and I/O, connected by communication networks at level 0, level 1, ..., level N]

  18. Future platforms: NoC realization [figure: IP cores attached through network interfaces to an on-chip network] • IP isles: 32 RISC microprocessor ~ 20 Kgates; MPEG decoding ~ 100 Kgates; wavelet filtering ~ 40 Kgates; SRAM; DRAM; FPGA block

  19. Lecture 1: Introduction • Trends • Platforms • Application mapping • Design flow • Summary

  20. Platform and platform design [figure labels: Applications; SDT: system design technology; Design technology; Platform; PDT: platform design technology; Enabling technologies]

  21. What is the system designer's problem? [figure: Idea, Specification, Implementation] Find, for an application (idea/specification), an efficient mapping/implementation on a given realization space, under given constraints (cost, P, E, T, E*D, throughput, #pins, ...)

  22. A (single) processor: what does it look like inside? [figure: processor datapath with register file (labels r0, r1, r2), function unit(s), load-store unit and data memory; processor control with instruction memory, instruction register and decode logic]

  23. Mapping: placing operations in space and time. d = a * b; e = a + d; f = 2 * b + d; r = f - e; x = z + y; [figure: Data Dependence Graph (DDG) of these statements, with inputs a, b, 2, z, y; multiply, add and subtract nodes; results d, e, f, r, x]
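
The data dependence graph of the five statements can be written down directly. The sketch below is one possible encoding, not the lecture's own: the temporary `t` for `2 * b` and the helper names `dependence_edges` and `asap_levels` are our additions. It lists the producer-to-consumer edges and the earliest level at which each operation could run if resources were unlimited.

```python
# Each operation: result name -> operand names.
# Inputs (a, b, z, y) and the constant 2 are not operations, so they are not nodes.
STATEMENTS = {
    "d": ("a", "b"),   # d = a * b
    "e": ("a", "d"),   # e = a + d
    "t": ("2", "b"),   # t = 2 * b   (temporary made explicit; the name is ours)
    "f": ("t", "d"),   # f = t + d
    "r": ("f", "e"),   # r = f - e
    "x": ("z", "y"),   # x = z + y
}

def dependence_edges(stmts):
    """Return (producer, consumer) pairs: the edges of the DDG."""
    return [(src, dst) for dst, operands in stmts.items()
            for src in operands if src in stmts]

def asap_levels(stmts):
    """Earliest level of each operation, assuming unlimited units and unit latency."""
    level = {}

    def visit(name):
        if name not in level:
            preds = [p for p in stmts[name] if p in stmts]
            level[name] = 1 + max((visit(p) for p in preds), default=0)
        return level[name]

    for name in stmts:
        visit(name)
    return level

edges = dependence_edges(STATEMENTS)
levels = asap_levels(STATEMENTS)
print(edges)
print(levels)
```

The longest chain (for example d, e, r) has three operations, so no mapping of this code as written can finish in fewer than three steps, however many function units are available.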

  24. How to map these operations? • Architecture 1: one function unit; all operations have single-cycle latency [figure: DDG next to the schedule: cycle 1: *, cycle 2: *, cycle 3: +, cycle 4: +, cycle 5: -, cycle 6: +]

  25. How to map these operations? • Architecture 2: one add-sub unit and one mul unit; all operations have single-cycle latency [figure: DDG next to the schedule (Mul | Add-sub): cycle 1: * | +, cycle 2: * | +, cycle 3: (idle) | +, cycle 4: (idle) | -; cycles 5 and 6 unused]

  26. How to map these operations? • Architecture 3: one add-sub unit and one mul unit; add/sub 1 cycle, mul 2 cycles [figure: DDG next to the schedule (Mul | Add-sub): cycle 1: * | +, cycle 3: * | +, cycle 5: (idle) | +, cycle 6: (idle) | -]
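
The three schedules on these slides can be reproduced with a small greedy list scheduler. This is a sketch under our own assumptions, not necessarily the method used in the lecture: operations are considered in program order, a unit is busy for the full latency of an operation (a non-pipelined multiplier, which is what the 6-cycle schedule for architecture 3 suggests), and the temporary `t` for `2 * b` is our name. The function has no safeguard against operation kinds that no unit can execute.

```python
# Operations of the slide's code: name -> (kind, predecessors).
# Subtraction runs on the add-sub unit, so it is given kind "add".
OPS = {
    "d": ("mul", []),           # d = a * b
    "t": ("mul", []),           # t = 2 * b  (temporary, name is ours)
    "e": ("add", ["d"]),        # e = a + d
    "f": ("add", ["t", "d"]),   # f = t + d
    "r": ("add", ["f", "e"]),   # r = f - e
    "x": ("add", []),           # x = z + y
}

def list_schedule(ops, units, latency):
    """Greedy list scheduling in program order.

    units:   unit name -> set of operation kinds it can execute
    latency: operation kind -> number of cycles the unit is occupied
    Returns (start cycle per operation, schedule length in cycles).
    """
    start, finish = {}, {}
    free_at = {unit: 1 for unit in units}   # first cycle in which each unit is free
    cycle = 1
    while len(start) < len(ops):
        for name, (kind, preds) in ops.items():
            if name in start:
                continue
            # An operand is usable only in the cycle after its producer finishes.
            if any(p not in finish or finish[p] >= cycle for p in preds):
                continue
            for unit, kinds in units.items():
                if kind in kinds and free_at[unit] <= cycle:
                    start[name] = cycle
                    finish[name] = cycle + latency[kind] - 1
                    free_at[unit] = finish[name] + 1
                    break
        cycle += 1
    return start, max(finish.values())

ONE_FU = {"fu": {"mul", "add"}}
MUL_ADDSUB = {"mul": {"mul"}, "addsub": {"add"}}

arch1 = list_schedule(OPS, ONE_FU, {"mul": 1, "add": 1})
arch2 = list_schedule(OPS, MUL_ADDSUB, {"mul": 1, "add": 1})
arch3 = list_schedule(OPS, MUL_ADDSUB, {"mul": 2, "add": 1})
for label, (starts, length) in (("arch 1", arch1), ("arch 2", arch2), ("arch 3", arch3)):
    print(label, "length", length, "start cycles", starts)
```

With these assumptions the schedule lengths are 6, 4 and 6 cycles, matching the three slides. A pipelined multiplier or a different priority order would give other results for architecture 3.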

  27. There are many mapping solutions [figure: solutions plotted as points x, with cost on the horizontal axis and execution time T on the vertical axis; each point is a specific architecture and code schedule; the Pareto curve bounds the solution space] Let $S$ be the solution space containing solutions $x = (x_i)$; then $x$ is a Pareto point $\iff x \in S$ and $\forall y \in S,\ y \neq x,\ \exists i : x_i < y_i$
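
Filtering a set of design points down to its Pareto points is a few lines of code. The sketch below uses the common dominance formulation (a point is dropped if another point is no worse in every dimension and strictly better in at least one), which may differ in edge cases from the condition on the slide. Lower values are assumed to be better, and the (cost, execution time) pairs are made up for illustration.

```python
def dominates(y, x):
    """True if y is no worse than x everywhere and strictly better somewhere."""
    return (all(yi <= xi for yi, xi in zip(y, x))
            and any(yi < xi for yi, xi in zip(y, x)))

def pareto_points(solutions):
    """Keep the solutions that no other solution dominates."""
    return [x for x in solutions if not any(dominates(y, x) for y in solutions)]

# Hypothetical (cost, execution time) pairs, not taken from the lecture.
designs = [(1, 6), (2, 4), (2, 6), (3, 4), (4, 3), (5, 5)]
front = pareto_points(designs)
print(front)
```

Here (2, 6), (3, 4) and (5, 5) are dropped because a cheaper or faster design is at least as good in the other dimension; the remaining three points form the Pareto curve.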

  28. Can we do better? Yes!! Much better!! • transforming the specification • a different architecture • a different mapping • speculative execution • … be creative …

  29. Transforming the specification (1). Example: tree height reduction, based on the associativity of the + operation: a + (b + c) = (a + b) + c [figure: a chain of + nodes rebalanced into a tree of + nodes]
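
The effect of tree height reduction shows up when counting dependent addition steps. The sketch below compares a left-to-right chain with a balanced pairwise tree; the function names are ours, inputs are assumed non-empty, and integer operands are used because reassociation is exact for integers but can change rounding for floating point.

```python
def chain_sum(values):
    """Add left to right: every addition depends on the previous one."""
    total, depth = values[0], 0
    for v in values[1:]:
        total += v
        depth += 1
    return total, depth

def tree_sum(values):
    """Add pairwise: the additions within one level are independent of each other."""
    level, depth = list(values), 0
    while len(level) > 1:
        level = [sum(level[i:i + 2]) for i in range(0, len(level), 2)]
        depth += 1
    return level[0], depth

data = [1, 2, 3, 4, 5, 6, 7, 8]
print(chain_sum(data))   # same total, 7 dependent steps
print(tree_sum(data))    # same total, 3 levels
```

Both versions perform seven additions, but the tree needs only three dependent levels, so it is shorter whenever enough adders are available to work in parallel.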

  30. Transforming the specification (2). Original: d = a * b; e = a + d; f = 2 * b + d; r = f - e; x = z + y; Transformed: r = f - e = 2*b + d - (a + d) = 2*b - a; x = z + y; [figure: DDG of the transformed code, computing r with a shift (b << 1) and a subtract, and x with an add]
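
The algebraic simplification can be checked by running both versions on random inputs. The sketch below assumes integer operands and that d, e and f are not needed after this code fragment (otherwise they could not be removed); the function names are ours.

```python
import random

def original(a, b, y, z):
    d = a * b
    e = a + d
    f = 2 * b + d
    r = f - e
    x = z + y
    return r, x

def transformed(a, b, y, z):
    r = (b << 1) - a   # 2*b - a, with the multiply by 2 done as a shift
    x = z + y
    return r, x

random.seed(0)
samples = [tuple(random.randint(-1000, 1000) for _ in range(4)) for _ in range(1000)]
all_equal = all(original(*s) == transformed(*s) for s in samples)
print(all_equal)
```

The transformed code needs three operations (shift, subtract, add) instead of six, and the two remaining results no longer depend on a multiplication.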

  31. Changing the architecture: adding more complex units, e.g. a 4-input adder [figure: a tree of three 2-input + nodes replaced by one 4-input adder]. Why is this faster?

  32. Changing the architecture: adding more complex units. In the extreme case, put everything into one unit! Spatial mapping: no control flow

  33. More complex control flow. Program part: -a- ; If cond Then -b- Else -c- ; -d- ; [figure: Control Flow Graph (CFG): -a- ends in the test cond?, which leads to -b- or -c-; both continue to -d-]

  34. Mapping the CFG example: 3 options: what's the best? Option 1: -a- br c | -b- jmp d | -c- | -d-. Option 2: -a- br b | -c- jmp d | -b- | -d-. Option 3: -a- br c | -b- | -d- | -c- jmp d

  35. Why not remove the control flow?

  36. If-conversion shortens the schedule. Before: -a- br c | -b- jmp d | -c- | -d-. After: -a- | cond: -b- | !cond: -c- | -d-. Using guarded instructions like: r3: add r1,r2,r5; !r3: mul r4,r5,#3
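
If-conversion can be imitated in software by computing both sides and selecting the result with the guard. The sketch below is only an analogy: real guarded hardware nullifies the instruction whose guard is false, whereas here both results are computed and combined arithmetically. The block bodies (`v + 10` for -b-, `v * 3` for -c-) and the function names are made up for illustration.

```python
def with_branch(cond: bool, v: int) -> int:
    """Original control flow: only one of the two blocks executes."""
    if cond:
        result = v + 10    # block -b-
    else:
        result = v * 3     # block -c-
    return result

def if_converted(cond: bool, v: int) -> int:
    """Branch-free version: both blocks are issued, the guard picks the result."""
    guard = int(cond)      # 1 when cond holds, 0 otherwise
    b_result = v + 10      # block -b-, guarded by cond
    c_result = v * 3       # block -c-, guarded by !cond
    return guard * b_result + (1 - guard) * c_result

same = all(with_branch(c, v) == if_converted(c, v)
           for c in (False, True) for v in range(-5, 6))
print(same)
```

Because the two guarded blocks no longer depend on a branch, a scheduler is free to place them in the same cycles on different units, which is where the shorter schedule comes from.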

  37. Speculative execution makes it even shorter! Before: -a- br c | -b- jmp d | -c- | -d-. After: -a- | -b- and -c- in parallel | -d-. Why not execute -d- in parallel?

  38. However: real life is much more complex. E.g. MPEG-4 (multimedia). Huge requirements: > 10 GOP/s; > 6 GB/s; > 10 MB storage. Software specification: more than 200 000 lines of C; hundreds of files; written by approx. 80 teams

  39. Can we handle this? Today's implementations: small images; decoding only; not real-time; several W; single task; limited dynamism. Wanted features: large images (HDTV); encoding and decoding; real-time; 100 mW (mobile); multiple tasks; dealing with dynamism

  40. Lecture 1: Introduction • Trends • Platforms • Application mapping • Design flow • Summary

  41. Embedded system design: how to map your application graph A(L,A,D) to hardware graph (L,N,C) • L: design level (e.g. architecture, implementation or realization level) • A: application components (e.g. tasks, operations, data structures) • D: dependences between application components • N: hardware components (e.g. processors, ASICs, FPGAs, memories) • C: connections between hardware components

  42. Abstraction levels (system specification level, languages, inter-level transformation): Level 0: Requirements / idea (English) → is modeled by → Level 1: Architecture (ES/RT-UML, Esterel, SDL) → is implemented by → Level 2: Implementation (C++, Java, C, VHDL, SystemC) → compiles into → Level 3: Realization (machine code, hardware modules)

  43. Design space exploration [figure labels: design point at level n-1; design transformation LT(n-1,n); exploration at level n; exploration search area; cost; realization space; global optimum]

  44. Design space exploration framework: another Y-chart

  45. Design flow steps and constraints [figure: refinement steps (transformations) lead from the idea at a high abstraction level to the realization at a low abstraction level, subject to architecture / platform constraints]

  46. In which order should we perform the steps? Decision trees [figure: alternative orderings of step n and step n+1]

  47. Well-known phase ordering examples • Concurrency versus data management, e.g. loop partitioning versus array partitioning for a multiprocessor • Scheduling versus register allocation • Logic synthesis versus placement and routing

  48. Rule of thumb! • Perform the steps with the biggest impact first • Biggest impact depends on your interest (= cost function): min. E, P, E*D, D, area, Npins, ...

  49. Phase ordering example: why fix data storage/transfer before concurrency management issues? Recursive image processing algorithm on local neighborhoods: (for i : 0 .. I-1 ) :: (for j : 0 .. J-1 ) :: img[i][j] = f(img[i][j-k], old_img[i][j]); [figure: image with I rows and J columns]
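
The loop nest on this slide can be made runnable with a few assumptions that the slide leaves open: `k` is taken to be a positive column offset, pixels to the left of the image are replaced by a `border` value, and `f` is an arbitrary two-input function supplied by the caller. The name `recursive_filter` is ours.

```python
def recursive_filter(old_img, k, f, border=0):
    """img[i][j] = f(img[i][j-k], old_img[i][j]) over an I x J image."""
    I, J = len(old_img), len(old_img[0])
    img = [[border] * J for _ in range(I)]
    for i in range(I):                  # outer loop over rows
        for j in range(J):              # inner loop over columns
            # Recursive term: an already computed pixel of the same row.
            previous = img[i][j - k] if j - k >= 0 else border
            img[i][j] = f(previous, old_img[i][j])
    return img

def average(prev, cur):
    """Hypothetical choice for f: integer average of the two inputs."""
    return (prev + cur) // 2

result = recursive_filter([[4, 8], [2, 6]], k=1, f=average)
print(result)
```

The recursion runs along j, so each pixel must wait for an earlier pixel of its own row, while the data each row needs from `old_img` is a full image line. That line-sized working set is what makes the storage and transfer decisions interact with any choice to process several rows at once.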

  50. Why fix data storage/transfer before concurrency management issues? • Unrolling the outer loop (i) M times needs: M J-word FIFOs (image lines); M data paths [figure: image with I rows and J columns; chip layout, 14.4 mm² (0.7 µm)]