Introduction to Embedded Systems - PowerPoint PPT Presentation

Presentation Transcript

  1. Introduction to Embedded Systems. Rabie A. Ramadan, rabieramadan@gmail.com, http://www.rabieramadan.org/classes/2014/embedded/

  2. Topics: the embedded microprocessor market; categories of CPUs; RISC, DSP, and multimedia processors; CPU mechanisms.

  3. Demand for Embedded Processors: embedded processors account for over 97% of total processors sold, and sales are expected to increase by roughly 15% each year.

  4. Evaluating Processors: performance. Latency is the time required to execute an instruction from start to finish; throughput is the rate at which instructions are finished.
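The latency/throughput distinction can be made concrete with a small model. This is an illustrative sketch with hypothetical numbers, not a description of any particular CPU: the first instruction takes its full latency, and afterwards instructions finish at some steady rate.

```python
def throughput(instructions_completed, cycles):
    """Throughput: the rate at which instructions finish (instructions per cycle)."""
    return instructions_completed / cycles

def total_cycles(n_instructions, latency, ipc=1.0):
    """Hypothetical model: the first instruction takes `latency` cycles
    end to end; after that, instructions complete at `ipc` per cycle."""
    if n_instructions == 0:
        return 0.0
    return latency + (n_instructions - 1) / ipc
```

For example, with a 5-cycle latency and one instruction completing per cycle, 101 instructions take 105 cycles, so high throughput can coexist with nontrivial per-instruction latency.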

  5. Evaluating Processors: at the program level, computer architects also speak of average performance or peak performance. Peak performance is often calculated assuming that instruction throughput proceeds at its maximum rate and all processor resources are fully utilized.

  6. Evaluating Processors: embedded system designers often talk about program performance in terms of worst-case (or sometimes best-case) performance. This is not simply a characteristic of the processor; it is determined for a particular program running on a given processor.

  7. Evaluating Processors: cost, i.e. the purchase price of the processor. In VLSI design, cost is often measured in terms of the silicon area required to implement a processor, which is closely related to chip cost.

  8. Evaluating Processors: energy and power. In modern processors, energy and power consumption must be measured for a particular program and data set to obtain accurate results.

  9. Evaluating Processors: predictability, an important characteristic for embedded systems. When designing real-time systems, we want to be able to predict execution time; predictability is more difficult to measure than raw performance.

  10. Evaluating Processors: security, an important characteristic of all processors, including embedded processors. Security is inherently hard to measure: the fact that we do not know of a successful attack on a system does not mean that such an attack cannot exist.

  11. Basic Computer Architecture: the von Neumann architecture. [Diagram: memory holding instructions and data, an input unit, an output unit, and a processor containing the control unit (CU), ALU, and registers.]

  12. Levels of Parallelism: bit-level parallelism (within arithmetic logic circuits); instruction-level parallelism (multiple instructions execute per clock cycle); memory-system parallelism (overlap of memory operations with computation); operating-system parallelism (more than one processor, with multiple jobs running in parallel at the loop level and procedure level).

  13. Levels of Parallelism: bit-level parallelism occurs within arithmetic logic circuits.

  14. Levels of Parallelism: instruction-level parallelism (ILP), where multiple instructions execute per clock cycle, via pipelining (of instructions and data) and multiple issue, e.g. very long instruction word (VLIW).

  15. Levels of Parallelism: memory-system parallelism, the overlap of memory operations with computation.

  16. Levels of Parallelism: operating-system parallelism. There is more than one processor, and multiple jobs run in parallel, at the loop level and the procedure level.

  17. Flynn's Taxonomy: Single Instruction stream, Single Data stream (SISD); Single Instruction stream, Multiple Data stream (SIMD); Multiple Instruction stream, Single Data stream (MISD); Multiple Instruction stream, Multiple Data stream (MIMD).

  18. Single Instruction stream, Single Data stream (SISD): the von Neumann architecture. [Diagram: memory holding instructions and data, feeding a processor with CU and ALU.]

  19. Flynn's Taxonomy (recap): SISD, SIMD, MISD, MIMD.

  20. Single Instruction stream, Multiple Data stream (SIMD): instructions of the program are broadcast to more than one processor; each processor executes the same instruction synchronously, but using different data. Used for applications that operate upon arrays of data. [Diagram: one CU broadcasting an instruction from memory to several processing elements (PEs), each operating on its own data.]
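The SIMD idea of one broadcast instruction operating on many data elements in lockstep can be sketched as follows. This is a behavioral model for illustration, not real vector hardware; the function and parameter names are made up for the sketch.

```python
def simd_broadcast(op, lanes_a, lanes_b):
    """Model of SIMD execution: the control unit broadcasts one
    instruction (`op`), and every processing element applies it to
    its own pair of data elements in lockstep."""
    assert len(lanes_a) == len(lanes_b), "all PEs step together on equal-length data"
    return [op(a, b) for a, b in zip(lanes_a, lanes_b)]
```

For example, a single broadcast "add" applied across four lanes, `simd_broadcast(lambda a, b: a + b, [1, 2, 3, 4], [10, 20, 30, 40])`, yields `[11, 22, 33, 44]`, which is why SIMD suits array-oriented workloads.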

  21. Flynn's Taxonomy (recap): SISD, SIMD, MISD, MIMD.

  22. Multiple Instruction stream, Multiple Data stream (MIMD): each processor has a separate program; an instruction stream is generated for each program on each processor; each instruction operates upon different data.

  23. Multiple Instruction stream, Multiple Data stream (MIMD): two organizations, shared memory and distributed memory.

  24. Shared vs. Distributed Memory. Distributed memory: each processor has its own local memory, and message passing is used to exchange data between processors. Shared memory: a single address space, with all processes having access to the pool of shared memory. [Diagrams: shared memory as processors on a bus to one memory; distributed memory as processor/memory pairs connected by a network.]
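The message-passing style of the distributed-memory organization can be sketched with threads and queues standing in for nodes and their network interfaces. This is a minimal illustrative model (the names `node` and `message_passing_demo` are invented for the sketch), not a real distributed runtime such as MPI.

```python
from queue import Queue
from threading import Thread

def node(inbox, outbox):
    """A distributed-memory 'processor': it has no access to the other
    node's data and communicates only by receiving and sending messages."""
    value = inbox.get()          # receive a message from the network
    outbox.put(value * value)    # compute locally, send the result back

def message_passing_demo(x):
    # The two queues stand in for each node's network interface (NI).
    inbox, outbox = Queue(), Queue()
    t = Thread(target=node, args=(inbox, outbox))
    t.start()
    inbox.put(x)                 # send data to the remote node
    result = outbox.get()        # block until its reply arrives
    t.join()
    return result
```

In a shared-memory machine the same exchange would be a plain read and write of one address space; here all sharing is explicit in the messages.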

  25. Distributed Memory: processors cannot directly access another processor's memory. Each node has a network interface (NI) for communication and synchronization. [Diagram: processor/memory nodes, each with an NI, connected by a network.]

  26. Distributed Memory: each processor executes different instructions asynchronously, using different data. [Diagram: nodes, each with its own memory, CU, and PE, exchanging data over a network.]

  27. Shared Memory: each processor executes different instructions asynchronously, using different data. [Diagram: several CU/PE pairs reading and writing data in a single shared memory.]

  28. Shared Memory. Uniform memory access (UMA): each processor has uniform access to memory (symmetric multiprocessor, SMP). Non-uniform memory access (NUMA): the time for a memory access depends on the location of the data; local access is faster than non-local access; easier to scale than SMPs. [Diagrams: UMA as processors on one bus to a single memory; NUMA as multiple processor/memory groups joined by a network.]

  29. Distributed Shared Memory: making the main memory of a cluster of computers look as if it were a single memory with a single address space, so that shared-memory programming techniques can be used.

  30. Hybrid Multicore Systems: many general-purpose processors plus a GPU (Graphics Processing Unit) or GPGPU (general-purpose GPU) sharing memory. The trend: a board composed of multiple many-core chips sharing memory; a rack composed of multiple boards; a room full of these racks.

  31. Other Axes of Comparison: RISC vs. CISC (instruction-set style); instruction issue width; static vs. dynamic scheduling for multiple-issue machines; scalar vs. vector processing; single-threaded vs. multithreaded. A single CPU can fit into multiple categories.

  32. RISC vs. CISC: Complex Instruction Set Computer (CISC). A "high-level" instruction set, in which one instruction executes several low-level operations, e.g. a load, an arithmetic operation, and a memory store. Examples: VAX, Intel x86, IBM 360/370.

  33. Features of CISC: a small number of general-purpose registers; instructions take multiple clocks to execute; few lines of code per operation.

  34. RISC vs. CISC: Reduced Instruction Set Computer (RISC). A CPU design that recognizes only a limited number of simple instructions, which are executed quickly. Examples: MIPS, DEC Alpha, Sun SPARC, IBM 801.

  35. Features of RISC: a "reduced" instruction set, executing a series of simple instructions instead of one complex instruction; instructions execute within one clock cycle; a large number of general registers for arithmetic operations, to avoid storing variables on a stack in memory; pipelining for speed.

  36. Single Issue versus Multiple Issue: instruction issue width is an important aspect of processor performance. Processors that can issue more than one instruction per cycle generally execute programs faster, but they do so at the cost of increased power consumption and higher cost.
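The effect of issue width can be sketched with a best-case model. This is a deliberately simplified sketch (it ignores dependences, stalls, and structural hazards), so it shows the upper bound on the benefit, not what a real machine achieves.

```python
import math

def issue_cycles(n_instructions, issue_width):
    """Best-case cycles to issue n instructions on a machine that can
    issue `issue_width` instructions per cycle, assuming no stalls or
    dependences. Illustrative model only."""
    return math.ceil(n_instructions / issue_width)
```

Going from single issue to two-issue at best halves the cycle count (100 instructions: 100 cycles vs. 50); real programs fall short of this bound, which is part of why the extra issue slots cost power and area.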

  37. Static versus Dynamic Scheduling: with static scheduling, the order in which instructions issue is fixed when the program is compiled; dynamic scheduling determines which instructions are issued at runtime. Superscalar execution is a common technique for dynamic instruction issue, e.g. Tomasulo's algorithm.

  38. Embedded vs. General-Purpose Processors: embedded processors may be customized for a category of applications; customization may be narrow or broad. We may judge embedded processors using different metrics: code size, energy efficiency, memory-system performance, and predictability.

  39. Embedded RISC Processors: RISC processors often have simple, highly pipelinable instructions. Pipelines of embedded RISC processors have grown over time: the ARM7 has a 3-stage pipeline, the ARM9 a 5-stage pipeline, and the ARM11 an 8-stage pipeline.
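The cost of deeper pipelines can be sketched with an idealized model: a pipeline of S stages takes S cycles to fill, then retires one instruction per cycle, and each control-flow disruption (e.g. a mispredicted branch) refills it. This is a textbook-style approximation, not the measured behavior of any ARM core.

```python
def pipeline_cycles(n_instructions, stages, refills=0):
    """Idealized in-order pipeline: `stages` cycles to fill, then one
    instruction retires per cycle. Each of the `refills` pipeline
    flushes costs roughly `stages - 1` extra cycles. Simplified model;
    real ARM penalties differ."""
    return stages + (n_instructions - 1) + refills * (stages - 1)
```

With no flushes, 100 instructions cost 102 cycles on a 3-stage pipeline versus 107 on an 8-stage one, so depth is nearly free; with 10 flushes the 8-stage pipeline pays 70 extra cycles versus 20, which is why deeper pipelines lean on branch prediction.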

  40. RISC Processor Families. ARM: the ARM7 has in-order execution and no memory management or branch prediction; the ARM9 and ARM11 add features such as memory management, branch prediction, and (in the ARM11) out-of-order completion. MIPS: the MIPS32 4K has a 5-stage pipeline; the 4KE family has a DSP extension; the 4KS is designed for security. PowerPC: the PowerPC 400 series includes several embedded processors; Motorola and IBM offer superscalar versions of the PowerPC.

  41. Embedded DSP Processors: optimized to perform DSP algorithms such as speech coding, filtering, convolution, fast Fourier transforms, and discrete cosine transforms.

  42. Embedded DSP Processors, example: the AT&T DSP-16 was the first DSP. It had an onboard multiplier and provided a multiply-accumulate instruction, dest = src1 * src2 + src3, a common operation in digital signal processing. It was based on a Harvard architecture with separate data and instruction memories, so data accesses could rely on consistent bandwidth from the memory, which is particularly important for sampled-data systems.
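The multiply-accumulate operation above is the inner loop of most DSP kernels. A minimal sketch, with hypothetical function names, shows the dest = src1*src2 + src3 primitive and how a filter tap-sum is just a chain of such MACs:

```python
def mac(src3, src1, src2):
    """The DSP multiply-accumulate primitive: dest = src1 * src2 + src3."""
    return src1 * src2 + src3

def fir_tap_sum(samples, coeffs):
    """One output point of an FIR filter (a convolution sum), built
    entirely from repeated MACs, one per tap."""
    acc = 0
    for s, c in zip(samples, coeffs):
        acc = mac(acc, s, c)
    return acc
```

A hardware MAC unit performs each `mac` step in a single instruction, which is why filtering, convolution, and transforms map so well onto DSPs.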

  43. Parallelism Extraction. Static: use the compiler to analyze the program; simpler CPU; can't depend on data values; e.g. very long instruction word (VLIW). Dynamic: use hardware to identify opportunities; more complex CPU; can make use of data values; e.g. superscalar.

  44. Very Long Instruction Word (VLIW): in widespread use in embedded systems; provides instruction-level parallelism with relatively low hardware overhead. The execution unit includes a pool of function units connected to a large register file. The execution unit reads a packet of instructions; each instruction in the packet can control one of the function units in the machine.

  45. Simple VLIW architecture: a large register file feeds multiple function units. [Diagram: the E-box dispatches an instruction packet (Add r1,r2,r3; Sub r4,r5,r6; Ld r7,foo; St r8,baz; NOP) from the register file to two ALUs, two load/store units, and a further function unit.]

  46. Clustered VLIW architecture: the register file and function units are divided into clusters. [Diagram: two execution/register-file clusters connected by a cluster bus.]

  47. Very Long Instruction Word (VLIW), example 1: the Trimedia family of processors, designed for use in video systems. Video algorithms often perform similar operations on several pixels at a time.

  48. Very Long Instruction Word (VLIW), example 2: the Texas Instruments C6x VLIW DSP.

  49. Texas Instruments C6x VLIW DSP (continued): onboard program and data RAM, as well as standard devices and DMA. The processor core includes two clusters, each with the same configuration. Each register file holds 16 words. Each data path has eight function units: two load units, two store units, two data-address units, and two register-file cross paths.

  50. Superscalar Processors: issue more than one instruction per clock cycle. Unlike VLIW processors, they check for resource conflicts on the fly to determine which combinations of instructions can be issued at each step. Superscalar processors are not as common in the embedded world, but are used to some extent: the embedded Pentium is two-issue, in-order; some PowerPCs are superscalar.