Introduction to Embedded Systems - PowerPoint PPT Presentation


    Presentation Transcript
    1. Introduction to Embedded Systems. Rabie A. Ramadan, rabieramadan@gmail.com, http://www.rabieramadan.org/classes/2014/embedded/

    2. Topics: the embedded microprocessor market; categories of CPUs; RISC, DSP, and multimedia processors; CPU mechanisms.

    3. Demand for Embedded Processors: embedded processors account for over 97% of total processors sold, and sales are expected to increase by roughly 15% each year.

    4. Evaluating Processors: Performance. Latency: the time required to execute an instruction from start to finish. Throughput: the rate at which instructions are finished.
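
    To make the two metrics concrete, here is a small back-of-the-envelope calculation in C (the clock rate, instruction count, and cycle count below are made-up numbers for illustration, not figures from the slides): throughput follows from instructions divided by run time, while average cycles per instruction (CPI) is a per-instruction latency measure.

        /* Illustrative calculation only: the numbers are invented for the
           example, not taken from the slides. */
        #include <stdio.h>

        int main(void) {
            double clock_hz     = 200e6;  /* assumed 200 MHz embedded CPU      */
            double instructions = 50e6;   /* assumed dynamic instruction count */
            double cycles       = 75e6;   /* assumed total cycles for the run  */

            double run_time   = cycles / clock_hz;              /* seconds     */
            double throughput = instructions / run_time / 1e6;  /* MIPS        */
            double avg_cpi    = cycles / instructions;          /* cycles per  */
                                                                /* instruction */
            printf("run time    = %.3f s\n", run_time);
            printf("throughput  = %.1f MIPS\n", throughput);
            printf("average CPI = %.2f\n", avg_cpi);
            return 0;
        }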

    5. Evaluating Processors: at the program level, computer architects also speak of average performance or peak performance. Peak performance is often calculated assuming that instruction throughput proceeds at its maximum rate and all processor resources are fully utilized.

    6. Evaluating Processors: embedded system designers often talk about program performance in terms of worst-case (or sometimes best-case) performance. This is not simply a characteristic of the processor; it is determined for a particular program running on a given processor.

    7. Evaluating Processors: Cost. The purchase price of the processor. In VLSI design, cost is often measured in terms of the silicon area required to implement a processor, which is closely related to chip cost.

    8. Evaluating Processors: Energy and power. In modern processors, energy and power consumption must be measured for a particular program and its data for accurate results.

    9. Evaluating Processors: Predictability. An important characteristic for embedded systems: when designing real-time systems, we want to be able to predict execution time. It is more difficult to measure than the other metrics.

    10. Evaluating Processors: Security. An important characteristic of all processors, including embedded processors. Security is inherently unmeasurable: the fact that we do not know of a successful attack on a system does not mean that such an attack cannot exist.

    11. Basic Computer Architecture: the von Neumann architecture. [Diagram: a single memory holds both instructions and data; input and output units connect to a processor containing the ALU, control unit (CU), and registers.]

    12. Levels of Parallelism: bit-level parallelism (within arithmetic logic circuits); instruction-level parallelism (multiple instructions execute per clock cycle); memory system parallelism (overlap of memory operations with computation); operating system parallelism (more than one processor, multiple jobs running in parallel, at loop level and procedure level).

    13. Levels of Parallelism: bit-level parallelism occurs within arithmetic logic circuits.

    14. Levels of Parallelism: instruction-level parallelism (ILP). Multiple instructions execute per clock cycle, via pipelining (of instructions and data) and multiple issue, e.g. very long instruction word (VLIW).

    15. Levels of Parallelism: memory system parallelism overlaps memory operations with computation.

    16. Levels of Parallelism: operating system parallelism. There is more than one processor, and multiple jobs run in parallel, at loop level and procedure level.
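
    Loop-level parallelism is the easiest level to show concretely. The sketch below is my own illustration (not from the slides) and assumes an OpenMP-capable compiler (e.g. gcc with -fopenmp); the pragma lets the runtime split the loop's independent iterations across the available processors.

        /* Loop-level parallelism sketch using OpenMP (illustrative; assumes
           an OpenMP-capable compiler such as gcc with -fopenmp; without it,
           the pragma is ignored and the loop runs sequentially). */
        #include <stdio.h>
        #define N 1000000

        static double a[N], b[N], c[N];

        int main(void) {
            /* Iterations are independent, so the runtime may hand different
               chunks of the loop to different processors. */
            #pragma omp parallel for
            for (int i = 0; i < N; i++) {
                c[i] = a[i] + b[i];
            }
            printf("c[0] = %f\n", c[0]);
            return 0;
        }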

    17. Flynn's Taxonomy: Single Instruction stream - Single Data stream (SISD); Single Instruction stream - Multiple Data stream (SIMD); Multiple Instruction stream - Single Data stream (MISD); Multiple Instruction stream - Multiple Data stream (MIMD).

    18. Single Instruction stream - Single Data stream (SISD): the von Neumann architecture. [Diagram: one memory supplies a single instruction stream and a single data stream to one processor containing the ALU and control unit (CU).]

    19. Flynn's Taxonomy (continued): Single Instruction stream - Single Data stream (SISD); Single Instruction stream - Multiple Data stream (SIMD); Multiple Instruction stream - Single Data stream (MISD); Multiple Instruction stream - Multiple Data stream (MIMD).

    20. Single Instruction stream - Multiple Data stream (SIMD): instructions of the program are broadcast to more than one processor; each processor executes the same instruction synchronously, but on different data. Used for applications that operate upon arrays of data. [Diagram: one control unit (CU) broadcasts the instruction stream from memory to several processing elements (PEs), each working on its own data.]
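
    The classic SIMD-friendly workload is an element-wise array operation. In the sketch below (my example, not from the slides), the same brighten-and-saturate step is applied to every pixel, which is exactly the pattern a SIMD machine, or a vectorizing compiler, executes in lockstep on many data elements at once.

        /* SIMD-style kernel (illustrative, not from the slides): one
           operation applied uniformly across an array of pixels. */
        #include <stdint.h>
        #include <stdio.h>
        #define NPIX 1024

        int main(void) {
            uint8_t pixel[NPIX], out[NPIX];
            for (int i = 0; i < NPIX; i++)
                pixel[i] = (uint8_t)(i & 0xFF);          /* test data */

            for (int i = 0; i < NPIX; i++) {
                int v = pixel[i] + 50;                   /* brighten  */
                out[i] = (v > 255) ? 255 : (uint8_t)v;   /* saturate  */
            }
            printf("out[10] = %d\n", out[10]);
            return 0;
        }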

    21. Flynn's Taxonomy (continued): Single Instruction stream - Single Data stream (SISD); Single Instruction stream - Multiple Data stream (SIMD); Multiple Instruction stream - Single Data stream (MISD); Multiple Instruction stream - Multiple Data stream (MIMD).

    22. Multiple Instruction stream - Multiple Data stream (MIMD): each processor has a separate program; an instruction stream is generated for each program on each processor; each instruction operates upon different data.

    23. Multiple Instruction stream - Multiple Data stream (MIMD): two organizations, shared memory and distributed memory.

    24. Shared vs. Distributed Memory. Distributed memory: each processor has its own local memory, and message passing is used to exchange data between processors. Shared memory: a single address space, and all processes have access to the pool of shared memory. [Diagram: shared memory - processors (P) on a bus attached to one memory; distributed memory - processor/memory (P/M) nodes connected by a network.]

    25. Distributed Memory: processors cannot directly access another processor's memory; each node has a network interface (NI) for communication and synchronization. [Diagram: each node pairs a memory (M) and processor (P) with a network interface (NI), and the NIs connect to the network.]
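
    To make message passing concrete, here is a minimal sketch (my illustration, not from the slides) using MPI, a widely used message-passing library for distributed-memory machines. It assumes an MPI installation, e.g. built with mpicc and run with mpirun -np 2; node 0 cannot write into node 1's memory, so it sends the value over the network instead.

        /* Minimal message-passing sketch using MPI (illustrative; assumes an
           MPI installation, e.g. build with mpicc, run with mpirun -np 2). */
        #include <mpi.h>
        #include <stdio.h>

        int main(int argc, char **argv) {
            int rank, value;
            MPI_Init(&argc, &argv);
            MPI_Comm_rank(MPI_COMM_WORLD, &rank);

            if (rank == 0) {
                value = 42;
                /* Node 0 cannot store into node 1's memory directly,
                   so the data travels as a message over the network. */
                MPI_Send(&value, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
            } else if (rank == 1) {
                MPI_Recv(&value, 1, MPI_INT, 0, 0, MPI_COMM_WORLD,
                         MPI_STATUS_IGNORE);
                printf("rank 1 received %d\n", value);
            }

            MPI_Finalize();
            return 0;
        }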

    26. Distributed Memory: each processor executes different instructions asynchronously, using different data. [Diagram: several nodes, each with its own memory (M) feeding instructions and data to a control unit (CU) and processing element (PE), connected by a network.]

    27. Shared Memory: each processor executes different instructions asynchronously, using different data. [Diagram: several control unit/processing element (CU/PE) pairs fetch instructions and data from one shared memory.]
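
    A shared-memory MIMD program can be sketched with POSIX threads (my example, not from the slides; it assumes a POSIX system and compiling with -pthread). Each thread runs its own code asynchronously, but both read and write the same address space.

        /* Shared-memory sketch using POSIX threads (illustrative; assumes a
           POSIX system, compile with -pthread). Both threads share one
           address space, so they communicate through the 'shared' array. */
        #include <pthread.h>
        #include <stdio.h>

        static int shared[2];            /* visible to every thread */

        static void *worker(void *arg) {
            int id = *(int *)arg;
            shared[id] = id * 10;        /* each thread writes its own slot */
            return NULL;
        }

        int main(void) {
            pthread_t t[2];
            int id[2] = {0, 1};
            for (int i = 0; i < 2; i++)
                pthread_create(&t[i], NULL, worker, &id[i]);
            for (int i = 0; i < 2; i++)
                pthread_join(t[i], NULL);
            printf("shared = {%d, %d}\n", shared[0], shared[1]);
            return 0;
        }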

    28. Shared Memory. Uniform memory access (UMA): each processor has uniform access to memory (symmetric multiprocessor, SMP). Non-uniform memory access (NUMA): the time for a memory access depends on the location of the data; local access is faster than non-local access; easier to scale than SMPs. [Diagram: UMA - processors share one memory over a bus; NUMA - several bus-based processor/memory groups joined by a network.]

    29. Distributed Shared Memory: makes the main memory of a cluster of computers look as if it were a single memory with a single address space, so shared-memory programming techniques can be used.

    30. Hybrid Multicore Systems: many general-purpose processors plus a GPU (Graphics Processing Unit) or GPGPU (General-Purpose GPU) sharing memory. The trend is: a board composed of multiple many-core chips sharing memory; a rack composed of multiple boards; a room full of these racks.

    31. Other axes of comparison: RISC vs. CISC (instruction set style); instruction issue width; static vs. dynamic scheduling for multiple-issue machines; scalar vs. vector processing; single-threaded vs. multithreaded. A single CPU can fit into multiple categories.

    32. RISC vs. CISC: Complex Instruction Set Computer (CISC). A "high-level" instruction set in which one instruction executes several "low-level" operations, e.g. a load, an arithmetic operation, and a memory store. Examples: VAX, Intel x86, IBM 360/370, etc.

    33. Features of CISC: a small number of general-purpose registers; instructions take multiple clock cycles to execute; few lines of code per operation.

    34. RISC vs. CISC: Reduced Instruction Set Computer (RISC). RISC is a CPU design that recognizes only a limited number of simple instructions, which are executed quickly. Examples: MIPS, DEC Alpha, Sun SPARC, IBM 801.

    35. Features of RISC: a "reduced" instruction set; executes a series of simple instructions instead of one complex instruction; instructions are executed within one clock cycle; incorporates a large number of general registers for arithmetic operations to avoid storing variables on a stack in memory; pipelining brings speed.
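
    The contrast shows up in how a single line of source code is translated. The comments in the sketch below use generic pseudo-assembly of my own (not taken from the slides or from any particular instruction set) to contrast a single memory-to-memory CISC instruction with the RISC load/operate/store sequence.

        /* Illustrative only: the pseudo-assembly in the comments is generic,
           not from any specific CISC or RISC instruction set. */
        int a, b, c;

        void add_example(void) {
            a = b + c;
            /* Typical CISC translation (one instruction, operands in memory):
                   ADD a, b, c        ; memory-to-memory add, multiple cycles
               Typical RISC translation (register-to-register, one simple
               instruction per step, easily pipelined):
                   LOAD  r1, b
                   LOAD  r2, c
                   ADD   r3, r1, r2
                   STORE a, r3
            */
        }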

    36. Single issue versus multiple issue: instruction issue width is an important aspect of processor performance. Processors that can issue more than one instruction per cycle generally execute programs faster, but they do so at the cost of increased power consumption and a higher price.

    37. Static versus dynamic scheduling: with static scheduling, the order in which instructions issue is determined when the program is written (at compile time); with dynamic scheduling, the processor determines which instructions are issued at runtime. Superscalar execution (e.g. Tomasulo's algorithm) is a common technique for dynamic instruction issue.
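
    A tiny example (mine, not from the slides) of what any scheduler has to reason about: independent operations can be issued together, while a dependent operation must wait. A static scheduler fixes this grouping at compile time; a dynamic scheduler's hardware discovers it at runtime.

        /* Illustrative data-dependence example (not from the slides). */
        int schedule_demo(int a, int b, int c, int d) {
            int x = a + b;   /* independent of y: a static (compile-time)   */
            int y = c + d;   /* scheduler can pair these in one issue slot  */
            int z = x * y;   /* depends on x and y, so it must issue later; */
                             /* a dynamic scheduler detects this at runtime */
            return z;
        }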

    38. Embedded vs. general-purpose processors: embedded processors may be customized for a category of applications, and the customization may be narrow or broad. We may judge embedded processors using different metrics: code size, energy efficiency, memory system performance, predictability.

    39. Embedded RISC processors: RISC processors often have simple, highly pipelinable instructions. Pipelines of embedded RISC processors have grown over time: the ARM7 has a 3-stage pipeline, the ARM9 a 5-stage pipeline, and the ARM11 an 8-stage pipeline.

    40. RISC processor families. ARM: the ARM7 has in-order execution, with no memory management or branch prediction; the ARM9 and ARM11 are more capable, the ARM11 offering out-of-order execution, memory management, and branch prediction. MIPS: the MIPS32 4K has a 5-stage pipeline; the 4KE family has a DSP extension; the 4KS is designed for security. PowerPC: the PowerPC 400 series includes several embedded processors, and Motorola and IBM offer superscalar versions of the PowerPC.

    41. Embedded DSP Processors: embedded DSP processors are optimized to perform DSP algorithms such as speech coding, filtering, convolution, fast Fourier transforms, and discrete cosine transforms.

    42. Embedded DSP Processors, an example: the AT&T DSP-16 was the first DSP. It had an onboard multiplier and provided a multiply-accumulate instruction, dest = src1*src2 + src3, a common operation in digital signal processing. It was based on a Harvard architecture with separate data and instruction memories, so data accesses could rely on consistent bandwidth from the memory, which is particularly important for sampled-data systems.
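
    The multiply-accumulate operation is the inner loop of most filtering code. The minimal FIR-filter sketch below is my own illustration (not from the slides; the coefficients are an arbitrary moving average) and shows why DSPs dedicate a single-cycle instruction to dest = src1*src2 + src3.

        /* FIR filter sketch (illustrative; arbitrary coefficients, not from
           the slides). Each loop iteration is one multiply-accumulate,
           acc = acc + h[i] * x[i], the operation a DSP's MAC instruction
           performs in a single cycle. */
        #include <stdio.h>
        #define NTAPS 4

        static double fir(const double *x, const double *h) {
            double acc = 0.0;
            for (int i = 0; i < NTAPS; i++)
                acc += h[i] * x[i];                      /* multiply-accumulate */
            return acc;
        }

        int main(void) {
            double h[NTAPS] = {0.25, 0.25, 0.25, 0.25};  /* moving average */
            double x[NTAPS] = {1.0, 2.0, 3.0, 4.0};      /* recent samples */
            printf("output sample = %f\n", fir(x, h));   /* prints 2.5     */
            return 0;
        }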

    43. Parallelism extraction. Static: use the compiler to analyze the program; simpler CPU; cannot depend on data values; example: very long instruction word (VLIW). Dynamic: use hardware to identify opportunities; more complex CPU; can make use of data values; example: superscalar.

    44. Very Long Instruction Word (VLIW): in widespread use in embedded systems; provides instruction-level parallelism with relatively low hardware overhead. The execution unit includes a pool of function units connected to a large register file. On each cycle, the execution unit reads a packet of instructions; each instruction in the packet can control one of the function units in the machine.

    45. Simple VLIW architecture: a large register file feeds multiple function units. [Diagram: an execution box (E box) in which a register file feeds two ALUs, two load/store units, and another function unit (FU); the example instruction packet "Add r1,r2,r3; Sub r4,r5,r6; Ld r7,foo; St r8,baz; NOP" supplies one operation to each unit.]

    46. Clustered VLIW architecture: the register file and function units are divided into clusters. [Diagram: two clusters, each with its own register file and execution units, connected by a cluster bus.]

    47. Very Long Instruction Word (VLIW), example 1: the Trimedia family of processors is designed for use in video systems. Video algorithms often perform similar operations on several pixels at a time.

    48. Very Long Instruction Word (VLIW), example 2: the Texas Instruments C6x VLIW DSP.

    49. Very Long Instruction Word (VLIW), example 2 (continued): the Texas Instruments C6x VLIW DSP has onboard program and data RAM as well as standard devices and DMA. The processor core includes two clusters, each with the same configuration. Each register file holds 16 words. Each data path has eight function units: two load units, two store units, two data address units, and two register-file cross paths.

    50. Superscalar Processors: issue more than one instruction per clock cycle. Unlike VLIW processors, they check for resource conflicts on the fly to determine which combinations of instructions can be issued at each step. Superscalar processors are not as common in the embedded world, though they are used to some extent: the embedded Pentium is two-issue, in-order, and some PowerPCs are superscalar.