
Advanced Embedded Systems


isenberg

Presentation Transcript


  1. Advanced Embedded Systems Lecture 11 Multiprocessors in Embedded Systems (1)

  2. Embedded multiprocessors (EMs) may be: • homogeneous or • heterogeneous; • A multiprocessor is made of: • processing elements (PEs); • memory blocks; • interconnection networks.

  3. Embedded multiprocessors vs. typical multiprocessors: • EMs use different types of PEs: PEs with different features, both programmable and non-programmable; • memory blocks of different sizes, both private and shared; • specialized interconnection networks; • Both must offer high performance, but EMs must also provide: • Real-time performance: scientific multiprocessors improve average performance at the expense of predictability, whereas EMs must offer predictable performance; • Low power and energy: EMs frequently must run at low power and energy levels; low power reduces heating problems and cost, while low energy consumption increases battery life; typical multiprocessors are less sensitive to power and energy consumption; • Cost-effectiveness: EMs must provide high performance without excessive hardware; • Design techniques: • heterogeneous multiprocessors are more energy-efficient and cost-effective than homogeneous multiprocessors; • heterogeneous memory systems improve real-time performance; • networks-on-chip (NoCs).

  4. The combination of high performance, low power and real-time requirements leads toward heterogeneous multiprocessors: • It is desirable to specialize the blocks of an EM: the processing elements, the memories and the interconnection network; • Specialization leads to lower power consumption; examples of operations that need specialized hardware: • bit-level operations, which on a CPU require too many registers; • intensive input/output operations, where data must be read, processed and written within a tight deadline, for example in engine control; • Heterogeneity reduces power consumption because unnecessary hardware is removed; generalizing a function always requires additional hardware; • Drawback: specialization increases communication; • Using multiple CPUs can improve real-time performance: allocating critical processes to separate CPUs helps them meet their deadlines; • Specialized memories and interconnections increase the predictability of a process's response time.

  5. Embedded multiprocessor design techniques: • design methodologies; • modeling and simulation; • Multiprocessor design methodologies:

  6. The set of programs used to design and evaluate an EM is called the workload (the counterpart of benchmarks for general-purpose computers); • Many existing programs are not written for embedded systems (real-time performance, low power, limited memory), and using them may lead to wrong design decisions; the workload must be tailored to EM requirements using platform-independent optimizations; • Next, platform-independent measurements, such as dynamic instruction count and data access patterns, are performed to guide the definition of an architecture; they show how well the workload matches the EM being designed; • An initial candidate architecture is defined; its platform-dependent characteristics are measured and the architecture is evaluated; if the platform is not appropriate, it is modified and new measurements are made; if it is appropriate, the blocks of the EM are designed; • The software is mapped onto the platform; compilers and libraries may be useful during this phase, and most of the optimizations are platform-dependent; operations must be allocated to processing elements, data to memories and communications to the interconnection network.

  7. Multiprocessor modeling and simulation: • Most multiprocessor simulators are systems of communicating simulators; the component simulators model the PEs, memory elements and interconnection networks, and the multiprocessor simulator itself handles the communication between them; • The multiprocessor simulator can be built using parallel computing techniques: • each component simulator is a process both in the multiprocessor simulator and in the host CPU's operating system; • the operating system provides the abstraction necessary for multiprocessing: each simulator has its own state, just as each PE in the implementation has its own state; • the simulator uses the host computer's communication mechanisms, such as semaphores and shared memory, to manage communication between the component simulators; • Simulators for classical multiprocessors assume that all the PEs are of the same type; adapting them to heterogeneous multiprocessors requires additional software.
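The simulator organization described in slide 7 can be sketched in Python under simplifying assumptions: component simulators are threads rather than OS processes, and `queue.Queue` stands in for the host's semaphores and shared memory. The message format `(op, address, value, reply_queue)` and all function names are hypothetical, chosen only for illustration.

```python
import queue
import threading


def memory_simulator(requests: queue.Queue, size: int) -> None:
    """Component simulator for a memory block; its state is private to this thread."""
    state = [0] * size
    while True:
        op, addr, value, reply = requests.get()
        if op == "stop":
            break
        if op == "write":
            state[addr] = value
            reply.put(None)          # acknowledge the write
        elif op == "read":
            reply.put(state[addr])


def pe_simulator(name: str, addr: int, value: int,
                 requests: queue.Queue, results: dict) -> None:
    """Component simulator for a PE: writes a value, then reads it back."""
    reply: queue.Queue = queue.Queue()
    requests.put(("write", addr, value, reply))
    reply.get()                      # block until the memory acknowledges
    requests.put(("read", addr, None, reply))
    results[name] = reply.get()


def run_system() -> dict:
    """Two PE simulators share one memory simulator through a request queue."""
    requests: queue.Queue = queue.Queue()
    results: dict = {}
    memory = threading.Thread(target=memory_simulator, args=(requests, 16))
    pes = [threading.Thread(target=pe_simulator,
                            args=(f"pe{i}", i, 10 * (i + 1), requests, results))
           for i in range(2)]
    memory.start()
    for pe in pes:
        pe.start()
    for pe in pes:
        pe.join()
    requests.put(("stop", None, None, None))
    memory.join()
    return results
```

`run_system()` returns a dictionary mapping `pe0` to 10 and `pe1` to 20. Supporting a heterogeneous multiprocessor would amount to plugging different PE simulator functions into the same message protocol.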

  8. Multiprocessor architectures: • the processors are on separate chips, or • they are implemented on the same chip, known as a multiprocessor system-on-chip (MPSoC); • Philips Nexperia: an MPSoC for digital video and television applications:

  9. It includes two processors: a MIPS PR3940 RISC CPU that runs the real-time operating system and a TriMedia TM32 VLIW processor for media operations; • It uses synchronous DRAM that satisfies the requirements of video memory; the memory controller is connected to the rest of the circuit through a bus; • The MIPS processor is connected to a fast bus, which is connected through a bridge to a slower bus for low-speed peripherals; the TM32 processor has its own bus; • Various peripherals are implemented on the chip: a USB controller, 3 UARTs, 2 I2C interfaces, digital audio interfaces and general-purpose I/O pins; • The circuit contains special-purpose function units and accelerators for media applications: • an image composition engine, a scaling unit, an MPEG-2 video decoder, two video input processors that can receive NTSC and PAL broadcast signals, and a drawing engine; • These units improve efficiency by offloading work from the CPUs.

  10. TI OMAP multiprocessor: • It was designed for mobile multimedia applications: camera phones, portable imaging devices and so forth; • OMAP conforms to the OMAPI standard, which defines hardware and software interfaces for multimedia multiprocessors; • The figure shows the overall structure of the OMAP hardware/software architecture: it is based on an ARM9 RISC processor and a TI C55x DSP, which communicate through shared memory.

  11. OMAP 5912: • It contains a frame buffer for video as a separate block of memory, distinct from the main data and program memory; the frame buffer is on-chip, while the flash and SDRAM memories are off-chip; • There are 4 hardware mailboxes for multiprocessor communication: two are writable by the ARM9 and two by the C55x; all are readable by either processor; • Each processor has some dedicated I/O devices; there are also some common devices accessible through a peripheral bridge.
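The mailbox access rules of slide 11 (two mailboxes written only by the ARM9, two only by the C55x, all readable by both) can be modeled as a small permission check. This is a hypothetical sketch: the mapping of mailbox indexes to writers and the "reading empties the mailbox" behavior are assumptions, not the OMAP 5912 register interface.

```python
class MailboxBank:
    """Hypothetical model of the four OMAP 5912 hardware mailboxes."""

    # Assumed mapping: mailbox index -> the only processor allowed to write it.
    WRITER = {0: "ARM9", 1: "ARM9", 2: "C55x", 3: "C55x"}
    READERS = {"ARM9", "C55x"}

    def __init__(self) -> None:
        self.slots: list = [None] * 4

    def write(self, cpu: str, box: int, message: int) -> None:
        if self.WRITER[box] != cpu:
            raise PermissionError(f"{cpu} may not write mailbox {box}")
        self.slots[box] = message

    def read(self, cpu: str, box: int):
        if cpu not in self.READERS:
            raise PermissionError(f"unknown processor {cpu}")
        # Assumed behavior: reading a mailbox empties it.
        message, self.slots[box] = self.slots[box], None
        return message
```

For example, the ARM9 can post a command to the C55x with `write("ARM9", 0, cmd)`, and the C55x picks it up with `read("C55x", 0)`; an attempt by the C55x to write mailbox 0 raises `PermissionError`.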

  12. The components of an EM are: • processing elements; • memories; • interconnection networks; • The processing elements perform the computations; a PE may run one process or several processes; an EM frequently uses different kinds of PEs: programmable processors, hardwired processors, single-function blocks, etc.; • To determine the number and type of PEs, the following design methodology is recommended: • analyze each application to determine the performance and power requirements of each of its processes; • choose a processor type for each process, usually from a predetermined set of processor types; • determine which processes can share a CPU, which gives the required number of PEs; • Software performance analysis can be used to determine how fast a process will run on a particular type of CPU; • Standard CPUs or configurable processors can be used.
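One possible way to carry out the step "determine which processes can share a CPU" is to treat each process's CPU utilization (obtained from software performance analysis on the chosen processor type) as an item and bin-pack the items onto CPUs. The sketch below uses first-fit decreasing, a swapped-in heuristic that the lecture does not prescribe. Utilizations are integer percentages, and the 100 % per-CPU cap is an assumption (it matches the EDF bound for independent periodic tasks with deadlines equal to periods; a fixed-priority scheduler would need a lower cap).

```python
def pack_processes(utilization: dict, capacity: int = 100) -> list:
    """First-fit decreasing: place each process on the first CPU whose total
    utilization (in percent) stays within `capacity`, else open a new CPU."""
    cpus: list = []    # process names assigned to each CPU
    loads: list = []   # total utilization of each CPU, in percent
    # Largest processes first; sorted() is stable, so ties keep input order.
    for name, u in sorted(utilization.items(), key=lambda item: item[1], reverse=True):
        if u > capacity:
            raise ValueError(f"{name} does not fit on a single CPU")
        for i, load in enumerate(loads):
            if load + u <= capacity:
                cpus[i].append(name)
                loads[i] += u
                break
        else:
            cpus.append([name])
            loads.append(u)
    return cpus
```

With hypothetical utilizations `{"engine": 60, "display": 30, "logger": 30, "comm": 40}`, the result is `[["engine", "comm"], ["display", "logger"]]`, so two PEs are needed.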

  13. The memory system is a classical bottleneck in computing: memories are slower than processors and, worse, processor clock rates are increasing much faster than memory cycle times are decreasing; • Traditional parallel memory systems: • used in classical multiprocessors; the memories are homogeneous; • each bank is separately addressable; • with n banks, n accesses can be performed in parallel, giving the peak access rate; this rate is achieved only in particular cases, for example when the banks are accessed in the order 0, 1, 2, 3, 0, 1, 2, 3, … (n = 4); • In practice, the probability of a sequential access sequence of length k is $p(k) = \lambda (1 - \lambda)^{k-1}$, where λ is the probability of a nonsequential memory access (for example, a branch).
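The interleaving and the run-length probability of slide 13 can be checked numerically. This sketch assumes low-order interleaving (bank = address mod n), which produces the 0, 1, 2, 3, … order, and assumes that each access is independently nonsequential with probability λ, so a run of length k is k − 1 sequential continuations followed by a break: p(k) = λ(1 − λ)^(k−1). The function names are illustrative.

```python
def bank_of(address: int, n_banks: int) -> int:
    """Low-order interleaving (assumed): consecutive words map to consecutive banks."""
    return address % n_banks


def run_length_probability(k: int, lam: float) -> float:
    """p(k) = lam * (1 - lam)**(k - 1): k - 1 sequential continuations, then a break."""
    if k < 1 or not 0 < lam <= 1:
        raise ValueError("need k >= 1 and 0 < lam <= 1")
    return lam * (1 - lam) ** (k - 1)


def mean_run_length(lam: float, k_max: int = 10_000) -> float:
    """Truncated sum of k * p(k); for this geometric distribution it tends to 1 / lam."""
    return sum(k * run_length_probability(k, lam) for k in range(1, k_max + 1))
```

For λ = 0.2 the mean run is 5 accesses, so on average only about five consecutive banks are touched before a nonsequential access breaks the sequence, which limits how often n banks can all be busy.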

  14. Heterogeneous memory systems (HMS) are preferred in EMs but can coexist with homogeneous memory systems; • HMS improve real-time performance: • shared memories are adequate when only functionality matters, less so when real-time performance and predictability are required; • if a memory block is shared by several PEs, they contend for it: in general, one PE must wait for another to finish its access, and in most cases it is not possible to predict when these conflicts will occur; • freedom from conflicts can be guaranteed if only one PE (or a few PEs) accesses a memory, that is, if a dedicated memory is provided for those PEs; • HMS help reduce power consumption: • one component of the energy consumed by a memory access depends on the size of the memory block (larger blocks have longer access times); • a heterogeneous memory can be built from smaller memory blocks, reducing the access time and thus the power consumption; • energy per access also depends on the number of ports on the memory block, so reducing the number of units that can access a given part of memory reduces energy consumption.

  15. Interconnection networks (INs): • connect the PEs to the memories; • Terminology: • Client: a sender or receiver connected to a network; • Port: a client's connection to a network; • Link: a connection between two clients; • Half-duplex and full-duplex: … • Topology: the organization of the links, which determines the properties of the network; • Attributes for evaluating and comparing INs: • Throughput: the maximum throughput available from one node to another, as well as the variation in data rates over time and the effect of that variation on network behavior; • Latency: the time a packet takes to travel from source to destination; when latency varies, the best-case and worst-case latencies are also important; • Energy consumption: a typical measure is the energy required to send one bit through the network; • Area: influences cost and dynamic energy consumption (through the metal area of the wires); the total area is the metal area of the wires plus the silicon area of the transistors.

  16. The simplest interconnection network is the bus: • small size, low performance, high energy consumption; • To estimate performance, assume that the bus is driven by a master clock; • With one word per bus transaction, the bus throughput is $T_{basic} = \frac{1}{(1 + C)P}$ words/s, where P is the clock period and C is the number of clock cycles of transaction overhead (addressing, etc.); • If the bus supports block transfers, the throughput for n-word block transactions is $T_{block} = \frac{n}{(n + C)P}$ words/s; • Most of the energy consumption is dynamic and is determined by the capacitance that must be driven; • The capacitance of a bus has two components: the bus wires and the loads at the clients; with many clients, the load capacitance becomes significant; • The energy consumption may also be high because of the length of the wires; • A bus is not recommended for large systems because it easily becomes saturated with traffic, so only a small number of PEs can be connected.
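The bus throughput model of slide 16 can be evaluated directly. The sketch assumes, as the model does, that each data word takes one bus clock cycle and that every transaction adds C overhead cycles; the 100 MHz clock and C = 3 are hypothetical numbers.

```python
def bus_throughput(clock_period_s: float, overhead_cycles: int, block_words: int = 1) -> float:
    """Bus throughput in words/s: n / ((n + C) * P).

    Each word takes one bus clock cycle and each transaction adds C overhead
    cycles; block_words = 1 gives the single-word case 1 / ((1 + C) * P).
    """
    n, c, p = block_words, overhead_cycles, clock_period_s
    return n / ((n + c) * p)


single = bus_throughput(1e-8, 3)      # 100 MHz bus, 3 overhead cycles, 1 word per transaction
block = bus_throughput(1e-8, 3, 8)    # same bus, 8-word block transactions
```

Here `single` is about 25 million words/s and `block` about 72.7 million words/s: longer blocks amortize the overhead, and as n grows the throughput approaches 1/P words/s.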

  17. The crossbar, the most complex IN: • a fully connected network that provides a path from every input port to every output port; example of a 4 x 4 crossbar: • provides full connectivity for any combination of inputs and outputs; • broadcast from one input to all outputs, and multicast from one input to several selected outputs, are possible; • Its disadvantage is its size: n inputs and n outputs require n² switches; however, because the switches are simple and small, crossbars with a moderate number of inputs and outputs (for example, 8 x 8 with words of reasonable width) can be built on a modern VLSI chip, whereas a 10000 x 10000 crossbar is not reasonable even for 1-bit words.
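The crossbar of slide 17 can be modeled as an n × n matrix of crosspoint switches, which makes both the n² cost and the unicast/multicast/broadcast capability explicit. This is an illustrative sketch; the class and method names are hypothetical, and the rule that each output is driven by at most one input at a time is the usual crossbar constraint.

```python
class Crossbar:
    """n x n crossbar modeled as an n-by-n matrix of crosspoint switches."""

    def __init__(self, n: int) -> None:
        self.n = n
        # closed[i][o] is True when input i is connected to output o
        self.closed = [[False] * n for _ in range(n)]

    def switch_count(self) -> int:
        return self.n * self.n            # n^2 crosspoints

    def connect(self, inp: int, outputs: list) -> None:
        """Unicast (one output), multicast (several) or broadcast (all outputs)."""
        for o in outputs:
            # an output can be driven by only one input at a time
            if any(self.closed[i][o] for i in range(self.n) if i != inp):
                raise ValueError(f"output {o} is already driven by another input")
        for o in outputs:
            self.closed[inp][o] = True

    def route(self, inputs: list) -> list:
        """Value seen at each output for one transfer; None means the output is idle."""
        out = [None] * self.n
        for i in range(self.n):
            for o in range(self.n):
                if self.closed[i][o]:
                    out[o] = inputs[i]
        return out
```

For instance, on a 4 × 4 crossbar, `connect(0, [1])` followed by `connect(2, [0, 3])` (a multicast) makes `route(["a", "b", "c", "d"])` return `["c", "a", None, "c"]`.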

  18. If the number of inputs is too large for a given crossbar area, the solution is to use buffers: • queues can be added at the inputs of the crossbar, with several traffic sources connected to each queue; a queue controller decides the order in which packets enter the queue and what to do when the queue is full; • buffers can be added to the switches, which increases the physical size but also the flexibility of transfers.

  19. Mesh networks: • every node is connected to its nearest neighbors; • a mesh network is scalable in that a network of dimension n + 1 includes subnetworks that are meshes of dimension n; • the links are short but numerous, providing multiple paths for data; • the shortest path between two nodes equals their Manhattan distance, the sum over all dimensions of the absolute differences between the source and destination node indexes.
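The Manhattan-distance property of slide 19 can be illustrated with dimension-order (XY) routing, a common deterministic mesh routing scheme used here as an example (the lecture does not name a routing algorithm). Nodes are identified by (x, y) coordinates, which is an assumption of the sketch; XY routing produces a path whose hop count equals the Manhattan distance.

```python
def manhattan(src: tuple, dst: tuple) -> int:
    """Shortest hop count in a mesh: sum of absolute coordinate differences."""
    return sum(abs(s - d) for s, d in zip(src, dst))


def xy_route(src: tuple, dst: tuple) -> list:
    """Dimension-order (XY) routing in a 2D mesh: move along x first, then along y."""
    x, y = src
    path = [(x, y)]
    while x != dst[0]:
        x += 1 if dst[0] > x else -1
        path.append((x, y))
    while y != dst[1]:
        y += 1 if dst[1] > y else -1
        path.append((x, y))
    return path
```

From (0, 0) to (2, 3) the route is (0, 0), (1, 0), (2, 0), (2, 1), (2, 2), (2, 3): five hops, equal to the Manhattan distance. Because the mesh has many such shortest paths, adaptive schemes can choose among them.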

  20. Application-specific networks (ASNs) are appropriate for embedded systems: • their topology is matched to the characteristics of the application; • ASNs consume less energy than a regular network of equal overall performance; • because most embedded applications perform several different tasks simultaneously, different parts of the architecture require different network bandwidth; • placing bandwidth where it is needed makes the network more efficient without sacrificing performance for the given application; • Routing and flow control determine the cost and performance of the network: • routing determines the paths; routing algorithms can be deterministic or adaptive, and they may occasionally drop packets or guarantee packet delivery; switching techniques include circuit switching, store-and-forward, wormhole and virtual cut-through; • flow control determines how links and buffers are allocated as packets move through the network.
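The difference between store-and-forward and wormhole or virtual cut-through switching (slide 20) shows up clearly in a simplified zero-load latency model, a standard textbook approximation rather than a formula from the lecture: for H hops, per-router delay t_r, packet length L bits and link bandwidth b bits/s, store-and-forward pays L/b on every hop, while cut-through pays it once. The numbers below are hypothetical; 256e9 bits/s corresponds to a 256-bit link clocked at 1 GHz. Contention and circuit setup are ignored.

```python
def store_and_forward_latency(hops: int, router_delay_s: float,
                              packet_bits: int, link_bps: float) -> float:
    # Every router receives the whole packet before forwarding it,
    # so the serialization time L/b is paid on every hop.
    return hops * (router_delay_s + packet_bits / link_bps)


def cut_through_latency(hops: int, router_delay_s: float,
                        packet_bits: int, link_bps: float) -> float:
    # The header is forwarded as soon as it is decoded and the rest of the
    # packet is pipelined behind it, so L/b is paid only once (wormhole and
    # virtual cut-through both behave this way when there is no contention).
    return hops * router_delay_s + packet_bits / link_bps


saf = store_and_forward_latency(4, 2e-9, 512, 256e9)   # about 16 ns
ct = cut_through_latency(4, 2e-9, 512, 256e9)          # about 10 ns
```

Under load the two pipelined schemes diverge: virtual cut-through buffers a blocked packet at one router, while wormhole leaves it stretched across several links.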

  21. Networks-on-chip: • NoCs are the interconnection networks of single-chip multiprocessors; • in a mesh NoC, each switch is connected to its four nearest neighbors by two unidirectional links (one per direction) and to a resource; • in a 60 nm CMOS technology: • a single chip could include a 10 x 10 mesh of switches and resources; • each network link would have 256 data bits plus control signals; • each switch has a queue at each input; • selection logic at the outputs determines the order in which packets leave.

  22. Another example is the SPIN network, a scalable network with a fat tree topology: • this topology offers more bandwidth at higher levels in order to reduce contention; • the leaf nodes are the processing and memory elements; when a PE sends a message to another, the message goes up the tree until it reaches a common ancestor node and then goes back down; • one advantage of the fat tree topology is that all the routing nodes use the same routing function, so the same router design can be used throughout the network; • the SPIN network uses two 32-bit data paths, one for each direction, for full-duplex communication; a router can choose any of the several equivalent paths available to it at that moment.
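The up-then-down routing of slide 22 can be expressed as a hop count. This sketch assumes a fat tree of a given arity whose leaves are numbered 0, 1, 2, … from left to right; two leaves share an ancestor at level L once their indexes agree after dropping L base-arity digits. The arity of 4 in the examples is an assumption, not a figure from the lecture, and the choice among equivalent upward links does not change the count.

```python
def fat_tree_hops(src: int, dst: int, arity: int) -> int:
    """Links traversed between two leaves of a fat tree with the given arity:
    up to the lowest common ancestor, then the same number of levels down."""
    level = 0
    # The leaves share an ancestor at `level` once their indexes agree after
    # dropping the `level` least-significant base-`arity` digits.
    while src // arity**level != dst // arity**level:
        level += 1
    return 2 * level
```

With arity 4, leaves 0 and 3 hang off the same router (2 hops), leaves 0 and 5 meet one level higher (4 hops), and leaves 0 and 63 meet at the root of a three-level tree (6 hops).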

  23. Design methodologies for NoCs have been developed; for example, a methodology for designing networks for QoS-intensive applications such as multimedia: • the application requirements are specified; • the performance required from the network is determined; • the topology is determined and the network is configured with PEs and memories; • the network is simulated to evaluate its actual performance; • the network may be modified based on the simulation results.

  24. Physically distributed embedded systems and networks: • frequently used in cars, airplanes, etc.; • these systems are more loosely coupled than multiprocessors and generally do not share memory; • the application is distributed over the PEs; • the distributed system must provide guaranteed real-time behavior; • Reasons to build network-based embedded systems: • to execute tasks near the events, e.g., engine control may require short delays; • data reduction, e.g., some initial signal processing on the input data to reduce its volume; allocating these operations to a dedicated processor speeds up processing and reduces the load on the processor that uses the data to make decisions; • modularity: for easier design and assembly, easier debugging (a verified module can be used to probe components in another part of the network) and fault tolerance; • The design of a distributed embedded system is an example of hardware/software co-design, since the network topology and the software running on the network nodes must be designed together.

  25. Time-triggered architecture (TTA): • TTA is a distributed architecture for real-time control; it offers reliability for safety-critical systems and accuracy for high-rate physical processes; • it differs from other distributed architectures in that it takes time into account explicitly; • TTA represents time as a 64-bit value, with the three lower bytes representing fractions of a second and the five upper bytes representing seconds; • The next figure presents the communication network interface, which links the communication controller (the low-level interface) to the host node (the TTA's PE).
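The 64-bit TTA time format of slide 25 (five upper bytes of seconds, three lower bytes of fractions of a second) can be packed and unpacked with shifts and masks. This sketch assumes the lower 24 bits are a binary fraction, i.e. units of 2^-24 s; the function names are hypothetical.

```python
FRACTION_BITS = 24              # three lower bytes: fraction of a second
SECONDS_LIMIT = 1 << 40         # five upper bytes: whole seconds


def encode_tta_time(seconds: int, fraction: float) -> int:
    """Pack whole seconds and a fraction in [0, 1) into a 64-bit timestamp."""
    if not (0 <= seconds < SECONDS_LIMIT and 0.0 <= fraction < 1.0):
        raise ValueError("value out of range")
    ticks = int(fraction * (1 << FRACTION_BITS))   # truncate to 2**-24 s units
    return (seconds << FRACTION_BITS) | ticks


def decode_tta_time(timestamp: int) -> tuple:
    """Split a 64-bit timestamp back into (seconds, fraction of a second)."""
    if not 0 <= timestamp < (1 << 64):
        raise ValueError("timestamp must fit in 64 bits")
    seconds = timestamp >> FRACTION_BITS
    fraction = (timestamp & ((1 << FRACTION_BITS) - 1)) / (1 << FRACTION_BITS)
    return seconds, fraction
```

Under this assumption the resolution is 2^-24 s (about 59.6 ns) and the 40-bit seconds field covers 2^40 s, roughly 34,800 years.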

  26. The TTA can be implemented on bus and star topologies: • a bus-based system uses replicated buses, which are passive to avoid active components that could fail; • each physical node consists of a node, two guardians and a bus transceiver; the guardians monitor the node's transmissions.

  27. FlexRay: • a second-generation standard for automotive networks; it provides higher bandwidth and more abstract services than CAN; • it is based on the TTA; • The next figure shows a block diagram of a generic FlexRay system: • the host runs the applications; • the host communicates with the communication controller, which provides high-level functions, and with the low-level bus driver; • bus guardians are nodes that monitor the behavior of the network and take action when that behavior is erroneous.

  28. FlexRay is organized into 5 levels of abstraction: • physical level: defines the structure of the connections; • interface level: defines the physical connections; • protocol engine: defines frame formats, communication modes and services such as messages and synchronization; • controller host interface: provides status, configuration, message and control information to the host layer; • host layer: runs the applications.

  29. FlexRay can use an active star topology (the router node is active): • a node may be connected to more than one star to provide redundant connections.

  30. Data is coded with the differential non-return-to-zero (NRZ) scheme; • the transmission rate is 10 Mbps, independent of the link length: FlexRay does not use bitwise arbitration, so arbitration contention does not limit the link's length; • Data is encapsulated in frames with the following fields: • Frame ID: identifies the frame's slot, with values in {0, …, 2047}; • Payload length: the number of 16-bit words in the payload section; • Header CRC: provides error detection; • Cycle count: enumerates the protocol cycles; protocol engines use this information to guide clock synchronization; • Data field: carries a payload of 0–254 bytes; • Trailer CRC: provides additional error detection.
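The field limits of slide 30 fit together: a payload of at most 254 bytes counted in 16-bit words gives a payload length field of at most 127, and frame IDs up to 2047 fit in 11 bits. The sketch below checks only those limits; the CRCs, cycle count and indicator bits are omitted, so it is not a bit-exact FlexRay encoder, and the class name is hypothetical.

```python
from dataclasses import dataclass

MAX_FRAME_ID = 2047        # frame ID range 0..2047 (fits in 11 bits)
MAX_PAYLOAD_BYTES = 254    # data field size limit


@dataclass
class FlexRayFrame:
    frame_id: int
    payload: bytes

    def payload_length_field(self) -> int:
        """Value of the payload length field, which counts 16-bit words."""
        if len(self.payload) % 2:
            raise ValueError("payload must be a whole number of 16-bit words")
        return len(self.payload) // 2

    def validate(self) -> None:
        if not 0 <= self.frame_id <= MAX_FRAME_ID:
            raise ValueError("frame ID out of range")
        if len(self.payload) > MAX_PAYLOAD_BYTES:
            raise ValueError("payload longer than 254 bytes")
        self.payload_length_field()   # raises on an odd byte count
```

A frame with ID 10 and a 254-byte payload validates and has a payload length field of 127; an ID of 2048 or a 3-byte payload is rejected.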

  31. There are 2 timing structures: the static segment and the dynamic segment; • the static segment is scheduled using a TDMA discipline: • static segments are divided into slots of fixed and equal length; all the slots are used in every segment, in the same order; • the static segment is split across two channels; synchronization frames are provided on both channels; messages can be sent on either one or both channels, and less critical messages are sent on only one channel; the slots are occupied by messages in ascending frame ID order; • the dynamic segment: • provides bandwidth for asynchronous, unpredictable communication; its slots are arbitrated using a deterministic mechanism; • has two channels, each of which can have its own message queue.
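The TDMA discipline of the static segment (slide 31) means that transmission times follow from the frame ID alone: equal-length slots are occupied in ascending frame ID order. This sketch assumes integer time units counted from the start of the static segment and slots numbered from 1; both are assumptions of the sketch, and the function names are hypothetical.

```python
def slot_window(frame_id: int, slot_length: int, n_static_slots: int) -> tuple:
    """(start, end) of the static slot owned by frame_id, in time units from
    the start of the static segment; slots are numbered from 1 here."""
    if not 1 <= frame_id <= n_static_slots:
        raise ValueError("frame ID has no static slot")
    start = (frame_id - 1) * slot_length
    return start, start + slot_length


def slot_owner(t: int, slot_length: int, n_static_slots: int):
    """Frame ID allowed to transmit at time t, or None once the static segment is over."""
    if t < 0:
        raise ValueError("time must be non-negative")
    slot = t // slot_length + 1
    return slot if slot <= n_static_slots else None
```

With 10 static slots of 50 time units each, frame ID 3 owns the window [100, 150), and at t = 500 the static segment is over, leaving the remaining time to the dynamic segment.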

  32. Because of its complex timing, FlexRay must be started properly: • operation begins with a wake-up procedure that turns on the nodes; • then a coldstart initiates the TDMA process; • at least two nodes must be able to perform a coldstart; • FlexRay has a global time source to synchronize messages: • the global time is synthesized by the clock synchronization process from the nodes' clocks, using distributed timekeeping algorithms; • The bus guardians: • prevent the nodes from transmitting outside their schedules; • are not mandatory in a FlexRay system, only recommended; • send an enable signal to every node in the system they guard; removing the enable signal stops that node's transmission; • use their own clocks to watch the bus operation; if a guardian detects a message coming at the wrong time, it removes the enable signal; • The controller host interface provides services to the host regarding status, control (interrupt service, startup), data (message buffering) and configuration.
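The bus guardian behavior of slide 32 can be sketched as a per-node enable flag that is removed when a transmission falls outside the node's scheduled window, as measured on the guardian's own clock. The window representation and the choice to keep the enable off once removed (latching) are assumptions of this hypothetical sketch.

```python
class BusGuardian:
    """Hypothetical bus guardian: one enable line per node, removed (and kept
    off) when the node transmits outside its scheduled window."""

    def __init__(self, windows: dict) -> None:
        # windows: node -> (start, end) of its transmit window, measured on
        # the guardian's own clock from the start of the cycle
        self.windows = windows
        self.enabled = {node: True for node in windows}

    def observe(self, node: str, start: int, end: int) -> bool:
        """Check one transmission against the schedule; return the node's enable state."""
        w_start, w_end = self.windows[node]
        if start < w_start or end > w_end:
            self.enabled[node] = False   # message at the wrong time: cut the node off
        return self.enabled[node]
```

For example, with windows A = (0, 50) and B = (50, 100), a transmission by B over (40, 90) starts inside A's window, so the guardian removes B's enable signal and B stays silenced for later transmissions.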

  33. Aircraft networks: • aircraft are somewhat similar to automobiles but have more severe requirements: • weight is a more sensitive parameter than in cars; • planes require more complex control because they move in 3D; • most aspects of aircraft design, operation and maintenance are regulated; • Aircraft electronics is divided into 3 categories: • instrumentation; • navigation/communication; • control; • Instrumentation (such as the altimeter or artificial horizon) uses mechanical, pneumatic or hydraulic methods; the electronics must display the data and send it to other systems; • Navigation/communication is done by radio and is regulated; communication is by voice or data; digital electronics control the radios and display navigation data, such as moving maps that overlay navigation data on a map; • Control: operates the engines and flight surfaces (such as the ailerons, elevator and rudder).

  34. Generally, aircraft use different types of networks, such as: • control networks, which perform hard real-time tasks for instrumentation and control; • management networks, which control noncritical devices and can use nonguaranteed modes, such as Ethernet, to improve average performance and limit weight; • passenger networks, e.g., Internet service for passengers over a satellite link; these networks are separated from the operational networks by firewalls; • Aircraft data networks are governed by several standards, e.g., ARINC 664: • it is based on Ethernet, which provides higher bandwidth than previous aircraft data networks and allows aircraft manufacturers to use standard network components; • however, basic Ethernet is used together with protocols and architectures that provide the needed real-time performance and reliability; • it divides the aircraft network into 4 domains, with firewalls between them: • the flight deck network for real-time control; • a network for equipment supplied by outside vendors; • a subnetwork for secondary operations, such as in-flight entertainment; • the passenger subnetwork, which provides Internet access to passengers.
