Achieving Real-Time Performance with Linux/Android or Windows CE

ESC 316 • Dave Stewart, PhD • Director of Research and Systems Integration, InHand Electronics • dstewart@inhand.com • www.inhand.com

Outline • Are Linux/Android and WinCE Real-Time? • Priorities and Priority Inversion • Sleep, Pause, and Clock Drift • Soft and Hard Real-Time threads • ISRs, ISTs, and Aperiodic Servers • Real-Time Data Streaming • Interfacing Hard and Soft Real-Time Threads • Hard Real-Time using a Co-Processor

Are Linux/Android and WinCE Real-Time? • Yes! As they are designed, they have all the features that make any other RTOS real-time • However, some drivers and kernel or OAL layers have been written by desktop programmers, and use of this code “breaks” the real-time capabilities • By fixing, avoiding, or designing around those issues that affect real-time, it is possible to create both Hard and Soft real-time systems using these operating systems

Examples of Applications with Real-Time Requirements • Cell Phones and Communication • Multimedia Streaming • Graphical Dashboards • GUIs for Robotic Control • Data Acquisition with Real-Time Display • Remote Sensor Data Acquisition • Cable/TV/Satellite Boxes • Security and Surveillance Systems

Priorities and Priority Inversion • Arbitrarily bumping up the priorities of “important” drivers is a primary cause of broken real-time performance • Doing so helps that driver perform as needed, but at the cost of causing other real-time threads to miss their requirements • Priorities are a system configuration issue • Drivers and threads should never define or hard-code their own priorities • Use a configuration manager, or define all priorities in the registry • OS Scheduling Policy • Make sure Highest-Priority-First scheduling is selected • This is the default in WinCE; in Linux, select SCHED_FIFO
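As a minimal Linux sketch of both points (priorities defined in one place, SCHED_FIFO selected explicitly), the code below prepares POSIX thread attributes from a central table. The table, its thread names, and the helper names are hypothetical. Note that under Linux SCHED_FIFO a larger number means higher priority (1 to 99), the opposite of WinCE's 0=highest.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <sched.h>
#include <string.h>

/* Hypothetical system-wide priority table: the only place priorities are
 * defined. Drivers look themselves up by name instead of hard-coding. */
struct prio_entry { const char *name; int prio; };
static const struct prio_entry prio_table[] = {
    { "usb_rx",   80 },   /* Linux SCHED_FIFO: larger number = higher priority */
    { "ethernet", 70 },
    { "codec",    50 },
};

/* Return the configured priority for 'name', or -1 if it is not listed. */
static int lookup_priority(const char *name)
{
    for (size_t i = 0; i < sizeof prio_table / sizeof prio_table[0]; i++)
        if (strcmp(prio_table[i].name, name) == 0)
            return prio_table[i].prio;
    return -1;
}

/* Fill 'attr' for a SCHED_FIFO thread at the configured priority.
 * Returns 0 or an error number. Actually creating the thread with these
 * attributes normally requires root or CAP_SYS_NICE. */
static int rt_attr_init(pthread_attr_t *attr, const char *name)
{
    int prio = lookup_priority(name);
    if (prio < sched_get_priority_min(SCHED_FIFO) ||
        prio > sched_get_priority_max(SCHED_FIFO))
        return EINVAL;

    int rc = pthread_attr_init(attr);
    if (rc != 0)
        return rc;

    struct sched_param sp = { .sched_priority = prio };
    /* EXPLICIT_SCHED: otherwise the new thread inherits the creator's policy.
     * The policy is set before the priority so the priority is validated
     * against SCHED_FIFO's range. */
    if ((rc = pthread_attr_setinheritsched(attr, PTHREAD_EXPLICIT_SCHED)) != 0 ||
        (rc = pthread_attr_setschedpolicy(attr, SCHED_FIFO)) != 0 ||
        (rc = pthread_attr_setschedparam(attr, &sp)) != 0) {
        pthread_attr_destroy(attr);
        return rc;
    }
    return 0;
}
```

The attributes are then passed to pthread_create(); without sufficient privileges that call fails with EPERM rather than silently running the thread at normal priority.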

Priorities and Priority Inversion • Use Rate Monotonic Algorithm as a Guideline • While an accurate analysis would be nice, it is not necessary unless the system is near its performance limits • Consider the approximate rate and utilization of each driver or thread, and assign priorities accordingly: the higher the rate, the higher the priority • For example • If the Ethernet IST processes packets about once every 5 msec, and the USB IST does so about once every 15 msec, then give Ethernet the higher priority, even if USB is functionally more critical
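The RMA guideline reduces to sorting by period. The sketch below assigns priorities in WinCE-style numbering (lower number = higher priority) and sums utilization as a rough sanity check. The struct, function names, base/step values, and the execution times used in the test are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>

/* A periodic activity with its approximate rate and execution time. */
struct task {
    const char *name;
    double period_ms;   /* approximate time between activations */
    double exec_ms;     /* approximate execution time per activation */
    int prio;           /* output: lower number = higher priority */
};

static int by_period(const void *a, const void *b)
{
    const struct task *x = a, *y = b;
    return (x->period_ms > y->period_ms) - (x->period_ms < y->period_ms);
}

/* Rate-monotonic assignment: shorter period => higher priority (lower
 * number), regardless of how "important" the task seems. Sorts in place. */
static void rma_assign(struct task *t, size_t n, int base, int step)
{
    qsort(t, n, sizeof *t, by_period);
    for (size_t i = 0; i < n; i++)
        t[i].prio = base + (int)i * step;
}

/* Total CPU utilization: sum of exec/period over all tasks. */
static double utilization(const struct task *t, size_t n)
{
    double u = 0.0;
    for (size_t i = 0; i < n; i++)
        u += t[i].exec_ms / t[i].period_ms;
    return u;
}
```

Leaving gaps between priorities (step > 1) makes it easy to slot in a new thread later without renumbering everything.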

Priorities and Priority Inversion • Never hold locks for extended periods of time • Get the lock, perform actions quickly, then release • NEVER block while holding a lock; this is one of the key problems that leads to priority inversion and deadlock • Don’t disable ALL interrupts for more than a few microseconds at a time • Disabling them for a few microseconds to guarantee an atomic action is acceptable • However, locking them out any longer prevents real-time interrupt handlers or threads from preempting when necessary
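Beyond keeping critical sections short, POSIX systems such as Linux also offer priority-inheritance mutexes (PTHREAD_PRIO_INHERIT), which bound how long a high-priority thread can be held up by a low-priority lock holder. This is a complementary technique not covered on the slide, sketched with hypothetical names; it does not make blocking while holding a lock acceptable.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <pthread.h>

/* Shared state protected by a short critical section. */
static pthread_mutex_t state_lock;
static int shared_setpoint;

/* Create a mutex using the POSIX priority-inheritance protocol: while a
 * low-priority thread holds it, that thread temporarily runs at the
 * priority of the highest-priority waiter. Returns 0 or an error number. */
static int init_pi_mutex(pthread_mutex_t *m)
{
    pthread_mutexattr_t ma;
    int rc = pthread_mutexattr_init(&ma);
    if (rc != 0)
        return rc;
    rc = pthread_mutexattr_setprotocol(&ma, PTHREAD_PRIO_INHERIT);
    if (rc == 0)
        rc = pthread_mutex_init(m, &ma);
    pthread_mutexattr_destroy(&ma);
    return rc;
}

/* Lock, do the minimum work, unlock: no I/O, no sleeping, and no waiting
 * on other events while the lock is held. */
static void set_setpoint(int value)
{
    pthread_mutex_lock(&state_lock);
    shared_setpoint = value;
    pthread_mutex_unlock(&state_lock);
}

static int get_setpoint(void)
{
    pthread_mutex_lock(&state_lock);
    int v = shared_setpoint;
    pthread_mutex_unlock(&state_lock);
    return v;
}
```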

Sleep, Pause, and Clock Drift • Delay • Busy wait; holds the processor • Use for delays measured in usec; never for more than about 100 usec • It must be based on a hardware timer, not an empty loop • Sleep • “Wait for X time”; X is relative to ‘now’ • A quick means of providing delays • Only use it in non-real-time threads, since its timing isn’t guaranteed and it generates a context switch • Using Sleep to manage time in periodic threads produces clock drift • Even the best methods to minimize it still leave minor drift • Pause • “Wait UNTIL X time”; X is absolute time • This is the right way to implement periodic threads
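A user-space Linux sketch of a Delay based on a hardware-backed clock rather than an empty loop follows. Inside a Linux driver, the kernel's udelay() plays this role; the function names here are hypothetical.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stdint.h>
#include <time.h>

/* Current monotonic time in nanoseconds. On Linux this is backed by a
 * hardware clock source, unlike an empty counting loop whose duration
 * changes with CPU speed, compiler optimization, and cache behavior. */
static int64_t mono_ns(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (int64_t)ts.tv_sec * 1000000000 + ts.tv_nsec;
}

/* Busy-wait for 'us' microseconds. Holds the CPU the whole time, so keep
 * 'us' small (the slide suggests no more than about 100 usec). */
static void delay_us(unsigned us)
{
    const int64_t end = mono_ns() + (int64_t)us * 1000;
    while (mono_ns() < end)
        ; /* spin on the timer, not on a loop count */
}
```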

Sleep, Pause, and Clock Drift • A Pause() command is not a standard Linux or WinCE call • Unfortunately, very few RTOSes provide it, so this is not a deficiency unique to these OSes • Your platform provider or kernel might have added such an API; otherwise, build your own based on one of the system’s hardware timers
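On Linux, one way to build such an API is POSIX clock_nanosleep() with the TIMER_ABSTIME flag, which waits until an absolute time rather than for a duration. The wrapper name pause_until() is hypothetical.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <time.h>

/* Block until the absolute CLOCK_MONOTONIC time 'deadline'. Because the
 * deadline is absolute, being preempted between computing it and calling
 * this function does not push the wake-up later.
 * Returns 0 on success or an error number (clock_nanosleep reports errors
 * through its return value, not errno). */
static int pause_until(const struct timespec *deadline)
{
    int rc;
    do {
        rc = clock_nanosleep(CLOCK_MONOTONIC, TIMER_ABSTIME, deadline, NULL);
    } while (rc == EINTR); /* woken by a signal: sleep again to the same deadline */
    return rc;
}
```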

Processes vs. Threads • In general, you can create your entire real-time system as a single process • Create multiple threads within that process • Since all threads have same address space, mechanisms for IPC are simpler, have less overhead, and are more predictable • This RT process and all threads have high priority • All non-real-time applications are in other processes, and should have lower priority than any of the real-time threads • E.g. WinCE reserves a number of high-priority values specifically for this.

Use of Pause for Periodic Threads • Structure of a periodic thread: nextstart = now() + period; while (1) { wait until nextstart; do stuff; nextstart += period; } • This establishes an accurate time base • Even if the thread is preempted or has variable execution time, each start time stays precisely synchronized relative to its starting point
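The pseudocode above maps onto Linux roughly as follows, assuming CLOCK_MONOTONIC and clock_nanosleep() as the Pause mechanism. The loop runs a fixed number of cycles only so that it terminates; demo_work() and the other names are placeholders.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <time.h>

#define NSEC_PER_SEC 1000000000L

/* Add 'ns' nanoseconds to a timespec, keeping tv_nsec normalized. */
static void timespec_add_ns(struct timespec *t, long ns)
{
    t->tv_nsec += ns;
    while (t->tv_nsec >= NSEC_PER_SEC) {
        t->tv_nsec -= NSEC_PER_SEC;
        t->tv_sec++;
    }
}

/* Placeholder workload for the example. */
static int demo_count;
static void demo_work(void) { demo_count++; }

/* Run 'work' every 'period_ns' for 'cycles' iterations. Each start time is
 * derived from the previous *scheduled* start, not from "now", so
 * execution-time jitter and preemption do not accumulate as drift. */
static void run_periodic(long period_ns, int cycles, void (*work)(void))
{
    struct timespec next;
    clock_gettime(CLOCK_MONOTONIC, &next);
    timespec_add_ns(&next, period_ns);              /* nextstart = now() + period */
    for (int i = 0; i < cycles; i++) {              /* while (1) in a real thread */
        while (clock_nanosleep(CLOCK_MONOTONIC, TIMER_ABSTIME, &next, NULL) == EINTR)
            ;                                       /* wait until nextstart */
        work();                                     /* do stuff */
        timespec_add_ns(&next, period_ns);          /* nextstart += period */
    }
}
```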

Implementation of Pause Command • The wait itself can be a standard OS call, e.g. • sem_wait() in Linux • WaitForSingleObject() in WinCE • Method 1: Timer Interrupt • Configure a hardware timer to generate a periodic interrupt, whose handler then signals the event • If the periodic thread is tied to a remote device, a periodic GPIO interrupt from that device can serve the same purpose • Method 2: Virtual Timer • A kernel-provided mechanism that uses the system clock or a hardware timer to generate events at precise periodic intervals
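On Linux, timerfd is one kernel-provided virtual timer in the sense of Method 2: the kernel keeps the periodic schedule, and a blocking read() returns how many periods have elapsed, which also reveals missed cycles. This is a Linux-only sketch with hypothetical wrapper names; WinCE would use its own timer and event APIs.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stdint.h>
#include <sys/timerfd.h>
#include <time.h>
#include <unistd.h>

/* Create a kernel timer that expires every 'period_ns' (> 0), with the
 * first expiration one period from now. Returns an fd, or -1 on error. */
static int periodic_timer_create(long period_ns)
{
    int fd = timerfd_create(CLOCK_MONOTONIC, 0);
    if (fd < 0)
        return -1;
    struct itimerspec its;
    its.it_value.tv_sec  = period_ns / 1000000000L;
    its.it_value.tv_nsec = period_ns % 1000000000L;
    its.it_interval = its.it_value;          /* re-arm every period */
    if (timerfd_settime(fd, 0, &its, NULL) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}

/* Block until the next expiration. Returns the number of periods elapsed
 * since the previous call: 1 normally, more than 1 if this thread ran late
 * and cycles were missed, 0 on error. */
static uint64_t periodic_timer_wait(int fd)
{
    uint64_t expirations = 0;
    if (read(fd, &expirations, sizeof expirations) != (ssize_t)sizeof expirations)
        return 0;
    return expirations;
}
```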

Implementation of Pause Command • What if the ‘nextstart’ time has already passed? • A mistake is to try to “catch up” by running the missed cycles as soon as possible afterwards • If the time has passed, the system is already overloaded; any attempt to catch up overloads it further, eventually breaking real-time performance and, in some cases, bringing the system to a grinding halt • Instead, accept that the previous cycle of execution was late, and skip one or more cycles: while (nextstart < now()) nextstart += period;
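The skip-cycles rule above, as a small C function over nanosecond timestamps. The function name is hypothetical, and returning the skip count is an addition so that the caller can log the overload or apply a soft real-time correction.

```c
#include <assert.h>
#include <stdint.h>

/* Advance a late 'nextstart' past 'now' in whole periods, dropping the
 * missed cycles instead of running them back-to-back. Times are in
 * nanoseconds on a monotonic clock. Returns the number of periods skipped. */
static int64_t skip_missed_cycles(int64_t *nextstart, int64_t period, int64_t now)
{
    int64_t skipped = 0;
    assert(period > 0);
    while (*nextstart < now) {   /* while (nextstart < now())  */
        *nextstart += period;    /*     nextstart += period;    */
        skipped++;
    }
    return skipped;
}
```

The loop keeps the new start time phase-aligned with the original schedule, so after an overload the thread resumes on the same time grid rather than a shifted one.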

Terminology • What is the effect of missing or skipping cycles? • That depends completely on design and implementation • Hard Real-Time • Timing requirements must be met precisely • Failure to meet them leads to significant failure: • catastrophic (system is destroyed) • damage to environment or injury to people • economic (product fails in the marketplace) • Soft Real-Time • There is some flexibility in the timing constraints • End-user requirements and quality of service are still met, even if some low-level timing elements are not perfect

Hard or Soft? Are these applications Hard or Soft real-time? • Flight Control System • Bottling Plant Production Line • Anti-Lock Braking System • DVD Playback • Airline Reservation System • Internet Video • Cellular Phone RF Reception • [Slide graphic: each application placed on a Soft-to-Hard scale, in the presenter's opinion; any surprises?]

Hard Real-Time Threads • Must NEVER be late for the next cycle • If one is late, the system has failed; it is time for emergency error handling • If ignoring the fact that a thread is late still allows the system to operate acceptably, then it is NOT a hard real-time thread; instead, use soft real-time techniques to manage it • Follow the guidelines in this presentation to ensure a hard real-time thread always executes as needed • Most systems that are called “hard” have only one or two critical hard real-time threads • It is much easier to guarantee that those one or two threads always meet their timing requirements than to guarantee hard real-time for all threads

Soft Real-Time Threads • Recognize that a timing constraint was missed, and take explicit action to correct for the error • For example: • if video streaming, that is the time to drop a frame • if a control algorithm, use the results from the previous cycle twice in a row; most control algorithms and analog sensor readings are robust enough to handle this • if buffers are present for data or messages, process a limited number of items, and keep the rest for subsequent cycles
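The third strategy (process a limited number of items, keep the rest) can be sketched as a bounded drain of a queue, so that a backlog cannot make a single cycle overrun. The queue type, capacity, and handle_item() placeholder are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define QCAP 64   /* hypothetical queue capacity */

/* Simple FIFO of pending items (a hypothetical int message type). */
struct queue { int items[QCAP]; size_t head, count; };

static int queue_push(struct queue *q, int v)
{
    if (q->count == QCAP)
        return -1;                                 /* full */
    q->items[(q->head + q->count) % QCAP] = v;
    q->count++;
    return 0;
}

/* Placeholder for the real per-item work. */
static long handled_sum;
static void handle_item(int v) { handled_sum += v; }

/* One soft real-time cycle: handle at most 'max_items' messages, leaving
 * the rest queued for subsequent cycles. Returns how many were handled. */
static size_t process_bounded(struct queue *q, size_t max_items, void (*handle)(int))
{
    size_t n = 0;
    while (n < max_items && q->count > 0) {
        handle(q->items[q->head]);
        q->head = (q->head + 1) % QCAP;
        q->count--;
        n++;
    }
    return n;
}
```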

ISRs and ISTs • ISR: Interrupt Service Routine • High-priority interrupt, unscheduled • General rule is to process it and return as soon as possible • IST: Interrupt Service Thread • Scheduled actions for most interrupts • Priorities can be managed

ISRs and ISTs: Bad implementation (breaks real-time performance) • The following is a very common flaw • It will usually break the ability to achieve real-time performance on a WinCE or Linux platform • ISR: • disable interrupts • decide which IST to run • signal IST • IST: • process the interrupt • re-enable interrupts • Result: interrupts stay disabled from the ISR until the IST finishes, which can be a long time if the IST is preempted

ISRs and ISTs: Good implementation for real-time performance • ISR • disable interrupts • if “quick” operation (e.g. < 100 usec) • perform the full operation • else • mask out the individual interrupt only • signal IST • enable interrupts • IST • perform longer operations • unmask individual interrupt
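The good ISR/IST split can be exercised in user space by simulating the interrupt controller. In the sketch below, mask_irq()/unmask_irq(), the mask register, and the quick/long decision are hypothetical stand-ins. On real hardware, the global disable/enable around the ISR is typically done by the CPU or OS, and the IST is a driver thread rather than a function call.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <semaphore.h>
#include <stdint.h>

/* Hypothetical interrupt controller, simulated as a mask register. */
static volatile uint32_t irq_mask;                  /* bit set = interrupt masked */
static void mask_irq(unsigned irq)   { irq_mask |=  (1u << irq); }
static void unmask_irq(unsigned irq) { irq_mask &= ~(1u << irq); }

static sem_t ist_event;        /* ISR -> IST signal */
static unsigned pending_irq;   /* single slot; a real system tracks each IRQ */
static int quick_count, long_count;

/* ISR: finish quick work immediately; otherwise mask only this interrupt,
 * hand off to the IST, and return so other interrupts stay enabled. */
static void isr(unsigned irq, int is_quick)
{
    if (is_quick) {
        quick_count++;         /* e.g. copy one byte into a buffer */
        return;
    }
    mask_irq(irq);
    pending_irq = irq;
    sem_post(&ist_event);
}

/* One pass of the IST loop: wait for the signal, do the long work at the
 * IST's scheduled priority, then unmask the individual interrupt. */
static void ist_service_once(void)
{
    sem_wait(&ist_event);
    long_count++;              /* e.g. parse and dispatch a full packet */
    unmask_irq(pending_irq);
}
```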

ISRs and ISTs: Good implementation example: Serial Driver • ISR • Handles individual bytes arriving over the serial link • It takes only a few microseconds per byte; there is no need to signal the IST for that • IST • When a complete packet arrives, or the buffer reaches a critical threshold, the IST is signaled to perform more extensive processing
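The serial ISR's per-byte work and its decision to signal the IST might look like this. The buffer size, threshold, and newline packet delimiter are hypothetical. In a real driver, the buffer shared with the IST would itself need a lock-free discipline such as the circular buffer described in the streaming slides.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define RX_BUF_SIZE   256
#define RX_THRESHOLD  192    /* hypothetical "critical" fill level */
#define PKT_END       '\n'   /* hypothetical end-of-packet delimiter */

struct rx_state {
    uint8_t buf[RX_BUF_SIZE];
    size_t  len;
};

/* Per-byte ISR work: store the byte (a few microseconds) and report whether
 * the IST should be signaled, i.e. a full packet has arrived or the buffer
 * has crossed its threshold. Bytes arriving while the buffer is full are
 * dropped rather than blocking in the ISR. */
static bool serial_rx_isr(struct rx_state *rx, uint8_t byte)
{
    if (rx->len < RX_BUF_SIZE)
        rx->buf[rx->len++] = byte;
    return byte == PKT_END || rx->len >= RX_THRESHOLD;
}
```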

ISRs and ISTs: Good implementation: Why is it better? • Reduces overhead • Especially if interrupts are happening rapidly • Less frequent processing • Often, interrupts arrive much faster than the data actually needs to be processed • By processing data less frequently, the IST's priority can be lowered, making it easier to schedule • ISR remains at high priority • Critical events don’t get missed • Interrupts are never disabled for extended periods • This avoids priority inversion, and allows other real-time threads to execute when they need to

Real-Time Data Streaming • Multimedia Environments • Data streaming is often a key feature in embedded systems that use an OS such as WinCE or Linux • Data streaming can be real-time if important precautions are taken: • Selecting proper buffer management algorithms • Sizing buffers correctly • Choosing appropriate priorities for producers and consumers • Determining good rates for periodic execution • Transferring data periodically or in fixed sizes

Real-Time Data Streaming: Buffer Management • [Diagram: Device Thread → Buffer → Comm Thread] • Typical structure: • The device thread is harder real-time than the communication (comm) thread • It executes more frequently, with smaller data chunks • The comm thread often drives a non-real-time medium, e.g. Ethernet, WiFi, Serial, RF radio, USB, etc.

Real-Time Data Streaming: Buffer Management • [Diagram: Device Thread → Buffer → Comm Thread] • Can real-time be achieved? • Yes, if on average the comm link's throughput is greater than the device thread's data production rate

Real-Time Data Streaming: Buffer Management • [Diagram: Comm Thread → Buffer → Device Thread] • The reverse direction can also be real-time • If the comm thread can retrieve data from the non-real-time link fast enough to keep the buffer non-empty

Real-Time Data Streaming: Important design parameters • Buffer Locking • Never lock the buffer • Use a circular buffer in which only the producer writes the insert pointer, so the producer never blocks • If the buffer fills up, don’t block waiting for it to clear; drop the data instead • If data cannot be dropped, a larger buffer and/or a faster consumer is needed to meet real-time requirements
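A single-producer/single-consumer circular buffer along these lines can be written with C11 atomics: only the producer advances the insert index, only the consumer advances the remove index, and a full buffer drops data instead of blocking. This is a minimal sketch with hypothetical names and a deliberately tiny size; it is only safe with exactly one producer and one consumer.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define RING_SIZE 8    /* must be a power of two */

/* Indices run freely and wrap naturally; 'head - tail' is the fill level. */
struct ring {
    uint8_t data[RING_SIZE];
    _Atomic size_t head;      /* next slot to write (written only by producer) */
    _Atomic size_t tail;      /* next slot to read (written only by consumer) */
    _Atomic size_t dropped;   /* bytes discarded because the ring was full */
};

/* Producer: never blocks; if the ring is full, the byte is dropped. */
static bool ring_put(struct ring *r, uint8_t v)
{
    size_t head = atomic_load_explicit(&r->head, memory_order_relaxed);
    size_t tail = atomic_load_explicit(&r->tail, memory_order_acquire);
    if (head - tail == RING_SIZE) {
        atomic_fetch_add_explicit(&r->dropped, 1, memory_order_relaxed);
        return false;
    }
    r->data[head % RING_SIZE] = v;
    /* Release: the data write becomes visible before the new head. */
    atomic_store_explicit(&r->head, head + 1, memory_order_release);
    return true;
}

/* Consumer: never blocks; returns false if the ring is empty. */
static bool ring_get(struct ring *r, uint8_t *out)
{
    size_t tail = atomic_load_explicit(&r->tail, memory_order_relaxed);
    size_t head = atomic_load_explicit(&r->head, memory_order_acquire);
    if (head == tail)
        return false;
    *out = r->data[tail % RING_SIZE];
    atomic_store_explicit(&r->tail, tail + 1, memory_order_release);
    return true;
}
```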

Real-Time Data Streaming: Important design parameters • Buffer Size • Compute the maximum amount of data per second that can be produced • Determine the worst-case latency for data on the consuming end • Ensure the buffer is large enough to hold all data produced during that worst-case latency
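The sizing rule is a single multiplication: buffer bytes ≥ maximum production rate × worst-case consumer latency. The helper below rounds up; its name and the figures in the test are hypothetical, and a real design would add margin on top.

```c
#include <assert.h>
#include <stdint.h>

/* Minimum buffer size in bytes: the most data the producer can generate
 * during the consumer's worst-case latency, rounded up to a whole byte. */
static uint64_t min_buffer_bytes(uint64_t max_bytes_per_sec, uint64_t worst_latency_us)
{
    return (max_bytes_per_sec * worst_latency_us + 999999) / 1000000;
}
```

For example, a producer of up to 500,000 bytes/s feeding a consumer that can stall for 60 msec needs at least 30,000 bytes.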

Real-Time Data Streaming: Important design parameters • Device Thread • The device thread will generally execute at a fixed rate, such as once per frame of data • Due to compression or other variable parameters, the amount of data might differ on each iteration • This thread generally has a high priority • Comm Thread • The comm thread can send data as it receives it, but with maximum thresholds set for both time and data size • The frequency of this thread should be capped, so that it is never allowed to run too fast • Best illustrated by example • Example follows
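One way to express the comm thread's thresholds, modeled loosely on the Ethernet thread in the case study that follows (4 KByte packets, at most one every 3 msec). The max_wait_us partial-flush timeout, which approximates data age by time since the last send, and all names are assumptions, not taken from the slides.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical send policy for the comm thread. */
struct send_policy {
    uint32_t packet_bytes;      /* size threshold, e.g. 4096 */
    uint32_t min_interval_us;   /* rate cap, e.g. 3000 */
    uint32_t max_wait_us;       /* time threshold: flush partial data after this */
};

/* Decide whether to send now: never faster than the rate cap; send a full
 * packet as soon as one is available, or a partial one once it has waited
 * max_wait_us since the last send. */
static bool should_send(const struct send_policy *p, uint32_t bytes_avail,
                        uint64_t now_us, uint64_t last_send_us)
{
    uint64_t since = now_us - last_send_us;
    if (bytes_avail == 0 || since < p->min_interval_us)
        return false;                       /* nothing to send, or rate cap */
    return bytes_avail >= p->packet_bytes || since >= p->max_wait_us;
}
```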

Real-Time Data Streaming: Case study • [Diagram: USB Driver Thread → Raw Kernel-Space Buffer → Device Thread → Raw User-Space Buffer → CODEC Thread → Compressed Data Buffer → Ethernet Driver Thread] • Successful streaming of ~5 Mbps from a USB data acquisition device to a PC via Ethernet • CPU: PXA270, 520 MHz, 128 MB RAM • Thread priorities set via RMA approximation

Real-Time Data Streaming: Case study • USB Driver Thread • Priority 80 (0=highest, 255=lowest) • Data read was 400 bytes per half millisecond • Rate = 2000 Hz • Execution time approx. 140 usec (28% CPU)

Real-Time Data Streaming: Case study • Raw Kernel-Space Buffer • Size: 24 KBytes • Sufficient to hold 60 msec of data

Real-Time Data Streaming: Case study • Device Thread • Reads data into the user-space buffer • Executes every 20 msec; priority 100 • Execution time approx. 4 msec (20% CPU) • Can miss two cycles and still not lose data, because of the size of the raw kernel buffer • Execution time rises to 6 msec (30%) for the iteration after a skipped cycle, when twice as much data is processed

Real-Time Data Streaming: Case study • Raw User-Space Buffer • 1 MByte • Can hold up to 2.5 seconds of data

Real-Time Data Streaming: Case study • CODEC Thread • Executes every 200 msec; priority 120 • Processes up to 250 KBytes of data per iteration in under 35 msec (17% CPU) • Since data is produced at 160 KBytes per 200 msec, even if a cycle is missed, this thread can catch up within 2 iterations

Real-Time Data Streaming: Case study • Compressed Data Buffer • 1 MByte • Depending on compression ratio, can hold 3 to 10 seconds of data

Real-Time Data Streaming: Case study • Ethernet Driver Thread • Sends a 4 KByte packet whenever that much data is available, but never more than 1 packet every 3 msec • Hence, by RMA, priority = 90: its fastest rate (once per 3 msec) lies between the USB thread's (every 0.5 msec) and the device thread's (every 20 msec) • When the thread is “keeping up,” it sends 1 packet every 5 msec • Execution time approx. 800 usec every 5 msec (16%)

Real-Time Data Streaming: Case study • Ethernet Driver Thread (continued) • If network glitches cause the thread to fall behind, it slowly catches up by executing at 4 KBytes every 3 msec • The size of the buffer allows the thread to recover from second-long outages, as long as they don’t occur too often

Real-Time Data Streaming: Case study summary • Utilization approximately 81% in the worst case • Priorities set via RMA • Even though some periods or execution times were approximate, following the general rule “more frequent means higher priority” achieved real-time performance • USB thread was hard real-time • Had the highest priority in the system • Device app and CODEC threads were soft real-time • Buffering allowed them to fall behind without losing data • Execution time was budgeted so that each has the opportunity to catch up a bit on each cycle if it falls behind; however, there is no need to fully catch up on each iteration • In theory, the Ethernet thread was non-real-time • But its design allowed it to function as soft real-time • A large enough buffer allows for smoothing even if there are significant network glitches • In case of extensive network delays, the buffer would fill up, and the producer would drop data
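As a cross-check on the summary, summing the nominal per-thread figures from the case study slides gives about 81.5% utilization. The table is ordered by period, which is also the RMA priority order used (80, 90, 100, 120); the code is only that arithmetic, with hypothetical names.

```c
#include <assert.h>

/* Case-study figures from the slides: period and approximate execution time. */
struct load { const char *thread; double period_ms, exec_ms; };

static const struct load case_study[] = {
    { "USB driver",        0.5,  0.14 },  /* 140 usec every 0.5 msec, priority 80 */
    { "Ethernet driver",   5.0,  0.8  },  /* 800 usec every 5 msec, priority 90 */
    { "Device",           20.0,  4.0  },  /* 4 msec every 20 msec, priority 100 */
    { "CODEC",           200.0, 35.0  },  /* < 35 msec every 200 msec, priority 120 */
};

/* Sum of exec/period: the fraction of the CPU the threads need. */
static double total_utilization(const struct load *l, int n)
{
    double u = 0.0;
    for (int i = 0; i < n; i++)
        u += l[i].exec_ms / l[i].period_ms;
    return u;
}
```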

Interfacing Hard and Soft Real-Time Threads • The Ethernet thread in the case study is an example • The hard RT thread must have higher priority • Since we want to use RMA, this also means the hard real-time thread must execute at a faster rate • Soft RT threads require buffering or another mechanism that allows them to miss cycles and tolerate the glitches that could happen • A hard RT thread must never block on a lock held by a soft RT thread
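One way for a hard RT thread to read data owned by a soft RT thread without ever blocking is pthread_mutex_trylock(): if the lock is busy, the hard thread keeps using its last copy for this cycle. The struct and function names are hypothetical; a lock-free buffer as in the streaming slides is another option.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Parameters written by a soft real-time thread, read by a hard one. */
struct params { double gain; double offset; };

static pthread_mutex_t params_lock = PTHREAD_MUTEX_INITIALIZER;
static struct params shared_params = { 1.0, 0.0 };

/* Soft RT side: may wait for the lock, since it is allowed to be late. */
static void soft_update_params(struct params p)
{
    pthread_mutex_lock(&params_lock);
    shared_params = p;
    pthread_mutex_unlock(&params_lock);
}

/* Hard RT side: never blocks. If the lock is busy, keep using the last
 * copy in 'cached' for this cycle. Returns true if the copy was refreshed. */
static bool hard_refresh_params(struct params *cached)
{
    if (pthread_mutex_trylock(&params_lock) != 0)
        return false;                 /* busy: do not wait */
    *cached = shared_params;
    pthread_mutex_unlock(&params_lock);
    return true;
}
```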

Hard Real-Time Using a Co-Processor • Useful for User-Oriented Embedded Systems • Industrial Control User Panel • Automated Teller Machine • Gas Pump • Vehicle Navigation System • Low-level functions need hard real-time • E.g. reading specific sensors at high speed, or controlling analog and digital outputs • Choose Linux or WinCE • for graphics, connectivity, and storage management

Hard Real-Time Using a Co-Processor • Trying to mix hard real-time threads executing in the kHz range with user-interface commands is possible, but creates an unnecessarily complex architecture • Instead, off-load the real-time threads to a small microcontroller • E.g. MSP430, PIC, 8051, Z180, etc. • Communicate with the chip serially, e.g. SPI

Hard Real-Time Using a Microcontroller • SPI driver designed as hard real-time, similar to the USB driver in the previous case study • [Diagram: CE or Linux embedded system (display/touchscreen, USB, network) ↔ SPI ↔ microcontroller ↔ hard real-time device]

Summary • Linux and WinCE can be used in real-time systems • Both soft and hard real-time are possible • The same care is needed whether you use CE, Linux, or any other RTOS • Without proper design, other RTOSes would fail the same way • Use of non-real-time mechanisms is fine if the system is integrated correctly
