
Applying a Flexible Middleware Scheduling Framework to Optimize Distributed RT/Embedded Systems


Presentation Transcript


  1. Applying a Flexible Middleware Scheduling Framework to Optimize Distributed RT/Embedded Systems
  Christopher D. Gill
  Center for Distributed Object Computing, Department of Computer Science
  Washington University, St. Louis, MO
  cdgill@cs.wustl.edu
  Friday, March 23, 2001

  2. Primary Contributions
  • Flexible Middleware Scheduling Framework
    • Gill, Levine, Schmidt, "The Design and Performance of a Real-Time CORBA Scheduling Service", Real-Time Systems, 20:2, March 2001
    • Early use: ASFD, ASTD I demonstrations (AFRL, Boeing), August and December 1999
    • TAO, WSOA, SEC, ASTD II Bold Stroke infrastructure (AFRL, DARPA, Boeing) – now
    • Release as a separate open-source framework, Kokyu – summer 2001
  • Customized Scheduling Strategies and Optimizations for Multi-Layer Integrated Middleware Resource Management
    • Doerr, Venturella, Jha, Gill, and Schmidt, "Adaptive Scheduling for Real-time, Embedded Information Systems", DASC, October 1999
    • Loyall, Gossett, Gill, et al., "Comparing and Contrasting Adaptive Middleware Support in Wide-Area and Embedded Distributed Object Applications", ICDCS, April 2001
    • Gill, Cytron, Schmidt, "Middleware Scheduling Optimization Techniques for Distributed Real-Time and Embedded Systems", submitted to the Workshop on Optimization of Middleware and Distributed Systems (ACM SIGPLAN), June 2001
    • WSOA, SEC, ASTD II Bold Stroke infrastructure (Boeing) – now
    • TAO, Kokyu – summer 2001
  • Real-Time Adaptive Metrics and Visualization Infrastructure
    • Gill, Levine, O'Ryan, and Schmidt, "Distributed Object Visualization for Sensor-Driven Systems", DASC, October 1999
    • Gill and Levine, "Quality of Service Management for Real-Time Embedded Information Systems", DASC, October 2000
    • WSOA, SEC, ASTD II Bold Stroke infrastructure (Boeing) – now
    • TAO, Kokyu – summer 2001

  3. Middleware Scheduling for Mission-Critical Distributed Real-Time & Embedded (DRE) Systems
  • Historically, mission-critical DRE applications were built directly atop the hardware and OS
    • Tedious, error-prone, and costly over system lifecycles
  • COTS middleware (e.g., ACE+TAO) is being leveraged to lower cost and cycle times in real-world mission-critical DRE applications:
    • Avionics mission computing (Boeing)
    • Medical information systems (Siemens Med)
    • Satellite control (LMCO COMSAT)
    • Telecommunications (Motorola, Lucent)
    • Missile systems (LMCO Sanders)
    • Steel manufacturing (Siemens ATD)
    • Beverage bottling lines (Krones AG)
  • However, previous-generation COTS middleware limits design choices, e.g.:
    • Customized scheduling policies
    • "Back-end" dispatching optimizations
    • Heterogeneous end-to-end dispatching
    • Optimized integration with other key technologies, e.g., RT-ARM, QuO, TNA
  • Historically, COTS middleware also couples functional aspects with QoS aspects
    • e.g., due to a lack of "hooks"

  4. Research Context: Avionics Mission Computing
  • Bold Stroke middleware infrastructure platform
    • ASFD, ASTD I, WSOA, ASTD II
    • Target: transition to production systems
  • Operations are well defined
    • Harmonic rate sets, bounded execution times; some operations critical, some non-critical
    • Need criticality isolation assurances
  • Event mediated
    • RT-enhanced TAO Event Channel
    • Distributed precedence DAG, with a scheduler per endsystem sub-graph
  • Previous-generation systems
    • Fixed environment, static modes
    • Used cyclic or RMS scheduling
  • Next-generation systems
    • Highly variable environment
    • Very large number of system states – need support for fine-grain dynamic modes
    • Need dynamic and adaptive approaches to resource management
    • Need coordinated closed-loop control of QoS across time-scales and system layers
    • ACE+TAO, QuO, RT-ARM

  5. Kokyu: a Flexible Middleware Scheduling and Dispatching Framework
  [Diagram: the Scheduler propagates rates and WCETs over the operation sub-graph via tuple and operation visitors, applies a rate and priority assignment policy (e.g., static RMS for mandatory operations, LLF for optional ones), and emits a dispatching configuration of static, laxity, and timer lanes to the Dispatcher.]
  • The application specifies operation characteristics
    • E.g., criticality, periods, dependencies
  • The scheduler assigns and stores periods and priorities per topology and scheduling policy
    • Defines the necessary dispatching configuration
    • Implicit projection of a particular scheduling policy into a generic dispatching infrastructure
  • The dispatcher is (re)configurable
    • Multiple priority lanes
    • Queue, thread, and timers per lane
    • Starts repetitive timers once
    • Looks up the lane on each arrival
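The lane-lookup idea above can be sketched as a small (hypothetical) dispatcher-configuration interface; these class and function names are illustrative stand-ins, not the actual Kokyu API:

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <vector>

// Each lane pairs an OS thread priority with a queue ordering discipline,
// as emitted by the scheduler in its dispatching configuration.
enum Discipline { STATIC, DEADLINE, LAXITY };

struct LaneConfig {
  int os_priority;        // OS thread priority for this lane
  Discipline discipline;  // queue ordering discipline for this lane
};

class Dispatcher {
 public:
  // (Re)configure: conceptually one queue + thread per lane,
  // keyed here by the assigned OS priority.
  void configure(const std::vector<LaneConfig>& lanes) {
    lanes_.clear();
    for (const LaneConfig& l : lanes) lanes_[l.os_priority] = l;
  }
  // Look up the lane on each operation arrival.
  Discipline lane_discipline(int os_priority) const {
    return lanes_.at(os_priority).discipline;
  }
  std::size_t lane_count() const { return lanes_.size(); }

 private:
  std::map<int, LaneConfig> lanes_;
};
```

An RMS+LLF configuration would then be, e.g., `configure({{10, STATIC}, {5, LAXITY}})`: a higher-priority static lane for mandatory operations and a lower-priority laxity lane for optional ones.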

  6. Empirical Costs of Dispatching Primitives
  • Three canonical queue ordering disciplines (simple test with queue classes)
    • Static: fixed sub-priority
    • Deadline: time to deadline
    • Laxity: time to deadline – WCET
  • Messages in dynamic queues age monotonically – cancellation is only needed at the head of the queue
  • Basic empirical comparison
    • Randomly ordered enqueues
    • Average enqueue and dequeue times for deadline were slightly better than for laxity, and much worse than for static
    • A simple result, but useful to guide the design of, and experiments with, richer composite strategies
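The three disciplines can be sketched as comparators over a dispatch-entry record (the record layout and names here are illustrative assumptions, not the measured queue classes):

```cpp
#include <cassert>
#include <queue>
#include <vector>

// Hypothetical dispatch-entry record.
struct Entry {
  long static_subpriority;  // fixed, assigned by the scheduler
  long deadline;            // absolute deadline (e.g., usec)
  long wcet;                // worst-case execution time
};

// Static discipline: order by fixed sub-priority (lower value dispatches first).
struct ByStatic {
  bool operator()(const Entry& a, const Entry& b) const {
    return a.static_subpriority > b.static_subpriority;  // min-heap ordering
  }
};

// Deadline discipline (EDF-style): order by time to deadline.
struct ByDeadline {
  bool operator()(const Entry& a, const Entry& b) const {
    return a.deadline > b.deadline;
  }
};

// Laxity discipline (LLF-style): order by deadline minus WCET.
struct ByLaxity {
  bool operator()(const Entry& a, const Entry& b) const {
    return (a.deadline - a.wcet) > (b.deadline - b.wcet);
  }
};

// Convenience: the head a laxity-ordered queue would dispatch next.
Entry dispatch_head(std::vector<Entry> entries) {
  std::priority_queue<Entry, std::vector<Entry>, ByLaxity> q(
      ByLaxity(), std::move(entries));
  return q.top();
}
```

Note how laxity can invert the deadline order: an entry with a later deadline but a large WCET may have less slack and so dispatch first.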

  7. Domain-Tailored Scheduling Heuristics
  [Diagram: a dispatching configuration of static and laxity lanes, with mandatory, RT-ARM, and optional operations mapped into them.]
  • Abstract mapping: operation characteristics → OS and middleware primitives
    • Generalizes well-known monolithic and composite heuristics, e.g., RMS, EDF, LLF, MUF, RMS+LLF
    • Supports arbitrary composition
    • Also allows strategized, factory-driven dispatching module (re)configuration at run-time (work in progress)
  • Empirical and a priori information guides the design of a domain-specific heuristic
  • The desired heuristic depends on application-specific details – e.g., scheduling RT-ARM operations for timely adaptive transitions without impacting mandatory operations:
    • RMS {mandatory, ARM} + LLF {optional}, if feasible (lower mandatory overhead)
    • LLF {mandatory, ARM} + LLF {optional} (MUF), if feasible (non-harmonic rates)
    • RMS {mandatory} + LLF {ARM} + LLF {optional}, if {mandatory, ARM} is infeasible and the ARM operations cannot be separated
    • RMS {mandatory, ARMm} + LLF {ARMo} + LLF {optional}
    • LLF {mandatory, ARMm} + LLF {ARMo} + LLF {optional}
    • RMS {mandatory, ARMm} + LLF {ARMo, optional}
    • LLF {mandatory, ARMm} + LLF {ARMo, optional}
  • A cancellation discipline for futile operations may help optimize the lower partitions

  8. Domain-Tailored Scheduling Example
  • Preserve the invariant, but optimize
    • Given: RT-ARM operations are separable into RT-ARMm and RT-ARMo
    • Given: criticality values express deadline isolation partitions
    • Definition: the system is schedulable if the highest partition is feasible
    • Invariant: no lower partition makes a feasible higher partition infeasible
  • Precise invariant strength is key:
    • e.g., a 1:1 criticality-to-priority mapping over-constrains the problem
    • e.g., a 1:1 rate-to-priority mapping (RMS) both over- and under-constrains the problem
  • Want invariant-preserving optimizations, e.g., RMS {mandatory, ARMm} + LLF {ARMo, optional}
  • Decision lattice among distinct mappings from criticality partitions into priority partitions; example based on one criticality partition: {mandatory}, {ARMm}, {ARMo}, {optional}
  [Lattice diagram, with edges labeled by feasibility and performance trade-offs:
    RMS {mandatory, ARM, optional} / LLF {mandatory, ARM, optional}
    RMS {mandatory, ARM} + LLF {optional} / LLF {mandatory, ARM} + LLF {optional}
    RMS {mandatory, ARMm} + LLF {ARMo, optional} / LLF {mandatory, ARMm} + LLF {ARMo, optional}
    RMS {mandatory} + LLF {ARM, optional} / LLF {mandatory} + LLF {ARM, optional}]
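The invariant can be illustrated with a deliberately simple utilization-based feasibility test (a stand-in; the actual scheduler uses richer analysis). Partitions are ordered highest criticality first, each with a CPU demand in percent; a lower partition is admitted only if it plus all higher partitions stay within the bound, so it can never make a higher partition infeasible:

```cpp
#include <cassert>
#include <vector>

// Returns how many partitions are admitted, scanning from the highest
// criticality partition downward.
int admit_partitions(const std::vector<int>& util_percent, int bound_percent) {
  int total = 0;
  int admitted = 0;
  for (int u : util_percent) {
    if (total + u > bound_percent) break;  // would exceed the bound: stop
    total += u;
    ++admitted;
  }
  return admitted;
}

// "System schedulable" = the highest (most critical) partition is feasible,
// matching the slide's definition.
bool system_schedulable(const std::vector<int>& util_percent,
                        int bound_percent) {
  return !util_percent.empty() && util_percent[0] <= bound_percent;
}
```

With demands {40, 30, 50} and a 100% bound, the two highest partitions are admitted and the system is still schedulable even though the lowest partition is rejected.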

  9. Experiments to Quantify Hybrid/Adaptive Scheduling
  • ASFD: quantify RMS, MUF, and RMS+LLF with a realistic OFP application using The ACE ORB (TAO) on realistic hardware (AV-8B simulator demonstration, 1999); AFRL sponsored, directed by Boeing
  • ASTD I: RT-ARM/scheduler integration and adaptive scheduling scalability test (NT desktop demonstration, 1999); AFRL sponsored, directed by Boeing
  • WSOA: quantify coordinated multi-layer resource management (ground and flight demonstrations, 2001); AFRL, OS/JTF, and DARPA sponsored, directed by Boeing
  • ASTD II: integrate TNA and OFP scheduling and dispatching; AFRL sponsored, directed by Boeing
  • Boeing Fellowship → benefits to WSOA: quantify behavior across a range of scheduling heuristics; work started under a 1999-2000 Boeing Fellowship Grant, leveraged on WSOA, with new OEP experiments underway

  10. Real-Time Metrics Monitoring Framework
  [Diagram: probes in the operations and dispatcher on the embedded boards feed a metrics cache in shared memory; a metrics monitor digests the cache and streams to a remote logger, storage, and a DOVE browser (Java) on a remote workstation; a frame manager, the RT-ARM, and QuO sit alongside.]
  • Frame manager
    • A singleton manages deadlines per rate
  • Data collection
    • Probes populate a singleton C++ metrics data cache
    • The monitor periodically digests the raw data and streams it to the logger
  • Logging and visualization
    • The logger streams data to storage and viewers
    • Remote processing, e.g., on an NT workstation
  • QoS resource management
    • The RT-ARM and QuO (syscond) may query the monitor
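The probe-to-cache path can be sketched as follows; the macro, class, and record names are illustrative assumptions, not the actual Bold Stroke instrumentation:

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// One timestamped probe record, later digested by the metrics monitor.
struct MetricRecord {
  std::string op;  // operation name
  char event;      // e.g., 'S'tart or 'E'nd of a dispatch
  long timestamp;  // e.g., usec since frame start
};

class MetricsCache {
 public:
  static MetricsCache& instance() {  // singleton accessor
    static MetricsCache cache;
    return cache;
  }
  void record(const std::string& op, char event, long ts) {
    records_.push_back(MetricRecord{op, event, ts});
  }
  std::size_t size() const { return records_.size(); }
  const MetricRecord& at(std::size_t i) const { return records_.at(i); }

 private:
  std::vector<MetricRecord> records_;
};

// Inline probe: cheap enough to leave in the dispatch path.
#define METRICS_PROBE(op, event, ts) \
  MetricsCache::instance().record((op), (event), (ts))
```

Operations call `METRICS_PROBE` at dispatch boundaries; the monitor drains the singleton on its own period, keeping probe cost off the critical path.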

  11. ASFD: Measured Improvement over the Static Approach
  • One problem the flexible scheduling framework addresses: a transition from an already overloaded state (but with mandatory utilization below the bound) to an even more overloaded state in the ASFD application
  • As theory predicts, RMS showed significant degradation of mandatory operation deadline behavior under CPU overload conditions

  12. ASFD: Hybrid Scheduling Isolated Deadline Misses
  • MUF (shown) and RMS+LLF: no mandatory operation misses
    • Successful use of dynamic queue disciplines: predictable mandatory operation behavior in MUF and RMS+LLF
    • Same overload conditions in the same state
    • MUF and RMS+LLF place mandatory and optional operations in separate lanes
    • Partitioning helps isolate the effects of offered overload

  13. ASFD: Empirical Result for Domain-Specific Design
  [Plots: deadlines made by RMS+LLF optional operations vs. MUF optional operations.]
  • Expectations from theory and the basic overhead tests were confirmed: a slight increase in made deadlines under the same overload conditions, without cancellation, when moving from MUF to RMS+LLF
    • RMS+LLF: static mandatory queues, 5 threads; lower overhead and fewer messages per queue
    • MUF: laxity mandatory queue, 2 threads; higher overhead and more messages per queue

  14. ASFD: Early Results for Operation Cancellation
  [Plots: MUF optional operations without cancellation vs. MUF optional operations with cancellation.]
  • Cancellation effects in MUF (RMS+LLF was similar)
    • Optional tasks may miss deadlines
    • Cancellation reduced futile dispatches
    • However, pessimistic cancellation based on WCET also reduced the number of deadlines made (i.e., due to false cancellations)
    • Different and/or more exact execution time information (e.g., a distribution curve with WCET, ACET, BCET) is needed

  15. ASTD I: Number of Calls in an Adaptive Transition
  • Interaction between the RT-ARM and the scheduler during transitions
    • The RT-ARM iteratively proposed operation-to-rate bindings, according to a (HTC proprietary) heuristic for adaptive rate selection
    • The scheduler assessed the feasibility of each proposed binding
    • The scheduler also gave feedback on the feasibility "sensitivity" of the proposed binding with respect to the feasible bound, i.e., its sensitivity to changes in the rates of the bound set
  • The number of scheduler calls was linear in the number of destination state operations

  16. ASTD I: Average Call Time in an Adaptive Transition
  • Combined RT-ARM and scheduler behavior
    • The average time in each scheduler call was also linear in the number of destination state operations
    • The combined call count and call time curves indicate that scheduler sensitivity and feasibility assessment were quadratic in the number of destination state operations
    • O(n²) adaptation time may be fine for small numbers of operations per state, but for larger-scale applications (e.g., ASTD II, SEC, WSOA) a tighter bound is needed

  17. WSOA: Improving Adaptive Scheduling Performance
  [Diagram: the Scheduler propagates rates and WCETs over the operation sub-graph via tuple and operation visitors, under a rate and priority assignment policy.]
  • Integrated mechanisms for adaptive scheduling
    • In ASTD I, the isolation of mechanisms (RT-ARM vs. scheduler) made it O(n²) to select a feasible rate assignment
    • Merging the mechanisms in the scheduler: rate and lane assignment become O(n lg n) or O(n) sorts, with O(n) feasibility and max-rate passes
    • Modular visitor functors over the dependency graph (i.e., cycle check, rate choice, WCET and rate propagation)
    • Rate selection is strategized, like lane assignment: an arbitrary mapping from operation characteristics to selection order (may use two feasibility thresholds)
  • A similarly general and flexible approach to rate selection strategies
    • e.g., Fair Admission by Indexed Rate (FAIR): order by rate index, criticality, mean rate, operation handle – each operation gets its lowest rate, then each gets its next highest rate, and so on
    • e.g., Criticality-Biased FAIR (CB-FAIR): order by criticality, rate index, mean rate, operation handle – mandatory operations get all their rates first, then optional operations
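The FAIR idea can be sketched under a simple utilization model (the names and the percent-based feasibility test are illustrative assumptions): each operation offers an ascending list of candidate rates, each costing some CPU utilization; every operation is first admitted at its lowest rate, then rates are raised one index at a time, round-robin, while the schedule stays feasible:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct Operation {
  std::vector<int> util_percent;  // CPU % at rate index 0, 1, 2, ...
};

// Returns the chosen rate index per operation (-1 if nothing fits).
std::vector<int> fair_select(const std::vector<Operation>& ops, int bound) {
  std::vector<int> chosen(ops.size(), -1);
  int total = 0;
  bool progressed = true;
  while (progressed) {  // one round-robin sweep per iteration
    progressed = false;
    for (std::size_t i = 0; i < ops.size(); ++i) {
      int next = chosen[i] + 1;
      if (next >= static_cast<int>(ops[i].util_percent.size())) continue;
      int prev = chosen[i] >= 0 ? ops[i].util_percent[chosen[i]] : 0;
      int delta = ops[i].util_percent[next] - prev;
      if (total + delta <= bound) {  // raise this op's rate by one index
        total += delta;
        chosen[i] = next;
        progressed = true;
      }
    }
  }
  return chosen;
}
```

A CB-FAIR variant would simply sweep the mandatory operations through all their rate indices before visiting the optional ones.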

  18. WSOA: Consistent Time and Frame Management
  [Table: per-rate frame state kept by the frame manager – a frame id, start, and end for each registered rate, e.g., 50 ms frames at 20 Hz, 100 ms frames at 10 Hz, and 200 ms frames at 5 Hz.]
  • A common reference point for deadlines: the singleton frame manager
    • Rates are registered with the frame manager
    • Frame start, end, and id are kept for each rate
    • A timer-based call to update frames advances each frame appropriately
  • The dispatcher obtains the deadline using the period from the operation's RT_Info, to manage its laxity and deadline queues
  • The upcall adapter uses the cancellation deadline, again using the RT_Info rate
  • The RT-ARM and QuO can also obtain consistent deadlines from the frame manager
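The frame arithmetic behind this can be sketched in a few lines: each registered rate advances in fixed-length frames, and a dispatcher derives an operation's deadline as the end of its current frame (function names are illustrative; times are in milliseconds):

```cpp
#include <cassert>

// Frame length for a rate in Hz (e.g., 20 Hz -> 50 ms frames).
long frame_length_ms(long rate_hz) { return 1000 / rate_hz; }

// End of the frame containing time `now_ms`, i.e., the deadline that a
// deadline- or laxity-ordered queue would use for this rate.
long frame_deadline_ms(long rate_hz, long now_ms) {
  long len = frame_length_ms(rate_hz);
  return ((now_ms / len) + 1) * len;
}
```

Because every component computes deadlines from the same frame boundaries, the dispatcher, upcall adapter, RT-ARM, and QuO all agree on what "this frame's deadline" means.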

  19. WSOA: Meeting Embedded & RT Metrics Constraints
  [Diagram: probes in the dispatcher and upcall adapter record START, STOP, SUSPEND, and RESUME events (e.g., 20 Hz enqueue/dequeue, process-tile begin/end) into a shared-memory metrics cache read by the metrics monitor on the embedded boards.]
  • Shared-memory capable
    • Time and space constraints make shared memory (e.g., over VME) attractive
    • However, virtual functions, native pointers, and heap allocation are perilous in shared memory
    • Apply offset-based smart pointers and strategized allocators to support run-time cache sizing and use from any address space
  • The system is instrumented with C++ inline probe macros that write to a singleton metrics cache
  • Process-owned memory holds the upcall adapter, the frame manager, DOVE, the metrics monitor, the logger, and all non-Tie CORBA interfaces
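The offset-based smart pointer idea can be sketched minimally: instead of a raw address, store the target's offset from the pointer object itself, so the reference stays valid when a shared segment maps at different base addresses in different processes. This is a toy illustration of the technique, not the actual WSOA implementation:

```cpp
#include <cassert>
#include <cstddef>

template <typename T>
class OffsetPtr {
 public:
  OffsetPtr() : offset_(0) {}
  OffsetPtr& operator=(T* target) {
    // Store the distance from this pointer object to the target.
    offset_ = reinterpret_cast<const char*>(target) -
              reinterpret_cast<const char*>(this);
    return *this;
  }
  T* get() const {
    return reinterpret_cast<T*>(
        const_cast<char*>(reinterpret_cast<const char*>(this)) + offset_);
  }
  T& operator*() const { return *get(); }

 private:
  std::ptrdiff_t offset_;  // self-relative: identical in every mapping
};

// A record laid out as it might be inside a shared segment.
struct Cell {
  int value;
  OffsetPtr<int> ptr;  // self-relative link to `value`
};
```

Copying a `Cell` to a new address (as remapping a segment effectively does) keeps the link correct, because the stored offset is relative to the pointer object rather than to any absolute address.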

  20. ASTD II: Domain Integration at the Dispatching Level
  [Diagram: the Task Network Scheduler pushes its active task agenda through the TAO Scheduler into a Dispatcher configured for RMS+LLF; a TN Event Consumer's execute() invokes the TN task's process(...) on prioritized OS threads.]
  • Coordinated TNA+OFP operation dispatching
    • The TNS specifies operation characteristics to the scheduler
    • The TNS manages tasks and execution predicates, and processes the active agenda by strategized transfer to the dispatcher
    • The dispatcher manages priority lanes according to the specified configuration
    • A special dispatch target (in this case implemented as an Event Consumer) invokes the execute function of a special event as a functor (a.k.a. the Command design pattern) to call the TN task's process(...)
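The dispatch-target idea is the classic Command pattern: the dispatcher sees only a generic execute interface, and a task-network adapter implements it by calling the TN task's process method. Class and method names below are illustrative:

```cpp
#include <cassert>

// Generic command interface: all the dispatcher ever invokes.
class Command {
 public:
  virtual ~Command() {}
  virtual void execute() = 0;
};

// Stand-in for a task-network task.
class TNTask {
 public:
  void process() { ++calls_; }
  int calls() const { return calls_; }

 private:
  int calls_ = 0;
};

// Adapter: a dispatchable event whose execute() invokes the TN task,
// playing the role of the TN Event Consumer on the slide.
class TNEventCommand : public Command {
 public:
  explicit TNEventCommand(TNTask& task) : task_(task) {}
  void execute() override { task_.process(); }

 private:
  TNTask& task_;
};

// The dispatcher's view: run whatever Command it dequeued from a lane.
void dispatch(Command& cmd) { cmd.execute(); }
```

This keeps the dispatcher domain-neutral: TN tasks, OFP operations, or any other work can share the same priority lanes as long as each is wrapped in a Command.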

  21. Concluding Remarks
  • Empirical results show that the adaptive and hybrid scheduling approach can improve DRE real-time performance
  • Composable dispatching enables domain-specific optimizations, especially when design decisions are aided by empirical data
  • The QoS metrics framework offers diverse modes of interaction with a DRE system: resource managers, human operators, control systems
  • The OEP experiments will offer a quantitative profile of the benefits of this approach to flexible real-time scheduling in middleware
  • Open-source code: all software described here that is part of my research will be made available in the ACE_wrappers distribution – first within TAO, then as a distinct Kokyu directory (summer 2001)

  22. Future Work
  • Run-time re-configurable dispatching module + factory
    • Strategized adaptation at run-time to changing queue, thread, and timer configurations: pre-allocation, re-allocation, and caching of primitives
    • Paper in progress for submission to Middleware 2001
    • SEC and ASTD II Bold Stroke infrastructure (Boeing) – in progress
    • TAO, Kokyu – summer 2001
  • Extremely small footprint DRE middleware
    • Applying the scheduling/dispatching primitives and framework to nodes and nodelets with different resource niche scales (downward scalability)
    • "Just enough" middleware: e.g., from a Jini-like backbone to an ORB
    • Proposal for the DARPA ITO NEST Program: Open Experimental Platform
    • 2001-2005 time frame

  23. Future Work (continued)
  • Aspect-oriented techniques for middleware QoS management
    • Using aspect weaving to configure possibly heterogeneous middleware scheduling and dispatching points, at multiple layers and on multiple paths
    • Multi-dimensional resource management, i.e., memory, network, and CPU together, to apply good heuristics and domain-specific optimizations
    • Domain-specific type systems and generic programming techniques may apply
  • Coordinated multi-layer, multi-agent QoS management
    • Integrated cooperation of resource managers, schedulers, and dispatchers in the OS, middleware, and application layers
    • Infrastructure hybridization and re-factoring to leverage mechanism composition and integration, while preserving policy separation
    • Toward generalized techniques, patterns, and a theory of QoS composition in middleware, e.g., hard real-time + anytime + adaptive control + DAS + …
