
Performance Analysis of Concurrent & Distributed Real-Time Software Designs


Presentation Transcript


  1. Performance Analysis of Concurrent & Distributed Real-Time Software Designs ECEN5053 Software Engineering of Distributed Systems University of Colorado

  2. Overview • Why bother • Review of RMA • Advanced RMA • Event Sequence Analysis • Examples ECEN5053 SW Eng of Distributed Systems, Performance Analysis, Univ of Colorado

  3. Why bother? • Quantitative analysis allows for early detection of potential performance problems • Both Rate Monotonic Analysis and Event Sequence Analysis are applied to designs • At the task architecture level • Provides an early performance estimate and characterization, e.g., where the bottlenecks are ECEN5053 SW Eng of Distributed Systems, Performance Analysis, Univ of Colorado

  4. A Word About the SPE model • The SPE model (Smith and Williams) can model distributed systems or single CPU systems • Represent components whether they are software or hardware or both • Specify varying workloads ECEN5053 SW Eng of Distributed Systems, Performance Analysis, Univ of Colorado

  5. Review of RMA • Priority-based scheduling of concurrent tasks with hard deadlines • Same CPU • Can be used in environments with less rigid constraints • For example, the server role in a client/server application • Assumes a preemptive priority scheduling algorithm • Can be applied where task synchronization is required ECEN5053 SW Eng of Distributed Systems, Performance Analysis, Univ of Colorado

  6. Basic Theory RMA Review (cont. 1) • Initially • Independent periodic tasks • Do not communicate with each other • Do not synchronize with each other • Periodic task has • A period T, frequency with which it executes • An execution time C, CPU time required/period • CPU utilization of C/T • Group of tasks is schedulable if each task can meet its deadlines • Assign a fixed priority such that the shorter period has the higher priority ECEN5053 SW Eng of Distributed Systems, Performance Analysis, Univ of Colorado

  7. RMA Review (cont. 2) • A set of n independent periodic tasks scheduled by the rate monotonic algorithm will always meet its deadlines for all task phasings, if: C1/T1 + … + Cn/Tn <= n(2^(1/n) - 1) = U(n) where Ci and Ti are the execution time and period of task ti, respectively. (Note: the upper bound converges to 69% as the number of tasks approaches infinity.) U(1) = 1.000, U(2) = .828, U(3) = .779, U(4) = .756, U(5) = .743, U(6) = .734, U(7) = .728, U(8) = .724 ECEN5053 SW Eng of Distributed Systems, Performance Analysis, Univ of Colorado
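A minimal sketch of this schedulability test (the task set and its timing values are illustrative, not from the course example):

```python
# Utilization Bound Theorem (Theorem 1) check for an illustrative task set.

def utilization_bound(n):
    """Worst-case utilization bound U(n) = n(2^(1/n) - 1)."""
    return n * (2 ** (1.0 / n) - 1)

def rm_schedulable(tasks):
    """tasks: list of (C, T) pairs. True means Theorem 1 guarantees schedulability;
    False is inconclusive, so apply the Completion Time Theorem instead."""
    total_util = sum(c / t for c, t in tasks)
    return total_util <= utilization_bound(len(tasks))

tasks = [(20, 100), (30, 150), (60, 300)]       # (C, T) in ms
print(round(sum(c / t for c, t in tasks), 2))   # 0.6
print(round(utilization_bound(len(tasks)), 3))  # 0.78
print(rm_schedulable(tasks))                    # True
```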

  8. Conclusions & Assumptions • The rate monotonic algorithm is stable when there is a transient overload • A subset of the total number of tasks (highest priorities) will still meet their deadlines if the system is overloaded for a relatively short time. • Context switching overhead is included in the CPU times of the interrupting tasks • The Utilization Bound Theorem is pessimistic. If it fails, we can do a further check by applying a second theorem to get an exact determination of whether the tasks are schedulable. ECEN5053 SW Eng of Distributed Systems, Performance Analysis, Univ of Colorado

  9. Completion Time Theorem -- Thm 2 • For a set of independent periodic tasks, if each task meets its first deadline when all tasks are started at the same time, the deadlines will be met for any combination of start times. • Check the end of the first period of task ti as well as the ends of the periods of higher priority tasks that fall within it. • Remember the higher priority tasks have shorter periods • These are called scheduling points • Can be illustrated graphically with a timing diagram ECEN5053 SW Eng of Distributed Systems, Performance Analysis, Univ of Colorado
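A sketch of the scheduling-point check behind this theorem (task values are illustrative and deliberately chosen so the Theorem 1 bound is exceeded yet every task still passes the exact test):

```python
# Completion Time Theorem check at scheduling points; tasks must be listed in
# rate monotonic order (shortest period first). Task values are illustrative.
import math

def meets_first_deadline(tasks, i):
    """True if task i completes by its first deadline when all tasks start together."""
    Ti = tasks[i][1]
    # Scheduling points: multiples of the periods of task i and of every higher
    # priority task, up to and including Ti.
    points = {k * Tj for _, Tj in tasks[:i + 1] for k in range(1, int(Ti // Tj) + 1)}
    for t in sorted(points):
        demand = sum(Cj * math.ceil(t / Tj) for Cj, Tj in tasks[:i + 1])
        if demand <= t:                 # all work released before t completes by t
            return True
    return False

# Total utilization ~0.95 exceeds U(3) ~0.78, yet every task passes the exact test.
tasks = [(40, 100), (40, 150), (100, 350)]   # (C, T) in ms
print([meets_first_deadline(tasks, i) for i in range(len(tasks))])   # [True, True, True]
```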

  10. Time-annotated sequence diagram [timing diagram showing tasks t1, t2, and t3 against time in msec] ECEN5053 SW Eng of Distributed Systems, Performance Analysis, Univ of Colorado

  11. Contradictions to Basic RMA Theory • Sometimes tasks execute at actual priorities different from their rate monotonic priorities – priority inversion • For example, a lower priority task must execute its critical section at a higher priority to avoid being preempted by a higher priority task that shares the same mutually excluded resource • Support mutual exclusion • Avoid deadlock ECEN5053 SW Eng of Distributed Systems, Performance Analysis, Univ of Colorado

  12. Contradictions to Basic RMA Theory - 2 • Aperiodic tasks can be treated as periodic tasks where the worst-case inter-arrival time is its “period” • If this “period” is longer than another, it will be assigned a lower priority • Often aperiodic tasks are interrupt-driven and execute as soon as the interrupt arrives ECEN5053 SW Eng of Distributed Systems, Performance Analysis, Univ of Colorado

  13. Accounting for Priority Inversion • Extend Theorem 1 (Utilization Bound) • Four factors need to be considered to determine whether task ti can meet its first deadline • Preemption time by higher priority tasks (periods shorter than Ti), Cj/Tj for each such task • Execution time for task ti, Ci/Ti • Preemption by higher priority tasks with longer periods, that is, non-rate-monotonic priorities. • Each can only interrupt ti once (why?) • Ck is the sum of their execution times • Ck/Ti because the worst case is that it all occurs within ti’s period • Blocking time by lower priority tasks – at most once per period Ti ECEN5053 SW Eng of Distributed Systems, Performance Analysis, Univ of Colorado

  14. Generalized Utilization Bound Thm Ui = Σ (j ∈ Hn) Cj/Tj + Ci/Ti + Bi/Ti + Σ (k ∈ Hl) Ck/Ti Ui is the utilization bound during period Ti for task ti. The first term is the total preemption utilization by higher priority tasks with periods of less than ti’s (the set Hn). The second term is the CPU utilization by task ti. The third term is the worst-case blocking utilization experienced by ti. The fourth term is the total preemption utilization by higher priority tasks with longer periods than ti’s period (the set Hl). (Terms 3 and 4 are instances of priority inversion.) If Ui is less than the worst-case upper bound for U(i), this means the task ti will meet its deadline. The utilization-bound test must be applied to each task. Since rate monotonic priorities are not guaranteed, ti may meet its deadline while a higher priority task does not. ECEN5053 SW Eng of Distributed Systems, Performance Analysis, Univ of Colorado
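A sketch of this per-task test in code (the helper and its example inputs are illustrative, not from the course example; the asymptotic 0.69 bound is used as the cutoff, though the appropriate U(n) could be used instead):

```python
# Sketch of the Generalized Utilization Bound test for one task ti;
# the example numbers are illustrative.

def generalized_utilization(Ci, Ti, Bi, higher_shorter, higher_longer):
    """
    Ci, Ti         : execution time and period of task ti
    Bi             : worst-case blocking time by lower priority tasks (once per Ti)
    higher_shorter : [(Cj, Tj)] higher priority tasks with shorter periods (set Hn)
    higher_longer  : [Ck] execution times of higher priority tasks with longer
                     periods (set Hl); each preempts ti at most once per Ti
    """
    preempt_short = sum(Cj / Tj for Cj, Tj in higher_shorter)           # term 1
    return preempt_short + Ci / Ti + Bi / Ti + sum(higher_longer) / Ti  # terms 2-4

Ui = generalized_utilization(Ci=10, Ti=50, Bi=5,
                             higher_shorter=[(5, 25)], higher_longer=[2])
print(round(Ui, 2), Ui <= 0.69)   # 0.54 True against the asymptotic 0.69 bound
```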

  15. Generalized Completion Time Theorem • Assumes the worst case that all tasks are ready for execution at the start of the task ti’s period. • Draw the timing sequence diagram for all the tasks and take into account the priority inversion as well as preemption that can occur. • If each task meets its first deadline while all higher priority tasks meet all of their deadlines up to that point and all priority-inverted tasks meet their deadlines up to that point, then ti will meet its deadlines. ECEN5053 SW Eng of Distributed Systems, Performance Analysis, Univ of Colorado

  16. Task Scheduling and Design • Cautious approach at design time • Use estimates • Satisfy Thm 1, the conservative one, not just Thm 2 • If some lower priority tasks are soft real-time or non-real-time • OK to exceed the utilization bound somewhat • If it is OK to miss their deadlines/targets occasionally • At design time, can choose priorities to assign • Aim for rate monotonic priorities for periodic tasks • Assign highest priorities to interrupt-driven tasks to reflect reality • If two tasks have the same period, assign one a higher priority based on application semantics ECEN5053 SW Eng of Distributed Systems, Performance Analysis, Univ of Colorado

  17. Example of Generalized RMA • 4 tasks, t1 and t3 are periodic and t2 and ta are aperiodic • ta is interrupt-driven and must execute within 200 ms of the arrival of its interrupt or data will be lost • t2 has a worst-case interarrival time of T2. • t1 is periodic: C1 = 20; T1 = 100; U1 = 0.2 • t2 is aperiodic: C2 = 15; T2 = 150; U2 = 0.1 • ta is aperiodic, interrupt-driven: Ca = 4; Ta = 200; Ua = 0.02 • t3 is periodic: C3 = 30; T3 = 300; U3 = 0.1 • t1, t2 and t3 access a data repository protected by semaphore s. ECEN5053 SW Eng of Distributed Systems, Performance Analysis, Univ of Colorado

  18. Notes, not meant for use as slide If tasks were assigned strict rate monotonic priorities, the assignments in priority order from highest to lowest would obviously be t1, t2, ta, and t3. ta’s stringent response time tells us to give it the highest priority instead. The priority assignment becomes ta, t1, t2, and t3. Overall CPU utilization is 0.42, which is less than the worst-case utilization bound for infinity, namely 0.69. Since rate monotonic priorities are not strictly assigned, we can’t rely on the basic Theorem 1; we need to apply the extended Theorem 1 to each task individually. ta has the highest priority and is interrupt-driven, so there are no blockers. Ua is 0.02 < U(1) -- no problem meeting its deadline. (cont. next slide) ECEN5053 SW Eng of Distributed Systems, Performance Analysis, Univ of Colorado

  19. Notes 2, not meant for use as slide Consider t1. Need to consider four factors: a. Preemption time by higher priority tasks with periods less than T1. There is a higher priority task (the aperiodic ta) but not with a shorter period, so this term is zero. b. Execution time C1 for the task t1 = 20. U1 = 0.2 c. Preemption by higher priority tasks with longer periods. ta is one of these. Preemption utilization during the period T1: Ca/T1 = 4/100 = 0.04 d. Blocking time by lower priority tasks. Because of the semaphore, t2 and t3 can both potentially block t1. In the worst case, one of them will. But at most one lower priority task can actually block t1 (why?). The worst case is the task with the longer CPU time, t3 with C3 = 30. Blocking utilization during the period T1: B3/T1 = 30/100 = 0.3 Worst case utilization = preemption util. + execution util. + blocking util. = .04 + .2 + .3 = .54 < worst-case upper bound of .69. t1 will be ok. ECEN5053 SW Eng of Distributed Systems, Performance Analysis, Univ of Colorado
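The same check written out as a short calculation (it only reproduces the numbers above; tasks 2 and 3 are left for the exercise on the next slide):

```python
# Generalized utilization check for t1, using the example's values (times in ms).
C1, T1 = 20, 100   # t1 execution time and period
Ca = 4             # ta: higher priority, longer period, preempts t1 at most once per T1
B3 = 30            # worst-case blocking: t3 (C3 = 30) holds the semaphore

U1 = Ca / T1 + C1 / T1 + B3 / T1   # preemption + execution + blocking
print(round(U1, 2), U1 < 0.69)     # 0.54 True, so t1 meets its deadline
```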

  20. NOTES 3 You do the calculation for tasks 2 and 3. Ask for help if you need it. ECEN5053 SW Eng of Distributed Systems, Performance Analysis, Univ of Colorado

  21. Event Sequence Analysis • If done properly, during requirements definition, the system’s required response times to external events are specified • After task structuring, we can make a first attempt at allocating time budgets to the concurrent tasks • Event Sequence Analysis determines the tasks to be executed to service a given external event ECEN5053 SW Eng of Distributed Systems, Performance Analysis, Univ of Colorado

  22. Performance Analysis using ESA • Pick an external event • Determine which I/O task is activated by this event • Determine the sequence of internal events that follow in response • Identify the tasks that are activated • Identify the I/O tasks that generate the system response to the external event • Estimate CPU time for each task • Estimate CPU overhead, inter-task communication and synchronization overhead • Consider other tasks that execute during this period ECEN5053 SW Eng of Distributed Systems, Performance Analysis, Univ of Colorado

  23. CPU Utilization for ESA • The sum of the following must be less than or equal to the specified system response time • CPU times for the tasks that participate in the event sequence • Times for additional tasks that execute during the interval • CPU overhead • Allocate a worst-case upper bound for uncertain CPU times • Overall CPU utilization is estimated for the given interval from • CPU time for each task (for each path, if there is more than one) • Frequency of activation * each task’s CPU time ECEN5053 SW Eng of Distributed Systems, Performance Analysis, Univ of Colorado
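A minimal sketch of this budget check (every name, time, and count below is a placeholder, not a figure from the cruise control example that follows):

```python
# ESA response-time budget check; all values are placeholders (times in ms).
event_sequence_cpu = [3, 5, 4, 2]   # CPU time of each task in the event sequence
other_task_cpu     = [2, 2, 1]      # CPU time of other tasks that may run in the window
context_switches   = 10             # estimated number of context switches
messages           = 3              # estimated number of inter-task messages
Cx, Cm             = 0.5, 1.0       # per-switch and per-message overhead

estimate = (sum(event_sequence_cpu) + sum(other_task_cpu)
            + context_switches * Cx + messages * Cm)
required_response = 50.0            # specified system response time
print(estimate, estimate <= required_response)   # 27.0 True
```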

  24. Example of Perf. Analysis using ESA • Consider Cruise Control subsystem; see event sequence diagram • based on task architecture diagram • assume, for now, that all the other tasks in the system as well as Calibration in this subsystem have lower priorities so that we can ignore them • Consider first the case of the driver engaging the cruise control lever in the accelerate position resulting in controlled acceleration of the car. • Performance requirement: system must respond to driver’s action within 250 ms. • Sequence of internal events following the driver’s stimulus is shown by the event seq. on the concurrent collaboration diagram (Fig. 17.2 taken from Gomaa’s book). ECEN5053 SW Eng of Distributed Systems, Performance Analysis, Univ of Colorado

  25. Performance Analysis ESA example - cont. • Assume Cruise Control is in its initial state. ACCEL is the cruise control input. • Event sequence: (Ci is time to process event i) • C1: interrupt arrives from external cr. cont. lever • C2: CC Lever Interface reads the ACCEL input from the CC lever • C3: CC Lever Interface sends a cc request message to CC • C4: CC receives the msg, executes its state transition diagram, and changes state from Initial to Accelerating • C5: CC sends an increase speed command msg to Speed Adjustment • C6: Speed Adjustment executes the command, computes throttle value • C7: Speed Adj sends throttle value msg to Throttle Interface task ECEN5053 SW Eng of Distributed Systems, Performance Analysis, Univ of Colorado

  26. Performance Analysis ESA example - cont. 2 • Event sequence continues: • C8: Throttle Interface computes new throttle position • C9: Throttle Interface outputs throttle position to the real-world throttle. (This is an output operation, uses no CPU time.) • Four tasks required to support the ACCEL external event • Minimum of four context switches required, 4*Cx where Cx is context switching overhead • Assume Cm is message communication overhead so C3, C5, and C7 are all equal to Cm ECEN5053 SW Eng of Distributed Systems, Performance Analysis, Univ of Colorado

  27. Performance Analysis ESA example - cont. 3 • Execution time of this event sequence, Ce = what? • System response time, however, must also consider other tasks that could execute during the time when the system must respond to the external event. • Look at Fig. 17.2 (remember we have artificially decided that all other tasks have lower priorities -- they can’t execute during this time) • Assume Auto Sensors (C10) is periodically activated every 100 ms. It could execute 3 times before the 250 ms deadline. • Shaft Interface (C11) is activated once every shaft rotation. It could execute up to __?__ times assuming a shaft rotation max rate of 6000 rpm. This is once every __?__ . • Distance & Speed (C12) activates periodically once every quarter of a second. In the 250 ms window, it can execute _?_. ECEN5053 SW Eng of Distributed Systems, Performance Analysis, Univ of Colorado

  28. Performance Analysis ESA example - cont. 4 • Every time another task intervenes, there could be two context switches (assume 0.5 ms per context switch for a real-time OS) • assuming the executing task is preempted and then resumes execution after completion of the intervening task • These three tasks could therefore impose an additional __?__ context switches. • Total CPU time Cother for these three tasks including system overhead is what? • Estimated response time to the external event is greater than or equal to the total CPU time, which is the sum of the CPU times of the tasks in the event sequence plus the CPU time for the other tasks: Ctotal = Ce + Cother ECEN5053 SW Eng of Distributed Systems, Performance Analysis, Univ of Colorado

  29. Performance Analysis ESA example - cont. 5 • Make estimates for each of these timing parameters so that the equations can be solved (see table provided) • Substituting the timing parameters gives an estimated value of Ce = 35 ms. • Substituting the estimated timing parameters that make up Cother gives an estimated value of 79 ms • Ctotal = 114 ms. This is well below the specified response time of 250 ms. ECEN5053 SW Eng of Distributed Systems, Performance Analysis, Univ of Colorado

  30. Performance Analysis ESA example - cont. 6 • How susceptible is the estimated response time to error? • Experiment with different values • What if context switching time were 1 ms instead of 0.5? ECEN5053 SW Eng of Distributed Systems, Performance Analysis, Univ of Colorado
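One way to explore that question is a small sensitivity sketch. Ce = 35 ms and Cother = 79 ms are the estimates from the preceding slide; the context-switch count used here (4 inside the event sequence plus 2 per intervening-task activation, with an assumed 29 activations) is an assumption to be checked against slides 27-28:

```python
# Sensitivity of the response-time estimate to the per-context-switch cost (ms).
def retimed_total(new_cx, n_switches, old_cx=0.5, Ce=35.0, Cother=79.0):
    # Back the old per-switch cost out of the baseline and add it in at the new cost.
    return Ce + Cother + (new_cx - old_cx) * n_switches

n_switches = 4 + 2 * 29                  # assumed count of context switches
print(retimed_total(0.5, n_switches))    # 114.0, the baseline estimate
print(retimed_total(1.0, n_switches))    # 145.0 with these assumptions, still under 250 ms
```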

  31. Performance Analysis using RMA & ESA • An external event activates a task. Its execution initiates a series of internal events which activate other internal tasks. • Can all the tasks in the combined event sequence be executed before the deadline? • Each internal event sequence can be analyzed regarding how much time it will take. The internal event sequences can then be treated as a group of tasks rate monotonically speaking … • That is, initially allocate all the tasks in the event sequence the same priority. These can collectively be considered one equivalent task from a real-time scheduling viewpoint. ECEN5053 SW Eng of Distributed Systems, Performance Analysis, Univ of Colorado

  32. Performance Analysis using RMA & ESA - 2 • This equivalent task has a CPU time equal to the • sum of the CPU times of the tasks in the event sequence • Plus context switching overhead • Plus message communication or event synchronization overhead • Worst-case inter-arrival time of the external event that initiates the event sequence is the period of this equivalent task. ECEN5053 SW Eng of Distributed Systems, Performance Analysis, Univ of Colorado
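A sketch of collapsing an event sequence into such an equivalent task (the per-task CPU times and the per-hand-off overhead values below are placeholders, not the figures from Gomaa's table):

```python
# Build an "equivalent task" from a strictly sequential event sequence (times in ms).

def equivalent_task(task_cpu, inter_arrival, cx=0.5, cm=1.0):
    """task_cpu      : CPU time of each task in the sequence, in execution order
    inter_arrival : worst-case inter-arrival time of the triggering event (its 'period')
    cx, cm        : assumed per-context-switch and per-message overheads"""
    n = len(task_cpu)
    Ce = sum(task_cpu) + n * cx + (n - 1) * cm    # CPU time plus overheads
    return Ce, inter_arrival, Ce / inter_arrival  # (Ce, Te, Ue)

# Four tasks in the sequence, placeholder CPU times of 5 ms each:
print(equivalent_task([5, 5, 5, 5], inter_arrival=250))   # (25.0, 250, 0.1)
```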

  33. Performance Analysis using RMA & ESA - 3 • To decide if the equivalent task can meet its deadline, apply the real-time scheduling theorems. Consider: • Preemption by higher priority tasks • Blocking by lower priority tasks • Execution time of the equivalent task itself • Cannot always replace all tasks in the event sequence by a single equivalent task • A task may be used in more than one event sequence • Executing the equivalent task at the chosen priority may prevent other tasks from meeting their deadlines. • May need to analyze tasks separately and assign different priorities ECEN5053 SW Eng of Distributed Systems, Performance Analysis, Univ of Colorado

  34. Performance Analysis using RMA & ESA - 4 • Must consider preemption and blocking on a per task basis • Also necessary to determine whether all tasks in the event sequence will complete before the deadline. ECEN5053 SW Eng of Distributed Systems, Performance Analysis, Univ of Colorado

  35. Perf. Analysis using RMA • Some considerations • Consider first a steady state involving only the periodic tasks. • After that, the aperiodic externally-imposed demands on the system can be considered. • Consider the worst steady state case, namely the case that causes maximum CPU demand • Remember context switching time • You can include aperiodic tasks if they have a known/estimated worst-case inter-arrival time • If two tasks have the same period, assign the higher priority to the independent task* ECEN5053 SW Eng of Distributed Systems, Performance Analysis, Univ of Colorado

  36. Perf. Analysis using RMA - 2 • Access time to shared data stores consists of one read instruction or one write instruction. • So small that potential delay time due to blocking of one task by another is considered negligible. • It’s guaranteed to be “short” and to “terminate” so don’t try to compute it as a blocking factor, just include it in its CPU time • Significant priority inversion delays can occur and those are the ones to consider ECEN5053 SW Eng of Distributed Systems, Performance Analysis, Univ of Colorado

  37. Perf. Analysis Example using RMA & ESA • Back to the Cruise Control example • Driver initiates an external event (CC lever or pressing the brake) • Must consider the tasks in the event sequence as well as the periodic tasks that execute on an ongoing basis when simply driving under CC • Earlier we replaced the four tasks in the event sequence with an equivalent aperiodic task ECEN5053 SW Eng of Distributed Systems, Performance Analysis, Univ of Colorado

  38. Perf. Analysis Example using RMA & ESA -2 • Consider the impact of the additional load imposed by the driver-initiated external event on the steady state load of the periodic tasks. • The worst case is when the vehicle is already under automated control (CC). If it weren’t, Speed Adjustment and Throttle Interface wouldn’t be executing so the CPU load would be lighter • Input from CC lever. In the event sequence analysis, we saw CC Lever Interface, CC, Speed Adjustment, and Throttle Interface process this input. (CPU time Ce calculated at slide 29) ECEN5053 SW Eng of Distributed Systems, Performance Analysis, Univ of Colorado

  39. Perf. Analysis Example using RMA & ESA -3 • Four tasks are involved but they must execute in strict sequence. • Each activated by msg from its predecessor. • The four are equivalent to one aperiodic task • Ce is the sum of the CPU times of the four tasks plus msg communication overhead and context switching overhead. • We’ll call the combined task the “event sequence task” • In RMA, can treat aperiodic task as one whose period is the minimum inter-arrival time of the requests. Call it Te = 250 ms. • For now, assume desired response is also Te ECEN5053 SW Eng of Distributed Systems, Performance Analysis, Univ of Colorado

  40. Perf. Analysis Example using RMA & ESA -4 • When assigning priority to the event sequence task, initially assign its rate monotonic priority. • When we do this, the event sequence task has the same period as two other periodic tasks, Speed Adjustment and Distance & Speed. • Assign the event sequence task the highest priority of those three • The event sequence task still has a lower priority than Shaft Interface, Throttle Interface, and Auto Sensors. (See Table 17.4, Gomaa) • Ce for the event sequence task is 35 ms; Te is 250 ms; therefore CPU utilization Ue is 0.14 ECEN5053 SW Eng of Distributed Systems, Performance Analysis, Univ of Colorado

  41. Perf. Analysis Example using RMA & ESA -5 • Total CPU utilization of the periodic tasks is 0.48 (you can compute that if you don’t believe me) • Total periodic and event sequence task CPU utilization is 0.62, which is less than .69 and therefore less than U(n) where n is the number of periodic tasks plus 1 • Therefore, the event sequence task can meet its deadline as can all the periodic tasks. ECEN5053 SW Eng of Distributed Systems, Performance Analysis, Univ of Colorado
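A quick check of this claim (taking n = 6, i.e., five periodic tasks plus the equivalent event sequence task, which is an assumption based on the cruise control task list):

```python
# Combined utilization of the periodic tasks (0.48) and the event sequence task (0.14)
# compared against U(n) with an assumed n = 6.
n = 6
U_n = n * (2 ** (1.0 / n) - 1)
total = 0.48 + 0.14
print(round(total, 2), round(U_n, 3), total <= U_n)   # 0.62 0.735 True
```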

  42. Perf. Analysis Example using RMA & ESA -6 • We made one assumption • All tasks can be allocated their rate monotonic priorities • What is wrong with giving the event sequence task its rate monotonic priority? • What is wrong with giving it the highest priority? • Compromise, give the event sequence task a priority lower than Shaft Interface but higher than Throttle Interface and Auto Sensors. This is higher than its rate monotonic priority. • What does THAT mean we’ll have to do? ECEN5053 SW Eng of Distributed Systems, Performance Analysis, Univ of Colorado

  43. Perf. Analysis Example using RMA & ESA -7 • Overall CPU utilization is less than the 0.69 bound • Bursts of activity can lead to transient loads that are much higher • In the 100 ms worst-case CPU burst, the total utilization of the three steady state tasks and the one event sequence task is 67%, allowing lower priority tasks to execute. • If the next highest priority task, Distance & Speed, were to also execute in this busy 100 ms, CPU utilization would increase to 78% • Comparing to the proper U(n) value, all tasks can still meet their deadlines. ECEN5053 SW Eng of Distributed Systems, Performance Analysis, Univ of Colorado

  44. Design Restructuring • If the proposed design does not meet its performance goals, the design needs to be restructured • Revisit task clustering criteria and task inversion criteria • Consider sequential task inversion • The CC task sends a speed command msg to the Speed Adj task, which in turn sends throttle msgs to the Throttle Interface task. • These may be combined into one task, the CC task, with passive objects for Speed Adj and Throttle Interface. • This eliminates the message communication overhead between them plus the context switching overhead ECEN5053 SW Eng of Distributed Systems, Performance Analysis, Univ of Colorado

  45. Estimation & Measurement of Performance Parameters • Performance input parameters must be determined through estimation or measurement before the performance analysis is carried out. • Independent variables are variables whose values are input to the performance analysis • Dependent variables are variables whose values are estimated by the real-time scheduling theory • Assumption for RMA: all tasks are locked in main memory so there is no paging overhead. Typically, paging overhead cannot be tolerated in real-time system design. ECEN5053 SW Eng of Distributed Systems, Performance Analysis, Univ of Colorado

  46. Estimation & Measurement of Performance Parameters -- 2 • Individual task parameters that need to be estimated for each task involved in the performance analysis • Task’s period Ti, i.e., how frequently it executes • Execution time Ci, the CPU time required per period • CPU overheads • Context switching overhead • Interrupt handling overhead • Inter-task communication and synchronization overhead ECEN5053 SW Eng of Distributed Systems, Performance Analysis, Univ of Colorado
