R-BATCH: Task Partitioning for Fault-tolerant Multiprocessor Real-Time Systems

Junsung Kim, Karthik Lakshmanan and Raj Rajkumar • Electrical and Computer Engineering, Carnegie Mellon University

Outline • Motivation • Goals and Systems Models • R-BATCH: Task Allocations with Replicas • Performance Evaluation • Conclusion

Autonomous Vehicles: Background • GM Chevy Tahoe named “Boss” • Won 2007 DARPA urban challenge Motivation → Goals → R-BATCH → Evaluation → Conclusion 3

Autonomous Vehicles: Background • Boss • Senses environment • Fuses sensor data to form a model of the real world • Plans navigation paths • Actuates steering wheel, brake, and accelerator • Boss requires • Safety-critical operations • Timing guarantees • Robustness to harsh environments

Autonomous Vehicles: Architecture • 0.5 million lines of code for autonomous driving support • 10 dual-core processors + 50 embedded processors [Figure: Boss system architecture, from C. Urmson et al.’s Tartan Racing: A Multi-Modal Approach to the DARPA Urban Challenge]

Autonomous Vehicles: Capabilities • 0.5 million lines of code for autonomous driving support • 10 dual-core processors + 50 embedded processors • Requires high computational capabilities with timeliness guarantees • Adding more processors • Using high-performance processors

Processor Reliability Trend [Figure: failure rate versus log time (years in service, 1 to 100) for three process generations (1989, 800nm; 2000, 130nm; 2010, 32nm), with regions labeled infant mortality (random, extrinsic) and wear-out (intrinsic)] <From Mark White’s Product Reliability and Qualification Challenges with CMOS Scaling>

Goals for Fault-Tolerance • Handle permanent processor failures • Tolerate a given number of processor failures • Avoid losing functionality by adding more resources in an affordable way • Hardware replication • Software replication • Re-execution of failed jobs • Lower quality of service of tasks • Deal with unpredictable nature of failures • Consider all possible scenarios?

System Model (1 of 2) • Primary fault model: fail-stop • An entity stops functioning when it fails instead of alternating between correct and wrong outputs • Fault containment can be guaranteed • Consider a set of periodic tasks • Periodic task • Represented by $\tau_i = (C_i, T_i)$ • $C_i$: Worst-case execution time of task $\tau_i$ • $T_i$: Period of task $\tau_i$ • Task utilization: $U_i = C_i / T_i$ • Total utilization in a processor: $U = \sum_i C_i / T_i$

System Model (2 of 2) • Task classifications • Hard recovery task • Cannot miss its deadline even if a failure occurs • e.g., automotive engine control • Soft recovery task • Can be recovered in the next period • e.g., navigation, chassis unit control • Best-effort recovery task • Can be recovered if there is enough room after a failure

Hard Recovery Task [Figure: timelines of Processor 1 and Processor 2 starting at time 0; a failure occurs on Processor 1 and the task is recovered on Processor 2; the label "Task should be recovered within" is cut off before the bound]

Soft Recovery Task [Figure: timelines of Processor 1 and Processor 2 starting at time 0; a failure occurs on Processor 1 and the task is recovered on Processor 2; the label "Task should be recovered within" is cut off before the bound]

Task Replication • Observations • Hot Standby • The primary and the backups run at the same time • Cold Standby • One Cold Standby can recover several tasks on different processors • Shared system state is available on all processors • By using a network bus architecture

Hard Recovery Task with Hot Standby [Figure: timelines of Processor 1 and Processor 2 starting at time 0; a failure occurs on Processor 1 and the task is recovered via Hot Standby on Processor 2]

Soft Recovery Task with Cold Standby [Figure: timelines of Processor 1 and Processor 2 starting at time 0; a failure occurs on Processor 1 and the task is recovered via Cold Standby on Processor 2]

Example Scenarios • With 5 tasks and 4 processors • nP: Primary of task n • nH: Hot Standby of task n • nC: Cold Standby of task n [Figure: initial allocation of primaries, Hot Standbys, and Cold Standbys on P1 to P4, followed by the allocation after P3 fails and the allocation after P1 fails]

R-BATCH • Reliable Bin-packing Algorithm for Tasks with Cold standby and Hot standby • Reliable task allocation • Allocates Hot Standbys • Allocates Cold Standbys

Uniprocessor Schedulability* • Consider a set of periodic tasks • Periodic task • Represented by $\tau_i = (C_i, T_i)$ • $C_i$: Worst-case execution time of task $\tau_i$ • $T_i$: Period of task $\tau_i$ • Task utilization: $U_i = C_i / T_i$ • Total utilization in a processor: $U = \sum_i C_i / T_i$ • Schedulability • For EDF (Earliest Deadline First) • Tasks are schedulable if $U \le 100\%$ • For RMS (Rate Monotonic Scheduling) • Tasks are schedulable if $U \le 69\%$ • For general tasks • Tasks are schedulable if $U \le 100\%$ • For harmonic tasks • Callouts: more complex, misbehaves at higher U; lower utilization; practical <* C.L. Liu and J.W. Layland. Scheduling algorithms for multiprogramming in a hard-real-time environment. J. ACM, 1973>
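The utilization tests on this slide can be sketched in a few lines of Python. This is a minimal illustration, not code from the talk: the function names and the `(C, T)` tuple layout are assumptions, periods are assumed to be integers, and the RMS check is only the sufficient utilization-bound test (a task set that fails it may still be schedulable).

```python
def utilization(tasks):
    """Total utilization U = sum(C_i / T_i) for a list of (C, T) pairs."""
    return sum(c / t for c, t in tasks)


def rms_bound(n):
    """Liu and Layland bound n(2^(1/n) - 1); approaches ln 2 (about 69%)."""
    return n * (2 ** (1 / n) - 1)


def is_harmonic(tasks):
    """True if each period divides the next longer one (integer periods)."""
    periods = sorted(t for _, t in tasks)
    return all(b % a == 0 for a, b in zip(periods, periods[1:]))


def passes_utilization_test(tasks, policy):
    """Sufficient utilization-based test for 'EDF' or 'RMS'."""
    u = utilization(tasks)
    if policy == "EDF":
        return u <= 1.0
    if policy == "RMS":
        # Harmonic task sets may use the full processor under RMS.
        bound = 1.0 if is_harmonic(tasks) else rms_bound(len(tasks))
        return u <= bound
    raise ValueError(f"unknown policy: {policy}")


harmonic = [(1, 4), (2, 8), (4, 16)]      # U = 0.75, periods 4 | 8 | 16
general = [(1, 4), (2, 6), (3, 10)]       # U is about 0.88, not harmonic
print(passes_utilization_test(harmonic, "RMS"))
print(passes_utilization_test(general, "RMS"))
print(passes_utilization_test(general, "EDF"))
```

The non-harmonic set passes the EDF test but exceeds the RMS bound for three tasks, which is the "lower utilization" trade-off the slide points at.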

Bin-packing Problem • Definition: The problem of packing a set of items into the fewest number of bins such that the total size does not exceed the bin capacity* • Items: Utilizations of each task • Bins: Processors • Then, given a set of tasks, how many bins (processors) do we need?† [Figure: tasks Ti, Tj, Tk, ..., Tm packed into a processor P] <*Mark Allen Weiss, from Data Structures and Algorithm Analysis, Addison> <†D. Oh and T. Baker. Utilization bounds for n-processor rate monotonic scheduling with static processor assignment. Real-Time Systems, 1998.>

The Classical Approach: Bin-packing • Bin packing is used to allocate tasks to multiprocessor platforms • Best-fit Decreasing (BFD) algorithm • Step 1: Sort the objects in descending order of size • Step 2: Sort the bins in descending order of consumed space • Step 3: Fit the next object into the first sorted bin that fits • If no bin fits, add a new bin to fit into • Step 4: If objects remain, go to Step 2. • Step 5: Done. • Given a set of tasks: {0.6, 0.3, 0.2} [Figure: step-by-step BFD placement of tasks 1 (0.6), 2 (0.3), and 3 (0.2) on processors P1 to P4]
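The BFD steps above can be sketched as follows. This is an illustrative Python version, assuming unit-capacity bins and a small tolerance `EPS` for floating-point sums; the function name and the list-of-lists bin representation are choices made here, not part of the slides.

```python
EPS = 1e-9  # tolerance for floating-point sums (an assumption of this sketch)


def best_fit_decreasing(sizes, capacity=1.0):
    """Pack item sizes into bins; returns a list of bins (lists of sizes)."""
    bins = []
    for size in sorted(sizes, reverse=True):           # Step 1
        # Steps 2-3: among the bins that still have room, take the fullest.
        fitting = [b for b in bins if sum(b) + size <= capacity + EPS]
        if fitting:
            max(fitting, key=sum).append(size)
        else:
            bins.append([size])                        # no bin fits: open one
    return bins


print(best_fit_decreasing([0.6, 0.3, 0.2]))  # → [[0.6, 0.3], [0.2]]
```

For the slide's task set, 0.6 and 0.3 share one processor and 0.2 needs a second one, because 0.9 + 0.2 exceeds the capacity.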

BFD with Placement Constraints • We also have to deal with replicated tasks • Under the placement constraint (BFD-P*) • No two replicas can be on the same processor • Otherwise, a processor failure will take down both replicas [Figure: BFD-P placement of 1P, 1H (0.6), 2P, 2H (0.3), 3P, 3H (0.2) on processors P1 to P4] <* J. Chen, C. Yang, T.W., and S.Y. Tseng. Real-Time task replication for fault tolerance in identical multiprocessor systems. In Proceedings of the 13th IEEE RTAS, IEEE CS, 2007.>
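A minimal sketch of BFD under the placement constraint, assuming all copies of all tasks are sorted together by utilization and each bin is a dict keyed by task id (so a bin can never hold two copies of the same task). The name `bfd_p`, the `replicas` parameter (total copies per task, primary included), and the tolerance `EPS` are assumptions of this illustration.

```python
EPS = 1e-9  # tolerance for floating-point sums (an assumption of this sketch)


def bfd_p(utils, replicas, capacity=1.0):
    """utils: {task_id: utilization}; replicas: copies per task.
    Returns a list of bins, each a dict {task_id: utilization}."""
    items = sorted(
        ((u, t) for t, u in utils.items() for _ in range(replicas)),
        key=lambda item: -item[0],
    )
    bins = []
    for u, t in items:
        # A bin qualifies only if it has room AND holds no copy of task t.
        fitting = [
            b for b in bins
            if t not in b and sum(b.values()) + u <= capacity + EPS
        ]
        if fitting:
            max(fitting, key=lambda b: sum(b.values()))[t] = u
        else:
            bins.append({t: u})
    return bins


allocation = bfd_p({1: 0.6, 2: 0.3, 3: 0.2}, replicas=2)
print(len(allocation))  # → 4
```

With the task set {0.6, 0.3, 0.2} and two copies each, this sketch uses four processors, matching the P1 to P4 layout shown on the slide.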

Can BFD-P Be Improved? • Given a set of tasks: {0.6, 0.3, 0.2} with 2 replicas each • By using BFD with placement constraint [Figure: BFD-P allocation of 1P, 1H (0.6), 2P, 2H (0.3), 3P, 3H (0.2) on processors P1 to P4] • We can however reduce the number of bins as follows: [Figure: alternative allocation of the same replicas that uses fewer processors]

Reliable BFD (R-BFD) • R-BFD Algorithm • Step 1: Sort tasks in decreasing order of utilization • Step 2: Allocate each primary task to the bin that will have the smallest remaining space • Step 3: Set i = 1 • Step 4: Allocate the i-th replica of each task to the bin that will have the smallest remaining space • Step 5: Increment i and repeat Step 4 until all replicas are allocated [Figure: R-BFD placement of 1P, 1H (0.6), 2P, 2H (0.3), 3P, 3H (0.2) on processors P1 to P4]
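The round-by-round allocation above can be sketched as follows. This is an illustrative reading of the steps, assuming "smallest remaining space" means best fit among bins that have room and hold no other copy of the task; the name `r_bfd`, the `copies` parameter (primary plus replicas), and the tolerance `EPS` are assumptions of this sketch.

```python
EPS = 1e-9  # tolerance for floating-point sums (an assumption of this sketch)


def r_bfd(utils, copies, capacity=1.0):
    """utils: {task_id: utilization}; copies: primary plus replicas per task.
    Returns a list of bins, each a dict {task_id: utilization}."""
    order = sorted(utils, key=lambda t: -utils[t])       # Step 1
    bins = []
    for _ in range(copies):     # round 0: primaries; round i: i-th replicas
        for t in order:
            u = utils[t]
            fitting = [
                b for b in bins
                if t not in b and sum(b.values()) + u <= capacity + EPS
            ]
            if fitting:
                # Best fit: the bin left with the smallest remaining space.
                max(fitting, key=lambda b: sum(b.values()))[t] = u
            else:
                bins.append({t: u})
    return bins


allocation = r_bfd({1: 0.6, 2: 0.3, 3: 0.2}, copies=2)
print(len(allocation))  # → 3
```

For the task set {0.6, 0.3, 0.2} with two copies each, placing all primaries before any replica packs the set into three processors in this sketch, one fewer than sorting all copies together.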

Save More Processors with Cold Standby • Given a set of tasks: {0.6, 0.3, 0.2} with 3 replicas each to tolerate 2 processor failures • Instead of using two more processors, add an “empty” processor to hold a “virtual task” [Figure: allocation on P1 to P5 with primaries and Hot Standbys only, and the allocation in which the Cold Standbys 1C (0.6), 2C (0.3), and 3C (0.2) share one processor]

Cold Standby with Virtual Task • Virtual task • A guaranteed utilization that reserves slack for recovering from failures via Cold Standby • Generate Virtual Tasks • Step 1: Create a new virtual task by selecting the task with the highest utilization across all processors that is not yet assigned to a virtual task • Step 2: Compare the size of the virtual task with tasks on different processors, and check whether those tasks can be recovered by using the virtual task • Step 3: Go to Step 1 if there are remaining tasks • Generated Virtual Task 1C covers tasks 1, 2, and 3 [Figure: allocation on P1 to P4 in which the Cold Standbys 1C (0.6), 2C (0.3), and 3C (0.2) are merged into a single virtual task of size 0.6]
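One possible reading of these steps is sketched below. The slide does not spell out the coverage check in Step 2, so this sketch assumes a virtual task may cover at most one task per processor (so that a single processor failure activates at most one covered task) and that any task no larger than the virtual task can be recovered by it. The function name and data layout are hypothetical.

```python
def generate_virtual_tasks(allocation):
    """allocation: list of {task_id: utilization}, one dict per processor.
    Returns a list of (size, [(processor_index, task_id), ...])."""
    uncovered = [
        (u, p, t)
        for p, proc in enumerate(allocation)
        for t, u in proc.items()
    ]
    uncovered.sort(key=lambda item: -item[0])   # largest utilization first
    virtual_tasks = []
    while uncovered:                            # Step 3: repeat while tasks remain
        size, p0, t0 = uncovered.pop(0)         # Step 1: largest uncovered task
        covered, used_procs = [(p0, t0)], {p0}
        remaining = []
        for u, p, t in uncovered:               # Step 2: tasks on other processors
            # u <= size always holds here because the list is sorted.
            if p not in used_procs:             # ASSUMPTION: one task per processor
                covered.append((p, t))
                used_procs.add(p)
            else:
                remaining.append((u, p, t))
        uncovered = remaining
        virtual_tasks.append((size, covered))
    return virtual_tasks


# Three tasks on three different processors collapse into one virtual task.
print(generate_virtual_tasks([{1: 0.6}, {2: 0.3}, {3: 0.2}]))
# → [(0.6, [(0, 1), (1, 2), (2, 3)])]
```

Under these assumptions the example reproduces the slide's outcome, where a single virtual task of size 0.6 covers tasks 1, 2, and 3; if two tasks share a processor, the sketch opens a second virtual task for the smaller one.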

R-BATCH • Reliable Bin-packing Algorithm for Tasks with Cold Standby and Hot Standby • Step 1: Perform R-BFD with the primary and Hot Standbys • Step 2: Generate virtual tasks • Step 3: Perform R-BFD with virtual tasks [Figure: allocation of primaries and Hot Standbys on P1 to P4, followed by placement of the virtual task generated from Cold Standbys 1C (0.6), 2C (0.3), and 3C (0.2)]

Performance Evaluation (R-BFD) [Figure: ratios of saved processors (normalized to BFD-P) versus number of tasks; callout: 18%]

Performance Evaluation (R-BATCH) [Figure: ratios of saved processors (normalized to BFD-P) versus number of tasks; callout: 49%]

Performance Evaluation [Figures: ratios of saved processors (normalized to BFD-P) for R-BFD and for R-BATCH] • For smaller task set sizes, R-BFD is more beneficial • For larger task set sizes, R-BATCH is more beneficial

Back to Boss • 20 periodic tasks for autonomous driving support • By using R-BATCH • Can tolerate 5 failures with 10 dual-core processors • 35% saving compared to BFD-P • With the primary • With 1 Hot Standby per task • With 4 Cold Standbys per task

Conclusion • Many safety-critical real-time systems must also support redundancy for tolerating faults • We defined recovery task models • Hard Recovery Task • Soft Recovery Task • Best-effort Recovery Task • We used two types of recovery schemes • Hot Standby (for Hard Recovery Task) • Cold Standby (for Soft Recovery Task) • We can tolerate a fixed number of (fail-stop) failures • R-BFD • 18% fewer processors with Hot Standby • R-BATCH • 49% fewer processors with Hot Standby and Cold Standby • Utilizes slack for additional tasks
