
R-BATCH: Task Partitioning for Fault-tolerant Multiprocessor Real-Time Systems

R-BATCH: Task Partitioning for Fault-tolerant Multiprocessor Real-Time Systems. Junsung Kim, Karthik Lakshmanan and Raj Rajkumar, Electrical and Computer Engineering, Carnegie Mellon University. Outline: Motivation; Goals and Systems Models; R-BATCH: Task Allocations with Replicas.


Presentation Transcript


  1. R-BATCH: Task Partitioning for Fault-tolerant Multiprocessor Real-Time Systems Junsung Kim, Karthik Lakshmanan and Raj Rajkumar Electrical and Computer Engineering Carnegie Mellon University

  2. Outline • Motivation • Goals and Systems Models • R-BATCH: Task Allocations with Replicas • Performance Evaluation • Conclusion

  3. Autonomous Vehicles: Background • GM Chevy Tahoe named “Boss” • Won 2007 DARPA urban challenge Motivation → Goals → R-BATCH → Evaluation → Conclusion 3

  4. Autonomous Vehicles: Background • Boss • Senses environment • Fuses sensor data to form a model of the real world • Plans navigation paths • Actuates steering wheel, brake, and accelerator • Boss requires • Safety-critical operations • Timing guarantees • Robustness to harsh environments

  5. Autonomous Vehicles: Architecture • 0.5 million lines of code for autonomous driving support • 10 dual-core processors + 50 embedded processors [Figure: Boss system architecture. From C. Urmson et al., Tartan Racing: A Multi-Modal Approach to the DARPA Urban Challenge.]

  6. Autonomous Vehicles: Capabilities • 0.5 million lines of code for autonomous driving support • 10 dual-core processors + 50 embedded processors • Requires high computational capabilities with timeliness guarantees • Adding more processors • Using high-performance processors

  7. Processor Reliability Trend [Figure: failure rate vs. log time (years in service; ticks at 1, 10, 100) for three process generations: 1989, 800nm; 2000, 130nm; 2010, 32nm. The curves are labeled with an infant mortality (random, extrinsic) region and a wear-out (intrinsic) region. From Mark White, Product Reliability and Qualification Challenges with CMOS Scaling.]

  8. Outline • Motivation • Goals and Systems Models • R-BATCH: Task Allocations with Replicas • Performance Evaluation • Conclusion

  9. Goals for Fault-Tolerance • Handle permanent processor failures • Tolerate a given number of processor failures • Avoid losing functionality by adding more resources in an affordable way • Hardware replication • Software replication • Re-execution of failed jobs • Lower quality of service of tasks • Deal with unpredictable nature of failures • Consider all possible scenarios?

  10. System Model (1 of 2) • Primary fault model: fail-stop • An entity stops functioning when it fails instead of alternating between correct and wrong outputs • Fault containment can be guaranteed • Consider a set of periodic tasks • Periodic task $\tau_i$ • Represented by $(C_i, T_i)$ • $C_i$: Worst-case execution time of task $\tau_i$ • $T_i$: Period of task $\tau_i$ • Task utilization: $u_i = C_i / T_i$ • Total utilization in a processor: $U = \sum_i C_i / T_i$

  11. System Model (2 of 2) • Task classifications • Hard recovery task • cannot miss the deadline even if a failure occurs • e.g., automotive engine control • Soft recovery task • can be recovered in the next period • e.g., navigation, chassis unit control • Best-effort recovery task • can be recovered if there is enough room after a failure

  12. Hard Recovery Task [Figure: timelines starting at 0 for Processor 1 and Processor 2. A failure occurs on Processor 1 and the task is recovered on Processor 2.] • Task should be recovered before its deadline

  13. Soft Recovery Task [Figure: timelines starting at 0 for Processor 1 and Processor 2. A failure occurs on Processor 1 and the task is recovered on Processor 2.] • Task should be recovered within the next period

  14. Task Replication • Observations • Hot Standby • The primary and the backups run at the same time • Cold Standby • One Cold Standby can recover several tasks on different processors • Shared system state is available on all processors • By using a networked bus architecture

  15. Hard Recovery Task with Hot Standby [Figure: timelines starting at 0 for Processor 1 and Processor 2. A failure occurs on Processor 1 and the task is recovered via Hot Standby on Processor 2.]

  16. Soft Recovery Task with Cold Standby [Figure: timelines starting at 0 for Processor 1 and Processor 2. A failure occurs on Processor 1 and the task is recovered via Cold Standby on Processor 2.]

  17. Example Scenarios • With 5 tasks and 4 processors • nP: Primary of task n • nH: Hot Standby of task n • nC: Cold Standby of task n [Figure: three allocation diagrams over processors P1 to P4: the initial allocation of primaries, Hot Standbys, and Cold Standbys; the allocation after P3 failed; and the allocation after P1 failed.]

  18. Outline • Motivation • Goals and Systems Models • R-BATCH: Task Allocations with Replicas • Performance Evaluation • Conclusion

  19. R-BATCH • Reliable Bin-packing Algorithm for Tasks with Cold standby and Hot standby • Reliable task allocation • Allocates Hot Standbys • Allocates Cold Standbys

  20. Uniprocessor Schedulability* • Consider a set of periodic tasks • Periodic task $\tau_i$ • Represented by $(C_i, T_i)$ • $C_i$: Worst-case execution time of task $\tau_i$ • $T_i$: Period of task $\tau_i$ • Task utilization: $u_i = C_i / T_i$ • Total utilization in a processor: $U = \sum_i C_i / T_i$ • Schedulability • For EDF (Earliest Deadline First) • Tasks are schedulable if $U \le 100\%$ (more complex; misbehaves at higher $U$) • For RMS (Rate Monotonic Scheduling) • For general tasks: tasks are schedulable if $U \le 69\%$ (lower utilization) • For harmonic tasks: tasks are schedulable if $U \le 100\%$ (practical) <* C.L. Liu and J.W. Layland. Scheduling algorithms for multiprogramming in a hard-real-time environment. J. ACM, 1973>
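
The utilization tests on this slide can be sketched in a few lines of Python. This is an illustrative sketch, not code from the presentation: the function names, the integer `(C, T)` task tuples, and the tolerance `EPS` are assumptions. For general RMS task sets it uses the exact Liu and Layland bound n(2^(1/n) - 1), which approaches roughly 69% as n grows. That test is sufficient only, so a `False` result means "not guaranteed by this test", not "unschedulable".

```python
from typing import List, Tuple

Task = Tuple[int, int]  # (C, T): worst-case execution time and period, same time unit

EPS = 1e-9  # tolerance for floating-point comparisons (an assumption of this sketch)


def total_utilization(tasks: List[Task]) -> float:
    """U = sum of C_i / T_i over all tasks on one processor."""
    return sum(c / t for c, t in tasks)


def rms_bound(n: int) -> float:
    """Liu and Layland utilization bound for n tasks under RMS."""
    return n * (2 ** (1 / n) - 1)


def is_harmonic(tasks: List[Task]) -> bool:
    """True if every period divides every longer period."""
    periods = sorted(t for _, t in tasks)
    return all(b % a == 0 for a, b in zip(periods, periods[1:]))


def passes_utilization_test(tasks: List[Task], policy: str) -> bool:
    """Utilization-based test for 'EDF' or 'RMS' on a single processor."""
    u = total_utilization(tasks)
    if policy == "EDF" or (policy == "RMS" and is_harmonic(tasks)):
        return u <= 1.0 + EPS
    if policy == "RMS":
        return u <= rms_bound(len(tasks)) + EPS
    raise ValueError(f"unknown policy: {policy}")


if __name__ == "__main__":
    harmonic = [(1, 2), (1, 4), (2, 8)]  # periods 2, 4, 8
    general = [(1, 3), (1, 4), (1, 5)]   # non-harmonic periods
    print(passes_utilization_test(harmonic, "RMS"))
    print(passes_utilization_test(general, "EDF"))
    print(passes_utilization_test(general, "RMS"))
```

Treating each processor as a bin of capacity 1.0 in the later bin-packing slides corresponds to the 100% cases above (EDF, or RMS with harmonic periods).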

  21. Bin-packing Problem • Definition: The problem of packing a set of items into the fewest number of bins such that the total size of the items in each bin does not exceed the bin capacity* • Items: Utilizations of each task • Bins: Processors • Then, given a set of tasks, how many bins (processors) do we need?† [Figure: tasks Ti, Tj, Tk, and Tm being packed onto a processor P.] <*Mark Allen Weiss, from Data Structures and Algorithm Analysis, Addison> <†D. Oh and T. Baker. Utilization bounds for n-processor rate monotonic scheduling with static processor assignment. Real-Time Systems, 1998.>

  22. The Classical Approach: Bin-packing • Bin packing is used to allocate tasks to multiprocessor platforms • Best-fit Decreasing (BFD) algorithm • Step 1: Sort the objects in descending order of size • Step 2: Sort the bins in descending order of consumed space • Step 3: Fit next object into the first sorted bin that fits • If no bin fits, add a new bin to fit into • Step 4: If objects remain, go to Step 2. • Step 5: Done. • Example: given a set of tasks {0.6, 0.3, 0.2} [Figure: tasks 1 (0.6), 2 (0.3), and 3 (0.2) being packed onto processors drawn as bins P1 to P4.]
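
The five BFD steps above can be sketched as follows. This is a minimal sketch that assumes every bin has capacity 1.0 (a fully utilizable processor); the function name and the `EPS` tolerance are illustrative choices, not part of the presentation.

```python
from typing import List

EPS = 1e-9  # tolerance for floating-point sums (an assumption of this sketch)


def best_fit_decreasing(sizes: List[float], capacity: float = 1.0) -> List[List[float]]:
    """Pack sizes into bins following the slide's BFD steps."""
    bins: List[List[float]] = []
    # Step 1: consider objects in descending order of size.
    for size in sorted(sizes, reverse=True):
        # Step 2: order the bins by consumed space, fullest first.
        # sorted() returns the same inner lists, so appending below updates `bins`.
        for b in sorted(bins, key=sum, reverse=True):
            # Step 3: use the first (fullest) bin that still fits the object.
            if sum(b) + size <= capacity + EPS:
                b.append(size)
                break
        else:
            # No existing bin fits: open a new one.
            bins.append([size])
        # Step 4: the loop continues while objects remain.
    return bins  # Step 5: done.


if __name__ == "__main__":
    print(best_fit_decreasing([0.3, 0.6, 0.2]))  # the slide's task set
```

For the slide's task set, this sketch places 0.6 and 0.3 in one bin and 0.2 in a second, since 0.9 + 0.2 exceeds the capacity.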

  23. BFD with Placement Constraints • We also have to deal with replicated tasks • Under the placement constraint (BFD-P*) • No two replicas of the same task can be on the same processor • Otherwise, a single processor failure will take down both replicas [Figure: primaries (nP) and Hot Standbys (nH) of tasks 1 (0.6), 2 (0.3), and 3 (0.2) allocated to processors P1 to P4.] <* J. Chen, C. Yang, T.W. Kuo, and S.Y. Tseng. Real-Time task replication for fault tolerance in identical multiprocessor systems. In Proceedings of the 13th IEEE RTAS, IEEE CS, 2007.>
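
One way to read the placement constraint is as an extra check inside the BFD loop: a bin is rejected if it already holds a replica of the same task. The sketch below follows that reading and is not taken from the cited BFD-P paper; the function name, the `copies` parameter (primary plus standbys), and the unit bin capacity are assumptions.

```python
from typing import Dict, List, Tuple

EPS = 1e-9  # tolerance for floating-point sums (an assumption of this sketch)
Replica = Tuple[int, float]  # (task id, utilization)


def load(b: List[Replica]) -> float:
    """Consumed space of one bin."""
    return sum(u for _, u in b)


def bfd_with_placement(utilizations: Dict[int, float], copies: int,
                       capacity: float = 1.0) -> List[List[Replica]]:
    """BFD over all replicas; replicas of one task never share a bin."""
    # One entry per replica, largest utilization first.
    replicas = sorted(
        ((tid, u) for tid, u in utilizations.items() for _ in range(copies)),
        key=lambda r: r[1],
        reverse=True,
    )
    bins: List[List[Replica]] = []
    for tid, u in replicas:
        for b in sorted(bins, key=load, reverse=True):  # fullest bin first
            fits = load(b) + u <= capacity + EPS
            no_sibling = all(other != tid for other, _ in b)  # placement constraint
            if fits and no_sibling:
                b.append((tid, u))
                break
        else:
            bins.append([(tid, u)])
    return bins


if __name__ == "__main__":
    allocation = bfd_with_placement({1: 0.6, 2: 0.3, 3: 0.2}, copies=2)
    for i, b in enumerate(allocation, start=1):
        print(f"P{i}: {b}")
```

With the task set {0.6, 0.3, 0.2} and two copies of each task, this sketch ends up using four bins.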

  24. Can BFD-P Be Improved? • Given a set of tasks: {0.6, 0.3, 0.2} with 2 replicas each • By using BFD with placement constraint [Figure: BFD-P allocation of the primaries and Hot Standbys of tasks 1 (0.6), 2 (0.3), and 3 (0.2) on processors P1 to P4.] • We can however reduce the number of bins as follows: [Figure: an alternative allocation of the same primaries and Hot Standbys that uses fewer processors.]

  25. Reliable BFD (R-BFD) • R-BFD Algorithm • Step 1: Sort tasks in decreasing order of utilization • Step 2: Allocate each primary task to the bin that will have the smallest remaining space • Step 3: Set i = 1 • Step 4: Allocate the i-th replica of each task to the bin that will have the smallest remaining space • Step 5: Increment i and repeat Step 4 until all replicas are allocated. [Figure: R-BFD allocation of the primaries and Hot Standbys of tasks 1 (0.6), 2 (0.3), and 3 (0.2) on processors P1 to P4.]
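
The steps above can be sketched as a round-based best-fit: all primaries are placed first, then the first replica of every task, and so on. This is a minimal sketch under stated assumptions: unit bin capacity, the placement constraint from the BFD-P slide (no two copies of a task in one bin), and illustrative names such as `r_bfd` and `copies`.

```python
from typing import Dict, List, Tuple

EPS = 1e-9  # tolerance for floating-point sums (an assumption of this sketch)
Replica = Tuple[int, float]  # (task id, utilization)


def load(b: List[Replica]) -> float:
    """Consumed space of one bin."""
    return sum(u for _, u in b)


def r_bfd(utilizations: Dict[int, float], copies: int,
          capacity: float = 1.0) -> List[List[Replica]]:
    """Round-based best-fit: round 0 places primaries, round i the i-th replicas."""
    # Step 1: task ids in decreasing order of utilization.
    order = sorted(utilizations, key=lambda tid: utilizations[tid], reverse=True)
    bins: List[List[Replica]] = []
    for _ in range(copies):  # Steps 2 to 5: one pass per copy of every task
        for tid in order:
            u = utilizations[tid]
            # Best fit: the fullest feasible bin leaves the smallest remaining space.
            for b in sorted(bins, key=load, reverse=True):
                fits = load(b) + u <= capacity + EPS
                no_sibling = all(other != tid for other, _ in b)
                if fits and no_sibling:
                    b.append((tid, u))
                    break
            else:
                bins.append([(tid, u)])
    return bins


if __name__ == "__main__":
    allocation = r_bfd({1: 0.6, 2: 0.3, 3: 0.2}, copies=2)
    for i, b in enumerate(allocation, start=1):
        print(f"P{i}: {b}")
```

For {0.6, 0.3, 0.2} with two copies each, this sketch uses three bins, one fewer than a single-pass BFD over all replicas under the same constraint. Placing the primaries first packs them tightly, and the replicas then fill the gaps left in other bins.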

  26. Save More Processors with Cold Standby • Given a set of tasks: {0.6, 0.3, 0.2} with 3 replicas each to tolerate 2 processor failures [Figure: allocation of primaries and Hot Standbys of tasks 1 (0.6), 2 (0.3), and 3 (0.2) on processors P1 to P5.] • Instead of using two more processors, add an “empty” processor to hold a “virtual task” [Figure: allocation on processors P1 to P5 in which one Hot Standby per task is replaced by Cold Standbys 1C (0.6), 2C (0.3), and 3C (0.2).]

  27. Cold Standby with Virtual Task • Virtual task • A guaranteed utilization that reserves slack for recovering from failures via Cold Standby • Generate Virtual Tasks • Step 1: Create a new virtual task by selecting the task with the highest utilization, across all processors, that is not yet covered by a virtual task • Step 2: Compare the size of the virtual task with tasks on different processors, and check whether those tasks can be recovered by using the virtual task • Step 3: Go to Step 1 if there are remaining tasks [Figure: allocation on processors P1 to P4 with Cold Standbys 1C (0.6), 2C (0.3), and 3C (0.2). The generated virtual task 1C covers tasks 1, 2, and 3.]
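
The virtual-task steps leave some details open, so the following is only one possible reading, sketched in Python. It assumes that a virtual task is sized to the largest task it covers and that it covers at most one task per processor, in line with the earlier observation that one Cold Standby can recover several tasks on different processors. The function name and the `placement` mapping are hypothetical.

```python
from typing import Dict, List, Tuple

# placement: task id -> (processor name, utilization); a hypothetical input format
Placement = Dict[int, Tuple[str, float]]


def generate_virtual_tasks(placement: Placement) -> List[Tuple[float, List[int]]]:
    """Return (virtual task size, covered task ids) groups, greedily."""
    uncovered = dict(placement)
    virtual_tasks: List[Tuple[float, List[int]]] = []
    while uncovered:  # Step 3: repeat while tasks remain
        # Step 1: seed with the highest-utilization task not yet covered.
        seed = max(uncovered, key=lambda tid: uncovered[tid][1])
        seed_proc, size = uncovered.pop(seed)
        covered = [seed]
        used_procs = {seed_proc}
        # Step 2: add tasks from other processors, largest first. Each fits,
        # because the seed has the maximum utilization among uncovered tasks.
        for tid in sorted(uncovered, key=lambda t: uncovered[t][1], reverse=True):
            proc, _ = uncovered[tid]
            if proc not in used_procs:  # assumed rule: one covered task per processor
                covered.append(tid)
                used_procs.add(proc)
        for tid in covered[1:]:
            del uncovered[tid]
        virtual_tasks.append((size, covered))
    return virtual_tasks


if __name__ == "__main__":
    # Three tasks on three different processors share one virtual task.
    print(generate_virtual_tasks({1: ("P1", 0.6), 2: ("P2", 0.3), 3: ("P3", 0.2)}))
    # Two tasks on the same processor need separate virtual tasks in this reading.
    print(generate_virtual_tasks({1: ("P1", 0.6), 2: ("P1", 0.3), 3: ("P2", 0.2)}))
```

The per-processor restriction reflects the fail-stop model: when a processor fails, all of its tasks fail together, so slack sized for a single task cannot recover two tasks from the same processor.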

  28. R-BATCH • Reliable Bin-packing Algorithm for Tasks with Cold Standby and Hot Standby • Step 1: Perform R-BFD with the primary and Hot Standbys • Step 2: Generate virtual tasks • Step 3: Perform R-BFD with virtual tasks [Figure: allocation of primaries and Hot Standbys of tasks 1 (0.6), 2 (0.3), and 3 (0.2) on processors P1 to P4, followed by placement of the Cold Standbys 1C (0.6), 2C (0.3), and 3C (0.2) through the virtual task.]

  29. Outline • Motivation • Goals and Systems Models • R-BATCH: Task Allocations with Replicas • Performance Evaluation • Conclusion

  30. Performance Evaluation (R-BFD) [Figure: ratio of saved processors (normalized to BFD-P) vs. number of tasks; annotated value: 18%.]

  31. Performance Evaluation (R-BATCH) [Figure: ratio of saved processors (normalized to BFD-P) vs. number of tasks; annotated value: 49%.]

  32. Performance Evaluation [Figure: two plots, one for R-BFD and one for R-BATCH, each showing the ratio of saved processors (normalized to BFD-P).] • For smaller task set sizes, R-BFD is more beneficial • For larger task set sizes, R-BATCH is more beneficial

  33. Back to Boss • 20 periodic tasks for autonomous driving support • By using R-BATCH • Can tolerate 5 failures with 10 dual-core processors • 35% saving compared to BFD-P • With the primary • With 1 Hot Standby per task • With 4 Cold Standbys per task

  34. Conclusion • Many safety-critical real-time systems must also support redundancy for tolerating faults • We defined recovery task models • Hard Recovery Task • Soft Recovery Task • Best-effort Recovery Task • We used two types of recovery schemes • Hot Standby (for Hard Recovery Task) • Cold Standby (for Soft Recovery Task) • We can tolerate a fixed number of (fail-stop) failures • R-BFD • 18% fewer processors with Hot Standby • R-BATCH • 49% fewer processors with Hot Standby and Cold Standby • Utilizes slack for additional tasks
