
Response-Time Analysis for globally scheduled Symmetric Multiprocessor Platforms

RETIS Lab, Real-Time Systems Laboratory. RTSS’07. Marko Bertogna, Michele Cirinei.


Presentation Transcript


  1. Response-Time Analysis for globally scheduled Symmetric Multiprocessor Platforms. RETIS Lab, Real-Time Systems Laboratory. RTSS’07. Marko Bertogna, Michele Cirinei

  2. Overview • Multiprocessor global scheduling • Existing schedulability tests for global schedulers • Limits of existing techniques • Extending Response Time Analysis to multiprocessor systems: • Generic work-conserving schedulers • Global EDF • Global FP • Simulations and conclusions

  3. Global scheduling on SMP [Figure: a global queue, ordered according to a given policy, holds tasks τ1 to τ5 and feeds CPU1, CPU2 and CPU3.] The first m tasks are scheduled upon the m CPUs.

  4. Global scheduling on SMP [Figure: the same global queue and CPUs after a task has completed.] When a task finishes its execution, the next one in the queue is scheduled on the available CPU.

  5. Global scheduling on SMP [Figure: the same global queue and CPUs after a new task arrival.] When a higher priority task arrives, it preempts the task with lowest priority among the executing ones.

  6. Global scheduling on SMP [Figure: task τ4 “migrated” from CPU3 to CPU1.] When another task ends its execution, the preempted task can resume its execution.

  7. Global scheduling properties • Single system-wide queue instead of multiple per-processor queues. [Figure: global scheduling, with one shared queue for CPU1 to CPU3, next to the partitioned approach, with one queue per CPU.]
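The dispatching rule walked through on slides 3 to 6 can be sketched in a few lines of Python. This is only an illustration: the `dispatch` helper, the `(priority, name)` job encoding and the convention that a smaller number means a higher priority are assumptions of the sketch, not part of the presentation.

```python
def dispatch(ready, m):
    """Split ready jobs into (running, waiting) for m identical CPUs.

    ready: list of (priority, name) pairs; a smaller number means a higher
    priority (an assumption of this sketch, not of the slides).
    """
    ordered = sorted(ready)          # the single global queue
    return ordered[:m], ordered[m:]  # the first m jobs run, the rest wait


ready = [(1, "t1"), (2, "t2"), (4, "t4"), (5, "t5")]
running, waiting = dispatch(ready, m=3)
# t1, t2 and t4 run; t5 waits.

# A higher-priority job t3 arrives. t4 is now the lowest-priority job among
# the executing ones, so it is the one that gets preempted.
running_after, waiting_after = dispatch(ready + [(3, "t3")], m=3)
```

The sketch recomputes the whole assignment at every event, which is enough to show who runs and who waits. It does not model which CPU each job lands on, so migrations are not visible here.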

  8. Global scheduling advantages Advantages of global schedulers w.r.t. partitioning algorithms: • Load automatically balanced • More efficient handling of overload conditions • More flexible reclaiming of unused bandwidth • Easier re-scheduling • Lower number of preemptions (but need to limit migration cost)

  9. On-line scheduling problem for global schedulers • Limited performance of classical algorithms (EDF, RM, etc.) • Pfair optimal only for implicit deadlines (Di = Ti) • No optimal algorithm known for more general task models • Many hybrid solutions proposed (EDF-US, RM-US, fpEDF, EDZL, etc.)

  10. Schedulability problem for global scheduling • All known exact tests are computationally intractable for non-trivial task sets • Many different sufficient schedulability tests • Big gap from necessary and sufficient conditions → difficult to “compare” the various scheduling policies • Need to reduce this schedulability gap

  11. Considered task model • Periodic and sporadic tasks: τi = (Ci, Di, Ti) • Constrained deadlines: Di ≤ Ti • Platform composed of m identical processors • Work-conserving global schedulers. Work-conserving scheduling policy: a processor is never idled when a task is ready to execute.

  12. Existing schedulability tests for work-conserving global policies • Fixed task priority: • Andersson et al.: utilization bound (RTSS’01), later improved and extended for constrained deadlines by Bertogna et al. (OPODIS’05) • Baker: demand-based polynomial test (RTSS’03, JTCC’06 and JEC’07) • Fisher, Baruah: load-based pseudo-polynomial tests (IASTED’06, OPODIS’07, ICDCN’08) • Dynamic task priority: • Goossens et al.: EDF utilization bound (RTSJ’03), Utot ≤ m(1−Umax)+Umax, later extended for arbitrary deadlines • Baker: demand-based polynomial test (RTSS’03 and TPDS’07) • Bertogna et al.: demand-based polynomial test (ECRTS’05) • Fisher, Baruah: load-based tests (ECRTS’07, RTSS’07)
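As an illustration of how cheap a utilization bound is to evaluate, the Goossens et al. condition Utot ≤ m(1−Umax)+Umax quoted on this slide can be checked as below. The function name and the two task sets are hypothetical. The sketch assumes implicit deadlines, so each task is described by its utilization Ci/Ti alone, and it uses exact fractions to avoid rounding at the boundary.

```python
from fractions import Fraction


def gfb_schedulable(utilizations, m):
    """Sufficient global-EDF test: U_tot <= m * (1 - U_max) + U_max."""
    u_tot = sum(utilizations)
    u_max = max(utilizations)
    return u_tot <= m * (1 - u_max) + u_max


# Hypothetical task sets on m = 2 processors.
light = [Fraction(1, 2), Fraction(1, 2), Fraction(1, 4)]   # U_tot = 5/4
heavy = [Fraction(9, 10), Fraction(1, 5), Fraction(1, 5)]  # U_tot = 13/10

light_ok = gfb_schedulable(light, m=2)  # bound is 3/2
heavy_ok = gfb_schedulable(heavy, m=2)  # bound is 11/10
```

The second set fails although its total utilization is well below m. The test is only sufficient, so a failure means "not proven schedulable", and a single heavy task lowers the bound sharply.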

  13. Existing schedulability tests for work-conserving global policies • Hybrid algorithms: • Srinivasan, Baruah: bound for EDF-US (IPL’02), generalized by Baruah’s bound for fpEDF (Utot ≤ (m+1)/2), valid for implicit deadlines (TC’04) • Cirinei, Baker: EDZL demand-based polynomial test (ECRTS’07) • Dynamic job priority: • Pfair (Utot ≤ m, valid only for implicit deadlines) • Andersson, Tovar: EKG for implicit deadlines (RTCSA’06) • Feasibility results: • Fisher, Baruah: load-based pseudo-polynomial test (ECRTS’06, improved in ECRTS’07) • Baker, Cirinei: load-based pseudo-polynomial necessary test (RTSS’06)

  14. Our approach • All existing schedulability tests have poor performance • A better analysis of worst-case situations is needed • Refine the estimation of the maximum interference a task can impose on other tasks • Apply Response Time Analysis (RTA) to multiprocessor systems; then check if WCRTi ≤ Di for all tasks

  15. RTA for Uniprocessors The synchronous arrival of all tasks is a critical instant: we can compute the worst-case interferences considering that situation. • Synchronous arrivals • Jobs released as soon as permitted

  16. RTA for Uniprocessors • For FP, the worst-case response time of a task is given by the first instance released at a critical instant • For EDF, it is given by an instance in a busy interval starting with a critical instant With these observations it is possible to compute the WCRT of all tasks. Example: for FP, the WCRT of a task k is given by the fixed point of:
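The equation itself is not reproduced in this transcript. Assuming the slide shows the classical fixed-priority recurrence, Rk = Ck + Σ over higher-priority tasks i of ⌈Rk/Ti⌉·Ci, the fixed point can be computed as in the sketch below. The `(C, D, T)` tuple layout, the function name and the example task set are assumptions of the sketch, and integer parameters are assumed so that the iteration terminates exactly.

```python
def rta_fp_uniprocessor(tasks, k):
    """Worst-case response time of tasks[k] on one CPU under FP, or None.

    tasks: list of (C, D, T) integer tuples sorted by decreasing priority
    (index 0 is the highest priority). Returns None if the iteration
    exceeds the deadline D_k before reaching a fixed point.
    """
    C_k, D_k, _ = tasks[k]
    R = C_k
    while R <= D_k:
        # -(-R // T) is ceil(R / T): jobs of a higher-priority task
        # released in a window of length R starting at the critical instant.
        R_next = C_k + sum(-(-R // T) * C for C, _, T in tasks[:k])
        if R_next == R:
            return R
        R = R_next
    return None


# Hypothetical task set, highest priority first.
tasks = [(1, 4, 4), (2, 6, 6), (3, 12, 12)]
response_times = [rta_fp_uniprocessor(tasks, k) for k in range(len(tasks))]
```

For this task set the iteration converges to 1, 3 and 10 time units, all within the deadlines.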

  17. And for Multiprocessor? For global schedulers, things are much more difficult: • The synchronous arrival of tasks doesn’t represent a critical instant. • Difficult to find a worst-case situation in which to compute the maximum response times. • Need to introduce some pessimistic assumptions to make things easier

  18. Introducing the interference [Figure: schedule on CPU1 to CPU3 over the interval from rk to rk+Rk, showing where τk executes and where the other tasks interfere with it.] Ik = Total interference suffered by task τk. Iki = Interference of task τi on task τk.

  19. Limiting the interference [Figure: the same schedule on CPU1 to CPU3 over the interval from rk to rk+Rk.] IDEA: It is sufficient to consider at most the portion Rk−Ck+1 of each term Iki in the sum. It can be proved that WCRTk is given by the fixed point of:
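The equation that follows on the slide is missing from this transcript. A reconstruction consistent with the definitions on slides 18 and 19 is given below as an assumption, not as the authors' exact notation. It relies on the observation that τk is delayed only while all m processors are busy with other tasks, so the total interference is the sum of the per-task interferences divided by m.

```latex
\begin{align}
  I_k &= \frac{1}{m} \sum_{i \neq k} I_k^i ,
  \qquad R_k = C_k + I_k , \\
  R_k &= C_k + \left\lfloor \frac{1}{m} \sum_{i \neq k}
         \min\left( I_k^i ,\; R_k - C_k + 1 \right) \right\rfloor .
\end{align}
```

Here each $I_k^i$ is measured over the window of length $R_k$, and the cap $R_k - C_k + 1$ is the portion named in the IDEA above.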

  20. Bounding the interference Exactly computing the interference is complex. Pessimistic assumptions: • Bound the interference of a task with the workload: Iki ≤ Wi. • Use an upper bound on the workload.

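  21. Improving the estimation of the workload using slack values Consider a situation in which: • The first job executes as close as possible to its deadline • Successive jobs execute as soon as possible [Figure: timeline of task τi over a window of length L, with parameters Ci, Di, Ti and the contribution εi of the last job.] The workload bound has two terms: the number of jobs excluding the last one, and the contribution of the last job.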
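The two terms are not legible in this transcript. For the scenario described (first job finishing exactly at its deadline, later jobs running as soon as they are released), a reconstruction would be the following. It is an assumption consistent with the slide's labels, and the symbols $N_i(L)$ and $W_i(L)$ are introduced here for illustration.

```latex
\begin{align}
  N_i(L) &= \left\lfloor \frac{L + D_i - C_i}{T_i} \right\rfloor
  && \text{(number of jobs, excluding the last one)} \\
  \varepsilon_i &= \min\left( C_i ,\; L + D_i - C_i - N_i(L)\, T_i \right)
  && \text{(last job)} \\
  W_i(L) &= N_i(L)\, C_i + \varepsilon_i
\end{align}
```

As a quick check with $C_i = 2$, $D_i = T_i = 5$ and $L = 4$: $N_i = \lfloor 7/5 \rfloor = 1$ and $\varepsilon_i = \min(2, 2) = 2$, so $W_i(4) = 4$. The window is entirely filled by the late first job followed by the early second one.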

  22. RTA for generic global schedulers • An upper bound on the WCRT of task k is given by the fixed point of Rk in the iteration: • The slack of task k is at least: Sk = Dk − Rk
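The iteration on this slide is missing from the transcript. Assuming the interference of each task is replaced by a workload upper bound $W_i$, as announced on slide 20, it would take the form below. The notation is a reconstruction, not a quotation.

```latex
\begin{align}
  R_k &\leftarrow C_k + \left\lfloor \frac{1}{m} \sum_{i \neq k}
        \min\left( W_i(R_k) ,\; R_k - C_k + 1 \right) \right\rfloor ,
  \qquad \text{starting from } R_k = C_k , \\
  S_k &\geq D_k - R_k .
\end{align}
```

The right-hand side never decreases when $R_k$ grows, so the sequence either reaches a fixed point or exceeds $D_k$, in which case no bound is obtained for task k.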

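  23. Improving the estimation of the workload using slack values Consider a situation in which: • The first job executes as close as possible to its deadline • Successive jobs execute as soon as possible [Figure: timeline of task τi over a window of length L, with parameters Ci, Di, Ti and the contribution εi of the last job.] The workload bound has two terms: the number of jobs excluding the last one, and the contribution of the last job.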

  24. Improving the estimation of the workload using slack values Consider a situation in which: • The first job executes as close as possible to its deadline • Successive jobs execute as soon as possible [Figure: timeline of task τi over a window of length L, with parameters Ci, Di, Ti and the slack Si marked before the first deadline.]
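A minimal Python sketch of a slack-aware workload bound follows. It assumes that a known slack Si moves the first job's completion Si time units before its deadline, which shortens the window by Si. The function name, argument order and example values are hypothetical, and parameters are assumed to be integers with 0 ≤ slack ≤ D − C.

```python
def workload_bound(C, D, T, L, slack=0):
    """Upper bound on the work a task (C, D, T) can do in a window of length L.

    slack is a lower bound on D minus the task's response time; slack=0
    gives the bound that uses no slack information.
    """
    shifted = L + D - C - slack   # window measured from the first job's release
    n = shifted // T              # jobs counted in full (all but the last)
    return n * C + min(C, shifted - n * T)  # plus the last, possibly partial, job


# Hypothetical task with C=2, D=5, T=5 over a window of length 4.
without_slack = workload_bound(2, 5, 5, L=4)
with_slack = workload_bound(2, 5, 5, L=4, slack=1)
```

With no slack information the bound is 4. Knowing that every job finishes at least one unit before its deadline lowers it to 3. This tightening is what the iterative test exploits.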

  25. RTA for generic global schedulers • An upper bound on the WCRT of task k can be given by the fixed point of Rk in the iteration: 1. 2. If a fixed point Rk ≤ Dk is reached for every task k in the system, the task set is schedulable with any work-conserving global scheduler.

  26. Iterative schedulability algorithm 1. All slacks initialized to zero 2. Compute slack lower bound for tasks 1,…,n: if higher than old value → update slack bound; if lower, do nothing 3. If all tasks have a positive slack lower bound → return success 4. If no slack has been updated for tasks 1,…,n → return fail 5. Otherwise, return to point 2
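The loop described on slides 25 and 26 can be sketched as follows. This is a sketch under stated assumptions, not the authors' implementation: tasks are hypothetical `(C, D, T)` integer tuples, the workload bound is the slack-aware reconstruction W(L) = N·C + min(C, L + D − C − S − N·T) with N = ⌊(L + D − C − S)/T⌋, and a task passes a round when its iteration converges to some Rk ≤ Dk (so a slack of exactly zero also counts as success).

```python
def workload_bound(task, L, slack):
    """Slack-aware workload bound of one task in a window of length L."""
    C, D, T = task
    shifted = L + D - C - slack
    n = shifted // T
    return n * C + min(C, shifted - n * T)


def response_time_bound(tasks, k, m, slacks):
    """Fixed point of the R_k iteration, or None if it exceeds D_k."""
    C_k, D_k, _ = tasks[k]
    R = C_k
    while R <= D_k:
        total = sum(min(workload_bound(t, R, slacks[i]), R - C_k + 1)
                    for i, t in enumerate(tasks) if i != k)
        R_next = C_k + total // m
        if R_next == R:
            return R
        R = R_next
    return None


def rta_schedulable(tasks, m):
    """Sufficient test for any work-conserving global scheduler."""
    slacks = [0] * len(tasks)          # step 1: all slacks start at zero
    while True:
        all_ok, updated = True, False
        for k, (_, D_k, _) in enumerate(tasks):   # step 2
            R = response_time_bound(tasks, k, m, slacks)
            if R is None:
                all_ok = False
            elif D_k - R > slacks[k]:
                slacks[k] = D_k - R    # keep only improvements
                updated = True
        if all_ok:
            return True                # step 3: every task has a bound
        if not updated:
            return False               # step 4: no progress, give up
        # step 5: otherwise run another round with the tighter slacks


ok = rta_schedulable([(1, 4, 4), (1, 4, 4), (2, 6, 6)], m=2)
overloaded = rta_schedulable([(2, 2, 2)] * 3, m=2)
```

Termination follows from the slacks being integers that only increase and are bounded by Dk − Ck. The first task set is accepted in one round. The second has total utilization 3 on two processors and is rejected.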

  27. Refining the analysis for particular policies • We can exploit further information on the scheduling algorithm in use to tighten the bounds on interference and workload • Refined analysis for: • Fixed Priority • EDF

  28. RTA for Fixed Priority • The interference on higher priority tasks is always null: • For a system scheduled with FP, an upper bound on the WCRT of task k can be given by the fixed point of Rk in the iteration: 1. 2.
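A sketch of the fixed-priority variant is given below. It assumes tasks are indexed by decreasing priority, so that only tasks with a smaller index can interfere with task k. That is how this sketch reads "the interference on higher priority tasks is always null". The `(C, D, T)` tuples, the function name and the workload formula (the same slack-aware reconstruction as in the generic case) are assumptions, and parameters are taken to be integers.

```python
def rta_fp_global(tasks, m):
    """Response-time bounds under global FP on m CPUs, or None on failure.

    tasks: list of (C, D, T) integer tuples, highest priority first.
    A single pass suffices: when task k is analysed, the slacks of all
    higher-priority tasks are already final.
    """
    bounds, slacks = [], []
    for k, (C_k, D_k, _) in enumerate(tasks):
        R = C_k
        while True:
            total = 0
            for i in range(k):                 # higher-priority tasks only
                C, D, T = tasks[i]
                shifted = R + D - C - slacks[i]
                n = shifted // T
                workload = n * C + min(C, shifted - n * T)
                total += min(workload, R - C_k + 1)
            R_next = C_k + total // m
            if R_next > D_k:
                return None                    # no bound within the deadline
            if R_next == R:
                break                          # fixed point reached
            R = R_next
        bounds.append(R)
        slacks.append(D_k - R)
    return bounds


fp_bounds = rta_fp_global([(1, 4, 4), (1, 4, 4), (2, 6, 6), (3, 8, 8)], m=2)
fp_fail = rta_fp_global([(2, 2, 2), (2, 2, 2), (1, 2, 2)], m=2)
```

For the first task set the bounds are 1, 1, 3 and 6. In the second, two tasks with utilization 1 occupy both processors permanently, so no bound exists for the third task.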

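  29. RTA for EDF • The bound used for generic work-conserving schedulers is still valid • A different bound can be derived analyzing the worst-case workload in a situation in which: • The interfering and interfered tasks have a common deadline • All jobs execute as late as possible [Figure: timeline of task τi with parameters Ci, Di, Ti and slack Si, aligned with the deadline Dk of the interfered task.]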

  30. RTA for EDF • For a system scheduled with EDF, an upper bound on the WCRT of task k can be given by the fixed point of Rk in the iteration: 1. 2. If a fixed point Rk ≤ Dk is reached for every task k in the system, the task set is schedulable with global EDF.

  31. Complexity • Pseudo-polynomial complexity. • Depends on the order in which the slack lower bounds are updated. • We verified the schedulability of millions of task sets in a few minutes on a normal device. • Test particularly fast for Fixed Priority systems: at most one slack update per task, if slacks are updated in decreasing priority order.

  32. Experimental results for EDF • 2 processors • Constrained deadlines • 1.000.000 task sets generated • Our test is consistently superior at all utilizations [Figure: number of task sets versus task set utilization. Curves: total task sets generated, RTA-EDF (our test), Goossens et al.’03, Baker et al.’07, Bertogna et al.’05. The gap is marked as the improvement over existing solutions.]

  33. Experimental results for FP • 2 processors • Constrained deadlines • 1.000.000 task sets generated • Our test is consistently superior at all utilizations [Figure: number of task sets versus task set utilization. Curves: total task sets generated, RTA-FP (our test), Bertogna et al.’05, Baker et al.’07, density bound.]

  34. FP vs EDF • 4 processors • Constrained deadlines • 1.000.000 task sets generated • Our FP test is consistently superior to all tests at every utilization [Figure: number of task sets versus task set utilization. Curves: total task sets generated, RTA-FP (our FP test), Baker et al.’07, RTA-EDF (our EDF test), Goossens et al.’03.]

  35. Evaluations • Our test behaves better than any existing polynomial and pseudo-polynomial schedulability test in literature • However, it doesn’t dominate all of them • Resource augmentation bound needed • The test is also sustainable (probably a non-trivial resource augmentation bound can be achieved)

  36. Conclusions • Multiprocessor Real-Time systems are a promising field to explore. • Still few existing results, far from tight conditions. • We contributed to filling this gap. • Future work: • Find tighter schedulability tests. • Use our techniques to analyze the efficiency of other scheduling algorithms (EDZL, EDF-US, FP-DS, etc). • Take into account exclusive resource access. • Integrate into Resource Reservation framework.

  37. Thank you. Marko Bertogna, PhD student, marko@sssup.it. RETIS Lab, Real-Time Systems Laboratory

  38. Moore’s law effects [Figure: power (W) versus year for processors 4004, 8008, 8080, 8085, 8086, 286, 386, 486, Pentium, P1, P2, P3 and P4, with reference levels “Hot-plate” and “Nuclear Reactor” and a STOP mark.] Pentium Tejas cancelled! Clock speed limited to less than 4 GHz. Leakage current intolerable @ 90nm.

  39. Motivations • Improve computing performance at reasonable power consumption. • Multiprocessor-based architectures: • High-level computing: Intel’s Pentium D, Core 2 Duo, Itanium and Xeon; AMD’s Opteron, Quad FX and Athlon64 X2; etc. • Embedded market: TI’s OMAP, NXP’s Nexperia, STM’s Nomadik, ARM’s MPCore, Sony-IBM-Toshiba’s Cell, and many others. • How to program these devices?

  40. Multiprocessor scheduling anomalies • Scheduling problem is in general NP-hard. • Schedulability problem is NP-hard as well. • Dhall’s effect significantly degrades the performance of classical scheduling algorithms. • Synchronous instant is not “critical”. • Only sufficient schedulability conditions. [Figure: example schedule ending in a deadline miss.]
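Dhall’s effect can be illustrated with the classical textbook construction below. The specific task set is not taken from the slide and is given only as a worked example, for $m$ processors and a small $\epsilon > 0$.

```latex
\begin{align}
  \tau_1, \dots, \tau_m &: \quad C_i = 2\epsilon, \quad D_i = T_i = 1 , \\
  \tau_{m+1} &: \quad C_{m+1} = 1, \quad D_{m+1} = T_{m+1} = 1 + \epsilon , \\
  U_{tot} &= 2 m \epsilon + \frac{1}{1 + \epsilon}
  \;\longrightarrow\; 1 \quad \text{as } \epsilon \to 0 .
\end{align}
```

With a synchronous release under global EDF (or RM), the $m$ light tasks have the earlier deadlines and occupy all processors during $[0, 2\epsilon)$. The heavy task therefore completes at $1 + 2\epsilon > 1 + \epsilon$ and misses its deadline, even though the total utilization is barely above 1 on $m$ processors.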
