
Phase II PI Meeting


Presentation Transcript


  1. DARPA:ARMS Phase II PI Meeting Lockheed Martin Advanced Technology Laboratories April 11-13, 2006

  2. Team Overview — Phase II Activity, Extended Team
  • Gate Test 1: Michael Price, Ed Mulholland, & Tom Damiano (ATL), Matt Gillen (BBN), Doug Stuart (Boeing), John Cosgrove (Raytheon), Will Otte (Vanderbilt)
  • Gate Test 2: Gautam Thaker (ATL), Raj Rajkumar & Gaurav Bhatia (CMU), Joe Cross (DARPA)
  • Certification Technologies: Gautam Thaker (ATL), Chenyang Lu & Yuanfang Zhang, Chris Gill (Washington University in St. Louis)
  • Company Resource Management: Don Krecker (ATL), Blake Ross (LM), Rose Daley & I-Jeng Weng (APL), Yiaming Je (BBN)
  • CIAO DDS Integration: Ming Xiao (Vanderbilt)
  • Technology Transition: Patrick Lardieri & Tom Damiano (ATL), Doug Schmidt (Vanderbilt)
  • Resource Allocation and Control Engine (RACE): Tom Damiano & Ed Mulholland (ATL), Jaiganesh Balasubramanian, Will Otte, Nilabja Roy, Nishanth Shankaran (Vanderbilt)
  • Project Leadership: Tom Damiano, Patrick Lardieri, Gautam Thaker

  3. Technical Accomplishments/Progress — Phase II, Gate Test I: Experimental Results

  4. Technical Accomplishments/Progress — Phase II, Gate Test 1: CONOPS - Do No Harm
  Gate Test 1 was conducted using two scenarios, GT-1A and GT-1B, involving two pools, three nodes per pool (nodes Mako, Javelin, Champion, Hogfish, Checkmate, and Chaparal), and two application strings:
  • GM3-string 1.1: ed-1, ed-2, plan-3, plan-1, cfgop-1, eff-1, eff-7, eff-8, eff-12, eff-13
  • GM3-string 2.2: smm-1, plan-3, plan-4, plan-1
  GT-1A — Pre-Condition: The TSCE is operating normally. Scenario: A fault occurs and is detected by the MLRM. The MLRM begins dynamic reconfiguration; an artificial fault induced within the MLRM causes the reconfiguration to fail. The MLRM detects the failure to dynamically reconfigure and deploys a feasible static configuration. Post-Condition: The TSCE is operating with the static configuration.
  GT-1B — Pre-Condition: The TSCE is in an MLRM-determined configuration following a failure(s). Scenario: A human operator signals the system to 'fall back' to a feasible static configuration. Post-Condition: The TSCE is operating with the static configuration.

  5. Technical Accomplishments/Progress — Phase II, Gate Test 1A: Final Experimental Results
  [Plots: run-sequence, lag, normal-probability, and histogram plots of elapsed time (ms) per test case; outliers are due to the non-RT OS.]
  Scenario timeline (with data collection period): Pool 1.B fails; pool failure detected; Pool Manager receives new deployment; Resource Allocator executes; induced error occurs; PM detects RA error; IA notified of redeploy failure; PM receives static fallback; WLGs started; IA declares redeployment complete; app performs useful work.
  Results on ARMS wiki. Code base: CVS branch PHASE2_GM1. Environment: Emulab build phase2-gm1-emulholl.

  6. Technical Accomplishments/Progress — Phase II, Gate Test 1B: Final Experimental Results
  Scenario timeline (with data collection period): system in MLRM-determined state; operator initiates fallback; PM receives static fallback; ASM suspends execution of affected apps; NP kills affected apps; IA notified of static deployment request; WLGs started; ASM starts/resumes new apps; IA declares redeployment complete; app performs useful work. Note: the timeline includes startup of the WLGs.
  Results on ARMS wiki. Code base: CVS branch PHASE2_GM1. Environment: Emulab build phase2-gm1-emulholl.

  7. Technical Accomplishments/Progress — Phase II, Gate Test 1: Gate Test Completed
  GT-1A metrics:
  • Does the MLRM deploy a feasible static configuration? YES
  • Time between the occurrence of the fault and restored operation using the statically defined configuration: mean = 68 ms.
  GT-1B metrics:
  • Does the MLRM deploy a feasible static configuration? YES
  • Time between the issuance of the command and restored operation using the statically defined configuration: current mean = 315.2 s.
  Gate Test Passed!

  8. Phase II - Gate Test II: Experimental Results

  9. Technical Accomplishments/Progress — Phase II, Gate Test 2: Objectives
  No time limit was specified; we assume a feasible allocation is to be found within 1 second. Acceptable failure probability (the "meteorite bound"):

    Duration                                    1 sec      1 yr         5 yrs        10 yrs
    Probability of a meteor strike within it    1.0E-15    3.1536E-08   1.5768E-07   3.1536E-07

  • Provide efficient algorithms for finding a feasible allocation solution, when one exists, for Bob(X)-scale problems and beyond.
  • Exploit special features of the practical problem in a provable way: presence of 'slack' in the packing, discrete object sizes, and the expected number of bins and/or objects.
  • Employ an ensemble approach: run multiple heuristics in sequence (or in parallel). If each heuristic does better in a particular part of the problem space, a solution will be found by one of them with very high probability.
  • The framework uses multiple heuristics in sequence until one succeeds or all fail; the sequence is ordered based on properties of the problem set, e.g. non-zero slack, zero slack, size ratio, etc. (see the sketch below).
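The sequential ensemble can be pictured as a small driver that walks an ordered list of heuristics until one returns a feasible packing. The sketch below is illustrative only — the function names, the FFD heuristic shown, and the toy data are assumptions, not the ARMS code — but the zero-slack example mirrors the stress cases described above.

```python
# Illustrative sketch of the sequential-ensemble scheme: try each heuristic
# in order until one returns a feasible allocation, or report total failure.

def run_ensemble(heuristics, objects, bins):
    for name, heuristic in heuristics:
        allocation = heuristic(objects, bins)
        if allocation is not None:        # feasible packing found
            return name, allocation
    return None, None                     # every heuristic failed

def first_fit_decreasing(objects, bins):
    """Place each object (largest first) into the first bin with room."""
    loads = [0] * len(bins)
    placement = {}
    for obj, size in sorted(objects.items(), key=lambda kv: -kv[1]):
        for i, cap in enumerate(bins):
            if loads[i] + size <= cap:
                loads[i] += size
                placement[obj] = i
                break
        else:
            return None                   # no bin fits: infeasible for FFD
    return placement

if __name__ == "__main__":
    # Hypothetical zero-slack instance: demands sum exactly to total capacity.
    objects = {"ed-1": 60, "plan-3": 40, "eff-7": 40, "smm-1": 30, "eff-12": 30}
    bins = [100, 100]
    print(run_ensemble([("FFD", first_fit_decreasing)], objects, bins))
```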

  10. Technical Accomplishments/Progress — Phase II, Gate Test 2: Ensemble & Results
  [Plot: runtime (ms) vs. problem size, with the 1-second limit marked. Only a small number of size-3.3 cases fail the strict GT-2 test.]
  • 100,000 tests each
  • ~0.5% quantization (bin size of 210; object sizes are multiples of 1)
  • Problem size x: x² bins and x³ objects
  • Ensemble heuristics (the load-balancing WFD is sketched below):
    • WFD (Worst-Fit-Decreasing): spreads objects across bins (load-balancing heuristic)
    • FFD (First-Fit-Decreasing)
    • BFD (Best-Fit-Decreasing)
    • Efficient SubsetSums enumeration
    • Base SubsetSums with preference for low-homogeneity subset sums
    • Base SubsetSums with preference for high-homogeneity subset sums
    • LSUBS (developed by Gautam Thaker)
    • Java Kimchee (developed by Dr. Joe Cross)
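For contrast with FFD, here is a hedged sketch of the worst-fit-decreasing heuristic named above; the data and names are again illustrative, but it shows why a pure load-balancing rule succeeds on far fewer zero-slack sets (compare the WFD row in the table on the next slide) and therefore is only one member of the ensemble.

```python
def worst_fit_decreasing(objects, bins):
    """WFD: place each object (largest first) into the currently emptiest bin."""
    loads = [0] * len(bins)
    placement = {}
    for obj, size in sorted(objects.items(), key=lambda kv: -kv[1]):
        i = min(range(len(bins)), key=lambda b: loads[b])   # least-loaded bin
        if loads[i] + size > bins[i]:
            return None                  # even the emptiest bin has no room
        loads[i] += size
        placement[obj] = i
    return placement

# On the zero-slack example from the previous sketch, WFD fails: after
# load-balancing the first four objects the bins sit at 90 and 80, and neither
# has room for the final size-30 object, whereas FFD succeeds. This coverage
# gap is exactly what the ensemble of heuristics is meant to close.
```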

  11. Technical Accomplishments/Progress — Phase II, Gate Test 2: Ensemble Runs for size = 3.3
  Randomly generated 100,000 zero-slack task sets for the most difficult size-3.3 case.

    Heuristic                              # Successes   % Success   % Failure
    WFD                                    25            0.00025     0.99975
    FFD                                    1,997         0.01997     0.98003
    BFD                                    2,349         0.02349     0.97651
    Efficient Subset Sums                  91,451        0.91451     0.08549
    Subset Sums with Lo Homogeneity        97,120        0.9712      0.0288
    Subset Sums with Hi Homogeneity        97,811        0.97811     0.02189
    LSubs with a 1-second timeout          99,332        0.99332     0.00668
    Kimchee with a 60-second timeout       99,992        0.99992     0.00008

  Complete failure probability if the heuristics were independent: 2.10743E-11.

  12. Technical Accomplishments/Progress — Phase II, Gate Test 2: Ensemble Observations
  • The ensemble approach is an excellent scheme to adopt:
    • A collection of heuristics (each of which has a < 100% success rate) can yield a 100% success rate.
    • Runtimes decrease significantly, since the most complex schemes are invoked only when the efficient ones fail.
  • However, as used, it does NOT meet the strict 1-second time limit we assumed in GT-2; it can take 20 seconds or longer in the worst case.
  A practical assumption: accept that there are quantization levels.

  13. Technical Accomplishments/Progress — Phase II, Gate Test 2: The Quantization Effect
  Bin-packing ensemble failure probability (from 10^9 cases); failures occur only for size 3.3. The compounding formula is checked in the snippet below.

    Quantization level                          ~0.5%       1%          2.5%    5%      Acceptance threshold
    # of failures with a 1 s timeout            8042        237         0       0       -
    P(allocation failure, 1 s timeout) = p      8.04E-06    2.37E-07    0*      0*      1.0E-15
    P(at least one failure if an allocation
      occurs every hour for 1 year)
      = 1-(1-p)^(365*24)                        6.93E-02    2.07E-03    0       0       3.1536E-08
    ... for 5 years = 1-(1-p)^(5*365*24)        3.02E-01    1.03E-02    0       0       1.5768E-07
    ... for 10 years = 1-(1-p)^(10*365*24)      5.12E-01    2.05E-02    0       0       3.1536E-07

  * Many more samples are needed to observe this extremely improbable event.
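As a quick sanity check of the compounding arithmetic in the table, the hypothetical snippet below applies the 1-(1-p)^(hours) formula to the ~0.5% column; the printed values land close to the tabulated ones, with the small differences attributable to rounding of p.

```python
# If a single allocation fails with probability p and one allocation happens
# every hour, the probability of at least one failure over `years` years is
# 1 - (1 - p)**(years * 365 * 24).
def failure_over_years(p, years):
    return 1.0 - (1.0 - p) ** (years * 365 * 24)

p = 8042 / 1e9      # ~0.5% quantization, 1 s timeout: 8042 failures in 10^9 runs
for years in (1, 5, 10):
    print(years, f"{failure_over_years(p, years):.2e}")
# Prints approximately 6.80e-02, 2.97e-01, and 5.06e-01 — close to the table's
# ~0.5% column (6.93E-02, 3.02E-01, 5.12E-01).
```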

  14. Technical Accomplishments/Progress — Phase II, Gate Test 2: Related Observations
  • Other failure thresholds considered in practice:
    • The Air Traffic Control availability requirement is 99.99999%, i.e. a failure probability of 10^-7 at any given instant.
    • Hardware/software failure probability is on the order of 10^-7 to 10^-8 even in reliable systems.
  • In the (very unlikely) event of an allocation failure, critical tasks can be allocated very efficiently first (with non-zero slack >= 20%, even the basic heuristics succeed).
  • As the 1-second time limit is relaxed, the failure probability decreases exponentially, even at low quantization levels:
    • With a 10-second timeout and 0.5% granularity, the probability of allocation failure over 1 year drops to 3.36E-04 (from 6.93E-02).
    • With a 50-second timeout and 0.5% granularity, the probability of allocation failure over 1 year drops to 1.24E-07 (from 6.93E-02).

  15. Technical Accomplishments/Progress — Phase II, Gate Test 2: Gate Test Completed
  • Gate Test II requirements have been satisfied:
    • A feasible allocation was found over independent, large-sample problem sets.
    • A feasible allocation was found in all cases in less than 1 second, except size_3_3, where there were a small number of outliers.
    • The solution was demonstrated for no-slack stress cases and for more realistic slack cases.
    • A careful study of the impact of item-size distribution, item-size quantization, and overall problem size was completed.
    • Parallel ensemble execution shows that a collection of heuristics (each of which has a < 100% success rate) can yield 100% success overall.
    • With allowance for quantization, even the most demanding cases can meet the "meteorite bound".
  • Related additional research completed beyond the strict requirements: extension to multi-dimensional bin-packing, where constraints along each dimension must be satisfied. In the Bob(X) context, the processor-utilization, network-utilization, and memory dimensions are typical (see the sketch below).
  Gate Test Passed!
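The multi-dimensional extension mentioned above generalizes the fit test: an object only fits in a bin if every dimension has room. The sketch below, with illustrative names and toy data rather than the ARMS planner itself, shows a first-fit-decreasing variant over CPU, memory, and network demands.

```python
# Multi-dimensional first-fit-decreasing: a placement is feasible only if the
# CPU, memory, and network constraints are all satisfied on the chosen node.
DIMS = ("cpu", "mem", "net")

def fits(load, size, capacity):
    return all(load[d] + size[d] <= capacity[d] for d in DIMS)

def multi_dim_first_fit_decreasing(objects, bins):
    loads = [{d: 0.0 for d in DIMS} for _ in bins]
    placement = {}
    # Sort by total demand across dimensions, largest first.
    for name, size in sorted(objects.items(), key=lambda kv: -sum(kv[1].values())):
        for i, capacity in enumerate(bins):
            if fits(loads[i], size, capacity):
                for d in DIMS:
                    loads[i][d] += size[d]
                placement[name] = i
                break
        else:
            return None          # infeasible along at least one dimension
    return placement

# Hypothetical usage: two identical nodes, two components with 3-D demands.
nodes = [{"cpu": 1.0, "mem": 512.0, "net": 100.0} for _ in range(2)]
apps = {"eff-7": {"cpu": 0.4, "mem": 200.0, "net": 30.0},
        "plan-3": {"cpu": 0.7, "mem": 250.0, "net": 40.0}}
print(multi_dim_first_fit_decreasing(apps, nodes))
```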

  16. Node Alive Research and Results

  17. Technical Accomplishments/Progress — Node Failure Detection: UDP Push Model
  UDP-PUSH approach to node-failure detection: all clients "push" node-alive messages to a monitor at 100 Hz (a minimal sketch of the scheme follows below).
  • Inter-arrival of messages at the monitor should be 10 ms, confirmed in the collected data (see graphic).
  • The Node-Alive Monitor "sweeps" over received messages at 50 Hz.
  • The monitor declares a client node failed after 2 sweeps without receiving a beat from that client.
  • Current testing is on Emulab using up to 20 real nodes and 380 virtual nodes.
  • Failures are simulated by the clients by suppressing 10 messages at every 60-second mark.
  • Fastest detection is 40 ms, slowest 60 ms — confirmed in current testing (see graphic).
  • An RT Linux kernel was used to obtain accurate 100 Hz and 50 Hz loops (Ingo Molnar kernel with real-time patches, version 2.6.15-rt15-smp).
  • With 380 nodes the monitor receives 38,000 messages/sec; monitor load has been observed to be about 8%.
  • It is estimated that a hierarchical solution (not yet implemented) will cut this to < 2%, at the cost of an increase in maximum detection time.
  • In current tests no UDP packets are lost, so there are no false alarms.
  • Further testing and a hierarchical implementation are underway.
  [Graphic: 10 ms mean inter-arrivals; 2.2% of samples exceed the theoretical max of 60 ms.]
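Below is a minimal sketch of the push/sweep scheme just described: a 100 Hz client push, a 50 Hz monitor sweep, and the two-missed-sweeps failure rule. The port, message format, and use of plain Python sockets and timers are illustrative assumptions; the measured 40-60 ms detection window additionally depended on the RT-patched kernel noted above.

```python
import socket, time

MONITOR = ("127.0.0.1", 9999)        # hypothetical monitor address

def client(node_id: str):
    """Push a node-alive datagram to the monitor at 100 Hz."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    while True:
        sock.sendto(node_id.encode(), MONITOR)
        time.sleep(0.01)             # 100 Hz push

def monitor(max_missed_sweeps: int = 2):
    """Sweep received beats at 50 Hz; declare failure after 2 silent sweeps."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind(MONITOR)
    sock.setblocking(False)
    missed = {}                      # node_id -> sweeps without a beat
    while True:
        time.sleep(0.02)             # 50 Hz sweep
        seen = set()
        try:
            while True:
                data, _ = sock.recvfrom(64)
                seen.add(data.decode())
        except BlockingIOError:
            pass                     # drained all queued beats
        for node in seen:
            missed[node] = 0
        for node in list(missed):
            if node not in seen:
                missed[node] += 1
                if missed[node] >= max_missed_sweeps:
                    print(f"node {node} declared failed")
                    del missed[node]
```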

  18. Technical Accomplishments/Progress — Node Failure Detection: Performance
  Acceptable real-time performance with up to 1,000 clients.
  • Observed a 5x increase in CPU load when using Linux with complete preemption patches.
  • Initiated technical exchanges with the RT Linux group (Ted Ts'o, Ingo Molnar, others).

  19. Technical Accomplishments/Progress — Node Failure Detection: Performance (continued)
  SMP kernel with preemption patches has 3x larger latency.

  20. Certification of DRM Systems — Technologies and Methods

  21. Technical Accomplishments/Progress — Certification: Problem
  Problem space:
  • Certification is a process to verify that system behavior remains within safety and effectiveness parameters.
  • DRE system effectiveness typically requires performing a subset of tasks within temporal bounds or deadlines; in many cases the deadlines apply to an end-to-end string.
  • Dynamic resource management generates new system configurations and thereby moves part of the certification process into the system runtime.
  Solution space:
  • Use schedulability analysis techniques (periodic, strict aperiodic, and transient periodic tasks triggered by aperiodic events) to predict whether a particular allocation will meet deadlines, while bounding pessimism.
  • Automate the process of determining feasible and appropriate deployment placements by providing algorithms for release, development, and integration that determine the appropriate allocation based on the QoS requirements and constraints of the applications and operational strings.
  • Approach certification with both Simple DRM Capabilities and Full DRM Capabilities.

  22. Technical Accomplishments/Progress — Certification: Approach
  Simple DRM Capabilities:
  • Add constraint capabilities to current allocation methodologies (Phase II): mutual placement constraints (e.g. replicas) and attribute-matching constraints (e.g. OS type)
  • Introduce multi-dimensional bin-packing algorithms (Phase II)
  • Engineering support tools (Phase II)
  • Provide capabilities to Bob(X) to build a pedigree of cases, with support from static generation tools (Phase III)
  • ARMS (Phase III)
  • Provide RACE capability for online use (Phase III)
  Full DRM Capabilities:
  • Schedulability method for simple QoS allocation (Phase I)
  • Schedulability method for ARMS (Phase II)
  • Constraint method (Phase III)
  • Online capabilities in RACE (Phase III): constraint-capable bin-packer planner, attribute-matching constraints (e.g. OS type)
  • Full QoS allocation (Phase III)
  • Verification (Phase III): delta from static plans with small perturbations

  23. Technical Accomplishments/Progress — Certification: QoS-Driven Allocation Tools
  Capabilities (usable offline, for deployment configuration, and online, at runtime):
  • Offline tool-suite with integrated algorithm support for generating static deployment plans and researching new algorithms.
  • Pluggable algorithms, usable "as-is" both online and offline; i.e. the same components run online (within RACE) and offline (within the tool-suite) — see the sketch below.
  • Statistical history capture.
  • Output adaptation to accommodate varying needs for deployment-configuration file generation.
  • Support for ensemble algorithm runs.
  • Flexible test-input distribution generation for validating algorithms, and extensions for schedulability analysis.
  • Variations of simple bin-packing and heuristics-based algorithms for more challenging (e.g. zero-slack) problems.
  • Multi-dimensional variations of allocation algorithms, including 3-D bin-packing along CPU, memory, and network-bandwidth dimensions.
  • Constraint-based allocation.
  • Incorporation of schedulability.
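The "pluggable algorithms" bullet above amounts to a plugin interface: one allocator contract that both the offline tool-suite and the online RACE planner can call, so the same component runs in either context. The class and method names below are assumptions for illustration, not the actual RACE API.

```python
from abc import ABC, abstractmethod
from typing import Dict, List, Optional

class AllocationAlgorithm(ABC):
    """One interface for planners used offline (tool-suite) and online (RACE)."""
    name: str = "unnamed"

    @abstractmethod
    def allocate(self, demands: Dict[str, float],
                 capacities: List[float]) -> Optional[Dict[str, int]]:
        """Return a component -> node-index plan, or None if infeasible."""

class FirstFit(AllocationAlgorithm):
    name = "first-fit"

    def allocate(self, demands, capacities):
        loads = [0.0] * len(capacities)
        plan = {}
        for comp, demand in demands.items():
            for i, cap in enumerate(capacities):
                if loads[i] + demand <= cap:
                    loads[i] += demand
                    plan[comp] = i
                    break
            else:
                return None
        return plan

# The offline tool-suite would iterate registered AllocationAlgorithm instances
# to generate static deployment plans; the online planner would invoke the same
# allocate() call at runtime, so exactly the same component serves both uses.
```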

  24. Technical Accomplishments/Progress — Towards Certification: Aperiodic Tasks - Overview
  • Schedulability analyses for end-to-end aperiodic tasks with hard deadlines
    • 1st approach: Aperiodic Utilization Bound (AUB) - online
    • 2nd approach: Deferrable Server (DS) - offline
  • Accomplishments
    • Implemented AUB and DS schedulability analyses
    • Developed heuristics for tuning the Deferrable Server
    • Compared the two approaches via numerical studies
    • Implementation on the TAO federated event channel — the first DS implementation in middleware — and online admission control based on AUB
    • Empirical results on a Linux cluster: validation of the schedulability analysis and run-time overhead
  • Ongoing
    • Developing deferrable-server mechanisms in TAO's federated event channel
    • Validating schedulability analyses via empirical studies on TAO

  25. Technical Accomplishments/Progress — Towards Certification: Deferrable Server Overview
  • Incorporate aperiodic tasks into periodic scheduling
    • Server: a periodic task responsible for processing aperiodic requests
    • Budget: the maximum time the server can run in a period
  • Algorithm
    • The server is suspended when its budget runs out, bounding the aperiodic tasks' impact on periodic tasks
    • The budget is replenished at the beginning of each period
  Implementation
  • Challenge: implement bandwidth-preserving servers on top of priority-based operating systems.
  • Solution (sketched below)
    • A server thread processes aperiodic events (2nd-highest priority)
    • A budget thread manages the budget and controls the execution of server threads (highest priority)
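The following toy sketch shows the mechanism in miniature: a budget manager replenishes the budget every server period, and aperiodic requests are served only while budget remains. It is illustrative only; the real implementation sits in TAO's federated event channel and relies on OS thread priorities (budget thread highest, server thread second-highest), which plain Python threads cannot express.

```python
import threading, queue, time

class DeferrableServer:
    def __init__(self, period_s=0.1, budget_s=0.02):
        self.period, self.full_budget = period_s, budget_s
        self.budget = budget_s
        self.requests = queue.Queue()        # pending aperiodic requests
        self.lock = threading.Lock()

    def budget_manager(self):
        """Would run at the highest priority: replenish budget each period."""
        while True:
            time.sleep(self.period)
            with self.lock:
                self.budget = self.full_budget

    def server(self):
        """Would run at the 2nd-highest priority: serve requests while budget lasts."""
        while True:
            cost, work = self.requests.get()   # (execution time, callable)
            while True:
                with self.lock:
                    if self.budget >= cost:
                        self.budget -= cost
                        break
                time.sleep(0.001)              # effectively suspended: no budget
            work()

    def submit(self, cost, work):
        self.requests.put((cost, work))
```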

  26. Technical Accomplishments/Progress — Towards Certification: Deferrable Server Validation
  • Correctness: no schedulable task sets had deadline misses.
  • Pessimism: some of the unschedulable task sets also met deadlines.
  (4 processors; 4 aperiodic tasks + 8 periodic tasks)

  27. Technical Accomplishments/Progress — Towards Certification: Deferrable Server Overhead
  • Budget manager: < 89 µs per server period
  • Server thread: < 159 µs per aperiodic subtask

  28. Technical Accomplishments/Progress — Towards Certification: Admission Control (AC)
  • Central admission controller for end-to-end tasks.
  • Admission test: if the system remains within the feasible region, admit the new task into the system and increase the synthetic utilization.
  • Decrement the synthetic utilization at the deadlines of aperiodic tasks and, by the resetting rule, when the CPU idles.
  AC policies:
  • Soft tasks: send an event to notify the central admission controller; hold the task in a waiting queue and wait for the reply.
  • Hard tasks: release immediately, then notify the AC; the AC may eject soft periodic tasks when it receives the notification.
  • Aperiodic tasks: run the admission test for every job; when the CPU idles, the idle thread reports the departed aperiodic tasks to the AC.
  • Periodic tasks: admit once and maintain a reservation for the task.
  A sketch of the admission test appears below.
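The admission test reduces to a small amount of bookkeeping on a synthetic-utilization value, as the illustrative sketch below shows. The feasibility bound used here is a placeholder constant, not the AUB bound from the ARMS work, and the method names are assumptions.

```python
class AdmissionController:
    def __init__(self, bound=0.58):            # placeholder feasibility bound
        self.bound = bound
        self.synthetic_util = 0.0

    def request(self, exec_time, deadline):
        """Admission test for one aperiodic job or one periodic task."""
        contribution = exec_time / deadline
        if self.synthetic_util + contribution <= self.bound:
            self.synthetic_util += contribution
            return True                        # admit
        return False                           # reject (or eject soft tasks)

    def deadline_passed(self, exec_time, deadline):
        """Decrement synthetic utilization at an aperiodic task's deadline."""
        self.synthetic_util -= exec_time / deadline

    def cpu_idle(self):
        """Resetting rule: an idle CPU clears the synthetic utilization."""
        self.synthetic_util = 0.0
```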

  29. Technical Accomplishments/Progress — Towards Certification: AC Latency for Soft Tasks
  • Round-trip latency for admitting a soft task.
  • Hard tasks are admitted immediately.

  30. Technical Accomplishments/Progress — Towards Certification: AC – Admission Ratio
  • Online admission control significantly outperformed offline analysis; all task sets are unschedulable under offline analysis.
  • Resetting significantly increased the number of admitted tasks.
  (3 processors + 1 AC processor; 4 soft aperiodic tasks and 5 soft periodic tasks)

  31. RACE Workshop and Demonstration

  32. RACE Demo and Workshop — Tool Chain: Demonstration Highlights
  The RACE demonstration is composed of three scenarios. These scenarios involve RACE (control and allocation), DAnCE, PICML, CUTS, CoWorkEr, and the BMW elements. Many of the initial capabilities being shown will support GT-4, or are extendable to do so.
  [Diagram: PICML produces a hierarchical plan; RACE converts it into a flat deployment plan (and a modified flat plan) that DAnCE deploys.]
  • Scenario 1 demonstrates RACE control by reacting to deadline misses in a critical path modeled into the RT1H operation string. The critical path exceeds its EED threshold due to the introduction of a competing operation string that consumes excessive CPU.
  • Scenario 2 demonstrates the ability of the tool chain to handle shared components. Two operation strings are deployed with shared components between them. After deployment, one string is torn down to show that the other (involving the shared component) is still operational.
  • Scenario 3 demonstrates FT extensions to PICML to capture fault-tolerance requirements. The concepts of SRG and FOU are shown, and an integrated interpreter is used to run an offline constraint-based algorithm for replica placement.

  33. RACE Demo and Workshop — Tool Chain: New Capability Highlights
  The RACE demonstration highlights many of the new capabilities developed for the RACE framework and related tool chain, many of which are intended to support GT-4.
  • Importance attribute (supports GT-4)
  • Static and dynamic plans (supports GT-4)
  • Component dynamic placeability attribute (supports GT-4)
  • Shared components (supports GT-4)
  • Hierarchical descriptors (supports GT-4)
  • PICML modifications (supports GT-4): FT elements, shared components, QoS attributes
  • DAnCE modifications (supports GT-4): ReDAC, priority control, component-process mapping, shared-component support
  • Web and interactive input adapters
  • RACE control (supports GT-4): EED monitoring, reactive control of OS priority based on importance
  • WLG-2 capabilities: code generation, BMW integration, BDC integration
  • Ensemble planner
  • Target Manager
  • Fault-model elements: failover unit, replication group, CCM IOGRs, shared risk group, and constraint-based allocation (metrics such as distance and co-failure, integrated in the interpreter for offline analysis; motivates constraint-based allocation)

  34. RACE Demo and Workshop — Tool Chain: Future Capabilities
  The RACE framework and tool chain will require additional capabilities to support the current GT-4 CONOPS.
  • RACE follow-on work
    • Plan state (supports GT-4): ReDAC integration, (re)plan on importance, include FT simplex deployments, integration of the node-alive solution
    • Events on plan progress and status (supports GT-4)
    • Warfighter value/importance constraints on placement (supports GT-4)
    • Submission of multiple plans simultaneously (supports GT-4)
  • Multi-D planner
    • Multiple heuristics: FFD, WFD, BFD, Efficient Subset Sums
    • Three modeled dimensions: CPU utilization, memory, network bandwidth. Algorithms are available and initial development is done.

  35. Controller Research — RACE Control Research

  36. Technical Accomplishments/Progress — RACE Control: Flexible Maximum Urgency First (FMUF)
  • Task model
    • Soft, end-to-end deadlines
    • Two types of tasks: critical and non-critical
  • Goals
    • Performance isolation: protect critical tasks against disturbance from non-critical ones
    • Minimize deadline misses: improve overall performance
    • Handle uncertainties and dynamics: task arrival/departure, fluctuation in execution times
  • Practical, application-transparent adaptation
    • Actuator: priority adjustment
    • Sensors: CPU utilization, deadline misses
  • Planned for a future RACE implementation

  37. Technical Accomplishments/Progress — RACE Control: The MUF Approach
  • Two priority classes; each class is scheduled by a real-time policy (RMS, EDF)
  • Critical tasks go in the high-priority class
  • Feedback control: dynamically change the priority class of non-critical tasks based on deadline misses in the high-priority class
    • No misses: non-critical tasks move to the high-priority class
    • Miss: non-critical tasks move to the low-priority class
    • Avoid oscillation based on measured CPU utilization
  • Maximize the number of tasks in the high-priority class without causing deadline misses in that class (see the sketch below)
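The feedback rule can be summarized as a small state update executed each sampling period. The sketch below is illustrative only; the utilization threshold is an assumed knob, not a value from the ARMS controller.

```python
def fmuf_step(misses_in_high_class: int, cpu_util: float,
              noncritical_in_high: bool, util_threshold: float = 0.7) -> bool:
    """Return the new placement of non-critical tasks (True = high-priority class)."""
    if misses_in_high_class > 0:
        return False                  # demote: protect critical tasks
    if not noncritical_in_high and cpu_util < util_threshold:
        return True                   # promote: no misses and measured headroom
    return noncritical_in_high        # otherwise hold state to avoid oscillation
```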

  38. Phase III Future Work

  39. Phase III Ideas — Possible Phase III Research Areas
  • Schedulability analysis (integrated with the allocation/placement problem)
  • Multi-dimensional allocation
  • Constraint-based allocation/placement
  • Certifiability of these approaches, including a framework for testing and researching new algorithms
  • Verifying that allocations meet certification constraints (e.g. differ from a static plan in a specified manner or according to specified rules)
  • Offline and online capability for this analysis and planning
    • Offline tool-suite with integrated algorithm support for generating static deployment plans and researching new algorithms
    • Pluggable algorithms usable "as-is" both online and offline; i.e. the same components run online (within RACE) and offline (within the tool-suite)

  40. Backup Slides — Main Presentation Support Slides

  41. Technical Accomplishments/Progress — Phase II, Gate Test 1A: Test Scenario
  [Diagram: Pool-1.A and Pool-1.B, nodes Mako, Javelin, Hogfish, Champion, Chaparal, and Checkmate, showing the primary and redeployed WLG components (ed, plan, cfgop, smm, eff; cfgop.1 is shared) and the fault detected by the MLRM.]
  • The TSCE is operating normally, as configured by MLRM dynamic allocation.
  • A fault occurs and is detected by the MLRM.
  • The MLRM attempts dynamic re-allocation.
  • An artificial error causes the MLRM dynamic allocation to fail.
  • The MLRM deploys a feasible static allocation.

  42. Technical Accomplishments/Progress — Phase II, Gate Test 1B: Test Scenario
  [Diagram: Pool-1.A and Pool-1.B, nodes Mako, Javelin, Hogfish, Champion, Chaparal, and Checkmate, showing the primary and redeployed WLG components (ed, plan, cfgop, smm, eff; cfgop.1 is shared) and the static allocation request to the MLRM.]
  • The TSCE is operating normally, as configured by the MLRM.
  • The operator elects to fall back to a feasible static allocation.
  • The MLRM tears down the existing dynamically allocated strings.
  • The MLRM deploys a feasible static allocation.

  43. Technical Accomplishments/Progress — Certification Model: Constrained Perturbation
  [Diagram: dynamic plans (DRM-generated plans) are related to a template (analogous to a static plan) through class-transformation relation pairs (φ, φ⁻¹ and Ψ, Ψ⁻¹) with associated constraints and parameters; inverse verification establishes a certifiably constraint-isomorphic, legal dynamic domain for each class of plan. The traditionally certifiable transformation domain feeds the DRM certification gauntlet: feasibly allocatable, feasibly schedulable, isomorphic transformation, and boolean certification metrics.]

  44. DARPA:ARMS RACE Demonstration and Workshop Lockheed Martin Advanced Technology Laboratories and Vanderbilt University

  45. RACE Demo and Workshop — Tool Chain Demonstration
  The RACE demonstration is composed of three scenarios. These scenarios involve RACE (control and allocation), DAnCE, PICML, CUTS, CoWorkEr, and the BMW elements. Many of the initial capabilities being shown will support GT-4, or are extendable to do so.
  [Diagram: PICML produces a hierarchical plan; RACE converts it into a flat deployment plan (and a modified flat plan) that DAnCE deploys.]
  • Scenario 1 demonstrates RACE control by reacting to deadline misses in a critical path modeled into the RT1H operation string. The critical path exceeds its EED threshold due to the introduction of a competing operation string that consumes excessive CPU.
  • Scenario 2 demonstrates the ability of the tool chain to handle shared components. Two operation strings are deployed with shared components between them. After deployment, one string is torn down to show that the other (involving the shared component) is still operational.
  • Scenario 3 demonstrates FT extensions to PICML to capture fault-tolerance requirements. The concepts of SRG and FOU are shown, and an integrated interpreter is used to run an offline constraint-based algorithm for replica placement.

  46. RACE Demo and Workshop — Physical Demo Setup: ISIS Lab
  [Diagram: a local demo laptop running the RACE Demo GUI connects over the Internet to the ISIS Lab, which hosts PICML, RACE, and DAnCE.]
  wiki.isis.vanderbilt.edu/support/isislab.htm

  47. RACE Control — Critical-Path End-to-End Deadline Monitoring and Reactive Control

  48. RACE Demo and Workshop — RACE Controller: RACE Components
  The RACE Controller receives hierarchical packages & deployment plans from the RACE allocation planners.
  Key elements: Target Manager, RACE Controller, CUTS BDC, and DAnCE.

  49. RACE Demo and Workshop — RACE Controller: Scenario One
  First demo scenario (all deployments occur through DAnCE; the RACE Demo GUI shows each deployment after RACE processing):
  1. Deploy the RT1H operational string, which has an EED requirement specified. View the post-RACE deployment.
  2. Monitor the EED.
  3. Deploy the competing (CPU-hog) Hog_String. View the post-RACE deployment.
  4. Monitor the EED miss.
  5. Observe RACE reactive control.
