Developing dependable systems by maximizing component diversity and fault tolerance

Jeff Tian, Suku Nair, LiGuo Huang,

Nasser Alaeddine and Michael Siok

Southern Methodist University

US/UK Workshop on Network-Centric Operation and Network Enabled Capability, Washington, D.C., July 24-25, 2008


Outline

  • Overall Framework

  • External Environment Profiling

  • Component Dependability:

    • Direct Measurement and Assessment

    • Indirect Assessment via Internal Contributor Mapping

    • Value Perspective

  • Experimental Evaluation

    • Fault Injection for Reliability and Fault Tolerance

    • Security Threat Simulation

  • Summary and Future Work

US/UK NCO/NEC Workshop


Overall Framework

  • Systems made up of different components

  • Many factors contribute to system dependability

    • Our focus: Diversity of individual components

  • Component strength/weakness/diversity:

    • Target: Different dependability attributes and sub-attributes

    • External reference: Operational profile (OP)

    • Internal assessment: Contributors to dependability

    • Value perspective: Relative importance and trade-off

  • Maximize diversity => Maximize dependability

    • Combine strength

    • Avoid/complement/tolerate flaws/weaknesses



Overall Framework (2)

  • Diversity: Four Perspectives

    • Environmental perspective: Operational profile (OP)

    • Target perspective: Goal, requirement

    • Internal contributor perspective: Internal characteristics

    • Value perspective: Customer

  • Achieving diversity and fault tolerance:

    • Component evaluation matrix per target per OP

    • Multidimensional evaluation/composition via DEA (Data Envelopment Analysis)

    • Internal contributor to dependability mapping

    • Value-based evaluation using single objective function



Terminology

  • Quality and dependability are typically defined in terms of conformance to customer’s expectations and requirements

    • Key concepts: defect, failure, fault, and error

    • Dependability: the focus in this presentation

    • Key attributes: reliability, security, etc.

  • Defect = some problem with the software

    • either with its external behavior

    • or with its internal characteristics



Failure, Fault, Error

  • IEEE STD 610.12 terms related to defect:

    • Failure: The inability of a system or component to perform its required functions within specified performance requirements

    • Fault: An incorrect step, process, or data definition in a computer program

    • Error: A human action that produces an incorrect result

  • Errors may cause faults to be injected into the software

  • Faults may cause failures when the software is executed



Reliability and Other Dependability Attributes

  • Software reliability = the probability for failure-free operation of a program for a specified time under a specified set of operating conditions (Lyu, 1995; Musa et al., 1987)

  • Estimated using various models based on defect and time/input measurements

  • Standard definitions for other dependability attributes, such as security, fault tolerance, availability, etc.
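As a minimal illustration of this definition (not a model from the presentation), the sketch below fits a constant failure rate to hypothetical interfailure times and evaluates R(t) = exp(-λt) for a given mission time:

```python
import math

def estimate_reliability(interfailure_times, mission_time):
    """MLE under a constant failure rate: lambda = 1 / MTBF."""
    mtbf = sum(interfailure_times) / len(interfailure_times)
    lam = 1.0 / mtbf
    # Probability of failure-free operation for `mission_time`
    # under the exponential model: R(t) = exp(-lambda * t)
    return math.exp(-lam * mission_time)

# Hypothetical hours between successive failures observed in test
times = [12.0, 8.0, 15.0, 10.0, 5.0]
print(round(estimate_reliability(times, 1.0), 3))  # -> 0.905
```

Practical reliability models (e.g., Goel-Okumoto) let the failure rate decrease as faults are removed; the constant-rate case is just the simplest instance of "failure-free operation for a specified time".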





Diversity: Environmental Perspective

  • Dependability defined for a specific environment

  • Stationary vs dynamic usage environments

    • Static, uniform, or stationary (reached an equilibrium)

    • Dynamic, changing, evolving, with possible unanticipated changes or disturbances

  • Single/overall OP for former category

    • Musa or Markov variation

    • Single evaluation result possible per component per dependability attribute: e.g., component reliability R(i)

  • Environment Profiling for Individual Components

    • Environmental snapshots captured in Musa or Markov Ops

    • Evaluation matrix (later)



Operational Profile (OP)

  • An operational profile (OP) is a list of disjoint operations and their associated probabilities of occurrence (Musa 1998)

  • OP describes how users use an application:

    • Help guide the allocation of test cases in accordance with use

    • Ensure that the most frequent operations will receive more testing

    • As the context for realistic reliability evaluation

    • Other usages, including diversity and internal-external mapping in this presentation
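The test-allocation use of an OP can be sketched as follows; the operation names, probabilities, and test budget are hypothetical:

```python
def allocate_tests(op, total_tests):
    """Allocate test cases in proportion to each operation's usage probability,
    so the most frequent operations receive the most testing."""
    return {name: round(p * total_tests) for name, p in op.items()}

# Hypothetical flat (Musa-style) OP for a small application
op = {"login": 0.40, "browse": 0.35, "order": 0.20, "admin": 0.05}
print(allocate_tests(op, 200))  # -> {'login': 80, 'browse': 70, 'order': 40, 'admin': 10}
```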



Markov Chain Usage Model

  • A Markov chain usage model is a set of states, transitions, and transition probabilities

    • As an alternative to Musa (flat) OP

    • Each link has an associated probability of occurrence

    • Models complex and/or interactive systems better

  • Unified Markov Models (Kallepalli and Tian, 2001; Tian et al., 2003):

    • Collection of Markov Ops in a hierarchy

    • Flexible application in testing and reliability improvement
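A tiny Markov usage model can be sketched as below; the states and transition probabilities are hypothetical, and the probability of any usage path is the product of the transition probabilities along its links:

```python
# Hypothetical usage chain: each state maps to (next state, transition probability)
chain = {
    "Start":  [("Browse", 0.7), ("Search", 0.3)],
    "Browse": [("Search", 0.4), ("Exit", 0.6)],
    "Search": [("Browse", 0.5), ("Exit", 0.5)],
}

def path_probability(path):
    """Probability of one usage path = product of its transition probabilities."""
    p = 1.0
    for src, dst in zip(path, path[1:]):
        p *= dict(chain[src])[dst]
    return p

print(round(path_probability(["Start", "Browse", "Exit"]), 2))  # -> 0.42
```

Unlike a flat OP, the same chain also captures interaction sequences, which is why it models complex or interactive systems better.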



Operational Profile Development: Standard Procedure

  • Musa’s steps (1998) for OP construction:

    • Identify the initiators of operations

    • Choose a representation (tabular or graphical)

    • Create an operations “list”

    • Establish the occurrence rates of the individual operations

    • Establish the occurrence probabilities

  • Other variations

    • Original Musa (1993): 5 top-down refinement steps

    • Markov OP (Tian et al): FSM then probabilities based on log files
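The final two Musa steps (occurrence rates, then occurrence probabilities) can be sketched from a log of initiated operations; the log contents here are hypothetical:

```python
from collections import Counter

# Hypothetical log of initiated operations (e.g., extracted from server logs)
log = ["browse", "browse", "order", "login", "browse",
       "order", "login", "login", "browse", "browse"]

# Occurrence rates -> occurrence probabilities
counts = Counter(log)
total = sum(counts.values())
op = {name: count / total for name, count in sorted(counts.items())}
print(op)  # -> {'browse': 0.5, 'login': 0.3, 'order': 0.2}
```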



OPs for Composite Systems

  • Using standard procedure whenever possible

    • For overall stationary environment

    • For individual component usage => component OP

    • For dynamic environment:

      • Snapshot identification

      • Sets of OPs for each snapshot

      • System OP from individual component OPs

  • Special considerations:

    • Existing test data or operational logs can be used to develop component OPs

    • Union of component OPs => system OP



OP and Dependability Evaluation

  • Some dependability attributes defined with respect to a specific OP: e.g., reliability

    • For overall stationary environment: direct measurement and assessment possible

    • For dynamic environment: OP-reliability pairs

    • Consequence of improper reuse due to different OPs (Weyuker 1998)

  • From component to system dependability:

    • Customization/selection of best-fit OP for estimation

    • Compositional approach (Hamlet et al, 2001)





Diversity: Target Perspective

  • Component Dependability:

    • Component reliability, security, etc. to be scored/evaluated

    • Direct Measurement and Assessment

    • Indirect Assessment (later)

  • Under stationary environment:

    • Dependability vector for each component

    • Diversity maximization via DEA (data envelopment analysis)

  • Under dynamic environment:

    • Dependability matrix for each component

    • Diversity maximization via extended DEA by flattening out the matrix



Diversity Maximization via DEA

  • DEA (data envelopment analysis):

    • Non-parametric analysis

    • Establishes a multivariate frontier in a dataset

    • Basis: linear programming

    • Applying DEA

      • Dependability attribute frontier

      • Illustrative example (right)

      • N-dimensional: hyperplane
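Full DEA solves a linear program per unit; in the degenerate single-input, single-output case the efficiency score reduces to the output/input ratio normalized so the frontier unit scores 1.0. The sketch below uses that degenerate case with hypothetical unit data:

```python
# Hypothetical decision-making units: one input (labor hours), one output each.
units = {
    "A": {"labor_hours": 100, "output": 50},
    "B": {"labor_hours": 80,  "output": 60},
    "C": {"labor_hours": 120, "output": 48},
}

# Output/input ratio per unit; the best ratio defines the efficiency frontier.
ratios = {name: u["output"] / u["labor_hours"] for name, u in units.items()}
frontier = max(ratios.values())
efficiency = {name: round(r / frontier, 2) for name, r in ratios.items()}
print(efficiency)  # -> {'A': 0.67, 'B': 1.0, 'C': 0.53}
```

With multiple inputs and outputs, the weighted sums are chosen per unit by linear programming, and the frontier becomes the hyperplane mentioned above.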



DEA Example

  • Slide chart: production efficiency model, Efficiency = Output/Input
    • Inputs: Labor hours, Software Change Size
    • Outputs: Software Reliability at Release, Defect Density after Test, Software Productivity

  • Lockheed-Martin software project performance with regard to selected metrics and production efficiency model

    • Measures efficiencies of decision-making units (DMUs) using weighted sums of inputs and weighted sums of outputs

    • Compares DMUs to each other

    • Sensitivity analysis affords study of non-efficient DMUs in comparison

    • BCC VRS (variable returns to scale) model used in initial study



DEA Example (2)

  • Using production efficiency model for Compute-Intensive dataset group

    • Ranked set of projects

    • Data showing distance and direction from efficiency frontier



Diversity: Internal Perspective

  • Component Dependability:

    • Direct Measurement and Assessment: might not be available, feasible, or cost-effective

    • Indirect Assessment via Internal Contributor Mapping

  • Internal Contributors:

    • System design, architecture

    • Component internal characteristics: size, complexity, etc.

    • Process/people/other characteristics

    • Usually more readily available data/measurements

  • Internal=>External mapping

    • Procedure with OP as input too (e.g., fault=>reliability)



Example: Fault-Failure Mapping for Dynamic Web Applications



Web Example: Fault-Failure Mapping

  • Input to analysis (and fault-failure conversion):

    • Anomalies recorded in web server logs (failure view)

    • Faults recorded during development and maintenance

    • Defect impact scheme (weights)

    • Operational profile

  • Product “A” is an ordering web application for telecom services

    • Consists of hundreds of thousands of lines of code

    • Runs on IIS 6.0 (Microsoft Internet Information Server)

    • Processes a couple of million requests per day



Web Example: Fault-Failure Mapping (Step 1)

  • Pareto chart for the defect classification of product “A”

  • The top three categories represent 66.26% of the total defect data



Web Example: Fault-Failure Mapping (Steps 4 & 5)

  • OP for product “A” and the corresponding numbers of transactions.



Web Example: Fault-Failure Mapping (Step 6)

  • Using the number of transactions calculated from the OP and the defined fault impact scheme, we calculated the fault exposure, i.e., the corresponding potential failure frequencies



Web Example: Fault-Failure Mapping (Step 7)



Web Example: Fault-Failure Mapping (Result Analysis)

  • A large number of failures were caused by a small number of faults with high usage frequencies

  • Fixing faults with a high usage frequency and a high impact could achieve better efficiency in reliability improvement

    • By fixing the top 6.8% faults, the total failures were reduced by about 57%

    • Similarly, fixing the top 10% of faults reduced failures by 66%, the top 15% by 71%, and the top 20% by 75%

  • Defect data repository and web server log recorded failures have insignificant overlap => both are needed for effective reliability improvement
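The prioritization behind these numbers can be sketched as follows; the exposure values (usage frequency × impact weight per fault) are hypothetical, not the product "A" data:

```python
# Hypothetical fault-exposure values (usage frequency x impact weight per fault)
exposures = [120, 95, 60, 30, 10, 8, 5, 5, 4, 3]

def reduction_from_top(exposures, fraction):
    """Share of total expected failures removed by fixing the
    top `fraction` of faults, ranked by exposure."""
    ranked = sorted(exposures, reverse=True)
    k = max(1, round(fraction * len(ranked)))
    return sum(ranked[:k]) / sum(ranked)

print(round(reduction_from_top(exposures, 0.2), 2))  # -> 0.63
```

The skew in the ranked exposures is what makes fixing a small fraction of faults remove a disproportionate share of failures.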



Diversity: Value Perspective

  • Component Dependability Attribute:

    • Direct Measurement and Assessment: might not capture what customers truly care about

    • Different value attached to different dependability attributes

  • Value-based software quality analysis:

    • Quantitative model for software dependability ROI analysis

    • Avoid one-size-fits-all

  • Value-based process: experience at NASA/USC (Huang and Boehm), extended to dependability

  • Mapping to value-based perspective more meaningful to target customers



Value Maximization

  • Single objective function:

    • Relative importance

    • Trade-off possible

    • Quantification scheme

    • Gradient scale to select component(s)

    • Compare to DEA

    • General cases

      • Combination with DEA

      • Diversity as a separate dimension possible





Experimental Evaluation

  • Testbed

    • Basis: OPs

    • Focus on problems and system behavior under injected or simulated problems

  • Fault Injection for Reliability and Fault Tolerance

    • Reliability mapping for injected faults

    • Use of fault seeding models

    • Direct fault tolerance evaluation

  • Security Threat Simulation

    • Focus 1: likely scenarios

    • Focus 2: coverage via diversity
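For the fault-seeding models mentioned above, the classical Mills estimator infers the number of indigenous faults from how many seeded faults testing uncovers; the counts below are hypothetical:

```python
def mills_estimate(seeded, seeded_found, indigenous_found):
    """Mills' fault-seeding estimate of total indigenous faults:
    N ~= indigenous_found * seeded / seeded_found
    (assumes seeded and indigenous faults are equally detectable)."""
    return indigenous_found * seeded / seeded_found

# Hypothetical run: 20 faults seeded; testing found 10 of them plus 25 others
print(mills_estimate(20, 10, 25))  # -> 50.0
```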



Summary and Future Work

  • Overall Framework

  • External Environment Profiling

  • Component Dependability:

    • Direct Measurement and Assessment

    • Indirect Assessment via Internal Contributor Mapping

    • Value Perspective

  • Experimental Evaluation

    • Fault Injection for Reliability and Fault Tolerance

    • Security Threat Simulation

  • Summary and Future Work


