
STRIDER: A Black-box, State-based Approach to Change and Configuration Management and Support (CCMS)



Yi-Min Wang, Chad Verbowski, John Dunagan, Yu Chen, Helen J. Wang, Chun Yuan, & Zheng Zhang

Microsoft Research, Redmond & Beijing


The Problem: Computer Fragility

  • “It worked yesterday, but not today.”

  • “It worked for that user, but not this user.”

  • “It worked on that machine, but not this machine.”

  • “I restarted the application, rebooted the machine, but still can’t fix the problem!”

  • We focus on Registry-related problems in this paper



Scott and Susi’s Registry Problem


Inspired by the Human Genome Project

  • Size: a PC has 200,000 Registry values; a human has 3 billion DNA base pairs

  • Desktop (last week) vs. Desktop (today): 99% the same; Human #1 vs. Human #2: 99.9% the same

  • Desktop vs. Laptop (200,000 values each): 65% similarity; Human vs. Mouse (3 billion base pairs each): 70% - 90% similarity

  • > 11% “junk” entries; 50% “junk” DNA

  • < 5% code for config. changes; < 2% code for proteins

  • Registry entries for “garbage fonts disease”: found at the Fonts key under HKLM\Software\Microsoft\Windows NT\CurrentVersion

  • Gene for Huntington's disease: found at the tip of the short arm of Chromosome 4



Contributions of STRIDER

  • Strider Principles

    • Key to handling complexity in CCMS

    • Problem decomposition into 7 Strider components

  • Strider Process

    • Conceptual use of Strider components to solve particular CCMS problem

  • Strider Toolkit

    • Implementation of Strider components as command-line building blocks

  • Strider Troubleshooter

    • UI root-cause analysis tool that strings together command-line tools for troubleshooting


Principle #1: State-Based Analysis

First-level decomposition: Mechanical, Statistical, & Database

  • Symptom-Based Analysis

    • Knowledge, Experience, & Support database

    • Imprecise, nondeterministic search

  • State-Based Analysis

    • Mechanical & Statistical

    • Precise Database Lookup in the PC Genomics Database

      • “Is this a junk entry?”

      • “Who owns this entry?”

      • “Are there known problems with this entry?”

[Figure labels: App or Action; State (A, B, C, Y, Z); Latency; Persistent Failure]


Principle #2: Attack The Mess With The Mass

Second-level decomposition: Diff, Trace, & Intersection

  • Freedom & Flexibility → Large install base → The Mess: number of different configurations grows with the number of machines

  • Large install base → The Mass: number of data points grows with the number of machines

[Figure (Mechanical): WinXP Registry, 200,000; 77,000; Diff of Good and Bad System Restore Checkpoints; Trace; Intersection of Diff and Trace]



Principle #3: Complexity-Noise Filtering

Self-filtering of complexity as noise

  • A lot of the differences are not significant for systems management and troubleshooting

    • Registry entries that are constantly changing are less important; they are simply “operational states”

      • Inverse Change Frequency (ICF) ranking

    • Registry entries that are always different on different machines constitute natural diversity among Windows machines

  • Start with deterministic bad state, end with deterministic bad behavior

    • Nondeterministic activities in-between are often less important

    • Intersection of multiple traces can filter out such noise
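The two filters above can be sketched as follows. This is a minimal illustration, assuming each trace is simply the list of Registry entries one failing run touched and each machine snapshot is a dictionary from entry name to value. The function names, entry names, and values are hypothetical and are not taken from the Strider Toolkit.

```python
def intersect_traces(traces):
    """Keep only the entries that every trace of the failing run touched."""
    if not traces:
        return set()
    common = set(traces[0])
    for trace in traces[1:]:
        common &= set(trace)
    return common


def naturally_diverse(snapshots):
    """Entries present on every machine whose value differs on every machine.

    Assumes at least two snapshots; values must be hashable.
    """
    shared = set(snapshots[0])
    for snap in snapshots[1:]:
        shared &= set(snap)
    return {k for k in shared if len({s[k] for s in snapshots}) == len(snapshots)}


# Hypothetical traces of the same failure, captured three times.
run1 = [r"HKLM\Software\App\Font", r"HKCU\Software\App\MRU", r"HKLM\Software\App\Path"]
run2 = [r"HKLM\Software\App\Path", r"HKLM\Software\App\Font", r"HKCU\Software\App\Cache"]
run3 = [r"HKLM\Software\App\Font", r"HKLM\Software\App\Path"]
stable = intersect_traces([run1, run2, run3])

# Hypothetical snapshots from three machines.
machines = [
    {r"HKLM\Software\App\MachineId": "A1", r"HKLM\Software\App\Path": r"C:\App"},
    {r"HKLM\Software\App\MachineId": "B2", r"HKLM\Software\App\Path": r"C:\App"},
    {r"HKLM\Software\App\MachineId": "C3", r"HKLM\Software\App\Path": r"D:\App"},
]
diverse = naturally_diverse(machines)

print(sorted(stable))
print(sorted(diverse))
```

Entries touched in only some runs (the MRU and Cache entries here) drop out of the intersection, and an entry that differs on every machine is treated as natural diversity rather than as a suspect.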


Mechanical + Statistical

  • Mechanical: Diff, Trace, & Intersection

  • Local cross-time analysis for noise filtering & state ranking

  • Global cross-machine analysis for noise filtering & state ranking

  • Global state-snapshot repository

[Figure labels: WinXP Registry, 200,000; 77,000; Diff of Good and Bad System Restore Checkpoints; Trace; Intersection]


Registry Change-Behavior Analysis

  • Four machines, each with 84 days of checkpoints

  • Percentage ever changed: 4.7% - 13.2%

  • Percentage operational: 1.9% - 5.6%

  • Percentage installation/configuration: 2.1% - 11.3%

  • Median # changes/day = 302 (raw), 29 (noise filtered)
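Percentages of this kind could be derived from a series of checkpoints roughly as sketched below. The slide does not say how an entry is classified as operational versus installation/configuration, so the frequency threshold here is an assumption, as are the function names and the toy data.

```python
def change_counts(checkpoints):
    """Count, per entry, how many consecutive checkpoint pairs differ."""
    counts = {}
    for before, after in zip(checkpoints, checkpoints[1:]):
        for key in before.keys() | after.keys():
            if before.get(key) != after.get(key):
                counts[key] = counts.get(key, 0) + 1
    return counts


def summarize(checkpoints, operational_threshold=0.5):
    """Percentages of entries that ever changed, split by change frequency.

    Assumes at least two checkpoints. An entry is called "operational" when it
    changes in at least `operational_threshold` of the intervals (an assumed rule).
    """
    all_keys = set().union(*checkpoints)
    counts = change_counts(checkpoints)
    intervals = len(checkpoints) - 1
    operational = {k for k, c in counts.items() if c / intervals >= operational_threshold}
    total = len(all_keys)
    return {
        "ever_changed_pct": 100 * len(counts) / total,
        "operational_pct": 100 * len(operational) / total,
        "config_pct": 100 * (len(counts) - len(operational)) / total,
    }


# Toy data: four checkpoints of a four-entry "Registry" (hypothetical names).
checkpoints = [
    {"a": 1, "b": 1, "c": 1, "d": 1},
    {"a": 2, "b": 1, "c": 1, "d": 1},
    {"a": 3, "b": 1, "c": 2, "d": 1},
    {"a": 4, "b": 1, "c": 2, "d": 1},
]
summary = summarize(checkpoints)
print(summary)
```

In the toy data, entry `a` changes in every interval and is counted as operational, while `c` changes once and is counted as an installation/configuration change.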



Strider Components

  • Mechanical

    • State Diff: diff “bad state” against “last known working state”

    • Tracing: failing app execution or booting

    • Intersection: diff & trace

  • Statistical

    • State Ranking:

      • Inverse Change Frequency (ICF) ranking: states with high change frequencies are less likely to be the root cause

      • Order ranking: states accessed later are more likely to be the result of execution divergence caused by the earlier root-cause entry

  • Database

    • PC Genomics Database: state functional & failure info

      5.1. “Is this a junk entry?” – Noise Filtering

      5.2. “Who owns this entry?” – Ownership Mapping

      5.3. “Are there known problems with this entry?” – Support Database Lookup
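The mechanical and statistical components listed above might be chained as in the following sketch. The slides do not give the ICF formula, so the logarithmic weighting below (modeled on inverse document frequency) is an assumption, as is the choice to sort by ICF first and trace order second. All names and data are hypothetical.

```python
import math


def state_diff(good, bad):
    """Entries whose value differs, or that exist in only one snapshot."""
    return {k for k in good.keys() | bad.keys() if good.get(k) != bad.get(k)}


def rank_candidates(diff, trace, change_counts, num_checkpoints):
    """Intersect the diff with the trace, then rank the surviving entries."""
    # Position of the first access of each entry in the trace.
    first_access = {}
    for position, key in enumerate(trace):
        first_access.setdefault(key, position)

    candidates = diff & set(first_access)

    def icf(key):
        # Assumed weighting: entries that changed often get a score near zero.
        return math.log(num_checkpoints / (1 + change_counts.get(key, 0)))

    # Higher ICF first; among equals, entries accessed earlier come first.
    return sorted(candidates, key=lambda k: (-icf(k), first_access[k]))


# Hypothetical snapshots, trace, and change history.
good = {r"Fonts\Arial": "arial.ttf", r"App\MRU": "a.txt", r"App\Theme": "blue", r"Other\X": 1}
bad = {r"Fonts\Arial": "", r"App\MRU": "b.txt", r"App\Theme": "blue", r"Other\X": 2}
trace = [r"App\Theme", r"Fonts\Arial", r"App\MRU"]
history = {r"App\MRU": 80, r"Fonts\Arial": 1}

diff = state_diff(good, bad)
ranked = rank_candidates(diff, trace, history, num_checkpoints=84)
print(ranked)
```

Here `Other\X` differs between the snapshots but is never touched by the failing run, so the intersection removes it; the frequently changing MRU entry survives but ranks below the rarely changing Fonts entry.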


Strider Process for Troubleshooting

  • User

    • “It was working. Now it doesn’t work.”

    • “The program keeps failing.”

  • Tool: narrow-down phase

    • State Diff, Tracing, Intersection, Noise Filtering, State Ranking

    • Output: Filtered & Ranked Candidate Set

  • Tool: solution-query phase

    • Support Database Lookup and Ownership Mapping, using the PC Genomics Database

[Figure: flow diagram of the two phases; remaining labels: Support Articles, Config Action UI, App Info Doc]
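As an illustration of how a ranked candidate set might be annotated from the PC Genomics Database, here is a minimal sketch that models the database as an in-memory dictionary. The record layout, field names, owner name, and article label are all hypothetical; the slides do not describe the actual database schema.

```python
from dataclasses import dataclass, field


@dataclass
class GenomicsRecord:
    """Hypothetical per-entry record: junk flag, owner, and known problems."""
    junk: bool = False
    owner: str = "unknown"
    known_problems: list = field(default_factory=list)


def annotate(ranked_candidates, database):
    """Drop junk entries, then attach owner and known problems to the rest."""
    report = []
    for key in ranked_candidates:
        record = database.get(key, GenomicsRecord())
        if record.junk:
            continue  # "Is this a junk entry?"
        # "Who owns this entry?" and "Are there known problems with this entry?"
        report.append((key, record.owner, record.known_problems))
    return report


# Hypothetical database contents and candidate set.
database = {
    r"Fonts\Arial": GenomicsRecord(owner="ExampleApp", known_problems=["placeholder article"]),
    r"App\MRU": GenomicsRecord(junk=True),
}
candidates = [r"Fonts\Arial", r"App\MRU", r"App\Path"]
report = annotate(candidates, database)
print(report)
```

Entries without a record are kept and reported with an unknown owner, so a missing database entry never hides a candidate from the user.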


Strider Troubleshooter: Cross-restore-point Results

[Figure: number of Registry values at each stage: average Registry size, after state diff, after diff & trace intersection, after noise filtering, and order-ranking of the root cause. Annotations: “Two orders of magnitude” and “Another two orders of magnitude”.]


Cross-machine Results

[Figure: number of Registry values at each stage: average Registry size, after state diff, after diff & trace intersection, after noise filtering, and order-ranking of the root cause.]



Summary

  • Think outside the white-box

    • Derive “black-box manifests” through PC Genomics (tracing, diffing, & behavior modeling) and show their benefits for CCMS

    • White-box & black-box approaches complement each other

  • State+Symptom-based troubleshooting

    • State-based support articles can be retrieved by symptom-based search

    • Symptom-based search can be enhanced with additional state-based strings

    • Symptom-based matching can help state ranking



Future Work

  • Long-term goal: develop new abstractions for systems management

    • Configuration Change Audits

      • “What has changed on my machine since last week, and who did it?”

    • Impact Analysis

      • “Is applying this patch going to break my apps?”

    • Server Drift

      • “What’s causing my server machines’ configurations to diverge?”

