Automatic Misconfiguration Troubleshooting with PeerPressure

Helen J. Wang, John C. Platt, Yu Chen, Ruyun Zhang, Yi-Min Wang

Microsoft Research

Presenter: Sara Salahi

Northwestern University



Agenda

  • Importance of this work

  • Key ideas

  • PeerPressure: Architecture & Algorithm

  • Prototype

  • Performance

  • Future Work


Importance of This Work

Authors' focus: troubleshooting misconfigurations


  • Tech support accounts for 17% of the total cost of ownership of today’s desktop PCs

  • A large share of tech support effort is spent on troubleshooting

  • Many troubleshooting cases are due to misconfiguration

  • Misconfiguration is often caused by data that is in shared persistent stores (e.g. Windows registry)

Key Ideas: Misconfigurations

  • Can have many different “root causes”

    • Seemingly innocuous changes to shared system configurations

    • System bugs

    • Security patches may introduce incompatible registry settings

    • Failed uninstallation of applications

    • Manual intervention using Registry editor

Key Ideas: The Golden State

  • “Golden State” – a perfect configuration

  • Assume that the golden state is in the mass, i.e. most machines hold the healthy configuration

  • Combine statistical golden state with Bayesian statistics to identify anomalous misconfigurations on “sick” machines

Key Ideas: Goals of Troubleshooting

  • Effectiveness

    • System should identify a small set of sick configuration candidates in a short amount of time

  • Automation

    • Minimize number of manual steps and number of users involved

PeerPressure: Architecture

(Architecture diagram; its numbered annotations follow.)

1) Sick computer

2) I found you

3) Turns user- or machine-specific entries into canonicalized form

4) Database containing a number of machine configuration snapshots

5) Bayesian estimation used to calculate probability of a suspect being sick
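
The canonicalization step in the architecture (annotation 3) could look roughly like the Python sketch below. The slides do not say how the canonicalizer works, so everything here is an assumption for illustration: the placeholder tokens, the SID pattern, the function name `canonicalize`, and the sample key and value.

```python
import re

# Assumption: per-user registry hives are keyed by a Windows account SID
# of the form S-1-5-21-...; the placeholder tokens are hypothetical.
SID_PATTERN = re.compile(r"S-1-5-21(?:-\d+)+")


def canonicalize(text, user_name, machine_name):
    """Replace user- and machine-specific substrings with fixed tokens."""
    text = SID_PATTERN.sub("<USER_SID>", text)
    # Naive substring replacement; a real tool would need to avoid
    # matching the name inside unrelated words.
    text = re.sub(re.escape(user_name), "<USER_NAME>", text, flags=re.IGNORECASE)
    text = re.sub(re.escape(machine_name), "<MACHINE_NAME>", text, flags=re.IGNORECASE)
    return text


# Made-up registry key and value, for illustration only.
key = r"HKEY_USERS\S-1-5-21-111-222-333-1001\Software\App\RecentDir"
value = r"C:\Documents and Settings\alice\My Documents"

print(canonicalize(key, "alice", "PC42"))
print(canonicalize(value, "alice", "PC42"))
```

After this step, the same setting recorded on two machines compares equal even though the raw strings contain different account or host names.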

PeerPressure: Architecture

  • Manual Steps

    • User runs faulty application to record suspects

    • User determines if sickness is cured

  • Manual steps involve only the troubleshooting user and no second party

PeerPressure: Algorithm

  • Intuition and Objectives

  • e1: Probably healthy

  • e2: Most probably sick

  • e3: “Natural biological diversity”

  • Type I: application configuration states

    • e1 and e2

  • Type II: operational states (timestamps, caches, etc.)

    • e3

    • Want to weed out; most likely false positives

PeerPressure: Algorithm


  • Combining (3) and (1): when m = 0, P(S|V) = 1

  • Bayesian estimation is used to overcome this.

  • pj: probability that the event happens with outcome Vj; the vector of pj follows a Dirichlet distribution.

  • mj: number of sample values that match the suspect value
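
The m = 0 problem and the Bayesian fix can be sketched in Python as follows. The equations themselves did not survive in this transcript, so the model below is an assumption rather than the authors' exact formulation: prior P(S) = 1/t for t suspects, P(V|S) = 1/c for an entry with cardinality c, and P(V|H) estimated from m matching samples out of N with a symmetric Dirichlet prior of strength `alpha`. All function and parameter names, and the sample numbers for e1, e2, and e3, are hypothetical.

```python
def sick_probability_ml(n_samples, n_matching, cardinality, n_suspects):
    """P(S|V) with the maximum-likelihood estimate P(V|H) = m / N."""
    p_sick = 1.0 / n_suspects                    # assumed prior: one sick entry
    p_v_given_sick = 1.0 / cardinality           # assumed: sick values equally likely
    p_v_given_healthy = n_matching / n_samples   # zero whenever m == 0
    numerator = p_v_given_sick * p_sick
    return numerator / (numerator + p_v_given_healthy * (1.0 - p_sick))


def sick_probability_bayes(n_samples, n_matching, cardinality, n_suspects, alpha=1.0):
    """P(S|V) with a Dirichlet-smoothed estimate of P(V|H)."""
    p_sick = 1.0 / n_suspects
    p_v_given_sick = 1.0 / cardinality
    # Smoothing keeps P(V|H) above zero even when no sample matches.
    p_v_given_healthy = (n_matching + alpha) / (n_samples + cardinality * alpha)
    numerator = p_v_given_sick * p_sick
    return numerator / (numerator + p_v_given_healthy * (1.0 - p_sick))


# Made-up suspects: (cardinality, number of samples matching the suspect value).
N, T = 87, 3
suspects = {
    "e1": (2, 86),   # suspect value agrees with almost every sample
    "e2": (2, 0),    # samples agree with each other, suspect value differs
    "e3": (88, 0),   # every sample differs, e.g. a timestamp
}

ranking = sorted(
    suspects,
    key=lambda e: sick_probability_bayes(N, suspects[e][1], suspects[e][0], T),
    reverse=True,
)
print(ranking)
```

Under these assumptions the maximum-likelihood version returns exactly 1 for any entry with m = 0, so it cannot separate e2 from e3. The smoothed version scores the high-cardinality entry e3 below e2.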

PeerPressure: Algorithm

Asymptotic Analysis:



Prototype

  • GeneBank Database: Microsoft SQL Server 2000 containing snapshots from 87 Windows XP PCs

  • PeerPressure troubleshooter implemented in C#

  • “Data Sanitization”

    • Unification of different representations of the same value

  • SQL Server is hosted on a dual Intel Xeon 2.4 GHz workstation with 1 GB RAM

Performance: Response Time vs. Number of Suspects

  • 20 real-world troubleshooting cases used

  • Database queries dominate troubleshooting response time (one query per suspect entry)
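
The "one query per suspect entry" pattern might look like the Python sketch below. The prototype used Microsoft SQL Server 2000 and C#. This sketch substitutes Python's built-in `sqlite3` module, and the `snapshots` schema, entry names, and values are hypothetical.

```python
import sqlite3

# Hypothetical schema: one row per (machine, registry entry, value).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE snapshots (machine TEXT, entry TEXT, value TEXT)")
conn.executemany(
    "INSERT INTO snapshots VALUES (?, ?, ?)",
    [
        ("pc1", r"App\Mode", "1"),
        ("pc2", r"App\Mode", "1"),
        ("pc3", r"App\Mode", "0"),
        ("pc1", r"App\LastRun", "2004-01-01"),
        ("pc2", r"App\LastRun", "2004-02-15"),
    ],
)


def value_counts(conn, entry):
    """Return {value: number of sample machines holding it} for one entry."""
    rows = conn.execute(
        "SELECT value, COUNT(*) FROM snapshots WHERE entry = ? GROUP BY value",
        (entry,),
    ).fetchall()
    return dict(rows)


def lookup_suspects(conn, suspects):
    """Issue one query per suspect entry, so cost grows with the suspect count."""
    return {entry: value_counts(conn, entry) for entry in suspects}


print(lookup_suspects(conn, [r"App\Mode", r"App\LastRun"]))
```

Because each suspect costs one round trip to the database, response time in a design like this would grow roughly linearly with the number of suspects.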

Prototype: GeneBank

  • Registry characteristics in GeneBank

  • “Unseen” – a value that is unknown to the GeneBank; it increments the observed cardinality by 1

    • So any entry from GeneBank has a cardinality of at least 2

  • Entries that do not exist on some sample machines are given the value “no entry”

  • When cardinality is low, conformity among samples is strong
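
The cardinality rules above can be sketched in Python as follows. The function name, the `NO_ENTRY` marker, and the convention that `None` means "entry absent on that machine" are assumptions made for illustration.

```python
# Hypothetical marker for entries missing on a sample machine.
NO_ENTRY = "<no entry>"


def observed_cardinality(sample_values):
    """Count distinct values across samples, plus one for 'unseen'.

    sample_values holds one item per sample machine; None means the
    entry does not exist on that machine.
    """
    distinct = {NO_ENTRY if v is None else v for v in sample_values}
    return len(distinct) + 1  # the extra 1 stands for the unseen value


print(observed_cardinality(["1", "1", "1"]))   # all samples agree
print(observed_cardinality(["1", None, "0"]))  # two values plus a missing entry
```

With at least one sample machine there is always one observed value plus the unseen slot, which is why the cardinality never drops below 2.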

Performance: Root-Cause Ranking Results

  • 87% of entries have a cardinality of 2, 94% no more than 3, and 97% no more than 4

Performance: False Positives

  • Large cardinality of root-cause entry

  • Relation between root-cause entry and other entries in the suspect set

  • GeneBank is not pristine

Performance: Impact of Sample Set Size

Performance: Sick Machine Sensitivity

Format: RootCauseRanking (NumberOfTies) / NumberOfSuspects

Future Work

  • Multi-gene troubleshooting

    • Multiple sick entries among suspects

  • Cross-application misconfiguration

  • Heavy customization of apps can break assumption of strong conformance in most configuration entries

  • GeneBank maintenance – privacy issue
