Automated Known Problem Diagnosis with Event Traces

Chun Yuan, Ni Lao, Ji-Rong Wen, Jiwei Li, Zheng Zhang, Yi-Min Wang, Wei-Ying Ma

Microsoft Research, Tsinghua, USTC

Motivation
  • Computer problem diagnosis: inefficient & inaccurate
    • Too much human involvement
    • Product feedback channel becoming mature
    • Diagnosis process remaining labor intensive
    • Solutions need manual inspection
  • Automating diagnosis process for known problems
    • Text-based symptom matching
    • Leveraging system information
      • State-based diagnosis + root cause ranking [Strider]
      • Online crash analysis
      • Mechanized troubleshooting steps
  • How to represent a known problem? How to recognize a known problem?
Our Approach
  • Known problems are annotated with relevant system behavior
  • New behavior is classified as a known problem based on its similarity to existing behavior
  • Event traces serve as typical manifestations of system behavior
Workflow

[Diagram: Apps, Tracer, Troubleshooter, Labeling, Known Problem DB, and Classifier, linked by steps (1) to (4)]

Tracer
  • What events to collect?
    • System calls
  • What attributes to collect?
    • Process/thread ID
    • Process/thread name
    • System call name, parameters, return value
Trace Example

# process thread syscall parameters & return value

...

18419 iexplore.exe 3892 CreateThread Process: 3888, Thread: 3896 SUCCESS

18420 iexplore.exe 3892 PostMessage WM_USER+0x300 1

18421 iexplore.exe 3892 OpenKey HKCU\SOFTWARE\Microsoft\Internet Explorer\Main SUCCESS

18422 iexplore.exe 3892 QueryValue HKCU\SOFTWARE\Microsoft\Internet Explorer\Main\Enable Browser Extensions NOTFOUND

18423 iexplore.exe 3892 OpenKey HKLM\SOFTWARE\Microsoft\Windows\CurrentVersion\Internet Settings SUCCESS

18424 iexplore.exe 3892 QueryValue HKLM\SOFTWARE\Microsoft\Windows\CurrentVersion\Internet Settings\DontUseDNSLoadBalancing NOTFOUND

18425 InoRT.exe 256 OpenKey HKEY_LOCAL_MACHINE\SOFTWARE\ComputerAssociates\eTrustAntivirus\CurrentVersion\Component SUCCESS

18426 InoRT.exe 256 OpenKey HKEY_LOCAL_MACHINE\SOFTWARE\ComputerAssociates\eTrustAntivirus\CurrentVersion\Contact SUCCESS

18427 InoRT.exe 256 OpenKey HKEY_LOCAL_MACHINE\SOFTWARE\ComputerAssociates\eTrustAntivirus\CurrentVersion\Distribute SUCCESS

18428 InoRT.exe 256 OpenKey HKEY_LOCAL_MACHINE\SOFTWARE\ComputerAssociates\eTrustAntivirus\CurrentVersion\InocDll Views SUCCESS

18429 iexplore.exe 3892 CloseKey HKCU\SOFTWARE\Microsoft\Internet Explorer\Main SUCCESS

18430 iexplore.exe 3892 CloseKey HKLM\SOFTWARE\Microsoft\Windows\CurrentVersion\Internet Settings SUCCESS

18431 iexplore.exe 3892 PeekMessage WM_LBUTTONDOWN 1

18433 CcmExec.exe 3068 OpenKey HKCU SUCCESS

18434 CcmExec.exe 3068 OpenKey HKCU\Software\Policies\Microsoft\Control Panel\Desktop NOTFOUND

18435 InoTask.exe 2024 ReadFile \\??\C:\Program Files\MSDN\2003FEB\1033\dnmscs.hxi SUCCESS

18436 CcmExec.exe 3068 OpenKey HKCU\Control Panel\Desktop SUCCESS

18437 CcmExec.exe 3068 QueryValue HKCU\Control Panel\Desktop\MultiUILanguageId NOTFOUND

18438 CcmExec.exe 3068 CloseKey HKCU\Control Panel\Desktop SUCCESS

18439 CcmExec.exe 3068 CloseKey HKCU SUCCESS

18440 CcmExec.exe 3068 OpenFile \\??\C:\WINDOWS\System32\dwwin.exe SUCCESS

18441 CcmExec.exe 3068 CreateFile \\??\C:\WINDOWS\System32\dwwin.exe SUCCESS

18442 CcmExec.exe 3068 QueryValue HKCU\Control Panel\Desktop\MultiUILanguageId NOTFOUND

18443 CcmExec.exe 3068 CloseKey HKCU\Control Panel\Desktop SUCCESS

...
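
The trace lines above can be split into the attributes listed on the Tracer slide. The following is a minimal sketch, not the authors' tool: the `Event` type and `parse_event` function are hypothetical names, and it assumes that fields are whitespace-separated and that the return value is the last token on each line.

```python
from typing import NamedTuple, Optional


class Event(NamedTuple):
    seq: int        # sequence number (the "#" column)
    process: str    # process name, e.g. "iexplore.exe"
    thread: int     # thread column
    syscall: str    # system call name, e.g. "OpenKey"
    params: str     # parameters; may contain spaces
    result: str     # return value, e.g. "SUCCESS"


def parse_event(line: str) -> Optional[Event]:
    """Parse one trace line; return None for blanks, "..." and the header."""
    line = line.strip()
    if not line or line == "..." or line.startswith("#"):
        return None
    fields = line.split(None, 4)
    if len(fields) != 5:
        raise ValueError(f"unexpected trace line: {line!r}")
    seq, process, thread, syscall, rest = fields
    # Assumption: the return value is the last whitespace-delimited token,
    # and everything before it is the parameter string.
    parts = rest.rsplit(None, 1)
    if len(parts) == 2:
        params, result = parts
    else:
        params, result = "", parts[0]
    return Event(int(seq), process, int(thread), syscall, params, result)
```

Splitting from the right keeps registry paths that contain spaces, such as `Internet Explorer\Main`, in a single `params` field.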

Classifier
  • Classification
    • Training: learn a model from annotated training data
    • Testing: predict the class that a novel sample belongs to
  • Accuracy
    • Percentage of test samples that are classified correctly
  • Cross-validation
  • Feature: N-gram
    • Any N successive elements in a sequence
  • Feature vector
    • Given a set of sequences, let {g_1, g_2, …, g_K} be all unique N-grams occurring in any of the sequences
    • A sequence is represented by the vector (b_1, …, b_K)
      • b_k = 1 if g_k is in the sequence
      • b_k = 0 otherwise
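
The binary N-gram feature vector defined above can be sketched as follows. The function names are illustrative, and ordering the vocabulary by sorting is an assumption made only so that vector positions are reproducible.

```python
from typing import List, Sequence, Set, Tuple

NGram = Tuple[str, ...]


def ngrams(seq: Sequence[str], n: int) -> Set[NGram]:
    """Return the set of all runs of n successive elements in seq."""
    return {tuple(seq[i:i + n]) for i in range(len(seq) - n + 1)}


def build_vocabulary(sequences: Sequence[Sequence[str]], n: int) -> List[NGram]:
    """Collect the unique N-grams g_1..g_K occurring in any sequence."""
    vocab: Set[NGram] = set()
    for seq in sequences:
        vocab |= ngrams(seq, n)
    return sorted(vocab)  # fixed order so index k is stable


def to_vector(seq: Sequence[str], vocab: List[NGram], n: int) -> List[int]:
    """b_k = 1 if g_k occurs in seq, else 0."""
    present = ngrams(seq, n)
    return [1 if g in present else 0 for g in vocab]
```

For example, with the sequences `abcdb` and `adabe` and N = 2, the vocabulary has seven 2-grams, and each sequence maps to a 7-element 0/1 vector.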
Support Vector Machines
  • The decision boundary should be as far away from the data of both classes as possible, that is, maximizing the margin, m

[Figure: decision boundary separating Class 1 and Class 2 with margin m]
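
For reference, margin maximization is commonly written as the hard-margin problem below. This is the standard textbook formulation for linearly separable data, given here as an illustration. The slides do not state which SVM variant (soft margin, kernel) was used, and the symbols $w$, $b$, $x_i$, $y_i$ are not from the original.

```latex
% Training data (x_i, y_i), i = 1..n, with labels y_i \in \{-1, +1\}
\min_{w,\,b} \ \frac{1}{2}\,\lVert w \rVert^{2}
\quad \text{subject to} \quad
y_i \,\bigl( w^{\top} x_i + b \bigr) \ge 1, \qquad i = 1, \dots, n

% The resulting margin between the two classes is
m = \frac{2}{\lVert w \rVert}
```

Minimizing $\lVert w \rVert$ is therefore equivalent to maximizing the margin $m$.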

Comparing System Call Traces
  • Understand system call variations
  • Sequence alignment

[abcdb]      [abcd-b--]
[adabe]  =>  [a--dabe-]
[bcdae]      [-bcda-e-]
[abcabf]     [abc-ab-f]

(- marks a gap)

  • Uniform ordering
    • Group by process/thread
    • Sort by process/thread name
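
The uniform ordering step can be sketched as below, under stated assumptions: events are represented as `(process name, thread name, system call name)` tuples, and the relative order of events inside each process/thread group is preserved. The tuple layout and the function name are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# (process name, thread name, system call name)
Event = Tuple[str, str, str]


def uniform_order(trace: List[Event]) -> List[Event]:
    """Group events by process/thread, then emit groups sorted by name."""
    groups: Dict[Tuple[str, str], List[Event]] = defaultdict(list)
    for event in trace:
        process, thread, _ = event
        groups[(process, thread)].append(event)  # keeps in-group order
    ordered: List[Event] = []
    for key in sorted(groups):  # sort by (process name, thread name)
        ordered.extend(groups[key])
    return ordered
```

Reordering this way removes the interleaving between unrelated threads, which otherwise varies from run to run and would change which N-grams appear.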
System Call Variation
  • Cross-time comparison
  • Noise filtering
    • Patterns occurring less than Th_t% of the time are discarded
  • Object name canonicalization
    • Registry path discarded
    • Case-by-case rules
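
The noise filtering rule can be illustrated with a short sketch. It assumes that each trace has already been reduced to its set of patterns (for example, N-grams) and that "occurring" means appearing at least once in a trace. The function name and the percentage parameter are hypothetical stand-ins for the threshold on the slide.

```python
from collections import Counter
from typing import Hashable, List, Set


def filter_noise(pattern_sets: List[Set[Hashable]], threshold_pct: float) -> Set[Hashable]:
    """Keep patterns seen in at least threshold_pct percent of the traces."""
    if not pattern_sets:
        return set()
    counts: Counter = Counter()
    for patterns in pattern_sets:
        counts.update(patterns)  # a set counts each pattern once per trace
    total = len(pattern_sets)
    return {p for p, c in counts.items() if 100.0 * c / total >= threshold_pct}
```

Patterns that survive the filter define the feature space, so a higher threshold yields fewer dimensions.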
System Call Variation
  • Cross-machine comparison
  • Noise filtering
    • Patterns occurring on fewer than Th_m% of machines are discarded
  • Canonicalization
    • File path discarded
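
The slides do not spell out what discarding a path means. One possible reading, sketched here purely as an assumption, is to keep only the last backslash-separated component of an object name so that machine-specific directories no longer distinguish otherwise identical events. The function name is hypothetical.

```python
def canonicalize_object_name(name: str) -> str:
    """Drop the path portion of a registry or file object name.

    Assumption: "path discarded" means keeping only the last
    backslash-separated component of the name.
    """
    trimmed = name.rstrip("\\")  # ignore a trailing separator, if any
    return trimmed.rsplit("\\", 1)[-1]
```

Under this reading, the `dwwin.exe` file path from the trace example reduces to `dwwin.exe`. Another reading would drop the object name entirely and keep only the system call name.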
Evaluation
  • Select target problems
  • Collect traces with fault injection
  • Parameter sensitivity (cross-time & cross-machine)
    • Noise filtering threshold
    • Canonicalization or not
    • Pattern length
    • System call attributes availability
Data Collection
  • For each machine
    • For each round
      • For each problem
        • Inject the fault into the machine
        • Repeat a number of times
          • Start tracer
          • Replay the UI script to reproduce the symptom
          • Stop tracer
        • Remove the fault
  • Cross-time: 2 cases, 5 machines, 6 rounds (3 for training)
  • Cross-machine: 4 cases, 20 machines (10 for training)
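
The collection loop above can be written out as a small driver. This is a sketch only: `inject`, `remove`, and `record` are hypothetical callbacks standing in for the fault injector and for the "start tracer, replay UI script, stop tracer" sequence, none of which are described in detail in the slides.

```python
from typing import Any, Callable, List, Sequence, Tuple

LabeledTrace = Tuple[str, int, str, Any]  # (machine, round, problem, trace)


def collect_traces(
    machines: Sequence[str],
    rounds: int,
    problems: Sequence[str],
    repeats: int,
    inject: Callable[[str, str], None],
    remove: Callable[[str, str], None],
    record: Callable[[str, str], Any],
) -> List[LabeledTrace]:
    """Run the nested collection loop and return traces labeled by problem."""
    labeled: List[LabeledTrace] = []
    for machine in machines:
        for rnd in range(rounds):
            for problem in problems:
                inject(machine, problem)
                try:
                    for _ in range(repeats):
                        # record() is assumed to start the tracer, replay
                        # the UI script, and stop the tracer.
                        trace = record(machine, problem)
                        labeled.append((machine, rnd, problem, trace))
                finally:
                    remove(machine, problem)  # always undo the fault
    return labeled
```

Keeping the round index in each record makes it straightforward to split the data afterwards, for instance using some rounds for training and the rest for testing.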
Noise Filtering Threshold
  • Impact on accuracy

[Plot panels: IeDisplay, SharedFolder]

Noise Filtering Threshold
  • Impact on feature space dimension and training time

[Plot panels: SharedFolder, IeDisplay]

Canonicalization

[Plot panels: IeDisplay, SharedFolder]

Pattern Length

[Plot panels: IeDisplay, SharedFolder]

Attributes Availability
  • Without thread name, system call parameters, or return value

[Plot panels: IeDisplay, SharedFolder]

Summary of Cross-time Results
  • Noise filtering substantially reduces feature space and training time
  • Canonicalization has no effect
  • Longer patterns are helpful only when fewer attributes are available
Noise Filtering Threshold
  • Feature space dimension and training time
Pattern Length
  • Sort traces by system call name
Attributes Availability
  • Process name + system call name
Convergence

(a) 1-gram, full system call record

(b) 2-gram, process name + system call name

Best Results

(a) 1-gram, full system call record

(b) 2-gram, process name + system call name

Summary of Cross-machine Results
  • Canonicalization is very effective
  • 1-gram is good enough; 2-gram is useful when only process name and system call name are available
  • Accuracy converges very quickly with more training machines
Related Work
  • Using computers to diagnose computer problems [Redstone et al.]
  • Capturing, Indexing, Clustering, and Retrieving System History [Cohen et al.]
  • Pinpoint [Chen et al.]
  • Magpie [Barham et al.]
  • Host-based intrusion detection
  • Fault localization in computer networks
Conclusion
  • Automatic root cause recognition using statistical learning on system behavior similarity
  • For the 4 cases, nearly 90% recognition accuracy can be achieved with appropriate preprocessing of system call traces
  • The sensitivity study shows the method is robust to the level of event detail available