
Predicting Unix Commands With Decision Tables and Decision Trees


Presentation Transcript


  1. Predicting Unix Commands With Decision Tables and Decision Trees Kathleen Durant Third International Conference on Data Mining Methods and Databases September 25, 2002 Bologna, Italy

  2. How Predictable Are a User’s Computer Interactions? • Command sequences • The time of day • The type of computer you’re using • Clusters of command sequences • Command typos

  3. Characteristics of the Problem • Time sequenced problem with dependent variables • Not a standard classification problem • Predicting a nominal value rather than a Boolean value • Concept shift

  4. Dataset • Davison and Hirsh – Rutgers University • Collected history sessions of 77 different users for 2 – 6 months • Three categories of users: professor, graduate, undergraduate • Average number of commands per session: 2184 • Average number of distinct commands per session: 77

  5. Rutgers Study • 5 different algorithms implemented • C4.5, a decision-tree learner • An omniscient predictor • The most recent command just issued • The most frequently used command of the training set • The longest matching prefix to the current command • Most successful – C4.5 • Predictive accuracy 38%

  6. Typical History Session
  96100720:13:31 green-486 vs100 BLANK
  96100720:13:31 green-486 vs100 vi
  96100720:13:31 green-486 vs100 ls
  96100720:13:47 green-486 vs100 lpr
  96100720:13:57 green-486 vs100 vi
  96100720:14:10 green-486 vs100 make
  96100720:14:33 green-486 vs100 vis
  96100720:14:46 green-486 vs100 vi
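  A minimal parsing sketch (not from the original slides), assuming each history line holds a fused date/time stamp, a host name, a terminal identifier, and the command name; the field meanings and the file name user10.history are inferred for illustration only.

    # Sketch: split a raw history line into fields and keep only the command name.
    # Assumes the format "<YYMMDDHH:MM:SS> <host> <terminal> <command>".
    def parse_history_line(line):
        stamp, host, terminal, command = line.split()
        return {"stamp": stamp, "host": host, "terminal": terminal, "command": command}

    with open("user10.history") as f:              # hypothetical file name
        commands = [parse_history_line(l)["command"] for l in f if l.strip()]
    # commands -> ["BLANK", "vi", "ls", "lpr", "vi", "make", "vis", "vi"]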

  7. WEKA System • Provides • Learning algorithms • Simple format for importing data – ARFF format • Graphical user interface

  8. History Session in ARFF Format
  @relation user10
  @attribute ct-2 {BLANK,vi,ls,lpr,make,vis}
  @attribute ct-1 {BLANK,vi,ls,lpr,make,vis}
  @attribute ct0 {vi,ls,lpr,make,vis}
  @data
  BLANK,vi,ls
  vi,ls,lpr
  ls,lpr,make
  lpr,make,vis
  make,vis,vi
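  An illustrative sketch (an assumption, not the study’s tooling) of how a command sequence can be turned into the ARFF instances above, using the two previous commands (ct-2, ct-1) as attributes and the current command (ct0) as the class; for simplicity it uses one sorted value set for all three attributes.

    # Sketch: build ARFF instances from a command sequence with a 2-command window.
    def to_arff(commands, relation="user10"):
        values = ",".join(sorted(set(commands)))
        lines = ["@relation " + relation,
                 "@attribute ct-2 {" + values + "}",
                 "@attribute ct-1 {" + values + "}",
                 "@attribute ct0 {" + values + "}",
                 "@data"]
        for i in range(2, len(commands)):
            lines.append(",".join(commands[i - 2:i + 1]))
        return "\n".join(lines)

    print(to_arff(["BLANK", "vi", "ls", "lpr", "make", "vis", "vi"]))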

  9. Learning Techniques • Decision tree using 2 previous commands as attributes • Minimize size of the tree • Maximize information gain • Boosted decision trees - AdaBoost • Decision table • Match determined by k nearest neighbors • Verification by 10-fold cross validation • Verification by splitting data into training/test sets • Match determined by majority
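  The study used WEKA’s implementations; as a rough stand-in, the sketch below shows the same three ideas with scikit-learn (an assumption for illustration, not the authors’ code): a decision tree over the two previous commands, AdaBoost over trees, and a nearest-neighbor learner playing the role of the IBk decision table.

    # Sketch: three learners on (ct-2, ct-1) -> ct0 instances (scikit-learn stand-ins).
    from sklearn.preprocessing import OrdinalEncoder
    from sklearn.tree import DecisionTreeClassifier
    from sklearn.ensemble import AdaBoostClassifier
    from sklearn.neighbors import KNeighborsClassifier

    instances = [("BLANK", "vi", "ls"), ("vi", "ls", "lpr"), ("ls", "lpr", "make"),
                 ("lpr", "make", "vis"), ("make", "vis", "vi")]
    X = OrdinalEncoder().fit_transform([[a, b] for a, b, _ in instances])
    y = [c for _, _, c in instances]

    models = {
        "decision tree": DecisionTreeClassifier(),
        "boosted trees (AdaBoost)": AdaBoostClassifier(n_estimators=10),
        "nearest neighbor (IBk-like)": KNeighborsClassifier(n_neighbors=1),
    }
    for name, model in models.items():
        model.fit(X, y)
        # Predict the command that follows the first (ct-2, ct-1) window
        print(name, model.predict(X[:1]))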

  10. Learning a Decision Tree [Figure: a decision tree that branches on the command values at time = -2 and time = -1 (e.g. make, dir, ls, vi, pwd, gcc, emacs, more) to predict the command at time = 0 (e.g. gcc, make, pwd, more, emacs, ls, man, pine)]

  11. Boosting a Decision Tree [Figure: a single decision tree is boosted into a solution set of trees]

  12. Learning a Decision Table [Figure: k-nearest-neighbors (IBk) matching example]

  13. Prediction Metrics • Macro-average – average predictive accuracy per person • What was the average predictive accuracy for the users in the study? • Micro-average – average predictive accuracy for the commands in the study • What percentage of the commands in the study did we predict correctly?
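  A small sketch of the two averages over hypothetical per-user counts (the numbers are illustrative, not results from the study):

    # Each entry is (correctly predicted commands, total commands) for one user.
    per_user = [(420, 1000), (90, 300), (50, 200)]

    # Macro-average: mean of each user's own accuracy (every user counts equally).
    macro = sum(c / t for c, t in per_user) / len(per_user)

    # Micro-average: pool all commands, then compute one overall accuracy.
    micro = sum(c for c, _ in per_user) / sum(t for _, t in per_user)

    print(f"macro = {macro:.3f}, micro = {micro:.3f}")   # macro = 0.323, micro = 0.373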

  14. Macro-average Results

  15. Micro-average Results

  16. Results: Decision Trees • Decision trees – expected results • Compute-intensive algorithm • Predictability results are similar to simpler algorithms • No interesting findings • Duplicated the Rutgers study results

  17. Results: AdaBoost • AdaBoost – very disappointing • Unfortunately, few or no boosting iterations were performed • Only 12 decision trees were boosted • Boosted trees’ predictability only increased by 2.4% on average • Correctly predicted 115 more commands than decision trees (out of 118,409 wrongly predicted commands) • Very compute-intensive with no substantial increase in predictability percentage

  18. Results: Decision Tables • Decision table – satisfactory results • good predictability results • relatively speedy • Validation is done incrementally • Potential candidate for an online system

  19. Summary of Prediction Results • IBk decision table produced the highest micro-average • Boosted decision trees produced the highest macro-average • Difference was negligible • 1.37% - micro-average • 2.21% - macro-average

  20. Findings • IBk decision tables can be used in an online system • Not a compute-intensive algorithm • Predictability is as good as or better than decision trees • Consistent results achieved on fairly small log sessions (> 100 commands) • No improvement in prediction for larger log sessions (> 1000 commands) • Due to concept shift

  21. Summary of Benefits • Automatic typo correction • Savings in keystrokes is 30% on average • Given an average command length of 3.77 characters • Predicted command can be issued with 1 keystroke
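  A rough back-of-the-envelope check of the 30% figure (my assumption about how it is derived, not a calculation stated on the slides), taking predictive accuracy of roughly 40% and assuming a correctly predicted command replaces all of its characters with a single keystroke:

    avg_len = 3.77      # average command length in characters (from the slide)
    accuracy = 0.40     # assumed predictive accuracy, roughly the reported range

    savings = accuracy * (avg_len - 1) / avg_len
    print(f"estimated keystroke savings: {savings:.0%}")   # ~29%, close to the ~30% claim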

  22. Questions

  23. AdaBoost Description • The algorithm: let Dt(i) denote the weight of example i in round t • Initialization: assign each example (xi, yi) ∈ E the weight D1(i) := 1/n • For t = 1 to T: call the weak learning algorithm with example set E and weights given by Dt; get a weak hypothesis ht : X → Y; update the weights of all examples • Output the final hypothesis, generated from the hypotheses of rounds 1 to T
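  A minimal sketch of the loop described above, using a SAMME-style multi-class weight update and scikit-learn decision stumps as the weak learner (an illustration of the weighting scheme only; the study used WEKA’s AdaBoost over C4.5 trees):

    import numpy as np
    from sklearn.tree import DecisionTreeClassifier

    def adaboost_fit(X, y, T=10):
        y = np.asarray(y)
        n, K = len(y), len(set(y.tolist()))
        D = np.full(n, 1.0 / n)                      # D_1(i) = 1/n
        hypotheses, alphas = [], []
        for _ in range(T):
            # Call the weak learner with weights given by D_t
            h = DecisionTreeClassifier(max_depth=1).fit(X, y, sample_weight=D)
            miss = h.predict(X) != y
            err = D[miss].sum()
            if err >= 1.0 - 1.0 / K:                 # no better than chance: stop
                break
            alpha = np.log((1 - err) / max(err, 1e-10)) + np.log(K - 1)
            D = D * np.exp(alpha * miss)             # raise weights of misclassified examples
            D = D / D.sum()
            hypotheses.append(h)
            alphas.append(alpha)
        return hypotheses, alphas

    def adaboost_predict(hypotheses, alphas, X, classes):
        # Final hypothesis: weighted vote over the hypotheses of rounds 1..T
        votes = np.zeros((len(X), len(classes)))
        index = {c: j for j, c in enumerate(classes)}
        for h, a in zip(hypotheses, alphas):
            for i, p in enumerate(h.predict(X)):
                votes[i, index[p]] += a
        return [classes[j] for j in votes.argmax(axis=1)]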

  24. Complete Set of Results [Figure: bar chart of macro-average and micro-average predictive accuracy, roughly 35% to 43%, for the decision table using IBk, the decision table using majority, the decision table using match with a percentage split, decision trees, and AdaBoost]

  25. Learning a Decision Tree [Figure: a decision tree branching on the command at time = t-2 (e.g. make, dir, grep) and the command at t-1, with predicted commands at time = t such as ls, make, dir, pwd, emacs, grep]
