
Learning and Inference in the Knowledge Plane

Thomas G. Dietterich

School of EECS

Oregon State University

Corvallis, Oregon 97331

http://www.eecs.oregonstate.edu/~tgd


Claim: KP applications will be driven by learned models

[Diagram: the network model relates traffic, configurations, and performance measures.]

Example: configuration/traffic model captures the tripartite relationship between network configurations, traffic properties, and network performance measures.

Configuration X + Traffic Y ⇒ Performance level Z


Network Model Drives Configuration

[Diagram: the traffic mix and performance objectives are fed, together with the network model, into a configuration engine, which outputs a proposed network configuration.]



Roles for Learning

  • Learn network model

    • measure…

      • configuration information

      • traffic properties (protocol mix, session lengths, error rates …)

      • performance measures (throughput, E2E delays, errors, application-level measures)

    • fit model
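As a rough sketch of this measure-then-fit loop (my illustration, not from the talk), the snippet below fits a regression model from a handful of hypothetical (configuration, traffic) feature vectors to a measured performance value; the feature names, values, and the choice of a random-forest regressor are all assumptions.

```python
# Hypothetical sketch of "measure, then fit": each row pairs configuration
# settings with traffic properties; the target is a measured performance value.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Invented measurements: [MTU (bytes), fraction TCP traffic, mean session length (s)]
X = np.array([
    [1500, 0.70, 120.0],
    [9000, 0.55,  45.0],
    [1500, 0.90, 300.0],
    [9000, 0.80,  60.0],
])
y = np.array([42.0, 18.0, 95.0, 25.0])   # measured end-to-end delay (ms)

model = RandomForestRegressor(n_estimators=50, random_state=0).fit(X, y)
print(model.predict([[1500, 0.60, 90.0]]))   # predicted delay for a new setting
```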



Roles for Learning (2)

  • Improving the configuration engine

    • Learn repair rules

      • Observe operator-initiated repairs

    • Learn heuristics for rapidly finding good solutions

    • Cache previous good solutions


Models for WHY

[Diagram: in addition to the network model relating traffic, configurations, and performance measures, a sensor model describes the variables measured, their error bounds, and their costs.]


Network and Sensor Models Drive Diagnosis and Repair

[Diagram: an observed anomaly or a user's complaint enters a diagnosis engine, which consults the network and sensor models, drives sensors and interventions, and outputs a diagnosis and recommended repair.]

[Diagram: the diagnostic loop — the user makes a complaint; the diagnosis engine (DE) chooses a measurement or intervention, executes it, receives the results, and repeats until it outputs a diagnosis.]



Example: Bayesian Network Drives Diagnosis



Semantics

  • Every node stores a conditional probability distribution:
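The formula on the original slide did not survive the transcript; the standard Bayesian-network semantics it refers to is that the joint distribution factors into one conditional distribution per node:

P(X1, …, Xn) = Πi P(Xi | Parents(Xi))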



Diagnostic Process

  • Interventions:

    • Observation: Observe Radio

    • Repair attempts: Fill gas tank

    • Observe & Repair: Inspect fuel pump, replace if bad

  • Algorithm:

    • Compute P(component is bad | evidence)

    • Repair component that maximizes P(bad)/cost

    • Choose observation that maximizes value of information
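A minimal sketch (my own, with invented numbers) of the greedy repair rule in the algorithm above: pick the component with the largest P(bad | evidence) per unit repair cost.

```python
# Greedy repair selection: argmax over components of P(bad | evidence) / cost.
def choose_repair(p_bad, cost):
    """p_bad, cost: dicts keyed by component name (hypothetical values)."""
    return max(p_bad, key=lambda c: p_bad[c] / cost[c])

# Invented posteriors and repair costs, loosely echoing the car example on the next slide.
p_bad = {"SparkPlugs": 0.45, "FuelPump": 0.30, "Battery": 0.10}
cost  = {"SparkPlugs": 20.0, "FuelPump": 150.0, "Battery": 60.0}

print(choose_repair(p_bad, cost))   # -> SparkPlugs
```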



Example

Choose SparkPlugs as next component to repair. Repair it. Update probabilities, and repeat.



Role for Learning

  • Learn sensor model

    • Basic sensor model is manually engineered

    • Learn error bounds and costs (e.g., time delay, traffic impact)


Anomaly Detection Models

Monitor network for unusual traffic, configurations, and routes

[Diagram: separate models provide a measure of "normalness" for traffic, for configurations, and for routes.]

Anomalies are phenomena to be understood, not alarms to be raised.



Role for Learning

  • Learn these models by observing traffic, configurations, routes



Model Properties

  • Spatially Distributed

    • Replicated/Cached

  • Hierarchical

    • Multiple levels of abstraction

  • Constantly Maintained



Available Technology: Configuration Engine

  • Existing formulations: constraint-satisfaction problems (CSP) with objective function

  • Systematic and repair-based methods

  • Some ideas for how to incorporate learning



Available Technology (2): Diagnostic Engine

  • Special cases where optimal diagnosis is tractable

    • Single fault; all actions are repair attempts

    • Single fault; all actions are pure observations

  • Widely used heuristic

    • One-step value of information (greedy approx)

  • Fully-general approach

    • Partially Observable Markov Decision Process (POMDP)

    • Some approximation algorithms are available



Available Technology (3): Anomaly Detection

  • Unsupervised Learning Methods

    • Clustering

    • Probability Density Estimation

    • One-class Classification Formulation
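As one concrete (assumed) instance of the one-class formulation, the sketch below fits a one-class SVM to per-window traffic summaries and scores new windows; the feature choices and values are invented.

```python
# One-class classification over hypothetical per-window traffic summaries.
import numpy as np
from sklearn.svm import OneClassSVM

# Invented "normal" windows: [throughput (Mb/s), mean RTT (ms), error rate]
normal_windows = np.array([
    [95.0, 12.0, 0.001],
    [90.0, 14.0, 0.002],
    [98.0, 11.0, 0.001],
    [93.0, 13.0, 0.002],
])

detector = OneClassSVM(nu=0.05, gamma="scale").fit(normal_windows)

# Higher scores resemble the training (normal) data; low or negative scores
# mark windows to be understood, not alarms to be raised.
print(detector.decision_function([[40.0, 80.0, 0.05]]))
```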



Research Gaps (1): Spatially-distributed multi-level models of traffic

  • What are the right variables to model?

    • Packet-level statistics (RTT, throughput, jitter, …)

    • Connection-level statistics

    • Routing statistics

    • Application statistics

  • What levels reveal anomalies?

  • What levels can best be related to performance goals and configuration settings?



Research Gaps (2): Learning models of configurations

  • Network components

    • Switches, routers, firewalls, web servers, file servers, wireless access points, …

  • LANs and WANs

  • Autonomous Systems

  • What are the right levels of abstraction?

  • Can these things be observed?



Research Gaps (3): Relational learning

  • Network models are relational

    • (traffic, configuration, performance)

    • network structure is a graph

    • routes are paths with attached properties

  • Relational modeling is a relatively young area of ML



Research Gaps (4): Distributed learning and reasoning

  • Distributed model construction

    • Bottom-up summary statistics (easy)

    • Mixed (bottom-up/top-down) information flow (unexplored)

    • Essential for higher-level modeling

    • Opportunities for data sharing at lower levels

  • Distributed configuration

  • Distributed diagnosis

    • Opportunities for inference sharing



Research Gap (5): Openness

  • Standard AI Models assume a fixed set of classes/categories/faults/components

  • How do we reason about the possible existence of new classes, new components, new protocols (including intrusions/worms)?

  • How do we evaluate such systems?



Application/Model Interface

  • Subscription?

    • Applications subscribe to regular model updates/summaries

    • Applications specify the models they want the KP to build/maintain

  • Query?

    • Applications make queries to models?



Application/KP Interface: Two possibilities

  • KP provides inference services in addition to model services

    • WHY? client sends end-user complaint to TP where inference engine operates

  • Inference is performed on end-user machine?

    • WHY? client does inference, just sends queries to TPs

      • Some inference about end-user’s machine needs to happen locally. Maybe view as local TP?



Concluding Remarks

  • KP Applications will be driven by learned models

    • traffic models

    • sensor models

    • models of “normalness”

  • Models are acquired by mix of human authoring and machine learning

  • Main research challenges arise from multiple levels of abstraction and world-wide distribution



KP support for learning models

  • Example: HP Labs email loop detection system

    • Bronstein, Das, Duro, Friedrich, Kleyner, Mueller, Singhal, Cohen (2001)

    • Convert mail log data into four “detectors” and combine using a Bayesian network



KP support for learning models (2)

  • Variables to be measured:

    • Raw sensors: mail log (when received, from, to, size, time of delivery attempt, status of attempt)

    • Derived variables (10-minute windows)

      • IM: # incoming msgs

      • IMR: # incoming msgs / # outgoing msgs

      • PEAK: magnitude of peak bin in message size histogram

      • SHARPNESS: ratio of the magnitude of the peak bin to the average size of the four neighboring non-empty bins (excluding the 2 nearest-neighbor bins on either side of the peak bin)

    • Summary statistics

      • mean and standard deviation of the derived variables trimmed to remove outliers beyond ±3.5 σ
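The sketch below is my reconstruction (not the HP Labs code) of the four derived variables for a single 10-minute window; the binning and the exact neighbor selection for SHARPNESS are assumptions.

```python
import numpy as np

def derived_variables(incoming_sizes, n_outgoing, bins=20):
    """incoming_sizes: sizes of incoming msgs in one 10-minute window."""
    im = len(incoming_sizes)                     # IM: # incoming msgs
    imr = im / max(n_outgoing, 1)                # IMR: incoming / outgoing
    hist, _ = np.histogram(incoming_sizes, bins=bins)
    peak_idx = int(np.argmax(hist))
    peak = hist[peak_idx]                        # PEAK: magnitude of peak bin
    # SHARPNESS: peak bin vs. average of the 4 nearest non-empty bins,
    # skipping the 2 bins immediately on either side of the peak.
    candidates = sorted((abs(i - peak_idx), h) for i, h in enumerate(hist)
                        if h > 0 and abs(i - peak_idx) > 2)
    neighbors = [h for _, h in candidates[:4]]
    sharpness = peak / np.mean(neighbors) if neighbors else float("inf")
    return im, imr, peak, sharpness
```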



KP Services

  • Provide a language so that I can define

    • raw features

    • derived features

    • summary statistics

  • Provide subscription service so that I can collect the summary statistics

    • automatic or semi-automatic support for aggregating summary statistics from multiple sources (e.g., multiple border SMTP servers)

  • Provide subscription service so that I can sample the derived features (to build a supervised training data set)



Statistical Aggregation Middleware

  • Routines for aggregating statistics

    • Example: given

      • S1 = i x1,i and N1

      • S2 = j x2,j and N2, from two independent sources

    • I can compute

      • S3 = S1 + S2 and N3 = N1 + N2

    • From these, I can compute the mean value:

      • μ = S3/N3

  • To compute variance, I need SS1 = Σi x1,i²
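A small sketch (assumed helper functions, not part of any existing middleware) of this aggregation: each source reports (N, S = Σx, SS = Σx²), the triples are summed, and the mean and variance are computed from the merged totals.

```python
# Aggregate sufficient statistics (N, S, SS) from independent sources.
def merge(a, b):
    return tuple(x + y for x, y in zip(a, b))

def mean_var(stats):
    n, s, ss = stats
    mean = s / n
    return mean, ss / n - mean ** 2     # population variance from N, S, SS

# Invented per-server summaries, e.g., from two border SMTP servers:
site_a = (1000, 5.2e6, 3.1e10)          # (N1, S1, SS1)
site_b = ( 800, 4.0e6, 2.4e10)          # (N2, S2, SS2)
print(mean_var(merge(site_a, site_b)))
```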



Model-Fitting Middleware

  • Given summary statistics, compute probabilities in a Bayesian network

[Diagram: Bayesian network linking a Mail Loop node to the IM, IMR, PEAK, and SHARPNESS variables.]
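As an assumed illustration (not the HP Labs implementation) of fitting such probabilities from summary statistics, the sketch below estimates one entry of a conditional probability table from counts, treating each detector as a thresholded binary "fired" flag.

```python
# Estimate P(detector fired | MailLoop state) from counts, with Laplace smoothing.
def fit_cpt_entry(n_fired, n_total, alpha=1.0):
    return (n_fired + alpha) / (n_total + 2 * alpha)

# Invented counts over labeled 10-minute windows:
p_im_given_loop   = fit_cpt_entry(45, 50)    # P(IM fired | mail loop)
p_im_given_noloop = fit_cpt_entry(30, 950)   # P(IM fired | no mail loop)
print(p_im_given_loop, p_im_given_noloop)
```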



Sufficient Statistics

  • Andrew Moore (CMU): For nearly every learning algorithm, there is a set of statistics sufficient to permit that algorithm to fit its models

  • There is usually a scalable way of collecting and aggregating these sufficient statistics



One KP Goal

  • Design the services for defining sensors, defining derived variables, defining sufficient statistics, and defining aggregation methods

  • Scalable, secure, etc.



Hierarchical Modeling

  • Abstract (or aggregate) model could treat conclusions/assertions of other models as input variables

  • “One model’s inference is another model’s raw data”

    • probably requires associated meta-data: provenance, age of original data

    • major issue: assessing the independence of multiple data sources (don’t want to double-count evidence): requires knowledge of KP/network topology



Fusing Multiple Subscriptions

  • Multiple KPs and/or multiple KPapps may register for the same sufficient statistics

  • Fuse their subscriptions to save computation

  • Keep meta-data on non-independence

