
Example: Intrusion detection

Activity monitoring: Anomaly detection as on-line classification. Tom Fawcett, HP Laboratories, 1501 Page Mill Rd., Palo Alto, CA. Symposium on Machine Learning for Anomaly Detection, May 22-23, 2004.


Presentation Transcript


  1. Activity monitoring: Anomaly detection as on-line classification. Tom Fawcett, HP Laboratories, 1501 Page Mill Rd., Palo Alto, CA. Symposium on Machine Learning for Anomaly Detection, May 22-23, 2004

  2. Example: Intrusion detection
     [Figure: command streams for Users 1 to 4 (e.g., ls, cd, gcc, netscape, emacs, latex, su, rlogin), with one point in a stream labeled "intrusion"]

  3. Example: Monitoring digital switch health
     [Figure: switches S1, S2, S3, ..., Si; abnormal behavior culminating in hard failure]

  4. Example: Monitoring business news
     • 1 May 20, 1999 -- VISX Inc. today announced that the Federal Trade Commission (FTC) has filed a notice of appeal of the decision issued earlier this month by the FTC administrative…
     • 2 May 29 (Reuters) -- Federal advisers backed approval Thursday of a laser made by VISX Inc. used to correct nearsightedness with or without astigmatism. […]
     • 3 June 3 (PR Newswire) -- VISX, Inc. of Santa Clara, California, will become a component of the Nasdaq-100 Index, effective at the beginning of trading Thursday, June 10, 1999. […]
     • 4 June 4 (PR Newswire) -- Amazon.com, facing threats of legal action from The New York Times, has asked the U.S. District Court in Seattle to allow Amazon.com to continue advertising.
     • 5 June 5, 1999 -- Motorola today announced that its MPC923, MPC950 and MPC960 PowerPC processors have been officially certified by Microsoft Corporation to support the…
     • 6 June 8, 1999 (PR Newswire) -- WebTV Networks, Inc. and EchoStar Communications Corp. at CES today announced the Microsoft WebTV Network Plus service for satellite and the EchoStar…

  5. Monitoring business news: VISX
     • 1 May 20, 1999 -- VISX Inc. today announced that the Federal Trade Commission (FTC) has filed a notice of appeal of the decision issued earlier this month by the FTC administrative…
     • 2 May 29 (Reuters) -- Federal advisers backed approval Thursday of a laser made by VISX Inc. used to correct nearsightedness with or without astigmatism. […]
     • 3 June 3 (PR Newswire) -- VISX, Inc. of Santa Clara, California, will become a component of the Nasdaq-100 Index, effective at the beginning of trading Thursday, June 10, 1999. […]
     [Figure: VISX stock price]

  6. Commonalities of the domains
     • Temporal: data comprise time series
     • Large number of entities (users, companies, accounts, devices)
     • Large volumes of data (commands, news stories, calls) on entity activity
     • General goal is to alert on interesting, rare events (intrusions, fraud, unusual business activity)
     [Figure: onset of significant activity]
     Detection goals:
     • Identify as many interesting events as possible
     • Alert as soon as possible
     • Minimize false alarms

  7. Activity monitoring problems
     • Intrusion detection: Lee, Stolfo & Mok (KDD-99); Lane & Brodley (KDD-98); Ryan et al. (FDRM-97); DuMouchel & Schonlau (KDD-98)
     • Network alarm monitoring: Sasisekharan et al. (KDD-94); Weiss & Hirsh (KDD-98); Klemettinen 99
     • Hardware fault detection: Dasgupta & Forrest 96; Smyth 92
     • Topic detection and tracking: Allan, Papka & Lavrenko (SIGIR-98); Crabtree & Soltysiak (IJCAI-97); Allen (ed.), 2002
     • Fraud detection: Chan & Stolfo (KDD-98); Cox et al. (DMKD-97); Fawcett & Provost (KDD-97); Burge & Shawe-Taylor (FDRM-97); Ezawa & Norton (KDD-95)
     • News/event alerts: Yang, Pierce & Carbonell; Fawcett & Provost (KDD-99)
     • Epidemic/bio-terrorism detection: Wong et al. 2002, 2003; Shmueli 2004

  8. Standard supervised learning approach
     [Figure: an event stream is divided into windows; window vector extraction turns each window into a class-labeled instance vector (- before the onset of significant activity, + after it), producing a classification problem]
     • Many approaches use |w| = 1
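The window-extraction step can be sketched as follows. This is only a minimal illustration, not the method from the talk. Several conventions are assumptions. Items are plain strings, such as shell commands. A window is labeled "+" when its last item falls at or after the onset index (tau). A stream with no positive activity passes `onset=None`. The function name `extract_windows` is hypothetical.

```python
from typing import List, Optional, Sequence, Tuple

def extract_windows(stream: Sequence[str], w: int,
                    onset: Optional[int]) -> List[Tuple[Tuple[str, ...], str]]:
    """Slide a window of length w over the stream and label each instance.

    Assumed labeling convention: a window is '+' if its last item is at or
    after the onset index of positive activity (tau), '-' otherwise.
    Streams without positive activity pass onset=None (all windows '-').
    """
    instances = []
    for end in range(w, len(stream) + 1):
        window = tuple(stream[end - w:end])            # items end-w .. end-1
        positive = onset is not None and end - 1 >= onset
        instances.append((window, "+" if positive else "-"))
    return instances
```

For example, with `w=1` (the common case on the slide), `extract_windows(["ls", "cd", "gcc", "su", "rlogin"], 1, onset=3)` labels the `su` and `rlogin` windows "+" and the first three "-". Each labeled instance can then be handed to any standard classifier.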

  9. Challenges for machine learning approaches
     [Figure: login sessions of a user and of an intruder]
     • Very skewed class distributions: inherent asymmetry
     • Differing error costs
     • Imprecision in class and cost distributions
     • Temporal dependencies among alarms
       - Earlier is better than later
       - Several is (usually) no better than one
     • Solutions may use different representations: different timescales, different granularity (|w| = 1 command, |w| = 1 login session, |w| = 1 process life)

  10. Formalism
     [Figure: a stream $D_i$ of data items $d$, showing normal activity followed by positive activity]
     • $D$: set of data streams being monitored
     • $D_i = \langle d_1, d_2, d_3, \ldots, d_n \rangle$: sequence of data items in stream $D_i$
     • $\alpha$: alarm time
     • $\tau$: onset of positive activity (each episode has at most one $\tau$)
     Benefit/cost of alarms ($H$ is the alarm history; see paper):
     • $s(\tau, \alpha, H, D_i)$: benefit of $\alpha$ if it is a true positive
     • $f(\alpha, H, D_i)$: cost of $\alpha$ if it is a false positive
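The benefit/cost definitions above can be applied to a single episode roughly as follows. This is a simplified sketch under stated assumptions, not the paper's evaluation procedure. It drops the alarm history $H$ and the stream $D_i$ from the signatures of $s$ and $f$. It treats every alarm before $\tau$ (or any alarm on a stream with no $\tau$) as a false positive. It credits only the first alarm at or after $\tau$, which mirrors "several is (usually) no better than one". The names `score_episode`, `Benefit` and `Cost` are hypothetical.

```python
from typing import Callable, Optional, Sequence, Tuple

# Hypothetical simplified signatures: the slide's s and f also take H and Di.
Benefit = Callable[[float, float], float]   # s(tau, alpha)
Cost = Callable[[float], float]             # f(alpha)

def score_episode(alarms: Sequence[float], tau: Optional[float],
                  s: Benefit, f: Cost) -> Tuple[float, float]:
    """Return (total benefit, total cost) of the alarms raised on one stream."""
    benefit, cost = 0.0, 0.0
    rewarded = False
    for alpha in sorted(alarms):
        if tau is None or alpha < tau:
            cost += f(alpha)            # alarm before onset: false positive
        elif not rewarded:
            benefit += s(tau, alpha)    # first alarm after onset: true positive
            rewarded = True
        # Later alarms after onset earn nothing (several is no better than one).
    return benefit, cost
```

For instance, with a step benefit `s = lambda t, a: 1.0 if 0 <= a - t <= 50 else 0.0` and `f = lambda a: 1.0`, the alarms `[2.0, 12.0, 15.0]` on a stream with `tau=10.0` yield a benefit of 1.0 and a cost of 1.0. Crediting only the earliest post-onset alarm is the best choice when $s$ does not increase with delay ("earlier is better than later").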

  11. Formalism (same definitions as slide 10)
     [Figure: example plot of $s$ as a function of alarm time, with the vertical axis running from 0 to $s_{max}$]

  12. Detecting digital switch failures
     [Figure: $s$ plotted against alarm time, ranging from 0 to $s_{max}$, with the onset of observable switch abnormalities, the minimum advance notice, and the hard failure point marked]

  13. How is this framework better?
     • More realistic evaluation of solution methods
     • Differing error costs and skewed class distributions: addressed by AMOC analysis
     • Temporal dependencies among alarms (earlier is better than later; several is no better than one): addressed by including time and alarm history in $s$ and $f$
     • Solutions may use different representations (different timescales, granularities): AMOC normalizes with respect to time (there is no definite notion of a false-positive maximum)

  14. AMOC curves
     [Figure: AMOC curves for random alarms with different frequencies (.1/hr, .2/hr, etc.); ROC curves vs. AMOC curves]
     Scoring: $s(\tau,\alpha) = 1$ if $0 \le \alpha - \tau \le 50$, and 0 otherwise; $f = 1$
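AMOC (Activity Monitoring Operating Characteristic) curves can be computed with a threshold sweep along these lines. This is a sketch, not the talk's code. It assumes the usual AMOC axes: false alarms per unit of normal-activity time on x, and mean normalized score on y. It uses the slide's step scoring function with $f = 1$, so $s_{max} = 1$ and no further normalization is needed. It also assumes each episode is a list of (time, detector score) pairs plus an optional onset $\tau$, and that an alarm fires whenever the detector score reaches the threshold. The names `s_step` and `amoc_points` are hypothetical. The slide does not give units for the window of 50.

```python
from typing import List, Optional, Sequence, Tuple

# One episode: ([(time, detector_score), ...], tau); tau is None if no positive activity.
Episode = Tuple[Sequence[Tuple[float, float]], Optional[float]]

def s_step(tau: float, alpha: float, window: float = 50.0) -> float:
    """Slide's benefit: s(tau, alpha) = 1 if 0 <= alpha - tau <= window, else 0."""
    return 1.0 if 0.0 <= alpha - tau <= window else 0.0

def amoc_points(episodes: Sequence[Episode], thresholds: Sequence[float],
                window: float = 50.0) -> List[Tuple[float, float]]:
    """Return (false alarms per unit time, mean score over positive episodes) per threshold."""
    points = []
    for th in thresholds:
        false_alarms, normal_time = 0, 0.0
        total_s, n_pos = 0.0, 0
        for items, tau in episodes:          # assumes each episode is non-empty
            times = [t for t, _ in items]
            start = min(times)
            end = max(times) if tau is None else tau
            normal_time += end - start       # time spent in normal activity
            alarms = [t for t, score in items if score >= th]
            false_alarms += sum(1 for t in alarms if tau is None or t < tau)
            if tau is not None:
                n_pos += 1
                hits = [t for t in alarms if t >= tau]
                if hits:                     # only the earliest alarm is credited
                    total_s += s_step(tau, min(hits), window)
        rate = false_alarms / normal_time if normal_time > 0 else 0.0
        points.append((rate, total_s / n_pos if n_pos else 0.0))
    return points
```

Plotting the returned pairs with the rate on the x axis gives one AMOC curve per detector. Dividing by time rather than by a count of negative instances is what allows detectors with different window sizes to be compared.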

  15. Activity monitoring: Solution approaches
     Fundamental problem characteristics:
     • Asymmetry of classes: positive activity is inherently rare
       - Discriminating method: differentiates positive and normal activity
         vs.
       - Profiling method: models normal activity without reference to positive activity (i.e., learning from negative examples only)
     • Multi-level representation of data
       - Uniform modeling: models activity uniformly across all monitored entities
         vs.
       - Individual modeling: models the activity of each $D_i$ individually
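The following sketch combines profiling with individual modeling. It is purely an illustration and is not DC-1 or any method from the talk. Each user gets a Laplace-smoothed command-frequency profile fitted only on that user's normal activity. A window is scored by its mean negative log-probability under the profile, and higher values are more anomalous. The class and function names, the smoothing constant, and the unigram model itself are assumptions.

```python
import math
from collections import Counter
from typing import Dict, Hashable, Sequence

class UserProfile:
    """Profiling + individual modeling: fit on one entity's normal activity only."""

    def __init__(self, normal_commands: Sequence[str], alpha: float = 1.0):
        self.counts = Counter(normal_commands)
        self.total = sum(self.counts.values())
        self.alpha = alpha                   # Laplace smoothing constant (assumed)
        self.vocab = len(self.counts) + 1    # one extra slot for unseen commands

    def surprise(self, window: Sequence[str]) -> float:
        """Mean negative log-probability of the window under this profile."""
        denom = self.total + self.alpha * self.vocab
        nll = [-math.log((self.counts[c] + self.alpha) / denom) for c in window]
        return sum(nll) / len(nll)

def build_profiles(histories: Dict[Hashable, Sequence[str]]) -> Dict[Hashable, UserProfile]:
    """One profile per monitored entity (individual modeling)."""
    return {user: UserProfile(cmds) for user, cmds in histories.items()}
```

An alarm would be raised when `profiles[user].surprise(window)` exceeds a threshold chosen from an AMOC curve. A discriminating method would instead also train on positive (e.g., intruder) windows. A uniform model would fit a single profile shared by all users.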

  16. Example: Monitoring business news
     • Goal: scan news stories associated with businesses and alarm on stories that correlate with "interesting" behavior
     • Interesting = 10% change in stock price (up or down) within 34.5 hours
     • Data: Yahoo stories and stock prices from 6000 companies over 3 months
     • DC-1 system
       - Developed for cell phone fraud detection
       - Performs discriminating, individual modeling
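The definition of "interesting" above can be turned into a labeling function along these lines. This is a sketch built on assumptions the slide does not state. Prices arrive as (time in hours, price) pairs sorted by time. The 34.5-hour window starts at the story's timestamp. The baseline is the first quote at or after the story. The function name `is_interesting` is hypothetical.

```python
from bisect import bisect_left, bisect_right
from typing import Sequence, Tuple

def is_interesting(story_time: float, prices: Sequence[Tuple[float, float]],
                   threshold: float = 0.10, horizon: float = 34.5) -> bool:
    """True if the price moves by >= threshold (up or down) within `horizon` hours.

    Assumes `prices` is sorted by time; the baseline is the first quote at or
    after the story (an assumption, not from the slide).
    """
    times = [t for t, _ in prices]
    i = bisect_left(times, story_time)          # first quote at/after the story
    if i == len(prices):
        return False                            # no price data after the story
    base = prices[i][1]
    j = bisect_right(times, story_time + horizon)
    return any(abs(p - base) / base >= threshold for _, p in prices[i:j])
```

Stories labeled True would play the role of positive activity when the detector is trained and when its alarms are scored.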

  17. Example: Monitoring business news
     Textual indicators for price spikes: said [it] expects, same period, revenues increase over, per share, fourth, compare[d], income, quarter, fiscal, earnings per diluted, fiscal quarter ended, expenses, months ended, today reported, consensus, quarter earnings, year ended, repurchase, lower than, shortfall, Q[1234], fourth-quarter, first call, below analyst, for quarter, research [and] development
     [Figure: AMOC curve]
     Scoring: $s(\tau,\alpha) = 1$ if $0 \le \alpha - \tau \le 34.5$ hours, and 0 otherwise; $f = 1$

  18. Pitfalls in evaluation: why performance may look better than it should
     • Evaluating too locally
       - Windows shouldn't overlap
       - Behavior may be episodic or local ("bull market behavior")
       - Need out-of-time sampling
     [Figure: stream $D_i$ split in time into a training period followed by a test period]
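Out-of-time sampling can be sketched as a split on timestamps rather than a random shuffle. This is a minimal sketch that assumes each event is a (timestamp, payload) pair. The optional `gap` parameter is a hypothetical addition, not from the slides. When events are windows keyed by their start time, setting `gap` to the window length keeps training and test windows from overlapping.

```python
from typing import List, Sequence, Tuple

def out_of_time_split(events: Sequence[Tuple[float, object]], cutoff: float,
                      gap: float = 0.0) -> Tuple[List, List]:
    """Train on events before `cutoff`; test on events at or after `cutoff + gap`.

    If each event is a window identified by its start time and spans `gap`
    time units, no training window can overlap a test window.
    """
    train = [e for e in events if e[0] < cutoff]
    test = [e for e in events if e[0] >= cutoff + gap]
    return train, test
```

Because every test event comes after every training event, effects that are local in time (such as the slide's "bull market behavior") cannot inflate the test results. Events that fall inside the gap are simply discarded.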

  19. Pitfalls in evaluation
     • Mixing events from a single account between train and test sets
       - The goal of evaluation is to determine how well the system will work on new, unseen accounts
       - Events within an account may be much more similar to each other than to events in other accounts
       - Mixing one account's examples between train and test sets may leak test information into training
       - Need out-of-account sampling
     [Figure: whole accounts assigned either to the training set or to the test set]
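Out-of-account sampling amounts to splitting by account instead of by event. Below is a minimal sketch, assuming events are (account_id, payload) pairs. The function name, the default test fraction and the seeded shuffle are illustrative choices. scikit-learn's `GroupShuffleSplit` implements the same grouping idea.

```python
import random
from typing import Hashable, List, Sequence, Tuple

def out_of_account_split(events: Sequence[Tuple[Hashable, object]],
                         test_fraction: float = 0.3,
                         seed: int = 0) -> Tuple[List, List]:
    """Assign whole accounts to train or test so no account appears in both."""
    accounts = sorted({acct for acct, _ in events}, key=repr)  # deterministic order
    rng = random.Random(seed)
    rng.shuffle(accounts)
    n_test = max(1, round(len(accounts) * test_fraction))
    test_accounts = set(accounts[:n_test])
    train = [e for e in events if e[0] not in test_accounts]
    test = [e for e in events if e[0] in test_accounts]
    return train, test
```

Every event from a held-out account goes to the test set, so the evaluation measures performance on accounts the model has never seen. For temporal data, this split can be combined with the out-of-time split from slide 18.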

  20. Conclusions
     • This form of anomaly detection is inherently classification
       - Alarms correspond to true positives, false positives, etc.
       - Classification methods can be brought to bear
       - But temporal aspects make standard classification metrics inappropriate
     • Activity monitoring domains are common in machine learning; solution methods and strategies can be shared and adapted

  21. [end]

  22. Activity monitoring: Learning methods
     [Figure: data streams D1 through D5]
     Problem characteristics:
     • Class asymmetry: discriminating method vs. profiling method
     • Multi-level representation: uniform modeling vs. individual modeling

  23. Transforming tau: circuit failure
     [Figure: a stream of data items d, marked with degradation, the beginning of positive visible activity, the hard failure (end of episode), and an implicit lookahead interval]
