Adversarial Machine Learning

Daniel Lowd, University of Oregon; Christopher Meek, Microsoft Research; Pedro Domingos, University of Washington


Presentation Transcript


  1. Adversarial Machine Learning • Daniel Lowd, University of Oregon • Christopher Meek, Microsoft Research • Pedro Domingos, University of Washington

  2. Motivation • Many adversarial problems • Spam filtering • Malware detection • Worm detection • New ones every year! • Want general-purpose solutions • We can gain much insight by modeling adversarial situations mathematically

  3. Example: Spam Filtering • Message: From: spammer@example.com, "Cheap mortgage now!!!" • Feature weights: cheap = 1.0, mortgage = 1.5 • Total score = 2.5 > 1.0 (threshold) • Result: Spam

  4. Example: Spammers Adapt • Message: From: spammer@example.com, "Cheap mortgage now!!! Eugene Oregon" • Feature weights: cheap = 1.0, mortgage = 1.5, Eugene = -1.0, Oregon = -1.0 • Total score = 0.5 < 1.0 (threshold) • Result: OK

  5. Example: Classifier Adapts • Message: From: spammer@example.com, "Cheap mortgage now!!! Eugene Oregon" • Feature weights: cheap = 1.5, mortgage = 2.0, Eugene = -0.5, Oregon = -0.5 • Total score = 2.5 > 1.0 (threshold) • Result: Spam (previously OK)
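
The three spam-filtering examples above can be reproduced with a few lines of Python. This is a minimal sketch, assuming a bag-of-words linear filter in which unknown words score 0; the function name `classify` and the lowercase tokenization are illustrative assumptions, not part of the slides.

```python
def classify(message_words, weights, threshold=1.0):
    """Score a bag of words with a linear filter; unknown words weigh 0."""
    score = sum(weights.get(word.lower(), 0.0) for word in message_words)
    return score, ("spam" if score > threshold else "ok")

# Feature weights taken from the slides.
original_weights = {"cheap": 1.0, "mortgage": 1.5, "eugene": -1.0, "oregon": -1.0}
retrained_weights = {"cheap": 1.5, "mortgage": 2.0, "eugene": -0.5, "oregon": -0.5}

spam = ["Cheap", "mortgage", "now"]
padded_spam = spam + ["Eugene", "Oregon"]

print(classify(spam, original_weights))          # slide 3
print(classify(padded_spam, original_weights))   # slide 4: spammer adapts
print(classify(padded_spam, retrained_weights))  # slide 5: classifier adapts
```

The same message flips from spam to OK and back to spam purely because of which weights and which padding words are in play.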

  6. Outline • Problem definitions • Anticipating adversaries (Dalvi et al., 2004) • Goal: Defeat adaptive adversary • Assume: Perfect information, optimal short-term strategies • Results: Vastly better classifier accuracy • Reverse engineering classifiers (Lowd & Meek, 2005a,b) • Goal: Assess classifier vulnerability • Assume: Membership queries from adversary • Results: Theoretical bounds, practical attacks • Also: How to defeat a spam filter with 30 words or less! • Conclusion

  7. Definitions • Instance space: X = {X1, X2, …, Xn}; each Xi is a feature (e.g., a word); instances x ∈ X (e.g., emails) • Classifier: c(x): X → {+, −}, c ∈ C, concept class (e.g., linear classifier) • Adversarial cost function: a(x): X → R, a ∈ A (e.g., more legible spam is better)

  8. Adversarial scenario • Defender's Task: Choose new c'(x) to minimize (cost-sensitive) error • Adversary's Task: Choose x to minimize a(x) subject to c(x) = −

  9. Cost-sensitive error: Not all errors are equal! • "Better that ten guilty persons escape than that one innocent suffer." -- William Blackstone, 1760 • False Positive Rate (FP): Fraction of "good" instances misclassified as "bad" (e.g., good email blocked by filter) • False Negative Rate (FN): Fraction of "bad" instances misclassified as "good" (e.g., spam that gets through) • Classifier Utility: U_C = 1 − c_FP · FP − c_FN · FN
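
To make the utility formula concrete, the sketch below compares two hypothetical filters. The error rates and the 10:1 cost ratio are made-up illustrative numbers (loosely echoing the Blackstone quote), not results from the talk.

```python
def classifier_utility(fp_rate, fn_rate, cost_fp, cost_fn):
    """U_C = 1 - c_FP * FP - c_FN * FN, as defined on the slide."""
    return 1.0 - cost_fp * fp_rate - cost_fn * fn_rate

# Assumed costs: blocking one good email is 10x worse than passing one spam.
COST_FP, COST_FN = 10.0, 1.0

cautious = classifier_utility(fp_rate=0.01, fn_rate=0.20, cost_fp=COST_FP, cost_fn=COST_FN)
aggressive = classifier_utility(fp_rate=0.05, fn_rate=0.05, cost_fp=COST_FP, cost_fn=COST_FN)

print(f"cautious filter:   {cautious:.2f}")
print(f"aggressive filter: {aggressive:.2f}")
```

Under these assumed costs the cautious filter scores higher even though it lets through four times as much spam, which is the point of weighting the two error types differently.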

  10. This is a game! • Adversary's actions: {x ∈ X} • Defender's actions: {c ∈ C} • Assume perfect information • Finding a Nash equilibrium is triply exponential (at best)! • Instead, we'll look at optimal myopic strategies: Best action assuming nothing else changes

  11. Initial classifier • Set weights using cost-sensitive naïve Bayes • Assume: training data is untainted • Learned weights: cheap = 1.0, mortgage = 1.5, Eugene = -1.0, Oregon = -1.0

  12. Adversary's strategy • Use cost: a(x) = Σi w(xi, bi) • Solve knapsack-like problem with dynamic programming • Assume: the classifier will not modify c(x) • Example: "Cheap mortgage now!!!" becomes "Cheap mortgage now!!! Eugene Oregon" under the weights cheap = 1.0, mortgage = 1.5, Eugene = -1.0, Oregon = -1.0
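
The knapsack-like search can be illustrated with a small dynamic program. This is a simplified sketch, not the formulation from Dalvi et al.: it assumes the adversary can only add words, that each candidate word has a known positive cost and a known score reduction, and that reductions are discretized to tenths via the hypothetical `scale` parameter.

```python
import math

def cheapest_evasion(score, threshold, candidates, scale=10):
    """Cheapest set of added words that brings `score` down to `threshold`.

    candidates: list of (word, cost, score_reduction), each usable once.
    Returns (total_cost, words), or None if no subset closes the gap.
    """
    gap = math.ceil((score - threshold) * scale - 1e-9)  # gap in discrete units
    if gap <= 0:
        return 0.0, []
    inf = float("inf")
    # best[g] = cheapest (cost, words) achieving at least g units of reduction
    best = [(inf, [])] * (gap + 1)
    best[0] = (0.0, [])
    for word, cost, reduction in candidates:
        units = int(round(reduction * scale))
        if units <= 0:
            continue
        # Descending order ensures each word is used at most once (0/1 knapsack).
        for g in range(gap, -1, -1):
            base_cost, base_words = best[g]
            if base_cost == inf:
                continue
            target = min(gap, g + units)
            if base_cost + cost < best[target][0]:
                best[target] = (base_cost + cost, base_words + [word])
    return best[gap] if best[gap][0] < inf else None

plan = cheapest_evasion(
    score=2.5, threshold=1.0,
    candidates=[("Eugene", 1.0, 1.0), ("Oregon", 1.0, 1.0), ("hello", 1.0, 0.2)],
)
print(plan)
```

With the slide's weights, the spam message needs 1.5 points of reduction, so the search selects both "Eugene" and "Oregon" and ignores the weaker filler word.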

  13. Classifier's strategy • For given x, compute probability it was modified by adversary • Assume: the adversary is using the optimal strategy • Learned weights: cheap = 1.0, mortgage = 1.5, Eugene = -1.0, Oregon = -1.0

  14. Classifier's strategy • For given x, compute probability it was modified by adversary • Assume: the adversary is using the optimal strategy • Learned weights: cheap = 1.5, mortgage = 2.0, Eugene = -0.5, Oregon = -0.5

  15. Evaluation: spam • Adversarial cost functions • Plain (PL) • Add Words (AW) • Synonyms (SYN) • Add Length (AL) • Classifier strategies • NB: Naïve Bayes (baseline) • AC: Adversarial Classifier • Similar results with one other dataset, or different classifier costs • [Figure: results chart, axis running from Good to Bad]

  16. Repeated game: AC still wins • Adversary responds to new classifier; classifier predicts adversary's revised response • Oscillations occur as adversaries switch strategies back and forth.

  17. Outline • Problem definitions • Anticipating adversaries (Dalvi et al., 2004) • Goal: Defeat adaptive adversary • Assume: Perfect information, optimal short-term strategies • Results: Vastly better classifier accuracy • Reverse engineering classifiers (Lowd & Meek, 2005a,b) • Goal: Assess classifier vulnerability • Assume: Membership queries from adversary • Results: Theoretical bounds, practical attacks • Also: How to defeat a spam filter with 30 words or less! • Conclusion

  18. Imperfect information • What can an adversary accomplish with limited knowledge of the classifier? • Goals: • Understand classifier’s vulnerabilities • Understand our adversary’s likely strategies “If you know the enemy and know yourself, you need not fear the result of a hundred battles.” -- Sun Tzu, 500 BC

  19. Adversarial Classification Reverse Engineering (ACRE) • Adversary's Task: Minimize a(x) subject to c(x) = − • Problem: The adversary doesn't know c(x)!

  20. Adversarial Classification Reverse Engineering (ACRE) • Task: Minimize a(x) subject to c(x) = −, within a factor of k • Given: • Full knowledge of a(x) • One positive and one negative instance, x+ and x− • A polynomial number of membership queries

  21. Comparison to other theoretical learning methods • Probably Approximately Correct (PAC): accuracy over same distribution • Membership queries: exact classifier • ACRE: single low-cost, negative instance

  22. ACRE example • Linear classifier: c(x) = + iff (w · x > T) • Linear cost function: a(x) = Σi ai |xi − xai|, where xa is the adversary's ideal instance

  23. Linear classifiers with continuous features • ACRE learnable within a factor of (1+ε) under linear cost functions • Proof sketch • Only need to change the highest weight/cost feature • We can efficiently find this feature using line searches in each dimension
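
The line-search idea in the proof sketch can be illustrated as follows. This is a rough sketch under simplifying assumptions, not the algorithm from Lowd & Meek: it assumes the adversary starts from a positive instance `x_a`, that the classifier is only reachable through a membership oracle `is_positive`, and that a doubling step followed by bisection is accurate enough. The names `line_search` and `cheapest_single_feature_change` are hypothetical.

```python
def line_search(is_positive, x, i, direction, max_step=1e6, tol=1e-6):
    """Smallest step along feature i that makes x negative, or None."""
    def moved(step):
        y = list(x)
        y[i] += direction * step
        return y

    hi = 1.0
    while is_positive(moved(hi)):      # grow the step until the label flips
        hi *= 2.0
        if hi > max_step:
            return None                # this direction never reaches the boundary
    lo = 0.0
    while hi - lo > tol:               # bisect down to the decision boundary
        mid = (lo + hi) / 2.0
        if is_positive(moved(mid)):
            lo = mid
        else:
            hi = mid
    return hi

def cheapest_single_feature_change(is_positive, x_a, costs):
    """Return (cost, feature index, signed change) of the cheapest single-feature evasion."""
    best = None
    for i, unit_cost in enumerate(costs):
        for direction in (1.0, -1.0):
            step = line_search(is_positive, x_a, i, direction)
            if step is not None and (best is None or unit_cost * step < best[0]):
                best = (unit_cost * step, i, direction * step)
    return best

# Hidden linear classifier used only as the oracle in this example.
w, T = [2.0, 1.0], 1.0
oracle = lambda x: sum(wi * xi for wi, xi in zip(w, x)) > T
result = cheapest_single_feature_change(oracle, x_a=[1.0, 1.0], costs=[1.0, 1.0])
print(result)
```

With equal per-unit costs, the search picks feature 0 because its larger weight reaches the boundary with the smallest change; the tolerance `tol` plays the role of the approximation slack.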

  24. Linear classifiers with Boolean features • Harder problem: can't do line searches • ACRE learnable within a factor of 2 if adversary has unit cost per change • [Figure: x− and xa compared feature by feature, with weights wi, wj, wk, wl, wm]

  25. Algorithm • Iteratively reduce the cost in two ways: • Remove any unnecessary change: O(n) • Replace any two changes with one: O(n³)

  26. Proof sketch (Contradiction) • Suppose there is some negative instance x with less than half the cost of y: • x's average change is twice as good as y's • We can replace y's two worst changes with x's single best change • But we already tried every such replacement!
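
The two cost-reduction steps of the Boolean-feature algorithm can be sketched in Python. This is an illustrative reading of the slides, assuming unit cost per changed feature and a membership oracle `is_negative`; the function name `reduce_cost` and the toy weights are hypothetical.

```python
from itertools import combinations

def reduce_cost(is_negative, x_a, x_neg):
    """Shrink the set of features on which a negative instance differs from x_a."""
    n = len(x_a)
    y = list(x_neg)

    def changed():
        return [i for i in range(n) if y[i] != x_a[i]]

    improved = True
    while improved:
        improved = False
        # Step 1: undo any single change that is not needed to stay negative.
        for i in changed():
            trial = list(y)
            trial[i] = x_a[i]
            if is_negative(trial):
                y, improved = trial, True
        # Step 2: try to replace two existing changes with one new change.
        replaced = False
        for i, j in combinations(changed(), 2):
            for k in range(n):
                if y[k] != x_a[k]:
                    continue               # k is already a changed feature
                trial = list(y)
                trial[i], trial[j] = x_a[i], x_a[j]
                trial[k] = not x_a[k]
                if is_negative(trial):
                    y, improved, replaced = trial, True, True
                    break
            if replaced:
                break
    return y

# Toy linear classifier over Boolean features, used only as the oracle.
weights, threshold = [2.0, 1.0, -1.0, -1.0, -3.0], 0.5
oracle = lambda x: sum(w for w, on in zip(weights, x) if on) <= threshold
x_a = [True, True, False, False, False]        # the adversary's ideal (positive) instance
x_neg = [False, False, False, False, False]    # a known negative instance, 2 changes away
y = reduce_cost(oracle, x_a, x_neg)
print(y, sum(a != b for a, b in zip(y, x_a)))
```

Every accepted step lowers the number of changes by at least one, so the loop terminates; here the two removals are swapped for a single added feature.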

  27. Evaluation: Is O(n³) practical? • Classifiers: Naïve Bayes (NB), Maxent (ME) • Data: 500k Hotmail messages, 276k features • Adversary feature sets: • 23,000 words (Dict) • 1,000 random words (Rand) • Answer: Yes

  28. Histogram of Feature Weights “good” “spammy”

  29. Finding features • We can find good features (words) instead of good instances (emails) • Active attacks: Test emails allowed • Passive attacks: No filter access

  30. Active Attacks • Learn which words are best by sending test messages (queries) through the filter • First-N: Find n good words using as few queries as possible • Best-N: Find the best n words

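  31. First-N Attack, Step 1: Find a "barely spam" message • [Figure: messages ordered from legitimate to spam across the threshold: original legit. "Hi, mom!"; "barely legit." "now!!!"; "barely spam" "mortgage now!!!"; original spam "Cheap mortgage now!!!"]

  32. First-N Attack, Step 2: Test each word • [Figure: each candidate word is added to the "barely spam" message; good words move it across the threshold to the legitimate side, less good words leave it on the spam side]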
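
The two steps of the First-N attack can be sketched as below. This is a simplified illustration, assuming the filter is available as a black-box oracle `is_spam` over word lists; the helper names, the toy weights, and the order in which words are removed and added are assumptions for the example.

```python
def find_barely_spam(is_spam, legit_words, spam_words):
    """Step 1: walk from a legitimate message toward a spam message, one word
    at a time, and return the first message the filter labels as spam."""
    message = list(legit_words)
    for word in legit_words:
        message.remove(word)
        if is_spam(message):
            return list(message)
    for word in spam_words:
        message.append(word)
        if is_spam(message):
            return list(message)
    return None

def first_n_good_words(is_spam, barely_spam, candidates, n):
    """Step 2: a word is 'good' if adding it alone flips the barely-spam
    message to legitimate. Stops as soon as n good words are found."""
    good, queries = [], 0
    for word in candidates:
        if len(good) >= n:
            break
        queries += 1
        if not is_spam(barely_spam + [word]):
            good.append(word)
    return good, queries

# Toy linear filter standing in for the real one.
toy_weights = {"cheap": 1.0, "mortgage": 1.5, "hi": -0.5, "mom": -0.5,
               "eugene": -1.0, "oregon": -1.0, "hello": -0.1}
toy_filter = lambda words: sum(toy_weights.get(w, 0.0) for w in words) > 1.0

barely = find_barely_spam(toy_filter, ["hi", "mom"], ["now", "mortgage", "cheap"])
good, queries = first_n_good_words(toy_filter, barely, ["hello", "eugene", "oregon", "mom"], n=2)
print(barely, good, queries)
```

Starting from a message that sits just above the threshold is what makes a single added word enough to reveal whether it is a good word.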

  33. Best-N Attack • Key idea: use spammy words to sort the good words. • [Figure: good words ranked from better to worse relative to the spam/legitimate threshold]

  34. Results: Words/queries tradeoff (* = words added, + = words removed)

  35. Passive Attacks • Heuristics • Select random dictionary words (Dictionary) • Select most frequent English words (Freq. Word) • Select highest ratio: English freq./spam freq. (Freq. Ratio) • Spam corpus: spamarchive.org • English corpora: • Reuters news articles • Written English • Spoken English • 1992 USENET
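
The frequency-ratio heuristic can be sketched with word counts from two corpora. This is an illustrative version only: the add-one smoothing, the tie-breaking by alphabetical order, and the tiny token lists are assumptions, not details from the talk.

```python
from collections import Counter

def rank_by_frequency_ratio(english_tokens, spam_tokens, top_n, smoothing=1.0):
    """Rank English words by (relative English frequency) / (relative spam frequency)."""
    english, spam = Counter(english_tokens), Counter(spam_tokens)
    vocab_size = len(set(english) | set(spam))
    english_total = sum(english.values()) + smoothing * vocab_size
    spam_total = sum(spam.values()) + smoothing * vocab_size

    def ratio(word):
        p_english = (english[word] + smoothing) / english_total
        p_spam = (spam[word] + smoothing) / spam_total
        return p_english / p_spam

    # Highest ratio first; ties broken alphabetically for determinism.
    return sorted(english, key=lambda w: (-ratio(w), w))[:top_n]

english_corpus = "the meeting is on monday the report is ready".split()
spam_corpus = "cheap mortgage the offer cheap the deal".split()
top_words = rank_by_frequency_ratio(english_corpus, spam_corpus, top_n=3)
print(top_words)
```

Words that are common in ordinary English but rare in spam rank first, while words common to both corpora (such as "the") fall down the list.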

  36. Passive Attack Results

  37. Comparison of all attacks (* = words added, + = words removed)

  38. Conclusion • Mathematical modeling is a powerful tool in adversarial situations • Game theory lets us make classifiers aware of and resistant to adversaries • Complexity arguments let us explore the vulnerabilities of our own systems • This is only the beginning… • Can we weaken our assumptions? • Can we expand our scenarios?

  39. Convex classifiers with continuous features? • ACRE learnable under linear cost function? • Proof ideas • For convex spam, only need to change a single feature • For convex non-spam, may need to iteratively approximate the non-spam set using cutting planes

  40. Conclusion • Mathematical modeling is a powerful tool in adversarial situations • Game theory lets us make classifiers aware of and resistant to adversaries • Complexity arguments let us explore the vulnerabilities of our own systems • This is only the beginning… • Can we weaken our assumptions? • Can we expand our scenarios?

  41. Example: Trivial cost function • Suppose a small, constant number of instances have cost a(x) = b, and all others have cost a(x) = b', where b' > b • Test each of the b-cost instances • If none is negative, choose x−

  42. Example: Boolean conjunctions • Suppose c is a conjunction of Boolean literals (e.g., x1 ∧ ¬x3) • Starting with x+, toggle each xi in turn: x+ = (x1 = T, x2 = F, x3 = F, x4 = T) • Guess: (x1 ∧ ¬x2 ∧ ¬x3 ∧ x4)

  43. Example: Boolean conjunctions • Suppose c is a conjunction of Boolean literals (e.g., x1 ∧ ¬x3) • Starting with x+, toggle each xi in turn: x+ = (T, F, F, T) • Guess: (x1 ∧ ¬x2 ∧ ¬x3 ∧ x4)

  44. Example: Boolean conjunctions • Suppose c is a conjunction of Boolean literals (e.g., x1 ∧ ¬x3) • Starting with x+, toggle each xi in turn: x+ = (T, F, F, T), x' = (F, F, F, T), c(x') = − • Guess: (x1 ∧ ¬x2 ∧ ¬x3 ∧ x4)

  45. Example: Boolean conjunctions • Suppose c is a conjunction of Boolean literals (e.g., x1 ∧ ¬x3) • Starting with x+, toggle each xi in turn: x+ = (T, F, F, T), x' = (T, T, F, T), c(x') = + • Guess: (x1 ∧ ¬x2 ∧ ¬x3 ∧ x4)

  46. Example: Boolean conjunctions • Suppose c is a conjunction of Boolean literals (e.g., x1 ∧ ¬x3) • Starting with x+, toggle each xi in turn: x+ = (T, F, F, T), x' = (T, F, T, T), c(x') = − • Guess: (x1 ∧ ¬x2 ∧ ¬x3 ∧ x4)

  47. Example: Boolean conjunctions • Suppose c is a conjunction of Boolean literals (e.g., x1 ∧ ¬x3) • Starting with x+, toggle each xi in turn: x+ = (T, F, F, T), x' = (T, F, F, F), c(x') = + • Guess: (x1 ∧ ¬x2 ∧ ¬x3 ∧ x4) • Final Answer: (x1 ∧ ¬x3)

  48. Example: Boolean conjunctions • Suppose c is a conjunction of Boolean literals (e.g., x1 ∧ ¬x3) • Starting with x+, toggle each xi in turn • Exact conjunction is learnable in n queries. • Now we can optimize any cost function. • In general: concepts learnable with membership queries are ACRE 1-learnable
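
The toggling procedure from the Boolean conjunction example can be written in a few lines. This is a minimal sketch, assuming a membership oracle `is_positive` and 0-based feature indices (so x1 is index 0 and x3 is index 2); the function name `learn_conjunction` is hypothetical.

```python
def learn_conjunction(is_positive, x_pos):
    """Recover a conjunction of literals with one membership query per feature.

    Returns {feature index: required value}. A feature belongs to the
    conjunction exactly when toggling it turns the positive instance negative.
    """
    literals = {}
    for i, value in enumerate(x_pos):
        toggled = list(x_pos)
        toggled[i] = not value
        if not is_positive(toggled):
            literals[i] = value        # the classifier needs x_i to keep this value
    return literals

# Hidden target from the slides: x1 AND NOT x3.
target = lambda x: x[0] and not x[2]
x_plus = [True, False, False, True]
learned = learn_conjunction(target, x_plus)
print(learned)
```

Index 0 mapped to True corresponds to the literal x1 and index 2 mapped to False corresponds to ¬x3, matching the final answer on the slide after n = 4 queries.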
