
Adversarial Learning: Practice and Theory


Presentation Transcript


  1. Adversarial Learning: Practice and Theory Daniel Lowd, University of Washington, July 14th, 2006 Joint work with Chris Meek, Microsoft Research “If you know the enemy and know yourself, you need not fear the result of a hundred battles.” -- Sun Tzu, 500 BC

  2. Content-based Spam Filtering • Message from spammer@example.com: “Cheap mortgage now!!!” • Feature weights: cheap = 1.0, mortgage = 1.5 • Total score = 2.5 > 1.0 (threshold) → Spam

  3. Good Word Attacks • Message from spammer@example.com: “Cheap mortgage now!!! Corvallis OSU” • Feature weights: cheap = 1.0, mortgage = 1.5, Corvallis = -1.0, OSU = -1.0 • Total score = 0.5 < 1.0 (threshold) → OK
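
To make the arithmetic on slides 2-3 concrete, here is a minimal sketch of a bag-of-words linear filter. The tokenizer, feature weights, and threshold are illustrative stand-ins taken from the slides, not the actual filter's values.

```python
# Toy linear spam filter: sum the weights of known features, compare to a
# threshold. Weights and threshold mirror slides 2-3.
FEATURE_WEIGHTS = {"cheap": 1.0, "mortgage": 1.5,
                   "corvallis": -1.0, "osu": -1.0}
THRESHOLD = 1.0

def score(message: str) -> float:
    """Sum the weights of the distinct known features in the message."""
    words = set(message.lower().split())
    return sum(wt for feat, wt in FEATURE_WEIGHTS.items() if feat in words)

def is_spam(message: str) -> bool:
    return score(message) > THRESHOLD

print(is_spam("Cheap mortgage now!!!"))                # True:  2.5 > 1.0
print(is_spam("Cheap mortgage now!!! Corvallis OSU"))  # False: 0.5 < 1.0
```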

  4. Outline • Practice: good word attacks • Passive attacks • Active attacks • Experimental results • Theory: ACRE learning • Definitions and examples • Learning linear classifiers • Experimental results

  5. Attacking Spam Filters • Can we efficiently find a list of “good words”? • Types of attacks • Passive attacks -- no filter access • Active attacks -- test emails allowed • Metrics • Expected number of words required to get median (blocked) spam past the filter • Number of query messages sent

  6. Filter Configuration • Models used • Naïve Bayes: generative • Maximum Entropy (Maxent): discriminative • Training • 500,000 messages from the Hotmail feedback loop • 276,000 features • Maxent let through 30% less spam

  7. Comparison of Filter Weights [chart of filter feature weights, ranging from “good” (negative) to “spammy” (positive)]

  8. Passive Attacks • Heuristics • Select random dictionary words (Dictionary) • Select most frequent English words (Freq. Word) • Select highest ratio: English freq./spam freq. (Freq. Ratio) • Spam corpus: spamarchive.org • English corpora: • Reuters news articles • Written English • Spoken English • 1992 USENET
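
As a rough illustration of the Freq. Ratio heuristic, the sketch below ranks words by English frequency divided by smoothed spam frequency. The token lists and the smoothing constant are assumptions; the talk used the corpora listed above.

```python
from collections import Counter

def freq_ratio_words(english_tokens, spam_tokens, n=100, smoothing=1.0):
    """Return the n words with the highest English-to-spam frequency ratio."""
    eng = Counter(english_tokens)
    spam = Counter(spam_tokens)
    def ratio(word):
        # Smoothing keeps words never seen in spam from dividing by zero.
        return eng[word] / (spam[word] + smoothing)
    return sorted(eng, key=ratio, reverse=True)[:n]
```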

  9. Passive Attack Results

  10. Active Attacks • Learn which words are best by sending test messages (queries) through the filter • First-N: Find n good words using as few queries as possible • Best-N: Find the best n words

  11. First-N Attack, Step 1: Find a “barely spam” message • [diagram: messages on a score axis from Legitimate to Spam, divided by the threshold] • Original legit. message: “Hi, mom!”; original spam: “Cheap mortgage now!!!” • Removing words from the spam yields a “barely legit.” message (“now!!!”) just under the threshold and a “barely spam” message (“mortgage now!!!”) just over it

  12. First-N Attack, Step 2: Test each word • [diagram: candidate words appended to the “barely spam” message] • Good words pull the message below the threshold (Legitimate); less good words leave it above (Spam)
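
A sketch of the two First-N steps, assuming an oracle query(msg) that returns True when the filter labels msg spam. The word-removal loop below is one simple way to reach a “barely spam” message; the talk's exact construction may differ.

```python
def find_barely_spam(spam_words, query):
    """Drop words from a spam message while it stays spam; at the end,
    removing any single remaining word would make it legitimate."""
    msg = list(spam_words)
    shrinking = True
    while shrinking:
        shrinking = False
        for w in list(msg):
            trial = [x for x in msg if x != w]
            if trial and query(" ".join(trial)):   # still spam without w
                msg, shrinking = trial, True
    return " ".join(msg)

def first_n(spam_words, candidates, query, n):
    """Step 2: any word that tips the barely-spam message under the
    threshold is a good word; stop after finding n of them."""
    barely_spam = find_barely_spam(spam_words, query)
    good = []
    for w in candidates:
        if not query(barely_spam + " " + w):       # now legitimate
            good.append(w)
            if len(good) == n:
                break
    return good
```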

  13. Best-N Attack • Key idea: use spammy words to sort the good words • [diagram: good words ranked from better to worse by how far they push a message across the threshold]
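
A sketch of the Best-N idea under the same query(msg) oracle: grade each good word by how many known spammy words it can cancel while keeping the message legitimate, then keep the n strongest. This grading scheme is an illustrative reading of the slide, not necessarily the talk's exact query strategy.

```python
def best_n(barely_spam, good_words, spammy_words, query, n):
    """Sort good words using spammy words as counterweights."""
    def strength(word):
        # Largest k such that the message stays legitimate with k extra
        # spammy words appended alongside `word`.
        k = 0
        while k < len(spammy_words):
            msg = " ".join([barely_spam, word] + spammy_words[:k + 1])
            if query(msg):      # spam again: word offsets only k words
                break
            k += 1
        return k
    return sorted(good_words, key=strength, reverse=True)[:n]
```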

  14. Active Attack Results (n = 100) • Best-N twice as effective as First-N • Maxent more vulnerable to active attacks • Active attacks much more effective than passive attacks

  15. Outline • Practice: good word attacks • Passive attacks • Active attacks • Experimental results • Theory: ACRE learning • Definitions and examples • Learning linear classifiers • Experimental results

  16. How to formalize? Q: What’s the spammer’s goal? A: Find the best possible spam message that gets through a spam filter. Q: How? A: By sending test messages through the filter to learn about it.

  17. Not just spam! • Credit card fraud detection • Network intrusion detection • Terrorist detection • Loan approval • Web page search rankings • …many more…

  18. Definitions • Instance space X = {X1, X2, …, Xn}, where each Xi is a feature; instances x ∈ X (e.g., emails) • Classifier c(x): X → {+, −}, with c ∈ C, the concept class (e.g., linear classifiers) • Adversarial cost function a(x): X → R, with a ∈ A (e.g., more legible spam is better)

  19. Adversarial Classifier Reverse Engineering (ACRE) • Task: minimize a(x) subject to c(x) = − • Problem: the adversary doesn’t know c(x)!

  20. Adversarial Classifier Reverse Engineering (ACRE) • Task: minimize a(x) subject to c(x) = −, within a factor of k • Given: • Full knowledge of a(x) • One positive and one negative instance, x+ and x− • A polynomial number of membership queries

  21. Adversarial Classifier Reverse Engineering (ACRE) • IF an algorithm exists that, for any a ∈ A, c ∈ C, minimizes a(x) subject to c(x) = − within a factor of k • GIVEN • Full knowledge of a(x) • Positive and negative instances, x+ and x− • A polynomial number of membership queries • THEN we say that concept class C is ACRE k-learnable under the set of cost functions A

  22. Example: trivial cost function • Suppose A is the set of functions where: • m instances have cost b • All other instances cost b′ > b • Algorithm: test each of the m b-cost instances • If none is negative, choose x−
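
The argument fits in a few lines; here is a sketch, assuming query_is_positive is the membership oracle:

```python
def minimize_trivial_cost(cheap_instances, x_minus, query_is_positive):
    """Optimize the trivial cost function of slide 22 exactly."""
    for x in cheap_instances:          # the m instances of cost b
        if not query_is_positive(x):   # a cheap negative instance: optimal
            return x
    # All cheap instances are positive, so every negative instance costs
    # b' and the known negative x_minus is already optimal.
    return x_minus
```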

  23. Example: Boolean conjunctions • Suppose C is all conjunctions of Boolean literals (e.g., x1 ∧ ¬x3) • Starting with x+, toggle each xi in turn: x+ = (x1 = T, x2 = F, x3 = F, x4 = T) • Guess: (x1 ∧ ¬x2 ∧ ¬x3 ∧ x4)

  24. Example: Boolean conjunctions • Suppose C is all conjunctions of Boolean literals (e.g., x1 ∧ ¬x3) • Starting with x+, toggle each xi in turn: x+ = (T, F, F, T) • Guess: (x1 ∧ ¬x2 ∧ ¬x3 ∧ x4)

  25. Example: Boolean conjunctions • Suppose C is all conjunctions of Boolean literals (e.g., x1 ∧ ¬x3) • Starting with x+, toggle each xi in turn: x+ = (T, F, F, T), x′ = (F, F, F, T), c(x′) = − so x1 stays • Guess: (x1 ∧ ¬x2 ∧ ¬x3 ∧ x4)

  26. Example: Boolean conjunctions • Suppose C is all conjunctions of Boolean literals (e.g., x1 ∧ ¬x3) • Starting with x+, toggle each xi in turn: x+ = (T, F, F, T), x′ = (T, T, F, T), c(x′) = + so ¬x2 is dropped • Guess: (x1 ∧ ¬x3 ∧ x4)

  27. Example: Boolean conjunctions • Suppose C is all conjunctions of Boolean literals (e.g., x1 ∧ ¬x3) • Starting with x+, toggle each xi in turn: x+ = (T, F, F, T), x′ = (T, F, T, T), c(x′) = − so ¬x3 stays • Guess: (x1 ∧ ¬x3 ∧ x4)

  28. Example: Boolean conjunctions • Suppose C is all conjunctions of Boolean literals (e.g., x1 ∧ ¬x3) • Starting with x+, toggle each xi in turn: x+ = (T, F, F, T), x′ = (T, F, F, F), c(x′) = + so x4 is dropped • Guess: (x1 ∧ ¬x3) • Final Answer: (x1 ∧ ¬x3)

  29. Example: Boolean conjunctions • Suppose C is all conjunctions of Boolean literals (e.g., x1 ∧ ¬x3) • Starting with x+, toggle each xi in turn • The exact conjunction is learnable in n queries • Now we can optimize any cost function • In general: concepts learnable with membership queries are ACRE 1-learnable
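
A sketch of slides 23-29 as code, assuming instances are dicts from feature names to booleans and query is the membership oracle for the hidden conjunction:

```python
def learn_conjunction(x_pos, query):
    """Recover a hidden conjunction of Boolean literals in n queries.
    x_pos: a known positive instance; query(x) -> True iff c(x) = +."""
    literals = []
    for feat, value in x_pos.items():
        toggled = dict(x_pos)
        toggled[feat] = not value
        if not query(toggled):
            # Flipping feat flipped the label, so its literal is in c:
            # value True means feat appears positively, False negated.
            literals.append((feat, value))
    return literals

# With the hidden concept (x1 AND NOT x3) and
# x_pos = {"x1": True, "x2": False, "x3": False, "x4": True},
# this returns [("x1", True), ("x3", False)].
```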

  30. Comparison to other theoretical learning methods • Probably Approximately Correct (PAC): learn a classifier with high accuracy on the same distribution • Membership queries: learn the exact classifier • ACRE: find a single low-cost negative instance

  31. Linear Cost Functions • Cost is weighted L1 distance from some “ideal” instance xa: a(x) = Σi ai |xi − xa,i|

  32. Linear Classifier • c(x) = + iff w · x > T • Examples: naïve Bayes, maxent, SVM with a linear kernel

  33. Theorem 1: Continuous features • Linear classifiers with continuous features are ACRE (1+ε)-learnable under linear cost functions • Proof sketch: • Only the feature with the highest weight/cost ratio needs to change • We can efficiently find this feature using line searches in each dimension
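
A sketch of the line-search idea, assuming the adversary's ideal instance x_a is classified positive, query(x) is the membership oracle, and costs[i] is the linear cost weight for feature i. This illustrates the per-dimension search only, not the full (1+ε) analysis.

```python
def acre_continuous(x_a, x_neg, costs, query, eps=1e-6):
    """For each feature, binary-search from x_a toward x_neg's value for
    the smallest single-feature change that flips the class; return the
    cheapest such instance found (None if no single change suffices)."""
    best_x, best_cost = None, float("inf")
    for i in range(len(x_a)):
        candidate = list(x_a)
        candidate[i] = x_neg[i]
        if query(candidate):           # full change along i stays positive
            continue
        lo, hi = 0.0, 1.0              # lo side positive, hi side negative
        while hi - lo > eps:
            mid = (lo + hi) / 2
            candidate[i] = x_a[i] + mid * (x_neg[i] - x_a[i])
            lo, hi = (mid, hi) if query(candidate) else (lo, mid)
        candidate[i] = x_a[i] + hi * (x_neg[i] - x_a[i])   # barely negative
        cost = costs[i] * abs(candidate[i] - x_a[i])
        if cost < best_cost:
            best_x, best_cost = candidate, cost
    return best_x
```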

  34. Theorem 2: Boolean features • Linear classifiers with Boolean features are ACRE 2-learnable under uniform linear cost functions • Harder problem: we can’t do line searches • Uniform linear cost: unit cost per “change”, i.e., per feature flipped away from xa • [diagram: a negative instance x− shown as a set of feature changes from xa with weights wi … wm]

  35. Algorithm • [diagram: candidate instances y and y′ as sets of weighted changes from xa] • Iteratively reduce cost in two ways: • Remove any unnecessary change: O(n) queries • Replace any two changes with one: O(n³) queries
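
A sketch of this improvement loop, assuming changes are tracked as the set of Boolean features flipped away from the ideal instance xa, and query_neg(S) asks the filter whether flipping exactly the features in S yields a negative (legitimate) instance:

```python
from itertools import combinations

def reduce_cost(y_changes, all_features, query_neg):
    """Iteratively shrink a negative instance's change set (slide 35)."""
    y = set(y_changes)
    improved = True
    while improved:
        improved = False
        # 1. Remove any single unnecessary change: O(n) queries per pass.
        for c in list(y):
            if len(y) > 1 and query_neg(y - {c}):
                y.remove(c)
                improved = True
        # 2. Replace any pair of changes with one new one: O(n^3) queries.
        for a, b in combinations(sorted(y), 2):
            swapped = False
            for c in all_features - y:
                if query_neg((y - {a, b}) | {c}):
                    y = (y - {a, b}) | {c}
                    improved = swapped = True
                    break
            if swapped:
                break
    return y
```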

  36. Proof Sketch (Contradiction) • Suppose there is some negative instance x with less than half the cost of y • Then x’s average change is twice as good as y’s • So y’s two worst changes could be replaced with x’s single best change • But the algorithm already tried every such replacement! • [diagram: the changes of y and x from xa]

  37. Application: Spam Filtering • Spammer goal: minimally modify a spam message so that it gets past the spam filter • Corresponding ACRE problem: • spam filter → linear classifier with Boolean features • “minimally modify” → uniform linear cost function

  38. Experimental Setup • Filter configuration (same as before) • Naïve Bayes (NB) and maxent (ME) filters • 500,000 Hotmail messages for training • > 250,000 features • Adversary feature sets • 23,000 English words (Dict) • 1,000 random English words (Rand)

  39. Results • Reduced feature set (Rand) almost as good as the full dictionary (Dict) • Cost ratio is excellent • Number of queries is reasonable (and parallelizable) • Less efficient than good word attacks, but guaranteed to work

  40. Future Work • Within the ACRE framework • Other concept classes, cost functions • Other real-world domains • ACRE extensions • Adversarial Regression Reverse Engineering • Relational ACRE • Background knowledge (passive attacks)

  41. Related Work • [Dalvi et al., 2004] Adversarial classification • Game-theoretic approach • Assume attacker chooses optimal strategy against classifier • Assume defender modifies classifier knowing attacker strategy • [Kolter and Maloof, 2005] Concept drift • Mixture of experts • Theoretical bounds against adversary

  42. Conclusion • Spam filters are very vulnerable • Can make lists of good words without filter access • With filter access, better attacks are available • ACRE learning is a natural formulation for adversarial problems • Pick a concept class, C • Pick a set of cost functions, A • Devise an algorithm to optimize through querying
