Detecting adversaries using metafeatures

Chad Mills

Program Manager

Windows Live Safety Platform

Microsoft



Content Filter

Assumption:

  • Spam words continue to appear in spam messages

  • Good words continue to appear in good messages

Word weights (feature, weight):

  • (dollars, 0.2)

  • (million, 0.1)

  • (transfer, 0.1)

  • (community, -0.01)

  • (social, -0.01)

  • (fellow, -0.01)

  • (guardian, 0.03)

  • (March, -0.08)

Example messages:

  • million, dollars, transfer, guardian: content filter score 0.37

  • March, community, social, fellow: content filter score -0.11
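A minimal sketch of the scoring this slide implies, assuming the content filter score is the plain sum of the per-word weights. The function name `content_filter_score` is hypothetical. One value is an assumption rather than a transcription: the weights as listed add up to 0.43 for the first message, while the slide's 0.37 is reproduced if the weight for guardian is -0.03, so the sketch uses -0.03.

```python
# Word weights from the slide. The guardian weight is ASSUMED to be -0.03
# (the transcript shows 0.03) so that the first message sums to 0.37.
WEIGHTS = {
    "dollars": 0.2,
    "million": 0.1,
    "transfer": 0.1,
    "community": -0.01,
    "social": -0.01,
    "fellow": -0.01,
    "guardian": -0.03,  # assumption, see note above
    "March": -0.08,
}


def content_filter_score(words, weights=WEIGHTS):
    """Sum the weights of the words in a message; unknown words count as 0."""
    return sum(weights.get(word, 0.0) for word in words)


spam_score = content_filter_score(["million", "dollars", "transfer", "guardian"])
good_score = content_filter_score(["March", "community", "social", "fellow"])
print(round(spam_score, 2), round(good_score, 2))  # prints: 0.37 -0.11
```

Positive totals lean toward spam and negative totals toward good mail; where the real filter places its decision threshold is not stated in the slides.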



<style>

<br Bij board bar atteindre jYST GCS re sonrisa fuse Kiviuq padded />

<br Star Honolulu />

<br Ons apporter />

opens NRSU syringe />

<br Jerusalem comfort HTTPS 2604 confidence Miles />

<br 27 mails Qty backwards Meditations bans sedative ect salve <br insightful />

Korean relations header greeting Airllines Phantom CVS Rae 504 1009 perf<br graphiques />

undertaking paced Liquidation reduction />

From: "Chelsea Clark" <easyMoneySurveys@pointracer.com>

Subject: Get PaidFor yourOpinion



Finding good words

  • Good Message + Spammy Words (Free, Nigeria, Viagra) = Borderline Spam Message

  • Borderline Spam + Unknown Words (late, click, commissioner) = Inbox, so the unknown words are Good Words

  • Borderline Spam + Unknown Words (newsletter, select, month) = Junk Folder, so the unknown words are Non-Good Words
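The probing idea on this slide can be illustrated with a small simulation, useful for seeing why a purely additive filter leaks information about its weights. Everything numeric here is hypothetical: the hidden weights, the borderline score and the junk threshold are invented for illustration and do not come from the presentation.

```python
# Hypothetical filter internals that the sender cannot see.
HIDDEN_WEIGHTS = {
    "late": -0.05, "click": -0.04, "commissioner": -0.06,
    "newsletter": 0.02, "select": 0.01, "month": 0.0,
}
BORDERLINE_SCORE = 0.1  # hypothetical score of the borderline spam message
JUNK_THRESHOLD = 0.0    # hypothetical: scores above this go to the junk folder


def delivered_to(extra_words):
    """Return the folder for the borderline message padded with extra_words."""
    score = BORDERLINE_SCORE + sum(HIDDEN_WEIGHTS.get(w, 0.0) for w in extra_words)
    return "Junk Folder" if score > JUNK_THRESHOLD else "Inbox"


first_probe = delivered_to(["late", "click", "commissioner"])
second_probe = delivered_to(["newsletter", "select", "month"])
print(first_probe, "/", second_probe)  # prints: Inbox / Junk Folder
```

Only the delivery folder is observed, yet it is enough to sort unknown words into good and non-good sets.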



Application: Chaff

Chaff Spam

  • [spam content]

  • newsletter peers month select these

  • late click commissioner media

  • smoothly off close support before

  • okay sponsor rock goby ads

  • none cases text membership

Legitimate Mail

  • March is all about the Zune community. This month, you can help create a new feature for The Social, get tips from a fellow Zune user and find out the winners of the Your Zune Your Choice Awards.
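A short sketch of why chaff works against a sum-of-weights filter. The spam word weights are the ones shown on the Content Filter slide; the chaff weight of -0.12 per word is hypothetical, since the presentation gives no weights for the chaff words.

```python
# Spam word weights from the Content Filter slide.
SPAM_WEIGHTS = {"dollars": 0.2, "million": 0.1, "transfer": 0.1}

# Chaff words taken from the slide; their weight is a HYPOTHETICAL value.
CHAFF_WORDS = ["late", "click", "commissioner", "media"]
CHAFF_WEIGHT = -0.12

weights = dict(SPAM_WEIGHTS)
weights.update({word: CHAFF_WEIGHT for word in CHAFF_WORDS})


def score(words):
    """Sum-of-weights content filter score; unknown words count as 0."""
    return sum(weights.get(word, 0.0) for word in words)


spam_words = ["million", "dollars", "transfer"]
plain = score(spam_words)                 # spam content only
padded = score(spam_words + CHAFF_WORDS)  # same spam content plus chaff
print(round(plain, 2), round(padded, 2))  # prints: 0.4 -0.08
```

The spam content is unchanged, but the appended good words pull the total below zero.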




Example Metafeatures

  • Sum of weights (content filter score)

  • Average weight

  • Standard Deviation

  • Percent of words that are good

  • Percent of words that are spam

  • Number of features

  • Maximum feature weight

  • Number of strong spam words

  • Etc.
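The metafeatures listed above are summary statistics over a message's word weights, and could be computed roughly as follows. This is a sketch under stated assumptions: the function name, the dictionary keys and the 0.15 cut-off for a "strong" spam word are hypothetical, the sample standard deviation is an assumed choice, and the guardian weight is taken as -0.03 (the value that reproduces the 0.37 sum shown in the slides).

```python
import statistics


def metafeatures(weights, strong_spam_threshold=0.15):
    """Summarize the distribution of one message's word weights."""
    if not weights:
        raise ValueError("message has no weighted words")
    n = len(weights)
    total = sum(weights)
    return {
        "sum": total,                            # content filter score
        "average": total / n,
        "stdev": statistics.stdev(weights) if n > 1 else 0.0,
        "pct_good": sum(w < 0 for w in weights) / n,
        "pct_spam": sum(w > 0 for w in weights) / n,
        "num_features": n,
        "max": max(weights),
        # threshold is a hypothetical definition of a "strong" spam word
        "num_strong_spam": sum(w >= strong_spam_threshold for w in weights),
    }


# million, dollars, transfer, guardian (guardian assumed to be -0.03)
mf = metafeatures([0.1, 0.2, 0.1, -0.03])
print(round(mf["sum"], 2), round(mf["stdev"], 2), mf["max"])  # prints: 0.37 0.09 0.2
```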


Metafeatures

Features (feature, weight):

  • (dollars, 0.2)

  • (million, 0.1)

  • (transfer, 0.1)

  • (community, -0.01)

  • (social, -0.01)

  • (fellow, -0.01)

  • (guardian, 0.03)

  • (March, -0.08)

Metafeatures (Metafeature, weight):

  • Message with million, dollars, transfer, guardian

    • (Sum: 0.37, 1.0)

    • (σ: 0.09, 0.8)

    • (Max: 0.2, 0.1)

    • Metafeature score: 1.9

  • Message with March, community, social, fellow

    • (Sum: -0.11, -0.8)

    • (σ: 0.04, -0.6)

    • (Max: -0.1, -0.3)

    • Metafeature score: -1.7
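The final scores on this slide equal the sums of the listed metafeature weights (1.0 + 0.8 + 0.1 = 1.9 and -0.8 - 0.6 - 0.3 = -1.7), which suggests a second additive stage on top of the word-level filter. The sketch below assumes exactly that. Looking weights up by exact metafeature value is a simplification: how the real system buckets metafeature values is not stated, and the names used here are hypothetical.

```python
# (metafeature name, value) -> weight, copied from the slide.
METAFEATURE_WEIGHTS = {
    ("sum", 0.37): 1.0,
    ("stdev", 0.09): 0.8,
    ("max", 0.2): 0.1,
    ("sum", -0.11): -0.8,
    ("stdev", 0.04): -0.6,
    ("max", -0.1): -0.3,
}


def metafeature_score(message_metafeatures):
    """Sum the weights of a message's metafeature values (0 if unseen)."""
    return sum(
        METAFEATURE_WEIGHTS.get((name, value), 0.0)
        for name, value in message_metafeatures.items()
    )


first = metafeature_score({"sum": 0.37, "stdev": 0.09, "max": 0.2})
second = metafeature_score({"sum": -0.11, "stdev": 0.04, "max": -0.1})
print(round(first, 1), round(second, 1))  # prints: 1.9 -1.7
```

The second stage widens the gap between the two example messages from 0.37 versus -0.11 to 1.9 versus -1.7.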


Evaluation Data

  • Hotmail Feedback Loop

    • Messages classified by recipients

  • Training Set: 1,800,000 messages

    • Ending on 5/20/07

  • Evaluation Set: 50,000 messages

    • Data from 5/21/07


Evaluation Results

45% improvement in true positives (TP) at low false positive (FP) levels
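One way to measure "TP at low FP levels" is to fix a maximum false positive rate and report the best true positive rate any threshold achieves within it. The sketch below assumes that definition. The function name is hypothetical and the scores are invented toy values, not the Hotmail evaluation data; the slides also do not say whether the 45% figure is relative or absolute, so the sketch simply reports a relative gain.

```python
def tp_rate_at_fp(scores, labels, max_fp_rate):
    """Best true positive rate over thresholds whose FP rate <= max_fp_rate.

    labels[i] is True for spam; a message is flagged when score >= threshold.
    """
    spam = [s for s, is_spam in zip(scores, labels) if is_spam]
    good = [s for s, is_spam in zip(scores, labels) if not is_spam]
    best = 0.0
    for threshold in sorted(set(scores)):
        fp_rate = sum(s >= threshold for s in good) / len(good)
        tp_rate = sum(s >= threshold for s in spam) / len(spam)
        if fp_rate <= max_fp_rate:
            best = max(best, tp_rate)
    return best


# Toy data: four spam messages followed by four good messages.
labels = [True] * 4 + [False] * 4
base_scores = [0.9, 0.4, -0.1, -0.2, -0.3, -0.5, 0.1, -0.6]
meta_scores = [1.9, 1.2, 0.8, -0.2, -1.7, -0.9, 0.1, -1.2]

base_tp = tp_rate_at_fp(base_scores, labels, max_fp_rate=0.0)
meta_tp = tp_rate_at_fp(meta_scores, labels, max_fp_rate=0.0)
print(base_tp, meta_tp, (meta_tp - base_tp) / base_tp)  # prints: 0.5 0.75 0.5
```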


Qualitative Results

  • At a reasonable False Positive rate:

    • 98% of unique catches are chaff spam

    • Caught 99.5% of chaff spam missed by regular content filter

    • Similar types of False Positives as regular filter

  • Challenges Remaining

    • Primarily just helped on spam with chaff

    • Relies on base content filter to detect spam with obfuscated content (e.g. v1agra) or naïve spam without any chaff


Conclusions

  • Spam messages with good word chaff have unnatural weight distributions

  • Metafeatures are able to identify and catch these messages

  • This resulted in a 45% improvement in TP

  • Gains were limited to spam with good word chaff

