
Fighting Spam: An Innovative Enhancement to Outlook Express





Presentation Transcript



Fighting Spam: An Innovative Enhancement to Outlook Express

Zhengxiang Pan & Yuanbo Guo



Target: Outlook Express

  • Current anti-spam functionalities in OE:

    • Blocked senders list

    • Mail rules

  • Limitations: a limited rule-based filter

    • Difficulty in generating rules

    • Lack of flexibility

    • Not adaptive: spam mutates!

      • Free -> F r e e -> F*r*e*e
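To illustrate why a fixed rule breaks under such mutations, here is a minimal sketch. The function `exact_rule` is hypothetical and is not how OE mail rules are implemented; it only shows that a literal word match misses the obfuscated spellings listed above.

```python
def exact_rule(subject: str, blocked_word: str = "free") -> bool:
    """Hypothetical literal-match rule: flag a message only if the
    blocked word appears as a whole whitespace-separated token."""
    return blocked_word in subject.lower().split()


# The three spellings from the slide, embedded in a made-up subject line.
variants = ["Free", "F r e e", "F*r*e*e"]
results = {v: exact_rule(v + " offer inside") for v in variants}

for variant, flagged in results.items():
    print(f"{variant!r}: flagged={flagged}")
```

Only the first spelling is caught; the two mutated forms pass the rule unchanged.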



What did we design?

  • An Intelligent Spam Identification Component (ISIC) that uses IDSS techniques, specifically case-based reasoning (CBR)

  • Absorbs ideas from rule-based and statistical filters

  • Features dynamic attribute selection and heuristic-guided case base maintenance



Case Representation

  • Attribute-Value Pairs

    • possible values: Yes and No

  • Two sets of attributes

    • 51 predefined attributes

      • about specific properties of an email

      • selected from http://www.spamassassin.org

    • 100 dynamically determined attributes

      • About word occurrences in the email



Predefined Attributes - Examples



Dynamically Determined Attributes

  • Attribute Selection

    • Use the odds ratio as the indicator of a word's predictive power for the categories (spam, non-spam), and rank the words by it

    • Select the top 50 words from each vocabulary (spam emails and non-spam emails) as the attributes

More details are in the paper.
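The slide does not give the exact formula, so the following is only a sketch under stated assumptions: the standard odds ratio on document frequencies, add-one smoothing to avoid division by zero, and non-spam words ranked by the lowest odds ratio. The names `odds_ratio` and `select_attributes` are hypothetical, and the paper's definition may differ.

```python
def odds_ratio(word, spam_docs, ham_docs):
    """Odds ratio of `word` for the spam category.

    Each document is a set of words. Add-one smoothing is an assumption
    made here so that the ratio is always defined.
    """
    p_spam = (sum(word in d for d in spam_docs) + 1) / (len(spam_docs) + 2)
    p_ham = (sum(word in d for d in ham_docs) + 1) / (len(ham_docs) + 2)
    return (p_spam * (1 - p_ham)) / ((1 - p_spam) * p_ham)


def select_attributes(spam_docs, ham_docs, per_class=50):
    """Pick the top `per_class` words from each vocabulary."""
    spam_vocab = set().union(*spam_docs)
    ham_vocab = set().union(*ham_docs)

    def score(w):
        return odds_ratio(w, spam_docs, ham_docs)

    # Spam words: highest odds ratio first (ties broken alphabetically).
    spam_top = sorted(spam_vocab, key=lambda w: (-score(w), w))[:per_class]
    # Non-spam words: lowest odds ratio first (assumed interpretation).
    ham_top = sorted(ham_vocab, key=lambda w: (score(w), w))[:per_class]
    # Drop duplicates if a word was picked from both vocabularies.
    return spam_top + [w for w in ham_top if w not in spam_top]


spam = [{"free", "debt"}, {"free", "guaranteed"}, {"free", "hello"}]
ham = [{"hello", "meeting"}, {"hello", "free"}, {"hello"}]
selected = select_attributes(spam, ham, per_class=1)
print(selected)
```

With `per_class=50`, as on the slide, this would yield up to 100 word attributes.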



An Example Case

Case 1:

(predefined attributes)

CHARSET_FARAWAY = No

TO_EMPTY = Yes

FROM_AND_TO_SAME = Yes

LOTS_OF_CC_LINE = Yes

MISSING_HEADERS = Yes

(dynamically selected attributes)

Free = Yes

Guaranteed = Yes

Debt = Yes

Hello = No

(solution)

Spam = Yes
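As a sketch of how such a case might be held in memory, the example above can be written as attribute-value pairs with Yes/No mapped to booleans. The `Case` class and its field names are hypothetical, and only the attributes shown on the slide are included (a full case would carry all 151).

```python
from dataclasses import dataclass


@dataclass
class Case:
    """Hypothetical container for one case: Yes/No attributes plus the solution."""
    attributes: dict  # attribute name -> True (Yes) / False (No)
    is_spam: bool     # the solution part of the case


case_1 = Case(
    attributes={
        # predefined attributes
        "CHARSET_FARAWAY": False,
        "TO_EMPTY": True,
        "FROM_AND_TO_SAME": True,
        "LOTS_OF_CC_LINE": True,
        "MISSING_HEADERS": True,
        # dynamically selected attributes (word occurrences)
        "Free": True,
        "Guaranteed": True,
        "Debt": True,
        "Hello": False,
    },
    is_spam=True,
)

print(sum(case_1.attributes.values()), "of", len(case_1.attributes), "attributes are Yes")
```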



Similarity Measurement

  • Simple Matching Coefficient (SMC) based on Hamming Distance

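$$\mathrm{SIM}_H(P, C) = \frac{1}{N} \sum_{i=1}^{N} \mathrm{EQ}(X_i, Y_i)$$

$$\mathrm{EQ}(X_i, Y_i) = \begin{cases} 1 & \text{if } X_i = Y_i \\ 0 & \text{otherwise} \end{cases}$$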
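A minimal sketch of this measure, assuming that Xi and Yi are the i-th attribute values of the new email P and the stored case C (the slide does not spell this out) and that both are given as equal-length sequences of booleans. The function name `sim_h` is illustrative.

```python
def sim_h(p, c):
    """Simple matching coefficient: fraction of positions where p and c agree."""
    if len(p) != len(c) or len(p) == 0:
        raise ValueError("p and c must be non-empty and of equal length")
    matches = sum(1 for x, y in zip(p, c) if x == y)
    return matches / len(p)


p = [True, False, True, True]
c = [True, True, True, False]
print(sim_h(p, c))  # two of the four attributes match
```

Equivalently, the value is 1 minus the Hamming distance divided by N.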



Case Retrieval

  • K-Nearest Neighbor like algorithm

    • For a new email P, calculate its similarity SIMH to each case in the case base, and pick out the top K cases with the largest SIMH values.

    • If the majority of the chosen cases are labeled as spam, the new email is classified as spam too; otherwise it is classified as non-spam.

    • e.g. K = 5
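The retrieval step can be sketched as follows. This is an illustration rather than the authors' code: the case base is assumed to be a list of `(attribute_vector, is_spam)` pairs, ties in similarity keep their original order, and the names `sim_h` and `classify` are made up for the example.

```python
def sim_h(p, c):
    """Simple matching coefficient between two equal-length boolean vectors."""
    return sum(1 for x, y in zip(p, c) if x == y) / len(p)


def classify(new_email, case_base, k=5):
    """Return True (spam) if most of the k most similar cases are spam."""
    ranked = sorted(case_base, key=lambda case: sim_h(new_email, case[0]), reverse=True)
    top_k = ranked[:k]
    spam_votes = sum(1 for _, is_spam in top_k if is_spam)
    return spam_votes > len(top_k) / 2


# Tiny made-up case base with three attributes per case.
case_base = [
    ((True, True, True), True),
    ((True, True, False), True),
    ((True, False, True), True),
    ((False, False, False), False),
    ((False, False, True), False),
    ((False, True, False), False),
]

print(classify((True, True, True), case_base, k=3))
print(classify((False, False, False), case_base, k=3))
```

An odd K such as 5 avoids tied votes between the two classes.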



Case Base Maintenance

  • Initially, the spam and non-spam case bases each have 200 cases

  • When the case base size reaches 300

    • Restore the case base size using a mechanism that removes cases that are

      • Old (to keep the cases fresh so that they reflect current trends)

      • Close to the “Center Case” (in an attempt to boost the variety of cases)

        • “Center Case” is a new concept introduced and defined in the paper.

    • Redo attribute selection based on current cases
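The slides defer the definition of the “Center Case” to the paper, so the sketch below rests on loud assumptions: the center is approximated by the attribute-wise majority value, cases closest to it are removed first, and older timestamps break ties. The names `center_case` and `prune` and the choice of `target_size` are hypothetical.

```python
def sim_h(p, c):
    """Simple matching coefficient between two equal-length boolean vectors."""
    return sum(1 for x, y in zip(p, c) if x == y) / len(p)


def center_case(vectors):
    """ASSUMPTION: stand-in for the paper's Center Case, taken here as the
    attribute-wise majority value (ties count as True)."""
    n = len(vectors)
    return tuple(sum(column) * 2 >= n for column in zip(*vectors))


def prune(cases, target_size):
    """Shrink `cases` (a list of (vector, timestamp)) to `target_size`.

    ASSUMPTION: removal priority is similarity to the center (highest first),
    then age (oldest first). The paper's heuristic may combine these differently.
    """
    excess = len(cases) - target_size
    if excess <= 0:
        return list(cases)
    center = center_case([vector for vector, _ in cases])
    order = sorted(
        range(len(cases)),
        key=lambda i: (-sim_h(cases[i][0], center), cases[i][1]),
    )
    to_remove = set(order[:excess])
    return [case for i, case in enumerate(cases) if i not in to_remove]


cases = [((True, True), 1), ((True, True), 2), ((True, False), 3), ((False, False), 4)]
kept = prune(cases, target_size=2)
print(kept)
```

After pruning, attribute selection would be rerun on the remaining cases, as the last bullet above says.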


Architecture

[Figure: Components of the ISIC system. Labeled blocks: Outlook Express API, GUI, Case Base, Case Base Manager, Classifier, Attribute Selector, Parser, Email Repository Manager, Email Repository.]



Use enhanced Outlook Express

Same UI as OE



Conclusion

  • Highlights:

    • Localized & easy to construct

    • Personalized

    • Easy to use

    • Adaptive

  • Limitations

    • Initial cases limit personalization

    • Not for standalone use: runs on top of the existing OE

