Fighting Spam: An Innovative Enhancement to Outlook Express

Zhengxiang Pan & Yuanbo Guo


Target: Outlook Express

  • Current anti-spam features in Outlook Express (OE):

    • Blocked senders list

    • Mail rules

  • Limitations: a limited rule-based filter

    • Rules are difficult to generate

    • Lack of flexibility

    • Not adaptive: spam mutates!

      • Free -> F r e e -> F*r*e*e
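To illustrate the mutation problem above, here is a minimal sketch of a fixed keyword rule. The function name `keyword_rule` and the sample subjects are hypothetical, and this is not how Outlook Express implements its mail rules; it only shows why an exact-match rule stops firing once the word is obfuscated.

```python
def keyword_rule(subject: str, keyword: str = "free") -> bool:
    """A fixed rule: flag the message if the keyword appears verbatim."""
    return keyword in subject.lower()

# The three spellings from the slide.
variants = ["Free", "F r e e", "F*r*e*e"]

# Only the unmodified spelling is caught by the exact-match rule.
exact_hits = [keyword_rule(v) for v in variants]
print(exact_hits)
```

Each new spelling would need its own hand-written rule, which is the maintenance burden the slides point to.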


What did we design?

  • An Intelligent Spam Identification Component (ISIC) that uses IDSS techniques, specifically case-based reasoning (CBR)

  • Absorbs ideas from rule-based and statistical filters

  • Features dynamic attribute selection and heuristic-guided case base maintenance


Case Representation

  • Attribute-Value Pairs

    • possible values: Yes and No

  • Two sets of attributes

    • 51 predefined attributes

      • about specific properties of an email

      • selected from http://www.spamassassin.org

    • 100 dynamically determined attributes

      • About word occurrences in the email


Predefined Attributes - Examples


Dynamically Determined Attributes

  • Attribute Selection

    • Use the odds ratio as the indicator of a word's predictive power for each category (spam, non-spam), and rank the words by it

    • Select the top 50 words from each vocabulary (spam emails and non-spam emails) as the attributes

More details are in the paper.
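The selection step above can be sketched as follows. This is an illustration under stated assumptions, not the authors' procedure: it uses the textbook odds ratio p(1 - q) / ((1 - p)q), where p and q are the fractions of emails in the target and the other category that contain the word. The whitespace tokenization, the add-one smoothing, and the names `document_frequency`, `odds_ratio`, and `select_attributes` are all assumptions.

```python
from collections import Counter

def document_frequency(emails):
    """Count, for each word, how many emails contain it at least once."""
    counts = Counter()
    for text in emails:
        counts.update(set(text.lower().split()))  # assumed tokenization
    return counts

def odds_ratio(word, df_pos, n_pos, df_neg, n_neg):
    """Odds ratio of `word` for the 'pos' category against the 'neg' one."""
    # Add-one smoothing (an assumption) keeps p and q strictly inside (0, 1).
    p = (df_pos[word] + 1) / (n_pos + 2)
    q = (df_neg[word] + 1) / (n_neg + 2)
    return (p * (1 - q)) / ((1 - p) * q)

def select_attributes(spam, ham, per_class=50):
    """Return the top-ranked words of the spam and non-spam vocabularies."""
    df_spam, df_ham = document_frequency(spam), document_frequency(ham)

    def top(df_pos, n_pos, df_neg, n_neg):
        ranked = sorted(
            df_pos,
            key=lambda w: odds_ratio(w, df_pos, n_pos, df_neg, n_neg),
            reverse=True,
        )
        return ranked[:per_class]

    spam_words = top(df_spam, len(spam), df_ham, len(ham))
    ham_words = top(df_ham, len(ham), df_spam, len(spam))
    return spam_words, ham_words
```

With `per_class=50` this yields up to 100 word attributes, matching the count given in the slides.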


An Example Case

Case 1:

(predefined attributes)

CHARSET_FARAWAY = No

TO_EMPTY = Yes

FROM_AND_TO_SAME = Yes

LOTS_OF_CC_LINE = Yes

MISSING_HEADERS = Yes

(dynamically selected attributes)

Free = Yes

Guaranteed = Yes

Debt = Yes

Hello = No

(solution)

Spam = Yes
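The example case above could be held in memory roughly like this. The `Case` class and `to_vector` helper are hypothetical names chosen for illustration; the slides only specify Yes/No attribute-value pairs plus a spam label.

```python
from dataclasses import dataclass

@dataclass
class Case:
    attributes: dict  # attribute name -> True (Yes) / False (No)
    is_spam: bool     # the case's solution

case_1 = Case(
    attributes={
        # predefined attributes
        "CHARSET_FARAWAY": False,
        "TO_EMPTY": True,
        "FROM_AND_TO_SAME": True,
        "LOTS_OF_CC_LINE": True,
        "MISSING_HEADERS": True,
        # dynamically selected attributes
        "Free": True,
        "Guaranteed": True,
        "Debt": True,
        "Hello": False,
    },
    is_spam=True,
)

def to_vector(case, attribute_order):
    """Flatten a case into a 0/1 list in a fixed attribute order."""
    return [1 if case.attributes[name] else 0 for name in attribute_order]
```

A fixed attribute order lets two cases be compared position by position.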


Similarity Measurement

  • Simple Matching Coefficient (SMC) based on Hamming distance

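$$\mathrm{SIM}_H(P, C) = \frac{1}{N} \sum_{i=1}^{N} \mathrm{EQ}(X_i, Y_i)$$

$$\mathrm{EQ}(X_i, Y_i) = \begin{cases} 1 & \text{if } X_i = Y_i \\ 0 & \text{otherwise} \end{cases}$$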
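A minimal sketch of this similarity, assuming P and C are given as equal-length lists of attribute values (X for the new email, Y for the stored case) and N is the number of attributes. The function name `sim_h` is an illustrative choice.

```python
def sim_h(p, c):
    """Fraction of attribute positions where the two cases agree."""
    if len(p) != len(c):
        raise ValueError("cases must have the same number of attributes")
    if not p:
        raise ValueError("cases must have at least one attribute")
    matches = sum(1 for x, y in zip(p, c) if x == y)
    return matches / len(p)

# Two of four positions agree, so the similarity is 0.5.
print(sim_h([1, 0, 1, 1], [1, 1, 1, 0]))
```

Equivalently, the value is 1 minus the Hamming distance divided by N.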


Case Retrieval

  • K-nearest-neighbor-like algorithm

    • For a new email P, calculate its similarity SIMH to each case in the case base, and pick the K cases with the largest SIMH values.

    • If the majority of those cases are labeled as spam, the new email is classified as spam; otherwise it is classified as non-spam.

    • e.g. K = 5
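The retrieval and voting steps above can be sketched as below. This is an illustration, not the authors' code: the case base is assumed to be a list of `(vector, is_spam)` pairs, and how ties among equally similar cases are broken is not specified in the slides (here the sort order decides).

```python
def sim_h(p, c):
    """Simple matching coefficient between two equal-length 0/1 vectors."""
    return sum(1 for x, y in zip(p, c) if x == y) / len(p)

def classify(email_vector, case_base, k=5):
    """Return True (spam) if most of the k most similar cases are spam."""
    ranked = sorted(
        case_base,
        key=lambda case: sim_h(email_vector, case[0]),
        reverse=True,
    )
    top_k = ranked[:k]
    spam_votes = sum(1 for _, is_spam in top_k if is_spam)
    return spam_votes > len(top_k) / 2

# A toy case base: (attribute vector, is_spam)
cases = [
    ([1, 1, 1], True), ([1, 1, 0], True), ([1, 0, 1], True),
    ([0, 0, 0], False), ([0, 0, 1], False), ([0, 1, 0], False),
]
print(classify([1, 1, 1], cases, k=3))
```

An odd K such as 5 avoids a tied vote between the two categories.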


Case Base Maintenance

  • Initially, the spam and non-spam case bases each have 200 cases

  • When a case base reaches 300 cases

    • restore the case base size using a mechanism that removes cases that are

      • Old (to keep the cases fresh so that they reflect the current trend)

      • Close to the “Center Case” (in an attempt to boost the variety of cases)

        • “Center Case” is a new concept introduced in this work and defined in the paper.

    • Redo attribute selection based on current cases
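The trimming step above might look roughly like the sketch below. The slides defer the definition of the "Center Case" to the paper, so everything specific here is an assumption made for illustration: the center is taken as the per-attribute majority value, and age and closeness to the center are combined by a simple sum in a hypothetical `removal_score`. The authors' actual heuristic may differ.

```python
def sim_h(p, c):
    """Simple matching coefficient between two equal-length 0/1 vectors."""
    return sum(1 for x, y in zip(p, c) if x == y) / len(p)

def center_case(vectors):
    """ASSUMED definition: the majority value of each attribute."""
    n = len(vectors)
    return [1 if 2 * sum(column) >= n else 0 for column in zip(*vectors)]

def trim_case_base(cases, target_size):
    """Keep the `target_size` cases that are newest and farthest from center.

    `cases` is a list of dicts with 'vector' (0/1 list) and 'age' (number).
    """
    center = center_case([case["vector"] for case in cases])
    max_age = max(case["age"] for case in cases) or 1  # avoid dividing by 0

    def removal_score(case):
        # Hypothetical weighting: older and more central cases score higher.
        return case["age"] / max_age + sim_h(case["vector"], center)

    return sorted(cases, key=removal_score)[:target_size]
```

After trimming, attribute selection would be rerun on the remaining cases, as the last bullet states.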


Architecture

Components of the ISIC system:

  • Outlook Express API

  • GUI

  • Case Base

  • Case Base Manager

  • Classifier

  • Attribute Selector

  • Parser

  • Email Repository Manager

  • Email Repository


Using the Enhanced Outlook Express

  • Same UI as OE


Conclusion

  • Highlights:

    • Localized & easy to construct

    • Personalized

    • Easy to use

    • Adaptive

  • Limitations

    • The initial cases limit personalization

    • Not for standalone use: it runs on top of the existing OE

