Fighting Spam: An Innovative Enhancement to Outlook Express

Zhengxiang Pan & Yuanbo Guo

Presentation Transcript
Target: Outlook Express
  • Current anti-spam functionalities in OE:
    • Blocked senders list
    • Mail rules
  • Limitations: limited rule-based filtering
    • Difficulty in generating rules
    • Lack of flexibility
    • Not adaptive: spam mutates!
      • Free -> F r e e -> F*r*e*e
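The mutation above can be made concrete with a minimal sketch. The function `literal_rule` is a hypothetical stand-in for a hand-written keyword rule, not Outlook Express's actual mail-rule engine; it only illustrates why a fixed rule stops matching once the spelling is obfuscated.

```python
def literal_rule(text: str, keyword: str = "free") -> bool:
    """Hypothetical fixed rule: fires only when the exact keyword is a token."""
    return keyword in text.lower().split()


# The three spellings from the slide.
variants = ["Free", "F r e e", "F*r*e*e"]

# One flag per variant: does the fixed rule still catch it?
hits = [literal_rule(v) for v in variants]

for variant, hit in zip(variants, hits):
    print(f"{variant!r}: {'caught' if hit else 'missed'}")
```

Only the first spelling is caught, so each new mutation would need another hand-written rule.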
What did we design?
  • An Intelligent Spam Identification Component (ISIC) that uses IDSS techniques, specifically case-based reasoning (CBR).
  • Absorbs ideas from rule-based and statistical filters
  • Features dynamic attribute selection and heuristic-guided case base maintenance
Case Representation
  • Attribute-Value Pairs
    • possible values: Yes and No
  • Two sets of attributes
    • 51 predefined attributes
      • about specific properties of an email
      • selected from http://www.spamassassin.org
    • 100 dynamically determined attributes
      • About word occurrences in the email
Dynamically Determined Attributes
  • Attribute Selection
    • Use the odds ratio as the indicator of a word's predictive power for the categories (spam, non-spam), and rank the words by it
    • Select the top 50 words from each vocabulary (spam emails and non-spam emails) as the attributes

lots of details in the paper
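A rough sketch of the selection step, under stated assumptions: the slides do not give the exact odds-ratio formula, so this uses a common text-categorization form, OR(w) = p(1 - q) / ((1 - p)q), where p and q are the smoothed fractions of target-category and other-category emails containing w. The function names, the smoothing constant, and the toy emails are all hypothetical.

```python
import math
from collections import Counter


def odds_ratio_scores(target_docs, other_docs, smoothing=0.5):
    """Log odds ratio of each word in target_docs relative to other_docs.

    Each document is a list of words. `smoothing` (an assumption) keeps the
    estimated probabilities strictly between 0 and 1.
    """
    n_t, n_o = len(target_docs), len(other_docs)
    # Document frequency: in how many emails does each word occur?
    df_t = Counter(w for doc in target_docs for w in set(doc))
    df_o = Counter(w for doc in other_docs for w in set(doc))
    scores = {}
    for word in df_t:
        p = (df_t[word] + smoothing) / (n_t + 2 * smoothing)
        q = (df_o[word] + smoothing) / (n_o + 2 * smoothing)
        scores[word] = math.log(p * (1 - q) / ((1 - p) * q))
    return scores


def top_words(target_docs, other_docs, k):
    """The k words most indicative of target_docs (ties broken alphabetically)."""
    scores = odds_ratio_scores(target_docs, other_docs)
    return sorted(scores, key=lambda w: (-scores[w], w))[:k]


def select_attributes(spam_docs, ham_docs, k=50):
    """Top k words from each vocabulary, as on the slide (k=50 gives 100)."""
    return top_words(spam_docs, ham_docs, k) + top_words(ham_docs, spam_docs, k)


# Toy data, invented for illustration only.
spam = [["free", "debt", "guaranteed"], ["free", "offer"], ["debt", "free"]]
ham = [["hello", "meeting"], ["hello", "free", "lunch"], ["report", "hello"]]
print(select_attributes(spam, ham, k=2))
```

Scoring on the log scale does not change the ranking; it only keeps the numbers manageable.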

An Example Case

Case 1:

(predefined attributes)
CHARSET_FARAWAY = No
TO_EMPTY = Yes
FROM_AND_TO_SAME = Yes
LOTS_OF_CC_LINE = Yes
MISSING_HEADERS = Yes

(dynamically selected attributes)
Free = Yes
Guaranteed = Yes
Debt = Yes
Hello = No

(solution)
Spam = Yes
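One possible in-memory form of such a case, as a sketch only: the `Case` class and the `to_vector` helper are hypothetical names, and mapping Yes/No to Python booleans is an assumption. The attribute names and values are copied from Case 1.

```python
from dataclasses import dataclass


@dataclass
class Case:
    """A case: Yes/No attribute values plus the solution (spam or not)."""
    attributes: dict  # attribute name -> True (Yes) / False (No)
    is_spam: bool


case_1 = Case(
    attributes={
        # predefined attributes
        "CHARSET_FARAWAY": False,
        "TO_EMPTY": True,
        "FROM_AND_TO_SAME": True,
        "LOTS_OF_CC_LINE": True,
        "MISSING_HEADERS": True,
        # dynamically selected attributes
        "Free": True,
        "Guaranteed": True,
        "Debt": True,
        "Hello": False,
    },
    is_spam=True,
)


def to_vector(case, attribute_order):
    """Fixed-order list of values; attributes a case lacks count as No."""
    return [case.attributes.get(name, False) for name in attribute_order]


print(to_vector(case_1, ["Free", "Hello", "TO_EMPTY"]))
```

A fixed attribute order lets two cases be compared position by position.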

Similarity Measurement
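  • Simple Matching Coefficient (SMC), based on Hamming distance

$$\mathrm{SIM}_H(P, C) = \frac{1}{N} \sum_{i=1}^{N} \mathrm{EQ}(X_i, Y_i)$$

$$\mathrm{EQ}(X_i, Y_i) = \begin{cases} 1 & \text{if } X_i = Y_i \\ 0 & \text{otherwise} \end{cases}$$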
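The coefficient is simple enough to sketch directly. This assumes that X_i and Y_i are the i-th attribute values of the new email P and the stored case C, and that N is the number of attributes; the slides do not spell this out. The function name `smc` is hypothetical.

```python
def smc(x, y):
    """Simple matching coefficient: fraction of positions where x and y agree."""
    if len(x) != len(y):
        raise ValueError("both cases must have the same number of attributes")
    if not x:
        raise ValueError("at least one attribute is required")
    matches = sum(a == b for a, b in zip(x, y))  # the sum of EQ(X_i, Y_i)
    return matches / len(x)


p = [True, False, True, True]
c = [True, True, True, False]
print(smc(p, c))  # two of the four positions agree
```

Since the Hamming distance counts the positions that differ, this equals 1 minus the Hamming distance divided by N.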

Case Retrieval
  • K-Nearest Neighbor like algorithm
    • For a new email P, compute its similarity SIMH to every case in the case base and pick the K cases with the largest SIMH values.
    • If the majority of the chosen cases are labeled spam, classify the new email as spam; otherwise classify it as non-spam.
    • e.g. K = 5
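The retrieval steps above can be sketched as follows. The case base here is a plain list of (attribute vector, is_spam) pairs, which is an assumed layout, and the names `smc` and `classify` are hypothetical. Ties in similarity fall back to case-base order, a detail the slides do not specify.

```python
def smc(x, y):
    """Simple matching coefficient between two equal-length Yes/No vectors."""
    return sum(a == b for a, b in zip(x, y)) / len(x)


def classify(new_vector, case_base, k=5):
    """Return True (spam) if most of the k most similar cases are spam."""
    ranked = sorted(case_base, key=lambda case: smc(new_vector, case[0]),
                    reverse=True)
    top = ranked[:k]
    spam_votes = sum(1 for _, is_spam in top if is_spam)
    return spam_votes > len(top) / 2  # strict majority, otherwise non-spam


# Toy case base, invented for illustration only.
T, F = True, False
case_base = [
    ([T, T, T, F], True),
    ([T, T, F, F], True),
    ([T, F, T, F], True),
    ([F, F, F, T], False),
    ([F, F, T, T], False),
]
spam_like = classify([T, T, T, T], case_base, k=3)
ham_like = classify([F, F, F, T], case_base, k=3)
print(spam_like, ham_like)
```

An odd K avoids tied votes between the two classes.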
Case Base Maintenance
  • Initially, the spam and non-spam case bases each hold 200 cases
  • When the case base size reaches 300
    • Bring the case base size back down by removing cases that are
      • Old (to keep cases fresh so that they reflect the current trend)
      • Close to the “Center Case” (in an attempt to boost the variety of cases)
        • “Center Case” is a new concept, defined in the paper.
    • Redo attribute selection based on the current cases
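A loose sketch of the pruning idea, with heavy assumptions. The slides defer the definition of “Center Case” to the paper, so the stand-in below (the attribute-wise majority vote over all cases) is a guess, not the authors' definition. The equal weighting of age and closeness, the (timestamp, vector) case layout, and the names `center_case` and `prune` are likewise hypothetical.

```python
def smc(x, y):
    """Simple matching coefficient between two equal-length Yes/No vectors."""
    return sum(a == b for a, b in zip(x, y)) / len(x)


def center_case(vectors):
    """Assumed stand-in for the Center Case: attribute-wise majority value."""
    n = len(vectors)
    return [2 * sum(column) > n for column in zip(*vectors)]


def prune(cases, target_size):
    """Drop the cases that are oldest and closest to the center.

    `cases` is a list of (timestamp, vector) pairs. Each case gets a removal
    score: normalized age plus similarity to the center (equal weights are an
    assumption). The target_size lowest-scoring cases are kept.
    """
    if len(cases) <= target_size:
        return list(cases)
    center = center_case([vector for _, vector in cases])
    times = [t for t, _ in cases]
    newest = max(times)
    span = (newest - min(times)) or 1  # avoid dividing by zero

    def removal_score(case):
        timestamp, vector = case
        return (newest - timestamp) / span + smc(vector, center)

    kept = sorted(cases, key=removal_score)[:target_size]
    return sorted(kept, key=lambda case: case[0])  # back to time order


# Toy data, invented for illustration only.
T, F = True, False
cases = [(1, [T, T, F]), (2, [T, T, F]), (3, [F, F, T]), (4, [T, T, F])]
kept_times = [t for t, _ in prune(cases, target_size=2)]
print(kept_times)
```

In this toy run the two oldest cases, which also coincide with the center, are the ones removed. The slide's last step, redoing attribute selection on the surviving cases, would follow.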
Architecture

[Figure: Components of the ISIC system. Labeled blocks: Outlook Express API, GUI, Case Base, Case Base Manager, Classifier, Attribute Selector, Parser, Email Repository Manager, Email Repository]
Conclusion
  • Highlights:
    • Localized & easy to construct
    • Personalized
    • Easy to use
    • Adaptive
  • Limitations
    • Initial cases limit personalization
    • Not for standalone use: runs on top of the existing OE