
Do humans beat computers at pattern recognition? Andra Miloiu Costina



Presentation Transcript


  1. Do humans beat computers at pattern recognition? Andra Miloiu Costina, Spam Analyst

  2. What do you think? Do humans beat computers at pattern recognition? • NO • YES

  3. What is the correct answer?

  4.  NO!

  5.  NO!

  6.  NO! • Each time we answered “NO”, one of the following automated signature mechanisms was designed: • Pattern extraction; • Lines detection; • Cluster based rules generation; • Automated signatures creation;

  7. Why aren’t we all on a beach?

  8. PATTERN EXTRACTION • Short description: • The mechanism is conceptually divided into four steps: one that finds groups of similar emails (layout-based filtering), a second that extracts information for each group (a pattern discovery algorithm), a third that determines the utility of each extracted feature (a version of the Relief algorithm), and finally one that fits the pieces together, creating the signatures (a genetic algorithm). • - Pattern extraction mechanism: similar to Teiresias and a basic suffix tree • - Pros & cons: + It was among the first methods of automated pattern extraction that we designed. - It was difficult to use, and an analyst would have finished the signature a lot faster; • Stats: It increased our detection rate by 2%.
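
The slides do not say which version of Relief was used for the third step, so the following is only a minimal sketch of the basic algorithm on binary features (pattern present or absent). It assumes every email is visited instead of a random sample, and that Hamming distance picks the nearest neighbours. The names `relief_weights`, `samples` and `labels` are hypothetical.

```python
def hamming(a, b):
    """Number of positions where two binary feature vectors differ."""
    return sum(x != y for x, y in zip(a, b))


def relief_weights(samples, labels):
    """Basic Relief: reward features that differ on the nearest miss,
    penalize features that differ on the nearest hit.

    Assumes binary features and at least two samples per class.
    """
    n_features = len(samples[0])
    weights = [0.0] * n_features
    m = len(samples)
    for i, (x, label) in enumerate(zip(samples, labels)):
        hits = [s for j, s in enumerate(samples) if j != i and labels[j] == label]
        misses = [s for j, s in enumerate(samples) if labels[j] != label]
        near_hit = min(hits, key=lambda s: hamming(x, s))
        near_miss = min(misses, key=lambda s: hamming(x, s))
        for f in range(n_features):
            weights[f] += ((x[f] != near_miss[f]) - (x[f] != near_hit[f])) / m
    return weights


# Feature 0 separates spam (label 1) from legit mail (label 0); feature 1 is noise.
samples = [(1, 0), (1, 1), (0, 0), (0, 1)]
labels = [1, 1, 0, 0]
weights = relief_weights(samples, labels)
```

A feature with a high weight is a pattern worth keeping for the signature-building step. A weight near or below zero marks a pattern that does not help tell spam from legitimate mail.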

  9. What did we do next? …LINES DETECTION

  10. LINES DETECTION(1) • How did spam look at that time? • Almost a year and a half ago, spam waves took a new turn. A spam message shrank to 1 or 2 spammy lines and one URL.

  11. LINES DETECTION(2) • Waves of this type came in such large numbers that they affected our response time, so we thought of implementing a system that would sign these spams in a shorter period of time.

  12. LINES DETECTION • Short description: • Basically the mechanism worked in three steps: • The pattern, represented by a relevant text line, was extracted; • Each line was associated with its number of occurrences, and the lines were sorted in descending order; • Automated signatures were created for the top relevant lines. • - Pattern extraction mechanism: • Based on a predefined set of keywords, the program would extract the lines containing relevant information;
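
The three steps can be illustrated with a short sketch. It is not the production system: the keyword set, the whitespace and case normalization, and the names `relevant_lines` and `top_lines` are all assumptions made for the example.

```python
from collections import Counter

# Hypothetical keyword set; the slides do not list the real one.
KEYWORDS = {"replica", "watches", "pills"}


def relevant_lines(message, keywords=KEYWORDS):
    """Return the normalized lines of one message that contain a keyword."""
    found = set()
    for line in message.splitlines():
        normalized = " ".join(line.lower().split())
        if normalized and any(word in normalized for word in keywords):
            found.add(normalized)
    return found


def top_lines(messages, top_n=3, keywords=KEYWORDS):
    """Count in how many messages each relevant line occurs, most frequent first."""
    counts = Counter()
    for message in messages:
        counts.update(relevant_lines(message, keywords))
    return counts.most_common(top_n)


spam_pool = [
    "Buy Replica Watches\nhttp://example.com/a",
    "buy   replica watches\nhttp://example.com/b",
    "Cheap pills online\nhttp://example.com/c",
]
ranked = top_lines(spam_pool)
```

The lines at the top of `ranked` are the candidates for automated signatures. Counting each line once per message keeps a single repetitive email from dominating the ranking.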

  13. LINES DETECTION For instance: [example image not included in the transcript]

  14. LINES DETECTION • While in use, this system improved our response time by 6.4% and helped us sign a series of spam waves which would otherwise have taken an analyst much more time to handle. • Its cause of death (C.O.D.) was mainly the decreasing number of spam waves bearing the same relevant phrases in more than 40% of the cases. • The many different statements used to express the same point, e.g. “Buy Replica Watches”, made us change our perspective on how to create lasting signatures.

  15. RIGHT NOW… CLUSTER BASED RULES GENERATION & AUTOMATED SIGNATURES CREATION

  16. CLUSTER BASED RULES GENERATION • Short description: • 1. Mails are clustered; • 2. The clusters are reviewed by an analyst; • 3. The analyst adds a simple content-related pattern and creates the signature; • - Pattern extraction mechanism • In contrast to the previously described system, which was entirely based on the content of a spam message, the cluster based rules rely on patterns belonging to the email’s template, such as: the body summary, the date format, the number of URLs or the number of separators found in the subject.

  17. CLUSTER BASED RULES GENERATION - Pros & Cons The great advantage of this system is its universal applicability. There are no messages that can’t be clustered: the predefined set of features is calculated for each email. However, the features based on the email’s template alone are not enough to mark an email as spam, as more and more of these messages copy the template used by regular/legit emails. Hence we are working on new features that will allow the cluster based rules to tag emails as spam without the intervention of an analyst.
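
To make the template features concrete, here is a small sketch that computes three of them (date format, number of URLs, number of separators in the subject) and groups emails with identical feature tuples. The exact-match grouping is a stand-in for whatever clustering the real system uses. The separator set, the regular expressions and the names `date_shape`, `template_features` and `cluster` are assumptions, and the body summary feature is left out.

```python
import re
from collections import defaultdict

URL_RE = re.compile(r"https?://\S+")
SEPARATOR_RE = re.compile(r"[-_|:;,.!]")  # assumed separator set


def date_shape(date_header):
    """Reduce a Date header to its format: digits become 9, letters become a."""
    shape = re.sub(r"[0-9]", "9", date_header)
    return re.sub(r"[A-Za-z]", "a", shape)


def template_features(subject, date_header, body):
    """Features that describe the email's template, not its content."""
    return (
        date_shape(date_header),
        len(URL_RE.findall(body)),
        len(SEPARATOR_RE.findall(subject)),
    )


def cluster(emails):
    """Group email indexes by identical template feature tuples."""
    clusters = defaultdict(list)
    for index, email in enumerate(emails):
        clusters[template_features(**email)].append(index)
    return dict(clusters)


emails = [
    {"subject": "Hot deal - today!", "date_header": "Mon, 02 Mar 2009 10:15:00 +0200",
     "body": "Look http://example.com/x"},
    {"subject": "New offer - now!", "date_header": "Tue, 03 Mar 2009 11:45:10 +0200",
     "body": "See http://example.org/y"},
    {"subject": "Meeting notes", "date_header": "2009-03-02 10:15",
     "body": "No links here."},
]
groups = sorted(cluster(emails).values())
```

The first two emails share a template even though their text differs, so they land in the same group. An analyst would then add a content-related pattern to that group to create the signature.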

  18. AUTOMATED SIGNATURES CREATION • Short description: • Until a few months ago we believed that an automated pattern extraction mechanism wouldn’t be very efficient, given the variety currently found in spam belonging to the same wave. • By simplifying the process we get 5 steps: • Extracts patterns from a pool of spam; • Sorts them by the number of occurrences; • Creates automated signatures; • Tests the newly created signatures; • Sends them for an FP test;

  19. AUTOMATED SIGNATURES CREATION • - Pattern extraction mechanism • Whereas the lines detection mechanism relied on a set of keywords to define the relevant phrases, this system extracts almost all the lines from a spam message (body and headers). Afterwards it eliminates the patterns which contain only HTML tags or are shorter than a predefined threshold. • Pros & Cons • + Helps decrease the reaction time; • + Doesn’t create FPs; • - It still needs an analyst to validate the resulting signatures;
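
A minimal sketch of the extraction and sorting steps follows. The length threshold, the tag-matching regular expression, the minimum count and the names `candidate_patterns` and `rank_patterns` are assumptions; the real values are not given in the slides, and the signature testing and FP test steps are not modelled.

```python
import re
from collections import Counter

TAG_RE = re.compile(r"<[^>]*>")
MIN_LENGTH = 12  # assumed threshold; the slides only say "predefined"


def candidate_patterns(raw_message, min_length=MIN_LENGTH):
    """Keep every header or body line that is long enough and not only HTML tags."""
    patterns = set()
    for line in raw_message.splitlines():
        line = line.strip()
        if not TAG_RE.sub("", line).strip():
            continue  # empty line or nothing but HTML tags
        if len(line) < min_length:
            continue  # too short to be a useful pattern
        patterns.add(line)
    return patterns


def rank_patterns(pool, min_count=2):
    """Sort patterns by the number of messages they occur in, most frequent first."""
    counts = Counter()
    for message in pool:
        counts.update(candidate_patterns(message))
    return [(p, n) for p, n in counts.most_common() if n >= min_count]


pool = [
    "Subject: Best watches ever\n<html><body>\n<p>Best replica watches, 80% off</p>\nhi\n</body></html>",
    "Subject: Best watches ever\n<div>\n<p>Best replica watches, 80% off</p>\n<br>",
    "Subject: Lunch tomorrow?\nSee you at noon then.",
]
ranked = rank_patterns(pool)
```

Patterns that survive the ranking would still go through signature testing, the FP test and analyst validation before being used.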

  20. Overview All these systems are a step closer to a fully automated mechanism for creating signatures. Their most important advantages are a better reaction time and an increase of the detection rate by 5%-10%. There are no FPs, as all the systems in use are overseen by analysts, who make the final decision on whether a signature is good or not.

  21.  NO! What methods of automated pattern recognition have you developed?

  22. What do you think? Do humans beat computers at pattern recognition? • NO • YES

  23. If (YES) { ANALYSTS RULE }

  24. ANALYSTS TEAM Short description: We are a team of 10 people, full of enthusiasm and the desire to put an end to spam. What makes us great? Our enhanced sense for recognizing patterns.

  25. ANALYSTS TEAM • - Pros & Cons • + We can find a pattern in any given spam; • + We know when it is safe to say “This is spam”; • + We adapt to any situation; • + We can predict certain evolutions of spam waves and be proactive about them; • + We can maintain a detection rate of over 97%; • - We are expensive; • - We have a longer reaction time; • - We sometimes make mistakes… we’re only human after all;

  26. A few... conclusions • Automated pattern extraction mechanisms: • Have a shorter reaction time; • Work only for some spam waves; • Are less expensive; • Analysts team: • Has a longer reaction time; • Can extract a pattern for any spam wave; • Costs a lot;

  27. Q&A Andra Miloiu amiloiu@bitdefender.com
