Presented by: Akanksha Kaul Dept of Computer & Information Sciences University of Delaware

SBMDS: an interpretable string based malware detection system using SVM ensemble with bagging. Yanfang Ye, Lifei Chen, Dingding Wang, Tao Li, Qingshan Jiang, Min Zhao.



Presentation Transcript


  1. Presented by: Akanksha Kaul, Dept. of Computer & Information Sciences, University of Delaware. SBMDS: an interpretable string based malware detection system using SVM ensemble with bagging. Yanfang Ye, Lifei Chen, Dingding Wang, Tao Li, Qingshan Jiang, Min Zhao

  2. MOTIVATION • Urgent need to detect malicious executables • Major threats: • Metamorphic executables: reprogram themselves; capable of infecting two operating systems • Polymorphic executables: masquerade as non-malicious code • Unseen executables

  3. Need of the Hour • SBMDS: String Based Malware Detection System • What is this system all about? • It performs interpretable string analysis. Interpretable strings are strings in a program that include both API execution calls and important semantic strings representing the intent and goal of the program writer.

  4. Interpretable String? • E.g., the worm “Nimda”: “html script language = ‘javascript’ window.open(‘readme.eml’)” • Another example: “&gameid= %s&pass=%s; myparentthreadid=%d; myguid=%s” • But not all strings are interpretable, e.g., “!0&0h0m0o0t0y0”, “*3d%3dtgyhjij”

  5. Major Steps • Constructing the interpretable strings by developing a feature parser. • Performing feature selection to select informative strings. • Using an SVM ensemble with bagging to construct the classifier. • Running the malware detector, which also predicts the exact type of the malware.

  6. Step 1 • Develop the feature parser: 39,838 executables were collected from the Kingsoft Anti-Virus Lab. All executables are PE files. Static features extracted: API calls from the import table, and strings carrying semantic interpretation.
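The string half of the feature parser can be sketched as a scan for runs of printable characters, much like the Unix `strings` tool. This is only an illustration, not the parser used by the authors: the function name `extract_strings` and the minimum length of 4 are assumptions, and reading API calls from the PE import table (which needs a PE parser) is not shown.

```python
import re
from typing import List


def extract_strings(data: bytes, min_len: int = 4) -> List[str]:
    """Return runs of printable ASCII found in raw file bytes.

    Hypothetical helper: min_len=4 is an assumed default, not a value
    taken from the slides.
    """
    # Printable ASCII is the byte range 0x20 (space) to 0x7e (~).
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    return [match.group().decode("ascii") for match in pattern.finditer(data)]


# Toy byte sequence standing in for the contents of a PE file.
sample = b"\x00\x01window.open('readme.eml')\x00ab\x00\xffGetProcAddress\x00"
print(extract_strings(sample))
```

The candidate strings returned here would still contain many non-interpretable entries; filtering them is the job of the feature selection step.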

  7. SAMPLE (Backdoor-Redgirl.exe) • The string “‘%s’ goto delete” indicates that the malware may generate a “.bat” file to delete itself.

  8. Step 2 • Feature Selection: selects only the interpretable strings from the huge set of strings obtained in the previous step, and assigns these strings as signatures of the PE files.
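The slides do not say which selection criterion is used. As one plausible illustration, the sketch below ranks candidate strings by the mutual information between "string is present in the file" and the class label, then keeps the top k. The criterion, the function names, and the toy data are all assumptions made for this example.

```python
import math
from collections import Counter
from typing import Hashable, List, Sequence, Set


def mutual_information(present: Sequence[bool], labels: Sequence[Hashable]) -> float:
    """Mutual information (in bits) between a presence flag and the label."""
    n = len(labels)
    joint = Counter(zip(present, labels))
    flag_counts = Counter(present)
    label_counts = Counter(labels)
    mi = 0.0
    for (flag, label), count in joint.items():
        p_joint = count / n
        p_flag = flag_counts[flag] / n
        p_label = label_counts[label] / n
        mi += p_joint * math.log2(p_joint / (p_flag * p_label))
    return mi


def select_strings(files: List[Set[str]], labels: Sequence[Hashable], k: int) -> List[str]:
    """Keep the k strings whose presence is most informative about the label."""
    vocabulary = sorted(set().union(*files))
    scored = [
        (mutual_information([s in f for f in files], labels), s) for s in vocabulary
    ]
    # Highest score first; ties are broken alphabetically for determinism.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [s for _, s in scored[:k]]


# Toy data: each file is the set of strings extracted from it (1 = malware, 0 = benign).
files = [
    {"goto delete", "kernel32.dll"},
    {"goto delete", "kernel32.dll"},
    {"kernel32.dll"},
    {"kernel32.dll", "readme"},
]
labels = [1, 1, 0, 0]
print(select_strings(files, labels, k=2))
```

A string that appears in every file, such as "kernel32.dll" in the toy data, carries no information about the label and is ranked last.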

  9. Step 3 • Using SVM to Classify • Why SVM? • SVMs have shown state-of-the-art results on classification problems. • Problem: the training complexity of an SVM depends on the size of the data set.

  10. Problem • Training accuracy becomes constant once the size of the data set reaches 3000.

  11. Curse of Dimensionality? • Problem caused by the exponential increase in volume associated with adding dimensions to the feature space. • How does SVM deal with the “curse of dimensionality”? • Solution: use an SVM ensemble with bagging. • What are an SVM ensemble and bagging?

  12. 3.1 SVM Ensemble with Bagging • An ensemble is a set of classifiers whose individual decisions are combined in some way to classify new samples. • The bagging technique (“BAGGING” = Bootstrap AGGregatING) is applied to the training set. • Each classifier is trained on a uniform sample, drawn with replacement, of the training data set.
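A minimal sketch of bagging with SVM base learners, written with only the Python standard library. The base learner here is a small linear SVM trained by sub-gradient descent on the hinge loss, used as a stand-in for a real SVM library; the hyperparameters (`lam`, `eta`, `epochs`), the binary +1/-1 labels, and all function names are assumptions for illustration rather than details from the slides.

```python
import random
from typing import List, Sequence

Vector = Sequence[float]


def bootstrap_sample(n: int, rng: random.Random) -> List[int]:
    """Draw n indices uniformly at random, with replacement."""
    return [rng.randrange(n) for _ in range(n)]


def train_linear_svm(xs: Sequence[Vector], ys: Sequence[int],
                     lam: float = 0.01, eta: float = 0.1,
                     epochs: int = 20) -> List[float]:
    """Sub-gradient descent on the L2-regularised hinge loss (labels +1/-1).

    No bias term, for brevity; returns the weight vector.
    """
    w = [0.0] * len(xs[0])
    for _ in range(epochs):
        for x, y in zip(xs, ys):
            margin = y * sum(wj * xj for wj, xj in zip(w, x))
            w = [wj * (1.0 - eta * lam) for wj in w]  # regularisation shrink
            if margin < 1.0:  # hinge loss is active for this sample
                w = [wj + eta * y * xj for wj, xj in zip(w, x)]
    return w


def predict(w: Vector, x: Vector) -> int:
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) > 0 else -1


def train_bagged_svms(xs: Sequence[Vector], ys: Sequence[int],
                      n_models: int, seed: int = 0) -> List[List[float]]:
    """Train one SVM per bootstrap sample of the training set."""
    rng = random.Random(seed)
    models = []
    for _ in range(n_models):
        idx = bootstrap_sample(len(xs), rng)
        models.append(train_linear_svm([xs[i] for i in idx], [ys[i] for i in idx]))
    return models


def ensemble_predict(models: Sequence[Vector], x: Vector) -> int:
    """Combine the individual decisions by voting (use an odd n_models)."""
    votes = sum(predict(w, x) for w in models)
    return 1 if votes > 0 else -1


# Toy data: feature 0 marks a "malicious" string, feature 1 a "benign" one.
xs = [[1.0, 0.0]] * 10 + [[0.0, 1.0]] * 10
ys = [1] * 10 + [-1] * 10
models = train_bagged_svms(xs, ys, n_models=11)
print(ensemble_predict(models, [1.0, 0.0]), ensemble_predict(models, [0.0, 1.0]))
```

Because every base learner sees only a resampled copy of the data, each one can also be trained on a smaller sample than the full set, which is one way an ensemble can keep the per-model training cost down.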

  13. 3.2 Multi-Classification • There are various classes of malware. • “MAJORITY VOTING” is used to combine the classifiers' decisions. • When two different classes receive identical vote counts, the class with the smallest index is chosen. • Classes: 0 = Benign files, 1 = Backdoors, 2 = Spyware, 3 = Trojans, 4 = Worms
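Majority voting over the five class indices can be sketched as follows. The sketch reads the slide's "smallest index is chosen" as the tie-breaking rule between classes with equal vote counts, which is an interpretation; the function name `majority_vote` is hypothetical.

```python
from collections import Counter
from typing import Iterable

# Class indices as listed on the slide.
CLASS_NAMES = {0: "Benign file", 1: "Backdoor", 2: "Spyware", 3: "Trojan", 4: "Worm"}


def majority_vote(votes: Iterable[int]) -> int:
    """Return the most-voted class; ties go to the smallest class index."""
    counts = Counter(votes)
    top = max(counts.values())
    return min(label for label, count in counts.items() if count == top)


# Five ensemble members vote; classes 1 and 3 tie with two votes each.
winner = majority_vote([3, 1, 3, 1, 4])
print(winner, CLASS_NAMES[winner])
```

One consequence of this tie-breaking rule is that a tie involving class 0 resolves to "benign", so the order of the indices affects how cautious the detector is.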

  14. Step 4: Malware Detection • Unknown malware variants are used as input. • The detector decides whether a file is malicious or not. • It also predicts which class the malware belongs to.

  15. System Architecture • 1. Feature Parser • 2. Feature Selection • 3. SVM Ensemble Classifier • 4. Malware Detector

  16. Reason Why I Chose This Paper • Comparisons with popular anti-virus software. Points of comparison: • Detecting known variants of malware. • Detecting unknown variants. • Efficiency (detection time). • Number of false positive detections.

  17. Detecting Known Variants

  18. Detecting Unknown Variants

  19. Efficiency (Detection Time)

  20. Number of False Positives

  21. Conclusion • This system has already been incorporated into the scanning tool of a commercial anti-virus product. • The name of the anti-virus product is not disclosed.

  22. Questions?

  23. All's Well That Ends Well • THANK YOU
