An ANN approach to identify malicious URLs




  1. An ANN approach to identify malicious URLs
     ECE 539 – Final Project
     Jayneel Gandhi

  2. Motivation
     • Prevent users from visiting malicious webpages
     • A lot of effort goes into reducing internet crime
     • Try to learn which URLs are malicious from different sources
     • Stop users from accessing such websites in the future

  3. Data Set (1)
     • Developed by the SysNet group at the University of California, San Diego
     • Posted at the UCI Machine Learning Repository: http://archive.ics.uci.edu/ml/datasets/URL+Reputation

  4. Data Set (2)
     • Feature space is made up of:
       • Lexical features
         • Hostname
         • Primary domain
         • Path tokens
       • Host-based features
         • WHOIS info
         • IP prefix
         • Geographical
     • Feature vector (sparse): 3,231,961 features
     • Number of instances: 2,396,130
     HUGE data set!!! Takes a long time to run … in the range of 20-30 days
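To make the lexical features on this slide concrete, here is a minimal sketch of how a URL might be split into hostname, primary domain, and path tokens and stored as a sparse bag of words. The function name `lexical_features`, the feature-name prefixes, and the "last two labels" rule for the primary domain are assumptions for illustration only; the slides do not describe how the data set's features were actually extracted. Sparse storage matters here because a dense matrix of 2,396,130 instances by 3,231,961 features would need roughly 7.7 × 10^12 entries.

```python
import re
from urllib.parse import urlsplit


def lexical_features(url):
    """Return a sparse {feature_name: 1} dict of lexical features for one URL.

    Hypothetical helper: the feature names and tokenization rules are
    illustrative assumptions, not the data set's real encoding.
    """
    # urlsplit only fills in the hostname when a scheme is present.
    parts = urlsplit(url if "://" in url else "http://" + url)
    host = (parts.hostname or "").lower()
    labels = host.split(".") if host else []
    # Naive primary domain: the last two labels. This is wrong for
    # multi-part suffixes such as "co.uk"; a real system would use a
    # public-suffix list.
    primary = ".".join(labels[-2:]) if len(labels) >= 2 else host

    features = {"host=" + host: 1, "domain=" + primary: 1}
    # Split the path on common delimiters and keep the non-empty tokens.
    for token in re.split(r"[/\-_.?=&]+", parts.path):
        if token:
            features["path_token=" + token.lower()] = 1
    return features


if __name__ == "__main__":
    for name in sorted(lexical_features("http://login.example.com/secure/update-account.php")):
        print(name)
```

Only the features that are present get stored, so each URL costs a handful of dictionary entries rather than one slot per possible feature.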

  5. Learning Model
     Source: SysNet group webpage at the University of California, San Diego

  6. Experiments (1)
     • The data set is organized as URLs visited over a period of 121 days (Day0-Day120)
     • Each day has roughly 15,000-40,000 URLs
     • I will run experiments only on Day0, which consists of 16,000 URLs
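Reading a single day of data can be sketched as below. This assumes the per-day files are plain text in SVM-light style, one URL per line as `<label> <index>:<value> ...` with labels `+1` and `-1`; that layout, and the helper names `parse_svmlight_line` and `read_day`, are assumptions rather than something stated on the slides.

```python
def parse_svmlight_line(line):
    """Parse '<label> <index>:<value> ...' into (label, {index: value})."""
    fields = line.split()
    label = int(float(fields[0]))  # accepts "+1", "-1", "1.0"
    features = {}
    for field in fields[1:]:
        index, value = field.split(":", 1)
        features[int(index)] = float(value)
    return label, features


def read_day(lines):
    """Yield (label, sparse_features) pairs from an iterable of text lines."""
    for line in lines:
        # Drop an optional trailing "# comment" and surrounding whitespace.
        stripped = line.split("#", 1)[0].strip()
        if stripped:
            yield parse_svmlight_line(stripped)


if __name__ == "__main__":
    sample = ["+1 4:0.5 10:1\n", "\n", "-1 2:1 7:0.25 # note\n"]
    for label, features in read_day(sample):
        print(label, features)
```

Because `read_day` is a generator, a file object can be passed in directly and the day's URLs are processed one at a time without loading the whole file into memory.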

  7. Experiments (2)
     • Experiment 1
       • Use a single perceptron model
       • Online learning is possible
       • The history of all visited URLs is preserved
     • Experiment 2
       • Use a Support Vector Machine (SVM)
       • Online learning is not possible
       • Can only learn from a limited window of past history
       • Loses some history over time
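The online-learning idea behind Experiment 1 can be sketched with a single perceptron over sparse feature dictionaries. This is an illustrative sketch, not the project's implementation: the class name `SparsePerceptron`, the learning rate, the `+1`/`-1` label encoding, and the predict-then-update error measure are all assumptions. Each URL is seen once, in arrival order, and the weights carry forward everything learned so far, which is the sense in which the history of visited URLs is preserved.

```python
class SparsePerceptron:
    """Single perceptron that learns one example at a time."""

    def __init__(self, learning_rate=1.0):
        self.weights = {}  # feature index -> weight, only for seen features
        self.bias = 0.0
        self.learning_rate = learning_rate

    def predict(self, features):
        """Return +1 or -1 for a sparse {index: value} feature dict."""
        score = self.bias
        for index, value in features.items():
            score += self.weights.get(index, 0.0) * value
        return 1 if score >= 0 else -1

    def update(self, label, features):
        """Predict first, then adjust the weights only after a mistake."""
        predicted = self.predict(features)
        if predicted != label:
            step = self.learning_rate * label
            for index, value in features.items():
                self.weights[index] = self.weights.get(index, 0.0) + step * value
            self.bias += step
        return predicted


def online_error_rate(model, stream):
    """Fraction of examples misclassified before the model was updated."""
    mistakes = 0
    total = 0
    for label, features in stream:
        if model.update(label, features) != label:
            mistakes += 1
        total += 1
    return mistakes / total if total else 0.0


if __name__ == "__main__":
    stream = [(1, {1: 1.0}), (-1, {2: 1.0}), (1, {1: 1.0}), (-1, {2: 1.0})]
    model = SparsePerceptron()
    print(online_error_rate(model, stream))
```

A batch-trained SVM, by contrast, would need to be refit on a stored window of past URLs, which is why Experiment 2 can only draw on a limited portion of the history.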

  8. Thank You…
