
Machine Learning for Identification of P2P Traffic

Machine Learning for Identification of P2P Traffic. Victor Gau Yi-Hsien Wang 2007.12.07. Performance Metrics. False positive rate: X/N X: number of non-P2P flows which were detected as P2P flow N: number of P2P flows Detection rate: Y/N Y: number of P2P flows which were detected correctly.


Presentation Transcript


  1. Machine Learning for Identification of P2P Traffic Victor Gau Yi-Hsien Wang 2007.12.07

  2. Performance Metrics • False positive rate: X/N • X: number of non-P2P flows that were detected as P2P flows • N: number of P2P flows • Detection rate: Y/N • Y: number of P2P flows that were detected correctly
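
As a rough illustration of the two metrics above, the sketch below computes them from ground-truth and predicted labels. The function name `p2p_metrics` and the boolean label encoding are assumptions made for this example. Note that, following the slide, both rates are divided by N, the number of true P2P flows.

```python
def p2p_metrics(true_labels, predicted_labels):
    """Return (detection_rate, false_positive_rate) as defined on the slide.

    Labels are booleans: True means "P2P flow". Both rates use the same
    denominator N, the number of true P2P flows.
    """
    n_p2p = sum(1 for t in true_labels if t)
    if n_p2p == 0:
        raise ValueError("no P2P flows in the ground truth")
    pairs = list(zip(true_labels, predicted_labels))
    # Y: P2P flows that were detected as P2P
    y = sum(1 for t, p in pairs if t and p)
    # X: non-P2P flows that were detected as P2P
    x = sum(1 for t, p in pairs if not t and p)
    return y / n_p2p, x / n_p2p


if __name__ == "__main__":
    truth = [True, True, True, True, False, False]
    predicted = [True, True, True, False, True, False]
    print(p2p_metrics(truth, predicted))
```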

  3. Review • T. Karagiannis et al. (UC Riverside) • Based on the five-tuple key {source IP, destination IP, protocol, source port, destination port} and a 64-second flow timeout, examine two primary heuristics: • TCP/UDP IP pairs • {IP, port} pairs

  4. Methodology • Based on the five-tuple key {source IP, destination IP, protocol, source port, destination port} and a 64-second flow timeout, examine two primary heuristics: • TCP/UDP IP pairs • {IP, port} pairs
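
One way to picture the flow definition above is the following sketch, which groups packets by five-tuple and closes a flow after 64 seconds of inactivity. The `Packet` record, the function name, and the reading of "flow timeout" as an inactivity gap are assumptions for illustration, not details given in the slides.

```python
from collections import namedtuple

# Hypothetical packet record; the field names are assumptions for this sketch.
Packet = namedtuple("Packet", "ts src_ip dst_ip proto src_port dst_port size")

FLOW_TIMEOUT = 64.0  # seconds, as stated on the slide


def group_into_flows(packets, timeout=FLOW_TIMEOUT):
    """Group packets by five-tuple; a gap longer than `timeout` starts a new flow."""
    active = {}    # five-tuple -> packets of the currently open flow
    finished = []  # (five-tuple, packets) for flows already closed
    for pkt in sorted(packets, key=lambda p: p.ts):
        key = (pkt.src_ip, pkt.dst_ip, pkt.proto, pkt.src_port, pkt.dst_port)
        current = active.get(key)
        if current is not None and pkt.ts - current[-1].ts > timeout:
            # Inactivity exceeded the timeout: close the old flow
            finished.append((key, current))
            current = None
        if current is None:
            current = []
            active[key] = current
        current.append(pkt)
    # Flows still open at the end of the trace are reported as well
    finished.extend(active.items())
    return finished
```

Sorting by timestamp first keeps the gap test simple; a streaming implementation would instead expire idle flows as the capture advances.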

  5. TCP/UDP IP Pairs • Look for pairs of source-destination hosts that use both TCP and UDP, • Excluding
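
A minimal sketch of the TCP/UDP IP pairs heuristic follows. It assumes flows are given as five-tuples with the protocol written as the strings "TCP" or "UDP", and it treats a host pair as unordered; both are assumptions of this example. The exclusions that the slide begins to list are cut off in the transcript and are not applied here.

```python
def tcp_udp_ip_pairs(flows):
    """Return the set of host pairs that were seen using both TCP and UDP.

    `flows` is an iterable of (src_ip, dst_ip, proto, src_port, dst_port).
    """
    protos = {}
    for src_ip, dst_ip, proto, _src_port, _dst_port in flows:
        pair = frozenset((src_ip, dst_ip))  # direction is ignored (assumption)
        protos.setdefault(pair, set()).add(proto)
    return {pair for pair, seen in protos.items() if {"TCP", "UDP"} <= seen}
```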

  6. {IP, Port} Pairs • For the advertised destination {IP, port} pair of host A, • the number of distinct IPs connected to host A will be equal to • the number of distinct ports used to connect to host A. • Example: connections from {B, 15} and {C, 10}: 2 IPs = 2 ports
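
The {IP, port} heuristic can be sketched as a pair of counters per destination {IP, port}. The function names and the five-tuple input layout are assumptions for this example; the test of equality mirrors the slide's statement and ignores any tolerance a real detector might need.

```python
from collections import defaultdict


def ip_port_counts(flows):
    """Map each destination (IP, port) to (distinct source IPs, distinct source ports).

    `flows` is an iterable of (src_ip, dst_ip, proto, src_port, dst_port).
    """
    ips = defaultdict(set)
    ports = defaultdict(set)
    for src_ip, dst_ip, _proto, src_port, dst_port in flows:
        ips[(dst_ip, dst_port)].add(src_ip)
        ports[(dst_ip, dst_port)].add(src_port)
    return {key: (len(ips[key]), len(ports[key])) for key in ips}


def looks_p2p(n_ips, n_ports):
    """Slide 6 criterion: as many distinct IPs as distinct ports."""
    return n_ips == n_ports
```

With the slide's example, peers {B, 15} and {C, 10} connecting to host A give 2 distinct IPs and 2 distinct ports, so the pair is flagged.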

  7. Exclusion • For an HTTP server, a client usually initiates more than one concurrent connection in order to download objects in parallel. • This gives a higher ratio of the number of distinct ports to the number of distinct IPs. • Example: connections from {B, 15}, {B, 30}, {C, 10}, {C, 20}: 4 ports / 2 IPs = 2
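
The ratio used for this exclusion can be computed as below. The function name and the input layout, a list of (source IP, source port) pairs observed at one destination {IP, port}, are assumptions for this sketch; the slide does not give a cutoff value, so none is hard-coded.

```python
def port_ip_ratio(connections):
    """Ratio of distinct source ports to distinct source IPs at one destination.

    `connections` is an iterable of (source_ip, source_port) pairs.
    """
    conns = list(connections)
    ips = {ip for ip, _port in conns}
    ports = {port for _ip, port in conns}
    if not ips:
        raise ValueError("no connections")
    return len(ports) / len(ips)


if __name__ == "__main__":
    # Web-like pattern from the slide: each client opens two connections
    print(port_ip_ratio([("B", 15), ("B", 30), ("C", 10), ("C", 20)]))
```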

  8. Using Machine Learning • Backpropagation Neural Network (BPNN)

  9. Architecture for P2P Traffic Identification

  10. Feature Selection • Src IP, port • Dest IP, port • Service type • TCP flags field (ACK, SYN, FIN, …) • Time To Live (TTL) • Flow duration • Packet size per flow • Packet number per flow • Packet rate per flow • …

  11. Flow Characteristics

  12. Packet Size Std. Dev. Per Flow Short duration Long duration

  13. Observation • Packet size switching frequency • The number of times that the difference between the current packet size and the previous packet size exceeds a threshold.
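
The definition above translates almost directly into code. In this sketch the function name, the use of the absolute difference, and the threshold value in the example are assumptions; the slides do not state which threshold was used.

```python
def switching_frequency(packet_sizes, threshold):
    """Count consecutive packets whose size differs by more than `threshold` bytes."""
    sizes = list(packet_sizes)
    return sum(
        1
        for prev, cur in zip(sizes, sizes[1:])
        if abs(cur - prev) > threshold
    )


if __name__ == "__main__":
    # Differences are 1400, 0, 1440 and 4 bytes; two of them exceed 100
    print(switching_frequency([100, 1500, 1500, 60, 64], threshold=100))
```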

  14. Distribution of Flow’s Packet Size Switching Frequency

  15. WWW Flow Behavior

  16. Streaming Flow Behavior

  17. Gaming Flow Behavior

  18. BT Flow Behavior

  19. eDonkey Flow Behavior

  20. Gnutella Flow Behavior

  21. Feature Selection • Average packet size per flow • Packet size switching frequency per flow • Packet size Std. Dev. per flow • Number of packets per flow • Total bytes per flow • Flow duration
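
The six selected features could be extracted from one flow roughly as follows. The function name, the dictionary keys, the default threshold of 100 bytes, and the choice of the population standard deviation are all assumptions for this sketch.

```python
from statistics import mean, pstdev


def flow_features(timestamps, sizes, threshold=100):
    """Compute the six per-flow features listed on the slide.

    `timestamps` and `sizes` are parallel lists, one entry per packet.
    """
    if not sizes or len(sizes) != len(timestamps):
        raise ValueError("need one timestamp per packet and at least one packet")
    switches = sum(
        1 for prev, cur in zip(sizes, sizes[1:]) if abs(cur - prev) > threshold
    )
    return {
        "avg_packet_size": mean(sizes),
        "switching_frequency": switches,
        "packet_size_std": pstdev(sizes),
        "num_packets": len(sizes),
        "total_bytes": sum(sizes),
        "duration": max(timestamps) - min(timestamps),
    }
```

Before feeding such values to a neural network, it is common to scale each feature to a comparable range, since byte counts and durations differ by orders of magnitude.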

  22. BPNN Architecture for P2P Flow Identification

  23. Learning
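
The architecture and learning figures are not preserved in this transcript. As a generic illustration of backpropagation only, the sketch below trains a network with one sigmoid hidden layer and one sigmoid output by stochastic gradient descent on squared error. The class name, layer sizes, learning rate, and initialization are assumptions and do not reflect the authors' actual configuration.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


class TinyBPNN:
    """One hidden layer, sigmoid units, trained by backpropagation."""

    def __init__(self, n_in, n_hidden, seed=0):
        rng = random.Random(seed)
        self.w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)]
                   for _ in range(n_hidden)]
        self.b1 = [0.0] * n_hidden
        self.w2 = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
        self.b2 = 0.0

    def forward(self, x):
        """Return (hidden activations, output in (0, 1))."""
        h = [sigmoid(sum(w * xi for w, xi in zip(row, x)) + b)
             for row, b in zip(self.w1, self.b1)]
        y = sigmoid(sum(w * hi for w, hi in zip(self.w2, h)) + self.b2)
        return h, y

    def train_step(self, x, target, lr=0.5):
        """One gradient step on 0.5 * (y - target)^2; returns that loss."""
        h, y = self.forward(x)
        # Error signal at the output unit
        d_out = (y - target) * y * (1.0 - y)
        # Error signals at the hidden units, using the weights before the update
        d_hid = [d_out * w * hi * (1.0 - hi) for w, hi in zip(self.w2, h)]
        for j, hi in enumerate(h):
            self.w2[j] -= lr * d_out * hi
        self.b2 -= lr * d_out
        for j, dj in enumerate(d_hid):
            for i, xi in enumerate(x):
                self.w1[j][i] -= lr * dj * xi
            self.b1[j] -= lr * dj
        return 0.5 * (y - target) ** 2
```

In use, each input vector would hold the scaled per-flow features and the target would be 1.0 for a P2P flow and 0.0 otherwise; an output above 0.5 is then read as "P2P".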

  24. Performance Comparison • T. Karagiannis et al. • Detection rate: 95% • False positive rate: 8%~12% • BPNN • Detection rate: 96% • False positive rate: 2.7%

  25. Datasets From NLANR (http://pma.nlanr.net/Special/sdsc1.html)

  26. Flows

  27. Packets

  28. MBytes
