This study explores the use of Machine Learning, specifically Backpropagation Neural Network (BPNN), in identifying P2P traffic based on flow characteristics and feature selection methods. Performance metrics such as false positive rate and detection rate are evaluated, alongside dataset comparisons. The methodology includes analyzing TCP/UDP IP pairs and exclusion criteria for identifying P2P behaviors. The study delves into flow behaviors of various applications such as streaming, gaming, and BT, utilizing unique metrics for accurate identification.
Machine Learning for Identification of P2P Traffic • Victor Gau • Yi-Hsien Wang • 2007.12.07
Performance Metrics • False positive rate: X/N • X: number of non-P2P flows which were detected as P2P flow • N: number of P2P flows • Detection rate: Y/N • Y: number of P2P flows which were detected correctly
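The two metrics above can be sketched in a few lines of Python. This is only an illustration of the slide's definitions: the function name and the boolean label encoding are assumptions, and both rates are divided by N (the number of P2P flows), exactly as the slide states.

```python
def p2p_metrics(true_labels, predicted_labels):
    """Return (false_positive_rate, detection_rate) as defined on the slide.

    Labels are booleans (an assumed encoding): True means the flow is P2P.
    Both rates are normalised by N, the number of true P2P flows.
    """
    pairs = list(zip(true_labels, predicted_labels))
    n = sum(1 for t, _ in pairs if t)            # N: number of P2P flows
    if n == 0:
        raise ValueError("no P2P flows in the ground truth")
    x = sum(1 for t, p in pairs if not t and p)  # X: non-P2P flagged as P2P
    y = sum(1 for t, p in pairs if t and p)      # Y: P2P detected correctly
    return x / n, y / n


truth = [True, True, True, True, False, False]
guess = [True, True, True, False, True, False]
fp_rate, det_rate = p2p_metrics(truth, guess)
print(fp_rate, det_rate)
```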
Review • T. Karagiannis et al. (UC Riverside) • Based on the five-tuple key {source IP, destination IP, protocol, source port, destination port} and a 64-second flow timeout, examine two primary heuristics: • TCP/UDP IP pairs • {IP, port} pairs
Methodology • Based on the five-tuple key {source IP, destination IP, protocol, source port, destination port} and a 64-second flow timeout, examine two primary heuristics: • TCP/UDP IP pairs • {IP, port} pairs
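A minimal sketch of how packets might be grouped into flows under this key. The `Packet` record, the `build_flows` name, and the reading of the 64-second timeout as an idle gap between consecutive packets of the same five-tuple are all assumptions; the slide does not spell out these details.

```python
from collections import namedtuple

# Hypothetical packet record; field names are illustrative only.
Packet = namedtuple("Packet", "ts src_ip dst_ip proto src_port dst_port size")

FLOW_TIMEOUT = 64.0  # seconds, from the slide


def build_flows(packets, timeout=FLOW_TIMEOUT):
    """Group packets into flows keyed by the five-tuple.

    Returns a list of (key, [packets]) in order of flow start. A packet
    opens a new flow when no flow with its key exists yet, or when the
    gap since that flow's last packet exceeds `timeout` (assumed idle
    timeout semantics).
    """
    flows = []
    latest = {}  # five-tuple key -> index of its most recent flow
    for pkt in sorted(packets, key=lambda p: p.ts):
        key = (pkt.src_ip, pkt.dst_ip, pkt.proto, pkt.src_port, pkt.dst_port)
        idx = latest.get(key)
        if idx is None or pkt.ts - flows[idx][1][-1].ts > timeout:
            flows.append((key, [pkt]))
            latest[key] = len(flows) - 1
        else:
            flows[idx][1].append(pkt)
    return flows
```

Sorting by timestamp first keeps the gap test meaningful even if the capture is not strictly ordered.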
TCP/UDP IP Pairs • Look for pairs of source-destination hosts that use both TCP and UDP, • Excluding
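The core of this heuristic can be sketched as follows. The function name and the string protocol labels "TCP" and "UDP" are assumptions, as is treating a host pair as unordered. The exclusion list is not reproduced on the slide, so it is deliberately not implemented here.

```python
def tcp_udp_ip_pairs(flow_keys):
    """Return the host pairs that were seen using both TCP and UDP.

    `flow_keys` is an iterable of five-tuples
    (src_ip, dst_ip, proto, src_port, dst_port). Pairs are treated as
    unordered (an assumption), so A->B over TCP and B->A over UDP count
    as the same pair.
    """
    protocols = {}
    for src_ip, dst_ip, proto, _src_port, _dst_port in flow_keys:
        pair = frozenset((src_ip, dst_ip))
        protocols.setdefault(pair, set()).add(proto)
    return {pair for pair, seen in protocols.items() if {"TCP", "UDP"} <= seen}


keys = [
    ("A", "B", "TCP", 4000, 6881),
    ("B", "A", "UDP", 6881, 4000),
    ("A", "C", "TCP", 4001, 80),
]
print(tcp_udp_ip_pairs(keys))
```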
{IP, Port} Pairs • For the advertised destination {IP, port} pair of host A, the number of distinct IPs connected to host A equals the number of distinct ports used to connect to host A. • Example (figure): {B, 15}, {C, 10}: 2 IPs = 2 ports
Exclusion • With an HTTP server, a client usually initiates more than one concurrent connection in order to download objects in parallel. • This gives a higher ratio of the number of distinct ports to the number of distinct IPs. • Example (figure): {B, 15}, {B, 30}, {C, 10}, {C, 20}: 4 ports / 2 IPs = 2
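The {IP, port} test and the HTTP exclusion can be illustrated with the two figure examples. This is a sketch under assumptions: the function names are hypothetical, `{B, 15}` is read as source IP B with source port 15, and the `min_ips` guard (which stops a single connection from trivially matching) is not taken from the slides.

```python
from collections import defaultdict


def ip_port_counts(flow_keys):
    """Map each destination (IP, port) to (distinct source IPs, distinct source ports)."""
    ips = defaultdict(set)
    ports = defaultdict(set)
    for src_ip, dst_ip, _proto, src_port, dst_port in flow_keys:
        dest = (dst_ip, dst_port)
        ips[dest].add(src_ip)
        ports[dest].add(src_port)
    return {dest: (len(ips[dest]), len(ports[dest])) for dest in ips}


def looks_p2p(n_ips, n_ports, min_ips=2):
    """P2P-like when distinct IPs equal distinct ports (min_ips is an assumed guard)."""
    return n_ips >= min_ips and n_ips == n_ports


# P2P-like example from the slide: {B, 15}, {C, 10} -> 2 IPs, 2 ports.
p2p = [("B", "A", "TCP", 15, 6881), ("C", "A", "TCP", 10, 6881)]
# Web-like example from the slide: 4 ports / 2 IPs = 2.
web = [("B", "W", "TCP", 15, 80), ("B", "W", "TCP", 30, 80),
       ("C", "W", "TCP", 10, 80), ("C", "W", "TCP", 20, 80)]

for name, keys in (("p2p", p2p), ("web", web)):
    for dest, (n_ips, n_ports) in ip_port_counts(keys).items():
        print(name, dest, n_ips, n_ports, looks_p2p(n_ips, n_ports))
```

The destination ports 6881 and 80 are placeholders chosen for the example only.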
Using Machine Learning • Backpropagation Neural Network (BPNN)
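For readers unfamiliar with backpropagation, here is a minimal pure-Python sketch of a one-hidden-layer network. The slides do not describe the architecture used in the study, so the layer sizes, sigmoid activations, cross-entropy loss, learning rate, and the `TinyBPNN` name are all assumptions chosen for illustration, not the authors' configuration.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


class TinyBPNN:
    """One hidden layer, sigmoid units, trained by stochastic gradient descent."""

    def __init__(self, n_in, n_hidden, seed=0):
        rng = random.Random(seed)
        # Each weight row ends with a bias term.
        self.w_hidden = [[rng.uniform(-0.5, 0.5) for _ in range(n_in + 1)]
                         for _ in range(n_hidden)]
        self.w_out = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden + 1)]

    def forward(self, x):
        hidden = [sigmoid(sum(w * v for w, v in zip(row, x)) + row[-1])
                  for row in self.w_hidden]
        out = sigmoid(sum(w * h for w, h in zip(self.w_out, hidden))
                      + self.w_out[-1])
        return hidden, out

    def predict(self, x):
        return self.forward(x)[1]

    def train_step(self, x, target, lr):
        hidden, out = self.forward(x)
        # With a sigmoid output and cross-entropy loss, the output error
        # term is simply (out - target).
        delta_out = out - target
        # Propagate the error back to the hidden layer before any update.
        delta_hidden = [delta_out * self.w_out[j] * h * (1.0 - h)
                        for j, h in enumerate(hidden)]
        for j, h in enumerate(hidden):
            self.w_out[j] -= lr * delta_out * h
        self.w_out[-1] -= lr * delta_out
        for j, row in enumerate(self.w_hidden):
            for i, v in enumerate(x):
                row[i] -= lr * delta_hidden[j] * v
            row[-1] -= lr * delta_hidden[j]


def cross_entropy(net, samples):
    """Mean cross-entropy of the network over (features, label) samples."""
    eps = 1e-12
    total = 0.0
    for x, t in samples:
        p = net.predict(x)
        total -= t * math.log(p + eps) + (1 - t) * math.log(1 - p + eps)
    return total / len(samples)
```

In practice the inputs would be per-flow feature vectors scaled to a common range, with label 1 for P2P and 0 otherwise; that labelling convention is also an assumption.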
Feature Selection • Src IP, port • Dest IP, port • Service type • TCP flags field (ACK, SYN, FIN, …) • Time To Live (TTL) • Flow duration • Packet size per flow • Packet number per flow • Packet rate per flow • …
Packet Size Std. Dev. Per Flow • [Figure: packet size standard deviation per flow; panels: short duration, long duration]
Observation • Packet size switching frequency • The number of times that the difference between the current packet size and the previous packet size exceeds a threshold.
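This metric is straightforward to compute from the ordered packet sizes of one flow. The sketch below assumes the absolute difference is meant; the threshold value is not given on the slide, so the 200 bytes used in the example is purely illustrative.

```python
def size_switching_frequency(packet_sizes, threshold):
    """Count consecutive packets whose size differs by more than `threshold`.

    Uses the absolute difference (an assumption); sizes are in bytes and
    must be in arrival order.
    """
    return sum(1 for prev, cur in zip(packet_sizes, packet_sizes[1:])
               if abs(cur - prev) > threshold)


sizes = [100, 1500, 1480, 60, 64]
print(size_switching_frequency(sizes, threshold=200))  # differences: 1400, 20, 1420, 4
```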
Feature Selection • Average packet size per flow • Packet size switching frequency per flow • Packet size Std. Dev. per flow • Number of packets per flow • Total bytes per flow • Flow duration
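The six selected features could be extracted from one flow roughly as follows. The function and key names are hypothetical, the population standard deviation is an assumed choice (the slide does not say which variant was used), and the switching threshold is left as a parameter because its value is not given.

```python
import statistics


def flow_features(timestamps, sizes, threshold):
    """Return the six per-flow features from the slide as a dict.

    `timestamps` (seconds) and `sizes` (bytes) describe the packets of a
    single flow in arrival order.
    """
    if not sizes or len(sizes) != len(timestamps):
        raise ValueError("need one timestamp per packet and at least one packet")
    switches = sum(1 for prev, cur in zip(sizes, sizes[1:])
                   if abs(cur - prev) > threshold)
    return {
        "avg_packet_size": statistics.mean(sizes),
        "size_switching_frequency": switches,
        "packet_size_std": statistics.pstdev(sizes),  # population std. dev. (assumed)
        "packet_count": len(sizes),
        "total_bytes": sum(sizes),
        "duration": max(timestamps) - min(timestamps),
    }


print(flow_features([0.0, 1.0, 3.0], [100, 300, 200], threshold=150))
```

A dict keeps the features labelled; before feeding them to a classifier they would typically be converted to a fixed-order vector and scaled.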
Performance Comparison • T. Karagiannis et al. • Detection rate: 95% • False positive rate: 8%~12% • BPNN • Detection rate: 96% • False positive rate: 2.7%
Datasets • From NLANR (http://pma.nlanr.net/Special/sdsc1.html)