Innovative Algorithms for Network Traffic Identification

Algorithms for Identification of Network Data Streams

Jun Li* and Peter Rabinovitch**
*Carleton University, **Bell Labs, Alcatel-Lucent
Supervisor: Dr. Yiqiang Q. Zhao (Carleton University)

INTRODUCTION

• Background:
The volume of Internet traffic is very large, and accurately identifying its essential traits is a challenging problem. Existing techniques typically rely on manually generated signatures specified in packet headers, which keeps traffic identification tests relatively simple. However, this approach lacks the flexibility required to deal with the constant changes in network traffic patterns.

• Problems:
  • How to continuously detect changes in network traffic streams
  • How to identify suspicious traffic streams without pre-specified signatures
  • How to generate network traffic signatures automatically (i.e., without requiring a network expert's effort)
  • How to allocate network resources only when needed

• Proposed Solution:
  • AutoImmune System: an Intelligent IP Service Infrastructure

AUTOIMMUNE SYSTEM ARCHITECTURE

As traffic flows through the router, the Staleness Detector monitors the characteristics of the traffic and triggers an alarm if its behavior has changed significantly. The alarm starts a process on the Signature Factory, which clusters the flows matching the alarmed signature into groups. The new cluster is analyzed to extract a signature. The new signatures are merged with the purchased signatures, and the resulting set of signatures is then tested against a corpus of end-user traffic.

[Fig. 1: AutoImmune Architecture. Diagram labels: Router; 5-tuple, packet size, …; Staleness Detector; Alarms of signature changes; Signature Factory; Packets of changed signature; New signatures; Purchased Signatures]

ALGORITHMS

• Change Detection
The algorithm (in the Staleness Detector) keeps a dictionary of data elements that are deemed useful in predicting future data elements. New data points that are not well explained by this dictionary are signaled as alarms.
For each new data point:
  • Compute the distance from this point to the points already in the dictionary
  • If the point is very far, set a Red Alarm
  • If it is somewhat far, set an Orange Alarm
  • If it is close, raise no alarm
Periodically, evaluate the Orange Alarms and clean up the dictionary.
A study related to our change detection algorithm is [1].

• Data Clustering and Classification
The algorithm (in the Signature Factory) classifies test data points into two clusters: typical traffic and atypical traffic.
  • The data space is split into small regions
  • Two density estimates are obtained for each region:
    • The proportion of known observations
    • The proportion of test observations
  • Observations in regions that have a nil (or very small) density estimate under typical traffic, but a relatively large estimate under test traffic, are classified as atypical traffic

[Fig. 2: Clustering 2-dimensional Data. Labels: Data space; flow length in packets; mean packet size; Cluster of typical traffic; Cluster of atypical traffic]

• Signature Extraction
  • A signature-based algorithm similar to Bro [2] and SNORT [3], and based on [4]
  • Only the cluster of atypical traffic is examined for extracting signatures

RESULTS

In the implemented system, 20 computers are connected through the Router (shown in Fig. 1) and exchange multimedia traffic. The Staleness Detector and the Signature Factory are connected to the Router and run separately. There are five types of traffic flows: Web, Mix, Smtp, VoIP, and Video. The statistics of the traffic flows are shown in Table 1.

[Table 1: Five Types of Traffic Flows]

Network speed is assumed to be 1 Gbps. At the beginning of the simulation, each computer generates traffic without Mix flows. When the simulation enters steady state, each computer starts to generate Mix flows in the proportion specified in Table 2. The payload of each Mix packet is injected with a synthetic worm. The injected Mix traffic is of Web type as it passes through the router.

[Table 2: Proportions of Traffic Flows]

Define the following parameters for each simulation run:
T -- Period from when malware (e.g., Mix traffic) starts until a new signature is obtained by the Router
N -- Number of items in the cluster of atypical traffic
N' -- Number of items in the cluster of atypical traffic that are NOT malware (i.e., not of Mix type)
MEAN -- Mean length (in bytes) of packets in the cluster of atypical traffic
L -- Length (in bytes) of the extracted signature

[Table 3: Numerical Values of Parameters]

[Fig. 3: Change Detection in S1]
[Fig. 4: Change Detection in S2]
[Fig. 5: Change Detection in S3]
[Fig. 6: Flow Clustering and Classification in S1]
[Fig. 7: Flow Clustering and Classification in S2]
[Fig. 8: Flow Clustering and Classification in S3]

CONCLUSION

AutoImmune addresses a more general traffic stream identification problem that needs complex packet-payload-based membership tests without pre-specified signature sets. We implemented AutoImmune by integrating the three developed algorithms and tested the system against simulated data traffic. In these tests the system performed well in various networking environments with non-stationary traffic streams. It adapts automatically to changes in the characteristics of network traffic and identifies new types of traffic patterns almost in real time (it takes less than 10 seconds in a Gbps communication network to obtain a new traffic pattern). Simulation results showed that the system successfully identifies a new type of network traffic that occupies as little as 0.2% of total network traffic. To the best of our knowledge, the lowest reachable worm detection rate reported in the literature is 1.1%, by a worm detection system referred to as DoWitcher. The smaller the percentage of the new type of traffic, the longer it takes to identify the new signature.

REFERENCES

[1] T. Ahmed, M. Coates, and A. Lakhina, “Multivariate online anomaly detection using kernel recursive least squares,” in Proc. IEEE INFOCOM, Anchorage, AK, May 2007.
[2] V. Paxson, “Bro: A System for Detecting Network Intruders in Real-Time,” Lawrence Berkeley National Laboratory; in Proc. 7th USENIX Security Symposium, San Antonio, TX, Jan. 26-29, 1998.
[3] M. Roesch, “Snort - Lightweight Intrusion Detection for Networks,” in Proc. USENIX LISA '99, Seattle, Nov. 7-12, 1999.
[4] F. Hao, M.S. Kodialam, T.V. Lakshman, and H. Zhang, “Fast Payload-Based Flow Estimation for Traffic Monitoring and Network Security,” in Proc. ANCS 2005, Oct. 26-28, 2005, New Jersey, USA.

ACKNOWLEDGEMENT

This research was supported in part by the MITACS Internship Program. The authors would like to acknowledge the contributions made by Katrina Rogers-Stewart, Yihui Tang, and Pin Yuan.
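
The Change Detection steps listed under ALGORITHMS (distance to a dictionary, Red and Orange Alarms, periodic review) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the class name, the Euclidean distance, the two thresholds, the rule for resolving Orange Alarms (accept a point if enough other orange points lie nearby, otherwise escalate it), and the dictionary size cap are all assumptions made for illustration. The two features are taken to be flow length in packets and mean packet size.

```python
import math


class ChangeDetectorSketch:
    """Toy dictionary-based change detector (illustrative only)."""

    def __init__(self, seed_points, orange=1.0, red=3.0, confirm=2, max_size=50):
        self.dictionary = [tuple(p) for p in seed_points]
        self.orange = orange      # hypothetical "somewhat far" threshold
        self.red = red            # hypothetical "very far" threshold
        self.confirm = confirm    # hypothetical support needed to accept an orange point
        self.max_size = max_size  # hypothetical cap on dictionary size
        self.pending = []         # points that raised an Orange Alarm

    def observe(self, point):
        """Return 'red', 'orange', or 'none' for one new data point."""
        point = tuple(point)
        # Distance to the closest element already in the dictionary.
        nearest = min(math.dist(point, d) for d in self.dictionary)
        if nearest > self.red:
            return "red"
        if nearest > self.orange:
            self.pending.append(point)
            return "orange"
        return "none"

    def review(self):
        """Periodic step: resolve Orange Alarms, then trim the dictionary."""
        escalated = []
        for p in self.pending:
            # Count pending points (including p itself) within the orange radius.
            support = sum(1 for q in self.pending if math.dist(p, q) <= self.orange)
            if support >= self.confirm:
                self.dictionary.append(p)   # treated as new normal behavior
            else:
                escalated.append(p)         # isolated point: escalate
        self.pending = []
        # Assumed clean-up rule: keep only the most recent max_size elements.
        self.dictionary = self.dictionary[-self.max_size:]
        return escalated


# Features: (flow length in packets, mean packet size in bytes).
detector = ChangeDetectorSketch(seed_points=[(10.0, 500.0), (12.0, 520.0)],
                                orange=30.0, red=200.0)
alarms = [detector.observe(p) for p in [(11.0, 510.0), (14.0, 560.0),
                                        (15.0, 565.0), (40.0, 1400.0)]]
escalated = detector.review()
```

In this run the first point raises no alarm, the next two raise Orange Alarms and are absorbed into the dictionary at review time because they support each other, and the last point raises a Red Alarm. In practice the features would likely need to be scaled to comparable ranges before distances are computed.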

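
The Data Clustering and Classification procedure (split the data space into small regions, compare the proportion of known observations with the proportion of test observations in each region) can be sketched in the same hedged way. The fixed grid, the cell sizes, the two thresholds `low` and `high`, and the function names are assumptions chosen for illustration; the poster does not specify how regions or cut-offs are selected.

```python
from collections import Counter


def region_of(point, cell):
    """Map a point to the grid cell (small region) that contains it."""
    return tuple(int(x // w) for x, w in zip(point, cell))


def classify_atypical(known, test, cell=(5.0, 100.0), low=0.01, high=0.05):
    """Return the test points that fall in regions that are nearly empty
    under known (typical) traffic but well populated under test traffic."""
    known_counts = Counter(region_of(p, cell) for p in known)
    test_counts = Counter(region_of(p, cell) for p in test)
    atypical = []
    for p in test:
        r = region_of(p, cell)
        known_density = known_counts[r] / len(known)  # proportion of known observations
        test_density = test_counts[r] / len(test)     # proportion of test observations
        if known_density <= low and test_density >= high:
            atypical.append(p)
    return atypical


# Features: (flow length in packets, mean packet size in bytes).
known = [(10.0 + i % 3, 500.0 + i % 50) for i in range(100)]
test = [(11.0, 520.0)] * 90 + [(40.0, 1400.0 + i) for i in range(10)]
atypical = classify_atypical(known, test)
```

Here the ten long flows with large packets land in a region with no known observations and 10% of the test observations, so they form the atypical cluster; the remaining ninety test points share a region with the known traffic and are treated as typical.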