
Building a Time Machine for Efficient Recording and Retrieval of High-Volume Network Traffic


Presentation Transcript


  1. Building a Time Machine for Efficient Recording and Retrieval of High-Volume Network Traffic
  Stefan Kornexl¹, Vern Paxson², Holger Dreger¹, Anja Feldmann¹, Robin Sommer¹
  ¹TU München, ²ICSI/LBNL
  Internet Measurement Conference (IMC) 2005

  2. Reference
  • Stefan Kornexl, Vern Paxson, Holger Dreger, Anja Feldmann, Robin Sommer, “Building a Time Machine for Efficient Recording and Retrieval of High-Volume Network Traffic,” 5th ACM IMC, 2005.
  • Stefan Kornexl, “High-Performance Packet Recording for Network Intrusion Detection,” master’s thesis, 2005.
  • Time Machine webpage: http://www.net.t-labs.tu-berlin.de/research/tm/
  Speaker: Li-Ming Chen

  3. Outline
  • Motivation and Goals
  • Feasibility Study (trace-driven simulation)
  • System Architecture
  • Performance Evaluation
  • Conclusion and Comments

  4. Motivation
  • The availability of packet recordings is considered a big benefit for network security monitoring
    • Security forensics: determining how an attacker compromised a given host
    • Network trouble-shooting: inspecting the precursors to a fault after the fault occurs
    • Event correlation: a NIDS can analyze past events that were not considered “interesting” until more recently seen traffic hinted at their relevance

  5. Problems
  • Looking at raw packets (not only headers but full contents)
  • Storage constraints
    • In many operational environments, it is infeasible to capture the entire traffic stream due to the enormous volume of traffic
  • Problems with data filtering
    • Hard to decide beforehand what context will turn out to be relevant for retrospectively investigating incidents
    • Filtering also becomes technically problematic in high-speed networks (terabytes per day)
  • Data retrieval is like finding a needle in a haystack
    • Time-consuming and cumbersome

  6. Related Work
  • Brute-force bulk recording
    • Feasible only in low-volume environments
  • Recording only the packets that trigger alerts
    • Does not support retrospective analysis of a problematic host’s earlier activity
  • Sampling – might lose important evidence
  • Data abstraction – provides less information

  7. Objective
  • Design and implement a packet recording system, the “Time Machine”
    • Uses dynamic packet filtering and buffering to enable effective recording of large traffic streams
    • Keeps nearly complete historic data for several days
    • Allows users to conveniently “travel back in time”
  • Application:
    • E.g., a forensic tool – extract detailed past information about unusual activities once they are detected

  8. The Approach
  • Observation (key insight): the “heavy-tailed” distribution of network traffic
    • Most network connections are quite short
    • Only a small number of large connections accounts for the bulk of the total volume
  • The compromise happens at the beginning of most attacks
    • For forensics and trouble-shooting applications, the beginning of a large connection contains the most significant information

  9. The Approach (cont’d)
  • Exploit the “heavy-tailed” nature to partition the traffic stream into a small subset of high interest vs. a large remainder of low interest
    • Then record the small subset and discard the rest
  • Cutoff limit, N (see the sketch below):
    • For every connection, buffer up to the first N bytes of traffic
    • Greatly reduces the traffic that must be buffered
    • Retains full context for small connections and the beginning of large connections
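To make the cutoff idea concrete, here is a minimal sketch (my own illustration, not the authors’ code) of per-connection cutoff filtering: a connection is identified by its 5-tuple, and only its first N bytes are kept. The name `CutoffFilter` is hypothetical.

```python
# Minimal sketch of the per-connection cutoff idea (illustrative only):
# keep at most the first N bytes of every connection.

CUTOFF_N = 20 * 1024  # e.g. a 20 KB cutoff, as used in the feasibility study


class CutoffFilter:
    def __init__(self, cutoff=CUTOFF_N):
        self.cutoff = cutoff
        self.seen = {}  # 5-tuple -> bytes already recorded for that connection

    def keep(self, conn_id, pkt_len):
        """Return True if this packet should be recorded, False if discarded."""
        recorded = self.seen.get(conn_id, 0)
        if recorded >= self.cutoff:
            return False          # connection already exceeded the cutoff
        self.seen[conn_id] = recorded + pkt_len
        return True


# Usage: feed (connection id, packet length) pairs from a packet stream.
f = CutoffFilter()
print(f.keep(("10.0.0.1", 1234, "10.0.0.2", 80, "tcp"), 1500))  # True
```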

  10. Design Goals for the Time Machine
  • Provide raw packet data
  • Buffer traffic comprehensively
  • Prioritize traffic
  • Automated resource management
  • Efficient and flexible retrieval
  • Suitable for high-volume environments using commodity hardware

  11. Outline
  • Motivation and Goals
  • Feasibility Study (trace-driven simulation)
  • System Architecture
  • Performance Evaluation
  • Conclusion and Comments

  12. Environments
  • MWN – Munich Scientific Research Network in Munich, Germany
    • About 50,000 hosts, 2 TB/day
    • 15-20% FTP traffic
    • 350 Mbps (68 Kpps) at busy hour
  • LBNL – Lawrence Berkeley National Laboratory in California, USA
    • About 9,000 hosts & 4,000 users
    • 320 Mbps (37 Kpps) at busy hour
  • NERSC – National Energy Research Scientific Computing Center
    • About 600 hosts & 2,000 users (dominated by large transfers)
    • 260 Mbps (43 Kpps) at busy hour

  13. Datasets
  • Connection-level summaries (1 week each) collected by the Bro NIDS
    • MWN – 355 million connections (from 2004/10/18)
    • LBNL – 22 million connections (from 2005/2/7)
    • NERSC – 4 million connections (from 2005/4/29)
  • These logs capture the nature of their environments but have a relatively low volume compared to full packet-level data
  • A packet-buffer model is used to simulate the buffering of packet-level traffic and to evaluate the memory requirements of a Time Machine (see the simulation sketch below)
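A minimal sketch of what such a trace-driven simulation might look like (a simplification of my own, not the authors’ simulator): it assumes a connection log with start time and byte count per connection, lets each connection contribute at most min(bytes, N) at its start time, and evicts that contribution Te seconds later.

```python
# Illustrative trace-driven simulation of Time Machine buffer occupancy
# (a simplification, not the authors' simulator): each connection contributes
# at most min(bytes, N) to the buffer at its start time, and that contribution
# is evicted T_e seconds later.
import heapq


def peak_buffer_volume(connections, cutoff_n, eviction_te):
    """connections: iterable of (start_time, total_bytes), sorted by start_time.
    Returns the peak buffer occupancy in bytes."""
    evictions = []   # min-heap of (eviction_time, bytes)
    current = peak = 0
    for start, total_bytes in connections:
        # Evict contributions older than T_e before adding the new one.
        while evictions and evictions[0][0] <= start:
            _, b = heapq.heappop(evictions)
            current -= b
        kept = min(total_bytes, cutoff_n)
        current += kept
        heapq.heappush(evictions, (start + eviction_te, kept))
        peak = max(peak, current)
    return peak


# Usage with toy data: cutoff N = 20 KB, eviction time T_e = 4 days.
conns = [(0, 5_000), (10, 500_000_000), (20, 8_000)]
print(peak_buffer_volume(conns, 20 * 1024, 4 * 86400))
```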

  14. Heavy-tailed Distribution and the Cutoff
  (Figure: log-log plot of the connection size distributions.) With a cutoff of 20 KB, roughly 90% of connections stay below the cutoff and are recorded in full; the roughly 10% of connections that exceed the cutoff (MWN 15%, LBNL 12%, NERSC 14%) account for the bulk of the bytes, which are discarded beyond the cutoff: 87% of the bytes at MWN, 96% at LBNL, and 99.86% at NERSC.

  15. Evaluate the Memory Requirements
  • Eviction time, Te: how long the buffer stores each connection’s data
    • (The goal) aim for a value of Te on the order of days rather than minutes
  • Vary the cutoff N and the eviction time Te to evaluate the efficiency (feasibility) of a Time Machine (a rough formula for the simulated buffer volume follows below)
  • Result: using a cutoff of 10-20 KB, buffering several days of traffic is practical
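A rough way to read the quantity being simulated (my own approximation, not a formula from the paper): the buffer occupancy at time t sums, over the connections c seen during the last Te seconds, the bytes kept under cutoff N.

```latex
M(t) \;\approx\; \sum_{c \,:\, t_c \in [\,t - T_e,\; t\,]} \min\bigl(\mathrm{bytes}_c,\; N\bigr)
```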

  16. Required Memory for LBNL
  (Figure: simulated buffer volume over time for different cutoffs and eviction times.) The cutoff increases the duration of data availability for the same memory by a factor of 32 (roughly 4 days instead of 3 hours). The required volume stops growing after 4 days because of the eviction-time constraint Te (figure annotations: 68 GB and 64 GB, 5th day).

  17. Required Memory for NERSC
  • NERSC has a large proportion of high-volume traffic (14% of connections account for 99.86% of the bytes)
  • Without a cutoff the buffered volume is spiky; the eviction time Te alone cannot keep the volume down because of the intermittent bursts of traffic
  • (Figure annotations: 344 GB vs. 14.9 GB, i.e., the required memory roughly without vs. with the cutoff)

  18. Required Memory for MWN
  • MWN has a lower fraction of its bytes in the larger connections (15% of connections account for 87% of the bytes)
  • The gain from the cutoff is therefore not quite as large, likely due to the larger fraction of HTTP traffic

  19. Outline
  • Motivation and Goals
  • Feasibility Study (via trace-driven simulation)
  • System Architecture
  • Performance Evaluation
  • Conclusion and Comments

  20. Time Machine System Architecture
  Four main functions:
  1. buffering traffic using a cutoff
  2. migrating the buffered packets to disk and managing the associated storage
  3. providing flexible retrieval of subsets of the packets
  4. enabling customization

  21. Two-thread Architecture
  • Separates user interaction from recording, to ensure that packet capture has higher priority than packet retrieval (see the sketch below)
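A minimal sketch of this separation (illustrative only; the real Time Machine is a compiled program and relies on the OS scheduler for priorities): one thread records packets into a shared buffer, while a second thread serves user queries without blocking capture. The buffer, lock, and queue names are assumptions.

```python
# Illustrative two-thread layout: capture thread vs. lower-priority query thread.
import queue
import threading

packet_buffer = []                 # stands in for the in-memory ring buffer
buffer_lock = threading.Lock()
query_requests = queue.Queue()     # user queries arrive here


def capture_loop(packet_source):
    # packet_source is a hypothetical iterator of (timestamp, packet) pairs.
    for ts, pkt in packet_source:
        with buffer_lock:
            packet_buffer.append((ts, pkt))


def query_loop():
    while True:
        predicate, reply = query_requests.get()    # blocks until a query arrives
        with buffer_lock:
            snapshot = list(packet_buffer)         # copy, then release the lock fast
        reply([p for p in snapshot if predicate(p)])


threading.Thread(target=query_loop, daemon=True).start()
```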

  22. Packet Capture
  • The capture unit
    • Receives packets from the network tap and passes them on to the classification unit
    • Uses the libpcap packet capture library to collect and store each packet’s full content and capture timestamp
    • libpcap can apply a kernel-level BPF (BSD Packet Filter) capture filter to discard “uninteresting” traffic as early as possible (see the sketch below)
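As an offline stand-in for this step, the following sketch reads packets (capture timestamp plus full content) from a classic libpcap trace file using only the standard library. The real Time Machine captures live via libpcap with a kernel BPF filter; this only illustrates the “full content + timestamp” part, and the file path in the usage comment is hypothetical.

```python
# Illustrative reader for the classic pcap file format (24-byte global header,
# 16-byte per-packet record header), yielding (timestamp, raw_packet) pairs.
import struct


def read_pcap(path):
    with open(path, "rb") as f:
        global_hdr = f.read(24)                       # pcap global header
        magic = struct.unpack("<I", global_hdr[:4])[0]
        endian = "<" if magic == 0xA1B2C3D4 else ">"
        while True:
            rec_hdr = f.read(16)                      # per-packet record header
            if len(rec_hdr) < 16:
                break
            ts_sec, ts_usec, incl_len, _orig_len = struct.unpack(endian + "IIII", rec_hdr)
            yield ts_sec + ts_usec / 1e6, f.read(incl_len)


# Usage (path is hypothetical):
# for ts, pkt in read_pcap("trace.pcap"):
#     handle(ts, pkt)
```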

  23. Classification
  • The classification unit
    • Divides the incoming packet stream into user-defined classes
    • Assigns packets to different storage containers based on their classes
    • Responsible for monitoring the cutoff, with the help of the connection tracking unit
  • The connection tracking unit keeps per-connection statistics and checks whether the connection a packet belongs to has exceeded its cutoff threshold
  • (Figure: an example “telnet” class definition with fields for name, BPF filter rule, priority, cutoff, and memory/disk buffer sizes; a sketch follows below)
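A minimal sketch of such a class definition (my own illustration; the field names mirror the slide, but the concrete configuration syntax of the real Time Machine differs, and the example values are hypothetical):

```python
# Illustrative class definition; field names mirror the slide (name, BPF filter,
# priority, cutoff, memory/disk buffer size).
from dataclasses import dataclass


@dataclass
class TrafficClass:
    name: str        # e.g. "telnet"
    bpf_filter: str  # BPF rule selecting this class's packets (compiled via libpcap in practice)
    priority: int    # decides which class wins if several filters match a packet
    cutoff: int      # bytes buffered per connection before the rest is discarded
    mem_size: int    # RAM buffer of the class's storage container, in bytes
    disk_size: int   # disk buffer of the class's storage container, in bytes


# Example resembling the slide's "telnet" class (all values hypothetical):
telnet = TrafficClass(name="telnet", bpf_filter="tcp port 23", priority=10,
                      cutoff=20 * 1024, mem_size=10 * 2**20, disk_size=10 * 2**30)
```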

  24. Storage Containers
  • The architecture supports customization by splitting the overall storage into several storage containers
    • Each storage container is responsible for storing a subset of the packets within its resources (memory/disk), according to the user-defined classes
  • The RAM and disk buffers are implemented as two ring buffers
    • Packets evicted from the RAM buffer are migrated to the disk buffer, and eventually deleted (see the sketch below)
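A minimal sketch of this two-tier buffering for one storage container (illustrative only; the disk buffer is modelled here as a second in-memory deque rather than real files):

```python
# Illustrative two-tier ring buffer: a bounded RAM buffer whose evictions
# migrate to a bounded "disk" buffer, where the oldest packets are eventually dropped.
from collections import deque


class StorageContainer:
    def __init__(self, mem_capacity, disk_capacity):
        self.mem = deque()            # (timestamp, packet) pairs, newest at the right
        self.disk = deque()
        self.mem_capacity = mem_capacity      # capacities in bytes
        self.disk_capacity = disk_capacity
        self.mem_bytes = self.disk_bytes = 0

    def store(self, ts, pkt):
        self.mem.append((ts, pkt))
        self.mem_bytes += len(pkt)
        while self.mem_bytes > self.mem_capacity:        # evict oldest from RAM ...
            old_ts, old_pkt = self.mem.popleft()
            self.mem_bytes -= len(old_pkt)
            self.disk.append((old_ts, old_pkt))          # ... migrate it to disk
            self.disk_bytes += len(old_pkt)
        while self.disk_bytes > self.disk_capacity:      # ... and eventually delete
            _, dropped = self.disk.popleft()
            self.disk_bytes -= len(dropped)
```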

  25. Indexing
  • For efficient retrieval, an index is kept across all packets stored in all storage containers
    • Each index manages a list of time intervals for every unique key value
    • [Tstart, Tend] is updated for each key on every incoming packet
    • The time intervals indicate whether packets with that key value are available in a given storage container and at what starting timestamp
    • A query then only scans linearly through the intervals it gets from the index
  • Multiple indexes
    • Any number of indexes over an arbitrary set of protocol header fields is supported (see the sketch below)
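A minimal sketch of such an interval index (illustrative only; the merge gap used to extend an interval is an assumed tunable, not the paper’s exact rule): for each key value, e.g. a source IP, it keeps a list of [Tstart, Tend] intervals during which packets with that key were seen.

```python
# Illustrative interval index: key value -> list of [t_start, t_end] intervals.
from collections import defaultdict


class IntervalIndex:
    def __init__(self, merge_gap=60.0):
        self.intervals = defaultdict(list)   # key -> list of [t_start, t_end]
        self.merge_gap = merge_gap

    def update(self, key, ts):
        ivals = self.intervals[key]
        if ivals and ts - ivals[-1][1] <= self.merge_gap:
            ivals[-1][1] = ts                # extend the most recent interval
        else:
            ivals.append([ts, ts])           # open a new interval

    def lookup(self, key):
        """Time intervals in which packets with this key value may be stored."""
        return self.intervals.get(key, [])


idx = IntervalIndex()
idx.update("10.0.0.1", 100.0)
idx.update("10.0.0.1", 130.0)
idx.update("10.0.0.1", 500.0)
print(idx.lookup("10.0.0.1"))    # [[100.0, 130.0], [500.0, 500.0]]
```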

  26. Query Processing
  • Provides a flexible language to express queries for subsets of the packets
    • Each query consists of a logical combination of time ranges, keys, and an optional BPF filter
  • Processing a query:
    • Check the index and obtain the time intervals matching the query
    • Locate these time ranges in the storage containers using binary search
    • Scan all packets in the identified time ranges and check whether they match the query
    • Write the results to a tcpdump trace file on disk (see the sketch below)
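A minimal sketch of these steps (my own illustration; the match predicate stands in for key/BPF matching, and the query language itself is not modelled): binary-search into a timestamp-sorted container, scan the range, and write the hits to a classic pcap file.

```python
# Illustrative query step over one storage container.
import bisect
import struct


def query(container, t_start, t_end, matches, out_path):
    """container: list of (timestamp, raw_packet) sorted by timestamp."""
    timestamps = [ts for ts, _ in container]
    lo = bisect.bisect_left(timestamps, t_start)        # binary search into the range
    hi = bisect.bisect_right(timestamps, t_end)
    hits = [(ts, pkt) for ts, pkt in container[lo:hi] if matches(ts, pkt)]

    with open(out_path, "wb") as f:                      # minimal classic pcap writer
        f.write(struct.pack("<IHHiIII", 0xA1B2C3D4, 2, 4, 0, 0, 65535, 1))
        for ts, pkt in hits:
            sec, usec = int(ts), int((ts % 1) * 1e6)
            f.write(struct.pack("<IIII", sec, usec, len(pkt), len(pkt)) + pkt)
    return len(hits)


# Usage with toy data (output file name is arbitrary):
pkts = [(100.0, b"\x00" * 60), (101.5, b"\x01" * 60), (200.0, b"\x02" * 60)]
print(query(pkts, 100.0, 150.0, lambda ts, p: True, "query_result.pcap"))  # 2
```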

  27. User Interface
  • Allows the user to configure the recording parameters
    • Classification rules, cutoff, storage management, indexing Δt, etc.
  • Lets the user issue queries to the query processing unit to retrieve subsets of the recorded packets

  28. Outline
  • Motivation and Goals
  • Feasibility Study (via trace-driven simulation)
  • System Architecture
  • Performance Evaluation
  • Conclusion and Comments

  29. Evaluation at LBNL
  • Configuration: 3 classes, each with a 20 KB cutoff and its own disk budget
    • TCP 90 GB, UDP 30 GB, Others 10 GB
  • Retention: the distance back in time to which we can travel at any particular moment
    • Increases after the Time Machine starts, until the disk buffers have filled
    • Correlates with the incoming bandwidth of each class and its variations due to diurnal and weekly effects

  30. Evaluation
  • At LBNL
    • 98% of the traffic gets discarded
    • The remainder imposes an average (maximum) rate of 300 KB/s (2.6 MB/s)
    • Over the 2 weeks of operation, libpcap reported only 0.016% of all packets dropped
  • At MWN
    • 85% of the traffic gets discarded
    • Average (maximum) rate of 3.5 MB/s (13.9 MB/s), due to the larger volume of HTTP traffic
    • Issue: need to exploit the classification and cutoff mechanisms more aggressively to appropriately manage the large fraction of HTTP traffic

  31. Conclusion
  • The concept of a Time Machine for efficient network packet recording and retrieval is proposed
    • Relies on the “heavy-tailed” nature of network traffic: record most connections in their entirety while skipping the bulk of the total volume
  • The Time Machine
    • Can buffer several days of raw high-volume traffic using commodity hardware
    • Provides an efficient query interface
    • Automatically manages its available storage
  • A trace-driven simulation and real deployment experience demonstrate its effectiveness
