Andrew Williams & Nikunj Kela

Ranking Attackers Through Network Traffic Analysis Andrew Williams & Nikunj Kela

Agenda • Background • Tools We've Developed • Our Approach • Results • Future Work

Background: The Problem Setting 1: Corporate Environment • Large number of attackers • How do you prioritize which attacks to investigate?

Background: The Problem Setting 2: Hacking Competitions • How do you know who should win?

Background: Information Available • Network Traffic Captures • Alerts from Intrusion Detection Systems (IDS) • Application and Operating System Logs

Background: Traffic Captures • HUGE volumes of data • A complete history of interactions between clients and servers* Information available: • Traffic Statistics • Info on interactions across multiple servers • How traffic varies with time • Everything up to and including application layer info**

Background: IDS Alerts • Messages indicating that a packet matches the signature of a known malicious one • Still a fairly large amount of data • Same downsides as anti-virus programs, but most IDS signatures are open source! • If IDS is compromised, these might not be available Information available: • Indication that known attacks are being launched • Alert Statistics • How alerts vary with time

Background: Application/OS Logs • Ex: mysql logs, apache logs, Windows 7 Security logs, ... • Detailed, application-specific error messages and warnings • Large amount of data • If a server is compromised, logs may not be available • Information available: • Very detailed information with more context • Access to errors/issues even if traffic was encrypted

Background: iCTF 2010 Contest • 72 teams attempting to compromise 10 servers • Vulnerabilities include SQL Injection, exploitable off-by-one errors, format string exploits, and several others* • Pretty complex set of rules Dataset from competition: • 27 GB of Network Traffic Captures • 46 MB of Snort Alerts (from competition) • 175 MB of Snort Alerts (generated with updated rulesets) • No Application or OS Logs More information on the contest can be found here: http://www.cs.ucsb.edu/~gianluca/papers/ctf-acsac2011.pdf

Tools We've Developed We wrote scripts to... • Parse the large amount of data: • Extract network traffic between multiple parties • Filter out less important Snort Alerts • Track connection state to generate statistics and stream data • Visualize the data • Show all of the alerts and flag submissions with respect to time • Analyze the data • Pull out the transaction distances and find statistics on them • Generate Application and OS Logs • Replay network traffic to live virtual machine images
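The stream-extraction step can be sketched as follows. This is a minimal illustration, not the scripts used in the project: the `Packet` record, the `"ip:port"` endpoint strings, and the `client_streams` helper are all hypothetical, and the sketch assumes packets have already been decoded from the capture. It also skips real connection-state tracking (SYN/FIN handling, retransmissions) and simply treats each client endpoint as one connection.

```python
from collections import defaultdict
from typing import NamedTuple


class Packet(NamedTuple):
    """Hypothetical, already-decoded packet record (not a real capture format)."""
    timestamp: float   # seconds since start of capture
    src: str           # "ip:port" of the sender
    dst: str           # "ip:port" of the receiver
    payload: bytes     # TCP payload, possibly empty


def client_streams(packets, server):
    """Concatenate client-to-server payloads per client endpoint.

    Returns a list of (first_timestamp, client, data) tuples ordered by the
    time each client endpoint first sent data to `server`.
    """
    data = defaultdict(bytearray)
    first_seen = {}
    for pkt in sorted(packets, key=lambda p: p.timestamp):
        # Keep only non-empty payloads sent towards the server of interest.
        if pkt.dst != server or not pkt.payload:
            continue
        data[pkt.src] += pkt.payload
        first_seen.setdefault(pkt.src, pkt.timestamp)
    ordered = sorted(data, key=lambda client: first_seen[client])
    return [(first_seen[c], c, bytes(data[c])) for c in ordered]
```

Because an ephemeral source port normally changes with every new connection, grouping by client endpoint approximates "one stream per attempt" in this simplified setting.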

Our Approach: Intuition • Vulnerability Discovery Phase: identify the type of vulnerability • Vulnerability Exploitation Phase: refine the attack string • It is quite intuitive that a skilled attacker will come up with the attack string in less time than an unskilled attacker • How do we know if the attacker has broken into the system? We only have logs to work with! • Time taken to break into the system reflects the learning capabilities of an attacker • Fast learner implies good attacker

Our Approach: Identify the attack string • Once an attacker breaks into the system, he/she will reuse the same attack string almost every time to gather information • We observed from the traffic logs that, in most cases, the attacker used one TCP stream to break into the system: one TCP connection for each attempt! • We chose Levenshtein distance (edit distance) as our metric to compare the attacker-to-server data of two consecutive TCP streams • A run of consecutive zero distances between TCP streams means the attacker has successfully broken into the system

Example: Identify the attack string
Stream1: "%27%20or%20%27%27%3D%27%0Alist%0A"
Stream2: "%27%20OR%20%27%27%3D%27%0ALIST%0A"
Stream3: "asdfasd%20%27%20UNION%20SELECT%20%28%27secret.txt%27%29%3B%20--%20%20%0AMUGSHOT%0ASADF%0A"
Stream4: "asdfasd%20%27%20UNION%20SELECT%20%28%27secret.txt%27%29%3B%20--%20%20%0AMUGSHOT%0A39393%0A"
Stream5: "asdfasd%20%27%20UNION%20SELECT%20%28%27secret.txt%27%29%3B%20--%20%20%0AMUGSHOT%0A1606%0A"
Stream6: "asdfasd%20%27%20UNION%20SELECT%20%28%27secret.txt%27%29%3B%20--%20%20%0AMUGSHOT%0A1606%0A"
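The distance computation can be sketched as below. This is a minimal sketch under stated assumptions rather than the analysis script itself: the function names are hypothetical, and the `run_length=2` threshold for "consecutive zeros" is an assumed value, since the slides do not say how many zeros are required.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance via the standard two-row dynamic program."""
    if len(a) < len(b):
        a, b = b, a  # keep the inner row as short as possible
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def consecutive_distances(streams):
    """Distance between each stream and the one that follows it."""
    return [levenshtein(x, y) for x, y in zip(streams, streams[1:])]


def first_stable_index(distances, run_length=2):
    """Index of the first stream in the first run of `run_length` zero
    distances, or None if no such run exists (assumed success criterion)."""
    run = 0
    for i, d in enumerate(distances):
        run = run + 1 if d == 0 else 0
        if run == run_length:
            return i - run_length + 1
    return None
```

For Stream1 and Stream2 in the example, the distance is 6 (the six letters of "or" and "list" change case), while two identical streams such as Stream5 and Stream6 give a distance of 0.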

Our Approach: Feature Selection • Time taken to successfully break into the system • Mean and standard deviation of the distances between consecutive TCP streams • Number of attempts before successfully breaching the service • Length of the longest sequence of consecutive zeros
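The four features can be computed from a list of stream start times and the consecutive edit distances, roughly as sketched here. The names are hypothetical, and several choices are assumptions not stated in the slides: success is taken to be the first run of `run_length` zero distances, time is measured from the first stream, and the population standard deviation (`pstdev`) is used.

```python
from statistics import mean, pstdev


def longest_zero_run(distances):
    """Length of the longest run of consecutive zero distances."""
    best = run = 0
    for d in distances:
        run = run + 1 if d == 0 else 0
        best = max(best, run)
    return best


def success_index(distances, run_length=2):
    """Index of the first stream of the first zero run (assumed criterion)."""
    run = 0
    for i, d in enumerate(distances):
        run = run + 1 if d == 0 else 0
        if run == run_length:
            return i - run_length + 1
    return None


def extract_features(timestamps, distances, run_length=2):
    """timestamps[i] is the start time of stream i, in seconds;
    distances[i] is the edit distance between streams i and i + 1."""
    idx = success_index(distances, run_length)
    return {
        "time_to_success": None if idx is None else timestamps[idx] - timestamps[0],
        "mean_distance": mean(distances) if distances else None,
        "std_distance": pstdev(distances) if distances else None,
        "attempts_before_success": idx,  # streams sent before the successful one
        "longest_zero_run": longest_zero_run(distances),
    }
```

A team whose distances are `[6, 40, 0, 0]` with streams starting at 0, 30, 95, 100 and 105 seconds would get two failed attempts, a time to success of 95 seconds, and a longest zero run of 2.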

Result: Distance-Time Plot

Interesting Findings from the contest • Although the contest involved only attacking the vulnerable services, the teams still tried to break into each other's systems • We noticed that teams shared the flag value with each other through the chat server • The active status of the service was maintained through a complex Petri net system, and most of the teams struggled to understand it • Hints about different vulnerabilities in the services were released from time to time throughout the contest by the administrators

Future Work • Use data mining tools (e.g. SAS miner) to analyse the relationships among the features • Use data mining tools to develop a scoring system that scores each team based on the feature set • Continue improving the replay script to handle the large number of connections

Thank You! Questions?

Image Sources WooThemes, free for commercial use Icons-Land, free for non-commercial use Fast Icon Studio, used with permission
