BotGraph: Large Scale Spamming Botnet Detection

BotGraph: Large Scale Spamming Botnet Detection. Yao Zhao, Yinglian Xie, Fang Yu, Qifa Ke, Yuan Yu, Yan Chen, and Eliot Gillum. Speaker: 林佳宜. References.

Presentation Transcript


  1. BotGraph: Large Scale Spamming Botnet Detection. Yao Zhao, Yinglian Xie, Fang Yu, Qifa Ke, Yuan Yu, Yan Chen, and Eliot Gillum. Speaker: 林佳宜

  2. References • Yao Zhao, Yinglian Xie, Fang Yu, Qifa Ke, Yuan Yu, Yan Chen, and Eliot Gillum, "BotGraph: Large Scale Spamming Botnet Detection," in The 6th USENIX Symposium on Networked Systems Design and Implementation (NSDI '09), USENIX, April 2009.

  3. Outline • Introduction • BotGraph Architecture • Random Graph Theory • Hierarchical algorithm • False Positive Analysis • Conclusion

  4. Introduction • The authors design and implement a novel system called BotGraph to detect a new type of botnet spamming attack targeting major Web email providers. • The evaluation uses two months of Hotmail logs containing over 500 million users. • BotGraph identified over 26 million botnet-created user accounts with a low false positive rate.

  5. Data and Environment • Each record in the input log data contains three fields: UserID, IPAddress, and Login Timestamp. • The implementation is based on existing distributed computing models such as MapReduce and DryadLINQ. • All experiments use the same 240-machine cluster.

  6. BotGraph Architecture • BotGraph has two components: • aggressive sign-up detection • stealthy bot-user detection based on login activities

  7. Detection of Aggressive Signups • A sudden increase in signup activity is suspicious. • An exponentially weighted moving average (EWMA) algorithm detects sudden changes in signup activity.
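The EWMA check on this slide can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function name `ewma_signup_spikes`, the smoothing factor `alpha`, and the two thresholds `min_abs_error` and `min_ratio` are assumptions chosen for illustration. The idea is to forecast each period's signup count from the smoothed history and flag periods where the observed count exceeds the forecast by a large absolute and relative margin.

```python
def ewma_signup_spikes(counts, alpha=0.5, min_abs_error=10.0, min_ratio=3.0):
    """Return the indices of periods whose signup count jumps far above the EWMA forecast.

    counts: signup counts per time period (for example, per day for one IP address).
    alpha, min_abs_error, min_ratio: illustrative values, not taken from the paper.
    """
    spikes = []
    forecast = float(counts[0])  # seed the forecast with the first observation
    for t in range(1, len(counts)):
        observed = counts[t]
        abs_error = observed - forecast
        ratio = observed / max(forecast, 1.0)  # guard against tiny forecasts
        if abs_error > min_abs_error and ratio > min_ratio:
            spikes.append(t)
        # Standard EWMA update: blend the new observation into the forecast.
        forecast = alpha * observed + (1 - alpha) * forecast
    return spikes


if __name__ == "__main__":
    daily_signups = [2, 3, 2, 3, 2, 40, 3, 2]
    print(ewma_signup_spikes(daily_signups))
```

Requiring both an absolute and a relative jump keeps low-volume sources from being flagged for going from one signup to four, and keeps high-volume sources from being flagged for ordinary fluctuation.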

  8. Detection of Stealthy Bot Accounts • The sharing of one IP address: multiple bot-users must log in from a common bot. • The sharing of multiple IP addresses: each account needs to be assigned to different bots. • Multiple shared IP addresses in the same Autonomous System (AS) are counted as only one shared IP address.
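A small single-machine sketch of the edge-weight rule on this slide follows. The function name `build_user_user_graph`, the `(user, ip)` record shape, and the `ip_to_as` lookup table are assumptions for illustration; the system described in the talk builds this graph on a distributed cluster. Two users get an edge when they logged in from the same IP address, and the weight is the number of distinct ASes in which they share at least one IP.

```python
from collections import defaultdict
from itertools import combinations


def build_user_user_graph(logins, ip_to_as):
    """Map each user pair to the number of distinct ASes in which they share an IP.

    logins: iterable of (user_id, ip_address) pairs (timestamps omitted here).
    ip_to_as: dict from IP address to AS number (a hypothetical lookup table).
    """
    users_by_ip = defaultdict(set)
    for user, ip in logins:
        users_by_ip[ip].add(user)

    shared_ases = defaultdict(set)
    for ip, users in users_by_ip.items():
        # Every pair of users seen on this IP shares it; record the IP's AS once.
        for u, v in combinations(sorted(users), 2):
            shared_ases[(u, v)].add(ip_to_as[ip])

    return {pair: len(ases) for pair, ases in shared_ases.items()}


if __name__ == "__main__":
    logins = [
        ("a", "1.1.1.1"), ("b", "1.1.1.1"),
        ("a", "1.1.1.2"), ("b", "1.1.1.2"),
        ("a", "2.2.2.2"), ("b", "2.2.2.2"), ("c", "2.2.2.2"),
    ]
    ip_to_as = {"1.1.1.1": 100, "1.1.1.2": 100, "2.2.2.2": 200}
    print(build_user_user_graph(logins, ip_to_as))
```

In the example, users `a` and `b` share two IPs in AS 100 and one in AS 200, so their edge weight is 2 rather than 3. Enumerating all pairs per IP is quadratic in the number of users on that IP, which is one reason the real computation needs a cluster.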

  9. Graph-Based Bot-User Detection • Use random graph models to analyze the user-user graph, and design a hierarchical algorithm to extract the connected components formed by bot-users.

  10. Random Graph Theory • G(n, p) is the random graph model: • an n-vertex graph generated by assigning an edge to each pair of vertices with probability p ∈ [0, 1] • G(n, p) has average degree d = n · p • If d < 1, then with high probability the largest component in the graph has size less than O(log n). • If d > 1, then with high probability the largest component in the graph has size O(n).
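The threshold behaviour at d = 1 can be observed empirically with a short Python simulation. This is a sketch for intuition only; the function name `largest_component_size` and the sample sizes are assumptions, and the union-find structure is simply a convenient way to track components.

```python
import random


def largest_component_size(n, p, seed=0):
    """Sample G(n, p) and return the size of its largest connected component."""
    rng = random.Random(seed)
    parent = list(range(n))
    size = [1] * n

    def find(x):
        # Find the component root, halving the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for u in range(n):
        for v in range(u + 1, n):
            if rng.random() < p:  # include edge (u, v) with probability p
                ru, rv = find(u), find(v)
                if ru != rv:
                    if size[ru] < size[rv]:
                        ru, rv = rv, ru
                    parent[rv] = ru
                    size[ru] += size[rv]

    return max(size[find(x)] for x in range(n))


if __name__ == "__main__":
    n = 1000
    for d in (0.5, 1.5, 2.0):
        print(d, largest_component_size(n, d / n))
```

For average degree below 1 the largest component stays small, while above 1 a single component covering a constant fraction of the vertices appears.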

  11. Spammer strategies for assigning bot-accounts to bots • Consider the following three typical strategies: • Strategy 1: bot-user accounts are randomly assigned to bots. • Strategy 2: the spammer assigns k available bot-users per bot request, and a bot makes only one request for k bot-users each day. • Strategy 3: there is no limit on the number of bot-users a bot can request in one day, and k = 1.

  12. Simulating the assignment strategies • Simulate the typical spamming strategies above and construct the corresponding user-user graphs. • Model 1: 10,000 accounts and 500 bots • Model 2: pick k = 20 • Model 3: assume the bots go online with a Poisson arrival distribution and the length of bot online time fits an exponential distribution
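A scaled-down Python sketch of the first model (random assignment) is given below. It makes several simplifying assumptions that are not from the talk: every bot-user logs in once per day from one uniformly random bot, each bot counts as its own IP address and AS, and the helper names `assign_randomly` and `largest_component` are hypothetical. The sketch builds the user-user graph for a given threshold T and reports the size of the largest connected component.

```python
import random


def assign_randomly(n_users, n_bots, days, seed=0):
    """Random assignment: each day every bot-user logs in from one random bot.

    Returns, for each user, the set of bots it was ever assigned to.
    """
    rng = random.Random(seed)
    return [{rng.randrange(n_bots) for _ in range(days)} for _ in range(n_users)]


def largest_component(bots_by_user, threshold):
    """Largest connected component when an edge needs >= threshold shared bots."""
    n = len(bots_by_user)
    neighbors = [[] for _ in range(n)]
    for u in range(n):
        for v in range(u + 1, n):
            if len(bots_by_user[u] & bots_by_user[v]) >= threshold:
                neighbors[u].append(v)
                neighbors[v].append(u)

    seen = [False] * n
    best = 0
    for start in range(n):
        if seen[start]:
            continue
        # Depth-first traversal of one component.
        seen[start] = True
        stack, size = [start], 0
        while stack:
            node = stack.pop()
            size += 1
            for nxt in neighbors[node]:
                if not seen[nxt]:
                    seen[nxt] = True
                    stack.append(nxt)
        best = max(best, size)
    return best


if __name__ == "__main__":
    bots_by_user = assign_randomly(n_users=300, n_bots=20, days=10)
    for t in range(1, 8):
        print(t, largest_component(bots_by_user, t))
```

Printing the largest component size for increasing T shows where the single large component breaks apart. The sizes used here are far smaller than the 10,000 accounts and 500 bots on the slide so that the all-pairs comparison stays fast.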

  13. Result • Each model has a transition point in the threshold T. • Model 2 has a transition value of T = 2. • Models 1 and 3 have the same transition value of T = 3. • Normal users usually cannot form large components with more than 100 nodes.

  14. Extracting the graph components • Bot-user groups are extracted as connected components from the user-user graph generated with some predefined threshold T. • This raises the following issues: • It is hard to choose a single fixed threshold T. • Bot-users from different bot-user groups may be in the same connected component. • Connected components of normal users also exist.

  15. Partitioned data by IP addresses

  16. Partitioned data by user IDs

  17. Hierarchical algorithm

  18. Bot-user groups

  19. Confidence measures • BotGraph computes two histograms from a 30-day email log: • h1: the number of emails sent per day per user. • h2: the sizes of emails. • It then computes two statistics, s1 and s2, from the normalized histograms to quantify their differences: • s1: the percentage of users who sent more than 3 emails per day; • s2: the areas of peaks in the normalized email-size histogram, or the percentage of users who sent out emails with a similar size. • Both s1 and s2 are in the range [0, 1] and can be used as confidence measures.
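As a rough illustration of the first statistic, the Python sketch below computes s1 for one candidate group. The input shape (a dict from user to a list of daily email counts) and the reading of "more than 3 emails per day" as a per-user daily average are assumptions; the slide does not specify either. The s2 statistic depends on how histogram peaks are identified, which the slide does not describe, so it is left out.

```python
def s1_score(daily_counts_by_user, limit=3):
    """Fraction of users in a group whose average daily email count exceeds `limit`.

    daily_counts_by_user: dict mapping user id -> list of emails sent on each day
    (an assumed input shape). Users with no recorded days count as not exceeding.
    """
    if not daily_counts_by_user:
        return 0.0
    heavy = sum(
        1
        for counts in daily_counts_by_user.values()
        if counts and sum(counts) / len(counts) > limit
    )
    return heavy / len(daily_counts_by_user)


if __name__ == "__main__":
    group = {"a": [5, 5], "b": [1, 0], "c": [4, 4], "d": [3, 3]}
    print(s1_score(group))
```

A value close to 1 means most accounts in the group send at a higher rate than the limit, which supports treating the group as bot-users.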

  20. Bot-User Pruning

  21. Performance Evaluation [1/2]

  22. Performance Evaluation [2/2]

  23. False Positive Analysis • Naming Patterns • Bot-user names follow a user-name template; examples are "w9168d4dc8c5c25f9" and "x9550a21da4e456a2". • Only 0.44% of the identified bot-users do not strictly follow the naming templates. • Signup Dates • Only 0.08% of the identified bot-users signed up before 2007. • About 59.1% of all accounts in the input dataset signed up before 2007. • Dividing by this fraction adjusts the false positive rate to 0.08%/59.1% = 0.13%.

  24. Naming pattern score

  25. Conclusion • BotGraph is implemented as a parallel Dryad/DryadLINQ application running on a large-scale computer cluster. • Using two months of Hotmail logs, BotGraph detected more than 26 million botnet accounts. • The experience should be useful to a wide category of applications that construct and analyze large graphs.

  26. Thank you
