Rule-based Anomaly Detection on IP Flows

Rule-based Anomaly Detection on IP Flows. Nick Duffield, Patrick Haffner, Balachander Krishnamurthy (AT&T), Haakon Ringberg (Princeton Univ.), INFOCOM’09.

Presentation Transcript


  1. Rule-based Anomaly Detection on IP Flows Nick Duffield, Patrick Haffner, Balachander Krishnamurthy (AT&T), Haakon Ringberg (Princeton Univ.) INFOCOM’09

  2. Snort: Rule-based Anomaly Detection on Packets • Snort is a powerful, flexible open-source NIDS • A Snort rule: alert udp $EXTERNAL_NET any -> $HOME_NET 1434 (msg:"MS-SQL version ove…"; dsize:>100; content:"|04|"; …) • [Figure labels on the rule: rule action, protocol, source IP & port, direction, destination IP & port, and the rule detail: message text, packet size, patterns in the packet's payload] Speaker: Li-Ming Chen
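The rule anatomy labelled on this slide (action, protocol, source IP & port, direction, destination IP & port, and the parenthesised rule detail) can be illustrated with a small parser. This is a minimal sketch, not Snort's own rule parser: `SnortRule` and `parse_rule` are hypothetical names, the rule is assumed to fit on one line, and options are split naively on `;`, so quoted strings containing `;` or `(` would need a real tokenizer.

```python
from dataclasses import dataclass


@dataclass
class SnortRule:
    action: str
    protocol: str
    src_ip: str
    src_port: str
    direction: str
    dst_ip: str
    dst_port: str
    options: list  # (keyword, value) pairs from the parenthesised rule detail


def parse_rule(text: str) -> SnortRule:
    """Split a single-line Snort rule into header fields and (keyword, value) options.

    Hypothetical helper; assumes no ';' or '(' inside quoted option values.
    """
    header, _, body = text.partition("(")
    action, proto, src_ip, src_port, direction, dst_ip, dst_port = header.split()
    options = []
    # Options are ';'-separated "keyword:value" items (some keywords have no value).
    for item in body.rstrip().rstrip(")").split(";"):
        item = item.strip()
        if not item:
            continue
        key, _, value = item.partition(":")
        options.append((key.strip(), value.strip()))
    return SnortRule(action, proto, src_ip, src_port, direction, dst_ip, dst_port, options)
```

Keeping the header as fixed positional fields mirrors the slide's point that these parts map directly onto flow-record fields, while the options need per-keyword interpretation.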

  3. Challenges for Deploying Snort over a Large Network (e.g., a Tier-1 ISP) • Deploy at the edge: • the network scale is huge → deployment issues • Deploy at the core: • link capacity is high → performance issues • Hundreds of rules may need to be evaluated concurrently for each packet

  4. Idea: Rules for IP Flows! • Is it possible to construct rules at the flow level that accurately reproduce the action of packet-level rules? • e.g., an alert should be raised for a flow if some packets of this flow trigger packet-level rules • Why? • IP flows are easy to obtain: ISPs already collect flow statistics ubiquitously (e.g., NetFlow) • More scalable

  5. Think about Rules for IP Flows… (1/2) • If the packet-level rule looks like: • alert udp $EXTERNAL_NET any -> $HOME_NET 1434 (msg:"MS-SQL version ove…"; dsize:>100; content:"|04|"; …) • At the flow level, maybe we can do: • alert on UDP flows from $EXTERNAL_NET to $HOME_NET port 1434 with a mean packet size larger than 100 • Yes, we ignore the content! • Although we don't know the exact packet size, we can measure the mean packet size of each flow • What is the detection accuracy?
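The flow-level rule proposed above can be written as a plain predicate over a flow record. A minimal sketch under stated assumptions: `FlowRecord`, `HOME_NET`, and `ms_sql_flow_rule` are hypothetical, `$EXTERNAL_NET` is taken to mean "not in `$HOME_NET`", and the mean packet size is computed as bytes / packets. Note that Snort's `dsize` tests payload size, whereas flow byte counts such as NetFlow's include the IP and UDP headers, so the threshold of 100 is only an approximation.

```python
import ipaddress
from dataclasses import dataclass

# Hypothetical stand-in for Snort's $HOME_NET variable; replace with real prefixes.
HOME_NET = [ipaddress.ip_network("10.0.0.0/8")]


@dataclass
class FlowRecord:
    protocol: str
    src_ip: str
    dst_ip: str
    dst_port: int
    packets: int
    byte_count: int  # layer-3 bytes, headers included (as in typical flow exports)

    @property
    def mean_packet_size(self) -> float:
        return self.byte_count / self.packets


def in_home(ip: str) -> bool:
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in HOME_NET)


def ms_sql_flow_rule(flow: FlowRecord) -> bool:
    """Flow-level analogue of the MS-SQL packet rule: the content check is dropped
    and dsize:>100 is approximated by the flow's mean packet size."""
    return (
        flow.protocol == "udp"
        and not in_home(flow.src_ip)  # assumption: $EXTERNAL_NET = "not $HOME_NET"
        and in_home(flow.dst_ip)
        and flow.dst_port == 1434
        and flow.mean_packet_size > 100
    )
```

For example, a single-packet UDP flow of 404 bytes from 203.0.113.5 to 10.1.2.3:1434 matches, while a two-packet flow of 150 bytes (mean 75) to the same port does not.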

  6. Think about Rules for IP Flows… (2/2) • What if the packet-level rule is: • alert icmp any any -> any any (msg:"ICMP Dest. Unreachable Comm. Administratively Prohibited"; icode:13; itype:3; …) • At the flow level, what can we do? • ICMP Destination Unreachable is generated by the host or its inbound gateway to inform the client that the destination is unreachable for some reason • e.g., every packet sent to IP address A triggers this event • Can we LEARN this kind of event?

  7. Motivation & Goal • For a NIDS, inspecting every packet would be ideal, but it is impractical • Signature-based NIDSs have scale and performance problems • Goal: develop an architecture that can translate many existing packet signatures so that they operate effectively on IP flows instead • Premise: flow statistics are compact and are already collected within most ISPs' networks

  8. Build Flow Rules via Learning • The authors use machine learning (ML) to learn the association between flow features and packet payload • Problem: • flows aggregate packet header information but lose payload information → flow rules may lose accuracy • Does ML mitigate the impact of losing payload information?

  9. Outline • Motivation & Goal • Packet Rule Classification • Packet Rules → Flow Rules • Dataset & Evaluation Methodology • Experimental Results • Real Deployment Issues • Conclusion & My Comments

  10. Packet Rule Classification (1/3): Why classify packet rules? • Not all packet rules can be learned effectively • A taxonomy of packet rules is used to understand their impact on learning and to evaluate the performance of the proposed ML method • For example, for which rules: • can the ML method learn perfectly? • is the ML method likely to learn very well? • does the accuracy of the ML method vary with the nature of the rule?

  11. Packet Rule Classification (2/3): What kinds of predicates are in a packet rule? • A packet rule consists of three sets of predicates: • FH (flow header): packet fields reported exactly in the flow record • PP (packet payload): content signatures • MI (meta-information): other packet header information that is reported either inexactly or not at all in the flow record • Example: alert udp $EXTERNAL_NET any -> $HOME_NET 1434 (msg:"MS-SQL version ove…"; dsize:>100; content:"|04|"; …), where udp, $EXTERNAL_NET, any, $HOME_NET, and 1434 are FH, dsize:>100 is MI, and content:"|04|" is PP

  12. Packet Rule Classification (3/3): How to classify packet rules? • Partition packet rules into disjoint classes based on the types of predicates present: • Header rules: comprise only FH predicates • Payload rules: include at least one PP predicate • Meta-information rules: all other rules (no PP, at least one MI, may include FH)
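The three-way partition can be expressed as a function over a rule's option keywords. This is a simplified sketch, not the paper's classifier: the keyword-to-class mapping below is an assumption. `itype`/`icode` are treated as FH (NetFlow commonly reports the ICMP type and code, and the appendix lists rules 1-3 as Header rules), content-matching keywords as PP, descriptive keywords such as `msg` or `sid` as non-predicates, and anything else as MI. Header fields (protocol, addresses, ports) are always FH, so they never change the class.

```python
# Assumed keyword -> predicate-class mapping (illustrative, not from the paper).
FH_KEYWORDS = {"itype", "icode", "tos"}
PP_KEYWORDS = {"content", "uricontent", "pcre", "byte_test", "byte_jump"}
# Keywords that describe or modify a rule rather than test the packet.
NON_PREDICATES = {"msg", "reference", "classtype", "sid", "rev", "depth", "offset"}


def classify_rule(option_keywords):
    """Return 'payload', 'meta-information' or 'header' for a rule, given the
    keywords of its options (header fields are FH and need not be passed)."""
    kinds = set()
    for kw in option_keywords:
        if kw in PP_KEYWORDS:
            kinds.add("PP")
        elif kw in FH_KEYWORDS:
            kinds.add("FH")
        elif kw in NON_PREDICATES:
            continue
        else:
            kinds.add("MI")  # unknown keywords are conservatively treated as MI
    if "PP" in kinds:
        return "payload"      # at least one PP predicate
    if "MI" in kinds:
        return "meta-information"  # no PP, at least one MI
    return "header"           # only FH predicates
```

Under this mapping rule 3 (ICMP Source Quench) is a header rule, rule 6 (`dsize:>800`) is a meta-information rule, and rule 10 (MS-SQL) is a payload rule. Rule 4 would come out as payload because it has a `content` option, although the appendix lists it under Meta-Information, so the paper's actual mapping evidently differs in some details.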

  13. Outline • Motivation & Goal • Packet Rule Classification • Packet Rules → Flow Rules • Dataset & Evaluation Methodology • Experimental Results • Real Deployment Issues • Conclusion & My Comments

  14. Rules in Practice: FH, MI & PP • Snort rules: • a Boolean formula composed of predicates that check for specific values of various fields present in the IP header, transport header, and payload • Features used to construct flow rules in this paper: • source port, destination port • source IP address, destination IP address • #packets, #bytes, mean packet size • duration, mean packet interarrival time • TCP flags, protocol, ToS
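The count, size, and timing features listed above are simple aggregates over a flow's packets. A minimal sketch, assuming a hypothetical `Packet` record with a timestamp and a size; flow exports such as NetFlow report first/last timestamps plus packet and byte counts, from which the same values follow, because the mean of the consecutive gaps telescopes to duration / (packets - 1).

```python
from dataclasses import dataclass


@dataclass
class Packet:
    timestamp: float  # seconds
    size: int         # bytes


def flow_features(packets):
    """Per-flow summary features for packets assumed to share one 5-tuple."""
    pkts = sorted(packets, key=lambda p: p.timestamp)
    n = len(pkts)
    total = sum(p.size for p in pkts)
    duration = pkts[-1].timestamp - pkts[0].timestamp
    return {
        "packets": n,
        "bytes": total,
        "mean_packet_size": total / n,
        "duration": duration,
        # Mean of consecutive gaps == duration / (n - 1); single-packet flows have none.
        "mean_interarrival": duration / (n - 1) if n > 1 else 0.0,
    }
```

For packets of 100, 300, and 200 bytes at t = 0.0, 0.5, and 1.5 s, this yields 600 bytes, a mean size of 200, a duration of 1.5 s, and a mean interarrival time of 0.75 s.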

  15. Packet Rules → Flow Rules • [Diagram: packets → Snort → Snort alerts; packets → flow collection (e.g., NetFlow) → IP flows; Snort alerts + IP flows → build training data (associate each packet alert with the corresponding flow) → ML method → flow rules]
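The step "associate the packet alert with the corresponding flow" amounts to a join between Snort alerts and flow records. A minimal sketch, assuming hypothetical `Flow` and `Alert` records keyed by the 5-tuple, and assuming an alert belongs to a flow when its timestamp falls within the flow's start/end time. Labels are produced per Snort rule (`sid`), since one packet, and hence one flow, can raise several alerts.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Flow:
    key: tuple    # (protocol, src_ip, src_port, dst_ip, dst_port)
    start: float
    end: float


@dataclass
class Alert:
    key: tuple    # 5-tuple of the packet that raised the alert
    time: float
    sid: int      # Snort rule identifier


def label_flows(flows, alerts, sid):
    """y_i = +1 if flow i contains a packet that raised rule `sid`, else -1.

    Assumed matching criterion: same 5-tuple and alert time within [start, end].
    """
    times_by_key = defaultdict(list)
    for a in alerts:
        if a.sid == sid:
            times_by_key[a.key].append(a.time)
    labels = []
    for f in flows:
        hit = any(f.start <= t <= f.end for t in times_by_key.get(f.key, []))
        labels.append(1 if hit else -1)
    return labels
```

Calling `label_flows` once per rule produces the per-rule training sets, which keeps the multi-alert case simple.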

  16. Packet Rules → Flow Rules (detailed) • For each Snort rule, build training data (x_i, y_i): flow i has flow features x_i, and y_i ∈ {−1, +1} indicates whether flow i triggered this Snort rule • Then run an ML algorithm that minimizes the classification error [equation not recoverable]: • assign each Snort rule a score • give each feature a weight • learn these weights to minimize the training error • [Diagram: Snort → Snort alerts → build training data (x_i, y_i) → ML method → flow rules]
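The objective itself did not survive extraction. One standard way to write what the slide describes, offered as an assumption rather than the slide's exact formula: the score of flow i is a weighted sum of simple tests h_j on its features (e.g., thresholds), with weights w_j, and the training error counts the n flows whose score has the wrong sign. AdaBoost, the algorithm chosen on the next slide, greedily minimizes the exponential upper bound on the right, which holds because 1[z ≤ 0] ≤ e^{−z}.

```latex
f(x_i) = \sum_{j} w_j \, h_j(x_i), \qquad
\widehat{\mathrm{err}} = \frac{1}{n} \sum_{i=1}^{n} \mathbf{1}\left[\, y_i \, f(x_i) \le 0 \,\right]
\le \frac{1}{n} \sum_{i=1}^{n} e^{-y_i f(x_i)}
```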

  17. Learning Flow Rules • Note that a single packet may raise multiple Snort alerts → individual flows can be associated with many Snort alerts • Machine learning algorithm: • AdaBoost is chosen as the candidate algorithm because: • the actual number of features is large, and AdaBoost uses an incremental greedy training procedure that only adds the features needed for finer discrimination • it generalizes better than SVM • the level of noise in the training data is low
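For readers unfamiliar with AdaBoost, the sketch below is textbook discrete AdaBoost with one-feature threshold tests ("decision stumps") as weak learners, not the authors' implementation. It illustrates the property cited above: each round greedily adds one weak rule on a single feature, chosen to reduce the weighted error on the examples earlier rounds got wrong. The toy data (destination port, mean packet size) is invented to mimic the MS-SQL rule; all names are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class Stump:
    feature: int
    threshold: float
    polarity: int  # +1: predict +1 when x[feature] > threshold; -1: the reverse
    alpha: float   # vote weight of this stump

    def predict(self, x):
        return self.polarity if x[self.feature] > self.threshold else -self.polarity


def best_stump(X, y, w):
    """Exhaustively find the (feature, threshold, polarity) with least weighted error."""
    best = None
    for j in range(len(X[0])):
        for t in sorted({x[j] for x in X}):
            for pol in (1, -1):
                err = sum(wi for xi, yi, wi in zip(X, y, w)
                          if (pol if xi[j] > t else -pol) != yi)
                if best is None or err < best[0]:
                    best = (err, j, t, pol)
    return best


def adaboost(X, y, rounds=10):
    n = len(X)
    w = [1.0 / n] * n  # start with uniform example weights
    model = []
    for _ in range(rounds):
        err, j, t, pol = best_stump(X, y, w)
        if err >= 0.5:  # no weak rule beats chance: stop
            break
        err = max(err, 1e-10)  # avoid log(1/0) for a perfect stump
        stump = Stump(j, t, pol, 0.5 * math.log((1 - err) / err))
        model.append(stump)
        # Misclassified examples gain weight, correct ones lose weight.
        w = [wi * math.exp(-stump.alpha * yi * stump.predict(xi))
             for xi, yi, wi in zip(X, y, w)]
        total = sum(w)
        w = [wi / total for wi in w]
    return model


def score(model, x):
    """Confidence score; its sign is the predicted label."""
    return sum(s.alpha * s.predict(x) for s in model)


# Toy flows: [destination port, mean packet size]; +1 = rule triggered.
X = [[1434, 404], [1434, 376], [1434, 60], [53, 400], [80, 1200], [443, 90]]
y = [1, 1, -1, -1, -1, -1]
```

On this toy data, `adaboost(X, y, rounds=3)` separates the training set: the first stump tests the destination port, the second is a constant negative vote, and the third tests the mean packet size. Their weighted vote reproduces the conjunction "port 1434 and mean size > 100", which no single stump can express.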

  18. Outline • Motivation & Goal • Packet Rule Classification • Packet Rules → Flow Rules • Dataset & Evaluation Methodology • Experimental Results • Real Deployment Issues • Conclusion & My Comments

  19. Dataset (Aug to Sep 2005) • OC-3 link • 29 days (4 weeks) • Total: >10^6 flows, >5 TBytes • Average rate: 2 MBytes/sec • Average: 14.5 packets/flow • 55% of flows consisted of a single packet! • For machine learning: • week 1: training • week 2: training & testing • weeks 3 & 4: testing • [Diagram: border router → all packets (unsampled) and NetFlow IP flows]
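As a quick consistency check on the dataset figures (assuming decimal units, 1 MB = 10^6 bytes and 1 TB = 10^12 bytes), the average rate sustained over 29 days accounts for the reported total volume:

```latex
2\ \tfrac{\mathrm{MB}}{\mathrm{s}} \times (29 \times 86\,400\ \mathrm{s})
= 2 \times 10^{6}\ \tfrac{\mathrm{B}}{\mathrm{s}} \times 2\,505\,600\ \mathrm{s}
\approx 5.0 \times 10^{12}\ \mathrm{B} \approx 5\ \mathrm{TB}
```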

  20. Dataset (learning performance) • [Table: number of flows (×10^6) per week; normal flows (Neg: true negatives) vs. anomalous flows (Pos: true positives)] • Further speedup: • remove deterministic features → reduce the amount of training data: 1) remove flows whose source is part of the local network; 2) each Snort rule applies to only a single protocol → train separately for each protocol (TCP, UDP, ICMP) • The number of unique examples is small (→ faster training)
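The two speedups above (training one model per protocol and exploiting the small number of unique examples) can be sketched as grouping plus deduplication. This is an assumption about how one might implement it, not the paper's code: `unique_examples_by_protocol` is hypothetical, and it returns each distinct (features, label) pair once per protocol, with a count that a learner can use as an example weight.

```python
from collections import Counter, defaultdict


def unique_examples_by_protocol(examples):
    """Group (protocol, features, label) examples by protocol and collapse duplicates.

    Returns {protocol: [(features_tuple, label, count), ...]}; the count can act as
    a per-example weight so training visits each distinct example only once.
    """
    counts = defaultdict(Counter)
    for proto, features, label in examples:
        counts[proto][(tuple(features), label)] += 1
    return {
        proto: [(feats, label, n) for (feats, label), n in c.items()]
        for proto, c in counts.items()
    }
```

Because 55% of flows are single-packet flows, many flows share identical feature vectors, which is what makes this kind of collapsing worthwhile.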

  21. Evaluation Criteria • A detection is a Boolean action (true or false) • For each rule, a trained classifier assigns each tested flow a confidence score → a threshold is required to decide true or false • Precision and recall are used as evaluation criteria: • Precision = TP_k / (TP_k + FP_k) • Recall = TP_k / (TP_k + FN_k) • Average precision (AP): a value closer to 1 is better
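Average precision summarises precision over all thresholds. A minimal sketch of one common definition (the paper's exact variant may differ): rank flows by decreasing classifier score and average TP_k / (TP_k + FP_k) over the ranks k at which a true positive appears. Labels use the {−1, +1} convention from slide 16, and ties are broken by input order, which is a simplifying assumption.

```python
def average_precision(scores, labels):
    """AP over a ranking by decreasing score; labels are +1 (alert) or -1."""
    # sorted() is stable, so equal scores keep their input order.
    ranked = sorted(zip(scores, labels), key=lambda sl: -sl[0])
    tp = fp = 0
    precisions = []
    for _, label in ranked:
        if label == 1:
            tp += 1
            precisions.append(tp / (tp + fp))  # precision at this rank k
        else:
            fp += 1
    return sum(precisions) / len(precisions) if precisions else 0.0
```

For scores [0.9, 0.8, 0.7, 0.6] with labels [+1, −1, +1, −1], the positives sit at ranks 1 and 3, so AP = (1 + 2/3) / 2 ≈ 0.833; a ranking that places every positive first gives AP = 1.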

  22. Evaluation Methodology • Focus on the 21 most frequently triggered rules over weeks 1 & 2 (see the next slide) • Compare the AP (average precision) for: • 1) Baseline behavior: train on one full week and test on the subsequent week, e.g., wk1-2 → train on week 1 and test on week 2 • 2) Data drift: determine how often re-training should be applied (e.g., wk1-3) • 3) Sampling of negative examples: normal flows are the majority; can reducing the number of normal flows keep accuracy while reducing training time?
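Sampling of negative examples can be sketched as keeping every anomalous flow and a random subset of normal flows. The function name, the `keep_fraction` parameter, and the fixed seed are illustrative assumptions; the slides do not state the sampling rate that was used.

```python
import random


def sample_negatives(X, y, keep_fraction, seed=0):
    """Keep all positive (+1) examples and a random fraction of the negatives."""
    rng = random.Random(seed)  # fixed seed for reproducible subsets
    pos = [i for i, label in enumerate(y) if label == 1]
    neg = [i for i, label in enumerate(y) if label != 1]
    kept_neg = rng.sample(neg, round(keep_fraction * len(neg)))
    keep = sorted(pos + kept_neg)  # preserve the original order of examples
    return [X[i] for i in keep], [y[i] for i in keep]
```

Since rare positives are never dropped, the model still sees every alert-triggering flow; only the abundant normal class is thinned.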

  23. [Table: the 21 Snort alerts used in the evaluation (rule indices 1, 3, 4, 9, 10, 15, 20 marked), annotated with ICMP, content?, flag, and size; it shows the complexity of a unique flow. See the appendix for alert details.]

  24. [Figure: AP per rule, grouped into Header, Meta-Info, and Payload rules] • Data drift: • a 2-week drift is acceptable • a 3-week drift → loss of performance, especially for Meta-Info and Payload rules • Payload rules show great variability

  25. [Figure: AP per rule, grouped into Header, Meta-Info, and Payload rules] • Sampling of negative (normal) examples: • a measurable loss in performance • while training is 6× faster

  26. Which features are more important than others? • [Figure: AP when each feature is removed during detection] • Payload rules are hard to reproduce in a flow setting • Some rules have several predicates (that could be learned)

  27. Outline • Motivation & Goal • Packet Rule Classification • Packet Rules → Flow Rules • Dataset & Evaluation Methodology • Experimental Results • Real Deployment Issues • Conclusion & My Comments

  28. Architecture • Other issues: • Can rules learned at one site be used at other sites? • Some flow features (e.g., duration) are link- or network-dependent

  29. Other issues • Computational efficiency: • initial correlation of flows and Snort alarms • AdaBoost parameter setup and learning time • run-time classification

  30. Conclusion

  31. My Comments

  32. Appendix – 21 Snort Rules Used in This Paper • From snort-rules-version

  33. Header (1/2) • 1) alert icmp any any -> any any (msg:"ICMP Destination Unreachable Communication Administratively Prohibited"; icode:13; itype:3; classtype:misc-activity; sid:485; rev:4;) • 2) alert icmp any any -> any any (msg:"ICMP Destination Unreachable Communication with Destination Host is Administratively Prohibited"; icode:10; itype:3; classtype:misc-activity; sid:486; rev:4;)

  34. Header (2/2) • 3) alert icmp $EXTERNAL_NET any -> $HOME_NET any (msg:"ICMP Source Quench"; icode:0; itype:4; classtype:bad-unknown; sid:477; rev:2;)

  35. Meta-Information (1/3) • 4) alert icmp $EXTERNAL_NET any -> $HOME_NET any (msg:"ICMP webtrends scanner"; icode:0; itype:8; content:"|00 00 00 00|EEEEEEEEEEEE"; reference:arachnids,307; classtype:attempted-recon; sid:476; rev:4;) • 5) alert tcp $EXTERNAL_NET any -> $HOME_NET any (msg:"BAD-TRAFFIC data in TCP SYN packet"; flow:stateless; dsize:>6; flags:S,12; reference:url,www.cert.org/incident_notes/IN-99-07.html; classtype:misc-activity; sid:526; rev:11;)

  36. Meta-Information (2/3) • 6) alert icmp $EXTERNAL_NET any -> $HOME_NET any (msg:"ICMP Large ICMP Packet"; dsize:>800; reference:arachnids,246; classtype:bad-unknown; sid:499; rev:4;) • 7) alert icmp $EXTERNAL_NET any -> $HOME_NET any (msg:"ICMP PING NMAP"; dsize:0; itype:8; reference:arachnids,162; classtype:attempted-recon; sid:469; rev:3;)

  37. Meta-Information (3/3) • 8) alert tcp $EXTERNAL_NET any -> $HOME_NET any (msg:"SCAN FIN"; flow:stateless; flags:F,12; reference:arachnids,27; classtype:attempted-recon; sid:621; rev:7;) • 9) 111 || 8 || spp_stream4: FIN Stealth Scan • gid: 111 → Snort preprocessor (spp_stream4, the stream4 preprocessor) • alert id: 8

  38. Payload (1/6) • 10) alert udp $EXTERNAL_NET any -> $HOME_NET 1434 (msg:"MS-SQL version overflow attempt"; flowbits:isnotset,ms_sql_seen_dns; dsize:>100; content:"|04|"; depth:1; reference:bugtraq,5310; reference:cve,2002-0649; reference:nessus,10674; classtype:misc-activity; sid:2050; rev:8;) • 11) alert tcp $AIM_SERVERS any -> $HOME_NET any (msg:"CHAT AIM receive message"; flow:to_client; content:"*|02|"; depth:2; content:"|00 04 00 07|"; depth:4; offset:6; classtype:policy-violation; sid:1633; rev:6;)

  39. Payload (2/6) • 12) 2376 || EXPLOIT ISAKMP first payload certificate request length overflow attempt || bugtraq,9582 || cve,2004-0040 • 13) 483 || ICMP PING CyberKit 2.2 Windows || arachnids,154 • 14) 480 || ICMP PING speedera

  40. Payload (3/6)

  41. Payload (4/6)

  42. Payload (5/6)

  43. Payload (6/6)
