Ranveer Chandra and Dina Katabi

Learning Communication Rules. Srikanth Kandula, Ranveer Chandra and Dina Katabi. Network admins are groping in the dark. Besides focusing on volume, learn the rules underlying the traffic: (active) a user browsing the web or reading/sending mail; (automatic) an SMS scan on a network, an Outlook refresh.


Presentation Transcript


  1. Learning Communication Rules. Srikanth Kandula, Ranveer Chandra and Dina Katabi

  2. Network Admins Are Groping in the Dark
     • Focus on traffic volume
        • TCP = 80%, HTTP = 30%
        • Adapt report categories (e.g., AutoFocus)
        • Much traffic from ports 500-600
     • But what's going on? Does traffic follow the plan? Misconfigurations? Suspicious traffic?
     • Besides focusing on volume, learn the rules underlying the traffic
        • (Active) user browsing the web, reading/sending mail
        • (Automatic) SMS scan on a network, Outlook refresh

  3. [Figure: occurrences of flowX and flowY along time t]
     Rule (http → DNS): whenever flowY happens, flowX is likely to occur.
     If you could learn such rules directly from a trace, you could:
     • Infer the actual behavior of applications
        • AFS root servers direct traffic to volume servers evenly
        • Mail to the incoming MX is forwarded onto group MXes
     • Notice misconfigurations and badness
        • These clients should not be talking on known command-and-control ports
        • This server should not be responding to DHCP requests
        • This mail server should not attempt connections to non-existent MXes

  4. Report all significant rules with no specific knowledge about a trace

  5. Mining for Rules Is Hard
     • How to define significance? When is a group of flows interesting enough to report?
     • Avoid observer bias, but we cannot evaluate everything
        • Focusing on one server misses what you are not looking for
     • Be practical: deal with noise, search quickly
     eXpose:
     • A scoring function for significance
     • Heuristics that bias the search toward a high hit-rate
     • Empirical validation on enterprise traces

  6. Overview: Packet Trace → Activity Matrix → Rules
     • Packet trace to activity matrix
        • Rows are 1 s windows; columns are flows
        • Is the flow active in [time_{i-1}, time_i)? (at least one packet)
     • Association rule mining (X, Y are random variables for columns)
        • Need not worry about interleaving
        • Dependencies are at these time-scales (an RTT, a server response)
        • All window sizes in the [0.25 s, 2 s] range yield similar rules
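
The activity-matrix step can be sketched in Python. This is a minimal illustration, not eXpose's code: the flow key (src IP, src port, dst IP, dst port), the `activity_matrix` function, and aligning the windows to the first packet are all assumptions made for the example.

```python
from typing import Iterable, List, Tuple

# Assumed flow key: (src IP, src port, dst IP, dst port).
Flow = Tuple[str, int, str, int]


def activity_matrix(
    packets: Iterable[Tuple[float, Flow]], window: float = 1.0
) -> Tuple[List[Flow], List[List[int]]]:
    """Rows are consecutive windows of `window` seconds starting at the first
    packet; columns are flows. A cell is 1 if the flow sent at least one
    packet in that window, else 0."""
    packets = list(packets)
    if not packets:
        return [], []
    t0 = min(ts for ts, _ in packets)
    flows = sorted({flow for _, flow in packets})
    column = {flow: j for j, flow in enumerate(flows)}
    n_rows = int((max(ts for ts, _ in packets) - t0) // window) + 1
    matrix = [[0] * len(flows) for _ in range(n_rows)]
    for ts, flow in packets:
        # Window i covers [t0 + i*window, t0 + (i+1)*window).
        matrix[int((ts - t0) // window)][column[flow]] = 1
    return flows, matrix


# Usage: a DNS lookup followed by web traffic, twice.
dns = ("10.0.0.5", 5353, "10.0.0.1", 53)
web = ("10.0.0.5", 40000, "10.0.0.9", 80)
trace = [(0.1, dns), (0.2, web), (0.7, web), (2.3, dns), (2.4, web)]
flows, matrix = activity_matrix(trace)
# flows  -> [dns, web]
# matrix -> [[1, 1], [0, 0], [1, 1]]
```

Multiple packets of one flow inside a window collapse to a single 1, which is why the matrix does not need to track packet interleaving.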

  7. X  Y Which Rules are Significant? • High Joint Probability? • X, Y may occur very often individually (e.g., breeze, sun shining) • High Conditional Probability? • Say Y occurs only when X does, but both are rare (lottery, buy a jet)

  8. Which Rules X → Y Are Significant?
     • High joint probability? High conditional probability?
     • We use mutual information, which combines the two:
        • Measures the fraction of change in Y due to X
        • Score = 0 if Y is independent of X; Score = max if Y is fully dependent on X
        • Trades off dependency and frequency
        • Encodes directionality
     [Figure: example rule X → Y between Kerberos and Reservation flows]
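
The slides do not spell out the score, and the modifications behind the "Modified JMeasure" used later are not described here. As a hedged reference point, the standard J-measure (Smyth and Goodman) for a rule X → Y is J = P(X)·[P(Y|X)·log2(P(Y|X)/P(Y)) + (1 − P(Y|X))·log2((1 − P(Y|X))/(1 − P(Y)))]. It is the X-active term of the mutual information I(X;Y), is 0 when Y is independent of X, grows with both P(X) and the strength of the dependency, and is asymmetric in X and Y. The sketch implements that textbook formula plus a hypothetical `positive_j_measure` that zeroes out negatively correlated pairs; neither is claimed to be eXpose's exact score.

```python
import math


def _term(p: float, q: float) -> float:
    """p * log2(p / q), using the convention 0 * log(0) = 0."""
    return 0.0 if p == 0 else p * math.log2(p / q)


def j_measure(p_x: float, p_y: float, p_xy: float) -> float:
    """Standard J-measure of the rule X -> Y, in bits."""
    if p_x == 0 or p_y in (0.0, 1.0):
        return 0.0  # degenerate: X never active, or Y constant
    p_y_given_x = p_xy / p_x
    return p_x * (_term(p_y_given_x, p_y) + _term(1 - p_y_given_x, 1 - p_y))


def positive_j_measure(p_x: float, p_y: float, p_xy: float) -> float:
    """Hypothetical variant: score only pairs where X makes Y MORE likely."""
    if p_x == 0 or p_xy / p_x <= p_y:
        return 0.0
    return j_measure(p_x, p_y, p_xy)


independent = j_measure(0.9, 0.9, 0.81)    # ~0: Y independent of X
forward = j_measure(0.1, 0.5, 0.1)         # X -> Y, P(Y|X) = 1: 0.1 bits
backward = j_measure(0.5, 0.1, 0.1)        # Y -> X on the same data: smaller
never_together = j_measure(0.5, 0.5, 0.0)  # 0.5 bits despite zero overlap
```

The last line shows why plain mutual-information-style scores need adjusting for networking: flows that never co-occur still score highly, while the gated variant returns 0 for them.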

  9. Modifying Scores for Networking • P(Y|X)  1 leads to high score … … X Y • Negative Correlation • Flows with little overlap

  10. Modifying Scores for Networking
      • Negative correlation
         • Flows with little overlap (P(Y|X) ≪ 1 leads to a high score)
      • Long-running flows
         • Large downloads, ssh/remote desktop
         • Trivial overlaps with a long flow
         • Distinguish new vs. present: "present" rules are reported only if the mismatch in frequency is small
      • Too many possibilities
         • Bias the search: focus on pairs with at least one common IP
         • Misses rules, but the hit-rate goes up 1000x and costs go down 10x
      [Figure: activity of flows X and Y over time, for a short-lived and a long-running flow]
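
The "at least one common IP" bias can be sketched by indexing flows by IP address, so only pairs that share an endpoint address are ever generated and scored. `candidate_pairs` and the flow-tuple layout are hypothetical; this only illustrates the filter, not how eXpose implements it.

```python
from collections import defaultdict
from itertools import combinations
from typing import Dict, Iterable, Set, Tuple

# Assumed flow key: (src IP, src port, dst IP, dst port).
Flow = Tuple[str, int, str, int]


def candidate_pairs(flows: Iterable[Flow]) -> Set[Tuple[Flow, Flow]]:
    """Unordered flow pairs that share at least one IP address. Pairs with
    no address in common are never generated, instead of scoring all
    O(f^2) pairs."""
    by_ip: Dict[str, Set[Flow]] = defaultdict(set)
    for flow in flows:
        src_ip, _, dst_ip, _ = flow
        by_ip[src_ip].add(flow)
        by_ip[dst_ip].add(flow)
    pairs: Set[Tuple[Flow, Flow]] = set()
    for group in by_ip.values():
        # Sorting gives each unordered pair one canonical orientation,
        # so flows sharing both IPs are not counted twice.
        for a, b in combinations(sorted(group), 2):
            pairs.add((a, b))
    return pairs


web = ("10.0.0.5", 40000, "10.0.0.9", 80)
dns = ("10.0.0.5", 5353, "10.0.0.1", 53)
ssh = ("10.0.0.7", 51000, "10.0.0.8", 22)
pairs = candidate_pairs([web, dns, ssh])  # only (dns, web) share an IP
```

Each surviving pair would then be scored in both directions (X → Y and Y → X). A busy server IP still yields many pairs, since every flow touching it is paired with every other.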

  11. Generics: rules that abstract away parts of a flow
      • Database: Client : Server → Server : Database is missed if no single client accesses the server often; the generic * Client : Server → Server : Database (any client) captures it
      • Kerberos: Client : Rsrv. → Client : Kerberos generalizes to * Client : Rsrv. → * Client : Kerberos (any client, but the same one on both sides)
      • To do this automatically:
         • What to abstract? IP addresses at a non-server port
         • Which pairs to consider for a rule? Flows must match on an IP; generics must match on the abstracted IP
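
A generic can be approximated by rewriting each flow with its non-server endpoint replaced by `*`, then OR-ing together the activity-matrix columns that collapse to the same generic. This sketch assumes the set of server ports is known in advance, and its names (`to_generic`, `generic_columns`) are hypothetical. It covers the "any client" case from the Database example; it does not enforce the "same client on both sides" constraint from the Kerberos example, which would require tracking the abstracted IP per rule.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple, Union

# Assumed flow key: (src IP, src port, dst IP, dst port); "*" marks an abstracted field.
Field = Union[str, int]
Flow = Tuple[Field, Field, Field, Field]


def to_generic(flow: Flow, server_ports: Set[int]) -> Flow:
    """Abstract away the endpoint that is NOT on a known server port."""
    src_ip, src_port, dst_ip, dst_port = flow
    if dst_port in server_ports:
        return ("*", "*", dst_ip, dst_port)  # any client -> this server
    if src_port in server_ports:
        return (src_ip, src_port, "*", "*")  # this server -> any client
    return flow  # no known server side: leave un-abstracted


def generic_columns(
    flows: List[Flow], matrix: List[List[int]], server_ports: Set[int]
) -> Tuple[List[Flow], List[List[int]]]:
    """Merge columns whose flows map to the same generic: a generic is
    active in a window if any of its member flows is active."""
    groups: Dict[Flow, List[int]] = defaultdict(list)
    for j, flow in enumerate(flows):
        groups[to_generic(flow, server_ports)].append(j)
    generics = list(groups)
    merged = [[int(any(row[j] for j in groups[g])) for g in generics] for row in matrix]
    return generics, merged


# Three clients each touch a database server (port 1433) in just one window.
flows = [
    ("10.1.0.1", 50001, "10.0.0.2", 1433),
    ("10.1.0.2", 50002, "10.0.0.2", 1433),
    ("10.1.0.3", 50003, "10.0.0.2", 1433),
]
matrix = [[1, 0, 0], [0, 1, 0], [0, 0, 1], [0, 0, 0]]
generics, merged = generic_columns(flows, matrix, {1433})
```

No single client is active in more than one window, but the merged generic column is active in three of four, which is what lets a rule involving it reach a meaningful score.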

  12. Mining for Rules
      • Rule mining costs O(f^2) for pairwise rules versus O(f^(n+1)) for larger rules. Instead:
         • Focus on pairwise rules (simpler is likelier)
         • Group similar rules: Recursive Spectral Partitioning (VKV'00)
         • Eliminate weak rules between strongly connected groups
         • Take the transitive closure to read off clusters
      • Digests 10^5–10^6 flows into 10^2–10^3 rule clusters
      • Techniques extend to arbitrary-sized rules
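
The last two steps (drop weak rules, then read clusters off the transitive closure) can be sketched with union-find. This is a simplification: it swaps the recursive spectral partitioning (VKV'00) for a plain score threshold, and `rule_clusters` and `min_score` are hypothetical names.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def rule_clusters(rules: Iterable[Tuple[str, str, float]], min_score: float) -> List[List[str]]:
    """Drop rules scoring below `min_score`, then return the connected
    components (transitive closure) of the remaining rule graph."""
    parent: Dict[str, str] = {}

    def find(node: str) -> str:
        parent.setdefault(node, node)
        while parent[node] != node:
            parent[node] = parent[parent[node]]  # path halving
            node = parent[node]
        return node

    for x, y, score in rules:
        root_x, root_y = find(x), find(y)  # registers both flows
        if score >= min_score and root_x != root_y:
            parent[root_x] = root_y  # union: X and Y end up in one cluster

    clusters: Dict[str, List[str]] = defaultdict(list)
    for node in list(parent):
        clusters[find(node)].append(node)
    return sorted(sorted(members) for members in clusters.values())


rules = [
    ("dns", "web", 0.30),
    ("web", "proxy", 0.25),
    ("ssh", "backup", 0.20),
    ("web", "ssh", 0.01),  # weak link between two strong groups: dropped
]
clusters = rule_clusters(rules, min_score=0.1)
```

Flows whose only rules are weak still appear, as singleton clusters.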

  13. Recap: eXpose Mines for Rules
      Packet Trace → Activity Matrix → Rules (… flow_i.new → flow_j.present …) → Rule Clusters
      Contributions:
      • Learn all significant rules without prior knowledge
      • Scoring function for rule significance
      • Avoids observer bias, yet stays feasible by focusing on high hit-rate
      • Algorithms to mine and prune

  14. Related Work
      • Semi-Automated Discovery of App. Session Structure (KJPK'06)
      • Sherlock (Diagnosing Performance Problems, BCGKMZ'07)
      • AutoFocus (ESV'03)
      • BLINC (KPF'05)
      • Stepping Stones (ZP'00)
      eXpose:
      • Learn all significant rules without prior knowledge
      • Avoids observer bias, yet stays feasible by focusing on high hit-rate
      • Scoring function for rule significance
      • Algorithms to mine and prune

  15. Results

  16. Evaluation Setup
      [Figure: trace collection points: CSAIL's access link, before CSAIL's servers, inside Microsoft, access link of conference LANs]
      • Traces at access and internal server-facing links
      • Packet headers, connection records (Bro), some anonymized
      • Operational network with 10^3 clients, diverse traffic mix
      • Corroborated on test-bed traffic and vetted by admins
      • Ran eXpose on a 2.4 GHz x86 with 8 GB RAM

  17. Rules Discovered by eXpose
      • Dependencies for major applications
         • email @ microsoft: Client.* – PFS1.X, Client.* – PFS2.X, Client.* – Proxy.80, Client.* – DC.88, Client.* – Mail.X, Client.* – Mail.135

  18. Rules Discovered by eXpose
      • Dependencies for major applications
         • afs @ csail: AFS1.7000 – Root.7002, C.7001 – *.*, C.7001 – AFS2.7000, C.7001 – Root.7003, C.7001 – AFS1.7000

  19. Rules Discovered by eXpose
      • Dependencies for major applications
         • web, e-mail, file-servers, IM, print, video broadcast
         • web @ microsoft: Proxy3.80 – *.*, Proxy2.80 – *.*, Proxy1.80 – *.*, Proxy4.80 – *.*

  20. Rules Discovered by eXpose
      • Dependencies for major applications
         • web, e-mail, file-servers, IM, print, video broadcast
      • Configuration errors & other badness
         • smtp + IDENT @ csail: Client.113 – MailServer.*, Client.* – MailServer.25

  21. Rules Discovered by eXpose
      • Dependencies for major applications
         • web, e-mail, file-servers, IM, print, video broadcast
      • Configuration errors & other badness
         • IDENT, legacy emails, ssh scans, wingate
         • Legacy email ids @ csail: UnivMail.* – Old1.25, UnivMail.* – Old3.25, UnivMail.* – Old2.25

  22. Rules Discovered by eXpose
      • Dependencies for major applications
         • web, e-mail, file-servers, IM, print, video broadcast
      • Configuration errors & other badness
         • IDENT, legacy emails, ssh scans, wingate
      • Rules for stuff we didn't know before
         • Nagios monitors @ csail: Nagios.7001 – AFS2.7000, Nagios.7001 – AFS1.7000, Nagios.* – Mail1.25, Nagios.* – Mail2.25

  23. Rules Discovered by eXpose
      • Dependencies for major applications
         • web, e-mail, file-servers, IM, print, video broadcast
      • Configuration errors & other badness
         • IDENT, legacy emails, ssh scans, wingate
      • Rules for stuff we didn't know before
         • Nagios, LLMNR, iTunes
         • Link-Local Multicast Name Resolution (LLMNR) @ hotspots: H.137 – Wins.137, H.* – Multicast.5355, H.* – DNS.53
      • Black box: little prior knowledge about servers, applications, or users; can evolve

  24. Correctness & Completeness
      • False positives
         • We couldn't explain 13% of the rule clusters in the CSAIL trace
      • False negatives
         • Main CSAIL web server (too many different activities)
         • Dependencies on personal web pages (too little traffic)
         • PlanetLab traffic (punted)
      • Other limitations
         • IPSec, anonymized traffic, cover traffic
      • Extensions
         • Rules repeat over time and across traces
         • Application whitelisting, customized generics

  25. Time to Mine for Rules
      • At CSAIL's access link: high fan-out with many distinct flows
      • Stream mining appears feasible!

  26. eXpose: Packet Trace → Rules for frequently recurring flow sets
      • Learn all significant rules with no specific knowledge
      • Avoids observer bias, yet stays feasible by focusing on high hit-rate
      • Scoring function for rule significance
      • Algorithms to mine and prune
      • Empirical validation on enterprise traces
         • Found configurations & protocols that we didn't know existed
         • Learnt rules for the actual behavior of applications
         • Found config. errors, bot scans, infected machines
      http://research.microsoft.com/~srikanth

  27. Backup

  28. Expanding Search Space (# of flows)…
      [Figure: # of discovered rules vs. rule score (modified JMeasure)]
      … exposes few significant rules!

  29. Expanding Search Space (# of flows)…
      [Figure: time to mine rules (s) and memory footprint (million rules) vs. # top active flows]
      … exposes few rules & costs a lot in time, memory

  30. Varying Size of Time Windows
      [Figure: # of discovered rules vs. rule score (modified JMeasure)]
      All window sizes in [0.25 s, 2 s] produce similar rules!

  31. For all rules X  Y Joint Probability Prob. (Y) Prob. (X)
