
Things That Go Bump in the Net: Implementing and Securing a Scientific Network



  1. May 15th 2013 – Merit Member Conference Matt Lessins – Wayne State University Jason Zurawski – Internet2 Things That Go Bump in the Net: Implementing and Securing a Scientific Network

  2. Outline • Science DMZ Overview • Network Performance Expectations • Campus Security • Wayne State Science DMZ Case Study

  3. Science DMZ Overview • Motivation • Need to move lots of data, consistently • From scientific instrument generating data to storage & processing resources • Physics (telescopes, colliders) • Biology (gene sequencers) • Between different storage/processing within same research entity network • Between different data storage/processing devices scattered around the country and the world • Mirrored data • Distributed analysis of data

  4. Science DMZ Overview “Typical” campus network • Border FW and router between Internet and Campus network • Good at passing loads of smallish flows • Short bursts, minimal payload • Large number of flows, but minimal impact on overall resources • FW is a roadblock in more than one sense

  5. Science DMZ Overview • Firewalls, IDPs, Shapers and the like present “bumps” in the net • These devices can hinder the movement of large data sets (single flows)

  6. Science DMZ Overview • Congestion (e.g. mouse flows) is another obstacle to overcome • Numerous/Short lived flows • Fills up buffers/takes away processing time on path devices

  7. Science DMZ Overview • What is needed is an End-to-End, “bump”-free path • The Science DMZ is an architecture that provides a solution • Special purpose part of the network near the network perimeter • Takes its name from the traditional “DMZ network” – a network that hosts devices outside of the secured perimeter of the organization’s network

  8. Science DMZ Overview Source: http://fasterdata.es.net/science-dmz/science-dmz-architecture/

  9. Science DMZ (In One Slide) • Consists of 3 key components, all required: • “friction free” network path • Highly capable network devices (wire-speed, deep queues) • Virtual circuit (implementation agnostic - e.g. SDN in any flavor) connectivity option • Security policy and enforcement specific to science workflows • Located at or near site perimeter if possible • Dedicated, high-performance data movers • a.k.a.: Data Transfer Node (DTN) • Optimized bulk data transfer tools such as GlobusOnline/GridFTP • Performance measurement/test node • perfSONAR • Details at: http://fasterdata.es.net/science-dmz/ Source: B. Tierney @ ESnet

  10. Science DMZ Overview Science DMZ Objects • Science DMZ Switch/Router • Place in the Science DMZ network where security via Access Control Lists (ACLs) takes place • Attachments to Data Transfer Node, WAN, Border Router, perfSONAR and other Science DMZ connections that extend into the Campus network • Does it support emerging technologies such as OSCARS and OpenFlow (e.g. “Software Defined Networking” – allowing dynamic, out-of-band control over network devices and protocols)? • Does it have enough buffer space to handle “fan-in” issues?

  11. Science DMZ Overview Science DMZ Objects • Data Transfer Node (DTN) • Usually a PC-based Linux server built out of quality components tuned for high-speed data transfer to remote systems • Could have its own local storage or be connected to SAN or some combination • Has high-speed network interfaces • Runs software tools designed for high-speed data transfer like GridFTP and versions of ssh/scp with high-performance patches • Doesn’t run other software that might increase security risks, like browsers, media players, etc… • Treat this as a ‘cache’ – stage data movement through this device
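
As a concrete illustration (not from the original deck), a GridFTP pull onto a DTN might look like the sketch below; the endpoint names are placeholders, and the parallelism/buffer flags are the knobs that matter on long, fat paths:

    # Sketch of a GridFTP transfer to a DTN; endpoints are hypothetical
    globus-url-copy -vb -fast -p 8 -tcp-bs 4194304 \
        gsiftp://dtn.remote-site.example.edu/data/run42.tar \
        file:///staging/run42.tar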

  12. Science DMZ Overview Science DMZ Objects • perfSONAR • Software framework focused on federating performance monitoring infrastructure • E.g. everyone monitors internally, why not share that information with others? • Facilitates sharing of historic measurement data; allows parties to negotiate end to end testing across multiple domains • Catches the cases that fall between the cracks • Network Demarcation • Application/local performance vs. true network problems
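
To make that concrete, a hand-run throughput test against a public Internet2 measurement beacon (the same one used in the Brown example later in this deck) looks roughly like this sketch:

    # Sketch: 20-second BWCTL throughput test, reporting every second
    bwctl -c bwctl.newy.net.internet2.edu -t 20 -i 1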

  13. Science DMZ Overview Science DMZ Objects • WAN • Not the commodity Internet, there are lots of bumps in that net • National Research networks • Internet2 • National Lambda Rail (NLR) • If endpoints are local you might be able to use your Regional Optical Network (RON), for example, Merit • The key is that the WAN component not be a black box • No bumps • Minimal number of hops

  14. Science DMZ Overview

  15. Outline • Science DMZ Overview • Network Performance Expectations • Campus Security • Wayne State Science DMZ Case Study

  16. Reality Check: Expectations • The following shows how long it (should*) take to transfer 1 Terabyte of data across various speed networks**: • 10 Mbps network : • 300 hrs (12.5 days) • 100 Mbps network : • 30 hrs • 1 Gbps network : • 3 hrs • 10 Gbps network : • 20 minutes *Can your network do this? **Assumes running at 100% efficiency, no performance problems
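
The round numbers above work out if you assume roughly 10 bits on the wire for every payload byte (protocol overhead and slack); a quick back-of-envelope check, as a sketch:

    # Rough transfer-time table for 1 TB, assuming ~10 wire bits per payload byte
    for mbps in 10 100 1000 10000; do
        awk -v r="$mbps" 'BEGIN {
            secs = (1e12 * 10) / (r * 1e6)
            printf "%5d Mbps : %7.1f hours\n", r, secs / 3600
        }'
    done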

  17. State of the Campus • Show of hands – is there a firewall on your campus? • Do you know who ‘owns’ it? Maintains it? Is it being maintained? • Have you ever asked for a ‘port’ to be opened? White list a host? Does this involve an email to ‘a guy’ you happen to know? • Has it prevented you from being ‘productive’? • In General … • Yes, they exist. • Someone owns them, and probably knows how to add rules – but the ‘maintenance’ question is harder to answer. • Like a router/switch, they need firmware updates too… • Will it impact you – ‘it depends’. Yes, it will have an effect on your traffic at all times, but will you notice? • Small streams (HTTP, Mail, etc.) – you won’t notice slowdowns, but you will notice blockages • Larger streams (Data movement, Video, Audio) – you will notice slowdowns

  18. State of Campus – Word of Caution… • To be 100% clear – the firewall is a useful tool: • A layer of protection that is based on allowed, and disallowed, behaviors • One-stop location to install instructions (vs. implementing in multiple locations) • Very necessary for things that need ‘assurance’ (e.g. student records, medical data, protecting the HVAC system, IP Phones, and printers from bad people, etc.) • To be 100% clear again, the firewall delivers functionality that can be implemented in different ways: • Filtering ranges can be implemented via ACLs • Port/Host blocking can be done on a host-by-host basis (see the sketch below) • IDS tools can implement near real-time blocking of ongoing attacks that match heuristics
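
As an illustration of the host-by-host option, a minimal Linux filter set might look like the sketch below (the address range and ports are examples, not a recommendation):

    # Sketch of host-based blocking with iptables; addresses/ports are illustrative
    iptables -A INPUT -m conntrack --ctstate ESTABLISHED,RELATED -j ACCEPT
    iptables -A INPUT -p tcp -s 192.0.2.0/24 --dport 22 -j ACCEPT   # SSH from a known range only
    iptables -P INPUT DROP                                          # default deny everything else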

  19. State of the Campus - Clarifications • I am not here to make you throw away the Firewall • The firewall has a role; it’s time to define what that role is, and is not • Policy may need to be altered (pull out the quill pens and parchment) • Minds may need to be changed • I am here to make you think critically about campus security as a system. That requires: • Knowledge of the risks and mitigation strategies • Knowing what the components do, and do not do • Humans to implement and manage certain features – this may be a shock to some (lunch is never free)

  20. State of the Campus – End Game • The end goal is enabling true R&E use of the network • Most research use follows the ‘Elephant’ Pattern. You can’t stop the elephant and inspect its hooves without causing a backup at the door to the circus tent • Regular campus patterns are often ‘mice’: small, fast, harder to track on an individual basis (e.g. we need big traps to catch the mice that are dangerous) • Security and performance can work well together – it requires critical thought (read that as time, people, and perhaps money) • Easy economic observation – impacting your researchers with slower networks makes them less competitive, e.g. they pull in fewer research dollars vs. their peers

  21. When Security and Performance Clash • What does a firewall do? • Streams of packets enter into an ingress port – there is some buffering • Packet headers are examined. Have I seen a packet like this before? • Yes – If I like it, let it through, if I didn’t like it, goodbye. • No - Who sent this packet? Are they allowed to send me packets? What port did it come from, and what port does it want to go to? • Packet makes it through processing and switching fabric to some egress port. Sent on its way to the final destination. • Where are the bottlenecks? • Ingress buffering – can we tune this? Will it support a 10G flow, let alone multiple 10G flows? • Processing speed – being able to verify quickly is good. Verifying slowly will make TCP sad • Switching fabric/egress ports. Not a huge concern, but these can drop packets too • Is the firewall instrumented to know how well it is doing? Could I ask it?
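
For netfilter-based (Linux) firewalls, at least, you can ask it – a sketch of where to look:

    iptables -L -v -n                         # per-rule packet/byte counters
    conntrack -L | head                       # state table: "have I seen this flow before?"
    sysctl net.netfilter.nf_conntrack_max     # state-table capacity, a classic bottleneck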

  22. “Personal” (Software) Firewall Flow Chart

  23. Causes of Jitter • Processing Delay: Time to process a packet • Queuing Delay: Time spent in a device’s ingress/egress queues • Transmission Delay: Time needed to put the packet on the wire • Propagation Delay: Time needed to travel on the wire
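
The first two vary packet by packet (that variation is the jitter); the last two are fixed for a given path. A toy calculation for the fixed parts, assuming light in fiber moves at roughly two-thirds of c:

    # 1500-byte packet, 10 Gbps link, 100 km of fiber (illustrative numbers)
    awk 'BEGIN {
        trans = (1500 * 8) / 10e9    # transmission delay: ~1.2 microseconds
        prop  = 100e3 / 2e8          # propagation delay:  ~500 microseconds
        printf "transmission %.1f us, propagation %.1f us\n", trans * 1e6, prop * 1e6
    }'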

  24. When Security and Performance Clash • Let’s look at two examples that highlight two primary network architecture use cases: • Totally protected campus, with a border firewall • Central networking maintains the device, and protects all in/outbound traffic • Pro: end-of-the-line customers don’t need to worry (as much) about security • Con: end-of-the-line customers *must* be sent through the disruptive device • Unprotected campus, protection is the job of network customers • Central networking gives you a wire and wishes you the best of luck • Pro: nothing in the path to disrupt traffic, unless you put it there • Con: security becomes an exercise that is implemented by all end customers

  25. Brown University Example • Totally protected campus, with a border firewall

  26. Brown University Example • Behind the firewall:

  27. Brown University Example • In front of the firewall:

  28. Brown University Example – TCP Dynamics • Want more proof? Let’s look at a measurement tool running through the firewall. • Measurement tools emulate a well-behaved application • ‘Outbound’, not filtered: • nuttcp -T 10 -i 1 -p 10200 bwctl.newy.net.internet2.edu • 92.3750 MB / 1.00 sec = 774.3069 Mbps 0 retrans • 111.8750 MB / 1.00 sec = 938.2879 Mbps 0 retrans • 111.8750 MB / 1.00 sec = 938.3019 Mbps 0 retrans • 111.7500 MB / 1.00 sec = 938.1606 Mbps 0 retrans • 111.8750 MB / 1.00 sec = 938.3198 Mbps 0 retrans • 111.8750 MB / 1.00 sec = 938.2653 Mbps 0 retrans • 111.8750 MB / 1.00 sec = 938.1931 Mbps 0 retrans • 111.9375 MB / 1.00 sec = 938.4808 Mbps 0 retrans • 111.6875 MB / 1.00 sec = 937.6941 Mbps 0 retrans • 111.8750 MB / 1.00 sec = 938.3610 Mbps 0 retrans • 1107.9867 MB / 10.13 sec = 917.2914 Mbps 13 %TX 11 %RX 0 retrans 8.38 msRTT

  29. Brown University Example – TCP Dynamics • ‘Inbound’, filtered: • nuttcp -r -T 10 -i 1 -p 10200 bwctl.newy.net.internet2.edu • 4.5625 MB / 1.00 sec = 38.1995 Mbps 13 retrans • 4.8750 MB / 1.00 sec = 40.8956 Mbps 4 retrans • 4.8750 MB / 1.00 sec = 40.8954 Mbps 6 retrans • 6.4375 MB / 1.00 sec = 54.0024 Mbps 9 retrans • 5.7500 MB / 1.00 sec = 48.2310 Mbps 8 retrans • 5.8750 MB / 1.00 sec = 49.2880 Mbps 5 retrans • 6.3125 MB / 1.00 sec = 52.9006 Mbps 3 retrans • 5.3125 MB / 1.00 sec = 44.5653 Mbps 7 retrans • 4.3125 MB / 1.00 sec = 36.2108 Mbps 7 retrans • 5.1875 MB / 1.00 sec = 43.5186 Mbps 8 retrans • 53.7519 MB / 10.07 sec = 44.7577 Mbps 0 %TX 1 %RX 70 retrans 8.29 msRTT

  30. Brown University Example – TCP Plot (2nd)

  31. Brown University Example – TCP Plot (2nd)

  32. Brown University Example – TCP Plots

  33. Brown University Example • Series of problems and solutions implemented: • 10G firewall was not even coming close – configuration and an issue with tech support were to blame • After this, internal switching infrastructure was revealed to be dropping packets on large flows (lack of buffering) • Mitigating step of using a 1G network (not protected through the firewall) was found to be insufficient due to demand • Epilogue: • perfSONAR monitoring (Department and Campus) goes a long way in producing ‘proof’ • Network architectural changes to support heavy hitters will be needed • Firewalls are complex; it’s easy to get the configuration ‘wrong’ • And they need a human to watch them – it’s not set-and-forget

  34. The Pennsylvania State University Example • Unprotected campus, protection is the job of network customers

  35. The Pennsylvania State University Example • Initial report from network users: performance poor in both directions • Outbound and inbound (the normal issue is inbound through protection mechanisms) • From the previous diagram – the CoE firewall was tested • Machine outside/inside of firewall. Test to a point 10ms away (Internet2 Washington) • jzurawski@ssstatecollege:~> nuttcp -T 30 -i 1 -p 5679 -P 5678 64.57.16.22 • 5.8125 MB / 1.00 sec = 48.7565 Mbps 0 retrans • 6.1875 MB / 1.00 sec = 51.8886 Mbps 0 retrans • … • 6.1250 MB / 1.00 sec = 51.3957 Mbps 0 retrans • 6.1250 MB / 1.00 sec = 51.3927 Mbps 0 retrans • 184.3515 MB / 30.17 sec = 51.2573 Mbps 0 %TX 1 %RX 0 retrans 9.85 msRTT

  36. The Pennsylvania State University Example • Observation: net.ipv4.tcp_window_scaling did not seem to be working • 64K of buffer is the default. Over a 10ms path, this means we can hope to see only 50Mbps of throughput: • BDP (50 Mbit/sec, 10.0 ms) = 0.06 Mbyte • Implication: something in the path was not respecting the specification in RFC 1323, and was not allowing the TCP window to grow • TCP window of 64 KByte and RTT of 1.0 ms <= 500.00 Mbit/sec. • TCP window of 64 KByte and RTT of 5.0 ms <= 100.00 Mbit/sec. • TCP window of 64 KByte and RTT of 10.0 ms <= 50.00 Mbit/sec. • TCP window of 64 KByte and RTT of 50.0 ms <= 10.00 Mbit/sec. • Reading documentation for the firewall: • TCP flow sequence checking was enabled • What would happen if this was turned off (in both directions)?
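
The ceilings above fall straight out of throughput <= window / RTT. A sketch that reproduces the table and checks whether a host is even offering window scaling (a middlebox can still strip the option in flight):

    # Throughput ceiling of a 64 KByte window over various RTTs
    for rtt_ms in 1 5 10 50; do
        awk -v rtt="$rtt_ms" 'BEGIN {
            printf "RTT %4.1f ms -> %6.1f Mbit/sec\n", rtt, (64 * 1024 * 8) / (rtt / 1000) / 1e6
        }'
    done
    sysctl net.ipv4.tcp_window_scaling   # 1 = the host offers RFC 1323 scaling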

  37. The Pennsylvania State University Example • jzurawski@ssstatecollege:~> nuttcp -T 30 -i 1 -p 5679 -P 5678 64.57.16.22 • 55.6875 MB / 1.00 sec = 467.0481 Mbps 0 retrans • 74.3750 MB / 1.00 sec = 623.5704 Mbps 0 retrans • 87.4375 MB / 1.00 sec = 733.4004 Mbps 0 retrans • … • 91.7500 MB / 1.00 sec = 770.0544 Mbps 0 retrans • 88.6875 MB / 1.00 sec = 743.5676 Mbps 28 retrans • 69.0625 MB / 1.00 sec = 578.9509 Mbps 0 retrans • 2300.8495 MB / 30.17 sec = 639.7338 Mbps 4 %TX 17 %RX 730 retrans 9.88 msRTT

  38. The Pennsylvania State University Example • Impacting real users:

  39. The Pennsylvania State University Example • Series of problems and solutions implemented: • Firewall was not configured properly • Lack of additional paths to implement a true research bypass • Epilogue: • perfSONAR Monitoring (Department and Campus) still goes a long way in producing ‘proof’ • FYI – Penn State has around 50 perfSONAR boxes now for all of their campuses. Tremendous value from a $1,000 machine and free software • No “One Size Fits All” solution will cut it in a dynamic environment

  40. Outline • Science DMZ Overview • Network Performance Expectations • Campus Security • Wayne State Science DMZ Case Study

  41. Science DMZ (?) • A staple of the meeting circuit for several years; Matt gave an excellent overview • What is it really? • “Blueprint”, not a specific design • Approach to network architecture that preserves the ability to securely manage two different worlds • Enterprise – BYOD, IP Phones, Printers, HVAC, things you don’t know enough about to trust, and shouldn’t • Research – Well defined access patterns, Elephant flows, (normally) individuals that can manage their destiny with regard to data protection

  42. Science DMZ – Pro/Con on Generalities • Pro: • Unspecified nature makes the pattern fungible for anyone to implement • Hits the major requirements for major science use cases • A concept that “anyone” should be able to understand on a high level • Con: • Unspecified nature implies you need your own smart person to think critically, and implement it for a specific instantiation • Those that don’t do heavy science (or don’t know they do) may feel “it’s not for us” • A concept easy to treat as a ‘checkbox’ (hint: CC-NIE schools – are you stating ‘we have perfSONAR’ and moving on?)

  43. Where the Rubber Meets the Road • Let’s start with the generic diagram (again):

  44. Where the Rubber Meets the Road • There are 4 areas I am going to hit on, briefly (note the last one is not ‘pictured’): • Network Path • Adoption of “New” Technology • Security • User Outreach

  45. Network Path • Engineers ‘get it’ • No one will dispute that protected and unprotected paths have benefits (and certain dangers). • $: 100G isn’t cheap (10G and 40G are). You don’t have to go 100G; implementing the architecture with existing technology is a perfectly good way forward • You still need a security professional (if you don’t have one already) for the secured and non-secured paths. Learn to love your IDS just as much as your firewall and shaper … • Tuning is important. Small buffers (as seen previously) make data movement sad. This means servers and network devices alike – see the sketch below • Ounce of prevention – you need monitoring, and you certainly need training in how to use the performance tools to debug. You will be debugging (bet me $1 if you honestly think you won’t be…)
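
As a starting point for the server side, fasterdata-style Linux host tuning looks roughly like the sketch below; the values are illustrative and should be sized to your own bandwidth-delay product:

    # Sketch of DTN host tuning; see http://fasterdata.es.net/ for current guidance
    sysctl -w net.core.rmem_max=67108864
    sysctl -w net.core.wmem_max=67108864
    sysctl -w net.ipv4.tcp_rmem="4096 87380 33554432"
    sysctl -w net.ipv4.tcp_wmem="4096 65536 33554432"
    sysctl -w net.ipv4.tcp_congestion_control=htcp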

  46. Adoption of “New” Technology • SDN, perfSONAR, etc. etc. • We will keep making acronyms, don’t worry • What matters in all this? Being able to make your job easier • perfSONAR = insurance policy against risky behavior. • Will tell you if you have done things wrong, and warn you if something breaks. • Crucial for your campus, and costs only the price of a server, and getting an engineer up to speed on how to use it • SDN will be a game changer. Is it ready for production? Hard to say. The ability to afford more control over the network to the end user relies on applications (and end users) catching up. Hint. • There will be more changes in the future; it’s the nature of the game. R&E needs to be about taking certain risky moves away from the norm

  47. Security • I can spend an entire deck on this, but to keep it short: • Component-based security is wrong. It needs to be a system. • E.g. the firewall by itself has limited use, and can be easily broken by a motivated attacker • System: • Cryptography to protect user access and data integrity • IDS to monitor before (and after) events • Host-based security is better for performance, but takes longer to implement. Firewalls are bad for performance but easy to plop down in a network. • Let your router help you – if you know communication patterns (and know those that should be disallowed), why not use filters? See the sketch below • Campus CI Plan. Make one, update it often. Shows funding bodies you know what is going on and have plans to address risks, and foster growth • Economic argument – if you are non-competitive for grants because you cheaped out on security, are you better off in the long run?
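
The “known traffic profile” idea, expressed as Linux filter rules in front of a DTN (documentation addresses and a typical GridFTP data-channel range; a router ACL would say the same thing):

    # Sketch: allow GridFTP data channels from a known collaborator, drop the rest
    iptables -A FORWARD -p tcp -s 203.0.113.0/24 -d 198.51.100.10 --dport 50000:51000 -j ACCEPT
    iptables -A FORWARD -d 198.51.100.10 -j DROP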

  48. Security - Examples • Data Provenance • Some bureaucratic document states that all campus traffic must be a) encrypted and b) passed through a firewall for packet inspection. Why? • a) What data is private, and what isn’t? Student records, sure. Maybe even sensitive grant-related research. Encrypting all data is not necessary if you stop to think about the data. At least make it a user choice. • b) Firewalls work when you can’t be sure of a traffic profile (e.g. they stop everything and give it the business). If you know the traffic profile, use that to your advantage. Data from X sites on ports Y and Z. • Policy is: • Written by those that often do not have practical experience • Outdated almost immediately • Review (or create) the CI Plan regularly.

  49. Security - Examples • User Management • What is better: a centrally managed user system for all resources vs. independently managed accounts on each machine? • Central • Pro: Easier administration when adding/deleting • Con: Single point of failure • Individual • Pro/Con: Breach of one machine doesn’t necessarily imply that accounts on others are compromised (N.B. I think we are all guilty of recycling passwords though…) • Answer depends on your campus, which is another reason why the DMZ is a blueprint, not a packaged solution

  50. Security - Examples • Device Profiles • All the devices are equal (untrusted) • Has the number of phones/tablets eclipsed hard campus resources for any of you yet? • You should absolutely not trust these, or *many* of your hard campus resources • Some are more equal than others (trusted) • Does the Physics group have a dedicated admin who ‘gets it’? They know Linux, and have implemented host-based security, plus split out heavy hitters from normal users? • Give them a fast path (Penn State Model) • If policy needs to be changed, start handing out certificates to groups that complete a training. CYA…
