1 / 46

IP Network Management

IP Network Management. Richard Mortier. Overview. Introduction Abstractions IP network components IP network management protocols Pulling it all together An alternative approach. Overview. Introduction What’s it all about then? Abstractions IP network components

wklinger
Download Presentation

IP Network Management

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. IP Network Management Richard Mortier Digital Communications II, CUCL

  2. Overview • Introduction • Abstractions • IP network components • IP network management protocols • Pulling it all together • An alternative approach Digital Communications II, CUCL

  3. Overview • Introduction • What’s it all about then? • Abstractions • IP network components • IP network management protocols • Pulling it all together • An alternative approach Digital Communications II, CUCL

  4. What is network management? • One point-of-view: a large field full of acronyms • EMS, TMN, NE, CMIP, CMISE, OSS, AN.1, TL1, EML, FCAPS, ITU, ... • (Don’t ask me what all of those mean, I don’t care!) • From question.com: • In 1989, a random of the journalistic persuasion asked hacker Paul Boutin “What do you think will be the biggest problem in computing in the 90s?” Paul’s straight-faced response: “There are only 17,000 three-letter acronyms.” • We will ignore most of them  Digital Communications II, CUCL

  5. What is network management? • Computer networks are considered to have three operating timescales • Data: packet forwarding [μs, ms] • Control: flows/connections [ secs, mins ] • Management: aggregates, networks [ hours, days ] • …so we’re concerned with “the network” rather than particular devices or protocols • Standardization is key! Digital Communications II, CUCL

  6. Overview • Introduction • Abstractions • ISO FCAPS, TMN EMS, ATM • IP network components • IP network management protocols • Pulling it all together • An alternative approach Digital Communications II, CUCL

  7. ISO FCAPS: functional separation • Fault • Recognize, isolate, correct, log faults • Configuration • Collect, store, track configurations • Accounting • Collect statistics, bill users, enforce quotas • Performance • Monitor trends, set thresholds, trigger alarms • Security • Identify, secure, manage risks Digital Communications II, CUCL

  8. TMN EMS: administrative separation • Telecommunications Management Network • Element Management System • “...simple but elegant...” (!) • (my emphasis) • (often the two go together) • NEL: network elements (switches, transmission systems) • EML: element management (devices, links) • NML: network management (capacity, congestion) • SML: service management (SLAs, time-to-market) • BML: business management (RoI, market share, blah) Digital Communications II, CUCL

  9. The B-ISDN reference model • Asynchronous Transfer Mode “cube” • See IAP lectures, maybe  • Plane management… • The whole network • …vs layer management • Specific layers • Topology • Configuration • Fault • Operations • Accounting • Performance management plane control plane user plane higher layers higher layers plane management layer management ATM adaptation layer ATM layer physical layer Digital Communications II, CUCL

  10. Network management • Models of general communication networks • Tend to be quite abstract and exceedingly tedious! • Many practitioners still seem excited about OO programming, WIMP interfaces, etc • …probably because implementation is hard due to so many excessively long and complex standards! • My view: basic “need-to-know” requirements are • What should be happening? [ c ] • What is happening? [ f, p, a ] • What shouldn’t be happening? [ f, s ] • What will be happening? [ p, a ] Digital Communications II, CUCL

  11. Network management • We’ll concentrate on IP networks • Still acronym city: ICMP, SNMP, MIB, RFC  • Sample size: 102 routers, 105 hosts • We’ll concentrate on the network core • Routers, not hosts • We’ll ignore “service management” • DNS, AD, file stores, etc Digital Communications II, CUCL

  12. Overview • Introduction • Abstractions • IP network components • IP, networks, routers • IP network management protocols • Pulling it all together • An alternative approach Digital Communications II, CUCL

  13. IP primer (you probably know all this) • Destination-routed packets – no connections • Time-to-live field: allow removal of looping packets • Routers forward packets based on routeing tables • Tables populated by routeing protocols • Routers and protocols operate independently • …although protocols aim to build consistent state • RFCs ~= standards • Often much looser semantics than e.g. ISO, ITU standards • Compare for example OSPF [RFC2327] and IS-IS [RFC1142, RFC1195], two link-state routeing protocols Digital Communications II, CUCL

  14. So, how do you build an IP network? $1m? $2m? for a new, populated, backbone router! • Buy (lease) routers • Buy (lease) fibre • Connect them all together • Configure routers • Configure end-systems Wayleaves = $$$ Be a landowner! Correctly. For now. Mwuhahaha. Someone else’s can of worms. Digital Communications II, CUCL

  15. Cisco CRS-1 Multi-shelf system Multiple router flavours A sample taxonomy • Core • OC-12 (622Mbps) and up (to OC-768 ~= 40Gbps) • Big, fat, fast, expensive • E.g. Cisco HFR, Juniper T-640 • HFR: 1.2Tbps each, interconnect up to 72 giving 92Tbps, start at $450k • Transit/Peering-facing • OC-3 and up, good GigE density • ACLs, full-on BGP, uRPF, accounting • Customer-facing • FR/ATM/… • Feature set as above, plus fancy queues, etc • Broadband aggregator • High scalability: sessions, ports, reconnections • Feature set as above • Customer-premises (CPE) • 100Mbps, maybe • NAT, DHCP, firewall, wireless, VoIP, … • Low cost, low-end, perhaps just software on a PC Digital Communications II, CUCL

  16. Network design • Whose network? • ISPs, IXs, enterprise, campus • POPs, DCs • Many designs: flat, hierarchical, hybrids, multiple scales • Many constraints • Business • Backwards compatibility. Who to connect. Peering. • Technology • Power – directly (24x7 operation) and indirectly (cooling) • Port density vs. raw bandwidth • Software reliability • Hardware/software capability • Addressing schemes for scalability, summarization • Can’t run feature X with feature Y on vendor C in network size N • Connectivity/resiliency • “All core routers connect to at least 2 other core routers” • “All edge routers connect to at least 2 core routers” Digital Communications II, CUCL

  17. Router configuration • Initialization • Name the router, setup boot options, setup authentication options • Configure interfaces • Loopback, ethernet, fibre, ATM • Subnet/mask, filters, static routes • Shutdown (or not), queueing options, full/half duplex • Configure routeing protocols (OSPF, BGP, IS-IS, …) • Process number, addresses to accept routes from, networks to advertise • Access lists, filters, ... • Numeric id, permit/deny, subnet/mask, protocol, port • Route-maps, matching routes rather than data traffic • Other configuration aspects: traps, syslog, etc • (Oh, and switch configuration is about as painful) Digital Communications II, CUCL

  18. Router configuration fragments hostname FOOBAR ! boot system flash slot0:a-boot-image.bin boot system flash bootflash: logging buffered 100000 debugging logging console informational aaa new-model aaa authentication login default tacacs local aaa authentication login consoleport none aaa authentication ppp default if-needed tacacs aaa authorization network tacacs ! ip tftp source-interface Loopback0 no ip domain-lookup ip name-server 10.34.56.78 ! ip multicast-routing ip dvmrp route-limit 7000 ip cef distributed interface Loopback0 description router-1.network.corp.com ip address 10.65.21.43 255.255.255.255 ! interface FastEthernet0/0/0 description Link to New York ip address 10.65.43.21 255.255.255.128 ip access-group 175 in ip helper-address 10.65.12.34 ip pim sparse-mode ip cgmp ip dvmrp accept-filter 98 neighbor-list 99 full-duplex ! interface FastEthernet4/0/0 no ip address ip access-group 183 in ip pim sparse-mode ip cgmp shutdown full-duplex router ospf 2 log-adjacency-changes passive-interface FastEthernet0/0/0 passive-interface FastEthernet0/1/0 passive-interface FastEthernet1/0/0 passive-interface FastEthernet1/1/0 passive-interface FastEthernet2/0/0 passive-interface FastEthernet2/1/0 passive-interface FastEthernet3/0/0 network 10.65.23.45 0.0.0.255 area 1.0.0.0 network 10.65.34.56 0.0.0.255 area 1.0.0.0 network 10.65.43.0 0.0.0.127 area 1.0.0.0 access-list 24 remark Mcast ACL access-list 24 permit 239.255.255.254 access-list 24 permit 224.0.1.111 access-list 24 permit 239.192.0.0 0.3.255.255 access-list 24 permit 232.192.0.0 0.3.255.255 access-list 24 permit 224.0.0.0 0.0.0.255 access-list 1011 deny 0000.0000.0000 ffff.ffff.ffff ffff.ffff.ffff 0000.0000.0000 0xD1 2 eq 0x42 access-list 1011 permit 0000.0000.0000 ffff.ffff.ffff 0000.0000.0000 ffff.ffff.ffff tftp-server slot1:some-other-image.bin tacacs-server host 10.65.0.2 tacacs-server key xxxxxxxx rmon event 1 trap Trap1 description "CPU Utilization>75%" owner config rmon event 2 trap Trap2 description "CPU Utilization>95%" owner config Digital Communications II, CUCL

  19. Router configuration • Lots of large, fragile text files • 00s/000s routers, 00s/000s lines per config • Errors are hard to find and have non-obvious results • Router configuration also editable on-line • Order matters! • How to keep track of them all? • Naming schemes, directory trees, CVS, ssh upload and atomic commit to router • Perhaps even a proper database • State of the art is pretty basic • Few tools to check consistency, design goals • Generally generate configurations from templates and have human-intensive process to control access to running configs • Topic of current research [Feamster et al] This counts as advanced! Digital Communications II, CUCL

  20. Overview • Introduction • Abstractions • IP network components • IP network management protocols • ICMP, SNMP, NetFlow • Pulling it all together • An alternative approach Digital Communications II, CUCL

  21. ICMP • Internet Control Message Protocol [RFC792] • IP protocol #1 • In-band “control” • Variety of message types • echo/echo reply [ PING (packet internet groper) ] • time exceeded [ TRACEROUTE ] • destination unreachable, redirect • source quench Digital Communications II, CUCL

  22. Ping (Packet INternet Groper) • Test for liveness • …also used to measure (round-trip) latency • Send ICMP echo • Valid IP host [RFC1122, RFC1123] must reply with ICMP echo response • Subnet PING? • Useful but often not available/deprecated • “ACK” implosion could be a problem • RFCs ~= standards Digital Communications II, CUCL

  23. Traceroute • Which route do my packets take to their destination? • Send UDP packets with increasing time-to-live values • Compliant IP host must respond with ICMP “time exceeded” • Triggers each host along path to so respond • Not quite that simple • One router, many IP addresses: which source address? • Router control processor, inbound or outbound interface • Asymmetric routes (return path != outbound path) • Routes change • Do we want full-mesh host-host routes anyway?! • Size of data set, amount of probe traffic • This is topology, what about load on links? Digital Communications II, CUCL

  24. SNMP • Protocol to manage information tables at devices • Provides get, set, trap, notify operations • get, set: read, write values • trap: signal a condition (e.g. threshold exceeded) • notify: reliable trap • Complexity mostly in the MIB design • Some standard tables, but many vendor specific • Non-critical, so often tables populated incorrectly • Many tens of MIBs (thousands of lines) per device • Different versions, different data, different semantics • Yet another configuration tracking problem • Inter-relationships between MIBs Digital Communications II, CUCL

  25. IPFIX • IETF working group • Export of flow based data out of IP network devices • Developing suitable protocol from Cisco NetFlow™ v9 • [RFC3954, RFC3955] • Statistics reporting • Setup template • Send data records matching template • Many variables • Packet/flow counters, rule matches, quite flexible Digital Communications II, CUCL

  26. Overview • Introduction • Abstractions • IP network components • IP network management protocols • Pulling it all together • Network mapping, statistics gathering, control • An alternative approach Digital Communications II, CUCL

  27. An hypothetical NMS • GUI around ICMP (ping, traceroute), SNMP, etc • Recursive host discovery • Broadcast ping, ARP, default gateway: start somewhere • Recursively SNMP query for known hosts/connected networks • Ping known hosts to test liveness • Iterate • Display topology: allow “drill-down” to particular devices • Configure and monitor known devices • Trap, Netflow™, syslog message destinations • Counter thresholds, CPU utilization threshold, fault reporting • Particular faults or fault patterns • Interface statistics and graphs (MRTG) Digital Communications II, CUCL

  28. NOC, NOC. Calling AT&T… Digital Communications II, CUCL

  29. What are they all looking at? http://www.stat.ee.ethz.ch/mrtg/ Digital Communications II, CUCL

  30. An hypothetical NMS • All very straightforward? No, not really • A lot of software engineering: corner cases, traceroute interpretation, NATs, etc • Correctness • MIBs may contain rubbish • Can only view inside your network anyway • Tunnelled, encrypted protocols becoming prevalent • Efficiency • Rate pacing discovery traffic: ping implosion/explosion • SNMP overloading router CPUs • Using NMSs also not straightforward • How to setup “correct” thresholds? • How to decide when something “bad” has happened? • How to present (or even interpret) reams and reams of data? Digital Communications II, CUCL

  31. Overview • Introduction • Abstractions • IP network components • IP network management protocols • Pulling it all together • An alternative approach • From the edges… Digital Communications II, CUCL

  32. Anemone • An endsystem network management platform • Collect flow information from endsystems, and • Combine with topology information from routeing protocols • Endsystems have more information about their traffic network devices • No router support required • A platform to support many applications • Currently concentrating on “managed networks” • E.g. governments, enterprises, etc • High complexity, high value • High degree of endsystem control Digital Communications II, CUCL

  33. Applications • Real-time and historical analysis • Current topology + ingress, egress flows gives global picture of network behaviour • Capacity planning, anomaly detection • Modelling “what if” scenarios • Plug into a simulator back-end • “What happens to the network if we move all our Exchange servers to a single data centre?” • Automatic configuration • Close the loop: enable network to meet dynamic SLAs • Reconfigure network to track e.g. time of day load changes Digital Communications II, CUCL

  34. Challenges • How do you instrument an entire network? Do you need to instrument all endsystems? • What network information should be captured and stored? • How do you access data stores distributed across a large network (ca. 300,000 nodes)? 1% coverage gives 99.999% bytes and flows Flow data augmented with application and user context Use distributed query system Digital Communications II, CUCL

  35. Summary • Introduction • What is network management? • Abstractions • ISO FCAPS, TMN EMS, ATM • IP network components • IP, networks, routers • IP network management protocols • ICMP, SNMP, etc • Pulling it all together • Outline of a network management system • An alternative approach: from the edges Digital Communications II, CUCL

  36. The end • Questions • Answers? • http://www.cisco.com/ • http://www.routergod.com/ • http://www.ietf.org/ • http://ipmon.sprintlabs.com/pyrt/ • http://www.nanog.org/ Digital Communications II, CUCL

  37. Backup slides • Internet routeing • OSPF • BGP Digital Communications II, CUCL

  38. Internet routeing • Q: how to get a packet from node to destination? • A1: advertise all reachable destinations and apply a consistent cost function (distance vector) • A2: learn network topology and compute consistent shortest paths (link state) • Each node (1) discovers and advertises adjacencies; (2) builds link state database; (3) computes shortest paths • A1, A2: Forward to next-hop using longest-prefix-match Digital Communications II, CUCL

  39. OSPF (~link state routeing) • Q: how to route given packet from any node to destination? • A: learn network topology; compute shortest paths • For each node • Discover adjacencies (~immediate neighbours); advertise • Build link state database (~network topology) • Compute shortest paths to all destination prefixes • Forward to next-hop using longest-prefix-match (~most specific route) Digital Communications II, CUCL

  40. BGP (~path vector routeing) • Q: how to route given packet from any node to destination? • A: neighbours tell you destinations they can reach; pick cheapest option • For each node • Receive (destination, cost, next-hop) for all destinations known to neighbour • Longest-prefix-match among next-hops for given destination • Advertise selected (destination, cost+, next-hop') for all known destinations • Selection process is complicated • Routes can be modified/hidden at all three stages • General mechanism for application of policy Digital Communications II, CUCL

  41. Digital Communications II, CUCL

  42. Anemone: where are we now? • Three major components • Flow collection • Route collection • Distributed database • Building prototypes, simulating system Digital Communications II, CUCL

  43. Data collection • Flow collection • Hosts track active flows • Using low overhead event posting infrastructure, ETW • Built prototype device driver provider & user-space consumer • Used packet traces for feasibility study on (client, server) • Peaks at (165, 5667) live and (39, 567) active flows per sec • Route collection • OSPF is link-state: passively collect link state adverts • Extension of my work at Sprint (for IS-IS and BGP); also been done at AT&T (NSDI’04 paper) Digital Communications II, CUCL

  44. The distributed database • Logically contains • Traffic flow matrix (bandwidths), {srcs}×{dsts} • …each entry annotated with current route from src to dst • N.B. src/dst might be e.g. (IP end-point, application) • Large dynamic data set suggests aggregation • Related work • { distributed, continuous query, temporal } databases • Sensor networks • Potential starting points: Astrolabe or SDIMS (SIGCOMM’04) • Where/what/how much to aggregate? • Is data read- or write-dominated? • Which is more dynamic, flow or topology data? • Can the system successfully self-tune? Digital Communications II, CUCL

  45. The distributed database • Construct traffic matrix from flow monitoring • Hosts can supply flows they source and sink • Only need a subset of this data to get complete traffic matrix • Construct topology from route collection • OSPF supplies topology → routes • Wish to be able to answer queries like • “Who are the top-10 traffic generators?” • Easy to aggregate, don’t care about topology • “What is the load on link l ?” • Can aggregate from hosts, but need to know routes • “What happens if we remove links {l…m} ?” • Interaction between traffic matrix, topology, even flow control Digital Communications II, CUCL

  46. The distributed database • Building simulation model • OSPF data gives topology, event list, routes • Simple load model to start with (load ~ # subnets) • Precedence matrix (from SPF) reduces flow-data query set • Can we do as well/better than e.g. NetFlow? • Accuracy/coverage trade-off • How should we distribute the DB? • Just OSPF data? Just flow data? A mixture? • How many levels of aggregation? • How many nodes do queries touch? • What sort of API is suitable? • Example queries for sample applications Digital Communications II, CUCL

More Related