Rethinking Network Control & Management The Case for a New 4D Architecture

Rethinking Network Control & Management: The Case for a New 4D Architecture. David A. Maltz, Carnegie Mellon University/Microsoft Research. Joint work with Albert Greenberg, Gisli Hjalmtysson, Andy Myers, Jennifer Rexford, Geoffrey Xie, Hong Yan, Jibin Zhan, Hui Zhang.


Presentation Transcript


  1. Rethinking Network Control & Management: The Case for a New 4D Architecture. David A. Maltz, Carnegie Mellon University/Microsoft Research. Joint work with Albert Greenberg, Gisli Hjalmtysson, Andy Myers, Jennifer Rexford, Geoffrey Xie, Hong Yan, Jibin Zhan, Hui Zhang

  2. The Role of Network Control and Management • Many different network environments • Access, backbone networks • Data-center networks, enterprise/campus • Sizes: 10-10,000 routers/switches • Many different technologies • Longest-prefix routing (IP), fixed-width routing (Ethernet), label switching (MPLS, ATM), circuit switching (optical, TDM) • Many different policies • Routing, reachability, transit, traffic engineering, robustness The control plane software binds these elements together and defines the network

  3. We Can Change the Control Plane! • Pre-existing industry trend towards separating router hardware from software • IETF: FORCES, GSMP, GMPLS • SoftRouter [Lakshman, HotNets’04] • Incremental deployment path exists • Individual networks can upgrade their control planes and gain benefits • Small enterprise networks have most to gain • No changes to end-systems required

  4. A Clean-slate Design • What are the fundamental causes of network problems? • How to secure the network and protect the infrastructure? • How to provide flexibility in defining management logic? • What functionality needs to be distributed – what can be centralized? • How to reduce/simplify the software in networks? • What would a “RISC” router look like? • How to leverage technology trends? • CPU and link-speed growing faster than # of switches

  5. Three Principles for Network Control & Management Network-level Objectives: • Express goals explicitly • Security policies, QoS, egress point selection • Do not bury goals in box-specific configuration [Diagram labels: reachability matrix, traffic engineering rules, management logic]

  6. Three Principles for Network Control & Management Network-wide Views: • Design network to provide timely, accurate info • Topology, traffic, resource limitations • Give logic the inputs it needs [Diagram labels: reachability matrix, traffic engineering rules, management logic, read state info]

  7. Three Principles for Network Control & Management Direct Control: • Allow logic to directly set forwarding state • FIB entries, packet filters, queuing parameters • The logic computes the desired network state, so let it implement that state [Diagram labels: reachability matrix, traffic engineering rules, management logic, read state info, write state]
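The three principles can be read as one loop: take a network-wide view, compute the desired forwarding state from it, and write that state directly to the routers. The following is a minimal sketch of that loop, not the authors' implementation. The function names `compute_fibs` and `write_state`, the dictionary layout, and the four-router topology are all hypothetical, and breadth-first shortest paths stand in for whatever objective the management logic would really optimize.

```python
from collections import deque

def compute_fibs(topology, destinations):
    """Compute next-hop tables (FIBs) for every router via BFS shortest paths.

    topology: dict router -> set of neighbor routers (the network-wide view,
              assumed symmetric for this sketch).
    destinations: iterable of routers that traffic is addressed to.
    Returns dict router -> {destination: next_hop}.
    """
    fibs = {router: {} for router in topology}
    for dest in destinations:
        # BFS outward from the destination: the router a node was first
        # reached from is that node's next hop toward the destination.
        visited = {dest}
        queue = deque([dest])
        while queue:
            current = queue.popleft()
            for neighbor in sorted(topology[current]):
                if neighbor not in visited:
                    visited.add(neighbor)
                    fibs[neighbor][dest] = current
                    queue.append(neighbor)
    return fibs

def write_state(routers, fibs):
    """Direct control: overwrite each router's forwarding table."""
    for name, fib in fibs.items():
        routers[name]["fib"] = dict(fib)

# Hypothetical view: a square A-B-D-C-A, with D as the only destination.
view = {"A": {"B", "C"}, "B": {"A", "D"}, "C": {"A", "D"}, "D": {"B", "C"}}
routers = {name: {"fib": {}} for name in view}
write_state(routers, compute_fibs(view, ["D"]))
```

No router runs a routing protocol here. Each router only holds the table that the logic wrote to it, which is the point of the direct-control principle.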

  8. Overview of the 4D Architecture Decision Plane: • All management logic implemented on centralized servers making all decisions • Decision Elements use views to compute data plane state that meets objectives, then directly write this state to routers [Diagram labels: network-level objectives, network-wide views, direct control; planes: Decision, Dissemination, Discovery, Data]

  9. Overview of the 4D Architecture Dissemination Plane: • Provides a robust communication channel to each router – and robustness is the only goal! • May run over same links as user data, but logically separate and independently controlled [Diagram labels: network-level objectives, network-wide views, direct control; planes: Decision, Dissemination, Discovery, Data]

  10. Overview of the 4D Architecture Discovery Plane: • Each router discovers its own resources and its local environment • E.g., the identity of its immediate neighbors [Diagram labels: network-level objectives, network-wide views, direct control; planes: Decision, Dissemination, Discovery, Data]
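One way to picture how local discovery becomes a network-wide view is to merge per-router neighbor reports into a single topology. The sketch below assumes a report format (router name mapped to the neighbors it observed) and a merge rule (keep a link only when both ends report each other). Neither comes from the slides, and `build_view` is a hypothetical name.

```python
def build_view(reports):
    """Merge per-router neighbor reports into one undirected topology.

    reports: dict router -> iterable of neighbors that router observed.
    A link is kept only when both endpoints report each other, which is a
    conservative choice made for this sketch.
    """
    view = {router: set() for router in reports}
    for router, neighbors in reports.items():
        for neighbor in neighbors:
            if router in reports.get(neighbor, ()):
                view[router].add(neighbor)
    return view

# R1 claims to see R3, but R3 does not report R1, so that link is dropped.
reports = {"R1": ["R2", "R3"], "R2": ["R1"], "R3": []}
view = build_view(reports)
```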

  11. Overview of the 4D Architecture Data Plane: • Spatially distributed routers/switches • Can deploy with today’s technology • Looking at ways to unify forwarding paradigms across technologies [Diagram labels: network-level objectives, network-wide views, direct control; planes: Decision, Dissemination, Discovery, Data]

  12. Concerns and Challenges • Distributed Systems issues • How will communication between routers and DEs survive failures in the network? • Latency means DE’s view of network is behind reality. Will the control loop be stable? • What is the overhead to/from the DEs? • What happens in a network partition? • Networking issues • Does the 4D simplify control and management? • Can we create logic to meet multiple objectives?

  13. The Feasibility of the 4D Architecture We designed and built a prototype of the 4D Architecture • 4D Architecture permits many designs – prototype is a single, simple design point • Decision plane • Contains logic to simultaneously compute routes and enforce reachability matrix • Multiple Decision Elements per network, using simple election protocol to pick master • Dissemination plane • Uses source routes to direct control messages • Extremely simple, but can route around failed data links
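The slide names two mechanisms without giving details: an election that picks a master Decision Element, and source-routed control messages that can avoid failed links. The sketch below illustrates both under stated assumptions. The lowest-identifier election rule, the link representation, and the names `elect_master` and `source_route` are hypothetical and are not taken from the prototype.

```python
from collections import deque

def elect_master(live_decision_elements):
    """Pick the master DE; the lowest identifier wins (an assumed rule)."""
    return min(live_decision_elements) if live_decision_elements else None

def source_route(links, src, dst, failed=frozenset()):
    """Return a hop list from src to dst that avoids failed links, or None.

    links: set of frozenset({a, b}) undirected links.
    failed: set of frozenset({a, b}) links known to be down.
    """
    adjacency = {}
    for link in links:
        if link in failed:
            continue
        a, b = tuple(link)
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)
    # BFS from the source, remembering each node's predecessor so the
    # full hop list can be placed in the message header.
    parent = {src: None}
    queue = deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            path = []
            while node is not None:
                path.append(node)
                node = parent[node]
            return path[::-1]
        for neighbor in sorted(adjacency.get(node, ())):
            if neighbor not in parent:
                parent[neighbor] = node
                queue.append(neighbor)
    return None

# Hypothetical topology: DE1 reaches R2 through either R1 or R3.
links = {
    frozenset({"DE1", "R1"}), frozenset({"R1", "R2"}),
    frozenset({"DE1", "R3"}), frozenset({"R3", "R2"}),
}
primary = source_route(links, "DE1", "R2")
detour = source_route(links, "DE1", "R2", failed={frozenset({"R1", "R2"})})
master = elect_master({"DE2", "DE1", "DE3"})
successor = elect_master({"DE2", "DE3"})
```

Because the sender chooses the whole path, a failed data link only requires the sender to pick a different hop list. Intermediate routers need no routing state for control traffic.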

  14. Evaluation of the 4D Prototype • Evaluated using Emulab (www.emulab.net) • Linux PCs used as routers (650–800 MHz) • Tested on 9 enterprise network topologies (10-100 routers each) [Figure: example network with 49 switches and 5 DEs]

  15. Performance of the 4D Prototype Trivial prototype has performance comparable to well-tuned production networks • Recovers from single link failure in < 300 ms • < 1 s response considered “excellent” • Faster forwarding reconvergence possible • Survives failure of master Decision Element • New DE takes control within 1 s • No disruption unless second fault occurs • Gracefully handles complete network partitions • Less than 1.5 s of outage

  16. Fundamental Problem: Wrong Abstractions • Management Plane • Figure out what is happening in network • Decide how to change it • Control Plane • Multiple routing processes on each router • Each router with different configuration program • Huge number of control knobs: metrics, ACLs, policy • Data Plane • Distributed routers • Forwarding, filtering, queueing • Based on FIB or labels [Diagram labels: shell scripts, traffic engineering, planning tools, databases, configs, SNMP, netflow, modems; OSPF and BGP processes, link metrics, routing policies; FIBs, packet filters]

  17. Good Abstractions Reduce Complexity • All decision-making logic lifted out of control plane • Eliminates duplicate logic in management plane • Dissemination plane provides robust communication to/from data plane switches [Diagram labels: management plane, configs, control plane, decision plane, dissemination, FIBs, ACLs, data plane]

  18. Today: Simple Things are Hard to Do [Figure labels: D, inter-POP links, access networks]

  19. Fundamental Problem: Configurations Allow Too Many Degrees of Freedom • Computing configuration files that cause control plane to compute desired forwarding states is intractable • NP-hard in many cases • Requires predictive model of control plane behavior • Configuration files form a program that defines a set of forwarding states • Very hard to create program that permits only desired states, and doesn’t transit through bad ones [Diagram labels: forwarding states allowed by configs; auto-adaptation leads to/through bad states; direct control avoids bad states]

  20. Fundamental Problem: Conflation of Issues • Ideal case: all routing information flooded to all routers inside network • Robustness achieved via flooding • Reality: routing information filtered and aggregated extensively • Route filtering used to implement security and resource policies • Route aggregation used to achieve scalability

  21. 4D Separates Distributed Computing Issues from Networking Issues • Distributed computing issues → protocols and network architecture • Overhead • Resiliency • Scalability • Networking issues → management logic • Traffic engineering and service provisioning • Egress point selection • Reachability control (VPNs) • Precomputation of backup paths

  22. Future Work • Scalability • Evaluate over 1-10K switches, 10-100K routes • Networks with backbone-like propagation delays • Structuring decision logic • Arbitrate among multiple, potentially competing objectives • Unify control when some logic takes longer than others • Protocol improvements • Better dissemination and discovery planes • Deployment in today’s networks • Data center, enterprise, campus, backbone (RCP)

  23. Future Work • Experiment with network appliances • Traffic shapers, traffic scrubbers • Expand relationships with security • Using 4D as mechanism for monitoring/quarantine • Formulate models that establish bounds of 4D • Scale, latency, stability, failure models, objectives • Generate evidence to support/refute principles

  24. Questions?

  25. Direct Control Provides Complete Control • Zero device-specific configuration • Supports many models for “pushing” routes • Trivial push – convergence requires time for all updates to be received and applied – same as today • Synchronized update – updates propagated, but not applied until an agreed time in the future – clock skew defines convergence time • Controlled state trajectory – DE serializes updates to avoid all incorrect transient states
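The "controlled state trajectory" model can be illustrated with one well-known ordering: update routers closest to the destination first, so that every router is switched only after its new next hop already forwards along the new path. The slides do not say which serialization the DE uses, so the sketch below is an assumption, and `update_order`, `delivers`, and `safe_trajectory` are hypothetical names. It also assumes both the old and the new state are loop-free and cover the same routers.

```python
def update_order(new_next_hop, dest):
    """Order routers so each is updated only after its new next hop.

    new_next_hop: dict router -> next hop toward dest in the target state
                  (assumed loop-free, otherwise distance() never returns).
    Returns routers sorted by hop count to dest along the new paths.
    """
    def distance(router):
        hops = 0
        while router != dest:
            router = new_next_hop[router]
            hops += 1
        return hops
    return sorted(new_next_hop, key=lambda r: (distance(r), r))

def delivers(next_hop, src, dest):
    """True if following next hops from src reaches dest without a loop."""
    seen = set()
    while src != dest:
        if src in seen or src not in next_hop:
            return False
        seen.add(src)
        src = next_hop[src]
    return True

def safe_trajectory(old, new, dest):
    """Apply updates one router at a time, checking every transient state."""
    state = dict(old)
    for router in update_order(new, dest):
        state[router] = new[router]
        if not all(delivers(state, r, dest) for r in state):
            raise RuntimeError(f"bad transient state after updating {router}")
    return state

# Old paths: A -> B -> C -> D.  New paths: B -> A -> C -> D.
old = {"A": "B", "B": "C", "C": "D"}
new = {"A": "C", "B": "A", "C": "D"}
order = update_order(new, "D")
final = safe_trajectory(old, new, "D")

# For contrast, updating B first creates a transient A <-> B loop.
naive = dict(old)
naive["B"] = new["B"]
naive_ok = delivers(naive, "A", "D")
```

In this example the trivial push could apply the updates in any order, including the one that loops. The serialized order never leaves a router pointing at a neighbor that still uses a conflicting old route.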

  26. Fundamental Problem: Wrong Abstractions
      interface Ethernet0
       ip address 6.2.5.14 255.255.255.128
      interface Serial1/0.5 point-to-point
       ip address 6.2.2.85 255.255.255.252
       ip access-group 143 in
       frame-relay interface-dlci 28
      router ospf 64
       redistribute connected subnets
       redistribute bgp 64780 metric 1 subnets
       network 66.251.75.128 0.0.0.127 area 0
      router bgp 64780
       redistribute ospf 64 match route-map 8aTzlvBrbaW
       neighbor 66.253.160.68 remote-as 12762
       neighbor 66.253.160.68 distribute-list 4 in
      access-list 143 deny 1.1.0.0/16
      access-list 143 permit any
      route-map 8aTzlvBrbaW deny 10
       match ip address 4
      route-map 8aTzlvBrbaW permit 20
       match ip address 7
      ip route 10.2.2.1/16 10.2.1.7

  27. Fundamental Problem: Wrong Abstractions [Figure: size of configuration files in a single enterprise network (881 routers); x-axis: router ID (sorted by file size), 0 to 881; y-axis: lines in config file, 0 to 2000]

  28. Fundamental Problem: Conflating Distributed Systems Issues with Networking Issues • Distributed Systems Concern: resiliency to link failures • Solution: multiple paths through routing process graph [Diagram labels: routing processes, destination D, routes marked "D left"]

  29. Fundamental Problem: Conflating Distributed Systems Issues with Networking Issues • Distributed Systems Concern: resiliency to link failures • Solution: multiple paths through routing process graph [Diagram labels: routing processes, destination D, routes marked "D left" and "D right"]

  30. Fundamental Problem: Conflating Distributed Systems Issues with Networking Issues • Networking Concern: implement resource or security policy • Solution: restrict flow of routing information, filter routes, summarize/aggregate routes [Diagram labels: routing processes, destination D, routes marked "D left", "Filter routes to D"]

  31. 4D Supports Network Evolution & Expansion • Decision logic can be upgraded as needed • No need for update of distributed protocols implemented in software distributed on every switch • Decision Elements can be upgraded as needed • Network expansion requires upgrades only to DEs, not every switch

  32. Reachability Example • Two locations, each with data center & front office • All routers exchange routes over all links [Diagram labels: Chicago (chi), New York (nyc), data center, front office, routers R1 to R5]

  33. Reachability Example [Diagram labels: Chicago (chi), New York (nyc), data center, front office, routers R1 to R5; zone labels chi-DC, chi-FO, nyc-DC, nyc-FO, each listed twice]

  34. Reachability Example [Diagram labels: chi, nyc, data center, front office, routers R1 to R5; zone labels chi-DC, chi-FO, nyc-DC, nyc-FO, each listed twice; packet filters "Drop nyc-FO -> *, Permit *" and "Drop chi-FO -> *, Permit *"]

  35. Reachability Example • A new short-cut link added between data centers • Intended for backup traffic between centers [Diagram labels: chi, nyc, data center, front office, routers R1 to R5; packet filters "Drop nyc-FO -> *, Permit *" and "Drop chi-FO -> *, Permit *"]

  36. Reachability Example • Oops – new link lets packets violate security policy! • Routing changed, but • Packet filters don’t update automatically [Diagram labels: chi, nyc, data center, front office, routers R1 to R5; packet filters "Drop nyc-FO -> *, Permit *" and "Drop chi-FO -> *, Permit *"]
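The failure in the slides above can be reproduced with a small reachability check: packets from a zone may cross a link unless a filter on that link drops their source. The transcript does not preserve where the routers and filters sit, so the four-router topology, the router names, and the filter placement below are assumptions made for illustration. `can_reach` is a hypothetical helper that asks whether any filter-compliant path exists, which is an upper bound on what routing could deliver.

```python
from collections import deque

def can_reach(links, filters, src_zone, src_router, dst_router):
    """True if some path lets packets sourced in src_zone reach dst_router.

    links: set of frozenset({a, b}) undirected router links.
    filters: dict (from_router, to_router) -> set of source zones dropped
             on packets crossing that link in that direction.
    """
    seen = {src_router}
    queue = deque([src_router])
    while queue:
        router = queue.popleft()
        if router == dst_router:
            return True
        for link in links:
            if router not in link:
                continue
            (neighbor,) = link - {router}
            if src_zone in filters.get((router, neighbor), set()):
                continue  # the packet filter drops this source here
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(neighbor)
    return False

# Assumed layout: one front-office and one data-center router per site,
# with the two front-office routers linked to each other.
links = {
    frozenset({"chi-fo-rtr", "chi-dc-rtr"}),
    frozenset({"nyc-fo-rtr", "nyc-dc-rtr"}),
    frozenset({"chi-fo-rtr", "nyc-fo-rtr"}),
}
# Assumed placement: each filter guards the entrance to a data center.
filters = {
    ("chi-fo-rtr", "chi-dc-rtr"): {"nyc-FO"},
    ("nyc-fo-rtr", "nyc-dc-rtr"): {"chi-FO"},
}
before = can_reach(links, filters, "chi-FO", "chi-fo-rtr", "nyc-dc-rtr")

# The short-cut link between data centers bypasses both filters.
with_shortcut = links | {frozenset({"chi-dc-rtr", "nyc-dc-rtr"})}
after = can_reach(with_shortcut, filters, "chi-FO", "chi-fo-rtr", "nyc-dc-rtr")
```

The filters were correct only for the topology they were written against. Nothing ties them to the policy they were meant to enforce, so a routing change silently opens a path.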

  37. Prohibiting Packets from chi-FO to nyc-DC

  38. Reachability Example • Typical response – add more packet filters to plug the holes in security policy [Diagram labels: chi, nyc, data center, front office, routers R1 to R5; packet filters "Drop nyc-FO -> *, Permit *" and "Drop chi-FO -> *, Permit *"]

  39. Reachability Example • Packet filters have surprising consequences • Consider a link failure • chi-FO and nyc-FO still connected [Diagram labels: chi, nyc, data center, front office, routers R1 to R5; packet filters "Drop nyc-FO -> *" and "Drop chi-FO -> *"]

  40. Reachability Example • Network has less survivability than topology suggests • chi-FO and nyc-FO still connected • But packet filter means no data can flow! • Probing the network won’t predict this problem [Diagram labels: chi, nyc, data center, front office, routers R1 to R5; packet filters "Drop nyc-FO -> *" and "Drop chi-FO -> *"]
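The survivability gap can be shown by comparing two answers to the same question: is there a path in the topology, and is there a path that the filters permit? The sketch below assumes a four-router layout, filters added on the data-center short-cut link, and a failure of the front-office link. None of these placements are recoverable from the transcript, and `reachable` is a hypothetical helper.

```python
from collections import deque

def reachable(links, src, dst, blocked=lambda a, b: False):
    """BFS over undirected links, skipping hops where blocked(a, b) is true."""
    seen, queue = {src}, deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            return True
        for link in links:
            if node in link:
                (peer,) = link - {node}
                if peer not in seen and not blocked(node, peer):
                    seen.add(peer)
                    queue.append(peer)
    return False

# Assumed state after the front-office link has failed: the only
# remaining inter-site link is the data-center short-cut.
links = {
    frozenset({"chi-fo-rtr", "chi-dc-rtr"}),
    frozenset({"nyc-fo-rtr", "nyc-dc-rtr"}),
    frozenset({"chi-dc-rtr", "nyc-dc-rtr"}),
}
# Assumed "plug the hole" filters placed on the short-cut link.
drops = {
    ("chi-dc-rtr", "nyc-dc-rtr"): {"chi-FO"},
    ("nyc-dc-rtr", "chi-dc-rtr"): {"nyc-FO"},
}

def drops_chi_fo(a, b):
    """True if the filter on hop a -> b drops packets sourced in chi-FO."""
    return "chi-FO" in drops.get((a, b), set())

topology_says = reachable(links, "chi-fo-rtr", "nyc-fo-rtr")
filters_say = reachable(links, "chi-fo-rtr", "nyc-fo-rtr", blocked=drops_chi_fo)
```

A check that looks only at links reports the front offices as connected, while the filter-aware check shows that no front-office traffic can cross. This is the mismatch the slide describes.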

  41. Allowing Packets from chi-FO to nyc-FO

  42. Multiple Interacting Routing Processes [Diagram labels: client, server, Internet, OSPF, BGP, EBGP, Policy1, Policy2, FIBs]

  43. The Routing Instance Graph of an 881-Router Network

  44. Reconvergence Time Under Single Link Failure

  45. Reconvergence Time When Master DE Crashes

  46. Reconvergence Time When Network Partitions

  47. Reconvergence Time When Network Partitions

  48. Many Implementations Possible • Single redundant decision engine • Multiple decision engines • Hot stand-by • Divide network & load share • Distributed decision engines • Up to one per router • Choice can be based on reliability requirements • Dissemination plane can be in-band, or leverage out-of-band (OOB) links • Less need for distributed solutions (harder to reason about) • More focus on network issues, less on distributed protocols
