
Knowledge Plane -- Scaling of the WHY App



Presentation Transcript


  1. Knowledge Plane -- Scaling of the WHY App • Bob Braden, ISI • 24 Sept 03

  2. Scaling
  • [How] can we make KP services "scalable" (whatever that means)?
    • Network traffic
    • Processing
    • Storage
  • E.g., suppose that every end system uses WHY. This should be a good example of scaling issues in the KP.
  • To diagnose the cause of a failure, we typically need information that is available only in the neighborhood of the failure => a wide-area problem.

  3. IP Path Diagnosis
  • Consider a subset of the WHY problem: diagnosis of an IP data path:
    • Can S send an IP datagram to D, and if not, why not?
  • For the "cause", simply tell which node or link is broken.
  • Thus, ask for the information currently provided by traceroute.

  4. A Simple, Analogous Scaling Problem
  • Let's think about a connectivity-testing tool that runs in the data/control plane (not the KP): IPdiagnose.
    • Operates like traceroute, hop-by-hop along the data path (see the sketch below).
    • Returns a path vector: the list of router hops up to the failure point.
  • We want to make IPdiagnose scalable, in case all the users trigger it.
  • Purpose:
    • Insight into more general KP scaling
    • Insight into DDC's model
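
The slide above describes IPdiagnose as a traceroute-like, hop-by-hop probe that returns the path vector up to the failure point. Here is a minimal sketch of that loop, assuming hypothetical next_hop() and reachable() callables that stand in for routing lookups and liveness probes (neither is specified in the slides):

# Minimal sketch of IPdiagnose: walk hop-by-hop toward D and return the
# path vector up to the first point where forwarding fails.
from typing import Callable, List, Optional

def ipdiagnose(src: str, dst: str,
               next_hop: Callable[[str, str], Optional[str]],
               reachable: Callable[[str, str], bool],
               max_hops: int = 30) -> List[str]:
    path = [src]
    node = src
    for _ in range(max_hops):
        hop = next_hop(node, dst)      # routing lookup at the current node (assumed helper)
        if hop is None:
            return path                # no route: failure is at `node`
        if not reachable(node, hop):
            return path                # link/node toward `hop` is broken
        path.append(hop)
        if hop == dst:
            return path                # reached the destination: path is intact
        node = hop
    return path                        # give up after max_hops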

  5. Possible Approaches to IPdiagnose
  • Using vanilla traceroute: OH ~ w * Ne * l^2
  • Record-&-Return-Route (RRR) message, processed in each router: OH ~ w * Ne * l (a worked comparison follows below)
  • where:
    • l = path length (number of hops)
    • w = diagnostic frequency (WHY requests per sec per end node)
    • Ne = number of end nodes that issue traceroutes
  [Figure: an RRR message accumulates hops along the path S -> 1 -> 2 -> 3 -> X -> D: S>D, S,1>D, S,1,2>D, S,1,2,3>D.]
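
The two overhead formulas above differ by a factor of the path length l: traceroute probes each prefix of the path, while an RRR message traverses it once. A rough worked comparison, with entirely assumed numbers for w, Ne, and l:

# Worked comparison of the two overhead formulas, using illustrative numbers.
w  = 0.01        # diagnostic requests per second per end node (assumed)
Ne = 1_000_000   # end nodes issuing diagnostics (assumed)
l  = 15          # average path length in hops (assumed)

oh_traceroute = w * Ne * l**2   # l probes, each traveling up to l hops
oh_rrr        = w * Ne * l      # one RRR message traverses the path once

print(f"traceroute overhead ~ {oh_traceroute:,.0f} packet-hops/sec")  # ~2,250,000
print(f"RRR overhead        ~ {oh_rrr:,.0f} packet-hops/sec")         # ~150,000
print(f"ratio ~ l = {oh_traceroute / oh_rrr:.0f}")                    # 15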

  6. Make it Scalable
  • Lower the overhead by decreasing the average path length l.
  • Move (prior) results as close to the end points as possible/practicable. This reduces the diagnostic traffic in the center of the network.
  • To achieve this, use (see the sketch after this slide):
    3. "Aggregation"
      • If a matching request for the same D arrives while a previous one is pending, hold it and satisfy it when the reply comes back.
    4. Demand-driven result caching
      • Cache results back along the path from S;
      • Use cached results to satisfy subsequent requests for the same destination that arrive later.
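
A minimal sketch of the "aggregation" idea at a single node, using data structures of my own choosing (the slides do not specify any): requests for the same destination D that arrive while one is already outstanding are queued, and all of them are satisfied by the single reply.

from collections import defaultdict

class AggregatingNode:
    """Coalesces concurrent IPdiagnose requests for the same destination."""
    def __init__(self, forward):
        self.forward = forward            # callable that sends a request downstream
        self.pending = defaultdict(list)  # dst -> list of requesters waiting for a reply

    def on_request(self, requester, dst):
        first = not self.pending[dst]
        self.pending[dst].append(requester)
        if first:
            self.forward(dst)             # only the first request is forwarded

    def on_reply(self, dst, path_vector):
        for requester in self.pending.pop(dst, []):
            requester.deliver(dst, path_vector)   # one reply satisfies all waiters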

  7. Result Caching
  • Search messages from S gather the path vector in the forward direction.
  • Return messages visit each node along the return path to S and leave IPdiagnose result state there (as sketched below).
  [Figure: the search message accumulates S,1>D and then S,1,2>D along S -> 1 -> 2 -> 3 -> X -> D; the return message S,1,2,3>D travels back toward S, leaving cached state ({>D}, {3>D}, {2,3>D}) at the nodes it passes.]
  • State retained in node 1: if a later IPdiagnose S'->D reaches this node, return the path {S', …, 1, 2, 3 > D}.
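
A minimal sketch of the return-path processing, again with my own data structures: each node on the reverse path caches the suffix of the failed path that lies beyond it, which is exactly what it would need to answer a later request for the same destination.

def deposit_cache(dst, path_to_failure, caches):
    """path_to_failure: e.g. ['S', '1', '2', '3'] (failure just beyond '3').
    caches: dict mapping node name -> {dst: cached path suffix beyond that node}."""
    for i, node in reversed(list(enumerate(path_to_failure))):
        if node == path_to_failure[0]:
            break                                   # the source keeps the full answer itself
        suffix = path_to_failure[i + 1:]            # hops beyond this node, up to the break
        caches.setdefault(node, {})[dst] = suffix   # e.g. node '1' caches ['2', '3']

caches = {}
deposit_cache('D', ['S', '1', '2', '3'], caches)
print(caches)   # {'3': {'D': []}, '2': {'D': ['3']}, '1': {'D': ['2', '3']}}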

  8. This is not quite certain…
  • Note: the cached path could be unreliable, but it generally works in the absence of policy routing.
  [Figure: messages S,1>D and S,1,2>D along the path S -> 1 -> 2 -> 3 -> X -> D, plus an alternative node 2' and destination D'.]

  9. More Scalability
  • Suppose we have cached state for the failed path S -> D. Does this help for another path S' -> D' that shares nodes 1, 2, 3, …?
  • Suppose that routing in node 3 supplies an address range Dr that contains address D.
  • Cached results can contain Dr.
  • If D' is contained in Dr, then node 1 can use the cached state {2,3>Dr} to infer the broken path {1,2,3>D'} (see the sketch below).
  [Figure: path S -> 1 -> 2 -> 3 -> X -> D with the range Dr beyond the failure; cached state {3>Dr} and {2,3>Dr} along the path (node 1 holds {2,3>Dr}); a second source S' asking about D' in Dr can be answered from this state.]
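
A minimal sketch of range-based cache matching, using Python's standard ipaddress module; the prefix used as Dr, the cached suffix, and the query addresses are all illustrative assumptions.

import ipaddress

# Cache at node 1: the range Dr supplied by routing -> the cached path suffix beyond node 1.
cache = {
    ipaddress.ip_network("192.0.2.0/24"): ["2", "3"],
}

def lookup(dst):
    """Return the cached suffix whose range Dr contains dst, if any."""
    addr = ipaddress.ip_address(dst)
    for dr, suffix in cache.items():
        if addr in dr:
            return suffix           # infer the broken path {1, 2, 3 > dst}
    return None

print(lookup("192.0.2.17"))    # ['2', '3']  -- D' falls inside Dr
print(lookup("198.51.100.5"))  # None        -- no cached range matches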

  10. Flushing the Cache
  • New requests that match the cache inhibit its timeout.
  • Some percentage of matching requests will be forwarded anyway, as probe requests (see the sketch below).
  • A node will initiate a reverse message towards all relevant senders to adjust/remove cached state if:
    • Routing changes Dr, or
    • A probe request finds next-hop info that differs from the cached path.
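
A minimal sketch of the cache-maintenance rules on this slide; the timeout length and the probe fraction are assumed values that the slides do not specify.

import random
import time

PROBE_FRACTION = 0.05   # assumed: forward ~5% of matching requests as probes
TIMEOUT = 60.0          # assumed: seconds of inactivity before an entry expires

class CacheEntry:
    def __init__(self, dr, suffix):
        self.dr = dr                        # address range Dr this result covers
        self.suffix = suffix                # cached path suffix toward the failure
        self.last_hit = time.monotonic()

    def on_matching_request(self):
        self.last_hit = time.monotonic()    # a matching request inhibits the timeout
        return random.random() < PROBE_FRACTION   # True: forward this one as a probe anyway

    def expired(self):
        return time.monotonic() - self.last_hit > TIMEOUT

    def must_invalidate(self, current_dr, probed_suffix):
        # Trigger the reverse adjust/remove message if routing changed Dr or a
        # probe observed next-hop information that differs from the cached path.
        return current_dr != self.dr or probed_suffix != self.suffix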

  11. Relation to DDC Model in KP
  • Dest address D is the (only) variable of the "tuple" composing the request.
  • Forwarding is not offer-based (unless the next-hop routing calculation is considered an "offer").
  • Does not exactly match DDC's "Aggregation" story (?)
    • When the first request arrives, we don't want to delay it to await a matching request, so cache and forward it. Is this an "aggregation"?
  • DDC's model does not have result caching.
  • In the KP, we must consider the complexity caused by regions:
    • A sparse overlay mesh of TPs.

  12. Other Approaches to IPdiagnose
  4. Flooding (unconstrained diffusion)
  • Every diagnostic event (link-down event) is flooded out to the edges, where it matches requests.
  • I am confused about scalability here. Intuitively this seems unscalable, but I don't see how to justify that.
  • Flooding cost ~ O(#links * #faults) (one message per fault per link)
  • Request cost ~ O(w * Ne) (path length = 1) (a rough numerical comparison follows)
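
A back-of-the-envelope illustration of the two cost terms above, with entirely assumed numbers for the link count, fault rate, and request load; it only shows how the formulas trade off, not which approach actually wins.

# Rough comparison of flooding cost vs. request cost (all inputs assumed).
w      = 0.01       # requests per second per end node (assumed)
Ne     = 1_000_000  # end nodes (assumed)
links  = 3_000_000  # links in the network (assumed)
faults = 0.1        # link-down events per second, network-wide (assumed)

flooding_cost = links * faults   # ~ one flooded message per fault per link
request_cost  = w * Ne           # requests are satisfied at the edge (path length = 1)

print(f"flooding cost ~ {flooding_cost:,.0f} messages/sec")   # 300,000
print(f"request cost  ~ {request_cost:,.0f} messages/sec")    # 10,000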

  13. More Approaches to IPdiagnose
  5. Directed Diffusion
  • Link-state changes are flooded out towards the edges in the directions of significant fluxes of incoming WHY requests (a sketch follows below).
  • In sparse directions, use RRR messages or result caching within the network, as discussed earlier.
  • This is the reverse of Clark's proposal: here the requests are creating a gradient to control the diffusion of satisfactions nearer to the users.
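
One way to read the "gradient" idea, under my own assumptions (the slides give no mechanism): each node tracks the recent flux of WHY requests per interface and forwards a link-state change only toward interfaces whose flux is significant; sparse directions would fall back to RRR or in-network result caching.

FLUX_THRESHOLD = 10.0   # assumed: requests/sec that make a direction "significant"

class DiffusionNode:
    def __init__(self, interfaces):
        self.flux = {ifc: 0.0 for ifc in interfaces}   # smoothed requests/sec per interface

    def on_why_request(self, interface, rate_sample):
        # Exponentially smooth the observed request rate arriving on this interface.
        self.flux[interface] = 0.9 * self.flux[interface] + 0.1 * rate_sample

    def interfaces_to_flood(self):
        # Diffuse a link-state change only toward edges that are asking often.
        return [ifc for ifc, f in self.flux.items() if f >= FLUX_THRESHOLD]

node = DiffusionNode(["if0", "if1", "if2"])
for _ in range(50):
    node.on_why_request("if0", 200.0)       # heavy WHY-request flux arrives on if0
print(node.interfaces_to_flood())           # ['if0']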

  14. (The End)

  15. Demand-Driven Result Caching
  • Creates a depth-first diffusion of IPdiagnose replies, triggered by requests for the same destination that share part of the same path.
  • Note that if the path is not in fact broken, then nothing is cached, and the scaling of IPdiagnose stinks.

  16. DDC's Request Satisfaction Model
  • Route a request hop-by-hop, (roughly) paralleling the data path, to reach a Request Satisfier (RS) near the failure node F.
  • Satisfaction: the IP path vector from S to F.
  • Recursive induction step at node K (assume an RS is in each node):
    • Request "(IPFAIL, D, (S, N1, …, Nn))" arrives at node Nn.
    • Analysis (sketched below):
      • "S cannot send datagrams to D, but packets from S to D reach me.
      • The next-hop node towards D from me is Nn+1.
      • I will test whether I can get to Nn+1 and, if so, pass the request "(IPFAIL, D, (S, N1, …, Nn+1))" along to it.
      • If not, I will return the path vector (S, N1, …, Nn) back to S."
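
A minimal sketch of the per-node induction step described above, with assumed helper callables (next_hop_toward, can_reach) standing in for the node's routing lookup and liveness test; this is an illustration, not DDC's specification.

from typing import Callable, List, Optional, Tuple

Request = Tuple[str, str, List[str]]   # ("IPFAIL", D, path vector (S, N1, ..., Nn))

def rs_step(node: str, request: Request,
            next_hop_toward: Callable[[str, str], Optional[str]],
            can_reach: Callable[[str, str], bool]):
    """Process an IPFAIL request at `node`: either forward it to Nn+1 or satisfy it."""
    kind, dst, path = request
    assert kind == "IPFAIL" and path[-1] == node   # packets from S to D reach me
    nxt = next_hop_toward(node, dst)               # Nn+1, the next hop towards D
    if nxt is not None and can_reach(node, nxt):
        return "forward", (kind, dst, path + [nxt])    # pass the request along to Nn+1
    return "satisfy", path                             # failure is here: return (S, N1, ..., Nn)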

  17. DDC's Model …
  • A more complex version of the model takes into account the region structure of the Internet, e.g., one RS per region.
  • The request arrives at RSn; the induction step of the analysis is:
    • "Packets from S to D reach my (entry) edge node En.
    • I have evidence that packets are flowing from En to my appropriate (exit) edge node E'n.
    • The next-hop RS, in the next AS along the data path towards D, is RSn+1, and the next hop towards D from E'n is En+1.
    • I will test whether I can get to En+1 and, if so, pass this request along to RSn+1; otherwise I will return the path vector (S, N1, …, En, …, E'n) to S."

  18. Result Caching … general case
  • Any source S ∈ ||S|| uses the broken link to reach any D ∈ ||D||.
  • Infer ||D|| from routing; store information about the broken link to ||D|| near every S ∈ ||S||.
  [Figure: the Internet cloud with a broken link X separating the source set ||S|| from the destination set ||D||.]
