Alex Kesselman , MPI

Internet Algorithms: Design and Analysis Alex Kesselman, MPI MiniCourse, Oct. 2004

Algorithms for Networks • Networking provides a rich new context for algorithm design • algorithms are used everywhere in networks • at the end-hosts for packet transmission • in the network: switching, routing, caching, etc. • many new scenarios • and very stringent constraints • high speed of operation • large-sized systems • cost of implementation • require new approaches and techniques

Methods In the networking context • we also need to understand the “performance” of an algorithm: How well does a network or a component that uses a particular algorithm perform, as perceived by the user? • performance analysis is concerned with metrics like delay, throughput, loss rates, etc • metrics of the designer and of the theoretician not necessarily the same

Recent Algorithm Design Methods • Motivated by the desire • for simple implementations • and for robust performance • Several methods of algorithm design can be used in the networking context • randomized algorithms • approximation algorithms • online algorithms • distributed algorithms

In this Mini Course… • We will consider a number of problems in networking • Show various methods for algorithm design and for performance analysis

Network Layer Functions
• transport packets from sending to receiving hosts
• network layer protocols run in every host and router
• important functions:
  • path determination: route taken by packets from source to destination
  • switching: move packets from a router’s input to the appropriate router output
[Figure: protocol stacks. End hosts run the application, transport, network, data link, and physical layers; routers run only the network, data link, and physical layers]

The Internet
[Figure: edge routers surrounding the Internet core]

Internet Routing Algorithms Balaji Prabhakar

A network looks like a graph!

Routing
Routing protocol goal: determine a “good” path (sequence of routers) through the network from source to destination.
Graph abstraction for routing algorithms:
• graph nodes are routers
• graph edges are physical links
• link cost: delay, $ cost, or congestion level
• “good” path: typically means minimum-cost path; other definitions are possible
[Figure: six-node graph with routers A, B, C, D, E, F and link costs 5, 3, 5, 2, 2, 1, 3, 1, 2, 1]

Routing Algorithms Classification
Global or decentralized information?
• Global: all routers have complete topology and link cost info (“link state” algorithms)
• Decentralized: a router knows its physically connected neighbors and the link costs to them; iterative process of info exchange with neighbors (“distance vector” algorithms)
Static or dynamic?
• Static: routes change slowly over time
• Dynamic: routes change more quickly; periodic updates, or updates in response to link cost changes

Link-State Routing Algorithms: OSPF • Each router computes least-cost paths from itself to all other nodes using Dijkstra’s algorithm • each advertisement carries one entry per neighbor router • advertisements are disseminated via flooding

Dijkstra’s algorithm: example
[Figure: six-node graph with routers A, B, C, D, E, F and link costs 5, 3, 5, 2, 2, 1, 3, 1, 2, 1]

Step   N        D(B),p(B)   D(C),p(C)   D(D),p(D)   D(E),p(E)   D(F),p(F)
0      A        2,A         5,A         1,A         infinity    infinity
1      AD       2,A         4,D                     2,D         infinity
2      ADE      2,A         3,E                                 4,E
3      ADEB                 3,E                                 4,E
4      ADEBC                                                    4,E
5      ADEBCF
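The step table can be checked with a short sketch. The edge list is reconstructed from the figure residue: the costs of A-B, A-C, A-D, C-D, C-E, D-E, and E-F follow from the table, while B-C, B-D, and C-F are assumptions. The names `EDGES`, `build_graph`, and `dijkstra` are illustrative only.

```python
import heapq

# Undirected link costs. B-C, B-D and C-F are assumptions; the other
# costs are implied by the D(v),p(v) entries in the step table.
EDGES = [("A", "B", 2), ("A", "C", 5), ("A", "D", 1), ("B", "C", 3),
         ("B", "D", 2), ("C", "D", 3), ("C", "E", 1), ("C", "F", 5),
         ("D", "E", 1), ("E", "F", 2)]

def build_graph(edges):
    """Turn an undirected edge list into an adjacency map node -> {neighbor: cost}."""
    graph = {}
    for u, v, cost in edges:
        graph.setdefault(u, {})[v] = cost
        graph.setdefault(v, {})[u] = cost
    return graph

def dijkstra(graph, source):
    """Return (dist, pred): least cost D(v) and predecessor p(v) for each node v."""
    dist = {node: float("inf") for node in graph}
    pred = {}
    dist[source] = 0
    heap = [(0, source)]
    finished = set()  # the set N in the slide
    while heap:
        d, u = heapq.heappop(heap)
        if u in finished:
            continue  # stale heap entry
        finished.add(u)
        for v, cost in graph[u].items():
            if d + cost < dist[v]:
                dist[v] = d + cost
                pred[v] = u
                heapq.heappush(heap, (dist[v], v))
    return dist, pred

dist, pred = dijkstra(build_graph(EDGES), "A")
print(dist)
print(pred)
```

Nodes tied at the same cost (B and E, both at cost 2) may be finalized in a different order than on the slide, but the final D(v) and p(v) values agree with the last filled entry of each column.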

Route Optimization
Improve user performance and network efficiency by tuning OSPF weights to the prevailing traffic demands.
[Figure: AT&T backbone connected to customers or peers on either side]

Route Optimization • Traffic engineering • Predict the influence of weight changes on traffic flow • Minimize an objective function (say, of link utilization) • Inputs • Network topology: capacitated, directed graph • Routing configuration: routing weight for each link • Traffic matrix: offered load between each pair of nodes • Outputs • Shortest path(s) for each node pair • Volume of traffic on each link in the graph • Value of the objective function

Example
• Links AB and BD are overloaded
• Change the weight of CD to 1 to improve routing (load balancing)!
[Figure: five-node network A, B, C, D, E with link weights 1, 1, 2, 1, 2]
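The inputs and outputs of weight tuning can be sketched as follows. The slide's figure cannot be recovered from the transcript, so the four-node topology, capacities, and demands below are hypothetical, and the sketch routes each demand on a single shortest path (no equal-cost splitting). The helper names `shortest_path`, `max_utilization`, and `symmetric` are illustrative.

```python
import heapq

def shortest_path(weights, src, dst):
    """Dijkstra over directed link weights {(u, v): w}; returns the node path.
    Assumes dst is reachable from src."""
    adj = {}
    for (u, v), w in weights.items():
        adj.setdefault(u, []).append((v, w))
    dist, pred = {src: 0}, {}
    heap = [(0, src)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue  # stale heap entry
        for v, w in adj.get(u, []):
            if d + w < dist.get(v, float("inf")):
                dist[v], pred[v] = d + w, u
                heapq.heappush(heap, (d + w, v))
    path = [dst]
    while path[-1] != src:
        path.append(pred[path[-1]])
    return path[::-1]

def max_utilization(weights, capacity, demands):
    """Route every demand on its shortest path; return (max load/capacity, loads)."""
    load = {link: 0.0 for link in weights}
    for (src, dst), volume in demands.items():
        path = shortest_path(weights, src, dst)
        for link in zip(path, path[1:]):
            load[link] += volume
    return max(load[l] / capacity[l] for l in load), load

def symmetric(links):
    """Give both directions of each link the same weight."""
    return {pair: w for (u, v), w in links.items() for pair in ((u, v), (v, u))}

# Hypothetical network: every link has capacity 10.
weights = symmetric({("A", "B"): 1, ("B", "D"): 1, ("A", "C"): 2, ("C", "D"): 5})
capacity = {link: 10.0 for link in weights}
demands = {("A", "D"): 6.0, ("C", "D"): 6.0}

before, load_before = max_utilization(weights, capacity, demands)
tuned = dict(weights)
tuned[("C", "D")] = tuned[("D", "C")] = 1  # lower the weight of CD to 1
after, load_after = max_utilization(tuned, capacity, demands)
print(before, after)
```

In this hypothetical instance both demands initially share links AB and BD, which are overloaded; lowering the weight of CD moves the C-to-D demand onto the direct link and the maximum utilization drops below 1.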

References • Anja Feldmann, Albert Greenberg, Carsten Lund, Nick Reingold, Jennifer Rexford, and Fred True, "Deriving traffic demands for operational IP networks: Methodology and experience," IEEE/ACM Transactions on Networking, pp. 265-279, June 2001. • Bernard Fortz and Mikkel Thorup, "Internet traffic engineering by optimizing OSPF weights," in Proc. IEEE INFOCOM, pp. 519-528, 2000.

Distance Vector Routing: RIP
Based on the Bellman-Ford algorithm. At node X, the distance to Y is updated by

$$ D_X(Y) = \min_{Z \in N(X)} \left[ c(X, Z) + D_Z(Y) \right] $$

where $D_X(Y)$ denotes the current distance from X to Y, $N(X)$ is the set of neighbors of node X, and $c(X, Z)$ is the cost of the direct link from X to Z.

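Distance Table: Example
[Figure: five-node network A, B, C, D, E with link costs 1, 7, 2, 8, 1, 2; E is adjacent to A, B, and D]
Below is just one step! The algorithm repeats forever!

Distance vectors received from E's neighbors (∞ = unreachable):
from   to A   to B   to C   to D   link cost
A      0      7      ∞      ∞      c(E,A) = 1
B      7      0      1      ∞      c(E,B) = 8
D      ∞      ∞      2      0      c(E,D) = 2

E's distance table (link cost plus the neighbor's distance):
via    to A   to B   to C   to D
A      1      8      ∞      ∞
B      15     8      9      ∞
D      ∞      ∞      4      2

Distance table E sends to its neighbors (cost, next hop):
A: 1 (via A)   B: 8 (via B)   C: 4 (via D)   D: 2 (via D)   E: 0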
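One step of the distance-vector update at node E can be sketched as below. Entries that are blank in the extracted slide are taken to be infinity, and ties are broken in favor of the directly connected destination; both are assumptions, as are the names `dv_update`, `link_cost`, and `neighbor_vectors`.

```python
INF = float("inf")

def dv_update(link_cost, neighbor_vectors):
    """Bellman-Ford step: D_X(Y) = min over neighbors Z of c(X,Z) + D_Z(Y).

    Returns {destination: (cost, next_hop)}.
    """
    destinations = set()
    for vector in neighbor_vectors.values():
        destinations.update(vector)
    table = {}
    for dest in sorted(destinations):
        def cost_via(z, dest=dest):
            return link_cost[z] + neighbor_vectors[z].get(dest, INF)
        # Tie-break (assumption): prefer the neighbor that is the destination itself.
        best = min(link_cost, key=lambda z: (cost_via(z), z != dest))
        table[dest] = (cost_via(best), best)
    return table

# Node E's view: direct link costs and the vectors its neighbors advertised.
link_cost = {"A": 1, "B": 8, "D": 2}
neighbor_vectors = {
    "A": {"A": 0, "B": 7, "C": INF, "D": INF},
    "B": {"A": 7, "B": 0, "C": 1, "D": INF},
    "D": {"A": INF, "B": INF, "C": 2, "D": 0},
}

table_e = dv_update(link_cost, neighbor_vectors)
print(table_e)
```

Destination B is reachable at cost 8 both directly and via A; the tie-break above picks the direct link, which is the next hop shown on the slide.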

Link Failure and Recovery • Distance vectors: exchanged every 30 sec • If no advertisement heard after 180 sec --> neighbor/link declared dead • routes via neighbor invalidated • new advertisements sent to neighbors • neighbors in turn send out new advertisements (if tables changed) • link failure info quickly propagates to entire net

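The bouncing effect
[Figure: three nodes with links A-B (cost 1), B-C (cost 1), and A-C (cost 25)]
Routing tables (dest: cost):
• A: B 1, C 2
• B: A 1, C 1
• C: A 2, B 1

C sends routes to B
[Figure: link A-B has failed; links B-C (cost 1) and A-C (cost 25) remain]
Routing tables (dest: cost):
• A: B 1, C 2
• B: A ~ (unreachable), C 1
• C: A 2, B 1

B updates distance to A
Routing tables (dest: cost):
• A: B 1, C 2
• B: A 3, C 1
• C: A 2, B 1

B sends routes to C
Routing tables (dest: cost):
• A: B 1, C 2
• B: A 3, C 1
• C: A 4, B 1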
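The exchange in these slides can be continued to see where it stops. The sketch below assumes that B and C alternate advertisements, that link A-B stays down, and that C always has the direct A-C link (cost 25) as a fallback; `bounce` and its parameters are hypothetical names.

```python
def bounce(cost_bc=1, cost_ac=25, stale_c=2, max_rounds=1000):
    """Replay the B/C exchange after link A-B fails.

    Returns the successive (B's cost to A, C's cost to A) pairs until
    neither value changes any more.
    """
    b_cost, c_cost = float("inf"), stale_c  # C still holds its old route via B
    history = []
    for _ in range(max_rounds):
        new_b = cost_bc + c_cost               # B hears C's route to A
        new_c = min(cost_ac, cost_bc + new_b)  # C hears B, or uses the direct link
        if (new_b, new_c) == (b_cost, c_cost):
            break                              # converged
        b_cost, c_cost = new_b, new_c
        history.append((b_cost, c_cost))
    return history

history = bounce()
for b_cost, c_cost in history:
    print(f"B -> A: {b_cost:2}   C -> A: {c_cost:2}")
```

The first pair reproduces the slides (B at cost 3, then C at cost 4). Under these assumptions the costs keep climbing by 2 per round until C falls back to the direct link at cost 25 and B settles at 26.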

How are these loops caused? • Observation 1: • B’s metric increases • Observation 2: • C picks B as next hop to A • But, the implicit path from C to A includes itself!

Solutions • Split horizon: a node does not advertise a route back to the neighbor that is its next hop for that route • Poisoned reverse: the node advertises that route back with infinite distance (16 in RIP) • Works for two-node loops • Does not work for loops with more nodes
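A minimal sketch of how a node might build its advertisement under either rule, using C's table from the bouncing example (both of C's routes use B as next hop). The function name `advertisement` and the table layout are assumptions for illustration.

```python
RIP_INFINITY = 16  # RIP treats a metric of 16 as unreachable

def advertisement(routes, to_neighbor, poisoned_reverse=False):
    """Build the distance vector a node sends to one neighbor.

    routes maps destination -> (cost, next_hop).
    """
    vector = {}
    for dest, (cost, next_hop) in routes.items():
        if next_hop == to_neighbor:
            if poisoned_reverse:
                vector[dest] = RIP_INFINITY  # advertise it back as unreachable
            # plain split horizon: leave the route out entirely
        else:
            vector[dest] = cost
    return vector

# C's routing table before the failure: both routes go through B.
c_routes = {"A": (2, "B"), "B": (1, "B")}

split = advertisement(c_routes, "B")
poisoned = advertisement(c_routes, "B", poisoned_reverse=True)
to_a = advertisement(c_routes, "A")
print(split, poisoned, to_a)
```

With either rule B never hears C's stale cost-2 route to A, so the two-node bounce cannot start.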

Example where Split Horizon fails
[Figure: nodes A, B, C, D with links A-B, A-C, B-C, and C-D, each of cost 1]
• When the C-D link breaks, C marks D as unreachable and reports that to A and B
• Suppose A learns it first. A now thinks the best path to D is through B, and reports a route of cost 3 to C
• C thinks D is reachable through A at cost 4 and reports that to B
• B reports a cost of 5 to A, who reports the new cost to C
• etc...

Comparison of LS and DV algorithms
Message complexity
• LS: with n nodes and E links, O(nE) msgs sent
• DV: exchange between neighbors only; larger msgs
Speed of convergence
• LS: requires O(nE) msgs; may have oscillations
• DV: convergence time varies; routing loops; count-to-infinity problem
Robustness: what happens if a router malfunctions?
• LS: a node can advertise an incorrect link cost; each node computes only its own table
• DV: a node can advertise an incorrect path cost; the error propagates through the network

Hierarchical Routing
Our routing study thus far is an idealization:
• all routers identical
• network “flat”
… not true in practice
Scale, with 50 million destinations:
• can’t store all destinations in routing tables!
• routing table exchange would swamp links!
Administrative autonomy:
• internet = network of networks
• each network admin may want to control routing in its own network

Hierarchical Routing
• aggregate routers into regions, “autonomous systems” (AS)
• routers in the same AS run the same routing protocol (“intra-AS” routing protocol)
Gateway routers:
• special routers in an AS
• run the intra-AS routing protocol with all other routers in the AS
• also responsible for routing to destinations outside the AS
• run an inter-AS routing protocol with other gateway routers

Internet AS Hierarchy • Inter-AS: border (exterior gateway) routers • Intra-AS: interior (gateway) routers

Intra-AS and Inter-AS routing
[Figure: three autonomous systems A, B, and C with gateway routers A.a, A.c, B.a, and C.b. Traffic between Host h1 and Host h2 uses intra-AS routing within AS A, inter-AS routing between A and B, and intra-AS routing within AS B]

Peer-to-Peer Networks: Chord Balaji Prabhakar

A peer-to-peer storage problem • 1000 scattered music enthusiasts • Willing to store and serve replicas • How do you find the data?

The Lookup Problem
[Figure: nodes N1 to N6 connected through the Internet. A publisher holds Key=“title”, Value=MP3 data…; a client issues Lookup(“title”). How does the client find the node that holds the data?]

Centralized lookup (Napster)
[Figure: the publisher at N4, holding Key=“title”, Value=MP3 data…, registers SetLoc(“title”, N4) with a central DB; the client sends Lookup(“title”) to the DB]
Simple, but O(N) state and a single point of failure

Flooded queries (Gnutella)
[Figure: the client's Lookup(“title”) is flooded from node to node until it reaches the publisher at N4, which holds Key=“title”, Value=MP3 data…]
Robust, but worst case O(N) messages per lookup

Routed queries (Freenet, Chord, etc.)
[Figure: the client's Lookup(“title”) is forwarded hop by hop toward the publisher, which holds Key=“title”, Value=MP3 data…]

Chord Distinguishing Features • Simplicity • Provable Correctness • Provable Performance

Chord Simplicity • Resolution entails participation by O(log(N)) nodes • Resolution is efficient when each node enjoys accurate information about O(log(N)) other nodes

Chord Algorithms • Basic Lookup • Node Joins • Stabilization • Failures and Replication

Chord Properties • Efficient: O(log(N)) messages per lookup • N is the total number of servers • Scalable: O(log(N)) state per node • Robust: survives massive failures

Chord IDs • Key identifier = SHA-1(key) • Node identifier = SHA-1(IP address) • Both are uniformly distributed • Both exist in the same ID space • How to map key IDs to node IDs?
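Both kinds of identifier can be derived with the standard library's SHA-1. The sketch below is illustrative only: `chord_id` is a hypothetical helper, the IP address and key are made-up inputs, and reducing the 160-bit digest modulo 2^bits to get a small ring is an assumption made for examples, not something the slide specifies.

```python
import hashlib

def chord_id(value: str, bits: int = 160) -> int:
    """Hash a key or an IP address into the same identifier space."""
    digest = hashlib.sha1(value.encode("utf-8")).digest()  # 20 bytes = 160 bits
    # Assumption: shrink the space for small examples by taking the
    # value modulo 2**bits.
    return int.from_bytes(digest, "big") % (2 ** bits)

node_id = chord_id("192.0.2.17", bits=7)  # hypothetical node address
key_id = chord_id("title", bits=7)        # hypothetical key
print(node_id, key_id)
```

Because keys and nodes land in one shared space, the remaining question is which node is responsible for a given key ID.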

Consistent Hashing [Karger 97] • Target: web page caching • Like normal hashing, assigns items to buckets so that each bucket receives roughly the same number of items • Unlike normal hashing, a small change in the bucket set does not induce a total remapping of items to buckets

Consistent Hashing [Karger 97]
• Circular 7-bit ID space
• A key is stored at its successor: the node with the next higher ID
[Figure: ring with nodes N32, N90, N105 and keys K5, K20, K80; “Key 5” is drawn as K5 and “Node 105” as N105]
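The successor rule on the 7-bit ring can be sketched with a sorted list and a binary search. The node and key IDs are the ones in the figure; `successor` is an illustrative name, and a key whose ID equals a node ID is assumed to be stored at that node.

```python
from bisect import bisect_left

def successor(node_ids, key_id, bits=7):
    """Return the first node ID at or after key_id, wrapping around the ring."""
    ring = sorted(node_ids)
    index = bisect_left(ring, key_id % (2 ** bits))
    return ring[index % len(ring)]  # index == len(ring) wraps to the lowest ID

nodes = [32, 90, 105]
placement = {key: successor(nodes, key) for key in (5, 20, 80)}
print(placement)

# Removing a node only moves the keys that node was holding.
after_leave = {key: successor([32, 105], key) for key in (5, 20, 80)}
print(after_leave)
```

When N90 leaves, only K80 moves (to N105); K5 and K20 stay at N32, which is the limited remapping that distinguishes consistent hashing from normal hashing.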

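Basic lookup
[Figure: ring with nodes N10, N32, N60, N90, N105, N120. The query “Where is key 80?” is passed from successor to successor until the answer “N90 has K80” comes back]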

Simple lookup algorithm
Lookup(my-id, key-id)
    n = my successor
    if my-id < n < key-id               // n lies strictly between me and the key on the ring
        call Lookup(key-id) on node n   // next hop
    else
        return my successor             // done
• Correctness depends only on successors
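A runnable sketch of the successor-only lookup, simulated on one machine with the ring from the basic-lookup figure. The comparison `my-id < n < key-id` is interpreted modulo the ring size; `between` and `simple_lookup` are illustrative names, and the early return for a key equal to the starting node's own ID is an added assumption.

```python
SIZE = 128  # 7-bit identifier space

def between(x, a, b):
    """True if x lies strictly inside the clockwise ring interval (a, b)."""
    return 0 < (x - a) % SIZE < (b - a) % SIZE

def simple_lookup(node_ids, start, key_id):
    """Walk successor pointers from start; return (owner of key_id, nodes visited)."""
    ring = sorted(node_ids)
    key_id %= SIZE
    node, visited = start, [start]
    if key_id == node:
        return node, visited  # assumption: a node stores the key equal to its own ID
    while True:
        succ = ring[(ring.index(node) + 1) % len(ring)]
        if between(succ, node, key_id):
            node = succ            # next hop: the key lies beyond our successor
            visited.append(node)
        else:
            return succ, visited   # done: our successor stores the key

nodes = [10, 32, 60, 90, 105, 120]
owner, visited = simple_lookup(nodes, start=10, key_id=80)
print(owner, visited)
```

Starting at N10, the query visits N32 and N60 before N60 reports that N90 has K80, so the number of hops grows linearly with the number of nodes.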

“Finger table” allows log(N)-time lookups
[Figure: node N80's fingers span ½, ¼, 1/8, 1/16, 1/32, 1/64, and 1/128 of the ring]

Finger i points to successor of n + 2^i
[Figure: for node N80, the finger for 80 + 2^5 = 112 points to N120; the fingers span ½, ¼, 1/8, 1/16, 1/32, 1/64, and 1/128 of the ring]

Lookup with fingers
Lookup(my-id, key-id)
    look in local finger table for
        highest node n s.t. my-id < n < key-id
    if n exists
        call Lookup(key-id) on node n   // next hop
    else
        return my successor             // done
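The finger-based lookup can be sketched in the same single-machine style. Finger i of node n is taken to be the successor of n + 2^i for i = 0..6 on the 7-bit ring, and the interval test is again modulo the ring size; the function names and the early return for a key equal to the starting node's ID are assumptions for illustration.

```python
from bisect import bisect_left

SIZE = 128  # 7-bit identifier space

def successor(ring, ident):
    """First node ID at or after ident on the sorted ring, with wrap-around."""
    return ring[bisect_left(ring, ident % SIZE) % len(ring)]

def between(x, a, b):
    """True if x lies strictly inside the clockwise ring interval (a, b)."""
    return 0 < (x - a) % SIZE < (b - a) % SIZE

def fingers(ring, n):
    """Finger i points to the successor of n + 2**i."""
    return [successor(ring, n + 2 ** i) for i in range(7)]

def finger_lookup(node_ids, start, key_id):
    """Return (owner of key_id, nodes visited), jumping via finger tables."""
    ring = sorted(node_ids)
    key_id %= SIZE
    node, visited = start, [start]
    if key_id == node:
        return node, visited  # assumption: a node stores the key equal to its own ID
    while True:
        # Highest finger that still precedes the key, scanning far to near.
        hop = next((f for f in reversed(fingers(ring, node))
                    if between(f, node, key_id)), None)
        if hop is None:
            return successor(ring, node + 1), visited  # done: my successor
        node = hop
        visited.append(node)

nodes = [10, 32, 60, 90, 105, 120]
owner, visited = finger_lookup(nodes, start=10, key_id=80)
print(owner, visited)
```

On this ring the query from N10 jumps straight to N60, which then answers with N90, one forwarding step fewer than the successor-only walk. Adding a node N80 to the ring gives it the finger for 112 pointing at N120, as in the finger-table figure.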
