




Resilient Overlay Networks (RON)

Work by: Andersen, Balakrishnan, Kaashoek, and Morris

Appeared In: SOSP Oct 2001

Presented By: Matt Trower and Mark Overholt

Some Images Are Taken From Original Presentation



Background

  • Overlay network: Network built on top of a network

    • Examples: the Internet (built on the telephone network), Gnutella (built on the Internet), RON (built on the Internet + Internet2)

  • AUP: Acceptable Use Policy (e.g., the educational Internet2 can't carry commercial traffic)



Motivation

  • BGP is slow to converge (~10 minutes)

    • TCP connections typically time out within 512 seconds, i.e., before BGP has reconverged

  • End-to-End paths unavailable 3.3% of the time

  • 5% of outages last more than 2 hours

  • Failures caused by configuration errors, cut cables (e.g., by ships), and DoS attacks

  • What if we need more than 4 9’s of reliability?



Sidenote

Andersen started an ISP in Utah before going to MIT, moving from industry to research.



Goals

  • Detect failures and recover in less than 20s

  • Integrate routing decisions with application needs

  • Expressive Policy Routing

    • Per-user rate controls OR packet-based AUPs



Big Idea

Use the inherent diversity of Internet paths to provide link redundancy

Run failure detectors over a small subset of nodes in the network

Select the best path from multiple options



Triangle Inequality

Best Path ≠ Direct Path: Internet latencies often violate the triangle inequality, so routing through an intermediate overlay node can beat the direct route.


Design

(Figure: RON node architecture: two nodes, each containing a Conduit, a Forwarder, and a Router with a Prober.)

  • C++ application-level library (send/recv interface)

  • Performance database

  • Application-specific routing tables

  • Policy routing module



Failure Detection

  • Each node sends probes over its N-1 virtual links: O(N²) probes system-wide

    • Probe interval: 12 seconds

    • Probe timeout: 3 seconds

    • Routing update interval: 14 seconds

  • Send 3 fast retries on failure
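The probing parameters above can be sketched as a small detection routine; `send_probe` is a hypothetical callable standing in for RON's actual probe I/O:

```python
PROBE_INTERVAL = 12   # seconds between routine probes (from the slide)
PROBE_TIMEOUT = 3     # seconds before a probe counts as lost
FAST_RETRIES = 3      # immediate retries before declaring the link down

def probe_link(send_probe, peer):
    """Sketch of the detection logic: send_probe(peer, timeout) is assumed
    to return True iff the peer answered within the timeout."""
    if send_probe(peer, PROBE_TIMEOUT):
        return "alive"
    # On loss, retry back-to-back rather than waiting another 12 s,
    # keeping worst-case detection well under RON's 20 s goal.
    for _ in range(FAST_RETRIES):
        if send_probe(peer, PROBE_TIMEOUT):
            return "alive"
    return "dead"
```

With a 3 s timeout and 3 fast retries, a dead link is declared in roughly 12 s at worst, which meets the sub-20 s recovery goal.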



Overhead

Acceptable for small cliques
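The O(N²) probe traffic behind "acceptable for small cliques" is easy to quantify; the per-probe packet size here is a hypothetical figure, not from the slides:

```python
def probe_overhead(n, probe_bytes=69, interval=12):
    """Aggregate probe traffic for a full mesh of n nodes, each probing
    its n-1 peers every `interval` seconds (probe_bytes is assumed)."""
    probes_per_sec = n * (n - 1) / interval
    return probes_per_sec * probe_bytes   # bytes/sec across the whole mesh

# Overhead grows quadratically: fine for tens of nodes, not thousands.
print(probe_overhead(16), probe_overhead(160))
```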



Experimental Setup

  • Two Configurations

    • RON1: 12 hosts in the US and Europe, a mix of academic and industry sites

      • 64 hours collected in March 2001

    • RON2: the original 12 hosts plus 4 new hosts

      • 85 hours collected in May 2001



Analysis – Packet Loss

(Figure: packet-loss results, RON1 dataset, UDP.)



Analysis - Latency

(Figure: latency results, RON1 dataset, UDP.)



Analysis - Throughput

(Figure: throughput results, RON1 dataset, TCP.)



Resiliency to DoS Attacks

Done on a 3-node system on Utah's network emulation testbed.

ACKs take a different path than the data.



Pros & Cons

  • Pros

    • Recovers from complete outages and severe congestion

    • Doubles throughput in 5% of samples

    • Single-hop redirection sufficient

  • Cons

    • Valid paths not always considered

      • Cisco -> MIT -> NC-Cable -> CMU

    • O(N²) probing growth limits scalability

    • ACKs must be routed through the overlay too



Discussion

How does geographic distribution affect RON?

Why doesn’t BGP do this already?

What if everyone was part of a RON?

What if all latencies fall below 20ms?



A Scalable Content-Addressable Network

Work by: Ratnasamy, Francis, Handley, and Karp

Appeared In: SIGCOMM ‘01



Distributed Hash Table

A decentralized system that provides a key-value lookup service across a set of distributed nodes.

DHTs support Insert, Lookup, and Deletion of Data




Content Addressable Network

CAN is a design for a distributed hash table.

CAN was one of the original four DHT proposals, introduced concurrently with Chord, Pastry, and Tapestry.



Motivation

P2P applications were at the forefront of CAN's design.

P2P apps scaled well in their file transfers, but not in their indexing of content.

CAN was originally designed as a scalable index for P2P content.

Its use was not limited to P2P apps, however: CAN could also serve large-scale storage systems, content-distribution systems, or DNS.



CAN: Basic Design

The Overlay Network is a Cartesian Coordinate Space on a d-torus (The coordinate system wraps around).

Each node of the CAN is assigned a “Zone” of the d-dimensional space to manage.

Each Node only has knowledge of nodes in Neighboring Zones.

Assume, for now, that each zone has only one node.



Example



Example

Neighbor Lists:

Zone 1 knows about Zones 4, 5, and 3

Zone 2 knows about Zones 4, 5, and 3

Zone 3 knows about Zones 5, 1, and 2

Zone 4 knows about Zones 2 and 1

Zone 5 knows about Zones 1, 2, and 3



Inserting Data in a CAN

Given a (key, value) pair, hash the key d different ways, where d is the number of dimensions.

The resulting coordinate is mapped onto the overlay.

The node responsible for that coordinate stores the (key, value) pair.



Example

  • Given (Key,Value):

    • HashX(Key) = Xcoord

    • HashY(Key) = Ycoord
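The HashX/HashY idea generalizes to d dimensions. A sketch using one salted hash per dimension (the salting scheme and the SHA-1 choice are assumptions; the design only requires d independent hashes, and "song.mp3" is just an example key):

```python
import hashlib

def key_to_point(key, d=2, side=1.0):
    """Map a key to a point in the d-dimensional CAN coordinate space by
    hashing it d different ways (here: one salted SHA-1 per dimension)."""
    point = []
    for dim in range(d):
        digest = hashlib.sha1(f"{dim}:{key}".encode()).digest()
        # First 8 bytes of the digest as an integer, scaled into [0, side).
        value = int.from_bytes(digest[:8], "big") / 2 ** 64
        point.append(value * side)
    return tuple(point)

# The (key, value) pair is stored by whichever node owns the zone
# containing this point.
x, y = key_to_point("song.mp3")
```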



Routing in a CAN

A routing message hops from node to node, getting closer and closer to the destination.

A node only knows about its immediate neighbors.

The average routing path length is (d/4)(n^(1/d)) hops.

As d approaches log(n), the total path length goes to O(log n).
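Greedy forwarding on the wrap-around coordinate space can be sketched as follows; the neighbor-table layout (node id to zone-center coordinates) is assumed for illustration:

```python
def torus_dist(p, q, side=1.0):
    """Euclidean distance on the d-torus: each axis wraps around."""
    total = 0.0
    for a, b in zip(p, q):
        delta = abs(a - b)
        total += min(delta, side - delta) ** 2
    return total ** 0.5

def greedy_next_hop(me, neighbors, dest):
    """Forward toward the neighbor whose zone center is closest to dest.
    `neighbors` maps a node id to its zone-center coordinates."""
    best = min(neighbors, key=lambda n: torus_dist(neighbors[n], dest))
    # Hand off only if that neighbor is strictly closer than we are;
    # otherwise the destination point lies in our own zone.
    if torus_dist(neighbors[best], dest) < torus_dist(me, dest):
        return best
    return None
```

Each hop strictly reduces the torus distance to the destination, which is why the expected path length stays around (d/4)(n^(1/d)) hops.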



Adding Nodes in a CAN

A new node, N, asks a bootstrap node for the IP address of some node already in the system, S.

N picks a random point, P, in the coordinate space; P lies in the zone managed by node D.

Using CAN routing, the join request is routed from S to D.

Node D splits its zone and gives half to N to manage.

The neighbor lists of all nodes neighboring D and N are updated, including D's and N's.
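The zone split in the join can be sketched as a simple halving. Real CAN splits along dimensions in a fixed cyclic order; splitting the longest side, as here, is a simplification for illustration:

```python
def split_zone(zone):
    """Halve a zone (a list of (lo, hi) intervals, one per dimension)
    along its longest side."""
    dim = max(range(len(zone)), key=lambda i: zone[i][1] - zone[i][0])
    lo, hi = zone[dim]
    mid = (lo + hi) / 2
    keep, give = list(zone), list(zone)
    keep[dim] = (lo, mid)   # the old node D keeps this half
    give[dim] = (mid, hi)   # the joining node N manages this half
    return keep, give
```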



Example

The zone is split between the new node, N, and the old node, D.

Node N routes to the zone containing point P.



Node Removal

Routing must be repaired when a node leaves or dies.

If a neighboring zone can merge with the vacated zone and still form a valid (rectilinear) zone, it does so.

If not, the neighbor with the smallest zone attempts to take over the zone of the dead node.

Each neighbor independently starts a takeover timer proportional to its zone volume, then sends a "Takeover" message to its neighbors when the timer fires.

If a node receives a Takeover message, it cancels its own timer if the sender's zone is smaller, or sends a Takeover message of its own if its zone is bigger.
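The takeover rule can be written down directly; the volume-proportional timer and the scaling constant are illustrative:

```python
def takeover_delay(zone_volume, base=1.0):
    """Each neighbor of a failed node starts a takeover timer proportional
    to its own zone volume, so the node with the smallest zone fires first
    (base is an illustrative scaling constant)."""
    return base * zone_volume

def on_takeover(my_volume, sender_volume):
    """The slide's rule for handling a received TAKEOVER message."""
    if sender_volume < my_volume:
        return "cancel"          # the sender's zone is smaller: let it win
    return "send_takeover"       # our zone is smaller: assert our own claim
```

Because smaller zones fire first and larger zones yield, the protocol converges on the smallest-zone neighbor taking over, which keeps zones balanced.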



Proposed Improvements

Multi-Dimensioned Coordinate Spaces

Multiple Realities: Multiple Overlapping Coordinate Spaces

Better Routing Metrics

Multiple nodes per Zone

Multiple Hash Functions (replication)

Geographically sensitive overlay



Experimental Data



Comparisons

Comparing Basic CAN to “Knobs on Full” CAN



Discussion - Pros

Using some of the improvements makes CAN a very robust routing and storage protocol.

Using geographic location in the overlay construction would allow smarter hops between nearby nodes. (But what about a geographically concentrated disaster?)



Discussion - Cons

Not much work on load-balancing the keys.

When all of the extra features are running at once, CAN becomes quite complicated.

Tough to guarantee a uniform distribution of keys with hash functions at large scale.

Query correctness is hard to guarantee.



Pastry

Work by: Rowstron and Druschel

Appeared In: Middleware ‘01



The Problem

Maintain overlay network for both arrivals and failures

Load Balancing

Network proximity sensitive routing



Pastry

Lookup/insert in O(log N)

Per-node state: O(log N)

Network proximity-based routing


Design

(Figure: circular nodeId space from 0 to 2^128 - 1, with objIds mapped among the nodeIds.)


Design

(Figure: the same circular nodeId space; the object is stored at the node whose nodeId is closest to the objId, the "owner" of obj.)



Lookup Table

  • Prefix matching based on Plaxton Routing
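Plaxton-style prefix routing can be sketched as follows. The routing-table layout (row = shared-prefix length, column = next digit) follows the standard Pastry arrangement, but the ids here are toy 4-digit strings rather than 128-bit values:

```python
def shared_prefix_len(a, b):
    """Count the leading digits two equal-length ids share."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n

def next_hop(my_id, key, routing_table):
    """Prefix-routing sketch: routing_table[row][digit] holds some node that
    shares `row` digits with my_id and whose next digit matches the key."""
    row = shared_prefix_len(my_id, key)
    if row == len(key):
        return None   # we are the key's root: deliver locally
    digit = key[row]
    # A miss here would fall back to the leaf set in real Pastry.
    return routing_table.get(row, {}).get(digit)
```

Each hop extends the matched prefix by at least one digit, which is where the O(log N) hop count comes from.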



Locality of Search

The search widens as the prefix match grows longer: later hops must reach progressively farther-away nodes!



Parameters

b: trades off per-node routing state against average hop count

L: leaf-set size; determines the resiliency of routing
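The tradeoff in b can be quantified: with 2^b-ary digits there are roughly log base 2^b of N routing rows of 2^b - 1 entries each, and about one hop per row on a lookup. A quick back-of-envelope helper (the exact constants vary with the implementation; this is a sketch of the asymptotics):

```python
import math

def pastry_costs(n, b):
    """Rough Pastry costs for n nodes with 2^b-ary digits: routing rows,
    total routing-table entries, and expected lookup hops."""
    rows = math.ceil(math.log(n, 2 ** b))
    return {"rows": rows, "entries": rows * (2 ** b - 1), "hops": rows}

# Raising b cuts hops but multiplies per-node state.
print(pastry_costs(10 ** 6, 2), pastry_costs(10 ** 6, 4))
```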



Security

Choose next hop for routes randomly amongst choices

Replicate data to nearby nodes



Scalability



Distance



Discussion

What does a bigger leaf set gain you?

How do we decide proximity?

What other features might we want to create a lookup table based upon?

