
ptp

Soo-Bae Kim, 4/09/02, isdl@snu



Soo-Bae Kim



What is peer to peer?

  • Communication without a server

  • Advantages

    • No single point of failure

    • Easy data sharing

  • Disadvantages

    • No service quality guarantee

    • Increased network traffic

Outline

  • Chord?

  • Chord protocol

  • Simulation and experiment results

  • Conclusion

What is Chord?

  • Provides fast distributed computation of a hash function mapping keys to the nodes responsible for them.

  • Uses a variant of consistent hashing

    • Improves scalability: a node needs routing information about only a few other nodes.

    • When new nodes join the system, only a fraction of the keys is moved to a different location.

  • Simplicity, provable correctness, and provable performance

Related work

  • DNS (Domain Name System)

    • Provides host-name-to-IP-address mapping

  • Freenet peer-to-peer storage system

    • Like Chord, decentralized and automatically adapts when hosts join and leave.

    • Provides a degree of anonymity

    • Ohaha system

Related work

  • Globe system

    • Information about an object is stored in a particular leaf domain, and cached pointers support search.

  • Distributed data location protocol by Plaxton

    • Queries never travel farther in network distance than the node where the key is stored

    • OceanStore

Related work

  • CAN (Content-Addressable Network)

    • Uses a d-dimensional Cartesian space to implement a distributed hash table that maps keys onto values.

Chord’s merits

  • Load balance

  • Decentralization

  • Scalability

  • Availability

  • Flexible naming

Examples of Chord applications

  • Cooperative mirroring

  • Time-shared storage

  • Distributed indexes

  • Large-scale combinatorial search

Base Chord protocol

  • How to find the location of keys

    • Consistent hashing, which has several good properties

  • When new nodes join the system

    • only a fraction of the keys is moved

    • Chord requires O(log^2 N) messages to restore its routing invariants

  • In an N-node system, each node maintains information about only O(log N) other nodes, and a lookup requires O(log N) messages

Consistent hashing

  • A consistent hash function assigns each node and key an m-bit identifier using a base hash function such as SHA-1.

  • Key k is assigned to the first node (the successor node) whose identifier is equal to or follows k in the identifier space.

  • When a node n joins, certain keys previously assigned to n’s successor now become assigned to n.
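As an illustration, the successor rule can be sketched on a toy m = 6 identifier circle (the node names and key string below are made up; Chord itself uses a much larger identifier space):

```python
import hashlib

M = 6                      # identifier bits for this toy example
RING = 2 ** M

def ident(name: str) -> int:
    # Hash a node address or key name onto the m-bit identifier circle.
    return int(hashlib.sha1(name.encode()).hexdigest(), 16) % RING

def successor(key_id: int, node_ids: list) -> int:
    # The key is assigned to the first node whose identifier is
    # equal to or follows the key identifier on the circle.
    for n in sorted(node_ids):
        if n >= key_id:
            return n
    return min(node_ids)   # wrap around the circle

nodes = [ident(f"node{i}") for i in range(4)]
k = ident("some-key")
print(successor(k, nodes))
```

When a node with identifier n joins, keys in the interval between its predecessor and n simply move from n's successor to n, which is why only a fraction of keys relocate.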

Scalable key location

  • This resolution scheme is inefficient because it may require traversing all N nodes to find the appropriate mapping, so additional routing state is needed.
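A minimal sketch of finger-table routing on a 10-node, m = 6 ring (the simulation structure here is my own, not the paper's code): each node keeps a finger pointing at the successor of id + 2^i, and a lookup repeatedly jumps to the closest preceding finger.

```python
M = 6                              # identifier bits; ring size 2^M = 64
RING = 2 ** M

def between(x, a, b):
    # True if x lies strictly inside the circular interval (a, b).
    if a < b:
        return a < x < b
    return x > a or x < b

class Node:
    def __init__(self, node_id, all_ids):
        self.id = node_id
        # finger[i] holds the id of successor((id + 2^i) mod 2^M)
        self.finger = [self._succ((node_id + 2 ** i) % RING, all_ids)
                       for i in range(M)]

    @staticmethod
    def _succ(x, ids):
        for n in sorted(ids):
            if n >= x:
                return n
        return min(ids)

def lookup(nodes, start_id, key):
    # Forward via the closest preceding finger; each hop roughly halves
    # the identifier distance to the key, giving O(log N) hops.
    node, hops = nodes[start_id], 0
    while True:
        succ = node.finger[0]
        if key == succ or between(key, node.id, succ):
            return succ, hops
        nxt = next((f for f in reversed(node.finger)
                    if between(f, node.id, key)), succ)
        node, hops = nodes[nxt], hops + 1

ids = [1, 8, 14, 21, 32, 38, 42, 48, 51, 56]
nodes = {i: Node(i, ids) for i in ids}
print(lookup(nodes, 1, 54))        # key 54 is owned by node 56
```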

Node join

  • When nodes join (or leave), Chord preserves two invariants:

    • each node’s successor is correctly maintained.

    • For every key k, node successor(k) is responsible for k.

  • Chord performs three tasks to preserve the invariants.

Concurrent operations and failure

  • Correctness and performance goals are treated separately.

  • A stabilization protocol is used to keep nodes’ successor pointers up to date.

    • Preserves reachability of existing nodes.

Failures and replication

  • When a node n fails, nodes whose finger tables include n must find n’s successor.

  • The key step in failure recovery is maintaining correct successor pointers.

    • Using successor list

    • Even if, before stabilization completes, requests are sent through the failed node, the lookup can proceed along another path.

  • The successor list mechanism also helps higher layer software replicate data.
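The failover through the successor list can be sketched as follows; the value r = 3 and the node ids are purely illustrative:

```python
def next_live_successor(successor_list, alive):
    # Scan the successor list in order and return the first node that
    # is still reachable; the lookup then proceeds along this path.
    for s in successor_list:
        if s in alive:
            return s
    raise RuntimeError("all r successors failed simultaneously")

succ_list = [14, 21, 32]           # r = 3 successors kept by some node
alive = {21, 32, 38}               # node 14 has failed
print(next_live_successor(succ_list, alive))
```

Because a higher layer can store replicas at these same r successors, routing around a dead successor also lands the lookup on a node holding a replica.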

Simulation and experimental results

  • Recursive Chord protocol

    • An intermediate node forwards a request to the next node until it reaches the successor.

  • Load balance

Path length

  • Path length: the number of nodes traversed during a lookup operation.

Path length

  • The results show that the path length is about (1/2) log2 N.

    • Since the distance is random, we expect half of the log2 N bits to be one.
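The half-the-bits intuition can be checked numerically: the clockwise distance to a key is essentially a random m-bit number, and each one bit costs roughly one finger hop, so the average hop count sits near m/2.

```python
import random

random.seed(42)
m = 20
samples = [random.getrandbits(m) for _ in range(10000)]
# Count one bits per sample; the mean should be close to m/2 = 10.
avg_ones = sum(bin(x).count("1") for x in samples) / len(samples)
print(avg_ones)
```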

Simultaneous node failures

  • Randomly select a fraction p of nodes that fail.

  • Since this is just the fraction of keys expected to be lost due to the failure of the responsible nodes, we conclude that there is no significant lookup failure in Chord beyond that loss.

Lookup during stabilization

  • A lookup during stabilization may fail for two reasons:

    • The node responsible for the key may have failed.

    • Some nodes’ finger tables and predecessor pointers may be inconsistent.

Experimental results

  • Measured latency

    • Lookup latency grows slowly with the total number of nodes.

Conclusion

  • Many distributed peer-to-peer applications need to determine the node that stores a data item. The Chord protocol solves this problem in a decentralized manner.

  • Chord scales well with the number of nodes.

  • Recovers from large numbers of simultaneous node failures and joins.

  • Answers most lookups correctly even during stabilization.

Scalable content addressable network

  • Applications of CAN

  • CAN design

  • Design improvements

  • Experimental results

Applications of CAN

  • CAN provides a scalable indexing mechanism

  • Most of the peer to peer designs are not scalable.

    • Napster: a user queries a central server : not completely decentralized ==> expensive and a scalability bottleneck

    • Gnutella : uses flooding for requests ==> not scalable

Applications of CAN

  • For storage management systems, CAN can support efficient insertion and retrieval of content.

    • OceanStore, Farsite, Publius

  • DNS

Proposed CAN

  • Composed of many individual nodes.

  • A node holds information about a small number of adjacent zones

  • Each node stores a chunk (zone) of the entire hash table

  • Completely distributed, scalable, and fault-tolerant

CAN design

  • Routing in CAN : greedy forwarding using a node’s neighbors

  • CAN construction
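CAN's greedy forwarding can be sketched on a toy 2-d torus with four equal zones; the quadrant layout and the use of zone centers as node ids are simplifying assumptions, not the paper's exact algorithm.

```python
import math

def dist(p, q):
    # Cartesian distance on the unit d-torus (coordinates wrap around).
    return math.sqrt(sum(min(abs(a - b), 1 - abs(a - b)) ** 2
                         for a, b in zip(p, q)))

def greedy_route(start, key_point, neighbors_of):
    # Each node forwards to the neighbor whose zone center is closest
    # to the point the key hashes to; stop when no neighbor is closer.
    path, node = [start], start
    while True:
        best = min(neighbors_of[node] + [node],
                   key=lambda n: dist(n, key_point))
        if best == node:
            return path
        node = best
        path.append(node)

# Toy 2-d CAN: four equal quadrant zones, node ids = zone centers.
neighbors_of = {
    (0.25, 0.25): [(0.75, 0.25), (0.25, 0.75)],
    (0.75, 0.25): [(0.25, 0.25), (0.75, 0.75)],
    (0.25, 0.75): [(0.25, 0.25), (0.75, 0.75)],
    (0.75, 0.75): [(0.75, 0.25), (0.25, 0.75)],
}
print(greedy_route((0.25, 0.25), (0.8, 0.8), neighbors_of))
```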

Node departure, recovery, and CAN maintenance

  • When a node leaves the CAN, or a node’s neighbor has died:

    • if the departing node’s zone can be merged with a neighbor’s zone into a single valid zone, it is handed over to that neighbor.

    • if not, the zone is handed to the neighbor whose current zone is smallest.

      • Using a takeover timer proportional to zone volume.

Node departure, recovery, and CAN maintenance

  • Under normal conditions, a node sends periodic update messages to each of its neighbors.

  • The prolonged absence of an update message signals a neighbor’s failure.

Design improvements

  • Multi-dimensions

  • Realities

  • Better CAN routing metrics

  • Overloading coordinate zones

  • Multiple hash functions

  • Topologically-sensitive construction

  • More uniform partitioning

  • Caching and replication

Multi-dimension coordinate spaces

  • Increasing the number of dimensions d reduces routing path length and the path latency penalty.

  • Path length = (d/4)(n^(1/d)) hops.

  • Improves routing fault tolerance owing to having more candidate next hops.
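The path-length expression can be evaluated directly; for a fixed number of nodes n, increasing d shrinks the hop count:

```python
def can_path_length(n: int, d: int) -> float:
    # Average routing path length of an n-node CAN with d dimensions:
    # (d/4) * n^(1/d) hops.
    return (d / 4) * n ** (1 / d)

# For n = 4096 nodes, more dimensions mean shorter routes.
for d in (2, 3, 4):
    print(d, can_path_length(4096, d))
```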

Realities: multiple coordinate spaces

  • Realities: a node is assigned a different zone in each coordinate space.

  • The contents of the hash table are replicated on every reality.

  • Improves routing fault tolerance, data availability, and path length.

Multi-dimensions vs. multi-realities

  • Both result in shorter path lengths, but increase per-node neighbor state and maintenance traffic

  • Multiple dimensions give better path-length performance

  • Consider the other benefits of multiple realities (improved fault tolerance and data availability)

Better CAN routing metrics

  • Metric chosen to better reflect the underlying IP topology: network-level round-trip time (RTT) in addition to Cartesian distance.

  • Avoids unnecessarily long hops.

  • RTT-weighted routing aims to reduce the latency of individual hops.

  • Per-hop latency = overall path latency / path length

Overloading coordinate zones

  • Allows multiple nodes to share the same zone (peers: nodes that share the same zone).


  • A node maintains a list of its peers and neighbors.

  • When node A joins, an existing node B checks whether it has fewer than MAXPEER peers.

Overloading coordinate zones

  • If fewer, node A joins as a peer.

  • If not, the zone is split in half.

  • Advantage

    • Reduced path length, path latency, and per-hop latency

    • Improved fault tolerance

Multiple hash functions

    • Use k different hash functions to map a single key onto k points.

    • Reduces average query latency.

    • But increases the size of the database and the query traffic by a factor of k.
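One plausible way to realize k hash functions is to salt a single base hash with an index; this salting trick is an assumption on my part, since the design only requires k independent mappings of a key to k points in the space.

```python
import hashlib

def points(key: str, k: int):
    # Map one key onto k points in the unit square by salting the
    # hash with an index; the value is then stored at k nodes.
    out = []
    for i in range(k):
        h = hashlib.sha1(f"{i}:{key}".encode()).hexdigest()
        x = int(h[:8], 16) / 0xFFFFFFFF     # first coordinate in [0, 1]
        y = int(h[8:16], 16) / 0xFFFFFFFF   # second coordinate in [0, 1]
        out.append((x, y))
    return out

pts = points("movie.mpg", 3)
print(pts)   # a query can go to whichever replica answers first
```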

Topologically-sensitive construction of the CAN overlay network

    • Construct the CAN based on nodes’ relative distances from a set of landmarks.

    • Latency stretch: the ratio of the latency on the CAN network to the average latency on the IP network.

More uniform partitioning

    • Achieves load balancing.

    • Not sufficient for true load balancing, because some (key, value) pairs will be more popular than others : hot spots.

    • V = Vt / n, where Vt is the volume of the entire coordinate space and n is the number of nodes.

Caching and replication techniques for “hot spot” management

    • To make popular data keys widely available.

      • Caching : a node first checks whether it holds the requested data key in its cache.

      • Replication : replicate the data key at each of its neighboring nodes.

    • Cached entries should have an associated time-to-live field and eventually expire from the cache.

Design review

    • Metric

      • path length

      • neighbor-state

      • latency

      • volume

      • routing fault tolerance

      • hash table availability

Design review

    • Parameter

      • dimensionality of the virtual coordinate space : d

      • number of realities : r

      • number of peer nodes per zone : p

      • number of hash functions : k

      • use of the RTT-weighted routing metric

      • use of the uniform partitioning

Experiment

Discussion

    • Two key ideas : scalable routing and indexing

    • Open problems:

      • Resistance to denial-of-service attacks.

      • Extension of the CAN algorithm to handle mutable content, and the design of search techniques.

Advantages

    • Send users location-based targeted information on local events.

    • Classify users based on location.

    • Control the availability of data based on user location.

IP2Geo

    • Methods for determining the geographic location of an Internet host:

      • GeoTrack : uses DNS names

      • GeoPing : uses network delay measurements

      • GeoCluster : uses partial IP-to-location mapping information with BGP prefix information

Alternative approaches

    • Incorporating location information in DNS records.

      • Requires a modification of the DNS record structure.

    • Using the Whois database

      • Query a Whois server

      • widely used

      • IP2LL, NetGeo

Alternative approaches

    • Problem

      • recorded information may be inaccurate or stale

      • storage volume

  • Using traceroute

    • Traceroute from a source to the target IP address and infer location information from the DNS names of routers along the path

    • Problem : router names don’t always contain location information

    • VisualRoute, NeoTrace, GTrace

Alternative approaches

    • Doing an exhaustive tabulation of IP address ranges and their corresponding locations.

Due to proxies

    • Many Web clients are behind proxies, so the IP address seen by the external network may actually correspond to a proxy rather than the client.

    • This affects Whois and traceroute (GeoTrack) as well as network delay measurement (GeoPing).

Experiment setup

    • Geographic setting

      • In the USA

    • Probe machines

      • At 14 locations

      • Used to make delay measurements for GeoPing and to initiate traceroutes for GeoTrack.

    • BGP data

      • GeoCluster requires address prefixes

Experiment setup

    • Partial IP-to-location mapping information for GeoCluster:

      • Hotmail

      • bCentral

      • FooTV

GeoTrack technique

    • Tries to infer location based on the DNS names of the host of interest or other nearby network nodes.

      • Relies on recognizable location codes in router names

    • Steps:

      • Determine the network path to the target host.

      • Traceroute reports the DNS names of routers along the path.

      • Estimate the location of the target host as that of the last recognizable router.
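The last-recognizable-router step might look like the sketch below; the tiny CITY_CODES table and the regular expression are made-up stand-ins for GeoTrack's real database of city and airport codes.

```python
import re

# Hypothetical code table; the real system uses a large database of
# city and airport codes that ISPs embed in router DNS names.
CITY_CODES = {"sea": "Seattle", "nyc": "New York", "chi": "Chicago"}

def last_recognizable(router_names):
    # Walk the traceroute path in order and remember the location of
    # the last router whose DNS name contains a recognizable code.
    location = None
    for name in router_names:
        for code, city in CITY_CODES.items():
            if re.search(rf"\b{code}\b|{code}\d|-{code}-", name):
                location = city
    return location

path = ["core1-sea1.example.net", "bb2-chi1.example.net", "gw.target.edu"]
print(last_recognizable(path))   # the last recognizable hop is in Chicago
```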

Performance (GeoTrack & NetGeo)

    • University hosts : don’t use proxies

    • FooTV hosts : behind a proxy

    • NetGeo : a Whois-based network mapping tool

    • Average error

      • GeoTrack : 590 km

      • NetGeo : 650 km

GeoPing technique

    • Determine location by exploiting the relationship between network delay and geographic distance.

    • Owing to congestion, gather several delay samples and then pick the minimum among them (10 to 15 delay samples).
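The minimum-of-samples rule can be sketched as follows; the 30 ms propagation floor and the exponential congestion delay are purely illustrative assumptions.

```python
import random

random.seed(7)

def min_delay(measure, samples=12):
    # Take 10-15 RTT samples and keep the minimum: queueing delay only
    # ever adds to propagation delay, so the minimum is closest to it.
    return min(measure() for _ in range(samples))

def noisy_rtt():
    # Hypothetical probe: 30 ms propagation plus random congestion delay.
    return 30.0 + random.expovariate(1 / 20.0)

print(min_delay(noisy_rtt))   # close to the 30 ms floor
```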

Consider components

    • Presence of circuitous geographic paths and bottlenecks.

    • But in recent years, networks have provided richer connectivity with wide bandwidth.

    • Measured from several known locations to hosts in the universities.

    • Linearized distance : the sum of the lengths of the individual hops.

CDF of distance

    • Performed traceroute and ping measurements from 14 different sources to 256 university servers.

    • For small delay values (<10 ms), most of the hosts (>90%) are located correctly.

    • But for large delay values (>40 ms), only about 70% confidence (an error of at least 300 to 400 km).

Nearest neighbor in delay space (NNDS)

    • Motivated by the observation that hosts with similar network delays with respect to other fixed hosts tend to be located near each other.

    • Construct a delay map

    • Each entry of the delay map contains

      • the coordinates of a host at a known location

      • a delay vector : the delays measured from each probe to that host


    • Given a new target host T, first measure the network delays from the N probes.

    • Next, find the nearest neighbor in the delay map using Euclidean distance.

    • Error distance at several percentile levels, plotted as a function of the number of probes.

    • Having 7 to 9 probes would be ideal for NNDS.

    • Well-distributed probes yield better accuracy.
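The NNDS step can be sketched directly; the probe count, coordinates, and delay values below are invented for illustration.

```python
import math

def nearest_in_delay_space(target_vec, delay_map):
    # delay_map: {(lat, lon): delay vector measured from the N probes}.
    # Return the known location whose delay vector is closest to the
    # target's, by Euclidean distance in delay space.
    def d(u, v):
        return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))
    return min(delay_map, key=lambda loc: d(delay_map[loc], target_vec))

# Hypothetical map: delays (ms) from 3 probes to hosts at known spots.
delay_map = {
    (47.6, -122.3): [5.0, 60.0, 75.0],    # Seattle-area host
    (40.7, -74.0):  [70.0, 8.0, 25.0],    # New York-area host
    (33.7, -84.4):  [80.0, 22.0, 6.0],    # Atlanta-area host
}
target = [68.0, 10.0, 22.0]               # new host T, measured from probes
print(nearest_in_delay_space(target, delay_map))
```

The target's estimated position is simply the coordinates of its nearest neighbor in delay space, which is why well-distributed probes help: they spread the known hosts out in that space.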


GeoCluster technique

    • IP-to-location mapping

    • Addresses form geographic clusters

      • The IP address space is broken up into clusters.

    • The aggregation of location information enables us to identify and eliminate outliers caused by inaccuracies in the individual location data points.

Identifying geographic clusters

    • Derive APs from BGP data

      • Address prefixes (APs) enable us to identify topological clusters.

    • These APs are conjectured to be geographic clusters.

    • If an ISP only advertises large aggregates, this conjecture may not be correct.

      • Need sub-clustering

      • Dispersion metric : quantifies the geographic extent or spread of a cluster
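One way to instantiate a dispersion metric is the mean pairwise distance between the known host locations in a cluster; this specific formula (with a flat-earth distance approximation) is an assumption, since the slide only requires some measure of geographic spread.

```python
import itertools
import math

def dispersion(locations):
    # Mean pairwise distance (km) between known host locations in a
    # cluster; a large value suggests the cluster needs sub-clustering.
    if len(locations) < 2:
        return 0.0
    pairs = list(itertools.combinations(locations, 2))
    def km(p, q):
        # Flat-earth approximation: ~111 km per degree of latitude.
        dlat = (p[0] - q[0]) * 111.0
        dlon = (p[1] - q[1]) * 111.0 * math.cos(math.radians((p[0] + q[0]) / 2))
        return math.hypot(dlat, dlon)
    return sum(km(p, q) for p, q in pairs) / len(pairs)

tight = [(47.6, -122.3), (47.7, -122.2), (47.5, -122.4)]   # one metro area
spread = [(47.6, -122.3), (40.7, -74.0), (33.7, -84.4)]    # cross-country
print(dispersion(tight), dispersion(spread))
```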

Experimental results

    • Locating hosts in univhost

    • Locating hosts in bCentral

Experimental results

    • Importance of sub-clustering

    • /24 clustering : ignores BGP

    • BGP + sub-clustering gives the best performance

Discussion

    • How can one obtain IP-to-location mapping information?

      • The likely location of a user can be inferred from the kind of information accessed or the queries issued by the user.

      • Derived from accesses made by registered users

    • Combine with GeoTrack and GeoPing

Summary

    • IP2Geo

      • GeoTrack

      • GeoPing

      • GeoCluster

        • best performance

        • dispersion metric

        • sub-clustering