
P2P

Soo-Bae Kim

4/09/02

[email protected]

What is peer-to-peer?
  • Communication without a server
  • Advantages
    • No single point of failure
    • Easy data sharing
  • Disadvantages
    • No service quality guarantee
    • Increased network traffic
Outline
  • What is Chord?
  • Chord protocol
  • Simulation and experimental results
  • Conclusion
What is Chord?
  • Provides fast distributed computation of a hash function that maps keys to the nodes responsible for them.
  • Uses a variant of consistent hashing
    • Improves scalability: a node needs routing information about only a few other nodes.
    • When new nodes join the system, only a fraction of the keys are moved to a different location.
  • Simplicity, provable correctness, and provable performance
Related work
  • DNS (Domain Name System)
    • Maps host names to IP addresses
  • Freenet peer-to-peer storage system
    • Like Chord, decentralized and automatically adapts when hosts join and leave.
    • Provides a degree of anonymity
  • Ohaha system
Related work
  • Globe system
    • Information about an object is stored in a particular leaf domain, and cached pointers support search.
  • Distributed data location protocol by Plaxton
    • Queries never travel further in network distance than the node where the key is stored
  • OceanStore
Related work
  • CAN (Content-Addressable Network)
    • Uses a d-dimensional Cartesian space to implement a distributed hash table that maps keys onto values.
Chord's merits
  • Load balance
  • Decentralization
  • Scalability
  • Availability
  • Flexible naming
Examples of Chord applications
  • Cooperative mirroring
  • Time-shared storage
  • Distributed indexes
  • Large-scale combinatorial search
Base Chord protocol
  • How to find the location of keys
    • Consistent hashing, which has several good properties.
  • When new nodes join the system
    • Only a fraction of the keys are moved
    • Chord requires O(log² N) messages to update routing information.
  • In an N-node system, each node maintains information about only O(log N) other nodes, and a lookup requires O(log N) messages.
Consistent hashing
  • The consistent hash function assigns each node and key an m-bit identifier using a base hash function.
  • Key k is assigned to the first node (its successor node) whose identifier is equal to or follows k in the identifier space.
  • When a node n joins, certain keys previously assigned to n's successor become assigned to n (see the sketch below).
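To make the successor assignment concrete, here is a minimal Python sketch, assuming SHA-1 as the base hash function; names such as M_BITS and chord_hash are illustrative, not from the slides.

import hashlib
from bisect import bisect_left

M_BITS = 16                    # identifiers live on a circle of 2**M_BITS points
RING = 2 ** M_BITS

def chord_hash(value):
    # Base hash function: map a node address or key name onto the circle.
    digest = hashlib.sha1(value.encode()).digest()
    return int.from_bytes(digest, "big") % RING

def successor(node_ids, key_id):
    # Key k is assigned to the first node whose identifier is >= k,
    # wrapping around the identifier circle if necessary.
    ring = sorted(node_ids)
    idx = bisect_left(ring, key_id)
    return ring[idx % len(ring)]

nodes = [chord_hash("node-%d" % i) for i in range(8)]
print(successor(nodes, chord_hash("my-file.txt")))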
Scalable key location
  • The basic resolution scheme is inefficient because it may require traversing all N nodes to find the appropriate mapping, so additional routing state (finger tables) is needed; a sketch follows below.
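A rough sketch of the finger-table idea, as an illustrative toy rather than the paper's code: node n keeps finger[i] = successor(n + 2^i), and a lookup repeatedly jumps to the closest finger preceding the key, so only O(log N) nodes are visited.

M = 8                                    # bits in the identifier space
RING = 2 ** M

def in_interval(x, a, b):
    # True if x lies in the circular interval (a, b].
    a, b, x = a % RING, b % RING, x % RING
    return (a < x <= b) if a < b else (x > a or x <= b)

class Node:
    def __init__(self, ident, all_ids):
        ring = sorted(all_ids)
        succ = lambda k: next((n for n in ring if n >= k % RING), ring[0])
        self.id = ident
        self.successor = succ(ident + 1)
        self.finger = [succ(ident + 2 ** i) for i in range(M)]

def lookup(nodes, start, key):
    # Follow fingers until the key falls between a node and its successor.
    n, hops = nodes[start], 0
    while not in_interval(key, n.id, n.successor):
        nxt = next((f for f in reversed(n.finger)
                    if in_interval(f, n.id, key - 1)), n.successor)
        n, hops = nodes[nxt], hops + 1
    return n.successor, hops

ids = [5, 20, 60, 90, 140, 200, 230]
nodes = {i: Node(i, ids) for i in ids}
print(lookup(nodes, 5, 175))             # -> (200, 1): node 200 stores key 175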
Node join
  • When nodes join (or leave), Chord preserves two invariants:
    • Each node's successor is correctly maintained.
    • For every key k, node successor(k) is responsible for k.
  • Chord performs three tasks to preserve these invariants.
Concurrent operations and failures
  • Separate the correctness and performance goals.
  • A stabilization protocol is used to keep nodes' successor pointers up to date.
    • Preserves reachability of existing nodes.
Failures and replication
  • When a node n fails, nodes whose finger tables include n must find n's successor.
  • The key step in failure recovery is maintaining correct successor pointers.
    • Achieved using a successor list.
    • Even if, before stabilization, a lookup attempts to send requests through the failed node, the lookup can proceed by another path.
  • The successor-list mechanism also helps higher-layer software replicate data (see the sketch below).
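A tiny sketch of how the successor list enables that fallback; the function and names are hypothetical illustrations, not the authors' code.

def next_live_successor(successor_list, is_alive):
    # Each node keeps its r nearest successors; on failure of the first,
    # a lookup simply falls through to the next live entry in the list.
    for s in successor_list:
        if is_alive(s):
            return s
    raise RuntimeError("all r successors failed simultaneously")

failed = {20, 60}                                             # pretend these nodes died
print(next_live_successor([20, 60, 90], lambda n: n not in failed))   # -> 90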
Simulation and experimental results
  • Recursive Chord protocol
    • An intermediate node forwards a request to the next node until it reaches the successor.
  • Load balance
Path length
  • Path length: the number of nodes traversed during a lookup operation.
Path length
  • The results show that the average path length is about ½ log₂ N.
    • Since the distance is random, we expect half of the log₂ N bits to be ones.
Simultaneous node failures
  • Randomly select a fraction p of the nodes to fail.
  • Since the failed lookups correspond to just the fraction of keys expected to be lost due to the failure of the responsible nodes, we conclude that Chord itself introduces no significant lookup failures.
Lookup during stabilization
  • A lookup during stabilization may fail for two reasons:
    • The node responsible for the key may have failed.
    • Some nodes' finger tables and predecessor pointers may be inconsistent.
Experimental results
  • Measured latency
    • Lookup latency grows slowly with the total number of nodes.
Conclusion
  • Many distributed peer-to-peer applications need to determine the node that stores a data item. The Chord protocol solves this problem in a decentralized manner.
  • Chord scales well with the number of nodes.
  • Recovers from large numbers of simultaneous node failures and joins.
  • Answers most lookups correctly even during stabilization.
Scalable Content-Addressable Network
  • Applications of CAN
  • CAN design
  • Design improvements
  • Experimental results
Applications of CAN
  • CAN provides a scalable indexing mechanism.
  • Most peer-to-peer designs are not scalable.
    • Napster: users query a central server, so it is not completely decentralized ==> expensive and vulnerable; limits scalability
    • Gnutella: uses flooding for requests ==> not scalable
Applications of CAN
  • For storage management systems, CAN can provide efficient insertion and retrieval of content.
    • OceanStore, Farsite, Publius
  • DNS
Proposed CAN
  • Composed of many individual nodes.
  • Each node holds information about a small number of adjacent zones.
  • Each node stores a chunk (zone) of the entire hash table.
  • Completely distributed, scalable, and fault-tolerant
CAN design
  • Routing in CAN: each node forwards requests through its neighbors (see the sketch below)
  • CAN construction
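As a minimal sketch of that neighbor-based routing, not the paper's implementation (the node table and key_to_point helper are illustrative assumptions): a key is hashed to a point in the d-dimensional space, and each node greedily forwards the request to the neighbor whose zone center is closest to that point.

import hashlib, math

D = 2                                   # dimensionality of the coordinate space

def key_to_point(key):
    # Hash a key onto a point in the unit d-dimensional space.
    digest = hashlib.sha1(key.encode()).digest()
    return tuple(digest[i] / 255.0 for i in range(D))

def route(nodes, start, point):
    # Greedy forwarding: move to whichever neighbor is closest to the target
    # point; stop when no neighbor is closer than the current node.
    current, hops = start, 0
    while True:
        candidates = [current] + nodes[current]["neighbors"]
        best = min(candidates, key=lambda n: math.dist(nodes[n]["center"], point))
        if best == current:             # the current node's zone owns the point
            return current, hops
        current, hops = best, hops + 1

nodes = {                               # four equal zones in a 2-d space
    "A": {"center": (0.25, 0.25), "neighbors": ["B", "C"]},
    "B": {"center": (0.75, 0.25), "neighbors": ["A", "D"]},
    "C": {"center": (0.25, 0.75), "neighbors": ["A", "D"]},
    "D": {"center": (0.75, 0.75), "neighbors": ["B", "C"]},
}
print(route(nodes, "A", key_to_point("my-file.txt")))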
Node departure, recovery, and CAN maintenance
  • When a node leaves the CAN or a neighbor has died:
    • If the zone of one of the neighbors can be merged with the departing node's zone, that neighbor takes it over.
    • If not, the zone is handed to the neighbor whose current zone is smallest.
      • Using a timer proportional to zone volume.
Node departure, recovery, and CAN maintenance
  • Under normal conditions, a node sends periodic update messages to each of its neighbors.
  • The prolonged absence of an update message signals a neighbor's failure.
Design improvements
  • Multiple dimensions
  • Realities
  • Better CAN routing metrics
  • Overloading coordinate zones
  • Multiple hash functions
  • Topologically-sensitive construction
  • More uniform partitioning
  • Caching and replication
Multi-dimensional coordinate spaces
  • Increasing the dimension d reduces the routing path length and the path latency penalty.
  • Path length = (d/4)(n^(1/d)) hops.
  • Improves routing fault tolerance, since each node has more possible next hops.
Realities: multiple coordinate spaces
  • With r realities, each node is assigned a different zone in each coordinate space.
  • The contents of the hash table are replicated in every reality.
  • Improves routing fault tolerance, data availability, and path length.
Multiple dimensions vs. multiple realities
  • Both result in shorter path lengths, but increase per-node neighbor state and maintenance traffic.
  • For the same overhead, increasing dimensions gives better path-length performance.
  • But consider the other benefits of multiple realities (fault tolerance, data availability).
Better CAN routing metrics
  • Use a metric that better reflects the underlying IP network: network-level round-trip time (RTT), not just Cartesian distance.
  • Avoids unnecessarily long hops.
  • RTT-weighted routing aims to reduce the latency of individual hops.
  • Per-hop latency = overall path latency / path length
Overloading coordinate zones
  • Allow multiple nodes to share the same zone (peers: nodes that share the same zone).
  • MAXPEER: the maximum number of peers allowed in a zone.
  • Each node maintains a list of its peers and neighbors.
  • When node A joins, an existing node B checks whether its zone has fewer than MAXPEER peers.
Overloading coordinate zones
    • If there are fewer, node A joins that zone as a peer.
    • If not, the zone is split in half.
  • Advantages
    • Reduced path length and per-hop latency
    • Improved fault tolerance
Multiple hash functions
  • Use k different hash functions to map a single key onto k points (see the sketch below).
  • Reduces average query latency,
  • but increases the size of the database and the query traffic by a factor of k.
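A small sketch of the idea (the hash construction and names are assumptions for illustration): the same key is hashed with k different functions onto k points, stored at all of them, and a query goes to the replica closest to the requester.

import hashlib, math

def k_points(key, k=3, d=2):
    # Derive k distinct points in the d-dimensional unit space for one key
    # by salting the hash input with the replica index.
    points = []
    for i in range(k):
        digest = hashlib.sha1(("%s#%d" % (key, i)).encode()).digest()
        points.append(tuple(digest[j] / 255.0 for j in range(d)))
    return points

def closest_replica(requester, points):
    # Query only the replica nearest to the requester in coordinate space,
    # which is what reduces the average query latency.
    return min(points, key=lambda p: math.dist(requester, p))

replicas = k_points("my-file.txt", k=3)
print(closest_replica((0.1, 0.9), replicas))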
Topologically-sensitive construction of the CAN overlay network
  • Construct the CAN based on nodes' relative distances from a set of landmarks.
  • Latency stretch: the ratio of the latency on the CAN network to the average latency on the IP network.
More uniform partitioning
  • Helps achieve load balancing.
  • Not sufficient for true load balancing, because some (key, value) pairs will be more popular than others: hot spots.
  • Target volume per node: V = V_T / n, where V_T is the volume of the entire coordinate space and n is the number of nodes.
Caching and replication techniques for "hot spot" management
  • To make popular data keys widely available (a minimal sketch follows below):
    • Caching: a node first checks its own cache for the requested data key.
    • Replication: an overloaded node replicates the data key at each of its neighboring nodes.
  • Cached and replicated entries should have an associated time-to-live field and eventually expire from the cache.
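A minimal sketch of the caching half with a time-to-live; the constant and names are illustrative assumptions, not from the paper.

import time

CACHE_TTL = 30.0                 # seconds; illustrative value
cache = {}                       # data key -> (value, expiry time)

def get(key, fetch_from_can):
    # First check the local cache; fall back to a normal CAN lookup on a miss
    # or when the cached entry has expired.
    entry = cache.get(key)
    if entry is not None and entry[1] > time.time():
        return entry[0]
    value = fetch_from_can(key)
    cache[key] = (value, time.time() + CACHE_TTL)
    return value

print(get("popular-key", lambda k: "value-of-" + k))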
Design review
  • Metrics
    • Path length
    • Neighbor state
    • Latency
    • Volume
    • Routing fault tolerance
    • Hash table availability
Design review
  • Parameters
    • Dimensionality of the virtual coordinate space: d
    • Number of realities: r
    • Number of peer nodes per zone: p
    • Number of hash functions: k
    • Use of the RTT-weighted routing metric
    • Use of uniform partitioning
Discussion
  • Two key ideas: scalable routing and indexing
  • Open problems:
    • Resistance to denial-of-service attacks.
    • Extending the CAN algorithm to handle mutable content, and the design of search techniques.
Advantages
  • Send users location-based targeted information on local events.
  • Classify users based on location.
  • Control the availability of data based on user location.
IP2Geo
  • Methods for determining the geographic location of an Internet host:
    • GeoTrack: uses DNS names
    • GeoPing: uses network delay measurements
    • GeoCluster: uses partial IP-to-location mapping information together with BGP prefix information
Alternative approaches
  • Incorporating location information in DNS records.
    • Requires a modification of the DNS record structure.
  • Using the Whois database
    • Query a Whois server
    • Widely used
    • IP2LL, NetGeo
Alternative approaches
    • Problems
      • Recorded information may be inaccurate or stale
      • Storage volume
  • Using traceroute
    • Traceroute from a source to the target IP address and infer location information from the DNS names of routers along the path
    • Problem: router names don't always contain location information
    • VisualRoute, NeoTrace, GTrace
Alternative approaches
  • Doing an exhaustive tabulation of IP address ranges and their corresponding locations.
Due to proxies
  • Many Web clients are behind proxies or firewalls, so the IP address seen by the external network may actually correspond to a proxy.
  • This affects Whois, traceroute (GeoTrack), and network delay measurements (GeoPing).
Experiment setup
  • Geographic setting
    • In the USA
  • Probe machines
    • At 14 locations
    • Used to make delay measurements for GeoPing and to initiate traceroutes for GeoTrack.
  • BGP data
    • GeoCluster requires address prefixes
Experiment setup
  • Partial IP-to-location mapping information for GeoCluster:
    • Hotmail
    • bCentral
    • FooTV
GeoTrack technique
  • Tries to infer location based on the DNS names of the host of interest or of other nearby network nodes.
    • Many router names contain recognizable location codes.
  • Steps (a toy sketch follows below):
    • Determine the network path to the target host
    • Traceroute reports the DNS names of the routers along the path
    • Estimate the location of the target host as that of the last recognizable router.
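A toy sketch of that last step; the city-code table and router names below are hypothetical examples, not data from the paper.

import re

CITY_CODES = {"sea": "Seattle", "sjc": "San Jose", "nyc": "New York"}

def location_from_name(dns_name):
    # Return a city if the router name contains a recognizable location code.
    for code, city in CITY_CODES.items():
        if re.search(r"\b%s\b" % code, dns_name):
            return city
    return None

def geotrack(router_names):
    # Estimate: the location of the last recognizable router along the path.
    last = None
    for name in router_names:
        last = location_from_name(name) or last
    return last

path = ["core1-sea.example.net", "bb2-sjc.example.net", "router9.example.net"]
print(geotrack(path))            # -> "San Jose"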
Performance (GeoTrack & NetGeo)
  • University hosts: don't use proxies
  • FooTV hosts: behind proxies
  • NetGeo: a Whois-based network mapping tool
  • Average error
    • GeoTrack: 590 km
    • NetGeo: 650 km
GeoPing technique
  • Determines location by exploiting the relationship between network delay and geographic distance.
  • Because of congestion, gather several delay samples and then pick the minimum among them (10-15 delay samples).
Factors to consider
  • Presence of circuitous geographic paths and bottlenecks.
  • But, in recent years, networks provide richer connectivity with wide bandwidth.
  • Delays are measured from several known locations to hosts in the universities.
  • Linearized distance: the sum of the lengths of the individual hops.
CDF of distance
  • Performed traceroute and ping measurements from 14 different sources to 256 university servers.
  • For small delay values (<10 ms), most of the hosts (>90%) are placed correctly.
  • But for large delay values (>40 ms), the estimates give only about 70% confidence (errors of at least 300-400 km).
Nearest neighbor in delay space (NNDS)
  • Motivated by the observation that hosts with similar network delays with respect to other fixed hosts tend to be located near each other.
  • Construct a delay map.
  • Each entry of the delay map contains:
    • the coordinates of a host at a known location
    • its delay vector: the delays measured to it from the N probe machines
NNDS
  • Given a new target host T, first measure the network delay to T from the N probes.
  • Next, find the nearest neighbor in the delay map using Euclidean distance (see the sketch below).
  • Plotting several percentile levels of error distance as a function of the number of probes shows that 7 to 9 probes would be ideal for NNDS.
  • Well-distributed probes yield better accuracy.
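A minimal sketch of NNDS; the delay map below is made-up illustrative data. The target's delay vector, measured from the N probe machines, is matched against the delay map by Euclidean distance, and the location of the nearest entry becomes the estimate.

import math

# Delay map: (latitude, longitude) of a host at a known location -> its
# delay vector, i.e. the delays measured to it from the N probe machines.
delay_map = {
    (47.6, -122.3): [5.0, 42.0, 71.0],
    (37.8, -122.4): [18.0, 30.0, 65.0],
    (40.7, -74.0):  [70.0, 68.0, 8.0],
}

def nnds_locate(target_delays):
    # Return the known location whose delay vector is Euclidean-nearest
    # to the delay vector measured for the target host T.
    return min(delay_map, key=lambda loc: math.dist(delay_map[loc], target_delays))

print(nnds_locate([8.0, 40.0, 69.0]))    # closest to the first entry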
GeoCluster technique
  • Uses partial IP-to-location mapping information.
  • Addresses form geographic clusters.
    • The IP address space is broken up into clusters.
  • The aggregation of location information enables us to identify and eliminate outliers caused by inaccuracies in the individual location data points.
Identifying geographic clusters
  • Derive address prefixes (APs) from BGP.
    • Address prefixes enable us to identify topologically close clusters.
  • The conjecture is that these APs are also geographic clusters (a sketch of the aggregation step follows below).
  • If an ISP only advertises large aggregates, this conjecture may not hold.
    • Need sub-clustering
    • Dispersion metric: quantifies the geographic extent or spread of a cluster
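A small sketch of the aggregation step; the prefixes and samples below are made up for illustration. Partial IP-to-location records are grouped by address prefix, and the most common location within a prefix becomes that cluster's estimate, which lets isolated outliers be voted out.

import ipaddress
from collections import Counter, defaultdict

# Address prefixes, e.g. derived from BGP routing tables (illustrative values).
prefixes = [ipaddress.ip_network(p) for p in ("131.107.0.0/16", "18.0.0.0/8")]

def cluster_of(ip):
    addr = ipaddress.ip_address(ip)
    return next((p for p in prefixes if addr in p), None)

# Partial IP-to-location mapping, e.g. from registered-user records.
samples = [("131.107.2.1", "Redmond"), ("131.107.9.8", "Redmond"),
           ("131.107.4.4", "Paris"),          # a likely stale/incorrect record
           ("18.26.0.1", "Cambridge")]

votes = defaultdict(Counter)
for ip, city in samples:
    votes[cluster_of(ip)][city] += 1

for prefix, counter in votes.items():
    print(prefix, "->", counter.most_common(1)[0][0])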
Experiment results
  • Importance of sub-clustering
  • Sub-clustering thresholds: cthresh, fthresh
  • /24 clusters: ignore BGP
  • BGP + sub-clustering gives the best performance
Discussion
  • How can one obtain IP-to-location mapping information?
    • The likely location of a user can be inferred from the kind of information accessed or the queries issued by the user.
    • It can also be derived from accesses made by registered users.
  • Can be combined with GeoTrack and GeoPing.
Summary
  • IP2Geo
    • GeoTrack
    • GeoPing
    • GeoCluster
      • Best performance
      • Dispersion metric
      • Sub-clustering