Comparing Hybrid Peer-to-Peer Systems



Based on an article by Hector Garcia-Molina and Beverly Yang

by Tudor Balan

P2P short survey

P2P advantages

  • The resources of many computers can be pooled to provide large amounts of information and significant computing power.
  • Network bandwidth use improves significantly because computers communicate directly.

P2P drawbacks

  • Arise from the decentralized nature of the system.
    • Ex. Gnutella (network flooding & no scalability)
  • Improvements
    • Ex. Napster (restricted server search, fractional indexing)


  • Study the functionality of P2P systems in order to understand their tradeoffs
  • Concentrate on data sharing and hybrid P2P systems.
Data sharing overview

Data sharing systems

  • Pure data sharing systems
  • Hybrid data sharing systems

Hybrid data sharing systems are hugely popular, but are they also well studied?

  • What is the best way to organize server nodes?
  • Should indexes be replicated?
  • What are the common queries asked by users?
  • How should disconnected users be handled?
Problem analysis and treatment
  • Present several architectures for P2P data sharing systems, both already in use and proposed.
  • Give a probabilistic model for user queries and for the result size.
  • Illustrate a model for evaluating system performance.
  • Based on the above, compare the architectures.
Server architecture: General concepts
  • Login
    • library
    • connecting
    • metadata upload
    • index
    • connection information (client IP, line speed)
    • local server
    • remote servers
    • local users
  • Query
    • list of desired words
    • satisfied when the maximum number of results is reached
    • query processing: retrieve the list for each query word and intersect the lists
  • Download
    • library enrichment notification
    • index update
    • server notification on file removal or logoff
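
The query step above (retrieve the list for each query word, intersect the lists, stop at the maximum number of results) can be sketched as a small inverted index. This is only an illustration: the class name `ServerIndex`, its methods, and the whitespace tokenization of file names are assumptions made for the example, not details taken from the article.

```python
from collections import defaultdict


class ServerIndex:
    """Toy inverted index (hypothetical): word -> set of (user, filename)."""

    def __init__(self):
        self.lists = defaultdict(set)

    def add_file(self, user, filename):
        # Index update after a login upload or a download notification.
        for word in filename.lower().split():
            self.lists[word].add((user, filename))

    def remove_file(self, user, filename):
        # Index update after a file removal or logoff notification.
        for word in filename.lower().split():
            self.lists[word].discard((user, filename))

    def query(self, words, max_results):
        # Retrieve the list for each query word, then intersect the lists.
        lists = [self.lists.get(w.lower(), set()) for w in words]
        if not lists:
            return []
        matches = set.intersection(*lists)
        # The query is satisfied once max_results entries are available.
        return sorted(matches)[:max_results]
```

A server would call `add_file` for every metadata item uploaded at login and `remove_file` when a user removes a file or logs off.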
Server architectures: Login policies
  • Batch
    • Login → entire library metadata is uploaded
    • Logoff → entire library metadata is removed
    • Index = {metadata of active users}
    • Advantages
      • Small index size
      • Increased query efficiency
    • Disadvantages
      • Intense and expensive metadata updates
  • Incremental
    • Metadata persists across sessions
    • Only differences are uploaded
    • Advantages
      • Less effort at login/logoff
    • Disadvantages
      • Increased memory requirements
      • Penalty on query efficiency
      • Users must (sometimes) reconnect to the same server
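
The difference between the two login policies can be made concrete with a minimal sketch. The class names and the choice to count "metadata items transferred at login" as the cost are assumptions for illustration; the article's actual cost formulas are not reproduced here.

```python
class BatchIndex:
    """Batch policy sketch: the index holds metadata of active users only."""

    def __init__(self):
        self.files = {}  # user -> set of filenames

    def login(self, user, library):
        self.files[user] = set(library)  # upload the entire library
        return len(self.files[user])     # metadata items transferred

    def logoff(self, user):
        self.files.pop(user, None)       # remove the entire library

    def size(self):
        return sum(len(f) for f in self.files.values())


class IncrementalIndex:
    """Incremental policy sketch: metadata persists, only differences are sent."""

    def __init__(self):
        self.files = {}      # user -> set of filenames, kept across sessions
        self.active = set()  # users currently logged in

    def login(self, user, library):
        library = set(library)
        old = self.files.get(user, set())
        changed = len(library ^ old)     # files added or removed since last session
        self.files[user] = library
        self.active.add(user)
        return changed                   # metadata items transferred

    def logoff(self, user):
        self.active.discard(user)        # metadata stays in the index

    def size(self):
        return sum(len(f) for f in self.files.values())
```

In this sketch the incremental index keeps growing with every user ever seen, which mirrors the memory disadvantage listed above, while a returning user with a mostly unchanged library transfers far fewer items.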
Server architectures
  • Chained Architecture
    • Linked server nodes
    • Login
      • Metadata uploaded to the local server
      • Other server nodes unaffected
    • Query
      • Submitted to local server
      • While (not enough results AND some servers have not yet received and serviced the query)
        • local server contacts other servers
      • End While
    • Performance
      • Efficient login and downloads (local server conversation only)
      • Expensive query treatment (query forwarding, multiple query executions, results retrieval)
  • Full Replication Architecture
    • Intended to overcome the disadvantages of the chained architecture
    • Each server contains a complete index
    • Advantages
      • Single server queried
      • Login at any server (even with the incremental policy)
    • Disadvantages
      • Logins sent to all servers
      • High sensitivity to login/logoff frequency
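
The chained query loop can be sketched as follows. The function name, the ring-style traversal order, and the `matches` callback are hypothetical; the source only states that the local server keeps contacting other servers until enough results are found or every server has serviced the query.

```python
def chained_query(servers, local, matches, max_results):
    """Sketch of chained query forwarding.

    servers: list of server ids forming the chain
    local: position of the user's local server in that list
    matches: function(server_id) -> list of results held by that server
    Returns (results, number_of_servers_that_processed_the_query).
    """
    results = []
    contacted = 0
    n = len(servers)
    # Start at the local server, then follow the chain until the query
    # is satisfied or every server has serviced it.
    while len(results) < max_results and contacted < n:
        server = servers[(local + contacted) % n]
        results.extend(matches(server))
        contacted += 1
    return results[:max_results], contacted
```

Under full replication the same query would always stop after one server, since every server holds the complete index.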
Server architectures
  • Hash Architecture
    • Login
      • Metadata words are hashed to server numbers
      • A given server maintains the complete lists for a subset of all words
    • Query
      • Addressed to only one server
      • The addressed server asks other servers for the lists of the words it doesn't have
      • The addressed server merges all lists
    • Advantages
      • Limited number of servers involved in processing each query
      • Limited number of servers update metadata
      • No results traffic (only lists)
    • Disadvantages
      • High bandwidth for list manipulation
  • Unchained architecture
    • Set of independent servers
    • Login
      • To one isolated server
      • No other servers are affected
    • Query
      • Processed only by the server the user is logged in to
    • Advantage
      • Scalability
    • Disadvantage
      • Partial functionality
      • Limited query results
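
A minimal sketch of the hash architecture, assuming words are assigned to servers with a deterministic hash (CRC32 is used here purely for illustration; the source does not name a hash function). The class and function names are hypothetical, and the network transfer of lists between servers is modelled only by recording which servers are involved.

```python
import zlib
from collections import defaultdict


def server_for(word, num_servers):
    # Deterministic hash so that every node maps a word to the same server.
    return zlib.crc32(word.encode("utf-8")) % num_servers


class HashCluster:
    def __init__(self, num_servers):
        self.num_servers = num_servers
        # One inverted index (word -> set of filenames) per server.
        self.indexes = [defaultdict(set) for _ in range(num_servers)]

    def add_file(self, filename):
        touched = set()
        for word in filename.lower().split():
            s = server_for(word, self.num_servers)
            self.indexes[s][word].add(filename)
            touched.add(s)
        return touched  # servers that had to update metadata

    def query(self, words):
        lists, involved = [], set()
        for word in words:
            word = word.lower()
            s = server_for(word, self.num_servers)
            involved.add(s)  # server holding the complete list for this word
            lists.append(self.indexes[s].get(word, set()))
        # The addressed server merges (intersects) the collected lists.
        results = set.intersection(*lists) if lists else set()
        return sorted(results), involved
```

At most one server per distinct word takes part in a query or an update, which is the "limited number of servers" advantage; the lists that have to travel between servers are the bandwidth cost.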
Query model
  • Needed for comparing systems
  • Goals
    • Estimate the number of query results
    • Estimate the number of servers needed to process a query
  • Initial computations for the chained architecture (the most complex case)
  • Subsequent computations derived from it (by relaxing or specializing the chained architecture conditions)
Query model (continued): Chained architecture
  • Assume a query universe q1, q2, …
  • g = the probability function that describes query popularity, i.e. g(i) is the probability that a submitted query happens to be query i
  • f = the probability density function that describes query selection power: a given file in a user's library matches query i with probability f(i)
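
To see how g and f combine, here is a minimal sketch of the expected number of matches, assuming each file matches query i independently with probability f(i). It computes only the uncapped expectation; it ignores the maximum-results cutoff and the number of servers, so it is not the full model from the article. The function names and sample numbers are illustrative.

```python
def expected_results_for_query(f_i, num_files):
    # Each of num_files files matches query i independently with probability f(i),
    # so the number of matches is binomial with mean num_files * f(i).
    return num_files * f_i


def expected_results(g, f, num_files):
    # Average over the query universe, weighting query i by its popularity g(i).
    if abs(sum(g) - 1.0) > 1e-9:
        raise ValueError("g must sum to 1")
    return sum(g_i * expected_results_for_query(f_i, num_files)
               for g_i, f_i in zip(g, f))
```

For example, with three queries of popularity 0.5, 0.3, 0.2 and selection power 0.1, 0.05, 0.01 over 1000 files, the weighted sum is 0.5·100 + 0.3·50 + 0.2·10 = 67 expected matches.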
Query model (continued)
  • Full replication
    • ExServ = 1 → all results are local
    • ExRemoteResults = 0
  • Unchained
    • ExServ = 1 → all results are local
    • ExRemoteResults = 0
  • Hash
    • ExRemoteResults = 0

For music sharing, g and f can realistically be taken as exponential distributions.

Performance model
  • Shows how to measure the performance of a P2P system
  • NumServers (LAN, WAN)
  • Users (LAN, WAN)
  • {LAN, WAN} × {LAN, WAN}
  • Compute action costs in terms of:
    • CPU cycles
    • Interserver communication bandwidth
    • User-server communication bandwidth
CPU consumption

CPU cost variations for chained architecture (batched and incremental)


  • CPU cost variations for the other architectures (relative to the chained one)
  • Unchained & Full replication
    • the query cost formula (batch and incremental) is the same
    • …but with ExServ = 1 and ExRemoteResults = 0
  • Hash
    • additional query cost for list transfer
Network consumption

Client-Server byte costs

Interserver byte costs

  • Full replication
    • each server sees each Login, AddFile, RemoveFile
    • LAN → each message is broadcast once
    • WAN → each message is sent NumServers − 1 times by the local server
  • Hash
    • each selected server sees each Login, AddFile, RemoveFile
    • LAN → each message is broadcast once
    • WAN → AddFile is sent once to each server holding lists for words in the file name
  • Unchained
    • no interserver communication
    • zero login cost
  • Chained
    • interserver communication for queries
    • no interserver communication for logins
    • zero login cost
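
The interserver message counts for update actions (Login, AddFile, RemoveFile) listed above can be collected in one function. This is a sketch that counts messages, not bytes; the function name and the `word_servers` parameter are hypothetical, and query traffic in the chained architecture is deliberately left out.

```python
def interserver_messages(architecture, network, num_servers, word_servers=0):
    """Messages sent between servers for one Login/AddFile/RemoveFile action.

    architecture: "FR", "HASH", "UCH" or "CHN"
    network: "LAN" (broadcast available) or "WAN"
    word_servers: for HASH on a WAN, the number of servers holding lists
                  for the words in the file name (hypothetical parameter)
    """
    if architecture not in ("FR", "HASH", "UCH", "CHN"):
        raise ValueError(f"unknown architecture: {architecture}")
    if network not in ("LAN", "WAN"):
        raise ValueError(f"unknown network: {network}")
    if architecture in ("UCH", "CHN"):
        return 0                  # updates stay on the local server
    if network == "LAN":
        return 1                  # each message is broadcast once
    if architecture == "FR":
        return num_servers - 1    # local server contacts every other server
    return word_servers           # HASH on WAN: one message per relevant server
```

With five servers on a WAN, a full-replication login therefore costs four interserver messages, while the same login costs none in the chained or unchained case.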
Overall performance
  • Hypothesis: the cost formula for each action is known
  • Performance metric: UsersPerServer
  • How can a global formula for UsersPerServer be computed? (A direct derivation is complex.)
  • For each resource i
    • Assume infinite resources of the other 2 types
    • Compute UsersPerServer for the current resource (UsersPerServer_i)
  • Take the minimum over i of UsersPerServer_i
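
The per-resource procedure above amounts to finding the bottleneck resource. A minimal sketch, assuming each resource has a fixed capacity per second and each user imposes a fixed average load per second; the resource names and all numbers in the usage note are made up for illustration.

```python
def users_per_server(capacity, cost_per_user):
    """capacity and cost_per_user map a resource name to units per second.

    Returns (users supported, bottleneck resource, per-resource limits).
    """
    # Limit imposed by each resource if the other resources were infinite.
    per_resource = {
        resource: capacity[resource] / cost_per_user[resource]
        for resource in capacity
    }
    # The overall limit is set by the scarcest resource.
    bottleneck = min(per_resource, key=per_resource.get)
    return per_resource[bottleneck], bottleneck, per_resource
```

For instance, with hypothetical capacities of 1000 (CPU), 500 (interserver bandwidth) and 800 (user-server bandwidth) units per second and per-user loads of 2, 0.5 and 4, the limits are 500, 1000 and 200 users, so user-server bandwidth is the bottleneck at 200 users.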


  • Results of performance studies
  • Music sharing systems
  • Sharing systems for domains other than music
  • Metric: maximum number of users (throughput, not response time)
  • Architectures = {CHN, FR, HASH, UCH}
  • Login policies = {batch, incremental}
  • Strategies = Architectures × Login policies
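
The strategy space is the Cartesian product of architectures and login policies. A short sketch, assuming the four architectures are abbreviated CHN, FR, HASH and UCH as in the results that follow:

```python
from itertools import product

ARCHITECTURES = ["CHN", "FR", "HASH", "UCH"]
LOGIN_POLICIES = ["batch", "incremental"]

# Every (architecture, login policy) pair is one strategy to evaluate.
STRATEGIES = list(product(ARCHITECTURES, LOGIN_POLICIES))
```

This yields 4 × 2 = 8 strategies, such as `("FR", "incremental")` or `("CHN", "batch")`.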
Music share systems behaviour

Plots for MaxResults = 100, with axes: QueryLoginRatio, number of logins/sec, users supported; available files, expected number of results.
  • Ex: For QueryLoginRatio = 1:
    • Incremental FR = 54203
    • Batch FR = 7281
  • QueryLoginRatio increases → logins/sec decrease → performance increases
  • Incremental strategies outperform their batch counterparts
  • CHN & UCH perform better than FR & HASH
  • UCH → CHN (conserves performance but increases returned results)
  • Paradox: UCH is used more than CHN
  • QueryLoginRatio sensitivity
Memory analysis

No previous treatment of memory implications

Batch strategies are better than their incremental counterparts


Memory as a function of NumServers (for FR)

Memory of incremental = (1 / ActiveFrac) × memory of batch

As ActiveFrac increases, incremental strategies come closer to batch.
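
The memory relation between the two policies is simple enough to check numerically. A minimal sketch, with a hypothetical function name and made-up units:

```python
def incremental_memory(batch_memory, active_frac):
    """Memory needed by an incremental strategy, given the batch requirement.

    active_frac: fraction of users that are active (logged in), in (0, 1].
    """
    if not 0 < active_frac <= 1:
        raise ValueError("active_frac must be in (0, 1]")
    return batch_memory / active_frac
```

If only a quarter of the users are active, the incremental index needs four times the batch memory; as `active_frac` approaches 1 the two requirements coincide.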

Low memory prices may eliminate worries about memory limitations

Small analysis

Ex. 1: QueryLoginRatio = 0.75 (incremental vs. batch CHN): (69708, 26828) vs. (12268, 28828) → take batch

Ex. 2: QueryLoginRatio = 0.25 (incremental vs. batch CHN): (52088, 9190) vs. (12268, 9190) → take incremental

Beyond music…
  • We can generally compute
    • Expected number of results of a query
    • Expected number of servers needed to satisfy the query
  • …using
    • g(): distribution of query frequency
    • f(): distribution of selection power
  • f and g are the inputs to the general query model
  • For music, f and g are exponential and positively correlated (the more popular a query is, the greater its selection power); all preceding results assume this
  • What about a stock (product inventory) database?
    • Select * from Product where price>10 (rare query) returns as many results as Select * from Product (common query)
    • No correlation
  • What about an archive-driven company?
    • Rare queries (for old articles) return many results
    • Frequent queries (for new articles) return few results
    • Negative correlation
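
The effect of the correlation between g and f can be illustrated numerically. The sketch below compares the probability that a random file matches a randomly submitted query under positive, zero, and negative correlation. The exponential shapes, the scale of 20, the factor 0.05, and the universe of 100 queries are all arbitrary assumptions chosen for the example.

```python
import math


def exponential_pmf(n, scale):
    # Discretised, normalised exponential over query ranks 1..n.
    weights = [math.exp(-i / scale) for i in range(1, n + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def expected_selectivity(g, f):
    # Probability that a random file matches a randomly submitted query.
    return sum(g_i * f_i for g_i, f_i in zip(g, f))


n = 100
g = exponential_pmf(n, scale=20.0)  # popularity, most popular query first
f_values = [0.05 * math.exp(-i / 20.0) for i in range(1, n + 1)]

# Positive correlation: popular queries also match many files (music).
positive = expected_selectivity(g, f_values)
# Negative correlation: popular queries match few files (archive).
negative = expected_selectivity(g, list(reversed(f_values)))
# No correlation: selection power does not depend on popularity (stock).
mean_f = sum(f_values) / n
uncorrelated = expected_selectivity(g, [mean_f] * n)
```

With the same set of selection powers, the positively correlated case gives the most matches per submitted query and the negatively correlated case the fewest, which is why the music results do not carry over directly to other domains.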
Final Conclusions
  • Chained
    • Best for music sharing today
    • Good login performance, least memory
    • Poor if many servers are involved
  • Full replication
    • Potentially good in the future, when connections are more stable
  • Hash
    • Has high bandwidth requirements
    • Good in the future, or in systems where servers do not need to exchange large amounts of metadata
  • Unchained
    • Not recommended
    • Returns fewer results for only a small performance improvement
    • Good when the number of results is not important
  • Incremental policy
    • Good for systems with negative correlation