
Comparing Hybrid Peer-to-Peer Systems

Comparing Hybrid Peer-to-Peer Systems, based on an article by Hector Garcia-Molina and Beverly Yang; by Tudor Balan.


Presentation Transcript


  1. Comparing Hybrid Peer-to-Peer Systems, based on an article by Hector Garcia-Molina and Beverly Yang; by Tudor Balan

  2. P2P short survey
     P2P advantages
     • The resources of many computers can be pooled into large stores of information and significant computing power.
     • Network bandwidth use improves significantly because computers communicate directly.
     P2P drawbacks
     • Due to the decentralized nature of the system.
     • Ex. Gnutella (network flooding, no scalability)
     Improvements
     • Ex. Napster (restricted server search, fractional indexing)
     Goal
     • Study the functionality of P2P systems in order to understand their tradeoffs.
     • Concentrate on data sharing and hybrid P2P systems.

  3. Data sharing overview
     Data sharing systems
     • Pure data sharing systems
     • Hybrid data sharing systems
     Hybrid data sharing systems are hugely popular, but … are they also well studied?
     • What is the best way to organize server nodes?
     • Should indexes be replicated?
     • Which queries do users commonly ask?
     • How should disconnected users be treated?

  4. Problem analysis and treatment
     • Present several architectures for P2P data sharing systems, either already in use or proposed.
     • Give a probabilistic model for user queries and for the result size.
     • Describe a model for evaluating system performance.
     • Based on the above, compare the architectures.

  5. Server architecture: General concepts
     • Login
       • library
       • connecting
       • metadata upload
       • index
       • connection information (client IP, line speed)
       • local server
       • remote servers
       • local users
     • Query
       • list of desired words
       • satisfied (max nr of results reached)
       • query processing (retrieve the list for each query word and intersect the lists)
     • Download
       • library enrichment notification
       • index update
       • server notification when a remove or logoff occurs
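
The query-processing step on this slide (retrieve the list for each query word, intersect the lists, stop at the maximum number of results) can be sketched as follows. This is a minimal illustration under assumptions, not the system described in the article: `build_index`, `run_query`, and the sample titles are hypothetical names and data.

```python
from collections import defaultdict

def build_index(library):
    """Map each lowercase word of a file's metadata to the set of file ids containing it."""
    index = defaultdict(set)
    for file_id, title in library.items():
        for word in title.lower().split():
            index[word].add(file_id)
    return index

def run_query(index, words, max_results):
    """Intersect the inverted lists of all query words; return at most max_results ids."""
    lists = [index.get(w.lower(), set()) for w in words]
    if not lists:
        return []
    matches = set.intersection(*lists)
    return sorted(matches)[:max_results]

# Hypothetical three-file library uploaded at login.
library = {1: "Blue Moon live", 2: "Blue Train", 3: "Moon River"}
index = build_index(library)
print(run_query(index, ["blue", "moon"], max_results=100))
```

A query is "satisfied" in this sketch once `max_results` ids are available; a real server would likely stop scanning early instead of truncating at the end.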

  6. Server architectures: Login policies
     • Batch
       • Login → the entire library metadata is uploaded
       • Logoff → the entire library metadata is removed
       • Index = {metadata of active users}
       • Advantages
         • Small index
         • Increased query efficiency
       • Disadvantages
         • Intense and expensive metadata updates
     • Incremental
       • Metadata persists between sessions
       • Only differences are uploaded at login
       • Advantages
         • Less effort at login/logoff
       • Disadvantages
         • Increased memory requirements
         • Penalty on query efficiency
         • Need to connect to the same server (sometimes)
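
The difference between the two login policies can be sketched with two toy index classes. This is an illustrative sketch only; the class names, the per-user dictionary layout, and the "changed entries" count are assumptions made for the example, not structures taken from the article.

```python
class BatchIndex:
    """Batch policy: the index holds metadata of active users only."""
    def __init__(self):
        self.files = {}  # user -> set of file names

    def login(self, user, library):
        self.files[user] = set(library)   # full library uploaded at every login
        return len(self.files[user])      # entries transferred

    def logoff(self, user):
        self.files.pop(user, None)        # all of the user's metadata is removed

    def size(self):
        return sum(len(f) for f in self.files.values())


class IncrementalIndex:
    """Incremental policy: metadata persists; only differences are sent at login."""
    def __init__(self):
        self.files = {}
        self.active = set()

    def login(self, user, library):
        old = self.files.get(user, set())
        new = set(library)
        self.files[user] = new
        self.active.add(user)
        return len(new ^ old)             # entries added or removed since last session

    def logoff(self, user):
        self.active.discard(user)         # metadata stays in the index

    def size(self):
        return sum(len(f) for f in self.files.values())


batch, incremental = BatchIndex(), IncrementalIndex()
for index in (batch, incremental):
    index.login("alice", ["a.mp3", "b.mp3"])
    index.logoff("alice")
print(batch.size(), incremental.size())
print(incremental.login("alice", ["a.mp3", "b.mp3", "c.mp3"]))
```

After logoff the batch index is empty while the incremental index still holds the inactive user's entries, which is the memory and query-efficiency cost listed above; the second login transfers only the one new entry.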

  7. Server architectures
     • Chained Architecture
       • Linked server nodes
       • Login
         • Metadata is uploaded to the local server
         • Other server nodes are unaffected
       • Query
         • Submitted to the local server
         • While (not enough results AND not all servers have received and serviced the query)
           • the local server contacts other servers
         • End While
       • Performance
         • Efficient logins and downloads (conversation with the local server only)
         • Expensive query processing (query forwarding, multiple query executions, results retrieval)
     • Full Replication Architecture
       • Intended to overcome the previous disadvantages
       • Each server contains a complete index
       • Advantages
         • A single server is queried
         • Login at any server (even with the incremental policy)
       • Disadvantages
         • Logins are sent to all servers
         • Sensitive to high login/logoff frequency
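
The chained query loop can be sketched as below. It is a simplified illustration under assumptions: servers are modeled as plain lists of file names, the chain is visited in ring order starting at the local server, and `chained_query` is a hypothetical helper, not code from the article.

```python
def chained_query(servers, local, matches, max_results):
    """Query the local server first, then forward along the chain until satisfied.

    servers: list of per-server file lists; local: index of the user's server.
    Returns (results, number of servers that processed the query).
    """
    results = []
    contacted = 0
    n = len(servers)
    # Continue while the query is unsatisfied and some server has not yet serviced it.
    while len(results) < max_results and contacted < n:
        server = servers[(local + contacted) % n]
        results.extend(f for f in server if matches(f))
        contacted += 1
    return results[:max_results], contacted


# Hypothetical chain of three servers.
servers = [["a1", "b1"], ["a2"], ["a3", "a4"]]
starts_with_a = lambda name: name.startswith("a")
print(chained_query(servers, 0, starts_with_a, max_results=2))
print(chained_query(servers, 0, starts_with_a, max_results=10))
```

The second return value corresponds to the number of servers needed to process the query, which is what makes chained queries expensive when results are scarce.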

  8. Server architectures
     • Hash Architecture
       • Login
         • Metadata words are hashed to servers
         • A given server maintains the complete lists for a subset of all words
       • Query
         • Addressed to only one server
         • The addressed server asks other servers for the lists of the words it does not have
         • The addressed server merges all lists
       • Advantages
         • Limited nr of servers involved in processing each query
         • Limited nr of servers update metadata
         • No results traffic (only lists)
       • Disadvantages
         • High bandwidth for list manipulation
     • Unchained Architecture
       • Set of independent servers
       • Login
         • To one isolated server
         • No other servers are affected
       • Query
         • Sent only to the server the user is logged on to
       • Advantage
         • Scalability
       • Disadvantage
         • Partial functionality
         • Limited query results
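
A minimal sketch of the hash architecture follows. The choice of CRC32 as the hash, the dictionary-per-server layout, and the function names are all assumptions made for illustration; the slide does not say which hash function or data structures are used.

```python
import zlib

def server_for(word, num_servers):
    """Assign a word to a server with a deterministic hash (CRC32, an arbitrary choice)."""
    return zlib.crc32(word.lower().encode("utf-8")) % num_servers

def login(servers, file_id, title):
    """Send each metadata word to the server responsible for it; return the servers updated."""
    touched = set()
    for word in title.lower().split():
        s = server_for(word, len(servers))
        servers[s].setdefault(word, set()).add(file_id)
        touched.add(s)
    return touched

def query(servers, words):
    """Fetch each word's list from its owning server and intersect the lists."""
    lists = [servers[server_for(w, len(servers))].get(w.lower(), set()) for w in words]
    return set.intersection(*lists) if lists else set()


servers = [{} for _ in range(4)]   # each server: word -> set of file ids
login(servers, 1, "Blue Moon")
login(servers, 2, "Blue Train")
print(query(servers, ["blue", "moon"]))
```

Only the servers owning a word of the file name are updated at login, and only lists (not results) travel between servers at query time; the sketch does not model which server the user addresses or the bandwidth cost of shipping long lists.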

  9. Query model
     • Needed for comparing systems
     • Goals
       • Estimate the number of query results
       • Estimate the nr of servers needed to process a query
     • Initial computations for the Chained architecture (more complex)
     • Subsequent derived computations (relaxing or particularizing the chained architecture conditions)

  10. Query model (continued): Chained architecture
      • Assume a query universe q1, q2, …
      • g = the probability function that describes query popularity, i.e., g(i) is the probability that a submitted query happens to be query i
      • f = the probability density function that describes the query selection power. If we take a given file in a user's library, it will match query i with probability f(i)
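
From these two definitions, the expected number of files matching a randomly submitted query follows by linearity of expectation: with N files, query i matches N·f(i) files on average, so the overall expectation is the sum over i of g(i)·N·f(i). The sketch below computes that quantity. It ignores the cap on the number of results, and the five-query universe and decay shapes are hypothetical values chosen only to make the example run.

```python
import math

def normalized(weights):
    """Scale non-negative weights so they sum to 1."""
    total = sum(weights)
    return [w / total for w in weights]

def expected_matches(g, f, num_files):
    """E[matches] = sum_i g(i) * num_files * f(i), by linearity of expectation."""
    return num_files * sum(gi * fi for gi, fi in zip(g, f))

# Hypothetical universe of 5 queries; shapes are for illustration only.
g = normalized([math.exp(-i / 2.0) for i in range(5)])   # query popularity, sums to 1
f = [0.1 * math.exp(-i / 2.0) for i in range(5)]          # per-file match probability
print(expected_matches(g, f, num_files=1000))
```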

  11. Query model (continued)
      • Full replication
        • ExServ = 1 → all results are local
        • ExRemoteResults = 0
      • Unchained
        • ExServ = 1 → all results are local
        • ExRemoteResults = 0
      • Hash
        • ExRemoteResults = 0

  12. Particularization
      In the case of music sharing, g and f might realistically be taken as: [formulas not preserved in this transcript]

  13. Performance model
      • Illustrates how to measure the performance of a P2P system
      • NumServers (LAN, WAN)
      • Users (LAN, WAN)
      • {LAN, WAN} × {LAN, WAN}
      • Compute action costs in terms of:
        • CPU cycles
        • Interserver communication bandwidth
        • User-server communication bandwidth

  14. CPU consumption
      CPU cost variations for the chained architecture (batch and incremental)
      Interpretation
      • CPU cost variations for other architectures (relative to the chained one)
        • Unchained & Full replication
          • the query cost formula (batch & incremental) is the same
          • …but ExServ = 1 and ExRemoteResults = 0
        • Hash
          • additional cost for list transfer (in query costs)

  15. Network consumption
      Client-Server byte costs
      Interserver byte costs
      • Full replication
        • each server sees each Login, AddFile, RemoveFile
        • LAN → each message is broadcast once
        • WAN → each message is sent NumServers-1 times by the local server
      • Hash
        • each selected server sees each Login, AddFile, RemoveFile
        • LAN → each message is broadcast once
        • WAN → AddFile is sent once to each server holding lists for words contained in the name of the file
      • Unchained
        • no interserver communication
        • 0 login costs
      • Chained
        • interserver communication for queries
        • no interserver communication for logins
        • 0 login costs
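
The interserver message counts for a single Login can be tabulated in a few lines. This sketch only encodes the cases stated on the slide; the function name and the string labels are assumptions, and the hash architecture is left out because its count depends on which words occur in the file names.

```python
def login_interserver_messages(architecture, network, num_servers):
    """Interserver messages triggered by one Login, following the slide's cases."""
    if architecture in ("chained", "unchained"):
        return 0                      # logins stay on the local server
    if architecture == "full_replication":
        if network == "LAN":
            return 1                  # one broadcast reaches every server
        if network == "WAN":
            return num_servers - 1    # local server sends to each other server
        raise ValueError("network must be 'LAN' or 'WAN'")
    raise ValueError("architecture not covered by this sketch")


for net in ("LAN", "WAN"):
    print(net, login_interserver_messages("full_replication", net, num_servers=5))
```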

  16. Overall performance
      • Hypothesis: the formulas for each action cost are known
      • Performance metric: UsersPerServer
      • How to compute a global formula for UsersPerServer? (Directly? … too complex)
      • For each resource i
        • Assume infinite resources of the other 2 types
        • Compute UsersPerServer for the current resource (UsersPerServer_i)
      • Take min(UsersPerServer_i) over all resources
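
The per-resource procedure amounts to finding the bottleneck resource. A minimal sketch, assuming the three per-resource limits have already been computed; the numbers below are hypothetical and not taken from the article:

```python
def users_per_server(limits):
    """Return the overall UsersPerServer and the bottleneck resource.

    limits maps a resource name to the number of users one server could support
    if that resource were the only finite one.
    """
    bottleneck = min(limits, key=limits.get)
    return limits[bottleneck], bottleneck


# Hypothetical per-resource limits.
limits = {
    "cpu": 40000,
    "interserver_bandwidth": 25000,
    "user_server_bandwidth": 60000,
}
print(users_per_server(limits))
```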

  17. Experiments
      • Results of performance studies
        • Music sharing systems
        • Sharing systems for domains other than music
      • Metric: maximum number of users (throughput, not response time)
      • Architectures = {CHN, FR, HASH, UCH}
      • Login policies = {batch, incremental}
      • Strategies = Architectures × Login policies

  18. Music sharing systems behaviour
      For MaxResults = 100:
      • QueryLoginRatio
      • nr of logins/sec
      • users supported, available files, expected nr of results
      Observations
      • Ex: for QueryLoginRatio = 1:
        • Incremental FR = 54203
        • Batch FR = 7281
      • QueryLoginRatio increases → logins/sec decrease → performance increases
      • Incremental strategies outperform their batch counterparts
      • CHN & UCH perform better than FR & HASH
      • UCH → CHN (conserves performance but increases returned results)
      • Paradox: UCH is more widely used than CHN
      • QueryLoginRatio sensitivity

  19. Memory analysis
      • No previous treatment of memory implications
      • Batch strategies are better than their incremental counterparts
      • Memory = f(NumServers, ActiveFrac)
        • As NumServers grows, memory grows (for FR)
        • Mem of incremental = (1/ActiveFrac) × Mem of batch
        • As ActiveFrac grows, incremental strategies come closer to batch
      • Memory prices may eliminate worries about memory limitations
      Small analysis
      • Ex1. QueryLoginRatio = .75 (incr & batch CHN comparison): (69708,26828) vs (12268,28828) → take batch
      • Ex2. QueryLoginRatio = .25 (incr & batch CHN comparison): (52088,9190) vs (12268,9190) → take incremental
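
The stated relation between the two policies, Mem of incremental = Mem of batch / ActiveFrac, can be checked numerically. The sketch below assumes ActiveFrac is a fraction in (0, 1] and uses an arbitrary memory unit; both the function name and the sample values are hypothetical.

```python
def incremental_memory(batch_memory, active_frac):
    """Mem(incremental) = Mem(batch) / ActiveFrac, with 0 < ActiveFrac <= 1."""
    if not 0 < active_frac <= 1:
        raise ValueError("active_frac must be in (0, 1]")
    return batch_memory / active_frac


# Hypothetical batch index of 100 memory units.
for frac in (0.1, 0.5, 1.0):
    print(frac, incremental_memory(100.0, frac))
```

As ActiveFrac approaches 1 the two policies need the same memory, which matches the observation that incremental strategies come closer to batch.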

  20. Beyond music…
      • In general we can compute
        • the expected nr of results of a query
        • the expected nr of servers needed to satisfy the query
      • …using
        • g() → distribution of query frequency
        • f() → distribution of selection power
      • f and g are the inputs of the general query model
      • For music, f and g are exponential and positively correlated (the more popular a query is, the greater its selection power) → all preceding results
      • What if we have a stock database?
        • Select * from Product where price>10 (rare query) returns as many results as
        • Select * from Product (common query)
        • No correlation
      • What about an archive-driven company?
        • Rare queries (for old articles) return many results
        • Frequent queries (for new articles) return few results
        • Negative correlation
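
The effect of the correlation between g and f on the expected number of results can be illustrated with a toy computation. Everything here is an assumption for illustration: the six-query universe, the halving weights, and the way the three cases are built (same ordering, constant f, reversed ordering) are not taken from the article.

```python
def expected_results(g, f, num_files):
    """Expected matches for a random query: num_files * sum_i g(i) * f(i)."""
    return num_files * sum(gi * fi for gi, fi in zip(g, f))


weights = [2.0 ** -i for i in range(6)]
g = [w / sum(weights) for w in weights]            # popularity, most popular first

f_positive = [0.2 * 2.0 ** -i for i in range(6)]   # popular queries match most files
f_negative = list(reversed(f_positive))            # popular queries match fewest files
mean_f = sum(f_positive) / len(f_positive)
f_none = [mean_f] * len(f_positive)                # selection power unrelated to popularity

for name, f in (("positive", f_positive), ("none", f_none), ("negative", f_negative)):
    print(name, round(expected_results(g, f, num_files=1000), 1))
```

With the same overall selection power, positive correlation yields the most results per query and negative correlation the fewest, so queries are harder to satisfy locally in the negatively correlated case.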

  21. Performance variation as a function of correlation

  22. Final Conclusions
      • Chained
        • Best for music today
        • Good login performance, least memory
        • Poor if many servers are involved
      • Full replication
        • Potentially good in the future, when connections are more stable
      • Hash
        • Has high bandwidth requirements
        • Good in the future, or in systems where servers must not exchange large amounts of metadata
      • Unchained
        • Not recommended
        • Few results for only a small performance improvement
        • Good when the nr of results is not important
      • Incremental policy
        • Good for systems with negative correlation
