
On Improving the Performance Dependability of Unstructured P2P Systems via Replication



Presentation Transcript


  1. On Improving the Performance Dependability of Unstructured P2P Systems via Replication. ANIRBAN MONDAL, YI LIFU, MASARU KITSUREGAWA. Institute of Industrial Science, University of Tokyo. E-mail: anirban@tkl.iis.u-tokyo.ac.jp

  2. PRESENTATION OUTLINE Introduction Related Work System Overview Proposed Replication Scheme Performance Evaluation Conclusion and Future Work

  3. INTRODUCTION • P2P systems are becoming increasingly popular. • A dependable P2P system is the need of the hour. • Two perspectives of dependability: system reliability (the availability of the individual peers) and system performance (data availability). • We define a performance-dependable P2P system as one that users can rely on for obtaining data files of their interest in real time. • We focus on improving the performance-dependability of unstructured P2P systems via dynamic replication.

  4. Motivation • Free-riders: a majority of the peers typically download data from a small percentage of peers that offer data. • High skews in the initial data distribution: a disproportionately high number of queries need to be answered by a few ‘hot’ peers. • Severe load imbalance throughout the system: job queues of the ‘hot’ peers keep increasing. • Increased waiting times lead to high response times.

  5. Motivation (cont.) • Taken together, free-riding, skewed data distribution, load imbalance and the resulting high response times decrease the dependability of the system.

  6. The Challenges • Sheer size of P2P networks • Heterogeneity • CPU capacity • Available disk space • Transfer rate of connections • Dynamism of the environment • Peers joining / leaving the system • Hot data becoming cold and vice versa

  7. MAIN CONTRIBUTIONS • A dynamic data placement strategy involving data replication • Objective: to reduce the loads of the overloaded peers • A dynamic query redirection technique • Objective: to reduce response times


  9. RELATED WORK • Broadcast (Gnutella) • Centralized (Napster) • Routing indices [Crespo2002] • Distributed hash tables: Chord [Stoica2001], Pastry [Rowstron2001]

  10. RELATED WORK (CONT.) • [Kangasharju2002] investigates optimal replication of content in P2P systems and presents an adaptive, fully distributed algorithm that dynamically replicates content in a near-optimal manner. • [Cohen2002, Lv2002] facilitate search via replication. • Dependability via load-balancing in structured P2P systems (using DHTs): [Dabek2001], [Rao2003]. • [Triantafillou2003] divides the system into clusters based on semantic categories and discusses dependability via inter-cluster and intra-cluster load-balancing.

  11. How does this proposal differ from our previous spatial GRID proposal? • Our GRID-related work: imposes structure on the system; data movement in the KB range; avoids data scattering; individual nodes are usually dedicated and expected to be available most of the time; main aim is load-balancing. • This proposal: no structure imposed; data movement in the MB/GB range; data scattering is acceptable; individual nodes may join or leave at any time; main aim is replication, not load-balancing.


  13. SYSTEM OVERVIEW • Each peer is assigned a globally unique identifier PID. • Search is broadcast-based. • Every peer maintains its own access statistics: the number of accesses made to each of its data files, and the list of peers which have downloaded each of its files. • Given that very ‘hot’ files may be aggressively downloaded by hundreds of peers very quickly, a peer keeps track only of those peers which have downloaded directly from itself. • Every peer provides a certain amount Space of its disk space for replication. • An LRU scheme is deployed for Space, with periodic deletion of unused replicas. • We sacrifice replica consistency for improving query response times.
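The LRU handling of Space can be illustrated with a small Python sketch. It is an assumption-laden illustration rather than the authors' implementation: the class name `ReplicaStore`, its methods, and the byte-based accounting are all hypothetical, and the periodic deletion of unused replicas is not modelled.

```python
from collections import OrderedDict


class ReplicaStore:
    """Hypothetical model of the disk area (`Space`) a peer reserves for replicas."""

    def __init__(self, space_bytes):
        self.space_bytes = space_bytes   # total budget, the slide's `Space`
        self.used_bytes = 0
        self._replicas = OrderedDict()   # file_id -> size, least recently used first

    def touch(self, file_id):
        """Mark a replica as recently used; return False if it is not stored."""
        if file_id not in self._replicas:
            return False
        self._replicas.move_to_end(file_id)
        return True

    def add(self, file_id, size):
        """Store a replica, evicting least recently used ones; return evicted ids."""
        if size > self.space_bytes:
            raise ValueError("replica is larger than the whole replication budget")
        if file_id in self._replicas:
            # Re-adding an existing replica replaces the old copy.
            self.used_bytes -= self._replicas.pop(file_id)
        evicted = []
        while self.used_bytes + size > self.space_bytes:
            old_id, old_size = self._replicas.popitem(last=False)
            self.used_bytes -= old_size
            evicted.append(old_id)
        self._replicas[file_id] = size
        self.used_bytes += size
        return evicted

    def free_bytes(self):
        """Space still available for new replicas."""
        return self.space_bytes - self.used_bytes
```

For example, with a budget of 100 bytes, adding `a` and `b` at 40 bytes each, touching `a`, and then adding `c` at 40 bytes evicts `b`, the least recently used replica.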

  14. SYSTEM OVERVIEW (CONT.) • Distance between two peers: communication time between them • Two peers are regarded as neighbours if they are directly connected to each other. • Periodic exchange of status messages between neighbours • Load information • Available disk space information

  15. SYSTEM OVERVIEW (CONT.) • Load of a peer: number of queries waiting in peer’s job queue • Load normalized w.r.t. CPU capacity • Assumptions • Peers know transfer rates between themselves and other peers. • Every peer knows availability information of its neighbouring peers.
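The slides do not state how load is normalized with respect to CPU capacity. One plausible reading, shown here purely as an assumption, divides the job-queue length by a relative capacity factor. The `PeerStatus` record and its field names are hypothetical; they bundle the load and disk-space information that neighbours exchange in status messages.

```python
from dataclasses import dataclass


@dataclass
class PeerStatus:
    """Hypothetical status message exchanged between neighbouring peers."""
    pid: str
    queue_length: int    # number of queries waiting in the peer's job queue
    cpu_capacity: float  # relative CPU capacity (assumed: 1.0 = reference peer)
    free_space: int      # available disk space for replicas, in bytes

    def normalized_load(self) -> float:
        """Queue length scaled by CPU capacity (assumed normalization)."""
        if self.cpu_capacity <= 0:
            raise ValueError("cpu_capacity must be positive")
        return self.queue_length / self.cpu_capacity
```

Under this reading, a peer with 30 queued queries and twice the reference capacity carries the same normalized load as a reference peer with 15 queued queries.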


  17. Replication Scheme • Each peer P periodically checks its neighbours’ loads. • If P’s load exceeds the average load of its neighbouring peers by 10%, replication is initiated. • Selection of hot data files, using recent access statistics: P sorts its files in descending order of access frequency, traverses this sorted list, and selects as ‘hot’ files the top N files whose access frequency exceeds a pre-defined threshold Tfreq. • Number of replicas: for every Nd accesses to a data file D, a new replica of D is created. • Tfreq and Nd are pre-specified at design time.
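The three rules on this slide can be sketched in Python as follows. The function names and signatures are hypothetical, and "exceeds the average by 10%" is interpreted here as "greater than 1.1 times the neighbours' average load", which is an assumption about the intended comparison.

```python
def should_replicate(own_load, neighbour_loads, margin=0.10):
    """True if own load exceeds the neighbours' average load by `margin`."""
    if not neighbour_loads:
        return False  # assumed: no neighbours, nothing to compare against
    average = sum(neighbour_loads) / len(neighbour_loads)
    return own_load > average * (1 + margin)


def select_hot_files(access_counts, t_freq, top_n):
    """Return up to `top_n` file ids whose access frequency exceeds `t_freq`.

    `access_counts` maps file id -> recent access count.
    """
    ranked = sorted(access_counts.items(), key=lambda kv: kv[1], reverse=True)
    return [file_id for file_id, count in ranked[:top_n] if count > t_freq]


def replicas_due(access_count, n_d):
    """Number of replicas implied by one new replica per `n_d` accesses."""
    if n_d <= 0:
        raise ValueError("n_d must be positive")
    return access_count // n_d
```

For instance, a peer with load 12 whose two neighbours both have load 10 would start replication, while a peer with load 10.5 would not.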

  18. Criteria for Selection of destination peer Dest for replication • Dest should have a high probability of being online. • Dest should have adequate available disk space. • Load difference with Dest should be significant. • Transfer time TRep with Dest should be minimized. • Dest should be chosen from the peers which have already downloaded that data file; this makes TRep effectively equal to 0.

  19. Replication Strategy • For each ‘hot’ data file D, the ‘hot’ peer PHot sends a message to each peer which has downloaded D • The peers in which a copy of D exists reply to PHot with their respective load and available disk space • Only the peers with high availability and sufficient available disk space are candidates • Among these candidate peers, PHot first puts the peer MIN with the lowest load into a set Candidate. • Peers whose normalized load difference with MIN is less than δ are also put into Candidate. • δ is a small integer • The peer in Candidate whose available disk space is maximum is selected as the destination peer.

  20. Algorithm for selecting the destination peer
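The selection steps listed on slide 19 can be sketched in Python as follows. This is an illustrative reading, not the authors' algorithm: the `Reply` record, the `min_availability` threshold used to express "high availability", and the function name are all assumptions.

```python
from dataclasses import dataclass


@dataclass
class Reply:
    """Hypothetical reply from a peer that has already downloaded file D."""
    pid: str
    load: float          # normalized load reported by the peer
    free_space: int      # available disk space, in bytes
    availability: float  # estimated probability of being online (assumed metric)


def select_destination(replies, file_size, delta, min_availability=0.8):
    """Pick the destination peer for a replica, or None if nobody qualifies."""
    # Step 1: keep only peers with high availability and enough disk space.
    eligible = [r for r in replies
                if r.availability >= min_availability and r.free_space >= file_size]
    if not eligible:
        return None
    # Step 2: the set Candidate holds MIN (lowest load) plus every peer whose
    # load difference with MIN is less than delta.
    min_load = min(r.load for r in eligible)
    candidate = [r for r in eligible
                 if r.load == min_load or r.load - min_load < delta]
    # Step 3: among Candidate, choose the peer with the most available space.
    return max(candidate, key=lambda r: r.free_space)
```

Choosing by disk space only after narrowing to near-minimum loads keeps the replica on a lightly loaded peer while reducing the chance that the replica is evicted soon afterwards.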

  21. Query Redirection to replicas • What happens when a peer PIssue issues a query Q for a data item D to a ‘hot’ peer PHot? • PHot needs to redirect Q to a peer REDIRECT containing D’s replica, if any such replica exists. • Objective: to minimize Q’s response time. • PHot checks the list of peers having D’s replica. • Selection criteria for query redirection: REDIRECT should be highly available; the load difference between PHot and REDIRECT should be significant; the transfer time between REDIRECT and PIssue should be low.

  22. Query Redirection (Cont.) • The ‘hot’ peer PHot first selects the set of peers which contain a replica of the data file D and whose load difference with itself exceeds TDiff. • TDiff is a parameter which is application-dependent and subjective. • Among these selected peers, the peer with the maximum transfer rate with the query-issuing peer PIssue is selected for query redirection.

  23. Query redirection algorithm
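The redirection rule from slides 21 and 22 can be sketched in Python as follows. The names are hypothetical, the availability threshold is an assumed way of expressing "highly available", and returning `None` (meaning PHot answers the query itself) is an assumption about the fallback when no replica holder qualifies.

```python
from dataclasses import dataclass


@dataclass
class ReplicaHolder:
    """Hypothetical record PHot keeps for a peer holding a replica of D."""
    pid: str
    load: float          # normalized load last reported by the peer
    availability: float  # estimated probability of being online (assumed metric)


def select_redirect(hot_load, holders, rate_to_issuer, t_diff, min_availability=0.8):
    """Return the pid to redirect the query to, or None if no peer qualifies.

    `rate_to_issuer` maps pid -> transfer rate between that peer and PIssue.
    """
    # Keep replica holders that are available enough and whose load is lower
    # than PHot's by more than TDiff.
    eligible = [h for h in holders
                if h.availability >= min_availability
                and hot_load - h.load > t_diff]
    if not eligible:
        return None
    # Among them, prefer the highest transfer rate to the issuing peer.
    best = max(eligible, key=lambda h: rate_to_issuer.get(h.pid, 0.0))
    return best.pid
```

With PHot at load 10 and TDiff set to 3, a replica holder at load 8 is skipped even if it has the fastest link to PIssue, because redirecting to it would relieve PHot very little.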


  25. Performance Evaluation • Investigates the following • Effect of variations in workload skew • Effect of variations in number of peers • Performance metric: • Average Response Time

  26. PARAMETERS USED IN PERFORMANCE EVALUATION

  27. Average Response times at the hot nodes

  28. Snapshot of Load distribution

  29. Snapshot of Load distribution

  30. Effect of varying the Workload Skew

  31. Effect of varying the number of peers


  33. CONCLUSION AND FUTURE WORK • We have proposed a strategy for enhancing the dependability of P2P systems via dynamic replication. • Our strategy takes free-riders into account. • Our performance evaluation demonstrates the effectiveness of our replication-based strategy. • Future scope of work: dealing with very large data items, e.g., video files; cost-effective integration into existing P2P systems; load-balancing.
