‘Small-World File-Sharing Communities’ Iamnitchi, A., Ripeanu, M., Foster, I.


  1. ‘Small-World File-Sharing Communities’ Iamnitchi, A., Ripeanu, M., Foster, I. • İsmail GÜNES, 2003700287

  2. OVERVIEW • Introduction • Intuition • The Data-Sharing Graph • Three Data-Sharing Communities • Small-World Data-Sharing Graph • Human Nature or Zipf’s Law • Small-World Data-Sharing Graph: Significance for Mechanism Design • Conclusion

  3. Introduction • To optimize performance trade-offs, we must understand user behavior • User behavior in three file-sharing communities is analyzed to guide the design of efficient mechanisms • A new structure, the data-sharing graph, is proposed and its uses are justified

  4. Intuition • Understanding a system helps in designing efficient solutions, for example: • The relationship between file popularity and cache size • Search guided first to the nodes with high degree • The study of networks started with Euler’s solution to the Königsberg bridges problem and gained momentum with the Internet • Recurring patterns in real networks: • Power-law distributions • Small worlds

  5. The Data-Sharing Graph • Captures the virtual relationship between users who request the same data • Definition: a graph in which nodes are users and an edge connects two users with similar interests in data • The graphs of three file-sharing communities are analyzed • These graphs turn out to be small worlds • Data-sharing graphs identify new structures in real networks

  6. Three Data-Sharing Communities • Three communities: • 1) A high-energy physics collaboration • 2) The Web • 3) Kazaa, a peer-to-peer file-sharing system • Each community and its traces are described • The file popularity and user activity distributions of each trace strongly shape the data-sharing graph: • A user with high activity → a highly connected node • Highly popular files → dense clusters

  7. Three Data-Sharing Communities • The D0 Experiment: a high-energy physics collaboration • A virtual organization comprising hundreds of physicists from more than 70 institutions in 18 countries • Its purpose is to share physics results worldwide • The logs analyzed cover six months of 2002: about 23,000 jobs submitted by more than 300 users, involving more than 2,500,000 requests for about 200,000 distinct files

  8. The D0 Experiment (cont’d) • Figure: distributions of the number of files per job and of file popularity

  9. The D0 Experiment (cont’d) • Daily activity: number of requests per day • User activity: number of requests submitted by each user during the 6-month interval • In D0, file popularity does not follow the Zipf’s law typical of web requests

  10. The Web • A five-day record from May 1999 of all HTTP requests from a large organization (Boeing) to the Web • Each IP address is treated as a user • 60,826 users sent 16.5 million web requests, of which 4.7 million were distinct

  11. The Kazaa Peer-to-Peer Network • A popular peer-to-peer file-sharing system with more than 4 million concurrent users • Kazaa nodes dynamically elect ‘supernodes’ • Regular nodes connect to supernodes and act as querying clients to them • Control information is encrypted

  12. The Kazaa Peer-to-Peer Network • Only information about files requested for download can be gathered; information about files searched for cannot • The trace covers five days of Kazaa traffic, during which 14,404 users downloaded 976,184 files, of which 116,509 were distinct

  13. SMALL-WORLD DATA-SHARING GRAPH • Users are nodes in the graph, and two users are connected if they have similar interests in data • Similarity criterion: the size of the intersection of their request sets, compared to some threshold • The similarity criterion has two degrees of freedom: • The length of the time interval • The threshold on the number of common requests
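
The definition on this slide can be sketched in a few lines of Python. This is only an illustration under assumptions: the trace layout `(user, item, time)`, the function name `data_sharing_graph`, and the toy data are hypothetical and do not come from the presentation.

```python
from collections import defaultdict
from itertools import combinations

def data_sharing_graph(trace, t_start, t_end, threshold):
    """Return {(u, v): shared_count} for user pairs sharing >= threshold items.

    trace is an iterable of (user, item, time) tuples (assumed layout).
    The two degrees of freedom are the interval [t_start, t_end) and threshold.
    """
    # Collect the set of distinct items each user requested inside the interval.
    requests = defaultdict(set)
    for user, item, time in trace:
        if t_start <= time < t_end:
            requests[user].add(item)

    # Connect two users when their request sets overlap enough.
    edges = {}
    for u, v in combinations(sorted(requests), 2):
        shared = len(requests[u] & requests[v])
        if shared >= threshold:
            edges[(u, v)] = shared
    return edges

# Toy trace: dave's request falls outside the interval used below.
trace = [
    ("alice", "f1", 0), ("alice", "f2", 1), ("bob", "f1", 2),
    ("bob", "f2", 3), ("carol", "f2", 4), ("dave", "f9", 50),
]
print(data_sharing_graph(trace, 0, 10, threshold=1))
print(data_sharing_graph(trace, 0, 10, threshold=2))
```

Raising the threshold or shortening the interval can only remove edges, which is why the two parameters act as knobs on graph density. The pairwise loop is quadratic in the number of users, so it suits small examples rather than full traces.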

  14. Distribution of Weights • Data-sharing graphs can be viewed as weighted graphs • Two users are connected by an edge labeled with the number of requests they shared during a period • The distribution of weights highlights differences among the sharing communities

  15. Degree Distribution • The Kazaa data-sharing graph is the closest to a power law, while the D0 graphs clearly do not follow one

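  16. Small-World Characteristics • Watts-Strogatz definition: a graph G(V,E) is a small world if it has a small average path length and a large clustering coefficient, much larger than that of a random graph with the same number of nodes and edges • The clustering coefficient measures how well connected a node’s neighbors are with each other: $CC_u = \frac{|\{(v,w) \in E : v, w \in N(u)\}|}{|N(u)|\,(|N(u)|-1)/2}$, where $N(u)$ is the set of neighbors of $u$ • Two graph-level definitions: $CC_1 = \frac{1}{|V|} \sum_{u \in V} CC_u$ and $CC_2 = \frac{3 \times \text{number of triangles}}{\text{number of connected triples of nodes}}$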
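
As a rough illustration of how a clustering coefficient is computed, the sketch below uses two common graph-level variants: the average of per-node coefficients, and the triangle-based ratio. The adjacency-dict representation, the function names, and the convention that nodes with fewer than two neighbors contribute 0 are assumptions made for this example.

```python
from itertools import combinations

def node_clustering(adj, u):
    """Fraction of pairs of u's neighbors that are themselves connected."""
    nbrs = adj[u]
    k = len(nbrs)
    if k < 2:
        return 0.0  # assumed convention for nodes with fewer than 2 neighbors
    links = sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])
    return links / (k * (k - 1) / 2)

def average_clustering(adj):
    """Average of the per-node clustering coefficients."""
    return sum(node_clustering(adj, u) for u in adj) / len(adj)

def transitivity(adj):
    """3 * triangles / connected triples, counted per center node."""
    closed = 0   # triples whose two end nodes are also linked
    triples = 0  # all connected triples, grouped by their center node
    for u, nbrs in adj.items():
        k = len(nbrs)
        triples += k * (k - 1) // 2
        closed += sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])
    return closed / triples if triples else 0.0

# Triangle a-b-c with a pendant node d attached to c.
adj = {"a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b", "d"}, "d": {"c"}}
print(average_clustering(adj), transitivity(adj))
```

The two variants generally differ on the same graph, because the average weights every node equally while the triangle ratio weights high-degree nodes more heavily.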

  17. Small-World Characteristics (cont’d) • Clustering coefficient of a random graph: $CC_r = \frac{2|E|}{|V|\,(|V|-1)}$ • Average path length: the average of all pairwise distances • For large graphs, measuring all-pair distances is computationally expensive, so an approximation is made (5%) • Average path length of a random graph: $l_r = \frac{\log |V|}{\log(|E|/|V|)}$
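
One way to avoid measuring all-pair distances is to run breadth-first search from a random sample of source nodes and average the distances found. The sketch below shows that idea only; the sampling scheme, the function names, and the `n_sources` parameter are assumptions and are not necessarily the approximation used in the study.

```python
import random
from collections import deque

def bfs_distances(adj, source):
    """Hop distances from source to every node reachable from it."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def sampled_average_path_length(adj, n_sources, seed=0):
    """Average distance from sampled sources to all nodes they can reach."""
    rng = random.Random(seed)
    sources = rng.sample(sorted(adj), min(n_sources, len(adj)))
    total = count = 0
    for s in sources:
        for target, d in bfs_distances(adj, s).items():
            if target != s:
                total += d
                count += 1
    return total / count if count else float("inf")

# Path graph 0-1-2-3; using every node as a source gives the exact average.
path = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2}}
print(sampled_average_path_length(path, n_sources=4))
```

Each BFS costs time proportional to the number of edges, so the total cost grows with the sample size rather than with the number of node pairs.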

  18. Small-World Characteristics (cont’d) • The data-sharing graphs for the three systems display small-world properties (large clustering coefficient, small average path length)

  19. Small-World Characteristics(cont’d)

  20. Small-World Characteristics (cont’d) • The data-sharing graphs with different durations and similarity criteria are all small worlds: • Well-connected clusters • Short paths between any two nodes

  21. HUMAN NATURE OR ZIPF’S LAW? • Question: are the small-world properties consequences of previously documented patterns, or do they reflect a new observation concerning users’ preferences in data? • Two directions to answer the causality question: • Examine the data-sharing graph definition and ask whether the large clustering coefficient is a result of that definition (affiliation networks) • Analyze the effect of well-known patterns in file access, such as time locality and the file popularity distribution (influences of Zipf’s law and of time and space locality)

  22. Affiliation (Preference) Networks • A social network in which the actors are linked by common membership in groups or clubs of some kind • Examples: collaboration networks, movie actors, etc. • Bipartite graphs: • Two types of vertices, for actors and for groups • Edges link nodes of different types only • Unipartite projection: • Undirected edges connect actors in the same group

  23. Affiliation Networks (cont’d) • Characteristics of projections of bipartite graphs: • 1. A larger clustering coefficient than random graphs: the members of a group form a complete subgraph in the one-mode projection • 2. A degree distribution far from the Poisson distribution of a random graph: there are two degree distributions (of actors and of groups)
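
The one-mode projection is easy to show in code: every group contributes a complete subgraph (clique) over its members, which is where the extra clustering comes from. The input layout, a dict from group ID to a set of actors, and the function name are assumptions for this sketch.

```python
from itertools import combinations

def one_mode_projection(groups):
    """Project a bipartite actor-group graph onto actors only.

    groups maps a group ID to the set of its members (assumed layout).
    Returns a set of undirected edges, each stored as a sorted (a, b) tuple.
    """
    edges = set()
    for members in groups.values():
        # Every pair of members in the same group becomes an edge.
        for a, b in combinations(sorted(members), 2):
            edges.add((a, b))
    return edges

# One group of three actors and one group of two, overlapping on "c".
groups = {"g1": {"a", "b", "c"}, "g2": {"c", "d"}}
print(sorted(one_mode_projection(groups)))
```

In the data-sharing setting, files play the role of groups and users the role of actors, so a file requested by k users adds up to k(k-1)/2 edges at once when the threshold is one shared request.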

  24. Affiliation Networks (cont’d) • Consider a bipartite affiliation graph of N actors and M groups • $p_j$: the probability that an actor is part of exactly j groups • $q_k$: the probability that a group consists of exactly k members • Three functions are defined to compute the average node degree and clustering coefficient of the unipartite affiliation network: $f_0(x) = \sum_j p_j x^j$, $g_0(x) = \sum_k q_k x^k$, $G_0(x) = f_0\left(g_0'(x)/g_0'(1)\right)$ • Average degree: $G_0'(1)$ • Clustering coefficient: $C = \frac{M}{N}\,\frac{g_0'''(1)}{G_0''(1)}$
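
The model quantities can be evaluated numerically from the two membership distributions. The sketch below assumes the standard generating-function model for random affiliation networks, with f0 built from the actor distribution, g0 from the group distribution, G0(x) = f0(g0'(x)/g0'(1)), average degree G0'(1), and clustering coefficient C = (M/N) g0'''(1)/G0''(1). It also assumes M/N = f0'(1)/g0'(1), which holds because both sides count the same actor-group memberships. All function names are hypothetical.

```python
def factorial_moment(dist, order):
    """order-th derivative of sum_k dist[k] * x**k, evaluated at x = 1."""
    total = 0.0
    for k, p in dist.items():
        term = p
        for i in range(order):
            term *= (k - i)  # becomes 0 whenever k < order
        total += term
    return total

def projection_stats(p, q):
    """Model average degree and clustering of the unipartite projection.

    p[j]: probability that an actor belongs to exactly j groups.
    q[k]: probability that a group has exactly k members.
    """
    f1, f2 = factorial_moment(p, 1), factorial_moment(p, 2)
    g1, g2, g3 = (factorial_moment(q, n) for n in (1, 2, 3))
    # Chain rule on G0(x) = f0(h(x)) with h(x) = g0'(x) / g0'(1), h(1) = 1.
    avg_degree = f1 * g2 / g1                   # G0'(1)
    G2 = f2 * (g2 / g1) ** 2 + f1 * g3 / g1     # G0''(1)
    m_over_n = f1 / g1                          # memberships counted both ways
    clustering = m_over_n * g3 / G2 if G2 else 0.0
    return avg_degree, clustering

# Sanity check: each actor in exactly one group of exactly three members,
# so the projection is a set of disjoint triangles.
print(projection_stats({1: 1.0}, {3: 1.0}))
```

The disjoint-triangles case is a useful check because the answer is known in advance: every actor has two neighbors and those neighbors are always connected.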

  25. Affiliation Networks (cont’d) • The table confirms our intuition: • The measured and modeled parameter values differ • The table shows two observations: • The actual clustering coefficient is always larger than the theoretical one • The actual average degree is always smaller than the theoretical one • The three communities can be compared by their distance from the theoretical model

  26. Influences of Zipf’s Law and Time and Space Locality • Event frequency follows a Zipf distribution in many systems • Time locality: users are not uniformly active during a period but follow patterns (downloading more on weekends, holidays, etc.) • The question: “Are the patterns we identified in the data-sharing graph, especially the large clustering coefficient, an inherent consequence of these well-known behaviors?” • To answer, generate random traces that keep the documented characteristics but break the user-request association • Build the data-sharing graphs from these synthetic traces, then analyze and compare their properties

  27. Synthetic Traces • Each trace record contains a user ID, the item requested, and the request time • Relationships in a trace: • (1) User-Time • (2) Request-Time • (3) User-Request • (4) User • (5) Time • (6) Request • The aim is: • To break relationship (3), which requires breaking (1), (2), or both • To preserve relationships (4), (5), and (6)
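
One simple way to build such a synthetic trace is to permute the requested items across records while leaving the user and time columns in place. This is a sketch of one possible shuffling scheme, not necessarily the procedure used in the study; the record layout `(user, item, time)` and the function name are assumptions.

```python
import random
from collections import Counter

def shuffle_requests(trace, seed=0):
    """Reassign items to records at random, keeping users and times fixed.

    Preserves per-user activity, item popularity, and the set of request
    times; breaks the user-request and request-time associations.
    """
    rng = random.Random(seed)
    items = [item for _, item, _ in trace]
    rng.shuffle(items)
    return [(user, item, time) for (user, _, time), item in zip(trace, items)]

trace = [
    ("alice", "f1", 0), ("alice", "f2", 1), ("bob", "f1", 2),
    ("bob", "f3", 3), ("carol", "f1", 4), ("carol", "f4", 5),
]
synthetic = shuffle_requests(trace)

# The marginal distributions are unchanged by construction.
print(Counter(u for u, _, _ in synthetic) == Counter(u for u, _, _ in trace))
print(Counter(i for _, i, _ in synthetic) == Counter(i for _, i, _ in trace))
```

Because this variant keeps each user attached to their original request times, it breaks relationship (2) rather than (1). Permuting the user column instead would do the opposite.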

  28. Properties of Synthetic Data-Sharing Graphs • Three characteristics of the synthetic data-sharing graphs are relevant: • 1) The number of nodes in synthetic graphs differs significantly from that in their corresponding real graphs • 2) The synthetic data-sharing graphs are always connected • 3) The synthetic data-sharing graphs are “less” small-world than their corresponding real graphs • These imply that user preferences for files have a significant influence on the data-sharing graphs • Identifying small-world properties is not sufficient to characterize the clustering of users

  29. Small-World Data-Sharing Graph: Significance for Mechanism Design • The data-sharing graph can reveal the structure of an organization by identifying interest-based clusters of users; this information can then be used to optimize the organization’s infrastructure (servers, network topology, etc.) • The data-sharing graph matters for mechanism design from two perspectives: • Its structure • Its small-world properties

  30. Small-World Data-Sharing Graph: Significance for Mechanism Design • Relevance of the Graph Structure: • Efficient update • File replication • Job management • Relevance of the Small-World Feature: • File-location

  31. Conclusion • A new structure, the data-sharing graph, is proposed • It captures the relationship between users who request the same data • The properties of data-sharing graphs in three communities are presented • The effects of Zipf’s law and human nature on small-world characteristics are examined • These properties may be used in designing new peer-to-peer mechanisms
