CDNs Content Outsourcing via Generalized Communities

Dimitrios Katsaros, Ph.D., Dept. of Computer & Communication Engineering, University of Thessaly, and Dept. of Informatics, Aristotle University. Heraklion, March 20th, 2008.

    Presentation Transcript
    1. CDNs Content Outsourcing via Generalized Communities Dimitrios Katsaros, Ph.D. @ Dept. of Computer & Communication Engineering, University of Thessaly @ Dept. of Informatics, Aristotle University Heraklion, March 20th, 2008

    2. Outline of the talk • A summary of my research • Latest results: “CDNs Content Outsourcing via Generalized Communities” • (IEEE Transactions on Knowledge & Data Engineering) • PRIMITIVE: Community Identification • METHOD: Content Outsourcing for CDNs • GOAL: Access Latency Reduction & Robustness

    3. Research areas [diagram; labels as extracted]: Mobile/Pervasive Computing; Web; Pervasive Web; Overlay Nets; Caching & Air-Indexing; Peer-to-Peer Networks; Caching & Prefetching & Replication; Semistructured Data & Web views; Webcasting; Content Distribution Networks; Location Tracking; Ad Hoc; Content-Based MIR; Broadcasting & Data Dissemination; Web Ranking & Search Engines; Cooperative Caching; Sensor Node Clustering; Distributed Indexing; Coverage/Connectivity; Flash storage; Social Network Analysis; Information Retrieval; Sensors • Ultimately: INTELLIGENCE ???

    4. Content Outsourcing • The problem: flash crowds • The solution: CDNs • Reactive vs proactive solutions • Community identification • The CiBC algorithm • Evaluation

    5. A problem… • Feb 3, 2004: Google linked banner to “julia fractals” • Users clicking directed to Australian University web site • …University’s network link overloaded, web server taken down temporarily…

    6. The problem strikes again! • Feb 4, 2004: Slashdot ran the story about Google • …Site taken down temporarily…again

    7. The response from down under… • later…Paul Bourke asks: “They have hundreds (thousands?) of servers worldwide that distribute their traffic load. If even a small percentage of that traffic is directed to a single server … what chance does it have?” → Help him ←

    8. Existing approaches • Client-side proxying • Squid, Summary Cache, hierarchical cache, CoDeeN, Squirrel, Backslash, PROOFS, … • Problem: Not 100% coverage • Throw money at the problem • Load-balanced servers, fast network connections • Problem: Can’t afford or don’t anticipate need • Content Distribution Networks (CDNs) • Akamai, Digital Island, Mirror Image, …

    9. From Internet Mazes to … [diagram: a single Origin Server serving many End Users]

    10. Content distribution [map of server locations: Stockholm, Toronto, London, Seattle, Amsterdam, Boston, Chicago, New York, Frankfurt, San Jose, Paris, Denver, Zurich, Washington D.C., Los Angeles, Tokyo, Dallas, Atlanta, Hong Kong, Singapore, Miami, Sydney]

    11. Content Distribution Network (CDNs)

    12. Types of CDNs [2×2 diagram: pull vs. push, cooperative vs. uncooperative] • Entries shown: Coral, Akamai, X • Annotation: first proposed @ IEEE JSAC’03, and what is described here today

    13. Comparison

    14. Cooperative push • What to push? • Frequently accessed content (IEEE JSAC’03) • Hard to predict what will be popular! • Popularity changes rapidly, too! • Request statistics? Reactive approach • Can we devise a proactive solution? • Where to store the pushed content? • Easy; a lot of replica placement algorithms

    15. Communities as “attractors”

    16. Web-site communities DO exist • Example: hollins.edu [figure] • Source: Antonis Sidiropoulos et al., WWW Journal, 11(1), 2008

    17. “Hard” (max-flow) communities • COMMUNITY: a subset of the nodes of a graph, with the property that: (for each node of the community) The number of links to other nodes belonging to the community is larger than the number of links to nodes NOT belonging to the community
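
The per-node condition on this slide can be checked directly. The following is a minimal sketch, assuming a simple undirected graph stored as a dict of neighbor sets; the function name `is_hard_community` and the example graph are illustrative and do not come from the talk.

```python
def is_hard_community(adj, community):
    """Return True if every member has more links inside than outside.

    adj: dict mapping node -> set of neighbors (undirected, no self-loops).
    community: iterable of nodes forming the candidate community.
    """
    members = set(community)
    for v in members:
        inside = sum(1 for w in adj[v] if w in members)
        outside = len(adj[v]) - inside
        if inside <= outside:  # the slide requires strictly more links inside
            return False
    return True


# Illustrative graph: two triangles (0,1,2) and (3,4,5) joined by edge 2-3.
adj = {
    0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3},
    3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4},
}
print(is_hard_community(adj, {0, 1, 2}))  # True
print(is_hard_community(adj, {0, 1}))     # False: node 0 has 1 link inside, 1 outside
```

Because the test is applied to every node individually, a single member with many outgoing links is enough to disqualify an otherwise dense group.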

    18. “Hard”, but inefficient

    19. Generalized communities … • COMMUNITY: a subset of the nodes of a graph, with the property that: (summed over all nodes of the community) The sum of all degrees within the community is larger than the sum of all degrees toward the rest of the graph
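
The aggregate condition on this slide can be sketched as follows. This is an illustration only, assuming a simple undirected graph stored as a dict of neighbor sets; the name `is_generalized_community` and the example graph are hypothetical.

```python
def is_generalized_community(adj, community):
    """Return True if the members' total inside degree exceeds their total outside degree.

    adj: dict mapping node -> set of neighbors (undirected, no self-loops).
    community: iterable of nodes forming the candidate community.
    """
    members = set(community)
    inside = 0   # link endpoints that stay within the community
    outside = 0  # link endpoints that lead to the rest of the graph
    for v in members:
        for w in adj[v]:
            if w in members:
                inside += 1
            else:
                outside += 1
    return inside > outside


# Illustrative graph: triangle (0,1,2) where node 2 also links to 3, 4 and 5.
adj = {
    0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3, 4, 5},
    3: {2}, 4: {2}, 5: {2},
}
print(is_generalized_community(adj, {0, 1, 2}))  # True: 6 inside vs. 3 outside
print(is_generalized_community(adj, {2, 3}))     # False: 2 inside vs. 4 outside
```

In this example node 2 on its own has more links leaving the triangle (3) than staying inside it (2), so a strict per-node test would reject {0, 1, 2}, while the summed condition accepts it.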

    20. Social Network Analysis • A social network is a social structure that describes social relations (Wikipedia) • The history of social networks is older than everybody here (more than 100 years: Cooley 1909, Durkheim 1893) [book: Stanley Wasserman & Katherine Faust] • Mathematical Representation • Structural & Locational Properties • Centrality • Betweenness centrality • Roles & Positions • Dyadic & Triadic Methods

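    21. Betweenness Centrality • $\sigma_{uw} = \sigma_{wu}$: number of shortest paths from $u \in V$ to $w \in V$ ($\sigma_{uu} = 0$) • $\sigma_{uw}(v)$: number of shortest paths from $u$ to $w$ that some vertex $v \in V$ lies on • The Betweenness Centrality $NI(v)$ of a vertex $v$ is: $NI(v) = \sum_{u \neq v \neq w \in V} \frac{\sigma_{uw}(v)}{\sigma_{uw}}$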
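
As an illustration, the betweenness centrality NI(v), taken here in its usual form as the sum over pairs u ≠ v ≠ w of σuw(v)/σuw, can be computed for an unweighted graph with a breadth-first search from every node followed by a backward accumulation (the approach known as Brandes' algorithm, which runs in O(nm)). The sketch below assumes an undirected graph given as a dict of neighbor lists and sums over ordered pairs (u, w), so each unordered pair is counted twice; the function name `betweenness` is illustrative, and the talk does not state which counting convention its figures use.

```python
from collections import deque


def betweenness(adj):
    """Betweenness of every node of an unweighted graph, summed over ordered pairs.

    adj: dict mapping node -> iterable of neighbors.
    """
    ni = {v: 0.0 for v in adj}
    for s in adj:
        # Forward phase: BFS from s, counting shortest paths (sigma).
        order = []                        # nodes in order of non-decreasing distance
        preds = {v: [] for v in adj}      # predecessors on shortest paths from s
        sigma = {v: 0 for v in adj}
        dist = {v: -1 for v in adj}
        sigma[s] = 1
        dist[s] = 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if dist[w] < 0:           # w is discovered for the first time
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:  # v precedes w on a shortest path
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Backward phase: accumulate pair dependencies from the farthest nodes.
        delta = {v: 0.0 for v in adj}
        while order:
            w = order.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1.0 + delta[w])
            if w != s:
                ni[w] += delta[w]
    return ni


path = {0: [1], 1: [0, 2], 2: [1]}
print(betweenness(path))  # {0: 0.0, 1: 2.0, 2: 0.0}
```

Dividing each value by 2 gives the count over unordered pairs for an undirected graph.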

    22. Betweenness Centrality in sample graphs [figure: two sample graphs, one with nodes 1–20 and one with nodes A, B, C, P, Q, R, T, U, V, W, X, Y]

    23. Betweenness Centrality in sample graphs • NI per node, first graph: 1 (0), 2 (0), 3 (68), 4 (96), 5 (0), 6 (0), 7 (156), 8 (26), 9 (0), 10 (0), 11 (0), 12 (0), 13 (0), 14 (233), 15 (0), 16 (131), 17 (1), 18 (97), 19 (0), 20 (0) • NI per node, second graph: A (6.67), B (13), C (0), P (41), Q (8), R (9.33), T (1.33), U (54), V (1.33), W (3.33), X (0), Y (0) • Nodes with large NI: • Articulation nodes (in bridges), e.g., 3, 4, 7, 16, 18 • With large fanout, e.g., 14, 8, U

    24. Betweenness centrality in … • [WEB] Performing graph clustering and recognizing communities in Web site graphs

    25. CiBC Method • Target: [condition not captured in the transcript] is true • CiBC method: • Building “cliques” and clusters around representative (pole) nodes (with low CB)

    26. CiBC Method [figure: example graph with nodes 0–11] • Phase 1: NI computation, $O(nm)$ • Phase 2: Initialization of cliques, $O(n)$

    27–30. CiBC Method [figures: successive steps on the example graph with nodes 0–11] • Phase 2: Initialization of cliques, $O(n)$

    31. CiBC Method [figure: example graph with nodes 0–11, cliques A, B, C, D] • Phase 3: Clique Merging & Creation of Communities • Complexity: $O(l^2)$, where $l$ is the number of cliques

    32. CiBC Method [figure: cliques A, B, C, D; values 4 and 3 shown in the diagram] • Phase 3: Clique Merging & Creation of Communities

    33. CiBC Method [figure: groups A, B, C] • Phase 3: Clique Merging & Creation of Communities

    34. CiBC Method [figure: groups A, B, C] • Phase 3: Clique Merging & Creation of Communities

    35. CiBC Method [figure: groups A, C] • Phase 3: Clique Merging & Creation of Communities • Phase 4: Check constraints
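
The transcript does not preserve the details of the CiBC phases, so the following is only a loose sketch of the idea of growing initial groups around low-betweenness "pole" nodes. It is not the published CiBC procedure: the seeding rule (visit nodes in ascending NI order and attach each pole's still-unassigned neighbors), the function name `seed_groups`, and the example values are all assumptions made for illustration.

```python
def seed_groups(adj, ni):
    """Group nodes around low-betweenness poles (hypothetical simplification).

    adj: dict mapping node -> set of neighbors.
    ni:  dict mapping node -> betweenness value, computed beforehand.
    Returns a list of disjoint node sets that together cover the graph.
    """
    assigned = set()
    groups = []
    # Visit candidate poles from lowest to highest betweenness (ties by node id).
    for pole in sorted(adj, key=lambda v: (ni[v], v)):
        if pole in assigned:
            continue
        group = {pole} | {w for w in adj[pole] if w not in assigned}
        assigned |= group
        groups.append(group)
    return groups


# Illustrative path graph 0-1-2-3-4 with made-up NI values.
adj = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2, 4}, 4: {3}}
ni = {0: 0, 1: 6, 2: 8, 3: 6, 4: 0}
print(seed_groups(adj, ni))
```

A merging step would then combine such groups pairwise, which is consistent with the quadratic cost in the number of groups quoted for Phase 3, but the merge criterion itself is not recoverable from the transcript.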

    36. Evaluation … • Need for: • Web site graphs • CDN • Topology • Networking issues • Request streams • Roaming over the site graph • Impossible to find real data for all these … • Simulators for each of them • To compensate for the lack of any of the above

    37. Simulators • Web site graphs • Simulating the growth process of the Web • Request streams • Random surfer (following links + teleportation) • CDN • CDNSim (http://oswinds.csd.auth.gr/~cdnsim/)
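
The random-surfer request stream (following links plus teleportation) can be sketched as below. This is a minimal illustration and not the generator used in the talk: the function name `random_surfer`, the teleport probability of 0.15, the seeding, and the example site graph are all assumptions.

```python
import random


def random_surfer(links, steps, teleport=0.15, seed=0):
    """Generate a stream of page requests by walking a site graph.

    links:    dict mapping page -> list of pages it links to.
    steps:    number of requests to generate (at least 1).
    teleport: probability of jumping to a random page instead of following a link
              (0.15 is an assumed value, not taken from the talk).
    """
    rng = random.Random(seed)
    pages = sorted(links)
    current = rng.choice(pages)
    stream = [current]
    for _ in range(steps - 1):
        out = links[current]
        if not out or rng.random() < teleport:
            current = rng.choice(pages)        # teleport, also used at dead ends
        else:
            current = rng.choice(sorted(out))  # follow one outgoing link
        stream.append(current)
    return stream


# Hypothetical three-page site.
links = {"index": ["a", "b"], "a": ["index"], "b": ["a"]}
print(random_surfer(links, 10))
```

Fixing the seed makes a run reproducible, which helps when the same request stream must be replayed against several outsourcing methods.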

    38. Competing methods • Communities-based methods • Clique Percolation Method (CPM) • Correlation Clustering Communities identification method (C3i) • Simple Web Caching (LRU) • No CDN (only the origin server) • Full Replication
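
For reference, the LRU baseline can be sketched as a cache that evicts the least recently requested object when full. This is a minimal sketch, assuming capacity is counted in objects rather than bytes; the class name `LRUCache` and its `request` method are illustrative.

```python
from collections import OrderedDict


class LRUCache:
    """Least-recently-used cache with capacity counted in objects (an assumption)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.items = OrderedDict()  # keys ordered from least to most recently used

    def request(self, key):
        """Return True on a cache hit, False on a miss (the object is then cached)."""
        if key in self.items:
            self.items.move_to_end(key)     # mark as most recently used
            return True
        self.items[key] = True
        if len(self.items) > self.capacity:
            self.items.popitem(last=False)  # evict the least recently used key
        return False


cache = LRUCache(2)
print([cache.request(k) for k in ["a", "b", "a", "c", "b"]])
# [False, False, True, False, False]
```

Such a cache is purely reactive: an object is stored only after it has been requested once, which is the behavior the proactive, community-based methods are compared against.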

    39. Metrics • Mean Response Time (MRT): the expected time for a request to be satisfied • Response time CDF: the Cumulative Distribution Function (CDF) denotes the probability of having response times lower than or equal to a given response time • Replica Factor (RF): the number of replica objects across the whole CDN infrastructure, as a percentage of the total outsourced objects • Byte Hit Ratio (BHR) • Independent parameters • a) Surrogates’ cache size b) graph assortativity
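
Three of these metrics can be computed from a request log as sketched below. The log layout and function names are hypothetical, and BHR is taken in its usual sense (bytes served from the cache divided by total bytes requested), which the slide does not spell out.

```python
def mean_response_time(times):
    """MRT: average of the observed response times."""
    return sum(times) / len(times)


def response_time_cdf(times, t):
    """Empirical CDF: fraction of requests with response time <= t."""
    return sum(1 for x in times if x <= t) / len(times)


def byte_hit_ratio(requests):
    """BHR: bytes served from cache divided by all bytes requested.

    requests: list of (size_in_bytes, served_from_cache) pairs.
    """
    total = sum(size for size, _ in requests)
    hit = sum(size for size, from_cache in requests if from_cache)
    return hit / total


times_ms = [10, 20, 30, 40]            # hypothetical response times
log = [(100, True), (300, False)]      # hypothetical (size, hit) records
print(mean_response_time(times_ms))    # 25.0
print(response_time_cdf(times_ms, 20)) # 0.5
print(byte_hit_ratio(log))             # 0.25
```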

    40. Situations examined • Regular traffic • Network delay dominates the other components • Flash crowd event • TCP setup delay + network delay dominate the other components

    41. Regular traffic: MRT vs. comm. strength

    42. Regular traffic: BHR vs. comm. strength

    43. Regular traffic: MRT vs. cache size

    44. Surge of requests: CiBC

    45. Surge of requests: CPM

    46. Surge of requests: C3i

    47. Surge of requests: LRU

    48. Discussion • CDNs: industrial interest for them • Content outsourcing: significant issue • Proactive content outsourcing • Discovery of communities • Placement to surrogate servers • CiBC prevails

    49. References • Our work: D. Katsaros, G. Pallis, K. Stamos, A. Sidiropoulos, A. Vakali, Y. Manolopoulos. “CDNs Content Outsourcing via Generalized Communities”. IEEE Transactions on Knowledge and Data Engineering, 2008. • State-of-the-art competing method [CPM community identification method]: G. Palla, I. Derenyi, I. Farkas, and T. Vicsek. Uncovering the overlapping community structure of complex networks in nature and society. Nature, 435(7043):814–818, 2005.

    50. Thanks to my collaborators at A.U.Th Thank you for your attention! Questions?