Sampling Techniques for Large, Dynamic Graphs

Daniel Stutzbach – University of Oregon

Reza Rejaie – University of Oregon

Nick Duffield – AT&T Labs—Research

Subhabrata Sen – AT&T Labs—Research

Walter Willinger – AT&T Labs—Research

Global Internet Symposium

Barcelona, Spain

April 28th, 2006

  • P2P systems are very popular in practice.
    • Several million simultaneous users collectively.
    • 60% of all Internet traffic [CacheLogic Research 2005]
  • Measurement studies aid in understanding existing systems and user behavior.
  • Capturing global state is often infeasible.
    • P2P systems are large and rapidly changing.
  • Sampling is therefore a natural approach, and has been used in several earlier measurement studies.
  • But how do we know the samples are representative?
The Problem
  • We focus on sampling peer properties.
    • Peer degree
    • Link Bandwidth
    • Number of shared files
    • Remaining uptime
  • Sampling peer properties occurs in two steps:
    • Discover and select peers
    • Collect the measurements
  • Selecting peers uniformly at random is hard.
    • Peer dynamics can introduce bias.
    • The graph topology can introduce bias.
  • We examine these two problems separately.
Temporal Causes of Bias
  • Define V_t as the set of peers present at time t.
  • We gather samples over a measurement window of length Δ.
  • The most common approach is to gather peers from the set of all peers present at any time during the window, i.e., the union of V_t over every t in the window.
Example of Bias towards Short-Lived Peers


[Figure: one long-lived peer alongside a series of short-lived peers]

  • Consider a simple two-peer system, containing:
    • One long-lived peer
    • One rapidly-changing short-lived peer
  • The common approach over-selects short-lived peers.
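The two-peer example can be checked with a small simulation. This is a minimal sketch under stated assumptions: time is measured in integer ticks, the short-lived slot is refilled by a new peer every 10 ticks, and the names `peers_present` and `window_union_sample` are hypothetical helpers introduced only for illustration.

```python
import random

def peers_present(tick, short_session=10):
    """Peers online at integer time `tick` in a hypothetical two-peer system:
    one long-lived peer, plus one slot that a new short-lived peer takes
    over every `short_session` ticks."""
    return {"long", f"short-{tick // short_session}"}

def window_union_sample(start, delta, n_samples, rng):
    """Common approach: pool every distinct peer seen during the window,
    then select uniformly from that pool."""
    pool = set()
    for tick in range(start, start + delta + 1):
        pool |= peers_present(tick)
    pool = sorted(pool)  # fixed order so a seeded run is repeatable
    return [rng.choice(pool) for _ in range(n_samples)]

rng = random.Random(0)
picks = window_union_sample(start=0, delta=95, n_samples=10_000, rng=rng)
long_share = picks.count("long") / len(picks)
print(f"share of samples from the long-lived peer: {long_share:.3f}")
```

With these parameters the pool holds 11 distinct peers (one long-lived, ten short-lived), so the long-lived peer receives roughly 1/11 of the selections even though it makes up half of the system at every instant.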
Handling Temporal Causes of Bias
  • The common approach is intuitive but incorrect.
  • Sampling peers is the wrong goal.
  • We want to sample peer properties.
  • Therefore, the samples v_{i,t} and v_{i,t'} are distinct, even though they come from the same peer i.
  • Allow sampling the same peer more than once, at different points in time.
Example of Avoiding Bias towards Short-Lived Peers


[Figure: one long-lived peer alongside a series of short-lived peers]

  • Allowing a peer to be re-selected solves the problem.
  • The long-lived peer will be selected half the time, reflecting the actual state of the system.
  • The remaining problem: how do we select a peer uniformly at random at a particular moment?
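A sketch of the re-selection idea for the same two-peer system follows. It assumes an oracle that can pick uniformly among the peers present at a chosen instant, which is exactly the part that remains unsolved at this point in the talk; `peers_present` and `instant_sample` are hypothetical names, and time is again modelled as integer ticks.

```python
import random

def peers_present(tick, short_session=10):
    """Peers online at integer time `tick`: one long-lived peer plus the
    short-lived peer currently occupying the second slot (hypothetical model)."""
    return {"long", f"short-{tick // short_session}"}

def instant_sample(start, delta, n_samples, rng):
    """Pick a random instant in the window, then pick uniformly among the
    peers present at that instant. The same peer may be picked many times."""
    picks = []
    for _ in range(n_samples):
        tick = rng.randint(start, start + delta)  # both ends inclusive
        picks.append(rng.choice(sorted(peers_present(tick))))
    return picks

rng = random.Random(0)
picks = instant_sample(start=0, delta=95, n_samples=10_000, rng=rng)
long_share = picks.count("long") / len(picks)
print(f"share of samples from the long-lived peer: {long_share:.3f}")
```

Because each sample is tied to an instant rather than to a peer identity, the long-lived peer's share comes out close to one half, matching its share of the system at any moment.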
Topological Causes of Bias
  • Goal: Select a peer uniformly at random at time t.
  • Begin with one peer.
  • Query peers to discover neighbors.
  • Prior work uses classic graph-discovery algorithms:
    • Breadth-First Search (BFS)
    • Depth-First Search (DFS)
  • Problems with these techniques:
    • Peers are correlated by their neighbor relationship
    • Peers with higher degree are more likely to be discovered.
    • A peer can only be selected once.
  • Random walks are a promising alternative.


Random Walks
  • Basic idea of the random walk:
    • Select a neighbor randomly to explore
    • Explore that neighbor and “forget” the previous peer
    • Only two pieces of state are maintained:
      • The current peer
      • The length of the walk
    • A subset of the visited peers is selected for sampling
  • The basic random walk selects a peer every r steps.
    • Graph theory suggests r ≥ log(|V|).
    • Walking r steps between samples eliminates correlations.
    • Peers are selected with probability proportional to degree.
    • Peers can be selected more than once.
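The basic random walk described above can be sketched in a few lines. The five-peer `GRAPH` below is a made-up toy topology (one hub of degree 4 and four peers of degree 2), and `random_walk_samples` is a hypothetical helper; the value r = 4 is chosen only for illustration.

```python
import random
from collections import Counter

# Hypothetical toy overlay: peer 0 is a hub, peers 1-4 each have degree 2.
GRAPH = {0: [1, 2, 3, 4], 1: [0, 2], 2: [0, 1], 3: [0, 4], 4: [0, 3]}

def random_walk_samples(graph, start, r, n_samples, rng):
    """Basic random walk: keep only the current peer and the walk length,
    and select the current peer every r steps."""
    current, steps, samples = start, 0, []
    while len(samples) < n_samples:
        current = rng.choice(graph[current])  # move to a random neighbor
        steps += 1
        if steps % r == 0:
            samples.append(current)
    return samples

rng = random.Random(0)
n = 20_000
counts = Counter(random_walk_samples(GRAPH, start=1, r=4, n_samples=n, rng=rng))
freq = {peer: counts[peer] / n for peer in GRAPH}
for peer in sorted(GRAPH):
    print(peer, len(GRAPH[peer]), round(freq[peer], 3))
```

The degrees sum to 12, so the hub should be selected about 4/12 of the time and every other peer about 2/12, which illustrates selection with probability proportional to degree.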
Variations on the Random Walk
  • Fixing the degree bias (“Degree Correction”)
    • Select a candidate peer with probability inversely proportional to its degree
    • Pro: Should result in uniform selection of peers
    • Con: Decreases efficiency
  • Improving efficiency (“Random Stroll”)
    • After the first r steps, select every visited peer instead of every r-th peer
    • Pro: Increases efficiency
    • Con: Introduces slight correlations
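Both variations can be expressed as switches on one walk loop. The sketch below assumes the degree-correction acceptance probability is exactly 1/degree, which is an assumption made for this example; `GRAPH` is a made-up toy topology and `sample_peers` is a hypothetical helper.

```python
import random
from collections import Counter

# Hypothetical toy overlay: peer 0 is a hub, peers 1-4 each have degree 2.
GRAPH = {0: [1, 2, 3, 4], 1: [0, 2], 2: [0, 1], 3: [0, 4], 4: [0, 3]}

def sample_peers(graph, start, r, n_samples, rng,
                 degree_correction=False, stroll=False):
    """Random walk with optional degree correction and/or random stroll."""
    current, steps, samples = start, 0, []
    while len(samples) < n_samples:
        current = rng.choice(graph[current])
        steps += 1
        # Stroll: every peer is a candidate once r steps have been taken.
        # Walk: only every r-th peer is a candidate.
        is_candidate = steps >= r if stroll else steps % r == 0
        if not is_candidate:
            continue
        # Degree correction (assumed rule): accept with probability 1/degree.
        if degree_correction and rng.random() >= 1.0 / len(graph[current]):
            continue
        samples.append(current)
    return samples

def frequencies(samples, graph):
    counts = Counter(samples)
    return {peer: counts[peer] / len(samples) for peer in graph}

n = 20_000
rwdc_freq = frequencies(
    sample_peers(GRAPH, 1, 4, n, random.Random(0), degree_correction=True), GRAPH)
rsdc_freq = frequencies(
    sample_peers(GRAPH, 1, 4, n, random.Random(1), degree_correction=True,
                 stroll=True), GRAPH)
print("RWDC", {p: round(f, 3) for p, f in rwdc_freq.items()})
print("RSDC", {p: round(f, 3) for p, f in rsdc_freq.items()})
```

On this toy graph both corrected variants should select each of the five peers about one fifth of the time, since visiting a peer in proportion to its degree and then accepting it with probability 1/degree cancels the degree bias.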
  • We simulated different techniques over two types of graphs:
    • A snapshot of the Gnutella ultrapeer topology [Stutzbach 05 IMC]
    • Random graphs (with the same number of vertices and edges as the Gnutella topology)
  • Metrics:
    • Bias: Is peer A more likely to be selected than peer B?
    • Correlation: If we select peer A, are we more likely to select peer B?
    • Efficiency: How easily can we collect a sample?
  • Techniques:
    • Oracle (uniformly random)
    • Breadth-First Search (BFS)
    • Random Walk (RW)
    • Random Walk with Degree Correction (RWDC)
    • Random Stroll (RS)
    • Random Stroll with Degree Correction (RSDC)
  • Collect k|V| samples and compare with Oracle.
  • Most peers should be selected around k times.
  • RSDC appears unbiased in both cases.
  • RWDC performs well, but exhibits slight bias on Gnutella.
  • BFS, RS, and RW are heavily biased.
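One way to quantify the comparison with the oracle is to count how often each peer is selected and measure how far those counts stray from k. The sketch below uses synthetic data, not the simulation results reported here: the 100 peers, their degrees, and the degree-proportional "biased" sampler are all assumptions made for illustration, and `mean_abs_deviation` is a hypothetical helper.

```python
import random
from collections import Counter

def mean_abs_deviation(samples, peers, k):
    """Average distance between each peer's selection count and the ideal k."""
    counts = Counter(samples)  # peers never selected count as 0
    return sum(abs(counts[p] - k) for p in peers) / len(peers)

rng = random.Random(0)
peers = list(range(100))
# Hypothetical degrees: half the peers have degree 9, half have degree 1.
degree = {p: (9 if p < 50 else 1) for p in peers}
k = 50
n = k * len(peers)  # collect k|V| samples

# Oracle: uniformly random selection.
oracle = [rng.choice(peers) for _ in range(n)]
# Stand-in for a degree-biased technique: selection proportional to degree.
biased = rng.choices(peers, weights=[degree[p] for p in peers], k=n)

oracle_dev = mean_abs_deviation(oracle, peers, k)
biased_dev = mean_abs_deviation(biased, peers, k)
print(f"oracle: {oracle_dev:.1f}  degree-biased: {biased_dev:.1f}")
```

The oracle's counts cluster near k, while the degree-biased sampler selects high-degree peers far more than k times and low-degree peers far fewer, so its deviation is several times larger.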

Figures 1(a) and 1(b) go here

  • Even if unbiased, a technique may exhibit correlations.
  • We define a sampling session as 1,000 consecutive samples.
  • For pair (A, B), if A is selected, how often is B also selected?
  • A long tail indicates correlation.
  • RWDC and RSDC appear uncorrelated.
  • RW and RS exhibit slight correlations.
  • BFS exhibits strong correlation.

Figures 2(a) and 2(b) go here

  • The basic operation is the neighbors-query.
  • Efficiency is the number of samples collected divided by the number of neighbors-queries issued.
  • BFS and RS are close to 100% efficient.
    • Unfortunately, they are also heavily biased.
  • RW, RWDC, and RSDC are 2% to 8% efficient.
  • RSDC is twice as efficient as RWDC (4% vs. 2%).
  • However, even the inefficient techniques need only O(log |V|) queries per sample.
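The efficiency trade-off can be illustrated by counting queries alongside samples. This sketch assumes that efficiency means samples per neighbors-query, that each walk step costs exactly one neighbors-query, and that degree correction accepts with probability 1/degree; `GRAPH` is a made-up toy topology and `run` is a hypothetical helper, so the numbers will not match the percentages reported for Gnutella.

```python
import random

# Hypothetical toy overlay: peer 0 is a hub, peers 1-4 each have degree 2.
GRAPH = {0: [1, 2, 3, 4], 1: [0, 2], 2: [0, 1], 3: [0, 4], 4: [0, 3]}

def run(graph, start, r, n_samples, rng, degree_correction=False, stroll=False):
    """Return (samples, queries), charging one neighbors-query per step."""
    current, queries, samples = start, 0, []
    while len(samples) < n_samples:
        current = rng.choice(graph[current])
        queries += 1
        is_candidate = queries >= r if stroll else queries % r == 0
        if not is_candidate:
            continue
        if degree_correction and rng.random() >= 1.0 / len(graph[current]):
            continue  # rejected candidate: the query is spent, no sample gained
        samples.append(current)
    return samples, queries

VARIANTS = {
    "RW": dict(),
    "RWDC": dict(degree_correction=True),
    "RS": dict(stroll=True),
    "RSDC": dict(degree_correction=True, stroll=True),
}

eff = {}
for name, options in VARIANTS.items():
    samples, queries = run(GRAPH, 1, 4, 1_000, random.Random(0), **options)
    eff[name] = len(samples) / queries
    print(f"{name}: {eff[name]:.1%}")
```

With r = 4 the plain walk is exactly 25% efficient by construction, the stroll is close to 100%, and degree correction lowers both because rejected candidates still cost a query.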
Summary of Results and Lessons Learned
  • Addressing temporal causes of bias
    • Avoid gathering a set of peers and collecting measurements in separate passes.
    • Select a peer, then collect the measurement.
    • Repeat and allow re-selecting the same peer.
  • Addressing topological causes of bias
    • Be careful to avoid bias towards high-degree peers.
    • Consider using a random walk or random stroll with degree correction.
Ongoing Work
  • This work is preliminary.
  • Additional types of random walks:
    • Weighting the selection of the next hop
  • Additional types of graphs:
    • Power-law
    • Small world
  • We have examined temporal and topological causes of bias separately.
    • To examine them concurrently, we are creating a dynamic overlay simulator.