Measuring OSNs: Things I’d Like to Know
Nick Feamster, Georgia Tech
Why Measure Social Networks? • Trustworthy Applications • Secure Channels [Authenticatr, Lockr] • Spam filters and whitelists [Re:, LineUp] • Automated backup systems [Friendstore] • Anti-censorship [Anti-Blocker] • Advertising and Relationship Management • Real-world Social Networking • Real-world socializing [Serendipity, aka-aki] • Public health applications
What We Need to Know • Structure: Where are links/nodes in the graph? • Semantics: What does a “link” imply? • Visibility: Are there unknown links? • Dynamics: How do graphs evolve? • Invariants: (How) do OSNs differ? Sounds familiar…
Structure • Problem: Where are the links/nodes in the graph? • Application-specific metrics are more interesting than high-level properties • Example #1: Anti-censorship • Want to find “rings” in the social network topology • The graph structure determines what we can use for a “deniable” clickstream • Example #2: Collaborative measurement • Graph structure determines the vantage points and the view of the network graph available to each user
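As a rough illustration of the anti-censorship example, the sketch below searches a friendship graph for a "ring" through a given user. The slide does not define "ring"; this assumes it means a simple cycle of friends of bounded length. The names `build_graph`, `find_ring`, and `max_len` are hypothetical and chosen only for this example.

```python
from typing import Dict, Iterable, List, Optional, Set, Tuple

Graph = Dict[str, Set[str]]


def build_graph(edges: Iterable[Tuple[str, str]]) -> Graph:
    """Build an undirected adjacency map from (user, user) friendship pairs."""
    graph: Graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph


def find_ring(graph: Graph, start: str, max_len: int = 6) -> Optional[List[str]]:
    """Return one simple cycle through `start` with 3..max_len distinct users.

    Assumption: a "ring" is a simple cycle. The result lists the users in
    order and repeats `start` at the end; None means no such cycle exists.
    """

    def dfs(path: List[str], visited: Set[str]) -> Optional[List[str]]:
        node = path[-1]
        # Sorted neighbours make the search order deterministic.
        for nxt in sorted(graph.get(node, ())):
            # Closing the cycle needs at least three distinct users.
            if nxt == start and len(path) >= 3:
                return path + [start]
            if nxt not in visited and len(path) < max_len:
                found = dfs(path + [nxt], visited | {nxt})
                if found:
                    return found
        return None

    return dfs([start], {start})


if __name__ == "__main__":
    g = build_graph([("a", "b"), ("b", "c"), ("c", "a"), ("c", "d")])
    print(find_ring(g, "a"))  # a cycle through a, b, and c
    print(find_ring(g, "d"))  # d hangs off the triangle, so no ring
```

The depth-first search is exponential in the worst case, so the length bound matters on real graphs; a library routine for cycle enumeration would be the more practical choice at scale.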
Semantics • Problem: In a social network, what determines weight/trust? • Frequency of communication • Type of communication • Common interests • In some other graphs, the semantics are clearer because there is an explicit notion of “weight” • Links may not directly reflect network behavior • What are the sources/catalysts for link formation? • Getting Closer or Drifting Apart? Mobius et al.
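One possible way to turn the first two candidate signals (frequency and type of communication) into a link weight is sketched below. The per-type weights in `TYPE_WEIGHT` and the event format are invented for illustration; the slides do not propose any particular scoring rule.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

# Hypothetical per-type weights: a direct message counts more than a "like".
TYPE_WEIGHT = {"message": 1.0, "comment": 0.5, "like": 0.1}

Event = Tuple[str, str, str]  # (sender, receiver, interaction type)


def tie_strength(events: Iterable[Event]) -> Dict[Tuple[str, str], float]:
    """Sum type-weighted interaction counts for each unordered user pair."""
    scores: Dict[Tuple[str, str], float] = defaultdict(float)
    for sender, receiver, kind in events:
        # Sort the pair so that a->b and b->a feed the same link.
        pair = tuple(sorted((sender, receiver)))
        # Unknown interaction types contribute nothing.
        scores[pair] += TYPE_WEIGHT.get(kind, 0.0)
    return dict(scores)


if __name__ == "__main__":
    log = [
        ("alice", "bob", "message"),
        ("bob", "alice", "like"),
        ("alice", "carol", "comment"),
    ]
    for pair, score in sorted(tie_strength(log).items()):
        print(pair, round(score, 2))
```

A score like this captures frequency and type only; common interests, and the question of what a link actually implies, are left open exactly as on the slide.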
Visibility • Problem: How complete are graph measurements? • Many social networks prevent “scraping” • Aspects of profile are restricted/not public • May make it difficult to see some “links” • This sounds familiar, too: Analogous to hidden peering links in AS graph?
Dynamics • Problem: How does the network evolve over time? • Serendipity Project • Real-world interactions create links in the social graph • New OSN links create interactions in the real world • Challenges: • Understanding graph evolution may rely on exogenous factors that are difficult to measure [Figure: real-world interactions and evolution of the OSN graph]
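A minimal sketch of one way to quantify how a graph changes between two crawls: compare the edge sets of two snapshots and report added links, removed links, and their Jaccard similarity. The function names and the snapshot format (lists of user pairs) are assumptions made for this example, not part of any measurement study cited here.

```python
from typing import Dict, FrozenSet, Iterable, Set, Tuple

Edge = FrozenSet[str]


def edge_set(edges: Iterable[Tuple[str, str]]) -> Set[Edge]:
    """Normalize undirected edges so that (a, b) and (b, a) compare equal."""
    return {frozenset(e) for e in edges}


def snapshot_diff(
    old: Iterable[Tuple[str, str]], new: Iterable[Tuple[str, str]]
) -> Dict[str, object]:
    """Compare two snapshots of the same graph taken at different times."""
    old_e, new_e = edge_set(old), edge_set(new)
    union = old_e | new_e
    # Jaccard similarity of the edge sets; two empty snapshots count as identical.
    jaccard = len(old_e & new_e) / len(union) if union else 1.0
    return {
        "added": new_e - old_e,
        "removed": old_e - new_e,
        "jaccard": jaccard,
    }


if __name__ == "__main__":
    january = [("a", "b"), ("b", "c")]
    february = [("b", "a"), ("c", "d")]
    diff = snapshot_diff(january, february)
    print(sorted(sorted(e) for e in diff["added"]))
    print(sorted(sorted(e) for e in diff["removed"]))
    print(diff["jaccard"])
```

A diff like this shows what changed but not why; the exogenous factors named on the slide (for example, real-world meetings) are not visible in the edge sets themselves.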
Invariants • What constitutes a “representative” data set? • Graph properties may vary by application (PGP keys, email, Facebook, YouTube, etc.) • Suppose that you are an advertiser, application builder, etc. • What conclusions can be drawn from a measurement study on one social network?
Can We Avoid Repeating Mistakes? • Separation of exogenous factors • Explanatory/evocative models • Exploration of why certain links form • Impact on applications • Closing the loop • Effects on real-world behavior