Stochastic Models of User-Contributory Web Sites

This presentation focuses on the use of stochastic models to understand and predict the behavior of user-participatory web sites, such as Digg. The models enable the prediction of trends and behaviors, as well as the design of effective web sites and incentives for users. The presentation includes an illustration of a stochastic model of Digg's voting behavior.


Presentation Transcript


  1. Stochastic Models of User-Contributory Web Sites • Tad Hogg, HP Labs • Kristina Lerman, USC Information Sciences Institute

  2. The Social Web • Bugzilla • Essembly • Delicious • “wisdom of crowds”

  3. Activities • View existing content • Rate existing content • simple: vote • complex: write a review • Add new content • Link to other users • [Callout: focus of this presentation]

  4. Aggregate group behavior • Determines structure and usefulness of user-participatory sites • Models enable • Predicting trends or behaviors • E.g., which newly contributed content will become popular • Designing web sites • E.g., productive information displays • Altering user incentives • E.g., improve content quality or participation

  5. Stochastic Modeling summary • Start with individual user behavior • Specify states and transitions between states • Determine collective behavior • Aggregate behavior of interest • Individual user behaviors create transitions among aggregate states • Rate equations give dynamics • How average collective behavior changes in time • How collective behavior depends on user characteristics

  6. Illustration – Stochastic Model of Digg • Phenomenology of Digg • Users submit and vote on news stories • Digg promotes popular stories to front page • Digg allows social networking • Users can designate Friends • and view their friends’ activity on Digg • Directed social network • Friends of user A are everyone A is watching • Fans of A are all users who are watching A • [Figure: Alice watches Bob, so Bob is Alice’s friend and Alice is Bob’s fan]

  7. Lifecycle of a story • User submits a story to the Upcoming Stories queue • Others vote on (digg) the story • If story accumulates enough votes in short time, it is promoted to the Front page • The Friends Interface lets users see • Stories friends submitted • Stories friends voted on, …

  8. Model of Digg voting behavior • Stochastic model based on Digg user interface • visibility and interestingness → votes • Extension to prior model: [Lerman 2007] • “law of surfing” for viewing web pages [Huberman et al. 1998] • instead of geometric distribution • incremental average growth in number of voters’ fans • i.e., people who can see story via friends interface • Related work: aggregate phenomenological models • behavior for Digg, Wikipedia, YouTube, … • e.g., [Wu & Huberman 2007; Crane & Sornette 2008; Wilkinson 2008]

  9. Voting on stories • [Flowchart: user comes to Digg → see the story? → (yes) vote on the story?] • combination of • visibility: does user see the story? • user interface • browse • recommended by friends • search • interest: does user like the story? • novelty, …

  10. Story location • Digg shows stories as lists • most recent first • 15 stories per page • user must click to view subsequent pages • visibility decreases with distance from top of list • A given story • moves down the list as new stories added • eventually moves to later pages • switches from upcoming to top of front page if promoted

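  11. User behavioral model • [State diagram: a user starts in state Ø and can reach the story on a front page (front1 … frontp), on an upcoming page (upcoming1 … upcomingq), or through the friends interface (rate wS); these paths make up visibility. A user who sees the story votes with probability r (interest). Other edge labels in the diagram: c, n.]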

  12. Dynamical model of aggregate behavior • How number of votes Nvote(t) for a story changes • nf - rate users find story on the front page queue • nu - rate users find story on the upcoming stories queue • nfriends - rate users find story through the friends interface • (these three rates together make up the story's visibility) • r - fraction of users who see the story and choose to vote for it
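The rate equation shown on this slide did not survive extraction. A form consistent with the definitions above, using the slide's own symbols (nf, nu, nfriends, r), is sketched below. Treat it as a reconstruction under that assumption, not as the authors' exact notation.

```latex
\frac{dN_{\mathrm{vote}}(t)}{dt} = r \left( n_f(t) + n_u(t) + n_{\mathrm{friends}}(t) \right)
```

Read this way, the three visibility rates add up to the rate at which users see the story, and r converts views into votes.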

  13. Estimating model parameters • Need model parameters for • Story visibility • Story interestingness • Estimate from behavior of sample of users

  14. Digg data set • Stories from front and upcoming pages • number of votes vs. time since submission • for several days in May 2006 • prior to availability of Digg API • sampled more extensively from front than upcoming pages • Number of fans for active users • 2152 stories with at least 4 observations • submitted by 1212 distinct users • 510 of these stories promoted to front page

  15. Story visibility • User viewing behavior not available: • which stories users look at • how they find stories • front page, friends interface, … • Estimate indirectly from models & data

  16. Modeling story visibility • Story location • Navigating web sites • Number of fans

  17. Story location vs. time in each list • [Plot labels: upcoming q(t), front page p(t); examples] • For upcoming and front page lists: • location on page (1 to 15), which page (1st, 2nd, …) • distance from top of list increases linearly with time • Rate story position increases: • front page: ~0.2 pages/hr • upcoming: ~4 pages/hr • 1/15th the rates new stories are • promoted to front page (~3/hr) • submitted as new stories (~60/hr) • since each page holds 15 stories • Averages over hourly variation • [Szabo & Huberman 2008]
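The arithmetic on this slide can be checked with a short Python sketch. The function names are hypothetical, and the sketch assumes a story enters at the top of page 1 and then drifts down at the average rate, ignoring hourly variation.

```python
import math

STORIES_PER_PAGE = 15  # from the slide: each page holds 15 stories


def pages_per_hour(stories_per_hour):
    """Convert an arrival rate of stories into a drift rate in pages/hr."""
    return stories_per_hour / STORIES_PER_PAGE


def page_number(hours_in_list, stories_per_hour):
    """Page a story is on after a given time in the list.

    Assumption: the story enters at the top of page 1 and its distance
    from the top grows linearly with time.
    """
    pages_moved = pages_per_hour(stories_per_hour) * hours_in_list
    return 1 + math.floor(pages_moved)


# Rates quoted on the slide: ~60 submissions/hr, ~3 promotions/hr.
upcoming_rate = pages_per_hour(60)  # pages/hr in the upcoming list
front_rate = pages_per_hour(3)      # pages/hr on the front page list
```

With these rates a story leaves the first upcoming page after roughly a quarter of an hour, while a promoted story stays on the first front page for about five hours.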

  18. Story location: promotion to front page • Digg promotion decision algorithm not public • based on popularity expressed by user votes • Approximation from data: • story promoted if • at least 40 votes within 24 hours of submission
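The approximate promotion rule above can be written as a small Python predicate. This is only the data-derived approximation from the slide, not Digg's actual (non-public) algorithm; the function name and the representation of votes as times in hours since submission are assumptions for illustration.

```python
def is_promoted(vote_times_hours, threshold=40, window_hours=24.0):
    """Approximate promotion rule from the slide.

    vote_times_hours: time of each vote, in hours since submission
    (hypothetical input format). A story counts as promoted if at least
    `threshold` votes arrive within `window_hours` of submission.
    """
    early_votes = sum(1 for t in vote_times_hours if 0 <= t <= window_hours)
    return early_votes >= threshold


# A story with 40 votes in its first 20 hours meets the rule;
# one whose 40th vote arrives after 24 hours does not.
fast_story = [0.5 * k for k in range(40)]   # votes at 0, 0.5, ..., 19.5 h
slow_story = [1.0 * k for k in range(40)]   # votes at 0, 1, ..., 39 h
```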

  19. Modeling story visibility • Story location • Navigating web sites • Number of fans

  20. Navigating through a web site • Empirical model of user following links on a Web site • “law of surfing” [Huberman et al. 1998] • Inverse Gaussian distribution of #pages viewed before leaving web site • few users go beyond 1st page • parameters estimated from Digg data & model
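To make the "law of surfing" concrete, the following Python sketch evaluates the standard inverse Gaussian cumulative distribution and uses it to estimate how many users reach a given page. The parameter values mu and lam are placeholders, not the values estimated from the Digg data, and treating "reaches page m" as "views more than m - 1 pages" is an assumption of this sketch.

```python
import math


def _normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def inverse_gaussian_cdf(x, mu, lam):
    """CDF of the inverse Gaussian distribution with mean mu and shape lam."""
    if x <= 0:
        return 0.0
    a = math.sqrt(lam / x)
    return (_normal_cdf(a * (x / mu - 1.0))
            + math.exp(2.0 * lam / mu) * _normal_cdf(-a * (x / mu + 1.0)))


def fraction_reaching_page(m, mu, lam):
    """Fraction of visitors who get as far as page m (m = 1, 2, ...).

    Assumption: a visitor reaches page m if the number of pages they
    view exceeds m - 1, so every visitor reaches page 1.
    """
    return 1.0 - inverse_gaussian_cdf(m - 1, mu, lam)


# Placeholder parameters chosen only for illustration.
MU, LAM = 0.6, 0.6
reach = [fraction_reaching_page(m, MU, LAM) for m in (1, 2, 3)]
```

With parameters in this range the fraction drops steeply after the first page, which matches the slide's remark that few users go beyond it.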

  21. Modeling story visibility • Story location • Navigating web sites • Number of fans: visibility via friends interface

  22. Story visibility via friends interface • Each voter enables their fans to see story • via friends interface • Model of number of fans not yet viewing story, s(t) • based on number of votes on the story • story visible to submitter’s fans at submission time: s(0) • s(t) decreases as fans of prior voters visit Digg • s(t) increases with new fans from new votes
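The two effects on s(t) can be illustrated with a simple Euler integration in Python. The specific equations used here (fans visiting at rate w per fan, each new vote adding a new fans on average, and a constant page-visibility rate) are assumptions made for this sketch, as are all parameter names and values; the slide's own equation is not recoverable from the transcript.

```python
def simulate_votes(r, s0, w, a, nu_pages, hours, dt=0.01):
    """Euler integration of an assumed pair of rate equations:

        dN/dt = r * (nu_pages + w * s)     votes from page views and fans
        ds/dt = -w * s + a * dN/dt         fans visit; new votes add fans

    r        : probability a user who sees the story votes for it
    s0       : submitter's fans at submission time, s(0)
    w        : rate at which each fan visits Digg (per hour, hypothetical)
    a        : average number of new fans exposed per vote (hypothetical)
    nu_pages : rate users see the story on list pages, held constant
               here for simplicity (an assumption of this sketch)
    """
    n_votes = 1.0  # assumption: the submitter's own vote counts first
    s = float(s0)
    history = [(0.0, n_votes, s)]
    steps = int(round(hours / dt))
    for k in range(1, steps + 1):
        dn = r * (nu_pages + w * s) * dt
        ds = -w * s * dt + a * dn
        n_votes += dn
        s = max(s + ds, 0.0)
        history.append((k * dt, n_votes, s))
    return history


# Illustrative run: 24 hours with placeholder parameter values.
run = simulate_votes(r=0.1, s0=50, w=0.1, a=5, nu_pages=10, hours=24)
```

Setting r to zero shows the loss term alone: no votes arrive and s(t) simply decays as the submitter's fans visit.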

  23. Story interestingness • Reasons users vote for story not available, e.g., • topic • novelty [Wu & Huberman 2007] • popularity (determining interest, not just visibility) • e.g., “cool” fashion or gadgets • … • One approach: web-based experiments • e.g., [Salganik et al. 2006] • Estimate from models & data • from vote history after accounting for visibility

  24. Model results

  25. Solutions: votes vs. time • model vs. observations for 6 stories • model captures qualitative features • slow growth initially • influence of fans on promotion • rapid growth if story promoted (much more visible to users)

  26. Model: requirements for promotion • Values of S and r to get the story on front page • [Plot labels: promotion time; number of votes; number of fans not yet seeing story; 40-vote promotion threshold]

  27. Promotion to front page: model prediction vs. data • 95% accurate • [Plot annotations: promotion threshold from model; logarithmic scale; most stories not promoted, and from people with no fans]

  28. Additional model insights • Heterogeneity • user activity • content quality (“interestingness”) • Predictability from early reactions to new story

  29. Story interestingness • Long-tail distribution (lognormal) • a few stories much more interesting than average • after accounting for visibility via user interface part of model • Open question: why? • A multiplicative process underlying user interests? • [Plot annotations: distribution of estimated interestingness values with lognormal fit; quantile-quantile plot shows good fit; good fit with Kolmogorov-Smirnov test]
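A lognormal fit and a Kolmogorov-Smirnov distance of the kind mentioned on this slide can be sketched in plain Python. The estimator (moments of the log values), the function names, and the sample values are all assumptions for illustration; the slide does not say how the authors performed the fit.

```python
import math


def fit_lognormal(values):
    """Estimate lognormal parameters as the mean and standard deviation
    of the log values (maximum-likelihood form, dividing by n)."""
    logs = [math.log(v) for v in values]
    mu = sum(logs) / len(logs)
    variance = sum((x - mu) ** 2 for x in logs) / len(logs)
    return mu, math.sqrt(variance)


def lognormal_cdf(x, mu, sigma):
    """CDF of a lognormal distribution with log-mean mu, log-std sigma."""
    z = (math.log(x) - mu) / (sigma * math.sqrt(2.0))
    return 0.5 * (1.0 + math.erf(z))


def ks_statistic(values, mu, sigma):
    """Largest gap between the empirical CDF and the fitted lognormal."""
    xs = sorted(values)
    n = len(xs)
    gap = 0.0
    for i, x in enumerate(xs):
        f = lognormal_cdf(x, mu, sigma)
        gap = max(gap, (i + 1) / n - f, f - i / n)
    return gap


# Hypothetical interestingness estimates, not data from the study.
sample_r = [0.02, 0.05, 0.08, 0.1, 0.15, 0.2, 0.3, 0.6]
mu_hat, sigma_hat = fit_lognormal(sample_r)
distance = ks_statistic(sample_r, mu_hat, sigma_hat)
```

A small distance indicates that the fitted curve tracks the empirical distribution closely; turning it into a p-value requires the Kolmogorov distribution, which this sketch leaves out.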

  30. Predictions from early behavior • Estimate story interestingness • from full history, or • using initial votes • Behavior predictable from early reaction to story • also with YouTube • e.g., [Crane & Sornette 2008; Lerman & Galstyan 2008; Szabo & Huberman 2008] • example: use first 4 observations • r estimates correlate 0.9 with those based on full history • predictions of final votes account for 75% of variance • rms prediction error: 244 votes

  31. Model based on votes only? • [Flowchart: user comes to Digg → see the story? → (yes) vote on the story?] • Estimate based on initial votes only • not including visibility model • i.e., ignore effects of ‘law of surfing’ and social network

  32. Model based on votes only? • full model is better than not including visibility (differences significant, p-value < 10^-4)

  33. Future work on models of activities: new content & links • View existing content • Rate existing content • Add new content • What motivates high-quality contribution? • Link to other users • How do users choose who to link to? • What does a link signify? • common interests? • trust in recommendations? • [Callout: focus of this presentation]

  34. Conclusion • Stochastic process approach • connect user and system behaviors • Applicability: • users have limited information and actions • limited use of personalized history • e.g., user communities on the web • not face-to-face small group interactions • Example: news aggregator Digg • votes from visibility + interestingness • user model from info and actions provided by Digg UI
