Information Cascades

PowerPoint Slideshow about 'Information Cascades' - brendy


Presentation Transcript
Cascades
  • Information/behavior spreading through a network
  • Useful for studying
    • Actual viral contagion
    • Technology diffusion, adoption of new products
    • Cascading failures (e.g. power grids)
    • Spread of information/rumor, viral marketing
How to model diffusion
  • Initial models
    • Assumed that everyone has global knowledge of what fraction has adopted
  • First mathematical models for local information
    • [Schelling '70/'78, Granovetter '78]
  • Large body of subsequent work:
    • [Rogers '95, Valente '95, Wasserman/Faust '94]
  • Probabilistic models
    • Each neighbor that has the contagion passes it on to the user with some probability
    • Ex: disease
  • Decision-based models
    • Each node typically has its own threshold and decides based on how many of its neighbors have the contagion
    • Ex: adopting a product; joining demonstrations
Decision based model: two states A, B
  • Payoff for two linked nodes (x, y)
    • Both nodes play A => (a, a). Both play B => (b, b). Else (0, 0)
  • In a large network, consider each node playing this game with each of its neighbors
  • Assume infinite graph
    • initialization is some mix of A and B
  • When will any node x choose B over A?
    • q = a/(a+b)
    • when the fraction of its neighbors playing B is > q, i.e. more than q*d(x) of its d(x) neighbors play B
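The threshold q follows from comparing the two payoffs. As a short derivation, assume node x has d(x) neighbors and a fraction p of them play B (p is notation introduced here, not in the slides):

```latex
\begin{align*}
\text{payoff}_B(x) &= b \cdot p \, d(x), &
\text{payoff}_A(x) &= a \cdot (1-p) \, d(x), \\
\text{payoff}_B(x) \ge \text{payoff}_A(x)
  &\iff b\,p \ge a\,(1-p)
  \iff p \ge \frac{a}{a+b} = q .
\end{align*}
```

Equality is the tie case; whether a tie goes to A or to B is a modeling choice.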
Definitions
  • Starting with set S, continue the above process k times
    • the result is the current set of nodes adopting B
  • Non-progressive
    • Nodes can switch back from B to A
  • Progressive
    • Nodes cannot switch back: once a node adopts B, it stays with B
  • Contagion threshold of a graph
    • the maximum q for which some finite set S starts an infinite cascade
    • is a property of the graph only
Simple example
  • G is the infinite line: one node per integer, each linked to its two neighbors
  • q = ½ (break ties by adopting B)
  • Case 1: S={0} adopts B
    • {0} -> {-1, 1} -> {0, 2, -2} -> …
  • Case 2: S={-1, 0, 1} starts with B
    • {-1, 0, 1} -> ??
    • For S = {0,1}?
  • Contagion threshold for G is ½
    • why?
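The cases above can be checked with a small simulation. This is a minimal sketch, assuming the graph is the line of integers (truncated here to a finite window, which is an approximation), synchronous non-progressive updates, and ties going to B; the function names `step` and `run` are illustrative only.

```python
def step(active, q=0.5, lo=-50, hi=50):
    """One synchronous non-progressive update on the path lo..hi.

    A node plays B in the next round iff the fraction of its neighbors
    currently playing B is at least q (so ties go to B).
    """
    nxt = set()
    for x in range(lo, hi + 1):
        nbrs = [y for y in (x - 1, x + 1) if lo <= y <= hi]
        frac_b = sum(y in active for y in nbrs) / len(nbrs)
        if frac_b >= q:
            nxt.add(x)
    return nxt


def run(seed, steps, q=0.5):
    """Return the active sets after 0, 1, ..., steps rounds."""
    history = [set(seed)]
    for _ in range(steps):
        history.append(step(history[-1], q))
    return history
```

Running `run({0}, 2)` reproduces the blinking pattern of Case 1, while `run({-1, 0, 1}, 2)` grows by one node on each side per round. With q slightly above ½ (e.g. `q=0.6`) the same seed dies out, which is consistent with the threshold being ½ on this graph.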
Contagion Threshold
  • Do the progressive and non-progressive models have different thresholds?
    • Nope [Mor00]
  • Can the threshold be arbitrary in [0,1]?
    • It is always <= ½ for any graph G [Mor00]
Progressive vs. non-progressive: sketch
  • Let S be contagious w.r.t. q in the progressive model
  • Build a set T that is contagious; S1 = S + neighborhood of S
  • T is “robust” enough that the non-progressive model proceeds to infinity
    • prove by induction that from then on the two processes are identical

[Figure: the set S and its enlargement S1]
Contagion Threshold
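  • For any graph G, threshold <= ½
  • Suppose not, and a finite set S is contagious for some q > ½ in G
  • For any set X, define potential(X) = outgoing-degree(X), the number of edges leaving X
  • Claim: the potential of the active set strictly decreases at every step
    • A node switches only if a majority of its neighbors are in the active set, so adding it removes more outgoing edges than it adds
    • The potential starts finite and cannot go below 0, so it can only decrease a finite number of times. Hence finite steps!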
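The decrease in potential can be made explicit for a single switching node. As a sketch, assume an undirected graph, let X be the current active set, and let v be a node outside X with degree d(v) and k neighbors in X (k is notation introduced here):

```latex
\begin{align*}
k &> q \, d(v) > \tfrac{1}{2} \, d(v), \\
\text{potential}(X \cup \{v\}) - \text{potential}(X)
  &= \underbrace{\bigl(d(v) - k\bigr)}_{\text{new outgoing edges of } v}
   - \underbrace{k}_{\text{edges from } X \text{ to } v}
   = d(v) - 2k < 0 .
\end{align*}
```

When several nodes switch in the same step, edges between them are not outgoing either, so the drop is at least as large.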

Viral marketing
  • Optimization formulation
    • Bounded marketing budget, how to spend it best
    • Want to utilize “network effects”
  • At least two different variants
    • Pay a small set of users to start a cascade. Maybe their friends will listen to them?
      • How to choose this set of users?
    • Offer incentives to whoever buys, if they recommend to their friends
Viral marketing: empirical study I (Leskovec, Adamic, Huberman)
  • Recommendation incentive variant
  • Online store data on various categories (DVDs, books, CDs, ...)
    • 16M recommendations
    • 4M users, 0.5M items
    • users who buy items can recommend them to friends
    • both users get a discount if the recommendation results in a purchase
  • Some data issues regarding observed reward
    • sometimes inferred
DVD recommendation
  • The majority of recommendations do not cause purchases (only 7% do)
  • Many star patterns and disconnected components
  • Giant component has 19% of nodes
  • Cascades form by chains of recommend-buy-recommend
Multiple recommendations
  • Later recommendations matter less (on avg)
  • recall similar result on group affiliation in LiveJournal
  • We only see that a user received a recommendation and then purchased the product
  • Do not know:
    • How long it took to act
    • Whether there were other effects
    • When the user became aware of the friend’s recommendations
    • Whether the average is representative of individual users
Other observations
  • Success depends most on the type of product
    • Success rate: books 3%; DVDs 7%; anime DVDs 29%
  • Sending more recommendations does result in more purchases (DVDs)
    • Suggests a strategy for a user who wants to maximize the reward incentive
  • However, repeated recommendations to one person cause a decrease in success probability
Seeding variant: finding a good set of seeds
  • If we select a small set of nodes that are paid to spread information, how should we select them?
  • Heuristic methods:
    • degree, random, some “centrality” notion?
  • Need a slightly more stylized influence model [KKT’03]
  • Suppose f(S) is the set of nodes reached when the cascade starts with S
Linear Threshold Model
  • A node v has a random threshold θv ~ U[0,1]
  • A node v is influenced by each neighbor w according to a weight b_vw such that the weights into v sum to at most 1: Σ_w b_vw <= 1
  • A node v becomes active when at least a (weighted) θv fraction of its neighbors are active: Σ_{active w} b_vw >= θv
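One run of this model can be sketched in a few lines. This is an illustrative simulation, not code from [KKT’03]; the input format (`in_weights[v]` maps each in-neighbor w of v to b_vw) and the optional `thresholds` argument are assumptions made here for convenience.

```python
import random


def linear_threshold(in_weights, seeds, thresholds=None, rng=None):
    """Simulate one run of the linear threshold model.

    in_weights[v] maps each in-neighbor w of v to the weight b_vw;
    the weights into v are assumed to sum to at most 1.
    thresholds (optional) fixes theta_v; otherwise theta_v ~ U[0, 1].
    Returns the final set of active nodes.
    """
    rng = rng or random.Random()
    if thresholds is None:
        thresholds = {v: rng.random() for v in in_weights}
    active = set(seeds)
    changed = True
    while changed:  # repeat until no further node crosses its threshold
        changed = False
        for v, weights in in_weights.items():
            if v in active:
                continue
            influence = sum(b for w, b in weights.items() if w in active)
            if influence >= thresholds[v]:
                active.add(v)
                changed = True
    return active
```

Because activation is progressive and influence only grows, the order in which nodes are scanned does not change the final active set.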

Independent Cascade Model
  • When node v becomes active, it has a single chance of activating each currently inactive neighbor w.
  • The activation attempt succeeds with probability p_vw.
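A single run of the independent cascade can be sketched as a breadth-first process. The input format (`probs[v]` maps each out-neighbor w of v to p_vw) and the function name are assumptions made for this illustration.

```python
import random
from collections import deque


def independent_cascade(probs, seeds, rng=None):
    """Simulate one run of the independent cascade model.

    probs[v] maps each out-neighbor w of v to the probability p_vw.
    Each newly active node gets exactly one activation attempt per
    currently inactive neighbor.
    """
    rng = rng or random.Random()
    active = set(seeds)
    frontier = deque(seeds)
    while frontier:
        v = frontier.popleft()
        for w, p in probs.get(v, {}).items():
            if w not in active and rng.random() < p:
                active.add(w)
                frontier.append(w)
    return active
```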
Submodularity
  • f is submodular if f(S + v) - f(S) >= f(T + v) - f(T) whenever S ⊆ T
  • Example: C1, C2, …, Cn are sets
    • f(S) = size of the union of the Ci with i in S is submodular
  • Bad news: maximizing a submodular f(S) over sets of size k is NP-hard
  • Note: f(S) is actually the expected number of nodes reached
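The diminishing-returns condition can be verified by brute force on tiny examples. The sketch below assumes the coverage function described above; `coverage` and `is_submodular` are names introduced here, and the check is exponential in the ground-set size, so it is only meant for illustration.

```python
from itertools import combinations


def coverage(sets, S):
    """f(S) = size of the union of the sets C_i with i in S."""
    covered = set()
    for i in S:
        covered |= sets[i]
    return len(covered)


def is_submodular(f, ground):
    """Check f(S+v) - f(S) >= f(T+v) - f(T) for all S ⊆ T and v not in T."""
    ground = list(ground)
    subsets = [frozenset(c) for r in range(len(ground) + 1)
               for c in combinations(ground, r)]
    for T in subsets:
        for S in subsets:
            if not S <= T:
                continue
            for v in ground:
                if v in T:
                    continue
                if f(S | {v}) - f(S) < f(T | {v}) - f(T):
                    return False
    return True
```

For instance, with `sets = {1: {1, 2, 3}, 2: {3, 4}, 3: {4, 5, 6}}`, calling `is_submodular(lambda S: coverage(sets, S), sets)` confirms the property, whereas a function such as `lambda S: len(S) ** 2` fails it.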
Good News
  • When f is monotone, we can use the Greedy Algorithm!
    • Start with an empty set S
    • For k iterations:
      • Add to S the node v that maximizes f(S + v) - f(S).
  • How good (or bad) is it?
    • Theorem: The greedy algorithm is a (1 – 1/e) approximation.
    • The resulting set S activates at least a (1 - 1/e) fraction (more than 63%) of the number of nodes that any size-k set could activate.
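The greedy loop above translates almost directly into code. This is a minimal sketch for any set function `f` supplied by the caller; the name `greedy` and the assumption that `k` does not exceed the number of candidates are choices made here.

```python
def greedy(f, candidates, k):
    """Pick k elements, each time adding the one with the largest marginal gain.

    f takes a set and returns a number; k is assumed to be at most
    the number of candidates.
    """
    S = set()
    for _ in range(k):
        best = max((v for v in candidates if v not in S),
                   key=lambda v: f(S | {v}) - f(S))
        S.add(best)
    return S
```

Each iteration evaluates f once per remaining candidate, so the cost is dominated by how expensive f is to evaluate, which matters when f has to be estimated by simulation.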
Submodularity for Independent Cascade
  • Coins for edges are flipped during activation attempts.
  • Can pre-flip all coins and reveal results immediately.
  • Active nodes in the end are those reachable via green paths (edges whose coin flip succeeded) from initially targeted nodes.
  • Study reachability in green graphs

[Figure: example network with edge activation probabilities 0.6, 0.2, 0.2, 0.3, 0.1, 0.4, 0.5, 0.3, 0.5]
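The pre-flipping idea can be sketched as two separate steps: sample the green graph once, then compute plain reachability. The input format (`probs[v]` maps each out-neighbor w to p_vw) and both function names are assumptions for this illustration.

```python
import random
from collections import deque


def sample_green_graph(probs, rng=None):
    """Flip every edge coin up front and keep the edges that succeed."""
    rng = rng or random.Random()
    return {v: [w for w, p in nbrs.items() if rng.random() < p]
            for v, nbrs in probs.items()}


def reachable(green, seeds):
    """Nodes reachable from the seeds along green edges (BFS)."""
    seen = set(seeds)
    queue = deque(seeds)
    while queue:
        v = queue.popleft()
        for w in green.get(v, []):
            if w not in seen:
                seen.add(w)
                queue.append(w)
    return seen
```

Separating the randomness from the seed set is what makes the analysis work: for a fixed green graph, the outcome is a deterministic function of the seeds.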
Submodularity, Fixed Graph
  • Fix “green graph” G. g(S) are nodes reachable from S in G.
  • Submodularity: g(T + v) - g(T) ⊆ g(S + v) - g(S) when S ⊆ T, so the same holds with <= for the set sizes.
  • g(S + v) - g(S): nodes reachable from S + v, but not from S.
  • From the picture: any node reachable from v but not from T is also not reachable from S, because S ⊆ T.
Submodularity of the Function
  • Fact: A non-negative linear combination of submodular functions is submodular
  • f(S) = Σ_G Prob[G is the green graph] · |gG(S)|
  • gG(S): nodes reachable from S in G.
  • Each |gG(S)| is submodular (previous slide).
  • Probabilities are non-negative.
Submodularity for Linear Thresholds
  • Use similar “green graph” idea.
  • Once a graph is fixed, the “reachability” argument is identical.
  • How do we fix a green graph now?
  • Each node picks at most one incoming edge, with probabilities proportional to edge weights.
  • This process is equivalent to the linear threshold model (trickier proof).
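The edge-picking step can be sketched as follows. This assumes the convention that node v keeps the edge from w with probability exactly b_vw and keeps no edge with the remaining probability; the input format (`in_weights[v]` maps in-neighbor w to b_vw) and the function name are introduced here for illustration.

```python
import random


def sample_lt_green_graph(in_weights, rng=None):
    """Each node v keeps at most one incoming edge.

    The edge (w, v) is kept with probability b_vw; with probability
    1 - sum_w b_vw the node keeps no incoming edge at all.
    Returns adjacency lists green[w] = [v, ...] of kept edges w -> v.
    """
    rng = rng or random.Random()
    green = {}
    for v, weights in in_weights.items():
        r = rng.random()
        acc = 0.0
        for w, b in weights.items():
            acc += b
            if r < acc:  # r landed in the slice belonging to edge (w, v)
                green.setdefault(w, []).append(v)
                break
    return green
```

The resulting graph can then be fed to any ordinary reachability routine, exactly as in the independent cascade case.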
Evaluating f(S)
  • How to evaluate f(S)?
  • How to compute it exactly and efficiently is still an open question
  • But: very good estimates by simulation
    • repeat the diffusion process often enough (polynomial in n and 1/ε)
    • Achieves a (1 ± ε)-approximation to f(S).
  • Generalization of the Nemhauser/Wolsey proof shows: the greedy algorithm is now a (1 - 1/e - ε′)-approximation.
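Estimation by simulation amounts to averaging the cascade size over many independent runs. The sketch below assumes the independent cascade model with `probs[v]` mapping each out-neighbor w to p_vw; the function names, the default of 1000 runs, and the fixed random seed are illustrative choices, not values from the paper.

```python
import random


def simulate_ic(probs, seeds, rng):
    """One independent-cascade run; returns the final active set."""
    active, frontier = set(seeds), list(seeds)
    while frontier:
        v = frontier.pop()
        for w, p in probs.get(v, {}).items():
            if w not in active and rng.random() < p:
                active.add(w)
                frontier.append(w)
    return active


def estimate_spread(probs, seeds, runs=1000, seed=0):
    """Monte Carlo estimate of f(S): average cascade size over `runs` runs."""
    rng = random.Random(seed)
    total = sum(len(simulate_ic(probs, seeds, rng)) for _ in range(runs))
    return total / runs
```

More runs tighten the estimate at a linear cost in running time; how many are needed for a given ε depends on the graph.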
More in the paper [KKT’03]
  • More general model that captures both
  • Experimental results that show the performance of greedy
    • For simulated cascades
  • Choosing seeds for the non-progressive case
  • More realistic marketing scenarios
    • Likelihood of initial activation depends on amount spent
Experimental Results (KKT)
  • To test efficacy of greedy against other algorithms
  • Co-authorship data
  • Linear Threshold Model: multiplicity of edges as weights
    • weight(v→w) = c_vw / d_v, weight(w→v) = c_wv / d_w
  • Independent Cascade Model:
    • Case 1: uniform probability p on each edge
    • Case 2: edge from v to w has probability 1/d(w) of activating w.
  • Compare with 3 other common heuristics
    • (in)degree centrality, distance centrality, random nodes.
  • Simulate the cascades a number of times…
Facebook study on contagions (Sun, Rosenn, Marlow, Lento)
  • Diffusion on FB
    • Pages “liked” by users
    • Diffusion happens through newsfeed
  • What do the cascades look like?
    • Distribution of sizes, connectedness
    • Small seed?
    • Any way to distinguish the seed nodes?
  • Dataset
    • sample set of pages and all associated fans
    • seeding variant
Cascade structure
  • Large connected clusters
    • median page had 70% of fans in one component
    • second-largest component much smaller
  • Multiple chains merge to form a cluster

[Figure: example cascade clusters, labeled Bosnia, Slovenia, Croatia]
Cascade starters
  • Large number of starters
    • 46% of entire set of users; 17% of users in largest component
    • belies the typical assumption that large cascades start from a small set of nodes; however, it does not rule out that they can
    • maximum chain length can be large (~80)
  • Tried to predict chain length by looking at different properties of the starter
    • age, gender, FB friends/activity/age, feed exposure, popularity
    • after controlling for popularity and friend count, the other variables have no impact
  • Main takeaway
    • Contagions typically have lots of start-points
    • Looking at contagion w/o the effect of external sources is inadequate
Structure of diffusions (Goel, Watts, Goldstein ’12)
  • Study on multiple domains
    • Yahoo! Kindness; Twitter; Zync; Secretary game…
  • Main difference with previous studies is
    • Multiple platforms
    • Takes very large number of diffusion events
  • Interested in studying the structure of the “typical” diffusion
    • Are the cascades large?
    • Are the trees interesting?
So then..
  • Nice theory problems associated with simple models, but…
  • Empirical studies show the simple viral model is not accurate
    • Large cascades often need multiple starters
    • Even when propagation happens, it is well approximated by a one-step process
  • What lessons to take away?
    • Incorporate external channels?
    • Not trust cascade model too much?
    • Verdict is not clear yet: in some domains (e.g. RDS, computer virus infection) cascades do happen. Is there some missing characteristic?
      • (financial incentives in RDS, not voluntary in virus spread?)