## PowerPoint Slideshow about 'Information Cascades' - brendy


Presentation Transcript

Cascades

- Information/behavior spreading through a network
- Useful for studying
- Actual viral contagion
- Technology diffusion, adoption of new products
- Cascading failures (e.g. power grids)
- Spread of information/rumor, viral marketing

How to model diffusion

- Initial models
- Assumed that everyone has global knowledge of what fraction has adopted
- First mathematical models for local information
- [Schelling '70/'78, Granovetter '78]
- Large body of subsequent work:
- [Rogers '95, Valente '95, Wasserman/Faust '94]
- Probabilistic models
- For each neighbor that has the contagion, the user catches it too with some probability
- Ex: disease
- Decision based models
- Each node typically has its own threshold and decides based on how many of its neighbors have the contagion.
- Ex: adopting a product; Joining demonstrations

Decision based model: two states A, B

- Payoff for two linked nodes (x, y)
- Both nodes play A => (a, a). Both play B => (b, b). Else (0, 0)
- In a large network, consider each node playing this game with each of its neighbors
- Assume infinite graph
- initialization is some mix of A and B
- When will any node x choose B over A?
- Let q = a/(a+b) and let d(x) be the degree of x
- x chooses B when at least a q fraction of its neighbors play B, i.e. at least q*d(x) of them
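
The threshold q = a/(a+b) follows from comparing the two total payoffs of a node. A minimal sketch, assuming ties are broken in favor of B; the function names `prefers_b` and `threshold` are illustrative and not from the slides:

```python
def prefers_b(a, b, neighbors_playing_b, degree):
    """Return True if playing B pays at least as much as playing A.

    A node earns a for every neighbor that also plays A and b for every
    neighbor that also plays B (mismatched pairs earn 0).
    Assumption: ties are broken in favor of B.
    """
    payoff_a = a * (degree - neighbors_playing_b)
    payoff_b = b * neighbors_playing_b
    return payoff_b >= payoff_a


def threshold(a, b):
    """Fraction of B-playing neighbors at which B starts to pay off."""
    return a / (a + b)


# With a = 3 and b = 1 the threshold is 3/4: a node with 4 neighbors
# needs 3 of them on B before switching.
print(threshold(3, 1), prefers_b(3, 1, 2, 4), prefers_b(3, 1, 3, 4))
```

Rearranging b*k >= a*(d - k) gives k/d >= a/(a+b), which is the threshold used in the rest of the deck.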

Definitions

- Starting with set S, continue the above process k times
- Let $h_q^k(S)$ be the current set of nodes adopting B
- Non-progressive
- Nodes can switch back: $h_q(S)$ = nodes with at least a q fraction of their neighbors in S
- Progressive
- Nodes cannot switch back: $\bar{h}_q(S) = S \cup h_q(S)$
- Contagion threshold of a graph
- the maximum q for which some finite set S triggers an infinite cascade
- is a property of the graph only

Simple example

- q = ½ (break ties by adopting B)
- Case 1: S={0} adopts B
- {0} -> {-1, 1} -> {0, 2, -2} -> …
- Case 2: S={-1, 0, 1} starts with B
- {-1, 0, 1} -> ??
- For S = {0,1}?
- Contagion threshold for G is ½
- why?
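
The line example can be replayed mechanically. This is a sketch under the assumption that the graph is the infinite line (node x is linked to x-1 and x+1) and that ties are broken by adopting B; `line_neighbors` and `step` are hypothetical helper names:

```python
from fractions import Fraction


def line_neighbors(x):
    """Neighbors of node x on the infinite line of integers."""
    return (x - 1, x + 1)


def step(active, q, progressive):
    """Apply one round of the threshold rule to the set of B-adopters.

    Only nodes in or next to `active` can have a B-playing neighbor,
    so it is enough to examine those candidates.
    """
    candidates = set(active)
    for x in active:
        candidates.update(line_neighbors(x))
    adopters = set()
    for x in candidates:
        nbrs = line_neighbors(x)
        frac = Fraction(sum(n in active for n in nbrs), len(nbrs))
        if frac >= q:  # ties are broken by adopting B
            adopters.add(x)
    # Progressive: nobody switches back, so keep the old adopters too.
    return adopters | set(active) if progressive else adopters


half = Fraction(1, 2)
s = {0}
for _ in range(3):
    print(sorted(s))
    s = step(s, half, progressive=False)
```

Passing other seed sets, or a q slightly above ½, to `step` is a quick way to explore the remaining cases on the slide.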

Contagion Threshold

- Do the progressive and non-progressive models have different thresholds?
- Can the threshold be arbitrary in [0,1]?

Contagion Threshold

- Do the progressive and non-progressive models have different thresholds?
- Nope [Mor00]
- Can the threshold be arbitrary in [0,1]?
- It is always <= ½ for any graph G [Mor00]

Progressive vs. non-progressive: sketch

- S = contagious wrt q in progressive model
- Build T that is contagious; S1 = S + neighborhood of S
- T is “robust” enough that the non-progressive model proceeds to infinity
- through induction prove that henceforth the two processes are identical

[Figure: the sets S and S1]

Contagion Threshold

- For any graph G, threshold <= ½
- Suppose not, and S is contagious for q > ½ in G
- For any set X, define potential(X) = outgoing-degree(X), the number of edges leaving X
- Claim: the potential of the active set strictly decreases at every step
- A node switches only if a majority of its neighbors are in the active set, so it removes more outgoing edges than it adds
- The potential of a finite set is finite, so it can only decrease a finite number of times. Hence finitely many steps!
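
The potential argument can be watched on a small example. The sketch below uses a made-up finite graph (`GRAPH`) and q = 3/5, both chosen purely for illustration; on a finite graph the process simply stops, whereas the slide's claim concerns infinite graphs:

```python
from fractions import Fraction

# Hypothetical undirected graph, given as adjacency lists.
GRAPH = {
    0: [1, 2, 3],
    1: [0, 2, 3],
    2: [0, 1, 3, 4],
    3: [0, 1, 2, 4],
    4: [2, 3, 5],
    5: [4],
}


def potential(active):
    """Number of edges with exactly one endpoint in the active set."""
    return sum(1 for u in active for v in GRAPH[u] if v not in active)


def progressive_step(active, q):
    """Add every inactive node with at least a q fraction of active neighbors."""
    joiners = {
        v for v in GRAPH
        if v not in active
        and Fraction(sum(n in active for n in GRAPH[v]), len(GRAPH[v])) >= q
    }
    return active | joiners


def potentials(seed, q):
    """Potential of the active set after each step, until nothing changes."""
    active = set(seed)
    trace = [potential(active)]
    while True:
        nxt = progressive_step(active, q)
        if nxt == active:
            return trace
        active = nxt
        trace.append(potential(active))


print(potentials({0, 1, 2}, Fraction(3, 5)))
```

Because q > ½, every joining node has more edges into the active set than out of it, so each step lowers the printed potential.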


Viral marketing

- Optimization formulation
- Bounded marketing budget, how to spend it best
- Want to utilize “network effects”
- At least two different variants
- Pay a small set of users to start a cascade. Maybe their friends will listen to them?
- How to choose this set of users?
- Offer incentives to whoever buys, if they recommend to their friends

Viral marketing: empirical study I (Leskovec, Adamic, Huberman)

- Recommendation incentive variant
- Online store data on various categories (DVDs, books, CDs, ...)
- 16 M recommendations
- 4M users, 0.5M items
- users who buy items can recommend to friends
- both users get a discount if the recommendation results in a purchase
- Some data issues regarding observed reward
- sometimes inferred

DVD recommendation

- Most recommendations do not lead to purchases (only 7% do)
- Many star patterns and disconnected components
- Giant component has 19% of nodes
- Cascades form by chains of recommend-buy-recommend

Multiple recommendations

- Later recommendations matter less (on average)
- recall similar result on group affiliation in LiveJournal
- We only see that a user received a recommendation and then purchased the product
- Do not know:
- How long it took to act
- Whether there were other effects
- When the user became aware of the friend’s recommendations
- Is the average representative of individual users?

Other observations

- Success depends most on the type of product
- Books: rate 3%; DVDs: 7%; anime DVDs: 29%
- Sending more recommendations does result in more purchases (DVDs)
- Strategy of what a user should do to maximize reward incentive
- However, repeated recommendations to one person cause a decrease in success probability

Seeding variant: finding a good set of seeds

- If we select a small set of nodes that are paid to spread information, how should we select them?
- Heuristic methods:
- degree, random, some “centrality” notion?
- Need a slightly more stylized influence model [KKT’03]
- Suppose f(S) is the set of nodes reached when cascade starts with S

Linear Threshold Model

- A node v has random threshold $\theta_v \sim U[0,1]$
- A node v is influenced by each neighbor w according to a weight $b_{vw}$ such that $\sum_{w} b_{vw} \le 1$
- A node v becomes active when at least a (weighted) $\theta_v$ fraction of its neighbors are active: $\sum_{w \text{ active}} b_{vw} \ge \theta_v$
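
A single run of the linear threshold model can be sketched as follows. The data layout (`weights[v][w]` holding the influence of neighbor w on v) and the function name `linear_threshold` are assumptions made for this example:

```python
import random


def linear_threshold(weights, seeds, rng):
    """Run one linear threshold cascade and return the final active set.

    weights[v][w] is the influence weight of neighbor w on node v; the
    weights into each node are assumed to sum to at most 1.
    """
    # Each node draws its threshold uniformly at random, once.
    thresholds = {v: rng.random() for v in weights}
    active = set(seeds)
    changed = True
    while changed:
        changed = False
        for v in weights:
            if v in active:
                continue
            influence = sum(b for w, b in weights[v].items() if w in active)
            if influence >= thresholds[v]:
                active.add(v)
                changed = True
    return active


# Toy network: a fully influences b, b fully influences c, d is isolated.
toy = {"a": {}, "b": {"a": 1.0}, "c": {"b": 1.0}, "d": {}}
print(sorted(linear_threshold(toy, {"a"}, random.Random(0))))
```

Activation is monotone in this model, so the order in which inactive nodes are examined does not change the final active set.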

Independent Cascade Model

- When node v becomes active, it has a single chance of activating each currently inactive neighbor w.
- The activation attempt succeeds with probability $p_{vw}$
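
One run of the independent cascade model might look like this. It is a sketch: `probs[v][w]` (the success probability of v's attempt on w) and the name `independent_cascade` are illustrative choices, not notation from the slides:

```python
import random


def independent_cascade(probs, seeds, rng):
    """Run one independent cascade and return the final active set.

    probs[v][w] is the probability that v, once active, activates w.
    Every newly active node gets exactly one attempt per inactive neighbor.
    """
    active = set(seeds)
    frontier = list(active)
    while frontier:
        newly_active = []
        for v in frontier:
            for w, p in probs.get(v, {}).items():
                if w not in active and rng.random() < p:
                    active.add(w)
                    newly_active.append(w)
        frontier = newly_active
    return active


# Toy network: certain edges a -> b -> c, and an edge a -> d that never fires.
toy = {"a": {"b": 1.0, "d": 0.0}, "b": {"c": 1.0}}
print(sorted(independent_cascade(toy, {"a"}, random.Random(0))))
```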

Submodularity

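- f is submodular if $f(S \cup \{v\}) - f(S) \ge f(T \cup \{v\}) - f(T)$ whenever $S \subseteq T$
- Example: $C_1, C_2, \ldots, C_n$ are sets
- $f(S) = \left|\bigcup_{i \in S} C_i\right|$ is submodular
- Bad news: maximizing a submodular f(S) over sets of k nodes is NP-hard
- Note: f(S) is actually the expected number of nodes reached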
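
The diminishing-returns condition can be checked by brute force on a tiny instance. The sets in `SETS` and the helper `is_submodular` are made up for illustration, and the check is exponential in the ground-set size, so it only suits toy examples:

```python
from itertools import combinations

# Hypothetical family of sets C_1, C_2, C_3.
SETS = {1: {"a", "b"}, 2: {"b", "c"}, 3: {"c", "d", "e"}}


def coverage(chosen):
    """Size of the union of the chosen sets."""
    covered = set()
    for i in chosen:
        covered |= SETS[i]
    return len(covered)


def is_submodular(f, ground):
    """Check f(S+v) - f(S) >= f(T+v) - f(T) for all S within T and v outside T."""
    ground = frozenset(ground)
    subsets = [
        frozenset(c)
        for r in range(len(ground) + 1)
        for c in combinations(ground, r)
    ]
    for small in subsets:
        for big in subsets:
            if not small <= big:
                continue
            for v in ground - big:
                if f(small | {v}) - f(small) < f(big | {v}) - f(big):
                    return False
    return True


print(is_submodular(coverage, SETS))                 # coverage function
print(is_submodular(lambda s: len(s) ** 2, SETS))    # a convex counterexample
```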

Good News

- When f is also monotone, we can use the greedy algorithm!
- Start with an empty set S
- For k iterations:

Add to S the node v that maximizes f(S +v) - f(S).

- How good (or bad) is it?
- Theorem: The greedy algorithm is a (1 – 1/e) approximation.
- The resulting set S activates at least (1- 1/e) > 63% of the number of nodes that any size-k set S could activate.
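
The greedy rule above can be sketched on a coverage function, which is monotone and submodular. The instance `SETS` is invented for the example, and the code assumes k does not exceed the number of available elements:

```python
# Hypothetical family of sets; picking index i "covers" SETS[i].
SETS = {
    1: {"a", "b", "c"},
    2: {"c", "d"},
    3: {"d", "e", "f", "g"},
    4: {"a"},
}


def coverage(chosen):
    """Size of the union of the chosen sets."""
    covered = set()
    for i in chosen:
        covered |= SETS[i]
    return len(covered)


def greedy(f, ground, k):
    """Pick k elements, each time adding the one with the largest marginal gain."""
    chosen = set()
    for _ in range(k):
        # sorted() only makes tie-breaking deterministic.
        candidates = sorted(set(ground) - chosen)
        best = max(candidates, key=lambda v: f(chosen | {v}) - f(chosen))
        chosen.add(best)
    return chosen


picked = greedy(coverage, SETS, 2)
print(sorted(picked), coverage(picked))
```

For influence maximization, `f` would be replaced by an estimate of the expected cascade size, which is much more expensive to evaluate than this toy coverage function.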


Submodularity for Independent Cascade

- Coins for edges are flipped during activation attempts.
- Can pre-flip all coins and reveal results immediately.

[Figure: example graph with an activation probability on each edge]

- Active nodes in the end are reachable via green paths from initially targeted nodes.
- Study reachability in green graphs
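
The pre-flipping idea can be sketched in two steps: sample which edges succeed, then compute plain reachability. The names `sample_green_graph` and `reachable`, and the `probs[v][w]` layout, are assumptions of this example:

```python
import random


def sample_green_graph(probs, rng):
    """Flip every edge's coin up front and keep only the successful edges."""
    return {
        v: [w for w, p in out.items() if rng.random() < p]
        for v, out in probs.items()
    }


def reachable(green, seeds):
    """Nodes reachable from the seeds along green (successful) edges."""
    seen = set(seeds)
    stack = list(seen)
    while stack:
        v = stack.pop()
        for w in green.get(v, []):
            if w not in seen:
                seen.add(w)
                stack.append(w)
    return seen


# Toy network with only certain (1.0) and impossible (0.0) edges.
toy = {"a": {"b": 1.0, "c": 0.0}, "b": {"d": 1.0}, "c": {}, "d": {}}
green = sample_green_graph(toy, random.Random(0))
print(green, sorted(reachable(green, {"a"})))
```

Each edge's coin is examined at most once in a cascade, so flipping all coins in advance yields the same distribution over final active sets.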

Submodularity, Fixed Graph

- Fix “green graph” G. g(S) are nodes reachable from S in G.
- Submodularity:
- g(T +v) - g(T) <= g(S +v) - g(S) when S ⊆ T.

- g(S +v) - g(S): nodes reachable from S + v, but not from S.
- From the picture: g(T +v) - g(T) <= g(S +v) - g(S) when S ⊆ T

Submodularity of the Function

Fact: A non-negative linear combination of submodular functions is submodular

- $g_G(S)$: nodes reachable from S in G.
- Each $g_G(S)$ is submodular (previous slide).
- Probabilities are non-negative.
- Hence $f(S) = \sum_G \Pr[G \text{ is the green graph}] \cdot g_G(S)$ is submodular.

Submodularity for Linear Thresholds

- Use similar “green graph” idea.
- Once a graph is fixed, “reachability” argument is identical.
- How do we fix a green graph now?
- Each node picks at most one incoming edge, with probabilities proportional to edge weights.
- Equivalent to linear threshold model (trickier proof).
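
The rule "each node picks at most one incoming edge" can be sketched as below. The layout `weights[v][w]` (weight of the edge from w into v) and the name `sample_lt_green_graph` are assumptions; the sketch takes the weights themselves as the selection probabilities, which requires that they sum to at most 1 per node:

```python
import random


def sample_lt_green_graph(weights, rng):
    """Let every node pick at most one incoming edge.

    Node v picks the edge from w with probability weights[v][w] and picks
    no edge with the leftover probability. The result maps each source w
    to the list of nodes that picked an edge coming from w.
    """
    green = {v: [] for v in weights}
    for v, incoming in weights.items():
        r = rng.random()
        cumulative = 0.0
        for w, b in incoming.items():
            cumulative += b
            if r < cumulative:
                green.setdefault(w, []).append(v)
                break
    return green


toy = {"a": {}, "b": {"a": 1.0}, "c": {"a": 0.5, "b": 0.5}}
print(sample_lt_green_graph(toy, random.Random(0)))
```

The sampled graph can then be fed to an ordinary reachability search, exactly as in the independent cascade case.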

Evaluating f(S)

- How to evaluate f(S)?
- How to compute it exactly and efficiently is still an open question
- But: very good estimates by simulation
- repeating the diffusion process often enough (polynomial in n; 1/ε)
- Achieve (1± ε)-approximation to f(S).
- Generalization of Nemhauser/Wolsey proof shows: Greedy algorithm is now a (1-1/e- ε′)-approximation.
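
Estimating f(S) by simulation can be sketched for the independent cascade model as follows. The number of runs used here is arbitrary rather than the bound from the analysis, and `cascade_size` and `estimate_spread` are illustrative names:

```python
import random


def cascade_size(probs, seeds, rng):
    """Number of active nodes at the end of one independent cascade run."""
    active = set(seeds)
    frontier = list(active)
    while frontier:
        newly_active = []
        for v in frontier:
            for w, p in probs.get(v, {}).items():
                if w not in active and rng.random() < p:
                    active.add(w)
                    newly_active.append(w)
        frontier = newly_active
    return len(active)


def estimate_spread(probs, seeds, runs, rng):
    """Monte Carlo estimate of the expected cascade size f(S)."""
    total = sum(cascade_size(probs, seeds, rng) for _ in range(runs))
    return total / runs


# One edge that fires half the time: the true expected size is 1.5.
toy = {"a": {"b": 0.5}}
print(estimate_spread(toy, {"a"}, 5000, random.Random(1)))
```

More runs tighten the estimate at the usual Monte Carlo rate, which is why the greedy guarantee picks up the extra ε′ term.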

More in the paper [KKT’03]

- More general model that captures both
- Experimental results that show performance of greedy
- For simulated cascades
- Choosing for the non-progressive case
- More realistic marketing scenarios
- Likelihood of initial activation depends on amount spent

Experimental Results (KKT)

- To test efficacy of greedy against other algorithms
- Co-authorship data
- Linear Threshold Model: multiplicity of edges as weights
- weight(v→w) = c_vw / d_v, weight(w→v) = c_wv / d_w
- Independent Cascade Model:
- Case 1: uniform probabilities p on each edge
- Case 2: edge from v to w has probability 1/d(w) of activating w.
- Compare with 3 other common heuristics
- (in)degree centrality, distance centrality, random nodes.
- Simulate the cascades a number of times…
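
The edge-multiplicity weights for the linear threshold experiments can be sketched like this. It assumes that c_vw is the number of joint papers of v and w and that d_v is v's degree counted with multiplicity; the input format and the name `lt_weights` are hypothetical:

```python
from collections import defaultdict


def lt_weights(paper_counts):
    """Turn co-authorship counts into linear threshold weights c_vw / d_v.

    paper_counts maps an unordered author pair (v, w) to the number of
    papers they wrote together. The result is keyed by v first, so
    result[v][w] = c_vw / d_v.
    """
    counts = defaultdict(dict)
    for (v, w), n in paper_counts.items():
        counts[v][w] = n
        counts[w][v] = n
    weights = {}
    for v, nbrs in counts.items():
        d_v = sum(nbrs.values())  # degree of v, counted with multiplicity
        weights[v] = {w: n / d_v for w, n in nbrs.items()}
    return weights


print(lt_weights({("a", "b"): 2, ("a", "c"): 1}))
```

With this normalization the weights attached to each node sum to 1, which satisfies the linear threshold model's requirement that they sum to at most 1.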

Facebook study on contagions (Sun, Rosenn, Marlow, Lento)

- Diffusion on FB
- Pages “liked” by users
- Diffusion happens through newsfeed
- What do the cascades look like?
- Distribution of sizes, connectedness
- Small seed?
- Any way to distinguish the seed nodes?
- Dataset
- sample set of pages and all associated fans
- seeding variant

Cascade structure

- Large connected clusters
- median page had 70% of fans in one component
- second largest comp. much smaller
- Multiple chains merge to form cluster

[Figure: cascade structure; panels labeled Bosnia, Slovenia, Croatia]

Cascade starters

- Large number of starters
- 46% of entire set of users; 17% of users in largest component
- belies the typical assumption that large cascades start from a small set of nodes; however, it does not show that they cannot
- maximum chain length can be large (~80)
- Tried to predict chain length by looking at different properties of the starter
- age, gender, FB friends/activity/age, feed exposure, popularity
- after controlling for popularity and friend-count, other variables do not have impact
- Main takeaway
- Contagions typically have lots of start-points
- Looking at contagion w/o the effect of external sources is inadequate

Structure of diffusions (Goel, Watts, Goldstein ’12)

- Study on multiple domains
- Yahoo! Kindness; Twitter; Zync; Secretary game…
- Main differences from previous studies:
- Multiple platforms
- Uses a very large number of diffusion events
- Interested in studying the structure of the “typical” diffusion
- Are the cascades large?
- Are the trees interesting?

So then..

- Nice theory problems associated with simple models, but…
- Empirical studies show that the simple viral model is not accurate
- Large cascades often need multiple starters
- Even when propagation happens, it is well approximated by a one-step process
- What lessons to take away?
- Incorporate external channels?
- Not trust the cascade model too much?
- Verdict is not clear yet: in some domains (e.g. RDS, computer virus infection) cascades do happen. Is there some missing characteristic?
- (financial incentives in RDS, not voluntary in virus spread?)
