Information Cascades

Cascades • Information/behavior spreading through a network • Useful for studying • Actual viral contagion • Technology diffusion, adoption of new products • Cascading failures (e.g. power grids) • Spread of information/rumor, viral marketing

How to model diffusion • Initial models • Assumed that everyone has global knowledge of what fraction has adopted • First mathematical models using only local information • [Schelling '70/'78, Granovetter '78] • Large body of subsequent work: • [Rogers '95, Valente '95, Wasserman/Faust '94] • Probabilistic models • Each neighbor that has the contagion passes it to the user with some probability • Ex: disease • Decision-based models • Each node typically has its own threshold and decides based on how many of its neighbors have the contagion • Ex: adopting a product; joining demonstrations

Decision-based model: two states A, B • Payoff for two linked nodes (x, y) • Both nodes play A => (a, a). Both play B => (b, b). Else (0, 0) • In a large network, each node plays this game with each of its neighbors • Assume an infinite graph • Initialization is some mix of A and B • When will a node x choose B over A? • Let q = a/(a+b) • If a fraction p of x's d(x) neighbors play B, then B pays b·p·d(x) and A pays a·(1 − p)·d(x), so B is better exactly when p > q • Equivalently, x chooses B when the number of neighbors playing B is > q·d(x)
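
A minimal Python sketch of the payoff comparison above, assuming one game per edge with payoffs summed over all neighbors; `threshold` and `best_response` are hypothetical helper names. It checks that B wins exactly when the number of B-neighbors exceeds q·d(x).

```python
from fractions import Fraction


def threshold(a, b):
    """q = a / (a + b), kept as an exact fraction."""
    return Fraction(a, a + b)


def best_response(a, b, n_a, n_b):
    """Strategy of a node with n_a neighbours playing A and n_b playing B.

    Payoff of A is a * n_a and payoff of B is b * n_b (one game per edge).
    Ties go to A here, matching the strict '>' on the slide.
    """
    return "B" if b * n_b > a * n_a else "A"


a, b, d = 3, 2, 5
q = threshold(a, b)  # 3/5, so q * d = 3
for n_b in range(d + 1):
    # B is chosen exactly when n_b > q * d, i.e. for n_b = 4 and n_b = 5
    print(n_b, best_response(a, b, d - n_b, n_b), n_b > q * d)
```

Using `Fraction` avoids floating-point noise when comparing n_b with q·d(x) at the boundary.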

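Definitions • Let $h_q(X)$ be the set of nodes that choose B when exactly the nodes in X play B • Starting with set S, continue the above process k times • $h_q^k(S)$ = the current set of nodes adopting B • Non-progressive • Nodes can switch back: $h_q^{k+1}(S) = h_q(h_q^k(S))$ • Progressive • Nodes cannot switch back: $h_q^{k+1}(S) = h_q^k(S) \cup h_q(h_q^k(S))$ • S is contagious if every node eventually adopts B • Contagion threshold of a graph • the maximum q for which some finite set S is contagious (an infinite cascade) • is a property of the graph only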

Simple example (G is the infinite line: node i is linked to i − 1 and i + 1) • q = ½ (break ties by adopting B) • Case 1: S = {0} adopts B • {0} -> {-1, 1} -> {0, 2, -2} -> … • Case 2: S = {-1, 0, 1} starts with B • {-1, 0, 1} -> ?? • For S = {0, 1}? • Contagion threshold for G is ½ • Why?
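
The line example can be replayed with a short simulation. This sketch assumes synchronous non-progressive updates and stands in for the infinite line with a cycle of N = 101 nodes (an assumption), which behaves identically to the line for the first few dozen steps; `step` and `run` are hypothetical names.

```python
N = 101  # cycle length standing in for the infinite line (assumption)


def to_signed(v, n=N):
    """Map cycle position 0..n-1 to a label in -n//2..n//2."""
    return v if v <= n // 2 else v - n


def step(active, q, n=N):
    """One synchronous non-progressive update; ties are broken toward B."""
    new_active = set()
    for v in range(n):
        neighbours = ((v - 1) % n, (v + 1) % n)
        frac_b = sum(u in active for u in neighbours) / len(neighbours)
        if frac_b >= q:
            new_active.add(v)
    return new_active


def run(seed, q, steps, n=N):
    """Return the sorted B-set (signed labels) after 0, 1, ..., steps updates."""
    active = {s % n for s in seed}
    history = [sorted(to_signed(v, n) for v in active)]
    for _ in range(steps):
        active = step(active, q, n)
        history.append(sorted(to_signed(v, n) for v in active))
    return history


print(run([0], 0.5, 3))         # Case 1
print(run([-1, 0, 1], 0.5, 3))  # Case 2
print(run([0, 1], 0.5, 3))
print(run([-1, 0, 1], 0.6, 3))  # q just above 1/2
```

With q = ½ every boundary node has one of its two neighbors playing B, so the B-region keeps spreading; with q = 0.6 a node needs both neighbors playing B and the cascade dies out.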

Contagion Threshold • Do the progressive and non-progressive models have different thresholds? • No [Mor00] • Can the threshold be arbitrary in [0,1]? • No: it is always ≤ ½ for any graph G [Mor00]

Progressive vs. non-progressive: sketch • Let S be contagious with respect to q in the progressive model • Build a set T that is contagious in the non-progressive model, using S1 = S ∪ N(S) (S plus its neighborhood) • T is “robust” enough that the non-progressive process proceeds to infinity • Prove by induction that from then on the two processes are identical

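Contagion Threshold • For any graph G, threshold ≤ ½ • Suppose not, and S is contagious for some q > ½ in G • For any set X, define potential(X) = outgoing-degree(X), the number of edges with exactly one endpoint in X • Claim: the potential of the active set strictly decreases at every step • A node switches only if strictly more than half of its neighbors are in the active set, so it removes more edges from the boundary than it adds • The potential is a non-negative integer, so it can decrease only finitely many times. Hence the process stops after finitely many steps, contradicting that S is contagious!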
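
As a sanity check of the potential argument (not a proof), this sketch runs the progressive process with q = 0.6 > ½ on a small hypothetical graph and records potential(X) after every step; `cut_size` and `progressive_potentials` are illustrative names.

```python
def cut_size(graph, x):
    """potential(X): number of edges with exactly one endpoint in X."""
    return sum(1 for u in x for w in graph[u] if w not in x)


def progressive_potentials(graph, seed, q):
    """Run the progressive process and record potential(active) after each step."""
    active = set(seed)
    potentials = [cut_size(graph, active)]
    while True:
        joiners = {
            v for v in graph
            if v not in active
            and sum(w in active for w in graph[v]) >= q * len(graph[v])
        }
        if not joiners:
            return potentials
        active |= joiners
        potentials.append(cut_size(graph, active))


# Small undirected graph (hypothetical): edges i--i+1 and i--i+2 on nodes 0..5
edges = [(i, i + 1) for i in range(5)] + [(i, i + 2) for i in range(4)]
graph = {v: [] for v in range(6)}
for u, w in edges:
    graph[u].append(w)
    graph[w].append(u)

print(progressive_potentials(graph, {0, 1, 2, 3}, q=0.6))  # [3, 2, 0]
```

Node 4 joins first (2 of its 3 neighbors are active), then node 5; each join removes more boundary edges than it creates.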

Viral marketing • Optimization formulation • Bounded marketing budget, how to spend it best • Want to utilize “network effects” • At least two different variants • Pay a small set of users to start a cascade. Maybe their friends will listen to them? • How to choose this set of users? • Offer incentives to whoever buys, if they recommend to their friends

Viral marketing: empirical Study I (Leskovec, Adamic, Huberman) • Recommendation-incentive variant • Online store data on various categories (DVDs, books, CDs, ...) • 16M recommendations • 4M users, 0.5M items • Users who buy items can recommend them to friends • Both users get a discount if the recommendation results in a purchase • Some data issues regarding the observed reward • Sometimes inferred

DVD recommendations • Most recommendations do not result in purchases (only 7% do) • Many star patterns and disconnected components • Giant component has 19% of nodes • Cascades form through chains of recommend-buy-recommend

Multiple recommendations • Later recommendations matter less (on average) • Recall the similar result on group affiliation in LiveJournal • We only see a user receive a recommendation and then purchase the product • We do not know: • How long it took to act • Whether there were other effects • When the user became aware of the friend's recommendations • Is the average representative of individual users?

Other observations • Success depends most on the type of product • Books: rate 3%; DVDs: 7%; anime DVDs: 29% • Sending more recommendations does result in more purchases (DVDs) • This suggests a strategy for a user who wants to maximize the reward incentive • However, repeated recommendations to one person decrease the success probability

Seeding variant: finding a good set of seeds • If we select a small set of nodes that are paid to spread information, how should we select them? • Heuristic methods: • Degree, random, some “centrality” notion? • Need a somewhat more stylized influence model [KKT’03] • Suppose f(S) is the set of nodes reached when the cascade starts with S

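Linear Threshold Model • A node v has a random threshold $\theta_v \sim U[0,1]$ • A node v is influenced by each neighbor w according to a weight $b_{v,w}$ such that $\sum_{w \text{ neighbor of } v} b_{v,w} \le 1$ • A node v becomes active when the total weight of its active neighbors is at least $\theta_v$: $\sum_{w \text{ active neighbor of } v} b_{v,w} \ge \theta_v$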
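
A minimal sketch of the linear threshold model, assuming a dict-of-dicts layout `in_weights[v][w] = b_{v,w}` (a hypothetical representation, not taken from [KKT’03]). Thresholds can be passed in explicitly to make a run reproducible.

```python
import random


def linear_threshold(in_weights, seeds, thresholds=None, rng=None):
    """Run the progressive linear threshold model until no node changes.

    in_weights[v] maps each neighbour w to the weight b_{v,w}; the weights
    into each node are assumed to sum to at most 1. If thresholds are not
    given, each theta_v is drawn uniformly from [0, 1).
    """
    rng = rng or random.Random()
    if thresholds is None:
        thresholds = {v: rng.random() for v in in_weights}
    active = set(seeds)
    changed = True
    while changed:
        changed = False
        for v, weights in in_weights.items():
            if v in active:
                continue
            influence = sum(b for w, b in weights.items() if w in active)
            if influence >= thresholds[v]:
                active.add(v)
                changed = True
    # Activation is monotone, so the final set does not depend on update order.
    return active


# Hypothetical three-node example: a influences b and c, b influences c.
W = {"a": {}, "b": {"a": 0.4}, "c": {"a": 0.3, "b": 0.5}}
print(linear_threshold(W, {"a"}, thresholds={"a": 0.9, "b": 0.35, "c": 0.75}))
print(linear_threshold(W, {"a"}, rng=random.Random(1)))
```

With θ_b = 0.35, b activates from a alone (0.4 ≥ 0.35), and then c sees 0.3 + 0.5 ≥ 0.75; raising θ_b to 0.45 stops the cascade at a.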

Independent Cascade Model • When node v becomes active, it has a single chance of activating each currently inactive neighbor w • The activation attempt succeeds with probability $p_{v,w}$
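
A matching sketch of one independent cascade run, with `out_probs[v][w] = p_{v,w}` as a hypothetical representation. Processing newly active nodes in breadth-first order gives each of them exactly one activation attempt per still-inactive neighbor.

```python
import random
from collections import deque


def independent_cascade(out_probs, seeds, rng=None):
    """One run of the independent cascade model.

    out_probs[v] maps each out-neighbour w to p_{v,w}. Every node that becomes
    active gets exactly one attempt on each neighbour that is still inactive.
    """
    rng = rng or random.Random()
    active = set(seeds)
    frontier = deque(active)  # newly active nodes whose attempts are pending
    while frontier:
        v = frontier.popleft()
        for w, p in out_probs.get(v, {}).items():
            if w not in active and rng.random() < p:
                active.add(w)
                frontier.append(w)
    return active


# Hypothetical graph with the activation probabilities written on the edges
G = {"a": {"b": 0.6, "c": 0.2}, "b": {"d": 0.5}, "c": {"d": 0.3}}
print(independent_cascade(G, {"a"}, random.Random(7)))
```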

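Submodularity • f is submodular if $f(S + v) - f(S) \ge f(T + v) - f(T)$ for all $S \subseteq T$ and $v \notin T$, where $S + v$ denotes $S \cup \{v\}$ (diminishing returns) • Example: $C_1, C_2, \dots, C_n$ are sets • $f(S) = \left|\bigcup_{i \in S} C_i\right|$ is submodular • Bad news: maximizing f(S) over sets of size k is NP-hard, even when f is submodular • Note: f(S) is actually the expected number of nodes reached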
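
The set-union example can be checked by brute force. This sketch tests the diminishing-returns inequality over all pairs S ⊆ T for hypothetical sets C_1..C_4, and shows that a function such as |S|² fails it; the helper names are illustrative.

```python
from itertools import combinations


def coverage(sets, chosen):
    """f(S) = size of the union of the sets C_i with i in S."""
    covered = set()
    for i in chosen:
        covered |= sets[i]
    return len(covered)


def is_submodular(f, ground):
    """Brute-force check of f(S + v) - f(S) >= f(T + v) - f(T) for all S ⊆ T, v ∉ T."""
    ground = list(ground)
    subsets = [frozenset(c) for r in range(len(ground) + 1)
               for c in combinations(ground, r)]
    for t in subsets:
        for s in subsets:
            if not s <= t:
                continue
            for v in ground:
                if v not in t and f(s | {v}) - f(s) < f(t | {v}) - f(t):
                    return False
    return True


C = [{1, 2, 3}, {3, 4}, {4, 5, 6}, {1, 6}]  # hypothetical sets C_1..C_4
print(is_submodular(lambda S: coverage(C, S), range(len(C))))  # True
print(is_submodular(lambda S: len(S) ** 2, range(3)))           # False
```

Exhaustive checking is exponential and only meant for tiny examples; it says nothing about the hardness of maximization.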

Good News • When f is also monotone, we can use the greedy algorithm! • Start with an empty set S • For k iterations: add the node v to S that maximizes f(S + v) − f(S) • How good (or bad) is it? • Theorem: the greedy algorithm is a (1 − 1/e)-approximation • The resulting set S activates at least (1 − 1/e) > 63% of the number of nodes that any size-k set could activate
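
A sketch of the greedy algorithm on a small hypothetical coverage instance, compared with the exhaustive optimum. The instance is chosen so that greedy is not optimal, yet it still stays above the (1 − 1/e) bound.

```python
from itertools import combinations
from math import e


def greedy(f, ground, k):
    """Add, k times, the element with the largest marginal gain f(S + v) - f(S)."""
    chosen = set()
    for _ in range(k):
        best = max((v for v in ground if v not in chosen),
                   key=lambda v: f(chosen | {v}) - f(chosen))
        chosen.add(best)
    return chosen


def brute_force(f, ground, k):
    """Best size-k set by exhaustive search (only feasible for tiny inputs)."""
    return max((set(c) for c in combinations(ground, k)), key=f)


# Hypothetical coverage instance where greedy is fooled by the big middle set
C = [{1, 2, 3}, {4, 5, 6}, {1, 2, 4, 5}]


def f(S):
    """Coverage: number of elements in the union of C_i for i in S."""
    return len(set().union(*(C[i] for i in S)))


g = greedy(f, range(len(C)), 2)
opt = brute_force(f, range(len(C)), 2)
print(f(g), f(opt), f(g) >= (1 - 1 / e) * f(opt))  # 5 6 True
```

Greedy takes the large set {1, 2, 4, 5} first and ends with 5 covered elements, while the best pair covers 6; 5 ≥ 0.632 · 6.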

Key 1: Prove submodularity

Submodularity for Independent Cascade • Coins for edges are flipped during activation attempts • Can pre-flip all coins and reveal results immediately • Active nodes in the end are reachable via green paths from initially targeted nodes • Study reachability in green graphs • [Figure: example network with an activation probability on each edge]

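Submodularity, Fixed Graph • Fix a “green graph” G; g(S) is the set of nodes reachable from S in G • Submodularity: • $|g(T + v) \setminus g(T)| \le |g(S + v) \setminus g(S)|$ when $S \subseteq T$ • $g(S + v) \setminus g(S)$: nodes reachable from S + v, but not from S • Why: a node reachable from T + v but not from T is reachable from v and not from S (since $S \subseteq T$), so it is reachable from S + v but not from S • Hence $g(T + v) \setminus g(T) \subseteq g(S + v) \setminus g(S)$ when $S \subseteq T$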
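
The reachability argument can be verified exhaustively on a tiny graph. This sketch assumes a hypothetical green graph stored as adjacency lists and checks the set inclusion behind the inequality for every S ⊆ T and every node v.

```python
from itertools import combinations


def reachable(graph, sources):
    """g(S): nodes reachable from S in a fixed directed 'green' graph."""
    seen, stack = set(sources), list(sources)
    while stack:
        v = stack.pop()
        for w in graph.get(v, ()):
            if w not in seen:
                seen.add(w)
                stack.append(w)
    return seen


def marginals_nest(graph):
    """Check g(T + v) - g(T) ⊆ g(S + v) - g(S) for every S ⊆ T and node v."""
    nodes = sorted(set(graph) | {w for ws in graph.values() for w in ws})
    subsets = [set(c) for r in range(len(nodes) + 1)
               for c in combinations(nodes, r)]
    for t in subsets:
        for s in subsets:
            if not s <= t:
                continue
            for v in nodes:
                gain_t = reachable(graph, t | {v}) - reachable(graph, t)
                gain_s = reachable(graph, s | {v}) - reachable(graph, s)
                if not gain_t <= gain_s:
                    return False
    return True


green = {"a": ["b"], "b": ["c"], "d": ["c", "e"]}  # hypothetical green graph
print(reachable(green, {"a"}))  # the nodes a, b, c
print(marginals_nest(green))    # True
```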

Submodularity of the Function • Fact: a non-negative linear combination of submodular functions is submodular • $g_G(S)$: nodes reachable from S in G • $f(S) = \sum_G \Pr[G] \cdot |g_G(S)|$, summing over all green graphs G • Each $|g_G(S)|$ is submodular (previous slide) • Probabilities are non-negative, so f is submodular
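
For a handful of edges, f(S) can be computed literally as the weighted sum over green graphs of the independent cascade model. This is a sketch with hypothetical names and an example path a → b → c with p = 0.5 on each edge; the cost is exponential in the number of edges.

```python
from itertools import product


def exact_spread(edges, seeds):
    """f(S) = sum over green graphs G of Pr[G] * |g_G(S)|.

    edges is a list of (u, w, p_uw) triples for the independent cascade model;
    enumerating all 2^m green graphs is only feasible for tiny examples.
    """
    total = 0.0
    for outcome in product((False, True), repeat=len(edges)):
        prob, green = 1.0, {}
        for (u, w, p), live in zip(edges, outcome):
            prob *= p if live else 1 - p
            if live:
                green.setdefault(u, []).append(w)
        # |g_G(S)|: nodes reachable from the seeds in this green graph
        seen, stack = set(seeds), list(seeds)
        while stack:
            v = stack.pop()
            for w in green.get(v, []):
                if w not in seen:
                    seen.add(w)
                    stack.append(w)
        total += prob * len(seen)
    return total


E = [("a", "b", 0.5), ("b", "c", 0.5)]  # hypothetical path a -> b -> c
print(exact_spread(E, {"a"}))  # 1.75 = 1 + 0.5 + 0.25
```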

Submodularity for Linear Thresholds • Use a similar “green graph” idea • Once a graph is fixed, the “reachability” argument is identical • How do we fix a green graph now? • Each node v picks at most one incoming edge: the edge from w with probability $b_{v,w}$, and no edge with probability $1 - \sum_w b_{v,w}$ • Reachability in this random graph gives the same distribution of final active sets as the linear threshold model (trickier proof)
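
A sketch of the live-edge construction for the linear threshold model, assuming the hypothetical layout `in_weights[v][w] = b_{v,w}`. In the example below, the linear threshold model activates u with probability 0.3 · 0.6 = 0.18 (v needs θ_v ≤ 0.3, then u needs θ_u ≤ 0.6), and sampling the random green graph should give a frequency close to that.

```python
import random


def sample_live_edges(in_weights, rng):
    """Each node v keeps at most one incoming edge: the edge from w with
    probability b_{v,w}, and no edge with probability 1 - sum_w b_{v,w}."""
    live = {}  # live[w] = nodes v whose chosen incoming edge comes from w
    for v, weights in in_weights.items():
        r, cumulative = rng.random(), 0.0
        for w, b in weights.items():
            cumulative += b
            if r < cumulative:
                live.setdefault(w, []).append(v)
                break
    return live


def reachable(live, seeds):
    """Nodes reachable from the seeds along live edges."""
    seen, stack = set(seeds), list(seeds)
    while stack:
        v = stack.pop()
        for w in live.get(v, []):
            if w not in seen:
                seen.add(w)
                stack.append(w)
    return seen


# Hypothetical example: a and b point into v with weights 0.3 and 0.5,
# v points into u with weight 0.6; seed set {a}.
W = {"a": {}, "b": {}, "v": {"a": 0.3, "b": 0.5}, "u": {"v": 0.6}}
rng = random.Random(42)
runs = 20_000
hits = sum("u" in reachable(sample_live_edges(W, rng), {"a"}) for _ in range(runs))
print(hits / runs)  # should be close to 0.3 * 0.6 = 0.18
```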

Evaluating f(S) • How to evaluate f(S)? • Still an open question how to compute it efficiently • But: very good estimates by simulation • Repeat the diffusion process often enough (polynomial in n and 1/ε) • Achieve a (1 ± ε)-approximation to f(S) • A generalization of the Nemhauser/Wolsey proof shows that the greedy algorithm is then a (1 − 1/e − ε′)-approximation
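
A sketch of the simulation-based estimate for the independent cascade model, using the pre-flipped-coins view: sample a green graph, count the nodes reachable from S, and average. The fixed number of runs is an assumption; the guarantee on the slide needs a number polynomial in n and 1/ε. For the hypothetical path a → b → c with p = 0.5 per edge, the exact value is f({a}) = 1 + 0.5 + 0.25 = 1.75.

```python
import random


def sample_green_graph(edges, rng):
    """Pre-flip one coin per edge: keep (u, w) with probability p_uw."""
    green = {}
    for u, w, p in edges:
        if rng.random() < p:
            green.setdefault(u, []).append(w)
    return green


def reach_count(green, seeds):
    """|g_G(S)|: number of nodes reachable from the seeds."""
    seen, stack = set(seeds), list(seeds)
    while stack:
        v = stack.pop()
        for w in green.get(v, []):
            if w not in seen:
                seen.add(w)
                stack.append(w)
    return len(seen)


def estimate_spread(edges, seeds, runs, rng):
    """Average |g_G(S)| over `runs` sampled green graphs G."""
    return sum(reach_count(sample_green_graph(edges, rng), seeds)
               for _ in range(runs)) / runs


E = [("a", "b", 0.5), ("b", "c", 0.5)]  # hypothetical path; exact f({a}) = 1.75
print(estimate_spread(E, {"a"}, runs=20_000, rng=random.Random(0)))
```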

More in the paper [KKT’03] • A more general model that captures both • Experimental results that show the performance of greedy • For simulated cascades • Choosing seeds for the non-progressive case • More realistic marketing scenarios • Likelihood of initial activation depends on the amount spent

Experimental Results (KKT) • To test the efficacy of greedy against other algorithms • Co-authorship data • Linear Threshold Model: multiplicity of edges as weights • weight(v→w) = $C_{vw}/d_w$, weight(w→v) = $C_{wv}/d_v$, where $C_{vw}$ is the number of parallel edges and $d$ the degree, so the weights into each node sum to 1 • Independent Cascade Model: • Case 1: uniform probability p on each edge • Case 2: the edge from v to w has probability 1/d(w) of activating w • Compare with three common heuristics • (In)degree centrality, distance centrality, random nodes • Simulate the cascades a number of times…
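
A sketch of linear threshold weights built from co-authorship multiplicities, normalizing each edge by the degree of the node it points into so that the weights into every node sum to 1, as the linear threshold model requires. The pair list is hypothetical data, not the dataset from [KKT’03].

```python
from collections import Counter, defaultdict


def lt_weights(coauthor_pairs):
    """Linear-threshold weights from co-authorship multiplicities.

    C[u, v] counts joint papers (parallel edges) and d_v is v's degree counted
    with multiplicity. The edge u -> v gets weight C[u, v] / d_v, so the
    weights into every node sum to exactly 1. Self-pairs are assumed absent.
    """
    mult = Counter(frozenset(pair) for pair in coauthor_pairs)
    degree = Counter()
    for pair, c in mult.items():
        for x in pair:
            degree[x] += c
    in_weights = defaultdict(dict)  # in_weights[v][u] = weight of edge u -> v
    for pair, c in mult.items():
        u, v = tuple(pair)
        in_weights[v][u] = c / degree[v]
        in_weights[u][v] = c / degree[u]
    return dict(in_weights)


pairs = [("A", "B"), ("A", "B"), ("A", "C"), ("B", "C")]  # hypothetical data
for v, ws in sorted(lt_weights(pairs).items()):
    print(v, ws, sum(ws.values()))
```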

Facebook study on contagions (Sun, Rosenn, Marlow, Lento) • Diffusion on FB • Pages “liked” by users • Diffusion happens through the News Feed • What do the cascades look like? • Distribution of sizes, connectedness • Small seed? • Any way to distinguish the seed nodes? • Dataset • Sample set of pages and all associated fans • Seeding variant

Cascade structure • Large connected clusters • Median page had 70% of fans in one component • Second-largest component much smaller • Multiple chains merge to form a cluster • [Figure: example cascade clusters, labeled Bosnia, Slovenia, Croatia]

Cascade starters • Large number of starters • 46% of the entire set of users; 17% of users in the largest component • This contradicts the typical assumption that large cascades start from a small set of nodes, although it does not show that they cannot • Maximum chain length can be large (~80) • Tried to predict chain length from properties of the starter • Age, gender, FB friends/activity/age, feed exposure, popularity • After controlling for popularity and friend count, the other variables have no impact • Main takeaway • Contagions typically have many start points • Studying a contagion without the effect of external sources is inadequate

Structure of diffusions (Goel, Watts, Goldstein ’12) • Study on multiple domains • Yahoo! Kindness; Twitter; Zync; Secretary game… • Main differences from previous studies: • Multiple platforms • Uses a very large number of diffusion events • Interested in the structure of the “typical” diffusion • Are the cascades large? • Are the trees interesting?

Structure of diffusions 1

Structure of diffusions 2

So then... • Nice theory problems associated with simple models, but… • Empirical studies show the simple viral model is not accurate • Large cascades often need multiple starters • Even when propagation happens, it is well approximated by a one-step process • What lessons to take away? • Incorporate external channels? • Not trust the cascade model too much? • The verdict is not clear yet: in some domains (e.g. RDS, computer virus infection) cascades do happen. Is there some missing characteristic? • (Financial incentives in RDS; spread is not voluntary for computer viruses?)
