This lecture discusses reverse sampling in models of influence diffusion, focusing on the greedy algorithm and maximizing influence spread. It also explores the challenges and computational complexity of the problem.
Lecture 2-7 Reverse Sampling Ding-Zhu Du University of Texas at Dallas
Outline • Greedy • Reverse Sampling
Models of Influence Diffusion • Two basic classes of probabilistic diffusion models: • threshold and cascade • General operational view: • A social network is represented as a directed graph, with each person (customer) as a node. • Nodes start either active or inactive. • An active node may trigger activation of neighboring nodes. • Monotonicity assumption: active nodes never deactivate.
Influence Maximization Problem • Influence spread of node set S: σ(S) • expected number of active nodes at the end of the diffusion process, if set S is the initial active set. • Problem Definition (by Kempe et al., 2003): (Influence Maximization). Given a directed and edge-weighted social graph G = (V, E, p), a diffusion model m, and an integer k ≤ |V|, find a set S ⊆ V, |S| = k, such that the expected influence spread σm(S) is maximized.
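To make the definition of σ(S) concrete, here is a minimal Monte-Carlo sketch under the independent cascade model. The adjacency format (a dict from node to a list of `(neighbor, probability)` pairs) and the names `simulate_ic` and `estimate_spread` are assumptions made for illustration; they do not come from the lecture.

```python
import random
from collections import deque


def simulate_ic(graph, seeds, rng):
    """Run one independent-cascade diffusion; return the number of active nodes.

    graph: dict mapping node -> list of (out_neighbor, probability) pairs
           (an assumed input format).
    """
    active = set(seeds)
    frontier = deque(active)
    while frontier:
        u = frontier.popleft()
        for v, p in graph.get(u, []):
            # A newly active node gets exactly one chance per inactive out-neighbor.
            if v not in active and rng.random() < p:
                active.add(v)
                frontier.append(v)
    return len(active)


def estimate_spread(graph, seeds, trials=1000, seed=0):
    """Estimate sigma(S) as the average final active-set size over `trials` runs."""
    rng = random.Random(seed)
    total = sum(simulate_ic(graph, seeds, rng) for _ in range(trials))
    return total / trials
```

With all probabilities equal to 1 the estimate is just the number of nodes reachable from S; with all probabilities equal to 0 it is |S|.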
Known Results • Bad news: NP-hard optimization problem for both the independent cascade (IC) and linear threshold (LT) models. • Good news: • σm(S) is monotone and submodular. • We can use the greedy algorithm! • Theorem: The set S returned by the greedy algorithm activates, in expectation, at least a (1-1/e) fraction (>63%) of the number of nodes that the best size-k set could activate.
Disadvantage • Lack of efficiency. • Computing σm(S) is #P-hard under both IC and LT models. • Selecting a new vertex u with the largest marginal gain σm(S+u) - σm(S) therefore relies on estimates, which can only be approximated by Monte-Carlo simulation (10,000 trials). • Assume a weighted social graph as input. • How to learn influence probabilities from history?
What's the running time? • Let r be the number of samples used to estimate σm(S+u) - σm(S), with n = |V| and m = |E|. • The greedy algorithm runs k iterations. • Each iteration requires estimating the expected spread of O(n) node sets S+u. • Each estimate of expected spread takes measurements on r sampled graphs, and each measurement needs O(m) time. • Total running time: O(kmnr).
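The loop structure behind the O(kmnr) bound can be sketched as follows. The spread estimator is passed in as a callable `spread` (a hypothetical parameter, standing in for the r Monte-Carlo measurements), so the sketch shows only the k iterations over O(n) candidates.

```python
def greedy_seed_selection(nodes, k, spread):
    """Pick up to k seeds, each time adding the node with the largest marginal gain.

    spread: callable taking a list of nodes and returning an estimate of
            sigma(S); assumed here, e.g. an average over r simulations.
    """
    seeds = []
    current = spread(seeds)
    for _ in range(k):                      # k iterations
        best_node, best_gain = None, float("-inf")
        for u in nodes:                     # O(n) candidate sets S + u
            if u in seeds:
                continue
            gain = spread(seeds + [u]) - current
            if gain > best_gain:
                best_node, best_gain = u, gain
        if best_node is None:               # no candidates left
            break
        seeds.append(best_node)
        current += best_gain
    return seeds
```

Each call to `spread` costs O(mr) when it averages r simulated diffusions, which gives the total of O(kmnr).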
Comments • Time is wasted on sampling, because every randomly generated graph is used only once, for a single evaluation of the objective function.
Outline • Greedy • Reverse Sampling • Analysis: part 1 - sampling, part 2 - submodular maximization, part 3 - parameter estimation
Smart Way • Step 1. Randomly generate θ RR (reverse reachable) sets. • Step 2. Find k nodes that hit the maximum number of RR sets.
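The reason hitting many RR sets is a proxy for large influence is a standard identity from the reverse-sampling literature, restated here as background rather than taken from the slide text. It assumes each RR set R is generated from a root chosen uniformly at random among the n nodes.

```latex
\sigma(S) \;=\; n \cdot \Pr\bigl[\, S \cap R \neq \emptyset \,\bigr],
\qquad
\hat{\sigma}(S) \;=\; \frac{n}{\theta} \sum_{i=1}^{\theta}
  \mathbf{1}\bigl[\, S \cap R_i \neq \emptyset \,\bigr].
```

The same θ sampled sets R_1, ..., R_θ are reused for every candidate seed set S, in contrast to the greedy approach, where each sampled graph serves a single evaluation.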
Step 2. Max Coverage Given a collection C of subsets of a set E, find a subset S of E, with |S| ≤ k, that maximizes the number of subsets in C hit (covered) by S. Here the subsets in C are the RR sets, and S is the seed set.
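A minimal sketch of the usual greedy heuristic for this max-coverage step, assuming the RR sets are given as plain Python collections of node ids. The function name `greedy_max_coverage` and the return format are illustrative choices.

```python
def greedy_max_coverage(rr_sets, k):
    """Greedily choose at most k nodes that hit as many RR sets as possible.

    rr_sets: iterable of collections of nodes (assumed format).
    Returns (seeds, number_of_rr_sets_hit).
    """
    # Inverted index: node -> ids of the RR sets that contain it.
    member_of = {}
    for i, rr in enumerate(rr_sets):
        for v in rr:
            member_of.setdefault(v, set()).add(i)

    covered = set()   # ids of RR sets already hit by the chosen seeds
    seeds = []
    for _ in range(k):
        best, best_gain = None, 0
        for v, ids in member_of.items():
            if v in seeds:
                continue
            gain = len(ids - covered)       # newly hit RR sets if v is added
            if gain > best_gain:
                best, best_gain = v, gain
        if best is None:                    # nothing left to gain: stop early
            break
        seeds.append(best)
        covered |= member_of[best]
    return seeds, len(covered)
```

Because the loop stops once no node adds coverage, the result may contain fewer than k nodes.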
Performance Ratio Theorem (Nemhauser et al. 1978)
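The slide body is not preserved in this transcript. For reference, the result of Nemhauser et al. is commonly stated as below. The notation (f, S_i, S^*) is chosen here for illustration and may differ from the lecture's.

```latex
% f : 2^E \to \mathbb{R}_{\ge 0} is monotone and submodular,
% S_i is the greedy solution after i steps, S^* is an optimal set with |S^*| \le k.
f(S^*) - f(S_{i+1}) \;\le\; \Bigl(1 - \tfrac{1}{k}\Bigr)\bigl(f(S^*) - f(S_i)\bigr)
\quad\Longrightarrow\quad
f(S_k) \;\ge\; \Bigl(1 - \bigl(1 - \tfrac{1}{k}\bigr)^{k}\Bigr) f(S^*)
\;\ge\; \Bigl(1 - \tfrac{1}{e}\Bigr) f(S^*).
```

The number of RR sets hit by S is a coverage function, which is monotone and submodular, so the bound applies to Step 2.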
Theorem Proof
Outline • Greedy • Reverse Sampling • Analysis: part 1 - sampling, part 2 - submodular maximization, part 3 - parameter estimation
Breadth-first search (BFS) • To generate an RR set, a randomized BFS is employed.
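A minimal sketch of such a randomized BFS, assuming the independent cascade model and the same assumed adjacency format of `(neighbor, probability)` pairs. The search starts at a random root and follows edges backwards, keeping each incoming edge with its probability. The helper names are hypothetical.

```python
import random
from collections import deque


def build_reverse_graph(graph):
    """Turn out-edge lists u -> [(v, p)] into in-edge lists v -> [(u, p)]."""
    rev = {u: [] for u in graph}
    for u, edges in graph.items():
        for v, p in edges:
            rev.setdefault(v, []).append((u, p))
    return rev


def generate_rr_set(rev_graph, nodes, rng):
    """Generate one RR set: nodes that reach a random root in a sampled graph."""
    root = rng.choice(nodes)
    rr = {root}
    queue = deque([root])
    while queue:
        v = queue.popleft()
        for u, p in rev_graph.get(v, []):
            # Each in-edge (u, v) is examined once, when v is dequeued,
            # and is kept ("live") with probability p.
            if u not in rr and rng.random() < p:
                rr.add(u)
                queue.append(u)
    return rr
```

Calling `generate_rr_set` θ times with a shared `random.Random` instance yields the collection used in Step 1.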
A New Springer Journal: Computational Social Networks • Editor-in-Chief: Ding-Zhu Du, My T. Thai • Welcome to Submit Papers
Markov's inequality Proof.
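The statement and proof are not preserved in this transcript. The standard form of the inequality, with its usual one-line proof, is given below for a nonnegative random variable X and a constant a > 0.

```latex
\Pr[X \ge a] \;\le\; \frac{\mathbb{E}[X]}{a},
\qquad\text{since}\qquad
\mathbb{E}[X] \;\ge\; \mathbb{E}\bigl[X \cdot \mathbf{1}[X \ge a]\bigr]
\;\ge\; a \cdot \Pr[X \ge a].
```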