Modeling Information Diffusion in Networks with Unobserved Links




### Modeling Information Diffusion in Networks with Unobserved Links

Quang Duong Michael P. Wellman Satinder Singh

Computer Science and Engineering

University of Michigan

### Networks with Unobserved Links

• Links help to model how information diffuses from one node to another
• Real-world agents/nodes have connections unobserved by third parties

### Problem Overview

Given: a network (with missing links) and snapshots of the network states over time.

Objective: model information diffusion on the network.

We examine two different approaches:

• Learning the underlying network, upon which a diffusion model is built (similar to the approach of some previous work)
• Building a flexible model without learning the missing links

### Problem Overview (cont.)

Formalism

• A node/agent is in state $s^t = 1$ if infected at time t, and -1 otherwise (infection persists)
• A diffusion instance/trace s records snapshots of the network’s states over time
• Underlying network G*
• Input network G (G* with missing edges)
• $N_i$ is the neighborhood of i in G (including i itself)

• The probability of infection is proportional to the number of infected neighbors
• The model’s parameters determine: (a) the diffusion rate and (b) the spontaneous infection rate.
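
The last two bullets can be made concrete with a small simulation. The sketch below is one possible reading, not the authors' exact model: it assumes the per-step infection probability is linear in the number of infected neighbors and capped at 1. The names `alpha` (diffusion rate), `beta` (spontaneous infection rate), `infection_probability`, and `simulate_trace` are hypothetical.

```python
import random


def infection_probability(num_infected_neighbors, alpha, beta):
    """Chance that an uninfected node becomes infected in one time step.

    Assumed form (not given in the slides): beta + alpha * k, capped at 1,
    where alpha is the diffusion rate and beta the spontaneous infection rate.
    """
    return min(1.0, beta + alpha * num_infected_neighbors)


def simulate_trace(neighbors, alpha, beta, steps, rng):
    """Simulate one diffusion trace.

    neighbors maps each node i to its neighborhood N_i (which may include i).
    Returns a list of snapshots, each mapping node -> +1 (infected) or -1.
    """
    state = {node: -1 for node in neighbors}
    trace = [dict(state)]
    for _ in range(steps):
        nxt = dict(state)
        for node, nbrs in neighbors.items():
            if state[node] == 1:
                continue  # infection persists
            k = sum(1 for j in nbrs if j != node and state[j] == 1)
            if rng.random() < infection_probability(k, alpha, beta):
                nxt[node] = 1
        state = nxt
        trace.append(dict(state))
    return trace
```

For example, `simulate_trace({1: [1, 2], 2: [1, 2, 3], 3: [2, 3]}, alpha=0.2, beta=0.05, steps=5, rng=random.Random(0))` returns six snapshots: the initial all-uninfected state plus one per step.
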

### Problem Summary

Input (training):

1. Network G
2. A set of diffusion traces s

Objective function: log-likelihood of diffusion traces L(s), which captures the diffusion dynamics.

Approaches:

1. Structure learning approach: learn a network G’, then learn parameters for a cascade model built on G’.
2. Graphical model approach: learn parameters for a graphical multiagent model built on G.

Evaluation: on testing sets of diffusion traces.
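
As an illustration of the objective L(s), the sketch below computes the log-likelihood of a set of traces under an assumed linear cascade model (infection probability `beta + alpha * k`, capped at 1). The functional form, the parameter names, and the choice to treat the first snapshot of each trace as given are assumptions made for this example; the slides do not specify them.

```python
import math


def step_log_likelihood(prev, curr, neighbors, alpha, beta):
    """Log-probability of snapshot curr given snapshot prev."""
    total = 0.0
    for node, nbrs in neighbors.items():
        if prev[node] == 1:
            continue  # infection persists, so this transition is certain
        k = sum(1 for j in nbrs if j != node and prev[j] == 1)
        p = min(1.0, beta + alpha * k)
        p = min(max(p, 1e-12), 1.0 - 1e-12)  # clamp to keep the log finite
        total += math.log(p if curr[node] == 1 else 1.0 - p)
    return total


def trace_log_likelihood(trace, neighbors, alpha, beta):
    """Sum of step log-likelihoods over consecutive snapshots of one trace."""
    return sum(
        step_log_likelihood(prev, curr, neighbors, alpha, beta)
        for prev, curr in zip(trace, trace[1:])
    )


def log_likelihood(traces, neighbors, alpha, beta):
    """L(s): total log-likelihood of a collection of diffusion traces."""
    return sum(trace_log_likelihood(t, neighbors, alpha, beta) for t in traces)
```

Parameters could then be fitted by searching for the `alpha` and `beta` that maximize `log_likelihood` on the training traces, and compared across models on held-out traces.
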

### MaxInf algorithm (maxC)

• Assumption: nodes can be infected by multiple neighbors, as in the cascade model
• Objective function: log-likelihood of traces L(s)
• Outline:
  • Add the edge that increases the objective function the most, relearning the model parameters after each addition
  • Repeat until the objective function starts to decrease

Related work: NetInf [Gomez-Rodriguez et al. ’10].
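
The greedy outline can be sketched generically. This is an illustration of greedy edge addition, not the authors' implementation: `score` is a hypothetical callback that is assumed to relearn the model parameters for a candidate edge set and return the resulting objective value, and the toy `toy_score` below only stands in for a real likelihood.

```python
def greedy_edge_addition(initial_edges, candidate_edges, score):
    """Greedily add edges while the objective keeps improving.

    score(edges) is assumed to refit model parameters on the given edge set
    and return the objective (for example, the training log-likelihood).
    """
    edges = set(initial_edges)
    remaining = set(candidate_edges) - edges
    best = score(edges)
    while remaining:
        # Pick the single edge whose addition yields the highest objective.
        top_edge = max(remaining, key=lambda e: score(edges | {e}))
        top_score = score(edges | {top_edge})
        if top_score <= best:
            break  # the objective no longer increases
        edges.add(top_edge)
        remaining.remove(top_edge)
        best = top_score
    return edges, best


# Toy usage: the "objective" rewards matching a hidden edge set.
truth = {(1, 2), (2, 3), (1, 3)}
candidates = {(a, b) for a in range(1, 5) for b in range(a + 1, 5)}
toy_score = lambda edges: -len(set(edges) ^ truth)
learned, best = greedy_edge_addition({(1, 2)}, candidates, toy_score)
```

Each iteration evaluates every remaining candidate, so the cost grows with the number of candidate edges times the cost of one parameter refit.
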

### Approach 2: History-Dependent Graphical Multiagent Model

• hGMM [Duong, Wellman, Singh, and Vorobeychik AAMAS’10]
• Directed edges from the nodes in $N_i^d$ to i: how neighbors’ past states affect i’s present state.
• Undirected edges define $N_i^u$: correlations/interdependencies among nodes at the same time t.

(*) Cascade and many other models assume conditional independence given history ($N_i^u$ contains i itself only)

(**) For simplicity, we assume $N_i = N_i^d = N_i^u$

### Approach 2: hGMM (cont.)

Each neighborhood is associated with a potential function $\pi_i$ that represents the unnormalized likelihood of the joint states $s_{N_i}$.

[Equation: the joint probability distribution of the system’s states at time t, conditioned on the abstracted history, expressed through the potential of each neighborhood’s joint states at t, conditioned on the neighborhood-relevant abstracted history]
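
The annotations on this slide (joint probability of the system's states at time t, abstracted history, potential of each neighborhood's joint states, neighborhood-relevant abstracted history) are consistent with a product-of-potentials factorization. A reconstruction under that assumption follows; the symbols $H^t$ (abstracted history), $H^t_{N_i}$ (neighborhood-relevant abstracted history), and $Z$ (normalizing constant, which depends on $H^t$) are notation introduced here and do not appear in the transcript.

```latex
\Pr\left(s^t \mid H^t\right)
  \;=\; \frac{1}{Z} \prod_{i} \pi_i\!\left(s^t_{N_i} \mid H^t_{N_i}\right)
```

Under this reading, $Z$ sums the product of potentials over all joint state assignments, which is what turns the unnormalized neighborhood likelihoods into a probability distribution.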

### Approach 2: hGMM (cont.)

• hGMMs allow reasoning about state correlations between neighbors who appear disconnected in the input graphical structure
• Example: hGMMs could use the potential function of node 2 to express correlations between nodes 1 and 3 to compensate for the missing edge (1, 3).

[Figure: example graphs on nodes 1, 2, 3, and 4]
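
The compensation effect in the example can be checked numerically on a toy model. The sketch below is a simplified illustration that ignores history and node 4: the assumed graph has edges (1, 2) and (2, 3) only, the potentials of nodes 1 and 3 are uniform, and the potential of node 2 (whose neighborhood is {1, 2, 3}) rewards agreement between nodes 1 and 3 with a hypothetical weight `w`.

```python
import itertools
import math


def agreement_probability(w):
    """Return P(s1 == s3) when edge (1, 3) is absent from the graph.

    Only node 2's potential sees both s1 and s3, so any correlation between
    nodes 1 and 3 must come from it.
    """
    def pi_1(s1, s2):          # neighborhood N_1 = {1, 2}
        return 1.0

    def pi_2(s1, s2, s3):      # neighborhood N_2 = {1, 2, 3}
        return math.exp(w) if s1 == s3 else 1.0

    def pi_3(s2, s3):          # neighborhood N_3 = {2, 3}
        return 1.0

    agree = 0.0
    total = 0.0
    for s1, s2, s3 in itertools.product((-1, 1), repeat=3):
        weight = pi_1(s1, s2) * pi_2(s1, s2, s3) * pi_3(s2, s3)
        total += weight
        if s1 == s3:
            agree += weight
    return agree / total
```

With `w = 0` the states of nodes 1 and 3 are independent and agree half of the time; any `w > 0` pushes the agreement probability above one half even though the two nodes share no edge.
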

### Approach 2: hGMM (cont.)

A. Tabular hGMM (taG): the potential $\pi_i$ of each neighborhood is a function of five features:

• number of agents infected at t-1
• number of agents becoming infected at t
• neighborhood size
• i’s state at t (present)
• i’s state at t-1 (past)
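
A tabular potential of this kind can be pictured as a lookup keyed by the five features. The sketch below is illustrative only: the function names, the tuple ordering, the default value, and the choice to count node i itself in the neighborhood totals are assumptions.

```python
def tag_features(i, neighborhood, prev, curr):
    """Build the five-feature key for node i's neighborhood.

    prev and curr map node -> +1 (infected) or -1 at times t-1 and t.
    """
    infected_before = sum(1 for j in neighborhood if prev[j] == 1)
    newly_infected = sum(
        1 for j in neighborhood if prev[j] == -1 and curr[j] == 1
    )
    return (infected_before, newly_infected, len(neighborhood), curr[i], prev[i])


def tabular_potential(table, features, default=1.0):
    """Look up the learned potential value for a feature tuple."""
    return table.get(features, default)
```

For instance, with `prev = {1: 1, 2: -1, 3: -1}` and `curr = {1: 1, 2: 1, 3: -1}`, `tag_features(2, [1, 2, 3], prev, curr)` gives `(1, 1, 3, 1, -1)`.
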

### Approach 2: hGMM (cont.)

B. Parametric hGMM (paG): based on the cascade model and our empirical study of taG, $\pi_i$ is the product of three components:

(Recall that $\pi_i$ represents the unnormalized likelihood of the joint states $s_{N_i}$.)

• [1] probability of node i’s infection as in the cascade model
• [2] joint probability of c nodes in $N'_i = N_i \setminus \{i\}$ becoming infected
• [3] joint probability of $(|N'_i| - c)$ nodes staying uninfected

### Approach 2: hGMM (cont.)

Component [2]: joint probability of c nodes in $N'_i = N_i \setminus \{i\}$ becoming infected

• Assuming independence of the c agent states in $N'_i$, component [2] is simply a product of the infection probabilities of the c nodes.
• Capturing the correlation among infections, component [2] is a product of the infection probabilities of $|c - \gamma |N'_i||$ “nodes,” where $\gamma$ captures state correlations/interdependence.
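
A minimal numeric sketch of component [2], under the simplifying assumption (not stated in the slides) that every node in the neighborhood shares the same infection probability `p`, so that a product over k "nodes" becomes `p ** k`. The function name and signature are hypothetical.

```python
def component_two(p, c, neighborhood_size, gamma=0.0):
    """Joint probability of c neighbors becoming infected.

    gamma = 0 recovers the independence case p ** c. A nonzero gamma changes
    the effective number of "nodes" to |c - gamma * |N'_i||.
    """
    effective_nodes = abs(c - gamma * neighborhood_size)
    return p ** effective_nodes
```

With `p = 0.5`, `c = 2`, and a neighborhood of size 4, the independence case gives 0.25, while `gamma = 0.25` reduces the effective count to one node and gives 0.5, making simultaneous infections more likely than independence would predict.
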

### Empirical Study

• Generate graphs G* (Erdős–Rényi random graphs, ER, and preferential attachment graphs, PA) of 30 and 100 nodes
• Randomly delete half of the edges to create G
• Generate cascades with the parameters learned from empirical data by Stonedahl et al. (’10)
• 2 domains: fast and normal
• Generative model (on fully observed graphs): C on G*
• Vary the amount of training data (25 and 100 cascades):
  • paG (parametric hGMM on the given graph G): learn parameters
  • maxC (cascade model with G’ learned by MaxInf): learn parameters + connections
  • netC (cascade model with G’ learned by NetInf’): learn connections (given the generative model’s parameters)
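
The graph setup can be reproduced in spirit with the standard library. This is a sketch, not the authors' generator: the edge probability `p` and the attachment count `m` are hypothetical parameters (the slides give only the node counts), and the preferential attachment routine is a common textbook variant.

```python
import itertools
import random


def erdos_renyi(n, p, rng):
    """ER graph: each of the n*(n-1)/2 possible edges appears with probability p."""
    return {
        (u, v)
        for u, v in itertools.combinations(range(n), 2)
        if rng.random() < p
    }


def preferential_attachment(n, m, rng):
    """PA graph: each new node links to m distinct existing nodes,
    chosen with probability proportional to their current degree."""
    edges = set()
    targets = list(range(m))  # the first new node links to the m seed nodes
    repeated = []             # each node appears once per incident edge
    for new in range(m, n):
        for t in targets:
            edges.add((t, new))
        repeated.extend(targets)
        repeated.extend([new] * m)
        chosen = set()
        while len(chosen) < m:
            chosen.add(rng.choice(repeated))
        targets = list(chosen)
    return edges


def delete_half(edges, rng):
    """Create the observed graph G by removing half of the edges of G*."""
    ordered = sorted(edges)
    removed = set(rng.sample(ordered, len(ordered) // 2))
    return set(ordered) - removed
```

For example, `delete_half(preferential_attachment(30, 2, random.Random(0)), random.Random(1))` yields an observed graph with 28 of the 56 original edges.
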

### Evaluation Metrics

• Capturing diffusion dynamics: log-likelihood of diffusion traces (the objective function)
• Predicting the fraction of infected nodes: KL (skewed) divergence between the predicted and actual distributions of fractions of infected nodes
• Structural difference between the learned and actual graphs (only applicable to the structure learning approach)
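
The slides do not define the "skewed" variant, so the sketch below assumes one common form of skew divergence: the KL divergence from the actual distribution to a mixture of the predicted and actual distributions, which stays finite when the prediction assigns zero probability to an observed outcome. The mixing weight `alpha = 0.99` and the argument order are assumptions.

```python
import math


def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions over the same bins."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def skew_divergence(actual, predicted, alpha=0.99):
    """KL(actual || alpha * predicted + (1 - alpha) * actual)."""
    mixed = [alpha * q + (1 - alpha) * p for p, q in zip(actual, predicted)]
    return kl_divergence(actual, mixed)
```

Here `actual` and `predicted` would be histograms of the fraction of infected nodes, binned identically and normalized to sum to 1; lower values indicate a closer match.
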

### Detailed Prediction Results

• Legend:
  • paG: parametric hGMM on G
  • maxC: cascade model with G’ learned by MaxInf
  • C: generative cascade model on G
• Model 1 vs. Model 2:
  • Black: 1 outperforms 2 (p < 0.05)
  • White: 2 outperforms 1 (p < 0.05)
  • Grey: otherwise
• Summary: With sufficient data, paG is the best model. In some fast diffusion cases, maxC outperforms paG. C is the best model when the graph is fully observed.

### Aggregate Prediction Results

KL divergence: better-performing models have lower divergence

### Graph Results

• NetInf’ discovers more missing edges than MaxLInf, but also adds more spurious edges.
• paG’s learned parameters help to detect whether the given network has missing edges
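
The comparison of recovered versus spurious edges can be computed with set operations. A minimal sketch, assuming undirected edges stored as node pairs; the function names and the report keys are hypothetical.

```python
def normalize(edges):
    """Treat edges as undirected by sorting each pair of endpoints."""
    return {tuple(sorted(e)) for e in edges}


def structure_report(true_edges, given_edges, learned_edges):
    """Count how a learned graph G' differs from G* relative to the input G."""
    true_set = normalize(true_edges)
    given = normalize(given_edges)
    learned = normalize(learned_edges)
    missing = true_set - given   # edges of G* hidden from the learner
    added = learned - given      # edges the learner introduced
    return {
        "missing": len(missing),
        "recovered": len(added & missing),
        "spurious": len(added - true_set),
    }
```

For example, with G* = {(1,2), (2,3), (1,3), (3,4)}, G = {(1,2), (2,3)}, and G’ = {(1,2), (2,3), (3,1), (2,4)}, the report shows 2 missing edges, 1 recovered, and 1 spurious.
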

### Conclusions

Contributions

• We introduce two solutions: learning an hGMM on the given network structure, and directly discovering the missing connections.
• Our approaches can improve prediction over existing methods in various settings with a considerable number of missing edges.

Future work

• Improve scalability (treating undirected and directed edges differently)
• Develop a more systematic analysis to detect whether there are missing edges
• Interleave the learning of the graph and of the model parameters more effectively

THANK YOU!

qduong@umich.edu

http://eecs.umich.edu/~qduong