
Modeling Information Diffusion in Networks with Unobserved Links

Quang Duong Michael P. Wellman Satinder Singh

Computer Science and Engineering

University of Michigan

Networks with unobserved links
  • Links help to model how information diffuses from one node to another
  • Real-world agents/nodes have connections unobserved by third parties
Problem Overview

Given: a network (with missing links) and snapshots of the network states over time.

Objective: model information diffusion on networks.

We examine two different approaches:

  • Learning the underlying network, upon which a diffusion model is built (similar to some previous work’s approach)
  • Building a flexible model without learning the missing links
Problem Overview (cont.)

Formalism

  • A node/agent is in state s_t = 1 if infected, and -1 otherwise, at time t (infection persists)
  • A diffusion instance/trace s records snapshots of the network’s states over time
  • Underlying network G*
  • Input network G (G* with missing edges)
  • N_i is the neighborhood of i in G (including i itself)

Underlying diffusion process: cascade

  • The probability of infection is proportional to the number of infected neighbors
  • The model’s parameters determine: (a) the diffusion rate and (b) the spontaneous infection rate.
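The cascade process described above can be sketched in a few lines. The slides do not give the infection formula, so the functional form below is an assumption: a node escapes spontaneous infection with probability (1 - alpha) and escapes each infected neighbor with probability (1 - beta), so infection probability grows with the number of infected neighbors. The names alpha, beta, infection_probability, and simulate_cascade are hypothetical, and the neighbor sets here exclude the node itself, unlike N_i.

```python
import random

def infection_probability(num_infected_neighbors, alpha, beta):
    """Assumed form (not from the slides): alpha is the spontaneous
    infection rate, beta is the per-neighbor diffusion rate."""
    return 1.0 - (1.0 - alpha) * (1.0 - beta) ** num_infected_neighbors

def simulate_cascade(neighbors, steps, alpha, beta, rng):
    """Return a trace: a list of state snapshots {node: +1 or -1}.

    neighbors maps each node to the set of its other neighbors.
    """
    state = {v: -1 for v in neighbors}  # everyone starts uninfected
    trace = [dict(state)]
    for _ in range(steps):
        nxt = dict(state)
        for v in neighbors:
            if state[v] == 1:
                continue  # infection persists
            k = sum(1 for u in neighbors[v] if state[u] == 1)
            if rng.random() < infection_probability(k, alpha, beta):
                nxt[v] = 1
        state = nxt
        trace.append(dict(state))
    return trace

# Hypothetical 4-node graph and parameter values, for illustration only.
graph = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}}
trace = simulate_cascade(graph, steps=5, alpha=0.05, beta=0.3,
                         rng=random.Random(0))
```

Each snapshot is computed from the previous one only, so all nodes update synchronously; that scheduling is also an assumption of this sketch.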
Problem Summary

Given (training input):

  • Network G
  • A set of diffusion traces s

Two approaches:

  • 1. Structure learning approach: learn network G’, then learn parameters for a cascade model built on G’
  • 2. Graphical model approach: learn parameters for a graphical multiagent model built on G

Evaluation: on testing sets of diffusion traces

Objective function: log likelihood of diffusion traces L(s), which captures diffusion dynamics

Approach 1: Learning Missing Links

MaxInf algorithm (maxC)

  • Assumption: nodes can be infected by multiple neighbors, as in the cascade model
  • Objective function: likelihood of traces L(s)
  • Outline:
    • Greedily add the edge that increases the objective function the most
    • Re-learn the model parameters after each addition
    • Repeat until the objective function starts to decrease
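The greedy outline above can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the cascade likelihood uses an assumed form 1 - (1 - alpha)(1 - beta)^k, a coarse grid search stands in for parameter learning, and all function names are hypothetical.

```python
import itertools
import math

def log_likelihood(edges, traces, alpha, beta):
    """Log likelihood of traces under the assumed cascade form."""
    nbrs = {}
    for u, v in edges:
        nbrs.setdefault(u, set()).add(v)
        nbrs.setdefault(v, set()).add(u)
    total = 0.0
    for trace in traces:
        for prev, cur in zip(trace, trace[1:]):
            for v in prev:
                if prev[v] == 1:
                    continue  # already infected; infection persists
                k = sum(1 for u in nbrs.get(v, ()) if prev[u] == 1)
                p = 1.0 - (1.0 - alpha) * (1.0 - beta) ** k
                total += math.log(p if cur[v] == 1 else 1.0 - p)
    return total

def fit_parameters(edges, traces, grid=(0.05, 0.1, 0.2, 0.4, 0.6)):
    """Grid-search stand-in for parameter learning: (ll, alpha, beta)."""
    return max((log_likelihood(edges, traces, a, b), a, b)
               for a in grid for b in grid)

def greedy_edge_learning(nodes, given_edges, traces):
    edges = {frozenset(e) for e in given_edges}
    best_ll, alpha, beta = fit_parameters(edges, traces)
    candidates = {frozenset(p) for p in itertools.combinations(nodes, 2)} - edges
    while candidates:
        # Re-fit parameters for every candidate addition, keep the best one.
        trials = [(fit_parameters(edges | {e}, traces), e) for e in candidates]
        (ll, a, b), edge = max(trials, key=lambda t: t[0][0])
        if ll <= best_ll:
            break  # no addition improves the objective any further
        edges.add(edge)
        candidates.remove(edge)
        best_ll, alpha, beta = ll, a, b
    return edges, (alpha, beta), best_ll

# Hypothetical toy input: one trace over three nodes, one known edge.
toy_traces = [[{0: 1, 1: -1, 2: -1}, {0: 1, 1: 1, 2: 1}, {0: 1, 1: 1, 2: 1}]]
learned, params, final_ll = greedy_edge_learning([0, 1, 2], [(0, 1)], toy_traces)
```

The stopping rule here halts as soon as no candidate edge raises the training log likelihood, which is one reading of "repeat until the objective function starts to decrease."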

Related work: NetInf [Gomez-Rodriguez et al. ’10].

  • Adapted version NetInf’ (netC)
Approach 2: History-Dependent Graphical Multiagent Model
  • hGMM [Duong, Wellman, Singh, and Vorobeychik AAMAS’10]
  • Directed edges from the nodes in N_i^d to i: how neighbors’ past states affect i’s present state.
  • Undirected edges define N_i^u: correlations/interdependencies among nodes at the same time t.

(*) Cascade and many other models assume conditional independence given history (N_i^u contains i itself only)

(**) For simplicity, we assume N_i = N_i^d = N_i^u

Approach 2: hGMM (cont.)

Each neighborhood is associated with a potential function π_i that represents the unnormalized likelihood of the joint states s_{N_i}

[Equation: joint probability distribution of the system’s states at time t, with the following annotated terms]

  • potential of the neighborhood’s joint states at t
  • neighborhood-relevant abstracted history
  • abstracted history
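The equation itself did not survive in this transcript. As a hedged reconstruction from the annotated terms, and assuming the product-of-potentials form used in the cited hGMM work, it would read roughly as below. The symbols H^t (abstracted history), H^t_{N_i} (neighborhood-relevant abstracted history), and Z (normalization constant) are notation assumed here, not taken from the slides.

```latex
\Pr\left(s^{t} \mid H^{t}\right)
  = \frac{1}{Z} \prod_{i} \pi_i\left(s^{t}_{N_i} \mid H^{t}_{N_i}\right),
\qquad
Z = \sum_{\tilde{s}^{t}} \prod_{i} \pi_i\left(\tilde{s}^{t}_{N_i} \mid H^{t}_{N_i}\right)
```

Under this reading, Z sums over all joint state configurations at time t, which is why the potentials only need to be unnormalized likelihoods.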

Approach 2: hGMM (cont.)
  • hGMMs allow reasoning about state correlations between neighbors who appear disconnected in the input graphical structure
  • Example: hGMMs could use the potential function of node 2 to express correlations between nodes 1 and 3 to compensate for the missing edge (1, 3).

[Figure: three graphs over nodes 1, 2, 3, and 4]

Approach 2: hGMM (cont.)

A. Tabular hGMM (taG): the potential π_i of each neighborhood is a function of 5 features:

  • number of agents infected at t-1,
  • number of agents becoming infected at t,
  • neighborhood size,
  • i’s state at t (present)
  • i’s state at t-1 (past)
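The five features can be extracted from two consecutive snapshots as in the sketch below. It assumes that the infection counts are taken within the neighborhood N_i (the slide does not say so explicitly) and that N_i includes i itself; tabular_features and TabularPotential are hypothetical names.

```python
def tabular_features(i, neighborhood, prev_state, cur_state):
    """Return the 5-feature key for node i's neighborhood.

    neighborhood is N_i (including i); states map node -> +1 or -1.
    """
    infected_before = sum(1 for j in neighborhood if prev_state[j] == 1)
    newly_infected = sum(1 for j in neighborhood
                         if prev_state[j] == -1 and cur_state[j] == 1)
    return (infected_before, newly_infected, len(neighborhood),
            cur_state[i], prev_state[i])

class TabularPotential:
    """Lookup table from feature tuples to unnormalized potential values."""

    def __init__(self, default=1.0):
        self.table = {}
        self.default = default  # value used for feature tuples never seen

    def __call__(self, i, neighborhood, prev_state, cur_state):
        key = tabular_features(i, neighborhood, prev_state, cur_state)
        return self.table.get(key, self.default)

# Hypothetical example: node 2 with neighborhood {1, 2, 3}.
prev = {1: 1, 2: -1, 3: -1}
cur = {1: 1, 2: 1, 3: -1}
features = tabular_features(2, {1, 2, 3}, prev, cur)
```

Keying the table on these counts rather than on the full joint state keeps its size manageable as neighborhoods grow, at the cost of treating neighbors as interchangeable.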
Approach 2: hGMM (cont.)

B. Parametric hGMM (paG): based on the cascade model and our empirical study of taG, π_i is the product of three components:

(Recall that π_i represents the unnormalized likelihood of the joint states s_{N_i})

  • [1] probability of node i’s infection as in the cascade model
  • [2] joint probability of c nodes in N’_i = N_i \ {i} becoming infected
  • [3] joint probability of (|N’_i| - c) nodes staying uninfected
Approach 2: hGMM (cont.)

Component [2]: joint probability of c nodes in N’_i = N_i \ {i} becoming infected

  • If we assume independence among the c agent states in N’_i, component [2] is simply the product of the infection probabilities of the c nodes.
  • If we capture the correlation among infections, component [2] is a product of the infection probabilities of |c - γ|N’_i|| “nodes,” where γ captures state correlations/interdependence
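A small numeric sketch may help with the effective node count. It assumes, purely for illustration, that every node in N’_i shares one infection probability p, so the product collapses to a power of p; the slides do not state this, and component_two is a hypothetical name.

```python
def component_two(c, neighborhood_size, p, gamma=0.0):
    """Joint probability of c nodes in N'_i becoming infected.

    Assumption: a single shared infection probability p.
    gamma = 0 recovers the independent case, p ** c.
    """
    effective_nodes = abs(c - gamma * neighborhood_size)
    return p ** effective_nodes

# Hypothetical values: c = 3 newly infected nodes out of |N'_i| = 4.
independent = component_two(3, 4, p=0.5)             # exponent 3
correlated = component_two(3, 4, p=0.5, gamma=0.25)  # exponent |3 - 1| = 2
```

With these numbers the correlated case multiplies fewer probability terms, so the joint infection of the three nodes is scored as more likely than under independence.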
Empirical Study
  • Generate graphs G* (random ER and preferential attachment PA) of 30 and 100 nodes
  • Randomly delete half of the edges of G* to create G
  • Generate cascades with the parameters learned from empirical data by Stonedahl et al. (’10):
    • 2 domains: fast and normal
    • Generative model (on fully observed graphs): C on G*
  • Vary training data amount (25 and 100 cascades):
    • paG (parametric hGMM on the given graph G): learn parameters
    • maxC (cascade model with G’ learned by MaxInf): learn parameters + connections
    • netC (cascade model with G’ learned by NetInf’): learn connections (given the generative model’s parameters)
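The graph setup for the study (generate G*, then hide half of its edges to obtain G) can be sketched with the standard library alone. The edge probability 0.2 and the seed are placeholder values, since the slides do not report them, and only the ER case is shown; erdos_renyi and delete_half_of_edges are hypothetical names.

```python
import itertools
import random

def erdos_renyi(n, edge_prob, rng):
    """Return an undirected ER random graph as a set of frozenset edges."""
    return {frozenset(pair) for pair in itertools.combinations(range(n), 2)
            if rng.random() < edge_prob}

def delete_half_of_edges(edges, rng):
    """Keep a uniformly random half of the edges (rounded up)."""
    ordered = sorted(edges, key=sorted)  # fixed order so a seed is reproducible
    keep = len(ordered) - len(ordered) // 2
    return set(rng.sample(ordered, keep))

rng = random.Random(0)
g_star = erdos_renyi(30, 0.2, rng)    # underlying network G*
g = delete_half_of_edges(g_star, rng) # input network G with missing links
```

Training and testing cascades would then be generated on g_star, while the learners only ever see g.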
Evaluation Metrics
  • Capturing diffusion dynamics: log likelihood of diffusion traces (the objective function)
  • Predicting the fraction of infected nodes: KL (skewed) divergence between the predicted and actual distributions of fractions of infected nodes
  • Structural difference between the learned and actual graphs (only applicable for the structure learning approach)
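The second metric can be illustrated as follows. The slides do not define which skewed variant of KL divergence is used, so this sketch assumes the common skew divergence KL(p || αq + (1 - α)p), with α = 0.99 as a placeholder; the histogram binning and the choice of which distribution plays the role of p are likewise assumptions.

```python
import math

def kl_divergence(p, q):
    """KL(p || q) for discrete distributions; terms with p_i = 0 contribute 0."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def skew_divergence(p, q, alpha=0.99):
    """Assumed skew form: KL(p || alpha*q + (1 - alpha)*p).

    Mixing a little of p into q keeps the result finite when q has zeros.
    """
    mix = [alpha * qi + (1.0 - alpha) * pi for pi, qi in zip(p, q)]
    return kl_divergence(p, mix)

def fraction_histogram(fractions, bins=10):
    """Empirical distribution of infected-node fractions over equal-width bins."""
    counts = [0] * bins
    for f in fractions:
        counts[min(int(f * bins), bins - 1)] += 1
    return [c / len(fractions) for c in counts]

# Hypothetical final infected fractions from actual and predicted cascades.
actual = fraction_histogram([0.1, 0.4, 0.45, 0.9], bins=5)
predicted = fraction_histogram([0.1, 0.15, 0.5, 0.55], bins=5)
divergence = skew_divergence(actual, predicted)
```

Lower values indicate that the predicted distribution of infected fractions is closer to the actual one.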
Detailed Prediction Results
  • Legend:
    • paG: parametric hGMM on G
    • maxC: cascade model with G’ learned by MaxInf
    • C: generative cascade model on G
  • Model 1 vs. Model 2:
    • Black: 1 outperforms 2 (p < 0.05)
    • White: 2 outperforms 1 (p < 0.05)
    • Grey: otherwise
  • Summary: With sufficient data, paG is the best model. In some fast diffusion cases, maxC outperforms paG. C is the best model when the graph is fully observed.
Aggregate Prediction Results

KL divergence: better performing models have lower divergence

Graph Results
  • NetInf’ discovers more missing edges than MaxLInf, but adds more spurious edges than MaxLInf.
  • paG’s learned parameters help to detect if the given network has missing edges
Conclusions

Contributions

  • We introduce two solutions: learning an hGMM on the given network structure, and directly discovering the missing connections.
  • Our approaches can improve prediction over existing methods in various settings with a considerable number of missing edges.

Future work

  • Improve scalability (treating undirected and directed edges differently)
  • Develop more systematic analysis to detect whether there are missing edges
  • More effective interleaving between learning graph and model parameters

THANK YOU!

qduong@umich.edu

http://eecs.umich.edu/~qduong