Community Detection Algorithm and Community Quality Metric

slide1
Community Detection Algorithm and Community Quality Metric

Mingming Chen & Boleslaw K. Szymanski

Department of Computer Science

Rensselaer Polytechnic Institute

slide2
Community Structure
  • Many networks display community structure
    • Groups of nodes within which connections are denser than between them

Community detection algorithms

Community quality metrics

slide3
Two Related Community Detection Topics
  • Community detection algorithm
    • LabelRank: a stabilized label propagation community detection algorithm (Xie and Szymanski, 2013)
    • LabelRankT: an extension of LabelRank for dynamic networks (Xie, Chen, and Szymanski, 2013)
  • A new community quality metric that solves two problems of Modularity (M. E. J. Newman, 2006; Newman and Girvan, 2004)

slide4
LabelRank Algorithm
  • Four operators applied to the labels
    • Label propagation operator
    • Inflation operator
    • Cutoff operator
    • Conditional update operator

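Figure: label propagation example for the question "NP=P?". Nodes 1, 2, and 3 answer No and node 4 answers Yes. With equal edge weights, P1(No) = 3/4 and P1(Yes) = 1/4, so node 1 answers No. With a weight of 97 on node 4's edge, P1(No) = 3/100 and P1(Yes) = 97/100, so node 1 answers Yes.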
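The vote in the figure is a weighted share per answer. This minimal sketch reads the figure's edge weights as 1 for nodes 1, 2, and 3 and 97 for node 4 (our interpretation of the flattened figure); `label_distribution` is a hypothetical helper, not part of the authors' code.

```python
from fractions import Fraction


def label_distribution(votes):
    """Share of each answer among weighted (edge weight, answer) pairs."""
    total = sum(weight for weight, _ in votes)
    dist = {}
    for weight, answer in votes:
        dist[answer] = dist.get(answer, 0) + Fraction(weight, total)
    return dist


# Node 1 (its own label), node 2 and node 3 say "No"; node 4 says "Yes".
equal = [(1, "No"), (1, "No"), (1, "No"), (1, "Yes")]
weighted = [(1, "No"), (1, "No"), (1, "No"), (97, "Yes")]  # assumed weight 97 on node 4
print(label_distribution(equal))     # {'No': Fraction(3, 4), 'Yes': Fraction(1, 4)}
print(label_distribution(weighted))  # {'No': Fraction(3, 100), 'Yes': Fraction(97, 100)}
```

Exact fractions keep the 3/4 versus 3/100 comparison free of rounding noise.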

slide5
Label Propagation Operator
  • Label propagation: $P' = WP$, where W is the n x n weighted adjacency matrix and P is the n x n label probability distribution matrix, composed of n (1 x n) row vectors Pi, one for each node
  • Each element Pi(c) holds the current estimate of the probability that node i observes label c ∈ C, where C is the set of labels (here, suppose C = {1, 2, …, n})
      • Ex. Pi = (0.1, 0.2, …, 0.05, …)
  • To initialize P, each node is assigned a probability distribution over the labels of all its incoming edges, including its own label through a self-loop
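A minimal sketch of the initialization for an unweighted graph, assuming a self-loop so that each node also keeps its own label (the vectors on the next slide imply this). Only the neighborhoods of nodes 1 to 4 can be read off those vectors; treating nodes 5 to 10 as leaves is an assumption, and `init_labels` is a hypothetical helper name.

```python
def init_labels(adj):
    """Initial label distribution: uniform over a node's own label and its neighbors' labels.

    adj maps each node to the set of its neighbors (unweighted, undirected).
    The node's own label stands in for an assumed self-loop.
    """
    P = {}
    for i, nbrs in adj.items():
        labels = sorted(set(nbrs) | {i})
        P[i] = {c: 1.0 / len(labels) for c in labels}
    return P


# Tree read off the next slide: 1 links to 2, 3, 4; then 5, 6 under 2; 7, 8 under 3; 9, 10 under 4.
edges = [(1, 2), (1, 3), (1, 4), (2, 5), (2, 6), (3, 7), (3, 8), (4, 9), (4, 10)]
adj = {v: set() for v in range(1, 11)}
for u, v in edges:
    adj[u].add(v)
    adj[v].add(u)

P = init_labels(adj)
print(P[1])  # {1: 0.25, 2: 0.25, 3: 0.25, 4: 0.25}
print(P[2])  # {1: 0.25, 2: 0.25, 5: 0.25, 6: 0.25}
```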
slide6
Label Propagation Operator
  • Each node receives the label probability distributions of its neighbors and computes its new distribution; in this example, node 1 (linked to nodes 2, 3, and 4) averages its own distribution with theirs

P1 = (0.25, 0.25, 0.25, 0.25, 0, 0, 0, 0, 0, 0)

P2 = (0.25, 0.25, 0, 0, 0.25, 0.25, 0, 0, 0, 0)

P3 = (0.25, 0, 0.25, 0, 0, 0, 0.25, 0.25, 0, 0)

P4 = (0.25, 0, 0, 0.25, 0, 0, 0, 0, 0.25, 0.25)

P1 after propagation = (0.25, 0.125, 0.125, 0.125, 0.0625, 0.0625, 0.0625, 0.0625, 0.0625, 0.0625)
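The step above can be reproduced with one propagation pass. This sketch assumes W has self-loops and is row-normalized, which turns the propagation into an average over a node and its neighbors; that assumption reproduces the example's numbers. `init_labels` and `propagate` are hypothetical helper names, and nodes 5 to 10 are assumed to be leaves.

```python
def init_labels(adj):
    """Uniform distribution over each node's own label and its neighbors' labels."""
    P = {}
    for i, nbrs in adj.items():
        labels = sorted(set(nbrs) | {i})
        P[i] = {c: 1.0 / len(labels) for c in labels}
    return P


def propagate(P, adj):
    """One propagation step: node i averages its own and its neighbors' distributions.

    Equivalent to multiplying P by a row-normalized W with self-loops (assumption).
    """
    new_P = {}
    for i, nbrs in adj.items():
        group = [i, *sorted(nbrs)]
        acc = {}
        for j in group:
            for c, p in P[j].items():
                acc[c] = acc.get(c, 0.0) + p / len(group)
        new_P[i] = dict(sorted(acc.items()))
    return new_P


edges = [(1, 2), (1, 3), (1, 4), (2, 5), (2, 6), (3, 7), (3, 8), (4, 9), (4, 10)]
adj = {v: set() for v in range(1, 11)}
for u, v in edges:
    adj[u].add(v)
    adj[v].add(u)

P1_new = propagate(init_labels(adj), adj)[1]
print([P1_new.get(c, 0.0) for c in range(1, 11)])
# [0.25, 0.125, 0.125, 0.125, 0.0625, 0.0625, 0.0625, 0.0625, 0.0625, 0.0625]
```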

slide7
Inflation Operator
  • Each element Pi(c) is raised to the power in, the inflation parameter: Pi(c) ← Pi(c)^in
  • This increases the probabilities of high-probability labels and decreases those of low-probability labels during label propagation.

P1 before inflation = (0.25, 0.125, 0.125, 0.125, 0.0625, 0.0625, 0.0625, 0.0625, 0.0625, 0.0625)

P1 after inflation = (0.129, 0.0323, 0.0323, 0.0323, 0.00806, 0.00806, 0.00806, 0.00806, 0.00806, 0.00806)
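A sketch of inflation with in = 2, the exponent consistent with the 4:1 ratio between successive probability levels in the vectors above. Renormalizing each row to sum to 1 after raising it to the power is an assumption of this sketch, so its absolute values differ from the printed vector while the ratios match; `inflate` is a hypothetical helper.

```python
def inflate(dist, power=2.0):
    """Inflation: raise each probability to `power`, then renormalize the row to sum to 1."""
    raised = {c: p ** power for c, p in dist.items()}
    total = sum(raised.values())
    return {c: v / total for c, v in raised.items()}


p1 = {1: 0.25, 2: 0.125, 3: 0.125, 4: 0.125,
      5: 0.0625, 6: 0.0625, 7: 0.0625, 8: 0.0625, 9: 0.0625, 10: 0.0625}
p1_inflated = inflate(p1, power=2.0)
print({c: round(v, 4) for c, v in p1_inflated.items()})
# label 1: 0.4706, labels 2-4: 0.1176, labels 5-10: 0.0294
print(round(p1_inflated[1] / p1_inflated[2], 6))  # 4.0
```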

slide8
Cutoff Operator
  • The cutoff operator on P removes labels whose probability is below the threshold r; the inflation operator helps by driving down the probabilities of weak labels during propagation.
  • This reduces the space complexity of storing P from quadratic to linear.

P1 before cutoff = (0.129, 0.0323, 0.0323, 0.0323, 0.00806, 0.00806, 0.00806, 0.00806, 0.00806, 0.00806)

With r = 0.1, the average number of labels in each node is less than 3.

P1 after cutoff = (0.129)
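The cutoff is a simple filter. Applied to the vector above with r = 0.1, only label 1 survives. For a row that sums to 1, at most 1/r labels can reach the threshold, which is what bounds per-node storage. `cutoff` is a hypothetical helper name.

```python
def cutoff(dist, r=0.1):
    """Cutoff: keep only labels whose probability is at least r."""
    return {c: p for c, p in dist.items() if p >= r}


# P1 as printed on the slide before the cutoff.
p1 = {1: 0.129, 2: 0.0323, 3: 0.0323, 4: 0.0323,
      5: 0.00806, 6: 0.00806, 7: 0.00806, 8: 0.00806, 9: 0.00806, 10: 0.00806}
print(cutoff(p1, r=0.1))  # {1: 0.129}
```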

slide9
Conditional Update Operator
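  • At each iteration, node i is updated only when it is significantly different from its incoming neighbors in terms of labels: $\sum_{j \in N(i)} \mathrm{isSubset}(C_i^{max}, C_j^{max}) \le q \cdot k_i$
    • where N(i) is the set of neighbors of node i, $C_i^{max}$ is the set of maximum-probability labels at node i at the last step, $\mathrm{isSubset}(C_i^{max}, C_j^{max})$ returns 1 if $C_i^{max} \subseteq C_j^{max}$ and 0 otherwise, $k_i$ is the node degree, and $q \in [0, 1]$.
  • isSubset can be viewed as a measure of similarity between two nodes.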
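The update rule can be sketched as a small predicate. `should_update` is a hypothetical helper: it counts neighbors j whose set of maximum-probability labels contains node i's set (the isSubset test) and allows the update only when that count is at most q times the degree k_i.

```python
def should_update(i, adj, cmax, q):
    """Conditional update test for node i (hypothetical helper).

    adj maps each node to its set of neighbors; cmax maps each node to the
    set of labels with maximum probability at the last step.
    """
    k_i = len(adj[i])
    # isSubset(C_i^max, C_j^max) is 1 when cmax[i] is a subset of cmax[j].
    similar = sum(1 for j in adj[i] if cmax[i] <= cmax[j])
    return similar <= q * k_i


adj = {1: {2, 3, 4}, 2: {1}, 3: {1}, 4: {1}}
cmax = {1: {1}, 2: {1}, 3: {1, 3}, 4: {4}}
print(should_update(1, adj, cmax, q=0.5))  # False: 2 > 0.5 * 3
print(should_update(1, adj, cmax, q=0.7))  # True: 2 <= 0.7 * 3
```

A larger q makes updates more permissive; with q = 1 every node is always updated.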
slide11
Running time of LabelRank
  • O(Tm): m is the number of edges and T is the number of iterations.

LabelRank is a linear-time algorithm for a fixed number of iterations T
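To make the O(Tm) bound concrete, the four operators can be combined into one loop. In this sketch, each iteration touches every edge a number of times proportional to the labels stored per node, which the cutoff keeps small, so the cost of one iteration grows with m. It is a hypothetical illustration, not the authors' implementation. Its assumptions are: self-loops are added, propagation averages distributions, rows are renormalized after inflation, the loop stops when no node is updated, and the parameter defaults are chosen only for the example.

```python
from collections import defaultdict


def max_labels(dist):
    """Set of labels sharing the highest probability in a distribution."""
    best = max(dist.values())
    return {c for c, p in dist.items() if p == best}


def labelrank(adj, inflation=2.0, r=0.1, q=0.5, max_iter=50):
    """Hypothetical LabelRank sketch for an unweighted, undirected graph.

    adj maps each node to the set of its neighbors.
    """
    # Initialization: uniform over the node's own label and its neighbors' labels.
    P = {}
    for i, nbrs in adj.items():
        labels = set(nbrs) | {i}
        P[i] = {c: 1.0 / len(labels) for c in labels}

    for _ in range(max_iter):
        cmax = {i: max_labels(P[i]) for i in adj}
        new_P, changed = {}, False
        for i, nbrs in adj.items():
            # Conditional update: skip i if its max labels are a subset of the
            # max labels of more than q * k_i neighbors.
            similar = sum(1 for j in nbrs if cmax[i] <= cmax[j])
            if similar > q * len(nbrs):
                new_P[i] = P[i]
                continue
            # Propagation: average over node i and its neighbors.
            group = [i, *nbrs]
            acc = defaultdict(float)
            for j in group:
                for c, p in P[j].items():
                    acc[c] += p / len(group)
            # Inflation followed by renormalization.
            raised = {c: p ** inflation for c, p in acc.items()}
            total = sum(raised.values())
            inflated = {c: p / total for c, p in raised.items()}
            # Cutoff; keep the strongest labels if everything falls below r.
            kept = {c: p for c, p in inflated.items() if p >= r}
            if not kept:
                top = max(inflated.values())
                kept = {c: p for c, p in inflated.items() if p == top}
            new_P[i] = kept
            changed = True
        P = new_P
        if not changed:
            break

    # Community of each node: smallest of its maximum-probability labels.
    return {i: min(max_labels(P[i])) for i in adj}


# Two triangles joined by the edge 2-3.
adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4}}
print(labelrank(adj))  # {0: 2, 1: 2, 2: 2, 3: 3, 4: 3, 5: 3}
```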

slide13
LabelRankT
  • LabelRankT is LabelRank with one extra conditional update rule: only nodes involved in changes are updated. Changes are detected by comparing the neighbors of node i at two consecutive time steps, t - 1 and t.
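One way to read the extra rule is as a filter on which nodes LabelRank may touch at time t. This sketch marks a node as changed when its neighbor set differs between steps t - 1 and t. The slide does not show how the rule is written, so the helper `changed_nodes` and the snapshot format (dicts of neighbor sets) are assumptions.

```python
def changed_nodes(adj_prev, adj_curr):
    """Nodes whose neighbor set differs between snapshots t - 1 and t."""
    nodes = set(adj_prev) | set(adj_curr)
    return {i for i in nodes if adj_prev.get(i, set()) != adj_curr.get(i, set())}


# Snapshot t - 1: triangle 1-2-3 plus edge 3-4.
prev = {1: {2, 3}, 2: {1, 3}, 3: {1, 2, 4}, 4: {3}}
# Snapshot t: edge 4-5 is added, bringing in the new node 5.
curr = {1: {2, 3}, 2: {1, 3}, 3: {1, 2, 4}, 4: {3, 5}, 5: {4}}
print(sorted(changed_nodes(prev, curr)))  # [4, 5]
```

Under this reading, all other nodes keep their label distributions from t - 1, so the work per snapshot scales with the size of the change rather than the whole network.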
slide14
Two Problems of Modularity Maximization
  • Split large communities
    • Favor small communities
  • Resolution limit problem
    • Modularity optimization may fail to discover communities smaller than a certain scale, even when the communities are unambiguously defined.
    • This scale depends on the total number of edges in the network and the degree of interconnectedness of the communities.
    • Favor large communities

Fortunato et al., 2008; Li et al., 2008; Arenas et al., 2008; Berry et al., 2009; Good et al., 2010; Ronhovde et al., 2010; Fortunato, 2010; Lancichinetti et al., 2011; Traag et al., 2011; Darst et al., 2013.

slide15
Modularity
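  • Modularity (Q): the fraction of edges falling within communities minus the expected value of that fraction in an equivalent network with edges placed at random: $Q = \frac{1}{2m}\sum_{i,j}\left[A_{ij} - \frac{k_i k_j}{2m}\right]\delta(c_i, c_j)$, where A is the adjacency matrix, $k_i$ is the degree of node i, m is the number of edges, and $\delta(c_i, c_j) = 1$ if nodes i and j are in the same community and 0 otherwise
  • Equivalent definition: $Q = \sum_{c}\left[\frac{l_c}{m} - \left(\frac{d_c}{2m}\right)^2\right]$, where $l_c$ is the number of edges inside community c and $d_c$ is the total degree of its nodes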

M. E. J. Newman, 2006.

Newman and Girvan, 2004.
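The two forms of Modularity can be checked against each other on a toy graph. This sketch uses hypothetical helper names, an unweighted graph without self-loops, and two triangles joined by one edge, chosen only for illustration. The forms agree because, within a community, the A_ij terms add up to 2·l_c and the k_i·k_j terms add up to d_c².

```python
def modularity_pairwise(edges, community):
    """Q = (1/2m) * sum over i, j of [A_ij - k_i k_j / 2m] * delta(c_i, c_j)."""
    m = len(edges)
    deg, linked = {}, set()
    for u, v in edges:
        deg[u] = deg.get(u, 0) + 1
        deg[v] = deg.get(v, 0) + 1
        linked.update({(u, v), (v, u)})
    total = 0.0
    for i in deg:
        for j in deg:
            if community[i] == community[j]:
                a_ij = 1.0 if (i, j) in linked else 0.0
                total += a_ij - deg[i] * deg[j] / (2 * m)
    return total / (2 * m)


def modularity_by_community(edges, community):
    """Q = sum over communities c of [l_c / m - (d_c / 2m)^2]."""
    m = len(edges)
    inner, degree = {}, {}
    for u, v in edges:
        cu, cv = community[u], community[v]
        degree[cu] = degree.get(cu, 0) + 1
        degree[cv] = degree.get(cv, 0) + 1
        if cu == cv:
            inner[cu] = inner.get(cu, 0) + 1
    return sum(inner.get(c, 0) / m - (d / (2 * m)) ** 2 for c, d in degree.items())


edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
part = {0: "A", 1: "A", 2: "A", 3: "B", 4: "B", 5: "B"}
print(round(modularity_pairwise(edges, part), 4), round(modularity_by_community(edges, part), 4))
# 0.3571 0.3571
```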

slide16
Modularity with Split Penalty
  • Modularity (Q): the modularity of the community detection result
  • Split penalty (SP): the fraction of edges that connect nodes of different communities
  • Qs = Q - SP: addresses Modularity's tendency to favor small communities
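The split penalty and Qs can be computed directly from an edge list. This sketch uses hypothetical helpers, the per-community form of Modularity, and a toy graph of two triangles joined by one edge (our example, not from the slides). Splitting the graph at the bridge gives Q = 5/14, SP = 1/7, and therefore Qs = 3/14.

```python
def modularity(edges, community):
    """Q = sum over communities c of [l_c / m - (d_c / 2m)^2]."""
    m = len(edges)
    inner, degree = {}, {}
    for u, v in edges:
        cu, cv = community[u], community[v]
        degree[cu] = degree.get(cu, 0) + 1
        degree[cv] = degree.get(cv, 0) + 1
        if cu == cv:
            inner[cu] = inner.get(cu, 0) + 1
    return sum(inner.get(c, 0) / m - (d / (2 * m)) ** 2 for c, d in degree.items())


def split_penalty(edges, community):
    """SP: fraction of edges whose endpoints are in different communities."""
    return sum(1 for u, v in edges if community[u] != community[v]) / len(edges)


def qs(edges, community):
    """Qs = Q - SP."""
    return modularity(edges, community) - split_penalty(edges, community)


edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
two = {0: "A", 1: "A", 2: "A", 3: "B", 4: "B", 5: "B"}
print(round(modularity(edges, two), 4), round(split_penalty(edges, two), 4), round(qs(edges, two), 4))
# 0.3571 0.1429 0.2143
```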
slide17
Qs with Community Density
  • Resolution limit: Modularity optimization may fail to detect communities smaller than a certain scale
  • Intuition: incorporate community density into Modularity and Split Penalty to solve the resolution limit problem
  • Equivalent definition
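The density-weighted formula itself is not spelled out in this transcript. As an assumption, this sketch uses the standard edge density of a community with |c| nodes and l_c internal edges, 2·l_c / (|c|(|c| - 1)), to show why density helps against the resolution limit. A single 5-clique has density 1, while two 5-cliques joined by one edge and merged into one community have density 21/45, so terms weighted by density would give the merged community much less credit. `community_density` is a hypothetical helper.

```python
def community_density(num_nodes, num_inner_edges):
    """Edge density of a community: internal edges over the maximum possible."""
    if num_nodes < 2:
        return 0.0
    return 2 * num_inner_edges / (num_nodes * (num_nodes - 1))


print(community_density(5, 10))             # one 5-clique: 1.0
print(round(community_density(10, 21), 4))  # two 5-cliques plus one link: 0.4667
```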
slide24
Example of One Complete Graph

Community Quality on a complete graph with 8 nodes

slide26
5-clique Example

∆Qs=(0.8424-0.7848)=0.0576 > ∆Q=(0.8879-0.8758)=0.0121
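The slide does not spell out the graph, but its four values match the standard ring of 30 five-node cliques joined in a cycle by single edges (m = 330). Each ∆ is taken as the value for merged pairs of adjacent cliques minus the value for single cliques. This sketch assumes that graph, uses hypothetical helper names, and recomputes the numbers.

```python
from itertools import combinations


def ring_of_cliques(n_cliques, size):
    """Edges of n_cliques complete graphs with `size` nodes each, joined in a ring by single edges."""
    edges = []
    for k in range(n_cliques):
        edges.extend(combinations(range(k * size, (k + 1) * size), 2))
        # Link the last node of clique k to the first node of the next clique.
        edges.append(((k + 1) * size - 1, ((k + 1) % n_cliques) * size))
    return edges


def modularity(edges, community):
    """Q = sum over communities c of [l_c / m - (d_c / 2m)^2]."""
    m = len(edges)
    inner, degree = {}, {}
    for u, v in edges:
        cu, cv = community[u], community[v]
        degree[cu] = degree.get(cu, 0) + 1
        degree[cv] = degree.get(cv, 0) + 1
        if cu == cv:
            inner[cu] = inner.get(cu, 0) + 1
    return sum(inner.get(c, 0) / m - (d / (2 * m)) ** 2 for c, d in degree.items())


def split_penalty(edges, community):
    """SP: fraction of edges whose endpoints are in different communities."""
    return sum(1 for u, v in edges if community[u] != community[v]) / len(edges)


edges = ring_of_cliques(30, 5)
singles = {v: v // 5 for v in range(150)}   # each 5-clique is a community
pairs = {v: v // 10 for v in range(150)}    # adjacent cliques merged in pairs
results = {}
for name, part in [("single cliques", singles), ("pairs of cliques", pairs)]:
    q = modularity(edges, part)
    results[name] = (q, q - split_penalty(edges, part))
    print(name, round(results[name][0], 4), round(results[name][1], 4))
# single cliques 0.8758 0.7848
# pairs of cliques 0.8879 0.8424
```

Merging pairs raises Q by about 0.0121 and Qs by about 0.0576, so on this graph the split penalty alone strengthens the preference for merged cliques rather than removing it.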

slide27
Thanks!

Q & A
