
# Community Detection Algorithm and Community Quality Metric


Mingming Chen & Boleslaw K. Szymanski

Department of Computer Science

Rensselaer Polytechnic Institute

Community Structure

• Many networks display community structure

• Groups of nodes within which connections are denser than connections between groups

Community detection algorithms

Community quality metrics

• Community detection algorithm

• LabelRank: a stabilized label propagation community detection algorithm

• LabelRankT: extended algorithm for dynamic networks based on LabelRank

• A new community quality metric solving two problems of Modularity

• M. E. J. Newman, 2006;

• Newman and Girvan, 2004.

• Xie and Szymanski, 2013.

• Xie, Chen, and Szymanski, 2013.

LabelRank Algorithm

• Four operators applied to the labels

• Label propagation operator

• Inflation operator

• Cutoff operator

• Conditional update operator

Example: nodes vote on the question "NP = P?", and node 1 adopts the majority answer among its neighbors.

• Neighbors' answers: Node 1: No; Node 2: No; Node 3: No; Node 4: Yes.

• With these 4 neighbors: P1(No) = 3/4 and P1(Yes) = 1/4, so Node 1: No.

• With 100 neighbors, 3 answering No and 97 answering Yes: P1(No) = 3/100 and P1(Yes) = 97/100, so Node 1: Yes.
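The voting step above can be sketched in a few lines of Python (a toy illustration of the idea, not the authors' code; the answer lists are hypothetical):

```python
from collections import Counter

def label_distribution(neighbor_answers):
    """Estimate a node's label probabilities from its neighbors' answers."""
    counts = Counter(neighbor_answers)
    total = len(neighbor_answers)
    return {label: n / total for label, n in counts.items()}

# Four neighbors: three answer "No", one answers "Yes".
p1 = label_distribution(["No", "No", "No", "Yes"])
# p1 == {"No": 0.75, "Yes": 0.25} -> node 1 adopts "No"

# 100 neighbors: 3 answer "No", 97 answer "Yes".
p1_large = label_distribution(["No"] * 3 + ["Yes"] * 97)
# p1_large["Yes"] == 0.97 -> node 1 adopts "Yes"
```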

• where W is the n x n weighted adjacency matrix and P is the n x n label probability distribution matrix, composed of n (1 x n) row vectors Pi, one for each node

• Each element Pi(c) holds the current estimate of the probability that node i observes label c ∈ C, where C is the set of labels (here, assume C = {1, 2, …, n})

• Ex. Pi=(0.1, 0.2, …, 0.05, …)

• To initialize P, each node distributes probability equally over the labels of its incoming edges (i.e., Pi(c) = 1/ki for each neighbor c of node i)

• Each node receives the label probability distributions from its neighbors and averages them to compute its new distribution

Initial distributions (node 1 is linked to nodes 2, 3, and 4; each node also counts itself):

P1= (0.25, 0.25, 0.25, 0.25, 0, 0, 0, 0, 0, 0)

P2= (0.25, 0.25, 0, 0, 0.25, 0.25, 0, 0, 0, 0)

P3= (0.25, 0, 0.25, 0, 0, 0, 0.25, 0.25, 0, 0)

P4= (0.25, 0, 0, 0.25, 0, 0, 0, 0, 0.25, 0.25)

After one propagation step, P1 becomes the average of the distributions of nodes 1-4:

P1= (0.25, 0.125, 0.125, 0.125, 0.0625, 0.0625, 0.0625, 0.0625, 0.0625, 0.0625)
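The propagation operator can be sketched as follows (a minimal, unoptimized version that averages neighbor distributions, assuming, as in the example above, that each node's neighbor list includes itself):

```python
def propagate(P, neighbors):
    """One label-propagation step: each listed node's new distribution is
    the average of its neighbors' current distributions."""
    new_P = {}
    for i, nbrs in neighbors.items():
        n_labels = len(next(iter(P.values())))
        new_P[i] = [sum(P[j][c] for j in nbrs) / len(nbrs)
                    for c in range(n_labels)]
    return new_P

# The 10-node example: node 1 is linked to nodes 2, 3, 4 (self included).
P = {
    1: [0.25, 0.25, 0.25, 0.25, 0, 0, 0, 0, 0, 0],
    2: [0.25, 0.25, 0, 0, 0.25, 0.25, 0, 0, 0, 0],
    3: [0.25, 0, 0.25, 0, 0, 0, 0.25, 0.25, 0, 0],
    4: [0.25, 0, 0, 0.25, 0, 0, 0, 0, 0.25, 0.25],
}
neighbors = {1: [1, 2, 3, 4]}  # only node 1's row is recomputed here
new_P = propagate(P, neighbors)
# new_P[1] == [0.25, 0.125, 0.125, 0.125, 0.0625, ..., 0.0625]
```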

• Each element Pi(c) is raised to the in-th power, and the vector is then renormalized:

Pi(c) ← Pi(c)^in / Σ_{j∈C} Pi(j)^in

• This increases the probabilities of labels with high probability and decreases the probabilities of labels with low probability during label propagation.

P1= (0.25, 0.125, 0.125, 0.125, 0.0625, 0.0625, 0.0625, 0.0625, 0.0625, 0.0625)

P1= (0.129, 0.0323, 0.0323, 0.0323, 0.00806, 0.00806, 0.00806, 0.00806, 0.00806, 0.00806)

• The cutoff operator removes from P the labels whose probability falls below a threshold r, helped by the inflation operator, which drives low-probability labels further down during propagation.

• This efficiently reduces the memory requirement from quadratic to linear in the number of nodes.

P1= (0.129, 0.0323, 0.0323, 0.0323, 0.00806, 0.00806, 0.00806, 0.00806, 0.00806, 0.00806)

With r = 0.1, the average number of labels in each node is less than 3.

P1= (0.129)
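The cutoff step with threshold r = 0.1 can be sketched as follows (toy code, not the authors' implementation):

```python
def cutoff(p, r=0.1):
    """Drop labels whose probability is below threshold r.
    Returns a sparse dict: label index -> probability."""
    return {c: prob for c, prob in enumerate(p) if prob >= r}

p1 = [0.129, 0.0323, 0.0323, 0.0323,
      0.00806, 0.00806, 0.00806, 0.00806, 0.00806, 0.00806]
sparse_p1 = cutoff(p1)
# Only the first label survives, matching the slide's P1 = (0.129).
```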

• At each iteration, node i is updated only when it differs significantly from its incoming neighbors in terms of labels:

Σ_{j∈Nb(i)} isSubset(C*i, C*j) ≤ q · ki

• where C*i is the set of maximum-probability labels at node i at the last step, isSubset(s1, s2) returns 1 if s1 ⊆ s2 and 0 otherwise, ki is the degree of node i, and q ∈ [0, 1].

• isSubset can be viewed as a measure of similarity between two nodes.
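The conditional update test can be sketched as follows (assuming the criterion "update node i only when at most q·ki of its neighbors' maximal-label sets contain i's maximal labels"; the example data is hypothetical):

```python
def should_update(i, max_labels, neighbors, q=0.5):
    """Conditional update: node i is updated only when few of its
    neighbors' maximal-label sets contain i's maximal labels."""
    def is_subset(s1, s2):
        return 1 if s1 <= s2 else 0
    k_i = len(neighbors[i])
    agreement = sum(is_subset(max_labels[i], max_labels[j])
                    for j in neighbors[i])
    return agreement <= q * k_i

max_labels = {1: {7}, 2: {7}, 3: {7}, 4: {5}}
neighbors = {1: [2, 3, 4]}
# Two of three neighbors already share node 1's top label; with q = 0.5
# the agreement (2) exceeds q * k = 1.5, so node 1 is not updated.
```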

Running time of LabelRank

• O(Tm): m is the number of edges and T is the number of iterations.

LabelRank is therefore a linear-time algorithm

Performance of LabelRank

LabelRankT

• LabelRankT is LabelRank with one extra conditional update rule: only nodes involved in changes are updated. Changes are detected by comparing the neighbors of node i at two consecutive time steps, Nb_t(i) and Nb_{t-1}(i).

Two Problems of Modularity

• Problem 1: it may split large communities, i.e., favor small communities

• Problem 2: the resolution limit problem

• Modularity optimization may fail to discover communities smaller than a scale even in cases where communities are unambiguously defined.

• This scale depends on the total number of edges in the network and the degree of interconnectedness of the communities.

• Favor large communities

Fortunato et al, 2008; Li et al, 2008; Arenas et al, 2008; Berry et al, 2009; Good et al, 2010; Ronhovde et al, 2010; Fortunato, 2010; Lancichinetti et al, 2011; Traag et al, 2011; Darst et al, 2013.

• Modularity (Q): the fraction of edges falling within communities minus the expected value of that fraction in an equivalent network with edges placed at random:

Q = (1/(2m)) Σ_{i,j} [ A_ij − (ki·kj)/(2m) ] δ(ci, cj)

• Equivalent definition, as a sum over communities (with |Ec_in| edges inside community c and |Ec_out| edges connecting c to the rest of the network):

Q = Σ_c [ |Ec_in|/m − ((2|Ec_in| + |Ec_out|)/(2m))^2 ]

M. E. J. Newman, 2006.

Newman and Girvan, 2004.
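The community-sum form of Q can be checked with a short sketch (toy Python, not the authors' code; the two-triangle network is a made-up test case):

```python
def modularity(edges, community):
    """Q = sum_c [ e_c/m - (d_c/(2m))^2 ], where e_c is the number of
    intra-community edges and d_c the total degree of community c."""
    m = len(edges)
    intra = {}   # community -> intra-community edge count
    degree = {}  # community -> total degree
    for u, v in edges:
        cu, cv = community[u], community[v]
        degree[cu] = degree.get(cu, 0) + 1
        degree[cv] = degree.get(cv, 0) + 1
        if cu == cv:
            intra[cu] = intra.get(cu, 0) + 1
    return sum(intra.get(c, 0) / m - (d / (2 * m)) ** 2
               for c, d in degree.items())

# Two triangles joined by a single bridge edge.
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
community = {0: "A", 1: "A", 2: "A", 3: "B", 4: "B", 5: "B"}
q = modularity(edges, community)  # 2 * (3/7 - (7/14)^2) = 5/14
```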

• Modularity (Q): the modularity of the community detection result

• Split penalty (SP): the fraction of edges that connect nodes of different communities

• Qs = Q – SP: solves Modularity's problem of favoring small communities
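Split penalty can be sketched the same way (toy code; SP is read here as the fraction of edges whose endpoints lie in different communities, which matches the definition above, and the network is a made-up test case):

```python
def split_penalty(edges, community):
    """SP: the fraction of edges that connect nodes of different communities."""
    m = len(edges)
    inter = sum(1 for u, v in edges if community[u] != community[v])
    return inter / m

# Two triangles joined by a single bridge edge (2, 3) crossing communities.
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
community = {0: "A", 1: "A", 2: "A", 3: "B", 4: "B", 5: "B"}
sp = split_penalty(edges, community)  # 1/7: one of seven edges is inter-community
# Qs = Q - SP would subtract this penalty from modularity.
```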

Qs with Community Density

• Resolution limit: Modularity optimization may fail to detect communities smaller than a scale

• Intuitively, weight the terms of Modularity and Split Penalty by community density to solve the resolution limit problem, using the internal density dc = 2|Ec_in|/(nc(nc − 1)) of community c and the pair density dc,c' = |Ec,c'|/(nc·nc') between communities c and c'

• Equivalent definition (density-weighted form):

Qds = Σ_c [ (|Ec_in|/m)·dc − ((2|Ec_in| + |Ec_out|)/(2m)·dc)^2 − Σ_{c'≠c} (|Ec,c'|/(2m))·dc,c' ]

Example of One Complete Graph

Community Quality on a complete graph with 8 nodes

∆Qs=(0.8424-0.7848)=0.0576 > ∆Q=(0.8879-0.8758)=0.0121

Q & A