Community Detection Algorithm and Community Quality Metric




Mingming Chen & Boleslaw K. Szymanski

Department of Computer Science

Rensselaer Polytechnic Institute


Community Structure

  • Many networks display community structure

    • Groups of nodes within which connections are denser than between them

Community detection algorithms

Community quality metrics


Two Related Community Detection Topics

  • Community detection algorithm

    • LabelRank: a stabilized label propagation community detection algorithm (Xie and Szymanski, 2013)

    • LabelRankT: an extension of LabelRank to dynamic networks (Xie, Chen, and Szymanski, 2013)

  • A new community quality metric that addresses two problems of Modularity (Newman and Girvan, 2004; M. E. J. Newman, 2006)


LabelRank Algorithm

  • Four operators applied to the labels

    • Label propagation operator

    • Inflation operator

    • Cutoff operator

    • Conditional update operator

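Example (figure): four nodes vote on the question "NP = P?"

  • Node 1: No; Node 2: No; Node 3: No; Node 4: Yes.

  • With all weights equal to 1: P1(No) = 3/4; P1(Yes) = 1/4. Result: Node 1: No.

  • With a weight of 97 on the link carrying the Yes label: P1(No) = 3/100; P1(Yes) = 97/100. Result: Node 1: Yes.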


Label Propagation Operator

  • $P' = W \times P$, where W is the n × n weighted adjacency matrix and P is the n × n label probability distribution matrix, composed of n (1 × n) row vectors Pi, one for each node

  • Each element Pi(c) holds the current estimate of the probability of node i observing label c ∈ C, where C is the set of labels (here, suppose C = {1, 2, …, n})

    • Ex. Pi = (0.1, 0.2, …, 0.05, …)

  • To initialize P, each node is assigned a probability distribution over the labels of all its incoming edges


  • Each node receives the label probability distributions from its neighbors and computes its new distribution

    Before propagation:

    P1 = (0.25, 0.25, 0.25, 0.25, 0, 0, 0, 0, 0, 0)

    P2 = (0.25, 0.25, 0, 0, 0.25, 0.25, 0, 0, 0, 0)

    P3 = (0.25, 0, 0.25, 0, 0, 0, 0.25, 0.25, 0, 0)

    P4 = (0.25, 0, 0, 0.25, 0, 0, 0, 0, 0.25, 0.25)

    After propagation, at node 1:

    P1 = (0.25, 0.125, 0.125, 0.125, 0.0625, 0.0625, 0.0625, 0.0625, 0.0625, 0.0625)
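The propagation step can be sketched in a few lines of Python. The 10-node graph below is an assumption inferred from the nonzero entries of the printed vectors (node 1 linked to 2, 3, 4, each of which has two further neighbors, plus a self-loop on every node), and the helper names are hypothetical. Under that assumption, averaging the distributions of a node and its neighbors gives the new P1 for node 1.

```python
# Assumed example graph, inferred from the printed vectors (not given on the slide).
EDGES = [(1, 2), (1, 3), (1, 4), (2, 5), (2, 6), (3, 7), (3, 8), (4, 9), (4, 10)]
N = 10


def neighbors_with_self(n, edges):
    """Neighbor sets for nodes 1..n, each including the node itself (self-loop)."""
    nb = {i: {i} for i in range(1, n + 1)}
    for u, v in edges:
        nb[u].add(v)
        nb[v].add(u)
    return nb


def init_distributions(nb, n):
    """Uniform probability over the labels of a node and its neighbors."""
    return {
        i: [1.0 / len(nb[i]) if c in nb[i] else 0.0 for c in range(1, n + 1)]
        for i in nb
    }


def propagate(nb, p, n):
    """New distribution of node i: average of the distributions it receives."""
    return {
        i: [sum(p[j][c] for j in nb[i]) / len(nb[i]) for c in range(n)]
        for i in nb
    }


NB = neighbors_with_self(N, EDGES)
P_INIT = init_distributions(NB, N)
P_NEXT = propagate(NB, P_INIT, N)
print(P_NEXT[1])
```

Each row stays a probability distribution because it is an average of distributions.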


    Inflation Operator

    • Each element Pi(c) is raised to the in-th power.

    • This increases the probabilities of labels that already have high probability and decreases those of labels that have low probability during label propagation.

    Before inflation:

    P1 = (0.25, 0.125, 0.125, 0.125, 0.0625, 0.0625, 0.0625, 0.0625, 0.0625, 0.0625)

    After inflation:

    P1 = (0.129, 0.0323, 0.0323, 0.0323, 0.00806, 0.00806, 0.00806, 0.00806, 0.00806, 0.00806)
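A minimal sketch of inflation, under two assumptions that the transcript does not confirm. First, the exponent is taken to be 2, because the printed values keep 4:1 ratios between successive probability levels. Second, each row is renormalized to sum to 1. The printed vector does not sum to 1, so the slide evidently scales differently and the absolute numbers here will not match it; only the ratios do.

```python
def inflate(p, exponent=2.0):
    """Raise each probability to `exponent`, then renormalize (assumed step)."""
    powered = [x ** exponent for x in p]
    total = sum(powered)
    return [x / total for x in powered]


p1 = [0.25, 0.125, 0.125, 0.125] + [0.0625] * 6
inflated = inflate(p1)

# The gap between strong and weak labels widens: 2:1 before, 4:1 after.
print(p1[0] / p1[1], inflated[0] / inflated[1])
```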


    Cutoff Operator

    • The cutoff operator removes from P the labels whose probability is below the threshold r. The inflation operator helps by decreasing the probabilities of low-probability labels during propagation.

    • This efficiently reduces the space complexity from quadratic to linear.

    • With r = 0.1, the average number of labels in each node is less than 3.

    Before cutoff:

    P1 = (0.129, 0.0323, 0.0323, 0.0323, 0.00806, 0.00806, 0.00806, 0.00806, 0.00806, 0.00806)

    After cutoff:

    P1 = (0.129)
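The cutoff can be sketched as a filter that keeps only the surviving labels in a sparse mapping, which is where the space saving comes from. The function name and the dictionary representation are illustrative choices, not taken from the slides.

```python
def cutoff(p, r):
    """Keep only labels (numbered from 1) whose probability is at least r."""
    return {label: prob for label, prob in enumerate(p, start=1) if prob >= r}


p1 = [0.129, 0.0323, 0.0323, 0.0323] + [0.00806] * 6
kept = cutoff(p1, r=0.1)
print(kept)
```

Applied to the vector above with r = 0.1, only label 1 survives.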


    Conditional Update Operator

    • At each iteration, it updates a node i only when it is significantly different from its incoming neighbors in terms of labels:

      $\sum_{j \in Nb(i)} isSubset(C_i^*, C_j^*) \le q \, k_i$

      • where $C_i^*$ is the set of maximum probability labels at node i at the last step and $Nb(i)$ is the set of neighbors of node i. $isSubset(s_1, s_2)$ returns 1 if $s_1 \subseteq s_2$ and 0 otherwise. ki is the node degree and q ∈ [0,1].

    • isSubset can be viewed as a measure of similarity between two nodes.
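A sketch of the update test, assuming the rule reads "update node i when at most q·ki of its neighbors already contain its maximum-probability labels". That reading, the helper names, and the toy distributions are assumptions for illustration.

```python
def is_subset(s1, s2):
    """Return 1 if s1 is a subset of s2, else 0."""
    return 1 if s1 <= s2 else 0


def max_labels(dist):
    """Set of labels that hold the maximum probability in a label -> prob dict."""
    top = max(dist.values())
    return {label for label, prob in dist.items() if prob == top}


def should_update(i, dists, neighbors, q):
    """Assumed rule: update i only if few neighbors agree with its top labels."""
    own = max_labels(dists[i])
    agreeing = sum(is_subset(own, max_labels(dists[j])) for j in neighbors[i])
    return agreeing <= q * len(neighbors[i])


dists = {
    1: {"A": 0.6, "B": 0.4},
    2: {"A": 1.0},
    3: {"A": 0.7, "B": 0.3},
    4: {"B": 1.0},
}
neighbors = {1: [2, 3, 4], 4: [1]}

print(should_update(1, dists, neighbors, q=0.5))  # 2 of 3 neighbors agree
print(should_update(4, dists, neighbors, q=0.5))  # 0 of 1 neighbors agree
```

Node 1 is left alone because most neighbors share its top label, while node 4 is updated because none do.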



    Running time of LabelRank

    • O(Tm), where m is the number of edges and T is the number of iterations.

    For a fixed number of iterations, LabelRank runs in time linear in the number of edges.


    Performance of LabelRank


    LabelRankT

    • It is LabelRank with one extra conditional update rule, by which only nodes involved in changes are updated. Changes are detected by comparing the neighbors of node i at two consecutive steps, $Nb^{t}(i)$ and $Nb^{t-1}(i)$.
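The change test can be sketched as a comparison of neighbor sets between two snapshots of the network. The snapshot format (a dict from node to neighbor set) and the function name are assumptions for illustration; nodes absent from a snapshot are treated as having no neighbors.

```python
def changed_nodes(prev_nb, curr_nb):
    """Nodes whose neighbor set differs between two consecutive snapshots."""
    nodes = set(prev_nb) | set(curr_nb)
    return {
        i for i in nodes
        if prev_nb.get(i, set()) != curr_nb.get(i, set())
    }


snapshot_prev = {1: {2}, 2: {1, 3}, 3: {2}}
snapshot_curr = {1: {2}, 2: {1}, 3: {4}, 4: {3}}

# Edge 2-3 disappears and edge 3-4 appears; node 1 is untouched.
print(sorted(changed_nodes(snapshot_prev, snapshot_curr)))
```

Only the returned nodes would need their label distributions recomputed at the new step.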


    Two Problems of Modularity Maximization

    • Splits large communities

      • Favors small communities

    • Resolution limit problem

      • Modularity optimization may fail to discover communities smaller than a certain scale, even when the communities are unambiguously defined.

      • This scale depends on the total number of edges in the network and the degree of interconnectedness of the communities.

      • Favors large communities

    Fortunato et al, 2008; Li et al, 2008; Arenas et al, 2008; Berry et al, 2009; Good et al, 2010; Ronhovde et al, 2010; Fortunato, 2010; Lancichinetti et al, 2011; Traag et al, 2011; Darst et al, 2013.


    Modularity

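    • Modularity (Q): the fraction of edges falling within communities minus the expected value in an equivalent network with edges placed at random

      $Q = \sum_{c} \left[ \frac{|E_c^{in}|}{|E|} - \left( \frac{2|E_c^{in}| + |E_c^{out}|}{2|E|} \right)^2 \right]$

      • where |E| is the number of edges, $|E_c^{in}|$ is the number of edges inside community c, and $|E_c^{out}|$ is the number of edges from c to the rest of the network

    • Equivalent definition

      $Q = \frac{1}{2|E|} \sum_{i,j} \left[ A_{ij} - \frac{k_i k_j}{2|E|} \right] \delta(c_i, c_j)$

      • where A is the adjacency matrix, ki is the degree of node i, and $\delta(c_i, c_j)$ is 1 if nodes i and j are in the same community and 0 otherwise

    M. E. J. Newman, 2006.

    Newman and Girvan, 2004.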
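As a check that the per-community and per-node-pair forms of modularity agree, here is a small sketch for an unweighted, undirected graph. The toy graph (two triangles joined by one edge) and the function names are illustrative assumptions.

```python
from itertools import product


def modularity_by_community(edges, community):
    """Sum over communities of (internal edge share) - (degree share)^2."""
    m = len(edges)
    q = 0.0
    for c in set(community.values()):
        e_in = sum(1 for u, v in edges if community[u] == c and community[v] == c)
        e_out = sum(1 for u, v in edges if (community[u] == c) != (community[v] == c))
        q += e_in / m - ((2 * e_in + e_out) / (2 * m)) ** 2
    return q


def modularity_by_pairs(edges, community):
    """Sum over same-community node pairs of A_ij - k_i * k_j / (2m)."""
    m = len(edges)
    degree = {node: 0 for node in community}
    adjacent = set()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
        adjacent.update({(u, v), (v, u)})
    total = 0.0
    for i, j in product(community, repeat=2):
        if community[i] == community[j]:
            total += ((i, j) in adjacent) - degree[i] * degree[j] / (2 * m)
    return total / (2 * m)


edges = [(1, 2), (2, 3), (1, 3), (4, 5), (5, 6), (4, 6), (3, 4)]
community = {1: "a", 2: "a", 3: "a", 4: "b", 5: "b", 6: "b"}

q_comm = modularity_by_community(edges, community)
q_pair = modularity_by_pairs(edges, community)
print(q_comm, q_pair)
```

For this graph each triangle contributes 3/7 − (7/14)², so both forms give 5/14.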


    Modularity with Split Penalty

    • Modularity (Q): the modularity of the community detection result

    • Split penalty (SP): the fraction of edges that connect nodes of different communities

    • Qs = Q − SP: addresses Modularity's problem of favoring small communities
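A minimal sketch of Qs on an unweighted, undirected graph, taking SP literally as the fraction of edges whose endpoints lie in different communities. The toy graph and the function names are assumptions for illustration.

```python
def modularity(edges, community):
    """Per-community form of modularity for an unweighted, undirected graph."""
    m = len(edges)
    q = 0.0
    for c in set(community.values()):
        e_in = sum(1 for u, v in edges if community[u] == c and community[v] == c)
        e_out = sum(1 for u, v in edges if (community[u] == c) != (community[v] == c))
        q += e_in / m - ((2 * e_in + e_out) / (2 * m)) ** 2
    return q


def split_penalty(edges, community):
    """Fraction of edges that connect nodes of different communities."""
    cut = sum(1 for u, v in edges if community[u] != community[v])
    return cut / len(edges)


def q_s(edges, community):
    return modularity(edges, community) - split_penalty(edges, community)


# Two triangles joined by one edge.
edges = [(1, 2), (2, 3), (1, 3), (4, 5), (5, 6), (4, 6), (3, 4)]
natural = {1: "a", 2: "a", 3: "a", 4: "b", 5: "b", 6: "b"}
oversplit = {1: "a", 2: "a", 3: "c", 4: "b", 5: "b", 6: "b"}

print(q_s(edges, natural), q_s(edges, oversplit))
```

Pulling node 3 out of its triangle cuts two more edges, so the split penalty grows and Qs drops relative to the natural partition.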


    Qs with Community Density

    • Resolution limit: Modularity optimization may fail to detect communities smaller than a certain scale

    • Intuition: incorporate community density into Modularity and Split Penalty to address the resolution limit problem

    • Equivalent definition
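The transcript does not preserve the density-based formula, so the following is only a sketch of one way to put density into both terms. It assumes that each community's modularity terms are scaled by its internal edge density d_c = 2|E_c^in| / (|c|(|c| − 1)) and that each pairwise split-penalty term is scaled by the density of edges between the two communities. The function name and this exact weighting are assumptions, not confirmed by the slides.

```python
from itertools import combinations


def density_weighted_qs(edges, community):
    """Assumed form: modularity and split penalty, each weighted by density."""
    m = len(edges)
    members = {}
    for node, c in community.items():
        members.setdefault(c, set()).add(node)

    total = 0.0
    for c, nodes in members.items():
        size = len(nodes)
        e_in = sum(1 for u, v in edges if community[u] == c and community[v] == c)
        e_out = sum(1 for u, v in edges if (community[u] == c) != (community[v] == c))
        d_c = 2 * e_in / (size * (size - 1)) if size > 1 else 0.0
        total += (e_in / m) * d_c - ((2 * e_in + e_out) / (2 * m) * d_c) ** 2
        for other, other_nodes in members.items():
            if other == c:
                continue
            e_between = sum(
                1 for u, v in edges if {community[u], community[v]} == {c, other}
            )
            d_between = e_between / (size * len(other_nodes))
            total -= e_between / (2 * m) * d_between
    return total


# Complete graph on 8 nodes, as one community and as two halves.
k8_edges = list(combinations(range(1, 9), 2))
one_community = {node: "all" for node in range(1, 9)}
two_halves = {node: ("left" if node <= 4 else "right") for node in range(1, 9)}

single = density_weighted_qs(k8_edges, one_community)
split = density_weighted_qs(k8_edges, two_halves)
print(single, split)
```

Under these assumptions, keeping the complete graph whole scores higher than cutting it in half, which is the behavior the split penalty is meant to enforce.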








    Example of One Complete Graph

    Community Quality on a complete graph with 8 nodes



    5-clique Example

    ∆Qs=(0.8424-0.7848)=0.0576 > ∆Q=(0.8879-0.8758)=0.0121


    Thanks!

    Q & A


