Relative validity criteria for community mining evaluation
This presentation is the property of its rightful owner.
Sponsored Links
1 / 14

Relative Validity Criteria for Community Mining Evaluation PowerPoint PPT Presentation


  • 39 Views
  • Uploaded on
  • Presentation posted in: General

Relative Validity Criteria for Community Mining Evaluation. ASONAM 2012. Reihaneh Rabbany , Mansoreh Takaffoli, Justin Fagnan, Osmar R. Zaϊane and Ricardo J. G. B. Campello Department of Computing Science, University of Alberta, Edmonton, Canada. Aug 2012. Motivation.

Download Presentation

Relative Validity Criteria for Community Mining Evaluation

An Image/Link below is provided (as is) to download presentation

Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author.While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server.


- - - - - - - - - - - - - - - - - - - - - - - - - - E N D - - - - - - - - - - - - - - - - - - - - - - - - - -

Presentation Transcript


Relative validity criteria for community mining evaluation

Relative Validity Criteria for Community Mining Evaluation

ASONAM 2012

Reihaneh Rabbany, Mansoreh Takaffoli, Justin Fagnan, Osmar R. Zaϊane and Ricardo J. G. B. Campello

Department of Computing Science,

University of Alberta,

Edmonton, Canada

Aug 2012


Motivation

Motivation

Applications in different domains; sociology, criminology

  • Module identification in Biological Networks

    Clusters in Protein-Protein Interaction Networks Protein complexes and parts of pathways; Clusters in a protein similarity network protein families. (R Guimerà et al., Functional cartography of complex metabolic networks, Nature 433, 2005)

    Prerequisite of further analysis; Targeted advertising, link prediction, recommendation

  • Social Networks: personalized news feed, easier privacy settings

    Gmail's "Don't Forget Bob!" and "Got the Wrong Bob?" features (M Roth et al., Suggesting Friends Using the Implicit Social Graph, KDD 2010)

  • Citation network of scholars

    Paper and collaborator recommendation, Network visualization and Navigation; e.g. CiteULike, Arnet Miner and Microsoft Academic

  • Hyperlinks between web pages - WWW

    Detecting Group of closely related topics to refined search results(J Chen et al., An Unsupervised Approach to Cluster Web Search Results Based on Word Sense Communities. Web Intelligence 2008)

1


Community

Community

Loosely defined as groups of nodes that have relatively more links between themselves than to the rest of the network

  • Nodes that have structural similarity(SCAN, Xu et al. 2007)

  • Nodes that are connected with cliques(CFinder by Palla et al. 2005)

  • Nodes that a random walk is likely to trap within them (MCL by Dongen, Walktrap by Pons and Latapy)

  • Nodes that follow the same leader (TopLeaders, 2010)

  • Nodes that make the graph compress efficiently (Infomap, Infomod, Rosvall and Bergstrom, 2011)

  • Nodes that are separated from the rest by min cut, conductance (flow based methods, e.g. Kernighan-Lin (KL), betweenness of Newman)

  • Nodes that number of links between them is more than chance (Newman's Q modularity, FastModularity, Blondel et al.’s Louvain)

2


Evaluation overlooked

Evaluation; overlooked

Internal Evaluation

Predefined quality/structure for the communities

  • Graph partitioning measures (density, conductance)

    External Evaluation

    Agreement between the results and a given known ground-truth

  • A clustering similarity/agreement indexes; Rand Index, Jaccard

  • Benchmarks with ground truth; GN(2002), LFR(2008)

  • The community structure is not known beforehand

    No ground truth

    No large data set with known ground truth

    The synthetic benchmarks disagree with some real network characteristics

    Karate

    GN

    LFR

    3


    Relative validity criteria

    Relative Validity Criteria

    Validity criteria defined for clustering evaluation; compares different clusterings of a same data set

    We altered criteria

    • Generalized distance; graph distance measures

    • Generalized mean/centroid notion; averaging v.s. medoid

      e.g. Variance Ratio Criterion (VRC)

      Same for: Dunn index, Silhouette Width Criterion (SWC), Alternative Silhouette, PBM, C-Index, Z-Statistics, Point-Biserial (PB)

      Distance Alternatives: Edge Path (ED), Shortest Path Distance (SPD), Adjacency Relation Distance (ARD), Neighbour Overlap Distance (NOD), Pearson Correlation Distance (PCD), ICloseness Distance (ICD)

    4


    Correlation with external index

    Correlation with External Index

    Correlation of relative criteria and external scores on different clusterings of same data set

    random clusterings that range from very close to very far from ground truth

    For karate;

    5


    Correlation with external index1

    Correlation with External Index

    Correlation of relative criteria and external scores on different clusterings of same data set

    random clusterings that range from very close to very far from ground truth

    For karate;

    5


    Ranking of criteria on real world benchmarks

    Ranking of Criteria on Real World Benchmarks

    Difficulty Analysis

    Data set statistics

    Overall Ranking

    6


    Ranking of criteria on synthetic benchmarks

    Ranking of Criteria on Synthetic Benchmarks

    Ranking for well separated communities

    Data set statistics

    Overall ranking for very mixed communities

    7


    Ranking varies

    Ranking varies

    Criteria Ranking is affected by:

    • Choice of benchmarks, synthetic generator and its parameters

    • Choice of External agreement Index; ARI, NMI, AMI, Jacard

    • Choice of correlation measure; Pearson & Spearman correlation

    • Choice of clustering randomization

      Get the ranking in your setting

      www.cs.ualberta.ca/~rabbanyk/criteriaComparison

    8


    Future works

    Future Works

    Evaluation Issues

    • Community mining specific agreement measure

    • Realistic synthetic benchmarks

      Extensions of criteria

    • Incorporating attributes; combine clustering and community mining for cases for which we have both attributes and relations

    • Incorporating uncertainty and edges with probability

    • ...

    9


    Relative validity criteria for community mining evaluation

    End

    Questions?

    10


    Alternative distances

    Alternative Distances

    • Edge Path (ED),

    • Shortest Path Distance (SPD),

    • Adjacency Relation Distance (ARD),

    • Neighbour Overlap Distance (NOD),

    • Pearson Correlation Distance (PCD),

    • ICloseness Distance (ICD)

    A


    Relative criteria

    Relative criteria

    • VarianceRatioCriterion (VRC)

    • Dunn index,

    • Silhouette Width Criterion (SWC),

    • Alternative Silhouette,

    • PBM,

    • Davies-Bouldin

    • C-Index,

    • Point-Biserial (PB)

    B


  • Login