
Cluster Evaluation



  1. Cluster Evaluation Metrics that can be used to evaluate the quality of a set of document clusters.

  2. Precision, Recall & FScore • From Zhao and Karypis, 2002 • These metrics are computed for every (class, cluster) pair. • Terms: • class Lr of size nr • cluster Si of size ni • nri = the number of documents in Si that come from class Lr
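The formulas on the slides that follow can be read against a small Python sketch. The helper below is mine, not from the slides: contingency takes two parallel lists, classes[j] being the class label of document j and clusters[j] its assigned cluster, and tallies the nr, ni, and nri counts.

from collections import Counter

def contingency(classes, clusters):
    # n_r: size of each class L_r; n_i: size of each cluster S_i;
    # n_ri: number of documents of class L_r that landed in cluster S_i
    n_r = Counter(classes)
    n_i = Counter(clusters)
    n_ri = Counter(zip(classes, clusters))
    return n_r, n_i, n_ri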

  3. Precision • Loosely equated to accuracy • Roughly answers the question: “How many of the documents in this cluster belong there?” • P(Lr, Si) = nri/ni
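As an illustrative sketch only, building on the hypothetical contingency counts above:

def precision(n_ri, n_i, r, i):
    # P(L_r, S_i) = n_ri / n_i: the fraction of cluster S_i that really belongs to class L_r
    return n_ri[(r, i)] / n_i[i]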

  4. Recall • Roughly answers the question: “Did all of the documents that belong in this cluster make it in?” • R(Lr, Si) = nri/nr
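The corresponding sketch for recall, again using the hypothetical counts from the earlier helper:

def recall(n_ri, n_r, r, i):
    # R(L_r, S_i) = n_ri / n_r: the fraction of class L_r that ended up in cluster S_i
    return n_ri[(r, i)] / n_r[r]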

  5. FScore • Harmonic Mean of Precision and Recall • Tries to give a good combination of the other 2 metrics • Calculated with the equation: F(Lr, Si) = 2 · P(Lr, Si) · R(Lr, Si) / (P(Lr, Si) + R(Lr, Si))
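A sketch of the pairwise FScore in the same style (precision and recall are the hypothetical helpers above):

def fscore(n_ri, n_r, n_i, r, i):
    # harmonic mean of precision and recall for the (class L_r, cluster S_i) pair
    p = precision(n_ri, n_i, r, i)
    rc = recall(n_ri, n_r, r, i)
    return 0.0 if p + rc == 0 else 2 * p * rc / (p + rc)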

  6. FScore - Entire Solution • We calculate a per-class FScore, the best score the class achieves over any cluster: F(Lr) = max over Si of F(Lr, Si) • We then combine these scores into a weighted average over all n documents: FScore = Σr (nr/n) · F(Lr)
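A minimal sketch of the whole-solution score, assuming the per-class score is the best FScore the class achieves over any cluster, as in Zhao and Karypis, 2002 (overall_fscore is my name):

def overall_fscore(classes, clusters):
    # per-class FScore = best F over all clusters, combined as a class-size-weighted average
    n_r, n_i, n_ri = contingency(classes, clusters)
    n = len(classes)
    return sum((size_r / n) * max(fscore(n_ri, n_r, n_i, r, i) for i in n_i)
               for r, size_r in n_r.items())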

  7. FScore Caveats • The Zhao, Karypis paper focused on hierarchical clustering, so the definitions of Precision/Recall and FScore might not apply as well to “flat” clustering. • The metrics rely on the use of class labels, so they cannot be applied in situations where there is no labeled data.

  8. Possible Modifications • Calculate a per-cluster (not per-class) FScore: F(Si) = max over Lr of F(Lr, Si) • Combine these scores into a weighted average: FScore = Σi (ni/n) · F(Si)
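A sketch of that modification, mirroring the overall_fscore sketch but maximizing over classes instead of clusters:

def per_cluster_fscore(classes, clusters):
    # per-cluster FScore = best F over all classes, combined as a cluster-size-weighted average
    n_r, n_i, n_ri = contingency(classes, clusters)
    n = len(classes)
    return sum((size_i / n) * max(fscore(n_ri, n_r, n_i, r, i) for r in n_r)
               for i, size_i in n_i.items())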

  9. Rand Index • Yeung, et al., 2001 • Measure of partition agreement • Answers the question “How similar are these two ways of partitioning the data?” • To evaluate clusters, we compute the Rand Index between actual labels and clusters

  10. Rand Index • a = # pairs of documents that are in the same Si and the same Lr • b = # pairs of documents that are in the same Lr, but not the same Si • c = # pairs of documents in the same Si, but not the same Lr • d = # pairs of documents that are in neither the same Lr nor the same Si • Rand Index = (a + d) / (a + b + c + d)
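A brute-force sketch of the pair counting (the quadratic loop is the simple way, not the efficient way, and is fine for small collections):

from itertools import combinations

def rand_index(classes, clusters):
    # sort every pair of documents into the a / b / c / d buckets above
    a = b = c = d = 0
    for (r1, i1), (r2, i2) in combinations(list(zip(classes, clusters)), 2):
        same_class, same_cluster = r1 == r2, i1 == i2
        if same_class and same_cluster:
            a += 1   # together under both the labels and the clustering
        elif same_class:
            b += 1   # same class, split across clusters
        elif same_cluster:
            c += 1   # same cluster, different classes
        else:
            d += 1   # separated under both
    return (a + d) / (a + b + c + d)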

  11. Adjusted Rand Index • The Rand Index has a problem: its expected value for any 2 random partitions is relatively high, while we’d like it to be close to 0. • The Adjusted Rand Index puts the expected value at 0, giving a more dynamic range, and is probably a better metric. • See appendix B of Yeung, et al., 2001.
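In practice the adjusted version is available off the shelf, for example in scikit-learn (assuming it is installed; the toy labels below are made up):

from sklearn.metrics import adjusted_rand_score

true_labels = ["sports", "sports", "sports", "news", "news", "news"]
found       = [0, 0, 1, 1, 2, 2]
print(adjusted_rand_score(true_labels, found))  # ARI is ~0 in expectation for random partitions, 1 for perfect agreement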

  12. Rand Index Caveat • Penalizes good but finer-grained clusterings: imagine a sports class that produces 2 clusters, one for ball sports and one for track sports. • To fix that issue, we could hard-label each cluster and treat all clusters with the same label as the same cluster (clustering the clusters), as sketched below.
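One way to sketch that fix, reusing the hypothetical contingency helper (merge_by_majority is my name, and the majority-class rule is my interpretation of “hard label each cluster”):

def merge_by_majority(classes, clusters):
    # hard-label each cluster with its majority class, then treat clusters that
    # share a label as one merged cluster ("clustering the clusters")
    n_r, n_i, n_ri = contingency(classes, clusters)
    majority = {i: max(n_r, key=lambda r: n_ri[(r, i)]) for i in n_i}
    return [majority[i] for i in clusters]

Computing rand_index(classes, merge_by_majority(classes, clusters)) then no longer penalizes a class that was split into several pure clusters.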

  13. Problems • The metrics so far depend on class labels. • They also give undeservedly high scores as k approaches n, because almost all instances end up alone in a cluster.

  14. Label Entropy • My idea? (I haven’t seen it anywhere else) • Calculate an entropy value per cluster: H(Si) = -Σr (nri/ni) · log(nri/ni) • Combine entropies (weighted average): Entropy = Σi (ni/n) · H(Si)
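A sketch of that idea, again on top of the hypothetical contingency counts (label_entropy is my name; the natural log is an arbitrary choice of base):

from math import log

def label_entropy(classes, clusters):
    # per-cluster entropy of the class labels, combined as a cluster-size-weighted average;
    # lower is better, 0 means every cluster is pure
    n_r, n_i, n_ri = contingency(classes, clusters)
    n = len(classes)
    total = 0.0
    for i, size_i in n_i.items():
        h = -sum((n_ri[(r, i)] / size_i) * log(n_ri[(r, i)] / size_i)
                 for r in n_r if n_ri[(r, i)] > 0)
        total += (size_i / n) * h
    return total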

  15. Log Likelihood of Data • Calculate the log likelihood of the data according to the clusterer’s model. • If the clusterer doesn’t have an explicit model, treat clusters as classes and train a class-conditional model of the data based on these class labelings. Use the new model to calculate the log likelihood.
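One possible sketch of the fallback, under the assumption that documents are dense feature vectors and that a diagonal Gaussian per cluster is an acceptable class-conditional model (both choices are mine, not the slide’s):

import numpy as np
from scipy.stats import multivariate_normal

def cluster_log_likelihood(X, clusters):
    # treat the clusters as classes: fit a diagonal Gaussian per cluster and
    # sum log p(cluster) + log p(x | cluster) over all documents
    X = np.asarray(X, dtype=float)
    clusters = np.asarray(clusters)
    total = 0.0
    for c in np.unique(clusters):
        members = X[clusters == c]
        prior = len(members) / len(X)
        mean = members.mean(axis=0)
        var = members.var(axis=0) + 1e-6          # smoothing avoids zero variances
        total += np.sum(np.log(prior) + multivariate_normal.logpdf(members, mean, np.diag(var)))
    return total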
