
Clustering Ensembles Using Ant Algorithms

Javad Azimi, Paul Cull, Xiaoli Fern, {azimi,pc,xfern}@eecs.oregonstate.edu, Oregon State University. Presented by: Paul Cull.



Presentation Transcript


  1. Clustering Ensembles Using Ant Algorithms Javad Azimi, Paul Cull, Xiaoli Fern {azimi,pc,xfern}@eecs.oregonstate.edu Oregon State University Presented by: Paul Cull

  2. Outline • Clustering Ensembles • Ant Clustering • Proposed Method • Experimental Results

  3. Clustering • Grouping data into groups of similar objects without prior knowledge of the groups. • No single clustering algorithm performs best on all data sets. • Clustering ensembles combine the outputs of multiple clustering methods to produce a better clustering.

  4. Clustering Ensembles • A clustering ensemble generally consists of two main steps: • Generating several partitions using different clustering algorithms with different initializations. • Using a consensus function to generate the final partition.

  5. Clustering Ensemble Steps • Data set → data subset 1, data subset 2, ..., data subset n (generating different subsets). • Data subsets → results 1, results 2, ..., results n (running different clustering algorithms with different specifications). • Results → consensus function → final clusters (using a combiner to obtain the final result).

  6. Proposed method • We use an ant clustering algorithm as our consensus function. It has the following properties: • Appropriate clustering accuracy. • Automatic extraction of the number of clusters. • Detection of outlier and marginal samples.

  7. Proposed method framework • First, we run the k-means algorithm, our base clustering algorithm, several times. • Then we use the co-association matrix to assess the similarity of objects across the initial clustering outputs.
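The first step above can be sketched as follows. This is a minimal illustration rather than the authors' code: it assumes a plain Lloyd's k-means with random initial centers, and the names `kmeans`, `generate_partitions`, `iters`, and `seed` are hypothetical.

```python
import random

def kmeans(points, k, rng, iters=20):
    """Plain Lloyd's k-means; returns one cluster label per point."""
    centers = rng.sample(points, k)  # random initialization (assumption)
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: nearest center by squared Euclidean distance.
        labels = [
            min(range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])))
            for p in points
        ]
        # Update step: move each center to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels

def generate_partitions(points, k, runs, seed=0):
    """Run k-means several times with different random initializations."""
    rng = random.Random(seed)
    return [kmeans(points, k, rng) for _ in range(runs)]

points = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (5.0, 5.0), (5.1, 4.9), (4.9, 5.2)]
partitions = generate_partitions(points, k=2, runs=10)
```

Each entry of `partitions` is one initial clustering output, which is the input the co-association matrix needs.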

  8. Co-association matrix • The co-association matrix is an m × m matrix, where m is the number of samples. • Suppose we have B initial clustering results. The entry (x, y) of the co-association matrix is calculated as: $C(x, y) = \frac{1}{B} \sum_{i=1}^{B} \delta(P_i(x), P_i(y))$ • where $P_i(z)$ is the cluster of object z in iteration i, and $\delta(a, b) = 1$ if $a = b$ and $0$ if $a \neq b$.
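A small sketch of the co-association computation, assuming each initial clustering is given as a list of labels indexed by object and that entries are normalized by B (the fraction of clusterings in which two objects share a cluster). The function name `co_association` is hypothetical.

```python
def co_association(partitions):
    """Fraction of the B partitions in which objects x and y share a cluster."""
    B = len(partitions)
    m = len(partitions[0])
    counts = [[0] * m for _ in range(m)]
    for labels in partitions:
        for x in range(m):
            for y in range(m):
                if labels[x] == labels[y]:
                    counts[x][y] += 1
    return [[counts[x][y] / B for y in range(m)] for x in range(m)]

# Three initial clusterings of four objects.
partitions = [[0, 0, 1, 1], [0, 0, 0, 1], [1, 1, 0, 0]]
C = co_association(partitions)
```

Here objects 0 and 1 are together in all three clusterings, so their entry is 1, while objects 1 and 3 are never together, so their entry is 0.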

  9. Example(co-association matrix) • An example of a co-association matrix with 21 initial clustering outputs.

  10. Ant Clustering based on co-association matrix (big picture) • We apply the ant clustering algorithm based on the co-association matrix • Initially, each object is in a singleton cluster. • A fixed number of ants randomly move the objects based on some probabilities. • The ants typically do two different operations: • Picking up • Dropping off

  11. Picking up • Select a non-empty cluster at random. There are three different cases: • The cluster contains only one object. • Action: pick up the object. • The cluster contains two objects. • Action: pick up one of the objects at random. • The cluster Cj contains more than two objects. • Select the most dissimilar object xi from the cluster Cj, that is, the object with the minimum value of S(xi), its average similarity to the other objects in Cj. • If S(xi) is less than Premove, pick up xi from cluster Cj. • Premove is set to 0.5. The most dissimilar object will not be picked up if it has been clustered together with the other objects in more than half of the initial clusterings.
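The three pick-up cases can be sketched as below. This is an illustration under assumptions: S(xi) is taken to be the average co-association of xi with the other members of its cluster, clusters are lists of object indices, and the names `avg_similarity` and `try_pick_up` are hypothetical. The caller is expected to remove the returned object from the cluster.

```python
import random

P_REMOVE = 0.5  # threshold from the slide

def avg_similarity(obj, cluster, C):
    """Average co-association of obj with the other members of cluster."""
    others = [o for o in cluster if o != obj]
    return sum(C[obj][o] for o in others) / len(others)

def try_pick_up(cluster, C, rng):
    """Return the object an ant picks up from cluster, or None."""
    if len(cluster) == 1:
        return cluster[0]
    if len(cluster) == 2:
        return rng.choice(cluster)
    # More than two objects: consider only the most dissimilar member.
    candidate = min(cluster, key=lambda o: avg_similarity(o, cluster, C))
    if avg_similarity(candidate, cluster, C) < P_REMOVE:
        return candidate
    return None

# Objects 0-2 are mutually similar; object 3 is not.
C = [
    [1.0, 0.9, 0.8, 0.1],
    [0.9, 1.0, 0.9, 0.2],
    [0.8, 0.9, 1.0, 0.1],
    [0.1, 0.2, 0.1, 1.0],
]
rng = random.Random(0)
picked = try_pick_up([0, 1, 2, 3], C, rng)    # object 3 is picked up
kept = try_pick_up([0, 1, 2], C, rng)         # nothing is picked up
```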

  12. Dropping off • Select a non-empty cluster at random. There are two different cases: • The cluster contains one or two objects. • Action: drop off the object if its average similarity with the object(s) in the cluster is more than Pcreate. • Pcreate is set to 0.5, which requires the object to have been clustered together with the other objects in more than half of the initial clusterings. • The cluster contains more than two objects. • Action: drop off the object if it is more similar to the cluster than the cluster's most dissimilar object.
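A sketch of the drop-off test, again assuming that similarity to a cluster means average co-association with its members and that the carried object is not currently in the target cluster. The names `avg_similarity` and `can_drop` are hypothetical.

```python
P_CREATE = 0.5  # threshold from the slide

def avg_similarity(obj, members, C):
    """Average co-association of obj with the given members."""
    return sum(C[obj][o] for o in members) / len(members)

def can_drop(obj, cluster, C):
    """Decide whether a carried object may be dropped into cluster."""
    s_obj = avg_similarity(obj, cluster, C)
    if len(cluster) <= 2:
        return s_obj > P_CREATE
    # Larger cluster: compare with the cluster's most dissimilar member.
    worst = min(
        avg_similarity(o, [p for p in cluster if p != o], C) for o in cluster
    )
    return s_obj > worst

C = [
    [1.0, 0.9, 0.8, 0.1],
    [0.9, 1.0, 0.9, 0.2],
    [0.8, 0.9, 1.0, 0.1],
    [0.1, 0.2, 0.1, 1.0],
]
small_ok = can_drop(2, [0, 1], C)        # average similarity 0.85 > 0.5
small_no = can_drop(3, [0, 1], C)        # average similarity 0.15
large_ok = can_drop(0, [1, 2, 3], C)     # 0.6 beats the worst member (0.15)
large_no = can_drop(3, [0, 1, 2], C)     # about 0.13 is below the worst member (0.85)
```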

  13. Sweeping • Some singleton or very small clusters may remain after some iterations. • To fix this, we run a sweeping procedure. • In the sweeping procedure, the objects in all clusters that are singletons or contain too few objects are assigned to their most similar clusters.
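One way to sketch the sweeping step, assuming that "too few" means fewer than a hypothetical `min_size` objects and that similarity to a cluster is the average co-association with its original members. The function name `sweep` is also an assumption.

```python
def sweep(clusters, C, min_size=2):
    """Reassign objects of undersized clusters to their most similar cluster."""
    large = [list(c) for c in clusters if len(c) >= min_size]
    small = [c for c in clusters if len(c) < min_size]
    if not large:
        return [list(c) for c in clusters]
    result = [list(c) for c in large]
    for cluster in small:
        for obj in cluster:
            # Similarity is measured against the original large clusters.
            best = max(
                range(len(large)),
                key=lambda j: sum(C[obj][o] for o in large[j]) / len(large[j]),
            )
            result[best].append(obj)
    return result

C = [
    [1.0, 0.9, 0.1, 0.1, 0.2],
    [0.9, 1.0, 0.1, 0.2, 0.1],
    [0.1, 0.1, 1.0, 0.8, 0.7],
    [0.1, 0.2, 0.8, 1.0, 0.9],
    [0.2, 0.1, 0.7, 0.9, 1.0],
]
swept = sweep([[0, 1], [2, 3], [4]], C)
```

In this example the singleton object 4 joins the cluster containing objects 2 and 3, with which it has the highest average co-association.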

  14. Outlier and marginal sample detection • Outlier objects: objects that are far from the center of their clusters. • Marginal objects: objects that border two or more clusters and, as a result, change their cluster membership frequently. • Outlier and marginal objects can mislead the clustering procedure by: • Misplacing the centers of the clusters. • Merging two clusters that share marginal objects.

  15. Marginal Objects Detection • Marginal objects have lower average similarity with the other objects in their clusters. • Therefore, they are picked up frequently by ants during the clustering. • The ants are also likely to fail to find an alternative cluster in which to drop them. • We compute a marginality index, which records how many times an object was picked up and then returned to its original cluster. • The larger the marginality index, the more likely we consider the object to be marginal.
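The marginality index amounts to a per-object counter. A minimal sketch, assuming the ant loop reports each move as an object together with its origin and destination cluster identifiers (the class name `MarginalityTracker` and its methods are hypothetical):

```python
from collections import Counter

class MarginalityTracker:
    """Counts how often an object is picked up and returned to its origin."""

    def __init__(self):
        self.index = Counter()

    def record(self, obj, origin, destination):
        # Only a failed relocation (returned to the same cluster) counts.
        if destination == origin:
            self.index[obj] += 1

    def ranked(self):
        """Objects ordered from most to least likely marginal."""
        return [obj for obj, _ in self.index.most_common()]

tracker = MarginalityTracker()
tracker.record(7, origin=0, destination=0)   # returned: counts
tracker.record(7, origin=0, destination=0)   # returned again
tracker.record(3, origin=1, destination=2)   # moved elsewhere: ignored
tracker.record(5, origin=1, destination=1)   # returned once
```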

  16. Outlier Objects Detection • Outlier objects are usually far from the center of their cluster. • Some virtual ants are employed with the following properties: • The ants pick up an object that is far from the center of its cluster based on Euclidean distance, instead of using the co-association matrix. • The ants use the same procedure as before to drop off the objects. • If an object picked up by a virtual ant cannot be dropped in any cluster and has to be returned to its original cluster, it is likely an outlier. • We use an outlier index to identify outlier objects.
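The virtual ants' pick-up choice can be sketched as follows, assuming the cluster center is the mean of its members' feature vectors and that the ant takes the single farthest member. The names `centroid` and `farthest_from_center` are hypothetical, and the drop-off step is the same test used by the regular ants.

```python
import math

def centroid(points):
    """Coordinate-wise mean of a list of points."""
    return tuple(sum(col) / len(points) for col in zip(*points))

def farthest_from_center(cluster, data):
    """Index of the cluster member farthest from the cluster center."""
    center = centroid([data[i] for i in cluster])
    return max(cluster, key=lambda i: math.dist(data[i], center))

data = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (6.0, 6.0)]
candidate = farthest_from_center([0, 1, 2, 3], data)
```

Here the candidate is object 3, whose distance to the center (1.75, 1.75) is the largest in the cluster.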

  17. Experimental Results • We use 6*n ants for each data set, where n is the number of objects in the data set. • We perform 100 independent runs of the k-means algorithm with different initializations to generate the co-association matrix.

  18. Data Sets

  19. The accuracy of the proposed method vs CSPA and ALA • CSPA and ALA are two popular consensus functions, graph-based and hierarchical, respectively.

  20. The ISODATA Result • ISODATA is a popular method for detecting the number of clusters and the outlier objects.

  21. Outlier and Marginal Objects Detection

  22. Conclusion • We introduced a method that combines the benefits of clustering ensembles and ant clustering. • Clustering ensembles: robustness and high quality. • Ant algorithm: appropriate classification accuracy and accurate detection of the number of clusters. • It also detects marginal and outlier objects at little computational cost.
