
AntClass: Discovery of Clusters in Numeric Data by Ant Colony and K-Means Hybridization

This paper presents AntClass, an algorithm that combines ant colony optimization with the k-means algorithm to automatically discover clusters in numeric data without requiring prior knowledge of the number of clusters or an initial partition. The algorithm addresses challenges in ant-based clustering, such as assigning "free" objects. The paper discusses the motivation, objectives, and basic notations and heuristics of AntClass, and compares it to existing artificial ant-based approaches to clustering.

salliej

Presentation Transcript


  1. AntClass: discovery of clusters in numeric data by a hybridization of an ant colony with the k-means algorithm • Advisor: Dr. Hsu • Graduate: Ching-Lung Chen • IDSL, Intelligent Database System Lab

  2. Outline • Motivation • Objective • Introduction • Current artificial ant-based approaches to clustering • Basic notations and heuristics of AntClass • Hybridization • Hierarchical clustering • Conclusion • Personal Opinion

  3. Motivation • In data clustering, many algorithms require an initial partition, which is a major drawback of these methods. • In the algorithm proposed by Lumer and Faieta (1994), some objects are not assigned to any heap when the ant algorithm stops.

  4. Objective • Let the algorithm discover clusters automatically, without prior knowledge of the number of classes, an initial partition, or complex parameter settings. • Improve on the algorithm proposed by Lumer and Faieta (1994).

  5. Introduction • AntClass uses more robust “ant-like heuristics” than previous approaches. • It includes a hybridization with the k-means algorithm to solve problems inherent to ant-based clustering, such as assigning “free” objects. • AntClass consists of four steps: (1) ant-based clustering of objects to create an initial relevant partition; (2) k-means clustering; (3) the same ant-based step, applied hierarchically to heaps of objects; (4) k-means once more.

  6. Current artificial ant-based approaches to clustering • In ant-based clustering, artificial ants pick up and drop objects on a 2D board according to a local density measure of similar objects. Results can be evaluated with a spatial entropy measure. • This paper introduces more robust ant-like heuristics, deals with “unassigned objects”, and speeds up convergence with the k-means algorithm.

  7. Basic notations and heuristics of AntClass 1/6 • Figure 1: The format of the data sets that AntClass deals with • Missing values are represented as “?” • The Euclidean distance between two vectors is used, denoted by D in what follows. • Dmax denotes the maximum distance between two objects of the data set E
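
The distance D and the bound Dmax from the slide above can be sketched in a few lines of Python. This is an illustration only: the function names are hypothetical, and since the transcript does not say how “?” values enter the distance, the sketch assumes complete numeric vectors.

```python
import math
from itertools import combinations

def distance(a, b):
    """Euclidean distance D between two equal-length numeric vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def d_max(objects):
    """Largest pairwise distance in a collection (0.0 if it has fewer than two objects)."""
    return max((distance(a, b) for a, b in combinations(objects, 2)), default=0.0)

# A tiny made-up data set E with k = 2 attributes per object.
E = [(0.0, 0.0), (3.0, 4.0), (1.0, 1.0)]
```

Computing Dmax this way compares every pair, so it costs on the order of n² distance evaluations for n objects.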

  8. Basic notations and heuristics of AntClass 2/6 • The ant-based clustering algorithm uses a 2D matrix C of m × m cells; the matrix is toroidal. • The following relation between m and n is chosen, where n is the number of objects: m² = 4n • A heap H is a collection of at least two objects. • A heap is located on a single cell and is not a spatial pattern, as explained in Figure 2.
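
As a small illustration of the board described above, the sketch below derives the side length m from m² = 4n and wraps coordinates so that the board behaves as a torus. The names `board_side` and `wrap` are hypothetical, and rounding m up when 4n is not a perfect square is an assumption; the slide only states the relation itself.

```python
import math

def board_side(n):
    """Smallest integer m with m * m >= 4 * n (rounding up is an assumption)."""
    m = math.isqrt(4 * n)
    return m if m * m == 4 * n else m + 1

def wrap(x, y, m):
    """Map any integer coordinates onto the toroidal m x m board."""
    return x % m, y % m
```

For example, 100 objects give a 20 × 20 board, and stepping left from column 0 lands on column 19.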

  9. Basic notations and heuristics of AntClass 3/6

  10. Basic notations and heuristics of AntClass 4/6 • The major advantage of this improvement over Lumer and Faieta (1994) is that a heap, or cluster, can be easily identified, whereas in previous work spatial patterns of objects may “touch” each other, which makes clusters difficult to identify.

  11. Basic notations and heuristics of AntClass 5/6 • For a given heap H of nH objects: • Dmax(H) is the maximum distance between two objects of H: $D_{max}(H) = \max_{O_i, O_j \in H} D(O_i, O_j)$ • Ocenter(H) is the center of mass of all objects in H: $O_{center}(H) = \frac{1}{n_H} \sum_{O_i \in H} O_i$ • Objects in this case are considered as vectors of k numerical values.

  12. Basic notations and heuristics of AntClass 6/6 • Odissim(H) is the most dissimilar object in H, i.e. the one that maximizes D(·, Ocenter(H)) • Dmean(H) is the mean distance between the objects of H and the center of mass Ocenter(H): $D_{mean}(H) = \frac{1}{n_H} \sum_{O_i \in H} D(O_i, O_{center}(H))$
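
The four heap statistics defined on the two slides above follow directly from their verbal definitions. A minimal Python sketch, assuming a heap is a non-empty list of equal-length numeric tuples (the function names are hypothetical):

```python
import math
from itertools import combinations

def distance(a, b):
    """Euclidean distance D between two vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def center(heap):
    """Ocenter(H): attribute-wise mean of the objects in the heap."""
    n = len(heap)
    return tuple(sum(column) / n for column in zip(*heap))

def d_max_heap(heap):
    """Dmax(H): largest distance between two objects of the heap."""
    return max((distance(a, b) for a, b in combinations(heap, 2)), default=0.0)

def most_dissimilar(heap):
    """Odissim(H): the object farthest from the center of mass."""
    c = center(heap)
    return max(heap, key=lambda o: distance(o, c))

def d_mean(heap):
    """Dmean(H): mean distance between the objects and the center of mass."""
    c = center(heap)
    return sum(distance(o, c) for o in heap) / len(heap)
```

For the made-up heap `[(0.0, 0.0), (4.0, 0.0), (2.0, 6.0)]`, the center is `(2.0, 2.0)` and the most dissimilar object is `(2.0, 6.0)`.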

  13. The ants – the colony • The colony consists of p ants ant1, …, antp. Each ant is located on one cell of the board. Initially its position is drawn uniformly at random.

  14. Main ant-based algorithm in AntClass • Movement is not totally random. Initially, ant_i selects a random direction among the 8 possible ones. • On its next move, ant_i continues in this direction with probability Pdirection; otherwise it randomly generates a new direction. • Each ant also has a speed parameter, which states how many steps it moves in the selected direction before stopping again. • The stopping criterion of the algorithm is simply the number of iterations.
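
The movement rule above can be sketched as follows. This is only an illustration under stated assumptions: the class `Ant` and its attribute names are hypothetical, the board is the toroidal m × m grid described earlier, and all `speed` steps are applied as one jump.

```python
import random

# The 8 possible directions: every neighbor offset except "stay in place".
DIRECTIONS = [(dx, dy) for dx in (-1, 0, 1) for dy in (-1, 0, 1) if (dx, dy) != (0, 0)]

class Ant:
    def __init__(self, x, y, p_direction, speed, rng):
        self.x, self.y = x, y
        self.p_direction = p_direction  # probability of keeping the current direction
        self.speed = speed              # number of cells covered per move
        self.rng = rng
        self.direction = rng.choice(DIRECTIONS)

    def move(self, m):
        """Keep the direction with probability p_direction, else draw a new one, then step."""
        if self.rng.random() >= self.p_direction:
            self.direction = self.rng.choice(DIRECTIONS)
        dx, dy = self.direction
        self.x = (self.x + dx * self.speed) % m  # modulo makes the board toroidal
        self.y = (self.y + dy * self.speed) % m
```

Passing a seeded `random.Random` instance as `rng` keeps a run reproducible, which helps when comparing parameter settings.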

  15. Picking up an object • When the ant is not carrying an object, it looks at the 8 neighboring cells; three cases have to be considered. • One object alone: the ant picks up the object with a fixed probability. • A heap of two objects: in this case D(Odissim(H), Ocenter(H)) = Dmean(H), so the ant is given a probability Pdestroy to pick up either of the two objects, which destroys the heap. • A heap of more than two objects: the ant picks up the most dissimilar object in the heap, provided that its “dissimilarity” is above a given threshold Tremove.
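
The three pick-up cases above can be written as a single decision function. The sketch below is hedged in two ways. First, the slide does not define how “dissimilarity” is measured; the code assumes the ratio D(Odissim(H), Ocenter(H)) / Dmean(H), which fits the remark that the two quantities are equal for a two-object heap. Second, the function name and the parameter `p_load` for the single-object probability are illustrative choices.

```python
import math
import random

def distance(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def center(heap):
    n = len(heap)
    return tuple(sum(column) / n for column in zip(*heap))

def try_pick_up(cell, p_load, p_destroy, t_remove, rng):
    """Return the object taken from `cell` (a list of objects), or None if nothing is taken."""
    if not cell:
        return None
    if len(cell) == 1:
        # One object alone: fixed pick-up probability.
        return cell.pop() if rng.random() < p_load else None
    if len(cell) == 2:
        # Heap of two: taking either object destroys the heap.
        return cell.pop(rng.randrange(2)) if rng.random() < p_destroy else None
    # Heap of more than two: take the most dissimilar object if it stands out enough.
    c = center(cell)
    dists = [distance(o, c) for o in cell]
    mean = sum(dists) / len(dists)
    worst = max(range(len(cell)), key=dists.__getitem__)
    if mean > 0 and dists[worst] / mean > t_remove:  # assumed dissimilarity measure
        return cell.pop(worst)
    return None
```

Because the function removes the object from the cell's list in place, the caller only has to store the returned object as the ant's load.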

  16. Picking up an object

  17. Dropping an object • When the ant is carrying an object, it looks at the 8 neighboring cells; again, three cases have to be considered. • The cell is empty: the ant drops the object with a constant probability Pdrop. • The cell contains a single object: the ant drops its object, creating a heap of two objects, provided that the carried object is sufficiently similar to the one already in the cell. • The cell contains a heap: the ant adds its carried object to the heap, provided that it is closer to H’s center than the most dissimilar object of H.

  18. Dropping an object • To prevent an ant from carrying an object for too long, as can happen with an object that is very dissimilar to the others, the ant automatically drops the object on the first empty cell it encounters after Maxcarry iterations.
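
The dropping rules from the two slides above can be sketched in the same style. Two points are assumptions rather than facts from the transcript: “sufficiently similar” is modeled as D(carried, object) / Dmax falling below a hypothetical threshold `t_create`, and the forced drop after Maxcarry iterations is implemented as a check on a `carry_time` counter supplied by the caller.

```python
import math
import random

def distance(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def center(heap):
    n = len(heap)
    return tuple(sum(column) / n for column in zip(*heap))

def try_drop(cell, carried, carry_time, p_drop, t_create, d_max, max_carry, rng):
    """Append `carried` to `cell` (a list of objects) and return True if the ant drops it."""
    if not cell:
        # Empty cell: forced drop after max_carry iterations, else probability p_drop.
        if carry_time >= max_carry or rng.random() < p_drop:
            cell.append(carried)
            return True
        return False
    if len(cell) == 1:
        # Single object: create a heap of two only if the objects are similar enough.
        # Assumes d_max > 0; the normalized threshold test is an assumption.
        if distance(carried, cell[0]) / d_max < t_create:
            cell.append(carried)
            return True
        return False
    # Heap: accept the object if it is closer to the center than the most dissimilar member.
    c = center(cell)
    if distance(carried, c) < max(distance(o, c) for o in cell):
        cell.append(carried)
        return True
    return False
```

On success the caller would clear the ant's load and reset its carry counter.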

  19. Dropping an object

  20. Ants’ local memory • Since real ants can memorize several sites in their environment, a memory is added to each ant to speed up the classification. • When the ant is carrying an object, it searches its memory for a heap H on which it could drop the object. • If it finds one, the memory of this heap is activated and the ant goes to H’s location. • If it has not dropped the object on its way to H, the ant drops the object on H, provided that H is still valid. • Ants have only four slots in their memory.
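
A four-slot memory like the one described above could be kept in a bounded queue. The sketch below is illustrative only: the class name is hypothetical, evicting the oldest entry when the memory is full is an assumption, and so is choosing the remembered heap whose center is nearest to the carried object, since the slide does not say how a candidate heap is selected.

```python
import math
from collections import deque

def distance(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

class AntMemory:
    """Remembers the board positions and centers of recently visited heaps."""

    def __init__(self, slots=4):
        # A deque with maxlen discards its oldest entry when full (assumed policy).
        self.slots = deque(maxlen=slots)

    def remember(self, position, heap_center):
        self.slots.append((position, heap_center))

    def best_target(self, carried):
        """Position of the remembered heap nearest to `carried`, or None if memory is empty."""
        if not self.slots:
            return None
        position, _ = min(self.slots, key=lambda slot: distance(carried, slot[1]))
        return position
```

The stored center can go stale once other ants modify the heap, which is why the slide requires that H still be valid before the drop.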

  21. Ants’ heterogeneous parameters • To avoid complex parameter settings, the paper takes inspiration from real ants and uses a heterogeneous population of ants with different behaviors. • Setting the same parameters for all ants has two major drawbacks: • It is difficult to find the optimal parameters. • If the parameters are not optimal, the result may be poor.

  22. Ants’ heterogeneous parameters

  23. Hybridization • Two important problems remain in the previous algorithm: • Some objects, called “free objects” in this paper, are not assigned to any heap when the ant algorithm stops. • If an object has been assigned to the wrong heap, it can take a long time until it is transported to the right cluster. • Ant-based clustering is combined with the k-means algorithm to solve these problems. • The stochastic, exploratory principle is used first, followed by deterministic/heuristic principles.
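
One way to picture the hybridization above is a standard k-means loop seeded with the centers of the heaps left by the ants, so that free objects join their nearest heap and misplaced objects can move. This is a minimal sketch under that reading, not the authors' implementation; the function names and the `max_iter` cap are illustrative.

```python
import math

def distance(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def center(objects):
    n = len(objects)
    return tuple(sum(column) / n for column in zip(*objects))

def k_means(objects, initial_centers, max_iter=100):
    """Assign each object to its nearest center and update centers until labels stop changing."""
    centers = [tuple(c) for c in initial_centers]
    labels = None
    for _ in range(max_iter):
        new_labels = [
            min(range(len(centers)), key=lambda j: distance(o, centers[j]))
            for o in objects
        ]
        if new_labels == labels:
            break  # partition is stable
        labels = new_labels
        for j in range(len(centers)):
            members = [o for o, label in zip(objects, labels) if label == j]
            if members:  # an empty cluster keeps its previous center
                centers[j] = center(members)
    return labels, centers

# Made-up output of the ant step: two heaps plus two free objects.
heaps = [[(0.0, 0.0), (0.0, 1.0)], [(10.0, 10.0), (10.0, 11.0)]]
free_objects = [(1.0, 0.0), (9.0, 10.0)]
objects = [o for heap in heaps for o in heap] + free_objects
labels, centers = k_means(objects, [center(heap) for heap in heaps])
```

In this example each free object ends up in the cluster of the heap it lies closest to, and the number of clusters stays equal to the number of heaps.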

  24. Hybridization

  25. Hierarchical clustering • In the two previous steps of AntClass (ants + k-means), the number of classes is always overestimated: they generate many small but very homogeneous heaps. • These small, homogeneous heaps are treated as objects themselves, or “building blocks”, and another ant-based step is performed on these newly defined objects.

  26. Clustering heaps of objects with ants • To let the ants deal with heaps of objects: • Ants are able to carry an entire heap of objects. • The algorithm for picking up a heap is globally the same as for objects. • Ants pick up a heap with the same probability Pload. • Another mechanism is added to prevent ants from carrying all heaps at the same time. • Ants drop a heap H1 onto another heap H2 provided that:

  27. AntClass final hybrid and hierarchical algorithm • The previous step approximates the number of classes well but may introduce small misclassification errors. • To remove these errors, the k-means algorithm is applied once more to the objects, as before. • Finally, all values in the data set are normalized.

  28. Experimental results

  29. Experimental results

  30. Experimental results

  31. Experimental results

  32. Experimental results

  33. Conclusions • This paper presented a new hybrid, ant-based algorithm named AntClass for data clustering in a knowledge discovery context. • AntClass deals with numerical databases. • AntClass performs hierarchical clustering in which ants may carry heaps of objects and not just single objects. • AntClass uses a heterogeneous population of ants to avoid complex parameter settings.

  34. Personal Opinion • AntClass could be combined with our concept hierarchy tree to handle categorical data. • The tree could be used in the distance computation and in the k-means algorithm separately.

  35. Review • Ant-based clustering of objects to create an initial relevant partition. • K-means clustering • The same ant-based step, applied hierarchically to heaps of objects • K-means once more
