Download Presentation
Numerical Method for AI

Loading in 2 Seconds...

1 / 20

# Numerical Method for AI - PowerPoint PPT Presentation

Clustering NG GIM MENG WEK070100 Tan Wei Ren WEK Tan Chin Keong WEK. Numerical Method for AI. Clustering a method of unsupervised learning, it’s not trained with examples of correct answers. a common technique for statistical data analysis used in many

I am the owner, or an agent authorized to act on behalf of the owner, of the copyrighted work described.
Download Presentation

## PowerPoint Slideshow about 'Numerical Method for AI' - eliza

An Image/Link below is provided (as is) to download presentation

Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author.While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server.

- - - - - - - - - - - - - - - - - - - - - - - - - - E N D - - - - - - - - - - - - - - - - - - - - - - - - - -
Presentation Transcript

Clustering

NG GIM MENG WEK070100

Tan Wei Ren WEKTan Chin Keong WEK

### Numerical Method for AI

Clustering

• a method of unsupervised learning, it’s not trained with
• examples of correct answers.
• a common technique for statistical data analysis used in many
• fields, including machine learning, data mining, pattern
• recognition, image analysis and bioinformatics.
• It’s like classification but the groups are not predefined, so the
• algorithm will try to group similar items together.

### Numerical Method for AI

Introduction

Types of clustering

• Hierarchical algorithms find successive clusters using previously established clusters.
• Partitional algorithms typically determine all clusters at once, but can also be used as divisive algorithms in the hierarchical clustering.
• Density-based clustering algorithms are devised to discover arbitrary-shaped clusters.

### Numerical Method for AI

Introduction

An important step in any clustering is to select a distance measure, which will determine how the similarity of two elements is calculated.

This will influence the shape of the clusters, as some elements may be close to one another according to one distance and farther away according to another.

Distance measure

### Numerical Method for AI

Distance measure

Hierarchical clustering creates a hierarchy of clusters which may be represented in a tree structure called a dendrogram. The root of the tree consists of a single cluster containing all observations, and the leaves correspond to individual observations.Algorithms for hierarchical clustering are generally either agglomerative, in which one starts at the leaves and successively merges clusters together; or divisive, in which one starts at the root and recursively splits the clusters.

### Numerical Method for AI

i.Hierarchical clustering

For example, suppose this data is to be clustered, and the euclidean distance is the distance metric.

Raw data Traditional representation

### Numerical Method for AI

i.Hierarchical clustering

Each agglomeration occurs at a greater distance between clusters than the previous agglomeration, and one can decide to stop clustering either when the clusters are too far apart to be merged (distance criterion) or when there is a sufficiently small number of clusters (number criterion).

### Numerical Method for AI

i.Hierarchical clustering

Each agglomeration occurs at a greater distance between clusters than the previous agglomeration, and one can decide to stop clustering either when the clusters are too far apart to be merged (distance criterion) or when there is a sufficiently small number of clusters (number criterion).

### Numerical Method for AI

i.Hierarchical clustering

1. k-meansclustering

The k-means algorithm assigns each point to the cluster whose center (alsso called centroid) is nearest. The center is the average of all the points in the cluster - that is, its coordinates are the arithmetic mean for each dimension separately over all the points in the cluster.Example : The data set has three dimensions and the cluster has two points : X = (x1, x2, x3) and Y = (y1, y2, y3). Then the centroidZbecomesZ = (z1, z2, z3), where z1 = (x1 + y1)/2 and z2 = (x2 + y2)/2 and z3 = (x3 + y3)/2.

### Numerical Method for AI

ii.Partitional Clustering

1. k-meansclustering

• The algorithm steps are :
• Choose the number of clusters, k.
• Randomly generate k clusters and determine the cluster
• centers, or directly generate k random points as cluster centers.
• Assign each point to the nearest cluster center.
• Recompute the new cluster centers.
• Repeat the two previous steps until some convergence criterion
• is met (usually that the assignment hasn't changed).

### Numerical Method for AI

ii.Partitional Clustering

1. k-meansclustering

Advantages : Its simplicity and speed which allows it to run on large datasets.Disadvantage :It does not yield the same result with each run, since the resulting clusters depend on the initial random assignments.The requirement for the concept of a mean to be definable which is not always the case. For such datasets the k-medoids variant is appropriate.

### Numerical Method for AI

ii.Partitional Clustering

1. k-meansclustering

Advantages : Its simplicity and speed which allows it to run on large datasets.Disadvantage :It does not yield the same result with each run, since the resulting clusters depend on the initial random assignments.The requirement for the concept of a mean to be definable which is not always the case. For such datasets the k-medoids variant is appropriate.

### Numerical Method for AI

ii.Partitional Clustering

A.K. Jain M.N. Murty and P.J. Flin:'Data Clustering: a review’ ACM Computing survey vol 31 (3), sept 1999.

This paper presents an overview of pattern clustering methods, with a goal of providing useful advice and references to fundamental concepts accessible to the broad community of clustering practitioners. It presents taxonomy of clustering techniques, and recent advances. It also describes some important applications of clustering algorithms such as image segmentation, object recognition, and information retrieval.

### Numerical Method for AI

Literature Review

Components of a Clustering Task

• Typical pattern clustering activity involves the following steps [Jain and Dubes 1988]:
• (1) Pattern representation,
• (2) Definition of a pattern proximity measure appropriate to the data domain,
• (3) Clustering or grouping,
• (4) Data abstraction (if needed),
• (5) Assessment of output (if needed).

### Numerical Method for AI

Literature Review

Pattern representationrefers to the number of classes, the number of available patterns, and the number, type, and scale of the features available to the clustering algorithm. Some of this information may not be controllable.

• Pattern proximitydetermines how the similarity of two elements is calculated. It is usually measured by a distance function defined on pairs of patterns. A variety of distance measures are in use in the various communities.
• The grouping step can be performed in a number of ways. The output clustering (or clusterings) can be hard (a partition of the data into groups) or fuzzy (where each pattern has a variable degree of membership in each of the output clusters).
• Data abstractionis the process of extracting a simple and compact representation of a data set - a compact description of each cluster.

### Numerical Method for AI

Literature Review

Data mining

Many data mining applications involve partitioning data items into related subsets; the marketing applications discussed above represent some examples.Grouping of Shopping Items

Clustering can be used to group all the shopping items available on the web into a set of unique products. For example, all the items on eBay can be grouped into unique products. (eBay doesn't have the concept of a Stock-Keeping Unit(SKU))

### Numerical Method for AI

Application

Social network analysis

In the study of social networks, clustering may be used to recognize communities within large groups of people.

Medicine

In medical imaging, such as PET scans, cluster analysis can be used to differentiate between different types of tissue and blood in a three dimensional image.Market research

Market researchers use cluster analysis to partition the general population of consumers into market segments and to better understand the relationships between different groups of consumers/potential customers.

### Numerical Method for AI

Application

Comparisons between data clusterings

There have been several suggestions for a measure of similarity between two clusterings. Such a measure can be used to compare how well different data clustering algorithms perform on a set of data.

Recently, a new Mallows Distance based metric was also proposed for soft clustering comparisons.

Conclusion