Jan Flusser
flusser@utia.cas.cz

Institute of Information Theory and Automation – Introduction to Pattern Recognition


Pattern Recognition

  • Recognition (classification) = assigning a pattern/object to one of pre-defined classes

  • Statistical (feature-based) PR – the pattern is described by features (an n-D vector in a metric space)

  • Syntactic (structural) PR – the pattern is described by its structure; formal language theory (class = language, pattern = word)


Pattern Recognition

  • Supervised PR

    – a training set is available for each class

  • Unsupervised PR (clustering)

    – no training set is available; the number of classes may not be known



PR system - Training stage



Desirable properties of the features

  • Invariance

  • Discriminability

  • Robustness

  • Efficiency, independence, completeness



Desirable properties of the training set

  • It should contain typical representatives of each class including intra-class variations

  • Reliable and large enough

  • Should be selected by domain experts



Classification rule setup

Equivalent to a partitioning of the feature space

Independent of the particular application



PR system – Recognition stage



An example – Fish classification



The features: Length, width, brightness



2-D feature space



Empirical observation

  • For a given training set, we can have several classifiers (several partitionings of the feature space)

  • The training samples are not always classified correctly

  • We should avoid overtraining of the classifier



Formal definition of the classifier

  • Each class ωi is characterized by its discriminant function gi(x)

  • Classification = maximization of g(x): assign x to class i iff gi(x) ≥ gj(x) for all j ≠ i

  • Discriminant functions define decision boundaries in the feature space



Minimum distance (NN) classifier

  • Discriminant function g(x) = 1 / dist(x, w)

  • Various definitions of dist(x, w)

  • One-element training set → Voronoi polygons

  • NN classifier may not be linear

  • NN classifier is sensitive to outliers → k-NN classifier



k-NN classifier

Examine the nearest training points one by one until k samples belonging to one class are reached.
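As a minimal sketch (not part of the original slides), the rule above can be written in Python. The Euclidean default distance and the fallback when no class reaches k votes are my assumptions:

```python
def knn_classify(x, training_points, labels, k, dist=None):
    """Classify x by visiting training points from nearest to farthest
    until k samples of a single class have been seen (the rule stated
    above). `dist` defaults to Euclidean distance."""
    if dist is None:
        dist = lambda a, b: sum((ai - bi) ** 2 for ai, bi in zip(a, b)) ** 0.5
    counts = {}
    # Visit training points in order of increasing distance from x
    for _, label in sorted(zip(training_points, labels),
                           key=lambda pl: dist(x, pl[0])):
        counts[label] = counts.get(label, 0) + 1
        if counts[label] == k:      # first class to collect k votes wins
            return label
    return max(counts, key=counts.get)  # fewer than k samples of any class
```

With k = 1 this reduces to the minimum distance (NN) classifier of the previous slide.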



Linear classifier

Discriminant functions g(x) are hyperplanes



Bayesian classifier

Assumption: feature values are random variables

Statistical classifier; the decision is probabilistic

It is based on the Bayes rule



The Bayes rule

P(ωi | x) = p(x | ωi) P(ωi) / p(x)

where P(ωi | x) is the a posteriori probability, p(x | ωi) the class-conditional probability, P(ωi) the a priori probability, and p(x) the total probability.



Bayesian classifier

Main idea: maximize the posterior probability P(ωi | x)

Since it is hard to do directly, we rather maximize p(x | ωi) P(ωi)

In case of equal priors, we maximize only p(x | ωi)
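A minimal sketch of this idea in Python (not from the slides), assuming 1-D Gaussian class-conditional densities; the dictionary layout `name -> (prior, mean, var)` is an illustrative choice:

```python
import math

def gaussian_pdf(x, mean, var):
    """1-D Gaussian class-conditional density p(x | w_i)."""
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def bayes_classify(x, classes):
    """classes: dict  name -> (prior, mean, var).
    Assign x to the class maximizing p(x | w_i) * P(w_i)."""
    return max(classes,
               key=lambda c: classes[c][0] * gaussian_pdf(x, classes[c][1],
                                                          classes[c][2]))
```

The total probability p(x) is the same for every class, so it can be dropped from the maximization.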



Equivalent formulation in terms of discriminant functions



How to estimate P(ωi) and p(x | ωi)?

  • From case studies performed before (OCR, speech recognition)

  • From the occurrence in the training set

  • Assumption of equal priors

  • Parametric estimate (assuming the pdf is of a known form, e.g. Gaussian)

  • Non-parametric estimate (the pdf is unknown or too complex)



Parametric estimate of Gaussian



d-dimensional Gaussian pdf
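The formula on this slide was an image and did not survive the transcript; the standard d-dimensional Gaussian pdf with mean vector μ and covariance matrix Σ is:

```latex
p(\mathbf{x}) = \frac{1}{(2\pi)^{d/2}\,|\Sigma|^{1/2}}
\exp\!\left( -\tfrac{1}{2}\,(\mathbf{x}-\boldsymbol{\mu})^{T}\Sigma^{-1}(\mathbf{x}-\boldsymbol{\mu}) \right)
```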



The role of covariance matrix



Two-class Gaussian case in 2D

Classification = comparison of two Gaussians



Two-class Gaussian case – Equal cov. mat.

Linear decision boundary



Equal priors

Maximizing the Gaussian class-conditional pdf = minimizing its exponent:

Classification by minimum Mahalanobis distance

If the cov. mat. is diagonal with equal variances, then we get the "standard" minimum-distance rule.
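A small illustrative helper (not from the slides; the 2-D restriction and passing the pre-inverted covariance are my simplifications) showing the Mahalanobis distance and how it collapses to the Euclidean case:

```python
def mahalanobis_sq(x, mean, cov_inv):
    """Squared Mahalanobis distance (x - mean)^T Sigma^{-1} (x - mean)
    for 2-D vectors; cov_inv is the inverse covariance as a 2x2 list."""
    d = [x[0] - mean[0], x[1] - mean[1]]
    t = [cov_inv[0][0] * d[0] + cov_inv[0][1] * d[1],
         cov_inv[1][0] * d[0] + cov_inv[1][1] * d[1]]
    return d[0] * t[0] + d[1] * t[1]
```

With the identity covariance matrix this is exactly the squared Euclidean distance, i.e. the "standard" minimum-distance rule mentioned above.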



Non-equal priors

Linear decision boundary still preserved



General Gaussian case in 2D

Decision boundary is a hyperquadric



General Gaussian case in 3D

Decision boundary is a hyperquadric



More classes, Gaussian case in 2D



What to do if the classes are not normally distributed?

  • Gaussian mixtures

  • Parametric estimation of some other pdf

  • Non-parametric estimation



Non-parametric estimation – Parzen window



The role of the window size

  • Small window → overtraining

  • Large window → data smoothing

  • Continuous data → the size does not matter
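A minimal 1-D Parzen-window sketch (not from the slides; the Gaussian kernel is an illustrative choice of window function):

```python
import math

def parzen_estimate(x, samples, h):
    """Parzen-window density estimate with a Gaussian kernel of width h:
    p(x) ~ (1/(n*h)) * sum_i K((x - x_i) / h)."""
    k = lambda u: math.exp(-0.5 * u * u) / math.sqrt(2 * math.pi)
    return sum(k((x - xi) / h) for xi in samples) / (len(samples) * h)
```

Shrinking h makes the estimate peaky around the samples (overtraining); enlarging h smooths the data, as the bullets above describe.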



Parzen estimates for n = 1, 10, 100, ∞



The role of the window size

Small window vs. large window



Applications of Bayesian classifier in multispectral remote sensing

  • Objects = pixels

  • Features = pixel values in the spectral bands (from 4 to several hundreds)

  • Training set – selected manually by means of thematic maps (GIS) and on-site observation

  • Number of classes – typically from 2 to 16



Satellite MS image



Other classification methods in RS

  • Context-based classifiers

  • Shape and textural features

  • Post-classification filtering

  • Spectral pixel unmixing



Non-metric classifiers

Typically for “YES – NO” features

Feature metric is not explicitly defined

Decision trees



General decision tree



Binary decision tree

Any decision tree can be replaced by a binary tree



Real-valued features

  • Node decisions are in the form of inequalities

  • Training = setting their parameters

  • Simple inequalities → stepwise decision boundary



Stepwise decision boundary



Real-valued features

The tree structure and the form of inequalities

influence both performance and speed.



Classification performance

  • How to evaluate the performance of the classifiers?

  • - evaluation on the training set (optimistic error estimate)

  • - evaluation on the test set (pessimistic error estimate)



Classification performance

  • How to increase the performance?

  • - other features

  • - more features (dangerous – curse of dimensionality!)

  • - other (larger, better) training sets

  • - other parametric model

  • - other classifier

  • - combining different classifiers



Combining (fusing) classifiers

C classifiers, each with its own feature vector xj

Bayes rule: maximize the posterior probability given all C feature vectors

There are several possibilities how to do that.



Product rule

Assumption: conditional independence of the xj

Maximize P(ωi) ∏j p(xj | ωi)


Max-max and max-min rules

Assumption: equal priors

Max-max rule: maximize over i the maximum over j of P(ωi | xj)

Max-min rule: maximize over i the minimum over j of P(ωi | xj)



Majority vote

  • The most straightforward fusion method

  • Can be used for all types of classifiers
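The majority vote is simple enough to sketch directly (ties here fall to the label seen first, an implementation detail not specified on the slide):

```python
from collections import Counter

def majority_vote(decisions):
    """Fuse individual classifier decisions by majority vote.
    `decisions` is a list of class labels, one per classifier."""
    return Counter(decisions).most_common(1)[0][0]
```

Because it needs only the output labels, it works even for classifiers (e.g. decision trees) that produce no probabilities.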



Unsupervised Classification

(Cluster analysis)

No training set is available; the number of classes may not be known a priori.



What are clusters?

Intuitive meaning

- compact, well-separated subsets

Formal definition

- any partition of the data into disjoint subsets



What are clusters?



How to compare different clusterings?

The variance measure J should be minimized.

Drawback – only clusterings with the same N can be compared. The global minimum J = 0 is reached in the degenerate case (each point forms its own cluster).
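The definition of J was an image on the slide; a common choice consistent with the surrounding text (sum of squared distances to the cluster centroids, which is zero when every point is its own centroid) is:

```latex
J = \sum_{i=1}^{N} \sum_{\mathbf{x} \in C_i} \left\| \mathbf{x} - \mathbf{m}_i \right\|^{2},
\qquad
\mathbf{m}_i = \frac{1}{|C_i|} \sum_{\mathbf{x} \in C_i} \mathbf{x}
```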



Clustering techniques

  • Iterative methods

    – typically if N is given

  • Hierarchical methods

    – typically if N is unknown

  • Other methods

    – sequential, graph-based, branch & bound, fuzzy, genetic, etc.



Sequential clustering

  • N may be unknown

  • Very fast but not very good

  • Each point is considered only once

Idea: a new point is either added to an existing cluster or it forms a new cluster. The decision is based on the user-defined distance threshold.
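A minimal 1-D sketch of this idea (not from the slides; assigning to the nearest cluster centroid and updating it after each insertion are my assumptions):

```python
def sequential_cluster(points, threshold):
    """Basic sequential clustering: each point joins the nearest existing
    cluster if its distance to that cluster's centroid is below `threshold`,
    otherwise it starts a new cluster. Each point is considered only once."""
    clusters = []    # list of lists of points
    centroids = []   # running centroid of each cluster
    for p in points:
        if centroids:
            i = min(range(len(centroids)), key=lambda j: abs(p - centroids[j]))
            if abs(p - centroids[i]) < threshold:
                clusters[i].append(p)
                centroids[i] = sum(clusters[i]) / len(clusters[i])
                continue
        clusters.append([p])    # no cluster close enough: open a new one
        centroids.append(p)
    return clusters
```

Both drawbacks listed on the next slide are visible here: the result changes with `threshold`, and reordering `points` can change which clusters are formed.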



Sequential clustering

  • Drawbacks:

  • Dependence on the distance threshold

  • Dependence on the order of data points



Iterative clustering methods

  • N-means clustering

  • Iterative minimization of J

  • ISODATA (Iterative Self-Organizing DATa Analysis)



N-means clustering

  1. Select N initial cluster centroids.

  2. Classify every point x according to minimum distance.

  3. Recalculate the cluster centroids.

  4. If the centroids did not change then STOP else GOTO 2.
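The steps above can be sketched in a few lines of Python (a minimal 2-D version, not from the slides; the iteration cap is a safety assumption):

```python
def n_means(points, centroids, iters=100):
    """N-means (a.k.a. k-means) on 2-D points: classify each point by
    minimum distance to a centroid, recompute the centroids, and repeat
    until they stop changing."""
    clusters = []
    for _ in range(iters):
        # Step 2: classify every point according to minimum distance
        clusters = [[] for _ in centroids]
        for p in points:
            i = min(range(len(centroids)),
                    key=lambda j: (p[0] - centroids[j][0]) ** 2
                                + (p[1] - centroids[j][1]) ** 2)
            clusters[i].append(p)
        # Step 3: recalculate the cluster centroids
        new = [tuple(sum(c) / len(c) for c in zip(*cl)) if cl else centroids[i]
               for i, cl in enumerate(clusters)]
        # Step 4: stop when the centroids did not change
        if new == centroids:
            break
        centroids = new
    return centroids, clusters
```

The initial centroids are passed in explicitly, which makes the dependence on initialization (a drawback discussed below) easy to demonstrate.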



N-means clustering

  • Drawbacks

  • The result depends on the initialization.

  • J is not minimized globally.

  • The results are sometimes "intuitively wrong".



N-means clustering – An example

Two features, four points, two clusters (N = 2)

Different initializations → different clusterings



N-means clustering – An example

Initial centroids



N-means clustering – An example

Initial centroids



Iterative minimization of J

  1. Let's have an initial clustering (e.g. by N-means).

  2. For every point x: move x from its current cluster to another cluster such that the decrease of J is maximized.

  3. If no data point moves, then STOP.



Example of “wrong” result



ISODATA

  • Iterative clustering, N may vary.

  • Sophisticated method, a part of many statistical software systems.

  • Postprocessing after each iteration

  • Clusters with few elements are cancelled

  • Clusters with big variance are divided

  • Other merging and splitting strategies can be implemented



Hierarchical clustering methods

  • Agglomerative clustering

  • Divisive clustering



Basic agglomerative clustering

  1. Each point = one cluster

  2. Find the two "nearest" or "most similar" clusters and merge them together

  3. Repeat 2 until the stop constraint is reached



Basic agglomerative clustering

  • Particular implementations of this method differ from each other by

  • The STOP constraints

  • The distance/similarity measures used



Simple between-cluster distance measures

d(A,B) = d(m1, m2)   (distance of the cluster centroids)

d(A,B) = min d(a,b)   (single linkage)

d(A,B) = max d(a,b)   (complete linkage)
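A minimal 1-D sketch (not from the slides) of these distance measures and of the basic agglomerative loop; the fixed target number of clusters as the stop constraint is an illustrative choice:

```python
def cluster_distance(A, B, mode="min"):
    """Between-cluster distances for 1-D point lists:
    "min"  -> min d(a,b)            (single linkage)
    "max"  -> max d(a,b)            (complete linkage)
    "mean" -> d(m1, m2), centroids  (centroid distance)"""
    if mode == "mean":
        return abs(sum(A) / len(A) - sum(B) / len(B))
    d = [abs(a - b) for a in A for b in B]
    return min(d) if mode == "min" else max(d)

def agglomerate(points, stop_n):
    """Start with one cluster per point; repeatedly merge the two nearest
    clusters (single linkage) until only stop_n clusters remain."""
    clusters = [[p] for p in points]
    while len(clusters) > stop_n:
        pairs = [(cluster_distance(clusters[i], clusters[j]), i, j)
                 for i in range(len(clusters))
                 for j in range(i + 1, len(clusters))]
        _, i, j = min(pairs)             # nearest pair of clusters
        clusters[i] += clusters.pop(j)   # merge them together
    return clusters
```

Swapping the `mode` argument changes which of the slide's distance measures drives the merging, and with it the resulting dendrogram.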



Other between-cluster distance measures

d(A,B) = Hausdorff distance H(A,B)

d(A,B) = J(A∪B) – J(A,B)



Agglomerative clustering – representation by a clustering tree (dendrogram)



How many clusters are there?

2 or 4 ?

Clustering is a very subjective task



How many clusters are there?

  • Difficult to answer even for humans

  • "Clustering tendency"

  • Hierarchical methods – N can be estimated from the complete dendrogram

  • Methods minimizing a cost function – N can be estimated from "knees" in the J–N graph



Life time of the clusters

Optimal number of clusters = 4



Optimal number of clusters



Applications of clustering in image proc.

  • Segmentation – clustering in color space

  • Preliminary classification of multispectral images

  • Clustering in parametric space – RANSAC, image registration and matching

Numerous applications exist outside the image processing area.


References

Duda, Hart, Stork: Pattern Classification, 2nd ed., Wiley Interscience, 2001

Theodoridis, Koutroumbas: Pattern Recognition, 2nd ed., Elsevier, 2003

http://www.utia.cas.cz/user_data/zitova/prednasky/
