Document Analysis: Fundamentals of pattern recognition

Prof. Rolf Ingold, University of Fribourg

Master course, spring semester 2008

Outline

Introduction

Feature extraction and decision

Role of training

Feature selection

Example: Font recognition

Bayesian decision theory

Evaluation

Goals of Pattern Recognition

Pattern recognition aims at discovering and identifying patterns in raw data

it consists of assigning symbols to data (patterns)

it is based on a priori knowledge, often statistical information

Pattern recognition is used for computer perception (image/sound analysis)

in a preliminary step, a sensor captures raw information

this information is interpreted to take decisions

Pattern recognition can be thought of as a methodical way of reducing information in order to keep only the relevant meaning

Pattern Recognition Applications

Pattern recognition is involved in many applications

seismological survey

speech recognition

scientific imagery (biology, health-care, physics, ...)

satellite based observation (military and civil applications, ...)

document analysis, with several components:

optical character recognition (OCR)

font identification

handwriting recognition (off-line)

graphics recognition

computer vision (3D scene analysis)

biometry: person identification and authentication

...

Pattern recognition methodologies rely on other scientific domains: statistics, operations research, graph theory, artificial intelligence, ...

Origin of Difficulties

Pattern recognition is mainly an information overload problem

The difficulty arises from

variability of objects belonging to the same class

distortion of captured data (noise, degradations, ...)

Steps Involved in Pattern Recognition

Pattern recognition is basically a two stage process:

Feature extraction, aiming at removing redundancy while keeping significant information

Classification, which consists of making a decision by assigning a class label

[Figure: observation → feature extraction → feature vector → classification → class]

Role of Training

Classifiers (tools that perform classification tasks) are generally designed to be trained

Each class is characterized by a model

Models are built with representative training data

[Figure: features are extracted and passed to the decision step, which outputs classes; training builds the models used for the decision]
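The relation between training, models and decision can be sketched in a few lines. This is only an illustration under stated assumptions: the slides do not name a classifier, so a nearest-mean classifier is used here, and the feature values and the function names `train` and `classify` are hypothetical.

```python
from math import dist

def train(samples):
    """Build one model per class: the mean of its training feature vectors."""
    models = {}
    for label, vectors in samples.items():
        n = len(vectors)
        models[label] = [sum(column) / n for column in zip(*vectors)]
    return models

def classify(x, models):
    """Assign the label whose model (mean vector) is closest to x."""
    return min(models, key=lambda label: dist(x, models[label]))

# Hypothetical labeled training data: two features per sample
training = {
    "roman":  [[0.10, 1.0], [0.12, 1.1], [0.08, 0.9]],
    "italic": [[0.30, 1.0], [0.33, 1.2], [0.27, 0.8]],
}
models = train(training)
label = classify([0.29, 1.05], models)
```

Here each model is a single mean vector; richer models (for example full class-conditional densities) follow the same train-then-decide structure.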

Supervised vs. Unsupervised Training

Two different situations may occur regarding training material:

Supervised training is performed when the training samples are labeled with the class they belong to

each class is associated with a set of training samples $T_i = \{x_{i1}, x_{i2}, \ldots, x_{iN_i}\}$, supposed to be statistically representative of the class

Unsupervised training is performed when the training samples are statistically representative but mixed over all classes: $T = \{x_1, x_2, \ldots, x_n\}$

Feature Selection

Features are selected according to the application

Features should be chosen carefully by considering

discrimination power between classes

robustness to intra-class distortions and noise

global statistical independence (spread over the entire feature space)

"fast computation"

reasonable dimension (number of features)

Features for Character Recognition

Given a binary image of a character, a lot of features can be used for character recognition

Size, i.e., width and height of the bounding box

Position of baseline (if available)

Weight (number of black pixels)

Perimeter (length of the contours)

Center of gravity

Moments (second and third order in both directions)

Distributions of horizontal and vertical runs

Number of intersections with a (possibly random) set of lines

Length and structure (singular points, holes) of skeleton

...

Local features computed on sub-images

…
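A few of the features listed above are simple enough to sketch directly. The following is a minimal illustration, assuming the binary image is given as a list of rows with 1 for black pixels and contains at least one black pixel; the function name and the toy image are hypothetical.

```python
def character_features(img):
    """Compute a few simple features of a binary image (1 = black pixel).

    Assumes the image contains at least one black pixel.
    """
    black = [(r, c) for r, row in enumerate(img) for c, v in enumerate(row) if v]
    rows = [r for r, _ in black]
    cols = [c for _, c in black]
    weight = len(black)                       # number of black pixels
    return {
        "width": max(cols) - min(cols) + 1,   # bounding box width
        "height": max(rows) - min(rows) + 1,  # bounding box height
        "weight": weight,
        "center": (sum(rows) / weight, sum(cols) / weight),  # (row, column)
    }

# Toy 5x5 image of the letter "L"
L_IMAGE = [
    [0, 1, 0, 0, 0],
    [0, 1, 0, 0, 0],
    [0, 1, 0, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 0, 0, 0],
]
features = character_features(L_IMAGE)
```

Perimeter, moments, run distributions and skeleton features require more machinery and are left out of this sketch.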

Font Recognition: Goal

Goal: recognize fonts of synthetically generated isolated words

as binary (black & white) or grey level images

at 300 dpi

12 standard font classes are considered

3 families:

Arial

Courier New

Times New Roman

4 styles:

Plain

Italic

Bold

Bold Italic

single size: 12 pt

Font Recognition: Extracted Features

Words are segmented with a surrounding white border of 1 pixel

Some preprocessing steps are used

Horizontal projection profile (hp)

Derivative of horizontal projection profile (hpd)

The following features are calculated

hp-mean (or density): mean of hp

hpd-stdev (or slanting): standard deviation of hpd

hr-mean: mean of horizontal runs (up to length 12)

hr-stdev: standard deviation of horizontal runs (up to length 12)

vr-mean: mean of vertical runs (up to length 12)

vr-stdev: standard deviation of vertical runs (up to length 12)
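The six features can be sketched as follows. This is an illustration under assumptions that the slides do not confirm: the horizontal projection profile is taken as the number of black pixels per row, its derivative as the first difference, runs are runs of black pixels, and the population standard deviation is used. The image format (list of rows, 1 = black) and all names are hypothetical.

```python
from statistics import mean, pstdev

def horizontal_runs(img, max_len=12):
    """Lengths of black-pixel runs along each row, keeping runs <= max_len."""
    runs = []
    for row in img:
        length = 0
        for v in list(row) + [0]:      # sentinel 0 closes a run at the row end
            if v:
                length += 1
            elif length:
                if length <= max_len:
                    runs.append(length)
                length = 0
    return runs

def font_features(img):
    hp = [sum(row) for row in img]                 # black pixels per row
    hpd = [b - a for a, b in zip(hp, hp[1:])]      # first difference of hp
    hr = horizontal_runs(img)
    vr = horizontal_runs(list(zip(*img)))          # transpose: columns become rows
    return {
        "hp-mean": mean(hp), "hpd-stdev": pstdev(hpd),
        "hr-mean": mean(hr), "hr-stdev": pstdev(hr),
        "vr-mean": mean(vr), "vr-stdev": pstdev(vr),
    }

# Toy image with a 1-pixel white border, as in the segmentation step
WORD = [
    [0, 0, 0, 0, 0, 0],
    [0, 1, 1, 0, 1, 0],
    [0, 1, 0, 0, 1, 0],
    [0, 1, 1, 1, 1, 0],
    [0, 0, 0, 0, 0, 0],
]
f = font_features(WORD)
```

The resulting dictionary is the six-dimensional feature vector that a classifier would receive for one word image.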

Font Recognition: Illustration of Features

Basic image processing features used are

horizontal projection profile

distribution of horizontal runs (from 1 to 11)

distribution of vertical runs (from 1 to 11)

Font Recognition: decision boundaries on single feature (1)

Some single features are highly discriminant for some font sets

hpd-stdev discriminates ■ roman and ■ italic fonts

hr-mean discriminates ■ normal and ■ bold fonts
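A decision boundary on a single feature amounts to a threshold. The sketch below places the threshold halfway between the two class means, which is an assumption made for illustration (the slides only show the boundaries graphically). The feature values and class names "A" and "B" are invented and do not come from the font experiments.

```python
def midpoint_boundary(class_a, class_b):
    """Boundary halfway between the mean feature values of two classes."""
    mean_a = sum(class_a) / len(class_a)
    mean_b = sum(class_b) / len(class_b)
    return (mean_a + mean_b) / 2

# Hypothetical values of one feature for two classes (illustration only)
values_a = [3.1, 2.9, 3.3, 3.0]
values_b = [1.2, 1.0, 1.4, 1.1]
boundary = midpoint_boundary(values_a, values_b)

def decide(value):
    """Class A has the larger mean in this toy data, so it lies above the boundary."""
    return "A" if value > boundary else "B"
```

When the class distributions overlap, no single threshold separates them perfectly, which is the situation described as partial discrimination.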

Font Recognition: decision boundaries on single feature (2)

Other features may partly discriminate font sets

hr-mean can partly discriminate ■ Arial, ■ Courier and ■ Times

Font Recognition: decision boundaries on multiple features (1)

By combining two features, font discrimination is improved

(hpd-stdev, vr-stdev) discriminate ■ roman and ■ italic fonts

[Figure: scatter plot with axes hpd-stdev and vr-stdev]

Font Recognition: decision boundaries on multiple features (2)

Font family discrimination (■ Arial, ■ Courier and ■ Times) becomes possible by combining several pairs of features

Bayesian Decision Theory

Bayesian decision makes the assumption that all information contributing to the decision can be stated in the form of probabilities

$P(\omega_i)$: the a priori probability (or prior) of each class $\omega_i$

$p(x \mid \omega_i)$: the class-conditional density function of the feature vector $x$, also called likelihood of the class $\omega_i$ with respect to $x$

The goal is to determine the class $\omega_i$ for which the a posteriori probability (or posterior) $P(\omega_i \mid x)$ is the highest

Bayesian Rule

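The Bayes rule makes it possible to calculate the a posteriori probability of each class as a function of priors and likelihoods

$$P(\omega_i \mid x) = \frac{p(x \mid \omega_i)\, P(\omega_i)}{p(x)}$$

where $p(x)$ is called evidence and can be considered as a normalization factor, i.e.,

$$p(x) = \sum_j p(x \mid \omega_j)\, P(\omega_j)$$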

Influence of Posterior Probabilities

Example with a single feature: posterior probabilities in two different cases regarding a priori probabilities

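[Figure: posterior probabilities of classes 1 and 2 as a function of the feature, for two cases: $P(\omega_1)=0.5$, $P(\omega_2)=0.5$ and $P(\omega_1)=0.1$, $P(\omega_2)=0.9$]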
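The effect of the priors can be reproduced numerically. The sketch below applies the Bayes rule to a single feature with the two prior settings above. The Gaussian likelihoods and their parameters are assumptions chosen for illustration; the slides do not specify the densities behind the plots.

```python
from math import exp, pi, sqrt

def gaussian(x, mu, sigma):
    """Univariate normal density, used here as an assumed likelihood p(x|class)."""
    return exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * sqrt(2 * pi))

def posteriors(x, priors, params):
    """Bayes rule: posterior = likelihood * prior / evidence."""
    joint = [gaussian(x, mu, s) * p for (mu, s), p in zip(params, priors)]
    evidence = sum(joint)              # p(x), the normalization factor
    return [j / evidence for j in joint]

params = [(0.0, 1.0), (2.0, 1.0)]      # hypothetical (mean, std) for classes 1 and 2
equal = posteriors(1.0, [0.5, 0.5], params)
skewed = posteriors(1.0, [0.1, 0.9], params)
```

At x = 1.0 both assumed likelihoods are equal, so the posteriors reduce to the priors: the observation is ambiguous with equal priors and clearly favors class 2 with priors 0.1 and 0.9.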

Probability of Error

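Given a feature $x$ of a given sample, the probability of error for a decision $\delta(x) = \omega_i$ is equal to

$$P(\text{error} \mid x) = 1 - P(\omega_i \mid x)$$

The overall probability of error is given by

$$P(\text{error}) = \int P(\text{error} \mid x)\, p(x)\, dx$$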

Optimal Decision Boundaries

The minimal error is obtained by the decision $\delta(x) = \omega_i$ with

$$i = \arg\max_j P(\omega_j \mid x)$$
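As a minimal sketch of this rule, assuming the posterior probabilities for a sample are already available as a list: the decision picks the most probable class, and the conditional error is what remains of the probability mass. The function name is hypothetical.

```python
def bayes_decision(posteriors):
    """Return (index of the most probable class, conditional error probability)."""
    best = max(range(len(posteriors)), key=lambda i: posteriors[i])
    return best, 1.0 - posteriors[best]

index, error = bayes_decision([0.1, 0.7, 0.2])
```

Any other choice of class for this sample would leave a larger conditional error than the one returned here.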

Decision Theory

In the simplest case, a decision consists in assigning to an observation $x$ a class label $\omega_i = \delta(x)$

A natural extension consists in adding a “rejection class” $\omega_R$, so that $\delta(x) \in \{\omega_1, \omega_2, \ldots\} \cup \{\omega_R\}$

In the most general case, the decision results in an action $\alpha_i = \delta(x)$

Optimal Decision Theory

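Let us consider a loss function $\lambda_{ij}$ defining the loss incurred by taking action $\alpha_i$ when the true state of nature is $\omega_j$; usually [formula missing from the transcript]

The risk of taking an action $\alpha_i$ for a particular sample $x$ is

$$R(\alpha_i \mid x) = \sum_j \lambda_{ij}\, P(\omega_j \mid x)$$

The optimal decision consists in choosing the action $\alpha_i$ that minimizes the risk $R(\alpha_i \mid x)$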

Optimal decision

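When $\lambda_{ii} = 0$ and $\lambda_{ij} = 1$ for all $j \neq i$, the optimal decision consists of minimizing the probability of error

The minimal error is obtained by the decision $\delta(x) = \omega_i$ with

$$i = \arg\max_j P(\omega_j \mid x)$$

or equivalently

$$i = \arg\max_j p(x \mid \omega_j)\, P(\omega_j)$$

In the case when all a priori probabilities are equal

$$i = \arg\max_j p(x \mid \omega_j)$$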
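The risk-based decision can be sketched with a loss matrix whose rows are actions and whose columns are true classes. The loss values below are hypothetical. The third matrix illustrates, as an assumption about how a rejection could be modeled, a "reject" action with a constant loss.

```python
def conditional_risks(posterior, loss):
    """Risk of each action: sum over classes of loss[action][class] * posterior[class]."""
    return [sum(l * p for l, p in zip(row, posterior)) for row in loss]

def min_risk_action(posterior, loss):
    """Index of the action with the smallest conditional risk."""
    risks = conditional_risks(posterior, loss)
    return min(range(len(risks)), key=lambda i: risks[i])

posterior = [0.7, 0.3]                         # assumed posteriors for one sample
zero_one = [[0, 1], [1, 0]]                    # every error costs 1
asymmetric = [[0, 10], [1, 0]]                 # hypothetical: missing class 2 costs 10
with_reject = [[0, 1], [1, 0], [0.25, 0.25]]   # third action: reject at fixed cost

a_zero_one = min_risk_action(posterior, zero_one)
a_asymmetric = min_risk_action(posterior, asymmetric)
a_reject = min_risk_action(posterior, with_reject)
```

With the zero-one loss the chosen action coincides with the most probable class. The asymmetric loss moves the decision to class 2 despite its lower posterior, and the rejection action wins whenever the best posterior is below 0.75 for this choice of cost.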

Minimum Risk for Two Classes

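Let $\lambda_{ij} = \lambda(\alpha_i \mid \omega_j)$ be the loss of action $\alpha_i$ when the true state is $\omega_j$

The conditional risk of each decision is expressed as

$$R(\alpha_1 \mid x) = \lambda_{11}\, P(\omega_1 \mid x) + \lambda_{12}\, P(\omega_2 \mid x)$$

$$R(\alpha_2 \mid x) = \lambda_{21}\, P(\omega_1 \mid x) + \lambda_{22}\, P(\omega_2 \mid x)$$

Then, the optimal decision rule becomes: decide $\omega_1$ if

$$(\lambda_{21} - \lambda_{11})\, P(\omega_1 \mid x) > (\lambda_{12} - \lambda_{22})\, P(\omega_2 \mid x)$$

or equivalently

$$(\lambda_{21} - \lambda_{11})\, p(x \mid \omega_1)\, P(\omega_1) > (\lambda_{12} - \lambda_{22})\, p(x \mid \omega_2)\, P(\omega_2)$$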

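And in the case of $\lambda_{11} = \lambda_{22} = 0$, decide $\omega_1$ if

$$\lambda_{21}\, p(x \mid \omega_1)\, P(\omega_1) > \lambda_{12}\, p(x \mid \omega_2)\, P(\omega_2)$$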

Discriminant Functions

In the case of multiple classes, a pattern classifier can be specified by a set of discriminant functions $g_i(x)$ such that the decision $\omega_i$ corresponds to

$$g_i(x) > g_j(x) \quad \text{for all } j \neq i$$

Thus, a Bayesian classifier is naturally represented by

$$g_i(x) = -R(\alpha_i \mid x)$$

The choice of discriminant functions is not unique

$g_i(x)$ can be replaced by $f(g_i(x))$ for any monotonically increasing function $f$

A minimum error-rate classifier can be obtained with

$$g_i(x) = P(\omega_i \mid x)$$
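Because any monotonically increasing transformation leaves the decision unchanged, the logarithm of likelihood times prior is a convenient discriminant in practice. The sketch below assumes one-dimensional Gaussian likelihoods, which the slides do not prescribe; the parameters and function names are hypothetical.

```python
from math import log, pi

def log_discriminant(x, mu, sigma, prior):
    """g(x) = ln p(x|class) + ln P(class) for an assumed 1-D Gaussian likelihood."""
    log_likelihood = -0.5 * ((x - mu) / sigma) ** 2 - log(sigma) - 0.5 * log(2 * pi)
    return log_likelihood + log(prior)

def classify(x, classes):
    """Pick the class index with the largest discriminant value."""
    scores = [log_discriminant(x, mu, s, p) for mu, s, p in classes]
    return max(range(len(scores)), key=lambda i: scores[i])

# Hypothetical (mean, std, prior) triples for two classes
balanced = [(0.0, 1.0, 0.5), (2.0, 1.0, 0.5)]
unbalanced = [(0.0, 1.0, 0.1), (2.0, 1.0, 0.9)]
```

With equal priors the boundary lies midway between the means, so x = 0.4 goes to the first class; with priors 0.1 and 0.9 the same observation is assigned to the second class.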

Bayesian Rule in Higher Dimensions

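The Bayesian rule can easily be generalized to the multidimensional case, where features are represented by a vector $\mathbf{x}$.

$$P(\omega_i \mid \mathbf{x}) = \frac{p(\mathbf{x} \mid \omega_i)\, P(\omega_i)}{p(\mathbf{x})}$$

where

$$p(\mathbf{x}) = \sum_j p(\mathbf{x} \mid \omega_j)\, P(\omega_j)$$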

Conclusion about Bayesian Decision

Bayesian decision theory provides a theoretical framework for statistical pattern recognition

This theory supposes the following probabilistic information to be known:

the number of classes

a priori probabilities of each class

class-conditional feature distributions

The remaining problem is how to estimate all these quantities

feature distributions are hard to estimate

priors are seldom known

even the number of classes is not always given

Performance Evaluation

Performance evaluation is a very important issue in pattern recognition

it gives an objective measure of the performance

it allows different methods to be compared

Performance evaluation requires correctly labeled test data

test data should be different from training data

a common strategy consists in cyclically using 80% of the data for training and the remaining 20% for evaluation, so that each sample serves once as test data (five-fold cross-validation)
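The cyclic 80%/20% strategy can be sketched as below. The way samples are assigned to folds (by index modulo the number of folds) is an assumption; in practice the data are usually shuffled first. The function name is hypothetical.

```python
def cyclic_splits(samples, folds=5):
    """Yield (training, test) pairs; each fold serves once as test data."""
    for k in range(folds):
        test = samples[k::folds]
        train = [s for i, s in enumerate(samples) if i % folds != k]
        yield train, test

data = list(range(10))
splits = list(cyclic_splits(data))
```

Averaging the recognition rate over the five test folds gives an estimate that uses every labeled sample for evaluation while never testing on data seen during training.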

Performance Measures: Recognition / Error Rates

Performance evaluation uses several measures

recognition rate corresponds to the ratio number of correct answers / number of total answers

error rate corresponds to the ratio number of incorrect answers / number of total answers

rejection rate corresponds to the ratio number of rejections / number of total answers

recognition rate = 1 – (rejection rate + error rate)
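These three rates can be computed from a list of classifier answers. In this minimal sketch a rejection is represented by `None`, which is a convention chosen here and not taken from the slides; the sample answers are invented.

```python
def rates(answers):
    """answers: list of (predicted, truth) pairs; predicted is None for a rejection."""
    total = len(answers)
    rejected = sum(1 for pred, _ in answers if pred is None)
    correct = sum(1 for pred, truth in answers if pred == truth)
    return {
        "recognition": correct / total,
        "error": (total - correct - rejected) / total,
        "rejection": rejected / total,
    }

answers = [("a", "a"), ("b", "a"), (None, "b"), ("b", "b"), ("a", "a")]
r = rates(answers)
```

The three values always sum to one, which is the identity stated above.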

Performance Measures: Recall & Precision

On binary decisions (a sample belongs to the class or not) two other measurements are frequently used

recall corresponds to the ratio of correctly assigned samples to the size of the class

precision corresponds to the ratio of correctly assigned samples to the number of assigned samples

Recall and precision tend to change in opposite directions

equal error rate is sometimes considered to be the best trade-off

Additionally, the harmonic mean of precision and recall, called the F-measure, is frequently used
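Recall, precision and F-measure can be sketched for a binary decision as follows. The sketch assumes boolean lists of equal length and at least one correctly assigned sample (otherwise the ratios are undefined); the function name and the sample data are hypothetical.

```python
def precision_recall_f(predicted, actual):
    """predicted[i]: sample i was assigned to the class; actual[i]: it belongs to it."""
    correct = sum(1 for p, a in zip(predicted, actual) if p and a)
    assigned = sum(1 for p in predicted if p)
    class_size = sum(1 for a in actual if a)
    precision = correct / assigned
    recall = correct / class_size
    f_measure = 2 * precision * recall / (precision + recall)  # harmonic mean
    return precision, recall, f_measure

predicted = [True, True, True, False, False, False]
actual = [True, True, False, True, True, False]
precision, recall, f_measure = precision_recall_f(predicted, actual)
```

In this toy case 2 of the 3 assigned samples are correct (precision 2/3) and 2 of the 4 class members are found (recall 1/2), giving an F-measure of 4/7.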
