
Spoken Dialog Systems and Voice XML: Intro to Pattern Recognition

Esther Levin

Dept of Computer Science

CCNY

Some materials used in this course were taken from the textbook “Pattern Classification” by Duda et al., John Wiley & Sons, 2001, with the permission of the authors and the publisher.


Credits and Acknowledgments

  • Materials used in this course were taken from the textbook “Pattern Classification” by Duda et al., John Wiley & Sons, 2001 with the permission of the authors and the publisher; and also from

  • Other material on the web:

    • Dr. A. Aydin Atalan, Middle East Technical University, Turkey

    • Dr. Djamel Bouchaffra, Oakland University

    • Dr. Adam Krzyzak, Concordia University

    • Dr. Joseph Picone, Mississippi State University

    • Dr. Robi Polikar, Rowan University

    • Dr. Stefan A. Robila, University of New Orleans

    • Dr. Sargur N. Srihari, State University of New York at Buffalo

    • David G. Stork, Stanford University

    • Dr. Godfried Toussaint, McGill University

    • Dr. Chris Wyatt, Virginia Tech

    • Dr. Alan L. Yuille, University of California, Los Angeles

    • Dr. Song-Chun Zhu, University of California, Los Angeles


Outline

  • Introduction

    • What is pattern recognition?

  • Background Material

    • Probability theory


PATTERN RECOGNITION AREAS

  • Optical Character Recognition ( OCR)

    • Sorting letters by postal code.

    • Reconstructing text from printed materials (such as reading machines for blind people).

  • Analysis and identification of human patterns

    • Speech and voice recognition.

    • Finger prints and DNA mapping.

  • Banking and insurance applications

    • Credit card applicants classified by income, credit worthiness, mortgage amount, # of dependents, etc.

    • Car insurance (pattern including make of car, # of accidents, age, sex, driving habits, location, etc.).

  • Diagnosis systems

    • Medical diagnosis (disease vs. symptoms classification, X-Ray, EKG and tests analysis, etc).

    • Diagnosis of automotive malfunctioning

  • Prediction systems

    • Weather forecasting (based on satellite data).

    • Analysis of seismic patterns

  • Dating services (where pattern includes age, sex, race, hobbies, income, etc).


More Pattern Recognition Applications

  • Sensory
    • Vision: face / handwriting / hand recognition
    • Speech: speaker / speech recognition
    • Olfaction: is the apple ripe?
  • Data
    • Text categorization
    • Information retrieval
    • Data mining
    • Genome sequence matching


What is a pattern?

“A pattern is the opposite of a chaos; it is an entity vaguely defined, that could be given a name.”


PR Definitions

  • Theory, Algorithms, Systems to Put Patterns into Categories

  • Classification of Noisy or Complex Data

  • Relate Perceived Pattern to Previously Perceived Patterns


Characters

A v t u I h D U w K

Ç ş ğ İ ü Ü Ö Ğ

ع٤٧چك

КЦД

ζωΨΩξθ

נדתשםא



Terminology

  • Features, feature vector

  • Decision boundary

  • Error

  • Cost of error

  • Generalization


A Fishy Example I

  • “Sorting incoming Fish on a conveyor according to species using optical sensing”

  • Salmon or Sea Bass?



  • Problem Analysis

    • Set up a camera and take some sample images to extract features

      • Length

      • Lightness

      • Width

      • Number and shape of fins

      • Position of the mouth, etc…

        This is the set of all suggested features to explore for use in our classifier!


Solution by Stages

  • Preprocess raw data from camera

  • Segment isolated fish

  • Extract features from each fish (length, width, brightness, etc.)

  • Classify each fish



  • Preprocessing

    • Use a segmentation operation to isolate fishes from one another and from the background

  • Information from a single fish is sent to a feature extractor whose purpose is to reduce the data by measuring certain features

  • The features are passed to a classifier


  • Classification

    Select the length of the fish as a possible feature for discrimination


The length is a poor feature alone!

Select the lightness as a possible feature.



“Customers do not want sea bass in their cans of salmon”

  • Threshold decision boundary and cost relationship

  • Move our decision boundary toward smaller values of lightness in order to minimize the cost (reduce the number of sea bass that are classified as salmon!)

    Task of decision theory
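The cost-sensitive threshold choice described above can be sketched in a few lines. Everything here is invented for illustration (the sample lightness values, the cost ratio, and the `best_threshold` helper are not from the slides); it simply shows how an asymmetric cost keeps sea bass out of the salmon cans.

```python
# Hypothetical sketch: pick the lightness threshold that minimizes an
# asymmetric misclassification cost. All numbers below are invented.

def best_threshold(salmon, sea_bass, cost_bass_as_salmon=10.0, cost_salmon_as_bass=1.0):
    """Classify lightness > t as sea bass; try every observed value as t."""
    best_t, best_cost = None, float("inf")
    for t in sorted(salmon + sea_bass):
        cost = (sum(1 for x in salmon if x > t) * cost_salmon_as_bass        # salmon called sea bass
                + sum(1 for x in sea_bass if x <= t) * cost_bass_as_salmon)  # sea bass called salmon
        if cost < best_cost:
            best_t, best_cost = t, cost
    return best_t, best_cost

salmon = [2.0, 3.1, 3.5, 4.0, 4.8]    # salmon tend to be darker
sea_bass = [4.2, 5.0, 5.5, 6.1, 7.0]  # sea bass tend to be lighter
t, c = best_threshold(salmon, sea_bass)
print(t, c)  # 4.0 1.0
```

With a large `cost_bass_as_salmon`, any threshold that lets a sea bass through becomes expensive, which is exactly the pressure toward smaller lightness values described on the slide.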



(Figure: fish samples plotted by lightness and width)





  • We might add other features that are not correlated with the ones we already have. A precaution should be taken not to reduce the performance by adding such “noisy features”

  • Ideally, the best decision boundary should be the one which provides an optimal performance such as in the following figure:







The aim of designing a classifier is to correctly classify novel input.


Decision Boundaries

Observe: Can do much better with two features

Caveat: overfitting!


Occam’s Razor

Entities are not to be multiplied without necessity

William of Occam

(1284-1347)


A Complete PR System


Problem Formulation

Input object → Measurements & Preprocessing → Features → Classification → Class label

  • Basic ingredients:

  • Measurement space (e.g., image intensity, pressure)

  • Features (e.g., corners, spectral energy)

  • Classifier - soft and hard

  • Decision boundary

  • Training sample

  • Probability of error


Pattern Recognition Systems

  • Sensing

    • Use of a transducer (camera or microphone)

    • The PR system depends on the bandwidth, resolution, sensitivity, and distortion of the transducer

  • Segmentation and grouping

    • Patterns should be well separated and should not overlap





  • Feature extraction

    • Discriminative features

    • Invariant features with respect to translation, rotation and scale.

  • Classification

    • Use a feature vector provided by a feature extractor to assign the object to a category

  • Post Processing

    • Exploit context dependent information other than from the target pattern itself to improve performance


The Design Cycle

  • Data collection

  • Feature Choice

  • Model Choice

  • Training

  • Evaluation

  • Computational Complexity





  • Data Collection

    How do we know when we have collected an adequately large and representative set of examples for training and testing the system?



  • Feature Choice

    Depends on the characteristics of the problem domain. Features should be simple to extract, invariant to irrelevant transformations, and insensitive to noise.



  • Model Choice

    We may be unsatisfied with the performance of our linear fish classifier and want to jump to another class of model.



  • Training

    Use data to determine the classifier. There are many different procedures for training classifiers and choosing models.



  • Evaluation

    Measure the error rate (or performance) and switch from one set of features and models to another.



  • Computational Complexity

    What is the trade-off between computational ease and performance?

    (How does an algorithm scale with the number of features, training examples, and categories?)



Learning and Adaptation

  • Learning: Any method that combines empirical information from the environment with prior knowledge into the design of a classifier, attempting to improve performance with time.

  • Empirical information: Usually in the form of training examples.

  • Prior knowledge: Invariances, correlations

  • Supervised learning

    • A teacher provides a category label or cost for each pattern in the training set

  • Unsupervised learning

    • The system forms clusters or “natural groupings” of the input patterns



Syntactic Versus Statistical PR

  • Basic assumption: There is an underlying regularity behind the observed phenomena.

  • Question: Based on noisy observations, what is the underlying regularity?

  • Syntactic: Structure through a common generative mechanism. For example, all the different manifestations of English share a common underlying set of grammatical rules.

  • Statistical: Objects characterized through statistical similarity. For example, all possible digits ‘2’ share some common underlying statistical relationship.


Difficulties

  • Segmentation

  • Context

  • Temporal structure

  • Missing features

  • Aberrant data

  • Noise

Do all these images represent an ‘A’?


Design Cycle

How do we know what features to select, and how do we select them…?

What type of classifier shall we use? Is there a best classifier…?

How do we train…?

How do we combine prior knowledge with empirical data?

How do we evaluate performance…?

How do we validate the results? How much confidence do we have in a decision?


Conclusion

  • I expect you are overwhelmed by the number, complexity and magnitude of the sub-problems of Pattern Recognition

  • Many of these sub-problems can indeed be solved

  • Many fascinating unsolved problems still remain



Toolkit for PR

  • Statistics

  • Decision Theory

  • Optimization

  • Signal Processing

  • Neural Networks

  • Fuzzy Logic

  • Decision Trees

  • Clustering

  • Genetic Algorithms

  • AI Search

  • Formal Grammars

  • ….


Linear Algebra

  • Matrix A: a rectangular array of numbers; the element in row i, column j is denoted aᵢⱼ

  • Matrix transpose Aᵀ: rows and columns interchanged, (Aᵀ)ᵢⱼ = aⱼᵢ

  • Vector a: a single column of numbers, a = (a₁, …, aₙ)ᵀ


Matrix and Vector Multiplication

  • Matrix multiplication

  • Outer vector product

  • Vector-matrix product
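A small numpy illustration of the three products listed above (the matrices and vectors are arbitrary examples, not from the slides):

```python
import numpy as np

A = np.array([[1, 2],
              [3, 4]])
B = np.array([[0, 1],
              [1, 0]])
a = np.array([1, 2])
b = np.array([3, 4])

AB = A @ B              # matrix multiplication
outer = np.outer(a, b)  # outer product: entry (i, j) is a[i] * b[j]
vA = a @ A              # vector-matrix product (row vector times matrix)

print(AB)     # [[2 1]
              #  [4 3]]
print(outer)  # [[3 4]
              #  [6 8]]
print(vA)     # [ 7 10]
```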


Inner Product

  • Inner (dot) product: aᵀb = Σᵢ aᵢbᵢ

  • Length (Euclidean norm) of a vector: ||a|| = √(aᵀa)

  • a is normalized iff ||a|| = 1

  • The angle between two n-dimensional vectors: cos θ = aᵀb / (||a|| ||b||)

  • An inner product is a measure of collinearity:

    • a and b are orthogonal iff aᵀb = 0

    • a and b are collinear iff |aᵀb| = ||a|| ||b||

  • A set of vectors is linearly independent if no vector is a linear combination of the others.
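In code, with two sample vectors chosen so that they are orthogonal:

```python
import numpy as np

a = np.array([3.0, 4.0])
b = np.array([4.0, -3.0])

dot = a @ b                                     # inner (dot) product
norm_a = np.linalg.norm(a)                      # Euclidean length
cos_theta = dot / (norm_a * np.linalg.norm(b))  # cosine of the angle

print(dot)        # 0.0 -> a and b are orthogonal
print(norm_a)     # 5.0
print(cos_theta)  # 0.0
```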


Determinant and Trace

  • Determinant det(A): a scalar associated with a square matrix; A is singular iff det(A) = 0

  • det(AB) = det(A)det(B)

  • Trace tr(A) = Σᵢ aᵢᵢ, the sum of the diagonal elements
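A numerical check of det(AB) = det(A)det(B) and of the trace, with arbitrary example matrices:

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [0.0, 3.0]])   # det(A) = 6
B = np.array([[1.0, 4.0],
              [2.0, 5.0]])   # det(B) = -3

lhs = np.linalg.det(A @ B)
rhs = np.linalg.det(A) * np.linalg.det(B)
print(round(lhs, 6), round(rhs, 6))  # -18.0 -18.0

print(np.trace(A))  # 5.0, the sum of the diagonal elements
```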


Matrix Inversion

  • A (n × n) is nonsingular if there exists B such that AB = BA = I; then B = A⁻¹

  • Example: A = [2 3; 2 2], B = [-1 3/2; 1 -1]

  • A is nonsingular iff det(A) ≠ 0

  • Pseudo-inverse for a non-square matrix: A⁺ = (AᵀA)⁻¹Aᵀ, provided AᵀA is not singular; then A⁺A = I
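The slide's example pair can be checked directly in numpy; the tall matrix `C` below is an invented example for the pseudo-inverse:

```python
import numpy as np

# The slide's example: B is the inverse of A.
A = np.array([[2.0, 3.0],
              [2.0, 2.0]])
B = np.array([[-1.0, 1.5],
              [ 1.0, -1.0]])
print(np.allclose(A @ B, np.eye(2)))     # True: AB = I
print(np.allclose(np.linalg.inv(A), B))  # True

# Pseudo-inverse of a non-square matrix, defined when C^T C is
# nonsingular; np.linalg.pinv computes it via the SVD.
C = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0]])
C_pinv = np.linalg.pinv(C)
print(np.allclose(C_pinv @ C, np.eye(2)))  # True: C+ C = I
```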


Eigenvectors and Eigenvalues

Characteristic equation: Av = λv, i.e. det(A − λI) = 0,

an n-th order polynomial with n roots.
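A quick check with a symmetric 2×2 example matrix:

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 2.0]])
vals, vecs = np.linalg.eigh(A)  # eigh: for symmetric/Hermitian matrices

print(vals)  # [1. 3.] -- the two roots of det(A - lambda*I) = 0
v = vecs[:, 0]
print(np.allclose(A @ v, vals[0] * v))  # True: A v = lambda v
```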


Probability Theory

  • Primary references:

    • Any probability and statistics textbook (e.g., Papoulis)

    • Appendix A.4 in “Pattern Classification” by Duda et al

      The principles of probability theory, describing the behavior of systems with random characteristics, are of fundamental importance to pattern recognition.


Example 1 (Wikipedia)

  • There are two bowls full of cookies.

    • Bowl #1 has 10 chocolate chip cookies and 30 plain cookies,

    • Bowl #2 has 20 of each.

  • Fred picks a bowl at random, and then picks a cookie at random.

    • The cookie turns out to be a plain one.

  • How probable is it that Fred picked it out of bowl #1?

  • That is: what’s the probability that Fred picked bowl #1, given that he has a plain cookie?

    • event A is that Fred picked bowl #1,

    • event B is that Fred picked a plain cookie.

    • Pr(A|B) ?


Example 1 (continued)

Tables of occurrences and relative frequencies

It is often helpful when calculating conditional probabilities to create a simple table containing the number of occurrences of each outcome, or the relative frequencies of each outcome, for each of the independent variables. The tables below illustrate the use of this method for the cookies.

Number of cookies:

              Chocolate chip   Plain   Total
  Bowl #1           10           30      40
  Bowl #2           20           20      40
  Total             30           50      80

Relative frequencies:

              Chocolate chip   Plain   Total
  Bowl #1          0.125       0.375    0.5
  Bowl #2          0.25        0.25     0.5
  Total            0.375       0.625    1.0

The second table is derived from the first by dividing each entry by the total number of cookies under consideration, 80.
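The same answer follows from Bayes' rule in a few lines:

```python
# Bayes' rule for the cookie example: Pr(bowl #1 | plain cookie).
p_bowl1 = p_bowl2 = 0.5          # Fred picks a bowl at random
p_plain_given_b1 = 30 / 40       # bowl #1: 10 chocolate chip, 30 plain
p_plain_given_b2 = 20 / 40       # bowl #2: 20 of each

p_plain = p_plain_given_b1 * p_bowl1 + p_plain_given_b2 * p_bowl2
p_b1_given_plain = p_plain_given_b1 * p_bowl1 / p_plain
print(p_b1_given_plain)  # 0.6
```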


Example 2

  • 1. Power Plant Operation.

    • The variables X, Y, Z describe the state of 3 power plants (X = 0 means plant X is idle).

    • Denote by A the event that plant X is idle, and by B the event that at least 2 of the 3 plants are working.

    • What are P(A) and P(A|B), the probability that X is idle given that at least 2 of the 3 plants are working?



2. Cars are assembled in four possible locations. Plant I supplies 20% of the cars; plant II, 24%; plant III, 25%; and plant IV, 31%. There is a 1-year warranty on every car.

The company collected data that shows

P(claim|plant I) = 0.05; P(claim|plant II) = 0.11;

P(claim|plant III) = 0.03; P(claim|plant IV) = 0.08;

Cars are sold at random.

An owner just submitted a claim for her car. What are the posterior probabilities that this car was made in plant I, II, III and IV?



  • P(claim) = P(claim|plant I)P(plant I) +

    P(claim|plant II)P(plant II) +

    P(claim|plant III)P(plant III) +

    P(claim|plant IV)P(plant IV)

    = 0.05·0.20 + 0.11·0.24 + 0.03·0.25 + 0.08·0.31 = 0.0687

  • P(plant1|claim) =

    = P(claim|plant I) * P(plant I)/P(claim) = 0.146

  • P(plantII|claim) =

    = P(claim|plant II) * P(plant II)/P(claim) = 0.384

  • P(plantIII|claim) =

    = P(claim|plant III) * P(plant III)/P(claim) = 0.109

  • P(plantIV|claim) =

    = P(claim|plant IV) * P(plant IV)/P(claim) = 0.361
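These posteriors can be reproduced in code. One assumption to flag: the transcript's "P(claim|plant IV) = 0.18" appears to be garbled, since the total P(claim) = 0.0687 and the posterior 0.361 both require P(claim|plant IV) = 0.08, which is what this sketch uses.

```python
# Posterior probability of each plant given a warranty claim (Bayes' rule).
priors = {"I": 0.20, "II": 0.24, "III": 0.25, "IV": 0.31}
claim_rate = {"I": 0.05, "II": 0.11, "III": 0.03, "IV": 0.08}  # IV: see note above

p_claim = sum(claim_rate[p] * priors[p] for p in priors)
posteriors = {p: claim_rate[p] * priors[p] / p_claim for p in priors}

print(round(p_claim, 4))  # 0.0687
for plant, post in posteriors.items():
    print(plant, round(post, 3))  # I 0.146, II 0.384, III 0.109, IV 0.361
```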


Example 3

3. It is known that 1% of the population suffers from a particular disease. A blood test has a 97% chance of identifying the disease for a diseased individual, but also has a 6% chance of falsely indicating that a healthy person has the disease.

a. What is the probability that a random person has a positive blood test?

b. If a blood test is positive, what’s the probability that the person has the disease?

c. If a blood test is negative, what’s the probability that the person does not have the disease?



  • A is the event that a person has the disease. P(A) = 0.01; P(A’) = 0.99.

  • B is the event that the test result is positive.

    • P(B|A) = 0.97; P(B’|A) = 0.03;

    • P(B|A’) = 0.06; P(B’|A’) = 0.94;

  • (a) P(B) = P(A) P(B|A) + P(A’)P(B|A’) = 0.01*0.97 +0.99 * 0.06 = 0.0691

  • (b) P(A|B) = P(B|A)*P(A)/P(B) = 0.97 * 0.01/0.0691 = 0.1404

  • (c) P(A’|B’) = P(B’|A’)P(A’)/P(B’)= P(B’|A’)P(A’)/(1-P(B))= 0.94*0.99/(1-.0691)=0.9997
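The three answers, computed directly:

```python
# Example 3 in code: prevalence 1%, sensitivity 97%, false-positive rate 6%.
p_A = 0.01             # P(disease)
p_B_given_A = 0.97     # P(positive | disease)
p_B_given_notA = 0.06  # P(positive | healthy)

p_B = p_B_given_A * p_A + p_B_given_notA * (1 - p_A)              # (a)
p_A_given_B = p_B_given_A * p_A / p_B                             # (b)
p_notA_given_notB = (1 - p_B_given_notA) * (1 - p_A) / (1 - p_B)  # (c)

print(round(p_B, 4))                # 0.0691
print(round(p_A_given_B, 4))        # 0.1404
print(round(p_notA_given_notB, 4))  # 0.9997
```

Despite the test's 97% sensitivity, a positive result implies only about a 14% chance of disease, because the disease is rare: the classic base-rate effect.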


Sums of Random Variables

  • z = x + y; E(z) = E(x) + E(y)

  • Var(z) = Var(x) + Var(y) + 2Cov(x,y)

  • If x, y independent: Var(z) = Var(x) + Var(y)

  • Distribution of z (for independent x, y): the convolution p_z(z) = ∫ p_x(x) p_y(z − x) dx


Examples:

  • x and y are uniform on [0,1]

    • Find p(z=x+y), E(z), Var(z);

  • x is uniform on [-1, 1], and P(y) = 0.5 for y = 0, y = 10; and 0 elsewhere.

    • Find p(z=x+y), E(z), Var(z);
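A Monte Carlo check of the first example (x, y uniform on [0, 1]): theory gives E(z) = 1 and Var(z) = 1/12 + 1/12 = 1/6, and the density of z is triangular on [0, 2].

```python
import random

# Simulate z = x + y for x, y independent uniform on [0, 1].
random.seed(0)
N = 200_000
zs = [random.random() + random.random() for _ in range(N)]

mean = sum(zs) / N
var = sum((z - mean) ** 2 for z in zs) / N
print(round(mean, 2))  # ~1.0
print(round(var, 2))   # ~0.17 (theory: 1/6 = 0.1667)
```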


Normal Distributions

  • Gaussian distribution: p(x) = (1/√(2πσ²)) exp(−(x − μ)²/(2σ²))

  • Mean: μ = E(x)

  • Variance: σ² = E[(x − μ)²]

  • The Central Limit Theorem says that sums of random variables tend toward a normal distribution.

  • Mahalanobis distance: r = |x − μ|/σ
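In one dimension the density and the Mahalanobis distance reduce to a few lines (the example values are invented):

```python
import math

def gaussian_pdf(x, mu, sigma):
    """1-D Gaussian density."""
    return math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))

mu, sigma = 5.0, 2.0
x = 9.0
r = abs(x - mu) / sigma  # scalar Mahalanobis distance
print(r)                 # 2.0: x lies two standard deviations from the mean
print(round(gaussian_pdf(mu, mu, sigma), 4))  # 0.1995, the peak value 1/(sigma*sqrt(2*pi))
```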


Multivariate Normal Density

  • x is a vector of d jointly Gaussian variables with mean μ and covariance matrix Σ:

    p(x) = (2π)^(−d/2) |Σ|^(−1/2) exp(−½ (x − μ)ᵀ Σ⁻¹ (x − μ))

  • Mahalanobis distance: r² = (x − μ)ᵀ Σ⁻¹ (x − μ)

  • All conditionals and marginals are also Gaussian


Bivariate Normal Densities

  • Level curves are ellipses.

    • The x and y widths are determined by the variances, and the eccentricity by the correlation coefficient.

    • The principal axes are the eigenvectors of the covariance matrix, and the width in each direction is the square root of the corresponding eigenvalue.
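The eigen-structure described above, for an example covariance matrix:

```python
import numpy as np

cov = np.array([[3.0, 1.0],
                [1.0, 3.0]])      # example covariance matrix
vals, vecs = np.linalg.eigh(cov)  # eigenvalues in ascending order

print(vals)           # [2. 4.]
print(np.sqrt(vals))  # widths of the level-curve ellipse along its principal axes
print(vecs[:, 1])     # direction of the major axis (largest eigenvalue)
```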


Information Theory

  • Key principles:

    • What is the information contained in a random event?

      • A less probable event contains more information: I(x) = −log p(x)

      • For two independent events, the information adds

  • What is the average information, or entropy, of a distribution? H = −Σ p(x) log p(x)


Examples: uniform distribution (maximum entropy), Dirac distribution (zero entropy).

Mutual information: the reduction in uncertainty about one variable due to knowledge of the other: I(x; y) = H(x) − H(x|y).
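A small entropy sketch (in bits) confirming the two examples: the uniform distribution maximizes entropy, while a Dirac (deterministic) distribution has zero entropy.

```python
import math

def entropy(p):
    """Shannon entropy in bits; terms with p = 0 contribute nothing."""
    return sum(-pi * math.log2(pi) for pi in p if pi > 0)

uniform = [0.25, 0.25, 0.25, 0.25]
dirac = [1.0, 0.0, 0.0, 0.0]

print(entropy(uniform))  # 2.0 bits (log2 of 4 outcomes, the maximum)
print(entropy(dirac))    # 0.0 bits (no uncertainty)
```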

