Pattern Recognition Project


### Pattern Recognition Project

• Data sets

  • Iris (Fisher’s data)

  • Ripley’s data set

  • Handwritten numerals

• Classifiers (MATLAB code)

  • Bayesian

  • SVM

  • K-nearest neighbor

• The iris data published by Fisher (1936) have been widely used for examples in discriminant analysis and cluster analysis.

• The sepal length, sepal width, petal length, and petal width are measured in centimeters on fifty iris specimens from each of three species, Iris setosa, I. versicolor, and I. virginica.

• Attribute Information:

1. sepal length

2. sepal width

3. petal length

4. petal width

5. class:

-- Iris Setosa = 1

-- Iris Versicolour = 2

-- Iris Virginica = 3
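The class coding above can be sketched in a few lines of base MATLAB. This is only an illustration: the variable names and the measurement values are assumptions chosen for the example, not rows quoted from the data set.

```matlab
% Class names in the order of the codes listed above (1, 2, 3).
classNames = {'Iris Setosa', 'Iris Versicolour', 'Iris Virginica'};

% Hypothetical measurements in cm: sepal length, sepal width,
% petal length, petal width.
measurements = [5.1 3.5 1.4 0.2];
speciesName  = 'Iris Setosa';

% strcmp against a cell array returns a logical mask; find turns the
% matching position into the numeric class code.
classCode = find(strcmp(speciesName, classNames));

% One record: four measurements followed by the class code.
record = [measurements, classCode];
disp(record);
```

Storing the class as a number in the fifth column keeps every record a plain numeric row, which is convenient when the whole data set is held in one matrix.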

Example: SVM (MATLAB Tool)

%Load the sample data, which includes Fisher's iris data: 4 measurements on a sample of 150 irises.

load fisheriris

%Create data, a two-column matrix containing sepal length and sepal width measurements for 150 irises.

data = [meas(:,1), meas(:,2)];

%From the species vector, create a new column vector, groups, to classify data into two groups: Setosa and non-Setosa.

groups = ismember(species,'setosa');

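%Randomly select training and test sets (hold-out split), then create a classifier performance object.

[train, test] = crossvalind('holdOut',groups);

cp = classperf(groups);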

%Train an SVM classifier using a linear kernel function and plot the grouped data.

%Note: svmtrain and svmclassify come from older MATLAB releases; newer releases replace them with fitcsvm and predict.

svmStruct = svmtrain(data(train,:),groups(train),'showplot',true);

title(sprintf('Kernel Function: %s',func2str(svmStruct.KernelFunction)),'interpreter','none');

%Use the svmclassify function to classify the test set.

classes = svmclassify(svmStruct,data(test,:),'showplot',true);

%Evaluate the performance of the classifier.

classperf(cp,classes,test); cp.CorrectRate
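For readers without the toolbox, the hold-out split used in the example can be approximated in base MATLAB. The sketch below is not the crossvalind implementation. It assumes that half of the samples are reserved for testing, and the name holdOutFraction is introduced here only for illustration.

```matlab
n = 150;                 % number of irises in the data set
holdOutFraction = 0.5;   % assumed fraction reserved for testing

% Shuffle the sample indices and reserve the first part for testing.
shuffled = randperm(n);
nTest = round(holdOutFraction * n);

test = false(n, 1);
test(shuffled(1:nTest)) = true;   % logical mask of test samples
train = ~test;                    % every other sample is used for training

fprintf('%d training samples, %d test samples\n', sum(train), sum(test));
```

The two logical masks play the same role as the train and test vectors in the example: data(train,:) selects the training rows and data(test,:) the held-out rows.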

• The well-known Ripley data set is a two-class problem in which the data for each class were generated by a mixture of two Gaussian distributions.

• Each sample has two real-valued coordinates (xs and ys) and a class label (yc) that is 0 or 1.

• riply.tra: training set, 250 rows

• riply.tes: test set, 1000 rows
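To make the "mixture of two Gaussians per class" idea concrete, the sketch below generates a synthetic two-class set of the same kind. It does not reproduce Ripley's data: the component means, the standard deviation and the sample counts are illustrative assumptions. Classes are labelled 1 and 2, and samples are stored one per column in a structure with fields X and y.

```matlab
nPerComponent = 50;   % hypothetical number of samples per Gaussian component
sigma = 0.2;          % hypothetical standard deviation, shared by all components

% Hypothetical component means, one column (x; y) per Gaussian.
componentMeans  = [-0.7  0.3 -0.3  0.4;
                    0.3  0.3  0.7  0.7];
componentLabels = [1 1 2 2];   % first two components form class 1, last two class 2

nComponents = numel(componentLabels);
X = zeros(2, nComponents * nPerComponent);
y = zeros(1, nComponents * nPerComponent);

for k = 1:nComponents
    cols = (k-1)*nPerComponent + (1:nPerComponent);
    % Isotropic Gaussian samples centred on the k-th component mean.
    X(:, cols) = repmat(componentMeans(:, k), 1, nPerComponent) ...
                 + sigma * randn(2, nPerComponent);
    y(cols) = componentLabels(k);
end

trn.X = X;   % 2 x N matrix, one sample per column
trn.y = y;   % 1 x N vector of class labels
fprintf('%d samples, %d in class 1, %d in class 2\n', ...
        numel(y), sum(y == 1), sum(y == 2));
```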

Example: Bayesian classifier

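% trn and tst are assumed to be already loaded as structures with fields X (one sample per column) and y (class labels 1 or 2)

inx1 = find(trn.y==1);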

inx2 = find(trn.y==2);

% Estimation of class-conditional distributions by EM

bayes_model.Pclass{1} = emgmm(trn.X(:,inx1),struct('ncomp',2));

bayes_model.Pclass{2} = emgmm(trn.X(:,inx2),struct('ncomp',2));

% Estimation of priors

n1 = length(inx1); n2 = length(inx2);

bayes_model.Prior = [n1 n2]/(n1+n2);

% Evaluation on testing data

ypred = bayescls(tst.X,bayes_model);

cerror(ypred,tst.y)
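The decision rule behind the example can be sketched without the toolbox. To keep it short, this sketch swaps the EM-fitted Gaussian mixtures for a single one-dimensional Gaussian per class. All data values are hypothetical. The priors are estimated from class frequencies exactly as in the example.

```matlab
% Hypothetical 1-D training samples for two classes.
x1 = [1.0 1.2 0.8 1.1];            % class 1
x2 = [3.0 2.8 3.3 3.1 2.9 3.2];    % class 2

% Priors from class frequencies.
n1 = numel(x1); n2 = numel(x2);
prior = [n1 n2] / (n1 + n2);

% Class-conditional densities: one Gaussian per class, fitted by
% maximum likelihood (std(x, 1) normalises by N).
mu = [mean(x1) mean(x2)];
sd = [std(x1, 1) std(x2, 1)];
gauss = @(x, m, s) exp(-0.5 * ((x - m) ./ s).^2) ./ (s * sqrt(2*pi));

% Bayes rule: choose the class with the larger prior-weighted likelihood.
xtest  = [0.9 3.0];
score1 = prior(1) * gauss(xtest, mu(1), sd(1));
score2 = prior(2) * gauss(xtest, mu(2), sd(2));
ypred  = 1 + (score2 > score1);    % 1 or 2 for each test point
disp(ypred);
```

Replacing gauss with a mixture density per class gives the structure of the example above, where each bayes_model.Pclass entry is a two-component mixture.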

Example: Binary SVM

options.ker = 'rbf'; % use RBF kernel

options.arg = 1; % kernel argument

options.C = 10; % regularization constant

% train SVM classifier

model = smo(trn,options);

% visualization

figure;

ppatterns(trn); psvm(model);

ypred = svmclass(tst.X,model); % classify data

cerror(ypred,tst.y) % compute error
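As background for the 'rbf' option, the sketch below computes a Gaussian RBF kernel matrix in base MATLAB. The slides do not state how the toolbox parameterizes the kernel, so the form exp(-0.5*||a-b||^2/sigma^2), with sigma standing in for options.arg, is an assumption to check against the toolbox documentation. The three data points are hypothetical.

```matlab
% Hypothetical data: three 2-D points stored one per column.
X = [0 1 0;
     0 0 2];
sigma = 1;    % kernel width, standing in for options.arg (assumption)

% Pairwise squared Euclidean distances between the columns of X,
% using ||a-b||^2 = ||a||^2 + ||b||^2 - 2*a'*b.
sq = sum(X.^2, 1);
D2 = bsxfun(@plus, sq', sq) - 2 * (X' * X);

% Gaussian RBF kernel matrix: ones on the diagonal, decaying with distance.
K = exp(-0.5 * D2 / sigma^2);
disp(K);
```

A smaller sigma makes the kernel more local and the decision boundary more flexible. The regularization constant C then controls how heavily training errors are penalized.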

Example: K-nearest neighbor classifier

% load training data and setup 8-NN rule

model = knnrule(trn,8);

% visualize decision boundary and training data

figure;

ppatterns(trn);

pboundary(model);

% evaluate classifier

ypred = knnclass(tst.X,model);

cerror(ypred,tst.y)
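The rule itself is simple enough to sketch in base MATLAB. This is not the toolbox implementation: the data are hypothetical, k is set to 3 so that the tiny training set is enough, and ties are broken by mode, which returns the smallest label.

```matlab
% Hypothetical training set: one 2-D sample per column, labels 1 or 2.
trainX = [0.0 0.2 0.1 1.0 1.1 0.9;
          0.0 0.1 0.2 1.0 0.9 1.2];
trainY = [1 1 1 2 2 2];
k = 3;                      % number of neighbours (the example above uses 8)

testX = [0.1 1.0;
         0.1 1.1];          % two query points, one per column

ypred = zeros(1, size(testX, 2));
for i = 1:size(testX, 2)
    % Squared Euclidean distance from query i to every training sample.
    diffs = bsxfun(@minus, trainX, testX(:, i));
    d2 = sum(diffs.^2, 1);

    % Labels of the k closest samples; mode gives the majority vote.
    [~, order] = sort(d2);
    ypred(i) = mode(trainY(order(1:k)));
end
disp(ypred);
```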

• Pen-Based Recognition of Handwritten Digits.

• Examples of numerals collected from 44 different persons.

• The samples written by 30 writers are used for training, cross-validation and writer-dependent testing.

• The digits written by the other 14 are used for writer-independent testing.

• Each writer contributed about 250 digit samples in total, covering the numerals ’0’ to ’9’ (44 × 250 = 11,000, in line with the 10,992 instances in the full data set).

• Number of instances

  • pendigits.txt: 10992 (training and testing combined)

  • pendigits.tra (training): 7494

  • pendigits.tes (testing): 3498

• Number of attributes

  • 16 input attributes + 1 class attribute

• For each attribute

  • All input attributes are integers in the range 0..100

  • The last attribute is the class code 0..9

47,100, 27, 81, 57, 37, 26, 0, 0, 23, 56, 53,100, 90, 40, 98, 8

0, 89, 27,100, 42, 75, 29, 45, 15, 15, 37, 0, 69, 2,100, 6, 2

0, 57, 31, 68, 72, 90,100,100, 76, 75, 50, 51, 28, 25, 16, 0, 1

0,100, 7, 92, 5, 68, 19, 45, 86, 34,100, 45, 74, 23, 67, 0, 4

0, 67, 49, 83,100,100, 81, 80, 60, 60, 40, 40, 33, 20, 47, 0, 1

100,100, 88, 99, 49, 74, 17, 47, 0, 16, 37, 0, 73, 16, 20, 20, 6

0,100, 3, 72, 26, 35, 85, 35,100, 71, 73, 97, 65, 49, 66, 0, 4

0, 39, 2, 62, 11, 5, 63, 0,100, 43, 89, 99, 36,100, 0, 57, 0

13, 89, 12, 50, 72, 38, 56, 0, 4, 17, 0, 61, 32, 94,100,100, 5

57,100, 22, 72, 0, 31, 25, 0, 75, 13,100, 50, 75, 87, 26, 85, 0

74, 87, 31,100, 0, 69, 62, 64,100, 79,100, 38, 84, 0, 18, 1, 9

48, 96, 62, 65, 88, 27, 21, 0, 21, 33, 79, 67,100,100, 0, 85, 8

100,100, 72, 99, 36, 78, 34, 54, 79, 47, 64, 13, 19, 0, 0, 2, 5
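As a quick check of the row format, the sketch below parses the first sample row shown above into its 16 inputs and its class code, then verifies the stated ranges. The variable names are illustrative.

```matlab
% First sample row from the listing: 16 inputs followed by the class code.
row = '47,100, 27, 81, 57, 37, 26, 0, 0, 23, 56, 53,100, 90, 40, 98, 8';

% Read comma-separated numbers; %f skips the padding spaces.
values = sscanf(row, '%f,')';

features = values(1:16);    % input attributes, expected in 0..100
label    = values(17);      % class code, expected in 0..9

% Sanity checks against the attribute description.
assert(numel(values) == 17, 'expected 16 inputs plus 1 class attribute');
assert(all(features >= 0 & features <= 100), 'input out of range 0..100');
assert(any(label == 0:9), 'class code out of range 0..9');

fprintf('digit %d, %d input attributes\n', label, numel(features));
```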

• Install MATLAB on your machine.

• Unzip the .zip files into an arbitrary directory, say $MATLABArsenalRoot.

• Add the path $MATLABArsenalRoot and its subfolders in MATLAB, using the addpath command or the menu File->Set Path.
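The path step can be done from the command line as sketched below. The folder location is a hypothetical placeholder for wherever the archive was unzipped. The guard just avoids adding a folder that does not exist.

```matlab
% Hypothetical install location; replace with the directory you unzipped to.
arsenalRoot = fullfile(pwd, 'MATLABArsenal');

if exist(arsenalRoot, 'dir')
    % genpath lists the folder and all its subfolders; addpath adds them
    % to the search path for the current session.
    addpath(genpath(arsenalRoot));
else
    warning('Folder not found: %s', arsenalRoot);
end
```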

test_classify('classify -t input_file [general_option] [-- EvaluationMethod [evaluation_options]] ... [-- ClassifierWrapper [param]] -- BaseClassifier [param]');

Example 1

• Classify pendigits.txt

• Shuffle the data before classification ('-sf 1')

• 50%-50% train-test split (default)

• Linear-kernel support vector machine

test_classify('classify -t pendigits.txt -sf 1 -- LibSVM -Kernel 0 -CostFactor 3');

Prec:0.979803, Rec:0.979803, Err:0.020197

Confusion matrix:

566 0 10 0 1 0 0 2 0 1

0 547 0 0 0 1 0 0 22 0

10 0 565 1 0 0 0 1 0 0

2 0 0 534 0 4 0 0 0 1

0 0 0 1 557 0 0 0 0 0

0 0 0 1 0 514 1 0 12 3

0 0 0 0 0 0 543 0 1 0

4 0 0 1 1 0 0 562 0 2

0 10 0 0 0 5 0 0 484 1

0 2 0 1 0 8 0 1 0 513
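The reported error can be re-derived from the 10 x 10 matrix of Example 1. The sketch below assumes that the diagonal holds the correctly classified counts. It is a consistency check on the printed numbers, not part of the toolbox output.

```matlab
% Confusion matrix printed for Example 1 (digit classes 0 to 9).
C = [566   0  10   0   1   0   0   2   0   1;
       0 547   0   0   0   1   0   0  22   0;
      10   0 565   1   0   0   0   1   0   0;
       2   0   0 534   0   4   0   0   0   1;
       0   0   0   1 557   0   0   0   0   0;
       0   0   0   1   0 514   1   0  12   3;
       0   0   0   0   0   0 543   0   1   0;
       4   0   0   1   1   0   0 562   0   2;
       0  10   0   0   0   5   0   0 484   1;
       0   2   0   1   0   8   0   1   0 513];

nTotal   = sum(C(:));     % all test samples
nCorrect = trace(C);      % diagonal entries: correctly classified samples
errRate  = 1 - nCorrect / nTotal;

fprintf('Samples: %d, correct: %d, error: %.6f\n', nTotal, nCorrect, errRate);
% prints: Samples: 5496, correct: 5385, error: 0.020197
```

The total of 5496 is half of the 10992 rows in pendigits.txt, which agrees with the default 50%-50% split, and the error agrees with the reported Err:0.020197.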

Classify pendigits.txt

• Train the model using pendigits.tra

• Linear-kernel support vector machine

test_classify(strcat('classify -t pendigits.tra -- Train_Only -m pendigits.libSVM.model -- LibSVM -Kernel 0 -CostFactor 3'));

Error = 0.009608

Classify pendigits.txt

• Test the new data in pendigits.tes using pendigits.libSVM.model

• Linear-kernel support vector machine

test_classify(strcat('classify -t pendigits.tes -- Test_Only -m pendigits.libSVM.model -- LibSVM -Kernel 0 -CostFactor 3'));

Error = 0.069754

Classify pendigits.txt

• Do not shuffle the data ('-sf 0')

• Use the first 7494 rows for training and the rest for testing

• Apply a multi-class classification wrapper

• RBF-kernel SVM_LIGHT support vector machine

test_classify('classify -t pendigits.txt -sf 0 -- train_test_validate -t 7494 -- train_test_multiple_class -- SVM_LIGHT -Kernel 2 -KernelParam 0.01 -CostFactor 3');

Error = 0.047170
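The three error rates reported for the train-only, test-only and RBF runs can be turned back into misclassified counts. The sketch assumes that "Error" is the fraction of misclassified samples in the evaluated set, and that the last run was evaluated on the 10992 - 7494 = 3498 rows left after the training split.

```matlab
% Evaluated set sizes: pendigits.tra, pendigits.tes, and the held-out
% rows of pendigits.txt after the first 7494 are used for training.
nSamples = [7494 3498 10992 - 7494];

% Error rates reported above, in the same order.
errRate = [0.009608 0.069754 0.047170];

% Misclassified samples implied by each rate.
nWrong = round(errRate .* nSamples);
disp(nWrong);    % prints 72 244 165

% Each count reproduces its reported rate to six decimals.
disp(abs(nWrong ./ nSamples - errRate) < 1e-6);
```

Read this way, the linear SVM misclassifies 72 of its own training samples but 244 of the writer-independent test samples, while the RBF SVM_LIGHT run misclassifies 165 of the 3498 held-out rows.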