
Pattern Recognition Project




  1. Pattern Recognition Project

  2. Contents • Data set • Iris (Fisher’s data) • Ripley’s data set • Hand-written numerals • Classifier (MATLAB code) • Bayesian • SVM • K-nearest neighbor

  3. Fisher’s Iris Plants Database • The iris data published by Fisher (1936) have been widely used for examples in discriminant analysis and cluster analysis. • The sepal length, sepal width, petal length, and petal width are measured in centimeters on fifty iris specimens from each of three species, Iris setosa, I. versicolor, and I. virginica. • Download the package from http://chien.csie.ncku.edu.tw/web/course/iris_svm.rar

  4. Attribute Information: 1. sepal length 2. sepal width 3. petal length 4. petal width 5. class: -- Iris Setosa = 1 -- Iris Versicolour = 2 -- Iris Virginica = 3

  5. Examples

  6. Example: SVM (MATLAB Tool)

% Load the sample data, which includes Fisher's iris data: four
% measurements (plus species labels) on a sample of 150 irises.
load fisheriris
% Create data, a two-column matrix containing the sepal length and
% sepal width measurements for the 150 irises.
data = [meas(:,1), meas(:,2)];
% From the species vector, create a new logical column vector, groups,
% to classify data into two groups: setosa and non-setosa.
groups = ismember(species,'setosa');
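
The `ismember(species,'setosa')` step simply builds a logical vector marking which samples belong to the setosa class, turning the three-species problem into a binary one. A minimal Python sketch of the same relabeling (the short `species` list here is a made-up toy sample, not the real 150-row vector):

```python
# Toy species labels standing in for the 150-entry 'species' vector.
species = ['setosa', 'versicolor', 'setosa', 'virginica', 'versicolor']

# Equivalent of MATLAB's ismember(species, 'setosa'):
# True for setosa samples, False for everything else.
groups = [s == 'setosa' for s in species]

print(groups)  # [True, False, True, False, False]
```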

  7.

% Randomly select training and test sets.
[train, test] = crossvalind('holdOut',groups);
cp = classperf(groups);
% Train an SVM classifier using a linear kernel function
% and plot the grouped data.
svmStruct = svmtrain(data(train,:),groups(train),'showplot',true);
title(sprintf('Kernel Function: %s', ...
    func2str(svmStruct.KernelFunction)),'interpreter','none');
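
The `crossvalind('holdOut',...)` call randomly partitions the samples into a training set and a held-out test set. A self-contained Python sketch of the same idea (the function name and 50% default are this sketch's assumptions, not MATLAB's API):

```python
import random

def holdout_split(n, test_fraction=0.5, seed=0):
    """Randomly split indices 0..n-1 into disjoint train/test sets,
    in the spirit of MATLAB's crossvalind('HoldOut', ...)."""
    rng = random.Random(seed)
    idx = list(range(n))
    rng.shuffle(idx)
    n_test = int(round(n * test_fraction))
    test = set(idx[:n_test])
    train = [i for i in range(n) if i not in test]
    return train, sorted(test)

train, test = holdout_split(150)   # 150 samples, as in the iris data
print(len(train), len(test))       # 75 75
```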

  8.

% Use the svmclassify function to classify the test set.
classes = svmclassify(svmStruct,data(test,:),'showplot',true);
% Evaluate the performance of the classifier.
classperf(cp,classes,test);
cp.CorrectRate

  9. Ripley’s data set • The well-known Ripley dataset consists of two classes, where the data for each class have been generated by a mixture of two Gaussian distributions. • Each sample has two real-valued coordinates (xs and ys) and a class label (yc) which is 0 or 1. • riply.tra: 250 rows of the training set • riply.tes: 1000 rows of the test set • Download the package from http://chien.csie.ncku.edu.tw/web/course/stprtool.rar
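
To make the "mixture of two Gaussians per class" structure concrete, here is a small Python sketch that draws a Ripley-style two-class sample. The component centres and standard deviation are illustrative values chosen for this sketch, not Ripley's actual parameters:

```python
import random

def ripley_like_sample(n_per_class, seed=1):
    """Draw a toy two-class data set where each class is a mixture of
    two Gaussians, in the spirit of Ripley's synthetic data."""
    rng = random.Random(seed)
    # Illustrative mixture-component centres (not Ripley's real ones).
    means = {0: [(-0.7, 0.3), (0.3, 0.3)],
             1: [(-0.3, 0.7), (0.4, 0.7)]}
    sd = 0.25
    X, y = [], []
    for cls in (0, 1):
        for _ in range(n_per_class):
            mx, my = rng.choice(means[cls])  # pick one mixture component
            X.append((rng.gauss(mx, sd), rng.gauss(my, sd)))
            y.append(cls)
    return X, y

X, y = ripley_like_sample(125)   # 250 points, like the training set
print(len(X), sorted(set(y)))    # 250 [0, 1]
```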

  10. Example

  11. Example: Bayesian classifier (evaluated on the testing data)

% Load the input training data.
trn = load('riply_trn');
inx1 = find(trn.y==1);
inx2 = find(trn.y==2);
% Estimate the class-conditional distributions by EM.
bayes_model.Pclass{1} = emgmm(trn.X(:,inx1),struct('ncomp',2));
bayes_model.Pclass{2} = emgmm(trn.X(:,inx2),struct('ncomp',2));
% Estimate the priors.
n1 = length(inx1);
n2 = length(inx2);
bayes_model.Prior = [n1 n2]/(n1+n2);
% Evaluate on the testing data.
tst = load('riply_tst');
ypred = bayescls(tst.X,bayes_model);
cerror(ypred,tst.y)
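
The decision rule behind `bayescls` is to pick the class maximizing prior times class-conditional likelihood. A simplified Python sketch of that rule: for brevity it fits a single diagonal Gaussian per class instead of the two-component GMM that `emgmm` estimates, and the toy points below are made up, not the Ripley data:

```python
import math

def fit_gaussian(points):
    """Fit a diagonal Gaussian: per-dimension means and variances."""
    n, d = len(points), len(points[0])
    mean = [sum(p[j] for p in points) / n for j in range(d)]
    var = [sum((p[j] - mean[j]) ** 2 for p in points) / n for j in range(d)]
    return mean, [max(v, 1e-9) for v in var]   # guard against zero variance

def log_likelihood(x, mean, var):
    return sum(-0.5 * (math.log(2 * math.pi * v) + (xi - m) ** 2 / v)
               for xi, m, v in zip(x, mean, var))

def bayes_classify(x, models, priors):
    """Pick the class maximizing log prior + log likelihood."""
    scores = {c: math.log(priors[c]) + log_likelihood(x, *models[c])
              for c in models}
    return max(scores, key=scores.get)

# Tiny separable toy data in place of riply_trn.
class1 = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1)]
class2 = [(1.0, 1.0), (0.9, 1.0), (1.0, 0.9), (0.9, 0.9)]
models = {1: fit_gaussian(class1), 2: fit_gaussian(class2)}
priors = {1: 0.5, 2: 0.5}   # equal priors, as with balanced classes
print(bayes_classify((0.05, 0.05), models, priors))  # 1
print(bayes_classify((0.95, 0.95), models, priors))  # 2
```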

  12. Example: Binary SVM

trn = load('riply_trn');          % load training data
options.ker = 'rbf';              % use RBF kernel
options.arg = 1;                  % kernel argument
options.C = 10;                   % regularization constant
% Train the SVM classifier.
model = smo(trn,options);
% Visualization.
figure; ppatterns(trn); psvm(model);
tst = load('riply_tst');          % load testing data
ypred = svmclass(tst.X,model);    % classify data
cerror(ypred,tst.y)               % compute error

  13. Example: K-nearest neighbor classifier

% Load training data and set up the 8-NN rule.
trn = load('riply_trn');
model = knnrule(trn,8);
% Visualize the decision boundary and training data.
figure;
ppatterns(trn);
pboundary(model);
% Evaluate the classifier.
tst = load('riply_tst');
ypred = knnclass(tst.X,model);
cerror(ypred,tst.y)
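
The k-NN rule itself is short enough to spell out: classify a point by majority vote among its k closest training samples. A self-contained Python sketch on a made-up toy set (standing in for riply_trn; k=3 here rather than the slide's k=8, since the toy set is tiny):

```python
from collections import Counter

def knn_classify(x, X_train, y_train, k=3):
    """Classify x by majority vote among its k nearest training points
    (squared Euclidean distance)."""
    dists = sorted(
        (sum((a - b) ** 2 for a, b in zip(x, p)), label)
        for p, label in zip(X_train, y_train)
    )
    votes = Counter(label for _, label in dists[:k])
    return votes.most_common(1)[0][0]

# Toy two-class training set.
X_train = [(0.0, 0.0), (0.1, 0.1), (0.2, 0.0),
           (1.0, 1.0), (0.9, 1.1), (1.1, 0.9)]
y_train = [1, 1, 1, 2, 2, 2]
print(knn_classify((0.05, 0.05), X_train, y_train, k=3))  # 1
print(knn_classify((1.0, 0.95), X_train, y_train, k=3))   # 2
```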

  14. Hand-written numerals • Pen-Based Recognition of Handwritten Digits. • Examples of numerals collected from 44 different writers. • The samples written by 30 writers are used for training, cross-validation and writer-dependent testing. • The digits written by the other 14 are used for writer-independent testing. • Each writer wrote 250 digit samples (numerals ’0’ to ’9’).

  15. Number of Instances • pendigits.txt: 10992 • pendigits.tra (training): 7494 • pendigits.tes (testing): 3498 • Number of Attributes • 16 input attributes + 1 class attribute • For each attribute: • All input attributes are integers in the range 0..100 • The last attribute is the class code 0..9

  16. Number of examples

  17. Example

 47,100, 27, 81, 57, 37, 26,  0,  0, 23, 56, 53,100, 90, 40, 98,  8
  0, 89, 27,100, 42, 75, 29, 45, 15, 15, 37,  0, 69,  2,100,  6,  2
  0, 57, 31, 68, 72, 90,100,100, 76, 75, 50, 51, 28, 25, 16,  0,  1
  0,100,  7, 92,  5, 68, 19, 45, 86, 34,100, 45, 74, 23, 67,  0,  4
  0, 67, 49, 83,100,100, 81, 80, 60, 60, 40, 40, 33, 20, 47,  0,  1
100,100, 88, 99, 49, 74, 17, 47,  0, 16, 37,  0, 73, 16, 20, 20,  6
  0,100,  3, 72, 26, 35, 85, 35,100, 71, 73, 97, 65, 49, 66,  0,  4
  0, 39,  2, 62, 11,  5, 63,  0,100, 43, 89, 99, 36,100,  0, 57,  0
 13, 89, 12, 50, 72, 38, 56,  0,  4, 17,  0, 61, 32, 94,100,100,  5
 57,100, 22, 72,  0, 31, 25,  0, 75, 13,100, 50, 75, 87, 26, 85,  0
 74, 87, 31,100,  0, 69, 62, 64,100, 79,100, 38, 84,  0, 18,  1,  9
 48, 96, 62, 65, 88, 27, 21,  0, 21, 33, 79, 67,100,100,  0, 85,  8
100,100, 72, 99, 36, 78, 34, 54, 79, 47, 64, 13, 19,  0,  0,  2,  5
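
Each record is therefore 17 comma-separated integers: 16 input attributes followed by the class code. A small Python sketch of a parser (the function name is this sketch's own; the sample line is the first record shown above):

```python
def parse_pendigits_line(line):
    """Split one pendigits record into its 16 integer input attributes
    (values in 0..100) and the trailing class code 0..9."""
    values = [int(v) for v in line.split(',')]
    assert len(values) == 17, "expected 16 attributes + 1 class code"
    return values[:16], values[16]

line = "47,100, 27, 81, 57, 37, 26, 0, 0, 23, 56, 53,100, 90, 40, 98, 8"
features, digit = parse_pendigits_line(line)
print(len(features), digit)  # 16 8
```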

  18. Installation for MATLAB code • Install MATLAB on your machine. • Download the package from http://chien.csie.ncku.edu.tw/web/course/MATLABArsenal.rar • Unzip the archive into an arbitrary directory, say $MATLABArsenalRoot. • Add the path $MATLABArsenalRoot and its subfolders in MATLAB, using the addpath command or the menu File -> Set Path.

  19. How to use classifiers

test_classify('classify -t input_file [general_option] [-- EvaluationMethod [evaluation_options]] ... [-- ClassifierWrapper [param]] -- BaseClassifier [param]');

Example 1: classify pendigits.txt, shuffle the data before classification ('-sf 1'), use the default 50%-50% train-test split, and a linear-kernel support vector machine:

test_classify('classify -t pendigits.txt -sf 1 -- LibSVM -Kernel 0 -CostFactor 3');

Prec: 0.979803, Rec: 0.979803, Err: 0.020197
566   0  10   0   1   0   0   2   0   1
  0 547   0   0   0   1   0   0  22   0
 10   0 565   1   0   0   0   1   0   0
  2   0   0 534   0   4   0   0   0   1
  0   0   0   1 557   0   0   0   0   0
  0   0   0   1   0 514   1   0  12   3
  0   0   0   0   0   0 543   0   1   0
  4   0   0   1   1   0   0 562   0   2
  0  10   0   0   0   5   0   0 484   1
  0   2   0   1   0   8   0   1   0 513
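
The reported error rate can be recovered from the 10x10 confusion matrix: correct classifications lie on the diagonal, so Err = 1 - trace/total. A quick Python check (which row axis is "true" versus "predicted" is an assumption here, but the error rate is the same either way):

```python
# Confusion matrix from Example 1 (rows assumed to be the true class).
cm = [
    [566,   0,  10,   0,   1,   0,   0,   2,   0,   1],
    [  0, 547,   0,   0,   0,   1,   0,   0,  22,   0],
    [ 10,   0, 565,   1,   0,   0,   0,   1,   0,   0],
    [  2,   0,   0, 534,   0,   4,   0,   0,   0,   1],
    [  0,   0,   0,   1, 557,   0,   0,   0,   0,   0],
    [  0,   0,   0,   1,   0, 514,   1,   0,  12,   3],
    [  0,   0,   0,   0,   0,   0, 543,   0,   1,   0],
    [  4,   0,   0,   1,   1,   0,   0, 562,   0,   2],
    [  0,  10,   0,   0,   0,   5,   0,   0, 484,   1],
    [  0,   2,   0,   1,   0,   8,   0,   1,   0, 513],
]
total = sum(sum(row) for row in cm)            # all classified samples
correct = sum(cm[i][i] for i in range(10))     # diagonal = correct
err = 1 - correct / total
print(round(err, 6))  # 0.020197 -- matches the reported Err
```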

  20. Example 2: train the model on pendigits.tra with a linear-kernel support vector machine:

test_classify(strcat('classify -t pendigits.tra -- Train_Only -m pendigits.libSVM.model -- LibSVM -Kernel 0 -CostFactor 3'));

Error = 0.009608

Then test the new data in pendigits.tes using pendigits.libSVM.model:

test_classify(strcat('classify -t pendigits.tes -- Test_Only -m pendigits.libSVM.model -- LibSVM -Kernel 0 -CostFactor 3'));

Error = 0.069754

  21. Example 3: classify pendigits.txt; do not shuffle the data, use the first 7494 samples as training and the rest as testing, apply a multi-class classification wrapper, and use an RBF-kernel SVM_LIGHT support vector machine:

test_classify('classify -t pendigits.txt -sf 0 -- train_test_validate -t 7494 -- train_test_multiple_class -- SVM_LIGHT -Kernel 2 -KernelParam 0.01 -CostFactor 3');

Error = 0.047170
