
MKL for Category Recognition



  1. MKL for Category Recognition Kumar Srijan, Syed Ahsan Ishtiaque

  2. Dataset • 19 categories currently considered • Minimum of 58 images in each category • Average of 101 images per category • The images have been taken from Google Images and supplemented by images from Flickr. http://images.google.com and http://flickr.com

  3.–21. [Sample images for each of the categories, taken from http://images.google.com]

  22. Code Walkthrough – Relevant Files • preprocCal101.sh – Rescales images and renames them according to the code • cal_preprocDatabases.m – Builds the image and ROI databases • cal_preprocVocabularies.m – Prepares the visual-word vocabularies • cal_preprocFeatures.m – Computes features for all the images, projects them onto visual words and builds map files for each • cal_preprocHistograms.m – Prepares histograms for the visual words • cal_preprocKernels.m – Computes training and testing kernel matrices • cal_classAll.m – Final classification

  23. Code Walkthrough – pipeline • Prepare the database: separate training and testing images, define regions of interest and add jitters to them – cal_preprocDatabases • Construct the visual words by vector quantization of local descriptors (bk_calcVocabulary) – cal_preprocVocabularies • Compute and quantize descriptors for training and test images (bk_calcFeatures), project them onto the visual words and produce map files for each – cal_preprocFeatures • Prepare the visual-word histograms – cal_preprocHistograms • Compute the training and testing kernel matrices – cal_preprocKernels • Train the SVM with MKL using one-vs-rest classifiers (bk_trainAppModel), evaluate the SVM on test data (bk_testAppModel), and run on all categories – cal_classAll
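The walkthrough above amounts to a standard bag-of-visual-words pipeline: cluster local descriptors into a vocabulary, quantize each image's descriptors onto the nearest visual word, and build per-image histograms. A minimal Python sketch of those three steps (illustrative only, not the project's MATLAB code; all function names here are hypothetical):

```python
import numpy as np

def build_vocabulary(descriptors, k, iters=10, seed=0):
    """Toy k-means: cluster local descriptors into k visual words."""
    rng = np.random.default_rng(seed)
    centers = descriptors[rng.choice(len(descriptors), k, replace=False)]
    for _ in range(iters):
        # assign every descriptor to its nearest center
        d = np.linalg.norm(descriptors[:, None] - centers[None], axis=2)
        assign = d.argmin(axis=1)
        # move each center to the mean of its assigned descriptors
        for j in range(k):
            pts = descriptors[assign == j]
            if len(pts):
                centers[j] = pts.mean(axis=0)
    return centers

def quantize(descriptors, centers):
    """Project each descriptor onto its nearest visual word."""
    d = np.linalg.norm(descriptors[:, None] - centers[None], axis=2)
    return d.argmin(axis=1)

def histogram(words, k):
    """L1-normalised bag-of-words histogram for one image."""
    h = np.bincount(words, minlength=k).astype(float)
    return h / max(h.sum(), 1.0)
```

The resulting histograms are what the kernel-computation step (cal_preprocKernels in the actual code) compares between images.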

  24. Documentation for modifications and adjustment of parameters for code execution

  25. Changing the number of Training and testing images • Default value is 15. • Change drivers/cal_filenames.txt accordingly – this file contains the names of the images in each category to be processed as training or testing images

  26. Changing the number of Training images • To change the number of final training images, which includes jittered images: in the drivers/cal_conf.m file, change conf.numPos to the desired value. • To change the number of initial training images (without jitters), which are input to the code: in drivers/cal_preprocDatabases.m, change the hard-coded line
  if ni <= 15 % hard coded
  to
  if ni <= conf.numPos % changed; set conf.numPos to your desired value
  so that the block reads:
  if ni <= conf.numPos
  imdb.images(ii).set = imdb.sets.TRAIN ;
  else
  imdb.images(ii).set = imdb.sets.TEST ;
  end

  27. Changing the number of test images • In drivers/cal_setupTrainTest.m, the relevant loop is
  for cl = fieldnames(roidb.classes)'
  selCla = findRois(testRoidb, 'class', char(cl)) ;
  keep(selCla(1 : min(15, length(selCla)))) = true ; % hard coded
  end
  Change the hard-coded line to
  keep(selCla(1 : min(conf.numPos, length(selCla)))) = true ; % changed; you can set it to any desired value

  28. Adding a new Feature • In drivers/cal_conf.m • Add the feature name to conf.featNames • Then specify the properties and parameters for that feature as conf.feat.<your_feature_name>.<parameter> • Then add your extractFn, quantizeFn and clusterFn in the features directory (check the input and output format of each)
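The plug-in contract described above (three function pointers plus a parameter block per feature) can be mocked up as follows. This is an illustrative Python sketch, not the actual MATLAB interface; the registry, the dummy patch feature, and all names below are invented for the example:

```python
import numpy as np

# Hypothetical feature registry mirroring conf.featNames plus
# conf.feat.<name>.<parameter>, extractFn, clusterFn, quantizeFn.
FEATURES = {}

def register_feature(name, extract_fn, cluster_fn, quantize_fn, **params):
    FEATURES[name] = dict(extract=extract_fn, cluster=cluster_fn,
                          quantize=quantize_fn, params=params)

# A dummy feature: raw 2x2 gray patches as descriptors.
def extract_patches(image):
    H, W = image.shape
    return np.stack([image[i:i + 2, j:j + 2].ravel()
                     for i in range(0, H - 1, 2)
                     for j in range(0, W - 1, 2)])

def cluster_first_k(descs, k):
    """Stand-in for the k-means clustering function."""
    return descs[:k]

def nearest_word(descs, centers):
    """Project each descriptor onto its nearest cluster center."""
    d = np.linalg.norm(descs[:, None] - centers[None], axis=2)
    return d.argmin(axis=1)

register_feature("myPatch", extract_patches, cluster_first_k, nearest_word,
                 format="sparse", vocabSize=4, pyrLevels=1)
```

Any feature registered this way can be driven by the same generic pipeline, which is the point of the pointer-based design.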

  29. Parameters • Parameters should include • format – dense or sparse • In the dense format, features are stored on a grid, specifying the x and y pixel coordinates of each column/row of the grid. We then store an "image" whose pixels correspond to grid elements and specify the corresponding visual words. • In the sparse format, one stores a list of visual words and their x, y locations in the image. • extractFn – pointer to the function called to extract the feature
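The two storage formats can be illustrated like this (a Python sketch with made-up sizes; the actual MATLAB data layout may differ):

```python
import numpy as np

# Sparse format: an explicit list of (x, y, visual_word) triples,
# one per detected feature.
sparse_words = [(10, 4, 7), (32, 18, 2), (5, 40, 7)]

# Dense format: words are computed on a regular grid. The grid
# coordinates are stored once, and a small "word image" holds one
# visual-word index per grid cell.
grid_x = np.arange(0, 64, 8)   # x pixel coordinate of each grid column
grid_y = np.arange(0, 48, 8)   # y pixel coordinate of each grid row
word_image = np.zeros((len(grid_y), len(grid_x)), dtype=int)
word_image[2, 3] = 7           # word 7 at pixel (grid_x[3], grid_y[2])
```

Both representations reduce to the same bag-of-words histogram; dense storage simply avoids repeating coordinates that lie on a regular grid.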

  30. Parameters • clusterFn – pointer to the clustering (k-means) function • quantizeFn – pointer to the function used to project onto the k-means clusters • vocabSize – k-means vocabulary size (number of visual words) • numImagesPerClass – number of images per class used to sample features to train the vocabulary with k-means • numFeatsPerImage – number of features per image sampled to train the vocabulary with k-means • compress – generally "false" • pyrLevels – pyramid levels used when building histograms based on this feature

  31. Changing Jitters • Jitters are basic modifications (zooming, flipping and rotating) of an image; in the code they are used to create more training data out of the basic training data, which helps to increase accuracy. • Currently supported jitters are • rp5, rm5, fliplr, fliplr_rp5, fliplr_rm5, zm1, zm2 – these are all combinations of zoom, rotation and flipping only. • To change the jitters to be used, change conf.jitterNames accordingly in the drivers/cal_conf.m file.
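A minimal sketch of how jitters expand the training set (illustrative Python, not the project's code; only a flip and a naive crop-based zoom are shown, since small rotations such as rp5/rm5 would need an image-processing library):

```python
import numpy as np

def jitter_fliplr(img):
    """Horizontal mirror (the 'fliplr' jitter)."""
    return img[:, ::-1]

def jitter_zoom(img, margin):
    """Naive zoom-in: crop `margin` pixels off every side. A real
    implementation would rescale the crop back to the original size;
    that step is omitted here."""
    return img[margin:-margin, margin:-margin]

def make_jitters(img):
    """Expand one training image into several jittered copies."""
    return [img,
            jitter_fliplr(img),
            jitter_zoom(img, 2),
            jitter_fliplr(jitter_zoom(img, 2))]
```

Each jittered copy is treated as an extra training image downstream, which is why the final training-image count (conf.numPos) exceeds the initial count.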

  32. Changing Features • Current features supported are • gb – Sparse Geometric-Blur words • gist • bow – Sparse SIFT words, Bag of Words • phog180, phog360 – Dense edge-based shape • phowColor, phowGray – Dense SIFT words • For changing features to be used, in drivers/cal_conf.m file – change conf.featNames accordingly • For using bow feature, also use cal_preprocDiscrimScores after cal_preprocFeatures step.

  33. Changing the weight learning method • Current learning methods supported are • Manik • equalMean – It means that the weights are set to the inverse of the average of the kernel matrices. It is a simple heuristic whose only purpose is to "balance" the kernels when you combine them additively. • For changing the weight learning method, in drivers/cal_conf.m file – change conf.learnWeightMethod accordingly.
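The equalMean heuristic described above can be sketched directly (illustrative Python; the exact normalisation in the MATLAB code may differ):

```python
import numpy as np

def equal_mean_weights(kernels):
    """Set each kernel's weight to the inverse of its mean entry, so
    the kernels are on a comparable scale before additive combination."""
    return np.array([1.0 / K.mean() for K in kernels])

def combine(kernels, weights):
    """Additive multiple-kernel combination."""
    return sum(w * K for w, K in zip(weights, kernels))
```

After weighting, every kernel contributes entries with mean 1, so no single feature channel dominates the combined kernel purely because of its scale.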

  34. Obtaining Results • Calculate SVM score for the image for all the classes. • The image is assigned the class which has the highest score. • Use this information to create the confusion matrix. • Use confusion matrix to calculate the final accuracy.
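The four steps above can be sketched as follows (illustrative Python, not the project's code):

```python
import numpy as np

def predict(scores):
    """scores[i, c] = SVM score of test image i for class c;
    assign each image the class with the highest score."""
    return scores.argmax(axis=1)

def confusion_matrix(true, pred, n_classes):
    """M[t, p] = number of images of true class t predicted as p."""
    M = np.zeros((n_classes, n_classes), dtype=int)
    for t, p in zip(true, pred):
        M[t, p] += 1
    return M

def accuracy(conf):
    """Final accuracy: correct predictions (diagonal) over total."""
    return conf.trace() / conf.sum()
```

The off-diagonal entries of the confusion matrix are what the analysis slides below read off, e.g. the horse-vs-frog and kingfisher-vs-frog confusions.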

  35. Code Execution - I • In this execution, we have taken 10 classes: • badge • bulb • camera • cell • frog • horse • keyboard • kingfisher • locket • moon • 15 train + 15 test images were used for the execution of the code

  36. Kernel Matrices [figures: echi2_phowGray_L0, echi2_phowGray_L1, echi2_phowGray_L2, el2_gb]
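The kernel names suggest exponentiated chi-squared kernels over feature histograms (echi2_*) at spatial-pyramid levels L0 to L2, plus an L2-based kernel on the geometric-blur feature (el2_gb). Assuming that reading, an exponentiated chi-squared kernel can be sketched as:

```python
import numpy as np

def chi2_dist(h1, h2, eps=1e-10):
    """Symmetric chi-squared distance between two histograms."""
    return 0.5 * np.sum((h1 - h2) ** 2 / (h1 + h2 + eps))

def exp_chi2_kernel(H, gamma=1.0):
    """K[i, j] = exp(-gamma * chi2(H[i], H[j])) for each pair of
    rows (per-image histograms) of H."""
    n = len(H)
    K = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            K[i, j] = np.exp(-gamma * chi2_dist(H[i], H[j]))
    return K
```

The resulting matrix is symmetric with a unit diagonal, which matches the block-like structure visible in kernel-matrix plots: images of the same category tend to form bright blocks of high similarity.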

  37. Aggregate SVM Scores [figure: categories × test images, colour scale from lowest to highest score]

  38. Confusion matrix

  39. Confusion Matrix [figure: category × category, colour scale from lowest to highest score]

  40. Analysis • Overall accuracy is 61%. • Moon and keyboard have very high classification rates – they have relatively low intra-class variance. • Cell phone, frog and kingfisher have very low classification rates. • There is appreciable confusion between horse and frog, and between kingfisher and frog; all are found in natural surroundings, which possibly creates the confusion. • Artificial objects are not confused with natural ones very frequently.

  41. Code Execution - II • In this execution, we have taken 19 classes. • badge • bulb • camera • cell • frog • horse • keyboard • kingfisher • locket • moon • owl • photo • piggy • pliers • remote • shirt • shoe • spoon • sunflower • 15 train + 15 test images were used for the execution of the code

  42. Kernel Matrices [figures: echi2_phowGray_L0, echi2_phowGray_L1, echi2_phowGray_L2, el2_gb]

  43. Aggregate SVM Scores [figure: categories × test images, colour scale from lowest to highest score]

  44. Confusion Matrix [figure: category × category, colour scale from lowest to highest score]

  45. Confusion Matrix [figure: category × category, colour scale from lowest to highest score]

  46. Analysis • Overall accuracy is 50.5% (lower than in the 10-category classification). • Moon, keyboard and shirt have very high classification rates. • Cell phone, frog and kingfisher have very low classification rates. • There is appreciable confusion between photo-frame and cell phone, and between kingfisher and frog.

  47. Analysis • The classification of bulb was good in the 10-category case, but very bad in the 19-category case. • Similar-looking objects (low inter-class difference) like camera, cell phone, remote control and photo frame are more likely to be confused amongst themselves than with other groups.

  48. Code Execution - III • In this execution, we have taken 19 classes. • badge • bulb • camera • cell • frog • horse • keyboard • kingfisher • locket • moon • owl • photo • piggy • pliers • remote • shirt • shoe • spoon • sunflower • 20 train + 15 test images were used for the execution of the code

  49. Kernel Matrices [figures: echi2_phowGray_L0, echi2_phowGray_L1, echi2_phowGray_L2, el2_gb]

  50. Aggregate SVM Scores [figure, colour scale from lowest to highest score]
