
MKL for Category Recognition



  1. MKL for Category Recognition Kumar Srijan, Syed Ahsan Ishtiaque

  2. Dataset • 19 categories currently considered • Minimum of 58 images in each category • Average of 101 images per category • The images have been taken from Google Images and supplemented by images from Flickr. http://images.google.com and http://flickr.com

  3.–21. [Sample images for each of the categories, taken from http://images.google.com]

  22. Code Walkthrough – Relevant Files • preprocCal101.sh – Rescales images and renames them according to the code • cal_preprocDatabases.m – Builds the image and ROI databases • cal_preprocVocabularies.m – Prepares the visual-word vocabularies • cal_preprocFeatures.m – Computes features for all the images, projects them onto visual words and builds map files for each • cal_preprocHistograms.m – Prepares histograms for the visual words • cal_preprocKernels.m – Computes training and testing kernel matrices • cal_classAll.m – Final classification

  23. Code Walkthrough – pipeline • Prepare the database: separate training and testing images, define regions of interest and add jitters to them – cal_preprocDatabases • Construct the visual words by vector quantization of local descriptors (bk_calcVocabulary) – cal_preprocVocabularies • Compute and quantize descriptors for training and test images (bk_calcFeatures), project them onto the visual words and produce map files for each – cal_preprocFeatures • Prepare the visual-word histograms – cal_preprocHistograms • Compute the training and testing kernel matrices – cal_preprocKernels • Train the SVM with MKL using one-vs-rest classifiers (bk_trainAppModel), evaluate the SVM on test data (bk_testAppModel), and run on all categories – cal_classAll
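The walkthrough above amounts to a standard bag-of-visual-words pipeline: cluster local descriptors into a vocabulary, quantize each image's descriptors onto the nearest visual word, and build per-image histograms. A minimal Python sketch of those three steps (illustrative only, not the project's MATLAB code; all function names here are hypothetical):

```python
import numpy as np

def build_vocabulary(descriptors, k, iters=10, seed=0):
    """Toy k-means: cluster local descriptors into k visual words."""
    rng = np.random.default_rng(seed)
    centers = descriptors[rng.choice(len(descriptors), k, replace=False)]
    for _ in range(iters):
        # assign every descriptor to its nearest center
        d = np.linalg.norm(descriptors[:, None] - centers[None], axis=2)
        assign = d.argmin(axis=1)
        # move each center to the mean of its assigned descriptors
        for j in range(k):
            pts = descriptors[assign == j]
            if len(pts):
                centers[j] = pts.mean(axis=0)
    return centers

def quantize(descriptors, centers):
    """Project each descriptor onto its nearest visual word."""
    d = np.linalg.norm(descriptors[:, None] - centers[None], axis=2)
    return d.argmin(axis=1)

def histogram(words, k):
    """L1-normalised bag-of-words histogram for one image."""
    h = np.bincount(words, minlength=k).astype(float)
    return h / max(h.sum(), 1.0)
```

The resulting histograms are what the kernel-computation step (cal_preprocKernels in the actual code) compares between images.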

  24. Documentation for modifications and adjustment of parameters for code execution

  25. Changing the number of Training and testing images • Default value is 15. • Change drivers/cal_filenames.txt accordingly – this file contains the names of the images in each category to be processed as training or testing images

  26. Changing the number of Training images • To change the number of final training images, which includes jittered images: in the drivers/cal_conf.m file, change conf.numPos to the desired value. • To change the number of initial training images (without jitters), which are input to the code: in drivers/cal_preprocDatabases.m, change the hard-coded line
  if ni <= 15 % hard coded
  to
  if ni <= conf.numPos % changed; set conf.numPos to your desired value
  so that the block reads:
  if ni <= conf.numPos
  imdb.images(ii).set = imdb.sets.TRAIN ;
  else
  imdb.images(ii).set = imdb.sets.TEST ;
  end

  27. Changing the number of test images • In drivers/cal_setupTrainTest.m, the relevant loop is
  for cl = fieldnames(roidb.classes)'
  selCla = findRois(testRoidb, 'class', char(cl)) ;
  keep(selCla(1 : min(15, length(selCla)))) = true ; % hard coded
  end
  Change the hard-coded line to
  keep(selCla(1 : min(conf.numPos, length(selCla)))) = true ; % changed; you can set it to any desired value

  28. Adding a new Feature • In drivers/cal_conf.m • Add the feature name to conf.featNames • Then specify the properties and parameters for that feature as conf.feat.<your_feature_name>.<parameter> • Then add your extractFn, quantizeFn and clusterFn in the features directory (check the input and output format of each)
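The plug-in contract described above (three function pointers plus a parameter block per feature) can be mocked up as follows. This is an illustrative Python sketch, not the actual MATLAB interface; the registry, the dummy patch feature, and all names below are invented for the example:

```python
import numpy as np

# Hypothetical feature registry mirroring conf.featNames plus
# conf.feat.<name>.<parameter>, extractFn, clusterFn, quantizeFn.
FEATURES = {}

def register_feature(name, extract_fn, cluster_fn, quantize_fn, **params):
    FEATURES[name] = dict(extract=extract_fn, cluster=cluster_fn,
                          quantize=quantize_fn, params=params)

# A dummy feature: raw 2x2 gray patches as descriptors.
def extract_patches(image):
    H, W = image.shape
    return np.stack([image[i:i + 2, j:j + 2].ravel()
                     for i in range(0, H - 1, 2)
                     for j in range(0, W - 1, 2)])

def cluster_first_k(descs, k):
    """Stand-in for the k-means clustering function."""
    return descs[:k]

def nearest_word(descs, centers):
    """Project each descriptor onto its nearest cluster center."""
    d = np.linalg.norm(descs[:, None] - centers[None], axis=2)
    return d.argmin(axis=1)

register_feature("myPatch", extract_patches, cluster_first_k, nearest_word,
                 format="sparse", vocabSize=4, pyrLevels=1)
```

Any feature registered this way can be driven by the same generic pipeline, which is the point of the pointer-based design.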

  29. Parameters • Parameters should include • format – dense or sparse • In the dense format, features are stored on a grid, specifying the x and y pixel coordinates of each column/row of the grid. We then store an "image" whose pixels correspond to grid elements and specify the corresponding visual words. • In the sparse format, one stores a list of visual words and their x, y locations in the image. • extractFn – pointer to the function called to extract the feature
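The two storage formats can be illustrated like this (a Python sketch with made-up sizes; the actual MATLAB data layout may differ):

```python
import numpy as np

# Sparse format: an explicit list of (x, y, visual_word) triples,
# one per detected feature.
sparse_words = [(10, 4, 7), (32, 18, 2), (5, 40, 7)]

# Dense format: words are computed on a regular grid. The grid
# coordinates are stored once, and a small "word image" holds one
# visual-word index per grid cell.
grid_x = np.arange(0, 64, 8)   # x pixel coordinate of each grid column
grid_y = np.arange(0, 48, 8)   # y pixel coordinate of each grid row
word_image = np.zeros((len(grid_y), len(grid_x)), dtype=int)
word_image[2, 3] = 7           # word 7 at pixel (grid_x[3], grid_y[2])
```

Both representations reduce to the same bag-of-words histogram; dense storage simply avoids repeating coordinates that lie on a regular grid.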

  30. Parameters • clusterFn – pointer to the clustering (k-means) function • quantizeFn – pointer to the function used to project onto the k-means clusters • vocabSize – k-means vocabulary size (number of visual words) • numImagesPerClass – number of images per class used to sample features to train the vocabulary with k-means • numFeatsPerImage – number of features per image sampled to train the vocabulary with k-means • compress – generally "false" • pyrLevels – pyramid levels used when building histograms based on this feature

  31. Changing Jitters • Jitters are basic modifications (zooming, flipping and rotating) of an image; in the code they are used to create more training data out of the basic training data, which helps to increase accuracy. • Currently supported jitters are • rp5, rm5, fliplr, fliplr_rp5, fliplr_rm5, zm1, zm2 – these are all combinations of zoom, rotation and flipping only. • To change the jitters to be used, change conf.jitterNames accordingly in the drivers/cal_conf.m file.
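A minimal sketch of how jitters expand the training set (illustrative Python, not the project's code; only a flip and a naive crop-based zoom are shown, since small rotations such as rp5/rm5 would need an image-processing library):

```python
import numpy as np

def jitter_fliplr(img):
    """Horizontal mirror (the 'fliplr' jitter)."""
    return img[:, ::-1]

def jitter_zoom(img, margin):
    """Naive zoom-in: crop `margin` pixels off every side. A real
    implementation would rescale the crop back to the original size;
    that step is omitted here."""
    return img[margin:-margin, margin:-margin]

def make_jitters(img):
    """Expand one training image into several jittered copies."""
    return [img,
            jitter_fliplr(img),
            jitter_zoom(img, 2),
            jitter_fliplr(jitter_zoom(img, 2))]
```

Each jittered copy is treated as an extra training image downstream, which is why the final training-image count (conf.numPos) exceeds the initial count.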

  32. Changing Features • Current features supported are • gb – Sparse Geometric-Blur words • gist • bow – Sparse SIFT words, Bag of Words • phog180, phog360 – Dense edge-based shape • phowColor, phowGray – Dense SIFT words • For changing features to be used, in drivers/cal_conf.m file – change conf.featNames accordingly • For using bow feature, also use cal_preprocDiscrimScores after cal_preprocFeatures step.

  33. Changing the weight learning method • Current learning methods supported are • Manik • equalMean – It means that the weights are set to the inverse of the average of the kernel matrices. It is a simple heuristic whose only purpose is to "balance" the kernels when you combine them additively. • For changing the weight learning method, in drivers/cal_conf.m file – change conf.learnWeightMethod accordingly.
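The equalMean heuristic described above can be sketched directly (illustrative Python; the exact normalisation in the MATLAB code may differ):

```python
import numpy as np

def equal_mean_weights(kernels):
    """Set each kernel's weight to the inverse of its mean entry, so
    the kernels are on a comparable scale before additive combination."""
    return np.array([1.0 / K.mean() for K in kernels])

def combine(kernels, weights):
    """Additive multiple-kernel combination."""
    return sum(w * K for w, K in zip(weights, kernels))
```

After weighting, every kernel contributes entries with mean 1, so no single feature channel dominates the combined kernel purely because of its scale.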

  34. Obtaining Results • Calculate SVM score for the image for all the classes. • The image is assigned the class which has the highest score. • Use this information to create the confusion matrix. • Use confusion matrix to calculate the final accuracy.
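The four steps above can be sketched as follows (illustrative Python, not the project's code):

```python
import numpy as np

def predict(scores):
    """scores[i, c] = SVM score of test image i for class c;
    assign each image the class with the highest score."""
    return scores.argmax(axis=1)

def confusion_matrix(true, pred, n_classes):
    """M[t, p] = number of images of true class t predicted as p."""
    M = np.zeros((n_classes, n_classes), dtype=int)
    for t, p in zip(true, pred):
        M[t, p] += 1
    return M

def accuracy(conf):
    """Final accuracy: correct predictions (diagonal) over total."""
    return conf.trace() / conf.sum()
```

The off-diagonal entries of the confusion matrix are what the analysis slides below read off, e.g. the horse-vs-frog and kingfisher-vs-frog confusions.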

  35. Code Execution - I • In this execution, we have taken 10 classes: • badge • bulb • camera • cell • frog • horse • keyboard • kingfisher • locket • moon • 15 train + 15 test images were used for the execution of the code

  36. Kernel Matrices [figures: echi2_phowGray_L0, echi2_phowGray_L1, echi2_phowGray_L2, el2_gb]
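The kernel names suggest exponentiated chi-squared kernels over feature histograms (echi2_*) at spatial-pyramid levels L0 to L2, plus an L2-based kernel on the geometric-blur feature (el2_gb). Assuming that reading, an exponentiated chi-squared kernel can be sketched as:

```python
import numpy as np

def chi2_dist(h1, h2, eps=1e-10):
    """Symmetric chi-squared distance between two histograms."""
    return 0.5 * np.sum((h1 - h2) ** 2 / (h1 + h2 + eps))

def exp_chi2_kernel(H, gamma=1.0):
    """K[i, j] = exp(-gamma * chi2(H[i], H[j])) for each pair of
    rows (per-image histograms) of H."""
    n = len(H)
    K = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            K[i, j] = np.exp(-gamma * chi2_dist(H[i], H[j]))
    return K
```

The resulting matrix is symmetric with a unit diagonal, which matches the block-like structure visible in kernel-matrix plots: images of the same category tend to form bright blocks of high similarity.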

  37. Aggregate SVM Scores [figure: categories × test images, colour scale from lowest to highest score]

  38. Confusion matrix

  39. Confusion Matrix [figure: category × category, colour scale from lowest to highest score]

  40. Analysis • Overall accuracy is 61%. • Moon and keyboard have very high classification rates – they have relatively low intra-class variance. • Cell phone, frog and kingfisher have very low classification rates. • There is appreciable confusion between horse and frog, and between kingfisher and frog; all are found in natural surroundings, which possibly creates the confusion. • Artificial objects are not confused with natural ones very frequently.

  41. Code Execution - II • In this execution, we have taken 19 classes. • badge • bulb • camera • cell • frog • horse • keyboard • kingfisher • locket • moon • owl • photo • piggy • pliers • remote • shirt • shoe • spoon • sunflower • 15 train + 15 test images were used for the execution of the code

  42. Kernel Matrices [figures: echi2_phowGray_L0, echi2_phowGray_L1, echi2_phowGray_L2, el2_gb]

  43. Aggregate SVM Scores [figure: categories × test images, colour scale from lowest to highest score]

  44. Confusion Matrix [figure: category × category, colour scale from lowest to highest score]

  45. Confusion Matrix [figure: category × category, colour scale from lowest to highest score]

  46. Analysis • Overall accuracy is 50.5% (lower than in the 10-category classification). • Moon, keyboard and shirt have very high classification rates. • Cell phone, frog and kingfisher have very low classification rates. • There is appreciable confusion between photo-frame and cell phone, and between kingfisher and frog.

  47. Analysis • The classification of bulb was good in the 10-category case, but very bad in the 19-category case. • Similar-looking objects (low inter-class difference) like camera, cell phone, remote control and photo frame are more likely to be confused amongst themselves than with other groups.

  48. Code Execution - III • In this execution, we have taken 19 classes. • badge • bulb • camera • cell • frog • horse • keyboard • kingfisher • locket • moon • owl • photo • piggy • pliers • remote • shirt • shoe • spoon • sunflower • 20 train + 15 test images were used for the execution of the code

  49. Kernel Matrices [figures: echi2_phowGray_L0, echi2_phowGray_L1, echi2_phowGray_L2, el2_gb]

  50. Aggregate SVM Scores [figure, colour scale from lowest to highest score]
