Scalable Image Matching: Implementation of Vocabulary Tree and Inverted Index

Scalable Image Matching Mid Project Report David Strickland ENGN 256 Spring 2013

Reference Paper: Inverted Index Compression for Scalable Image Matching • Chen, D.M.; Tsai, S.S.; Chandrasekhar, V.; Takacs, G.; Vedantham, R.; Grzeszczuk, R.; Girod, B., "Inverted Index Compression for Scalable Image Matching," Data Compression Conference (DCC), 2010 , vol., no., pp.525,525, 24-26 March 2010

Vocabulary Tree + Inverted Index • The Vocabulary Tree is a tree-structured vector quantizer constructed by hierarchical k-means clustering of feature descriptors. • Inverted Index: Each node has two lists • Image IDs • Array of counts 1Image from Chen et al.

Process Recap • Detect Features • Extract Feature Locations and Descriptors • Quantize Descriptors into a Vocabulary Tree • Score Database Images using Inverted Index • Pairwise Match on top scoring Images

Schedule I: VT/II Implementation • Week 1: Research Vocabulary Tree / Inverted Index, Determine which libraries to use • Week 2: Implement Feature Locator/Descriptors • Week 3: Implement Quantization of Descriptors in VT • Week 4: Implement Database scoring scheme using Inverted Index • Week 5: Milestone: Mid Project Presentation, Combine Previous parts, Pairwise Match to retrieve a single image

Library Choices • VLFeat • Includes hierarchical integer k means methods • VLFeat is available for MATLAB or C • Also includes SIFT and dense SIFT feature detection • FreeImage • C++ library that handles image input/output • OpenSURF • C++ SURF implementation • OpenCV • Required by OpenSURF • Dirent.h • Provides POSIX bindings for windows C++, useful for file I/O

Language Choice: MATLAB or C++ • MATLAB • + I/O is simple • + Integration is easy • + VLFeat tutorials are all for the MATLAB bindings • + Data is easy to handle, array manipulation is simple • - Proprietary, would require MATLAB to use/modify • - Not as fast as C++ • C++ • + No license required • + Extremely fast, as you control everything • - Integration is difficult (different data structure schemes) • - You have to control everything • Language Choice: C++ • Builds character

Feature Detection • SURF • C++: OpenSURF • Requires OpenCV, which handles the image I/O • SURF features used in the paper (Chen et al.) • SIFT • Dense SIFT feature detection • Included in VLFeat library • Same # of features for every image, but large # of features • Analysis/Comparison of DTII results when using SURF features vs DSIFT features would provide new/useful information

Pairwise Image Matching • Simple Nearest Neighbor algorithms can be applied to the features of the two images to calculate the number of matching images • Whichever image has the highest fraction of matches is the best match • Methods for this are included in some of the libraries • The Dictionary Tree + Inverted Index scoring method will produce the similarity scores for all the database images • The highest scoring images can then be scored using pairwise matching to find the best image match

Dictionary Tree Creation • HIKM – hierarchical k means clustering is done via VLFeat’s library • All the feature descriptors are used to create hierarchical clusters • Creates the dictionary tree, each node is a cluster center 1Image from http://www.vlfeat.org/overview/hikm.html

Inverted Index Creation • Inverted Index • The descriptors of each feature of each image traverse the tree to build the inverted index • Each leaf node visited adds to the associated image’s array of counts • Each leaf node has its own inverted index 1Image from Chen et al.

Similarity Scoring & Memory Usage • When matching an image, each image ik1 in the database of N images is given a similarity score • For each node visited by query descriptors the node’s inverted list of images all have the scores incremented : Where:

Current Progress • Implemented: • Image I/O, SURF feature detection • Read/Write SURF features to file • HIKM (Dictionary Tree) Creation • Inverted Index Creation • Image Scoring Methods • Pairwise matching for SURF method • Combined everything together for Image Matching via DTII • To Do: • Read/Write HIKM tree to file • Read/Write Inverted Index to file • Compression • Add additional feature types (e.g. dsift) + analysis

Challenges • File I/O needed to be handled manually in C++ • Different libraries format data differently • Very little documentation available for some libraries • Dictionary Tree & Inverted Index creation times are large • Finding SURF features for every image is time consuming • DT + II is very large, memory management is important • Saving Information (variables etc.) to disk is non-trivial

Schedule II: Compression & Analysis • Week 6: Inverted Index Image ID storage • Week 7: DSIFT feature version of the DT+II • Week 8: Soft Binned Tree, Analysis • Week 9: Final Project Presentation

Inverted Index Compression • Encode each inverted index’s Image IDs by consecutive differences • Inverted index compression techniques can significantly reduce memory usage by up to 5X without any loss in recognition accuracy • Reorder database to minimize differences • Minimize:

Soft-binned feature descriptor histograms • Classify a feature descriptor to k nearest tree nodes instead of just nearest tree node • Soft-binned tree gives improvement in classification accuracy • Disadvantage: • Each database feature now appears in k different inverted lists • Results in larger lists

References • 1David M. Chen, Sam S. Tsai, Vijay Chandrasekhar, Gabriel Takacs, Ramakrishna Vedantham, Radek Grzeszczuk, Bernd Girod, “Inverted Index Compression for Scalable Image Matching”

Scalable Image Matching: Implementation of Vocabulary Tree and Inverted Index