Recent Advances of Compact Hashing for Large-Scale Visual Search

Shih-Fu Chang, Columbia University, October 2012
Joint work with Junfeng He (Facebook), Sanjiv Kumar (Google), Wei Liu (IBM Research), and Jun Wang (IBM Research)

Outline
• Lessons learned in designing hashing functions
• The importance of balancing hash bucket size
• How to incorporate supervised information
• Prediction of NN search difficulty & hashing performance
• Demo: Bag of hash bits for Mobile Visual Search

Fast Nearest Neighbor Search
• Applications: image search, texture synthesis, denoising, …
• Avoid exhaustive search (O(n) time complexity)
[Figure panels: image search; dense matching with coherence sensitive hashing (Korman & Avidan ’11); Photo Tourism patch search]

Locality-Sensitive Hashing [Indyk and Motwani 1998] [Datar et al. 2004]
• Hash code collision probability is proportional to the original similarity
• l: # hash tables, K: hash bits per table
[Figure: random hash functions assign 0/1 bits to each point; points are indexed by the resulting compact code, e.g. 101]
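The idea of l tables with K bits each can be sketched in a few lines. This is a minimal illustration, not the construction from the cited papers: it assumes the sign-random-projection hash family (one random Gaussian hyperplane per bit), and the names `make_lsh_tables` and `hash_code` are hypothetical.

```python
import random

def make_lsh_tables(dim, num_tables, bits_per_table, seed=0):
    """Draw l = num_tables groups of K = bits_per_table random hyperplanes.

    Assumption: Gaussian hyperplanes (sign random projection); other LSH
    families use different hash functions.
    """
    rng = random.Random(seed)
    return [
        [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(bits_per_table)]
        for _ in range(num_tables)
    ]

def hash_code(x, hyperplanes):
    """K-bit code of x for one table: bit k is 1 if x lies on the
    non-negative side of hyperplane k, else 0."""
    return tuple(
        1 if sum(w_i * x_i for w_i, x_i in zip(w, x)) >= 0 else 0
        for w in hyperplanes
    )

tables = make_lsh_tables(dim=4, num_tables=3, bits_per_table=5, seed=1)
x = [0.3, -1.2, 0.5, 2.0]
codes = [hash_code(x, t) for t in tables]  # one 5-bit code per table
```

Points separated by a small angle fall on the same side of most hyperplanes, so they tend to share codes; using several tables raises the chance that a true neighbor collides with the query in at least one of them.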

Hash Table based Search
• O(1) search time by table lookup
• Bucket size is important (affects accuracy & post-processing cost)
[Figure: hash table with bucket addresses 01100, 01101, 01110, 01111; database point xi and query q both map to bucket 01101]
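A minimal sketch of table lookup, assuming codes are stored as bit strings in a Python dictionary. The helper names `build_table` and `lookup` are hypothetical, and the optional Hamming-radius probing is an illustration of why bucket occupancy drives the post-processing cost.

```python
from collections import defaultdict
from itertools import combinations

def build_table(codes):
    """Map each binary code (a string such as '01101') to the list of
    database indices stored in that bucket."""
    table = defaultdict(list)
    for idx, code in enumerate(codes):
        table[code].append(idx)
    return table

def lookup(table, query_code, radius=0):
    """Return indices from every bucket within the given Hamming radius
    of the query code. radius=0 is a single dictionary lookup."""
    hits = []
    for r in range(radius + 1):
        for flip in combinations(range(len(query_code)), r):
            probe = list(query_code)
            for i in flip:
                probe[i] = '1' if probe[i] == '0' else '0'
            hits.extend(table.get(''.join(probe), []))
    return hits

codes = ["01101", "01100", "01101", "11111"]
table = build_table(codes)
same_bucket = lookup(table, "01101")        # exact bucket only
nearby = lookup(table, "01101", radius=1)   # also probes 1-bit neighbors
```

The number of probed buckets grows combinatorially with the radius, and every returned candidate still has to be reranked, which is why both empty and oversized buckets hurt.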

Different Approaches
• Unsupervised Hashing: LSH ’98, SH ’08, KLSH ’09, AGH ’10, PCAH, ITQ ’11
• Semi-Supervised Hashing: SSH ’10, WeaklySH ’10
• Supervised Hashing: RBM ’09, BRE ’10, MLH ’11, LDAH ’11, ITQ ’11, KSH ’12

PCA + Minimize Quantization Errors (ITQ method, Gong & Lazebnik, CVPR ’11)
• PCA to maximize variance in each hash dimension
• Find the optimal rotation in the subspace to minimize quantization error
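The rotation step can be written as an optimization problem. The formulation below follows the cited ITQ paper as best recalled; the symbols V (the n × c PCA-projected data), B (binary codes) and R (rotation) are notation introduced here, not taken from the slide.

```latex
\min_{B,\,R}\; \lVert B - V R \rVert_F^2
\quad \text{s.t.} \quad B \in \{-1,+1\}^{n \times c},\;\; R^\top R = I

% Alternating updates:
%   fix R:  B = \operatorname{sgn}(V R)
%   fix B:  with the SVD  B^\top V = S\,\Omega\,\hat{S}^\top,  set  R = \hat{S} S^\top
```

Fixing B turns the problem into an orthogonal Procrustes problem, which is why the rotation update has a closed form through the SVD.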

Effects of Min Quantization Errors (PCA-ITQ, Gong & Lazebnik, CVPR ’11)
• 580K tiny images
[Figure panels: PCA with random rotation; PCA-ITQ with optimal alignment]

Utilize supervised labels
• Metric supervision
• Semantic category supervision
[Figure: example pairs labeled similar or dissimilar]

Design Hash Codes to Match Supervised Information
• Preferred hashing function
[Figure: similar and dissimilar pairs separated by the preferred hashing function into bits 1 and 0]

Adding Supervised Labels to PCA Hash (Wang, Kumar, Chang, CVPR ’10, ICML ’10)
• Fitting labels: similar pairs and dissimilar pairs
• Relaxation: the PCA covariance matrix combined with the label-fitting term gives an “adjusted” covariance matrix
• Solution W: eigenvectors of the adjusted covariance matrix
• If no supervision (S=0), it is simply PCA hash
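The slide's equations did not survive extraction. As a hedged reconstruction based on the cited SSH paper, the relaxed objective and the adjusted covariance matrix can be written as below. The symbols X (all data), X_l (labeled data) and the weight η are assumptions taken from that paper's notation, not from the slide text.

```latex
J(W) = \tfrac{1}{2}\,\operatorname{tr}\!\left( W^\top X_l\, S\, X_l^\top W \right)
     + \tfrac{\eta}{2}\,\operatorname{tr}\!\left( W^\top X X^\top W \right),
\qquad
M = X_l\, S\, X_l^\top + \eta\, X X^\top

% W = top-K eigenvectors of M.
% With S = 0, M = \eta\, X X^\top, so W reduces to the PCA directions.
```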

Semi-Supervised Hashing (SSH)
• 1 million GIST images: 1% labeled, 99% unlabeled
[Figure: precision @ top 1K for SSH, supervised RBM, unsupervised SH, and random LSH]

Problem of orthogonal projections
• Many buckets become empty when # bits increases
• Need to search many neighbor buckets at query time
[Figure: precision @ Hamming radius 2]

ICA Type Hashing (SPICA Hash, He et al., CVPR ’11)
• Explicitly optimize two terms
• Preserve similarity (search accuracy)
• Balanced bucket size → max entropy → min mutual info I (search time)
• Fast ICA to find non-orthogonal projections

The Importance of balanced size
• Simulation over 1M tiny image samples
• The largest bucket of LSH contains 10% of all 1M samples
[Figure: bucket size vs. bucket index for LSH and SPICA Hash; SPICA Hash gives balanced bucket sizes]
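Bucket balance can be checked directly from a set of codes. The sketch below is an illustration with a hypothetical helper `bucket_stats`; it reports the two quantities the slides point at, the share of samples in the largest bucket and the entropy of the bucket occupancy.

```python
import math
from collections import Counter

def bucket_stats(codes):
    """Return (fraction of samples in the largest bucket,
    entropy in bits of the bucket-occupancy distribution)."""
    counts = Counter(codes)
    n = len(codes)
    largest_fraction = max(counts.values()) / n
    entropy = -sum((c / n) * math.log2(c / n) for c in counts.values())
    return largest_fraction, entropy

balanced = ["00", "01", "10", "11"] * 25        # 4 buckets, 25 samples each
skewed = ["00"] * 97 + ["01", "10", "11"]       # one bucket dominates
balanced_stats = bucket_stats(balanced)
skewed_stats = bucket_stats(skewed)
```

With K bits the entropy is at most K, reached only when all 2^K buckets are equally full, so a value well below K signals the imbalance that inflates candidate lists at query time.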

Better ways to handle supervised information?
• BRE [Kulis & Darrell, ’10]: Hamming distance between H(xi) and H(xj)
• MLH [Norouzi & Fleet, ’11]: hinge loss
• But optimizing Hamming distance (DH, XOR) is not easy!

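A New Supervision Form: Code Inner Products (Liu, Wang, Ji, Jiang, Chang, CVPR ’12)
• Supervised hashing maps labeled data to an r-bit code matrix
• Code inner products (code matrix times its transpose) are fitted to the pair-wise label matrix S
• Proof: code inner product ≡ Hamming distance
[Figure: labeled data x1, x2, x3 with x1 and x2 similar and x3 dissimilar to both; the code matrix; the code inner products; and the pair-wise label matrix S over x1, x2, x3]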
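The equivalence between code inner product and Hamming distance can be checked numerically. For codes a, b in {-1, +1}^r, matching bits contribute +1 and differing bits contribute -1, so a·b = r - 2·D_H(a, b). The function names below are hypothetical.

```python
def hamming_distance(a, b):
    """Number of positions where two equal-length codes differ."""
    return sum(1 for x, y in zip(a, b) if x != y)

def code_inner_product(a, b):
    """Inner product of two codes with entries in {-1, +1}."""
    return sum(x * y for x, y in zip(a, b))

a = [1, -1, 1, 1]
b = [1, 1, -1, 1]
r = len(a)
# Matching bits add +1, differing bits add -1, hence the identity below.
identity_holds = code_inner_product(a, b) == r - 2 * hamming_distance(a, b)
```

Because the inner product is a smooth algebraic expression of the code entries, fitting it to a label matrix avoids working with XOR-based Hamming distance directly.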

Code Inner Product enables efficient optimization (Liu, Wang, Ji, Jiang, Chang, CVPR 2012)
• Design hash codes to match supervised information
• Much easier/faster to optimize and extend to kernels
[Equation lost in extraction; surviving annotations: hash bit, sample]

Extend Code Inner Product to Kernel
• Following KLSH, construct a hash function using a kernel function and m anchor samples, with zero-mean normalization applied to k(x)
[Equation lost in extraction; surviving annotations: hash coefficients, kernel matrix, sgn, l samples × m anchors]
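A kernel-based hash bit of this kind can be sketched as h(x) = sgn(Σ_j a_j (k(x_j, x) - μ_j)), where x_j are the m anchors, μ_j centers each kernel feature, and a_j are the hash coefficients. The sketch below assumes an RBF kernel and hand-picked coefficients purely for illustration; in the cited method the coefficients are learned, and all function names here are hypothetical.

```python
import math

def rbf_kernel(x, y, gamma=1.0):
    """Gaussian RBF kernel (an assumed choice of kernel)."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))

def anchor_means(data, anchors, kernel=rbf_kernel):
    """Mean kernel response of each anchor over the data, used for
    zero-mean normalization of the kernel features."""
    n = len(data)
    return [sum(kernel(a, x) for x in data) / n for a in anchors]

def kernel_hash_bit(x, anchors, means, coeffs, kernel=rbf_kernel):
    """One hash bit in {-1, +1}: sign of the weighted, centered kernel features."""
    s = sum(c * (kernel(a, x) - mu) for a, mu, c in zip(anchors, means, coeffs))
    return 1 if s >= 0 else -1

data = [[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0]]
anchors = [[0.0, 0.0], [5.0, 5.0]]   # m = 2 anchors (illustrative)
means = anchor_means(data, anchors)
coeffs = [1.0, -1.0]                 # hypothetical, not learned
bits = [kernel_hash_bit(x, anchors, means, coeffs) for x in data]
```

Centering each kernel feature keeps the bit roughly balanced over the data, and the cost per bit is m kernel evaluations regardless of the database size.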

Benefits of Code Inner Product
• CIFAR-10: 60K object images from 10 classes, 1K query images
• 1K supervised labels
• KSH0: spectral relaxation; KSH: sigmoid hashing function
• Open issue: empty buckets and balance not addressed
[Figure: comparison of supervised methods]

Speedup by Code Inner Product (CVPR 2012)
• Significant speedup

Tiny-1M: Visual Search Results (CVPR 2012)
• More visually relevant

Comparison of Hashing vs. KD-Tree
• Photo Tourism patch set (Notre Dame subset, 103K samples), 512D GIST
[Figure: KD-tree compared with supervised hashing and anchor graph hashing]

Understand Difficulty of Approximate Nearest Neighbor Search (He, Kumar, Chang, ICML 2012)
• How difficult is approximate nearest neighbor search in a dataset?
• x is an ε-approximate NN of q if d(q, x) ≤ (1 + ε) d(q, x_NN), where x_NN is the exact nearest neighbor
• A concrete measure of difficulty of search in a dataset?
[Figure: toy example around a query q in which search is not meaningful]

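Relative Contrast (He, Kumar, Chang, ICML 2012)
• A naïve search approach: randomly pick a point and compare it to the NN
• Relative contrast: the expected distance from q to a random point divided by the expected distance from q to its NN
• High relative contrast → easier search
• If the relative contrast is close to 1, search is not meaningful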
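Relative contrast can be estimated empirically on a small sample. The sketch below is an illustration under stated assumptions: it averages over a handful of query points, uses brute-force Lp distances, and the names `relative_contrast` and `uniform_points` are hypothetical.

```python
import random

def lp_distance(a, b, p=2.0):
    """Lp distance between two vectors."""
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1.0 / p)

def relative_contrast(data, queries, p=2.0):
    """Ratio of the mean distance-to-a-random-point to the mean
    distance-to-the-nearest-neighbor, averaged over the queries."""
    mean_sum = 0.0
    min_sum = 0.0
    for q in queries:
        dists = [lp_distance(q, x, p) for x in data]
        mean_sum += sum(dists) / len(dists)
        min_sum += min(dists)
    return mean_sum / min_sum

def uniform_points(n, d, rng):
    """n points drawn uniformly from the unit cube in d dimensions."""
    return [[rng.random() for _ in range(d)] for _ in range(n)]

rng = random.Random(0)
low_dim = relative_contrast(uniform_points(300, 2, rng), uniform_points(20, 2, rng))
high_dim = relative_contrast(uniform_points(300, 50, rng), uniform_points(20, 50, rng))
```

On uniform data the estimate drops toward 1 as the dimension grows, which matches the claim that higher dimensionality makes search harder.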

Estimation of Relative Contrast
• With CLT and binomial approximation
[Equation lost in extraction]
• n: data size
• p: Lp distance
• ϕ: standard Gaussian cdf
• σ': a function of data properties (dimensionality and sparsity)

Synthetic Data
• Data sampled randomly from U[0,1]
• s: prob. of non-zero element in each dim.; d: feature dimension
• Sparser vectors → good; higher dimensionality → bad
[Figure: two panels plotting relative contrast]

Synthetic Data
• Data sampled randomly from U[0,1]
• Larger database → good; lower p → good
[Figure: two panels plotting relative contrast]

Predict Hashing Performance of Real-World Data
[Figure panels: 16-bit LSH; 28-bit LSH]

Mobile Search System by Hashing
• Low bit rate
• Big data indexing
• Light computing
He, Feng, Liu, Cheng, Lin, Chung, Chang. Mobile Product Search with Bag of Hash Bits and Boundary Reranking, CVPR 2012.

Estimate the Complexity
• 500 local features per image
• Feature size ~128 Kbytes
• More than 10 seconds for transmission over 3G
• Database indexing
• 1 million images need 0.5 billion local features
• Finding matched features becomes challenging
• Idea: directly compute compact hash codes on mobile devices

Approach: hashing
• Each local feature coded as hash bits
• Locality sensitive, efficient for high dimensions
• Each image is represented as a Bag of Hash Bits
[Figure: example codes 011001100100111100…, 110110011001100110…]

Bit Reuse for Multi-Table Hashing
• To reduce transmission size
• Reuse a single hash bit pool by random subsampling
[Figure: an optimal hash bit pool (e.g., 80 bits, PCA Hash or SPICA Hash) is randomly subsampled into 32-bit subsets, one per table (Table 1, Table 2, …, Table 11, Table 12); the results from all tables are unioned]
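Bit reuse can be sketched as follows, using the figures from the slide (an 80-bit pool, 32 bits per table, 12 tables) as defaults. Everything else is an assumption for illustration: codes are bit strings, buckets are exact-match dictionary entries, and the function names are hypothetical.

```python
import random

def make_bit_subsets(pool_bits=80, bits_per_table=32, num_tables=12, seed=0):
    """Pick one random subset of bit positions from the pool per table."""
    rng = random.Random(seed)
    return [sorted(rng.sample(range(pool_bits), bits_per_table))
            for _ in range(num_tables)]

def table_keys(pool_code, subsets):
    """Derive each table's key from the single transmitted pool code."""
    return [''.join(pool_code[i] for i in subset) for subset in subsets]

def build_tables(db_codes, subsets):
    """Index every database code in all tables."""
    tables = [dict() for _ in subsets]
    for idx, code in enumerate(db_codes):
        for table, key in zip(tables, table_keys(code, subsets)):
            table.setdefault(key, []).append(idx)
    return tables

def query(tables, subsets, pool_code):
    """Union of the candidates found in the matching bucket of each table."""
    hits = set()
    for table, key in zip(tables, table_keys(pool_code, subsets)):
        hits.update(table.get(key, []))
    return hits

rng = random.Random(42)
db_codes = [''.join(rng.choice('01') for _ in range(80)) for _ in range(3)]
subsets = make_bit_subsets()
tables = build_tables(db_codes, subsets)
candidates = query(tables, subsets, db_codes[1])
```

Only the pool bits need to be sent per feature; the server derives every table key from them, so adding tables costs no extra transmission.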

Rerank Results with Boundary Features
• Use automatic salient object segmentation for every image in DB [Cheng et al., CVPR 2011]
• Compute boundary features: normalized central distance, Fourier magnitude
• Invariance: translation, scaling, rotation

Boundary Feature – Central Distance
• Distance to center: D(n)
• FFT: F(n)
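The boundary feature can be sketched as below. This is a minimal illustration, not the authors' implementation: it assumes the boundary is given as an ordered list of (x, y) points, takes the center to be the centroid of those points, normalizes D(n) by its maximum, and uses a plain DFT in place of an FFT. The function names are hypothetical.

```python
import cmath
import math

def central_distance(boundary):
    """D(n): distance of each boundary point to the centroid, divided by
    the largest distance (assumed normalization) for scale invariance."""
    cx = sum(x for x, _ in boundary) / len(boundary)
    cy = sum(y for _, y in boundary) / len(boundary)
    d = [math.hypot(x - cx, y - cy) for x, y in boundary]
    peak = max(d)
    return [v / peak for v in d]

def fourier_magnitude(signal):
    """|F(k)| of the discrete Fourier transform of the signal. A circular
    shift of the signal (a different starting point) leaves it unchanged."""
    n = len(signal)
    return [
        abs(sum(signal[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n)))
        for k in range(n)
    ]

def boundary_feature(boundary):
    """Fourier magnitude of the normalized central distance."""
    return fourier_magnitude(central_distance(boundary))

square = [(0, 0), (1, 0), (2, 0), (2, 1), (2, 2), (1, 2), (0, 2), (0, 1)]
feature = boundary_feature(square)
```

Subtracting the centroid removes translation, dividing by the maximum removes scale, and taking magnitudes discards the phase that encodes where along the contour the sampling started, which is how rotation shows up in D(n).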

Reranking with boundary feature

Mobile Product Search System: Bags of Hash Bits and Boundary Features
He, Feng, Liu, Cheng, Lin, Chung, Chang. Mobile Product Search with Bag of Hash Bits and Boundary Reranking, CVPR 2012.
Server:
• 1 million product images crawled from Amazon, eBay and Zappos
• Hundreds of categories: shoes, clothes, electrical devices, groceries, kitchen supplies, movies, etc.
Speed:
• Feature extraction: ~1s
• Transmission: 80 bits/feature, 1KB/image
• Server search: ~0.4s
• Download/display: 1-2s
Video demo (52”)

Performance
• Baseline [Chandrasekhar et al., CVPR ’10]
• Client: compress local features with CHoG
• Server: BoW with Vocabulary Tree (1M codes)
• 30% higher recall and 6X-30X search speedup

Summary
• Some ideas discussed
• Bucket balancing is important
• Code inner product: an efficient form of supervised hashing
• Insights on search difficulty prediction
• Large mobile search: a good test case for hashing
• Open issues
• Supervised hashing vs. attribute discovery
• Hashing beyond point-to-point search
• Hashing to incorporate structured relation (spatio-temporal)

References
• (Supervised Kernel Hash) W. Liu, J. Wang, R. Ji, Y. Jiang, and S.-F. Chang. Supervised Hashing with Kernels. CVPR 2012.
• (Difficulty of Nearest Neighbor Search) J. He, S. Kumar, and S.-F. Chang. On the Difficulty of Nearest Neighbor Search. ICML 2012.
• (Hash Based Mobile Product Search) J. He, T. Lin, J. Feng, X. Liu, and S.-F. Chang. Mobile Product Search with Bag of Hash Bits and Boundary Reranking. CVPR 2012.
• (Hashing with Graphs) W. Liu, J. Wang, S. Kumar, and S.-F. Chang. Hashing with Graphs. ICML 2011.
• (Iterative Quantization) Y. Gong and S. Lazebnik. Iterative Quantization: A Procrustean Approach to Learning Binary Codes. CVPR 2011.
• (Semi-Supervised Hash) J. Wang, S. Kumar, and S.-F. Chang. Semi-Supervised Hashing for Scalable Image Retrieval. CVPR 2010.
• (ICA Hashing) J. He, R. Radhakrishnan, S.-F. Chang, and C. Bauer. Compact Hashing with Joint Optimization of Search Accuracy and Time. CVPR 2011.
