
Learning Non-Redundant Codebooks for Classifying Complex Objects



  1. Learning Non-Redundant Codebooks for Classifying Complex Objects Wei Zhang wei.zhang22@hp.com (zhangwe@eecs.oregonstate.edu) Akshat Surve survea@eecs.oregonstate.edu Xiaoli Fern xfern@eecs.oregonstate.edu Thomas Dietterich tgd@eecs.oregonstate.edu

  2. Contents • Learning codebooks for object classification • Learning non-redundant codebooks • Framework • Boost-Resampling algorithm • Boost-Reweighting algorithm • Experiments • Conclusions and future work

  3. Contents • Learning codebooks for object classification • Learning non-redundant codebooks • Framework • Boost-Resampling algorithm • Boost-Reweighting algorithm • Experiments • Conclusions and future work

  4. Problem 1: Stonefly Recognition [Figure: example images of the nine stonefly taxa — Cal, Dor, Hes, Iso, Mos, Pte, Swe, Yor, Zap]

  5. Visual Codebook for Object Recognition [Figure: pipeline — training/testing image → interest region detector → region descriptors → visual codebook → image attribute vector (term frequencies) → classifier]

  6. Problem 2: Document Classification Variable-length document: "Through the first half of the 20th century, most of the scientific community believed dinosaurs to have been slow, unintelligent cold-blooded animals. Most research conducted since the 1970s, however, has supported the view that dinosaurs were active animals with elevated metabolisms and numerous adaptations for social interaction. The resulting transformation in the scientific understanding of dinosaurs has gradually filtered …" → Fixed-length bag-of-words vector: absent: 0, active: 1, animal: 2, believe: 1, dinosaur: 3, social: 1, …
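The document-to-vector mapping above can be sketched in a few lines of Python (a minimal illustration; the tokenizer and vocabulary are hypothetical, and there is no stemming, so "dinosaurs" and "dinosaur" count as different terms):

```python
from collections import Counter
import re

def bag_of_words(document, vocabulary):
    """Map a variable-length document to a fixed-length term-frequency vector."""
    tokens = re.findall(r"[a-z]+", document.lower())
    counts = Counter(tokens)
    return [counts[word] for word in vocabulary]

vocab = ["absent", "active", "animal", "believe", "dinosaur", "social"]
doc = "Dinosaurs were active animals; the dinosaur was a social animal. Dinosaur!"
print(bag_of_words(doc, vocab))  # [0, 1, 1, 0, 2, 1]
```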

  7. Codebook for Document Classification • Cluster the words to form code-words [Figure: the words of the training corpus are clustered into a codebook — cluster 1: dog, canine, hound, …; cluster 2: car, automobile, vehicle, …; up to cluster K — and an input document is mapped to a vector of cluster counts (e.g. 20, 1, 0, …, 2, 7) that is fed to the classifier]
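Given a word-to-cluster assignment, computing the code-word count vector for an input document is straightforward (a sketch; the two-cluster codebook below is hypothetical, and out-of-codebook words are simply skipped):

```python
def codeword_counts(tokens, word_to_cluster, num_clusters):
    """Map a document's tokens to a vector of code-word (cluster) counts."""
    vec = [0] * num_clusters
    for tok in tokens:
        if tok in word_to_cluster:       # out-of-codebook words are skipped
            vec[word_to_cluster[tok]] += 1
    return vec

codebook = {"dog": 0, "canine": 0, "hound": 0,
            "car": 1, "automobile": 1, "vehicle": 1}
print(codeword_counts(["dog", "car", "hound", "bird"], codebook, 2))  # [2, 1]
```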

  8. Contents • Learning codebooks for object classification • Learning non-redundant codebooks • Framework • Boost-Resampling algorithm • Boost-Reweighting algorithm • Experiments • Conclusions and future work

  9. Learning Non-Redundant Codebooks Codebook learning approaches: k-means, Gaussian mixture modeling, Information Bottleneck, vocabulary trees, spatial pyramids, … Motivation: improve the discriminative performance of any codebook and classifier learning approach by encouraging non-redundancy in the learning process. Approach: learn multiple codebooks and classifiers by wrapping the codebook and classifier learning process inside a boosting procedure [1]. [1] Freund, Y. and Schapire, R. (1996). Experiments with a new boosting algorithm. ICML.

  10. Non-Redundant Codebook and Classifier Learning Framework [Figure: for each boosting round t = 1, …, T — cluster the features X based on the boosting weights Wt(B) to build codebook Dt, train classifier Ct, obtain predictions Lt, and update the boosting weights for the next round; the final predictions L combine L1, …, LT]
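The framework's loop can be sketched as an AdaBoost.M1-style wrapper (an illustrative sketch, not the authors' exact implementation; `learn_model` stands for any weight-aware codebook + classifier learner, and the threshold stump below is a hypothetical stand-in weak learner):

```python
import numpy as np

def boost_codebooks(X, y, learn_model, T):
    """Wrap codebook + classifier learning in boosting: round t trains on
    weights Wt(B), then upweights the examples it misclassified."""
    n = len(y)
    w = np.full(n, 1.0 / n)                   # W1(B): uniform
    models, alphas = [], []
    for _ in range(T):
        model = learn_model(X, y, w)          # codebook Dt + classifier Ct
        miss = model(X) != y                  # predictions Lt vs. labels
        err = float(np.dot(w, miss))
        if err >= 0.5:                        # weak learner no better than chance
            break
        alpha = np.log((1.0 - err) / max(err, 1e-12))
        models.append(model)
        alphas.append(alpha)
        if err == 0.0:
            break
        w = w * np.exp(alpha * miss)          # update boosting weights
        w = w / w.sum()
    labels = np.unique(y)
    def predict(Z):                           # final predictions L: weighted vote
        scores = np.zeros((len(Z), len(labels)))
        for a, m in zip(alphas, models):
            pred = m(Z)
            for j, lab in enumerate(labels):
                scores[:, j] += a * (pred == lab)
        return labels[scores.argmax(axis=1)]
    return predict

def stump(X, y, w):
    """Hypothetical weak learner: best threshold on feature 0."""
    best = None
    for thr in X[:, 0]:
        for lo, hi in [(0, 1), (1, 0)]:
            pred = np.where(X[:, 0] <= thr, lo, hi)
            e = float(np.dot(w, pred != y))
            if best is None or e < best[0]:
                best = (e, thr, lo, hi)
    _, thr, lo, hi = best
    return lambda Z: np.where(Z[:, 0] <= thr, lo, hi)

X = np.array([[0.0], [1.0], [2.0], [3.0]])
y = np.array([0, 0, 1, 1])
predict = boost_codebooks(X, y, stump, T=5)
print(predict(X))  # [0 0 1 1]
```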

  11. Instantiations of the Framework • Boost-Reweighting (discrete feature space): • Supervised clustering features X based on the joint distribution table Pt(X, Y) (Y represents the class labels). This table is updated at each iteration based on the new boosting weights. • Boost-Resampling (continuous feature space): • Generate a non-redundant clustering set by sampling the training examples according to the updated boosting weights. The codebook is constructed by clustering the features in this clustering set.
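The Boost-Resampling step — drawing the clustering set in proportion to the boosting weights — can be sketched as follows (illustrative only; the function and variable names are our own):

```python
import numpy as np

def resample_clustering_set(features, weights, size, seed=0):
    """Draw the round-t clustering set by weighted sampling, so codebook
    construction concentrates on examples earlier rounds got wrong."""
    rng = np.random.default_rng(seed)
    p = np.asarray(weights, dtype=float)
    p = p / p.sum()                     # normalize boosting weights
    idx = rng.choice(len(features), size=size, replace=True, p=p)
    return [features[i] for i in idx]

sample = resample_clustering_set(["a", "b", "c"], [0.90, 0.05, 0.05], size=1000)
print(sample.count("a"))  # roughly 900 of the 1000 draws
```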

  12. Codebook Learning and Classification Algorithms Documents: • Codebook Learning: Information Bottleneck (IB) [1]: L = I(X ; X’) − βI(X’ ; Y) • Classification: Naïve Bayes Objects: • Codebook Learning: K-Means • Classification: Bagged Decision Trees [1] Bekkerman, R., El-yaniv, R., Tishby, N., Winter, Y., Guyon, I. and Elisseeff, A. (2003). Distributional word clusters vs. words for text categorization. JMLR.
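For the object task, a minimal version of the k-means codebook step looks like this (a plain Lloyd's-iteration sketch, not the authors' implementation; the toy 2-D descriptors stand in for the real region descriptors, and in the experiments K = 100):

```python
import numpy as np

def kmeans_codebook(descriptors, K, iters=20, seed=0):
    """Lloyd's k-means: cluster region descriptors into K visual words;
    the centroids are the codebook entries."""
    rng = np.random.default_rng(seed)
    X = np.asarray(descriptors, dtype=float)
    centers = X[rng.choice(len(X), size=K, replace=False)]
    for _ in range(iters):
        dists = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        assign = dists.argmin(axis=1)        # nearest visual word per descriptor
        for k in range(K):
            members = X[assign == k]
            if len(members):
                centers[k] = members.mean(axis=0)
    return centers, assign

descriptors = [[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0]]
centers, assign = kmeans_codebook(descriptors, K=2)
```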

  13. Image Attributes: tf−idf Weights [Figure: interest regions → region descriptors → visual codebook → tf-idf image attribute vector → classifier] Term frequency−inverse document frequency (tf−idf) weight [1]: "Document" = image; "Term" = instance of a visual word. [1] Salton, G. and Buckley, C. (1988). Term-weighting approaches in automatic text retrieval. Information Processing & Management.
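The tf−idf weighting can be sketched as follows (a standard tf × log-idf variant; the exact weighting formula used in the paper may differ):

```python
import math

def tf_idf(docs):
    """docs: list of {term: raw count} dicts ('documents' are images,
    'terms' are visual words). Returns tf * log(N / df) weights."""
    N = len(docs)
    df = {}
    for counts in docs:
        for term, c in counts.items():
            if c > 0:
                df[term] = df.get(term, 0) + 1   # document frequency
    return [{t: c * math.log(N / df[t]) for t, c in counts.items() if c > 0}
            for counts in docs]

docs = [{"w1": 2, "w2": 1}, {"w1": 1}]
weights = tf_idf(docs)
print(weights[0])  # w1 appears everywhere (idf 0); w2 gets weight log(2)
```

A term that occurs in every image (like "w1" above) gets idf = log(1) = 0, so it contributes nothing to classification, which is the point of the weighting.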

  14. Contents • Learning codebooks for object classification • Learning non-redundant codebooks • Framework • Boost-Resampling algorithm • Boost-Reweighting algorithm • Experiments • Conclusions and future work

  15. Experimental Results − Stonefly Recognition • 3-fold cross validation experiments • The size of each codebook K = 100 • The number of boosting iterations T = 50 [1] Larios, N., Deng, H., Zhang, W., Sarpola, M., Yuen, J., Paasch, R., Moldenke, A., Lytle, D., Ruiz Correa, S., Mortensen, E., Shapiro, L. and Dietterich, T. (2008). Automated insect identification through concatenated histograms of local appearance features. Machine Vision and Applications. [2] Opelt, A., Pinz, A., Fussenegger, M. and Auer, P. (2006). Generic object recognition with boosting. PAMI.

  16. Experimental Results − Stonefly Recognition (cont.) • Single: learns only a single codebook of size K×T = 5000. • Random: weighted sampling is replaced with uniform random sampling that ignores the boosting weights. • Boost achieves a 77% error reduction compared with Single on STONEFLY9.

  17. Experimental Results − Stonefly Recognition (cont.)

  18. Experimental Results − Document Classification • S1000: learns a single codebook of size 1000. • S100: learns a single codebook of size 100. • Random: 10 bagged samples of the original training corpus are used to estimate the joint distribution table Pt(X, Y).

  19. Experimental Results − Document Classification (cont.) • [TODO]: add Figure 5 in a similar format as Figure 4

  20. Contents • Learning codebooks for object classification • Learning non-redundant codebooks • Framework • Boost-Resampling algorithm • Boost-Reweighting algorithm • Experiments • Conclusions and future work

  21. Conclusions and Future Work • Conclusions: Non-redundant learning is a simple and general framework to effectively improve the performance of codebooks. • Future work: • Explore the underlying reasons for the effectiveness of non-redundant codebooks – discriminative analysis, non-redundancy tests; • More comparison experiments on well-established datasets.

  22. Acknowledgements • Supported by Oregon State University insect ID project: http://web.engr.oregonstate.edu/~tgd/bugid • Supported by NSF under grant number IIS-0705765. • Thank you !
