
Learning visual representations for unfamiliar environments



Presentation Transcript


  1. Learning visual representations for unfamiliar environments. Kate Saenko, Brian Kulis, Trevor Darrell, UC Berkeley EECS & ICSI

  2. The challenge of large-scale visual interaction. The last decade has proven the superiority of models learned from data over hand-engineered structures!

  3. Large-scale learning • “Unsupervised”: learn models from “found data”; often exploit multiple modalities (text + image). Example web text: “The Tote is the perfect example of two handbag design principles that ... The lines of this tote are incredibly sleek, but ... The semi buckles that form the handle attachments are ...” (sources: Wikipedia, Flickr, Google)

  4. E.g., finding visual senses. Artifact sense: “telephone”. DICTIONARY: 1: (n) telephone, phone, telephone set (electronic equipment that converts sound into electrical signals that can be transmitted over distances and then converts received signals back into sounds); 2: (n) telephone, telephony (transmitting speech at a distance). [Saenko and Darrell ’09]

  5. Large-scale learning • “Unsupervised”: learn models from “found data”; often exploit multiple modalities (text + image) • Supervised: crowdsource labels (e.g., ImageNet)

  6. Yet… • Even the best collection of images from the web, combined with strong machine learning methods, can often yield poor classifiers on in-situ data! • Supervised learning assumption: training distribution == test distribution • Unsupervised learning assumption: the joint distribution is stationary w.r.t. the online world and the real world. Almost never true!

  7. “What You Saw Is Not What You Get”: the models fail due to domain shift. SVM: 20%, NBNN: 19% vs. SVM: 54%, NBNN: 61%

  8. Examples of visual domain shifts: digital SLR vs. webcam; close-up vs. far-away; amazon.com vs. CCTV; Flickr consumer images

  9. Examples of domain shift: change in camera (webcam vs. digital SLR), feature type (SIFT vs. SURF), and codebook size (VQ to 1000 vs. VQ to 300), yielding different feature dimensions

  10. Solutions? • Do nothing (poor performance) • Collect all types of data (impossible) • Find out what changed (impractical) • Learn what changed

  11. Prior work on domain adaptation • Pre-process the data [Daumé ’07]: replicate features to create general, source-, and target-specific versions; re-train the learner on the new features • SVM-based methods [Yang ’07], [Jiang ’08], [Duan ’09], [Duan ’10]: adapt SVM parameters • Kernel mean matching [Gretton ’09]: re-weight training data to match the test data distribution

  12. Our paradigm: transform-based domain adaptation. Example: “green” and “blue” domains. Previous methods’ drawbacks: • cannot transfer a learned shift to new categories • cannot handle new features. We can do both by learning domain transformations* W. * Saenko, Kulis, Fritz, and Darrell. Adapting visual category models to new domains. ECCV, 2010

  13. Limitations of symmetric transforms: the symmetric assumption fails! Saenko et al., ECCV 2010 used metric learning: • symmetric transforms • same features. How do we learn more general shifts?

  14. Latest approach*: asymmetric transforms. An asymmetric transform (e.g., a rotation): • the metric learning model is no longer applicable • we propose to learn asymmetric transforms • map from target to source • handle different dimensions. * Kulis, Saenko, and Darrell. What You Saw is Not What You Get: Domain Adaptation Using Asymmetric Kernel Transforms. CVPR 2011


  16. Model details • Learn a linear transformation to map points from one domain to another • Call this transformation W • It acts on the matrices of source and target points
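A minimal sketch of what such a mapping looks like; the dimensions are made up, and the random W here is only a stand-in for a learned transformation:

```python
import numpy as np

# Hypothetical dimensions: W maps 1000-dim target features into the
# 300-dim source feature space; the two dimensionalities need not match.
rng = np.random.default_rng(0)
d_s, d_t = 300, 1000
W = rng.standard_normal((d_s, d_t)) * 0.01  # stand-in for a learned W

y = rng.standard_normal(d_t)   # a target-domain point
y_mapped = W @ y               # now lives in the source feature space
```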

  17. Loss functions. Choose a point x from the source and y from the target, and consider the inner product x'Wy between x and the mapped point Wy. It should be “large” for similar objects and “small” for dissimilar objects
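A sketch of this similarity together with hinge-style penalties that could enforce the “large”/“small” requirements; the particular loss forms and margin values are illustrative assumptions, not the paper’s:

```python
import numpy as np

rng = np.random.default_rng(1)
d_s, d_t = 300, 1000
W = rng.standard_normal((d_s, d_t))
x = rng.standard_normal(d_s)    # source point
y = rng.standard_normal(d_t)    # target point

sim = x @ W @ y                 # inner product between x and the mapped y

# Illustrative hinge-style penalties (margins are assumptions):
def loss_similar(sim, lower=1.0):
    """Penalize a similar pair whose similarity falls below `lower`."""
    return max(0.0, lower - sim)

def loss_dissimilar(sim, upper=-1.0):
    """Penalize a dissimilar pair whose similarity exceeds `upper`."""
    return max(0.0, sim - upper)
```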

  18. Loss Functions • Input to problem includes a collection of m loss functions • General assumption: loss functions depend on data only through inner product matrix

  19. Regularized objective function • Minimize a linear combination of the sum of the loss functions and a regularizer on W • We use the squared Frobenius norm as the regularizer • but are not restricted to this choice
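A toy gradient-descent sketch of such an objective, using a squared-error loss on the inner products plus a squared-Frobenius regularizer; the specific loss, step size, and synthetic data are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(2)
d_s, d_t, m = 20, 30, 50
X = rng.standard_normal((m, d_s))        # source points (rows)
Y = rng.standard_normal((m, d_t))        # target points (rows)
t = rng.choice([1.0, -1.0], size=m)      # +1 = similar pair, -1 = dissimilar

lam, lr = 0.1, 0.02                      # regularizer weight, step size
W = np.zeros((d_s, d_t))
for _ in range(300):
    sims = np.einsum('id,de,ie->i', X, W, Y)              # x_i' W y_i
    grad = np.einsum('i,id,ie->de', sims - t, X, Y) / m + lam * W
    W -= lr * grad

# mean squared error of the fitted similarities against their targets
mse = np.mean((np.einsum('id,de,ie->i', X, W, Y) - t) ** 2)
```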

  20. The Model Has Drawbacks • A linear transformation may be insufficient • Cost of optimization grows as the product of the dimensionalities of the source and target data • What to do?

  21. Kernelization • Main idea: run in kernel space • Use a non-linear kernel function (e.g., RBF kernel) to learn non-linear transformations in input space • Resulting optimization is independent of input dimensionality • Additional assumption necessary: regularizer is a spectral function
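For instance, an RBF kernel turns each domain’s data into a kernel matrix whose size depends only on the number of points, not their dimensionality (a sketch; the sizes and gamma here are made up):

```python
import numpy as np

def rbf_kernel(A, B, gamma=0.5):
    """K[i, j] = exp(-gamma * ||a_i - b_j||^2) for rows of A and B."""
    sq = np.sum(A**2, 1)[:, None] + np.sum(B**2, 1)[None, :] - 2 * A @ B.T
    return np.exp(-gamma * sq)

rng = np.random.default_rng(3)
Xs = rng.standard_normal((5, 300))   # 5 source points, 300-dim features
Yt = rng.standard_normal((7, 1000))  # 7 target points, 1000-dim features

Kx = rbf_kernel(Xs, Xs)  # 5 x 5 source kernel matrix
Ky = rbf_kernel(Yt, Yt)  # 7 x 7 target kernel matrix
# The transform is then learned from (Kx, Ky): the cost no longer
# depends on the 300- and 1000-dimensional input spaces.
```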

  22. Kernelization: form kernel matrices for the source and target data; restate the original transformation learning problem as a new problem over these kernel matrices; the original and new problems are related at optimality

  23. Summary of approach. Train time: (1) collect multi-domain data; (2) generate constraints and learn W. Test time: (3) map test points via W; (4) apply to new categories

  24. Multi-domain dataset

  25. Experimental setup • Utilized a standard bag-of-words model • Also utilized different features in the target domain: SURF vs. SIFT, different visual word dictionaries • Baseline for comparing such data: KCCA

  26. Novel-class experiments • Test the method’s ability to transfer the learned domain shift to unseen classes • Train the transform on half of the classes, test on the other half
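The split protocol can be sketched as follows (the class names are hypothetical stand-ins, not the paper’s actual categories):

```python
# Learn W on half the classes, evaluate transfer on the held-out half.
classes = sorted({'mug', 'keyboard', 'laptop', 'bottle', 'chair', 'monitor'})
train_classes = classes[: len(classes) // 2]   # used to learn the transform
test_classes = classes[len(classes) // 2:]     # unseen during training

# The learned W is applied to test_classes without retraining.
assert not set(train_classes) & set(test_classes)
```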

  27. Extreme shift example: for a query from the target domain, compare nearest neighbors in the source found using KCCA+KNN against those found using the learned transformation

  28. Conclusion • We should not rely on hand-engineered features any more than we rely on hand-engineered models! • Learn feature transformations across domains • Developed a domain adaptation method based on regularized non-linear transforms • The asymmetric transform achieves the best results on the more extreme shifts • Saenko et al., ECCV 2010, and Kulis et al., CVPR 2011; journal version forthcoming
