
Learning Instance Specific Distance Using Metric Propagation

Presentation Transcript


  1. Learning Instance Specific Distance Using Metric Propagation De-Chuan Zhan, Ming Li, Yu-Feng Li, Zhi-Hua Zhou LAMDA Group National Key Lab for Novel Software Technology Nanjing University, China {zhandc, lim, liyf, zhouzh}@lamda.nju.edu.cn

  2. Distance metric learners are introduced… Distance-based classification: k-nearest neighbor classification, SVM with Gaussian kernels. Is the distance reliable? Are there any more natural measurements?
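
A minimal numpy illustration (not from the slides) of why the question matters: the prediction of a distance-based classifier such as 1-NN can flip when the distance function changes. The weighted distance and the toy points below are made up for illustration.

```python
import numpy as np

def knn_predict(query, X, y, dist_fn, k=1):
    """Classify `query` by majority vote among its k nearest neighbors under dist_fn."""
    d = np.array([dist_fn(query, x) for x in X])
    nearest = np.argsort(d)[:k]
    return np.bincount(y[nearest]).argmax()

# Two candidate distances: plain Euclidean vs. a (hypothetical) weighted distance
# that emphasizes the first feature much more than the second.
euclidean = lambda a, b: float(np.linalg.norm(a - b))
weighted  = lambda a, b: float(np.sqrt(np.sum(np.array([10.0, 0.1]) * (a - b) ** 2)))

X = np.array([[0.3, 0.0],   # class 0
              [0.0, 0.9]])  # class 1
y = np.array([0, 1])
q = np.array([0.25, 0.8])

print(knn_predict(q, X, y, euclidean))  # -> 1 under the Euclidean distance
print(knn_predict(q, X, y, weighted))   # -> 0 under the weighted distance
```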

  3. Any more natural measurements? When the sky is compared to other pictures… color, and probably texture features. When Phelps II is compared to other athletes… swimming speed, shape of the feet… Can we assign a specific distance measurement to each instance, both labeled and unlabeled? … our work

  4. Outline • Introduction • Our Methods • Experiments • Conclusion

  5. Introduction: Distance Metric Learning • Many machine learning algorithms rely on a distance metric over the input data patterns: • Classification • Clustering • Retrieval • Many metric learning algorithms have been developed [Yang, 2006]. Problem: they focus on learning a uniform Mahalanobis distance for ALL instances.
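
For concreteness (this example is not on the slide): a uniform Mahalanobis distance is parameterized by one positive semidefinite matrix M that every instance shares, which is exactly the limitation stated above. A minimal sketch:

```python
import numpy as np

def mahalanobis(x, y, M):
    """d_M(x, y) = sqrt((x - y)^T M (x - y)); M must be positive semidefinite."""
    diff = x - y
    return float(np.sqrt(diff @ M @ diff))

# M = I recovers the Euclidean distance; a learned M rescales/rotates the space,
# but it is the SAME M for every instance.
x, y = np.array([1.0, 2.0]), np.array([2.0, 0.0])
M = np.array([[2.0, 0.5],
              [0.5, 1.0]])              # example PSD matrix (assumed, not learned here)
print(mahalanobis(x, y, np.eye(2)))     # Euclidean distance
print(mahalanobis(x, y, M))             # Mahalanobis distance under M
```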

  6. Introduction: Other distance functions • Instead of applying a uniform distance metric to every example, it is more natural to measure distances according to the specific properties of the data. • Some researchers define distance from the sample’s own perspective: • QSim [Zhou and Dai, ICDM’06] [Athitsos et al., TDS’07] • Local distance functions [Frome et al., NIPS’06, ICCV’07]

  7. Introduction: Query sensitive similarity • Instance-specific or query-specific similarities have actually been studied in other fields before. In content-based image retrieval, there has been a study that tries to compute query-sensitive similarities: the similarities among different images are decided only after a query image is received [Zhou and Dai, ICDM’06]. The problem: query-sensitive similarity is based on pure heuristics.

  8. Introduction: Local distance functions • [Frome et al., NIPS’06] The distance from the j-th instance to the i-th instance should be larger than that from the j-th to the k-th, i.e., Dji > Djk. 1. It cannot generalize directly. 2. The local distances defined are not directly comparable. • [Frome et al., ICCV’07] Dij > Dkj. All constraints can be tied together, but more heuristics are required for testing. The problem: local distance functions for unlabeled data are not available.
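
As an illustration only, not the exact objective of Frome et al.: relative constraints of the form Dji > Djk are commonly turned into a margin-based hinge penalty that is summed over sampled triplets. The margin value and the sign convention below are assumptions.

```python
def triplet_hinge(D_ji, D_jk, margin=1.0):
    """Penalty for violating the relative constraint D_ji > D_jk + margin,
    where k shares j's label and i does not (assumed convention)."""
    return max(0.0, margin + D_jk - D_ji)

# Summing such penalties over all sampled triplets gives one way to fit a
# local distance function for the labeled instances.
print(triplet_hinge(D_ji=3.0, D_jk=1.5))  # 0.0, constraint satisfied with margin
print(triplet_hinge(D_ji=1.0, D_jk=1.5))  # 1.5, constraint violated
```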

  9. Introduction Our Work Can we assign a specific distance measurement for each instance, both labeled and unlabeled? Yes, we learn Instance Specific Distance via Metric Propagation

  10. Outline • Introduction • Our Methods • Experiments • Conclusion

  11. Our Methods: Intuition • Focus on learning an instance-specific distance for both labeled and unlabeled data. • For labeled data: a pair of examples from the same class should be closer to each other. • For unlabeled data: metric propagation on a relationship graph.

  12. Our Methods: The ISD Framework • Instead of directly conducting metric propagation while learning the distances for labeled examples, we formulate metric propagation within a regularized framework: a loss term on the labeled data plus a regularization term. • The loss for labeled data is induced by the instance labels and provides the side information: the j-th instance either belongs to a class other than the i-th or is a neighbor of the i-th instance, i.e., all cannot-links and some of the must-links are considered. It uses a convex loss function, such as the hinge loss in classification or the least-square loss in regression. • The regularization term is responsible for the implicit metric propagation; inspired by [Zhu 2003], it is defined over the relationship graph. [Formulas shown on the slide.]
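
The formulas themselves are only shown on the slide, so the sketch below is a rough illustration of a regularized objective of this kind; the |x_i - x_j| feature map, the hinge pair loss, the threshold b, and the Laplacian-style smoothness term are all assumptions for illustration, not the paper's actual formulation.

```python
import numpy as np

def isd_distance(W, X, i, j):
    """Instance-specific distance from x_j to x_i: x_i's own weight vector w_i
    applied to the elementwise difference |x_i - x_j| (assumed feature map)."""
    return float(W[i] @ np.abs(X[i] - X[j]))

def isd_objective(W, X, pairs, S, lam=1.0, b=1.0):
    """pairs: (i, j, y) with y=+1 for cannot-links and y=-1 for must-link
    neighbors of i; S: similarity graph over ALL instances (labeled and
    unlabeled) that carries the implicit metric propagation."""
    # Convex loss on the labeled side information (hinge used here as one example).
    loss = sum(max(0.0, 1.0 - y * (isd_distance(W, X, i, j) - b))
               for i, j, y in pairs)
    # Graph-smoothness regularizer: instances adjacent in S should receive
    # similar weight vectors, which propagates metrics to unlabeled instances.
    smooth = sum(S[i, j] * float(np.sum((W[i] - W[j]) ** 2))
                 for i in range(len(W)) for j in range(len(W)))
    return loss + lam * smooth
```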

  13. Our Methods: The ISD Framework – relationship to FSM • Although only pair-wise side information is investigated in our work, the ISD framework is a general frame: if the pair-wise side information is replaced with higher-order side information, such as triplet information, and L is set to the identity matrix, then FSM [Frome et al., NIPS’06] becomes a special case of ISD.

  14. Our Methods: The ISD Framework – update graph [Flow diagram: start from the predefined graph (given structure), initialize the weights, rebuild the graph in the new ISD space, update the weights, and iterate between graph and weights until the final ISD weights are obtained.]
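
A hypothetical sketch of the graph-update step, assuming a plain k-nearest-neighbor graph rebuilt under the current instance-specific distances; `learn_weights` in the commented outer loop is a placeholder for the per-instance optimization described on the next slides.

```python
import numpy as np

def build_knn_graph(W, X, k=5):
    """Rebuild the similarity graph using the CURRENT instance-specific
    distances (a plain 0/1 kNN graph is assumed here)."""
    n = len(X)
    S = np.zeros((n, n))
    for i in range(n):
        d = np.array([W[i] @ np.abs(X[i] - X[j]) if j != i else np.inf
                      for j in range(n)])
        for j in np.argsort(d)[:k]:
            S[i, j] = S[j, i] = 1.0
    return S

# Outer loop sketched on the slide: start from a predefined graph, learn the
# weights, rebuild the graph in the new ISD space, and repeat a few rounds.
# for _ in range(n_rounds):
#     W = learn_weights(X, pairs, S)   # placeholder for the ISD optimization
#     S = build_knn_graph(W, X)
```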

  15. Our Methods: ISD with L1-loss • Introducing slack variables gives a constrained formulation. Solving it with respect to all w's simultaneously is a great challenge; the computational cost is too expensive. • Since the problem is convex, we employ the alternating descent method: sequentially solve one w for one instance at a time while fixing the other w's, until convergence or the maximum number of iterations is reached.
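
A hedged sketch of the alternating-descent strategy: the subproblem for one instance is solved while all other weight vectors are fixed. Here the subproblem is only approximated with a few projected subgradient steps on the assumed hinge objective from the earlier sketch; the slides solve it exactly via the primal/dual formulations that follow. The nonnegativity of the weights is also an assumption.

```python
import numpy as np

def solve_one_w(i, W, X, pairs_i, S, lam=1.0, b=1.0, lr=0.01, steps=50):
    """Approximately solve for w_i with all other w's fixed.
    pairs_i: side-information pairs (j, y) involving instance i."""
    w = W[i].copy()
    for _ in range(steps):
        grad = np.zeros_like(w)
        for j, y in pairs_i:                      # subgradient of the hinge pair loss
            d_ij = np.abs(X[i] - X[j])
            if 1.0 - y * (w @ d_ij - b) > 0.0:
                grad += -y * d_ij
        for j in range(len(W)):                   # graph-smoothness term
            grad += 2.0 * lam * S[i, j] * (w - W[j])
        w = np.maximum(w - lr * grad, 0.0)        # keep the weights nonnegative
    return w

def alternating_descent(W, X, pairs, S, max_iters=20):
    """Sequentially update one w at a time, fixing the others, for max_iters rounds."""
    for _ in range(max_iters):
        for i in range(len(W)):
            pairs_i = [(j, y) for a, j, y in pairs if a == i]
            W[i] = solve_one_w(i, W, X, pairs_i, S)
    return W
```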

  16. Our Methods: ISD with L1-loss (cont’d) • Primal: [formulation shown on the slide] • Dual: [formulation shown on the slide]

  17. Our Methods: Acceleration: ISD with L2-loss • For acceleration: • The alternating descent method is used to solve the problem. • The number of constraints is reduced by considering only some of the must-links. • However, the number of inequality constraints may still be large; inspired by ν-SVM, we can probably obtain a more efficient method:

  18. Our Methods: Acceleration: ISD with L2-loss • Dual: some of the constraints are dropped so that a linear equality constraint remains; we project the solution back to the feasible region after the optimization result is obtained. Thus, the dual variables can be solved efficiently using Sequential Minimal Optimization.
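
The slide does not spell out which constraints are dropped; assuming it is the sign constraints on the dual variables (so that only the linear equality constraint is left for an SMO-style solver), a minimal clip-and-rescale projection back to the feasible region could look like the sketch below. This is a hypothetical projection, not necessarily the one used in the paper.

```python
import numpy as np

def project_back(alpha, total):
    """Project dual variables back onto {alpha >= 0, sum(alpha) = total}
    after the solve in which the sign constraints were dropped
    (simple clip-and-rescale; assumed, not exact)."""
    alpha = np.maximum(alpha, 0.0)        # restore the dropped nonnegativity
    s = alpha.sum()
    if s > 0:
        alpha *= total / s                # re-satisfy the equality constraint
    return alpha

print(project_back(np.array([0.6, -0.2, 0.6]), total=1.0))  # -> [0.5, 0.0, 0.5]
```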

  19. Outline • Introduction • Our Methods • Experiments • Conclusion

  20. Experiments Configurations • Data sets: • 15 UCI data sets • COREL image dataset (20 classes, 100 images/class) • 2/3 labeled training set; 1/3 unlabeled for testing, 30 runs • Compared methods • ISD-L1/L2 • FSM/FSSM (Frome et al. 2006 & 2007) • LMNN (Weinberger et al. 2005) • DNE (Zhang et al, 2007) • Parameters are selected via cross validation

  21. Experiments: Classification Performance • Comparison of test error rates (mean±std.) [table shown on the slide]. • Win/tie/loss counts of ISD vs. the other methods under the t-test at 95% significance level.
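
For reference, the win/tie/loss protocol can be reproduced roughly as below: a paired t-test over the per-run error rates of each dataset at the 95% level. The dictionary layout and the tie-breaking by mean error are assumptions.

```python
import numpy as np
from scipy.stats import ttest_rel

def win_tie_loss(errs_isd, errs_other, alpha=0.05):
    """Count datasets where ISD is significantly better / not different / worse.
    errs_isd, errs_other: dict mapping dataset name -> array of test error
    rates over the repeated runs (e.g., 30 runs per dataset)."""
    win = tie = loss = 0
    for name in errs_isd:
        _, p = ttest_rel(errs_isd[name], errs_other[name])
        if p >= alpha:
            tie += 1
        elif errs_isd[name].mean() < errs_other[name].mean():
            win += 1
        else:
            loss += 1
    return win, tie, loss
```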

  22. Experiments: Influence of the number of iteration rounds [Plots: error rate vs. number of updating rounds, starting from the Euclidean distance.] • The error rates of ISD-L1 are reduced on most datasets as the number of updates increases. • The error rates of ISD-L2 are reduced on some datasets; however, on others the performance degenerates. This is overfitting: the L2-loss is more sensitive to noise.

  23. Experiments: Influence of the amount of labeled data • ISD is less sensitive to the amount of labeled data. • When the amount of labeled samples is limited, the superiority of ISD is more apparent.

  24. Conclusion Main contribution: • A method for learning instance-specific distance for labeled as well as unlabeled instances. Future work: • The construction of the initial graph • Label propagation, metric propagation, … any more properties to propagate? Thanks!
