
Dependency Modeling for Information Fusion with Applications in Visual Recognition



  1. Dependency Modeling for Information Fusion with Applications in Visual Recognition Andy Jinhua MA Advisor: Prof. Pong Chi YUEN

  2. Outline • Motivation • Related Works • Supervised Spatio-Temporal Manifold Learning • Linear Dependency Modeling • Reduced Analytic Dependency Modeling • Conclusion

  3. Outline • Motivation • Related Works • Supervised Spatio-Temporal Manifold Learning • Linear Dependency Modeling • Reduced Analytic Dependency Modeling • Conclusion

  4. Motivation • Multiple features provide complementary information, e.g. • Color information can distinguish Daffodil from Windflower Daffodil Windflower Flower images from Oxford Flowers dataset [CVPR’06]

  5. Motivation • Multiple features provide complementary information, e.g. • Color information can distinguish Daffodil from Windflower • Shape characteristics can distinguish Daffodil from Buttercup Daffodil Windflower Buttercup Flower images from Oxford Flowers dataset [CVPR’06]

  6. Motivation • Fusion by estimating the joint distribution, but • Not accurate in high dimensions • The independence assumption can simplify the fusion process, but • May not be valid in practice • Degrades the fusion performance • Existing dependency modeling techniques are based on the normal assumption, but • Not robust to non-normal cases • Solution • Develop dependency modeling methods without the normal assumption

  7. Outline • Motivation • Related Works • Supervised Spatio-Temporal Manifold Learning • Linear Dependency Modeling • Reduced Analytic Dependency Modeling • Conclusion

  8. Related Works • Probabilistic approach • Independence assumption based [TPAMI’98] • Product, Sum, Majority Votes • Normal assumption based [TPAMI’09] • Independent Normal (IN) combination • Dependent Normal (DN) combination • Non-probabilistic approach • Supervised weighting • LPBoost [ML’02] • LP-B [ICCV’09] • Reduced multivariate polynomial (RM) [TCSVT’04] • Multiple kernel learning (MKL) [ICML’04, JMLR’06, JMLR’08] • Unsupervised approach • Signal strength combination (SSC) [TNNLS’12] • Graph-regularized robust late fusion (GRLF) [CVPR’12]
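The independence-based combination rules listed above (product, sum, and majority vote from [TPAMI’98]) can be sketched in a few lines. This is a generic illustration, not the thesis code; the toy posteriors are made up:

```python
import numpy as np

def product_rule(posteriors):
    """Product rule: multiply per-classifier posteriors (independence assumption)."""
    fused = np.prod(posteriors, axis=0)
    return fused / fused.sum()

def sum_rule(posteriors):
    """Sum rule: average the per-classifier posteriors."""
    fused = np.sum(posteriors, axis=0)
    return fused / fused.sum()

def majority_vote(posteriors):
    """Each classifier votes for its top class; the most-voted class index wins."""
    votes = np.argmax(posteriors, axis=1)
    return int(np.bincount(votes, minlength=posteriors.shape[1]).argmax())

# two classifiers (rows), three classes (columns) -- illustrative numbers only
p = np.array([[0.6, 0.3, 0.1],
              [0.5, 0.4, 0.1]])
```

Under these posteriors all three rules agree on class 0; the rules differ mainly in how sensitive they are to one classifier's estimation errors (the product rule can be vetoed by a single near-zero posterior).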

  9. Outline • Motivation • Related Works • Supervised Spatio-Temporal Manifold Learning • Linear Dependency Modeling • Reduced Analytic Dependency Modeling • Conclusion

  10. Spatio-Temporal Manifold Learning • Why manifold learning • Can discover non-linear structures in visual data • Successful applications in image analysis, e.g. Laplacianfaces • Limitation for video applications • Temporal information not fully considered • Proposed method can • Discover non-linear structures • Utilize global constraint of temporal labels

  11. Manifold Learning Based Action Recognition Framework • Video Input → Preprocessing (by information saliency method [PR’09]) → Action Unit → Image representation → Feature Vectors → Spatio-temporal manifold projection → Embedded Manifold → Classification → Label Output

  12. Supervised Spatial (SS) Topology • Construct a new topological base by • Local information • Label information • Mathematical formulation • Temporal adjacency neighbors are in • Poses deform continuously over time

  13. Temporal Pose Correspondence (TPC) Topology • Sequences of the same action share similar poses • Employ dynamic time warping (DTW) [TASSP’78] to construct TPC sets • TPC topological base is
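Dynamic time warping as cited here is the standard alignment algorithm; a minimal sketch follows, assuming Euclidean distance between per-frame pose feature vectors (the distance used in the thesis is not stated in this transcript). The returned path gives the frame correspondences from which TPC sets could be built:

```python
import numpy as np

def dtw(seq_a, seq_b):
    """Dynamic time warping between two feature sequences.
    Returns the accumulated alignment cost and the aligned frame-index pairs."""
    a = [np.asarray(f, dtype=float) for f in seq_a]
    b = [np.asarray(f, dtype=float) for f in seq_b]
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = np.linalg.norm(a[i - 1] - b[j - 1])
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    # backtrack to recover the warping path (pairs of aligned frame indices)
    path, i, j = [], n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        step = int(np.argmin([D[i - 1, j - 1], D[i - 1, j], D[i, j - 1]]))
        if step == 0:
            i, j = i - 1, j - 1
        elif step == 1:
            i -= 1
        else:
            j -= 1
    return D[n, m], path[::-1]
```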

  14. Topology Combination • Combine SS and TPC topological bases • Supervised spatio-temporal neighborhood topology learning (SSTNTL) can • Preserve local structure • Separate sequences of different actions

  15. Experiments • Methods for comparison • Manifold learning methods • Locality preserving projection (LPP) [NIPS’03] • Supervised LPP (SLPP) [SP’07] • Locality sensitive discriminant analysis (LSDA) [IJCAI’07] • Local spatio-temporal discriminant embedding (LSTDE) [CVPR’08] • State-of-the-art action recognition algorithms • Classifier: nearest neighbor framework with median Hausdorff distance [TIP’07] where is the learnt projection, is action label, represents query data, is training data index
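The classifier on this slide (nearest neighbour with a median Hausdorff distance [TIP’07]) can be sketched as below. The symmetric-median variant and the toy sequences are assumptions for illustration; the exact variant in [TIP’07] may differ:

```python
import numpy as np

def median_hausdorff(A, B):
    """Symmetric median Hausdorff distance between embedded sequences
    A (n, d) and B (m, d): median of nearest-neighbour distances, both ways."""
    D = np.linalg.norm(A[:, None, :] - B[None, :, :], axis=2)
    return max(np.median(D.min(axis=1)), np.median(D.min(axis=0)))

def nn_classify(query, train_seqs, train_labels):
    """Nearest-neighbour rule: label of the closest training sequence."""
    dists = [median_hausdorff(query, S) for S in train_seqs]
    return train_labels[int(np.argmin(dists))]

# toy example: one action embedded near the origin, another near (5, 5)
walk = np.array([[0.0, 0.0], [0.0, 0.1], [0.1, 0.0]])
run = np.array([[5.0, 5.0], [5.1, 5.0], [5.0, 5.1]])
query = np.array([[0.05, 0.0], [0.0, 0.05]])
```

In the full framework the sequences would first be mapped through the learnt projection before the distance is computed.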

  16. Experiments • Datasets For Evaluation • Weizmann Human Action • KTH Human Action • UCF Sports • HOllywood Human Action (HOHA) • Cambridge Gesture • Image representation after preprocessing • Gray-scale for Weizmann and KTH • Gist [IJCV’01] for KTH, UCF sports, HOHA and Cambridge Gesture • Perform principal component analysis (PCA) [TPAMI’97] to avoid the singular matrix problem
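The PCA preprocessing step can be sketched via SVD. This is a generic implementation, not the exact pipeline of the thesis; the rank-1 toy data is illustrative:

```python
import numpy as np

def pca(X, n_components):
    """Project rows of X onto the top principal components; reducing the
    dimension this way helps avoid singular scatter matrices downstream."""
    Xc = X - X.mean(axis=0)
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    components = Vt[:n_components]
    return Xc @ components.T, components

X = np.array([[i, 2.0 * i, 0.0] for i in range(5)])  # rank-1 toy data
Z, components = pca(X, 1)  # one component reconstructs this data exactly
```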

  17. Results • Accuracy (%) compared with other manifold embedding methods • Our method achieves the highest accuracy • The image representation method affects the performance

  18. Results • Accuracy (%) compared with state-of-the-art methods under different scenarios in KTH • Outdoor (S1), Scale Change (S2), Clothes Change (S3), Indoor (S4) • Interest region based method outperforms others under fixed camera setting • Interest points based, e.g. Tracklet and AFMKL, are better for scale change

  19. Results • Does global constraint of temporal labels help? • Compare the proposed method with and without TPC neighbors (figure: neighbors not detected by local similarity)

  20. Outline • Motivation • Related Works • Supervised Spatio-Temporal Manifold Learning • Linear Dependency Modeling • Reduced Analytic Dependency Modeling • Conclusion

  21. Linear Dependency Modeling • Main idea • If and • Under independence assumption [TPAMI’98] • Final decision is dominated by one classifier • (Diagram: Input (Windflower image) → Features 1…M → Classifiers 1…M → Independent fusion, Independent Model (Product) → Output: ✗ Not Windflower) • Add dependency terms, s.t. the fused score is large

  22. Linear Classifier Dependency Modeling (LCDM) • Design of the dependency term • Dependency terms cannot be too large • Dependency weight to determine the feature importance • Prior probability • Following J. Kittler et al. [TPAMI’98], suppose posteriors will not deviate dramatically from priors, where is small • Define the dependency term as , with (annotated: dependency weight, prior, small number)

  23. Linear Classifier Dependency Modeling (LCDM) • Main idea • If and • Dependency model • (Diagram: Input (Windflower image) → Features 1…M → Classifiers 1…M → Proposed dependency model with dependency term → Output: ✓ Windflower)

  24. Linear Classifier Dependency Modeling (LCDM) • Expand the product formulation by neglecting terms • Linear Classifier Dependency Model (LCDM) is where ,
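The LCDM formula itself is not reproduced in this transcript. As a hedged illustration of the linear form the slides describe, a class-specific weighted linear combination of per-classifier posteriors (with hypothetical weights `w` standing in for the learned dependency weights) looks like:

```python
import numpy as np

def lcdm_score(posteriors, weights):
    """Sketch of an LCDM-style fused score: class-specific weighted linear
    combination of per-classifier posteriors.
    posteriors, weights: (M, C) arrays over M classifiers and C classes."""
    return (weights * posteriors).sum(axis=0)

p = np.array([[0.6, 0.4],    # classifier 1 posteriors over 2 classes
              [0.2, 0.8]])   # classifier 2
w = np.array([[0.7, 0.3],    # hypothetical learned weights
              [0.3, 0.7]])
scores = lcdm_score(p, w)    # predict the class with the largest fused score
```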

  25. Linear Feature Dependency Modeling (LFDM) • Why dependency modeling at the feature level? • Feature level contains more information • Symbol definition • Denote , • Denote , • Denote the label viewed as a random variable • Rigorous result • By the Data Processing Inequality • The feature level carries more information about the label, i.e. where represents mutual information

  26. Linear Feature Dependency Modeling (LFDM) • Posterior probability can be written as • Linear Feature Dependency Model (LFDM) is where can be calculated by one-dimensional density estimation
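The one-dimensional density estimation mentioned above can be sketched with a Gaussian KDE. `lfdm_score` is a hypothetical stand-in for the LFDM combination (the slide's formula is omitted from this transcript), combining per-dimension density estimates linearly with assumed weights:

```python
import numpy as np

def kde_1d(samples, x, bandwidth=0.5):
    """Gaussian kernel density estimate in one dimension."""
    diffs = (np.asarray(x, float)[:, None] - np.asarray(samples, float)[None, :]) / bandwidth
    return np.exp(-0.5 * diffs ** 2).mean(axis=1) / (bandwidth * np.sqrt(2 * np.pi))

def lfdm_score(x, class_samples, weights, bandwidth=0.5):
    """Sketch of an LFDM-style score: weighted linear combination of 1-D
    density estimates, one per feature dimension (weights are hypothetical)."""
    return sum(weights[j] * kde_1d(class_samples[:, j], [x[j]], bandwidth)[0]
               for j in range(len(x)))

near = np.zeros((3, 2))        # toy training samples of a class near the origin
far = np.full((3, 2), 5.0)     # toy training samples of another class
x = np.array([0.1, -0.1])      # query feature vector
```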

  27. Model Learning • Objective function in LCDM Maximizing margins Normalization constraint Dependency model constraint

  28. Model Learning • Objective function in LFDM • Solve by off-the-shelf techniques
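The slides name the ingredients of the objective (maximizing margins, a normalization constraint, a dependency model constraint) without the formulas. A generic max-margin linear program of that shape, with hypothetical notation (weights $w$, margin $\rho$, per-sample class scores $s_i^{(\omega)}$, true labels $y_i$, feasible set $\mathcal{D}$ for the dependency constraint), might read:

```latex
\begin{aligned}
\max_{w,\,\rho}\quad & \rho && \text{(maximizing margins)}\\
\text{s.t.}\quad & w^{\top}\bigl(s_i^{(y_i)} - s_i^{(\omega)}\bigr) \ge \rho
    \quad \forall i,\ \forall \omega \ne y_i\\
& \textstyle\sum_k w_k = 1,\quad w_k \ge 0 && \text{(normalization constraint)}\\
& w \in \mathcal{D} && \text{(dependency model constraint)}
\end{aligned}
```

A problem of this form is linear in $(w, \rho)$, which is what makes it solvable by the off-the-shelf techniques the slide mentions.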

  29. Estimation Error Analysis • Upper bounds of error factors in LCDM and LFDM where and represent estimation errors in and • Compare the denominators and numerators • LFDM is better than LCDM in the worst case,

  30. Experiments • Methods for comparison • Independence assumption: Sum rule [TPAMI’98] • Normal assumption [TPAMI’09]: Independent Normal (IN) and Dependent Normal (DN) combination rules • Boosting methods: LPBoost [ML’02] and LP-B [ICCV’09] • Multiple kernel learning (MKL) [JMLR’08] • Support vector machines (SVM) as base classifier • Datasets for evaluation • Synthetic data • Oxford 17 Flower • Human Action

  31. Experiments with Synthetic data • Data setting: 4 kinds of distributions • Independent Normal (IndNormal) • Dependent Normal (DepNormal) • Independent Non-Normal (IndNonNor) • Dependent Non-Normal (DepNonNor) • Results: recognition rates • IN and DN methods outperform others under normal distributions

  32. Experiments with Synthetic data • Data setting: 4 kinds of distributions • Independent Normal (IndNormal) • Dependent Normal (DepNormal) • Independent Non-Normal (IndNonNor) • Dependent Non-Normal (DepNonNor) • Results: recognition rates • IN and DN methods outperform others under normal assumption • LCDM achieves the best results when the distributions are non-normal

  33. Experiments with Oxford 17 Flower Dataset • Data setting • 17 flowers with 80 images per category • 3 predefined splits with 17 × 40 for training, 17 × 20 for validation, and 17 × 20 for testing • 7 kinds of features [CVPR’06] • Shape, color, texture, HSV, HoG, SIFT internal, and SIFT boundary • Results: recognition accuracy (example images shown) • Feature combination outperforms single features • LCDM achieves the highest accuracy

  34. Experiments with Human Action Datasets • Data setting • Weizmann • Nine fold cross-validation • KTH • Training (8 persons), validation (8 persons), and testing (9 persons) • Space-time interest point (STIP) detection [VSPETS’05] STIP detection example in Weizmann STIP detection example in KTH

  35. Experiments with Human Action Datasets • Data setting • Weizmann • Nine fold cross-validation • KTH • Training (8 persons), validation (8 persons), and testing (9 persons) • Space-time interest points (STIP) detection [VSPETS’05] • 8 kinds of descriptors are computed on each STIP • Gray-scale intensity • Intensity difference • HoF and HoG without grid • HoF and HoG with 2D grid • HoF and HoG with 3D grid • 8 kinds of features are generated by Bag-of-Words
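The Bag-of-Words step above can be sketched as codebook quantisation plus histogramming. The tiny codebook and descriptors are illustrative; in practice the codebook would be learned (e.g. by k-means) from training STIP descriptors:

```python
import numpy as np

def bag_of_words(descriptors, codebook):
    """Quantise local descriptors against a codebook and return a normalised
    word-count histogram (one feature vector per video)."""
    d = np.linalg.norm(descriptors[:, None, :] - codebook[None, :, :], axis=2)
    words = d.argmin(axis=1)                # nearest codeword per descriptor
    hist = np.bincount(words, minlength=len(codebook)).astype(float)
    return hist / hist.sum()

codebook = np.array([[0.0, 0.0], [10.0, 10.0]])   # two toy codewords
desc = np.array([[0.1, 0.2], [9.8, 10.1], [0.0, 0.1], [10.2, 9.9]])
h = bag_of_words(desc, codebook)  # -> [0.5, 0.5]
```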

  36. Experiments with Human Action Datasets • Recognition accuracy (%) • LFDM outperforms others • Feature-level improvement by LFDM is significant • (Tables: classifier fusion vs. feature fusion)

  37. Outline • Motivation • Related Works • Supervised Spatio-Temporal Manifold Learning • Linear Dependency Modeling • Reduced Analytic Dependency Modeling • Conclusion

  38. Problems in Linear Dependency Modeling (LDM) • Product formulation may not best model dependency • Assumption that posteriors will not deviate dramatically from priors • With strong classifiers, the deviation could be large • Propose a new method removing these two assumptions

  39. Analytic Dependency Modeling • Observation • Independent fusion [TPAMI’98] (annotated: constant w.r.t. label; function of posteriors)

  40. Analytic Dependency Modeling • Observation • Independent fusion [TPAMI’98] • Linear Dependency Model where denote

  41. Analytic Dependency Modeling • General score fusion model • Explicitly write out by converged power series • Denote and as weight vector • Rearrange according to where and is an analytic function of similar to

  42. Analytic Dependency Modeling • By Bayes’ rule and marginal distribution property where is a linear function of Trivial solution to equation system

  43. Analytic Dependency Modeling • By Bayes’ rule and marginal distribution property where is a linear function of • Under the independence condition, the solution to the equation system is trivial, i.e. • Model dependency by setting a non-trivial solution

  44. Reduced Model • Analytic function contains infinite number of coefficients

  45. Reduced Model • Analytic function contains infinite number of coefficients • Approximate by converged power series property • Reduced Analytic Dependency Model (RADM)
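The reduction from an analytic function with infinitely many coefficients to a finite model can be illustrated by truncating the power series. The per-classifier monomial expansion below is a hypothetical reading (the transcript omits the exact RADM formula); it keeps terms s_m, s_m², …, s_m^order and no cross terms:

```python
import numpy as np

def radm_features(scores, order):
    """Truncated power-series expansion of per-classifier scores: keep only
    monomials s_m^r for r = 1..order (a hypothetical 'reduced' expansion)."""
    s = np.asarray(scores, dtype=float)
    return np.concatenate([s ** r for r in range(1, order + 1)])

def radm_score(scores, weights, order):
    """The fused score is linear in the expanded monomials."""
    return float(radm_features(scores, order) @ np.asarray(weights, float))
```

With M classifiers and truncation order R, the model has M × R coefficients instead of infinitely many, and learning reduces to fitting a linear weight vector.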

  46. Model Learning • Objective function of empirical classification error • Objective function of dependency model constraint • Final optimization problem with regularization term • Solve by setting the first derivative to zero
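Setting the first derivative of a regularized objective to zero yields a closed-form solution. This generic sketch assumes a squared-error empirical loss with design matrix `Phi` and targets `t` (names are hypothetical; the slide does not spell out the loss):

```python
import numpy as np

def solve_regularized(Phi, t, lam):
    """Minimise ||Phi w - t||^2 + lam * ||w||^2 in closed form by setting the
    gradient to zero: w = (Phi^T Phi + lam I)^{-1} Phi^T t."""
    d = Phi.shape[1]
    return np.linalg.solve(Phi.T @ Phi + lam * np.eye(d), Phi.T @ t)
```

The regularization term both encodes the model constraint softly and keeps the system matrix well-conditioned, so no iterative optimizer is needed.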

  47. Experiments • Methods for comparison • Sum rule [TPAMI’98] • Independent Normal (IN) combination rule [TPAMI’09] • Dependent Normal (DN) combination rule [TPAMI’09] • Multi-class LPBoost, namely LP-B [ICCV’09] • Reduced multivariate polynomial (RM) [TCSVT’04] • Signal strength combination (SSC) [TNNLS’12] • Graph-regularized robust late fusion (GRLF) [CVPR’12] • Datasets for evaluation • PASCAL VOC 2007 • Columbia Consumer Video (CCV) • HOllywood Human Action (HOHA)

  48. Experiments with VOC 2007 and CCV Datasets • PASCAL VOC 2007 • 20 classes, 9,963 images, 5,011 for training, 4,952 for testing • 8 features [CVPR’10] • RGB, HSV, LAB, dense SIFT, Harris SIFT, dense HUE and Harris HUE with horizontal decomposition • Gist descriptor • Columbia Consumer Video (CCV) • 20 categories, 9,317 videos, 4,659 for training, 4,658 for testing • 3 features [ICMR’11] • Visual features: SIFT and space-time interest point (STIP) • Audio feature: Mel-frequency cepstral coefficients (MFCC)

  49. Experiments with VOC 2007 and CCV Datasets • Mean average precision (MAP) • RADM achieves the highest MAP

  50. RADM Fusion with SSTNTL • Data setting • HOHA dataset is used • 8 actions • Answer Phone (AnP), Get out of Car (GoC), Hand Shake (HS), Hug Person (HP), Kiss (Ki), Sit Down (SiD), Sit Up (SiU), Stand Up (StU) • Features • Supervised spatio-temporal neighborhood topology learning (SSTNTL) • 8 kinds of space-time interest point (STIP) based features STIP detection examples
