
Real-time Articulated Hand Pose Estimation using Semi-supervised Transductive Regression Forests

Tsz-Ho Yu, T-K Kim, Danhang Tang

Sponsored by

Motivation

  • Multiple cameras with inverse kinematics
    [Bissacco et al. CVPR2007] [Yao et al. IJCV2012] [Sigal IJCV2011]
  • Specialized hardware (e.g. structured-light sensor, ToF camera)
    [Shotton et al. CVPR2011] [Baak et al. ICCV2011] [Ye et al. CVPR2011] [Sun et al. CVPR2012]
  • Learning-based (regression)
    [Navaratnam et al. BMVC2006] [Andriluka et al. CVPR2010]

Motivation
  • Discriminative approaches (random forests) have achieved great success in human body pose estimation:
    • Efficient – real-time
    • Accurate – per-frame estimation, no reliance on tracking
  • However, they:
    • Require a large dataset to cover many poses
    • Train on synthetic data but test on real data
    • Do not exploit kinematic constraints

Examples:

Shotton et al. CVPR’11, Girshick et al. ICCV’11, Sun et al. CVPR’12


Challenges for Hand?

  • Viewpoint changes and self-occlusions
  • The discrepancy between synthetic and real data is larger than for the human body
  • Labelling is difficult and tedious!

Our Method

  • Viewpoint changes and self-occlusions → Hierarchical Hybrid Forest
  • Discrepancy between synthetic and real data is larger than for the human body → Transductive Learning
  • Labelling is difficult and tedious! → Semi-supervised Learning
Existing Approaches
  • Generative approaches
    • Model fitting; no training required
    • Slow; need initialisation and tracking
    • e.g. Oikonomidis et al. ICCV2011, Ballan et al. ECCV2012, De La Gorce et al. PAMI2010, Hamer et al. ICCV2009, motion capture
  • Discriminative approaches
    • Similar solutions to human body pose estimation
    • Performance on real data remains challenging
    • e.g. Xu and Cheng ICCV2013, Stenger et al. IVC2007, Keskin et al. ECCV2012, Wang et al. SIGGRAPH2009


Our Method

  • Viewpoint changes and self-occlusions → Hierarchical Hybrid Forest
  • Discrepancy between synthetic and real data is larger than for the human body
  • Labelling is difficult and tedious!
Hierarchical Hybrid Forest

Viewpoint Classification: Qa
Finger Joint Classification: Qp
Pose Regression: Qv

  • STR forest:
    • Qa – viewpoint classification quality (information gain)
    • Qp – joint label classification quality (information gain)
    • Qv – compactness of voting vectors (trace of the vote covariance)
    • (α, β) – margin measures of the viewpoint labels and joint labels

Qapv = α·Qa + (1-α)·β·Qp + (1-α)·(1-β)·Qv
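The hybrid split quality above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the helper names (`entropy`, `information_gain`, `vote_compactness`, `q_apv`) are mine, and the compactness measure here is simply the negative covariance trace of the child votes.

```python
import numpy as np

def entropy(labels):
    # Shannon entropy (bits) of a discrete label array.
    if len(labels) == 0:
        return 0.0
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return float(-np.sum(p * np.log2(p)))

def information_gain(labels, go_left):
    # Gain of splitting `labels` with the boolean mask `go_left`,
    # used for both the viewpoint (Qa) and joint-label (Qp) terms.
    n = len(labels)
    left, right = labels[go_left], labels[~go_left]
    return entropy(labels) - (len(left) / n) * entropy(left) \
                           - (len(right) / n) * entropy(right)

def vote_compactness(votes, go_left):
    # Qv: higher when the 3D voting vectors in each child cluster tightly;
    # measured here as the negative mean covariance trace of the children.
    def trace_cov(v):
        return float(np.trace(np.cov(v.T))) if len(v) > 1 else 0.0
    return -0.5 * (trace_cov(votes[go_left]) + trace_cov(votes[~go_left]))

def q_apv(q_a, q_p, q_v, alpha, beta):
    # Hybrid quality: Qapv = α·Qa + (1-α)·β·Qp + (1-α)·(1-β)·Qv.
    # As α decays down the tree, the forest shifts from viewpoint
    # classification towards joint classification and pose regression.
    return alpha * q_a + (1 - alpha) * beta * q_p + (1 - alpha) * (1 - beta) * q_v
```

With α near 1 (top of the tree) the viewpoint term dominates; with α near 0 the β weight arbitrates between joint classification and vote regression.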


Our Method

  • Viewpoint changes and self-occlusions
  • Discrepancy between synthetic and real data is larger than for the human body → Transductive Learning
  • Labelling is difficult and tedious! → Semi-supervised Learning
Transductive Learning

Source space (synthetic data S) → target space (realistic data R)

  • Training data D = {Rl, Ru, S}:
    • Synthetic data S: generated from an articulated hand model; all labelled.
    • Realistic data R: captured with a PrimeSense depth sensor. A small part of R, Rl, is labelled manually; the rest, Ru, is unlabelled.
Transductive Learning

Source space (synthetic data S) → target space (realistic data R)

  • Training data D = {Rl, Ru, S}:
    • Realistic data R: captured with a Kinect. A small part of R, Rl, is labelled manually; the rest, Ru, is unlabelled.
    • Synthetic data S: generated from an articulated hand model, where |S| >> |R|.
Transductive Learning

Source space (synthetic data S) → target space (realistic data R)

  • Training data D = {Rl, Ru, S}:
    • Similar data points in Rl and S are paired; a penalty is incurred if a split function separates a pair.
Semi-supervised Learning

Source space (synthetic data S) → target space (realistic data R)

  • Training data D = {Rl, Ru, S}:
    • Similar data points in Rl and S are paired; a penalty is incurred if a split function separates a pair.
    • A semi-supervised term makes use of the unlabelled real data Ru when evaluating a split function.
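The transductive pairing penalty can be sketched as a score that rewards candidate splits for keeping matched real–synthetic pairs in the same child. This is a hypothetical helper of my own, not the paper's exact objective:

```python
def pair_preservation(pairs, go_left):
    """Fraction of (real, synthetic) index pairs that a candidate split
    sends to the same child node.

    pairs   -- list of (i, j) index tuples linking similar data points
               in the labelled real set Rl and the synthetic set S
    go_left -- boolean per-sample mask produced by the split function
    """
    if not pairs:
        return 1.0  # nothing to preserve; no penalty
    same = sum(1 for i, j in pairs if go_left[i] == go_left[j])
    return same / len(pairs)
```

A training procedure could add this score (or subtract 1 minus it as a penalty) to the split quality, so that splits separating many matched pairs are discouraged and label information transfers from synthetic to real data.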
Experiment Settings

  • Training data:
    • Synthetic data (337.5K images)
    • Real data (81K images, <1.2K labelled)
  • Evaluation data: three different testing sequences
    • Sequence A – single viewpoint (450 frames)
    • Sequence B – multiple viewpoints, slow hand movements (1000 frames)
    • Sequence C – multiple viewpoints, fast hand movements (240 frames)
Self-comparison Experiment

  • Self comparison (Sequence A):
    • The graph shows the joint classification accuracy on Sequence A.
    • The realistic and synthetic baselines produce similar accuracies.
    • Using the transductive term is better than simply augmenting the real data with synthetic data.
    • Using all terms together achieves the best results.
Multi-view Experiments

  • Multi-view experiment (Sequence C):
Conclusion

  • A 3D hand pose estimation algorithm:
    • STR forest: a semi-supervised, transductive regression forest
    • A data-driven refinement scheme that rectifies the shortcomings of the STR forest
      • Real-time (25 Hz on an Intel i7 PC without CPU/GPU optimisation)
      • Outperforms the state of the art
      • Makes use of unlabelled data, requiring less manual annotation
      • More accurate in real scenarios
Thank you!

http://www.iis.ee.ic.ac.uk/icvl