Database based hand pose estimation
Download
1 / 35

Database-Based Hand Pose Estimation - PowerPoint PPT Presentation


  • 164 Views
  • Uploaded on

Database-Based Hand Pose Estimation. CSE 6367 – Computer Vision Vassilis Athitsos University of Texas at Arlington. Static Gestures (Hand Poses). Given a hand model, and a single image of a hand, estimate: 3D hand shape (joint angles). 3D hand orientation. Joints. Input image.

loader
I am the owner, or an agent authorized to act on behalf of the owner, of the copyrighted work described.
capcha
Download Presentation

PowerPoint Slideshow about ' Database-Based Hand Pose Estimation' - hasad


An Image/Link below is provided (as is) to download presentation

Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author.While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server.


- - - - - - - - - - - - - - - - - - - - - - - - - - E N D - - - - - - - - - - - - - - - - - - - - - - - - - -
Presentation Transcript
Database based hand pose estimation

Database-Based Hand Pose Estimation

CSE 6367 – Computer Vision

Vassilis Athitsos

University of Texas at Arlington


Static gestures hand poses
Static Gestures (Hand Poses)

Given a hand model, and a single image of a hand, estimate:

3D hand shape (joint angles).

3D hand orientation.

Joints

Input image

Articulated hand model


Static gestures
Static Gestures

Given a hand model, and a single image of a hand, estimate:

3D hand shape (joint angles).

3D hand orientation.

Input image

Articulated hand model


Given the 3D hand pose in the previous frame, estimate it in the current frame.

Problem: no good way to automatically initialize a tracker.

Rehg et al. (1995), Heap et al. (1996), Shimada et al. (2001),

Wu et al. (2001), Stenger et al. (2001), Lu et al. (2003), …

Goal: Hand Tracking Initialization


Assumptions in our approach
Assumptions in Our Approach the current frame.

A few tens of distinct hand shapes.

All 3D orientations should be allowed.

Motivation: American Sign Language.


Assumptions in our approach1
Assumptions in Our Approach the current frame.

A few tens of distinct hand shapes.

All 3D orientations should be allowed.

Motivation: American Sign Language.

Input: single image, bounding box of hand.


Assumptions in our approach2
Assumptions in Our Approach the current frame.

We do not assume precise segmentation!

No clean contour extracted.

input image

skin detection

segmented hand


Approach database search
Approach: Database Search the current frame.

Over 100,000 computer-generated images.

Known hand pose.

input


Why? the current frame.

We avoid direct estimation of 3D info.

With a database, we only match 2D to 2D.

We can find all plausible estimates.

Hand pose is often ambiguous.

input


Building the database
Building the Database the current frame.

26 hand shapes


Building the database1
Building the Database the current frame.

4128 images are generated for each hand shape.

Total: 107,328 images.


Features edge pixels
Features: Edge Pixels the current frame.

We use edge images.

Easy to extract.

Stable under illumination changes.

input

edge image


Chamfer distance
Chamfer Distance the current frame.

input

model

Overlaying input and model

How far apart are they?


Directed chamfer distance

Input: two sets of points. the current frame.

red, green.

c(red, green):

Average distance from each red point to nearest green point.

Directed Chamfer Distance


Directed chamfer distance1

Input: two sets of points. the current frame.

red, green.

c(red, green):

Average distance from each red point to nearest green point.

c(green, red):

Average distance from each red point to nearest green point.

Directed Chamfer Distance


Chamfer distance1

Input: two sets of points. the current frame.

red, green.

c(red, green):

Average distance from each red point to nearest green point.

c(green, red):

Average distance from each red point to nearest green point.

Chamfer Distance

Chamfer distance:

C(red, green) = c(red, green) + c(green, red)


Evaluating retrieval accuracy
Evaluating Retrieval Accuracy the current frame.

A database image is a correct match for the input if:

the hand shapes are the same,

3D hand orientations differ by at most 30 degrees.

correct matches

input

incorrect matches


Evaluating retrieval accuracy1
Evaluating Retrieval Accuracy the current frame.

An input image has 25-35 correct matches among the 107,328 database images.

Ground truth for input images is estimated by humans.

correct matches

input

incorrect matches


Evaluating retrieval accuracy2
Evaluating Retrieval Accuracy the current frame.

Retrieval accuracy measure: what is the rank of the highest ranking correct match?

correct matches

input

incorrect matches


Evaluating retrieval accuracy3
Evaluating Retrieval Accuracy the current frame.

input

rank 1

rank 2

rank 3

rank 4

rank 5

rank 6

highest ranking

correct match



Results on 703 real hand images1
Results on 703 Real Hand Images the current frame.

  • Results are better on “nicer” images:

    • Dark background.

    • Frontal view.

    • For half the images, top match was correct.

22


Examples
Examples the current frame.

segmented

hand

edge

image

initial

image

correct

match

rank: 1


Examples1
Examples the current frame.

segmented

hand

edge

image

initial

image

correct

match

rank: 644


Examples2
Examples the current frame.

segmented

hand

edge

image

initial

image

incorrect

match

rank: 1


Examples3
Examples the current frame.

segmented

hand

edge

image

initial

image

correct

match

rank: 1


Examples4
Examples the current frame.

segmented

hand

edge

image

initial

image

correct

match

rank: 33


Examples5
Examples the current frame.

segmented

hand

edge

image

initial

image

incorrect

match

rank: 1


Examples6
Examples the current frame.

segmented

hand

edge

image

“hard”

case

segmented

hand

edge

image

“easy”

case


Research directions
Research the current frame. Directions

More accurate similarity measures.

Better tolerance to segmentation errors.

Clutter.

Incorrect scale and translation.

Verifying top matches.

Registration.


Efficiency of the chamfer distance
Efficiency of the Chamfer Distance the current frame.

Computing chamfer distances is slow.

For images with d edge pixels, O(d log d) time.

Comparing input to entire database takes over 4 minutes.

Must measure 107,328 distances.

input

model


The nearest neighbor problem
The Nearest Neighbor Problem the current frame.

database


The nearest neighbor problem1
The Nearest Neighbor Problem the current frame.

Goal:

find the k nearest neighbors of query q.

database

query


The nearest neighbor problem2
The Nearest Neighbor Problem the current frame.

Goal:

find the k nearest neighbors of query q.

Brute force time is linear to:

n (size of database).

time it takes to measure a single distance.

database

query


The nearest neighbor problem3

Goal: the current frame.

find the k nearest neighbors of query q.

Brute force time is linear to:

n (size of database).

time it takes to measure a single distance.

The Nearest Neighbor Problem

database

query


ad