Multimodal Templates for Real-Time Detection of Texture-less Objects in Heavily Cluttered Scenes

Stefan Hinterstoisser, Stefan Holzer, Cedric Cagniart, Slobodan Ilic, Kurt Konolige, Nassir Navab, Vincent Lepetit

Department of Computer Science, CAMP, Technische Universität München (TUM), Germany
Willow Garage, Menlo Park, CA, USA

IEEE International Conference on Computer Vision (ICCV) 2011




Outline

  • Goal & Challenges

  • Related Work

  • Modality Extraction

    • Image Cue

    • Depth Cue

  • Similarity Measure

  • Efficient Computation

  • Experiments



Goal



Challenges

  • Detect objects under different poses against heavily cluttered backgrounds

  • Online learning

  • Real-time object learning and detection



Related Work

  • Approaches to multi-view 3D object detection fall into two main categories:

    • Learning-Based Methods

    • Template Matching

  • Learning-Based Methods:

    • Require a large amount of training data

    • Require a long offline training phase

    • Learning a new object is expensive



Related Work

  • Template Matching:

    • Better adapted to low-textured objects than feature-point approaches

    • Easy to update templates for new objects

    • Direct matching is too slow for real-time use

  • Others:

    • Matching in Range Data:

      • Requires a full 3D CAD model of the object



Outline

  • Goal & Challenges

  • Related Work

  • Modality Extraction

    • Image Cue

    • Depth Cue

  • Similarity Measure

  • Efficient Computation

  • Experiments


Modality Extraction: Image Cue

  • Image Cue:

  • Image gradients have proven to be discriminant and robust to illumination changes and noise.

  • Considering the normalized gradients, and not their magnitudes, makes the measure robust to contrast changes.

  • We compute the normalized gradients on each color channel of the input RGB image and keep, at each location, the gradient of the channel with the largest magnitude (a sketch follows this list).

  • For the input image I, the gradient map at location x:

    $$I_g(x) = \frac{\partial \hat{C}}{\partial x}(x), \qquad \hat{C}(x) = \arg\max_{C \in \{R,G,B\}} \left\lVert \frac{\partial C}{\partial x} \right\rVert$$
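A minimal sketch of this per-channel gradient cue in Python with NumPy/OpenCV; the function name and the Sobel-based gradient are assumptions of the sketch, not the authors' implementation:

```python
import cv2
import numpy as np

def color_gradient_map(bgr):
    """At each pixel, keep the gradient of the color channel whose
    gradient magnitude is largest (the image cue described above)."""
    grads, mags = [], []
    for channel in cv2.split(bgr.astype(np.float32)):
        gx = cv2.Sobel(channel, cv2.CV_32F, 1, 0, ksize=3)
        gy = cv2.Sobel(channel, cv2.CV_32F, 0, 1, ksize=3)
        grads.append(np.stack([gx, gy], axis=-1))   # (H, W, 2)
        mags.append(gx * gx + gy * gy)              # squared magnitude
    best = np.argmax(np.stack(mags), axis=0)        # winning channel per pixel
    grads = np.stack(grads)                         # (3, H, W, 2)
    rows = np.arange(best.shape[0])[:, None]
    cols = np.arange(best.shape[1])[None, :]
    return grads[best, rows, cols]                  # (H, W, 2) gradient map
```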


Modality Extraction: Image Cue

  • Keep only the gradients whose norms are larger than a threshold.

  • Assign to each location the quantized gradient orientation that occurs most often in its 3 × 3 neighborhood.

  • The similarity measurement function f_g (a sketch follows this list):

    $$f_g\big(O_g(r), I_g(t)\big) = \big|\cos\big(O_g(r) - I_g(t)\big)\big|$$

    O_g(r): the normalized gradient map of the reference image at location r

    I_g(t): the normalized gradient map of the input image at location t
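A sketch of the orientation quantization, the 3 × 3 vote, and f_g; the bin count N_ORI = 8 and the helper names are assumptions:

```python
import numpy as np

N_ORI = 8  # number of orientation bins n_0 (value assumed)

def quantize_orientation(grad):
    """Quantize gradient orientations into N_ORI bins, ignoring the
    gradient polarity (angles taken modulo 180 degrees)."""
    ang = np.arctan2(grad[..., 1], grad[..., 0]) % np.pi   # [0, pi)
    return np.floor(ang / np.pi * N_ORI).astype(int) % N_ORI

def dominant_orientation_3x3(q):
    """Assign to each location the quantized orientation occurring most
    often in its 3 x 3 neighborhood (plain voting, interior pixels only)."""
    out = q.copy()
    for y in range(1, q.shape[0] - 1):
        for x in range(1, q.shape[1] - 1):
            votes = np.bincount(q[y-1:y+2, x-1:x+2].ravel(), minlength=N_ORI)
            out[y, x] = votes.argmax()
    return out

def f_g(o_ref, i_inp):
    """|cos| of the angle between two quantized orientations, measured
    at the bin centers."""
    return abs(np.cos((o_ref - i_inp) * np.pi / N_ORI))
```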


Modality Extraction: Image Cue

[Figure: quantizing the gradient orientations. Panels: the input color image, the gradient image computed on the gray-level image, and the gradient image computed with our approach.]


Modality Extraction: Depth Cue

  • Depth Cue:

  • We use a standard camera and an aligned depth sensor to obtain the depth map.

  • We use quantized surface normals, computed on a dense depth field, for our template representation.

  • Consider the first-order Taylor expansion of the depth function D(x):

    $$D(x + dx) - D(x) \approx \nabla D^\top dx$$

  • Within a patch defined around x, each pixel offset dx yields one such equation (a least-squares sketch follows below).
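A sketch of the resulting least-squares problem, with an assumed 5 × 5 patch; this naive version ignores the occlusion handling described on the following slides:

```python
import numpy as np

def depth_gradient(D, x, y, half=2):
    """Least-squares depth gradient at pixel (x, y): every offset dx in
    the (2*half+1)^2 patch gives one equation
        D(x + dx) - D(x) ~ grad(D) . dx
    and we solve the stacked system for grad(D)."""
    A, b = [], []
    for dy in range(-half, half + 1):
        for dx in range(-half, half + 1):
            if dx == 0 and dy == 0:
                continue
            A.append((dx, dy))
            b.append(D[y + dy, x + dx] - D[y, x])
    g, *_ = np.linalg.lstsq(np.asarray(A, float), np.asarray(b, float),
                            rcond=None)
    return g  # (dD/dx, dD/dy)
```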


Modality Extraction: Depth Cue

  • We estimate an optimal gradient in the least-squares sense.

  • This depth gradient corresponds to a tangent plane going through three points X, X1 and X2:

    Each point lies along the vector of the line of sight that goes through its pixel (obtained from the parameters of the depth sensor), scaled by the corresponding depth.


Modality Extraction: Depth Cue

  • The normal to the surface can be estimated as the normalized cross product of X1 − X and X2 − X.

  • Within a patch defined around x, this would not be robust around occluding contours.

  • Inspired by bilateral filtering, we ignore the pixels whose depth difference with the central pixel X is above a threshold (a sketch follows the figure below).

[Figure: the depth sensor, the depth D(x) along the line of sight to point X, the tangent plane at X, and the normal of X (axis +Z shown).]
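A simplified sketch of the normal estimation with the depth-difference test; it uses only two neighbors instead of the full least-squares patch, and `rays` (precomputed unit line-of-sight vectors from the sensor intrinsics) and the threshold value are assumptions:

```python
import numpy as np

def surface_normal(D, rays, x, y, max_depth_diff=20.0):
    """Normal at pixel (x, y): back-project the pixel and two of its
    neighbors to 3D points X, X1, X2 along their lines of sight, then
    return the normalized cross product of X1 - X and X2 - X.
    Neighbors whose depth differs from D(y, x) by more than
    max_depth_diff are rejected (occluding-contour handling)."""
    d0 = D[y, x]
    X = d0 * rays[y, x]                       # rays: unit line-of-sight vectors
    pts = []
    for nx, ny in ((x + 1, y), (x, y + 1)):
        if abs(D[ny, nx] - d0) > max_depth_diff:
            return None                       # likely an occluding contour
        pts.append(D[ny, nx] * rays[ny, nx])
    n = np.cross(pts[0] - X, pts[1] - X)
    return n / np.linalg.norm(n)
```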


Modality Extraction: Depth Cue

  • Quantize the normal directions into n_0 bins.

  • Assign to each location the quantized value that occurs most often in its 5 × 5 neighborhood.

  • The similarity measurement function f_D, the dot product of the unit normals (a sketch follows this list):

    $$f_D\big(O_D(r), I_D(t)\big) = O_D(r)^\top I_D(t)$$

    O_D(r): the normalized surface normal of the reference image at location r

    I_D(t): the normalized surface normal of the input image at location t
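A sketch of the normal quantization and of f_D; the particular layout of the reference directions (a cone of vectors around the viewing axis) and the bin count are assumptions of this sketch:

```python
import numpy as np

N_NORM = 8  # number of normal bins n_0 (value assumed)

# Reference directions: a cone of unit vectors around the viewing axis
# (this layout is an assumption, not taken from the paper).
_refs = np.array([(np.cos(2*np.pi*k/N_NORM) * np.sin(np.pi/4),
                   np.sin(2*np.pi*k/N_NORM) * np.sin(np.pi/4),
                   np.cos(np.pi/4)) for k in range(N_NORM)])

def quantize_normal(n):
    """Index of the reference direction closest to unit normal n."""
    return int(np.argmax(_refs @ n))

def f_D(o_ref, i_inp):
    """Dot product of the reference directions of two quantized
    normals: 1 when parallel, smaller as they diverge."""
    return float(_refs[o_ref] @ _refs[i_inp])
```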


Modality Extraction: Depth Cue

[Figure: quantizing the surface normals. Panels: the input image, the corresponding depth image, and the surface normals computed with our approach.]

Details are clearly visible and depth discontinuities are well handled.


Outline

  • Goal & Challenges

  • Related Work

  • Modality Extraction

    • Image Cue

    • Depth Cue

  • Similarity Measure

  • Efficient Computation

  • Experiments


Similarity Measure

  • We define a template as T = ({O_m}_{m∈M}, P).

    P: a list of pairs (r, m) made of the locations r of a discriminant feature in modality m.

  • Each template is created by extracting, for each modality m, a set of its most discriminant features P.

  • Each location r records the feature location with respect to the object center C.

[Figure: a template with features P = (r_i, gradients) and P = (r_k, surface normals), located relative to the object center C.]


Similarity Measure

  • The energy function measuring the similarity between a template and the image at location c (a sketch follows this list):

    $$E(I, T, c) = \sum_{(r,m) \in P} \max_{t \in R(c+r)} f_m\big(O_m(r), I_m(t)\big)$$

    T = ({O_m}_{m∈M}, P)

    c: the detected location (e.g., the object center)

    R(c + r): the neighborhood of constant size N centered on c + r in I_m, i.e. [c + r − N/2, c + r + N/2] × [c + r − N/2, c + r + N/2]

    f_m(O_m(r), I_m(t)): computes the similarity score for modality m
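A direct, unoptimized sketch of E; `template.features` (the list P with the stored reference values O_m(r)), the modality keys, and the dictionary `F` mapping modalities to the similarity functions from the earlier sketches are illustrative names:

```python
# Map each modality name to its similarity function (from the sketches above).
F = {"gradient": f_g, "normal": f_D}

def energy(template, modal_maps, c, N=5):
    """E(I, T, c): for each feature (r, m) of the template, take the best
    similarity f_m over the N x N neighborhood R(c + r), and sum."""
    half = N // 2
    total = 0.0
    for (r, m, o_ref) in template.features:        # P with stored O_m(r)
        cx, cy = c[0] + r[0], c[1] + r[1]
        best = 0.0
        for ty in range(cy - half, cy + half + 1):
            for tx in range(cx - half, cx + half + 1):
                best = max(best, F[m](o_ref, modal_maps[m][ty, tx]))
        total += best
    return total
```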


Efficient Computation

  • We first quantize the input data of each modality into a small number n_0 of values.

  • We use a lookup table τ_{i,m} for the responses:

    $$\tau_{i,m}[L_m] = \max_{l \in L_m} f_m(i, l)$$

    i: the index of a quantized value of modality m (we also use i for the corresponding value)

    L_m: the list of values of modality m appearing in a local neighborhood of a location c in the input I

[Figure: two image locations c and c' with their neighborhood value lists L_m and L_m'.]


Efficient Computation

  • We “spread” [11] the data over a local neighborhood to obtain a robust representation J_m instead of L_m.

  • For each quantized value of a modality m with index i, we can then compute the response at each location c (a sketch follows below):

    $$S_{i,m}(c) = \tau_{i,m}\big[J_m(c)\big]$$

    τ_{i,m}: the precomputed lookup table, indexed by J_m

[11] S. Hinterstoisser, C. Cagniart, S. Ilic, P. Sturm, P. Fua, N. Navab, and V. Lepetit. Gradient response maps for real-time detection of texture-less objects. PAMI (under revision).
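A sketch of the spreading step and of the lookup tables; encoding the spread neighborhood J_m as a bitmask of quantized values follows the idea in [11], but the exact encoding and names here are assumptions:

```python
import numpy as np

def spread(q, n_bins, half=2):
    """Robust representation J: at each location, the bitwise OR of the
    one-hot codes of the quantized values in its neighborhood."""
    onehot = (1 << q).astype(np.uint16)
    J = np.zeros_like(onehot)
    h, w = q.shape
    for dy in range(-half, half + 1):
        for dx in range(-half, half + 1):
            J[max(0, -dy):h + min(0, -dy), max(0, -dx):w + min(0, -dx)] |= \
                onehot[max(0, dy):h + min(0, dy), max(0, dx):w + min(0, dx)]
    return J

def build_tau(f_m, n_bins):
    """tau[i, mask]: the best similarity f_m(i, l) over every quantized
    value l contained in the bitmask."""
    tau = np.zeros((n_bins, 1 << n_bins), dtype=np.float32)
    for i in range(n_bins):
        for mask in range(1, 1 << n_bins):
            tau[i, mask] = max(f_m(i, l) for l in range(n_bins)
                               if mask & (1 << l))
    return tau

# Response maps: S_i(c) = tau[i, J(c)], computed once per image and
# shared by all templates.
```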


Efficient Computation

  • Finally, the similarity measure can be computed as (a sketch follows below):

    $$E(I, T, c) = \sum_{(r,m) \in P} S_{O_m(r),\,m}(c + r)$$

  • Since the maps S_{i,m} are shared between the templates, matching several templates against the input image is very fast once the maps are computed.
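With the S_{i,m} maps precomputed, matching one template at a location c reduces to a sum of table lookups; a sketch with illustrative names:

```python
def match(template, S_maps, c):
    """E(I, T, c) as a sum of precomputed responses: S_maps[m][i] is the
    response map S_{i,m}, and each feature contributes one lookup."""
    return sum(S_maps[m][o_ref][c[1] + r[1], c[0] + r[0]]
               for (r, m, o_ref) in template.features)
```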


Experiments

  • LINE-MOD: our approach (intensity & depth)

  • LINE-2D: introduced in [11] (uses only intensity)

  • LINE-3D: uses only the depth map

  • Hardware:

    • One processor of a standard notebook with an Intel Centrino Core2Duo at 2.4 GHz and 3 GB of RAM.

  • Test data:

    • Six object sequences made of 2000 real images each.

    • Each sequence presents illumination and large viewpoint changes over a heavily cluttered background.


Experiments

  • Robustness:

    • A single threshold (about 80) separates almost all true positives from the false positives for LINE-MOD.


Experiments

  • Speed:

    • Learning new templates only requires extracting and storing features, which is almost instantaneous.

    • Templates cover a 360 degree tilt rotation, a 90 degree inclination rotation, in-plane rotations of ±80 degrees, and scale changes from 1.0 to 2.0.

    • We parse a 640 × 480 image with over 3000 templates of 126 features each at about 10 fps (real-time).

    • The runtime of LINE-MOD depends only on the number of features and is independent of the object/template size.


Experiments

  • Speed:

[Figure: speed results.]

Experiments

  • Occlusion:

    • Right: average recognition score for the six objects with respect to occlusion.

    • With over 30% occlusion, our method is still able to recognize objects.


Experiments

[Figure: detection results for the cup, toy car, and hole punch.]


Experiments

[Figure: detection results for the toy monkey, toy duck, and camera.]


Experiments

  • True positive rate = (number of true positives) / (number of true positives + number of false negatives)

  • False positive rate = (number of false positives) / (number of false positives + number of true negatives)

