Explore complex object interpretation, individual recognition, and action classification at the object level using cutting-edge deep neural network techniques. Uncover the nuances of features and classifiers in DNNs and delve into hierarchical structures for object parts and sub-parts detection.
2. Object parts and sub-parts. Called: full interpretation.
[Figure: car image with labeled parts — window, mirror, door knob, headlights, back wheel, bumper, front wheel]
4. Agents' interactions
[Figure: scene with six numbered interacting agents]
Features and Classifiers. In a DNN, the network itself produces the features at the top layer; previous work explored a broad range of hand-designed features.
Features used in the past: generic features, from simple (wavelets) to complex (Geons).
MarrNet (2017): rotated versions of the object in the image.
Optimal features: mutual information I(C; F)
Class:   1 1 0 1 0 1 0 0
Feature: 1 0 0 1 1 1 0 0
I(F; C) = H(C) − H(C | F)
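To make the criterion concrete, here is a minimal Python sketch that computes I(F; C) = H(C) − H(C | F) for the binary class and feature vectors on the slide (function names are illustrative):

```python
import numpy as np

def entropy(p):
    """Entropy in bits of a discrete distribution given as probabilities."""
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

def mutual_information(feature, cls):
    """I(F;C) = H(C) - H(C|F) for binary feature/class vectors."""
    feature, cls = np.asarray(feature), np.asarray(cls)
    # H(C)
    p_c = np.bincount(cls, minlength=2) / len(cls)
    h_c = entropy(p_c)
    # H(C|F) = sum_f p(F=f) H(C | F=f)
    h_c_given_f = 0.0
    for f in (0, 1):
        mask = feature == f
        if mask.any():
            p_f = mask.mean()
            p_c_f = np.bincount(cls[mask], minlength=2) / mask.sum()
            h_c_given_f += p_f * entropy(p_c_f)
    return h_c - h_c_given_f

# The binary vectors from the slide
cls     = [1, 1, 0, 1, 0, 1, 0, 0]
feature = [1, 0, 0, 1, 1, 1, 0, 0]
print(mutual_information(feature, cls))   # about 0.19 bits
```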
Star model: detected fragments 'vote' for the center location, and the object is placed at the location with the maximal vote. In its variations, this is a popular state-of-the-art scheme.
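A minimal voting sketch, assuming each detected fragment stores a learned offset to the object center and a confidence weight (the data layout and names are illustrative):

```python
import numpy as np

def vote_for_center(detections, image_shape):
    """
    Star-model voting sketch: each detected fragment casts a weighted vote
    at its predicted center location; the object center is taken as the
    location with the maximal accumulated vote.

    detections: list of (row, col, offset_row, offset_col, weight)
    """
    votes = np.zeros(image_shape)
    for r, c, dr, dc, w in detections:
        cr, cc = r + dr, c + dc
        if 0 <= cr < image_shape[0] and 0 <= cc < image_shape[1]:
            votes[cr, cc] += w
    # Location with the maximal vote
    center = np.unravel_index(np.argmax(votes), votes.shape)
    return center, votes
```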
Hierarchies of sub-fragments (a 'deep net'): detect each part by its simpler sub-parts, and repeat at multiple levels to obtain a hierarchy of parts and sub-parts.
Classification by a feature hierarchy. [Figure: tree with class node c at the root and part nodes X1–X5.]
p(c, X, F) = p(c) Π_i p(x_i | x_i⁻) p(F_i | x_i), where x_i⁻ denotes the parent of x_i.
The global optimum can be found by max-sum message passing (a two-pass computation over the tree):
X* = argmax_X [ p(c, X, F) = p(c) Π_i p(x_i | x_i⁻) p(F_i | x_i) ]
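A sketch of the two-pass computation for the simplest case, assuming a star-shaped tree in which every part node x_i is a direct child of the class node c (the log-probability tables and all names are illustrative):

```python
import numpy as np

def max_sum_star(log_p_c, log_p_x_given_c, log_p_F_given_x):
    """
    Two-pass max-sum inference on a star-shaped tree.

    log_p_c:          (C,)              log prior over the class node
    log_p_x_given_c:  list of (C, S_i)  log p(x_i | c)
    log_p_F_given_x:  list of (S_i,)    log p(F_i | x_i) (evidence)
    Returns the jointly optimal (c*, [x_1*, ..., x_n*]).
    """
    # Upward pass: each part sends max_{x_i} [log p(x_i|c) + log p(F_i|x_i)] to the root
    messages, argmaxes = [], []
    for lp_x, lp_F in zip(log_p_x_given_c, log_p_F_given_x):
        scores = lp_x + lp_F[None, :]          # shape (C, S_i)
        messages.append(scores.max(axis=1))
        argmaxes.append(scores.argmax(axis=1))
    # Root decision: combine the prior with all incoming messages
    root_score = log_p_c + np.sum(messages, axis=0)
    c_star = int(np.argmax(root_score))
    # Downward pass: back-track the maximizing part states
    x_star = [int(am[c_star]) for am in argmaxes]
    return c_star, x_star
```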
HoG descriptor. Dalal, N. & Triggs, B., Histograms of Oriented Gradients for Human Detection. SIFT is similar, with different details and multi-scale processing.
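A simplified sketch of the descriptor: per-cell histograms of unsigned gradient orientations, weighted by gradient magnitude. The full Dalal-Triggs HoG additionally interpolates votes and normalizes over overlapping blocks; the cell size and bin count below are the common defaults:

```python
import numpy as np

def hog_descriptor(image, cell=8, bins=9):
    """Minimal HoG sketch: unsigned gradients, per-cell orientation histograms."""
    gy, gx = np.gradient(image.astype(float))
    mag = np.hypot(gx, gy)
    ang = np.rad2deg(np.arctan2(gy, gx)) % 180.0       # unsigned orientation in [0, 180)
    h, w = image.shape
    n_cy, n_cx = h // cell, w // cell
    hist = np.zeros((n_cy, n_cx, bins))
    bin_width = 180.0 / bins
    for cy in range(n_cy):
        for cx in range(n_cx):
            m = mag[cy*cell:(cy+1)*cell, cx*cell:(cx+1)*cell]
            a = ang[cy*cell:(cy+1)*cell, cx*cell:(cx+1)*cell]
            idx = np.minimum((a / bin_width).astype(int), bins - 1)
            for b in range(bins):
                hist[cy, cx, b] = m[idx == b].sum()    # magnitude-weighted votes
    return hist.ravel()
```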
Optimal separation: SVM and perceptron. Vapnik, The Nature of Statistical Learning Theory, 1995; Rosenblatt, Principles of Neurodynamics, 1962. Find a separating plane such that the closest points are as far from it as possible.
The margin. [Figure: two classes with the separating line at 0 and margin lines at +1 and −1.]
Separating line: w ∙ x + b = 0
Far line: w ∙ x + b = +1
Their distance: w ∙ ∆x = +1
Separation: |∆x| = 1/|w|
Margin: 2/|w|
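A small check of the margin formula using a linear SVM (scikit-learn here, with a large C to approximate the hard-margin case; the toy data is illustrative):

```python
import numpy as np
from sklearn.svm import SVC

# Toy linearly separable data: closest points of the two classes are 2 apart.
X = np.array([[0.0, 0.0], [0.0, 1.0], [2.0, 0.0], [2.0, 1.0]])
y = np.array([-1, -1, +1, +1])

svm = SVC(kernel='linear', C=1e6).fit(X, y)
w, b = svm.coef_[0], svm.intercept_[0]

# Margin = 2/|w|, as in the derivation above; here it comes out to 2.0
print("w =", w, "b =", b)
print("margin =", 2.0 / np.linalg.norm(w))
```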
Using patches with HoG descriptors and classification by SVM. [Figure: person model as a grid of HoG cells.]
Bicycle model: root, parts, spatial map. [Figure: bicycle and person models shown as root and part HoG filters.]
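A sliding-window sketch of such a detector: score every window with a linear SVM on its HoG descriptor and keep the windows above a threshold (part filters, deformation costs, and non-maximum suppression are omitted; all parameters are illustrative):

```python
import numpy as np

def detect(image, w, b, hog_fn, window=(128, 64), stride=8, thresh=0.0):
    """
    Sliding-window detection sketch.
    hog_fn: any function mapping an image window to a descriptor
            (e.g. the hog_descriptor sketch above).
    w, b:   weights and bias of a trained linear SVM.
    Returns a list of (row, col, score) for windows scoring above thresh.
    """
    H, W = image.shape
    wh, ww = window
    hits = []
    for r in range(0, H - wh + 1, stride):
        for c in range(0, W - ww + 1, stride):
            f = hog_fn(image[r:r+wh, c:c+ww])
            score = float(np.dot(w, f) + b)     # linear SVM score for this window
            if score > thresh:
                hits.append((r, c, score))
    return hits
```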
A Neural Network Model
A network of 'neurons' with multiple layers
Repeating structure of linear and non-linear stages
Automatic learning of the weights between units
Perceptron learning: y_j = f(x_j), where x_j is the weighted sum of inputs to unit j and f is a threshold non-linearity.
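A sketch of the classic mistake-driven perceptron learning rule, assuming labels in {−1, +1} (variable names are illustrative):

```python
import numpy as np

def perceptron_train(X, y, epochs=100):
    """
    Rosenblatt-style perceptron learning: predict with a threshold on w.x + b;
    whenever a sample is misclassified, move the weights toward (or away from) it.
    """
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(epochs):
        errors = 0
        for xi, target in zip(X, y):              # target in {-1, +1}
            pred = 1 if np.dot(w, xi) + b > 0 else -1
            if pred != target:
                w += target * xi                  # w <- w + y * x on a mistake
                b += target
                errors += 1
        if errors == 0:                           # converged (data is separable)
            break
    return w, b
```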
LeNet, 1998: essentially the same architecture as the current generation of CNNs.
Hinton, Trends in Cognitive Sciences, 2007. The goal: unsupervised learning with Restricted Boltzmann Machines, combining a generative model with inference. Current CNNs, by contrast, are feed-forward and massively supervised.
Basic structure of deep nets. Not detailed here, but make sure you know the layer structure and the repeating three-layer arrangement (a sketch follows).
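A minimal sketch of that repeating arrangement, assuming it refers to the usual convolution, non-linearity, pooling block (PyTorch, with LeNet-like sizes for a 28×28 grayscale input; all sizes are illustrative):

```python
import torch.nn as nn

# Two repetitions of the three-layer block, followed by a classifier
# on the top-layer features.
model = nn.Sequential(
    nn.Conv2d(1, 6, kernel_size=5),    # convolution (learned linear filters)
    nn.ReLU(),                         # non-linearity
    nn.MaxPool2d(2),                   # pooling / subsampling

    nn.Conv2d(6, 16, kernel_size=5),   # the same three-layer block repeated
    nn.ReLU(),
    nn.MaxPool2d(2),

    nn.Flatten(),
    nn.Linear(16 * 4 * 4, 10),         # classifier on the top-layer features
)
```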