Explore complex object interpretation, individual recognition, and action classification at the object level using cutting-edge deep neural network techniques. Uncover the nuances of features and classifiers in DNNs and delve into hierarchical structures for object parts and sub-parts detection.
2. Object parts and sub-parts. Called: full interpretation.
[Figure: car image with labeled parts — window, mirror, door knob, headlights, back wheel, bumper, front wheel]
4. Agents' interactions
[Figure: scene with six numbered interacting agents]
Features and Classifiers. In a DNN, the network itself produces the features at the top layer; previous work explored a broad range of hand-designed features.
Features used in the past: generic features, from simple (wavelets) to complex (Geons).
MarrNet (2017): rotated versions of the object in the image.
Optimal features: mutual information I(C; F)
Class:   1 1 0 1 0 1 0 0
Feature: 1 0 0 1 1 1 0 0
I(F; C) = H(C) − H(C | F)
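To make the criterion concrete, here is a minimal Python sketch that computes I(F; C) = H(C) − H(C | F) for the binary class and feature vectors on the slide (function names are illustrative):

```python
import numpy as np

def entropy(p):
    """Entropy in bits of a discrete distribution given as probabilities."""
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

def mutual_information(feature, cls):
    """I(F;C) = H(C) - H(C|F) for binary feature/class vectors."""
    feature, cls = np.asarray(feature), np.asarray(cls)
    # H(C)
    p_c = np.bincount(cls, minlength=2) / len(cls)
    h_c = entropy(p_c)
    # H(C|F) = sum_f p(F=f) H(C | F=f)
    h_c_given_f = 0.0
    for f in (0, 1):
        mask = feature == f
        if mask.any():
            p_f = mask.mean()
            p_c_f = np.bincount(cls[mask], minlength=2) / mask.sum()
            h_c_given_f += p_f * entropy(p_c_f)
    return h_c - h_c_given_f

# The binary vectors from the slide
cls     = [1, 1, 0, 1, 0, 1, 0, 0]
feature = [1, 0, 0, 1, 1, 1, 0, 0]
print(mutual_information(feature, cls))   # about 0.19 bits
```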
Star model: detected fragments 'vote' for the center location, and the object is placed at the location with the maximal vote. In its variations, this is a popular state-of-the-art scheme.
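A minimal voting sketch, assuming each detected fragment stores a learned offset to the object center and a confidence weight (the data layout and names are illustrative):

```python
import numpy as np

def vote_for_center(detections, image_shape):
    """
    Star-model voting sketch: each detected fragment casts a weighted vote
    at its predicted center location; the object center is taken as the
    location with the maximal accumulated vote.

    detections: list of (row, col, offset_row, offset_col, weight)
    """
    votes = np.zeros(image_shape)
    for r, c, dr, dc, w in detections:
        cr, cc = r + dr, c + dc
        if 0 <= cr < image_shape[0] and 0 <= cc < image_shape[1]:
            votes[cr, cc] += w
    # Location with the maximal vote
    center = np.unravel_index(np.argmax(votes), votes.shape)
    return center, votes
```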
Hierarchies of sub-fragments (a 'deep net'): detect each part by its simpler sub-parts, and repeat at multiple levels to obtain a hierarchy of parts and sub-parts.
Classification by a feature hierarchy. [Figure: tree with class node c at the root and part nodes X1–X5.]
p(c, X, F) = p(c) Π_i p(x_i | x_i⁻) p(F_i | x_i), where x_i⁻ denotes the parent of x_i.
The global optimum can be found by max-sum message passing (a two-pass computation over the tree):
X* = argmax_X [ p(c, X, F) = p(c) Π_i p(x_i | x_i⁻) p(F_i | x_i) ]
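A sketch of the two-pass computation for the simplest case, assuming a star-shaped tree in which every part node x_i is a direct child of the class node c (the log-probability tables and all names are illustrative):

```python
import numpy as np

def max_sum_star(log_p_c, log_p_x_given_c, log_p_F_given_x):
    """
    Two-pass max-sum inference on a star-shaped tree.

    log_p_c:          (C,)              log prior over the class node
    log_p_x_given_c:  list of (C, S_i)  log p(x_i | c)
    log_p_F_given_x:  list of (S_i,)    log p(F_i | x_i) (evidence)
    Returns the jointly optimal (c*, [x_1*, ..., x_n*]).
    """
    # Upward pass: each part sends max_{x_i} [log p(x_i|c) + log p(F_i|x_i)] to the root
    messages, argmaxes = [], []
    for lp_x, lp_F in zip(log_p_x_given_c, log_p_F_given_x):
        scores = lp_x + lp_F[None, :]          # shape (C, S_i)
        messages.append(scores.max(axis=1))
        argmaxes.append(scores.argmax(axis=1))
    # Root decision: combine the prior with all incoming messages
    root_score = log_p_c + np.sum(messages, axis=0)
    c_star = int(np.argmax(root_score))
    # Downward pass: back-track the maximizing part states
    x_star = [int(am[c_star]) for am in argmaxes]
    return c_star, x_star
```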
HoG descriptor. Dalal, N. & Triggs, B., Histograms of Oriented Gradients for Human Detection. SIFT is similar, with different details and multi-scale processing.
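A simplified sketch of the descriptor: per-cell histograms of unsigned gradient orientations, weighted by gradient magnitude. The full Dalal-Triggs HoG additionally interpolates votes and normalizes over overlapping blocks; the cell size and bin count below are the common defaults:

```python
import numpy as np

def hog_descriptor(image, cell=8, bins=9):
    """Minimal HoG sketch: unsigned gradients, per-cell orientation histograms."""
    gy, gx = np.gradient(image.astype(float))
    mag = np.hypot(gx, gy)
    ang = np.rad2deg(np.arctan2(gy, gx)) % 180.0       # unsigned orientation in [0, 180)
    h, w = image.shape
    n_cy, n_cx = h // cell, w // cell
    hist = np.zeros((n_cy, n_cx, bins))
    bin_width = 180.0 / bins
    for cy in range(n_cy):
        for cx in range(n_cx):
            m = mag[cy*cell:(cy+1)*cell, cx*cell:(cx+1)*cell]
            a = ang[cy*cell:(cy+1)*cell, cx*cell:(cx+1)*cell]
            idx = np.minimum((a / bin_width).astype(int), bins - 1)
            for b in range(bins):
                hist[cy, cx, b] = m[idx == b].sum()    # magnitude-weighted votes
    return hist.ravel()
```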
Optimal separation: SVM and perceptron. Vapnik, The Nature of Statistical Learning Theory, 1995; Rosenblatt, Principles of Neurodynamics, 1962. Find a separating plane such that the closest points are as far from it as possible.
The margin. [Figure: two classes with the separating line at 0 and margin lines at +1 and −1.]
Separating line: w ∙ x + b = 0
Far line: w ∙ x + b = +1
Their distance: w ∙ ∆x = +1
Separation: |∆x| = 1/|w|
Margin: 2/|w|
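A small check of the margin formula using a linear SVM (scikit-learn here, with a large C to approximate the hard-margin case; the toy data is illustrative):

```python
import numpy as np
from sklearn.svm import SVC

# Toy linearly separable data: closest points of the two classes are 2 apart.
X = np.array([[0.0, 0.0], [0.0, 1.0], [2.0, 0.0], [2.0, 1.0]])
y = np.array([-1, -1, +1, +1])

svm = SVC(kernel='linear', C=1e6).fit(X, y)
w, b = svm.coef_[0], svm.intercept_[0]

# Margin = 2/|w|, as in the derivation above; here it comes out to 2.0
print("w =", w, "b =", b)
print("margin =", 2.0 / np.linalg.norm(w))
```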
Using patches with HoG descriptors and classification by SVM. [Figure: person model as a grid of HoG cells.]
Bicycle model: root, parts, spatial map. [Figure: bicycle and person models shown as root and part HoG filters.]
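A sliding-window sketch of such a detector: score every window with a linear SVM on its HoG descriptor and keep the windows above a threshold (part filters, deformation costs, and non-maximum suppression are omitted; all parameters are illustrative):

```python
import numpy as np

def detect(image, w, b, hog_fn, window=(128, 64), stride=8, thresh=0.0):
    """
    Sliding-window detection sketch.
    hog_fn: any function mapping an image window to a descriptor
            (e.g. the hog_descriptor sketch above).
    w, b:   weights and bias of a trained linear SVM.
    Returns a list of (row, col, score) for windows scoring above thresh.
    """
    H, W = image.shape
    wh, ww = window
    hits = []
    for r in range(0, H - wh + 1, stride):
        for c in range(0, W - ww + 1, stride):
            f = hog_fn(image[r:r+wh, c:c+ww])
            score = float(np.dot(w, f) + b)     # linear SVM score for this window
            if score > thresh:
                hits.append((r, c, score))
    return hits
```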
A Neural Network Model
A network of 'neurons' with multiple layers
Repeating structure of linear and non-linear stages
Automatic learning of the weights between units
Perceptron learning: y_j = f(x_j), where x_j is the weighted sum of inputs to unit j and f is a threshold non-linearity.
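A sketch of the classic mistake-driven perceptron learning rule, assuming labels in {−1, +1} (variable names are illustrative):

```python
import numpy as np

def perceptron_train(X, y, epochs=100):
    """
    Rosenblatt-style perceptron learning: predict with a threshold on w.x + b;
    whenever a sample is misclassified, move the weights toward (or away from) it.
    """
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(epochs):
        errors = 0
        for xi, target in zip(X, y):              # target in {-1, +1}
            pred = 1 if np.dot(w, xi) + b > 0 else -1
            if pred != target:
                w += target * xi                  # w <- w + y * x on a mistake
                b += target
                errors += 1
        if errors == 0:                           # converged (data is separable)
            break
    return w, b
```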
LeNet, 1998: essentially the same architecture as the current generation of CNNs.
Hinton, Trends in Cognitive Sciences, 2007. The goal: unsupervised learning with Restricted Boltzmann Machines, combining a generative model with inference. Current CNNs, by contrast, are feed-forward and massively supervised.
Basic structure of deep nets. Not detailed here, but make sure you know the layer structure and the repeating three-layer arrangement (a sketch follows).
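A minimal sketch of that repeating arrangement, assuming it refers to the usual convolution, non-linearity, pooling block (PyTorch, with LeNet-like sizes for a 28×28 grayscale input; all sizes are illustrative):

```python
import torch.nn as nn

# Two repetitions of the three-layer block, followed by a classifier
# on the top-layer features.
model = nn.Sequential(
    nn.Conv2d(1, 6, kernel_size=5),    # convolution (learned linear filters)
    nn.ReLU(),                         # non-linearity
    nn.MaxPool2d(2),                   # pooling / subsampling

    nn.Conv2d(6, 16, kernel_size=5),   # the same three-layer block repeated
    nn.ReLU(),
    nn.MaxPool2d(2),

    nn.Flatten(),
    nn.Linear(16 * 4 * 4, 10),         # classifier on the top-layer features
)
```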