Generalizing Backpropagation to Include Sparse Coding

David M. Bradley ([email protected]) and Drew Bagnell
Robotics Institute, Carnegie Mellon University

Outline
  • Discuss the value of modular, deep, gradient-based systems, especially in robotics
  • Introduce a new and useful family of modules
  • Properties of the new family
    • Online training with non-Gaussian priors
      • E.g. encouraging sparsity or multi-task weight sharing
    • Modules internally solve continuous optimization problems
      • Captures interesting nonlinear effects, such as inhibition, that involve coupled outputs
      • Sparse approximation
    • Modules can be jointly optimized by a generalization of backpropagation
Deep Modular Learning Systems
  • Efficiently represent complex functions
    • Particularly efficient for closely related tasks
  • Recently shown to be powerful learning machines
    • Greedy layer-wise training improves initialization
  • Greedy module-wise training is useful for designing complex systems
    • Design and initialize modules independently
    • Jointly optimize the final system with backpropagation (sketched below)
  • Gradient methods allow the incorporation of diverse data sources and losses

G. Hinton, S. Osindero, and Y. Teh, “A fast learning algorithm for deep belief networks.”, Neural Computation 2006

Y. Bengio, P. Lamblin, H. Larochelle, “Greedy layer-wise training of deep networks.”, NIPS 2007

Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner, "Gradient-Based Learning Applied to Document Recognition", Proceedings of the IEEE, 1998
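A minimal sketch of the module-wise-then-joint training recipe above, assuming PyTorch; the module definitions, losses, and data loaders are placeholders, not the system described in this talk:

```python
import torch
import torch.nn as nn

# Two placeholder modules, each pre-trained greedily on its own objective.
module1 = nn.Sequential(nn.Linear(100, 50), nn.ReLU())
module2 = nn.Linear(50, 10)

def pretrain(module, data_loader, loss_fn, epochs=1, lr=1e-2):
    """Greedy, module-wise pre-training on the module's own data and loss."""
    opt = torch.optim.SGD(module.parameters(), lr=lr)
    for _ in range(epochs):
        for x, y in data_loader:
            opt.zero_grad()
            loss = loss_fn(module(x), y)
            loss.backward()
            opt.step()

# ... call pretrain(module1, ...) and pretrain(module2, ...) on their own objectives ...

# Joint optimization: compose the modules and fine-tune end-to-end so gradients
# from the final loss flow back through every module.
system = nn.Sequential(module1, module2)
joint_opt = torch.optim.SGD(system.parameters(), lr=1e-3)
```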

Mobile Robot Perception
  • Sensors: ladar, RGB camera, NIR camera
  • Lots of unlabeled data
  • Hard to define traditional supervised-learning data
  • Target task is defined by weakly-labeled, structured-output data

Perception Problem: Scene Labeling
  • [Figure: the perception system produces a cost for each 2-D cell, which feeds a motion planner]

Goal System
  • [System diagram showing data flow forward and gradient flow backward between modules: webcam data with a lighting-variance cost, a camera feeding an object-classification cost trained on LabelMe, IMU data with a proprioception-prediction cost, a point classifier with a classification cost on labeled 3-D points, a max-margin planner producing motion plans from human-driven example paths, and a ground-plane estimator using observed wheel heights and laser data]

New Modules
  • Modules that are important in this system require two new abilities
    • Induce new priors on weights
    • Allow modules to solve internal optimization problems
Standard Backpropagation Assumes an L2 Prior
  • Gradient descent with convex loss functions
  • Small steps with early stopping imply L2 regularization
    • Each step minimizes a regret bound by solving a regularized optimization (sketched below)
    • This in turn bounds the true regret
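A sketch, in our own notation, of the online gradient step on a convex loss \(\ell_t\) and the L2-regularized per-round problem it implicitly solves:

```latex
w_{t+1} = w_t - \eta \,\nabla \ell_t(w_t)
\qquad\Longleftrightarrow\qquad
w_{t+1} = \arg\min_{w}\; \eta\,\langle \nabla \ell_t(w_t),\, w\rangle + \tfrac{1}{2}\,\|w - w_t\|_2^2
```

The squared Euclidean proximity term on the right is what makes plain gradient descent behave like learning under an L2 prior.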

M. Zinkevich, "Online Convex Programming and Generalized Infinitesimal Gradient Ascent", ICML 2003

Alternate Priors
  • KL-divergence
    • Useful if many features are irrelevant
    • Approximately solved with exponentiated gradient descent (see the sketch below)
  • Multi-task priors (encourage sharing between related tasks)
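A minimal sketch of one exponentiated-gradient step, the multiplicative update induced by a KL regularizer on the probability simplex (variable names are ours, not the paper's):

```python
import numpy as np

def exponentiated_gradient_step(w, grad, eta=0.1):
    """One exponentiated-gradient (EG) update: a multiplicative step followed by
    re-normalization onto the probability simplex. Irrelevant features, whose
    gradients stay near zero, decay relative to relevant ones."""
    w_new = w * np.exp(-eta * grad)
    return w_new / w_new.sum()

# Example: start uniform and take one step against a sparse gradient.
w = np.full(4, 0.25)
g = np.array([1.0, 0.0, 0.0, -1.0])
w = exponentiated_gradient_step(w, g)
```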

Argyriou and Evgeniou, "Multi-task Feature Learning", NIPS 2007

Bradley and Bagnell 2008

L2 Backpropagation
  • [Diagram: a modular network (Input; modules M1, M2, M3; intermediate signals a, b, c) with two summed loss functions; gradients are backpropagated and each module's weights take standard additive gradient steps, i.e. the implicit L2 prior]

With KL-Prior Modules
  • [Same modular diagram (Input; modules M1, M2, M3; signals a, b, c; two summed loss functions), now with modules that place a KL prior on their weights]

General Mirror Descent
  • [Same modular diagram (Input; modules M1, M2, M3; signals a, b, c; two summed loss functions), illustrating the general mirror-descent weight update applied in each module]
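A sketch of the generic mirror-descent step with potential function \(\Phi\) (our notation, not reproduced from the slides):

```latex
w_{t+1} = \nabla \Phi^{*}\!\big(\nabla \Phi(w_t) - \eta\,\nabla \ell_t(w_t)\big)
```

With \(\Phi(w) = \tfrac{1}{2}\|w\|_2^2\) this reduces to ordinary gradient descent (the L2 prior); with the negative entropy \(\Phi(w) = \sum_i w_i \log w_i\) it reduces to the exponentiated-gradient update sketched earlier (the KL prior).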

New Modules
  • Modules that are important in this system require two new abilities
    • Induce new priors on weights
    • Allow modules to solve internal optimization problems
      • Interesting nonlinear effects, such as inhibition, that involve coupled outputs
      • Sparse approximation
Inhibition
  • [Figure: an input signal and a set of basis vectors]

Inhibition
  • [Figure: the same input and basis, with coefficients obtained by direct projection onto the basis]

Inhibition
  • [Figure: the same input and basis, with coefficients obtained by KL-regularized optimization, in which active basis elements inhibit one another]

Sparse Approximation
  • Assumes the input is a sparse combination of elements, plus observation noise (the generic objective is sketched below)
    • Many possible elements
    • Only a few present in any particular example
  • True for many real-world signals
  • Many applications
    • Compression (JPEG), Sensing (MRI), Machine Learning
  • Produces effects observed in biology
    • V1 receptive fields, Inhibition
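A sketch of the generic sparse-approximation objective in our own notation: given an input x and a basis B, find coefficients w that reconstruct x well while remaining sparse:

```latex
\min_{w}\; \mathrm{loss}(x,\, Bw) \;+\; \lambda\, R(w)
```

Here loss(·,·) is a reconstruction loss (e.g. squared error, or the cross entropy used later in this talk) and R is a sparsity-inducing regularizer such as the L1 norm or an unnormalized KL divergence.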

Tropp et al., "Algorithms for Simultaneous Sparse Approximation", 2005

Raina et al., "Self-taught Learning: Transfer Learning from Unlabeled Data", ICML 2007

Olshausen and Field, "Sparse Coding of Natural Images Produces Localized, Oriented, Bandpass Receptive Fields", Nature 1995

Doi and Lewicki, "Sparse Coding of Natural Images Using an Overcomplete Set of Limited Capacity Units", NIPS 2004

Sparse Approximation
  • Semantic meaning is sparse
  • Visual representation is sparse (JPEG)

MNIST Digits Dataset
  • 60,000 28x28 pixel handwritten digits
    • 10,000 reserved for a validation set
  • Separate 10,000 digit test set
Sparse Approximation
  • [Figure: basis coefficients w1 produce a reconstruction r1 = Bw of the input; the reconstruction error (cross entropy) supplies the error gradient]
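A minimal sketch of that forward/backward computation; the sigmoid parameterization and variable names are assumptions for illustration, not taken from the slides:

```python
import numpy as np

def reconstruction_and_gradient(x, B, w, eps=1e-12):
    """Coefficients w reconstruct the input through the basis B; the cross-entropy
    reconstruction error supplies the gradient with respect to the coefficients."""
    r = 1.0 / (1.0 + np.exp(-B @ w))        # reconstruction squashed into (0, 1)
    loss = -np.sum(x * np.log(r + eps) + (1 - x) * np.log(1 - r + eps))
    grad_w = B.T @ (r - x)                  # gradient of the loss w.r.t. w (sigmoid + cross entropy)
    return loss, grad_w

# Example: random basis and zero coefficients against a binary input vector.
rng = np.random.default_rng(0)
x = rng.integers(0, 2, size=64).astype(float)
B, w = rng.normal(size=(64, 16)), np.zeros(16)
loss, g = reconstruction_and_gradient(x, B, w)
```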

Sparse Approximation
  • [Figure: the module maps an input to an output of KL-regularized coefficients on a KL-regularized basis]

Sparse Coding
  • [Figure: for each training example i, basis coefficients w(i) give a reconstruction r = Bw(i); the cross-entropy reconstruction error is minimized over both the coefficient matrix W and the basis B]
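A small alternating-minimization sketch of sparse coding; squared error and an L1 penalty are used here purely for illustration, whereas the talk's modules use cross-entropy reconstruction and KL regularizers:

```python
import numpy as np

def sparse_coding(X, n_basis, n_iters=50, lam=0.1, lr=0.01):
    """Alternating gradient steps on the codes W and the basis B for
    min_{B,W} 0.5 * ||X - W B^T||^2 + lam * ||W||_1 (illustrative sketch)."""
    rng = np.random.default_rng(0)
    n_samples, n_dims = X.shape
    B = rng.normal(scale=0.1, size=(n_dims, n_basis))   # basis: one column per element
    W = np.zeros((n_samples, n_basis))                   # one code per training example
    for _ in range(n_iters):
        R = W @ B.T - X
        W -= lr * (R @ B + lam * np.sign(W))             # code step (L1 subgradient)
        R = W @ B.T - X
        B -= lr * (R.T @ W)                              # basis step
        B /= np.maximum(np.linalg.norm(B, axis=0, keepdims=True), 1e-8)  # bound basis norms
    return B, W

# Example: learn 16 basis elements for 100 random 64-dimensional inputs.
X = np.random.default_rng(1).normal(size=(100, 64))
B, W = sparse_coding(X, n_basis=16)
```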

Optimization Modules
  • L1-regularized sparse approximation: reconstruction loss plus regularization term; convex (for a fixed basis)
  • L1-regularized sparse coding: the same objective minimized jointly over the coefficients and the basis; not convex

Lee et al., "Efficient Sparse Coding Algorithms", NIPS 2006
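Written out in our own notation, with a squared-error reconstruction loss for concreteness, the two problems the slide contrasts are:

```latex
\text{Sparse approximation (convex in } w \text{ for fixed } B\text{):}\quad
\min_{w}\ \tfrac{1}{2}\|x - Bw\|_2^2 + \lambda\|w\|_1
\\[4pt]
\text{Sparse coding (not jointly convex):}\quad
\min_{B,\,\{w^{(i)}\}}\ \sum_i \tfrac{1}{2}\|x^{(i)} - Bw^{(i)}\|_2^2 + \lambda\|w^{(i)}\|_1
```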

KL-Regularized Sparse Approximation
  • Objective: reconstruction loss plus an unnormalized-KL regularization term
  • Since the objective is continuous and differentiable, its gradient with respect to the coefficients vanishes at the minimum
  • Differentiating both sides of that optimality condition with respect to B and solving for the kth row gives the derivative needed for backpropagation (sketched below)
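A general sketch of implicit differentiation through such a module, with F(w, B) the module's internal objective and w*(B) its minimizer; this is our notation, not the slide's specific KL-regularized formulas:

```latex
\nabla_{w} F\big(w^{*}(B),\,B\big) = 0
\;\;\Longrightarrow\;\;
\nabla^{2}_{ww}F\,\frac{\partial w^{*}}{\partial B} + \nabla^{2}_{wB}F = 0
\;\;\Longrightarrow\;\;
\frac{\partial w^{*}}{\partial B} = -\big(\nabla^{2}_{ww}F\big)^{-1}\nabla^{2}_{wB}F
```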
Preliminary Results
  • [Results figure comparing L1 sparse coding with KL sparse coding: the KL prior improves classification performance, and backpropagation through the KL sparse coder further improves performance]

Main Points
  • Modular, gradient-based systems are an important design tool for large-scale learning systems
  • New tools are needed to include a family of modules with important properties
  • Presented a generalized backpropagation technique that
    • Allows priors that encourage, e.g., sparsity (KL prior), using mirror descent to modify weights
    • Uses implicit differentiation to compute gradients through modules (e.g. sparse approximation) that internally solve optimization problems
  • Demonstrated work in progress on building deep, sparse coders using generalized backpropagation
Acknowledgements
  • The authors would like to thank the UPI team, especially Cris Dima, David Silver, and Carl Wellington
  • DARPA and the Army Research Office supported this work through the UPI program and the NDSEG fellowship