ALTERNATE LAYER SPARSITY & INTERMEDIATE FINE-TUNING FOR DEEP AUTOENCODERS

Presentation Transcript

  1. ALTERNATE LAYER SPARSITY & INTERMEDIATE FINE-TUNING FOR DEEP AUTOENCODERS • Submitted by: Ankit Bhutani (Y9227094) • Supervised by: Prof. Amitabha Mukerjee, Prof. K S Venkatesh

  2. AUTOENCODERS • AUTO-ASSOCIATIVE NEURAL NETWORKS • OUTPUT SIMILAR TO INPUT
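The transcript carries no figures, so here is a minimal numpy sketch (illustrative only; sizes chosen arbitrarily) of what an auto-associative network computes: the output layer has the same dimensionality as the input, which serves as its own training target.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def autoencoder_forward(x, W1, b1, W2, b2):
    """Forward pass of a one-hidden-layer autoencoder. The output z has
    the same shape as the input x, because the network is trained to
    reproduce its own input."""
    h = sigmoid(x @ W1 + b1)   # encoder
    z = sigmoid(h @ W2 + b2)   # decoder
    return h, z

# Example: 784-dimensional inputs (e.g. MNIST) through a 100-unit code.
rng = np.random.default_rng(0)
x = rng.random((5, 784))
W1, b1 = rng.normal(0, 0.01, (784, 100)), np.zeros(100)
W2, b2 = rng.normal(0, 0.01, (100, 784)), np.zeros(784)
h, z = autoencoder_forward(x, W1, b1, W2, b2)
assert z.shape == x.shape   # output matches input dimensionality
```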

  3. DIMENSIONALITY REDUCTION • BOTTLENECK CONSTRAINT • LINEAR ACTIVATION – PCA [Baldi & Hornik, 1989] • NON-LINEAR PCA [Kramer, 1991] – 5-LAYER NETWORK • ALTERNATE SIGMOID AND LINEAR ACTIVATION • EXTRACTS NON-LINEAR FACTORS
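A sketch of the Kramer-style 5-layer topology named above, with sigmoid mapping and de-mapping layers alternating with a linear bottleneck and linear output. Layer sizes here are illustrative, not from the slides, and biases are omitted for brevity.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def nlpca_forward(x, weights):
    """Kramer-style 5-layer network: input -> sigmoid mapping layer ->
    linear bottleneck -> sigmoid de-mapping layer -> linear output.
    The bottleneck forces a low-dimensional, non-linear factorization."""
    W1, W2, W3, W4 = weights
    m = sigmoid(x @ W1)   # mapping layer (non-linear)
    t = m @ W2            # bottleneck (linear): the extracted factors
    d = sigmoid(t @ W3)   # de-mapping layer (non-linear)
    return d @ W4         # linear reconstruction of x

# Illustrative sizes: 10-dim data compressed to 2 non-linear factors.
rng = np.random.default_rng(1)
sizes = [10, 20, 2, 20, 10]
weights = [rng.normal(0, 0.1, (a, b)) for a, b in zip(sizes, sizes[1:])]
x = rng.random((4, 10))
print(nlpca_forward(x, weights).shape)   # (4, 10)
```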

  4. ADVANTAGES OF NETWORKS WITH MULTIPLE LAYERS • ABILITY TO LEARN HIGHLY COMPLEX FUNCTIONS • TACKLE THE NON-LINEAR STRUCTURE OF UNDERLYING DATA • HIERARCHICAL REPRESENTATION • RESULTS FROM CIRCUIT THEORY – A SINGLE-LAYER NETWORK WOULD NEED AN EXPONENTIALLY LARGE NUMBER OF HIDDEN UNITS

  5. PROBLEMS WITH DEEP NETWORKS • DIFFICULTY IN TRAINING DEEP NETWORKS • NON-CONVEX NATURE OF OPTIMIZATION • GETS STUCK IN LOCAL MINIMA • VANISHING GRADIENTS DURING BACKPROPAGATION • SOLUTION – "INITIAL WEIGHTS MUST BE CLOSE TO A GOOD SOLUTION" [Hinton et al., 2006] • GENERATIVE PRE-TRAINING FOLLOWED BY FINE-TUNING

  6. HOW TO TRAIN DEEP NETWORKS? • PRE-TRAINING • INCREMENTAL LAYER-WISE TRAINING • EACH NEW LAYER IS TRAINED ONLY TO REPRODUCE THE HIDDEN-LAYER ACTIVATIONS OF THE PREVIOUS LAYER
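A self-contained sketch of the greedy layer-wise loop, using a tied-weight shallow autoencoder with squared error as the single-layer learner. This learner is a stand-in for whichever model (RBM or shallow autoencoder) is actually used; hyperparameters are illustrative.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def pretrain_layer(x, n_hidden, lr=0.1, epochs=100, seed=0):
    """Train one shallow autoencoder (tied weights, squared error) on x;
    return its encoder parameters and the hidden codes of x."""
    rng = np.random.default_rng(seed)
    n, n_in = x.shape
    W = rng.normal(0, 0.1, (n_in, n_hidden))
    b, c = np.zeros(n_hidden), np.zeros(n_in)
    for _ in range(epochs):
        h = sigmoid(x @ W + b)              # encode
        z = sigmoid(h @ W.T + c)            # decode (tied weights)
        dz = (z - x) * z * (1 - z)          # output delta
        dh = (dz @ W) * h * (1 - h)         # hidden delta
        W -= lr * (x.T @ dh + dz.T @ h) / n # both paths through tied W
        b -= lr * dh.mean(axis=0)
        c -= lr * dz.mean(axis=0)
    return W, b, sigmoid(x @ W + b)

def greedy_pretrain(data, hidden_sizes):
    """Each layer is trained only to reproduce the activations of the
    layer below; its codes then become the next layer's training data."""
    stack, x = [], data
    for n_hidden in hidden_sizes:
        W, b, x = pretrain_layer(x, n_hidden)
        stack.append((W, b))
    return stack

rng = np.random.default_rng(0)
stack = greedy_pretrain(rng.random((64, 784)), [256, 64, 8])
```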

  7. FINE-TUNING • INITIALIZE THE AUTOENCODER WITH THE WEIGHTS LEARNT BY PRE-TRAINING • PERFORM BACKPROPAGATION AS USUAL
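One common recipe, following Hinton & Salakhutdinov (2006), unrolls the pre-trained stack into an encoder plus a mirrored decoder before backpropagation. A sketch; initializing the decoder biases to zero is an assumption made here for brevity.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def unroll(stack):
    """Initialize the deep autoencoder from pre-trained layers: the
    encoder uses the learnt weights, the decoder their transposes."""
    encoder = [(W, b) for (W, b) in stack]
    decoder = [(W.T, np.zeros(W.shape[0])) for (W, b) in reversed(stack)]
    return encoder + decoder

def forward(x, layers):
    """Forward pass through the unrolled autoencoder; ordinary
    backpropagation on the reconstruction error then fine-tunes
    all layers jointly."""
    for W, b in layers:
        x = sigmoid(x @ W + b)
    return x
```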

  8. MODELS USED FOR PRE-TRAINING • STOCHASTIC – RESTRICTED BOLTZMANN MACHINES (RBMs) • HIDDEN-LAYER ACTIVATIONS IN (0, 1) ARE USED AS PROBABILITIES FOR SAMPLING BINARY STATES (0 OR 1) • THE MODEL LEARNS THE JOINT DISTRIBUTION OF TWO SETS OF BINARY VARIABLES – ONE IN THE INPUT LAYER, THE OTHER IN THE HIDDEN LAYER • EXACT METHODS – COMPUTATIONALLY INTRACTABLE • NUMERICAL APPROXIMATION – CONTRASTIVE DIVERGENCE
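A minimal numpy sketch of one CD-1 update for a binary RBM, the numerical approximation mentioned above; the learning rate and variable names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def cd1_update(v0, W, b, c, lr=0.1):
    """One contrastive-divergence (CD-1) step for a binary RBM.
    v0: batch of visible vectors (n, n_vis); W: (n_vis, n_hid);
    b: visible bias (n_vis,); c: hidden bias (n_hid,)."""
    # Positive phase: hidden probabilities and samples given the data.
    ph0 = sigmoid(v0 @ W + c)
    h0 = (rng.random(ph0.shape) < ph0).astype(float)
    # Negative phase: one step of Gibbs sampling back to the visibles.
    pv1 = sigmoid(h0 @ W.T + b)
    ph1 = sigmoid(pv1 @ W + c)
    # Approximate log-likelihood gradient: data term minus model term.
    n = v0.shape[0]
    W += lr * (v0.T @ ph0 - pv1.T @ ph1) / n
    b += lr * (v0 - pv1).mean(axis=0)
    c += lr * (ph0 - ph1).mean(axis=0)
    return W, b, c
```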

  9. MODELS USED FOR PRE-TRAINING • DETERMINISTIC – SHALLOW AUTOENCODERS • HIDDEN-LAYER ACTIVATIONS IN (0, 1) ARE USED DIRECTLY AS INPUT TO THE NEXT LAYER • TRAINED BY BACKPROPAGATION • VARIANTS: DENOISING AUTOENCODERS • CONTRACTIVE AUTOENCODERS • SPARSE AUTOENCODERS
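For the denoising variant, the standard corruption step (masking noise, as in Vincent et al.) can be sketched as follows; the corruption fraction p is illustrative.

```python
import numpy as np

def corrupt(x, p=0.3, rng=None):
    """Masking noise for a denoising autoencoder: a random fraction p of
    input components is zeroed. Training then minimizes the error between
    the reconstruction of the corrupted input and the CLEAN x."""
    rng = rng or np.random.default_rng(0)
    mask = rng.random(x.shape) >= p
    return x * mask
```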

  10. CLASSIFIERS & AUTOENCODERS

  11. DATASETS • MNIST • Big and Small Digits

  12. DATASETS • Square & Room • 2d Robot Arm • 3d Robot Arm

  13. LIBRARIES USED • Numpy, Scipy • Theano – takes care of GPU parallelization • GPU SPECIFICATIONS – Tesla C1060 • Memory – 256 MB • Frequency – 33 MHz • Number of Cores – 240

  14. MEASURE FOR PERFORMANCE • REVERSE CROSS-ENTROPY • X – Original input • Z – Output • Θ – Parameters (weights and biases)
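The formula itself is not in the transcript; the standard cross-entropy reconstruction error for inputs in [0, 1], which the definitions above presumably refer to, reads:

```latex
% Cross-entropy between input x and reconstruction z = f_\theta(x)
E(\theta) = -\sum_{i} \left[ x_i \log z_i + (1 - x_i) \log (1 - z_i) \right]
```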

  15. BRIDGING THE GAP • RESULTS FROM PRELIMINARY EXPERIMENTS

  16. PRELIMINARY EXPERIMENTS • TIME TAKEN FOR TRAINING • CONTRACTIVE AUTOENCODERS TAKE VERY LONG TO TRAIN

  17. SPARSITY FOR DIMENSIONALITY REDUCTION • EXPERIMENT USING SPARSE REPRESENTATIONS • STRATEGY A – BOTTLENECK • STRATEGY B – SPARSITY + BOTTLENECK • STRATEGY C – NO CONSTRAINT + BOTTLENECK
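One common way to impose the sparsity constraint in strategy B is a KL-divergence penalty pushing the mean hidden activation towards a small target ρ; a sketch, where ρ and β are illustrative hyperparameters rather than values from the slides.

```python
import numpy as np

def kl_sparsity_penalty(hidden, rho=0.05, beta=3.0):
    """KL-divergence sparsity penalty on mean hidden activations.
    hidden: (n_samples, n_hidden) sigmoid activations;
    rho: target mean activation; beta: penalty weight."""
    rho_hat = hidden.mean(axis=0).clip(1e-8, 1 - 1e-8)
    kl = (rho * np.log(rho / rho_hat)
          + (1 - rho) * np.log((1 - rho) / (1 - rho_hat)))
    return beta * kl.sum()   # added to the reconstruction error
```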

  18. ALTERNATE SPARSITY
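The slide body is not in the transcript. Assuming "alternate layer sparsity" means imposing the sparsity penalty on every other layer of the stack while the remaining layers are constrained only by the bottleneck, the schedule would look like this (purely a hypothetical illustration):

```python
def alternate_layer_sparsity_schedule(n_layers):
    """Hypothetical: impose the sparsity penalty on every other layer of
    the stack; the remaining layers keep only the bottleneck constraint."""
    return [k % 2 == 0 for k in range(n_layers)]

# Example: a 4-layer stack gets the sparsity penalty on layers 1 and 3.
print(alternate_layer_sparsity_schedule(4))   # [True, False, True, False]
```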

  19. OTHER IMPROVEMENTS • MOMENTUM • INCORPORATING THE PREVIOUS UPDATE • CANCELS OUT COMPONENTS IN OPPOSITE DIRECTIONS – PREVENTS OSCILLATION • ADDS UP COMPONENTS IN SAME DIRECTION – SPEEDS UP TRAINING • WEIGHT DECAY • REGULARIZATION • PREVENTS OVER-FITTING
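A sketch of one SGD update combining both tricks: the momentum term accumulates the previous update, and the weight-decay term is the gradient of an L2 regularizer. Coefficients are illustrative.

```python
def sgd_momentum_step(w, grad, velocity, lr=0.01, momentum=0.9,
                      weight_decay=1e-4):
    """One SGD step with momentum and L2 weight decay. Momentum carries
    over the previous update (cancelling oscillation, accelerating
    consistent directions); weight decay shrinks weights toward zero."""
    velocity = momentum * velocity - lr * (grad + weight_decay * w)
    return w + velocity, velocity
```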

  20. COMBINING ALL • USING ALTERNATE LAYER SPARSITY WITH MOMENTUM & WEIGHT DECAY YIELDS BEST RESULTS

  21. INTERMEDIATE FINE-TUNING • MOTIVATION

  22. PROCESS

  23. PROCESS
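The process slides carry no text in the transcript. One plausible reading of "intermediate fine-tuning", sketched here purely as an assumption, is to fine-tune the partially unrolled autoencoder each time a newly pre-trained layer is added, rather than fine-tuning only once at the end.

```python
def train_with_intermediate_finetuning(data, hidden_sizes, pretrain, finetune):
    """Hypothetical outline. pretrain(x, n_hidden) -> (params, codes) is
    the layer-wise stage; finetune(stack, data) runs backpropagation on
    the autoencoder unrolled from the current partial stack."""
    stack, x = [], data
    for n_hidden in hidden_sizes:
        params, x = pretrain(x, n_hidden)   # greedy layer-wise step
        stack.append(params)
        finetune(stack, data)               # fine-tune now, not only at the end
    return stack
```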

  24. RESULTS

  25. RESULTS

  26. RESULTS

  27. CONCLUDING REMARKS

  28. NEURAL NETWORK BASICS

  29. BACKPROPAGATION

  30. RBM

  31. RBM

  32. AUTOENCODERS