
Pattern Recognition and Machine Learning: Deep Alternative Architectures




Presentation Transcript


  1. Pattern Recognition and Machine Learning: Deep Alternative Architectures. Dipartimento di Ingegneria «Enzo Ferrari», Università di Modena e Reggio Emilia

  2. UNSUPERVISED LEARNING

  3. Motivation
  • Most impressive results in deep learning have been obtained with purely supervised learning methods (see previous talk); in vision, typically classification (e.g. object recognition).
  • Though progress has been slower, it is likely that unsupervised learning will be important to future advances in DL.
  Image: Krizhevsky (2012), AlexNet, the "hammer" of DL

  4. Why Unsupervised Learning?
  Reason 1: We can exploit unlabelled data, which is much more readily available and often free.

  5. Why Unsupervised Learning?
  Reason 2: We can capture enough information about the observed variables to ask new questions about them; questions that were not anticipated at training time.
  Image: features from layers 1 to 5 of a convolutional net (Zeiler and Fergus, 2013)

  6. Why Unsupervised Learning?
  Reason 3: Unsupervised learning has been shown to be a good regularizer for supervised learning; it helps generalization.
  This advantage shows up in practical applications:
  • transfer learning, domain adaptation
  • unbalanced classes
  • zero-shot, one-shot learning
  Image: ISOMAP embedding of functions represented by 50 networks with and without pretraining (Erhan et al., 2010)

  7. Why Unsupervised Learning?
  Reason 4: There is evidence that unsupervised learning can be achieved mainly through a level-local training signal; compare this to supervised learning, where the only signal driving parameter updates is available at the output and gets backpropagated.
  Image: credit propagation in supervised learning vs. local learning

  8. Why Unsupervised Learning?
  Reason 5: A recent trend in machine learning is to consider problems where the output is high-dimensional and has a complex, possibly multi-modal joint distribution. Unsupervised learning can be used in these "structured output" problems.
  Image: attribute prediction (animal, pet, furry, striped, ...) and segmentation

  9. Learning Representations
  "Concepts" or "abstractions" that help us make sense of the variability in data.
  • Often hand-designed to have desirable properties: e.g. sensitive to variables we want to predict, less sensitive to other factors explaining variability.
  • DL has leveraged the ability to learn representations; these can be task-specific or task-agnostic.

  10. Supervised Learning of Representations
  Learn a representation with the objective of selecting one that is best suited for predicting targets given the input: input -> f() -> prediction -> error against target.
  Image: (a) input image, (b) layer 5 strongest feature map, (c) layer 5 strongest feature map projections; features from a convolutional net (Zeiler and Fergus, 2013)

  11. Unsupervised Learning of Representations
  Same pipeline, but without a target: input -> f() -> prediction -> error against what?

  12. Unsupervised Learning of Representations
  What is the objective for mapping input -> code -> reconstruction?
  • reconstruction error?
  • maximum likelihood?
  • disentangling factors of variation? (e.g. learning separate identity-manifold and pose-manifold coordinates from face images, with fixed identity or fixed pose along each manifold)
  Image: Lee et al. 2014

  13. Principal Components Analysis
  • PCA works well when the data lies near a linear manifold in high-dimensional space.
  • Project the data onto the subspace spanned by the principal components; the direction of the first principal component is the direction of greatest variance.
  • In directions orthogonal to this subspace the data has low variance.
  Credit: Geoff Hinton
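A minimal NumPy sketch of the projection described above, computing PCA from the singular value decomposition of the centred data matrix; the helper name `pca`, `n_components`, and the toy data are illustrative choices, not part of the original slides:

```python
import numpy as np

def pca(X, n_components):
    """Project X (n_samples x n_features) onto its top principal components."""
    # Centre the data: PCA is defined on zero-mean data.
    mean = X.mean(axis=0)
    Xc = X - mean
    # SVD of the centred data; rows of Vt are the principal directions,
    # ordered by decreasing variance explained.
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    components = Vt[:n_components]            # (n_components, n_features)
    codes = Xc @ components.T                 # low-dimensional coordinates
    reconstruction = codes @ components + mean
    return codes, reconstruction, components

# Toy usage: 500 points near a 2-D linear manifold embedded in 10-D space.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 2)) @ rng.normal(size=(2, 10)) + 0.01 * rng.normal(size=(500, 10))
codes, X_hat, comps = pca(X, n_components=2)
print("mean squared reconstruction error:", np.mean((X - X_hat) ** 2))
```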

  14. An Inefficient Way to Fit PCA
  Train a neural network with a "bottleneck" hidden layer: input -> code (bottleneck) -> output (reconstruction).
  If the hidden and output layers are linear, and we minimize squared reconstruction error (i.e. try to make the output the same as the input):
  • The M hidden units will span the same space as the first M principal components.
  • But their weight vectors will not be orthogonal.
  • And they will have approximately equal variance.
  Credit: Geoff Hinton
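A minimal PyTorch sketch of this linear bottleneck network; the layer sizes, optimizer, and toy data are illustrative choices:

```python
import torch
import torch.nn as nn

# Linear autoencoder: with linear encoder/decoder and squared error,
# the M hidden units learn to span the top-M principal subspace.
n_features, M = 10, 2
model = nn.Sequential(
    nn.Linear(n_features, M),      # encoder: data -> M-dimensional code
    nn.Linear(M, n_features),      # decoder: code -> reconstruction
)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-2)
loss_fn = nn.MSELoss()

X = torch.randn(500, 2) @ torch.randn(2, n_features)  # data near a 2-D subspace
for step in range(2000):
    optimizer.zero_grad()
    x_hat = model(X)
    loss = loss_fn(x_hat, X)       # squared reconstruction error against the input itself
    loss.backward()
    optimizer.step()
print("final reconstruction loss:", loss.item())
```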

  15. Why Fit PCA Inefficiently?
  input -> encoder h(x) -> code -> decoder x̂(h(x)) -> reconstruction error
  • With nonlinear layers before and after the code, it should be possible to represent data that lies on or near a nonlinear manifold: the encoder maps from data space to coordinates on the manifold, and the decoder does the inverse transformation.
  • The encoder/decoder can be rich, multi-layer functions.

  16. Auto-encoder
  input -> encoder h(x) -> code -> decoder x̂(h(x)) -> reconstruction error
  • Feed-forward architecture.
  • Trained to minimize reconstruction error; a bottleneck or regularization is essential.
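A minimal PyTorch sketch of such an auto-encoder with nonlinear encoder and decoder around a bottleneck code; the layer sizes, activations, and optimizer settings are illustrative:

```python
import torch
import torch.nn as nn

n_features, code_dim = 784, 32   # e.g. flattened 28x28 images; sizes are illustrative

encoder = nn.Sequential(
    nn.Linear(n_features, 256), nn.ReLU(),
    nn.Linear(256, code_dim),                  # bottleneck code h(x)
)
decoder = nn.Sequential(
    nn.Linear(code_dim, 256), nn.ReLU(),
    nn.Linear(256, n_features), nn.Sigmoid(),  # reconstruction x_hat(h(x)) in [0, 1]
)
optimizer = torch.optim.Adam(list(encoder.parameters()) + list(decoder.parameters()), lr=1e-3)

def train_step(x):
    optimizer.zero_grad()
    x_hat = decoder(encoder(x))
    loss = ((x_hat - x) ** 2).mean()   # reconstruction error
    loss.backward()
    optimizer.step()
    return loss.item()

x = torch.rand(64, n_features)         # stand-in batch; in practice, real images
print(train_step(x))
```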

  17. Regularized Auto-encoders
  input -> encoder h(x) -> code -> decoder x̂(h(x)) -> reconstruction error
  • Permit the code to be higher-dimensional than the input.
  • Capture the structure of the training distribution through the predictive opposition between the reconstruction distribution and the regularizer.
  • The regularizer tries to make the encoder/decoder as simple as possible.

  18. Simple?
  • Reconstruct the input from the code and make the code compact (PCA, auto-encoder with bottleneck).
  • Reconstruct the input from the code and make the code sparse (sparse auto-encoders).
  • Add noise to the input or code and reconstruct the cleaned-up version (denoising auto-encoders).
  • Reconstruct the input from the code and make the code insensitive to the input (contractive auto-encoders).

  19. Sparse Auto-encoders
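The slide above is image-only in the original deck. As a hedged sketch of the idea named on the previous slide (an over-complete code kept sparse), here is a PyTorch auto-encoder with an L1 penalty on the code activations; the penalty weight `sparsity_weight` and the layer sizes are illustrative choices:

```python
import torch
import torch.nn as nn

n_features, code_dim = 256, 512      # over-complete code: larger than the input
sparsity_weight = 1e-3               # illustrative strength of the sparsity penalty

encoder = nn.Sequential(nn.Linear(n_features, code_dim), nn.ReLU())
decoder = nn.Linear(code_dim, n_features)
optimizer = torch.optim.Adam(list(encoder.parameters()) + list(decoder.parameters()), lr=1e-3)

def train_step(x):
    optimizer.zero_grad()
    h = encoder(x)                               # code
    x_hat = decoder(h)
    recon = ((x_hat - x) ** 2).mean()            # reconstruction error
    sparsity = h.abs().mean()                    # L1 penalty drives most code units to zero
    loss = recon + sparsity_weight * sparsity
    loss.backward()
    optimizer.step()
    return loss.item()

print(train_step(torch.randn(64, n_features)))
```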

  20. Deconvolutional Networks
  • Deep convolutional sparse coding (layers 1 to 4).
  • Trained to reconstruct the input from any layer.
  • Fast approximate inference.
  • Recently used to visualize features learned by convolutional nets (Zeiler and Fergus, 2013).
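The deconvolutional networks referred to above are built from convolutional sparse coding with unpooling; the PyTorch snippet below is only a loose illustration of the reconstruction direction (a transposed convolution mapping feature maps back towards pixel space), not the method of Zeiler and Fergus, and all sizes are illustrative:

```python
import torch
import torch.nn as nn

# A conv layer maps images to feature maps; a transposed convolution maps
# feature maps back towards pixel space. (Zeiler & Fergus additionally use
# switch-based unpooling and the transposes of the learned conv filters.)
conv = nn.Conv2d(3, 16, kernel_size=5, stride=2, padding=2)
deconv = nn.ConvTranspose2d(16, 3, kernel_size=5, stride=2, padding=2, output_padding=1)

x = torch.randn(1, 3, 64, 64)          # stand-in image
features = torch.relu(conv(x))         # (1, 16, 32, 32) feature maps
back_projection = deconv(features)     # (1, 3, 64, 64) projection back to pixel space
print(back_projection.shape)
```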

  21. Denoising Auto-encoders (Vincent et al. 2008)
  input -> noise -> noisy input x̃(x) -> encoder h(x̃) -> code -> decoder x̂(h(x̃)) -> reconstruction error against the clean input
  • The code can be viewed as a lossy compression of the input.
  • Learning drives it to be a good compressor for training examples (and hopefully others as well) but not for arbitrary inputs.
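A minimal PyTorch sketch of the denoising setup, assuming masking noise as the corruption; the corruption level and layer sizes are illustrative choices:

```python
import torch
import torch.nn as nn

n_features, code_dim = 784, 128
encoder = nn.Sequential(nn.Linear(n_features, code_dim), nn.ReLU())
decoder = nn.Sequential(nn.Linear(code_dim, n_features), nn.Sigmoid())
optimizer = torch.optim.Adam(list(encoder.parameters()) + list(decoder.parameters()), lr=1e-3)

def train_step(x, corruption=0.3):
    optimizer.zero_grad()
    # Corrupt the input (masking noise: randomly zero out a fraction of the entries).
    mask = (torch.rand_like(x) > corruption).float()
    x_tilde = x * mask
    x_hat = decoder(encoder(x_tilde))
    # The reconstruction error is measured against the *clean* input.
    loss = ((x_hat - x) ** 2).mean()
    loss.backward()
    optimizer.step()
    return loss.item()

print(train_step(torch.rand(64, n_features)))
```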

  22. Contractive Auto-encoders (Rifai et al. 2011)
  input -> encoder h(x) -> code -> decoder x̂(h(x)) -> reconstruction error
  • Learn good models of high-dimensional data (Bengio et al. 2013).
  • Can obtain good representations for classification.
  • Can produce good quality samples by a random walk near the manifold of high density (Rifai et al. 2012).
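A hedged sketch of the contractive penalty for the special case of a single sigmoid encoder layer, where the squared Frobenius norm of the encoder's Jacobian has a closed form; the penalty weight `lam` and the layer sizes are illustrative choices, not values from Rifai et al.:

```python
import torch
import torch.nn as nn

n_features, code_dim = 784, 128
lam = 1e-4                                   # illustrative weight on the contractive penalty
W = nn.Linear(n_features, code_dim)          # sigmoid encoder weights
decoder = nn.Linear(code_dim, n_features)
optimizer = torch.optim.Adam(list(W.parameters()) + list(decoder.parameters()), lr=1e-3)

def train_step(x):
    optimizer.zero_grad()
    h = torch.sigmoid(W(x))                  # code, shape (batch, code_dim)
    x_hat = decoder(h)
    recon = ((x_hat - x) ** 2).mean()
    # For h_j = sigmoid(w_j . x + b_j): dh_j/dx_i = h_j (1 - h_j) w_ji, so
    # ||J||_F^2 = sum_j [h_j (1 - h_j)]^2 * sum_i w_ji^2.
    dh = (h * (1 - h)) ** 2                  # (batch, code_dim)
    w_sq = (W.weight ** 2).sum(dim=1)        # (code_dim,)
    contractive = (dh * w_sq).sum(dim=1).mean()
    loss = recon + lam * contractive         # penalize sensitivity of the code to the input
    loss.backward()
    optimizer.step()
    return loss.item()

print(train_step(torch.rand(64, n_features)))
```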

  23. Resources
  Online courses:
  • Andrew Ng's Machine Learning (Coursera)
  • Geoff Hinton's Neural Networks (Coursera)
  Websites:
  • deeplearning.net
  • http://deeplearning.stanford.edu/wiki/index.php/UFLDL_Tutorial

  24. Surveys and Reviews
  • Y. Bengio, A. Courville, and P. Vincent. Representation learning: A review and new perspectives. IEEE Transactions on Pattern Analysis and Machine Intelligence, 35(8):1798-1828, Aug 2013.
  • Y. Bengio. Deep learning of representations: Looking forward. In Statistical Language and Speech Processing, pages 1-37. Springer, 2013.
  • Y. Bengio, I. Goodfellow, and A. Courville. Deep Learning. 2014. Draft available at http://www.iro.umontreal.ca/~bengioy/dlbook/
  • J. Schmidhuber. Deep learning in neural networks: An overview. arXiv preprint arXiv:1404.7828, 2014.
  • Y. Bengio. Learning deep architectures for AI. Foundations and Trends in Machine Learning, 2(1):1-127, 2009.

  25. Sequence Modelling

  26. Sequence Modelling
  • When applying machine learning to sequences, we often want to turn an input sequence into an output sequence that lives in a different domain, e.g. turn a sequence of sound pressures into a sequence of word identities.
  • When there is no separate target sequence, we can get a teaching signal by trying to predict the next term in the input sequence. The target output sequence is the input sequence with an advance of 1 step (as in the sketch below).
  • This seems much more natural than trying to predict one pixel in an image from the other pixels, or one patch of an image from the rest of the image. For temporal sequences there is a natural order for the predictions.
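A minimal NumPy sketch of the target construction described above; the function name `next_step_pairs` and the toy signal are illustrative:

```python
import numpy as np

def next_step_pairs(sequence):
    """Turn one sequence into (input, target) pairs for next-step prediction.

    The target sequence is simply the input sequence advanced by one step.
    """
    x = np.asarray(sequence)
    inputs = x[:-1]       # x_1, ..., x_{T-1}
    targets = x[1:]       # x_2, ..., x_T  (what we try to predict)
    return inputs, targets

seq = np.sin(np.linspace(0, 4 * np.pi, 100))   # toy signal standing in for real data
inputs, targets = next_step_pairs(seq)
print(inputs.shape, targets.shape)             # (99,) (99,)
```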

  27. Memoryless Models for Sequences
  • Autoregressive models: predict the next term from a fixed window of previous terms.
  • Feed-forward networks: generalize autoregressive models by adding hidden layers between the window and the prediction.
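A minimal sketch of a memoryless autoregressive model, under the assumption of a linear predictor fit by least squares in NumPy; the window length and toy signal are illustrative:

```python
import numpy as np

def fit_autoregressive(x, order):
    """Fit a linear autoregressive model x_t ~ w . [x_{t-1}, ..., x_{t-order}] + b."""
    # Build a design matrix of fixed-length windows of previous values (most recent first).
    rows = [x[t - order:t][::-1] for t in range(order, len(x))]
    X = np.stack(rows)                           # (T - order, order)
    y = x[order:]                                # next values to predict
    X = np.column_stack([X, np.ones(len(X))])    # append a bias column
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef[:-1], coef[-1]

rng = np.random.default_rng(0)
x = np.sin(np.linspace(0, 8 * np.pi, 400)) + 0.05 * rng.normal(size=400)
w, b = fit_autoregressive(x, order=5)
pred = x[-6:-1][::-1] @ w + b                    # one-step prediction from the last window
print("predicted:", pred, "true value:", x[-1])
```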

  28. Memory and Hidden State
  • If we give our generative model some hidden state, and if we give this hidden state its own internal dynamics, we get a much more interesting kind of model.
  – It can store information in its hidden state for a long time.
  – If the dynamics is noisy and the way it generates outputs from its hidden state is noisy, we can never know its exact hidden state. The best we can do is to infer a probability distribution over the space of hidden state vectors.

  29. RNN
  RNNs are very powerful, because they combine two properties:
  • Distributed hidden state that allows them to store a lot of information about the past efficiently.
  • Non-linear dynamics that allows them to update their hidden state in complicated ways.
  With enough neurons and time, RNNs can compute anything that can be computed by your computer.

  30. RNN Structure and Weight Sharing
  • Jordan network: the recurrent connection feeds the previous output back into the hidden layer.
  • Elman network: the recurrent connection feeds the previous hidden state back into the hidden layer.
  • In both, the same weights are applied at every time step (weight sharing across time), as in the sketch below.
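A minimal NumPy sketch of an Elman-style forward pass, assuming tanh hidden units; the point is that the same three weight matrices are reused at every time step, and all sizes and data here are illustrative:

```python
import numpy as np

def elman_forward(x_seq, W_xh, W_hh, W_hy, b_h, b_y):
    """Forward pass of an Elman RNN: the same weights are reused at every step."""
    h = np.zeros(W_hh.shape[0])            # initial hidden state
    outputs = []
    for x_t in x_seq:
        # Hidden state update: the previous hidden state feeds back into the hidden layer.
        h = np.tanh(W_xh @ x_t + W_hh @ h + b_h)
        outputs.append(W_hy @ h + b_y)     # output at this time step
    return np.stack(outputs), h

rng = np.random.default_rng(0)
n_in, n_hidden, n_out, T = 3, 8, 2, 5
params = (rng.normal(scale=0.1, size=(n_hidden, n_in)),
          rng.normal(scale=0.1, size=(n_hidden, n_hidden)),
          rng.normal(scale=0.1, size=(n_out, n_hidden)),
          np.zeros(n_hidden), np.zeros(n_out))
x_seq = rng.normal(size=(T, n_in))
y_seq, h_T = elman_forward(x_seq, *params)
print(y_seq.shape)   # (5, 2): one output per time step
```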

  31. Backpropagating through time
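The slide above is title-only in the original deck. As a hedged sketch of the idea, the Elman-style recurrence below is unrolled over time with PyTorch autograd, so a single backward call propagates gradients through every time step; the sizes, data, and next-step loss are illustrative:

```python
import torch
import torch.nn as nn

n_in, n_hidden, T = 3, 8, 20
W_xh = nn.Linear(n_in, n_hidden)
W_hh = nn.Linear(n_hidden, n_hidden)
W_hy = nn.Linear(n_hidden, n_in)
params = list(W_xh.parameters()) + list(W_hh.parameters()) + list(W_hy.parameters())
optimizer = torch.optim.Adam(params, lr=1e-3)

x_seq = torch.randn(T, n_in)       # stand-in sequence; the target is the next input
optimizer.zero_grad()
h = torch.zeros(n_hidden)
loss = 0.0
for t in range(T - 1):
    # Unroll the recurrence: the same weights are applied at every step.
    h = torch.tanh(W_xh(x_seq[t]) + W_hh(h))
    y = W_hy(h)
    loss = loss + ((y - x_seq[t + 1]) ** 2).mean()   # next-step prediction error
# One backward pass sends gradients back through the whole unrolled graph:
# this is backpropagation through time.
loss.backward()
optimizer.step()
print(loss.item())
```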
