Director3
Uploaded by
136 SLIDES

ml4

DESCRIPTION

ppt on machine learning

Presentation Transcript


  1. Neural Networks! (MLPs) CS 445/545

  2. What can I do with a NN? A short list of applications:
     (*) Binary classification, 1-in-K classification, regression
     (*) General pattern recognition / statistical learning
     (*) Character recognition, facial recognition
     (*) Computer vision: image classification, localization, scene recognition, captioning
     (*) Signal processing: noise suppression, signal analysis
     (*) Data compression
     (*) NLP: machine translation, sentiment analysis
     (*) Finance: statistical arbitrage, risk analysis
     (*) AI: Q-learning (reinforcement learning)
     (*) Medicine: diagnosis, imaging, genomics
     (*) Law: information retrieval
     (*) Computational creativity applications

  3. A bit of history
     • 1960s: Rosenblatt proved that the perceptron learning rule converges to correct weights in a finite number of steps, provided the training examples are linearly separable.
     • 1969: Minsky and Papert proved that perceptrons cannot represent non-linearly separable target functions.
     • However, they showed that adding a fully connected hidden layer makes the network more powerful.
       – I.e., multi-layer neural networks can represent non-linear decision surfaces.
     • Later it was shown that by using continuous activation functions (rather than thresholds), a fully connected network with a single hidden layer can in principle approximate any continuous function.
     • 1986: “rediscovery” of the backprop algorithm: Hinton et al.

  4. A bit of history

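  5. Linear separability. A perceptron can separate data that is linearly separable. In 2D the separating hyperplane is the line $w_1 x_1 + w_2 x_2 + w_0 = 0$, i.e. $x_2 = -\frac{w_1}{w_2} x_1 - \frac{w_0}{w_2}$. [Figure: the hyperplane drawn in the plane with axes Feature 1 and Feature 2.]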
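The 2D hyperplane equation can be tried out with a small sketch. The weights below are hypothetical values picked for illustration (they give the line x2 = -x1 + 1), and the function names are not from the slides.

```python
def perceptron_output(w0, w1, w2, x1, x2):
    """Return 1 if the point is on the positive side of the hyperplane, else 0."""
    return 1 if w1 * x1 + w2 * x2 + w0 > 0 else 0


def boundary_x2(w0, w1, w2, x1):
    """Solve w1*x1 + w2*x2 + w0 = 0 for x2 (requires w2 != 0)."""
    return -(w1 / w2) * x1 - w0 / w2


# Hypothetical weights: the decision boundary is the line x2 = -x1 + 1.
w0, w1, w2 = -1.0, 1.0, 1.0
above = perceptron_output(w0, w1, w2, 1.0, 1.0)  # point above the line
below = perceptron_output(w0, w1, w2, 0.0, 0.0)  # point below the line
```

Points on opposite sides of the line get different outputs, which is all a single perceptron can express.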

  6. Multi-layer neural network example. Decision regions of a multilayer feedforward network. (From T. M. Mitchell, Machine Learning.) The network was trained to recognize 1 of 10 vowel sounds occurring in the context “h_d” (e.g., “had”, “hid”). The network input consists of two parameters, F1 and F2, obtained from a spectral analysis of the sound. The 10 network outputs correspond to the 10 possible vowel sounds.

  7. • Good news: Adding a hidden layer allows more target functions to be represented.
     • Bad news: At the time, there was no algorithm for learning in multi-layered networks, and no convergence theorem!
     • Quote from Minsky and Papert’s book, Perceptrons (1969):
       “[The perceptron] has many features to attract attention: its linearity; its intriguing learning theorem; its clear paradigmatic simplicity as a kind of parallel computation. There is no reason to suppose that any of these virtues carry over to the many-layered version. Nevertheless, we consider it to be an important research problem to elucidate (or reject) our intuitive judgment that the extension is sterile.”

  8. Two major problems they saw were:
     • How can the learning algorithm apportion credit (or blame) to individual weights for incorrect classifications that depend on a (sometimes) large number of weights?
     • How can such a network learn useful higher-order features?
     • Good news: Successful credit-apportionment learning algorithms were developed soon afterwards (e.g., back-propagation).
     • Bad news: However, in multi-layer networks, there is no guarantee of convergence to a minimal-error weight vector.
     • But in practice, multi-layer networks often work very well.

  9. Summary
     • Perceptrons can be 100% accurate only on linearly separable problems.
     • Multi-layer networks (often called multi-layer perceptrons, or MLPs) can in principle approximate any target function.
     • However, in multi-layer networks, there is no guarantee of convergence to a minimal-error weight vector.
     • One can show, mathematically, that one hidden layer is sufficient to approximate any continuous function to arbitrary accuracy with a NN, given enough hidden units. This is known as the Universal Approximation Theorem (1989) (we say: “NNs are universal function approximators”); RNNs are Turing complete.

  10. A “two”-layer neural network
      [Figure: inputs (activations represent the feature vector for one training example), hidden layer (internal representation), output layer (activation represents classification).]
      • Input layer: contains the units (artificial neurons) that receive input from the outside world, which the network will learn from, recognize, or otherwise process.
      • Output layer: contains the units that respond with what the network has learned about the task.
      • Hidden layer: these units sit between the input and output layers. The job of the hidden layer is to transform the input into something that the output units can use.
      Most neural networks are fully connected, meaning that each hidden neuron is connected to every neuron in the previous layer (input) and in the next layer (output).

  11. Classification Pipeline

  12. Different Types of Neural Networks
      • Perceptron: a neural network whose input units (e.g., two) connect directly to one output unit, with no hidden layers. These are also known as “single-layer perceptrons”.
      • Radial Basis Function Network: these networks are similar to the feedforward neural network, except that a radial basis function is used as the activation function of the neurons.
      • Multilayer Perceptron: these networks use one or more hidden layers of neurons, unlike the single-layer perceptron. With several hidden layers they are also known as deep feedforward neural networks.
      • Recurrent Neural Network: a type of neural network in which hidden layer neurons have self-connections. Recurrent neural networks possess memory: at any instant, a hidden layer neuron receives activation from the lower layer as well as its own previous activation value.
      • Long/Short Term Memory Network (LSTM): a type of neural network in which a memory cell is incorporated inside the hidden layer neurons.
      • Convolutional Neural Network: for an overview, see the blog post “Log Analytics with Machine Learning and Deep Learning”.

  13. Example: ALVINN (Pomerleau, 1993)
      • ALVINN learns to drive an autonomous vehicle at normal speeds on public highways.
      • Input: 30 x 32 grid of pixel intensities from camera

  14. (Note: bias units and weights not shown.) Each output unit corresponds to a particular steering direction. The most highly activated one gives the direction to steer.

  15. Example: DeepMind (Deep Q-learning for Atari, 2014)

  16. Activation functions
      • Advantages of the sigmoid function: nonlinear, differentiable, has real-valued outputs, and approximates a threshold function.

  17. Sigmoid activation function: $o = \sigma(\mathbf{w} \cdot \mathbf{x})$, where $\sigma(z) = \frac{1}{1 + e^{-z}}$

  18. The derivative of the sigmoid activation function is easily expressed in terms of the function itself: $\frac{d\sigma(z)}{dz} = \sigma(z)(1 - \sigma(z))$. This is useful in deriving the back-propagation algorithm.

  19. $\sigma(z) = \frac{1}{1 + e^{-z}} = (1 + e^{-z})^{-1}$
      $\frac{d\sigma(z)}{dz} = -1 \cdot (1 + e^{-z})^{-2} \cdot \frac{d}{dz}(1 + e^{-z}) = -\frac{1}{(1 + e^{-z})^{2}} \cdot (-e^{-z}) = \frac{e^{-z}}{(1 + e^{-z})^{2}}$
      $\sigma(z)(1 - \sigma(z)) = \frac{1}{1 + e^{-z}} \left(1 - \frac{1}{1 + e^{-z}}\right) = \frac{1 + e^{-z}}{(1 + e^{-z})^{2}} - \frac{1}{(1 + e^{-z})^{2}} = \frac{e^{-z}}{(1 + e^{-z})^{2}}$
      And thus the math Gods said… $\frac{d\sigma(z)}{dz} = \sigma(z)(1 - \sigma(z))$
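The identity for the sigmoid derivative can be checked numerically. This is a minimal sketch: the helper names and the sample points are illustrative choices, and the comparison uses a central finite difference rather than anything from the slides.

```python
import math


def sigmoid(z):
    """Logistic sigmoid 1 / (1 + e^(-z))."""
    return 1.0 / (1.0 + math.exp(-z))


def sigmoid_prime(z):
    """Derivative written in terms of the function itself."""
    s = sigmoid(z)
    return s * (1.0 - s)


def numeric_derivative(f, z, eps=1e-6):
    """Central finite-difference approximation of f'(z)."""
    return (f(z + eps) - f(z - eps)) / (2.0 * eps)


# Largest disagreement between the closed form and the numeric estimate.
max_gap = max(
    abs(sigmoid_prime(z) - numeric_derivative(sigmoid, z))
    for z in (-3.0, -0.5, 0.0, 0.5, 3.0)
)
```

The gap stays at the level of floating-point noise, and the derivative peaks at 0.25 when z = 0.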

  20. Neural network notation. [Figure: two-layer network. Input activations represent the feature vector for one training example; the hidden layer is the internal representation; the output activation represents the classification.]

  21. Neural network notation. [Figure: the same two-layer network.] Sigmoid function: $\sigma(z) = \frac{1}{1 + e^{-z}}$

  22. Neural network notation
      $x_i$: activation of input node i.
      $h_j$: activation of hidden node j.
      $o_k$: activation of output node k.
      $w_{ji}$: weight from node i to node j.
      $\sigma$: sigmoid function, $\sigma(z) = \frac{1}{1 + e^{-z}}$.
      For each node j in the hidden layer: $h_j = \sigma\left(\sum_{i \in \text{input layer}} w_{ji} x_i + w_{j0}\right)$
      For each node k in the output layer: $o_k = \sigma\left(\sum_{j \in \text{hidden layer}} w_{kj} h_j + w_{k0}\right)$

  23. Classification with a two-layer neural network (“Forward propagation”)
      Assume two-layer networks (i.e., one hidden layer):
      1. Present input to the input layer.
      2. Forward propagate the activations times the weights to each node in the hidden layer.
      3. Apply the activation function (sigmoid) to the sum of weights times inputs at each hidden unit.
      4. Forward propagate the activations times the weights from the hidden layer to the output layer.
      5. Apply the activation function (sigmoid) to the sum of weights times inputs at each output unit.
      6. Interpret the output layer as a classification.

  24. Simple Example. Input: x1 = 0.4, x2 = 0.1, plus a bias input of 1. Hidden layer: activations 0.547 and 0.470. [Figure: network diagram with input-to-hidden edge weights .1, −.2, −.4, .3, .1, .2 and hidden-to-output edge weights .1, −.5, −.2, −.1.]

  25. Output Layer: activations 0.461 and 0.455. [Figure: the same network, with hidden activations 0.547 and 0.470 and inputs x1 = 0.4, x2 = 0.1, bias input 1.]
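A minimal sketch of the forward pass for this example. The diagram did not survive extraction, so the assignment of edge weights to connections below is an assumption, chosen because it reproduces the activations stated on the slides (0.547, 0.470, 0.461, 0.455); the output units are also assumed to have no bias weight. The function name `layer_forward` is illustrative.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def layer_forward(weights, biases, inputs):
    """One fully connected sigmoid layer; weights[j][i] connects input i to unit j."""
    return [
        sigmoid(sum(w * x for w, x in zip(row, inputs)) + b)
        for row, b in zip(weights, biases)
    ]


x = [0.4, 0.1]

# ASSUMED reading of the diagram (not confirmed by the transcript).
w_hidden = [[0.2, 0.1], [0.3, -0.4]]    # input-to-hidden weights
b_hidden = [0.1, -0.2]                  # weights on the bias input of 1
w_output = [[-0.2, -0.1], [0.1, -0.5]]  # hidden-to-output weights
b_output = [0.0, 0.0]                   # assumed: no bias into the output units

h = layer_forward(w_hidden, b_hidden, x)  # hidden activations
o = layer_forward(w_output, b_output, h)  # output activations
```

Under these assumptions `h` and `o` round to the values shown on the slides, and the larger output would be read as the predicted class.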

  26. “Softmax” operation. Often used to turn output values into a probability distribution: $y_{sm}(o_i) = \frac{e^{o_i}}{\sum_{k=1}^{K} e^{o_k}}$, where K is the number of output units. In the example, the outputs 0.461 and 0.455 give $y_{sm} = .501$ and $y_{sm} = .499$.
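The softmax step can be sketched as follows, applied to the two output activations from the example. Subtracting the maximum before exponentiating is a common numerical-stability habit and an addition here, not something the slides mention; it does not change the result.

```python
import math


def softmax(values):
    """Turn a list of real values into a probability distribution."""
    shift = max(values)  # avoids overflow for large inputs; result is unchanged
    exps = [math.exp(v - shift) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


y_sm = softmax([0.461, 0.455])
```

The two probabilities sum to 1 and come out at roughly .501 and .499, as on the slide.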

  27. What kinds of problems are suitable for neural networks?
      • Have sufficient training data
      • Long training times are acceptable
      • Not necessary for humans to understand the learned target function or hypothesis

  28. Advantages of neural networks
      • Designed to be parallelized (e.g., split minibatches, use GPUs)
      • Robust to noisy training data
      • Fast to evaluate new examples

  29. Training a multi-layer neural network
      Repeat for a given number of epochs or until accuracy on the training data is acceptable:
      For each training example:
      1. Present input to the input layer.
      2. Forward propagate the activations times the weights to each node in the hidden layer.
      3. Forward propagate the activations times the weights from the hidden layer to the output layer.
      4. At each output unit, determine the error.
      5. Run the back-propagation algorithm one layer at a time to update all weights in the network.

  30. Training a multilayer neural network with back-propagation (stochastic gradient descent)
      • Suppose a training example has the form (x, t) (i.e., both input and target are vectors).
      • Error (or “loss”) E is the sum-squared error over all output units: $E(\mathbf{w}) = \frac{1}{2} \sum_{k \in \text{output layer}} (t_k - o_k)^2$
      • The goal of learning is to minimize the mean sum-squared error over the training set.

  31. Training a multilayer neural network with back-propagation (stochastic gradient descent)
      • Idea: minimize the sum-of-squares error $E(\mathbf{w}) = \frac{1}{2} \sum_{k \in \text{output layer}} (t_k - o_k)^2$ over the entire training dataset.
      • Note that we “tune” the parameters of the NN (the weights) during training. The weights of the network are trained so that the error goes downhill until it reaches a local minimum, just like a ball rolling under gravity.
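The loss can be written out directly. This is a small sketch with hypothetical target and output vectors; `mean_error` is an illustrative helper for the mean over a training set, not a name from the slides.

```python
def sum_squared_error(targets, outputs):
    """E = 1/2 * sum over output units of (t_k - o_k)^2 for one example."""
    return 0.5 * sum((t - o) ** 2 for t, o in zip(targets, outputs))


def mean_error(pairs):
    """Mean of E over a list of (targets, outputs) pairs."""
    return sum(sum_squared_error(t, o) for t, o in pairs) / len(pairs)


# Hypothetical example: two output units, both 0.5 away from their targets.
e_single = sum_squared_error([1.0, 0.0], [0.5, 0.5])
e_mean = mean_error([([1.0, 0.0], [0.5, 0.5]), ([1.0, 0.0], [1.0, 0.0])])
```

A perfect prediction contributes zero, so the mean over the two hypothetical examples is half of the single-example error.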

  32. Geoffrey Hinton: NN training with MNIST

  33. Aiva: AI Composed Music (2017)

  34. Later in the slides we will derive the back-propagation equations (you can also find a derivation in the text). The derivation can be somewhat challenging; however, you only need one basic tool to derive them: multivariate differentiation (e.g., chain rule, partial derivatives). For now, let’s just walk through the basic algorithm.

  35. Backpropagation algorithm (Stochastic Gradient Descent)
      • Initialize the network weights w to small random numbers (e.g., between −0.05 and 0.05).
      • Until the termination condition is met, Do:
        • For each (x, t) in the training set, Do:
          1. Propagate the input forward:
             – Input x to the network and compute the activation $h_j$ of each hidden unit j.
             – Compute the activation $o_k$ of each output unit k.

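  36. 2. Calculate error terms
      For each output unit k, calculate error term $\delta_k$: $\delta_k = o_k (1 - o_k)(t_k - o_k)$
      For each hidden unit j, calculate error term $\delta_j$: $\delta_j = h_j (1 - h_j) \sum_{k \in \text{output units}} w_{kj} \delta_k$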

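  38. 3. Update weights
      Hidden to Output layer: for each weight $w_{kj}$: $w_{kj} \leftarrow w_{kj} + \Delta w_{kj}$, where $\Delta w_{kj} = \eta\, \delta_k h_j$
      Input to Hidden layer: for each weight $w_{ji}$: $w_{ji} \leftarrow w_{ji} + \Delta w_{ji}$, where $\Delta w_{ji} = \eta\, \delta_j x_i$
      ($\eta$ is the learning rate.)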

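  39. Backpropagation Algorithm (BP)
      • Forwards phase: compute the activation of each neuron in the hidden layers and outputs using:
        $h_j = \sigma\left(\sum_{i \in \text{input layer}} w_{ji} x_i + w_{j0}\right)$, $o_k = \sigma\left(\sum_{j \in \text{hidden layer}} w_{kj} h_j + w_{k0}\right)$
      • Backwards pass
        – Compute the error at the output using: $\delta_k = o_k (1 - o_k)(t_k - o_k)$
        – Compute the error at the hidden layer(s) using: $\delta_j = h_j (1 - h_j) \sum_{k \in \text{output units}} w_{kj} \delta_k$
        – Update the output layer weights using: $w_{kj} \leftarrow w_{kj} + \Delta w_{kj}$, where $\Delta w_{kj} = \eta\, \delta_k h_j$
        – Update the hidden layer weights using: $w_{ji} \leftarrow w_{ji} + \Delta w_{ji}$, where $\Delta w_{ji} = \eta\, \delta_j x_i$
      • (If using sequential updating) randomize the order of the input vectors so that you don’t train in exactly the same order each iteration.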
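The whole algorithm fits in a short sketch for a network with one hidden layer, sigmoid units, and sum-squared error. The output error term used is the standard one for that setup, delta_k = o_k (1 - o_k)(t_k - o_k). The toy data, the learning rate of 0.5, the epoch count, and all function names are assumptions made for illustration.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def forward(w_hid, w_out, x):
    """Each weight row is [bias, w_1, ..., w_n]; returns (hidden, output) activations."""
    xb = [1.0] + list(x)
    h = [sigmoid(sum(w * v for w, v in zip(row, xb))) for row in w_hid]
    hb = [1.0] + h
    o = [sigmoid(sum(w * v for w, v in zip(row, hb))) for row in w_out]
    return h, o


def sgd_step(w_hid, w_out, x, t, eta):
    """One back-propagation update for a single example (x, t), done in place."""
    h, o = forward(w_hid, w_out, x)
    # Output error terms: delta_k = o_k (1 - o_k) (t_k - o_k)
    d_out = [ok * (1 - ok) * (tk - ok) for ok, tk in zip(o, t)]
    # Hidden error terms: delta_j = h_j (1 - h_j) * sum_k w_kj delta_k
    d_hid = [
        hj * (1 - hj) * sum(w_out[k][j + 1] * d_out[k] for k in range(len(w_out)))
        for j, hj in enumerate(h)
    ]
    hb, xb = [1.0] + h, [1.0] + list(x)
    for k, row in enumerate(w_out):      # hidden-to-output weights
        for j in range(len(row)):
            row[j] += eta * d_out[k] * hb[j]
    for j, row in enumerate(w_hid):      # input-to-hidden weights
        for i in range(len(row)):
            row[i] += eta * d_hid[j] * xb[i]


def total_error(w_hid, w_out, data):
    """Sum-squared error summed over all examples."""
    return sum(
        0.5 * sum((tk - ok) ** 2 for tk, ok in zip(t, forward(w_hid, w_out, x)[1]))
        for x, t in data
    )


random.seed(0)
w_hid = [[random.uniform(-0.05, 0.05) for _ in range(3)] for _ in range(2)]
w_out = [[random.uniform(-0.05, 0.05) for _ in range(3)] for _ in range(1)]
data = [([1.0, 0.0], [0.9]), ([0.0, 1.0], [0.1])]  # hypothetical toy data

before = total_error(w_hid, w_out, data)
for epoch in range(2000):
    random.shuffle(data)  # sequential updating: vary the presentation order
    for x, t in data:
        sgd_step(w_hid, w_out, x, t, eta=0.5)
after = total_error(w_hid, w_out, data)
```

Both error terms are computed before any weight changes, so the hidden-layer terms use the same output weights that produced the forward pass.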

  40. Training Time
      • The aim is to balance generalization and memorization (minimizing the cost function on the training data is not necessarily a good idea).
      • Using two (or three) disjoint sets:
        – Training and testing sets
        – Training, testing, and validation sets
      • As long as the error on the held-out (testing or validation) set decreases, training continues (unless the maximum number of iterations is reached).
      • When that error begins to increase, the net is starting to memorize.
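The stopping rule can be sketched as a small function over a recorded held-out error curve. The `patience` parameter (how many non-improving epochs to tolerate) is a hypothetical addition, and the error values are made up for illustration.

```python
def early_stopping_epoch(heldout_errors, patience=1):
    """Return the epoch whose weights should be kept.

    heldout_errors[i] is the error on the held-out set after epoch i.
    Training stops once the error has failed to improve for `patience`
    consecutive epochs.
    """
    best_epoch, best_error, waited = 0, float("inf"), 0
    for epoch, err in enumerate(heldout_errors):
        if err < best_error:
            best_epoch, best_error, waited = epoch, err, 0
        else:
            waited += 1
            if waited >= patience:
                break
    return best_epoch


# Hypothetical held-out error curve: it improves, then starts to rise.
curve = [0.40, 0.31, 0.25, 0.22, 0.24, 0.27, 0.21]
stop_at = early_stopping_epoch(curve, patience=2)
```

With a patience of 2 the loop never sees the late dip at the last epoch, which shows the trade-off: a small patience stops early but can miss later improvement.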

  41. Some Pros and Cons of BP
      • Connectionism
        – Biological issues:
          – No excitatory or inhibitory for real neurons
          – No global connection in MLP
          – No backward propagation in real neurons
        – Useful in parallel hardware implementation
      • Computational efficiency
        – A learning algorithm is said to be computationally efficient when its complexity is polynomial.
        – The BP algorithm is computationally efficient.
        – In an MLP with a total of W weights, its complexity is linear in W.
      • Local minima
        – Presence of local minima is a significant issue, particularly for high-dimensional data.

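  42. Batch (or “True”) Gradient Descent: change weights only after averaging gradients from all training examples.
      Weights from hidden units to output units: $\Delta w_{kj} = \eta \frac{1}{M} \sum_{m=1}^{M} \delta_k^{(m)} h_j^{(m)}$
      Weights from input units to hidden units: $\Delta w_{ji} = \eta \frac{1}{M} \sum_{m=1}^{M} \delta_j^{(m)} x_i^{(m)}$
      (M is the number of training examples; the superscript (m) marks quantities computed for example m.)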

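  43. Mini-Batch Gradient Descent: change weights only after averaging gradients from a subset of B training examples.
      At each iteration t: get the next subset of B training examples, $B_t$, until all examples have been processed.
      Weights from hidden units to output units: $\Delta w_{kj} = \eta \frac{1}{B} \sum_{m \in B_t} \delta_k^{(m)} h_j^{(m)}$
      Weights from input units to hidden units: $\Delta w_{ji} = \eta \frac{1}{B} \sum_{m \in B_t} \delta_j^{(m)} x_i^{(m)}$
      (The superscript (m) marks quantities computed for example m.)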
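The averaging scheme can be sketched independently of the network. To keep the example short, a single linear unit with the delta rule stands in for back-propagation; that substitution, the data, the learning rate, and the function names are all assumptions for illustration.

```python
def minibatches(examples, batch_size):
    """Yield consecutive subsets of at most batch_size examples."""
    for start in range(0, len(examples), batch_size):
        yield examples[start:start + batch_size]


def averaged_update(weights, batch, step_fn, eta):
    """Average the per-example weight changes over the batch, then apply them once."""
    sums = [0.0] * len(weights)
    for example in batch:
        for i, s in enumerate(step_fn(weights, example)):
            sums[i] += s
    return [w + eta * s / len(batch) for w, s in zip(weights, sums)]


def linear_unit_step(weights, example):
    """Stand-in for back-propagation: delta-rule direction for o = w0 + w1 * x."""
    x, t = example
    o = weights[0] + weights[1] * x
    return [(t - o) * 1.0, (t - o) * x]


# Hypothetical noise-free data generated from t = 2x + 1.
examples = [(i / 8.0, 2.0 * (i / 8.0) + 1.0) for i in range(8)]
weights = [0.0, 0.0]
for epoch in range(2000):
    for batch in minibatches(examples, batch_size=4):
        weights = averaged_update(weights, batch, linear_unit_step, eta=0.5)
```

Setting `batch_size` to the number of examples gives batch gradient descent, and setting it to 1 gives the per-example (stochastic) updates.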

  44. Local Minima, Momentum, etc.
      • Recall that BP is an instance of “hill climbing” (e.g., gradient descent). With non-convex problems we are not guaranteed to settle into a global minimum.
      • If we think of the analogy of a ball rolling down a hill, we can consider giving the ball some “weight” by implementing a momentum term.
      • The purpose of the momentum term is to reduce the chance of getting “stuck” in a local minimum (i.e., a “valley”) and to avoid performance oscillations during training.

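  45. Momentum. Introduce a momentum term, in which the change in weight depends on the past weight change:
      (hidden-to-output) $\Delta w_{kj}^{t} = \eta\, \delta_k h_j + \alpha\, \Delta w_{kj}^{t-1}$
      (input-to-hidden) $\Delta w_{ji}^{t} = \eta\, \delta_j x_i + \alpha\, \Delta w_{ji}^{t-1}$
      where t is the iteration through the main loop of back-propagation. α is a parameter between 0 and 1; α determines the “strength” of the momentum term. The idea is to keep weight changes moving in the same direction.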

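  46. Update weights, with momentum
      Hidden to Output layer: for each weight $w_{kj}$: $w_{kj} \leftarrow w_{kj} + \Delta w_{kj}^{t}$, where $\Delta w_{kj}^{t} = \eta\, \delta_k h_j + \alpha\, \Delta w_{kj}^{t-1}$
      Input to Hidden layer: for each weight $w_{ji}$: $w_{ji} \leftarrow w_{ji} + \Delta w_{ji}^{t}$, where $\Delta w_{ji}^{t} = \eta\, \delta_j x_i + \alpha\, \Delta w_{ji}^{t-1}$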
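For a single weight, the momentum update can be sketched as below. The gradient step of 0.1 (standing for the learning rate times error term times activation) and α = 0.9 are hypothetical values, and `momentum_update` is an illustrative name.

```python
def momentum_update(weight, grad_step, prev_change, alpha):
    """Apply change_t = grad_step + alpha * change_{t-1}.

    grad_step is the plain gradient-descent change (learning rate times
    error term times activation). Returns (new_weight, change_t).
    """
    change = grad_step + alpha * prev_change
    return weight + change, change


# Hypothetical run: the same gradient step three iterations in a row.
w, prev = 0.0, 0.0
for _ in range(3):
    w, prev = momentum_update(w, 0.1, prev, alpha=0.9)
```

When successive steps point the same way, the changes grow (0.1, 0.19, 0.271 here), which is the "keep moving in the same direction" effect; with α = 0 the rule reduces to plain gradient descent.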

  47. Backprop Example

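  48. Training set: (1, 0) with label 0.9; (0, 1) with label −0.3. Test set: (1, 1) with label 0.8. [Figure: network with inputs x1 and x2, hidden units h1 and h2, one output unit o1, and bias units; every weight is initialized to .1.]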

  49. Training set: (1, 0) with label 0.9; (0, 1) with label −0.3. Test set: (1, 1) with label 0.8. [Figure: the same network with the first training example presented: x1 = 1, x2 = 0, target .9; every weight is .1.]
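The first step of this example can be worked through numerically: a forward pass for the input (1, 0) with every weight at 0.1, followed by the error terms for target 0.9. The sketch assumes sigmoid units and the standard error terms for sum-squared error, delta_o = o(1 - o)(t - o) and delta_h = h(1 - h) w delta_o. No learning rate is stated in the transcript, so the weight update itself is left out.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


x = [1.0, 0.0]
target = 0.9
w = 0.1  # every weight, including the bias weights, starts at 0.1

# Forward pass: both hidden units see the same weighted sum.
h = [sigmoid(w * 1.0 + w * x[0] + w * x[1]) for _ in range(2)]
o = sigmoid(w * 1.0 + w * h[0] + w * h[1])

# Error terms for sigmoid units under sum-squared error.
delta_o = o * (1.0 - o) * (target - o)
delta_h = [hj * (1.0 - hj) * w * delta_o for hj in h]
```

The hidden error terms are much smaller than the output one because each is scaled by the 0.1 connecting weight and by the sigmoid slope.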
