


  1. Feedback from last week • Too many slides / too much information in too little time: AI is a complex topic with many different subjects, and the overall lecture time (1.5 h + practice) is not enough • Use the summary and the learning results: each slide is there so that you can understand the whole picture, but what you need to learn is defined by the learning results • Trust us: this is the first lecture of its kind, and we will align the learning results with the exam questions • Live coding / practice coding: the code could not be read on the beamer, so we will upload the practice code and live code before the lecture and use Jupyter notebooks for better explanation

  2. Objectives for Lecture 3: Regression (depth of understanding) • After the lecture you will be able to…

  3. Chapter: Motivation • Chapter: Linear Models • Chapter: Loss functions • Chapter: Regularization & Validation • Chapter: Practical considerations • Chapter: Summary

  4. Motivation – Regression Example [Figure: data points with the input variable (size in m) on one axis and the output variable / labels (weight in kg) on the other, together with the regression result fitted through the points]

  5. Motivation – Algorithms in Machine Learning

  6. Motivation – Algorithms in Machine Learning • House pricing • Sales • Person's weight • Object detection • Spam detection • Cancer detection • Genome patterns • Google News • Point cloud (lidar) processing

  7. Motivation – Regression in Automotive Technology: Sensor calibration • Usually, electrical quantities are measured • It is necessary to convert them to physical quantities • Examples: accelerometers, gyroscopes, displacement sensors

  8. Motivation – Regression in Automotive Technology: Parameter estimation • Vehicle parameters are often only roughly known • They can be estimated via regression techniques

  9. Motivation – Regression in Automotive Technology: Vehicle pricing • Regression is widely used for financial relations • It allows data to be compressed into a simple model and derivatives to be evaluated

  10. Motivation – Why should you use Regression? [Diagram: model structure + training data → predictive model → predictions about the output variables for previously unseen sets of input variables] • Based on the combination of data and model structure, it is possible to predict the outcome of a process or system • The training dataset is usually only a representation at sparse points and contains lots of noise • Regression allows this information to be used in simulation, optimization, etc.

  11. Relation of statistics and machine learning • How can we extract information from data and use it to reason and predict in previously unseen cases? (learning) • Nearly all classic machine learning methods can be reinterpreted in terms of statistics • The focus in machine learning is mainly on prediction, while statistics often focusses on relation analysis • Lots of advanced regression techniques build upon a statistical interpretation of regression

  12. Chapter: Motivation • Chapter: Linear Models • Chapter: Loss functions • Chapter: Regularization & Validation • Chapter: Practical considerations • Chapter: Summary

  13. Linear Basis Function Model [Annotated equation: the output variables are a weighted sum of basis functions of the input variables, with weight parameters and a bias term]
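
The equation itself is not reproduced in the transcript; a minimal sketch of the standard linear basis function model (in the notation of Bishop, whose figures the lecture references) would read:

```latex
y(\mathbf{x}, \mathbf{w}) \;=\; w_0 + \sum_{j=1}^{M-1} w_j\,\phi_j(\mathbf{x})
\;=\; \mathbf{w}^\mathsf{T}\boldsymbol{\phi}(\mathbf{x}),
```

where x are the input variables, y is the output variable, w = (w_0, …, w_{M-1}) are the weight parameters, φ_1, …, φ_{M-1} are the basis functions, and the dummy basis function φ_0(x) = 1 absorbs the bias term w_0.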

  14. Representing the dataset as a matrix [Equation: output vector, weight vector, design matrix]
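
The matrices themselves are not shown in the transcript; under the usual conventions (N data points, M basis functions) they would be:

```latex
\mathbf{t} = \begin{pmatrix} t_1 \\ \vdots \\ t_N \end{pmatrix},\qquad
\mathbf{w} = \begin{pmatrix} w_0 \\ \vdots \\ w_{M-1} \end{pmatrix},\qquad
\boldsymbol{\Phi} = \begin{pmatrix}
\phi_0(\mathbf{x}_1) & \cdots & \phi_{M-1}(\mathbf{x}_1) \\
\vdots & \ddots & \vdots \\
\phi_0(\mathbf{x}_N) & \cdots & \phi_{M-1}(\mathbf{x}_N)
\end{pmatrix},
\qquad \mathbf{y} = \boldsymbol{\Phi}\,\mathbf{w},
```

with t the output vector, w the weight vector and Φ the N×M design matrix.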

  15. Nonlinear regression

  16. Workflow – How do we obtain model parameters?

  17. Basis functions – examples • Linear function • Polynomial function • Sinusoidal function • Gaussian basis function
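
A minimal sketch of how such basis functions could be implemented (the function names and parameter values are illustrative, not taken from the lecture code):

```python
import numpy as np

def polynomial_basis(x, degree):
    """Columns 1, x, x^2, ..., x^degree; the column of ones absorbs the bias term."""
    return np.vander(x, degree + 1, increasing=True)

def gaussian_basis(x, means, width):
    """One column per Gaussian bump exp(-(x - mu)^2 / (2 * width^2))."""
    return np.exp(-(x[:, None] - means[None, :]) ** 2 / (2.0 * width ** 2))

def sinusoidal_basis(x, frequencies):
    """One column per frequency sin(2*pi*f*x)."""
    return np.sin(2.0 * np.pi * x[:, None] * frequencies[None, :])

# Example design matrices on 50 evenly spaced inputs
x = np.linspace(0.0, 1.0, 50)
Phi_poly = polynomial_basis(x, degree=3)                               # shape (50, 4)
Phi_gauss = gaussian_basis(x, means=np.linspace(0, 1, 9), width=0.3)   # shape (50, 9)
Phi_sin = sinusoidal_basis(x, frequencies=np.array([1.0, 2.0, 3.0]))   # shape (50, 3)
```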

  18. Basis functions – Polynomials • Globally defined on the independent-variable domain • The design matrix becomes ill-conditioned for large values of the input variables when standard polynomials are used • Hyperparameter: polynomial degree

  19. Basis functions – Gaussians • Locally defined on the independent-variable domain • Sparse design matrix • Infinitely differentiable • Hyperparameters: number of Gaussian functions, width of each basis function, mean of each basis function

  20. Basis functions – comparison of local and global [Figure panels: global basis function vs. local basis function, spread parameter 0.3]

  21. Basis functions – other

  22. Chapter: Motivation • Chapter: Linear Models • Chapter: Loss functions • Chapter: Regularization & Validation • Chapter: Practical considerations • Chapter: Summary

  23. Loss functions • The loss function measures the accuracy of the model based on the training dataset • The best model we can obtain is the minimum-loss model • The choice of a loss function is fundamental in the regression problem • Minimize the loss function for the training dataset, consisting of independent variables and target variables, by variation of the basis function weights

  24. Loss functions – Mean Squared Error (MSE or L2) • Pros: very important in practical applications; the solution can easily be obtained analytically • Cons: not robust to outliers • Examples: basic regression, energy optimization, control applications

  25. Loss functions – Mean Absolute Error (MAE or L1) • Pros: robust to outliers • Cons: no analytical solution; non-differentiable at the origin • Examples: financial applications

  26. Loss functions – Huber Loss • Pros: combines the strengths of the L1 and L2 loss functions, i.e. robust and differentiable • Cons: more hyperparameters; no analytical solution
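
The slide formulas are not reproduced in the transcript; the standard definitions of the three losses, written in terms of the residuals r_n = t_n − y(x_n, w), are:

```latex
E_{\mathrm{MSE}}(\mathbf{w}) = \frac{1}{N}\sum_{n=1}^{N} r_n^2,\qquad
E_{\mathrm{MAE}}(\mathbf{w}) = \frac{1}{N}\sum_{n=1}^{N} \lvert r_n\rvert,\qquad
L_\delta(r) =
\begin{cases}
\tfrac{1}{2}\,r^2 & \lvert r\rvert \le \delta,\\[2pt]
\delta\bigl(\lvert r\rvert - \tfrac{1}{2}\delta\bigr) & \lvert r\rvert > \delta,
\end{cases}
```

where δ is the additional threshold hyperparameter of the Huber loss.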

  27. Loss functions – Comparison • The L2 loss is differentiable • The L1 loss is more intuitive • The Huber loss combines the theoretical strengths of both • Practical hints: start with the L2 loss whenever possible; think about physical insights and your intent!

  28. Analytic Solution – Low dimensional example • Solve the optimization problem with the model • Insert the model and the data points • In general, optimal solutions are obtained at the points where the gradient vanishes

  29. Analytic Solution – Low dimensional example • Calculate the gradient and set it equal to zero

  30. Analytic Solution – Low dimensional example • Solve the resulting equation (also called the normal equation):
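
The slide equations are not reproduced in the transcript; assuming, for illustration, the simple straight-line model y(x) = w_0 + w_1 x and data points {(x_n, t_n)}, setting the gradient of the squared-error loss to zero gives:

```latex
\frac{\partial E}{\partial w_0} = -\sum_{n=1}^{N}\bigl(t_n - w_0 - w_1 x_n\bigr) = 0,\qquad
\frac{\partial E}{\partial w_1} = -\sum_{n=1}^{N} x_n\bigl(t_n - w_0 - w_1 x_n\bigr) = 0
```

```latex
\Longrightarrow\quad
\begin{pmatrix} N & \sum_n x_n \\ \sum_n x_n & \sum_n x_n^2 \end{pmatrix}
\begin{pmatrix} w_0 \\ w_1 \end{pmatrix}
=
\begin{pmatrix} \sum_n t_n \\ \sum_n x_n t_n \end{pmatrix},
```

a small linear system (the normal equations) that can be solved directly for w_0 and w_1.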

  31. Analytic Solution – General form • Minimizing the MSE loss function can be rewritten in matrix form • The optimum value for the weights is obtained by setting the gradient to zero and solving for them • The importance of this loss function is tightly related to the fact that the analytical solution is available and can be calculated explicitly for low- to medium-sized datasets!
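
A minimal numpy sketch of the matrix-form analytic solution; the toy dataset is illustrative, and `np.linalg.lstsq` is used instead of an explicit matrix inverse for numerical robustness:

```python
import numpy as np

# Toy dataset: noisy sine, fitted with a cubic polynomial basis
rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 30)
t = np.sin(2.0 * np.pi * x) + 0.1 * rng.standard_normal(x.shape)

Phi = np.vander(x, 4, increasing=True)      # design matrix, columns 1, x, x^2, x^3

# Normal equation: w* = (Phi^T Phi)^(-1) Phi^T t
# lstsq solves the same least-squares problem but is numerically safer
w_star, *_ = np.linalg.lstsq(Phi, t, rcond=None)

t_hat = Phi @ w_star                        # model predictions on the training inputs
mse = np.mean((t - t_hat) ** 2)
print(w_star, mse)
```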

  32. Sequential Analytic Solution – Motivation [Diagram: current best estimate → RLS update rule ← new data point] • Consider the following cases: regression has to be applied during operation of the product, or there is not enough memory to store all data points • A possible solution is given by Recursive Least Squares (RLS)

  33. Sequential Analytic Solution – The algorithm [Annotated equations: prediction based on the old parameters, residual, old parameter estimate, correction gain] • For each new data point, update the parameters and the memory matrix, with I being the identity matrix of appropriate dimension
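
A minimal sketch of a textbook RLS step (the slide's exact notation is not reproduced; the initialisation P = delta·I and the forgetting factor `lam` from the next slide follow the usual conventions and are illustrative):

```python
import numpy as np

def rls_update(w, P, phi, t, lam=1.0):
    """One Recursive Least Squares step for a new data point (phi, t).

    w   -- current weight estimate, shape (M,)
    P   -- current 'memory' matrix, shape (M, M)
    phi -- basis function values of the new input, shape (M,)
    t   -- new target value (scalar)
    lam -- forgetting factor in (0, 1]; lam < 1 down-weights old samples
    """
    y_pred = phi @ w                                  # prediction based on old parameters
    residual = t - y_pred                             # residual of the new data point
    k = P @ phi / (lam + phi @ P @ phi)               # correction gain
    w_new = w + k * residual                          # parameter update
    P_new = (P - np.outer(k, phi) @ P) / lam          # memory matrix update
    return w_new, P_new

# Initialisation: zero weights and a 'large' memory matrix delta * I
M = 3
w = np.zeros(M)
P = 1e6 * np.eye(M)
```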

  34. Sequential Analytic Solution – Forgetting factor • Some applications show slowly varying conditions in the long term, but can be considered stationary on short to medium time periods • Aging of products leads to slight parameter changes • The vehicle mass is usually constant over a significant period of time • The RLS algorithm can deal with this by introducing a forgetting factor, which reduces the weight of old samples

  35. Numerical Iterative Solutions [Figure: cost function over the parameter with the optimum marked] • Regression can also be solved numerically • This is important for large-scale problems and for non-quadratic loss functions • Popular methods: gradient descent, Gauss-Newton, Levenberg-Marquardt • Pros: very generic • Cons: knowledge about numeric optimization is necessary
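
A minimal gradient-descent sketch for the MSE loss; the step size and iteration count are illustrative, and a real application would typically rely on one of the named methods from an optimization library:

```python
import numpy as np

def gradient_descent(Phi, t, lr=0.1, n_iter=5000):
    """Minimise the MSE loss 1/N * ||t - Phi w||^2 by plain gradient descent."""
    N, M = Phi.shape
    w = np.zeros(M)
    for _ in range(n_iter):
        grad = -2.0 / N * Phi.T @ (t - Phi @ w)   # gradient of the MSE loss
        w -= lr * grad                            # step against the gradient
    return w

# Usage with the design matrix and targets from the analytic example above:
# w_gd = gradient_descent(Phi, t)
```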

  36. Constraints on the weights • Weights can be interpreted as physical quantities: temperature (non-negative), spring constants (non-negative), mass (non-negative) • A valid range is known for the weights: tire and other friction models, efficiency (0–100 %) • Constraints improve robustness • The problem becomes more difficult to solve

  37. How to solve the regression problem? [Decision flow] • Is the cost function quadratic? If not, use a numeric iterative solution • Are there parameter constraints? If yes, use a numeric iterative solution • Is the dataset very large, or is not all data available instantaneously? If so, use the sequential analytic solution • Otherwise, use the analytic solution

  38. Chapter: Motivation • Chapter: Linear Models • Chapter: Loss functions • Chapter: Regularization & Validation • Chapter: Practical considerations • Chapter: Summary

  39. How to choose the model? [Figure panels: underfitted / well done / overfitted] • Overfitted: too many features, irrelevant features • Underfitted: not enough features, wrong structure

  40. Overfitting – Choice of hyperparameters (figure source: Bishop – Pattern Recognition and Machine Learning) • Overfitting is the failure to generalize properly between the data points • The cost function decreases with increased model complexity • Noise and irrelevant effects become too important

  41. Overfitting – Curse of dimensionality [Figure: 16 samples in one-, two- and three-dimensional space] • Overfitting occurs if data points are sparse and model complexity is high • Sparsity of data points is difficult to grasp • Sparsity increases fast with increasing input dimension

  42. Validation datasets [Diagram: the available data is split into training data and validation data; models A, B and C are trained on the training data and evaluated on the validation data; the best model is then trained on the complete dataset] • It is difficult to judge overfitting in high-dimensional domains and autonomous systems • A standard technique is to separate the data into training and validation data

  43. Validation datasets (figure source: Bishop – Pattern Recognition and Machine Learning; x-axis: increased model complexity) • Different hyperparameters can be used to tune the model • The validation technique works for all of them

  44. Common pitfalls with validation datasets • Be aware that your validation dataset must reflect the future properties of the underlying physical relationship • Do not reuse validation datasets: if the same validation set is used again and again for testing the model performance, it is implicitly incorporated into the modelling process and no longer gives the expected results! • Splitting the data before fitting the model is therefore essential; taking 2/3 of the data as training data is a good starting value • Visualize your data as much as possible!

  45. k-Fold Cross-validation [Diagram: the data is split into folds; in each round one fold is used as validation data and the remaining folds as training data] • In the case of limited dataset sizes, one may not want to remove a substantial part of the data from the fitting process • One can use smaller validation sets to estimate the true prediction error by splitting the data into multiple 'folds' • The variance of the estimation error is an indicator of model stability
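
A minimal sketch of k-fold cross-validation for choosing the polynomial degree; the split logic and the use of the earlier toy dataset are illustrative:

```python
import numpy as np

def kfold_mse(x, t, degree, k=5):
    """Average validation MSE of a polynomial fit over k folds."""
    idx = np.random.default_rng(0).permutation(len(x))
    folds = np.array_split(idx, k)
    errors = []
    for i in range(k):
        val = folds[i]                                          # current validation fold
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        Phi_tr = np.vander(x[train], degree + 1, increasing=True)
        Phi_va = np.vander(x[val], degree + 1, increasing=True)
        w, *_ = np.linalg.lstsq(Phi_tr, t[train], rcond=None)   # fit on training folds
        errors.append(np.mean((t[val] - Phi_va @ w) ** 2))      # evaluate on held-out fold
    return np.mean(errors), np.std(errors)   # the spread indicates model stability

# Pick the degree with the lowest average validation error:
# best_degree = min(range(1, 10), key=lambda d: kfold_mse(x, t, d)[0])
```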

  46. Regularization • From a design point of view, we want to choose the model structure based on underlying physical principles and not on the characteristics of the dataset • Polynomial basis functions tend to have large coefficients for sparse datasets • Gaussian basis functions tend to overfit locally, which leads to single, large coefficients • A technique to circumvent this is regularization: penalizing high coefficients in the optimization prevents these effects • The weighting of the penalty term gives an intuitive hyperparameter to control model complexity

  47. Typical Regularization – Ridge Regression [Equation: MSE loss plus a regularization term penalizing the squared weights] • Other names: L2 regularization, Tikhonov regularization • Prevents overfitting well • An analytic solution is available as an extension of the MSE problem • Difficult to apply and tune in high-dimensional feature spaces
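
A minimal sketch of the extended analytic solution with an L2 penalty; the regularization weight `lam` is an illustrative hyperparameter, and for brevity the bias column is penalized as well:

```python
import numpy as np

def ridge_fit(Phi, t, lam=1e-2):
    """Closed-form ridge solution: w* = (Phi^T Phi + lam * I)^(-1) Phi^T t."""
    M = Phi.shape[1]
    A = Phi.T @ Phi + lam * np.eye(M)        # regularized normal-equation matrix
    return np.linalg.solve(A, Phi.T @ t)

# Larger lam shrinks the weights more strongly and reduces model complexity.
# w_ridge = ridge_fit(Phi, t, lam=0.1)
```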

  48. Typical Regularization – Lasso Regression [Equation: MSE loss plus a regularization term penalizing the absolute values of the weights] • Other names: L1 regularization • Tends to produce sparse solutions and can therefore be applied for feature selection • A sparse solution means that several coefficients go to zero
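
Since no analytic solution exists, a library solver is typically used; a minimal sketch with scikit-learn (the lecture code may use a different tool, and the alpha value is illustrative):

```python
import numpy as np
from sklearn.linear_model import Lasso

# Phi and t as in the earlier examples; fit_intercept=False because the
# design matrix already contains a column of ones for the bias term
# (which is therefore also penalized in this simple sketch).
lasso = Lasso(alpha=0.01, fit_intercept=False, max_iter=10000)
lasso.fit(Phi, t)

print(lasso.coef_)                          # several coefficients go exactly to zero
selected = np.flatnonzero(lasso.coef_)      # indices of the remaining (selected) features
```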
