
Machine learning for economists






Presentation Transcript


  1. Machine learning for economists

  2. ME! • Hannes Rosenbusch • Social Psychology • PhD project on data science methods for psychology

  3. Agenda • What is machine learning? • Important concepts • Classic prediction models • Coding

  4. Machine learning is…

  5. Actually it is this… • y = m * x + b • Prediction models (supervised machine learning) • Pattern recognition (unsupervised)
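A minimal sketch of the "y = m * x + b" point: supervised machine learning simply estimates m and b from data and then predicts new cases. The toy numbers below are made up for illustration.

```python
# Fit y = m * x + b from data (supervised prediction).
import numpy as np
from sklearn.linear_model import LinearRegression

x = np.array([[1.0], [2.0], [3.0], [4.0]])   # predictor
y = np.array([2.1, 3.9, 6.2, 8.1])           # outcome

model = LinearRegression().fit(x, y)
print(model.coef_[0], model.intercept_)      # estimated m and b
print(model.predict([[5.0]]))                # prediction for a new case
```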

  6. What is the difference to my work?

  7. What is the difference to my work? • Slight difference in focus

  8. Am I already machine learning? • Machine learning = regression models + focus on prediction? • Kind of, yes. • I thought I could ask for a bigger computer?!?!

  9. Benefits of ML • Reminds us what is useful (prediction) • Unbiased quantification of prediction accuracy • Better, cooler, more accurate prediction models

  10. How to quantify accuracy • We need to quantify accuracy → many metrics available

  11. Basic concepts

  12. How to quantify accuracy • We need to quantify accuracy → many metrics available • … and make predictions for “new samples” • Fit model on sample A + evaluate model on sample A → BIAS • Evaluate model on sample B instead
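A small sketch of the "fit on sample A, evaluate on sample B" idea using one of the many available metrics (mean squared error). The two samples are simulated here purely so the code runs.

```python
# In-sample accuracy is optimistic; out-of-sample accuracy is the honest number.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(0)
X_a, X_b = rng.normal(size=(100, 3)), rng.normal(size=(100, 3))
y_a = X_a @ [1.0, 2.0, 0.5] + rng.normal(size=100)
y_b = X_b @ [1.0, 2.0, 0.5] + rng.normal(size=100)

model = LinearRegression().fit(X_a, y_a)               # fit on sample A only
print(mean_squared_error(y_a, model.predict(X_a)))     # evaluated on sample A -> biased (too optimistic)
print(mean_squared_error(y_b, model.predict(X_b)))     # evaluated on sample B -> honest estimate
```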

  13. Basic concepts

  14. Hold-out method • Training set & test set • Build model with the training set (50-80% of the data) • Evaluate accuracy with the remaining data • How much data do I need?
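A sketch of the hold-out method with sklearn's train_test_split; the data are simulated only to make the example self-contained.

```python
# Hold-out method: keep part of the data aside and evaluate on it.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 4))
y = X @ [1.0, -0.5, 2.0, 0.0] + rng.normal(size=200)

# 70% training set, 30% test set
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

model = LinearRegression().fit(X_train, y_train)       # build model on the training set
print(r2_score(y_test, model.predict(X_test)))         # accuracy on the held-out test set
```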

  15. Cross-validation • Split data into k sets • Build model with all but one set • Evaluate accuracy on the left-out set • Rotate the left-out set
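A sketch of k-fold cross-validation with sklearn (here k = 5): each fold is left out once, and the model is evaluated on it. Data are simulated for illustration.

```python
# 5-fold cross-validation: five out-of-sample accuracy estimates, then their mean.
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(2)
X = rng.normal(size=(200, 4))
y = X @ [1.0, -0.5, 2.0, 0.0] + rng.normal(size=200)

scores = cross_val_score(LinearRegression(), X, y, cv=5, scoring="r2")
print(scores, scores.mean())
```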

  16. Cross-validation

  17. Let’s get to the models • Linear and logistic regression are machine learning! • But there are others! • Today we look at three: • regularization-focused, tree-based, similarity-based

  18. Cool models 1: Penalized regression • Normal regression: shrink residuals • Resulting model potentially unstable (multicollinearity) • Solution: minimize residuals AND coefficients (draw) • Better transferability to new samples
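A sketch of penalized (ridge) regression next to ordinary regression, with a deliberately near-duplicate predictor to provoke multicollinearity; the simulated data and the penalty strength alpha = 1.0 are illustrative choices.

```python
# Ridge regression minimizes residuals AND coefficient size.
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge

rng = np.random.default_rng(3)
X = rng.normal(size=(100, 2))
X = np.column_stack([X, X[:, 0] + rng.normal(scale=0.01, size=100)])  # near-duplicate column -> multicollinearity
y = X[:, 0] + X[:, 1] + rng.normal(size=100)

print(LinearRegression().fit(X, y).coef_)   # potentially unstable coefficients
print(Ridge(alpha=1.0).fit(X, y).coef_)     # shrunken, more stable coefficients
```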

  19. Cool models 2: decision trees/forests • Yes/no rules instead of beta weights • Where do the rules come from? • Optimization in the training sample • Minimize heterogeneity in the leaves

  20. Cool models 2: decision trees/forests • Yes/no rules instead of beta weights • Where do the rules come from? • Optimization in the training sample • Minimize heterogeneity in the leaves
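A sketch of a single decision tree: yes/no splits chosen in the training sample to make the leaves homogeneous. The data and the max_depth setting are illustrative.

```python
# A shallow regression tree; export_text prints the learned yes/no rules.
import numpy as np
from sklearn.tree import DecisionTreeRegressor, export_text

rng = np.random.default_rng(4)
X = rng.normal(size=(200, 2))
y = np.where(X[:, 0] > 0, 5.0, 1.0) + rng.normal(scale=0.5, size=200)

tree = DecisionTreeRegressor(max_depth=2).fit(X, y)
print(export_text(tree, feature_names=["x1", "x2"]))
```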

  21. Cool models 2: decision trees/forests • Random forest • Combination of different decision trees • Take a random sample of the data • Take a random sample of variables at each split • Build a tree • Repeat 500 times
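A sketch of a random forest in sklearn: many trees grown on bootstrap samples with a random subset of variables considered at each split, then averaged. Data are simulated for illustration.

```python
# Random forest: "repeat 500 times" maps onto n_estimators=500.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(5)
X = rng.normal(size=(300, 5))
y = np.sin(X[:, 0]) + X[:, 1] ** 2 + rng.normal(scale=0.3, size=300)

forest = RandomForestRegressor(
    n_estimators=500,      # build 500 trees
    max_features="sqrt",   # random subset of variables at each split
    random_state=0,
).fit(X, y)
print(forest.predict(X[:3]))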

  22. Cool models 3: Nearest neighbors models • Predict a new observation from the training cases that are most similar to it
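A sketch of a k-nearest-neighbors model: the prediction for a new case is the average outcome of its most similar training cases. The data and the choice of 5 neighbors are illustrative.

```python
# KNN regression: similarity-based prediction.
import numpy as np
from sklearn.neighbors import KNeighborsRegressor

rng = np.random.default_rng(6)
X = rng.normal(size=(200, 3))
y = X[:, 0] * 2 + rng.normal(scale=0.2, size=200)

knn = KNeighborsRegressor(n_neighbors=5).fit(X, y)
new_case = rng.normal(size=(1, 3))
print(knn.predict(new_case))   # average outcome of the 5 most similar training cases
```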

  23. Hyperparameters • You need to tell these models how to act • Ridge regression → How much penalization? • Trees → How many branches? • Nearest neighbors → How many neighbors? • Tuning means adjusting hyperparameters to make the model accurate • Choose the hyperparameter value that gives the highest accuracy on new data
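A sketch of hyperparameter tuning with sklearn's GridSearchCV: several penalty strengths for ridge regression are tried, and the one with the best cross-validated accuracy is kept. The simulated data and the candidate alpha values are illustrative.

```python
# Tuning: pick the hyperparameter value with the highest accuracy on held-out folds.
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV

rng = np.random.default_rng(7)
X = rng.normal(size=(200, 10))
y = X @ rng.normal(size=10) + rng.normal(size=200)

search = GridSearchCV(
    Ridge(),
    param_grid={"alpha": [0.01, 0.1, 1.0, 10.0, 100.0]},
    cv=5,                 # accuracy judged on left-out folds, not the training data
    scoring="r2",
).fit(X, y)
print(search.best_params_, search.best_score_)
```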

  24. Workflow of machine learning

  25. In sum • Machine learning is not magic… it is prediction • Social scientists are very well prepared to acquire ML skills • Extensions to classic methods are: • Focus on prediction • Out-of-sample evaluation • Cool new models

  26. Break and then coding

  27. Making it happen • Python offers simple implementations of: • many ML models • splitting into training and test sets • quantification of accuracy • #1 ML package: sklearn

  28. Practice challenge • Predict the median income of US counties • You get a dataset with plenty of predictors • (census, election results, psych. tests, Twitter) • Together we implement: • Regression model • Cross-validation • Evaluation
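A hedged sketch of the practice-challenge workflow; the file name "county_data.csv" and the column name "median_income" are hypothetical placeholders for whatever the provided dataset actually uses.

```python
# Practice-challenge workflow: regression model + cross-validation + evaluation.
import pandas as pd
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.metrics import mean_absolute_error

data = pd.read_csv("county_data.csv")                  # hypothetical county dataset
y = data["median_income"]                              # hypothetical outcome column
X = data.drop(columns=["median_income"])               # census, election, psych., Twitter predictors

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

model = Ridge(alpha=1.0)
print(cross_val_score(model, X_train, y_train, cv=5, scoring="r2"))   # cross-validation
model.fit(X_train, y_train)
print(mean_absolute_error(y_test, model.predict(X_test)))             # final evaluation on the test set
```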
