
Machine Learning Algorithms in Computational Learning Theory




Presentation Transcript


  1. Machine Learning Algorithms in Computational Learning Theory TIAN HE JI GUAN WANG Shangxuan Xiangnan Kun Peiyong Hancheng 25th Jan 2013

  2. Outlines • Introduction • Probably Approximately Correct Framework (PAC) • PAC Framework • Weak PAC-Learnability • Error Reduction • Mistake Bound Model of Learning • Mistake Bound Model • Predicting from Expert Advice • The Weighted Majority Algorithm • Online Learning from Examples • The Winnow Algorithm • PAC versus Mistake Bound Model • Conclusion • Q & A

  3. Machine Learning • A machine cannot learn, but it can be trained.

  4. Machine Learning • Definition "A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P, if its performance at tasks in T, as measured by P, improves with experience E". ---- Tom M. Mitchell • Algorithm types • Supervised learning: regression, classification (labeled data) • Unsupervised learning: clustering, data mining • Reinforcement learning: learning to act better from observations and rewards.

  5. Machine Learning • Other Examples • Medical diagnosis • Handwritten character recognition • Customer segmentation (marketing) • Document segmentation (classifying news) • Spam filtering • Weather prediction and climate tracking • Gene prediction • Face recognition

  6. Computational Learning Theory • Why learning works • Under what conditions is successful learning possible and impossible? • Under what conditions is a particular learning algorithm assured of learning successfully? • We need particular settings (models) • Probably approximately correct (PAC) • Mistake bound models

  7. Probably Approximately Correct Framework (PAC) • PAC Learnability • Weak PAC-Learnability • Error Reduction • Occam’s Razor

  8. PAC Learning • PAC Learning • Any hypothesis that is consistent with a sufficiently large set of training examples is unlikely to be wrong. • Stationarity: the future being like the past. • Concept: an efficiently computable function on a domain, e.g. a function {0,1}^n -> {0,1}. • A concept class is a collection of concepts.
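
To make the definitions concrete, here is a minimal illustration (not from the slides) of a concept as a Boolean function on {0,1}^n and of a small concept class; the names are hypothetical:

    # Illustrative only: a "concept" as a Boolean function on {0,1}^n,
    # and a concept class as a collection of such functions.
    def conjunction(indices):
        """Concept that outputs 1 iff all the given coordinates are 1."""
        return lambda x: int(all(x[i] == 1 for i in indices))

    n = 4
    # The class of all conjunctions of two of the n variables (a tiny concept class).
    concept_class = [conjunction((i, j)) for i in range(n) for j in range(i + 1, n)]
    c = concept_class[0]          # target concept: x0 AND x1
    print(c((1, 1, 0, 1)))        # -> 1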

  9. PAC Learnability • Learnability • Requirements for ALG • ALG must, with arbitrarily high probability (1 − δ), output a hypothesis having arbitrarily low error ε. • ALG must do so efficiently, in time that grows at most polynomially with 1/δ and 1/ε.

  10. PAC Learning for Decision Lists • A Decision List (DL) is a way of representing a certain class of functions over n-tuples. • Example: if x4 = 1 then f(x) = 0, else if x2 = 1 then f(x) = 1, else f(x) = 0 (see the sketch below). An upper bound on the number of possible Boolean decision lists on n variables is n! · 4^n = n^O(n).
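
As a concrete illustration of the slide's example (the representation and helper below are hypothetical, not from the slides), a decision list can be stored as an ordered list of (test, output) pairs:

    def evaluate_decision_list(dl, x):
        """Return the output of the first rule whose test is satisfied by x."""
        for test, output in dl:
            if test(x):
                return output
        raise ValueError("decision list has no default rule")

    # x is a dict mapping variable index -> 0/1.
    # Example from the slide: if x4 = 1 then 0, else if x2 = 1 then 1, else 0.
    example_dl = [
        (lambda x: x[4] == 1, 0),
        (lambda x: x[2] == 1, 1),
        (lambda x: True, 0),        # default rule
    ]

    print(evaluate_decision_list(example_dl, {2: 1, 4: 0}))  # -> 1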

  11. PAC Learning for Decision Lists • Algorithm: a greedy approach (Rivest, 1987); a sketch follows below. • If the example set S is empty, halt. • Examine each term of length at most k until a term t is found such that all examples in S that satisfy t have the same label v. • Add (t, v) to the decision list and remove those examples from S. • Repeat steps 1 to 3. • Clearly, it runs in polynomial time.
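
A minimal sketch of the greedy procedure for decision lists whose tests are single literals (k = 1), assuming examples are given as (bit-tuple, label) pairs; all names are illustrative:

    def greedy_decision_list(examples, n):
        """Rivest-style greedy learner for 1-decision lists.
        examples: list of (x, y) with x a tuple of n bits and y in {0, 1}.
        Returns a list of (index, value, output) rules, or None if no
        consistent 1-decision list exists for the sample."""
        S = list(examples)
        dl = []
        while S:
            found = False
            # Try every literal x_i = v as the next test.
            for i in range(n):
                for v in (0, 1):
                    covered = [(x, y) for x, y in S if x[i] == v]
                    labels = {y for _, y in covered}
                    if covered and len(labels) == 1:
                        dl.append((i, v, labels.pop()))
                        S = [(x, y) for x, y in S if x[i] != v]
                        found = True
                        break
                if found:
                    break
            if not found:
                return None   # no literal is "pure" on the remaining examples
        return dl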

  12. What does PAC do? • A supervised learning framework to classify data

  13. How can we use PAC? • Use PAC as a general framework to guide us on efficient sampling for machine learning • Use PAC as a theoretical analyzer to distinguish hard problems from easy problems • Use PAC to evaluate the performance of some algorithms • Use PAC to solve some real problems

  14. What are we going to cover? • Explore what PAC can learn • Apply PAC to real data with noise • Give a probabilistic analysis of the performance of PAC

  15. PAC Learning for Decision Lists • Algorithm: a greedy approach

  16. Analysis of Greedy Algorithm • The output • Performance Guarantee

  17. PAC Learning for Decision Lists 1. For a given sample S, partition the set of all concepts that agree with the target f on S into a "bad" set (true error > ε) and a "good" set (true error ≤ ε); we want the probability that the learner outputs a bad concept to be at most δ. 2. Consider any bad h: the probability that we pick an S of m examples with which h is consistent is at most (1 − ε)^m. 3. By the union bound, the probability that some bad concept is consistent with S is at most |C| · (1 − ε)^m. 4. Putting it together: requiring |C| · (1 − ε)^m ≤ δ gives m ≥ (1/ε)(ln|C| + ln(1/δ)), which for decision lists (|C| ≤ n! · 4^n) is polynomial in n, 1/ε and 1/δ.
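
A quick numerical check of this sample-size bound, as a sketch only (the function name and the chosen parameter values are illustrative, not from the slides):

    import math

    def dl_sample_bound(n, eps, delta):
        """Examples sufficient for PAC-learning decision lists on n variables:
        m >= (1/eps) * (ln|C| + ln(1/delta)), with |C| <= n! * 4^n."""
        ln_C = math.lgamma(n + 1) + n * math.log(4)   # ln(n!) + ln(4^n)
        return math.ceil((ln_C + math.log(1 / delta)) / eps)

    print(dl_sample_bound(n=20, eps=0.1, delta=0.05))  # -> 731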

  18. The Limitation of PAC for DLs • What if the examples are like below?

  19. Other Concept Classes • Decision trees: DTs of restricted size are not PAC-learnable, although those of arbitrary size are. • AND-formulas: PAC-learnable. • 3-CNF formulas: PAC-learnable. • 3-term DNF formulas: it turns out to be NP-hard, given S, to come up with a 3-term DNF formula that is consistent with S. Therefore this concept class is not PAC-learnable under the current definition; we shall soon revisit it with a modified definition of PAC learning.

  20. Revised Definition for PAC Learning

  21. Weak PAC-Learnability Benefits: • To loosen the requirement of a highly accurate algorithm • To reduce the running time, since |S| can be smaller • To find a "good" concept using a simple algorithm A

  22. Confidence Boosting Algorithm

  23. Boosting the Confidence

  24. Boosting the Confidence

  25. Boosting the Confidence

  26. Error Reduction by Boosting • The basic idea exploits the fact that a weak learner can learn a little on every distribution, so with more iterations we can reach a much lower error rate.

  27. Error Reduction by Boosting • Detailed Steps: 1. Some algorithm A produces a hypothesis that has an error probability of no more than p = 1/2 − γ (γ > 0). We would like to decrease this error probability to 1/2 − γ′ with γ′ > γ. 2. We invoke A three times, each time with a slightly different distribution, and obtain hypotheses h1, h2 and h3, respectively. 3. The final hypothesis is h = Maj(h1, h2, h3).

  28. Error Reduction by Boosting • Learn h1 from D1 with error at most p. • Modify D1 so that the total weight of the examples h1 misclassifies is 1/2; call the result D2. Pick a sample S2 from this distribution and use A to learn h2. • Build D3 from the examples on which h1 and h2 disagree. Pick a sample S3 from this distribution and use A to learn h3. (A rough code sketch follows.)
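
A rough sketch of this three-distribution construction, assuming a weak_learn(sample) routine that returns a 0/1 hypothesis; the resampling details and names here are illustrative rather than the exact procedure on the slide:

    import random

    def boost_three(weak_learn, examples):
        """Schapire-style error reduction with three hypotheses (sketch).
        weak_learn(sample) must return a hypothesis h(x) -> {0, 1}."""
        # D1: the original (uniform) distribution over the examples.
        h1 = weak_learn(examples)

        # D2: resample so that examples h1 gets wrong carry half the weight.
        wrong = [(x, y) for x, y in examples if h1(x) != y]
        right = [(x, y) for x, y in examples if h1(x) == y]
        d2 = []
        for _ in range(len(examples)):
            pool = wrong if (wrong and (not right or random.random() < 0.5)) else right
            d2.append(random.choice(pool))
        h2 = weak_learn(d2)

        # D3: only the examples on which h1 and h2 disagree.
        d3 = [(x, y) for x, y in examples if h1(x) != h2(x)] or examples
        h3 = weak_learn(d3)

        # Final hypothesis: majority vote of h1, h2, h3.
        return lambda x: int(h1(x) + h2(x) + h3(x) >= 2)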

  29. Error Reduction by Boosting • The total error probability of h is at most 3p^2 − 2p^3, which is less than p whenever p ∈ (0, 1/2); for example, p = 0.4 gives 3(0.16) − 2(0.064) = 0.352. The proof of this bound is given in [1]. • Thus there exists γ′ > γ such that the error probability of the new hypothesis is at most 1/2 − γ′. [1] http://courses.csail.mit.edu/6.858/lecture-12.ps

  30. Error Reduction by Boosting

  31. Adaboost • Defines a classifier using an additive model: H(x) = sign(α1·h1(x) + α2·h2(x) + ... + αT·hT(x)), where each ht is a weak hypothesis and αt is the weight assigned to it.
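
A compact sketch of the standard AdaBoost update (labels in {-1, +1}); the weak_learn routine and the other names are placeholders, not part of the slides:

    import math

    def adaboost(examples, weak_learn, T=10):
        """AdaBoost sketch. examples: list of (x, y) with y in {-1, +1}.
        weak_learn(examples, weights) must return a hypothesis h(x) -> {-1, +1}."""
        m = len(examples)
        w = [1.0 / m] * m                          # initial example weights
        ensemble = []                              # list of (alpha, h)
        for _ in range(T):
            h = weak_learn(examples, w)
            err = sum(wi for wi, (x, y) in zip(w, examples) if h(x) != y)
            err = min(max(err, 1e-12), 1 - 1e-12)  # guard against division by zero
            alpha = 0.5 * math.log((1 - err) / err)
            ensemble.append((alpha, h))
            # Reweight: increase weight on mistakes, decrease on correct examples.
            w = [wi * math.exp(-alpha * y * h(x)) for wi, (x, y) in zip(w, examples)]
            z = sum(w)
            w = [wi / z for wi in w]

        def H(x):                                  # the final additive classifier
            score = sum(alpha * h(x) for alpha, h in ensemble)
            return 1 if score >= 0 else -1
        return H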

  32. Adaboost

  33. Adaboost Example

  34. Adaboost Example

  35. Adaboost Example

  36. Adaboost Example

  37. Adaboost Example

  38. Adaboost Example

  39. Error Reduction by Boosting Fig.: Error curves for boosting C4.5 on the letter dataset, as reported by Schapire et al. []. The lower curves are training error and the upper curves are test error.

  40. PAC learning conclusion • Strong PAC learning • Weak PAC learning • Error reduction and boosting

  41. Mistake Bound Model of Learning • Mistake Bound Model • Predicting from Expert Advice • The Weighted Majority Algorithm • Online Learning from Examples • The Winnow Algorithm

  42. Mistake Bound Model of Learning | Basic Settings • x – examples • c – the target function, c ∈ C • x1, x2, ... xt – an input sequence • At the t-th stage: • the algorithm receives xt • the algorithm predicts a classification bt for xt • the algorithm receives the true classification c(xt) • A mistake occurs if c(xt) ≠ bt.
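
This interaction can be written as a short loop; the sketch below is only an illustration, and the predict/update methods of the learner are hypothetical:

    def run_online(learner, stream, target):
        """Run the mistake-bound protocol: at each stage the learner sees x_t,
        predicts b_t, and is then told the true label c(x_t)."""
        mistakes = 0
        for x in stream:
            b = learner.predict(x)      # learner's prediction b_t
            y = target(x)               # true classification c(x_t)
            if b != y:
                mistakes += 1
            learner.update(x, y)        # learner may adjust its state
        return mistakes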

  43. Mistake Bound Model of Learning | Basic Settings • A concept class C is learnable by an algorithm A with mistake bound M if: • for any concept c ∈ C, and • for any ordering of the examples, • the total number of mistakes ever made by A is bounded by M.

  44. Mistake Bound Model of Learning | Basic Settings • Predicting from Expert Advice • The Weighted Majority Algorithm • Online Learning from Examples • The Winnow Algorithm

  45. Predicting from Expert Advice • The Weighted Majority Algorithm • Deterministic • Randomized

  46. Predicting from Expert Advice | Basic Flow [Diagram: the experts' advice is combined into the algorithm's prediction, which is then compared with the truth.] Assumption: prediction ∈ {0, 1}.

  47. Predicting from Expert Advice | Trial (1) Receiving predictions from the experts (2) Making its own prediction (3) Being told the correct answer

  48. Predicting from Expert Advice | An Example • Task: predicting whether it will rain today. • Input: advice from n experts, each ∈ {1 (yes), 0 (no)}. • Output: 1 or 0. • Goal: make as few mistakes as possible.

  49. The Weighted Majority Algorithm | Deterministic

  50. The Weighted Majority Algorithm | Deterministic [Table: a worked trace of the deterministic algorithm over several trials, showing each expert's 0/1 prediction, the true outcome, and the experts' weights being halved (1 → 0.50 → 0.25 ...) whenever they err.]
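
The rule being traced is the standard deterministic Weighted Majority update (halve the weight of every expert that errs). The sketch below assumes that update and uses made-up expert predictions, since the slide's exact numbers are not recoverable from the transcript:

    def weighted_majority(expert_predictions, outcomes, beta=0.5):
        """Deterministic Weighted Majority.
        expert_predictions[t][i] is expert i's 0/1 prediction at trial t;
        outcomes[t] is the true 0/1 label. Returns (mistakes, final weights)."""
        n = len(expert_predictions[0])
        w = [1.0] * n
        mistakes = 0
        for preds, y in zip(expert_predictions, outcomes):
            weight_for_1 = sum(wi for wi, p in zip(w, preds) if p == 1)
            weight_for_0 = sum(wi for wi, p in zip(w, preds) if p == 0)
            guess = 1 if weight_for_1 >= weight_for_0 else 0
            if guess != y:
                mistakes += 1
            # Halve the weight of every expert that was wrong on this trial.
            w = [wi * beta if p != y else wi for wi, p in zip(w, preds)]
        return mistakes, w

    # Rain-prediction setting from the earlier slide: 4 experts, 3 days (made-up data).
    preds = [[1, 0, 1, 1], [0, 1, 1, 0], [1, 1, 0, 1]]
    truth = [1, 0, 1]
    print(weighted_majority(preds, truth))   # -> (0, [1.0, 0.25, 0.25, 1.0])

The standard analysis bounds this deterministic version's mistakes by roughly 2.41 · (m + log2 n), where m is the number of mistakes made by the best expert and n is the number of experts.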
