
7. Computational Learning Theory



  1. 7. Computational Learning Theory 신수용

  2. Outline • Introduction • Probably Approximately Correct (PAC) Learning • Sample Complexity for Finite Hypothesis Spaces • Sample Complexity for Infinite Hypothesis Spaces • Mistake Bound Model

  3. 7.1 Introduction • Questions that arise when studying machine learning • Can we tell, independently of the learning algorithm, whether a learning problem is inherently easy or hard? • Can we determine how many training examples are needed for successful learning? • What effect does allowing the learner to pose queries to the trainer have? • How many mistakes will the learner make before it finishes learning? • Can we characterize the computational complexity of a learning problem?

  4. 7.1 (2) • Our Goal • Sample complexity • the number of training examples the learner needs in order to converge to a successful hypothesis • Computational complexity • the amount of computation the learner needs in order to converge to a successful hypothesis • Mistake bound • the number of training examples the learner misclassifies before converging to a successful hypothesis

  5. 7.2 Probably Learning an Approximately Correct Hypothesis • Problem setting • X : the set of all possible instances • C : the set of target concepts • D : the probability distribution over X from which training instances are drawn • Assumption : D is stationary (it does not change over time)

  6. 7.2 PAC (2) • Error of a hypothesis • True error (error_D(h)) • the probability, with respect to target concept c and distribution D, that h will misclassify an instance drawn at random according to D • Training error • the fraction of training examples misclassified by h

  7. 7.2 PAC (3) • Sample error • the fraction of a set S of samples misclassified by h • if S = the training examples, then sample error = training error • PAC learnability • requiring error_D(h) = 0 is not a workable demand • unless every possible instance is given as a training example, many different hypotheses remain consistent with the provided training examples, and the learner cannot tell which of them is the right one
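
A minimal Python sketch of the sample-error computation described above; the function name, types, and toy data are illustrative, not from the slides.

```python
from typing import Callable, Sequence, Tuple

def sample_error(h: Callable[[float], int],
                 samples: Sequence[Tuple[float, int]]) -> float:
    """Fraction of the sample S = [(x, c(x)), ...] that hypothesis h misclassifies."""
    mistakes = sum(1 for x, label in samples if h(x) != label)
    return mistakes / len(samples)

# Example: hypothesis "x >= 5" against a few labeled instances.
S = [(2, 0), (7, 1), (4, 0), (9, 1), (5, 0)]
print(sample_error(lambda x: int(x >= 5), S))  # 0.2 (only x = 5 is misclassified)
```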

  8. 7.2 PAC (4) • Because training examples are drawn at random, a misleading (unrepresentative) set of training examples may be selected • Weakened requirements • the error need not be zero • any error below a suitably small constant ε is acceptable • the learner need not succeed on every sequence of training examples • any failure probability below a suitably small constant δ is acceptable • the learner need only probably learn a hypothesis that is approximately correct (PAC learning)

  9. 7.2 PAC (5) • PAC • PAC-learnable • C is PAC-learnable if there exist a polynomial-time algorithm and a polynomial function satisfying the definition below • C is PAC-learnable by L using H : for every c ∈ C, every distribution D over X, and every ε and δ with 0 < ε < 1/2 and 0 < δ < 1/2, learner L will with probability at least (1 − δ) output a hypothesis h ∈ H such that error_D(h) ≤ ε, in time that is polynomial in 1/ε, 1/δ, n, and size(c)

  10. 7.3 Sample Complexity for Finite Hypothesis Spaces • Consistent learner • outputs hypotheses that perfectly fit the training data • Version space (VS_{H,D}) • the set of all hypotheses in H that correctly classify the training examples D

  11. 7.3 Sample Complexity for Finite Hypothesis Spaces (2) • The number of examples a consistent learner needs equals the number of examples required before the version space contains only acceptable hypotheses • ε-exhausted version space • the version space VS_{H,D} is said to be ε-exhausted with respect to c and D if every hypothesis h in VS_{H,D} has error less than ε with respect to c and D • i.e., all hypotheses consistent with the observed training examples happen to have true error less than ε

  12. 7.3 Sample Complexity for Finite Hypothesis Spaces (3) • ε-exhausting the version space • if H is finite and D is a sequence of m ≥ 1 independently drawn examples of the target concept c, then for any 0 ≤ ε ≤ 1 the probability that the version space VS_{H,D} is not ε-exhausted is less than or equal to |H|e^(−εm) • Introducing the PAC requirements : bounding this probability by δ and solving for m gives m ≥ (1/ε)(ln|H| + ln(1/δ)), as in the sketch below
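
A small sketch of this sample-size bound; the helper name and the example numbers are illustrative.

```python
import math

def finite_H_sample_complexity(H_size: int, eps: float, delta: float) -> int:
    """Number of examples sufficient to epsilon-exhaust the version space
    with probability at least 1 - delta (finite H, consistent learner):
        m >= (1/eps) * (ln|H| + ln(1/delta))
    """
    return math.ceil((math.log(H_size) + math.log(1.0 / delta)) / eps)

# e.g. |H| = 973, eps = 0.1, delta = 0.05
print(finite_H_sample_complexity(973, 0.1, 0.05))  # 99
```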

  13. 7.3 (4) • Agnostic Learning and Inconsistent Hypotheses • Agnostic learner • a learner that makes no assumption about whether the target concept is representable by H and instead finds the hypothesis with minimum training error • h_best • the hypothesis from H having lowest training error over the training examples

  14. 7.3 (5) • The error of interest here : error_D(h_best) • constraint assumed so far : error_D(h_best) = 0 • Hoeffding bounds • bound the deviation between the true probability of some event and its observed frequency over m independent trials • Pr[error_D(h) > error_S(h) + ε] ≤ e^(−2mε²), which yields the sample complexity m ≥ (1/(2ε²))(ln|H| + ln(1/δ)), as in the sketch below
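
A matching sketch for the agnostic case; the function name is again illustrative.

```python
import math

def agnostic_sample_complexity(H_size: int, eps: float, delta: float) -> int:
    """Examples sufficient so that, with probability at least 1 - delta, the
    hypothesis with lowest training error has true error within eps of it:
        m >= (1 / (2 eps^2)) * (ln|H| + ln(1/delta))   (from the Hoeffding bound)
    """
    return math.ceil((math.log(H_size) + math.log(1.0 / delta)) / (2 * eps ** 2))

# Same |H|, eps, delta as before: note the quadratic dependence on 1/eps.
print(agnostic_sample_complexity(973, 0.1, 0.05))  # 494
```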

  15. 7.3 (6) • Conjunctions of Boolean Literals Are PAC-Learnable • |H| = 3^n, since each of the n literals may appear positive, appear negated, or be absent • using the FIND-S algorithm, boolean conjunctions are therefore PAC-learnable (a sketch of the resulting sample-size bound follows) • PAC-Learnability of Other Concept Classes • Unbiased learners
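
Instantiating the finite-|H| bound with |H| = 3^n gives m ≥ (1/ε)(n ln 3 + ln(1/δ)); a small sketch with illustrative names:

```python
import math

def conjunction_sample_complexity(n: int, eps: float, delta: float) -> int:
    """For conjunctions of up to n boolean literals, |H| = 3^n, so
        m >= (1/eps) * (n * ln 3 + ln(1/delta))
    examples suffice for PAC learning (e.g. with FIND-S)."""
    return math.ceil((n * math.log(3) + math.log(1.0 / delta)) / eps)

print(conjunction_sample_complexity(10, 0.1, 0.05))  # 140
```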

  16. 7.3 (7) • The set C of all definable target concepts corresponds to the power set of X : |C| = 2^|X| = 2^(2^n) for instances described by n boolean attributes • unbiased class : exponential sample complexity • k-term DNF and k-CNF concepts • an example of a class with polynomial sample complexity that is nevertheless not learnable in polynomial time : k-term DNF

  17. 7.4 Sample Complexity for Infinite Hypothesis Spaces • Drawbacks of expressing sample complexity in terms of |H| • the resulting bounds can be quite weak • infinite hypothesis spaces cannot be handled at all • An alternative measure of H's complexity (instead of |H|) • the Vapnik-Chervonenkis dimension of H • VC dimension, or VC(H) • provides tighter bounds • based on the number of distinct instances from X that can be completely discriminated using H

  18. 7.4 Sample Complexity for Infinite Hypothesis Spaces (2) • Shattering a Set of Instances • Shatter • a set of instances S is shattered by hypothesis space H if and only if for every dichotomy of S there exists some hypothesis in H consistent with that dichotomy • The Vapnik-Chervonenkis Dimension • VC(H) is the size of the largest finite subset of X shattered by H • if arbitrarily large finite subsets of X can be shattered by H, then VC(H) ≡ ∞

  19. 7.4 Sample Complexity for Infinite Hypothesis Spaces (3)

  20. 7.4 Sample Complexity for Infinite Hypothesis Spaces (4) • Illustrative Examples • X : the set of real numbers • H : the set of intervals on the real number line (a < x < b) • S = {3.1, 5.7} • 4 hypotheses suffice to realize all dichotomies, e.g. • (1 < x < 2), (1 < x < 4), (4 < x < 7), (1 < x < 7) • VC(H) ≥ 2 (at least) • S = {1.5, 4.2, 5.1} • not shattered (no interval can contain 1.5 and 5.1 while excluding 4.2) -> VC(H) = 2 (final) • H is infinite, yet VC(H) is finite • a brute-force shattering check is sketched below
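
A brute-force sketch of the shattering check for this interval example, assuming hypotheses of the form a < x < b; all names are illustrative.

```python
from itertools import product

def interval_consistent(points, labels):
    """Is some interval (a < x < b) consistent with this dichotomy?
    True iff no negative point lies between two positive points."""
    pos = [x for x, y in zip(points, labels) if y == 1]
    neg = [x for x, y in zip(points, labels) if y == 0]
    if not pos:
        return True                        # an empty interval labels everything negative
    lo, hi = min(pos), max(pos)
    return all(x < lo or x > hi for x in neg)

def shattered_by_intervals(points):
    """S is shattered iff every dichotomy of S is realized by some interval."""
    return all(interval_consistent(points, labels)
               for labels in product([0, 1], repeat=len(points)))

print(shattered_by_intervals([3.1, 5.7]))       # True  -> VC(H) >= 2
print(shattered_by_intervals([1.5, 4.2, 5.1]))  # False -> 3 points cannot be shattered
```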

  21. 7.4 (5) • X : points in the x, y plane • H : all linear decision surfaces in the plane • VC(H) = 3 • in general, the VC dimension of linear decision surfaces in an r-dimensional space is r + 1 • ex) figures (a) and (b) omitted

  22. 7.4 (6) • Sample Complexity and the VC Dimension • upper bound on the number of training examples (sufficient for successful learning) : m ≥ (1/ε)(4 log2(2/δ) + 8 VC(H) log2(13/ε)) • lower bound on sample complexity (necessary for successful learning) : m ≥ max[(1/ε) log(1/δ), (VC(C) − 1)/(32ε)] • the upper bound is sketched below
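
A sketch of the upper bound; the helper name and example values are illustrative.

```python
import math

def vc_sample_complexity(vc_dim: int, eps: float, delta: float) -> int:
    """Upper bound on the number of examples sufficient for PAC learning:
        m >= (1/eps) * (4 * log2(2/delta) + 8 * VC(H) * log2(13/eps))
    """
    return math.ceil((4 * math.log2(2 / delta)
                      + 8 * vc_dim * math.log2(13 / eps)) / eps)

# Linear decision surfaces in the plane: VC(H) = 3
print(vc_sample_complexity(3, 0.1, 0.05))  # 1899
```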

  23. 7.5 The Mistake Bound Model of Learning • Mistake bound • the total number of mistakes the learner makes before it converges to the correct hypothesis • Mistake Bound for the FIND-S Algorithm • FIND-S • initialize h to the most specific hypothesis • for each positive training instance x • remove from h any literal that is not satisfied by x • output hypothesis h (a sketch follows)
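
A minimal sketch of FIND-S for boolean conjunctions, representing each literal as an (attribute index, value) pair; the representation and names are my own choices, not from the slides.

```python
def find_s(positive_examples):
    """FIND-S for conjunctions of boolean literals.

    Start with the most specific hypothesis (every literal and its negation,
    satisfied by no instance); for each positive example, drop every literal
    the example does not satisfy."""
    n = len(positive_examples[0])
    h = {(i, v) for i in range(n) for v in (0, 1)}   # literal (i, v): "attribute i = v"
    for x in positive_examples:
        h = {(i, v) for (i, v) in h if x[i] == v}
    return h

# Positive examples of the target "x1 AND NOT x3", as bit tuples (x1, x2, x3)
print(sorted(find_s([(1, 0, 0), (1, 1, 0)])))  # [(0, 1), (2, 0)], i.e. x1 = 1 AND x3 = 0
```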

  24. 7.5 The Mistake Bound Model of Learning (2) • worst-case mistake bound for FIND-S : n + 1 • ex) • Mistake Bound for the HALVING Algorithm • Halving algorithm • maintains the version space and predicts by majority vote over it • worst-case mistake bound : log2|H|, since each mistake at least halves the version space (a sketch follows)
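
A toy sketch of one HALVING round over an explicitly enumerated version space; the threshold hypothesis class is only for illustration, and ties in the vote are broken toward the negative label.

```python
def halving_round(version_space, x, true_label):
    """Predict by majority vote of the current version space, then remove every
    hypothesis that disagrees with the revealed label.  Each mistake at least
    halves the version space, giving the log2|H| worst-case mistake bound."""
    votes_for_1 = sum(h(x) for h in version_space)
    prediction = int(votes_for_1 * 2 > len(version_space))
    mistake = prediction != true_label
    version_space = [h for h in version_space if h(x) == true_label]
    return prediction, mistake, version_space

# Toy H: threshold hypotheses "x >= t" for t in 0..4
H = [lambda x, t=t: int(x >= t) for t in range(5)]
vs = H
for x, y in [(2, 0), (4, 1), (3, 1)]:
    pred, mistake, vs = halving_round(vs, x, y)
    print(pred, mistake, len(vs))
```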

  25. 7.5 The Mistake Bound Model of Learning (3) • Optimal Mistake Bounds • M_A(c) • the maximum number of mistakes algorithm A makes in order to learn c; M_A(C) is its maximum over all c ∈ C • optimal mistake bound (Opt(C)) • the minimum over all possible learning algorithms A of M_A(C)

  26. 7.5 (4) • Relationship between the VC dimension and mistake bounds : VC(C) ≤ Opt(C) ≤ M_Halving(C) ≤ log2|C| • WEIGHTED-MAJORITY Algorithm • predicts by a weighted vote over a pool of predictors and down-weights every predictor that makes a mistake • Relative mistake bound for WEIGHTED-MAJORITY : with β = 1/2, its number of mistakes is at most 2.4(k + log2 n), where k is the number of mistakes of the best of the n predictors (a sketch follows)
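
A minimal sketch of WEIGHTED-MAJORITY with β = 1/2; the pool of threshold predictors and the data stream are illustrative.

```python
def weighted_majority(predictors, stream, beta=0.5):
    """Predict by weighted vote over the pool; multiply the weight of every
    predictor that errs by beta (0 <= beta < 1).  The total number of mistakes
    is bounded relative to the best predictor in the pool."""
    weights = [1.0] * len(predictors)
    mistakes = 0
    for x, y in stream:
        vote_1 = sum(w for w, p in zip(weights, predictors) if p(x) == 1)
        vote_0 = sum(w for w, p in zip(weights, predictors) if p(x) == 0)
        prediction = int(vote_1 >= vote_0)
        mistakes += prediction != y
        weights = [w * beta if p(x) != y else w
                   for w, p in zip(weights, predictors)]
    return mistakes, weights

# Toy pool of threshold predictors "x >= t"; the labels follow t = 3
pool = [lambda x, t=t: int(x >= t) for t in range(5)]
data = [(1, 0), (4, 1), (3, 1), (2, 0), (5, 1)]
print(weighted_majority(pool, data))  # 1 mistake; the t = 3 predictor keeps the largest weight
```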
