
7. Computational Learning Theory



  1. 7. Computational Learning Theory 신수용

  2. Outline • Introduction • Probably Approximately Correct (PAC) Learning • Sample Complexity for Finite Hypothesis Spaces • Sample Complexity for Infinite Hypothesis Spaces • Mistake Bound Model

  3. 7.1 Introduction • Questions that arise when studying machine learning • Can we tell, independently of the learning algorithm, whether a learning problem is inherently easy or hard? • Can we determine how many training examples are needed for successful learning? • What effect does allowing the learner to pose queries to the trainer have? • How many mistakes will the learner make before it finishes learning? • Can we characterize the computational complexity of a learning problem?

  4. 7.1 (2) • Our Goal • Sample complexity • the number of training examples the learner needs in order to converge to a successful hypothesis • Computational complexity • the amount of computation the learner needs in order to converge to a successful hypothesis • Mistake bound • the number of training examples the learner misclassifies before converging to a successful hypothesis

  5. 7.2 Probably Learning an Approximately Correct Hypothesis • Problem setting • X : the set of all possible instances • C : the set of target concepts • D : the probability distribution over X from which training instances are drawn • Assumption : D is stationary (it does not change over time)

  6. 7.2 PAC (2) • Error of a hypothesis • True error (error_D(h)) • the probability, with respect to target concept c and distribution D, that h will misclassify an instance drawn at random according to D • Training error • the fraction of training examples misclassified by h

  7. 7.2 PAC (3) • Sample error • the fraction of a set S of samples misclassified by h • if S = the training examples, then sample error = training error • PAC learnability • requiring error_D(h) = 0 is not a workable demand • unless every possible instance is given as a training example, many different hypotheses remain consistent with the provided training examples, and the learner cannot tell which of them is the right one
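
A minimal Python sketch of the sample-error computation described above; the function name, types, and toy data are illustrative, not from the slides.

```python
from typing import Callable, Sequence, Tuple

def sample_error(h: Callable[[float], int],
                 samples: Sequence[Tuple[float, int]]) -> float:
    """Fraction of the sample S = [(x, c(x)), ...] that hypothesis h misclassifies."""
    mistakes = sum(1 for x, label in samples if h(x) != label)
    return mistakes / len(samples)

# Example: hypothesis "x >= 5" against a few labeled instances.
S = [(2, 0), (7, 1), (4, 0), (9, 1), (5, 0)]
print(sample_error(lambda x: int(x >= 5), S))  # 0.2 (only x = 5 is misclassified)
```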

  8. 7.2 PAC (4) • Because training examples are drawn at random, a misleading (unrepresentative) set of training examples may be selected • Weakened requirements • the error need not be zero • any error below a suitably small constant ε is acceptable • the learner need not succeed on every sequence of training examples • any failure probability below a suitably small constant δ is acceptable • the learner need only probably learn a hypothesis that is approximately correct (PAC learning)

  9. 7.2 PAC (5) • PAC • PAC-learnable • C is PAC-learnable if there exist a polynomial-time algorithm and a polynomial function satisfying the definition below • C is PAC-learnable by L using H : for every c ∈ C, every distribution D over X, and every ε and δ with 0 < ε < 1/2 and 0 < δ < 1/2, learner L will with probability at least (1 − δ) output a hypothesis h ∈ H such that error_D(h) ≤ ε, in time that is polynomial in 1/ε, 1/δ, n, and size(c)

  10. 7.3 Sample Complexity for Finite Hypothesis Spaces • Consistent learner • outputs hypotheses that perfectly fit the training data • Version space (VS_{H,D}) • the set of all hypotheses in H that correctly classify the training examples D

  11. 7.3 Sample Complexity for Finite Hypothesis Spaces (2) • The number of examples a consistent learner needs equals the number of examples required before the version space contains only acceptable hypotheses • ε-exhausted version space • the version space VS_{H,D} is said to be ε-exhausted with respect to c and D if every hypothesis h in VS_{H,D} has error less than ε with respect to c and D • i.e., all hypotheses consistent with the observed training examples happen to have true error less than ε

  12. 7.3 Sample Complexity for Finite Hypothesis Spaces (3) • ε-exhausting the version space • if H is finite and D is a sequence of m ≥ 1 independently drawn examples of the target concept c, then for any 0 ≤ ε ≤ 1 the probability that the version space VS_{H,D} is not ε-exhausted is less than or equal to |H|e^(−εm) • Introducing the PAC requirements : bounding this probability by δ and solving for m gives m ≥ (1/ε)(ln|H| + ln(1/δ)), as in the sketch below
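
A small sketch of this sample-size bound; the helper name and the example numbers are illustrative.

```python
import math

def finite_H_sample_complexity(H_size: int, eps: float, delta: float) -> int:
    """Number of examples sufficient to epsilon-exhaust the version space
    with probability at least 1 - delta (finite H, consistent learner):
        m >= (1/eps) * (ln|H| + ln(1/delta))
    """
    return math.ceil((math.log(H_size) + math.log(1.0 / delta)) / eps)

# e.g. |H| = 973, eps = 0.1, delta = 0.05
print(finite_H_sample_complexity(973, 0.1, 0.05))  # 99
```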

  13. 7.3 (4) • Agnostic Learning and Inconsistent Hypotheses • Agnostic learner • a learner that makes no assumption about whether the target concept is representable by H and instead finds the hypothesis with minimum training error • h_best • the hypothesis from H having lowest training error over the training examples

  14. 7.3 (5) • The error of interest here : error_D(h_best) • constraint assumed so far : error_D(h_best) = 0 • Hoeffding bounds • bound the deviation between the true probability of some event and its observed frequency over m independent trials • Pr[error_D(h) > error_S(h) + ε] ≤ e^(−2mε²), which yields the sample complexity m ≥ (1/(2ε²))(ln|H| + ln(1/δ)), as in the sketch below
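
A matching sketch for the agnostic case; the function name is again illustrative.

```python
import math

def agnostic_sample_complexity(H_size: int, eps: float, delta: float) -> int:
    """Examples sufficient so that, with probability at least 1 - delta, the
    hypothesis with lowest training error has true error within eps of it:
        m >= (1 / (2 eps^2)) * (ln|H| + ln(1/delta))   (from the Hoeffding bound)
    """
    return math.ceil((math.log(H_size) + math.log(1.0 / delta)) / (2 * eps ** 2))

# Same |H|, eps, delta as before: note the quadratic dependence on 1/eps.
print(agnostic_sample_complexity(973, 0.1, 0.05))  # 494
```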

  15. 7.3 (6) • Conjunctions of Boolean Literals Are PAC-Learnable • |H| = 3^n, since each of the n literals may appear positive, appear negated, or be absent • using the FIND-S algorithm, boolean conjunctions are therefore PAC-learnable (a sketch of the resulting sample-size bound follows) • PAC-Learnability of Other Concept Classes • Unbiased learners
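
Instantiating the finite-|H| bound with |H| = 3^n gives m ≥ (1/ε)(n ln 3 + ln(1/δ)); a small sketch with illustrative names:

```python
import math

def conjunction_sample_complexity(n: int, eps: float, delta: float) -> int:
    """For conjunctions of up to n boolean literals, |H| = 3^n, so
        m >= (1/eps) * (n * ln 3 + ln(1/delta))
    examples suffice for PAC learning (e.g. with FIND-S)."""
    return math.ceil((n * math.log(3) + math.log(1.0 / delta)) / eps)

print(conjunction_sample_complexity(10, 0.1, 0.05))  # 140
```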

  16. 7.3 (7) • The set C of all definable target concepts corresponds to the power set of X : |C| = 2^|X| = 2^(2^n) for instances described by n boolean attributes • unbiased class : exponential sample complexity • k-term DNF and k-CNF concepts • an example of a class with polynomial sample complexity that is nevertheless not learnable in polynomial time : k-term DNF

  17. 7.4 Sample Complexity for Infinite Hypothesis Spaces • Drawbacks of expressing sample complexity in terms of |H| • the resulting bounds can be quite weak • infinite hypothesis spaces cannot be handled at all • An alternative measure of H's complexity (instead of |H|) • the Vapnik-Chervonenkis dimension of H • VC dimension, or VC(H) • provides tighter bounds • based on the number of distinct instances from X that can be completely discriminated using H

  18. 7.4 Sample Complexity for Infinite Hypothesis Spaces (2) • Shattering a Set of Instances • Shatter • a set of instances S is shattered by hypothesis space H if and only if for every dichotomy of S there exists some hypothesis in H consistent with that dichotomy • The Vapnik-Chervonenkis Dimension • VC(H) is the size of the largest finite subset of X shattered by H • if arbitrarily large finite subsets of X can be shattered by H, then VC(H) ≡ ∞

  19. 7.4 Sample Complexity for Infinite Hypothesis Spaces (3)

  20. 7.4 Sample Complexity for Infinite Hypothesis Spaces (4) • Illustrative Examples • X : the set of real numbers • H : the set of intervals on the real number line (a < x < b) • S = {3.1, 5.7} • 4 hypotheses suffice to realize all dichotomies, e.g. • (1 < x < 2), (1 < x < 4), (4 < x < 7), (1 < x < 7) • VC(H) ≥ 2 (at least) • S = {1.5, 4.2, 5.1} • not shattered (no interval can contain 1.5 and 5.1 while excluding 4.2) -> VC(H) = 2 (final) • H is infinite, yet VC(H) is finite • a brute-force shattering check is sketched below
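
A brute-force sketch of the shattering check for this interval example, assuming hypotheses of the form a < x < b; all names are illustrative.

```python
from itertools import product

def interval_consistent(points, labels):
    """Is some interval (a < x < b) consistent with this dichotomy?
    True iff no negative point lies between two positive points."""
    pos = [x for x, y in zip(points, labels) if y == 1]
    neg = [x for x, y in zip(points, labels) if y == 0]
    if not pos:
        return True                        # an empty interval labels everything negative
    lo, hi = min(pos), max(pos)
    return all(x < lo or x > hi for x in neg)

def shattered_by_intervals(points):
    """S is shattered iff every dichotomy of S is realized by some interval."""
    return all(interval_consistent(points, labels)
               for labels in product([0, 1], repeat=len(points)))

print(shattered_by_intervals([3.1, 5.7]))       # True  -> VC(H) >= 2
print(shattered_by_intervals([1.5, 4.2, 5.1]))  # False -> 3 points cannot be shattered
```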

  21. 7.4 (5) • X : points in the x, y plane • H : all linear decision surfaces in the plane • VC(H) = 3 • in general, the VC dimension of linear decision surfaces in an r-dimensional space is r + 1 • ex) figures (a) and (b) omitted

  22. 7.4 (6) • Sample Complexity and the VC Dimension • upper bound on the number of training examples (sufficient for successful learning) : m ≥ (1/ε)(4 log2(2/δ) + 8 VC(H) log2(13/ε)) • lower bound on sample complexity (necessary for successful learning) : m ≥ max[(1/ε) log(1/δ), (VC(C) − 1)/(32ε)] • the upper bound is sketched below
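
A sketch of the upper bound; the helper name and example values are illustrative.

```python
import math

def vc_sample_complexity(vc_dim: int, eps: float, delta: float) -> int:
    """Upper bound on the number of examples sufficient for PAC learning:
        m >= (1/eps) * (4 * log2(2/delta) + 8 * VC(H) * log2(13/eps))
    """
    return math.ceil((4 * math.log2(2 / delta)
                      + 8 * vc_dim * math.log2(13 / eps)) / eps)

# Linear decision surfaces in the plane: VC(H) = 3
print(vc_sample_complexity(3, 0.1, 0.05))  # 1899
```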

  23. 7.5 The Mistake Bound Model of Learning • Mistake bound • the total number of mistakes the learner makes before it converges to the correct hypothesis • Mistake Bound for the FIND-S Algorithm • FIND-S • initialize h to the most specific hypothesis • for each positive training instance x • remove from h any literal that is not satisfied by x • output hypothesis h (a sketch follows)
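
A minimal sketch of FIND-S for boolean conjunctions, representing each literal as an (attribute index, value) pair; the representation and names are my own choices, not from the slides.

```python
def find_s(positive_examples):
    """FIND-S for conjunctions of boolean literals.

    Start with the most specific hypothesis (every literal and its negation,
    satisfied by no instance); for each positive example, drop every literal
    the example does not satisfy."""
    n = len(positive_examples[0])
    h = {(i, v) for i in range(n) for v in (0, 1)}   # literal (i, v): "attribute i = v"
    for x in positive_examples:
        h = {(i, v) for (i, v) in h if x[i] == v}
    return h

# Positive examples of the target "x1 AND NOT x3", as bit tuples (x1, x2, x3)
print(sorted(find_s([(1, 0, 0), (1, 1, 0)])))  # [(0, 1), (2, 0)], i.e. x1 = 1 AND x3 = 0
```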

  24. 7.5 The Mistake Bound Model of Learning (2) • worst-case mistake bound for FIND-S : n + 1 • ex) • Mistake Bound for the HALVING Algorithm • Halving algorithm • maintains the version space and predicts by majority vote over it • worst-case mistake bound : log2|H|, since each mistake at least halves the version space (a sketch follows)
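
A toy sketch of one HALVING round over an explicitly enumerated version space; the threshold hypothesis class is only for illustration, and ties in the vote are broken toward the negative label.

```python
def halving_round(version_space, x, true_label):
    """Predict by majority vote of the current version space, then remove every
    hypothesis that disagrees with the revealed label.  Each mistake at least
    halves the version space, giving the log2|H| worst-case mistake bound."""
    votes_for_1 = sum(h(x) for h in version_space)
    prediction = int(votes_for_1 * 2 > len(version_space))
    mistake = prediction != true_label
    version_space = [h for h in version_space if h(x) == true_label]
    return prediction, mistake, version_space

# Toy H: threshold hypotheses "x >= t" for t in 0..4
H = [lambda x, t=t: int(x >= t) for t in range(5)]
vs = H
for x, y in [(2, 0), (4, 1), (3, 1)]:
    pred, mistake, vs = halving_round(vs, x, y)
    print(pred, mistake, len(vs))
```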

  25. 7.5 The Mistake Bound Model of Learning (3) • Optimal Mistake Bounds • M_A(c) • the maximum number of mistakes algorithm A makes in order to learn c; M_A(C) is its maximum over all c ∈ C • optimal mistake bound (Opt(C)) • the minimum over all possible learning algorithms A of M_A(C)

  26. 7.5 (4) • Relationship between the VC dimension and mistake bounds : VC(C) ≤ Opt(C) ≤ M_Halving(C) ≤ log2|C| • WEIGHTED-MAJORITY Algorithm • predicts by a weighted vote over a pool of predictors and down-weights every predictor that makes a mistake • Relative mistake bound for WEIGHTED-MAJORITY : with β = 1/2, its number of mistakes is at most 2.4(k + log2 n), where k is the number of mistakes of the best of the n predictors (a sketch follows)
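
A minimal sketch of WEIGHTED-MAJORITY with β = 1/2; the pool of threshold predictors and the data stream are illustrative.

```python
def weighted_majority(predictors, stream, beta=0.5):
    """Predict by weighted vote over the pool; multiply the weight of every
    predictor that errs by beta (0 <= beta < 1).  The total number of mistakes
    is bounded relative to the best predictor in the pool."""
    weights = [1.0] * len(predictors)
    mistakes = 0
    for x, y in stream:
        vote_1 = sum(w for w, p in zip(weights, predictors) if p(x) == 1)
        vote_0 = sum(w for w, p in zip(weights, predictors) if p(x) == 0)
        prediction = int(vote_1 >= vote_0)
        mistakes += prediction != y
        weights = [w * beta if p(x) != y else w
                   for w, p in zip(weights, predictors)]
    return mistakes, weights

# Toy pool of threshold predictors "x >= t"; the labels follow t = 3
pool = [lambda x, t=t: int(x >= t) for t in range(5)]
data = [(1, 0), (4, 1), (3, 1), (2, 0), (5, 1)]
print(weighted_majority(pool, data))  # 1 mistake; the t = 3 predictor keeps the largest weight
```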
