
Presentation Transcript


1. Announcements • Exam grading • Projects • Next: Generative Models • Chapter 6: Bayesian Learning

2. Shattering • We say that a set S of examples is shattered by a set of functions H if • for every partition of the examples in S into positive and negative examples • there is a function in H that gives exactly these labels to the examples • (Intuition: a rich set of functions shatters large sets of points)
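The definition can be checked mechanically on small cases. Below is a minimal Python sketch (not from the slides): it enumerates every labeling of a point set and asks whether some hypothesis realizes it. The `shatters` helper and the finite threshold grid are illustrative choices; a finite grid suffices here because only the points' positions relative to the threshold matter.

```python
from itertools import product

def shatters(points, hypotheses):
    """Does some hypothesis realize every +/- labeling of `points`?
    This is exactly the definition of shattering on the slide."""
    for labeling in product([True, False], repeat=len(points)):
        if not any(all(h(x) == y for x, y in zip(points, labeling))
                   for h in hypotheses):
            return False  # this labeling is realized by no hypothesis
    return True

# Left-bounded intervals [0, a): h_a(x) = (0 <= x < a).
points = [1.0, 2.0]
H = [lambda x, a=a: 0 <= x < a for a in [0.5, 1.5, 2.5]]
print(shatters(points, H))  # False: the labeling (-, +) would need
                            # 1.0 excluded but 2.0 included
```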

3. Shattering • We say that a set S of examples is shattered by a set of functions H if • for every partition of the examples in S into positive and negative examples • there is a function in H that gives exactly these labels to the examples • (Intuition: a rich set of functions shatters large sets of points) • Left-bounded intervals on the real axis: [0, a), for some real number a > 0 • (figure: points on the line labeled + inside [0, a) and − beyond a)

4. Shattering • We say that a set S of examples is shattered by a set of functions H if • for every partition of the examples in S into positive and negative examples • there is a function in H that gives exactly these labels to the examples • (Intuition: a rich set of functions shatters large sets of points) • Left-bounded intervals on the real axis: [0, a), for some real number a > 0 • Sets of two points cannot be shattered • (we mean: given two points, they can be labeled in such a way that no concept in this class is consistent with that labeling) • (figure: for two points x1 < x2, the labeling −, + is impossible, since any [0, a) containing x2 also contains x1)

5. Shattering • We say that a set S of examples is shattered by a set of functions H if • for every partition of the examples in S into positive and negative examples • there is a function in H that gives exactly these labels to the examples • Intervals on the real axis: [a, b], for some real numbers b > a • (this is the set of functions, i.e., the concept class, considered here) • (figure: points labeled − before a, + inside [a, b], and − after b)

6. Shattering • We say that a set S of examples is shattered by a set of functions H if • for every partition of the examples in S into positive and negative examples • there is a function in H that gives exactly these labels to the examples • Intervals on the real axis: [a, b], for some real numbers b > a • All sets of one or two points can be shattered, • but sets of three points cannot be shattered • (figure: for three points the labeling +, −, + is impossible, since any interval containing the two outer points also contains the middle one)
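To see the three-point failure concretely, here is a short, self-contained sketch (illustrative, not from the slides) that enumerates all eight labelings of three points against intervals [a, b] with endpoints from a small grid; only the +, −, + pattern is unrealizable:

```python
from itertools import product

# Closed intervals [a, b]: h(x) = (a <= x <= b).
points = [1.0, 2.0, 3.0]
grid = [0.5, 1.5, 2.5, 3.5, 4.5]   # includes an interval missing all points
intervals = [(a, b) for a in grid for b in grid if a < b]

for labeling in product([True, False], repeat=3):
    ok = any(all((a <= x <= b) == y for x, y in zip(points, labeling))
             for (a, b) in intervals)
    print(labeling, "realizable" if ok else "NOT realizable")
# Only (True, False, True) fails: an interval containing 1.0 and 3.0
# must also contain 2.0.
```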

7. Shattering • We say that a set S of examples is shattered by a set of functions H if • for every partition of the examples in S into positive and negative examples • there is a function in H that gives exactly these labels to the examples • Half-spaces in the plane: • (figure: a line dividing the plane, with + points on one side and − points on the other)

8. Shattering • We say that a set S of examples is shattered by a set of functions H if • for every partition of the examples in S into positive and negative examples • there is a function in H that gives exactly these labels to the examples • Half-spaces in the plane: • sets of one, two, or three points can be shattered, • but there is no set of four points that can be shattered • (figure: four points with opposite corners labeled + + and − −; no line separates the two classes)
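Linear separability of a given labeling can be tested exactly with a small feasibility linear program: find (w, b) with y_i (w·x_i + b) ≥ 1 for every point (the margin of 1 is without loss of generality, by rescaling). The sketch below is an illustrative helper, assuming SciPy is available; it runs this test over all 16 labelings of the four corners of the unit square and reports exactly the two diagonal (XOR) labelings as unrealizable:

```python
import numpy as np
from itertools import product
from scipy.optimize import linprog

def separable(X, y):
    """Is some half-space w.x + b >= 0 consistent with labels y in {-1,+1}?
    Feasibility LP: find (w, b) with y_i * (w.x_i + b) >= 1 for all i."""
    A = -(y[:, None] * np.c_[X, np.ones(len(X))])   # rows: -y_i * (x_i, 1)
    res = linprog(c=np.zeros(X.shape[1] + 1), A_ub=A,
                  b_ub=-np.ones(len(X)), bounds=(None, None))
    return res.success

X = np.array([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])
for labels in product([-1.0, 1.0], repeat=4):
    y = np.array(labels)
    if not separable(X, y):
        print("not realizable:", labels)   # only the two XOR labelings
```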

9. VC Dimension • An unbiased hypothesis space H shatters the entire instance space X, i.e., • it is able to induce every possible partition of the set of all possible instances. • The larger the subset of X that can be shattered, the more expressive the hypothesis space is, i.e., the less biased.

10. VC Dimension • We say that a set S of examples is shattered by a set of functions H if • for every partition of the examples in S into positive and negative examples • there is a function in H that gives exactly these labels to the examples • The VC dimension of hypothesis space H over instance space X • is the size of the largest finite subset of X that is shattered by H. • If there exists a subset of size d that can be shattered, then VC(H) ≥ d • If no subset of size d can be shattered, then VC(H) < d • VC(Left-bounded intervals) = 1 (no subset of size 2 can be shattered) • VC(Intervals) = 2 (no subset of size 3 can be shattered) • VC(Half-spaces in the plane) = 3 (no subset of size 4 can be shattered)
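These values can be sanity-checked numerically: the largest shattered subset of any finite sample gives a lower bound on VC(H). A small brute-force sketch (illustrative names; exponential in the sample size, so for toy cases only):

```python
from itertools import combinations, product

def shatters(points, hypotheses):
    return all(any(all(h(x) == y for x, y in zip(points, lab))
                   for h in hypotheses)
               for lab in product([True, False], repeat=len(points)))

def vc_lower_bound(sample, hypotheses):
    """Size of the largest shattered subset of `sample`: a lower
    bound on VC(H), which is a sup over all finite subsets of X."""
    best = 0
    for k in range(1, len(sample) + 1):
        if any(shatters(S, hypotheses) for S in combinations(sample, k)):
            best = k
    return best

# Intervals [a, b] with endpoints on a half-integer grid:
grid = [x / 2 for x in range(10)]
H = [lambda x, a=a, b=b: a <= x <= b
     for a in grid for b in grid if a <= b]
print(vc_lower_bound([1.0, 2.0, 3.0, 4.0], H))  # 2, matching VC(Intervals) = 2
```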

11. Sample Complexity with VC Dimension • Using VC(H) as a measure of expressiveness, we have the following for infinite hypothesis spaces. • Given a sample D of m examples, • if we can find some h ∈ H that is consistent with all m examples, • with m ≥ (1/ε)(4 log₂(2/δ) + 8 VC(H) log₂(13/ε)), • then with probability at least (1 − δ), h has error less than ε. • (Again, when m is polynomial we have a PAC learning algorithm; to be efficient, we also need to produce the hypothesis h efficiently.) • Note: to shatter m examples requires |H| ≥ 2^m, so log₂(|H|) ≥ VC(H)
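Plugging numbers in makes the scale of the bound concrete. A one-line evaluation (assuming the Blumer et al. form of the bound shown above; the function name is illustrative):

```python
from math import ceil, log2

def vc_sample_bound(eps, delta, vc):
    """m >= (1/eps) * (4*log2(2/delta) + 8*vc*log2(13/eps))."""
    return ceil((4 * log2(2 / delta) + 8 * vc * log2(13 / eps)) / eps)

# Intervals on the line (VC = 2), 10% error, 95% confidence:
print(vc_sample_bound(0.10, 0.05, 2))  # 1337 examples suffice
```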

12. Homework • H = axis-parallel rectangles in R² • Four real numbers define a rectangle • |H| is infinite • (figure: five sample rectangles from H) • What is the VC dimension of H? • Can we PAC learn? • Can we efficiently PAC learn?
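Without giving away the VC-dimension answer, the last question deserves one hint of a sketch: a natural consistent learner for this class returns the tightest axis-parallel rectangle enclosing the positive examples, and it runs in time linear in the sample size. This is illustrative, not the course's official solution, and it assumes noise-free data whose target really is a rectangle:

```python
def tightest_rectangle(examples):
    """Smallest axis-parallel rectangle containing all positives.
    examples: list of ((x, y), label) pairs with boolean labels."""
    pos = [p for p, label in examples if label]
    if not pos:
        return None                      # predict negative everywhere
    xs, ys = zip(*pos)
    return (min(xs), max(xs), min(ys), max(ys))

def predict(rect, point):
    if rect is None:
        return False
    x0, x1, y0, y1 = rect
    return x0 <= point[0] <= x1 and y0 <= point[1] <= y1

data = [((1, 1), True), ((2, 3), True), ((5, 5), False)]
rect = tightest_rectangle(data)          # (1, 2, 1, 3)
print(predict(rect, (1.5, 2.0)))         # True
```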

13. VC Dimension & Learning • Infinite |H| does not mean unbounded expressivity • A large enough training set exhausts the representational capacity of H • VC(H) is a worst-case capacity measure • The actual distribution and labelings over X may not be unfavorable

14. VC(H) Growth Function • (figure: number of labelings, on a log scale from 1 to 1,000,000, vs. |S| from 1 to 20 — the curve of all 2^|S| labelings climbs steadily, while the number of labelings realizable by H tracks it only up to roughly |S| = VC(H) and then grows far more slowly)
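The quantitative version of this picture is Sauer's lemma (not stated on the slides, but it is the standard fact behind such plots): a class of VC dimension d realizes at most the sum of C(m, i) for i ≤ d labelings of m points, which equals 2^m for m ≤ d and is only polynomial (roughly m^d) afterwards:

```python
from math import comb

def growth_bound(m, d):
    """Sauer's lemma bound on the number of labelings of m points
    realizable by a class of VC dimension d."""
    return sum(comb(m, i) for i in range(d + 1))

d = 3  # e.g. half-spaces in the plane
for m in (1, 3, 5, 10, 20):
    print(m, 2 ** m, growth_bound(m, d))
# 2^m and the bound agree up to m = d, then the bound falls behind:
# at m = 20 there are 1,048,576 labelings but the bound is only 1,351.
```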

15. Suppose… • All h ∈ H have very low accuracy, say < 0.1% correct • VC(H) is 100 • Training set S contains 80 labeled examples • What is the probability that an arbitrary h gets the first training example right? • What is the probability that an arbitrary h gets all 80 training examples right? • What is the best some h ∈ H can possibly do on all 80 elements of S?
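Working the slide's numbers through (assuming examples are drawn independently and each h is correct on any single example with probability 0.001):

```python
p = 0.001            # accuracy of any single h, per the slide
print(p)             # chance an arbitrary h gets the first example right
print(p ** 80)       # chance it gets all 80 right: ~1e-240, essentially zero
# Yet since VC(H) = 100, there exist sets of up to 100 points that H
# shatters; if S lies inside such a set, some h in H labels all 80
# examples of S perfectly. Fitting the training set, by itself, proves
# little about true accuracy.
```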

16. Some Interesting Concept Classes — what is the VC dimension? • signum(sin(ax)) on the real line R, one concept per real a • Convex polygons in the plane R×R • d-input linear threshold units in R^d
