Efficient classification for metric data

  1. Efficient classification for metric data • Lee-Ad Gottlieb (Weizmann Institute) • Aryeh Kontorovich (Ben-Gurion U.) • Robert Krauthgamer (Weizmann Institute)

  2. Classification problem • Probabilistic concept learning • S is a set of n examples (x,y) drawn from X × {-1,1} according to some unknown probability distribution P • The learner produces a hypothesis h: X → {-1,1} • A good hypothesis (classifier) minimizes the generalization error P{(x,y): h(x) ≠ y} • A popular solution uses kernels • Data are represented as vectors, and kernels take the dot product of vectors
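
Since P is unknown, the generalization error is typically estimated by the error rate on a held-out labeled sample. A minimal Python sketch (my own illustration, not from the slides):

# Minimal sketch (illustrative): estimate P{(x,y): h(x) != y} by the empirical
# error of the hypothesis h on a held-out labeled sample.
def empirical_error(h, sample):
    """Fraction of examples (x, y) on which h disagrees with the label y."""
    return sum(1 for x, y in sample if h(x) != y) / len(sample)

# Toy hypothesis on real-valued inputs with labels in {-1, +1}.
h = lambda x: 1 if x > 0 else -1
print(empirical_error(h, [(0.5, 1), (-2.0, -1), (1.3, -1)]))  # -> 0.333...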

  3. Finite metric space • (X,d) is a metric space if • X = set of points • d = distance function: nonnegative, symmetric, satisfies the triangle inequality • Classification for metric data? • Problem: no vector representation → no notion of dot product → can't use kernels • What can be done in this setting? [Figure: map with pairwise distances of 95km, 62km, and 151km between Tel-Aviv, Jerusalem, and Haifa]
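
To make the setting concrete, here is a small Python sketch (my own illustration; the pairing of cities with distances is a placeholder, not taken from the slide) of a finite metric space given only by a distance table, with no vector representation of the points:

# Minimal sketch (illustrative): a finite metric specified purely by pairwise
# distances -- no coordinates, hence no dot products and no kernels.
from itertools import permutations

points = ["Tel-Aviv", "Jerusalem", "Haifa"]
dist = {("Tel-Aviv", "Jerusalem"): 62,   # km; placeholder values for illustration
        ("Tel-Aviv", "Haifa"): 95,
        ("Jerusalem", "Haifa"): 151}

def d(x, y):
    """Symmetric lookup; d(x, x) = 0."""
    if x == y:
        return 0
    return dist.get((x, y), dist.get((y, x)))

# Verify the triangle inequality d(x, z) <= d(x, y) + d(y, z) for all triples.
assert all(d(x, z) <= d(x, y) + d(y, z) for x, y, z in permutations(points, 3))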

  4. Preliminary definition • The Lipschitz constant L of a function f: X → R is the smallest value satisfying, for all points xi, xj in X, L ≥ |f(xi) − f(xj)| / d(xi, xj) • Consider a hypothesis consistent with all of S • Its Lipschitz constant is determined by the closest pair of differently labeled points: L ≥ 2 / d(xi, xj) for all xi in S−, xj in S+
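
A small Python sketch of that lower bound (an illustration; the function name is mine): a hypothesis consistent with S takes values −1 and +1 on a closest mixed pair, so its Lipschitz constant is at least 2 divided by their distance.

# Minimal sketch (illustrative): the smallest Lipschitz constant achievable by a
# hypothesis consistent with the sample is 2 / d(S-, S+).
def consistent_lipschitz_constant(S_minus, S_plus, d):
    """S_minus, S_plus: points labeled -1 and +1; d: the metric."""
    closest = min(d(x, y) for x in S_minus for y in S_plus)
    return 2.0 / closest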

  5. Classification for metric data • A powerful framework for this problem was introduced by von Luxburg & Bousquet (vLB, JMLR '04) • The natural hypotheses (classifiers) to consider are maximally smooth Lipschitz functions • Given the classifier h, evaluating h at new points of X reduces to finding a Lipschitz function consistent with h • This is the Lipschitz extension problem, a classic problem in Analysis • For example: f(x) = min_i [yi + 2d(x, xi)/d(S+,S−)] over all (xi, yi) in S • Function evaluation reduces to exact Nearest Neighbor Search (assuming zero training error) • Strong theoretical motivation for the NNS classification heuristic
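
A minimal Python sketch of this Lipschitz extension classifier (my own illustration of the formula on the slide; the function names are mine):

# f(x) = min_i [ y_i + 2 d(x, x_i) / d(S+, S-) ],  predicted label = sign(f(x)).
# With zero training error, sign(f(x)) equals the label of x's nearest neighbor in S,
# so evaluating the classifier reduces to (exact) nearest neighbor search.
def lipschitz_extension(S, d):
    """S: list of (point, label) pairs with labels in {-1, +1}; d: the metric."""
    gap = min(d(x, y) for x, lx in S for y, ly in S if lx == -1 and ly == +1)
    def f(x):
        return min(y_i + 2.0 * d(x, x_i) / gap for x_i, y_i in S)
    return f

# Predicted label of a new point q:  +1 if f(q) >= 0 else -1.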

  6. Two new directions • The framework of vLB leaves open two further questions: • Efficient evaluation of the classifier h on X: in an arbitrary metric space, exact NNS requires Θ(n) time; can we do better? • Bias–variance tradeoff: which sample points in S should h ignore? [Figure: a query point q at distance ~1 from both a −1-labeled and a +1-labeled sample point]

  7. Doubling Dimension • Definition: Ball B(x,r) = all points within distance r from x • The doubling constant (of a metric M) is the minimum value λ > 0 such that every ball can be covered by λ balls of half the radius • First used by [Ass-83], algorithmically by [Cla-97] • The doubling dimension is dim(M) = log λ(M) [GKL-03] • A metric is doubling if its doubling dimension is constant • Packing property of doubling spaces: a set with diameter D and minimum inter-point distance a contains at most (D/a)^O(log λ) points • [Figure: a ball covered by balls of half its radius; here λ ≤ 7]
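
A small Python sketch (my own illustration, not from the paper) that computes a greedy upper estimate of the doubling constant of a finite metric, directly from the definition above:

# Minimal sketch (illustrative): for every ball B(x, r), greedily cover its points
# with balls of radius r/2 centered at points of the ball; the worst case over all
# x and r is an upper estimate of the doubling constant lambda.
def doubling_constant_estimate(points, d):
    lam = 1
    radii = sorted({d(x, y) for x in points for y in points if x != y})
    for x in points:
        for r in radii:
            ball = [p for p in points if d(x, p) <= r]
            uncovered, centers = set(ball), 0
            while uncovered:
                c = uncovered.pop()      # greedy center; its r/2-ball covers part of B(x, r)
                centers += 1
                uncovered = {p for p in uncovered if d(c, p) > r / 2}
            lam = max(lam, centers)
    return lam  # greedy covering, so an upper bound on the true doubling constant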

  8. Application I • We provide generalization bounds for Lipschitz functions on spaces with low doubling dimension • vLB provided similar bounds using covering numbers and Rademacher averages • Fat-shattering analysis: • If an L-Lipschitz function shatters a set → its inter-point distance is at least 2/L • Packing property → the set has at most (DL)^O(log λ) points • So the fat-shattering dimension is low

  9. Application I • Theorem: • For any f that classifies a sample of size n correctly, we have with probability at least 1−δ: P{(x, y) : sgn(f(x)) ≠ y} ≤ 2/n (d log(34en/d) log(578n) + log(4/δ)) • Likewise, if f is correct on all but k examples, we have with probability at least 1−δ: P{(x, y) : sgn(f(x)) ≠ y} ≤ k/n + [2/n (d ln(34en/d) log₂(578n) + ln(4/δ))]^(1/2) • In both cases, d ≤ ⌈8LD⌉^(log λ + 1)
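
A small Python sketch evaluating the second bound numerically (my own illustration; it plugs numbers into the formula exactly as stated above, with d, k, and delta supplied by the caller):

# Minimal sketch (illustrative): the agnostic-case generalization bound
#   k/n + sqrt( 2/n ( d ln(34 e n / d) log2(578 n) + ln(4/delta) ) ).
from math import e, log, log2, sqrt

def risk_bound(n, d, k, delta):
    """Upper bound on P{sgn(f(x)) != y} for an f wrong on k of the n examples."""
    complexity = 2.0 / n * (d * log(34 * e * n / d) * log2(578 * n) + log(4 / delta))
    return k / n + sqrt(complexity)

# Example: risk_bound(n=1000, d=10, k=5, delta=0.05)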

  10. Application II • Evaluation of h for new points in X • Lipschitz extension function f(x) = min_i [yi + 2d(x, xi)/d(S+,S−)] • Requires exact nearest neighbor search, which can be expensive! • New tool: (1+ε)-approximate nearest neighbor search, in λ^O(1) log n + ε^−O(log λ) time [KL-04, HM-05, BKL-06, CG-06] • If we evaluate f(x) using an approximate NNS, we can show that the result agrees with (the sign of) at least one of g(x) = (1+ε) f(x) + ε and h(x) = (1+ε) f(x) − ε • Note that g(x) ≥ f(x) ≥ h(x) • g(x) and h(x) have Lipschitz constant (1+ε)L, so they, and hence the approximately evaluated function, generalize well

  11. Bias–variance tradeoff • Which sample points in S should h ignore? • If f is correct on all but k examples, we have with probability at least 1−δ: P{(x, y) : sgn(f(x)) ≠ y} ≤ k/n + [2/n (d ln(34en/d) log₂(578n) + ln(4/δ))]^(1/2), where d ≤ ⌈8LD⌉^(log λ + 1)

  12. Bias–variance tradeoff • Algorithm • Fix a target Lipschitz constant L (out of O(n²) possibilities) • Locate all pairs of points from S+ and S− whose distance is less than 2/L • From each such pair, at least one point has to be taken as an error • Goal: remove as few points as possible

  13. Bias–variance tradeoff • Algorithm • Fix a target Lipschitz constant L (out of O(n²) possibilities) • Locate all pairs of points from S+ and S− whose distance is less than 2/L • From each such pair, at least one point has to be taken as an error • Goal: remove as few points as possible • This is Minimum Vertex Cover • NP-complete in general • Admits a 2-approximation in O(E) time

  14. Bias–variance tradeoff • Algorithm • Fix a target Lipschitz constant L (out of O(n²) possibilities) • Locate all pairs of points from S+ and S− whose distance is less than 2/L • From each such pair, at least one point has to be taken as an error • Goal: remove as few points as possible • This is Minimum Vertex Cover • NP-complete in general • Admits a 2-approximation in O(E) time • Minimum vertex cover on a bipartite graph • Equivalent to maximum matching (König's theorem) • Admits an exact solution in O(n^2.376) randomized time
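
A short, self-contained Python sketch of this step (my own illustration; the function names are mine). It computes, for a fixed L, the minimum number of sample points that must be discarded, using a simple augmenting-path maximum matching rather than the fast algebraic algorithm cited on the slide:

# Minimal sketch (illustrative): violating pairs (one point from S-, one from S+,
# at distance < 2/L) form a bipartite graph; by Koenig's theorem the minimum vertex
# cover size equals the maximum matching size, found here with Kuhn's algorithm.
def min_errors(S_minus, S_plus, d, L):
    adj = {i: [j for j, q in enumerate(S_plus) if d(p, q) < 2.0 / L]
           for i, p in enumerate(S_minus)}
    match = {}                      # matched S_plus index -> S_minus index

    def augment(i, seen):
        for j in adj[i]:
            if j in seen:
                continue
            seen.add(j)
            if j not in match or augment(match[j], seen):
                match[j] = i
                return True
        return False

    return sum(augment(i, set()) for i in adj)  # matching size = min vertex cover size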

  15. Bias–variance tradeoff • Algorithm: • For each of the O(n²) values of L: run the matching algorithm to find the minimum error, and evaluate the generalization bound for this value of L • Total: O(n^4.376) randomized time • Better algorithm: • Binary search over the O(n²) values of L • For each tested value, either: • run the matching algorithm: find the minimum error in O(n^2.376 log n) randomized time, and evaluate the generalization bound for this value of L • or run the greedy 2-approximation: approximate the minimum error in O(n² log n) time, and evaluate the approximate generalization bound for this value of L
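
A minimal Python sketch of the simple (non-binary-search) version of this procedure, reusing the min_errors and risk_bound sketches above (all names are mine; dim_bound is a caller-supplied estimate of the fat-shattering dimension for a given L, e.g., the ⌈8LD⌉^(log λ + 1) bound):

# Minimal sketch (illustrative): try each candidate Lipschitz constant induced by a
# mixed pair, compute the minimum number of errors k for that constant, evaluate the
# generalization bound, and keep the best tradeoff.
def best_tradeoff(S_minus, S_plus, d, delta, dim_bound):
    n = len(S_minus) + len(S_plus)
    candidates = sorted({2.0 / d(p, q) for p in S_minus for q in S_plus})
    best = None
    for L in candidates:
        k = min_errors(S_minus, S_plus, d, L)
        bound = risk_bound(n, dim_bound(L), k, delta)
        if best is None or bound < best[0]:
            best = (bound, L, k)
    return best  # (bound value, chosen Lipschitz constant, number of discarded points)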

  16. Conclusion • Results: • Generalization bounds for Lipschitz classifiers in doubling spaces • Efficient evaluation of the Lipschitz extension hypothesis using approximate NNS • Efficient calculation of the bias–variance tradeoff • Continuing research: • Similar results for continuous labels
