Learning with Similarity Functions
Maria-Florina Balcan & Avrim Blum, CMU, CSD

Kernels and Similarity Functions
Kernels have become a powerful tool in ML, useful in practice for dealing with many different kinds of data.

Our goal: analyze more general similarity functions.
[Figure: data mapped to φ(x); large-margin separator w]
Kernels

If margin γ in φ-space, only need 1/γ² examples to learn well.
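As a worked illustration of the 1/γ² scaling above (a minimal sketch in our own code, ignoring constants and log factors; the function name is ours, not the talk's):

```python
# Illustrative sketch (ours, not from the talk): the number of examples
# needed to learn a margin-gamma separator scales as 1/gamma^2,
# ignoring constants and logarithmic factors.
def examples_needed(gamma: float) -> int:
    """Rough sample-size scaling for learning at margin gamma."""
    return round(1.0 / gamma ** 2)

print(examples_needed(0.5))   # wide margin: few examples
print(examples_needed(0.01))  # tiny margin: many examples
```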
Goal: a definition of a good similarity function for a learning problem that:

1) Talks in terms of natural, direct properties (no implicit high-dimensional spaces, no requirement that K be positive semi-definite).
2) If K satisfies these properties for our given problem, then it has implications for learning.
3) Is broad: includes the usual notion of a "good kernel" (one that induces a large-margin separator in φ-space).
[Figure: example with regions A, B, C of positive and negative points]
A First Attempt: Definition satisfying properties (1) and (2)

Let P be a distribution over labeled examples (x, l(x)). Say K is (ε,γ)-good for P if at least a 1−ε probability mass of x satisfy:

E_{y~P}[K(x,y) | l(y)=l(x)] ≥ E_{y~P}[K(x,y) | l(y)≠l(x)] + γ
Note: this might not be a legal kernel.
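The definition above can be checked empirically on a finite sample standing in for P. A minimal sketch (our own code; the function names, toy similarity K, and data are made up for illustration):

```python
import numpy as np

def goodness_margins(K, X, labels):
    """For each x_i, estimate the gap
       E_y[K(x,y) | l(y)=l(x)] - E_y[K(x,y) | l(y)!=l(x)]
    over the sample X standing in for the distribution P."""
    margins = []
    for i, x in enumerate(X):
        same = [K(x, X[j]) for j in range(len(X))
                if j != i and labels[j] == labels[i]]
        diff = [K(x, X[j]) for j in range(len(X))
                if labels[j] != labels[i]]
        margins.append(np.mean(same) - np.mean(diff))
    return np.array(margins)

def is_eps_gamma_good(K, X, labels, eps, gamma):
    """True if at least a 1-eps fraction of points have gap >= gamma."""
    return np.mean(goodness_margins(K, X, labels) >= gamma) >= 1 - eps

# Toy 1-D data; K(a, b) = a*b clipped into [-1, 1] (an assumption, not
# a similarity function from the talk).
K = lambda a, b: max(-1.0, min(1.0, a * b))
X, labels = [1.0, 1.1, -1.0, -1.1], [1, 1, -1, -1]
print(is_eps_gamma_good(K, X, labels, eps=0.0, gamma=0.5))  # True
```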
E_{y~P}[K(x,y) | l(y)=l(x)] ≥ E_{y~P}[K(x,y) | l(y)≠l(x)] + γ
Algorithm: draw d positive examples y1, …, yd and d negative examples z1, …, zd; classify x by whether its average similarity to the yi's is larger than its average similarity to the zi's.
Guarantee: with probability ≥ 1−δ, error ≤ ε + δ.
Proof
[Figure: positives in one region are more similar to the negatives than to a typical positive]
A First Attempt: Not Broad Enough

E_{y~P}[K(x,y) | l(y)=l(x)] ≥ E_{y~P}[K(x,y) | l(y)≠l(x)] + γ
E_{y~P}[K(x,y) | l(y)=l(x)] ≥ E_{y~P}[K(x,y) | l(y)≠l(x)] + γ

[Figure: the same distribution of positives and negatives with a large region R highlighted]
Idea: this would work if we didn't pick the y's from the top-left.

Broaden to say: OK if there exists a large region R such that most x are on average more similar to points y ∈ R of the same label than to points y ∈ R of the other label.
E_{y~P}[w(y)K(x,y) | l(y)=l(x)] ≥ E_{y~P}[w(y)K(x,y) | l(y)≠l(x)] + γ
E_{y~P}[w(y)K(x,y) | l(y)=l(x)] ≥ E_{y~P}[w(y)K(x,y) | l(y)≠l(x)] + γ
Algorithm
F(x) = [K(x,y1), …, K(x,yd), K(x,z1), …, K(x,zd)].
Point is: with probability ≥ 1−δ, there exists a linear separator of error ≤ ε + δ at margin γ/4.
(w = [w(y1), …, w(yd), w(z1), …, w(zd)])
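The mapping F above can be sketched as follows (our own code; the toy similarity K and landmarks are made up). Each x lands in R^{2d} as its vector of similarities to the d positive landmarks yi and d negative landmarks zi, and the guaranteed separator's weights are built from w:

```python
import numpy as np

def empirical_feature_map(K, pos_landmarks, neg_landmarks):
    """Map x into R^{2d}: its similarities to d positive landmarks y_i
    followed by its similarities to d negative landmarks z_i."""
    landmarks = list(pos_landmarks) + list(neg_landmarks)
    def F(x):
        return np.array([K(x, p) for p in landmarks])
    return F

# Toy example (our assumption): K(a, b) = a*b clipped into [-1, 1].
K = lambda a, b: max(-1.0, min(1.0, a * b))
F = empirical_feature_map(K, pos_landmarks=[1.0, 1.2],
                          neg_landmarks=[-0.9, -1.1])
print(F(0.5))  # similarities to [y1, y2, z1, z2]
```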
Algorithm
F(x) = [K(x,y1), …, K(x,yd), K(x,z1), …, K(x,zd)].
Guarantee: with probability ≥ 1−δ, there exists a linear separator of error ≤ ε + δ at margin γ/4.
(the mapping F yields a legal kernel)
Implications

K: an arbitrary similarity function.

(ε,γ)-good similarity function  ⇒  (ε+δ, γ/4)-good kernel function
Main Definition: K: (x,y) → [−1,1] is an (ε,γ)-good similarity function for P if there exists a weighting function w(y) ∈ [0,1] such that at least a 1−ε probability mass of x satisfy:

E_{y~P}[w(y)K(x,y) | l(y)=l(x)] ≥ E_{y~P}[w(y)K(x,y) | l(y)≠l(x)] + γ
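The weighted condition can be estimated on a sample just like the unweighted one; the weighting w(y) can downweight a bad region of the space. A minimal sketch (our own code; names, toy K, and data are illustrative assumptions):

```python
import numpy as np

def weighted_margins(K, w, X, labels):
    """For each x_i, estimate the gap
       E_y[w(y)K(x,y) | l(y)=l(x)] - E_y[w(y)K(x,y) | l(y)!=l(x)]."""
    out = []
    for i, x in enumerate(X):
        same = [w(X[j]) * K(x, X[j]) for j in range(len(X))
                if j != i and labels[j] == labels[i]]
        diff = [w(X[j]) * K(x, X[j]) for j in range(len(X))
                if labels[j] != labels[i]]
        out.append(np.mean(same) - np.mean(diff))
    return np.array(out)

# With w(y) = 1 everywhere this recovers the "first attempt" condition.
K = lambda a, b: max(-1.0, min(1.0, a * b))
w_all = lambda y: 1.0
X, labels = [1.0, 1.1, -1.0, -1.1], [1, 1, -1, -1]
m = weighted_margins(K, w_all, X, labels)
print(m)
```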
Theorem: an (ε,γ)-good kernel function is also a good similarity function under the main definition.

Our current proofs incur some penalty:

ε' = ε + ε_extra,  γ' = γ³·ε_extra.
Theorem: an (ε,γ)-good kernel function is also a good similarity function under the main definition, with ε' = ε + ε_extra, γ' = γ³·ε_extra.
Proof Sketch
Algorithm (several candidate similarity functions K1, …, Kr)

F(x) = [K1(x,y1), …, Kr(x,yd), K1(x,z1), …, Kr(x,zd)].
Guarantee: the induced distribution F(P) in R^{2dr} has a separator of error ≤ ε + δ at margin at least γ/(4√r).
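The concatenated mapping can be sketched as follows (our own code; the toy similarity functions and landmarks are made up, and the coordinate ordering differs from the slide's interleaving, which is immaterial to a linear separator):

```python
def combined_feature_map(Ks, landmarks):
    """Concatenate the empirical maps of r candidate similarity
    functions K1..Kr over the same 2d landmarks, landing in R^{2dr}."""
    def F(x):
        return [K(x, p) for K in Ks for p in landmarks]
    return F

# Toy example (our assumption): two candidate similarity functions on
# 1-D points, each mapping into [-1, 1], with d = 1 positive landmark
# and d = 1 negative landmark.
K1 = lambda a, b: max(-1.0, min(1.0, a * b))
K2 = lambda a, b: max(-1.0, 1.0 - abs(a - b))
F = combined_feature_map([K1, K2], landmarks=[1.0, -1.0])
print(len(F(0.3)))  # r * 2d = 2 * 2 = 4 coordinates
```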
Open Problems