Online Learning (And Other Cool Stuff)

Your guide: Avrim Blum, Carnegie Mellon University
[Machine Learning Summer School 2012]

Itinerary
Stop 1: Minimizing regret and combining expert advice.
Randomized Weighted Majority / Multiplicative Weights algorithm; connections to game theory.
Kernel Functions and Learning
w · x = (x(1) + x(2) − x(5) + x(9)) · x.
Kernel should be pos. semidefinite (PSD)
[Figure: X/O points not linearly separable in input coordinates (x1, x2), but separable in the kernel feature space (z1, z2, z3).]
Example
Moreover, these generalize well if there is a good margin. Assume ‖F(x)‖ ≤ 1.
But there is a little bit of a disconnect...
E_{y~D}[K(x,y) | ℓ(y)=ℓ(x)] ≥ E_{y~D}[K(x,y) | ℓ(y)≠ℓ(x)] + γ
(average similarity to points of the same label is at least the average similarity to points of the opposite label, plus a gap γ)
“most x are on average more similar to points y of their own type than to points y of the other type”
Note: it’s possible to satisfy this and not be PSD.
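The goodness condition above can be checked empirically on a labeled sample. A minimal sketch (the dataset, the similarity function, and all names here are illustrative assumptions, not from the slides):

```python
import numpy as np

# Hedged sketch: estimate, for each point x, the gap
#   E_y[K(x,y) | same label] - E_y[K(x,y) | opposite label],
# i.e. how strongly x satisfies the goodness condition.
def goodness_gaps(K, X, y):
    n = len(X)
    S = np.array([[K(X[i], X[j]) for j in range(n)] for i in range(n)])
    gaps = np.empty(n)
    for i in range(n):
        same = (y == y[i])
        same[i] = False                      # exclude x itself
        opp = (y != y[i])
        gaps[i] = S[i, same].mean() - S[i, opp].mean()
    return gaps

# Toy data: two well-separated Gaussian blobs, dot-product similarity.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(+2, 1, (50, 2)), rng.normal(-2, 1, (50, 2))])
y = np.array([+1] * 50 + [-1] * 50)
gaps = goodness_gaps(lambda a, b: a @ b, X, y)
print((gaps > 0).mean())   # fraction of points satisfying the condition
```

The dot product used here happens to be PSD; per the note above, the condition itself does not require that.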
How can we use it?
At least a 1 − ε probability mass of x satisfy:
E_{y~D}[K(x,y) | ℓ(y)=ℓ(x)] ≥ E_{y~D}[K(x,y) | ℓ(y)≠ℓ(x)] + γ
But not broad enough: average similarity to negatives is ½, but to positives it is only ½·1 + ½·(−½) = ¼.
Broader definition, conditioning on a set R of “reasonable” points:
E_y[K(x,y) | ℓ(y)=ℓ(x), R(y)] ≥ E_y[K(x,y) | ℓ(y)≠ℓ(x), R(y)] + γ
(the points in R could be unlabeled)
If K is (ε,γ,τ)-good, then can learn to error ε′ = O(ε) with O((1/(γ²ε′²)) log(n)) labeled examples.
Algorithm
F(x) = [K1(x,y1), …, Kr(x,y1), …, K1(x,yn), …, Kr(x,yn)].
Guarantee: Whp the induced distribution F(P) in R^{nr} has a separator of error ≤ ε + ε′ at L1 margin at least γ/2.
Sample complexity is roughly: O((1/(γ²ε²)) log(nr)).
Only increases by log(r) factor!
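A sketch of the landmark construction above. The landmark count, the two similarity functions, and the perceptron training step are illustrative choices, not the slides' specification:

```python
import numpy as np

# Hedged sketch: map each x to F(x) = [K_1(x,y_1), ..., K_r(x,y_1), ...,
# K_1(x,y_n), ..., K_r(x,y_n)] over n landmarks and r similarity functions,
# then learn a linear separator over F (here, a simple perceptron).

def feature_map(x, landmarks, sims):
    return np.array([K(x, y) for y in landmarks for K in sims])

def perceptron(F, labels, epochs=50):
    w = np.zeros(F.shape[1])
    for _ in range(epochs):
        for f, lab in zip(F, labels):
            if lab * (w @ f) <= 0:       # mistake: update
                w = w + lab * f
    return w

rng = np.random.default_rng(1)
X = np.vstack([rng.normal(+1, 0.3, (40, 2)), rng.normal(-1, 0.3, (40, 2))])
labels = np.array([+1] * 40 + [-1] * 40)

sims = [lambda a, b: a @ b,                              # K_1: linear
        lambda a, b: np.exp(-np.sum((a - b) ** 2))]      # K_2: Gaussian
landmarks = X[rng.choice(len(X), 10, replace=False)]     # n = 10, r = 2

F = np.vstack([feature_map(x, landmarks, sims) for x in X])  # shape (80, 20)
w = perceptron(F, labels)
acc = np.mean(np.sign(F @ w) == labels)
print(acc)
```

Note the landmarks themselves needed no labels, matching the point above that they could be drawn unlabeled.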
Learning with Multiple Similarity Functions
Itinerary
Maria-Florina Balcan (Georgia Tech), Avrim Blum (CMU), Shai Fine (IBM), Yishay Mansour (Tel Aviv)
[In COLT 2012]
Distributed Learning
Many ML problems today involve massive amounts of data distributed across multiple locations.
The distributed PAC learning model
D = (D1 + D2 + … + Dk)/k
Players 1, 2, …, k hold distributions D1, D2, …, Dk.
Goal: learn good h over D, using as little communication as possible.
Interesting special case to think about: k = 2.
Assume learning a class C of VC-dimension d.
Some simple baselines. [viewing k << d]
Dependence on 1/ε²
Had linear dependence on d and 1/ε², or on the mistake bound M with no dependence on 1/ε².
Recap of Adaboost
Key points:
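As a refresher on the algorithm being recapped, a minimal AdaBoost sketch (the standard weight-update, not taken verbatim from these slides; the stump weak learner and toy data are illustrative assumptions):

```python
import numpy as np

# Hedged sketch of AdaBoost: keep a weight per example, fit a weak
# hypothesis on the weighted data, up-weight the examples it gets wrong,
# and predict by a weighted vote of the weak hypotheses.

def stump_learner(X, y, w):
    """Exhaustively pick the best single-feature threshold stump."""
    best = None
    for j in range(X.shape[1]):
        for t in np.unique(X[:, j]):
            for s in (+1, -1):
                pred = s * np.sign(X[:, j] - t + 1e-9)
                err = w[pred != y].sum()
                if best is None or err < best[0]:
                    best = (err, j, t, s)
    _, j, t, s = best
    return lambda Z, j=j, t=t, s=s: s * np.sign(Z[:, j] - t + 1e-9)

def adaboost(X, y, weak_learn, rounds=20):
    n = len(X)
    w = np.full(n, 1.0 / n)
    ensemble = []
    for _ in range(rounds):
        h = weak_learn(X, y, w)
        pred = h(X)
        err = max(w[pred != y].sum(), 1e-12)
        alpha = 0.5 * np.log((1 - err) / err)
        w = w * np.exp(-alpha * y * pred)    # up-weight mistakes
        w = w / w.sum()
        ensemble.append((alpha, h))
    return lambda Z: np.sign(sum(a * h(Z) for a, h in ensemble))

# Toy 1-D data where no single stump suffices: an interval target.
rng = np.random.default_rng(3)
X = rng.uniform(0, 1, (40, 1))
y = np.where((X[:, 0] > 0.3) & (X[:, 0] < 0.7), 1, -1)
H = adaboost(X, y, stump_learner)
acc = np.mean(H(X) == y)
print(acc)
```

The interval target needs at least two thresholds, so the ensemble must combine several stumps; a single weak hypothesis cannot fit it.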
Distributed Adaboost
[Diagram: each player i holds a sample Si with round-t weights wi,t; each round a weak hypothesis ht is shared among the players.]
(ht may do better on some than others)




Final result:
Agnostic learning
Recent result of [Balcan–Hanneke] gives a robust halving algorithm that can be implemented in the distributed setting.
Can we do better for specific classes of interest?
E.g., conjunctions over {0,1}^d: f(x) = x2 ∧ x5 ∧ x9 ∧ x15.
1101111011010111
1111110111001110
1100110011001111
1100110011000110
Only O(k) examples sent. O(kd) bits.
General principle: can learn any intersection-closed class (well-defined “tightest wrapper” around the positives) this way.
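The O(k)-examples protocol can be sketched directly, using the four example bit-vectors above as the positives, split across two players (the split itself is an assumed illustration):

```python
# Hedged sketch of the conjunction protocol: each player ANDs together
# its own positive examples (its local "tightest wrapper"), sends that
# single d-bit vector to the center, and the center ANDs the k vectors.
# Total communication: k vectors, i.e. O(kd) bits.

def bits(s):
    return tuple(int(c) for c in s)

def tightest_conjunction(positives):
    mask = positives[0]
    for x in positives[1:]:
        mask = tuple(a & b for a, b in zip(mask, x))
    return mask

# The slide's positive examples, split across two players.
player1 = [bits("1101111011010111"), bits("1111110111001110")]
player2 = [bits("1100110011001111"), bits("1100110011000110")]

masks = [tightest_conjunction(p) for p in (player1, player2)]
h = tightest_conjunction(masks)          # center combines the same way
print("".join(map(str, h)))              # -> 1100110011000110
```

Bits that remain 1 are exactly the variables all positive examples agree on, i.e. the literals kept in the learned conjunction.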
Interesting class: parity functions
Examples x ∈ {0,1}^d. f(x) = x · v_f mod 2, for unknown v_f.
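As an illustration of why parities are learnable in the first place (communication aside), v_f can be recovered from d linearly independent labeled examples by Gaussian elimination mod 2. A hedged sketch, not the slides' distributed protocol:

```python
import numpy as np

# Hedged sketch: solve X v = y (mod 2) for v by Gauss-Jordan elimination
# over GF(2), assuming the d x d system X is invertible mod 2.
def solve_parity(X, y):
    A = np.concatenate([X % 2, (y % 2).reshape(-1, 1)], axis=1).astype(int)
    d = X.shape[1]
    for col in range(d):
        pivot = next(r for r in range(col, d) if A[r, col] == 1)
        A[[col, pivot]] = A[[pivot, col]]       # swap pivot row up
        for r in range(d):
            if r != col and A[r, col] == 1:
                A[r] = (A[r] + A[col]) % 2      # eliminate mod 2
    return A[:, -1]

rng = np.random.default_rng(2)
d = 8
v_true = rng.integers(0, 2, d)
X = rng.integers(0, 2, (d, d))
while True:
    try:
        y = (X @ v_true) % 2
        v = solve_parity(X, y)
        break
    except StopIteration:                       # singular system: redraw
        X = rng.integers(0, 2, (d, d))
print(np.array_equal(v, v_true))
```

A random d×d binary matrix is invertible mod 2 with constant probability, so a handful of redraws suffices whp.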
Linear Separators
Linear separators over a near-uniform distribution D over the ball B_d.
Can one do better?
Thm: Over any non-concentrated D [density bounded by c·uniform], can achieve #vectors communicated of O((d log d)^{1/2}) rather than O(d) (for constant k, ε).
Algorithm:
Proof idea:
Conclusions and Open Questions
As we move to large distributed datasets, communication becomes increasingly crucial.
Open questions: