
The Power of Selective Memory

Shai Shalev-Shwartz

Joint work with

Ofer Dekel, Yoram Singer

Hebrew University, Jerusalem

Outline

• Online learning, loss bounds etc.

• Hypothesis space – PST

• Margin of prediction and hinge-loss

• An online learning algorithm

• Trading margin for depth of the PST

• Automatic calibration

• A self-bounded online algorithm for learning PSTs

Online Learning

• For t = 1, 2, ...

• Get an instance x_t

• Predict a target ŷ_t based on x_t

• Get the true target y_t and suffer loss ℓ(ŷ_t, y_t)

• Update the prediction mechanism

• Relative loss bounds (external regret):

For any fixed hypothesis h:

  (cumulative loss of the algorithm) - (cumulative loss of h)  ≤  a term that grows sublinearly in the number of rounds
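As a concrete picture of this protocol, here is a minimal Python sketch; the learner interface (predict/update) and the 0-1 loss are illustrative assumptions rather than notation from the slides.

    def online_loop(stream, learner):
        # run the online protocol and return the cumulative 0-1 loss
        cumulative_loss = 0
        for x_t, y_t in stream:
            y_hat = learner.predict(x_t)   # predict before seeing the target
            cumulative_loss += 0 if y_hat == y_t else 1   # suffer loss
            learner.update(x_t, y_t)       # update the prediction mechanism
        return cumulative_loss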

Prediction Suffix Tree (PST)

Each hypothesis is parameterized by a triplet: a suffix-closed tree T, a context function g that assigns a real weight g(s) to every context (node) s, and depth weights 2^{-|s|/2} that discount long contexts. The prediction on a sequence is the sign of the discounted vote of all suffixes of the observed history:

  ŷ = sign( Σ_s 2^{-|s|/2} g(s) ),  with s ranging over the suffixes of y_1 ... y_{t-1}

[Example PST: a tree whose nodes carry the context-function values 0, -3, -1, 1, 4, -2, 7]
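In code, such a hypothesis can be sketched as follows; the dict-based representation and the method names are illustrative choices, while the 2^{-|s|/2} discount mirrors the prediction rule above.

    class PST:
        """A PST hypothesis: a context function over suffixes of the history."""

        def __init__(self):
            self.g = {}  # context function: context string -> weight g(s)

        def score(self, history):
            # discounted vote of every suffix of the observed history
            return sum(2 ** (-i / 2) * self.g.get(history[-i:], 0.0)
                       for i in range(1, len(history) + 1))

        def predict(self, history):
            return 1 if self.score(history) >= 0 else -1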

• Margin of prediction: γ_t = y_t Σ_s 2^{-|s|/2} g(s)

• Hinge loss: ℓ_t = max(0, 1 - γ_t)

• Define the complexity of a hypothesis as the norm of its context function, ||g|| = ( Σ_s g(s)² )^{1/2}

• We can also extend g s.t. g(s) = 0 for every context s outside the tree, and get an equivalent hypothesis over contexts of unbounded depth with the same complexity
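These three definitions translate directly onto the PST sketch above (the margin threshold 1 is assumed):

    def margin(pst, history, y_true):
        return y_true * pst.score(history)                # gamma_t

    def hinge_loss(pst, history, y_true):
        return max(0.0, 1.0 - margin(pst, history, y_true))

    def complexity(pst):
        return sum(v * v for v in pst.g.values()) ** 0.5  # ||g||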

Algorithm I: Learning Unbounded-Depth PST

• Init: g_1 ≡ 0 (a tree containing only the root)

• For t = 1, 2, ...

• Get the history y_1 ... y_{t-1} and predict ŷ_t = sign( Σ_s 2^{-|s|/2} g_t(s) )

• Get y_t and suffer loss ℓ_t = max(0, 1 - y_t Σ_s 2^{-|s|/2} g_t(s))

• Set a step size α_t proportional to the loss (α_t = 0 when ℓ_t = 0)

• Update weight vector: g_{t+1}(s) = g_t(s) + α_t y_t 2^{-|s|/2} for every suffix s of y_1 ... y_{t-1}

• Update tree: add every such suffix to the tree if it is not there yet
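A runnable sketch of Algorithm I on top of the PST class above; the step size alpha_t = ell_t / ||x_t||^2 is the projection step size discussed later, an assumption here since the slides' exact constant is not shown.

    def algorithm_one(sequence):
        pst, history = PST(), ""
        for y_char in sequence:            # e.g. sequence = "+-+-+"
            y = 1 if y_char == "+" else -1
            loss = max(0.0, 1.0 - y * pst.score(history))
            if loss > 0:
                # ||x_t||^2: one coordinate of size 2^{-i/2} per suffix
                x_sq = sum(2 ** (-i) for i in range(1, len(history) + 1))
                alpha = loss / x_sq if x_sq > 0 else 0.0
                for i in range(1, len(history) + 1):
                    s = history[-i:]       # grow the tree along this path
                    pst.g[s] = pst.g.get(s, 0.0) + alpha * y * 2 ** (-i / 2)
            history += y_char
        return pst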

[Trace of Algorithm I on y = + - + - +. Each slide shows the predictions so far ("?" marks the rounds before the target is revealed), the true prefix, and the growing tree: the first update creates a depth-one node with value -.23; after three symbols the values are .23 and -.23 at depth one and .16 at depth two; after four symbols, .23 and -.42 at depth one, .16 and -.14 at depth two, and -.09 at depth three; after five symbols, .41 and -.42 at depth one, .29 and -.14 at depth two, .09 and -.09 at depth three, and .06 at depth four. The root value stays 0 throughout.]
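Running the sketch on the trace's sequence reproduces the qualitative picture (one new level per lossy round); the exact numbers depend on the step-size constant assumed above.

    pst = algorithm_one("+-+-+")
    for context, weight in sorted(pst.g.items(), key=lambda kv: len(kv[0])):
        print(f"g({context}) = {weight:+.2f}")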

Loss bound

• Let (y_1, ..., y_T) be a sequence of examples, and assume that each instance has norm at most 1

• Let h = (T, g) be an arbitrary fixed hypothesis

• Let L be the cumulative hinge loss of h on the sequence of examples. Then the cumulative loss of the algorithm is bounded in terms of L and the complexity ||g|| alone

Proof sketch:

• Define the per-round progress Δ_t = ||g_t - g||² - ||g_{t+1} - g||²

• Upper bound: the progress telescopes, so Σ_t Δ_t ≤ ||g_1 - g||² = ||g||²

• Lower bound: each update moves g_{t+1} closer to g by an amount that grows with ℓ_t and shrinks with the loss of g

• Upper + lower bounds give the bound in the theorem
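The upper bound is the usual telescoping identity; in the assumed notation (g the fixed competitor, g_1 = 0):

    \begin{align*}
    \sum_{t=1}^{T} \Delta_t
      &= \sum_{t=1}^{T} \bigl( \|g_t - g\|^2 - \|g_{t+1} - g\|^2 \bigr) \\
      &= \|g_1 - g\|^2 - \|g_{T+1} - g\|^2 \;\le\; \|g\|^2 .
    \end{align*}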

Where does the lower bound come from?

• For simplicity, assume that every round suffers a positive loss and that each instance has norm at most 1

• Define a Hilbert space of context functions, with inner product ⟨f, g⟩ = Σ_s f(s) g(s)

• The context function g_{t+1} is the projection of g_t onto the half-space { f : y_t ⟨f, x_t⟩ ≥ 1 }, where x_t is the function encoding the (discounted) suffixes of the current history
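The Euclidean projection onto such a half-space has a standard closed form, which is what makes the update analyzable:

    \[
      g_{t+1} \;=\; g_t + \frac{\ell_t}{\|x_t\|^2}\, y_t\, x_t ,
      \qquad
      \ell_t = \max\bigl(0,\; 1 - y_t \langle g_t, x_t \rangle \bigr) .
    \]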

y = + - + - + - + -

• The following hypothesis has cumulative loss of 2 and complexity of 2. Therefore, the number of mistakes is bounded above by 12.

y = + - + - + - + -

• The following hypothesis has cumulative loss of 1 and complexity of 4. Therefore, the number of mistakes is bounded above by 18. But this tree is very shallow:

[The shallow competitor: a depth-one PST with root value 0 and child values 1.41 and -1.41]

Problem: the tree we learned is much deeper!

Let's force g_{t+1} to be sparse by "canceling" the new, deepest coordinates of the update.

Now we can show that the truncated update still makes almost as much progress: the canceled coordinates carry weights 2^{-|s|/2}, so truncating at depth d costs only a term on the order of 2^{-d/2}.

• We got that the progress of the truncated update is at least the original progress minus a term of order 2^{-d_t/2}

• If 2^{-d_t/2} is much smaller than the loss ℓ_t, we can still get a loss bound!

• Problem: what happens if ℓ_t is very small, so that no truncation depth is cheap enough? Solution: tolerate small margin errors!

• Conclusion: if we tolerate small margin errors, we can get a sparser tree
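In code, the truncation is a one-line change to the Algorithm I update: coordinates deeper than d are simply never written (the function name and signature are illustrative).

    def truncated_update(pst, history, y, alpha, d):
        # as in Algorithm I, but coordinates deeper than d are "canceled"
        for i in range(1, min(d, len(history)) + 1):
            s = history[-i:]
            pst.g[s] = pst.g.get(s, 0.0) + alpha * y * 2 ** (-i / 2)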

• Problem: the value of the right tolerance is unknown in advance

• Solution: use the data itself to estimate it!

More specifically:

• Denote by Λ_t the loss accumulated by the algorithm up to round t

• If we keep the total cost of the truncations a small fraction of Λ_t, then we get a mistake bound

Algorithm II: Learning Self-Bounded-Depth PST

• Init: g_1 ≡ 0 (a tree containing only the root)

• For t = 1, 2, ...

• Get the history y_1 ... y_{t-1} and predict ŷ_t = sign( Σ_s 2^{-|s|/2} g_t(s) )

• Get y_t and suffer loss ℓ_t

• If ℓ_t is below the tolerance, do nothing! Otherwise:

• Set the step size α_t as in Algorithm I

• Set the tolerance from the loss accumulated so far

• Set the truncation depth d_t so that the cost of truncation stays a small fraction of the accumulated loss

• Update w and the tree as in Algo. I, up to depth d_t
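A sketch of Algorithm II in the same style; the tolerance eps and the concrete depth rule below are illustrative stand-ins for the slides' exact settings.

    def algorithm_two(sequence, eps=0.1):
        pst, history = PST(), ""
        for y_char in sequence:
            y = 1 if y_char == "+" else -1
            loss = max(0.0, 1.0 - y * pst.score(history))
            if loss > eps:                 # tolerate small margin errors
                x_sq = sum(2 ** (-i) for i in range(1, len(history) + 1))
                alpha = loss / x_sq if x_sq > 0 else 0.0
                d = 1                      # self-bounded depth: grow only
                while 2 ** (-d / 2) > eps * loss and d < len(history):
                    d += 1                 # while 2^{-d/2} is non-negligible
                truncated_update(pst, history, y, alpha, d)
            history += y_char
        return pst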

• Let (y_1, ..., y_T) be a sequence of examples, and assume that each instance has norm at most 1

• Let h = (T, g) be an arbitrary fixed hypothesis

• Let L be the cumulative hinge loss of h on the sequence of examples. Then the cumulative loss of Algorithm II is, up to constants, bounded as for Algorithm I

• Under the previous conditions, the depth of all the trees learned by the algorithm is bounded above by a quantity that depends on ||g|| and L but not on the length of the sequence

Example revisited

y = + - + - + - + - ...

• Only 3 mistakes

• The last PST is of depth 5

• The margin is 0.61 (after normalization)

• The margin of the max-margin tree (of infinite depth) is 0.7071

[The final PST of depth 5: root 0; values .55 and -.55 at depth one; -.22 and .39 at depth two; .07 and -.07 at depth three; .05 and -.05 at depth four; .03 at depth five]

Summary

• Discriminative online learning of PSTs

• Loss bound

• Trading margin for sparsity

• Automatic calibration

Future work

• Experiments

• Feature selection and extraction

• Support vector selection