
## The Power of Selective Memory

Shai Shalev-Shwartz

Joint work with

Ofer Dekel, Yoram Singer

Hebrew University, Jerusalem

### Outline

• Online learning, loss bounds, etc.

• Hypothesis space – prediction suffix trees (PST)

• Margin of prediction and hinge-loss

• An online learning algorithm

• Trading margin for depth of the PST

• Automatic calibration

• A self-bounded online algorithm for learning PSTs

### Online Learning

• For t = 1, 2, …

• Get an instance x_t

• Predict a target ŷ_t based on x_t

• Get the true target y_t and suffer loss

• Update the prediction mechanism
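To make the protocol concrete, here is a minimal Python sketch of the loop above. The names `predict`, `true_target`, `update`, and `loss` are hypothetical placeholders for the slide's prediction mechanism, not functions from the talk.

```python
def online_learning(instances, true_target, predict, update, loss, init_state=None):
    """Generic online learning loop: predict, observe the target, suffer loss, update."""
    state = init_state
    cumulative_loss = 0.0
    for t, x_t in enumerate(instances):        # for t = 1, 2, ...
        y_hat = predict(state, x_t)            # predict a target based on x_t
        y_t = true_target(t)                   # get the true target
        cumulative_loss += loss(y_hat, y_t)    # suffer loss
        state = update(state, x_t, y_t)        # update the prediction mechanism
    return cumulative_loss
```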

### Analysis of Online Algorithm

• Relative loss bounds (external regret):

For any fixed hypothesis h:
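The bound itself appears as an image on the slide; a typical relative loss bound of this kind, in our own notation rather than the slide's exact constants, reads:

```latex
\[
\sum_{t=1}^{T} \ell\bigl(\hat{y}_t, y_t\bigr)
\;\le\;
\sum_{t=1}^{T} \ell\bigl(h(x_t), y_t\bigr) \;+\; C(h),
\]
```

where C(h) is a complexity term that depends only on the fixed competitor h (for PSTs, on the norm of its context function, defined below).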

### Prediction Suffix Tree (PST)

Each hypothesis is parameterized by a triplet; one of its components is the context function g, which assigns a real value to each context (node of the tree).

(Figure: an example PST; the root has value 0 and the other nodes carry context-function values -3, -1, 1, 4, -2, and 7.)
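As an illustration, here is a minimal Python sketch of a PST hypothesis represented by its context function g (a map from contexts to real values). The exponentially decaying suffix weights 2^(-i/2) follow the construction used in this line of work; the class layout itself is our own and not from the slides.

```python
class PST:
    """A prediction suffix tree represented by its context function g."""

    def __init__(self):
        # context function: maps a context (tuple of +/-1 symbols) to a real value
        self.g = {(): 0.0}

    def margin(self, history):
        """Root value plus the weighted values of the suffixes of `history` in the tree."""
        total = self.g[()]
        for i in range(1, len(history) + 1):
            suffix = tuple(history[-i:])
            if suffix not in self.g:
                break                              # stop at the deepest stored context
            total += 2.0 ** (-i / 2) * self.g[suffix]
        return total

    def predict(self, history):
        """Predict the next symbol as the sign of the margin."""
        return 1 if self.margin(history) >= 0 else -1
```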

### Margin of Prediction

• Margin of prediction

• Hinge loss
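The formulas were rendered as images on the slide; the standard definitions in this setting (our reconstruction) are:

```latex
\[
\text{margin:}\quad \gamma_t = y_t \, h\bigl(y_1,\dots,y_{t-1}\bigr),
\qquad
\text{hinge loss:}\quad \ell_t = \max\bigl\{0,\; 1 - \gamma_t \bigr\}.
\]
```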

### Complexity of a Hypothesis

• Define the complexity of a hypothesis as …

• We can also extend g s.t. … and get …
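A natural complexity measure here, and the one we assume in the sketches below, is the squared norm of the context function,

```latex
\[
\|g\|^{2} \;=\; \sum_{s} g(s)^{2},
\]
```

where the sum ranges over all contexts s stored in the tree.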

### Algorithm I: Learning an Unbounded-Depth PST

• Init: …

• For t = 1, 2, …

• Get the instance and predict ŷ_t

• Get the true target y_t and suffer loss

• Set …

• Update the weight vector

• Update the tree
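A sketch of one round of Algorithm I, reusing the PST class above. The perceptron-style update along all suffixes of the current history follows the slides' outline; the step size `alpha` is left as a parameter because the slide's exact choice is an image we cannot recover.

```python
def algorithm_one_round(pst, history, y_true, alpha=1.0):
    """One round of the unbounded-depth algorithm: predict, suffer hinge loss,
    and on a margin error update the context function along every suffix."""
    margin = y_true * pst.margin(history)
    hinge = max(0.0, 1.0 - margin)                 # suffer loss
    if hinge > 0.0:                                # margin error: update the tree
        pst.g[()] += alpha * y_true                # root update
        for i in range(1, len(history) + 1):       # every suffix of the history
            s = tuple(history[-i:])
            pst.g[s] = pst.g.get(s, 0.0) + alpha * y_true * 2.0 ** (-i / 2)
    return hinge
```

Note that each update adds the entire current history as a new branch, which is why the learned tree keeps getting deeper; this is the problem addressed later in the talk.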

(Figure sequence: Algorithm I is run on the example sequence y = + - + - + . Round by round, the predicted and observed symbols are shown and the PST grows by adding the suffixes of the current history; the node values evolve from the initial root value 0 through intermediate values such as .23, .16, -.42, -.14, and -.09 to a final tree with values .41, -.42, .29, -.14, .09, -.09, and .06.)

### Analysis

• Let … be a sequence of examples and assume that …

• Let h* be an arbitrary hypothesis

• Let L* be the loss of h* on the sequence of examples. Then, …

### Proof Sketch

• Define

• Upper bound

• Lower bound

• Combining the upper and lower bounds gives the bound in the theorem

### Proof Sketch (Cont.)

Where does the lower bound come from?

• For simplicity, assume that … and …

• Define a Hilbert space:

• The context function g_{t+1} is the projection of g_t onto the half-space …, where f is the function …
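In this geometric reading, the update can be written as a projection. The notation below is ours (x_t stands for the vector of suffix weights induced by the current context), not the slide's:

```latex
\[
g_{t+1} \;=\; \operatorname*{argmin}_{f \in \mathcal{H}} \;\|f - g_t\|^{2}
\quad \text{s.t.} \quad y_t \,\langle f, \mathbf{x}_t \rangle \;\ge\; 1 .
\]
```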

### Example revisited

y = +-+-+-+-

• The following hypothesis has cumulative loss of 2 and complexity of 2. Therefore, the number of mistakes is bounded above by 12.

### Example revisited

y = +-+-+-+-

• The following hypothesis has cumulative loss of 1 and complexity of 4. Therefore, the number of mistakes is bounded above by 18. But this tree is very shallow

(Figure: the shallow hypothesis is a depth-1 PST with root value 0, value 1.41 at the child reached on '+', and value -1.41 at the child reached on '-'.)

Problem: The tree we learned is much deeper!

### Geometric Intuition (Cont.)

Let's force g_{t+1} to be sparse by "canceling" the new coordinate

### Geometric Intuition (Cont.)

Now we can show that:

• We got that …

• If … is much smaller than …, we can get a loss bound!

• Problem: What happens if … is very small and therefore …? Solution: Tolerate small margin errors!

• Conclusion: If we tolerate small margin errors, we can get a sparser tree
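One hedged way to write this conclusion, in our notation with a tolerance ε_t ≥ 0 (not the slide's symbols): skip the update whenever the margin error is small, so the tree, and hence its depth, stays unchanged on those rounds:

```latex
\[
y_t \, h_t\bigl(y_1,\dots,y_{t-1}\bigr) \;\ge\; 1 - \epsilon_t
\quad\Longrightarrow\quad
g_{t+1} = g_t .
\]
```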

### Automatic Calibration

• Problem: The value of … is unknown

• Solution: Use the data itself to estimate it!

More specifically:

• Denote …

• If we keep … then we get a mistake bound

### Algorithm II: Learning a Self-Bounded-Depth PST

• Init: …

• For t = 1, 2, …

• Get the instance and predict ŷ_t

• Get the true target y_t and suffer loss

• If …, do nothing! Otherwise:

• Set …

• Set …

• Set …

• Update w and the tree as in Algorithm I, up to depth d_t
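A sketch of one round of Algorithm II in the same style as before, reusing the PST class. The slide's threshold test and the three "Set …" quantities are images, so `tolerance` and `max_depth` are hypothetical placeholders here; only the overall structure (skip small margin errors, otherwise update up to a bounded depth d_t) is taken from the slide.

```python
def algorithm_two_round(pst, history, y_true, tolerance, max_depth, alpha=1.0):
    """One round of the self-bounded-depth algorithm (structure only).

    `tolerance` and `max_depth` stand in for the slide's calibration
    quantities; they are placeholders, not the authors' formulas."""
    margin = y_true * pst.margin(history)
    hinge = max(0.0, 1.0 - margin)                 # suffer loss
    if margin >= 1.0 - tolerance:                  # small margin error: do nothing
        return hinge
    d_t = min(max_depth, len(history))             # update only up to depth d_t
    pst.g[()] += alpha * y_true                    # root update, as in Algorithm I
    for i in range(1, d_t + 1):
        s = tuple(history[-i:])
        pst.g[s] = pst.g.get(s, 0.0) + alpha * y_true * 2.0 ** (-i / 2)
    return hinge
```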

### Analysis – Loss Bound

• Let … be a sequence of examples and assume that …

• Let h* be an arbitrary hypothesis

• Let L* be the loss of h* on the sequence of examples. Then, …

### Analysis – Bounded depth

• Under the previous conditions, the depth of all the trees learned by the algorithm is bounded above by …

### Performance of Algorithm II

y = + - + - + - + - …

• Only 3 mistakes

• The last PST is of depth 5

• The margin is 0.61 (after normalization)

• The margin of the max-margin tree (of infinite depth) is 0.7071

(Figure: the final depth-5 PST learned by Algorithm II; the root has value 0 and the nodes along the alternating contexts carry values .55, -.55, -.22, .39, .07, -.07, .05, -.05, and .03.)

### Conclusions

• Discriminative online learning of PSTs

• Loss bound