Semi-Stochastic Gradient Descent Methods


Semi-Stochastic Gradient Descent Methods

Jakub Konečný (joint work with Peter Richtárik)

University of Edinburgh



Introduction


Large-scale problem setting

  • Problems are often structured
  • Frequently arising in machine learning
  • Structure – sum of functions:

      min_x f(x) = (1/n) Σ_{i=1}^{n} f_i(x)

    where n is BIG



Examples

  • Linear regression (least squares)

  • Logistic regression (classification)
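The two example losses can be written out per summand f_i. A small sketch (the function and variable names, and the convention that labels b_i live in {-1, +1}, are mine, not from the slides):

```python
import numpy as np

# One term f_i of the sum; a_i is a feature vector, b_i the target
# (regression) or the label in {-1, +1} (classification).
def least_squares_fi(x, a_i, b_i):
    """Least squares: f_i(x) = (a_i^T x - b_i)^2 / 2."""
    r = a_i @ x - b_i
    return 0.5 * r * r

def least_squares_grad_fi(x, a_i, b_i):
    """Gradient of the least-squares term: (a_i^T x - b_i) a_i."""
    return (a_i @ x - b_i) * a_i

def logistic_fi(x, a_i, b_i):
    """Logistic loss: f_i(x) = log(1 + exp(-b_i a_i^T x))."""
    return np.log1p(np.exp(-b_i * (a_i @ x)))

def logistic_grad_fi(x, a_i, b_i):
    """Gradient of the logistic term."""
    return -b_i * a_i / (1.0 + np.exp(b_i * (a_i @ x)))
```

Both fit the "sum of functions" structure above: the full objective and gradient are averages of these terms over i = 1..n.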



Assumptions

  • Lipschitz continuity of the derivative of each f_i

  • Strong convexity of f
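Written out in their standard forms (the constants L and μ are the usual ones; the precise statement on the original slide was lost in transcription):

```latex
% Lipschitz continuity of the gradient of each f_i:
\|\nabla f_i(x) - \nabla f_i(y)\| \le L\,\|x - y\| \qquad \forall x, y
% Strong convexity of f:
f(y) \ge f(x) + \langle \nabla f(x),\, y - x \rangle + \tfrac{\mu}{2}\,\|y - x\|^2
```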



Gradient Descent (GD)

  • Update rule: x_{k+1} = x_k − h ∇f(x_k)

  • Fast (linear) convergence rate

    • Alternatively, for accuracy ε we need O(log(1/ε)) iterations

  • Complexity of a single iteration – n

    (measured in gradient evaluations)
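A minimal sketch of the update rule (the function names and the toy quadratic objective are illustrative, not from the slides):

```python
import numpy as np

def gradient_descent(grad_f, x0, h, iters):
    """Plain GD: x_{k+1} = x_k - h * grad f(x_k).
    When f = (1/n) sum f_i, each iteration costs n gradient evaluations."""
    x = x0.copy()
    for _ in range(iters):
        x = x - h * grad_f(x)
    return x

# Toy objective f(x) = ||x||^2 / 2, so grad f(x) = x; the minimizer is 0.
x_star = gradient_descent(lambda x: x, np.array([4.0, -2.0]), h=0.5, iters=50)
```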



Stochastic Gradient Descent (SGD)

  • Update rule: x_{k+1} = x_k − h_k ∇f_i(x_k), with i picked uniformly at random and h_k a step-size parameter

  • Why it works: the sampled gradient is unbiased, E[∇f_i(x)] = ∇f(x)

  • Slow (sublinear) convergence

  • Complexity of a single iteration – 1

    (measured in gradient evaluations)
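A sketch of SGD under the same conventions (the 1-D toy problem and names are mine):

```python
import numpy as np

def sgd(grad_fi, n, x0, h, iters, seed=0):
    """SGD: at each step pick i uniformly at random and move along -grad f_i.
    One iteration costs a single gradient evaluation, independent of n."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float).copy()
    for _ in range(iters):
        i = rng.integers(n)
        x = x - h * grad_fi(x, i)
    return x

# Toy 1-D problem: f_i(x) = (x - b_i)^2 / 2, minimized at mean(b) = 2.
b = np.array([1.0, 3.0])
x_end = sgd(lambda x, i: x - b[i], n=2, x0=np.array([0.0]), h=0.01, iters=2000)
```

With a constant stepsize the iterates do not converge to the minimizer; they oscillate in a neighborhood of it whose radius shrinks with h, which is the "slow convergence" noted above.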



Goal

  GD: fast convergence, but n gradient evaluations in each iteration

  SGD: complexity of each iteration independent of n, but slow convergence

Combine the strengths of both in a single algorithm.



Semi-Stochastic Gradient Descent (S2GD)



Intuition

  • The gradient does not change drastically

  • We could reuse the information from “old” gradient



Modifying “old” gradient

  • Imagine someone gives us a “good” point x and ∇f(x)

  • The gradient at a point y, near x, can be expressed as

      ∇f(y) = ∇f(x) + [∇f(y) − ∇f(x)]

    where the gradient change ∇f(y) − ∇f(x) is the part we can try to estimate

  • Approximation of the gradient, reusing the already computed gradient ∇f(x):

      ∇f(y) ≈ ∇f(x) + ∇f_i(y) − ∇f_i(x)
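The single-sample estimate of the gradient change keeps the approximation unbiased: averaged over all choices of i, it recovers ∇f(y) exactly. A quick numerical check with made-up quadratic data (all names here are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 100, 3
A = rng.normal(size=(n, d))
b = rng.normal(size=n)

def grad_fi(z, i):
    """Gradient of f_i(z) = (a_i^T z - b_i)^2 / 2."""
    return (A[i] @ z - b[i]) * A[i]

def grad_f(z):
    """Full gradient of f = (1/n) sum f_i."""
    return A.T @ (A @ z - b) / n

x = rng.normal(size=d)          # "old" point with a stored full gradient
y = x + 0.1 * rng.normal(size=d)  # nearby point

# Average of the estimate grad f_i(y) - grad f_i(x) + grad f(x) over all i:
est = np.mean([grad_fi(y, i) - grad_fi(x, i) + grad_f(x) for i in range(n)],
              axis=0)
```

The terms involving x cancel in expectation, so `est` matches `grad_f(y)` exactly.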



The S2GD Algorithm

Simplification; size of the inner loop is random, following a geometric rule
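The algorithm listing itself did not survive the transcription, so here is a sketch consistent with the description above. The uniform inner-loop length below corresponds to the simplified version of the geometric rule; the function names and the 1-D toy problem are mine:

```python
import numpy as np

def s2gd(grad_fi, grad_f, n, x0, h, m, epochs, seed=0):
    """Sketch of S2GD: each epoch computes one full gradient, then runs a
    random number t of cheap variance-reduced inner steps, t drawn from 1..m."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float).copy()
    for _ in range(epochs):
        g = grad_f(x)                  # full gradient: n evaluations
        t = rng.integers(1, m + 1)     # random inner-loop size (simplified)
        y = x.copy()
        for _ in range(t):
            i = rng.integers(n)        # cheap iteration: 2 evaluations
            y = y - h * (grad_fi(y, i) - grad_fi(x, i) + g)
        x = y
    return x

# Toy 1-D problem: f_i(x) = (x - b_i)^2 / 2, so grad f(x) = x - mean(b).
b = np.array([1.0, 3.0])
x_end = s2gd(lambda x, i: x - b[i], lambda x: x - b.mean(),
             n=2, x0=np.array([0.0]), h=0.1, m=10, epochs=30)
```

On this toy problem the stochastic terms cancel exactly, so the iterates contract deterministically toward the minimizer 2; in general the variance of the inner steps shrinks as x approaches the optimum.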



Theorem



Convergence rate

  • How to set the parameters m (inner-loop size) and h (stepsize)?

For any fixed h, the first part of the rate can be made arbitrarily small by increasing m.

The second part can be made arbitrarily small by decreasing h.



Setting the parameters

  • Fix a target accuracy ε; it is achieved by setting the number of epochs j, the stepsize h, and the number of inner iterations m appropriately

  • Total complexity (in gradient evaluations): j (n + 2m) — each of the j epochs costs one full gradient evaluation (n) plus up to m cheap iterations (2 gradient evaluations each)



Complexity

  • S2GD complexity: O((n + κ) log(1/ε)) gradient evaluations, where κ = L/μ is the condition number

  • GD complexity:

    • O(κ log(1/ε)) iterations

    • n – complexity of a single iteration

    • Total: O(n κ log(1/ε))
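To see the size of the gap between the two totals, a quick numeric comparison (the problem sizes below are made up for illustration):

```python
import math

n, kappa, eps = 10**6, 10**3, 1e-6    # illustrative problem sizes
iters = math.log(1 / eps)
gd_cost = n * kappa * iters           # ~ n * kappa * log(1/eps)
s2gd_cost = (n + kappa) * iters       # ~ (n + kappa) * log(1/eps)
speedup = gd_cost / s2gd_cost         # roughly n*kappa / (n + kappa)
```

The log factors cancel, so the speedup is about n·κ / (n + κ): close to κ when n dominates, and close to n when κ dominates.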



Related Methods

  • SAG – Stochastic Average Gradient

    (Mark Schmidt, Nicolas Le Roux, Francis Bach, 2013)

    • Refresh single stochastic gradient in each iteration

    • Needs to store all n gradients

    • Similar convergence rate

    • Cumbersome analysis

  • MISO - Minimization by Incremental Surrogate Optimization (Julien Mairal, 2014)

    • Similar to SAG, slightly worse performance

    • Elegant analysis



Related Methods

  • SVRG – Stochastic Variance Reduced Gradient

    (Rie Johnson, Tong Zhang, 2013)

    • Arises as a special case in S2GD

  • Prox-SVRG

    (Tong Zhang, Lin Xiao, 2014)

    • Extended to proximal setting

  • EMGD – Epoch Mixed Gradient Descent

    (Lijun Zhang, Mehrdad Mahdavi, Rong Jin, 2013)

    • Handles simple constraints

    • Worse convergence rate



Experiment

  • Example problem, with

