
Multi-core Structural SVM Training

Kai-Wei Chang

Department of Computer Science

University of Illinois at Urbana-Champaign

Joint work with Vivek Srikumar and Dan Roth


Motivation

  • Decisions are structured in many applications.

    • Global decisions in which several local decisions play a role, with mutual dependencies among their outcomes.

  • It is essential to make coherent decisions in a way that takes the interdependencies into account.

  • E.g., Part-of-Speech tagging (sequential labeling)

    • Input: a sequence of words

    • Output: Part-of-speech tags {NN, VBZ, …}

      “A cat chases a mouse” => “DT NN VBZ DT NN”.

    • The assignment to $y_i$ can depend on both $x_i$ and $y_{i-1}$.

    • A feature vector $\phi(x, y)$ is defined on both input and output variables:

      • For example, a feature can be "$x_i$ = cat, $y_i$ = NN", or the tag bigram "$y_{i-1} y_i$ = VBZ-VBZ" (see the sketch below).
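To make these input-output features concrete, here is a minimal, hypothetical sketch in Python; the function and the feature-name scheme are illustrative, not from the slides:

```python
# Hypothetical feature extraction for POS tagging (illustrative only).
def features(words, tags):
    """Count features defined on both the input words and the output tags."""
    feats = {}
    for i, (word, tag) in enumerate(zip(words, tags)):
        # Word-tag feature, e.g. "word=cat:tag=NN".
        key = f"word={word}:tag={tag}"
        feats[key] = feats.get(key, 0) + 1
        # Tag-bigram feature, e.g. "bigram=VBZ-VBZ".
        if i > 0:
            key = f"bigram={tags[i-1]}-{tag}"
            feats[key] = feats.get(key, 0) + 1
    return feats

print(features("A cat chases a mouse".split(),
               ["DT", "NN", "VBZ", "DT", "NN"]))
```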


Inference with General Constraint Structure [Roth & Yih '04, '07]: Recognizing Entities and Relations

Dole ’s wife, Elizabeth , is a native of N.C.

[Figure: the three entity mentions are labeled E1, E2, E3, with relation R12 between E1 and E2 and relation R23 between E2 and E3.]

Improvement over no inference: 2-5%


Structured Learning and Inference

  • Structured prediction: predicting a structured output variable y based on the input variable x.

    • Output variables form a structure: sequences, clusters, trees, or arbitrary graphs.

    • Structure comes from interactions between the output variables through mutual correlations and constraints.

  • TODAY:

    • How to efficiently learn models that are used to make global decisions.

    • We focus on training a structural SVM model.

    • Various approaches have been proposed [Joachims et al. 09, Chang and Yih 13, Lacoste-Julien et al. 13], but they are single-threaded.

    • DEMI-DCD: a multi-threaded algorithm for training structural SVM.


Outline

  • Structural SVM: Inference and Learning

  • DEMI-DCD for Structural SVM

  • Related Work

  • Experiments

  • Conclusions




Structured Prediction: Inference

$$y^* = \arg\max_{y \in \mathcal{Y}(x)} \; w^\top \phi(x, y)$$

  • $\phi(x, y)$: feature vector defined on the input-output pair.

  • $w$: weight parameters (to be estimated during learning).

  • $\mathcal{Y}(x)$: set of allowed structures, often specified by constraints.

Inference constitutes predicting the best scoring structure.

Efficient inference algorithms have been proposed for some specific structures.

An integer linear programming (ILP) solver can handle general structures.
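For chain structures such as POS tagging, this argmax can be computed exactly by the Viterbi algorithm. A minimal sketch, assuming the per-position emission scores and tag-transition scores have already been computed from $w^\top \phi$ (the score tables and the function itself are illustrative):

```python
import numpy as np

def viterbi(emit, trans):
    """argmax over tag sequences of summed emission + transition scores.

    emit:  (n_words, n_tags) emission scores
    trans: (n_tags, n_tags) transition scores, trans[p, t] for p -> t
    """
    n, k = emit.shape
    score = emit[0].copy()              # best score ending in each tag
    back = np.zeros((n, k), dtype=int)  # backpointers
    for i in range(1, n):
        cand = score[:, None] + trans + emit[i]  # (prev_tag, tag) candidates
        back[i] = cand.argmax(axis=0)
        score = cand.max(axis=0)
    tags = [int(score.argmax())]        # recover the best sequence backwards
    for i in range(n - 1, 0, -1):
        tags.append(int(back[i][tags[-1]]))
    return tags[::-1], float(score.max())
```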


Structural SVM

Given a set of training examples $\{(x_i, y_i)\}_{i=1}^{l}$, solve the following optimization problem to learn $w$:

$$\min_{w,\, \xi} \; \frac{1}{2} w^\top w + C \sum_{i} \ell(\xi_i)$$

subject to, for all samples $i$ and feasible structures $y$:

$$w^\top \phi(x_i, y_i) \;\ge\; w^\top \phi(x_i, y) + \Delta(y_i, y) - \xi_i$$

  • $w^\top \phi(x_i, y_i)$ is the score of the gold structure; $w^\top \phi(x_i, y)$ is the score of a predicted structure.

  • $w^\top \phi(x, y)$ is the scoring function used in inference.

  • $\Delta(y_i, y)$ is the loss function and $\xi_i$ is a slack variable.

  • We use $\ell(\xi) = \xi^2$ (L2-loss structural SVM).


Dual Problem of Structural SVM

The dual is a quadratic program with bound constraints:

$$\min_{\alpha \ge 0} \; f(\alpha) = \frac{1}{2} \Big\| \sum_{i,y} \alpha_{i,y}\, \phi_{i,y} \Big\|^2 + \frac{1}{4C} \sum_{i} \Big( \sum_{y} \alpha_{i,y} \Big)^2 - \sum_{i,y} \Delta(y_i, y)\, \alpha_{i,y},$$

where $\phi_{i,y} = \phi(x_i, y_i) - \phi(x_i, y)$.

  • Each dual variable $\alpha_{i,y}$ corresponds to a different (example, structure) pair.

  • The number of variables can be exponentially large.

  • Relationship between $w$ and $\alpha$: $w = \sum_{i,y} \alpha_{i,y}\, \phi_{i,y}$.

  • For a linear model, maintain this relationship between $w$ and $\alpha$ throughout the learning process [Hsieh et al. 08].


Active Set

  • Maintain an active set $A$ of dual variables:

    • Identify the $\alpha_{i,y}$ that will be non-zero at the end of the optimization process.

  • In a single-thread implementation, training consists of two phases:

    • Select and maintain $A$ (active set selection step).

      • Requires solving a loss-augmented inference problem for each example (see the sketch after this list).

      • Solving loss-augmented inference is usually the bottleneck.

    • Update the values of $\alpha_{i,y} \in A$ (learning step).

      • Requires (approximately) solving a sub-problem.

  • Related to cutting-plane methods.
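For Hamming loss on chain structures, the loss-augmented problem $\arg\max_y \; w^\top \phi(x_i, y) + \Delta(y_i, y)$ reduces to the same Viterbi recursion with the per-position loss folded into the emission scores. A sketch building on the hypothetical viterbi function above:

```python
import numpy as np

def loss_augmented_inference(emit, trans, gold):
    """argmax_y of score(y) + Hamming(gold, y), via plain Viterbi.

    gold: integer tag indices of the gold structure.
    Every tag earns loss 1 at each position, except the gold tag.
    """
    aug = emit + 1.0
    aug[np.arange(len(gold)), gold] -= 1.0
    return viterbi(aug, trans)  # viterbi() from the earlier sketch
```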


Outline2
Outline

  • Structural SVM: Inference and Learning

  • DEMI-DCD for Structural SVM

  • Related Work

  • Experiments

  • Conclusions


Overview of DEMI-DCD

DEMI-DCD: Decouple Model-update and Inference with Dual Coordinate Descent.

  • Let $p$ be the number of threads, and split the training data into $p-1$ parts $B_1, \dots, B_{p-1}$.

  • Active set selection (inference) thread $j$: select and maintain the active set $A_i$ for each example $i$ in $B_j$.

  • Learning thread: loop over all examples and update the model $w$.

  • $w$ and the active sets $\{A_i\}$ are shared between threads using shared memory buffers.

  [Figure: one learning thread and the active set selection threads communicating through shared memory.]
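A minimal, self-contained sketch of this thread layout in Python follows; all names are hypothetical, and both inference and the coordinate update are stubbed out so the skeleton runs as-is (2 inference threads plus 1 learning thread):

```python
import collections
import threading

NUM_EXAMPLES = 8
STOP = threading.Event()
LOCK = threading.Lock()
shared_w = collections.defaultdict(float)         # shared buffer for w
active = {i: set() for i in range(NUM_EXAMPLES)}  # active set A_i per example

def select_structure(w, i):
    """Stub for loss-augmented inference on example i."""
    return ("y-hat", i)

def dcd_update(w, i, y):
    """Stub for the closed-form one-variable dual coordinate step."""
    w[f"feat{i}"] += 0.01

def inference_thread(part):
    local_w = dict(shared_w)              # thread-local copy of w
    while not STOP.is_set():
        for i in part:
            y = select_structure(local_w, i)
            with LOCK:
                active[i].add(y)          # grow A_i
        with LOCK:
            local_w = dict(shared_w)      # refresh local copy of w

def learning_thread():
    for _ in range(200):                  # loop over all examples
        for i in range(NUM_EXAMPLES):
            with LOCK:
                for y in list(active[i]):
                    dcd_update(shared_w, i, y)
    STOP.set()                            # tell inference threads to stop

threads = [threading.Thread(target=inference_thread,
                            args=(range(j, NUM_EXAMPLES, 2),))
           for j in range(2)]
threads.append(threading.Thread(target=learning_thread))
for t in threads:
    t.start()
for t in threads:
    t.join()
print("updated", len(shared_w), "weights")
```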


Learning Thread

  • Sequentially visit each $\alpha_{i,y}$ in the active set and update it.

  • To update $\alpha_{i,y}$, solve the following one-variable sub-problem:

    $$\min_{d} \; f(\alpha + d\, e_{i,y}) \quad \text{subject to} \quad \alpha_{i,y} + d \ge 0$$

  • Then $\alpha_{i,y} \leftarrow \alpha_{i,y} + d$ and $w \leftarrow w + d\, \phi_{i,y}$.

  • Shrinking heuristic: remove $\alpha_{i,y}$ from the active set if $\alpha_{i,y} = 0$ and its gradient $\nabla_{i,y} f(\alpha) > 0$.
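For the L2-loss dual above, this one-variable sub-problem has a closed-form solution: a projected Newton step. A sketch with an illustrative signature (the caller must also keep $\sum_y \alpha_{i,y}$ up to date):

```python
import numpy as np

def dcd_step(w, alpha_iy, phi_iy, delta, alpha_sum_i, C):
    """One coordinate step on alpha_{i,y} for the L2-loss structural SVM dual.

    w           : weight vector (numpy array), updated in place
    phi_iy      : phi(x_i, y_i) - phi(x_i, y)
    delta       : Delta(y_i, y)
    alpha_sum_i : current sum over y' of alpha_{i,y'} for example i
    """
    grad = w @ phi_iy - delta + alpha_sum_i / (2 * C)  # dual gradient
    qii = phi_iy @ phi_iy + 1.0 / (2 * C)              # curvature of f
    d = max(-alpha_iy, -grad / qii)                    # project: alpha + d >= 0
    w += d * phi_iy                                    # maintain w = sum alpha*phi
    return alpha_iy + d
```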


Synchronization

  • DEMI-DCD requires little synchronization.

  • Only the learning thread can write $w$, and only one active set selection thread can modify a given active set $A_i$.

  • Each thread maintains a local copy of $w$ and copies it to/from a shared buffer after a fixed number of iterations.




A Parallel Dual Coordinate Descent Algorithm

A master-slave architecture (MS-DCD):

  • Given $p$ processors: split the data into blocks, one per slave thread.

  • At each iteration:

    • The master sends the current model $w$ to the slave threads.

    • Each slave thread solves the loss-augmented inference problems associated with its data block and updates the active set.

    • After all slave threads finish, the master thread updates the model according to the active set.

  • Implemented in JLIS.

[Figure: the master sends the current $w$ to the slaves; each slave solves loss-augmented inference and updates $A$; the master then updates $w$ based on $A$.]


Structured Perceptron and Its Parallel Version

Structured Perceptron [Collins 02]:

  • At each iteration: pick an example and find the best structured output $\hat{y}$ according to the current model $w$.

  • Then update $w$ with a learning rate $\eta$: $w \leftarrow w + \eta\, (\phi(x, y) - \phi(x, \hat{y}))$.

SP-IPM [McDonald et al. 10]:

  1. Split the data into parts.

  2. Train a structured perceptron on each data block in parallel.

  3. Mix the models using a linear combination.

  4. Repeat step 2, using the mixed model as the initial model (sketched below).
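A compact sketch of SP-IPM's iterated parameter mixing; phi and argmax stand in for the feature map and inference routine, and in the real algorithm the per-block training runs in parallel:

```python
import numpy as np

def perceptron_epoch(w, block, phi, argmax):
    """One structured-perceptron pass over a data block (learning rate 1)."""
    w = w.copy()
    for x, y_gold in block:
        y_hat = argmax(w, x)                     # best output under current w
        if y_hat != y_gold:
            w += phi(x, y_gold) - phi(x, y_hat)  # perceptron update
    return w

def sp_ipm(data_blocks, dim, phi, argmax, rounds=5):
    """Train one perceptron per block, mix uniformly, repeat."""
    w = np.zeros(dim)
    for _ in range(rounds):
        models = [perceptron_epoch(w, b, phi, argmax) for b in data_blocks]
        w = np.mean(models, axis=0)              # mixed model seeds next round
    return w
```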




Experiment Settings

  • POS tagging (POS-WSJ):

    • Assign a POS label to each word in a sentence.

    • We use the standard Penn Treebank Wall Street Journal corpus with 39,832 sentences.

  • Entity and relation recognition (Entity-Relation):

    • Assign entity types to mentions and identify relations among them.

    • 5,925 training samples.

    • Inference is solved by an ILP solver.

  • We compare the following methods:

    • DEMI-DCD: the proposed method.

    • MS-DCD: a master-slave style parallel implementation of DCD.

    • SP-IPM: parallel structured perceptron.


Convergence on Primal Function Value

[Figure: relative primal function value difference (log scale) along training time, on POS-WSJ and Entity-Relation.]


Test Performance

[Figure: test performance along training time on POS-WSJ. SP-IPM converges to a different model.]


Test Performance

[Figure: test performance (Entity F1 and Relation F1) along training time on the Entity-Relation task.]


Moving Average of CPU Usage

[Figure: moving average of CPU usage on POS-WSJ and Entity-Relation. DEMI-DCD fully utilizes the CPU power, while CPU usage drops for the other methods because of synchronization.]




Conclusion

We proposed DEMI-DCD for training structural SVMs on multi-core machines.

    The proposed method decouples the model update and inference phases of learning.

    As a result, it can fully utilize all available processors to speed up learning.

Software will be available at: http://cogcomp.cs.illinois.edu/page/software

    Thank you.

