Structured Prediction: A Large Margin Approach

Structured Prediction:A Large Margin Approach Ben Taskar University of Pennsylvania Joint work with: V. Chatalbashev, M. Collins, C. Guestrin, M. Jordan, D. Klein, D. Koller, S. Lacoste-Julien, C. Manning

“Don’t worry, Howard. The big questions are multiple choice.”

Handwriting Recognition x y brace Sequential structure

Object Segmentation x y Spatial structure

Natural Language Parsing x y The screen was a sea of red Recursive structure

Bilingual Word Alignment En vertu de les nouvelles propositions , quel est le coût prévu de perception de les droits ? x y What is the anticipated cost of collecting fees under the new proposal ? What is the anticipated cost of collecting fees under the new proposal? En vertu des nouvelles propositions, quel est le coût prévu de perception des droits? Combinatorial structure

Protein Structure and Disulfide Bridges AVITGACERDLQCG KGTCCAVSLWIKSV RVCTPVGTSGEDCH PASHKIPFSGQRMH HTCPCAPNLACVQT SPKKFKCLSK Protein: 1IMT

Local Prediction Classify using local information  Ignores correlations & constraints! b r a c e

building tree shrub ground Local Prediction

Structured Prediction • Use local information • Exploit correlations b r a c e

building tree shrub ground Structured Prediction

Outline • Structured prediction models • Sequences (CRFs) • Trees (CFGs) • Associative Markov networks (Special MRFs) • Matchings • Structured large margin estimation • Margins and structure • Min-max formulation • Linear programming inference • Certificate formulation

Structured Models Mild assumption: linear combination scoring function space of feasible outputs

a-z a-z a-z a-z a-z Chain Markov Net (aka CRF*) y x *Lafferty et al. 01

Associative Markov Nets Edge features Point features spin-images, point height length of edge, edge orientation “associative” restriction j yj jk yk

CFG Parsing #(NP  DT NN) … #(PP  IN NP) … #(NN  ‘sea’)

Bilingual Word Alignment • position • orthography • association En vertu de les nouvelles propositions , quel est le coût prévu de perception de le droits ? What is the anticipated cost of collecting fees under the new proposal ? k j

Disulfide Bonds: Non-bipartite Matching RSCCPCYWGGCPWGQNCYPEGCSGPKV 1 2 3 4 5 6 2 3 1 5 1 4 2 6 4 6 5 3 Fariselli & Casadio `01, Baldi et al. ‘04

2 3 1 4 6 5 Scoring Function RSCCPCYWGGCPWGQNCYPEGCSGPKV 1 2 3 4 5 6 RSCCPCYWGGCPWGQNCYPEGCSGPKV 1 2 3 4 5 6 • amino acid identities • phys/chem properties

Structured Models Mild assumptions: linear combination sum of part scores scoring function space of feasible outputs

Supervised Structured Prediction Model: Prediction Learning Data Estimatew Example: Weighted matching Generally: Combinatorialoptimization Local (ignores structure) Margin Likelihood (intractable)

OCR Example • We want: • Equivalently: “brace” “brace” “aaaaa” “brace” “aaaab” a lot! … “brace” “zzzzz”

S S A E B F G C H D S S S S A A A A B B B B S C C C C D D D D A B D F Parsing Example • We want: • Equivalently: ‘It was red’ ‘It was red’ ‘It was red’ ‘It was red’ ‘It was red’ a lot! … ‘It was red’ ‘It was red’

1 2 3 1 2 3 1 2 3 1 2 3 1 2 3 1 2 3 1 2 3 1 2 3 1 2 3 1 2 3 1 2 3 1 2 3 Alignment Example • We want: • Equivalently: ‘What is the’ ‘Quel est le’ 1 2 3 1 2 3 ‘What is the’ ‘Quel est le’ ‘What is the’ ‘Quel est le’ ‘What is the’ ‘Quel est le’ ‘What is the’ ‘Quel est le’ a lot! … ‘What is the’ ‘Quel est le’ ‘What is the’ ‘Quel est le’

S S B B D E A A C C 1 2 3 1 2 3 1 2 3 1 2 3 1 2 3 1 2 3 S A B S C D A E C D 1 2 3 1 2 3 Structured Loss b c a r e 2 b r o r e 2 b r o c e 1 b r a c e 0 0 1 2 2 0 1 2 3 ‘What is the’ ‘Quel est le’ ‘It was red’

Large margin estimation • Given training examples , we want: • Maximize margin • Mistake weighted margin: # of mistakes in y *Collins 02, Altun et al 03, Taskar 03

Large margin estimation • Eliminate • Add slacks for inseparable case

Large margin estimation • Brute force enumeration • Min-max formulation • ‘Plug-in’ linear program for inference

Min-max formulation Structured loss (Hamming): Inference LP Inference Key step: discrete optim. continuous optim.

y  z Map for Markov Nets

Markov Net Inference LP normalization agreement Has integral solutions z for chains, trees Can be fractional for untriangulated networks

Associative MN Inference LP “associative” restriction • For K=2, solutions are always integral (optimal) • For K>2, within factor of 2 of optimal

CFG Chart • CNF tree = set of two types of parts: • Constituents (A, s, e) • CF-rules (A  B C, s, m, e)

CFG Inference LP root inside outside Has integral solutions z

Matching Inference LP En vertu de les nouvelles propositions , quel est le coût prévu de perception de le droits ? k What is the anticipated cost of collecting fees under the new proposal ? degree j Has integral solutions z

LP Duality • Linear programming duality • Variables  constraints • Constraints  variables • Optimal values are the same • When both feasible regions are bounded

Min-max Formulation LP duality

Min-max formulation summary • Formulation produces concise QP for • Low-treewidth Markov networks • Associative MNs (K=2) • Context free grammars • Bipartite matchings • Approximate for untriangulated MNs, AMNs with K>2 *Taskar et al 04

Unfactored Primal/Dual QP duality Exponentially many constraints/variables

Factored Primal/Dual By QP duality Dual inherits structure from problem-specific inference LP Variables  correspond to a decomposition of  variables of the flat case

The Connection b c a r e 2 .2 b r o r e 2 .15 b r o c e .25 1 b r a c e .4 0 r c a 1 1 .65 .8 .6 e b  c r o .4 .35 .2

Duals and Kernels • Kernel trick works: • Factored dual • Local functions (log-potentials) can use kernels

Alternatives: Perceptron • Simple iterative method • Unstable for structured output: fewer instances, big updates • May not converge if non-separable • Noisy • Voted / averaged perceptron [Freund & Schapire 99, Collins 02] • Regularize / reduce variance by aggregating over iterations

Alternatives: Constraint Generation • Add most violated constraint • Handles more general loss functions • Only polynomial # of constraints needed • Need to re-solve QP many times • Worst case # of constraints larger than factored [Collins 02; Altun et al, 03; Tsochantaridis et al, 04]

raw pixels quadratic kernel cubic kernel Handwriting Recognition Length: ~8 chars Letter: 16x8 pixels 10-fold Train/Test 5000/50000 letters 600/6000 words Models: Multiclass-SVMs* CRFs M3 nets 30 better 25 20 Test error (average per-character) 15 10 45% error reduction over linear CRFs 33% error reduction over multiclass SVMs 5 0 MC–SVMs M^3 nets CRFs *Crammer & Singer 01

Hypertext Classification • WebKB dataset • Four CS department websites: 1300 pages/3500 links • Classify each page: faculty, course, student, project, other • Train on three universities/test on fourth better relaxed dual 53% errorreduction over SVMs 38% error reduction over RMNs loopy belief propagation *Taskar et al 02

3D Mapping Data provided by: Michael Montemerlo & Sebastian Thrun Laser Range Finder GPS IMU Label: ground, building, tree, shrub Training: 30 thousand points Testing: 3 million points

Structured Prediction: A Large Margin Approach

Structured Prediction: A Large Margin Approach

Presentation Transcript

The History and Future of Weather Radar and Storm Prediction

Smart Home Technologies

Texas Margin Tax on Business Entities

Data Mining: Classification

Prediction of Natural Disasters

IEC5310 Computer Architecture Chapter 4 Exploiting ILP with Software Approach

Protein Structure Prediction

Application of DICOM Structured Report

Two-level Adaptive Branch Prediction

Chapter 6. Classification and Prediction

CS252 Graduate Computer Architecture Lecture 11 Vectors, Branch Prediction, Dependence Speculation, and Data Prediction

Prediction (Classification, Regression)

Gene Prediction

Classification and Prediction

Secondary Structure Prediction

Protein structure prediction

Scalable Real-Time Negotiation Toolkit Organizational-Structured, Distributed Resource Allocation

Structured P2P Networks

XML 入门与精通

Structured Prediction with Perceptrons and CRFs

Structured P2P Networks