Presentation Transcript
learning structured outputs: a reinforcement learning approach

Learning structured outputs: A reinforcement learning approach

P. Gallinari

F. Maes, L. Denoyer

[email protected]

www-connex.lip6.fr

University Pierre et Marie Curie – Paris – France

Computer Science Lab.

outline
Outline
  • Motivation and examples
  • Approaches for structured learning
    • Generative models
    • Discriminant models
  • Reinforcement learning for structured prediction
  • Experiments

ANNPR 2008 - P. Gallinari

machine learning and structured data
Machine learning and structured data
  • Different types of problems
    • Model, classify, cluster structured data
    • Predict structured outputs
    • Learn to associate structured representations
  • Structured data and applications in many domains
    • chemistry, biology, natural language, web, social networks, databases, etc.

sequence labeling pos

Sequence labeling: POS

[Figure: an example sentence tagged with parts of speech: coordinating conjunction, noun, 3rd-person verb, adverb, plural noun, determiner, gerund verb, plural verb, adjective]

segmentation labeling syntactic chunking washington univ tagger

Segmentation + labeling: syntactic chunking (Washington Univ. tagger)

[Figure: an example sentence segmented into chunks labeled adverbial phrase, noun phrase, verb phrase]

syntactic parsing stanford parser
Syntactic Parsing (Stanford Parser)

tree mapping
Tree mapping:
  • Problem
      • Querying heterogeneous XML collections
      • Web information extraction
  • Requires knowing the correspondence between the structured representations, which is usually established by hand
  • Goal: learn the correspondence between the different sources

Labeled tree mapping problem

others
Others
  • Taxonomies
  • Social networks
  • Adversarial computing: Web spam, blog spam, …
  • Translation
  • Biology
  • …

is structure really useful can we make use of structure
Is structure really useful? Can we make use of structure?
  • Yes
    • Evidence from many domains and applications
    • Mandatory for many problems
      • e.g. a classification problem with 10,000 classes
  • Yes, but
    • Complex or long-term dependencies often correspond to rare events
    • Practical evidence on large problems
        • Simple models sometimes offer competitive results
          • Information retrieval
          • Speech recognition, etc.

why is structure prediction difficult
Why is structure prediction difficult?
  • Size of the output space
    • # of potential labelings for a sequence
    • # of potential joint labelings and segmentations
    • # of potential labeled trees for an input sequence
    • Exponential output space
      • Inference often amounts to a combinatorial search problem
      • Scaling issues in most models

structured learning
Structured learning
  • X, Y : input and output spaces
  • Structured output
    • y ∈ Y decomposes into parts of variable size
      • y = (y1, y2,…, yT)
    • Dependencies
      • Relations between the parts of y
      • Local, long-term, global
  • Cost function
    • 0/1 loss: Δ(y, ŷ) = 1 if ŷ ≠ y, else 0
    • Hamming loss: Δ(y, ŷ) = # of parts t where ŷt ≠ yt
    • F-score
    • BLEU, etc.
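The two decomposable losses above can be written directly; a minimal sketch, with label sequences represented as tuples:

```python
def zero_one_loss(y, y_hat):
    # 0/1 loss: 1 if the predicted structure differs anywhere from the target
    return int(y != y_hat)

def hamming_loss(y, y_hat):
    # Hamming loss: number of parts (e.g. sequence positions) that are wrong
    return sum(yi != yi_hat for yi, yi_hat in zip(y, y_hat))
```

Note that a single wrong part already costs 1 under the 0/1 loss, while the Hamming loss credits partially correct outputs, which matters when learning guides an incremental search.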

slide12
General approach
    • Predictive approach: ŷ = argmax over y ∈ Y of F(x, y)
    • where F : X × Y → R is a score function used to rank potential outputs
    • F is trained to optimize some loss function
  • Inference problem
    • |Y| is sometimes exponential
    • The argmax is often intractable; hypotheses:
      • decomposability of the score function over the parts of y
      • restricted set of outputs

slide13
Structured algorithms differ by:
    • Feature encoding
    • Hypothesis on the output structure
    • Hypothesis on the cost function
    • Type of structure prediction problem

three main families of sp models

Three main families of SP models
  • Generative models
  • Discriminant models
  • Search models

generative models
Generative models
  • 30 years of generative models
  • Hidden Markov Models
    • Many extensions
      • Hierarchical HMMs, factorial HMMs, etc.
  • Probabilistic Context-Free Grammars
  • Tree models, graph models

usual hypothesis
Usual hypothesis
  • Features: “natural” encoding of the input
  • Local dependency hypothesis
    • On the outputs (Markov)
    • On the inputs
  • Cost function
    • Usually the joint likelihood
    • Decomposes, e.g. as a sum of local costs on each subpart
  • Inference and learning usually use dynamic programming
    • HMMs
      • Decoding complexity O(n|Q|²) for a sequence of length n and |Q| states
    • PCFGs
      • Decoding complexity O(m³n³), with n the length of the sentence and m the # of non-terminals in the grammar
    • Prohibitive for many problems
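The O(n|Q|²) HMM decoding above is the classic Viterbi recursion. A minimal sketch with log-probabilities stored in plain dicts (the data layout and names are illustrative, not from the slides):

```python
def viterbi(obs, states, log_start, log_trans, log_emit):
    """Most likely HMM state sequence; O(n * |Q|^2) as noted above.
    log_start[q], log_trans[q][q2], log_emit[q][o] are log-probabilities."""
    # delta[q] = best log-score of any state path ending in state q
    delta = {q: log_start[q] + log_emit[q][obs[0]] for q in states}
    back = []
    for o in obs[1:]:
        prev, delta, ptr = delta, {}, {}
        for q in states:
            # inner max over predecessor states gives the |Q|^2 factor
            best = max(states, key=lambda p: prev[p] + log_trans[p][q])
            delta[q] = prev[best] + log_trans[best][q] + log_emit[q][o]
            ptr[q] = best
        back.append(ptr)
    last = max(states, key=lambda q: delta[q])   # best final state
    path = [last]
    for ptr in reversed(back):                   # follow back-pointers
        path.append(ptr[path[-1]])
    return list(reversed(path))
```

The per-position max over all predecessor states is exactly what becomes prohibitive when |Q| is large, which motivates the search-based alternative presented later.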

summary generative sp models
Summary: generative SP models
  • Well-known approaches
  • Large number of existing models
  • Local hypothesis
  • Often imply using dynamic programming
    • Inference
    • Learning

discriminant models
Discriminant models
  • Structured Perceptron (Collins 2002)
  • Breakthrough
    • Large-margin methods (Tsochantaridis et al. 2004, Taskar 2004)
  • Many others now
  • Considered as an extension of multi-class classification

usual hypothesis1
Usual hypothesis
  • Joint representation of input and output: Φ(x, y)
    • Encodes potential dependencies among and between inputs and outputs
      • e.g. histogram of state transitions observed in the training set, frequency of (xi, yj), POS tags, etc.
    • Large feature sets (10² to 10⁴)
  • Linear score function: F(x, y) = ⟨θ, Φ(x, y)⟩
  • Decomposability of the feature set (outputs) and of the loss function
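A sketch of the joint representation and linear score, using emission pairs and a label-transition histogram to stand in for the feature examples above (feature names and encodings are illustrative):

```python
from collections import Counter

def joint_features(x, y):
    """Phi(x, y): sparse joint features of a token sequence x
    and a label sequence y."""
    phi = Counter()
    for xi, yi in zip(x, y):
        phi[("emit", xi, yi)] += 1       # frequency of (x_i, y_i) pairs
    for y1, y2 in zip(y, y[1:]):
        phi[("trans", y1, y2)] += 1      # histogram of label transitions
    return phi

def score(theta, x, y):
    # linear score F(x, y) = <theta, Phi(x, y)>
    return sum(theta.get(f, 0.0) * v for f, v in joint_features(x, y).items())
```

Because both feature families decompose over adjacent parts of y, the argmax over outputs can still be organized by dynamic programming, which is exactly the decomposability assumption stated above.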

extension of large margin methods
Extension of large-margin methods
  • 2 problems
    • The classical 0/1 loss is meaningless with structured outputs
      • Structured output prediction is not a simple extension of multiclass classification
      • Generalize the max-margin principle to loss functions other than the 0/1 loss
    • The number of constraints of the QP problem is proportional to |Y|, i.e. potentially exponential
      • Different solutions
        • Consider only a polynomial number of “active” constraints -> bounds on the optimality of the solution
  • Learning requires solving an argmax problem at each iteration
    • i.e. dynamic programming

summary discriminant approaches
Summary: discriminant approaches
  • Hypothesis
    • Local dependencies for the output
    • Decomposability of the loss function
  • Long term dependencies in the input
  • Nice convergence properties + bounds
  • Complexity
    • Learning often does not scale

precursors
Precursors
  • Incremental Parsing, Collins 2004
  • SEARN, Daume et al. 2006
  • Reinforcement Learning, Maes et al. 2007

general idea
General idea
  • The structured output is built incrementally
    • ŷ = (ŷ1, ŷ2,…, ŷT)
    • Different from solving the prediction problem directly
  • It is built via machine learning
  • Loss: Δ(y, ŷ)
  • Goal
    • Construct ŷ incrementally so as to minimize the loss
    • Training: learn how to search the solution space
    • Inference: build ŷ as a sequence of successive decisions

example sequence labelling
Example: sequence labelling
  • 2 labels, R and B
  • Search space: (input sequence, {sequences of labels})
  • For a sequence of size 3, x = x1x2x3
  • A node represents a state in the search space

[Figure: search tree over partial label sequences for x1x2x3]

example actions and decisions
Example: actions and decisions
  • Sequence of size 3, with a target labeling
  • Π(s, a): decision function scoring action a in state s

[Figure: search tree with each branch scored by the decision function Π(s, a)]

example expected loss
Example: expected loss
  • Sequence of size 3, with a target labeling
  • Each action has a local cost C ∈ {0, 1}; each complete path accumulates a total cost CT
  • Loss does not always separate!

[Figure: search tree annotated with local costs C and total costs CT per path]

inference
Inference
  • Suppose we have a policy function F which decides at each step which action to take
  • Inference can be performed by computing
    • ŷ1 = F(x, ·), ŷ2 = F(x, ŷ1), …, ŷT = F(x, ŷ1,…, ŷT-1)
    • ŷ = (ŷ1,…, ŷT)
  • No dynamic programming needed
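The inference loop above can be sketched directly; `policy_score` stands in for the learned policy F and is an assumed callback, not part of the original method's code:

```python
def greedy_inference(x, policy_score, actions):
    """Build y-hat part by part: at each step take the single best action
    under the learned policy -- no dynamic programming needed.
    policy_score(x, partial, a) scores action a given the partial output."""
    partial = []
    for _ in range(len(x)):          # one decision per output part
        best = max(actions, key=lambda a: policy_score(x, tuple(partial), a))
        partial.append(best)
    return tuple(partial)
```

Each step is a single argmax over actions, so the cost is linear in the sequence length times the number of actions, in contrast with the Viterbi-style decoding of the earlier models.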

example state space exploration guided by local costs
Example: state space exploration guided by local costs
  • Sequence of size 3, with a target labeling
  • Goal: generalize to unseen situations

[Figure: search tree annotated with local costs C and total costs CT; exploration follows low-cost actions]

training
Training
  • Learn a policy F
    • From a set of (input, output) pairs
    • Which takes the optimal action at each state
    • Which is able to generalize to unknown situations

reinforcement learning for sp
Reinforcement learning for SP
  • Formalizes search-based ideas, using Markov Decision Processes as the model and Reinforcement Learning for training
  • Provides a general framework for this approach
  • Many RL algorithms could be used for training
  • Differences with classical uses of RL
    • Specific hypothesis
      • Deterministic environment
    • Problem size
      • State: up to 10⁶ variables
    • Generalization properties
markov decision process
Markov Decision Process
  • An MDP is a tuple (S, A, P, R)
    • S is the state space
    • A is the action space
    • P is a transition function describing the dynamics of the environment
      • P(s, a, s’) = P(st+1 = s’ | st = s, at = a)
    • R is a reward function
      • R(s, a, s’) = E[rt | st+1 = s’, st = s, at = a]

example prediction problem
Example: Prediction problem

example state space
Example: State space
  • Deterministic MDP
    • States: input + partial output
    • Actions: elementary modifications of the partial output
    • Rewards: quality of prediction

example partially observable reward
Example: partially observable reward
  • The reward function is only partially observable: we only have a limited set of training (input, output) pairs

learning the mdp
Learning the MDP
  • The reward function cannot be computed on the whole MDP
  • Approximated RL
    • Learn the policy on a subspace
    • Generalize to the whole space
  • The policy is learned as a linear function: Q(s, a) = ⟨θ, Φ(s, a)⟩
    • Φ(s, a) is a joint description of (s, a)
    • θ is a parameter vector
  • Learning algorithms
    • Sarsa, Q-learning, etc.
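The approximated-RL setup above (a linear Q over Φ(s, a) trained with Sarsa) can be sketched as follows. This is an illustrative sketch, not the authors' implementation: `phi`, `step`, and `reward` are assumed callbacks describing the deterministic environment.

```python
import random

def sarsa_linear(episodes, actions, phi, step, reward,
                 alpha=0.1, gamma=1.0, epsilon=0.1, seed=0):
    """Sarsa with a linear approximation Q(s, a) = <theta, phi(s, a)>,
    for a deterministic environment as assumed on the slide.
    step(s, a) returns the next state, or None at episode end."""
    rng = random.Random(seed)
    theta = {}

    def q(s, a):  # sparse dot product <theta, phi(s, a)>
        return sum(theta.get(f, 0.0) * v for f, v in phi(s, a).items())

    def pick(s):  # epsilon-greedy action selection
        if rng.random() < epsilon:
            return rng.choice(actions)
        return max(actions, key=lambda a: q(s, a))

    for s in episodes:                # each element is an initial state
        a = pick(s)
        while s is not None:
            s2 = step(s, a)
            a2 = pick(s2) if s2 is not None else None
            target = reward(s, a) + (gamma * q(s2, a2) if s2 is not None else 0.0)
            td = target - q(s, a)     # temporal-difference error
            for f, v in phi(s, a).items():
                theta[f] = theta.get(f, 0.0) + alpha * td * v
            s, a = s2, a2
    return theta
```

Because the environment is deterministic and the reward is only known on the training pairs, learning amounts to fitting θ on the explored subspace and relying on Φ(s, a) to generalize elsewhere, as the slide states.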

inference1
Inference
  • State at step t
    • Initial input
    • Partial output
  • Once the MDP is learned
    • Use the state at step t to compute the state at step t+1
    • Until a solution is reached
  • Greedy approach
    • Only the best action is chosen at each step

structured outputs and mdp
Structured outputs and MDP
  • State st
    • Input x + partial output ŷt
    • Initial state: (x, ∅)
  • Actions
    • Task-dependent
      • POS: new tag for the current word
      • Tree mapping: insert a new path in a partial tree
  • Inference
    • Greedy approach
      • Only the best action is chosen at each step
  • Reward
    • Final:
    • Heuristic:

example sequence labeling
Example: sequence labeling
  • Left-right model
    • Actions: label
  • Order-free model
    • Actions: label + position
  • Loss: Hamming cost or F-score
  • Tasks
    • Named entity recognition (shared task at CoNLL 2002; 8,000 train, 1,500 test)
    • Chunking of noun phrases (CoNLL 2002)
    • Handwritten word recognition (5,000 train, 1,000 test)
  • Complexity of inference
    • O(sequence size × number of labels)

tree mapping document mapping
Tree mapping: document mapping
  • Problem: labeled tree mapping
  • Different instances
    • Flat text to XML
    • HTML to XML
    • XML to XML, …

document mapping problem
Document mapping problem
  • Central issue: complexity
    • Large collections
    • Large feature spaces: 10³ to 10⁶ features
    • Large (exponential) search space
    • Other SP methods fail

xml structuring
XML structuring
  • Action
    • Attach the path ending with the current leaf to a position in the current partial tree
    • Φ(s, a) encodes a series of potential (state, action) pairs
  • Loss: F-score for trees

slide44 – slide58

[Figures (slides 44–58): step-by-step animation of the document mapping example. The input HTML tree (HTML > HEAD > TITLE; BODY > IT, H1, P, FONT; carrying the texts “Example”, “Francis MAES”, “Title of the section”, “Welcome to INEX”, “This is a footnote”) is incrementally converted into the target XML tree (DOCUMENT with TITLE, AUTHOR, SECTION, and TEXT nodes over the same texts), attaching one labeled path at each step.]

results
Results

conclusion
Conclusion
  • Learn to explore the state space of the problem
  • An alternative to DP and classical search algorithms
  • Can be used with any decomposable cost function
  • Fewer assumptions than other methods (e.g. no Markov hypothesis)
  • Leads to solutions that may scale to complex and large SP problems

more on this
More on this
  • Paper at the European Workshop on Reinforcement Learning 2008
  • Preliminary paper at ECML 2007
  • Library available
  • http://nieme.lip6.fr
