Learning structured outputs: a reinforcement learning approach

P. Gallinari

F. Maes, L. Denoyer

[email protected]

www-connex.lip6.fr

University Pierre et Marie Curie – Paris, France

Computer Science Lab.



Outline

  • Motivation and examples

  • Approaches for structured learning

    • Generative models

    • Discriminant models

  • Reinforcement learning for structured prediction

  • Experiments

ANNPR 2008 - P. Gallinari



Machine learning and structured data

  • Different types of problems

    • Model, classify, cluster structured data

    • Predict structured outputs

    • Learn to associate structured representations

  • Structured data and applications in many domains

    • chemistry, biology, natural language, the web, social networks, databases, etc.



Sequence labeling: POS

[Figure: example sentence tagged with parts of speech (coordinating conjunction, noun, 3rd-person verb, adverb, plural noun, determiner, gerund verb, plural verb, adjective)]



Segmentation + labeling: syntactic chunking (Washington Univ. tagger)

[Figure: example sentence segmented into labeled chunks (adverbial phrase, noun phrases, verb phrase)]



Syntactic Parsing (Stanford Parser)




Tree mapping:

  • Problem

    • query heterogeneous XML collections

    • Web information extraction

  • Need to know the correspondence between the structured representations, usually made by hand

  • Learn the correspondence between the different sources

    Labeled tree mapping problem




    Others

    • Taxonomies

    • Social networks

    • Adversarial computing: Web spam, blog spam, …

    • Translation

    • Biology

    • …..



    Is structure really useful? Can we make use of structure?

    • Yes

      • Evidence from many domains or applications

      • Mandatory for many problems

        • e.g. a 10,000-class classification problem

    • Yes but

      • Complex or long term dependencies often correspond to rare events

      • Practical evidence for large size problems

        • Simple models sometimes offer competitive results

          • Information retrieval

          • Speech recognition, etc



    Why is structure prediction difficult?

    • Size of the output space

      • # of potential labelings for a sequence

      • # of potential joint labelings and segmentations

      • # of potential labeled trees for an input sequence

      • Exponential output space

        • Inference often amounts to a combinatorial search problem

        • Scaling issues in most models
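To make the exponential blow-up concrete, here is a minimal sketch (illustrative numbers; the 45-tag figure corresponds to the usual Penn Treebank POS tagset size, used here only as an example):

```python
# Size of the output space for sequence labeling:
# each of the T positions can take any of L labels, so |Y| = L**T.
def num_labelings(num_labels: int, length: int) -> int:
    return num_labels ** length

# 2 labels over a size-3 sequence: the 8 leaves of a small search tree.
assert num_labelings(2, 3) == 8
# A 45-tag POS tagset over a 20-word sentence already exceeds 10**33 outputs.
assert num_labelings(45, 20) > 10**33
```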




    Structured learning

    • X, Y : input and output spaces

    • Structured output

      • y ∈ Y decomposes into parts of variable size

        • y = (y1, y2,…, yT)

      • Dependencies

        • Relations between y parts

        • Local, long term, global

    • Cost function

      • 0/1 loss:

      • Hamming loss:

      • F-score

      • BLEU, etc.
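The first two cost functions can be written down directly (a minimal sketch; outputs are assumed to be label sequences of equal length):

```python
from typing import Sequence

def zero_one_loss(y_true: Sequence, y_pred: Sequence) -> int:
    # 0/1 loss: 1 as soon as any part of the structured output is wrong.
    return int(tuple(y_true) != tuple(y_pred))

def hamming_loss(y_true: Sequence, y_pred: Sequence) -> int:
    # Hamming loss: number of mispredicted parts y_t.
    return sum(a != b for a, b in zip(y_true, y_pred))
```

The 0/1 loss ignores how close a prediction is; the Hamming loss credits partially correct outputs, which matters when |Y| is huge.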



    Learning structured outputs: a reinforcement learning approach

    • General approach

      • Predictive approach: ŷ = argmax_{y ∈ Y} F(x, y)

      • where F : X × Y → ℝ is a score function used to rank potential outputs

      • F trained to optimize some loss function

    • Inference problem

      • |Y| sometimes exponential

      • Argmax is often intractable: hypotheses

        • decomposability of the score function over the parts of y

        • Restricted set of outputs



    Learning structured outputs: a reinforcement learning approach

    • Structured algorithms differ by:

      • Feature encoding

      • Hypothesis on the output structure

      • Hypothesis on the cost function

      • Type of structure prediction problem




    Three main families of SP models

    Generative models

    Discriminant models

    Search models



    Generative models

    • 30 years of generative models

    • Hidden Markov Models

      • Many extensions

        • Hierarchical HMMs, Factorial HMMs, etc

    • Probabilistic Context-Free Grammars

    • Tree models, graph models



    Usual hypotheses

    • Features : “natural” encoding of the input

    • Local dependency hypothesis

      • On the outputs (Markov)

      • On the inputs

    • Cost function

      • Usually joint likelihood

      • Decomposes, e.g. as a sum of local costs over subparts

    • Inference and learning usually use dynamic programming

      • HMMs

        • Decoding complexity O(n|Q|²) for a sequence of length n and |Q| states

      • PCFGs

        • Decoding complexity: O(m³n³), with n the length of the sentence and m the # of non-terminals in the grammar

      • Prohibitive for many problems
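The O(n|Q|²) figure comes from Viterbi decoding; a toy log-space sketch (the dict-based parameterization is an illustrative choice, not a standard API):

```python
import math

def viterbi(obs, states, log_start, log_trans, log_emit):
    """Most likely HMM state sequence in O(n * |Q|**2) time.
    log_start[q], log_trans[q][q2], log_emit[q][o] are log-probabilities."""
    # delta[q] = best log-score of any path ending in state q
    delta = {q: log_start[q] + log_emit[q][obs[0]] for q in states}
    back = []
    for o in obs[1:]:
        prev, new_delta, ptr = delta, {}, {}
        for q in states:                      # |Q| target states ...
            best_p = max(states, key=lambda p: prev[p] + log_trans[p][q])
            ptr[q] = best_p                   # ... each scanning |Q| predecessors
            new_delta[q] = prev[best_p] + log_trans[best_p][q] + log_emit[q][o]
        back.append(ptr)
        delta = new_delta
    # backtrack from the best final state
    q = max(delta, key=delta.get)
    path = [q]
    for ptr in reversed(back):
        q = ptr[q]
        path.append(q)
    return list(reversed(path))
```

For each of the n positions, every state scans its |Q| possible predecessors, hence the n·|Q|² score evaluations.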



    Summary: generative SP models

    • Well known approaches

    • Large number of existing models

    • Local hypothesis

    • Often imply using dynamic programming

      • Inference

      • learning




    Discriminant models

    • Structured Perceptron (Collins 2002)

    • Breakthrough

      • Large margin methods (Tsochantaridis et al. 2004, Taskar 2004)

    • Many others now

    • Considered as an extension of multi-class classification



    Usual hypotheses

    • Joint representation of input – output Φ(x, y)

      • Encode potential dependencies among and between input and output

        • e.g. histogram of state transitions observed in training set, frequency of (xi,yj), POS tags, etc

      • Large feature sets (10² to 10⁴)

    • Linear score function: F(x, y) = ⟨θ, Φ(x, y)⟩

    • Decomposability of features set (outputs) and of the loss function




    Extension of large margin methods

    • 2 problems

      • Classical 0/1 loss is meaningless with structured outputs

        • Structured output prediction is not a simple extension of multiclass classification

        • Generalize max margin principle to other loss functions than 0/1 loss

      • Number of constraints for the QP problem is proportional to |Y|, i.e. potentially exponential

        • Different solutions

          • Consider only a polynomial number of “active” constraints -> bounds on the optimality of the solution

    • Learning requires solving an Argmax problem at each iteration

      • i.e. Dynamic programming
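As a concrete instance of training with a joint feature map Φ(x, y) and an argmax step, here is a minimal structured-perceptron-style sketch (toy features of my own choosing; the brute-force argmax stands in for the dynamic programming mentioned above and is only viable on tiny label sets):

```python
from itertools import product
from collections import Counter

def phi(x, y):
    """Toy joint feature map: (word, label) emission counts and
    (label, label) transition counts, histogram-style."""
    f = Counter()
    for word, label in zip(x, y):
        f[("emit", word, label)] += 1
    for a, b in zip(y, y[1:]):
        f[("trans", a, b)] += 1
    return f

def score(w, x, y):
    # linear score <w, phi(x, y)>
    return sum(w.get(k, 0.0) * v for k, v in phi(x, y).items())

def argmax_y(w, x, labels):
    # brute force over all |labels|**len(x) outputs; real systems
    # use dynamic programming here
    return max(product(labels, repeat=len(x)), key=lambda y: score(w, x, y))

def structured_perceptron(data, labels, epochs=5):
    w = Counter()
    for _ in range(epochs):
        for x, y in data:
            y_hat = argmax_y(w, x, labels)
            if y_hat != tuple(y):
                w.update(phi(x, y))        # promote gold features
                w.subtract(phi(x, y_hat))  # demote predicted ones
    return w
```

When the predicted output is wrong, gold features are promoted and predicted ones demoted; large-margin methods replace this update with a QP but keep the same argmax subroutine.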




    Summary: discriminant approaches

    • Hypothesis

      • Local dependencies for the output

      • Decomposability of the loss function

    • Long term dependencies in the input

    • Nice convergence properties + bounds

    • Complexity

      • Learning often does not scale




    Reinforcement learning for structured outputs



    Precursors

    • Incremental Parsing, Collins 2004

    • SEARN, Daumé et al. 2006

    • Reinforcement Learning, Maes et al. 2007




    General idea

    • The structured output will be built incrementally

      • ŷ = (ŷ1, ŷ2,…, ŷT)

      • Different from solving the prediction problem directly

    • It will be built via machine learning

    • Loss :

    • Goal

      • Construct incrementally ŷ so as to minimize the loss

      • Training: learn how to search the solution space

      • Inference: build ŷ as a sequence of successive decisions



    Example: sequence labelling

    Example

    2 labels, R and B

    Search space :

    (input sequence, {sequence of labels})

    For a size 3 sequence

    x = x1x2x3 :

    A node represents a state in the search space



    Example: actions and decisions

    Sequence of size 3

    target :

    [Figure: actions available from each state, each branch labeled with the decision function Π(s, a)]




    Example : expected loss

    [Figure: search tree for a size-3 sequence and its target labeling; branches carry local costs C and leaves carry total losses CT (0 to 3). The loss does not always separate!]




    Inference

    • Suppose we have a policy function F which decides at each step which action to take

    • Inference could be performed by computing

      • ŷ1 = F(x, ·), ŷt = F(ŷ1, …, ŷt−1), …, ŷT = F(ŷ1, …, ŷT−1)

      • ŷ = (ŷ1, …, ŷT)

    • No Dynamic programming needed
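That greedy loop is only a few lines (a sketch; `policy` is an assumed stand-in for the learned decision function F):

```python
def infer(x, policy):
    # Build the output one decision at a time: no dynamic programming,
    # just T successive calls to the decision function.
    partial = ()
    for _ in range(len(x)):
        a = policy(x, partial)   # next elementary decision
        if a is None:            # the policy may signal completion early
            break
        partial = partial + (a,)
    return partial
```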




    Example: state space exploration guided by local costs

    [Figure: search tree for a size-3 sequence explored using the local costs C; leaves show total losses CT. Goal: generalize to unseen situations.]




    Training

    • Learn a policy F

      • From a set of (input, output) pairs

      • Which takes the optimal action at each state

      • Which is able to generalize to unknown situations




    Reinforcement learning for SP

    • Formalizes search based ideas using Markov Decision Processes as model and Reinforcement Learning for training

    • Provides a general framework for this approach

    • Many RL algorithms could be used for training

    • Differences with classical use of RL

      • Specific hypothesis

        • Deterministic environment

      • Problem size

        • State: up to 10⁶ variables

      • Generalization properties




    Markov Decision Process

    • A MDP is a tuple (S, A, P, R)

      • S is the State space,

      • A is the action space,

      • P is a transition function describing the dynamic of the environment

        • P(s, a, s’) = P(st+1 = s’| st = s, at = a)

      • R is a reward function

        • R(s, a, s’) = E[rt | st+1 = s’, st = s, at = a]
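For the structured-prediction use made of MDPs here, the tuple can be instantiated as a deterministic MDP for left-to-right sequence labeling (an illustrative sketch; all names are assumptions):

```python
from dataclasses import dataclass
from typing import Tuple

@dataclass(frozen=True)
class State:
    x: Tuple[str, ...]  # input sequence (fixed throughout the episode)
    y: Tuple[str, ...]  # partial output built so far

def actions(s: State, labels):
    # An action appends a label for the next position; none remain at the end.
    return [] if len(s.y) == len(s.x) else list(labels)

def transition(s: State, a: str) -> State:
    # P is deterministic: exactly one successor state per (s, a).
    return State(s.x, s.y + (a,))

def reward(s: State, a: str, y_gold) -> float:
    # R measures prediction quality: +1 when the appended label is correct.
    return 1.0 if y_gold[len(s.y)] == a else 0.0
```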




    Example: Prediction problem




    Example: State space

    • Deterministic MDP

      • States: input + partial output

      • Actions: elementary modifications of the partial output

      • Rewards: quality of prediction




    Example: partially observable reward

    • The reward function is only partially observable: only a limited set of training (input, output) pairs is available



    Learning the MDP

    • The reward function cannot be computed on the whole MDP

    • Approximated RL

      • Learn the policy on a subspace

      • Generalize to the whole space

    • The policy is learned as a linear function

      • Φ(s, a) is a joint description of (s, a)

      • θ is a parameter vector

    • Learning algorithm

      • Sarsa, Q-learning, etc
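A minimal sketch of such approximated learning: Q-learning with a linear value function over a joint description Φ(s, a), run on a deterministic toy labeling problem (features, parameters, and data are all illustrative assumptions, not the authors' exact setup):

```python
import random
from collections import defaultdict

def features(x, partial, a):
    # Hypothetical joint description Phi(s, a):
    # (current word, action) and (previous predicted label, action).
    prev = partial[-1] if partial else "<s>"
    return [("word", x[len(partial)], a), ("prev", prev, a)]

def q_value(theta, x, partial, a):
    # Linear action value: <theta, Phi(s, a)>
    return sum(theta[f] for f in features(x, partial, a))

def q_learning(data, labels, episodes=300, alpha=0.1, eps=0.2, seed=0):
    rng = random.Random(seed)
    theta = defaultdict(float)
    for _ in range(episodes):
        x, y_gold = rng.choice(data)
        partial = ()
        for t in range(len(x)):
            # epsilon-greedy action selection
            if rng.random() < eps:
                a = rng.choice(labels)
            else:
                a = max(labels, key=lambda l: q_value(theta, x, partial, l))
            r = 1.0 if a == y_gold[t] else 0.0
            nxt = partial + (a,)
            # deterministic environment: bootstrap from the greedy successor
            if t + 1 == len(x):
                target = r
            else:
                target = r + max(q_value(theta, x, nxt, l) for l in labels)
            delta = target - q_value(theta, x, partial, a)
            for f in features(x, partial, a):
                theta[f] += alpha * delta
            partial = nxt
    return theta

def greedy_decode(theta, x, labels):
    partial = ()
    for _ in x:
        partial += (max(labels, key=lambda l: q_value(theta, x, partial, l)),)
    return partial
```

The learned θ generalizes across states through shared features, which is what lets the policy handle situations never seen during training.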




    Inference

    • State at step t

      • Initial input

      • Partial output

    • Once the MDP is learned

      • Using the state at step t, compute state t+1

      • Until a solution is reached

    • Greedy approach

      • Only the best action is chosen at each time




    Structured outputs and MDP

    • State st

      • Input x + partial output ŷt

      • Initial state: (x, ∅)

    • Actions

      • Task dependent

        • POS: new tag for the current word

        • Tree mapping: insert a new path in a partial tree

    • Inference

      • Greedy approach

        • Only the best action is chosen at each time

    • Reward

      • Final:

      • Heuristic:



    Example: sequence labeling

    • Left Right model

      • Actions: label

    • Order free model

      • Actions: label + position

    • Loss: Hamming cost or F-score

    • Tasks

      • Named entity recognition (shared task at CoNLL 2002 – 8,000 train, 1,500 test)

      • Chunking – noun phrases (CoNLL 2002)

      • Handwritten word recognition (5,000 train, 1,000 test)

    • Complexity of inference

      • O(sequence length × number of labels)






    Tree mapping: Document mapping

    • Problem

      Labeled tree mapping problem

      Different instances

      Flat text to XML

      HTML to XML

      XML to XML….




    Document mapping problem

    • Central issue: Complexity

      • Large collections

      • Large feature space: 10³ to 10⁶

      • Large search space (exponential)

      • Other SP methods fail



    XML structuring

    • Action

      • Attach path ending with the current leaf to a position in the current partial tree

      • Φ(a,s) encodes a series of potential (state, action) pairs

    • Loss: F-Score for trees



    [Figure sequence: step-by-step tree mapping on an example. Input HTML tree (HTML, HEAD, BODY, TITLE, IT, H1, P, FONT) over the leaves "Example", "Francis MAES", "Title of the section", "Welcome to INEX", "This is a footnote"; target XML tree (DOCUMENT, TITLE, AUTHOR, SECTION, TEXT) over the same leaves. Each step attaches one more leaf path to the partial target tree.]



    Results




    Conclusion

    • Learn to explore the state space of the problem

    • Alternative to DP or classical search algorithms

    • Could be used with any decomposable cost function

    • Fewer assumptions than other methods (e.g. no Markov hypothesis)

    • Leads to solutions that may scale with complex and large SP problems




    More on this

    • Paper at European Workshop on Reinforcement Learning 2008

    • Preliminary paper ECML 2007

    • Library available

    • http://nieme.lip6.fr


