Learning structured outputs: A reinforcement learning approach



Presentation Transcript


  1. Learning structured outputs: A reinforcement learning approach P. Gallinari, F. Maes, L. Denoyer Patrick.gallinari@lip6.fr www-connex.lip6.fr University Pierre et Marie Curie – Paris – Fr Computer Science Lab.

  2. Outline • Motivation and examples • Approaches for structured learning • Generative models • Discriminant models • Reinforcement learning for structured prediction • Experiments ANNPR 2008 - P. Gallinari

  3. Machine learning and structured data • Different types of problems • Model, classify, cluster structured data • Predict structured outputs • Learn to associate structured representations • Structured data and applications in many domains • chemistry, biology, natural language, web, social networks, databases, etc.

  4. Sequence labeling: POS tagging (Figure: a sentence tagged with parts of speech such as coordinating conjunction, noun, 3rd-person verb, adverb, plural noun, determiner, gerund verb, plural verb, adjective.)

  5. Segmentation + labeling: syntactic chunking (Washington Univ. tagger) (Figure: a sentence segmented into noun phrases, verb phrases, and an adverbial phrase.)

  6. Syntactic parsing (Stanford Parser)

  7. Tree mapping • Problem • Querying heterogeneous XML collections • Web information extraction • Need to know the correspondence between the structured representations – usually established by hand • Goal: learn the correspondence between the different sources • Labeled tree mapping problem

  8. Others • Taxonomies • Social networks • Adversarial computing: web spam, blog spam, … • Translation • Biology • …

  9. Is structure really useful? Can we make use of structure? • Yes • Evidence from many domains and applications • Mandatory for many problems • e.g. a 10,000-class classification problem • Yes, but • Complex or long-term dependencies often correspond to rare events • Practical evidence for large-size problems • Simple models sometimes offer competitive results • Information retrieval • Speech recognition, etc.

  10. Why is structure prediction difficult? • Size of the output space • # of potential labelings for a sequence • # of potential joint labelings and segmentations • # of potential labeled trees for an input sequence • … • Exponential output space • Inference often amounts to a combinatorial search problem • Scaling issues in most models

  11. Structured learning • X, Y: input and output spaces • Structured output • y ∈ Y decomposes into parts of variable size • y = (y1, y2, …, yT) • Dependencies • Relations between the parts of y • Local, long-term, global • Cost function • 0/1 loss • Hamming loss • F-score • BLEU, etc.
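The two sequence losses named on this slide are easy to make concrete. A minimal sketch (function names are mine, not from the talk):

```python
def hamming_loss(y_true, y_pred):
    """Hamming loss: number of positions where the two
    label sequences disagree."""
    assert len(y_true) == len(y_pred)
    return sum(t != p for t, p in zip(y_true, y_pred))

def zero_one_loss(y_true, y_pred):
    """0/1 loss: 1 unless the whole structure is exactly right,
    which is why it is a poor signal for structured outputs."""
    return 0 if y_true == y_pred else 1
```

Note how a prediction with a single wrong tag gets the same 0/1 loss as one that is wrong everywhere, while the Hamming loss distinguishes them.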

  12. General approach • Predictive approach: ŷ = argmax over y ∈ Y of F(x, y) • where F : X × Y → ℝ is a score function used to rank potential outputs • F trained to optimize some loss function • Inference problem • |Y| is sometimes exponential • The argmax is often intractable; hypotheses: • decomposability of the score function over the parts of y • restricted set of outputs
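To see why the argmax is the bottleneck, here is a deliberately naive sketch that enumerates the whole output space; it is only feasible for toy inputs, which is exactly the slide's point (names and the `score` callback are illustrative assumptions):

```python
from itertools import product

def argmax_decode(x, labels, score):
    """Exhaustive argmax over all |labels|**len(x) label sequences.
    Real structured predictors avoid this enumeration by exploiting
    decomposability of `score` (e.g. dynamic programming)."""
    best, best_s = None, float("-inf")
    for y in product(labels, repeat=len(x)):
        s = score(x, y)
        if s > best_s:
            best, best_s = list(y), s
    return best
```

With 2 labels and a sequence of length 30 this loop already visits over a billion candidates, which is why decomposability assumptions are unavoidable.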

  13. Structured algorithms differ by: • Feature encoding • Hypotheses on the output structure • Hypotheses on the cost function • Type of structure prediction problem

  14. Three main families of SP models Generative models Discriminant models Search models

  15. Generative models • 30 years of generative models • Hidden Markov Models • Many extensions • Hierarchical HMMs, Factorial HMMs, etc. • Probabilistic Context-Free Grammars • Tree models, graph models

  16. Usual hypotheses • Features: “natural” encoding of the input • Local dependency hypothesis • On the outputs (Markov) • On the inputs • Cost function • Usually joint likelihood • Decomposes, e.g. sum of local costs on each subpart • Inference and learning usually use dynamic programming • HMMs • Decoding complexity O(n|Q|²) for a sequence of length n and |Q| states • PCFGs • Decoding complexity O(m³n³), n = length of the sentence, m = # non-terminals in the grammar • Prohibitive for many problems
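The O(n|Q|²) HMM decoding mentioned above is the Viterbi algorithm: for each position and state, keep only the best-scoring predecessor. A compact sketch in log space (the dictionary-based parameterization is my own choice, not the talk's):

```python
def viterbi(obs, states, log_start, log_trans, log_emit):
    """HMM decoding in O(n * |Q|^2).
    delta[t][q] = best log-score of any state sequence ending in q
    at position t; back[t][q] remembers the best predecessor."""
    n = len(obs)
    delta = [{q: log_start[q] + log_emit[q][obs[0]] for q in states}]
    back = [{}]
    for t in range(1, n):
        delta.append({})
        back.append({})
        for q in states:
            prev, s = max(((p, delta[t - 1][p] + log_trans[p][q])
                           for p in states), key=lambda kv: kv[1])
            delta[t][q] = s + log_emit[q][obs[t]]
            back[t][q] = prev
    q = max(states, key=lambda st: delta[n - 1][st])
    path = [q]
    for t in range(n - 1, 0, -1):
        q = back[t][q]
        path.append(q)
    return path[::-1]
```

The two nested loops over states at each position give the |Q|² factor; the PCFG analogue (CKY) replaces them with a cubic loop over split points, hence the O(m³n³) cost.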

  17. Summary: generative SP models • Well-known approaches • Large number of existing models • Local hypotheses • Often imply using dynamic programming • Inference • Learning

  18. Discriminant models • Structured Perceptron (Collins 2002) • Breakthrough • Large-margin methods (Tsochantaridis et al. 2004, Taskar 2004) • Many others now • Considered an extension of multi-class classification

  19. Usual hypotheses • Joint representation of input–output: Φ(x, y) • Encodes potential dependencies among and between input and output • e.g. histogram of state transitions observed in the training set, frequency of (xi, yj), POS tags, etc. • Large feature sets (10² to 10⁴) • Linear score function: F(x, y) = ⟨θ, Φ(x, y)⟩ • Decomposability of the feature set (outputs) and of the loss function
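A joint feature map of the kind this slide describes can be sketched as sparse counts of emission and transition events, scored linearly; this is one plausible encoding consistent with the slide's examples, not the authors' exact feature set:

```python
from collections import Counter

def joint_features(x, y):
    """Phi(x, y): sparse counts of (word, tag) emission pairs and
    (tag, tag) transition pairs, as suggested on the slide."""
    phi = Counter()
    for xi, yi in zip(x, y):
        phi[("emit", xi, yi)] += 1
    for a, b in zip(y, y[1:]):
        phi[("trans", a, b)] += 1
    return phi

def score(theta, x, y):
    """Linear score F(x, y) = <theta, Phi(x, y)>, with theta
    stored as a sparse dict over feature names."""
    return sum(theta.get(f, 0.0) * v
               for f, v in joint_features(x, y).items())
```

Because Φ decomposes into per-position and per-transition terms, the score of a full labeling is a sum of local scores, which is precisely the decomposability assumption that makes dynamic-programming inference possible.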

  20. Extension of large-margin methods • 2 problems • The classical 0/1 loss is meaningless with structured outputs • Structured output prediction is not a simple extension of multiclass classification • Generalize the max-margin principle to loss functions other than the 0/1 loss • The number of constraints in the QP problem is proportional to |Y|, i.e. potentially exponential • Different solutions • Consider only a polynomial number of “active” constraints -> bounds on the optimality of the solution • Learning requires solving an argmax problem at each iteration • i.e. dynamic programming

  21. Summary: discriminant approaches • Hypotheses • Local dependencies for the output • Decomposability of the loss function • Long-term dependencies in the input • Nice convergence properties + bounds • Complexity • Learning often does not scale

  22. Reinforcement learning for structured outputs

  23. Precursors • Incremental Parsing (Collins 2004) • SEARN (Daumé et al. 2006) • Reinforcement Learning (Maes et al. 2007)

  24. General idea • The structured output will be built incrementally • ŷ = (ŷ1, ŷ2, …, ŷT) • Different from solving the prediction problem directly • It will be built via machine learning • Loss: • Goal • Construct ŷ incrementally so as to minimize the loss • Training: learn how to search the solution space • Inference: build ŷ as a sequence of successive decisions

  25. Example: sequence labeling • 2 labels, R and B • Search space: (input sequence, {sequences of labels}) • For a sequence of size 3, x = x1x2x3 • A node represents a state in the search space

  26. Example: actions and decisions • Sequence of size 3 • Π(s, a): decision function (Figure: each action expands the search tree; Π(s, a) scores every (state, action) pair.)

  27. Example: expected loss • Sequence of size 3 (Figure: each branch carries a local cost C and each leaf a cumulated cost CT, from CT=0 for the correct labeling up to CT=3.) • The loss does not always separate!

  28. Inference • Suppose we have a policy function F which decides at each step which action to take • Inference could be performed by computing • ŷ1 = F(x, ·), ŷt = F(x, ŷ1, …, ŷt−1), …, ŷT = F(x, ŷ1, …, ŷT−1) • ŷ = (ŷ1, …, ŷT) • No dynamic programming needed
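The inference loop on this slide is a few lines of code. A minimal sketch, assuming a scoring function `policy_score(x, partial, a)` stands in for the learned policy F:

```python
def greedy_inference(x, actions, policy_score):
    """Build y-hat one decision at a time: at each step take the single
    best action under the policy. No dynamic programming and no
    backtracking, so the cost is O(len(x) * len(actions))."""
    partial = []
    for _ in range(len(x)):
        best = max(actions, key=lambda a: policy_score(x, partial, a))
        partial.append(best)
    return partial
```

For example, a policy that scores the label matching the current input position highest simply copies the input, one greedy decision per step.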

  29. Example: state-space exploration guided by local costs • Sequence of size 3 (Figure: the cost-annotated search tree of slide 27, explored greedily along low-cost branches.) • Goal: generalize to unseen situations

  30. Training • Learn a policy F • From a set of (input, output) pairs • Which takes the optimal action at each state • Which is able to generalize to unknown situations

  31. Reinforcement learning for SP • Formalizes search-based ideas using Markov Decision Processes as the model and Reinforcement Learning for training • Provides a general framework for this approach • Many RL algorithms could be used for training • Differences with the classical use of RL • Specific hypotheses • Deterministic environment • Problem size • State: up to 10⁶ variables • Generalization properties

  32. Markov Decision Process • An MDP is a tuple (S, A, P, R) • S is the state space • A is the action space • P is a transition function describing the dynamics of the environment • P(s, a, s′) = P(st+1 = s′ | st = s, at = a) • R is a reward function • R(s, a, s′) = E[rt | st+1 = s′, st = s, at = a]
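For structured prediction the MDP is deterministic, as the later slides note: a state is (input, partial output) and an action extends the partial output. A toy instantiation for sequence labeling (the per-step reward here is my own illustrative choice; the talk leaves the reward task-dependent):

```python
def transition(state, action):
    """Deterministic P: taking action `action` (a label) appends it
    to the partial output. A state is the pair (x, partial y-hat)."""
    x, partial = state
    return (x, partial + (action,))

def reward(state, action, next_state, y_true):
    """R(s, a, s'): +1 when the newly placed label matches the
    reference labeling, 0 otherwise (an illustrative per-step
    surrogate for final prediction quality)."""
    _, partial = next_state
    t = len(partial) - 1
    return 1.0 if partial[t] == y_true[t] else 0.0
```

Determinism is what makes this setting easier than general RL: s′ is a function of (s, a), so no expectation over next states is needed.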

  33. Example: Prediction problem

  34. Example: State space • Deterministic MDP • States: input + partial output • Actions: elementary modifications of the partial output • Rewards: quality of prediction

  35. Example: partially observable reward • The reward function is only partially observable: limited set of training (input, output) pairs

  36. Learning the MDP • The reward function cannot be computed on the whole MDP • Approximated RL • Learn the policy on a subspace • Generalize to the whole space • The policy is learned as a linear function: Q(s, a) = ⟨θ, Φ(s, a)⟩ • Φ(s, a) is a joint description of (s, a) • θ is a parameter vector • Learning algorithms • Sarsa, Q-learning, etc.
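One of the algorithms the slide names, Sarsa with a linear value function Q(s, a) = ⟨θ, Φ(s, a)⟩, can be sketched as follows for the deterministic sequence-labeling MDP. This is a generic textbook Sarsa update, not the authors' exact training procedure; the feature map `phi` and the per-step reward are illustrative assumptions:

```python
import random

def sarsa_episode(x, y_true, labels, theta, phi, alpha=0.1, eps=0.1):
    """One Sarsa episode updating the sparse weight dict `theta`
    in place. Transitions are deterministic: taking action a
    appends label a to the partial output."""
    def q(partial, a):
        # Linear value function Q(s, a) = <theta, phi(s, a)>.
        return sum(theta.get(f, 0.0) * v
                   for f, v in phi(x, partial, a).items())

    def pick(partial):
        # Epsilon-greedy exploration over the label set.
        if random.random() < eps:
            return random.choice(labels)
        return max(labels, key=lambda a: q(partial, a))

    partial, a = [], pick([])
    for t in range(len(x)):
        r = 1.0 if a == y_true[t] else 0.0  # illustrative reward
        nxt = partial + [a]
        a2 = pick(nxt) if t + 1 < len(x) else None
        target = r + (q(nxt, a2) if a2 is not None else 0.0)
        delta = target - q(partial, a)
        # Gradient step on the temporal-difference error.
        for f, v in phi(x, partial, a).items():
            theta[f] = theta.get(f, 0.0) + alpha * delta * v
        partial, a = nxt, a2
    return theta
```

Because θ is shared across all states, the values learned on the training subspace generalize to states never visited during training, which is the point the slide makes.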

  37. Inference • State at step t • Initial input • Partial output • Once the MDP is learned • Using the state at step t, compute the state at step t+1 • Until a solution is reached • Greedy approach • Only the best action is chosen at each step

  38. Structured outputs and MDPs • State st • Input x + partial output ŷt • Initial state: (x, ∅) • Actions • Task dependent • POS: new tag for the current word • Tree mapping: insert a new path in a partial tree • Inference • Greedy approach • Only the best action is chosen at each step • Reward • Final • Heuristic

  39. Example: sequence labeling • Left–right model • Actions: label • Order-free model • Actions: label + position • Loss: Hamming cost or F-score • Tasks • Named entity recognition (shared task at CoNLL 2002 – 8,000 train, 1,500 test) • Chunking – noun phrases (CoNLL 2002) • Handwritten word recognition (5,000 train, 1,000 test) • Complexity of inference • O(sequence size × number of labels)

  40. (Figure.)

  41. Tree mapping: document mapping • Problem: the labeled tree mapping problem • Different instances • Flat text to XML • HTML to XML • XML to XML • …

  42. Document mapping problem • Central issue: complexity • Large collections • Large feature space: 10³ to 10⁶ • Large search space (exponential) • Other SP methods fail

  43. XML structuring • Action • Attach the path ending with the current leaf to a position in the current partial tree • Φ(s, a) encodes a series of potential (state, action) pairs • Loss: F-score for trees

  44. (Figure: an input HTML document – HEAD/BODY tree with TITLE “Example”, italic text “Francis MAES”, H1 “Title of the section”, paragraph “Welcome to INEX”, footnote “This is a footnote” – and the target XML DOCUMENT with TITLE, AUTHOR, SECTION/TITLE, and TEXT nodes.)

  45.–50. (Figures: step-by-step construction of the target document from the input HTML tree – starting from an empty DOCUMENT, the TITLE path “Example” is attached first, then the AUTHOR path “Francis MAES”, and so on, one path per decision.)
