  1. Learning structured outputs P. Gallinari Patrick.gallinari@lip6.fr www-connex.lip6.fr University Pierre et Marie Curie – Paris – Fr NATO ASI Mining Massive Data Sets for security MMDSS - P. Gallinari

  2. Outline • Motivation and examples • Approaches for structured learning • Generative models • Discriminant models • Search models MMDSS - P. Gallinari

  3. Machine learning and structured data • Different types of problems • Model, classify, cluster structured data • Predict structured outputs • Learn to associate structured representations • Structured data and applications in many domains • chemistry, biology, natural language, web, social networks, databases, etc. MMDSS - P. Gallinari

  4. Sequence labeling: POS tagging (example sentence annotated with tags such as coordinating conjunction, noun, verb 3rd person, adverb, plural noun, determiner, verb gerund, plural verb, adjective) MMDSS - P. Gallinari

  5. Penn Treebank tag set MMDSS - P. Gallinari

  6. Segmentation + labeling: syntactic chunking (Washington Univ. tagger) (example sentence segmented into chunks labeled adverbial phrase, noun phrase, verb phrase, etc.) MMDSS - P. Gallinari

  7. Segmentation + labeling: Named Entity recognition • Entities • locations, persons, organizations • Time expressions: dates, times • Numeric expression: $ amount, percentages • NEW YORK (Reuters) -Goldman Sachs Group Inc. agreed on Thursday to pay $9.3 million to settle charges related to a former economist …. Goldman's GS.N settlement with securities regulators stemmed from charges that it failed to properly oversee John Youngdahl, a one-time economist …. James Comey, U.S. Attorney for the Southern District of New York, announced on Thursday a seven-count indictment of Youngdahl for insider trading, making false statements, perjury, and other charges. Goldman agreed to pay a $5 million fine and disgorge $4.3 million from illegal trading profits. MMDSS - P. Gallinari

  8. Information extraction MMDSS - P. Gallinari

  9. Syntactic parsing (Stanford Parser) MMDSS - P. Gallinari

  10. Document mapping problem • Problem: query heterogeneous XML databases or collections • Need to know the correspondence between the structured representations, usually made by hand • Learn the correspondence between the different sources: a labeled tree mapping problem MMDSS - P. Gallinari

  11. Others • Taxonomies • Social networks • Adversarial computing: Web spam, blog spam, … • Translation • Biology • … MMDSS - P. Gallinari

  12. Is structure really useful? Can we make use of structure? • Yes • Evidence from many domains or applications • Mandatory for many problems • e.g. a 10 K-class classification problem • Yes, but • Complex or long-term dependencies often correspond to rare events • Practical evidence for large-size problems • Simple models sometimes offer competitive results • Information retrieval • Speech recognition, etc. MMDSS - P. Gallinari

  13. Structured learning • X, Y: input and output spaces • Structured output • y ∈ Y decomposes into parts of variable size • y = (y1, y2,…, yT) • Dependencies • Relations between the parts of y • Local, long-term, global • Cost function (formulas reconstructed below) • 0/1 loss • Hamming loss • F-score • BLEU, etc. MMDSS - P. Gallinari
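
The loss formulas themselves did not survive the transcript; the standard definitions over the parts y = (y1, …, yT), given here as a hedged reconstruction, are:

```latex
% 0/1 loss: any error on the full structure counts as one mistake
\Delta_{0/1}(y,\hat y) = \mathbb{1}\,[\hat y \neq y]

% Hamming loss: number of wrongly predicted parts
\Delta_{H}(y,\hat y) = \sum_{t=1}^{T} \mathbb{1}\,[\hat y_t \neq y_t]

% F-score based loss (e.g. for segmentation), with P = precision, R = recall
\Delta_{F_1}(y,\hat y) = 1 - \frac{2PR}{P+R}
```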

  14. General approach • Predictive approach: predict the output maximizing a score function F : X × Y → R used to rank potential outputs • F trained to optimize some loss function • Inference problem • |Y| is sometimes exponential • The argmax is often intractable; hypotheses: • decomposability of the score function over the parts of y • restricted set of outputs MMDSS - P. Gallinari
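
A hedged reconstruction of the formulas missing from this slide, consistent with the surrounding text:

```latex
% prediction: return the highest-scoring output
f(x) = \operatorname*{arg\,max}_{y \in Y} F(x, y), \qquad F : X \times Y \to \mathbb{R}

% usual decomposability hypothesis making the argmax tractable:
% the score is a sum of local scores over the parts of y
F(x, y) = \sum_{p \in \mathrm{parts}(y)} F_p(x, y_p)
```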

  15. Structured algorithms differ by: • Feature encoding • Hypothesis on the output structure • Hypothesis on the cost function MMDSS - P. Gallinari

  16. Generative models Hidden Markov Models Probabilistic Context Free grammars Tree labeling model

  17. Usual hypothesis • Features: “natural” encoding of the input • Hypothesis on the output structure: local output dependencies, Markov property • The score decomposes, e.g. as a sum of local costs over the subparts • Inference: usually dynamic programming MMDSS - P. Gallinari

  18. HMMs • Sequence labeling – segmentation • Dependencies • Outputs: Markov assumption • Observations: conditional independence given the states • Decoding and learning • Dynamic programming • Viterbi for the argmax • Forward-Backward for learning • Decoding complexity O(n|Q|²) for a sequence of length n and |Q| states MMDSS - P. Gallinari
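
To make the decoding step concrete, here is a minimal Viterbi sketch in Python; the two-state tagger and its probabilities are assumptions for illustration, not taken from the slides. The two nested loops over states make the O(n|Q|²) complexity visible.

```python
import numpy as np

def viterbi(obs, states, log_start, log_trans, log_emit):
    """Most probable state sequence for an observation sequence.

    log_start[q]: log P(q at t=0); log_trans[q, q']: log P(q' | q);
    log_emit[q][o]: log P(o | q).  Complexity: O(n * |Q|^2).
    """
    n, Q = len(obs), len(states)
    delta = np.full((n, Q), -np.inf)    # best log-prob of a path ending in state q at time t
    back = np.zeros((n, Q), dtype=int)  # backpointers
    for q in range(Q):
        delta[0, q] = log_start[q] + log_emit[q].get(obs[0], -np.inf)
    for t in range(1, n):
        for q in range(Q):
            scores = delta[t - 1] + log_trans[:, q]
            back[t, q] = int(np.argmax(scores))
            delta[t, q] = scores[back[t, q]] + log_emit[q].get(obs[t], -np.inf)
    # backtrack from the best final state
    path = [int(np.argmax(delta[n - 1]))]
    for t in range(n - 1, 0, -1):
        path.append(back[t, path[-1]])
    return [states[q] for q in reversed(path)]

# toy example (assumed numbers): a two-state POS-like tagger
states = ["N", "V"]
log_start = np.log([0.6, 0.4])
log_trans = np.log([[0.3, 0.7],    # N -> N, N -> V
                    [0.8, 0.2]])   # V -> N, V -> V
log_emit = [{"flies": np.log(0.4), "time": np.log(0.6)},
            {"flies": np.log(0.7), "time": np.log(0.3)}]
print(viterbi(["time", "flies"], states, log_start, log_trans, log_emit))
```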

  19. State space (trellis, with a Start node) for an input sequence of size 3 • Consider a simple HMM MMDSS - P. Gallinari

  20. Probabilistic Context-Free Grammar (after Manning & Schütze) • Set of terminals {w1,…,wV} • Set of non-terminals {N1,…,Nn} • N1: start symbol • Set of rules {Ni → ζi} with ζi a sequence of terminals and non-terminals • To each rule is associated a probability P(Ni → ζi) • Special case: Chomsky Normal Form grammars • ζi = wj • ζi = Nk Nm MMDSS - P. Gallinari

  21. Example parse tree for “astronomers saw stars with ears”: S → NP VP, NP → astronomers, VP → V NP, V → saw, NP → NP PP, NP → stars, PP → P NP, P → with, NP → ears MMDSS - P. Gallinari

  22. Notations • Sentence w1…wn • Wp,q = wp wp+1 … wq • Nj dominates the sequence Wp,q if Nj may rewrite wp wp+1 … wq • Assumptions • Context-freeness: the probability of a subtree does not depend on words outside the subtree • Independence from ancestors: the probability does not depend on nodes in the derivation outside the subtree MMDSS - P. Gallinari

  23. Inside and outside probabilities • As for the forward-backward variables in HMMs, two probabilities may be defined • Inside: probability of generating wk…wl starting from Nj • Outside: probability of generating Nj and all the words outside wk…wl MMDSS - P. Gallinari
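
The defining formulas are missing from the transcript; the standard definitions, following Manning & Schütze's notation, are:

```latex
% inside probability: N^j derives exactly the span w_k ... w_l
\beta_j(k,l) = P\big(w_{k\,l} \mid N^j_{k\,l}\big)

% outside probability: the words outside the span, together with a node N^j
% covering w_k ... w_l, starting from the root N^1
\alpha_j(k,l) = P\big(w_{1\,(k-1)},\; N^j_{k\,l},\; w_{(l+1)\,n}\big)
```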

  24. Probability of a sentence: CKY algorithm • Probability of the sentence w1,n • Induction over the spans of the sequence, left to right • For k = 1 .. n • For l = k+1 .. n, calculate the inside probability of each non-terminal over wk…wl (see the sketch below) MMDSS - P. Gallinari
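
A short sketch of this inside (CKY-style) computation for a grammar in Chomsky Normal Form. The toy grammar and its probabilities are assumptions for illustration (the classic “astronomers saw stars with ears” example), not the exact grammar of the slides.

```python
from collections import defaultdict

def sentence_probability(words, lexical, binary, start="S"):
    """Inside (CKY-style) computation of P(w_1..n) for a CNF PCFG.

    lexical[(A, w)] = P(A -> w); binary[(A, B, C)] = P(A -> B C).
    inside[i][j][A] = probability that A derives words[i:j+1].
    Complexity: O(m^3 n^3), as stated on the slide that follows.
    """
    n = len(words)
    inside = [[defaultdict(float) for _ in range(n)] for _ in range(n)]
    for i, w in enumerate(words):                      # spans of length 1
        for (A, word), p in lexical.items():
            if word == w:
                inside[i][i][A] += p
    for length in range(2, n + 1):                     # longer spans
        for i in range(n - length + 1):
            j = i + length - 1
            for m in range(i, j):                      # split point
                for (A, B, C), p in binary.items():
                    inside[i][j][A] += p * inside[i][m][B] * inside[m + 1][j][C]
    return inside[0][n - 1][start]

# toy grammar (probabilities assumed for illustration)
binary = {("S", "NP", "VP"): 1.0, ("VP", "V", "NP"): 0.7, ("VP", "VP", "PP"): 0.3,
          ("NP", "NP", "PP"): 0.4, ("PP", "P", "NP"): 1.0}
lexical = {("NP", "astronomers"): 0.2, ("NP", "stars"): 0.2, ("NP", "ears"): 0.2,
           ("V", "saw"): 1.0, ("P", "with"): 1.0}
print(sentence_probability("astronomers saw stars with ears".split(), lexical, binary))
```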

  25. Inference and learning • Inference • Similar to the probability of a sentence, with a max instead of the sum • Complexity: O(m³n³) • n = length of the sentence, m = number of non-terminals in the grammar • Learning • Inside-Outside algorithm • Each step is O(m³n³) MMDSS - P. Gallinari

  26. Tree generative models Classification / clustering of structured documents (Denoyer et al. 2004) Document annotation / conversion (Wisniewski et al. 2006)

  27. Context: XML semi-structured documents • Example document tree: an <article> element with a header <hdr> and a body <bdy>; the body contains figures (<fig>, <fgc>) and sections (<sec>, <st>, <p>) whose leaves carry the text MMDSS - P. Gallinari

  28. Document model • Structural probability and content probability • Scalability! MMDSS - P. Gallinari

  29. Document Model: Structure • Belief Networks MMDSS - P. Gallinari

  30. Document Model: Content • Model for each node • 1st order dependency • Use of a local generative model for each label MMDSS - P. Gallinari

  31. Final network MMDSS - P. Gallinari

  32. Different learning techniques • Likelihood maximization • Discriminant learning: logistic function, error minimization • Fisher kernel MMDSS - P. Gallinari

  33. Document mapping problem • Problem • Learn from examples how to map heterogeneous sources onto a predefined target schema • Preserve the document semantics • Sources: semi-structured, HTML, PDF, flat text, etc. • A labeled tree mapping problem with different instances: flat text to XML, HTML to XML, XML to XML, … MMDSS - P. Gallinari

  34. Document mapping problem • Central issue: complexity • Large collections • Large feature space: 10³ to 10⁶ • Large search space (exponential) • Approach • Learn generative models of XML target documents from a training set • Decoding of unknown sources according to the learned model MMDSS - P. Gallinari

  35. Problem formulation • Given sT a target format and an input document d with source format sin(d), find the most probable target document • Learned transformation model • Decoding MMDSS - P. Gallinari
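
The decoding objective on the slide is lost; a hedged reconstruction consistent with the slide text is:

```latex
% most probable target document under the learned transformation model,
% searching over documents d' that follow the target format s_T
d^{*} = \operatorname*{arg\,max}_{d' \in s_T} P\big(d' \mid d\big)
```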

  36. General restructuring model MMDSS - P. Gallinari

  37. Example : HTML to XML (Tree annotation) • Hypothesis • Input document • HTML tags mostly for visualization • Remove tags • Keep only the segmentation (leaves) • Annotation • Leaves are the same in the HTML and XML document • Target document model: node label depends only on its local context • Context = content, left sibling, father MMDSS - P. Gallinari

  38. Model and training • Probability of the target tree (reconstructed below) • Solve the argmax by decoding • Exact dynamic programming decoding: O(|leaf nodes|³ · |tags|) • Approximate solution with LaSO (Hal Daumé III, ICML 2005): O(|leaf nodes| · |tags| · |tree nodes|) MMDSS - P. Gallinari
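
The tree probability is missing from the transcript; under the hypothesis of slide 37 (a node label depends only on its content, left sibling and father), it would factorize roughly as follows (a hedged reconstruction, not necessarily the exact model of the slides):

```latex
P(y \mid d) \;\approx\; \prod_{n \in \mathrm{nodes}(y)}
  P\big(\ell_n \mid \mathrm{content}(n),\ \ell_{\mathrm{leftsibling}(n)},\ \ell_{\mathrm{father}(n)}\big)
```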

  39. Experiments: HTML to XML • IEEE collection / INEX corpus • 12 K documents • Average: 500 leaf nodes, 200 internal nodes, 139 tags • Movie DB • 10 K movie descriptions (IMDb) • Average: 100 leaf nodes, 35 internal nodes, 28 tags • Shakespeare: 39 plays • Few documents, but: • Average: 4100 leaf nodes, 850 internal nodes, 21 tags • Mini-Shakespeare • 60 randomly chosen scenes from the plays • 85 leaf nodes, 20 internal nodes, 7 tags • For all collections: ½ train, ½ test MMDSS - P. Gallinari

  40. Performance MMDSS - P. Gallinari

  41. MMDSS - P. Gallinari

  42. Summary • 30 years of generative models • Hierarchical HMMs, factorial HMMs, etc. • Local dependency hypothesis • On the outputs • On the inputs • Inference and learning often use dynamic programming • Prohibitive for some/many problems • Other methods: loopy belief propagation, search (e.g. A*), … • Cost function: joint likelihood – decomposes MMDSS - P. Gallinari

  43. Discriminant models Structured Perceptron (Collins 2002) Large margin methods (Tsochantaridis et al. 2004, Taskar 2004)

  44. Usual hypothesis • Joint representation of input and output Φ(x, y) • Encodes potential dependencies among and between input and output • e.g. histogram of state transitions observed in the training set, frequency of (xi, yj), POS tags, etc. • Large feature sets (10² to 10⁴) • Linear score function (see below) • Decomposability of the feature set (over the outputs) and of the loss function MMDSS - P. Gallinari
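
A typical instantiation of the linear score and joint feature map for sequence labeling, given as an assumed example rather than the exact features of the slides:

```latex
% linear score with a joint input-output feature map
F(x, y; w) = \langle w, \Phi(x, y) \rangle

% decomposable features, e.g. emission-like and transition-like counts
\Phi(x, y) = \sum_{t=1}^{T} \phi(x_t, y_t, y_{t-1})
```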

  45. Structured Perceptron (Collins 2002) • Discriminant model based on a Perceptron variant for sequence labeling • Initially proposed for POS tagging and chunking • Possible extension to other structured output tasks • Inference: Viterbi • Encodes input and output (local) dependencies • Simplicity MMDSS - P. Gallinari

  46. Algorithm • Training algorithm • Initialize w = 0 • Repeat N times over all training examples (x, y) • Compute ŷ = argmax over y′ of w·Φ(x, y′) • If ŷ ≠ y, update the parameters: w ← w + Φ(x, y) − Φ(x, ŷ) MMDSS - P. Gallinari
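
A compact sketch of this training loop (Collins 2002). For brevity the argmax is brute force over all taggings instead of Viterbi, and the feature map and toy data are assumptions for illustration.

```python
import itertools
from collections import defaultdict

def features(x, y):
    """Joint feature map: emission (word, tag) and transition (tag, tag) counts."""
    phi = defaultdict(float)
    prev = "<s>"
    for word, tag in zip(x, y):
        phi[("emit", word, tag)] += 1.0
        phi[("trans", prev, tag)] += 1.0
        prev = tag
    return phi

def predict(x, w, tags):
    """argmax_y w . phi(x, y); brute force over all taggings (Viterbi in practice)."""
    def score(y):
        return sum(w[f] * v for f, v in features(x, y).items())
    return max(itertools.product(tags, repeat=len(x)), key=score)

def train(data, tags, epochs=5):
    """Structured perceptron: on a mistake, update w by phi(x, y) - phi(x, y_hat)."""
    w = defaultdict(float)
    for _ in range(epochs):
        for x, y in data:
            y_hat = predict(x, w, tags)
            if y_hat != y:
                for f, v in features(x, y).items():
                    w[f] += v
                for f, v in features(x, y_hat).items():
                    w[f] -= v
    return w

# toy POS-like data (assumed for illustration)
data = [(("time", "flies"), ("N", "V")), (("flies", "bite"), ("N", "V"))]
w = train(data, tags=("N", "V"))
print(predict(("time", "flies"), w, tags=("N", "V")))
```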

  47. Inference: dynamic programming • Restricted to the 0/1 cost • Also • Convergence and generalization bounds (Freund & Schapire, 1999) • The number of mistakes depends only on the margin, not on the size of the output space (number of potential candidates) MMDSS - P. Gallinari

  48. Extension of large margin methods • Two problems • Generalize the max-margin principle to loss functions other than the 0/1 loss • Number of constraints proportional to |Y|, i.e. potentially exponential MMDSS - P. Gallinari

  49. SVM ISO (Tsochantaridis et al. 2004) • Extension of multi-class SVMs • Principle: the correct output must score higher than every other output by a margin (formulation below) MMDSS - P. Gallinari

  50. SVM formulation, non-linearly separable case, 0/1 cost (one slack variable per non-linear constraint, as in Crammer & Singer, 2001): MMDSS - P. Gallinari
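
The optimization problem did not survive the transcript; the standard soft-margin formulation with one shared slack variable per example reads roughly as follows (a hedged reconstruction):

```latex
\min_{w,\ \xi \ge 0} \ \frac{1}{2}\lVert w\rVert^{2} + \frac{C}{n}\sum_{i=1}^{n}\xi_i
\quad \text{s.t.}\quad
\forall i,\ \forall y \in Y\setminus\{y_i\}:\
\langle w, \Phi(x_i, y_i)\rangle - \langle w, \Phi(x_i, y)\rangle \ \ge\ 1 - \xi_i
```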
