
Hidden Markov Models for Information Extraction


Presentation Transcript


  1. Hidden Markov Models for Information Extraction: Recent Results and Current Projects. Joseph Smarr & Huy Nguyen. Advisor: Chris Manning

  2. HMM Approach to IE • HMM states are associated with a semantic type • background-text, person-name, etc. • Constrained EM learns transitions and emissions • Viterbi alignment of a document marks tagged ranges of text with the same semantic type • Extract the range with the highest probability • [Figure: example alignment showing states 2 3 4 5 6 2 over the words "Speaker is Huy Nguyen this week"]
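
As an illustration of this alignment-and-extraction step, here is a minimal Python sketch (not the authors' code): Viterbi-decode the most likely state sequence, then pull out maximal spans whose states carry a non-background semantic type. The array layouts, the state_type mapping, and the background label name are assumptions made for the example.

import numpy as np

def viterbi(obs, start_p, trans_p, emit_p):
    """Most likely state sequence for integer-encoded observations.
    start_p: (S,), trans_p: (S, S), emit_p: (S, V)."""
    S, T = start_p.shape[0], len(obs)
    delta = np.zeros((T, S))
    back = np.zeros((T, S), dtype=int)
    delta[0] = np.log(start_p) + np.log(emit_p[:, obs[0]])
    for t in range(1, T):
        scores = delta[t - 1][:, None] + np.log(trans_p)   # scores[i, j]: from state i to state j
        back[t] = scores.argmax(axis=0)
        delta[t] = scores.max(axis=0) + np.log(emit_p[:, obs[t]])
    path = [int(delta[-1].argmax())]
    for t in range(T - 1, 0, -1):
        path.append(int(back[t, path[-1]]))
    return path[::-1]

def extract_spans(tokens, states, state_type, background="background-text"):
    """Collect maximal runs of tokens whose states share a non-background type."""
    spans, current, label = [], [], None
    for tok, s in zip(tokens, states):
        t = state_type[s]
        if t == label and t != background:
            current.append(tok)
        else:
            if current:
                spans.append((label, " ".join(current)))
            current, label = ([tok], t) if t != background else ([], None)
    if current:
        spans.append((label, " ".join(current)))
    return spans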

  3. Existing Work • Leek (97 [UCSD MS thesis]) • Early results, fixed structures • Freitag & McCallum (99, 00) • Grow complex structures

  4. Limitations of Existing Work • Only one field extracted at a time • Relative position of fields is ignored • e.g. authors usually come before titles in citations • Similar-looking fields don't compete with each other • e.g. acquired company vs. purchasing company • Simple model of unknown words • Use <UNK> for all words seen fewer than N times • No separation of content and context • e.g. can't plug in generic date extractors, etc.

  5. Current Research Goals • Flexibly train and combine extractors for multiple fields of information • Learn structures suited for individual fields • Can be recombined and reused with many HMMs • Learn intelligent context structures to link targets • Canonical ordering of fields • Common prefixes and suffixes • Construct merged HMM for actual extraction • Context/target split makes search problem tractable • Transitions between models are compiled out in merge

  6. Current Research Goals • Richer models for handling unknown words • Estimate likelihood of novel words in each state • Featural decomposition for finer-grained probabilities • e.g. Nguyen → UNK[Capitalized, No-numbers] • Character-level models for higher precision • e.g. phone numbers, room numbers, dates, etc. • Conditional training to focus on extraction task • Classical joint estimation often wastes states modeling patterns in English background text • Conditional training is slower, but only rewards structure that increases labeling accuracy
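
As a small illustration of the featural-decomposition bullet above, a hypothetical helper that maps out-of-vocabulary words to feature-annotated UNK symbols; the feature names mirror the Nguyen example on the slide, but the exact feature set is an assumption.

import re

def unk_token(word, vocab):
    """Map an out-of-vocabulary word to a feature-annotated UNK symbol."""
    if word in vocab:
        return word
    feats = ["Capitalized" if word[:1].isupper() else "Lowercase",
             "Has-numbers" if re.search(r"\d", word) else "No-numbers"]
    return "UNK[" + ", ".join(feats) + "]"

# e.g. unk_token("Nguyen", {"the", "speaker", "is"}) -> "UNK[Capitalized, No-numbers]"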

  7. Learning Target Structures • Goal: Learn flexible structure tailored to composition of particular fields • Representation: Disjunction of multi-state chains • Learning method: • Collect and isolate all examples of the target field • Initialization: single state • Search operators (greedy search): • Extend current chain(s) • Start a new chain • Stopping criteria: MDL score
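
The greedy search on this slide might be organised roughly as in the sketch below. The chain representation, the smoothing floor, and the simplified MDL score (a per-state model cost plus the negative log-likelihood of the isolated field examples, each scored by its best chain) are illustrative assumptions, not the actual implementation; examples would be the tokenized instances of the target field collected from labeled data.

import math
from collections import Counter

def fit_state(tokens):
    """Maximum-likelihood emission distribution over a bag of field tokens."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return {w: n / total for w, n in counts.items()}

def chain_loglik(chain, example, floor=1e-6):
    """Log prob of one field instance under one fixed-length chain of states."""
    if len(example) != len(chain):
        return math.log(floor) * max(len(example), 1)
    return sum(math.log(state.get(tok, floor)) for state, tok in zip(chain, example))

def mdl_score(chains, examples, bits_per_state=10.0):
    """Model cost (a penalty per state) plus data cost (each example scored by its best chain)."""
    model_cost = bits_per_state * sum(len(c) for c in chains)
    data_cost = -sum(max(chain_loglik(c, ex) for c in chains) for ex in examples)
    return model_cost + data_cost

def learn_target_structure(examples, max_steps=20):
    all_tokens = [t for ex in examples for t in ex]
    chains = [[fit_state(all_tokens)]]                       # initialization: a single state
    best = mdl_score(chains, examples)
    for _ in range(max_steps):
        candidates = [chains[:i] + [c + [fit_state(all_tokens)]] + chains[i + 1:]
                      for i, c in enumerate(chains)]         # operator: extend an existing chain
        candidates.append(chains + [[fit_state(all_tokens)]])  # operator: start a new chain
        cand = min(candidates, key=lambda c: mdl_score(c, examples))
        score = mdl_score(cand, examples)
        if score >= best:                                    # stop when MDL no longer improves
            break
        best, chains = score, cand
    return chains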

  8. Example Target HMM: dlramt • [Figure: learned target HMM for the dlramt field, with START and END states and chains of states emitting tokens such as 13.5, 240, 100; mln, billion; U.S., Canadian; dlrs, dollars, yen, pesos; and undisclosed, withheld, amount]

  9. Learning Context Structures • Goal: Learn structure to connect multiple target HMMs • Captures canonical ordering of fields • Identifies prefix and suffix patterns around targets • Initialization: • Background state connected to each target • Find minimum # words between each target type in corpus • Connect targets directly if distance is 0 • Add context state between targets if they’re close • Search operators (greedy search): • Add prefix/suffix between background and target • Lengthen an existing chain • Start a new chain (by splitting an existing one) • Stopping criteria: MDL score
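
A rough sketch of this initialization, assuming training documents arrive as (token, field) pairs with field = None for background text; the threshold that decides when two targets are "close" is an invented constant.

from collections import defaultdict

def min_gaps(labeled_docs):
    """labeled_docs: lists of (token, field) pairs, field=None for background text."""
    gaps = defaultdict(lambda: float("inf"))
    for doc in labeled_docs:
        spans = []                                   # maximal labeled runs: [field, start, end]
        for i, (_, field) in enumerate(doc):
            if field and (not spans or spans[-1][0] != field or spans[-1][2] != i):
                spans.append([field, i, i + 1])
            elif field:
                spans[-1][2] = i + 1
        for (f1, _, e1), (f2, s2, _) in zip(spans, spans[1:]):
            gaps[(f1, f2)] = min(gaps[(f1, f2)], s2 - e1)
    return gaps

def init_context_edges(gaps, targets, close=3):
    """Background connects to every target; target pairs get direct or context links."""
    edges = [("background", t) for t in targets] + [(t, "background") for t in targets]
    for (f1, f2), gap in gaps.items():
        if gap == 0:
            edges.append((f1, f2))                   # adjacent in the corpus: direct transition
        elif gap <= close:
            edges.append((f1, "context"))            # nearby: route through a context state
            edges.append(("context", f2))
    return edges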

  10. Example of Context HMM • [Figure: context HMM with START, END, Background, and Context states connecting Purchaser and Acquired target placeholders; emitted words in the diagram include The, yesterday, Reuters, purchased, acquired, bought]

  11. Merging Context and Targets • In context HMM, targets are collapsed into a single state that always emits "purchaser" etc. • Target HMMs have single START and END state • Glue target HMMs into place by "compiling out" start/end transitions and creating one big HMM • Challenge: create supportive structure without being overly restrictive • Too little structure → hard to find regularities • Too much structure → can't generate all docs
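
The "compiling out" step could look like the sketch below: arcs into the target's placeholder state are redirected to the states its START points at, and arcs out of the placeholder are attached to the states that reach its END. The dict-of-dicts transition tables, the START/END names, and the prefixing scheme are assumptions for illustration.

def splice_target(context_trans, target_trans, placeholder, prefix):
    """context_trans / target_trans are {src: {dst: prob}} tables; returns the merged table."""
    merged = {s: dict(d) for s, d in context_trans.items()}
    # Copy the target's internal states under a unique prefix, dropping its START/END.
    for src, dests in target_trans.items():
        if src in ("START", "END"):
            continue
        merged[prefix + src] = {}
        for dst, p in dests.items():
            if dst == "END":
                # Mass that reached END now follows the placeholder's outgoing arcs.
                for ctx_dst, q in context_trans[placeholder].items():
                    merged[prefix + src][ctx_dst] = merged[prefix + src].get(ctx_dst, 0.0) + p * q
            else:
                merged[prefix + src][prefix + dst] = p
    # Arcs that entered the placeholder are redirected to START's successor states.
    for row in merged.values():
        if placeholder in row:
            p_in = row.pop(placeholder)
            for first, q in target_trans["START"].items():
                row[prefix + first] = row.get(prefix + first, 0.0) + p_in * q
    merged.pop(placeholder, None)
    return merged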

  12. Example of Merging HMMs • [Figure: the context HMM (Background, Context, START, END, with Purchaser and Acquired placeholder states) shown alongside the merged HMM in which the placeholder states have been replaced by the spliced-in target models]

  13. Tricks and Optimizations • Mandatory end state • Allows explicit modeling of document end • Structural enhancements • Add transitions from start directly to targets • Add transitions from target/suffix directly to end • Allow “skip-ahead” transitions • Separation of core structure learning • Structure learning is performed on “skeleton” structure • Enhancements are added during parameter estimation • Keeps search tractable while exploiting rich transitions
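
As a concrete (hypothetical) rendering of these enhancements, a helper that adds the shortcut arcs to an already-learned skeleton; the epsilon mass and per-row renormalisation are assumptions.

def add_enhancements(trans, targets, suffixes, skip_pairs, eps=0.01):
    """Add low-probability shortcut arcs to a learned skeleton, renormalising each row."""
    def add_arc(src, dst):
        row = trans.setdefault(src, {})
        row[dst] = row.get(dst, 0.0) + eps
        total = sum(row.values())
        for k in row:                                # keep the row a proper distribution
            row[k] /= total
    for t in targets:
        add_arc("START", t)                          # start -> target shortcuts
    for s in list(targets) + list(suffixes):
        add_arc(s, "END")                            # target/suffix -> end shortcuts
    for src, dst in skip_pairs:
        add_arc(src, dst)                            # "skip-ahead" arcs over optional states
    return trans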

  14. Sample of Recent F1 Results

  15. Unknown Word Results

  16. Conditional Training • Observation: Joint HMMs waste states modeling patterns in background text • Improves document likelihood (like n-grams) • Doesn't improve labeling accuracy (can hurt it!) • Ideally focus on prefixes, suffixes, etc. only • Idea: Maximize conditional probability of labels P(labels|words) instead of P(labels, words) • Should only reward modeling helpful patterns • Can't use standard Baum-Welch training • Solution: use numerical optimization (conjugate gradient)

  17. Potential of Conditional Training • Don't waste states modeling background patterns • Toy data model: ((abc)*(eTo))* [T is target] • e.g. abcabcabcabceToabcabceToabcabcabc • Modeling abc improves joint likelihood but provides no help for labeling targets • [Figure: "Optimal Joint Model" vs. "Optimal Labeling Model" state diagrams, with states emitting symbols such as a, b, c, e, o, T (e.g. a|b|c, a|o, c|e)]

  18. Running Conditional Training • Gradient descent requires a differentiable function • Value: log P(labels | words) = log P(words, labels) - log P(words) • Deriv: per parameter, E[count | words, labels] - E[count | words] (expected counts with and without the label constraints) • Likelihood and expectations are easily computed with existing HMM algorithms • Compute values with and without type constraints • Forward algorithm for the likelihoods • Parameter expectations for the derivatives
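
A sketch of how the value could be computed with the existing forward algorithm, run once restricted to states whose type matches the labels and once unconstrained; the derivative (not shown) would be the analogous difference of expected transition and emission counts from forward-backward. The array shapes, the log-space scaling, and the state_type mapping are implementation assumptions.

import numpy as np

def log_forward(obs, start_p, trans_p, emit_p, allowed=None):
    """log P(obs), optionally restricted to state sequences consistent with the labels.
    obs: length-T ints; allowed: (T, S) boolean mask or None.
    Assumes at least one state is allowed at every position."""
    log_a = np.log(start_p) + np.log(emit_p[:, obs[0]])
    if allowed is not None:
        log_a = np.where(allowed[0], log_a, -np.inf)
    for t in range(1, len(obs)):
        m = log_a.max()                                      # log-sum-exp over predecessors
        log_a = m + np.log(np.exp(log_a - m) @ trans_p) + np.log(emit_p[:, obs[t]])
        if allowed is not None:
            log_a = np.where(allowed[t], log_a, -np.inf)
    m = log_a.max()
    return m + np.log(np.exp(log_a - m).sum())

def conditional_log_likelihood(obs, labels, state_type, start_p, trans_p, emit_p):
    """log P(labels | words) = log P(words, labels) - log P(words)."""
    allowed = np.array([[state_type[s] == lab for s in range(len(state_type))]
                        for lab in labels])
    return (log_forward(obs, start_p, trans_p, emit_p, allowed)
            - log_forward(obs, start_p, trans_p, emit_p))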

  19. Challenges for Cond. Training • Need additional constraint to keep numbers small • Can’t guarantee you’ll get a probability distribution • But it’s ok if you’re just summing and multiplying! • Solution: sum of all params must equal a constant • Need to fix parameter space ahead of time • Can’t add states, new words, etc. • Solution: start with large ergodic model in which all states emit entire vocabulary (use UNK tokens) • Need sensible initialization • Uniform structure has high variance • Fixed structure usually dictates training
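
A toy sketch of that initialization: a fully connected (ergodic) model in which every state emits the entire vocabulary plus UNK tokens, with all parameters rescaled so their global sum equals a fixed constant rather than forming per-row distributions. The sizes, UNK feature names, and the constant are arbitrary choices.

import numpy as np

def init_ergodic(num_states, vocab, total=1000.0, seed=0):
    """Fully connected model; every state emits every word plus UNK tokens,
    with all parameters rescaled so their global sum equals a constant."""
    rng = np.random.default_rng(seed)
    symbols = list(vocab) + ["UNK[Capitalized, No-numbers]", "UNK[Lowercase, No-numbers]"]
    trans = rng.random((num_states, num_states)) + 0.1       # every state reaches every state
    emit = rng.random((num_states, len(symbols))) + 0.1      # every state emits every symbol
    scale = total / (trans.sum() + emit.sum())               # sum-of-all-params constraint
    return trans * scale, emit * scale, symbols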

  20. Results on Toy Data Set • Results on (([ae][bt][co])*(eto))* • Contains spurious prefix/target/suffix-like symbols • Joint training always labels every t • Conditional training eventually gets it perfectly
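
For concreteness, a tiny generator for this toy language might look as follows; the block counts, noise lengths, and seed are arbitrary, and the lowercase 't' that the noise blocks can emit is exactly the spurious target-like symbol the slide refers to.

import random

def toy_sequence(num_blocks=5, max_noise=4, seed=0):
    """Generate (tokens, labels); label 1 marks the true target 't' in each 'eto' block."""
    rng = random.Random(seed)
    tokens, labels = [], []
    for _ in range(num_blocks):
        for _ in range(rng.randint(0, max_noise)):           # ([ae][bt][co])* noise blocks
            tokens += [rng.choice("ae"), rng.choice("bt"), rng.choice("co")]
        labels += [0] * (len(tokens) - len(labels))
        tokens += list("eto")                                # the (eto) block; its 't' is the target
        labels += [0, 1, 0]
    return tokens, labels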

  21. Current and Future Work • Richer search operators for structure learning • Richer models of unknown words (char-level) • Reduce variance of conditional training • Build reusable repository of target HMMs • Integrate with larger IE framework(s) • Semantic Web / KAON • LTG • Applications • Semi-automatic ontology markup for web pages • Smart email processing
