
Reduction of Maximum Entropy Models to Hidden Markov Models



Presentation Transcript


    1. Reduction of Maximum Entropy Models to Hidden Markov Models. Joshua Goodman, Machine Learning and Applied Statistics, Microsoft Research

    2. Introduction Hidden Markov Models Digression (why HMMs aren't Bayes Nets) What are maxent models (= logistic regression = probabilistic perceptron) Why maxent models are HMMs Why lots of other things are HMMs (hidden-variable logistic regression, maximum entropy Markov models, conditional random fields, maxent with continuous outputs, etc.) Some quick experiments (new models work better) Conclusion: unifies several model types

    3. HMM Review: pinball machine example
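The pinball-machine figure isn't in the transcript, so here is a minimal sketch of what an HMM computes, with made-up states and probabilities (not the talk's example): the forward algorithm sums the probability of an output sequence over all hidden-state paths.

```python
# Minimal HMM forward-algorithm sketch. States, transitions, and
# emissions are illustrative numbers, not the talk's pinball machine.
import numpy as np

start = np.array([0.6, 0.4])            # P(initial state)
trans = np.array([[0.7, 0.3],           # trans[i, j] = P(next = j | state = i)
                  [0.2, 0.8]])
emit = np.array([[0.9, 0.1],            # emit[i, o] = P(symbol o | state = i)
                 [0.3, 0.7]])

def forward(obs):
    """Total probability of an output sequence, summed over all paths."""
    alpha = start * emit[:, obs[0]]
    for o in obs[1:]:
        alpha = (alpha @ trans) * emit[:, o]
    return alpha.sum()

print(forward([0, 1, 1]))  # P(emitting symbols 0, 1, 1)
```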

    4. HMMs are Bayes Nets (so far)

    5. HMMs are not Bayes Nets This is not another "X is a Bayes net" talk. HMMs are a different tool. Full HMMs allow non-emitting ε transitions, which don't output anything or advance the clock. Example: we don't care about bumpers.

    6. Pinball machine

    7. Hard to map εs to Bayes nets: mapping epsilon transitions to Bayes nets requires an infinite number of states, even for a finite output.

    8. Removal of epsilon arcs
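The slide's figure is lost; as a sketch of the standard construction (my gloss, assuming no epsilon cycle has total probability 1), the probability of any number of ε moves followed by one emitting move is (I − E)⁻¹A:

```python
# Sketch of epsilon-arc removal for an HMM, assuming the spectral radius
# of the epsilon-transition matrix E is < 1 (no probability-1 eps cycles).
# The matrices are made-up illustrations, not the talk's example.
import numpy as np

E = np.array([[0.0, 0.2],   # E[i, j] = P(eps move i -> j)
              [0.1, 0.0]])
A = np.array([[0.5, 0.3],   # A[i, j] = P(emitting move i -> j)
              [0.4, 0.5]])

# Sum over 0, 1, 2, ... eps moves before one emitting move:
#   (I + E + E^2 + ...) A = (I - E)^{-1} A
A_no_eps = np.linalg.solve(np.eye(2) - E, A)
print(A_no_eps)  # equivalent HMM transition matrix with no epsilon arcs
```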

    9. Reduction of Maxent to HMMs What maxent models are How to make an HMM for a particular maxent model Start simple, leave out arcs, add pieces back in one at a time.

    10. Why Maxent Models Lots of applications: sentence breaking, language modeling, prepositional phrase attachment, part-of-speech tagging, parsing, grammar checking, word sense disambiguation, named entity recognition, finding noun phrases, pronoun resolution, lots more in other fields. Very good at combining information from different sources. Nice mathematical properties: preserve marginals, convex space (global optimum), maximum likelihood.

    11. Maximum Entropy Models Same as logistic regression. Same as a perceptron (single-layer neural net) trained to minimize entropy (log loss). Consider trying to find P(y | x1 … xn). We'll use the multiplicative form of the equations.
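The formula itself is not in the transcript; the standard multiplicative form of a maxent model, presumably what the next slides build on, is:

```latex
P(y \mid x_1 \ldots x_n)
  = \frac{\prod_i \lambda_i^{f_i(x,y)}}{Z(x)},
\qquad
Z(x) = \sum_{y'} \prod_i \lambda_i^{f_i(x,y')},
\qquad
\lambda_i = e^{w_i}
```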

    12. Maxent Formula with an HMM

    13. Maxent Formula with an HMM

    14. How to make this with an HMM
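Slides 12–14 are figure-only and their content is lost. As a rough reconstruction of the reduction (my sketch, not the talk's exact construction): give each output y its own path through the machine, with one arc per feature whose weight is λᵢ when feature fᵢ fires, so that competing paths normalize exactly like Z(x):

```python
# Rough sketch (my reconstruction, not the talk's figure): encode a
# maxent model as competing paths, one per output y. Each path
# multiplies the weights lambda_i of the features that fire; renormalizing
# over the competing paths recovers the maxent Z(x).
lam = {("f1", "R"): 2.0, ("f2", "R"): 0.5,   # made-up feature weights
       ("f1", "L"): 0.7, ("f2", "L"): 3.0}

def path_weight(x_features, y):
    w = 1.0
    for f in x_features:                      # one arc per firing feature
        w *= lam[(f, y)]
    return w

def p(y, x_features, ys=("R", "L")):
    z = sum(path_weight(x_features, yy) for yy in ys)  # competing paths
    return path_weight(x_features, y) / z

print(p("R", ["f1", "f2"]))
```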

    15. Learning

    16. Multiple training instances 4 training instances: R,111; L,001; L,010; R,101
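A minimal sketch of fitting a maxent model to exactly these four instances by gradient ascent; reading "R,111" as label R with binary features (1,1,1) is my interpretation of the slide:

```python
# Maxent (= logistic regression) on the slide's 4 instances, trained by
# plain gradient ascent on the log-likelihood.
import numpy as np

X = np.array([[1, 1, 1],    # R,111
              [0, 0, 1],    # L,001
              [0, 1, 0],    # L,010
              [1, 0, 1]])   # R,101
y = np.array([1, 0, 0, 1])  # R = 1, L = 0

w = np.zeros(3)
for _ in range(500):
    p = 1.0 / (1.0 + np.exp(-(X @ w)))   # P(R | x)
    w += 0.5 * X.T @ (y - p)             # log-likelihood gradient step

# Data is linearly separable, so without a regularizer the weights keep
# growing; a few hundred steps suffice for illustration.
print(w, 1.0 / (1.0 + np.exp(-(X @ w))))
```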

    17. Why lots of other things are HMMs Maxent models with continuous outputs Maxent models with hidden variables Lots of pictures that go by really quickly Cloud notation for simplicity

    18. Continuous Outputs Can train HMMs for either discrete or continuous outputs. Leads immediately to continuous maxent training.

    19. Maxent-style Models with Hidden Variables Hidden variable has value N or P Non-emitting transition to maxent models with features dependent on the value of the hidden variable Automatically learned with the forward-backward algorithm
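One plausible reading of this construction (an assumption, since the figure is lost): the non-emitting transition picks the hidden value h, each value selects a maxent model with its own features, and forward-backward/EM fills in the expected h:

```latex
P(y \mid x) \;=\; \sum_{h \in \{N,\,P\}} P(h)\, P_h(y \mid x),
\qquad P_h \text{ a maxent model with its own features and weights}
```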

    20. Hidden Variable Depends on Maxent Model Hidden variable has value N or P The value of the hidden variable depends on a maxent model Again, automatically learned with the forward-backward algorithm

    21. Maximum Entropy Markov Model (MEMM)

    22. Conditional Random Fields (CRF)
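Slides 21–22 are figure-only; for reference (standard definitions, not necessarily the slides' notation), an MEMM chains per-position maxent models, while a linear-chain CRF normalizes once over whole label sequences:

```latex
\text{MEMM:}\quad
P(y_1 \ldots y_T \mid x) \;=\; \prod_{t=1}^{T} P_{\mathrm{maxent}}(y_t \mid y_{t-1}, x)
```

```latex
\text{CRF:}\quad
P(y_1 \ldots y_T \mid x) \;=\;
\frac{1}{Z(x)} \exp\!\Big( \sum_{t=1}^{T} \sum_k \lambda_k\, f_k(y_{t-1}, y_t, x, t) \Big)
```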

    23. Experimental Results: Subject-Verb Agreement This is a hard problem for conventional learners Need to first locate the subject; we assume no labeled training data Actual task: given the context, determine whether the word is "is" or "are" A hidden-variable maxent model is ideal: treat the subject, and whether it is singular or plural, as hidden variables

    24. Maxent Model for Subject/Verb

    25. Subject Verb Results

    26. Conclusions Hidden Markov Models can show connections between a large number of models Hidden variable, continuous, MEMMs, CRFs, more More natural for these problems than graphical models: graphical models require an infinite number of states for these problems Useful for at least one problem Future: unify HMMs with ε transitions + graphical models? See my semiring parsing paper and work by Pfeffer, Koller, etc.

    27. Actual geometry HMM vars are 0 < λ < 1 Maxent vars are 0 < λ < ∞ Introduce more vars, μ The ratio between λ and μ yields the full range of needed values
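A one-line check of the ratio trick (my worked example): any maxent weight w ∈ (0, ∞) is realizable as a ratio of two HMM parameters that both stay in (0, 1):

```latex
\lambda = \frac{w}{1+w}, \qquad \mu = \frac{1}{1+w},
\qquad \lambda, \mu \in (0,1), \qquad \frac{\lambda}{\mu} = w
```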
