  1. Homotopy-based Semi-Supervised Hidden Markov Models for Sequence Labeling Gholamreza Haffari Anoop Sarkar Presenter: Milan Tofiloski Natural Language Lab, Simon Fraser University

  2. Outline • Motivation & Contributions • Experiments • Homotopy method • More experiments

  3. Maximum Likelihood Principle • Choose the parameter setting for the joint probability of input-output which maximizes the probability of the given data (the objective is sketched below) • L : labeled data • U : unlabeled data
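
The slide's formula did not survive the transcript; a plausible LaTeX reconstruction of the standard maximum-likelihood objective over both data sets, using the L and U defined above:

    \hat{\theta} = \arg\max_{\theta}
        \sum_{(x,y) \in L} \log P(x, y \mid \theta)
      + \sum_{x \in U} \log P(x \mid \theta)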

  4. Deficiency of MLE • Usually |U| >> |L|, so the unlabeled term dominates and the objective is approximately the sum over U of log P(x|θ) • This means the relationship of input to output is ignored when estimating the parameters! • MLE then mostly models the input distribution P(x) • But we are interested in modeling the joint distribution P(x,y)

  5. Remedy for the Deficiency • Balance the effect of the labeled and unlabeled data with a weight λ (objective reconstructed below) • Find the λ which maximally takes advantage of the labeled and unlabeled data • Plain MLE is the special case in which the two terms are not rebalanced
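
The weighted objective is also missing from the transcript; a plausible reconstruction, with λ ∈ [0,1] interpolating between purely supervised (λ = 0, as stated on slide 12) and purely unsupervised (λ = 1) estimation:

    \hat{\theta}(\lambda) = \arg\max_{\theta}\,
        (1 - \lambda) \sum_{(x,y) \in L} \log P(x, y \mid \theta)
      + \lambda \sum_{x \in U} \log P(x \mid \theta)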

  6. An experiment with HMM [Figure: performance vs. λ, lower is better; MLE's performance marked] • MLE can hurt the performance • Balancing the labeled- and unlabeled-data terms is beneficial

  7. Our Contributions • Introducing a principled way to choose λ for HMMs in sequence labeling (tagging) tasks • Introducing an efficient dynamic programming algorithm to compute second-order statistics in HMMs

  8. Outline • Motivation & Contributions • Experiments • Homotopy method • More experiments

  9. Task • Field segmentation in information extraction • 13 tag fields: AUTHOR, TITLE, … • Example (one tag per token): "A . Elmagarmid , editor ." → EDITOR; "Transaction Models for Advanced Database Applications ," → TITLE; "Morgan - Kaufmann ," → PUB; "1992 ." → DATE

  10. Experimental Setup • Use an HMM with 13 states • Freeze the transition (state->state) probabilities to what has been observed in the labeled data • Use the Homotopy method to learn only the emission (state->alphabet) probabilities • Apply additive smoothing to the initial values of the emission and transition probabilities (a sketch of this setup follows below) • Data statistics: • Average seq. length: 36.7 • Average number of segments in a seq.: 5.4 • Size of labeled/unlabeled data is 300/700
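
A minimal Python sketch of this setup (the vocabulary size V, the smoothing constant ALPHA, and the placeholder counts are hypothetical; the update shown is a plain M-step, standing in for the paper's homotopy procedure): transitions are estimated once from the labeled data and frozen, additive smoothing sets the initial tables, and only the emission table is re-estimated.

    import numpy as np

    K, V = 13, 5000          # 13 states per the slide; V is a made-up vocab size
    ALPHA = 0.1              # hypothetical additive-smoothing constant

    def smoothed_counts(counts, alpha=ALPHA):
        """Additive smoothing: add alpha to every count, then normalize rows."""
        c = counts + alpha
        return c / c.sum(axis=1, keepdims=True)

    # Counts gathered from the labeled data (placeholders here).
    labeled_trans_counts = np.ones((K, K))
    labeled_emit_counts = np.ones((K, V))

    trans = smoothed_counts(labeled_trans_counts)   # frozen for the whole run
    emit = smoothed_counts(labeled_emit_counts)     # initial value, re-estimated

    def m_step_emissions(expected_emit_counts):
        """Re-estimate only the emission probabilities; transitions stay frozen."""
        return smoothed_counts(expected_emit_counts)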

  11. Baselines • Held-out: put aside part of the labeled data as a held-out set, and use it to choose λ • Oracle: choose λ based on the test data using per-position accuracy • Supervised: ignore the unlabeled data, and just use the labeled data

  12. Homotopy vs Baselines [Figure: accuracy, higher is better] • Even very small values of λ can be useful • In homotopy λ = .004, and in supervised λ = 0 • Decoding: sequence of most probable states • See paper for more results

  13. Outline • Motivation & Contributions • Experiments • Homotopy method • More experiments

  14. Path of Solutions [Figure: solution path θ(λ), with discontinuities and bifurcations marked] • Look at θ(λ) as λ changes from 0 to 1 • Choose the best λ based on the path

  15. EM() EMfor HMM • Let be a state->state or state->observation event in our HMM • To find best parameter values  which (locally) maximizes the objective function for a fixed : Repeat until convergence

  16. Fixed Points of EM • Useful fact: one EM(λ) iteration is a map θ' = EM_λ(θ) • At the fixed points θ*, EM_λ(θ*) − θ* = 0 • This is similar to using Homotopy for root finding • The same numerical techniques should be applicable here

  17. Homotopy for Root Finding • To find a root of G(θ): • start from a root of a simple problem F(θ) • trace the roots of the intermediate problems H(θ, λ) = (1 − λ) F(θ) + λ G(θ) while morphing F into G • To find the θ(λ) which satisfy the above: • Setting the total derivative of H to zero gives a differential equation • Numerically solve the resulting differential eqn. (a toy example follows below)
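
A toy Python illustration of this idea (not the paper's algorithm): trace the root of the convex homotopy H(x, λ) = (1 − λ)F(x) + λG(x) from a known root of an easy F at λ = 0 to a root of G at λ = 1, taking small λ-steps with Newton correction at each step.

    def trace_root(F, G, dF, dG, x0, steps=100):
        """Follow the root of H(x, lam) = (1-lam)*F(x) + lam*G(x) from lam=0
        to lam=1. x0 must be a root of F; dF, dG are derivatives of F, G."""
        x = x0
        for i in range(1, steps + 1):
            lam = i / steps
            for _ in range(20):                 # Newton correction at this lam
                h = (1 - lam) * F(x) + lam * G(x)
                dh = (1 - lam) * dF(x) + lam * dG(x)
                if abs(dh) < 1e-12:
                    break
                x -= h / dh
        return x

    # Morph from F(x) = x - 1 (root at 1) to G(x) = x**3 - 2 (root 2**(1/3)).
    root = trace_root(lambda x: x - 1, lambda x: x**3 - 2,
                      lambda x: 1.0, lambda x: 3 * x**2, x0=1.0)
    print(root)   # ~1.2599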

  18. Solving the Differential Eqn • The differential equation takes the form M · v = 0, where M is the Jacobian of the fixed-point map EM_λ(θ) − θ with respect to (θ, λ) • Repeat until λ reaches 1: • Update (θ, λ) in a proper direction parallel to v = Kernel(M) • Update M (a numerical sketch follows below)
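
A minimal numerical sketch of one such step (the step size and the point layout are assumptions, not the paper's exact procedure): the kernel direction v of the Jacobian M can be taken as the right singular vector for M's smallest singular value, and the current point is advanced a small step along it.

    import numpy as np

    def kernel_step(M, point, step_size=1e-2, prev_v=None):
        """Advance `point` = (theta..., lambda) along the kernel of M.
        M is an n x (n+1) Jacobian, so its kernel is generically 1-D."""
        _, _, Vt = np.linalg.svd(M)
        v = Vt[-1]                  # right singular vector, smallest sing. value
        if prev_v is not None and np.dot(v, prev_v) < 0:
            v = -v                  # keep a consistent direction along the path
        return point + step_size * v, v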

  19. Challenging for HMM • The Jacobian of EM_λ requires more than the standard Forward-Backward statistics • We need to compute the covariance matrix of the events • The entry in row e1 and column e2 of the covariance matrix is the covariance of the counts of events e1 and e2 • See the paper for details

  20. Expected Quadratic Counts for HMM [Figure: trellis with state k1 at position i and state k2 at position j over observations xi, xi+1, …, xj] • Dynamic programming algorithm to efficiently compute the expected quadratic counts (EQC) of pairs of events • Pre-compute a table Zx for each sequence x • Having the table Zx, the EQC can be computed efficiently • The time complexity depends polynomially on K, the number of states in the HMM (see paper for more details)
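
For intuition, here is a brute-force Python check of the quantity the paper's dynamic program computes efficiently (the tiny HMM and the chosen events are a made-up example): the expected quadratic count E[count(e1) · count(e2)] under the posterior over state sequences for one observation sequence.

    import itertools
    import numpy as np

    # Tiny made-up HMM: 2 states, 2 symbols.
    pi = np.array([0.6, 0.4])                    # initial state probs
    A = np.array([[0.7, 0.3], [0.4, 0.6]])       # transitions
    B = np.array([[0.9, 0.1], [0.2, 0.8]])       # emissions
    x = [0, 1, 1, 0]                             # observation sequence

    def joint(states):
        """Joint probability P(x, states) under the HMM."""
        p = pi[states[0]] * B[states[0], x[0]]
        for t in range(1, len(x)):
            p *= A[states[t - 1], states[t]] * B[states[t], x[t]]
        return p

    def count(states, event):
        """Count occurrences of a transition event (s, s') in a state seq."""
        s, s2 = event
        return sum(1 for t in range(1, len(states))
                   if states[t - 1] == s and states[t] == s2)

    e1, e2 = (0, 1), (1, 1)                      # two transition events
    num, Z = 0.0, 0.0
    for states in itertools.product(range(2), repeat=len(x)):
        p = joint(states)
        Z += p
        num += p * count(states, e1) * count(states, e2)
    print("E[count(e1) * count(e2)] =", num / Z)  # posterior expectation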

  21. How to Choose λ based on the Path • monotone: the first point at which the monotonicity of λ changes • maxEnt: choose the λ for which the model has maximum entropy on the unlabeled data • minEig: when solving the diff. eqn., consider the minimum singular value of the matrix M; across rounds, choose the λ for which the minimum singular value is the smallest
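
A hedged Python sketch of how the three criteria might be applied to a traced path (the path format, a list of (lam, theta, min_sing_val) tuples, and the entropy_on_U function are assumptions for illustration):

    def pick_monotone(path):
        """Return lam at the first point where lam stops increasing."""
        for prev, cur in zip(path, path[1:]):
            if cur[0] < prev[0]:
                return prev[0]
        return path[-1][0]

    def pick_max_ent(path, entropy_on_U):
        """Choose the lam whose model has maximum entropy on unlabeled data."""
        return max(path, key=lambda p: entropy_on_U(p[1]))[0]

    def pick_min_eig(path):
        """Choose the lam with the smallest minimum singular value of M."""
        return min(path, key=lambda p: p[2])[0]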

  22. Outline • Motivation & Contributions • Experiments • Homotopy method • More experiments

  23. Varying the Size of Unlab Data [Figure: performance vs. unlabeled-data size; size of the labeled data: 100] • The three Homotopy-based methods outperform EM • maxEnt outperforms minEig and monotone • minEig and monotone have similar performance

  24. Picked λ Values [Figure: λ values selected by each method]

  25. Picked λ Values • EM gives higher weight to unlabeled data compared to the Homotopy-based methods • The λ selected by: • maxEnt are much smaller than those selected by minEig and monotone • minEig and monotone are close

  26. Conclusion and Future Work • Using EM can hurt performance in the case |L| << |U| • Proposed a method to alleviate this problem for HMMs in seq. labeling tasks • To speed up the method: • Use sampling to find an approximation to the covariance matrix • Use faster methods for recovering the solution path, e.g. predictor-corrector

  27. Questions?

  28. Is Oracle outperformed by Homotopy? • No! • The performance measure used in selecting λ in the oracle method may be different from that used in comparing homotopy and oracle • The decoding algorithm used in the oracle method may be different from that used in comparing homotopy and oracle

  29. Why not set λ to a fixed ad-hoc value? • This ad-hoc way of setting λ has two drawbacks: • It still may hurt the performance; the proper λ may be much smaller than that • In some situations, the right choice of λ may be a big value; a fixed conservative setting does not fully take advantage of the available unlabeled data

  30. Homotopy vs Baselines [Figure: accuracy, higher is better; "Our method" marked] (see the paper for more results) • Viterbi Decoding: most probable sequence of states • SMS Decoding: sequence of most probable states
