
Information Extraction with HMM Structures Learned by Stochastic Optimization


Presentation Transcript


  1. School of Computer Science Information Extraction with HMM Structures Learned by Stochastic Optimization Dayne Freitag and Andrew McCallum Presented by Tal Blum for the course: Machine Learning Approaches to Information Extraction and Information Integration

  2. Outline • Background on HMM transition structure selection • The algorithm for the sparse IE task • Comparison between their algorithm and the algorithm of Borkar et al. • Discussion • Results

  3. HMMs for IE • HMMs have been successfully used in many tasks: • Speech Recognition • Information Extraction (Bikel et al., Borkar et al.) • IE in Bioinformatics (Leek) • POS Tagging (Ratnaparkhi)

  4. Sparse Extraction task • Fields are extracted from a long document • Most of the document is irrelevant • Examples: • Named entities (NE) • Conference Time & Location

  5. HMM as a dynamic BN [figure: an HMM unrolled as a Bayesian network, with states S1, S2, S3 emitting observations Obs1, Obs2, Obs3 over time t, alongside a general BN over variables X, Y, Z, W] Learning HMM Structure?

  6. Constrained Transition [figure: a fully connected transition graph over states X1–X4 contrasted with constrained transition structures over the same states]

  7. HMM Structure Learning [figure: example address models with states such as street number, street, zip code, and country, arranged in different transition structures] • Unlike BN structure learning • Learn the structure of the transition matrix A • Learn structures with different numbers of states

  8. HMM Structure Example

  9. Example Hierarchical HMM

  10. Why learn HMM structure? • HMMs are not specifically suited for IE tasks • Including structural bias can reduce the number of parameters to learn and therefore require less data • The parameters will be more accurate • Constrain the number of times a class can appear in a document • Can represent class length more accurately • The emission probability might be multi-modal • To model the left and right context of a class for the sparse IE task

  11. Fully Observed vs. Partially Observed • Structure learning is only required when the data is partially observed • Partially observed – a field is represented by several states, but the label only identifies the field, not the specific state • With fully observed data we can let the probabilities “learn” the structure • Edges that are never observed get zero probability • Learning the transition structure involves incorporating new states • Naively allowing arbitrary transitions will not generalize well
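
A minimal sketch of the fully observed case described above (hypothetical code, not from the paper or the slides): with fully labeled state sequences, maximum-likelihood transition estimates are just ratios of counts, and edges that never occur receive zero probability, so the non-zero entries effectively encode the structure. The state names are invented for illustration.

    from collections import defaultdict

    def estimate_transitions(state_sequences):
        """Maximum-likelihood transition estimates from fully observed state sequences.
        Edges that never occur in the data get probability zero, so the non-zero
        entries of the resulting table are, in effect, the learned structure."""
        counts = defaultdict(lambda: defaultdict(int))
        for states in state_sequences:
            for prev, curr in zip(states, states[1:]):
                counts[prev][curr] += 1
        transitions = {}
        for prev, nexts in counts.items():
            total = sum(nexts.values())
            transitions[prev] = {s: c / total for s, c in nexts.items()}
        return transitions

    # Toy example: two fully labeled paths through background/prefix/target/suffix states.
    paths = [
        ["bg", "bg", "prefix", "target", "suffix", "bg"],
        ["bg", "prefix", "target", "target", "suffix", "bg"],
    ]
    print(estimate_transitions(paths))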

  12. The Problem • How to select the additional states and the state-transition structure • Manual selection doesn’t scale well • Human intuition does not always correspond to the best structures

  13. The Solution • A system that automatically selects an HMM transition structure • The system starts from an initial simple model and extends it sequentially by a set of operations, searching for a better model • Model quality is measured by its discriminative performance on a validation dataset • The best model found is returned • The learned structures are comparable with human-constructed HMM structures and on average outperform them

  14. IE with HMMs • Each extracted field has its own HMM • Each HMM contains two kinds of states: • Target states • Non-target states • All of the field HMMs are concatenated into a single, consistent HMM • The entire document is used to train the models, with no need for pre-processing

  15. Parameter Estimation • Transition probabilities are estimated with maximum likelihood • Unique path – ratio of counts • Non-unique path – use EM • Emission probabilities require smoothing with priors • Shrinkage with EM
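
The “shrinkage with EM” smoothing of emission probabilities can be illustrated with a hedged sketch: a state-specific maximum-likelihood estimate interpolated with a pooled (parent) estimate and a uniform distribution. The function name, the toy counts, and the fixed mixture weights are assumptions for illustration only; in the actual shrinkage approach the weights are learned with EM.

    def shrunken_emission(state_counts, pooled_counts, vocab_size,
                          weights=(0.6, 0.3, 0.1)):
        """Shrinkage-style smoothing: interpolate a state's ML emission estimate
        with a pooled (parent) estimate and a uniform distribution.
        The fixed weights here are illustrative; shrinkage would fit them with EM."""
        w_state, w_pool, w_unif = weights
        state_total = sum(state_counts.values()) or 1
        pooled_total = sum(pooled_counts.values()) or 1

        def prob(word):
            p_state = state_counts.get(word, 0) / state_total
            p_pool = pooled_counts.get(word, 0) / pooled_total
            return w_state * p_state + w_pool * p_pool + w_unif / vocab_size

        return prob

    # Toy usage: a sparse target state backed off toward counts pooled over all states.
    p = shrunken_emission({"3pm": 4, "noon": 1},
                          {"3pm": 10, "noon": 8, "room": 20}, vocab_size=5000)
    print(p("noon"), p("seminar"))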

  16. Learning State-Transition Structure • States: • Target • Prefix • Suffix • Background

  17. Model Expansion Choices • States: • Target • Prefix • Suffix • Background • Model Expansion Choices: • Lengthen a prefix • Split a prefix • Lengthen a suffix • Split a suffix • Lengthen a target string • Split a target string • Add a background state
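
A hedged sketch of how the expansion choices listed above could be realized as a candidate generator. The model representation (lists of prefix/suffix/target string lengths plus a background-state count) and the function expand() are invented for illustration; they are not the paper’s data structures.

    from copy import deepcopy

    # Hypothetical, simplified model representation: lengths of parallel
    # prefix/suffix/target strings plus a count of background states.
    simple_model = {"prefix": [1], "suffix": [1], "target": [1], "background": 1}

    def expand(model):
        """Generate candidate models by applying each expansion choice once."""
        candidates = []
        for kind in ("prefix", "suffix", "target"):
            for i in range(len(model[kind])):
                lengthened = deepcopy(model)
                lengthened[kind][i] += 1      # lengthen a prefix/suffix/target string
                candidates.append(lengthened)
            split = deepcopy(model)
            split[kind].append(1)             # split: add a parallel string of length 1
            candidates.append(split)
        more_background = deepcopy(model)
        more_background["background"] += 1    # add a background state
        candidates.append(more_background)
        return candidates

    print(len(expand(simple_model)))  # 7 candidate models from the simple initial model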

  18. The Algorithm
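
Below is a minimal hill-climbing sketch of the search described on slide 13, reusing the hypothetical expand() generator from the previous sketch. train_and_score() is a placeholder for training a candidate HMM and measuring extraction performance on the validation set; the actual procedure may differ, e.g. by keeping several candidate models per iteration.

    def learn_structure(initial_model, expand, train_and_score, max_iters=25):
        """Hill-climbing structure search sketch: start from a simple model,
        score every single-step expansion on a validation set, keep the best
        one, and stop when no expansion improves the score."""
        best_model = initial_model
        best_score = train_and_score(initial_model)
        for _ in range(max_iters):
            scored = [(train_and_score(m), m) for m in expand(best_model)]
            score, model = max(scored, key=lambda pair: pair[0])
            if score <= best_score:
                break                         # no candidate beats the current model
            best_model, best_score = model, score
        return best_model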

  19. Discussion • Structure learning is similar to rule learning for word or boundary classification • The search for the best structure is not exhaustive • There is no attempt to generalize better by sharing emission probabilities among different states

  20. Comparison with the Borkar et al. algorithm • Differences: • Segmentation vs. sparse extraction • Modeling of background and boundaries • Unique path – EM is not used • Backward search vs. forward search • Both assume boundaries and that position is the most relevant feature for distinguishing different states

  21. Experimental Results • Tested on 8 extraction tasks over 4 datasets • Seminar Announcements (485) • Reuters Corporate Acquisition articles (600) • Job Announcements (298) • Call for Papers (363) • Training and test sets were of equal size • Performance averaged over 10 splits

  22. Learned Structure

  23. Experimental Results • Compared to 4 other approaches • Grown HMM – the learned structure (this work) • SRV – rule learning (Freitag 1998) • Rapier – rule learning (Califf 1998) • Simple HMM • Complex HMM

  24. Experimental Results

  25. Conclusions • HMMs have proved to be a state-of-the-art method for IE • Constraining the transition structure has a crucial effect on performance • Automatic transition-structure learning matches and even outperforms manually crafted HMM structures, which require laborious manual construction

  26. The End! Questions?
