School of Computer Science. Information Extraction with HMM Structures Learned by Stochastic Optimization. Dayne Freitag and Andrew McCallum. Presented by Tal Blum for the course: Machine Learning Approaches to Information Extraction and Information Integration.
Outline • Background on HMM transition structure selection • The algorithm for the sparse IE task • Comparison between their algorithm and the Borkar et al. algorithm • Discussion • Results
HMMs for IE • Have been used successfully in many tasks: • Speech Recognition • Information Extraction (Bikel et al., Borkar et al.) • IE in Bioinformatics (Leek) • POS Tagging (Ratnaparkhi)
Sparse Extraction Task • Fields are extracted from a long document • Most of the document is irrelevant • Examples: • Named entities (NE) • Conference time & location
HMM as a Dynamic BN • [Figure: an HMM unrolled as a Bayesian network, with states S1, S2, S3 emitting observations Obs1, Obs2, Obs3 over time t, shown next to a general BN over variables X, Y, Z, W] • Learning HMM structure?
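To make the HMM view concrete, here is a minimal sketch (not the paper's code) of a two-state IE-style HMM with Viterbi decoding; all probabilities, the tiny vocabulary, and the state names "background"/"target" are invented for illustration.

```python
import numpy as np

# Illustrative two-state HMM: "background" tokens vs. "target" (extracted) tokens.
# All numbers below are made-up for the example, not learned from data.
states = ["background", "target"]
A = np.array([[0.8, 0.2],    # transition matrix: row i = from-state i
              [0.4, 0.6]])
pi = np.array([0.95, 0.05])  # initial state distribution
vocab = {"the": 0, "seminar": 1, "3pm": 2}
B = np.array([[0.6, 0.3, 0.1],   # emission probabilities per state
              [0.1, 0.2, 0.7]])

def viterbi(obs):
    """Return the most probable state path for a list of token ids."""
    T, N = len(obs), len(states)
    delta = np.zeros((T, N))
    psi = np.zeros((T, N), dtype=int)
    delta[0] = pi * B[:, obs[0]]
    for t in range(1, T):
        # scores[i, j] = prob of being in i at t-1, moving to j, emitting obs[t]
        scores = delta[t - 1][:, None] * A * B[None, :, obs[t]]
        psi[t] = scores.argmax(axis=0)
        delta[t] = scores.max(axis=0)
    path = [int(delta[-1].argmax())]
    for t in range(T - 1, 0, -1):
        path.append(int(psi[t][path[-1]]))
    return [states[s] for s in reversed(path)]

path = viterbi([vocab["the"], vocab["seminar"], vocab["3pm"]])
print(path)  # -> ['background', 'background', 'target']
```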
Constrained Transitions • [Figure: state sequences over X1, X2, X3, X4, contrasting a fully connected transition structure with a constrained one]
HMM Structure Learning • [Figure: alternative address-field HMM structures C1 and C2 over states such as St. #, street, country, Zip code] • Unlike BN structure learning • Learn the structure of the transition matrix A • Learn structures with different numbers of states
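One simple way to view transition-structure learning is as choosing which entries of the transition matrix A may be non-zero. The sketch below (an assumed representation, not the paper's) encodes a structure as a 0/1 mask over a three-state model and renormalizes counts only over permitted transitions.

```python
import numpy as np

# Assumed encoding: states are (background, prefix, target); a 0/1 mask
# records which transitions the learned structure permits.
mask = np.array([[1, 1, 0],   # background -> background or prefix
                 [0, 0, 1],   # prefix -> target only
                 [1, 0, 1]])  # target -> background or self-loop

# Made-up transition counts from training data.
counts = np.array([[8.0, 2.0, 5.0],
                   [1.0, 3.0, 6.0],
                   [4.0, 2.0, 4.0]])

A = counts * mask                      # zero out transitions outside the structure
A = A / A.sum(axis=1, keepdims=True)   # renormalize each row to a distribution
print(A)
```

The mask is the "structure"; the search described later in these slides effectively searches over such masks (together with the number of states).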
Why learn HMM structure? • HMMs are not specifically suited to IE tasks • Including structural bias reduces the number of parameters to learn, and therefore requires less data • The parameters will be more accurate • Can constrain the number of times a class appears in a document • Can represent class lengths more accurately • The emission probability might be multimodal • Can model the left and right context of a class for the sparse IE task
Fully Observed vs. Partially Observed • Structure learning is only required when the data is partially observed • Partially observed – a field is represented by several states, but the label only identifies the field • With fully observed data we can let the probabilities “learn” the structure • Transitions that are never observed get zero probability • Learning the transition structure involves introducing new states • Naively allowing arbitrary transitions will not generalize well
The Problem • How to select the additional states and the state-transition structure • Manual selection doesn’t scale well • Human intuition does not always correspond to the best structures
The Solution • A system that automatically selects an HMM transition structure • The system starts from a simple initial model and extends it sequentially with a set of operations, searching for a better model • Model quality is measured by its discrimination on a validation dataset • The best model found is returned • The learned models are comparable with human-constructed HMM structures and on average outperform them
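A simplified, deterministic skeleton of that search might look as follows (the paper uses stochastic optimization; this hedged sketch is plain hill-climbing with placeholder `expand` and `score` callbacks, not the authors' implementation):

```python
def learn_structure(initial_model, expand, score, max_steps=50):
    """Greedy structure search: repeatedly apply expansion operations and
    keep the candidate scoring best on held-out validation data."""
    best, best_score = initial_model, score(initial_model)
    for _ in range(max_steps):
        candidates = expand(best)            # all one-step expansions of best
        if not candidates:
            break
        top_score, top = max((score(m), m) for m in candidates)
        if top_score <= best_score:
            break                            # no expansion helps: stop
        best, best_score = top, top_score
    return best

# Toy usage: a "model" is just a state count; the (made-up) validation
# score peaks at 4 states, so the search grows the model from 1 to 4.
result = learn_structure(1, expand=lambda n: [n + 1],
                         score=lambda n: -(n - 4) ** 2)
print(result)  # -> 4
```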
IE with HMMs • Each extracted field has its own HMM • Each HMM contains two kinds of states: • Target states • Non-target states • All of the field HMMs are concatenated into one consistent HMM • The entire document is used to train the models, with no need for pre-processing
Parameter Estimation • Transition probabilities are estimated by maximum likelihood • Unique path – ratio of counts • Non-unique path – use EM • Emission probabilities require smoothing with priors • Shrinkage, with mixture weights estimated by EM
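The shrinkage step can be illustrated with an assumed two-level hierarchy: each state's maximum-likelihood emission distribution is interpolated with a shared parent distribution and a uniform distribution. In the real method the mixture weights are fit by EM; here they are simply fixed, and all distributions are invented for the example.

```python
def shrink(state_ml, parent_ml, vocab_size, w=(0.7, 0.2, 0.1)):
    """Interpolate a state's ML emission distribution with a parent
    distribution and a uniform distribution (fixed weights w)."""
    uniform = 1.0 / vocab_size
    return {tok: w[0] * state_ml.get(tok, 0.0)
                 + w[1] * parent_ml.get(tok, 0.0)
                 + w[2] * uniform
            for tok in set(state_ml) | set(parent_ml)}

# Toy example: the state only ever emitted "seminar", but shrinkage keeps
# some probability mass on "the" via the parent and uniform components.
p = shrink({"seminar": 1.0}, {"seminar": 0.5, "the": 0.5}, vocab_size=10)
print(p["seminar"], p["the"])  # -> 0.81 0.11
```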
Learning State-Transition Structure • States: • Target • Prefix • Suffix • Background
Model Expansion Choices • Lengthen a prefix • Split a prefix • Lengthen a suffix • Split a suffix • Lengthen a target string • Split a target string • Add a background state
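Two of the expansion operations above can be sketched on an assumed data structure: a model as a dict of state chains, e.g. `{"prefix": ["p1"], ...}`. "Lengthen" appends a state to a chain; "split" adds a parallel copy of it. The representation and state names are hypothetical, not the paper's.

```python
import copy

def lengthen(model, part):
    """Return a new model whose `part` chain is one state longer."""
    m = copy.deepcopy(model)
    m[part].append(f"{part[0]}{len(m[part]) + 1}")
    return m

def split(model, part):
    """Return a new model with a parallel copy of the `part` chain."""
    m = copy.deepcopy(model)
    m[part + "_alt"] = list(m[part])
    return m

base = {"prefix": ["p1"], "target": ["t1"], "suffix": ["s1"]}
print(lengthen(base, "prefix"))  # prefix chain grows to ["p1", "p2"]
```

Each operator returns a fresh candidate, leaving the original model untouched, so a search procedure can generate and score all one-step expansions of the current best model.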
Discussion • Structure learning is similar to rule learning for word or boundary classification • The search for the best structure is not exhaustive • There is no attempt to generalize better by sharing emission probabilities across states
Comparison with the Borkar et al. Algorithm • Differences: • Segmentation vs. sparse extraction • Modeling of background and boundaries • Unique path – no use of EM • Backward search vs. forward search • Both assume boundaries, and that position is the most relevant feature for distinguishing different states
Experimental Results • Tested on 8 extraction tasks over 4 datasets: • Seminar Announcements (485) • Reuters Corporate Acquisition articles (600) • Job Announcements (298) • Call for Papers (363) • Training and test sets were of equal size • Performance averaged over 10 splits
Experimental Results • Compared to 4 other approaches: • Grown HMM – the learned structure • SRV – rule learning (Freitag 1998) • Rapier – rule learning (Califf 1998) • Simple HMM • Complex HMM
Conclusions • HMMs have proven to be a state-of-the-art method for IE • Constraining the transition structure has a crucial effect on performance • Automatic transition-structure learning matches and even outperforms manually crafted HMMs, which require laborious manual construction