Hierarchical Hidden Vector State Language Model: A Novel Approach for Language Processing

The Hidden Vector State Language Model
Vidura Seneviratne, Steve Young
Cambridge University Engineering Department

References
• Young, S. J., “The Hidden Vector State language model”, Tech. Report CUED/F-INFENG/TR.467, Cambridge University Engineering Department, 2003.
• He, Y. and Young, S. J., “Hidden Vector State Model for hierarchical semantic parsing”, in Proc. ICASSP, Hong Kong, 2003.
• Fine, S., Singer, Y., and Tishby, N., “The Hierarchical Hidden Markov Model: Analysis and applications”, Machine Learning 32(1): 41-62, 1998.

Outline
• Introduction
• HVS Model
• Experiments
• Conclusion

Introduction
• Language model:
  • Suffers from data sparseness and cannot capture long-distance dependencies or model nested structural information
• Class-based language model
  • Uses POS tag information
• Structured language model
  • Uses syntactic information

Hierarchical Hidden Markov Model
• An HHMM is a structured multi-level stochastic process.
• Each state is itself an HHMM.
• Internal state: a hidden state that does not emit observable symbols directly
• Production state: a leaf state, which emits observable symbols
• The states of an ordinary HMM correspond to the production states of an HHMM.

HHMM (cont.) • Parameters of HHMM:

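HHMM (cont.)
• Transition probability: horizontal transitions between states at the same level
• Initial probability: vertical transitions from a parent state into one of its children
• Observation probability: emission of symbols by production states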

HHMM (cont.)
• Current node is the root:
  • Choose a child according to the initial probability
• Child is a production state:
  • Produce an observation
  • Transit within the same level
  • On reaching the end state, return control to the parent of the end state
• Child is an internal state:
  • Choose a child
  • Wait until control returns from the children
  • Transit within the same level
  • On reaching the end state, return control to the parent of the end state
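
The control flow above can be sketched as a recursive sampler. This is a minimal illustration, not the formulation from Fine et al.: the two-level topology, the state names, and every probability below are made up, and the end state is represented by the hypothetical string "end".

```python
import random

# Toy HHMM (all names and numbers are illustrative assumptions).
# Each internal state has initial (vertical) probabilities over its children
# and horizontal transitions among them, including a special "end" state.
HHMM = {
    "root": {
        "init": {"A": 1.0},
        "trans": {"A": {"B": 0.5, "end": 0.5}, "B": {"end": 1.0}},
    },
    "A": {
        "init": {"a1": 0.6, "a2": 0.4},
        "trans": {"a1": {"a2": 0.7, "end": 0.3}, "a2": {"end": 1.0}},
    },
    "B": {
        "init": {"b1": 1.0},
        "trans": {"b1": {"end": 1.0}},
    },
}
# Observation probabilities of the production (leaf) states.
EMIT = {"a1": {"x": 0.9, "y": 0.1}, "a2": {"y": 1.0}, "b1": {"z": 1.0}}


def draw(dist, rng):
    """Sample one outcome from a {outcome: probability} table."""
    outcomes = list(dist)
    return rng.choices(outcomes, weights=[dist[o] for o in outcomes])[0]


def generate(state, rng, out):
    """Run the sub-model rooted at internal `state` until its end state."""
    child = draw(HHMM[state]["init"], rng)       # vertical transition
    while child != "end":
        if child in EMIT:                        # production state: emit
            out.append(draw(EMIT[child], rng))
        else:                                    # internal state: recurse and
            generate(child, rng, out)            # wait for control to return
        child = draw(HHMM[state]["trans"][child], rng)  # horizontal transition
    return out                                   # control goes back to parent


symbols = generate("root", random.Random(0), [])
```

Recursion mirrors the "wait until control returns" step: a call for an internal state only returns once its own end state has been reached.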

HHMM (cont.)

HHMM (cont.)
• Another application: modeling stock trends (IDEAL 2004)

Hidden Vector State Model

Hidden Vector State Model (cont.)
• The semantic information relating to any single word can be stored as a vector of semantic tag names.

Hidden Vector State Model (cont.)
• If state transitions were unconstrained, the model would be a full HHMM.
• Transitions between states are instead factored into a stack shift in two stages: pop, then push.
• The stack size is limited, and the number of new concepts pushed per transition is limited to one.
  • This makes the model more efficient.

Hidden Vector State Model (cont.) • The joint probability is defined:

Hidden Vector State Model (cont.) • Approximation (assumption): • So,
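
The equations for these two slides did not survive in the transcript. The form below is a reconstruction that is consistent with the three-step generative process on the next slide and with the cited He and Young paper; the notation is an assumption: W is the word sequence, C the sequence of vector states c_t, N the sequence of pop counts n_t, and D_t the stack depth at position t.

```latex
% Joint probability of words, vector states, and stack-shift counts
P(W, C, N) = \prod_{t=1}^{T}
    P(n_t \mid W_1^{t-1}, C_1^{t-1}) \,
    P(c_t[1] \mid W_1^{t-1}, C_1^{t-1}, n_t) \,
    P(w_t \mid W_1^{t-1}, C_1^{t})

% Approximations (assumptions)
P(n_t \mid W_1^{t-1}, C_1^{t-1}) \approx P(n_t \mid c_{t-1})
P(c_t[1] \mid W_1^{t-1}, C_1^{t-1}, n_t) \approx P(c_t[1] \mid c_t[2 \ldots D_t])
P(w_t \mid W_1^{t-1}, C_1^{t}) \approx P(w_t \mid c_t)

% So,
P(W, C, N) \approx \prod_{t=1}^{T}
    P(n_t \mid c_{t-1}) \,
    P(c_t[1] \mid c_t[2 \ldots D_t]) \,
    P(w_t \mid c_t)
```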

Hidden Vector State Model (cont.)
• The generative process associated with this constrained version of the HVS model consists of three steps for each position t:
  1. Choose a value for n_t, the number of tags to pop
  2. Select the preterminal concept tag c_t[1]
  3. Select a word w_t
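
The three steps can be sketched as a sampling loop. Everything concrete here is an assumption made for illustration: the probability tables, the tags SS and SE (hypothetical sentence-start and sentence-end markers), the depth limit of 4, and the choice of what each distribution is conditioned on.

```python
import random


def draw(dist, rng):
    """Sample one outcome from a {outcome: probability} table."""
    outcomes = list(dist)
    return rng.choices(outcomes, weights=[dist[o] for o in outcomes])[0]


# Hypothetical toy tables. Stacks are tuples with the top of the stack first.
P_POP = {   # P(n_t | stack at t-1)
    ("SS",): {0: 1.0},
    ("FLIGHT", "SS"): {0: 1.0},
    ("TOLOC", "FLIGHT", "SS"): {0: 1.0},
    ("CITY", "TOLOC", "FLIGHT", "SS"): {4: 1.0},
}
P_TAG = {   # P(c_t[1] | tags left on the stack after the pop)
    ("SS",): {"FLIGHT": 1.0},
    ("FLIGHT", "SS"): {"TOLOC": 1.0},
    ("TOLOC", "FLIGHT", "SS"): {"CITY": 1.0},
    (): {"SE": 1.0},
}
P_WORD = {  # P(w_t | stack at t)
    ("FLIGHT", "SS"): {"flights": 0.7, "flight": 0.3},
    ("TOLOC", "FLIGHT", "SS"): {"to": 1.0},
    ("CITY", "TOLOC", "FLIGHT", "SS"): {"Boston": 0.5, "Denver": 0.5},
    ("SE",): {"</s>": 1.0},
}


def generate(rng, max_depth=4):
    """Generate words until the sentence-end tag reaches the top of the stack."""
    stack, words, pops = ("SS",), [], []
    while stack[0] != "SE":
        n_t = draw(P_POP[stack], rng)            # step 1: how many tags to pop
        remaining = stack[n_t:]
        c_t1 = draw(P_TAG[remaining], rng)       # step 2: new preterminal tag
        stack = (c_t1,) + remaining              # push exactly one tag
        assert len(stack) <= max_depth           # stack size is limited
        words.append(draw(P_WORD[stack], rng))   # step 3: emit a word
        pops.append(n_t)
    return words, pops


words, pops = generate(random.Random(0))
```

With these tables the tag sequence is fixed and only the word choices vary, which keeps the example easy to follow by hand.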

Hidden Vector State Model (cont.)
• It is reasonable to ask an application designer to provide examples of utterances that would yield each type of semantic schema.
• It is not reasonable to require utterances with manually transcribed parse trees.
• The model therefore assumes only abstract semantic annotations and the availability of a set of domain-specific lexical classes.

Hidden Vector State Model (cont.)
Abstract semantic annotations:
• Show me flights arriving in X at T.
• List flights arriving around T in X.
• Which flight reaches X before T?
= FLIGHT(TOLOC(CITY(X),TIME_RELATIVE(TIME(T))))
Class set:
CITY: Boston, New York, Denver…
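
To make the link between an abstract annotation and vector states concrete, the sketch below lists the tag stacks that the bracketed annotation licenses. The function name and the treatment of bare symbols such as X and T as class placeholders are assumptions for illustration, not part of the original method description.

```python
def vector_states(annotation):
    """Return one tag stack per concept in a bracketed annotation.

    Each stack is a tuple with the most recently pushed tag first.
    """
    states, stack, token = [], [], ""
    for ch in annotation.replace(" ", ""):
        if ch == "(":
            stack.append(token)                  # token names a concept: push
            states.append(tuple(reversed(stack)))
            token = ""
        elif ch in ",)":
            token = ""       # bare token (X, T) is a placeholder, not a concept
            if ch == ")":
                stack.pop()                      # leave the current concept
        else:
            token += ch
    return states


states = vector_states("FLIGHT(TOLOC(CITY(X),TIME_RELATIVE(TIME(T))))")
for s in states:
    print(s)
```

For this annotation the deepest stack holds four tags, which is in line with a small fixed stack limit.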

Experiments
Experimental setup:
• Training set: ATIS-2, ATIS-3
• Test set: ATIS-3 NOV93, DEC94
• Baseline: FST (finite-state tagger)
• Smoothing: Good-Turing (GT) for FST, Witten-Bell for HVS
Example: Show me flights from Boston to New York
• Goal: FLIGHT
• Slots: FROMLOC.CITY = Boston, TOLOC.CITY = New York

Experiments

Experiments
Dashed line: goal detection accuracy; solid line: F-measure
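
For reference, the two plotted metrics can be computed as in the sketch below. The slides do not state the exact scoring protocol, so this assumes a balanced F-measure over exact slot/value matches and goal accuracy as the fraction of utterances whose goal is correct; the function names and the sample hypothesis are hypothetical.

```python
def slot_f_measure(reference, hypothesis):
    """Balanced F-measure over exact (slot, value) matches."""
    ref, hyp = set(reference.items()), set(hypothesis.items())
    correct = len(ref & hyp)
    precision = correct / len(hyp) if hyp else 0.0
    recall = correct / len(ref) if ref else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def goal_accuracy(reference_goals, hypothesis_goals):
    """Fraction of utterances whose predicted goal equals the reference goal."""
    pairs = list(zip(reference_goals, hypothesis_goals))
    return sum(r == h for r, h in pairs) / len(pairs)


reference = {"FROMLOC.CITY": "Boston", "TOLOC.CITY": "New York"}
hypothesis = {"FROMLOC.CITY": "Boston", "TOLOC.CITY": "Newark"}  # one slot wrong
f_score = slot_f_measure(reference, hypothesis)
accuracy = goal_accuracy(["FLIGHT", "FLIGHT"], ["FLIGHT", "AIRFARE"])
```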

Conclusion
• Key features of the HVS model:
  • Its ability to represent hierarchical information in a constrained way
  • Its capability to be trained directly from target semantics without explicit word-level annotation

HVS Language Model
• The basic HVS model is a regular HMM in which each state encodes history in a fixed-dimension, stack-like structure.
• Each state consists of a stack in which each element is a label chosen from a finite set of cardinality M+1: C = {c_1, …, c_M, c_#}
• An HVS model state of depth D can be characterized by a vector of dimension D, with the most recently pushed element at index 1 and the oldest at index D.

HVS Language Model (cont.)

HVS Language Model (cont.)
• Each HVS model state transition is restricted:
  (i) exactly n_t class labels are popped off the stack
  (ii) exactly one new class label c_t is pushed onto the stack
• The number of elements to pop, n_t, and the choice of the new class label to push, c_t, are each drawn from a conditional distribution.
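
A restricted transition on a fixed-depth state can be sketched as follows. Two details are assumptions rather than statements from the slides: the special label c# is used to fill slots emptied by a pop, and when nothing is popped the oldest label falls off the bottom of the stack.

```python
PAD = "c#"  # assumption: the special label fills unused stack slots


def transition(state, n_pop, new_label):
    """Pop n_pop labels, push one new label, and keep the depth fixed.

    `state` is a tuple of length D whose element 0 is the most recently
    pushed label (index 1 in the slide's numbering).
    """
    depth = len(state)
    if not 0 <= n_pop <= depth:
        raise ValueError("n_pop must be between 0 and the stack depth")
    kept = state[n_pop:]                        # labels that survive the pop
    shifted = (new_label,) + kept               # push exactly one label
    shifted += (PAD,) * (depth - len(shifted))  # refill emptied slots
    return shifted[:depth]                      # oldest label drops if n_pop == 0


state = ("c3", "c2", "c1")                      # D = 3, "c3" pushed last
after_pop_one = transition(state, 1, "c4")
after_pop_none = transition(state, 0, "c4")
after_pop_all = transition(state, 3, "c4")
```

Because the depth never changes, every state is a plain D-tuple, which is what lets the model be treated as a regular HMM over vector-valued states.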

HVS Language Model (cont.)

HVS Language Model (cont.)
• n_t is conditioned on all the class labels in the stack at t-1, whereas c_t is conditioned only on the class labels that remain on the stack after the pop operation.
• The former distribution can encode embedding, whereas the latter focuses on modeling long-range dependencies.

HVS Language Model (cont.) • Joint probability: • Assumption:

HVS Language Model (cont.) • Training: EM algorithm • C,N: latent data, W: observed data • E-step:

HVS Language Model (cont.) • M-Step: • Q function (auxiliary): • Substituting P(W,C,N|λ)

HVS Language Model (cont.)
• The probability distributions are re-estimated separately.
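
The formulas for the training slides are missing from the transcript. The following is a sketch of the standard EM auxiliary function applied to this model. The factorization of the joint probability follows the conditioning described on the earlier slide for n_t and c_t; the word term P(w_t | c_t) and the iteration superscript (k) are assumptions.

```latex
% Auxiliary (Q) function: expected complete-data log-likelihood
Q(\lambda \mid \lambda^{(k)}) =
    \sum_{C, N} P(C, N \mid W, \lambda^{(k)}) \, \log P(W, C, N \mid \lambda)

% Assumed factorization of the joint probability
P(W, C, N \mid \lambda) = \prod_{t=1}^{T}
    P(n_t \mid c_{t-1}) \, P(c_t[1] \mid c_t[2 \ldots D]) \, P(w_t \mid c_t)

% Substituting, the logarithm turns the product into a sum of three terms
Q(\lambda \mid \lambda^{(k)}) =
    \sum_{C, N} P(C, N \mid W, \lambda^{(k)}) \sum_{t=1}^{T}
    \Big[ \log P(n_t \mid c_{t-1})
        + \log P(c_t[1] \mid c_t[2 \ldots D])
        + \log P(w_t \mid c_t) \Big]
```

Each of the three terms depends on only one of the distributions, so each distribution can be maximized on its own in the M-step.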

HVS Language Model (cont.)
• State space S, if fully populated:
  • |S| = M^D states, with M over 100 and D = 3 to 4 (at least 100^3 = 10^6 states)
• Because of the resulting data sparseness, backoff is needed.

HVS Language Model (cont.) • Backoff weight: • Modified version of absolute discounting
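
The slides do not give the modified discounting scheme, so the sketch below shows plain interpolated absolute discounting instead, only to illustrate how a backoff weight arises. The function name, the discount value of 0.5, and the toy counts are assumptions.

```python
from collections import Counter


def absolute_discount(counts, lower, discount=0.5):
    """Smooth one context's counts with a lower-order distribution.

    counts: Counter of events observed in the context.
    lower:  {event: probability} for the backed-off (shorter) context;
            it must cover every event in `counts`.
    """
    total = sum(counts.values())
    if total == 0:
        return dict(lower)                       # unseen context: back off fully
    seen_types = sum(1 for c in counts.values() if c > 0)
    # Probability mass removed by discounting, redistributed via `lower`.
    backoff_weight = discount * seen_types / total
    return {
        event: max(counts.get(event, 0) - discount, 0) / total
        + backoff_weight * p
        for event, p in lower.items()
    }


counts = Counter({"a": 3, "b": 1})
lower = {"a": 0.5, "b": 0.25, "c": 0.25}
probs = absolute_discount(counts, lower)
```

The unseen event "c" receives probability only through the backoff weight, which is the behavior a sparse state space needs.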

Experiments
• Training set: ATIS-3, 276K words, 23K sentences
• Development set: ATIS-3 Nov93
• Test set: ATIS-3 Dec94, 10K words, 1K sentences
• OOV words were removed
• k = 850

Experiments (cont.)

Conclusion
• The HVS language model is able to make better use of context than standard class n-gram models.
• The HVS model is trainable using EM.

Class tree for implementation

Iteration number vs. perplexity
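
Perplexity, the quantity tracked in this plot, is the exponential of the average negative log-probability per word. A minimal sketch, assuming the model supplies one probability per test word (the function name and the sample values are hypothetical):

```python
import math


def perplexity(word_probs):
    """Perplexity of a word sequence given per-word model probabilities."""
    log_sum = sum(math.log(p) for p in word_probs)
    return math.exp(-log_sum / len(word_probs))


# A model that assigns probability 1/4 to every word has perplexity 4.
uniform_ppl = perplexity([0.25, 0.25, 0.25, 0.25])
```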
