The hidden vector state language model
The Hidden Vector State Language Model. Vidura Senevitratne, Steve Young Cambridge University Engineering Department. Reference. Young, S. J., “The Hidden Vector State language model”, Tech. Report CUED/F-INFENG/TR.467, Cambridge University Engineering Department, 2003.

The Hidden Vector State Language Model

Vidura Senevitratne, Steve Young

Cambridge University Engineering Department



Reference

  • Young, S. J., “The Hidden Vector State language model”, Tech. Report CUED/F-INFENG/TR.467, Cambridge University Engineering Department, 2003.

  • He, Y. and Young, S. J., “Hidden Vector State Model for hierarchical semantic parsing”, in Proc. ICASSP, Hong Kong, 2003.

  • Fine, S., Singer, Y., and Tishby, N., “The Hierarchical Hidden Markov Model: Analysis and applications”, Machine Learning 32(1): 41-62, 1998.



  • Introduction

  • HVS Model

  • Experiments

  • Conclusion



  • Language model:

  • Issues: data sparseness, and an inability to capture long-distance dependencies or to model nested structural information

  • Class-based language model

    • POS tag information

  • Structured language model

    • Syntactic information

Hierarchical Hidden Markov Model

  • An HHMM is a structured multi-level stochastic process.

    • Each state is itself an HHMM

    • Internal state: a hidden state that does not emit observable symbols directly

    • Production state: a leaf state

  • The states of an ordinary HMM correspond to the production states of an HHMM.

HHMM (cont.)

  • Parameters of HHMM:

HHMM (cont.)

  • Transition probability: horizontal (between states at the same level)

  • Initial probability: vertical (from a parent state to one of its children)

  • Observation probability:

HHMM (cont.)

  • Current node is the root:

    • Choose a child according to the initial probability

  • Child is a production state:

    • Produce an observation

    • Transition within the same level

    • On reaching the end state, return control to the parent of the end state

  • Child is an internal state:

    • Choose a child

    • Wait until control returns from the children

    • Transition within the same level

    • On reaching the end state, return control to the parent of the end state
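
The control flow above can be sketched as a short recursive sampler. This is only an illustration: the toy model below (state names, topology, probabilities) is a made-up assumption and does not come from the slides. Internal states pick a child with the vertical (initial) distribution, hand control to it, and then move horizontally until the end state is drawn.

```python
import random

# Hypothetical toy HHMM: state names, topology and probabilities are
# illustrative assumptions, not taken from the presentation.
HHMM = {
    "root": {"init": {"A": 1.0},
             "trans": {"A": {"B": 0.5, "END": 0.5}, "B": {"END": 1.0}}},
    "A": {"init": {"a1": 1.0},
          "trans": {"a1": {"a2": 1.0}, "a2": {"END": 1.0}}},
    "B": {"emit": {"z": 1.0}},    # production state
    "a1": {"emit": {"x": 1.0}},   # production state
    "a2": {"emit": {"y": 1.0}},   # production state
}


def draw(dist, rng):
    """Sample one key of `dist` in proportion to its probability."""
    labels = list(dist)
    return rng.choices(labels, weights=[dist[k] for k in labels])[0]


def generate(state, rng, out):
    """Append the observations generated below `state` to `out`."""
    spec = HHMM[state]
    if "emit" in spec:
        # Production state: produce one observation, then return so the
        # parent can make the horizontal transition.
        out.append(draw(spec["emit"], rng))
        return
    # Internal state: vertical transition to a first child.
    child = draw(spec["init"], rng)
    while child != "END":
        generate(child, rng, out)                # wait until control returns
        child = draw(spec["trans"][child], rng)  # horizontal transition
    # END reached: control goes back to the caller, i.e. the parent.


rng = random.Random(0)
symbols = []
generate("root", rng, symbols)
print(symbols)
```

In this toy model the sub-model `A` always emits `x` then `y`, and the root then either stops or visits `B`, so the output is one of two short sequences.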

HHMM (cont.)

  • Other application: trend of stocks (IDEAL 2004)

Hidden Vector State Model

The semantic information relating to any single word can be stored as a vector of semantic tag names.

Hidden Vector State Model (cont.)

  • If state transitions were unconstrained

    • The model would be a full HHMM

  • Transitions between states can be factored into a stack shift with two stages: pop, then push

  • The stack size is limited, and the number of new concepts pushed per transition is limited to one

    • More efficient
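
A minimal sketch of the constrained stack shift may help. The function name `stack_shift`, the depth limit of 4, the choice to raise an error on overflow, and the tag names `SS`, `FLIGHT` and `TOLOC` are all assumptions made for illustration; only `FROMLOC` and `CITY` appear elsewhere in the slides.

```python
def stack_shift(stack, n_pop, new_label, max_depth=4):
    """Apply one HVS-style transition: pop `n_pop` labels, then push one.

    `stack` is a tuple whose first element is the most recently pushed
    label. `max_depth` and the overflow error are assumptions of this
    sketch, not details given in the presentation.
    """
    if not 0 <= n_pop <= len(stack):
        raise ValueError("cannot pop more labels than the stack holds")
    new_stack = (new_label,) + tuple(stack[n_pop:])
    if len(new_stack) > max_depth:
        raise ValueError("stack depth limit exceeded")
    return new_stack


# Hypothetical tag sequence for "flights from Boston to New York".
state = ("SS",)
for n_pop, label in [(0, "FLIGHT"), (0, "FROMLOC"), (0, "CITY"),
                     (2, "TOLOC"), (0, "CITY")]:
    state = stack_shift(state, n_pop, label)
    print(state)
```

Because exactly one label is pushed per step, a transition is fully described by the pair (number popped, label pushed), which is what keeps the model much smaller than an unconstrained HHMM.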

Hidden Vector State Model (cont.)

  • The joint probability is defined:

Hidden Vector State Model (cont.)

  • Approximation (assumption):

  • So,

Hidden Vector State Model (cont.)

  • The generative process associated with this constrained version of the HVS model consists of three steps for each position t:

    1. Choose a value for n_t

    2. Select a preterminal concept tag c_t[1]

    3. Select a word w_t
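
The three steps correspond to three factors in the joint probability. The formulas on the slides are not present in this transcript, so the block below is a reconstruction that follows the form used in He and Young's HVS papers and should be checked against them. Here W, C and N denote the word, vector state and pop-count sequences, and D_t is the stack depth at position t.

```latex
% Joint probability, one factor per generative step
P(N, C, W) = \prod_{t=1}^{T}
    P(n_t \mid W_1^{t-1}, C_1^{t-1}) \,
    P(c_t[1] \mid W_1^{t-1}, C_1^{t-1}, n_t) \,
    P(w_t \mid W_1^{t-1}, C_1^{t})

% Approximations (assumptions)
P(n_t \mid W_1^{t-1}, C_1^{t-1}) \approx P(n_t \mid c_{t-1})
P(c_t[1] \mid W_1^{t-1}, C_1^{t-1}, n_t) \approx P(c_t[1] \mid c_t[2 \ldots D_t])
P(w_t \mid W_1^{t-1}, C_1^{t}) \approx P(w_t \mid c_t)

% Resulting model
P(N, C, W) \approx \prod_{t=1}^{T}
    P(n_t \mid c_{t-1}) \,
    P(c_t[1] \mid c_t[2 \ldots D_t]) \,
    P(w_t \mid c_t)
```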

Hidden Vector State Model (cont.)

  • It is reasonable to ask an application designer to provide examples of utterances which would yield each type of semantic schema.

  • It is not reasonable to require utterances with manually transcribed parse trees.

  • Instead, assume that abstract semantic annotations and a set of domain-specific lexical classes are available.

Hidden Vector State Model (cont.)

Abstract semantic annotations:

  • Show me flights arriving in X at T.

  • List flights arriving around T in X.

  • Which flight reaches X before T?

Class set:

  • CITY: Boston, New York, Denver…
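
One way a class set could be applied is to replace each class member in an utterance with its class label before training. The sketch below assumes that preprocessing step; the helper names are hypothetical, and the class set holds only the three cities that are visible on the slide.

```python
import re

# Members listed on the slide; the full class set is elided there.
LEXICAL_CLASSES = {"CITY": ["Boston", "New York", "Denver"]}


def build_matcher(classes):
    """Return a compiled pattern and a lowercase member -> label lookup."""
    members = [(m, label) for label, ms in classes.items() for m in ms]
    # Longest members first, so multi-word names win over shorter ones.
    members.sort(key=lambda pair: -len(pair[0]))
    lookup = {m.lower(): label for m, label in members}
    alternation = "|".join(re.escape(m) for m, _ in members)
    pattern = re.compile(r"\b(" + alternation + r")\b", re.IGNORECASE)
    return pattern, lookup


def abstract_utterance(utterance, classes=LEXICAL_CLASSES):
    """Replace every class member in `utterance` with its class label."""
    pattern, lookup = build_matcher(classes)
    return pattern.sub(lambda m: lookup[m.group(1).lower()], utterance)


print(abstract_utterance("Show me flights from Boston to New York"))
```

Matching is case-insensitive and word-bounded, so "denver" is replaced while a longer word that merely contains a city name is left alone.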



Experimental Setup

Training set: ATIS-2, ATIS-3

Test set: ATIS-3 NOV93, DEC94

Baseline: FST (Finite State Tagger)

Smoothing: Good-Turing (GT) for FST, Witten-Bell for HVS

Show me flights from Boston to New York


Slots: FROMLOC.CITY = Boston






Figure: the dashed line shows goal detection accuracy; the solid line shows F-measure.
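
For readers unfamiliar with the metric, the F-measure over slots can be sketched as follows. This assumes slots are compared as exact (name, value) pairs and uses the balanced F1 form; the slides do not say how the score was pooled over the test set, and the `TOLOC.CITY` slot is a hypothetical example.

```python
def slot_f_measure(reference, hypothesis):
    """Balanced F-measure between two collections of (slot, value) pairs."""
    ref, hyp = set(reference), set(hypothesis)
    correct = len(ref & hyp)
    if correct == 0:
        return 0.0
    precision = correct / len(hyp)
    recall = correct / len(ref)
    return 2 * precision * recall / (precision + recall)


# Hypothetical reference and system output for one utterance.
reference = [("FROMLOC.CITY", "Boston"), ("TOLOC.CITY", "New York")]
hypothesis = [("FROMLOC.CITY", "Boston"), ("TOLOC.CITY", "Denver")]
print(slot_f_measure(reference, hypothesis))
```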



  • Key features of the HVS model:

    • Its ability to represent hierarchical information in a constrained way

    • Its capability to be trained directly from target semantics without explicit word-level annotation.

HVS Language Model

  • The basic HVS model is a regular HMM in which each state encodes history in a fixed-dimension stack-like structure.

  • Each state consists of a stack, where each element of the stack is a label chosen from a finite set of cardinality M+1: C = {c_1, …, c_M, c_#}

  • A depth-D HVS model state can be characterized by a vector of dimension D, with the most recently pushed element at index 1 and the oldest at index D

HVS Language Model (cont.)

  • Each HVS model state transition is restricted:

    (i) exactly n_t class labels are popped off the stack

    (ii) exactly one new class label c_t is pushed onto the stack

  • The number of elements to pop, n_t, and the choice of new class label to push, c_t, are determined:

HVS Language Model (cont.)

  • n_t is conditioned on all the class labels that are on the stack at t-1, but c_t is conditioned only on the class labels that remain on the stack after the pop operation.

  • The former distribution can encode embedding, whereas the latter focuses on modeling long-range dependencies.
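
The two conditioning contexts can be made concrete with a small scoring sketch. The table layout (dictionaries keyed by stack tuples), the function name and the toy probabilities are assumptions for illustration only.

```python
import math


def transition_logprob(prev_stack, n_pop, new_label, p_pop, p_push):
    """Log-probability of one pop-then-push transition.

    The pop count is looked up under the full previous stack; the pushed
    label is looked up under the labels that survive the pop.
    """
    pop_context = tuple(prev_stack)            # whole stack at t-1
    push_context = tuple(prev_stack[n_pop:])   # what remains after popping
    return (math.log(p_pop[pop_context][n_pop])
            + math.log(p_push[push_context][new_label]))


# Hypothetical toy tables; the first tuple element is the top of the stack.
p_pop = {("B", "A"): {0: 0.5, 1: 0.25, 2: 0.25}}
p_push = {("A",): {"C": 0.5, "D": 0.5}}

score = transition_logprob(("B", "A"), 1, "C", p_pop, p_push)
print(math.exp(score))
```

Note that `B`, the label that is popped, influences only the pop decision; the pushed label `C` is predicted from `A` alone, which is how older context is carried across an embedded segment.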

HVS Language Model (cont.)

  • Joint probability:

  • Assumption:

HVS Language Model (cont.)

  • Training: EM algorithm

    • C, N: latent data; W: observed data

  • E-step:

HVS Language Model (cont.)

  • M-Step:

    • Q function (auxiliary):

    • Substituting P(W,C,N|λ)

HVS Language Model (cont.)

  • Calculate probability distributions separately.
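
The slide equations are not present in this transcript. As a reminder of why the distributions can be estimated separately, the standard EM auxiliary function is written out below. The symbol \lambda^{k} for the current parameter estimate is notation chosen here, and the conditioning contexts are abbreviated with a dot because the transcript does not give them in full.

```latex
% Standard EM auxiliary function with C, N latent and W observed
Q(\lambda \mid \lambda^{k}) =
    \sum_{C, N} P(C, N \mid W, \lambda^{k}) \, \log P(W, C, N \mid \lambda)

% With a joint probability that factors per position into a pop term,
% a push term and a word term, the logarithm becomes a sum:
Q(\lambda \mid \lambda^{k}) =
    \sum_{C, N} P(C, N \mid W, \lambda^{k})
    \sum_{t=1}^{T} \left[
        \log P(n_t \mid \cdot) + \log P(c_t \mid \cdot) + \log P(w_t \mid \cdot)
    \right]
```

Each distribution appears in only one of the three log terms, so each can be maximized on its own subject to its normalization constraint.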

HVS Language Model (cont.)

  • State space S, if fully populated:

    • |S| = M^D states, with M = 100+ and D = 3 to 4

  • Due to data sparseness, backoff is needed.
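
A quick calculation shows why a fully populated state space is impractical. The function name is hypothetical, and M = 100 is used as the lower end of the "100+" quoted above.

```python
def full_state_space(num_labels, depth):
    """Number of distinct stacks when every one of `depth` slots is filled."""
    return num_labels ** depth


for depth in (3, 4):
    print(depth, full_state_space(100, depth))
```

With 100 labels this gives one million states at depth 3 and one hundred million at depth 4, far more than the 276K training words could populate.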

HVS Language Model (cont.)

  • Backoff weight:

  • Modified version of absolute discounting
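
The modified scheme itself is not given in this transcript. For orientation only, the sketch below shows textbook interpolated absolute discounting, which is not the authors' modified version. The function name, the discount of 0.5 and the toy counts are assumptions, and the lower-order distribution is assumed to cover every word that has a count.

```python
def absolute_discount(counts, lower_order, discount=0.5):
    """Textbook interpolated absolute discounting for a single context.

    `counts` maps each event seen in this context to a positive count.
    `lower_order` is a normalized back-off distribution covering every
    event, including all keys of `counts`.
    """
    total = sum(counts.values())
    if total == 0:
        return dict(lower_order)  # unseen context: use the back-off as is
    seen_types = sum(1 for c in counts.values() if c > 0)
    # Mass removed from seen events is handed to the back-off distribution.
    backoff_weight = discount * seen_types / total
    return {
        event: max(counts.get(event, 0) - discount, 0) / total
               + backoff_weight * p
        for event, p in lower_order.items()
    }


# Hypothetical counts in one context and a hypothetical back-off distribution.
probs = absolute_discount({"boston": 3, "denver": 1},
                          {"boston": 0.25, "denver": 0.25, "dallas": 0.5})
print(probs)
```

In an HVS setting the "context" would presumably be a stack and the back-off a shorter stack, but that mapping is an assumption here.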



Experiments

  • Training set:

    • ATIS-3, 276K words, 23K sentences.

  • Development set:

    • ATIS-3 Nov93

  • Test set:

    • ATIS-3 Dec94, 10K words, 1K sentences.

  • OOV words were removed

  • k=850



Conclusion

  • The HVS language model is able to make better use of context than standard class n-gram models.

  • The HVS model is trainable using EM.

Class tree for implementation

Iteration number vs. perplexity
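
The plot itself is not available in this transcript. For reference, perplexity is conventionally computed from the per-word probabilities a model assigns to the test text, as in the sketch below; the function name and the example values are illustrative assumptions.

```python
import math


def perplexity(word_probs):
    """Perplexity of a test sequence from its per-word model probabilities."""
    if not word_probs:
        raise ValueError("need at least one word probability")
    avg_log_prob = sum(math.log(p) for p in word_probs) / len(word_probs)
    return math.exp(-avg_log_prob)


# A model that gives every word probability 1/4 is as uncertain as a
# uniform choice among four words.
print(perplexity([0.25, 0.25, 0.25, 0.25]))
```

Lower values mean the model finds the test text less surprising, so a curve that falls with the EM iteration number indicates that training is improving the fit.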
