The Hidden Vector State Language Model



Vidura Seneviratne, Steve Young

Cambridge University Engineering Department


Reference

  • Young, S. J., “The Hidden Vector State language model”, Tech. Report CUED/F-INFENG/TR.467, Cambridge University Engineering Department, 2003.

  • He, Y. and Young, S. J., “Hidden Vector State Model for hierarchical semantic parsing”, in Proc. ICASSP, Hong Kong, 2003.

  • Fine, S., Singer, Y. and Tishby, N., “The Hierarchical Hidden Markov Model: Analysis and applications”, Machine Learning 32(1): 41-62, 1998.


Outline

  • Introduction

  • HVS Model

  • Experiments

  • Conclusion


Introduction

  • Standard language models suffer from data sparseness and are unable to capture long-distance dependencies or to model nested structural information.

  • Class-based language models

    • use POS tag information.

  • Structured language models

    • use syntactic information.

Hierarchical Hidden Markov Model

  • The HHMM is a structured, multi-level stochastic process.

    • Each state is itself an HHMM.

    • Internal states: hidden states that do not emit observable symbols directly.

    • Production states: leaf states, which emit the observations.

  • The states of a conventional HMM correspond to the production states of an HHMM.

HHMM (cont.)

  • Parameters of HHMM:

HHMM (cont.)

  • Transition probability: horizontal (between states at the same level).

  • Initial probability: vertical (from a parent state to its children).

  • Observation probability: at the production states.
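The parameter equations on this slide were images and are lost; they can be reconstructed from Fine et al. (1998), whose notation is followed here (states q^d at level d, observations emitted only at the deepest level D):

```latex
% HHMM parameter set for an internal state q^d at level d:
\lambda = \left\{ A^{q^d},\; \Pi^{q^d},\; B^{q^D} \right\}
% Horizontal transitions between the children of q^d:
A^{q^d} = \{ a^{q^d}_{ij} \}, \qquad a^{q^d}_{ij} = P\big(q^{d+1}_j \mid q^{d+1}_i\big)
% Vertical (initial) activation of a child of q^d:
\Pi^{q^d} = \{ \pi^{q^d}(q^{d+1}_i) \} = P\big(q^{d+1}_i \mid q^d\big)
% Observation probabilities at the production states q^D:
B^{q^D} = \{ b^{q^D}(o_k) \} = P\big(o_k \mid q^D\big)
```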

HHMM (cont.)

  • If the current node is the root:

    • choose a child according to the initial (vertical) probability.

  • If the child is a production state:

    • produce an observation;

    • transit within the same level;

    • on reaching the end-state, return control to the end-state's parent.

  • If the child is an internal state:

    • choose one of its children;

    • wait until control returns from its children;

    • transit within the same level;

    • on reaching the end-state, return control to the end-state's parent.
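The control flow above can be sketched as a recursive sampler. This is an illustrative toy, not the slides' implementation; all names (`generate`, `END`, the dictionary layout) are invented for the example:

```python
import random

END = "<end>"  # per-level end-state that hands control back to the parent


def sample(dist):
    """Draw one key from a {item: probability} dictionary."""
    r, acc = random.random(), 0.0
    for item, p in dist.items():
        acc += p
        if r < acc:
            return item
    return item  # guard against floating-point underflow


def generate(state, init, trans, emit, out):
    """Recursively generate observations from a toy HHMM.

    init[state]  -> {child: prob}   vertical (initial) distribution
    trans[state] -> {next: prob}    horizontal distribution at that level
    emit[state]  -> list of symbols, if state is a production state
    """
    # Choose a child according to the vertical (initial) probability.
    child = sample(init[state])
    while child != END:
        if child in emit:                      # production state: emit a symbol
            out.append(random.choice(emit[child]))
        else:                                  # internal state: recurse, then resume
            generate(child, init, trans, emit, out)
        child = sample(trans[child])           # horizontal transition at this level
    # Reaching the end-state returns control to the parent (the caller).
```

For instance, a root with a single production child `"p"` that emits `"a"` or `"b"` and ends with probability 0.5 per step produces a random-length string over {a, b}.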

HHMM (cont.)

HHMM (cont.)

  • Other applications: modelling the trend of stock prices (IDEAL 2004).

Hidden Vector State Model

Hidden Vector State Model (cont.)

The semantic information relating to any single word can be stored as a vector of semantic tag names.
Hidden Vector State Model (cont.)

  • If state transitions were unconstrained, the model would be a fully connected HHMM.

  • Instead, each transition between states is factored into a stack shift with two stages: a pop followed by a push.

  • The stack size is limited, and the number of new concepts pushed per transition is limited to one.

    • This makes the model more efficient.

Hidden Vector State Model (cont.)

  • The joint probability is defined:

Hidden Vector State Model (cont.)

  • Approximation (assumption):

  • So,
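The equations on this slide were images and are lost; a reconstruction consistent with He & Young (2003) is given below, where c_t is the vector (stack) state at word position t, n_t the number of labels popped, and c_t[1] the newly pushed preterminal tag:

```latex
% Joint probability of words W, vector states C and pop counts N:
P(W, C, N) = \prod_{t=1}^{T}
    P\big(n_t \mid \mathbf{c}_{t-1}\big)\,
    P\big(c_t[1] \mid c_t[2 \dots D_t]\big)\,
    P\big(w_t \mid \mathbf{c}_t\big)
% Approximation: each factor depends only on the (bounded) stack contents,
% so the model remains a regular HMM over the finite set of stack states.
```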

Hidden Vector State Model (cont.)

  • The generative process associated with this constrained version of the HVS model consists of three steps at each word position t:

    1. choose a value for nt (the number of labels to pop);

    2. select the preterminal concept tag ct[1] to push;

    3. select a word wt.

Hidden Vector State Model (cont.)

  • It is reasonable to ask an application designer to provide examples of utterances which would yield each type of semantic schema.

  • It is not reasonable to require utterances with manually transcribed parse trees.

  • The model therefore assumes only abstract semantic annotations and the availability of a set of domain-specific lexical classes.

Hidden Vector State Model (cont.)

Abstract semantic annotations:

  • show me flights arriving in X at T.

  • List flights arriving around T in X.

  • Which flight reaches X before T.


    Class set:

    CITY: Boston, New York, Denver…


Experimental Setup

  • Training set: ATIS-2, ATIS-3

  • Test sets: ATIS-3 NOV93, DEC94

  • Baseline: FST (finite semantic tagger)

  • Smoothing: Good-Turing for the FST, Witten-Bell for the HVS model

  • Example annotation: "Show me flights from Boston to New York"

    • Slots: FROMLOC.CITY = Boston




(Figure: dashed line = goal detection accuracy; solid line = F-measure.)


  • The key features of the HVS model:

    • its ability to represent hierarchical information in a constrained way;

    • its capability to be trained directly from target semantics without explicit word-level annotation.

HVS Language Model

  • The basic HVS model is a regular HMM in which each state encodes the history in a fixed-dimension, stack-like structure.

  • Each state consists of a stack whose elements are labels chosen from a finite set of cardinality M+1: C = {c1, …, cM, c#}.

  • A depth-D HVS model state can be characterized by a vector of dimension D, with the most recently pushed element at index 1 and the oldest at index D.

HVS Language Model (cont.)

HVS Language Model (cont.)

  • Each HVS model state transition is restricted:

    (i) exactly nt class labels are popped off the stack;

    (ii) exactly one new class label ct is pushed onto the stack.

  • The number of elements to pop, nt, and the choice of the new class label to push, ct, are each determined by a probability distribution.
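A single restricted stack-shift transition of the kind described above can be sketched as follows. This is a toy illustration, not the report's implementation; the function name and the convention that the probability tables for n_t and c_t are supplied externally are assumptions:

```python
def stack_shift(stack, n_pop, new_label, depth=4):
    """Apply one restricted HVS state transition to a state vector.

    stack     -- tuple of class labels, index 0 = most recently pushed
    n_pop     -- number of labels to pop (in the model, drawn from P(n_t | stack))
    new_label -- the single label to push (drawn from P(c_t | popped stack))
    depth     -- maximum stack depth D; entries beyond it fall off the bottom
    """
    popped = stack[n_pop:]                   # (i) pop exactly n_pop labels
    return ((new_label,) + popped)[:depth]   # (ii) push exactly one new label
```

For example, `stack_shift(("CITY", "TOLOC", "FLIGHT"), 2, "FROMLOC")` pops the two most recent labels and pushes one, yielding `("FROMLOC", "FLIGHT")`.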

HVS Language Model (cont.)

HVS Language Model (cont.)

  • nt is conditioned on all the class labels in the stack at time t-1, but ct is conditioned only on the class labels that remain on the stack after the pop operation.

  • The former distribution can encode embedding, whereas the latter focuses on modeling long-range dependencies.

HVS Language Model (cont.)

  • Joint probability:

  • Assumption:
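The slide's equations were images; a reconstruction consistent with He & Young (2003) is sketched below. For use as a language model, the probability of the word string marginalizes over the latent stack sequences:

```latex
% Language-model probability: sum over vector states C and pop counts N:
P(W) = \sum_{C,\,N} P(W, C, N)
     = \sum_{C,\,N} \prod_{t=1}^{T}
         P\big(n_t \mid \mathbf{c}_{t-1}\big)\,
         P\big(c_t[1] \mid c_t[2 \dots D]\big)\,
         P\big(w_t \mid \mathbf{c}_t\big)
% Assumption: each factor depends only on the stack state, so P(W) can be
% computed with the standard HMM forward recursion over stack states.
```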

HVS Language Model (cont.)

  • Training: EM algorithm

    • C,N: latent data, W: observed data

  • E-step:

HVS Language Model (cont.)

  • M-Step:

    • Q function (auxiliary):

    • Substituting P(W,C,N|λ)
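The Q function was shown as an image; the standard EM auxiliary function for this model, with C and N latent and W observed, is:

```latex
% EM auxiliary (Q) function: expected complete-data log-likelihood
Q(\lambda, \hat{\lambda}) \;=\; \sum_{C,\,N} P\big(C, N \mid W, \lambda\big)\,
                                \log P\big(W, C, N \mid \hat{\lambda}\big)
% M-step: maximize Q over the new parameters \hat{\lambda}; substituting the
% factored form of P(W, C, N) decouples the three component distributions.
```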

HVS Language Model (cont.)

  • Each of the component probability distributions can then be re-estimated separately.

HVS Language Model (cont.)

  • If the state space S were fully populated:

    • |S| = M^D states; for M = 100+ and D = 3 to 4, this is far too large to estimate directly.

  • Due to data sparseness, backoff is needed.

HVS Language Model (cont.)

  • Backoff weight:

  • Modified version of absolute discounting
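The backoff equation itself was an image. A standard absolute-discounting backoff scheme of the kind the slide names is sketched below for orientation; the details (discount b, reduced context h') are illustrative, not the report's exact modified formula:

```latex
% Absolute discounting with backoff for a conditional distribution P(x | h),
% where h is the conditioning stack context and h' a reduced context:
P(x \mid h) =
  \begin{cases}
    \dfrac{\max\big(\mathrm{count}(h, x) - b,\ 0\big)}{\mathrm{count}(h)}
      & \text{if } \mathrm{count}(h, x) > 0, \\[1.5ex]
    \alpha(h)\, P(x \mid h')
      & \text{otherwise,}
  \end{cases}
% where the backoff weight \alpha(h) is chosen so that P(\cdot \mid h) sums to one.
```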


Experiments

  • Training set:

    • ATIS-3, 276K words, 23K sentences.

  • Development set:

    • ATIS-3 Nov93.

  • Test set:

    • ATIS-3 Dec94, 10K words, 1K sentences.

  • Out-of-vocabulary (OOV) words were removed.

  • k = 850

Experiments (cont.)

Experiments (cont.)


Conclusion

  • The HVS language model is able to make better use of context than standard class n-gram models.

  • The HVS model is trainable using EM.

(Figure: class tree used for the implementation.)

(Figure: iteration number vs. perplexity.)
