

The Hidden Vector State Language Model

Vidura Seneviratne, Steve Young

Cambridge University Engineering Department


Reference

  • Young, S. J., "The Hidden Vector State Language Model", Tech. Report CUED/F-INFENG/TR.467, Cambridge University Engineering Department, 2003.

  • He, Y. and Young, S. J., "Hidden Vector State Model for Hierarchical Semantic Parsing", in Proc. ICASSP, Hong Kong, 2003.

  • Fine, S., Singer, Y. and Tishby, N., "The Hierarchical Hidden Markov Model: Analysis and Applications", Machine Learning 32(1):41-62, 1998.


Outline

  • Introduction

  • HVS Model

  • Experiments

  • Conclusion


Introduction

  • Language model:

  • Issues: data sparseness, inability to capture long-distance dependencies, and inability to model nested structural information

  • Class-based language model

    • POS tag information

  • Structured language model

    • Syntactic information


Hierarchical Hidden Markov Model

  • An HHMM is a structured, multi-level stochastic process.

    • Each state is itself an HHMM (a sub-model)

    • Internal state: a hidden state that does not emit observable symbols directly

    • Production state: a leaf state, which emits observations

  • The states of a standard HMM correspond to the production states of an HHMM.


HHMM (cont.)

  • Parameters of HHMM:


HHMM (cont.)

  • Transition probability: horizontal moves between states at the same level

  • Initial probability: vertical entry from a parent state into one of its children

  • Observation probability: symbol emission at production states (a standard parameterization is sketched below)
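
The parameter definitions on the original slide are not in the transcript; in the standard HHMM formulation of Fine et al. (1998), for each internal state q with children q_i, they can be written as:

    A^{q} = \{ P(q_j \mid q_i) \}        (horizontal transitions between the children of q)
    \Pi^{q} = \{ P(q_i \mid q) \}        (vertical entry from q into child q_i)
    B^{q} = \{ P(\sigma_k \mid q) \}     (observation emission, defined only for production states q)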


HHMM (cont.)

  • If the current node is the root:

    • Choose a child according to the (vertical) initial probability

  • If the child is a production state:

    • Produce an observation

    • Transition within the same level (horizontal)

    • When the end state is reached, control returns to the parent of that end state

  • If the child is an internal state:

    • Choose a child recursively

    • Wait until control returns from its children

    • Transition within the same level (horizontal)

    • When the end state is reached, control returns to the parent of that end state (a sketch of this recursion follows)
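
As an illustration, here is a minimal Python sketch of this recursive generative procedure, assuming hypothetical data structures (HHMMState with emit, child_init, child_trans fields) that are not defined in the papers:

    import random

    class HHMMState:
        """Hypothetical container for one HHMM state (internal or production)."""
        def __init__(self, name, emit=None, child_init=None, child_trans=None):
            self.name = name
            self.emit = emit                # {symbol: prob}, set only for production states
            self.child_init = child_init    # {child: prob}, vertical entry distribution
            self.child_trans = child_trans  # {child: {next: prob}}, horizontal transitions

        def is_production(self):
            return self.emit is not None

    def sample(dist):
        """Draw one key from a {key: probability} dictionary."""
        r, acc = random.random(), 0.0
        for key, p in dist.items():
            acc += p
            if r <= acc:
                return key
        return key  # guard against floating-point rounding

    def generate(state, output):
        """Recursively emit observations from the HHMM rooted at `state`."""
        if state.is_production():
            output.append(sample(state.emit))          # production states emit a symbol
            return
        child = sample(state.child_init)               # vertical: enter one child sub-model
        while child != "END":                          # "END" marks this level's end state
            generate(child, output)                    # recurse until control returns
            child = sample(state.child_trans[child])   # horizontal: move within this level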


HHMM (cont.)


HHMM (cont.)

  • Another application: modeling stock price trends (IDEAL 2004)


Hidden Vector State Model


Hidden Vector State Model (cont.)

The semantic information relating to any single word can be stored as a vector of semantic tag names.
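
For example (an illustration in the spirit of the ATIS annotations shown later, not taken from the slides), in the utterance "show me flights arriving in Boston" the word "Boston" could carry the tag vector [CITY, TOLOC, FLIGHT], i.e. the stack of semantic concepts dominating it, with the most recently introduced concept first.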


Hidden Vector State Model (cont.)

  • If state transitions were unconstrained:

    • The model would be equivalent to a full HHMM

  • Transitions between states can instead be factored into a stack shift with two stages: a pop followed by a push

  • The stack size is limited, and the number of new concepts pushed at each step is limited to one

    • This makes the model more efficient


Hidden Vector State Model (cont.)

  • The joint probability is defined:
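
The equation itself is missing from the transcript; in the formulation of He and Young (2003), the exact joint probability of the pop counts N, the stack sequence C and the words W is

    P(N, C, W) = \prod_{t=1}^{T} P(n_t \mid W_1^{t-1}, C_1^{t-1}) \, P(c_t[1] \mid W_1^{t-1}, C_1^{t-1}, n_t) \, P(w_t \mid W_1^{t-1}, C_1^{t})

where W_1^{t-1} and C_1^{t-1} denote the word and stack histories up to position t-1.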


Hidden Vector State Model (cont.)

  • Approximation (assumption):

  • So,
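
The approximations on this slide were also lost in transcription; following the cited papers, each full history is replaced by the local stack context:

    P(n_t \mid W_1^{t-1}, C_1^{t-1}) \approx P(n_t \mid c_{t-1})
    P(c_t[1] \mid W_1^{t-1}, C_1^{t-1}, n_t) \approx P(c_t[1] \mid c_t[2 \ldots D_t])
    P(w_t \mid W_1^{t-1}, C_1^{t}) \approx P(w_t \mid c_t)

so that

    P(N, C, W) \approx \prod_{t=1}^{T} P(n_t \mid c_{t-1}) \, P(c_t[1] \mid c_t[2 \ldots D_t]) \, P(w_t \mid c_t)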


Hidden Vector State Model (cont.)

  • The generative process associated with this constrained version of the HVS model consists of three steps for each word position t (a small sketch follows the list):

    1. Choose a value for n_t (the number of stack elements to pop)

    2. Select the preterminal concept tag c_t[1] (the single new element pushed onto the stack)

    3. Select a word w_t
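
A minimal Python sketch of one such generation step; the distribution tables (pop_dist, push_dist, word_dist) and their keying by stack tuples are hypothetical placeholders, not part of the papers:

    import random

    def sample(dist):
        """Draw one key from a {key: probability} dictionary."""
        r, acc = random.random(), 0.0
        for key, p in dist.items():
            acc += p
            if r <= acc:
                return key
        return key

    def hvs_step(stack, pop_dist, push_dist, word_dist):
        """One HVS step: pop n_t labels, push one concept c_t[1], emit word w_t."""
        n_t = sample(pop_dist[tuple(stack)])       # 1. pop count, given the previous stack
        stack = stack[n_t:]                        #    remove n_t labels (index 0 = top)
        c_t1 = sample(push_dist[tuple(stack)])     # 2. new preterminal concept c_t[1]
        stack = [c_t1] + stack                     #    push exactly one new label
        w_t = sample(word_dist[tuple(stack)])      # 3. emit a word given the full stack c_t
        return stack, w_t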


Hidden Vector State Model (cont.)

  • It is reasonable to ask an application designer to provide examples of utterances which would yield each type of semantic schema.

  • It is not reasonable to require utterances with manually transcribed parse trees.

  • Instead, assume abstract semantic annotations and the availability of a set of domain-specific lexical classes.


Hidden Vector State Model (cont.)

Abstract semantic annotations:

  • show me flights arriving in X at T.

  • List flights arriving around T in X.

  • Which flight reaches X before T.

    = FLIGHT(TOLOC(CITY(X),TIME_RELATIVE(TIME(T))))

    Class set:

    CITY: Boston, New York, Denver…


Experiments

Experimental Setup

Training set: ATIS-2, ATIS-3

Test set: ATIS-3 NOV93, DEC94

Baseline: FST (finite-state semantic tagger)

Smoothing: Good-Turing for the FST baseline, Witten-Bell for the HVS model

Example: Show me flights from Boston to New York

Goal: FLIGHT

Slots: FROMLOC.CITY = Boston

TOLOC.CITY = New York


Experiments


Experiments

Figure: dashed line shows goal detection accuracy; solid line shows F-measure.


Conclusion

  • The key features of the HVS model:

    • Its ability to represent hierarchical information in a constrained way

    • Its ability to be trained directly from target semantics, without explicit word-level annotation


HVS Language Model

  • The basic HVS model is a regular HMM in which each state encodes history in a fixed-dimension, stack-like structure.

  • Each state consists of a stack, where each element of the stack is a label chosen from a finite set of cardinality M+1: C = {c_1, …, c_M, c_#}

  • A depth-D HVS model state can be characterized by a vector of dimension D, with the most recently pushed element at index 1 and the oldest at index D


HVS Language Model (cont.)


HVS Language Model (cont.)

  • Each HVS model state transition is restricted so that:

    (i) exactly n_t class labels are popped off the stack

    (ii) exactly one new class label c_t is pushed onto the stack

  • The number of elements to pop, n_t, and the choice of the new class label to push, c_t, are determined by the two distributions sketched below:
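
The distributions themselves did not survive the transcription; based on the description on the following slides, they take the form

    n_t \sim P(n_t \mid c_{t-1}[1 \ldots D_{t-1}])     (pop count, conditioned on the full previous stack)
    c_t[1] \sim P(c_t[1] \mid c_t[2 \ldots D_t])       (new label, conditioned on the labels left after the pop)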


HVS Language Model (cont.)


HVS Language Model (cont.)

  • n_t is conditioned on all the class labels in the stack at time t-1, but c_t is conditioned only on the class labels that remain on the stack after the pop operation

  • The former distribution can encode embedding, whereas the latter focuses on modeling long-range dependencies.


HVS Language Model (cont.)

  • Joint probability (see below):

  • Assumption:
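
The slide's equations were images; a reconstruction consistent with the report and the earlier slides: the language model probability sums over the hidden stack sequences C and pop counts N, with each factor conditioned only on the local stack context,

    P(W) = \sum_{C, N} P(W, C, N) \approx \sum_{C, N} \prod_{t=1}^{T} P(n_t \mid c_{t-1}) \, P(c_t[1] \mid c_t[2 \ldots D_t]) \, P(w_t \mid c_t)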


HVS Language Model (cont.)

  • Training: EM algorithm

    • C, N: latent data; W: observed data

  • E-step:
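
The E-step equation is not in the transcript; in the standard EM formulation it amounts to computing the posterior over the latent variables given the current parameters λ':

    P(C, N \mid W, \lambda') = \frac{P(W, C, N \mid \lambda')}{\sum_{C', N'} P(W, C', N' \mid \lambda')}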


HVS Language Model (cont.)

  • M-Step:

    • Q function (auxiliary):

    • Substituting the factored form of P(W, C, N | λ) into the Q function
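
The auxiliary function itself is missing from the transcript; its standard EM form, given the current estimate λ', is

    Q(\lambda, \lambda') = \sum_{C, N} P(C, N \mid W, \lambda') \log P(W, C, N \mid \lambda)

Substituting the factored joint probability splits Q into independent terms, one for each of the pop, push and word distributions, which is why they can be re-estimated separately (next slide).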


HVS Language Model (cont.)

  • Each of the three probability distributions can therefore be re-estimated separately.


HVS Language Model (cont.)

  • State space S, if fully populated:

    • |S| = M^D states; for M = 100+ and D = 3 to 4, this is on the order of 10^6 to 10^8 states or more

  • Due to data sparseness, backoff is needed.


HVS Language Model (cont.)

  • Backoff weight:

  • A modified version of absolute discounting is used (a generic form is sketched below)
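
The exact smoothing equations are not in the transcript; purely for illustration, a generic absolute-discounting scheme for the stack-conditioned word distribution has the form (the report's modified version may differ)

    P(w \mid c) = \frac{\max(N(w, c) - b, 0)}{N(c)} + \alpha(c) \, P(w \mid c')

where N(w, c) is the count of word w with stack context c, b is the discount, c' is a reduced (backed-off) context, and the backoff weight α(c) is chosen so that P(· | c) sums to one.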


Experiments

  • Training set:

    • ATIS-3: 276K words, 23K sentences.

  • Development set:

    • ATIS-3 Nov93

  • Test set :

    • ATIS-3 Dec94, 10K words, 1K sentences.

  • Out-of-vocabulary (OOV) words were removed

  • k=850


Experiments (cont.)


Experiments (cont.)


Conclusion

  • The HVS language model is able to make better use of context than standard class n-gram models.

  • The HVS model is trainable using EM.


Class tree for implementation


Iteration number vs. perplexity

