
Hierarchical Markov Network


Presentation Transcript


  1. Hierarchical Markov Network
  Boris Ginzburg, ICRI-CI Retreat 2013, May 9, 2013.
  ACKNOWLEDGMENT: Daniel Rubin, Dima Vainbrand, Ronny Ronen, Ohad Falik, Zev Rivlin, Mike Deisher, Shai Fine, Shie Mannor.

  2. Summary
  • Hierarchical Hidden Markov Model (H-HMM): a known statistical model for complex temporal pattern recognition (Fine, Singer, Tishby, 1998).
  • Hierarchical Markov Network (HMN):
    - A compact and computationally efficient extension of H-HMM, based on merging identical sub-models.
    - A new, efficient Viterbi algorithm: computations for a sub-model are shared among its "parents".

  3. Background: Hidden Markov Model
  • The Hidden Markov Model (HMM) is among the leading tools for temporal pattern recognition.
  • Used in: speech recognition, handwriting recognition, language processing, gesture recognition, bioinformatics, machine translation, and speech synthesis.

  4. Background: Hidden Markov Model
  The state of the model is hidden from the observer.
  [Figure: HMM state diagram with transition probabilities and emitted observation values.]
  An HMM is a stochastic FSM described by a Markov model:
  • Transition probability: α(i,j) := Prob( q(t+1) = j | q(t) = i )
  • Initial probability: π(i) := Prob( q(1) = i )
  • State [i] emits symbol [o] with probability β(i,o) := Prob( o | q(t) = i )
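To make the notation concrete, here is a minimal sketch of such a parameterization in Python; the state/symbol counts and the probability values are illustrative, not taken from the slides.

```python
import numpy as np

# Toy HMM with 3 hidden states and 4 observable symbols (illustrative values).
n_states, n_symbols = 3, 4

alpha = np.array([[0.7, 0.2, 0.1],      # alpha[i, j] = Prob(q(t+1) = j | q(t) = i)
                  [0.1, 0.6, 0.3],
                  [0.2, 0.2, 0.6]])
pi = np.array([0.5, 0.3, 0.2])           # pi[i] = Prob(q(1) = i)
beta = np.array([[0.4, 0.3, 0.2, 0.1],   # beta[i, o] = Prob(o | q(t) = i)
                 [0.1, 0.4, 0.4, 0.1],
                 [0.2, 0.2, 0.3, 0.3]])

# Every row of alpha and beta must be a probability distribution.
assert alpha.shape == (n_states, n_states) and beta.shape == (n_states, n_symbols)
assert np.allclose(alpha.sum(axis=1), 1.0) and np.allclose(beta.sum(axis=1), 1.0)
```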

  5. Background: HMM Viterbi Algorithm
  Problem: given the observation sequence O = ( o(1), …, o(T) ), what is the most probable state sequence Q = ( q(1), …, q(T) )?
  Solution: the Viterbi algorithm (dynamic programming).
  For each state [x] and time t, given that the system was in [x] at time t, define δ(x,t) as the likelihood of the most probable state sequence that could generate the observations ( o(1), …, o(t) ):
  δ(x,t) := max P( q(1), …, q(t-1) | q(t) = x, o(1), …, o(t) )
  We will use the negative log-likelihood S(x,t) := -log δ(x,t). S(x,t) is called the token at state [x] at time t.

  6. Background: HMM Viterbi Algorithm
  With p, a, b the negative log-likelihoods of π, α, β:
  • Initialization (t = 1): S(x,1) = p(x) + b(x, o(1)); ψ(x,1) = 0.
  • Induction (t → t+1): S(y, t+1) = min over x [ S(x,t) + a(x,y) + b(y, o(t+1)) ]; ψ(y, t+1) = argmin over x [ S(x,t) + a(x,y) ].
  • Termination (t = T): Smin = min over x [ S(x,T) ]; q(T) = argmin over x [ S(x,T) ].
  • Backward procedure, path recovery (t = T → 1): q(t) = ψ( q(t+1), t+1 ).
  The Viterbi algorithm is an instance of dynamic programming.
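The recursion above translates directly into code. The following is a small Python sketch of Viterbi in negative log space, using the slide's token/back-pointer notation; it is an illustration, not the author's implementation.

```python
import numpy as np

def viterbi_neglog(p, a, b, obs):
    """Viterbi in negative log space, following the slide's notation:
    p[x] = -log pi(x), a[x, y] = -log alpha(x, y), b[x, o] = -log beta(x, o).
    Returns (S_min, most probable state path)."""
    T, N = len(obs), len(p)
    S = np.zeros((T, N))               # S[t, x]: token (negative log-likelihood)
    psi = np.zeros((T, N), dtype=int)  # psi[t, x]: back-pointer

    S[0] = p + b[:, obs[0]]                        # initialization (t = 1)
    for t in range(1, T):                          # induction (t -> t+1)
        for y in range(N):
            cand = S[t - 1] + a[:, y] + b[y, obs[t]]
            psi[t, y] = np.argmin(cand)
            S[t, y] = cand[psi[t, y]]

    q = np.zeros(T, dtype=int)                     # termination and path recovery
    q[-1] = np.argmin(S[-1])
    for t in range(T - 2, -1, -1):
        q[t] = psi[t + 1, q[t + 1]]
    return S[-1, q[-1]], q
```

With the toy parameters sketched after slide 4, this would be called as viterbi_neglog(-np.log(pi), -np.log(alpha), -np.log(beta), obs=[0, 2, 1]).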

  7. Background: Hierarchy of HMMs
  Example: speech recognition as a multi-layer hierarchy of HMMs.

  8. Background: Hierarchical HMM
  • H-HMM replaces a complex hierarchy of simple HMMs with one unified model [Fine, Singer, Tishby, 1998].
  • H-HMM is a hierarchical FSM with two types of states:
    - "Complex" state: a state which is itself an HMM.
    - "Production" state: a simple state on the lowest level of the hierarchy, which produces an observed symbol.
  • An efficient Viterbi algorithm for H-HMM has complexity O(T·N²), where T is the sequence duration and N is the number of states in the H-HMM [K. Murphy & Paskin, 2001; Wakabayashi & Miura, 2012].
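A rough structural sketch of the two state types (the class and field names below are assumptions for illustration, not the paper's representation):

```python
from dataclasses import dataclass

@dataclass
class ProductionState:
    """Simple state at the lowest level of the hierarchy; produces observed symbols."""
    emission_neglog: dict       # symbol -> -log beta(state, symbol)

@dataclass
class ComplexState:
    """State that is itself an HMM over its child states (complex or production)."""
    children: list              # child states, each a ComplexState or ProductionState
    init_neglog: list           # -log pi over the children
    trans_neglog: list          # one row per child: -log alpha to every other child
```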

  9. Problem: Scalability
  PROBLEM: structural and computational redundancy, both for the "hierarchy of HMMs" and for the H-HMM.
  Example: dictionary = {speech, beach, peach}.
  • Use a 3-state HMM for each phoneme model.
  • 10 instances of phoneme HMMs, but only 5 different HMM templates (see the count below).
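The redundancy can be made concrete in a few lines of Python; the phoneme spellings below are an assumption, chosen only to reproduce the slide's counts (10 instances, 5 templates):

```python
# Each word is a chain of phoneme HMMs, but most phoneme models repeat across words.
lexicon = {
    "speech": ["s", "p", "iy", "ch"],
    "beach":  ["b", "iy", "ch"],
    "peach":  ["p", "iy", "ch"],
}
instances = sum(len(ph) for ph in lexicon.values())          # 10 HMM instances
templates = {p for ph in lexicon.values() for p in ph}        # 5 distinct templates
print(instances, len(templates))                              # 10 5
```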

  10. Solution: Hierarchical Markov Network
  • HMN is based on "call-return" semantics: a parent node calls a sub-HMM, which computes the score of the subsequence and returns the result to the parent node.
  • Hierarchical Markov Network: a compact representation of the H-HMM, where each sub-model is embedded once and serves multiple "parents".

  11. Solution: HMN Viterbi Algorithm
  The key observation: the Viterbi computations inside identical sub-HMMs are almost the same. Consider, for example, the H-HMMs for "beach" and "peach":

  12. Solution: HMN Viterbi Algorithm
  The token S(.) in sub-HMM "i" in the word "beach" is based on the score of the previous phoneme "b":
  S(beach.i.x1, t) = [ S(beach.b1, t-1) + a(beach.b1, beach.i0) ] + [ a(beach.i0, beach.i.x1) + b(beach.i.x1, o(t)) ]
  For "peach", S(.) is based on the score from "p":
  S(peach.i.x1, t) = [ S(peach.p1, t-1) + a(peach.p1, peach.i0) ] + [ a(peach.i0, peach.i.x1) + b(peach.i.x1, o(t)) ]
  The last two terms in both expressions are equal:
  a(beach.i0, beach.i.x1) + b(beach.i.x1, o(t)) = a(peach.i0, peach.i.x1) + b(peach.i.x1, o(t))
  We can do this computation once and use it for both words.
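A small sketch of this factoring (function names and numeric values are illustrative, not from the slides):

```python
# The word-specific terms and the shared sub-HMM terms are grouped separately,
# so the shared part is computed once per frame and reused by every parent word.
def shared_terms(a_i0_x1, b_x1_ot):
    # a(i0, i.x1) + b(i.x1, o(t)) -- identical inside "beach.i" and "peach.i"
    return a_i0_x1 + b_x1_ot

def token(prev_phoneme_score, a_into_i0, shared):
    # S(word.i.x1, t) = [S(prev, t-1) + a(prev, i0)] + [a(i0, i.x1) + b(i.x1, o(t))]
    return prev_phoneme_score + a_into_i0 + shared

shared = shared_terms(a_i0_x1=0.7, b_x1_ot=1.2)              # computed once per frame
s_beach = token(prev_phoneme_score=3.1, a_into_i0=0.4, shared=shared)
s_peach = token(prev_phoneme_score=2.8, a_into_i0=0.5, shared=shared)
```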

  13. Solution: HMN Viterbi Algorithm
  One sub-HMM can serve multiple parents: it computes the score of a sub-sequence and returns it to each parent.

  14. Solution: HMN Call-Return
  • Child HMM:
    - Serves multiple calls from multiple nodes; the child maintains a list of received calls.
    - All calls received at the same moment are merged and computed together; the child keeps a list of "return addresses".
    - Multiple tokens can be generated by one call (all marked by the time at which the call was started).
    - When a token reaches the "end" state, its score is sent to the parent.
  • Parent node:
    - Maintains a list of open calls and prefix scores.
    - Adds the prefix score to the score received from the child.
  (A bookkeeping sketch of this protocol follows below.)
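A minimal bookkeeping sketch, assuming straightforward data structures; the class and method names are not from the slides:

```python
from collections import defaultdict

class ChildHMM:
    def __init__(self):
        # Calls received at the same time step are merged and computed together;
        # keyed by call start time, value is the list of return addresses (parents).
        self.open_calls = defaultdict(list)

    def receive_call(self, start_time, parent):
        self.open_calls[start_time].append(parent)

    def emit_return(self, start_time, end_score):
        # When a token tagged with `start_time` reaches the "end" state,
        # its score is sent back to every parent that issued that call.
        for parent in self.open_calls[start_time]:
            parent.receive_return(start_time, end_score)

class ParentNode:
    def __init__(self):
        self.prefix_scores = {}        # open calls: call start time -> prefix score

    def call(self, child, start_time, prefix_score):
        self.prefix_scores[start_time] = prefix_score
        child.receive_call(start_time, parent=self)

    def receive_return(self, start_time, child_score):
        # The parent adds its stored prefix score to the score returned by the child.
        return self.prefix_scores[start_time] + child_score
```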

  15. Solution: HMN Call-Return

  16. Solution: HMN Temporal Hierarchy
  (Inspired by Tali Tishby's talk yesterday.) How can multiple temporal scales be supported on different levels of the hierarchy? Possible directions:
  • Exponentially increase the time scale with each level of the hierarchy: Δ_d = 2 · Δ_{d+1}.
  • One call covers a number of time-overlapping sub-sequences; the child selects the sequence with the best score: S(x, t_d) = min( S(x, t_{d+1}), S(x, t_{d+1} + 1) ) (see the sketch below).
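One possible reading of the second direction, sketched in Python under the assumption that level d ticks half as fast as level d+1 and keeps the better of each pair of fine-scale scores:

```python
import numpy as np

def coarsen_scores(fine_scores):
    """S(x, t_d) = min( S(x, t_{d+1}), S(x, t_{d+1} + 1) ) over consecutive pairs.
    Assumes an even number of fine-scale scores (assumed interpretation)."""
    fine_scores = np.asarray(fine_scores)
    return np.minimum(fine_scores[0::2], fine_scores[1::2])

print(coarsen_scores([4.0, 3.5, 6.1, 5.9, 2.2, 2.7]))   # -> [3.5 5.9 2.2]
```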

  17. Solution: HMN Temporal Hierarchy

  18. Solution: HMN Performance
  • HMN has potential performance benefits over HMM/H-HMM if the cost of an HMM exceeds the cost of "call-return":
    - The cost of an HMM is roughly proportional to its number of arcs.
    - The cost of a call/return is fixed; it depends only on the number of return tokens per call, not on the size of the HMM.
  • Back-of-the-envelope estimate: Viterbi on a 5-state HMM costs ~10 MACs; one return token costs ~1 MAC.
  • Additional HMN cost is increased complexity: book-keeping in the HMM, a more complex parent-node structure, etc.
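A rough illustration of how these per-frame costs might combine for the toy dictionary of slide 9; only the instance/template counts and the two MAC figures come from the slides, while the rest of the cost model (one return token per word instance per frame) is an assumption:

```python
# Simplified per-frame cost comparison (assumed cost model, illustrative only).
MACS_PER_HMM = 10        # Viterbi update on one 5-state phoneme HMM (from the slide)
MACS_PER_RETURN = 1      # one return token (from the slide)

hhmm_cost = 10 * MACS_PER_HMM                        # 10 phoneme-HMM instances -> 100 MACs
hmn_cost = 5 * MACS_PER_HMM + 10 * MACS_PER_RETURN   # 5 shared templates + ~10 returns -> 60 MACs
print(hhmm_cost, hmn_cost)                           # 100 60
```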

  19. Next Steps
  • A promising idea; its efficiency must be established beyond a plain back-of-the-envelope estimate, with a prototype to check the performance claims.
  • Extend the HMN theory to support multiple temporal scales on different levels of the hierarchy.
  • Explore the connection between Convolutional Neural Networks and HMN.

  20. BACKUP
