The Relative Entropy Rate of Two Hidden Markov Processes

Or Zuk

Dept. of Physics of Complex Systems

Weizmann Institute of Science

Rehovot, Israel


Overview
  • Introduction
  • Distance Measures and Relative Entropy Rate
  • Results: Generalization from the Entropy Rate
  • Future Directions
Introduction

[Figure: resistance R (MΩ) vs. time, showing quantum jumps in a mesoscopic wire]

Hidden Markov Processes are relevant in many settings:

  • Error correction (Markovian source + noise)
  • Signal processing, speech recognition
  • Experimental physics - telegraph noise, TLS + noise, quantum jumps
  • Bioinformatics - biological sequences, gene expression

[Figure: a Markov chain transmitted through a channel with 10% noise yields an HMP]

HMP - Definitions

Models are denoted by λ and µ.

  • Markov Process:
  • X – Markov process
  • Mλ – transition matrix
  • mλ(i,j) = Pr(Xn+1 = j | Xn = i)
  • Hidden Markov Process:
  • Y – noisy observation of X
  • Rλ – noise/emission matrix
  • rλ(i,j) = Pr(Yn = j | Xn = i)

[Figure: graphical model - the chain Xn → Xn+1 with emissions Xn → Yn and Xn+1 → Yn+1]
Example: Binary HMP

[Figure: two-state diagrams - transition probabilities p(j|i) between hidden states 0 and 1, and emission probabilities q(j|i) from hidden to observed symbols]

Example: Binary HMP (Cont.)

  • A simple, symmetric binary HMP:

    M = [[1-p, p], [p, 1-p]],   R = [[1-ε, ε], [ε, 1-ε]]

  • All properties of the process depend on two parameters, p and ε. Assume w.l.o.g. p, ε < ½.
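
As a concrete illustration (a minimal sketch of ours, not from the talk; the function name and example values are illustrative), the symmetric binary HMP above can be simulated as follows:

```python
import numpy as np

def simulate_binary_hmp(p, eps, n, rng=None):
    """Draw (X, Y): X a symmetric binary Markov chain, Y its noisy observation."""
    rng = np.random.default_rng() if rng is None else rng
    M = np.array([[1 - p, p], [p, 1 - p]])          # transition matrix M
    R = np.array([[1 - eps, eps], [eps, 1 - eps]])  # emission matrix R
    x = np.empty(n, dtype=int)
    x[0] = rng.integers(2)  # stationary distribution is uniform by symmetry
    for t in range(1, n):
        x[t] = rng.choice(2, p=M[x[t - 1]])
    y = np.array([rng.choice(2, p=R[xi]) for xi in x])
    return x, y

x, y = simulate_binary_hmp(p=0.2, eps=0.1, n=10_000)
print("fraction of corrupted observations:", np.mean(x != y))  # ≈ eps
```
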
Overview
  • Introduction
  • Distance Measures and Relative Entropy Rate
  • Results: Generalization from the Entropy Rate
  • Future Directions
Distance Measures for Two HMPs

  • Why is this important?
  • Often one learns an HMP from data. It is important to know how different the learned model is from the true model.
  • Sometimes many HMPs represent different sources (e.g. different authors, different protein families), and we wish to know which sources are similar.
  • What distance measure should we use?
  • Look at the joint distributions of N consecutive Y symbols, Pλ(N) and Pµ(N).
Relative Entropy (RE) Rate

  • Notation: Pλ(N), Pµ(N) - the distributions of (Y1, …, YN) under the two models.
  • Relative Entropy for finite (N-symbol) distributions:

    D(Pλ(N) || Pµ(N)) = Σy Pλ(N)(y) log [Pλ(N)(y) / Pµ(N)(y)]

  • Take the limit to get the RE-rate:

    D(λ || µ) = lim N→∞ (1/N) D(Pλ(N) || Pµ(N))

  • Alternative definition, using conditional relative entropy:

    D(λ || µ) = lim N→∞ Eλ[ log ( Pλ(YN | Y1,…,YN-1) / Pµ(YN | Y1,…,YN-1) ) ]
Relative Entropy (RE) Rate

  • First proposed for HMPs by [Juang&Rabiner 85].
  • Not a metric (not symmetric, no triangle inequality).
  • Still, it has several natural interpretations:

    - If one generates data from λ and gives a likelihood score to µ, then D(λ || µ) is the average likelihood loss per symbol (compared to the optimal model λ).

    - If one compresses data generated by λ, assuming erroneously that it was generated by µ, then one 'loses' on average D(λ || µ) per symbol.

  • For Markov chains, D(λ || µ) is easily given by:

    D(λ || µ) = Σi πλ(i) Σj mλ(i,j) log [mλ(i,j) / mµ(i,j)]

    where πλ is the stationary distribution of Mλ.
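
The Markov-chain formula can be checked with a short sketch (ours; markov_re_rate is an illustrative name, and the example matrices are arbitrary ergodic chains with strictly positive entries):

```python
import numpy as np

def markov_re_rate(M_lam, M_mu):
    """D(lambda || mu) in nats per symbol for two Markov transition matrices."""
    # Stationary distribution of M_lam: left eigenvector for eigenvalue 1.
    vals, vecs = np.linalg.eig(M_lam.T)
    pi = np.real(vecs[:, np.argmin(np.abs(vals - 1))])
    pi /= pi.sum()
    # Sum_i pi(i) Sum_j m_lam(i,j) log [m_lam(i,j) / m_mu(i,j)]
    return float(np.sum(pi[:, None] * M_lam * np.log(M_lam / M_mu)))

M_lam = np.array([[0.8, 0.2], [0.2, 0.8]])
M_mu  = np.array([[0.6, 0.4], [0.4, 0.6]])
print(markov_re_rate(M_lam, M_mu))
```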

Relative Entropy (RE) Rate

  • For HMPs, D(λ || µ) is difficult to compute. So far only bounds [Silva&Narayanan] or approximation algorithms [Li et al. 05, Do 03, Mohammad&Tranter 05] are known.
  • D(λ || µ) generalizes the concept of the Shannon entropy rate, via:

    H(λ) = log s - D(λ || u)

    where u is the uniform i.i.d. model and s is the alphabet size of Y.

  • The entropy rate H of an HMP is a Lyapunov exponent, which is generally hard to compute [Jacquet et al. 04].
  • What is known (for H)? A Lyapunov-exponent representation, analyticity, and asymptotic expansions in different regimes.
  • Goal: generalize these results and techniques to the RE-rate.
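
The identity H(λ) = log s - D(λ || u) can be verified exactly at any finite N, since under the uniform i.i.d. model every N-sequence has probability s^-N. A small enumeration sketch (ours, reusing the symmetric binary parameterization from earlier):

```python
import itertools
import numpy as np

p, eps, N = 0.2, 0.1, 12
M = np.array([[1 - p, p], [p, 1 - p]])
R = np.array([[1 - eps, eps], [eps, 1 - eps]])

def prob(y):
    """P(y) under the HMP, via the forward recursion."""
    alpha = 0.5 * R[:, y[0]]
    for s in y[1:]:
        alpha = (alpha @ M) * R[:, s]
    return alpha.sum()

H_N = D_N = 0.0
for y in itertools.product([0, 1], repeat=N):
    pl = prob(y)
    H_N -= pl * np.log(pl) / N               # per-symbol entropy
    D_N += pl * np.log(pl / 2.0 ** -N) / N   # per-symbol RE against uniform
print(H_N, np.log(2) - D_N)  # identical, per the identity above (s = 2)
```
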
Why is calculating D(λ || µ) difficult?

  • Markov chains: all 2^N sequences {X} with the same number of flips have the same probability, so there is only a polynomial number of types (probabilities).
  • HMPs: many Markov chains {X} contribute to the same observation {Y}, and different {Y}s have different probabilities. There is an exponential number of types (probabilities), so the method of types does not work here.

[Figure: the 2^N hidden sequences X mapping to the 2^N observed sequences Y]
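
The type-counting argument can be made concrete with a small enumeration (ours, using the symmetric binary parameterization; probabilities are bucketed by their printed representation):

```python
import itertools
import numpy as np

p, eps, N = 0.2, 0.1, 10
M = np.array([[1 - p, p], [p, 1 - p]])
R = np.array([[1 - eps, eps], [eps, 1 - eps]])

def mc_prob(x):
    """P(x) for the Markov chain: depends only on the number of flips."""
    pr = 0.5
    for a, b in zip(x, x[1:]):
        pr *= M[a, b]
    return pr

def hmp_prob(y):
    """P(y) for the HMP: forward recursion sums over all hidden {X}."""
    alpha = 0.5 * R[:, y[0]]
    for s in y[1:]:
        alpha = (alpha @ M) * R[:, s]
    return alpha.sum()

seqs = list(itertools.product([0, 1], repeat=N))
print(len({f"{mc_prob(s):.12e}" for s in seqs}))   # N values: one per flip count
print(len({f"{hmp_prob(s):.12e}" for s in seqs}))  # exponentially many values
```
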
Overview
  • Introduction
  • Distance Measures and Relative Entropy Rate
  • Results: Generalization from the Entropy Rate
  • Future Directions
RE-Rate and Lyapunov Exponents

  • What is a Lyapunov exponent?
  • It arises in dynamical systems, control theory, statistical physics etc., and measures the stability of the system.
  • Take two (square) matrices A, B. At each step choose at random A (with prob. p) or B (w.p. 1-p), and look at the normalized log-norm of the random product:

    (1/N) log ||ABBBAABAB…BA||

    The limit as N → ∞:

    - exists a.s. [Furstenberg&Kesten 60];
    - is called the top Lyapunov exponent;
    - is independent of the matrix norm chosen.

  • The HMP entropy rate is given as a Lyapunov exponent [Jacquet et al. 04].
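
A minimal Monte Carlo sketch (ours; the matrices A, B are arbitrary examples) estimates the top Lyapunov exponent by applying the random product to a vector and renormalizing to avoid overflow:

```python
import numpy as np

def top_lyapunov(A, B, p, n=200_000, seed=0):
    """Estimate lim (1/N) log ||product|| for the random product of A, B."""
    rng = np.random.default_rng(seed)
    v = np.ones(A.shape[0])
    total = 0.0
    for _ in range(n):
        v = (A if rng.random() < p else B) @ v
        norm = np.linalg.norm(v)
        total += np.log(norm)  # accumulate the log-growth of the product
        v /= norm              # renormalize so v never over/underflows
    return total / n

A = np.array([[0.9, 0.2], [0.1, 0.8]])
B = np.array([[0.5, 0.5], [0.3, 0.7]])
print(top_lyapunov(A, B, p=0.3))
```
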
RE-Rate and Lyapunov Exponents

  • What about the RE-rate?
  • It is given as the difference of two top Lyapunov exponents:

    D(λ || µ) = γλ - γµ ,   γη = lim N→∞ (1/N) Eλ[ log || Gη(Y1) Gη(Y2) ⋯ Gη(YN) || ]

    - The G's are random matrices, obtained simply from M and R using the forward equations.
    - Different matrices appear in the two Lyapunov exponents, but the probabilities selecting the matrices are the same: in both, the observations Y are drawn from λ.
Analyticity of the RE-Rate

  • Is the RE-rate continuous, 'smooth', or even analytic in the parameters governing the HMPs?
  • For Lyapunov exponents, analyticity is known in the matrix entries [Ruelle 79] and in their selection probabilities [Peres 90,91], separately.
  • For the HMP entropy rate, analyticity was recently shown by [Han&Marcus 05].
Analyticity of the RE-Rate

  • Using both results, we are able to show:

    Thm: The RE-rate is analytic in the HMPs' parameters.

  • Analyticity is shown only in the interior of the parameter domain (i.e. for strictly positive probabilities).
  • Behavior on the boundaries is more complicated. Sometimes analyticity persists on the boundaries (and beyond); sometimes we encounter singularities. A full characterization is still lacking [Marcus&Han 05].
RE-Rate Taylor Series Expansion

  • While in general the RE-rate is not known in closed form, there are specific parameter values for which it is easily given in closed form (e.g. for Markov chains). Perhaps we can expand around these values and get asymptotic results near them.
  • A similar approach was used for Lyapunov exponents [Derrida] and for the HMP entropy rate [Jacquet et al. 04, Ordentlich&Weissman 04, Zuk et al. 05], giving first-order asymptotics in various regimes.
Different Regimes – Binary Case

[Figure: the (p, ε) parameter square, 0 ≤ p, ε ≤ ½]

  • p → 0 , p → ½ (ε fixed)
  • ε → 0 , ε → ½ (p fixed)

We concentrate on the 'high-SNR regime' ε → 0 and the 'almost-memoryless regime' p → ½.

For high SNR (for both models, η = λ, µ), the solution can be given as a power series in ε:

    D(λ || µ) = Σk≥0 Ck ε^k
RE-Rate Taylor Series Expansion

  • In [Zuk, Domany, Kanter & Aizenman 06] we give a procedure for calculating the full Taylor series expansion of the HMP entropy rate in the 'high-SNR' and 'almost-memoryless' regimes.
  • Main observation: finite systems give the correct RE-rate up to a given order: the k-th order term 'settles' once N is large enough (see the proof outline and the numerical sketch below).
  • This was discovered using computer experiments (symbolic computation in Maple).
  • A stronger result holds for the entropy rate (orders 'settle' for N ≥ (k+3)/2).
  • This does not hold in every regime. In some regimes (e.g. p → 0), even the first order never settles.
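
The settling statement concerns Taylor coefficients, which the talk obtained symbolically; as a cruder numerical companion (ours, with arbitrary example parameters), the exact finite-N RE per symbol, computed by enumeration, stabilizes quickly in the high-SNR regime:

```python
import itertools
import numpy as np

def hmp_prob(y, p, eps):
    """P(y) under the symmetric binary HMP (p, eps), via the forward recursion."""
    M = np.array([[1 - p, p], [p, 1 - p]])
    R = np.array([[1 - eps, eps], [eps, 1 - eps]])
    alpha = 0.5 * R[:, y[0]]
    for s in y[1:]:
        alpha = (alpha @ M) * R[:, s]
    return alpha.sum()

def D_N(N, lam, mu):
    """Exact (1/N) D(P_lambda(N) || P_mu(N)) by enumerating all 2^N sequences."""
    total = 0.0
    for y in itertools.product([0, 1], repeat=N):
        pl, pm = hmp_prob(y, *lam), hmp_prob(y, *mu)
        total += pl * np.log(pl / pm)
    return total / N

lam, mu = (0.2, 0.01), (0.3, 0.02)  # (p, eps) per model: high-SNR regime
for N in range(1, 11):
    print(N, D_N(N, lam, mu))
```
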
Proof Outline (with M. Aizenman)

Two main ideas:

  A. Distinguish between the noise variables at different sites: e1, e2, e3, …, ej, …
  B. When em = 0 the observation is exact, Ym = Xm, and conditioning back to the past is 'blocked'.

[Figure: the chain X with observations Y; a site with em = 0 blocks dependence on the past, giving H(p,e) up to O(e^k) from a finite window - the labels (k+3)/2 and k+2 mark the window sizes shown for H(λ) and D(λ||µ)]

Overview
  • Introduction
  • Distance Measures and Relative Entropy Rate
  • Results: Generalization from the Entropy Rate
  • Future Directions
RE-Rate Taylor Series Expansion

  • The first-order term was computed explicitly.
  • Higher orders were computed for the binary symmetric case.
  • Similar results hold for the 'almost-memoryless' regime.
  • The radius of convergence seems larger for the latter expansion, although no rigorous results are known.
Future Directions

  • Study other regimes (e.g. two 'close' models).
  • Behavior of the EM algorithm.
  • Generalizations (e.g. different alphabet sizes, the continuous case).
  • Physical realization of HMPs (mesoscopic systems, quantum jumps).
  • Domain of analyticity and radius of convergence.
Thanks
  • Eytan Domany (Weizmann Inst.)
  • Ido Kanter (Bar-Ilan Univ.)
  • Michael Aizenman (Princeton Univ.)
  • Libi Hertzberg (Weizmann Inst.)