Markov, Shannon, and Turbo Codes: The Benefits of Hindsight

Professor Stephen B. Wicker

School of Electrical Engineering

Cornell University

Ithaca, NY 14853

- Theme 1: Digital Communications, Shannon and Error Control Coding
- Theme 2: Markov and the Statistical Analysis of Systems with Memory
- Synthesis: Turbo Error Control: Parallel Concatenated Encoding and Iterative Decoding

- The classical design problem: transmitter power vs. bit error rate (BER)
- Complications:
- Physical Distance
- Co-Channel and Adjacent Channel Interference
- Nonlinear Channels

- Noisy Channel Coding Theorem (1948):
- Every channel has a capacity C.
- If we transmit at a data rate that is less than capacity, there exists an error control code that provides arbitrarily low BER.

- For an AWGN channel:
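
For reference, a standard form of the capacity of a band-limited AWGN channel (the Shannon-Hartley expression) is sketched below. The symbols $W$ (bandwidth in Hz), $P$ (average received signal power), and $N_0$ (one-sided noise power spectral density) are assumed here and may not match the notation on the original slide.

```latex
C = W \log_2\!\left(1 + \frac{P}{N_0 W}\right) \quad \text{bits per second}
```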

- Coding Gain: $P_{\mathrm{UNCODED}} - P_{\mathrm{CODED}}$
- The difference in power required by the uncoded and coded systems to obtain a given BER.

- NCCT: Almost 10 dB of coding gain is possible on an AWGN channel with binary signaling.
- 1993: NASA/ESA Deep Space Standard provides 7.7 dB.
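
The "almost 10 dB" figure can be checked numerically with a minimal sketch. It assumes uncoded coherent BPSK, a target BER of 1e-5, and the unconstrained-input AWGN limit at rate 1/2; none of these operating points are stated on the slide, and the function names are illustrative.

```python
import math

def bpsk_ber(ebn0_db):
    """BER of uncoded coherent BPSK on AWGN: Q(sqrt(2 Eb/N0))."""
    ebn0 = 10 ** (ebn0_db / 10)
    return 0.5 * math.erfc(math.sqrt(ebn0))

def required_ebn0_db(target_ber, lo=0.0, hi=20.0):
    """Bisection for the Eb/N0 (dB) at which uncoded BPSK reaches target_ber."""
    for _ in range(60):
        mid = (lo + hi) / 2
        if bpsk_ber(mid) > target_ber:
            lo = mid  # still too many errors, need more power
        else:
            hi = mid
    return (lo + hi) / 2

def shannon_limit_db(rate):
    """Minimum Eb/N0 (dB) for reliable transmission at `rate` bits per
    real channel use on an unconstrained-input AWGN channel."""
    return 10 * math.log10((2 ** (2 * rate) - 1) / (2 * rate))

uncoded_db = required_ebn0_db(1e-5)   # assumed target BER
limit_db = shannon_limit_db(0.5)      # assumed code rate 1/2
print(f"uncoded BPSK needs {uncoded_db:.2f} dB, limit is {limit_db:.2f} dB")
print(f"maximum coding gain is about {uncoded_db - limit_db:.2f} dB")
```

Under these assumptions the gap between the two operating points comes out a little under 10 dB, in line with the figure quoted above.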

- MAP Sequence Decoding Problem:
- Find X that maximizes p(X|Y).
- Derive estimate of U from estimate of X.
- General problem is NP-Hard - related to many optimization problems.
- Polynomial time solutions exist for special cases.

- Hard decision: MAP decoding reduces to minimum distance decoding.
- Example: Berlekamp algorithm (RS codes)
- Soft Decision: Received signals are quantized.
- Example: Viterbi algorithm (Convolutional Codes)
- These techniques do NOT minimize information error rate.
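
As a small illustration of hard-decision minimum distance decoding, the sketch below searches the whole codebook of a (7,4) Hamming code by brute force. The code, the generator matrix `G`, and the function names are chosen for illustration only; practical decoders such as the algorithms named above exploit algebraic or trellis structure instead of exhaustive search.

```python
from itertools import product

# Systematic generator matrix of a (7,4) Hamming code (illustrative choice).
G = [
    [1, 0, 0, 0, 1, 1, 0],
    [0, 1, 0, 0, 1, 0, 1],
    [0, 0, 1, 0, 0, 1, 1],
    [0, 0, 0, 1, 1, 1, 1],
]

def encode(u):
    """Map a 4-bit message to its 7-bit code word (arithmetic mod 2)."""
    return tuple(sum(u[i] * G[i][j] for i in range(4)) % 2 for j in range(7))

# Code word -> message lookup for all 16 messages.
CODEBOOK = {encode(u): u for u in product((0, 1), repeat=4)}

def hamming_distance(a, b):
    return sum(x != y for x, y in zip(a, b))

def min_distance_decode(y):
    """Return the message whose code word is closest to the hard decisions y."""
    best = min(CODEBOOK, key=lambda c: hamming_distance(c, y))
    return CODEBOOK[best]

message = (1, 0, 1, 1)
received = list(encode(message))
received[2] ^= 1  # one channel error
print(min_distance_decode(received))
```

The search cost grows with the number of code words, which is exponential in the message length; this is why polynomial-time special cases matter.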

- Memory is incorporated into the encoder in an obvious way.
- The resulting code can be analyzed using a state diagram.
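
A minimal sketch of an encoder with memory and its state diagram, assuming a rate-1/2 feedforward convolutional encoder with octal generators (7, 5) and memory order 2. The generator choice and the function names are illustrative, not taken from the slides.

```python
def conv_step(state, u):
    """One encoder step. state = (s1, s2) holds the two previous input bits."""
    s1, s2 = state
    out = (u ^ s1 ^ s2, u ^ s2)  # generators 1+D+D^2 and 1+D^2
    return (u, s1), out          # shift the new bit into the register

def state_diagram():
    """Enumerate every (state, input) -> (next state, output) transition."""
    table = {}
    for s1 in (0, 1):
        for s2 in (0, 1):
            for u in (0, 1):
                table[((s1, s2), u)] = conv_step((s1, s2), u)
    return table

def conv_encode(bits):
    """Encode a bit sequence starting from the all-zero state."""
    state, out = (0, 0), []
    for u in bits:
        state, pair = conv_step(state, u)
        out.extend(pair)
    return out

for (state, u), (nxt, out) in sorted(state_diagram().items()):
    print(f"state {state} input {u} -> state {nxt} output {out}")
```

With memory order 2 there are four states and eight labeled edges; the same table, unrolled in time, gives the trellis used by trellis-based decoders.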

- Convolutional code can be depicted as a tree.
- Tree and metric define a metric space.
- Sequential decoding is a local search of a metric space.
- Search complexity is a polynomial function of memory order.
- May not terminate in a finite amount of time.
- Local search methodology to return...

- Markov was, among many other things, a cryptanalyst.
- Interested in the structure of written text.
- Certain letters can only be followed by certain others.

- Markov Chains:
- Let $I$ be a countable set of states and let $\lambda$ be a probability measure on $I$.

- Let random variable $S$ range over $I$ and set $\lambda_i = p(S = i)$.
- Let $P = \{p_{ij}\}$ be a stochastic matrix with rows and columns indexed by $I$.
- $S = (S_n)_{n \ge 0}$ is a Markov chain with initial distribution $\lambda$ and transition matrix $P$ if
- $S_0$ has distribution $\lambda$
- $p(S_{n+1} = j \mid S_0, S_1, \ldots, S_{n-1}, S_n = i) = p(S_{n+1} = j \mid S_n = i) = p_{ij}$
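
The definition can be made concrete with a short sketch. The two-state "vowel/consonant" chain and its probabilities are invented for illustration (in the spirit of the letter-sequence example), and the names `lam`, `P`, `step_distribution`, and `sample_chain` are assumptions.

```python
import random

def step_distribution(lam, P):
    """Distribution of S_{n+1} given that S_n has distribution lam."""
    n = len(lam)
    return [sum(lam[i] * P[i][j] for i in range(n)) for j in range(n)]

def sample_chain(lam, P, length, rng):
    """Draw S_0 from lam, then each next state from the row of P
    indexed by the current state only (the Markov property)."""
    states = range(len(lam))
    s = rng.choices(states, weights=lam)[0]
    path = [s]
    for _ in range(length - 1):
        s = rng.choices(states, weights=P[s])[0]
        path.append(s)
    return path

# Hypothetical chain: state 0 = vowel, state 1 = consonant.
lam = [0.5, 0.5]
P = [[0.1, 0.9],   # a vowel is usually followed by a consonant
     [0.6, 0.4]]

print(step_distribution(lam, P))
print(sample_chain(lam, P, 10, random.Random(0)))
```

Each row of `P` sums to one, which is what makes it a stochastic matrix.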

- HMM :
- Markov chain $X = X_1, X_2, \ldots$
- Sequence of random variables $Y = Y_1, Y_2, \ldots$ that are a probabilistic function $f()$ of $X$.

- Inference Problem: Observe Y and infer:
- Initial state of X
- State transition probabilities for X
- Probabilistic function f()

- Duration of eruptions by Old Faithful
- Movement of locusts (Locusta migratoria)
- Suicide rate in Cape Town, South Africa
- Progress of epidemics
- Econometric models
- Decoding of convolutional codes

- Lloyd Welch and Leonard Baum developed iterative solution to the HMM inference problem (~1962).
- Application-specific solution was classified for many years.
- Published in general form:
- L. E. Baum and T. Petrie, “Statistical Inference for Probabilistic Functions of Finite State Markov Chains,” Ann. Math. Stat., 37:1554 - 1563, 1966.

- Member of the class of algorithms now known as “Expectation-Maximization”, or “EM” algorithms.
- Initial hypothesis $\theta_0$
- Series of estimates generated by the mapping $\theta_i = T(\theta_{i-1})$
- $P(\theta_0) \le P(\theta_1) \le P(\theta_2) \le \cdots \le P(\theta_{ML})$, where $\theta_{ML}$ is the maximum likelihood parameter estimate.

- Goal: Derive probability measure $p(x_j, y)$.
- BW algorithm recursively computes $\alpha$’s and $\beta$’s.

- Define flow $\rho(x_i, x_j)$ to be the probability that a random walk starting at $x_i$ will terminate at $x_j$.
- $\alpha(x_j)$ is the forward flow to $x_j$ at time $j$.
- $\beta(x_j)$ is the backward flow to $x_j$ at time $j$.
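
The forward and backward flows can be computed with two short recursions. The sketch below is a minimal, unscaled version for a discrete HMM, so it suits short sequences only. The argument names (`lam` for the initial distribution, `P` for the transition matrix, `B` for the emission probabilities) and the toy numbers are assumptions, not the slides' notation.

```python
def forward_backward(lam, P, B, obs):
    """Return (alpha, beta, posterior) for a discrete HMM.

    alpha[t][i] = p(y_1..y_t, X_t = i)      (forward flow)
    beta[t][i]  = p(y_{t+1}..y_T | X_t = i) (backward flow)
    posterior[t][i] = p(X_t = i | y_1..y_T)
    """
    n, T = len(lam), len(obs)
    alpha = [[0.0] * n for _ in range(T)]
    beta = [[1.0] * n for _ in range(T)]

    # Forward recursion.
    for i in range(n):
        alpha[0][i] = lam[i] * B[i][obs[0]]
    for t in range(1, T):
        for j in range(n):
            flow_in = sum(alpha[t - 1][i] * P[i][j] for i in range(n))
            alpha[t][j] = B[j][obs[t]] * flow_in

    # Backward recursion.
    for t in range(T - 2, -1, -1):
        for i in range(n):
            beta[t][i] = sum(P[i][j] * B[j][obs[t + 1]] * beta[t + 1][j]
                             for j in range(n))

    likelihood = sum(alpha[T - 1])  # p(y_1..y_T)
    posterior = [[alpha[t][i] * beta[t][i] / likelihood for i in range(n)]
                 for t in range(T)]
    return alpha, beta, posterior

# Toy two-state model with binary observations (hypothetical numbers).
lam = [0.6, 0.4]
P = [[0.7, 0.3], [0.4, 0.6]]
B = [[0.9, 0.1], [0.2, 0.8]]
_, _, post = forward_backward(lam, P, B, [0, 0, 1])
for t, row in enumerate(post):
    print(t, [round(x, 4) for x in row])
```

The product `alpha[t][i] * beta[t][i]` is the joint probability of state `i` at time `t` and the whole observation sequence, which is the measure the goal statement asks for.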

- Several of the woodsmen began to move slowly toward her and observing them closely, the little girl saw that they were turned backward, but really walking forward. “We have to go backward forward!” cried Dorothy. “Hurry up, before they catch us.”
- Ruth Plumly Thompson, The Lost King of Oz, pg. 120, The Reilly & Lee Co., 1925.

- Judea Pearl (1988)
- Each node in a polytree separates the graph into two distinct subgraphs.
- X D-separates upper and lower variables, implying conditional independence.

- 1974: Bahl, Cocke, Jelinek, and Raviv apply portion of BW algorithm to trellis decoding for convolutional and block codes.
- Forward and backward trellis flow: APP that a given branch is traversed.
- Info bit APP: sum of probabilities for branches associated with particular bit value.
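
In notation common in the BCJR literature (assumed here, not taken from the slides), with $s'$ and $s$ the trellis states before and after step $k$ and $\gamma_k(s', s) = p(s, y_k \mid s')$ the branch metric, the two statements above can be written as the following sketch:

```latex
p(s', s, \mathbf{y}) = \alpha_{k-1}(s')\,\gamma_k(s', s)\,\beta_k(s),
\qquad
P(u_k = b \mid \mathbf{y}) =
\frac{\sum_{(s', s)\,:\,u_k = b} \alpha_{k-1}(s')\,\gamma_k(s', s)\,\beta_k(s)}
     {\sum_{(s', s)} \alpha_{k-1}(s')\,\gamma_k(s', s)\,\beta_k(s)}
```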

- May 25, 1993: C. Berrou, A. Glavieux, and P. Thitimajshima, “Near Shannon-Limit Error Correction Coding: Turbo Codes.”
- Two Key Elements:
- Parallel Concatenated Encoders
- Iterative Decoding.

- One “systematic” and two parity streams are generated from the information.
- Recursive (IIR) convolutional encoders are used as “component” encoders.
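
A minimal sketch of a parallel concatenated encoder, assuming two identical recursive systematic component encoders with feedback polynomial 1+D+D^2 and feedforward polynomial 1+D^2, no trellis termination, and a pseudo-random interleaver. The polynomials, the interleaver, and the function names are illustrative choices, not those of any particular standard.

```python
import random

def rsc_parity(bits):
    """Parity stream of a recursive (IIR) convolutional encoder,
    starting from the all-zero state."""
    s1 = s2 = 0
    parity = []
    for u in bits:
        a = u ^ s1 ^ s2        # feedback: 1 + D + D^2
        parity.append(a ^ s2)  # feedforward: 1 + D^2
        s1, s2 = a, s1         # shift register update
    return parity

def turbo_encode(bits, perm):
    """Return (systematic, parity 1, parity 2) for the given interleaver."""
    interleaved = [bits[i] for i in perm]
    return list(bits), rsc_parity(bits), rsc_parity(interleaved)

rng = random.Random(0)
info = [rng.randint(0, 1) for _ in range(16)]
perm = list(range(len(info)))
rng.shuffle(perm)  # hypothetical pseudo-random interleaver

systematic, parity1, parity2 = turbo_encode(info, perm)
print(systematic, parity1, parity2, sep="\n")
```

Because the component encoder is recursive, a single 1 at the input produces a parity stream that never dies out, while a few special inputs (for example 1, 0, 0, 1 with these polynomials) return the encoder to the zero state and give low-weight parity. The interleaver makes it unlikely that the same input is special for both component encoders.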

- Only a small number of low-weight input sequences are mapped to low-weight output sequences.
- The interleaver ensures that if the output of one component encoder has low weight, the output of the other probably will not.
- PCC emphasis: minimize number of low weight code words, as opposed to maximizing the minimum weight.

- BW/BCJR decoders are associated with each component encoder.
- Decoders take turns estimating and exchanging distribution on information bits.

- Decoder 1: BW/BCJR derives
- Decoder 2: BW/BCJR derives

- Information exchanged by the decoders must not be strongly correlated with systematic info or earlier exchanges.

- Turbo coding provides coding gain near 10 dB
- Within 0.3 dB of the Shannon limit.
- NASA/ESA DSN: 1 dB = $80M in 1996.

- Issues:
- Sometimes turbo decoding fails to correct all of the errors in the received data. Why?
- Sometimes the component decoders do not converge. Why?
- Why does turbo decoding work at all?

- Cross entropy, here meaning the Kullback-Leibler distance (relative entropy), measures how far one distribution is from another; it is not symmetric in its two arguments.
- Joachim Hagenauer et al. have suggested using a cross-entropy threshold as a stopping condition for turbo decoders.
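
A cross-entropy stopping rule can be sketched as follows. The code compares the bit posteriors from two successive iterations and stops when their summed Kullback-Leibler distance falls below a threshold. The threshold value, the clipping constant `eps`, and the function names are assumptions for illustration, not the criterion as published.

```python
import math

def kl_divergence(p, q):
    """D(p || q) in nats for discrete distributions.
    Assumes q[i] > 0 wherever p[i] > 0."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def bit_cross_entropy(prev_probs, curr_probs, eps=1e-12):
    """Sum over bits of D(current || previous), where each entry is the
    estimated probability that the bit equals 1."""
    total = 0.0
    for a, b in zip(prev_probs, curr_probs):
        a = min(max(a, eps), 1 - eps)  # clip to avoid log(0)
        b = min(max(b, eps), 1 - eps)
        total += kl_divergence([b, 1 - b], [a, 1 - a])
    return total

def should_stop(prev_probs, curr_probs, threshold=1e-3):
    """Stop iterating once successive estimates barely change."""
    return bit_cross_entropy(prev_probs, curr_probs) < threshold

# Hypothetical posteriors from three successive decoder iterations.
it1 = [0.60, 0.45, 0.70]
it2 = [0.90, 0.20, 0.95]
it3 = [0.905, 0.195, 0.952]
print(should_stop(it1, it2), should_stop(it2, it3))
```

A small value means only that the estimates have stopped changing; it does not by itself say that they are correct.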

- Neural networks can approximate any piecewise-continuous function.
- Goal: Emulation of indicator functions for turbo decoder error and convergence.
- Two Experiments:
- FEDN: Predict eventual error and convergence at the beginning of the decoding process.
- DEDN: Detect error and convergence at the end of the decoding process.

- Missed detection occurs when number of errors is small.
- The average weight of error events in NN-assisted turbo decoding is far less than that of CRC-assisted turbo decoding.
- When coupled with a code combining protocol, NN-assisted turbo decoding is extremely reliable.

- Examined weights generated during training.
- Network monitors slope of cross entropy (rate of descent).
- Conjecture:
- Turbo decoding is a local search algorithm that attempts to minimize cross-entropy cycles.
- Topology of search space is strongly determined by initial cross entropy.

- Turbo Simulated Annealing (Buckley, Hagenauer, Krishnamachari, Wicker)
- Nonconvergent turbo decoding is nudged out of local minimum cycles by randomization (heat).

- Turbo Genetic Decoding (Krishnamachari, Wicker)
- Multiple processes are started in different places in the search space.

- “Classical” response to Shannon:
- Derive probability measure on transmitted sequence, not actual information.
- Explore optimal solutions to special cases of NP-Hard problem.
- Optimal, polynomial time decoding algorithms limit choice of codes.

- “Modern”: Exploit Markov property to obtain temporal/spatial recursion:
- Derive probability measure on information, not codeword
- Explore suboptimal solutions to more difficult cases of NP-Hard problem.
- Iterative decoding
- Graph Theoretic Interpretation of Code Space
- Variations on Local Search

- Relation of cross entropy to impact of cycles in belief propagation.
- Near-term abandonment of parallel concatenated encoders (PCEs) as unnecessarily restrictive.
- Increased emphasis on low density parity check codes and expander codes.
- Decoding algorithms that look like solutions to the k-SAT problem.
- Iteration between subgraphs.
- Increased emphasis on decoding as local search.