Error Correcting Codes for Serial Links: An Update

Sergio Cavaliere
Department of Physics, University of Napoli “Federico II”, Italy, and INFN Sezione di Napoli, Italy
e-mail: sergio.cavaliere@na.infn.it
XV SuperB Workshop, Caltech, Dec. 2010

Overview
• Recall the problems with serial link failures and errors due to the harsh radiation environment
• Recall which parameters are relevant for the performance of the error correcting code
• Define bit error rate and bit error frequency, and the related Poisson statistics
• Probability analysis of bit error rate reduction in Hamming codes
• Probability analysis of bit error rate reduction in Reed Solomon codes
• Analysis of some proposed coding structures
• Conclusions and future work

Problems with serial link failures and errors
• Two main problems arise from errors due to the harsh radiation environment:
  • Loss of Lock (LoL), due to failures on fixed bits in the SERDES, analyzed at the last Frascati meeting.
    Conclusion: a direct, fast link between transmitter and receiver is needed to signal the occurrence of LoL promptly.
  • Bit errors due to the radiation environment, which affect data integrity and data quality.
    Solution: an Error Correcting Code (ECC) is needed.
• Evaluating the required performance of the code:
  • start from a presumed bit error rate of 10^-10 (LHC?), compared with the extreme technology limit of 10^-15;
  • arrive at a desired time between errors.

Relevant parameters for the performance of the error correcting code
• In the usual communication approach, the relevant parameter for serial link improvement is the coding gain: the increase in channel noise that the error correcting code can compensate for.
• This allows, by relaxing the SNR requirements:
  • reducing costs at the same performance, or
  • increasing speed at the same cost.
• In our case, what matters is the bit error rate reduction obtained by the ECC.
• From the BER we can compute an overall failure rate for each serial link, and for the whole apparatus, at a fixed data rate.

Bit Error Rate and time between errors
[Figure: timeline of transmitted bits, one per transmit clock period T (1/T bits per unit time), with λ error events (faulty bits) per unit time]
• T = transmit clock period; f = 1/T = transmission frequency
• BER = number of errored bits / number of transmitted bits
• λ = BEF (bit error frequency) = BER · f
• Average time between errors μ = 1/BEF
• Example: f = 1.1 GHz, BER = 10^-10 gives λ = 0.11 errors/s and μ ≈ 9 s
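The definitions above reduce to two lines of arithmetic. A minimal sketch, assuming a link that transmits continuously at a fixed frequency; the function names are illustrative only:

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600


def bit_error_frequency(ber: float, f_hz: float) -> float:
    """BEF = lambda = BER * f: errored bits per second on a continuously used link."""
    return ber * f_hz


def mean_time_between_errors(ber: float, f_hz: float) -> float:
    """mu = 1 / BEF, in seconds."""
    return 1.0 / bit_error_frequency(ber, f_hz)


if __name__ == "__main__":
    f = 1.1e9  # 1.1 GHz, as in the example above
    for ber in (1e-10, 1e-16, 1e-19):
        mu = mean_time_between_errors(ber, f)
        print(f"BER={ber:.0e}: lambda={bit_error_frequency(ber, f):.3g}/s, "
              f"mu={mu:.3g} s = {mu / SECONDS_PER_YEAR:.3g} years")
```

With BER = 10^-10 this gives λ = 0.11 errors/s and μ ≈ 9.1 s, the figures quoted above.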

Bit errors: Poisson statistics
• Bit errors caused by events in a radiation environment follow the usual Poisson statistics:
  • events occur one after the other, independently of each other;
  • the average number of events per unit time is constant and equal to λ.
• λ is the average number of events per unit time (frequency or rate).
• μ = 1/λ is the average time from one event to the next.
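To make the Poisson model concrete, the sketch below evaluates the probability of k error events in an interval t, the probability of at least one event, and checks by simulation that the gaps between events are exponential with mean μ = 1/λ. It is an illustration of the standard Poisson process under the two assumptions listed above. The function names and the seed are hypothetical.

```python
import math
import random


def poisson_pmf(k: int, lam: float, t: float) -> float:
    """Probability of exactly k error events in an interval t, for rate lam."""
    mean = lam * t
    return mean ** k * math.exp(-mean) / math.factorial(k)


def prob_at_least_one(lam: float, t: float) -> float:
    """1 - P(0 events) = 1 - exp(-lam * t), computed without cancellation."""
    return -math.expm1(-lam * t)


def simulated_mean_gap(lam: float, n_events: int, seed: int = 1) -> float:
    """Average of n_events exponential inter-event times (should approach 1/lam)."""
    rng = random.Random(seed)
    return sum(rng.expovariate(lam) for _ in range(n_events)) / n_events


if __name__ == "__main__":
    lam = 0.11  # errors/s: BER = 1e-10 at f = 1.1 GHz
    print("P(>=1 error in mu seconds):", prob_at_least_one(lam, 1 / lam))
    print("simulated mean gap [s]:", simulated_mean_gap(lam, 100_000))
```

Note that over one average interval μ the chance of seeing at least one error is 1 - e^-1 ≈ 63%, not 100%.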

Bit Error Rate and time between errors
[Figure: average time between errors versus BER, for continuous data exchange at a fixed operating frequency]
• e.g. BER = 10^-10: average time between errors ≈ 9 s
• use error correction coding to achieve BER = 10^-16: average time between errors ≈ 4 months

How much ECC power is needed?
• Assume a command length of 100 bits (actual figures will be 72, 90 or 108 bits).
• Assume a reference BER = 10^-10 for each link.
• For a single link:
  • correcting 0 bits per frame delivers BER = 10^-10: time between errors ≈ 9 s;
  • correcting 1 bit per frame delivers BER ≈ 5·10^-17: time between errors of years;
  • correcting 2 bits per frame delivers BER ≈ 2·10^-25: time between errors of many years.
• Binomial formula: the probability of having n errors in a frame of m bits, with bit error probability p, is $P(n) = \binom{m}{n}\, p^{n} (1-p)^{m-n}$.
• We may argue that an ECC of moderate complexity can be adopted.
• Observation: such low probability values would require very long simulations.
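Because direct simulation is impractical at these probabilities, the binomial tail can be evaluated directly. A minimal sketch, assuming independent bit errors; the helper names are hypothetical. The terms are built recursively so that no huge power of a tiny p is ever computed (negligible terms simply underflow to zero).

```python
def binomial_terms(m: int, p: float):
    """Yield (n, C(m,n) p^n (1-p)^(m-n)) for n = 0..m, assuming 0 <= p < 1."""
    term = (1.0 - p) ** m
    for n in range(m + 1):
        yield n, term
        term *= (m - n) / (n + 1) * p / (1.0 - p)


def prob_more_than_t_errors(m: int, p: float, t: int) -> float:
    """Probability that an m-bit frame has more than t bit errors, i.e. that
    a code correcting t bits per frame cannot repair it."""
    return sum(term for n, term in binomial_terms(m, p) if n > t)


if __name__ == "__main__":
    for t in (0, 1, 2):
        print(f"t={t}: {prob_more_than_t_errors(100, 1e-10, t):.3g}")
```

For m = 100 and p = 10^-10 the t = 1 and t = 2 values come out at about 4.95·10^-17 and 1.6·10^-25. This matches the 5·10^-17 and 2·10^-25 above and suggests those figures are per-frame probabilities of an uncorrectable frame. For t = 0 the same expression gives about 10^-8 per frame rather than the per-bit 10^-10.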

Bit Error Rate reduction in Hamming codes
• Probability of a word error for block codes (n = word length, t = number of corrected bits)
• Probability of a bit error for block codes (n = word length, t = number of corrected bits)
• Probability of a bit error for Hamming codes (t = 1, n = 2^m - 1)
• In log scale it is a straight line with slope 2(n-1)
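The formulas themselves did not survive in this transcript. The sketch below implements the standard textbook expressions for these three quantities; we assume, but cannot confirm from the slides, that these are the ones plotted. The function names are illustrative.

```python
import math


def binomial_terms(n: int, p: float):
    """Yield (j, C(n,j) p^j (1-p)^(n-j)) for j = 0..n, assuming 0 <= p < 1."""
    term = (1.0 - p) ** n
    for j in range(n + 1):
        yield j, term
        term *= (n - j) / (j + 1) * p / (1.0 - p)


def word_error_prob(n: int, t: int, p: float) -> float:
    """P_w = sum_{j=t+1}^{n} C(n,j) p^j (1-p)^(n-j): more than t errors in the word."""
    return sum(term for j, term in binomial_terms(n, p) if j > t)


def bit_error_prob(n: int, t: int, p: float) -> float:
    """Common approximation: a word left with j errors has about j wrong bits out
    of n, so P_b ~ (1/n) sum_{j=t+1}^{n} j C(n,j) p^j (1-p)^(n-j)."""
    return sum(j * term for j, term in binomial_terms(n, p) if j > t) / n


def hamming_bit_error_prob(n: int, p: float) -> float:
    """Textbook expression for Hamming codes (t = 1): P_b ~ p - p(1-p)^(n-1),
    written with expm1/log1p to avoid cancellation for tiny p."""
    return -p * math.expm1((n - 1) * math.log1p(-p))


if __name__ == "__main__":
    p = 1e-10
    print(hamming_bit_error_prob(15, p), bit_error_prob(15, 1, p))  # both ~ 14 p^2
```

For small p the Hamming expression reduces to about (n-1)p², so on log-log axes it is a straight line of slope 2, offset by log(n-1).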

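Bit Error Rate reduction in shortened Hamming codes
• Due to the 18-bit constraint in the SERDES we must use shortened Hamming codes.
[Figure: H(31,26) shortened to 18 bits. The encoder input is 13 message bits plus 13 bits fixed to 0 (26 bits); of the 31 coded bits only 18 are transmitted, and the receiver re-inserts the 13 zeros before decoding back to 26 bits and the 13 message bits]
• A Hamming code H(n,k) is shortened to H(ns,ks); in the above example n = 31 and ns = 18 (so ks = 13).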
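A minimal sketch of a shortened Hamming encoder and decoder. It assumes the classic positional construction, with parity bits at power-of-two positions, which the slides do not specify. Keeping only positions 1..18 of H(31,26) is a valid shortening to H(18,13), because positions 19..31 are all message positions: they are exactly the 13 bits fixed to zero. The same code gives H(15,11) with n=15 and Hs(6,3) with n=6. All names are hypothetical.

```python
from typing import List, Optional


def _data_positions(n: int) -> List[int]:
    """1-based positions that are not powers of two carry message bits."""
    return [i for i in range(1, n + 1) if i & (i - 1)]


def _syndrome(word: List[int]) -> int:
    """XOR of the 1-based positions of all bits set to 1."""
    s = 0
    for pos, bit in enumerate(word, start=1):
        if bit:
            s ^= pos
    return s


def encode(msg: List[int], n: int = 18) -> List[int]:
    """Encode msg into the first n positions of a Hamming code (shortened if n < 2^r - 1)."""
    data_pos = _data_positions(n)
    if len(msg) != len(data_pos):
        raise ValueError(f"H({n},{len(data_pos)}) needs {len(data_pos)} message bits")
    word = [0] * n
    for pos, bit in zip(data_pos, msg):
        word[pos - 1] = bit
    s = _syndrome(word)
    # The parity bit at position 2^i toggles bit i of the syndrome: set it to cancel s.
    p = 1
    while p <= n:
        if s & p:
            word[p - 1] = 1
        p <<= 1
    return word


def decode(word: List[int]) -> Optional[List[int]]:
    """Correct up to one error; return None if the syndrome points at a removed position."""
    n = len(word)
    w = list(word)
    s = _syndrome(w)
    if s > n:
        return None  # error pattern not correctable by the shortened code
    if s:
        w[s - 1] ^= 1  # single error at position s
    return [w[pos - 1] for pos in _data_positions(n)]
```

A side benefit of shortening: a syndrome that points beyond position ns flags an uncorrectable (multiple) error instead of a miscorrection.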

Bit Error Rate reduction in multi-Hamming codes
[Figure: an 18-bit frame carrying 11 message bits, split into H(15,11) shortened to 12 bits (8 message bits plus 3 zeros) and H(7,4) shortened to 6 bits (3 message bits plus 1 zero)]
• Our codes will be made of a combination of shortened Hamming codes:
  • p_multi = bit error probability of the overall code
  • p1 = bit error probability of branch 1 (shortened Hamming code)
  • k1red = number of message bits of branch 1
  • p2 = bit error probability of branch 2 (shortened Hamming code)
  • k2red = number of message bits of branch 2
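The combining formula is missing from the transcript. Given the quantities listed, p_multi is presumably the branch bit error probabilities weighted by their message bits: p_multi = (k1red·p1 + k2red·p2)/(k1red + k2red). The sketch below treats that as an assumption. It also assumes that each shortened branch is evaluated with the Hamming expression over its ns transmitted bits, since the zero-filled bits are never corrupted. Names are hypothetical.

```python
import math
from typing import List, Tuple


def hamming_bit_error_prob(ns: int, p: float) -> float:
    """p - p(1-p)^(ns-1), applied to the ns transmitted bits of a shortened code."""
    return -p * math.expm1((ns - 1) * math.log1p(-p))


def multi_bit_error_prob(branches: List[Tuple[int, int]], p: float) -> float:
    """branches: (ns, k_red) per branch. Assumed: message-bit weighted average."""
    total_k = sum(k for _, k in branches)
    return sum(k * hamming_bit_error_prob(ns, p) for ns, k in branches) / total_k


if __name__ == "__main__":
    # The 18-bit frame above: H(15,11) -> 12 bits (8 message), H(7,4) -> 6 bits (3 message)
    print(multi_bit_error_prob([(12, 8), (6, 3)], 1e-10))
```

For this frame the result is about (8·11 + 3·5)/11 · p² ≈ 9.4 p², i.e. roughly 9.4·10^-20 at p = 10^-10.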

Bit Error Rate reduction in Reed Solomon codes
• p = probability that a bit is in error
• m = symbol length (bits per symbol)
• ps = probability that a symbol is in error
• pew = probability that a word made of n symbols is in error
• pib = probability that a message bit is in error after ECC decoding
• The same analysis as for Hamming codes yields the features of shortened and combined codes.
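A minimal sketch of the first two quantities, assuming independent bit errors. A symbol is wrong if any of its m bits is wrong, and an RS code correcting t symbols fails when more than t of its n symbols are wrong. The RS(15,11) over GF(2^4) example (m = 4, t = 2) is hypothetical and is not taken from the slides. pib is left out, because it needs a further assumption about how many bits of a wrong symbol are wrong.

```python
import math


def symbol_error_prob(p: float, m: int) -> float:
    """ps = 1 - (1-p)^m, computed without cancellation for tiny p."""
    return -math.expm1(m * math.log1p(-p))


def binomial_terms(n: int, q: float):
    """Yield (j, C(n,j) q^j (1-q)^(n-j)) for j = 0..n, assuming 0 <= q < 1."""
    term = (1.0 - q) ** n
    for j in range(n + 1):
        yield j, term
        term *= (n - j) / (j + 1) * q / (1.0 - q)


def rs_word_error_prob(n: int, t: int, ps: float) -> float:
    """pew: probability that more than t of the n symbols are in error."""
    return sum(term for j, term in binomial_terms(n, ps) if j > t)


if __name__ == "__main__":
    N, K, M = 15, 11, 4          # hypothetical RS(15,11) over GF(16)
    T = (N - K) // 2             # corrects up to 2 symbol errors
    ps = symbol_error_prob(1e-10, M)
    print(ps, rs_word_error_prob(N, T, ps))
```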

Hamming code: features of a selected test code
[Figure: transmit chain: 25 bits of data to transmit are encoded by 2 x H(15,11) + Hs(6,3) into 36 bits, go through a buffer & scrambler and are sent as 2 x 18 bits over the 18-bit SERDES serial link; the receive chain mirrors it with a buffer & descrambler and decoders delivering 25 bits of data to distribute]
• A frame of 2 x 18 bits = 36 bits is transmitted (n = 2), with no polarity control
• Codes: 2 x H(15,11) + Hs(6,3) (shortened from H(7,4))
• ECC = 12 %, overhead = 44 %
• BER: 10^-10 at the decoder input, 10^-19 after decoding
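The frame accounting can be checked in a few lines. The sketch reproduces the 36 coded bits, 25 message bits and 44 % overhead (11 parity bits over 25 message bits). It also estimates the decoded BER under two assumptions: the same message-bit weighted average used for multi-Hamming codes, and the textbook Hamming expression per branch. Names are hypothetical, and the "ECC = 12 %" figure is not reproduced here.

```python
import math

# (transmitted bits, message bits) for each branch of the test frame
BRANCHES = [(15, 11), (15, 11), (6, 3)]   # 2 x H(15,11) + Hs(6,3)

coded_bits = sum(n for n, _ in BRANCHES)                  # 2 x 18-bit SERDES words
message_bits = sum(k for _, k in BRANCHES)
overhead = (coded_bits - message_bits) / message_bits     # parity bits / message bits


def hamming_bit_error_prob(n: int, p: float) -> float:
    """Textbook Hamming approximation p - p(1-p)^(n-1) over n transmitted bits."""
    return -p * math.expm1((n - 1) * math.log1p(-p))


def frame_bit_error_prob(p: float) -> float:
    """Assumed: branch bit error probabilities weighted by their message bits."""
    return sum(k * hamming_bit_error_prob(n, p) for n, k in BRANCHES) / message_bits


if __name__ == "__main__":
    print(coded_bits, message_bits, f"{overhead:.0%}")
    print(f"{frame_bit_error_prob(1e-10):.3g}")
```

At p = 10^-10 the estimate is about (22·14 + 3·5)/25 · p² ≈ 1.3·10^-19, of the order of the 10^-19 quoted.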

Hamming code: features of a selected test code
• e.g. f = 1.1 GHz
• uncorrected BER = 10^-10: average time between failures ≈ 9 s
• after coding, corrected BER ≈ 10^-19: average time between failures ≈ 244 years

Reed Solomon codes
• Similar examples can be worked out for Reed Solomon codes.
• We do not show one here, partly because the greater hardware complexity of both encoding and decoding may lead us to the Hamming solution, which is:
  • simpler in terms of hardware complexity, and
  • faster in terms of the involved delays.

Conclusions
• We have developed a thorough statistical analysis, supported by some simulation, of the bit error probability after ECC coding for complex, shortened and mixed codes, both Hamming and Reed Solomon.
• The above considerations on error rates apply to a single link. A multiplicity of 500-1000 links raises the bit error frequency of the whole apparatus by the same factor.
• Even taking this into account, we may argue that a moderate correction capability is enough to reduce the error rate to a suitable value.
• This will be assessed as soon as we have precise figures on the error rate in our radiation environment.
• We will therefore revert to very simple ECC structures, fully compatible with a proper hardware implementation in terms of both available hardware resources and processing time.
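The multiplicity argument is a sum of rates: independent Poisson error streams add, so N identical links see N·BER·f errors per second. A minimal sketch, assuming all links transmit continuously at the same frequency with independent errors; the function name and the example values are illustrative.

```python
SECONDS_PER_DAY = 86400.0


def apparatus_time_between_errors(ber: float, f_hz: float, n_links: int) -> float:
    """Mean time (s) between errors anywhere in the apparatus.

    Assumes n_links identical, continuously used links with independent
    Poisson errors, so the total error rate is n_links * ber * f_hz."""
    return 1.0 / (n_links * ber * f_hz)


if __name__ == "__main__":
    # Illustrative: the test code's ~1e-19 corrected BER at 1.1 GHz
    for n in (1, 500, 1000):
        days = apparatus_time_between_errors(1e-19, 1.1e9, n) / SECONDS_PER_DAY
        print(f"{n:5d} links: {days:.3g} days")
```

With 1000 links, a per-link corrected BER of 10^-19 at 1.1 GHz still corresponds to one error somewhere in the apparatus about every 105 days.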

To be done
• Obtain precise figures on the bit error rate in our radiation environment
• Define and analyze Hamming (Reed Solomon) coding structures with the purpose of reducing silicon area and improving operating speed in the implementation
• Analyze thoroughly the impact of error rates on the performance of the overall apparatus and on the related data quality
• Evaluate practical implementations
