Cyclic redundancy codes


Cyclic redundancy codes. Circuit elements in Digital computations Prof. Seok-Bum Ko Mehrnoosh Janbakhsh Jan29, 2010. Novel Table Lookup-Based Algorithms for High-Performance CRC Generation.

Presentation Transcript
Cyclic redundancy codes
  • Circuit elements in Digital computations
  • Prof. Seok-Bum Ko
  • Mehrnoosh Janbakhsh
  • Jan 29, 2010
Novel Table Lookup-Based Algorithms for High-Performance CRC Generation
  • VOL.57, NO.11, November 2008
  • Michael E. Kounavis, Member, IEEE
  • Frank L. Berry
Introduction
  • CRCs are used to detect corruption of digital content
  • A CRC algorithm treats each bitstream as a binary polynomial
  • The binary word corresponding to the remainder is transmitted together with the bitstream
  • At the receiver side, the CRC algorithm verifies that the correct remainder has been received
Point of Interest
  • - A new investigation into implementing CRC generation algorithms in software
  • - Good for accelerating well-known codes
  • - Speeds up many commercial host, network, and server chipsets
  • - A number of proposed Internet protocols, such as data center protocols, require data integrity checks to be performed above the transport layer using very high-speed CRCs (e.g., 10 Gbps)
Sarwate Algorithm
  • This algorithm reads 8 bits at a time from a stream and calculates the stream's CRC value by performing lookups on a table of 256 32-bit entries.
  • It was designed when most computer architectures supported XOR operations between 8-bit quantities.
  • Modern processors can efficiently XOR 32- or 64-bit quantities and can access large on-chip cache memory in a few clock cycles.
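
As an illustration of the byte-at-a-time table lookup described above, here is a minimal Python sketch. The slides do not name a specific generator, so the sketch assumes the common reflected CRC-32 polynomial `0xEDB88320` (the one used by `zlib`), which lets the result be cross-checked. The names `make_table` and `crc32_sarwate` are hypothetical.

```python
import zlib

POLY = 0xEDB88320  # assumption: reflected CRC-32 polynomial, same CRC as zlib


def make_table() -> list[int]:
    """Build the 256-entry table: entry b is the CRC state after byte b."""
    table = []
    for byte in range(256):
        crc = byte
        for _ in range(8):  # eight single-bit division steps per byte
            crc = (crc >> 1) ^ POLY if crc & 1 else crc >> 1
        table.append(crc)
    return table


TABLE = make_table()


def crc32_sarwate(data: bytes) -> int:
    """Consume the stream 8 bits at a time, one table lookup per byte."""
    crc = 0xFFFFFFFF
    for b in data:
        crc = (crc >> 8) ^ TABLE[(crc ^ b) & 0xFF]
    return crc ^ 0xFFFFFFFF


print(hex(crc32_sarwate(b"123456789")))  # prints 0xcbf43926
print(crc32_sarwate(b"123456789") == zlib.crc32(b"123456789"))  # prints True
```

The table holds 256 entries of 32 bits each, i.e. 1 Kbyte, which matches the table size stated on the slide.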
What is new here?
  • Novel slicing-by-4 algorithm
      - Based on the Sarwate algorithm
      - Uses a 4-Kbyte cache footprint
      - Doubles the existing CRC performance by reading 32 bits at a time
  • Novel slicing-by-8 algorithm
      - Based on the Sarwate algorithm
      - Uses an 8-Kbyte cache footprint
      - Triples the existing CRC performance by reading 64 bits at a time
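
The slicing-by-4 idea can be sketched in Python as follows. This is a minimal illustration rather than the authors' implementation: it assumes the reflected CRC-32 polynomial `0xEDB88320` so that the output can be compared with `zlib.crc32`, and the names `make_tables` and `crc32_slicing_by_4` are hypothetical.

```python
import zlib

POLY = 0xEDB88320  # assumption: reflected CRC-32 polynomial, same CRC as zlib


def make_tables(count: int = 4) -> list[list[int]]:
    """tables[k][b] is the CRC state of byte b followed by k zero bytes."""
    base = []
    for byte in range(256):
        crc = byte
        for _ in range(8):
            crc = (crc >> 1) ^ POLY if crc & 1 else crc >> 1
        base.append(crc)
    tables = [base]
    for k in range(1, count):
        prev = tables[k - 1]
        tables.append([(v >> 8) ^ base[v & 0xFF] for v in prev])
    return tables


T = make_tables(4)  # 4 tables x 256 entries x 4 bytes = 4 Kbytes


def crc32_slicing_by_4(data: bytes) -> int:
    crc = 0xFFFFFFFF
    whole = len(data) - len(data) % 4
    for i in range(0, whole, 4):
        # Read 32 bits at once and fold them into the current remainder.
        crc ^= int.from_bytes(data[i:i + 4], "little")
        # Four independent lookups, one per 8-bit slice, combined with XOR.
        crc = (T[3][crc & 0xFF] ^ T[2][(crc >> 8) & 0xFF]
               ^ T[1][(crc >> 16) & 0xFF] ^ T[0][crc >> 24])
    for b in data[whole:]:  # leftover bytes: plain byte-at-a-time update
        crc = (crc >> 8) ^ T[0][(crc ^ b) & 0xFF]
    return crc ^ 0xFFFFFFFF


message = b"The quick brown fox jumps over the lazy dog"
print(crc32_slicing_by_4(message) == zlib.crc32(message))  # prints True
```

The four lookups in the main loop do not depend on each other, which is what allows them to proceed in parallel on a modern processor. Slicing-by-8 follows the same pattern with eight tables and 64 bits read per iteration.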

Advantages
  • - Parallel lookup tables are used to generate CRC values over long bitstreams.
  • - The next remainder is computed by performing parallel lookups into smaller tables.
Parallel LUTs concept
  • The concept of parallel table lookups appears in early CRC5 implementations and in the work done by Braun and Waldvogel on performing incremental CRC updates for IP over ATM networks.
CRC Generation Process
  • CRCs are error-detecting codes capable of detecting accidental alteration of data. Data in computer systems can be modified for many reasons, such as hard drive malfunctions, Gaussian noise, and faulty physical connections.
How does the CRC algorithm work?
  • It treats each bitstream as a binary polynomial B(x) and calculates the remainder R(x) from the division of B(x) by a standard "generator" polynomial G(x).
  • The length of R(x) in bits is equal to the length of G(x) minus one.
  • At the receiver, CRC algorithms verify that R(x) is the correct remainder.
  • Additions and subtractions are carry-less, so they are equivalent to the XOR logical operation.
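
The division described above can be sketched as a bit-at-a-time long division in which every subtraction is an XOR. The function name `mod2_divide` is hypothetical, and bit strings are used only for readability. The sample operands are the divisor 11011 and the dividend 1000111011000 from the long-division example in this deck.

```python
def mod2_divide(dividend: str, divisor: str) -> str:
    """Return the remainder of dividend / divisor using carry-less arithmetic."""
    work = [int(bit) for bit in dividend]
    div = [int(bit) for bit in divisor]
    for i in range(len(work) - len(div) + 1):
        if work[i]:  # leading bit is 1, so "subtract" the divisor here
            for j, d in enumerate(div):
                work[i + j] ^= d  # subtraction without carries is XOR
    # The remainder is one bit shorter than the divisor.
    return "".join(str(bit) for bit in work[-(len(div) - 1):])


print(mod2_divide("1000111011000", "11011"))  # prints 0101

# Sender: append len(G)-1 zero bits, divide, and send message + remainder.
message, generator = "100011101", "11011"
remainder = mod2_divide(message + "0000", generator)
# Receiver: dividing the received word leaves an all-zero remainder.
print(mod2_divide(message + remainder, generator))  # prints 0000
```

The receiver check works because replacing the appended zeros with the remainder is itself an XOR, which cancels the remainder exactly.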
Straightforward LUT Example 1

  divisor   dividend
  11011     1000111011000
            11011
             10101
             11011
              11101
              11011
               01100   (current remainder)

  The three XOR steps shown are the steps replaced by a LUT.

Accelerating the long division using table lookups

Modify Ex. 1
  • Remainder slicing
Sarwate Alg. disadvantage
  • The memory requirement becomes very high when reading a large number of bits at a time. For example, to achieve acceleration by reading 32 bits at a time, a table-driven algorithm needs a table of $2^{32}$ = 4G entries.
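First step
  • B is the input bit stream, and P is the group of the p most significant bits of B
  • l is the length of B, with l > p
  • g is the length of the generator polynomial, with g < l
  • The l-g+1 most significant bits of B are the bits being encoded
  • The g-1 least significant bits of B are equal to zero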
Continue
  • $P = [b_1 b_2 \ldots b_p]$, $B = [b_1 b_2 \ldots b_l]$
  • $P = [P_1 : P_2 : \ldots : P_m]$
  • p is the length of P, and $p = \sum_{i=1}^{m} p_i$, where $p_i$ is the length of slice $P_i$
  • P is sliced so that the algorithm can read potentially large amounts of data without having to access a LUT of $2^{p}$ entries.
  • Each slice $P_i$ has its own LUT $T_i$ of size $2^{p_i}$, which contains the remainders of the slice values shifted by an offset $o_i$.
Calculations
  • $o_i = \sum_{j=i+1}^{m} p_j$
  • Let $R_i^{(1)}$ be the value returned by the LUT for slice $P_i$ during the first step: $R_i^{(1)} = P_i \cdot 2^{o_i} \bmod G$
  • $R^{(1)} = \bigoplus_{i=1}^{m} R_i^{(1)}$
  • $S^{(1)} = [R^{(1)} : Q^{(1)}] = R^{(1)} \cdot 2^{q} \oplus Q^{(1)}$
  • $Q^{(1)}$ is the set of the next q bits of the bit stream after the first p bits

$Q^{(1)} = [b_{p+1} b_{p+2} \ldots b_{p+q}]$
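
The first-step formulas rely on the fact that carry-less division is linear: the remainder of P equals the XOR of the remainders of its shifted slices. The following Python sketch checks this on a small example. The function names, the slice lengths (3, 3, 2), and the generator 11011 are illustrative assumptions, and the direct call to `mod2_rem` stands in for what would be a lookup in table $T_i$.

```python
def mod2_rem(value: int, gen: int) -> int:
    """Remainder of value divided by gen, with XOR as subtraction."""
    gen_len = gen.bit_length()
    while value.bit_length() >= gen_len:
        value ^= gen << (value.bit_length() - gen_len)
    return value


def first_step(P: int, slice_lengths: list[int], gen: int) -> int:
    """Compute R(1) as the XOR over i of (P_i * 2^{o_i} mod G)."""
    offset = sum(slice_lengths)  # p, the total length of P
    rem = 0
    for p_i in slice_lengths:  # slices ordered from most to least significant
        offset -= p_i  # o_i: number of bits of P to the right of slice i
        P_i = (P >> offset) & ((1 << p_i) - 1)
        rem ^= mod2_rem(P_i << offset, gen)  # stands in for the lookup T_i[P_i]
    return rem


G = 0b11011
P = 0b10001110
print(first_step(P, [3, 3, 2], G) == mod2_rem(P, G))  # prints True
```

Because each slice is only $p_i$ bits wide, its table needs $2^{p_i}$ entries instead of the $2^{p}$ entries a single table over all of P would need.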

Step k
  • The first step differs from the other steps because the length l of the input stream may not be a multiple of q, the number of bits read at a time.
  • $f_i = \sum_{j=i+1}^{n} s_j$, where $S^{(k-1)}$ is split into n slices $S_i^{(k-1)}$ of lengths $s_i$
  • $R_i^{(k)} = S_i^{(k-1)} \cdot 2^{f_i} \bmod G$
  • $R^{(k)} = \bigoplus_{i=1}^{n} R_i^{(k)}$
  • $S^{(k)} = [R^{(k)} : Q^{(k)}] = R^{(k)} \cdot 2^{q} \oplus Q^{(k)}$
  • $N = l/q + 1$
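
The step-by-step framework can be sketched end to end in Python. This is a simplified reading of the formulas above, not the authors' code: the helper names are hypothetical, all slices share one maximum width `slice_len`, and the first step is assumed to read whatever number of bits makes the rest of the stream a multiple of q.

```python
def mod2_rem(value: int, gen: int) -> int:
    """Remainder of value divided by gen, with XOR as subtraction."""
    gen_len = gen.bit_length()
    while value.bit_length() >= gen_len:
        value ^= gen << (value.bit_length() - gen_len)
    return value


def split(width: int, slice_len: int) -> list[int]:
    """Slice lengths covering `width` bits, most significant slice first."""
    full, rest = divmod(width, slice_len)
    return ([rest] if rest else []) + [slice_len] * full


def build_tables(lengths: list[int], gen: int) -> list[list[int]]:
    """tables[i][v] = v * 2^{f_i} mod G, with f_i the bits right of slice i."""
    tables, offset = [], sum(lengths)
    for s_i in lengths:
        offset -= s_i
        tables.append([mod2_rem(v << offset, gen) for v in range(1 << s_i)])
    return tables


def lookup_rem(value: int, lengths: list[int], tables: list[list[int]]) -> int:
    """XOR together one table entry per slice of `value`."""
    rem, offset = 0, sum(lengths)
    for s_i, table in zip(lengths, tables):
        offset -= s_i
        rem ^= table[(value >> offset) & ((1 << s_i) - 1)]
    return rem


def crc_framework(bits: str, gen: int, q: int, slice_len: int) -> int:
    p = len(bits) % q or q  # bits read by the first step
    first = split(p, slice_len)
    rem = lookup_rem(int(bits[:p], 2), first, build_tables(first, gen))
    later = split(gen.bit_length() - 1 + q, slice_len)
    tables = build_tables(later, gen)
    for start in range(p, len(bits), q):  # steps 2..N, q bits each
        s = (rem << q) ^ int(bits[start:start + q], 2)  # S = [R : Q]
        rem = lookup_rem(s, later, tables)
    return rem


B = "1000111011000"
print(crc_framework(B, 0b11011, q=8, slice_len=4) == mod2_rem(int(B, 2), 0b11011))  # prints True
```

Only the final value of `rem` is the CRC; the intermediate values are the running remainders $R^{(k)}$.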
Correctness

The authors prove the correctness of the algorithmic framework by showing that the value $R^{(N)}$ produced in the last step of the framework is indeed the remainder from the division of the input stream B by the generator polynomial using modulo-2 arithmetic.

Space and time requirements
  • In the first step, m slices are created and m table lookups are performed.
  • In the worst case, each slice needs one shift operation and one logical (AND) operation.
  • m-1 XOR operations are required for the execution of the first step.
  • The total number of operations, including shifts, ANDs, XORs, and LUT accesses, is $O^{(1)} = 4m - 1$
  • Since the LUT accesses are performed in parallel, this reduces to

$O^{(1)} = 3m$ in the first step

Continue
  • Including the steps k after the first, the total number of operations required for the execution is:
  • $O = \sum_{i} O^{(i)} = 3n(N+1) + 3m$

n: number of LUTs

m-1: number of XORs

N: number of steps to execute

Continue
  • The space required for storing the tables used by the first step of the algorithmic framework is:

$E^{(1)} = \sum_{i=1}^{m} 2^{p_i}$

  • And in step k:

$E^{(k)} = \sum_{i=1}^{n} 2^{s_i}$

Reminder
  • The total space requirement of the slicing-by-4 algorithm is 4 Kbytes, and it reads 32 bits at a time.
  • The total space requirement of the slicing-by-8 algorithm is 8 Kbytes, and it reads 64 bits at a time.
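
These footprints follow from summing $2^{s_i}$ table entries over the slices. A short Python sketch of the arithmetic is given below. It assumes 8-bit slices and 4-byte (32-bit) table entries, and `table_bytes` is a hypothetical helper name.

```python
def table_bytes(slice_bits: list[int], entry_bytes: int = 4) -> int:
    """Total table space: 2^{s_i} entries per slice, entry_bytes bytes each."""
    return sum((1 << s) * entry_bytes for s in slice_bits)


print(table_bytes([8]))      # one 8-bit table (Sarwate): 1024
print(table_bytes([8] * 4))  # slicing-by-4: 4096
print(table_bytes([8] * 8))  # slicing-by-8: 8192
print(table_bytes([32]))     # a single table indexed by 32 bits: 17179869184
```

The last line shows why reading 32 bits through one unsliced table is impractical: under these assumptions it would need 16 GiB, against 4 Kbytes for four sliced tables.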
Evaluation
  • There is a trade-off between the number of logical operations and the space requirement of the algorithm.
  • If the tables are stored in an external memory unit, the latency of accessing them may be significantly higher than when they are stored in a cache.
  • Slicing reduces the number of operations performed for each byte of an input stream.
Min. and Ave. processing cost
  • "Warm" refers to any memory entry placed in cache memory
  • "Cold" refers to any memory entry stored in an external memory unit