

A Scalable Architecture for LDPC Decoding

Cocco, M.; Dielissen, J.; Heijligers, M.; Hekstra, A.; Huisken, J.

Design, Automation and Test in Europe Conference and Exhibition (DATE 2004), Proceedings, Vol. 3, Feb. 16-20, 2004, pp. 88-93

Outline
  • Introduction
  • Serial approach
  • UMP algorithm
  • Dataset in check nodes
  • Check operation
  • Computation trick
  • Memory reduction
  • Computation for Iteration
Introduction
  • High-rate (R = 0.9) LDPC code
  • K (average ≈ 30): row weight of the parity-check matrix
  • Targets high code rate, long codewords and high SNR
  • Memory reduced to about 1/10 of a direct implementation
Serial Approach
  • Storage-media applications (optical or magnetic)
  • Relaxed delay requirements
  • Bit nodes are processed serially, from first to last
  • Messages between phases are stored in memory
UMP Algorithm
  • "FOR 40 ITERATIONS DO"
    • "FOR ALL BIT NODES DO"
      • "FOR EACH INCOMING ARC X"
        • "SUM ALL INCOMING LLRs EXCEPT OVER X"
        • "SEND THE RESULT BACK OVER X"
      • "NEXT ARC"
    • "NEXT BIT NODE"
    • "FOR ALL CHECK NODES DO"
      • "FOR EACH INCOMING ARC X"
        • "TAKE THE ABS MINIMUM OF THE INCOMING LLRs EXCEPT OVER X"
        • "TAKE THE XOR OF THE SIGNS OF THE INCOMING LLRs EXCEPT OVER X"
        • "SEND THE RESULT BACK OVER X"
      • "NEXT ARC"
    • "NEXT CHECK NODE"
  • "NEXT ITERATION"
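The two phases above can be sketched in Python. This is a minimal dense-matrix illustration of one UMP (min-sum) iteration over a tiny example parity-check matrix; the function and variable names are ours, not the paper's:

```python
import numpy as np

def ump_iteration(H, channel_llr, check_msgs):
    """One UMP (min-sum) iteration on parity-check matrix H.

    check_msgs[i, j] holds the message from check node i to bit node j
    (meaningful only where H[i, j] = 1). Returns the updated check messages.
    """
    m, n = H.shape
    # Bit-node phase: for each arc, sum all incoming LLRs except over that arc.
    bit_msgs = np.zeros((m, n))
    for j in range(n):
        checks = np.nonzero(H[:, j])[0]
        total = channel_llr[j] + sum(check_msgs[i, j] for i in checks)
        for i in checks:
            bit_msgs[i, j] = total - check_msgs[i, j]
    # Check-node phase: abs-minimum and sign-XOR of incoming LLRs, except over the arc.
    new_check = np.zeros((m, n))
    for i in range(m):
        bits = np.nonzero(H[i, :])[0]
        for j in bits:
            others = [bit_msgs[i, k] for k in bits if k != j]
            sign = 1.0
            for v in others:
                sign *= 1.0 if v >= 0 else -1.0
            new_check[i, j] = sign * min(abs(v) for v in others)
    return new_check
```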
UMP algorithm
  • No knowledge of the channel SNR is needed → robust performance
  • No complex mathematical function (tanh x) is needed → area saving
[Figure: a check node and its incoming arcs]

Dataset in check nodes
  • Minimum: the overall minimum |LLR|
  • One-but-minimum: the second-smallest |LLR|
  • Index: position of the minimum
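A sketch of how this compressed dataset could be extracted from the K incoming LLRs of one check node; we also fold in the sign parity used by the check operation, and the function name is illustrative:

```python
def check_node_dataset(llrs):
    """Compress a check node's incoming LLRs into the stored dataset:
    overall minimum |LLR|, one-but-minimum |LLR|, the index of the
    minimum, and the XOR of all hard-decision signs."""
    mags = [abs(v) for v in llrs]
    idx = min(range(len(mags)), key=mags.__getitem__)   # index of the minimum
    minimum = mags[idx]
    one_but = min(m for k, m in enumerate(mags) if k != idx)
    parity = 0
    for v in llrs:
        parity ^= 1 if v < 0 else 0                      # XOR of hard bits
    return minimum, one_but, idx, parity
```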
Check operation
  • Compute the exclusive-or of all hard bits output by the connected bit nodes, except the jth.
  • Compute the minimum of the K absolute LLR values of the bit nodes to which the check node is connected, except the jth.
Computation trick
  • Minimum: if LLR_j did not supply the overall minimum, the exclude-j minimum equals the overall minimum; otherwise it equals the one-but-minimum.
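This selection rule is a one-liner; a sketch using the stored dataset fields (names are ours):

```python
def outgoing_magnitude(j, minimum, one_but_minimum, min_index):
    # Exclude-j minimum: the overall minimum unless bit j supplied it,
    # in which case fall back to the one-but-minimum.
    return one_but_minimum if j == min_index else minimum
```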

Memory reduction
  • Original size: one message stored per arc (K values per check node)
  • Reduced size: minimum, one-but-minimum, index and parity per check node
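A rough accounting of the ~1/10 reduction claimed in the introduction; the bit widths below are our assumptions, not figures from the paper:

```python
import math

K = 30          # assumed average row weight
llr_bits = 5    # assumed LLR magnitude width

# Original: one message per arc.
original = K * llr_bits
# Reduced: two magnitudes, an index into K positions, and one parity bit.
reduced = 2 * llr_bits + math.ceil(math.log2(K)) + 1
print(original, reduced, original / reduced)  # 150 vs 16 bits, ~9.4x
```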
Computation for Iteration
  • "FOR 40 ITERATIONS DO"
    • "FOR ALL BIT NODES DO"
      • "CALCULATE THE OUTPUT MESSAGES FROM THE 3 CONNECTED CHECK NODES"
      • "DO RUNNING CHECK NODE UPDATES ON THE 3 CHECK NODES"
    • "NEXT BIT NODE"
  • "NEXT ITERATION"
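This bit-centric schedule can be sketched as follows. It is a simplified model, assuming the previous iteration left per-check datasets of the form [min, one-but-min, index, parity] plus the hard decision each bit last sent to each check; initialisation and all names are illustrative:

```python
INF = float("inf")

def run_iteration(bit_to_checks, channel_llr, old_state, old_hard):
    """One bit-centric iteration: visit each bit node once, reconstruct
    its incoming check messages from the old datasets, and fold the new
    bit messages into fresh running datasets for the next iteration."""
    new_state = {c: [INF, INF, -1, 0] for c in old_state}
    new_hard = {}
    for j, checks in bit_to_checks.items():
        # Reconstruct the incoming check messages from the stored datasets.
        incoming = {}
        for c in checks:
            mn, one_but, idx, parity = old_state[c]
            mag = one_but if idx == j else mn            # exclude-j minimum
            sign = -1.0 if (parity ^ old_hard[(c, j)]) else 1.0
            incoming[c] = sign * mag
        total = channel_llr[j] + sum(incoming.values())
        # Running check-node updates on the connected checks.
        for c in checks:
            msg = total - incoming[c]
            hard = 1 if msg < 0 else 0
            new_hard[(c, j)] = hard
            st = new_state[c]
            st[3] ^= hard                                # parity accumulator
            a = abs(msg)
            if a < st[0]:
                st[0], st[1], st[2] = a, st[0], j        # new minimum
            elif a < st[1]:
                st[1] = a                                # new one-but-minimum
    return new_state, new_hard
```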
Computation for Iteration

[Figure: dataset memories, each holding a NEW and an OLD copy]
Time folded architecture

[Figure: block diagram with FSM & PC, μROM, Computational Kernel, Prefetcher and Memory blocks; control, R/W & address, serial input and serial output signals]

Prefetch
  • Every dataset is used for about 30 consecutive cycles.
  • Every clock cycle requires, on average, 2 read and 2 write operations.
  • Delayed writeback
  • Dataset caching
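Delayed writeback and dataset caching can be illustrated with a behavioural toy model (not the paper's hardware): a dataset is fetched into a local slot, updated in place over its reuse window, and only written back to memory when the slot is released:

```python
class PrefetchBuffer:
    """Toy model of the prefetcher's dataset cache with delayed writeback."""

    def __init__(self, memory):
        self.memory = memory   # backing store: id -> dataset (list)
        self.slots = {}        # cached working copies
        self.dirty = set()

    def use(self, ds_id):
        if ds_id not in self.slots:
            # Prefetch: copy the dataset into a local slot on first use.
            self.slots[ds_id] = list(self.memory[ds_id])
        self.dirty.add(ds_id)  # updates happen on the cached copy
        return self.slots[ds_id]

    def evict(self, ds_id):
        if ds_id in self.dirty:
            # Delayed writeback: memory sees one write per reuse window.
            self.memory[ds_id] = self.slots[ds_id]
            self.dirty.discard(ds_id)
        self.slots.pop(ds_id, None)
```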
Tiled architecture

[Figure: the FSM & PC, μROM, Computational Kernel, Prefetcher and Memory blocks are replicated in each tile]

Result and area distribution
  • N = 1020, R = 0.5, 57 tiles
  • 36 mm² in 0.13 μm technology at 1 GHz, 300 Mb/s

Conclusion
  • Speedup and simultaneous multiple accesses → Prefetch
  • Reduced memory-access latency → Memory hierarchy
  • Increased performance → N-tiled architecture
  • A modified version can be pipelined