
A Scalable Architecture for LDPC Decoding

Cocco, M.; Dielissen, J.; Heijligers, M.; Hekstra, A.; Huisken, J.

Design, Automation and Test in Europe Conference and Exhibition, 2004. Proceedings, Volume 3, Feb. 16-20, 2004, pages 88-93



Outline

  • Introduction

  • Serial approach

  • UMP algorithm

  • Dataset in check nodes

  • Check operation

  • Computation trick

  • Memory reduction

  • Computation per iteration



Introduction

  • High-code-rate (R ≈ 0.9) LDPC codes

  • K (avg. ≈ 30): row weight, the number of bit nodes per check node

  • Targets high code rate, long codeword length, and high SNR

  • Memory reduction to about 1/10 of the original



Serial Approach

  • Storage-media applications (optical or magnetic)

  • Relaxed delay requirements

  • Bit nodes processed serially, from the first to the last

  • Messages kept in memory between passes



UMP Algorithm

  • "FOR 40 ITERATIONS DO"

    • "FOR ALL BIT NODES DO"

      • "FOR EACH INCOMING ARC X"

        • "SUM ALL INCOMING LLRs EXCEPT OVER X"

        • "SEND THE RESULT BACK OVER X"

      • "NEXT ARC"

    • "NEXT BIT NODE"

    • "FOR ALL CHECK NODES DO"

      • "FOR EACH INCOMING ARC X"

        • "TAKE THE ABS MINIMUM OF THE INCOMING LLRs EXCEPT OVER X"

        • "TAKE THE XOR OF THE INCOMING LLRs EXCEPT OVER X"

        • "SEND THE RESULT BACK OVER X"

      • "NEXT ARC"

    • "NEXT CHECK NODE"

  • "NEXT ITERATION"



UMP algorithm

  • Needs no knowledge of the channel SNR → robust performance

  • Needs no complex mathematical function (tanh x) → area saving


Dataset in check nodes

[Figure: a check node and its connected bit nodes]

  • Minimum: overall minimum of the absolute LLR values

  • One-but-minimum: second-smallest absolute LLR value

  • Index: which bit node supplied the overall minimum
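The three fields above are all a check node needs to store. As a sketch, assuming K incoming absolute LLR values per check node (the class and function names are illustrative, not from the paper):

```python
from dataclasses import dataclass

@dataclass
class CheckNodeDataset:
    minimum: float      # overall minimum of the K absolute LLRs
    one_but_min: float  # second-smallest absolute LLR
    index: int          # which bit node supplied the overall minimum

def build_dataset(abs_llrs):
    """Reduce K incoming |LLR| values to the three stored fields."""
    order = sorted(range(len(abs_llrs)), key=abs_llrs.__getitem__)
    return CheckNodeDataset(minimum=abs_llrs[order[0]],
                            one_but_min=abs_llrs[order[1]],
                            index=order[0])
```

Only these three fields (plus sign information) survive in memory; the K individual messages are never stored.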



Check operation

  • Compute the exclusive-or of all hard bits output by the connected bit nodes, except the j-th.

  • Compute the minimum of all K absolute LLR values of the bit nodes to which the check node is connected, except the j-th.



Computation trick

  • Minimum: if LLRj is not the overall minimum, the outgoing minimum is the overall minimum. Otherwise, it is the one-but-minimum (second smallest).
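With the three stored fields, the "minimum of all |LLR|s except the j-th" reduces to a single comparison instead of a K-wide minimum. A minimal sketch (argument names are illustrative):

```python
def outgoing_min(minimum, one_but_min, index, j):
    """Leave-one-out minimum over K values using only three stored fields.

    If bit node j did not supply the overall minimum, excluding it changes
    nothing; otherwise the second-smallest value takes its place.
    """
    return one_but_min if j == index else minimum
```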



Memory reduction

  • Original size: one LLR message stored per connected bit node (K per check node)

  • Reduced size: only the minimum, one-but-minimum and index per check node (about 1/10 of the original)
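A back-of-the-envelope check of the ~1/10 claim, assuming K = 30 messages per check node and 6-bit LLR magnitudes (these bit widths are illustrative assumptions, not figures from the slides):

```python
import math

K, BITS = 30, 6
original = K * BITS                              # one stored message per edge: 180 bits
reduced = BITS + BITS + math.ceil(math.log2(K))  # min + one-but-min + index: 17 bits
ratio = reduced / original                       # roughly 1/10
```

Under these assumed widths the reduced dataset is 17 bits versus 180, consistent with the stated factor of about ten.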



Memory unit inside Check node



Computation for Iteration

  • "FOR 40 ITERATIONS DO"

    • "FOR ALL BIT NODES DO"

      • "CALCULATE THE OUTPUT MESSAGES FROM THE 3 CONNECTED CHECK NODES"

      • "DO RUNNING CHECK NODE UPDATES ON THE 3 CHECK NODES"

    • "NEXT BIT NODE"

  • "NEXT ITERATION"
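A "running" check-node update folds each bit node's new |LLR| into the next-iteration dataset as the bit nodes are visited in serial order, so no separate check-node pass is needed. A sketch of one such update step, with the dataset held as a (minimum, one-but-minimum, index) triple (structure assumed from the pseudocode; names illustrative):

```python
def running_update(dataset, j, abs_llr):
    """Fold one new |LLR| from bit node j into a (min, one_but_min, index) triple."""
    minimum, one_but_min, index = dataset
    if abs_llr < minimum:
        # New overall minimum; the old minimum becomes the one-but-minimum.
        return (abs_llr, minimum, j)
    if abs_llr < one_but_min:
        # New second-smallest value; the minimum and its index are unchanged.
        return (minimum, abs_llr, index)
    return dataset
```

Starting from an empty dataset (both fields at infinity) and folding in all K values one by one yields the same triple a full sort would produce.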



Computation for Iteration

[Figure: NEW | OLD dataset pairs — each check node keeps the OLD dataset from the previous iteration for reading while the NEW one is built up by the running updates]


Time folded architecture

[Figure: block diagram — an FSM & PC with a μROM issues control and R/W & address signals to a Computational Kernel, a Prefetcher and a Memory; data enters as serial input and leaves as serial output]



Prefetch

  • Every dataset is used for 30 consecutive cycles, and this access pattern is known statically.

  • Every clock cycle, an average of 2 read and 2 write operations is required.

  • Delayed write-back

  • Dataset caching
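Because each dataset is reused for about 30 consecutive cycles, it can be fetched into a small local store once, updated there, and written back only when the kernel moves on. The points above can be sketched as a behavioural model (this is not the paper's hardware design; the class and method names are illustrative):

```python
class Prefetcher:
    """Dataset cache with delayed write-back in front of a backing memory."""

    def __init__(self, memory):
        self.memory = memory          # backing store: address -> dataset
        self.cache = {}               # datasets currently held locally
        self.reads = self.writes = 0  # count actual memory accesses

    def get(self, addr):
        # 30 consecutive uses of the same dataset cost one memory read.
        if addr not in self.cache:
            self.cache[addr] = self.memory[addr]
            self.reads += 1
        return self.cache[addr]

    def put(self, addr, dataset):
        # Update is kept locally; the memory write is delayed.
        self.cache[addr] = dataset

    def writeback(self):
        # Flush all locally held datasets back to memory at once.
        for addr, ds in self.cache.items():
            self.memory[addr] = ds
            self.writes += 1
        self.cache.clear()
```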


Tiled architecture

[Figure: the time-folded datapath replicated as tiles — each tile with its own FSM & PC, μROM, Computational Kernel, Prefetcher and Memory]



Result and area distribution

  • N = 1020, R = 0.5, 57 tiles

    36 mm² in 0.13 μm technology @ 1 GHz, 300 Mb/s



Conclusion

  • Prefetch: speedup & simultaneous multiple accesses

  • Memory hierarchy: reduced memory access latency

  • N-tiled architecture: increased performance

  • A modified version can be pipelined

