Rateless Wireless Networking Decoder

Mikhail Volkov

Edison Achelengwa

Minjie Chen


Cortex: a rateless wireless system

  • Very recent work here at CSAIL (Perry, 2011)

  • Uses a novel rateless code called a spinal code

  • Encoder and decoder agree on a seed s0, a hash function h, and an IQ constellation mapping (sketched below)
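
To make the later sketches concrete, here is one way that shared state could look in software. The seed value, the SHA-256-based stand-in for h, and the square-grid constellation mapping are all illustrative assumptions, not the actual Cortex choices:

```python
import hashlib

# Shared seed agreed on by encoder and decoder (illustrative value).
s0 = 0x1234

# Stand-in for the hash function h: maps (previous spine value, k message bits)
# to a new 64-bit spine value. Cortex uses a different hash; this is only a sketch.
def h(spine, segment):
    data = spine.to_bytes(8, "big") + segment.to_bytes(4, "big")
    return int.from_bytes(hashlib.sha256(data).digest()[:8], "big")

# Stand-in IQ constellation mapping: the low c bits of a hash value -> one
# complex (I, Q) point on a square grid.
def map_symbol(value, c=6):
    half = 1 << (c // 2)
    i = value & (half - 1)
    q = (value >> (c // 2)) & (half - 1)
    return complex(i - half / 2 + 0.5, q - half / 2 + 0.5)
```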


Spinal Encoder

  • We wish to transmit a message M = m1 m2 ... mn

  • Break the message into k-bit segments Mi

  • Apply h to generate a spine: si = h(si-1, Mi), starting from the shared seed s0


Spinal Encoder

  • The encoder performs passes over the spine, each pass generating new constellation points

  • These constellation points are sent across an AWGN channel (the encoding flow is sketched below)
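
A minimal sketch of this encoding flow, written against the illustrative s0, h, and map_symbol from the earlier sketch. The segment size k, the number of passes, and the way each pass draws fresh bits from a spine value are simplifying assumptions rather than the real Cortex parameters:

```python
def spinal_encode(message_bits, s0, h, map_symbol, k=4, n_passes=3, c=6):
    """Sketch of the spinal encoder: build the spine from k-bit segments,
    then emit constellation points in repeated passes over the spine."""
    # Break the message into k-bit segments M_i.
    segments = [message_bits[i:i + k] for i in range(0, len(message_bits), k)]

    # Generate the spine: s_i = h(s_{i-1}, M_i), starting from the shared seed.
    spine, s = [], s0
    for seg in segments:
        s = h(s, int("".join(str(b) for b in seg), 2))
        spine.append(s)

    # Each pass over the spine produces new constellation points; here a pass
    # simply draws a different c-bit slice of each spine value (a simplification).
    symbols = []
    for p in range(n_passes):
        for s in spine:
            chunk = (s >> (p * c)) & ((1 << c) - 1)
            symbols.append(map_symbol(chunk, c))
    return symbols

# Example (with the illustrative s0, h, and map_symbol from the earlier sketch):
# symbols = spinal_encode([1, 0, 1, 1, 0, 0, 1, 0], s0, h, map_symbol)
```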


Spinal Decoder

  • The decoder knows s0, so it can generate the 2^k possible candidate symbols s1 using h

  • Each time the decoder receives a symbol y, it keeps the B best of the 2^k candidates using ML

  • The transmitted message is estimated as the candidate with the lowest ML cost (modeled in the sketch below)
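
A minimal software model of this beam search, assuming one received symbol per k-bit segment (a single pass) and the same illustrative h and map_symbol as above. The roles of B, k, and the ML cost follow the slide; everything else is simplified:

```python
def spinal_decode(received, s0, h, map_symbol, k=4, B=4, c=6):
    """Beam-search decoder sketch: expand each surviving candidate into 2^k
    children, score them against the received symbol, keep the B best."""
    # Each candidate is (accumulated ML cost, decoded bits so far, spine value).
    beam = [(0.0, [], s0)]
    for y in received:                       # one symbol per k-bit segment (single pass)
        expanded = []
        for cost, bits, s in beam:
            for m in range(1 << k):          # all 2^k possible next segments
                s_next = h(s, m)
                x = map_symbol(s_next & ((1 << c) - 1), c)
                d = abs(y - x) ** 2          # ML cost on an AWGN channel
                expanded.append((cost + d,
                                 bits + [int(b) for b in format(m, f"0{k}b")],
                                 s_next))
        beam = sorted(expanded, key=lambda t: t[0])[:B]   # keep the B best
    return beam[0][1]                        # bits of the lowest-cost candidate
```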



Objectives

  • Implement decoder on an FPGA

  • Evaluate feasibility of Cortex in a real communications system

  • Identify the key performance bottlenecks and develop a clear strategy for building a practical Cortex system


Micro-architecture

  • Interface

    • Takes stream of constellation symbols as input

    • Outputs a message (192-bit packet)

  • Decoding Stages

    • Code Enumeration

    • Add-Compare-Select

    • Suggestion Update

    • Spine Evaluator Update

    • Get output message
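
The interface and the ordering of the stages above can be summarized as a small software skeleton. The method names mirror the listed stages, while the bodies are placeholders (the real decoder is written in Bluespec):

```python
class DecoderModel:
    """Software skeleton mirroring the decoder interface and stages
    (illustrative only; stage bodies are placeholders)."""

    PACKET_BITS = 192

    def __init__(self):
        self.symbols = []

    def put(self, symbol):
        """Interface: accept one constellation symbol from the input stream."""
        self.symbols.append(symbol)

    def step(self):
        """One decoding iteration through the listed stages."""
        candidates = self.code_enumeration()
        survivors = self.add_compare_select(candidates)
        self.suggestion_update(survivors)
        self.spine_evaluator_update(survivors)

    def get(self):
        """Interface: return the decoded 192-bit packet."""
        return self.get_output_message()

    # Stage placeholders.
    def code_enumeration(self): ...
    def add_compare_select(self, candidates): ...
    def suggestion_update(self, survivors): ...
    def spine_evaluator_update(self, survivors): ...
    def get_output_message(self): ...
```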


Decoder

[Figure: mkDecoder block diagram. Constellation symbols enter via rcv (put) and the decoded message leaves via out_msg (get); internal rules (doEnumerate, doACS, suggupd, evalupd, getOutMsg) connect the Sorting module, the Spine Evaluator (mkSalsa h(*), Symbol Mapper f(*), updateTree, getBestMsgs), the Puncturing Scheduler (schedule and seeding parameters), and backtrackMem, passing Vect(B*2^k, ...) and Vect(B, ...) candidate vectors between them.]


Micro-architecture

  • Sub-modules

    • Puncturing Scheduler

    • Spine Evaluator

    • Sorter

    • Backtrack Memory
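
For reference, the Sorter selects the B lowest-cost candidates from the B*2^k scored entries (as noted later in the performance analysis, via a naive O(n^2) approach). A software equivalent, with an assumed (cost, index) candidate format, might look like this:

```python
def select_best(candidates, B):
    """Naive O(n^2) selection of the B lowest-cost candidates, analogous in
    spirit to the hardware Sorter: one full scan per selected entry."""
    remaining = list(candidates)             # (cost, index) pairs, format assumed
    best = []
    for _ in range(min(B, len(remaining))):
        winner = min(remaining, key=lambda t: t[0])
        best.append(winner)
        remaining.remove(winner)
    return best

# Example: keep the B = 4 cheapest of B * 2^k = 32 scored candidates.
scored = [(float(31 - i), i) for i in range(32)]
print(select_best(scored, B=4))              # [(0.0, 31), (1.0, 30), (2.0, 29), (3.0, 28)]
```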


Decoder

[Figure: mkDecoder block diagram, repeated from the previous Decoder slide to highlight the sub-modules.]


Practical Salsa Implementation

  • In practice we cannot use infinite-precision floating-point numbers

  • Salsa produces two outputs: a 64-bit spine value and a 512-bit array of symbol bits
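
A rough sketch of one fixed-precision hash step under these constraints. SHA-512 stands in for Salsa purely so the example is self-contained, and the exact way the 64-bit spine value and the 512 symbol bits are carved out of the digest is an assumption:

```python
import hashlib

def hash_step(spine, segment):
    """One fixed-precision hash invocation (sketch). SHA-512 is only a
    stand-in for Salsa; the 64-bit / 512-bit split is an assumption."""
    data = spine.to_bytes(8, "big") + segment.to_bytes(4, "big")
    digest = hashlib.sha512(data).digest()           # 64 bytes = 512 bits
    next_spine = int.from_bytes(digest[:8], "big")   # 64-bit spine value
    symbol_bits = int.from_bytes(digest, "big")      # 512-bit array of symbol bits
    return next_spine, symbol_bits
```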


Development and Testing

  • Three-step development and testing plan

  • Critical to our success with three people under time constraints

    Step 1: Develop Decoder backbone with dummy Sorter and Spine Evaluator. Develop Sorter and Spine Evaluator independently.

    - Sorter tested with MATLAB.

    - Spine Evaluator (and Salsa) tested with Python.


Development and Testing

Step 2: Integrate Decoder with Sorter and Spine Evaluator. Ensure correctness at the architectural level:

- Modules instantiate correctly

- Rules fire as expected, no deadlocks etc.

- Timing is correct

- Bits flowing end-to-end


Development and Testing

Step 3: Ensure correctness at the semantic level, i.e. “bit-by-bit debugging”

- Encode a string with the Python encoder to produce symbols

- Decode the symbols with both decoders and compare the results

[Figure: Testing setup. The Python Encoder output ("in") passes through an AWGN Channel; the received symbols are decoded by both the Python Decoder and the Bluespec Decoder, and the two outputs ("out") are compared.]
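
In script form, the comparison amounts to the loop below. The encode and decode callables are assumed wrappers around the Python reference code and the Bluespec testbench, and the noise level is arbitrary:

```python
import random

def awgn(symbols, sigma):
    """Add Gaussian noise to the I and Q components (simple channel model)."""
    return [s + complex(random.gauss(0, sigma), random.gauss(0, sigma))
            for s in symbols]

def compare_decoders(message_bits, encode, ref_decode, dut_decode, sigma=0.1):
    """Encode once, pass the symbols through the channel, and check that the
    reference decoder and the device-under-test decoder agree bit-by-bit.
    The encode/decode callables are assumed wrappers around the Python
    reference code and the Bluespec testbench."""
    symbols = awgn(encode(message_bits), sigma)
    ref = ref_decode(symbols)
    dut = dut_decode(symbols)
    mismatches = [i for i, (a, b) in enumerate(zip(ref, dut)) if a != b]
    assert not mismatches, f"decoders disagree at bit positions {mismatches}"
```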


Development and Testing

  • Finally, the algorithm was tested by adding noise to the transmitted symbols

  • Strictly speaking this was not our concern, as long as our implementation agreed with the reference code

  • Algorithm worked very well

  • Actually “outdid” the reference code at one point: the Python code crashed but our decoder correctly decoded the message!


Performance Analysis – FPGA frequency

  • The maximum frequency of the synthesized design is 98.035 MHz.

  • Different Salsa implementations give the same maximum FPGA frequency.



Performance Analysis - Area

  • The Sorter and Spine Evaluator take the most area


Performance Analysis - Area

  • Our implementation fits on the FPGA, taking roughly 30% of the total area.

  • Different Salsa implementations do not vary much in device utilization.


Performance Analysis - Code

  • The source code totals 3104 lines: 1135 lines (36.5%) of test code and 1969 lines (63.4%) of non-test code.


How much better can we do?

  • We used a naive O(n^2) algorithm for the sorter module. A better sorting algorithm could reduce the cycle count from 149 to 32 in the best case, giving roughly 5x better performance and improving the bit rate to 7.5 Mbit/s.

  • Given the current area requirement of Salsa, we can run B (B = 4) separate hashing modules in parallel. This gives 4x better performance and improves the bit rate to 7.5 * 4 = 30 Mbit/s.

  • With sufficient area on the FPGA, we could run B * 2^k = 32 hash modules in parallel. This would bring 32x better performance and improve the bit rate to 7.5 * 32 = 240 Mbit/s.
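
These projections follow from simple scaling of the stated figures; the short calculation below makes the chain explicit (the 1.5 Mbit/s baseline is the value implied by the 5x improvement reaching 7.5 Mbit/s):

```python
# Projected throughput from the improvements described above.
baseline_mbps = 7.5 / 5                       # implied current bit rate (Mbit/s)

faster_sorter = baseline_mbps * (149 / 32)    # ~4.7x; the slide rounds to 5x (7.5 Mbit/s)
parallel_b = 7.5 * 4                          # B = 4 Salsa modules in parallel
parallel_full = 7.5 * 32                      # B * 2^k = 32 Salsa modules in parallel

print(f"baseline:           {baseline_mbps:.1f} Mbit/s")
print(f"faster sorter:      {faster_sorter:.1f} Mbit/s")
print(f"4 parallel hashes:  {parallel_b:.1f} Mbit/s")
print(f"32 parallel hashes: {parallel_full:.1f} Mbit/s")
```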

