
Rateless Wireless Networking Decoder


Presentation Transcript


  1. Rateless Wireless Networking Decoder Mikhail Volkov Edison Achelengwa Minjie Chen

  2. Cortex: a rateless wireless system • Very recent work here at CSAIL (Perry, 2011) • Uses a novel rateless code called a spinal code • The encoder and decoder agree on a seed s0, a hash function h, and an IQ constellation mapping

  3. Spinal Encoder • Wish to transmit a message M = m1m2... mn • Break the message into k-bit segments Mi • Apply h to generate a spine

  4. Spinal Encoder • Encoder performs passes over the spine, each time generating new constellation points • These constellation points are sent across an AWGN channel
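The spine construction from slides 3–4 can be sketched in Python. The real system uses a Salsa-based hash for h and an IQ constellation mapping f; the SHA-256 stand-in and the simple real-valued mapping below are illustrative assumptions only:

```python
import hashlib

def h(spine, segment):
    # Illustrative stand-in for the spinal hash h (the real design uses a
    # Salsa-based hash): mix the previous spine value with a k-bit segment.
    data = spine.to_bytes(8, "big") + segment.to_bytes(1, "big")
    return int.from_bytes(hashlib.sha256(data).digest()[:8], "big")

def make_spine(message_bits, k, s0):
    # Break the message M into k-bit segments Mi and chain h over them.
    spine, prev = [], s0
    for i in range(0, len(message_bits), k):
        prev = h(prev, int(message_bits[i:i + k], 2))
        spine.append(prev)
    return spine

def encode_pass(spine):
    # One encoder pass over the spine: map each spine value to a symbol.
    # A toy real-valued mapping, not the actual IQ constellation mapping f.
    return [(s % 10**6) / 10**6 for s in spine]
```

Each additional pass would draw fresh symbols from the same spine, which is what makes the code rateless: the encoder can keep producing symbols until the decoder succeeds.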

  5. Spinal Decoder • Decoder knows s0 so it can generate the 2^k possible candidate symbols s1 using h • Each time the decoder receives a symbol y, it keeps the B best candidates from the 2^k expansions using a maximum-likelihood (ML) metric • The transmitted message is estimated as the candidate with the lowest ML cost
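The decoding loop on slide 5 is a beam search (M-algorithm): expand every kept candidate by all 2^k segment values, score against the received symbol, and keep the B best. A minimal sketch, again substituting SHA-256 for the Salsa-based hash and a toy mapping f, both assumptions for illustration:

```python
import hashlib

def h(spine, segment):
    # Illustrative stand-in for the spinal hash h (assumption; the real
    # design uses a Salsa-based hash).
    data = spine.to_bytes(8, "big") + segment.to_bytes(1, "big")
    return int.from_bytes(hashlib.sha256(data).digest()[:8], "big")

def decode(symbols, k, s0, B, f):
    # Beam-search spinal decoding: keep the B lowest-cost candidates.
    beam = [(0.0, s0, [])]  # (ML cost, spine value, decoded segments)
    for y in symbols:
        candidates = []
        for cost, spine, segs in beam:
            for m in range(2 ** k):              # all 2^k extensions
                s_next = h(spine, m)
                c = cost + (y - f(s_next)) ** 2  # squared-error ML metric (AWGN)
                candidates.append((c, s_next, segs + [m]))
        beam = sorted(candidates, key=lambda t: t[0])[:B]
    return beam[0][2]  # message estimate: path with the lowest ML cost
```

Over a noiseless channel the correct path keeps cost exactly zero at every step, so even a small beam width B recovers the message.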

  6. Spinal Decoder

  7. Objectives • Implement decoder on an FPGA • Evaluate feasibility of Cortex in a real communications system • Identify key performance bottleneck and develop a clear strategy for developing a practical Cortex system

  8. Micro-architecture • Interface • Takes stream of constellation symbols as input • Outputs a message (192-bit packet) • Decoding Stages • Code Enumeration • Add-Compare-Select • Suggestion Update • Spine Evaluator Update • Get output message

  9. Decoder [Block diagram: mkDecoder micro-architecture. An input symbol stream (rcv/put) feeds queues (toACSQ, updateSymQ, outbitsQ) through the doEnumerate/doACS rules; a Sorting module selects the B best of B*2^k MarkedCost candidates; a Spine Evaluator (mkSalsa h(*), Symbol Mapper f(*)), Puncturing Scheduler, and backtrackMem support the pipeline; the decoded 192-bit message is read via getOutMsg (get). Configured by schedule params and seeding parameters.]

  10. Micro-architecture • Sub-modules • Puncturing Scheduler • Spine Evaluator • Sorter • Backtrack Memory
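The Sorter sub-module's job is to select the B lowest-cost candidates each step. Its behavior can be illustrated with a naive O(n^2) selection in the spirit of the simple sorter used here; the function below is a sketch, not the Bluespec implementation:

```python
def keep_best(costs, B):
    # Naive O(n^2) selection of the B smallest costs: repeatedly scan the
    # remaining list for its minimum, as a simple hardware sorter might.
    remaining = list(costs)
    best = []
    for _ in range(min(B, len(remaining))):
        i = min(range(len(remaining)), key=lambda j: remaining[j])
        best.append(remaining.pop(i))
    return best
```

For B much smaller than the candidate count, a partial selection like this avoids fully sorting all B*2^k candidates, though it is still quadratic in the worst case.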

  11. Decoder [Block diagram repeated from slide 9, highlighting the sub-modules listed above: Puncturing Scheduler, Spine Evaluator, Sorting module, and backtrackMem.]

  12. Practical Salsa Implementation • In practice we cannot have infinite-precision floating-point numbers • Salsa produces two outputs: a 64-bit spine value and a 512-bit array of symbol bits
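Since infinite-precision floats are unavailable in hardware, real values such as ML costs and constellation coordinates must be quantized to fixed point. A minimal sketch with assumed widths (12-bit signed with 8 fractional bits; these parameters are illustrative, not the widths used in the actual design):

```python
def to_fixed(x, total_bits=12, frac_bits=8):
    # Quantize a real value to signed fixed point with saturation.
    # The widths here are illustrative assumptions.
    v = int(round(x * (1 << frac_bits)))
    lo = -(1 << (total_bits - 1))
    hi = (1 << (total_bits - 1)) - 1
    return max(lo, min(hi, v))

def from_fixed(v, frac_bits=8):
    # Convert a fixed-point value back to a real number.
    return v / (1 << frac_bits)
```

Saturation on overflow (rather than wrap-around) keeps a single out-of-range cost from corrupting the beam-search ordering.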

  13. Development and Testing • Three-step development and testing plan • Critical to our success with three people under time constraints Step 1: Develop the Decoder backbone with a dummy Sorter and Spine Evaluator. Develop the Sorter and Spine Evaluator independently. - Sorter tested with MATLAB. - Spine Evaluator (and Salsa) tested with Python.

  14. Development and Testing Step 2: Integrate Decoder with Sorter and Spine Evaluator. Ensure correctness at the architectural level: - Modules instantiate correctly - Rules fire as expected, no deadlocks etc. - Timing is correct - Bits flowing end-to-end

  15. Development and Testing Step 3: Ensure correctness at the semantic level, i.e. “bit-by-bit debugging” - Encode a string with the Python encoder to produce symbols - Decode the symbols and compare the results [Diagram: Python Encoder → AWGN Channel → Python Decoder and Bluespec Decoder, with the two outputs compared]

  16. Development and Testing • Finally, the algorithm was tested by adding noise to the transmitted symbols • Strictly speaking this was not our concern, as long as our implementation agreed with the reference code • The algorithm worked very well • It actually “outdid” the reference code at one point: the Python code crashed, but our decoder correctly decoded the message!

  17. Performance Analysis – FPGA frequency • The synthesized FPGA maximum frequency is 98.035 MHz • Different Salsa implementations give the same FPGA frequency

  18. Performance Analysis – Frequency, Latency, Throughput

  19. Performance Analysis - Area • Sorter and SpineEvaluator take the most area

  20. Performance Analysis - Area • Our implementation actually fits on the FPGA, taking roughly 30% of the total area • Different Salsa implementations do not vary much in device utilization

  21. Performance Analysis - Code • The source code totals 3104 lines: 1135 lines (36.6%) of test code and 1969 lines (63.4%) of non-test code.

  22. How much better can we do? • We used a naive O(n^2) algorithm for the sorter module. A better sorting algorithm could reduce the cycle count from 149 to 32 in the best case, roughly a 5× speedup, improving the bit rate to 7.5 Mbit/s. • Given the current area requirement of Salsa, we can run B = 4 separate hashing modules in parallel. This gives a further 4× speedup, improving the bit rate to 7.5 × 4 = 30 Mbit/s. • With sufficient area on the FPGA, we could run B · 2^k = 32 hash modules in parallel. This would bring a 32× speedup, improving the bit rate to 7.5 × 32 = 240 Mbit/s.
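The scaling arithmetic above can be checked directly. The 1.5 Mbit/s baseline is inferred from the slide's own figures (7.5 / 5), and the quoted 5× factor is a rounding of 149/32 ≈ 4.66:

```python
base_cycles, best_cycles = 149, 32
speedup = base_cycles / best_cycles   # ≈ 4.66, rounded to ~5x on the slide
base_rate = 1.5e6                     # inferred baseline bit rate, bit/s
sorter_rate = 5 * base_rate           # 7.5 Mbit/s with a faster sorter
parallel4 = 4 * sorter_rate           # 30 Mbit/s with B = 4 hash modules
parallel32 = 32 * sorter_rate         # 240 Mbit/s with B * 2^k = 32 modules
```

These figures are best-case projections: they assume the sorter and hash modules are the only bottlenecks and that parallel hash modules scale linearly.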
