An automated pipeline balancing in the SRC Reconfigurable Computer and its application to the RC5 cipher breaking


Hatim Diab¹, Miaoqing Huang¹, Kris Gaj², Tarek El-Ghazawi¹, Nikitas Alexandridis¹
¹The George Washington University
²George Mason University

**Objectives**
• Implement a pipelined RC5 key breaker on a single chip.
• Demonstrate automatic balancing of a pipeline by a compiler (SRC).
• Show the cost of the added pipeline.

1011/MAPLD'04

**Requirements**
• Given:
  • A matching pair of plaintext message (M) and ciphertext (C)
• Find the corresponding secret key
  • Test the possible secret keys exhaustively
  • Keys are 128 bits long, ranging from all 0's to all 1's
• Requirements:
  • The processing element (PE) is fed a new secret key (Ki) each cycle
  • Compare C with the output Ci corresponding to Ki

**RC5 Algorithm**
• Mixing in the secret key.

    i=j=0
    A=B=0
    do 3*max(26,4) times          // S[0..25] is the array to be mixed for RC5 encryption
      A=S[i]=(S[i]+A+B)<<<3;      // L[0..3] is the array converted from the secret key K[0..15]
      B=L[j]=(L[j]+A+B)<<<(A+B);
      i=(i+1) mod (26);           // The output is the array S[0..25], which will be used to encrypt
      j=(j+1) mod (4);            // the plaintext.

• Encryption.

    LE=A+S[0];                    // A is the upper part of the plaintext
    RE=B+S[1];                    // B is the lower part of the plaintext
    for i=1 to 12 do
      LE=((LE⊕RE)<<<RE)+S[2*i];
      RE=((RE⊕LE)<<<LE)+S[2*i+1];

The processed LE is the upper part of the ciphertext; the processed RE is the lower part.
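The pseudocode on the RC5 Algorithm slide can be made runnable as a small software reference, which is useful for checking hardware results. The following is a minimal sketch for RC5-32/12/16, not the authors' implementation. Two details are assumptions taken from Rivest's published RC5 description rather than from the slides: the initial fill of S with the constants P32 and Q32, and the little-endian conversion of the key bytes K[0..15] into L[0..3]. The function names (`rotl`, `expand_key`, `encrypt_block`, `find_key`) are illustrative.

```python
MASK = 0xFFFFFFFF                       # 32-bit word arithmetic
P32, Q32 = 0xB7E15163, 0x9E3779B9       # RC5 magic constants (from the RC5 spec, not the slides)
ROUNDS = 12
T = 2 * (ROUNDS + 1)                    # 26 subkey words S[0..25]
C = 4                                   # 16 key bytes -> 4 words L[0..3]


def rotl(x, n):
    """Rotate the 32-bit word x left by n; only the low 5 bits of n are used."""
    n &= 31
    return ((x << n) | (x >> (32 - n))) & MASK


def expand_key(key):
    """Mix a 16-byte secret key into the subkey array S[0..25]."""
    assert len(key) == 16
    # Assumption from the RC5 spec: key bytes are packed little-endian into L.
    L = [int.from_bytes(key[4 * i:4 * i + 4], "little") for i in range(C)]
    # Assumption from the RC5 spec: S starts as an arithmetic progression.
    S = [(P32 + i * Q32) & MASK for i in range(T)]
    A = B = i = j = 0
    for _ in range(3 * max(T, C)):      # 78 mixing steps, as on the slide
        A = S[i] = rotl((S[i] + A + B) & MASK, 3)
        B = L[j] = rotl((L[j] + A + B) & MASK, A + B)
        i = (i + 1) % T
        j = (j + 1) % C
    return S


def encrypt_block(S, left, right):
    """Encrypt one 64-bit block given as two 32-bit words (left, right)."""
    le = (left + S[0]) & MASK
    re = (right + S[1]) & MASK
    for i in range(1, ROUNDS + 1):
        le = (rotl(le ^ re, re) + S[2 * i]) & MASK
        re = (rotl(re ^ le, le) + S[2 * i + 1]) & MASK
    return le, re


def find_key(candidates, plain, cipher):
    """Exhaustive search: return the first candidate key mapping plain to cipher."""
    for key in candidates:
        if encrypt_block(expand_key(key), *plain) == cipher:
            return key
    return None
```

In software, `find_key` tests one key at a time, each costing 78 mixing steps plus 12 rounds; the hardware design instead keeps all of these steps busy at once so that a new key enters every clock cycle.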
**Key-Breaking Flowchart** (figure)

**Condition & Implementation**
• RC5 32/12/16
  • Ciphertext block: 2 × 32 bits = 64 bits
  • 12 rounds
  • Key: 16 × 8 bits = 128 bits
• Implement RC5 encryption using
  • 12 rounds of encryption macros, with 6 clocks latency
  • 78 iterations of key generation macros, with 3 clocks latency

**Design & Bottleneck**
• Pipelined design
  • Process one key every clock cycle in a pipelined fashion
• Data dependencies
  • RC5 makes extensive use of data-dependent rotations
  • S value needed every 26th step
  • L value needed every 4th step
• Manual HDL-based realization of the pipeline proved to be time-consuming and error-prone.

**Data Dependencies in Each Iteration** (figure)

**Solution**
• Implement on one FPGA chip concurrently
  • 78 key initialization macros
  • 12 encryption macros
• Connect the macros in a linear pipeline.
• The SRC compiler balances the pipeline by inserting delay channels so that all macros run synchronously.

**Delay Channels Added by SRC Compiler** (figure; legend: Delay 1 = 1 reg, Delay 2 = 2 reg, Delay 5 = 5 reg, wire)

**Detailed flow** (figure)

**Compilation Result**
• Device utilization summary:

| Resource | Used | Available | Utilization |
|---|---|---|---|
| External IOBs | 594 | 1104 | 53% |
| LOCed External IOBs | 594 | 594 | 100% |
| Slices | 33790 | 33792 | 99% |
| BUFGMUXs | 1 | 16 | 6% |

• Maximum Clock Frequency

**Effectiveness of the Benchmark** (figure)

**Conclusion**
• The objective was realized: one 128-bit variable is pushed into the processing chain every clock cycle.
• A speed-up of 1000x over software and 300x over serial hardware implementations was achieved.
• Because the parameters of the RC5 algorithm are flexible, different map routines can be designed to fit distinct area and throughput requirements.
• The automated pipeline balancing of the SRC compiler proved to substantially decrease the development time of complex pipelined designs.
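The delay-channel balancing described on the Solution slide can be illustrated with a small scheduling sketch. This is not the SRC compiler's algorithm, which the slides do not describe; it is a generic longest-path approach under stated assumptions. Each macro starts once its latest input arrives, and every earlier input is padded with registers until it arrives in the same cycle. The function `balance`, the node names `mix0`..`mix5`, and the reading of "3 clocks latency" as a per-macro figure are all hypothetical.

```python
from collections import defaultdict


def balance(latency, edges):
    """Compute a balanced schedule for a pipeline graph.

    latency: dict mapping node name -> latency in clocks
    edges:   list of (src, dst) data connections forming a DAG
    Returns (start, delays): the start cycle of each node, and the number
    of delay registers to insert on each edge.
    """
    preds = defaultdict(list)
    for src, dst in edges:
        preds[dst].append(src)

    start = {}

    def start_of(node):
        # A node can start only when its slowest input has arrived.
        if node not in start:
            start[node] = max(
                (start_of(p) + latency[p] for p in preds[node]), default=0
            )
        return start[node]

    for node in latency:
        start_of(node)

    # Pad every edge so its data arrives exactly when the consumer starts.
    delays = {(s, d): start[d] - (start[s] + latency[s]) for s, d in edges}
    return start, delays


# Toy example: six key-mixing macros in a chain, each assumed to take 3 clocks.
# The L word updated at step k is needed again at step k+4, so that value
# must be carried past three intermediate macros.
latency = {f"mix{k}": 3 for k in range(6)}
edges = [(f"mix{k}", f"mix{k + 1}") for k in range(5)]
edges += [(f"mix{k}", f"mix{k + 4}") for k in range(2)]

start, delays = balance(latency, edges)
for edge, regs in delays.items():
    print(edge, "needs", regs, "delay register(s)")
```

In this toy graph the chain edges need no padding, while each skip edge needs registers equal to the latency of the macros it bypasses. Writing these delays by hand for 78 + 12 macros is the step the slides call time-consuming and error-prone.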