
AES Microcode Implementation In IXP2400 And A study of Reconfigurable Crypto Unit

AES Microcode Implementation In IXP2400 And A study of Reconfigurable Crypto Unit. Piyush Ranjan Satapathy CS203B Class Project Presentation. Road Map. AES Algorithm Overview IXP2400 Platform: A Quick Look Microcode: Overview Implementation of AES Experimental Results


Presentation Transcript


  1. AES Microcode Implementation In IXP2400 And A study of Reconfigurable Crypto Unit • Piyush Ranjan Satapathy • CS203B Class Project Presentation

  2. Road Map • AES Algorithm Overview • IXP2400 Platform: A Quick Look • Microcode: Overview • Implementation of AES • Experimental Results • Reconfigurable Crypto unit of Intel IXP2850

  3. Algorithm Overview • Designed by Daemen and Rijmen and selected by NIST as the AES • Originally called Rijndael • Symmetric-key block cipher built from substitution and permutation steps • Replacement for DES • Successful field testing since inception • Three key-length modes (128, 192, or 256 bits) • State defined as a 4x4 array of 16 bytes • Key size is 16, 24, or 32 bytes • Each byte is treated as a polynomial with coefficients in GF(2), i.e. an element of GF(2^8)

  4. Stages of AES Algorithm • [Figure: detailed view of round n. The result from round n-1 passes through ByteSub, ShiftRow, MixColumn, and AddRoundKey (with round key Kn), then on to round n+1] • Each round performs the following operations: • Non-linear Layer (ByteSub): no linear relationship between the input and output of a round • Linear Mixing Layer (ShiftRow, MixColumn): guarantees high diffusion over multiple rounds, leaving very small correlation between bytes of the round input and bytes of the output • Key Addition Layer (AddRoundKey): bytes of the input are simply XORed with the expanded round key

  5. 1. SubBytes Function • Multiplicative inverse in GF(2^8) followed by an affine transformation over GF(2) • Direct implementation is complex • Easily performed by a 16 x 16 (256-entry) LUT ROM • Simple byte substitution • Combinational logic • Each byte at the input of a round undergoes a non-linear byte substitution through the substitution box (S-box)
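The 16 x 16 lookup table mentioned on this slide can be derived rather than typed in. The following is a minimal Python sketch (not the presenter's microcode) that builds the S-box from the field inverse and the affine step; the helper names `gf_mul`, `gf_inv`, `affine`, and `sub_bytes` are illustrative assumptions.

```python
def gf_mul(a, b):
    """Multiply two bytes in GF(2^8) modulo x^8 + x^4 + x^3 + x + 1."""
    result = 0
    for _ in range(8):
        if b & 1:
            result ^= a
        carry = a & 0x80
        a = (a << 1) & 0xFF
        if carry:
            a ^= 0x1B  # reduce by the AES polynomial
        b >>= 1
    return result


def gf_inv(a):
    """Multiplicative inverse in GF(2^8); 0 is mapped to 0 by convention."""
    if a == 0:
        return 0
    result = 1
    for _ in range(254):  # a^254 equals a^-1 because a^255 == 1
        result = gf_mul(result, a)
    return result


def affine(x):
    """Affine transformation over GF(2) with the constant 0x63."""
    def rotl(v, n):
        return ((v << n) | (v >> (8 - n))) & 0xFF
    return x ^ rotl(x, 1) ^ rotl(x, 2) ^ rotl(x, 3) ^ rotl(x, 4) ^ 0x63


# 256-entry table; entry (row << 4) | col matches the 16 x 16 layout.
SBOX = [affine(gf_inv(v)) for v in range(256)]


def sub_bytes(state):
    """Apply the S-box to every byte of a 16-byte state."""
    return bytes(SBOX[b] for b in state)
```

Building the table once and then indexing it is what makes the LUT approach cheap at run time: each SubBytes call becomes 16 memory lookups.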

  6. 2. Shift Row • Shifting done only on the bottom three rows of the State • Left rotate for encryption • Right rotate for decryption • Depending on the block length, each row of the block is cyclically shifted by a fixed offset [table not preserved in transcript]; for the 128-bit AES block, rows 1, 2, and 3 rotate by 1, 2, and 3 bytes
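The row rotation described on this slide can be sketched in a few lines of Python. This assumes the 128-bit AES block (offsets 0, 1, 2, 3) and represents the state as a list of four rows; the function name `shift_rows` and its `decrypt` flag are illustrative, not taken from the presentation.

```python
def shift_rows(state, decrypt=False):
    """Rotate row r of a 4x4 state by r bytes: left to encrypt, right to decrypt."""
    out = []
    for r, row in enumerate(state):
        k = (-r if decrypt else r) % 4  # a right rotate by r is a left rotate by 4 - r
        out.append(row[k:] + row[:k])
    return out


state = [
    [0, 1, 2, 3],      # row 0: unchanged
    [4, 5, 6, 7],      # row 1: rotate by 1
    [8, 9, 10, 11],    # row 2: rotate by 2
    [12, 13, 14, 15],  # row 3: rotate by 3
]
shifted = shift_rows(state)
restored = shift_rows(shifted, decrypt=True)
```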

  7. 3. MixColumns Function • Matrix multiplication in GF(2^8) • MixColumns functionality resides primarily in the controller and instruction memory • A series of conditional XOR and left shift operations • Each column is multiplied by the fixed polynomial c(x) = '03' x^3 + '01' x^2 + '01' x + '02', modulo x^4 + 1 • This corresponds to the matrix multiplication b(x) = c(x) ⊗ a(x) [matrix figure not preserved in transcript]
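The "conditional XOR and left shift" wording on this slide refers to multiplying by '02' in GF(2^8). A minimal Python sketch of one column, assuming the standard AES reduction polynomial 0x11B, could look as follows; `xtime` and `mix_column` are illustrative names.

```python
def xtime(a):
    """Multiply a byte by '02' in GF(2^8): left shift, then conditional XOR."""
    a <<= 1
    if a & 0x100:      # the shifted-out bit means the result needs reduction
        a ^= 0x11B
    return a


def mul3(a):
    """Multiply by '03' as ('02' * a) XOR a."""
    return xtime(a) ^ a


def mix_column(col):
    """Multiply one 4-byte column by the circulant matrix of c(x)."""
    a0, a1, a2, a3 = col
    return [
        xtime(a0) ^ mul3(a1) ^ a2 ^ a3,
        a0 ^ xtime(a1) ^ mul3(a2) ^ a3,
        a0 ^ a1 ^ xtime(a2) ^ mul3(a3),
        mul3(a0) ^ a1 ^ a2 ^ xtime(a3),
    ]
```

Each output byte uses only shifts and XORs, which is why the operation maps well onto a simple ALU instruction sequence.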

  8. 4. Key Expansion and Addition • Key expansion is performed before both the encrypt and decrypt process • Byte values from the Key are read and manipulated into the RoundKey • A series of SubBytes and XOR operations with RCON ROM values and the Key • Key addition performs an XOR between the State and the RoundKey • This is the only function that needs no separate inverse, since XOR undoes itself • Each word is simply XORed with the expanded round key
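As a rough illustration of the key schedule and key addition steps, here is a Python sketch limited to the 128-bit key case (11 round keys). It rebuilds the S-box locally so the block is self-contained; all function names are illustrative assumptions and none of this reflects the presenter's microcode.

```python
def _gf_mul(a, b):
    """Multiply two bytes in GF(2^8) modulo the AES polynomial."""
    result = 0
    for _ in range(8):
        if b & 1:
            result ^= a
        carry = a & 0x80
        a = (a << 1) & 0xFF
        if carry:
            a ^= 0x1B
        b >>= 1
    return result


def _sbox_entry(v):
    """S-box value: field inverse (0 stays 0) followed by the affine step."""
    inv = 0
    if v:
        inv = 1
        for _ in range(254):
            inv = _gf_mul(inv, v)
    rotl = lambda x, n: ((x << n) | (x >> (8 - n))) & 0xFF
    return inv ^ rotl(inv, 1) ^ rotl(inv, 2) ^ rotl(inv, 3) ^ rotl(inv, 4) ^ 0x63


SBOX = [_sbox_entry(v) for v in range(256)]
RCON = [0x01, 0x02, 0x04, 0x08, 0x10, 0x20, 0x40, 0x80, 0x1B, 0x36]


def expand_key_128(key):
    """Expand a 16-byte key into 11 round keys of 16 bytes each."""
    if len(key) != 16:
        raise ValueError("this sketch handles 128-bit keys only")
    words = [list(key[4 * i:4 * i + 4]) for i in range(4)]
    for i in range(4, 44):
        temp = list(words[i - 1])
        if i % 4 == 0:
            temp = temp[1:] + temp[:1]          # rotate the word by one byte
            temp = [SBOX[b] for b in temp]      # SubBytes on each byte
            temp[0] ^= RCON[i // 4 - 1]         # XOR with the round constant
        words.append([w ^ t for w, t in zip(words[i - 4], temp)])
    return [bytes(sum(words[4 * r:4 * r + 4], [])) for r in range(11)]


def add_round_key(state, round_key):
    """XOR the state with a round key; applying it twice restores the state."""
    return bytes(s ^ k for s, k in zip(state, round_key))
```

Because `add_round_key` is a plain XOR, the same routine serves both encryption and decryption; only the order of the round keys changes.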

  9. IXP2400 Platform: A Quick Look • Achieves high processing performance • Programming flexibility • Cheaper than an ASIC

  10. Microcode Overview • alu [ dest1, a, +, b] → ALU addition of a and b, result stored in dest1 • alu [ dest2, dest1, -, c] → ALU subtraction • Move(reg1, reg2) → Move from reg1 to reg2; both are GPRs • Immed[reg, 0x0020] → Immediate value assignment to register • local_csr_wr[ACTIVE_LM_ADDR_0, 0x0] → Local memory indexing with index 0 • .begin … endm → Macro begin and end • .if … .endif → If conditional • xbuf_alloc ($$state, 4, read) → Buffer allocation in DRAM transfer registers • .reg gen_register $sram_reg $$dram_reg → Register declaration • .sig sram_sig dram_sig → Signal declaration • .while … .endw → While loop • #for round[1,2,3,4,5,6,7,8,9,10] … #endloop → For loop • alu_shf[index, --, B, s0, >>24] → ALU shift: passes s0 shifted right by 24 bits into index • scratch[read, $T, index, 0, 1], ctx_swap[sram_sig] → Scratch read instruction • ld_field_w_clr[t1, 1000, $T] → Loads the byte of $T selected by mask 1000 into t1 and clears the remaining bytes • dram[write, $$out[0], dst_addr, 0, 2], sig_done[dram_sig] → DRAM write • ctx_arb[dram_sig], ctx_arb[kill] → Wait on a signal; kill the context

  11. Implementation Setup • Environmental Setup: • Intel IXP 4.1 • 600-MHz ME configuration • 200-MHz SRAMs • 150-MHz RDRAMs • Executed with multiple threads • Executed on different microengines (MEs)

  12. Experimental Results (1) • [Figures: SRAM utilization; ME utilization (%)]

  13. Experimental Results (2) • [Figures: two panels, both titled "Throughput Performance Across Threads in 1 ME"]

  14. Crypto Unit of IXP2850

  15. Intel IXP2850 Encryption Data Flow

  16. Crypto Unit Overview

  17. Simple Encrypt Example

  18. Simple Encrypt and Hash Example

  19. 3DES Core • 2 cores per crypto unit • Takes a 192-bit key: (56-bit key + 8-bit parity) x 3 keys • Operates on 8-byte blocks • Result is written to ME transfer registers or a TBUF element • Result can be passed to the SHA-1 unit for hashing • [Figure: Security processing, pipelining, and interleaving using three wires and one core; multiple keys and IVs]
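The 192-bit key arithmetic on this slide (three 64-bit DES keys, each carrying 56 key bits plus 8 parity bits) can be made concrete with a short Python sketch. It assumes the usual DES convention that the low bit of each byte is an odd-parity bit; the function names are illustrative and say nothing about how the IXP2850 hardware checks keys.

```python
def set_odd_parity(key):
    """Set the low bit of each byte so that every byte has an odd number of 1 bits."""
    out = bytearray()
    for b in key:
        seven = b & 0xFE                      # the 7 key bits of this byte
        if bin(seven).count("1") % 2 == 0:
            seven |= 1                        # parity bit makes the count odd
        out.append(seven)
    return bytes(out)


def has_odd_parity(key):
    """Check the odd-parity convention on every byte of a key."""
    return all(bin(b).count("1") % 2 == 1 for b in key)


def split_3des_key(key):
    """Split a 24-byte (192-bit) key into three 8-byte DES keys."""
    if len(key) != 24:
        raise ValueError("a 3DES key bundle is 24 bytes")
    return [key[i:i + 8] for i in (0, 8, 16)]


bundle = set_odd_parity(bytes(range(24)))
k1, k2, k3 = split_3des_key(bundle)
effective_bits = 3 * 8 * 7   # 168 key bits out of the 192 stored
```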

  20. AES Core • All AES key sizes are supported (128, 192, or 256 bits) • Both encryption and decryption supported • Operates on 16-byte blocks • [Figure: AES Key Scheduler]

  21. SHA-1 Core • 2 SHA-1 cores per crypto unit • Operates on 64-byte (512-bit) blocks • Data is loaded from Input RAM or Crypto cores into the SHA-1 buffer • Can operate on unmodified packet data or on the ciphered packet data • Has a data buffer to accumulate the ciphered data • This gives flexibility to run SHA-1 and AES or 3DES at different rates • [Figure: SHA-1 Critical Path Analysis]
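To show why a buffer is needed in front of a 64-byte-block hash, the sketch below pads a message the way the SHA-1 standard specifies and splits it into 512-bit blocks. It is a software illustration only; whether the crypto unit or the microcode performs the padding is not stated in the slides, and `sha1_pad` and `split_blocks` are hypothetical helper names.

```python
import hashlib

BLOCK_BYTES = 64  # SHA-1 processes 512-bit blocks


def sha1_pad(message):
    """Append 0x80, zero bytes, and the 64-bit big-endian bit length."""
    bit_len = len(message) * 8
    padded = message + b"\x80"
    padded += b"\x00" * ((56 - len(padded)) % BLOCK_BYTES)
    return padded + bit_len.to_bytes(8, "big")


def split_blocks(padded):
    """Cut padded data into the 64-byte blocks fed to the compression function."""
    return [padded[i:i + BLOCK_BYTES] for i in range(0, len(padded), BLOCK_BYTES)]


# A 16-byte AES output block fills only a quarter of one SHA-1 block,
# so four cipher blocks must accumulate before one hash block is ready.
cipher_blocks_per_hash_block = BLOCK_BYTES // 16
blocks = split_blocks(sha1_pad(b"abc"))
```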

  22. Some of The Crypto Commands • crypto_write_ram($$orig_plain_text[0],DATA_RAM_ADDR,8,ENCRYPT_UNIT, ram_sig) → Perform and wait for the write • crypto_load_iv($$iv[0], 1,ENCRYPT_UNIT,CRYPTO_BANK, ENCRYPT_STATE, iv_sig) → Loading IV data • crypto_load_key($$key[0],3,ENCRYPT_UNIT,CRYPTO_BANK,ENCRYPT_STATE,key_sig) → Loading key • crypto_cipher($$encrypt_data[0],DATA_RAM_ADDR,8,CRYPTO_CIPHER_ENCRYPT,CRYPTO_CIPHER_NO_CBC, CRYPTO_CIPHER_3DES, ENCRYPT_UNIT,CRYPTO_BANK, ENCRYPT_STATE, cipher_sig)

  23. Acknowledgement • Yan Luo • Chris Baron • http://cnscenter.future.co.kr/resource/rsc-center/presentation/intel/spring2003/S03USCPTS92_OS.pdf (for some slides) • Mel Tsai, UC Berkeley (for some slides) • Thomas Sodon et al, EE College of New Jersey • Zhangxi Tan et al, Tsinghua University

  24. Q……………?
