AES Microcode Implementation In IXP2400 And A study of Reconfigurable Crypto Unit

Piyush Ranjan Satapathy
CS203B Class Project Presentation

Road Map
• AES Algorithm Overview
• IXP2400 Platform: A Quick Look
• Microcode: Overview
• Implementation of AES
• Experimental Results
• Reconfigurable Crypto Unit of Intel IXP2850

Algorithm Overview
• Designed by Daemen and Rijmen for NIST
• Originally called Rijndael
• Symmetric-key block substitution cipher
• Replacement for DES
• Successful field testing since inception
• Three key-size modes (128, 192, or 256 bits)
• State defined as a 4x4 array of 16 bytes
• Key size is either 16, 24, or 32 bytes
• Each byte is treated as a polynomial over GF(2), i.e. an element of the Galois field GF(2^8)
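
The state layout and key sizes above can be sketched in a few lines of Python. This is only an illustration: the helper names (bytes_to_state, state_to_bytes, num_rounds) are made up for this note, and the column-by-column fill and the round counts (10, 12, 14) follow the AES standard rather than anything shown on the slide.

```python
def bytes_to_state(block):
    """Arrange 16 input bytes into a 4x4 state, filled column by column."""
    if len(block) != 16:
        raise ValueError("AES operates on 16-byte blocks")
    return [[block[r + 4 * c] for c in range(4)] for r in range(4)]


def state_to_bytes(state):
    """Flatten a 4x4 state back into 16 bytes (inverse of bytes_to_state)."""
    return bytes(state[r][c] for c in range(4) for r in range(4))


def num_rounds(key_len_bytes):
    """Number of rounds for a 16-, 24-, or 32-byte key (Nr = Nk + 6)."""
    nk = key_len_bytes // 4          # key length in 32-bit words
    if key_len_bytes % 4 or nk not in (4, 6, 8):
        raise ValueError("key must be 16, 24, or 32 bytes")
    return nk + 6


block = bytes(range(16))
state = bytes_to_state(block)
# Row 1 of the state holds bytes 1, 5, 9, 13 of the input block.
print(state[1], num_rounds(16), num_rounds(24), num_rounds(32))
```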

Stages of AES Algorithm
[Figure: detailed view of round n. The result from round n-1 passes through ByteSub, Shift Row, MixColumn, and AddRoundKey (with round key Kn), then on to round n+1.]
Each round performs the following operations:
• Non-linear Layer: No linear relationship between the input and output of a round
• Linear Mixing Layer: Guarantees high diffusion over multiple rounds
• Very small correlation between bytes of the round input and the bytes of the output
• Key Addition Layer: Bytes of the input are simply XORed with the expanded round key

1. SubBytes Function
• Affine transformation in GF(2^8)
• Direct implementation is complex
• Easily performed by a 16 x 16 LUT ROM
• Simple byte substitution
• Combinational logic
Each byte at the input of a round undergoes a non-linear byte substitution according to the substitution (“S”) box transform.
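
As a rough illustration of why the direct computation is costly compared with a lookup, the sketch below builds the 256-entry S-box from its standard definition (multiplicative inverse in GF(2^8) followed by the affine transformation) and then indexes it as a 16 x 16 table. The function names are hypothetical, and the slow power-based inverse is chosen for clarity, not speed.

```python
def gf_mul(a, b):
    """Multiply two bytes in GF(2^8) modulo x^8 + x^4 + x^3 + x + 1."""
    result = 0
    for _ in range(8):
        if b & 1:
            result ^= a
        carry = a & 0x80
        a = (a << 1) & 0xFF
        if carry:
            a ^= 0x1B
        b >>= 1
    return result


def gf_inv(a):
    """Multiplicative inverse in GF(2^8) via a^254; 0 maps to 0 by convention."""
    if a == 0:
        return 0
    result = 1
    for _ in range(254):
        result = gf_mul(result, a)
    return result


def rotl8(x, n):
    """Rotate an 8-bit value left by n bits."""
    return ((x << n) | (x >> (8 - n))) & 0xFF


def sbox_entry(x):
    """Inverse followed by the AES affine transformation (constant 0x63)."""
    inv = gf_inv(x)
    return inv ^ rotl8(inv, 1) ^ rotl8(inv, 2) ^ rotl8(inv, 3) ^ rotl8(inv, 4) ^ 0x63


SBOX = [sbox_entry(x) for x in range(256)]


def sub_byte(x):
    """Look up a byte as (row, column) in the 16 x 16 table."""
    row, col = x >> 4, x & 0x0F
    return SBOX[16 * row + col]


print(hex(sub_byte(0x53)))
```

Once the table exists, each substitution is a single indexed read, which is why a LUT is the usual choice on a processor with fast local or scratch memory.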

2. Shift Row
• Shifting done only on the bottom three rows of the State
• Left rotate for encryption
• Right rotate for decryption
Depending on the block length, each “row” of the block is cyclically shifted by a fixed offset (the offset table from the slide is not reproduced here).
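
A minimal sketch of the row rotation, assuming the 128-bit block case where row r is rotated by r positions (offsets 0, 1, 2, 3). The state is a list of four rows; the function names are illustrative only.

```python
def shift_rows(state):
    """Encryption: rotate row r left by r positions (row 0 is untouched)."""
    return [row[r:] + row[:r] for r, row in enumerate(state)]


def inv_shift_rows(state):
    """Decryption: rotate row r right by r positions."""
    return [row[len(row) - r:] + row[:len(row) - r] for r, row in enumerate(state)]


state = [
    [0, 1, 2, 3],
    [4, 5, 6, 7],
    [8, 9, 10, 11],
    [12, 13, 14, 15],
]
shifted = shift_rows(state)
print(shifted)
```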

3. MixColumns Function
• Matrix multiplication in GF(2^8)
• MixColumns functionality resides primarily in the controller and instruction memory
• A series of conditional XOR and left shift operations
Each column is multiplied by a fixed polynomial c(x) = '03'·x^3 + '01'·x^2 + '01'·x + '02'.
This corresponds to the matrix multiplication b(x) = c(x) ⊗ a(x) mod (x^4 + 1).
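
Written out, multiplying by c(x) modulo x^4 + 1 is the standard AES matrix with rows (02 03 01 01), (01 02 03 01), (01 01 02 03), (03 01 01 02). The sketch below shows how that reduces to the "conditional XOR and left shift" pattern mentioned above; xtime and mix_column are names chosen for this example, and the test column is the commonly quoted db 13 53 45 vector.

```python
def xtime(a):
    """Multiply by '02' in GF(2^8): shift left, then conditionally XOR 0x11B."""
    a <<= 1
    if a & 0x100:
        a ^= 0x11B
    return a


def mul3(a):
    """Multiply by '03' = '02' + '01'."""
    return xtime(a) ^ a


def mix_column(col):
    """Apply the MixColumns matrix to one 4-byte column."""
    a0, a1, a2, a3 = col
    return [
        xtime(a0) ^ mul3(a1) ^ a2 ^ a3,
        a0 ^ xtime(a1) ^ mul3(a2) ^ a3,
        a0 ^ a1 ^ xtime(a2) ^ mul3(a3),
        mul3(a0) ^ a1 ^ a2 ^ xtime(a3),
    ]


mixed = mix_column([0xDB, 0x13, 0x53, 0x45])
print([hex(b) for b in mixed])
```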

4. Key Expansion and Addition
• Key expansion is performed before both the encrypt and decrypt process
• Byte values from the Key are read and manipulated into the RoundKey
• A series of SubBytes and XOR operations with RCON ROM values and the Key
• Key addition performs an XOR operation between the State and the RoundKey
• This is the only function that needs no separate inverse, since XOR is its own inverse
Each word is simply XORed with the expanded round key.
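
The expansion and addition steps can be sketched as follows for the 16-byte key case only. This is not the microcode from the project; it is a plain Python model with hypothetical helper names, the S-box is regenerated inline so the block stands alone, and the sample key is the one from the FIPS-197 key expansion example.

```python
def gf_mul(a, b):
    """Multiply two bytes in GF(2^8) modulo x^8 + x^4 + x^3 + x + 1."""
    result = 0
    for _ in range(8):
        if b & 1:
            result ^= a
        carry = a & 0x80
        a = (a << 1) & 0xFF
        if carry:
            a ^= 0x1B
        b >>= 1
    return result


def build_sbox():
    """Generate the AES S-box (inverse in GF(2^8), then affine transform)."""
    sbox = []
    for x in range(256):
        inv = 0
        if x:
            inv = 1
            for _ in range(254):          # x^254 is the inverse of x
                inv = gf_mul(inv, x)
        s = inv
        for n in range(1, 5):
            s ^= ((inv << n) | (inv >> (8 - n))) & 0xFF
        sbox.append(s ^ 0x63)
    return sbox


SBOX = build_sbox()


def expand_key_128(key):
    """Expand a 16-byte key into 44 four-byte words (11 round keys)."""
    if len(key) != 16:
        raise ValueError("this sketch handles 16-byte keys only")
    words = [list(key[4 * i:4 * i + 4]) for i in range(4)]
    rcon = 0x01
    for i in range(4, 44):
        temp = list(words[i - 1])
        if i % 4 == 0:
            temp = temp[1:] + temp[:1]        # RotWord
            temp = [SBOX[b] for b in temp]    # SubWord (SubBytes on one word)
            temp[0] ^= rcon                   # XOR with the round constant
            rcon = gf_mul(rcon, 0x02)
        words.append([w ^ t for w, t in zip(words[i - 4], temp)])
    return words


def add_round_key(state_bytes, round_key_bytes):
    """XOR the state with a round key; applying it twice restores the state."""
    return bytes(s ^ k for s, k in zip(state_bytes, round_key_bytes))


key = bytes.fromhex("2b7e151628aed2a6abf7158809cf4f3c")
w = expand_key_128(key)
print(bytes(w[4]).hex(), bytes(w[43]).hex())
```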

IXP2400 Platform: A Quick Look
• Achieves high processing performance
• Programming flexibility
• Cheaper than an ASIC

Microcode Overview
• alu[dest1, a, +, b] → ALU addition of a and b, storing the result in dest1
• alu[dest2, dest1, -, c] → ALU subtraction
• Move(reg1, reg2) → Move from reg1 to reg2; both are GPRs
• Immed[reg, 0x0020] → Immediate value assignment to a register
• local_csr_wr[ACTIVE_LM_ADDR_0, 0x0] → Local memory indexing with index 0
• .begin … endm → Macro begin and end
• .if … .endif → If conditional
• xbuf_alloc($$state, 4, read) → Buffer allocation in DRAM transfer registers
• .reg gen_register $sram_reg $$dram_reg → Register declaration
• .sig sram_sig dram_sig → Signal declaration
• .while … .endw → While loop
• #for round[1,2,3,4,5,6,7,8,9,10] … #endloop → For loop
• alu_shf[index, --, B, s0, >>24] → ALU shift of the B operand
• scratch[read, $T, index, 0, 1], ctx_swap[sram_sig] → Scratch read instruction
• ld_field_w_clr[t1, 1000, $T] → Writes the selected byte field into the t1 register and clears the remaining bytes
• dram[write, $$out[0], dst_addr, 0, 2], sig_done[dram_sig] → DRAM write
• ctx_arb[dram_sig], ctx_arb[kill] → Signaling
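
To make the shift instruction concrete, here is a small Python model of what alu_shf[index, --, B, s0, >>24] appears to compute: the 32-bit operand s0 shifted right by 24, leaving its top byte in index. Using that byte as a table index (as the scratch read that follows it on the slide suggests) is an assumption, and shr32 and byte_indices are hypothetical names, not microengine instructions.

```python
MASK32 = 0xFFFFFFFF


def shr32(value, shift):
    """Model a logical right shift on a 32-bit register."""
    return (value & MASK32) >> shift


def byte_indices(word):
    """Split a 32-bit word into four byte-sized table indices, top byte first."""
    return [shr32(word, s) & 0xFF for s in (24, 16, 8, 0)]


s0 = 0x2B7E1516
index = shr32(s0, 24)          # same effect as the >>24 shift on the slide
print(hex(index), [hex(b) for b in byte_indices(s0)])
```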

Implementation Setup
• Environmental setup:
• Intel IXP 4.1
• 600-MHz ME configuration
• 200-MHz SRAMs
• 150-MHz RDRAMs
• Executed with multiple threads
• Executed on different microengines

Experimental Results (1)
[Figures: SRAM utilization; ME utilization (%)]

Experimental Results (2)
[Two figures, both titled: Throughput Performance Across Threads in 1 ME]

Crypto Unit of IXP2850

Intel IXP2850 Encryption Data Flow

Crypto Unit Overview

Simple Encrypt Example

Simple Encrypt and Hash Example

3DES Core
• 2 cores per crypto unit
• Takes a 192-bit key
– (56-bit + 8-bit parity) x 3 keys
• Operates on 8-byte blocks
• Result is written to ME transfer registers or a TBUF element
• Result can be passed to the SHA-1 unit for hashing
[Figures: Security processing, pipelining, and interleaving using three wires and one core; Multiple keys and IVs]
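
The 192-bit key arithmetic above can be checked with a short sketch. It assumes the usual DES convention that the least significant bit of each key byte is an odd-parity bit; the helper names are made up for this note and have nothing to do with the IXP2850 command set.

```python
def has_odd_parity(byte):
    """True if the byte has an odd number of 1 bits (DES key-byte convention)."""
    return bin(byte).count("1") % 2 == 1


def set_odd_parity(byte):
    """Force the low bit so that the byte has odd parity."""
    b = byte & 0xFE
    if bin(b).count("1") % 2 == 0:
        b |= 0x01
    return b


def split_3des_key(key):
    """Split a 24-byte (192-bit) key into three 8-byte DES keys."""
    if len(key) != 24:
        raise ValueError("3DES key must be 24 bytes")
    return [key[i:i + 8] for i in (0, 8, 16)]


key = bytes(set_odd_parity(b) for b in range(24))
k1, k2, k3 = split_3des_key(key)
effective_bits = 3 * 8 * 7     # 7 key bits per byte, 8 bytes per key, 3 keys
print(len(key) * 8, effective_bits)
```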

AES Core
• All AES key sizes are supported
– (128, 192, or 256)
• Both encryption and decryption supported
• Operates on 16-byte blocks
[Figure: AES Key Scheduler]

SHA1 Core
• 2 SHA-1 cores per crypto unit
• Operates on 64-byte blocks
• Data is loaded from Input RAM or Crypto cores into the SHA-1 buffer
• Can operate on unmodified packet data or on the ciphered packet data
• Operates on a 512-bit block size and has a data buffer to accumulate the ciphered data
• This gives flexibility to run SHA and AES or 3DES at different rates
[Figure: SHA1 Critical Path Analysis]
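
The block-size mismatch that the accumulation buffer absorbs can be illustrated with Python's hashlib, used here purely as a software stand-in for the hardware core. The padding arithmetic follows the SHA-1 standard (one 0x80 byte plus an 8-byte length field); sha1_padded_len is a hypothetical helper.

```python
import hashlib

SHA1_BLOCK = hashlib.sha1().block_size     # 64 bytes = 512 bits
AES_BLOCK = 16                             # bytes per AES block
DES_BLOCK = 8                              # bytes per 3DES block


def sha1_padded_len(msg_len):
    """Message length after SHA-1 padding, rounded up to whole 64-byte blocks."""
    return ((msg_len + 1 + 8 + SHA1_BLOCK - 1) // SHA1_BLOCK) * SHA1_BLOCK


# Cipher blocks that must accumulate before one SHA-1 block can be hashed.
aes_blocks_per_sha_block = SHA1_BLOCK // AES_BLOCK
des_blocks_per_sha_block = SHA1_BLOCK // DES_BLOCK

digest = hashlib.sha1(b"abc").hexdigest()
print(SHA1_BLOCK, aes_blocks_per_sha_block, des_blocks_per_sha_block, digest)
```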

Some of the Crypto Commands
• crypto_write_ram($$orig_plain_text[0], DATA_RAM_ADDR, 8, ENCRYPT_UNIT, ram_sig) → Perform and wait for the write
• crypto_load_iv($$iv[0], 1, ENCRYPT_UNIT, CRYPTO_BANK, ENCRYPT_STATE, iv_sig) → Loading IV data
• crypto_load_key($$key[0], 3, ENCRYPT_UNIT, CRYPTO_BANK, ENCRYPT_STATE, key_sig) → Loading key
• crypto_cipher($$encrypt_data[0], DATA_RAM_ADDR, 8, CRYPTO_CIPHER_ENCRYPT, CRYPTO_CIPHER_NO_CBC, CRYPTO_CIPHER_3DES, ENCRYPT_UNIT, CRYPTO_BANK, ENCRYPT_STATE, cipher_sig)

Acknowledgement • Yan Luo • Chris Baron • http://cnscenter.future.co.kr/resource/rsc-center/presentation/intel/spring2003/S03USCPTS92_OS.pdf ( For some slides) • Mel Tsai; UC Berkeley (For some slides) • Thomas Sodon et al, EE College of NewJersey • Zhangxi Tan et al, Tsinghua University

Q……………?
