AES Microcode Implementation In IXP2400 And A study of Reconfigurable Crypto Unit


AES Microcode Implementation In IXP2400 And A study of Reconfigurable Crypto Unit. Piyush Ranjan Satapathy CS203B Class Project Presentation. Road Map. AES Algorithm Overview IXP2400 Platform: A Quick Look Microcode: Overview Implementation of AES Experimental Results

Presentation Transcript
AES Microcode Implementation In IXP2400 And A study of Reconfigurable Crypto Unit

Piyush Ranjan Satapathy

CS203B Class Project Presentation

Road Map
  • AES Algorithm Overview
  • IXP2400 Platform: A Quick Look
  • Microcode: Overview
  • Implementation of AES
  • Experimental Results
  • Reconfigurable Crypto unit of Intel IXP2850
Algorithm Overview
  • Designed by Daemen and Rijmen for the NIST
  • Originally called Rijndael
  • Symmetric key block substitution cipher
  • Replacement for DES
  • Successful field testing since inception
  • Three key-length modes: 128, 192, and 256 bits
  • State defined as a 4x4 array of 16 bytes
  • Key size is either 16, 24, or 32 bytes
  • Each byte is treated as a polynomial over GF(2), i.e. an element of the Galois field GF(2^8)
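
As a rough illustration of the state layout and key sizes listed above, the following Python sketch arranges a 16-byte block into the 4x4 state (column by column, as FIPS-197 specifies) and looks up the round count for each key length. The names `bytes_to_state`, `state_to_bytes`, and `ROUNDS_BY_KEY_BYTES` are illustrative and do not come from the slides.

```python
# Round counts per key length in bytes, as specified in FIPS-197.
ROUNDS_BY_KEY_BYTES = {16: 10, 24: 12, 32: 14}

def bytes_to_state(block):
    """Arrange 16 bytes into a 4x4 state, filling column by column."""
    if len(block) != 16:
        raise ValueError("AES operates on 16-byte blocks")
    return [[block[row + 4 * col] for col in range(4)] for row in range(4)]

def state_to_bytes(state):
    """Flatten a 4x4 state back into 16 bytes, reading column by column."""
    return bytes(state[row][col] for col in range(4) for row in range(4))

block = bytes(range(16))
state = bytes_to_state(block)
print(state[0])                      # first row: [0, 4, 8, 12]
print(ROUNDS_BY_KEY_BYTES[len(block)])  # a 16-byte key would use 10 rounds
```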
Stages of AES Algorithm:

[Figure: Detailed view of round n. The result from round n-1 passes through ByteSub, Shift Row, MixColumn, and AddRoundKey (keyed with Kn), and the output is passed to round n+1.]

  • Each round performs the following operations:
    • Non-linear Layer: No linear relationship between the input and output of a round
    • Linear Mixing Layer: Guarantees high diffusion over multiple rounds
      • Very small correlation between bytes of the round input and the bytes of the output
    • Key Addition Layer: Bytes of the input are simply EXOR’ed with the expanded round key
1. SubBytes Function
  • Multiplicative inverse in GF(2^8) followed by an affine transformation
  • Direct implementation is complex
  • Easily performed by a 16 x 16 LUT ROM
    • Simple byte substitution
    • Combinational logic

Each byte at the input of a round undergoes a non-linear byte substitution according to the following transform:

[Figure: Substitution (“S”)-box]
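
To make the "direct implementation" versus lookup-table trade-off concrete, here is a minimal Python sketch that derives the 256-entry S-box from its definition (inverse in GF(2^8), then the affine transformation with constant 0x63) and then uses it as a plain table. It is a software model only, not the microcode from the project; the helper names `gf_mul`, `gf_inverse`, and `build_sbox` are made up for this example.

```python
def gf_mul(a, b):
    """Multiply two bytes in GF(2^8) modulo the AES polynomial x^8+x^4+x^3+x+1."""
    result = 0
    for _ in range(8):
        if b & 1:
            result ^= a
        carry = a & 0x80
        a = (a << 1) & 0xFF
        if carry:
            a ^= 0x1B
        b >>= 1
    return result

def gf_inverse(a):
    """Return a^254, which is the multiplicative inverse (0 maps to 0)."""
    if a == 0:
        return 0
    result = 1
    for _ in range(254):
        result = gf_mul(result, a)
    return result

def rotl8(x, n):
    """Rotate an 8-bit value left by n bits."""
    return ((x << n) | (x >> (8 - n))) & 0xFF

def build_sbox():
    """Build the S-box: field inverse followed by the affine transformation."""
    sbox = []
    for value in range(256):
        b = gf_inverse(value)
        sbox.append(b ^ rotl8(b, 1) ^ rotl8(b, 2) ^ rotl8(b, 3) ^ rotl8(b, 4) ^ 0x63)
    return sbox

SBOX = build_sbox()
print(hex(SBOX[0x00]), hex(SBOX[0x53]))  # 0x63 0xed
```

Once the table exists, SubBytes is a single indexed read per byte, which is why a 16 x 16 lookup table is the usual choice.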

2. Shift Row
  • Shifting done only on the bottom three rows of the State
  • Left rotate for encryption
  • Right rotate for decryption

Depending on the block length, each “row” of the block is cyclically shifted by a fixed offset. For the 128-bit AES block, row 0 is unchanged and rows 1, 2, and 3 shift by 1, 2, and 3 bytes.
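
The row rotation can be sketched in a few lines of Python, assuming the 128-bit block where row r rotates by r byte positions. The function name `shift_rows` and the `decrypt` flag are illustrative choices, not names from the slides.

```python
def shift_rows(state, decrypt=False):
    """Rotate row r of a 4x4 state by r positions: left to encrypt, right to decrypt."""
    shifted = []
    for r, row in enumerate(state):
        k = (-r if decrypt else r) % 4  # a right rotation by r equals a left rotation by 4 - r
        shifted.append(row[k:] + row[:k])
    return shifted

state = [[0, 1, 2, 3],
         [4, 5, 6, 7],
         [8, 9, 10, 11],
         [12, 13, 14, 15]]
encrypted = shift_rows(state)
print(encrypted[1])  # [5, 6, 7, 4]
print(shift_rows(encrypted, decrypt=True) == state)  # True
```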

3. MixColumns Function
  • Matrix multiplication in GF(2^8)
  • MixColumns functionality resides primarily in the controller and instruction memory
  • A series of conditional XOR and left shift operations

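Each column is multiplied by a fixed polynomial, modulo $x^4 + 1$:

$c(x) = \texttt{03}\,x^3 + \texttt{01}\,x^2 + \texttt{01}\,x + \texttt{02}$

This corresponds to the matrix multiplication $b(x) = c(x)\,a(x)$:

$\begin{bmatrix} b_0 \\ b_1 \\ b_2 \\ b_3 \end{bmatrix} = \begin{bmatrix} \texttt{02} & \texttt{03} & \texttt{01} & \texttt{01} \\ \texttt{01} & \texttt{02} & \texttt{03} & \texttt{01} \\ \texttt{01} & \texttt{01} & \texttt{02} & \texttt{03} \\ \texttt{03} & \texttt{01} & \texttt{01} & \texttt{02} \end{bmatrix} \begin{bmatrix} a_0 \\ a_1 \\ a_2 \\ a_3 \end{bmatrix}$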
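
The "conditional XOR and left shift" formulation can be illustrated with a small Python sketch. Multiplication by 02 is one left shift plus a conditional XOR with the reduction polynomial, and multiplication by 03 is that result XORed with the original byte. The names `xtime` and `mix_column` are conventional but are assumptions here; the slides do not show the actual microcode.

```python
def xtime(a):
    """Multiply a byte by 02 in GF(2^8): shift left, then conditionally reduce."""
    a <<= 1
    if a & 0x100:
        a ^= 0x11B  # x^8 + x^4 + x^3 + x + 1
    return a & 0xFF

def mix_column(col):
    """Multiply one 4-byte column by the fixed MixColumns matrix."""
    a0, a1, a2, a3 = col
    return [
        xtime(a0) ^ (xtime(a1) ^ a1) ^ a2 ^ a3,   # 02*a0 + 03*a1 + a2 + a3
        a0 ^ xtime(a1) ^ (xtime(a2) ^ a2) ^ a3,   # a0 + 02*a1 + 03*a2 + a3
        a0 ^ a1 ^ xtime(a2) ^ (xtime(a3) ^ a3),   # a0 + a1 + 02*a2 + 03*a3
        (xtime(a0) ^ a0) ^ a1 ^ a2 ^ xtime(a3),   # 03*a0 + a1 + a2 + 02*a3
    ]

print([hex(b) for b in mix_column([0xDB, 0x13, 0x53, 0x45])])
# ['0x8e', '0x4d', '0xa1', '0xbc']
```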

4. Key Expansion and Addition
  • Performed before both the encrypt and decrypt process
  • Byte values from the Key are read and manipulated into the RoundKey
  • A series of SubBytes and XOR operations with RCON ROM values and the Key
  • Performs XOR operation between the State and the Roundkey
  • This is the only function without a separate inverse: XOR with the same round key undoes itself

Each word is simply EXOR’ed with the expanded round key
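
A minimal Python sketch of the steps above, restricted to a 128-bit key: each new word is the XOR of the word four positions back with the previous word, and every fourth word first goes through a byte rotation, SubBytes, and an RCON XOR. The S-box is rebuilt here only to keep the example self-contained; all function names (`expand_key_128`, `add_round_key`, and the helpers) are illustrative and not taken from the project code.

```python
def xtime(a):
    """Multiply by 02 in GF(2^8), reducing by the AES polynomial."""
    a <<= 1
    return (a ^ 0x11B) & 0xFF if a & 0x100 else a

def gf_mul(a, b):
    result = 0
    while b:
        if b & 1:
            result ^= a
        a = xtime(a)
        b >>= 1
    return result

def build_sbox():
    """S-box from its definition: field inverse, then affine transformation."""
    sbox = []
    for value in range(256):
        inv = 0
        if value:
            inv = 1
            for _ in range(254):  # value^254 is the inverse
                inv = gf_mul(inv, value)
        s = inv
        for shift in range(1, 5):
            s ^= ((inv << shift) | (inv >> (8 - shift))) & 0xFF
        sbox.append(s ^ 0x63)
    return sbox

SBOX = build_sbox()

def expand_key_128(key):
    """Expand a 16-byte key into 44 four-byte words (11 round keys)."""
    words = [list(key[4 * i:4 * i + 4]) for i in range(4)]
    rcon = 0x01
    for i in range(4, 44):
        temp = list(words[i - 1])
        if i % 4 == 0:
            temp = temp[1:] + temp[:1]        # rotate the word by one byte
            temp = [SBOX[b] for b in temp]    # SubBytes on each byte
            temp[0] ^= rcon                   # XOR with the round constant
            rcon = xtime(rcon)
        words.append([a ^ b for a, b in zip(words[i - 4], temp)])
    return words

def add_round_key(state_bytes, round_key_bytes):
    """XOR 16 state bytes with 16 round-key bytes."""
    return bytes(s ^ k for s, k in zip(state_bytes, round_key_bytes))

key = bytes.fromhex("2b7e151628aed2a6abf7158809cf4f3c")  # FIPS-197 example key
words = expand_key_128(key)
print(bytes(words[4]).hex())  # a0fafe17
```

Applying `add_round_key` twice with the same round key returns the original state, which is why no separate inverse routine is needed for this step.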

IXP2400 Platform: A Quick Look
  • Achieves high processing performance
  • Programming flexibility
  • Cheaper than an ASIC
Microcode Overview
  • alu[dest1, a, +, b]: ALU addition of a and b, stored in dest1
  • alu[dest2, dest1, -, c]: ALU subtraction of c from dest1, stored in dest2
  • Move(reg1, reg2): moves a value from reg1 to reg2; both are GPRs
  • Immed[reg, 0x0020]: immediate value assignment to a register
  • local_csr_wr[ACTIVE_LM_ADDR_0, 0x0]: local memory indexing with index 0
  • .begin … endm: macro begin and end
  • .if … .endif: conditional (if) block
  • xbuf_alloc($$state, 4, read): buffer allocation in DRAM transfer registers
  • .reg gen_register $sram_reg $$dram_reg: register declaration
  • .sig sram_sig dram_sig: signal declaration
  • .while … .endw: while loop
  • #for round[1,2,3,4,5,6,7,8,9,10] … #endloop: for loop
  • alu_shf[index, --, B, s0, >>24]: ALU shift; passes s0 (the B operand) shifted right by 24 bits into index
  • scratch[read, $T, index, 0, 1], ctx_swap[sram_sig]: scratch memory read
  • ld_field_w_clr[t1, 1000, $T]: writes the byte of $T selected by mask 1000 into t1 and clears the other bytes
  • dram[write, $$out[0], dst_addr, 0, 2], sig_done[dram_sig]: DRAM write
  • ctx_arb[dram_sig], ctx_arb[kill]: context arbitration (wait on a signal, or kill the context)
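
Two of the instructions above are easier to follow with concrete values. The Python sketch below is only a rough model of how they are commonly described (a 32-bit logical right shift, and a byte-masked load that clears unselected bytes); it ignores the optional shift operand of the real `ld_field_w_clr` and is not a substitute for the Intel programmer's reference. The Python function names simply mirror the mnemonics and are assumptions for this example.

```python
MASK32 = 0xFFFFFFFF

def alu_shf_right(src, shift):
    """Model of alu_shf[dest, --, B, src, >>shift]: pass src through, shifted right."""
    return (src & MASK32) >> shift

def ld_field_w_clr(byte_mask, src):
    """Model of ld_field_w_clr[dest, byte_mask, src] without a shift.

    byte_mask is a 4-character string such as "1000"; the leftmost character
    selects the most significant byte. Unselected bytes are cleared.
    """
    result = 0
    for position, flag in enumerate(byte_mask):
        if flag == "1":
            shift = 8 * (3 - position)
            result |= src & (0xFF << shift)
    return result

word = 0xAABBCCDD
print(hex(alu_shf_right(word, 24)))      # 0xaa, e.g. a byte used as a table index
print(hex(ld_field_w_clr("1000", word)))  # 0xaa000000
```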
Implementation Setup
  • Environmental Setup:
    • Intel IXP 4.1
    • 600-MHz ME configuration
    • 200-MHz SRAMs
    • 150-MHz RDRAMs
  • Executed with multiple threads
  • Executed on different microengines
Experimental Results(1)

[Charts: SRAM Utilization; ME utilization %]

Experimental Results(2)

[Charts: Throughput Performance Across Threads in 1 ME (two panels with the same title)]

3DES Core
  • 2 cores per crypto unit
  • Takes 192-bit key
    • (56-bit + 8-bit parity) x 3 keys
  • Operates on 8-byte blocks
  • Result is written to ME transfer registers or TBUF element
  • Result can be passed to the SHA-1 unit for hashing

Security Processing, pipelining, and interleaving using three wires and one core

Multiple keys and IVs
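
The 192-bit key format can be illustrated with a short Python sketch: the 24 key bytes split into three 8-byte DES keys, and in each byte the low bit is an odd-parity bit, leaving 3 x 56 = 168 effective key bits. This models the standard DES key format only, not the IXP2850 hardware interface; the helper names are hypothetical.

```python
def has_odd_parity(byte):
    """True when the byte contains an odd number of 1 bits."""
    return bin(byte).count("1") % 2 == 1

def set_odd_parity(byte):
    """Overwrite the low (parity) bit so the whole byte has odd parity."""
    upper = byte & 0xFE
    return upper if has_odd_parity(upper) else upper | 0x01

def split_3des_key(key):
    """Split a 24-byte (192-bit) 3DES key into its three 8-byte DES keys."""
    if len(key) != 24:
        raise ValueError("3DES key must be 24 bytes")
    return key[0:8], key[8:16], key[16:24]

raw = bytes(range(24))
key = bytes(set_odd_parity(b) for b in raw)
k1, k2, k3 = split_3des_key(key)
print(all(has_odd_parity(b) for b in key))  # True
print(len(k1), len(k2), len(k3))            # 8 8 8
```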

AES Core
  • All AES key sizes are supported
    • (128, 192, or 256 bits)
  • Both encryption and decryption supported
  • Operates on 16-byte blocks

AES Key Scheduler

SHA1 Core
  • 2 SHA-1 cores per crypto unit
  • Operates on 64-byte blocks
  • Data is loaded from Input RAM or Crypto cores into the SHA-1 buffer
  • Can perform on unmodified packet data or on the ciphered packet data
  • Operates on 512 bit block size and has a data buffer to accumulate the ciphered data
  • This gives flexibility to run SHA and AES, 3DES at different rates.
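
To show why the SHA-1 unit needs a buffer that accumulates data into 64-byte (512-bit) blocks, the Python sketch below applies the standard SHA-1 padding rule (a 0x80 byte, zero fill, then the 64-bit big-endian message length) and splits the result into blocks. It models the algorithm's framing only, not the IXP2850 crypto unit; `sha1_pad` and `split_blocks` are illustrative names.

```python
BLOCK_BYTES = 64  # 512 bits

def sha1_pad(message):
    """Pad a message so its length is a multiple of 64 bytes, per the SHA-1 rule."""
    bit_length = len(message) * 8
    padded = message + b"\x80"
    padded += b"\x00" * ((56 - len(padded) % BLOCK_BYTES) % BLOCK_BYTES)
    padded += bit_length.to_bytes(8, "big")
    return padded

def split_blocks(padded):
    """Split padded data into 64-byte blocks for the compression function."""
    return [padded[i:i + BLOCK_BYTES] for i in range(0, len(padded), BLOCK_BYTES)]

print(len(split_blocks(sha1_pad(b"abc"))))        # 1
print(len(split_blocks(sha1_pad(b"a" * 56))))     # 2
```

A 16-byte AES block or an 8-byte 3DES block is much smaller than one hash block, so several cipher outputs must be collected before the hash core can process them.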

SHA1 Critical Path Analysis

Some of The Crypto Commands
  • crypto_write_ram($$orig_plain_text[0], DATA_RAM_ADDR, 8, ENCRYPT_UNIT, ram_sig): perform and wait for the write
  • crypto_load_iv($$iv[0], 1, ENCRYPT_UNIT, CRYPTO_BANK, ENCRYPT_STATE, iv_sig): loading IV data
  • crypto_load_key($$key[0], 3, ENCRYPT_UNIT, CRYPTO_BANK, ENCRYPT_STATE, key_sig): loading key
  • crypto_cipher($$encrypt_data[0], DATA_RAM_ADDR, 8, CRYPTO_CIPHER_ENCRYPT, CRYPTO_CIPHER_NO_CBC, CRYPTO_CIPHER_3DES, ENCRYPT_UNIT, CRYPTO_BANK, ENCRYPT_STATE, cipher_sig)
Acknowledgement
  • Yan Luo
  • Chris Baron
  • http://cnscenter.future.co.kr/resource/rsc-center/presentation/intel/spring2003/S03USCPTS92_OS.pdf ( For some slides)
  • Mel Tsai; UC Berkeley (For some slides)
  • Thomas Sodon et al, EE College of New Jersey
  • Zhangxi Tan et al, Tsinghua University