AES Microcode Implementation in IXP2400 and a Study of Reconfigurable Crypto Unit

Piyush Ranjan Satapathy

CS203B Class Project


Road Map

  • AES Algorithm Overview

  • IXP2400 Platform: A Quick Look

  • Microcode: Overview

  • Implementation of AES

  • Experimental Results

  • Reconfigurable Crypto unit of Intel IXP2850

Algorithm Overview

  • Designed by Daemen and Rijmen for the NIST AES selection process

  • Originally called Rijndael

  • Symmetric key block substitution cipher

  • Replacement for DES

  • Successful field testing since inception

  • Three key-length modes (128, 192, or 256 bits)

  • State defined as a 4x4 array of 16 bytes

  • Key size is either 16, 24, or 32 bytes

  • Each byte is treated as a polynomial over GF(2), i.e. an element of the Galois field GF(2^8)
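As a minimal sketch of the last three bullets (written in Python rather than IXP microcode, purely for illustration), the block below lays 16 input bytes out as a 4x4 State in column-major order and multiplies two bytes as polynomials modulo x^8 + x^4 + x^3 + x + 1. The helper names bytes_to_state and gf_mul are illustrative and do not come from the slides.

```python
def bytes_to_state(block: bytes) -> list:
    """Arrange 16 bytes as a 4x4 State; byte r + 4*c lands in row r, column c."""
    if len(block) != 16:
        raise ValueError("AES operates on 16-byte blocks")
    return [[block[r + 4 * c] for c in range(4)] for r in range(4)]


def gf_mul(a: int, b: int) -> int:
    """Multiply two bytes as polynomials over GF(2), reduced modulo 0x11B."""
    result = 0
    for _ in range(8):
        if b & 1:            # add (XOR) the current multiple of a
            result ^= a
        carry = a & 0x80     # will the shift overflow 8 bits?
        a = (a << 1) & 0xFF
        if carry:
            a ^= 0x1B        # reduce by x^8 + x^4 + x^3 + x + 1
        b >>= 1
    return result


state = bytes_to_state(bytes(range(16)))
print(state[1][2])               # byte at row 1, column 2
print(hex(gf_mul(0x57, 0x83)))   # product of two field elements
```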

Stages of AES Algorithm:


[Figure: detailed view of round n, which takes the result from round n-1, applies stages including Shift Row, and passes its output to round n+1]

  • Each round performs the following operations:

    • Non-linear Layer: No linear relationship between the input and output of a round

    • Linear Mixing Layer: Guarantees high diffusion over multiple rounds

      • Very small correlation between bytes of the round input and the bytes of the output

    • Key Addition Layer: Bytes of the input are simply XORed with the expanded round key
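The order in which these layers run can be sketched as a per-round schedule of stage names. This is only an illustration: it assumes the standard AES round counts (10, 12, or 14 rounds for 16-, 24-, or 32-byte keys), an initial key addition before round 1, and a final round that omits MixColumns. The function name round_schedule is hypothetical.

```python
ROUNDS_BY_KEY_BYTES = {16: 10, 24: 12, 32: 14}


def round_schedule(key_len_bytes: int) -> list:
    """Return the list of stages per round; entry 0 is the initial key addition."""
    nr = ROUNDS_BY_KEY_BYTES[key_len_bytes]
    schedule = [["AddRoundKey"]]
    for rnd in range(1, nr + 1):
        stages = ["SubBytes", "ShiftRows"]   # non-linear layer, then row shifts
        if rnd != nr:                        # the last round skips the mixing step
            stages.append("MixColumns")
        stages.append("AddRoundKey")         # key addition layer
        schedule.append(stages)
    return schedule


for number, stages in enumerate(round_schedule(16)):
    print(number, " -> ".join(stages))
```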

1. SubBytes Function

  • Multiplicative inverse in GF(2^8) followed by an affine transformation

  • Direct implementation is complex

  • Easily performed by a 16 x 16 LUT ROM

    • Simple byte substitution

    • Combinational logic

Each byte at the input of a round undergoes a non-linear byte substitution according to the substitution (“S”)-box.
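To make the lookup-table point concrete, here is a hedged Python sketch that builds the 256-entry S-box from its definition (field inverse followed by the affine map with constant 0x63) and then applies it as a plain table lookup. The names gf_mul, gf_inv, SBOX, and sub_bytes are illustrative; an actual implementation would normally store the precomputed table rather than derive it.

```python
def gf_mul(a: int, b: int) -> int:
    """Multiply in GF(2^8) modulo x^8 + x^4 + x^3 + x + 1."""
    result = 0
    for _ in range(8):
        if b & 1:
            result ^= a
        a = ((a << 1) ^ 0x11B) if a & 0x80 else (a << 1)
        b >>= 1
    return result


def gf_inv(a: int) -> int:
    """Multiplicative inverse via a^254; 0 maps to 0 by convention."""
    if a == 0:
        return 0
    result = 1
    for _ in range(254):
        result = gf_mul(result, a)
    return result


def rotl8(x: int, n: int) -> int:
    """Rotate an 8-bit value left by n bits."""
    return ((x << n) | (x >> (8 - n))) & 0xFF


def sbox_entry(x: int) -> int:
    """Field inverse followed by the affine transformation."""
    inv = gf_inv(x)
    return inv ^ rotl8(inv, 1) ^ rotl8(inv, 2) ^ rotl8(inv, 3) ^ rotl8(inv, 4) ^ 0x63


SBOX = [sbox_entry(x) for x in range(256)]   # the 16 x 16 lookup table


def sub_bytes(state: list) -> list:
    """Substitute every byte of a 4x4 State through the table."""
    return [[SBOX[b] for b in row] for row in state]


print(hex(SBOX[0x00]), hex(SBOX[0x53]))
```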

2. Shift Row

  • Shifting done only on the bottom three rows of the State

  • Left rotate for encryption

  • Right rotate for decryption

Depending on the block length, each “row” of the block is cyclically shifted by a fixed offset; for the 128-bit AES block, rows 1, 2, and 3 are shifted by 1, 2, and 3 bytes, and row 0 is left unchanged.
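A minimal Python sketch of the row shifts, assuming the 128-bit block with the State stored as four rows of four bytes. The function names shift_rows and inv_shift_rows are illustrative.

```python
def shift_rows(state: list) -> list:
    """Encryption: rotate row r left by r positions (row 0 is untouched)."""
    return [row[r:] + row[:r] for r, row in enumerate(state)]


def inv_shift_rows(state: list) -> list:
    """Decryption: rotate row r right by r positions."""
    return [row[4 - r:] + row[:4 - r] for r, row in enumerate(state)]


state = [
    [0, 1, 2, 3],
    [10, 11, 12, 13],
    [20, 21, 22, 23],
    [30, 31, 32, 33],
]
for row in shift_rows(state):
    print(row)
```

Applying inv_shift_rows to the result of shift_rows returns the original State, which is a convenient self-check.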

3. MixColumns Function

  • Matrix multiplication in GF(2^8)

  • MixColumns functionality resides primarily in the controller and instruction memory

  • A series of conditional XOR and left shift operations

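Each column is multiplied by a fixed polynomial

c(x) = '03'*x^3 + '01'*x^2 + '01'*x + '02'

modulo x^4 + 1. This corresponds to the matrix multiplication b(x) = c(x) a(x):

  [b0]   [02 03 01 01] [a0]
  [b1] = [01 02 03 01] [a1]
  [b2]   [01 01 02 03] [a2]
  [b3]   [03 01 01 02] [a3]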
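The "conditional XOR and left shift" formulation can be sketched in Python as follows. Multiplication by '02' is a left shift with a conditional XOR of the reduction polynomial, and '03' is '02' XORed with the original byte. The names xtime, mix_single_column, and mix_columns are illustrative, and the State is assumed to be stored as four rows of four bytes.

```python
def xtime(b: int) -> int:
    """Multiply a byte by '02' in GF(2^8): shift left, conditionally reduce."""
    b <<= 1
    if b & 0x100:
        b ^= 0x11B
    return b


def mix_single_column(col: list) -> list:
    """Multiply one 4-byte column by the fixed MixColumns matrix."""
    a0, a1, a2, a3 = col
    return [
        xtime(a0) ^ (xtime(a1) ^ a1) ^ a2 ^ a3,   # 02 03 01 01
        a0 ^ xtime(a1) ^ (xtime(a2) ^ a2) ^ a3,   # 01 02 03 01
        a0 ^ a1 ^ xtime(a2) ^ (xtime(a3) ^ a3),   # 01 01 02 03
        (xtime(a0) ^ a0) ^ a1 ^ a2 ^ xtime(a3),   # 03 01 01 02
    ]


def mix_columns(state: list) -> list:
    """Apply mix_single_column to each column of a row-major 4x4 State."""
    cols = [mix_single_column([state[r][c] for r in range(4)]) for c in range(4)]
    return [[cols[c][r] for c in range(4)] for r in range(4)]


print([hex(b) for b in mix_single_column([0xDB, 0x13, 0x53, 0x45])])
```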

4. Key Expansion and Addition

  • Performed before both the encrypt and decrypt process

  • Byte values from the Key are read and manipulated into the RoundKey

  • A series of SubBytes and XOR operations with RCON ROM values and the Key

  • Performs XOR operation between the State and the Roundkey

  • This is the only function without a separate inverse, since XOR is its own inverse

Each word is simply XORed with the expanded round key
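The key schedule and the key addition can be sketched in Python as below. This is an illustration of the standard AES key expansion, not the author's microcode: the S-box is derived on the fly instead of being read from a ROM, the round constants are generated with xtime rather than looked up in an RCON table, and the names expand_key and add_round_key are hypothetical.

```python
def xtime(b: int) -> int:
    """Multiply a byte by '02' in GF(2^8)."""
    b <<= 1
    return (b ^ 0x11B) if b & 0x100 else b


def gf_mul(a: int, b: int) -> int:
    result = 0
    while b:
        if b & 1:
            result ^= a
        a = xtime(a)
        b >>= 1
    return result


def build_sbox() -> list:
    """Field inverse (x^254) followed by the affine map with constant 0x63."""
    sbox = []
    for x in range(256):
        inv = 0
        if x:
            inv = 1
            for _ in range(254):
                inv = gf_mul(inv, x)
        s = inv
        for n in range(1, 5):
            s ^= ((inv << n) | (inv >> (8 - n))) & 0xFF
        sbox.append(s ^ 0x63)
    return sbox


SBOX = build_sbox()


def expand_key(key: bytes) -> list:
    """Expand a 16-, 24-, or 32-byte key into 4-byte round-key words."""
    nk = len(key) // 4                  # key length in words
    nr = nk + 6                         # number of rounds
    words = [list(key[4 * i:4 * i + 4]) for i in range(nk)]
    rcon = 0x01
    for i in range(nk, 4 * (nr + 1)):
        temp = list(words[i - 1])
        if i % nk == 0:
            temp = temp[1:] + temp[:1]           # RotWord
            temp = [SBOX[b] for b in temp]       # SubWord
            temp[0] ^= rcon                      # round constant
            rcon = xtime(rcon)
        elif nk > 6 and i % nk == 4:
            temp = [SBOX[b] for b in temp]       # extra SubWord for 32-byte keys
        words.append([w ^ t for w, t in zip(words[i - nk], temp)])
    return words


def add_round_key(state: list, round_words: list) -> list:
    """XOR a row-major State with four round-key words (one word per column)."""
    return [[state[r][c] ^ round_words[c][r] for c in range(4)] for r in range(4)]


words = expand_key(bytes.fromhex("2b7e151628aed2a6abf7158809cf4f3c"))
print(len(words), bytes(words[4]).hex())
```

Because add_round_key is a plain XOR, applying it twice with the same round key restores the State, which is why decryption can reuse it unchanged.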

IXP2400 Platform: A Quick Look

  • Network processor designed to achieve high processing performance

  • Programming flexibility

  • Cheaper than an ASIC

Microcode Overview

  • alu[dest1, a, +, b]: ALU addition of a and b, with the result stored in dest1

  • alu[dest2, dest1, -, c]: ALU subtraction

  • move(reg1, reg2): register-to-register move; both reg1 and reg2 are GPRs

  • immed[reg, 0x0020]: immediate value assignment to a register

  • local_csr_wr[ACTIVE_LM_ADDR_0, 0x0]: local memory indexing with index 0

  • .begin … endm: macro begin and end

  • .if … .endif: conditional block

  • xbuf_alloc($$state, 4, read): buffer allocation in DRAM transfer registers

  • .reg gen_register $sram_reg $$dram_reg: register declaration

  • .sig sram_sig dram_sig: signal declaration

  • .while … .endw: while loop

  • #for round[1,2,3,4,5,6,7,8,9,10] … #endloop: for loop

  • alu_shf[index, --, B, s0, >>24]: ALU shift; passes operand B (s0 shifted right by 24 bits) through to index

  • scratch[read, $T, index, 0, 1], ctx_swap[sram_sig]: scratch memory read, swapping the context out until the signal arrives

  • ld_field_w_clr[t1, 1000, $T]: loads the byte of $T selected by the mask 1000 into t1 and clears the remaining bytes

  • dram[write, $$out[0], dst_addr, 0, 2], sig_done[dram_sig]: DRAM write

  • ctx_arb[dram_sig], ctx_arb[kill]: wait for a signal; terminate the context
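As an illustration of what an instruction such as alu_shf[index, --, B, s0, >>24] computes, the Python model below shifts a 32-bit register right by 24 bits so that its most significant byte can serve as a table index. This is a behavioral sketch only; the function names are hypothetical, and the sample word is arbitrary.

```python
MASK32 = 0xFFFFFFFF


def alu_shf_b_right(src: int, shift: int) -> int:
    """Model alu_shf[dest, --, B, src, >>shift] on a 32-bit register."""
    return (src & MASK32) >> shift


def word_bytes(word: int) -> list:
    """Split a 32-bit word into four bytes, most significant first."""
    return [(word >> s) & 0xFF for s in (24, 16, 8, 0)]


s0 = 0x3243F6A8                      # arbitrary sample state word
index = alu_shf_b_right(s0, 24)      # top byte, usable as a lookup index
print(hex(index))
print([hex(b) for b in word_bytes(s0)])
```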

Implementation Setup

  • Environmental Setup:

    • Intel IXP 4.1

    • 600-MHz ME configuration

    • 200-MHz SRAMs

    • 150-MHz RDRAMs

  • Executed with multiple threads

  • Executed on different microengines

Experimental Results (1)

[Figures: SRAM utilization; ME utilization (%)]

Experimental Results (2)

[Figures: throughput performance across threads in 1 ME]

3DES Core

  • 2 cores per crypto unit

  • Takes a 192-bit key

    • (56-bit + 8-bit parity) x 3 keys

  • Operates on 8-byte blocks

  • Result is written to ME transfer registers or a TBUF element

  • Result can be passed to the SHA-1 unit for hashing

Security Processing, pipelining, and interleaving using three wires and one core

Multiple keys and IVs
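The 192-bit key layout can be illustrated with a short Python sketch that splits a 24-byte key into three 8-byte DES keys and fixes up the parity bits. It assumes the usual DES convention of odd parity with the least significant bit of each byte as the parity bit; the function names are hypothetical and say nothing about how the IXP2850 crypto unit handles parity internally.

```python
def set_odd_parity(key: bytes) -> bytes:
    """Force each byte to odd parity using its least significant bit."""
    out = bytearray()
    for b in key:
        seven = b & 0xFE                       # the 7 effective key bits
        ones = bin(seven).count("1")
        parity = 0 if ones % 2 else 1          # make the total count odd
        out.append(seven | parity)
    return bytes(out)


def split_3des_key(key: bytes) -> tuple:
    """Split a 24-byte (192-bit) key into three 8-byte DES keys."""
    if len(key) != 24:
        raise ValueError("3DES expects a 24-byte key")
    return key[0:8], key[8:16], key[16:24]


def effective_key_bits(key: bytes) -> int:
    """Count the non-parity bits: 7 per byte."""
    return 7 * len(key)


key = set_odd_parity(bytes(range(24)))
k1, k2, k3 = split_3des_key(key)
print(k1.hex(), k2.hex(), k3.hex())
print(effective_key_bits(key))
```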

AES Core

  • All AES key sizes are supported

    • (128, 192, or 256)

  • Both encryption and decryption are supported

  • Operates on 16-byte blocks

AES Key Scheduler

SHA1 Core

  • 2 SHA-1 cores per crypto unit

  • Operates on 64-byte blocks

  • Data is loaded from Input RAM or the crypto cores into the SHA-1 buffer

  • Can operate on unmodified packet data or on the ciphered packet data

  • Operates on a 512-bit block size and has a data buffer to accumulate the ciphered data

  • This gives the flexibility to run SHA-1 and AES or 3DES at different rates

SHA1 Critical Path Analysis
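The 64-byte block size and the idea of accumulating data before hashing can be illustrated in software with Python's standard hashlib module. This is a sketch of the general SHA-1 behavior, not of the hardware core: the chunked update loop merely stands in for data arriving from a cipher at its own rate, and the helper names are hypothetical.

```python
import hashlib


def sha1_padded_length(message_len: int) -> int:
    """Bytes actually hashed: message + 0x80 + zero fill + 8-byte length."""
    return ((message_len + 8) // 64 + 1) * 64


def sha1_in_chunks(data: bytes, chunk_size: int = 16) -> str:
    """Feed data piecewise, e.g. one 16-byte cipher block at a time."""
    h = hashlib.sha1()
    for offset in range(0, len(data), chunk_size):
        h.update(data[offset:offset + chunk_size])
    return h.hexdigest()


data = bytes(range(200))
print(hashlib.sha1().block_size)          # internal block size in bytes
print(sha1_padded_length(len(data)))      # padded length for 200 bytes
print(sha1_in_chunks(data) == hashlib.sha1(data).hexdigest())
```

Feeding 16-byte pieces gives the same digest as hashing the whole buffer at once, which is the property that lets a buffered hash unit run at a different rate from the cipher feeding it.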

Some of the Crypto Commands

  • crypto_write_ram($$orig_plain_text[0], DATA_RAM_ADDR, 8, ENCRYPT_UNIT, ram_sig): perform the write and wait for it to complete

  • crypto_load_iv($$iv[0], 1, ENCRYPT_UNIT, CRYPTO_BANK, ENCRYPT_STATE, iv_sig): load the IV data

  • crypto_load_key($$key[0], 3, ENCRYPT_UNIT, CRYPTO_BANK, ENCRYPT_STATE, key_sig): load the key



  • Yan Luo

  • Chris Baron (for some slides)

  • Mel Tsai, UC Berkeley (for some slides)

  • Thomas Sodon et al., EE, College of New Jersey

  • Zhangxi Tan et al., Tsinghua University