
Undergrad Team U8 – JK FlipFlop

Clark Cianfarini and Garrett Smith

MISTY1 Block Cipher



What is MISTY1?

Cryptographic block cipher

Developed by Mitsubishi Electric

Created in 1995

Developed primarily for encryption on mobile phones and other mobile devices

Stands for: Mitsubishi Improved Security TechnologY



Technical Specs

  • Feistel Network

  • 64-bit block size

  • 128-bit key

  • Rounds in multiples of 4 (4, 8, 12, 16, …)

  • RFC 2994

Picture from:

http://web.archive.org/web/20000823133547/http://www.mitsubishi.com/ghp_japan/misty/misty_e_b.pdf
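The specs list a Feistel network on a 64-bit block. The sketch below illustrates why that structure makes decryption easy. It splits a block into two 32-bit halves, D0 and D1, and alternates XOR updates the way MISTY1's even and odd rounds do. The round function `toy_f` is a hypothetical stand-in, not MISTY1's FO, and the FL layers are omitted. The only point is that decryption replays the same steps in reverse and never has to invert the round function.

```c
#include <assert.h>
#include <stdint.h>

// Hypothetical toy round function, NOT MISTY1's FO. A Feistel network
// never inverts F, so any deterministic function works here.
static uint32_t toy_f(uint32_t half, int round)
{
    return (half * 0x9e3779b1u) ^ (uint32_t)round;
}

// Encrypt a 64-bit block held as halves d0 (high, assumed) and d1 (low).
// Even rounds update d1, odd rounds update d0, mirroring MISTY1's round
// structure with the FL layers left out. `rounds` must be even.
static void feistel_encrypt(uint32_t *d0, uint32_t *d1, int rounds)
{
    for (int i = 0; i < rounds; i += 2) {
        *d1 ^= toy_f(*d0, i);      // even round
        *d0 ^= toy_f(*d1, i + 1);  // odd round
    }
}

// Decrypt by replaying the same XOR updates in reverse order.
static void feistel_decrypt(uint32_t *d0, uint32_t *d1, int rounds)
{
    for (int i = rounds - 2; i >= 0; i -= 2) {
        *d0 ^= toy_f(*d1, i + 1);  // undo odd round
        *d1 ^= toy_f(*d0, i);      // undo even round
    }
}
```

Because each step is `x ^= F(other half)`, applying the same step again cancels it. This holds for any round count in multiples of 4 (4, 8, 12, 16, ...).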



Our Original Implementation

8 rounds, the recommended standard

128-bit key and 64-bit data as hexadecimal inputs (command line arguments)

Both encryption and decryption implemented, plus a mode that runs both consecutively for benchmarking



Original (Unoptimized) Design

Designed for code size and clarity

Written in C

Only standard libraries used

Inefficiencies in loops, multiplies and divides, function calls, and parameter passing

Usage: ./misty <e|d|b> <K> <M> [I]

'e' to encrypt, 'd' to decrypt, 'b' to test both

K is a required 32-digit hex string (128 bits)

M is a required 16-digit hex string (64 bits)

I is an optional number of iterations for benchmarking
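The usage line takes K and M as fixed-length hex strings. Below is a minimal sketch of strict parsing that rejects input of the wrong length or with non-hex characters. `parse_hex_bytes` and `hex_digit` are hypothetical helpers. The authors' own `parse_hex_arg` and `xtoi`, which appear in the profile, are not shown in the slides.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

// Hypothetical helper: value of one hex digit, or -1 if c is not hex.
static int hex_digit(char c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

// Hypothetical helper: parse exactly 2*n hex digits from s into out[0..n-1].
// Returns 0 on success, -1 on a wrong length or a non-hex character.
static int parse_hex_bytes(const char *s, uint8_t *out, size_t n)
{
    if (strlen(s) != 2 * n)
        return -1;
    for (size_t i = 0; i < n; i++) {
        int hi = hex_digit(s[2 * i]);
        int lo = hex_digit(s[2 * i + 1]);
        if (hi < 0 || lo < 0)
            return -1;
        out[i] = (uint8_t)((hi << 4) | lo);
    }
    return 0;
}
```

Under these assumptions, K is parsed with `parse_hex_bytes(argv[2], key, 16)` (32 digits) and M with `parse_hex_bytes(argv[3], msg, 8)` (16 digits).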



Original Design GPROF Profile

    %  cumulative     self                 self    total
 time     seconds  seconds       calls  us/call  us/call  name
45.57       10.65    10.65   560000000     0.02     0.02  fi
19.80       15.28     4.63   160000000     0.03     0.09  fo
 7.63       17.06     1.78   100000000     0.02     0.02  fl
 6.94       18.69     1.62   100000000     0.02     0.02  flinv
 5.25       19.91     1.23    10000000     0.12     0.27  key_schedule
 3.13       20.65     0.73    10000000     0.07     1.01  decrypt_block
 3.06       21.36     0.72    10000000     0.07     1.03  encrypt_block
 2.44       21.93     0.57    20000000     0.03     0.03  unpack_data
 1.54       22.29     0.36    50000000     0.01     0.04  decrypt_round_even
 1.33       22.60     0.31    40000000     0.01     0.13  encrypt_round_even
 1.03       22.85     0.24    40000000     0.01     0.18  decrypt_round_odd
 0.96       23.07     0.23                                __gmon_start__
 0.86       23.27     0.20    40000000     0.01     0.09  encrypt_round_odd
 0.34       23.35     0.08    10000000     0.01     0.04  encrypt_final
 0.21       23.40     0.05                                main
 0.00       23.40     0.00          48     0.00     0.00  xtoi
 0.00       23.40     0.00           4     0.00     0.00  print_hex_data
 0.00       23.40     0.00           2     0.00     0.00  parse_hex_arg

  • About 80% of the time (79.94%) is spent in FO/FI/FL/FLINV

  • Compiled with gcc-4.3.4

  • Benchmarked on 64-bit Core2 @ 2.4 GHz, linux-2.6.33
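FL and FLINV together take about 15% of the runtime, so it helps to see how little work each call does. This sketch follows the FL/FLINV definitions in RFC 2994 as we read them. Check it against the RFC before relying on it.

`EK` is assumed to be a global array of 16-bit expanded-key words. Filling it requires the key schedule and FI (with its S-boxes), which are left out here. Note the `/ 2` and `% 8` index arithmetic. Because `kl` is non-negative, these operations can safely become shifts and masks.

```c
#include <assert.h>
#include <stdint.h>

// Expanded key as 16-bit words (layout assumed from RFC 2994:
// EK[0..7] = K, EK[8..15] = K'). FL/FLINV only read EK[0..15].
static uint16_t EK[32];

// FL: key-dependent layer on a 32-bit value; kl is the FL index 0..9.
static uint32_t fl(uint32_t fl_in, int kl)
{
    uint16_t d0 = (uint16_t)(fl_in >> 16);
    uint16_t d1 = (uint16_t)(fl_in & 0xffff);
    if (kl % 2 == 0) {
        d1 ^= d0 & EK[kl / 2];
        d0 ^= d1 | EK[(kl / 2 + 6) % 8 + 8];
    } else {
        d1 ^= d0 & EK[((kl - 1) / 2 + 2) % 8 + 8];
        d0 ^= d1 | EK[((kl - 1) / 2 + 4) % 8];
    }
    return ((uint32_t)d0 << 16) | d1;
}

// FLINV: undoes FL by applying the same two steps in reverse order.
static uint32_t flinv(uint32_t fl_in, int kl)
{
    uint16_t d0 = (uint16_t)(fl_in >> 16);
    uint16_t d1 = (uint16_t)(fl_in & 0xffff);
    if (kl % 2 == 0) {
        d0 ^= d1 | EK[(kl / 2 + 6) % 8 + 8];
        d1 ^= d0 & EK[kl / 2];
    } else {
        d0 ^= d1 | EK[((kl - 1) / 2 + 4) % 8];
        d1 ^= d0 & EK[((kl - 1) / 2 + 2) % 8 + 8];
    }
    return ((uint32_t)d0 << 16) | d1;
}
```

Each half is updated only from the other half, so FLINV inverts FL for any key contents. With an all-zero EK, FL reduces to `d0 ^= d1`.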



Unoptimized Execution Time

gcc misty_slow.c -o slow

time ./slow b 00112233445566778899aabbccddeeff 0123456789abcdef 10000000

real 0m23.093s
user 0m22.886s
sys 0m0.031s

10 million iterations: 2.31 µs per iteration (~1.15 µs each for encryption and decryption)
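An alternative to timing the whole process with `time` is to time the loop inside the program, which excludes startup and argument parsing. Below is a minimal sketch using POSIX `clock_gettime` with `CLOCK_MONOTONIC`. `ns_per_iteration`, `iteration_fn` and `dummy_iteration` are hypothetical names. The dummy workload only stands in for one encrypt/decrypt pair.

```c
#define _POSIX_C_SOURCE 199309L  // expose clock_gettime/CLOCK_MONOTONIC under strict -std modes
#include <assert.h>
#include <stdint.h>
#include <time.h>

// Hypothetical signature for one benchmark iteration (e.g. encrypt + decrypt).
typedef void (*iteration_fn)(void);

// Run fn `iters` times and return the average wall-clock time per call in ns.
static double ns_per_iteration(iteration_fn fn, long iters)
{
    struct timespec start, end;
    clock_gettime(CLOCK_MONOTONIC, &start);
    for (long i = 0; i < iters; i++)
        fn();
    clock_gettime(CLOCK_MONOTONIC, &end);
    double elapsed_ns = (double)(end.tv_sec - start.tv_sec) * 1e9
                      + (double)(end.tv_nsec - start.tv_nsec);
    return elapsed_ns / (double)iters;
}

// Placeholder workload; writing to a volatile keeps the loop from being optimized away.
static volatile uint64_t sink;
static void dummy_iteration(void) { sink++; }
```

In the setting above, `fn` would wrap one encrypt/decrypt pair and `iters` would be 10,000,000. The arithmetic on the slide is the same: 23.093 s / 10^7 ≈ 2.31 µs per iteration.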



Revised Software Design

Designed for optimal performance

Loops unrolled (rounds, d0/d1 pack)

Multiplies, divides, and modulo by powers of 2 replaced with shifts and bitwise AND

Functions inlined

Reduced parameter passing (expanded key EK made global)

Compiler optimization levels enabled

Compiler architecture-specific options enabled
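Two of the items above can be sketched together: unrolling the D0/D1 packing, and replacing power-of-2 multiplies with shifts. The function names are hypothetical. The byte order is an assumption: the first byte is most significant, so the example M = 0123456789abcdef gives D0 = 0x01234567 and D1 = 0x89abcdef.

```c
#include <assert.h>
#include <stdint.h>

// "Before" style: a loop that accumulates bytes with multiplies by 256.
static void pack_loop(const uint8_t in[8], uint32_t *d0, uint32_t *d1)
{
    *d0 = 0;
    *d1 = 0;
    for (int i = 0; i < 4; i++) {
        *d0 = *d0 * 256 + in[i];
        *d1 = *d1 * 256 + in[i + 4];
    }
}

// "After" style: fully unrolled, fixed shifts and ORs, no loop counter.
// Each byte is widened to uint32_t before shifting so << 24 stays defined.
static void pack_unrolled(const uint8_t in[8], uint32_t *d0, uint32_t *d1)
{
    *d0 = ((uint32_t)in[0] << 24) | ((uint32_t)in[1] << 16)
        | ((uint32_t)in[2] << 8)  |  (uint32_t)in[3];
    *d1 = ((uint32_t)in[4] << 24) | ((uint32_t)in[5] << 16)
        | ((uint32_t)in[6] << 8)  |  (uint32_t)in[7];
}
```

At -O1 and above, gcc usually turns a multiply by 256 into a shift on its own. This is one reason manual changes like these matter more in the unoptimized build.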



Rounds: Before Unrolling

for (i = 0; i < NUM_ROUNDS; i++)
{
    if (i == (NUM_ROUNDS - 1))
        encrypt_final(i, &d0, &d1, ek);
    else if ((i % 2) == 0)
        encrypt_round_even(i, &d0, &d1, ek);
    else
        encrypt_round_odd(i, &d0, &d1, ek);
}



Rounds: After Unrolling

// round 0
d0 = fl(d0, 0);
d1 = fl(d1, 1);
d1 = d1 ^ fo(d0, 0);
// round 1
d0 = d0 ^ fo(d1, 1);
// round 2
d0 = fl(d0, 2);
d1 = fl(d1, 3);
d1 = d1 ^ fo(d0, 2);
// round 3
d0 = d0 ^ fo(d1, 3);
// round 4
d0 = fl(d0, 4);
d1 = fl(d1, 5);
d1 = d1 ^ fo(d0, 4);
// round 5
d0 = d0 ^ fo(d1, 5);
// round 6
d0 = fl(d0, 6);
d1 = fl(d1, 7);
d1 = d1 ^ fo(d0, 6);
// round 7
d0 = d0 ^ fo(d1, 7);
// finalize
d0 = fl(d0, 8);
d1 = fl(d1, 9);
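One way to gain confidence in a hand-unrolled loop is to run the loop form and the straight-line form side by side on the same inputs. The sketch below does this with hypothetical toy functions `toy_fl` and `toy_fo`, which are NOT MISTY1's FL and FO. It folds the even, odd, and final cases of the original loop into one loop. It then checks that loop against the same sequence of FL/FO calls and key indices as the unrolled code.

```c
#include <assert.h>
#include <stdint.h>

#define NUM_ROUNDS 8

// Hypothetical toy stand-ins, NOT MISTY1's FL and FO: any deterministic
// functions of (value, key index) suffice to check that unrolling keeps the order.
static uint32_t toy_fl(uint32_t x, int k) { return (x ^ (0x9e3779b9u * (uint32_t)(k + 1))) + (uint32_t)k; }
static uint32_t toy_fo(uint32_t x, int k) { return ((x << 3) | (x >> 29)) ^ (0x01010101u * (uint32_t)k); }

// Loop form: even rounds apply FL to both halves, then FO; odd rounds apply FO only.
static void encrypt_loop(uint32_t *d0, uint32_t *d1)
{
    for (int i = 0; i < NUM_ROUNDS; i++) {
        if ((i % 2) == 0) {
            *d0 = toy_fl(*d0, i);
            *d1 = toy_fl(*d1, i + 1);
            *d1 ^= toy_fo(*d0, i);
        } else {
            *d0 ^= toy_fo(*d1, i);
        }
    }
    *d0 = toy_fl(*d0, NUM_ROUNDS);      // finalize: FL with key indices 8 and 9
    *d1 = toy_fl(*d1, NUM_ROUNDS + 1);
}

// Straight-line form: the same call sequence with the loop removed.
static void encrypt_unrolled(uint32_t *p0, uint32_t *p1)
{
    uint32_t d0 = *p0, d1 = *p1;
    d0 = toy_fl(d0, 0); d1 = toy_fl(d1, 1); d1 ^= toy_fo(d0, 0);  // round 0
    d0 ^= toy_fo(d1, 1);                                          // round 1
    d0 = toy_fl(d0, 2); d1 = toy_fl(d1, 3); d1 ^= toy_fo(d0, 2);  // round 2
    d0 ^= toy_fo(d1, 3);                                          // round 3
    d0 = toy_fl(d0, 4); d1 = toy_fl(d1, 5); d1 ^= toy_fo(d0, 4);  // round 4
    d0 ^= toy_fo(d1, 5);                                          // round 5
    d0 = toy_fl(d0, 6); d1 = toy_fl(d1, 7); d1 ^= toy_fo(d0, 6);  // round 6
    d0 ^= toy_fo(d1, 7);                                          // round 7
    d0 = toy_fl(d0, 8); d1 = toy_fl(d1, 9);                       // finalize
    *p0 = d0;
    *p1 = d1;
}
```

The toy functions are deliberately order-sensitive, so a mis-numbered key index or a swapped round would make the two results differ.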



Execution Time and Speedup

Description          Time       Speedup (vs. initial)
Slow / Initial       0m23.093s  1.00000
Unroll Rounds        0m21.573s  1.07046
Unroll D0/D1 Init    0m20.750s  1.11292
Shift and AND        0m18.978s  1.21683
Unroll Packing       0m18.135s  1.27339
Make EK Global       0m17.902s  1.28997
Inline FO/FI/FL      0m15.921s  1.45047
Enable -O1           0m4.308s   5.36049
Enable -O2           0m4.276s   5.40061
Enable -O3           0m4.155s   5.55654
Architecture Flags   0m4.128s   5.59423
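Every speedup in the table is measured against the initial build. Each value is therefore the ratio of the baseline time to that row's time, as in these two examples:

```latex
\text{Speedup} = \frac{T_{\text{initial}}}{T_{\text{variant}}},\qquad
\frac{23.093\,\text{s}}{15.921\,\text{s}} \approx 1.450,\qquad
\frac{23.093\,\text{s}}{4.128\,\text{s}} \approx 5.594
```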



Building and Testing the Optimized Implementation

gcc misty_fast.c -o fast
gcc misty_fast.c -o fast -O1
gcc misty_fast.c -o fast -O2
gcc misty_fast.c -o fast -O3
gcc misty_fast.c -o fast -O3 -march=core2

Fastest execution time:

real 0m4.128s
user 0m4.117s
sys 0m0.007s

10 million iterations, 413 ns per iteration




Final Design GPROF Profile

    %  cumulative     self                 self    total
 time     seconds  seconds       calls  ns/call  ns/call  name
42.99        2.26     2.26    10000000   226.15   226.15  decrypt_block
41.57        4.45     2.19    10000000   218.65   218.65  encrypt_block
15.41        5.26     0.81                                main
 0.00        5.26     0.00           4     0.00     0.00  print_hex_data
 0.00        5.26     0.00           2     0.00     0.00  parse_hex_arg

  • Most functions are inlined; apart from main and the I/O helpers, only decrypt_block and encrypt_block remain as separate calls



What was Learned?

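The original implementation may not have been that inefficient: all the manual optimizations together gave only about a 1.45x speedup

A much larger benefit came from gcc's instruction-level optimizations: enabling -O1 alone raised the speedup from about 1.45x to 5.36x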

Profile first, then optimize in places where it actually matters

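The bitwise AND operator has lower precedence than addition, while modulus has higher precedence, so swapping % for & can silently regroup an expression: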

x % y + z → (x % y) + z

x & y + z → x & (y + z)
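Here is a small C check of this pitfall, using hypothetical function names. Note that `x % 8` and `x & 7` agree only for unsigned (or non-negative) `x`, so the rewrite is safe for the unsigned index arithmetic it targets.

```c
#include <assert.h>

// % binds tighter than +, so this is (x % 8) + z.
static unsigned mod_then_add(unsigned x, unsigned z) { return x % 8 + z; }

// Naive replacement: + binds tighter than &, so this is x & (7 + z).
// gcc's -Wparentheses (enabled by -Wall) typically warns about this form.
static unsigned and_then_add_wrong(unsigned x, unsigned z) { return x & 7 + z; }

// Correct replacement: parenthesize the mask explicitly.
static unsigned and_then_add_fixed(unsigned x, unsigned z) { return (x & 7) + z; }
```

For x = 13 and z = 1, the modulus form gives 5 + 1 = 6. The naive AND form gives 13 & 8 = 8, and the parenthesized form gives 6 again.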

All optimizations add up to a significant amount of savings



Future Work

Use SSE vector instructions for parallel operations

Convert data types such as uint8_t/uint16_t to the natural integer size for better memory alignment and access performance

Use a union to replace packing and unpacking data between the byte array and D0/D1

Write the cipher directly in optimized assembly

Build a dedicated hardware implementation (ASIC/FPGA) of MISTY1, which was originally designed to be implemented in hardware
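Below is a minimal sketch of the union idea, with a hypothetical `misty_block` type. One caveat: reading `words[]` gives host-order integers. On a little-endian Core2, the halves would still need a byte swap to match the big-endian D0/D1 order assumed here. Whether that beats explicit packing would have to be measured.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

// Hypothetical block type: the same 8 bytes viewed as bytes or as two 32-bit words.
typedef union {
    uint8_t  bytes[8];
    uint32_t words[2];
} misty_block;

// Detect host byte order at run time.
static int host_is_little_endian(void)
{
    const uint16_t one = 1;
    uint8_t first;
    memcpy(&first, &one, 1);
    return first == 1;
}

// Portable 32-bit byte swap.
static uint32_t swap32(uint32_t x)
{
    return (x >> 24) | ((x >> 8) & 0x0000ff00u)
         | ((x << 8) & 0x00ff0000u) | (x << 24);
}

// Read big-endian halves D0 and D1 through the union, swapping on little-endian hosts.
static void block_to_halves(const misty_block *b, uint32_t *d0, uint32_t *d1)
{
    uint32_t w0 = b->words[0], w1 = b->words[1];
    if (host_is_little_endian()) {
        w0 = swap32(w0);
        w1 = swap32(w1);
    }
    *d0 = w0;
    *d1 = w1;
}
```

The union also guarantees 32-bit alignment for the byte array. Casting an arbitrary `uint8_t *` to `uint32_t *` does not give that guarantee.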



Questions?
