1 / 37

Cryptography on Graphics Processors

Cryptography on Graphics Processors. Francisco Chinchilla COMP 290-GPGP Presentation. Papers. “Brute Force Attack on UNIX Passwords with SIMD Computer” Gershon Kedem and Yuriko Ishihara, Computer Science Department, Duke University. Usenix ’99 Security Symposium

Samuel
Download Presentation

Cryptography on Graphics Processors

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Cryptography on Graphics Processors Francisco Chinchilla COMP 290-GPGP Presentation

  2. Papers • “Brute Force Attack on UNIX Passwords with SIMD Computer” • Gershon Kedem and Yuriko Ishihara, Computer Science Department, Duke University. Usenix ’99 Security Symposium • "CryptoGraphics: Secret Key Cryptography Using Graphics Cards" • Debra L. Cook, John Ioannidis, Angelos D. Keromytis, and Jake Luck, Columbia University. To appear in the Proceedings of the RSA Conference, Cryptographer's Track (CT-RSA). February 2005, San Francisco, CA. • Remotely-Keyed Cryptographics • Debra L. Cook, Ricardo Baratto, and Angelos D. Keromytis, Columbia University.

  3. Outline • Quick Tour of Cryptography • Brute force password hacking using PixelFlow • Implementing Cryptographic systems on a GPU • Using GPU based cryptographic systems in remote keyed systems in unfriendly environments • Conclusions

  4. Cryptography Basics • Two functions of two variables • E(keye, text) encrypts • D(keyd, E(keye, text)) decrypts • If keye==keyd cryptosystem is called symmetric, otherwise it’s asymmetric • We’ll deal with symmetric

  5. Cipher Types • Block Cipher • Encrypts fixed-size blocks of data • Stream Cipher • Encrypts input data one bit (sometimes one byte) at a time • Useful for data with unknown length • Stream ciphers are faster in hardware, but slower in software than block ciphers • Both use XOR heavily

  6. Attacks to Break Encryption • Techniques used • Differential • Linear • Brute Force • Underlying methods • Chosen Plaintext • Ciphertext-only • Known Plaintext • Goals • Find a password • Find the key

  7. DES • Block cipher • Developed by IBM, made a FIPS in ‘76 • Uses only 56-bit keys, not 64, not 128, possibly due to NSA • Designed against differential cryptoanalysis, a ‘new’ (not to IBM nor NSA) field of cryptography in the late 1980’s. Takes 2^47 chosen plaintexts to break. • Not designed against linear cryptoanalysis. Takes 2^39-2^44 known plaintexts • Broken in 56 hours by EFF $250k machine in 1998

  8. DES Encryption Details Key Schedule The Feistel (F) function Subkey(48 bits)

  9. DES – Other Boxes • http://en.wikipedia.org/wiki/Substitution_box • http://en.wikipedia.org/wiki/DES_supplementary_material

  10. Crypt • A variation of the DES algorithm • Used to encrypt Unix passwords • Sample invocations: • crypt(password)=pajXG3p8vEaok • crypt(God)=GoqTni0y.7CH2 • crypt(sex)=se8lvCoF9FQE. • crypt(money)=moIzVjU7RTRr6 • crypt(love)=loYMPnrPlj9gk

  11. AES • Competition to replace DES FIPS • Competition won by Rijndael • AES is fast in both SW and HW • Relatively easy to implement • Requires little memory

  12. AES 128-bit Details • AES operates on a state, a 4×4 array of bytes. • For encryption, each round of a total of 10 consists of four stages: • SubBytes — a non-linear substitution step where each byte is replaced with another according to a lookup table, an S-block. This is the non-linear step. • ShiftRows — a transposition step where each row of the state is shifted cyclically a certain number of steps. • MixColumns — a mixing operation which operates on the columns of the state, combining the four bytes in each column using a linear transformation. This is the “Diffusion” step and is omitted in the final round! • AddRoundKey — each byte of the state is combined with the round key; each round key is derived from the cipher key using a key schedule.

  13. RC4 • Stream cipher • Designed by Ron Rivest in ’87 • In 1994 a description was anonymously posted, which lead to ARCFOUR • In 2001 it was found that the first 256 bytes of the output were non-random (this leads to one big vulnerability in WEP) • Unusually simple • Requires only byte-length manipulations

  14. RC4 - details RC4 Key Scheduling Algorithm (KSA) Encrypt(bits, key){ S = new byte[256]; KSA(key,S); randomBits = PRGA(S); for i=0 to length(bits)-1{ bits[i] = XOR(bits[i],randomBits[i]) } } Decrypt(bits, key){ Encrypt(bits, key) } for i from 0 to 255 S[i] := i j: = 0 for i from 0 to 255 j := (j + S[i] + key[i mod l]) mod 256 swap(S[i],S[j]) pseudo-random gen. algorithm (PRGA) i := 0 j := 0 while GeneratingOutput: i := (i + 1) mod 256 j := (j + S[i]) mod 256 swap(S[i],S[j]) output S[(S[i] + S[j]) mod 256]

  15. PixelFlow Unit

  16. Processing Element

  17. PixelFlow

  18. Password Hacking • Two full chassis=18 SIMD Arrays=147,456 PE’s • Find the key used to encrypt a stream of bytes using RC4 • Check Unix passwords

  19. Password Hacking Results • Each PE checks 1,000 RC4 keys in 3.75s = 38.8M/s for the system • Each PE checks 1,000 crypt passwords in 6s =24.6M/s for the system • System encrypts 614M DES passwords/second

  20. Secret Key Cryptography • Test capability of XOR on different cards using different ciphers • RC4 • AES • Used OpenGL and GLUT, no CG • Did not use ‘the latest greatest’ cards • Basic steps • Disable dithering! • Copy pixels between coordinates, with colormapping and XOR on or off • Why colormaps?

  21. Test systems Note: 256KB RAM = 256 MB RAM?

  22. RC4 on a GPU • Random bit stream must be precomputed • Simply load the bit stream and the data to the GPU and then perform the XOR

  23. RC4 Comparisons (includes all data transfers)

  24. AES on a GPU • Key schedule is not done on GPU • Only store 1 byte of data in each pixel • (De/En)crypts 4*n blocks simultaneously, given that the data is stored in n pixels • Tested against reference and optimized CPU implementations

  25. AES on GPU data layout

  26. AES Encryption Rates

  27. Cryptography on GPU -Conclusions • Performance isn’t great, now what? • GPU’s can still offload some cycles off the CPU? • Not really, the CPU was still fully utilized • Look for a new application, like video system described in their next paper

  28. Remotely Keyed Cryptographics • Extend their previous work • RC4 can be used to sustain rates that are ok for live video • The video is not susceptible to attacks from within the OS • Finally? explained why they didn’t use CG: “Higher level languages do not allow the developer to specify which OpenGL commands are utilized when there are multiple ways of implementing a function via OpenGL commands and do not even guarantee the operations will be transformed into OpenGL commands but instead may transform it intoC code… that may be executed on the OS and not using OpenGL.”

  29. Regarding compression and transmission • Encrypted data doesn’t compress • 100 Mbps = 16.7 500x500 RGB frames/s • If you know about the RFP protocol (used in VNC) or MPG video, you know that the entire screen isn’t necessarily updated from frame to frame, only the sections that need it

  30. System Overview

  31. Session Overview

  32. Test systems (again) Note: 256KB RAM = 256 MB RAM?

  33. Test results

  34. Remote Keyed Cryptography- Conclusion and Comments • 99% of time taken on GPU is writing the keystream • Rates still good for thin-clients • Great for encrypting very sensitive images in a system that is not completely trusted

  35. Presentation Conclusion • GPU’s can have lots and lots of FLOPs • Can we find applications that use them as the main PU? • As a co-processor to offload cycles? • If we fail, can we apply a different twist or find a special case/market of the setting where we had failed previously?

  36. Discussion • Could make the performance of these algorithms better if we could use fragment programs for example?

  37. Additional References • Wikipedia • lots and lots of info and images taken from here • PixelFlow Website • In particular:Eyles, John, Steven Molnar, John Poulton, Trey Greer, Anselmo Lastra, and Nick England, "PixelFlow: The Realization", Proceedings of the Siggraph/Eurographics Workshop on Graphics Hardware, Los Angeles, CA, August 3-4, 1997, 57-68.

More Related