
An Encryption-Enabled Network Protocol Accelerator

Steffen Peter, Mario Zessack, Frank Vater, Goran Panic, Horst Frankenfeldt, and Michael Methfessel.


Presentation Transcript


  1. An Encryption-Enabled Network Protocol Accelerator Steffen Peter, Mario Zessack, Frank Vater, Goran Panic, Horst Frankenfeldt, and Michael Methfessel

  2. Outline • Motivation • TCP • General Hardware Design • Cryptographic Accelerators • Implementation • Conclusions

  3. Motivation [Figure labels: Wireless sensor network, Tiny sensor nodes, Cluster head, Internet] • standard TCP • high data rates • security • low energy

  4. Motivation • Increasing amount of data • even in mobile and ubiquitous scenarios • need for good transport performance • Low cost • Small silicon area • Energy efficient • Need for security • Secrecy, Integrity, Reliability • Support of standard protocols • TCP (transport) • AES (data encryption) • ECC/ECDSA (signature, key agreement) Is dedicated hardware the solution?

  5. TCP • Standard transport protocol of the Internet • Connection-based protocol • Three-way handshake • Complicated connection tear-down • Basic data integrity mechanism • Checksum • Error correction mechanisms • Fast retransmit, slow (re-)start, many others • Flow control • Buffer and congestion control • No actual security mechanisms

  6. TCP – Profiling Results [Figure panels: Transmit, Receive]

  7. TCP Profiling – Implications • Copying data consumes most time and energy • Reduce copy operations as much as possible • Protocol handling needs merely 1/5th of the total computation • Is it worth hard-wiring the TCP state machines in hardware? • Trade-off: performance vs. flexibility • Checksum is the most expensive computation • The obvious dedicated hardware unit • How to integrate it in the data flow? • Memory allocation needs more than 5 percent of time • Can a dedicated unit help here?

  8. TCP Profiling – Our Answers • One-copy architecture • Data is copied directly from the peripheral handler to the right memory location (assigned by CPU) • During this one copy operation other operational blocks (checksum, encryption) listen on the bus and do their work • MIPS CPU performs complicated (but low-effort) TCP logic • Connection build-up/tear-down, error handling, congestion control • Software handling allows protocol variations and debugging • Dedicated checksum block • Checksum block computes checksum during the one copy operation • No dedicated memory manager unit • A hard-wired memory manager would reduce memory allocation time by 77 percent (from 5% to 1% of total time) • BUT high hardware costs (300 flip-flops) and lack of flexibility
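
The one-copy idea above can be sketched in software. This is a minimal illustration only, not the chip's logic: the function name `copy_with_checksum` and its parameters are hypothetical, and the checksum shown is the generic 16-bit ones' complement Internet checksum over the copied bytes. A real TCP checksum also covers a pseudo-header and the TCP header.

```python
def copy_with_checksum(src: bytes, dst: bytearray, offset: int) -> int:
    """Copy src into dst at offset and return the 16-bit Internet checksum of src.

    The sum is accumulated in the same pass that moves the data, mirroring a
    checksum unit that listens on the bus during the single copy operation.
    """
    total = 0
    n = len(src)
    for i in range(0, n, 2):
        hi = src[i]
        dst[offset + i] = hi
        if i + 1 < n:
            lo = src[i + 1]
            dst[offset + i + 1] = lo
        else:
            lo = 0  # odd length: pad the last 16-bit word with a zero byte
        total += (hi << 8) | lo
    # Fold the carries back into the low 16 bits (ones' complement addition).
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


packet_memory = bytearray(16)
payload = bytes.fromhex("0001f203f4f5f6f7")
checksum = copy_with_checksum(payload, packet_memory, 4)
print(hex(checksum))
```

The payload is touched exactly once; the checksum comes out as a by-product of the copy, which is the property the slide attributes to the dedicated checksum block.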

  9. General Design [Block diagram labels: to Host, Host interface handler, SRAM, CPU, System Bus, RF, Checksum, Data I/O, to MAC/PHY] • MIPS CPU handles complex protocols. • CPU never touches payload. • Internal 32 kByte SRAM stores packets. • AMBA bus connects system components. • Standard bus system allows modular approach. • Peripheral bus (APB) connects GPIO, UART, SPI ports. • System concept: interacting independent units. • Units exchange commands and status using register file. • General-purpose formalism for command/status syntax.

  10. General Design - Flow [Block diagram labels: to Host, Host interface handler, SRAM, CPU (Sleeps / Wakeup), System Bus, RF, Checksum, Data I/O, to MAC/PHY] • Incoming packet • Full header processing • Window update • Alloc memory slot for next packet • Signal application • Basic header processing • Select memory slot for packet • Packet received and checksum ok?

  11. Results (Performance and power consumption) • Simulation results: split power among different entities. • Maximal data rate in pure software on MIPS is 20.7 Mbit/sec. • Hardware accelerators reduce load on CPU and save 50% of power. • Maximal data rate with hardware accelerators is 40 Mbit/sec.
      Simulated power per unit (mW):
      Case  Rate (Mb/sec)  CPU active  CPU  AMBA bus  Reg-file  Card-bus  EPP/UA  Total power
      SW    20.7           100%        60   14        7         4         4       89 mW
      HW    20.7           15%         9    14        7         4         12      46 mW
      HW    40.0           31%         18   14        7         4         12      55 mW
      • Measured power consumption is 2.5 times simulated power. • Measured power includes pads. • Consumed power varies for different production runs.
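
As a quick cross-check of the simulated figures on this slide, the sketch below re-adds the per-unit power values and derives two numbers that are not on the slide itself: the exact power saving at 20.7 Mbit/sec and the energy per transferred bit (mW divided by Mbit/sec gives nJ/bit). The dictionary layout and the helper name `energy_per_bit_nj` are illustrative choices.

```python
# Simulated power per unit in mW, in the slide's column order:
# CPU, AMBA bus, register file, CardBus, EPP/UA
CASES = {
    "SW 20.7": {"rate_mbit": 20.7, "parts_mw": [60, 14, 7, 4, 4], "total_mw": 89},
    "HW 20.7": {"rate_mbit": 20.7, "parts_mw": [9, 14, 7, 4, 12], "total_mw": 46},
    "HW 40.0": {"rate_mbit": 40.0, "parts_mw": [18, 14, 7, 4, 12], "total_mw": 55},
}


def energy_per_bit_nj(case: dict) -> float:
    """Energy per transferred bit in nJ (mW / (Mbit/s) = nJ/bit)."""
    return case["total_mw"] / case["rate_mbit"]


for name, case in CASES.items():
    # The per-unit values should add up to the total given on the slide.
    assert sum(case["parts_mw"]) == case["total_mw"], name
    print(f"{name}: {case['total_mw']} mW, {energy_per_bit_nj(case):.2f} nJ/bit")

# Power saving of the accelerated design at the same data rate (20.7 Mbit/sec).
saving = 1 - CASES["HW 20.7"]["total_mw"] / CASES["SW 20.7"]["total_mw"]
print(f"saving at 20.7 Mbit/sec: {saving:.1%}")
```

The three rows sum consistently, and the saving at equal data rate works out to roughly 48%, in line with the slide's rounded 50%.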

  12. Cryptographic Accelerators • AES (Advanced Encryption Standard) • Symmetric block cipher • Suitable for low-power high-throughput data streams • Standardized in November 2001 as FIPS 197 (NIST/National Institute of Standards and Technology; USA) • Input data length: 128 bit; key length: 128, 192, 256 bit • Assumed to be secure for the next 70 years • ECC (Elliptic Curve Cryptography) • Asymmetric cryptography • Suitable for signatures and key establishment • Key length 160-571 bit (NIST standard)

  13. Advanced Encryption Standard (AES) [Figure: AES data path with 10 rounds; blocks xor, S-Box, Shiftrow, MixColumn, Calckey; inputs Key and Data; output data] • Huge design space • Sharing S-Boxes reduces performance but leads to smaller designs • Pipelining and parallelism boost performance but cost area and energy

  14. AES - Results • Throughput: ~52 Mbit/s @ 33 MHz (includes input and output of the data blocks) • Size: 0.336 mm² in 0.25 μm CMOS (8,450 equivalent gates) • 70 clock cycles per 128 bit data block for en-/decryption • 72 times faster than the software implementation on MIPS (33 MHz), and it requires 0.4% of the energy of the software solution
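
The throughput figures on this slide can be related with a little arithmetic. The sketch below is a derived estimate, not a measurement: it computes the core-only rate implied by 70 cycles per block at 33 MHz and the effective cycle count per block implied by the reported ~52 Mbit/s. The constant names are illustrative, and since the reported rate is approximate, so is the overhead estimate.

```python
CLOCK_HZ = 33e6        # clock frequency from the slide
BLOCK_BITS = 128       # AES block size
CORE_CYCLES = 70       # cycles per block for en-/decryption, from the slide
REPORTED_MBIT = 52.0   # reported throughput including data I/O (approximate)

# Throughput if a new block could start every 70 cycles with no I/O overhead.
core_mbit = BLOCK_BITS * CLOCK_HZ / CORE_CYCLES / 1e6

# Cycles per block implied by the reported throughput, and the implied
# per-block overhead for moving data in and out of the unit.
effective_cycles = BLOCK_BITS * CLOCK_HZ / (REPORTED_MBIT * 1e6)
io_cycles = effective_cycles - CORE_CYCLES

print(f"core-only throughput: {core_mbit:.1f} Mbit/s")
print(f"effective cycles per block: {effective_cycles:.1f}")
print(f"implied I/O overhead: {io_cycles:.1f} cycles per block")
```

Under these assumptions the core alone would reach about 60 Mbit/s, so the gap to ~52 Mbit/s corresponds to roughly 11 extra cycles per block for input and output.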

  15. Elliptic Curve Cryptography (ECC) • Asymmetric cryptography • Basis for many key exchange and signature algorithms (ECDSA) • Trapdoor : Elliptic Curve Point Multiplication • one can compute: Q = kP • it is infeasible to determine k for given Q and P • Higher security with shorter key lengths • about 1/10th of RSA’s key size • Still operations on Elliptic Curves are expensive • one 233 bit EC Point multiplication needs: 1200 additions, 1500 multiplications, 800 squarings, 1 division (233 bit each in the finite field)
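
The point multiplication Q = kP can be illustrated with the textbook double-and-add method. This is a sketch under stated assumptions: it uses a tiny prime-field toy curve (y² = x³ + 2x + 2 over GF(17), base point (5, 1), group order 19) rather than the 233-bit curve from the slides, and the slides do not say which multiplication algorithm the hardware uses. The names `point_add` and `scalar_mult` are hypothetical. It requires Python 3.8+ for the modular inverse via `pow`.

```python
from typing import Optional, Tuple

Point = Optional[Tuple[int, int]]  # None stands for the point at infinity

# Toy curve y^2 = x^3 + A*x + B over GF(P_MOD); illustrative parameters only.
P_MOD, A, B = 17, 2, 2
G: Point = (5, 1)  # base point; it generates a group of order 19


def point_add(p: Point, q: Point) -> Point:
    """Add two curve points with the affine chord-and-tangent rule."""
    if p is None:
        return q
    if q is None:
        return p
    x1, y1 = p
    x2, y2 = q
    if x1 == x2 and (y1 + y2) % P_MOD == 0:
        return None  # p + (-p) is the point at infinity
    if p == q:
        # Tangent slope for point doubling.
        lam = (3 * x1 * x1 + A) * pow((2 * y1) % P_MOD, -1, P_MOD)
    else:
        # Chord slope for adding two distinct points.
        lam = (y2 - y1) * pow((x2 - x1) % P_MOD, -1, P_MOD)
    lam %= P_MOD
    x3 = (lam * lam - x1 - x2) % P_MOD
    y3 = (lam * (x1 - x3) - y1) % P_MOD
    return (x3, y3)


def scalar_mult(k: int, p: Point) -> Point:
    """Compute k*p for k >= 0 by scanning the bits of k (double-and-add)."""
    result: Point = None
    addend = p
    while k > 0:
        if k & 1:
            result = point_add(result, addend)
        addend = point_add(addend, addend)  # doubling step
        k >>= 1
    return result


print(scalar_mult(2, G), scalar_mult(3, G))
```

Each doubling and addition here expands into several finite-field additions, multiplications, squarings and an inversion, which is where the slide's operation counts for a 233-bit multiplication come from. Recovering k from Q and P on a toy curve is trivial by enumeration; the infeasibility claim applies only at cryptographic sizes.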

  16. ECC - Design • Asymmetric cryptography • Trapdoor: Elliptic Curve Point Multiplication • one can compute: Q = kP • it is infeasible to determine k for given Q and P • [Table, column labels not preserved] Utilization: 15%, 95%, 50% • Area: 70%, 5%, 20%

  17. ECC – Implementation Results • Time for one ECPM (233 bit): • MIPS: 410 ms • HW: 0.4 ms • Energy for one ECPM (233 bit): • MIPS: 16 mWs • HW: 0.03 mWs
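
The figures on this slide imply the following ratios. The sketch is plain arithmetic on the slide's numbers; the average-power values are derived (energy divided by time) and do not appear on the slide. Variable names are illustrative.

```python
# Time and energy for one 233-bit elliptic curve point multiplication (ECPM).
MIPS_TIME_MS, HW_TIME_MS = 410.0, 0.4
MIPS_ENERGY_MWS, HW_ENERGY_MWS = 16.0, 0.03  # mWs = millijoule

speedup = MIPS_TIME_MS / HW_TIME_MS
energy_ratio = MIPS_ENERGY_MWS / HW_ENERGY_MWS

# Average power while computing: mWs / ms = W, so multiply by 1000 for mW.
mips_avg_power_mw = MIPS_ENERGY_MWS / MIPS_TIME_MS * 1000
hw_avg_power_mw = HW_ENERGY_MWS / HW_TIME_MS * 1000

print(f"speedup: {speedup:.0f}x, energy reduction: {energy_ratio:.0f}x")
print(f"average power: MIPS {mips_avg_power_mw:.0f} mW, HW {hw_avg_power_mw:.0f} mW")
```

The accelerator is about a thousand times faster and uses several hundred times less energy per operation. Its derived average power while active is higher than the CPU's, but it is active for a far shorter time, which is why the energy per operation drops.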

  18. Implemented Chip (Design) [Block diagram labels: MIPS Processor Core, I-Cache (16 kB), D-SPRAM (8 kB), EJTAG (Debug), AMBA AHB Bus, Bridge (Master), CardBus (Master) to Linux/Windows Host, Memory Controller (AHB Slave), Flash, GPIO, UART 0, UART Serial 1+2, Data I/O Control (Master), Packet Filter / Checksum, Check Sum, Registers & Control, CPU Control Bus, Internal SRAM (32 kB), AES / MD5, ECC, EPP]

  19. Implemented Chip (Chip Photo) • Size: 7.3 x 7.4 mm (54 mm²) • Core: 44 mm² • Pads: 219 in QFP256 package • Transistors: 4.8 M in 0.25 μm • Packet SRAM: 32 kByte • Instruction cache: 16 kByte • Data scratchpad: 8 kByte

  20. Implementation (Test Board) • Allows: • Testing the implementation in practice • Tests of interoperability • Performance tests • Energy measurements

  21. Conclusions • Profiling of TCP/IP code identified bottlenecks • TCP checksum and copying use 90% of power. • High data rate needs hardware accelerators. • Chip is a hardware solution for TCP/IP handling • Takes care of middle protocol layers efficiently. • AMBA-based bus as prototype for modular systems • Assemble different systems quickly. • Pre-tested components lead to reliability. • Cryptographic components allow security at low cost • Designs for AES and ECC improve performance and energy consumption for security operations by up to three orders of magnitude. • TCP chip creates basis for further developments • Extension to higher data rates (Gbit/sec). • Use as component of complex single-chip systems.

  22. Thank You Questions? peter@ihp-microelectronics.com
