Physical Security and Side-Channel Attacks

Physical Security and Side-Channel Attacks Rice ELEC 528/ COMP 538 Farinaz Koushanfar Spring 2009

Outline • Introduction • Hardware targets • Attack classification • Power attacks • Timing attacks • Electromagnetic attacks • Fault injection attacks

Introduction • Classic cryptography views the securing problem using mathematical abstractions • The classic cryptoanalysis has had a great success and promise • Analysis and quantifying crypto algorithms’s resilience against attacks) • Recently, many of the security protocols have been attacked using physical attacks • Take advantage of the implementation specific to recover the secret parameters

Physical attacks • Traditional cryptography is centered around the concepts of one-way and trapdoor functions • A one-way function can be rapidly calculated, but is computationally difficult to invert • Polynomial time algorithms rarely find a pre-image of the one-way security functions for a random set of inputs • A trapdoor one-way function is a function that is easy to invert if and only if a certain secret (key) is available • Physical attacks usually have two phases: • Interaction phase: the attacker exploits some physical characteristics of the device • Exploitation phase: analyzing the gathered information to recover the secret

Model • Consider a device capable of doing cryptographic function • The key is usually stored in the device and protected • Modern crypto based on Kerckhoff’s assumptions all of the data required to operate a chip is entirely hidden in the secret • Attacker only needs to extract the keys

Principle of divide-and-conquer attack • The divide and conquer (D&C) attacks attempt at recovering the key by parts • The idea is that an observable characteristic can be correlated with a partial key • The partial key should be small enough to enable exhaustive search • Once a partial key is validated, the process is repeated for finding other keys • D&C attacks may be iterative (some parts of the key dependent on others) or independent

Hardware targets • The most common victim of hardware cryptoanalysis are the smart cards (SC) • Attacks on SCs are applicable to any general purpose processor with a fixed bus length • Attacks on FPGAs are also reported. FPGAs represent application specific devices with parallel computing opportunity

Smart Cards • It has a small processor (8bit or 32bit) long with ROM, EEPROM and a small RAM • There are eight wires connecting the processor to the outside world • Power supply: SCs have no internal batteries, the current provided by the reader • Clock: SCs do not have an internal clock • SCs are typically equipped with a shield that destroys the chip if a tampering happens

FPGAs • The first difference with SCs is in the applications of the two processor. • FPGAs and ASICs allow parallel computing • Multiple programmable configuration bits

Attack classification • Many possible attacks, the attacks are often not mutually exclusive • Invasive vs. noninvasive attacks • Active vs. passive • Active attacks tamper with device’s proper functionality, either temporary or permanently

Five major attack groups • Probing attack (invasive) • Fault injection attacks – active attacks , maybe invasive or noninvasive • Timing attacks exploit device’s running time • Power analysis attack • Electromagnetic analysis attacks

Power attacks

Measuring phase • This task is usually straightforward • Easy for smart cards: the energy is provided by the terminal and the current can be read • Relatively inexpensive (<$1000) equipment can digitally sample voltage differences at high rates (1GHz++) with less than 1% error • Device’s power consumption depends on many things, including its structure and data

Simple power analysis (SPA) • Monitoring the device’s power consumption to deduce information about data/operation • Example: SPA on DES – smart card • The internal structure is shown in the next slide • Summary DES - a block cipher • a product cipher • 16 rounds (iterations) on the input bits (of P) • substitutions (for confusion) and • permutations (for diffusion) • Each round with a round key • Generated from the user-supplied key

Input Input Permutation L0 R0 S P L1 R1 K1 K L16 R16 K16 Final Permutation Output * DES Basic Structure [Fig. – cf. J. Leiwo] • Input: 64 bits (a block) • Li/Ri– left/right half of the input block • for iteration i (32 bits) – subject to substitution S and permutation P (cf. Fig 2-8– text) • K - user-supplied key • Ki - round key: • 56 bits used +8 unused • (unused for E but often used for error checking) • Output: 64 bits (a block) • Note: Ri becomes L(i+1) • All basic op’s are simple logical ops • Left shift / XOR

Example 1 - SPA on DES (cont’d) • The upper trace – entire encryption, including the initial phase, 16 DES rounds, and the initial permutation • The lower trace – detailed view of the second and third rounds

square and multiply algorithm SPA on DES (cont’d) • The DES structure and 16 rounds are known • Instruction flow depends on data  power signature • Example: Modular exponentiation in DES is often implemented by square and multiply algorithm • Typically the square operation is implemented differently compared with the multiply (for speed purposes) • Then, the power trace of the exponentiation can directly yields the corresponding value • All programs involving conditional branchingbased on the key values are at risk!

Example 2: SPA on RSA

SPA example (cont’d)

SPA example (cont’d) • Unprotected modular exponentiation – square and multiply algorithm • The pick values reveal the key values

Possible countermeasure – randomizing RSA exponentiation

Differential power analysis (DPA) • SPA targets variable instruction flow • DPA targets data-dependence • Difference b/w smart cards (SCs) and FPGAs • In SCs, one operation running at a time •  Simple power tracing is possible • In FPGAs, typically parallel computations prevents visual SPA inspection  DPA

Example: DPA on DES • Divide-and-conquer strategy, comparing powers for different inputs • Record large number of inputs and record the corresponding power consumption • We have access to R15, that entered the last round operation, since it is equal to L16 • Take this output bit (called M’i) at the last round and classify the curves based on the bit • 6 specific bits of R15 will be XOR’d with 6 bits of the key, before entering the S-box • By guessing the 6-bit key value, we can predict the bit b, or an arbitrary output bit of an arbitrary S-box output • Thus, with 26 partitions, one for each possible key, we can break the cipher much faster A closer look at HW Implementation Of DES

DPA (cont’d) • DPA can be performed in any algorithm that has the operation =S(K), •  is known and K is the segment key The waveforms are captured by a scope and Sent to a computer for analysis

What is available after acquisition?

DPA (cont’d) The bit will classify the wave wi • Hypothesis 1: bit is zero • Hypothesis 2: bit is one • A differential trace will be calculated for each bit!

DPA (cont’d)

DPA -- testing

DPA – the wrong guess

DPA (cont’d) • The DPA waveform with the highest peak will validate the hypothesis

DPA curve example

DPA by correlations

Attacking a secret key algorithm

Typical DPA Target

Example -- DPA

Example – hypothesis testing

DPA on DES algorithm

DPA on other algorithms

DPA (Cont’d)

Improvements over DPA • Correlation power analysis (CPA) - attacker steps • Predict the power usage of the device at one specific instant, as a function of certain key bits • E.g., for DES, it is assumed to be function of the Hamming weight of the data • Prediction matrix stores the predicted values • Consumption vector Stores the measured power • The attacker compared the actual and the predicted values, using correlation coefficient • E.g., correlation b/w all the columns of the prediction vector and the consumption matrix

Modeling the power consumption • Hamming weight model • Typically measured on a bus, Y=aH(X)+b • Y: power consumption; X: data value; H: Hamming weight • The Hamming distance model • Y=aH(PX)+b • Accounting for the previous value on the bus (P)

Correlation power analysis (CPA) • The equation for generating differential waveforms replaced with correlations • Rather than attacking one bit, the attacker tries prediction of the Hamming weight of a word (H) • The correlation is computed by:

More about PA (cont’d) • Data-dependent attacks require power consumption model • Can be measured and learned • Synchronization of the measurements needs to be addressed • The attack is affected by parallel computing which lowers observability • The described attack is not the best achieved to date, e.g., techniques based on maximum likelihood often offer better results

Statistical PA -- countermeasures

Anti-DPA countermeasures

Physical Security and Side-Channel Attacks