# Physical Security and Side-Channel Attacks

Physical Security and Side-Channel Attacks. Rice ELEC 528/ COMP 538 Farinaz Koushanfar Spring 2009. Outline. Introduction Hardware targets Attack classification Power attacks Timing attacks Electromagnetic attacks Fault injection attacks. Introduction.

Presentation Transcript

### Physical Security and Side-Channel Attacks

Rice ELEC 528/ COMP 538

Farinaz Koushanfar

Spring 2009

• Introduction

• Hardware targets

• Attack classification

• Power attacks

• Timing attacks

• Electromagnetic attacks

• Fault injection attacks

• Classical cryptography treats security as a problem of mathematical abstractions

• Classical cryptanalysis has had great success and promise

• Analyzing and quantifying crypto algorithms’ resilience against attacks

• Recently, many security protocols have been broken using physical attacks

• These take advantage of implementation-specific characteristics to recover the secret parameters

• Traditional cryptography is centered around the concepts of one-way and trapdoor functions

• A one-way function can be rapidly calculated, but is computationally difficult to invert

• Polynomial time algorithms rarely find a pre-image of the one-way security functions for a random set of inputs

• A trapdoor one-way function is a function that is easy to invert if and only if a certain secret (key) is available

• Physical attacks usually have two phases:

• Interaction phase: the attacker exploits some physical characteristics of the device

• Exploitation phase: analyzing the gathered information to recover the secret

• Consider a device capable of performing a cryptographic function

• The key is usually stored in the device and protected

• Modern crypto is based on Kerckhoffs’s assumption: all of the secrecy needed to operate a chip securely resides in the key

• Attacker only needs to extract the keys

• Divide-and-conquer (D&C) attacks attempt to recover the key in parts

• The idea is that an observable characteristic can be correlated with a partial key

• The partial key should be small enough to enable exhaustive search

• Once a partial key is validated, the process is repeated to find the remaining partial keys

• D&C attacks may be iterative (some parts of the key dependent on others) or independent
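
As a rough illustration of the divide-and-conquer idea, the sketch below recovers a 4-byte key one byte at a time. The observable (`leak`, the Hamming weight of each plaintext byte XORed with the key byte) and every name in it are assumptions made for illustration; they do not model any particular device.

```python
def hamming_weight(value: int) -> int:
    """Number of set bits in value."""
    return bin(value).count("1")

def leak(key: bytes, plaintext: bytes) -> list:
    """Hypothetical observable: one Hamming weight per (plaintext XOR key) byte."""
    return [hamming_weight(p ^ k) for p, k in zip(plaintext, key)]

def recover_key(observations, key_len: int) -> bytes:
    """Recover the key one byte at a time (256 guesses per byte)."""
    key = bytearray()
    for pos in range(key_len):
        # Keep only the guesses that are consistent with every observation.
        survivors = [
            guess for guess in range(256)
            if all(hamming_weight(pt[pos] ^ guess) == leaks[pos]
                   for pt, leaks in observations)
        ]
        if len(survivors) != 1:
            raise ValueError(f"byte {pos}: need more observations")
        key.append(survivors[0])
    return bytes(key)

secret = bytes([0x3A, 0xC5, 0x00, 0xFF])  # unknown to the attacker
plaintexts = [bytes([v] * len(secret)) for v in range(256)]
observations = [(pt, leak(secret, pt)) for pt in plaintexts]
recovered = recover_key(observations, len(secret))
print(recovered.hex())
```

Each byte costs at most 256 guesses, so the whole key takes about 4 × 256 trials instead of the 2^32 needed for a monolithic exhaustive search.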

### Hardware targets

• The most common victims of hardware cryptanalysis are smart cards (SCs)

• Attacks on SCs are applicable to any general purpose processor with a fixed bus length

• Attacks on FPGAs are also reported. FPGAs represent application specific devices with parallel computing opportunity

• A smart card has a small processor (8-bit or 32-bit) along with ROM, EEPROM, and a small RAM

• There are eight wires connecting the processor to the outside world

• Power supply: SCs have no internal batteries; the current is provided by the reader

• Clock: SCs do not have an internal clock

• SCs are typically equipped with a shield that destroys the chip if tampering is detected

• The first difference between FPGAs and SCs is in the applications of the two kinds of processor

• FPGAs and ASICs allow parallel computing

• Multiple programmable configuration bits

### Attack classification

• Many attacks are possible, and they are often not mutually exclusive

• Invasive vs. noninvasive attacks

• Active vs. passive

• Active attacks tamper with the device’s proper functionality, either temporarily or permanently

• Probing attack (invasive)

• Fault injection attacks – active attacks, may be invasive or noninvasive

• Timing attacks exploit the device’s running time

• Power analysis attack

• Electromagnetic analysis attacks

### Power attacks

• Measuring the power consumption is usually straightforward

• Easy for smart cards: the energy is provided by the terminal and the current can be read

• Relatively inexpensive (<\$1000) equipment can digitally sample voltage differences at high rates (1 GHz and above) with less than 1% error

• A device’s power consumption depends on many things, including its structure and the data it processes

• Monitoring the device’s power consumption to deduce information about data/operation

• Example: SPA on DES – smart card

• The internal structure is shown in the next slide

• DES summary: a block cipher

• a product cipher

• 16 rounds (iterations) on the input bits (of P)

• substitutions (for confusion) and

• permutations (for diffusion)

• Each round with a round key

• Generated from the user-supplied key

[Figure: DES basic structure (cf. J. Leiwo): Input Permutation; L0, R0; S, P; L1, R1; round keys K1 … K16 from K; L16, R16; Final Permutation; Output]

• Input: 64 bits (a block)

• Li/Ri: left/right half of the input block for iteration i (32 bits each), subject to substitution S and permutation P (cf. Fig. 2-8 in the text)

• K: user-supplied key

• 56 bits used + 8 unused

• (unused for encryption but often used for error checking)

• Ki: round key (48 bits, derived from K)

• Output: 64 bits (a block)

• Note: Ri becomes L(i+1)

• All basic op’s are simple logical ops

• Left shift / XOR
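
The round structure summarized above (each Ri becomes L(i+1), built only from shifts and XORs) can be sketched as a toy Feistel network. The round function and round keys below are placeholders chosen for illustration; they are not DES’s real expansion, S-boxes, permutation, or key schedule.

```python
MASK32 = 0xFFFFFFFF

def toy_round_function(right: int, round_key: int) -> int:
    """Placeholder for DES's round function (NOT the real S-boxes/permutation)."""
    x = (right ^ round_key) & MASK32
    return ((x << 3) | (x >> 29)) & MASK32  # rotate left by 3 bits

def feistel(block64: int, round_keys) -> int:
    """Run one Feistel pass over a 64-bit block."""
    left = (block64 >> 32) & MASK32
    right = block64 & MASK32
    for key in round_keys:
        # R_i becomes L_(i+1); the new right half mixes in the round key.
        left, right = right, left ^ toy_round_function(right, key)
    # Swap the halves at the end so the same routine decrypts
    # when the round keys are supplied in reverse order.
    return (right << 32) | left

# Hypothetical 16 round keys (a real cipher derives them from the user key).
round_keys = [(0x9E3779B9 * (i + 1)) & MASK32 for i in range(16)]
plaintext = 0x0123456789ABCDEF
ciphertext = feistel(plaintext, round_keys)
decrypted = feistel(ciphertext, round_keys[::-1])
```

Decryption works regardless of the round function chosen, which is why the Feistel structure needs no inverse S-boxes.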

• The upper trace – entire encryption, including the initial phase, 16 DES rounds, and the final permutation

• The lower trace – detailed view of the second and third rounds

SPA on DES (cont’d)

• The DES structure and 16 rounds are known

• Instruction flow depends on data → power signature

• Example: Modular exponentiation (used in public-key ciphers such as RSA rather than in DES) is often implemented by the square-and-multiply algorithm

• Typically the square operation is implemented differently compared with the multiply (for speed purposes)

• Then, the power trace of the exponentiation can directly yield the exponent’s value

• All programs involving conditional branching based on the key values are at risk!

• Unprotected modular exponentiation – square and multiply algorithm

• The peak values reveal the key values
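
A minimal sketch of why unprotected square-and-multiply leaks: if an observer can tell squarings from multiplications (here idealized as a string of `S` and `M` symbols rather than a real power trace), the exponent can be read off directly. The function names and the idealized trace are assumptions for illustration.

```python
def square_and_multiply(base: int, exponent: int, modulus: int):
    """Left-to-right binary exponentiation that also logs its operations."""
    result = 1
    ops = []
    for bit in bin(exponent)[2:]:
        result = (result * result) % modulus
        ops.append("S")  # every bit costs one squaring
        if bit == "1":
            result = (result * base) % modulus
            ops.append("M")  # only 1-bits cost an extra multiplication
    return result, "".join(ops)

def exponent_from_ops(ops: str) -> int:
    """Read the exponent back: 'SM' encodes a 1 bit, a lone 'S' a 0 bit."""
    bits = []
    i = 0
    while i < len(ops):
        if ops[i:i + 2] == "SM":
            bits.append("1")
            i += 2
        else:
            bits.append("0")
            i += 1
    return int("".join(bits), 2)

value, trace = square_and_multiply(7, 0b101101, 1009)
leaked_exponent = exponent_from_ops(trace)
```

The key-dependent `if` is exactly the conditional branch the slide warns about.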

• SPA targets variable instruction flow

• DPA targets data-dependence

• Difference b/w smart cards (SCs) and FPGAs

• In SCs, one operation runs at a time

• → Simple power tracing is possible

• In FPGAs, parallel computation typically prevents visual SPA inspection → DPA

• Divide-and-conquer strategy, comparing powers for different inputs

• Record a large number of inputs and the corresponding power consumption

• We have access to R15, which entered the last round operation, since it is equal to L16

• Take one output bit (called M’i) of the last round and classify the curves based on that bit

• 6 specific bits of R15 will be XOR’d with 6 bits of the key, before entering the S-box

• By guessing the 6-bit key value, we can predict the bit b, or an arbitrary output bit of an arbitrary S-box output

• Thus, with $2^6 = 64$ partitions, one for each possible 6-bit key value, we can break the cipher much faster

A closer look at HW implementation of DES

• DPA can be performed on any algorithm that has the operation $\beta = S(\alpha \oplus K)$,

• $\alpha$ is known and K is the segment key

The waveforms are captured by a scope and sent to a computer for analysis

The bit will classify the wave $w_i$

• Hypothesis 1: bit is zero

• Hypothesis 2: bit is one

• A differential trace will be calculated for each bit!

• The DPA waveform with the highest peak will validate the hypothesis
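
The partition-and-compare step can be sketched on simulated data. Everything below is an assumption for illustration: a toy 4-bit S-box stands in for a DES S-box, each "trace" is a single sample equal to the Hamming weight of the S-box output plus Gaussian noise, and the known input is called `alpha`. With an S-box this small, some wrong guesses give a strongly negative difference, so the sketch ranks guesses by the signed difference of means.

```python
import random

# Toy 4-bit S-box (illustrative only; a real DES S-box maps 6 bits to 4).
TOY_SBOX = [0xC, 0x5, 0x6, 0xB, 0x9, 0x0, 0xA, 0xD,
            0x3, 0xE, 0xF, 0x8, 0x4, 0x7, 0x1, 0x2]

def hamming_weight(value: int) -> int:
    return bin(value).count("1")

def simulate_traces(key: int, passes: int = 200, noise: float = 0.3, seed: int = 1):
    """Simulated measurements: (known input alpha, one noisy power sample)."""
    rng = random.Random(seed)
    traces = []
    for _ in range(passes):
        for alpha in range(16):
            power = hamming_weight(TOY_SBOX[alpha ^ key]) + rng.gauss(0.0, noise)
            traces.append((alpha, power))
    return traces

def dpa_difference(traces, guess: int, bit: int = 3) -> float:
    """Mean power where the predicted bit is 1 minus mean power where it is 0."""
    ones, zeros = [], []
    for alpha, power in traces:
        predicted = (TOY_SBOX[alpha ^ guess] >> bit) & 1
        (ones if predicted else zeros).append(power)
    return sum(ones) / len(ones) - sum(zeros) / len(zeros)

def dpa_attack(traces) -> int:
    """Return the key guess with the highest differential peak."""
    return max(range(16), key=lambda guess: dpa_difference(traces, guess))

traces = simulate_traces(key=0xB)
best_guess = dpa_attack(traces)
```

A real attack computes this difference at every time sample of the recorded waveforms and looks for the peak; the single-sample trace here only keeps the sketch short.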

• Correlation power analysis (CPA) - attacker steps

• Predict the power usage of the device at one specific instant, as a function of certain key bits

• E.g., for DES, it is assumed to be function of the Hamming weight of the data

• Prediction matrix stores the predicted values

• Consumption vector stores the measured power

• The attacker compares the actual and the predicted values, using the correlation coefficient

• E.g., correlation between each column of the prediction matrix and the consumption vector

• Hamming weight model

• Typically measured on a bus, $Y = aH(X) + b$

• Y: power consumption; X: data value; H: Hamming weight

• The Hamming distance model

• $Y = aH(P \oplus X) + b$

• Accounting for the previous value on the bus (P)
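
The CPA steps and the Hamming weight model can be tried on simulated data. The sketch below assumes the leaking bus value is `x XOR key` for a known 8-bit input `x`, with `a`, `b`, the noise level, and all function names chosen for illustration. It builds the prediction matrix (one row per trace, one column per key guess) and correlates each column with the consumption vector.

```python
import random

def hamming_weight(value: int) -> int:
    return bin(value).count("1")

def pearson(xs, ys) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5

def simulate(key: int, passes: int = 4, a: float = 1.0, b: float = 5.0,
             noise: float = 0.5, seed: int = 7):
    """Simulated consumption Y = a*H(x XOR key) + b + noise for known inputs x."""
    rng = random.Random(seed)
    inputs, consumption = [], []
    for _ in range(passes):
        for x in range(256):
            inputs.append(x)
            consumption.append(a * hamming_weight(x ^ key) + b + rng.gauss(0.0, noise))
    return inputs, consumption

def cpa_attack(inputs, consumption):
    """Correlate every prediction-matrix column with the consumption vector."""
    prediction = [[hamming_weight(x ^ guess) for guess in range(256)] for x in inputs]
    scores = [pearson([row[guess] for row in prediction], consumption)
              for guess in range(256)]
    best = max(range(256), key=lambda guess: scores[guess])
    return best, scores

inputs, consumption = simulate(key=0x5A)
best_guess, scores = cpa_attack(inputs, consumption)
```

The unknown constants `a` and `b` never need to be estimated, since the correlation coefficient is invariant under positive scaling and offset.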

• The equation for generating differential waveforms is replaced with correlations

• Rather than attacking one bit, the attacker tries to predict the Hamming weight of a word (H)

• The correlation is computed by:
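
A common choice here is the Pearson correlation coefficient. The notation is assumed for illustration: $H_i$ is the predicted Hamming weight and $W_i$ the measured power for trace $i$ out of $N$, with $\bar{H}$ and $\bar{W}$ their means.

$$
\rho(H, W) = \frac{\sum_{i=1}^{N} (H_i - \bar{H})(W_i - \bar{W})}{\sqrt{\sum_{i=1}^{N} (H_i - \bar{H})^2}\,\sqrt{\sum_{i=1}^{N} (W_i - \bar{W})^2}}
$$

Under the model $Y = aH(X) + b$ with $a > 0$, the correct key guess should give the value of $\rho$ closest to 1.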

• Data-dependent attacks require a power consumption model

• Can be measured and learned

• Synchronization of the measurements needs to be addressed

• The attack is affected by parallel computing which lowers observability

• The described attack is not the best achieved to date, e.g., techniques based on maximum likelihood often offer better results

• Internal clock phase shift

• Differential power analysis, by Kocher, Jaffe, and Jun

• Power analysis tutorial, by Aigner and Oswald

• A tutorial on physical security and side-channel attacks, by Koeune and Standaert

• Michael Tunstall has some good material – a few of the charts are his courtesy

• Side channel attacks: countermeasures, by Verbauwhede

### Timing attacks

• Running time of a crypto processor can be used as an information channel

• The idea was proposed by Kocher, Crypto’96

• Key generation:

• Generate large (say, 2048-bit) primes p, q

• Compute $n = pq$ and $\varphi(n) = (p-1)(q-1)$

• Choose small e, relatively prime to $\varphi(n)$

• Typically, e=3 (may be vulnerable) or $e = 2^{16}+1 = 65537$ (why?)

• Compute unique d such that $ed \equiv 1 \pmod{\varphi(n)}$

• Public key = (e,n); private key = d

• Security relies on the assumption that it is difficult to factor n into p and q

• Encryption of m: $c = m^e \bmod n$

• Decryption of c: $c^d \bmod n = (m^e)^d \bmod n = m$
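
The key generation, encryption, and decryption steps above can be checked with a toy example. The primes below are deliberately tiny and chosen for illustration only (real keys use primes of a thousand bits or more), and `make_keypair` is a hypothetical helper, not a library function.

```python
from math import gcd

def make_keypair(p: int, q: int, e: int = 65537):
    """Textbook RSA key generation from two given primes (no padding)."""
    n = p * q
    phi = (p - 1) * (q - 1)
    if gcd(e, phi) != 1:
        raise ValueError("e must be relatively prime to phi(n)")
    d = pow(e, -1, phi)  # modular inverse; needs Python 3.8+
    return (e, n), d

def encrypt(m: int, public_key) -> int:
    e, n = public_key
    return pow(m, e, n)

def decrypt(c: int, d: int, n: int) -> int:
    return pow(c, d, n)

public_key, d = make_keypair(61, 53, e=17)  # n = 3233, phi(n) = 3120
message = 65
ciphertext = encrypt(message, public_key)
plaintext = decrypt(ciphertext, d, public_key[1])
```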

• RSA decryption: compute $y^x \bmod n$

• This is a modular exponentiation operation

• Naïve algorithm: square and multiply

[Figure: square-and-multiply loop, annotated: whether an iteration takes a long time depends on the kth bit of the secret exponent; one branch takes a while to compute, the other is instantaneous]

Outline of Kocher’s Attack

• Idea: guess some bits of the exponent and predict how long decryption will take

• If guess is correct, will observe correlation; if incorrect, then prediction will look random

• This is a signal detection problem, where signal is timing variation due to guessed exponent bits

• The more bits you already know, the stronger the signal, thus easier to detect (error-correction property)

• Start by guessing a few top bits, look at correlations for each guess, pick the most promising candidate and continue

• OpenSSL is a popular open-source toolkit

• mod_SSL (in Apache = 28% of HTTPS market)

• stunnel (secure TCP/IP servers)

• sNFS (secure NFS)

• Many more applications

• Kocher’s attack doesn’t work against OpenSSL

• Instead of square-and-multiply, OpenSSL uses CRT, sliding windows and two different multiplication algorithms for modular exponentiation

• CRT = Chinese Remainder Theorem

• Secret exponent is processed in chunks, not bit-by-bit

• $n = n_1 n_2 \cdots n_k$

where $\gcd(n_i, n_j) = 1$ when $i \neq j$

• The system of congruences

$x \equiv x_1 \pmod{n_1}, \; \ldots, \; x \equiv x_k \pmod{n_k}$

• Has a simultaneous solution x to all congruences

• There exists exactly one solution x between 0 and n-1

• For RSA modulus n=pq, to compute x mod n it’s enough to know x mod p and x mod q

This is enough to learn private key (why?)

RSA Decryption With CRT

• To decrypt c, need to compute $m = c^d \bmod n$

• Use Chinese Remainder Theorem (why?)

• $d_1 = d \bmod (p-1)$

• $d_2 = d \bmod (q-1)$

• $q_{inv} = q^{-1} \bmod p$

• Compute $m_1 = c^{d_1} \bmod p$; $m_2 = c^{d_2} \bmod q$

• Compute $m = m_2 + (q_{inv} (m_1 - m_2) \bmod p) \cdot q$

($d_1$, $d_2$, and $q_{inv}$ are precomputed)
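
A minimal sketch of the CRT decryption steps above, using the same variable names as the slide. The tiny primes and the helper name `crt_decrypt` are assumptions for illustration; a real implementation would precompute `d1`, `d2`, and `qinv` once per key rather than on every call.

```python
def crt_decrypt(c: int, p: int, q: int, d: int) -> int:
    """RSA decryption via the Chinese Remainder Theorem."""
    d1 = d % (p - 1)
    d2 = d % (q - 1)
    qinv = pow(q, -1, p)  # modular inverse; needs Python 3.8+
    m1 = pow(c, d1, p)    # two half-size exponentiations ...
    m2 = pow(c, d2, q)
    h = (qinv * (m1 - m2)) % p
    return m2 + h * q     # ... recombined into m mod pq

# Toy parameters for illustration: n = 61 * 53, e = 17, d = 2753.
p, q, e, d = 61, 53, 17, 2753
n = p * q
c = pow(65, e, n)
m = crt_decrypt(c, p, q, d)
```

The two exponentiations work with half-length moduli and exponents, which is why CRT decryption is several times faster than computing `pow(c, d, n)` directly.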

What is needed to compute $c^d \bmod q$ and $xy \bmod q$?

• Exponentiation

• Sliding windows

• Multiplication routines

• Normal (when operands have unequal length)

• Karatsuba (when operands have equal length): faster

• Modular reduction

• Montgomery reduction

• Decryption requires computing $m_2 = c^{d_2} \bmod q$

• This is done by repeated multiplication

• Simple: square and multiply (process $d_2$ 1 bit at a time)

• More clever: sliding windows (process $d_2$ in 5-bit blocks)

• In either case, many multiplications modulo q

• Multiplications use Montgomery reduction

• Pick some $R = 2^k$

• To compute x*y mod q, convert x and y into their Montgomery form xR mod q and yR mod q

• Compute $(xR \cdot yR) \cdot R^{-1} = zR \bmod q$

• Multiplication by $R^{-1}$ can be done very efficiently

• At the end of Montgomery reduction, if zR > q, then need to subtract q

• Probability of this extra step is proportional to c mod q

• If c is close to q, a lot of subtractions will be done

• If c mod q = 0, very few subtractions

• Decryption will take longer as c gets closer to q, then become fast as c passes a multiple of q

• By playing with different values of c and observing how long decryption takes, attacker can guess q!

• Doesn’t work directly against OpenSSL because of sliding windows and two multiplication algorithms
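
The extra subtraction at the end of Montgomery reduction can be made visible with a small sketch. It uses plain square-and-multiply rather than OpenSSL’s sliding windows, and the function names and the modulus in the usage note are assumptions for illustration. `modexp_count_reductions` returns both the result and the number of extra subtractions, which is the quantity that shows up as timing variation.

```python
def montgomery_setup(q: int, k: int):
    """Return R = 2^k and q' with q * q' = -1 (mod R); q must be odd and < R."""
    R = 1 << k
    if q % 2 == 0 or q >= R:
        raise ValueError("q must be odd and smaller than R")
    q_neg_inv = (-pow(q, -1, R)) % R  # modular inverse; needs Python 3.8+
    return R, q_neg_inv

def montgomery_multiply(a_bar: int, b_bar: int, q: int, k: int, q_neg_inv: int):
    """Return (a_bar * b_bar * R^-1 mod q, whether the extra subtraction ran)."""
    mask = (1 << k) - 1
    t = a_bar * b_bar
    m = (t * q_neg_inv) & mask
    u = (t + m * q) >> k  # exact division by R, done as a shift
    if u >= q:
        return u - q, True  # the data-dependent extra reduction
    return u, False

def modexp_count_reductions(c: int, d: int, q: int, k: int):
    """Compute c^d mod q and count the extra reductions along the way."""
    R, q_neg_inv = montgomery_setup(q, k)
    c_bar = (c * R) % q  # Montgomery form of c
    acc = R % q          # Montgomery form of 1
    extra = 0
    for bit in bin(d)[2:]:
        acc, reduced = montgomery_multiply(acc, acc, q, k, q_neg_inv)
        extra += reduced
        if bit == "1":
            acc, reduced = montgomery_multiply(acc, c_bar, q, k, q_neg_inv)
            extra += reduced
    result, reduced = montgomery_multiply(acc, 1, q, k, q_neg_inv)  # leave Montgomery form
    return result, extra + reduced
```

For example, `modexp_count_reductions(c, 65537, 1000003, 20)` can be called for a range of `c` values to tabulate how the reduction count varies with the input.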

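[Figure: decryption time vs. value of ciphertext c, with q, 2q, and p marked on the ciphertext axis]

[Figure: number of reductions and multiplication routine vs. value of ciphertext around q, showing the 0-1 gap]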

Attack Is Binary Search

• Initial guess g for q between $2^{511}$ and $2^{512}$ (why?)

• Try all possible guesses for the top few bits

• Suppose we know i-1 top bits of q. Goal: ith bit

• Set g =…known i-1 bits of q…000000

• Set $g_{hi}$ = …known i-1 bits of q…100000 (note: $g < g_{hi}$)

• If $g < q < g_{hi}$ then the ith bit of q is 0

• If $g < g_{hi} < q$ then the ith bit of q is 1

• Goal: decide whether $g < q < g_{hi}$ or $g < g_{hi} < q$
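
The bit-by-bit search can be sketched as follows. The callback `q_is_below` is a hypothetical stand-in for the timing measurement: in the real attack the decision comes from comparing decryption times, whereas here the test oracle simply compares against a known toy value of q.

```python
def candidates_for_bit(known_top_bits: str, total_bits: int):
    """Build g (known bits, then zeros) and g_hi (same, with the next bit set)."""
    pad = total_bits - len(known_top_bits)
    g = int(known_top_bits + "0" * pad, 2)
    g_hi = g | (1 << (pad - 1))
    return g, g_hi

def recover_q(total_bits: int, known_top_bits: str, q_is_below) -> int:
    """Recover q one bit at a time, most significant bits first."""
    bits = known_top_bits
    while len(bits) < total_bits:
        g, g_hi = candidates_for_bit(bits, total_bits)
        # q below g_hi means the next bit is 0; otherwise it is 1.
        bits += "0" if q_is_below(g_hi) else "1"
    return int(bits, 2)

toy_q = 0b110101101011                        # 12-bit stand-in for a 512-bit prime
oracle = lambda g_hi: toy_q < g_hi            # replaces the timing-based decision
recovered_q = recover_q(12, "110", oracle)
```

Each step fixes one more bit, so the number of decisions grows linearly with the bit length of q rather than exponentially.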

Two Possibilities for $g_{hi}$

[Figure: number of reductions and multiplication routine vs. value of ciphertext, with g, q, and the two candidate positions of $g_{hi}$ marked]

Difference in decryption times between g and $g_{hi}$ will be small

Difference in decryption times between g and $g_{hi}$ will be large

• What is “large” and “small”?

• Know from attacking previous bits

• Decrypting just g does not work because of sliding windows

• Decrypt a neighborhood of values near g

• Will increase difference between large and small values, resulting in larger 0-1 gap

• Attack requires only 2 hours, about 1.4 million queries to recover the private key

• Only need to recover the most significant half of the bits of q

[Figure: zero-one gap, in the region where Montgomery reduction dominates and in the region where the multiplication routine dominates]

### Fault injection attacks

• Transient (provisional) and permanent (destructive) faults

• Variations to supply voltage

• Variations in the external clock

• Temperature

• White light

• Laser light

• X-rays and ion beams

• Electromagnetic flux

• Single event upsets

• Temporary flips in a cell’s logical state to a complementary state

• Multiple event faults

• Several simultaneous SEUs

• Dose rate faults

• The individual effects are negligible, but the cumulative effect causes a fault

• Provisional faults are used more in fault injection

• Single-event burnout faults

• Caused by a parasitic thyristor being formed in the MOS power transistors

• Single-event snap back faults

• Caused by a self-sustained current through parasitic bipolar transistors in MOS

• Single-event latch-up faults

• Create a self-sustained current in parasitic structures

• Total dose rate faults

• Progressive degradation of the electronic circuit

• Resetting data

• Data randomization – could be misleading, since the attacker has no control over the resulting value!

• Modifying op-code – implementation dependent