Physical Security and Side-Channel Attacks


Rice ELEC 528/ COMP 538

Farinaz Koushanfar

Spring 2009

- Introduction
- Hardware targets
- Attack classification
- Power attacks
- Timing attacks
- Electromagnetic attacks
- Fault injection attacks

- Classical cryptography treats security as a problem of mathematical abstractions
- Classical cryptanalysis has had great success and shows continued promise
- Analyzing and quantifying crypto algorithms’ resilience against attacks

- Recently, many security protocols have been broken using physical attacks
- These attacks take advantage of implementation-specific characteristics to recover the secret parameters

- Traditional cryptography is centered around the concepts of one-way and trapdoor functions
- A one-way function can be rapidly calculated, but is computationally difficult to invert
- A polynomial-time algorithm only rarely finds a pre-image of a one-way function for randomly chosen inputs
- A trapdoor one-way function is a function that is easy to invert if and only if a certain secret (key) is available
- Physical attacks usually have two phases:
- Interaction phase: the attacker exploits some physical characteristics of the device
- Exploitation phase: analyzing the gathered information to recover the secret

- Consider a device capable of performing a cryptographic function
- The key is usually stored in the device and protected
- Modern crypto is based on Kerckhoffs’s principle: the design may be public, and all of the secrecy required to operate the chip resides in the key
- The attacker therefore only needs to extract the keys

- Divide and conquer (D&C) attacks attempt to recover the key in parts
- The idea is that an observable characteristic can be correlated with a partial key
- The partial key should be small enough to enable exhaustive search

- Once a partial key is validated, the process is repeated to find the remaining partial keys
- D&C attacks may be iterative (some parts of the key depend on others) or independent

Hardware targets

- The most common victims of hardware cryptanalysis are smart cards (SCs)
- Attacks on SCs are applicable to any general-purpose processor with a fixed bus length
- Attacks on FPGAs have also been reported. FPGAs represent application-specific devices with parallel computing opportunities

- A smart card has a small processor (8-bit or 32-bit) along with ROM, EEPROM, and a small RAM
- There are eight wires connecting the processor to the outside world
- Power supply: SCs have no internal batteries; the current is provided by the reader
- Clock: SCs do not have an internal clock
- SCs are typically equipped with a shield that destroys the chip if tampering occurs

- The first difference between FPGAs and SCs lies in the applications of the two devices
- FPGAs and ASICs allow parallel computing
- Multiple programmable configuration bits

Attack classification

- Many attacks are possible, and the categories are often not mutually exclusive
- Invasive vs. noninvasive attacks
- Active vs. passive
- Active attacks tamper with the device’s proper functionality, either temporarily or permanently

- Probing attack (invasive)
- Fault injection attacks: active attacks, may be invasive or noninvasive
- Timing attacks exploit device’s running time
- Power analysis attack
- Electromagnetic analysis attacks

Power attacks

- Measuring the power consumption is usually straightforward
- Easy for smart cards: the energy is provided by the terminal and the current can be read

- Relatively inexpensive (<$1000) equipment can digitally sample voltage differences at high rates (1 GHz and above) with less than 1% error
- Device’s power consumption depends on many things, including its structure and data

- Monitoring the device’s power consumption to deduce information about data/operation
- Example: SPA on DES – smart card
- The internal structure is shown in the next slide

- Summary DES - a block cipher
- a product cipher
- 16 rounds (iterations) on the input bits (of P)
- substitutions (for confusion) and
- permutations (for diffusion)

- Each round with a round key
- Generated from the user-supplied key

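[Figure: DES structure. Input, input permutation, halves L0/R0, rounds applying substitution S and permutation P with round keys K1 through K16 derived from K, producing L1/R1 up to L16/R16, then final permutation and output. Fig. cf. J. Leiwo]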

- Input: 64 bits (a block)
- Li/Ri: left/right half of the input block for iteration i (32 bits each), subject to substitution S and permutation P (cf. Fig 2-8 in the text)
- K: user-supplied key
- 56 bits used + 8 unused
- (unused for encryption but often used for error checking)
- Ki: round key, generated from K

- Output: 64 bits (a block)
- Note: Ri becomes L(i+1)
- All basic op’s are simple logical ops
- Left shift / XOR

- The upper trace: entire encryption, including the initial phase, 16 DES rounds, and the final permutation
- The lower trace – detailed view of the second and third rounds

square and multiply algorithm

- The DES structure and its 16 rounds are known
- Instruction flow depends on the data, and this shows in the power signature
- Example: modular exponentiation (as used in RSA) is often implemented by the square and multiply algorithm
- Typically the square operation is implemented differently from the multiply (for speed purposes)
- Then, the power trace of the exponentiation can directly yield the corresponding exponent value
- All programs involving conditional branching based on the key values are at risk!

- Unprotected modular exponentiation – square and multiply algorithm
- The peak values reveal the key values
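
To make the leak concrete, here is a minimal sketch of an unprotected square and multiply routine. It assumes the attacker can tell a square from a multiply in the power trace; the `trace` string of `S` and `M` letters is a hypothetical stand-in for that trace, and the function names are illustrative only.

```python
def square_and_multiply(base, exponent, modulus):
    """Left-to-right binary exponentiation that also logs each operation."""
    result = 1
    trace = []  # 'S' = square, 'M' = multiply; stands in for the power trace
    for bit in bin(exponent)[2:]:
        result = (result * result) % modulus
        trace.append("S")
        if bit == "1":  # key-dependent branch: this is what leaks
            result = (result * base) % modulus
            trace.append("M")
    return result, "".join(trace)


def recover_exponent(trace):
    """Read the exponent bits back from the observed S/M sequence."""
    bits = []
    i = 0
    while i < len(trace):
        # Every iteration starts with 'S'; a following 'M' means the bit was 1.
        if i + 1 < len(trace) and trace[i + 1] == "M":
            bits.append("1")
            i += 2
        else:
            bits.append("0")
            i += 1
    return int("".join(bits), 2)
```

For the exponent `0b101101` the logged sequence is `SMSSMSMSSM`, and `recover_exponent` turns it back into the same exponent without ever touching the key.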

- SPA targets variable instruction flow
- DPA targets data-dependence
- Difference b/w smart cards (SCs) and FPGAs
- In SCs, one operation running at a time
- Simple power tracing is possible

- In FPGAs, parallel computation typically prevents visual SPA inspection, which leads to DPA

- Divide-and-conquer strategy, comparing power consumption for different inputs
- Encrypt a large number of inputs and record the corresponding power consumption
- We have access to R15, which entered the last round operation, since it is equal to L16
- Take an output bit (called M’i) at the last round and classify the curves based on the bit
- 6 specific bits of R15 are XORed with 6 bits of the key before entering the S-box
- By guessing the 6-bit key value, we can predict the bit b, or an arbitrary output bit of an arbitrary S-box
- Thus, with $2^6$ partitions, one for each possible key guess, we can break the cipher much faster

A closer look at HW implementation of DES

- DPA can be performed on any algorithm that has the operation $\beta = S(\alpha \oplus K)$,
- where $\alpha$ is known and $K$ is the segment key

The waveforms are captured by a scope and sent to a computer for analysis

The bit will classify the wave wi

- Hypothesis 1: bit is zero
- Hypothesis 2: bit is one
- A differential trace will be calculated for each bit!

- The DPA waveform with the highest peak will validate the hypothesis
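
The two-hypothesis classification can be sketched as a difference-of-means test on simulated measurements. Everything concrete here is an assumption for illustration: the 8-bit `SBOX` is a random permutation rather than a DES S-box, `SECRET_KEY` is a hypothetical key segment, and each "trace" is a single sample equal to the Hamming weight of the S-box output plus Gaussian noise.

```python
import random

rng = random.Random(1)            # fixed seed so the run is repeatable
SBOX = list(range(256))
rng.shuffle(SBOX)                 # toy 8-bit S-box (assumption, not DES)
SECRET_KEY = 0x3C                 # hypothetical key segment to recover


def hamming_weight(value):
    return bin(value).count("1")


def measure(known_input):
    """Simulated power sample: Hamming weight of the S-box output plus noise."""
    return hamming_weight(SBOX[known_input ^ SECRET_KEY]) + rng.gauss(0.0, 1.0)


def dpa_peak(inputs, samples, key_guess, bit=0):
    """Difference of means between samples whose predicted bit is 1 and 0."""
    ones, zeros = [], []
    for known_input, sample in zip(inputs, samples):
        predicted = (SBOX[known_input ^ key_guess] >> bit) & 1
        (ones if predicted else zeros).append(sample)
    return abs(sum(ones) / len(ones) - sum(zeros) / len(zeros))


inputs = [rng.randrange(256) for _ in range(4000)]
samples = [measure(x) for x in inputs]

# One differential value per key guess; the highest peak marks the best guess.
peaks = [dpa_peak(inputs, samples, guess) for guess in range(256)]
recovered_key = max(range(256), key=lambda guess: peaks[guess])
```

With the correct guess the predicted bit really is one of the bits contributing to the sample, so the two groups differ by about one unit; for wrong guesses the partition is close to random and the difference stays small.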

- Correlation power analysis (CPA) - attacker steps
- Predict the power usage of the device at one specific instant, as a function of certain key bits
- E.g., for DES, it is assumed to be function of the Hamming weight of the data

- Prediction matrix stores the predicted values
- Consumption vector stores the measured power
- The attacker compares the actual and the predicted values, using the correlation coefficient
- E.g., correlation b/w all the columns of the prediction matrix and the consumption vector

- Hamming weight model
- Typically measured on a bus, $Y = aH(X) + b$
- Y: power consumption; X: data value; H: Hamming weight

- The Hamming distance model
- $Y = aH(P \oplus X) + b$
- Accounting for the previous value on the bus (P)

- The equation for generating differential waveforms replaced with correlations
- Rather than attacking one bit, the attacker tries prediction of the Hamming weight of a word (H)
- The correlation is computed by:
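
A common choice for this step is Pearson's correlation coefficient between each column of predicted Hamming weights and the measured consumption. The sketch below assumes that choice and simulates the measurements with the Hamming weight model Y = aH(X) + b plus noise; the random 8-bit `SBOX`, the key byte `SECRET_KEY`, and the constants a = 2.0 and b = 5.0 are all hypothetical.

```python
import math
import random

rng = random.Random(7)            # fixed seed so the run is repeatable
SBOX = list(range(256))
rng.shuffle(SBOX)                 # toy 8-bit S-box (assumption)
SECRET_KEY = 0xA5                 # hypothetical key byte


def hamming_weight(value):
    return bin(value).count("1")


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Consumption vector: one simulated measurement per known input.
inputs = [rng.randrange(256) for _ in range(1000)]
consumption = [
    2.0 * hamming_weight(SBOX[x ^ SECRET_KEY]) + 5.0 + rng.gauss(0.0, 1.5)
    for x in inputs
]

# Prediction matrix: one column of predicted Hamming weights per key guess.
prediction = [
    [hamming_weight(SBOX[x ^ guess]) for x in inputs] for guess in range(256)
]

correlations = [pearson(column, consumption) for column in prediction]
recovered_key = max(range(256), key=lambda guess: correlations[guess])
```

Because the correlation coefficient is insensitive to the unknown scale a and offset b, the attacker does not need to calibrate the model before ranking the guesses.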

- Data-dependent attacks require a power consumption model
- Can be measured and learned

- Synchronization of the measurements needs to be addressed
- The attack is affected by parallel computing which lowers observability
- The described attack is not the best achieved to date, e.g., techniques based on maximum likelihood often offer better results

- Internal clock phase shift

- Differential power analysis, by Kocher, Jaffe, and Jun
- Power analysis tutorial, by Aigner and Oswald
- A tutorial on physical security and side-channel attacks, by Koeune and Standaert
- Michael Tunstall has some good material – a few of the charts are his courtesy
- Side channel attacks: countermeasures, by Verbauwhede

Timing attacks

- Running time of a crypto processor can be used as an information channel
- The idea was proposed by Kocher, Crypto’96

Timing attacks (cont’d)

- Key generation:
- Generate large (say, 2048-bit) primes p, q
- Compute $n = pq$ and $\varphi(n) = (p-1)(q-1)$
- Choose small e, relatively prime to $\varphi(n)$
- Typically, $e = 3$ (may be vulnerable) or $e = 2^{16}+1 = 65537$ (why?)

- Compute unique d such that $ed \equiv 1 \pmod{\varphi(n)}$
- Public key = (e,n); private key = d
- Security relies on the assumption that it is difficult to factor n into p and q

- Encryption of m: $c = m^e \bmod n$
- Decryption of c: $c^d \bmod n = (m^e)^d \bmod n = m$
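
The key generation, encryption, and decryption steps can be traced with toy numbers. This is only a sketch: the primes 61 and 53 and the exponent 17 are illustrative choices far too small for real use, and the function names are hypothetical.

```python
import math

p, q = 61, 53                 # toy primes; real keys use primes of 1024+ bits
n = p * q                     # public modulus
phi = (p - 1) * (q - 1)       # Euler's totient of n
e = 17                        # small public exponent, coprime to phi
assert math.gcd(e, phi) == 1
d = pow(e, -1, phi)           # private exponent (modular inverse, Python 3.8+)


def encrypt(m):
    return pow(m, e, n)


def decrypt(c):
    return pow(c, d, n)


message = 65
ciphertext = encrypt(message)
recovered = decrypt(ciphertext)
```

Here n = 3233 and d = 2753, and decrypting the ciphertext returns the original message 65.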

- RSA decryption: compute $y^x \bmod n$
- This is a modular exponentiation operation

- Naïve algorithm: square and multiply

Whether an iteration takes a long time depends on the kth bit of the secret exponent: if the bit is 1, a modular multiplication is performed, which takes a while to compute; if the bit is 0, the step is instantaneous

- Idea: guess some bits of the exponent and predict how long decryption will take
- If guess is correct, will observe correlation; if incorrect, then prediction will look random
- This is a signal detection problem, where signal is timing variation due to guessed exponent bits
- The more bits you already know, the stronger the signal, thus easier to detect (error-correction property)

- Start by guessing a few top bits, look at correlations for each guess, pick the most promising candidate and continue

- OpenSSL is a popular open-source toolkit
- mod_SSL (in Apache = 28% of HTTPS market)
- stunnel (secure TCP/IP servers)
- sNFS (secure NFS)
- Many more applications

- Kocher’s attack doesn’t work against OpenSSL
- Instead of square-and-multiply, OpenSSL uses CRT, sliding windows and two different multiplication algorithms for modular exponentiation
- CRT = Chinese Remainder Theorem
- Secret exponent is processed in chunks, not bit-by-bit

- $n = n_1 n_2 \cdots n_k$
where $\gcd(n_i, n_j) = 1$ when $i \neq j$

- The system of congruences
$x \equiv x_1 \pmod{n_1}, \ldots, x \equiv x_k \pmod{n_k}$

- Has a simultaneous solution x to all congruences
- There exists exactly one solution x between 0 and n-1

- For RSA modulus $n = pq$, to compute $x \bmod n$ it’s enough to know $x \bmod p$ and $x \bmod q$
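
As a small worked example of the theorem, the sketch below reconstructs x from its residues using the standard constructive proof. The function name `crt` and the sample moduli are illustrative, and the moduli are assumed to be pairwise coprime.

```python
def crt(residues, moduli):
    """Return the unique x in [0, n) with x = r_i mod n_i for every i.

    Assumes the moduli are pairwise coprime.
    """
    n = 1
    for m in moduli:
        n *= m
    x = 0
    for r, m in zip(residues, moduli):
        partial = n // m                      # product of the other moduli
        x += r * partial * pow(partial, -1, m)  # inverse exists by coprimality
    return x % n
```

For instance, `crt([2, 3, 2], [3, 5, 7])` gives 23, the only value below 105 that leaves remainders 2, 3, and 2 when divided by 3, 5, and 7.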

- To decrypt c, need to compute $m = c^d \bmod n$
- Use Chinese Remainder Theorem (why?)
- $d_1 = d \bmod (p-1)$
- $d_2 = d \bmod (q-1)$
- $q_{inv} = q^{-1} \bmod p$
- ($d_1$, $d_2$, and $q_{inv}$ are precomputed)
- Compute $m_1 = c^{d_1} \bmod p$; $m_2 = c^{d_2} \bmod q$
- Compute $m = m_2 + (q_{inv} (m_1 - m_2) \bmod p) \cdot q$
- Attack the computation of $m_2$ in order to learn q. This is enough to learn the private key (why?)
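
The CRT decryption steps can be sketched with toy numbers, along with one way to see why learning q is enough: once q is known, p = n/q follows and d can be recomputed from the public exponent. The primes, the exponent, and the function names are illustrative assumptions.

```python
p, q, e = 61, 53, 17              # toy parameters, far too small for real use
n = p * q
d = pow(e, -1, (p - 1) * (q - 1))

# Precomputed once per key.
d1 = d % (p - 1)
d2 = d % (q - 1)
qinv = pow(q, -1, p)


def decrypt_crt(c):
    """RSA decryption using two half-size exponentiations."""
    m1 = pow(c, d1, p)
    m2 = pow(c, d2, q)            # the computation a timing attack targets
    h = (qinv * (m1 - m2)) % p    # Python's % keeps h non-negative
    return m2 + h * q


def private_key_from_q(modulus, public_exponent, leaked_q):
    """Rebuild d from the public key once one prime factor has leaked."""
    other_prime = modulus // leaked_q
    return pow(public_exponent, -1, (other_prime - 1) * (leaked_q - 1))
```

Each exponentiation works on numbers half the size of n, which is why implementations prefer this form even though it exposes a computation modulo the secret q.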

What is needed to compute $c^d \bmod q$ and $xy \bmod q$?

- Exponentiation
- Sliding windows

- Multiplication routines
- Normal (when operands have unequal length)
- Karatsuba (when operands have equal length): faster

- Modular reduction
- Montgomery reduction

- Decryption requires computing $m_2 = c^{d_2} \bmod q$
- This is done by repeated multiplication
- Simple: square and multiply (process $d_2$ 1 bit at a time)
- More clever: sliding windows (process $d_2$ in 5-bit blocks)

- In either case, many multiplications modulo q
- Multiplications use Montgomery reduction
- Pick some $R = 2^k$
- To compute $x \cdot y \bmod q$, convert x and y into their Montgomery form $xR \bmod q$ and $yR \bmod q$
- Compute $(xR \cdot yR) \cdot R^{-1} = zR \bmod q$
- Multiplication by $R^{-1}$ can be done very efficiently
- At the end of Montgomery reduction, if zR > q, then need to subtract q
- Probability of this extra step is proportional to c mod q
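
The extra subtraction is easiest to see in code. The following is a minimal sketch of textbook Montgomery reduction, assuming an odd modulus q and R = 2^k > q; it is not OpenSSL's implementation, and the function names and the returned `extra` flag are hypothetical additions that expose whether the final subtraction happened.

```python
def montgomery_setup(q, k):
    """Return R = 2**k and q' with q * q' = -1 (mod R); q must be odd."""
    r = 1 << k
    q_neg_inv = (-pow(q, -1, r)) % r
    return r, q_neg_inv


def redc(t, q, k, q_neg_inv):
    """Return (t * R^-1 mod q, extra) for 0 <= t < q * R."""
    r_mask = (1 << k) - 1
    u = ((t & r_mask) * q_neg_inv) & r_mask
    z = (t + u * q) >> k          # exact division: t + u*q is a multiple of R
    if z >= q:                    # the data-dependent extra reduction
        return z - q, True
    return z, False


def mont_mul(x, y, q, k, q_neg_inv):
    """Multiply x*y mod q via Montgomery form; also report the extra step."""
    x_bar = (x << k) % q          # xR mod q
    y_bar = (y << k) % q          # yR mod q
    z_bar, extra = redc(x_bar * y_bar, q, k, q_neg_inv)   # zR mod q
    z, _ = redc(z_bar, q, k, q_neg_inv)                   # leave Montgomery form
    return z, extra
```

With q = 101 and k = 7, multiplying 86 by 86 triggers the extra subtraction while multiplying 0 by anything does not; that input-dependent difference in work is the timing signal.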

- If c is close to q, a lot of subtractions will be done
- If c mod q = 0, very few subtractions
- Decryption will take longer as c gets closer to q, then become fast as c passes a multiple of q

- By playing with different values of c and observing how long decryption takes, attacker can guess q!
- Doesn’t work directly against OpenSSL because of sliding windows and two multiplication algorithms

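[Figure: decryption time versus value of ciphertext c, with q, 2q, and p marked on the ciphertext axis]

[Figure: decryption time versus value of ciphertext near q, showing the number of reductions, the multiplication routine, and the 0-1 gap]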

- Initial guess g for q between $2^{511}$ and $2^{512}$ (why?)
- Try all possible guesses for the top few bits
- Suppose we know i-1 top bits of q. Goal: ith bit
- Set g =…known i-1 bits of q…000000
- Set ghi=…known i-1 bits of q…100000 (note: g<ghi)
- If g<q<ghi then the ith bit of q is 0
- If g<ghi<q then the ith bit of q is 1

- Goal: decide whether g<q<ghi or g<ghi<q

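[Figure: decryption time versus value of ciphertext, showing the number of reductions and the multiplication routine, with g, q, and the two possible positions of ghi marked. If ghi falls before q, the difference in decryption times between g and ghi will be small; if ghi falls after q, the difference will be large]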

- What is “large” and “small”?
- Know from attacking previous bits

- Decrypting just g does not work because of sliding windows
- Decrypt a neighborhood of values near g
- Will increase difference between large and small values, resulting in larger 0-1 gap

- Attack requires only 2 hours, about 1.4 million queries to recover the private key
- Only need to recover the most significant half of the bits of q

[Figure: zero-one gap, in the region where Montgomery reduction dominates and in the region where the multiplication routine dominates]

Fault injection attacks

- Transient (provisional) and permanent (destructive) faults
- Variations to supply voltage
- Variations in the external clock
- Temperature
- White light
- Laser light
- X-rays and ion beams
- Electromagnetic flux

- Single event upsets
- Temporary flips in a cell’s logical state to a complementary state

- Multiple event faults
- Several simultaneous SEUs

- Dose rate faults
- The individual effects are negligible, but cumulative effect causes fault

- Provisional faults are used more in fault injection

- Single-event burnout faults
- Caused by a parasitic thyristor being formed in the MOS power transistors

- Single-event snap back faults
- Caused by self-sustained current by parasitic bipolar transistors in MOS

- Single-event latch-up faults
- Creates a self sustained current in parasitics

- Total dose rate faults
- Progressive degradation of the electronic circuit

- Resetting data
- Data randomization: could be misleading, since the attacker has no control over the resulting value
- Modifying op-code – implementation dependent