
Challenging Assumptions in the Use of Heuristic Search Techniques in Cryptography



Presentation Transcript


  1. Challenging Assumptions in the Use of Heuristic Search Techniques in Cryptography John A Clark, Dept. of Computer Science, University of York, UK. jac@cs.york.ac.uk 24.07.2001

  2. Overview • Assumptions People Make. • Part I: Boolean Functions and S-Box Design • Part II: Identification Protocols based on NP-complete Problems • Part III: Security Protocols with Proofs

  3. Assumptions People Make • Cryptographic security has suffered a number of shocks over the past decade: • 1994: Peter Shor demonstrates a polynomial-time algorithm for prime factorisation on a quantum computer. • 1994: The Bellcore attack – cryptosystems with injected faults may leak vast amounts of information. • 1996: Paul Kocher demonstrates how data-dependent timing of exponentiation can break several cryptosystems (inc. RSA). • Various: power analysis attacks on smart cards.

  4. Assumptions People Make • These attacks are really a challenge to fundamental assumptions: • cryptanalysis assumes search will be carried out in the classical computing paradigm; • cryptanalysis generally assumes an attack on the abstract algorithms, not on their implementation. • Challenging these assumptions has enabled some of the most successful cryptanalytic attacks to date.

  5. Assumptions People Make • Heuristic optimisation techniques (simulated annealing, genetic algorithms, tabu search etc) are some of the most successful general purpose tools available, used across most scientific and engineering disciplines. • They have hardly been used for serious modern day cryptography. • One might well ask “Why?”

  6. Assumptions • The capabilities of heuristic optimisation techniques for cryptanalysis and the design of cryptographic artefacts are significantly under-estimated. • Assumptions are made about the scope of application of heuristic optimisation techniques and about the ways in which the techniques should be used. • Challenging these assumptions allows existing applications to be addressed with significantly greater success and allows application of the techniques to new problems.

  7. Assumptions • The Big Lie : Heuristic optimisation techniques are just no good for serious cryptography. • No Nonsense : The cost function f should be a direct and obvious characterisation of the problem to be solved. • Nice Guys : Optimisation users are honest. • High Flier : A cost function f needs to be a ‘high performer’ to be useful. • CF Matters : A great deal of care must be taken in choosing a cost function. • Black Box: Heuristic optimisation is a black box technique. • Bit Twiddlers : Optimisation techniques are only suited to low level cryptographic tasks.

  8. Heuristic Optimisation

  9. Local Optimisation - Hill Climbing • Aim: maximise z(x); we really want to obtain the global optimum xopt. • The neighbourhood of a point x might be N(x) = {x+1, x−1}. • A hill-climb goes x0 → x1 → x2 since z(x0) < z(x1) < z(x2), but z(x3) < z(x2), so it gets stuck at the local optimum x2 and never reaches xopt.

  10. Simulated Annealing • Allows non-improving moves, so that it is possible to go down in z(x) in order to rise again and reach the global optimum. (Figure: a trajectory x0, x1, …, x13 that descends out of a local optimum before climbing to the global one.) • In practice the neighbourhood may be very large, and the trial neighbour is chosen randomly. • It is possible to accept a worsening move even when improving ones exist.

  11. Simulated Annealing • Improving moves are always accepted. • Non-improving moves may be accepted probabilistically, in a manner depending on the temperature parameter T. Loosely: • the worse the move, the less likely it is to be accepted; • a worsening move is less likely to be accepted the cooler the temperature. • The temperature T starts high and is gradually cooled as the search progresses. • Initially virtually anything is accepted; at the end only improving moves are allowed (and the search effectively reduces to hill-climbing).

  12. Simulated Annealing • Current candidate x; minimisation formulation. • At each temperature, consider 400 trial moves. • Always accept improving moves. • Accept worsening moves probabilistically: this gets harder the worse the move, and harder as the temperature decreases. • Then cool the temperature and repeat the cycle.
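To make the loop on this slide concrete, here is a minimal Python sketch of the annealing cycle just described. Only the 400 trial moves per temperature cycle come from the slide; the starting temperature, geometric cooling factor and iteration cap are illustrative assumptions.

```python
import math
import random

def anneal(x0, cost, neighbour, t0=10.0, alpha=0.95,
           moves_per_temp=400, max_cycles=100):
    """Minimisation by simulated annealing (sketch of slide 12's loop).

    t0, alpha and max_cycles are assumed values; the slides fix only
    the 400 trial moves per temperature cycle.
    """
    x, fx = x0, cost(x0)
    best, fbest = x, fx
    t = t0
    for _ in range(max_cycles):
        for _ in range(moves_per_temp):
            y = neighbour(x)            # random trial neighbour
            fy = cost(y)
            delta = fy - fx
            # Improving moves are always accepted; worsening moves are
            # accepted with probability exp(-delta/t), so acceptance gets
            # harder the worse the move and as the temperature falls.
            if delta <= 0 or random.random() < math.exp(-delta / t):
                x, fx = y, fy
                if fx < fbest:
                    best, fbest = x, fx
        t *= alpha                      # geometric cooling schedule
    return best, fbest
```

For the Boolean function work later in the talk, x would be a polar truth table, cost one of the cost functions below, and neighbour the balance-preserving swap of slide 20.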

  13. Simulated Annealing (Figure: the temperature cycle – repeated blocks of 400 trial moves, each block at a successively lower temperature.)

  14. Boolean Function Design

  15. Boolean Functions • Boolean functions f: {0,1}^n → {0,1} and S-boxes f: {0,1}^n → {0,1}^m lie at the heart of much modern cryptography. • We need such functions with particular properties to guard against particular forms of attack: • balance (equal numbers of 0s and 1s); • high non-linearity; • low autocorrelation; • correlation immunity; • high algebraic degree. • There are theoretical results relating these – trade-offs have to be made.

  16. Boolean Function Design • A Boolean function f(x) and its polar form f̂(x) = (−1)^f(x):

  x   x1 x2 x3   f(x)   f̂(x)
  0   0  0  0     1      −1
  1   0  0  1     0       1
  2   0  1  0     0       1
  3   0  1  1     0       1
  4   1  0  0     1      −1
  5   1  0  1     0       1
  6   1  1  0     1      −1
  7   1  1  1     1      −1

  • For present purposes we shall use the polar representation. • We will talk only about balanced functions, with equal numbers of 1s and −1s.

  17. Preliminary Definitions • Definitions relating to a Boolean function f of n variables: • Linear function: $L_w(x) = w_1x_1 \oplus \cdots \oplus w_nx_n$, with polar form $\hat{L}_w(x) = (-1)^{L_w(x)}$. • Walsh-Hadamard transform: $F(w) = \sum_x \hat{f}(x)\,\hat{L}_w(x)$.

  18. Preliminary Definitions • Non-linearity: $N_f = \tfrac{1}{2}\bigl(2^n - \max_w |F(w)|\bigr)$ • Auto-correlation: $AC_f = \max_{s \neq 0} \bigl|\sum_x \hat{f}(x)\,\hat{f}(x \oplus s)\bigr|$ • For present purposes we need simply note that these can easily be evaluated given a function f. They can therefore be used as the functions to be optimised – traditionally they are.
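Both measures are cheap to compute from the polar truth table via a fast Walsh-Hadamard transform. The sketch below is a generic implementation of the definitions above, not code from the talk.

```python
def walsh_hadamard(fpolar):
    """Walsh-Hadamard spectrum [F(w)] of a Boolean function given in
    polar form (a sequence of +/-1 values of length 2**n)."""
    F = list(fpolar)
    h = 1
    while h < len(F):
        for i in range(0, len(F), 2 * h):
            for j in range(i, i + h):
                a, b = F[j], F[j + h]
                F[j], F[j + h] = a + b, a - b   # butterfly step
        h *= 2
    return F

def nonlinearity(fpolar):
    # N_f = (2**n - max_w |F(w)|) / 2
    n = len(fpolar).bit_length() - 1
    return (2**n - max(abs(v) for v in walsh_hadamard(fpolar))) // 2

def autocorrelation(fpolar):
    # AC_f = max over non-zero shifts s of |sum_x f̂(x) f̂(x XOR s)|
    N = len(fpolar)
    return max(abs(sum(fpolar[x] * fpolar[x ^ s] for x in range(N)))
               for s in range(1, N))
```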

  19. Using Parseval’s Theorem • Parseval’s Theorem: $\sum_w F(w)^2 = 2^{2n}$ • Loosely: push down on $F(w)^2$ for some particular w and it pops up elsewhere. • Loose motivation: arranging for uniform values of $F(w)^2$ will lead to good non-linearity. This is the initial motivation for our NEW cost function.

  20. Moves Preserving Balance • Start with a balanced (but otherwise random) solution; the move strategy preserves balance. • The neighbourhood of a particular function f is the set of all functions obtained by exchanging (flipping) any two dissimilar values. Here we have swapped f(2) and f(4):

  x   f(x)   f̂(x)   ĝ(x)
  0    1      −1      −1
  1    0       1       1
  2    0       1      −1
  3    0       1       1
  4    1      −1       1
  5    0       1       1
  6    1      −1      −1
  7    1      −1      −1
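A short sketch of this balance-preserving move: pick one +1 position and one −1 position at random and exchange them, which clearly keeps the numbers of 1s and −1s equal. The function name is ours.

```python
import random

def flip_pair(fpolar):
    # Balance-preserving move: exchange two dissimilar values,
    # e.g. the swap of f(2) and f(4) shown in the table above.
    ones = [i for i, v in enumerate(fpolar) if v == 1]
    negs = [i for i, v in enumerate(fpolar) if v == -1]
    i, j = random.choice(ones), random.choice(negs)
    g = list(fpolar)
    g[i], g[j] = g[j], g[i]
    return g
```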

  21. Getting in the Right Area • Previous work (QUT) has shown strongly that: • heuristic techniques can be very effective for cryptographic design synthesis (Boolean function, S-box design etc.); • hill-climbing works far better than random search; • combining heuristic search and hill-climbing generally gives the best results. • Aside: the notion applies more generally too – it has led to the development of memetic algorithms in GA work. GAs are known to be robust but not suited to ‘fine tuning’. • We will adopt this strategy too: use simulated annealing to get in the ‘right area’, then hill-climb. • But we will adopt the new cost function for the first stage.

  22. Hill-climbing With Traditional CF (n=8)

  23. Varying the Technique (n=8) (Figures: non-linearity and autocorrelation achieved by simulated annealing with the traditional CF; simulated annealing with the new CF; and simulated annealing with the new CF followed by hill-climbing with the traditional CF.)

  24. Tuning the Technique • Experience has shown that experimentation is par for the course with optimisation. • The initial cost function was motivated by theory, but the real issue is how the cost function and the search technique interact. • We have generalised the initial cost function to give a parametrised family of new cost functions: $\mathrm{cost}(f) = \sum_w \bigl|\,|F(w)| - (2^{n/2} + K)\,\bigr|^R$
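Reusing the walsh_hadamard sketch above, the parametrised family is straightforward to express. The defaults K=0 and R=3 are illustrative settings, not values prescribed by the slide.

```python
def cost(fpolar, K=0.0, R=3.0):
    # Slide 24's parametrised family:
    #   cost(f) = sum_w | |F(w)| - (2**(n/2) + K) |**R
    # K and R are the tuning knobs varied in the experiments.
    n = len(fpolar).bit_length() - 1
    target = 2 ** (n / 2) + K
    return sum(abs(abs(Fw) - target) ** R for Fw in walsh_hadamard(fpolar))
```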

  25. Tuning the Technique (n=8) (Figures: non-linearity and autocorrelation, illustrating how results change as K is varied; 400 runs.)

  26. Tuning the Technique (n=8) (Figures: non-linearity and autocorrelation, further illustrating how results change as K is varied; 100 runs.)

  27. Comparison of Results

  28. Reprise (Figure: non-linearity and autocorrelation of the functions found.) • This exceeds a conjectured bound on auto-correlation.

  29. Millan’s Table • A table due to Bill Millan provides a more detailed summary of the current theoretical state of the art in terms of (non-linearity, degree). • Demonstration of one such function was cited as an open problem in 2000; the tools can generate these very easily. (Sadly, such a function has already been demonstrated elsewhere.) • Note: higher-order immunities seem beyond the tools at present – future work.

  30. Why Don’t They Do Better? • At smaller n the techniques seem to do very well; as n gets larger their utility decreases. • Revisit the cost function – it is essentially of the form $\mathrm{cost}(f) = \sum_w \bigl|\,|F(w)| - X\,\bigr|^R$. • This can be expanded as the modulus of a cubic (for R = 3): essentially a single-parameter cost function in X. • Moving over to a more general polynomial has been used to provide better results. • Some preliminary results on higher-level optimisation look encouraging.

  31. S-Box Design (Figure: S-boxes S1…S6.) • We won’t give details here, but the same sorts of cost function can be used for multiple-output functions. • Again, attempting to control the whole spectrum brings benefits.

  32. Uses and Abuses • Can use optimisation to maximise the non-linearity, minimise autocorrelation elements etc. • These are publicly recognised good properties that we might wish to demonstrate to others. • From an optimisation point of view one way of satisfying these requirements is as good as another. • But for a malicious designer this may not be the case. Who says that optimisation has to be used honestly???? • What’s to stop me creating Boolean functions or S-boxes with good public properties but with hidden (unknown) properties?

  33. Planting Trapdoors • We can use these techniques to generate cryptographic elements with good public properties using an honest fitness function honestFit(x). • But we can also try to hide useful (but privately known) properties using a malicious fitness function trapFit(x). • Now take a combination of the two and optimise for both at the same time. • You want λ (the weight on the trapdoor term) as low as you can get away with for the next N years! The result must still possess the required good properties.
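The slide does not give the exact combination used; a simple weighted blend is one plausible reading, sketched here with the blend form and the default λ as explicit assumptions.

```python
def combined_fit(x, honest_fit, trap_fit, lam=0.2):
    # Assumed linear blend of the public and hidden objectives; the
    # slide asks only for "a combination", with lambda kept as low as
    # one can get away with. lam=0.2 is purely illustrative.
    return (1 - lam) * honest_fit(x) + lam * trap_fit(x)
```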

  34. Planting Trapdoors (Figure: two clusters of publicly good solutions, e.g. Boolean functions with the same very high non-linearity – one found by annealing with the honest cost function, the other, with high trapdoor bias, found by annealing with the combined honest-and-trapdoor cost function.) • There appears to be nothing to distinguish the sets of solutions obtained – unless you know what form the trapdoor takes! Or is there…

  35. Vector Representations (Example design vector: +1 −1 +1 +1 −1 +1 −1 −1.) • Different cost functions may give similarly good results, but may do so in radically different ways. • Results using honest and dishonest cost functions cluster in different parts of the design space. • Basically, distinguish using discriminant analysis. • If you don’t have an alternative hypothesis, you can generate a family of honest results and ask how probable the offered one is.

  36. Games People Play • It seems possible to tell that something has been going on. • And we don’t need to know precisely what has been going on. • Since any design has a binary vector representation the technique is general. • Myriad of further games you can play…

  37. Summary and Conclusions • Have shown that local search can be used effectively for a cryptographic non-linear optimisation problem - Boolean Function Design. • ‘Direct’ cost functions not necessarily best. • Cost function is a means to an end. • Whatever works will do. • Cost function efficacy depends on problem, problem parameters, and the search technique used. • You can take short cuts with annealing parameters (and computationally there may be little choice) • Experimentation is highly beneficial • should look to engaging theory more?

  38. Future Work • Opportunities for expansion: • detailed variation of parameters • use of more efficient annealing processes (e.g. thermostatistical annealing). • evolution of artefacts with hidden properties (you do not need to be honest - e.g. develop S-Boxes with hidden trapdoors) • experiment with different cost function families • multiple criteria etc. • evolve sets of Boolean functions • other local techniques (e.g. tabu search, TS) • more generally, when do GAs, SA, TS work best? • investigate non-balanced functions.

  39. Breaking Protocols with Heuristic Optimisation

  40. Identification Problems • The notion of zero-knowledge was introduced by Goldwasser, Micali and Rackoff (1985). • Indicate that you have a secret without revealing it. • Early scheme by Shamir. • Several schemes of late are based on NP-complete problems: • Permuted Kernel Problem (Shamir) • Syndrome Decoding (Stern) • Constrained Linear Equations (Stern) • Permuted Perceptron Problem (Pointcheval)

  41. Pointcheval’s Perceptron Schemes • Interactive identification protocols based on an NP-complete problem. • Perceptron Problem (PP): Given an m×n matrix A with entries in {+1, −1}, find a vector y ∈ {+1, −1}^n so that (Ay)_i ≥ 0 for all i.

  42. Pointcheval’s Perceptron Schemes • Permuted Perceptron Problem (PPP): make the problem harder by imposing an extra constraint. • Given the same A, find y ∈ {+1, −1}^n so that (Ay)_i ≥ 0 for all i AND the image Ay has a particular histogram H of positive values over 1, 3, 5, …
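Under these definitions, checking a claimed PP or PPP solution is cheap. A sketch, with H taken as a value-to-count mapping (the function names are ours):

```python
from collections import Counter

def is_pp_solution(A, y):
    # PP: every component of the image Ay must be non-negative.
    # (With +/-1 entries and n odd, components are odd, hence positive.)
    return all(sum(a * b for a, b in zip(row, y)) >= 0 for row in A)

def is_ppp_solution(A, y, H):
    # PPP: additionally, the multiset of image values must match the
    # given histogram H (a mapping value -> count).
    image = [sum(a * b for a, b in zip(row, y)) for row in A]
    return all(v >= 0 for v in image) and Counter(image) == Counter(H)
```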

  43. Example: Pointcheval’s Scheme • PP and PPP example. • Every PPP solution is a PP solution. • (Figure: a small instance whose image has a particular histogram H of positive values over 1, 3, 5.)

  44. Generating Instances • Suggested method of generation: • generate a random matrix A; • generate a random secret S; • calculate AS; • if any (AS)_i < 0, negate the ith row of A. • There is significant structure in this problem: a high correlation between the majority values of matrix columns and the corresponding secret bits.

  45. Instance Properties • Each matrix-row/secret dot product is the sum of n Bernoulli (+1/−1) variables. • The initial image histogram therefore has a binomial shape and is symmetric about 0: … −7 −5 −3 −1 1 3 5 7 … • After negation it simply folds over to be positive: 1 3 5 7 … • Image elements tend to be small.

  46. PP Using Search: Pointcheval • Pointcheval couched the Perceptron Problem as a search problem. • The neighbourhood is defined by single-bit flips on the current solution y. • The cost function punishes any negative image components; e.g. for an image containing −1 and −3, costNeg(y) = |−1| + |−3| = 4.
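costNeg is just the total magnitude of the negative image components, as in the |−1| + |−3| = 4 example above. A minimal sketch (Python 3.8+ for the assignment expression):

```python
def cost_neg(A, y):
    # costNeg(y): sum of magnitudes of the negative image components,
    # e.g. an image containing -1 and -3 costs |-1| + |-3| = 4.
    return sum(-v for row in A
               if (v := sum(a * b for a, b in zip(row, y))) < 0)
```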

  47. Using Annealing: Pointcheval • A PPP solution is also a PP solution. • Pointcheval based his estimates of the difficulty of cracking PPP on the ratio of PP solutions to PPP solutions. • He calculated the matrix sizes for which this should be most difficult, giving rise to (m,n) = (m, m+16). • Recommended (m,n) = (101,117), (131,147), (151,167). • Gave estimates for the number of years needed to solve PPP using annealing as the means of finding PP solutions. • PP instances with matrices of size 200 ‘could usually be solved within a day’… • …but no PPP problem instance of size greater than 71 was ever solved this way, ‘despite months of computation’.

  48. Perceptron Problem (PP) • Knudsen and Meier approach (loosely): • carry out sets of runs; • note where the results obtained all agree; • fix those elements where there is complete agreement, carry out a new set of runs, and so on (see the sketch below). • If repeated runs give the same values for particular bits, the assumption is that those bits are actually set correctly. • Used this sort of approach to solve instances of the PP problem up to 180 times faster than Pointcheval for the (151,167) problem, but no upper bound was given on the sizes achievable.
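The agreement step can be sketched in a few lines: collect the result vectors from a set of runs and fix every position on which all of them agree. The function name and the ±1-vector representation are our assumptions.

```python
def agreed_bits(solutions):
    # Given the +/-1 result vectors of several annealing runs, return
    # {position: value} for every position on which all runs agree.
    # These positions are then fixed before the next set of runs.
    n = len(solutions[0])
    return {i: solutions[0][i] for i in range(n)
            if all(s[i] == solutions[0][i] for s in solutions)}
```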

  49. Profiling Annealing • The approach is not without its problems: not all bits that have complete agreement are correct. • (Figure: the actual secret vector against the ±1 results of six runs; at some positions all runs agree and are correct, but at one position all runs agree – wrongly.)

  50. Knudsen and Meier • Have used this method to attack PPP problem sizes (101,117). • Needs a hefty enumeration stage (to search for the wrong bits); allowed up to 2^64 search complexity. • Used a new cost function with a histogram punishment term: cost(y) = w1·costNeg(y) + w2·costHist(y), with w1 = 30, w2 = 1.
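The slide fixes the weights (w1 = 30, w2 = 1) but not the form of costHist; the sketch below assumes total absolute deviation from the target histogram, reusing cost_neg from the earlier sketch.

```python
from collections import Counter

def cost_hist(A, y, H):
    # Assumed punishment term: total absolute deviation of the image
    # histogram from the target H (a Counter of value -> count); the
    # exact form used by Knudsen and Meier is not given on the slide.
    image = Counter(sum(a * b for a, b in zip(row, y)) for row in A)
    return sum(abs(image[v] - H[v]) for v in set(image) | set(H))

def cost_km(A, y, H, w1=30, w2=1):
    # Slide 50's weighted combination: w1*costNeg + w2*costHist.
    return w1 * cost_neg(A, y) + w2 * cost_hist(A, y, H)
```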
