October 31, 2009 San Diego Math Circle Jack Brennen Computational Number Theory
Computational? What's that? • Computational Number Theory deals primarily with algorithms • Combines elements of traditional number theory (mathematics) with the study and development of algorithms (computer science) • Usually working with “big” numbers • Active area for research, implementation, and unsolved problems
Overview • Computer software (PARI/GP) • Basic algorithms (greatest common divisor, modular inverse, exponentiation, modular square roots, etc.) • Distinguishing primes from composites (sometimes easy, sometimes hard) • Factoring a number into prime factors (again, sometimes easy, sometimes hard) • Discrete logarithm • Elliptic curves
PARI/GP • A powerful free software package for computational number theory • Originally developed under Dr. Henri Cohen at the University of Bordeaux • Downloadable from: http://pari.math.u-bordeaux.fr/
Features of PARI/GP • Deals with arbitrarily large numbers • Built-in data types for rational numbers, complex numbers, vectors, matrices, polynomials, etc. • Built-in functions for number theory and other branches of mathematics • Can be used as a library from C/C++, or comes with the GP front end – a powerful interactive calculator with hundreds of functions available
Euclid's Algorithm • Oldest result in computational number theory, maybe? • Euclid described how to determine the greatest common divisor of two natural numbers • Does not require factoring • Number of steps required is bounded and depends only on the size of the numbers
Euclid's Algorithm in GP • Assume both arguments are natural numbers and that a >= b • mygcd(a,b) ={ local(r); r=(a%b); return(if(r==0,b,mygcd(b,r)));}
GCD of “big” integers • Euclid's algorithm slows down when the numbers get really big. Why? • Computer age brings on new optimization • Binary GCD algorithm • Similar to Euclid's algorithm, but only requires comparisons, subtractions, and multiplying and dividing by powers of 2
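The binary GCD idea can be sketched as follows. This is a minimal illustration in Python rather than GP, and the function name `binary_gcd` is made up for this example. It uses only comparisons, subtractions, and shifts (which multiply or divide by powers of 2).

```python
def binary_gcd(a: int, b: int) -> int:
    """Greatest common divisor of two non-negative integers,
    using only comparisons, subtractions, and shifts."""
    if a == 0:
        return b
    if b == 0:
        return a
    # Pull out the power of 2 shared by both numbers.
    shift = 0
    while ((a | b) & 1) == 0:
        a >>= 1
        b >>= 1
        shift += 1
    # Any remaining factors of 2 in a are not shared with b.
    while (a & 1) == 0:
        a >>= 1
    while b != 0:
        # Factors of 2 in b cannot belong to the gcd any more.
        while (b & 1) == 0:
            b >>= 1
        # Both are odd here: keep the smaller in a, then subtract.
        if a > b:
            a, b = b, a
        b -= a
    # Restore the shared power of 2.
    return a << shift
```

For example, `binary_gcd(48, 36)` strips the shared factor 4, reduces the odd parts by subtraction, and returns 12.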
Modular Inverse with Euclid • Euclid's algorithm extends easily for modular inverse (given A and M, find B such that A*B == 1 modulo M) • Find inverse of 73 modulo 100 • (0,100) -> (1,73) -> (-1,27) -> (3,19) -> (-4,8) -> (11,3) -> (-26,2) -> (37,1) • Tells us that 37*73 == 1 modulo 100 • 37 is the inverse of 73; 37*73 == 2701
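The chain of (coefficient, remainder) pairs on this slide can be reproduced with a short Python sketch. The names `inverse_steps` and `mod_inverse` are hypothetical; each pair (c, r) satisfies c*A == r (modulo M), and each new pair is the pair before last minus q times the last pair, where q is the quotient of the two remainders.

```python
def inverse_steps(a: int, m: int):
    """List the (coefficient, remainder) pairs of the extended Euclid run.
    Every pair (c, r) satisfies c*a == r (mod m)."""
    steps = [(0, m), (1, a % m)]
    while steps[-1][1] > 1:
        (c0, r0), (c1, r1) = steps[-2], steps[-1]
        q = r0 // r1
        steps.append((c0 - q * c1, r0 - q * r1))
    return steps


def mod_inverse(a: int, m: int) -> int:
    """Return b with a*b == 1 (mod m), or raise if no inverse exists."""
    coef, rem = inverse_steps(a, m)[-1]
    if rem != 1:
        raise ValueError("a and m are not coprime, so a has no inverse")
    return coef % m
```

Calling `inverse_steps(73, 100)` yields the same eight pairs as the slide, ending in (37, 1).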
Exponentiation • How would you compute x^600? • Start with x, multiply by x 599 times? Works, but painfully slow. • Binary method: Write exponent in binary, use the binary representation to determine the operations • 600 in binary: 1001011000 • x^600 = x^512 * x^64 * x^16 * x^8 • All of the smaller terms are computed on the way to getting the biggest term. • Requires 9 multiplications (squarings) to get to x^512, then 3 more multiplications to get the final answer, for a total of 12 multiplications
Exponentiation • Can we compute x^600 in fewer than 12 multiplications? • Note that 600 == 2 * 2 * 2 * 3 * 5 * 5 • We can square with one multiplication, cube with two multiplications, and take a fifth power with three multiplications • 1 + 1 + 1 + 2 + 3 + 3 is 11. So we can actually compute x^600 in 11 multiplications. • This is called the factor method
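The multiplication count for the binary method can be checked with a small Python sketch. The function name `binary_power` is made up for this example. It returns both the power and the number of multiplications used, counting each squaring as one multiplication.

```python
def binary_power(x, n: int):
    """Return (x**n, multiplications used) via the binary method, for n >= 1."""
    if n < 1:
        raise ValueError("exponent must be at least 1")
    result = None   # product of the selected powers x^(2^k) so far
    square = x      # x^(2^k) for the bit currently being examined
    mults = 0
    while True:
        if n & 1:
            if result is None:
                result = square             # first selected term: no multiply
            else:
                result = result * square
                mults += 1
        n >>= 1
        if n == 0:
            break
        square = square * square            # one squaring per remaining bit
        mults += 1
    return result, mults
```

With exponent 600 this performs 9 squarings and 3 further multiplications, matching the count of 12 on the slide.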
Exponentiation • These methods of powering are a special case of the general problem called “addition chains.” An addition chain begins with the number 1; each subsequent term is a sum of two numbers already existing in the chain. • Addition chains for 600 that we've discussed: • 1, 2, 4, 8, 16, 32, 64, 128, 256, 512, 576, 592, 600 • 1, 2, 4, 8, 16, 24, 48, 96, 120, 240, 480, 600 • The problem of finding an optimal addition chain is hard in general • Binary method is never terrible, but can very often be improved upon • Factor method is sometimes an improvement, but not always
Addition chains • First target where binary addition chain is not optimal is 15. Factor method beats binary method. • First target where an “ad hoc” method beats binary or factor methods is 23. Binary method uses 7 steps: 1, 2, 4, 8, 16, 20, 22, 23. • Ad hoc method, 6 steps: 1, 2, 3, 5, 10, 13, 23. • Finding these optimal ad hoc methods generally requires brute force search, and is only worth it if you are going to raise numbers to a constant power on a regular basis, or for small exponents where you can keep a table of optimal addition chains
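A brute-force search of the kind mentioned above might look like the following Python sketch. The name `shortest_addition_chain` is hypothetical, and the search is only practical for small targets. It tries chain lengths 0, 1, 2, ... in turn and only builds strictly increasing chains, which is enough because any addition chain can be sorted and de-duplicated without getting longer.

```python
def shortest_addition_chain(target: int):
    """Return one shortest addition chain from 1 to target
    (iterative-deepening brute force; small targets only)."""
    if target < 1:
        raise ValueError("target must be a positive integer")

    def extend(chain, steps_left):
        last = chain[-1]
        if last == target:
            return list(chain)
        # Even doubling at every remaining step would fall short: prune.
        if steps_left == 0 or (last << steps_left) < target:
            return None
        tried = set()
        # Candidate next terms: sums of two existing terms, largest first.
        for i in range(len(chain) - 1, -1, -1):
            for j in range(i, -1, -1):
                nxt = chain[i] + chain[j]
                if nxt <= last or nxt > target or nxt in tried:
                    continue
                tried.add(nxt)
                chain.append(nxt)
                found = extend(chain, steps_left - 1)
                chain.pop()
                if found:
                    return found
        return None

    steps = 0
    while True:
        found = extend([1], steps)
        if found:
            return found
        steps += 1
```

The number of steps is `len(chain) - 1`; for targets 15 and 23 the search finds chains of 5 and 6 steps, one fewer than the binary method in each case.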
Prime or composite? • The problem of determining whether a large integer is prime or composite is one of the oldest and most studied subjects in computational number theory • Fermat's Little Theorem is fundamental to the question; it can be used to easily prove compositeness
Fermat's Little Theorem • Simply stated, if p is prime, then (a^p - a) is divisible by p for any integer a. • Can be used to prove compositeness; if (a^p - a) is not divisible by p, then p is composite. • Has been proven in dozens of ways; like the Pythagorean Theorem, mathematicians seem to like finding new proofs.
Proof of Fermat's Little Theorem • Proof by induction on a, with the prime p fixed • Note that (a^p - a) is divisible by p when a = 1. • Show that if (a^p - a) is divisible by p, then so is ((a+1)^p - (a+1)) • Note that the binomial expansion of (a+1)^p contains only two terms which don't contain a factor of p in the coefficient: the first and the last • Write (a+1)^p as (a^p + 1 + p*C); then ((a+1)^p - (a+1)) becomes (a^p - a + p*C) • Since (a^p - a) is divisible by p, so is (a^p - a + p*C), thus ((a+1)^p - (a+1)) is divisible by p
Compositeness Test • Example of usage. Prove that 9 is composite without showing its factors... • Compute (2^9 - 2) == 512 - 2 == 510. 510 is not divisible by 9, so 9 is not prime. • Note that proving compositeness using Fermat's Little Theorem neither requires factors, nor yields factors.
Fermat's Little Theorem can't prove primality • Some composite numbers “fool” Fermat's Little Theorem. • Such numbers are called pseudo-primes • For a=2, the smallest such pseudo-prime is the number 341. • 341 is composite, but (2^341 - 2) is also divisible by 341. • Notice why efficient exponentiation is so important in computational number theory?
Exponentiation simplified • Note that 2^341 is a number with 103 decimal digits. But do we need to compute a 103 digit number? • No, because we only care about the remainder of 2^341 when divided by 341; thus, we can do the exponentiation modulo 341. • Actually well within the realm of being done by pencil and paper
Proof that 341 is pseudo-prime • 2^2 == 4 (mod 341) • 2^3 == 8 (mod 341) • 2^5 == 32 (mod 341) • 2^7 == 128 (mod 341) • 2^14 == 128*128 == 16 (mod 341) • 2^21 == 128*16 == 2 (mod 341) • 2^42 == 2*2 == 4 (mod 341) • 2^84 == 4*4 == 16 (mod 341) • 2^168 == 16*16 == 256 (mod 341) • 2^173 == 256*32 == 8 (mod 341) • 2^341 == 256*8 == 2 (mod 341) • So we see that (2^341 - 2) is divisible by 341
How to prove a pseudo-prime composite? • So 341 is a pseudo-prime (technically, a Fermat pseudo-prime to base 2) • How to prove it composite, short of factoring it? • Use a different base. In this case, we can choose the base 3, and note that (3^341 - 3) is not divisible by 341. So we were just “unlucky” to choose base 2 as our first base. • Most compositeness tests based on Fermat's Little Theorem are probabilistic tests. Given a composite number, they “usually” prove its compositeness, but not always.
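The Fermat compositeness test fits in one line of Python once the exponentiation is done modulo n. This is a minimal sketch; the function name `fermat_proves_composite` is made up here, and the three-argument built-in `pow` performs the modular exponentiation.

```python
def fermat_proves_composite(n: int, base: int) -> bool:
    """True if base**n - base is NOT divisible by n, which proves n composite.
    False is inconclusive: n may be prime, or a pseudo-prime to this base."""
    return pow(base, n, n) != base % n
```

Base 2 proves 9 composite but is fooled by 341, while base 3 proves 341 composite.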
How to improve the test? • Most common compositeness test in widespread usage is the Rabin-Miller test. • Very similar to the test we just did, with one major improvement. • First, note that if (a^p - a) is divisible by p, and if a is coprime to p, then we can describe the test as: a^(p-1) == 1 (modulo p). • Second, assuming that we are only testing odd values for compositeness (testing even numbers is trivial), then the exponent (p-1) will always have one or more factors of 2, so we strip those factors out for our initial test. • Third, note that if we ever find that x^2 == 1 (modulo M), and that x is equal to neither 1 nor -1 (modulo M), that constitutes a proof that M is composite.
Rabin-Miller Test on 341 • To run the Rabin-Miller Test on the number 341 with base 2, we first note that if 341 is prime, then 2^340 == 1 (modulo 341). • Next, note that we can compute 2^85 (modulo 341), then square it twice to get 2^340. • So we compute 2^85 (modulo 341) == 32. So far so good, 341 could be prime. • Next, square it: 2^170 (modulo 341) == (32*32) modulo 341 == 1. Thus, 341 is not prime, because we found a non-trivial solution to x^2 == 1 (modulo 341), specifically x == 32.
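One round of the test just described can be sketched in Python as follows. The function name `strong_test_proves_composite` is hypothetical, and the sketch assumes n is odd and the base is not a multiple of n.

```python
def strong_test_proves_composite(n: int, base: int) -> bool:
    """One Rabin-Miller round for odd n >= 3 (base assumed not divisible by n).
    True proves n composite; False is inconclusive."""
    if n < 3 or n % 2 == 0:
        raise ValueError("n must be an odd integer >= 3")
    # Strip the factors of 2 out of n - 1, so that n - 1 == d * 2**s with d odd.
    d, s = n - 1, 0
    while d % 2 == 0:
        d //= 2
        s += 1
    x = pow(base, d, n)
    if x == 1 or x == n - 1:
        return False
    # Square up to s - 1 times, watching how the sequence reaches 1.
    for _ in range(s - 1):
        x = x * x % n
        if x == n - 1:
            return False    # hit -1: the next squaring gives 1 legitimately
        if x == 1:
            return True     # previous x was a non-trivial square root of 1
    return True             # never reached -1, so base**(n-1) != 1 or a bad root
```

For n = 341 and base 2 this computes 2^85 == 32, squares it once to get 1, and reports composite, exactly as in the worked example.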
Properties of Rabin-Miller Test • When testing a number N, and where the base is chosen at random over the range (2...N-2), it has been shown that a composite number will be proven composite at least ¾ of the time. • For the vast majority of candidates N, Rabin-Miller is much better than that. Many composite N exist which can be proven composite using any base in the valid range. • Only a very small subset of composites fall into the “difficult” category, where nearly ¼ of the tests fail to prove compositeness. • For large randomly-chosen numbers such as the ones used in cryptography (512 bits, 1024 bits, etc.), a transient arithmetic error in your processor is actually more likely than even a single failure of Rabin-Miller.
Rabin-Miller specifics • Smallest composite number which passes Rabin-Miller for base 2 (that is, the test fails to prove it composite) is 2047. (It is a “strong pseudo-prime to base 2”...) • A commonly known test which correctly identifies every composite number up to 10^12 is to run four iterations of Rabin-Miller, using the bases 2, 13, 23, and 1662803.
Proving primality • Sometimes, a more rigorous proof of primality is desired, other than just saying that a number passed some number of probabilistic Rabin-Miller tests. • The easiest such primality proofs for N are done when we are able to completely factor N-1. • If we can find a value of a such that a^(N-1) == 1 (modulo N), but a^x != 1 (modulo N) for every other x which divides N-1, that constitutes a rigorous proof of the primality of N. • As an example, let's prove the primality of 3*2^66+1.
3*2^66+1 is prime • Here, N-1 is easily factored. • In fact, all we need to do is find a value of a such that the following three statements are simultaneously true: a^(2^66) != 1 (mod N); a^(3*2^65) != 1 (mod N); a^(3*2^66) == 1 (mod N) • An exhaustive search shows that a == 10 is the first such value of a that works.
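The three conditions can be checked mechanically. Below is a minimal Python sketch with made-up names `proves_prime` and `first_witness`. It relies on the fact that N-1 = 3*2^66 has only the prime factors 2 and 3, so testing the exponents (N-1)/2 and (N-1)/3 covers every proper divisor x of N-1.

```python
N = 3 * 2**66 + 1


def proves_prime(a: int, n: int, prime_factors) -> bool:
    """Check a**(n-1) == 1 (mod n) and a**((n-1)/q) != 1 (mod n)
    for every prime q dividing n - 1 (prime_factors must list them all)."""
    if pow(a, n - 1, n) != 1:
        return False
    return all(pow(a, (n - 1) // q, n) != 1 for q in prime_factors)


def first_witness(n: int, prime_factors, limit: int = 1000):
    """Smallest a in [2, limit) that proves n prime, or None if none is found."""
    for a in range(2, limit):
        if proves_prime(a, n, prime_factors):
            return a
    return None
```

Running `first_witness(N, (2, 3))` performs the exhaustive search described on the slide, which reports a == 10 as its result.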
Factoring by Fermat's Method • Works for moderately sized numbers which are the product of two cofactors of roughly equal size • Makes use of the idea that a*b can be written as u^2 - v^2 where u == (a+b)/2 and v == (a-b)/2. • Look for squares which are a “little” larger than the target N. If you find such a square which differs from N by a perfect square, you can proceed directly to splitting N into factors.
Example of Fermat's Method • Factor N = 5671 into primes • Look at squares “a little” larger than 5671: 76^2 == 5776 == N + 105; 77^2 == 5929 == N + 258; 78^2 == 6084 == N + 413; 79^2 == 6241 == N + 570; 80^2 == 6400 == N + 729 • Aha, 729 is 27^2, so 5671 can be written as 6400 - 729 = 80^2 - 27^2 = (80+27)(80-27) = (107)(53).
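The search over squares can be written as a short loop. This is a minimal Python sketch with a made-up function name, `fermat_factor`; it assumes N is odd, since an even N may never be a difference of two squares.

```python
from math import isqrt


def fermat_factor(n: int):
    """Split odd n > 1 as (u - v, u + v), where u*u - n == v*v.
    Returns (1, n) when n is prime."""
    if n < 3 or n % 2 == 0:
        raise ValueError("n must be an odd integer >= 3")
    u = isqrt(n)
    if u * u < n:
        u += 1                  # smallest u with u*u >= n
    while True:
        diff = u * u - n
        v = isqrt(diff)
        if v * v == diff:       # u*u exceeds n by a perfect square
            return u - v, u + v
        u += 1
```

For N = 5671 the loop starts at u = 76 and stops at u = 80, where 6400 - 5671 = 729 = 27^2, giving the factors 53 and 107.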
Factoring by Pollard Rho • Pollard Rho is a probabilistic method of factoring which makes use of the idea that iterating a particular polynomial operation modulo any prime tends to fall into a short cycle before too long. • For instance, if we continually iterate the polynomial x^2+1 (modulo 11), beginning with 1, we get: 1 -> 2 -> 5 -> 4 -> 6 -> 4.
How Pollard Rho works • When a polynomial is iterated modulo M, where M is a composite number, the result will be a combination of the cycles modulo each of the prime power divisors of M. Most notably, those cycles will probably be entered at different times, and likely be of different lengths. • How do we detect when we've fallen into a cycle modulo one divisor of M, but not another divisor? Use GCD.
Pollard Rho in action • Let's use Pollard Rho to factor 5671. • Choose the polynomial x^2+3. • Iterate beginning with 1, modulo 5671... • 1 -> 4 -> 19 -> 364 -> 2066 -> 3767. • Take the GCD of (3767-4) and 5671. • It's 53. This is because the polynomial x^2+3 has already fallen into a cycle modulo 53, but not yet fallen into a cycle modulo 107. So this can be used to split the factors apart.
Loop detection • How do we do loop detection, as required by Pollard Rho? Do we test every prior element of the sequence against the current element? • Two common simple ways which can be done with only limited storage. • First way is to test the 2nd element of the sequence against the 1st, the 4th element against the 2nd, the 6th against the 3rd, etc. • Second way is to test each element against the last element whose sequence number was a power of 2. For instance, the 5th, 6th, 7th, and 8th elements are tested against the 4th, etc. • Either method is guaranteed to eventually find any loop of any period.
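Putting the pieces together, here is a minimal Python sketch of Pollard Rho using the first loop-detection scheme above, which compares the k-th element with the 2k-th. The function name `pollard_rho` and its default parameters (the polynomial x^2+3, starting value 1) are choices made for this example. Rather than testing for equality, it takes the GCD of the difference with N, which reveals a cycle modulo one prime factor before the others.

```python
from math import gcd


def pollard_rho(n: int, c: int = 3, start: int = 1):
    """Try to split composite n by iterating x -> x*x + c (mod n).
    Returns a non-trivial factor, or None if this polynomial and start fail."""
    def step(x):
        return (x * x + c) % n

    slow = fast = start
    while True:
        slow = step(slow)           # advances one element per round
        fast = step(step(fast))     # advances two elements per round
        d = gcd(abs(fast - slow), n)
        if d == n:
            return None             # the cycles closed modulo every factor at once
        if d > 1:
            return d                # cycle closed modulo d but not modulo n // d
```

For 5671 with the defaults, the sequence is the one shown earlier (1, 4, 19, 364, 2066, 3767, ...), and the loop returns a prime factor after a handful of rounds; when it returns None, a different constant `c` or starting value can be tried.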