The Minimum Number of Givens in a Fair Sudoku Puzzle (is 17!)


Joshua Cooper

USC Department of Mathematics

[Figure: an example Sudoku puzzle.]

Rules: Place the numbers 1 through 9 in the 81 cells so that no number appears twice in any row, column, or 3×3 “box”.

You start with some of the cells already filled in, and try to complete the grid.

[Figure: the example puzzle, completed.]

[Figure: a labeled Sudoku grid illustrating the terms CELL, ROW, COLUMN, BOX, BAND, STACK, GIVEN, BOARD, and PUZZLE.]

A Sudoku puzzle designer has two main tasks:

1. Come up with a board to use as the solution state.

2. Designate some subset of the board’s squares as the initially exposed numbers (“givens”).

We’re going to focus on task #2: How to choose a “fair” set of givens?

For a Sudoku puzzle, i.e., a set of givens, to be “fair”, it must have two properties:

1. It has a solution. (Solvability)

2. There is only one solution. (Uniqueness)

Question: What is the smallest number of givens in a fair puzzle?

Possible solution (“Brute Force”):

1. Enumerate all possible sets of givens.

2. Check each one to see if it is solvable.

3. Check the solvable ones to see if they are unique.

4. Output the smallest number of givens among the uniquely solvable puzzles.
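On a 9×9 board these steps are hopeless, but they can be run literally on the 4×4 variant with 2×2 boxes (sometimes called Shidoku). The Python sketch below is an illustration, not part of the talk, and the names `all_boards` and `min_givens` are hypothetical. Instead of trying every assignment of values to cells, it groups all completed 4×4 boards by the values they show on a chosen set of cells. This is equivalent to steps 2 and 3: a set of givens is solvable if it matches at least one board, and uniquely solvable if it matches exactly one.

```python
from collections import Counter
from itertools import combinations, permutations

N, B = 4, 2  # board size and box size for the 4x4 variant (illustrative choice)


def fits(rows, row):
    """True if `row` can go directly below `rows` with no repeat in a column or box."""
    r = len(rows)
    for pr, prev in enumerate(rows):
        same_band = pr // B == r // B
        for c, v in enumerate(row):
            if prev[c] == v:
                return False  # repeat in column c
            if same_band and any(prev[c2] == v for c2 in range(N) if c2 // B == c // B):
                return False  # repeat in the box containing (r, c)
    return True


def all_boards():
    """Every completed N x N board, as a flat tuple of N*N symbols."""
    candidates = list(permutations(range(1, N + 1)))  # each row is a permutation
    boards = []

    def extend(rows):
        if len(rows) == N:
            boards.append(tuple(v for row in rows for v in row))
            return
        for row in candidates:
            if fits(rows, row):
                extend(rows + [row])

    extend([])
    return boards


def min_givens(boards):
    """Smallest k such that some k cells, with their values, match exactly one board."""
    for k in range(N * N + 1):
        for cells in combinations(range(N * N), k):
            # How many boards show each pattern of values on these cells?
            patterns = Counter(tuple(b[i] for i in cells) for b in boards)
            if 1 in patterns.values():
                return k  # a pattern seen on exactly one board is a fair puzzle
    return None


boards = all_boards()
print(len(boards), min_givens(boards))  # 288 4
```

For the 4×4 board this finds 288 completed boards and a minimum of 4 givens in well under a second; the same loops over 81 cells are what the next slides rule out.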

Why Brute Force Is Impractical:

1. Enumerate all possible sets of givens.

With 81 cells, there are $2^{81} \approx 2.4 \cdot 10^{24}$ sets of cells one could fill in.

Actually, the situation is much worse, because we have 9 options for the contents of each cell. That means a total of

$$1 + 9 \cdot 81 + 9^2 \binom{81}{2} + 9^3 \binom{81}{3} + \cdots + 9^{80} \binom{81}{80} + 9^{81} \binom{81}{81}$$

possible sets of givens, where “N choose K”, $\binom{N}{K}$, is the number of ways to choose K objects from a collection of N (so $\binom{81}{3}$ counts the ways to choose 3 cells out of 81).


By the Binomial Theorem, this sum equals $\sum_{k=0}^{81} \binom{81}{k} 9^k = (1 + 9)^{81} = 10^{81}$, which is roughly the number of atoms in the observable universe.
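Python’s exact integers make these counts easy to check directly. The short sketch below (not from the talk) confirms the value of $2^{81}$ and the binomial-theorem identity term by term.

```python
from math import comb

cell_subsets = 2 ** 81
# Choose k of the 81 cells, then one of 9 symbols for each chosen cell.
givens = sum(comb(81, k) * 9 ** k for k in range(82))

print(f"{cell_subsets:.2e}")  # 2.42e+24
print(givens == 10 ** 81)     # True, since (1 + 9)^81 = 10^81
```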

1. Enumerate all sets of 81 givens; if a uniquely solvable puzzle is found, enumerate all sets of 80 givens; if a uniquely solvable puzzle is found, enumerate all sets of 79 givens, and so on.

In fact, we can start much lower than 81, since many uniquely solvable puzzles with fewer than 81 givens are known.

Indeed, there are known uniquely solvable puzzles with only 17 givens.

[Figure: an example of a uniquely solvable puzzle with 17 givens.]

Gordon Royle has compiled a list of 49151 (!) inequivalent ones at:

http://mapleta.maths.uwa.edu.au/~gordon/sudokumin.php

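What does it mean for two Sudoku boards/puzzles to be equivalent?

Two boards are considered equivalent if it is possible to transform one into the other by a sequence of operations of the form:

1. Permuting the rows within each band and the columns within each stack (× $(3!)^6$)

2. Permuting bands I, II, and III, and stacks A, B, and C (× $(3!)^2$)

3. Permuting the numbers/colors (× 9!)

[Figure: a Sudoku grid with its bands labeled I, II, III and its stacks labeled A, B, C.]

Together with transposing the grid (× 2), operations 1 and 2 generate a group of $2 \cdot (3!)^8 = 3{,}359{,}232$ rearrangements of the cells; including the 9! relabelings gives $3{,}359{,}232 \cdot 9! = 1{,}218{,}998{,}108{,}160$ operations in all.

We’ll call this group the “Sudoku group.”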
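As a sanity check, the Python sketch below (not the talk’s code; the helper names are hypothetical) applies a random element of the group, including transposition, to the completed board shown later in the talk and confirms the result is still a valid board. It also multiplies out the group order from the factors listed above.

```python
import math
import random

# A completed board (the one used on the "Observation" slide).
GRID = [
    [9, 3, 7, 8, 5, 6, 2, 4, 1],
    [5, 6, 2, 1, 9, 4, 3, 8, 7],
    [4, 8, 1, 2, 7, 3, 5, 6, 9],
    [8, 2, 3, 6, 4, 7, 9, 1, 5],
    [6, 1, 5, 9, 3, 2, 4, 7, 8],
    [7, 4, 9, 5, 8, 1, 6, 2, 3],
    [3, 7, 8, 4, 6, 9, 1, 5, 2],
    [1, 9, 6, 7, 2, 5, 8, 3, 4],
    [2, 5, 4, 3, 1, 8, 7, 9, 6],
]

# (3!)^6 shuffles inside bands/stacks, (3!)^2 band/stack shuffles, x2 for transposition.
GRID_SYMMETRIES = math.factorial(3) ** 8 * 2
GROUP_ORDER = GRID_SYMMETRIES * math.factorial(9)  # times the 9! relabelings


def random_order(rng):
    """A random row (or column) order: shuffle the bands, then the rows inside each band."""
    bands = rng.sample(range(3), 3)
    return [3 * band + r for band in bands for r in rng.sample(range(3), 3)]


def transform(grid, rng):
    """Apply one random element of the Sudoku group to `grid`."""
    rows, cols = random_order(rng), random_order(rng)
    relabel = dict(zip(range(1, 10), rng.sample(range(1, 10), 9)))
    out = [[relabel[grid[r][c]] for c in cols] for r in rows]
    if rng.random() < 0.5:
        out = [list(col) for col in zip(*out)]  # transpose
    return out


def is_valid(grid):
    """Every row, column, and 3x3 box contains 1..9 exactly once."""
    units = [list(row) for row in grid] + [list(col) for col in zip(*grid)]
    units += [[grid[3 * br + i][3 * bc + j] for i in range(3) for j in range(3)]
              for br in range(3) for bc in range(3)]
    return all(sorted(unit) == list(range(1, 10)) for unit in units)


rng = random.Random(2012)
print(GRID_SYMMETRIES, GROUP_ORDER)    # 3359232 1218998108160
print(is_valid(transform(GRID, rng)))  # True
```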

1. Enumerate all sets of 16 givens…

How many such sets are there?

$9^{16} \cdot \binom{81}{16} \approx 6.22 \cdot 10^{31}$
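The count multiplies the ways to pick 16 of the 81 cells by the 9 choices of symbol in each. A one-line Python check (a sketch, not from the talk) should print a value of about 6.2e+31.

```python
from math import comb

count = 9 ** 16 * comb(81, 16)  # choose 16 cells, then one of 9 symbols for each
print(f"{count:.3g}")
```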

It would be silly to look at all of these, though:

1. We can rule out anything that has two of the same symbol in any column, row, or box.

2. Once we examine one, we don’t have to look at all the ones equivalent to it.
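The first filter is cheap to test. A minimal Python sketch, assuming the givens are stored as a dict from 0-indexed `(row, col)` pairs to symbols (`has_conflict` is a hypothetical name): it records each (unit, symbol) pair it sees and reports a conflict as soon as one repeats.

```python
def has_conflict(givens):
    """True if two givens share a symbol within a row, column, or 3x3 box.

    `givens` maps 0-indexed (row, col) pairs to symbols 1..9.
    """
    seen = set()
    for (r, c), symbol in givens.items():
        for unit in (("row", r), ("col", c), ("box", 3 * (r // 3) + c // 3)):
            if (unit, symbol) in seen:
                return True
            seen.add((unit, symbol))
    return False


print(has_conflict({(0, 0): 5, (0, 8): 5}))  # True: same row
print(has_conflict({(0, 0): 5, (1, 3): 5}))  # False
```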

Approximate total number of inequivalent configurations of 16 “non-conflicting” givens: $3.64 \times 10^{23}$

Still way too big.

Even if we could enumerate all of these, and even if we knew how to generate a list of one representative of each equivalence class (= orbit under the Sudoku group)… we would still have to carry out the remaining steps for each one, using backtracking:

2. Check each one to see if it is solvable.

3. Check the solvable ones to see if they are unique.
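Both checks come down to counting solutions, stopping as soon as a second one turns up. The Python sketch below is one simple backtracking counter, not the solver McGuire et al. used; branching on the empty cell with the fewest candidates is a common heuristic chosen here as an assumption. Blanking four cells of the completed board from the “Observation” slide that hold 9 5 / 5 9 leaves two solutions.

```python
def count_solutions(puzzle, limit=2):
    """Count completions of a 9x9 puzzle (0 = empty cell), stopping at `limit`.

    With limit=2: 0 means unsolvable, 1 means uniquely solvable, 2 means "two or more".
    """
    grid = [row[:] for row in puzzle]  # work on a copy

    def candidates(r, c):
        used = set(grid[r]) | {grid[i][c] for i in range(9)}
        br, bc = 3 * (r // 3), 3 * (c // 3)
        used |= {grid[i][j] for i in range(br, br + 3) for j in range(bc, bc + 3)}
        return [d for d in range(1, 10) if d not in used]

    # The givens themselves must not clash.
    for r in range(9):
        for c in range(9):
            d = grid[r][c]
            if d:
                grid[r][c] = 0
                clash = d not in candidates(r, c)
                grid[r][c] = d
                if clash:
                    return 0

    def search(remaining):
        # Branch on the empty cell with the fewest candidates.
        best = None
        for r in range(9):
            for c in range(9):
                if grid[r][c] == 0:
                    cands = candidates(r, c)
                    if best is None or len(cands) < len(best[2]):
                        best = (r, c, cands)
        if best is None:
            return 1  # every cell filled without a clash
        r, c, cands = best
        found = 0
        for d in cands:
            grid[r][c] = d
            found += search(remaining - found)
            grid[r][c] = 0
            if found >= remaining:
                break
        return found

    return search(limit)


GRID = [
    [9, 3, 7, 8, 5, 6, 2, 4, 1], [5, 6, 2, 1, 9, 4, 3, 8, 7], [4, 8, 1, 2, 7, 3, 5, 6, 9],
    [8, 2, 3, 6, 4, 7, 9, 1, 5], [6, 1, 5, 9, 3, 2, 4, 7, 8], [7, 4, 9, 5, 8, 1, 6, 2, 3],
    [3, 7, 8, 4, 6, 9, 1, 5, 2], [1, 9, 6, 7, 2, 5, 8, 3, 4], [2, 5, 4, 3, 1, 8, 7, 9, 6],
]
puzzle = [row[:] for row in GRID]
for r, c in [(0, 0), (0, 4), (1, 0), (1, 4)]:
    puzzle[r][c] = 0
print(count_solutions(GRID), count_solutions(puzzle))  # 1 2
```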

NEWS FLASH!!!

January 1, 2012: McGuire, Tugemann, Civario, University College Dublin

There is no 16-Clue Sudoku: Solving the Sudoku Minimum Number of Clues Problem

Posted on the arXiv, so it has not yet been published (i.e., vetted by a referee).

Nonetheless, it looks legit.

Q: How the *\$?&!* did they do that!?

A: Some clever mathematics, some very clever programming, and a RIDICULOUS amount of computing power:

7.1 million core hours on an SGI Altix ICE 8200EX cluster with 320 compute nodes, each of which has two Intel (Westmere) Xeon E5650 hex-core processors and 24GB of RAM = approx 1 year real time

The general strategy:

1. Construct a catalogue of all 5,472,730,538 inequivalent boards.

Done by Glenn Fowler, AT&T Labs. Full enumeration, with a very clever and specialized compression algorithm.

Uncompressed data size: 418 GB.

Compressed data size: 6 GB. (That’s 8.77 bits/board!)

2. Search each board for sub-puzzles with 16 givens, and check each one to see if it can be uniquely completed to a valid Sudoku board.
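A back-of-the-envelope check of these figures in Python (a sketch; it assumes 1 GB = 10^9 bytes, which is what reproduces the 8.77 bits/board quoted above). It also counts how many 16-cell subsets step 2 would face if every board were searched naively, one subset at a time.

```python
from math import comb

BOARDS = 5_472_730_538           # inequivalent boards, from the slide
compressed_bits = 6 * 10**9 * 8  # 6 GB, taking 1 GB = 10**9 bytes (assumption)
print(round(compressed_bits / BOARDS, 2))  # 8.77 bits per board

# Naive step 2: every 16-cell subset of every board.
per_board = comb(81, 16)
print(f"{per_board:.2e} subsets per board, {BOARDS * per_board:.2e} in total")
```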

BIG PROBLEM:

So, McGuire et al. were smarter about which sets of cells they looked at.

Observation:

9 3 7 8 5 6 2 4 1

5 6 2 1 9 4 3 8 7

4 8 1 2 7 3 5 6 9

8 2 3 6 4 7 9 1 5

6 1 5 9 3 2 4 7 8

7 4 9 5 8 1 6 2 3

3 7 8 4 6 9 1 5 2

1 9 6 7 2 5 8 3 4

2 5 4 3 1 8 7 9 6

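Every fair puzzle must contain at least one of the red numbers: the red cells can be filled in differently, leaving every other cell unchanged, to give another valid board, so a puzzle that avoids all of them has at least two solutions.

Call such a set of cells “unavoidable”.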
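The smallest unavoidable sets are four-cell “rectangles”: two rows in the same band (or two columns in the same stack) showing a b in one row and b a in the other. Swapping a and b in those four cells gives another valid board. The Python sketch below (hypothetical helper name) finds every such rectangle in the board above; it covers only this one simple family of unavoidable sets. With 0-indexed (row, column) pairs, one of them is {(0, 0), (0, 4), (1, 0), (1, 4)}, the 9s and 5s in the top two rows.

```python
from itertools import combinations

GRID = [
    [9, 3, 7, 8, 5, 6, 2, 4, 1], [5, 6, 2, 1, 9, 4, 3, 8, 7], [4, 8, 1, 2, 7, 3, 5, 6, 9],
    [8, 2, 3, 6, 4, 7, 9, 1, 5], [6, 1, 5, 9, 3, 2, 4, 7, 8], [7, 4, 9, 5, 8, 1, 6, 2, 3],
    [3, 7, 8, 4, 6, 9, 1, 5, 2], [1, 9, 6, 7, 2, 5, 8, 3, 4], [2, 5, 4, 3, 1, 8, 7, 9, 6],
]


def unavoidable_rectangles(grid):
    """All 4-cell sets {(r1,c1), (r1,c2), (r2,c1), (r2,c2)} holding a b / b a,
    where the two rows share a band or the two columns share a stack."""
    found = set()
    for r1, r2 in combinations(range(9), 2):
        for c1, c2 in combinations(range(9), 2):
            if r1 // 3 != r2 // 3 and c1 // 3 != c2 // 3:
                continue  # swapping would put a repeat inside some box
            if grid[r1][c1] == grid[r2][c2] and grid[r1][c2] == grid[r2][c1]:
                found.add(frozenset({(r1, c1), (r1, c2), (r2, c1), (r2, c2)}))
    return found


for cells in sorted(unavoidable_rectangles(GRID), key=sorted):
    print(sorted(cells))
```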


Smarter strategy for searching for 16-cell puzzles:

1. For each completed board, find lots of unavoidable sets.

2. Enumerate all the sets of 16 cells that hit each unavoidable set at least once.

3. Check each set of 16 cells to see if it is a fair puzzle.

1. For each completed board, find lots of unavoidable sets.

Strategy: Ed Russell compiled a list of 525 “blueprints” (which includes all blueprints on 11 or fewer cells).

Apply the Sudoku group to these blueprints to obtain a large collection of them, and then compare them to each board in turn.

Example blueprint:

[Figure: an example 12-cell blueprint.]

2. Enumerate all the sets of 16 cells that hit each unavoidable set at least once.

This is the so-called “hitting set” problem, well known to be NP-hard.

Definition. Given a collection of subsets of clues (the unavoidable sets), a hitting set (or transversal) for this collection is a set of clues that intersects every one of the subsets.

• Algorithm:
• At each step, find the smallest unavoidable set that does not contain any of the clues picked so far, and then try each element of this unavoidable set as the next clue.
• Repeat until 16 clues have been chosen.
• If the collection of unavoidable sets is exhausted before we get to the 16th clue, simply add the remaining clues needed in all possible ways.

Small but crucial improvement: whenever we add a clue to the hitting set from an unavoidable set, we consider all smaller clues from that unavoidable set as dead, i.e., we exclude these smaller clues from the search (in the respective branch of the search tree only).
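A minimal Python sketch of this search, assuming clues are numbered (e.g., cells 0 to 80) so that “smaller” refers to that fixed order. The function name and the tie-break (the “smallest” unhit set is measured by its still-live clues) are assumptions, not details of McGuire et al.’s code. The dead-clue rule is what makes each hitting set appear exactly once: a given hitting set can only arise in the branch that picks its smallest clue from the set being branched on, because every other branch either adds a clue outside it or kills one of its clues.

```python
from itertools import combinations


def hitting_sets(unavoidables, universe, k):
    """Yield every k-clue hitting set of `unavoidables` exactly once, as a sorted tuple.

    Clues are assumed to be comparable ids (e.g. cell numbers 0..80).
    """
    universe = sorted(universe)
    sets = [sorted(u) for u in unavoidables]

    def search(chosen, dead):
        unhit = [u for u in sets if not chosen.intersection(u)]
        if not unhit:
            # Every unavoidable set is hit: fill up with live, unused clues in all ways.
            live = [x for x in universe if x not in chosen and x not in dead]
            for extra in combinations(live, k - len(chosen)):
                yield tuple(sorted(chosen.union(extra)))
            return
        if len(chosen) == k:
            return  # no clues left, but some unavoidable set is still missed
        # Branch on the unhit set with the fewest live (non-dead) clues.
        branch = min(([x for x in u if x not in dead] for u in unhit), key=len)
        for i, clue in enumerate(branch):
            # In this branch, the smaller clues of this set are dead.
            yield from search(chosen | {clue}, dead | set(branch[:i]))

    yield from search(frozenset(), frozenset())


U = [{0, 1, 2}, {2, 3}, {4, 5, 6}, {1, 6, 7}]
print(list(hitting_sets(U, range(8), 2)))  # [(2, 6)]
```

Comparing its output against a plain loop over all k-subsets on small inputs like `U` is an easy way to confirm that nothing is missed or repeated.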

3. Check each set of 16 cells to see if it is a fair puzzle.

McGuire et al. used an open-source Sudoku solver written by Brian Turner, available online. This solver can check around 50,000 16-clue puzzles per second for a unique completion.

One “little” issue: is this a proof?

It’s not human-checkable: the computation is too big.

As long as our understanding of physics is sufficiently accurate to completely predict the behavior of a processor under the given instruction set, the computation is to be believed…

… unless there is a bug in their code…

… or there is a bug in the kernel of the OS running the code…

… or a cosmic ray streams in from outer space and knocks an electron out of place at just the right (wrong?) moment…

… or a radioactive atom in the chip’s substrate material decays, tossing off an alpha particle…

… or random noise is caused by transient EMF fields, perhaps from inductive or capacitive “crosstalk”…

… or our understanding of physics isn’t quite good enough…

Are these issues really worth worrying about, or are they so rare that they are not a problem?

Tezzaron Semiconductor’s 2004 whitepaper “Soft Errors in Electronic Memory” estimates that modern memory is subject to 1000 to 5000 FIT (failures, here bit flips, per billion hours of use) per Mbit of memory.

A yearlong computation probably has lots of these errors, then!
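A rough Python estimate under loudly simplifying assumptions: every core hour uses its full share of the node’s memory (24 GB split evenly over 12 cores), 1 GB = 8 × 1024 Mbit, the error rate is constant, and no error-correcting hardware catches the flips. Under those assumptions, the 7.1 million core hours correspond to on the order of 10^5 expected bit flips.

```python
core_hours = 7.1e6        # total core hours quoted for the computation
gb_per_core = 24 / 12     # 24 GB per node, shared by 2 x 6 cores (assumption)
mbit_per_gb = 8 * 1024    # assumes 1 GB = 1024 MB = 8 * 1024 Mbit
mbit_hours = core_hours * gb_per_core * mbit_per_gb

for fit in (1000, 5000):  # FIT = failures per 10**9 hours, per Mbit
    expected_flips = mbit_hours * fit / 1e9
    print(fit, round(expected_flips))  # about 116,000 and 582,000
```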

What to do!?

Define a graph Sud whose vertices are the 81 cells, with an edge between two cells whenever they share a row, column, or box (so each row, column, and box induces a complete subgraph).

Definition. A graph G is said to be k-colorable if it is possible to assign k colors to the vertices in such a way that no edge has both its vertices colored the same.

Definition. The chromatic number χ(G) of a graph G is the smallest integer k so that G is k-colorable.
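In this language a completed board is exactly a proper 9-coloring of Sud, and since each row is a 9-vertex clique, χ(Sud) = 9. The Python sketch below (hypothetical helper names) builds Sud as adjacency sets and checks that the board from the “Observation” slide is a proper coloring. Each cell has 20 neighbors (8 in its row, 8 in its column, and 4 more in its box), giving 810 edges.

```python
from itertools import product

GRID = [
    [9, 3, 7, 8, 5, 6, 2, 4, 1], [5, 6, 2, 1, 9, 4, 3, 8, 7], [4, 8, 1, 2, 7, 3, 5, 6, 9],
    [8, 2, 3, 6, 4, 7, 9, 1, 5], [6, 1, 5, 9, 3, 2, 4, 7, 8], [7, 4, 9, 5, 8, 1, 6, 2, 3],
    [3, 7, 8, 4, 6, 9, 1, 5, 2], [1, 9, 6, 7, 2, 5, 8, 3, 4], [2, 5, 4, 3, 1, 8, 7, 9, 6],
]


def same_box(u, v):
    return (u[0] // 3, u[1] // 3) == (v[0] // 3, v[1] // 3)


def sudoku_graph():
    """Sud as adjacency sets: distinct cells are adjacent if they share a row, column, or box."""
    cells = list(product(range(9), repeat=2))
    return {u: {v for v in cells
                if v != u and (v[0] == u[0] or v[1] == u[1] or same_box(u, v))}
            for u in cells}


def is_proper_coloring(adj, color):
    """No edge joins two vertices of the same color."""
    return all(color[u] != color[v] for u in adj for v in adj[u])


SUD = sudoku_graph()
board = {(r, c): GRID[r][c] for r in range(9) for c in range(9)}
print(len(SUD), sum(len(nbrs) for nbrs in SUD.values()) // 2)  # 81 810
print(is_proper_coloring(SUD, board))                          # True
```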

Definition. For a graph G and a proper vertex coloring c with exactly χ(G) colors, define a “determining set” to be a set of vertices so that the coloring, restricted to those vertices, can be completed to a bona fide proper vertex coloring of the graph in exactly one way.

Definition. For a graph G and a proper vertex coloring c with exactly χ(G) colors, define a “critical set” to be a determining set so that removing any vertex makes the set non-determining.

Definition. For a graph G and a proper vertex coloring c with exactly χ(G) colors, define scs(G;c) to be the size of the smallest critical set for G and c, and lcs(G;c) to be the size of the largest critical set.
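These definitions can be checked by brute force on small graphs. The Python sketch below (all function names are hypothetical) counts proper colorings that extend a partial coloring, stopping at two, and uses that count to test whether a vertex set is determining or critical. It assumes the coloring `c` uses the colors 0, …, χ(G)−1. A determining set of minimum size is automatically critical, so scs(G;c) is simply the size of a smallest determining set, which is what `scs` searches for; lcs(G;c) would instead require examining every critical set.

```python
from itertools import combinations


def count_completions(adj, k, partial, limit=2):
    """Number of proper colorings with colors 0..k-1 extending `partial`
    (a dict vertex -> color), counting no further than `limit`."""
    if any(partial.get(u) == col for v, col in partial.items() for u in adj[v]):
        return 0  # the partial coloring already has a monochromatic edge
    coloring = dict(partial)
    free = [v for v in adj if v not in coloring]

    def extend(i, remaining):
        if i == len(free):
            return 1
        v, found = free[i], 0
        for col in range(k):
            if all(coloring.get(u) != col for u in adj[v]):
                coloring[v] = col
                found += extend(i + 1, remaining - found)
                del coloring[v]
                if found >= remaining:
                    break
        return found

    return extend(0, limit)


def is_determining(adj, c, vertices):
    k = len(set(c.values()))  # c is assumed to use exactly chi(G) colors
    return count_completions(adj, k, {v: c[v] for v in vertices}) == 1


def is_critical(adj, c, vertices):
    vs = set(vertices)
    return is_determining(adj, c, vs) and not any(
        is_determining(adj, c, vs - {v}) for v in vs)


def scs(adj, c):
    """Size of a smallest critical set (= size of a smallest determining set)."""
    for size in range(len(adj) + 1):
        if any(is_determining(adj, c, s) for s in combinations(list(adj), size)):
            return size


C4 = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [0, 2]}  # a 4-cycle
alternating = {0: 0, 1: 1, 2: 0, 3: 1}
print(is_critical(C4, alternating, [0]), scs(C4, alternating))  # True 1
```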

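Definition. For a graph G, define $\mathrm{scs}(G) = \min_c \mathrm{scs}(G;c)$ and $\mathrm{lcs}(G) = \max_c \mathrm{lcs}(G;c)$, where c ranges over the proper vertex colorings of G with exactly χ(G) colors.

Theorem (McGuire et al. ’12). $\mathrm{scs}(\mathrm{Sud}) = 17$.

Perhaps by studying these parameters, we can eventually construct a (human-readable) mathematical proof of this result.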

For example…

Theorem (C., Kirkpatrick ’12+). For n even,

Theorem (C., Kirkpatrick ’12+). For n odd,

These parameters (by other names) have been studied before in other contexts, particularly for Latin squares.

Definition. A Latin square of order n is an n × n matrix whose cells are filled with the numbers 1, …, n, so that each column and row contains exactly one of each symbol.

Definition. The Latin square graph of order n is the Cartesian product $K_n \square K_n$ of two complete graphs on n vertices, i.e., $(a, b) \in [n] \times [n]$ is adjacent to $(c, d) \in [n] \times [n]$ iff $(a, b) \neq (c, d)$ and either $a = c$ or $b = d$.
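A proper n-coloring of $K_n \square K_n$ is the same thing as a Latin square of order n: each row and each column is an n-clique, so it must use all n colors exactly once. The Python sketch below (hypothetical helper names; symbols are 0, …, n−1 rather than 1, …, n) builds the graph with the adjacency rule above and checks that the cyclic square L(a, b) = (a + b) mod n is a proper coloring while the multiplication table mod n is not.

```python
def latin_square_graph(n):
    """K_n □ K_n: (a, b) ~ (c, d) iff they are distinct and a == c or b == d."""
    cells = [(a, b) for a in range(n) for b in range(n)]
    return {(a, b): [(c, d) for (c, d) in cells
                     if (c, d) != (a, b) and (a == c or b == d)]
            for (a, b) in cells}


def is_proper(adj, color):
    """No edge joins two vertices of the same color."""
    return all(color[v] != color[u] for v in adj for u in adj[v])


n = 5
G = latin_square_graph(n)
cyclic = {(a, b): (a + b) % n for (a, b) in G}         # the cyclic Latin square
product_table = {(a, b): (a * b) % n for (a, b) in G}  # row 0 is all zeros
print(is_proper(G, cyclic), is_proper(G, product_table))  # True False
```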

Theorem (Cavenagh ’07). $\mathrm{scs}(K_n \square K_n) \ge c\, n (\log n)^{1/3}$.

NB. This is the first superlinear lower bound! The proof uses very special properties of Latin squares. More generalizable proof?

Theorem (Cooper, Donovan, Seberry ’91). $\mathrm{scs}(K_n \square K_n) \le \lfloor n^2/4 \rfloor$.

n is odd.

Theorem (Gower ’00). $\mathrm{lcs}(K_n \square K_n) \ge n^2(1 - o(1))$.

Theorem (Dejter, Horak ’07). $\mathrm{lcs}(K_n \square K_n) \le n^2 - 7n/2$.

Theorem (Ghandehari, Hatami, Mahmoodian, ‘05).

Thanks!

P.S. There are as many open problems about this as there are graphs. If you are interested in doing some research, contact me at cooper@math.sc.edu.