Automatic Theory Formation: Problem-Solving & New Concepts

Theory Formation as Search
Problem solving by applying formulas and methods can be automated relatively easily. But what about coming up with new concepts? New hypotheses or conjectures? New theories?
CSE 415 -- (c) S. Tanimoto, 2008 Automatic Theory Formation

Outline
• Human problem solving and heuristic search
• Newell and Simon’s GPS
• Theory-formation systems
• Lenat’s AM program
• “Pythagoras”
• Other approaches, conclusions

Human problem solving and heuristic search
Let’s assume that man is a symbolic information processor*. Intelligence may then be considered as efficient search in very large spaces.
* This is part of the “physical symbol system hypothesis” of Newell and Simon: “a physical symbol system has the necessary and sufficient means of general intelligent action.”

Newell and Simon’s GPS
GPS (the General Problem Solver) provided a state-space search implementation based on a heuristic called “Means-Ends Analysis”: “Determine the difference between the goal and current state and try applying operators that might reduce the difference.”
GPS was developed and used during 1957-1969.
Application areas included: symbolic integration, the missionaries and cannibals puzzle, question answering, logic, chess, and high-school algebra.
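As a rough illustration of means-ends analysis (not GPS itself), the Python sketch below treats a state as a set of facts and each operator as a precondition list, an add list, and a delete list. The `Operator` type, the function names, and the toy "drive to school" domain are all assumptions made for this example.

```python
from typing import NamedTuple

class Operator(NamedTuple):
    name: str
    preconds: frozenset  # facts that must hold before the operator applies
    adds: frozenset      # facts the operator makes true
    deletes: frozenset   # facts the operator makes false

def achieve(state, goal, ops, depth):
    """Remove one difference: make `goal` true. Returns (state, plan) or None."""
    if goal in state:
        return state, []              # no difference for this goal
    if depth == 0:
        return None                   # give up on very deep subgoal chains
    for op in ops:
        if goal in op.adds:           # operator is relevant to the difference
            result = achieve_all(state, sorted(op.preconds), ops, depth - 1)
            if result is not None:
                mid_state, steps = result
                return (mid_state - op.deletes) | op.adds, steps + [op.name]
    return None

def achieve_all(state, goals, ops, depth=10):
    """Achieve each goal in turn, then check that none was undone."""
    plan = []
    for goal in goals:
        result = achieve(state, goal, ops, depth)
        if result is None:
            return None
        state, steps = result
        plan += steps
    if not set(goals) <= state:
        return None
    return state, plan

# Hypothetical toy domain.
OPS = [
    Operator("drive to school",
             frozenset({"son at home", "car works"}),
             frozenset({"son at school"}),
             frozenset({"son at home"})),
    Operator("install battery",
             frozenset({"car needs battery", "have battery"}),
             frozenset({"car works"}),
             frozenset({"car needs battery"})),
    Operator("buy battery",
             frozenset({"have money"}),
             frozenset({"have battery"}),
             frozenset({"have money"})),
]

start = frozenset({"son at home", "car needs battery", "have money"})
final_state, plan = achieve_all(start, ["son at school"], OPS)
print(plan)  # ['buy battery', 'install battery', 'drive to school']
```

The planner works backward from the difference ("son at school" is not yet true) to an operator that removes it, and then treats that operator's unmet preconditions as new differences.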

Simon’s thesis
All human problem-solving and design activity can be modeled as state-space search. This includes not only the engineering of solutions to problems but also scientific discovery and mathematical theory formation.
Reference: Herbert Simon, The Sciences of the Artificial (the primary reference cited by the NSF program on Science of Design).

Theory Formation
Why theory formation?
• As data is collected more and more rapidly (by more and more sensors, cameras, telescopes, microscopes, etc.), scientists need help analyzing it.
• Coming up with new theories is a prime field for AI.
• Intelligent assistants for theory formation may help humans to create better theories.

Representations for Theories
• A theory is about something: objects (numbers in number theory), phenomena in physics, people and their behavior in psychology.
• A representation for a theory requires representations for the objects and/or phenomena, and for expressing properties and rules about them.

Mathematical Theories
• Sets
• Numbers
• Geometric objects
• Logical expressions
• Algebraic systems
• Graphs and data structures
• Computers (without any robotic peripherals) can perform mathematical experiments.

Scientific Theories
• Measurements from scientific instruments
• Functional relationships among experimental variables
• Computers (with robotics) can also perform experiments, e.g., in drug design.

Heuristics for Creative Thought
A problem: cheese cutting. It’s difficult to cut good, aged cheese with a knife because the cheese crumbles.
General heuristics*:
1. Check whether there is a method for handling a more general problem than the one being faced.
2. Consider the variables that affect the success or failure of the current method. Consider extreme cases of the relationships among the variables.
3. Ask what is really wanted; perhaps the problem can be avoided with another approach.
* D. Lenat

Solutions...
Extreme cases of known relationships:
1. Thinness of knife vs. thinness of cheese slices: try making the blade as thin as possible. (The result is the wire cheese cutter.)
2. Most cheese can be cut thinner if it’s softer: try heating the blade to melt the cheese.
Combine the solutions: the “hot-wire knife”.

Application of the Extremal Heuristic in Math Exploration
Assume: We have discovered the factoring of numbers, and it is “interesting”.
However: Factoring a number can lead to many alternative factorizations. Take 12 = 3 x 4. For sets (or multisets) of factors, we can get:
{1, 12}, {1, 2, 6}, {1, 3, 4}, {1, 2, 2, 3}
To explore further, we can refine the notion of factor sets to get:
divisors_of(12) = {1, 2, 3, 4, 6, 12}
The extremal heuristic says to pay special attention to extremal cases. For sets, extremal ones include the empty set, singleton sets, doubleton sets, and perhaps sets with three elements...
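The factor multisets for 12 listed above can be enumerated mechanically. The sketch below is one possible way to do it in Python; the function name `factorizations` is an assumption made for this example, and taking the union of the factor sets is just one reading of how "refining" them yields divisors_of(12).

```python
import math

def factorizations(n, smallest=2):
    """Yield every multiset of factors >= smallest whose product is n (n >= 2)."""
    yield [n]
    for f in range(smallest, math.isqrt(n) + 1):
        if n % f == 0:
            # Only allow later factors >= f so each multiset appears once.
            for rest in factorizations(n // f, f):
                yield [f] + rest

# The slide's factor sets always include 1, so prepend it.
factor_sets = [[1] + fs for fs in factorizations(12)]
print(factor_sets)     # [[1, 12], [1, 2, 6], [1, 2, 2, 3], [1, 3, 4]]

# Pooling every factor that appears anywhere gives the divisor set.
divisors_of_12 = sorted(set().union(*factor_sets))
print(divisors_of_12)  # [1, 2, 3, 4, 6, 12]
```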


Extremal Heuristic in Math (cont)
Examples of extremal sets of divisors, divisors_of(n):
• empty set: none
• singleton sets: {1}
• doubleton sets: {1, 2}, {1, 3}, {1, 5}, {1, 7} (PRIME NUMBERS)
• sets with three elements: {1, 2, 4}, {1, 3, 9}, {1, 5, 25}, {1, 7, 49} (SQUARES OF PRIMES)
PRIMES = Numbers having exactly 2 divisors
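A small mathematical experiment makes the extremal cases concrete. The sketch below groups the numbers 1 to 50 by the size of their divisor set; the helper names and the limit of 50 are arbitrary choices for this example.

```python
from collections import defaultdict

def divisors_of(n):
    """Return the set of positive divisors of n (brute force)."""
    return {d for d in range(1, n + 1) if n % d == 0}

def group_by_divisor_count(limit):
    """Map each divisor-set size to the numbers in 1..limit having that size."""
    groups = defaultdict(list)
    for n in range(1, limit + 1):
        groups[len(divisors_of(n))].append(n)
    return groups

groups = group_by_divisor_count(50)
print(groups[0])  # []  (no positive number has an empty divisor set)
print(groups[1])  # [1]
print(groups[2])  # [2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47]
print(groups[3])  # [4, 9, 25, 49]
```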

AM: Automated Mathematician
Developed by Douglas Lenat in his Ph.D. research at the Stanford A.I. Lab. Completed in 1976.
AM contained ~250 heuristic rules.
It started with 110 core concepts of set theory.
AM could create new concepts, find examples of concepts, and make conjectures. It did not prove theorems.
It computed “interestingness” values for each concept, and used these values to help direct the search.

Determining “Interestingness”
A concept is more interesting if examples of it can be found.
Heuristic 1 (for finding examples): IF specializations of concept C have just been created, and the current task is to find examples of each of them, THEN one method is to look over the known examples of C; they may be examples of some of the new specialized concepts as well.
Heuristic 2: IF all examples of a concept C turn out to be examples of another concept D as well, and C was not previously known to be a specialization of D, THEN conjecture that C is a specialization of D, and raise the “interestingness” value of both concepts.

“Interestingness” (cont)
Heuristic 3: IF all examples of a concept turn out to be in the domain of a rarely-applicable function F, THEN it’s worth computing all their F-values (their images under the function F), and studying that collection of F-values as a separate concept.

“Interestingness” (cont)
Heuristic 1 gives us a way to find examples of the new specializations of NUMBERS:
Numbers with 0 divisors: (none found)
Numbers with 1 divisor: 1
Numbers with 2 divisors: 2, 3, 5, 7, 11, 13, 17, 19, ...
Numbers with 3 divisors: 4, 9, 25, 49, 121, 169, 289, ...
Heuristic 2, applied to each set, determines that Numbers with 3 divisors is a subset of PERFECT_SQUARES.
Heuristic 3 leads us to take the square roots of Numbers with 3 divisors, and then to discover that this set is the Numbers with 2 divisors (the primes). This greatly increases the interestingness of both Numbers with 2 divisors and Numbers with 3 divisors.
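The chain of reasoning on this slide can be replayed in a few lines of Python. This is only a sketch: the variable names, the search range 1..299, and the numeric interestingness scores and increments are hypothetical and are not taken from AM.

```python
import math

def num_divisors(n):
    return sum(1 for d in range(1, n + 1) if n % d == 0)

NUMBERS = list(range(1, 300))
PERFECT_SQUARES = [n for n in NUMBERS if math.isqrt(n) ** 2 == n]

# Heuristic 1: test the known examples of NUMBERS against each specialization.
with_2_divisors = [n for n in NUMBERS if num_divisors(n) == 2]
with_3_divisors = [n for n in NUMBERS if num_divisors(n) == 3]

# Hypothetical interestingness scores.
interest = {"with_2_divisors": 100, "with_3_divisors": 100, "PERFECT_SQUARES": 100}

# Heuristic 2: if every example of C is also an example of D, conjecture C is
# a specialization of D and raise the interestingness of both.
def is_specialization(examples_c, examples_d):
    return len(examples_c) > 0 and set(examples_c) <= set(examples_d)

if is_specialization(with_3_divisors, PERFECT_SQUARES):
    interest["with_3_divisors"] += 50
    interest["PERFECT_SQUARES"] += 50

# Heuristic 3: square root applies only to perfect squares, so collect the
# images of the examples under it and study them as a new collection.
roots = [math.isqrt(n) for n in with_3_divisors]
print(roots)  # [2, 3, 5, 7, 11, 13, 17]

# The roots coincide with the first numbers having exactly 2 divisors.
if roots == with_2_divisors[:len(roots)]:
    interest["with_2_divisors"] += 200
    interest["with_3_divisors"] += 200
```

The final comparison only checks agreement on the finite range examined; like AM, the sketch produces a conjecture and not a proof.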

Concept Representation in AM
NAME: Prime numbers, Primes, Numbers-with-2-Divisors
DEFINITIONS:
  ORIGIN: Number-of-divisors-of(x) = 2
  PRED.-CALCULUS: Prime(x) = (forall z)(z|x -> z=1 XOR z=x)
  ITERATIVE: (for x>1): For i from 2 to x-1, ~(i|x)
EXAMPLES: 2, 3, 5, 7, 11, 13, 17
  BOUNDARY: 2, 3
  BOUNDARY-FAILURES: 0, 1
  FAILURES: 12
GENERALIZATIONS: Numbers, Numbers with an even no. of divisors, Numbers with a prime no. of divisors
SPECIALIZATIONS: Prime pairs, Prime uniquely-addables
CONJECS: Unique factorization, Goldbach’s conjecture
ANALOGIES: Maximally-divisible numbers are converse extremes of Divisors-of
INTEREST: Conjec’s tying Primes to Times, to Divisors-of, and to other closely related operations
WORTH: 800
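A frame like this maps naturally onto a record with one field per facet. The Python sketch below is an assumption about how such a record might look, not AM's actual Interlisp structure; the class name `Concept`, its fields, and the method `fill_in_examples` are invented for illustration. It also checks that the three DEFINITIONS agree with one another on 1..100.

```python
from dataclasses import dataclass, field

@dataclass
class Concept:
    names: list
    definitions: dict                     # facet label -> predicate on x
    examples: list = field(default_factory=list)
    failures: list = field(default_factory=list)
    generalizations: list = field(default_factory=list)
    worth: int = 0

    def fill_in_examples(self, candidates):
        """Sort candidates into examples and failures using the first definition."""
        test = next(iter(self.definitions.values()))
        for x in candidates:
            (self.examples if test(x) else self.failures).append(x)

def by_origin(x):
    # Number-of-divisors-of(x) = 2
    return sum(1 for d in range(1, x + 1) if x % d == 0) == 2

def by_pred_calculus(x):
    # (forall z)(z|x -> z=1 XOR z=x); for x >= 1 all divisors lie in 1..x
    return all((z == 1) != (z == x) for z in range(1, x + 1) if x % z == 0)

def by_iteration(x):
    # (for x>1): for i from 2 to x-1, i does not divide x
    return x > 1 and all(x % i != 0 for i in range(2, x))

primes = Concept(
    names=["Prime numbers", "Primes", "Numbers-with-2-Divisors"],
    definitions={"ORIGIN": by_origin,
                 "PRED.-CALCULUS": by_pred_calculus,
                 "ITERATIVE": by_iteration},
    generalizations=["Numbers"],
    worth=800,
)
primes.fill_in_examples(range(1, 18))
print(primes.examples)  # [2, 3, 5, 7, 11, 13, 17]

definitions_agree = all(
    len({defn(x) for defn in primes.definitions.values()}) == 1
    for x in range(1, 101)
)
print(definitions_agree)  # True
```

The XOR in the predicate-calculus definition is what excludes 1: its only divisor satisfies both z=1 and z=x.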

The Agenda Control Structure in AM
Agenda: a priority queue of tasks (jobs).
Typical task: Fill in examples of PRIMES.
Rules that appear relevant to performing the current task are applied. Rule actions may include:
• Creating a new concept, e.g., PRIMES which are uniquely representable as the sum of two other primes.
• Filling in examples (or other facets) of a concept.
• Creating a new job and putting it on the agenda.

Task Justifications
• Each task on the AM agenda had a list of justifications.
• A justification was a “quasi-symbolic reason” explaining why the job was worth considering.
• Justifications were used by AM in 3 ways:
  1. If a task on the agenda was resuggested, then the supporting reasons were examined. If there were new reasons, then the priority of the task was raised. If there were no new reasons, the priority didn’t change.
  2. Once a task was selected, the quality of the reasons was used to determine how much time and space the task could use, before AM went on to the next task.
  3. For explaining to humans why a particular task was being done.
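The interplay of agenda and justifications can be sketched as a small Python class. Everything here is hypothetical: the class and method names, the additive priority update, and the rule that the resource budget grows with the number of reasons are stand-ins for AM's actual formulas, which these slides do not give.

```python
class Agenda:
    def __init__(self):
        # task name -> {"priority": number, "reasons": set of strings}
        self.tasks = {}

    def suggest(self, task, reason, strength):
        """Add or re-suggest a task. Only a NEW reason raises its priority."""
        entry = self.tasks.setdefault(task, {"priority": 0, "reasons": set()})
        if reason not in entry["reasons"]:
            entry["reasons"].add(reason)
            entry["priority"] += strength
        # A repeated reason leaves the priority unchanged.

    def pop_best(self):
        """Remove the top-priority task; return it with its reasons and budget."""
        task = max(self.tasks, key=lambda t: self.tasks[t]["priority"])
        entry = self.tasks.pop(task)
        budget = 10 * len(entry["reasons"])  # hypothetical time/space allowance
        return task, sorted(entry["reasons"]), budget

agenda = Agenda()
agenda.suggest("Fill in examples of PRIMES", "PRIMES was just created", 100)
agenda.suggest("Fill in examples of SETS", "SETS has few examples", 150)
agenda.suggest("Fill in examples of PRIMES", "PRIMES was just created", 100)  # no change
agenda.suggest("Fill in examples of PRIMES", "PRIMES has no examples yet", 100)

task, reasons, budget = agenda.pop_best()
print(task)     # Fill in examples of PRIMES
print(reasons)  # the stored reasons double as an explanation for the user
```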

Some of AM’s (re)discoveries
• Natural numbers
• Prime numbers
• Unique factorization: Every (whole) number greater than 1 can be written uniquely (up to the order of the factors) as a product of prime numbers.
• Goldbach’s conjecture: Every even integer greater than 2 can be written as the sum of two primes.
  4 = 2 + 2
  6 = 3 + 3
  8 = 5 + 3
  10 = 5 + 5 or 10 = 7 + 3
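Both statements are easy to test empirically, which is the kind of evidence a conjecturing program has to work with. The following Python sketch (function names chosen for this example) factors a number by trial division and lists the Goldbach decompositions of an even number.

```python
import math

def is_prime(n):
    return n > 1 and all(n % i != 0 for i in range(2, math.isqrt(n) + 1))

def prime_factors(n):
    """Return the prime factors of n >= 2 in nondecreasing order."""
    factors, p = [], 2
    while p * p <= n:
        while n % p == 0:
            factors.append(p)
            n //= p
        p += 1
    if n > 1:
        factors.append(n)  # whatever remains is itself prime
    return factors

def goldbach_pairs(n):
    """Return all (p, q) with p <= q, both prime, and p + q == n."""
    return [(p, n - p) for p in range(2, n // 2 + 1)
            if is_prime(p) and is_prime(n - p)]

print(prime_factors(12))   # [2, 2, 3]
print(goldbach_pairs(10))  # [(3, 7), (5, 5)]

# Finite evidence only: every even number from 4 to 1000 has at least one pair.
print(all(goldbach_pairs(n) for n in range(4, 1001, 2)))  # True
```

A check like this can raise confidence in a conjecture but cannot prove it, which matches AM's role: it made conjectures and did not prove theorems.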

Human Interaction with AM
After AM invented a new concept or conjecture, the human operator could assign it a name:
“Call Concept 123 ‘PRIME NUMBERS’”
“Call Conjecture 17 ‘UNIQUE FACTORIZATION’”
This would make subsequent reports from AM more intelligible.

Controversy over AM
To a certain extent, AM was “set up” to (re)discover certain concepts.
After essentially exhausting its heuristics, AM got bogged down in lots of uninteresting concepts, and it could not escape from that morass.
All the reported interesting discoveries took place within the first hour of CPU time on a DEC-10. AM was written in Interlisp.
A follow-on project, “Eurisko”, tried to discover new heuristics automatically, so that programs like AM might later be able to escape from the limitations of their initial sets of heuristics.

“Pythagoras”: A simple concept-space exploration demonstration
Written by S. Tanimoto in LISP to demonstrate some of the essential mechanisms of a concept-exploration system.
Domain: 2-D polygons, including categories and examples.
Concept creation mechanism: creation of new conjunctions of existing predicates.
Interestingness computations: based on the numbers of examples found, in relation to non-examples found.

Pythagoras’ universe of possible examples

Pythagoras’ Given Predicates
equals_sides(p)
many_sides(p)
nonzero_area(p)
Each new concept is defined in terms of a conjunction of given predicates.

Pythagoras’ Concept Space
Concept C11 is polygons with equal sides and nonzero area. Examples are RHOMBUS and SQUARE.
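The mechanism can be imitated in Python (the original was written in LISP and is not reproduced here). In this sketch the predicate names come from the slides, but their definitions, the five-polygon universe, the "more than 4 sides" threshold, and the interestingness formula are all assumptions made for illustration.

```python
from itertools import combinations

def edges(p):
    """Pair each vertex with the next one, wrapping around to the first."""
    return list(zip(p, p[1:] + p[:1]))

def equals_sides(p):
    # Compare squared side lengths so integer coordinates stay exact.
    return len({(x2 - x1) ** 2 + (y2 - y1) ** 2 for (x1, y1), (x2, y2) in edges(p)}) == 1

def many_sides(p):
    return len(p) > 4  # hypothetical threshold

def nonzero_area(p):
    # Shoelace formula: this sum is twice the signed area.
    return sum(x1 * y2 - x2 * y1 for (x1, y1), (x2, y2) in edges(p)) != 0

PREDICATES = [equals_sides, many_sides, nonzero_area]

# Hypothetical universe of examples, each a list of (x, y) vertices.
UNIVERSE = {
    "SQUARE": [(0, 0), (1, 0), (1, 1), (0, 1)],
    "RHOMBUS": [(0, 0), (2, 1), (4, 0), (2, -1)],
    "RECTANGLE": [(0, 0), (2, 0), (2, 1), (0, 1)],
    "HOUSE": [(0, 0), (2, 0), (2, 2), (1, 3), (0, 2)],
    "FLAT_TRIANGLE": [(0, 0), (1, 0), (2, 0)],
}

def explore():
    """Build every conjunction of given predicates and score it."""
    concepts = {}
    for size in range(1, len(PREDICATES) + 1):
        for preds in combinations(PREDICATES, size):
            name = " & ".join(f.__name__ for f in preds)
            examples = sorted(k for k, p in UNIVERSE.items() if all(f(p) for f in preds))
            non_examples = len(UNIVERSE) - len(examples)
            # Hypothetical score: zero for vacuous or universal concepts.
            interest = len(examples) * non_examples / len(UNIVERSE) ** 2
            concepts[name] = (examples, interest)
    return concepts

concepts = explore()
print(concepts["equals_sides & nonzero_area"][0])  # ['RHOMBUS', 'SQUARE']
print(concepts["equals_sides & many_sides"][0])    # []
```

With this universe, the conjunction of equals_sides and nonzero_area picks out RHOMBUS and SQUARE, matching the examples given for concept C11.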

Pythagoras’ Obvious Limitations
• Needs an example GENERATOR (possibly with random vertices)
• Needs a rich set of predicates (perhaps involving angle computations, detection of parallel sides)
• Needs to be able to formulate conjectures (e.g., equivalence of concepts, vacuousness of concepts)
• Domain is very restricted

Other Approaches to Theory Formation
Discovery of scientific laws from numerical data: BACON (Langley, 1981) rediscovered
• Ohm’s law ( V = I R )
• Boyle’s law ( P V = k )
• Snell’s law ( n1 sin θ1 = n2 sin θ2 )
Other systems have handled structural models, e.g., Dalton (Langley, 1987), and process models (Kocabas & Langley, 1998).
Today, statistical data mining (DM) is common, and some DM techniques are used in computer-aided scientific discovery, especially in genome science.
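To give a flavor of law discovery from numerical data, the Python sketch below applies a heuristic commonly associated with BACON: if two quantities rise together, examine their ratio; if one rises as the other falls, examine their product; if the resulting term is nearly constant, propose a law. This is a simplified illustration, not BACON's actual implementation, and the data values, function names, and tolerance are made up.

```python
def nearly_constant(values, tolerance=0.01):
    """True if every value lies within a relative tolerance of the mean."""
    mean = sum(values) / len(values)
    return all(abs(v - mean) <= tolerance * abs(mean) for v in values)

def find_law(xs, ys):
    """Return ("ratio", k) if y/x is constant, ("product", k) if x*y is, else None."""
    pairs = sorted(zip(xs, ys))
    steps = list(zip(pairs, pairs[1:]))
    if all(y2 > y1 for (_, y1), (_, y2) in steps):    # y rises with x
        kind, terms = "ratio", [y / x for x, y in pairs]
    elif all(y2 < y1 for (_, y1), (_, y2) in steps):  # y falls as x rises
        kind, terms = "product", [x * y for x, y in pairs]
    else:
        return None
    if nearly_constant(terms):
        return kind, sum(terms) / len(terms)
    return None

# Synthetic measurements (made up for this example).
current = [0.5, 1.0, 2.0]
voltage = [1.0, 2.0, 4.0]
pressure = [1.0, 2.0, 4.0, 8.0]
volume = [8.0, 4.0, 2.0, 1.0]

print(find_law(current, voltage))   # ('ratio', 2.0)    i.e. V / I = R
print(find_law(pressure, volume))   # ('product', 8.0)  i.e. P V = k
print(find_law([1.0, 2.0, 3.0], [1.0, 4.0, 9.0]))  # None
```

In the last case the ratio is not constant. A fuller system could treat the ratio as a new variable and repeat the search, which is how laws with higher powers can be reached.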

Conclusions
Heuristic search is a general, powerful methodology. It’s applicable not only to problem-solving but also to concept discovery and theory formation.
The formulation of powerful sets of heuristics remains challenging.
The availability of ever faster processors in ever larger networks and with ever larger memories may lead to breakthroughs in automatic theory formation systems, particularly in domains where computers can perform experiments, such as mathematics.
