Computing as an Experimental Science or Exaggerated Formalist Rhetoric Considered Harmful

Raymond J. Mooney, Dept. of Computer Sciences, University of Texas at Austin

Philosophy and Methodology Matters • One’s beliefs about the philosophy and methodology of computer science greatly impact: • The problems on which one chooses to work. • The approach one takes to these problems. • One’s perception of the significance of results and the quality of others’ work. • One’s beliefs about the education and training of students and CS curriculum issues.

Programs as Mathematical Objects • A computer program is a formally defined mathematical object, i.e. a Turing machine. • Properties of such a mathematical object can be formally proven: • Correctness according to a formal specification. • Termination. • Time and space complexity: • Worst case. • Average case (assuming a formal specification of the input distribution).

Exaggerated Formalist Rhetoric • Since programs are formal mathematical objects, experiments and empirical analysis have no place in computer science. • Computer science is mathematics and consists of definitions, theorems, and proofs. • Without theorems, there is no rigorous science, just unprincipled hacking. • Students primarily need to be taught appropriate mathematics and how to prove theorems. • Students do not need to be taught experimental methodology appropriate for natural and social sciences.

Formal and Empirical Specifications • Some problems have clear, mathematical, formal specifications. • These lend themselves to theoretical analysis. • Some problems have “empirical” specifications that depend on physical (biological/psychological/social) phenomena that, at least currently, have no adequate mathematical formalization. • These require experimental analysis.

A Tale of Two Bugs • Formalists’ Poster Child: The Intel Pentium division bug illustrates a problem (floating-point division) that has a clear formal definition. • Experimentalists’ Poster Child: The Apple Newton’s insufficiently accurate hand-writing recognition illustrates a problem whose specification relies on a psychological phenomenon with no known formalization: human visual perception of written language.

9/3 4 A CPU is a terrible thing to waste. Only formal verification can prevent bugs.

Tino1 exon qRH Final exam 9AM Only experiments can ensure accuracy

Formalist $100K Challenge Problem! • If you believe that hand-writing recognition can be given a formal specification suitable for mathematical verification, then I strongly encourage you to write it down! • If, in my lifetime, you can formulate such a specification and use it to develop and verify hand-writing recognition software and demonstrate perfect accuracy on a standard, realistic benchmark dataset… • I will personally award you a $100,000 prize!

Other Problems with Empirical Specifications • Speech recognition. • Natural-language question answering. • Filtering spam from email. • Retrieval of documents or images for a web-search query that a human user finds relevant. • Predicting the secondary or tertiary structure of proteins from amino-acid sequences. • Lossy compression of images or movies that are still “acceptable” to human perception. • Rendering images or visualizations that humans perceive as natural or useful for solving problems.

Choosing the Right Methodology • When the problem is easily formalized, one should attempt to prove one’s algorithms and programs correct. • When the problem is empirical, one should run well-designed, controlled experiments on real data, using multiple trials, and analyze the statistical significance of results with respect to a well-defined hypothesis.

Formal and Empirical Input Distributions • Some problems have clear formal input distributions that lend themselves to theoretical average-case analysis. • Some problems have “empirical” input distributions that depend on phenomena in the physical (biological/psychological/social) world that, at least currently, have no adequate mathematical formalization.

Average-Case Analysis Examples • Formal Distribution: Time to sort a list of randomly ordered items. • Empirical Distribution: Time to run a typical user program, where program behavior can vary with respect to: • Locality of memory references • Predictability of branch outcomes • … Human-written programs for solving typical human problems exhibit regularities not present in programs randomly generated by any known statistical distribution.
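The formal-distribution case can be illustrated with a small sketch: when inputs are uniformly random permutations, theory predicts that insertion sort performs n(n-1)/4 element shifts on average (one per inversion), and a simple experiment should agree. The choice of insertion sort, the function names, and the values of `n`, `trials`, and `seed` are all assumptions made for illustration; nothing here comes from the original slides.

```python
import random
import statistics


def insertion_sort_shifts(items):
    """Sort a copy of items with insertion sort and return the number of shifts."""
    a = list(items)
    shifts = 0
    for i in range(1, len(a)):
        key = a[i]
        j = i - 1
        # Each shift removes exactly one inversion from the list.
        while j >= 0 and a[j] > key:
            a[j + 1] = a[j]
            shifts += 1
            j -= 1
        a[j + 1] = key
    return shifts


def mean_shifts(n, trials, seed=0):
    """Average shift count over uniformly random permutations of size n."""
    rng = random.Random(seed)  # fixed seed so the experiment is repeatable
    counts = []
    for _ in range(trials):
        perm = list(range(n))
        rng.shuffle(perm)
        counts.append(insertion_sort_shifts(perm))
    return statistics.mean(counts)


n = 50                                   # illustrative problem size
empirical = mean_shifts(n, trials=2000)  # measured average
theoretical = n * (n - 1) / 4            # expected inversions in a random permutation

if __name__ == "__main__":
    print(f"empirical mean shifts:   {empirical:.1f}")
    print(f"theoretical mean shifts: {theoretical:.1f}")
```

No comparable closed form is available for the empirical case: the distribution of "typical user programs" can only be sampled from real workloads, not generated by `rng.shuffle`.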

Other Empirical Problem Distributions • Typical traveling-salesman problems encountered in applications and industry. • Typical scheduling problems encountered in applications and industry. • Typical problems for automated theorem proving. • TPTP problem set

Experimental Methodology 101 • An appropriate, meaningful measure of performance: • Character error rate. • A clear hypothesis. • Method A has lower character error rate than method B on English non-cursive handwriting. • A large set of realistic benchmark data. • Millions of words of human-labeled handwritten text from a diverse set of English writers. • A clear separation of training (development) and test data. • Labeled hand-written text that the developers have never seen.
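As a minimal sketch of the performance measure named above, character error rate is commonly computed as the total edit distance between recognizer output and the human-labeled reference, divided by the total number of reference characters. That definition, and the function names `edit_distance` and `character_error_rate`, are assumptions for illustration; the slides do not specify how the rate is computed.

```python
def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance between a reference and a hypothesis string."""
    prev = list(range(len(hyp) + 1))  # distances from the empty reference prefix
    for i, r in enumerate(ref, start=1):
        curr = [i]  # cost of deleting the first i reference characters
        for j, h in enumerate(hyp, start=1):
            curr.append(min(
                prev[j] + 1,             # delete r from the reference
                curr[j - 1] + 1,         # insert h into the reference
                prev[j - 1] + (r != h),  # substitute, or match at no cost
            ))
        prev = curr
    return prev[-1]


def character_error_rate(refs, hyps):
    """Total edit distance divided by total reference length."""
    if len(refs) != len(hyps):
        raise ValueError("refs and hyps must have the same number of items")
    total_chars = sum(len(r) for r in refs)
    if total_chars == 0:
        raise ValueError("reference text is empty")
    total_errors = sum(edit_distance(r, h) for r, h in zip(refs, hyps))
    return total_errors / total_chars


if __name__ == "__main__":
    # The reference is the human label; the hypothesis is the recognizer output.
    print(character_error_rate(["Final exam 9AM"], ["Tino1 exon qRH"]))
```

The rate should be reported only on the held-out test set, so that it estimates performance on handwriting the developers have never seen.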

Experimental Methodology 101 (cont.) • A well-controlled study. • The only difference between the two conditions is the algorithm being tested (e.g. same training and test data). • Multiple trials on different independent data sets in order to measure variance. • Statistical analysis demonstrating significant difference. • Significant t-test result (p < 0.05) on the difference between the mean character error rates of A and B in order to reject the “null hypothesis” that performance difference is attributable to random variation.
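The statistical step can be sketched as follows, assuming a paired design in which methods A and B are evaluated on the same ten independent test sets. The error rates below are hypothetical placeholders, not results from any study, and the helper name `paired_t_statistic` is invented for illustration. To stay within the standard library, the sketch compares the statistic against the tabulated two-sided 5% critical value for 9 degrees of freedom (2.262) rather than computing a p-value; a statistics package such as SciPy (`scipy.stats.ttest_rel`) would normally report the p-value directly.

```python
import math
import statistics


def paired_t_statistic(errors_a, errors_b):
    """Return (t, degrees_of_freedom) for paired per-dataset error rates."""
    if len(errors_a) != len(errors_b) or len(errors_a) < 2:
        raise ValueError("need at least two paired measurements")
    diffs = [a - b for a, b in zip(errors_a, errors_b)]
    mean_diff = statistics.mean(diffs)
    sd_diff = statistics.stdev(diffs)  # sample standard deviation (n - 1)
    if sd_diff == 0:
        raise ValueError("differences have zero variance; t is undefined")
    t = mean_diff / (sd_diff / math.sqrt(len(diffs)))
    return t, len(diffs) - 1


# Hypothetical character error rates on ten shared test sets (illustration only).
cer_a = [0.081, 0.079, 0.085, 0.090, 0.077, 0.083, 0.088, 0.080, 0.086, 0.082]
cer_b = [0.092, 0.088, 0.091, 0.097, 0.085, 0.094, 0.093, 0.089, 0.095, 0.090]

T_CRIT_DF9 = 2.262  # two-sided critical value of Student's t, alpha = 0.05, df = 9

t, df = paired_t_statistic(cer_a, cer_b)
significant = df == 9 and abs(t) > T_CRIT_DF9

if __name__ == "__main__":
    print(f"t = {t:.2f}, df = {df}, reject null at 0.05: {significant}")
```

Pairing by test set keeps the study well controlled: each difference compares A and B under identical data, so variation between data sets does not mask the difference between the algorithms.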

CS as Poor Experimental Science • Generally, computer scientists’ experimental methodology is severely lacking. • “Experimental” computer science frequently means hacking up a new system and illustrating performance on a few demo problems. • “Look Ma, no hands” • “Dancing bears” • Even when quantitative results are gathered and presented, frequently there is no: • Clearly stated hypothesis that is being tested by a well-controlled experiment. • Measure of variance or statistical analysis of results.

The Poor Experimental Methodology of a Turing-Award Winner • Perhaps my own research area of machine learning has become one of the most experimentally rigorous areas in CS. • An ICML-01 paper on classifying gene-expression data co-authored by R. Karp was properly criticized during Q&A after the presentation for lacking statistical analysis of its experimental results. • This lapse by leading computer scientists was quite surprising to my first-year graduate student.

CS Education in Experimental Methods • In most natural and social sciences, experimental methodology and statistical analysis of results are specifically taught in laboratory or statistics classes. • Computer scientists receive virtually no formal training in basic experimental methodology or statistical analysis. • I had to learn it from psychologists! • I have to teach it in a CS graduate depth course! • CS curricula assume theory is the only source of rigor.

Misapplied Formalism • Sometimes researchers misapply formal methods to fundamentally empirical problems. • A particular formal specification or input distribution is assumed and analyzed. • Without evidence, this formalism is motivated by, or claimed to be relevant to, some important empirical problem. • The result is an insignificant theoretical result that has little or no bearing on the problem of interest. • For empirical problems, experimental evidence must be presented to demonstrate that a particular formalism truly characterizes the actual problem.

Beauty is NOT Our Primary Business • Frequently, striving for elegant formalism leads some computer scientists to study mathematical problems that are mere caricatures of important empirical problems. • They focus on what can be proven and ignore the complexity of the real problem. • Proving theorems about caricatures of empirical problems contributes little to either theoretical or applied computer science. • Science should focus on demonstrably solving interesting, important problems, not on formulating elegant formalisms that do not reflect reality.

Kepler vs. Keats • J. Kepler wasted years of his life trying to model planetary orbits with elegant, beautiful circles before empirical data forced him to realize that astronomical reality was more complex. • J. Keats makes nice poetry but lousy science. “Beauty is truth, truth beauty,” but: Beauty is in the eye of the beholder. Beauty is only skin deep. • In science, truth is a theory that accurately predicts relevant empirical data.

Experimental Analysis of Formal Problems • Although a problem may have a clear formal definition, theoretical analysis may currently be intractable. • Chess. • Nonlinear dynamic systems. • Cellular automata. • Random satisfiability problems. • In this case, experimental analysis may also be the best approach. • Experimentation may result in conjectures that may subsequently be proven.
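Random satisfiability is a convenient example of experimenting on a formally defined problem. The sketch below generates random 3-SAT instances and measures, by brute force, what fraction are satisfiable as the ratio of clauses to variables grows. The instance size (10 variables), the ratios, the trial count, and all function names are assumptions chosen so the experiment runs in seconds; they are not taken from the slides.

```python
import itertools
import random


def random_3sat(num_vars, num_clauses, rng):
    """Each clause has 3 distinct variables, each negated with probability 1/2."""
    clauses = []
    for _ in range(num_clauses):
        variables = rng.sample(range(num_vars), 3)
        # A literal (v, want) is satisfied when variable v takes the value want.
        clauses.append([(v, rng.random() < 0.5) for v in variables])
    return clauses


def is_satisfiable(num_vars, clauses):
    """Exhaustively try every assignment; practical only for small num_vars."""
    for assignment in itertools.product([False, True], repeat=num_vars):
        if all(any(assignment[v] == want for v, want in clause)
               for clause in clauses):
            return True
    return False


def satisfiable_fraction(num_vars, ratio, trials, seed=0):
    """Fraction of random instances at a given clause/variable ratio that are satisfiable."""
    rng = random.Random(seed)  # fixed seed so the experiment is repeatable
    num_clauses = round(ratio * num_vars)
    hits = sum(
        is_satisfiable(num_vars, random_3sat(num_vars, num_clauses, rng))
        for _ in range(trials)
    )
    return hits / trials


if __name__ == "__main__":
    for ratio in (1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 8.0):
        print(ratio, satisfiable_fraction(10, ratio, trials=20))
```

Plotting the fraction against the ratio is the kind of observation that can suggest a conjecture to be proven later, which is the role the slide assigns to experimentation here.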

Experimental Mathematics • Many conjectures in mathematics originate from empirical observations. • Fermat’s last theorem • Goldbach’s conjecture • P ≠ NP • The experimental aspects of mathematics have generally not been publicized or appreciated. • Partly due to influence from computer science, mathematics has begun to embrace its experimental side: • Experimental Mathematics journal (started 1992) (www.expmath.org)
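Goldbach’s conjecture (every even integer greater than 2 is the sum of two primes) shows how empirical evidence for a conjecture is gathered. The following is a minimal sketch that searches for counterexamples up to a small bound; the bound of 10,000 and the function names are illustrative assumptions, and a clean run is evidence, not proof.

```python
import math


def prime_flags(limit):
    """Sieve of Eratosthenes: flags[k] is True exactly when k is prime."""
    if limit < 2:
        raise ValueError("limit must be at least 2")
    flags = [True] * (limit + 1)
    flags[0] = flags[1] = False
    for p in range(2, math.isqrt(limit) + 1):
        if flags[p]:
            for multiple in range(p * p, limit + 1, p):
                flags[multiple] = False
    return flags


def goldbach_counterexamples(limit):
    """Even numbers in [4, limit] that are not a sum of two primes."""
    flags = prime_flags(limit)
    failures = []
    for n in range(4, limit + 1, 2):
        # Only p <= n/2 needs checking, since the pair (p, n - p) is symmetric.
        if not any(flags[p] and flags[n - p] for p in range(2, n // 2 + 1)):
            failures.append(n)
    return failures


if __name__ == "__main__":
    print(goldbach_counterexamples(10_000))
```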

Epistemology • Many believe that mathematical proof is a fundamentally more trustworthy source of knowledge than experimentation. • Mathematics as the “Queen of the sciences” • I believe this erroneous belief is based on a long tradition of rationalism that ignores the fact that mathematics is a human enterprise, and therefore equally based in the empirical world. • Rationalism vs. empiricism is a 2,400-year-long philosophical debate, which, apparently, continues today to impact computer science methodology.

Empirical Basis of Mathematics • All mathematical proofs rely on accepting a set of fundamental axioms without proof. • Gödel proved that even the consistency of the axioms of arithmetic cannot be proven formally from within those axioms themselves. • Newsflash! (1931) “Gödel knocks Queen from throne”. • Most humans are willing to accept these axioms based on intuitions that are based on empirical experience and/or innate pre-conceptions that have evolved to increase survival and reproduction. • These intuitions may be misleading. • Non-Euclidean geometry and General Relativity • Mathematics: The Loss of Certainty, M. Kline, 1982.

Philosophy of Mathematics • Platonism is a mystical belief in a non-material world of mathematical concepts to which humans somehow have infallible access. • I believe a much more scientifically defensible view is that mathematics is based on human psychological processing that is grounded in the material world. • I recommend the following recent books: • The Number Sense: How the Mind Creates Mathematics, S. Dehaene, 2000. • Where Mathematics Comes From: How the Embodied Mind Brings Mathematics into Being, G. Lakoff & R. Núñez, 2001. • The Math Gene: How Mathematical Thinking Evolved, K.J. Devlin, 2001.

Conclusions • In contradiction to exaggerated formalist rhetoric, experimental computer science can be well-motivated and rigorous. • Some computational problems are fundamentally empirical and properly approached using experimental methodology. • Sometimes the right thing to do is to prove a theorem, sometimes to run an experiment. • Compared to theoretical CS, rigorous experimental CS is relatively immature. • Progress in experimental CS requires changes to existing educational practice and curricula.
