Computing as an Experimental Science or Exaggerated Formalist Rhetoric Considered Harmful Raymond J. Mooney Dept. of Computer Sciences University of Texas at Austin
Philosophy and Methodology Matters • One’s beliefs about the philosophy and methodology of computer science greatly impact: • The problems on which one chooses to work. • The approach one takes to these problems. • One’s perception of the significance of results and the quality of others’ work. • One’s beliefs about the education and training of students and CS curriculum issues.
Programs as Mathematical Objects • A computer program is a formally defined mathematical object, i.e. a Turing machine. • Properties of such a mathematical object can be formally proven: • Correctness according to a formal specification. • Termination. • Time and space complexity: • Worst case. • Average case (assuming a formal specification of the input distribution).
Exaggerated Formalist Rhetoric • Since programs are formal mathematical objects, experiments and empirical analysis have no place in computer science. • Computer science is mathematics and consists of definitions, theorems, and proofs. • Without theorems, there is no rigorous science, just unprincipled hacking. • Students primarily need to be taught appropriate mathematics and how to prove theorems. • Students do not need to be taught experimental methodology appropriate for natural and social sciences.
Formal and Empirical Specifications • Some problems have clear, mathematical, formal specifications. • These lend themselves to theoretical analysis. • Some problems have “empirical” specifications that depend on physical (biological/psychological/social) phenomena that, at least currently, have no adequate mathematical formalization. • These require experimental analysis.
A Tale of Two Bugs • Formalists’ Poster Child: The Intel Pentium division bug illustrates a problem (floating-point division) that has a clear formal definition. • Experimentalists’ Poster Child: The Apple Newton’s insufficiently accurate handwriting recognition illustrates a problem whose specification relies on a psychological phenomenon with no known formalization: human visual perception of written language.
9/3 4 A CPU is a terrible thing to waste. Only formal verification can prevent bugs.
Tino1 exon qRH Final exam 9AM Only experiments can ensure accuracy
Formalist $100K Challenge Problem! • If you believe that handwriting recognition can be given a formal specification suitable for mathematical verification, then I strongly encourage you to write it down! • If, in my lifetime, you can formulate such a specification and use it to develop and verify handwriting recognition software and demonstrate perfect accuracy on a standard, realistic benchmark dataset… • I will personally award you a $100,000 prize!
Other Problems with Empirical Specifications • Speech recognition. • Natural-language question answering. • Filtering spam from email. • Retrieval of documents or images for a web-search query that a human user finds relevant. • Predicting the secondary or tertiary structure of proteins from amino-acid sequences. • Lossy compression of images or movies that are still “acceptable” to human perception. • Rendering images or visualizations that humans perceive as natural or useful for solving problems.
Choosing the Right Methodology • When the problem is easily formalized, one should attempt to prove one’s algorithms and programs correct. • When the problem is empirical, one should run well-designed, controlled experiments on real data, using multiple trials, and analyze the statistical significance of results with respect to a well-defined hypothesis.
Formal and Empirical Input Distributions • Some problems have clear formal input distributions that lend themselves to theoretical average-case analysis. • Some problems have “empirical” input distributions that depend on phenomena in the physical (biological/psychological/social) world that, at least currently, have no adequate mathematical formalization.
Average-Case Analysis Examples • Formal Distribution: Time to sort a list of randomly ordered items. • Empirical Distribution: Time to run a typical user program, where program behavior can vary with respect to: • Locality of memory references • Predictability of branch outcomes • … Human-written programs for solving typical human problems exhibit regularities not present in programs randomly generated by any known statistical distribution.
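The contrast between a formal input distribution and inputs with real-world regularities can be illustrated with a small experiment. The following is a minimal sketch, not something taken from the slides: it assumes a naive first-element-pivot quicksort as the algorithm under study, and it uses already-sorted lists as a crude stand-in for inputs that exhibit regularities. The function names and parameters are illustrative.

```python
import random


def quicksort_comparisons(items):
    """Count element-to-pivot comparisons made by a naive quicksort
    that always picks the first element as the pivot."""
    stack = [list(items)]
    comparisons = 0
    while stack:
        part = stack.pop()
        if len(part) < 2:
            continue
        pivot, rest = part[0], part[1:]
        comparisons += len(rest)  # every remaining item is compared to the pivot
        stack.append([x for x in rest if x < pivot])
        stack.append([x for x in rest if x >= pivot])
    return comparisons


def mean_comparisons_random(n, trials, rng):
    """Average comparison count over uniformly shuffled lists of length n
    (the 'formal' distribution from the slide)."""
    total = 0
    for _ in range(trials):
        data = list(range(n))
        rng.shuffle(data)
        total += quicksort_comparisons(data)
    return total / trials


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the experiment is repeatable
    n = 200
    print("random order :", mean_comparisons_random(n, trials=100, rng=rng))
    # Already-sorted input: a simple example of a non-random regularity.
    print("sorted order :", quicksort_comparisons(range(n)))
```

Average-case analysis under the uniform-shuffle assumption says little about the sorted case, which is why the distribution actually encountered has to be measured rather than assumed.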
Other Empirical Problem Distributions • Typical traveling-salesman problems encountered in applications and industry. • Typical scheduling problems encountered in applications and industry. • Typical problems for automated theorem proving. • TPTP problem set
Experimental Methodology 101 • An appropriate, meaningful measure of performance: • Character error rate. • A clear hypothesis. • Method A has lower character error rate than method B on English non-cursive handwriting. • A large set of realistic benchmark data. • Millions of words of human-labeled handwritten text from a diverse set of English writers. • A clear separation of training (development) and test data. • Labeled handwritten text that the developers have never seen.
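The performance measure named above, character error rate, can be made concrete with a short sketch. It assumes the common definition (character-level edit distance between the recognizer output and the human-labeled reference, divided by the total reference length); the slides do not give a formula, and the function names are illustrative.

```python
def edit_distance(reference, hypothesis):
    """Levenshtein distance: minimum number of character insertions,
    deletions, and substitutions turning `reference` into `hypothesis`."""
    prev = list(range(len(hypothesis) + 1))
    for i, ref_char in enumerate(reference, 1):
        curr = [i]
        for j, hyp_char in enumerate(hypothesis, 1):
            substitution = prev[j - 1] + (0 if ref_char == hyp_char else 1)
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, substitution))
        prev = curr
    return prev[-1]


def character_error_rate(references, hypotheses):
    """Total edit distance over all pairs divided by total reference length."""
    total_chars = sum(len(r) for r in references)
    if total_chars == 0:
        raise ValueError("references must contain at least one character")
    errors = sum(edit_distance(r, h) for r, h in zip(references, hypotheses))
    return errors / total_chars


if __name__ == "__main__":
    # The Newton example from the earlier slide: intended text vs. recognized text.
    print(character_error_rate(["Final exam 9AM"], ["Tino1 exon qRH"]))
```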
Experimental Methodology 101 (cont.) • A well-controlled study. • The only difference between the two conditions is the algorithm being tested (e.g. same training and test data). • Multiple trials on different independent data sets in order to measure variance. • Statistical analysis demonstrating significant difference. • Significant t-test result (p < 0.05) on the difference between the mean character error rates of A and B in order to reject the “null hypothesis” that performance difference is attributable to random variation.
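The significance test described above might be sketched as follows. This assumes a paired t-test, which fits the controlled design in which A and B are run on the same data sets; the slide says only "t-test". The error rates below are hypothetical numbers invented for illustration, not results from any study, and the critical value 2.262 is the standard two-sided table value for p = 0.05 with 9 degrees of freedom.

```python
import math
import statistics

# Two-sided critical value of Student's t for alpha = 0.05 and 9 degrees of
# freedom (10 paired trials), from a standard t table.
T_CRITICAL_DF9 = 2.262


def paired_t_statistic(errors_a, errors_b):
    """t statistic for the mean of per-trial differences (A minus B)."""
    if len(errors_a) != len(errors_b) or len(errors_a) < 2:
        raise ValueError("need two equal-length samples with at least 2 trials")
    diffs = [a - b for a, b in zip(errors_a, errors_b)]
    std_err = statistics.stdev(diffs) / math.sqrt(len(diffs))
    if std_err == 0:
        raise ValueError("differences have zero variance; t is undefined")
    return statistics.mean(diffs) / std_err


if __name__ == "__main__":
    # HYPOTHETICAL character error rates on 10 independent data sets.
    errors_a = [0.081, 0.079, 0.090, 0.085, 0.077, 0.083, 0.088, 0.080, 0.086, 0.082]
    errors_b = [0.089, 0.084, 0.093, 0.091, 0.080, 0.090, 0.092, 0.083, 0.094, 0.087]
    t = paired_t_statistic(errors_a, errors_b)
    # Reject the null hypothesis only if |t| exceeds the critical value.
    print("t =", t, "significant at p < 0.05:", abs(t) > T_CRITICAL_DF9)
```

Pairing by data set removes variation between data sets from the comparison, which is the point of holding everything but the algorithm fixed.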
CS as Poor Experimental Science • Generally, computer scientists’ experimental methodology is severely lacking. • “Experimental” computer science frequently means hacking up a new system and illustrating performance on a few demo problems. • “Look Ma, no hands” • “Dancing bears” • Even when quantitative results are gathered and presented, frequently there is no: • Clearly stated hypothesis that is being tested by a well-controlled experiment. • Measure of variance or statistical analysis of results.
The Poor Experimental Methodology of a Turing-Award Winner • Perhaps my own research area of machine learning has become one of the most experimentally rigorous areas in CS. • An ICML-01 paper on classifying gene-expression data co-authored by R. Karp was properly criticized during Q&A after the presentation for lacking statistical analysis of its experimental results. • This lapse by leading computer scientists was quite surprising to my first-year graduate student.
CS Education in Experimental Methods • In most natural and social sciences, experimental methodology and statistical analysis of results are specifically taught in laboratory or statistics classes. • Computer scientists receive virtually no formal training in basic experimental methodology or statistical analysis. • I had to learn it from psychologists! • I have to teach it in a CS graduate depth course! • CS curricula assume theory is the only source of rigor.
Misapplied Formalism • Sometimes researchers misapply formal methods to fundamentally empirical problems. • A particular formal specification or input distribution is assumed and analyzed. • Without evidence, this formalism is motivated by, or claimed to be relevant to, some important empirical problem. • The result is an insignificant theoretical result that has little or no bearing on the problem of interest. • For empirical problems, experimental evidence must be presented to demonstrate that a particular formalism truly characterizes the actual problem.
Beauty is NOT Our Primary Business • Frequently, striving for elegant formalism leads some computer scientists to study mathematical problems that are mere caricatures of important empirical problems. • They focus on what can be proven and ignore the complexity of the real problem. • Proving theorems about caricatures of empirical problems contributes little to either theoretical or applied computer science. • Science should focus on demonstrably solving interesting, important problems, not on formulating elegant formalisms that do not reflect reality.
Kepler vs. Keats • J. Kepler wasted years of his life trying to model planetary orbits with elegant, beautiful circles before empirical data forced him to realize that astronomical reality was more complex. • J. Keats makes nice poetry but lousy science. “Beauty is truth, truth beauty.” but “Beauty is in the eye of the beholder.” “Beauty is only skin deep.” • In science, truth is a theory that accurately predicts relevant empirical data.
Experimental Analysis of Formal Problems • Although a problem may have a clear formal definition, theoretical analysis may currently be intractable. • Chess. • Nonlinear dynamic systems. • Cellular automata. • Random satisfiability problems. • In this case, experimental analysis may also be the best approach. • Experimentation may result in conjectures that may subsequently be proven.
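As an illustration of experimenting on a formally defined problem, the sketch below estimates how often random 3-SAT instances are satisfiable as the ratio of clauses to variables grows. It is only a toy: it assumes uniformly random clauses over three distinct variables with random signs, and it checks satisfiability by brute force, which is feasible only for a handful of variables. All names and parameter values are illustrative.

```python
import itertools
import random


def random_3sat(num_vars, num_clauses, rng):
    """Random 3-SAT instance: each clause has 3 distinct variables, and each
    literal is a (variable index, required truth value) pair."""
    clauses = []
    for _ in range(num_clauses):
        variables = rng.sample(range(num_vars), 3)
        clauses.append([(v, rng.random() < 0.5) for v in variables])
    return clauses


def is_satisfiable(num_vars, clauses):
    """Brute-force check over all 2**num_vars truth assignments."""
    for assignment in itertools.product([False, True], repeat=num_vars):
        if all(any(assignment[v] == want for v, want in clause) for clause in clauses):
            return True
    return False


def fraction_satisfiable(num_vars, ratio, trials, rng):
    """Fraction of random instances with round(ratio * num_vars) clauses
    that turn out to be satisfiable."""
    num_clauses = round(ratio * num_vars)
    satisfied = sum(
        is_satisfiable(num_vars, random_3sat(num_vars, num_clauses, rng))
        for _ in range(trials)
    )
    return satisfied / trials


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed for repeatability
    for ratio in (2.0, 3.0, 4.0, 5.0, 6.0):
        print(ratio, fraction_satisfiable(num_vars=10, ratio=ratio, trials=50, rng=rng))
```

Curves like this one are the kind of observation that can suggest a conjecture worth trying to prove.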
Experimental Mathematics • Many conjectures in mathematics originate from empirical observations. • Fermat’s last theorem • Goldbach’s conjecture • P ≠ NP • The experimental aspects of mathematics have generally not been publicized or appreciated. • Partly due to influence from computer science, mathematics has begun to embrace its experimental side: • Experimental Mathematics journal (started 1992) (www.expmath.org)
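Goldbach’s conjecture (every even integer greater than 2 is the sum of two primes) is a convenient example of the empirical side of mathematics: it is easy to test by machine even though no proof is known. A minimal sketch, with illustrative function names and an arbitrary search limit:

```python
def prime_flags(limit):
    """Sieve of Eratosthenes: flags[k] is True exactly when k is prime."""
    flags = [True] * (limit + 1)
    flags[0] = False
    if limit >= 1:
        flags[1] = False
    for p in range(2, int(limit ** 0.5) + 1):
        if flags[p]:
            for multiple in range(p * p, limit + 1, p):
                flags[multiple] = False
    return flags


def goldbach_witness(n, flags):
    """Return primes (p, q) with p <= q and p + q == n, or None if none exist."""
    for p in range(2, n // 2 + 1):
        if flags[p] and flags[n - p]:
            return (p, n - p)
    return None


def first_counterexample(limit):
    """Smallest even n in [4, limit] with no witness, or None if all have one."""
    flags = prime_flags(limit)
    for n in range(4, limit + 1, 2):
        if goldbach_witness(n, flags) is None:
            return n
    return None


if __name__ == "__main__":
    # Finding no counterexample below the limit is evidence, not proof.
    print(first_counterexample(100_000))
```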
Epistemology • Many believe that mathematical proof is a fundamentally more trustworthy source of knowledge than experimentation. • Mathematics as the “Queen of the sciences” • I believe this erroneous belief is based on a long tradition of rationalism that ignores the fact that mathematics is a human enterprise, and therefore equally based in the empirical world. • Rationalism vs. empiricism is a 2,400-year-long philosophical debate, which, apparently, continues today to impact computer science methodology.
Empirical Basis of Mathematics • All mathematical proofs rely on accepting a set of fundamental axioms without proof. • Gödel proved that even the consistency of the axioms of arithmetic cannot be proven within arithmetic itself. • Newsflash! (1931) “Gödel knocks Queen from throne”. • Most humans are willing to accept these axioms based on intuitions that are based on empirical experience and/or innate pre-conceptions that have evolved to increase survival and reproduction. • These intuitions may be misleading. • Non-Euclidean geometry and General Relativity • Mathematics: The Loss of Certainty, M. Kline, 1982.
Philosophy of Mathematics • Platonism is a mystical belief in a non-material world of mathematical concepts to which humans somehow have infallible access. • I believe a much more scientifically defensible view is that mathematics is based on human psychological processing that is grounded in the material world. • I recommend the following recent books: • The Number Sense: How the Mind Creates Mathematics, S. Dehaene, 2000. • Where Mathematics Comes From: How the Embodied Mind Brings Mathematics into Being, G. Lakoff & R. Núñez, 2001. • The Math Gene: How Mathematical Thinking Evolved, K.J. Devlin, 2001.
Conclusions • In contradiction to exaggerated formalist rhetoric, experimental computer science can be well-motivated and rigorous. • Some computational problems are fundamentally empirical and properly approached using experimental methodology. • Sometimes the right thing to do is to prove a theorem, sometimes to run an experiment. • Compared to theoretical CS, rigorous experimental CS is relatively immature. • Progress in experimental CS requires changes to existing educational practice and curricula.