ARTIFICIAL INTELLIGENCE AND SCIENTIFIC DISCOVERY

ARTIFICIAL INTELLIGENCE AND SCIENTIFIC DISCOVERY Luigi Abruzzese, Luciano Bonvissuto, Giuseppe Carluccio, Mario Ceresa, Michele Garbugli, Davide Lo Pinto, Luca Di Rienzo Residenza Universitaria Torrescalla Milano -Italy

INTRODUCTION Scientific discovery is one of the most characterizing activity of the human mind Can computers emulate human mind in scientific discovery? Or are computers only a strong help in this activity?

Artificial Intelligence Theoretical foundations Methodologies Techniques

Artificial Intelligence Software Hardware Performances of human mind

Philosophical background MODEL ABDUCTION INDUCTION DEDUCTION PHENOMENON ADDUCTION LAW

Historical analysis The early days of artificial intelligence saw various attempts to automate creative tasks of scientific and mathematical inference. Perhaps the earliest examples (on electronic computers) of symbolic mathematical or scientific inference were master’s theses at MIT (J.F. Nolan) and at Temple (H. G. Kahrimanian) in 1953 on analytical differentiation in the calculus. Starting in the 1960’s, Lederberg invented an algorythm for generating molecular structures efficiently, which led to the Stanford Dendral project whose goal was to elucidate molecular structure on the basis of mass spectograms and other experimental evidence.

Space-state search

BACON DISCOVERS KEPLER’S THIRD LAW: The squares of the periods of planets are proportional to the cubes of the mean radii of their orbits: P^2 / D^3.

BACON • AI program developed in 1970s. • BACON is provided with knowledge of certain mathematical relationships. • It carries out a search through the space of possible compositions of those relationships

Heuristics of BACON DmPn =constant

In conclusion: • BACON does not know what it has discovered. It is BACON creators who comprehend the significance of the discovery. So who is the real discoverer, human or machine?

Mathematical definitions and demonstrations Computer (Artificial Intelligence)

The Four Color Theorem

First demonstration 1922 Franklin max 25 regions Reducibility and discharging Heesch Appel e Haken 1476 particular cases 1200 hours processing

a not rigorous demonstration The demonstration couldn’t be verified by a human brain Proving something to the people would mean persuading a sufficient number of qualified people. If we accept this kind of definition, in the future it will be possible that calculators will help men in the discovery of the new laws of math

FROM AUTOMATED DISCOVERY PROGRAMS TO COMPUTATIONAL SCIENTIFIC DISCOVERY SYSTEMS • Up to the 80’s automated discovery programs discovered laws already known • The state space approach leads to the explosion of possibilities • Only the scientists can introduce heuristics that can limit the number of possible states • Who really makes the discovery : man or machines?

RECENT RESEARCH • The idea of totally automated discoveries is abandoned • The new trend is towards computer supported scientific discovery • The new goal is to obtain really new discoveries, that can be published on specialized literature

MACHINE LEARNING Different kinds of machine learning : • Supervised learning • Unsupervised learning • Reinforcement learning

SUPERVISED LEARNING NEURAL NETS • Ensemble of elemental units combined in a reticular structure • Net elements, called neurons, are organized in layers and are tightly interconnected • To each link is associated a weight that represents a kind of inner knowledge MAIN FEATURES • Learning • Prediction TRAINING • Training-set of examples (as input/output) • Learning Algorithm • Weights “calibration” GOAL • Generalization of training results based on test-set ( prediction ) Formal neuron structure TRAINING PHASE

NEURAL NETS The endeavour of emulating the real neural nets leads to conceive various kinds of nets that can be classified according to some parameters : • Use • Learning algorithms • Links structure The most known models are : • Feedforward nets (with back-propagation algorithm,most used) • Associative nets • Stochastic nets • Self-organizing nets (Kohonen, also unsupervised) • Genetic nets ( models from Darwin evolution theory)

UNSUPERVISED LEARNING DATA MINING Process of exploration and analysis ,with automatic and semi-automatic tools, of vast amount of data, oriented to discover significant structures and rules and to develop predictive or explicative models of a specific phenomenon. We have many techniques of data mining : Decisional trees , data warehouse , clustering , associative rules and temporal sequences......

CLUSTERING Cluster: Objects/data collection – Similar compared with each object in the same cluster – Different from other clusters objects CLUSTERING ANALYSIS :To group objects together in cluster • Clustering is defined as unsupervised classification: It doesn’t use any background knowledge on studied data set TIPICAL APPLICATIONS • As stand-alone tool to try to understand how data are distributed (for ex. in genic expression data analysis,astronomic data elaboration....) • As preprocessing pass for other algorithms

REINFORCEMENT LEARNING • System acts directly on problem making attempts • A teacher “rewards” or “punishes” the system through a numerical signal of reinforce , depending on system instant behaviour Rewards and punishments

An example : GOLEM( built by Muggleton and Feng, 1992 ) PROBLEM : Prediction of secondary structure of protein from the _________ sequence of amino acids. • A traditional method used for discovering the secondary structure is X-raycrystallography , but a crystal structure determination may require one or more man-year. • In general, other techniques also used for this problem are costly,time-consuming and often limitated by some proteins parameters (like size ...). • From this the need of computational systems support.

GOLEM : problem description • The two main substructures of proteins are : • α – helix structure • β – filamants structure • GOLEM : • Restricts the field of his analysis to • α– helix proteins • Attempts to predict, from primary _structure, if a particular residue (amino acid) belongs or not to the α– helix_type. β – filaments structure _ α – helix structure

GOLEM FUNCTIONING (1) TRAINING SET : 12 proteins ,non homologous, with well known structures (LEARNING) of α– helix type , comprising 1612 residues. + BACKGROUND KNOWLEDGE = SMALL SET OF RULES used for predicting which residues belong to α–helix _ proteins. TEST SET : 4 proteins (structure known) , α–helix type, comprising 416 _ __ residues ACCURACY(on test set): 81% ( ±2 )

GOLEM FUNCTIONING (2) • Information coded in 1 or 2 parts predicates Ex: α(155C,105) means that a particular protein (155c) residue (in 105 ___ position) is a α–helix type . • Preferential research toward residues that show particular links characters with the others (data mining) • Research of rules carried out with an iterative procedure that involves a bootstrapping learning process. • Then the rules generated by GOLEM can be considered hypothesis about the ways through which α–helix form in nature.They define the pattern of relations that ,if present in the sequence of residues, indicates that a specific residue could be part of a α–helix .

GOLEM RESULTS • One of the rules produced by GOLEM ,concerning protein structure is for example the RULE 12 : There is a α–helix residue in the protein A in position B if : 1 – The residue in B -2 is not proline 2 – The residue in B -1 is neither aromatic nor proline 3 – The residue in B is big, neither aromatic and nor lysine 4 – The residue in B +1 is hydrophobic and not lysine 5 – The residue in B +2 is neither aromatic nor proline 6 – The residue in B +3 is neither aromatic nor proline, and or small or polar and, 7 – The residue in B +4 is hydrophobic and not lysine This rule has an ACCURACY of 95% in training and of 81% on test set. This rule was not known before GOLEM discovered it and it has contributed to one of the most important actual problem of natural sciences. That’s why we can credit to GOLEM the discover of a natural law.

CONCLUSION We have presented some attempts of creating artificial intelligence programs that make scientific discoveries From the historical analysis we have shown that the first idea of A.I programs that autonomously discover scientific laws has been abandoned The new trend is that of computer supported scientific discovery in which Artificial Intelligence is a useful and sometimes necessary tool for scientific research

ARTIFICIAL INTELLIGENCE AND SCIENTIFIC DISCOVERY

ARTIFICIAL INTELLIGENCE AND SCIENTIFIC DISCOVERY

Presentation Transcript

Artificial Intelligence

Artificial Intelligence

Artificial Intelligence

Artificial Intelligence

Artificial Intelligence

Scientific Discovery

Artificial Intelligence

Artificial Intelligence

Artificial Intelligence

Artificial Intelligence

Artificial Intelligence

Artificial Intelligence

Artificial Intelligence

Artificial Intelligence

Artificial Intelligence

Artificial Intelligence

SCIENTIFIC DISCOVERY

Artificial Intelligence

Artificial life and artificial intelligence

Artificial Intelligence

Artificial Intelligence

ARTIFICIAL INTELLIGENCE