
Presentation Transcript


  1. Experiment Databases: Towards better experimental research in machine learning and data mining
  Hendrik Blockeel, Katholieke Universiteit Leuven

  2. Motivation
  • Much research in ML / DM involves experimental evaluation
  • Interpreting results is more difficult than it may seem
    • Typically, a few specific implementations of algorithms, with specific parameter settings, are compared on a few datasets, and then general conclusions are drawn
    • How generalizable are these results really?
  • Evidence exists that overly general conclusions are often drawn
    • E.g., Perlich & Provost: the relative performance of techniques differs depending on the size of the dataset

  3. Very sparse evidence
  [Figure: a handful of experiment points (x) scattered over the plane spanned by the algorithm parameter space (AP) and the dataset space (DS)]
  • A few points in an N-dimensional space, where N is very large: very sparse evidence!

  4. An improved methodology
  • We here argue in favour of an improved experimental methodology:
    • Perform many more experiments
      • Better coverage of the algorithm – dataset space
    • Store results in an “experiment database”
      • Better reproducibility
    • Mine that database for patterns
      • More advanced analysis possible
  • The approach shares characteristics of inductive databases:
    • The database will be mined for specific kinds of patterns: inductive queries, constraint-based mining

  5. Classical setup of experiments
  • Currently, performance evaluations of algorithms rely on few specific instantiations of algorithms (implementations, parameters), tested on few datasets (with specific properties), often focusing on specific evaluation criteria, and with a specific research question in mind
  • Disadvantages:
    • Limited generalisability (see before)
    • Limited reusability of experiments
      • If we want to test another hypothesis, we need to run new experiments, with a different setup, recording different information

  6. Setup of an experiment database
  • The ExpDB is filled with results from random instantiations of algorithms, on random datasets
  • Algorithm parameters and dataset properties are recorded
  • Performance criteria are measured and stored
  • These experiments cover the whole DS x AP space
  [Diagram: choose algorithm (e.g., CART, C4.5, Ripper, ...) → choose parameters (e.g., leaf size > 2, heuristic = gain, ...) → generate dataset (e.g., #examples = 1000, #attr = 20, ...) → run → store algorithm parameters, dataset properties, and results]

  7. Setup of an experiment database
  • When experimenting with 1 learner, e.g., C4.5:

    Algorithm parameters | Dataset characteristics  | Performance
    MLS   heur   ...     | Ex     Attr   Compl  ... | TP    FP   RT   ...
    2     gain   ...     | 1000   20     17     ... | 350   65   17   ...
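
  As a concrete illustration (not from the slides): a minimal sketch of how such a single-learner table could be filled, assuming scikit-learn decision trees as the learner, synthetic datasets, and SQLite as the database. The table name ExpDB, the column names (a subset of those above), and all parameter and dataset ranges are illustrative choices.

    # Fill an illustrative ExpDB table with random decision-tree experiments.
    import random, sqlite3, time
    from sklearn.datasets import make_classification
    from sklearn.model_selection import train_test_split
    from sklearn.tree import DecisionTreeClassifier
    from sklearn.metrics import confusion_matrix

    con = sqlite3.connect("expdb.sqlite")
    con.execute("""CREATE TABLE IF NOT EXISTS ExpDB (
        MLS INTEGER, heur TEXT,          -- algorithm parameters
        Ex INTEGER, Attr INTEGER,        -- dataset characteristics
        TP INTEGER, FP INTEGER, RT REAL  -- performance measures
    )""")

    for _ in range(1000):                               # many random experiments
        mls = random.choice([1, 2, 5, 10, 50])          # choose algorithm parameters
        heur = random.choice(["gini", "entropy"])
        n_ex = random.choice([500, 1000, 5000])         # choose dataset properties
        n_attr = random.choice([10, 20, 50])
        X, y = make_classification(n_samples=n_ex, n_features=n_attr)  # generate dataset
        Xtr, Xte, ytr, yte = train_test_split(X, y, test_size=0.3)
        t0 = time.perf_counter()
        model = DecisionTreeClassifier(min_samples_leaf=mls, criterion=heur).fit(Xtr, ytr)
        rt = time.perf_counter() - t0                   # runtime of the learner
        tn, fp, fn, tp = confusion_matrix(yte, model.predict(Xte)).ravel()
        con.execute("INSERT INTO ExpDB VALUES (?, ?, ?, ?, ?, ?, ?)",
                    (mls, heur, n_ex, n_attr, int(tp), int(fp), rt))
    con.commit()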

  8. Setup of an experiment database
  • When experimenting with multiple learners:
    • More complicated setting, will not be considered here

    ExpDB:
    Alg.  Inst.  PI     Ex     Attr  Compl  ...  TP    FP  RT  ...
    DT    C4.5   C45-1  1000   20    17     ...  1000  20  17  ...
    DT    CART   CA-1   2000   50    12     ...  1000  20  17  ...

    C4.5ParInst:
    PI     MLS  heur  ...
    C45-1  2    gain  ...

    CART-ParInst:
    PI    BS   heur  ...
    CA-1  yes  Gini  ...

  9. Experimental questions and hypotheses
  • Example questions:
    • What is the effect of Parameter X on runtime?
    • What is the effect of the number of examples in the dataset on TP and FP?
    • ...
  • With the classical methodology:
    • Different sets of experiments are needed for each question
    • (Unless all questions are known in advance, and the experiments are designed to answer all of them)
  • ExpDB approach:
    • Just query the ExpDB table for the answer (for example, see the sketch below)
    • New question = 1 new query, not new experiments
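
  For instance (an editorial sketch, reusing the illustrative SQLite table from the earlier sketch), the second example question becomes a single new query:

    # New research question = one new query, not new experiments.
    import sqlite3

    con = sqlite3.connect("expdb.sqlite")
    rows = con.execute(
        "SELECT Ex, AVG(TP), AVG(FP) FROM ExpDB GROUP BY Ex ORDER BY Ex").fetchall()
    for n_examples, avg_tp, avg_fp in rows:
        print(f"{n_examples:>6} examples: avg TP = {avg_tp:.1f}, avg FP = {avg_fp:.1f}")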

  10. Inductive querying
  • To find the right patterns in the ExpDB, we need a suitable query language
  • Many queries can be answered with standard SQL, but (probably) not all (easily)
  • We illustrate this with some simple examples

  11. Investigating a simple effect
  • The effect of #Items on Runtime for frequent itemset algorithms

    SELECT NItems, Runtime
    FROM ExpDB
    ORDER BY NItems

    SELECT NItems, AVG(Runtime)
    FROM ExpDB
    GROUP BY NItems
    ORDER BY NItems

  [Plot: Runtime (y-axis) versus NItems (x-axis); individual experiments as scattered points, with the second query giving the average runtime per NItems]
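
  As an editorial sketch of how such a plot could be produced, assuming an ExpDB of frequent-itemset experiments stored in SQLite with columns NItems and Runtime (the names come from the queries above; everything else is an assumption):

    # Plot individual runtimes and the average runtime per NItems.
    import sqlite3
    import matplotlib.pyplot as plt

    con = sqlite3.connect("expdb.sqlite")
    raw = con.execute(
        "SELECT NItems, Runtime FROM ExpDB ORDER BY NItems").fetchall()
    avg = con.execute(
        "SELECT NItems, AVG(Runtime) FROM ExpDB GROUP BY NItems ORDER BY NItems").fetchall()

    plt.scatter([r[0] for r in raw], [r[1] for r in raw], marker="x", label="individual runs")
    plt.plot([r[0] for r in avg], [r[1] for r in avg], label="average per NItems")
    plt.xlabel("NItems")
    plt.ylabel("Runtime")
    plt.legend()
    plt.show()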

  12. Investigating a simple effect
  • Note:
    • Setting all parameters randomly creates more variance in the results
    • In the classical approach, these other parameters would simply be kept constant
      • This leads to clearer, but possibly less generalisable, results
    • This can be simulated easily in the ExpDB setting:

        SELECT NItems, Runtime
        FROM ExpDB
        WHERE MinSupport = 0.05
        ORDER BY NItems

      • + : the condition is explicit in the query
      • - : we use only a part of the ExpDB
        • So, the ExpDB needs to contain many experiments

  13. Investigating interaction of effects
  • E.g., does the effect of NItems on Runtime change with MinSupport and NTrans?

    FOR a = 0.01, 0.02, 0.05, 0.1 DO
      FOR b = 10^3, 10^4, 10^5, 10^6, 10^7 DO
        PLOT SELECT NItems, Runtime
             FROM ExpDB
             WHERE MinSupport = $a AND $b <= NTrans < 10*$b
             ORDER BY NItems
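
  The loop above is pseudo-code; here is an editorial sketch of the same analysis with parameterized SQL queries and one subplot per (MinSupport, NTrans range) cell. Column names are taken from the query; all other details are assumptions.

    # One Runtime-vs-NItems curve per combination of MinSupport and NTrans range.
    import sqlite3
    import matplotlib.pyplot as plt

    con = sqlite3.connect("expdb.sqlite")
    supports = [0.01, 0.02, 0.05, 0.1]
    sizes = [10**3, 10**4, 10**5, 10**6, 10**7]

    fig, axes = plt.subplots(len(supports), len(sizes), sharex=True, sharey=True)
    for i, a in enumerate(supports):
        for j, b in enumerate(sizes):
            rows = con.execute(
                "SELECT NItems, AVG(Runtime) FROM ExpDB "
                "WHERE MinSupport = ? AND NTrans >= ? AND NTrans < ? "
                "GROUP BY NItems ORDER BY NItems",
                (a, b, 10 * b)).fetchall()
            axes[i][j].plot([r[0] for r in rows], [r[1] for r in rows])
            axes[i][j].set_title(f"sup={a}, NTrans>={b:g}", fontsize=6)
    plt.show()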

  14. Direct questions instead of repeated hypothesis testing (“true” data mining)
  • What is the algorithm parameter that has the strongest influence on the runtime of my decision tree learner?

    SELECT ParName, Var(A)/Avg(V) AS Effect
    FROM AlgorithmParameters,
         (SELECT $ParName, Var(Runtime) AS V, Avg(Runtime) AS A
          FROM ExpDB
          GROUP BY $ParName)
    GROUP BY ParName
    ORDER BY Effect

  • Not (easily) expressible in standard SQL! (Pivoting is possible by hardcoding all attribute names in the query, but that is not very readable or reusable.)
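
  Procedurally, though, the question is straightforward. An editorial sketch using the illustrative decision-tree table from earlier (parameter columns MLS and heur, runtime column RT), computing the slide's effect measure Var(per-value average runtime) / Avg(per-value runtime variance) for each parameter:

    # Which algorithm parameter has the strongest influence on runtime?
    import sqlite3
    from collections import defaultdict
    from statistics import mean, pvariance

    con = sqlite3.connect("expdb.sqlite")
    param_columns = ["MLS", "heur"]        # the algorithm-parameter columns

    effects = {}
    for col in param_columns:
        groups = defaultdict(list)         # runtime values grouped by parameter value
        for value, runtime in con.execute(f"SELECT {col}, RT FROM ExpDB"):
            groups[value].append(runtime)
        group_means = [mean(g) for g in groups.values()]
        group_vars = [pvariance(g) for g in groups.values()]
        # Effect = Var(A) / Avg(V), as in the query above
        effects[col] = pvariance(group_means) / mean(group_vars)

    for name, effect in sorted(effects.items(), key=lambda kv: -kv[1]):
        print(f"{name}: effect = {effect:.3f}")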

  15. A comparison
  • Classical approach:
    1) Experiments are goal-oriented
    2) Experiments seem more convincing than they are
    3) Need to do new experiments when new research questions pop up
    4) Conditions under which results are valid are unclear
    5) Relatively simple analysis of results
    6) Mostly repeated hypothesis testing, rather than direct questions
    7) Low reusability and reproducibility
  • ExpDB approach:
    1) Experiments are general-purpose
    2) Experiments seem as convincing as they are
    3) No new experiments needed when new research questions pop up
    4) Conditions under which results are valid are explicit in the query
    5) Sophisticated analysis of results possible
    6) Direct questions possible, given suitable inductive query languages
    7) Better reusability and reproducibility

  16. Summary
  • The ExpDB approach:
    • Is more efficient
      • The same set of experiments is reusable and reused
    • Is more precise and trustworthy
      • Conditions under which the conclusions hold are explicitly stated
    • Yields better documented experiments
      • Precise information on all experiments is kept; experiments are reproducible
    • Allows more sophisticated analysis of results
      • Interaction of effects, true data mining capacity
  • Note: interesting for meta-learning!

  17. The challenges... (*)
  • Good dataset generators are necessary
    • Generating truly varying datasets is not easy
    • Could start from real-life datasets and build variations (see the sketch below)
  • Extensive descriptions of datasets and algorithms
    • Vary as many possibly relevant properties as possible
  • Database schema for a multi-algorithm ExpDB
  • Suitable inductive query languages

  (*) Note: even without solving all these problems, some improvement over the current situation is feasible and easy to achieve
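
  On the first challenge, one cheap way to obtain varying datasets is to perturb real-life ones, as suggested above. An editorial sketch; the particular perturbations (row/column subsampling, label noise) and all knobs are illustrative assumptions, not from the slides.

    # Build a random variation of a dataset: subsample rows and columns, add label noise.
    import numpy as np

    def make_variation(X, y, rng, row_frac=0.7, col_frac=0.7, label_noise=0.05):
        """Return a randomly perturbed variant of (X, y)."""
        rows = rng.choice(len(X), size=int(row_frac * len(X)), replace=False)
        cols = rng.choice(X.shape[1], size=max(1, int(col_frac * X.shape[1])), replace=False)
        Xv, yv = X[np.ix_(rows, cols)], y[rows].copy()
        flip = rng.random(len(yv)) < label_noise     # replace a small fraction of labels
        yv[flip] = rng.permutation(yv)[flip]         # ...by randomly drawn existing labels
        return Xv, yv

    # Usage with a stand-in for a real-life dataset:
    rng = np.random.default_rng(0)
    X, y = rng.random((1000, 20)), rng.integers(0, 2, 1000)
    Xv, yv = make_variation(X, y, rng)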
