
Contributions to MiningMart

Petr Berka, Laboratory for Intelligent Systems (LISp), University of Economics, Prague, berka@vse.cz



  1. Contributions to MiningMart
     Petr Berka
     Laboratory for Intelligent Systems
     University of Economics, Prague
     berka@vse.cz

  2. University of Economics, Prague
     • LISp - Laboratory for Intelligent Systems
     • SALOME - Laboratory for Multidisciplinary Approaches to Decision-making Support in Economics and Management
     MiningMart presentation (c) Petr Berka, LISp, 2001

  3. LISp research
     • probabilistic methods - decomposable probability models and Bayesian networks
     • symbolic ML methods - 4FT association rules and decision rules
     • logical calculi for knowledge discovery in databases

  4. LISp activities
     • Organized conferences: ECML’97, PKDD’99
     • Organized workshops: Discovery Challenge (PKDD’99, PKDD2000, PKDD2001), WUPES’97, WUPES2000
     • International projects: MLNet, Sol-Eu-Net, EUNITE, MUM, MGT KDNet

  5. SALOME research
     • Quantitative and AI (pattern recognition, fuzzy, neural nets) approaches to decision-making support in economics and management

  6. SALOME activities
     • Organized workshops: STIPR’97, MME’99
     • International projects: Univ. Salzburg, Univ. Hokkaido, Univ. Cambridge

  7. LISp software
     • LISp-Miner (data mining system)
       • DataSource (for data manipulation)
       • 4FT-Miner (4FT association rules)
       • KEX (decision rules)
     • experimental software for building graphical models
     • preprocessing procedures
       • related to KEX
       • based on an information-theoretic approach

  8. LISp-Miner procedures
     • DataSource
       • creating new (virtual) attributes using SQL
       • equidistant and equifrequent discretization
       • grouping attribute values
       • computing attribute-value frequencies
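
The two discretization schemes named on this slide can be sketched as follows. This is a minimal illustration in Python, not DataSource's actual implementation; the function names and the sample `ages` column are hypothetical.

```python
from bisect import bisect_right

def equidistant_cuts(values, bins):
    """Cut points that split [min, max] into `bins` intervals of equal width."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins
    return [lo + i * width for i in range(1, bins)]

def equifrequent_cuts(values, bins):
    """Cut points chosen so each interval holds roughly the same number of values."""
    ordered = sorted(values)
    n = len(ordered)
    return [ordered[(i * n) // bins] for i in range(1, bins)]

def discretize(value, cuts):
    """0-based index of the interval that `value` falls into."""
    return bisect_right(cuts, value)

# Hypothetical attribute column.
ages = [18, 19, 20, 21, 22, 23, 40, 60, 78]

width_cuts = equidistant_cuts(ages, 3)   # equal-width intervals
freq_cuts = equifrequent_cuts(ages, 3)   # equal-count intervals

print([discretize(a, width_cuts) for a in ages])
print([discretize(a, freq_cuts) for a in ages])
```

On skewed data such as this column, equal-width intervals leave most values in the first interval, while equal-frequency intervals spread them evenly.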

  9. LISp-Miner procedures
     • 4FT-Miner (GUHA procedure)
       4FT association rules in the form Ant ~ Suc / Cond
     • KEX
       weighted decision rules in the form Ant ⇒ C (weight)

  10. 4FT-Miner basic idea
      • Generate a (potential) rule, e.g.
        COLOUR(red) ∧ SIZE(small) ⇒(0.9, 20) TEMP(high)
        AGE(21-30) ∧ SALARY(low) ⇒(0.85, 15) PAYMENTS(high) ∧ LOAN(bad)
      • Verify the rule using a four-fold table
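
The verification step can be sketched as below. The four-fold table counts examples by whether they satisfy the antecedent and the succedent (a: both, b: antecedent only, c: succedent only, d: neither). The sketch assumes that the pair of numbers attached to a rule is a confidence threshold `p` and a minimum count `base`, as in a founded implication; this reading, the function names, and the sample rows are assumptions, not taken from the slides.

```python
def four_fold_table(rows, antecedent, succedent):
    """Return (a, b, c, d) for the given antecedent and succedent predicates."""
    a = b = c = d = 0
    for row in rows:
        ant, suc = antecedent(row), succedent(row)
        if ant and suc:
            a += 1
        elif ant:
            b += 1
        elif suc:
            c += 1
        else:
            d += 1
    return a, b, c, d

def founded_implication(table, p, base):
    """True if at least `base` rows satisfy both sides and a/(a+b) >= p."""
    a, b, _, _ = table
    return a >= base and a + b > 0 and a / (a + b) >= p

# Hypothetical data set.
rows = (
    [{"colour": "red", "size": "small", "temp": "high"}] * 9
    + [{"colour": "red", "size": "small", "temp": "low"}] * 1
    + [{"colour": "blue", "size": "small", "temp": "high"}] * 3
    + [{"colour": "blue", "size": "large", "temp": "low"}] * 2
)

table = four_fold_table(
    rows,
    antecedent=lambda r: r["colour"] == "red" and r["size"] == "small",
    succedent=lambda r: r["temp"] == "high",
)
print(table)
print(founded_implication(table, p=0.9, base=5))
```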

  11. KEX basic idea
      • Generate a (potential) rule, e.g.
        YEARS-IN-COMPANY(0-3) ∧ AGE(0-25) ⇒ LOAN(GOOD)
      • If the rule refines the current set of rules (its validity a/(a+b) differs from the weight inferred during consultation), add it to the rule base with a proper weight
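
A minimal sketch of this refinement test follows. The slides do not say how weights are combined or how "differs" is decided, so two assumptions are made here: weights are combined with the PROSPECTOR-like (pseudo-Bayesian) function x ⊕ y = xy / (xy + (1-x)(1-y)), and a fixed `tolerance` stands in for whatever statistical test the real system applies. All names are hypothetical.

```python
from functools import reduce

def combine(x, y):
    """Assumed pseudo-Bayesian combination of two weights in (0, 1)."""
    return x * y / (x * y + (1 - x) * (1 - y))

def inferred_weight(weights):
    """Weight obtained by consulting all applicable rules (0.5 is neutral)."""
    return reduce(combine, weights, 0.5)

def weight_to_add(validity, inferred, tolerance=0.05):
    """Return the weight of a new rule, or None if the rule adds nothing.

    The weight w is chosen so that combine(inferred, w) == validity.
    Both arguments must lie strictly between 0 and 1.
    """
    if abs(validity - inferred) <= tolerance:
        return None
    odds = (validity / (1 - validity)) / (inferred / (1 - inferred))
    return odds / (1 + odds)

# Candidate rule covers a + b = 40 examples, a = 30 of them in the target class.
a, b = 30, 10
validity = a / (a + b)

# Weights of already accepted rules that apply to the same examples (hypothetical).
current = inferred_weight([0.6])

w = weight_to_add(validity, current)
print(w, combine(current, w))
```

With this choice of `w`, consulting the extended rule base reproduces the observed validity for the examples the new rule covers.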

  12. LISp-Miner architecture
      [Architecture diagram; labels: MetaData (ODBC ACCESS), Results, Data (ODBC ACCESS), LM Windows]

  13. Preprocessing (LISp)
      • KEX-oriented
        • (fuzzy) discretization + grouping of values
        • computing the amount of noise in data
        • random sampling + balancing of data
        • handling missing values
      • Information theory
        • attribute selection
        • attribute grouping

  14. … fuzzy discretization

  15. … amount of noise
      Amount of noise: 20%, so max. possible accuracy = 80%
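
The slide does not define how the amount of noise is computed. One plausible reading, assumed in this sketch, is the share of examples that contradict the majority class among examples with identical attribute values: no classifier that sees only those attributes can get them right, so 1 minus that share bounds the achievable accuracy on the data. The function name and the data are hypothetical.

```python
from collections import Counter, defaultdict

def amount_of_noise(examples):
    """Share of examples outvoted by the majority class of their attribute combination.

    `examples` is a list of (attribute_tuple, class_label) pairs.
    """
    classes_by_attrs = defaultdict(Counter)
    for attrs, label in examples:
        classes_by_attrs[attrs][label] += 1
    total = sum(sum(c.values()) for c in classes_by_attrs.values())
    outvoted = sum(sum(c.values()) - max(c.values()) for c in classes_by_attrs.values())
    return outvoted / total

# Hypothetical data: 10 examples, 2 of which contradict their group's majority.
data = (
    [(("young", "low"), "bad")] * 3
    + [(("young", "low"), "good")] * 1
    + [(("old", "high"), "good")] * 5
    + [(("old", "high"), "bad")] * 1
)

noise = amount_of_noise(data)
print(f"noise = {noise:.0%}, max. possible accuracy = {1 - noise:.0%}")
```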

  16. … data sampling
      • random split into training and testing set
      • select random stratified sample
      • balance unbalanced classes
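
These sampling operations can be sketched with the Python standard library. The slide does not say how balancing is done; undersampling the larger classes is assumed here, and all names and data are hypothetical.

```python
import random
from collections import defaultdict

def group_by_class(examples, label_of):
    groups = defaultdict(list)
    for ex in examples:
        groups[label_of(ex)].append(ex)
    return groups

def stratified_split(examples, label_of, test_share, rng):
    """Random train/test split that keeps class proportions in both parts."""
    train, test = [], []
    for members in group_by_class(examples, label_of).values():
        members = members[:]          # do not shuffle the caller's list
        rng.shuffle(members)
        cut = round(len(members) * test_share)
        test.extend(members[:cut])
        train.extend(members[cut:])
    return train, test

def balance_by_undersampling(examples, label_of, rng):
    """Assumed balancing scheme: sample every class down to the smallest class size."""
    groups = group_by_class(examples, label_of)
    smallest = min(len(m) for m in groups.values())
    balanced = []
    for members in groups.values():
        balanced.extend(rng.sample(members, smallest))
    return balanced

rng = random.Random(0)  # fixed seed so the run is repeatable
loans = [(i, "good") for i in range(80)] + [(i, "bad") for i in range(80, 100)]
label = lambda ex: ex[1]

train, test = stratified_split(loans, label, test_share=0.25, rng=rng)
balanced = balance_by_undersampling(loans, label, rng)
print(len(train), len(test), len(balanced))
```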

  17. … handling missing values
      • remove example
      • substitute missing with new value
      • substitute missing with majority value
      • proportional substitution
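
The four strategies can be sketched for a single attribute column as follows. "Proportional substitution" is read here as drawing each replacement at random according to the observed value frequencies; that reading, the `MISSING` marker, and the function names are assumptions.

```python
import random
from collections import Counter

MISSING = None  # hypothetical marker for a missing value

def remove_examples(column):
    return [v for v in column if v is not MISSING]

def substitute_new_value(column, new_value="unknown"):
    return [new_value if v is MISSING else v for v in column]

def substitute_majority(column):
    majority = Counter(remove_examples(column)).most_common(1)[0][0]
    return [majority if v is MISSING else v for v in column]

def substitute_proportionally(column, rng):
    # Choosing uniformly from the known values reproduces their frequencies.
    known = remove_examples(column)
    return [rng.choice(known) if v is MISSING else v for v in column]

column = ["a", "a", "b", MISSING, "a", MISSING]

print(remove_examples(column))
print(substitute_new_value(column))
print(substitute_majority(column))
print(substitute_proportionally(column, random.Random(0)))
```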

  18. … information theory
      • Attribute selection - based on mutual information
      • Attribute grouping - based on information content
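
Attribute selection by mutual information can be illustrated as below: each attribute is scored by I(A; C) = Σ p(a, c) log2( p(a, c) / (p(a) p(c)) ) against the class, and attributes are ranked by that score. This is a generic sketch, not the LISp procedure itself; the table and column names are hypothetical.

```python
from collections import Counter
from math import log2

def mutual_information(xs, ys):
    """Mutual information (in bits) between two equally long columns."""
    n = len(xs)
    count_x, count_y = Counter(xs), Counter(ys)
    count_xy = Counter(zip(xs, ys))
    return sum(
        (n_xy / n) * log2(n_xy * n / (count_x[x] * count_y[y]))
        for (x, y), n_xy in count_xy.items()
    )

# Hypothetical table: `income` determines the class, `colour` is unrelated to it.
table = {
    "income": ["high", "high", "low", "low"],
    "colour": ["red", "blue", "red", "blue"],
}
loan = ["good", "good", "bad", "bad"]

ranking = sorted(table, key=lambda name: mutual_information(table[name], loan), reverse=True)
print(ranking)
```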

  19. Preprocessing architecture
      [Architecture diagram; labels: Input data (ASCII), Output data (ASCII), procedure, Results, Data (ASCII), procedure]

  20. SALOME software
      • Feature Selection Toolbox (Multi-Purpose Tool for Pattern Recognition)
        • feature selection
        • approximation-based modeling
        • classification
      A consulting system that helps choose the most suitable method is being developed.

  21. Search strategies for FS
      Search for a subset maximizing a criterion function (distance, divergence):
      • with a priori information
        • exhaustive search
        • branch and bound based algorithms
        • floating search algorithms
      • without a priori information
        • approximation method
        • divergence method
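
Of the strategies listed, exhaustive search is the simplest to illustrate: evaluate the criterion on every subset of the requested size and keep the best. The sketch below is not the Feature Selection Toolbox implementation; the feature names and the criterion values in `SCORES` are made up for illustration.

```python
from itertools import combinations

def exhaustive_search(features, criterion, size):
    """Return the subset of `size` features with the highest criterion value."""
    return max(combinations(features, size), key=criterion)

# Hypothetical criterion values (e.g. a class-separability distance) for each pair.
SCORES = {
    frozenset({"age", "salary"}): 0.60,
    frozenset({"age", "payments"}): 0.55,
    frozenset({"salary", "payments"}): 0.90,
}

def criterion(subset):
    return SCORES[frozenset(subset)]

best = exhaustive_search(["age", "salary", "payments"], criterion, size=2)
print(best)
```

The number of subsets grows combinatorially with the number of features, which is why the slide also lists branch and bound and floating search as alternatives.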

  22. FST architecture
      [Architecture diagram; labels: Data (ASCII), Results, FST Windows]

  23. References
      LISp-Miner:
      • Berka, P., Ivanek, J.: Automated Knowledge Acquisition for PROSPECTOR-like Expert Systems. In: Bergadano, De Raedt (eds.): Proc. ECML'94, Springer, 1994, 339-342.
      • Berka, P., Rauch, J.: Data Mining using GUHA and KEX. In: Callaos, Yang, Aguilar (eds.): 4th Int. Conf. on Information Systems, Analysis and Synthesis ISAS'98, 1998, Vol. 2, 238-244.
      • Rauch, J.: Classes of Four Fold Table Quantifiers. In: Zytkow, Quafafou (eds.): Principles of Data Mining and Knowledge Discovery. Springer, 1998, 203-211.

  24. References
      Preprocessing:
      • Bruha, I., Berka, P.: Discretization and Fuzzification of Numerical Attributes in Attribute-Based Learning. In: Szczepaniak, Lisboa, Kacprzyk (eds.): Fuzzy Systems in Medicine, Physica Verlag, 2000, 112-138.
      • Pudil, P., Novovičová, J.: Novel Methods for Subset Selection with Respect to Problem Knowledge. IEEE Transactions on Intelligent Systems - Special Issue on Feature Transformation and Subset Selection, 1998, 66-74.
      • Zvarova, J., Studeny, M.: Information theoretical approach to constitution and reduction of medical data. International Journal of Medical Informatics 45 (1997), n. 1-2, 65-74.
