
Introduction to Knowledge Discovery in Databases and Data Mining

Prof. Carolina Ruiz, Department of Computer Science, Worcester Polytechnic Institute


Presentation Transcript


  1. Introduction to Knowledge Discovery in Databases and Data Mining
     Prof. Carolina Ruiz
     Department of Computer Science
     Worcester Polytechnic Institute

  2. What is Data Mining? Or, more generally, Knowledge Discovery in Databases (KDD)
     “Non-trivial process of identifying valid, novel, potentially useful, and ultimately understandable patterns in data” [Fayyad et al. 1996]
     Raw data → data mining → patterns:
     • Analytical patterns (rules, decision trees)
     • Statistical patterns (data distribution)
     • Visual patterns
     Reference: Fayyad, U., Piatetsky-Shapiro, G., and Smyth, P. "From Data Mining to Knowledge Discovery in Databases." AI Magazine, pp. 37-54, Fall 1996.

  3. Need for Data Mining
     • Data are being gathered and stored extremely fast
     • Computational tools and techniques are needed to help humans summarize, understand, and take advantage of accumulated data

  4. Data Analysis (KDD) Process
     [Diagram: stages of the process and the data passed between them]
     • Data sources → data management (databases, data warehouses) → data
     • Data → data “pre”-processing (noisy/missing data, dim. reduction) → clean data
     • Clean data → data analysis (data mining: analytical, statistical, visual) → models
     • Models → model/pattern evaluation (quantitative, qualitative) → “good” model
     • “Good” model + new data → model/pattern deployment (prediction, decision support)
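The stages on this slide can be sketched end to end in a few lines. This is a toy illustration only, not the method used in the course: the records, the mean-imputation step, the single-threshold "model", and all function names are assumptions made up for the example.

```python
from statistics import mean

# Hypothetical raw records: (income in $1000s, bought_product).
# None marks a missing value that pre-processing has to handle.
raw = [
    (30.0, 0), (None, 0), (82.0, 1), (90.0, 1),
    (45.0, 0), (38.0, 0), (95.0, 1), (41.0, 0),
]

def preprocess(rows):
    """Pre-processing: replace missing incomes with the mean of the observed ones."""
    fill = mean(inc for inc, _ in rows if inc is not None)
    return [(inc if inc is not None else fill, label) for inc, label in rows]

def predict(threshold, income):
    """Deployment: apply the model (a single income threshold) to one value."""
    return 1 if income >= threshold else 0

def evaluate(threshold, rows):
    """Evaluation: fraction of rows whose label the threshold predicts correctly."""
    return sum(predict(threshold, inc) == label for inc, label in rows) / len(rows)

def mine_threshold(rows):
    """Data mining: choose the observed income that best separates the classes."""
    candidates = sorted({inc for inc, _ in rows})
    return max(candidates, key=lambda t: evaluate(t, rows))

clean = preprocess(raw)                # data -> clean data
train, test = clean[:6], clean[6:]     # hold out two records for evaluation
model = mine_threshold(train)          # clean data -> model
accuracy = evaluate(model, test)       # model -> quantitative evaluation
new_prediction = predict(model, 85.0)  # "good" model + new data -> prediction

print(f"threshold={model}, held-out accuracy={accuracy}, prediction={new_prediction}")
```

Each function corresponds to one box of the diagram, so swapping in a real learner or a different imputation strategy only changes that one step.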

  5. KDD is Interdisciplinary: techniques come from multiple fields
     • Databases: contributes efficient data storage, data cleansing, and data access techniques
     • Data Visualization: contributes visual data displays and data exploration
     • High Performance Computing: contributes techniques for handling complexity efficiently
     • Application Domain: contributes domain knowledge
     • Machine Learning (AI): contributes (semi-)automatic induction of empirical laws from observations & experimentation
     • Statistics: contributes language, framework, and techniques
     • Pattern Recognition: contributes pattern extraction and pattern matching techniques

  6. Data Mining Modes
     • Exploratory (discovery)
       • Prescriptive patterns: patterns for predicting behavior of newly encountered entities
       • Descriptive patterns: patterns for presenting the behavior of observed entities in a human-understandable format
     • Confirmatory (verification)
       • Given a hypothesis, verify its validity against the data

  7. What do you want to learn from your data? KDD approaches
     • Regression
     • Classification
     • Clustering
     • Change/deviation detection
     • Summarization
     • Dependency/assoc. analysis
     [Figure: sample patterns for these approaches, including rules such as “IF a & b & c THEN d & k” and “IF k & a THEN e”, and association rules “A, B -> C 80%” and “C, D -> A 22%”]
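As an illustration of dependency/association analysis, the sketch below scores a rule such as "A, B -> C" against a set of transactions. It assumes the percentage shown next to a rule on the slide is the rule's confidence, which the transcript does not state; the transactions and function names are likewise hypothetical.

```python
# Hypothetical transactions; each one is the set of items it contains.
transactions = [
    {"A", "B", "C"},
    {"A", "B", "C", "D"},
    {"A", "B"},
    {"A", "B", "C"},
    {"A", "B", "C", "D"},
    {"C", "D"},
    {"B", "D"},
    {"C", "D"},
]

def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(antecedent, consequent, transactions):
    """Estimate of P(consequent | antecedent): support(both) / support(antecedent)."""
    base = support(antecedent, transactions)
    if base == 0:
        raise ValueError("antecedent never occurs in the transactions")
    return support(set(antecedent) | set(consequent), transactions) / base

for lhs, rhs in [({"A", "B"}, {"C"}), ({"C", "D"}, {"A"})]:
    print(
        f"{sorted(lhs)} -> {sorted(rhs)}: "
        f"support={support(lhs | rhs, transactions):.2f}, "
        f"confidence={confidence(lhs, rhs, transactions):.0%}"
    )
```

In this toy data, five transactions contain both A and B and four of those also contain C, so the rule A, B -> C holds in four out of five applicable cases. Real association-rule miners additionally search for the candidate itemsets rather than scoring rules supplied by hand.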

  8. Commercial Data Mining Systems
     • Matlab
     • Oracle Data Mining
     • and lots more …

  9. Academic Data Mining Systems
     • WEKA: Frank et al., University of Waikato, New Zealand
     • RapidMiner: Klinkenberg et al., Univ. of Dortmund, Germany
     • R Programming Language: Ross Ihaka and Robert Gentleman, Univ. of Auckland, New Zealand
     • and many more …

  10. Data Mining Resources – Journals
      • Data Mining and Knowledge Discovery Journal
      Newsletters:
      • ACM SIGKDD Explorations Newsletter
      Related Journals:
      • TKDE: IEEE Transactions on Knowledge and Data Engineering
      • TODS: ACM Transactions on Database Systems
      • JACM: Journal of the ACM
      • Data and Knowledge Engineering
      • JIIS: Intl. Journal of Intelligent Information Systems

  11. Data Mining Resources – Conferences
      • KDD: ACM SIGKDD Intl. Conf. on Knowledge Discovery and Data Mining
      • ICDM: IEEE International Conference on Data Mining
      • SIAM International Conference on Data Mining
      • PKDD: European Conference on Principles and Practice of Knowledge Discovery in Databases
      • PAKDD: Pacific-Asia Conference on Knowledge Discovery and Data Mining
      • DaWaK: Intl. Conference on Data Warehousing and Knowledge Discovery
      Related Conferences:
      • ICML: Intl. Conf. on Machine Learning
      • IDEAL: Intl. Conf. on Intelligent Data Engineering and Automated Learning
      • IJCAI: International Joint Conference on Artificial Intelligence
      • AAAI: American Association for Artificial Intelligence Conference
      • SIGMOD/PODS: ACM Intl. Conference on Data Management
      • ICDE: International Conference on Data Engineering
      • VLDB: International Conference on Very Large Data Bases

  12. Data Mining Resources – Books, Datasets, …
      See resources webpage at:
      • http://web.cs.wpi.edu/~ruiz/KDDRG/resources.html

  13. Summary
      • KDD is the “non-trivial process of identifying valid, novel, potentially useful, and ultimately understandable patterns in data”
      • The KDD process includes data collection and pre-processing, data mining, and evaluation and validation of those patterns
      • Data mining is the discovery and extraction of patterns from data, not the extraction of data
      • Important challenges in data mining: privacy, security, scalability, real-time, and handling non-conventional data

  14. KDDRG: Knowledge Discovery and Data Mining Research Group
      • KDDRG Meetings
        • WHEN? Fridays at 1 pm
        • WHERE? Beckett Conference Room in Fuller Labs
      • To receive announcements of the talks, please subscribe to the KDDRG mailing list
        • I’ll send you an email with instructions on how to do so
