
Knowledge Discovery in Database (KDD)

Knowledge Discovery in Database (KDD). Knowledge Discovery Process. The whole process of extracting implicit, previously unknown, and potentially useful knowledge from a large database. It includes data selection, cleaning, enrichment, coding, data mining, and reporting.

melina

Presentation Transcript


  1. Knowledge Discovery in Database (KDD)

  2. Knowledge Discovery Process • The whole process of extracting implicit, previously unknown, and potentially useful knowledge from a large database • It includes data selection, cleaning, enrichment, coding, data mining, and reporting • Data mining is the key stage of the knowledge discovery process • Data mining: the process of finding the desired information in a large database

  3. Knowledge Discovery Process • Example: the database of a magazine publisher that sells five types of magazines: cars, houses, sports, music, and comics • Data mining: find interesting customer properties • What is the profile of a reader of a car magazine? • Is there any correlation between an interest in cars and an interest in comics? • Apply the knowledge discovery process
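
The correlation question on this slide can be made concrete with a small sketch. It assumes each customer's subscriptions are already available as a set; the sample data and the choice of the phi coefficient (Pearson correlation for two yes/no variables) are illustrative assumptions, not something the slides prescribe.

```python
from math import sqrt

# Hypothetical sample: customer id -> set of magazines subscribed to.
subscriptions = {
    1: {"cars", "comics"},
    2: {"cars"},
    3: {"comics"},
    4: {"cars", "comics"},
    5: {"music"},
    6: {"houses", "sports"},
}

def phi_coefficient(subs, a, b):
    """Phi correlation between interest in magazine a and magazine b."""
    # Build the 2x2 contingency table of (reads a?, reads b?).
    n11 = sum(1 for s in subs.values() if a in s and b in s)
    n10 = sum(1 for s in subs.values() if a in s and b not in s)
    n01 = sum(1 for s in subs.values() if a not in s and b in s)
    n00 = sum(1 for s in subs.values() if a not in s and b not in s)
    denom = sqrt((n11 + n10) * (n01 + n00) * (n11 + n01) * (n10 + n00))
    # If one interest is constant, correlation is undefined; report 0.0.
    return (n11 * n00 - n10 * n01) / denom if denom else 0.0

print(phi_coefficient(subscriptions, "cars", "comics"))
```

A value near +1 would suggest the two interests tend to occur together, a value near -1 that they exclude each other, and a value near 0 that they look unrelated in this sample.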

  4. Data Selection • Select the information about people who have subscribed to a magazine

  5. Cleaning • Pollution: typing errors, clients moving without notifying a change of address, people giving incorrect information about themselves • Pattern recognition algorithms can be used to detect such pollution
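
One way a pattern-based check might flag polluted records is fuzzy name matching. The sketch below uses `difflib.SequenceMatcher` from the Python standard library to flag pairs of client records with the same address and nearly identical names. The records, the field names, and the 0.85 threshold are assumptions for illustration; the slides do not say which algorithm is used.

```python
from difflib import SequenceMatcher
from itertools import combinations

# Hypothetical client records; the second one looks like a typing error.
clients = [
    {"id": 23003, "name": "Johnson", "address": "1 Downing Street"},
    {"id": 23009, "name": "Jonson", "address": "1 Downing Street"},
    {"id": 23013, "name": "King", "address": "2 Boulevard"},
]

def similarity(a, b):
    """Case-insensitive similarity ratio in [0, 1]."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def likely_duplicates(records, threshold=0.85):
    """Return id pairs that share an address and have near-identical names."""
    pairs = []
    for r1, r2 in combinations(records, 2):
        same_address = r1["address"].lower() == r2["address"].lower()
        if same_address and similarity(r1["name"], r2["name"]) >= threshold:
            pairs.append((r1["id"], r2["id"]))
    return pairs

print(likely_duplicates(clients))
```

Flagged pairs are only candidates; a person or a further rule would still have to decide whether they are really the same client.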

  6. Cleaning • Lack of domain consistency
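
The slide does not list specific cases of domain inconsistency. As an assumed illustration, the sketch below treats a value as inconsistent when it falls outside the set of values the field may legitimately take, and replaces it with `None` (unknown). The field names, the earliest plausible birth date, and the magazine set used as a domain are all hypothetical.

```python
from datetime import date

MAGAZINES = {"cars", "houses", "sports", "music", "comics"}
EARLIEST_BIRTH = date(1880, 1, 1)  # assumed lower bound, not from the slides

def enforce_domains(record, today):
    """Return a copy of record with out-of-domain values set to None."""
    cleaned = dict(record)
    born = cleaned.get("birth_date")
    # A birth date must lie between the assumed lower bound and today.
    if born is not None and not (EARLIEST_BIRTH <= born <= today):
        cleaned["birth_date"] = None
    # A magazine must be one of the five types the publisher sells.
    if cleaned.get("magazine") not in MAGAZINES:
        cleaned["magazine"] = None
    return cleaned

record = {"birth_date": date(2090, 1, 1), "magazine": "cars"}
print(enforce_domains(record, today=date(1996, 1, 1)))
```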

  7. Enrichment • We need extra information about the clients: date of birth, income, amount of credit, and whether or not the individual owns a car or a house

  8. Enrichment • The new information needs to be easily joined to the existing client records • This allows more knowledge to be extracted
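
Joining the extra information to the client records can be sketched as a left join on a shared key. The key name `client_id`, the field names, and the sample values below are assumptions; clients without extra information keep their row, with the new fields set to `None`.

```python
EXTRA_FIELDS = ("birth_date", "income", "credit", "car_owner", "house_owner")

# Hypothetical existing client records.
clients = [
    {"client_id": 1, "name": "Johnson"},
    {"client_id": 2, "name": "Clinton"},
]

# Hypothetical extra information, keyed by the same client id.
extra_by_id = {
    1: {"birth_date": "1958-04-13", "income": 18500, "credit": 17800,
        "car_owner": False, "house_owner": True},
}

def enrich(clients, extra_by_id):
    """Left-join extra fields onto each client record by client_id."""
    enriched = []
    for client in clients:
        row = dict(client)
        extra = extra_by_id.get(client["client_id"], {})
        for field in EXTRA_FIELDS:
            row[field] = extra.get(field)  # None when nothing is known
        enriched.append(row)
    return enriched

for row in enrich(clients, extra_by_id):
    print(row)
```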

  9. Coding • We select only those records that have enough information to be of value (row selection) • We project only the fields in which we are interested (column projection)
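
The two operations on this slide correspond to selection and projection in relational terms. A minimal sketch, assuming "enough information" means that every field of interest is present; the field names and records are hypothetical.

```python
# Hypothetical enriched records; None marks missing information.
records = [
    {"client_id": 1, "name": "Johnson", "income": 18500, "car_owner": False},
    {"client_id": 2, "name": "Clinton", "income": None, "car_owner": None},
    {"client_id": 3, "name": "King", "income": 36000, "car_owner": True},
]

def select_and_project(records, fields):
    """Keep rows where all fields are known, then keep only those fields."""
    selected = [r for r in records
                if all(r.get(f) is not None for f in fields)]   # rows
    return [{f: r[f] for f in fields} for r in selected]        # columns

print(select_and_project(records, ["client_id", "income", "car_owner"]))
```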

  10. Coding

  11. Coding • Code the information that is too detailed: address to region; birth date to age; divide income by 1000; divide credit by 1000; convert car ownership yes/no to 1/0; convert purchase date to month numbers starting from 1990 • The way in which we code the information will determine the type of patterns we find • Coding has to be performed repeatedly in order to get the best results
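
The coding steps listed on this slide can be sketched as one function per record. Several details are assumptions: the region lookup table and the `city` field standing in for the address, the reference date used to compute age, and the convention that January 1990 is month 1.

```python
from datetime import date

# Hypothetical lookup from the city part of an address to a region code.
REGION_BY_CITY = {"Amsterdam": 1, "Rotterdam": 2}

def code_record(rec, today):
    """Replace overly detailed fields with coarser, coded values."""
    born = rec["birth_date"]
    # Age in whole years: subtract one if the birthday has not occurred yet.
    age = today.year - born.year - ((today.month, today.day) < (born.month, born.day))
    bought = rec["purchase_date"]
    return {
        "region": REGION_BY_CITY.get(rec["city"]),       # address -> region
        "age": age,                                      # birth date -> age
        "income_k": rec["income"] / 1000,                # income / 1000
        "credit_k": rec["credit"] / 1000,                # credit / 1000
        "car_owner": 1 if rec["car_owner"] else 0,       # yes/no -> 1/0
        # Months counted from January 1990 (assumed to be month 1).
        "purchase_month": (bought.year - 1990) * 12 + bought.month,
    }

rec = {
    "city": "Amsterdam", "birth_date": date(1970, 6, 15),
    "income": 18500, "credit": 17800, "car_owner": False,
    "purchase_date": date(1994, 4, 15),
}
print(code_record(rec, today=date(1996, 1, 1)))
```

Because the choice of coding determines which patterns can be found, the thresholds and groupings here would normally be revisited over several iterations.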

  12. Coding • We are interested in the relationships between readers of different magazines • Perform a flattening operation
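
Flattening, as commonly understood, turns one row per (client, magazine) subscription into one row per client with a 0/1 column per magazine, so that readers of different magazines can be compared directly. The sketch below assumes that interpretation; the input rows are hypothetical.

```python
MAGAZINES = ["cars", "houses", "sports", "music", "comics"]

# Hypothetical input: one (client_id, magazine) row per subscription.
subscription_rows = [
    (1, "cars"),
    (1, "comics"),
    (2, "music"),
    (3, "cars"),
]

def flatten(rows):
    """Return client_id -> {magazine: 0 or 1} with one entry per client."""
    flat = {}
    for client_id, magazine in rows:
        # Start each new client with all magazine indicators set to 0.
        row = flat.setdefault(client_id, {m: 0 for m in MAGAZINES})
        row[magazine] = 1
    return flat

for client_id, row in flatten(subscription_rows).items():
    print(client_id, row)
```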

  13. Knowledge Discovery Process

  14. Business-Question-Driven Process

  15. Steps of a KDD Process • Learning the application domain: relevant prior knowledge and goals of the application • Creating a target data set: data selection • Data cleaning and enrichment (may take 60% of the effort) • Data reduction and transformation (coding): find useful features, dimensionality reduction • Choosing the functions of data mining: summarization, association, classification, regression, clustering, … • Choosing the mining algorithm(s) • Data mining: search for patterns of interest • Pattern evaluation and knowledge presentation: visualization, transformation, removing redundant patterns, etc. • Use of discovered knowledge

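  16. Exercises 1 • What is the RFM metric, and what is it used for? • What is data mining, and what is its goal? • Why do small companies not need data mining? • How can large companies get to know their customers? • Describe a real-world application of data mining (it must not be the same as the example in the slides). • List and explain the steps of the Knowledge Discovery in Database (KDD) process.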
