
An Excel-based Data Mining Tool


reilly

Presentation Transcript


  1. An Excel-based Data Mining Tool Chapter 4

  2. 4.1 The iData Analyzer

  3. Figure 4.1 The iDA system architecture

  4. Figure 4.2 A successful installation

  5. 4.2 ESX: A Multipurpose Tool for Data Mining Figure 4.3 An ESX concept hierarchy

  6. 4.3 iDAV Format for Data Mining

  7. 4.4 A Five-step Approach for Unsupervised Clustering • Step 1: Enter the Data to be Mined • Step 2: Perform a Data Mining Session • Step 3: Read and Interpret Summary Results • Step 4: Read and Interpret Individual Class Results • Step 5: Visualize Individual Class Rules

  8. Step 1: Enter The Data To Be Mined Figure 4.4 The Credit Card Promotion Database

  9. Step 2: Perform A Data Mining Session Figure 4.5 Unsupervised settings for ESX

  10. Figure 4.6 RuleMaker options

  11. Step 3: Read and Interpret Summary Results • Class Resemblance Scores • Domain Resemblance Score • Domain Predictability

  12. Figure 4.8 Summary statistics for the Acme credit card promotion database

  13. Figure 4.9 Statistics for numerical attributes and common categorical attribute values

  14. Step 4: Read and Interpret Individual Class Results • Class Predictability is a within-class measure. • Class Predictiveness is a between-class measure.
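The two measures on this slide can be illustrated with a minimal sketch. It assumes the usual reading: class predictability is the share of instances within a class that have a given attribute value, and class predictiveness is the share of all instances with that value that fall in the class. The function names and the toy data are hypothetical and are not part of iDA or ESX.

```python
def class_predictability(rows, labels, attribute, value, target_class):
    """Within-class measure: share of target_class instances with attribute == value."""
    in_class = [row for row, label in zip(rows, labels) if label == target_class]
    if not in_class:
        return 0.0
    return sum(row[attribute] == value for row in in_class) / len(in_class)


def class_predictiveness(rows, labels, attribute, value, target_class):
    """Between-class measure: share of instances with attribute == value in target_class."""
    with_value = [label for row, label in zip(rows, labels) if row[attribute] == value]
    if not with_value:
        return 0.0
    return sum(label == target_class for label in with_value) / len(with_value)


# Hypothetical toy data: two clusters, "A" and "B".
rows = [
    {"sex": "female", "insurance": "yes"},
    {"sex": "female", "insurance": "no"},
    {"sex": "male", "insurance": "no"},
    {"sex": "male", "insurance": "no"},
]
labels = ["A", "A", "B", "B"]

# Every instance in B has insurance = no (predictability 1.0),
# but only two of the three insurance = no instances are in B.
print(class_predictability(rows, labels, "insurance", "no", "B"))
print(class_predictiveness(rows, labels, "insurance", "no", "B"))
```

Under this reading, a predictability of 1.0 marks a value that every class member shares, and a predictiveness of 1.0 marks a value found only in that class, which is one way to read the necessary and sufficient attribute values reported for a class.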

  15. Figure 4.10 Class 3 summary results

  16. Figure 4.11 Necessary and sufficient attribute values for Class 3

  17. Step 5: Visualize Individual Class Rules Figure 4.7 Rules for the credit card promotion database

  18. 4.5 A Six-Step Approach for Supervised Learning • Step 1: Choose an Output Attribute • Step 2: Perform the Mining Session • Step 3: Read and Interpret Summary Results • Step 4: Read and Interpret Test Set Results • Step 5: Read and Interpret Class Results • Step 6: Visualize and Interpret Class Rules

  19. Read and Interpret Test Set Results Figure 4.12 Test set instance classification

  20. 4.6 Techniques for Generating Rules
      A simple procedure for creating a best set of covering rules:
      1. Choose an attribute that best differentiates all domain instances.
      2. Use the attribute to further subdivide instances into classes.
      3. For each subclass created in step 2:
         3.1 If the instances in the subclass satisfy a predefined criterion, then generate a defining rule for the subclass.
         3.2 If the subclass does not satisfy the predefined criterion, then repeat step 1 for the subclass.
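The procedure above can be sketched recursively. This is only an illustration under stated assumptions: the slide does not say how the "best" attribute is scored or what the predefined criterion is, so the sketch scores an attribute by the purity of the subclasses it produces and uses a minimum share of the majority class (`min_correctness`) as the criterion. All names and the toy data are hypothetical; this is not RuleMaker's algorithm.

```python
from collections import Counter, defaultdict


def majority(labels):
    """Return (most common label, its share of the labels)."""
    label, count = Counter(labels).most_common(1)[0]
    return label, count / len(labels)


def split(rows, labels, attribute):
    """Group rows and their labels by the value of one attribute."""
    groups = defaultdict(lambda: ([], []))
    for row, label in zip(rows, labels):
        groups[row[attribute]][0].append(row)
        groups[row[attribute]][1].append(label)
    return groups


def best_attribute(rows, labels, attributes):
    """Step 1 (assumed scoring): prefer the attribute with the purest subclasses."""
    def score(attribute):
        groups = split(rows, labels, attribute)
        return sum(majority(ls)[1] * len(ls) for _, ls in groups.values()) / len(labels)
    return max(attributes, key=score)


def covering_rules(rows, labels, attributes, min_correctness=1.0, conditions=()):
    """Return rules as (conditions, predicted class, correctness) tuples."""
    rules = []
    attribute = best_attribute(rows, labels, attributes)               # step 1
    remaining = [a for a in attributes if a != attribute]
    for value, (sub_rows, sub_labels) in split(rows, labels, attribute).items():  # step 2
        rule_conditions = conditions + ((attribute, value),)
        label, purity = majority(sub_labels)
        # Step 3.1: criterion met. A rule is also emitted when no attributes
        # are left to split on (an assumption added so the recursion ends).
        if purity >= min_correctness or not remaining:
            rules.append((rule_conditions, label, purity))
        else:                                                          # step 3.2
            rules.extend(covering_rules(sub_rows, sub_labels, remaining,
                                        min_correctness, rule_conditions))
    return rules


# Hypothetical toy data; the class label is whether a promotion was accepted.
rows = [
    {"sex": "male", "insurance": "no"},
    {"sex": "male", "insurance": "yes"},
    {"sex": "female", "insurance": "no"},
    {"sex": "female", "insurance": "yes"},
]
labels = ["no", "yes", "yes", "yes"]

for conditions, label, correctness in covering_rules(rows, labels, ["sex", "insurance"]):
    print(conditions, "->", label, correctness)
```

Lowering `min_correctness` makes the sketch stop earlier and produce shorter, less accurate rules, which loosely mirrors the trade-off that a minimum rule correctness setting controls.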

  21. 4.6 Techniques for Generating Rules (RuleMaker) Define the scope of the rules. Choose the instances. Set the minimum rule correctness. Define the minimum rule coverage. Choose an attribute significance value.

  22. 4.7 Instance Typicality • The average similarity of an instance to all other instances within its class. • Identify prototypical and outlier instances. • Select a best set of training instances. • Used to compute individual instance classification confidence scores.
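The definition of typicality on this slide can be sketched as follows. The similarity measure is an assumption made for illustration (the share of attributes on which two instances agree); the slide does not state which measure ESX uses, and the function names and data are hypothetical.

```python
def similarity(a, b):
    """Assumed measure: share of attributes on which two instances agree."""
    return sum(a[key] == b[key] for key in a) / len(a)


def typicality(class_instances):
    """Average similarity of each instance to all other instances in its class."""
    scores = []
    for i, instance in enumerate(class_instances):
        others = [o for j, o in enumerate(class_instances) if j != i]
        if not others:
            scores.append(1.0)  # assumption: a lone instance counts as fully typical
            continue
        scores.append(sum(similarity(instance, o) for o in others) / len(others))
    return scores


# Hypothetical class with three members.
members = [
    {"sex": "female", "insurance": "no"},
    {"sex": "female", "insurance": "no"},
    {"sex": "male", "insurance": "no"},
]
scores = typicality(members)
most_typical = scores.index(max(scores))   # candidate prototype
least_typical = scores.index(min(scores))  # candidate outlier
print(scores, most_typical, least_typical)
```

Sorting a class by these scores puts prototype candidates at the top and outlier candidates at the bottom, which is the ordering a training-set selection step would draw on.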

  23. Figure 4.13 Instance typicality

  24. 4.8 Special Considerations and Features • Avoid Mining Delays: at some point, copy the original data into another Excel sheet. • The Quick Mine Feature: recommended when the dataset contains more than 2000 instances. • Erroneous and Missing Data: blank lines, data beyond the last column, invalid characters.
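Two of the data problems named on this slide, blank lines and data beyond the last column, can be checked before a mining session. The sketch below is a hypothetical pre-check on a sheet already loaded as lists of cells; it is not part of iDA, and it leaves out invalid characters because the slide does not say which characters are invalid. Row numbers count data rows only.

```python
QUICK_MINE_THRESHOLD = 2000  # from the slide: Quick Mine is recommended above this size


def check_sheet(header, rows):
    """Return (row_number, problem) pairs for a sheet given as lists of cells."""
    problems = []
    width = len(header)
    for number, row in enumerate(rows, start=1):
        if all(cell in ("", None) for cell in row):
            problems.append((number, "blank line"))
        elif any(cell not in ("", None) for cell in row[width:]):
            problems.append((number, "data beyond the last column"))
    return problems


def should_quick_mine(rows):
    """True when the dataset is larger than the threshold given on the slide."""
    return len(rows) > QUICK_MINE_THRESHOLD


# Hypothetical sheet with one blank row and one row that spills past the header.
header = ["income", "sex"]
rows = [
    ["40K", "male"],
    ["", ""],
    ["30K", "female", "x"],
]
print(check_sheet(header, rows))
print(should_quick_mine(rows))
```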
