
Data Mining Quantitative Values

This article discusses the challenges of data mining with quantitative values and provides approaches to convert the data for associational rule mining. It also explores the speed and efficiency differences between static and dynamic conversion methods. The article includes sample rules generated using different conversion approaches and suggests future work to improve the process.


Presentation Transcript


  1. Data Mining Quantitative Values, by Noah Clemons and Andrew Seidel

  2. Associational Rule Mining
     • Data in market basket format: each “basket” is a list of the items (integers) present.
     • Returns rules based on items.
     • Rules are useful for discovering trends.
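
To make the market basket format concrete, here is a minimal Python sketch. The baskets and item IDs are made up for illustration, and `support_count` and `confidence` are hypothetical helper names, not part of the tool used in the presentation. It shows how the support and confidence of a rule such as {1} ==> {2} can be computed from baskets of integers.

```python
# Toy market-basket data: each basket is a set of integer item IDs (made-up values).
baskets = [
    {1, 2, 3},
    {1, 2},
    {2, 3},
    {1, 2, 4},
]


def support_count(itemset, baskets):
    """Number of baskets that contain every item in itemset."""
    return sum(1 for basket in baskets if itemset <= basket)


def confidence(antecedent, consequent, baskets):
    """Fraction of baskets holding the antecedent that also hold the consequent."""
    with_antecedent = support_count(antecedent, baskets)
    if with_antecedent == 0:
        return 0.0
    return support_count(antecedent | consequent, baskets) / with_antecedent


# Rule {1} ==> {2}: support is the share of baskets containing both items.
rule_support = support_count({1, 2}, baskets) / len(baskets)
rule_confidence = confidence({1}, {2}, baskets)
```

A mining tool keeps only the rules whose support and confidence meet the chosen thresholds.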

  3. Quantitative Data

  4. Problem
     • Data not in market basket format.
     • How do we fit data to necessary format?
     • Convert the data.

  5. Conversion table
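
The conversion table itself did not survive in this transcript, so the following is only a sketch of how such a table could work. The bucket boundaries in `CONVERSION_TABLE` are invented, and the function names are hypothetical. The label style (`AB_551_558` for a range, `3B_2` for a single value) follows the sample rules in the later slides.

```python
# Hypothetical conversion table: attribute -> list of inclusive (low, high) buckets.
# The boundaries below are invented for illustration only.
CONVERSION_TABLE = {
    "AB": [(0, 299), (300, 499), (500, 700)],
    "HR": [(0, 9), (10, 29), (30, 80)],
    "3B": [(0, 0), (1, 1), (2, 2), (3, 20)],
}


def to_item(attribute, value, table=CONVERSION_TABLE):
    """Map one quantitative value to a bucket label such as 'AB_500_700'."""
    for low, high in table[attribute]:
        if low <= value <= high:
            if low == high:
                return f"{attribute}_{low}"  # single-value bucket, e.g. '3B_2'
            return f"{attribute}_{low}_{high}"
    raise ValueError(f"{attribute}={value} is outside every bucket")


def to_basket(record, table=CONVERSION_TABLE):
    """Convert one record (attribute -> value) into a basket of item labels."""
    return [to_item(name, value, table) for name, value in record.items() if name in table]


record = {"AB": 551, "HR": 37, "3B": 2}
basket = to_basket(record)  # ['AB_500_700', 'HR_30_80', '3B_2']
```

If the mining tool expects integer items, each label could then be mapped to an integer ID in a further step.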

  6. Approaches To Conversion
     • Static Approach:
       • Convert data before using the associational mining tool.
       • Good if doing a lot of runs on one dataset with one conversion table.
       • Speed depends on the tools used to convert.

  7. Approaches To Conversion
     • Dynamic Approach:
       • Convert data as it is used by the associational mining tool.
       • Can be much faster than static.
       • Good for changing datasets or conversion tables.
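
The difference between the two approaches can be sketched in Python as follows. This is an illustration under assumptions, not the presenters' implementation: `convert` is a stand-in conversion, `count_items` is a stand-in for the mining tool, and an in-memory buffer plays the role of the converted file. The static path stores the converted baskets before mining; the dynamic path converts each record only when the miner reads it.

```python
import io


def convert(record):
    """Stand-in conversion: label each value as ATTRIBUTE_VALUE (hypothetical)."""
    return [f"{name}_{value}" for name, value in record.items()]


def static_convert(records, out):
    """Static approach: convert everything up front and store the result."""
    for record in records:
        out.write(" ".join(convert(record)) + "\n")


def read_converted(stored):
    """Read the stored baskets back, as a mining tool would read a converted file."""
    for line in stored.getvalue().splitlines():
        yield line.split()


def dynamic_baskets(records):
    """Dynamic approach: convert each record only when the miner asks for it."""
    for record in records:
        yield convert(record)


def count_items(baskets):
    """Stand-in for the mining tool: counts how often each item appears."""
    counts = {}
    for basket in baskets:
        for item in basket:
            counts[item] = counts.get(item, 0) + 1
    return counts


records = [{"HR": 37, "3B": 2}, {"HR": 12, "3B": 2}]

stored = io.StringIO()  # plays the role of the converted file on disk
static_convert(records, stored)
static_counts = count_items(read_converted(stored))

dynamic_counts = count_items(dynamic_baskets(records))
```

Both paths feed the same baskets to the miner. With the dynamic path, changing the conversion table only requires swapping `convert`, with no stored file to regenerate.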

  8. Static vs. Dynamic
     • Speed of 16 static runs: 769.05 seconds
     • Speed of 16 dynamic runs: 27.53 seconds
     • Static was about 27.9 times slower.
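
The 27.9 figure follows directly from the two totals. The short check below uses only the numbers on the slide; the per-run averages are derived values, not measurements reported by the authors.

```python
# Timings as reported on the slide (totals over 16 runs each).
static_seconds = 769.05
dynamic_seconds = 27.53
runs = 16

# Ratio of the totals: how many times slower the static approach was.
slowdown = static_seconds / dynamic_seconds

# Derived per-run averages (about 48.07 s static, about 1.72 s dynamic).
static_per_run = static_seconds / runs
dynamic_per_run = dynamic_seconds / runs

print(f"static is {slowdown:.1f}x slower")
```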

  9. Rules
     • Run with 20 buckets, 0.1% support, 80% confidence
     • 646 rules
     • Sample rules:
       • AB_551_558 RBI_116_147 ==> HR_37_51 (0.866667, 13)
       • BB_35_37 H_193_226 ==> AB_637_689 (0.846154, 11)
       • IBB_18_31 SO_136_180 R_112_137 ==> RBI_116_147 (0.833333, 5)
       • GIDP_5 AB_543_550 ==> 3B_2 (0.833333, 5)

  10. Rules
     • Run with 80 buckets, 0.1% support, 80% confidence
     • 60 rules
     • Sample rules:
       • H_112_114 2B_22 ==> 3B_2 (0.833333, 5)
       • AB_465_469 SH_3 ==> 3B_4 (0.833333, 5)
       • SB_25 HBP_4 ==> GIDP_8 (0.833333, 5)
       • H_200_205 IBB_4 ==> CS_4 (1, 5)
       • BB_57 SB_1 ==> CS_1 (0.833333, 5)
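
The sample rules share one textual shape, so they can be parsed mechanically. The sketch below is an assumption-laden illustration: the slides do not say what the two numbers in parentheses mean. Reading the first as the rule's confidence (every sample is at or above the 80% threshold) and the second as a count of matching baskets is a guess, and `parse_rule` and `split_item` are hypothetical helper names.

```python
import re

# Shape: "<items> ==> <items> (<number>, <integer>)"
RULE_PATTERN = re.compile(r"^(.+?)\s*==>\s*(.+?)\s*\(([\d.]+),\s*(\d+)\)$")


def parse_rule(line):
    """Split one rule line into its parts."""
    match = RULE_PATTERN.match(line.strip())
    if match is None:
        raise ValueError(f"unrecognized rule: {line!r}")
    left, right, first, second = match.groups()
    return {
        "antecedent": left.split(),
        "consequent": right.split(),
        "confidence": float(first),  # ASSUMED meaning of the first number
        "count": int(second),        # ASSUMED meaning of the second number
    }


def split_item(item):
    """Split 'AB_551_558' into ('AB', 551, 558); 'GIDP_5' becomes ('GIDP', 5, 5)."""
    parts = item.split("_")
    attribute, bounds = parts[0], [int(p) for p in parts[1:]]
    return attribute, bounds[0], bounds[-1]


rule = parse_rule("AB_551_558 RBI_116_147 ==> HR_37_51 (0.866667, 13)")
```

Parsed rules can then be filtered or grouped by attribute, for example to see which attributes appear most often as consequents.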

  11. Problems Encountered
     • Hard to pick good values for support, confidence, and the conversion table.
     • Many values are related (At Bats, Games, etc.), which leads to large rules.
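
One way to spot related attributes before mining is to measure how strongly their raw values move together. The sketch below computes a Pearson correlation coefficient on made-up Games and At Bats values. This is a suggested check under assumptions, not something the presentation reports doing, and `pearson` is a hypothetical helper.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sum((x - mean_x) ** 2 for x in xs)
    spread_y = sum((y - mean_y) ** 2 for y in ys)
    return covariance / math.sqrt(spread_x * spread_y)


# Made-up season totals for five players (illustration only).
games = [150, 120, 160, 90, 140]
at_bats = [600, 450, 640, 330, 560]

r = pearson(games, at_bats)  # close to 1: the two attributes are strongly related
```

Attributes with a coefficient near 1 or -1 carry largely the same information, so dropping one of each such pair is one possible way to keep rules from growing large.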

  12. Future Work
     • Use correlated mining to find items.
     • Create a tool to find optimum values for support, confidence, and the conversion table.
