Evaluation of MineSet 3.0

Presentation Transcript


  1. Evaluation of MineSet 3.0 By Rajesh Rathinasabapathi S Peer Mohamed Raja Guided By Dr. Li Yang

  2. MineSet • Introduction • Problem in existing analytical tools • MineSet Client/Server Architecture • MineSet Enterprise Manager

  3. INTRODUCTION • Product of Silicon Graphics Inc. • Supported by • Windows NT 4.0 (Server & Client) • Windows 95 & 98 (Client) • Memory requirements vary with the size of the data • 64MB RAM • 1024 × 768 resolution with 65K colors • IRIX 6.4 and above (for server parallelization)

  4. MineSet • Helps to pinpoint and understand the complex patterns, relationships, and anomalies that are implicitly present in your data.

  5. Problems in Existing Tools • You must specify directly any relationships between data elements. • Example: query for all sales by region. • This presupposes you already suspect that sales vary by region. • MineSet, by contrast, can uncover relationships you did not know existed.

  6. MineSet

  7. MineSet Client/Server Architecture • Client and server can run on the same system or on different systems • Server responsibilities • Accessing data files • Data transformations (Data Mover) • Running mining operations (classification, association, etc.) • Generating visualization files

  8. MineSet Client/Server Architecture • Client's responsibility: providing the GUI • Integration with other systems • Support for ODBC-compliant databases, for example SQL Server, DB2, Oracle, Sybase • Open architecture allows MineSet to coexist with other tools, for example SAS • Integrate with the web using hotlinks • Custom algorithms

  9. MineSet Enterprise Manager • MineSet Tool Manager • MineSet 3D Visualizer • MineSet Cluster Visualizer • MineSet Record Visualizer • MineSet Statistics Visualizer

  10. MineSet Tool Manager • Data Access and Data Transformation. • Data Destinations.

  11. MineSet Tool Manager

  12. Basic Transformations • Adding New Columns • Removing existing Columns • Aggregation • Filtering • Sampling • Binning • Apply Classifier

  13. Adding New Columns New columns can be added to the existing dataset. An added column is derived from existing columns using an expression.

  14. Removing Existing Columns Remove columns that are not pertinent, are redundant, or contain obvious, uninteresting predictors (a sketch of both transformations follows).
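
These two transformations map directly onto DataFrame operations in general-purpose tools. A minimal pandas sketch; the table and column names are invented for illustration, and MineSet itself performs these steps through Tool Manager panels, not Python:

```python
import pandas as pd

# Invented customer table; column names are illustrative only.
df = pd.DataFrame({
    "customer_id": [1, 2, 3],
    "revenue":     [1200.0, 450.0, 980.0],
    "cost":        [700.0, 300.0, 400.0],
    "fax_number":  ["555-0101", "555-0102", "555-0103"],
})

# Adding a new column: derive it from existing columns via an expression.
df["profit"] = df["revenue"] - df["cost"]

# Removing an existing column: drop an obvious, uninteresting predictor.
df = df.drop(columns=["fax_number"])

print(df)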

  15. Aggregation Grouping records together and finding the sum, maximum, minimum, or average.
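
Outside MineSet, this aggregation is a single grouped computation. A minimal pandas sketch over invented sales records:

```python
import pandas as pd

# Invented sales records.
sales = pd.DataFrame({
    "region": ["East", "East", "West", "West", "West"],
    "amount": [100.0, 250.0, 80.0, 120.0, 300.0],
})

# Group records by region, then compute sum, maximum, minimum, and average.
summary = sales.groupby("region")["amount"].agg(["sum", "max", "min", "mean"])
print(summary)
```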

  16. Filtering Filtering the data so that visualizations show only the strongest rules or the most profitable customer segments.

  17. Sampling Taking a random subset of the data.

  18. Binning Breaking a continuous range of data into discrete segments (filtering, sampling, and binning are sketched below).
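
Filtering, sampling, and binning are likewise one-liners in a general-purpose tool. A pandas sketch with invented data, as a stand-in for the corresponding Tool Manager operations:

```python
import pandas as pd

# Invented customer table.
df = pd.DataFrame({"age":   [22, 35, 47, 51, 63, 29, 40, 58],
                   "spend": [120, 300, 80, 560, 220, 90, 410, 150]})

# Filtering: keep only the records of interest (e.g., high-spend customers).
high_spend = df[df["spend"] > 200]

# Sampling: draw a random subset; a fixed seed makes the draw repeatable.
subset = df.sample(frac=0.5, random_state=42)

# Binning: break the continuous "age" range into discrete segments.
df["age_group"] = pd.cut(df["age"], bins=[0, 30, 50, 100],
                         labels=["young", "middle", "senior"])
print(high_spend, subset, df, sep="\n\n")
```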

  19. Apply Classifier

  20. Data Mining Tools • Association • Classification • Clustering • Regression • Column Importance

  21. Column Importance Column importance helps you discover which columns are most important for predicting the values of a label column you choose. Unlike clustering, this lets you decide which label determines the importance of the columns.

  22. Column Importance Options when finding column importance: • The number of columns to find. • Whether or not to use weights. • The weight to apply. • The number of additional important columns. • The purity of the columns shown on the left or right.
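
MineSet's exact purity-based ranking is not reproduced here; as a rough analogue, mutual information between each column and the chosen label gives a similar per-column importance score. A sketch using scikit-learn on an invented table:

```python
import pandas as pd
from sklearn.feature_selection import mutual_info_classif

# Invented table; "churn" plays the role of the chosen label column.
df = pd.DataFrame({
    "income":  [30, 80, 45, 90, 28, 75, 60, 33],
    "age":     [25, 40, 35, 50, 22, 45, 38, 27],
    "zipcode": [1, 2, 1, 3, 1, 2, 3, 1],
    "churn":   [0, 1, 0, 1, 0, 1, 1, 0],
})

X, y = df.drop(columns=["churn"]), df["churn"]
scores = mutual_info_classif(X, y, random_state=0)

# Rank the columns by how much they tell us about the label.
for name, score in sorted(zip(X.columns, scores), key=lambda t: -t[1]):
    print(f"{name}: {score:.3f}")
```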

  23. Association Rules Options • Height – bars. • Height – disks. • Color – bars. • Color – disks. • Label – bars. • Confidence (1-100). • Support (1-100). • Whether or not to use weights. • Unlimited items per rule, or a fixed number of items per rule.

  24. Association Rules

  25. Association Rules Interpreting association rules in the Scatter Visualizer: • Items on one axis represent the LHS of each rule. • Items on the other axis represent the RHS. • Bar height corresponds to support. • Bar color represents lift. • Pointing at a bar displays that rule's details.
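
The measures the visualizer encodes are easy to compute by hand. A plain-Python sketch over an invented set of market baskets, showing support, confidence, and lift for one rule:

```python
# Support, confidence, and lift for a rule LHS -> RHS over market baskets.
baskets = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
    {"bread", "milk"},
]

lhs, rhs = {"bread"}, {"milk"}
n = len(baskets)

n_lhs = sum(1 for b in baskets if lhs <= b)
n_rhs = sum(1 for b in baskets if rhs <= b)
n_both = sum(1 for b in baskets if (lhs | rhs) <= b)

support = n_both / n             # bar height in the visualizer
confidence = n_both / n_lhs      # P(RHS | LHS)
lift = confidence / (n_rhs / n)  # bar color: confidence relative to chance

print(f"support={support:.2f} confidence={confidence:.2f} lift={lift:.2f}")
```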

  26. Clustering • Single K-means (default method) • Iterative K-means

  27. Clustering • Single K-means (default method): you specify the number of clusters. • Iterative K-means: you specify the minimum and maximum number of clusters.

  28. Clustering Options when creating clusters: • The distance measure (Euclidean / Manhattan). • The number of iterations. • The random seeds. A sketch of both modes follows.
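
A sketch of the two k-means modes using scikit-learn's KMeans as a stand-in for MineSet's inducer. KMeans here is Euclidean only, and the model-selection criterion for the iterative mode is an assumption, not necessarily MineSet's:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

# Two invented Gaussian blobs stand in for real records.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (50, 2)), rng.normal(5, 1, (50, 2))])

# Single K-means: the number of clusters is fixed up front; the iteration
# limit and random seed correspond to the options listed above.
single = KMeans(n_clusters=2, max_iter=100, random_state=0).fit(X)

# Iterative K-means: try each k in a [min, max] range and keep the best.
# Silhouette score is our stand-in selection criterion.
best_k = max(range(2, 6), key=lambda k: silhouette_score(
    X, KMeans(n_clusters=k, random_state=0).fit_predict(X)))

print("single-mode labels:", single.labels_[:5], "| best k:", best_k)
```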

  29. Clustering

  30. Clustering • The order in which attributes are displayed reflects their importance. • The population column shows the default settings. • Each column represents a different cluster. • Clicking the top of a column shows its attribute importance. • Each box shows the maximum, minimum, median, and deviation of the values it contains.

  31. Classifier Classification is the task of assigning a discrete label value to an unlabeled record. Different modes: • Classifier and Error • Classifier Only • Estimate Error • Learning Curve

  32. Classifier • Classifier Only mode uses all the available data to build the classifier. It is useful when you are not concerned with error estimation. • Classifier and Error mode uses holdout error estimation: instead of using all the data to build the model, part of the data is held out as a test set and the classifier is induced on the remaining training set. This mode automatically partitions the dataset into independent training and test subsets. Options: holdout ratio, random seed.

  33. Classifier • Estimate Error mode uses cross-validation error estimation. Cross-validation gives a more precise estimate of error and is useful for small datasets or before building the final classifier. • Learning Curve mode shows the error of the classifier generated by an inducer as a function of the number of records used to create it. Both estimation modes are sketched below.
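
Both error-estimation modes have direct scikit-learn analogues. A sketch with a stand-in dataset and classifier, not MineSet's inducers:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split, cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
clf = DecisionTreeClassifier(random_state=0)

# Classifier and Error mode: holdout estimation. A holdout ratio and a
# random seed control the split into independent train and test subsets.
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=1/3, random_state=0)
holdout_error = 1 - clf.fit(X_tr, y_tr).score(X_te, y_te)

# Estimate Error mode: cross-validation, a more precise estimate that is
# preferable for small datasets since every record is tested once.
cv_error = 1 - cross_val_score(clf, X, y, cv=10).mean()

print(f"holdout error={holdout_error:.3f}  cv error={cv_error:.3f}")
```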

  34. Classifier

  35. Classifier A classifier can be induced by the following methods: • Decision Tree • Option Tree • Evidence • Decision Table
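
As a stand-in for MineSet's Decision Tree inducer, a minimal scikit-learn sketch; Option Tree, Evidence, and Decision Table inducers have no direct equivalent in this library:

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

# Induce a small decision tree; each root-to-leaf path is a rule that
# assigns a discrete label to an unlabeled record.
X, y = load_iris(return_X_y=True)
tree = DecisionTreeClassifier(max_depth=2, random_state=0).fit(X, y)
print(export_text(tree))
```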
