
Data Mining and Data Visualization


bmcdonald

Presentation Transcript


  1. Data Mining and Data Visualization SOM 485 Fall 2007

  2. Getting Started • What is Data Mining? • Online Analytical Processing • Data Mining Techniques • Market Basket Analysis • Limitations and Challenges to Data Mining • Data Visualization • Siftware Technologies

  3. What is Data Mining (DM)? • Group of activities used to find different patterns in data • Information provided through a Data Warehouse • Provides valuable information for different types of research.

  4. Applications of DM • Customer Relationship Management (CRM) software is an application that can benefit from DM • Activities of CRM: One-to-One Marketing, Sales Force Automation, Sales Campaign Management, Marketing Encyclopedia, Call Center Automation

  5. Verification of DM • Requires a lot of prior knowledge on the decision maker’s part • Used mainly in casinos • e.g. Can determine whether a new customer is a high roller, a souvenir buyer, a ticket purchaser, etc. • Uses Siftware to help discover new patterns in customer spending habits • Allows effective targeting of a specific group of customers

  6. Online Analytical Processing • Online Analytical Processing (OLAP) was introduced by E. F. Codd in 1993 • OLAP: a computer process that allows a user to extract data from different viewpoints • Scientific and academic organizations store about 1 terabyte (1 trillion bytes) of new data each day.

  7. OLAP continued: Codd’s 12 Rules for OLAP • Multidimensional View • Transparent to the User • Accessible • Consistent Reporting • Client-Server Architecture • Generic Dimensionality • Dynamic Sparse Matrix Handling • Multi-user Support • Cross-Dimensional Operations • Intuitive Data Manipulation • Flexible Reporting • Infinite Levels of Dimension and Aggregation

  8. OLAP: MOLAP & ROLAP • OLAP data is stored in a Multidimensional Database (MDB) • MOLAP: OLAP application that accesses data from a multidimensional database • MDBs are frequently created using input from an existing Relational Database • ROLAP: OLAP application that accesses data from a Relational Database server, using SQL for portability and scalability.
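The idea of extracting data "from different viewpoints" can be sketched as a roll-up over chosen dimensions. This is only an illustration, not how any particular OLAP product works: the `SALES` records, the `DIMENSIONS` mapping, and the `roll_up` helper are all hypothetical names introduced here.

```python
from collections import defaultdict

# Hypothetical fact records: (region, product, quarter, sales amount)
SALES = [
    ("East", "Pasta", "Q1", 100),
    ("East", "Wine", "Q1", 250),
    ("West", "Pasta", "Q1", 80),
    ("West", "Wine", "Q2", 300),
    ("East", "Pasta", "Q2", 120),
]

# Position of each dimension inside a record (assumed layout).
DIMENSIONS = {"region": 0, "product": 1, "quarter": 2}


def roll_up(records, *dims):
    """Sum the sales measure, grouped by the chosen dimensions."""
    totals = defaultdict(int)
    for record in records:
        key = tuple(record[DIMENSIONS[d]] for d in dims)
        totals[key] += record[3]
    return dict(totals)


# The same data seen from two viewpoints.
by_region = roll_up(SALES, "region")
by_product_quarter = roll_up(SALES, "product", "quarter")
```

A MOLAP system would typically precompute such aggregates in a multidimensional store, while a ROLAP system would express the same grouping as SQL against relational tables.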

  9. DATA MINING TECHNIQUES

  10. FOUR MAJOR CATEGORIES • Classification • Association • Sequence • Cluster

  11. CLASSIFICATION • Mining processes intended to discover rules that define whether an item belongs to a particular class of data • Two Sub-processes: 1) Building a Model 2) Predicting Classifications
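The two sub-processes can be sketched with a nearest-centroid classifier. This is one simple classification technique chosen for illustration, not the method the slides have in mind; the features (average spend per visit, visits per month), the class labels, and the training data are assumptions.

```python
def build_model(training):
    """Sub-process 1: learn one centroid (mean feature vector) per class."""
    sums, counts = {}, {}
    for features, label in training:
        acc = sums.setdefault(label, [0.0] * len(features))
        for i, value in enumerate(features):
            acc[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: [s / counts[label] for s in acc] for label, acc in sums.items()}


def predict(model, features):
    """Sub-process 2: assign the class whose centroid is closest."""
    def sq_dist(centroid):
        return sum((a - b) ** 2 for a, b in zip(features, centroid))
    return min(model, key=lambda label: sq_dist(model[label]))


# Hypothetical training data: (avg spend per visit, visits per month) -> class
TRAINING = [
    ((500.0, 8.0), "high roller"),
    ((700.0, 10.0), "high roller"),
    ((20.0, 1.0), "souvenir buyer"),
    ((40.0, 2.0), "souvenir buyer"),
]

model = build_model(TRAINING)
new_customer_class = predict(model, (450.0, 6.0))
```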

  12. ASSOCIATION • Techniques that employ association search all details from operational systems for patterns with a high probability of repetition • Example: Market Basket Analysis

  13. SEQUENCE • Time series analysis methods relate events in time based on a series of preceding events • Through analysis, various hidden trends, often highly predictive of future events, can be discovered. • Example: Mail Industry
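A minimal sketch of relating an event to the event that preceded it: count how often each event follows another, then predict the most frequent follower. The event names are hypothetical mail-order events, and this first-order counting is only one simple way to do sequence analysis.

```python
from collections import Counter, defaultdict


def transition_counts(events):
    """Count how often each event is immediately followed by each other event."""
    counts = defaultdict(Counter)
    for prev, nxt in zip(events, events[1:]):
        counts[prev][nxt] += 1
    return counts


def most_likely_next(counts, event):
    """Return the most frequent follower of `event`, or None if it was never seen."""
    followers = counts.get(event)
    if not followers:
        return None
    return followers.most_common(1)[0][0]


# Hypothetical event history for one customer.
EVENTS = ["catalog", "order", "catalog", "order",
          "catalog", "return", "catalog", "order"]

counts = transition_counts(EVENTS)
prediction = most_likely_next(counts, "catalog")
```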

  14. CLUSTER • To create partitions so that all members of each set are similar according to some metric • Simply a set of objects grouped together by virtue of their similarity or proximity to each other • Example: Credit Card Transactions
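Partitioning by similarity can be sketched with one-dimensional k-means, where the metric is the absolute difference between a value and a cluster center. The transaction amounts and the starting centroids below are made up, and k-means is just one of many clustering methods.

```python
def k_means_1d(values, centroids, iterations=10):
    """Group values around the nearest centroid, then move each centroid to its group's mean."""
    centroids = list(centroids)
    clusters = [[] for _ in centroids]
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for v in values:
            nearest = min(range(len(centroids)), key=lambda i: abs(v - centroids[i]))
            clusters[nearest].append(v)
        # An empty cluster keeps its previous centroid.
        centroids = [sum(c) / len(c) if c else centroids[i]
                     for i, c in enumerate(clusters)]
    return centroids, clusters


# Hypothetical credit card transaction amounts.
AMOUNTS = [12.0, 15.0, 14.0, 480.0, 510.0, 495.0]

centroids, clusters = k_means_1d(AMOUNTS, centroids=[0.0, 100.0])
```

On this data the small everyday purchases and the large purchases end up in separate partitions.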

  15. DATA MINING TECHNOLOGIES • Providing new answers to old questions • Developing new knowledge and understanding through discovery • Statistical Analysis – statistically evaluating products and making a decision based on logical reasoning • Neural Networks – attempt to mirror the way the human brain works in recognizing patterns by developing mathematical structures with the ability to learn

  16. DATA MINING TECHNOLOGIES CONT'D • Genetic Algorithms and Fuzzy Logic – machine learning techniques that derive meaning from complicated and imprecise data and can extract patterns and detect trends that are far too complex to be noticed by humans • Decision Trees – assist in data mining applications by classifying items or events contained within the warehouse

  17. NEW APPLICATIONS FOR DATA MINING • Two new categories of applications 1) Text Mining – summarizes, navigates, and clusters documents contained in a database 2) Web Mining – integrates data and text mining within a Web site; enhances the Web site with intelligent behavior, such as suggesting related links or recommending new products to the consumer

  18. Market Basket Analysis

  19. Market Basket Analysis

  20. Market Basket Analysis • Market Basket Analysis is an algorithm that examines a long list of transactions in order to determine which items are most frequently purchased together. • It takes its name from the idea of a person in a supermarket throwing all of their items into a shopping cart (a "market basket").
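As a rough sketch of "which items are most frequently purchased together", the snippet below counts every pair of items that shares a basket. The transactions are invented, and real market basket tools use more scalable algorithms than this brute-force count.

```python
from collections import Counter
from itertools import combinations

# Hypothetical transactions, one set of items per basket.
TRANSACTIONS = [
    {"pasta", "wine", "bread"},
    {"pasta", "wine"},
    {"milk", "bread"},
    {"pasta", "wine", "milk"},
    {"milk", "wine"},
]


def pair_counts(transactions):
    """Count how many baskets contain each unordered pair of items."""
    counts = Counter()
    for basket in transactions:
        # Sorting makes ("pasta", "wine") and ("wine", "pasta") the same key.
        for pair in combinations(sorted(basket), 2):
            counts[pair] += 1
    return counts


counts = pair_counts(TRANSACTIONS)
top_pair, top_count = counts.most_common(1)[0]
```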

  21. Market basket analysis is one of the most common and useful types of data analysis for marketing. • With the data gathered from MBA, marketers can identify products that customers like and group them together. • Market basket analysis can improve the effectiveness of marketing and sales tactics.

  22. Benefits of Market Basket Analysis: • A good indication of consumer behavior • Increase in sales • Improves customer satisfaction • Tracks what types of products interest the consumer and finds related alternatives to introduce to the consumer.

  23. ASSOCIATION RULES for MBA • Support • Confidence • Lift • Method • Association rules are a common undirected data mining technique and complement market basket analysis. • These rules are unidirectional: the left-hand side IMPLIES the right-hand side • ex. Pasta IMPLIES Wine, but Wine IMPLIES Pasta may not hold

  24. ex. 40% of transactions that contain Pasta also contain Wine, and 4% of transactions contain both of these items. • Support: the percentage of baskets in which the association rule holds, i.e. baskets that contain both the left-hand side and the right-hand side. ex. 4% of transactions contain both • Confidence: the probability that the right-hand side item is present given that the left-hand side item is present. ex. 40% of transactions that contain Pasta also contain Wine, so p = .40 • Lift: the ratio of the rule's confidence to the likelihood of finding the right-hand side item in any random basket. It measures how well an association rule performs by comparing it with how well the item sells without the other item (improvement); a lift above 1 means the left-hand side item makes the right-hand side item more likely.
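The three measures can be worked through on a small made-up data set that reproduces the slide's figures (4% support, 40% confidence for Pasta IMPLIES Wine). The share of baskets containing Wine (20%) is an assumption added here so that lift can be computed; the slides do not give it.

```python
def rule_metrics(transactions, lhs, rhs):
    """Return (support, confidence, lift) for the rule lhs IMPLIES rhs.

    Assumes at least one basket contains lhs and at least one contains rhs.
    """
    n = len(transactions)
    lhs_count = sum(1 for t in transactions if lhs in t)
    rhs_count = sum(1 for t in transactions if rhs in t)
    both_count = sum(1 for t in transactions if lhs in t and rhs in t)
    support = both_count / n                  # share of baskets with both items
    confidence = both_count / lhs_count       # P(rhs | lhs)
    lift = confidence / (rhs_count / n)       # confidence relative to P(rhs)
    return support, confidence, lift


# 100 hypothetical baskets: 4 with both, 6 pasta only, 16 wine only, 74 neither.
TRANSACTIONS = (
    [{"pasta", "wine"}] * 4 + [{"pasta"}] * 6 + [{"wine"}] * 16 + [{"bread"}] * 74
)

support, confidence, lift = rule_metrics(TRANSACTIONS, "pasta", "wine")
reverse = rule_metrics(TRANSACTIONS, "wine", "pasta")
```

Under these assumed numbers the lift is about 2, meaning a basket with Pasta is roughly twice as likely to contain Wine as a random basket. The reversed rule has the same support but a confidence of only 20%, which illustrates why the rules are unidirectional.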

  25. Method

  26. Market Basket Analysis Market Basket analysis- determines what products customers purchase together

  27. Limits to Market Basket Analysis • A large amount of data is required to obtain meaningful results, but accuracy is compromised if the products do not all occur with similar frequency. • ex. Milk sells in almost every transaction, but Elmer’s glue sells sporadically, so it is not effective to put them in the same basket analysis. • Sometimes presents results that are actually due to the success of previous marketing campaigns. • ex. Discounted price of cola with purchase of pizza.

  28. Using Data from MBA • Once information has been gathered about different items and how they sell with respect to other items, a store may want to change its layout of items to improve its profits. • ex. Lunchboxes and School Supplies • Businesses without an actual storefront may want to offer promotions for products that sell together, increasing sales.

  29. MARKET BASKET ANALYSIS In a Nutshell

  30. Current Limitations and Challenges to Data Mining

  31. Current Limitations & Challenges to Data Mining • New and underdeveloped field • Identification of missing information • Most companies run legacy systems • Not DW (data warehouse) friendly • DW designers have to convert existing ODSs (operational data stores) to the homogeneous form of a DW

  32. Current Limitations & Challenges to Data Mining • Not all knowledge about application domains is present in the data • The contents of ODSs are normally limited to what is needed by the operational application associated with that DB • Data warehouse designers need to include mechanisms for “inventorying” data

  33. Data noise & missing values • Most operational databases contain data errors in their values and/or classification • Errors lead to misclassification • Future data mining systems must incorporate more sophisticated mechanisms for treating “noisy data” • Bayesian technique – a statistical technique

  34. Large Databases & high dimensionality • Databases are large & dynamic • Contents are always changing • Data patterns must be constantly updated • New discovery applications have to partition problems into smaller chunks of manageable data without losing any essential attributes of the data
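One way to picture both points, splitting the work into manageable chunks and keeping patterns up to date as contents change, is to keep running counts that are updated one chunk at a time. The helper names, the records, and the chunk size are assumptions for illustration only.

```python
from collections import Counter


def chunks(records, size):
    """Yield consecutive slices of at most `size` records."""
    for start in range(0, len(records), size):
        yield records[start:start + size]


def update_counts(running, chunk):
    """Fold one chunk of baskets into the running item counts."""
    for basket in chunk:
        running.update(basket)
    return running


# Hypothetical baskets arriving over time.
RECORDS = [{"pasta", "wine"}, {"milk"}, {"pasta"}, {"wine", "milk"}, {"pasta", "milk"}]

running = Counter()
for chunk in chunks(RECORDS, size=2):
    running = update_counts(running, chunk)
```

Because each chunk only adds to the running totals, newly arrived data can be folded in the same way without rescanning the whole database.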

  35. Data Visualization • Process by which numerical data are converted into meaningful 3-D images • Example • Intended to analyze complex data • Data from: satellite photos, sonar measurements, surveys, or computer simulations

  36. History of Data Visualization • Originated from statistics and science • Example of 2-D • Advancement credited to NCSA • National Center for Supercomputing Applications • Newest developments by Xerox PARC in virtual reality

  37. Human Visual Perception • Human visual cortex dominates our perception • Accelerates the identification of hidden patterns in data • “A picture is worth a thousand words”

  38. Geographical Information Systems (GIS) • A special-purpose DB in which a common spatial coordinate system is the primary means of reference • Requires: • Data input • Data storage, retrieval, and query • Data transformation, analysis, and modeling • Data reporting • Integrates information and aids in decision making

  39. GIS continued • Spatial Data – elements stored in map form • Contain three basic components: • Points • Lines • Polygons • Attribute Data – describes spatial data • Example of GIS
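The split between spatial data (points, lines, polygons) and the attribute data that describes it can be sketched with a small record type. The `Feature` class, the coordinates, and the attribute values are hypothetical; real GIS software uses much richer geometry models.

```python
from dataclasses import dataclass, field
from math import hypot


@dataclass
class Feature:
    kind: str        # "point", "line", or "polygon"
    coords: list     # spatial data: list of (x, y) tuples in a shared coordinate system
    attributes: dict = field(default_factory=dict)  # attribute data describing the shape


def line_length(coords):
    """Total length of a line made of straight segments."""
    return sum(hypot(x2 - x1, y2 - y1)
               for (x1, y1), (x2, y2) in zip(coords, coords[1:]))


def polygon_area(coords):
    """Shoelace formula; vertices listed in order, first vertex not repeated."""
    total = 0.0
    for (x1, y1), (x2, y2) in zip(coords, coords[1:] + coords[:1]):
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


store = Feature("point", [(2, 3)], {"name": "Store 12"})
road = Feature("line", [(0, 0), (3, 4), (3, 10)], {"name": "Main St"})
district = Feature("polygon", [(0, 0), (4, 0), (4, 3), (0, 3)], {"population": 1200})
```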

  40. Applications of Data Visualization Techniques • Retail Banking • Government • Insurance • Health Care and Medicine • Telecommunications • Transportation • Capital Markets • Asset Management

  41. Siftware Technologies

  42. Siftware Technologies • IBM • Informix • Red Brick • DB2 • Oracle • Silicon Graphics • Sybase

  43. Offers several Data Mining solutions, depending on the user's needs. • IBM Information Warehouse Solutions • IBM Visualizer • Red Brick

  44. Informix • Three-tier model • Tier 1: “Client” presentation layer • Tier 2: Hewlett-Packard hardware • Tier 3: Data layer INFORMIX –OnLine database

  45. Sybase Warehouse WORKS • Assemble data from many sources • Transform data for a consistent and understandable view • Distribute data where needed • Provide high-speed access to the data

  46. Leading company for large-scale data mining • Data spread across multiple databases • Data spread across processors for faster queries

  47. Discover new patterns and trends that may not be realized using traditional SQL • Three-dimensional Visualization • Visual models can save days and even months from the review process

  48. Review • Data mining (DM) • Techniques used to mine data • Market Basket Analysis: The King of DM Algorithms

  49. Review continued • Current Limitations and Challenges to Data Mining • Data Visualization • Siftware Technologies
