
Data Mining


snowy

Presentation Transcript


  1. Data Mining Knowledge Creation in Data Warehouses

  2. Objectives • Understand what data mining is and identify where it might be used in a warehouse environment. • Explain the techniques used in data mining. • Identify data mining tools. • Identify the possible results of a data mining query.

  3. What is Data Mining? • A new technology • Its goals include discovering patterns and relationships that would otherwise stay hidden from users. • It answers business questions that would be too time-consuming, or impossible, to answer with traditional methods of data analysis.

  4. What else is it? • Data mining is often used synonymously with knowledge discovery and has even been described as information archaeology. • It can help predict trends, such as buying patterns. • It is the automated discovery of new facts, information and data relationships in large volumes of data.

  5. Why Data Mining? • Organisations generate more information in a week than most people could read in a lifetime; it is humanly impossible to decipher and interpret all of this data. • The sheer volume of data makes analysis by traditional methods (SQL queries) impractical. • A wide range of sophisticated applications is available on the market. • Data mining needs lots of computing power, and parallel hardware and components are becoming affordable.

  6. More reasons • Organisations want to build better customer relationships • They want to find out who the valuable long-term customers are • They want to discover natural classifications (segmentation) • Competition

  7. Uses of Data Mining • The Focus – Business Objectives • Who are my customers - Customer Profiling • What products and/or services are most valued by my customers? • Market Segmentation (by demographics) • Driving promotions or discounts • Trend prediction • Pattern discovery

  8. OLAP vs. Data Mining

  9. Data mining applications • Retail sector: determining which products to offer and how to market them • Affinities between customer groups and products • Identification of customers who are more likely to respond to marketing efforts • Clustering: finding groups or segments within a customer data set

  10. Applications cont. • Predicting customer “churn” • Building separate business models for different customer types • Identification of successful employees (HR) • Market basket analysis: co-occurrence relationships between activities performed by individuals or groups, such as customers' purchase behaviour (for cross-selling or upselling) • For example, shampoo and conditioner are usually purchased together, so putting them on promotion together would not increase sales.
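
The core counting step of market basket analysis can be sketched in a few lines. This is a minimal illustration, not a production algorithm such as Apriori: the baskets below are made-up data, and `pair_counts` is a hypothetical helper that counts how many baskets contain each pair of items and reports support (the fraction of all baskets containing the pair).

```python
from collections import Counter
from itertools import combinations

# Hypothetical transactions: each basket is the set of items bought together.
baskets = [
    {"shampoo", "conditioner", "soap"},
    {"shampoo", "conditioner"},
    {"bread", "milk"},
    {"shampoo", "conditioner", "milk"},
    {"bread", "soap"},
]

def pair_counts(baskets):
    """Count how many baskets contain each unordered pair of items."""
    counts = Counter()
    for basket in baskets:
        # Sorting gives each pair one canonical order, so (a, b) and (b, a) merge.
        for pair in combinations(sorted(basket), 2):
            counts[pair] += 1
    return counts

counts = pair_counts(baskets)
for (a, b), n in counts.most_common(3):
    # Support: the fraction of all baskets that contain both items.
    print(f"{a} + {b}: {n} baskets, support {n / len(baskets):.2f}")
```

With this toy data the pair (conditioner, shampoo) comes out on top, appearing in 3 of the 5 baskets. Counting every pair is fine for a handful of items; real tools prune rare itemsets first because the number of pairs grows quadratically.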

  11. Applications cont. • Alpha consumers – using Market basket analysis. • Michael Wolf • The first few people to see the next hot movie, the first few people to own a cellular phone, the first few people to wear the new pastels and brights—all achieve a status boost by being in the know, being the one others follow.  • Theirs is the key role of connecting with the concept behind a product, then adopting that product, and finally validating it for the rest of society. • Allows companies to predict future buying trends

  12. Science and engineering • Bioinformatics • Genetics • Electric power generation • Condition monitoring • Spatial data mining • Finding patterns in geography, populations, the environment, climate change, land use, the spread of disease, etc.

  13. Other uses • Surveillance • The US Government's use is extensive • TIA (Total Information Awareness) system and MATRIX • Collects data from social networks • Credit card usage, emails • Medical records, phone calls • Data is collected without requiring search warrants • MATRIX: Multi-state Anti-Terrorism Information Exchange (defunded in 2005)

  14. Pattern and Subject Mining • Pattern-based data mining looks for patterns (including anomalous data patterns) that might be associated with terrorist activity; these patterns might be regarded as small signals in a large ocean of noise. • Subject-based data mining starts from an initiating individual or other datum that is considered, based on other information, to be of high interest; the goal is to determine which other persons, financial transactions, movements, etc., are related to that initiating datum.
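
Subject-based mining, as described above, amounts to following links outward from one starting entity. A minimal sketch, assuming the links are already available as an adjacency list: `links` and `related_within` are hypothetical names, and the search is a plain breadth-first search that stops after a fixed number of hops.

```python
from collections import deque

def related_within(links, start, max_hops):
    """Breadth-first search from one initiating entity. Returns every entity
    reachable within max_hops links, mapped to its hop distance."""
    distance = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if distance[node] == max_hops:
            continue  # do not expand beyond the hop limit
        for neighbour in links.get(node, ()):
            if neighbour not in distance:
                distance[neighbour] = distance[node] + 1
                queue.append(neighbour)
    return distance

# Hypothetical link data: an entry "A": ["B"] means A is linked to B
# (for example, a phone call or a money transfer).
links = {
    "A": ["B", "C"],
    "B": ["D"],
    "C": ["D", "E"],
    "D": ["F"],
}
print(related_within(links, "A", 2))
```

Here the links are directed; for relationships that run both ways, each link would be stored under both entities. "F" is not returned because it is three hops from "A".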

  15. Privacy and Ethics • The ways in which data mining can be used raise questions of privacy, legality and ethics. • It can uncover information or patterns that compromise confidentiality and privacy obligations. • Individuals should be made aware of the purpose of data collection, how the data will be used, who will see it, the security surrounding it, and how collected data might be updated.

  16. Facebook, Not what you think

  17. Data Mining Techniques: What DM Does • Cluster detection • What kinds of values, attitudes and ideas do people have, and how could you identify them? • Linkage analysis (associations and sequential patterns), heavily used in genetics • Self-motivating/learning techniques (genetic algorithms and neural networks)

  18. How DM Does It • Cluster analysis • Addresses market segmentation problems • Algorithms search for groupings, clusters or “segments” of data • An individual could belong to just one cluster, or clusters could overlap • Probability weightings
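
One common clustering algorithm is k-means, sketched below in plain Python. It is one choice among many, not necessarily the one any particular tool uses. The customer data and the `kmeans` helper are hypothetical. The loop alternates two steps: assign each point to its nearest centroid, then move each centroid to the mean of its assigned points.

```python
import math
import random

def kmeans(points, k, iterations=20, seed=0):
    """Plain k-means: assign each point to its nearest centroid, then move
    each centroid to the mean of its assigned points, and repeat."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k randomly chosen data points
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        for i, members in enumerate(clusters):
            if members:  # an empty cluster keeps its previous centroid
                centroids[i] = tuple(sum(axis) / len(members) for axis in zip(*members))
    return centroids, clusters

# Hypothetical customers: (age, annual spend in thousands).
customers = [(22, 1.0), (25, 1.2), (23, 0.9), (51, 4.8), (55, 5.1), (60, 4.9)]
centroids, segments = kmeans(customers, k=2)
for centre, members in zip(centroids, segments):
    print(centre, members)
```

k-means gives each individual exactly one cluster. The overlapping membership and probability weightings mentioned on the slide call for a soft method such as a Gaussian mixture model. In practice the features would also be scaled first, since here age dominates the distance.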

  19. Techniques – Cont. • Decision trees • Rule-based tools used for classification and prediction • A bank or mortgage company may use a decision tree algorithm to determine whether it should grant you a loan • The algorithm makes predictions about your ability to pay back the loan.
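
A decision tree for the loan example can be learned from past cases. This is a minimal sketch, assuming Gini impurity as the split criterion (one common choice). The applicant data are made up, and `build_tree` and `predict` are hypothetical helpers. Each split is a rule of the form "feature <= threshold", and the leaves hold the predicted outcome (1 = repaid).

```python
def gini(labels):
    """Gini impurity of a group of 0/1 labels: 0 when all labels agree."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)  # fraction of 1s (repaid)
    return 1 - p**2 - (1 - p)**2

def best_split(rows, labels):
    """Try every feature/threshold pair and keep the one whose two groups
    have the lowest weighted Gini impurity."""
    best = None
    for f in range(len(rows[0])):
        for t in sorted({r[f] for r in rows}):
            left = [y for r, y in zip(rows, labels) if r[f] <= t]
            right = [y for r, y in zip(rows, labels) if r[f] > t]
            if not left or not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
            if best is None or score < best[0]:
                best = (score, f, t)
    return best

def build_tree(rows, labels, depth=2):
    """Split recursively until a group is pure or the depth limit is hit.
    A node is (feature, threshold, left, right); a leaf is a 0/1 label."""
    split = best_split(rows, labels) if depth > 0 else None
    if split is None or gini(labels) == 0:
        return int(sum(labels) * 2 >= len(labels))  # majority label
    _, f, t = split
    left = [(r, y) for r, y in zip(rows, labels) if r[f] <= t]
    right = [(r, y) for r, y in zip(rows, labels) if r[f] > t]
    return (f, t,
            build_tree([r for r, _ in left], [y for _, y in left], depth - 1),
            build_tree([r for r, _ in right], [y for _, y in right], depth - 1))

def predict(tree, row):
    """Follow the rules from the root down to a leaf."""
    while isinstance(tree, tuple):
        f, t, left, right = tree
        tree = left if row[f] <= t else right
    return tree

# Hypothetical past applicants: (salary in thousands, years in job).
rows = [(25, 1), (30, 0), (28, 6), (45, 2), (60, 8), (70, 3), (80, 10)]
labels = [0, 0, 0, 1, 1, 1, 1]  # 1 = repaid the loan
tree = build_tree(rows, labels)
print(tree, predict(tree, (50, 1)))
```

On this toy data the learned rule is simply "salary <= 30 predicts no repayment". Real data would produce deeper trees, and library implementations usually place thresholds midway between observed values.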

  20. Techniques – Cont. • Association discovery • Finds affinities between items • Looks for a degree of confidence higher than that of a random guess • Discovering such associations enables organisations to run more effective marketing and merchandising campaigns
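
"Confidence higher than a random guess" can be made concrete with the standard association-rule measures. In the sketch below, confidence is how often baskets with the first item also hold the second. The "random guess" baseline is how often the second item appears at all, and lift is their ratio. `rule_stats` and the basket data are hypothetical.

```python
def rule_stats(baskets, antecedent, consequent):
    """Support, confidence and lift for the rule "antecedent -> consequent"."""
    n = len(baskets)
    with_a = sum(1 for b in baskets if antecedent in b)
    with_c = sum(1 for b in baskets if consequent in b)
    with_both = sum(1 for b in baskets if antecedent in b and consequent in b)
    if with_a == 0 or with_c == 0:
        raise ValueError("both items must appear in at least one basket")
    support = with_both / n            # how often the two items occur together
    confidence = with_both / with_a    # estimate of P(consequent | antecedent)
    baseline = with_c / n              # P(consequent): the "random guess" rate
    lift = confidence / baseline       # > 1 means better than the random guess
    return support, confidence, lift

# Hypothetical baskets.
baskets = [
    {"bread", "butter"},
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"bread", "jam"},
    {"milk"},
    {"milk", "eggs"},
    {"butter", "eggs"},
    {"eggs"},
]
support, confidence, lift = rule_stats(baskets, "bread", "butter")
print(support, confidence, lift)
```

For "bread -> butter" the confidence is 0.75 against a baseline of 0.5 (butter is in half of all baskets), giving a lift of 1.5. The rule is therefore better than a random guess.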

  21. Techniques – Cont. • Neural Networks and Genetic Algorithms • The input layer might be, for example, the parameters required for a mortgage, such as salary, age, marital status, education and time in job. • The output evaluates to True or False; however, all parameters, input and output, are usually values between 0 and 1.
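
Mortgage parameters such as salary and age have very different ranges. To turn them into inputs between 0 and 1, one common approach is min-max scaling, sketched below. The field names and applicant records are hypothetical. Education is left out because a categorical attribute would need its own encoding, such as one 0/1 input per category.

```python
def min_max_scale(values):
    """Rescale numbers linearly so the smallest maps to 0 and the largest to 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]  # constant column: no information to scale
    return [(v - lo) / (hi - lo) for v in values]

# Hypothetical applicants.
applicants = [
    {"salary": 30000, "age": 25, "married": False, "years_in_job": 1},
    {"salary": 55000, "age": 40, "married": True, "years_in_job": 10},
    {"salary": 80000, "age": 55, "married": True, "years_in_job": 20},
]

def encode(applicants):
    """Build one input vector per applicant, every value in [0, 1]."""
    columns = {
        "salary": min_max_scale([a["salary"] for a in applicants]),
        "age": min_max_scale([a["age"] for a in applicants]),
        # A yes/no attribute becomes 0.0 or 1.0 directly.
        "married": [float(a["married"]) for a in applicants],
        "years_in_job": min_max_scale([a["years_in_job"] for a in applicants]),
    }
    # Transpose the columns back into one row per applicant.
    return [list(row) for row in zip(*columns.values())]

inputs = encode(applicants)
for row in inputs:
    print(row)
```

In a real system the minimum and maximum would be taken from the training data and reused for new applicants, so that a given salary always maps to the same input value.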

  22. Neural networks • Best when the data appears shapeless or other techniques have not worked well. • Nodes receive the data values (the predictors, at the input nodes), and weightings are applied to the predictors. The inner or hidden layer reapplies the predictor weightings over and over using a variety of different functions.
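
The flow described above, from weighted inputs through a hidden layer to a single output between 0 and 1, can be sketched as a forward pass. This is a minimal illustration, assuming a sigmoid activation and one hidden layer of two nodes. The weights are made-up numbers; in a real network they are learned from training data, for example by backpropagation, which is not shown.

```python
import math

def sigmoid(x):
    """Squash any real number into the open interval (0, 1)."""
    return 1 / (1 + math.exp(-x))

def forward(inputs, hidden_weights, hidden_biases, output_weights, output_bias):
    """One forward pass through a network with a single hidden layer."""
    # Each hidden node: weighted sum of all inputs plus a bias, then sigmoid.
    hidden = [
        sigmoid(sum(w * x for w, x in zip(weights, inputs)) + b)
        for weights, b in zip(hidden_weights, hidden_biases)
    ]
    # The output node repeats the same step over the hidden activations.
    return sigmoid(sum(w * h for w, h in zip(output_weights, hidden)) + output_bias)

# Hypothetical weights for 3 inputs and 2 hidden nodes (normally learned).
hidden_weights = [[0.8, -0.4, 0.3], [-0.6, 0.9, 0.5]]
hidden_biases = [0.1, -0.2]
output_weights = [1.2, -0.7]
output_bias = 0.05

# Inputs already scaled to [0, 1], e.g. salary, age, married.
score = forward([0.5, 0.2, 1.0], hidden_weights, hidden_biases, output_weights, output_bias)
approve = score >= 0.5  # threshold the (0, 1) score into True/False
print(round(score, 3), approve)
```

The threshold of 0.5 is an assumption. A lender could raise it to approve fewer, safer loans.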

  23. Neural Networks • Attempt to model how the human brain works • A number of inputs represent a variety of variables (dimensions) • Input and output values are continuous, in the range between 0 and 1 • Neural networks are very effective at discovering and predicting outcomes in datasets

  24. Evaluating Data Mining Tools • Is the existing infrastructure and data representation method able to support the proposed data-mining tool? • Is the data set adequately prepared for data-mining? • How well does the tool integrate with other enterprise data warehouse toolsets? • What kind of modelling support is offered? • What is the output of the data-mining tool? • Is the tool scalable? • User Support and security?

  25. Overview of Data Mining Applications • E-Commerce • Marketing • Decision Making • Retail Sector • Stock investment analysis, futures and commodities trading • Web Analysis • Banking • Biotechnology and Health • Customer Relationship Management (CRM) • Telecommunications • Tourism and Travel • Surveillance • Fraud detection
