The Art and Science of Data Mining

By Marianna Dizik, Williams-Sonoma, Inc. Data mining is the process of exploration and analysis, by automatic or semi-automatic means, of large quantities of data in order to discover meaningful patterns and rules.


The Art and Science of Data Mining

By Marianna Dizik,

Williams-Sonoma, Inc.

What is Data Mining?
  • Data Mining is the process of exploration and analysis, by automatic or semi-automatic means, of large quantities of data in order to discover meaningful patterns and rules
What Can Data Mining Do?
  • Directed Data Mining:
      • Classification
      • Estimation
      • Prediction
      • The goal: to build a model that describes or predicts one particular variable of interest in terms of the rest of the available data
  • Undirected Data Mining:
      • Affinity grouping or association rules
      • Clustering
      • Description or visualization
      • The goal: to establish relationships among all the variables
The Business Context for Data Mining
  • DM as a Research Tool
      • Pharmaceutical, bioinformatics
  • DM for Process Improvement
      • Manufacturing process control
  • DM for Marketing
  • DM for Customer Relationship Management
Technical Context for Data Mining
  • DM and Machine Learning:
      • AI, pattern recognition
  • DM and Statistics
  • DM and Decision Support
      • Data Warehouse:
      • OLAP, Data Marts, and Multidimensional Databases
        • Single view of the data to increase speed and responsiveness
  • DM and Computer Technology
Classical Techniques: Statistics
  • Descriptive Statistics and Visualization
  • Statistics for Prediction:
      • Linear Regression
          • Power Transformation
      • Weighted Least Squares (WLS) Regression
      • Logistic Regression: ln(p/(1-p)) = a + Bx
          • p/(1-p): the odds
          • ln(p/(1-p)): the log-odds (logit)
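
The logistic regression equation on this slide can be made concrete with a few lines of Python. This is a minimal sketch, not the presenter's code: the coefficient values are hypothetical, and the lowercase `b` stands in for the slide's `B`. It shows how the odds, the logit, and the predicted probability relate to one another.

```python
import math

def odds(p):
    """Odds of an event with probability p (0 < p < 1)."""
    return p / (1.0 - p)

def logit(p):
    """Log-odds: the left-hand side of ln(p/(1-p)) = a + b*x."""
    return math.log(odds(p))

def predict_probability(a, b, x):
    """Invert the logit to recover p = 1 / (1 + exp(-(a + b*x)))."""
    return 1.0 / (1.0 + math.exp(-(a + b * x)))

# Hypothetical coefficients, chosen only for illustration.
a, b = -2.0, 0.5
p = predict_probability(a, b, x=4.0)  # a + b*x = 0, so p = 0.5
print(odds(p), logit(p))              # prints 1.0 0.0
```

Because the logit is linear in x, each unit increase in x adds `b` to the log-odds, which multiplies the odds by exp(b).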
Classic Techniques: Neighborhoods
  • Clustering: grouping similar objects
    • Hierarchical and Non-Hierarchical
      • Clustering for clarity (data reduction)
      • Clustering for action (policies, rules, promotions)
      • Clustering for outliers (fraud detection)
  • Nearest Neighbor Prediction
      • Recommendation Engines
      • Response Prediction
      • Stock Market Predictions
      • Building Taxonomies
      • Text Mining, Spam Identification
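
Nearest neighbor prediction, as listed above, can be illustrated with a small k-nearest-neighbor classifier. The sketch below assumes Euclidean distance and majority voting, which are common choices but not ones the slides specify. The customer data and the feature meanings are invented for illustration.

```python
import math
from collections import Counter

def knn_predict(train, query, k=3):
    """Predict the label of `query` by majority vote among its k nearest
    training points, using Euclidean distance."""
    by_distance = sorted(train, key=lambda row: math.dist(row[0], query))
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]

# Hypothetical customers: (recent orders, catalog pages viewed) -> responded?
train = [
    ((1.0, 2.0), "no"),
    ((2.0, 1.0), "no"),
    ((1.5, 1.5), "no"),
    ((6.0, 7.0), "yes"),
    ((7.0, 6.0), "yes"),
    ((6.5, 6.5), "yes"),
]
print(knn_predict(train, (6.0, 6.0)))  # prints yes
```

The same idea underlies response prediction and simple recommendation engines: a new record is scored by looking at the outcomes of the records most similar to it.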
Next Generation Techniques: Trees
  • Decision Trees: Segmentation with a Purpose
      • Exploration
      • Preprocessing
      • Prediction
          • CHAID (Chi-squared Automatic Interaction Detection)
          • CART (Classification and Regression Trees; uses the Gini impurity metric)
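
To make the Gini metric concrete, the following sketch computes the Gini impurity of a set of class labels and the size-weighted impurity of a candidate two-way split. It is an illustrative sketch only: the labels are invented, and a full CART implementation would also search over candidate splits and grow the tree recursively.

```python
from collections import Counter

def gini_impurity(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())

def split_impurity(left, right):
    """Size-weighted Gini impurity of a two-way split."""
    n = len(left) + len(right)
    return (len(left) / n) * gini_impurity(left) + (len(right) / n) * gini_impurity(right)

# Hypothetical labels for six customers.
parent = ["buy", "buy", "buy", "no", "no", "no"]
left, right = ["buy", "buy", "buy"], ["no", "no", "no"]
print(gini_impurity(parent))        # prints 0.5
print(split_impurity(left, right))  # prints 0.0
```

A split is attractive when its weighted impurity is well below the parent's impurity. Here the split separates the two classes perfectly, so the impurity drops from 0.5 to 0.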
Next Generation Techniques: Neural Networks
  • Neural Networks: Black Box Approach
      • Supervised Learning
        • Predictions
      • Unsupervised Learning
        • Clustering
          • Kohonen Network
        • Outlier Analysis
        • Feature Extraction
  • NN Models
      • Input Nodes
      • Hidden Nodes
      • Output Nodes
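
The input, hidden, and output nodes listed above can be sketched as a single forward pass through a small feed-forward network. The sigmoid activation, the layer sizes, and the weights below are all assumptions made for illustration. The weights are untrained, and training is not shown.

```python
import math

def sigmoid(z):
    """Squash a weighted sum into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def layer(inputs, weights, biases):
    """One fully connected layer: each node applies a sigmoid to its
    weighted sum of inputs plus a bias."""
    return [
        sigmoid(sum(w * x for w, x in zip(node_weights, inputs)) + bias)
        for node_weights, bias in zip(weights, biases)
    ]

def forward(inputs, hidden_w, hidden_b, output_w, output_b):
    """Input nodes -> hidden nodes -> output nodes."""
    hidden = layer(inputs, hidden_w, hidden_b)
    return layer(hidden, output_w, output_b)

# Hypothetical, untrained weights: 2 input nodes, 2 hidden nodes, 1 output node.
hidden_w = [[0.5, -0.5], [0.25, 0.75]]
hidden_b = [0.0, 0.0]
output_w = [[1.0, -1.0]]
output_b = [0.0]
print(forward([1.0, 1.0], hidden_w, hidden_b, output_w, output_b))
```

The "black box" label on the slide reflects that the learned weights in the hidden layer rarely have a direct business interpretation, even though the arithmetic itself is simple.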
Next Generation Techniques: Rule Induction
  • Rule Induction: knowledge discovery in an unsupervised learning system
      • “If pickles are purchased, then ketchup is purchased”
  • Rule + Accuracy + Coverage
  • Target
      • The Antecedent
      • The Consequent
        • Based on Accuracy
        • Based on Coverage
        • Based on “Interestingness”
  • Rules do not Imply Causality
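
Accuracy and coverage for a rule such as the pickles and ketchup example can be computed directly from transaction data. The sketch below assumes that accuracy means the share of antecedent baskets that also contain the consequent, and that coverage means the share of all baskets containing the antecedent. The slides do not define either term, and the baskets are invented.

```python
def rule_stats(baskets, antecedent, consequent):
    """Return (accuracy, coverage) for the rule 'if antecedent then consequent'.

    accuracy: among baskets containing the antecedent, the fraction that
              also contain the consequent.
    coverage: the fraction of all baskets that contain the antecedent.
    """
    with_antecedent = [b for b in baskets if antecedent in b]
    coverage = len(with_antecedent) / len(baskets)
    if not with_antecedent:
        return 0.0, coverage
    hits = sum(1 for b in with_antecedent if consequent in b)
    return hits / len(with_antecedent), coverage

# Hypothetical market baskets.
baskets = [
    {"pickles", "ketchup", "buns"},
    {"pickles", "ketchup"},
    {"pickles", "mustard"},
    {"ketchup", "buns"},
    {"milk"},
]
accuracy, coverage = rule_stats(baskets, "pickles", "ketchup")
print(accuracy, coverage)
```

In this toy data the rule applies to three of five baskets and is correct in two of those three. A high accuracy still says nothing about whether buying pickles causes anyone to buy ketchup.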
Conclusion: Next Generation Business Intelligence and Data Mining
  • Information Mining:
      • Uncover associations, patterns and trends
      • Detect deviations
      • Group and classify information
      • Develop predictive models
  • Knowledge Management (KM)
      • Text Mining
  • Data “Archeology”:
      • Semantic Computing Tools
      • Artificial Perception Systems