Data Mining with Clementine

Girish Punj

Professor of Marketing

School of Business

University of Connecticut


Agenda

  • How to introduce data mining to students

  • Why Clementine?

  • Clementine features and capabilities

  • A typical data mining class

  • Useful teaching resources

  • Questions?


Introduce Data Mining to Students

  • “Data mining chosen as one of top 10 emerging technologies...” (MIT Technology Review)

  • “Data mining expertise is most sought after...” (Information Week Survey)

  • Data mining skills are an important part of the “toolkit” needed by managers in a complex business world

  • Data Mining for job advancement and as career insurance during good and bad economic times


Introduce Data Mining to Students

“When I looked at what companies were doing with analytics I found it had moved from the back room to the board room…a number of companies weren’t just using analytics, they were now competing on analytics -- they had made analytics the central strategy of their business.”

(Tom Davenport, author of ‘Competing on Analytics’)

“We are drowning in information but starved for knowledge.”

(John Naisbitt, author of ‘Megatrends’)


Applications: Retail

  • Use data mining to understand customers’ wants, needs, and preferences

  • Based on this information, deliver timely, personalized promotional offers


Applications: Insurance

Leverage data and text mining to speed claims processing and help reduce fraud


Applications: Manufacturing

Model historical production and quality data to reduce development time and improve quality of production processes


Applications: Telecom

Use data mining to identify appropriate customer segments for new marketing initiatives

Predict likelihood of customer churn and target those likely to leave with retention campaigns



Data Mining and Knowledge Discovery

  • Data mining is the process of discovery of interesting, meaningful and actionable patterns hidden in large amounts of data (Han and Kamber 2006)

  • Knowledge Discovery (KD) as a more inclusive term

  • Knowledge Discovery using a combination of artificial and human intelligence

  • Data → Information → Knowledge


Data Mining and Statistics

Data Mining

  • No hypotheses are needed

  • Can find patterns in very large amounts of data

  • Uses all the data available

  • Terminology used: field, record, supervised learning, unsupervised learning

Statistics

  • Uses hypothesis testing

  • Techniques are not suitable for large datasets

  • Relies on sampling

  • Terminology used: variable, observation, analysis of dependence, analysis of interdependence


Deal with Numerophobia

http://www.youtube.com/watch?v=nRKzseCLja8

Emphasize Differences between Statistics and Data Mining to advantage (no probability distributions)

Use a math primer for numerically challenged students


Introduce Software to Students

  • Clementine 12.0:

    • Student Version (Clementine GradPack) is of enterprise strength

    • Student License extends for about eight months beyond course completion date

    • Directly address cost concerns by discussing value of “investment”


Who was Clementine?

http://www.empire.k12.ca.us/capistrano/mike/capmusic/the_wild_west/gold_rush/clemtine.mid

Daughter of a miner during the 1849 California Gold Rush who developed a reputation…

“In a cavern, in a canyon,
Excavating for a mine
Dwelt a miner, forty niner,
And his daughter Clementine…”


Introduce Software to Students

  • Visual approach makes model building an art form

  • Concept of “data flow” enables building of multiple models

  • Point-and-click model building (no manual coding)

  • Comprehensive portfolio of models for the Business Analyst as well as the Technical Expert









Clementine Basics: Visualize Data

Create tables and charts for means, ranges, and correlations of all variables


Clementine Basics: Visualize Data

Examine associations among variables using visual displays of all variables


Clementine Basics: Select Target and Predictors




Building Models in Clementine

Models:

  • Up sell / Cross sell: Creating business rules for Up sell & Cross sell

  • Customer Churn: Identify and target likely churn candidates, and create retention offerings to decrease their likelihood to churn

  • Propensity to respond/purchase: Develop models on desired purchase behavior, and target candidates that are most likely to respond


A Typical Clementine Model


Modeling Approaches

  • Can use auto “c.h.d” settings (beginning user)

  • But can also use expert capabilities (advanced user)


Data Mining Procedures

  • Estimation

  • Prediction

  • Classification

  • Clustering

  • Affinity/Association


Specific Methodologies Available

  • Estimation & Prediction:

    - Neural networks

  • Classification:

    - Decision trees (2 types)


Specific Methodologies Available

  • Clustering:

    - K-means

    - Kohonen networks

  • Affinity/Association:

    - Association rules (2 types)


Positioning the Course

[Diagram labels: Business Applications; Theory and Concepts; Clementine Models; Focus of the Course]


A Typical Class

  • Discuss business applications of methodology based on brief articles from the business press (30 minutes)

  • Present theory and concepts (30 minutes)

  • Build a Clementine model for students (30 minutes)

  • Ask students to build a Clementine model (30 minutes)

  • Discuss homework assignment (15 minutes)

  • Students complete a homework assignment after class (requires three hours)


Discuss Business Applications

“Wal-Mart's next competitive weapon is advanced data mining, which it will use to forecast, replenish and merchandise on a micro scale.

By analyzing years' worth of sales data--and then cranking in variables such as the weather and school schedules--the system could predict the optimal number of cases of Gatorade, in what flavors and sizes, a store in Laredo, Texas, should have on hand the Friday before Labor Day.

Then, if the weather forecast suddenly called for temperatures 5° hotter than last year, the delivery truck would automatically show up with more.”

From: “Can Wal-Mart Get Any Bigger?” Time, 13 January 2003


Present Theory and Concepts

  • Are window cleaning products also purchased when detergents and orange juice are bought together?

  • Where should detergents be placed in the store to maximize their sales?

  • Is soda typically purchased with bananas? Does the brand of soda make a difference?

  • How are the demographics of the neighborhood affecting what customers are buying?

From: Data Mining Techniques by Michael J. A. Berry and Gordon S. Linoff


Present Theory and Concepts

  • Start with a record of past purchase transactions that link items purchased together

From: Data Mining Techniques by Michael J. A. Berry and Gordon S. Linoff


Present Theory and Concepts

  • Create a co-occurrence matrix that pairs items purchased together in the form of a table

  • The co-occurrence matrix shows the number of times the “row” item was purchased with the “column” item (note that the matrix is symmetrical)

From: Data Mining Techniques by Michael J. A. Berry and Gordon S. Linoff
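The co-occurrence table described above can be sketched in a few lines of Python. The slide's own transaction table did not survive in this transcript, so the five baskets below are a hypothetical dataset, chosen only so that OJ appears in 4 baskets, soda in 3, and the two together in 2, as the later slides state. The function name `co_occurrence` is likewise an illustrative choice.

```python
from collections import Counter
from itertools import combinations

# Hypothetical baskets (an assumption, not the slide's table), chosen so that
# OJ appears in 4 baskets, soda in 3, and the two together in 2.
BASKETS = [
    {"OJ", "soda"},
    {"milk", "OJ", "window cleaner"},
    {"OJ", "detergent"},
    {"OJ", "detergent", "soda"},
    {"window cleaner", "soda"},
]


def co_occurrence(baskets):
    """Count, for every pair of items, how many baskets contain both."""
    counts = Counter()
    for basket in baskets:
        for item in basket:
            counts[(item, item)] += 1  # diagonal: baskets containing the item
        for a, b in combinations(sorted(basket), 2):
            counts[(a, b)] += 1
            counts[(b, a)] += 1  # mirror entry keeps the matrix symmetrical
    return counts


matrix = co_occurrence(BASKETS)
items = sorted(set().union(*BASKETS))

# Print the matrix as a table: one row and one column per item.
print(" " * 15 + "".join(f"{name:>15}" for name in items))
for row in items:
    print(f"{row:<15}" + "".join(f"{matrix[(row, col)]:>15}" for col in items))
```

Reading across the OJ row gives the number of baskets in which OJ was bought with each other item; the diagonal holds the plain item counts.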


Present Theory and Concepts

Rule Support = Percentage of transactions with both the items of interest

What is the Support for the rule “If Soda, then OJ”?

OJ and Soda are purchased together in 2 out of 5 transactions

Hence Support is 40%

What is the Support for the rule “If OJ, then Soda”?

Still 40%

From: Data Mining Techniques by Michael J. A. Berry and Gordon S. Linoff
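As a quick check of the arithmetic above, here is a minimal Python sketch of Rule Support. The baskets are a hypothetical dataset (the slide's table is not in the transcript) built to match the stated counts, and `rule_support` is an illustrative name rather than anything from Clementine.

```python
# Hypothetical baskets (an assumption): OJ and soda co-occur in 2 of 5.
BASKETS = [
    {"OJ", "soda"},
    {"milk", "OJ", "window cleaner"},
    {"OJ", "detergent"},
    {"OJ", "detergent", "soda"},
    {"window cleaner", "soda"},
]


def rule_support(baskets, antecedent, consequent):
    """Fraction of all baskets that contain both sides of the rule."""
    both = set(antecedent) | set(consequent)
    hits = sum(1 for basket in baskets if both <= basket)
    return hits / len(baskets)


# The two directions of the rule share the same item set, so they match.
print(rule_support(BASKETS, {"soda"}, {"OJ"}))  # 0.4
print(rule_support(BASKETS, {"OJ"}, {"soda"}))  # 0.4
```

Because Rule Support depends only on the set of items in the rule, swapping the “If” and “then” sides cannot change it, which is why the slide's second answer is “Still 40%”.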


Present Theory and Concepts

Confidence = Ratio of the number of transactions with both the items of interest to the number of transactions with the “If” items

What is the Confidence for “If Soda, then OJ”?

2 out of 3 soda purchase transactions also include OJ

Hence Confidence is 66.67%

What is the Confidence for “If OJ, then Soda”?

2 out of 4 OJ purchase transactions also include soda

Hence Confidence is 50%

From: Data Mining Techniques by Michael J. A. Berry and Gordon S. Linoff
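Unlike support, confidence depends on the direction of the rule. The sketch below reproduces the two figures above on a hypothetical five-basket dataset (an assumption made to match the counts on the slide); the function name `confidence` is illustrative.

```python
# Hypothetical baskets (an assumption): soda in 3, OJ in 4, both in 2.
BASKETS = [
    {"OJ", "soda"},
    {"milk", "OJ", "window cleaner"},
    {"OJ", "detergent"},
    {"OJ", "detergent", "soda"},
    {"window cleaner", "soda"},
]


def confidence(baskets, antecedent, consequent):
    """Share of the baskets containing the "If" items that also contain
    the "then" items."""
    antecedent, consequent = set(antecedent), set(consequent)
    with_if = [basket for basket in baskets if antecedent <= basket]
    if not with_if:
        raise ValueError("no basket contains the antecedent")
    both = sum(1 for basket in with_if if consequent <= basket)
    return both / len(with_if)


print(f"{confidence(BASKETS, {'soda'}, {'OJ'}):.2%}")  # 66.67%
print(f"{confidence(BASKETS, {'OJ'}, {'soda'}):.2%}")  # 50.00%
```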


Present Theory and Concepts

  • Support (Prevalence): Percentage of records in the dataset that match the antecedent

    Support = p(antecedent)

    Note that this is not the same quantity as Rule Support, which counts records that match both the antecedent and the consequent.

From: Data Mining Techniques by Michael J. A. Berry and Gordon S. Linoff


Present Theory and Concepts

  • Confidence (Predictability): Percentage of records matching the antecedent that also match the consequent

    Confidence = p(antecedent and consequent) / p(antecedent)

From: Data Mining Techniques by Michael J. A. Berry and Gordon S. Linoff


Present Theory and Concepts

  • Lift (Improvement): How much better is a rule at predicting the consequent than chance alone?

    Lift = confidence / p(consequent)

  • A rule is only useful if Lift is > 1

From: Data Mining Techniques by Michael J. A. Berry and Gordon S. Linoff
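The lift formula can be tried out directly. This is a minimal sketch on a hypothetical five-basket dataset (an assumption, since the slide's data table is not in the transcript); `probability` and `lift` are illustrative helper names.

```python
# Hypothetical baskets (an assumption, not the slide's own table).
BASKETS = [
    {"OJ", "soda"},
    {"milk", "OJ", "window cleaner"},
    {"OJ", "detergent"},
    {"OJ", "detergent", "soda"},
    {"window cleaner", "soda"},
]


def probability(baskets, items):
    """Fraction of baskets that contain every item in `items`."""
    items = set(items)
    return sum(1 for basket in baskets if items <= basket) / len(baskets)


def lift(baskets, antecedent, consequent):
    """Lift = confidence / p(consequent).

    Assumes both sides of the rule occur in at least one basket;
    otherwise a division by zero would occur.
    """
    p_if = probability(baskets, antecedent)
    p_then = probability(baskets, consequent)
    p_both = probability(baskets, set(antecedent) | set(consequent))
    rule_confidence = p_both / p_if
    return rule_confidence / p_then


print(f"soda -> OJ:      {lift(BASKETS, {'soda'}, {'OJ'}):.2f}")
print(f"detergent -> OJ: {lift(BASKETS, {'detergent'}, {'OJ'}):.2f}")
```

In this made-up dataset, “If soda, then OJ” has a lift below 1 even though its confidence is about 67%, because OJ is already in 80% of baskets. “If detergent, then OJ” has a lift above 1, so by the criterion on the slide only the second rule would count as useful.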


Build a Clementine Model


Homework Assignment

  • Conduct a Market Basket Analysis on the dataset using both the Apriori and GRI modeling nodes in Clementine.

  • Reconcile the association rules discovered as a result of the Apriori and GRI modeling nodes.

  • Provide a narrative description that attempts to explain the convergence (or lack thereof) between the results obtained from the two modeling nodes.

  • Select those association rules discovered during your Market Basket Analysis that would make the most intuitive sense to the category managers involved and create demographic profiles of shoppers who appear to fit those rules.
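For readers without Clementine at hand, the idea behind a market basket analysis can be illustrated with a brute-force sketch. This is not the Apriori or GRI algorithm and does not reproduce the output of Clementine's modeling nodes; it simply scores every one-item “If X, then Y” rule and keeps those that pass the thresholds. The baskets, the threshold values, and the name `pairwise_rules` are all assumptions made for illustration.

```python
from itertools import permutations

# Hypothetical baskets (an assumption, not the homework dataset).
BASKETS = [
    {"OJ", "soda"},
    {"milk", "OJ", "window cleaner"},
    {"OJ", "detergent"},
    {"OJ", "detergent", "soda"},
    {"window cleaner", "soda"},
]


def pairwise_rules(baskets, min_support=0.4, min_confidence=0.5):
    """Score every single-item rule and keep those with lift > 1."""
    n = len(baskets)
    items = sorted(set().union(*baskets))
    count = {item: sum(item in b for b in baskets) for item in items}
    rules = []
    for if_item, then_item in permutations(items, 2):
        both = sum(if_item in b and then_item in b for b in baskets)
        support = both / n                      # rule support
        conf = both / count[if_item]            # confidence
        # Lift = confidence / p(consequent), written with counts.
        lift = (both * n) / (count[if_item] * count[then_item])
        if support >= min_support and conf >= min_confidence and lift > 1:
            rules.append((if_item, then_item, support, conf, lift))
    return rules


rules = pairwise_rules(BASKETS)
for if_item, then_item, support, conf, lift in rules:
    print(f"If {if_item}, then {then_item}: "
          f"support={support:.0%} confidence={conf:.0%} lift={lift:.2f}")
```

Comparing a list like this against the rules reported by two different modeling nodes is one way to approach the reconciliation step of the assignment, since differences usually trace back to different thresholds or rule-ranking criteria.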


Instructor’s Laptop Screen


Student’s Laptop Screen


Resources

  • “Data Mining Techniques” by Michael J. A. Berry and Gordon S. Linoff (second edition), Wiley, 2004

  • “Discovering Knowledge in Data” by Daniel T. Larose, Wiley, 2005

  • “Making Sense of Statistics” by Fred Pyrczak (fourth edition), Pyrczak Publishing, 2006

  • Recent articles from the business press identified using the “Factiva” database, with “data mining” and “predictive analytics” as search keywords

  • www.kdnuggets.com


Thank you for your time and participation

  • Questions?

  • Additional Information: Please see my syllabus at http://www.spss.com/academic/educator/curriculum/index.htm?tab=1

  • Comments and suggestions are welcome. Please send them to: [email protected]