From business objectives to data mining: towards a systematic way of data mining project development



FROM BUSINESS OBJECTIVES TO DATA MINING: TOWARDS A SYSTEMATIC WAY OF DATA MINING PROJECT DEVELOPMENT. Ernestina Menasalvas, Facultad de Informática, Universidad Politecnica de Madrid, Spain. November 2004.


Presentation Transcript


Ernestina Menasalvas

Facultad de Informática

Universidad Politecnica de Madrid. Spain

November 2004


Background (I)

  • 1995: doctoral student.

    • Visit University of Regina (Prof. Ziarko)

    • Visit Warsaw University (Prof. Pawlak)

  • 1998: Defended thesis. Data Mining process model (Anita Wasilewska & C. Fernandez-Baizan)

  • Since then:

    • Databases professor: databases, data mining

    • Coordinator of the Data Mining group at Facultad de Informática UPM

      • Techniques: Rough Sets, Bayes, …

      • Methodologies for data mining process management

        • Evaluation in Data Mining

        • Experimentation in Web Mining

      • Web Mining: Web Goal Mining


  • Projects developed:

    • Pure Research:

      • Data Mining to be integrated into RDBMS

      • Web Profiler

      • Methodology for Data Mining process management

    • Research and application:

      • Data Mining applied to different domains:

        • Car dealers

        • Travel agency

        • ….

Data Mining Project Development

  • Methodologies for Data Mining project development

    • Is Data Mining really a science?

    • Are we developing projects as an art?

    • Has research achieved the same results in all areas?

      • Algorithms

      • Data Preparation

      • Data enrichment

      • Conceptualization of Data Mining problems

Data Mining: an art, a science?

  • Since data mining appeared, a lot of algorithms have been implemented

  • Standards:

    • CRISP-DM

    • SEMMA

    • PMML 3.0

  • Process depends on the expertise of the data miner

  • User speaks about business problems

  • Data Miner speaks about algorithms

Data Mining as a project

  • Data Mining is a data-intensive activity

    • Data understanding

    • Data Preparation

  • Database manager:

    • Transactional databases

    • Data warehouses

  • The end result of a data mining project is a tool (software project) that supports a better decision-making process:

    • Software development project

  • IT department has to be involved

Project Management

  • Why?

    • In order to organize the development process and to produce a project plan

  • How?

  • Establish how the process is going to be developed:

    • Sequential

    • Incremental

  • What?

  • Establish how the process is split into phases and define the tasks to be developed in each step:

    • RUP

    • XP


  • Way of making things

  • Independent of the process being developed


  • Particular tasks

  • Detail of tasks to be developed


Common pitfall of data mining implementation

  • The common pitfalls of data mining implementation are the following:

    • Not being able to efficiently communicate mining results within an organization.

    • Not having the right data to conduct effective analysis.

    • Not using existing data correctly.

    • Not being able to evaluate results

  • Questions that arise:

    • Can the adequacy of a data set for a problem be established when preparing the project plan?

    • How can the data set be used to produce the expected results?

    • How can we evaluate the results?

    • Cost estimation?

Data Mining Approaches

  • Vendor independent:

    • CRISP-DM

  • Based on the commercial tools:

    • CATs

    • SEMMA

  • CRM Methodology:

    • CRM Catalyst

Model Process

Not Real Methodology

Based on CRISP-DM

Global CRM process

Does not concentrate on Data Mining step

Cross-Industry Standard Process for Data Mining: CRISP-DM

Data Mining as a project: CATs

  • CATs: Clementine Application Templates [CATs]

    • Specific libraries of best practices that provide immediate value right out of the box

    • Following the CRISP-DM standard. Every CAT stream is assigned to a CRISP-DM phase

    • They provide long term value as they can always be used with a new data set for new insight in other projects.

  • Available as an add-on module to Clementine, include:

    • Telco CAT - improve retention and cross-selling efforts for telecommunications

    • CRM CAT - understand and predict customer migration between segments

    • Microarray CAT - accelerate biological discoveries, find genes

    • Fraud CAT - predict and detect instances of fraud in financial transactions, claims, tax returns …

    • Web CAT

What is a CAT? [CATs]


  • SEMMA (Sample, Explore, Modify, Model, Assess): [SEMMA]

    • Is not a data mining methodology

    • Rather, it is a logical organization of the functional tool set of SAS Enterprise Miner for carrying out the core tasks of data mining.

    • Enterprise Miner can be used as part of any iterative data mining methodology adopted by the client.

    • Naturally, steps such as formulating a well-defined business or research problem and assembling quality, representative data sources are critical to the overall success of any data mining project.


  • SEMMA is focused on the model development aspects of data mining: [SEMMA]

    • Sample the data to extract a portion of a large data set big enough to contain significant information, yet small enough to manipulate quickly.

    • Explore the data by searching for unanticipated trends and anomalies in order to gain understanding and ideas.

    • Modify the data by creating, selecting, and transforming the variables to focus the model selection problem.

    • Model the data by allowing the software to search automatically for a combination of data that reliably predicts a desired outcome. Modelling techniques include neural networks, tree classifiers, statistical models, etc.

    • Assess the data by evaluating the usefulness and reliability of the findings from the data mining process and estimating how well it performs.
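The five SEMMA steps can be sketched on a toy data set. This is only an illustration under stated assumptions: it uses the Python standard library rather than SAS Enterprise Miner, the function name `semma_sketch`, the 200-row sample size, the 3-standard-deviation outlier rule and the 70/30 split are all hypothetical choices, and a least-squares line stands in for whatever modelling technique a real project would use.

```python
import random
import statistics


def semma_sketch(rows, seed=0):
    """Walk a list of (x, y) pairs through Sample, Explore, Modify, Model, Assess.

    All names and thresholds are illustrative assumptions, not SAS behaviour.
    """
    rng = random.Random(seed)

    # Sample: take a portion small enough to manipulate quickly.
    sample = rng.sample(rows, k=min(len(rows), 200))

    # Explore: summary statistics help to spot trends and anomalies.
    xs = [x for x, _ in sample]
    mean_x = statistics.mean(xs)
    sd_x = statistics.pstdev(xs)

    # Modify: drop gross outliers (more than 3 standard deviations from the mean).
    cleaned = [(x, y) for x, y in sample
               if sd_x == 0 or abs(x - mean_x) <= 3 * sd_x]

    # Hold part of the cleaned sample back so that Assess uses unseen data.
    cut = int(0.7 * len(cleaned))
    train, test = cleaned[:cut], cleaned[cut:]

    # Model: fit y = a * x + b by ordinary least squares.
    mx = statistics.mean([x for x, _ in train])
    my = statistics.mean([y for _, y in train])
    sxx = sum((x - mx) ** 2 for x, _ in train)
    if sxx == 0:
        raise ValueError("all x values are identical; cannot fit a line")
    a = sum((x - mx) * (y - my) for x, y in train) / sxx
    b = my - a * mx

    # Assess: mean absolute error on the held-out part.
    mae = statistics.mean([abs(y - (a * x + b)) for x, y in test])
    return {"slope": a, "intercept": b, "mae": mae}


# Toy data following y = 2x + 1 exactly.
data = [(float(i), 2.0 * i + 1.0) for i in range(1000)]
result = semma_sketch(data)
```

Because the toy data are noise-free, the fitted slope and intercept recover 2 and 1 up to rounding; with real data the Assess step is where the success criteria fixed earlier in the project would be applied.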

Methods for Project Management: CRM Catalyst (1)

  • Developed jointly by CustomISe, MACS and SalesPathways. Together they have formed the Catalyst Foundation


  • CRM projects are difficult to execute successfully because of the wide range of factors influencing their success. So it can take a long time to make CRM work properly for an organisation.

  • Solution: CRM Catalyst.

  • Methodology acts as a catalyst for CRM projects enabling them to achieve their objectives more reliably and in less time.

  • It gives a project life cycle with a set of defined phases broken down into steps with clearly stated inputs and outputs.

Methods for Project Management: CRM Catalyst (2)

Implementation requires

Data Mining development process

Progressive Lifecycle Model

The results are obtained in a progressive way

Implementation is knowledge intensive

In some steps a knowledge-intensive methodology could be appropriate

Main steps in a Data Mining Project

  • Define the goals:

    • Business and data mining experts together have to define the goals

    • Each goal must be defined with measurements for success

  • Obtain the models:

    • Apply data mining algorithms.

    • Preprocessing is important

  • Evaluate results:

    • Ascertain the value of an object according to specified criteria, operationalised in terms of measures.

  • Deploy:

    • Decide which patterns and models can be deployed

  • Evaluate after deployment:

    • Once the product is in operation, its results should be checked against the expected ones

1. Define the goals

  • Distinguish between :

    • Data Mining goals

    • Business goals

  • How do we translate?

Increase the lifetime value of valuable customers







This translation has to be solved in the Business Understanding step of CRISP-DM

Business Understanding in the CRISP-DM Process

  • Determine Business Objectives

    • Business Objectives

    • Business Success Criteria

  • Assess Situation

    • Inventory & Resources

    • Requirements, Assumptions & Constraints

    • Risks & Contingencies

    • Costs & Benefits

  • Determine Data Mining Goals

    • Data Mining Goals

    • Data Mining Success Criteria

  • Produce Project Plan

    • Project Plan

    • Initial Assessment of Tools & Techniques

1.1 Determine Business objectives and success criteria

  • Not only business objectives have to be established, but also measures that make it possible to evaluate the results

  • Business objectives:

    • What is the customer's primary objective?

      • Increase the number of loyal customers

      • Sell more of a certain product

      • Have a positive marketing campaign

  • Business success criteria:

    • What constitutes a successful outcome of the project?

    • Objective measures so that success can be established

    • ROI

1.2 Costs & Benefits

  • Perform a cost-benefit analysis

    • Compute the benefits of the project

      • Which measures do we have?

      • ROI

      • CAPEX

      • OPEX....

    • Compute the costs of the project (equipment, human resources...)

      • Which methodology do we have?

      • COCOMO for software

    • Quantify the risk that the project fails

      • Knowledge not available

      • Data not available

      • Proper tools not available
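As a worked illustration of the cost-benefit step, the sketch below computes ROI as net gain divided by cost and then discounts the benefits by an estimated probability that the project fails. The risk adjustment is an assumption made here for illustration (the slides only ask to quantify the risk), and all figures are hypothetical.

```python
def roi(benefits, costs):
    """Return on investment: net gain divided by cost."""
    if costs <= 0:
        raise ValueError("costs must be positive")
    return (benefits - costs) / costs


def risk_adjusted_roi(benefits, costs, p_failure):
    """ROI when benefits materialise only if the project succeeds.

    Assumption for illustration: a failed project yields no benefit
    but still incurs the full cost.
    """
    if not 0.0 <= p_failure <= 1.0:
        raise ValueError("p_failure must be a probability")
    expected_benefits = (1.0 - p_failure) * benefits
    return roi(expected_benefits, costs)


# Hypothetical figures, all in the same currency unit.
plain = roi(benefits=150_000, costs=100_000)
adjusted = risk_adjusted_roi(benefits=150_000, costs=100_000, p_failure=0.2)
```

With these made-up numbers the plain ROI is 50%, while a 20% chance of failure lowers the expected ROI to about 20%, which shows why the risk items listed above belong in the same analysis as costs and benefits.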

Data Mining Estimation Model

  • Establishing a parametric estimation model for Data Mining (Marban’03)


DMCoMo (Data Mining COst MOdel)

Data Mining Cost Estimation

  • Main factors in a Data Mining project

    • Data Sources (number, kind, nature, …)

    • Data mining problem to be solved (descriptive, predictive, …)

    • Development platform

    • Available tools

    • Expertise of the development team

  • Drivers

    • Data Drivers

    • Model Drivers

    • Platform Drivers

    • Tools and techniques Drivers

    • Project Drivers

    • People Drivers
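The slides name the driver groups but do not give the estimation equation, so the following is only a generic sketch of what a parametric model looks like: effort as an intercept plus a weighted sum of driver ratings. The linear form, the driver names and every coefficient are hypothetical and are not the published values of the model cited above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Driver:
    group: str     # one of the driver groups listed in the slides
    name: str      # hypothetical driver name
    weight: float  # hypothetical coefficient; in practice fitted on past projects


def estimate_effort(intercept, drivers, ratings):
    """Generic linear parametric estimate: intercept + sum(weight * rating)."""
    missing = [d.name for d in drivers if d.name not in ratings]
    if missing:
        raise KeyError(f"missing ratings for: {missing}")
    return intercept + sum(d.weight * ratings[d.name] for d in drivers)


# Hypothetical drivers and weights, for illustration only.
DRIVERS = [
    Driver("Data", "number_of_sources", 1.5),
    Driver("Model", "number_of_models", 2.0),
    Driver("People", "team_experience", -1.0),
]

effort = estimate_effort(
    10.0,
    DRIVERS,
    {"number_of_sources": 4, "number_of_models": 2, "team_experience": 3},
)
```

The negative weight on the people driver reflects the usual expectation that a more experienced team lowers effort; a real model would obtain such coefficients by regression over completed projects.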

1.3 Data Mining goals and success

Data mining goals:

  • Translate the customer's primary objective into a data mining goal, e.g.

    • Loyalty program translated into segmentation problem

    • Decreasing the attrition rate transformed into classification problem

  • Data mining success criteria:

    • Determine success in technical terms

      • Translate the notion of success into confidence, support, lift and other parameters

      • Determine the cost of errors
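The technical success measures named above have standard definitions for an association rule A → C: support is the fraction of transactions containing both A and C, confidence is that count divided by the number of transactions containing A, and lift is confidence divided by the overall frequency of C. A minimal sketch follows; the function name and the basket data are made up for illustration.

```python
def rule_measures(transactions, antecedent, consequent):
    """Return (support, confidence, lift) for the rule antecedent -> consequent."""
    antecedent, consequent = frozenset(antecedent), frozenset(consequent)
    n = len(transactions)
    if n == 0:
        raise ValueError("no transactions")

    n_a = sum(1 for t in transactions if antecedent <= set(t))
    n_c = sum(1 for t in transactions if consequent <= set(t))
    n_ac = sum(1 for t in transactions if (antecedent | consequent) <= set(t))

    support = n_ac / n
    confidence = n_ac / n_a if n_a else 0.0
    lift = confidence / (n_c / n) if n_c else 0.0
    return support, confidence, lift


# Hypothetical market baskets.
baskets = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]
support, confidence, lift = rule_measures(baskets, {"bread"}, {"milk"})
```

A lift below 1, as in this toy example, means the antecedent makes the consequent less likely than its base rate, so a success criterion would typically set minimum thresholds on all three measures rather than on confidence alone.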

  • How do we make the translation?

  • Methodology

    • Which is the methodology to be followed to translate business objectives into data mining objectives?

    • Unfortunately, there is no such methodology. First we have to answer:

      • How is a business objective expressed?

      • What is a data mining goal?

      • How are data mining goals achieved?

      • What are the requirements of data mining functions?

    In order to describe everything in a standard way:

    Conceptualize the problem

    Conceptualization in other disciplines

    • Data Bases:

      • E/R diagrams

      • Independent of the domain

      • A tool for business understanding and for the database designer

      • Translation from E/R to implementation

    [Figure: three-level database architecture with external views 1 … n, a conceptual schema, and an internal schema]

    3 levels proposed architecture

    • Business problems (top level)

    • Conceptual Schema: requirements of algorithms will be solved at this level

    • Internal Schema: tools requirements to be solved (SAS, WEKA, Clementine…)

    3 layers architecture for data mining

    • It is the bridge:

      • Between business goals and the final tool

      • Independent of the domain

    • Provides independence:

      • Changes in the tool do not affect the solution

    • It has to be decided what to model in the conceptualization

    • Automatic translation of business goals into data mining goals

    • Data Mining goals + constraints = feasible data mining goals
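The idea that data mining goals plus constraints yield feasible data mining goals can be sketched as a lookup followed by a filter. The two mappings (loyalty program to segmentation, attrition to classification) come from the slides; the catalogue structure, the requirement names and the function `feasible_goals` are hypothetical and only illustrate what a conceptual layer might encode.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MiningGoal:
    business_objective: str
    problem_type: str    # classification, estimation, association or segmentation
    requires: frozenset  # hypothetical requirements the available data must meet


# Hypothetical catalogue held at the conceptual level.
CATALOGUE = [
    MiningGoal("loyalty program", "segmentation",
               frozenset({"customer_attributes"})),
    MiningGoal("decrease attrition rate", "classification",
               frozenset({"customer_attributes", "labelled_outcome"})),
]


def feasible_goals(business_objective, available, catalogue=CATALOGUE):
    """Keep the goals for this objective whose requirements are all available."""
    available = set(available)
    return [
        goal for goal in catalogue
        if goal.business_objective == business_objective
        and goal.requires <= available
    ]


without_labels = feasible_goals("decrease attrition rate", {"customer_attributes"})
with_labels = feasible_goals(
    "decrease attrition rate", {"customer_attributes", "labelled_outcome"}
)
```

In this sketch the attrition goal only becomes feasible once a labelled outcome is available, which is the kind of check that could be made during business understanding, before any tool is chosen.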

    Elements to conceptualize

    • Elements to be taken into account:

      • Data:

        • Quality from data mining point of view

        • Adequacy for the problem

        • Classification for data mining purposes

      • Knowledge:

        • Related to the process being analyzed

        • Related to the data used

      • People

        • Owners of data

        • Experts in the process

      • Data mining problems requirements

      • Data mining methods requirements

    Proposed process


    • Data Mining Modelling Objects:

      • Data

      • Knowledge

      • Constraints of data and applications

      • Data Mining objects

        • Algorithms

        • Measures

        • Methods

    • To bridge the gap between data miners and business users

    Are data adequate for analysis?

    • The adequacy of the data is analyzed taking into account the goals to fulfil.

    • Data, together with the knowledge extracted from the experts, can be transformed so that simply feeding it to a given data mining algorithm produces the required patterns.

    • Quality of the data, in this context:

      • is not only related to technical quality (proper model, percentage of null values),

      • but also has to do with:

        • the meaning of the attributes,

        • where each piece of data comes from,

        • the relationships among data, and

        • finally, how the data fulfil the requirements of the data mining functions
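The technical side of this quality check is easy to automate. The sketch below computes the percentage of null values per attribute and compares it with a threshold that a data mining function might impose; the threshold of 30%, the record layout and both function names are assumptions for illustration. The semantic side (meaning and origin of attributes) still needs expert knowledge.

```python
def null_percentages(records, attributes):
    """Percentage of records in which each attribute is missing or None."""
    n = len(records)
    if n == 0:
        return {a: 0.0 for a in attributes}
    return {
        a: 100.0 * sum(1 for r in records if r.get(a) is None) / n
        for a in attributes
    }


def meets_requirement(records, attributes, max_null_pct):
    """Flag, per attribute, whether the null percentage stays within the limit."""
    pct = null_percentages(records, attributes)
    return {a: pct[a] <= max_null_pct for a in attributes}


# Hypothetical customer records.
customers = [
    {"age": 34, "segment": "A"},
    {"age": None, "segment": "B"},
    {"age": 51, "segment": None},
    {"age": None, "segment": "A"},
]
pct = null_percentages(customers, ["age", "segment"])
ok = meets_requirement(customers, ["age", "segment"], max_null_pct=30.0)
```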

    2. Data Mining: obtain models

    • Apply data mining process model

    • Associated problems solved by the 3 layers architecture:

      • Comparison of approaches

      • Evaluate costs

      • Pros and cons of approaches

    • Only experience or a conceptualization can help

    • The conceptual model will help to establish the process to obtain each feasible model.

    • Requirements and transformations implicit in the model

    2.1 Determine type of problem

    • What are data mining problems?

      • Classification

      • Estimation

      • Association

      • Segmentation

    • In the conceptual model, the requirements for each type will be established

    2.2 Apply CRISP-DM process model

    • The data mining problem has to be established before going into the modeling step

    • Requirements will be established in Business Understanding

    • Requirements will be checked in Data Understanding and Data Preparation

    • Preparation will be guided by the conceptual model

    • Feasibility can be evaluated before applying the model

    [Figure: CRISP-DM process diagram; recoverable labels: Business Understanding, Data Understanding]






    3. Evaluate results

    [Spiliopoulou, Berendt]

    • Evaluation: the act of ascertaining the value of an object according to specified criteria, operationalised in terms of measures.

      • Object= model already obtained

      • Criteria and measures = derived from the goals

    • Evaluation requires a well-defined notion of success, which must be in place before

      • the evaluation takes place

      • the data mining phase starts

      • any work with the data starts

    • i.e. already during the business understanding process.

    • Here once again conceptualization plays its role

    Evaluation in the CRISP-DM Process

    • The CRISP-DM process is

      • a non-ending circle of iterations

      • a non-sequential process, where backtracking to previous phases is usually necessary

    • In each sequential instantiation evaluation takes place:

    • But it is a cycle

    • In all the iterations all the steps should be revisited

    • Results have to be evaluated!!

    [Figure: CRISP-DM process diagram; recoverable labels: Business Understanding, Data Understanding]






    4. Deployment

    • All the models that have a positive evaluation can be deployed

    • For the measurements of success to be trusted, deployment has to follow the rules established at the beginning of the project

      • The real evaluation has not yet been performed

    5. Evaluate after deployment

    • After deployment there is the need to prove that the improvements are really due to the actions taken after a data mining discovery and not to any other factor or action carried out in the company

    • None of the obvious claims about success of data mining have ever been systematically tested.

    • Experiments are crucial to establish if the impact of the deployment is really positive or negative

    • Experiments have to be designed at the beginning of the project
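One common way to design such an experiment is to keep a control group that does not receive the data-mining-driven action and to compare outcome rates between the two groups. The sketch below uses a two-proportion z-test for that comparison; the choice of test, the function name and the counts are assumptions for illustration, since the slides do not prescribe a particular experimental design.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_z_test(success_t, n_t, success_c, n_c):
    """Compare the success rate of a treated group with that of a control group.

    Returns (difference in rates, z statistic, two-sided p-value).
    """
    p_t, p_c = success_t / n_t, success_c / n_c
    pooled = (success_t + success_c) / (n_t + n_c)
    se = sqrt(pooled * (1 - pooled) * (1 / n_t + 1 / n_c))
    if se == 0:
        raise ValueError("standard error is zero; rates are degenerate")
    z = (p_t - p_c) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return p_t - p_c, z, p_value


# Hypothetical campaign: 1000 customers targeted using the model,
# 1000 randomly chosen customers left as control.
diff, z, p_value = two_proportion_z_test(130, 1000, 100, 1000)
```

Random assignment to the two groups is what allows the difference to be attributed to the deployed model rather than to other actions in the company, which is why the groups have to be planned before deployment starts.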


    • Data mining projects are being developed more as an art than as a science

    • Many algorithms have been implemented, but no systematic proof that one is better than another in real cases is carried out after deployment

    • Conceptual model is required:

      • To map business goals to the model

      • To map data mining algorithms to a conceptual model

    • Achievements of the model:

      • Will be used along the process to guide the project

      • Evaluation tool

    Future work

    • Conceptual model

      • Define DMMO objects

    • Evaluation techniques related to the model:

      • Evaluate data mining goals

      • Evaluate business goals

    • Experimentation methods:

      • obtrusive and

      • non-obtrusive


    • Bettina Berendt, Myra Spiliopoulou, Ernestina Menasalvas: Evaluation in Web Mining. Tutorial at ECML/PKDD 2004, Pisa, Italy, 20 September 2004

    • Menasalvas, Millán, Gonzalez-Aranda, Segovia: Towards a Methodology for Data Mining Project Development: The Importance of Abstraction

    • Bettina Berendt, Andreas Hotho, Dunja Mladenic, Maarten van Someren, Myra Spiliopoulou, Gerd Stumme: Web Mining: From Web to Semantic Web, First European Web Mining Forum, EWMF 2003, Cavtat-Dubrovnik, Croatia, September 22, 2003, Revised Selected and Invited Papers. Springer 2004

    • Myra Spiliopoulou, Carsten Pohle: Modelling and Incorporating Background Knowledge in the Web Mining Process. Pattern Detection and Discovery 2002: 154-169






