

Data Mining: How to make islands of knowledge emerging out of oceans of data

Hugues Bersini

IRIDIA - ULB



PLAN

  • Rapid intro to data warehouses
  • Data mining: understand and predict
  • Two "super" techniques of non-comprehensible data mining:
    • Lazy learning for time series prediction
    • BAGFS for classification



The Data Miner Steps

  • Data Warehousing

  • Data Preparation

    • Cleaning + Homogenisation

    • Transformation - Composition

    • Reduction

    • For time series: time adjustment

  • Data Modelling: what researchers are mainly interested in.



Data Warehouse



Re-organization of data

  • Subject-oriented
  • Integrated
  • Transversal
  • With history
  • Non-volatile
  • From production data → to decision-based data



Data Mining

Understand and predict



Modelling the data: only if there are structure and regularities in the data. Data mining IS NOT OLAP

WHY??

  • To understand the data
  • To predict new data



The main techniques of data-mining

  • Clustering

  • Outlier detection

  • Association analysis

  • Forecasting

  • Classification



Data Mining: to understand and/or to predict

  • Discovering structure in data
  • Discovering I/O relationships in data



Nothing new under the sun

  • New methods extending old ones in the domains of the non-linear (NN) and the symbolic (decision trees)
  • Exponential explosion of data
  • Extraction from huge databases

More sensitive than ever


[Diagram: data stores are exploited to support decisions]

Data Store

  • Data volume doubles every 18 months world-wide
  • Problem: how to extract the knowledge relevant to our decisions from such amounts of data?
  • Solutions:
    • Throw it away before using it (most popular)
    • Query it (Query and OLAP tools)
    • Summarize it: extract the essence from the bulk according to the targeted decision (Data Mining)



Discovering structure in data

  • When in a space with a metric

    • Hierarchical clustering

    • K-Means

    • NN clustering - Kohonen’s map

  • In a space without a metric but with a cost function:

    • Grouping Genetic Algorithms ....
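As an illustrative sketch of the first family above, here is how K-Means is typically run; the two-dimensional blob data is a hypothetical stand-in, not data from the talk:

```python
import numpy as np
from sklearn.cluster import KMeans

# Hypothetical 2-D data: three blobs around different centres.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(loc=c, scale=0.3, size=(50, 2))
               for c in ([0, 0], [3, 3], [0, 3])])

# K-Means needs a metric space: it alternates between assigning each point
# to the nearest centroid and moving each centroid to the mean of its points.
km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)
print(km.cluster_centers_)   # discovered structure: one centre per "island"
print(km.labels_[:10])       # cluster assignment of the first points
```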


Clustering and outliers



Market Basket Analysis: Association analysis

[Chart: quantity bought]


Calculation of Improvement
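The "improvement" (lift) of an association rule A → B is commonly computed as P(A and B) / (P(A) · P(B)); a minimal sketch, with hypothetical transactions rather than data from the talk:

```python
# Hypothetical market baskets, for illustration only.
transactions = [
    {"beer", "chips"}, {"beer", "diapers"}, {"chips"},
    {"beer", "chips", "diapers"}, {"diapers"},
]

def support(itemset):
    """Fraction of baskets containing every item of `itemset`."""
    return sum(itemset <= t for t in transactions) / len(transactions)

def improvement(A, B):
    """Lift of A -> B: > 1 means A and B co-occur more often than by chance."""
    return support(A | B) / (support(A) * support(B))

print(improvement({"beer"}, {"chips"}))
```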



Discovering I/O relationship in data

[Diagram: two I/O tasks]

  • Classification: I = (x, y), O = the class
  • Time series prediction: I = x(t), O = x(t+1)

Understanding the I/O relationship; predicting which O for a new I.


IRIDIA's data mining track record

  • Recognition of glass defects at Glaverbel
  • Prediction of stock-market fluctuations with MasterFood and D'Ieteren
  • Incident recognition and electrical-load prediction with Tractebel
  • Analysis of air-traffic delays with Eurocontrol
  • Industrial process modelling with Honeywell, FAFER and Siemens
  • User-friendly Internet search engine with the Région Wallonne
  • Pixel classification for satellite images



Financial prediction

Task: predict the future trends of the financial series.

Goal: automatic trading system to anticipate the fluctuations of the market.



Economic variables

Task: predict how many cars will be registered next year.

Goal: support the marketing campaign of a car dealer.



Modeling of industrial plants

Rolling steel mill

Task: predict the flow stress of the steel plate as a function of the chemical and physical properties of the material.

Goal: cope with different types of metals, reduce the production time and improve final quality.



Control

Waste water treatment plant

Task: model the dynamics of the plant on the basis of accessible information.

Goal: control the level of water pollutants.



Environmental problems

Algae summer blooming

Task: predict the biological state (e.g. the density of algae communities) as a function of chemicals.

Goal: automate the analysis of the state of the river by monitoring chemical concentrations.



In the medical domain

  • automatic diagnosis of cancer

  • detection of respiratory problems

  • electrocardiogram analysis

  • help for paraplegics



APPLICATION OF DATA MINING IN THE CANCER DOMAIN: supporting diagnosis and prognosis in tumour pathology.

In collaboration with the Histopathology Laboratory (R. Kiss), Faculty of Medicine, U.L.B.



[Diagram: the diagnostic pathway]

Histological criteria: loss of differentiation, invasion.
Cytological criteria: nucleus size, mitoses, areas of hyperchromatism.

patient → tumour → surgery → clinical work-up → DIAGNOSIS (pathologists) → adjuvant treatment (low, moderate, high)

Improved diagnosis, better-suited treatment → increased survival



  • Example: primary brain tumours (adults): GLIOMAS



"Objectivation" of diagnostic elements

Quantification of (cytological and histological) criteria

Computer-assisted microscopy

Data processing

Extraction of reliable and reproducible diagnostic and/or prognostic information



  • 500 to 1000 nuclei per tumour
  • 30 tumour variables:
    • mean
    • standard deviation


Application to remote sensing



BAGFS


On the Internet

  • The Hyperprisme project

  • Text Mining

  • Automatic profiling of users

    • Key words: positive, negative, …

  • Automatic grouping of users on the basis of their profiles

  • See Web



Different approaches

[Diagram: a map of the approaches]

  • Data ↔ Model
  • Comprehensible ↔ Non-comprehensible (non-readable), e.g. SVM
  • Local ↔ Global
  • At stake: accuracy of prediction



Understanding and Predicting

Building Models

A model needs data to exist but, once it exists, it can exist without the data.

[Diagram: a Model = Structure + Parameters, fitted to the data]

Examples: linear models, NN, fuzzy systems, ID3, wavelets, Fourier, polynomials, ...



From data to prediction

RAW DATA → PREPROCESSING → TRAINING DATA → LEARNING → MODEL → PREDICTION



Supervised learning

[Diagram: input → PHENOMENON → output; OBSERVATIONS feed the MODEL, whose prediction is compared with the output to yield an error]

  • Finite amount of noisy observations.
  • No a priori knowledge of the phenomenon.



Model learning

[Diagram: the model learning loop]

MODEL GENERATION → STRUCTURAL IDENTIFICATION → PARAMETRIC IDENTIFICATION → MODEL VALIDATION → MODEL SELECTION



The Practice of Modelling

[Diagram: THE MODEL is built from Data + Optimisation Methods together with Physical Knowledge, Engineering Models, Rules of Thumb and Linguistic Rules]

Desired qualities: accurate, simple, robust, understandable, good for decision.



Comprehensible models

  • Decision trees
    • Qualitative attributes
    • Force the attributes to be treated separately
    • Classification surfaces parallel to the axes
    • Good for comprehension because they select and separate the variables



Decision trees

  • Widely used in practice; one of the favourite data mining methods
  • Work with noisy data (statistical approach); can learn a logical model out of the data, expressed as and/or rules
  • ID3, C4.5 → Quinlan
  • Favour little trees → simple models



  • At every stage, choose the most discriminant attribute
  • The tree is constructed top-down, adding a new attribute at each level
  • The choice of the attribute is based on a statistical criterion called "the information gain"
  • Entropy = -p_yes·log2(p_yes) - p_no·log2(p_no)
  • Entropy = 0 if p_yes or p_no = 1
  • Entropy = 1 if p_yes = p_no = 1/2



Information gain

  • S = the set of instances, A an attribute, v the values of attribute A
  • Gain(S, A) = Entropy(S) - Σ_v (|S_v|/|S|)·Entropy(S_v)
  • The best A is the one that maximises the Gain
  • The algorithm runs in a recursive way
  • The same mechanism is reapplied at each level (see the sketch below)
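A minimal sketch of the two formulas above, assuming instances are dicts mapping attribute names to values (illustrative code, not from the talk):

```python
import math
from collections import Counter

def entropy(labels):
    """Entropy(S) = -sum_c p_c * log2(p_c) over the class proportions."""
    n = len(labels)
    return -sum((k / n) * math.log2(k / n) for k in Counter(labels).values())

def information_gain(instances, labels, attribute):
    """Gain(S, A) = Entropy(S) - sum_v |S_v|/|S| * Entropy(S_v)."""
    n = len(labels)
    # Partition S into the subsets S_v, one per value v of attribute A.
    partitions = {}
    for x, y in zip(instances, labels):
        partitions.setdefault(x[attribute], []).append(y)
    return entropy(labels) - sum(
        (len(s) / n) * entropy(s) for s in partitions.values())

# ID3 picks, at each node, the attribute maximising the gain:
# best = max(attributes, key=lambda a: information_gain(X, y, a))
```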



But !!!!

[Diagram: y = loan repayment versus x = monthly salary; a good client is one with (x - y) > 30000, an oblique boundary that axis-parallel splits capture only badly]


Other comprehensible models

  • Fuzzy logic

  • Realize an I/O mapping with linguistic rules

  • If I eat "a lot" then I gain weight "a lot"


Trivial example

Linear: optimal, automatic, simple

[Plot: a linear fit of Y versus X]


Fuzziness



If x is very small then y is small
If x is small then y is medium
If x is medium then y is medium

[Plot: Y versus X approximated by the fuzzy rules]

Readable? Interfaceable? Adaptive, universal, semi-automatic



Non comprehensible models

  • From more to less comprehensible:
    • linear discriminants
    • local approaches
      • fuzzy rules
      • Support Vector Machines
      • RBF
    • global approaches
      • NN
      • polynomials, wavelets, …
      • Support Vector Machines


Neural networks



Precise, universal, black-box, semi-automatic



Nonlinear relationship

[Plot: a nonlinear output versus input relationship]


Observations

[Plot: noisy observations of the output versus the input, with three query points]



Global modeling

[Plot: one global model fitted through all the observations]



Prediction with global models

[Plot: the global model evaluated at the three query points]



Advantages

  • Exist without data
  • Information compression
    • Mainly SVM: mathematical, practical, logical and generic.
  • Detect a global structure in the data
  • Allow testing the sensitivity of the variables
  • Can easily incorporate prior knowledge


    Drawbacks

    • Make an assumption of uniformity
    • Have the bias of their structure
    • Adapt with difficulty
    • Which one to choose?


    'Weak classifiers' ensembles

    • Classifier capacity reduced in 2 ways:
      • simplified internal architecture
      • NOT all the available information
    • Better generalisation, reduced overfitting
    • Improved accuracy:
      • by decorrelating the classifiers' errors
      • by increasing the variability in the learning space.



    Two distinct views of the information

    • "Vertically", weighting the samples

      • active learning - not investigated

      • bagging, boosting, ECOC for multiple classifier systems

    • "Horizontally", selecting features

      • feature selection methods

      • MFS and its extensions.

    • also : manipulating class label (ECOC) - not investigated yet


    'Bagging': resampling the learning set

    • Bootstrap aggregating (Leo Breiman)
      • random and independent perturbations of the learning set.
      • vital element: instability of the inducer*.
        • e.g. C4.5 or a neural network, but not kNN!
      • increases accuracy by reducing variance (see the sketch below)

    * inducer = base learning algorithm: C4.5, kNN, ...
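A minimal bagging sketch with an unstable tree inducer, using scikit-learn (assuming version ≥ 1.2 for the `estimator` parameter; the iris dataset is a stand-in, not one of the talk's benchmarks):

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# Each tree is fitted on an independent bootstrap resample of the learning
# set; averaging the perturbed trees reduces variance, hence the need for an
# unstable inducer (a decision tree here, in the spirit of C4.5).
bag = BaggingClassifier(
    estimator=DecisionTreeClassifier(),  # the "inducer"
    n_estimators=50,                     # 50 bootstraps, as in the experiments
    bootstrap=True,
)
print(cross_val_score(bag, X, y, cv=10).mean())
```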


    Learning set resampling: 'Arcing'

    • Adaptive resampling or reweighting of the learning set (Leo Breiman's terminology).
    • Boosting (Freund & Schapire)
      • sequential reweighting based on the description accuracy.
        • e.g. AdaBoost.M1 for multi-class problems.
      • needs instability, just as bagging does
      • better variability than bagging.
      • sensitive to noisy databases.
      • better than bagging on non-noisy databases (sketch below)
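For comparison, a boosting sketch under the same assumptions (AdaBoost in the spirit of Freund & Schapire; iris again as a placeholder dataset, scikit-learn ≥ 1.2):

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# Unlike bagging's independent bootstraps, boosting reweights the learning
# set sequentially: each new tree concentrates on the examples the previous
# ones misclassified.
boost = AdaBoostClassifier(
    estimator=DecisionTreeClassifier(max_depth=1),
    n_estimators=50,
)
print(cross_val_score(boost, X, y, cv=10).mean())
```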


    Multiple Feature Subsets: Stephen D. Bay (1/2)

    • Problem?
      • kNN is stable vertically, so Bagging doesn't work.
    • Horizontally: MFS, combining random selections of features, with or without replacement.
    • Question?
      • What about other inducers such as C4.5?



    Multiple Feature Subsets: Stephen D. Bay (2/2)

    • Hypothesis: kNN exploits its 'horizontal' instability.
    • Two parameters:
      • K = n/N, the proportion of features in each subset.
      • R, the number of subsets to combine.
    • MFS is better than single kNN with FSS and BSS, two feature selection techniques.
    • MFS is more stable than kNN when irrelevant features are added.
    • MFS decreases variance and bias through randomness (see the sketch below).
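A sketch of the MFS idea: R kNN classifiers, each seeing only a random proportion K of the N features, combined by majority vote. K and R follow the slides' notation; everything else (dataset, subset sizes) is an illustrative assumption:

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(0)
X, y = load_iris(return_X_y=True)
N = X.shape[1]
K, R = 0.5, 9
n = max(1, int(K * N))

# One random feature subset per classifier; each kNN sees only its subset.
subsets = [rng.choice(N, size=n, replace=False) for _ in range(R)]
models = [KNeighborsClassifier().fit(X[:, s], y) for s in subsets]

def mfs_predict(X_new):
    votes = np.stack([m.predict(X_new[:, s]) for m, s in zip(models, subsets)])
    # Majority vote over the R classifiers, column by column.
    return np.apply_along_axis(lambda v: np.bincount(v).argmax(), 0,
                               votes.astype(int))

print(mfs_predict(X[:5]))
```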


    BAGFS: a multiple classifier system

    • BAGFS = MFS inside each Bagging.

    • BAGMFS = MFS & Bagging together.

    • 3 parameters

      • B, number of bootstraps

      • K=n/N, proportion of features in subsets

      • R, number of feature subsets

    • decision rule : majority vote
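A hypothetical sketch of the two-level scheme (B bootstraps, R random feature subsets of proportion K inside each, majority vote), using a scikit-learn decision tree as a stand-in for C4.5; it illustrates the idea, not the authors' implementation:

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def fit_bagfs(X, y, B=7, R=7, K=0.5, seed=0):
    """B bootstraps x R feature subsets = B*R trees (BAGFS, majority vote)."""
    rng = np.random.default_rng(seed)
    N = X.shape[1]
    n = max(1, int(K * N))
    ensemble = []
    for _ in range(B):
        idx = rng.integers(0, len(X), size=len(X))     # bootstrap resample
        for _ in range(R):
            s = rng.choice(N, size=n, replace=False)   # random feature subset
            tree = DecisionTreeClassifier().fit(X[idx][:, s], y[idx])
            ensemble.append((tree, s))
    return ensemble

def predict_bagfs(ensemble, X_new):
    votes = np.stack([t.predict(X_new[:, s]) for t, s in ensemble])
    # Majority vote across the B*R classifiers.
    return np.apply_along_axis(lambda v: np.bincount(v).argmax(), 0,
                               votes.astype(int))
```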


    BAGFS architecture around C4.5

    [Architecture diagram]



    Material

    • 15 UCI continuous databases

    • 10-fold cross validations

    • C4.5 Rel 8 with CF=0.25, MINOBJ=2, pruning

    • no normalisation, no other pre-treatment of the data.

    • majority vote



    Experiments

    • Testing parametrization

      • optimizing K between 0.1 and 1 by means of a nested 10-fold cross-validation
      • R = 7, B = 7 for the two-level method: Bagfs 7x7
      • sets of 50 classifiers otherwise: Bag 50, BagMfs 50, MFS 50, Boosting 50



    Experimental Results

    • McNemar test of significance (95%): Bagfs never performs significantly worse, and performs significantly better on at least 4 databases (the databases marked in red).



    BAGFS: discussions

    • How to adjust the parameters B, K, R?
      • internal cross-validation?
      • dimensionality and variability measures hypothesis
    • Interest of a second level?
      • What about irrelevant and (un)informative features?
      • Does bagging + feature selection work better?
      • How to prove the interest of MFS randomness?
    • How to use the bootstraps in a complementary way?
      • Can we?
      • What to do?
    • How to prove the horizontal instability of C4.5?
    • Comparison with 1-level bagging and MFS:
      • Same number of classifiers?
      • Advantage of tuning parameters?



    BAGFS: next...

    • More databases

      • including nominal features

      • including missing values

    • Other decision rules : Bayesian approach, ranking, ...

    • Other inducers : LDA, kNN, logistic regr., MLP ??

    • Another level : boosting + MFS ?


    Which best model?? When they all can perfectly fit the data

    They all can perfectly fit the data, but they don't approach the data in the same way. This approach depends on their structure.


    This explains the importance of Cross-validation

    [Plot: Model A vs Model B; both fit the training data, but this testing value makes the difference]


    Which one to choose

    • Capital role of cross-validation.

    • Hard to run

    • One possible response



    • Lazy methods

    • Coming from fuzzy



    Model or Examples??

    • Build a model → prediction based on the model
    • Keep the examples → prediction based on the examples



    A model

    [Diagram: a model queried at several points]



    Lazy Methods

    • Accuracy entails keeping the data and not using any intermediary model: the best model is the data
    • Accuracy requires powerful local models with powerful cross-validation methods
    • Made possible again by the available computing power

    Lazy methods are a new trend which is a revival of an old trend



    Lazy methods

    • A lot of expressions for the same thing:

      • memory-based, instance-based, examples-based,distance-based

      • nearest-neighbour

    • lazy for regression, classification and time series prediction

    • lazy for quantitative and qualitative features



    Local modeling


    Prediction with local models

    [Plot: a separate local model fitted around each of the three query points]



    Local modeling procedure

    The identification of a local model can be summarized in these steps:

    • Compute the distance between the query and the training samples according to a predefined metric.

    • Rank the neighbors on the basis of their distance to the query.

    • Select a subset of the nearest neighbors according to the bandwidth which measures the size of the neighborhood.

    • Fit a local model (e.g. constant, linear,...).

    The thesis focused on the bandwidth selection problem.
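A minimal sketch of the four steps with a constant local model (Euclidean metric, and a fixed number of neighbours k as the bandwidth; illustrative assumptions, not the thesis code):

```python
import numpy as np

def local_predict(X_train, y_train, x_query, k=10):
    dist = np.linalg.norm(X_train - x_query, axis=1)  # 1. distances to the query
    nearest = np.argsort(dist)[:k]                    # 2.-3. rank and select
    return y_train[nearest].mean()                    # 4. fit a local constant

# e.g. local_predict(X, y, X[0], k=10) predicts at the first training point.
```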



    Bias/variance trade-off: overfitting

    [Plot: prediction error versus the number of neighbors]

    Too few neighbors → overfitting → large prediction error



    Bias/variance trade-off: underfitting

    [Plot: prediction error versus the number of neighbors]

    Too many neighbors → underfitting → large prediction error


    Cross-validation: PRESS

    • Performs a leave-one-out for linear models without actually running it
    • An enormous computational gain
    • Makes one of the most powerful cross-validations possible at a negligible computational price (see the sketch below).
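A sketch of the PRESS idea for a linear model: a single full fit yields all the leave-one-out residuals as e_i / (1 - h_ii), where h_ii are the leverages (diagonal of the hat matrix H = X (XᵀX)⁻¹ Xᵀ). Illustrative code, not IRIDIA's:

```python
import numpy as np

def press(X, y):
    """PRESS = sum of squared leave-one-out residuals, from one single fit."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)   # one full least-squares fit
    residuals = y - X @ beta
    H = X @ np.linalg.solve(X.T @ X, X.T)          # hat matrix
    loo_residuals = residuals / (1 - np.diag(H))   # e_i / (1 - h_ii)
    return np.sum(loo_residuals ** 2)
```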



    Data-driven bandwidth selection

    [Diagram: data-driven bandwidth selection; for each candidate bandwidth k_m, ..., k_M, identification yields a local model b(k_m) whose validation yields MSE(k_m); model selection keeps the bandwidth with the lowest MSE, which is then used for the PREDICTION]



    Advantages

    • No assumption of uniformity

    • Justified in real life

    • Adaptive

    • Simple



    From local learning to Lazy Learning (LL)

    • By speeding up the local learning procedure, we can delay the learning procedure to the moment when a prediction in a query point is required (query-by-query learning).

    • This method is called lazy since the whole learning procedure is deferred until a prediction is required.

    • Examples of non-lazy (eager) methods are neural networks, where learning is performed in advance, the fitted model is stored and the data are discarded.



    Static benchmarks

    • Datasets: 15 real and 8 artificial datasets from the ML repository.

    • Methods: Lazy Learning, Local modeling, Feed Forward Neural Networks, Mixtures of Experts, Neuro Fuzzy, Regression Trees (Cubist).

    • Experimental methodology: 10-fold cross-validation.

    • Results: Mean absolute error, relative error, paired t-test.



    Observed data

    Artificial data



    Experimental results: paired comparison (I)

    Each method compared with all the others (9 × 23 = 207 comparisons)

    The lower, the better !!



    Experimental results: paired comparison (II)

    Each method compared with all the others (9 × 23 = 207 comparisons)

    The larger, the better !!



    Lazy Learning for dynamic tasks

    • Long-horizon forecasting based on iterating a LL one-step-ahead predictor (see the sketch after this list).

    • Nonlinear control

      • Lazy Learning inverse/forward control.

      • Lazy Learning self-tuning control.

      • Lazy Learning optimal control.
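A sketch of the first bullet above: iterating a one-step-ahead predictor over the horizon, feeding each prediction back as an input. The predictor `one_step` and the embedding order `window` are placeholders (e.g. the local predictor sketched earlier):

```python
import numpy as np

def iterated_forecast(one_step, history, horizon, window=3):
    """Multi-step-ahead prediction by iterating a one-step-ahead predictor."""
    series = list(history)
    for _ in range(horizon):
        x = np.asarray(series[-window:])    # the last `window` values as input
        series.append(float(one_step(x)))   # the prediction becomes an input
    return series[len(history):]
```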



    Dynamic benchmarks

    • Multi-step-ahead prediction:

      • Benchmarks: Mackey Glass and 2 Santa Fe time series

      • Referential methods: recurrent neural networks.

    • Nonlinear identification and adaptive control:

      • Benchmarks: Narendra nonlinear plants and bioreactor.

      • Referential methods: neuro-fuzzy controller, neural controller, linear controller.



    Santa Fe time series

    Task: predict the continuation of the series for the next 100 steps.



    Lazy Learning prediction

    LL is able to predict the abrupt change around t = 1060!



    Awards in international competitions

    • Data analysis competition: awarded as a runner-up among 21 participants at the 1999 CoIL International Competition on Protecting rivers and streams by monitoring chemical concentrations and algae communities.
    • Time series competition: ranked second among 17 participants in the International Competition on Time Series organized by the International Workshop on Advanced Black-box Techniques for Nonlinear Modeling in Leuven, Belgium.



    Pragmatic conclusions

    • Comprehension --> decision tree, fuzzy logic

    • Precision:

      • global model

      • lazy methods

  • You need to be clear about why you build models.

