On Reducing Classifier Granularity in Mining Concept-Drifting Data Streams

Peng Wang, H. Wang, X. Wu, W. Wang, and B. Shi

Proc. of the Fifth IEEE International Conference on Data Mining (ICDM’05)

Speaker: Yu Jiun Liu

Date : 2006/9/26


Introduction

  • State of the art

    • The incrementally updated classifiers.

    • The ensemble classifiers.

  • Model Granularity

    • Traditional : monolithic

    • This paper : semantic decomposition


Motivation

  • The model is decomposable into smaller components.

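  • The decomposition is semantic-aware, in the sense that each component (a rule) describes a specific part of the data, so inaccuracy can be traced to the components responsible.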


Monolithic Models

  • Stream :

  • Attributes :

  • Class Label :

  • Window :

  • Model (Classifier) : Ci


Rule-based Models

  • A rule form :

  • minsup = 0.3 and minconf = 0.8

  • Valid rules of W1 are:

  • Valid rules of W3 are:
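As a minimal sketch, assuming the usual association-rule definitions (the support of a rule P → c is the fraction of window records that satisfy P and carry label c; its confidence is that count divided by the number of records satisfying P), the valid rules of a window under minsup = 0.3 and minconf = 0.8 can be enumerated by brute force. The toy window, the `max_len` cap, and the function names are hypothetical and not taken from the paper.

```python
from itertools import combinations

MINSUP, MINCONF = 0.3, 0.8  # thresholds from the slide


def support_confidence(window, antecedent, label):
    """Support and confidence of the rule antecedent -> label over a window.

    window: list of (attrs, label) pairs, attrs being a dict {attribute: value}.
    antecedent: frozenset of (attribute, value) pairs (the condition P).
    """
    covered = [lab for attrs, lab in window if antecedent <= set(attrs.items())]
    hits = sum(1 for lab in covered if lab == label)
    support = hits / len(window)
    confidence = hits / len(covered) if covered else 0.0
    return support, confidence


def valid_rules(window, max_len=2):
    """Enumerate rules P -> c with at most max_len conditions; keep the valid ones."""
    items = sorted({item for attrs, _ in window for item in attrs.items()})
    labels = sorted({lab for _, lab in window})
    result = []
    for k in range(1, max_len + 1):
        for combo in combinations(items, k):
            p = frozenset(combo)
            for c in labels:
                sup, conf = support_confidence(window, p, c)
                if sup >= MINSUP and conf >= MINCONF:
                    result.append((combo, c, sup, conf))
    return result


# Hypothetical toy window of five records.
window = [
    ({"A": 1, "B": 0}, "c1"),
    ({"A": 1, "B": 1}, "c1"),
    ({"A": 1, "B": 0}, "c1"),
    ({"A": 0, "B": 1}, "c2"),
    ({"A": 0, "B": 1}, "c2"),
]
for rule in valid_rules(window):
    print(rule)
```

In this toy window the rule B = 1 → c2 is rejected: its support is 0.4, but its confidence is only 2/3, below minconf.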


Algorithm

  • Phase 1 : Initialization

    • Use the first w records to mine all valid rules for window W1.

    • Construct the RS-tree and REC-tree.

  • Phase 2 : Update

    • When a new record arrives, insert it into the REC-tree and update the support and confidence of the rules it matches.

    • Delete the oldest record from the window and update the support and confidence of the rules it matched.
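The update phase is essentially sliding-window bookkeeping. The sketch below is a hypothetical simplification: it keeps a fixed list of rules and adjusts each rule's counts when a record enters or leaves the window. It does not model the RS-tree or REC-tree, nor the discovery of rules that only become valid after a drift; the class and method names are illustrative.

```python
from collections import deque


class SlidingRuleCounts:
    """Maintain support/confidence of a fixed rule list over the last w records.

    Hypothetical simplification: rules are given up front and stored as
    (frozenset of (attribute, value) pairs, label); no RS-tree/REC-tree.
    """

    def __init__(self, rules, w):
        self.rules = rules
        self.w = w
        self.window = deque()
        self.p_count = [0] * len(rules)    # window records satisfying P
        self.hit_count = [0] * len(rules)  # window records satisfying P with label c

    def _apply(self, record, delta):
        attrs, label = record
        items = set(attrs.items())
        for i, (p, c) in enumerate(self.rules):
            if p <= items:
                self.p_count[i] += delta
                if label == c:
                    self.hit_count[i] += delta

    def add(self, record):
        # Count the new record in; once more than w records are held,
        # count the oldest one out, as in the update phase.
        self.window.append(record)
        self._apply(record, +1)
        if len(self.window) > self.w:
            self._apply(self.window.popleft(), -1)

    def stats(self, i):
        """(support, confidence) of rule i over the current window."""
        n = len(self.window)
        support = self.hit_count[i] / n if n else 0.0
        confidence = self.hit_count[i] / self.p_count[i] if self.p_count[i] else 0.0
        return support, confidence


rules = [(frozenset({("A", 1)}), "c1")]
tracker = SlidingRuleCounts(rules, w=4)
for rec in [({"A": 1}, "c1"), ({"A": 1}, "c1"), ({"A": 0}, "c2"),
            ({"A": 1}, "c1"), ({"A": 1}, "c2")]:
    tracker.add(rec)
print(tracker.stats(0))
```

Each arrival or expiry touches only the rules the record matches, which is the property that makes per-record updates cheap.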


Data Structure


RS-Tree

  • A prefix tree with attribute order

  • Each node N represents a unique rule R : P → Ci

  • N′ (P′ → Cj) is a child node of N, iff:
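A minimal prefix-tree sketch, assuming a hypothetical attribute order A1 < A2 < A3: every antecedent is sorted by that order, so rules that share a prefix share a path. As a simplification, a node here is keyed by its antecedent and stores the class labels of the rules ending there, whereas in the slides each node is a single rule P → Ci; the exact parent-child condition is not reproduced, and all names are illustrative.

```python
from dataclasses import dataclass, field

# Hypothetical global attribute order used to sort every antecedent.
ATTRIBUTE_ORDER = ["A1", "A2", "A3"]
RANK = {a: i for i, a in enumerate(ATTRIBUTE_ORDER)}


@dataclass
class Node:
    children: dict = field(default_factory=dict)  # (attribute, value) -> Node
    labels: set = field(default_factory=set)      # classes c of rules P -> c ending here


class RSTree:
    def __init__(self):
        self.root = Node()

    def insert(self, antecedent, label):
        """Insert rule P -> label, with P given as a dict {attribute: value}."""
        node = self.root
        for attr in sorted(antecedent, key=RANK.__getitem__):
            node = node.children.setdefault((attr, antecedent[attr]), Node())
        node.labels.add(label)

    def matching_rules(self, record):
        """All rules (P, c) whose antecedent P is satisfied by record (a dict)."""
        found = []

        def walk(node, path):
            for label in node.labels:
                found.append((tuple(path), label))
            for (attr, value), child in node.children.items():
                if attr in record and record[attr] == value:
                    walk(child, path + [(attr, value)])

        walk(self.root, [])
        return found


tree = RSTree()
tree.insert({"A1": "x"}, "c1")
tree.insert({"A2": "y", "A1": "x"}, "c2")  # shares the A1 = x prefix
tree.insert({"A3": "z"}, "c1")
print(tree.matching_rules({"A1": "x", "A2": "y", "A3": "w"}))
```

Because only children whose condition holds for the record are visited, finding the rules a record matches avoids scanning the whole rule set.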


REC-Tree

  • Each record r is represented as a sequence

  • Node N points to a rule in the RS-tree if:
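Under stated assumptions, the REC-tree can be sketched as a counted prefix tree over record sequences: inserting a record increments the counts along its path, and deleting the oldest record decrements them and prunes emptied branches. The attribute order and all names are hypothetical, and the pointers from REC-tree nodes to RS-tree rules are omitted because their defining condition is not given here.

```python
from dataclasses import dataclass, field

ATTRIBUTE_ORDER = ["A1", "A2", "A3"]  # hypothetical order shared with the RS-tree


def to_sequence(record):
    """Represent a record (dict attribute -> value) as a sequence in attribute order."""
    return tuple((a, record[a]) for a in ATTRIBUTE_ORDER if a in record)


@dataclass
class RecNode:
    count: int = 0
    children: dict = field(default_factory=dict)  # (attribute, value) -> RecNode


class RecTree:
    """Counted prefix tree of the records currently in the window."""

    def __init__(self):
        self.root = RecNode()

    def insert(self, record):
        node = self.root
        node.count += 1
        for item in to_sequence(record):
            node = node.children.setdefault(item, RecNode())
            node.count += 1

    def delete(self, record):
        """Remove one occurrence of record (assumed present), pruning emptied branches."""
        node = self.root
        node.count -= 1
        for item in to_sequence(record):
            child = node.children[item]
            child.count -= 1
            if child.count == 0:
                del node.children[item]  # no other record uses this branch
                return
            node = child

    def prefix_count(self, prefix):
        """Number of window records whose sequence starts with prefix."""
        node = self.root
        for item in prefix:
            node = node.children.get(item)
            if node is None:
                return 0
        return node.count


rec_tree = RecTree()
for r in [{"A1": "x", "A2": "z"}, {"A1": "x", "A2": "y"}, {"A2": "y", "A1": "x"}]:
    rec_tree.insert(r)
rec_tree.delete({"A1": "x", "A2": "z"})  # the oldest record leaves the window
print(rec_tree.prefix_count((("A1", "x"),)))
```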


Detecting Concept Drifts

  • Percentage vs. distribution of the misclassified records.

The percentage approach cannot tell us which part of the classifier gives rise to the inaccuracy.
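The contrast between the two views can be illustrated with a small sketch, assuming each classified record is logged as a (rule, correct) pair. The overall percentage only says how many records were misclassified, while the per-rule distribution points at the rules responsible. The log format, the `threshold` value, and the function names are hypothetical.

```python
from collections import Counter


def error_percentage(outcomes):
    """Percentage view: fraction of misclassified records in the window."""
    return sum(1 for _, correct in outcomes if not correct) / len(outcomes)


def error_distribution(outcomes):
    """Distribution view: misclassification rate of each rule that fired."""
    fired = Counter(rule for rule, _ in outcomes)
    wrong = Counter(rule for rule, correct in outcomes if not correct)
    return {rule: wrong[rule] / fired[rule] for rule in fired}


def suspect_rules(outcomes, threshold=0.5):
    """Rules whose local error rate exceeds a (hypothetical) threshold."""
    return sorted(r for r, rate in error_distribution(outcomes).items() if rate > threshold)


# Hypothetical log: rule R1 still fits the data, rule R2 has drifted.
outcomes = [("R1", True)] * 6 + [("R2", False)] * 3 + [("R2", True)]
print(error_percentage(outcomes))    # one number for the whole classifier
print(error_distribution(outcomes))  # one rate per rule
print(suspect_rules(outcomes))
```

Here the overall error of 30% does not say where the problem lies, while the distribution isolates R2 (75% error) as the part of the classifier to update.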


Definition


Finding Rule Algorithm


Update Algorithm


Experiments

  • CPU : 1.7 GHz

  • Memory : 256 MB

  • Datasets : synthetic and real-life datasets.

    • Synthetic :

    • Real-life dataset :

      • 10,344 records and 8 dimensions.


Effect of model updating

  • Synthetic data

    • 10 dimensions

    • Window size 5000

    • 4 dimensions changing


The relation of concept drifts and


Effect of rule composition


Accuracy and Time

  • Window size : 10,000

  • EC (ensemble classifier) : 10 classifiers, each trained on 1,000 records.

  • Synthetic data.


Real-life data


Conclusion

  • The approach overcomes the effects of concept drifts.

  • By reducing granularity, change detection and model update can be more efficient without compromising classification accuracy.

