
Dynamic Classifier Selection for Effective Mining from Noisy Data Streams


Presentation Transcript


  1. Dynamic Classifier Selection for Effective Mining from Noisy Data Streams
  Xingquan Zhu, Xindong Wu, and Ying Yang
  Proc. of KDD 2003
  2005/3/25, Presenter: 董原賓

  2. Problem
  Problem:
  • Many existing data stream mining efforts are based on Classifier Combination techniques
  • Real data streams exhibit dramatic concept drift and a significant amount of noise
  Solution:
  • Choose the single most reliable classifier for each test instance

  3. Multiple Classifier System (MCS)
  • MCS assumption: each base classifier has a particular sub-domain on which it is the most reliable
  • Two categories of MCS integration techniques:
    • Classifier Combination (CC) techniques: all base classifiers are combined to work out the final decision, e.g., SAM (Select All Majority)
    • Classifier Selection (CS) techniques: select the single best classifier from the base classifiers to make the final decision
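To make the two integration styles concrete, here is a minimal sketch (not from the paper): the classifiers are assumed to expose a scikit-learn-style predict(), and pick_best is a hypothetical stand-in for whatever selection rule is used, such as the AO-DCS rule introduced later.

```python
from collections import Counter

def sam_combine(classifiers, x):
    # Classifier Combination (CC): Select All Majority -- every base
    # classifier votes on instance x, and the majority label wins.
    votes = [clf.predict([x])[0] for clf in classifiers]
    return Counter(votes).most_common(1)[0][0]

def cs_select(classifiers, x, pick_best):
    # Classifier Selection (CS): delegate the decision to the single
    # classifier judged most reliable for this instance.
    best = pick_best(classifiers, x)
    return best.predict([x])[0]
```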

  4. Classifier Selection Techniques
  Two types of CS techniques:
  • Static Classifier Selection: the classifier is chosen during the training phase, e.g., CVM (Cross Validation Majority)
  • Dynamic Classifier Selection: the classifier is chosen during the classification phase; it is called "dynamic" because the classifier used critically depends on the test instance itself, e.g., DCS_LA (Dynamic Classifier Selection by Local Accuracy)

  5. Definitions
  • Dataset D, training set X, test set Y, and evaluation set Z
  • N_x, N_y, and N_z denote the numbers of instances in X, Y, and Z respectively
  • C_1, C_2, …, C_L: the L base classifiers learned from X
  • The selected best classifier C* classifies each instance I_x in Y

  6. Definitions (cont.)
  • The instances in D have M attributes A_1, A_2, …, A_M, and each attribute A_i takes n_i values V_1^{A_i}, …, V_{n_i}^{A_i}
  • For an attribute A_i, use its values to partition Z into n_i subsets S_1^{A_i}, …, S_{n_i}^{A_i}, where S_1^{A_i} ∪ … ∪ S_{n_i}^{A_i} = Z
  • I_k^{A_i} denotes instance I_k's value on attribute A_i
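A minimal sketch of this partitioning, assuming the evaluation set Z is a list of (attribute-tuple, label) pairs; the function name is mine, not the paper's:

```python
from collections import defaultdict

def partition_by_attribute(Z, i):
    """Split Z into subsets S_v^{A_i}, one per observed value v of
    attribute A_i; instances are (attribute-tuple, label) pairs."""
    subsets = defaultdict(list)
    for attributes, label in Z:
        subsets[attributes[i]].append((attributes, label))
    return subsets  # the union of the subsets is Z by construction
```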

  7. Attribute-Oriented Dynamic Classifier Selection (AO-DCS)
  Three steps of AO-DCS (sketched in code below):
  • Partition the evaluation set into subsets using the attribute values of the instances
  • Evaluate the classification accuracy of each base classifier on every subset
  • For a test instance, use its attribute values to select the corresponding subsets, and select the base classifier with the highest classification accuracy on them
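A compact sketch of steps 2 and 3, reusing partition_by_attribute from above; the accuracy bookkeeping and the scikit-learn-style predict() interface are my assumptions, not the paper's exact pseudocode.

```python
def evaluate_accuracies(classifiers, Z, num_attributes):
    """Step 2: accuracy of every base classifier on every subset S_v^{A_i}."""
    acc = {}  # (attribute index i, attribute value v, classifier index l) -> accuracy
    for i in range(num_attributes):
        for v, subset in partition_by_attribute(Z, i).items():
            for l, clf in enumerate(classifiers):
                correct = sum(clf.predict([a])[0] == y for a, y in subset)
                acc[(i, v, l)] = correct / len(subset)
    return acc

def ao_dcs_classify(classifiers, acc, x, num_attributes):
    """Step 3: pick the classifier with the highest average accuracy over
    the subsets indexed by test instance x's own attribute values."""
    def average_acy(l):
        scores = [acc[(i, x[i], l)] for i in range(num_attributes)
                  if (i, x[i], l) in acc]
        return sum(scores) / len(scores) if scores else 0.0
    best = max(range(len(classifiers)), key=average_acy)
    return classifiers[best].predict([x])[0]
```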

  8. Partition by attributes

  9. Partition by Attributes (Example)
  Instance I_Mary
  • Age: <30 (S_1^A), ≥30 (S_2^A)
  • Height: ≤160 (S_1^H), 161–180 (S_2^H), ≥181 (S_3^H)
  • Gender: Male (S_1^G), Female (S_2^G)
  S_1^G: I_Dave, I_John
  S_2^G: I_Mary, I_Martha, I_Nancy
  Base classifiers: C_1, C_2, C_3
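Feeding toy instances into the partition sketch above reproduces the gender subsets; the concrete ages, heights, and labels here are invented placeholders, since the slide only gives the bins and the subset memberships.

```python
# Attribute order: (Age, Height, Gender); labels are dummy values.
Z = [
    ((25, 170, "Male"),   "+"),  # I_Dave
    ((40, 182, "Male"),   "-"),  # I_John
    ((28, 158, "Female"), "+"),  # I_Mary
    ((35, 165, "Female"), "-"),  # I_Martha
    ((50, 175, "Female"), "+"),  # I_Nancy
]
by_gender = partition_by_attribute(Z, 2)
# by_gender["Male"]   -> I_Dave, I_John             (S_1^G)
# by_gender["Female"] -> I_Mary, I_Martha, I_Nancy  (S_2^G)
```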

  10. Evaluate the Classification Accuracy
  [Figure: the subsets from each attribute A_i are crossed with the L base classifiers to form a per-subset accuracy table]

  11. The classification accuracy
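The slide presents this as a figure; a plausible reconstruction of the per-subset accuracy, in the notation of slide 6, is:

```latex
\mathrm{Acc}\bigl(C_l,\ S_j^{A_i}\bigr) \;=\;
\frac{\bigl|\{\, I_k \in S_j^{A_i} \;:\; C_l(I_k) = \mathrm{label}(I_k) \,\}\bigr|}
     {\bigl|\, S_j^{A_i} \,\bigr|}
```

i.e., the fraction of evaluation instances in subset S_j^{A_i} that classifier C_l labels correctly.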

  12. Dynamic Classifier Selection
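Presumably the selection rule averages these subset accuracies over the test instance's own attribute values and takes the argmax; this is a reconstruction consistent with the worked example on the next slide, not the paper's verbatim formula:

```latex
\mathrm{AverageAcy}[l] \;=\; \frac{1}{M} \sum_{i=1}^{M}
\mathrm{Acc}\bigl(C_l,\ S_{j(x,i)}^{A_i}\bigr),
\qquad
C^{*} \;=\; \arg\max_{l}\ \mathrm{AverageAcy}[l]
```

where S_{j(x,i)}^{A_i} is the subset of Z whose instances share test instance x's value on attribute A_i.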

  13. Example: the accuracy of C_1 is AverageAcy[1] = (0.8 + 0.6 + 0.4) / 3 = 0.6; likewise AverageAcy[2] = 0.63 and AverageAcy[3] = 0.56. C_2 has the highest average accuracy, so it is selected as C* for this test instance.

  14. Applying AO-DCS in Data Stream Mining
  Steps:
  • Partition the streaming data into a series of chunks S_1, S_2, …, S_i, …, each of which is small enough to be processed by the algorithm at one time
  • Then learn a base classifier C_i from each chunk S_i

  15. Applying AO-DCS in Data Stream Mining (cont.)
  • Evaluate all base classifiers (if the number of base classifiers grows too large, keep only the most recent K classifiers) and determine the "best" one for each test instance
  • Note: the evaluation set Z is constructed dynamically from the most recent instances, because they are likely to be consistent with the current test instances
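A minimal end-to-end sketch of slides 14–15, reusing the AO-DCS helpers above; the value of K, the window length, and train_classifier are all assumptions for illustration.

```python
from collections import deque

K = 10            # most recent base classifiers to keep (assumed)
EVAL_SIZE = 500   # most recent instances used as the evaluation set Z (assumed)

classifiers = deque(maxlen=K)      # the oldest classifier is dropped automatically
recent = deque(maxlen=EVAL_SIZE)   # sliding evaluation window

def process_stream(chunks, train_classifier, num_attributes):
    for chunk in chunks:               # each chunk S_i is a list of (x, y) pairs
        if classifiers:
            # Classify incoming instances with AO-DCS, evaluating the kept
            # classifiers on the most recent instances (the dynamic Z).
            acc = evaluate_accuracies(list(classifiers), list(recent),
                                      num_attributes)
            for x, _y in chunk:
                label = ao_dcs_classify(list(classifiers), acc, x,
                                        num_attributes)
                # ... emit `label` downstream ...
        classifiers.append(train_classifier(chunk))  # learn C_i from S_i
        recent.extend(chunk)           # slide the evaluation window forward
```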

  16.–22. Experiment [result figures]
