
The Generalized Condensed Nearest Neighbor Rule as A Data Reduction Method


Presentation Transcript


  1. The Generalized Condensed Nearest Neighbor Rule as A Data Reduction Method C.-H. Chou, B.-H. Kuo & F. Chang, Institute of Information Science, Academia Sinica, Taipei, Taiwan. ICPR 2006

  2. Abstract • We propose a new data reduction algorithm, GCNN (Generalized Condensed Nearest Neighbor). • It is called the GCNN algorithm because it generalizes the absorption criterion employed by CNN. • Moreover, GCNN can yield significantly better accuracy than other instance-based data reduction methods. • We demonstrate the last claim through experiments on five datasets, some of which contain a very large number of samples.

  3. Introduction • Most data reduction schemes are based on certain prototype learning methods, which can be divided into two types. 1. Instance-based learning (IBL) algorithms - CNN, GCNN 2. Clustering-based learning (CBL) algorithms - the k-means clustering algorithm - the fuzzy c-means algorithm

  4. Advantages of GCNN • It incorporates CNN as a special case and can outperform CNN. • Under certain conditions, GCNN is consistent. • One of these conditions requires that any two sets of data with different labels have a positive separation. • GCNN creates prototypes for all labels simultaneously, in contrast to SVM, which creates support vectors for one pair of labels at a time.

  5. Positive Distance vs. Margin

  6. CNN Algorithm • Our goal is to extract a subset Un from Xn such that if u is the nearest member of Un to xi, then l(u) = yi, where l(u) is the label of u. • Un is called the prototype set, and each of its members u is called a prototype. • Samples that match in label with their nearest prototypes are said to be absorbed. • Two labeled entities are homogeneous if they have the same label, and heterogeneous otherwise.

  7. CNN Algorithm • For CNN, a sample x is absorbed if • ||x - q|| - ||x - p|| > 0, (1) • where p and q are prototypes: p is the nearest homogeneous prototype to x, and q is the nearest heterogeneous prototype to x.
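To make criterion (1) concrete, here is a minimal NumPy sketch of the weak absorption check; the function name is_weakly_absorbed and its arguments are illustrative choices, not names from the paper.

```python
import numpy as np

def is_weakly_absorbed(x, y, prototypes, proto_labels):
    """Check CNN's criterion (1): ||x - q|| - ||x - p|| > 0, where p is the
    nearest prototype sharing x's label and q is the nearest prototype
    with a different label."""
    prototypes = np.asarray(prototypes, dtype=float)
    proto_labels = np.asarray(proto_labels)
    dists = np.linalg.norm(prototypes - np.asarray(x, dtype=float), axis=1)
    same = proto_labels == y
    if not same.any() or same.all():
        return False  # need at least one homogeneous and one heterogeneous prototype
    d_p = dists[same].min()    # distance to nearest homogeneous prototype p
    d_q = dists[~same].min()   # distance to nearest heterogeneous prototype q
    return d_q - d_p > 0
```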

  8. GCNN Algorithm • For GCNN, a sample x is absorbed if • ||x - q|| - ||x - p|| > ρ·δn, (2) • where δn is the smallest distance between any two heterogeneous samples in Xn and ρ ∈ [0, 1). • We say that a sample is weakly absorbed if it satisfies (1), and strongly absorbed if it satisfies (2). Note that (1) corresponds to the case ρ = 0 in (2).

  9. GCNN Algorithm • S1 Initialization: For each label y, randomly select a y-sample as a new y-prototype. • S2 Absorption Check: Check whether all samples have been strongly absorbed. If so, terminate the process; otherwise, proceed to the next step. • S3 Prototype Augmentation: For each label y, if there are any unabsorbed y-samples, randomly select one as a new y-prototype; otherwise, add no new prototype for label y. Return to step S2.
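Putting steps S1-S3 together with the strong absorption criterion (2), the sketch below shows one possible implementation of the GCNN loop, assuming Euclidean distance and NumPy arrays; the names gcnn, rho, and delta_n are our own, and setting rho=0 recovers CNN's weak absorption.

```python
import numpy as np

def gcnn(X, y, rho=0.5, seed=None):
    """Sketch of GCNN prototype selection.

    X: (n, d) samples, y: (n,) labels, 0 <= rho < 1.
    Assumes at least two distinct labels. Returns indices of the prototypes.
    """
    rng = np.random.default_rng(seed)
    X, y = np.asarray(X, dtype=float), np.asarray(y)

    # delta_n: smallest distance between any two heterogeneous samples.
    # The O(n^2) pairwise computation is fine for a sketch.
    pair_d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    hetero = y[:, None] != y[None, :]
    delta_n = pair_d[hetero].min()

    # S1: one randomly chosen prototype per label.
    proto_idx = [rng.choice(np.where(y == lab)[0]) for lab in np.unique(y)]

    while True:
        P, pl = X[proto_idx], y[proto_idx]
        d = np.linalg.norm(X[:, None, :] - P[None, :, :], axis=-1)
        same = y[:, None] == pl[None, :]
        d_p = np.where(same, d, np.inf).min(axis=1)   # nearest homogeneous prototype
        d_q = np.where(~same, d, np.inf).min(axis=1)  # nearest heterogeneous prototype
        absorbed = d_q - d_p > rho * delta_n          # strong absorption, criterion (2)

        # S2: stop once every sample is strongly absorbed.
        if absorbed.all():
            return np.array(proto_idx)

        # S3: for each label with unabsorbed samples, add one of them as a prototype.
        for lab in np.unique(y):
            cand = np.where((~absorbed) & (y == lab))[0]
            if cand.size:
                proto_idx.append(rng.choice(cand))
```

A typical use of such a routine would be protos = gcnn(X, y, rho=0.5), after which a 1-NN classifier built on X[protos] replaces one built on the full training set, which is the data reduction role the paper describes.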

  10. KNN

  11. CNN

  12. GCNN

  13. GCNN vs. CNN

  14. Datasets

  15. Experimental Results

  16. Experimental Results

  17. Experimental Results
