Improving accuracy by weighting Euclidean distance

  1. A useful feature of iSAX is that it can trivially support weighted queries, unlike DCT, DFT, SVD and CHEB. Weighted queries can be supported by DWT and APCA only with some difficulty. While it has been suggested that weighting might improve the classification accuracy of time series, empirical results are lacking. We repair that omission here.

  The largest time series classification problem we are aware of is the EMP data [a]. The problem is divided into a 2,000/18,000 train/test split of time series that are 2,000 data points long. We indexed the data with an iSAX word length of eight and conducted a simple search to find the best weighting parameters. The search algorithm used a Euclidean distance function that could weight the error contribution of each of the eight corresponding regions of a time series. For example, with a weight vector of [1 2 1 1 1 1 1 1], the first 250 points are given weight '1', the next 250 data points are given weight '2', etc. Starting with a weight vector of [1 1 1 1 1 1 1 1], we did a greedy forward search in which the only operator was to double a weight. On the training set, the error rate started at 12.3% and improved to 4.2% with a weight vector of [2 1 1 32 1 2 1 2].

  Using a trivial modification of our exact search algorithm that allows such weighted queries, we quickly determined the utility of weighted indexing. Without weighting, the error rate on the holdout set was 13.3%; with our learned weight vector it improved significantly to 4.5%. A detailed trace of the search may be found on the next slide.

  Note that, as stated above, our search algorithm can only handle weight vectors that are exactly the length of the iSAX words. This limitation can be removed by a simple lower bounding technique; we omit the details for brevity.

  [a] Jeffery, C. (2005). Synthetic Lightning EMP Data. http://public.lanl.gov/eads/datasets/emp/index.html Los Alamos National Laboratory
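  The region-weighted distance described above might be sketched as follows. This is a minimal illustration, not the authors' code: the function name and the equal-split region scheme are assumptions, chosen to match the description of eight regions of 250 points each for a 2,000-point series.

```python
import numpy as np

def weighted_euclidean(q, c, weights):
    """Euclidean distance where each of len(weights) equal-length regions
    of the series contributes its squared error scaled by its own weight.
    Assumes the series length is divisible by the number of regions."""
    q = np.asarray(q, dtype=float)
    c = np.asarray(c, dtype=float)
    regions = np.split((q - c) ** 2, len(weights))  # one chunk per weight
    return np.sqrt(sum(w * r.sum() for w, r in zip(weights, regions)))
```

  With an all-ones weight vector this reduces to the ordinary Euclidean distance, so the unweighted results are the special case [1 1 1 1 1 1 1 1].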

  2. Improving accuracy by weighting Euclidean distance: trace of the greedy search.

  Original weight vector [1 1 1 1 1 1 1 1], acc: 0.877. Candidate vectors tried include [2 1 1 1 1 1 1 1], [1 2 1 1 1 1 1 1], [1 1 2 1 1 1 1 1], [1 1 1 2 1 1 1 1], [1 1 1 1 2 1 1 1], [1 1 1 1 1 2 1 1], [1 1 1 1 1 1 2 1], [1 1 1 1 1 1 1 2], and, in later rounds, [2 1 1 2 1 1 1 1], [1 2 1 2 1 1 1 1], [1 1 2 2 1 1 1 1], up to the final vector [2 1 1 32 1 2 1 2].

  No Improvement @ Pos:0, weight:2, new acc: 0.8715
  No Improvement @ Pos:1, weight:2, new acc: 0.8705
  No Improvement @ Pos:2, weight:2, new acc: 0.869
  Pos:3, weight:2, new acc: 0.9145
  No Improvement @ Pos:4, weight:2, new acc: 0.849
  No Improvement @ Pos:5, weight:2, new acc: 0.87
  No Improvement @ Pos:6, weight:2, new acc: 0.867
  No Improvement @ Pos:7, weight:2, new acc: 0.8625
  ...
  Pos:3, weight:32, new acc: 0.9575

  After searching through the training set, the training accuracy went from 0.877 to 0.9575. Does this generalize? The accuracy on the 18,000-object test set was 0.867 without weighting, but 0.955 with weighting. Look in the notes section of this slide to see a complete trace of the search.
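  The greedy forward search traced above can be sketched as follows. This is a hypothetical reconstruction under stated assumptions: `evaluate` stands in for the 1-NN training-set accuracy (not shown), improvements are accepted as soon as they are found within a round, and the round limit is an illustrative parameter.

```python
def greedy_weight_search(evaluate, n_regions=8, max_rounds=20):
    """Greedy forward search whose only operator is doubling one weight.
    Starting from all ones, each round tries doubling every position and
    keeps any change that raises evaluate(weights); stops when a full
    round yields no improvement or max_rounds is reached."""
    weights = [1] * n_regions
    best = evaluate(weights)
    for _ in range(max_rounds):
        improved = False
        for pos in range(n_regions):
            trial = list(weights)
            trial[pos] *= 2  # the single search operator: double one weight
            acc = evaluate(trial)
            if acc > best:
                best, weights, improved = acc, trial, True
        if not improved:
            break
    return weights, best
```

  Because the operator only doubles, every learned weight is a power of two, which is consistent with the vector [2 1 1 32 1 2 1 2] found above (position 3 was doubled five times).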
