Presentation Transcript

Fair Use Agreement

  • This agreement covers the use of this presentation; please read it carefully.

  • You may freely use these slides for teaching, if

    • You send me an email telling me the class number/university in advance.

    • My name and email address appear on the first slide (if you are using all or most of the slides), or on each slide (if you are just taking a few slides).

  • You may freely use these slides for a conference presentation, if

    • You send me an email telling me the conference name in advance.

    • My name appears on each slide you use.

  • You may not use these slides for tutorials, or in a published work (tech report/conference paper/thesis/journal, etc.). If you wish to do this, email me first; it is highly likely I will grant you permission.

    • Please get in contact with Prof. Eamonn Keogh, [email protected]

    • (C) {Ken Ueno, Eamonn Keogh, Xiaopeng Xi}, University of California, Riverside

Anytime Classification Using the Nearest Neighbor Algorithm with Applications to Stream Mining

Draft ver. 12/12/2006

Ken Ueno Toshiba Corporation, Japan

(Visiting Postdoc Researcher at UC Riverside)

Xiaopeng Xi

Eamonn Keogh

Dah-Jye Lee Brigham Young University, U.S.A.

University of California, Riverside, U.S.A.

Outline of the Talk

  • Motivation & Background: usefulness of the anytime nearest neighbor classifier for real-world applications, including fish shape recognition.

  • Anytime Nearest Neighbor Classifier (ANNC)

  • SimpleRank, the critical ordering method for ANNC: how can we convert the conventional nearest neighbor classifier into the anytime version? What is the critical intuition?

  • Empirical Evaluations

  • Conclusion

Case Study: Fish Recognition - Application for Video Monitoring System

(Video frames at 2.0 sec and 27.0 sec)

Preliminary experiments with Rotation-Robust DTW [Keogh 05]

Time intervals tend to vary among fish appearances

Anytime classifiers are plausible for streaming shape recognition.

Real World Problems for Data Mining

  • When will it be finished?

  • Challenges for Data Mining in Real World Applications.

    • Accuracy / speed trade-off

    • Limited memory space

    • Real time processing

  • Best-so-far Answer Available anytime?

Motion Search

Fish Migration

Biological Shape Recognition

Multimedia Intelligence

Medical Diagnosis

Anytime Algorithms

  • Trading execution time for quality of results.

  • Always has a best-so-far answer available.

  • Quality of the answer improves with execution time.

  • Allows users to suspend the process during execution and resume it if needed.

(Diagram: 1. Suspend, 2. Peek the results, 3. Continue if you want)

Anytime Characteristics

  • Interruptibility

    After some small amount of setup time, the algorithm can be stopped at any time and provide an answer

  • Monotonicity

    The quality of the result is a non-decreasing function of computation time

  • Diminishing returns

    The improvement in solution quality is largest at the early stages of computation, and diminishes over time

  • Measurable Quality

    The quality of an approximate result can be determined

  • Preemptability

    The algorithm can be suspended and resumed with minimal overhead

[Zilberstein and Russell 95]
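These properties can be illustrated with a minimal, hypothetical anytime loop (a sketch for illustration only, not from the paper; `anytime_mean` and `budget` are invented names): a best-so-far answer exists after every step, so the caller can interrupt at any point and still get a usable result.

```python
# Minimal anytime-algorithm skeleton (a hypothetical illustration, not
# from the paper). A best-so-far answer exists after every step, so the
# caller can interrupt early (interruptibility), and further computation
# only refines the running estimate.

def anytime_mean(stream, budget):
    """Running mean of a sequence; usable answer after every item."""
    total, count = 0.0, 0
    best_so_far = None
    for i, x in enumerate(stream):
        total += x
        count += 1
        best_so_far = total / count          # always-available answer
        if i + 1 >= budget:                  # interruption point
            break
    return best_so_far

print(anytime_mean([2, 4, 6, 8], budget=2))  # interrupted early -> 3.0
print(anytime_mean([2, 4, 6, 8], budget=4))  # full run -> 5.0
```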

Bumble Bee’s Anytime Strategy

To survive, I can perform the best judgment for finding real nectar, like "anytime learning"!

“Bumblebees can choose wisely or rapidly, but not both at once.”

Lars Chittka, Adrian G. Dyer, Fiola Bock, Anna Dornhaus, Nature Vol.424, 24 Jul 2003, p.388

Big Question:

How can we make classifiers wiser / more rapid, like bees?

Nearest Neighbor Classifiers

Anytime Algorithm + Lazy Learning


  • To the best of our knowledge there is no “Anytime Nearest Neighbor Classifier” so far.

  • Works naturally with arbitrary similarity measures.

  • Easily handles time series data by using DTW.

  • Robust & accurate
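For reference, the plain DTW distance behind that bullet can be sketched in a few lines (an illustrative textbook version with squared-error cost; the talk actually uses a rotation-robust DTW variant [Keogh 05], which is more involved):

```python
# Plain dynamic-time-warping distance, sketched for reference (textbook
# version with squared-error cost; the rotation-robust DTW variant
# [Keogh 05] used in the talk is more involved).

def dtw(a, b):
    """DTW distance between two numeric sequences."""
    n, m = len(a), len(b)
    INF = float('inf')
    D = [[INF] * (m + 1) for _ in range(n + 1)]   # cumulative cost matrix
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = (a[i - 1] - b[j - 1]) ** 2
            D[i][j] = cost + min(D[i - 1][j],      # insertion
                                 D[i][j - 1],      # deletion
                                 D[i - 1][j - 1])  # match
    return D[n][m]

print(dtw([1, 2, 3], [1, 2, 2, 3]))  # 0.0 -- warping absorbs the repeat
```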

Nearest Neighbor Classifiers

(Diagram: the estimated class of a query instance is determined from the class labels of its k nearest neighbors among the training instances)

  • Instance-based, lazy classification algorithm based on training exemplars.

  • Assigns an unknown instance the class label of its closest training exemplar, based on a chosen distance measure.

  • For k-Nearest Neighbor (k-NN), the answer is given by voting among the k closest exemplars.
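As a concrete baseline, the plain (non-anytime) k-NN vote might look like this minimal sketch (`knn_classify` and the toy data are illustrative, not code from the paper; Euclidean distance is assumed):

```python
import math
from collections import Counter

# Plain (non-anytime) k-NN by majority vote; an illustrative sketch
# using Euclidean distance, not code from the paper.

def knn_classify(train, query, k=3):
    """train: list of (vector, label); returns the majority label among
    the k training exemplars closest to query."""
    nearest = sorted(train, key=lambda t: math.dist(t[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

train = [((0.0, 0.0), 'A'), ((0.1, 0.2), 'A'),
         ((5.0, 5.0), 'B'), ((5.1, 4.9), 'B')]
print(knn_classify(train, (0.2, 0.1), k=3))  # two A votes beat one B -> 'A'
```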

How can we convert it into an anytime algorithm?

Designing the Anytime Nearest Neighbor

How can we make a good index for the training data?

(Diagram: plug-in design for any ordering method; setup in constant time)
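The design can be sketched as a simple scan over a pre-ordered training set (an illustrative sketch of the framework, not the authors' implementation; `anytime_1nn` and `interrupt_after` are invented names): the best-so-far nearest neighbor is updated one instance at a time, so interruption after any S instances yields a valid answer.

```python
import math

# Sketch of the anytime 1-NN scan (illustrative; not the authors' code).
# Training instances arrive in rank order, most useful first; the
# best-so-far label is a valid answer after every step S.

def anytime_1nn(ordered_train, query, interrupt_after=None):
    """ordered_train: list of (vector, label), best instances first."""
    best_dist, best_label = float('inf'), None
    for s, (vec, label) in enumerate(ordered_train, start=1):
        d = math.dist(vec, query)
        if d < best_dist:
            best_dist, best_label = d, label   # improve best-so-far answer
        if interrupt_after is not None and s >= interrupt_after:
            break                              # user interruption at time S
    return best_label

ordered = [((0.0, 0.0), 'A'), ((5.0, 5.0), 'B')]
print(anytime_1nn(ordered, (4.9, 5.1), interrupt_after=1))  # early answer: 'A'
print(anytime_1nn(ordered, (4.9, 5.1)))                     # full scan: 'B'
```

With a good ordering, the early best-so-far answers are already close to the full-scan answer; this is exactly why the ordering method is the critical plug-in.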

Tentative solution for good ordering

Numerosity Reduction: S must be decidablebefore classification

Anytime Preprocessing: S does not need to be decidablebefore classification

Static  Dynamic

Keypoint: in terms of interrupting time S

Tentative Solution for good ordering

  • Ordering Training Data is critical.

  • Critical points for classification results

  • Best first or worst last? Put non-critical points last.

  • Numerosity Reduction can partially provide good ordering solutions; that problem is very similar to the ordering problem for anytime algorithms.

  • Leave-one-out (k=1) within training data

JF: Two-Class Classification Problem

  • 2-D Gaussian ball

  • Hard to classify correctly because of the round shape.

  • We need a non-linear and fast-enough classifier.

Class A

Class B

We Cannot Use DP for the JF Problem

Dynamic Programming (DP): ans(n-1) → ans(n)

DP is locally optimal.

Ideal tessellations heavily depend on the entire feature space.

A good ordering captures the entire classification boundary in the early stages.

Numerosity Reduction

  • Scoring strategy: similar to Numerosity Reduction

    • Random Ranking (baseline)

    • DROP Algorithms [Wilson and Martinez 00]

      Weighting based on enemies / associates for Nearest Neighbor

      DROP1, DROP2, DROP3

  • NaïveRank Algorithms: sorting based on leave-one-out with 1-Nearest Neighbor

SimpleRank Ordering

based on NaïveRank Algorithm [Xi and Keogh 06]

Sorting by leave-one-out with 1-Nearest Neighbor

NaiveRank Anytime Framework + SimpleRank

1. Order training instances by the unimportance measure.

2. Sort them in reverse order.

Observation 1: Penalize the close instance with a different class label.

Observation 2: Adjust the penalty weights with regard to the number of classes.
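The two observations can be combined into a SimpleRank-style scoring sketch (illustrative; details may differ from the authors' implementation, and `simple_rank` is an invented name): under leave-one-out, each training instance "votes" for its own nearest neighbor, with a penalty weight of 2/(c-1) adjusting for the number of classes c.

```python
import math

# Sketch of SimpleRank-style scoring (illustrative; details may differ
# from the authors' implementation). Under leave-one-out, each training
# instance votes for its own nearest neighbor: +1 if that neighbor
# shares its class, otherwise a penalty of -2/(c - 1) for c classes
# (Observation 2). Sorting best-first gives the anytime scan order.

def simple_rank(train):
    """train: list of (vector, label); returns train sorted best-first."""
    c = len({label for _, label in train})
    penalty = 2.0 / max(c - 1, 1)
    score = [0.0] * len(train)
    for i, (vec, label) in enumerate(train):
        # leave-one-out: nearest neighbor of instance i among the others
        nn = min((j for j in range(len(train)) if j != i),
                 key=lambda j: math.dist(train[j][0], vec))
        if train[nn][1] == label:
            score[nn] += 1.0         # the neighbor classifies i correctly
        else:
            score[nn] -= penalty     # penalize the close enemy (Observation 1)
    order = sorted(range(len(train)), key=lambda j: score[j], reverse=True)
    return [train[j] for j in order]

train = [((0.0, 0.0), 'A'), ((0.1, 0.0), 'A'),
         ((0.15, 0.0), 'B'),               # a "B" deep inside A territory
         ((5.0, 5.0), 'B'), ((5.1, 5.0), 'B')]
print(simple_rank(train)[-1])  # the misleading instance is ranked last
```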

How SimpleRank Works

Ranking process on the JF Dataset by SimpleRank

Voronoi Tessellation on the JF Dataset, Movie (T = 1 … 50)

(Animation panels: SimpleRank vs. Random Rank; shading marks the wrong-class estimation area)

Empirical Evaluations

Fair evaluations based on diverse kinds of datasets.

All of the datasets are public and available for everyone!

UCI ICS Machine Learning Data Archive

UCI KDD Data Archive

UCR Time Series Data Mining Archive

K=1: Voting Records

10-fold Cross Validation

(Chart: accuracy vs. number of instances seen before interruption, S; curves for the Random, SimpleRank, and BestDrop tests)

K=1: Forest Cover Type

(Chart: accuracy (%) vs. number of instances seen before interruption)

K=1,3,5: Australian Credit

10-fold CV, Euclidean distance

(Chart: accuracy (%) vs. number of instances seen before interruption, on the Australian Credit dataset)

Preliminary Results in our experiments

K=1 Two Patterns

- Time Series Data -

Future research directions

Future Research Directions

  • Make ordering and sorting much faster: O(n log n) for sorting, plus α for the ordering itself.

  • Handling Concept Drift

  • Showing Confidence

Conclusion and Summary

  • Our Contributions:

    • A new framework for the Anytime Nearest Neighbor.

    • SimpleRank: quite simple but critically good ordering.

  • So far our method has achieved the highest accuracy on diverse datasets.

  • Demonstrates its usefulness for shape recognition in stream video mining.

Good Job!

This is the best-so-far ordering method suited to the anytime Nearest Neighbor!





Dr. Agenor Mafra-Neto, ISCA Technologies, Inc

Dr. Geoffrey Webb, Monash University

Dr. Ying Yang, Monash University

Dr. Dennis Shiozawa, BYU

Dr. Xiaoqian Xua, BYU

Dr. Pengcheng Zhana, BYU

Dr. Robert Schoenberger, Agris-Schoen Vision Systems, Inc

Jill Brady, UCR

NSF grant IIS-0237918

Many Thanks!!


Thank you for your attention.

Any Questions?
