New Algorithms for Efficient High-Dimensional Nonparametric Classification

Ting Liu, Andrew W. Moore, and Alexander Gray


Overview

  • Introduction

    • k Nearest Neighbors (k-NN)

    • KNS1: conventional k-NN search

  • New algorithms for k-NN classification

    • KNS2: for skewed-class data

    • KNS3: “are at least t of the k-NN positive?”

  • Results

  • Comments


Introduction: k-NN

  • k-NN

    • Nonparametric classification method.

    • Given a data set of n points and a query point q, it finds the k points closest to q and predicts the majority label among them.

    • Computational cost of a naive search is high, especially in the high-dimensional case.
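As a concrete baseline, the brute-force classifier can be sketched in a few lines (an illustrative NumPy sketch, not the authors' code; binary {0,1} labels are assumed). It makes the cost visible: every query touches all n points in all dimensions.

```python
import numpy as np

def knn_classify(X, y, q, k):
    """Brute-force k-NN: O(n * D) work per query, which motivates ball-trees."""
    d2 = np.sum((X - q) ** 2, axis=1)        # squared distance from q to every point
    nearest = np.argsort(d2)[:k]             # indices of the k closest points
    return int(np.sum(y[nearest]) * 2 > k)   # majority vote over {0,1} labels
```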


Introduction: KNS1

  • KNS1:

    • Conventional k-NN search with ball-tree.

    • Ball-Tree (binary):

      • Root node represents full set of points.

      • Leaf node contains some points.

      • Non-leaf node has two children nodes.

      • Pivot of a node: one of the points in the node, or the centroid of the points.

      • Radius of a node: the maximum distance from the pivot to any point contained in the node.
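The node definitions above can be rendered as a tiny recursive builder (an illustrative sketch, not the paper's construction procedure; it uses the centroid as the pivot and splits along the widest dimension):

```python
import numpy as np

class BallNode:
    """One ball-tree node: a pivot plus a radius covering all its points."""
    def __init__(self, points, leaf_size=2):
        self.pivot = points.mean(axis=0)   # centroid of the points as the pivot
        self.radius = np.linalg.norm(points - self.pivot, axis=1).max()
        self.points = points
        self.left = self.right = None
        if len(points) > leaf_size:        # non-leaf: split into two children
            dim = np.argmax(points.max(axis=0) - points.min(axis=0))
            order = np.argsort(points[:, dim])
            mid = len(points) // 2
            self.left = BallNode(points[order[:mid]], leaf_size)
            self.right = BallNode(points[order[mid:]], leaf_size)
```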


Introduction: KNS1

  • Bound the distance from a query point q to any point x in a node: max(|q − pivot| − radius, 0) ≤ |q − x| ≤ |q − pivot| + radius.

  • Trade off the cost of construction against the tightness of the radius of the balls.
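The bound follows from the triangle inequality; a minimal helper (illustrative names, not from the paper):

```python
import numpy as np

def dist_bounds(q, pivot, radius):
    """For any point x in the ball: max(d - r, 0) <= |q - x| <= d + r,
    where d = |q - pivot| and r = radius of the ball."""
    d = np.linalg.norm(q - pivot)
    return max(d - radius, 0.0), d + radius
```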


Introduction: KNS1

  • Recursive procedure: PSout = BallKNN(PSin, Node)

    • PSin consists of the k-NN of q in V (the set of points searched so far).

    • PSout consists of the k-NN of q in V ∪ Node.
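The recursion can be sketched as follows (a simplified rendering of the KNS1 idea, not the authors' exact pseudocode; PSin is kept as a max-heap keyed on negated distance, so its top gives the current k-th-nearest distance, and the lower bound from the previous slide prunes whole balls):

```python
import heapq
import numpy as np

class Node:
    """Minimal ball-tree node for this sketch: leaves hold points directly."""
    def __init__(self, pivot, radius, points=None, left=None, right=None):
        self.pivot, self.radius = np.asarray(pivot), radius
        self.points, self.left, self.right = points, left, right

def ball_knn(ps_in, node, q, k):
    """PSout = BallKNN(PSin, Node): extend the k-NN of q found so far in V
    with any closer points inside node, so PSout covers V union Node."""
    lower = max(np.linalg.norm(q - node.pivot) - node.radius, 0.0)
    if len(ps_in) == k and lower >= -ps_in[0][0]:
        return ps_in                         # prune: node cannot beat the k-th NN
    if node.points is not None:              # leaf: test each point directly
        for p in node.points:
            d = np.linalg.norm(q - p)
            if len(ps_in) < k:
                heapq.heappush(ps_in, (-d, tuple(p)))
            elif d < -ps_in[0][0]:
                heapq.heapreplace(ps_in, (-d, tuple(p)))
        return ps_in
    # search the closer child first so the other is more likely to be pruned
    for child in sorted((node.left, node.right),
                        key=lambda c: np.linalg.norm(q - c.pivot)):
        ps_in = ball_knn(ps_in, child, q, k)
    return ps_in
```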


KNS2

  • KNS2:

    • For skewed-class data: one class is much more frequent than the other.

    • Find the # of the k NN in the positive class without explicitly finding the k-NN set.

    • Basic idea:

      • Build two ball-trees: Postree (small), Negtree

      • “Find positive”: search Postree with KNS1 to find the k-NN set Possetk of q among the positive points;

      • “Insert negative”: search Negtree, using Possetk as bounds to prune faraway nodes and to count the negative points that belong in the true nearest-neighbor set.
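Stripped of the tree machinery, the counting logic behind the two steps can be sketched directly (an illustrative simplification; KNS2 reaches the same answer while pruning whole Negtree balls using the positive-neighbor distances as bounds):

```python
import numpy as np

def count_positives_in_knn(pos_pts, neg_pts, q, k):
    """Simplified KNS2 idea: step 1 finds the k nearest positives; step 2
    counts negatives closer than each of them.  Returns n, the number of
    positives among the true k-NN of q, without building the k-NN set."""
    dists = np.sort(np.linalg.norm(pos_pts - q, axis=1))[:k]  # "find positive"
    neg_d = np.linalg.norm(neg_pts - q, axis=1)               # "insert negative"
    # C[i] = number of negatives closer to q than the (i+1)-th positive
    C = np.array([np.sum(neg_d < d) for d in dists])
    # the (i+1)-th positive is in the k-NN iff (i+1) + C[i] <= k
    n = 0
    for i in range(len(dists)):
        if (i + 1) + C[i] <= k:
            n = i + 1
    return n
```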


KNS2

  • Definitions:

    • Dists = {Dist1, …, Distk}: the distances from q to its k nearest positive neighbors, sorted in increasing order.

    • V: the set of points in the negative balls visited so far.

    • (n, C): n is the # of positive points among the k-NN of q;
      C = {C1, …, Cn}, where Ci is the # of negative points in V closer to q than the i-th positive neighbor.


KNS2

Step 2, “insert negative”, is implemented by the recursive function

(nout, Cout) = NegCount(nin, Cin, Node, jparent, Dists)

(nin, Cin) summarize the interesting negative points for V;

(nout, Cout) summarize the interesting negative points for V ∪ Node.


KNS3

  • KNS3

    • “Are at least t of the k nearest neighbors positive?”

    • No constraint on class skewness.

    • Proposition: at least t of the k nearest neighbors are positive iff the t-th nearest positive point is no farther from q than the m-th nearest negative point, where

      m + t = k + 1

      • Instead of computing these two distances exactly, compute lower and upper bounds on each and tighten them until the comparison is decided.
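Stated on exact distances, the proposition can be checked directly (a sketch with illustrative names; KNS3 itself replaces the two sorted distance arrays with lower/upper bounds refined from ball-tree nodes, so it rarely needs the exact values):

```python
import numpy as np

def at_least_t_positive(pos_pts, neg_pts, q, k, t):
    """True iff at least t of q's k nearest neighbors are positive:
    equivalently, the t-th nearest positive is no farther than the
    m-th nearest negative, with m = k - t + 1 (so m + t = k + 1)."""
    m = k - t + 1
    pos_d = np.sort(np.linalg.norm(pos_pts - q, axis=1))
    neg_d = np.sort(np.linalg.norm(neg_pts - q, axis=1))
    if len(pos_d) < t:
        return False   # fewer than t positives exist at all
    if len(neg_d) < m:
        return True    # at most k - t negatives exist in total
    return bool(pos_d[t - 1] <= neg_d[m - 1])
```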


KNS3

P is a set of balls from Postree; N is a set of balls from Negtree.


Experimental results

  • Real data


Experimental results

k = 9, t = ceiling(k/2)

Randomly pick 1% of the negative records and 50% of the positive records as the test set (986 points);

train on the remaining 87,372 data points.


Comments

  • Why k-NN? It serves as a baseline.

  • No free lunch:

    • For uniformly distributed high-dimensional data, the new algorithms give no benefit.

    • The observed speedups suggest that the intrinsic dimensionality of the real data sets is much lower.

