Data Classification with the Radial Basis Function Network Based on a Novel Kernel Density Estimation Algorithm

Yen-Jen Oyang
Department of Computer Science and Information Engineering
National Taiwan University

Identifying Boundary of Different Classes of Objects

Boundary Identified

The Proposed RBF Network Based Classifier
  • The proposed algorithm constructs one RBF network for approximating the probability density function of one class of objects.
  • Classification of a new object is conducted based on the likelihood function: the new object is assigned to the class whose approximate probability density function yields the largest value at the object.
Rule Generated by the Proposed RBF (Radial Basis Function) Network Based Learning Algorithm

Let $\hat{f}_O$ and $\hat{f}_X$ denote the approximate probability density functions constructed for class “O” and class “X”, respectively.

If $\hat{f}_O(v) \ge \hat{f}_X(v)$,

then prediction = “O”.

Otherwise prediction = “X”.
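As a concrete illustration, here is a minimal sketch of this two-class decision rule in Python. The density functions `density_O` and `density_X` are hypothetical stand-ins for the per-class RBF networks constructed by the algorithm (a full construction sketch appears later in the transcript).

```python
import numpy as np

def classify(v, density_O, density_X):
    """Two-class decision rule: predict the class whose estimated
    probability density is larger at the query point v."""
    return "O" if density_O(v) >= density_X(v) else "X"

# Hypothetical stand-in densities (two Gaussian bumps).
density_O = lambda v: np.exp(-np.sum((v - 1.0) ** 2))
density_X = lambda v: np.exp(-np.sum((v + 1.0) ** 2))

print(classify(np.array([0.8, 1.1]), density_O, density_X))  # -> "O"
```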

Problem Definition of Kernel Smoothing
  • Given the values of a function $f$ at a set of samples $S = \{s_1, s_2, \dots, s_n\}$, we want to find a set of symmetric kernel functions $K_i$ and the corresponding weights $w_i$ such that
$$f(v) \cong \sum_{i=1}^{n} w_i K_i(v).$$
Kernel Smoothing with the Spherical Gaussian Functions
  • Hartman et al. showed that a linear combination of spherical Gaussian functions can approximate any function with arbitrarily small error.
  • “Layered neural networks with Gaussian hidden units as universal approximations”, Neural Computation, Vol. 2, No. 2, 1990.
Problem Definition of Kernel Density Estimation
  • Assume that we are given a set of samples $S = \{s_1, s_2, \dots, s_n\}$ taken from a probability distribution in a d-dimensional vector space. The problem now is to find a linear combination of kernel functions that approximates the probability density function of the distribution.
The value of the probability density function at a vector $v$ can be estimated as follows:

$$\hat{f}(v) = \frac{k}{n \cdot V\big(R(v, k)\big)},$$

where $n$ is the total number of samples, $R(v, k)$ is the distance between vector $v$ and its $k$-th nearest sample, and

$$V(r) = \frac{\pi^{d/2}}{\Gamma\!\left(\frac{d}{2} + 1\right)} \, r^d$$

is the volume of a sphere with radius $r$ in a d-dimensional vector space.
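A minimal sketch of this k-nearest-neighbor density estimate, assuming Euclidean distance and using `scipy.spatial.cKDTree` for the neighbor query (function and variable names are illustrative, not from the original slides):

```python
import numpy as np
from scipy.spatial import cKDTree
from scipy.special import gamma

def knn_density(samples, v, k):
    """Estimate the density at v as (k/n) divided by the volume of the
    d-dimensional sphere whose radius is the distance from v to its
    k-th nearest sample."""
    n, d = samples.shape
    tree = cKDTree(samples)
    r = tree.query(v, k=k)[0][-1]          # distance to the k-th nearest sample
    volume = (np.pi ** (d / 2) / gamma(d / 2 + 1)) * r ** d
    return (k / n) / volume

rng = np.random.default_rng(0)
samples = rng.standard_normal((1000, 2))   # 2-D standard normal
print(knn_density(samples, np.zeros(2), k=10))  # near 1/(2*pi) ~ 0.159
```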

The Existing Approaches for Kernel Smoothing with Spherical Gaussian Functions
  • One conventional approach is to place one Gaussian function at each sample. As a result, the problem becomes how to find a weight $w_i$ and a bandwidth $\sigma_i$ for each sample $s_i$ such that
$$f(v) \cong \sum_{i=1}^{n} w_i \exp\left(-\frac{\|v - s_i\|^2}{2\sigma_i^2}\right).$$
The most widely-used objective is to minimize

$$\sum_{v_j \notin S} \big(\hat{f}(v_j) - f(v_j)\big)^2,$$

where the $v_j$ are test samples and $S$ is the set of training samples.

  • The conventional approach suffers from high time complexity, approaching $O(n^3)$, due to the need to compute the inverse of an $n \times n$ matrix.
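To make the cost concrete, here is a sketch of the conventional approach under the common simplification of a single shared bandwidth: place one Gaussian per sample and solve a dense n-by-n linear system for the weights, which is where the roughly cubic cost comes from. The setup is illustrative, not the exact formulation on the slides.

```python
import numpy as np

def fit_weights(samples, values, sigma):
    """Fit one weight per Gaussian by solving the dense linear system
    G w = f, where G[i, j] = exp(-||s_i - s_j||^2 / (2 sigma^2)).
    The solve costs O(n^3), which dominates for large n."""
    diff = samples[:, None, :] - samples[None, :, :]
    G = np.exp(-np.sum(diff ** 2, axis=-1) / (2 * sigma ** 2))
    return np.linalg.solve(G, values)

def smooth(v, samples, weights, sigma):
    """Evaluate the fitted Gaussian mixture at a query point v."""
    d2 = np.sum((samples - v) ** 2, axis=-1)
    return weights @ np.exp(-d2 / (2 * sigma ** 2))

samples = np.linspace(0.0, 1.0, 50).reshape(-1, 1)
values = np.sin(2 * np.pi * samples[:, 0])
w = fit_weights(samples, values, sigma=0.03)
print(smooth(np.array([0.25]), samples, w, sigma=0.03))  # ~ sin(pi/2) = 1
```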
M. Orr proposed a number of approaches to reduce the number of units in the hidden layer of the RBF network.
  • Beatson et al. proposed $O(n \log n)$ learning algorithms using polyharmonic spline functions.
An O(n) Algorithm for Kernel Smoothing
  • In the proposed learning algorithm, we assume uniform sampling. That is, samples are located at the crosses of an evenly-spaced grid in the d-dimensional vector space. Let $\delta$ denote the distance between two adjacent samples.
  • If the assumption of uniform sampling does not hold, then some sort of interpolation can be conducted to obtain the approximate function values at the crosses of the grid.
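A minimal sketch of that preprocessing step, using `scipy.interpolate.griddata` to resample scattered observations onto an evenly-spaced grid (one plausible choice of interpolator; the slides do not prescribe a specific one):

```python
import numpy as np
from scipy.interpolate import griddata

rng = np.random.default_rng(1)
scattered = rng.uniform(0.0, 1.0, size=(500, 2))        # non-uniform samples
values = np.sin(2 * np.pi * scattered[:, 0]) * scattered[:, 1]

# Evenly-spaced grid with spacing delta in each dimension.
delta = 0.05
axis = np.arange(0.0, 1.0 + delta, delta)
gx, gy = np.meshgrid(axis, axis)

# Linear interpolation onto the grid crosses; points outside the convex
# hull of the scattered samples are filled with nearest-neighbor values.
grid_vals = griddata(scattered, values, (gx, gy), method="linear")
mask = np.isnan(grid_vals)
grid_vals[mask] = griddata(scattered, values, (gx, gy), method="nearest")[mask]
print(grid_vals.shape)  # (21, 21)
```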
The Basic Idea of the O(n) Kernel Smoothing Algorithm
  • Under the assumption that the sampling density is sufficiently high, i.e. $\delta \to 0$, the function values at a sample $s_i$ and at its $k$ nearest samples $s_{i_1}, \dots, s_{i_k}$ are virtually equal; that is, $f(s_i) \cong f(s_{i_1}) \cong \cdots \cong f(s_{i_k})$.
  • In other words, $f$ is virtually a constant function equal to $f(s_i)$ in the proximity of $s_i$.
In the 1-D example, samples are located at $s_i = i\delta$, where $i$ is an integer.

  • Under the assumption that the sampling density is sufficiently high, we have $f(s_{i-k}) \cong \cdots \cong f(s_i) \cong \cdots \cong f(s_{i+k})$ and $f(v) \cong f(s_i)$ for $v$ in the proximity of $s_i$.

  • The issue now is to find appropriate $w$ and $\sigma$ such that

$$\sum_{j=-\infty}^{\infty} w \exp\left(-\frac{(v - j\delta)^2}{2\sigma^2}\right) \cong 1.$$
In fact, it can be shown that with $\sigma = \beta\delta$ and $w = \dfrac{\delta}{\sqrt{2\pi}\,\sigma}$, the deviation of

$$\sum_{j=-\infty}^{\infty} \frac{\delta}{\sqrt{2\pi}\,\sigma} \exp\left(-\frac{(v - j\delta)^2}{2\sigma^2}\right)$$

from 1 is tightly bounded, and the bound shrinks rapidly as $\beta$ grows.

  • Therefore, we have the following function approximator:

$$\hat{f}(v) = \sum_{i=-\infty}^{\infty} \frac{\delta \, f(s_i)}{\sqrt{2\pi}\,\sigma} \exp\left(-\frac{(v - s_i)^2}{2\sigma^2}\right).$$
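A runnable sketch of this 1-D approximator on an evenly-spaced grid (the weight $\delta f(s_i)/(\sqrt{2\pi}\sigma)$ per unit follows the reconstruction above; treat the exact constant as an assumption rather than the slides' verbatim formula):

```python
import numpy as np

def smooth_1d(v, grid, fvals, beta=1.0):
    """O(n) kernel smoothing on an evenly-spaced 1-D grid: one Gaussian
    per grid sample, closed-form weight delta*f(s_i)/(sqrt(2*pi)*sigma),
    bandwidth sigma = beta * delta. No linear system is solved."""
    delta = grid[1] - grid[0]
    sigma = beta * delta
    w = delta * fvals / (np.sqrt(2 * np.pi) * sigma)
    return np.sum(w * np.exp(-(v - grid) ** 2 / (2 * sigma ** 2)))

grid = np.arange(0.0, 1.0001, 0.01)
fvals = np.sin(2 * np.pi * grid)
print(smooth_1d(0.25, grid, fvals))  # close to sin(pi/2) = 1
```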
Generalization of the 1-D Kernel Smoothing Function
  • We can generalize the result by setting $\sigma = \beta\delta$, where $\beta$ is a real number.
  • The table on the next page shows the bounds of the deviation from 1 with various $\beta$ values; a numeric check of these bounds is sketched below.
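Since the original table is not reproduced in this transcript, the following sketch numerically estimates how far the uniform-weight Gaussian sum deviates from 1 for several $\beta$ values (an illustrative check, not the slides' table):

```python
import numpy as np

def deviation_from_one(beta, delta=1.0, terms=200, points=1001):
    """Max |sum_j (delta/(sqrt(2*pi)*sigma)) * exp(-(v - j*delta)^2 /
    (2*sigma^2)) - 1| over one grid period, with sigma = beta*delta."""
    sigma = beta * delta
    v = np.linspace(0.0, delta, points)[:, None]     # one period suffices
    j = np.arange(-terms, terms + 1)[None, :] * delta
    s = np.sum(np.exp(-(v - j) ** 2 / (2 * sigma ** 2)), axis=1)
    s *= delta / (np.sqrt(2 * np.pi) * sigma)
    return np.max(np.abs(s - 1.0))

for beta in (0.5, 0.7, 1.0, 1.5):
    print(beta, deviation_from_one(beta))
# The deviation drops extremely fast as beta grows; it is already
# negligible around beta ~ 1.
```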

The Smoothing Effect
  • The kernel smoothing function is actually a weighted average of the sampled function values. Therefore, selecting a larger $\beta$ value implies that the smoothing effect will be more significant.
  • Our suggestion is to set $\beta$ just large enough that the approximation error bound is negligible, since larger values smooth away more detail.
An Example of the Smoothing Effect

Figure: the smoothing effect.

Figure: elimination of the smoothing effect with a compensation procedure.
Compensation of the Smoothing Effect and Handling of Random Noises
  • Let $\tilde{f}(s_i) = f(s_i) + \varepsilon_i$ denote the observed function value at sample $s_i$, where $\varepsilon_i$ is the random noise due to the sampling procedure.
  • The expected value of the random noise at each sample is 0.
The General Form of a Kernel Smoothing Function in the Multi-Dimensional Vector Space
  • Under the assumption that the sampling density is sufficiently high, i.e. $\delta \to 0$, the function values at a sample $s_i$ and at its $k$ nearest samples $s_{i_1}, \dots, s_{i_k}$ are virtually equal; that is, $f(s_i) \cong f(s_{i_1}) \cong \cdots \cong f(s_{i_k})$.
As a result, we can expect that

$$f(v) \cong \sum_{i} w_i \exp\left(-\frac{\|v - s_i\|^2}{2\sigma_i^2}\right),$$

where $w_i$ and $\sigma_i$ are the weights and bandwidths of the Gaussian functions located at the samples $s_i$, respectively.
Since the influence of a Gaussian function decreases exponentially as the distance increases, we can set $k$ to a value such that, for a vector $v$ in the proximity of sample $s_i$, the Gaussian functions located at samples other than the $k$ nearest ones contribute negligibly to the sum; a truncated evaluation is sketched below.
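A sketch of that truncation, using a k-d tree to fetch the k nearest Gaussian units and evaluating only those (the names and the choice of k are illustrative):

```python
import numpy as np
from scipy.spatial import cKDTree

def truncated_eval(v, centers, weights, sigmas, tree, k=30):
    """Evaluate a Gaussian mixture at v using only the k nearest units;
    the remaining units decay exponentially and are ignored."""
    _, idx = tree.query(v, k=k)
    d2 = np.sum((centers[idx] - v) ** 2, axis=-1)
    return np.sum(weights[idx] * np.exp(-d2 / (2 * sigmas[idx] ** 2)))

rng = np.random.default_rng(2)
centers = rng.uniform(0.0, 1.0, size=(5000, 2))
weights = np.ones(5000) / 5000
sigmas = np.full(5000, 0.02)
tree = cKDTree(centers)
print(truncated_eval(np.array([0.5, 0.5]), centers, weights, sigmas, tree))
```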
Since we have $f(s_{i_1}) \cong f(s_{i_2}) \cong \cdots \cong f(s_{i_k})$ for the $k$ nearest samples of $s_i$,

our objective is to find $w_i$ and $\sigma_i$ such that

$$\sum_{i} w_i \exp\left(-\frac{\|v - s_i\|^2}{2\sigma_i^2}\right) \cong f(v).$$
Let $\sigma_i = \sigma = \beta\delta$ for all samples on the grid.

  • Then, we have

$$\sum_{s_i \in \text{grid}} \exp\left(-\frac{\|v - s_i\|^2}{2\sigma^2}\right) = \prod_{j=1}^{d} \left( \sum_{i=-\infty}^{\infty} \exp\left(-\frac{(v_j - i\delta)^2}{2\sigma^2}\right) \right),$$

i.e. the d-dimensional sum factors into a product of 1-D sums of the form analyzed above.
Therefore, with $\sigma = \beta\delta$,

$$\sum_{s_i \in \text{grid}} \exp\left(-\frac{\|v - s_i\|^2}{2\sigma^2}\right)$$

is virtually a constant function equal to $(\sqrt{2\pi}\,\beta)^d$.

  • Accordingly, we want to set $w_i = \dfrac{f(s_i)}{(\sqrt{2\pi}\,\beta)^d}$.
Finally, by setting $\sigma_i$ uniformly to $\beta\delta$, we obtain the following kernel smoothing function that approximates $f(v)$:

$$\hat{f}(v) = \sum_{i} \frac{f(s_i)}{(\sqrt{2\pi}\,\beta)^d} \exp\left(-\frac{\|v - s_i\|^2}{2(\beta\delta)^2}\right).$$
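Putting the pieces together, a compact sketch of the resulting multi-dimensional smoother on a uniform grid (under the reconstruction above, each sample contributes the closed-form weight $f(s_i)/(\sqrt{2\pi}\beta)^d$):

```python
import numpy as np

def smooth_nd(v, grid_pts, fvals, delta, beta=1.0):
    """Multi-dimensional O(n) kernel smoothing: one spherical Gaussian per
    grid sample with bandwidth beta*delta and weight
    f(s_i)/(sqrt(2*pi)*beta)^d. The weights are closed-form; no training."""
    d = grid_pts.shape[1]
    sigma = beta * delta
    w = fvals / (np.sqrt(2 * np.pi) * beta) ** d
    d2 = np.sum((grid_pts - v) ** 2, axis=-1)
    return np.sum(w * np.exp(-d2 / (2 * sigma ** 2)))

# 2-D grid with spacing delta, f(x, y) = x + y.
delta = 0.05
axis = np.arange(0.0, 1.0 + delta, delta)
gx, gy = np.meshgrid(axis, axis)
grid_pts = np.column_stack([gx.ravel(), gy.ravel()])
fvals = grid_pts.sum(axis=1)
print(smooth_nd(np.array([0.4, 0.3]), grid_pts, fvals, delta))  # ~ 0.7
```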
Application in Data Classification
  • One of the applications of the RBF network is data classification.
  • However, recent development in data classification has focused on support vector machines (SVMs), due to accuracy concerns.
  • In this lecture, we will describe an RBF network based data classifier that can deliver the same level of accuracy as the SVM and enjoys some advantages.
The Proposed RBF Network Based Classifier
  • The proposed algorithm constructs one RBF network for approximating the probability density function of one class of objects based on the kernel smoothing algorithm that we just presented.
The Proposed Kernel Density Estimation Algorithm for Data Classification
  • Classification of a new object $v$ is conducted based on the likelihood function: $v$ is assigned to the class $m$ that maximizes the approximate class-conditional density $\hat{f}_m(v)$.
Let us adopt the following estimation of the value of the probability density function at each training sample, applying the $k$-nearest-neighbor estimator introduced earlier at $v = s_i$:

$$\hat{f}(s_i) = \frac{k}{n \cdot V\big(R(s_i, k)\big)}.$$
In the kernel smoothing problem, we set the bandwidth of each Gaussian function uniformly to $\beta\delta$, where $\delta$ is the distance between two adjacent training samples.
  • In the kernel density estimation problem, for each training sample $s_i$ we need to determine $\delta_i$, the average distance between two adjacent training samples of the same class in the local region around $s_i$.
In the d-dimensional vector space, if the average distance between samples is $\delta$, then the number of samples in a subspace of volume $V$ is approximately equal to $V / \delta^d$.

  • Accordingly, we can estimate $\delta_i$ by

$$\delta_i \cong \left( \frac{V\big(R(s_i, k)\big)}{k} \right)^{1/d},$$

where $R(s_i, k)$ is the distance between $s_i$ and its $k$-th nearest training sample of the same class.
Accordingly, with the kernel smoothing function that we obtained earlier, we have the following approximate probability density function for class-m objects:

$$\hat{f}_m(v) = \frac{1}{n_m} \sum_{s_i \in \text{class } m} \frac{1}{(\sqrt{2\pi}\,\sigma_i)^d} \exp\left(-\frac{\|v - s_i\|^2}{2\sigma_i^2}\right), \qquad \sigma_i = \beta\,\delta_i,$$

where $n_m$ is the number of class-m training samples.
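The following end-to-end sketch assembles the pieces reconstructed above: per-sample local spacing $\delta_i$ from the k nearest same-class neighbors, bandwidth $\sigma_i = \beta\delta_i$, one Gaussian mixture per class, and a maximum-likelihood decision. Treat the exact constants, and the defaults k = 10 and $\beta$ = 0.7, as assumptions of this reconstruction, not the paper's verbatim formulas.

```python
import numpy as np
from scipy.spatial import cKDTree
from scipy.special import gamma

def sphere_volume(r, d):
    """Volume of a d-dimensional sphere of radius r."""
    return (np.pi ** (d / 2) / gamma(d / 2 + 1)) * r ** d

class RBFDensityClassifier:
    """One Gaussian-mixture density per class; predict by max likelihood."""

    def __init__(self, k=10, beta=0.7):
        self.k, self.beta = k, beta

    def fit(self, X, y):
        self.models = {}
        for c in np.unique(y):
            S = X[y == c]
            n, d = S.shape
            tree = cKDTree(S)
            # Distance to the k-th nearest same-class sample (excluding self).
            r = tree.query(S, k=self.k + 1)[0][:, -1]
            # Local spacing delta_i ~ (V/k)^(1/d); bandwidth sigma_i = beta*delta_i.
            delta = (sphere_volume(r, d) / self.k) ** (1.0 / d)
            sigma = self.beta * delta
            w = 1.0 / (n * (np.sqrt(2 * np.pi) * sigma) ** d)
            self.models[c] = (S, w, sigma)
        return self

    def predict(self, v):
        def density(c):
            S, w, sigma = self.models[c]
            d2 = np.sum((S - v) ** 2, axis=-1)
            return np.sum(w * np.exp(-d2 / (2 * sigma ** 2)))
        return max(self.models, key=density)

rng = np.random.default_rng(3)
X = np.vstack([rng.normal(0, 1, (200, 2)), rng.normal(4, 1, (200, 2))])
y = np.array(["O"] * 200 + ["X"] * 200)
clf = RBFDensityClassifier().fit(X, y)
print(clf.predict(np.array([0.5, 0.5])))  # -> "O"
```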
An interesting observation is that, regardless of the value of $\beta$, we have $\hat{f}_m(s_i) \cong \hat{f}(s_i)$ at each training sample.

  • If the observation holds generally, then the classification result is essentially insensitive to the choice of $\beta$.
In the discussion above, $R(s_i, 1)$ is defined to be the distance between sample $s_i$ and its nearest training sample.
  • However, this definition depends on only one single sample and tends to be unreliable if the data set is noisy.
  • We can replace $R(s_i, 1)$ with the average distance between $s_i$ and its $k$ nearest training samples of the same class.
Parameter Tuning
  • The discussions so far are based on the assumption that the sampling density is sufficiently high, which may not hold for some real data sets.
One may wonder how $\beta$ should be set.
  • According to our experimental results, the value of $\beta$ has essentially no effect, as long as $\beta$ is set to a value within a reasonable range.
Time Complexity
  • The average time complexity to construct an RBF network is $O(n \log n)$ if the k-d tree structure is employed, where $n$ is the number of training samples.

  • The time complexity to classify $c$ new objects with unknown class is $O(c \log n)$.
Data Reduction
  • As the proposed learning algorithm is instance-based, removal of redundant training samples will lower the complexity of the RBF network.
  • The effect of a naïve data reduction mechanism was studied.
  • The naïve mechanism removes a training sample if all of its 10 nearest samples belong to the same class as this particular sample; a sketch of the mechanism follows.
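A minimal sketch of that mechanism (the neighbor count of 10 comes from the slide; the implementation details are assumptions):

```python
import numpy as np
from scipy.spatial import cKDTree

def reduce_training_set(X, y, k=10):
    """Drop every sample whose k nearest other samples all share its class;
    such samples lie deep inside their class region and are redundant for
    locating the decision boundary."""
    tree = cKDTree(X)
    # k+1 neighbors because each point is its own nearest neighbor.
    idx = tree.query(X, k=k + 1)[1][:, 1:]
    keep = np.array([not np.all(y[nbrs] == y[i]) for i, nbrs in enumerate(idx)])
    return X[keep], y[keep]

rng = np.random.default_rng(4)
X = np.vstack([rng.normal(0, 1, (300, 2)), rng.normal(3, 1, (300, 2))])
y = np.array([0] * 300 + [1] * 300)
Xr, yr = reduce_training_set(X, y)
print(len(X), "->", len(Xr))  # most interior samples are removed
```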
Appendix
  • Let

$$q(y) = \sum_{i=-\infty}^{\infty} \exp\left(-\frac{(y - i\delta)^2}{2\sigma^2}\right),$$

where $\delta \in \mathbb{R}$ and $\sigma \in \mathbb{R}$ are two coefficients and $y \in \mathbb{R}$.

  • We have $q(y + \delta) = q(y)$ and $q(-y) = q(y)$.
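As a supplementary check (a standard identity via Poisson summation, not taken from the slides), $q(y)$ admits an exact Fourier expansion from which its maximum, minimum, and deviation bounds follow; with $\sigma = \beta\delta$:

```latex
% Poisson summation applied to q(y); standard identity, not from the slides.
\[
q(y) \;=\; \sum_{i=-\infty}^{\infty} e^{-\frac{(y - i\delta)^2}{2\sigma^2}}
\;=\; \frac{\sqrt{2\pi}\,\sigma}{\delta}
\left( 1 + 2 \sum_{m=1}^{\infty} e^{-2\pi^2 m^2 \beta^2}
\cos\frac{2\pi m y}{\delta} \right),
\]
% so q attains its maximum at y = 0, its minimum at y = delta/2, and
\[
\left| \frac{q(y)}{\sqrt{2\pi}\,\beta} - 1 \right|
\;\le\; 2 \sum_{m=1}^{\infty} e^{-2\pi^2 m^2 \beta^2}
\;\approx\; 2\, e^{-2\pi^2 \beta^2}.
\]
```

This matches the behavior observed in the numeric check earlier: the deviation from the constant value decays extremely fast as $\beta$ grows.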
Since $q(y)$ is a symmetric and periodic function, if we want to find the global maximum and minimum values of $q(y)$, we only need to analyze $q(y)$ within the interval $[0, \delta/2]$.

  • Let $y_0 \in [0, \delta/2]$, and decompose $q(y_0)$ term by term, where $n \ge 1$ and $0 \le j < n - 1$ are integers indexing the terms on each side of $y_0$.

  • We then bound each term as follows.
Let $g(t) = \exp\left(-\dfrac{t^2}{2\sigma^2}\right)$.

  • Since $g(t)$ is an increasing function for $t \in [(h-1)\delta,\, h\delta]$ with $h \le 0$, and a decreasing function for $t \in [h\delta,\, (h+1)\delta]$ with $h \ge 0$, each term of $q(y_0)$ can be bounded between the integrals of $g$ over the neighboring subintervals; the global maximum and minimum of $q(y)$ follow from these bounds.