
Radial Basis Function Network and Support Vector Machine

Team 1: J-X Huang, J-H Kim, K-S Cho

2003. 10. 29


Outline

  • Radial Basis Function Network

    • Introduction

    • Architecture

    • Learning Strategies

    • MLP vs RBFN

  • Support Vector Machine

    • Introduction

    • VC Dimension, Structural Risk Minimization

    • Linear Support Vector Machine

    • Nonlinear Support Vector Machine

    • Conclusion


Radial Functions

  • Characteristic Feature

    • Response decreases (or increases) monotonically with distance from a central point.


Radial Basis Function Network

  • A kind of supervised neural network: a feedforward network with three layers

  • Approximates a function with a linear combination of radial basis functions

    F(x) = Σ_{i=1}^{M} w_i G(||x - x_i||)

  • G(||x - x_i||) is a radial basis function

    • Most often the Gaussian function

  • When M = number of samples, it is called a regularization network

  • When M < number of samples, we call it a radial basis function network


Architecture

[Figure: RBFN architecture with an input layer (x1, ..., xp), a hidden layer of radial basis functions, and an output layer formed from weights w0, w1, ..., wm]


Three Layers

  • Input layer

    • Source nodes that connect the network to its environment

  • Hidden layer

    • Each hidden unit (neuron) represents a single radial basis function

    • Has own center position and width (spread)

  • Output layer

    • Linear combination of hidden functions


Radial Basis Function

f(x) = Σ_{j=1}^{m} w_j h_j(x)

h_j(x) = exp( -||x - c_j||^2 / r_j^2 )

where c_j is the center of a region and r_j is the width of the receptive field
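
As a quick illustration of the two formulas above, here is a minimal Python sketch of the forward pass. It assumes Gaussian hidden units with given centers c_j, widths r_j, and output weights w_j; the function name and the toy values are made up for the example.

```python
import numpy as np

def rbf_forward(x, centers, widths, weights):
    """Gaussian RBF network output: f(x) = sum_j w_j * h_j(x)."""
    # h_j(x) = exp(-||x - c_j||^2 / r_j^2)
    dists = np.linalg.norm(x - centers, axis=1)   # distance from x to each center c_j
    h = np.exp(-(dists ** 2) / (widths ** 2))     # hidden-unit activations
    return weights @ h                            # linear output layer

# Toy example: 3 hidden units in a 2-D input space (values are illustrative)
centers = np.array([[0.0, 0.0], [1.0, 1.0], [2.0, 0.0]])
widths = np.array([0.5, 0.5, 0.5])
weights = np.array([1.0, -0.5, 0.3])
print(rbf_forward(np.array([0.5, 0.5]), centers, widths, weights))
```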


Simple Summary on RBFN

  • A Feedforward Network

  • A linear model with radial basis functions

  • Three layers:

    • Input layer, hidden layer, output layer

  • Each hidden unit

    • Represents a single radial basis function

    • Has own center position and width (spread)

  • Parameters

    • Centers, widths (spreads), and output weights



Design

  • Requires

    • Number of radial basis neurons

    • Selection of the center of each neuron

    • Selection of each width (spread) parameter


Number of Radial Basis Neurons

  • Decided by the designer

  • Maximum number of neurons = number of input samples

  • Minimum number of neurons is determined experimentally

  • More neurons

    • A more complex network, but a smaller error tolerance

  • Spread: the selectivity of the neuron




Learning Strategies

  • Two Levels of Learning

    • Center and spread learning (or determination)

    • Output layer weights learning

  • Fixed Center Selection

  • Self-organizing Center Selection

  • Supervised Selection of Centers with Weights

  • Make the number of parameters as small as possible

    • Curse of dimensionality


Fixed Center Selection

  • Fixed RBFs of the hidden units

    • The locations of the centers may be chosen randomly from the training data set.

    • We can use different values of centers and widths for each radial basis function -> experimentation with training data is needed.

  • Only the output layer weights need to be learned

    • Obtain the output layer weights by the pseudo-inverse method (a sketch follows below)

    • Main problem: requires a large training set for a satisfactory level of performance
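
A minimal sketch of that pseudo-inverse step, assuming Gaussian hidden units with a shared width and centers drawn at random from the training data, as described above; the helper name and the toy data are made up for the illustration.

```python
import numpy as np

def fit_output_weights(X, y, centers, width):
    """Solve for the output-layer weights with the pseudo-inverse: w = H^+ y."""
    # Design matrix: H[i, j] = exp(-||x_i - c_j||^2 / r^2)
    dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
    H = np.exp(-(dists ** 2) / (width ** 2))
    return np.linalg.pinv(H) @ y                  # least-squares solution

# Centers picked randomly from the training set, as on the slide
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))
y = np.sin(X[:, 0]) + np.cos(X[:, 1])
centers = X[rng.choice(len(X), size=10, replace=False)]
w = fit_output_weights(X, y, centers, width=1.0)
```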


Self-Organized Selection of Center

  • Self-organized learning of centers by means of clustering

    • Clustering on the Hidden Layer

    • K-means clustering

      • Initialization

      • Sampling

      • Similarity matching

      • Updating

      • Continuation


Self-Organized Selection of Center (cont.)

  • Setting spreads

    • By setting each spread to the average distance between the center and the c closest points in the cluster (e.g. c=5); see the sketch below

  • Supervised learning on the output Layer

    • Estimate the connection weights w by the iterative gradient descent method based on least squares
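
A minimal sketch of this two-stage recipe, assuming scikit-learn's KMeans for the clustering step; the helper name is made up, and c = 5 follows the example on the slide.

```python
import numpy as np
from sklearn.cluster import KMeans  # assumes scikit-learn is installed

def select_centers_and_spreads(X, n_centers, c=5):
    """K-means centers; each spread is the average distance from the center
    to the c closest points of its cluster."""
    km = KMeans(n_clusters=n_centers, n_init=10).fit(X)
    centers = km.cluster_centers_
    spreads = np.empty(n_centers)
    for j, cj in enumerate(centers):
        cluster_pts = X[km.labels_ == j]
        d = np.sort(np.linalg.norm(cluster_pts - cj, axis=1))
        spreads[j] = d[: min(c, len(d))].mean()   # mean distance to the c nearest points
    return centers, spreads

# The output weights would then be trained by gradient descent, as stated above.
```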


Supervised Selection of Centers

  • All free parameters are changed by supervised learning process

  • The centers are selected together with the weights during learning

  • Error-correction learning using least mean square (LMS) algorithm

  • Training for centers and spreads is very slow


Learning Formula

  • Linear weights (output layer)

  • Positions of centers (hidden layer)

  • Spreads of centers (hidden layer)
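
The update equations themselves appear only as images in the original slides. A sketch of their standard gradient-descent form, with learning rates η1, η2, η3 and the squared-error cost over the training set, is:

```latex
% Cost and instantaneous error
E = \tfrac{1}{2}\sum_{n} e_n^2, \qquad e_n = d_n - \sum_{j=1}^{m} w_j\, h_j(x_n)

% Linear weights (output layer)
w_j(t+1) = w_j(t) - \eta_1\, \frac{\partial E}{\partial w_j}

% Positions of centers (hidden layer)
c_j(t+1) = c_j(t) - \eta_2\, \frac{\partial E}{\partial c_j}

% Spreads of centers (hidden layer)
r_j(t+1) = r_j(t) - \eta_3\, \frac{\partial E}{\partial r_j}
```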


Approximation

  • RBF: Local network

    • Only inputs near a receptive field produce an activation

    • Can give “don’t know” output

  • MLP: Global network

    • All inputs cause an output






Outline

  • Radial Basis Function Network

    • Introduction

    • Radial Basis Function

    • Model

    • Training

  • Support Vector Machine

    • Introduction

    • VC Dimension, Structural Risk Minimization

    • Linear Support Vector Machine

    • Nonlinear Support Vector Machine

    • Conclusion


Introduction

  • Objective

    • Find an optimal hyperplane to:

      • Classify the data points as correctly as possible

      • Separate the points of two classes as far as possible

  • Approach

    • Formulate a constrained optimization problem

    • Solve it using constrained quadratic programming (constrained QP)

  • Theorem

    • Structural Risk Minimization


Key Idea: Transform to Higher Dimensional Space


Find the Optimal Hyperplane

[Figure: the maximum-margin (optimal) hyperplane compared with other separating hyperplanes]


Description on SVM

  • Given

    • A set of data points, each belonging to one of two classes

  • SVM: Finds the Optimal Hyperplane

    • Minimizes the risk of misclassifying the training samples and unseen test samples

    • Maximizes the distance of either class from the hyperplane


Outline

  • Introduction

  • VC Dimension, Structural Risk Minimization

  • Linear Support Vector Machine

  • Nonlinear Support Vector Machine

  • Conclusion


Upper Bound for Expected Risk

  • Minimize the Expected Risk

    • Minimize h, the VC dimension

    • Minimize the empirical risk
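
The bound itself is not reproduced in the transcript. The standard VC bound, which holds with probability 1 − η over n training samples and where h is the VC dimension, reads:

```latex
R(g) \;\le\; R_{\mathrm{emp}}(g) \;+\;
\sqrt{\frac{h\left(\ln\frac{2n}{h} + 1\right) - \ln\frac{\eta}{4}}{n}}
```

Minimizing the expected risk therefore means keeping both the empirical risk and the h-dependent confidence term small.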


VC Dimension and Empirical Risk

[Figure: classification error vs. VC dimension h: the empirical risk decreases with h while the confidence interval grows, giving an underfitting region, an overfitting region, and a minimum of the true risk in between]

  • Empirical Risk is Decreasing Function of VC Dimension

    • Need a principled method for the minimization


Structural Risk Minimization

  • Why Structural Risk Minimization (SRM)

    • It is not enough to minimize the empirical risk

    • Need to overcome the problem of choosing an appropriate VC dimension

  • SRM Principle

    • To minimize the expected risk, both terms in the VC bound should be small

    • Minimize the empirical risk and VC confidence simultaneously

    • SRM picks a trade-off in between VC dimension and empirical risk


Outline

  • Introduction

  • VC Dimension, Structural Risk Minimization

  • Linear Support Vector Machine

  • Nonlinear Support Vector Machine

  • Performance and Application

  • Conclusion


Separable Case

  • Set S is Linearly Separable, then

  • The same as
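
The two conditions referenced above appear only as images in the original. Their standard form for a training set S = {(x_i, y_i)} is:

```latex
\exists\, w,\, b:\quad
\begin{cases}
w \cdot x_i + b \ge +1, & y_i = +1\\[2pt]
w \cdot x_i + b \le -1, & y_i = -1
\end{cases}
\qquad\Longleftrightarrow\qquad
y_i\,(w \cdot x_i + b) \ge 1 \quad \forall i
```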


Canonical Optimal Hyperplane

w: normal to the hyperplane; ||w|| is inversely proportional to the perpendicular distance from the hyperplane to the origin






Kernels

  • Idea

    • Use a transformation (x) from input space to higher dimensional space

    • Find the separating hyperplane, make the inverse transformation

  • Kernel: dot product in a Hilbert space, K(x, y) = Φ(x)·Φ(y)

  • Mercer’s Condition


Kernels for Nonlinear SVMs: Example

  • Polynomial Kernels

  • Neural Network Like Kernel

  • Radial Function Kernel
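
The kernel formulas are images in the original slides; their usual textbook forms, with the parameterizations assumed here, are:

```latex
% Polynomial kernel of degree p
K(x, y) = (x \cdot y + 1)^p

% Neural-network-like (sigmoid) kernel
K(x, y) = \tanh(\kappa\, x \cdot y - \delta)

% Radial (Gaussian) kernel
K(x, y) = \exp\!\left(-\frac{\|x - y\|^2}{2\sigma^2}\right)
```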



Conclusion

  • Advantages

    • Efficient training algorithm (vs. multi-layer NN)

    • Represent complex and nonlinear functions (vs. single-layer NN)

    • Always find a global minimum

  • Disadvantages

    • The solution typically scales cubically with the number of training examples

    • Large training sets are therefore a problem




Introduction

  • Radial Basis Function Network

    • A class of single hidden layer feedforward networks

    • Activation functions for hidden units are defined as radially symmetric basis functions such as the Gaussian function.

  • Advantages over Multi-Layer Perceptron

    • Faster convergence

    • Smaller extrapolation errors

    • Higher reliability


Two Typical Radial Functions

  • Multiquadric RBF and Gaussian RBF
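
The formulas are shown only as images in the original; the usual definitions, with center c and width r, are:

```latex
% Gaussian RBF (response decreases with distance)
h(x) = \exp\!\left(-\frac{\|x - c\|^2}{r^2}\right)

% Multiquadric RBF (response increases with distance)
h(x) = \sqrt{\|x - c\|^2 + r^2}
```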



Comparisons

  • Neural Network


Comparisons (2)

  • Linear Models


Comparisons (3)

  • The Perceptron

  • Multi-Layer Perceptrons (Feedforward Neural Networks)


Comparisons (4)

  • Radial Basis Function Network

    • One hidden layer of basis functions, or neurons

    • At the input of each neuron, the distance between the neuron center and the input vector is calculated.

    • The output of the neuron is then formed by applying the basis function to this distance

    • The RBF network output is formed by a weighted sum of the neuron outputs and a unity bias term




Classification Problem

  • Input space X

  • Output space Y

    • For classification Y={+1, -1}

  • Assume there is an (unknown) probability distribution P on X × Y.

  • Data D = {(X_i, Y_i) | i = 1, …, n} are observed independently and identically distributed according to P.

  • Goal: construct g: X → Y that predicts Y from X.


Expected Risk and Empirical Risk

  • Expected Risk

    R(g) = P(g(X) ≠ Y) = E[1_{g(X) ≠ Y}]

    • P is unknown, so we cannot compute this.

  • Empirical Risk

    R_n(g) = (1/n) Σ_i 1_{g(X_i) ≠ Y_i}

    • Dependent upon the data set.

  • Task

    • Minimize the expected risk which is unknown


VC Dimension

  • Suppose we have n data points to be labeled into two classes; the number of labelings that a function class G can realize satisfies S_G(n) ≤ 2^n.

  • When S_G(n) = 2^n, G can generate any classification of (some set of) n points. In other words, G shatters those n points.


VC Dimension (2)

  • The VC dimension (after Vapnik and Chervonenkis) is defined as the largest n such that S_G(n) = 2^n.

    • It is the simplest measure of classifier complexity (capacity).

    • VC dim=n doesn’t mean that G can shatter every data set of size n.

    • VC dim=n does mean that G can shatter some data set of size n.


VC Dimension: Example

  • Is VC dimension == number of parameters?

  • In R^d, the VC dimension of {all hyperplanes} is d+1.

  • For any d+1 points in general position we can find hyperplanes shattering them.

  • For d+2 points, hyperplanes cannot shatter them.

  • Hyperplanes are given by a_1 x_1 + … + a_d x_d + a_0 = 0, i.e., by d + 1 parameters.

  • Is VC dimension == number of parameters?

  • The Answer is No: An Example

    • Let G = {sgn(sin(tx)) | t ∈ R}, X = R.

    • One can show VC-dim(G) = ∞, even though G is a one-parameter family!


VC, SV Bounds and the Actual Risk

  • The VC bound can be predictive even when loose


Structural Risk Minimization (2)

  • Introduce “structure”

    • Dividing the entire class of functions into nested subsets

    • For each subset, compute h or a bound of h

    • Finding that subset of functions which minimizes the bound on the actual risk

  • Picking a trade-off in between VC dimension and empirical risk


Optimal Margin Hyperplane Algorithm

  • Choose y=1 for positive labels and y=-1 for negative labels

  • Problem: minimize

  • Dual formulation: maximize L as a function of the multipliers α_i, with the constraints:
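
The objective and constraints are images in the original; the standard hard-margin formulation they correspond to is:

```latex
% Primal problem
\min_{w,\, b}\ \tfrac{1}{2}\|w\|^2
\quad \text{s.t.}\quad y_i\,(w \cdot x_i + b) \ge 1,\ \ i = 1, \dots, n

% Dual problem, over the Lagrange multipliers \alpha_i
\max_{\alpha}\ L_D = \sum_i \alpha_i
  - \tfrac{1}{2}\sum_{i,j} \alpha_i \alpha_j\, y_i y_j\, (x_i \cdot x_j)
\quad \text{s.t.}\quad \alpha_i \ge 0,\ \ \sum_i \alpha_i y_i = 0
```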


Optimal Margin Hyperplane Algorithm (cont.)

  • Transformed problem: maximize

  • Karush-Kuhn-Tucker conditions at the extremum:

  • Separating surface
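
As an illustration only, not the authors' implementation, here is a minimal linear SVM fit with scikit-learn; the toy data and the use of SVC with a very large C to approximate the hard margin are assumptions of this sketch.

```python
import numpy as np
from sklearn.svm import SVC  # assumes scikit-learn; any QP solver would do as well

# Toy linearly separable data with labels y in {+1, -1}
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(size=(20, 2)) + 2, rng.normal(size=(20, 2)) - 2])
y = np.array([1] * 20 + [-1] * 20)

clf = SVC(kernel="linear", C=1e6).fit(X, y)   # very large C approximates the hard margin
w, b = clf.coef_[0], clf.intercept_[0]

# Separating surface: f(x) = sign(w . x + b); support vectors have nonzero multipliers
print("number of support vectors:", len(clf.support_vectors_))
print("predictions:", np.sign(X @ w + b)[:5])
```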


Soft Margin Hyperplane

  • Minimize

  • Dual formulation: maximize L as a function of the multipliers α_i


Soft Margin Hyperplane (cont.)

  • Karush-Kuhn-Tucker conditions at the extremum:

  • Final optimization problem: maximize L as a function of the α_i

  • Separating surface
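
The equations on this slide are likewise images in the original; the standard soft-margin formulation, with slack variables ξ_i and penalty C, is:

```latex
% Primal problem with slack variables
\min_{w,\, b,\, \xi}\ \tfrac{1}{2}\|w\|^2 + C \sum_i \xi_i
\quad \text{s.t.}\quad y_i\,(w \cdot x_i + b) \ge 1 - \xi_i,\ \ \xi_i \ge 0

% Dual problem: identical to the separable case except for the box constraint
\max_{\alpha}\ L_D = \sum_i \alpha_i
  - \tfrac{1}{2}\sum_{i,j} \alpha_i \alpha_j\, y_i y_j\, (x_i \cdot x_j)
\quad \text{s.t.}\quad 0 \le \alpha_i \le C,\ \ \sum_i \alpha_i y_i = 0
```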