
Tutorial: Mathematical Aspects of Neural Networks

Barbara Hammer, University of Osnabrück,

Thomas Villmann, University of Leipzig, Germany


Relevance of math for NNs

  • math is used to

    • develop and present algorithms

      (linear algebra, analysis, statistics, optimization, control theory, statistical physics, differential geometry, ...)

    • investigate applicability, evaluate algorithms

      (statistics, ...)

    • investigate theoretical properties

      (algebra, Borsuk theorem, Christoffel symbols, differential topology, entropy, functional analysis, ...)


Relevance of math in NN history

1943: McCulloch/Pitts

1958: Rosenblatt

1969: Minsky/Papert

1985: Hopfield-networks for TSP

Backprop

SVM

... mathematical theory and mathematical questions established for most classical models


Classical models

Recurrent networks

Schiller, Steil: On the weight dynamics of recurrent learning

Jain, Wysotzki: A neural graph algorithm based on local invariants

Jain, Wysotzki: An associative memory for the automorphism group of structures

Self-organizing maps

Cottrell, Letremy: Analyzing qualitative variables using the Kohonen algorithm

Claussen, Villmann: Magnification control in winner relaxing neural gas

Archambeau, Lee, Verleysen: On convergence problems of the EM algorithm for finite Gaussian mixtures

Feed-forward networks

Jianyu Li, Siwei Luo, Yingjian Qi: Approximation of functions by adaptively growing radial basis function neural networks


Feed-forward networks

(perceptron, MLP, RBF networks, SVM)

... for classification and function approximation: the network maps an input x to an output y, which should match the target y0

1: Universal approximation ability (architecture) –

Does there exist an appropriate architecture for every function which is to be approximated?

2: Complexity of training (optimization) –

What do good error minimization algorithms look like, and what is their complexity?

3: Learnability (test) –

Can generalization to previously unseen examples be guaranteed?


1: Universal approximation

feed-forward networks

Perceptron solves only linearly separable problems.

[1969] Minsky/Papert

MLPs constitute universal approximators

[1989] Hornik/Stinchcombe/White, [1993] Leshno et al.

RBFs constitute universal approximators

[1990] Girosi/Poggio, [1993] Park/Sandberg

SVMs constitute universal approximators

[2001] Steinwart, [2003] Hammer/Gersmann
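To make the approximation statements above concrete, here is a toy numerical sketch (not part of the original slides; all parameter values are illustrative): a single hidden layer of sigmoid units with randomly fixed hidden weights, output weights fitted by least squares, approximating sin(x) on an interval.

```python
# Toy illustration of universal approximation with one sigmoid hidden layer.
# Hidden weights are fixed at random; only the output weights are fitted
# by linear least squares.
import numpy as np

rng = np.random.default_rng(0)
n_hidden = 30

# training data: sample sin(x) on [-3, 3]
x = np.linspace(-3, 3, 200).reshape(-1, 1)
y = np.sin(x).ravel()

# random hidden layer: h_j(x) = sigmoid(w_j * x + b_j)
W = rng.normal(scale=2.0, size=(1, n_hidden))
b = rng.normal(scale=2.0, size=n_hidden)
H = 1.0 / (1.0 + np.exp(-(x @ W + b)))

# output weights by least squares
c, *_ = np.linalg.lstsq(H, y, rcond=None)

y_hat = H @ c
print("max approximation error:", np.abs(y_hat - y).max())
```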


1: Universal approximation

number of neurons, rates of convergence, ... ?

feed-forward networks

[1992] Sontag: n neurons are sufficient to interpolate n points for output dimension 1

[1992] Jones, [1993] Barron: convergence of order 1/n for appropriate functions

[1995] Girosi, [1997] Gurvits/Koiran, [1997] Kurkova/Kainen/Kreinovich, [1998] Kurkova/Savicky/Hlavackova, [2002] Lavretsky, ...

... this session:

Jianyu Li, Siwei Luo, Yingjian Qi: determine the number of neurons during training


2: Complexity of training

(cartoon: "my favorite algo: abcdefghijklmnop ...")

feed-forward networks

Perceptron training is polynomial → Karmarkar algorithm

... investigation of the perceptron algorithm

SVM training is polynomial → quadratic optimization of the dual problem

... properties of online solutions, decomposition schemes

MLP training is NP-hard – [1988] Blum/Rivest, [1990] Judd

... design of alternative learning algorithms, investigation of more realistic scenarios
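As a contrast to the NP-hardness results for MLPs, a minimal sketch of the classical perceptron update rule, the tractable case mentioned above (the data set and parameters below are made up for illustration).

```python
# Minimal sketch of the classical perceptron algorithm on a linearly
# separable toy problem; convergence is guaranteed for separable data.
import numpy as np

def perceptron_train(X, y, epochs=100):
    """X: (n, d) inputs, y: labels in {-1, +1}. Returns weights and bias."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(epochs):
        errors = 0
        for xi, yi in zip(X, y):
            if yi * (xi @ w + b) <= 0:   # misclassified point
                w += yi * xi             # Rosenblatt update
                b += yi
                errors += 1
        if errors == 0:                  # all points correctly classified
            break
    return w, b

# linearly separable toy data (AND-like problem)
X = np.array([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])
y = np.array([-1, -1, -1, 1])
w, b = perceptron_train(X, y)
print(w, b, np.sign(X @ w + b))
```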


2: Complexity of training

MLP training is NP-hard – the loading problem

[1988] Blum/Rivest: 3-node network, [1990] Judd: networks which encode SAT

too small, too specific → [1995] Pinter, [1998] Hammer: more than one layer and two neurons, number of neurons related to training set size, varying number of hidden neurons

activation function should be sigmoidal → [1996] Sima, [1997] Jones, [1997] Vu, [1998] Hammer: sigmoidal settings

approximate settings → [1995] Hoeffgen/Simon/VanHorn, [2002] Bartlett/Ben-David, [2002] Sima, [2003] DasGupta/Hammer: approximate optimization is NP-hard in several settings, even for one neuron

[2000] Ben-David/Simon: training a neuron optimally with a large margin is polynomial

... any other idea?


3: Learnability

feed-forward networks

statistical physics

... measures the mean effects of online algorithms → Opper, ...

identification in the limit

... a regularity can be learned exactly in the limit → Gold, ...

PAC learnability

... at least one good learning algorithm exists, thereby: valid generalization with high probability and guaranteed bounds → [1984] Valiant

UCEM property

... the empirical error converges uniformly to the real error with high probability, guaranteed bounds can be derived → [1971] Vapnik/Chervonenkis
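One standard Vapnik-type form of such a guaranteed bound, quoted here for orientation only (the constants vary between references): with probability at least 1-δ over m i.i.d. examples, for every function f of a class with VC dimension d,

```latex
R(f) \;\le\; R_{\mathrm{emp}}(f)
  \;+\; \sqrt{\frac{d\left(\ln\frac{2m}{d}+1\right)+\ln\frac{4}{\delta}}{m}} .
```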


3: Learnability

Statistical learning theory (in three lines ...):

PAC ↔ finite covering number

UCEM ↔ appropriate empirical covering

distribution independent learnable ↔ finite VC dimension

feed-forward networks

[1989] Baum/Haussler: link NNs to VC-theory and estimate VC-dim of perceptron networks

[1994] Maass: lower bound for perceptron networks

[1992] Sontag: several ugly examples

[1993] Macintyre/Sontag: VC of sigmoidal networks is finite

[1995] Karpinski/Macintyre: estimate of VC of sigmoidal networks

[1997] Koiran/Sontag: lower bound of VC of sigmoidal networks

[1992] Haussler, [...] Bartlett et al., [2002] Schmitt, ...

VC dim of SVM scales with the margin

luckiness framework for structural risk minimization

→ ESANN'03
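A concrete anchor for these estimates (a standard fact, not from the slides): a single perceptron on R^d has VC dimension d+1, and for whole networks the cited bounds grow with the number of adjustable weights.

```latex
\operatorname{VCdim}\bigl(\{\, x \mapsto \operatorname{sgn}(w^{\top}x + b) : w \in \mathbb{R}^{d},\ b \in \mathbb{R} \,\}\bigr) \;=\; d + 1 .
```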


Feed-forward networks

Universal approximation ability

Complexity of training

Learnability

... formalized – positive results – ongoing work


Recurrent networks

... tasks:

NARX/TDNN

Elman

Hopfield

local/global

discrete/continuous

1: Approximation ability and capacity

sequence prediction,

sequence transduction,

sequence generation,

associative memory,

optimization,

binding and grouping,

computation

  • on a finite time horizon

  • as associative memory

  • as operators on functions

  • as computation device

  • as dynamic systems

2: Complexity of training/design of training algorithms

  • error optimization – Hebbian learning – energy function – stability constraints

  • complexity – numeric – dynamic properties – potential

3: Learnability

→ ESANN'02

  • ...
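All of these usages refer to the same state-space picture; as a reminder (standard notation, not from the slides), a generic discrete-time recurrent network with state x_t, input u_t and output y_t reads:

```latex
x_t \;=\; \sigma\!\left(W\,x_{t-1} + V\,u_t + \theta\right),
\qquad
y_t \;=\; W_{\mathrm{out}}\,x_t .
```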


2: Training

recurrent networks

Long term dependencies

[1994] Bengio/Simard/Frasconi

RTRL, BPTT, LSTM, EKF, EM-approaches, recirculation, ...

[2000] Atiya/Parlos: unification and one new approach

... this session: Schiller/Steil → design of a learning algorithm, mathematical investigation of this algorithm
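The long-term-dependency problem of [1994] Bengio/Simard/Frasconi can be summarized schematically in one line (quoted for orientation): the gradient through the state sequence is a product of Jacobians, which vanishes exponentially when their norms stay below one.

```latex
\frac{\partial E_t}{\partial x_k}
  \;=\; \frac{\partial E_t}{\partial x_t}
        \prod_{i=k+1}^{t} \frac{\partial x_i}{\partial x_{i-1}},
\qquad
\Bigl\|\tfrac{\partial x_i}{\partial x_{i-1}}\Bigr\| \le \lambda < 1
\;\Rightarrow\;
\Bigl\|\tfrac{\partial E_t}{\partial x_k}\Bigr\|
  \le \lambda^{\,t-k}\,\Bigl\|\tfrac{\partial E_t}{\partial x_t}\Bigr\| .
```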


2: Training

recurrent networks

Stability → guarantees for global or local stability

via linear matrix inequalities: [1997,2000] Suykens/Vandewalle, [1999-2002] Steil et al., [2002] Liao/Chen/Sanchez, ...

local/global stability and convergence rates for fully connected RNNs: [2001] Wersing/Beyn/Ritter, [2002] Chen/Lu/Amari, [2002] Chen, [2002] Peng/Qia/Xu, ...

... this session: Schiller/Steil


2: Training

recurrent networks

Training based on stable states

[1997] Lee: storing sequences in Hopfield-type networks

[2002] Welling/Hinton: mean-field Boltzmann machines

[2002] Weng/Steil: training CLM

[2001] Li/Lee: invariant matching

[2002] Dang/Xu, [2002] Talavan/Yanez: TSP

[2002] DiBlas/Jagota/Hughey: graph coloring

... this session: Schiller/Steil; Jain/Wysotzki → graph isomorphisms; Jain/Wysotzki → automorphism group of structures


Recurrent networks

Universal approximation ability

Complexity of training

Learnability

... formalized – positive results – ongoing work


Self-organizing maps

faithful data representation

ICA/PCA

VQ

SOM

NG

statistical approaches

1: Training algorithms, convergence –

How can reasonable training algorithms be designed? What is the objective or cost function? Does the algorithm converge to the desired states?

1: training

2: Topology preservation –

Does the topology of the representation match the underlying data topology?

2: data mining

3: Distribution representation –

Is important information preserved? What is the magnification?

→ ESANN


1: Training algorithms

self-organizing maps

iterative updates → e.g. Kushner/Clark or Ljung

batch updates → e.g. Geoffrey/Hinton

VQ: ok

NG: [1993,1994] Martinetz et al.: ok

SOM: [1992] Erwin et al.: no, [1999] Heskes: but almost

... this session: Archambeau/Lee/Verleysen → investigation of convergence problems of the EM algorithm

Convergence of SOM: yes in dim one/two, otherwise difficulties: Cottrell, Der, Erwin, Flanagan, Fort, Herrmann, Lin, Pages, Ritter, Sadeghi, Obermayer, ...
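For reference, a minimal sketch of one online Kohonen update step, the kind of iterative rule whose convergence the works above analyse (the lattice, learning rate and neighbourhood width below are illustrative only).

```python
# One online SOM/Kohonen update step: find the winner, then pull all
# prototypes towards the data point, weighted by a lattice neighbourhood.
import numpy as np

def som_step(weights, grid, x, eta=0.1, sigma=1.0):
    """weights: (n_units, d) prototypes; grid: (n_units, q) lattice positions;
    x: (d,) data point. Returns updated prototypes."""
    winner = np.argmin(np.linalg.norm(weights - x, axis=1))
    lattice_dist = np.linalg.norm(grid - grid[winner], axis=1)
    h = np.exp(-lattice_dist**2 / (2 * sigma**2))   # Gaussian neighbourhood
    return weights + eta * h[:, None] * (x - weights)

# toy usage: 10 units on a 1-D lattice, 2-D uniform data
rng = np.random.default_rng(0)
grid = np.arange(10, dtype=float).reshape(-1, 1)
weights = rng.random((10, 2))
for _ in range(1000):
    weights = som_step(weights, grid, rng.random(2))
print(weights)
```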


2: Topology preservation

self-organizing maps

VQ

NG

SOM

[1992] Bauer/Pawelzik: Topographic product

[1997] Villmann et al.: Topographic function

[1992] Ritter et al., [1993] Heskes, [1994] Der/Herrmann, [1996] Bauer et al., [1999] Der/Herrmann/Villmann: mathematical investigation of mismatching states

[1997] Bauer/Villmann, [1999] Ritter: alternative or adaptive lattices


3: Distribution preservation

self-organizing maps

... this session: Claussen/Villmann → magnification control for winner relaxing neural gas

Thomas will fill this area ...
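For orientation, the classical magnification results from the literature (continuum limit, not from the slides): the asymptotic prototype density follows a power of the data density P(v), and magnification control aims at steering the exponent.

```latex
\varrho(v) \;\sim\; P(v)^{\alpha},
\qquad
\alpha_{\mathrm{SOM,\,1D}} = \tfrac{2}{3},
\quad
\alpha_{\mathrm{NG}} = \tfrac{d}{d+2},
\quad
\alpha = 1 \ \text{(information-theoretically optimal)} .
```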


4: Recent developments

self-organizing maps

extension of SOM to general domains:

[2001] Kaski/Sinkkonen, [2002] Hammer/Strickert/Villmann: SOM/NG with adaptive metric

[2002] Hagenbuchner/Sperduti/Tsoi, [2002] Voegtlin, [2003] Hammer/Micheli/Sperduti: SOM for sequences and structures

[2001] Kohonen: SOM for discrete objects

... this session: Cottrell/Letremy → SOM for contingency analysis


Self-organizing maps

Convergence

Topology preservation

Distribution preservation

... formalized


Finale

Theorem: You need at least two neurons to follow this talk.

Proof: By contradiction. Assume you had only one neuron.

Then you couldn’t understand the following...

If neither Thomas nor Barbara drink beer, the idea for the special session will be good.

If Thomas drinks beer and Barbara does not drink beer, the idea for the session will not be good.

If Barbara drinks beer and Thomas does not, the idea for the special session will not be good.

If both Barbara and Thomas drink beer, the idea for the special session will be good.

...because it includes XOR, not solvable with one neuron.

  • you couldn’t follow the last proof in this talk.

  • since the last proof is deeply connected to the other 66 slides of the talk, you couldn’t follow the talk.
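The four statements in table form (1 = drinks beer / idea is good): the target is the negation of XOR and hence not linearly separable.

```latex
\begin{array}{cc|c}
\text{Thomas drinks} & \text{Barbara drinks} & \text{idea good}\\
\hline
0 & 0 & 1\\
1 & 0 & 0\\
0 & 1 & 0\\
1 & 1 & 1
\end{array}
```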


Finale

  • need math to

    • develop and present algorithms

    • investigate applicability, evaluate algorithms

    • investigate theoretical properties

  • but

    • often limited to simple questions

      (... how many neurons are sufficient?)

    • possibly not applicable

      (... who drank beer?)

    • does not fit in all details

      (... we don’t have 66 slides.)


1: Approximation ability

recurrent networks

Partially recurrent networks constitute universal approximators

[1992] Sontag, [1993] Funahashi/Nakamura, [2002] Back/Chen

They show rich dynamic behavior

[1991] Wang, [2002] Tino et al., [...] Pasemann, Haschke, ...

They include automata, Turing machines, non-uniform circuits

Omlin, Giles, Carrasco, Forcada, Siegelmann, Sontag, Kilian, ...

Hopfield networks can minimize polynomials, number of stable patterns can be estimated, various extensions
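To make the associative-memory and stable-pattern statements concrete, a minimal Hopfield-network sketch (Hebbian storage, asynchronous updates; the pattern and parameters below are illustrative only).

```python
# Minimal Hopfield associative memory: Hebbian storage of binary patterns
# and asynchronous updates that settle into stable states (energy minima).
import numpy as np

def hopfield_train(patterns):
    """patterns: (p, n) array with entries in {-1, +1}."""
    p, n = patterns.shape
    W = (patterns.T @ patterns) / n      # Hebbian outer-product rule
    np.fill_diagonal(W, 0.0)             # no self-connections
    return W

def hopfield_recall(W, state, steps=100):
    state = state.copy()
    rng = np.random.default_rng(0)
    for _ in range(steps):
        i = rng.integers(len(state))                 # asynchronous update
        state[i] = 1 if W[i] @ state >= 0 else -1
    return state

# store one pattern and recover it from a corrupted version
pattern = np.array([1, -1, 1, 1, -1, -1, 1, -1])
W = hopfield_train(pattern[None, :])
noisy = pattern.copy()
noisy[0] *= -1
print(hopfield_recall(W, noisy))
```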


3: Learnability

recurrent networks

VC-dim of RNNs

[1997] Koiran/Sontag: in the general setting, VC depends on the maximum length of input sequences

[2003] Hammer/Tino: finite for small weights

covering number or entropy number?

[1999, 2001] Hammer: one can achieve distribution dependent or posterior bounds (which might be very bad ...)

non i.i.d. data?

[1993] Nobel/Dembo: finite VC dim and finite mixing coefficients are sufficient

[2001] Vidyasagar: ... is working on nice alternatives and generalizations
