Unsupervised recurrent networks

Barbara Hammer, Institute of Informatics, Clausthal University of Technology


[Title slide photos: Clausthal-Zellerfeld and the Brocken]



Prototype based clustering

  • data contained in a real-vector space

  • prototypes characterized by locations in the data space

  • clustering induced by the receptive fields based on the Euclidean metric
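
A minimal numpy sketch of this receptive-field assignment (the function name and array layout are my assumptions, not taken from the slides):

import numpy as np

# Sketch: assign each data point to the prototype with the smallest
# Euclidean distance; the points won by a prototype form its receptive field.
def receptive_fields(data, prototypes):
    # data: (num_points, dim), prototypes: (num_prototypes, dim)
    dists = np.linalg.norm(data[:, None, :] - prototypes[None, :, :], axis=2)
    return np.argmin(dists, axis=1)  # index of the winning prototype per point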


Vector quantization

  • init prototypes

  • repeat

    • present a data point

    • adapt the winner into the direction of the data point
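
A sketch of this online loop (not the original code; the learning rate eta, epoch count, and data-point initialization are assumptions):

import numpy as np

# Sketch of online vector quantization as described above.
def vector_quantization(data, num_prototypes, eta=0.05, epochs=10, seed=0):
    rng = np.random.default_rng(seed)
    # init prototypes on randomly chosen data points
    w = data[rng.choice(len(data), num_prototypes, replace=False)].copy()
    for _ in range(epochs):
        for x in data[rng.permutation(len(data))]:   # present a data point
            winner = np.argmin(np.linalg.norm(w - x, axis=1))
            w[winner] += eta * (x - w[winner])        # adapt the winner towards x
    return w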


Cost function

  • VQ minimizes the quantization error, i.e. the summed squared distance of each data point to its winner prototype

  • online: stochastic gradient descent 
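
A standard way to write this cost and the online step its stochastic gradient yields (a sketch in LaTeX; the 1/2 scaling and the notation I(x^i) for the winner index are my conventions):

E(\{w_j\}) \;=\; \tfrac{1}{2} \sum_i \| x^i - w_{I(x^i)} \|^2 ,
\qquad I(x^i) \;=\; \arg\min_j \| x^i - w_j \|
% stochastic gradient descent on a single point x^i moves only its winner:
\Delta w_{I(x^i)} \;=\; \eta \,\bigl( x^i - w_{I(x^i)} \bigr)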


Neighborhood cooperation

Self-Organizing Map: prototypes wj arranged on a regular lattice of indices j=(j1,j2)

Neural gas: neighborhood according to a data-optimum topology
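
A sketch of one update step for each neighborhood scheme (the Gaussian/exponential neighborhood shapes and the parameters sigma, lam are assumptions, not taken from the slides):

import numpy as np

def som_step(w, lattice, x, eta=0.05, sigma=1.0):
    # SOM: neighborhood measured between lattice indices j = (j1, j2)
    winner = np.argmin(np.linalg.norm(w - x, axis=1))
    lattice_dist = np.linalg.norm(lattice - lattice[winner], axis=1)
    h = np.exp(-lattice_dist ** 2 / (2 * sigma ** 2))
    return w + eta * h[:, None] * (x - w)

def ng_step(w, x, eta=0.05, lam=1.0):
    # Neural gas: neighborhood given by the rank of each prototype's
    # distance to x, i.e. a topology derived from the data, not a fixed lattice
    ranks = np.argsort(np.argsort(np.linalg.norm(w - x, axis=1)))
    h = np.exp(-ranks / lam)
    return w + eta * h[:, None] * (x - w)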




Old models

Temporal Kohonen Map:

leaky integration

x1,x2,x3,x4,…,xt, …

d(xt,wi) = |xt-wi| + α·d(xt-1,wi)

training: wi → xt

Recurrent SOM:

d(xt,wi) = |yt| where yt = (xt-wi) + α·yt-1

training: wi → yt
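
A sketch of the two leaky-integrated quantities for a single neuron with weight w_i, read directly off the formulas above (not the authors' code):

import numpy as np

def tkm_distance(seq, w_i, alpha):
    # Temporal Kohonen Map: leaky integration of the distances |x_t - w_i|
    d = 0.0
    for x_t in seq:
        d = np.linalg.norm(x_t - w_i) + alpha * d
    return d

def rsom_activation(seq, w_i, alpha):
    # Recurrent SOM: leaky integration of the difference vectors (x_t - w_i);
    # the distance is |y_t|, and training adapts w_i into the direction y_t
    y = np.zeros_like(w_i, dtype=float)
    for x_t in seq:
        y = (x_t - w_i) + alpha * y
    return np.linalg.norm(y), y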



Merge neural gas/SOM

explicit temporal context: the sequence xt, xt-1, xt-2, …, x0 is processed entry by entry

each neuron carries a pair (w,c); the current entry xt is compared to w via |xt – w|², the current context Ct is compared to c via |Ct – c|²

merge context Ct: content of the winner of the previous time step

training: w → xt, c → Ct


Merge neural gas/SOM

(wj,cj) in ℝn × ℝn

  • explicit context, global recurrence

  • wj : represents entry xt

  • cj: represents the context, which equals the winner content of the last time step

  • distance: d(xt,wj) = α·|xt-wj| + (1-α)·|Ct-cj|

    where Ct = γ·wI(t-1) + (1-γ)·cI(t-1), I(t-1) winner in step t-1 (merge)

  • training: wj → xt, cj → Ct
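
A sketch of one merge-SOM/MNG step following these definitions (plain winner-takes-all, neighborhood cooperation omitted; parameter and variable names are my assumptions):

import numpy as np

# W: weights, C: contexts, both of shape (num_neurons, n).
def msom_step(W, C, x_t, Ct, alpha, gamma, eta):
    # distance d(x_t, w_j) = alpha*|x_t - w_j| + (1-alpha)*|Ct - c_j|
    d = alpha * np.linalg.norm(W - x_t, axis=1) + (1 - alpha) * np.linalg.norm(C - Ct, axis=1)
    winner = np.argmin(d)
    # Hebbian training: w -> x_t, c -> Ct
    W[winner] += eta * (x_t - W[winner])
    C[winner] += eta * (Ct - C[winner])
    # merge context for the next step: content of the current winner
    Ct_next = gamma * W[winner] + (1 - gamma) * C[winner]
    return winner, Ct_next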


Merge neural gas/SOM

Example: sequence 42 → 33 → 33 → 34 with γ = 0.5, so each context Ct averages the weight and the context of the previous winner (the values 50, 45, 38 are the winners' contexts from the original slide):

C1 = (42 + 50)/2 = 46

C2 = (33 + 45)/2 = 39

C3 = (33 + 38)/2 = 35.5


Merge neural gas/SOM

  • speaker identification, Japanese vowel 'ae' [UCI-KDD archive]

  • 9 speakers, 30 articulations each

  • data: time series of 12-dim. cepstrum vectors

MNG, 150 neurons: 2.7% test error

MNG, 1000 neurons: 1.6% test error

for comparison, rule-based: 5.9%, HMM: 3.8%


Merge neural gas/SOM

Experiment:

  • classification of donor sites for C. elegans

  • 5 settings with 10,000 training and 10,000 test sequences, 50 nucleotides (TCGA) embedded in 3 dimensions, 38% donor sites [Sonnenburg, Rätsch et al.]

  • MNG with posterior labeling

  • 512 neurons, γ = 0.25, η = 0.075, α annealed from 0.999 into [0.4, 0.7]

  • 14.06%±0.66% training error, 14.26%±0.39% test error

  • sparse representation: 512 · 6 dim


Merge neural gas/SOM

Theorem – context representation:

Assume

  • a map with merge context is given (no neighborhood)

  • a sequence x0, x1, x2, x3,… is given

  • enough neurons are available

    Then:

  • the optimum weight/context pair for xt is

    w = xt, c = ∑i=0..t-1 γ·(1-γ)^(t-i-1)·xi

  • Hebbian training converges to this setting as a stable fixed point

  • Compare to TKM:

    • optimum weights are w = ∑i=0..t (1-α)^i·xt-i / ∑i=0..t (1-α)^i

    • but: no fixed point for TKM

  • MSOM is the correct implementation of TKM
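
A numerical sanity check of the stated optimum (a sketch: it unrolls the merge recursion with ideal winners, i.e. weight equal to the respective entry and context equal to the previous optimal context, and compares against the closed form):

import numpy as np

gamma = 0.3
x = np.random.default_rng(1).normal(size=10)   # a scalar sequence x_0, ..., x_9

c = 0.0                                        # optimal context for x_0 (empty past)
for t in range(1, len(x)):
    c = gamma * x[t - 1] + (1 - gamma) * c     # merge recursion with ideal winner
    closed_form = sum(gamma * (1 - gamma) ** (t - i - 1) * x[i] for i in range(t))
    assert np.isclose(c, closed_form)          # matches c = sum_i gamma*(1-gamma)^(t-i-1)*x_i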



More models

what is the correct temporal context?

as before, each neuron stores a pair (w,c); the current entry xt is compared to w via |xt – w|², the context Ct to c via |Ct – c|², and training adapts w → xt, c → Ct

Context Ct:

RSOM/TKM – neuron itself

MSOM – winner content

SOMSD – winner index

RecSOM – all activations
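
A sketch of the context each model feeds back at step t, given the state of the map at step t-1 (function and parameter names are mine, not from the slides):

import numpy as np

def context_msom(W, C, winner, gamma=0.5):
    # MSOM: merge of the previous winner's weight and context ("winner content")
    return gamma * W[winner] + (1.0 - gamma) * C[winner]

def context_somsd(lattice, winner):
    # SOMSD: the index / lattice position of the previous winner
    return lattice[winner]

def context_recsom(distances, beta=1.0):
    # RecSOM: the whole activation profile of the previous time step
    return np.exp(-beta * np.asarray(distances))

# RSOM / TKM keep no explicit context vector: each neuron leakily
# integrates its own distance (TKM) or difference vector (RSOM).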


More models

[Comparison table of the context models; * for normalised WTA context]

More models

Experiment:

  • Mackey-Glass time series

  • 100 neurons

  • different lattices

  • different contexts

  • evaluation by the temporal quantization error:

    average (mean activity k steps into the past – observed activity k steps into the past)²
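
A sketch of this measure for a scalar series (the grouping of time steps by winner neuron and the argument names are my reading of the definition above):

import numpy as np

def temporal_quantization_error(signal, winners, max_lag):
    # For each lag k: average over time of the squared difference between the
    # observed value k steps in the past and the mean value k steps in the past
    # taken over all time steps mapped to the same winner neuron.
    signal = np.asarray(signal, dtype=float)
    winners = np.asarray(winners)
    T = len(signal)
    errors = np.zeros(max_lag + 1)
    for k in range(max_lag + 1):
        ts = np.arange(k, T)                  # steps with a past of length >= k
        past = signal[ts - k]                 # observed value k steps back
        mean_past = np.zeros_like(past)
        for n in np.unique(winners[ts]):      # mean past value per winner neuron
            mask = winners[ts] == n
            mean_past[mask] = past[mask].mean()
        errors[k] = np.mean((mean_past - past) ** 2)
    return errors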


More models

[Plot: temporal quantization error (now → past) for SOM, RSOM, NG, RecSOM, SOMSD, HSOMSD, and MNG on the Mackey-Glass series]


So what?

  • inspection / clustering of high-dimensional events within their temporal context becomes possible

  • strong regularization as for standard SOM / NG

  • possible training methods for reservoirs

  • some theory

  • some examples

  • no supervision

  • the representation of context is critical and not clear at all

  • training is critical and not clear at all

