
Dynamics of Learning VQ and Neural Gas




  1. Dynamics of Learning VQ and Neural Gas
  Aree Witoelar, Michael Biehl
  Mathematics and Computing Science, University of Groningen, Netherlands
  in collaboration with Barbara Hammer (Clausthal) and Anarta Ghosh (Groningen)
  Dagstuhl Seminar, 25.03.2007

  2. Outline
  • Vector Quantization (VQ)
  • Analysis of VQ Dynamics
  • Learning Vector Quantization (LVQ)
  • Summary

  3. Vector Quantization
  Objective: representation of (many) data with (few) prototype vectors.
  Assign each data point ξμ to the nearest prototype vector wj (by a distance measure, e.g. Euclidean distance), and find the optimal set W for the lowest quantization error, grouping the data into clusters, e.g. for classification.
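
A minimal NumPy sketch of this objective (the function name, array layout, and the squared-distance and 1/P normalization conventions are my assumptions; the slide defines the error only pictorially):

```python
import numpy as np

def quantization_error(X, W):
    """Mean distance of each example to its closest prototype.

    X: (P, N) data matrix, W: (K, N) prototype matrix.
    """
    # squared Euclidean distances between every sample and every prototype
    d2 = ((X[:, None, :] - W[None, :, :]) ** 2).sum(axis=2)  # shape (P, K)
    return d2.min(axis=1).mean()
```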

  4. Example: Winner Takes All (WTA)
  • initialize K prototype vectors
  • present a single example
  • identify the closest prototype, i.e. the so-called winner
  • move the winner even closer towards the example
  This is stochastic gradient descent with respect to a cost function; the prototypes end up in areas with a high density of data.
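
One WTA step could look as follows (a sketch; the function name and learning-rate convention are illustrative, not from the slides):

```python
import numpy as np

def wta_step(W, xi, eta=0.01):
    """One Winner-Takes-All step: only the closest prototype moves.

    W: (K, N) prototypes, xi: (N,) example, eta: learning rate.
    """
    winner = np.argmin(((W - xi) ** 2).sum(axis=1))  # identify the winner
    W[winner] += eta * (xi - W[winner])              # move it towards the example
    return W
```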

  5. Problems
  • Winner Takes All: sensitive to initialization
  • "winner takes most": update according to "rank", e.g. Neural Gas; less sensitive to initialization?
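
A rank-based "winner takes most" step in the spirit of Neural Gas (a sketch; the exponential rank weighting follows the standard Neural Gas formulation, which later slides also use):

```python
import numpy as np

def neural_gas_step(W, xi, eta=0.01, lam=2.0):
    """One 'winner takes most' step: all prototypes move, with strength
    decaying exponentially in their distance rank (0 = closest)."""
    d2 = ((W - xi) ** 2).sum(axis=1)
    ranks = np.argsort(np.argsort(d2))      # rank of each prototype
    h = np.exp(-ranks / lam)                # rank-dependent update strength
    W += eta * h[:, None] * (xi - W)
    return W
```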

  6. (L)VQ algorithms
  • intuitive
  • fast, powerful algorithms
  • flexible
  • but: limited theoretical background w.r.t. convergence speed, robustness to initial conditions, etc.
  Analysis of VQ Dynamics:
  • exact mathematical description in very high dimensions
  • study of typical learning behavior

  7. Model: two Gaussian clusters of high-dimensional data
  Random vectors ξ ∈ ℝN are drawn according to classes σ = {+1, −1} with
  • prior probabilities p+, p−, where p+ + p− = 1
  • cluster centers B+, B− ∈ ℝN with separation ℓ
  • variances υ+, υ−
  The clusters are separable in the projection onto the (B+, B−) plane, but not in other planes: only 2 of the N dimensions are informative. A simple model, but not trivial.
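
A sampler for this model might look like this (the concrete orthonormal pair B+ = e1, B− = e2 and all function names are my assumptions):

```python
import numpy as np

def sample_cluster_data(P, N, p_plus=0.6, ell=1.0, v_plus=1.5, v_minus=1.0, rng=None):
    """Draw P examples from the two-cluster model: mean ell*B_sigma,
    isotropic variance v_sigma per component."""
    rng = np.random.default_rng() if rng is None else rng
    B = np.zeros((2, N)); B[0, 0] = 1.0; B[1, 1] = 1.0        # B+, B- (assumed choice)
    sigma = rng.choice([0, 1], size=P, p=[p_plus, 1 - p_plus])  # 0 -> class +, 1 -> class -
    v = np.where(sigma == 0, v_plus, v_minus)
    xi = ell * B[sigma] + np.sqrt(v)[:, None] * rng.standard_normal((P, N))
    labels = np.where(sigma == 0, +1, -1)
    return xi, labels
```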

  8. Online learning
  A sequence of independent random data is presented; the update of prototype vector ws ∈ ℝN reads
  ws^μ = ws^(μ−1) + (η/N) · fs[…] · (ξ^μ − ws^(μ−1)),
  which moves the prototype towards the current data. Here η is the learning rate (step size), and the modulation fs[…] describes the algorithm used: the strength and direction of the update depend on the "winner", the prototype class, the data class, etc.
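
In code, the whole family of algorithms differs only in the modulation function; a generic sketch (all names assumed; f returns a NumPy array of per-prototype modulations):

```python
def online_step(W, xi, f, eta=0.01):
    """Generic online update  w_s <- w_s + (eta/N) f_s (xi - w_s).

    f(W, xi) encodes the algorithm: e.g. a winner indicator for WTA,
    a signed winner indicator for LVQ1, rank weights for Neural Gas.
    """
    N = xi.shape[0]
    fs = f(W, xi)                           # shape (K,)
    W += (eta / N) * fs[:, None] * (xi - W)
    return W
```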

  9. Analysis
  1. Define a few characteristic quantities of the system: the projections of the prototypes onto the cluster centers, and the lengths and overlaps of the prototypes.
  2. Derive recursion relations of these quantities for new input data; the random vector ξμ enters only through its projections.
  3. Calculate the averaged recursions.
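
Written out in the notation of the JMLR reference cited on the summary slide, the characteristic quantities and the projections are (a reconstruction of the formulas lost in extraction):

```latex
R_{s\sigma} = \mathbf{w}_s \cdot \mathbf{B}_\sigma, \qquad
Q_{st} = \mathbf{w}_s \cdot \mathbf{w}_t, \qquad
h_s^{\mu} = \mathbf{w}_s^{\mu-1} \cdot \boldsymbol{\xi}^{\mu}, \qquad
b_\sigma^{\mu} = \mathbf{B}_\sigma \cdot \boldsymbol{\xi}^{\mu}.
```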

  10. In the thermodynamic limit N → ∞ ...
  • the projections hs and bσ become correlated Gaussian quantities, completely specified in terms of first and second moments, so the average over examples can be performed analytically
  • the characteristic quantities self-average with respect to the random sequence of data (fluctuations vanish)
  • a continuous learning time t = μ/N is defined: μ is discrete (1, 2, …, P), t is continuous

  11. Analysis (continued)
  4. Derive ordinary differential equations.
  5. Solve for Rsσ(t), Qst(t):
  • dynamics/asymptotic behavior (t → ∞)
  • quantization/generalization error
  • sensitivity to initial conditions, learning rates, structure of the data
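
The next slide integrates the resulting ODEs numerically; a generic Euler scheme in the same spirit (a sketch; the algorithm-specific right-hand side is not given on the slides, so it is left as a callback):

```python
import numpy as np

def integrate_odes(deriv, y0, t_max=50.0, dt=0.01):
    """Euler integration of dy/dt = deriv(y, t), where y collects the
    characteristic quantities (R_{s sigma}, Q_{st})."""
    y, t = np.asarray(y0, dtype=float).copy(), 0.0
    trajectory = [(t, y.copy())]
    while t < t_max:
        y += dt * deriv(y, t)
        t += dt
        trajectory.append((t, y.copy()))
    return trajectory
```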

  12. Results: VQ with 2 prototypes
  Numerical integration of the ODEs (ws(0) ≈ 0, p+ = 0.6, ℓ = 1.0, υ+ = 1.5, υ− = 1.0, η = 0.01).
  [Plots: the characteristic quantities R1+, R1−, R2+, R2−, Q11, Q12, Q22 and the quantization error E(W) as functions of t.]

  13. Projections of the prototypes onto the (B+, B−) plane at t = 50, for 2 and for 3 prototypes (p+ > p−). With three prototypes, two of them move to the stronger cluster.

  14. Neural Gas: a "winner takes most" algorithm (3 prototypes)
  The update strength decreases exponentially with the rank. λ(t) is large initially and is decreased over time (here λi = 2, λf = 10⁻²); for λ(t) → 0 the update becomes identical to WTA.
  [Plots: prototype projections from t = 0 to t = 50 and the quantization error E(W) vs. t.]
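
The slides give only the endpoints of the annealing; the standard exponential schedule from the Neural Gas literature is one way to realize it (an assumption, not stated on the slide):

```python
def lam(t, t_max=50.0, lam_i=2.0, lam_f=1e-2):
    """Exponentially decaying rank range lambda(t), from lam_i to lam_f."""
    return lam_i * (lam_f / lam_i) ** (t / t_max)
```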

  15. Sensitivity to initialization
  • WTA: (eventually) reaches the minimum of E(W), but depends on the initialization; a "plateau" where ∇HVQ ≈ 0 can make the learning time very large
  • Neural Gas: more robust w.r.t. initialization
  [Plots: prototype projections at t = 50 for WTA and Neural Gas, initialized at t = 0, and E(W) vs. t.]

  16. Learning Vector Quantization (LVQ)
  Objective: classification of data using prototype vectors.
  Assign data {ξ, σ}, ξ ∈ ℝN, to the nearest prototype vector (by a distance measure, e.g. Euclidean distance), and find the optimal set W for the lowest generalization error, i.e. the probability that new data are misclassified by the nearest prototype.
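
Nearest-prototype classification in code (a sketch; the prototype labels c are assumed to be a NumPy array of ±1):

```python
import numpy as np

def classify(X, W, c):
    """Assign each sample the label of its closest prototype.

    X: (P, N) data, W: (K, N) prototypes, c: (K,) prototype labels.
    """
    d2 = ((X[:, None, :] - W[None, :, :]) ** 2).sum(axis=2)  # (P, K)
    return c[np.argmin(d2, axis=1)]
```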

  17. LVQ1
  Update the winner towards the data if their classes agree, away from the data otherwise. There is no cost function related to the generalization error.
  With two prototypes the labels are c = {+1, −1}. With three prototypes, to which class should the 3rd prototype be added: c = {+1, +1, −1} or c = {+1, −1, −1}?
  [Plots: projections RS+, RS− of the prototypes for the two- and three-prototype configurations.]
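
A single LVQ1 step under this rule (a sketch; names and the learning-rate convention are assumptions):

```python
import numpy as np

def lvq1_step(W, c, xi, label, eta=0.01):
    """One LVQ1 update: the winner is attracted by same-class data,
    repelled by data of the other class."""
    winner = np.argmin(((W - xi) ** 2).sum(axis=1))
    sign = 1.0 if c[winner] == label else -1.0
    W[winner] += eta * sign * (xi - W[winner])
    return W
```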

  18. Generalization error: the probability that new data are misclassified by the nearest prototype.
  [Plot: εg vs. t for p+ = 0.6, p− = 0.4, υ+ = 1.5, υ− = 1.0.]
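
In simulations, εg can be estimated by Monte Carlo, reusing the sampler and classifier sketched above (all helper names are my constructions, not the slides'):

```python
import numpy as np

def estimate_eg(W, c, n=100_000, **model_params):
    """Monte Carlo estimate of the generalization error eps_g."""
    X, y = sample_cluster_data(n, W.shape[1], **model_params)
    return float((classify(X, W, np.asarray(c)) != y).mean())
```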

  19. Optimal decision boundary
  The optimal boundary is the (hyper)plane where the prior-weighted class densities are equal.
  • equal variances (υ+ = υ−): linear decision boundary
  • unequal variances (υ+ > υ−): a curved boundary is optimal; with K = 2 prototypes the boundary stays linear, while K = 3 approximates the optimum better
  • more prototypes → a better approximation to the optimal decision boundary
  [Sketch: clusters at B+ (prior p+ > p−) and B− with separation ℓ.]

  20. Asymptotic εg for υ+ > υ− (υ+ = 0.81, υ− = 0.25)
  εg(t → ∞) as a function of p+:
  • c = {+1, −1, −1}: the optimal classifier with K = 3 is equal to K = 2, and LVQ1 with K = 3 is worse
  • c = {+1, +1, −1}: the optimal classifier with K = 3 is better, and LVQ1 with K = 3 is better as well
  → more prototypes are not always better for LVQ1; it is best to place more prototypes on the class with the larger variance

  21. Summary
  • dynamics of (Learning) Vector Quantization for high-dimensional data
  • Neural Gas: more robust w.r.t. initialization than WTA
  • LVQ1: more prototypes are not always better
  Outlook
  • study of different algorithms, e.g. LVQ+/−, LFM, RSLVQ
  • more complex models: multi-prototype, multi-class problems
  Reference: M. Biehl, A. Ghosh, and B. Hammer. Dynamics and Generalization Ability of LVQ Algorithms. Journal of Machine Learning Research 8: 323–360 (2007). http://jmlr.csail.mit.edu/papers/v8/biehl07a.html

  22. Questions?


  24. Central Limit Theorem
  Let x1, x2, …, xN be independent random numbers drawn from an arbitrary probability distribution with finite mean and variance. The distribution of the average of the xj approaches a normal distribution as N becomes large.
  [Example plots: a non-normal density p(xj) and the distribution of the average for N = 1, 2, 5, 50.]
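
This is easy to verify numerically (an illustrative sketch; the exponential distribution stands in for the unspecified non-normal p(xj)):

```python
import numpy as np

def average_distribution(n, samples=100_000, rng=None):
    """Empirical distribution of the mean of n draws from a skewed
    (exponential) distribution; histogram for n = 1, 2, 5, 50 to see
    the approach to a Gaussian."""
    rng = np.random.default_rng() if rng is None else rng
    return rng.exponential(size=(samples, n)).mean(axis=1)
```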

  25. Self-averaging
  Fluctuations of the characteristic quantities decrease as the number of degrees of freedom N grows; for N → ∞ the fluctuations vanish (the variance becomes zero).
  [Plot: Monte Carlo simulations over 100 independent runs.]

  26. "LVQ +/−"
  Update both the correct and the incorrect winner:
  ds = min{dk} with cs = σμ (attracted), dt = min{dk} with ct ≠ σμ (repelled).
  For p+ ≫ p−, the strong repulsion by the stronger class makes the dynamics strongly divergent! To overcome the divergence: e.g. early stopping at εg(t) = εg,min (difficult in practice).
  [Plots: εg(t) diverging without stopping, and the early-stopping point.]
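
One LVQ+/− step under this rule (a sketch; it assumes each class owns at least one prototype, and names are illustrative):

```python
import numpy as np

def lvq_plusminus_step(W, c, xi, label, eta=0.01):
    """One LVQ+/- update: the closest correct prototype is attracted,
    the closest incorrect prototype is repelled."""
    d2 = ((W - xi) ** 2).sum(axis=1)
    correct = np.where(c == label)[0]         # prototypes with c_s = sigma
    wrong = np.where(c != label)[0]           # prototypes with c_t != sigma
    s = correct[np.argmin(d2[correct])]
    t = wrong[np.argmin(d2[wrong])]
    W[s] += eta * (xi - W[s])
    W[t] -= eta * (xi - W[t])
    return W
```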

  27. Comparison of LVQ1 and LVQ+/− (with early stopping), c = {+1, +1, −1}
  • υ+ = υ− = 1.0: LVQ1 outperforms LVQ+/− with early stopping
  • υ+ = 0.81, υ− = 0.25: LVQ+/− with early stopping outperforms LVQ1 in a certain p+ interval
  LVQ+/−: the performance depends on the initial conditions.
  [Plots: asymptotic εg vs. p+ for both parameter settings.]
