
An adaptive modular approach to the mining of sensor network data


  1. An adaptive modular approach to the mining of sensor network data G. Bontempi, Y. Le Borgne (1) {gbonte,yleborgn}@ulb.ac.be Machine Learning Group Université Libre de Bruxelles – Belgium (1) Supported by the COMP2SYS project, sponsored by the HRM program of the European Community (MEST-CT-2004-505079)

  2. Outline • Wireless sensor networks: Overview • Machine learning in WSN • An adaptive two-layer architecture • Simulation and results • Conclusion and perspective

  3. Sensor networks: Overview • Goal: allow for sensing tasks over an environment • Desiderata for the nodes: • Autonomous power • Wireless communication • Computing capabilities

  4. Smart dust project • Smart dust: get mote size down to 1 mm³ • Berkeley - Deputy dust (2001) • 6 mm³ • Solar powered • Acceleration and light sensors • Optical communication • Low cost in large quantities

  5. Currently available sensors
  • Crossbow Mica / Mica dot: µProc 4 MHz, 8-bit Atmel RISC; radio 40 kbit/s at 900/450/300 MHz, or 250 kbit/s at 2.4 GHz (MicaZ, 802.15.4); memory 4 KB RAM / 128 KB program flash / 512 KB data flash; power 2 x AA or coin cell
  • Intel iMote: µProc 12 MHz, 16-bit ARM; radio Bluetooth; memory 64 KB SRAM / 512 KB data flash; power 2 x AA
  • MoteIV Telos: µProc 8 MHz, 16-bit TI RISC; radio 250 kbit/s at 2.4 GHz (802.15.4); memory 2 KB RAM / 60 KB program flash / 512 KB data flash; power 2 x AA

  6. Applications • Wildfire monitoring • Ecosystem monitoring • Earthquake monitoring • Precision agriculture • Object tracking • Intrusion detection • …

  7. Challenges for… • Electronics • Networking • Systems • Databases • Statistics • Signal processing • …

  8. Machine learning and WSN • Local scale • Spatio-temporal correlations • Local predictive model identification • Can be used to: • Reduce sensor communication activity • Predict values for malfunctioning sensors

  9. Machine learning and WSN • Global scale • The network as a whole can achieve high-level tasks • Sensor network <-> Image

  10. Supervised learning and WSN • Classification (traffic type classification) • Prediction (pollution forecast) • Regression (wave intensity, population density)

  11. A supervised learning scenario • S: a network of S sensors • x(t) = (s1(t), s2(t), …, sS(t)): snapshot of the network at time t • y(t) = f(x(t)) + ε(t): the value associated with S at time t (ε standing for noise) • Let DN be a set of N observations (x(t), y(t)) • Goal: find a model that predicts y for any new x
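A minimal numpy sketch may help fix the notation; the sizes S and N, the random seed, and the stand-in function f are illustrative assumptions, not the setup used in the experiments.

```python
import numpy as np

S, N = 100, 1500                        # assumed number of sensors and snapshots
rng = np.random.default_rng(0)

X = rng.normal(size=(N, S))             # row t is the snapshot x(t) = (s1(t), ..., sS(t))

def f(x):
    # Arbitrary stand-in for the unknown mapping from snapshot to output.
    return np.sin(x).sum(axis=-1)

y = f(X) + 0.1 * rng.normal(size=N)     # y(t) = f(x(t)) + eps(t)
D_N = list(zip(X, y))                   # the observation set of N pairs (x(t), y(t))
```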

  12. Centralized approach • Every sensor forwards its measurements to a central unit, which causes a high transmission overhead

  13. Two-layer approach • Use of compression • Reduces transmission overhead • Spatial correlation keeps the compression loss low • Reduces the dimensionality of the learning problem

  14. Two-layer adaptive approach • PAST: online compression • Lazy learning: online learning

  15. Compression: PCA
  • Transform the set of n input variables x = (x1, …, xn) into a set of m variables z = (z1, …, zm), with m < n
  • Linear transformation: z = Wᵀx, where W is an n×m matrix
  • W is chosen to maximize the preserved variance
  • Solution: the m first eigenvectors of the correlation matrix of x, or equivalently the minimization of the reconstruction error E[‖x − W Wᵀx‖²]
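As a rough illustration, the batch version of this compression fits in a few lines of numpy; this is a sketch, not the authors' code, and the eigendecomposition of the correlation matrix is only one of the equivalent ways to obtain W.

```python
import numpy as np

def pca_compress(X, m):
    """Batch PCA sketch: X is (N, n); returns W (n, m) and Z = X @ W."""
    C = np.corrcoef(X, rowvar=False)     # n x n correlation matrix of the inputs
    _, vecs = np.linalg.eigh(C)          # eigenvectors, ascending eigenvalue order
    W = vecs[:, ::-1][:, :m]             # keep the m leading eigenvectors
    return W, X @ W                      # compressed variables z = W^T x
```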

  16. PAST – Recursive PCA
  • Projection Approximation Subspace Tracking [YAN95]
  • Online formulation: at time t, W(t) minimizes the exponentially weighted cost J(W) = Σi=1..t β^(t−i) ‖x(i) − W Wᵀx(i)‖², where β is a forgetting factor
  • Low memory requirement and computational complexity: O(nm) + O(m²) per update

  17. PAST algorithm
  Recursive formulation [HYV01]: for each new snapshot x(t),
  y(t) = Wᵀ(t−1) x(t)
  h(t) = P(t−1) y(t)
  g(t) = h(t) / (β + yᵀ(t) h(t))
  P(t) = (1/β) (P(t−1) − g(t) hᵀ(t))
  e(t) = x(t) − W(t−1) y(t)
  W(t) = W(t−1) + e(t) gᵀ(t)
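The recursion transcribes directly into Python; a minimal sketch, assuming the standard [YAN95] formulation above, with illustrative identity initializations for W and P.

```python
import numpy as np

class PAST:
    """Sketch of the PAST recursion (slide 17); not the authors' implementation."""

    def __init__(self, n, m, beta=0.99):
        self.W = np.eye(n, m)    # estimate of the m-dimensional principal subspace
        self.P = np.eye(m)       # inverse correlation matrix of the projections
        self.beta = beta         # forgetting factor

    def update(self, x):
        y = self.W.T @ x                      # y(t) = W^T(t-1) x(t)
        h = self.P @ y                        # h(t) = P(t-1) y(t)
        g = h / (self.beta + y @ h)           # gain vector g(t)
        self.P = (self.P - np.outer(g, h)) / self.beta
        e = x - self.W @ y                    # reconstruction error e(t)
        self.W = self.W + np.outer(e, g)      # rank-one update of the subspace
        return y                              # compressed snapshot; cost O(nm) + O(m²)
```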

  18. Learning algorithm
  • Lazy learning: a k-nearest-neighbours approach
  • Storage of the observation set DN = {(x(t), y(t)), t = 1, …, N}
  • When a query q is asked, take the k nearest neighbours of q in DN
  • Build a local linear model ŷ(x) = βᵀx, with β fitted by least squares on these k neighbours
  • Compute the output at q by applying ŷ(q) = βᵀq
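The prediction step fits in a few lines of numpy; the Euclidean metric and the intercept term are assumptions, as the slide does not specify them.

```python
import numpy as np

def lazy_predict(X, y, q, k):
    """Sketch of slide 18: a local linear model on the k nearest neighbours of q."""
    dist = np.linalg.norm(X - q, axis=1)                 # distances to the query
    idx = np.argsort(dist)[:k]                           # k nearest neighbours
    Xk = np.c_[np.ones(k), X[idx]]                       # local design matrix with intercept
    beta, *_ = np.linalg.lstsq(Xk, y[idx], rcond=None)   # local least-squares fit
    return np.r_[1.0, q] @ beta                          # apply the local model at q
```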

  19. How many neighbours? • y = sin(x) + e • e: Gaussian noise with σ = 0.1 • What is the value of y at x = 1.5?

  20. How many neighbours? • K=2: Overfitting

  21. How many neighbours? • K=2: Overfitting • K=3: Overfitting

  22. How many neighbours? • K=2: Overfitting • K=3: Overfitting • K=4: Overfitting

  23. How many neighbours? • K=2: Overfitting • K=3: Overfitting • K=4: Overfitting • K=5: Good

  24. How many neighbours? • K=2: Overfitting • K=3: Overfitting • K=4: Overfitting • K=5: Good • K=6: Underfitting
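The build-up of slides 19 to 24 can be reproduced with the short sketch below; the sample size, the seed, and the use of local linear fits are assumptions made for illustration.

```python
import numpy as np

# y = sin(x) + e, queried at x = 1.5 with a growing number of neighbours k.
rng = np.random.default_rng(1)
x = np.sort(rng.uniform(0.0, 2.0 * np.pi, 30))
y = np.sin(x) + rng.normal(scale=0.1, size=x.size)

q = 1.5
for k in range(2, 7):                                     # k = 2 .. 6, as in the slides
    idx = np.argsort(np.abs(x - q))[:k]                   # k nearest neighbours of q
    slope, intercept = np.polyfit(x[idx], y[idx], deg=1)  # local linear fit
    print(f"k={k}: prediction {slope * q + intercept:+.3f}, truth {np.sin(q):+.3f}")
```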

  25. Automatic model selection ([BIR99], [BON99], [BON00]) • Starting with a low k, local models are identified for increasing numbers of neighbours • Their quality is assessed by a leave-one-out procedure • The best model(s) are kept for computing the prediction • Low computational cost thanks to: • the PRESS statistic [ALL74] • recursive least squares [GOO84]
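A minimal sketch of the leave-one-out assessment: for a linear model, the PRESS statistic [ALL74] gives the LOO residuals from the hat matrix diagonal, without refitting the model. The function name and the pinv-based implementation are illustrative assumptions.

```python
import numpy as np

def press_loo_errors(Xk, yk):
    """LOO residuals of a local linear model via PRESS: e_i / (1 - h_ii)."""
    X1 = np.c_[np.ones(len(Xk)), Xk]              # local design matrix with intercept
    H = X1 @ np.linalg.pinv(X1.T @ X1) @ X1.T     # hat matrix H = X (X^T X)^+ X^T
    e = yk - H @ yk                               # ordinary residuals
    # Requires k larger than the number of parameters, otherwise h_ii -> 1.
    return e / (1.0 - np.diag(H))                 # LOO residuals, no refits needed

# Selection: compute the mean squared LOO error for each candidate k on its own
# neighbourhood and keep the neighbourhood size(s) with the lowest score.
```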

  26. Advantages of PAST and lazy learning • No assumption on the process underlying the data • Online learning capability • Adaptive to non-stationarity • Low computational and memory costs

  27. Simulation • Modeling a wave propagation phenomenon • Helmholtz equation: ∇²u + k²u = 0, where k is the wave number • 2372 sensors • 30 values of k between 1 and 146; 50 time instants • 1500 observations • The output k is noisy

  28. Test procedure • Prediction error measured by the Normalized Mean Squared Error (NMSE) • 10-fold cross-validation (1350 training / 150 test observations) • (figure: example of a learning curve)
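For reference, a one-line sketch of the NMSE, assuming the usual definition of the MSE normalized by the variance of the output.

```python
import numpy as np

def nmse(y_true, y_pred):
    """Normalized mean squared error: MSE divided by the variance of y_true."""
    return np.mean((y_true - y_pred) ** 2) / np.var(y_true)
```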

  29. Experiment 1 • Centralized configuration • Comparison of PCA and PAST for the first 1 to 16 principal components

  30. Results • Prediction accuracy is similar when the number of principal components is sufficient

  31. Clustering • The number of clusters involves a trade-off between: • the routing costs between clusters and the gateway • the final prediction accuracy • the robustness of the architecture

  32. Experiment 2 • Partitioning into geographical clusters • The partitioning P varies from P(2) to P(7) • 2 principal components for each cluster • Ten-fold cross-validation on 1500 observations • (figure: example of a P(2) partitioning)

  33. Results • Comparison of the P(2) (top) and P(5) (bottom) error curves • As the number of clusters increases: • better accuracy • faster convergence

  34. Experiment 3 • Simulation: at each time instant, • a 10% probability of a sensor failure • a 1% probability of a supernode failure • Recursive PCA and lazy learning deal efficiently with variations of the input space dimension • Robust to random sensor malfunctions

  35. Results • Comparison of the P(2) (top) and P(5) (bottom) error curves • Increasing the number of clusters improves robustness

  36. Experiment 4 • Time-varying changes in the sensor measurements • 2700 time instants • The sensor response decreases linearly from a factor 1 to a factor 0.4 • A temporal window: only the last 1500 measurements are kept
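A bounded buffer is enough to sketch this temporal window; the deque-based implementation below is an assumption about how "keep only the last 1500 measurements" could be realized.

```python
from collections import deque

WINDOW = 1500                      # number of most recent observations kept
window = deque(maxlen=WINDOW)      # the oldest (x, y) pairs drop out automatically

def observe(x, y):
    """Called at each time instant to maintain the windowed observation set."""
    window.append((x, y))

# The lazy learner of slide 18 then searches neighbours only within `window`,
# so its local models track the drifting sensor response.
```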

  37. Results • Due to the concept drift, the fixed model (in black) becomes outdated • Thanks to its lazy character, the proposed architecture deals with this drift very easily

  38. Conclusion • The architecture: • yields good results compared to its batch equivalent • is computationally efficient • adapts to appearing and disappearing units • easily handles non-stationarity

  39. Future work • Extension of the tests to real-world data • Improvement of the clustering strategy: • taking costs (routing/accuracy) into consideration • making use of the ad hoc nature of the network • Test of other compression procedures: • robust PCA • ICA

  40. References
  Smart Dust project: http://www-bsac.eecs.berkeley.edu/archive/users/warneke-brett/SmartDust/
  Crossbow: http://www.xbow.com/
  [BON99] G. Bontempi. Local Techniques for Modeling, Prediction and Control. PhD thesis, IRIDIA, Université Libre de Bruxelles, 1999.
  [YAN95] B. Yang. Projection approximation subspace tracking. IEEE Transactions on Signal Processing, 43(1):95-107, 1995.
  [ALL74] D. M. Allen. The relationship between variable selection and data augmentation and a method for prediction. Technometrics, 16:125-127, 1974.
  [GOO84] G. C. Goodwin and K. S. Sin. Adaptive Filtering, Prediction and Control. Prentice-Hall, 1984.
  [HYV01] A. Hyvärinen, J. Karhunen, and E. Oja. Independent Component Analysis. Wiley, 2001.

  41. References on lazy learning
  [BIR99] M. Birattari, G. Bontempi, and H. Bersini. Lazy learning meets the recursive least squares algorithm. In M. S. Kearns, S. A. Solla, and D. A. Cohn, editors, NIPS 11, pages 375-381, Cambridge, MA, 1999. MIT Press.
  [BON99] G. Bontempi, M. Birattari, and H. Bersini. Local learning for iterated time-series prediction. In I. Bratko and S. Dzeroski, editors, Machine Learning: Proceedings of the 16th International Conference, pages 32-38, San Francisco, CA, 1999. Morgan Kaufmann.
  [BON00] G. Bontempi, M. Birattari, and H. Bersini. A model selection approach for local learning. Artificial Intelligence Communications, 121(1), 2000.

  42. Thanks for your attention!
