1 / 45

Integrated Learning in Multi-net Systems

Integrated Learning in Multi-net Systems. Neural Computing Group Department of Computing University of Surrey. Matthew Casey. 6 th February 2004. http://www.computing.surrey.ac.uk/personal/st/M.Casey/. Introduction. In-situ learning in multi-net systems Classification

effie
Download Presentation

Integrated Learning in Multi-net Systems

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Integrated Learningin Multi-net Systems Neural Computing Group Department of Computing University of Surrey Matthew Casey 6th February 2004 http://www.computing.surrey.ac.uk/personal/st/M.Casey/

  2. Introduction • In-situ learning in multi-net systems • Classification • Parallel combination of networks: ensemble • Sequential combination of networks: modular • Simulation • Parallel and sequential combination of networks • Quantification and addition • Formal framework and algorithm for multi-net systems

  3. Introduction • Learning: • “process which leads to the modification of behaviour”1 • Biological motivation • Hebb’s neurophysiological postulate2 • Learning across cell assemblies: neural integration • Functional specialism: analogy to multi-net systems • Theoretical motivation • Generalisation improvements with multi-net systems • Ensemble and modular • Learning in collaboration with modularisation

  4. Single-net Systems • Systems of one or more artificial neurons combined together in a single network • Parallel distributed processing3 • Learning to generalise • Learning algorithms • Supervised: delta4,5,6, backpropagation7,8,9 • Unsupervised: Hebbian2, SOM10

  5. 1 Combination of Linear Decision Boundaries 1 -1 -1 1 1 -1 -1 1 1 -1 -1 1 True (1) False (-1) -1 -1 1 Single-nets as Multi-nets? XOR

  6. From Single-nets to Multi-nets • Multi-net systems appear to be a development of the parallel processing paradigm • Can multi-net systems improve generalisation? • Modularisation with simpler networks? • Limited theoretical and empirical evidence • Generalisation: • Balance prior knowledge and training • VC Dimension11,12,13 • Bias/variance dilemma14

  7. Multi-net Systems:Ensemble or Modular? • Ensemble systems: • Parallel combination • Each network performs the same task • Simple ensemble • AdaBoost15 • Modular systems: • Each network performs a different (sub-)task • Mixture-of-experts16 (top-down parallel competitive) • Min-max17 (bottom-up static parallel/sequential)

  8. Categorising Multi-net Systems • Sharkey’s18,19 combination strategies: • Parallel: co-operative or competitive top-down or bottom-up static or dynamic • Sequential • Supervisory • Component networks may be20: • Pre-trained (independent) • Incrementally trained (iterative) • In-situ trained (simultaneous)

  9. Multi-net Systems • Categorisation schemes appear not to support the generalisation of multi-net system properties beyond specific examples • Ensemble: bias, variance and diversity21 • Modular: bias and variance • What about measures such as the VC Dimension? • Some use of in-situ learning • ME and HME22 • Negative correlation learning23

  10. Research • Multi-net systems seem to offer a way in which generalisation performance and learning speed can be improved: • Yet limited theoretical and empirical evidence • Focus on parallel systems • Limited use of in-situ learning despite motivation • Existing results show improvement • Can the approach be generalised? • No general framework for multi-net systems • Difficult to generalise properties from categorisation

  11. Research • Explore in-situ learning in multi-net systems: • Parallel: in-situ learning in the simple ensemble • Sequential: combining networks with in-situ learning • Does in-situ learning provide improved generalisation? • Can we combine ‘simple’ networks to solve ‘complex’ problems: ‘superordinate’ systems with faster learning? • Propose a formal framework for multi-net systems • A method to describe the architecture and learning algorithm for a general multi-net system

  12. Multi-net System Framework • Previous work: • Framework for the co-operation of learning algorithms24 • Stochastic model25 • Importance Sampled Learning Ensembles (ISLE)26 • Focus on supervised learning and specific architectures • Jordan and Xu’s (1995) definition of HME27: • Generalisation of HME • Abstraction of architecture from algorithm • Theoretical results for convergence

  13. Multi-net System Framework • Propose a modification to Jordan and Xu’s definition of HME to provide a generalised multi-net system framework • HME combines the output of the expert networks through a weighting generated by a gating network • Replace the weighting by the (optional) operation of a network • Can be used for parallel, sequential and supervisory systems

  14. Example: HME

  15. Example: Framework

  16. Definition • A multi-net system consists of the ordered tree of depth r defined by the nodes , with the root of the tree associated with the output , such that:

  17. Multi-net System Framework • Learning algorithm operates by modifying the state of the system as defined by associated with each node • Includes: • Pre-training • In-situ training • Incremental training (through pre- or in-situ training) • Examples demonstrate how framework can be used to describe existing types of multi-net system • However does not rely upon categorisation schemes

  18. In-situ Learning • Evaluation of in-situ learning in multi-net systems • Explore parallel and sequential in-situ learning with definitions using the proposed framework • Simple learning ensemble (SLE) • Sequential learning modules (SLM) • Benchmark classification tasks28: • XOR • MONK’s problems (MONK 1, 2, 3)29 • Wisconsin Breast Cancer Database (WBCD)30 • Proben1 Thyroid (thyroid1 data set)31

  19. Simple Learning Ensemble • Ensembles: • Train each network individually • Parallel combination of network outputs: mean output • Pre-training: how can we understand or control the combined performance of the ensemble? • Incremental: AdaBoost15 • In-situ: negative correlation23 • In-situ learning: • Train in-situ and assess combined performance during training using early stopping • Generalisation loss early stopping criterion32

  20. Benchmark Results • Compare SLE results with: • Simple ensemble (SE): all networks pre-trained using early stopping • Single-net: MLP with backpropagation – with and without early stopping • For SLE and SE: • From 2 to 20 MLP networks • 100 trials per configuration: mean response

  21. Benchmark Results • XOR • SLE and SE equivalent training (no early stopping) • SLE uses less epochs to give equivalent responses • MONK’s problems/WBCD/Thyroid • SLE improves upon SE, SE improves upon single-net • SLE trains for longer: combined performance • Adding more networks gives better generalisation • More networks, more achieved desired performance • MONK 1/2 • SLE improves upon non-early stopping single-net

  22. Sequential Learning Modules • Sequential systems: • Can a combination of (simpler) networks give good generalisation and learning speed? • Typically pre-trained and for specific processing • Sequential in-situ learning: • How can in-situ learning be achieved with sequential networks: target error/output? • Use unsupervised networks • Last network has target output and hence can be supervised

  23. Benchmark Results • Compare SLM results with: • Single-net: single layer network with delta learning • Single-net: MLP with backpropagation – without early stopping • Networks: • Combine a SOM with a single layer network with delta learning • Varying map sizes of SOM used to see effect on classification performance • 100 trials per configuration: mean response

  24. Benchmark Results • XOR • Cannot be solved by SOM or single layer network • Can be solved by SLM with 3x3 map • Faster learning times than MLP with backpropagation • MONK’s problems/WBCD • SLM can learn classification of training examples • Generalisation: improves upon SE with early stopping • MONK 2/3 • SLM improves upon MLP without early stopping • MONK 3 improves upon SLE

  25. Benchmark Results • MONK 1/WBCD • Generalisation: similar, but slightly worse • Thyroid • Poor learning of training examples • Can SOM perform sufficient pre-processing for single layer network? • Results seem to depend upon problem type and map size

  26. In-situ Learning • In-situ learning in multi-net systems: • Can give better training and generalisation performance • Comparison with SE (early stopping) and single-net systems (with and without early stopping) • Computational effort? • Ensemble systems: • Effect of training times on bias, variance and diversity? • Sequential systems: • Encouraging empirical results: theoretical results? • Automatic classification of unsupervised clusters

  27. In-situ Learning

  28. In-situ Learning

  29. In-situ Learning and Simulation • Biological motivation for in-situ learning • Hebb’s ‘neural integration’ • Functional specialism in cognitive systems • Use in-situ learning in modular multi-net systems to simulate numerical abilities: • Quantification: subitization and counting • Addition: fact retrieval and ‘count all’ • Combine ME and SLM to allow abilities to ‘compete’

  30. Strategy Learning System

  31. Multi-net Simulation of Quantification • Single-net and multi-net systems • Trained on scenes consisting of ‘objects’ • Logarithmic probability model • Subitization SOM: • Ordering of numbers with compressive scheme • Limit: training data, object frequency and map size • Counting MLP with backpropagation: • Correct responses to training data • Conventional, stable non-conventional and nonstable

  32. Multi-net Simulation of Quantification • MNQ: • Subitization: SOM (pre-trained) • Single layer network with delta learning • Counting: MLP with backpropagation • Successfully learnt to quantify • To subitize or count • Decision based upon input • Subitization limit attributable to interaction of modules • Lower numbers subitized, higher numbers counted

  33. Multi-net Simulation ofAddition • Single-net and multi-net systems • Trained on scenes consisting of two sets of ‘objects’ • Equal probability model • Fact retrieval SOM: • Each addend associated with a map axis • Some overlap of problems • Commutative information used • ‘Count all’ MLP with backpropagation: • Correct responses to training data • Some relationship to observed human errors

  34. Multi-net Simulation ofAddition • MNA: • Fact retrieval: SOM • Single layer network with delta learning • ‘Count all’: MLP with backpropagation • Successfully learnt to add • To count or from facts • Decision based upon input • Predominant use of ‘count all’, rather than facts • Does demonstrate change in strategy

  35. In-situ Learning and Simulation • In-situ learning in SLS: • Simulate developmental progression • At least as capable as equivalent monolithic solutions • Competition between supervised and unsupervised learning paradigms • In-situ learning and simulation of the interaction between multiple abilities: • Interaction between different abilities: simulating psychological phenomena • Integrated learning

  36. Contribution • Multi-net systems: • Seem to offer empirical and theoretical improvement to generalisation • Properties under-explored • Framework for multi-net systems • Generalisation of multi-net systems beyond categorisation schemes • Foundation to explore multi-net properties • In-situ learning • Biological and theoretical motivation

  37. Contribution • Compared different multi-net techniques and the use of in-situ learning • Demonstrated that in-situ learning can lead to improved training and generalisation • SLE • Assessment of combined performance • SLM • Combining ‘simple’ networks to solve ‘complex’ tasks • ‘Superordinate’? • Also shows automatic classification using SOM

  38. Contribution • Integrated learning • Simulations using modular parallel and sequential networks • In-situ learning used to explore the interaction of modules during learning • Demonstrated simulation of abilities and observed psychological phenomena

  39. Future Work • Framework: • Modify learning algorithm – recursive using tree • Explore properties such as bias, variance, diversity and VC Dimension • In-situ learning: • Further comparison • SLE: does learning promote diversity? • SLM: expand and explore limitations • Simulations: • Explore further the effect of integrated learning

  40. Questions?

  41. References • Simpson, J.A. & Weiner, E.S.C. (Ed) (1989). Oxford English Dictionary, 2nd. Oxford, UK: Clarendon Press. • Hebb, D.O. (1949). The Organization of Behavior: A Neuropsychological Theory. New York: John Wiley & Sons. • McClelland, J.L. & Rumelhart, D.E. (1986). Parallel Distributed Processing: Explorations in the Microstructure of Cognition, Volume 2: Psychological and Biological Models. Cambridge, MA.: A Bradford Book, MIT Press. • Rosenblatt, F. (1958). The Perceptron: A Probabilistic Model for Information Storage and Organization in the Brain. Psychological Review, vol. 65(6), pp. 386-408. • Widrow, B. & Hoff, M.E.Jr. (1960). Adaptive Switching Circuits. IRE WESCON Convention Record, pp. 96-104. • Minsky, M.L. & Papert, S. (1988). Perceptrons: An Introduction to Computational Geometry, Expanded Ed. Cambridge, MA.: MIT Press. • Werbos, P.J. (1974). Beyond Regression: New Tools for Prediction and Analysis in the Behavioral Sciences. Unpublished doctoral thesis. Cambridge, MA.: Harvard University.

  42. References • Rumelhart, D.E., Hinton, G.E. & Williams, R.J. (1986). Learning Internal Representations by Error Propagation. In Rumelhart, D. E. & McClelland, J. L. (Ed), Parallel Distributed Processing: Explorations in the Microstructure of Cognition, Volume 1: Foundations, pp. 318-362. Cambridge, MA.: MIT Press. • Elman, J.L. (1990). Finding Structure in Time. Cognitive Science, vol. 14, pp. 179-211. • Kohonen, T. (1982). Self-Organized Formation of Topologically Correct Feature Maps. Biological Cybernetics, vol. 43, pp. 59-69. • Vapnik, V.N. & Chervonenkis, A.Ya. (1971). On the Uniform Convergence of Relative Frequencies of Events to their Probabilities. Theory of Probability and Its Applications, vol. XVI(2), pp. 264-280. • Baum, E.B. & Haussler, D. (1989). What Size Net Gives Valid Generalisation? Neural Computation, vol. 1(1), pp. 151-160. • Koiran, P. & Sontag, E.D. (1997). Neural Networks With Quadratic VC Dimension. Journal of Computer and System Sciences, vol. 54(1), pp. 190-198. • Geman, S., Bienenstock, E. & Doursat, R. (1992). Neural Networks and the Bias/Variance Dilemma. Neural Computation, vol. 4(1), pp. 1-58.

  43. References • Freund, Y. & Schapire, R.E. (1996). Experiments with a New Boosting Algorithm. Machine Learning: Proceedings of the 13th International Conference, pp. 148-156. Morgan Kaufmann. • Jacobs, R.A., Jordan, M.I. & Barto, A.G. (1991). Task Decomposition through Competition in a Modular Connectionist Architecture: The What and Where Vision Tasks. Cognitive Science, vol. 15, pp. 219-250. • Lu, B. & Ito, M. (1999). Task Decomposition and Module Combination Based on Class Relations: A Modular Neural Network for Pattern Classification. IEEE Transactions on Neural Networks, vol. 10(5), pp. 1244-1256. • Sharkey, A.J.C. (1999). Multi-Net Systems. In Sharkey, A. J. C. (Ed), Combining Artificial Neural Nets: Ensemble and Modular Multi-Net Systems, pp. 1-30. London: Springer-Verlag. • Sharkey, A.J.C. (2002). Types of Multinet System. In Roli, F. & Kittler, J. (Ed), Proceedings of the Third International Workshop on Multiple Classifier Systems (MCS 2002), pp. 108-117. Berlin, Heidelberg, New York: Springer-Verlag. • Liu, Y., Yao, X., Zhao, Q. & Higuchi, T. (2002). An Experimental Comparison of Neural Network Ensemble Learning Methods on Decision Boundaries. Proceedings of the 2002 International Joint Conference on Neural Networks (IJCNN'02), vol. 1, pp. 221-226. Los Alamitos, CA.: IEEE Computer Society Press.

  44. References • Kuncheva, L.I. (2002). Switching Between Selection and Fusion in Combining Classifiers: An Experiment. IEEE Transactions on Systems, Man, and Cybernetics, Part B, vol. 32(2), pp. 146-156. • Jordan, M.I. & Jacobs, R.A. (1994). Hierarchical Mixtures of Experts and the EM Algorithm. Neural Computation, vol. 6(2), pp. 181-214. • Liu, Y. & Yao, X. (1999a). Ensemble Learning via Negative Correlation. Neural Networks, vol. 12(10), pp. 1399-1404. • Bottou, L. & Gallinari, P. (1991). A Framework for the Cooperation of Learning Algorithms. In Lippmann, R.P., Moody, J.E. & Touretzky, D.S. (Ed), Advances in Neural Information Processing Systems, vol. 3, pp. 781-788. • Amari, S.-I. (1995). Information Geometry of the EM and em Algorithms for Neural Networks. Neural Networks, vol. 8(9), pp. 1379-1408. • Friedman,J.H. & Popescu,B. (2003). Importance Sampling: An Alternative View of Ensemble Learning. Presented at the 4th International Workshop on Multiple Classifier Systems (MCS 2003). Guildford, UK. • Jordan, M.I. & Xu, L. (1995). Convergence Results for the EM Approach to Mixtures of Experts Architectures. Neural Networks, vol. 8, pp. 1409-1431.

  45. References • Blake,C.L. & Merz,C.J. (1998). UCI Repository of Machine Learning Databases. http://www.ics.uci.edu/~mlearn/MLRepository.html. Irvine, CA.: University of California, Irvine, Department of Information and Computer Sciences. • Thrun, S.B., Bala, J., Bloedorn, E., Bratko, I., Cestnik, B., Cheng, J., De Jong, K., Dzeroski, S., Fahlman, S.E., Fisher, D., Hamann, R., Kaufman, K., Keller, S., Kononenko, I., Kreuziger, J., Michalski, R.S., Mitchell, T., Pachowicz, P., Reich, Y., Vafaie, H., van de Welde, W., Wenzel, W., Wnek, J. & Zhang, J. (1991). The MONK's Problems: A Performance Comparison of Different Learning Algorithms. Technical Report CMU-CS-91-197. Pittsburgh, PA.: Carnegie-Mellon University, Computer Science Department. • Wolberg, W.H. & Mangasarian, O.L. (1990). Multisurface Method of Pattern Separation for Medical Diagnosis Applied to Breast Cytology. Proceedings of the National Academy of Sciences, USA, vol. 87(23), pp. 9193-9196. • Prechelt, L. (1994). Proben1: A Set of Neural Network Benchmark Problems and Benchmarking Rules. Technical Report 21 / 94. Karlsruhe, Germany: University of Karlsruhe. • Prechelt, L. (1996). Early Stopping - But When? In Orr, G. B. & Müller, K-R. (Ed), Neural Networks: Tricks of the Trade, 1524, pp. 55-69. Berlin, Heidelberg, New York: Springer-Verlag.

More Related