WK1 - Introduction

CS 476: Networks of Neural Computation. WK1 – Introduction. Dr. Stathis Kasderidis, Dept. of Computer Science, University of Crete. Spring Semester, 2009

Contents • Course structure and details • Basic ideas of Neural Networks • Historical development of Neural Networks • Types of learning • Optimisation techniques and the LMS method • Conclusions Contents

Course Details • Duration: 13 weeks (2 Feb – 15 May 2009) • Lecturer: Stathis Kasderidis • E-mail: stathis@ics.forth.gr • Meetings: By arrangement via e-mail. • Assistants: Farmaki, Fasoulakis • Hours: • Every Tue 11 am-1 pm and Wed 11 am-1 pm. • Laboratory on Fri 11 am-1 pm. Course

Course Timetable • WK1 (3/5 Feb): Introduction • WK2 (10/12 Feb): Perceptron • WK3 (17/19 Feb): Multi-layer Perceptron • WK4 (24/26 Feb): Radial Basis Networks • WK5 (3/5 Mar): Recurrent Networks • WK6 (10/12 Mar): Self-Organising Networks • WK7 (17/19 Mar): Hebbian Learning • WK8 (24/26 Mar): Hopfield Networks • WK9 (31 Mar/2 Apr): Principal Component Analysis Course

Course Timetable (Cont) • WK10 (7/9 Apr): Support Vector Machines • WK11 (28/30 Apr): Stochastic Networks • WK12 (5/7 May): Student Projects’ Presentation • WK13 (12/14 May): Exams Preparation • Every week: • 3hrs Theory • 1hr Demonstration • 19 Mar 2009: Written mid-term exams (optional) Course

Course Timetable (Cont) • Lab sessions will take place every Friday 11 am-1 pm. In lab sessions you will be examined on the written assignments, and you can get help between assignments. • There will be four assignments during the term on the following dates: • Fri 6 Mar (Ass1 – Perceptron / MLP / RBF) • Fri 20 Mar (Ass2 – Recurrent / Self-organising) • Fri 3 Apr (Ass3 – Hebbian / Hopfield) • Fri 8 May (Ass4 – PCA / SVM / Stochastic)

Course Structure • Final grade is divided: • Laboratory attendance (20%) • Obligatory! • Course project (40%) • Starts at WK2. Presentation at WK12. • Teams of 2-4 people depending on class size. Selection from a set of offered projects. • Theory. Best of: • Final Theory Exams (40%) or • Final Theory Exams (25%) + Mid-term exams (15%) Course

Project Problems • Problem categories: • Time Series Prediction (Financial Series?) • Color Segmentation with Self-Organising Networks • Robotic Arm Control with Self-Organising Networks • Pattern Classification (Geometric Shapes) • Cognitive Modeling (ALCOVE model) Course

Suggested Tools • Tools: • MATLAB (+ Neural Networks Toolbox). Can be slow in large problems! • TLearn: http://crl.ucsd.edu/innate/tlearn.html • Any C/C++ compiler • Avoid Java and other interpreted languages! Too slow! Course

What are Neural Networks? • Models inspired by real nervous systems • They have a mathematical and computational formulation • Very general modelling tools • A different approach from Symbolic AI (Connectionism) • Many paradigms exist, but they are based on common ideas • A type of graphical model • Used in many scientific and technological areas, e.g. Basic Ideas

What are Neural Networks? (Cont.) Basic Ideas

What are Neural Networks? (Cont. 2) • NNs & Physics: e.g. Spin Glasses • NNs & Mathematics: e.g. Random Fields • NNs & Philosophy: e.g. Theory of Mind, Consciousness • NNs & Cognitive Science: e.g. Connectionist Models of High-Level Functions (Memory, Language, etc) • NNs & Engineering: e.g. Control, Hybrid Systems, A-Life • NNs & Neuroscience: e.g. Channel dynamics, Compartmental models Basic Ideas

What are Neural Networks? (Cont. 3) • NNs & Finance: e.g. Agent-based models of markets • NNs & Social Science: e.g. Artificial Societies Basic Ideas

General Characteristics I • What do they look like? Basic Ideas

General Characteristics II • Node details: • $Y = f(\mathrm{Act})$ • $f$ is called the transfer function • $\mathrm{Act} = \sum_i X_i W_i - B$ • $B$ is called the bias • $W_i$ are called the weights Basic Ideas
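The node equations above can be sketched in a few lines of Python. This is only an illustrative sketch: the function names `neuron_output` and `logistic`, and the choice of a logistic transfer function, are assumptions made for the example and do not come from the slides.

```python
import math

def neuron_output(inputs, weights, bias, transfer):
    """Return Y = f(Act), where Act = sum_i X_i * W_i - B."""
    if len(inputs) != len(weights):
        raise ValueError("inputs and weights must have the same length")
    # Weighted sum of the inputs minus the bias B
    activation = sum(x * w for x, w in zip(inputs, weights)) - bias
    return transfer(activation)

def logistic(act):
    """One possible transfer function (assumed here for illustration)."""
    return 1.0 / (1.0 + math.exp(-act))

# Two inputs, two weights, bias 0.1: Act = 1.0*0.4 + 0.5*(-0.2) - 0.1
y = neuron_output([1.0, 0.5], [0.4, -0.2], 0.1, logistic)
```

Passing the transfer function as an argument keeps the node definition independent of the particular choice of f.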

General Characteristics III • Form of transfer function: Basic Ideas
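The figure for this slide did not survive, so the forms it showed are unknown. As a hedged illustration, the sketch below lists a few transfer functions commonly used in the literature (threshold, piecewise-linear, logistic sigmoid, hyperbolic tangent). The function names and the `slope` parameter are assumptions for this example.

```python
import math

def step(act):
    """Threshold (Heaviside) function: 1 if act >= 0, else 0."""
    return 1.0 if act >= 0.0 else 0.0

def piecewise_linear(act):
    """Linear in [-0.5, 0.5], saturating at 0 and 1 outside that range."""
    return min(1.0, max(0.0, act + 0.5))

def logistic(act, slope=1.0):
    """Logistic sigmoid with output in (0, 1); slope controls steepness."""
    return 1.0 / (1.0 + math.exp(-slope * act))

def tanh_transfer(act):
    """Hyperbolic tangent with output in (-1, 1)."""
    return math.tanh(act)
```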

General Characteristics IV • Network Specification: • Number of neurons • Topology of connections (Recurrent, Feedforward, etc) • Transfer function(s) • Input types (representation: symbols, etc) • Output types (representation: as above) • Weight parameters, W • Other (weights initialisation, Cost function, training criteria, etc) Basic Ideas

General Characteristics V • Processing Modes: • Recall • “Learning” Basic Ideas

General Characteristics VI • Common properties of all Neural Networks: • Distributed representations • Graceful degradation due to damage • Noise robustness • Non-linear mappings • Generalisation and prototype extraction • Allow access of memory by contents • Can work with incomplete input Basic Ideas

Historical Development of Neural Networks • History in brief: • McCulloch-Pitts, 1943: Digital Neurons • Hebb, 1949: Synaptic plasticity • Rosenblatt, 1958: Perceptron • Minsky & Papert, 1969: Perceptron Critique • Kohonen, 1978: Self-Organising Maps • Hopfield, 1982: Associative Memory • Rumelhart & McClelland, 1986: Back-Prop algorithm • Many people, 1985-today: EXPLOSION! History

What is Learning in NN? Def: “Learning is a process by which the free parameters of a neural network are adapted through a process of stimulation by the environment in which the network is embedded. The type of learning is determined by the manner in which the parameter changes take place” [Mendel & McLaren (1970)] Learning

Learning Sequence • The network is stimulated by the environment; • The network undergoes changes in its free parameters as a result of this stimulation; • The network responds in a new way to the environment because of the changes that have occurred in its internal structure. Learning

Learning Criteria • Sum squared error • Mean square error • χ² statistic • Mutual information • Entropy • Other (e.g. Dot product – ‘similarity’) Learning
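Three of the criteria listed above are simple enough to sketch directly. The function names below are assumptions for this example, and targets and outputs are taken to be plain lists of numbers of equal length.

```python
def sum_squared_error(targets, outputs):
    """Sum over all examples of (target - output)^2."""
    return sum((d - y) ** 2 for d, y in zip(targets, outputs))

def mean_squared_error(targets, outputs):
    """Sum squared error divided by the number of examples."""
    return sum_squared_error(targets, outputs) / len(targets)

def dot_similarity(u, v):
    """Dot product of two vectors, used as a 'similarity' measure."""
    return sum(a * b for a, b in zip(u, v))

# Hypothetical targets and network outputs for three examples
sse = sum_squared_error([1.0, 0.0, 1.0], [0.5, 0.5, 1.0])
mse = mean_squared_error([1.0, 0.0, 1.0], [0.5, 0.5, 1.0])
```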

Learning Paradigms • Learning with a teacher (supervised learning) • Learning without a teacher • Reinforcement learning • Unsupervised learning (self-organisation) Learning

Families of Learning Algorithms • Error-based learning • $\Delta w_{kj}(n) = \eta\, e_k(n)\, x_j(n)$ (Delta rule) • Memory-based learning (??) • 1-Nearest Neighbour • K-Nearest Neighbours • Hebbian learning • $\Delta w_{kj}(n) = \eta\, y_k(n)\, x_j(n)$ • $\Delta w_{kj}(n) = F(y_k(n), x_j(n))$ (more general case) • Competitive learning • $\Delta w_{ij}(n) = \eta\,(x_j(n) - w_{ij}(n))$ for the winning neuron $i$ Learning
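The three weight-update rules on this slide can be sketched as small functions. This is a minimal sketch under stated assumptions: weights are plain Python lists, `eta` is the learning rate, and the competitive rule picks its winner as the weight vector closest to the input in Euclidean distance. That winner criterion is a common choice, not something the slide specifies.

```python
def delta_rule(weights, x, error, eta):
    """Error-based learning: w_j <- w_j + eta * e * x_j."""
    return [w + eta * error * xj for w, xj in zip(weights, x)]

def hebbian_rule(weights, x, y, eta):
    """Hebbian learning: w_j <- w_j + eta * y * x_j."""
    return [w + eta * y * xj for w, xj in zip(weights, x)]

def competitive_rule(weight_rows, x, eta):
    """Competitive learning: only the winning neuron moves towards x.

    The winner is assumed to be the neuron whose weight vector is
    closest to x (squared Euclidean distance).
    """
    def sq_dist(row):
        return sum((xj - wj) ** 2 for xj, wj in zip(x, row))

    winner = min(range(len(weight_rows)), key=lambda i: sq_dist(weight_rows[i]))
    updated = [list(row) for row in weight_rows]  # copy; losers stay unchanged
    updated[winner] = [w + eta * (xj - w) for w, xj in zip(weight_rows[winner], x)]
    return winner, updated
```

For example, `competitive_rule([[0.0, 0.0], [1.0, 1.0]], [0.9, 0.8], 0.5)` selects the second neuron and moves its weights halfway towards the input.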

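Families of Learning Algorithms II • Stochastic Networks • Boltzmann learning • $\Delta w_{kj}(n) = \eta\,(\rho_{kj}^{+}(n) - \rho_{kj}^{-}(n))$ • ($\rho_{kj}^{\pm}$ = average correlation of the states of neurons k and j, in the clamped (+) and free-running (−) conditions) Learning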
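The Boltzmann update above needs two average correlations, one per condition. The sketch below only shows how the update is formed from two sets of already-sampled states; it does not perform the stochastic sampling itself. The function names, the ±1 state coding, and the hand-written sample states are assumptions for this example.

```python
def average_correlation(states, k, j):
    """Average of s_k * s_j over a list of sampled network states."""
    return sum(s[k] * s[j] for s in states) / len(states)

def boltzmann_weight_change(clamped_states, free_states, k, j, eta):
    """Return eta * (rho_plus - rho_minus) for the weight between k and j."""
    rho_plus = average_correlation(clamped_states, k, j)
    rho_minus = average_correlation(free_states, k, j)
    return eta * (rho_plus - rho_minus)

# Hypothetical samples of a two-neuron network with states in {-1, +1}
clamped = [[1, 1], [1, 1], [-1, -1], [1, -1]]
free = [[1, -1], [1, 1], [-1, 1], [-1, -1]]
dw = boltzmann_weight_change(clamped, free, 0, 1, eta=0.1)
```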

Learning Tasks • Function approximation • Association • Auto-association • Hetero-association • Pattern recognition • Control • Filtering Learning

Credit Assignment Problem • Def: The problem of assigning credit or blame to the states that lead to useful or harmful outcomes • Temporal Credit Assignment Problem: Find which actions in a period q = [t−T, t] lead to a useful outcome at time t and credit these actions, i.e. • Outcome(t) = f(Actions(q)) • Structural Credit Assignment Problem: Find which states at time t lead to useful actions at time t, i.e. • Actions(t) = g(State(t)) Learning

Statistical Nature of the Learning Process • Assume that a set of examples is given: • Assume that a statistical model of the generating process is given (regression equation): • Where X is a vector random variable (independent variable), D is a scalar random variable (dependent variable) and ε is a random variable with the following properties: Bias / Var
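The equations of this slide were lost in extraction. The block below is a reconstruction in the standard textbook notation for this regression setting, not a copy of the original slide. The symbols $\mathcal{T}$ (training set), $N$ (number of examples) and $\epsilon$ (error term) are assumed names.

```latex
% Set of examples (training set)
\mathcal{T} = \{(\mathbf{x}_i, d_i)\}_{i=1}^{N}

% Regression model of the generating process
D = f(\mathbf{X}) + \epsilon

% Properties of the error term
E[\epsilon \mid \mathbf{x}] = 0
\qquad\text{and}\qquad
E[\epsilon\, f(\mathbf{X})] = 0
```

Under the first property, the regression function is the conditional mean, $f(\mathbf{x}) = E[D \mid \mathbf{x}]$.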

Statistical Nature of the Learning Process II • The first property says that ε has zero mean given any realisation of X • The second property says that ε is uncorrelated with the regression function f(X) (principle of orthogonality) • ε is called the intrinsic error • Assume that the neural network describes an “approximation” to the regression function, which is: Bias / Var
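The approximation referred to at the end of this slide is missing from the extracted text. In the usual notation it would be written as below, where $\mathbf{w}$ is the weight vector of the network and $Y$ its output. This is an assumed reconstruction.

```latex
Y = F(\mathbf{X}, \mathbf{w})
```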

Statistical Nature of the Learning Process III • The weight vector w is obtained by minimising the cost function: • We can re-write this, using expectation operators, as: • … (after some algebra we get) …. Bias / Var
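The three expressions of this slide did not survive extraction. A reconstruction in the standard form is sketched below. It assumes the regression model $D = f(\mathbf{X}) + \epsilon$, the training set $\mathcal{T} = \{(\mathbf{x}_i, d_i)\}_{i=1}^{N}$, and $E_{\mathcal{T}}$ for the average over the training set, with $F(\mathbf{x}, \mathcal{T})$ written for the network $F(\mathbf{x}, \mathbf{w})$ trained on $\mathcal{T}$.

```latex
% Cost function minimised to obtain w
\mathcal{E}(\mathbf{w}) = \frac{1}{2} \sum_{i=1}^{N} \bigl(d_i - F(\mathbf{x}_i, \mathbf{w})\bigr)^2

% The same cost written with an expectation operator
\mathcal{E}(\mathbf{w}) = \frac{1}{2}\, E_{\mathcal{T}}\bigl[(d - F(\mathbf{x}, \mathcal{T}))^2\bigr]

% Substituting d - F = \epsilon + (f(\mathbf{x}) - F): the cross term
% vanishes by the two properties of \epsilon
\mathcal{E}(\mathbf{w}) = \frac{1}{2}\, E_{\mathcal{T}}[\epsilon^2]
  + \frac{1}{2}\, E_{\mathcal{T}}\bigl[(f(\mathbf{x}) - F(\mathbf{x}, \mathcal{T}))^2\bigr]
```

The first term is the variance of the intrinsic error and does not depend on $\mathbf{w}$, so only the second term matters for the minimisation.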

Statistical Nature of the Learning Process IV • Thus to obtain w we need to optimise the function: • … (after some more algebra!) …. Bias / Var
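The expressions of this slide were also lost. In the standard treatment they take the form below. This is a reconstruction under the same assumed notation: $E_{\mathcal{T}}$ is the average over training sets and $F(\mathbf{x}, \mathcal{T})$ is the trained network.

```latex
% Function to optimise
L_{\mathrm{av}}\bigl(f(\mathbf{x}), F(\mathbf{x}, \mathcal{T})\bigr)
  = E_{\mathcal{T}}\bigl[(f(\mathbf{x}) - F(\mathbf{x}, \mathcal{T}))^2\bigr]

% Adding and subtracting E_T[F(x, T)]; the cross term averages to zero
L_{\mathrm{av}}\bigl(f(\mathbf{x}), F(\mathbf{x}, \mathcal{T})\bigr)
  = B^2(\mathbf{w}) + V(\mathbf{w})

% with
B(\mathbf{w}) = E_{\mathcal{T}}[F(\mathbf{x}, \mathcal{T})] - f(\mathbf{x}),
\qquad
V(\mathbf{w}) = E_{\mathcal{T}}\Bigl[\bigl(F(\mathbf{x}, \mathcal{T})
  - E_{\mathcal{T}}[F(\mathbf{x}, \mathcal{T})]\bigr)^2\Bigr]
```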

Statistical Nature of the Learning Process V • B(w) is called bias (or approximation error) • V(w) is called variance (or estimation error) • The last relation shows the bias-variance dilemma: “We cannot minimise both bias and variance at the same time for a finite set T. Only when N → ∞ do both become zero.” • Bias measures the “goodness” of our functional form in approximating the true regression function f(x) • Variance measures the amount of information present in the data set T which is used for estimating F(x,w) Bias / Var
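The split of the average loss into squared bias and variance can be checked numerically. The sketch below is an illustration only. The regression function `true_f`, the noise level, the straight-line model standing in for F(x,w), and the query point are all hypothetical choices, not taken from the course. It draws many training sets T of size n, fits the line to each, and measures bias² and variance of the prediction at one point.

```python
import random

def true_f(x):
    """Hypothetical regression function f(x)."""
    return x * x

def sample_training_set(n, rng, noise_sd=0.1):
    """Draw n examples (x_i, d_i) with d_i = f(x_i) + noise."""
    xs = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    ds = [true_f(x) + rng.gauss(0.0, noise_sd) for x in xs]
    return xs, ds

def fit_line(xs, ds):
    """Least-squares straight line; returns (intercept, slope)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_d = sum(ds) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxd = sum((x - mean_x) * (d - mean_d) for x, d in zip(xs, ds))
    slope = sxd / sxx
    return mean_d - slope * mean_x, slope

def bias_variance_at(x0, n, trials=500, seed=0):
    """Estimate (bias^2, variance, average loss) of the fit at x0."""
    rng = random.Random(seed)
    preds = []
    for _ in range(trials):
        xs, ds = sample_training_set(n, rng)
        intercept, slope = fit_line(xs, ds)
        preds.append(intercept + slope * x0)
    mean_pred = sum(preds) / trials
    bias_sq = (mean_pred - true_f(x0)) ** 2
    variance = sum((p - mean_pred) ** 2 for p in preds) / trials
    avg_loss = sum((p - true_f(x0)) ** 2 for p in preds) / trials
    return bias_sq, variance, avg_loss

b_small, v_small, l_small = bias_variance_at(0.9, n=10)
b_large, v_large, l_large = bias_variance_at(0.9, n=200)
```

In this sketch the variance term shrinks as the training set grows, while the bias term reflects how well the assumed straight-line form can match f at the chosen point.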

Comments I • We should distinguish Artificial NNs from bio-physical neural models (e.g. Blue Brain Project); • Some NNs are Universal Approximators, e.g. feed-forward models are based on the Kolmogorov Theorem • Can be combined with other methods, e.g. Neuro-Fuzzy Systems • Flexible modeling tools for: • Function approximation • Pattern Classification • Association • Other Conclusions

Comments II • Advantages: • Distributed representation allows co-activation of categories • Graceful degradation • Robustness to noise • Automatic generalisation (of categories, etc) Conclusions

Comments III • Disadvantages: • They cannot explain their function due to distributed representations • We cannot add existing knowledge to neural networks as rules • We cannot extract rules • Network parameters found by trial and error (in general case) Conclusions
