
Today’s Lecture


- Administrative Details
- Learning
- Decision trees: cleanup & details
- Belief nets
- Sub-symbolic learning
- Neural networks

CS-424 Gregory Dudek

- Assignment 3 now available. Due in 1 week.
<http://www.cim.mcgill.ca/~dudek/424/game.pdf>

- The game for the final project has been defined. Its description can be found via the course home page at: <http://www.cim.mcgill.ca/~dudek/424/game.pdf>
- Examine the game now, try it, see what good strategies might be.
- Note: the normal late policy will not apply to the project.
- You **must** submit the electronic version (the executable) on time, or it may not be evaluated (i.e. you get zero)!
- It must run on Linux. Be certain to compile and test it on one of the Linux machines in the lab well before it’s due.
- If you are developing on another platform, test regularly on Linux during development.


- Last class, we discussed using entropy to select a question for building a decision tree. This idea was first developed by Quinlan (1979) in the ID3 system and later improved, resulting in C4.5.
- Recap:
- Entropy for classification into sets $p^+$ and $p^-$ is $I(p^+, p^-)$.
- Consider information gain per attribute A.
- For each subtree $A_i$ we have a few bits given by the distribution of cases on the subtree, $I(p_i^+, p_i^-)$. To fully classify all sub-cases, we need
$\mathrm{Remainder}(A) = \sum_i \mathrm{weight}(i)\, I(p_i^+, p_i^-)$
- Thus, info gain is what’s left:
$\mathrm{Gain} = I(p^+, p^-) - \mathrm{Remainder}(A)$
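
The recap above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `weight(i)` is taken to be the fraction of cases that fall into subtree i (the usual choice, but not spelled out on the slide), and the function names `entropy` and `information_gain` are made up for this example.

```python
import math

def entropy(pos, neg):
    """I(p+, p-) in bits for a set with `pos` positive and `neg` negative cases."""
    total = pos + neg
    bits = 0.0
    for count in (pos, neg):
        if count > 0:  # a zero count contributes no bits
            p = count / total
            bits -= p * math.log2(p)
    return bits

def information_gain(pos, neg, subsets):
    """Gain = I(p+, p-) - Remainder(A).

    `subsets` holds one (pos_i, neg_i) pair per value of attribute A.
    Assumption: weight(i) is the fraction of all cases in subtree i.
    """
    total = pos + neg
    remainder = sum((p_i + n_i) / total * entropy(p_i, n_i)
                    for p_i, n_i in subsets)
    return entropy(pos, neg) - remainder

# An attribute that splits 4+/4- perfectly gains the full 1 bit;
# one that leaves each subtree at 2+/2- gains nothing.
perfect = information_gain(4, 4, [(4, 0), (0, 4)])
useless = information_gain(4, 4, [(2, 2), (2, 2)])
```

The attribute with the largest gain would be the question asked at that node of the tree.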


- Provoked by the seminar yesterday.
- Idea:
- Entropy tells you about unpredictability, or randomness.
- When selecting a question, the one with highest entropy will carry the most information with respect to what you knew, because the answer is hardest to predict.
- When asking whether we know about something, high entropy is bad.
- Consider the PDF of a robot’s position estimate…
More entropy means more uncertainty.


- When constructing a learning system, we want to generalize to new examples. (already discussed ad nauseam)
- How can we tell if it’s working?
- Look at how well we do with our training data.
- But… what if we just learned a “quirk” of the data?
- Overfitting? Bad features?
- (tank classification example; table lookup)

- Look at how we do on a set of examples never used for any training: a “test set”.
- But… what if we can’t afford the data?

- Cross validation: learner L on training set X
$e(L:X) = \sum_{i \in X} \left[\mathrm{error}(L(S) \text{ on case } i)\right]^2$, where $S = X - \{i\}$
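
A minimal sketch of this leave-one-out scheme follows. The learner here (`learn_mean`, which always predicts the mean training label) is a hypothetical stand-in chosen only to keep the example short; any function that maps a training set to a predictor would fit the same loop.

```python
def leave_one_out_error(learn, cases):
    """Sum of squared errors, training on S = X - {i} and testing on case i.

    `learn` maps a list of (x, y) cases to a predictor function.
    """
    total = 0.0
    for i, (x, y) in enumerate(cases):
        held_in = cases[:i] + cases[i + 1:]   # S = X - {i}
        predictor = learn(held_in)            # L(S)
        total += (predictor(x) - y) ** 2      # squared error on case i
    return total

def learn_mean(cases):
    """Hypothetical learner: ignore x and predict the mean training label."""
    mean = sum(y for _, y in cases) / len(cases)
    return lambda x: mean

data = [(0, 1.0), (1, 3.0)]
loo = leave_one_out_error(learn_mean, data)
```

Every case is used for testing exactly once and never appears in the training set that produced its prediction, which is the point when data is too scarce for a separate test set.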


Is there a fixed circuit network topology that can be used to represent a family of functions?

Yes! Neural-like networks (a.k.a. artificial neural networks) allow us this flexibility and more; we can represent arbitrary families of continuous functions using fixed topology networks.


- We will cover only R&N Section 15.1 & 15.2 (briefly), and then segue to chapter 19. You should read 15.3. This will be cursory coverage only.
- A belief net (a.k.a. Bayes net) is a formalism for describing probabilistic relationships.
- A graph (in fact a DAG) G(V,E). Nodes are random variables. Directed edges indicate a node has direct influence on another node.
- Each node has an associated conditional probability table quantifying the effects of its “parents”.
- No directed cycles.


- Objective:
- Compute probabilities of variables of interest, query variables
- Given observations of specific phenomena in the world, evidence variables.
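
As a rough illustration of computing a query variable given evidence, the sketch below builds a two-node net (Rain → Umbrella) and answers a query by summing the joint distribution over all consistent assignments. The network, the variable names, and every probability are invented for this example; enumeration is only the simplest possible inference method, not necessarily the one used in the course.

```python
from itertools import product

# Hypothetical net: Rain -> Umbrella. All numbers are made up.
ORDER = ["Rain", "Umbrella"]                      # parents before children
PARENTS = {"Rain": (), "Umbrella": ("Rain",)}
CPT = {                                           # P(var = True | parent values)
    "Rain": {(): 0.2},
    "Umbrella": {(True,): 0.9, (False,): 0.1},
}

def joint(assignment):
    """Joint probability: product of each node's CPT entry given its parents."""
    p = 1.0
    for var in ORDER:
        parent_values = tuple(assignment[q] for q in PARENTS[var])
        p_true = CPT[var][parent_values]
        p *= p_true if assignment[var] else 1.0 - p_true
    return p

def query(var, evidence):
    """P(var = True | evidence), by enumerating all full assignments."""
    totals = {True: 0.0, False: 0.0}
    for values in product([True, False], repeat=len(ORDER)):
        assignment = dict(zip(ORDER, values))
        if all(assignment[k] == v for k, v in evidence.items()):
            totals[assignment[var]] += joint(assignment)
    return totals[True] / (totals[True] + totals[False])

p_rain_given_umbrella = query("Rain", {"Umbrella": True})
```

Enumeration touches every assignment, so its cost grows exponentially with the number of variables; it is shown here only because it makes the roles of query and evidence variables explicit.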

- The net you get depends on how you construct it, not just the problem and probabilities.
- Seek compactness (fewer links, tighter clusters): called locally structured nets.


- See overheads...


Not in text

- Where do the probabilities come from?
- They can be learned or inferred from data.

- Where does the causal structure come from (the topology)?
- It’s (sometimes) very hard to learn.
- Problem: lots of alternative topologies are possible. What’s really cause and what’s effect?
- Did it really rain because I brought my umbrella? Can a computer infer this (or the opposite) just from weather data?

- Both these topics are current research areas.


Artificial Neural Nets

a.k.a.

Connectionist Nets

(connectionist learning)

a.k.a.

Sub-symbolic learning

a.k.a.

Perceptron learning (a special case)


- Note there is an interesting connection to Bayes nets: it isn’t considered in the book.
- Something to reflect on.

- Idea: model intelligence without “jumping ahead” to symbolic representations.
- Related to earliest work on cybernetics.


- Artificial neural networks come in several “flavors”.
- Most are based on a simplified model of a neuron.

- A set of (many) inputs.
- One output.
- Output is a function of the sum of the inputs.
- Typical functions:
- Weighted sum
- Threshold
- Gaussian


Not in text

- Motives:
- We wish to create systems with abilities akin to those of the human mind.
- The mind is usually assumed to be a direct consequence of the structure of the brain.
- Let’s mimic the structure of the brain!

- By using simple computing elements, we obtain a system that might scale up easily to parallel hardware.
- Avoids (or solves?) the key unresolved problem of how to get from “signal domain” to symbolic representations.
- Fault tolerance


- Signals in neurons are coded by “spike rate”.
- In ANNs, inputs can be either:
- 0 or 1 (binary)
- [0,1]
- [-1,1]
- R (real)

- Each input $I_i$ has an associated real-valued weight $w_i$.
- Learning by changing weights at synapses.


- The brain seems divided into functional areas
- These are often seen as analogous to modules in a software system.
- Why would it be like this? (2 possible answers)
- Evolution: incremental improvement easier in a modular system.
- Advantage of combining complementary solutions.
- It isn’t!


Not in text

- Where’s the inductive bias?
- In the topology and architecture of the network.
- In the learning rules.
- In the input and output representation.
- In the initial weights.


Not in text

- The oldest ANN model is the McCulloch-Pitts neuron [1943].
- Inputs are +1 or -1 with real-valued weights.
- If sum of weighted inputs is > 0, then the neuron “fires” and gives +1 as an output.
- Showed you can compute logical functions.
- Relation to learning proposed (later!) by Donald Hebb [1949].
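
To make the "logical functions" claim concrete, here is a small sketch of such a neuron computing AND, OR, and NOT on ±1 inputs. Two details are assumptions not stated on the slide: a non-firing neuron is taken to output -1, and a bias is supplied as an extra input fixed at +1. The weights were picked by hand.

```python
def mp_neuron(inputs, weights):
    """Fire (+1) if the weighted input sum is > 0; otherwise output -1 (assumed)."""
    total = sum(x * w for x, w in zip(inputs, weights))
    return 1 if total > 0 else -1

def logical_and(a, b):
    # Third input is a constant +1 acting as a bias (hypothetical choice).
    return mp_neuron([a, b, 1], [1, 1, -1])

def logical_or(a, b):
    return mp_neuron([a, b, 1], [1, 1, 1])

def logical_not(a):
    return mp_neuron([a], [-1])

# Truth tables over the +1 / -1 encoding of true / false.
and_table = {(a, b): logical_and(a, b) for a in (1, -1) for b in (1, -1)}
or_table = {(a, b): logical_or(a, b) for a in (1, -1) for b in (1, -1)}
```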

- Perceptron model [Rosenblatt, 1958].
- Single-layer network with same kind of neuron.
- Firing when input is above a threshold: $\sum_i x_i w_i > t$.

- Added a learning rule to allow weight selection.


- Perceptron learning:
- Have a set of training examples (TS) encoded as input values (i.e., in the form of binary vectors).
- Have a set of desired output values associated with these inputs.
- This is supervised learning.

- Problem: how to adjust the weights to make the actual outputs match the training examples.
- NOTE: we do not allow the topology to change! [You should be thinking of a question here.]

- Intuition: when a perceptron makes a mistake, its weights are wrong.
- Modify them to make the output bigger or smaller, as desired.


- Desired output $T_i$; actual output $O_i$.
- Weight update formula (weight from unit j to i):
$W_{j,i} \leftarrow W_{j,i} + k \, x_j \, (T_i - O_i)$

Where k is the learning rate.

- If the examples can be learned (encoded), then the perceptron learning rule will find the weights.
- How?
Gradient descent.

Key thing to prove is the absence of local minima.
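
The update rule can be tried on a small learnable example. The sketch below trains a single output unit on logical AND. Assumptions made for illustration: the threshold is folded into the weights by adding a constant input of 1, outputs are coded as 0/1, training starts from zero weights, and the `epochs` cap and the names `predict` and `train_perceptron` are invented here.

```python
def predict(weights, x):
    """Threshold unit: output 1 if the weighted sum is > 0, else 0."""
    return 1 if sum(w * xi for w, xi in zip(weights, x)) > 0 else 0

def train_perceptron(examples, k=0.5, epochs=100):
    """Apply W_j <- W_j + k * x_j * (T - O) until a full pass makes no mistakes."""
    n = len(examples[0][0])
    weights = [0.0] * n
    for _ in range(epochs):
        mistakes = 0
        for x, target in examples:
            out = predict(weights, x)
            if out != target:
                mistakes += 1
                for j in range(n):
                    weights[j] += k * x[j] * (target - out)
        if mistakes == 0:   # every training example is now classified correctly
            break
    return weights

# Logical AND; the first component of each input is the constant bias input 1.
AND_EXAMPLES = [((1, 0, 0), 0), ((1, 0, 1), 0), ((1, 1, 0), 0), ((1, 1, 1), 1)]
learned = train_perceptron(AND_EXAMPLES)
outputs = [predict(learned, x) for x, _ in AND_EXAMPLES]
```

Note that the weights change only when the output is wrong, since `(T - O)` is zero otherwise, and the sign of the change pushes the weighted sum in the desired direction.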


- Perceptrons can represent only linearly separable functions [Minsky & Papert 1969].
- In N dimensions, the decision boundary is a hyperplane (of dimension N-1).
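
The limitation can be illustrated (not proved) with a brute-force search: the sketch below tries every weight triple on a small grid and checks whether a single threshold unit reproduces a given truth table. The grid, the 0/1 input coding, and the function name are assumptions for this example. AND is found on the grid, while XOR is not, consistent with XOR not being linearly separable.

```python
from itertools import product

def separable_on_grid(truth_table, grid):
    """True if some (w0, w1, w2) on the grid makes w0 + w1*x1 + w2*x2 > 0
    exactly for the inputs whose target is 1."""
    for w0, w1, w2 in product(grid, repeat=3):
        if all((w0 + w1 * x1 + w2 * x2 > 0) == bool(target)
               for (x1, x2), target in truth_table.items()):
            return True
    return False

GRID = [i / 2 for i in range(-4, 5)]          # -2.0, -1.5, ..., 2.0
AND = {(0, 0): 0, (0, 1): 0, (1, 0): 0, (1, 1): 1}
XOR = {(0, 0): 0, (0, 1): 1, (1, 0): 1, (1, 1): 0}

and_found = separable_on_grid(AND, GRID)
xor_found = separable_on_grid(XOR, GRID)
```

A failed grid search alone would not rule out weights off the grid; for XOR the impossibility follows from the fact that no single line separates its two classes.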


- Generalize in 3 ways:
- Allow continuous output values [0,1]
- Allow multiple layers.
- This is key to learning a larger class of functions.

- Allow a more complicated function than thresholded summation
[why??]

Generalize the learning rule to accommodate this: let’s see how it works.


- The key variant:
- Change threshold into a differentiable function
- Sigmoid, known as a “soft non-linearity” (silly).
$M = \sum_i x_i w_i$

$O = \dfrac{1}{1 + e^{-kM}}$
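
A short sketch of the sigmoid unit follows, with M and k as in the formulas above. The function names are invented for this example. The slope function uses the standard identity dO/dM = k·O·(1 − O), which is what makes the unit convenient for gradient-based weight updates.

```python
import math

def sigmoid_unit(inputs, weights, k=1.0):
    """O = 1 / (1 + exp(-k * M)), where M is the weighted input sum."""
    m = sum(x * w for x, w in zip(inputs, weights))
    return 1.0 / (1.0 + math.exp(-k * m))

def sigmoid_slope(output, k=1.0):
    """dO/dM expressed through the output itself: k * O * (1 - O)."""
    return k * output * (1.0 - output)

midpoint = sigmoid_unit([1.0, 1.0], [0.5, -0.5])   # M = 0
steep = sigmoid_unit([1.0], [1.0], k=50.0)         # large k approaches a hard threshold
```

With M = 0 the output sits exactly halfway between the two extremes, and increasing k makes the curve approach the original hard threshold while staying differentiable.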
