Today’s Lecture
  • Administrative Details
  • Learning
    • Decision trees: cleanup & details
    • Belief nets
    • Sub-symbolic learning
      • Neural networks

Administrativia
  • Assignment 3 now available. Due in 1 week.

<http://www.cim.mcgill.ca/~dudek/424/game.pdf>

  • The game for the final project has been defined. Its description can be found via the course home page at: <http://www.cim.mcgill.ca/~dudek/424/game.pdf>
  • Examine the game now, try it, see what good strategies might be.
  • Note: the normal late policy will not apply to the project.
    • You **must** submit the electronic (executable) version on time, or it may not be evaluated (i.e. you get zero)!
    • It must run on Linux. Be certain to compile and test it on one of the Linux machines in the lab well before it’s due.
      • If you are developing on another platform, regularly test on Linux during development.

ID3 (more…)
  • Last class, we discussed using entropy to select a question when building a decision tree. This idea was first developed by Quinlan (1979) in the ID3 system and later improved, resulting in C4.5.
  • Recap:
    • Entropy for classification into positive and negative classes with proportions p+ and p- is I(p+,p-)
    • Consider information gain per attribute A.
      • For each subtree Ai we have a number of bits given by the distribution of cases on the subtree, I(p_i+, p_i-). To fully classify all sub-cases, we need Remainder(A) = ∑_i weight(i) I(p_i+, p_i-)
      • Thus, info gain is what’s left:

Gain = I(p+,p-) - Remainder(A)
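As a concrete illustration (not from the slides), here is a small Python sketch of this gain computation; the helper names and the example counts are made up for illustration.

```python
import math

def entropy(p_pos, p_neg):
    """I(p+, p-): bits needed to classify a case drawn from this distribution."""
    return -sum(p * math.log2(p) for p in (p_pos, p_neg) if p > 0)

def info_gain(pos, neg, subsets):
    """Gain = I(p+, p-) - Remainder(A), where `subsets` holds (pos_i, neg_i)
    counts for each value of attribute A and weight(i) is the fraction of
    cases landing in subset i."""
    total = pos + neg
    before = entropy(pos / total, neg / total)
    remainder = sum((p + n) / total * entropy(p / (p + n), n / (p + n))
                    for p, n in subsets if p + n > 0)
    return before - remainder

# Example: 6+/6- cases; attribute A splits them as (4+,0-), (2+,2-), (0+,4-).
print(info_gain(6, 6, [(4, 0), (2, 2), (0, 4)]))   # about 0.667 bits gained
```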

Final thoughts on entropy...
  • Provoked by the seminar yesterday.
  • Idea:
    • Entropy tells you about unpredictability, or randomness.
    • When selecting a question, the one with highest entropy will carry the most information with respect to what you knew, because the answer is hardest to predict.
    • When asking how much we know about something, high entropy is bad
    • Consider the PDF of a robot’s position estimate…

More entropy means more uncertainty.

Training & testing
  • When constructing a learning system, we want to generalize to new examples. (already discussed ad nauseam)
  • How can we tell if it’s working?
    • Look at how well we do with our training data.
      • But…. What if we just learned a “quirk” of the data?
      • Overfitting? Bad features?
        • (tank classification example; table lookup)
    • Look at how we do on a set of examples never used for any training: a “test set”.
      • But… what if we can’t afford the data?
    • Cross validation: learner L on training set X

e(L; X) = ∑_{i in X} error(L(S) on case i)^2, where S = X - {i}
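A minimal leave-one-out sketch of this formula in Python; `train` and `error` are hypothetical stand-ins for the learner L and the per-case error, and the toy mean-predictor usage is invented.

```python
def loo_error(train, dataset, error):
    """e(L; X) = sum over i in X of error(L(S) on case i)^2, with S = X - {i}.

    `train` builds a model from a list of cases; `error` scores that model
    on one held-out case."""
    total = 0.0
    for i, case in enumerate(dataset):
        held_in = dataset[:i] + dataset[i + 1:]   # S = X - {i}
        total += error(train(held_in), case) ** 2
    return total

# Toy usage: "learn" the mean of the training cases, score by the residual.
data = [1.0, 2.0, 3.0, 10.0]
print(loo_error(lambda s: sum(s) / len(s), data, lambda m, c: c - m))
```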

Simple functions?

Is there a fixed circuit network topology that can be used to represent a family of functions?

Yes! Neural-like networks (a.k.a. artificial neural networks) allow us this flexibility and more; we can represent arbitrary families of continuous functions using fixed topology networks.

Belief networks (ch. 15) - briefly
  • We will cover only R&N Section 15.1 & 15.2 (briefly), and then segue to chapter 19. You should read 15.3. This will be cursory coverage only.
  • A belief net is a formalism for describing probabilistic relationships (a.k.a. Bayes nets).
  • A graph (in fact a DAG) G(V,E). Nodes are random variables. Directed edges indicate a node has direct influence on another node.
  • Each node has an associated conditional probability table quantifying the effects of its “parents” (a small sketch follows this list).
  • No directed cycles.
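A tiny two-node example in Python of how a node’s conditional probability table combines with its parent’s along the DAG; the Rain/Umbrella variables and all the numbers are invented for illustration.

```python
# Two nodes: Rain, and Umbrella with Rain as its only parent.
# Each node carries P(node | parents); the probabilities here are made up.
P_rain = {True: 0.2, False: 0.8}
P_umbrella = {          # CPT for Umbrella, indexed by the value of its parent Rain
    True:  {True: 0.9, False: 0.1},
    False: {True: 0.05, False: 0.95},
}

def joint(rain, umbrella):
    """P(Rain, Umbrella) via the chain rule over the DAG."""
    return P_rain[rain] * P_umbrella[rain][umbrella]

# Query P(Rain=True | Umbrella=True): evidence variable Umbrella, query variable Rain.
p_evidence = joint(True, True) + joint(False, True)
print(joint(True, True) / p_evidence)    # roughly 0.82
```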

Why?
  • Objective:
    • Compute probabilities of variables of interest, query variables
    • Given observations of specific phenomena in the world, evidence variables.
  • The net you get depends on how you construct it, not just the problem and probabilities.
  • Seek compactness (fewer links, tighter clusters): called locally structured nets.

See overheads...


Not in text

Issues
  • Where do the probabilities come from?
    • They can be learned or inferred from data.
  • Where does the causal structure come from (the topology)?
    • It’s (sometimes) very hard to learn.
    • Problem: lots of alternative topologies are possible. What’s really cause and what’s effect?
      • Did it really rain because I brought my umbrella? Can a computer infer this (or the opposite) just from weather data?
  • Both these topics are current research areas.

Neural Networks?

Artificial Neural Nets
a.k.a. Connectionist Nets (connectionist learning)
a.k.a. Sub-symbolic learning
a.k.a. Perceptron learning (a special case)

Networks that model the brain?
  • Note there is an interesting connection to Bayes nets: it isn’t considered in the book.
    • Something to reflect on.
  • Idea: model intelligence without “jumping ahead” to symbolic representations.
  • Related to the earliest work on cybernetics.

The idealized neuron
  • Artificial neural networks come in several “flavors”.
    • Most are based on a simplified model of a neuron.
  • A set of (many) inputs.
  • One output.
  • Output is a function of the sum of the inputs.
    • Typical functions (a small sketch follows this list):
      • Weighted sum
      • Threshold
      • Gaussian
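A rough Python sketch of one such idealized neuron; the three output functions and the sample weights are illustrative choices, not a prescribed design.

```python
import math

def neuron_output(inputs, weights, kind="threshold"):
    """One idealized neuron: many inputs, one output, output a function of the weighted sum."""
    s = sum(x * w for x, w in zip(inputs, weights))
    if kind == "weighted_sum":      # pass the sum through unchanged
        return s
    if kind == "threshold":         # fire (1) only when the sum is positive
        return 1 if s > 0 else 0
    if kind == "gaussian":          # bell-shaped response, largest when the sum is near 0
        return math.exp(-s * s)
    raise ValueError(kind)

print(neuron_output([1, 0, 1], [0.5, -0.3, 0.2]))   # threshold unit fires: 1
```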


Not in text

Why neural nets?
  • Motives:
    • We wish to create systems with abilities akin to those of the human mind.
        • The mind is usually assumed to be a direct consequence of the structure of the brain.
        • Let’s mimic the structure of the brain!
    • By using simple computing elements, we obtain a system that might scale up easily to parallel hardware.
    • Avoids (or solves?) the key unresolved problem of how to get from “signal domain” to symbolic representations.
    • Fault tolerance

[Two slides not captured in the transcript (marked “Not in text”)]

Real and fake neurons
  • Signals in neurons are coded by “spike rate”.
  • In ANN’s, inputs can be either:
    • 0 or 1 (binary)
    • [0,1]
    • [-1,1]
    • R (real)
  • Each input I_i has an associated real-valued weight w_i
  • Learning by changing weights at synapses.

[Slide not captured in the transcript (marked “Not in text”)]

Brains
  • The brain seems divided into functional areas
  • These are often seen as analogous to modules in a software system.
  • Why would it be like this? (2 possible answers)
    • Evolution: incremental improvement easier in a modular system.
    • Advantage of combining complementary solutions.
    • It isn’t!


Not in text

Inductive bias?
  • Where’s the inductive bias?
    • In the topology and architecture of the network.
    • In the learning rules.
    • In the input and output representation.
    • In the initial weights.


Not in text

Simple neural models
  • The oldest ANN model is the McCulloch-Pitts neuron [1943].
    • Inputs are +1 or -1 with real-valued weights.
    • If sum of weighted inputs is > 0, then the neuron “fires” and gives +1 as an output.
    • Showed you can compute logical functions (see the sketch after this list).
    • Relation to learning proposed (later!) by Donald Hebb [1949].
  • Perceptron model [Rosenblatt, 1958].
    • Single-layer network with same kind of neuron.
      • Firing when input is above a threshold: ∑ x_i w_i > t.
    • Added a learning rule to allow weight selection.
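A small Python sketch of a McCulloch-Pitts unit computing logical functions with hand-picked weights; the constant +1 “bias” input used for OR is a common convention, not something stated on the slides.

```python
def mp_fires(inputs, weights):
    """McCulloch-Pitts unit: inputs are +1/-1; fires (+1) iff the weighted sum is > 0."""
    return 1 if sum(x * w for x, w in zip(inputs, weights)) > 0 else -1

# AND over {+1, -1}: weights picked by hand (no learning rule yet).
for a in (1, -1):
    for b in (1, -1):
        print(a, b, mp_fires([a, b], [1.0, 1.0]))          # +1 only when both inputs are +1

# OR: feed in a constant +1 "bias" input with its own weight.
for a in (1, -1):
    for b in (1, -1):
        print(a, b, mp_fires([a, b, 1], [1.0, 1.0, 1.0]))  # -1 only when both inputs are -1
```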

Perceptron nets

Perceptron learning
  • Perceptron learning:
    • Have a set of training examples (TS) encoded as input values (i.e. in the form of binary vectors)
    • Have a set of desired output values associated with these inputs.
      • This is supervised learning.
    • Problem: how to adjust the weights to make the actual outputs match the training examples.
      • NOTE: we do not allow the topology to change! [You should be thinking of a question here.]
  • Intuition: when a perceptron makes a mistake, its weights are wrong.
    • Modify them to make the output bigger or smaller, as desired.

Learning algorithm
  • Desired output T_i; actual output O_i.
  • Weight update formula (weight from unit j to i):

W_j,i = W_j,i + k * x_j * (T_i - O_i)

where k is the learning rate.

  • If the examples can be learned (encoded), then the perceptron learning rule will find the weights.
    • How?

Gradient descent.

Key thing to prove is the absence of local minima.
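A minimal Python sketch of this update loop under the threshold firing rule; the learning rate, epoch count, and OR training set are illustrative assumptions.

```python
def train_perceptron(examples, n_inputs, k=0.1, epochs=100):
    """Repeatedly apply W_j <- W_j + k * x_j * (T - O) over the training set.

    `examples` is a list of (input_vector, target) pairs with 0/1 targets;
    the learning rate k and epoch count are arbitrary illustrative choices."""
    w = [0.0] * n_inputs
    for _ in range(epochs):
        for xs, target in examples:
            output = 1 if sum(x * wi for x, wi in zip(xs, w)) > 0 else 0   # threshold unit
            for j in range(n_inputs):
                w[j] += k * xs[j] * (target - output)
    return w

# Logical OR is linearly separable, so the rule finds workable weights.
# The third input is a constant 1 acting as a bias.
or_examples = [([0, 0, 1], 0), ([0, 1, 1], 1), ([1, 0, 1], 1), ([1, 1, 1], 1)]
print(train_perceptron(or_examples, n_inputs=3))
```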

Perceptrons: what can they learn?
  • Only linearly separable functions [Minsky & Papert 1969].
  • In N dimensions, the separating boundary is a hyperplane (of dimension N-1).

More general networks
  • Generalize in 3 ways:
    • Allow continuous output values [0,1]
    • Allow multiple layers.
      • This is key to learning a larger class of functions.
    • Allow a more complicated function than thresholded summation

[why??]

Generalize the learning rule to accommodate this: let’s see how it works.

The threshold
  • The key variant:
    • Change threshold into a differentiable function
    • Sigmoid, known as a “soft non-linearity” (silly).

M = ∑ x_i w_i

O = 1 / (1 + e^(-kM))
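A one-unit Python sketch of this soft threshold; the gain k and the example weights are arbitrary.

```python
import math

def sigmoid_unit(inputs, weights, k=1.0):
    """Soft threshold: M = sum of x_i * w_i, then O = 1 / (1 + exp(-k * M))."""
    m = sum(x * w for x, w in zip(inputs, weights))
    return 1.0 / (1.0 + math.exp(-k * m))

# The output now varies smoothly with the weights, so it can be differentiated for learning.
print(sigmoid_unit([1, 1], [2.0, -0.5]))   # about 0.82
```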
