Today s lecture
1 / 27

Today’s Lecture - PowerPoint PPT Presentation

  • Uploaded on

Today’s Lecture. Administrative Details Learning Decision trees: cleanup & details Belief nets Sub-symbolic learning Neural networks. Administrativia. Assignment 3 now available. Due in 1 week. <>

I am the owner, or an agent authorized to act on behalf of the owner, of the copyrighted work described.
Download Presentation

PowerPoint Slideshow about ' Today’s Lecture' - zeus-hawkins

An Image/Link below is provided (as is) to download presentation

Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author.While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server.

- - - - - - - - - - - - - - - - - - - - - - - - - - E N D - - - - - - - - - - - - - - - - - - - - - - - - - -
Presentation Transcript
Today s lecture
Today’s Lecture

  • Administrative Details

  • Learning

    • Decision trees: cleanup & details

    • Belief nets

    • Sub-symbolic learning

      • Neural networks

CS-424 Gregory Dudek


  • Assignment 3 now available. Due in 1 week.


  • The game for the final project has been defined. It’s description can be found via the course home page at:<>

  • Examine the game now, try it, see what good strategies might be.

  • Note: the normal late policy will not apply to the project.

    • You **must** submit the electronic (executable) on time, or it may not be evaluated (i.e. you get zero)!

    • It must run on LINUX. Be certain to compile and test it on one of the linux machines in the lab well before it’s due.

      • If you are developing on another platform, regularly test on linux during development.

CS-424 Gregory Dudek

Id3 more
ID3 (more…)

  • Last class, discussed using entropy to select a question for building a decision tree. This idea first developed by Quinlan (1979) in the ID3 system, later improved resulting in C4.5

  • Recap:

    • Entropy for classification into sets p+ and p- is I(p+,p-)

    • Consider information gain per attribute A.

      • For each subtree Ai we have a few bits given by the distribution of cases on the subtree I(pi+,pi-). To fully classify all sub-cases, we need Remainder(A) = i weight(i) I(pi+,pi-)

      • Thus, info gain is what’s left:

        Gain = I(p+,p-) - Remainder(A)

CS-424 Gregory Dudek

Final thoughts on entropy
Final thoughts on entropy...

  • Provoked by the seminar yesterday.

  • Idea:

    • Entropy tells you about unpredictability, or randomness.

    • When selecting a question, the one with highest entropy will carry the most information with respect to what you knew, because the answer is hardest to predict.

    • When asking if we know about something, then high entropy is bad

    • Consider the PDF of a robot’s position estimate…

      More entropy means more uncertainty.

CS-424 Gregory Dudek

Training testing
Training & testing

  • When constructing a learning system, we what to generalize to new example. (already discussed ad nauseam)

  • How can we tell if it’s working?

    • Look at how well we do with our training data.

      • But…. What if we just learned a “quirk” of the data?

      • Overfitting? Bad features?

        • (tank classification example; table lookup)

    • Look at how we do on a set of examples never used for any training: a “test set”.

      • But… what if we can’t afford the data?

    • Cross validation: learner L on training set X

      e(L:X) = i in XS=X-i error(L(S) on case i)2

CS-424 Gregory Dudek

Simple functions
Simple functions?

Is there a fixed circuit network topology that can be used to represent

a family of functions?

Yes! Neural-like networks (a.k.a. artificial neural networks) allow us this flexibility and more; we can represent arbitrary families of continuous functions using fixed topology networks.

CS-424 Gregory Dudek

Belief networks ch 15 briefly
Belief networks (ch. 15) - briefly

  • We will cover only R&N Section 15.1 & 15.2 (briefly), and then segue to chapter 19. You should read 15.3. This will be cursory coverage only.

  • A belief net is a formalism for describing probabilistic relationships (a.k.a. Bayes nets).

  • A graph (in fact a DAG) G(V,E). Nodes are random variables. Directed edges indicate a node has direct influence on another node.

  • Each node has an associated conditional probability table quantifying the effects of it’s “parents”.

  • No directed cycles.

CS-424 Gregory Dudek


  • Objective:

    • Compute probabilities of variables of interest, query variables

    • Given observations of specific phenomena in the world, evidence variables.

  • The net you get depends on how you construct it, not just the problem and probabilities.

  • Seek compactness (fewer links, tighter clusters): called locally structured nets.

CS-424 Gregory Dudek

CS-424 Gregory Dudek


Not in text


  • Where do the probabilities come from?

    • They can be learned or inferred from data.

  • Where does the causal structure come from (the topology)?

    • It’s (sometimes) very hard to learn.

    • Problem: lots of alternative topologies are possible. What’s really cause and what’s effect?

      • Did it really rain because I brought my umbrella? Can a computer infer this (or the opposite) just from weather data?

  • Both these topics are current research areas.

CS-424 Gregory Dudek

Neural networks
Neural Networks?

Artificial Neural Nets


Connectionist Nets

(connectionist learning)


Sub-symbolic learning


Perceptron learning (a special case)

CS-424 Gregory Dudek

Networks that model the brain
Networks that model the brain?

  • Note there is an interesting connection to Bayes nets: it isn’t considered in the book.

    • Something to reflect on.

  • Idea: model intelligence withour “jumping ahead” to symbolic representations.

  • Related to earliest work on cybernetics.

CS-424 Gregory Dudek

The idealized neuron
The idealized neuron

  • Artificial neural networks come in several “flavors”.

    • Most of based on a simplified model of a neuron.

  • A set of (many) inputs.

  • One output.

  • Output is a function of the sum on the inputs.

    • Typical functions:

      • Weighted sum

      • Threshold

      • Gaussian

CS-424 Gregory Dudek

Why neural nets

Not in text

Why neural nets?

  • Motives:

    • We wish to create systems with abilities akin to those of the human mind.

      • The mind is usually assumed to be be a direct consequence of the structure of the brain.

        • Let’s mimic the structure of the brain!

    • By using simple computing elements, we obtain a system that might scale up easily to parallel hardware.

    • Avoids (or solves?) the key unresolved problem of how to get from “signal domain” to symbolic representations.

    • Fault tolerance

CS-424 Gregory Dudek

Not in text

CS-424 Gregory Dudek

Not in text

CS-424 Gregory Dudek

Real and fake neurons
Real and fake neurons

  • Signals in neurons are coded by “spike rate”.

  • In ANN’s, inputs can be either:

    • 0 or 1 (binary)

    • [0,1]

    • [-1,1]

    • R (real)

  • Each input Ii has an associated real-valued weight wi

  • Learning by changing weights at synapses.

CS-424 Gregory Dudek

Not in text

CS-424 Gregory Dudek


  • The brain seems divided into functional areas

  • These are often seen as analogous to modules in a software system.

  • Why would it be like this? (2 possible answers)

    • Evolution: incremental improvement easier in a modular system.

    • Advantage of combining complementary solutions.

    • It isn’t!

CS-424 Gregory Dudek

Inductive bias

Not in text

Inductive bias?

  • Where’s the inductive bias?

    • In the topology and architecture of the network.

    • In the learning rules.

    • In the input and output representation.

    • In the initial weights.

CS-424 Gregory Dudek

Simple neural models

Not in text

Simple neural models

  • Oldest ANN model is McCulloch-Pitts neuron [1943] .

    • Inputs are +1 or -1 with real-valued weights.

    • If sum of weighted inputs is > 0, then the neuron “fires” and gives +1 as an output.

    • Showed you can comput logical functions.

    • Relation to learning proposed (later!) by Donald Hebb [1949].

  • Perceptron model [Rosenblatt, 1958].

    • Single-layer network with same kind of neuron.

      • Firing when input is about a threshold: ∑xiwi>t .

    • Added a learning rule to allow weight selection.

CS-424 Gregory Dudek

Perceptron nets
Perceptron nets

CS-424 Gregory Dudek

Perceptron learning
Perceptron learning

  • Perceptron learning:

    • Have a set of training examples (TS) encoded as input values (I.e. in the form of binary vectors)

    • Have a set of desired output values associated with these inputs.

      • This is supervisedlearning.

    • Problem: how to adjust the weights to make the actual outputs match the training examples.

      • NOTE: we to not allow the topology to change! [You should be thinking of a question here.]

  • Intuition: when a perceptron makes a mistake, it’s weights are wrong.

    • Modify them so make the output bigger or smaller, as desired.

CS-424 Gregory Dudek

Learning algorithm
Learning algorithm

  • Desired Ti Actual output Oi

  • Weight update formula (weight from unit j to i):

    Wj,i = Wj,I + k* xj * (Ti - Oi)

    Where k is the learning rate.

  • If the examples can be learned (encoded), then the perceptron learning rule will find the weights.

    • How?

      Gradient descent.

      Key thing to prove is the absence of local minima.

CS-424 Gregory Dudek

Perceptrons what can they learn
Perceptrons: what can they learn?

  • Only linearly separable functions [Minsky & Papert 1969].

  • N dimensions: N-dimensional hyperplane.

CS-424 Gregory Dudek

More general networks
More general networks

  • Generalize in 3 ways:

    • Allow continuous output values [0,1]

    • Allow multiple layers.

      • This is key to learning a larger class of functions.

    • Allow a more complicated function than thresholded summation


      Generalize the learning rule to accommodate this: let’s see how it works.

CS-424 Gregory Dudek

The threshold
The threshold

  • The key variant:

    • Change threshold into a differentiable function

    • Sigmoid, known as a “soft non-linearity” (silly).

      M = ∑xiwi

      O = 1 / (1 + e -k M )

CS-424 Gregory Dudek