Artificial Neural Networks
Overview
  • Computational units and architectures
  • Learning in perceptrons
  • Learning in Multilayer feed-forward nets
Neural Nets
  • Composed of basic units and weighted links between them
  • The basic units (or nodes) are an idealization of neurons
  • Responsible for basic computations
  • The pattern of connections of the units determines the network architecture
Computation at Units
  • Each unit computes a 0-1 or a graded function of the weighted sum of its inputs
  • Output: a = g(Σ_i w_i·x_i), where g is the activation function
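As a quick illustration (not part of the original slides), a minimal Python sketch of one unit's computation, assuming a step activation with an arbitrary threshold; the names are made up:

```python
# Sketch of a single unit: weighted sum of the inputs followed by an activation.
def unit_output(weights, inputs, g):
    """Compute g(sum_i w_i * x_i) for one unit."""
    weighted_sum = sum(w * x for w, x in zip(weights, inputs))
    return g(weighted_sum)

# Example: step activation with threshold t = 0.5
step = lambda x, t=0.5: 1 if x >= t else 0
print(unit_output([0.3, 0.9], [1.0, 1.0], step))  # -> 1, since 1.2 >= 0.5
```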
Common Activation Functions
  • Step function: g(x) = 1 if x >= t (t is a threshold); g(x) = 0 if x < t
  • Sign function: g(x) = 1 if x >= t (t is a threshold); g(x) = -1 if x < t
  • Sigmoid function: g(x) = 1/(1 + exp(-x))
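The three activation functions above, written out as a small sketch (function names are illustrative):

```python
import math

def step(x, t=0.0):
    """Step function: 1 if x >= t, else 0 (t is the threshold)."""
    return 1 if x >= t else 0

def sign(x, t=0.0):
    """Sign function: 1 if x >= t, else -1 (t is the threshold)."""
    return 1 if x >= t else -1

def sigmoid(x):
    """Sigmoid function: 1 / (1 + exp(-x))."""
    return 1.0 / (1.0 + math.exp(-x))

print(step(0.2), sign(-0.2), round(sigmoid(0.0), 3))  # 1 -1 0.5
```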
Can Implement Boolean Functions
  • A unit can implement And, Or, and Not
  • Need to map True and False to numbers:
    • e.g. True = 1.0, False = 0.0
  • (Exercise) Use a step function and show how to implement various simple Boolean functions (a sketch follows below)
  • Combining units, we can get any Boolean function of n variables

Logical circuits can be obtained as a special case
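For instance (an illustrative sketch, not from the slides), with True = 1.0 and False = 0.0 a single step unit can implement And, Or, and Not by choosing the weights and threshold:

```python
# AND: both inputs must be 1, so use weights 1, 1 and threshold 2.
# OR:  one input suffices, so use weights 1, 1 and threshold 1.
# NOT: a single weight of -1 and threshold 0 flips the input.

def step_unit(weights, inputs, t):
    return 1 if sum(w * x for w, x in zip(weights, inputs)) >= t else 0

AND = lambda x1, x2: step_unit([1, 1], [x1, x2], 2)
OR  = lambda x1, x2: step_unit([1, 1], [x1, x2], 1)
NOT = lambda x1:     step_unit([-1],   [x1],     0)

assert AND(1, 1) == 1 and AND(1, 0) == 0
assert OR(0, 1) == 1 and OR(0, 0) == 0
assert NOT(0) == 1 and NOT(1) == 0
```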

Network Structures
  • Recurrent (cycles exist), more powerful as they can implement state, but harder to analyze. Examples:
      • Hopfield network, symmetric connections, interesting properties, useful for implementing associative memory
      • Boltzmann machines: more general, with applications in constraint satisfaction and combinatorial optimization
Network Structures
  • Feedforward (no cycles): less powerful, but easier to understand
    • Input units
    • Hidden layers
    • Output units
  • Perceptron: no hidden layer, so it basically corresponds to one unit; it computes a linear threshold function (LTF)
  • LTF: defined by weights w_1, ..., w_n and a threshold t; the value is 1 iff w_1·x_1 + ... + w_n·x_n >= t, and 0 otherwise
Perceptron Capabilities
  • Quite expressive: many, but not all, Boolean functions can be expressed. Examples:
    • conjunctions and disjunctions, e.g. x_1 And x_2 (weights 1, 1 and threshold 2)
    • more generally, functions that are true if and only if at least k of the inputs are true (all weights 1, threshold k)
    • Can't represent XOR
Representable Functions
  • Perceptrons have a monotonicity property:

If a link has positive weight, activation can only increase as the corresponding input value increases (irrespective of other input values)

  • Can’t represent functions where input interactions can cancel one another’s effect (e.g. XOR)
Representable Functions
  • Can represent only linearly separable functions
  • Geometrically: only if there is a line (plane) separating the positives from the negatives
  • The good news: such functions are PAC learnable and learning algorithms exist
Linearly Separable

[Figure: points labeled + and - in the plane, with a straight line separating the positives from the negatives]

The Perceptron Learning Algorithm
  • Example of current-best-hypothesis (CBH) search (so incremental, etc.):
  • Begin with a hypothesis (a perceptron)
  • Repeat over all examples several times
    • Adjust weights as examples are seen
  • Until all examples correctly classified or a stopping criterion reached
Method for Adjusting Weights
  • One weight-update possibility (sketched below):
  • If the classification is correct, don't change the weights
  • Otherwise:
    • If false negative (predicted 0, should be 1), add the input to the weights: w ← w + x
    • If false positive (predicted 1, should be 0), subtract the input from the weights: w ← w - x
  • Intuition: for instance, if the example is positive, strengthen/increase the weights corresponding to the positive attributes of the example
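A minimal sketch of this update rule (names and data are illustrative; as an implementation choice not taken from the slides, the threshold is folded in as a weight on a constant always-1 input):

```python
def train_perceptron(examples, n_inputs, epochs=100):
    """examples: list of (inputs, label) pairs with label 0 or 1.
    A constant 1.0 is appended to each input so the threshold is
    learned as an ordinary weight."""
    w = [0.0] * (n_inputs + 1)
    for _ in range(epochs):
        errors = 0
        for x, label in examples:
            x = list(x) + [1.0]                      # bias input
            predicted = 1 if sum(wi * xi for wi, xi in zip(w, x)) >= 0 else 0
            if predicted == label:
                continue                             # correct: no change
            errors += 1
            if label == 1:                           # false negative: add input
                w = [wi + xi for wi, xi in zip(w, x)]
            else:                                    # false positive: subtract input
                w = [wi - xi for wi, xi in zip(w, x)]
        if errors == 0:                              # all examples classified correctly
            break
    return w

# Example: learn Or from its truth table
print(train_perceptron([((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 1)], 2))
```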
Properties of the Algorithm
  • In general, a learning rate α is also applied (see book): w ← w + α·x (or w ← w - α·x)
  • The adjustment is in the direction of minimizing the error on the example
  • If the learning rate is appropriate and the examples are linearly separable, then after a finite number of iterations the algorithm converges to a linear separator
Another Algorithm (least-sum-squares algorithm)
  • Define and minimize an error function
  • S is the set of examples, f is the ideal (target) function, and h_w is the linear function corresponding to the current perceptron
  • Error of the perceptron (over all examples): E(w) = (1/2) · Σ_{e in S} (f(e) - h_w(e))²
  • Note: E is a function of the weights w, since they determine h_w
Derivative of Error
  • Gradient (derivative) of E: ∇E = (∂E/∂w_0, ∂E/∂w_1, ..., ∂E/∂w_n)
  • Take a step in the steepest-descent direction: Δw_i = -α · ∂E/∂w_i
  • ∂E/∂w_i is the gradient along w_i, and α is the learning rate
Gradient Descent
  • The algorithm: pick an initial random hypothesis (perceptron) and repeatedly compute the error and modify the perceptron (take a step along the reverse of the gradient), as sketched below

[Figure: the error surface E over the weights; the gradient direction is ∇E, the descent direction is -∇E]
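A rough sketch of this gradient-descent loop for the least-sum-squares error of a linear unit (batch updates; function names and data are illustrative, and the threshold is again folded in as a bias weight):

```python
def gradient_descent(examples, n_inputs, alpha=0.05, steps=500):
    """examples: list of (inputs, target); h_w(x) = w . x (bias folded in).
    Minimizes E(w) = 1/2 * sum_e (target - h_w(x))^2 by steepest descent."""
    w = [0.0] * (n_inputs + 1)
    for _ in range(steps):
        grad = [0.0] * len(w)
        for x, target in examples:
            x = list(x) + [1.0]                       # bias input
            err = target - sum(wi * xi for wi, xi in zip(w, x))
            for i, xi in enumerate(x):
                grad[i] += -err * xi                  # dE/dw_i = -sum_e err * x_i
        w = [wi - alpha * gi for wi, gi in zip(w, grad)]  # step against the gradient
    return w

# Example: targets generated by 2*x + 1; should recover roughly [2.0, 1.0]
data = [((0.0,), 1.0), ((1.0,), 3.0), ((2.0,), 5.0)]
print(gradient_descent(data, 1))
```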

Properties of the Algorithm
  • The error function has no local minima (it is quadratic)
  • The algorithm is a gradient-descent method to the global minimum, and will asymptotically converge
  • Even if the examples are not linearly separable, it can find a good (minimum-error) linear classifier
  • Incremental?
A Third Method
  • Formulate the problem as a linear feasibility or linear optimization problem
  • Example: find weights w and threshold t such that w·x >= t for every positive example x and w·x < t for every negative example x (a sketch follows below)
  • Can be solved in polynomial time (output "none" if no solution exists, otherwise output a solution)
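A hedged sketch of the feasibility formulation using SciPy's linprog (a library choice not mentioned on the slides). Strict inequalities cannot be stated directly, so positives are required to satisfy w·x - t >= 1 and negatives w·x - t <= -1; for a finite set of examples this loses no generality, since any separator can be rescaled to meet these margins:

```python
import numpy as np
from scipy.optimize import linprog

def find_separator(positives, negatives):
    """Return (w, t) with w.x >= t for positives and w.x < t for negatives,
    or None if the examples are not linearly separable.
    Variables are [w_1, ..., w_n, t]; constraints are written as A_ub @ v <= b_ub."""
    n = len(positives[0])
    rows, rhs = [], []
    for x in positives:                    # -(w.x - t) <= -1
        rows.append([-xi for xi in x] + [1.0]); rhs.append(-1.0)
    for x in negatives:                    # (w.x - t) <= -1
        rows.append(list(x) + [-1.0]); rhs.append(-1.0)
    res = linprog(c=np.zeros(n + 1), A_ub=np.array(rows), b_ub=np.array(rhs),
                  bounds=[(None, None)] * (n + 1))   # pure feasibility: zero objective
    return (res.x[:n], res.x[n]) if res.success else None

# And is linearly separable, XOR is not
print(find_separator([(1, 1)], [(0, 0), (0, 1), (1, 0)]))
print(find_separator([(0, 1), (1, 0)], [(0, 0), (1, 1)]))  # -> None
```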
Multilayer Feed-Forward Networks
  • Multiple perceptrons, layered
  • Example: a two-layer network with 3 inputs, one output, and one hidden layer (two hidden units)

[Figure: input layer (3 units) → hidden layer (2 units) → output layer (1 unit)]
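A small sketch of a forward pass through exactly this 3-2-1 architecture with sigmoid units (all weights and names here are made up for illustration):

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def layer(inputs, weight_rows, biases):
    """Each row of weights (plus a bias) drives one sigmoid unit."""
    return [sigmoid(sum(w * a for w, a in zip(row, inputs)) + b)
            for row, b in zip(weight_rows, biases)]

def forward(x, W_hidden, b_hidden, W_out, b_out):
    hidden = layer(x, W_hidden, b_hidden)     # 3 inputs -> 2 hidden units
    return layer(hidden, W_out, b_out)        # 2 hidden units -> 1 output

# Arbitrary example weights for the 3-2-1 network shown above
W_hidden = [[0.2, -0.4, 0.1], [0.7, 0.3, -0.6]]
b_hidden = [0.0, 0.1]
W_out, b_out = [[0.5, -0.8]], [0.2]
print(forward([1.0, 0.0, 1.0], W_hidden, b_hidden, W_out, b_out))
```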

Power/Expressiveness
  • Can represent interactions among inputs (unlike perceptrons)
  • Two-layer networks can represent any Boolean function, and continuous functions (within a tolerance), as long as the number of hidden units is sufficient and appropriate activation functions are used
  • Learning algorithms exist, but with weaker guarantees than the perceptron learning algorithms
Back-Propagation
  • Similar to the perceptron learning algorithm and gradient descent for perceptrons
  • Problem to overcome: how to adjust the internal links (how to distribute the “blame”, or the error)
  • Assumption: internal units use nonlinear, differentiable activation functions
    • sigmoid functions are convenient
Back-Propagation (cont.)
  • Start with a hypothesis (a network with random weights)
  • Repeat until a stopping criterion is met
    • For each example, compute the network output and, for each unit i, its error term Δ_i
    • Update each weight w_ij (the weight of the link going from node i to node j): w_ij ← w_ij + α · a_i · Δ_j, where a_i is the output of unit i (sketched below)
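A rough sketch of one back-propagation update for a single-hidden-layer sigmoid network, following the rule above (w_ij ← w_ij + α·a_i·Δ_j); function and variable names are illustrative, and biases are treated as weights on a constant input:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def backprop_step(x, target, W_h, b_h, W_o, b_o, alpha=0.5):
    """One forward + backward pass for a net with one hidden layer of sigmoid
    units and a single sigmoid output; updates the weights in place."""
    # Forward pass
    hidden = [sigmoid(sum(w * xi for w, xi in zip(row, x)) + b)
              for row, b in zip(W_h, b_h)]
    out = sigmoid(sum(w * h for w, h in zip(W_o[0], hidden)) + b_o[0])

    # Error terms: Delta = (T - O) * g'(in) at the output; the blame is then
    # distributed back to each hidden unit through its outgoing weight.
    delta_out = (target - out) * out * (1.0 - out)
    delta_h = [h * (1.0 - h) * W_o[0][j] * delta_out for j, h in enumerate(hidden)]

    # Weight updates: w_ij <- w_ij + alpha * a_i * Delta_j
    for j, h in enumerate(hidden):
        W_o[0][j] += alpha * h * delta_out
    b_o[0] += alpha * delta_out
    for j in range(len(W_h)):
        for i, xi in enumerate(x):
            W_h[j][i] += alpha * xi * delta_h[j]
        b_h[j] += alpha * delta_h[j]
    return out

# One update on the 3-2-1 example network from the previous slide
W_h, b_h = [[0.2, -0.4, 0.1], [0.7, 0.3, -0.6]], [0.0, 0.1]
W_o, b_o = [[0.5, -0.8]], [0.2]
print(backprop_step([1.0, 0.0, 1.0], 1.0, W_h, b_h, W_o, b_o))
```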

Derivation
  • Write the error for a single training example; as before, use the sum of squared errors (convenient for differentiation, etc.): E = (1/2) · Σ_i (T_i - O_i)², where T_i is the target and O_i the actual output of output unit i
  • Differentiate (with respect to each weight…)
  • For example, for the weight w_ji connecting node j to output unit i, we get ∂E/∂w_ji = -a_j · Δ_i, where Δ_i = (T_i - O_i) · g'(in_i)

Properties
  • Converges to a minimum, but could be a local minimum
  • Could be slow to converge

(Note: Training a three node net is NP-Complete!)

  • Must watch for over-fitting just as in decision trees (use validation sets, etc.)
  • Network structure? Two layers often suffice; start with relatively few hidden units
Properties (cont.)
  • Many variations on basic back-propagation exist, e.g. adding a momentum term (sketched below): Δw_n = -α·∇E + μ·Δw_{n-1}, where Δw_n is the nth update amount and μ is a constant
  • Reduce the learning rate α with time (applies to perceptrons as well)
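A brief sketch of the momentum idea for a single weight vector; the symbols follow the formula above, and grad_fn is a placeholder you would supply (names are illustrative):

```python
def descend_with_momentum(w, grad_fn, alpha=0.1, mu=0.9, steps=100):
    """Gradient descent with momentum: delta_n = -alpha * gradE + mu * delta_{n-1}.
    grad_fn(w) must return the gradient of the error at w."""
    delta = [0.0] * len(w)                 # previous update amount, delta_{n-1}
    for _ in range(steps):
        grad = grad_fn(w)
        delta = [-alpha * g + mu * d for g, d in zip(grad, delta)]
        w = [wi + di for wi, di in zip(w, delta)]
    return w

# Example: minimize E(w) = w_0^2 + w_1^2, whose gradient is (2*w_0, 2*w_1);
# the result is driven toward [0, 0].
print(descend_with_momentum([3.0, -2.0], lambda w: [2 * w[0], 2 * w[1]]))
```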

NN properties
  • Can handle domains with
    • continuous and discrete attributes
    • Many attributes
    • noisy data
  • Could be slow at training but fast at evaluation time
  • Human understanding of what the network does could be limited