# Artificial Neural Networks

## PowerPoint Slideshow about 'Artificial Neural Networks' - halona


### Artificial Neural Networks

Overview
• Computational units and architectures
• Learning in perceptrons
• Learning in Multilayer feed-forward nets
Neural Nets
• Composed of basic units and weighted links between them
• The basic units (or nodes) are an idealization of neurons
• Responsible for basic computations
• The pattern of connections of the units determines the network architecture
Computation at Units
• Compute a 0-1 or a graded function of the weighted sum of the inputs: $g\left(\sum_i w_i x_i\right)$, where $w_i$ is the weight on input $x_i$
• $g$ is the activation function
Common Activation Functions
• Step function:

g(x)=1, if x >= t ( t is a threshold)

g(x) = 0, if x < t

• Sign function:

g(x)=1, if x >= t ( t is a threshold)

g(x) = -1, if x < t

• Sigmoid function: g(x)= 1/(1+exp(-x))
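The three activation functions above, and the way a unit applies one of them to a weighted sum, can be sketched in a few lines of Python. This is a minimal illustration; the function names and the default threshold of 0 are assumptions made for the example, not part of the slides.

```python
import math

def step(x, t=0.0):
    """Step function: 1 if x >= t, otherwise 0."""
    return 1 if x >= t else 0

def sign(x, t=0.0):
    """Sign function: 1 if x >= t, otherwise -1."""
    return 1 if x >= t else -1

def sigmoid(x):
    """Sigmoid function: a graded output strictly between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-x))

def unit_output(weights, inputs, g):
    """A unit applies the activation function g to the weighted sum of its inputs."""
    return g(sum(w * x for w, x in zip(weights, inputs)))
```

For example, `unit_output([1.0, 1.0], [1, 0], sigmoid)` applies the sigmoid to a weighted sum of 1.0.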
Can Implement Boolean Functions
• A unit can implement And, Or, and Not
• Need a mapping from True and False to numbers:
• e.g. True = 1.0, False= 0.0
• (Exercise) Use a step function and show how to implement various simple Boolean functions
• Combining the units, we can get any Boolean function of n variables

Can obtain logical circuits as a special case
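One possible answer to the exercise, as a sketch: with True = 1 and False = 0, a single step unit implements And, Or, and Not for suitable weights and thresholds. The particular weights and thresholds below are one choice among many, and `implies` is a hypothetical example of combining units.

```python
def unit(weights, t, inputs):
    """Step unit: outputs 1 iff the weighted sum of the inputs is >= t."""
    return 1 if sum(w * x for w, x in zip(weights, inputs)) >= t else 0

def AND(a, b):
    return unit([1, 1], 1.5, [a, b])   # fires only when both inputs are 1

def OR(a, b):
    return unit([1, 1], 0.5, [a, b])   # fires when at least one input is 1

def NOT(a):
    return unit([-1], -0.5, [a])       # negative weight inverts the input

def implies(a, b):
    """Combining units: (a implies b) is the same as (not a) or b."""
    return OR(NOT(a), b)
```

Because And, Or, and Not are each a single unit, any Boolean formula over them can be wired up the same way `implies` is.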

Network Structures
• Recurrent (cycles exist), more powerful as they can implement state, but harder to analyze. Examples:
• Hopfield network, symmetric connections, interesting properties, useful for implementing associative memory
• Boltzmann machines: more general, with applications in constraint satisfaction and combinatorial optimization
Network Structures
• Feedforward (no cycles), less powerful, easier to understand
• Input units
• Hidden layers
• Output units
• Perceptron: no hidden layer, so each output essentially corresponds to a single unit; perceptrons are essentially linear threshold functions (LTFs)
• LTF: defined by weights $w_1, \dots, w_n$ and threshold $t$; value is 1 iff $\sum_i w_i x_i \ge t$, otherwise 0
Perceptron Capabilities
• Quite expressive: many, but not all Boolean functions can be expressed. Examples:
• conjunctions and disjunctions, e.g. $x_1 \wedge x_2$ is 1 iff $x_1 + x_2 \ge 2$
• more generally, can represent functions that are true if and only if at least k of the inputs are true: $\sum_i x_i \ge k$ (all weights 1, threshold k)
• Can’t represent XOR
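The "at least k of the inputs" family is easy to check directly. A minimal sketch, assuming 0/1 inputs, all weights equal to 1, and threshold k (the function name is chosen for this example):

```python
def at_least_k(k, inputs):
    """Linear threshold function with all weights 1 and threshold k."""
    return 1 if sum(1 * x for x in inputs) >= k else 0

# Special cases over n inputs: k = n gives And, k = 1 gives Or,
# and k = n // 2 + 1 gives the majority function.
majority_of_three = at_least_k(2, [1, 0, 1])
```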
Representable Functions
• Perceptrons have a monotonicity property:

If a link has positive weight, activation can only increase as the corresponding input value increases (irrespective of other input values)

• Can’t represent functions where input interactions can cancel one another’s effect (e.g. XOR)
Representable Functions
• Can represent only linearly separable functions
• Geometrically: only if there is a line (plane) separating the positives from the negatives
• The good news: such functions are PAC learnable and learning algorithms exist
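As an illustration of linear separability (not a proof), the sketch below searches a small grid of weights and thresholds for a two-input perceptron that computes a given Boolean function. The grid and the helper names are assumptions made for this example. The search finds a separator for And and none for XOR, which agrees with the fact that XOR is not linearly separable.

```python
from itertools import product

def ltf(w1, w2, t, x1, x2):
    """Two-input linear threshold function."""
    return 1 if w1 * x1 + w2 * x2 >= t else 0

def find_perceptron(target, grid):
    """Return the first (w1, w2, t) on the grid that reproduces target, or None."""
    inputs = [(0, 0), (0, 1), (1, 0), (1, 1)]
    for w1, w2, t in product(grid, repeat=3):
        if all(ltf(w1, w2, t, a, b) == target(a, b) for a, b in inputs):
            return (w1, w2, t)
    return None

grid = [i / 2 for i in range(-4, 5)]   # -2.0, -1.5, ..., 2.0
and_solution = find_perceptron(lambda a, b: a & b, grid)   # some separator exists
xor_solution = find_perceptron(lambda a, b: a ^ b, grid)   # no separator on the grid
```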
Linearly Separable

[Figure: positive (+) and negative (-) examples plotted in the plane]

The Perceptron Learning Algorithm
• Example of current-best-hypothesis (CBH) search (so incremental, etc.):
• Begin with a hypothesis (a perceptron)
• Repeat over all examples several times
• Adjust weights as examples are seen
• Until all examples correctly classified or a stopping criterion reached
Method for Adjusting Weights
• One weight update possibility:
• If classification correct, don’t change
• Otherwise:
• If false negative, add input: $w_i \leftarrow w_i + x_i$ for every input $i$
• If false positive, subtract input: $w_i \leftarrow w_i - x_i$ for every input $i$
• Intuition: For instance, if example is positive, strengthen/increase the weights corresponding to the positive attributes of the example
Properties of the Algorithm
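• In general, also apply a learning rate $\alpha$ (see book): $w_i \leftarrow w_i + \alpha\,(f(\mathbf{x}) - o(\mathbf{x}))\,x_i$, where $f(\mathbf{x})$ is the correct label and $o(\mathbf{x})$ is the perceptron's output (both 0 or 1)
• The adjustment is in the direction of minimizing error on the example
• If the learning rate is appropriate and the examples are linearly separable, after a finite number of iterations, the algorithm converges to a linear separator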
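The learning loop and the weight update can be sketched as follows. This is a minimal version under stated assumptions: labels are 0 or 1, weights start at zero, and the threshold is learned by treating it as a weight on a constant input of -1 (a common device, not something the slides specify). The names `train_perceptron`, `alpha`, and `max_epochs` are chosen for this example.

```python
def predict(w, t, x):
    """Perceptron output: 1 iff the weighted sum reaches the threshold t."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) >= t else 0

def train_perceptron(examples, alpha=0.1, max_epochs=100):
    """examples is a list of (inputs, label) pairs with 0/1 labels."""
    w = [0.0] * len(examples[0][0])
    t = 0.0
    for _ in range(max_epochs):          # stopping criterion: epoch limit
        mistakes = 0
        for x, y in examples:
            err = y - predict(w, t, x)   # +1 false negative, -1 false positive
            if err != 0:
                mistakes += 1
                # add the input on a false negative, subtract it on a false positive
                w = [wi + alpha * err * xi for wi, xi in zip(w, x)]
                t -= alpha * err         # threshold acts as a weight on input -1
        if mistakes == 0:                # all examples correctly classified
            break
    return w, t

or_examples = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 1)]
w, t = train_perceptron(or_examples)
```

On this linearly separable example the loop stops after a few passes with every example classified correctly.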
Another Algorithm (least-sum-squares algorithm)
• Define and minimize an error function
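• $S$ is the set of examples, $f$ is the ideal function, $h_{\mathbf{w}}(\mathbf{x}) = \sum_i w_i x_i$ is the linear function corresponding to the current perceptron
• Error of the perceptron (over all examples): $E(\mathbf{w}) = \frac{1}{2}\sum_{\mathbf{x} \in S} \left(f(\mathbf{x}) - h_{\mathbf{w}}(\mathbf{x})\right)^2$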
• Note:
Derivative of Error
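• Gradient (derivative) of E: $\nabla E = \left(\frac{\partial E}{\partial w_1}, \dots, \frac{\partial E}{\partial w_n}\right)$
• Take the steepest descent direction: $w_i \leftarrow w_i - \alpha \frac{\partial E}{\partial w_i}$
• $\frac{\partial E}{\partial w_i}$ is the gradient along $w_i$, $\alpha$ is the learning rate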
Gradient Descent
• The algorithm: pick an initial random hypothesis (perceptron) and repeatedly compute the error and modify the perceptron (take a step along the reverse of the gradient)

[Figure: plot of the error E, marking the gradient direction and the descent direction]

Properties of the algorithm
• The error function is quadratic in the weights, so it has no local minima other than the global minimum
• The algorithm is a gradient descent method to the global minimum, and will asymptotically converge
• Even if not linearly separable, can find a good (minimum error) linear classifier
• Incremental?
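A batch version of this gradient descent can be sketched as below. It assumes the error is half the sum of squared differences between the label and the linear output, so the gradient along weight i is minus the sum of (label - output) times input i. The data set, the learning rate, and the function names are made up for the illustration; the first input is a constant 1 so that the first weight plays the role of an offset.

```python
def linear_output(w, x):
    """The linear function corresponding to the current weights."""
    return sum(wi * xi for wi, xi in zip(w, x))

def sum_squared_error(w, examples):
    """Assumed error: half the sum of squared errors over all examples."""
    return 0.5 * sum((y - linear_output(w, x)) ** 2 for x, y in examples)

def gradient_descent(examples, alpha=0.05, steps=500):
    n = len(examples[0][0])
    w = [0.0] * n
    for _ in range(steps):
        grad = [0.0] * n
        for x, y in examples:
            err = y - linear_output(w, x)
            for i in range(n):
                grad[i] += -err * x[i]      # dE/dw_i summed over examples
        # step along the reverse of the gradient
        w = [wi - alpha * gi for wi, gi in zip(w, grad)]
    return w

# Labels follow y = 1 + 2 * x1, with a constant first input of 1.
examples = [((1.0, float(x1)), 1.0 + 2.0 * x1) for x1 in range(4)]
w = gradient_descent(examples)
```

Here the weights approach (1, 2), the exact minimizer, since the data are generated by a linear function. A learning rate that is too large would make the same loop diverge.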
A Third Method
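• Formulate the problem in terms of a linear feasibility or linear optimization problem
• Example: find weights $w_1, \dots, w_n$ and a threshold $t$ such that $\sum_i w_i x_i \ge t$ for every positive example and $\sum_i w_i x_i < t$ for every negative example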
• Can be solved in polynomial time (output none if no solution exists, or otherwise output a solution)
Multilayer Feed-Forward Networks
• Multiple perceptrons, layered
• Example: a two-layer network with 3 inputs, one output, and one hidden layer (two hidden units)

[Figure: the example network, showing the input layer, the hidden layer, and the output layer]

Power/Expressiveness
• Can represent interactions among inputs (unlike perceptrons)
• Two-layer networks can represent any Boolean function, and can approximate continuous functions (within a tolerance), as long as the number of hidden units is sufficient and appropriate activation functions are used
• Learning algorithms exist, but weaker guarantees than perceptron learning algorithms
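To make the gain in expressiveness concrete, here is a sketch of a two-layer network of step units that computes XOR, which no single perceptron can represent. The weights and thresholds are set by hand for the illustration (one hidden unit acts as Or, the other as And); they are not learned and are only one possible choice.

```python
def step(x, t):
    """Step unit: 1 iff x >= t."""
    return 1 if x >= t else 0

def xor_net(x1, x2):
    h_or = step(x1 + x2, 0.5)        # hidden unit 1: at least one input is on
    h_and = step(x1 + x2, 1.5)       # hidden unit 2: both inputs are on
    # Output unit: weight +1 from h_or, weight -1 from h_and, threshold 0.5,
    # so it fires when exactly one input is on.
    return step(h_or - h_and, 0.5)
```

The negative weight from the And unit is what lets one input cancel the effect of the other, the interaction a single perceptron cannot express.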
Back-Propagation
• Similar to the perceptron learning algorithm and gradient descent for perceptrons
• Problem to overcome: How to adjust internal links (how to distribute the “blame” or the error)
• Assumption: internal units use nonlinear, differentiable activation functions
• sigmoid functions are convenient
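One reason the sigmoid is convenient is that its derivative can be written in terms of its own value, so the derivative comes almost for free once a unit's output has been computed. A short derivation, using the sigmoid $g(x) = 1/(1+e^{-x})$:

```latex
\begin{aligned}
g(x)  &= \frac{1}{1 + e^{-x}} \\
g'(x) &= \frac{e^{-x}}{\left(1 + e^{-x}\right)^{2}}
       = \frac{1}{1 + e^{-x}} \cdot \frac{e^{-x}}{1 + e^{-x}}
       = g(x)\,\bigl(1 - g(x)\bigr)
\end{aligned}
```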
Back-Propagation (cont.)
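• Start with a hypothesis (network with random weights)
• Repeat until a stopping criterion is met
• For each example, compute the network output and, for each unit i, its error term $\Delta_i$
• Update each weight $w_{ij}$ (weight of link going from node i to node j): $w_{ij} \leftarrow w_{ij} + \alpha\, a_i\, \Delta_j$, where $a_i$ is the output of unit i and $\alpha$ is the learning rate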

Derivation
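• Write the error for a single training example; as before use sum of squared error (as it’s convenient for differentiation, etc): $E = \frac{1}{2}\sum_i (y_i - a_i)^2$, where $y_i$ is the target and $a_i$ the actual output of output unit i
• Differentiate (with respect to each weight…)
• For example, we get

$\frac{\partial E}{\partial w_{ji}} = -(y_i - a_i)\, g'(in_i)\, a_j$

for the weight $w_{ji}$ connecting node j to output i, where $in_i = \sum_j w_{ji} a_j$ is the weighted input to unit i and $a_i = g(in_i)$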
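The following sketch runs one back-propagation step on a single example, for a network shaped like the earlier example (3 inputs, two hidden units, one output). It rests on several assumptions that the slides do not fix: sigmoid units everywhere, no bias weights, the error taken as half the squared difference between target and output, and hand-picked starting weights. All function and variable names are hypothetical.

```python
import math

def g(x):
    """Sigmoid activation; its derivative is g(x) * (1 - g(x))."""
    return 1.0 / (1.0 + math.exp(-x))

def forward(x, W_hid, w_out):
    """W_hid[k][j]: weight from input k to hidden unit j; w_out[j]: hidden j to output."""
    n_hid = len(w_out)
    a_hid = [g(sum(x[k] * W_hid[k][j] for k in range(len(x)))) for j in range(n_hid)]
    a_out = g(sum(a_hid[j] * w_out[j] for j in range(n_hid)))
    return a_hid, a_out

def error(x, y, W_hid, w_out):
    return 0.5 * (y - forward(x, W_hid, w_out)[1]) ** 2

def error_terms(x, y, W_hid, w_out):
    """Error term of the output unit, then the 'blame' passed back to each hidden unit."""
    a_hid, a_out = forward(x, W_hid, w_out)
    delta_out = (y - a_out) * a_out * (1.0 - a_out)
    delta_hid = [a_hid[j] * (1.0 - a_hid[j]) * w_out[j] * delta_out
                 for j in range(len(w_out))]
    return a_hid, delta_hid, delta_out

def backprop_step(x, y, W_hid, w_out, alpha=0.1):
    """Each weight changes by alpha * (output of source unit) * (error term of target unit)."""
    a_hid, delta_hid, delta_out = error_terms(x, y, W_hid, w_out)
    new_w_out = [w_out[j] + alpha * a_hid[j] * delta_out for j in range(len(w_out))]
    new_W_hid = [[W_hid[k][j] + alpha * x[k] * delta_hid[j] for j in range(len(w_out))]
                 for k in range(len(x))]
    return new_W_hid, new_w_out

x, y = [1.0, 0.0, 1.0], 1.0
W_hid = [[0.1, -0.2], [0.4, 0.3], [-0.3, 0.2]]
w_out = [0.5, -0.4]
before = error(x, y, W_hid, w_out)
W_hid2, w_out2 = backprop_step(x, y, W_hid, w_out)
after = error(x, y, W_hid2, w_out2)   # smaller than before for this small step
```

A useful sanity check for any such implementation is to compare the analytic derivative, minus the source unit's output times the target unit's error term, against a finite-difference estimate of the error's slope.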

Properties
• Converges to a minimum, but could be a local minimum
• Could be slow to converge

(Note: Training a three-node net is NP-complete!)

• Must watch for over-fitting just as in decision trees (use validation sets, etc.)
• Network structure? Often two layers suffice; start with relatively few hidden units
Properties (cont.)
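• Many variations to the basic back-propagation: e.g. use momentum: $\Delta w_{ij}^{(n)} = \alpha\, a_i\, \Delta_j + \beta\, \Delta w_{ij}^{(n-1)}$, where $\Delta w_{ij}^{(n)}$ is the nth update amount, $\alpha\, a_i\, \Delta_j$ is the plain back-propagation update, and $\beta$ is a constant with $0 \le \beta < 1$
• Reduce the learning rate $\alpha$ with time (applies to perceptrons as well)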
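The effect of momentum can be seen on a toy problem. The sketch below applies a momentum update (the nth update is the plain gradient step plus a constant times the previous update) to a single weight on a shallow quadratic error. The error function, the learning rate `alpha`, and the momentum constant `beta` are all assumptions chosen for the illustration.

```python
def descend(grad, w0, alpha, beta, steps):
    """Gradient descent where each update adds beta times the previous update."""
    w, prev_update = w0, 0.0
    for _ in range(steps):
        update = -alpha * grad(w) + beta * prev_update   # nth update amount
        w += update
        prev_update = update
    return w

def grad(w):
    """Derivative of the toy error E(w) = 0.01 * w**2, whose minimum is at w = 0."""
    return 0.02 * w

plain = descend(grad, 10.0, alpha=1.0, beta=0.0, steps=50)          # no momentum
with_momentum = descend(grad, 10.0, alpha=1.0, beta=0.9, steps=50)  # momentum 0.9
```

On this shallow slope the run with momentum ends much closer to the minimum after the same number of steps, at the cost of some oscillation around it.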

NN properties
• Can handle domains with
• continuous and discrete attributes
• many attributes
• noisy data
• Could be slow at training but fast at evaluation time
• Human understanding of what the network does could be limited