## Artificial Neural Networks


Overview

- Computational units and architectures
- Learning in perceptrons
- Learning in Multilayer feed-forward nets

Neural Nets

- Composed of basic units and weighted links between them
- The basic units (or nodes) are an idealization of neurons
- Responsible for basic computations
- The pattern of connections of the units determines the network architecture

Computation at Units

- Compute a 0-1 or a graded function of the weighted sum of the inputs
- g is the activation function applied to that weighted sum

Common Activation Functions

- Step function:

g(x)=1, if x >= t ( t is a threshold)

g(x) = 0, if x < t

- Sign function:

g(x)=1, if x >= t ( t is a threshold)

g(x) = -1, if x < t

- Sigmoid function: g(x)= 1/(1+exp(-x))
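As a minimal sketch, the three activation functions and the computation at a unit could be written as below. The function names and the default threshold of 0 are illustrative assumptions, not part of the slides.

```python
import math

def step(x, t=0.0):
    """Step function: 1 if x >= t, else 0 (t is the threshold)."""
    return 1 if x >= t else 0

def sign(x, t=0.0):
    """Sign function: 1 if x >= t, else -1."""
    return 1 if x >= t else -1

def sigmoid(x):
    """Sigmoid function: 1 / (1 + exp(-x)), a graded value between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-x))

def unit_output(weights, inputs, g):
    """Apply the activation function g to the weighted sum of the inputs."""
    weighted_sum = sum(w * x for w, x in zip(weights, inputs))
    return g(weighted_sum)
```

For example, `unit_output([1, 1], [1, 0], lambda s: step(s, 0.5))` applies a step with threshold 0.5 to the weighted sum 1.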

Can Implement Boolean Functions

- A unit can implement And, Or, and Not
- Need to map True and False to numbers:
- e.g. True = 1.0, False= 0.0
- (Exercise) Use a step function and show how to implement various simple Boolean functions
- Combining the units, we can get any Boolean function of n variables

Can obtain logical circuits as special case
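One possible answer to the exercise, sketched with step units and the mapping True = 1.0, False = 0.0. The particular weights and thresholds are one choice among many, and the helper names are hypothetical.

```python
def step_unit(weights, threshold, inputs):
    """A single unit: step function applied to the weighted sum of the inputs."""
    total = sum(w * x for w, x in zip(weights, inputs))
    return 1.0 if total >= threshold else 0.0

def and_unit(a, b):
    # Fires only when both inputs are 1: 1 + 1 >= 1.5
    return step_unit([1.0, 1.0], 1.5, [a, b])

def or_unit(a, b):
    # Fires when at least one input is 1: 1 >= 0.5
    return step_unit([1.0, 1.0], 0.5, [a, b])

def not_unit(a):
    # Negative weight: fires when the input is 0 (0 >= -0.5), not when it is 1
    return step_unit([-1.0], -0.5, [a])

def implies(a, b):
    # Combining units: (a implies b) is (NOT a) OR b
    return or_unit(not_unit(a), b)
```

Chaining units this way is what lets a network act as a logical circuit.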

Network Structures

- Recurrent (cycles exist), more powerful as they can implement state, but harder to analyze. Examples:
- Hopfield network, symmetric connections, interesting properties, useful for implementing associative memory
- Boltzmann machines: more general, with applications in constraint satisfaction and combinatorial optimization

Network Structures

- Feedforward (no cycles): less powerful, but easier to understand
- Input units
- Hidden layers
- Output units
- Perceptron: no hidden layer, so each output basically corresponds to a single unit, which computes a linear threshold function (LTF)
- LTF: defined by weights and a threshold t; the value is 1 iff the weighted sum of the inputs is >= t; otherwise, 0

Perceptron Capabilities

- Quite expressive: many, but not all Boolean functions can be expressed. Examples:
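- conjunctions and disjunctions, e.g. the conjunction of two inputs is the LTF with both weights 1 and threshold 1.5
- more generally, can represent functions that are true if and only if at least k of the inputs are true: set every weight to 1 and the threshold to k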
- Can’t represent XOR
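The sketch below illustrates both claims. `at_least_k` uses unit weights and threshold k, and `grid_search_ltf` (a hypothetical helper) tries every weight/threshold combination on a small grid. Finding nothing for XOR on a grid is only an illustration, not a proof; the actual reason is that XOR is not linearly separable.

```python
from itertools import product

def ltf(weights, threshold, inputs):
    """Linear threshold function: 1 iff the weighted sum reaches the threshold."""
    return 1 if sum(w * x for w, x in zip(weights, inputs)) >= threshold else 0

def at_least_k(k, inputs):
    """True (1) iff at least k of the 0/1 inputs are 1."""
    return ltf([1] * len(inputs), k, inputs)

def grid_search_ltf(target, n=2, grid=None):
    """Return (weights, threshold) from the grid that agrees with target
    on every 0/1 input vector of length n, or None if no grid point does."""
    if grid is None:
        grid = [v / 2 for v in range(-4, 5)]  # -2.0, -1.5, ..., 2.0
    points = list(product([0, 1], repeat=n))
    for *weights, threshold in product(grid, repeat=n + 1):
        if all(ltf(weights, threshold, p) == target(*p) for p in points):
            return weights, threshold
    return None
```

Calling `grid_search_ltf(lambda a, b: a & b)` finds some separator for AND, while `grid_search_ltf(lambda a, b: a ^ b)` returns `None`.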

Representable Functions

- Perceptrons have a monotonicity property:

If a link has positive weight, activation can only increase as the corresponding input value increases (irrespective of other input values)

- Can’t represent functions where input interactions can cancel one another’s effect (e.g. XOR)

Representable Functions

- Can represent only linearly separable functions
- Geometrically: only if there is a line (plane) separating the positives from the negatives
- The good news: such functions are PAC learnable and learning algorithms exist

The Perceptron Learning Algorithm

- Example of current-best-hypothesis (CBH) search (so incremental, etc.):
- Begin with a hypothesis (a perceptron)
- Repeat over all examples several times
- Adjust weights as examples are seen
- Until all examples correctly classified or a stopping criterion reached

Method for Adjusting Weights

- One weight update possibility:
- If classification correct, don’t change
- Otherwise:
- If false negative, add the input vector to the weight vector
- If false positive, subtract the input vector from the weight vector
- Intuition: For instance, if example is positive, strengthen/increase the weights corresponding to the positive attributes of the example
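A minimal sketch of the loop and the add/subtract update described above. Two details are assumptions not stated on the slides: the threshold is learned alongside the weights by treating it as the weight of a constant input of -1, and training starts from all-zero weights.

```python
def predict(weights, threshold, x):
    """Output of the perceptron: 1 iff the weighted sum reaches the threshold."""
    return 1 if sum(w * xi for w, xi in zip(weights, x)) >= threshold else 0

def train_perceptron(examples, n_inputs, max_passes=100):
    """examples is a list of (inputs, label) pairs with label in {0, 1}."""
    weights = [0.0] * n_inputs
    threshold = 0.0
    for _ in range(max_passes):          # repeat over all examples several times
        mistakes = 0
        for x, label in examples:
            if predict(weights, threshold, x) == label:
                continue                 # classification correct: don't change
            mistakes += 1
            if label == 1:               # false negative: add the input
                weights = [w + xi for w, xi in zip(weights, x)]
                threshold -= 1.0         # constant input -1 is "added" too
            else:                        # false positive: subtract the input
                weights = [w - xi for w, xi in zip(weights, x)]
                threshold += 1.0
        if mistakes == 0:                # all examples correctly classified
            break
    return weights, threshold
```

On a linearly separable set such as the four rows of OR, the loop stops after a few passes; `max_passes` is the fallback stopping criterion for data that is not separable.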

Properties of the Algorithm

- In general, also scale each update by a learning rate (see book)
- The adjustment is in the direction of minimizing error on the example
- If the learning rate is appropriate and the examples are linearly separable, then after a finite number of iterations the algorithm converges to a linear separator

Another Algorithm (least-sum-squares algorithm)

- Define and minimize an error function
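- S is the set of examples; each example is scored by comparing the value of the ideal function with the value of the linear function corresponding to the current perceptron
- Error of the perceptron (over all examples): the sum over S of the squared difference between those two values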
- Note:

Derivative of Error

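- Gradient (derivative) of E: the vector of partial derivatives of E with respect to each weight
- Take the steepest descent direction: change each weight by the negative of the gradient along that weight, scaled by the learning rate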
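Written out, the error function and its gradient might look as follows. This is a hedged reconstruction: the original formulas did not survive in the transcript, so the symbols $f$ (ideal function), $h_w$ (linear function of the current perceptron), $\alpha$ (learning rate), and the conventional factor $\tfrac{1}{2}$ are assumptions.

$$
E(w) = \frac{1}{2}\sum_{x \in S}\bigl(f(x) - h_w(x)\bigr)^2, \qquad h_w(x) = \sum_i w_i x_i
$$

$$
\frac{\partial E}{\partial w_i} = -\sum_{x \in S}\bigl(f(x) - h_w(x)\bigr)\,x_i
$$

$$
w_i \leftarrow w_i - \alpha\,\frac{\partial E}{\partial w_i} = w_i + \alpha \sum_{x \in S}\bigl(f(x) - h_w(x)\bigr)\,x_i
$$

With this sign convention, a weight grows when the perceptron's output is too low on examples where the corresponding input is positive.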

Gradient Descent

- The algorithm: pick an initial random hypothesis (a perceptron), then repeatedly compute the error and modify the perceptron (take a step in the direction opposite to the gradient)

(Figure: the error surface E, with the gradient direction and the opposite descent direction marked)
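A minimal sketch of batch gradient descent on the sum-of-squares error of a linear unit. Assumptions made for illustration: the error carries a factor of 1/2, the initial weights are zeros rather than random so the run is reproducible, and the learning rate is fixed.

```python
def linear_output(weights, x):
    """Linear function corresponding to the current perceptron."""
    return sum(w * xi for w, xi in zip(weights, x))

def squared_error(weights, examples):
    """Half the sum of squared differences between targets and outputs."""
    return 0.5 * sum((y - linear_output(weights, x)) ** 2 for x, y in examples)

def gradient(weights, examples):
    """Partial derivative of the error with respect to each weight."""
    grad = [0.0] * len(weights)
    for x, y in examples:
        residual = y - linear_output(weights, x)
        for i, xi in enumerate(x):
            grad[i] -= residual * xi
    return grad

def gradient_descent(examples, n_weights, rate=0.1, steps=1000):
    """Repeatedly step along the reverse of the gradient."""
    weights = [0.0] * n_weights
    for _ in range(steps):
        grad = gradient(weights, examples)
        weights = [w - rate * g for w, g in zip(weights, grad)]
    return weights
```

Including a constant input of 1 in every example lets the first weight act as an offset, for example `gradient_descent([((1, 0), 1), ((1, 1), 3), ((1, 2), 5)], 2)`. Too large a learning rate makes the iteration diverge, so the default here is a guess suited to small inputs.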

Properties of the algorithm

- The error function has no local minima other than the global minimum (it is quadratic in the weights)
- The algorithm is a gradient descent method toward the global minimum and, with a suitable learning rate, will asymptotically converge
- Even if not linearly separable, can find a good (minimum error) linear classifier
- Incremental?

A Third Method

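- Formulate the problem in terms of a linear feasibility or linear optimization problem
- Example: find weights such that the weighted sum reaches the threshold on every positive example and stays below it on every negative example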
- Can be solved in polynomial time (output none if no solution exists, or otherwise output a solution)
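One way to write the feasibility problem is sketched below. The notation is assumed, since the original constraints are missing from the transcript: $S^+$ and $S^-$ are the positive and negative examples, $w$ the weights, and $t$ the threshold. The strict inequality for negatives is replaced by a margin of 1, which loses nothing for a finite example set because any strict separator can be shifted and rescaled to satisfy it.

$$
\text{find } w, t \quad \text{such that} \quad
\sum_i w_i x_i - t \ge 1 \;\; \forall x \in S^+, \qquad
\sum_i w_i x_i - t \le -1 \;\; \forall x \in S^-
$$

Every constraint is linear in the unknowns $w$ and $t$, so a linear programming solver either returns a solution or reports that none exists.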

Multilayer Feed-Forward Networks

- Multiple perceptrons, layered
- Example: a two-layer network with 3 inputs, one output, and one hidden layer (two hidden units)

(Figure: network diagram showing the input layer, the hidden layer, and the output layer)

Power/Expressiveness

- Can represent interactions among inputs (unlike perceptrons)
- Two-layer networks can represent any Boolean function, and can approximate continuous functions (within a tolerance), as long as the number of hidden units is sufficient and appropriate activation functions are used
- Learning algorithms exist, but with weaker guarantees than perceptron learning algorithms
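To illustrate the Boolean half of this claim, the sketch below builds a two-layer network of step units for an arbitrary Boolean function by giving each true row of its truth table one hidden unit, with the output unit acting as an OR. This construction is one standard choice, not taken from the slides, and it can need exponentially many hidden units.

```python
from itertools import product

def step_unit(weights, threshold, inputs):
    """Step function applied to the weighted sum of the inputs."""
    return 1 if sum(w * x for w, x in zip(weights, inputs)) >= threshold else 0

def build_two_layer(n, target):
    """Return a function that computes target (a Boolean function of n
    0/1 inputs) using one hidden layer of step units and one output unit."""
    hidden = []
    for row in product([0, 1], repeat=n):
        if target(*row):
            # Weight +1 where the row has a 1, -1 where it has a 0; the
            # weighted sum equals the number of 1s only on an exact match.
            weights = [1 if bit else -1 for bit in row]
            hidden.append((weights, sum(row) - 0.5))

    def network(*inputs):
        activations = [step_unit(w, t, inputs) for w, t in hidden]
        # Output unit: OR over the hidden units
        return step_unit([1] * len(activations), 0.5, activations)

    return network
```

For instance, `build_two_layer(2, lambda a, b: a ^ b)` yields a network with two hidden units that computes XOR, which no single perceptron can represent.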

Back-Propagation

- Similar to the perceptron learning algorithm and gradient descent for perceptrons
- Problem to overcome: How to adjust internal links (how to distribute the “blame” or the error)
- Assumption: internal units use nonlinear, differentiable activation functions
- sigmoid functions are convenient

Back-Propagation (cont.)

- Start with a hypothesis (a network with random weights)
- Repeat until a stopping criterion is met
- For each example, compute the network output and, for each unit i, its error term
- Update each weight (the weight of the link going from node i to node j) by adding the learning rate times the output of unit i times the error term of unit j
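A minimal sketch of these steps for a network with one hidden layer and a single sigmoid output. Several choices are assumptions for illustration: the bias handled as a constant input of 1.0, the squared-error loss with a factor of 1/2, the per-example updates, and all function names.

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def init_net(n_inputs, n_hidden, seed=0):
    """Network with random weights; the extra weight per unit is a bias."""
    rng = random.Random(seed)
    w_hidden = [[rng.uniform(-1, 1) for _ in range(n_inputs + 1)]
                for _ in range(n_hidden)]
    w_out = [rng.uniform(-1, 1) for _ in range(n_hidden + 1)]
    return w_hidden, w_out

def forward(net, x):
    """Return (hidden activations, output) for input x."""
    w_hidden, w_out = net
    xb = list(x) + [1.0]                       # constant bias input
    hidden = [sigmoid(sum(w * v for w, v in zip(row, xb))) for row in w_hidden]
    out = sigmoid(sum(w * h for w, h in zip(w_out, hidden + [1.0])))
    return hidden, out

def error(net, x, y):
    """Squared error on a single example."""
    return 0.5 * (y - forward(net, x)[1]) ** 2

def backprop_step(net, x, y, rate=0.5):
    """One weight update for one example; returns the new network."""
    w_hidden, w_out = net
    xb = list(x) + [1.0]
    hidden, out = forward(net, x)
    # Error term of the output unit: (target - output) times the sigmoid slope
    delta_out = (y - out) * out * (1.0 - out)
    # Error term of each hidden unit: its share of the output's error term,
    # passed back through the link weight, times its own sigmoid slope
    delta_hidden = [w_out[j] * delta_out * h * (1.0 - h)
                    for j, h in enumerate(hidden)]
    # weight += rate * (output of source unit) * (error term of target unit)
    new_out = [w + rate * a * delta_out for w, a in zip(w_out, hidden + [1.0])]
    new_hidden = [[w + rate * v * d for w, v in zip(row, xb)]
                  for row, d in zip(w_hidden, delta_hidden)]
    return new_hidden, new_out

def train(net, examples, epochs=1000, rate=0.5):
    """Repeat over the examples; a fixed epoch count is the stopping criterion."""
    for _ in range(epochs):
        for x, y in examples:
            net = backprop_step(net, x, y, rate)
    return net
```

The hidden error terms are computed from the output weights as they were before the update, which is what distributes the "blame" consistently within one step.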

Derivation

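- Write the error for a single training example; as before, use the sum of squared errors over the output units (as it’s convenient for differentiation, etc.)
- Differentiate (with respect to each weight…)
- For example, for the weight connecting node j to output i, the derivative is the negative of: the output of node j, times the error at output i, times the slope of the activation function at output i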
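In symbols, the derivation for an output-layer weight might read as follows. The notation is an assumption, since the slide's formulas are missing: $y_i$ is the target for output $i$, $a_j$ the output of node $j$, $w_{ji}$ the weight from node $j$ to output $i$, and $in_i$ the weighted sum arriving at output $i$.

$$
E = \frac{1}{2}\sum_i (y_i - a_i)^2, \qquad a_i = g(in_i), \qquad in_i = \sum_j w_{ji}\,a_j
$$

$$
\frac{\partial E}{\partial w_{ji}} = -(y_i - a_i)\,\frac{\partial a_i}{\partial w_{ji}} = -(y_i - a_i)\,g'(in_i)\,a_j
$$

For the sigmoid, $g'(in_i) = a_i(1 - a_i)$, which is one reason sigmoid units are convenient here.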

Properties

- Converges to a minimum, but could be a local minimum
- Could be slow to converge

(Note: Training a three-node net is NP-Complete!)

- Must watch for over-fitting just as in decision trees (use validation sets, etc.)
- Network structure? Often two layers suffice; start with relatively few hidden units

Properties (cont.)

- Many variations to the basic back-propagation: e.g. use momentum, where the nth update amount also includes a constant times the previous update amount
- Reduce the learning rate with time (applies to perceptrons as well)
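A minimal sketch of a momentum update for a single weight. The exact form on the original slide is not recoverable, so this uses the common version in which the nth update is the plain gradient step plus a constant (here `momentum`) times the previous update; the names and default values are assumptions.

```python
def momentum_update(gradient, previous_update, rate=0.1, momentum=0.9):
    """Nth update amount: gradient step plus a constant times the last update."""
    return -rate * gradient + momentum * previous_update

def minimize_with_momentum(grad_fn, w, steps=500, rate=0.1, momentum=0.9):
    """Apply momentum updates to one weight w, given its gradient function."""
    update = 0.0
    for _ in range(steps):
        update = momentum_update(grad_fn(w), update, rate, momentum)
        w += update
    return w
```

For example, `minimize_with_momentum(lambda w: 2 * (w - 3), 0.0)` moves the weight toward the minimum of (w - 3)^2 at w = 3.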

NN properties

- Can handle domains with
- continuous and discrete attributes
- Many attributes
- noisy data
- Could be slow at training but fast at evaluation time
- Human understanding of what the network does could be limited
