# Soft Computing Methods

## Soft Computing Methods

J.A. Johnson

Dept. of Math and Computer Science Seminar Series

February 8, 2013

### Outline

• Fuzzy Sets

• Neural Nets

• Rough Sets

• Bayesian Nets

• Genetic Algorithms

### Fuzzy sets

• Fuzzy set theory is a means of specifying how well an object satisfies a vague description.

• A fuzzy set can be defined as a set with fuzzy boundaries.

• Fuzzy sets were first introduced by Zadeh (1965).

### How do we represent a fuzzy set in a computer?

First, the membership function must be determined.

### Example

• Consider the proposition "Nate is tall."

• Is the proposition true if Nate is 5' 10"?

• The linguistic term "tall" does not refer to a sharp demarcation of objects into two classes—there are degrees of tallness.

Fuzzy set theory treats Tall as a fuzzy predicate and says that the truth value of Tall(Nate) is a number between 0 and 1, rather than being either true or false.

Let A denote the fuzzy set of all tall employees and x be a member of the universe X of all employees. What would the membership function μA(x) look like?

• μA(x) = 1 if x is definitely tall

• μA(x) = 0 if x is definitely not tall

• 0 < μA(x) < 1 for borderline cases
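Such a membership function can be sketched in code; a simple piecewise-linear ramp is a common choice (the cut-off heights of 160 cm and 190 cm are illustrative assumptions, not from the slides):

```python
def mu_tall(height_cm: float) -> float:
    """Illustrative membership function for the fuzzy set Tall.

    Heights at or below 160 cm are definitely not tall (0.0),
    heights at or above 190 cm are definitely tall (1.0),
    and heights in between are borderline cases.
    """
    if height_cm <= 160:
        return 0.0
    if height_cm >= 190:
        return 1.0
    return (height_cm - 160) / (190 - 160)

# Nate at 5'10" is about 178 cm: a borderline case with 0 < mu < 1.
print(mu_tall(178))  # 0.6
```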

(Figure: membership grades in a classical (crisp) set versus a fuzzy set.)

### Standard Fuzzy set operations

• Complement: cA(x) = 1 − A(x)

• Intersection: (A ∩ B)(x) = min[A(x), B(x)]

• Union: (A ∪ B)(x) = max[A(x), B(x)]
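These pointwise definitions translate directly to code. A minimal sketch representing fuzzy sets as dictionaries from elements to membership grades:

```python
def complement(a):
    # cA(x) = 1 - A(x)
    return {x: 1.0 - mu for x, mu in a.items()}

def intersection(a, b):
    # (A ∩ B)(x) = min[A(x), B(x)]
    return {x: min(a[x], b[x]) for x in a if x in b}

def union(a, b):
    # (A ∪ B)(x) = max[A(x), B(x)]
    return {x: max(a[x], b[x]) for x in a if x in b}

# Hypothetical membership grades for two fuzzy sets over the same universe.
tall = {"alice": 0.9, "bob": 0.4}
heavy = {"alice": 0.3, "bob": 0.6}

print(intersection(tall, heavy))  # {'alice': 0.3, 'bob': 0.4}
print(union(tall, heavy))         # {'alice': 0.9, 'bob': 0.6}
print(complement(tall))
```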

### Linguistic variables and hedges

• The range of possible values of a linguistic variable represents the universe of discourse of that variable.

• A linguistic variable carries with it the concept of fuzzy set qualifiers, called hedges. Hedges are terms that modify the shape of fuzzy sets.

• For instance, the qualifier "very" performs concentration and creates a new subset (e.g. very, extremely).

• An operation opposite to concentration is dilation; it expands the set (e.g. more or less, somewhat).
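Concentration and dilation are commonly modelled by squaring and square-rooting the membership grade; these exponents are the standard textbook choice, but still a modelling assumption:

```python
def very(mu: float) -> float:
    # Concentration: squashes grades toward 0, narrowing the set.
    return mu ** 2

def more_or_less(mu: float) -> float:
    # Dilation: lifts grades toward 1, widening the set.
    return mu ** 0.5

mu = 0.81
print(round(very(mu), 4))          # 0.6561
print(round(more_or_less(mu), 4))  # 0.9
```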

### Representation of hedges

(Table: hedges with their mathematical expressions and graphical representations.)

### Fuzzy logic

• Fuzzy logic is not logic that is fuzzy, but logic that is used to describe fuzziness.

• Fuzzy logic deals with degrees of truth.

### Building a Fuzzy Expert System

• Specify the problem and define linguistic variables.

• Determine fuzzy sets.

• Elicit and construct fuzzy rules.

• Perform fuzzy inference.

• Evaluate and tune the system.

### References

[1] Michael Negnevitsky. Artificial Intelligence: A Guide to Intelligent Systems, 2nd Edition.

[2] Witold Pedrycz and Fernando Gomide. An Introduction to Fuzzy Sets.

[3] George J. Klir and Bo Yuan. Fuzzy Sets and Fuzzy Logic: Theory and Applications.

[4] W. B. Vasantha Kandasamy. Elementary Fuzzy Matrix Theory and Fuzzy Models for Social Scientists.

[5] Wikipedia: http://en.wikipedia.org/wiki/Fuzzy_logic

[6] Wikipedia: http://en.wikipedia.org/wiki/Fuzzy

### References

• http://www.softcomputing.net/fuzzy_chapter.pdf

• http://www.cs.cmu.edu/Groups/AI/html/faqs/ai/fuzzy/part1/faq-doc-18.html

• http://www.mv.helsinki.fi/home/niskanen/zimmermann_review.pdf

• http://sawaal.ibibo.com/computers-and-technology/what-limits-fuzzy-logic-241157.html

• http://my.safaribooksonline.com/book/software-engineering-and-development/9780763776473/fuzzy-logic/limitations_of_fuzzy_systems#X2ludGVybmFsX0ZsYXNoUmVhZGVyP3htbGlkPTk3ODA3NjM3NzY0NzMvMTUy

### Thanks to

• Ding Xu

For help with researching content and preparation of overheads on Fuzzy Sets

### Artificial Neural Networks

Neuron: the basic information-processing unit.

### Single neural network


### Activation functions

• The Step and Sign activation functions, also called hard-limit functions, are mostly used in decision-making neurons.

• The Sigmoid function transforms the input, which can have any value between plus and minus infinity, into a reasonable value in the range between 0 and 1. Neurons with this function are used in back-propagation networks.

• The Linear activation function provides an output equal to the neuron's weighted input. Neurons with the linear function are often used for linear approximation.
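The three activation functions described above can be sketched as:

```python
import math

def step(x: float) -> int:
    # Hard-limit function: fires (1) once the input reaches 0.
    return 1 if x >= 0 else 0

def sign(x: float) -> int:
    # Hard-limit function with outputs -1 and +1.
    return 1 if x >= 0 else -1

def sigmoid(x: float) -> float:
    # Maps any real input into (0, 1); used in back-propagation networks.
    return 1.0 / (1.0 + math.exp(-x))

def linear(x: float) -> float:
    # Output equals the weighted input; used for linear approximation.
    return x

print(step(-0.2), sign(-0.2), round(sigmoid(0.0), 2), linear(3.5))
# 0 -1 0.5 3.5
```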

### The Algorithm of single neural network

• Step 1: Initialization

Set the initial weights w1, w2, ..., wn and the threshold θ to random numbers in the range [−0.5, 0.5].

• Step 2: Activation

Compute the neuron output for the current example, applying the step activation function to the weighted sum of the inputs minus the threshold.

• Step 3: Weight training

Adjust the weights by an amount proportional to the error, the difference between the desired and actual outputs.

• Step 4: Iteration

Increase iteration p by one, go back to Step 2 and repeat the process until convergence.
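The four steps can be put together into a compact perceptron-training sketch (the learning rate and the AND-gate training data are illustrative choices):

```python
import random

def train_perceptron(data, alpha=0.1, max_epochs=100):
    """data: list of (inputs, desired_output) pairs with outputs 0 or 1."""
    n = len(data[0][0])
    # Step 1: initialize weights and threshold in [-0.5, 0.5].
    w = [random.uniform(-0.5, 0.5) for _ in range(n)]
    theta = random.uniform(-0.5, 0.5)
    for _ in range(max_epochs):                       # Step 4: iteration
        converged = True
        for x, desired in data:
            # Step 2: activation via the step (hard-limit) function.
            y = 1 if sum(wi * xi for wi, xi in zip(w, x)) - theta >= 0 else 0
            error = desired - y
            if error != 0:
                converged = False
                # Step 3: weight training (delta rule).
                w = [wi + alpha * xi * error for wi, xi in zip(w, x)]
                theta -= alpha * error
        if converged:
            break
    return w, theta

# Learn the AND function, a linearly separable toy problem.
AND = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
weights, threshold = train_perceptron(AND)
```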


### References

1. http://pages.cs.wisc.edu/~bolo/shipyard/neural/local.html

2. Stuart J. Russell and Peter Norvig. Artificial Intelligence: A Modern Approach. Prentice Hall, 2009.

3. http://www.roguewave.com/Portals/0/products/imsl-numerical-libraries/c-library/docs/6.0/stat/default.htm?turl=multilayerfeedforwardneuralnetworks.htm

4. Lynne E. Parker. Notes on Multilayer, Feedforward Neural Networks.

5. http://www.doc.ic.ac.uk/~nd/surprise_96/journal/vol4/cs11/report.html#Why use neural networks

### Thanks to

• Hongming(Homer) Zuo

• Danni Ren

For help with researching content and preparation of overheads on Neural Nets

### Rough Sets

• Introduced by Zdzislaw Pawlak in the early 1980s.

• Formal framework for the automated transformation of data into knowledge.

• Simplifies the search for dominant attributes in an inconsistent information table, leading to the derivation of shorter if-then rules.

### Inconsistent Information Table

Certain rules for the examples are:

(Temperature, normal) → (Flu, no),

(Headache, yes) and (Temperature, high) → (Flu, yes),

(Headache, yes) and (Temperature, very_high) → (Flu, yes).

Uncertain (or possible) rules are:

(Temperature, high) → (Flu, yes),

(Temperature, very_high) → (Flu, yes).

### Strength of a Rule

• Weights measure the strength of a rule:

• Coverage = (# elements covered by rule) / (# elements in universe)

• Support = (# positive elements covered by rule) / (# elements in universe)

• Degree of certainty = (support / coverage) × 100
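These weights can be computed directly. A sketch using a hypothetical universe of examples modelled on the flu table:

```python
def rule_strength(universe, condition, decision):
    """condition/decision: predicates mapping an example to True/False."""
    covered = [e for e in universe if condition(e)]
    positive = [e for e in covered if decision(e)]
    n = len(universe)
    coverage = len(covered) / n
    support = len(positive) / n
    certainty = support / coverage * 100 if coverage else 0.0
    return coverage, support, certainty

# Hypothetical examples (not the slide's full table).
examples = [
    {"temperature": "high", "flu": "yes"},
    {"temperature": "high", "flu": "no"},
    {"temperature": "normal", "flu": "no"},
    {"temperature": "very_high", "flu": "yes"},
]

# Strength of the uncertain rule (Temperature, high) -> (Flu, yes).
cov, sup, cert = rule_strength(
    examples,
    condition=lambda e: e["temperature"] == "high",
    decision=lambda e: e["flu"] == "yes",
)
print(cov, sup, cert)  # 0.5 0.25 50.0
```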

### Attribute Reduction

• Which are the dominate attributes?

• How do we determine redundant attributes?

### Indiscernibility Classes

• An indiscernibility class, with respect to a set of attributes X, is defined as a set of examples whose values agree on every attribute x ∈ X.

• For example, the indiscernibility classes with respect to attributes X = {Headache, Temperature} are {e1}, {e2}, {e3}, {e4}, {e5, e7} and {e6, e8}
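Computing indiscernibility classes amounts to grouping examples by their values on the chosen attributes. The rows below are hypothetical but constructed to reproduce the classes quoted above:

```python
from collections import defaultdict

def indiscernibility_classes(table, attributes):
    """Group example ids whose values agree on all the given attributes."""
    classes = defaultdict(list)
    for example_id, row in table.items():
        key = tuple(row[a] for a in attributes)
        classes[key].append(example_id)
    return sorted(classes.values())

# Hypothetical rows consistent with the classes quoted above.
table = {
    "e1": {"Headache": "yes", "Temperature": "normal"},
    "e2": {"Headache": "yes", "Temperature": "high"},
    "e3": {"Headache": "yes", "Temperature": "very_high"},
    "e4": {"Headache": "no",  "Temperature": "normal"},
    "e5": {"Headache": "no",  "Temperature": "high"},
    "e6": {"Headache": "no",  "Temperature": "very_high"},
    "e7": {"Headache": "no",  "Temperature": "high"},
    "e8": {"Headache": "no",  "Temperature": "very_high"},
}
print(indiscernibility_classes(table, ["Headache", "Temperature"]))
# [['e1'], ['e2'], ['e3'], ['e4'], ['e5', 'e7'], ['e6', 'e8']]
```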

A rough set is defined by a lower approximation and an upper approximation. Let Y1, ..., Yk denote the indiscernibility classes. The lower approximation of X is

X̲ = ∪ { Yi : Yi ⊆ X }

and the upper approximation of X is

X̄ = ∪ { Yi : Yi ∩ X ≠ ∅ }

(Figure: the lower and upper approximations of a set X, drawn over the indiscernibility classes e1–e8.)

If the indiscernibility classes with and without attribute A are identical then attribute A is redundant.


## Example: Identifying Edible Mushrooms with the ILA Algorithm

### Mushroom Dataset

The dataset contains 8124 entries of different mushrooms.

Each entry (mushroom) has 22 different attributes: cap-shape, cap-surface, cap-color, bruises, odor, gill-attachment, gill-spacing, gill-size, gill-color, stalk-shape, stalk-root, stalk-surface-above-ring, stalk-surface-below-ring, stalk-color-above-ring, stalk-color-below-ring, veil-type, veil-color, ring-number, ring-type, spore-print-color, population, and habitat.

### Soft Values for Attributes

One of the attributes chosen is odor. All of its possible values are: almond, anise, creosote, fishy, foul, musty, none, pungent, and spicy.

### Example of the dataset

Each row gives the class (p = poisonous, e = edible) followed by the 22 attribute values:

• p,x,s,n,t,p,f,c,n,k,e,e,s,s,w,w,p,w,o,p,k,s,u

• e,x,s,y,t,a,f,c,b,k,e,c,s,s,w,w,p,w,o,p,n,n,g

• e,b,s,w,t,l,f,c,b,n,e,c,s,s,w,w,p,w,o,p,n,n,m

• p,x,y,w,t,p,f,c,n,n,e,e,s,s,w,w,p,w,o,p,k,s,u

• e,x,s,g,f,n,f,w,b,k,t,e,s,s,w,w,p,w,o,e,n,a,g

### ILA Algorithm

• Invented by Mehmed R. Tolun and Saleh M. Abu-Soud

• Used for data mining

• Runs in a stepwise forward iteration

• Searches for a description that covers a relatively large number of training examples

• Outputs IF-THEN rules

### General Requirements:

• Examples are listed in a tabular form where each row corresponds to an example and each column contains attribute values.

• A set of m training examples, each composed of k attributes and a class attribute with n possible decisions.

• A rule set R with an initial value of Ø

• All rows in the table are initially unmarked

### ILA Algorithm Steps

• Step 1: Partition the table containing m examples into n sub-tables. One table for each possible value of the class attribute.

(Steps 2 through 8 are repeated for each sub-table.)

• Step 2: Initialize attribute combination count j as j = 1.

• Step 3: For the sub-table under consideration, divide the attribute list into distinct combinations, each combination with j distinct attributes.

### ILA Algorithm Steps

• Step 4: For each combination of attributes, count the number of occurrences of attribute values that appear under that combination in unmarked rows of the sub-table under consideration, but that do not appear under the same combination of attributes in any other sub-table. Call the first combination with the maximum number of occurrences the max-combination.

### ILA Algorithm Steps

• Step 5: If max-combination = Ø, increase j by 1 and go to Step 3.

• Step 6: Mark all rows of the sub-table under consideration, in which the values of max-combination appear, as classified.

• Step 7: Add a rule to R whose left-hand side comprises the attribute names of max-combination with their values, separated by AND operator(s), and whose right-hand side contains the decision attribute value associated with the sub-table.

• Step 8: If all rows are marked as classified, then move on to process another sub-table and go to Step 3. Otherwise (i.e., if there are still unmarked rows) go to Step 4. If no sub-tables are available, exit with the set of rules obtained so far.
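The eight steps can be sketched as a compact, simplified implementation (the toy data at the bottom is hypothetical, not the real mushroom dataset):

```python
from itertools import combinations

def ila(examples, attributes, class_attr):
    """Simplified sketch of the ILA steps above; assumes consistent data."""
    rules = []
    classes = sorted({e[class_attr] for e in examples})
    for c in classes:                                   # Step 1: sub-tables
        sub = [e for e in examples if e[class_attr] == c]
        others = [e for e in examples if e[class_attr] != c]
        unmarked = list(range(len(sub)))
        j = 1                                           # Step 2
        while unmarked and j <= len(attributes):
            best, best_count = None, 0
            for combo in combinations(attributes, j):   # Step 3
                counts = {}
                for i in unmarked:                      # Step 4
                    vals = tuple(sub[i][a] for a in combo)
                    # count only value combinations absent from other sub-tables
                    if all(tuple(o[a] for a in combo) != vals for o in others):
                        counts[vals] = counts.get(vals, 0) + 1
                for vals, n in counts.items():
                    if n > best_count:
                        best, best_count = (combo, vals), n
            if best is None:                            # Step 5
                j += 1
                continue
            combo, vals = best
            # Step 6: mark covered rows; Step 7: emit an IF-THEN rule.
            unmarked = [i for i in unmarked
                        if tuple(sub[i][a] for a in combo) != vals]
            rules.append((dict(zip(combo, vals)), c))   # Step 8: loop
    return rules

# Hypothetical toy data (not the full 8124-entry mushroom dataset).
data = [
    {"odor": "almond", "color": "white", "class": "edible"},
    {"odor": "anise",  "color": "brown", "class": "edible"},
    {"odor": "foul",   "color": "white", "class": "poisonous"},
]
for lhs, cls in ila(data, ["odor", "color"], "class"):
    print("If", lhs, "then", cls)
# If {'odor': 'almond'} then edible
# If {'odor': 'anise'} then edible
# If {'odor': 'foul'} then poisonous
```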

### Output of the ILA Algorithm

25 rules (first 12 rules):

If stalk-color-above-ring=gray then edible.

If odor=almond then edible.

If odor=anise then edible.

If population=abundant then edible.

If stalk-color-below-ring=gray then edible.

If habitat=waste then edible.

If stalk-color-above-ring=orange then edible.

If population=numerous then edible.

If ring-type=flaring then edible.

If cap-shape=sunken then edible.

If spore-print-color=black and odor=none then edible.

If spore-print-color=brown and odor=none then edible.

| Rule No. | TP  | FN | Error |
|----------|-----|----|-------|
| 1        | 576 | 0  | 0.0   |
| 2        | 400 | 0  | 0.0   |
| 3        | 400 | 0  | 0.0   |
| 4        | 384 | 0  | 0.0   |
| 5        | 384 | 0  | 0.0   |
| 6        | 192 | 0  | 0.0   |
| 7        | 192 | 0  | 0.0   |
| 8        | 144 | 0  | 0.0   |
| 9        | 48  | 0  | 0.0   |
| 10       | 32  | 0  | 0.0   |
| 11       | 608 | 0  | 0.0   |
| 12       | 608 | 0  | 0.0   |

### Output of the ILA Algorithm

25 rules (remaining 13 rules):

If stalk-color-below-ring=brown and gill-spacing=crowded then edible.

If spore-print-color=white and ring-number=two then edible.

If odor=foul then poisonous.

If gill-color=buff then poisonous.

If odor=pungent then poisonous.

If odor=creosote then poisonous.

If spore-print-color=green then poisonous.

If odor=musty then poisonous.

If stalk-color-below-ring=yellow then poisonous.

If cap-surface=grooves then poisonous.

If cap-shape=conical then poisonous.

If stalk-surface-above-ring=silky and gill-spacing=close then poisonous.

If population=clustered and cap-color=white then poisonous.

| Rule No. | TP   | FN | Error |
|----------|------|----|-------|
| 13       | 48   | 0  | 0.0   |
| 14       | 192  | 0  | 0.0   |
| 15       | 2160 | 0  | 0.0   |
| 16       | 1152 | 0  | 0.0   |
| 17       | 256  | 0  | 0.0   |
| 18       | 192  | 0  | 0.0   |
| 19       | 72   | 0  | 0.0   |
| 20       | 36   | 0  | 0.0   |
| 21       | 24   | 0  | 0.0   |
| 22       | 4    | 0  | 0.0   |
| 23       | 1    | 0  | 0.0   |
| 24       | 16   | 0  | 0.0   |
| 25       | 3    | 0  | 0.0   |

### Introduction to Bayesian Networks

• A probabilistic graphical model that represents a set of random variables and their conditional dependencies via a directed acyclic graph (DAG).

• Nodes that are not connected represent variables that are conditionally independent of each other.

### Introduction to Bayesian Networks

• Each node is associated with a probability function that takes as input a particular set of values for the node's parent variables and gives the probability of the variable represented by the node.

• If the parents are m Boolean variables, then the probability function can be represented by a table of 2^m entries, one entry for each of the 2^m possible combinations of its parents being true or false.

### Example

• Suppose there are two events which could cause grass to be wet: either the sprinkler is on or it's raining. Also, suppose that the rain has a direct effect on the use of the sprinkler (namely, that when it rains, the sprinkler is usually not turned on). Then the situation can be modeled with a Bayesian network. All three variables have two possible values, T (for true) and F (for false).

The joint probability function is:

P(G,S,R) = P(G | S,R)P(S | R)P(R)

where the names of the variables have been abbreviated to G = Grass wet, S = Sprinkler, and R = Rain.

• The model can answer questions like "What is the probability that it is raining, given the grass is wet?"

• By using the conditional probability formula and summing over the nuisance variable S:

P(R = T | G = T) = P(G = T, R = T) / P(G = T) = Σ_S P(G = T, S, R = T) / Σ_{S,R} P(G = T, S, R)
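That computation can be done by brute-force enumeration of the joint distribution. The CPT values below are illustrative numbers commonly used with this example, not taken from the slides:

```python
from itertools import product

# Illustrative conditional probability tables for the sprinkler network.
P_R = {True: 0.2, False: 0.8}                   # P(Rain = r)
P_S = {True: {True: 0.01, False: 0.4},          # P(Sprinkler = s | Rain = r) = P_S[s][r]
       False: {True: 0.99, False: 0.6}}
P_G = {(True, True): 0.99, (True, False): 0.9,  # P(Grass = T | Sprinkler, Rain)
       (False, True): 0.8, (False, False): 0.0}

def joint(g, s, r):
    # P(G, S, R) = P(G | S, R) P(S | R) P(R)
    p_g = P_G[(s, r)] if g else 1.0 - P_G[(s, r)]
    return p_g * P_S[s][r] * P_R[r]

# P(Rain = T | Grass = T): sum the joint over the nuisance variable S.
numerator = sum(joint(True, s, True) for s in (True, False))
denominator = sum(joint(True, s, r) for s, r in product((True, False), repeat=2))
print(round(numerator / denominator, 4))  # 0.3577
```

With these numbers, wet grass raises the probability of rain from the prior 0.2 to roughly 0.36.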

### Applications

• Biology and bioinformatics (gene regulatory networks, protein structure, gene expression analysis).

• Medicine.

• Document classification.

• Information retrieval.

• Image processing.

• Data fusion.

• Decision support systems.

• Engineering.

• Gaming.

• Law.

### Reference

[1] "Bayesian Probability Theory" in George F. Luger, William A. Stubblefield, "Artificial Intelligence: Structures and Strategies for Complex Problem Solving", Second Edition, The Benjamin/Cummings Publishing Company, Inc., ISBN 0-8053-4780-1.

[2] "Bayesian Reasoning" in Michael Negnevitsky, "Artificial Intelligence: A Guide to Intelligent Systems", Third Edition, Pearson Education Limited, ISBN 978-1-4082-2574-5.

[3] "Bayesian Network" in http://en.wikipedia.org/wiki/Bayesian_network.

[4] "Probabilistic Graphical Model" in http://en.wikipedia.org/wiki/Graphical_model.

[5] "Random Variables" in http://en.wikipedia.org/wiki/Random_variables.

[6] "Conditional Independence" in http://en.wikipedia.org/wiki/Conditional_independence.

### Reference

[7] "Directed Acyclic Graph" in http://en.wikipedia.org/wiki/Directed_acyclic_graph.

[8] "Inference" in http://en.wikipedia.org/wiki/Inference.

[9] "Machine Learning" in http://en.wikipedia.org/wiki/Machine_learning.

[10] "History" in http://en.wikipedia.org/wiki/Bayesian_network.

[11] "Example" in http://en.wikipedia.org/wiki/Bayesian_network.

[12] "Applications" in http://en.wikipedia.org/wiki/Bayesian_network.

[13] "A simple Bayesian Network" figure in http://en.wikipedia.org/wiki/File:SimpleBayesNet.svg.

### Reference

[14] "Representation" in http://www.cs.ubc.ca/~murphyk/Bayes/bnintro.html#repr.

[15] "Conditional Independence in Bayes Nets" in http://www.cs.ubc.ca/~murphyk/Bayes/bnintro.html#repr.

[16] "Representation Example" figure in http://www.cs.ubc.ca/~murphyk/Bayes/bnintro.html#repr.

[17] "Conditional Independence" figure in http://www.cs.ubc.ca/~murphyk/Bayes/bnintro.html#repr.

[18] "Inference and Learning" in http://en.wikipedia.org/wiki/Bayesian_network.

[19] "Decision Theory" in http://www.cs.ubc.ca/~murphyk/Bayes/bnintro.html#repr.

### Thanks to

• Sheikh Shushmita Jahan

For help with researching content and preparation of overheads on Bayesian Nets

### Genetic Algorithms

• Use random numbers to search for near-optimal solutions.

• Use a process similar to the Theory of Evolution by Natural Selection proposed by Charles Darwin in his book On The Origin of Species.

• Apply the same rules as Natural Selection in order to find near-optimal solutions.

• An initial population of candidate solutions is generated,

• the fitness of each solution is evaluated, and

• the most-fit solutions are chosen to reproduce.

### Candidate Solutions

• An array of bytes:

• 00010101 00111010 11110000

• May be converted to a string representation

### Fitness Function

• Typically an integer score assigned to each candidate solution

• There should be a preset maximum or minimum score (to help with termination)

• Designing the fitness function is one of the bigger challenges of designing a genetic algorithm

### Crossover

• An operation which is analogous to biological reproduction, in which parts of parent solutions are combined in order to produce offspring solutions.

• Typically, a single crossover point is chosen and the data beyond it are swapped in the children.
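A minimal sketch of single-point crossover on bit strings:

```python
import random

def single_point_crossover(parent_a: str, parent_b: str):
    """Swap the tails of two equal-length bit strings past a random point."""
    point = random.randint(1, len(parent_a) - 1)
    child_a = parent_a[:point] + parent_b[point:]
    child_b = parent_b[:point] + parent_a[point:]
    return child_a, child_b

a, b = "00010101", "11110000"
print(single_point_crossover(a, b))
```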

### Mutation

• An operation aimed at including diversity into successive generations of solutions.

• A mutation takes an existing solution to a problem and alters it in some way before including it in the next generation.

• Using crossover points and mutation factors, offspring solutions are produced and added to the population.

• This procedure is repeated until a termination condition is reached (e.g. sufficient fitness, time limit exceeded).
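Bit-flip mutation can be sketched as flipping each bit independently with a small probability (the rate is an illustrative choice):

```python
import random

def mutate(solution: str, rate: float = 0.05) -> str:
    """Flip each bit independently with the given probability."""
    return "".join(
        ("1" if bit == "0" else "0") if random.random() < rate else bit
        for bit in solution
    )

print(mutate("00010101"))
```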

### Initialization

• The creation of an initial population of solutions

• Random bytes or strings are generated:

solutions = new array(size)

for (i = 0; i < size; i++)

    solution = new Solution

    solution.value = random bytes or strings

    solution.fitness = 0

    solutions[i] = solution

endfor

### Selection

Individual solutions are measured against the fitness function, and marked for either reproduction or removal

### Selection Cont.

for (i = 0; i < size; i++)

    solutions[i].fitness = fitnessFunction(solutions[i])

endfor

next = new array(maxSolutionsPerGeneration)

for (i = 0; i < maxSolutionsPerGeneration; i++)

    fittest = solutions[0]

    for (j = 0; j < size; j++)

        if (fittest.fitness < solutions[j].fitness)

            fittest = solutions[j]

        endif

    endfor

    next[i] = fittest

    fittest.fitness = -infinity    // exclude this solution from being picked again

endfor

solutions = next

### Overall Algorithm

initial population

fitness function on individual solutions of initial population

average fitness of all solutions

loop (until terminating condition)

select x solutions for reproduction

combine pairs randomly

mutate

evaluate fitness

determine average fitness

end loop
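Putting the pieces together, a minimal genetic algorithm maximizing the number of 1-bits (the fitness function, population size, and rates are all illustrative choices):

```python
import random

GENES = 16

def fitness(sol: str) -> int:
    # Toy fitness function: count of 1-bits (the maximum is GENES).
    return sol.count("1")

def crossover(a: str, b: str) -> str:
    point = random.randint(1, GENES - 1)
    return a[:point] + b[point:]

def mutate(sol: str, rate: float = 0.02) -> str:
    return "".join(("1" if c == "0" else "0") if random.random() < rate else c
                   for c in sol)

def genetic_algorithm(pop_size: int = 20, generations: int = 100) -> str:
    # initial population of random bit strings
    population = ["".join(random.choice("01") for _ in range(GENES))
                  for _ in range(pop_size)]
    for _ in range(generations):
        best = max(population, key=fitness)
        if fitness(best) == GENES:       # terminating condition
            return best
        # select the fitter half of the population for reproduction
        parents = sorted(population, key=fitness, reverse=True)[:pop_size // 2]
        # combine random pairs, mutate, and keep the current best (elitism)
        population = [best] + [mutate(crossover(random.choice(parents),
                                                random.choice(parents)))
                               for _ in range(pop_size - 1)]
    return max(population, key=fitness)

random.seed(42)
print(genetic_algorithm())
```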

### Thanks to

• Devon Noel de Tilly

• Tyler Chamberland

For help with researching content and preparation of overheads on Genetic Algorithms.

### Hybridization (FS/NN)

Fuzzy systems lack the capabilities of machine learning, as well as neural-network-style memory and pattern recognition; therefore, hybrid systems (e.g. neuro-fuzzy systems) are becoming more popular for specific applications.

### Hybridization (RS/NN)

The rough sets paradigm permits reducing the number of inputs to a neural network, and assists with assigning initial weights that are likely to make the NN converge more quickly.