
SA, GA and GSA in Fuzzy Systems


Presentation Transcript


  1. SA, GA and GSA in Fuzzy Systems Supervisor: Prof. Ho Cheng-Seen Presented by: Irfan Subakti 司馬伊凡 (M9215801) EE601-2 NTUST, February 9th 2004

  2. Fields of Artificial Intelligence (AI)

  3. Simulated Annealing (SA)
  • SA is a stochastic iterative improvement method for solving combinatorial optimization problems.
  • SA generates a single sequence of solutions and searches for an optimum solution along this search path.
  • SA starts with a given initial solution x0.
  • At each step, SA generates a candidate solution x' by changing a small fraction of the current solution x.
  • SA accepts the candidate solution as the new solution with probability min{1, e^(-Δf/T)}, where Δf = f(x') - f(x) is the cost difference between the current solution x and the candidate solution x', and T is a control parameter called temperature.
  • A key point of SA is that it accepts up-hill moves with probability e^(-Δf/T).
  • This allows SA to escape from local minima.
  • However, SA cannot cover a large region of the solution space within a limited computation time, because SA is based on small moves.

  4. Simulated Annealing (SA) (continue)
  • Pseudo-code of Simulated Annealing (SA) (Koakutsu et al. [20])

  SA_algorithm(Na, T0, α) {
    x ← x0;                          /* initial solution */
    T ← T0;                          /* initial temperature */
    while (system is not frozen) {
      for (loop = 1; loop ≤ Na; loop++) {
        x' ← Mutate(x);
        Δf ← f(x') - f(x);
        r ← random number between 0 and 1;
        if (Δf < 0 or r < exp(-Δf/T))
          x ← x';
      }
      T ← T * α;                     /* lower temperature */
    }
    return x;
  }
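  To make the loop above concrete, here is a minimal runnable sketch in Python; the toy cost function, the Gaussian move, and the parameter defaults are illustrative assumptions, not part of the original pseudocode.

  import math
  import random

  def sa(x0, f, mutate, T0=100.0, alpha=0.7, Na=50, frozen=1e-5):
      # Simulated annealing: accept up-hill moves with probability exp(-df/T).
      x, T = x0, T0
      while T > frozen:                      # "system is not frozen"
          for _ in range(Na):
              x_new = mutate(x)
              df = f(x_new) - f(x)
              if df < 0 or random.random() < math.exp(-df / T):
                  x = x_new                  # accept down-hill always, up-hill by chance
          T *= alpha                         # lower temperature
      return x

  # Toy usage: minimize (x - 3)^2 over the reals with small Gaussian moves.
  best = sa(0.0, lambda x: (x - 3) ** 2,
            lambda x: x + random.gauss(0, 0.5))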

  5. Genetic Algorithms (GA)
  • GA is another approach for solving combinatorial optimization problems.
  • GA applies an evolutionary mechanism to optimization problems.
  • It starts with a population of initial solutions.
  • Each solution has a fitness value, which is a measure of the quality of the solution.
  • At each step, called a generation, GA produces a set of candidate solutions, called child solutions (offspring), using two types of genetic operators: mutation and crossover.
  • It selects good solutions as survivors to the next generation according to the fitness value.
  • The mutation operator takes a single parent and modifies it randomly in a localized manner, so that it makes a small jump in the solution space.
  • On the other hand, the crossover operator takes 2 solutions as parents and creates their child solutions by combining the partial solutions of the parents.
  • Crossover tends to create child solutions which differ from both parent solutions.
  • This results in a large jump in the solution space.

  6. Genetic Algorithms (GA) (continue)
  • There are 2 key differences between GA and SA.
  • GA maintains a population of solutions and uses them to search the solution space.
  • GA uses the crossover operator, which causes a large jump in the solution space.
  • Feature 2 allows GA to globally search a large region of the solution space, but GA has no explicit way to produce a sequence of small moves in the solution space.
  • Mutation creates a single small move one at a time instead of a sequence of small moves.
  • As a result, GA cannot search local regions of the solution space exhaustively.

  7. Genetic Algorithms (GA) (continue)

  8. Genetic Algorithms (GA) (continue)
  • Pseudo-code of Genetic Algorithms (GA) (Koakutsu et al. [20])

  GA_algorithm(L, Rc, Rm) {
    X ← {x1, ..., xL};               /* initial population */
    while (stop criterion is not met) {
      X' ← ∅;
      while (number of children created < L × Rc) {
        select two solutions xi, xj from X;
        x' ← Crossover(xi, xj);
        X' ← X' + {x'};
      }
      select L solutions from X ∪ X' as a new population;
      while (number of solutions mutated < L × Rm) {
        select one solution xk from X;
        xk ← Mutate(xk);
      }
    }
    return the best solution in X;
  }
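  A minimal runnable sketch of this scheme in Python, assuming bit-string solutions, one-point crossover, and simple truncation survivor selection; these representation choices are illustrative assumptions, not prescribed by Koakutsu et al. [20].

  import random

  def ga(f, L=20, Rc=0.8, Rm=0.1, n_bits=16, generations=100):
      # Toy GA maximizing fitness f over bit strings.
      pop = [[random.randint(0, 1) for _ in range(n_bits)] for _ in range(L)]
      for _ in range(generations):
          children = []
          while len(children) < int(L * Rc):
              xi, xj = random.sample(pop, 2)
              cut = random.randrange(1, n_bits)      # one-point crossover
              children.append(xi[:cut] + xj[cut:])
          # survivor selection: keep the L fittest of parents plus children
          pop = sorted(pop + children, key=f, reverse=True)[:L]
          for _ in range(int(L * Rm)):               # mutate a few survivors
              k = random.randrange(L)
              pop[k][random.randrange(n_bits)] ^= 1  # flip one bit
      return max(pop, key=f)

  # Toy usage: maximize the number of ones in the bit string.
  best = ga(sum)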

  9. Genetic Simulated Annealing (GSA)
  • In order to improve the performance of GA and SA, several hybrid algorithms have been proposed.
  • The mutation used in GA tends to destroy some good features of solutions at the final stages of the optimization process.
  • SA-based hybrids: Sirag and Weisser [10] proposed a thermodynamic genetic operator, which incorporates an annealing schedule to control the probability of applying the mutation, while Adler [11] used an SA-based acceptance function to control the probability of accepting a new solution produced by the mutation.
  • GA-based hybrids: more recent works on GA-oriented hybrids are the Simulated Annealing Genetic Algorithm (SAGA) method proposed by Brown et al. [12] and the Annealing Genetic (AG) method proposed by Lin et al. [13].
  • Both methods divide each "generation" into 2 phases: a GA phase and an SA phase.
  • GA generates a set of new solutions using the crossover operator, and then SA further refines each solution in the population.
  • While SAGA uses the same annealing schedule for each SA phase, AG tries to optimize different schedules for different SA phases.

  10. GSA (continue)
  • The above GA-oriented hybrid methods try to incorporate the local stochastic hill-climbing feature of SA into GA.
  • Since they incorporate a full SA run into each generation, and the number of generations is usually very large, GA-oriented hybrid methods are very time-consuming.
  • SA-oriented hybrid approaches attempt to adopt the global crossover operations of GA into SA.
  • Parallel Genetic Simulated Annealing (PGSA) [14, 15] is a parallel version of SA incorporating GA features.
  • During the parallel SA-based search, crossover is used to generate new solutions in order to enlarge the search region of SA.
  • GSA was proposed by Koakutsu et al. [20].
  • While PGSA generates the seeds of its SA local searches in parallel, i.e., the order of applying each SA local search is independent, GSA generates the seeds sequentially, i.e., the seed of an SA local search depends on the best-so-far solutions of all previous SA local searches.
  • This sequential approach seems to generate better child solutions.
  • In addition, compared to PGSA, GSA uses fewer crossover operations, since it applies crossover only when the SA local search reaches a flat surface and it is time to jump in the solution space.

  11. GSA (continue)
  • GSA starts with a population X = {x1, …, xNp} and repeatedly applies 3 operations: SA-based local search, GA-based crossover, and population update.
  • The SA-based local search produces a candidate solution x' by changing a small fraction of the state of x.
  • The candidate solution is accepted as the new solution with probability min{1, e^(-Δf/T)}.
  • GSA preserves the local best-so-far solution x*L during the SA-based local search.
  • When the search reaches a flat surface or the system is frozen, GSA produces a large jump in the solution space by using GA-based crossover.
  • GSA picks a pair of parent solutions xj and xk at random from the population X such that f(xj) ≤ f(xk), applies the crossover operator, and then replaces the worst solution xi with the new solution produced by the crossover operator.
  • At the end of each SA-based local search, GSA updates the population by replacing the current solution xi with the local best-so-far solution x*L.
  • GSA terminates when the CPU time reaches a given limit, and reports the global best-so-far solution x*G.

  12. GSA (continue)
  • Pseudo-code of GSA (Koakutsu et al. [20])

  GSA_algorithm(Np, Na, T0, α) {
    X ← {x1, ..., xNp};                /* initialize population */
    x*L ← the best solution among X;   /* initialize local best-so-far */
    x*G ← x*L;                         /* initialize global best-so-far */
    while (not reach CPU time limit) {
      T ← T0;                          /* initialize temperature */
      /* jump */
      select the worst solution xi from X;
      select two solutions xj, xk from X such that f(xj) ≤ f(xk);
      xi ← Crossover(xj, xk);
      /* SA-based local search */
      while (not frozen or not meet stopping criterion) {
        for (loop = 1; loop ≤ Na; loop++) {
          x' ← Mutate(xi);
          Δf ← f(x') - f(xi);
          r ← random number between 0 and 1;
          if (Δf < 0 or r < exp(-Δf/T))
            xi ← x';
          if (f(xi) < f(x*L))
            x*L ← xi;                  /* update local best-so-far */
        }
        T ← T * α;                     /* lower temperature */
      }
      if (f(x*L) < f(x*G))
        x*G ← x*L;                     /* update global best-so-far */
      /* update population */
      xi ← x*L;
      f(x*L) ← +∞;                     /* reset current local best-so-far */
    }
    return x*G;
  }
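  As a cross-check of the pseudocode above, here is a minimal runnable sketch in Python, assuming bit-string solutions, one-point crossover, and a single-bit-flip Mutate for minimization; all names and parameter defaults are illustrative, not from Koakutsu et al. [20].

  import math
  import random
  import time

  def gsa(f, Np=10, Na=30, T0=100.0, alpha=0.7,
          frozen=1e-5, n_bits=16, time_limit=1.0):
      # GSA for minimization: a crossover "jump" seeds each SA local search,
      # and the local best-so-far replaces the current solution afterwards.
      new_solution = lambda: [random.randint(0, 1) for _ in range(n_bits)]
      pop = [new_solution() for _ in range(Np)]
      best_global = min(pop, key=f)[:]
      start = time.time()
      while time.time() - start < time_limit:
          T = T0
          # jump: replace the worst solution by a crossover child
          i = max(range(Np), key=lambda k: f(pop[k]))
          xj, xk = sorted(random.sample(pop, 2), key=f)  # f(xj) <= f(xk)
          cut = random.randrange(1, n_bits)
          pop[i] = xj[:cut] + xk[cut:]
          best_local = pop[i][:]
          # SA-based local search on pop[i]
          while T > frozen:
              for _ in range(Na):
                  x_new = pop[i][:]
                  x_new[random.randrange(n_bits)] ^= 1   # small move
                  df = f(x_new) - f(pop[i])
                  if df < 0 or random.random() < math.exp(-df / T):
                      pop[i] = x_new
                  if f(pop[i]) < f(best_local):
                      best_local = pop[i][:]             # update local best-so-far
              T *= alpha                                 # lower temperature
          if f(best_local) < f(best_global):
              best_global = best_local[:]                # update global best-so-far
          pop[i] = best_local[:]                         # update population
      return best_global

  best = gsa(sum)  # toy usage: minimize the number of ones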

  13. GSA to Estimate Null Values in Generating Weighted Fuzzy Rules from Relational Database Systems and Its Estimation of Multiple Null Values

  Basic Concepts of Fuzzy Sets
  • A fuzzy subset A of the universe of discourse U can be represented as follows:
    A = μA(u1)/u1 + μA(u2)/u2 + … + μA(un)/un   (1)
  • where μA is the membership function of the fuzzy subset A, μA: U → [0, 1], and μA(ui) indicates the grade of membership of ui in the fuzzy set A.
  • If U is a continuous set, then the fuzzy subset A can be represented as follows:
    A = ∫U μA(u)/u   (2)

  Part 1: Estimating Null Values in Relational Database Systems with GSA
  • In [1], Chen et al. described how to estimate null values in relational database systems with genetic algorithms.
  • A linguistic term can be represented by a fuzzy set, which in turn is represented by a membership function. In that paper, the membership functions of the linguistic terms "L", "SL", "M", "SH", and "H" of the attributes "Salary" and "Experience" in the relational database system are adopted from [6].
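  As a small illustration of Eq. (1), a discrete fuzzy subset can be coded as a map from elements to membership grades; the set and grades below are invented for illustration, not taken from [1] or [6].

  # Discrete fuzzy subset per Eq. (1): element u -> membership grade mu_A(u).
  salary_high = {30000: 0.0, 45000: 0.3, 55000: 0.7, 70000: 1.0}

  def grade(fuzzy_set, u):
      # Return mu_A(u), defaulting to 0 for elements outside the support.
      return fuzzy_set.get(u, 0.0)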

  14. … continue
  • Here is a relation in a relational database [4], [6].
  • The degrees of similarity between the values of the attribute "Degree" are listed below:

  15. … continue

  16. … continue • After fuzzification:

  17. … continue
  • The ranks of the nonnumeric values of the attribute "Degree", used to compare two nonnumeric values, are listed below:
    Rank(Bachelor) = 1, Rank(Master) = 2, Rank(Ph.D.) = 3
  • Let X be a nonnumeric attribute. Based on the value Ti.X of the attribute X of tuple Ti and the value Tj.X of the attribute X of tuple Tj, where i ≠ j, the degree of closeness Closeness(Ti, Tj) between tuples Ti and Tj can be calculated by (8) or (9), where Weight(Tj.Degree) and Weight(Tj.Experience) denote the weights of the attributes "Degree" and "Experience", respectively, obtained from the fuzzified values of the attributes "Degree" and "Experience" of tuple Tj, derived from a chromosome:
    If Rank(Ti.X) ≥ Rank(Tj.X) then
      Closeness(Ti, Tj) = Similarity(Ti.X, Tj.X) × Weight(Tj.Degree) + (Ti.Experience / Tj.Experience) × Weight(Tj.Experience)   (8)
    If Rank(Ti.X) < Rank(Tj.X) then
      Closeness(Ti, Tj) = (1 / Similarity(Ti.X, Tj.X)) × Weight(Tj.Degree) + (Ti.Experience / Tj.Experience) × Weight(Tj.Experience)   (9)
  • where Similarity(Ti.X, Tj.X) denotes the degree of similarity between Ti.X and Tj.X, and its value is obtained from a fuzzy similarity matrix of the linguistic terms of the attribute X defined by a domain expert.
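  A sketch of Eqs. (8) and (9) in Python. The experience-ratio factor is reconstructed from context (the original formula did not survive extraction cleanly), so treat it as an assumption; the similarity matrix and the chromosome-derived weights are supplied by the caller.

  # Closeness(Ti, Tj) per Eqs. (8)-(9); the experience-ratio term is an
  # assumption reconstructed from the surrounding pseudocode, not verbatim
  # from Chen et al. [1].
  RANK = {"Bachelor": 1, "Master": 2, "Ph.D.": 3}

  def closeness(ti, tj, similarity, w_degree, w_experience):
      sim = similarity[ti["Degree"]][tj["Degree"]]
      if RANK[ti["Degree"]] < RANK[tj["Degree"]]:
          sim = 1.0 / sim                              # Eq. (9) branch
      ratio = ti["Experience"] / tj["Experience"]
      return sim * w_degree + ratio * w_experience     # Eq. (8)/(9)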

  18. … continue
  • The estimated value ETi.Salary of the attribute "Salary" of tuple Ti is computed from the tuple Tj closest to Ti as follows:
    ETi.Salary = Tj.Salary × Closeness(Ti, Tj)   (10)
  • The estimated error of each tuple is given by (11), where Errori denotes the estimated error between the estimated value ETi.Salary of the attribute "Salary" of tuple Ti and the actual value Ti.Salary of the attribute "Salary" of tuple Ti:
    Errori = (ETi.Salary - Ti.Salary) / Ti.Salary   (11)
  • Let Avg_Error denote the average estimated error of the n tuples based on the combination of weights of the attributes derived from the chromosome, where
    Avg_Error = (1/n) × Σ(i=1..n) |Errori|   (12)
  • Then, we can obtain the fitness degree of this chromosome as follows:
    Fitness Degree = 1 - Avg_Error   (13)
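  A sketch of the fitness evaluation implied by Eqs. (10)-(13), assuming each tuple's estimate comes from the tuple whose closeness is nearest 1.0 (as in the CountCloseness procedure later); the exact multiplicand in Eq. (10) is an assumption where the original formula image was lost.

  def fitness_degree(tuples, closeness):
      # closeness(ti, tj) -> Closeness(Ti, Tj) under the current chromosome.
      total = 0.0
      for i, ti in enumerate(tuples):
          others = [j for j in range(len(tuples)) if j != i]
          # closest tuple: the one whose closeness to ti is nearest 1.0
          j = min(others, key=lambda k: abs(1.0 - closeness(ti, tuples[k])))
          est = tuples[j]["Salary"] * closeness(ti, tuples[j])  # Eq. (10), assumed form
          err = (est - ti["Salary"]) / ti["Salary"]             # Eq. (11)
          total += abs(err)
      return 1.0 - total / len(tuples)                          # Eqs. (12)-(13)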

  19. … continue
  • Here is a table in which one value is a null value; with GSA we try to estimate this value from the other values using the formulas described above.

  20. EvaluationAndBestSelection;
    {find the best solution among the population; it also initializes
     LocalBestChromosomeSoFar and GlobalBestChromosomeSoFar:
       X ← {x1, ..., xNp};               {initialize population}
       x*L ← the best solution among X;  {initialize local best-so-far}
       x*G ← x*L;                        {initialize global best-so-far}
       FitnessDegreeEval ← FitnessDegree of the global best-so-far}
  for i := 1 to number-of-generations do
  begin
    T ← T0;
    EvaluationAndWorstSelection;  {select the worst solution xi from X}
    CrossOver;  {select two solutions xj, xk from X such that f(xj) ≤ f(xk);
                 xi ← Crossover(xj, xk)}
    Mutation;   {update local best-so-far if the value is better:
                   repeat
                     for i := 0 to number-of-mutations do
                     begin
                       f(xi) ← fitness degree of the chromosome before mutation;
                       x' ← Mutate(xi);
                       f(x') ← fitness degree of the chromosome after mutation;
                       Δf ← f(xi) - f(x');
                       r ← random number between 0 and 1;
                       ft ← f(x');
                       if (Δf >= 0) or (r >= exp(-Δf/T)) then
                       begin
                         xi ← x';
                         ft ← f(xi);
                       end;
                       if ft >= FitnessDegreeEval then
                       begin
                         x*L ← xi;               {update local best-so-far}
                         FitnessDegreeEval ← ft;
                         FDLocalBestSoFar ← ft   {get the local best fitness degree}
                       end
                     end;
                     T ← T * α;                  {lower temperature}
                   until T <= FrozenValue}
    CountCloseness(x*L);              {get FD from LocalBestChromosomeSoFar}
    AvgError := AvgError / NumData;
    FDLocalBestSoFar := 1 - AvgError;
    CountCloseness(x*G);              {get FD from GlobalBestChromosomeSoFar}
    AvgError := AvgError / NumData;
    FDGlobalBestSoFar := 1 - AvgError;
    if FDLocalBestSoFar >= FDGlobalBestSoFar then
    begin
      x*G ← x*L;                      {update global best-so-far}
      FitnessDegreeEval := FDGlobalBestSoFar;
    end;
    xi ← x*L;                         {update population}
  end;

  21. Procedure CountCloseness is described below:

  AvgError := 0.0;
  for i := 0 to NumData - 1 do
  begin  {based on all data available}
    BestClosenessEval := MaxInt;
    IdxClosestCloseness := i;
    for j := 0 to NumData - 1 do
      if i <> j then
      begin
        if Rank(Ti.X) ≥ Rank(Tj.X) then
          ClosenessE(Ti, Tj) := Similarity(Ti.X, Tj.X) × Weight(Tj.Degree)
                                + (Ti.Experience / Tj.Experience) × Weight(Tj.Experience)
        else  {if Rank(Ti.X) < Rank(Tj.X)}
          ClosenessE(Ti, Tj) := (1 / Similarity(Ti.X, Tj.X)) × Weight(Tj.Degree)
                                + (Ti.Experience / Tj.Experience) × Weight(Tj.Experience);
        {find the tuple whose closeness is nearest to 1.0 as the}
        {closest tuple to tuple Ti}
        ClosestCloseness := Abs(1 - ClosenessE);
        if ClosestCloseness <= BestClosenessEval then
        begin
          BestClosenessEval := ClosestCloseness;
          IdxClosestCloseness := j;
        end;
      end;
    {then we find the estimated salary and error for every record;}
    {if this record is a null value, we must find another record}
    {whose closeness is nearest to 1}
    if IsNullValue(i) and IsNullValue(IdxClosestCloseness) then
    begin
      PreferIdx := GetPreferIdx;
      ETi.Salary := TPreferIdx.Salary × GetClosenessValue(PreferIdx);
      if TPreferIdx.Salary <> 0 then
        Errori := (ETi.Salary - TPreferIdx.Salary) / TPreferIdx.Salary  {cf. Eq. (11)}
    end
    else
    begin
      ETi.Salary := TIdxClosestCloseness.Salary × GetClosenessValue(IdxClosestCloseness);
      if Ti.Salary <> 0 then
        Errori := (ETi.Salary - Ti.Salary) / Ti.Salary  {Eq. (11)}
    end;
    AvgError := AvgError + Abs(Errori);
  end;

  22. … continue
  • Function GetClosenessValue is described below:
    function GetClosenessValue(Idx)
      Result ← the value in ClosenessE that has the same index as Idx
  • Function GetPreferIdx is described below:
    function GetPreferIdx
      Result ← the index of the value in ClosenessE that is closest to 1 and is not a null value

  23. … continue
  Experiments
  • We ran this program with different parameters, 10 times each.
  • We get the results below.
  Experiments type 1:
  • Mutation Rate = 0.01 = 1%
  • Initial Temperature = 100
  • Alpha = 0.7
  • Frozen Value = 0.00001
  • Index of Null Values = 21 (i.e., the 22nd row/tuple in the relational database)

  24. … continue

  25. … continue

  26. … continue

  27. … continue

  28. … continue

  29. … continue
  Experiments type 2:
  • Mutation Rate = 0.1 = 10%
  • Initial Temperature = 100
  • Alpha = 0.7
  • Frozen Value = 0.00001
  • Index of Null Values = 21 (i.e., the 22nd row/tuple in the relational database)

  30. … continue

  31. … continue

  32. … continue

  33. … continue

  34. … continue • Summary of the experiments

  35. … continue • Summary of the experiments (continue)

  36. … continue
  • For comparison, we took the results from Chen et al. [1].
  • Best chromosome: 0.010 0.071 0.343 0.465 0.505 0.303 0.495 0.081 0.778 0.717 0.303 0.869 0.869 0.828 0.434
  • Below is the result with a population size of 60, 300 generations, a crossover rate of 1.0, and a mutation rate of 0.2.

  37. … continue
  • For further runs, we list the average estimated errors for different parameters of the GA (Chen et al. [1]).

  38. … continue
  • An example result from one of the above runs (from this research, using GSA):
    Size of Population: 60
    Number of Generations: 300
    Mutation Rate (%): 10
    Initial Temperature: 100
    Alpha: 0.7
    Frozen Value: 1E-5
    Index of Null Values: 21
  Best Chromosome:
    Gene-1  Gene-2  Gene-3  Gene-4  Gene-5  Gene-6  Gene-7  Gene-8
    0.719   0.995   0.989   0.485   0.095   0.896   0.277   0.416
    Gene-9  Gene-10 Gene-11 Gene-12 Gene-13 Gene-14 Gene-15
    0.085   0.997   0.183   0.583   0.350   0.652   0.241

  39. … continue
  • An example result from one of the above runs (continue):

    Emp. ID  Degree    Experience  Salary  Salary (Estimated)  Estimated Error
    1        Ph.D.      7.2        63,000  62,889.86           -0.0017482
    2        Master     2.0        37,000  36,847.97           -0.0041090
    3        Bachelor   7.0        40,000  40,128.33            0.0032082
    4        Ph.D.      1.2        47,000  46,538.60           -0.0098170
    5        Master     7.5        53,000  52,978.58           -0.0004042
    6        Bachelor   1.5        26,000  25,970.00           -0.0011540
    7        Bachelor   2.3        29,000  28,967.01           -0.0011375
    8        Ph.D.      2.0        50,000  50,341.15            0.0068230
    9        Ph.D.      3.8        54,000  53,836.28           -0.0030319
    10       Bachelor   3.5        35,000  35,060.59            0.0017310
    11       Master     3.5        40,000  39,876.06           -0.0030986
    12       Master     3.6        41,000  40,875.72           -0.0030312
    13       Master    10.0        68,000  68,087.03            0.0012798
    14       Ph.D.      5.0        57,000  56,731.71           -0.0047068
    15       Bachelor   5.0        36,000  36,051.19            0.0014219
    16       Master     6.2        50,000  49,936.01           -0.0012798
    17       Bachelor   0.5        23,000  22,940.28           -0.0025966
    18       Master     7.2        55,000  54,966.66           -0.0006062
    19       Master     6.5        51,000  50,945.03           -0.0010778
    20       Ph.D.      7.8        65,000  64,938.81           -0.0009414
    21       Master     8.1        64,000  63,933.65           -0.0010367
    22       Ph.D.      8.5        70,000  70,654.72            0.0093531

    Avg Estimated Error: 0.002890636946022
    Time Elapsed: 3h:15m:50s:968ms

  • These results indicate that the proposed method performs better than the GA-based method of Chen et al. [1].

  40. GSA to Estimate Null Values in Generating Weighted Fuzzy Rules from Relational Database Systems and Its Estimation of Multiple Null Values

  Part 2: Estimating Multiple Null Values
  • In Part 1, we dealt only with estimating a single null value.
  • In this part, we try to estimate many values that are null.
  • Recalling the procedure CountCloseness described previously in Part 1, we consider the part below:

    ...
    {then we find the estimated salary and error for every record;}
    {if this record is a null value, we must find another record}
    {whose closeness is nearest to 1}
    if IsNullValue(i) and IsNullValue(IdxClosestCloseness) then
    begin
      PreferIdx := GetPreferIdx;
      ETi.Salary := TPreferIdx.Salary × GetClosenessValue(PreferIdx);
      if TPreferIdx.Salary <> 0 then
        Errori := (ETi.Salary - TPreferIdx.Salary) / TPreferIdx.Salary  {cf. Eq. (11)}
    end
    else
    begin
      ETi.Salary := TIdxClosestCloseness.Salary × GetClosenessValue(IdxClosestCloseness);
      if Ti.Salary <> 0 then
        Errori := (ETi.Salary - Ti.Salary) / Ti.Salary  {Eq. (11)}
    end;

  41. … continue
  • Here is a table in which many values are null; with GSA we try to estimate these values from the remaining values using the formulas described above.

  42. … continue
  • Because there is a checking process regarding null values, we can set one or many null values to be estimated. This is performed in the function GetPreferIdx, as described previously; a sketch of this selection follows after the parameter list below.
  • Of course, as a boundary/quota, at least one value in the column/field SALARY must be non-null in order to estimate the others (if they are null values).
  Experiments
  • We ran this program with different parameters, 10 times each.
  • We get the results below.
  Experiments type 1
  • Size of Population = 60
  • Number of Generations = 300
  • Mutation Rate = 0.01 = 1%
  • Initial Temperature = 100
  • Alpha = 0.7
  • Frozen Value = 0.00001
  • Index of Null Values = 0 (i.e., the 1st row/tuple in the relational database)
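  A sketch of the selection GetPreferIdx performs under this checking process, assuming the closeness values computed for row i are available as a list; the function and variable names are illustrative, not from the original Pascal source.

  def prefer_idx(i, closeness_row, is_null):
      # Fall back to the non-null tuple whose closeness is nearest 1.0
      # when tuple i and its closest neighbour are both null.
      candidates = [j for j in range(len(closeness_row))
                    if j != i and not is_null(j)]
      if not candidates:
          raise ValueError("at least one SALARY value must be non-null")
      return min(candidates, key=lambda j: abs(1.0 - closeness_row[j]))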

  43. … continue • Index of Null Values = 0 (i.e., the 1st row/tuple in the relational database)

  44. … continue • Index of Null Values = 0,1 (i.e., the 1st and 2nd rows/tuples in the relational database)

  45. … continue • Index of Null Values = 0,1,2

  46. … continue • Index of Null Values = 0,1,2,3

  47. … continue • Index of Null Values = 0,1,2,3,4

  48. … continue • Index of Null Values = 0,1,2,3,4,5

  49. … continue • Index of Null Values = 0,1,2,3,4,5,6
