Memory approaches to improve multi-start constructive heuristics

Celso C. Ribeiro
Universidade Federal Fluminense, Brazil
Joint work with Eraldo Fernandes (M.Sc., PUC-Rio, Brazil)
WEA’2005 – IV Workshop on Experimental and Efficient Algorithms
Santorini, May 2005

Summary
• Application: DNA sequencing
• Motivation: sequencing by hybridization
• Multi-start randomized constructive heuristic
• Adaptive memory strategy
• Vocabulary building
• Complete heuristic: MS+MEM+VB
• Computational experiments
• Numerical results and comparisons
• Concluding remarks

DNA sequencing
• DNA molecule: sequence formed by a combination of four different nucleotide bases: A, C, G, and T
• Each DNA molecule may be represented as a word over the alphabet {A,C,G,T} of nucleotide bases
  • Example: ATAGGCAGGA
• Sequencing: identification of the contents of a DNA molecule
  • Gel electrophoresis
  • Chemical method

Sequencing by hybridization
• SBH: alternative approach to DNA sequencing
• Two phases:
  • Biochemical: hybridization experiment involving a DNA array and the target molecule to be sequenced
  • Computational: reconstruction problem using the results of the hybridization experiment

Sequencing by hybridization
• DNA array:
  • Two-dimensional grid
  • Each cell contains a probe: a short sequence of q nucleotides
  • Library C(q): set of all 4^q probes of size q in the array
• Hybridization experiment:
  • The array is placed in a solution containing many copies of the target sequence
  • A copy of the target sequence reacts with a probe if the latter is a subsequence (of the complement) of the former
• Spectrum: set of all probes of size q that reacted with the target sequence, i.e., subsequences of size q that appear in the target


Sequencing by hybridization
• Library C(4): [Figure: array containing all 4^4 = 256 probes of size 4]
• Target sequence: ATAGGCAGGA
• Spectrum: {ATAG, TAGG, AGGC, GGCA, GCAG, CAGG, AGGA}
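The spectrum of the example can be reproduced with a few lines of Python. This is a minimal sketch of an ideal, error-free experiment: the function name `spectrum` is hypothetical, and the complement step of the real hybridization reaction is deliberately ignored.

```python
def spectrum(target: str, q: int) -> set[str]:
    """Ideal (error-free) spectrum: every contiguous length-q substring of target.

    Assumption: probes are matched against the target itself, not its complement.
    """
    return {target[i:i + q] for i in range(len(target) - q + 1)}


ideal = spectrum("ATAGGCAGGA", 4)
print(sorted(ideal))
```

A sequence of length n yields at most n - q + 1 distinct probes, which is 7 for this example.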

Sequencing by hybridization
• Reconstruction problem:
  • Second phase: reconstruction of the target sequence from the spectrum
  • Find an ordering of the probes in the spectrum such that consecutive probes overlap in q-1 bases
• Hamiltonian path problem on the spectrum:
  • One vertex for each probe u in the spectrum
  • Arc (u,v) from probe u to probe v if the last q-1 bases of u coincide with the first q-1 bases of v
    ATAG
     TAGG
      AGGC
       GGCA
        GCAG
         CAGG
          AGGA
    ATAGGCAGGA


Sequencing by hybridization
• Spectrum: {ATAG, TAGG, AGGC, GGCA, GCAG, CAGG, AGGA}
• [Figure: graph on the spectrum; the Hamiltonian path ATAG → TAGG → AGGC → GGCA → GCAG → CAGG → AGGA spells the target sequence ATAGGCAGGA]

Sequencing by hybridization
• Hybridization errors:
  • The hybridization experiment is not perfect
  • False positives: probes that appear in the spectrum but not in the target sequence
  • False negatives: probes that occur in the target sequence but not in the spectrum
    ATAG
     TAGG
      AGGC
       ----
        GCAG
         CAGG
          AGGA
    ATAGGCAGGA

Sequencing by hybridization
• Problem of sequencing by hybridization (PSBH): given the spectrum S = {s1, s2, ..., sm}, the size q of the probes, the length n of the target sequence, and its first probe s0, find a sequence of length at most n containing a maximum number of probes from the spectrum
• PSBH is NP-hard (Blazewicz et al., 1999)

Sequencing by hybridization
• Directed graph G = (V,E)
  • V = S (probes in the spectrum)
  • E = {(u,v): u ∈ S and v ∈ S}
• Superposition o(u,v) between two probes u,v ∈ S: size of the largest sequence that is both a suffix of u and a prefix of v
• Weight w(u,v) of the arc (u,v): w(u,v) = q - o(u,v)
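A minimal sketch of the superposition and the arc weight, assuming w(u,v) = q - o(u,v). That definition is consistent with the bound n - q on the weight of a feasible path and with the weights 1, 2 and 3 that appear in the example graph for q = 4. The function names are hypothetical.

```python
def overlap(u: str, v: str) -> int:
    """o(u,v): length of the longest proper suffix of u that is also a prefix of v."""
    for k in range(min(len(u), len(v)) - 1, 0, -1):
        if u[-k:] == v[:k]:
            return k
    return 0


def weight(u: str, v: str) -> int:
    """w(u,v) = q - o(u,v): number of new bases added when v follows u (assumed definition)."""
    return len(u) - overlap(u, v)


print(overlap("ATAG", "TAGG"), weight("ATAG", "TAGG"))
print(overlap("AGGC", "GCAG"), weight("AGGC", "GCAG"))
print(overlap("GGCG", "GCAG"), weight("GGCG", "GCAG"))
```

Under this assumption, the weight of a path is the length of the sequence it spells minus q.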


Sequencing by hybridization
• Spectrum: {ATAG, TAGG, AGGC, GCAG, CAGG, AGGA, GGCG} (q = 4)
• Target sequence: ATAGGCAGGA (n = 10)
• GGCG: false positive
• GGCA: false negative
• [Figure: graph on the spectrum with each arc (u,v) labeled by its weight w(u,v)]

Sequencing by hybridization
• Feasible solutions: acyclic paths in G emanating from vertex s0 with weight less than or equal to n-q
• A path in G is a sequence a = (a1, a2, ..., ak) of probes ai ∈ S, i ∈ {1, 2, ..., k}
• An optimal solution visits a maximum number of vertices and respects the above constraints
• Heuristics: ant colony, tabu search, genetic algorithm
• This work: multi-start constructive heuristic with a memory-based strategy
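The feasibility conditions can be checked directly on the example with errors (q = 4, n = 10). The sketch below assumes the arc weight is q minus the superposition; the helper names `path_weight`, `spell` and `is_feasible` are hypothetical.

```python
def overlap(u: str, v: str) -> int:
    """Longest proper suffix of u that is a prefix of v."""
    for k in range(min(len(u), len(v)) - 1, 0, -1):
        if u[-k:] == v[:k]:
            return k
    return 0


def path_weight(path: list[str], q: int) -> int:
    """Sum of the arc weights q - o(u,v) along the path (assumed weight definition)."""
    return sum(q - overlap(u, v) for u, v in zip(path, path[1:]))


def spell(path: list[str]) -> str:
    """Sequence obtained by merging consecutive probes on their superposition."""
    seq = path[0]
    for u, v in zip(path, path[1:]):
        seq += v[overlap(u, v):]
    return seq


def is_feasible(path: list[str], s0: str, n: int, q: int) -> bool:
    """Path starts at s0, repeats no probe, and has weight at most n - q."""
    return (path[0] == s0
            and len(set(path)) == len(path)
            and path_weight(path, q) <= n - q)


good = ["ATAG", "TAGG", "AGGC", "GCAG", "CAGG", "AGGA"]
bad = ["ATAG", "TAGG", "AGGC", "GGCG", "GCAG", "CAGG", "AGGA"]  # uses the false positive
print(spell(good), path_weight(good, 4), is_feasible(good, "ATAG", 10, 4))
print(path_weight(bad, 4), is_feasible(bad, "ATAG", 10, 4))
```

The first path skips the false positive GGCG, has weight exactly n - q = 6, and spells the target; the second visits every probe in the spectrum but exceeds the weight bound.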


Sequencing by hybridization
• Spectrum: {ATAG, TAGG, AGGC, GCAG, CAGG, AGGA, GGCG} (q = 4)
• Target sequence: ATAGGCAGGA (n = 10)
• GGCG: false positive
• GGCA: false negative
• [Figure: weighted graph on the spectrum, shown with the alignment of the probes ATAG, TAGG, AGGC, GCAG, CAGG, AGGA against the target sequence]
    ATAG
     TAGG
      AGGC
       ----
        GCAG
         CAGG
          AGGA
    ATAGGCAGGA

Multi-start randomized constructive heuristic
• Iteratively builds multiple solutions using a randomized constructive algorithm
• The randomized constructive algorithm builds a different solution at each run
• Returns the best solution found
• Initial solution formed by a single probe: a = (s0)
• Current partial solution (path) is extended at each iteration by the insertion of a new probe at the end

Multi-start randomized constructive heuristic
• Current partial solution (path) is extended at each iteration by the insertion of a new probe at the end
• Probe to be inserted is probabilistically selected from a restricted candidate list (RCL)
  • S(a): probes in the current partial solution a
  • u: last probe in the current path
  • RCL = {v ∈ S\S(a): o(u,v) ≥ (1-α)·max_{t ∈ S\S(a)} o(u,t) and w(a) + w(u,v) ≤ n-q}, where α controls the greediness
  • Randomly select a probe v from the RCL with probability p(u,v) = (1/w(u,v)) / Σ_{t ∈ S\S(a)} (1/w(u,t))
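One randomized construction can be sketched as follows. This is an illustration under stated assumptions, not the authors' C++ code: the arc weight is taken as q - o(u,v), the selection probabilities are normalized over the RCL (which is what `random.choices` does with its `weights` argument), and the names `build_rcl` and `construct` are hypothetical.

```python
import random


def overlap(u: str, v: str) -> int:
    """Longest proper suffix of u that is a prefix of v."""
    for k in range(min(len(u), len(v)) - 1, 0, -1):
        if u[-k:] == v[:k]:
            return k
    return 0


def build_rcl(path, spectrum, alpha, n, q):
    """Unused probes whose overlap is within (1 - alpha) of the best and that keep w(a) <= n - q."""
    u, used = path[-1], set(path)
    w_path = sum(q - overlap(a, b) for a, b in zip(path, path[1:]))
    candidates = [v for v in spectrum if v not in used]
    if not candidates:
        return []
    best = max(overlap(u, t) for t in candidates)
    return [v for v in candidates
            if overlap(u, v) >= (1 - alpha) * best
            and w_path + q - overlap(u, v) <= n - q]


def construct(s0, spectrum, alpha, n, q, rng):
    """Extend the path from s0 until the RCL is empty."""
    path = [s0]
    while True:
        rcl = build_rcl(path, spectrum, alpha, n, q)
        if not rcl:
            return path
        u = path[-1]
        # Selection weight 1/w(u,v); w(u,v) >= 1 because the overlap is at most q - 1.
        weights = [1.0 / (q - overlap(u, v)) for v in rcl]
        path.append(rng.choices(rcl, weights=weights, k=1)[0])


S = sorted({"ATAG", "TAGG", "AGGC", "GCAG", "CAGG", "AGGA", "GGCG"})
print(construct("ATAG", S, alpha=0.5, n=10, q=4, rng=random.Random(1)))
```

With α = 0 only the probes with maximum overlap enter the RCL; larger values of α admit more candidates and hence more diverse paths.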

Adaptive memory strategy
• Application to QAP: Fleurent and Glover, 1999
• Pool Q of elite solutions (best solutions found): diversity
• Intensification strategy for the constructive algorithm
• Makes use of two kinds of information in the construction: superposition between the probes and frequency of the arcs in the elite solutions
• A weight parameter balances the two terms: greediness (superposition) and frequency (memory)

Adaptive memory strategy
• Probability p(u,v) of selecting a probe v from the RCL to extend the current partial solution whose last probe is u combines two terms:
  • Greediness: higher when the superposition between probes u and v is larger
  • Frequency: higher for arcs (u,v) appearing more often in the solutions of the elite set
• [Equation for p(u,v): image in the original slide]

Adaptive memory strategy
• Pool update:
  • Pool size: at most q solutions
  • Solution a is a candidate to be inserted into the pool Q if it is better than the worst solution currently in the pool, i.e., |a| > min_{a' ∈ Q} |a'|
  • Candidate solution a replaces the worst solution in the pool if it is better than the best solution in the pool (|a| > max_{a' ∈ Q} |a'|) or if it is sufficiently different from every other solution in the pool (min_{a' ∈ Q} dist(a,a') ≥ dmin)
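The pool update rule can be sketched as below. Two points are assumptions, since the slide does not specify them: `dist` is taken as the number of arcs that appear in exactly one of the two paths, and a solution is inserted directly while the pool is not yet full. The function names and the `max_size` and `d_min` parameters are hypothetical.

```python
def arcs(path):
    """Set of arcs (u,v) used by a path."""
    return set(zip(path, path[1:]))


def dist(a, b):
    """Hypothetical distance: number of arcs present in exactly one of the two paths."""
    return len(arcs(a) ^ arcs(b))


def update_pool(pool, a, max_size, d_min):
    """Try to insert solution a into the elite pool; return True if the pool changed."""
    if a in pool:
        return False
    if len(pool) < max_size:      # assumption: fill the pool before applying the rule
        pool.append(a)
        return True
    worst = min(pool, key=len)
    if len(a) <= len(worst):      # not a candidate: no better than the worst
        return False
    best_len = max(len(p) for p in pool)
    if len(a) > best_len or min(dist(a, p) for p in pool) >= d_min:
        pool[pool.index(worst)] = a
        return True
    return False


pool = [(1, 2, 3), (1, 3, 2, 4)]
print(update_pool(pool, (1, 2, 3, 4, 5), max_size=2, d_min=3), pool)
```

The distance test keeps the pool diverse: a candidate that is good but nearly identical to an elite solution is rejected unless it is a new best.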

Vocabulary building
• Good solutions are very often formed by the same building blocks (paths)
• Optimal solutions are formed by components appearing in suboptimal solutions
• Identify short paths with optimal superposition and combine them to build optimal solutions
• Vocabulary building: Glover and Laguna, 1997
  • Find common paths appearing in good solutions (words)
  • Combine them into new good solutions (phrases)

Vocabulary building
• Solutions encoded as adjacency vectors
• Solution a = (a1,a2,...,ak) represented as a vector x = (x1,x2,...,x|S|)
• If xu = s, then probe s follows immediately after probe u, i.e., the arc (u,s) is used in the path
• [Figure: graph on vertices 1 to 6 showing the path a = (1,4,2,3,5)]
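The encoding of the example path a = (1,4,2,3,5) over six probes can be sketched as follows. Probes are numbered from 1, `None` marks an undefined entry, and the function names are hypothetical.

```python
def to_adjacency(path, num_probes):
    """Adjacency vector x with x[u] = s when probe s immediately follows probe u.

    Index 0 is unused so that probes can be numbered 1..|S|.
    """
    x = [None] * (num_probes + 1)
    for u, s in zip(path, path[1:]):
        x[u] = s
    return x


def to_path(x, start):
    """Follow successors from start; the length guard stops on an accidental cycle."""
    path = [start]
    while x[path[-1]] is not None and len(path) < len(x) - 1:
        path.append(x[path[-1]])
    return path


x = to_adjacency([1, 4, 2, 3, 5], 6)
print(x[1:])          # successors of probes 1..6
print(to_path(x, 1))
```

Entries for the last probe of the path and for unused probes stay undefined, which is what lets partial solutions (words) share the same representation.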

Vocabulary building
• Some notation:
  • Set X of adjacency vectors
  • Size(x): number of arcs in the adjacency vector x
  • Inter(X): subset of arcs that appear in all vectors in X
  • Enclosure(y,X): set formed by all vectors in X that contain the arcs in the adjacency vector y


[Figure: two adjacency vectors x1 and x2 drawn on vertices 1 to 8, and their intersection Inter(x1,x2)]

Vocabulary building
• Find words: given an elite set X, find vectors y with |Enclosure(y,X)| as large as possible and Size(y) ≥ smin (non-elementary small words), where smin is a parameter

Vocabulary building
• Algorithm FindWords(X, smin):
    Y ← ∅, X' ← X
    while X' ≠ ∅ do
      x ← rand(X'), Z ← {x}, X'' ← X - {x}
      while X'' ≠ ∅ do
        x ← rand(X'')
        if Size(Inter(Z ∪ {x})) ≥ smin then Z ← Z ∪ {x}
        X'' ← X'' - {x}
      end-while
      if |Z| > 1 then y ← Inter(Z); Y ← Y ∪ {y}
      X' ← X' - Z
    end-while
    return Y
• Martins and Plastino, 2005: more effective algorithm based on data mining strategies
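A direct Python transcription of FindWords may help to follow the two nested loops. It is a sketch under one representation choice that is not in the slides: each adjacency vector is stored as a `frozenset` of arcs (u,v), so that Inter becomes a set intersection.

```python
import random


def inter(vectors):
    """Inter: arcs that appear in every vector of the collection."""
    it = iter(vectors)
    common = set(next(it))
    for x in it:
        common &= x
    return frozenset(common)


def find_words(X, s_min, rng):
    """Group elite vectors whose common arcs number at least s_min; return the common parts."""
    words = []
    remaining = list(X)                       # X'
    while remaining:
        x = rng.choice(remaining)
        Z = [x]
        pool = [v for v in X if v != x]       # X'' = X - {x}
        while pool:
            x = rng.choice(pool)
            if len(inter(Z + [x])) >= s_min:
                Z.append(x)
            pool.remove(x)
        if len(Z) > 1:
            y = inter(Z)
            if y not in words:                # Y is a set: keep each word once
                words.append(y)
        remaining = [v for v in remaining if v not in Z]
    return words


def as_arcs(path):
    return frozenset(zip(path, path[1:]))


X = [as_arcs([1, 2, 3, 4, 5]), as_arcs([1, 2, 3, 5, 4]), as_arcs([5, 4, 3, 2, 1])]
print(find_words(X, s_min=2, rng=random.Random(0)))
```

In this small elite set the first two vectors share the arcs (1,2) and (2,3), so they form the only word; the third vector shares no arc with either.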

Vocabulary building
• Additional notation:
  • x and y: adjacency vectors
  • ExtInter(x,y): undefined variables in one of the vectors are filled with the corresponding defined variables in the other


[Figure: two adjacency vectors x1 and x2 drawn on vertices 1 to 8, and their extended intersection ExtInter(x1,x2)]

Vocabulary building
• Combine words: given a set of words Y, combine them into phrases
• Very similar to the algorithm that finds words, replacing the original operator Inter by the new operator ExtInter

Vocabulary building
• Algorithm CombineWords(Y):
    Z ← ∅, Y' ← Y
    while Y' ≠ ∅ do
      y ← rand(Y'), W ← {y}, Y'' ← Y - {y}
      while Y'' ≠ ∅ do
        y ← rand(Y'')
        if MaxInDegree(ExtInter(W,y)) = 1 then W ← W ∪ {y}
        Y'' ← Y'' - {y}
      end-while
      if |W| > 1 then z ← ExtInter(W); Z ← Z ∪ {z}
      Y' ← Y' - W
    end-while
    return Z
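CombineWords can be sketched in the same style. Words are stored here as dictionaries mapping a probe to its successor, which is a representation choice and not the authors'. The slides do not say what ExtInter does when both vectors define different successors for the same probe; the sketch treats that case as incompatible, which is an assumption.

```python
import random
from collections import Counter


def ext_inter(x, y):
    """ExtInter: fill the undefined entries of one vector with the defined entries of the other.

    Assumption: conflicting successors for the same probe make the merge fail (None).
    """
    merged = dict(x)
    for u, v in y.items():
        if merged.get(u, v) != v:
            return None
        merged[u] = v
    return merged


def max_in_degree(x):
    """Largest number of arcs entering a single probe."""
    return max(Counter(x.values()).values(), default=0)


def combine_words(Y, rng):
    """Merge compatible words into phrases, mirroring the structure of FindWords."""
    phrases = []
    remaining = list(range(len(Y)))           # Y', as indices into Y
    while remaining:
        i = rng.choice(remaining)
        group, phrase = [i], dict(Y[i])
        pool = [j for j in range(len(Y)) if j != i]   # Y'' = Y - {y}
        while pool:
            j = rng.choice(pool)
            cand = ext_inter(phrase, Y[j])
            if cand is not None and max_in_degree(cand) == 1:
                group.append(j)
                phrase = cand
            pool.remove(j)
        if len(group) > 1:
            phrases.append(phrase)
        remaining = [k for k in remaining if k not in group]
    return phrases


Y = [{1: 2, 2: 3}, {4: 5}, {6: 3}]
print(combine_words(Y, random.Random(0)))
```

The in-degree test rejects merges in which two probes would precede the same probe. It does not rule out cycles or disconnected subpaths, so the resulting phrases may still need a repair step.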

Vocabulary building
• Phrases obtained by combining words may be incomplete or infeasible
• Make the infeasible phrases (solutions) feasible:
  • Insert probe s0 in the best place in case it does not appear in the phrase
  • Complete the solution by joining the subpaths of the phrase

Vocabulary building
• Algorithm VocabularyBuilding(X, smin):
    Y ← FindWords(X, smin)
    Z ← CombineWords(Y)
    A ← ∅
    for each z ∈ Z do
      a ← MakeFeasible(z)
      A ← A ∪ {a}
    end-for
    return A

Complete heuristic: MS+MEM+VB
• Q: pool of elite solutions for adaptive memory
• X: pool of elite solutions for vocabulary building, |X| >> |Q|
• Algorithm MS+MEM+VB:
    Q, X ← ∅; a* ← null
    for i = 1, ..., MAXITER
      a ← GreedyRandomizedMemory(Q, weight)
      if |a| > |a*| then a* ← a
      update the weight and use a to update pools Q and X
      if i mod nVB = 0 then
        A ← VocabularyBuilding(X, smin)
        for every a ∈ A do
          use a to update pools Q and X
          if |a| > |a*| then a* ← a
        end-for
    end-for
    return a*
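The control flow of the complete heuristic can be sketched as a driver that receives its components as callables. All parameter names are hypothetical, the parameter update step is left to the caller inside `update_pools`, and the components themselves are not implemented here.

```python
def ms_mem_vb(construct, update_pools, vocabulary_building, max_iter, n_vb):
    """Skeleton of the MS+MEM+VB loop.

    construct():            returns one solution built with the memory-based construction
    update_pools(a):        offers solution a to the pools Q and X
    vocabulary_building():  returns the solutions produced from the pool X
    """
    best = None
    for i in range(1, max_iter + 1):
        a = construct()
        if best is None or len(a) > len(best):
            best = a
        update_pools(a)
        if i % n_vb == 0:                     # run vocabulary building every n_vb iterations
            for b in vocabulary_building():
                update_pools(b)
                if len(b) > len(best):
                    best = b
    return best


# Toy usage with stand-in components.
built = iter([[1], [1, 2, 3], [1, 2], [1]])
seen = []
result = ms_mem_vb(lambda: next(built), seen.append, lambda: [[1, 2, 3, 4]],
                   max_iter=4, n_vb=2)
print(result, len(seen))
```

Solution quality is the number of probes in the path, so `len` is the only objective the driver needs.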

Computational experiments
• Conditions:
  • Pentium 2.4 GHz with 512 MB of RAM
  • Linux 10.0 with kernel 2.6.3
  • Codes in ANSI C++ compiled with GNU compiler version 3.3.2
• Instances:
  • Set A: instances generated from real human DNA sequences obtained from GenBank
  • Set R: randomly generated instances

Computational experiments
• Instances A:
  • Origin: 40 GenBank sequences
  • Five smaller sequences are generated from each original sequence, corresponding to their prefixes of size n = 109, 209, 309, 409, 509
  • For each of them, we consider its ideal spectrum, with size respectively equal to 100, 200, 300, 400, 500, using an array with probes of size q = 10
  • Total: 200 instances
  • 20% of false negatives and 20% of false positives generated for each instance (probe s0 appears in all of them, no repetitions)

Computational experiments
• Instances R:
  • Origin: 100 random sequences
  • Ten smaller sequences are generated from each original sequence, corresponding to their prefixes of size n = 100, 200, ..., 1000
  • For each of them, we consider its ideal spectrum, with size respectively equal to 92, 192, ..., 992, using an array with probes of size q = 7
  • Total: 1000 instances
  • 20% of false negatives and 20% of false positives generated for each instance (probe s0 appears in all of them, no repetitions)

Computational experiments
• Solution quality evaluation:
  • Number of probes in the solution: |a|
  • Similarity with the target sequence:
    • Align the sequence spelled by the solution a with the target sequence (matches: +1, mismatches: -1) and compute the alignment score align by dynamic programming
    • Compute similarity(a) = 100·(align + nmax)/(2·nmax), where nmax is the larger of the two sequence lengths
  • Fraction:
    • fraction(a) = 100·|a|/|a*|
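The similarity measure can be sketched with a standard global alignment. The slide gives the scores for matches (+1) and mismatches (-1) only; the gap penalty of -1 used below is an assumption, as are the function names.

```python
def align_score(s: str, t: str, match: int = 1, mismatch: int = -1, gap: int = -1) -> int:
    """Global alignment score by dynamic programming (gap penalty is an assumed value)."""
    prev = [j * gap for j in range(len(t) + 1)]
    for i in range(1, len(s) + 1):
        cur = [i * gap]
        for j in range(1, len(t) + 1):
            diag = prev[j - 1] + (match if s[i - 1] == t[j - 1] else mismatch)
            cur.append(max(diag, prev[j] + gap, cur[j - 1] + gap))
        prev = cur
    return prev[-1]


def similarity(solution_seq: str, target_seq: str) -> float:
    """100 * (align + n_max) / (2 * n_max), with n_max the larger sequence length."""
    n_max = max(len(solution_seq), len(target_seq))
    return 100.0 * (align_score(solution_seq, target_seq) + n_max) / (2 * n_max)


print(similarity("ATAGGCAGGA", "ATAGGCAGGA"))
print(similarity("ATAGGCAGG", "ATAGGCAGGA"))
```

Under these scores the alignment value lies between -nmax and nmax, so the similarity ranges from 0 to 100, with 100 reached only when the two sequences are identical.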

Computational experiments
• Random instances in set R used for parameter setting and tuning
• The weight parameter decreases with the iteration counter
• Small values of α are used in the beginning, so that purely greedy solutions are generated when no frequency information is available
• The initial value decreases with the problem size
• MAXITER = 10·n (iterations)
• Parameter α and the weight parameter are updated after blocks of n/2 iterations

Numerical results
• [Plot: average similarity with the target sequence over all R instances with the same size; the legend includes MS and MS+MEM+VB]
• Each additional component (memory, VB) improves the multi-start heuristic

Numerical results
• [Plot: average computation time over all R instances with the same size]
