
Memory approaches to improve multi-start constructive heuristics

Celso C. Ribeiro Universidade Federal Fluminense, Brazil. Memory approaches to improve multi-start constructive heuristics. Joint work with Eraldo Fernandes (M.Sc., PUC-Rio, Brazil). WEA’2005 – IV Workshop on Experimental and Efficient Algorithms. Santorini, May 2005. Summary.


Presentation Transcript


  1. Memory approaches to improve multi-start constructive heuristics • Celso C. Ribeiro, Universidade Federal Fluminense, Brazil • Joint work with Eraldo Fernandes (M.Sc., PUC-Rio, Brazil) • WEA’2005 – IV Workshop on Experimental and Efficient Algorithms, Santorini, May 2005

  2. Summary • Application: DNA sequencing • Motivation: sequencing by hybridization • Multi-start randomized constructive heuristic • Adaptive memory strategy • Vocabulary building • Complete heuristic: MS+MEM+VB • Computational experiments • Numerical results and comparisons • Concluding remarks

  3. DNA sequencing • DNA molecule: sequence formed by a combination of four different nucleotide bases: A, C, G, and T • Each DNA molecule may be represented as a word over the alphabet {A,C,G,T} of nucleotide bases • Example: ATAGGCAGGA • Sequencing: identification of the contents of a DNA molecule • Gel electrophoresis • Chemical method

  4. Sequencing by hybridization • SBH: alternative approach to DNA sequencing • Two phases: • Biochemical: hybridization experiment involving a DNA array and the target molecule to be sequenced • Computational: reconstruction problem using the results of the hybridization experiment

  5. Sequencing by hybridization • DNA array: • Two-dimensional grid • Each cell contains a probe: small sequence of q nucleotides • Library C(q): set of all 4^q probes of size q in the array • Hybridization experiment: • Array is introduced into a solution containing many copies of the target sequence • A copy of the target sequence reacts with a probe if the latter is a subsequence (of the complement) of the former • Spectrum: set of all probes of size q that reacted with the target sequence, i.e., subsequences of size q that appear in the target

  6. Sequencing by hybridization • Library C(4) [figure: DNA array] • Target sequence: ATAGGCAGGA

  7. Sequencing by hybridization • Library C(4) [figure: DNA array] • Target sequence: ATAGGCAGGA • Spectrum: {ATAG, TAGG, AGGC, GGCA, GCAG, CAGG, AGGA}
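As a minimal sketch of how the ideal (error-free) spectrum on this slide arises, the snippet below slides a window of size q over the target sequence. The function name `spectrum` is an illustrative choice, not something taken from the presentation.

```python
def spectrum(target: str, q: int) -> set:
    """Return the set of all length-q substrings (probes) of the target."""
    return {target[i:i + q] for i in range(len(target) - q + 1)}


# Target sequence and probe size from the slide.
probes = spectrum("ATAGGCAGGA", 4)
print(sorted(probes))
```

A set is used because the spectrum records only which probes reacted, not how many times each one occurs in the target.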

  8. Sequencing by hybridization • Reconstruction problem: • Second phase: reconstruction of the target sequence from the spectrum • Find a sequence of the probes in the spectrum such that consecutive probes have q-1 bases of superposition • Hamiltonian path problem on the spectrum: • One vertex for each probe u in the spectrum • Arc (u,v) from probe u to probe v if the last q-1 bases of u coincide with the first q-1 bases of v • Example: ATAG → TAGG → AGGC → GGCA → GCAG → CAGG → AGGA spells ATAGGCAGGA

  9. Sequencing by hybridization • Spectrum: {ATAG, TAGG, AGGC, GGCA, GCAG, CAGG, AGGA} • [Figure: graph with one vertex per probe in the spectrum]

  11. Sequencing by hybridization • Spectrum: {ATAG, TAGG, AGGC, GGCA, GCAG, CAGG, AGGA} • [Figure: Hamiltonian path ATAG → TAGG → AGGC → GGCA → GCAG → CAGG → AGGA in the graph, spelling ATAGGCAGGA]

  12. Sequencing by hybridization • Hybridization errors: • Hybridization experiment is not perfect • False positives: probes that appear in the spectrum but not in the target sequence • False negatives: probes that occur in the target sequence but not in the spectrum • Example: with GGCA missing from the spectrum, the remaining probes ATAG, TAGG, AGGC, GCAG, CAGG, AGGA still cover ATAGGCAGGA

  13. Sequencing by hybridization • Problem of sequencing by hybridization (PSBH): given the spectrum S = {s1, s2, ..., sm}, the size q of the probes, the length n and the first probe s0 of the target sequence, find a sequence of length at most n containing a maximum number of probes from S • PSBH is NP-hard (Blazewicz et al., 1999)

  14. Sequencing by hybridization • Directed graph G = (V,E) • V = S (probes in the spectrum) • E = {(u,v): u ∈ S and v ∈ S} • Superposition o(u,v) between two probes u,v ∈ S: size of the largest sequence that is both a suffix of u and a prefix of v • Weight w(u,v) of the arc (u,v): $w(u,v) = q - o(u,v)$
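The superposition o(u,v) can be sketched in a few lines of Python. The arc weight below assumes w(u,v) = q - o(u,v), i.e. the number of bases added when v is appended after u. That reading is consistent with the arc weights 1 to 3 shown for q = 4 and with the bound n - q on feasible paths, but it is an assumption here. The names `overlap` and `weight` are illustrative.

```python
def overlap(u: str, v: str) -> int:
    """Length of the longest string that is both a suffix of u and a prefix of v."""
    for k in range(min(len(u), len(v)), 0, -1):
        if u.endswith(v[:k]):
            return k
    return 0


def weight(u: str, v: str) -> int:
    """Assumed arc weight: number of new bases contributed by v after u."""
    return len(v) - overlap(u, v)


print(overlap("ATAG", "TAGG"), weight("ATAG", "TAGG"))
print(overlap("AGGC", "GCAG"), weight("AGGC", "GCAG"))
```

For two distinct probes of the same size q the superposition is at most q-1, so the assumed weight is always at least 1.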

  15. Sequencing by hybridization • Spectrum: {ATAG, TAGG, AGGC, GCAG, CAGG, AGGA, GGCG} (q = 4) • Target sequence: ATAGGCAGGA (n = 10) • GGCG: false positive • GGCA: false negative • [Figure: graph with one vertex per probe in the spectrum]

  16. Sequencing by hybridization • Spectrum: {ATAG, TAGG, AGGC, GCAG, CAGG, AGGA, GGCG} (q = 4) • Target sequence: ATAGGCAGGA (n = 10) • GGCG: false positive • GGCA: false negative • [Figure: the same graph with arc weights 1, 2, and 3]

  17. Sequencing by hybridization • Feasible solutions: acyclic paths in G emanating from vertex s0 with weight less than or equal to n-q • A path in G is a sequence a = (a1, a2, ..., ak) of probes ai ∈ S, i ∈ {1, 2, ..., k} • An optimal solution visits a maximum number of vertices and respects the above constraints • Heuristics: ant colony, tabu search, genetic algorithm • This work: multi-start constructive heuristic with a memory-based strategy
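The feasibility conditions above can be checked mechanically. The sketch below assumes the arc weight w(u,v) = q - o(u,v) and uses hypothetical helper names (`path_weight`, `is_feasible`, `spell`). It is one way to express the constraints, not the authors' code.

```python
def overlap(u, v):
    """Longest suffix of u that is also a prefix of v."""
    for k in range(min(len(u), len(v)), 0, -1):
        if u.endswith(v[:k]):
            return k
    return 0


def path_weight(path):
    """Sum of the assumed arc weights q - o(u,v) along the path."""
    return sum(len(v) - overlap(u, v) for u, v in zip(path, path[1:]))


def is_feasible(path, s0, n, q):
    """Path starts at s0, repeats no probe, and has weight at most n - q."""
    return (len(path) > 0 and path[0] == s0
            and len(set(path)) == len(path)
            and path_weight(path) <= n - q)


def spell(path):
    """Sequence obtained by merging consecutive probes on their overlap."""
    seq = path[0]
    for u, v in zip(path, path[1:]):
        seq += v[overlap(u, v):]
    return seq


# Path from the running example; the false negative GGCA is skipped.
path = ["ATAG", "TAGG", "AGGC", "GCAG", "CAGG", "AGGA"]
print(path_weight(path), is_feasible(path, "ATAG", 10, 4), spell(path))
```

A path of weight W spells a sequence of length q + W, which is why the bound n - q on the weight keeps the sequence within length n.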

  20. Sequencing by hybridization • Spectrum: {ATAG, TAGG, AGGC, GCAG, CAGG, AGGA, GGCG} (q = 4) • Target sequence: ATAGGCAGGA (n = 10) • GGCG: false positive • GGCA: false negative • [Figure: path ATAG → TAGG → AGGC → GCAG → CAGG → AGGA in the weighted graph, skipping the missing probe GGCA and spelling ATAGGCAGGA]

  21. Multi-start randomized constructive heuristic • Iteratively builds multiple solutions using a randomized constructive algorithm • Randomized constructive algorithm builds a different solution at each run • Returns the best solution found • Initial solution formed by a unique probe: a = (s0) • Current partial solution (path) is extended at each iteration by the insertion of a new probe at the end

  22. Multi-start randomized constructive heuristic • Current partial solution (path) is extended at each iteration by the insertion of a new probe at the end • Probe to be inserted is probabilistically selected from a restricted candidate list (RCL) • S(a): probes in the current partial solution a • u: last probe in the current path • $\mathrm{RCL} = \{v \in S \setminus S(a) : o(u,v) \ge (1-\alpha) \cdot \max_{t \in S \setminus S(a)} o(u,t) \text{ and } w(a) + w(u,v) \le n-q\}$ • Randomly select a probe v from RCL with probability $p(u,v) = (1/w(u,v)) / \sum_{t \in S \setminus S(a)} (1/w(u,t))$, where the term $1/w(u,v)$ expresses greediness
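The construction and multi-start loop described on these two slides can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the arc weight is taken as q - o(u,v), the RCL threshold parameter is called `alpha`, the selection probabilities are normalised over the RCL so that they sum to one, and the function names and default values are hypothetical.

```python
import random


def overlap(u, v):
    """Longest suffix of u that is also a prefix of v."""
    for k in range(min(len(u), len(v)), 0, -1):
        if u.endswith(v[:k]):
            return k
    return 0


def construct(spectrum, s0, n, q, alpha, rng):
    """Build one path from s0 by repeated randomized selection from the RCL."""
    path, total = [s0], 0
    remaining = set(spectrum) - {s0}
    while remaining:
        u = path[-1]
        best = max(overlap(u, t) for t in remaining)
        # RCL: near-best overlap and path weight still within n - q.
        rcl = [v for v in sorted(remaining)
               if overlap(u, v) >= (1 - alpha) * best
               and total + (q - overlap(u, v)) <= n - q]
        if not rcl:
            break
        # Greedy bias: selection probability proportional to 1 / w(u,v).
        bias = [1 / (q - overlap(u, v)) for v in rcl]
        v = rng.choices(rcl, weights=bias, k=1)[0]
        total += q - overlap(u, v)
        path.append(v)
        remaining.remove(v)
    return path


def multi_start(spectrum, s0, n, q, alpha=0.5, iterations=100, seed=0):
    """Run the randomized construction repeatedly and keep the longest path."""
    rng = random.Random(seed)
    best = [s0]
    for _ in range(iterations):
        path = construct(spectrum, s0, n, q, alpha, rng)
        if len(path) > len(best):
            best = path
    return best


S = ["ATAG", "TAGG", "AGGC", "GCAG", "CAGG", "AGGA", "GGCG"]
print(multi_start(S, "ATAG", n=10, q=4, alpha=0.5, iterations=500))
```

With `alpha = 0` only probes with the best superposition enter the RCL, so the construction is close to purely greedy. Larger values admit more candidates and make the restarts more diverse.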

  23. Adaptive memory strategy • Application to QAP: Fleurent and Glover, 1999 • Pool Q of elite solutions (best solutions found): diversity • Intensification strategy for the constructive algorithm • Makes use of two kinds of information in the construction: superposition between the probes and frequency of the arcs in the elite solutions • A weight parameter balances the two terms: greediness (superposition) and frequency (memory)

  24. Adaptive memory strategy • Probability p(u,v) of selecting a probe v from the RCL to extend the current partial solution whose last probe is u combines two terms: • Greediness: higher when the superposition between probes u and v is larger • Frequency: higher for arcs (u,v) appearing more often in the solutions of the elite set • [Formula image not included in the transcript]

  25. Adaptive memory strategy • Pool update: • Pool size: at most q solutions • Solution a is a candidate to be inserted into the pool Q if it is better than the worst solution currently in the pool, i.e., $|a| > \min_{a' \in Q} |a'|$ • Candidate solution a replaces the worst solution in the pool if it is better than the best solution in the pool ($|a| > \max_{a' \in Q} |a'|$) or if it is sufficiently different from every other solution in the pool ($\min_{a' \in Q} \mathrm{dist}(a,a') \ge d_{\min}$)
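The pool update rule can be sketched as below. Two points are assumptions rather than statements from the slides: the distance `dist` is taken here as the number of arcs that appear in exactly one of the two paths, and a candidate is inserted directly while the pool is not yet full. The names `update_pool`, `max_size`, and `d_min` are illustrative.

```python
def arcs(path):
    """Set of arcs (u, v) used by a path."""
    return set(zip(path, path[1:]))


def dist(a, b):
    """Assumed distance: number of arcs not shared by the two paths."""
    return len(arcs(a) ^ arcs(b))


def update_pool(pool, a, max_size, d_min):
    """Try to insert path a into the elite pool; return True if it was inserted."""
    if a in pool:
        return False
    if len(pool) < max_size:          # assumption: fill the pool first
        pool.append(a)
        return True
    worst = min(pool, key=len)
    if len(a) <= len(worst):          # not better than the worst: not a candidate
        return False
    better_than_best = len(a) > max(len(p) for p in pool)
    diverse = min(dist(a, p) for p in pool) >= d_min
    if better_than_best or diverse:
        pool.remove(worst)
        pool.append(a)
        return True
    return False


pool = [[1, 2], [1, 3, 2]]
print(update_pool(pool, [1, 2, 3, 4], max_size=2, d_min=3), pool)
```

The diversity test keeps the pool from filling up with near-copies of one good solution, which would make the arc frequencies uninformative.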

  26. Vocabulary building • Good solutions are very often formed by the same building blocks (paths) • Optimal solutions formed by components appearing in suboptimal solutions • Identify short paths with optimal superposition and combine them to build optimal solutions • Vocabulary building: Glover and Laguna, 1997 • Find common paths appearing in good solutions (words) • Combine them into new good solutions (phrases)

  27. Vocabulary building • Solutions encoded as adjacency vectors • Solution a = (a1,a2,...,ak) represented as a vector x = (x1,x2,...,x|S|) • If xu = s, then probe s follows immediately after probe u, i.e., the arc (u,s) is used in the path • Example on six probes: a = (1,4,2,3,5) [figure: path 1 → 4 → 2 → 3 → 5; probe 6 unused]
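The adjacency-vector encoding can be illustrated with the example a = (1,4,2,3,5) on six probes. The sketch below numbers probes from 1 and uses `None` for undefined entries; both conventions, and the function names, are choices made for this illustration.

```python
def to_adjacency(path, num_probes):
    """Encode a path as a vector x where x[u] is the probe that follows u."""
    x = [None] * (num_probes + 1)     # index 0 is unused: probes are 1..|S|
    for u, s in zip(path, path[1:]):
        x[u] = s
    return x


def to_path(x, start):
    """Decode an acyclic adjacency vector back into the path that begins at start."""
    path = [start]
    while x[path[-1]] is not None:
        path.append(x[path[-1]])
    return path


x = to_adjacency([1, 4, 2, 3, 5], num_probes=6)
print(x[1:])
print(to_path(x, 1))
```

Here x1 = 4, x4 = 2, x2 = 3 and x3 = 5, while x5 and x6 stay undefined because probe 5 ends the path and probe 6 is not used.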


  30. Vocabulary building • Some notation: • Set X of adjacency vectors • Size(x): number of arcs in the adjacency vector x • Inter(X): subset of arcs that appear in all vectors in X • Enclosure(y,X): set formed by all vectors in X that contain the arcs in the adjacency vector y

  31. [Figure: two solutions x1 and x2 drawn on vertices 1 to 8]

  33. [Figure: solutions x1 and x2 and the arcs of Inter(x1,x2), drawn on vertices 1 to 8]

  34. Vocabulary building • Some notation: • Set X of adjacency vectors • Size(x): number of arcs in the adjacency vector x • Inter(X): subset of arcs that appear in all vectors in X • Enclosure(y,X): set formed by all vectors in X that contain the arcs in the adjacency vector y • Find words: given an elite set X, find vectors y with |Enclosure(y,X)| as large as possible and Size(y) ≥ smin (non-elementary small words), where smin is a parameter

  35. Vocabulary building • Algorithm FindWords(X,smin):
        Y ← ∅, X’ ← X
        while X’ ≠ ∅ do
          x ← rand(X’), Z ← {x}, X’’ ← X - {x}
          while X’’ ≠ ∅ do
            x ← rand(X’’)
            if Size(Inter(Z ∪ {x})) ≥ smin then Z ← Z ∪ {x}
            X’’ ← X’’ - {x}
          end-while
          if |Z| > 1 then y ← Inter(Z); Y ← Y ∪ {y}
          X’ ← X’ - Z
        end-while
        return Y
      • Martins and Plastino, 2005: more effective algorithm based on data mining strategies
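A direct Python transcription of FindWords is sketched below. For brevity each adjacency vector is represented as a frozenset of arcs (u,s) rather than as an array, and drawing random elements without replacement is done by shuffling. Both are implementation choices made for this illustration, as are the names `inter` and `find_words`.

```python
import random


def inter(vectors):
    """Arcs that appear in every vector of the collection."""
    vectors = list(vectors)
    common = set(vectors[0])
    for x in vectors[1:]:
        common &= x
    return frozenset(common)


def find_words(X, s_min, rng=None):
    """Group elite solutions sharing at least s_min arcs; return the shared arc sets."""
    rng = rng or random.Random(0)
    words = []
    remaining = list(X)                             # X'
    while remaining:
        x = rng.choice(remaining)
        Z = [x]
        candidates = [v for v in X if v is not x]   # X'' = X - {x}
        rng.shuffle(candidates)
        for c in candidates:
            # Accept c only if the common part stays large enough.
            if len(inter(Z + [c])) >= s_min:
                Z.append(c)
        if len(Z) > 1:
            words.append(inter(Z))
        remaining = [v for v in remaining if all(v is not z for z in Z)]
    return words


x1 = frozenset({(1, 2), (2, 3), (3, 4)})
x2 = frozenset({(1, 2), (2, 3), (3, 5)})
x3 = frozenset({(6, 7)})
print(find_words([x1, x2, x3], s_min=2))
```

In this small example x1 and x2 share the arcs (1,2) and (2,3), which form the only word, while x3 has nothing in common with the others and produces none.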

  36. Vocabulary building • Additional notation: • x and y: adjacency vectors • ExtInter(x,y): undefined variables in one of the vectors are filled with the corresponding defined variables in the other

  37. [Figure: two solutions x1 and x2 drawn on vertices 1 to 8]

  38. [Figure: solutions x1 and x2 and the arcs of ExtInter(x1,x2), drawn on vertices 1 to 8]

  39. Vocabulary building • Additional notation: • x and y: adjacency vectors • ExtInter(x,y): undefined variables in one of the vectors are filled with the corresponding defined variables in the other • Combine words: given a set of words Y, combine them into phrases • Very similar to the algorithm that finds words, replacing the original operator Inter by the new operator ExtInter

  40. Vocabulary building • Algorithm CombineWords(Y):
        Z ← ∅, Y’ ← Y
        while Y’ ≠ ∅ do
          y ← rand(Y’), W ← {y}, Y’’ ← Y - {y}
          while Y’’ ≠ ∅ do
            y ← rand(Y’’)
            if MaxInDegree(ExtInter(W,y)) = 1 then W ← W ∪ {y}
            Y’’ ← Y’’ - {y}
          end-while
          if |W| > 1 then z ← ExtInter(W); Z ← Z ∪ {z}
          Y’ ← Y’ - W
        end-while
        return Z
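CombineWords can be sketched along the same lines. Words are represented here as dictionaries mapping a probe u to its successor. The slides do not say what ExtInter does when both vectors define the same variable with different values, so the sketch leaves such a variable undefined, as a plain intersection would. That handling is an assumption, as are all function names.

```python
import random


def ext_inter(x, y):
    """Fill the undefined entries of x with those of y; conflicting entries are dropped."""
    merged = dict(x)
    for u, s in y.items():
        if u not in merged:
            merged[u] = s
        elif merged[u] != s:
            del merged[u]             # assumption: disagreement means undefined
    return merged


def max_in_degree(x):
    """Largest number of arcs entering the same probe."""
    counts = {}
    for s in x.values():
        counts[s] = counts.get(s, 0) + 1
    return max(counts.values(), default=0)


def combine_words(Y, rng=None):
    """Merge compatible words into phrases (partial solutions)."""
    rng = rng or random.Random(0)
    phrases = []
    remaining = list(Y)                             # Y'
    while remaining:
        y = rng.choice(remaining)
        W, phrase = [y], dict(y)
        candidates = [w for w in Y if w is not y]   # Y'' = Y - {y}
        rng.shuffle(candidates)
        for c in candidates:
            merged = ext_inter(phrase, c)
            if max_in_degree(merged) == 1:          # no probe gets two predecessors
                W.append(c)
                phrase = merged
        if len(W) > 1:
            phrases.append(phrase)
        remaining = [w for w in remaining if all(w is not m for m in W)]
    return phrases


print(combine_words([{1: 2, 2: 3}, {4: 5}]))
print(combine_words([{1: 3}, {2: 3}]))
```

The in-degree test rejects merges in which two arcs enter the same probe. It does not by itself guarantee a single connected path, which is why the resulting phrases may still need a repair step.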

  41. Vocabulary building • Combine words: given a set of words Y, combine them into phrases • Very similar to the algorithm that finds words, replacing the original operator Inter by the new operator ExtInter • Phrases may be incomplete or infeasible • Make infeasible phrases (solutions) feasible • Insert probe s0 in the best place if it does not appear in the phrase • Complete the solution by joining subpaths of the phrase

  42. Vocabulary building • Algorithm VocabularyBuilding(X,smin):
        Y ← FindWords(X,smin)
        Z ← CombineWords(Y)
        A ← ∅
        for each z ∈ Z do
          a ← MakeFeasible(z)
          A ← A ∪ {a}
        end-for
        return A

  43. Complete heuristic: MS+MEM+VB • Q: pool of elite solutions for adaptive memory • X: pool of elite solutions for vocabulary building, |X| >> |Q| • Algorithm MS+MEM+VB:
        Q, X ← ∅; a* ← null
        for i = 1, ..., MAXITER
          a ← GreedyRandomizedMemory(Q, weight)
          if |a| > |a*| then a* ← a
          update the weight and use a to update pools Q and X
          if i mod nVB = 0 then
            A ← VocabularyBuilding(X,smin)
            for every a ∈ A do
              use a to update pools Q and X
              if |a| > |a*| then a* ← a
            end-for
        end-for
        return a*

  44. Computational experiments • Conditions: • Pentium 2.4 GHz with 512 MB of RAM • Linux 10.0 with kernel 2.6.3 • Codes in ANSI C++ compiled with GNU compiler version 3.3.2 • Instances: • set A: instances generated from real human DNA sequences obtained from GenBank • set R: instances randomly generated

  45. Computational experiments • Instances A: • Origin: 40 GenBank sequences • Five smaller sequences are generated from each original sequence, corresponding to their prefixes of size n = 109, 209, 309, 409, 509 • For each of them, we consider its ideal spectrum, of size 100, 200, 300, 400, 500, respectively, using an array with probes of size q = 10 • Total: 200 instances • 20% of false negatives and 20% of false positives generated for each instance (probe s0 appears in all of them, no repetitions)

  46. Computational experiments • Instances R: • Origin: 100 random sequences • Ten smaller sequences are generated from each original sequence, corresponding to their prefixes of size n = 100, 200, ..., 1000 • For each of them, we consider its ideal spectrum, of size 92, 192, ..., 992, respectively, using an array with probes of size q = 7 • Total: 1000 instances • 20% of false negatives and 20% of false positives generated for each instance (probe s0 appears in all of them, no repetitions)

  47. Computational experiments • Solution quality evaluation: • Number of probes in the solution: |a| • Similarity with the target sequence: • Perform the alignment between the solution and the target sequence (matches: +1, mismatches: -1) to compute the alignment score align by dynamic programming • Compute similarity(a) = 100·(align + nmax)/(2·nmax), where nmax is the larger of the lengths of the solution sequence and the target sequence • Fraction: • fraction(a) = 100·|a|/|a*|
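The similarity measure can be sketched with a standard dynamic-programming alignment. The slide gives only the match (+1) and mismatch (-1) scores, so the use of global alignment and a gap score of -1 are assumptions of this sketch. The function names are illustrative.

```python
def align(s, t, match=1, mismatch=-1, gap=-1):
    """Global alignment score of s and t, computed row by row."""
    prev = [j * gap for j in range(len(t) + 1)]
    for i in range(1, len(s) + 1):
        cur = [i * gap]
        for j in range(1, len(t) + 1):
            diag = prev[j - 1] + (match if s[i - 1] == t[j - 1] else mismatch)
            cur.append(max(diag, prev[j] + gap, cur[j - 1] + gap))
        prev = cur
    return prev[-1]


def similarity(solution_seq, target_seq):
    """Rescale the alignment score from [-nmax, nmax] to a percentage in [0, 100]."""
    n_max = max(len(solution_seq), len(target_seq))
    return 100 * (align(solution_seq, target_seq) + n_max) / (2 * n_max)


print(similarity("ATAGGCAGGA", "ATAGGCAGGA"))
print(similarity("ATAG", "ATAGG"))
```

Under these scores the alignment value lies between -nmax and nmax, so adding nmax and dividing by 2·nmax maps a perfect reconstruction to 100 and a complete mismatch to 0.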

  48. Computational experiments • Random instances in set R used for parameter setting and tuning • The weight decreases with the iteration counter • Small values of the RCL parameter are used in the beginning, so that purely greedy solutions are generated when no frequency information is available • Initial value of [symbol missing] decreases with the problem size • MAXITER = 10·n (iterations) • Both parameters are updated after blocks of n/2 iterations

  49. Numerical results • [Plot: average similarity with the target sequence over all R instances with the same size, with curves ranging from MS to MS+MEM+VB] • Each additional component (memory, VB) improves the multi-start heuristic

  50. Numerical results • [Plot: average computation time over all R instances with the same size]
