Memory approaches to improve multi-start constructive heuristics

Celso C. Ribeiro, Universidade Federal Fluminense, Brazil. Memory approaches to improve multi-start constructive heuristics. Joint work with Eraldo Fernandes (M.Sc., PUC-Rio, Brazil). WEA’2005 – IV Workshop on Experimental and Efficient Algorithms. Santorini, May 2005.

    Presentation Transcript
    1. Memory approaches to improve multi-start constructive heuristics
       Celso C. Ribeiro, Universidade Federal Fluminense, Brazil
       Joint work with Eraldo Fernandes (M.Sc., PUC-Rio, Brazil)
       WEA’2005 – IV Workshop on Experimental and Efficient Algorithms
       Santorini, May 2005

    2. Summary
       • Application: DNA sequencing
       • Motivation: sequencing by hybridization
       • Multi-start randomized constructive heuristic
       • Adaptive memory strategy
       • Vocabulary building
       • Complete heuristic: MS+MEM+VB
       • Computational experiments
       • Numerical results and comparisons
       • Concluding remarks

    3. DNA sequencing
       • DNA molecule: sequence formed by a combination of four different nucleotide bases: A, C, G, and T
       • Each DNA molecule may be represented as a word over the alphabet {A,C,G,T} of nucleotide bases
       • Example: ATAGGCAGGA
       • Sequencing: identification of the contents of a DNA molecule
         • Gel electrophoresis
         • Chemical method

    4. Sequencing by hybridization
       • SBH: alternative approach to DNA sequencing
       • Two phases:
         • Biochemical: hybridization experiment involving a DNA array and the target molecule to be sequenced
         • Computational: reconstruction problem using the results of the hybridization experiment

    5. Sequencing by hybridization
       • DNA array:
         • Bidimensional grid
         • Each cell contains a probe: small sequence of q nucleotides
         • Library C(q): set of all 4^q probes of size q in the array
       • Hybridization experiment:
         • Array is introduced into a solution containing many copies of the target sequence
         • A copy of the target sequence reacts with a probe if the latter is a subsequence (of the complement) of the former
       • Spectrum: set of all probes of size q that reacted with the target sequence, i.e., subsequences of size q that appear in the target

    6. Sequencing by hybridization
       • Library C(4): [Figure: DNA array with the probes of size 4]
       • Target sequence: ATAGGCAGGA

    7. Sequencing by hybridization
       • Library C(4): [Figure: DNA array with the probes of size 4]
       • Target sequence: ATAGGCAGGA
       • Spectrum: {ATAG, TAGG, AGGC, GGCA, GCAG, CAGG, AGGA}
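The ideal (error-free) spectrum of this example can be computed directly from the definition on slide 5. The snippet below is a minimal sketch; the function name `spectrum` is chosen here for illustration and does not come from the slides.

```python
def spectrum(target: str, q: int) -> set[str]:
    """Return every length-q substring (probe) that occurs in the target."""
    return {target[i:i + q] for i in range(len(target) - q + 1)}


# Ideal spectrum of the running example, with probes of size q = 4.
ideal = spectrum("ATAGGCAGGA", 4)
print(sorted(ideal))
```

A target of length n without repeated probes yields n - q + 1 probes, here 10 - 4 + 1 = 7.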

    8. Sequencing by hybridization
       • Reconstruction problem:
         • Second phase: reconstruction of the target sequence from the spectrum
         • Find a sequence of the probes in the spectrum such that consecutive probes have q-1 bases of superposition
       • Hamiltonian path problem on the spectrum:
         • One vertex for each probe u in the spectrum
         • Arc (u,v) from probe u to v if the last q-1 bases of u coincide with the first q-1 bases of v
       • Example:
           ATAG
            TAGG
             AGGC
              GGCA
               GCAG
                CAGG
                 AGGA
           ATAGGCAGGA
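For an error-free spectrum, the Hamiltonian path formulation above can be illustrated with a brute-force depth-first search. This is only a sketch for tiny instances (the search is exponential in general); the function name `reconstruct` and the choice of a plain backtracking search are assumptions made for illustration, not the method of the presentation.

```python
def reconstruct(spectrum, first, q):
    """Search for a Hamiltonian path starting at `first` and spell its sequence."""

    def extend(path):
        if len(path) == len(spectrum):
            return path
        u = path[-1]
        for v in sorted(spectrum):
            # Arc (u, v) exists when the last q-1 bases of u equal the first q-1 of v.
            if v not in path and u[1:] == v[:q - 1]:
                found = extend(path + [v])
                if found:
                    return found
        return None

    path = extend([first])
    if path is None:
        return None
    # Each probe after the first contributes exactly one new base.
    return path[0] + "".join(p[-1] for p in path[1:])


S = {"ATAG", "TAGG", "AGGC", "GGCA", "GCAG", "CAGG", "AGGA"}
print(reconstruct(S, "ATAG", 4))
```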

    9. Sequencing by hybridization
       • Spectrum: {ATAG, TAGG, AGGC, GGCA, GCAG, CAGG, AGGA}
       • [Figure: graph with one vertex per probe of the spectrum]

    11. Sequencing by hybridization
       • Spectrum: {ATAG, TAGG, AGGC, GGCA, GCAG, CAGG, AGGA}
       • Probes in path order: ATAG, TAGG, AGGC, GGCA, GCAG, CAGG, AGGA
       • Reconstructed sequence: ATAGGCAGGA
       • [Figure: graph with one vertex per probe of the spectrum]

    12. Sequencing by hybridization
       • Hybridization errors:
         • Hybridization experiment is not perfect
         • False positives: probes that appear in the spectrum but not in the target sequence
         • False negatives: probes that occur in the target sequence but not in the spectrum
       • Example with one missing probe:
           ATAG
            TAGG
             AGGC
              ----
               GCAG
                CAGG
                 AGGA
           ATAGGCAGGA

    13. Sequencing by hybridization
       • Problem of sequencing by hybridization (PSBH): given the spectrum S = {s1, s2, ..., sm}, the size q of the probes, the length n, and the first probe s0 of the target sequence, find a sequence of length at most n that contains a maximum number of probes.
       • PSBH is NP-hard (Blazewicz et al., 1999)

    14. Sequencing by hybridization
       • Directed graph G = (V,E)
         • V = S (probes in the spectrum)
         • E = {(u,v): u ∈ S and v ∈ S}
       • Superposition o(u,v) between two probes u,v ∈ S: size of the largest sequence that is both a suffix of u and a prefix of v
       • Weight w(u,v) of the arc (u,v): w(u,v) = q - o(u,v)
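The superposition o(u,v) can be computed by trying suffix lengths from longest to shortest. The sketch below assumes the arc weight is w(u,v) = q - o(u,v), i.e. the number of new bases that v appends after u; this reading matches the feasibility bound n - q used for path weights, but the transcript does not show the formula, so treat it as an assumption. Function names are illustrative.

```python
def overlap(u: str, v: str) -> int:
    """o(u,v): length of the longest proper suffix of u that is a prefix of v."""
    for k in range(min(len(u), len(v)) - 1, 0, -1):
        if u[-k:] == v[:k]:
            return k
    return 0


def weight(u: str, v: str, q: int) -> int:
    # Assumption: w(u,v) = q - o(u,v), the number of bases v adds after u.
    return q - overlap(u, v)


print(overlap("ATAG", "TAGG"), weight("ATAG", "TAGG", 4))
print(overlap("AGGC", "GCAG"), weight("AGGC", "GCAG", 4))
```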

    15. Sequencing by hybridization
       • Spectrum: {ATAG, TAGG, AGGC, GCAG, CAGG, AGGA, GGCG} (q = 4)
       • GGCG: false positive
       • GGCA: false negative
       • Target sequence: ATAGGCAGGA (n = 10)
       • [Figure: graph G with one vertex per probe of the spectrum]

    16. Sequencing by hybridization
       • Spectrum: {ATAG, TAGG, AGGC, GCAG, CAGG, AGGA, GGCG} (q = 4)
       • GGCG: false positive
       • GGCA: false negative
       • Target sequence: ATAGGCAGGA (n = 10)
       • [Figure: graph G with arc weights between 1 and 3]

    17. Sequencing by hybridization
       • Feasible solutions: acyclic paths in G emanating from vertex s0 with weight less than or equal to n-q
       • A path in G is a sequence a = (a1, a2, ..., ak) of probes ai ∈ S, i ∈ {1, 2, ..., k}
       • An optimal solution visits a maximum number of vertices and respects the above constraints
       • Heuristics: ant colony, tabu search, genetic algorithm
       • This work: multi-start constructive heuristic with a memory-based strategy
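The feasibility conditions can be checked mechanically. The following sketch assumes the arc weight w(u,v) = q - o(u,v) (an assumption consistent with the bound n - q) and uses the noisy spectrum of the running example; all function names are illustrative.

```python
def overlap(u, v):
    """Longest proper suffix of u that is also a prefix of v."""
    for k in range(min(len(u), len(v)) - 1, 0, -1):
        if u[-k:] == v[:k]:
            return k
    return 0


def path_weight(path, q):
    # Assumed arc weight: q - o(u,v).
    return sum(q - overlap(u, v) for u, v in zip(path, path[1:]))


def is_feasible(path, s0, n, q):
    """Path starts at s0, repeats no probe, and spells at most n bases."""
    return (path[0] == s0
            and len(set(path)) == len(path)
            and path_weight(path, q) <= n - q)


def spell(path):
    """Sequence obtained by merging consecutive probes on their overlap."""
    seq = path[0]
    for u, v in zip(path, path[1:]):
        seq += v[overlap(u, v):]
    return seq


good = ["ATAG", "TAGG", "AGGC", "GCAG", "CAGG", "AGGA"]
bad = ["ATAG", "TAGG", "AGGC", "GGCG", "GCAG", "CAGG", "AGGA"]
print(is_feasible(good, "ATAG", 10, 4), spell(good))
print(is_feasible(bad, "ATAG", 10, 4))
```

The path that skips the false positive GGCG has weight 6 = n - q and spells the target, while the path through all seven probes has weight 8 and is rejected.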

    20. Sequencing by hybridization
       • Spectrum: {ATAG, TAGG, AGGC, GCAG, CAGG, AGGA, GGCG} (q = 4)
       • GGCG: false positive
       • GGCA: false negative
       • Target sequence: ATAGGCAGGA (n = 10)
       • Solution:
           ATAG
            TAGG
             AGGC
              ----
               GCAG
                CAGG
                 AGGA
           ATAGGCAGGA
       • [Figure: graph G with arc weights between 1 and 3]

    21. Multi-start randomized constructive heuristic
       • Iteratively builds multiple solutions using a randomized constructive algorithm
       • Randomized constructive algorithm builds a different solution at each run
       • Returns the best solution found
       • Initial solution formed by a unique probe: a = (s0)
       • Current partial solution (path) is extended at each iteration by the insertion of a new probe at the end

    22. Multi-start randomized constructive heuristic
       • Current partial solution (path) is extended at each iteration by the insertion of a new probe at the end
       • Probe to be inserted is probabilistically selected from a restricted candidate list (RCL)
       • S(a): probes in the current partial solution a
       • u: last probe in the current path
       • RCL = {v ∈ S\S(a): o(u,v) ≥ (1-α)·max_{t ∈ S\S(a)} o(u,t) and w(a) + w(u,v) ≤ n-q}
       • Randomly select a probe v from RCL with probability p(u,v) = (1/w(u,v)) / Σ_{t ∈ S\S(a)} (1/w(u,t)) (greediness)
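One construction run can be sketched as follows. The code assumes w(u,v) = q - o(u,v), names the RCL threshold parameter `alpha`, and lets `random.choices` normalize the 1/w(u,v) weights over the candidates in the RCL; these choices and all function names are assumptions made for illustration, not the authors' implementation.

```python
import random


def overlap(u, v):
    """Longest proper suffix of u that is also a prefix of v."""
    for k in range(min(len(u), len(v)) - 1, 0, -1):
        if u[-k:] == v[:k]:
            return k
    return 0


def build_rcl(u, remaining, used_weight, alpha, n, q):
    """Candidates with near-maximal overlap that keep the path weight <= n - q."""
    if not remaining:
        return []
    best = max(overlap(u, t) for t in remaining)
    return [v for v in sorted(remaining)
            if overlap(u, v) >= (1 - alpha) * best
            and used_weight + (q - overlap(u, v)) <= n - q]


def construct(spectrum, s0, n, q, alpha, rng):
    """Extend the path from s0 until the RCL becomes empty."""
    path, used_weight = [s0], 0
    remaining = set(spectrum) - {s0}
    while True:
        u = path[-1]
        rcl = build_rcl(u, remaining, used_weight, alpha, n, q)
        if not rcl:
            return path
        # Selection probability proportional to 1 / w(u, v).
        weights = [1 / (q - overlap(u, v)) for v in rcl]
        v = rng.choices(rcl, weights=weights, k=1)[0]
        path.append(v)
        used_weight += q - overlap(u, v)
        remaining.remove(v)


S = {"ATAG", "TAGG", "AGGC", "GCAG", "CAGG", "AGGA", "GGCG"}
print(construct(S, "ATAG", 10, 4, 0.3, random.Random(1)))
```

With `alpha = 0` only the probes with maximum overlap enter the RCL (purely greedy choice up to ties); with `alpha = 1` every probe that respects the length bound is a candidate.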

    23. Adaptive memory strategy
       • Application to QAP: Fleurent and Glover, 1999
       • Pool Q of elite solutions (best solutions found): diversity
       • Intensification strategy for the constructive algorithm
       • Makes use of two kinds of information in the construction: superposition between the probes and frequency of the arcs in the elite solutions
       • Parameter λ used to balance the weights of the two terms: greediness (superposition) and frequency (memory)

    24. Adaptive memory strategy
       • Probability p(u,v) of selecting a probe v from the RCL to extend the current partial solution whose last probe is u: [formula shown as an image on the slide, combining two terms]
         • Greediness: higher when the superposition between probes u and v is larger
         • Frequency: higher for arcs (u,v) appearing more often in the solutions of the elite set
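The transcript does not preserve the formula for p(u,v), so the sketch below only illustrates the idea of mixing a greediness term with a frequency term. It assumes a linear combination `lam * greedy + freq` in the spirit of Fleurent and Glover, with a greedy term equal to the ratio between the smallest candidate weight and w(u,v), and a frequency term equal to the fraction of elite solutions that use arc (u,v). Every one of these choices, and every name, is an assumption.

```python
def overlap(u, v):
    """Longest proper suffix of u that is also a prefix of v."""
    for k in range(min(len(u), len(v)) - 1, 0, -1):
        if u[-k:] == v[:k]:
            return k
    return 0


def selection_probabilities(u, rcl, elite, lam, q):
    """Assumed form: e(u,v) = lam * greedy(u,v) + freq(u,v); p = e / sum(e).

    Requires lam > 0 so that every candidate keeps a positive score.
    """
    w = {v: q - overlap(u, v) for v in rcl}   # assumed arc weights
    w_min = min(w.values())
    scores = {}
    for v in rcl:
        greedy = w_min / w[v]                 # 1 for the largest superposition
        uses = sum((u, v) in zip(a, a[1:]) for a in elite)
        freq = uses / max(len(elite), 1)      # share of elite paths using (u, v)
        scores[v] = lam * greedy + freq
    total = sum(scores.values())
    return {v: s / total for v, s in scores.items()}


elite = [["ATAG", "TAGG", "AGGC", "GCAG", "CAGG", "AGGA"]]
probs = selection_probabilities("TAGG", ["AGGA", "AGGC"], elite, 1.0, 4)
print(probs)
```

In this example both candidates have the same superposition with TAGG, so the arc (TAGG, AGGC), which appears in the elite solution, becomes twice as likely as (TAGG, AGGA).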

    25. Adaptive memory strategy
       • Pool update:
         • Pool size: at most q solutions
         • Solution a is a candidate to be inserted into the pool Q if it is better than the worst solution currently in the pool, i.e., |a| > min_{a’ ∈ Q} |a’|
         • Candidate solution a replaces the worst solution in the pool if it is better than the best solution in the pool (|a| > max_{a’ ∈ Q} |a’|) or if it is sufficiently different from every other solution in the pool (min_{a’ ∈ Q} dist(a,a’) ≥ dmin)
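The pool update rule can be sketched as below. The slides do not define dist(a,a’) or say what happens while the pool is not yet full, so this sketch assumes that dist counts the arcs present in only one of the two paths and that a pool with free slots accepts any new solution. Names and signature are illustrative.

```python
def arcs(a):
    return set(zip(a, a[1:]))


def dist(a, b):
    # Hypothetical distance: number of arcs used by exactly one of the two paths.
    return len(arcs(a) ^ arcs(b))


def update_pool(pool, a, max_size, dmin):
    """Try to insert path a into the elite pool; return True if the pool changed."""
    if a in pool:
        return False
    if len(pool) < max_size:          # assumption: free slots accept any solution
        pool.append(a)
        return True
    worst = min(pool, key=len)
    if len(a) <= len(worst):          # not better than the worst: not a candidate
        return False
    best = max(pool, key=len)
    if len(a) > len(best) or min(dist(a, b) for b in pool) >= dmin:
        pool[pool.index(worst)] = a   # replace the worst solution
        return True
    return False


pool = [[1, 2, 3], [1, 2, 3, 4]]
print(update_pool(pool, [5, 6, 7, 8, 9], max_size=2, dmin=4), pool)
```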

    26. Vocabulary building
       • Good solutions are very often formed by the same building blocks (paths)
       • Optimal solutions formed by components appearing in suboptimal solutions
       • Identify short paths with optimal superposition and combine them to build optimal solutions
       • Vocabulary building: Glover and Laguna, 1997
         • Find common paths appearing in good solutions (words)
         • Combine them into new good solutions (phrases)

    27. Vocabulary building
       • Solutions encoded as adjacency vectors
       • Solution a = (a1,a2,...,ak) represented as a vector x = (x_1,x_2,...,x_|S|)
       • If x_u = s, then probe s follows immediately after probe u, i.e., the arc (u,s) is used in the path
       • Example: a = (1,4,2,3,5) [Figure: path drawn on vertices 1 to 6]
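The adjacency-vector encoding of the example path a = (1,4,2,3,5) over six probes can be written out explicitly. This is a small sketch: probes are numbered from 1, `None` marks an undefined entry, and the helper names are illustrative.

```python
def to_adjacency(a, m):
    """Encode path a over probes 1..m; x[u] is the successor of u, or None."""
    x = [None] * (m + 1)      # index 0 is unused so that x[u] matches probe u
    for u, s in zip(a, a[1:]):
        x[u] = s
    return x


def to_path(x, start):
    """Decode an acyclic path by following successors from `start`."""
    path = [start]
    while x[path[-1]] is not None:
        path.append(x[path[-1]])
    return path


x = to_adjacency([1, 4, 2, 3, 5], 6)
print(x[1:])   # entries x_1 .. x_6
```

Probe 5 ends the path and probe 6 is not visited, so both entries stay undefined.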

    30. Vocabulary building
       • Some notation:
         • Set X of adjacency vectors
         • Size(x): number of arcs in the adjacency vector x
         • Inter(X): subset of arcs that appear in all vectors in X
         • Enclosure(y,X): set formed by all vectors in X that contain the arcs in the adjacency vector y

    31. [Figure: two adjacency vectors x1 and x2 drawn as graphs on vertices 1 to 8] Inter(x1,x2):

    33. [Figure: two adjacency vectors x1 and x2 drawn as graphs on vertices 1 to 8] Inter(x1,x2): [Figure: graph on vertices 1 to 8 keeping only the arcs common to x1 and x2]

    34. Vocabulary building
       • Some notation:
         • Set X of adjacency vectors
         • Size(x): number of arcs in the adjacency vector x
         • Inter(X): subset of arcs that appear in all vectors in X
         • Enclosure(y,X): set formed by all vectors in X that contain the arcs in the adjacency vector y
       • Find words: given an elite set X, find vectors y with |Enclosure(y,X)| as large as possible and Size(y) ≥ smin (non-elementary small words), where smin is a parameter

    35. Vocabulary building
       • Algorithm FindWords(X,smin):
           Y ← ∅, X’ ← X
           while X’ ≠ ∅ do
             x ← rand(X’), Z ← {x}, X’’ ← X - {x}
             while X’’ ≠ ∅ do
               x ← rand(X’’)
               if Size(Inter(Z ∪ {x})) ≥ smin then Z ← Z ∪ {x}
               X’’ ← X’’ - {x}
             end-while
             if |Z| > 1 then y ← Inter(Z); Y ← Y ∪ {y}
             X’ ← X’ - Z
           end-while
           return Y
       • Martins and Plastino, 2005: more effective algorithm based on data mining strategies
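FindWords translates almost line by line into Python when each adjacency vector is stored as a frozenset of arcs, so that Inter becomes a set intersection. The representation, the function names, and the optional `rng` argument are assumptions made for this sketch; the loop structure follows the pseudocode as transcribed.

```python
import random


def inter(vectors):
    """Arcs that appear in every vector (each vector is a frozenset of arcs)."""
    return frozenset.intersection(*vectors)


def find_words(X, smin, rng=random):
    Y = set()
    remaining = set(X)                                    # X'
    while remaining:
        x = rng.choice(sorted(remaining, key=sorted))
        Z = {x}
        candidates = set(X) - {x}                         # X''
        while candidates:
            x = rng.choice(sorted(candidates, key=sorted))
            if len(inter(Z | {x})) >= smin:
                Z.add(x)
            candidates.discard(x)
        if len(Z) > 1:
            Y.add(inter(Z))                               # new word
        remaining -= Z
    return Y


x1 = frozenset({(1, 2), (2, 3), (3, 4)})
x2 = frozenset({(1, 2), (2, 3), (3, 5)})
x3 = frozenset({(1, 2), (2, 3), (3, 6)})
print(find_words([x1, x2, x3], smin=2))
```

All three vectors share the subpath 1 → 2 → 3, which is returned as the single word; with `smin = 3` no group of vectors shares enough arcs and the result is empty.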

    36. Vocabulary building
       • Additional notation:
         • x and y: adjacency vectors
         • ExtInter(x,y): undefined variables in one of the vectors are filled with the corresponding defined variables in the other

    37. [Figure: two adjacency vectors x1 and x2 drawn as graphs on vertices 1 to 8] ExtInter(x1,x2):

    38. [Figure: two adjacency vectors x1 and x2 drawn as graphs on vertices 1 to 8] ExtInter(x1,x2): [Figure: graph on vertices 1 to 8 combining the arcs of x1 and x2]

    39. Vocabulary building
       • Additional notation:
         • x and y: adjacency vectors
         • ExtInter(x,y): undefined variables in one of the vectors are filled with the corresponding defined variables in the other
       • Combine words: given a set of words Y, combine them into phrases
       • Very similar to the algorithm that finds words, replacing the original operator Inter by the new operator ExtInter

    40. Vocabulary building
       • Algorithm CombineWords(Y):
           Z ← ∅, Y’ ← Y
           while Y’ ≠ ∅ do
             y ← rand(Y’), W ← {y}, Y’’ ← Y - {y}
             while Y’’ ≠ ∅ do
               y ← rand(Y’’)
               if MaxInDegree(ExtInter(W,y)) = 1 then W ← W ∪ {y}
               Y’’ ← Y’’ - {y}
             end-while
             if |W| > 1 then z ← ExtInter(W); Z ← Z ∪ {z}
             Y’ ← Y’ - W
           end-while
           return Z
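CombineWords can be sketched in the same way with words stored as dictionaries mapping a probe to its successor. The slides do not say how ExtInter resolves an entry that is defined differently in two words; the sketch assumes the first value seen is kept. That rule, the dictionary representation, and all names are assumptions.

```python
import random


def ext_inter(words):
    """Fill undefined successors of one word with those defined in the others."""
    merged = {}
    for word in words:
        for u, s in word.items():
            merged.setdefault(u, s)   # assumption: on conflict keep the first value
    return merged


def max_in_degree(x):
    """Largest number of arcs entering a single probe."""
    counts = {}
    for s in x.values():
        counts[s] = counts.get(s, 0) + 1
    return max(counts.values(), default=0)


def combine_words(Y, rng=random):
    Z = []
    remaining = list(range(len(Y)))                         # Y' (as indices)
    while remaining:
        i = rng.choice(remaining)
        W = [i]
        candidates = [j for j in range(len(Y)) if j != i]   # Y''
        while candidates:
            j = candidates.pop(rng.randrange(len(candidates)))
            if max_in_degree(ext_inter([Y[k] for k in W + [j]])) == 1:
                W.append(j)
        if len(W) > 1:
            Z.append(ext_inter([Y[k] for k in W]))          # new phrase
        remaining = [k for k in remaining if k not in W]
    return Z


print(combine_words([{1: 2}, {3: 4}], random.Random(0)))
```

Because each probe has a single dictionary entry, out-degrees are at most 1 by construction; the MaxInDegree test rejects merges in which two arcs would enter the same probe.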

    41. Vocabulary building
       • Combine words: given a set of words Y, combine them into phrases
       • Very similar to the algorithm that finds words, replacing the original operator Inter by the new operator ExtInter
       • Phrases may be incomplete or infeasible
       • Make the infeasible phrases (solutions) feasible
       • Insert probe s0 in the best place in case it does not appear in the phrase
       • Complete the solution by joining subpaths of the phrase

    42. Vocabulary building
       • Algorithm VocabularyBuilding(X,smin):
           Y ← FindWords(X,smin)
           Z ← CombineWords(Y)
           A ← ∅
           for each z ∈ Z do
             a ← MakeFeasible(z)
             A ← A ∪ {a}
           end-for
           return A

    43. Complete heuristic: MS+MEM+VB
       • Q: pool of elite solutions for adaptive memory
       • X: pool of elite solutions for vocabulary building
       • |X| >> |Q|
       • Algorithm MS+MEM+VB:
           Q, X ← ∅; a* ← null
           for i = 1, ..., MAXITER
             a ← GreedyRandomizedMemory(Q, λ)
             if |a| > |a*| then a* ← a
             update weight λ and use a to update pools Q and X
             if i mod(nVB) = 0 then
               A ← VocabularyBuilding(X,smin)
               for every a ∈ A do
                 use a to update pools Q and X and if |a| > |a*| then a* ← a
               end-for
           end-for
           return a*
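The control flow of the complete heuristic can be sketched with the component procedures passed in as callables. The signature below, the name `lam` for the weight, and the way the weight update is delegated to `update_weight` are assumptions for illustration; the slides do not specify these interfaces.

```python
def ms_mem_vb(construct, vocabulary_building, update_pools, update_weight,
              max_iter, n_vb, smin, lam):
    """Skeleton of MS+MEM+VB; all four callables are supplied by the caller."""
    Q, X, best = [], [], None
    for i in range(1, max_iter + 1):
        a = construct(Q, lam)                    # memory-guided construction
        if best is None or len(a) > len(best):
            best = a
        lam = update_weight(lam, i)
        update_pools(Q, X, a)
        if i % n_vb == 0:                        # periodic vocabulary building
            for a in vocabulary_building(X, smin):
                update_pools(Q, X, a)
                if len(a) > len(best):
                    best = a
    return best


# Toy stand-ins that only exercise the control flow.
calls = []

def toy_construct(Q, lam):
    calls.append(lam)
    return list(range(len(calls)))

def toy_vb(X, smin):
    return [list(range(10))]

def toy_update(Q, X, a):
    Q.append(a)
    X.append(a)

best = ms_mem_vb(toy_construct, toy_vb, toy_update,
                 lambda lam, i: lam * 0.5, max_iter=4, n_vb=2, smin=2, lam=8.0)
print(len(best))
```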

    44. Computational experiments
       • Conditions:
         • Pentium 2.4 GHz with 512 MB of RAM
         • Linux 10.0 with kernel 2.6.3
         • Codes in ANSI C++ compiled with GNU compiler version 3.3.2
       • Instances:
         • Set A: instances generated from real human DNA sequences obtained from GenBank
         • Set R: instances randomly generated

    45. Computational experiments
       • Instances A:
         • Origin: 40 GenBank sequences
         • Five smaller sequences are generated from each original sequence, corresponding to their prefixes of size n = 109, 209, 309, 409, 509
         • For each of them, we consider its ideal spectrum, with size respectively equal to 100, 200, 300, 400, 500, using an array with probes of size q = 10
         • Total: 200 instances
         • 20% of false negatives and 20% of false positives generated for each instance (probe s0 appears in all of them, no repetitions)
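A noisy instance of this kind can be generated along the following lines. The sketch assumes that both error rates are taken relative to the size of the ideal spectrum and that false positives are drawn uniformly at random among probes outside it; it uses a random sequence instead of a GenBank prefix. Names and these choices are assumptions.

```python
import random


def noisy_spectrum(target, q, rate, rng):
    """Return (ideal, noisy) spectra with `rate` false negatives and positives."""
    ideal = {target[i:i + q] for i in range(len(target) - q + 1)}
    s0 = target[:q]
    k = round(rate * len(ideal))
    # False negatives: drop k probes, never the first probe s0.
    dropped = set(rng.sample(sorted(ideal - {s0}), k))
    # False positives: add k random probes that are not in the ideal spectrum.
    added = set()
    while len(added) < k:
        probe = "".join(rng.choice("ACGT") for _ in range(q))
        if probe not in ideal:
            added.add(probe)
    return ideal, (ideal - dropped) | added


rng = random.Random(0)
target = "".join(rng.choice("ACGT") for _ in range(109))
ideal, noisy = noisy_spectrum(target, 10, 0.2, rng)
print(len(ideal), len(noisy), len(ideal & noisy))
```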

    46. Computational experiments
       • Instances R:
         • Origin: 100 random sequences
         • Ten smaller sequences are generated from each original sequence, corresponding to their prefixes of size n = 100, 200, ..., 1000
         • For each of them, we consider its ideal spectrum, with size respectively equal to 92, 192, ..., 992, using an array with probes of size q = 7
         • Total: 1000 instances
         • 20% of false negatives and 20% of false positives generated for each instance (probe s0 appears in all of them, no repetitions)

    47. Computational experiments
       • Solution quality evaluation:
         • Number of probes in the solution: |a|
         • Similarity with the target sequence:
           • Perform the alignment between the sequence seq(a) spelled by the solution and the target sequence seq* (matches: +1, mismatches: -1) to compute the value align(seq(a),seq*) by dynamic programming
           • Compute similarity(a) = 100·(align(seq(a),seq*) + nmax)/(2·nmax), with nmax = max{|seq(a)|,|seq*|}
         • Fraction:
           • fraction(a) = 100·|a|/|a*|
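The similarity measure can be sketched with a standard global alignment by dynamic programming. The slide gives scores only for matches (+1) and mismatches (-1); the gap score of -1 used below is an assumption, as are the function names.

```python
def align(s, t, match=1, mismatch=-1, gap=-1):
    """Global alignment score of s and t (gap score is an assumed value)."""
    prev = [j * gap for j in range(len(t) + 1)]
    for i in range(1, len(s) + 1):
        cur = [i * gap]
        for j in range(1, len(t) + 1):
            diag = prev[j - 1] + (match if s[i - 1] == t[j - 1] else mismatch)
            cur.append(max(diag, prev[j] + gap, cur[j - 1] + gap))
        prev = cur
    return prev[-1]


def similarity(solution_seq, target_seq):
    """100 for identical sequences, 0 when the alignment score is -nmax."""
    nmax = max(len(solution_seq), len(target_seq))
    return 100 * (align(solution_seq, target_seq) + nmax) / (2 * nmax)


print(similarity("ATAGGCAGGA", "ATAGGCAGGA"))
print(similarity("AAAA", "TTTT"))
```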

    48. Computational experiments
       • Random instances in set R used for parameter setting and tuning
       • Weight λ decreases with the iteration counter
       • Small values of α are used in the beginning, so that purely greedy solutions are generated when no frequency information is available
       • Initial value of [symbol missing] decreases with the problem size
       • MAXITER = 10·n (iterations)
       • Parameters λ and α are updated after blocks of n/2 iterations

    49. Numerical results
       • [Plot: average similarity with the target sequence over all R instances with the same size, with curves ranging from MS to MS+MEM+VB]
       • Each additional component (memory, VB) improves the multi-start heuristic

    50. Numerical results
       • [Plot: average computation time over all R instances with the same size]