Take a Walk and Cluster Genes: A TSP-based Approach to Optimal Rearrangement Clustering

Sharlee Climer and Weixiong Zhang. This research was supported in part by NDSEG and Olin Fellowships and by NSF grants IIS-0196057 and ITR/EIA-0113618.

Presentation Transcript


  1. Take a Walk and Cluster Genes: A TSP-based Approach to Optimal Rearrangement Clustering • Sharlee Climer and Weixiong Zhang • This research was supported in part by NDSEG and Olin Fellowships and by NSF grants IIS-0196057 and ITR/EIA-0113618.

  2. Overview • Introduction • Example • Results • Conclusion Washington University in St. Louis

  3. Introduction • Rearrangement clustering • Rearrange rows of a matrix • Minimize the sum of the differences between adjacent rows • $\min \sum_i d(i, i+1)$, where $d(i, i+1)$ is the difference between rows $i$ and $i+1$ • Rows correspond to objects • Columns correspond to features
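
The objective on this slide can be sketched in a few lines. This is only an illustration: the slides do not fix the difference measure d, so the sum of absolute feature differences used here, and the names `row_difference` and `ordering_cost`, are assumptions.

```python
from typing import List, Sequence


def row_difference(a: Sequence[float], b: Sequence[float]) -> float:
    """Assumed difference measure d: sum of absolute feature differences."""
    return sum(abs(x - y) for x, y in zip(a, b))


def ordering_cost(matrix: List[Sequence[float]], order: Sequence[int]) -> float:
    """Sum of d(i, i+1) over adjacent rows when rows are arranged as `order`."""
    return sum(
        row_difference(matrix[order[i]], matrix[order[i + 1]])
        for i in range(len(order) - 1)
    )


# Four objects (rows) with two features (columns) each.
rows = [[0, 0], [5, 5], [1, 0], [5, 4]]
print(ordering_cost(rows, [0, 1, 2, 3]))  # cost of the original row order
print(ordering_cost(rows, [0, 2, 3, 1]))  # cost after placing similar rows together
```

Rearrangement clustering searches over all row orders for the one with the smallest such cost.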

  4. Introduction • Applications • Information retrieval • Manufacturing • Software engineering

  5. Example

  6. Example • Bond Energy Algorithm (BEA) • Introduced in 1972 (McCormick, Schweitzer, White) • Approximate solution • Still widely used
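
To give a feel for why BEA is only approximate, here is a simplified greedy-insertion sketch in the spirit of the algorithm. It is not the exact 1972 procedure: it orders rows only, it assumes the "bond" between two rows is their dot product, and the names `bond` and `bea_row_order` are hypothetical.

```python
def bond(a, b):
    """Assumed bond strength between two rows: their dot product."""
    return sum(x * y for x, y in zip(a, b))


def bea_row_order(matrix):
    """Greedy insertion: repeatedly place the (row, position) pair that
    adds the most bond energy to the rows placed so far."""
    placed = [0]  # start from an arbitrary row
    remaining = set(range(1, len(matrix)))
    while remaining:
        best = None  # (gain, row, position)
        for r in sorted(remaining):
            for pos in range(len(placed) + 1):
                left = matrix[placed[pos - 1]] if pos > 0 else None
                right = matrix[placed[pos]] if pos < len(placed) else None
                gain = 0
                if left is not None:
                    gain += bond(left, matrix[r])
                if right is not None:
                    gain += bond(matrix[r], right)
                if left is not None and right is not None:
                    gain -= bond(left, right)  # the old neighbours are split apart
                if best is None or gain > best[0]:
                    best = (gain, r, pos)
        _, r, pos = best
        placed.insert(pos, r)
        remaining.remove(r)
    return placed


example = [[1, 1, 0, 0], [0, 0, 1, 1], [1, 1, 0, 0], [0, 0, 1, 1]]
print(bea_row_order(example))
```

Each insertion is locally best given the rows already placed, and earlier placements are never revisited, which is why the final order carries no optimality guarantee.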

  7. Example

  8. Example • Optimal solution • Lenstra (1974) observed equivalence to the Traveling Salesman Problem (TSP) • Given n cities and the distance between each pair • Find shortest cycle visiting every city • NP-hard problem

  9. Example • Transform into a TSP • Each object corresponds to a city • Distance between two cities equals the difference between the corresponding objects • Dummy city added to problem • Costs from dummy city to all other cities equal a constant • Location of dummy city indicates the position at which to cut the cycle into a path
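
The transformation can be sketched on a toy instance. Everything below is illustrative: the brute-force search stands in for a real TSP solver such as Concorde and is only feasible for a handful of rows, the difference measure is assumed to be the sum of absolute differences, and all function names are hypothetical.

```python
from itertools import permutations


def row_difference(a, b):
    """Assumed difference measure between two objects (rows)."""
    return sum(abs(x - y) for x, y in zip(a, b))


def build_tsp_with_dummy(matrix, dummy_cost=0.0):
    """Distance matrix over n real cities plus one dummy city (index n)."""
    n = len(matrix)
    dist = [[0.0] * (n + 1) for _ in range(n + 1)]
    for i in range(n):
        for j in range(n):
            dist[i][j] = row_difference(matrix[i], matrix[j])
        dist[i][n] = dist[n][i] = dummy_cost  # constant cost to/from the dummy
    return dist


def brute_force_tour(dist):
    """Exhaustive search for the shortest cycle, starting at the dummy city."""
    n = len(dist)
    start = n - 1
    best_cost, best_tour = None, None
    for perm in permutations(range(n - 1)):
        tour = (start,) + perm
        cost = sum(dist[tour[i]][tour[(i + 1) % n]] for i in range(n))
        if best_cost is None or cost < best_cost:
            best_cost, best_tour = cost, tour
    return list(best_tour)


def optimal_row_order(matrix):
    """Solve the TSP, then cut the cycle at the dummy city to get a path."""
    tour = brute_force_tour(build_tsp_with_dummy(matrix))
    return tour[1:]  # the dummy is first, so dropping it cuts the cycle there


rows = [[0, 0], [5, 5], [1, 0], [5, 4]]
print(optimal_row_order(rows))
```

Because every real city is the same distance from the dummy, the two dummy edges add a fixed amount to any cycle, so the shortest cycle corresponds to the shortest path through the real cities.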

  10. Example • TSP solvers were extremely slow even for small problems in the 1970s • Massive research efforts to solve TSP over the last three decades • Current solvers • Concorde (Applegate, Bixby, Chvátal, Cook, 2001) • Solved a 15,112-city TSP

  11. Example

  12. Example • BEA and TSP offer approximate and optimal solutions, respectively • We have observed a flaw in the objective function when the objects form natural clusters • The objective minimizes the sum of the distances between every pair of adjacent rows • Inter-cluster distances tend to be significantly larger than intra-cluster distances • Summation dominated by inter-cluster distances
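
A tiny made-up example shows how a single inter-cluster gap can dominate the summation. The one-dimensional values and cluster labels below are invented for illustration, and `split_objective` is a hypothetical helper.

```python
def split_objective(values, labels):
    """Split the adjacent-difference sum into intra- and inter-cluster parts."""
    intra = inter = 0
    for i in range(len(values) - 1):
        gap = abs(values[i + 1] - values[i])
        if labels[i] == labels[i + 1]:
            intra += gap
        else:
            inter += gap
    return intra, inter


# Two tight natural clusters, already in their best linear order.
ordered_values = [0, 1, 2, 100, 101, 102]
cluster_of = [0, 0, 0, 1, 1, 1]

intra, inter = split_objective(ordered_values, cluster_of)
print(intra, inter, inter / (intra + inter))
```

Here one inter-cluster step accounts for 98 of the 102 units in the objective, so the arrangement inside each cluster has little influence on the total.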

  13. Example • TSPCluster addresses this flaw • Add k dummy cities • k clusters are specified by the output • TSP solver ignores inter-cluster distances • Minimizes the sum of intra-cluster distances • Use a sufficiently small constant for distances to/from dummy cities • Dummy cities are never adjacent to each other
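
The k-dummy-city idea can be sketched as follows. This is a toy under stated assumptions, not the authors' implementation: a brute-force search replaces the TSP solver, the dummy-to-city constant is taken to be 0, and a very large dummy-to-dummy cost is assumed as the way to keep dummy cities apart (the slides do not say how this is enforced). All names are hypothetical.

```python
from itertools import permutations


def row_difference(a, b):
    """Assumed difference measure between two objects (rows)."""
    return sum(abs(x - y) for x, y in zip(a, b))


def build_tspcluster_matrix(matrix, k, dummy_cost=0.0, dummy_pair_cost=1e9):
    """n real cities (indices 0..n-1) followed by k dummy cities."""
    n = len(matrix)
    size = n + k
    dist = [[0.0] * size for _ in range(size)]
    for i in range(size):
        for j in range(size):
            if i == j:
                continue
            if i < n and j < n:
                dist[i][j] = row_difference(matrix[i], matrix[j])
            elif i >= n and j >= n:
                dist[i][j] = dummy_pair_cost  # assumption: discourages adjacent dummies
            else:
                dist[i][j] = dummy_cost  # small constant to/from dummy cities
    return dist


def brute_force_tour(dist, start):
    """Exhaustive shortest-cycle search; only viable for very small inputs."""
    n = len(dist)
    others = [c for c in range(n) if c != start]
    best_cost, best_tour = None, None
    for perm in permutations(others):
        tour = (start,) + perm
        cost = sum(dist[tour[i]][tour[(i + 1) % n]] for i in range(n))
        if best_cost is None or cost < best_cost:
            best_cost, best_tour = cost, tour
    return list(best_tour)


def tsp_cluster(matrix, k):
    """Solve the augmented TSP and cut the cycle at every dummy city."""
    n = len(matrix)
    tour = brute_force_tour(build_tspcluster_matrix(matrix, k), start=n)
    clusters, current = [], []
    for city in tour[1:]:
        if city >= n:  # a dummy city marks a cluster boundary
            clusters.append(current)
            current = []
        else:
            current.append(city)
    clusters.append(current)
    return clusters


rows = [[0], [10], [1], [11], [2], [12]]
print(tsp_cluster(rows, k=2))
```

Since each dummy city absorbs one boundary at constant cost, the solver is free to place the k cuts at the largest gaps, and what remains in the tour length is the sum of intra-cluster distances.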

  14. Example

  15. Results • Arabidopsis • 499 genes • 25 conditions • Comparison with BEA • Used BEA similarity measure • BEA score: 447,070 • TSPCluster score: 452,109 (k = 1)

  16. Results • [Figure panels: BEA, TSPCluster]

  17. Results • Compared with Cluster (Eisen et al., 1998) and k-ary (Bar-Joseph et al., 2003) • Used Pearson correlation coefficient • Cluster: 398 • k-ary: 427 • TSPCluster: 436 (k = 1)
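
A score of this kind can be sketched as the sum of Pearson correlation coefficients between adjacent rows. That reading is an assumption (the slides only name the coefficient), and `pearson` and `adjacency_score` are hypothetical helper names.

```python
from math import sqrt


def pearson(a, b):
    """Pearson correlation coefficient of two equal-length rows.
    Rows with zero variance would divide by zero and are not handled here."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / sqrt(var_a * var_b)


def adjacency_score(matrix, order):
    """Sum of correlations between adjacent rows; higher means more similar neighbours."""
    return sum(
        pearson(matrix[order[i]], matrix[order[i + 1]])
        for i in range(len(order) - 1)
    )


profiles = [[1, 2, 3], [3, 2, 1], [2, 4, 6], [6, 4, 2]]
print(adjacency_score(profiles, [0, 1, 2, 3]))  # anti-correlated neighbours
print(adjacency_score(profiles, [0, 2, 1, 3]))  # correlated rows placed together
```

Because a TSP solver minimizes, a similarity like this would have to be turned into a distance first, for example by negating it; that conversion is likewise an assumption rather than something stated in the slides.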

  18. Results • [Figure panels: Cluster, k-ary, TSPCluster]

  19. Results • TSPCluster with k equal to 2 to 50 • How many clusters? • Average inter-cluster distances • BEA local peaks: • 6, 13, 19, 26, 29, 35, 40, 47 • Pearson correlation coefficient local peaks: • 3, 9, 12, 21, 26, 40 • Computation time varied • From less than half a minute to about 3 minutes
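
One way to read "local peaks" is as values of k whose average inter-cluster distance exceeds that of both neighbouring k values. The sketch below assumes that reading; the numbers are invented and are not the results from the slides, and `local_peaks` is a hypothetical helper.

```python
def local_peaks(values_by_k):
    """Return the k values whose score is strictly higher than at both neighbours."""
    ks = sorted(values_by_k)
    peaks = []
    for prev_k, k, next_k in zip(ks, ks[1:], ks[2:]):
        if values_by_k[k] > values_by_k[prev_k] and values_by_k[k] > values_by_k[next_k]:
            peaks.append(k)
    return peaks


# Made-up average inter-cluster distances for k = 2..7 (illustration only).
made_up = {2: 1.0, 3: 1.4, 4: 1.2, 5: 1.3, 6: 1.9, 7: 1.5}
print(local_peaks(made_up))  # [3, 6]
```

Running TSPCluster once per k and scanning the resulting curve this way would yield candidate cluster counts.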

  20. Results • [Figure panels: k = 26, k = 40]

  21. Conclusion • Most problems have errors in their data • Error introduced by approximation algorithms can't be expected to "undo" this error • Computers are cheap • Computers and solvers are sophisticated • Don't always have to resort to approximate solutions, even for NP-hard problems

  22. Conclusion • Rearrangement clustering provides a linear ordering • Linear ordering inherent to many applications • Information retrieval • Manufacturing • Software engineering

  23. Conclusion • Gene data arranged in linear order to examine data • Linear ordering not necessarily essential to gene clustering problems • Current work • Optimally solve subproblems in clustering algorithms

  24. Questions?
