
Supporting Efficient Top-k Queries in Type-Ahead Search

Supporting Efficient Top-k Queries in Type-Ahead Search. Guoliang Li (1), Jiannan Wang (1), Chen Li (2), Jianhua Feng (1). (1) Tsinghua University; (2) UC Irvine, Bimaple Technology Inc. SIGIR 2012, Portland, Oregon. Query suggestions. Type-ahead search (instant search).


Presentation Transcript


  1. Supporting Efficient Top-k Queries in Type-Ahead Search • Guoliang Li (1), Jiannan Wang (1), Chen Li (2), Jianhua Feng (1) • (1) Tsinghua University • (2) UC Irvine, Bimaple Technology Inc. • SIGIR 2012, Portland, Oregon

  2. Query suggestions Tsinghua/UC Irvine/Bimaple

  3. Type-ahead search (instant search) • Finding answers instantly!

  4. ipubmed.ics.uci.edu • Fuzzy search

  5. Advantages of instant fuzzy search • Save time • Correct errors • Mobile friendly (fat fingers!)

  6. Challenges • Speed • “100ms rule” • Prefix matching • Fuzzy matching • Quality

  7. Contributions • Techniques for computing top-k answers in instant fuzzy search without generating all candidates • Ranking framework • Index structures • Algorithms • Experimental evaluation

  8. Outline • Problem Formulation • Instant exact search • Instant fuzzy search • Experiments

  9. Problem Formulation • Data: records • Query: keywords w1, w2, …, wm, where wm is a partial keyword (a prefix) • Answers: the k best records • Example query: "graph icde li", where "li" is treated as a prefix

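  10. Ranking Framework • [Figure: the query "graph icde li" matched against a record with keywords graph, gray, gross, icde, lin, liu. Each query keyword takes the Max over the scores of its matching record keywords (Score(graph); Score(icde); Score(lin) and Score(liu) for the prefix li), and an Aggregate function combines the per-keyword results.]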
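
The Max/Aggregate structure on this slide can be sketched as follows. This is a minimal illustration, not the authors' code: the function name `record_score`, the use of a sum as the aggregate, and the keyword scores are all assumptions made for the example.

```python
from typing import Dict, List


def record_score(query: List[str], record_keywords: Dict[str, float]) -> float:
    """Aggregate (here: sum, an assumption) the best matching keyword score
    for every query keyword. Only the last query keyword is a prefix."""
    total = 0.0
    for position, term in enumerate(query):
        is_prefix = position == len(query) - 1
        matches = [
            score
            for keyword, score in record_keywords.items()
            if (keyword.startswith(term) if is_prefix else keyword == term)
        ]
        if not matches:
            return 0.0  # the record does not match this query keyword at all
        total += max(matches)  # the "Max" box on the slide
    return total


# Hypothetical keyword scores for the record shown on the slide.
record = {"graph": 0.9, "gray": 0.2, "gross": 0.1,
          "icde": 0.8, "lin": 0.4, "liu": 0.6}
score = record_score(["graph", "icde", "li"], record)
```

For the prefix "li", both lin and liu match, and the larger of their two scores is the one that enters the aggregate.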

  11. Index structures • Trie • Inverted index • [Figure: a trie over the dataset keywords (e.g., graph, gray, gross, icde, lin, liu), shown with the inverted index.]
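
A minimal sketch of the two index structures, assuming that each keyword's inverted list hangs off the trie node where the keyword ends and is kept sorted by descending score. The class and method names (`TrieNode`, `KeywordTrie`, `add`, `find`, `completions`) and the sample postings are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

Posting = Tuple[int, float]  # (record_id, score)


class TrieNode:
    def __init__(self) -> None:
        self.children: Dict[str, "TrieNode"] = {}
        self.postings: List[Posting] = []  # non-empty only where a keyword ends


class KeywordTrie:
    def __init__(self) -> None:
        self.root = TrieNode()

    def add(self, keyword: str, record_id: int, score: float) -> None:
        node = self.root
        for ch in keyword:
            node = node.children.setdefault(ch, TrieNode())
        node.postings.append((record_id, score))
        node.postings.sort(key=lambda p: p[1], reverse=True)  # best first

    def find(self, prefix: str) -> Optional[TrieNode]:
        node = self.root
        for ch in prefix:
            node = node.children.get(ch)
            if node is None:
                return None
        return node

    def completions(self, prefix: str) -> Dict[str, List[Posting]]:
        """Map every complete keyword under `prefix` to its inverted list."""
        start = self.find(prefix)
        if start is None:
            return {}
        found: Dict[str, List[Posting]] = {}
        stack = [(prefix, start)]
        while stack:
            word, node = stack.pop()
            if node.postings:
                found[word] = node.postings
            for ch, child in node.children.items():
                stack.append((word + ch, child))
        return found


trie = KeywordTrie()
trie.add("graph", 1, 0.9)
trie.add("icde", 1, 0.8)
trie.add("lin", 1, 0.4)
trie.add("lin", 3, 0.7)
trie.add("liu", 2, 0.6)
li_lists = trie.completions("li")
```

Looking up the prefix "li" returns the inverted lists of lin and liu, which are the lists a type-ahead query ending in "li" has to combine.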

  12. Basic Solution • Query {graph, icde, li}, k=1 • Too many candidates • [Figure: the trie with the matched keywords icde, graph, lin, liu.]

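  13. Optimization 1: Heap-based Method • [Figure: the lists of lin and liu, the keywords matching the prefix li, feed a Max Heap; GetMax() on the heap, together with the lists of icde and graph, feeds Aggregate.]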
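
One way to read the GetMax() operation is as a lazy merge of the score-sorted lists under a prefix, so that the union is never materialized. The sketch below assumes each list holds (record_id, score) pairs sorted by descending score; the function name `merged_descending` and the sample data are hypothetical.

```python
import heapq
from typing import Iterator, List, Tuple

Posting = Tuple[int, int]  # (record_id, score)


def merged_descending(lists: List[List[Posting]]) -> Iterator[Posting]:
    """Yield postings from several score-sorted lists, best score first."""
    heap = []
    for index, postings in enumerate(lists):
        if postings:
            # heapq is a min-heap, so push negated scores to pop the max.
            heapq.heappush(heap, (-postings[0][1], index, 0))
    while heap:
        _, index, position = heapq.heappop(heap)
        yield lists[index][position]
        if position + 1 < len(lists[index]):
            next_score = lists[index][position + 1][1]
            heapq.heappush(heap, (-next_score, index, position + 1))


# Hypothetical inverted lists for the keywords under the prefix "li".
lin_list = [(1, 9), (3, 4)]
liu_list = [(2, 7), (1, 5)]
merged = list(merged_descending([lin_list, liu_list]))
```

A record can appear in more than one list (record 1 above); its first appearance in the merged stream carries its maximum score, which is the value the ranking function needs.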

  14. Optimization 2: Top-k List-Merging Algorithm • Example: threshold algorithm • [Figure: sorted access on the lists plus random access; labels shown: T = 15 and scores 17, 14, 12, 12.] • Early termination
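
A minimal sketch of the threshold algorithm with early termination, under stated assumptions: the aggregate is a sum, a record missing from a list scores 0 there, and random access is simulated with a dictionary per list. The function name `threshold_topk` and the sample lists are hypothetical and do not reproduce the numbers on the slide.

```python
import heapq
from typing import List, Tuple


def threshold_topk(lists: List[List[Tuple[str, int]]], k: int) -> List[Tuple[int, str]]:
    """Return the k best (total_score, record_id) pairs, best first."""
    lookup = [dict(postings) for postings in lists]  # random access per list
    seen = set()
    top: List[Tuple[int, str]] = []  # min-heap holding the current top k
    longest = max((len(postings) for postings in lists), default=0)
    for depth in range(longest):
        threshold = 0
        for postings in lists:
            if depth >= len(postings):
                continue  # an exhausted list contributes 0 to the threshold
            record_id, score = postings[depth]  # sorted access
            threshold += score
            if record_id not in seen:
                seen.add(record_id)
                total = sum(table.get(record_id, 0) for table in lookup)
                heapq.heappush(top, (total, record_id))
                if len(top) > k:
                    heapq.heappop(top)
        # No unseen record can score above the threshold, so stop early
        # once the k-th best total reaches it.
        if len(top) == k and top[0][0] >= threshold:
            break
    return sorted(top, key=lambda pair: (-pair[0], pair[1]))


list_a = [("r1", 9), ("r2", 8), ("r3", 2), ("r4", 1)]
list_b = [("r2", 9), ("r3", 6), ("r1", 3), ("r4", 1)]
best = threshold_topk([list_a, list_b], k=1)
```

With these lists and k=1 the loop stops after the second round of sorted accesses: the best total so far is 17 and the threshold has dropped to 14, so r4 is never read.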

  15. Efficient Random Access: How? • [Figure: the keyword trie.]

  16. Forward index [Ji et al. WWW’09] • [Figure: the trie with a keyword-ID range on every node (e.g., [1,4], [5,6], [7,9] at the first level) and keyword IDs 1 to 9 on the leaf nodes. Legend: Keyword ID, Weight.]

  17. Random Access Using Forward Index • [Figure: the trie with keyword-ID ranges from the previous slide, annotated "7 ?".]
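
The random-access step can be sketched as a range probe on a record's forward list. This assumes the forward list stores the record's keyword IDs in ascending order with a parallel list of weights, and that a prefix corresponds to a contiguous ID range such as [7, 8]. The function name `best_weight_in_range`, the mapping of IDs to keywords, and the weights are hypothetical.

```python
from bisect import bisect_left, bisect_right
from typing import List, Optional


def best_weight_in_range(keyword_ids: List[int], weights: List[int],
                         low: int, high: int) -> Optional[int]:
    """Return the largest weight among the record's keywords whose ID lies
    in [low, high], or None if the record has no keyword in that range."""
    start = bisect_left(keyword_ids, low)    # first ID >= low
    end = bisect_right(keyword_ids, high)    # one past the last ID <= high
    if start == end:
        return None
    return max(weights[start:end])


# Hypothetical forward list: keyword IDs 1, 5, 7, 8 with their weights.
forward_ids = [1, 5, 7, 8]
forward_weights = [9, 8, 4, 6]
li_weight = best_weight_in_range(forward_ids, forward_weights, 7, 8)
lu_weight = best_weight_in_range(forward_ids, forward_weights, 9, 9)
```

Because keyword IDs under one trie node form a contiguous range, a single binary search answers whether the record contains any keyword with that prefix, without touching the inverted lists.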

  18. Outline • Problem Formulation • Instant exact search • Instant fuzzy search • Experiments

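  19. Ranking Framework (Fuzzy matching) • [Figure: the query "graph icde li" matched against a record with keywords graph, gray, icdm, gross, lin, liu. Each query keyword takes the Max over its matching record keywords, with every keyword score weighted by its similarity to the query keyword (terms shown: Score(graph), Sim(icde,icdm) * Score(icdm), Sim(li,i) * Score(lin), Score(lin), Score(liu)); an Aggregate function combines the per-keyword results.]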
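
The fuzzy variant of the ranking function can be sketched by weighting each keyword score with a similarity value before taking the maximum. The similarity values below come from a hand-written lookup table, the aggregate is assumed to be a sum, and the names `fuzzy_record_score` and `table_similarity`, the scores, and the threshold are hypothetical.

```python
from typing import Callable, Dict, List


def fuzzy_record_score(query: List[str], record_keywords: Dict[str, float],
                       similarity: Callable[[str, str], float],
                       tau: float) -> float:
    """Sum, over the query keywords, the best similarity-weighted score
    among record keywords whose similarity is at least tau."""
    total = 0.0
    for term in query:
        best = 0.0
        for keyword, score in record_keywords.items():
            sim = similarity(term, keyword)
            if sim >= tau:
                best = max(best, sim * score)
        if best == 0.0:
            return 0.0  # no record keyword is similar enough to this term
        total += best
    return total


# Hand-written similarity table (an assumption made for the example).
SIMILARITY = {("graph", "graph"): 1.0, ("icde", "icdm"): 0.5,
              ("li", "lin"): 1.0, ("li", "liu"): 1.0}


def table_similarity(term: str, keyword: str) -> float:
    return SIMILARITY.get((term, keyword), 0.0)


record = {"graph": 9.0, "gray": 2.0, "icdm": 8.0,
          "gross": 1.0, "lin": 4.0, "liu": 6.0}
fuzzy_score = fuzzy_record_score(["graph", "icde", "li"], record,
                                 table_similarity, tau=0.45)
```

Here icde has no exact match in the record, so its contribution is the score of icdm scaled by the similarity between the two keywords.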

  20. Computing Similar Prefixes [Ji et al. WWW’09] • Query {graph, icde, li}, similarity threshold τ=0.45 • [Figure: the keyword trie.]

  21. Top-k Algorithm • [Figure: each query keyword (li, graph, icde) has its own Max Heap with GetMax() over the lists of its similar keywords (labels shown: lui, icde, icde, graph, icdm, lin, icdm, liu); each list is weighted by its similarity (×1 or ×0.5), and the per-keyword results are combined by sum.]

  22. Efficient Random Access (method 1) • Probing on Forward Lists • [Figure: the trie with keyword-ID ranges.] • Binary search: [5,6], [7,9], [7,8], [9,9], 7, 8, 9

  23. Efficient Random Access (method 2) • Probing on Trie Leaf Nodes • [Figure: the trie with keyword-ID ranges; leaf nodes are annotated with pairs such as (li, 1) and (li, 0.5).] • Traverse the forward list of

  24. Optimization by materializing union lists • Time/space tradeoff • Cost-based analysis for a space budget • [Figure: the keyword trie.]

  25. Outline • Problem Formulation • Instant exact search • Instant fuzzy search • Experiments

  26. Data sets and index costs

  27. Exact Search (DBLP) • k=10, similarity threshold τ=0.6

  28. Exact Search (DBLP) • k=10, similarity threshold τ=0.6

  29. Fuzzy Search • DBLP, k=10, similarity threshold τ=0.6 • TA • NRA

  30. Other results (not included in the paper) • More general ranking (e.g., positional information) • Other languages • Location-based search

  31. Conclusions (ipubmed.ics.uci.edu) • Efficient techniques for instant fuzzy search

  32. Acknowledgements • The authors have financial interest in Bimaple Technology Inc., a company currently commercializing some of the techniques described in this publication. • Chen Li was partially supported by NIH grant 1R21LM010143-01A1. • Guoliang Li, Jiannan Wang, and Jianhua Feng were partly supported by the National Natural Science Foundation of China under Grant No. 61003004, the National Grand Fundamental Research 973 Program of China under Grant No. 2011CB302206, a project of Tsinghua University under Grant No. 20111081073, and the “NExT Research Center” funded by MDA, Singapore, under Grant No. WBS:R-252-300-001-490.
