
Efficient Interactive Fuzzy Keyword Search

Shengyue Ji (1), Guoliang Li (2), Chen Li (1), Jianhua Feng (2)
(1) University of California, Irvine; (2) Tsinghua University

Presentation Transcript


  1. Efficient Interactive Fuzzy Keyword Search. Shengyue Ji (1), Guoliang Li (2), Chen Li (1), Jianhua Feng (2). (1) University of California, Irvine; (2) Tsinghua University

  2. Traditional Keyword Search: Too many results! No result! Complicated and still no result!

  3. Interactive Fuzzy Keyword Search Features: • Interactive: data exploration • Fuzzy: error tolerant • Multiple keywords: search on-the-fly

  4. Fundamentals
     • Data
       • R: a set of records
       • W: a set of distinct words
     • Query
       • Q = {p_1, p_2, …, p_l}: a set of prefixes
       • δ: edit-distance threshold
     • Query result
       • R_Q: the set of records that contain every query prefix or a similar form of it (conjunctive)
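
The query semantics above can be pinned down with a brute-force reference sketch. It is not the method from the talk, only a baseline that checks every record directly. The helper names (`prefix_edit_distance`, `search`), the whitespace tokenization, and the sample records are all assumptions made for illustration.

```python
def prefix_edit_distance(query: str, word: str) -> int:
    """Smallest edit distance between `query` and any prefix of `word`."""
    # prev[j] holds ed(query[:i], word[:j]) for the previous value of i.
    prev = list(range(len(word) + 1))
    for i, qc in enumerate(query, 1):
        cur = [i]
        for j, wc in enumerate(word, 1):
            cur.append(min(prev[j] + 1,                # drop a query character
                           cur[j - 1] + 1,             # add a word character
                           prev[j - 1] + (qc != wc)))  # match or substitute
        prev = cur
    # Any column may be the end of the matched prefix, so take the minimum.
    return min(prev)


def search(records: dict, prefixes: list, delta: int) -> list:
    """Record ids whose text has, for every query prefix, a similar word."""
    return [rid for rid, text in records.items()
            if all(any(prefix_edit_distance(p, w) <= delta
                       for w in text.split())
                   for p in prefixes)]


# Hypothetical data: the misspelled "vdb" still matches "vldb" when delta = 1.
records = {1: "vldb journal", 2: "sigmod record", 3: "vldb library"}
print(search(records, ["vdb", "li"], 1))
```

A scan like this touches every word of every record on each keystroke, which is the cost the trie-based steps in the rest of the talk aim to avoid.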

  5. Contributions / Outline
     • Step 1
       • Incremental fuzzy prefix matching
     • Step 2
       • Multi-prefix intersection methods
       • Cache-based prefix intersection

  6. Observation
     • W = {exam, example, exemplar, exempt, sample}
     • δ = 2
     • Q' = exampl, Q = example
     [Figure: edit operations relating the prefixes similar to Q' to those similar to Q; edge labels: delete e, match e, substitute e with a]

  7. Trie Indexing
     Computing the set of active nodes Φ_Q:
     • Initialization
     • Incremental step
     [Figure: trie over the words in W; the active nodes for Q = example are annotated with their edit distances]

  8. Initialization
     • Q = ε
     • Initialize Φ_ε with all nodes within a depth of δ
     [Figure: trie with the active nodes for Q = ε: the root at distance 0, nodes e and s at distance 1, their children x and a at distance 2]

  9. Incremental Computation: Algorithm
     • Incremental computation from Φ_Q' to Φ_Q
     • add(Φ_Q, <n, d>) takes effect only if Φ_Q contains no active node with the same n and a smaller d
     [Figure: algorithm details]
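
The incremental step can be sketched roughly as follows. This is a minimal illustration rather than the authors' code: trie nodes are identified by their prefix strings, and the names `build_trie`, `add`, `initial_active_nodes`, and `extend` are assumptions. The case analysis (deletion, substitution, match followed by insertions) follows the slides, with `add` keeping only the smallest distance per node.

```python
from collections import defaultdict


def build_trie(words):
    """Map each trie node (named by its prefix string) to its child nodes."""
    children = defaultdict(set)
    for w in words:
        for i in range(len(w)):
            children[w[:i]].add(w[:i + 1])
    return children


def add(active, node, dist):
    """Record <node, dist> unless the node is already active with dist or less."""
    if dist < active.get(node, dist + 1):
        active[node] = dist


def initial_active_nodes(children, delta):
    """Active nodes for the empty query: every node within depth delta."""
    active, frontier = {"": 0}, [""]
    for depth in range(1, delta + 1):
        frontier = [c for n in frontier for c in children.get(n, ())]
        for c in frontier:
            active[c] = depth
    return active


def extend(children, active, ch, delta):
    """Active nodes for Q = Q' + ch, computed from the active nodes of Q'."""
    new_active = {}
    for node, d in active.items():
        if d + 1 <= delta:
            add(new_active, node, d + 1)           # deletion: ch is dropped
        for child in children.get(node, ()):
            if child[-1] != ch:
                if d + 1 <= delta:
                    add(new_active, child, d + 1)  # substitution
            else:
                add(new_active, child, d)          # match
                frontier = [child]                 # insertions after a match
                for extra in range(1, delta - d + 1):
                    frontier = [g for n in frontier
                                for g in children.get(n, ())]
                    for g in frontier:
                        add(new_active, g, d + extra)
    return new_active


# The word set and threshold from the Observation slide.
trie = build_trie(["exam", "example", "exemplar", "exempt", "sample"])
active = initial_active_nodes(trie, 2)
for ch in "example":
    active = extend(trie, active, ch, 2)
print(sorted(active.items()))
```

Each keystroke only visits the previous active nodes and their nearby descendants, so the work depends on the size of the active set rather than on the size of the trie.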

  10. Incremental Computation: Example
     • Q = e
     [Figure: the trie shown twice, once with the active nodes for Q = ε and once with the active nodes for Q = e, each annotated with its edit distance]

  11. Incremental Computation: Discussion
     • Insertions
       • Needed after matches
       • Not needed after deletions and substitutions
         • Deletions and insertions do not co-occur in adjacent positions
         • Adjacent substitutions and insertions are interchangeable
     • Correctness and completeness
       • Can be proved by reducing from/to edit-distance computation

  12. Outline
     • Step 1
       • Incremental fuzzy prefix matching
     • Step 2
       • Multi-prefix intersection methods
       • Cache-based prefix intersection

  13. Multi-Prefix Intersection
     • Q = vldb li
     • Multi-prefix intersection
       • Return the records that contain every query keyword as a prefix (or a similar form of it)

  14. Multi-Prefix Intersection: Method 1
     • Q = vldb li
     [Figure: trie with an inverted list of record ids at each leaf. Union of the lists under "li": 1 3 4 5 6 8. List for "vldb": 6 7 8. Intersection: 6 8]
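
Judging from the lists on the slide, Method 1 unions the inverted lists of all words under each keyword and then intersects the unions. The sketch below illustrates that reading. The word-to-record lists are hypothetical, chosen only so that the unions reproduce the lists visible on the slide, and exact prefix matching stands in for the fuzzy matching of Step 1 to keep the example short.

```python
def method1(inverted: dict, prefixes: list) -> list:
    """Union the inverted lists under each prefix, then intersect the unions."""
    result = None
    for p in prefixes:
        union = set()
        for word, record_ids in inverted.items():
            if word.startswith(p):  # stand-in for fuzzy prefix matching
                union.update(record_ids)
        result = union if result is None else result & union
    return sorted(result or [])


# Hypothetical inverted index (word -> ids of the records containing it).
inverted = {
    "li": [1, 4],
    "lin": [3, 5],
    "liu": [6, 8],
    "vldb": [6, 7, 8],
}
print(method1(inverted, ["vldb", "li"]))
```

The cost is dominated by materializing each union, which can be large for short prefixes such as "li".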

  15. Multi-Prefix Intersection: Method 2
     • Q = vldb li
     [Figure: trie whose nodes carry keyword-id ranges such as [1, 7], [2, 6], and [2, 4], together with per-record keyword-id lists; labels: "Read each", "Verify/Probe [2, 4]"]
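
One plausible reading of the "Read each" and "Verify/Probe" labels is sketched below: read the cheapest keyword's records, then probe each record's sorted keyword-id list for an id inside the range of every keyword. The function names, the record contents, and the use of exact prefixes in place of fuzzy ones are assumptions made for illustration.

```python
from bisect import bisect_left


def build_indexes(records):
    """Assign ids to words in sorted order; build forward and inverted lists."""
    vocab = sorted({w for words in records.values() for w in words})
    word_id = {w: i + 1 for i, w in enumerate(vocab)}
    forward = {rid: sorted({word_id[w] for w in words})
               for rid, words in records.items()}
    inverted = {k: [] for k in word_id.values()}
    for rid in sorted(forward):
        for k in forward[rid]:
            inverted[k].append(rid)
    return vocab, forward, inverted


def prefix_range(vocab, prefix):
    """Inclusive id range of the words starting with prefix (empty if lo > hi)."""
    lo = bisect_left(vocab, prefix)
    hi = bisect_left(vocab, prefix + "\uffff")
    return lo + 1, hi


def has_id_in_range(sorted_ids, lo, hi):
    """Probe a forward list: does it contain any keyword id in [lo, hi]?"""
    i = bisect_left(sorted_ids, lo)
    return i < len(sorted_ids) and sorted_ids[i] <= hi


def method2(records, prefixes):
    vocab, forward, inverted = build_indexes(records)
    ranges = [prefix_range(vocab, p) for p in prefixes]
    # Read the records of the keyword with the shortest total list length.
    seed = min(ranges,
               key=lambda r: sum(len(inverted[k]) for k in range(r[0], r[1] + 1)))
    candidates = sorted({rid for k in range(seed[0], seed[1] + 1)
                         for rid in inverted[k]})
    # Verify each candidate by probing its forward list with every range.
    return [rid for rid in candidates
            if all(has_id_in_range(forward[rid], lo, hi) for lo, hi in ranges)]


# Hypothetical records (record id -> words).
records = {1: ["li"], 2: ["icde"], 3: ["lin"], 4: ["li"], 5: ["lin"],
           6: ["liu", "vldb"], 7: ["vldb", "data"], 8: ["liu", "vldb"]}
print(method2(records, ["vldb", "li"]))
```

Because words under a common prefix receive consecutive ids, each probe is a single binary search, and no union list has to be built for the non-seed keywords.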

  16. Experimental Results • Computing similar prefixes

  17. Experimental Results • Multi-prefix intersection

  18. Experimental Results • Overall scalability

  19. TASTIER: Efficient Auto-Completion, Type-Ahead Search. http://tastier.ics.uci.edu/ Thank You! Questions? Efficient Interactive Fuzzy Keyword Search. Shengyue Ji, Guoliang Li, Chen Li, Jianhua Feng. UC Irvine & Tsinghua
