
Efficient Interactive Fuzzy Keyword Search

Shengyue Ji¹, Guoliang Li², Chen Li¹, Jianhua Feng²

¹ University of California, Irvine

² Tsinghua University

Traditional Keyword Search

Too many results!

No result!

Complicated and still no result!

Interactive Fuzzy Keyword Search

Features:

  • Interactive: data exploration
  • Fuzzy: error tolerant
  • Multiple keywords: search on-the-fly
Fundamentals
  • Data
    • R: a set of records
    • W: a set of distinct words
  • Query
    • Q = {p1, p2, …, pl}: a set of prefixes
    • δ: edit-distance threshold
  • Query result
    • RQ: a set of records such that each record has all query prefixes or their similar forms (conjunctive)
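
The definitions above can be made concrete with a brute-force sketch. It is not the indexed method the talk presents; it only spells out what "has all query prefixes or their similar forms" could mean, assuming that a prefix p is similar to a word w when some prefix of w is within edit distance δ of p. The function names and the `records` layout (record ID mapped to a list of words) are hypothetical.

```python
def edit_distance(a, b):
    """Classic dynamic-programming (Levenshtein) edit distance."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # match or substitute
        prev = cur
    return prev[-1]


def prefix_edit_distance(p, w):
    """Smallest edit distance between p and any prefix of the word w."""
    return min(edit_distance(p, w[:k]) for k in range(len(w) + 1))


def search(records, query, delta):
    """Return IDs of records that contain a similar word for every query prefix."""
    return [
        rid for rid, words in records.items()
        if all(any(prefix_edit_distance(p, w) <= delta for w in words)
               for p in query)
    ]
```

This scan touches every word of every record for every keystroke, which is the cost the trie-based steps in the rest of the talk are meant to avoid.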
Contributions / Outline
  • Step 1
    • Incremental fuzzy prefix matching
  • Step 2
    • Multi-prefix intersection methods
    • Cache-based prefix intersection
Observation
  • W = {exam, example, exemplar, exempt, sample}
  • δ = 2

  • Q’ = exampl, Q = example

[Figure: edit operations relating Q’ to Q, labeled match e, delete e, and substitute e with a]

Trie Indexing

Computing set of active nodes ΦQ

  • Initialization
  • Incremental step

[Figure: trie built over W. Caption: Active nodes for Q = example, each labeled with its edit distance (0, 1, or 2)]
Initialization
  • Q = ε

Initializing Φε with all nodes within a depth of δ

[Figure: trie over W in which the root is labeled 0 and the nodes at depths 1 and 2 are labeled 1 and 2]
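
The initialization rule can be sketched as a breadth-first walk that stops at depth δ. This is a minimal illustration, assuming a nested-dictionary trie without end-of-word markers and assuming that each node is identified by the prefix it spells; `build_trie` and `initial_active_nodes` are hypothetical names.

```python
from collections import deque


def build_trie(words):
    """Nested-dict trie: each node maps a character to its child node."""
    root = {}
    for word in words:
        node = root
        for ch in word:
            node = node.setdefault(ch, {})
    return root


def initial_active_nodes(trie, delta):
    """Active nodes for the empty query: every node at depth <= delta.

    The distance of a node equals its depth, because reaching it from the
    empty string takes one insertion per character.
    """
    active = {"": 0}
    queue = deque([("", trie)])
    while queue:
        prefix, node = queue.popleft()
        if len(prefix) == delta:
            continue  # deeper nodes would exceed the threshold
        for ch, child in node.items():
            active[prefix + ch] = len(prefix) + 1
            queue.append((prefix + ch, child))
    return active


W = ["exam", "example", "exemplar", "exempt", "sample"]
print(initial_active_nodes(build_trie(W), 2))
# {'': 0, 'e': 1, 's': 1, 'ex': 2, 'sa': 2}
```

With the word set W and δ = 2 from the Observation slide, the five nodes carry the distances 0, 1, 1, 2, 2.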

Incremental Computation: Algorithm
  • Incremental computation from ΦQ’ to ΦQ
  • add(ΦQ, <n, d>) has an effect only if ΦQ contains no active node with the same n and a smaller d

Algorithm Details
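
The transcript does not include the algorithm details, so the following is only a sketch of how an incremental step from ΦQ’ to ΦQ might look, assuming the deletion, substitution, and match rules named in the talk, with insertions applied only after a match. The trie layout (a dictionary from each prefix to its children) and all function names are assumptions made for illustration.

```python
def build_trie(words):
    """Map every prefix (a trie node) to a dict {char: child prefix}."""
    children = {"": {}}
    for w in words:
        for i, ch in enumerate(w):
            children[w[:i]][ch] = w[:i + 1]
            children.setdefault(w[:i + 1], {})
    return children


def add(active, node, d):
    """Keep <node, d> only if no entry for node has a smaller or equal d."""
    if d < active.get(node, d + 1):
        active[node] = d


def add_with_insertions(active, children, node, d, delta):
    """Add node at distance d, plus descendants reached by insertions."""
    stack = [(node, d)]
    while stack:
        n, dist = stack.pop()
        if dist > delta:
            continue
        add(active, n, dist)
        stack.extend((child, dist + 1) for child in children[n].values())


def step(active_prev, children, ch, delta):
    """Compute the active nodes after appending the character ch."""
    active = {}
    for n, d in active_prev.items():
        if d + 1 <= delta:
            add(active, n, d + 1)                        # deletion of ch
        for c, child in children[n].items():
            if c == ch:                                  # match, then insertions
                add_with_insertions(active, children, child, d, delta)
            elif d + 1 <= delta:
                add(active, child, d + 1)                # substitution
    return active


def active_nodes(children, query, delta):
    """Run the initialization, then one incremental step per keystroke."""
    active = {}
    add_with_insertions(active, children, "", 0, delta)
    for ch in query:
        active = step(active, children, ch, delta)
    return active


W = ["exam", "example", "exemplar", "exempt", "sample"]
print(active_nodes(build_trie(W), "e", 2))
```

Each keystroke only reads the previous active-node set, so the work per character depends on the number of active nodes rather than on the size of the trie.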

Incremental Computation: Example
  • Q = e

[Figure: trie over W showing the active nodes for Q = ε and the active nodes for Q = e, each labeled with its edit distance]

Incremental Computation: Discussion
  • Insertions
    • Needed after matches
    • Not needed after deletions and substitutions
      • deletions and insertions do not co-occur in adjacent positions
      • adjacent substitutions and insertions are interchangeable
  • Correctness and Completeness
    • Can be proved by reducing from/to edit-distance computation
Outline
  • Step 1
    • Incremental fuzzy prefix matching
  • Step 2
    • Multi-prefix intersection methods
    • Cache-based prefix intersection
Multi-Prefix Intersection
  • Q = vldb li
  • Multi-prefix intersection
    • To return records such that each record has all query keywords as prefixes (or their similar forms)
Multi-Prefix Intersection: Method 1

  • Q = vldb li

[Figure: trie with lists of record IDs attached to its word nodes]

  • li: 1 3 4 5 6 8
  • vldb: 6 7 8
  • Intersection: 6 8
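
One way to read Method 1 is: merge the record lists of all words matching each keyword, then intersect the merged lists. The sketch below follows that reading under two stated simplifications: exact prefix matching stands in for the fuzzy step, and the per-word lists in `INDEX` are hypothetical values chosen so that the merged lists equal the ones on the slide (li: 1 3 4 5 6 8, vldb: 6 7 8).

```python
# Hypothetical inverted index: word -> sorted record IDs.
INDEX = {
    "li": [1, 3, 5],
    "lin": [4, 6],
    "liu": [6, 8],
    "lu": [2],
    "vldb": [6, 7, 8],
}


def union_list(index, prefix):
    """Merge the record lists of all words starting with prefix."""
    ids = set()
    for word, records in index.items():
        if word.startswith(prefix):  # exact prefix match; fuzzy step omitted
            ids.update(records)
    return sorted(ids)


def intersect_sorted(a, b):
    """Intersect two sorted lists with a linear merge."""
    i = j = 0
    out = []
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            out.append(a[i])
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1
        else:
            j += 1
    return out


def search(index, query):
    """Intersect the merged lists of all keywords, shortest list first."""
    lists = sorted((union_list(index, p) for p in query.split()), key=len)
    result = lists[0]
    for other in lists[1:]:
        result = intersect_sorted(result, other)
    return result


print(search(INDEX, "vldb li"))  # [6, 8]
```

Starting from the shortest list keeps the intermediate result small, but every merged list still has to be built in full, which can be costly for short prefixes that match many words.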

Multi-Prefix Intersection: Method 2

  • Q = vldb li

[Figure: trie whose nodes are labeled with intervals such as [1, 7], [2, 6], [2, 4], and [7, 7]; callouts: Read each, Verify/Probe [2, 4]]
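
The slide text is sparse, so the following is one plausible reading of Method 2 rather than a confirmed description: words receive IDs in alphabetical order, a prefix corresponds to an interval of word IDs, and each record read from one keyword's list is probed for a word ID inside the other keyword's interval. The vocabulary, the per-record lists in `FORWARD`, and all function names are hypothetical; the data is chosen so that "li" maps to [2, 4] and "vldb" to [7, 7], as on the slide. Exact prefix matching again stands in for the fuzzy step.

```python
from bisect import bisect_left

# Hypothetical vocabulary; word ID = 1-based position in sorted order.
WORDS = sorted(["data", "li", "lin", "liu", "lu", "lui", "vldb"])

# Hypothetical per-record lists: record ID -> sorted word IDs it contains.
FORWARD = {6: [3, 7], 7: [1, 7], 8: [4, 7]}


def prefix_range(words, prefix):
    """Interval [lo, hi] of word IDs sharing prefix, or None if empty.

    Stands in for the interval stored on a trie node. Assumes no word
    contains the sentinel character U+FFFF.
    """
    lo = bisect_left(words, prefix)
    hi = bisect_left(words, prefix + "\uffff")
    return (lo + 1, hi) if lo < hi else None


def has_word_in_range(word_ids, lo, hi):
    """Binary-search a sorted word-ID list for any ID in [lo, hi]."""
    i = bisect_left(word_ids, lo)
    return i < len(word_ids) and word_ids[i] <= hi


def search(candidates, forward, words, other_prefixes):
    """Read each candidate record and probe it for the other prefixes."""
    ranges = [prefix_range(words, p) for p in other_prefixes]
    if any(r is None for r in ranges):
        return []
    return [rid for rid in candidates
            if all(has_word_in_range(forward[rid], lo, hi)
                   for lo, hi in ranges)]


# Records containing "vldb" (hypothetical list), probed for the prefix "li".
print(search([6, 7, 8], FORWARD, WORDS, ["li"]))  # [6, 8]
```

Under this reading only one list is read in full; every other keyword costs a binary search per candidate record instead of a merged list.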
Experimental Results
  • Computing similar prefixes
  • Multi-prefix intersection
  • Overall scalability

TASTIER: Efficient Auto-Completion, Type-Ahead Search

http://tastier.ics.uci.edu/

Thank You!

Questions?


Efficient Interactive Fuzzy Keyword Search

Shengyue Ji, Guoliang Li, Chen Li, Jianhua Feng

UC Irvine & Tsinghua