
Computer Science and Engineering


Inverted Linear Quadtree: Efficient Top K Spatial Keyword Search

Chengyuan Zhang¹, Ying Zhang¹, Wenjie Zhang¹, Xuemin Lin²,¹

¹ The University of New South Wales, Australia

² East China Normal University

- An enormous number of spatio-textual objects are available in many applications
- Online local search, e.g., online yellow pages
- Social network services, e.g., Facebook, Flickr

[Figure: example map with query keywords "pizza, coffee" and objects p1 (pizza, coffee, sushi), p2 (pizza, coffee, steak), p3 (pizza, sushi), p4 (coffee, sushi), p5 (pizza, steak, seafood)]

Data

- A set of spatio-textual objects
- Each object is represented by a location and a set of keywords
Query

- Query location (q.loc)
- A set of query keywords (q.T)
Answer

- The closest k objects, each of which contains all query keywords
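
The problem definition above can be illustrated with a brute-force baseline. This is a minimal sketch, not the method proposed in the talk: it scans every object, keeps those containing all query keywords, and returns the k nearest. The function name `top_k_spatial_keyword` and all coordinates are assumptions made for illustration; only the keyword sets follow the restaurant example.

```python
import heapq
import math

def top_k_spatial_keyword(objects, q_loc, q_keywords, k):
    """Return the ids of the k nearest objects that contain every query keyword.

    objects: iterable of (object_id, (x, y), set_of_keywords) tuples.
    """
    q_keywords = set(q_keywords)
    candidates = (
        (math.dist(q_loc, loc), obj_id)
        for obj_id, loc, keywords in objects
        if q_keywords <= keywords  # AND semantics: all query keywords required
    )
    return [obj_id for _, obj_id in heapq.nsmallest(k, candidates)]

# Hypothetical coordinates; keyword sets follow the restaurant example.
objects = [
    ("p1", (1.0, 1.0), {"pizza", "coffee", "sushi"}),
    ("p2", (4.0, 5.0), {"pizza", "coffee", "steak"}),
    ("p3", (2.0, 0.5), {"pizza", "sushi"}),
    ("p4", (0.5, 2.0), {"coffee", "sushi"}),
    ("p5", (6.0, 1.0), {"pizza", "steak", "seafood"}),
]
result = top_k_spatial_keyword(objects, (2.0, 2.0), {"pizza", "coffee"}, k=1)
print(result)
```

Every object is touched here regardless of its keywords or distance, which is the cost the index structures discussed in the rest of the talk try to avoid.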

Running Example

11 spatio-textual objects

Vocabulary {t1, t2, t3}

Query q with q.T = {t1, t2} and k =1

Objects:

- p1 (t1, t2)
- p2 (t1, t2)
- p3 (t1, t3)
- p4 (t1)
- p5 (t2, t3)
- p6 (t2, t3)
- p7 (t3)
- p8 (t3)
- p9 (t2)
- p10 (t1)
- p11 (t2)

Objects Accessed:

p3, p4, p7, p8, p5, p1!

k=1, q.T={t1, t2}

For each keyword t, construct an R tree for objects containing t

[Figure: one R-tree per keyword, R1 (t1), R2 (t2) and R3 (t3), each with nodes E1 and E2]

Objects Accessed:

p3, p4, p5, p1!

Index Structure

- Combination of an R-Tree and signature technique
- Each node contains a rectangle and a signature (a fixed-length bitmap)
- Each word is hashed to a particular bit
- The signature of a node is the bitwise OR of the signatures of all its child nodes

Keyword signatures: t1 = 10, t2 = 01, t3 = 01

k=1, q.T={t1, t2}

Objects Accessed:

p3, p4, p7, p1!

False positive!
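
The signature mechanism and its false positives can be sketched as follows. This is an illustrative sketch under assumptions: the keyword-to-bit table `KEYWORD_BIT` stands in for a hash function, and the helper names are hypothetical. It uses a 2-bit signature in which t2 and t3 collide on the same bit.

```python
# Assumed keyword-to-bit assignment; a real system would hash each keyword.
KEYWORD_BIT = {"t1": 0b10, "t2": 0b01, "t3": 0b01}

def signature(keywords):
    """Signature of an object: bitwise OR of the bits of its keywords."""
    sig = 0
    for kw in keywords:
        sig |= KEYWORD_BIT[kw]
    return sig

def node_signature(child_signatures):
    """Signature of a node: bitwise OR of the signatures of its children."""
    sig = 0
    for child in child_signatures:
        sig |= child
    return sig

def may_contain(sig, query_sig):
    """True if every bit of the query is set; may be a false positive."""
    return (sig & query_sig) == query_sig

query_sig = signature({"t1", "t2"})   # 0b11
p3 = signature({"t1", "t3"})          # 0b11, although p3 lacks t2
p4 = signature({"t1"})                # 0b10
p7 = signature({"t3"})                # 0b01
node = node_signature([p4, p7])       # 0b11, although no child matches

print(may_contain(p3, query_sig), may_contain(p7, query_sig), may_contain(node, query_sig))
```

Both p3 and the node over p4 and p7 pass the signature test although neither contains t1 and t2 together, so the objects must still be fetched and verified.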

[Figure: R-tree nodes E1 to E12 over the running example, with objects p5 and p1 marked]

- s: number of objects within the search region
- n: number of objects accessed
- p: average probability that an object is accessed

Naïve approach

- Disadvantages: all objects in the search region are accessed (large s and p = 1)
Inverted R-tree

- Advantages: excludes unrelated objects (small s)
- Disadvantages: cannot take advantage of the AND semantics (p = 1)
IR2-tree

- Advantages: has a filtering technique to reduce p
- Disadvantages: large s, and p is affected by unrelated objects
Other Single Augmented R-tree

- Other spatial keyword search: KR tree [R. Hariharan et al., SSDBM 2007], WIR tree [D. Wu et al., TKDE 2011]
- Spatial keyword ranking query: IR tree [G. Cong et al., PVLDB 2009], CM-CDIR tree [D. Wu et al., VLDBJ 2012]
- Their shortcomings: same as IR2-tree

Cost model: n = s * p
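
As a worked illustration of the cost model, the lines below plug in numbers for three situations. All values are hypothetical and chosen only to show how reducing s and p independently lowers the number of accessed objects n; they are not measurements from the talk.

```latex
\begin{align*}
n &= s \cdot p \\
\text{large } s,\ p = 1: \quad n &= 1000 \cdot 1 = 1000 \\
\text{small } s,\ p = 1: \quad n &= 100 \cdot 1 = 100 \\
\text{small } s,\ p = 0.1: \quad n &= 100 \cdot 0.1 = 10
\end{align*}
```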

Index structure

- has a small number of objects within the search region
- can prune objects within the search region
Properties

- falls in the category of inverted indexes
- exploits the AND semantics
- adapts to the distribution of the objects for each keyword

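Signature of a region regarding a keyword: 1 if the region is non-empty for that keyword, 0 if it is empty

Example with query keywords t1, t2 and objects p1: t1, p2: t1, t2, p3: t2

- Region with t1: 1 and t2: 0 has combined signature 0
- Region with t1: 1 and t2: 1 has combined signature 1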

- Regular space-partition-based indexing
- Each node can be identified by its split sequence (Morton code, a.k.a. Z-order)
- A circle denotes a non-leaf node and a square denotes a leaf node
- A leaf node is black if it is not empty; otherwise it is a white leaf node
- Only the black leaf nodes are kept (in a B+ tree)
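
The split sequence of a quadtree cell can be computed by interleaving the bits of its column and row. The sketch below assumes the quadrant numbering SW = 0, SE = 1, NW = 2, NE = 3 with rows growing northwards; the numbering and the function names are illustrative assumptions, not taken from the talk.

```python
QUADRANTS = ["SW", "SE", "NW", "NE"]  # assumed numbering 0..3

def morton_code(col, row, depth):
    """Interleave the bits of (col, row) into a Z-order code of 2*depth bits."""
    code = 0
    for level in range(depth - 1, -1, -1):  # most significant bit first
        x_bit = (col >> level) & 1
        y_bit = (row >> level) & 1
        code = (code << 2) | (y_bit << 1) | x_bit
    return code

def split_sequence(code, depth):
    """Decode a Morton code into the quadrant chosen at each level."""
    return [QUADRANTS[(code >> (2 * level)) & 3] for level in range(depth - 1, -1, -1)]

code = morton_code(2, 1, depth=2)
print(format(code, "04b"), split_sequence(code, depth=2))
```

Because cells that are close in the Z-order tend to be close in space, storing the non-empty leaf codes in sorted order (for example in a B+ tree) keeps nearby cells near each other on disk.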

[Figure: example quadrant labels and codes: NE, 1100; SW, SE, 0001]

For each keyword ti ∈ V we build a linear quadtree, denoted by LQi, for the objects which contain the keyword ti

Besides the black leaf nodes, we also keep the quadtree node information (signature): 1 for black leaf nodes and non-leaf nodes, and 0 otherwise

k=1, q.T={t1, t2}

Objects Accessed: p4, p1!
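
The pruning idea can be sketched with one signature set per keyword: a quadtree node is explored only if its signature is 1 for every query keyword. This is a simplified sketch under stated assumptions, not the authors' implementation: it uses a fixed-depth quadtree and in-memory Python sets instead of an adaptive linear quadtree stored in a B+ tree, it expects at least one query keyword, and all names, ids and coordinates are hypothetical.

```python
import heapq
import math

DEPTH = 3           # assumed fixed quadtree depth (8 x 8 leaf cells)
SIZE = 1 << DEPTH   # the indexed space is [0, SIZE) x [0, SIZE)

def build_index(objects):
    """Build, per keyword, the set of non-empty nodes and the leaf contents."""
    signatures = {}  # keyword -> {(level, col, row)} with signature 1
    leaves = {}      # keyword -> {(col, row): {object_id: location}}
    for obj_id, loc, keywords in objects:
        col, row = int(loc[0]), int(loc[1])
        for kw in keywords:
            leaves.setdefault(kw, {}).setdefault((col, row), {})[obj_id] = loc
            nodes = signatures.setdefault(kw, set())
            for level in range(DEPTH + 1):
                shift = DEPTH - level
                nodes.add((level, col >> shift, row >> shift))
    return signatures, leaves

def min_dist(q, level, col, row):
    """Smallest distance from q to the square covered by a node."""
    side = SIZE >> level
    x0, y0 = col * side, row * side
    dx = max(x0 - q[0], 0.0, q[0] - (x0 + side))
    dy = max(y0 - q[1], 0.0, q[1] - (y0 + side))
    return math.hypot(dx, dy)

def top_k(signatures, leaves, q, q_keywords, k):
    """Best-first search that skips nodes empty for any query keyword."""
    q_keywords = list(q_keywords)

    def all_non_empty(node):
        # AND semantics: every query keyword must have signature 1 here.
        return all(node in signatures.get(kw, ()) for kw in q_keywords)

    root = (0, 0, 0)
    heap = [(0.0, 0, root)] if all_non_empty(root) else []
    results = []
    while heap and len(results) < k:
        _, is_object, item = heapq.heappop(heap)
        if is_object:
            results.append(item)
            continue
        level, col, row = item
        if level == DEPTH:
            # Leaf cell: keep only objects present under every query keyword.
            cells = [leaves[kw][(col, row)] for kw in q_keywords]
            common = set(cells[0]).intersection(*cells[1:])
            for obj_id in common:
                heapq.heappush(heap, (math.dist(q, cells[0][obj_id]), 1, obj_id))
            continue
        for d_col in (0, 1):
            for d_row in (0, 1):
                child = (level + 1, 2 * col + d_col, 2 * row + d_row)
                if all_non_empty(child):
                    heapq.heappush(heap, (min_dist(q, *child), 0, child))
    return results

# Hypothetical objects: o3 and o4 are close to the query but each lacks a keyword.
objects = [
    ("o1", (6.5, 6.5), {"t1", "t2"}),
    ("o2", (7.5, 7.5), {"t1", "t2"}),
    ("o3", (2.5, 2.5), {"t1"}),
    ("o4", (2.5, 3.5), {"t2"}),
]
signatures, leaves = build_index(objects)
print(top_k(signatures, leaves, (2.0, 2.0), ["t1", "t2"], k=1))
```

In this run the leaf cells holding o3 and o4 are never opened, because each has signature 0 for one of the two query keywords, even though their common ancestors are non-empty for both.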

Data

- A set of spatio-textual objects
- Each object has a location and a set of keywords
Query

- A location (q.loc)
- A set of query keywords (q.T)
- A direction [, ]
Answer

- The closest k objects, each of which contains all keywords in q.T and lies in the search direction

Query

- Spatial location
- Query keywords
Returns the k best objects ranked by

- Spatial distance to the query location
- Textual relevance to the query keywords
Spatio-textual ranking Score

- The spatial proximity (δ) is the normalized Euclidean distance between p and q
- The textual relevance (θ) is the tf-idf based textual similarity between the description of p and the query keywords
Our Solution

- The maximal keyword weight replaces the bit signature (aggregate inverted linear quadtree)
- The spatial distance ranking function is replaced by the spatio-textual ranking score function
- Score-based pruning uses the weight and region of the quadtree node
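
Score-based pruning can be sketched as follows. The slides do not give the scoring formula, so this sketch assumes a linear combination `alpha * (1 - delta) + (1 - alpha) * theta` where higher is better, and assumes theta is the average weight of the query keywords; `alpha`, the function names and the numbers are all hypothetical.

```python
def score(delta, theta, alpha=0.5):
    """Assumed ranking score: delta is normalized distance, theta is relevance."""
    return alpha * (1.0 - delta) + (1.0 - alpha) * theta

def node_upper_bound(min_delta, max_weights, q_keywords, alpha=0.5):
    """Best score any object inside a quadtree node could reach.

    min_delta: normalized minimum distance from the query to the node region.
    max_weights: keyword -> maximal weight of that keyword inside the node.
    """
    theta_max = sum(max_weights.get(kw, 0.0) for kw in q_keywords) / len(q_keywords)
    return score(min_delta, theta_max, alpha)

def can_prune(min_delta, max_weights, q_keywords, kth_best_score, alpha=0.5):
    """A node is skipped if even its best case cannot beat the current k-th result."""
    return node_upper_bound(min_delta, max_weights, q_keywords, alpha) <= kth_best_score

bound = node_upper_bound(0.5, {"t1": 0.4, "t2": 0.2}, ["t1", "t2"])
print(bound, can_prune(0.5, {"t1": 0.4, "t2": 0.2}, ["t1", "t2"], kth_best_score=0.7))
```

The bound is safe under these assumptions because no object in the node can be closer than the node region or carry a keyword weight above the stored maximum.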

Implemented in Java

Debian Linux

- Intel Xeon 2.40GHz dual CPU
- 4 GB memory
Dataset

GN: US Board on Geographic Names

Tigers, Cars:

- Spatial datasets from Rtree-Portal
- Textual content from 20 Newsgroups
SYN: synthetic dataset

Queries (1000): each with a location and l query keywords

Metrics: response time and number of I/Os

Important Statistics

Parameters evaluated

- w’: minimal depth of the black leaf nodes
- c: the split threshold
- l: the number of query keywords

Best performance: w’ = 8 and c = 64

Grid: [M. Christoforaki et al., CIKM 2011]

Grid+SIG: the extension of Grid, utilizing the signature technique

ILQ: inverted linear quadtree based techniques

IVR: inverted R-tree [Y. Zhou et al., CIKM 2005]

MIR2: [I. D. Felipe et al., ICDE 2008]

KR: [R. Hariharan et al., SSDBM 2007]

WIR: [D. Wu et al., TKDE 2011]

IR: [G. Cong et al., PVLDB 2009]

CM-CDIR: [D. Wu et al., VLDBJ 2012]

Important properties of indexing techniques to support top k spatial keyword search

Propose the inverted linear quadtree structure to efficiently support top k spatial keyword search

Extensive experiments on both real and synthetic data

Future work

- Enhance the region-based signature technique: group objects to reduce false positives
- Support top k spatial keyword search on other metric spaces

Thank you!

Our Algorithm

- Aggregate ILQ
Compare with

- IR [G. Cong et al., PVLDB 2009]
- CM-CDIR [D. Wu et al., VLDBJ 2012]
Dataset: Tiger

Our Algorithm

- ILQ
Compare with

- DESKS [G. Li et al., ICDE 2012]