Computer Science and Engineering
Computer Science and Engineering. Inverted Linear Quadtree: Efficient Top K Spatial Keyword Search. Chengyuan Zhang 1, Ying Zhang 1, Wenjie Zhang 1, Xuemin Lin 2,1. 1 The University of New South Wales, Australia; 2 East China Normal University.

Presentation Transcript


Computer Science and Engineering

Inverted Linear Quadtree: Efficient Top K Spatial Keyword Search

Chengyuan Zhang 1, Ying Zhang 1, Wenjie Zhang 1, Xuemin Lin 2,1

1 The University of New South Wales, Australia

2 East China Normal University


Background

  • An enormous number of spatio-textual objects are available in many applications

  • online local search

    e.g., online yellow pages

  • social network services

    e.g., Facebook, Flickr


Example (figure)

Query keywords: pizza, coffee

  • p1 (pizza, coffee, sushi)

  • p2 (pizza, coffee, steak)

  • p3 (pizza, sushi)

  • p4 (coffee, sushi)

  • p5 (pizza, steak, seafood)


Top k spatial keyword search (TOPK-SK)

Data

  • A set of spatio-textual objects

  • Each object is represented by a location and a set of keywords

Query

  • Query location (q.loc)

  • A set of query keywords (q.T)

Answer

  • The closest k objects, each of which contains all query keywords


Naïve Approach

Running Example

11 spatio-textual objects

Vocabulary {t1, t2, t3}

Query q with q.T = {t1, t2} and k = 1

Objects (figure): p1 (t1,t2), p2 (t1,t2), p3 (t1,t3), p4 (t1), p5 (t2,t3), p6 (t2,t3), p7 (t3), p8 (t3), p9 (t2), p10 (t1), p11 (t2)

Objects Accessed: p3, p4, p7, p8, p5, p1!
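
The naïve approach on this running example can be sketched as a scan in distance order that stops after k full keyword matches. This is only an illustrative sketch: the keyword sets come from the slide, but the coordinates, the query location and the names `OBJECTS` and `naive_topk` are invented here, chosen so that the distance order reproduces the access order listed above.

```python
import math

# Keyword sets follow the running example. The coordinates are hypothetical:
# they are picked so that the distance order from q matches the slide.
OBJECTS = {
    "p1": ((6.0, 0.0), {"t1", "t2"}),
    "p2": ((7.0, 0.0), {"t1", "t2"}),
    "p3": ((1.0, 0.0), {"t1", "t3"}),
    "p4": ((2.0, 0.0), {"t1"}),
    "p5": ((5.0, 0.0), {"t2", "t3"}),
    "p6": ((8.0, 0.0), {"t2", "t3"}),
    "p7": ((3.0, 0.0), {"t3"}),
    "p8": ((4.0, 0.0), {"t3"}),
    "p9": ((9.0, 0.0), {"t2"}),
    "p10": ((10.0, 0.0), {"t1"}),
    "p11": ((11.0, 0.0), {"t2"}),
}


def naive_topk(objects, q_loc, q_terms, k):
    """Visit objects by increasing distance; stop after k contain all terms."""
    by_distance = sorted(
        objects.items(), key=lambda item: math.dist(item[1][0], q_loc)
    )
    accessed, result = [], []
    for name, (_loc, terms) in by_distance:
        accessed.append(name)          # every visited object counts as accessed
        if q_terms <= terms:           # AND semantics: all query keywords present
            result.append(name)
            if len(result) == k:
                break
    return result, accessed


result, accessed = naive_topk(OBJECTS, (0.0, 0.0), {"t1", "t2"}, k=1)
print(result, accessed)
```

Every object closer than the answer is accessed regardless of its keywords, which is the weakness the later index structures try to remove.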


Inverted R-tree [Y. Zhou, et al., CIKM 2005]

k = 1, q.T = {t1, t2}

For each keyword t, construct an R-tree for the objects containing t

Figure: three R-trees, R1 (t1), R2 (t2) and R3 (t3), each with entries E1 and E2

Objects Accessed: p3, p4, p5, p1!


IR2-tree [I. D. Felipe, et al., ICDE 2008]

Index Structure

  • Combination of an R-tree and the signature technique

  • Each node contains a rectangle and a signature (a fixed-length bitmap)

  • Each word is hashed to a particular bit

  • The signature of a node is the bitwise OR of the signatures of all its child nodes
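
The signature idea in the bullets above can be sketched in a few lines. This is a minimal illustration, not the IR2-tree implementation: the table `WORD_BIT` is a hypothetical stand-in for a hash function, set up so that t2 and t3 collide on the same bit, and the helper names are invented for this sketch.

```python
from functools import reduce

# Hypothetical stand-in for a hash function over a 2-bit signature.
# t2 and t3 deliberately map to the same bit to show a collision.
WORD_BIT = {"t1": 0b10, "t2": 0b01, "t3": 0b01}


def signature(words):
    """Bitmap of an object or query: OR together the bit of every word."""
    sig = 0
    for word in words:
        sig |= WORD_BIT[word]
    return sig


def node_signature(child_signatures):
    """Signature of a node: bitwise OR of the signatures of its children."""
    return reduce(lambda a, b: a | b, child_signatures, 0)


def may_contain(sig, query_sig):
    """True if every query bit is set; a True answer can be a false positive."""
    return (sig & query_sig) == query_sig


query = signature({"t1", "t2"})
print(may_contain(signature({"t1"}), query))        # object with t1 only
print(may_contain(signature({"t1", "t3"}), query))  # object with t1 and t3
```

In this sketch the object holding t1 and t3 passes the test even though it lacks t2, because t3 sets the bit that t2 would have set. A negative answer is always safe; a positive answer still needs the actual keyword check.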


Example

k = 1, q.T = {t1, t2}

Signatures (figure): t1: 10, t2: 01, t3: 01

Figure: IR2-tree with entries E1 to E12

Objects Accessed: p3, p4, p7, p1!

False positive!


Observations

  • s: number of objects within the search region

  • n: number of objects accessed

  • p: average probability that an object is accessed

Naïve approach

  • Disadvantages: all objects in the search region are accessed (large s and p = 1)

Inverted R-tree

  • Advantages: excludes unrelated objects (small s)

  • Disadvantages: cannot take advantage of AND semantics (p = 1)

IR2-tree

  • Advantages: has a filtering technique to reduce p

  • Disadvantages: large s, and p is affected by unrelated objects

Other Single Augmented R-tree

  • Other spatial keyword search: KR tree [R. Hariharan, et al., SSDBM 2007], WIR tree [D. Wu, et al., TKDE 2011]

  • Spatial keyword ranking query: IR tree [G. Cong, et al., PVLDB 2009], CM-CDIR tree [D. Wu, et al., VLDBJ 2012]

  • Their shortcomings: same as IR2-tree

Cost model: n = s * p


Motivation

Index structure

  • have a small number of objects within the search region

  • can prune objects within the search region

Properties

  • falls in the category of inverted index

  • exploit the AND semantics

  • adaptive to the distribution of the objects for each keyword


Motivation

Signature of a region regarding a keyword

1: non-empty, 0: empty

Objects: p1: t1; p2: t1, t2; p3: t2

Query keywords: t1, t2

Region signatures (figure):

  • t1 : 1, t2 : 0, combined: 0

  • t1 : 1, t2 : 1, combined: 1
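
A small sketch of the per-keyword region signature may help here. It assumes, for illustration only, that p1 lies in one region and p2 and p3 in another; the slide does not state this assignment, and the function names are invented.

```python
def region_signature(objects_in_region, vocabulary):
    """Per keyword: 1 if some object in the region contains it, else 0."""
    present = set().union(*objects_in_region.values())
    return {term: int(term in present) for term in vocabulary}


def survives(sig, q_terms):
    """AND semantics: the region is kept only if every query keyword has bit 1."""
    return all(sig[term] == 1 for term in q_terms)


VOCABULARY = ["t1", "t2"]
# Hypothetical placement of the three objects into two regions.
region_a = {"p1": {"t1"}}
region_b = {"p2": {"t1", "t2"}, "p3": {"t2"}}

sig_a = region_signature(region_a, VOCABULARY)
sig_b = region_signature(region_b, VOCABULARY)
print(sig_a, survives(sig_a, {"t1", "t2"}))
print(sig_b, survives(sig_b, {"t1", "t2"}))
```

Region A is discarded without touching p1 because its t2 bit is 0. A region can still pass the test when its keywords are spread over different objects, so surviving regions need an object-level check.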


Linear Quadtree Structure

  • Regular space partition based indexing

  • Each node can be identified by its split sequence (Morton code, a.k.a. Z-order)

  • A circle denotes a non-leaf node and a square denotes a leaf node

  • A leaf node is black if it is not empty; otherwise, it is a white leaf node

  • Only the black leaf nodes are kept (in a B+ tree)

Example codes (figure): NE: 1100; SW, SE: 0001


IL-Quadtree

For each keyword ti ∈ V we build a linear quadtree, denoted by LQi, for the objects which contain the keyword ti

Besides the black leaf nodes, we also keep the quadtree node information (signature)

The signature is 1 for black leaf nodes and non-leaf nodes, and 0 otherwise
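
One way to picture the per-keyword signatures is as a set of node codes per keyword. The sketch below is a simplification under stated assumptions: every leaf sits at the same fixed depth (the structure on the slides splits adaptively), node codes are unpadded bit strings, and the names `cell_code` and `build_signatures` are invented for illustration.

```python
from collections import defaultdict


def cell_code(x, y, depth):
    """Code of the depth-level cell containing (x, y), with 0 <= x, y < 1."""
    ix, iy = int(x * 2**depth), int(y * 2**depth)
    bits = []
    for level in range(depth - 1, -1, -1):
        bits.append(str((iy >> level) & 1))
        bits.append(str((ix >> level) & 1))
    return "".join(bits)


def build_signatures(objects, depth):
    """Map each keyword to the node codes whose signature bit is 1.

    A node gets bit 1 for keyword t if it is a black leaf for t or an
    ancestor of one; any code missing from the set has bit 0.
    """
    signatures = defaultdict(set)
    for (x, y), terms in objects.values():
        code = cell_code(x, y, depth)
        for term in terms:
            for end in range(0, len(code) + 1, 2):
                signatures[term].add(code[:end])  # the leaf and all its ancestors
    return signatures


# Hypothetical objects in the unit square.
objects = {
    "p1": ((0.1, 0.1), {"t1"}),
    "p2": ((0.9, 0.9), {"t1", "t2"}),
}
sigs = build_signatures(objects, depth=2)
print(sorted(sigs["t1"]))
print(sorted(sigs["t2"]))
```

With these sets, testing whether a node can be skipped for a query reduces to a membership check per query keyword.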


Search Algorithm

k = 1, q.T = {t1, t2}

Objects Accessed: p4, p1!
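
The following is a rough sketch of a best-first search that prunes with per-keyword signatures. It is not the authors' algorithm: leaves sit at a fixed depth, everything is held in memory, and the coordinates, `DEPTH` and all function names are assumptions made so that the example touches only p4 and p1, as on the slide.

```python
import heapq
import math

DEPTH = 3  # fixed leaf depth (simplification; the real structure splits adaptively)


def cell_code(x, y, depth=DEPTH):
    """Code of the leaf cell containing (x, y), with 0 <= x, y < 1."""
    ix, iy = int(x * 2**depth), int(y * 2**depth)
    bits = []
    for level in range(depth - 1, -1, -1):
        bits.append(str((iy >> level) & 1))
        bits.append(str((ix >> level) & 1))
    return "".join(bits)


def min_dist(q, code):
    """Smallest distance from q to the square identified by code."""
    x = y = 0.0
    side = 1.0
    for i in range(0, len(code), 2):
        side /= 2
        y += side if code[i] == "1" else 0.0
        x += side if code[i + 1] == "1" else 0.0
    dx = max(x - q[0], 0.0, q[0] - (x + side))
    dy = max(y - q[1], 0.0, q[1] - (y + side))
    return math.hypot(dx, dy)


def build(objects):
    sig, leaves = {}, {}  # keyword -> node codes with bit 1; leaf code -> objects
    for name, (loc, terms) in objects.items():
        code = cell_code(*loc)
        leaves.setdefault(code, []).append(name)
        for term in terms:
            codes = sig.setdefault(term, set())
            for end in range(0, len(code) + 1, 2):
                codes.add(code[:end])
    return sig, leaves


def topk(objects, q_loc, q_terms, k):
    sig, leaves = build(objects)
    heap = [(0.0, 0, "")]  # (distance, kind, id); kind 0 = node, 1 = object
    result, accessed = [], []
    while heap and len(result) < k:
        _dist, kind, ident = heapq.heappop(heap)
        if kind == 1:
            result.append(ident)  # nothing left in the heap can be closer
            continue
        if any(ident not in sig.get(term, ()) for term in q_terms):
            continue  # some query keyword has signature 0 here: prune
        if len(ident) == 2 * DEPTH:
            for name in leaves.get(ident, []):
                loc, terms = objects[name]
                accessed.append(name)
                if q_terms <= terms:
                    heapq.heappush(heap, (math.dist(loc, q_loc), 1, name))
        else:
            for child in ("00", "01", "10", "11"):
                code = ident + child
                heapq.heappush(heap, (min_dist(q_loc, code), 0, code))
    return result, accessed


# Hypothetical coordinates; keyword sets follow the running example.
OBJECTS = {
    "p1": ((0.35, 0.30), {"t1", "t2"}),
    "p2": ((0.90, 0.90), {"t1", "t2"}),
    "p3": ((0.22, 0.21), {"t1", "t3"}),
    "p4": ((0.30, 0.30), {"t1"}),
    "p5": ((0.60, 0.10), {"t2", "t3"}),
}
result, accessed = topk(OBJECTS, (0.2, 0.2), {"t1", "t2"}, k=1)
print(result, accessed)
```

In this toy run p3 is the closest object, yet its cell is pruned because the t2 signature there is 0. Only the leaf shared by p1 and p4 is opened before the answer is returned.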


Direction-aware spatial keyword search [G. Li, et al., ICDE 2012]

  • Data

    • A set of spatio-textual objects

    • Each object has a location and a set of keywords

  • Query

    • A location (q.loc)

    • A set of query keywords (q.T)

    • A direction [, ]

  • Answer

    • The closest k objects, each of which contains all keywords in q.T, and in the search direction
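
The query definition above can be illustrated with a brute-force filter. This is neither DESKS nor the ILQ-based method; it only makes the semantics concrete. The direction is assumed to be an angular interval in radians, measured counter-clockwise from the positive x axis, and the names `in_direction`, `direction_aware_topk`, `lo` and `hi` are hypothetical.

```python
import math


def in_direction(q_loc, p_loc, lo, hi):
    """True if the bearing from q to p lies in [lo, hi], with 0 <= lo, hi < 2*pi.

    If lo > hi, the interval is taken to wrap around angle 0.
    """
    angle = math.atan2(p_loc[1] - q_loc[1], p_loc[0] - q_loc[0]) % (2 * math.pi)
    if lo <= hi:
        return lo <= angle <= hi
    return angle >= lo or angle <= hi


def direction_aware_topk(objects, q_loc, q_terms, lo, hi, k):
    """Closest k objects that contain all query keywords and lie in the direction."""
    candidates = [
        (math.dist(loc, q_loc), name)
        for name, (loc, terms) in objects.items()
        if q_terms <= terms and in_direction(q_loc, loc, lo, hi)
    ]
    return [name for _dist, name in sorted(candidates)[:k]]


# Hypothetical data: a is closer but lies to the west of q.
objects = {
    "a": ((-1.0, 0.5), {"t1", "t2"}),
    "b": ((2.0, 2.0), {"t1", "t2"}),
    "c": ((1.0, 1.0), {"t1"}),
}
print(direction_aware_topk(objects, (0.0, 0.0), {"t1", "t2"}, 0.0, math.pi / 2, k=1))
```

Here the answer is b: a is nearer but outside the first quadrant, and c is inside it but lacks t2.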


Spatial Keyword Based Ranking [G. Cong, et al., PVLDB 2009, VLDBJ 2012]

Query

  • Spatial location

  • Query keywords

Returns the k best objects ranked by

  • Spatial distance to the query location

  • Textual relevance to the query keywords

Spatio-textual ranking score

  • The spatial proximity (δ) is the normalized Euclidean distance between p and q

  • The textual relevance (θ) is the tf-idf based textual similarity between the description of p and the query keywords

Our Solution

  • The maximal keyword weight replaces the bit signature (aggregate inverted linear quadtree)

  • The spatial distance ranking function is replaced by the spatio-textual ranking score function

  • Score-based pruning uses the weight and region of the quadtree node
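
The transcript does not preserve the score formula, so the sketch below assumes a common weighted-sum form, score = α·δ + (1 − α)·(1 − θ), where lower is better. The parameter `alpha`, the normalizer `max_dist` and all function names are assumptions of this sketch, and θ is taken as a precomputed value in [0, 1].

```python
import math


def ranking_score(dist, max_dist, theta, alpha=0.5):
    """Assumed combined score: smaller means closer and more relevant."""
    delta = dist / max_dist              # normalized Euclidean distance
    return alpha * delta + (1 - alpha) * (1 - theta)


def node_lower_bound(min_dist_to_node, max_dist, max_theta_in_node, alpha=0.5):
    """Best score any object inside a node could reach.

    Uses the node region (minimum distance) and the maximal keyword weight
    kept for the node in place of the bit signature.
    """
    return ranking_score(min_dist_to_node, max_dist, max_theta_in_node, alpha)


def can_prune(node_bound, kth_best_score):
    """A node is skipped if even its best possible score cannot beat the k-th."""
    return node_bound >= kth_best_score


score = ranking_score(dist=5.0, max_dist=10.0, theta=0.8)
bound = node_lower_bound(min_dist_to_node=6.0, max_dist=10.0, max_theta_in_node=0.7)
print(score, bound, can_prune(bound, score))
```

The bound is valid under this assumed form because the score grows with distance and shrinks with relevance, so the minimum distance and the maximal weight together give the most optimistic value for the node.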


Experimental Setting

Implemented in Java

Debian Linux

  • Intel Xeon 2.40GHz dual CPU

  • 4 GB memory

Dataset

  • GN: US Board on Geographic Names

  • Tigers, Cars: spatial datasets from Rtree-Portal, textual content from 20 Newsgroups

  • SYN: synthetic dataset

Query (1000): location, #l query keywords

Evaluate response time and # I/O


Important Statistics

Parameters evaluated


Tuning

w’ : Minimal depth of the black leaf node

c: The split threshold

Best performance:

  • w’ = 8 and c = 64


l: The number of query keywords

Grid: [M. Christoforaki, et al., CIKM 2011]

Grid+SIG: the extension of Grid, utilizing the signature technique


Algorithms Evaluated

ILQ

  • Inverted Linear Quadtree based techniques

IVR

  • Inverted R-tree [Y. Zhou, et al., CIKM 2005]

MIR2

  • [I. D. Felipe, et al., ICDE 2008]

KR

  • [R. Hariharan, et al., SSDBM 2007]

WIR

  • [D. Wu, et al., TKDE 2011]

IR

  • [G. Cong, et al., PVLDB 2009]

CM-CDIR

  • [D. Wu, et al., VLDBJ 2012]


Evaluation on different datasets


Comparison – Varying l


Comparison – Varying k


Comparison – Varying Parameters


Conclusion

Important properties of indexing techniques to support top k spatial keyword search

Propose the inverted linear quadtree structure to efficiently support top k spatial keyword search

Extensive experiments on both real and synthetic data

Future work

Enhance the region-based signature technique: group objects to reduce false positives

Support top k spatial keyword search on other metric spaces


Thank you!


Spatial Keyword Ranking Query

  • Our Algorithm

    • Aggregate ILQ

  • Compare with

    • IR [G. Cong, et al., PVLDB 2009]

    • CM-CDIR [D. Wu, et al., VLDBJ 2012]

  • Dataset: Tiger


Direction-Aware TOPK-SK Query

  • Our Algorithm

    • ILQ

  • Compare with

    • DESKS [G. Li, et al., ICDE 2012]


Comparison – Varying k


IR-Tree


KR* Tree

