Guoliang Li, Beng Chin Ooi, Jianhua Feng, Jianyong Wang, Lizhu Zhou, Tsinghua University

EASE: An Effective 3-in-1 Keyword Search Method for Unstructured, Semi-structured and Structured Data
Guoliang Li, Beng Chin Ooi, Jianhua Feng, Jianyong Wang, Lizhu Zhou, Tsinghua University
SIGMOD 2008
2009. 03. 19.
Summarized and presented by Jaehui Park, IDS Lab., Seoul National University

INTRODUCTION
• Keyword search over text documents, XML documents, and relational databases
• A graph index instead of the traditional inverted index
  • The inverted index is effective for unstructured data
  • but inadequate for complex structural information
• EASE (Efficient and Adaptive keyword Search method)
  • An efficient algorithmic basis for scalable top-k-style processing of large amounts of heterogeneous data
  • Employs an adaptive, efficient and novel index

Contributions
• A model of unstructured, semi-structured and structured data as graphs
• An effective graph index, as opposed to the inverted index
• A novel ranking mechanism that covers both the DB and the IR viewpoints
• An extensive performance study

Motivation
• Unstructured data
  • Link awareness: relevant data may be split across different pages that are linked through hyperlinks
• (Semi-)structured data
  • LCA (lowest common ancestor) semantics
  • Connected trees with minimal cost, e.g., Steiner trees
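For tree-structured (XML) data, the LCA of the nodes that contain the query keywords is often computed over Dewey labels, where each node is labeled with the path of child positions from the root. The sketch below is a minimal illustration of that standard technique, assuming Dewey labels as tuples; it is not EASE's or SLCA's [28] implementation, and the example labels are hypothetical.

```python
from typing import Sequence, Tuple

Dewey = Tuple[int, ...]  # child positions from the root, e.g. (0, 1, 2)


def lca(labels: Sequence[Dewey]) -> Dewey:
    """Return the Dewey label of the lowest common ancestor of the given nodes.

    With Dewey encoding, the LCA is simply the longest common prefix
    of all the labels.
    """
    if not labels:
        raise ValueError("need at least one node")
    prefix = list(labels[0])
    for label in labels[1:]:
        # Keep only the part of the prefix that this label shares.
        n = 0
        while n < min(len(prefix), len(label)) and prefix[n] == label[n]:
            n += 1
        del prefix[n:]
    return tuple(prefix)


# Hypothetical tree: an <author> and a <title> under the same <paper> (0, 1),
# and another node under a different paper (0, 2).
author_node = (0, 1, 2)
title_node = (0, 1, 0)
other_node = (0, 2, 0)

print(lca([author_node, title_node]))  # -> (0, 1)
print(lca([author_node, other_node]))  # -> (0,)
```

The second call shows why plain LCA semantics can be too coarse: keywords in unrelated subtrees meet only at the root.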

r-Radius Steiner Graph Problem
• Goal: find meaningful Steiner graphs of acceptable size
• Key concepts
  • Centric distance
  • Radius
  • r-radius Steiner graph: a Steiner graph whose radius is no larger than r
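A minimal sketch of the two distance concepts, assuming they follow the standard graph-theory definitions: the centric distance of a node v is CD(v) = max over u of dis(u, v), and the radius of a graph is R(G) = min over v of CD(v). Edges are assumed unweighted and undirected, so distances come from breadth-first search; the function names and the example graph are hypothetical.

```python
from collections import deque
from typing import Dict, Set

Graph = Dict[str, Set[str]]  # undirected adjacency lists (assumed symmetric)


def bfs_distances(graph: Graph, source: str) -> Dict[str, int]:
    """Hop distances from source to every node reachable from it."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in graph[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist


def centric_distance(graph: Graph, v: str) -> float:
    """CD(v): largest distance from v to any node; infinite if G is disconnected."""
    dist = bfs_distances(graph, v)
    if len(dist) < len(graph):
        return float("inf")
    return max(dist.values())


def radius(graph: Graph) -> float:
    """R(G): smallest centric distance over all nodes of G."""
    return min(centric_distance(graph, v) for v in graph)


def is_r_radius(graph: Graph, r: int) -> bool:
    """True if the graph's radius does not exceed r."""
    return radius(graph) <= r


# A path a - b - c - d: the middle nodes are the best centres.
path = {"a": {"b"}, "b": {"a", "c"}, "c": {"b", "d"}, "d": {"c"}}
print(centric_distance(path, "a"))  # -> 3
print(radius(path))                 # -> 2
```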

Example: DBLP

The r-Radius Steiner Graph Problem
Given a graph G and an input keyword query K, the r-Radius Steiner Graph Problem is to find all the r-radius Steiner graphs in G that contain all or a portion of the input keywords in K, ranked by their relevance to K.

EASE: An adaptive search method
• Inverted indices are not effective for discovering the much richer structural relationships that exist in databases with complicated structures [10]
• Indexing r-radius Steiner graphs for every keyword combination is very expensive
• Proposed method
  1. Discover r-radius graphs (at indexing time)
  2. Extract r-radius Steiner graphs on the fly, by removing non-Steiner nodes

EASE: An adaptive search method
• Adjacency matrix
  • Used to extract r-radius graphs effectively
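The adjacency-matrix idea can be sketched with boolean matrix powers. This is a minimal sketch under a labeled assumption: the diagonal of M is set to 1, so that entry (i, j) of the boolean power M^r is 1 exactly when v_j is within r hops of v_i. Row i of M^r then gives the node set of an r-radius graph centred at v_i. The paper's Lemma 1 may state the condition differently; the helper names are hypothetical.

```python
from typing import FrozenSet, List, Tuple

Matrix = List[List[int]]


def adjacency_matrix(n: int, edges: List[Tuple[int, int]]) -> Matrix:
    """Boolean adjacency matrix of an undirected graph with n nodes.

    Assumption: the diagonal is 1, so powers of M mean "within r hops"
    rather than "exactly r hops".
    """
    m = [[1 if i == j else 0 for j in range(n)] for i in range(n)]
    for i, j in edges:
        m[i][j] = m[j][i] = 1
    return m


def bool_matmul(a: Matrix, b: Matrix) -> Matrix:
    """Boolean matrix product: OR over k of (a[i][k] AND b[k][j])."""
    n = len(a)
    return [[1 if any(a[i][k] and b[k][j] for k in range(n)) else 0
             for j in range(n)] for i in range(n)]


def bool_power(m: Matrix, r: int) -> Matrix:
    """M^r with boolean multiplication, for r >= 1."""
    result = m
    for _ in range(r - 1):
        result = bool_matmul(result, m)
    return result


def r_radius_node_sets(m: Matrix, r: int) -> List[FrozenSet[int]]:
    """For each centre i, the nodes within r hops of i (row i of M^r).

    The subgraph induced by such a set has radius <= r: every shortest
    path from the centre to a member stays inside the set.
    """
    mr = bool_power(m, r)
    return [frozenset(j for j, bit in enumerate(row) if bit) for row in mr]


# Path 0 - 1 - 2 - 3 with r = 1.
m = adjacency_matrix(4, [(0, 1), (1, 2), (2, 3)])
for centre, nodes in enumerate(r_radius_node_sets(m, 1)):
    print(centre, sorted(nodes))
```

For large graphs a sparse representation (or a bounded BFS from each node) would replace the dense O(n^3) products used here for clarity.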

EASE: An adaptive search method
• Determining which subgraphs are r-radius graphs: by Lemma 1
• Graph index, for efficient retrieval of r-radius graphs
  • Maps each query keyword k to the r-radius graphs that contain k
• Extracting r-radius Steiner graphs: by Theorem 1
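The graph index behaves like an inverted index whose postings are r-radius graph ids rather than documents. A minimal sketch, assuming each indexed graph is summarized by the set of keywords occurring in its nodes; graph ids, keywords, and function names are hypothetical. Because the problem allows graphs that contain only a portion of the keywords, candidates are ordered here by how many query keywords they match.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple


def build_graph_index(graphs: Dict[str, Set[str]]) -> Dict[str, Set[str]]:
    """Map each keyword to the ids of the r-radius graphs that contain it.

    `graphs` maps a graph id to the keywords occurring in any of its nodes.
    """
    index: Dict[str, Set[str]] = defaultdict(set)
    for gid, keywords in graphs.items():
        for kw in keywords:
            index[kw].add(gid)
    return dict(index)


def candidate_graphs(index: Dict[str, Set[str]],
                     query: Iterable[str]) -> List[Tuple[str, int]]:
    """Graphs containing at least one query keyword, most matches first."""
    hits: Dict[str, int] = defaultdict(int)
    for kw in set(query):
        for gid in index.get(kw, ()):
            hits[gid] += 1
    # Sort by number of matched keywords (descending), then by id for stability.
    return sorted(hits.items(), key=lambda item: (-item[1], item[0]))


graphs = {
    "G1": {"keyword", "search", "ooi"},
    "G2": {"keyword", "xml"},
    "G3": {"ranking"},
}
index = build_graph_index(graphs)
print(candidate_graphs(index, ["keyword", "ooi"]))  # -> [('G1', 2), ('G2', 1)]
```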

EASE: An adaptive search method
• Computing the Steiner nodes
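The slide's figure on computing Steiner nodes is not reproduced. As a stand-in, the sketch below assumes a shortest-path criterion: content nodes (nodes containing a query keyword) are kept, and any other node is kept only if it lies on a shortest path between two content nodes, i.e. d(u, s) + d(s, v) = d(u, v). This is an assumption for illustration and may differ from the paper's Theorem 1; all names and the example graph are hypothetical.

```python
from collections import deque
from typing import Dict, Set

Graph = Dict[str, Set[str]]  # undirected adjacency lists


def bfs_distances(graph: Graph, source: str) -> Dict[str, int]:
    """Hop distances from source to every node reachable from it."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in graph[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist


def content_nodes(node_keywords: Dict[str, Set[str]], query: Set[str]) -> Set[str]:
    """Nodes that contain at least one query keyword."""
    return {n for n, kws in node_keywords.items() if kws & query}


def steiner_nodes(graph: Graph, content: Set[str]) -> Set[str]:
    """Content nodes plus every node on a shortest path between two of them."""
    dist = {c: bfs_distances(graph, c) for c in content}
    pairs = [(u, v) for u in content for v in content if u < v]
    keep = set(content)
    for s in graph:
        for u, v in pairs:
            du, dv = dist[u], dist[v]
            # s is on a shortest u-v path iff d(u, s) + d(s, v) == d(u, v)
            if v in du and s in du and s in dv and du[s] + dv[s] == du[v]:
                keep.add(s)
                break
    return keep


def steiner_graph(graph: Graph, keep: Set[str]) -> Graph:
    """Subgraph induced by the kept nodes (non-Steiner nodes removed)."""
    return {n: graph[n] & keep for n in keep}


g = {"a": {"b"}, "b": {"a", "c", "e"}, "c": {"b", "d"}, "d": {"c"}, "e": {"b"}}
kw = {"a": {"keyword"}, "b": set(), "c": {"search"}, "d": set(), "e": {"xml"}}
content = content_nodes(kw, {"keyword", "search"})
print(sorted(steiner_nodes(g, content)))  # -> ['a', 'b', 'c']
```

Dangling nodes such as d and e are pruned, which matches the slide's description of extracting Steiner graphs by removing non-Steiner nodes.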

EASE: An adaptive search method
• Maximal r-radius graphs
  • To avoid redundancy, keep only the maximal r-radius graphs in the graph index
• Overlapping graphs: graph partitioning
  • Avoids incurring huge storage costs
  • At query time, only the relevant graph partitions need to be retrieved
• Graph similarity
  • A bigger overlap means a higher similarity
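Keeping only maximal r-radius graphs amounts to discarding any node set that is a proper subset of another. A minimal sketch follows; for overlap-based similarity it uses Jaccard similarity as a hypothetical stand-in, since the slides only say that a bigger overlap means a higher similarity. Such a similarity could then feed the clustering step that partitions the graphs.

```python
from typing import FrozenSet, Iterable, List


def maximal_graphs(node_sets: Iterable[FrozenSet[int]]) -> List[FrozenSet[int]]:
    """Drop duplicates and every node set strictly contained in another one."""
    unique = set(node_sets)
    return [g for g in unique if not any(g < h for h in unique)]


def overlap_similarity(g: FrozenSet[int], h: FrozenSet[int]) -> float:
    """Jaccard similarity |g & h| / |g | h| (a stand-in, not EASE's formula)."""
    union = g | h
    return len(g & h) / len(union) if union else 1.0


# r = 1 node sets of the path 0 - 1 - 2 - 3, one per centre.
sets = [frozenset({0, 1}), frozenset({0, 1, 2}),
        frozenset({1, 2, 3}), frozenset({2, 3})]
maximal = maximal_graphs(sets)
print(sorted(sorted(g) for g in maximal))                            # -> [[0, 1, 2], [1, 2, 3]]
print(overlap_similarity(frozenset({0, 1, 2}), frozenset({1, 2, 3})))  # -> 0.5
```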

Summary
1. Obtain the adjacency matrix M
2. Compute M^r
3. Extract the maximal r-radius graphs
4. Cluster the graphs with the existing K-means algorithm and partition the graph
5. Construct the graph index to materialize the maximal r-radius graphs

Others
• Ranking functions
  • TF-IDF-based IR ranking
  • Structural-compactness-based DB ranking
    • Intuitively, the more compact an r-radius Steiner graph SG is, the more likely SG is to be meaningful and relevant
• Indexing
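The two ranking viewpoints can be illustrated with a toy scorer. This is a sketch under loudly labeled assumptions: the IR part is a common TF-IDF variant ((1 + ln tf) * ln(1 + N/df)), the DB part is a hypothetical compactness measure (the average of 1/(1 + d) over distances d between pairs of content nodes, so closer keyword matches score higher), and the linear combination with weight alpha is also hypothetical. None of these are the paper's exact formulas.

```python
import math
from typing import Dict, List


def tf_idf_score(graph_terms: List[str], query: List[str],
                 df: Dict[str, int], n_graphs: int) -> float:
    """IR view: TF-IDF of the query keywords within one Steiner graph.

    df[kw] is the number of indexed graphs containing kw (hypothetical input).
    """
    score = 0.0
    for kw in set(query):
        tf = graph_terms.count(kw)
        if tf == 0 or df.get(kw, 0) == 0:
            continue
        score += (1 + math.log(tf)) * math.log(1 + n_graphs / df[kw])
    return score


def compactness_score(pair_distances: List[int]) -> float:
    """DB view (hypothetical): average 1/(1 + d) over content-node pairs."""
    if not pair_distances:
        return 1.0  # a single content node is maximally compact
    return sum(1 / (1 + d) for d in pair_distances) / len(pair_distances)


def combined_score(ir: float, db: float, alpha: float = 0.5) -> float:
    """Hypothetical linear blend of the IR and DB scores."""
    return alpha * ir + (1 - alpha) * db


ir = tf_idf_score(["keyword", "search", "keyword"], ["keyword", "xml"],
                  df={"keyword": 2, "xml": 1}, n_graphs=4)
db = compactness_score([1, 3])  # two content-node pairs, 1 and 3 hops apart
print(round(db, 3))  # -> 0.375
print(combined_score(ir, db))
```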

Experimental study
• Datasets: DBLife, DBLP and IMDB
• Compared methods
  • Unstructured: InfoUnit [18]
  • Semi-structured: SLCA [28]
  • Structured: DPBF [6]


Conclusion
• Proposed EASE, an efficient and adaptive keyword search method
  • Answers keyword queries over unstructured, semi-structured and structured data
• Examined the issues of indexing and ranking
  • Ranking takes into account both structural compactness and textual relevance
• Experimental results show that EASE achieves both high search efficiency and high search quality for keyword search over heterogeneous data
