
Efficient Keyword Search over DBLife & DBLP Data

CS511 (In-Progress) Project Presentation, Dec-09-2005

Mayssam Sayyadian

Nhung Nguyen

Hieu Li


Introduction

  • DBLife: Manages Unstructured Data

    • People are familiar with keyword searching unstructured data

  • … but DBLife → ER graph

    • Entities, mentions, etc. : structured data extracted

  • DBLP: Well known, available, enriched database of publications

    • DBLife does not cover all the data in DBLP


Assumption

  • Data is in relational format, not XML

  • DBMS provides text indexing at column level

    • Oracle, SQL Server, DB2, MySQL, PostgreSQL

  • Support for XML data is subject of future work


Basic Model

  • Database: modeled as a graph

    • Nodes = tuples

    • Edges = references between tuples

      • foreign keys, inclusion dependencies, …

      • Edges are directed.

[Figure: example tuple graph. Paper nodes "eTuner: Tuning Schema …" and "iMAP: Discovering …" are linked through writes tuples to author nodes Mayssam Sayyadian, AnHai Doan, and Pedro Domingos.]
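
The graph model above can be sketched in a few lines of Python. This is a minimal illustration, not the project's implementation: the names `TupleNode`, `TupleGraph`, and `build_example` are hypothetical, the edge direction (a writes tuple referencing its author and paper) is an assumption based on the usual foreign-key layout, and keyword matching is a plain substring test standing in for the DBMS text index.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class TupleNode:
    """One database tuple; frozen so it can be used as a set/dict key."""
    relation: str   # e.g. "author", "paper", "writes"
    key: int        # primary key within the relation
    text: str = ""  # searchable text; empty for pure relationship tuples


class TupleGraph:
    """Directed graph: nodes are tuples, edges are references between tuples."""

    def __init__(self):
        self.nodes = set()
        self.out_edges = defaultdict(list)

    def add_reference(self, src, dst):
        # A directed edge src -> dst, e.g. a foreign key from src to dst.
        self.nodes.update((src, dst))
        self.out_edges[src].append(dst)

    def matches(self, keyword):
        # Stand-in for the DBMS text index: case-insensitive substring match.
        kw = keyword.lower()
        return {n for n in self.nodes if kw in n.text.lower()}


def build_example():
    """Two authors linked to one paper through two writes tuples."""
    g = TupleGraph()
    mayssam = TupleNode("author", 1, "Mayssam Sayyadian")
    anhai = TupleNode("author", 2, "AnHai Doan")
    etuner = TupleNode("paper", 1, "eTuner: Tuning Schema")
    for i, author in enumerate((mayssam, anhai), start=1):
        writes = TupleNode("writes", i)
        g.add_reference(writes, author)  # assumed FK: writes -> author
        g.add_reference(writes, etuner)  # assumed FK: writes -> paper
    return g
```

Calling `build_example().matches("mayssam")` returns the single author node whose text contains the keyword.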


Answer Example

Query: Mayssam AnHai

[Figure: answer tree connecting paper "eTuner: Tuning Schema …" through two writes tuples to authors Mayssam and AnHai Doan.]


Answer Model

  • Query: set of keywords {k1, k2, …, kn}

    • Each keyword ki matches a set of nodes Si

  • Answer: rooted, directed tree connecting nodes, with one node from each Si

    • Root node (we call it an information node) has special significance, may be restricted to some relations

      • E.g. relations representing entities, not relationships

  • Multiple answers ranked by a scoring function
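
One way to picture the answer model is a breadth-first search from a candidate root that picks the nearest matching node for each keyword. The sketch below is only an illustration under stated assumptions: the adjacency map `ADJ` is hypothetical and lists neighbours in both directions (so a paper root can reach its authors), node names are plain strings, and choosing the nearest match per keyword is a simplification rather than the algorithm used in the project.

```python
from collections import deque


def bfs(adj, root):
    """Return parent pointers and distances for every node reachable from root."""
    parent, dist = {root: None}, {root: 0}
    queue = deque([root])
    while queue:
        node = queue.popleft()
        for nxt in adj.get(node, ()):
            if nxt not in parent:
                parent[nxt] = node
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    return parent, dist


def answer_tree(adj, root, match_sets):
    """Edges of a tree rooted at `root` containing one node from each match set.

    Returns None when some keyword has no reachable match.
    """
    parent, dist = bfs(adj, root)
    edges = set()
    for matches in match_sets:
        reachable = [n for n in matches if n in dist]
        if not reachable:
            return None
        node = min(reachable, key=lambda n: (dist[n], n))  # nearest match
        while parent[node] is not None:  # walk back up to the root
            edges.add((parent[node], node))
            node = parent[node]
    return edges


# Hypothetical data mirroring the answer example: query "Mayssam AnHai".
ADJ = {
    "paper:eTuner": ["writes:1", "writes:2"],
    "writes:1": ["paper:eTuner", "author:Mayssam"],
    "writes:2": ["paper:eTuner", "author:AnHai"],
    "author:Mayssam": ["writes:1"],
    "author:AnHai": ["writes:2"],
}
MATCH_SETS = [{"author:Mayssam"}, {"author:AnHai"}]  # S1, S2
```

Because all paths come from the same set of BFS parent pointers, their union is always a tree. Restricting the root to entity relations, as the slide suggests, amounts to calling `answer_tree` only for roots such as `"paper:eTuner"`.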


Score of Result T

  • The combining function Score combines the scores of the attribute values of T

  • One reasonable choice:

    $Score(T) = \sum_{a \in T} Score(a) / size(T)$

  • Attribute value scores Score(a) are calculated using the DBMS's IR index
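
The combining function can be written out directly. In this sketch the function name `tree_score` is hypothetical, the attribute scores are passed in as plain numbers (in the real system they would come from the DBMS's IR index), and size(T) is assumed to be the number of tuples in the tree, which the slide does not state.

```python
def tree_score(attribute_scores, size):
    """Sum of the attribute-value scores of a tree T, divided by size(T).

    attribute_scores: iterable of Score(a) values for the attributes in T
    size: size(T), assumed here to be the number of tuples in T
    """
    if size <= 0:
        raise ValueError("size(T) must be positive")
    return sum(attribute_scores) / size
```

For example, `tree_score([0.5, 0.25], 3)` describes a three-tuple tree in which only two attribute values match a keyword. Dividing by the size means that, for the same matches, a larger tree scores lower.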


Implementation

[Figure: EasyDB components. Labels: Browser / Client, Web Server, JSPs, Servlets, Java Beans, DBLP, DBLife; connections labelled Http, Java API, and JDBC.]


Searching over Multiple Databases: System Architecture

[Figure: system architecture, split into "Preprocessing: Offline" and "Querying: Online". Labels: User, query Q, Index Builder, IR Engine, IR Index (DBLife), IR Index (DBLP), Tuplesets, ForeignKey Joins + Join Discovery / Schema Matching, Top-k Generator, SQL Queries, Distributed SQL Query Processor.]


Top-K Generator

  • Contributions:

    • Iterative Refinement Algorithm

      • A unifying framework to search for Top-K best tuple-trees

    • Cast previous algorithms into IRA

    • Improve them substantially


IRA Framework

  • Concepts:

    • Abstract State, Concrete State, Score Interval

  • IRA algorithm: branch-and-bound search

1. Abstraction: Create initial abstract states

2. While fewer than k states have been output, iteratively:

(a) Evaluation: Update the score intervals

(b) Elimination: Eliminate (prune) the space of states

(c) Refinement: Select an abstract state and refine it

(d) If the goal state (the top-1 state) is found: output it and remove it.
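
The loop above can be sketched as a small branch-and-bound search over score intervals. Everything named here is an assumption made for illustration: `State`, `ira_top_k`, and `refine_example` are hypothetical, a state counts as concrete when its interval has collapsed to a single score, every abstract state is assumed to contain at least one concrete state scoring no less than its lower bound, and the refinement rule (always refine the state with the highest upper bound) is just one simple choice. The sample intervals are taken from the labels of the IRA example slide; which state refines into which is a reading of those labels, not something the slide states.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class State:
    name: str
    lo: float  # lower bound of the score interval
    hi: float  # upper bound of the score interval

    @property
    def concrete(self):
        return self.lo == self.hi


def ira_top_k(initial, refine, k):
    """Return up to k concrete states in descending score order."""
    frontier = list(initial)  # 1. Abstraction: initial abstract states
    output = []
    while frontier and len(output) < k:
        # (a) Evaluation: intervals are carried on the states in this sketch.
        # (b) Elimination: a state whose upper bound is below the
        #     remaining-th best lower bound cannot enter the top-k.
        remaining = k - len(output)
        lows = sorted((s.lo for s in frontier), reverse=True)
        if len(lows) >= remaining:
            threshold = lows[remaining - 1]
            frontier = [s for s in frontier if s.hi >= threshold]
        # Pick the highest upper bound; a concrete state wins ties.
        best = max(frontier, key=lambda s: (s.hi, s.concrete))
        frontier.remove(best)
        if best.concrete:
            output.append(best)            # (d) Goal state: output and remove
        else:
            frontier.extend(refine(best))  # (c) Refinement
    return output


# Assumed refinement hierarchy, read off the IRA example slide's labels.
CHILDREN = {
    "P": [State("P1", 0.6, 0.8), State("P2", 0.9, 0.9), State("P3", 0.7, 0.7)],
    "R": [State("R1", 0.4, 0.6), State("R2", 0.85, 0.85)],
}
INITIAL = [State("P", 0.6, 1.0), State("Q", 0.5, 0.7), State("R", 0.4, 0.9)]


def refine_example(state):
    # States without listed children are never refined in this run.
    return CHILDREN.get(state.name, [])
```

With `k = 2`, `ira_top_k(INITIAL, refine_example, 2)` outputs P2 and then R2, and Q is pruned without ever being refined, which is the point of keeping score intervals on abstract states.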


IRA - Example

[Figure: IRA example over three iterations. States and score intervals shown: P [0.6, 1], Q [0.5, 0.7], R [0.4, 0.9]; P1 [0.6, 0.8], P2 0.9, P3 0.7; R1 [0.4, 0.6], R2 0.85. Annotations: K = {P2, P3}, min score = 0.7; Res = {P2, R2}, min score = 0.85.]


IRA Algorithms

  • Kite: straightforward adaptation of the state-of-the-art algorithm (hybrid) to IRA

  • aKite: adaptive Kite → able to change and adapt over time

  • daKite: adaptive Kite algorithm armed with more sophisticated refinement rules (read: more cost effective search heuristics)


Preliminary Experiments

  • Experiments currently run over DBLP data


Future Work

  • Better UI & Browsing facilities

  • User feedback

  • Extend to handle XML data


References

  • V. Hristidis, L. Gravano, Y. Papakonstantinou, “Efficient IR-Style Keyword Search over Relational Databases”

  • S. Agrawal, S. Chaudhuri, G. Das, “DBXplorer: A System for Keyword Search over Relational Databases”

  • G. Bhalotia, A. Hulgeri, C. Nakhe, S. Chakrabarti, “Keyword Searching and Browsing in Databases using BANKS”

