Efficient Keyword Search over DBLife & DBLP Data



CS511 (In-Progress) Project Presentation, Dec-09-2005

Mayssam Sayyadian

Nhung Nguyen

Hieu Li

Introduction
  • DBLife: Manages Unstructured Data
    • People are familiar with keyword searching unstructured data
  • … but DBLife → ER graph
    • Entities, mentions, etc. : structured data extracted
  • DBLP: Well known, available, enriched database of publications
    • DBLife does not cover all the data in DBLP
Assumption
  • Data is in relational format, not XML
  • DBMS provides text indexing at column level
    • Oracle, SQL Server, DB2, MySQL, PostgreSQL
  • Support for XML data is a subject of future work
Basic Model
  • Database: modeled as a graph
    • Nodes = tuples
    • Edges = references between tuples
      • foreign keys, inclusion dependencies, …
      • Edges are directed.

[Figure: example data graph. Paper tuples "eTuner: Tuning Schema …" and "iMAP: Discovering …" are connected through "writes" tuples to the author tuples Mayssam Sayyadian, AnHai Doan, and Pedro Domingos.]
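The graph model above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the class names `TupleNode` and `DataGraph`, the key values, and the choice to point edges from the referencing "writes" tuple to the tuples it references are all hypothetical and not taken from the presentation. The tuple texts are the ones that appear on the slides.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class TupleNode:
    relation: str   # table the tuple belongs to, e.g. "paper"
    key: str        # hypothetical primary-key value
    text: str = ""  # text that a keyword could match


class DataGraph:
    """Database as a directed graph: nodes are tuples, edges are references."""

    def __init__(self):
        self.nodes = set()
        self.references = defaultdict(set)  # source tuple -> referenced tuples

    def add_reference(self, source, target):
        # A directed edge, e.g. derived from a foreign key (assumed direction).
        self.nodes.update((source, target))
        self.references[source].add(target)


def build_example_graph():
    graph = DataGraph()
    paper = TupleNode("paper", "p1", "eTuner: Tuning Schema …")
    mayssam = TupleNode("author", "a1", "Mayssam Sayyadian")
    anhai = TupleNode("author", "a2", "AnHai Doan")
    # Each "writes" tuple references one paper and one author.
    for key, author in (("w1", mayssam), ("w2", anhai)):
        writes = TupleNode("writes", key)
        graph.add_reference(writes, paper)
        graph.add_reference(writes, author)
    return graph
```

Calling `build_example_graph()` yields five tuple nodes (one paper, two authors, two "writes" tuples) joined by four directed edges.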

Answer Example

Query: Mayssam AnHai

[Figure: answer tree for the query. The paper tuple "eTuner: Tuning Schema …" is connected through two "writes" tuples to the author tuples "Mayssam" and "AnHai Doan".]

Answer Model
  • Query: set of keywords {k1, k2, …, kn}
    • Each keyword ki matches set of nodes Si
  • Answer: rooted, directed tree connecting nodes, with one node from each Si
    • Root node (we call it an information node) has special significance, may be restricted to some relations
      • E.g. relations representing entities, not relationships
  • Multiple answers ranked by a scoring function
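One way to picture the answer model is a breadth-first search from each candidate root that picks up the nearest matching node for every keyword. The sketch below is not the presentation's algorithm. It is a simple stand-in, with several assumptions: the graph is a plain adjacency dictionary, edges are followed from the root toward the keyword nodes, the node labels are hypothetical, and "fewer edges first" is used as a placeholder for the real scoring function.

```python
from collections import deque


def bfs_parents(graph, root):
    """Return BFS parent pointers and the visit order starting at root."""
    parents = {root: None}
    order = [root]
    queue = deque([root])
    while queue:
        node = queue.popleft()
        for nxt in graph.get(node, ()):
            if nxt not in parents:
                parents[nxt] = node
                order.append(nxt)
                queue.append(nxt)
    return parents, order


def answer_tree(graph, root, match_sets):
    """Edges of a tree rooted at root with one node from each S_i, or None."""
    parents, order = bfs_parents(graph, root)
    edges = set()
    for matches in match_sets:
        # Nearest node (in BFS order) that matches this keyword.
        hit = next((n for n in order if n in matches), None)
        if hit is None:
            return None  # this root cannot reach the keyword
        while parents[hit] is not None:
            edges.add((parents[hit], hit))
            hit = parents[hit]
    return edges


def answers(graph, roots, match_sets):
    """All answer trees over the allowed roots, smallest tree first."""
    trees = []
    for root in roots:
        edges = answer_tree(graph, root, match_sets)
        if edges is not None:
            trees.append((root, edges))
    trees.sort(key=lambda tree: len(tree[1]))  # placeholder ranking
    return trees


# Hypothetical encoding of the "Mayssam AnHai" example.
example_graph = {
    "paper:eTuner": ["writes:1", "writes:2"],
    "writes:1": ["author:Mayssam"],
    "writes:2": ["author:AnHai Doan"],
}
example_matches = [{"author:Mayssam"}, {"author:AnHai Doan"}]
example_answers = answers(example_graph, ["paper:eTuner"], example_matches)
```

Because every path is read off a single BFS parent map, the collected edges always form a tree rooted at the chosen node. Restricting `roots` to entity relations mirrors the "information node" restriction.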
Score of Result T
  • Combining function Score combines scores of attribute values of T
  • One reasonable choice:

$$\mathrm{Score}(T) = \frac{\sum_{a \in T} \mathrm{Score}(a)}{\mathrm{size}(T)}$$

  • Attribute value scores Score(a) are calculated using the DBMS's IR index
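As a small worked illustration of this combining function, the sketch below sums the attribute scores of a result and divides by its size. The function name `tree_score` is hypothetical, the attribute scores are passed in directly rather than read from a DBMS index, and size(T) is assumed here to be the number of tuples in the tree.

```python
def tree_score(attribute_scores, size):
    """Score(T) = sum of Score(a) over the attribute values a in T, over size(T)."""
    if size <= 0:
        raise ValueError("size(T) must be positive")
    return sum(attribute_scores) / size


# Two matching attribute values in a tree of three tuples (made-up numbers).
example_score = tree_score([0.5, 0.25], size=3)
```

Dividing by the size penalizes large trees, so a result that connects the same keyword matches through fewer tuples ranks higher.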
Implementation

[Figure: EasyDB components. Components shown: Browser / Client, Web Server, JSPs, Servlets, Java Beans, DBLP, DBLife. Connection labels: HTTP, JDBC, Java API.]

Searching over Multiple Databases: System Architecture

[Figure: system architecture over DBLife and DBLP, in two phases: Preprocessing (offline) and Querying (online). Components shown: Index Builder, IR Index (one for DBLife, one for DBLP), Join Discovery (Schema Matching + Foreign-Key Joins), User query Q, IR Engine, Tuplesets, Top-k Generator, SQL Queries, Distributed SQL Query Processor.]

Top-K Generator
  • Contributions:
    • Iterative Refinement Algorithm
      • A unifying framework to search for Top-K best tuple-trees
    • Cast previous algorithms into IRA
    • Improve them substantially
IRA Framework
  • Concepts:
    • Abstract State, Concrete State, Score Interval
  • IRA algorithm: branch-and-bound search

1. Abstraction: create the initial abstract states

2. While fewer than k states have been output, iterate:

(a) Evaluation: update the score intervals

(b) Elimination: prune the space of states

(c) Refinement: select an abstract state and refine it

(d) If the goal state (the top-1 state) is found, output it and remove it
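The loop above can be sketched as a best-first branch-and-bound search over states with score intervals. This is a simplified reading, not the presentation's implementation, and several things are assumptions: the `State` class, fixed intervals (a real system would tighten them in the evaluation step), refinement by expanding precomputed children, and always refining the abstract state with the highest upper bound. The state names and intervals reuse labels from the IRA example slide, but the parent/child wiring is a guess.

```python
from dataclasses import dataclass, field


@dataclass(eq=False)
class State:
    name: str
    lo: float  # lower bound on the score
    hi: float  # upper bound on the score
    children: list = field(default_factory=list)  # refinements, if abstract

    @property
    def is_concrete(self):
        return self.lo == self.hi


def ira_top_k(initial_states, k):
    frontier = list(initial_states)  # 1. Abstraction: initial abstract states
    output = []
    while len(output) < k and frontier:
        # (a) Evaluation: intervals are fixed on the states in this sketch.
        # (b) Elimination: drop states that cannot reach the remaining top slots.
        need = k - len(output)
        lows = sorted((s.lo for s in frontier), reverse=True)
        if len(lows) >= need:
            threshold = lows[need - 1]
            frontier = [s for s in frontier if s.hi >= threshold]
        # Pick the highest upper bound; on ties prefer a concrete state.
        best = max(frontier, key=lambda s: (s.hi, s.is_concrete))
        frontier.remove(best)
        if best.is_concrete:
            # (d) Goal: no other state can score higher, so output it.
            output.append(best)
        else:
            # (c) Refinement: replace the abstract state by its children.
            if not best.children:
                raise ValueError(f"abstract state {best.name} has no refinement")
            frontier.extend(best.children)
    return output


p = State("P", 0.6, 1.0, [State("P1", 0.6, 0.8), State("P2", 0.9, 0.9),
                          State("P3", 0.7, 0.7)])
q = State("Q", 0.5, 0.7)
r = State("R", 0.4, 0.9, [State("R1", 0.4, 0.6), State("R2", 0.85, 0.85)])
top_two = [s.name for s in ira_top_k([p, q, r], k=2)]  # ["P2", "R2"]
```

With k = 2 the run refines P, outputs P2, refines R, and outputs R2. The elimination step relies on each interval being a valid bound for every concrete state beneath it.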

IRA - Example

[Figure: three iterations of IRA over states with score intervals. Labels shown: P [0.6, 1], Q [0.5, 0.7], R [0.4, 0.9]; P1 [0.6, 0.8], P2 0.9, P3 0.7; R1 [0.4, 0.6], R2 0.85; K = {P2, P3}, min score = 0.7; Res = {P2, R2}, min score = 0.85.]
IRA Algorithms
  • Kite: straightforward adaptation of the state-of-the-art algorithm (hybrid) to IRA
  • aKite: adaptive Kite → able to change and adapt over time
  • daKite: adaptive Kite algorithm armed with more sophisticated refinement rules (read: more cost-effective search heuristics)
Preliminary Experiments
  • Current experiments are over DBLP data
Future Work
  • Better UI & Browsing facilities
  • User feedback
  • Extend to handle XML data
References
  • V. Hristidis, L. Gravano, Y. Papakonstantinou, “Efficient IR-Style Keyword Search over Relational Databases”
  • S. Agrawal, S. Chaudhuri, G. Das, “DBXplorer: A System for Keyword Search over Relational Databases”
  • G. Bhalotia, A. Hulgeri, C. Nakhe, S. Chakrabarti, “Keyword Searching and Browsing in Databases using BANKS”