This presentation describes efficient keyword search over two databases: DBLife, which manages unstructured data and exposes the extracted entities and mentions as an entity-relationship graph, and DBLP, a well-known database of publications that DBLife does not fully cover. It outlines the basic model (tuples as nodes, references between tuples as directed edges), an example query, the answer and scoring models, and the Iterative Refinement Algorithm (IRA), a unifying framework for searching for the top-K tuple trees. Future work includes a better user interface and browsing facilities, user feedback, and support for XML data.
Efficient Keyword Search over DBLife & DBLP Data • CS511 (In-progress) Project Presentation, Dec-09-2005 • Mayssam Sayyadian, Nhung Nguyen, Hieu Li
Introduction • DBLife: manages unstructured data • People are familiar with keyword search over unstructured data • … but DBLife also has an ER graph • Entities, mentions, etc.: extracted structured data • DBLP: well-known, available, enriched database of publications • DBLife does not cover all the data in DBLP
Assumption • Data is in relational format, not XML • The DBMS provides text indexing at the column level • Oracle, SQL Server, DB2, MySQL, PostgreSQL • Support for XML data is a subject of future work
Basic Model • Database: modeled as a graph • Nodes = tuples • Edges = references between tuples (foreign keys, inclusion dependencies, ...) • Edges are directed • [Figure: example graph with paper tuples "eTuner: Tuning Schema …" and "iMAP: Discovering …", author tuples Mayssam Sayyadian, AnHai Doan and Pedro Domingos, linked through writes tuples]
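A minimal Python sketch of this graph model, under stated assumptions: the class names `TupleNode` and `TupleGraph`, the keys `w1` and `w2`, and the edge direction (each writes tuple pointing at the paper and author tuples it references through foreign keys) are illustrative choices, not taken from the presentation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TupleNode:
    """One database tuple; `relation` names its table (hypothetical layout)."""
    relation: str
    key: str


class TupleGraph:
    """Directed graph whose nodes are tuples and whose edges are references."""

    def __init__(self):
        self.out_edges = {}  # node -> list of nodes it references

    def add_reference(self, src, dst):
        # Edge src -> dst: tuple `src` references tuple `dst` (e.g. a foreign key).
        self.out_edges.setdefault(src, []).append(dst)
        self.out_edges.setdefault(dst, [])

    def references(self, node):
        return list(self.out_edges.get(node, []))


# Tuples from the answer example slide; the writes keys are made up.
paper = TupleNode("paper", "eTuner")
mayssam = TupleNode("author", "Mayssam Sayyadian")
anhai = TupleNode("author", "AnHai Doan")
w1 = TupleNode("writes", "w1")
w2 = TupleNode("writes", "w2")

graph = TupleGraph()
for writes, author in ((w1, mayssam), (w2, anhai)):
    graph.add_reference(writes, paper)   # assumed foreign key writes -> paper
    graph.add_reference(writes, author)  # assumed foreign key writes -> author
```

Keeping nodes immutable and hashable lets them serve directly as dictionary keys, which is enough for the traversals a keyword search needs.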
Answer Example • Query: Mayssam AnHai • [Figure: answer tree connecting the paper tuple "eTuner: Tuning Schema …" through two writes tuples to the author tuples Mayssam and AnHai Doan]
Answer Model • Query: set of keywords {k1, k2, .., kn} • Each keyword ki matches set of nodes Si • Answer: rooted, directed tree connecting nodes, with one node from each Si • Root node (we call it an information node) has special significance, may be restricted to some relations • E.g. relations representing entities, not relationships • Multiple answers ranked by a scoring function
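As a rough illustration of the answer model, the sketch below computes the match sets Si and checks that a candidate tree contains a node from every Si. The helper names `match_sets` and `covers_query`, the node identifiers, and the whitespace-token matching (standing in for the DBMS text index) are all assumptions made for this example.

```python
def match_sets(node_text, keywords):
    """Return one set S_i of node ids per keyword k_i.

    node_text maps a node id to its text. Matching is case-insensitive
    on whitespace-separated tokens, a simplification of a real IR index.
    """
    sets = []
    for keyword in keywords:
        wanted = keyword.lower()
        sets.append({
            node_id
            for node_id, text in node_text.items()
            if wanted in text.lower().split()
        })
    return sets


def covers_query(tree_nodes, sets):
    """True if the candidate tree holds at least one node from every S_i."""
    return all(tree_nodes & s for s in sets)


# Hypothetical node ids; the texts come from the example slides.
node_text = {
    "a1": "Mayssam Sayyadian",
    "a2": "AnHai Doan",
    "a3": "Pedro Domingos",
    "p1": "eTuner: Tuning Schema ...",
    "w1": "",
    "w2": "",
}

sets = match_sets(node_text, ["Mayssam", "AnHai"])
full_tree = {"p1", "w1", "w2", "a1", "a2"}
partial_tree = {"p1", "w1", "a1"}
```

Here `full_tree` covers both keywords while `partial_tree` misses the node matching "AnHai", so only the former qualifies as an answer.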
Score of Result T • A combining function Score combines the scores of the attribute values of T • One reasonable choice: $\mathrm{Score}(T) = \sum_{a \in T} \mathrm{Score}(a) / \mathrm{size}(T)$ • Attribute value scores Score(a) are calculated using the DBMS's IR index
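A small worked sketch of this combining function. The function name `tree_score`, the attribute scores, and the reading of size(T) as the number of tuples in the tree are assumptions for illustration; in the system the attribute scores would come from the DBMS's IR index.

```python
import math


def tree_score(attribute_scores, size):
    """Sum of the attribute value scores of T divided by size(T)."""
    if size <= 0:
        raise ValueError("size(T) must be positive")
    return sum(attribute_scores) / size


# Hypothetical IR scores for the two matching author attributes in a
# five-tuple tree (one paper, two writes tuples, two authors).
example_score = tree_score([1.0, 0.5], size=5)
assert math.isclose(example_score, 0.3)
```

Dividing by the size penalizes large trees, so a tighter connection between the matching tuples ranks higher than a loose one with the same attribute scores.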
Implementation • [Architecture diagram. Components shown: Browser / Client, Web Server, JSPs, Servlets, Java Beans, EasyDB Components, DBLP, DBLife. Connections labeled: Http, Java API, JDBC]
Searching over Multiple Databases: System Architecture • [Diagram with two phases, Preprocessing (offline) and Querying (online), over the DBLP and DBLife databases. Components shown: Index Builder, DBLife IR Index, DBLP IR Index, Join Discovery, Schema Matching, Foreign-Key Joins, IR Engine, Tuplesets, Top-k Generator, SQL Queries, Distributed SQL Query Processor. The user issues a query Q]
Top-K Generator • Contributions: • Iterative Refinement Algorithm (IRA) • A unifying framework to search for the top-K best tuple trees • Cast previous algorithms into IRA • Improve them substantially
IRA Framework • Concepts: abstract state, concrete state, score interval • IRA algorithm: branch-and-bound search • 1. Abstraction: create the initial abstract states • 2. While fewer than k states have been output, iterate: (a) Evaluation: update the score intervals (b) Elimination: eliminate (prune) the space of states (c) Refinement: select an abstract state and refine it (d) If the goal state (the top-1 state) is found, output it and remove it
IRA - Example • [Figure: three iterations of the search. States and score intervals shown: P [0.6, 1], Q [0.5, 0.7], R [0.4, 0.9], P1 [0.6, 0.8], P2 0.9, P3 0.7, R1 [0.4, 0.6], R2 0.85. Annotations: K = {P2, P3}, min score = 0.7; Res = {P2, R2}, min score = 0.85]
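The branch-and-bound loop of the IRA framework can be sketched in Python as follows, using the states and intervals from the example. This is one possible reading, not the authors' implementation: the names `State` and `ira_top_k`, the pruning rule, the choice of which state to refine, and the refinement structure (P into P1, P2, P3 and R into R1, R2, inferred from the state names) are all assumptions. The sketch also assumes every abstract state contains at least one concrete state scoring no lower than its lower bound.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class State:
    name: str
    lo: float               # lower end of the score interval
    hi: float               # upper end of the score interval
    concrete: bool = False  # concrete states have an exact score (lo == hi)


def ira_top_k(initial_states, refine, k):
    """Return the k best concrete states, best first.

    `refine` maps an abstract state to the list of its sub-states.
    """
    frontier = list(initial_states)  # Abstraction: initial abstract states
    results = []
    while len(results) < k and frontier:
        # Evaluation: intervals are stored on the states in this sketch.
        # Elimination: drop states that cannot reach the remaining top slots.
        need = k - len(results)
        lower_bounds = sorted((s.lo for s in frontier), reverse=True)
        if len(lower_bounds) >= need:
            threshold = lower_bounds[need - 1]
            frontier = [s for s in frontier if s.hi >= threshold]
        # Pick the state with the highest upper bound, preferring concrete ones.
        best = max(frontier, key=lambda s: (s.hi, s.concrete))
        frontier.remove(best)
        if best.concrete:
            # Goal state: its exact score is >= every other upper bound.
            results.append(best)
        else:
            # Refinement: replace the abstract state by its sub-states.
            frontier.extend(refine(best))
    return results


# Assumed refinement structure for the example's states.
children = {
    "P": [State("P1", 0.6, 0.8), State("P2", 0.9, 0.9, True),
          State("P3", 0.7, 0.7, True)],
    "R": [State("R1", 0.4, 0.6), State("R2", 0.85, 0.85, True)],
}
initial = [State("P", 0.6, 1.0), State("Q", 0.5, 0.7), State("R", 0.4, 0.9)]

top2 = ira_top_k(initial, lambda s: children.get(s.name, []), k=2)
top2_names = [s.name for s in top2]
```

With k = 2 this run outputs P2 and then R2, matching the result set Res = {P2, R2} in the example; Q, P1, P3 and R1 are pruned without ever being refined.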
IRA Algorithms • Kite: straightforward adaptation of the state-of-the-art algorithm (hybrid) to IRA • aKite: adaptive Kite, able to change and adapt over time • daKite: adaptive Kite algorithm armed with more sophisticated refinement rules (read: more cost-effective search heuristics)
Preliminary Experiments • Currently running experiments over DBLP data
Future Work • Better UI & Browsing facilities • User feedback • Extend to handle XML data
References • V. Hristidis, L. Gravano, Y. Papakonstantinou, “Efficient IR-Style Keyword Search over Relational Databases” • S. Agrawal, S. Chaudhuri, G. Das, “DBXplorer: A System for Keyword Search over Relational Databases” • G. Bhalotia, A. Hulgeri, C. Nakhe, S. Chakrabarti, “Keyword Searching and Browsing in Databases using BANKS”