
Databases and Information Retrieval: Rethinking the Great Divide

SIGMOD Panel, 14 Jun 2005. Jayavel Shanmugasundaram, Cornell University.


Presentation Transcript


  1. Databases and Information Retrieval: Rethinking the Great Divide
     SIGMOD Panel, 14 Jun 2005
     Jayavel Shanmugasundaram, Cornell University

  2. 10,000-Foot View of Data Management
     [Diagram. Data axis: Structured, Unstructured. Queries axis: Complex and Structured, Ranked Keyword Search. Systems: Database Systems, Information Retrieval Systems. Labels: The Great Data Divide, The Great Query Divide.]

  3. Bridging the Great Divide
     • Option 1: Tie together existing DB and IR systems
         • Example: Approaches based on SQL/MM
     • Option 2: Extend existing DB systems with IR functionality, or vice versa
         • Example: Add searching and ranking to RDBMSs
     • Option 3: Design a new data management system from the ground up
         • Example: Quark data management system

  4. Why Option 1 Won't Work
     [Diagram. Data axis: Structured, Unstructured. Queries axis: Complex and Structured, Ranked Keyword Search. Systems: Database Systems, Information Retrieval Systems.]

  5. Bridging the Great Divide
     • Option 1: Tie together existing DB and IR systems
         • Example: Approaches based on SQL/MM
         • Drawback: Not powerful enough
     • Option 2: Extend existing DB systems with IR functionality, or vice versa
         • Example: Add searching and ranking to RDBMSs
     • Option 3: Design a new data management system from the ground up
         • Example: Quark data management system

  6. <workshop date="28 July 2000">
       <title> XML and Information Retrieval: A SIGIR 2000 Workshop </title>
       <editors> David Carmel, Yoelle Maarek, Aya Soffer </editors>
       <proceedings>
         <paper id="1">
           <title> XQL and Proximal Nodes </title>
           <author> Ricardo Baeza-Yates </author>
           <author> Gonzalo Navarro </author>
           <abstract> We consider the recently proposed language … </abstract>
           <section name="Introduction">
             Searching on structured text is becoming more important with XML …
           </section>
           …
           <cite xmlns:xlink="http://www.acm.org/www8/paper/xmlql"> … </cite>
         </paper>
         …

     Find relevant elements in important workshops between the years 1999 and 2001 that are about ‘Ricardo’ and ‘XML’
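The query on this slide mixes a structured predicate (the workshop year) with keyword search over element text. A minimal Python sketch of that mix follows. It assumes a trimmed, well-formed stand-in for the slide's document (the elided parts and the cite element are left out), uses crude substring matching, and does not model "important" or relevance ranking at all; the helper names are hypothetical.

```python
import xml.etree.ElementTree as ET

# Trimmed, well-formed stand-in for the slide's document (elided parts omitted).
DOC = """
<workshop date="28 July 2000">
  <title>XML and Information Retrieval: A SIGIR 2000 Workshop</title>
  <editors>David Carmel, Yoelle Maarek, Aya Soffer</editors>
  <proceedings>
    <paper id="1">
      <title>XQL and Proximal Nodes</title>
      <author>Ricardo Baeza-Yates</author>
      <author>Gonzalo Navarro</author>
      <abstract>We consider the recently proposed language ...</abstract>
      <section name="Introduction">Searching on structured text is becoming more important with XML ...</section>
    </paper>
  </proceedings>
</workshop>
"""


def text_of(elem):
    """All text in the element's subtree, lowercased."""
    return " ".join(elem.itertext()).lower()


def year_of(workshop):
    """Structured predicate input: the year is the last token of the date attribute."""
    return int(workshop.get("date").split()[-1])


def elements_with_all(root, keywords):
    """Tags of every element whose subtree text contains all keywords (substring match)."""
    kws = [k.lower() for k in keywords]
    return [e.tag for e in root.iter() if all(k in text_of(e) for k in kws)]


root = ET.fromstring(DOC)
hits = []
if 1999 <= year_of(root) <= 2001:          # structured part of the query
    hits = elements_with_all(root, ["Ricardo", "XML"])  # keyword part
print(hits)  # ['workshop', 'proceedings', 'paper']
```

No single leaf element contains both keywords, so every hit is an enclosing element, and the sketch has no principled way to pick one of the three.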

  7. Why Extending (R)DBMSs Won’t Work
     • Violates many assumptions “hardwired” into current database systems
         • Structured queries over structured fields, keyword search queries over text fields
             • Is author name a structured or text field?
         • Operators have precise, well-defined semantics
             • Even the query result is not well-defined – do we return a paper or a workshop?
         • Scoring is an attribute tacked on as a relational attribute
             • How can this scoring generalize IR scoring?
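The assumptions listed on this slide can be made concrete with a small Python sketch using `sqlite3`. It is an illustration only, not any particular RDBMS text extension: the `paper` table, its columns, the `kw_score` function, and the second and third rows are all hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# The schema fixes up front which fields are "structured" and which are "text".
conn.execute(
    "CREATE TABLE paper (id INTEGER PRIMARY KEY, workshop_year INTEGER, "
    "author TEXT, body TEXT)"
)
conn.executemany(
    "INSERT INTO paper VALUES (?, ?, ?, ?)",
    [
        (1, 2000, "Ricardo Baeza-Yates",
         "Searching on structured text is becoming more important with XML"),
        (2, 2001, "Ricardo Example", "XML and XML query languages"),  # hypothetical row
        (3, 1995, "Ricardo Example", "XML XML XML"),                  # hypothetical row
    ],
)


def keyword_score(text, keyword):
    """Toy score: number of whitespace-separated tokens equal to the keyword."""
    return text.lower().split().count(keyword.lower())


# The score is just another computed column attached to each row.
conn.create_function("kw_score", 2, keyword_score)

rows = conn.execute(
    """
    SELECT id, kw_score(body, 'XML') AS score
    FROM paper
    WHERE workshop_year BETWEEN 1999 AND 2001
      AND author LIKE '%Ricardo%'
    ORDER BY score DESC
    """
).fetchall()
print(rows)  # [(2, 2), (1, 1)]
```

In this sketch the author name had to be treated as a structured column, the result granularity is always a `paper` row, and the score is a per-row attribute with no notion of nested elements.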

  8. Why Extending IR Systems Won’t Work
     • IR systems provide little support for structured data
         • No support for complex operators
             • How can complex queries be evaluated?
         • Scoring does not take structure into account
             • How can scoring capture both structured and unstructured data?

  9. Bridging the Great Divide
     • Option 1: Tie together existing DB and IR systems
         • Example: Approaches based on SQL/MM
         • Drawback: Not powerful enough
     • Option 2: Extend existing DB systems with IR functionality, or vice versa
         • Example: Add searching and ranking to RDBMSs
         • Drawback: Shoehorns alien functionality into already complex systems
     • Option 3: Design a new data management system from the ground up
         • Example: Quark data management system

  10. Why Option 3 Will Work
     • Designed from the ground up with three principles
         • Structural data independence
             • Users can issue any query (complex and keyword) over any data (structured and unstructured)
         • Generalized scoring
             • Scoring works over any mix of structured and unstructured data (e.g., XRank over HTML and XML)
         • Flexible query language
             • Allows for arbitrary return results and scores (e.g., TeXQuery, precursor to XQuery Full-Text, NEXI)
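One way to picture scoring that works over both nested and flat documents is the Python sketch below. It is not XRank's algorithm or anything from Quark; the decay constant, the min-over-keywords combination, and all function names are assumptions made for illustration.

```python
import re
import xml.etree.ElementTree as ET

DECAY = 0.5  # hypothetical penalty per level between an element and a keyword hit


def own_tokens(elem):
    """Lowercased word tokens in the element's direct text (not its descendants)."""
    parts = [elem.text or ""] + [child.tail or "" for child in elem]
    return re.findall(r"\w+", " ".join(parts).lower())


def keyword_score(elem, keyword):
    """Direct occurrences count fully; the best child score is decayed by one level."""
    direct = own_tokens(elem).count(keyword)
    below = max((keyword_score(c, keyword) for c in elem), default=0.0)
    return direct + DECAY * below


def element_score(elem, keywords):
    """An element scores only as well as its weakest keyword."""
    return min(keyword_score(elem, k.lower()) for k in keywords)


def rank(root, keywords):
    """All elements with a non-zero score, best first."""
    scored = [(element_score(e, keywords), e.tag) for e in root.iter()]
    return sorted((s for s in scored if s[0] > 0), reverse=True)


structured = ET.fromstring(
    "<paper><title>XQL and Proximal Nodes</title>"
    "<author>Ricardo Baeza-Yates</author>"
    "<section>Searching on structured text is becoming more important with XML</section>"
    "</paper>"
)
flat = ET.fromstring("<note>Ricardo writes about XML</note>")  # hypothetical flat document

print(rank(structured, ["Ricardo", "XML"]))  # [(0.5, 'paper')]
print(rank(flat, ["Ricardo", "XML"]))        # [(1.0, 'note')]
```

The same function ranks the nested document and the flat one; the flat document scores higher only because both keywords occur directly in its text rather than one level down.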

  11. Bridging the Great Divide
     • Option 1: Tie together existing DB and IR systems
         • Example: Approaches based on SQL/MM
         • Drawback: Not powerful enough
     • Option 2: Extend existing DB systems with IR functionality, or vice versa
         • Example: Add searching and ranking to RDBMSs
         • Drawback: Shoehorns alien functionality into already complex systems
     • Option 3: Design a new data management system from the ground up
         • Example: Quark data management system
         • Most promising alternative!
