
Databases & Information Retrieval

Databases & Information Retrieval. Maya Ramanath. (Further Reading: Combining Database and Information-Retrieval Techniques for Knowledge Discovery. G. Weikum, G. Kasneci, M. Ramanath and F.M. Suchanek, CACM, April 2009)



  1. Databases & Information Retrieval. Maya Ramanath. (Further Reading: Combining Database and Information-Retrieval Techniques for Knowledge Discovery. G. Weikum, G. Kasneci, M. Ramanath and F.M. Suchanek, CACM, April 2009; DB & IR: Both Sides Now. G. Weikum, Keynote at SIGMOD 2007)

  2. DB and IR: Different Motivations • Both deal with large amounts of information, but…

  3. Why Combine Now? • Applications drive the need: structured and unstructured data must be managed in an integrated manner • Healthcare example: find young patients in central Europe who have been reported, in the last two weeks, to have symptoms of tropical virus diseases and an indication of anomalies • Other examples: newspaper archives, product catalogues, etc.
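To make the healthcare example concrete, here is a minimal Python sketch of a query that mixes DB-style Boolean predicates (age, region, date window) with IR-style keyword scoring on the free-text report. The `Report` record, the `find_patients` function, the age cutoff for "young", and the set of countries counted as central Europe are all hypothetical assumptions; the slide defines none of them.

```python
import re
from dataclasses import dataclass
from datetime import date, timedelta

@dataclass
class Report:
    patient_age: int
    country: str
    reported: date
    text: str

# Assumed definition of "central Europe"; the slide does not define it.
CENTRAL_EUROPE = {"Germany", "Austria", "Switzerland", "Czechia", "Poland", "Hungary"}

def find_patients(reports, today, keywords, max_age=30, window_days=14):
    """Structured filters (DB side) combined with keyword scoring (IR side)."""
    window_start = today - timedelta(days=window_days)
    hits = []
    for r in reports:
        # Boolean predicates over structured attributes, like a SQL WHERE clause.
        if not (r.patient_age <= max_age
                and r.country in CENTRAL_EUROPE
                and window_start <= r.reported <= today):
            continue
        # Ranked relevance over the free-text report: count keyword occurrences.
        words = re.findall(r"\w+", r.text.lower())
        score = sum(words.count(kw) for kw in keywords)
        if score > 0:
            hits.append((score, r))
    # Highest-scoring reports first.
    hits.sort(key=lambda h: h[0], reverse=True)
    return hits
```

A real system would push the Boolean part into the database engine and use a proper text index for the ranked part; the point here is only that one query needs both kinds of evaluation.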

  4. Integrating DB & IR (diagram) • Query axis: structured queries / Boolean-match results (SQL) vs. unstructured queries / ranked results (keywords, top-k) • Data axis: structured data (relational) vs. unstructured data (text) • DB systems: structured queries over structured data • IR systems: unstructured queries over unstructured data • Integration topics shown in the diagram: top-k processing, keyword search on graphs; query processing for text search, effective query interfaces, ranking for structured data; extracting entities and relationships, ranking for entities

  5. Modules • Top-k processing • Query Processing and Interfaces • Keyword Search on Graphs • Entity and Relationship Extraction • Ranking and Structured Data

  6. 1. Top-k Processing (1/2) • Structured data, with scores in multiple dimensions • Return the top-k “objects”
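The slide does not name an algorithm; one standard choice for "scores in multiple dimensions, return the top-k objects" is Fagin's Threshold Algorithm (TA), sketched below. Assumptions: scores are non-negative, higher is better, the aggregate is a plain sum, and an object missing from a dimension scores 0 there. The function name `threshold_algorithm` is hypothetical.

```python
import heapq

def threshold_algorithm(lists, k):
    """Fagin's Threshold Algorithm (TA) with sum aggregation.

    lists: one dict per score dimension, mapping object id -> non-negative score.
    An object absent from a dimension is treated as scoring 0 there.
    Returns up to k (score, object) pairs, best first.
    """
    if k <= 0:
        return []
    # Sorted access: each dimension's entries in descending score order.
    sorted_lists = [sorted(d.items(), key=lambda kv: kv[1], reverse=True) for d in lists]
    seen = set()
    top = []  # min-heap of (score, object) holding the current best k
    for depth in range(max((len(s) for s in sorted_lists), default=0)):
        threshold = 0.0
        for s in sorted_lists:
            if depth >= len(s):
                continue  # exhausted: unseen objects score 0 in this dimension
            obj, score = s[depth]
            threshold += score
            if obj in seen:
                continue
            seen.add(obj)
            # Random access: look up obj's score in every dimension.
            total = sum(d.get(obj, 0.0) for d in lists)
            if len(top) < k:
                heapq.heappush(top, (total, obj))
            elif total > top[0][0]:
                heapq.heapreplace(top, (total, obj))
        # Stop once the k-th best seen score reaches the best possible unseen score.
        if len(top) == k and top[0][0] >= threshold:
            break
    return sorted(top, reverse=True)
```

The threshold is the sum of the scores at the current depth of every list, so no object not yet seen can beat it; TA can therefore stop without reading the lists to the end.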

  7. 1. Top-k Processing (2/2) • Top-k Joins • Example: Return the best house-school pair
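For the house-school example, a top-k join can avoid computing the full join by reading both inputs in descending score order. The sketch below follows the idea of a hash rank-join operator (HRJN-style): it alternates between inputs, probes per-input hash tables, and emits a joined pair once its score reaches an upper bound on every unseen result. Assumptions: the join key is a hypothetical `district` value, the pair score is the sum of the two scores, and `rank_join` is an illustrative name.

```python
import heapq
from collections import defaultdict

def rank_join(left, right, k):
    """Top-k equi-join of two score-sorted inputs, summing scores (HRJN-style).

    left, right: lists of (join_key, item, score), each sorted by score descending.
    Returns up to k ((left_item, right_item), score) pairs, best first.
    """
    inputs = (left, right)
    seen = (defaultdict(list), defaultdict(list))  # per-input hash table: key -> [(item, score)]
    pos = [0, 0]
    top = [s[0][2] if s else 0.0 for s in inputs]  # best score in each input
    last = list(top)                                # most recent score read from each input
    candidates = []                                 # max-heap via negated scores
    results = []
    side = 0
    while len(results) < k:
        if pos[side] >= len(inputs[side]):
            side = 1 - side
            if pos[side] >= len(inputs[side]):
                break                               # both inputs exhausted
        key, item, score = inputs[side][pos[side]]
        pos[side] += 1
        last[side] = score
        seen[side][key].append((item, score))
        # Probe the other input's hash table for join partners.
        for other_item, other_score in seen[1 - side][key]:
            pair = (item, other_item) if side == 0 else (other_item, item)
            heapq.heappush(candidates, (-(score + other_score), pair))
        # No join result involving an unseen tuple can score above this bound.
        threshold = max(top[0] + last[1], last[0] + top[1])
        while candidates and len(results) < k and -candidates[0][0] >= threshold:
            neg, pair = heapq.heappop(candidates)
            results.append((pair, -neg))
        side = 1 - side                             # alternate between inputs
    # Inputs exhausted: remaining candidates are final, emit best first.
    while candidates and len(results) < k:
        neg, pair = heapq.heappop(candidates)
        results.append((pair, -neg))
    return results
```

For example, with houses `[("north", "H1", 0.9), ("south", "H2", 0.8), ("north", "H3", 0.5)]` and schools `[("south", "S1", 0.95), ("north", "S2", 0.6), ("south", "S3", 0.4)]` (made-up data), the best pair `("H2", "S1")` is returned after reading only two tuples from each input.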

  8. 2. Query Processing and Interfaces (1/3) • Given: a database of text documents and a text-centric task, e.g., extracting information about disease outbreaks • Strategies: • Scan all documents: very expensive • Filter promising documents: cheaper, but affects recall • Develop cost models and execution strategies appropriate for this setting
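A cost model of this kind can be as simple as "cost of selecting documents + cost of running the extractor on them", with strategies compared only if their estimated recall meets a target. The sketch below is a toy illustration; the `Strategy` fields, the cost formula, and the recall estimates are assumptions, not the presentation's model.

```python
from dataclasses import dataclass

@dataclass
class Strategy:
    name: str
    docs_processed: int   # documents handed to the (expensive) extractor
    filter_cost: float    # total cost of choosing those documents
    recall: float         # estimated fraction of useful documents reached

def total_cost(s: Strategy, extract_cost_per_doc: float) -> float:
    # Cost = cost of selecting documents + cost of running extraction on them.
    return s.filter_cost + s.docs_processed * extract_cost_per_doc

def choose_strategy(strategies, extract_cost_per_doc, min_recall):
    """Cheapest strategy whose estimated recall meets the target, or None."""
    feasible = [s for s in strategies if s.recall >= min_recall]
    return min(feasible, key=lambda s: total_cost(s, extract_cost_per_doc), default=None)
```

With made-up numbers (1,000,000 documents, extraction cost 1 per document, a filter costing 0.01 per document that passes 50,000 documents at an estimated recall of 0.8), filtering wins at a recall target of 0.75, while a target of 0.9 forces a full scan.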

  9. 2. Query Processing and Interfaces (2/3) Querying with “typed” keywords • Keyword querying: easy to use • Structured queries: precise • Find the middle ground: instead of the keyword query “german has won nobel award” or the structured query q(x) :- GERMAN(x), hasWonPrize(x,y), NOBEL_PRIZE(y), use typed keywords: “german, has won (nobel award)”
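One way to read a typed keyword query is as a template "subject type, relation (object type)" that a lexicon maps onto the conjunctive query q(x) :- GERMAN(x), hasWonPrize(x,y), NOBEL_PRIZE(y). The sketch below assumes that syntax, a hypothetical `LEXICON`, and a tiny hand-built knowledge base; a real system would need fuzzy matching from keywords to types and relations rather than exact lookups.

```python
import re

# Toy knowledge base (illustrative): entity types and relation facts.
TYPES = {
    "Max_Planck": {"GERMAN"},
    "Thomas_Mann": {"GERMAN"},
    "Niels_Bohr": {"DANISH"},
    "Nobel_Prize_in_Physics": {"NOBEL_PRIZE"},
    "Nobel_Prize_in_Literature": {"NOBEL_PRIZE"},
}
FACTS = [
    ("Max_Planck", "hasWonPrize", "Nobel_Prize_in_Physics"),
    ("Thomas_Mann", "hasWonPrize", "Nobel_Prize_in_Literature"),
    ("Niels_Bohr", "hasWonPrize", "Nobel_Prize_in_Physics"),
]
# Hypothetical lexicon mapping keyword phrases to types and relations.
LEXICON = {"german": "GERMAN", "nobel award": "NOBEL_PRIZE", "has won": "hasWonPrize"}

def answer_typed_query(query):
    """Evaluate 'subject type, relation (object type)' as q(x) :- T1(x), rel(x,y), T2(y)."""
    m = re.fullmatch(r"\s*(.+?)\s*,\s*(.+?)\s*\((.+?)\)\s*", query.lower())
    if m is None:
        raise ValueError(f"not a typed keyword query: {query!r}")
    # Unknown phrases raise KeyError in this sketch.
    t1, rel, t2 = (LEXICON[g.strip()] for g in m.groups())
    return sorted({x for x, r, y in FACTS
                   if r == rel and t1 in TYPES.get(x, set()) and t2 in TYPES.get(y, set())})
```

Here `answer_typed_query("german, has won (nobel award)")` returns `["Max_Planck", "Thomas_Mann"]`; Niels Bohr is excluded by the GERMAN(x) condition.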

  10. 2. Query Processing and Interfaces (3/3) • Does the output have to be a boring list of ranked results? • Nope!

  11. 3. Keyword Search on Graphs (1/3) • Graph-structured data is everywhere: • Relational DBs (tuples + foreign keys) • XML data (elements/sub-elements, ID/IDREFs) • RDF (graph-structured knowledge bases) • Keyword queries are easier to write than SQL/XQuery/SPARQL • Results are the top-k subgraphs that interconnect nodes matching the keywords
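For the relational case, the graph is usually obtained by turning each tuple into a node and each foreign-key reference into an edge. A minimal sketch, assuming tables are given as dictionaries keyed by primary key and foreign keys as `(table, column, referenced_table)` triples; `tuples_to_graph` and the schema in the example are hypothetical.

```python
def tuples_to_graph(tables, foreign_keys):
    """Nodes are (table, primary key); each foreign-key reference becomes an edge.

    tables: {table_name: {primary_key: row_dict}}
    foreign_keys: [(table_name, column, referenced_table)]
    """
    nodes = [(table, pk) for table, rows in tables.items() for pk in rows]
    edges = []
    for table, column, ref_table in foreign_keys:
        for pk, row in tables[table].items():
            ref = row.get(column)
            # Skip NULLs and dangling references.
            if ref is not None and ref in tables[ref_table]:
                edges.append(((table, pk), column, (ref_table, ref)))
    return nodes, edges
```

For example, an `author`/`paper`/`writes` schema with two authors of one paper yields five nodes and four edges, one per reference from a `writes` tuple.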

  12. 3. Keyword Search on Graphs (2/3)

  13. 3. Keyword Search on Graphs (3/3) Query: “Einstein”, “Bohr” [Example graph: nodes Einstein, Bohr, Nobel Prize, Tom Cruise, vegetarian, 1962; edge labels won, isa, bornIn, diedIn]
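A simple way to rank interconnections is to score each candidate root node by the sum of its shortest-path distances to the nearest node matching each keyword, and return the paths as the answer tree. This is one common simplification, not necessarily the ranking model used in the presentation. The edge list below is an assumed reading of the slide's example graph (edge directions included), and the function names are hypothetical.

```python
from collections import deque

def bfs_from_matches(adj, sources):
    """Multi-source BFS: distance and next hop toward the nearest source node."""
    dist = {s: 0 for s in sources}
    nxt = {s: None for s in sources}
    queue = deque(sources)
    while queue:
        u = queue.popleft()
        for v in adj.get(u, ()):
            if v not in dist:
                dist[v] = dist[u] + 1
                nxt[v] = u            # step from v toward the nearest source
                queue.append(v)
    return dist, nxt

def keyword_search(edges, keywords, k):
    """Top-k roots by total distance to all keywords, with one path per keyword."""
    adj = {}
    for a, _label, b in edges:        # edges treated as undirected
        adj.setdefault(a, []).append(b)
        adj.setdefault(b, []).append(a)
    per_kw = [bfs_from_matches(adj, [n for n in adj if kw.lower() in n.lower()])
              for kw in keywords]
    results = []
    for root in adj:
        if all(root in dist for dist, _ in per_kw):   # root reaches every keyword
            paths = []
            for _, nxt in per_kw:
                path, node = [root], root
                while nxt[node] is not None:
                    node = nxt[node]
                    path.append(node)
                paths.append(path)
            score = sum(dist[root] for dist, _ in per_kw)
            results.append((score, root, paths))
    results.sort(key=lambda r: (r[0], r[1]))          # lower total distance is better
    return results[:k]

# Assumed reading of the slide's figure.
EDGES = [
    ("Einstein", "won", "Nobel Prize"),
    ("Bohr", "won", "Nobel Prize"),
    ("Bohr", "diedIn", "1962"),
    ("Tom Cruise", "bornIn", "1962"),
    ("Einstein", "isa", "vegetarian"),
    ("Tom Cruise", "isa", "vegetarian"),
]
```

Under these assumptions, `keyword_search(EDGES, ["Einstein", "Bohr"], 4)` ranks the connection through Nobel Prize (total distance 2) above the longer one through vegetarian, Tom Cruise and 1962 (total distance 4).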

  14. 4. Entity and Relationship Extraction (1/2) • Information Extraction (or Knowledge Harvesting), e.g., from sentences such as: Apple was established on April 1, 1976 by Steve Jobs, Steve Wozniak, and Ronald Wayne. Infosys was founded on 2 July 1981 by seven entrepreneurs: N. R. Narayana Murthy, Nandan Nilekani, … Bill Gates was the founder of Microsoft and later its CEO.
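A minimal rule-based extractor for these example sentences can use hand-written surface patterns. The regular expressions and the relation names `foundedOn` and `foundedBy` below are hypothetical and tuned only to these three sentences. Note that the pattern deliberately skips "by seven entrepreneurs:", because that sentence is cut off on the slide.

```python
import re

SENTENCES = [
    "Apple was established on April 1, 1976 by Steve Jobs, Steve Wozniak, and Ronald Wayne.",
    "Infosys was founded on 2 July 1981 by seven entrepreneurs: N. R. Narayana Murthy, Nandan Nilekani, …",
    "Bill Gates was the founder of Microsoft and later its CEO.",
]

# Hand-written surface patterns (hypothetical, tuned to these sentences only).
FOUNDED_ON = re.compile(r"(?P<org>[A-Z]\w+) was (?:established|founded) on (?P<date>.+?\d{4})")
FOUNDED_BY = re.compile(
    r"(?P<org>[A-Z]\w+) was (?:established|founded) on .+? by (?P<founders>[A-Z][^.:]*)\.")
FOUNDER_OF = re.compile(r"(?P<person>[A-Z]\w+ [A-Z]\w+) was the founder of (?P<org>[A-Z]\w+)")

def extract_facts(sentences):
    """Return (subject, relation, object) triples matched by the patterns."""
    facts = []
    for s in sentences:
        for m in FOUNDED_ON.finditer(s):
            facts.append((m["org"], "foundedOn", m["date"]))
        for m in FOUNDED_BY.finditer(s):
            # Split "A, B, and C" into individual founders.
            for person in re.split(r",\s*(?:and\s+)?|\s+and\s+", m["founders"]):
                facts.append((m["org"], "foundedBy", person))
        for m in FOUNDER_OF.finditer(s):
            facts.append((m["org"], "foundedBy", m["person"]))
    return facts
```

Hand-written patterns like these are precise but brittle. Covering the Infosys founders, for instance, would need another rule for the "…: A, B, …" construction, which is one reason large-scale harvesting also relies on semi-structured sources.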

  15. 4. Entity and Relationship Extraction (2/2) • How to build a knowledge base of facts? • Turn Wikipedia into structured facts • Construct rules for extraction • How do I acquire all the facts in the world? • Extract “everything” • Never stop extracting
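Wikipedia's infoboxes are one semi-structured source that is easy to turn into facts: each `| key = value` line becomes a triple. The sketch below handles only a simplified infobox (plain values and `[[target|label]]` links). It ignores nested templates such as date templates, which real infoboxes often use. The function name and the sample wikitext are hypothetical, and the values are taken from the example on the previous slide.

```python
import re

def infobox_to_triples(entity, wikitext):
    """Turn '| key = value' lines of a (simplified) infobox into triples."""
    triples = []
    for line in wikitext.splitlines():
        m = re.match(r"\s*\|\s*(\w+)\s*=\s*(.+?)\s*$", line)
        if m:
            key, value = m.groups()
            # Replace wiki links [[target]] or [[target|label]] by their visible text.
            value = re.sub(r"\[\[(?:[^|\]]*\|)?([^\]]+)\]\]", r"\1", value)
            triples.append((entity, key, value))
    return triples

SAMPLE = """{{Infobox company
| name = Infosys
| founded = 2 July 1981
| founders = [[N. R. Narayana Murthy]]
}}"""
```

Calling `infobox_to_triples("Infosys", SAMPLE)` yields `name`, `founded` and `founders` triples for Infosys.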

  16. 5. Ranking and Structured Data • Not the same as top-k processing • Given: data with structure in it • Relational tables (flat) • XML (trees/graphs) • Text documents consisting of entities • Task: rank the query results • Queries: SQL/XQuery/“typed” keywords
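The slide does not fix a ranking model. As a stand-in, the sketch below flattens each result tuple into a bag of words and scores it with a smoothed TF-IDF against the query terms. The function names, the IDF smoothing, and the sample rows are assumptions made for illustration. Dedicated models for structured data would weight attributes, entity types and relationships separately.

```python
import math
import re
from collections import Counter

def tokens(text):
    return re.findall(r"\w+", text.lower())

def rank_results(rows, query, k):
    """Rank result tuples (dicts of attribute -> value) by a simple TF-IDF score."""
    docs = [Counter(tokens(" ".join(str(v) for v in row.values()))) for row in rows]
    n = len(docs)
    terms = tokens(query)
    # Document frequency of each query term across the result set.
    df = {t: sum(1 for d in docs if t in d) for t in terms}

    def score(d):
        # Smoothed IDF; stays positive because df[t] <= n.
        return sum(d[t] * math.log((n + 1) / (df[t] + 0.5)) for t in terms if d[t])

    scored = [(row, score(d)) for row, d in zip(rows, docs)]
    scored.sort(key=lambda rs: rs[1], reverse=True)   # stable: ties keep input order
    return scored[:k]
```

For rows describing Apple, Infosys and Microsoft with their founders, the query "steve jobs" puts the Apple tuple first and gives the other two a score of 0.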

  17. Questions?
