
Querying The Web Database

Michael J. Cafarella, University of Michigan. CS4HS, August 18, 2010.


Presentation Transcript


  1. Querying The Web Database. Michael J. Cafarella, University of Michigan. CS4HS, August 18, 2010.

  2. Two kinds of databases
     • Structured databases (your bank)
       • Expensive, hard to use
       • Few sources of data
       • Powerful queries
         • “Who lives in Ypsilanti and has a balance between $800 and $1400?”
     • Unstructured databases (the Web)
       • Cheap, easy to use
       • Many sources of data
       • Very boring “topic” queries
         • britney spears, etc.
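
The "powerful query" on this slide can be sketched against a structured store. This is only an illustration using Python's built-in sqlite3 module: the `accounts` table, its column names, and every row in it are made up for the example and do not come from the talk.

```python
import sqlite3

# Hypothetical bank table; names, cities and balances are invented for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (name TEXT, city TEXT, balance REAL)")
conn.executemany(
    "INSERT INTO accounts VALUES (?, ?, ?)",
    [
        ("Alice", "Ypsilanti", 950.00),
        ("Bob", "Ypsilanti", 2300.00),
        ("Carol", "Ann Arbor", 1200.00),
        ("Dave", "Ypsilanti", 1400.00),
    ],
)


def residents_in_balance_range(conn, city, low, high):
    """Return names of people in `city` whose balance lies in [low, high]."""
    rows = conn.execute(
        "SELECT name FROM accounts "
        "WHERE city = ? AND balance BETWEEN ? AND ? "
        "ORDER BY name",
        (city, low, high),
    )
    return [name for (name,) in rows]


# "Who lives in Ypsilanti and has a balance between $800 and $1400?"
matches = residents_in_balance_range(conn, "Ypsilanti", 800, 1400)
print(matches)
```

A keyword search engine has no equivalent of the `WHERE` clause: it can match pages that mention "Ypsilanti" but cannot filter on a numeric range, which is the gap the slide points at.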

  3. The Structured Web?
     • What if we had a structured-data version of everything on the Web?
     • “A Database of Everything”
     • “List all scientists from Belgium who were left-handed”
     • “Which heart surgeon in Michigan has the highest success rate?”
     • “List Miami hotels with hot tubs near a beach”

  4. This page contains 16 distinct HTML tables, but only one structured database

  5. WebTables Schema Statistics Applications
     • The WebTables system automatically extracts databases from a web crawl
     • An extracted database is one table plus labeled columns
     • We estimate that our crawl of 14.1B raw HTML tables contains ~154M good structured databases
     [Figure labels: Raw crawled pages, Raw HTML Tables, Recovered Databases]
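
As a rough illustration of the idea that an extracted database is "one table plus labeled columns", the sketch below pulls every `<table>` out of a page and keeps only those that look relational. It is not the WebTables system: the filtering rule (a full `<th>` header row plus at least two equally wide data rows) is a simplifying assumption, and the class name, function name, and sample HTML are all hypothetical.

```python
from html.parser import HTMLParser


class TableExtractor(HTMLParser):
    """Collect each <table> as a list of rows; a row is a list of (tag, text) cells.

    Simplification: nested tables are not handled.
    """

    def __init__(self):
        super().__init__()
        self.tables = []
        self._rows = None   # rows of the table currently being parsed
        self._row = None    # cells of the row currently being parsed
        self._cell = None   # (tag, [text fragments]) of the open cell

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self._rows = []
        elif tag == "tr" and self._rows is not None:
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = (tag, [])

    def handle_data(self, data):
        if self._cell is not None:
            self._cell[1].append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append((self._cell[0], "".join(self._cell[1]).strip()))
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self._rows.append(self._row)
            self._row = None
        elif tag == "table" and self._rows is not None:
            self.tables.append(self._rows)
            self._rows = None


def recover_database(rows):
    """Return (column_labels, data_rows) if the table looks relational, else None."""
    if len(rows) < 3:  # assumed rule: a header row plus at least two data rows
        return None
    header, body = rows[0], rows[1:]
    if len(header) < 2 or not all(tag == "th" and text for tag, text in header):
        return None
    if any(len(row) != len(header) for row in body):
        return None
    return [text for _, text in header], [[text for _, text in row] for row in body]


# Made-up page: one layout table (navigation links) and one data table.
PAGE = """
<table><tr><td>Home</td><td>About</td></tr></table>
<table>
  <tr><th>City</th><th>Population</th></tr>
  <tr><td>Springfield</td><td>1000</td></tr>
  <tr><td>Shelbyville</td><td>2000</td></tr>
</table>
"""

parser = TableExtractor()
parser.feed(PAGE)
recovered = [db for db in map(recover_database, parser.tables) if db is not None]
print(len(parser.tables), "raw tables,", len(recovered), "recovered database(s)")
```

The sample page mirrors the earlier slide's point in miniature: most HTML tables are used for layout, so the raw table count is much larger than the number of tables worth treating as databases.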

  6. Easy Data Analysis
     • Knowledge worker queries for “city population” [VLDB08, “WebTables: Exploring…”, Cafarella et al.]

  7. Auto Synonym Discovery

  8. Structure Autocomplete
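
The slide gives only a title, so the following is one plausible reading rather than the method shown in the talk: given a few column names a user has typed, suggest further columns that frequently appear alongside them in previously extracted schemas. The schema corpus, the function name, and the scoring rule (share of matching schemas that contain the candidate) are all assumptions made for this sketch.

```python
from collections import Counter

# Hypothetical corpus of column-label sets, standing in for schemas
# recovered from web tables. The contents are invented.
SCHEMAS = [
    ("make", "model", "year", "price"),
    ("make", "model", "year"),
    ("make", "model", "color"),
    ("name", "address", "phone"),
    ("name", "phone", "email"),
]


def suggest_columns(given, schemas, top_k=3):
    """Suggest extra columns that co-occur with all of the `given` columns.

    Returns up to `top_k` (column, score) pairs, where score is the fraction
    of schemas containing every given column that also contain the candidate.
    """
    given = set(given)
    matching = [set(s) for s in schemas if given <= set(s)]
    if not matching:
        return []
    counts = Counter(col for s in matching for col in s - given)
    # Sort by descending count, then alphabetically so ties are deterministic.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [(col, n / len(matching)) for col, n in ranked[:top_k]]


suggestions = suggest_columns(["make"], SCHEMAS)
for column, score in suggestions:
    print(f"{column}: {score:.2f}")
```

With a large enough corpus of recovered schemas, the same counting idea lets a tool propose a sensible table layout from a single starting column.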

  9. Conclusions
     • The Structured Web exists in raw form today, but tools largely ignore it
     • Information Extraction helps gather structural information from existing Web info
     • These techniques bring the promise of the Structured Web much closer
