Managing Unstructured Data

AnHai Doan

University of Wisconsin-Madison


Unstructured Data ...

  • Appears in many forms

    • emails, Web pages, memos, call-center text records, etc.

  • Is pervasive

    • 80% of the world's data, and growing

  • Managed by many players

    • SIGIR/WWW/KDD/AAAI, Google/Yahoo/Microsoft/IBM

We should work on it, or risk missing the boat!

But what sets us apart from the above guys?


Structure + System Focus!

  • Make it very easy to extract structures from raw data

    • in raw form → keyword search / bag analysis

    • many apps want to go beyond that, they want structure

    • we should encourage this → back to our playground

    • not just DB + IR, but DB + IR + IE

  • Instead of working on isolated research problems, let's build an end-to-end UDMS (unstructured data management system)

    • should repeat what we did with System R / Ingres: system blueprint, followed by 20 years of rapid progress

    • unifies & accelerates our research efforts

    • keeps work grounded, makes an impact
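
To make the first bullet concrete, here is a minimal sketch of the gap between the raw-form view (keyword search over a bag of tokens) and the structured view that many apps want. It is only an illustration under stated assumptions: the sample record, the field names, and the helper functions `bag_of_words` and `extract_record` are hypothetical and do not come from the talk, and real information extraction is far more involved than two regular expressions.

```python
import re
from collections import Counter

# Hypothetical raw record; the text and field names are illustrative only.
RAW = "Call on 2008-03-14 from alice@example.com: printer jams when duplexing."


def bag_of_words(text):
    """Raw-form view: lowercase tokens with counts, enough for keyword search."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def extract_record(text):
    """Structured view: pull typed fields out of the same text."""
    date = re.search(r"\b\d{4}-\d{2}-\d{2}\b", text)
    email = re.search(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+", text)
    return {
        "date": date.group(0) if date else None,
        "email": email.group(0) if email else None,
    }


if __name__ == "__main__":
    # Keyword search can only ask "does this token occur?"
    print("printer" in bag_of_words(RAW))
    # The extracted record can be filtered, joined, and aggregated like a DB row.
    print(extract_record(RAW))
```

The bag of tokens answers "which records mention printer?", while the extracted record supports questions such as "which calls came from this address in March?", which is the kind of query a database system is built for.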


What Does this System Look Like?

DB + IR + IE + II, in a best-effort, Web 2.0 fashion

[Figure: system diagram. Labels: Joe Hellerstein; Joe Six-Pack; Flexible modes of interaction; Extraction + Integration; Mass collaboration]

Best-effort, pay-as-you-go, improving over time

Scale up to huge data (by running over clusters)


Broader Impacts

  • Great for many current applications

    • e-science, business, personal data, Web data, etc.

  • Great for many current research topics

    • IR, integration, PIM, data spaces

    • user interfaces, HCI, mashup

    • provenance, uncertainty

    • cluster management

    • query processing

    • monitoring, handling changes, pub/sub systems

  • Raises novel research issues

    • mass collaboration, best-effort, extraction, helping Joe Six-Pack

  • Helps define data management principles in broader contexts

