


ODE: Ontology-Assisted Data Extraction

Weifeng Su, Jiying Wang, Frederick H. Lochovsky

Summarized by Joseph Park


Overview

  • “Web databases…compose what is referred to as the deep Web”

  • The goal of data extraction:

    • (1) Query result section identification - decides which section of a dynamically generated query result page contains the data to be extracted.

    • (2) Record segmentation - segments the query result section into records and extracts them.

    • (3) Data value alignment - aligns the data values from multiple records that belong to the same attribute so that they can be arranged into a table.

    • (4) Label assignment - assigns a suitable, meaningful label (i.e., an attribute name) to each column in an aligned table.
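Steps (3) and (4) can be illustrated with a toy sketch. It assumes records have already been segmented into lists of raw strings, and it uses a few hypothetical regular-expression recognizers (`RECOGNIZERS`, `COLUMN_ORDER`, and the sample book data are invented for illustration and are not taken from the paper) to decide which column each value belongs to. Note how a record with a missing optional attribute still lines up with the others.

```python
import re

# Hypothetical recognizers, checked in order; the last one is a catch-all.
RECOGNIZERS = [
    ("Price", re.compile(r"^\$\d+(\.\d{2})?$")),
    ("Year", re.compile(r"^(19|20)\d{2}$")),
    ("Title", re.compile(r".+")),
]
# Assumed column order for the output table.
COLUMN_ORDER = ["Title", "Year", "Price"]


def label_of(value):
    """Return the label of the first recognizer that matches, else None."""
    for label, pattern in RECOGNIZERS:
        if pattern.match(value):
            return label
    return None


def to_table(records):
    """Align values by label (step 3) and return labeled columns (step 4)."""
    rows = []
    for record in records:
        row = {label: "" for label in COLUMN_ORDER}
        for value in record:
            label = label_of(value)
            # Keep the first value seen for each column; leave gaps empty.
            if label is not None and row[label] == "":
                row[label] = value
        rows.append([row[label] for label in COLUMN_ORDER])
    return COLUMN_ORDER, rows


# Made-up segmented records; the second one lacks the optional year.
records = [
    ["Data on the Web", "1999", "$45.00"],
    ["Web Data Mining", "$60.00"],
]
header, rows = to_table(records)
for line in [header] + rows:
    print(" | ".join(line))
```

A real extractor cannot rely on hand-written patterns like these for every site, which is the gap an ontology is meant to fill.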


Problems

  • Automatically extract data from query results

  • Limitations of other systems:

    • Cannot handle result pages that contain zero or only a few query results.

    • Vulnerable to optional and disjunctive attributes.

    • Incapable of processing nested data structures.

    • No label assignment.


Approach

  • ODE – Ontology-assisted data extraction

  • PADE wrapper

  • Query result annotation

  • Attribute matching

  • Ontology construction
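The slides do not say how ODE represents its ontology or how attribute matching is scored. Purely as an illustration of what matching a column of extracted values against a domain ontology can look like, the sketch below models the ontology as a mapping from attribute name to a set of known values and picks the attribute with the highest Jaccard overlap. The names `ONTOLOGY`, `jaccard`, `match_attribute`, the threshold, and the sample values are all assumptions made for this example.

```python
def jaccard(a, b):
    """Jaccard similarity of two collections, treated as sets."""
    a, b = set(a), set(b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


# Hypothetical book-domain ontology: attribute name -> known values.
ONTOLOGY = {
    "Author": {"knuth", "ullman", "date"},
    "Format": {"hardcover", "paperback"},
}


def match_attribute(column_values, ontology, threshold=0.2):
    """Return the best-matching attribute name, or None if overlap is too low."""
    normalized = [v.strip().lower() for v in column_values]
    best_name, best_score = None, 0.0
    for name, known_values in ontology.items():
        score = jaccard(normalized, known_values)
        if score > best_score:
            best_name, best_score = name, score
    return best_name if best_score >= threshold else None


print(match_attribute(["Paperback", "Hardcover", "Paperback"], ONTOLOGY))
print(match_attribute(["$10"], ONTOLOGY))
```

Returning `None` below the threshold reflects the idea that a column with no counterpart in the ontology should stay unlabeled rather than receive a wrong label.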


Approach continued

  • Query result section identification

  • Record segmentation

  • Data value alignment and label assignment

    • MaxEnt model is used
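For readers unfamiliar with MaxEnt, a conditional maximum entropy classifier is equivalent to multinomial logistic regression: each label gets a weight per feature, and label probabilities come from a softmax over the weighted feature sums. The sketch below is a minimal pure-Python version trained by stochastic gradient ascent on the log-likelihood. The slides do not state which features or training procedure ODE uses, so the three binary features, the learning rate, and the training examples are invented for illustration, and this is not the authors' implementation.

```python
import math


def features(value):
    """Hypothetical binary features of a data value, plus a bias term."""
    return {
        "has_dollar": 1.0 if "$" in value else 0.0,
        "all_digits": 1.0 if value.isdigit() else 0.0,
        "has_space": 1.0 if " " in value else 0.0,
        "bias": 1.0,
    }


def predict_proba(weights, labels, feats):
    """Softmax over per-label weighted feature sums."""
    scores = {y: sum(weights[y][f] * v for f, v in feats.items()) for y in labels}
    top = max(scores.values())  # subtract the max for numerical stability
    exps = {y: math.exp(s - top) for y, s in scores.items()}
    total = sum(exps.values())
    return {y: e / total for y, e in exps.items()}


def train(examples, epochs=200, rate=0.5):
    """Fit weights by stochastic gradient ascent on the log-likelihood."""
    labels = sorted({y for _, y in examples})
    names = list(features("").keys())
    weights = {y: {f: 0.0 for f in names} for y in labels}
    for _ in range(epochs):
        for value, gold in examples:
            feats = features(value)
            probs = predict_proba(weights, labels, feats)
            for y in labels:
                # Gradient: (observed indicator - predicted probability) * feature
                error = (1.0 if y == gold else 0.0) - probs[y]
                for f, v in feats.items():
                    weights[y][f] += rate * error * v
    return weights, labels


def classify(weights, labels, value):
    """Return the most probable label for a data value."""
    probs = predict_proba(weights, labels, features(value))
    return max(probs, key=probs.get)


# Made-up training data: (data value, attribute label).
examples = [
    ("$45.00", "Price"), ("$12.99", "Price"),
    ("1999", "Year"), ("2004", "Year"),
    ("Data on the Web", "Title"), ("Web Data Mining", "Title"),
]
weights, labels = train(examples)
print(classify(weights, labels, "$7.50"))
```

Because the model outputs a probability for every label, a caller can also keep the full distribution from `predict_proba` and defer low-confidence values instead of always taking the top label.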


Experimental Results

Extraction performed using DeLa


Conclusion

  • Can only label attributes that appear in query result pages

  • References a few DEG papers

    • DKE99, Tisp, TANGO

  • Could take advantage of MaxEnt for pre-labeling data

  • Need to look into DeLa for data extraction

