ODE: Ontology-Assisted Data Extraction

Weifeng Su, Jiying Wang, Frederick H. Lochovsky

Summarized by Joseph Park

Overview
  • “Web databases…compose what is referred to as the deep Web”
  • The goal of data extraction:
    • (1) Query result section identification - decides which section of a dynamically generated query result page contains the data that need to be extracted.
    • (2) Record segmentation - segments the query result section into records and extracts them.
    • (3) Data value alignment - aligns the data values from multiple records that belong to the same attribute so that they can be arranged into a table.
    • (4) Label assignment - assigns a suitable, meaningful label (i.e., an attribute name) to each column in an aligned table.
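
As a rough illustration of steps (3) and (4), the sketch below aligns values from already-segmented records into columns and attaches a label to each column. It is a toy under stated assumptions, not the method from the paper: the regular expressions in `ATTRIBUTE_PATTERNS`, the fallback label, and the sample records are all hypothetical stand-ins for what an ontology might supply.

```python
import re

# Hypothetical value patterns standing in for an ontology's knowledge of attributes.
ATTRIBUTE_PATTERNS = {
    "Price": re.compile(r"^\$\d+(\.\d{2})?$"),
    "Year": re.compile(r"^(19|20)\d{2}$"),
}
FALLBACK_LABEL = "Title"  # assumption: any unrecognized value is treated as a title


def label_for(value):
    """Return the first label whose pattern matches the value, else the fallback."""
    for label, pattern in ATTRIBUTE_PATTERNS.items():
        if pattern.match(value):
            return label
    return FALLBACK_LABEL


def align(records):
    """Group values by label into columns; a missing attribute becomes None."""
    columns = []
    rows = []
    for record in records:
        row = {}
        for value in record:
            label = label_for(value)
            row[label] = value  # if two values share a label, the last one wins
            if label not in columns:
                columns.append(label)
        rows.append(row)
    table = [[row.get(column) for column in columns] for row in rows]
    return columns, table


# Made-up records, as if produced by record segmentation (step 2).
records = [
    ["Book A", "1999", "$45.00"],
    ["Book B", "$60.50"],  # the optional Year attribute is absent here
]
columns, table = align(records)
print(columns)
for row in table:
    print(row)
```

Because values are matched by what they look like rather than by position, the second record's price still lands in the Price column even though its year is missing.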
Problems
  • Automatically extract data from query results
  • Limitations of other systems:
    • Incapable of processing either zero or few query results.
    • Vulnerable to optional and disjunctive attributes.
    • Incapable of processing nested data structures.
    • No label assignment.
Approach
  • ODE – Ontology-assisted data extraction
  • PADE wrapper
  • Query result annotation
  • Attribute matching
  • Ontology construction
Approach continued
  • Query result section identification
  • Record segmentation
  • Data value alignment and label assignment
    • MaxEnt model is used
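
The slides do not say how the MaxEnt model is set up, so the following is only a generic sketch of how a maximum entropy classifier scores candidate labels for a data value: each label gets a weighted sum of active features, and the scores are normalized with a softmax. The feature names, the weights in `WEIGHTS`, and the label set are hypothetical; a real model would learn the weights from training data.

```python
import math

# Hypothetical feature weights; a trained MaxEnt model would estimate these.
WEIGHTS = {
    ("starts_with_dollar", "Price"): 2.0,
    ("is_four_digits", "Year"): 2.0,
    ("has_letters", "Title"): 1.5,
}
LABELS = ["Title", "Year", "Price"]  # assumed label set for illustration


def features(value):
    """Return the names of the binary features that fire for this value."""
    active = []
    if value.startswith("$"):
        active.append("starts_with_dollar")
    if value.isdigit() and len(value) == 4:
        active.append("is_four_digits")
    if any(ch.isalpha() for ch in value):
        active.append("has_letters")
    return active


def label_probabilities(value):
    """Compute p(label | value) = exp(score(label)) / sum of exp(score) over labels."""
    active = features(value)
    scores = {
        label: sum(WEIGHTS.get((name, label), 0.0) for name in active)
        for label in LABELS
    }
    normalizer = sum(math.exp(score) for score in scores.values())
    return {label: math.exp(score) / normalizer for label, score in scores.items()}


probs = label_probabilities("$45.00")
best = max(probs, key=probs.get)
print(best, round(probs[best], 3))
```

The label with the highest probability would then be proposed for the column that the value belongs to.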
Experimental Results

Extraction performed using DeLa

Conclusion
  • Can only label attributes that appear in query result pages
  • References a few DEG papers
    • DKE99, Tisp, TANGO
  • Could take advantage of MaxEnt for pre-labeling data
  • Need to look into DeLa for data extraction