
A New Web Semantic Annotator Enabling A Machine Understandable Web

A New Web Semantic Annotator Enabling A Machine Understandable Web. BYU Spring Research Conference 2005 Yihong Ding. Sponsored by NSF. Ontology. Machine Understandable Web. Content is represented in commonly shared, explicitly defined, generic conceptualizations.


Presentation Transcript


  1. A New Web Semantic Annotator Enabling A Machine Understandable Web
     BYU Spring Research Conference 2005
     Yihong Ding
     Sponsored by NSF

  2. Machine Understandable Web
     • Content is represented in an ontology: commonly shared, explicitly defined, generic conceptualizations.
     • Also known as the Semantic Web

  3. Why Machine Understandable?
     • Meaningful data
     • Exchangeable information
     • Interoperable programs/services
     • “… allows data to be shared and reused across application, enterprise, and community boundaries …” --- Tim Berners-Lee et al., 2001

  4. Semantic Annotation: A Way to Achieve a Machine Understandable Web
     • Add explicit, formal, and unambiguous notes to web documents
       • Explicit: publicly accessible
       • Formal: publicly agreeable
       • Unambiguous: publicly identifiable
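
One way to picture such a note is as a stand-off record that ties a span of a web document to a concept in a shared ontology. The sketch below is only an illustration under assumed names: the `Annotation` class, the `annotate_first` helper, and both `example.org` URIs are hypothetical and do not come from the presentation.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Annotation:
    """A stand-off note attached to a span of a web document."""
    document_url: str   # explicit: anyone can locate the annotated document
    start: int          # character offsets of the annotated span
    end: int
    concept_uri: str    # formal and unambiguous: a URI naming one ontology concept


def annotate_first(document_url: str, text: str, phrase: str,
                   concept_uri: str) -> Optional[Annotation]:
    """Annotate the first occurrence of `phrase`, or return None if absent."""
    start = text.find(phrase)
    if start == -1:
        return None
    return Annotation(document_url, start, start + len(phrase), concept_uri)


text = "For sale: 2003 Honda Civic, $8500."
note = annotate_first(
    "http://example.org/ad1.html",             # hypothetical document URL
    text,
    "$8500",
    "http://example.org/car-ontology#Price",   # hypothetical concept URI
)
```

Because the record points at the concept by URI rather than by a local label, two programs that share the ontology can agree on what the span means.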

  5. Semantic Annotation Using Automated IE Engines
     [Diagram labels: Ontology-based IE Wrapper, Document; Non-ontology-based IE Wrapper, Document]

  6. Augmentations for the Annotator
     Semantic annotator using data-extraction ontologies:
     • a two-layer annotation model to achieve fast, highly accurate, and resilient semantic annotation
     • a divide-and-conquer style architecture to scale the system to large domains
     • a web ontology language augmentation to complement OWL for semantic annotation purposes

  7. Two-Layer Annotation Model
     [Diagram labels: Sample Annotation Process, Document, Conceptual Annotator using ontology-based IE tool; Massive Annotation Process, Same-Layout Documents, Structural Annotator]

  8. Two-Layer Annotation Model, Benefits
     • Achieves both resiliency and fast execution
     • Requires no training to generate structural annotators
     • Requires no labeling of the results from structural annotators
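
The two-layer idea can be sketched roughly as follows. This is a toy illustration, not the presented system: the regex-based `conceptual_annotate` merely stands in for an ontology-based IE tool, documents are assumed to be lists of text fields that share one layout, and every name here is hypothetical.

```python
import re

# Hypothetical stand-in for a data-extraction ontology: concept -> recognizer.
ONTOLOGY = {
    "Price": re.compile(r"^\$\d+(?:\.\d{2})?$"),
    "Year": re.compile(r"^(?:19|20)\d{2}$"),
}


def conceptual_annotate(fields):
    """Conceptual layer: label each field by matching it against the concepts.

    Layout-independent (resilient) but does pattern matching on every field.
    Returns a map from field position to concept name.
    """
    labels = {}
    for position, text in enumerate(fields):
        for concept, pattern in ONTOLOGY.items():
            if pattern.match(text):
                labels[position] = concept
                break
    return labels


def build_structural_annotator(sample_fields):
    """Derive a layout-based annotator from one conceptually annotated sample.

    No training data is needed, and the output needs no separate labeling,
    because each position inherits its concept from the sample.
    """
    position_to_concept = conceptual_annotate(sample_fields)

    def structural_annotate(fields):
        # Structural layer: reuse the learned positions, no pattern matching.
        return {
            concept: fields[position]
            for position, concept in position_to_concept.items()
            if position < len(fields)
        }

    return structural_annotate


sample = ["Honda Civic", "2003", "$8500"]
annotate = build_structural_annotator(sample)

same_layout_docs = [
    ["Ford Focus", "2001", "$4200"],
    ["Toyota Camry", "1999", "$3900.00"],
]
results = [annotate(doc) for doc in same_layout_docs]
```

In this sketch the expensive conceptual pass runs once on the sample, and the cheap positional lookup handles the rest of the same-layout documents.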

  9. Scalability Issues
     • Large domain containing many concepts
     • Large annotation task dealing with many web pages

  10. Observation
     • A large domain is a combination of several small domains.
     • Consistently clustered domains exist, where each such domain is
       • Composed of the same cluster of concepts
       • Consistent with any larger domain in which it participates
       • Usually made up of a small number of concepts

  11. Divide-and-Conquer Style Architecture for Scalability Issue
     [Diagram labels: Document; (1) Text classification; Collection of small atomic domain ontologies; Selected Domain Ontologies; (2) Scalable annotation; Document]

  12. Divide-and-Conquer, Benefits
     • Compared to large ontologies, small ontologies are
       • Simpler to construct
       • Faster to execute
       • Easier to check and update
       • More convenient to reuse
     • Identify the range of an ontology dynamically at the web page level
     • Avoid the problem of narrowing a large domain ontology down to the web page level
     • Maximize the reuse of existing ontologies
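
The selection step can be sketched as follows, assuming a simple keyword-overlap classifier in place of whatever text classification the system actually uses. The ontology names, keywords, concepts, and function names are all hypothetical.

```python
import re

# Hypothetical collection of small domain ontologies:
# name -> (trigger keywords for classification, concepts the ontology defines).
SMALL_ONTOLOGIES = {
    "car": ({"mileage", "sedan", "engine"}, ["Make", "Model", "Mileage"]),
    "contact": ({"phone", "email", "address"}, ["Phone", "Email"]),
    "real_estate": ({"bedroom", "acre", "garage"}, ["Bedrooms", "LotSize"]),
}


def select_ontologies(page_text, min_hits=1):
    """Step (1): pick the small ontologies whose keywords occur in the page."""
    words = set(re.findall(r"[a-z]+", page_text.lower()))
    return sorted(
        name
        for name, (keywords, _concepts) in SMALL_ONTOLOGIES.items()
        if len(keywords & words) >= min_hits
    )


def concepts_for_page(page_text):
    """Step (2) input: only concepts from the selected ontologies are applied."""
    concepts = []
    for name in select_ontologies(page_text):
        concepts.extend(SMALL_ONTOLOGIES[name][1])
    return concepts


page = "2003 sedan, low mileage. Call by phone or email the seller."
selected = select_ontologies(page)
page_concepts = concepts_for_page(page)
```

Here the effective ontology for a page is assembled per page from the small ontologies that match, so no large ontology has to be narrowed down by hand.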

  13. Ontology Representation
     • Two ontology languages
       • Data-extraction ontology (OSMX)
       • Semantic web ontology (OWL)
     • Language unification

  14. Contributions
     • Automatic semantic annotator using an ontology-based IE wrapper
     • Two-level annotation: layout-based annotator on top of conceptual annotator
     • Divide-and-conquer style solution to scale the annotation process to a large number of concepts
     • Web ontology language unification
