
Semantic Web Bootstrapping & Annotation

An overview of annotation in the context of the Semantic Web: what it is, why it matters, and how it can be implemented. Covers crawlers, annotation models, manual, semi-automatic, and automatic annotation methods, and the authors' own implementation.

abramg

Presentation Transcript


  1. Semantic Web Bootstrapping & Annotation • Hassan Sayyadi (sayyadi@ce.sharif.edu) • Semantic Web Research Laboratory, Computer Department, Sharif University of Technology

  2. Outline • What is annotation? • Why use annotation? • Crawler • Annotation model • Annotation methods • Our Implementation

  4. What is annotation? • People make notes to themselves to preserve ideas that arise during a variety of activities. • The purpose of these notes is often to summarize, criticize, or emphasize specific phrases or events. • Semantic annotation tags instance data with the ontology classes it belongs to, mapping it into the ontology.

  6. Why use annotation? • Having the world's knowledge at one's fingertips now seems possible. • The Internet is the platform for this information. • Unfortunately, most of it is provided in an unstructured, non-standardized form.

  7. Why use annotation? (continued)

  9. Crawler • A crawler is a program that traverses the Internet by following hyperlinks from one page to the next.
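
The traversal described above can be sketched as a breadth-first search. This is a minimal sketch, not the authors' crawler: to keep it self-contained, downloading a page and extracting its links is replaced by a hypothetical in-memory link graph (`links`), and `maxPages` is an assumed limit on how many pages are visited.

```java
import java.util.*;

class SimpleCrawler {
    // Breadth-first crawl over a link graph. The map `links` (page -> outgoing links)
    // is a hypothetical in-memory stand-in for downloading a page and parsing its links.
    static List<String> crawl(Map<String, List<String>> links, String seed, int maxPages) {
        List<String> visited = new ArrayList<>();     // pages in the order they are visited
        Set<String> seen = new HashSet<>();           // pages already queued, to avoid revisits
        Deque<String> frontier = new ArrayDeque<>();  // FIFO queue of pages still to visit
        frontier.add(seed);
        seen.add(seed);
        while (!frontier.isEmpty() && visited.size() < maxPages) {
            String page = frontier.poll();
            visited.add(page);
            for (String next : links.getOrDefault(page, List.of())) {
                if (seen.add(next)) {                 // add() returns false for known pages
                    frontier.add(next);
                }
            }
        }
        return visited;
    }

    public static void main(String[] args) {
        Map<String, List<String>> links = Map.of(
            "a", List.of("b", "c"),
            "b", List.of("a", "d"),
            "c", List.of("d"),
            "d", List.of());
        System.out.println(crawl(links, "a", 10)); // [a, b, c, d]
    }
}
```

The `seen` set is what keeps the crawl finite on a graph with cycles (here `a` and `b` link to each other); a real crawler would also need politeness delays and robots.txt handling, which are omitted.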

  10. Focused crawler • Not every query needs all of the knowledge on the Internet. • This assumption is reasonable because most people work within a restricted domain and do not need knowledge from the whole Internet. • In such cases, searching the whole Internet is inefficient and expensive. • Free text on the Internet contains information from many diverse domains.

  11. Focused crawler (continued) • Focus can be achieved by examining keywords. • Problems: • "understanding" the semantics of a document • focusing too narrowly on a single topic • Another way to focus is to use the link (connectivity) structure of the Internet.
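
A minimal sketch of the keyword approach, assuming each page's text is already available in a map (a hypothetical stand-in for fetching). The relevance measure (the fraction of topic keywords that occur as whole words) and the `threshold` parameter are illustrative choices, not the method from the slides; the link-structure approach is not shown.

```java
import java.util.*;

class FocusedCrawler {
    // Fraction of topic keywords that occur in the page text as whole words
    // (case-insensitive). A hypothetical relevance measure for illustration.
    static double relevance(String text, Set<String> keywords) {
        Set<String> tokens = new HashSet<>(Arrays.asList(text.toLowerCase().split("\\W+")));
        long hits = keywords.stream().filter(k -> tokens.contains(k.toLowerCase())).count();
        return keywords.isEmpty() ? 0.0 : (double) hits / keywords.size();
    }

    // Breadth-first crawl that keeps and expands only pages whose relevance reaches
    // the threshold; pages below it are neither returned nor followed.
    static List<String> crawl(Map<String, String> texts, Map<String, List<String>> links,
                              String seed, Set<String> keywords, double threshold) {
        List<String> relevant = new ArrayList<>();
        Set<String> seen = new HashSet<>(List.of(seed));
        Deque<String> frontier = new ArrayDeque<>(List.of(seed));
        while (!frontier.isEmpty()) {
            String page = frontier.poll();
            if (relevance(texts.getOrDefault(page, ""), keywords) < threshold) continue;
            relevant.add(page);
            for (String next : links.getOrDefault(page, List.of())) {
                if (seen.add(next)) frontier.add(next);
            }
        }
        return relevant;
    }

    public static void main(String[] args) {
        Map<String, String> texts = Map.of(
            "root", "Sharif university engineering",
            "x", "football scores",
            "y", "university research");
        Map<String, List<String>> links = Map.of("root", List.of("x", "y"));
        System.out.println(crawl(texts, links, "root", Set.of("university", "engineering"), 0.5)); // [root, y]
    }
}
```

Pruning at irrelevant pages is what keeps the crawl focused, but it also illustrates the "too narrow" problem listed above: anything reachable only through an off-topic page is never seen.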

  13. Annotation models • Mark in web page • Example: • SUT is one of the largest engineering schools in the Islamic Republic of Iran • <university>SUT</university> is one of the largest engineering schools in the <country>Islamic Republic of Iran</country>
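
The "mark in web page" model can be illustrated with a small gazetteer-based tagger that wraps every known mention in a tag named after its ontology class. The gazetteer map and the `annotate` method are hypothetical; a real annotator would obtain mentions from an extraction step rather than from a fixed list.

```java
import java.util.*;
import java.util.regex.*;

class InlineAnnotator {
    // Wraps each known entity mention in an XML-style tag named after its ontology
    // class. The gazetteer (mention -> class) is a hypothetical input.
    static String annotate(String text, Map<String, String> gazetteer) {
        if (gazetteer.isEmpty()) return text;
        // Try longer mentions first so "Islamic Republic of Iran" wins over "Iran".
        List<String> mentions = new ArrayList<>(gazetteer.keySet());
        mentions.sort(Comparator.comparingInt(String::length).reversed());
        StringBuilder alternation = new StringBuilder();
        for (String m : mentions) {
            if (alternation.length() > 0) alternation.append('|');
            alternation.append(Pattern.quote(m));       // treat mentions as literal text
        }
        // One combined pattern, so text inside an inserted tag is never re-matched.
        Matcher matcher = Pattern.compile("\\b(?:" + alternation + ")\\b").matcher(text);
        StringBuilder out = new StringBuilder();
        while (matcher.find()) {
            String cls = gazetteer.get(matcher.group());
            matcher.appendReplacement(out,
                Matcher.quoteReplacement("<" + cls + ">" + matcher.group() + "</" + cls + ">"));
        }
        matcher.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        String sentence = "SUT is one of the largest engineering schools in the Islamic Republic of Iran";
        Map<String, String> gazetteer = Map.of(
            "SUT", "university",
            "Islamic Republic of Iran", "country",
            "Iran", "country");
        System.out.println(annotate(sentence, gazetteer));
    }
}
```

Building a single alternation instead of calling `replaceAll` once per mention avoids nested tags such as `<country>Islamic Republic of <country>Iran</country></country>`.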

  14. Annotation models (continued) • Generate RDF • Example: • SUT is one of the largest engineering schools in the Islamic Republic of Iran •
        <rdf:Description rdf:about="http://sharif.edu/#SUT">
          <rdf:type>university</rdf:type>
          <SHARIF:be_in rdf:resource="http://sharif.edu/#Islamic+Republic+of+Iran"/>
        </rdf:Description>
        <rdf:Description rdf:about="http://sharif.edu/#Islamic+Republic+of+Iran">
          <rdf:type>Country</rdf:type>
        </rdf:Description>
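
A sketch of the RDF-generation model, assuming the extraction step has already produced (subject, predicate, object) triples. The base namespace `http://sharif.edu/#` and the `SHARIF:` prefix come from the slide; mapping the predicate `type` to `rdf:type`, and writing the class as a resource URI (the usual RDF form) instead of the literal shown on the slide, are assumptions. As in the slide's fragment, the `xmlns` declarations for `rdf:` and `SHARIF:` are omitted.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.*;

class RdfWriter {
    // Base namespace taken from the slide's example URIs.
    static final String BASE = "http://sharif.edu/#";

    // Turns a label such as "Islamic Republic of Iran" into a resource URI.
    // URLEncoder writes spaces as '+', matching the slide, and also escapes
    // characters such as '&', '<' and '"', so the result is safe inside XML.
    static String uri(String label) {
        return BASE + URLEncoder.encode(label, StandardCharsets.UTF_8);
    }

    // Emits one rdf:Description per subject. The predicate "type" becomes rdf:type
    // (an assumed convention); any other predicate is placed under the SHARIF: prefix.
    static String toRdfXml(List<String[]> triples) {
        Map<String, List<String[]>> bySubject = new LinkedHashMap<>();
        for (String[] t : triples) {
            bySubject.computeIfAbsent(t[0], k -> new ArrayList<>()).add(t);
        }
        StringBuilder xml = new StringBuilder();
        for (Map.Entry<String, List<String[]>> e : bySubject.entrySet()) {
            xml.append("<rdf:Description rdf:about=\"").append(uri(e.getKey())).append("\">\n");
            for (String[] t : e.getValue()) {
                String prop = t[1].equals("type") ? "rdf:type" : "SHARIF:" + t[1];
                xml.append("  <").append(prop)
                   .append(" rdf:resource=\"").append(uri(t[2])).append("\"/>\n");
            }
            xml.append("</rdf:Description>\n");
        }
        return xml.toString();
    }

    public static void main(String[] args) {
        List<String[]> triples = new ArrayList<>();
        triples.add(new String[] {"SUT", "type", "university"});
        triples.add(new String[] {"SUT", "be_in", "Islamic Republic of Iran"});
        System.out.print(toRdfXml(triples));
    }
}
```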

  16. Annotation methods • Manually • Semi-automatically • Automatically

  17. Automatic Annotation • Fully automatic creation of semantic annotations is an unsolved problem. • Automatically annotating the natural-language sentences in web pages is a daunting task, so annotation is often done manually, or semi-automatically using handwritten rules.

  18. Manual Annotation • Manual annotation is easier today thanks to authoring tools that provide an integrated environment for writing and annotating text at the same time. • However, human annotators often make errors, owing to factors such as their familiarity with the domain, amount of training, personal motivation, and complex schemas. • Manual annotation is also expensive.

  19. Semi-automatic Annotation • To overcome the annotation-acquisition bottleneck, semi-automatic annotation of documents has been proposed.

  20. Semi-automatic annotation • Assumptions: • the vocabulary is limited • word usage follows patterns • semantic ambiguities are rare • terms and jargon of the domain appear frequently
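
Under these assumptions, a handwritten rule can be as simple as a regular expression whose first capture group is an instance of some ontology class. The sketch below (Java 16+ records) is a hypothetical illustration of pattern-based annotation; the two rules are toy examples written for the sample sentence, not rules from the original system.

```java
import java.util.*;
import java.util.regex.*;

class PatternAnnotator {
    // A handwritten rule: a regex whose first group captures an instance of ontologyClass.
    record Rule(Pattern pattern, String ontologyClass) {}

    // An instance found in the text, tagged with its ontology class.
    record Annotation(String instance, String ontologyClass) {}

    // Applies every rule to the text and collects all matches, rule by rule.
    static List<Annotation> annotate(String text, List<Rule> rules) {
        List<Annotation> found = new ArrayList<>();
        for (Rule rule : rules) {
            Matcher m = rule.pattern().matcher(text);
            while (m.find()) {
                found.add(new Annotation(m.group(1), rule.ontologyClass()));
            }
        }
        return found;
    }

    public static void main(String[] args) {
        List<Rule> rules = List.of(
            // Hypothetical rule: "<X> is one of the largest engineering schools" => X is a university
            new Rule(Pattern.compile("(\\p{Lu}\\w*) is one of the largest (?:engineering schools|universities)"),
                     "university"),
            // Hypothetical, deliberately naive rule: a capitalized name after "in the" => country
            new Rule(Pattern.compile("in the (\\p{Lu}\\w*(?: (?:of )?\\p{Lu}\\w*)*)"), "country"));
        String sentence = "SUT is one of the largest engineering schools in the Islamic Republic of Iran";
        annotate(sentence, rules).forEach(System.out::println);
    }
}
```

Such rules only work well when the assumptions above hold (limited vocabulary, recurring phrasing, few ambiguities); the "in the" rule, for instance, would mislabel "in the Computer Department".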

  21. Semantic Annotation Platform (SAP)

  22. Multistrategy SAPs • Multistrategy SAPs are able to combine methods from both pattern-based and machine learning-based systems. • No SAP currently implements the multistrategy approach for semantic annotation, although it has been implemented in systems for ontology extraction (such as On-To-Knowledge)

  23. Semi-automatic annotation (continued) • Example: • I go to Shanghai • Link structure is more like an RDF graph

  24. Accuracy of concepts and relations for different algorithms

  25. Automatic annotation

  26. Source preprocessing • Document Object Model (DOM) • Text Model • Layout Model • NLP Model

  27. Information Identification • Operators • perform extraction actions on document access models • Retrieval, Check, Execute • Strategies • build operator sequences according to user time and quality requirements • Source Description • build operator sequences according to user time and quality requirements
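
The slide does not say how strategies assemble operator sequences. As a simplified, hypothetical reading, a strategy could choose among predefined sequences of the Retrieval, Check, and Execute operators, each with an estimated running time and expected quality, and return the best one that meets the user's requirements. The `Plan` record (Java 16+), the time and quality figures, and the selection rule are all assumptions.

```java
import java.util.*;

class StrategySelector {
    // A candidate operator sequence with an estimated running time and expected quality.
    // Operator names are from the slide; the numbers used below are made up.
    record Plan(List<String> operators, double seconds, double quality) {}

    // Returns the highest-quality plan that fits the time budget and reaches the
    // minimum quality, or an empty Optional if no plan satisfies both requirements.
    static Optional<Plan> choose(List<Plan> plans, double timeBudget, double minQuality) {
        return plans.stream()
            .filter(p -> p.seconds() <= timeBudget && p.quality() >= minQuality)
            .max(Comparator.comparingDouble(Plan::quality));
    }

    public static void main(String[] args) {
        List<Plan> plans = List.of(
            new Plan(List.of("Retrieval", "Execute"), 1.0, 0.6),
            new Plan(List.of("Retrieval", "Check", "Execute"), 3.0, 0.9));
        System.out.println(choose(plans, 2.0, 0.5));   // tight budget: the check step is skipped
        System.out.println(choose(plans, 5.0, 0.5));   // generous budget: the checked plan wins
    }
}
```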

  28. Ontology population • The final stage of the overall process is to decide which hypothesis represents the extracted information to insert into the ontology • The module simulates insertions and calculates the cost according to the number of new instance creations, instance modifications or inconsistencies found
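
A minimal sketch of this cost-based decision (Java 16+ records). The slide names the three cost factors but not how they are combined; the linear weighting and the weight values below are hypothetical, as is the assumption that the lowest-cost hypothesis is the one inserted.

```java
import java.util.*;

class PopulationCost {
    // One candidate interpretation of the extracted information, summarized by the
    // effects its (simulated, not yet applied) insertion would have on the ontology.
    record Hypothesis(String name, int newInstances, int modifiedInstances, int inconsistencies) {}

    // Hypothetical weights: inconsistencies are penalized far more than new or changed instances.
    static final double W_NEW = 1.0, W_MODIFY = 2.0, W_INCONSISTENT = 10.0;

    static double cost(Hypothesis h) {
        return W_NEW * h.newInstances()
             + W_MODIFY * h.modifiedInstances()
             + W_INCONSISTENT * h.inconsistencies();
    }

    // Chooses the hypothesis whose simulated insertion is cheapest.
    static Hypothesis best(List<Hypothesis> hypotheses) {
        return Collections.min(hypotheses, Comparator.comparingDouble(PopulationCost::cost));
    }

    public static void main(String[] args) {
        Hypothesis a = new Hypothesis("A", 2, 0, 1);   // cost 2 + 0 + 10 = 12
        Hypothesis b = new Hypothesis("B", 1, 2, 0);   // cost 1 + 4 + 0 = 5
        System.out.println(best(List.of(a, b)).name()); // B
    }
}
```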

  30. Our implementation • Crawler: • Crawl all links whose URL contains: • sharif.ir • sharif.edu • sharif.ac.ir
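
The scope check can be sketched as follows, using the three domains from the slide. One deliberate difference: instead of testing whether the URL string contains a domain, this hypothetical `inScope` method compares the URL's host against each domain and its subdomains, so a link such as `http://example.com/?ref=sharif.edu` is not followed.

```java
import java.net.URI;
import java.util.List;

class DomainFilter {
    // Domains listed on the slide.
    static final List<String> DOMAINS = List.of("sharif.ir", "sharif.edu", "sharif.ac.ir");

    // Accepts a URL only if its host equals one of the domains or is a subdomain of one.
    static boolean inScope(String url) {
        try {
            String host = URI.create(url).getHost();
            if (host == null) return false;            // e.g. mailto: links have no host
            host = host.toLowerCase();
            for (String d : DOMAINS) {
                if (host.equals(d) || host.endsWith("." + d)) return true;
            }
            return false;
        } catch (IllegalArgumentException e) {
            return false;                              // malformed URL
        }
    }

    public static void main(String[] args) {
        System.out.println(inScope("http://www.sharif.edu/index.html"));   // true
        System.out.println(inScope("http://example.com/?ref=sharif.edu")); // false
    }
}
```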

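  31. Our implementation • Source pre-processing • HTML to text:
        // Hide line breaks so that "." in the patterns below can match across lines
        text = text.replaceAll("\n", "*_newline_*");
        // Remove scripts, style sheets and comments, then every remaining tag
        text = text.replaceAll("(?i)<script.*?</script>", "");
        text = text.replaceAll("(?i)<style.*?</style\\s*>", "");  // stop at the first closing tag, not the last ">"
        text = text.replaceAll("<!--.*?-->", "");
        text = text.replaceAll("<.*?>", "");
        // Decode entities only after tag removal, so a decoded "<" is not mistaken for a tag
        text = text.replaceAll("&nbsp;", " ");
        text = text.replaceAll("&lt;", "<");
        …
        // Restore the line breaks
        text = text.replaceAll("\\*_newline_\\*", "\n");
      • Additional
        // A blank line (paragraph break) becomes a sentence boundary
        text = text.replaceAll("\n[\n ]*\n", ".");
        // Replace commas with "and" to work around the JMontyLingua list problem
        text = text.replaceAll(",", " and ");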

  32. Our implementation • Information extraction: • JMontyLingua • SUT is one of the largest engineering schools in the Islamic Republic of Iran • ("be" "SUT" "one" "of largest engineering school" "in Islamic Republic" "of Iran")
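
JMontyLingua's result is shown here in a printed, parenthesized form. Assuming the tuple is available as exactly such a string (the JMontyLingua Java API itself is not shown on the slides and is not used here), the quoted fields can be recovered with a simple regular expression. The class and method names are hypothetical.

```java
import java.util.*;
import java.util.regex.*;

class TupleParser {
    // Matches one double-quoted field and captures its content.
    private static final Pattern QUOTED = Pattern.compile("\"([^\"]*)\"");

    // Splits a printed predicate-argument tuple such as ("be" "SUT" "one" ...)
    // into its fields: verb, subject, then the remaining arguments.
    static List<String> parse(String tuple) {
        List<String> fields = new ArrayList<>();
        Matcher m = QUOTED.matcher(tuple);
        while (m.find()) fields.add(m.group(1));
        return fields;
    }

    public static void main(String[] args) {
        String printed = "(\"be\" \"SUT\" \"one\" \"of largest engineering school\" "
                       + "\"in Islamic Republic\" \"of Iran\")";
        System.out.println(parse(printed));
    }
}
```

The pattern expects straight ASCII quotes; typographic quotes (“ ”) in the input would not be recognized.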

  33. Our implementation • JMontyLingua problem with comma-separated lists: • SUT has computer, mechanic and electric engineering departments • Parsed as is: ("have" "SUT" "computer mechanic and electric engineering departments") • After replacing commas with "and": ("have" "SUT" "computer and mechanic and electric engineering departments")

  34. Our implementation • ("be" "SUT" "university" "in Islamic Republic" "of Iran") • => ("be" "SUT" "university" "in Islamic Republic of Iran") • => (SUT, be, university) & (SUT, be_in, Islamic Republic of Iran) •
        <rdf:Description rdf:about="http://sharif.edu/#SUT">
          <rdf:type>university</rdf:type>
          <SHARIF:be_in rdf:resource="http://sharif.edu/#Islamic+Republic+of+Iran"/>
        </rdf:Description>
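
The two rewriting steps on this slide can be sketched as follows. This is a hypothetical reconstruction: merging a field that starts with "of " into the preceding argument, and folding a leading preposition into the predicate name (`be` + `in` gives `be_in`), reproduce the example above, but the preposition list and the method names are assumptions, not the authors' code.

```java
import java.util.*;

class TripleBuilder {
    // Hypothetical preposition list; the slide only shows "in".
    static final Set<String> PREPOSITIONS = Set.of("in", "at", "on", "from");

    // Step 1: attach "of ..." fragments to the preceding argument (never to the
    // verb or subject), e.g. "in Islamic Republic" + "of Iran".
    static List<String> mergeOf(List<String> tuple) {
        List<String> merged = new ArrayList<>();
        for (String field : tuple) {
            if (field.startsWith("of ") && merged.size() > 2) {
                int last = merged.size() - 1;
                merged.set(last, merged.get(last) + " " + field);
            } else {
                merged.add(field);
            }
        }
        return merged;
    }

    // Step 2: one (subject, predicate, object) triple per argument. A leading
    // preposition is folded into the predicate name; otherwise the bare verb is used.
    static List<String[]> toTriples(List<String> tuple) {
        List<String> t = mergeOf(tuple);
        List<String[]> triples = new ArrayList<>();
        if (t.size() < 3) return triples;             // need verb, subject and an argument
        String verb = t.get(0), subject = t.get(1);
        for (String arg : t.subList(2, t.size())) {
            String[] parts = arg.split(" ", 2);
            if (parts.length == 2 && PREPOSITIONS.contains(parts[0])) {
                triples.add(new String[] {subject, verb + "_" + parts[0], parts[1]});
            } else {
                triples.add(new String[] {subject, verb, arg});
            }
        }
        return triples;
    }

    public static void main(String[] args) {
        List<String> tuple = List.of("be", "SUT", "university", "in Islamic Republic", "of Iran");
        for (String[] triple : toTriples(tuple)) System.out.println(Arrays.toString(triple));
    }
}
```

The resulting triples can then be serialized as in the RDF fragment on this slide, where `be` is written as `rdf:type`.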

  35. Any questions?
