
Automatic Creation and Simplified Querying of Semantic Web Content

An Approach Based on Information-Extraction Ontologies. Yihong Ding, David W. Embley, and Stephen W. Liddle, Brigham Young University.





Presentation Transcript


  1. Automatic Creation and Simplified Querying of Semantic Web Content An Approach Based on Information-Extraction Ontologies Yihong Ding, David W. Embley, and Stephen W. Liddle Brigham Young University

  2. Fundamental Problems • Lack of semantic web content • Difficulty of content creation • Inability to use semantic web content easily

  3. Proposed Solutions • Automatically annotate data-rich web pages (turning them into semantic web pages) • Provide for free-form, textual queries of semantic web content

  4. A Show-Case Vision Find me the price and mileage of red Nissans – I want a 1990 or newer.

  5. Demo I: Data Extraction

  6. Demo II: Semantic Annotation

  7. Demo III: Free-Form Query

  8. Explanation: How it Works • Extraction Ontologies • Semantic Annotation • Free-Form Query Interpretation

  9. Extraction Ontologies (diagram labels) • Object sets • Relationship sets • Participation constraints • Lexical • Non-lexical • Primary object set • Aggregation • Generalization/Specialization

  10. Formalism & Extraction Ontologies (a quick side note) • Fully formalized in predicate calculus • Object set ~ 1-place predicate • N-ary relationship set ~ n-place predicate • Constraint ~ closed predicate-calculus formula • As a description logic ~ ALCN (Attributive Language with Complement and Numeric Restrictions)
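As an illustration of the mapping on slide 10, the formulas below show how one object set, one relationship set, and one participation constraint might be written. The predicate names Car, Year, and CarHasYear are hypothetical and chosen only for the example; the slides do not give the actual formulas.

```latex
% Object sets as 1-place predicates, a binary relationship set as a 2-place predicate
\[
  \mathit{Car}(x), \qquad \mathit{Year}(y), \qquad \mathit{CarHasYear}(x, y)
\]
% The relationship set only relates members of its object sets (a closed formula)
\[
  \forall x\, \forall y\, \bigl( \mathit{CarHasYear}(x, y) \rightarrow \mathit{Car}(x) \land \mathit{Year}(y) \bigr)
\]
% Participation constraint: every car has at most one year (a closed formula)
\[
  \forall x\, \bigl( \mathit{Car}(x) \rightarrow \exists^{\le 1} y\; \mathit{CarHasYear}(x, y) \bigr)
\]
```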

  11. Extraction Ontologies: Data Frame • Internal Representation: float • Values • External Rep.: \s*[$]\s*(\d{1,3})*(\.\d{2})? • Left Context: $ • Key Word Phrase • Key Words: ([Pp]rice)|([Cc]ost)| … • Operators • Operator: > • Key Words: (more\s*than)|(more\s*costly)|…
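A data frame like the one on slide 11 can be sketched in a few lines of Python. This is a minimal sketch, not the authors' implementation: the `DataFrame` class, the `extract` helper, and the sample ad are hypothetical, and the value pattern is a simplified assumption rather than the expression shown on the slide.

```python
import re
from dataclasses import dataclass


@dataclass
class DataFrame:
    """Hypothetical stand-in for one data frame of an extraction ontology."""
    name: str
    value_pattern: re.Pattern    # external representation of a value
    keyword_pattern: re.Pattern  # context keywords that support a match


# Assumed, simplified patterns (not the ones from the slide).
PRICE = DataFrame(
    name="Price",
    value_pattern=re.compile(r"\$\s*(\d+(?:,\d{3})*(?:\.\d{2})?)"),
    keyword_pattern=re.compile(r"[Pp]rice|[Cc]ost"),
)


def extract(frame, text):
    """Return every matched value, converted to the internal float form."""
    return [
        float(match.group(1).replace(",", ""))
        for match in frame.value_pattern.finditer(text)
    ]


# Invented sample ad, for illustration only.
ad = "1998 Nissan Altima, red, 89,000 miles. Price $4,500 obo."
prices = extract(PRICE, ad)
has_keyword = PRICE.keyword_pattern.search(ad) is not None
print(prices, has_keyword)
```

Keeping the recognizer declarative (patterns plus a conversion to the internal type) means adapting to a new page layout is a matter of adjusting patterns rather than rewriting extraction code.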

  12. Data-Extraction Results: Car Ads (Salt Lake Tribune)
      Attribute   Recall %   Precision %
      Year        100        100
      Make         97        100
      Model        82        100
      Mileage      90        100
      Price       100        100
      PhoneNr      94        100
      Feature      91         99
      Training set for tuning ontology: 100 • Test set: 116
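For readers less familiar with the two measures reported on slide 12, here is a minimal sketch of how recall and precision are computed from counts. The function name and the counts in the usage line are invented for illustration; they are not taken from the experiment.

```python
def recall_precision(correct, extracted, actual):
    """Compute (recall, precision) from three counts.

    correct:   values extracted that are right
    extracted: all values the system extracted
    actual:    all values really present in the documents
    """
    recall = correct / actual if actual else 0.0
    precision = correct / extracted if extracted else 0.0
    return recall, precision


# Invented counts: 97 of 100 real values found, and nothing wrong extracted.
recall, precision = recall_precision(correct=97, extracted=97, actual=100)
print(f"recall={recall:.0%} precision={precision:.0%}")
```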

  13. Car Ads: Comments • Dynamic sets • Missed: MERC, Town Car, 98 Royale • Could use lexicon of makes and models • Unspecified variation in lexical patterns • Missed: 5 speed (instead of 5 spd), p.l (instead of p.l.) • Could adjust lexical patterns • Misidentification of attributes • Classified AUTO in AUTO SALES as automatic transmission • Could adjust exceptions in lexical patterns • Typographical errors • “Chrystler”, “DODG ENeon”, “I-15566-2441” • Could look for spelling variations and common typos
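The suggestion to combine a lexicon of makes with a check for spelling variations could look something like the sketch below. It uses Python's standard `difflib` as a stand-in for whatever matching the authors had in mind; the `MAKES` list, the `normalize_make` helper, and the 0.8 cutoff are assumptions made for the example.

```python
import difflib

# Assumed sample lexicon; a real one would be far larger.
MAKES = ["Chrysler", "Dodge", "Ford", "Mercury", "Nissan", "Toyota"]


def normalize_make(token, lexicon=MAKES, cutoff=0.8):
    """Map a possibly misspelled token to a lexicon entry, or None."""
    by_lower = {make.lower(): make for make in lexicon}
    hits = difflib.get_close_matches(
        token.lower(), list(by_lower), n=1, cutoff=cutoff
    )
    return by_lower[hits[0]] if hits else None


for token in ["Chrystler", "DODG", "MERC"]:
    print(token, "->", normalize_make(token))
```

Fuzzy matching of this kind handles typos such as “Chrystler”, but a short abbreviation such as MERC falls below the cutoff, so abbreviations would still need explicit aliases in the lexicon.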

  14. General Extraction Results • ~ 20 Domains (cars, obituaries, cameras, jobs, games, prescription drugs, …) • Simple, unified domains: nearly 100% recall and precision • Complex, loosely defined domains (e.g. obituaries: 82% recall and 74% precision) • Typical: 80%+ recall and precision

  15. Generality & Resiliency of Extraction Ontologies (another quick side note) • Assumptions about web pages (generality) • Data rich • Narrow domain • Document types • Simple multiple-record documents (easiest) • Single-record documents (harder) • Records with scattered components (even harder) • Declarative (resiliency) • Still works when web pages change • Works for new, unseen pages in the same domain • Scalable, but takes work to declare the extraction ontology

  16. Semantic Annotation

  17. Free-Form Query Interpretation • Parse Free-Form Query (with data extraction ontology) • Select Ontology • Formulate Query Expression • Run Query Over Semantically Annotated Data

  18. Parse Free-Form Query “Find me the price and mileage of all red Nissans – I want a 1996 or newer” • Recognized: price, mileage, red, Nissan, 1996 • “or newer”: >= Operator
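The parsing step on slide 18 can be approximated by running keyword, value, and operator recognizers over the query text. The sketch below is an assumption about the general shape of that step, not the authors' code: `RECOGNIZERS`, `OPERATORS`, and `parse` are hypothetical names, and the patterns are deliberately tiny.

```python
import re

# Assumed recognizers; a real extraction ontology would supply these.
RECOGNIZERS = {
    "Price": {"keywords": r"\bprice\b|\bcost\b"},
    "Mileage": {"keywords": r"\bmileage\b|\bmiles\b"},
    "Color": {"values": r"\b(red|blue|white|black)\b"},
    "Make": {"values": r"\b(Nissan|Ford|Toyota)s?\b"},
    "Year": {"values": r"\b(19\d{2}|20\d{2})\b"},
}
OPERATORS = {">=": r"\bor\s+newer\b|\bat\s+least\b"}


def parse(query):
    """Return (mentioned object sets, recognized values, operators)."""
    mentioned, values = [], {}
    for name, rec in RECOGNIZERS.items():
        if "keywords" in rec and re.search(rec["keywords"], query, re.IGNORECASE):
            mentioned.append(name)
        if "values" in rec:
            match = re.search(rec["values"], query, re.IGNORECASE)
            if match:
                values[name] = match.group(1)
    operators = [
        op for op, pattern in OPERATORS.items()
        if re.search(pattern, query, re.IGNORECASE)
    ]
    return mentioned, values, operators


query = "Find me the price and mileage of all red Nissans - I want a 1996 or newer"
print(parse(query))
```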

  19. Select Ontology “Find me the price and mileage of all red Nissans – I want a 1996 or newer” Similarity value: 2 Similarity value: 5
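One plausible reading of the similarity values on slide 19 is a count of how many of an ontology's recognizers fire on the query, with the highest-scoring ontology selected. The slides do not say how the value is computed, so the sketch below is an assumption; the two ontologies and their patterns are invented, and the scores it prints should not be read as the slide's figures.

```python
import re

# Invented ontologies, each reduced to a list of recognizer patterns.
ONTOLOGIES = {
    "CarAds": [
        r"\bprice\b",
        r"\bmileage\b",
        r"\b(red|blue|white|black)\b",
        r"\b(nissan|ford|toyota)s?\b",
        r"\b(19|20)\d{2}\b",
    ],
    "RealEstate": [
        r"\bprice\b",
        r"\bbedrooms?\b",
        r"\bacres?\b",
        r"\b(19|20)\d{2}\b",
    ],
}


def similarity(query, patterns):
    """Count how many recognizer patterns match somewhere in the query."""
    return sum(1 for p in patterns if re.search(p, query, re.IGNORECASE))


def select_ontology(query, ontologies=ONTOLOGIES):
    """Return the best-scoring ontology name and all scores."""
    scores = {name: similarity(query, pats) for name, pats in ontologies.items()}
    return max(scores, key=scores.get), scores


query = "Find me the price and mileage of all red Nissans - I want a 1996 or newer"
best, scores = select_ontology(query)
print(best, scores)
```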

  20. Formulate Query Expression • Conjunctive queries and aggregate queries • Mentioned object sets are all of interest in the result. • Values and operator keywords determine conditions. • Color = “red” • Make = “Nissan” • Year >= 1996 (“or newer” maps to the >= Operator)
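The conditions on slide 20 form a conjunctive query: every condition must hold, and the mentioned object sets (price and mileage) are what gets returned. The sketch below evaluates such a query over plain Python dictionaries as a stand-in for semantically annotated data; the records, `OPS`, and `run_query` are hypothetical, and the actual system formulates its query in a different language.

```python
import operator

# Comparison operators the conditions may use.
OPS = {
    "=": operator.eq,
    ">=": operator.ge,
    "<=": operator.le,
    ">": operator.gt,
    "<": operator.lt,
}

# Conditions and result attributes derived from the example query.
conditions = [("Color", "=", "red"), ("Make", "=", "Nissan"), ("Year", ">=", 1996)]
wanted = ["Price", "Mileage"]

# Invented records standing in for annotated car ads.
records = [
    {"Make": "Nissan", "Color": "red", "Year": 1998, "Price": 4500, "Mileage": 89000},
    {"Make": "Nissan", "Color": "red", "Year": 1993, "Price": 2100, "Mileage": 140000},
    {"Make": "Ford", "Color": "red", "Year": 2001, "Price": 6200, "Mileage": 60000},
]


def run_query(records, conditions, wanted):
    """Keep records satisfying every condition; project the wanted attributes."""
    results = []
    for rec in records:
        if all(attr in rec and OPS[op](rec[attr], value)
               for attr, op, value in conditions):
            results.append({name: rec.get(name) for name in wanted})
    return results


print(run_query(records, conditions, wanted))
```

Because the query is purely conjunctive, a record missing any constrained attribute is dropped, which matches the caveat later in the deck that disjunction and negation are not handled.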

  21. Formulate Query Expression (figure: generated query expression with For, Let, Where, and Return clauses)

  22. Run Query Over Semantically Annotated Data

  23. Query Interpretation Results: Pilot Experiment with Car Ads • 15 car-ads free-form queries from 3 volunteer CS students • Results • Recognizing object sets of interest • Recall: 85% • Precision: 90% • Recognizing constraints • Recall: 61% • Precision: 79% • Problems • Regular expressions not tuned up and lexicons incomplete • Ambiguities: “Are there any Ford mustangs, 2002, that are red?” (Is 2002 a year, mileage, or price?) • Caveats • No disjunction • No negation

  24. General Query Interpretation Results • AskOntos (Pilot Experiment on 5 domains: cars, real estate, countries, movies, diamonds) • Object sets of interest recognized • Recall: 90% • Precision: 90% • Conditions recognized • Recall: 71% • Precision: 88%

  25. Pragmatics All is not rosy … • Technical problems • Extraction and query-interpretation accuracy • Execution speed • Harvesting • Crawling?! • Information behind forms on the hidden web • Social problems • Cooperation from web site developers • End-user concerns • Motivation • Trust

  26. Conclusions • Automatically create semantic-web content • Do data extraction over an ordinary web page • Create semantic-web page • Cache page • Store external semantic annotation wrt an ontology • Query semantic web pages • Free-form queries • Return results • Table • Link to original web page (scrolled and highlighted) • Pragmatic considerations www.deg.byu.edu
