
Evolution of OWL 2 QL and EL Ontologies

Evolution of OWL 2 QL and EL Ontologies. Bernardo Cuenca Grau, Ernesto Jiménez-Ruiz, Computer Science Department, University of Oxford, UK. Evgeny Kharlamov, Dmitriy Zheleznyakov, KRDB Research Centre, Free University of Bozen-Bolzano, Italy.


Presentation Transcript


  1. Evolution of OWL 2 QL and EL Ontologies • Bernardo Cuenca Grau, Ernesto Jiménez-Ruiz (Computer Science Department, University of Oxford, UK) • Evgeny Kharlamov, Dmitriy Zheleznyakov (KRDB Research Centre, Free University of Bozen-Bolzano, Italy)

  2. Outline • Ontologies and evolution • Domain ontologies • Web knowledge bases • Semantic markup • Logic-based approaches • Model-Based approaches • Formula-Based approaches • Syntactic-deductive approach • Experiments • Conclusion and directions

  3. Ontologies: schema + data • Schemas provide • standard vocabularies for data • a way to structure data • means for machines to understand data • Schemas are expressed in terms of • classes: Person, Country, ... • (binary) properties: State-of-Origin, Subclass-of, ... • Data is a collection of facts • instantiations of classes • instantiations of properties

  4. Domain ontologies • Goal: to provide standard vocabularies to communities • Clinical sciences ontologies: • SNOMED CT: Systematized Nomenclature of Medicine - Clinical Terms • > 311k concepts • NCIt: National Cancer Institute Thesaurus • ~ 89k concepts, 200m cross links between them [NCI] • FMA: Foundational Model of Anatomy • 75k classes, 168 relations, 120k terms, 3.1m relation instances

  5. Languages for domain ontologies • Domain ontologies • are complex and large • are manually created • should be error free • Languages that are natural for domain ontologies are • flexible enough to capture complex interactions • logic-based (e.g., based on Description Logics) • Web Ontology Language: OWL 2 • OWL DL • OWL 2 QL • OWL 2 EL • e.g. SNOMED: forall x: instance-of(x, Common cold) → exists y: instance-of(y, Virus) and causative-agent(y, x)

  6. Evolution in SNOMED • Development teams • 1 main team and • 4 geographically distributed teams • each team makes modifications • Every 2 weeks the main team • integrates changes and resolves conflicts • From 2002 to 2008 SNOMED went from 278k to 311k concepts [SM-1] • Example of modifications: • In Jan. 2006 a number of concepts from the “Clinical finding” hierarchy were moved to the “Event” hierarchy [SM-2]

  7. Evolution in NCI and FMA • Developers of NCI make over 900 changes per month [HKR’08] • 20 full-time editors for NCI • they work • independently • on a separate copy of the ontology • There is one curator for NCI • every 2 weeks the curator • reviews changes using a workflow management tool • approves the changes • they merge results once a month • there is one curator who curates once a month • FMA “is an evolving computer-based knowledge source ...” [FMA]

  8. Evolution of domain ontologies • Evolution of domain ontologies is common • Ontologies are changed by • insertion of axioms • deletion of axioms • Evolution affects both • schema level • data level • Evolution of domain ontologies should be error free

  9. Design errors: incoherency • Incoherency is a schema-level design error: • incoherent concept = empty concept • can be caused by disjointness and cardinality restrictions • incoherent role = empty role • can be caused by disjointness and cardinality restrictions • Example: EquivalentClasses(:Nothing ObjectIntersectionOf(:Airplane :Boat)) SubClassOf(:Amphibian :Airplane) SubClassOf(:Amphibian :Boat)
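
The Amphibian example can be checked mechanically. The following is a minimal Python sketch, not a description logic reasoner: it assumes the schema contains only atomic SubClassOf axioms and pairwise disjointness, and the names `subclass_of`, `disjoint`, `superclasses`, and `incoherent_classes` are hypothetical, chosen for illustration.

```python
# Hypothetical encoding of the slide's axioms as plain Python data.
subclass_of = {("Amphibian", "Airplane"), ("Amphibian", "Boat")}
disjoint = {frozenset({"Airplane", "Boat"})}  # Airplane and Boat share no instances


def superclasses(cls, subclass_axioms):
    """Return cls together with every class reachable via SubClassOf."""
    seen, stack = {cls}, [cls]
    while stack:
        current = stack.pop()
        for sub, sup in subclass_axioms:
            if sub == current and sup not in seen:
                seen.add(sup)
                stack.append(sup)
    return seen


def incoherent_classes(subclass_axioms, disjoint_pairs):
    """Classes subsumed by two disjoint classes, hence necessarily empty."""
    classes = {c for axiom in subclass_axioms for c in axiom}
    return {
        cls
        for cls in classes
        if any(pair <= superclasses(cls, subclass_axioms) for pair in disjoint_pairs)
    }


print(incoherent_classes(subclass_of, disjoint))  # {'Amphibian'}
```

Cardinality restrictions, which the slide also names as a source of incoherency, are outside the scope of this sketch.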

  10. Design errors: inconsistency • Inconsistency is an error that involves both • data level and • schema level • Inconsistency arises when: • disjoint concepts are instantiated by the same individual • functionality is violated • number restrictions are not respected • Example: EquivalentClasses(:Nothing ObjectIntersectionOf(:Airplane :Boat)) ClassAssertion(:Airplane :BerievA-40) ClassAssertion(:Boat :BerievA-40)
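
A small Python sketch of the first kind of inconsistency (disjoint concepts instantiated by one individual) is given below. It looks only at explicitly asserted types, which is a simplifying assumption; a real check would also use inferred types. The function name `disjointness_violations` is hypothetical.

```python
# Hypothetical encoding of the slide's example.
disjoint = {frozenset({"Airplane", "Boat"})}
class_assertions = {("Airplane", "BerievA-40"), ("Boat", "BerievA-40")}


def disjointness_violations(assertions, disjoint_pairs):
    """List (individual, disjoint pair) for individuals asserted into both classes."""
    asserted = {}
    for cls, individual in assertions:
        asserted.setdefault(individual, set()).add(cls)
    violations = []
    for individual, classes in sorted(asserted.items()):
        for pair in disjoint_pairs:
            if pair <= classes:
                violations.append((individual, tuple(sorted(pair))))
    return violations


print(disjointness_violations(class_assertions, disjoint))
# [('BerievA-40', ('Airplane', 'Boat'))]
```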

  11. Insertions bring errors • Insertions introduce errors which should be repaired • Incoherency: inserting SubClassOf(:Amphibian :Boat) into EquivalentClasses(:Nothing ObjectIntersectionOf(:Airplane :Boat)) SubClassOf(:Amphibian :Airplane) • Inconsistency: inserting ClassAssertion(:Boat :BerievA-40) into EquivalentClasses(:Nothing ObjectIntersectionOf(:Airplane :Boat)) ClassAssertion(:Airplane :BerievA-40) • Challenge: how to repair the ontology after “bad” insertions?

  12. Deletions bring headache • Deletions do not introduce (design) errors • no inconsistency • no incoherency • Contraction can provoke • restoring of implicit data • deletion of implicitly related data • Example: SubClassOf(:Airplane :Transport) ClassAssertion(:Transport :BerievA-40) SubClassOf(:Airplane :Transport) ClassAssertion(:Airplane :BerievA-40)

  13. Deletions bring headache • Deletions do not introduce (design) errors • no inconsistency • no incoherency • Contraction can provoke • restoring of implicit data • deletion of implicitly related data • Example: SubClassOf(:Airplane :Transport) ClassAssertion(:Airplane :BerievA-40) ClassAssertion(:Transport :BerievA-40) SubClassOf(:Airplane :Transport) • Challenge: how to respect implicit relations while deleting knowledge?

  14. SPARQL 1.1 Update • Proposed by HP and based on SPARUL, an extension of SPARQL for • adding • deleting • updating RDF triples • Deletion without deletion effect • only explicit occurrences of triples are deleted • there is no validation of whether the triple is still there implicitly • Example: SubClassOf(:Airplane :Transport) ClassAssertion(:Airplane :BerievA-40) ClassAssertion(:Transport :BerievA-40) SubClassOf(:Airplane :Transport) ClassAssertion(:Airplane :BerievA-40)
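
The "deletion without deletion effect" can be illustrated without a triple store. The sketch below is plain Python, not SPARQL: it assumes the only inference rule is propagation of class membership along SubClassOf, and `entailed_facts` is a hypothetical helper written for this example.

```python
# Hypothetical encoding of the slide's example.
subclass_of = {("Airplane", "Transport")}
facts = {("Airplane", "BerievA-40"), ("Transport", "BerievA-40")}


def entailed_facts(explicit_facts, subclass_axioms):
    """Saturate class assertions under SubClassOf until nothing new is derived."""
    closure = set(explicit_facts)
    changed = True
    while changed:
        changed = False
        for cls, individual in list(closure):
            for sub, sup in subclass_axioms:
                if sub == cls and (sup, individual) not in closure:
                    closure.add((sup, individual))
                    changed = True
    return closure


target = ("Transport", "BerievA-40")
after_delete = facts - {target}  # an explicit-only delete removes just this assertion
still_entailed = target in entailed_facts(after_delete, subclass_of)
print(still_entailed)  # True: the deleted fact is re-derived from the remaining axioms
```

To make the deletion effective, at least one supporting axiom (here the Airplane assertion or the SubClassOf axiom) would also have to go, which is exactly the choice that contraction approaches formalize.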

  15. Syntactic approaches to evolution • In the ontology: • “Children are baklava fans” • “Children are not cats” • To delete: “Children are baklava fans” • To this end it is enough to delete [HS’05] [JRCGHB’11] [KPSCG’06] and • In the resulting ontology: • “Children are not baklava fans” (OK) • “Children are not cats” is lost (not desirable)

  16. Semantic approaches to evolution • How to restore knowledge which • was semantically deleted and • is desirable • One has to find the semantic difference between • the original and • the obtained ontology • There are a number of approaches and tools to find the semantic difference • Collaborative Protégé • DOGMA-MESS • ContentCVS approach • .... [FDCM’08] [MDM’06] [JRCGHB’11]

  17. Limitations of current semantic approaches • Quite application- and language-oriented • Heuristic-based • What is missing: the big picture • a general understanding of the evolution of logic-based ontologies • a proper theory that explains relationships among • different types of ontology modifications • different ontology languages • feasibility and complexity of evolution computation • There are several attempts to understand logic-based evolution • We are working on that too • The 2nd part of this tutorial is about current achievements in this direction!

  18. Summary on domain ontologies • Domain ontologies are • large • logic-based • Changes in domain ontologies • are frequent • are about insertions and deletions • Insertions easily introduce errors • incoherency • inconsistency • Deletions • do not introduce (logical) errors • are not trivial: implicit knowledge relationships should be traced

  19. Outline • Ontologies and evolution • Domain ontologies • Web knowledge bases • Semantic markup • Logic-based approaches • Model-Based approaches • Formula-Based approaches • Syntactic-deductive approach • Experiments • Conclusion and directions

  20. Web knowledge bases (ontologies) • Goal: gathering general purpose knowledge from the Web • DBpedia: • structured counterpart of Wikipedia • 320 classes, 1,650 different properties, 19m facts • Yago: • combines Wikipedia, WordNet, and GeoNames • 10m entities, 120m facts about them • (Open)Cyc: • started in 1984, formalizing knowledge manually • logic-based KB with reasoning • 47,000 concepts, 306,000 facts • These ontologies are not static • they constantly change, since Wikipedia does so • Yago crawls Wikipedia every couple of weeks ...

  21. Languages for Web KBs • Web KBs • have rather simple and small schemas • should be error free • errors are rare • Languages that are natural for Web KBs • are able to describe basic things • SubClassOf, Domain, Range, etc. • These languages are: • Resource Description Framework with Schema: RDF and RDFS • a bit of OWL 2: owl:equivalentClass • some rule languages: OWL 2 RL • Evolution is performed ad hoc • each KB has its own approach

  22. Evolution in DBpedia • DBpedia • 18 functional properties • new information is obtained from Wikipedia • new data can violate functional properties • Inconsistency is possible • Example: FunctionalObjectProperty(:netIncome) FunctionalObjectProperty(:co2Emission) FunctionalObjectProperty(:height) ...
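
A violation of a functional property can be detected by grouping values per subject and property. The following Python sketch assumes that distinct values denote distinct things (for example, different literals); the sample triples and the `ExampleCorp` entity are invented for illustration and are not DBpedia data.

```python
from collections import defaultdict

# Three of the functional properties named on the slide.
functional = {"netIncome", "co2Emission", "height"}


def functional_violations(triples, functional_properties):
    """Map (subject, property) to its values when a functional property has several."""
    values = defaultdict(set)
    for subject, prop, value in triples:
        if prop in functional_properties:
            values[(subject, prop)].add(value)
    return {key: vals for key, vals in values.items() if len(vals) > 1}


# Hypothetical data: a fresh extraction adds a second netIncome value.
triples = [
    ("ExampleCorp", "netIncome", "1000000"),
    ("ExampleCorp", "netIncome", "1200000"),
    ("ExampleCorp", "label", "Example Corp"),
    ("ExampleCorp", "label", "Example Corporation"),  # label is not functional
]
print(functional_violations(triples, functional))
```

Only the `netIncome` pair is reported; the two labels are fine because `label` is not declared functional.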

  23. Evolution in Yago • Yago is a clean (inconsistency-free) ontology • 95% accuracy, manually validated on 6k facts • New knowledge should not cause contradictions

  24. Yago consistency check [Yago-1] • Yago has rules to check consistency • check uniqueness of entities and functional arguments • domains and ranges of relations • type checking • [Figure: example graph with the classes Singer, Guitarist, Guitar, and Rock Singer connected by subclassOf edges, plus type and born edges involving Physics and 1935]
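
Type checking against relation domains and ranges can be sketched in a few lines of Python. This is an illustration of the idea only, not Yago's actual rule set: the class hierarchy, the entity `Elvis`, the class `ScienceField`, and the signature of `born` are all assumptions made for the example.

```python
# Hypothetical schema and data.
subclass_of = {("Singer", "Person"), ("Guitarist", "Person")}
entity_types = {"Elvis": {"Singer"}, "1935": {"Year"}, "Physics": {"ScienceField"}}
signatures = {"born": ("Person", "Year")}  # relation -> (domain, range)


def all_types(entity):
    """Asserted types of an entity plus all their superclasses."""
    found = set(entity_types.get(entity, ()))
    frontier = list(found)
    while frontier:
        current = frontier.pop()
        for sub, sup in subclass_of:
            if sub == current and sup not in found:
                found.add(sup)
                frontier.append(sup)
    return found


def type_checks(subject, relation, obj):
    """Accept a candidate fact only if subject and object fit domain and range."""
    domain, range_ = signatures[relation]
    return domain in all_types(subject) and range_ in all_types(obj)


print(type_checks("Elvis", "born", "1935"))     # True
print(type_checks("Elvis", "born", "Physics"))  # False: Physics is not a Year
```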

  25. Summary on Web KBs • Web KBs aim at consistency • Schemas of Web KBs are rather simple and small • it is hard to make errors • Evolution is performed ad hoc

  26. Outline • Ontologies and evolution • Domain ontologies • Web knowledge bases • Semantic markup • Logic-based approaches • Model-Based approaches • Formula-Based approaches • Syntactic-deductive approach • Experiments • Conclusion and directions

  27. Ontologies for semantic markup • Goal: • to nest semantics within existing content on web pages • to help search engines, crawlers and browsers find the right data • Example: Person • name • photo • URL • ... • [Figure: text embedding semantic annotations]

  28. Standards for semantic markup • Microformats, since 2003 • Small set of fixed formats. E.g.: • hCard: people, companies, organizations, and places • XFN: relationships between people • hCalendar: calendaring and events • RDFa: Resource Description Framework in attributes • since 2004, W3C recommendation • serialization format for embedding RDF data into HTML pages • can be used together with any vocabulary, e.g. FOAF • Microdata • alternative technique for embedding structured data • proposed in 2009, comes with HTML 5
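
To make the idea of embedded markup concrete, the sketch below extracts Microdata `itemprop` values from an HTML fragment using Python's standard `html.parser` module. The HTML fragment, the person, and the `ItempropCollector` class are invented for this example, and the extraction logic is deliberately simplified compared with the full Microdata processing rules (no nested items, no multiple values).

```python
from html.parser import HTMLParser

# Hypothetical page fragment annotated with Schema.org Person properties.
PAGE = """<div itemscope itemtype="http://schema.org/Person">
  <span itemprop="name">Jane Doe</span>
  <img itemprop="image" src="jane.jpg" alt="">
  <a itemprop="url" href="http://example.org/jane">homepage</a>
</div>"""


class ItempropCollector(HTMLParser):
    """Collect one value per itemprop: an attribute value or the element text."""

    def __init__(self):
        super().__init__()
        self.properties = {}
        self._pending = None  # itemprop whose value is the upcoming text

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        prop = attributes.get("itemprop")
        if prop is None:
            return
        # Links, images and meta elements carry the value in an attribute.
        for name in ("href", "src", "content"):
            if name in attributes:
                self.properties[prop] = attributes[name]
                return
        self._pending = prop

    def handle_data(self, data):
        if self._pending is not None and data.strip():
            self.properties[self._pending] = data.strip()
            self._pending = None


collector = ItempropCollector()
collector.feed(PAGE)
print(collector.properties)
```

The visible page text is unchanged by the annotations; a crawler that understands the vocabulary recovers the name, image, and URL as structured data.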

  29. Is semantic markup popular? [CB’12] • Yahoo crawl of 2011 • 12 billion pages were crawled • 431 million of them contain RDFa • In 2011, 3.5% of the HTML pages had structured (meta) data

  30. Big step in promoting ontologies • Schema.org initiative: • started in June 2011 • initiated by Bing, Google, Yahoo!, Yandex • they propose: to mark up / annotate websites with metadata • they support: Microdata

  31. Schema.org ontologies • Metadata by Schema.org: • Person • Organization • Event • Place • Product • ... • 200+ types

  32. Where can you see Schema.org impact?

  33. Semantic markup today • Common Crawl foundation • goal: building and maintaining an open crawl of the Web • current data is about 5 billion web pages • WebDataCommons.org project • goal: extracting Microformats, Microdata, and RDFa from the Common Crawl corpus • Feb 2012: • processed 1.4 billion HTML pages of the CC corpus • 20.9 terabytes of compressed data • this is a big fraction of the Web

  34. Structured Web data is fast growing • 1.4 billion HTML pages processed • 188 million of them contain structured data in Microformats, Microdata, or RDFa [CB’12] • This data amounts to 3.2 billion RDF triples • 13% of the HTML pages contain structured (meta) data • From 2011 to 2012 the fraction of structured data went from 3.5% to 13%
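
The percentages follow directly from the crawl counts quoted in the slides. The short Python calculation below recomputes them; the triples-per-page average is derived here and is not a figure reported in the slides.

```python
# Counts quoted in the slides.
pages_2011, rdfa_pages_2011 = 12_000_000_000, 431_000_000
pages_2012, structured_pages_2012 = 1_400_000_000, 188_000_000
triples_2012 = 3_200_000_000

share_2011 = rdfa_pages_2011 / pages_2011            # about 0.036
share_2012 = structured_pages_2012 / pages_2012      # about 0.134
triples_per_page = triples_2012 / structured_pages_2012  # derived average

print(f"2011: {share_2011:.1%}, 2012: {share_2012:.1%}")
print(f"average triples per annotated page: {triples_per_page:.0f}")
```

The 2011 ratio comes out at roughly 3.6%, which the slides quote as 3.5%. The two crawls also differ in size and in which formats were counted, so the growth figure is indicative rather than exact.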

  35. Evolution at schema level: Schema.org • It is a very simple and coherent schema • Coherency • basic Schema.org vocabulary can be mapped to RDFS • RDFS schemas are always coherent, and so is Schema.org • What is used from RDFS: [SO-2] • subclass • domain, range restriction of properties • literal • ... • Schema can be extended • mechanism: specialization • of classes, properties, enums • Person/Engineer [SO-3] • Example class: PoliceStation (A police station.) • Subclass of: CivicStructure • Subclass of: EmergencyService • Example property: creator (The creator/author of this Creative Work.) • Domain: CreativeWork • Domain: UserComments • Range: Person • Range: Organization

  36. Evolution at data level: Schema.org • It is RDFS embeddable, so no inconsistency is possible • Schema.org convention on range restrictions [SO-1] • each property may have 1 or more types as its range • the value(s) of the property should be instances of at least one of these types • Thus, they accept that data can be inconsistent
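
The range convention and the tolerant attitude toward violations can be sketched as follows. This Python fragment is an illustration under assumptions: the `expected_types` table covers only the `creator` property from the preceding slide, and the statements, the `conforms` check, and the `ingest` policy are hypothetical, not Schema.org tooling.

```python
# Expected range types of the creator property, as listed in the slides.
expected_types = {"creator": {"Person", "Organization"}}


def conforms(prop, value_types):
    """True if the value is an instance of at least one expected range type."""
    return bool(expected_types[prop] & set(value_types))


def ingest(statements):
    """Tolerant policy: keep every statement, but report the nonconforming ones."""
    accepted, warnings = [], []
    for subject, prop, value, value_types in statements:
        accepted.append((subject, prop, value))
        if not conforms(prop, value_types):
            warnings.append((subject, prop, value))
    return accepted, warnings


# Hypothetical markup: the second creator is given as plain text.
statements = [
    ("post1", "creator", "alice", {"Person"}),
    ("post2", "creator", "Some Name", {"Text"}),
]
accepted, warnings = ingest(statements)
print(len(accepted), warnings)
```

Nothing is rejected: the second statement is kept and merely flagged, which mirrors accepting markup that does not respect the declared range.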

  37. Evolution at data level: Schema.org • Is data inconsistency important? • Data gathered by crawling the Web is inconsistent by nature • data consistency is not important • data consistency is unrealistic • Data maintained locally can be consistent • consistency of data can be important • [SO-1]: In the spirit of "some data is better than none", we will accept this [inconsistent] markup and do the best we can.

  38. Summary on semantic markup • Semantic markup schemas are • small • very simple • In many cases logical errors with semantic markup are simply impossible • Consistency and coherency are in general not important

  39. Outline • Ontologies and evolution • Domain ontologies • Web knowledge bases • Semantic markup • Logic-based approaches • Model-Based approaches • Formula-Based approaches • Syntactic-deductive approach • Experiments • Conclusion and directions

  40. Summary: ontologies and evolution • Three major groups of ontologies • domain ontologies: unification of terminology by specific communities • Web knowledge bases: storing general purpose web content • ontologies for semantic markup: enriching Web content with information understandable by agents, e.g. crawlers (13% of Web data is enriched!) • In all these cases ontologies are dynamic • insertions and • deletions happen at the level of • schema and • data

  41. Summary: attitude to evolution • Ontologies for semantic markup (the “don’t care” end of the spectrum) • schema is simple (RDFS): errors are (almost) impossible • data may disrespect the schema • “some data is better than none” • “do the best we can” • Web knowledge bases (in between) • schema is more involved but still no incoherency (RDFS + some OWL, e.g., functionality) • data may be inconsistent • conflicts can be detected by simple reasoning • many problems are solved by type checking • Domain ontologies (the logic-based end of the spectrum) • schema is complex (OWL 2): incoherency is possible • data can easily be inconsistent • coherency + consistency are vital • logical reasoning can guarantee it

  42. Outline • Ontologies and evolution • Domain ontologies • Web knowledge bases • Semantic markup • Logic-based approaches • Model-Based approaches • Formula-Based approaches • Syntactic-deductive approach • Experiments • Conclusion and directions

  43. Logic-Based Evolution • The main principle of logic-based evolution is the principle of minimal change • Ontologies should change as little as possible • There are two main classes of logic-based approaches: • Model-based approach (MBA) • Formula-based approach (FBA) • There are two main types of evolution: • Update (or revision), when new information is added • Contraction (or erasure), when some old information is retracted • We illustrate • update with MBA • contraction with FBA [Wins’90] [Wins’90] [WWT’10] [KZ’11] [QD’09] [LLMW’06] [DGLPR’09] [CKNZ’10] [EG’92] [KM’91]

  44. Outline • Ontologies and evolution • Domain ontologies • Web knowledge bases • Semantic markup • Logic-based approaches • Model-Based approaches • Formula-Based approaches • Syntactic-deductive approach • Experiments • Conclusion and directions

  45. MBA: Evolution Process • Ontology • Models • Model transformer • New data • Evolved ontology • Evolved models

  46. MBA: Ontology to Models • [Figure: the models of the ontology, Model 1, Model 2, …]

  47. MBA: Evolution Process • Ontology • Models • Model transformer • New data • Evolved ontology • Evolved models

  48. MBA: Data Evolution • [Figure: Model 1, Model 2, …] • Winslett’s operator • Dalal’s operator • Satoh’s operator • …

  49. MBA: Data Evolution • Winslett’s operator • [Figure: Model 1 and Model 2 with candidate evolved models, including Model 2.1 and Model 2.2, each marked ✔ or ✘]
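
Winslett's operator can be sketched in a propositional setting, which is a strong simplification of the description logic models used in the slides. For each model of the old knowledge, the operator keeps those models of the new information whose difference from it is minimal with respect to set inclusion, and the result is the union over all old models. In the Python sketch below a model is the set of atoms it makes true; the atoms `p` and `q` and all function names are assumptions made for this example.

```python
from itertools import combinations


def interpretations(atoms):
    """All truth assignments over atoms, each given as the set of true atoms."""
    atoms = sorted(atoms)
    for size in range(len(atoms) + 1):
        for subset in combinations(atoms, size):
            yield frozenset(subset)


def models(atoms, constraint):
    """Interpretations that satisfy a constraint given as a Python predicate."""
    return [i for i in interpretations(atoms) if constraint(i)]


def winslett_update(old_models, new_models):
    """For each old model keep the new models at inclusion-minimal distance."""
    result = set()
    for old in old_models:
        diffs = {new: old ^ new for new in new_models}  # symmetric differences
        for new, diff in diffs.items():
            if not any(other < diff for other in diffs.values()):
                result.add(new)
    return result


atoms = {"p", "q"}
old = models(atoms, lambda m: ("p" in m) != ("q" in m))  # exactly one of p, q
new = models(atoms, lambda m: "p" in m)                  # new information: p holds
print(winslett_update(old, new))
```

The old model {p} already satisfies the new information and is kept, while the old model {q} is changed minimally into {p, q}. The evolved knowledge therefore no longer decides q, because each old model is treated separately rather than the old theory as a whole.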

  50. MBA: Evolution Process • Ontology • Models • Model transformer • New data • Evolved ontology • Evolved models
