Data and Knowledge Evolution

Giorgos Flouris

Open Data Tutorials, May 2013

Slides available at: http://www.ics.forth.gr/~fgeo/Publications/WOD13.ppt


World Wide Web

  • WWW (and HTML) focus on human readability

    • Page presentation (fonts, colors, images, …)

    • Human understanding

    • Presentation ≠ Semantical content

    • Content is not formally described (for a machine to understand)

  • WWW contains documents, not data


Problems with the Current Web

  • Search and access becomes difficult

    • Software ignorant of the semantical content of a web page

    • Keyword search

    • High recall, low precision

  • Terminological issues

    • Synonyms (heart disease = cardiac disease)

    • Hyponyms/hypernyms (parliament members are politicians)

  • Queries on the semantical content cannot be made

    • Fetch articles that support B. Obama’s foreign policy

    • Fetch the home pages of all members of the Greek Parliament


Semantic Web

  • The Semantic Web is an extension of the current web in which information is given well-defined meaning, better enabling computers and people to work in cooperation [BLHL01]

  • The Semantic Web provides a common framework that allows data to be shared and reused across application, enterprise, and community boundaries (http://www.w3.org/2001/sw/)

  • [Semantic Web] is a collaborative effort led by W3C with participation from a large number of researchers and industrial partners (http://www.w3.org/2001/sw/)


Semantic Web in Practice

  • Web of data, rather than documents

    • HTML for presentation

    • Semantical languages for semantical content

    • Readable and understandable by humans and machines

  • Semantic Web languages, protocols, etc

    • Web page annotation (metadata descriptions etc)

    • Publication of data on the Internet

    • Efficient communication and manipulation of data over the Internet

  • Different applications

    • Efficient searching

    • Sharing of data (e-science, e-government, remote learning, …)

    • Linked Open Data (more on that later)


Ontologies and Data (Datasets)

  • An ontology is an explicit specification of a shared conceptualization of a domain [Gru93]

    • Precise, logical account of the intended meaning of terms

    • Common (shared) interpretation of terms

    • Formal vocabulary for information exchange (humans/machines)

  • Ontologies (vocabularies) allow the description of data

  • Terminology:

    • Ontology = vocabulary = schema

    • Data = instances

    • Dataset = data and the related ontology (i.e., a dataset may contain schema and/or data)


Dataset Dynamics

  • Datasets change constantly

    • World changes (dynamic models)

    • View on the world changes (new knowledge, measurements, etc)

    • Perspective and usage changes

  • Example:

    • Gene Ontology (information about gene products): daily versions

    • DBPedia: 1.4 updates/second (http://live.dbpedia.org/LiveStats/) [MLA+12]

  • Need methodologies to cope with the problems related to dynamicity

    • Evolution (modify a dataset in response to a change)

    • Versioning (keep track of versions and their relations)

    • Debugging, cleaning, repairing, quality (maintain consistency and quality in a dynamic environment)

    • Change monitoring, detection and propagation (identify changes and use them to synchronize remote datasets)


Linked (Open) Data

  • Datasets can be interlinked

    • Sharing knowledge

    • Reusing knowledge

    • Modular development

    • Reuse of schemas

  • Linked Open Data (LOD) movement

    • Constantly growing

    • 31 billion triples and 295 datasets as of September 2011


Linked Open Data Cloud Diagram

[Figure: Linking Open Data cloud diagram, by Richard Cyganiak and Anja Jentzsch. http://lod-cloud.net/]


Linked Open Data Challenges

  • Both a blessing and a curse

    • Added-value benefits

    • Discovery of unknown correlations, connections, relationships

    • Vast amount of interrelated knowledge

    • No central control, everyone can publish and relate to others

    • Quality of datasets depends on the different providers

    • A change in one dataset affects all related ones

  • Several new problems related to dynamics

    • Propagation of changes among interrelated datasets

    • Maintaining the quality of local datasets

    • Co-evolution


Scope: Dynamic Linked Datasets

[Figure: diagram relating Dynamic Datasets and Linked Datasets, with a "You are here" marker]


Purpose of This Talk

  • To survey different research areas related to dynamic LOD

    • Remote Change Management

    • Repair

    • Data and Knowledge Evolution

  • Categorize and classify works in each field

    • Broad but shallow description

    • Several references for more in-depth study

    • No claims of completeness (references are just indicative)

    • Two relevant surveys: [FMK+08, ZAA+13]

  • Emphasis on some related work done in FORTH

    • Will avoid technical discussion

    • References will be given for further details


Defining Remote Change Management

  • Managing the effects of remote changes on interlinked datasets

    • Remote changes have profound effects on local datasets

    • Good practices are important

      • Proper versioning, change logging, adaptation to remote changes, …

    • Attention exploded after the success of the LOD paradigm

  • Related research questions

    • How should I version my data?

    • How can I efficiently monitor changes in my dataset?

    • How can I detect changes in remote datasets?

    • How does the evolution of remote datasets affect my data?

    • How can I efficiently propagate changes from one dataset to another?


Remote Change Management: Visualization

[Figure: a Remote Site with dataset versions RD0 and RD1 and a Local Site with dataset versions LD0 and LD1; labeled steps: Versioning, Change Monitoring, Change Detection, Change Propagation]


Remote Change Management: Structure

  • Three subfields

    • Versioning

    • Change monitoring and detection

    • Change propagation

  • Structure

    • Introduction, definition of subfields

    • Literature review

    • An approach for change detection [PFF+13]


Defining Repair

  • Assessing and improving the quality and the semantical or structural integrity of the data

    • Maintaining consistency, coherency, validity

    • Restoring consistency, coherency, validity, when violated

    • Assessing and improving quality

    • Preserve quality/integrity in the face of remote changes

  • Related research questions

    • How can I preserve the integrity and quality of my data in a dynamic and interlinked environment?

    • How can I guarantee consistency and validity?

    • How can I restore consistency and validity, if violated?


Repair: Visualization

[Figure: datasets D0 and D1; Repair Process (Cleaning, Debugging, Repairing, Quality Enhancement); Assessment Module (Diagnosis, Quality Assessment)]


Repair: Structure

  • Four subfields

    • Cleaning

    • Debugging

    • Validity repair

    • Quality enhancement

  • Structure

    • Introduction, definition of subfields

    • Literature review

    • An approach for validity repair [RFC11]


Defining Evolution

  • Modifying a dataset in response to a change in the domain or its conceptualization

    • Identify the result of applying new information on the dataset

    • Determine the result of change propagation from remote datasets

    • Understand the process of change

  • Related research questions

    • What is the semantics of evolution and change?

    • How can I efficiently compute the ideal evolution result?


Evolution: Visualization

[Figure: datasets D0 and D1; Real World; Evolution Algorithm; Dataset; sample changes: Delete_Class(…), Pull_Up_Class(…), Rename_Class(…), …]


Evolution: Summary

  • Evolution topics

    • Understanding the evolution challenges

    • Understanding the process of change

      • Balancing between philosophical and practical considerations

    • Cross-fertilization with belief change

  • Structure

    • Introduction, connection with belief change

    • Understanding the process of change

    • Literature review


General Structure of this Talk

  • Introduction to RDF/S, DLs, OWL

  • Remote change management

    • Introduction, definition of subfields

    • Literature review

    • An approach for change detection [PFF+13]

  • Repair

    • Introduction, definition of subfields

    • Literature review

    • An approach for validity repair [RFC11]

  • Data and Knowledge Evolution

    • Introduction, connection with belief change

    • Understanding the process of change

    • Literature review

The final few slides contain citations for the references in this talk

Part I (2 hours)

Part II (1 hour)


Talk Structure (A)

  • Introduction to RDF/S, DLs, OWL

  • Remote change management

    • Introduction, definition of subfields

    • Literature review

    • An approach for change detection [PFF+13]

  • Repair

    • Introduction, definition of subfields

    • Literature review

    • An approach for validity repair [RFC11]

  • Data and Knowledge Evolution

    • Introduction, connection with belief change

    • Understanding the process of change

    • Literature review


Datasets

  • Basic structures

    • Classes (or concepts): collections of objects (e.g., Actor, Politician)

    • Properties (or roles): binary relationships between objects (e.g., started_on, member_of)

    • Instances (or individuals): objects (e.g., Giorgos, B. Obama)

  • Relations between them

    • Subsumption (Parliament_Member subclass of Politician), instantiation (B. Obama instance of Politician), …

    • The allowed relations and their semantics depend on the language

  • Different representation languages for LOD

    • RDF/S, OWL


Visualization, Triples, Serialization

Triple Representation

Define classes
[Period type Class]

Define properties
[participants type Property]
[participants domain Onset]
[participants range Actor]

Instantiate/define individuals
[G_Birth type Birth]
[Giorgos type Actor]
[G_Birth participants Giorgos]

Define hierarchies
[Event subClass Period]

Serialization (RDF/XML)

<rdfs:Class rdf:ID="Period">
</rdfs:Class>

<rdf:Property rdf:ID="participants">
  <rdfs:domain rdf:resource="#Onset"/>
  <rdfs:range rdf:resource="#Actor"/>
</rdf:Property>

<Birth rdf:about="#G_Birth">
  <participants>
    <Actor rdf:about="#Giorgos"/>
  </participants>
</Birth>

<rdfs:Class rdf:ID="Event">
  <rdfs:subClassOf rdf:resource="#Period"/>
</rdfs:Class>

Visualization

[Figure: graph with classes Period, Event, Onset, Birth, Actor, Existing and Stuff, properties participants and started_on, and individuals G_Birth and Giorgos; edges denote subsumption and instantiation]
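As a rough illustration (not part of the original slides), the triple representation can be held in an ordinary Python set of tuples. The helper name `instances_of` is a hypothetical choice, and it only consults explicit triples, with no inference.

```python
# Sketch: the slide's example dataset as a set of (subject, predicate, object) triples.
dataset = {
    ("Period", "type", "Class"),
    ("participants", "type", "Property"),
    ("participants", "domain", "Onset"),
    ("participants", "range", "Actor"),
    ("G_Birth", "type", "Birth"),
    ("Giorgos", "type", "Actor"),
    ("G_Birth", "participants", "Giorgos"),
    ("Event", "subClass", "Period"),
}


def instances_of(triples, cls):
    """Hypothetical helper: subjects explicitly typed with `cls` (no inference)."""
    return {s for (s, p, o) in triples if p == "type" and o == cls}


print(instances_of(dataset, "Actor"))
```

Because a dataset is just a set here, adding or removing a statement is a set operation, which is the view the later slides on deltas rely on.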


RDF and RDFS

  • An RDF dataset consists of triples

  • RDFS adds semantics

    • Subsumption hierarchies (classes and properties)

      • Transitive

    • Instantiation

      • Inheritance, implicit instantiation

  • Sometimes more than subsumption/instantiation is needed

    • Combining concepts, roles to form more complex relations

      • Concept definitions: a mother is a female who has a child

      • Other knowledge: all items stored in warehouse X are flammable

    • Constraints on data

      • Each person must have one mother
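The two RDFS features named above (transitive subsumption, and implicit instantiation through inheritance) can be sketched as a naive fixpoint computation. This is only an illustration under simplifying assumptions: triples are plain tuples, the predicate names `subclass` and `type` follow the slides' informal notation, and the function name `rdfs_closure` is hypothetical. It is not a complete RDFS reasoner.

```python
def rdfs_closure(triples):
    """Naive fixpoint over two rules: subclass transitivity and type inheritance."""
    closed = set(triples)
    while True:
        sub = {(s, o) for (s, p, o) in closed if p == "subclass"}
        new = set()
        # Rule 1: A subclass B and B subclass C  =>  A subclass C
        for (a, b) in sub:
            for (c, d) in sub:
                if b == c:
                    new.add((a, "subclass", d))
        # Rule 2: x type A and A subclass B  =>  x type B
        for (x, p, c) in closed:
            if p == "type":
                for (a, b) in sub:
                    if a == c:
                        new.add((x, "type", b))
        if new <= closed:  # nothing new was derived: fixpoint reached
            return closed
        closed |= new


explicit = {
    ("Birth", "subclass", "Onset"),
    ("Onset", "subclass", "Event"),
    ("Event", "subclass", "Period"),
    ("G_Birth", "type", "Birth"),
}
inferred = rdfs_closure(explicit)
print(("G_Birth", "type", "Period") in inferred)
```

The quadratic loops are chosen for readability; a real store would index triples by predicate and subject.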


Extensions of RDF/S: DLs (1/2)

  • Description Logics (DLs)

    • http://dl.kr.org/

    • Formal underpinning of web representation languages

    • Family of logical formalisms

      • Well-defined semantics

      • Model-theoretic reasoning based on interpretations

    • Formally studied

      • Expressiveness, reasoning tools, computational complexity, …

  • Components

    • Individuals: specific objects (instances) – Giorgos

    • Concepts: sets of individuals (classes) – Parent

    • Roles: sets of pairs of individuals (properties) – has_child

  • Operators: ∃, ⊓, {.}, ⊤, …

  • Connectives: ⊑, ≡, …


Extensions of RDF/S: DLs (2/2)

  • Definitions, partial definitions, constraints, subsumptions, …

    • A mother is a female who has a child

      • Mother ≡ ∃has_child ⊓ Female

    • Each person must have one mother

      • Person ⊑ ∃has_child⁻¹.Mother

  • A great variety of DLs (trade-off involved)

    • Different properties

    • Different expressive power

    • Different reasoning complexity


Extensions of RDF/S: OWL

  • OWL (Web Ontology Language)

    • http://www.w3.org/2004/OWL/

    • General-purpose representation language

    • Compatible with the architecture of the Semantic Web

  • A family of languages

    • Flavors: OWL-Lite, OWL-DL, OWL Full

    • Profiles: OWL 2 EL, OWL 2 QL, OWL 2 RL

    • Different expressiveness (and complexity)

  • Each corresponds to a specific DL

    • Useful from a modeling perspective

    • Expressive but not too complex

    • Appealing computationally


Representation Languages in LOD

  • Mostly RDF

    • With RDFS semantics

      • Instantiations

      • Class subsumption

      • Property subsumption is rare

  • Some OWL

    • Mostly OWL Lite

    • Extensive use of owl:sameAs

      • Often abusing it [HHM+10]

    • OWL 2 profiles are gaining ground


Talk Structure (B1)

  • Introduction to RDF/S, DLs, OWL

  • Remote change management

    • Introduction, definition of subfields

    • Literature review

    • An approach for change detection [PFF+13]

  • Repair

    • Introduction, definition of subfields

    • Literature review

    • An approach for validity repair [RFC11]

  • Data and Knowledge Evolution

    • Introduction, connection with belief change

    • Understanding the process of change

    • Literature review


Motivation for Remote Change Management

[Figure: dataset DL uses dataset DR]

  • Crucial problem for dynamic linked datasets

    • Linking: datasets linked to other datasets (e.g., vocabularies)

    • Dynamics: changes cause problems to linked datasets

    • No central curation or control

      • No control over (or knowledge of) other datasets’ evolution process

    • Curators don’t bother annotating and logging changes

      • Temporal and versioning information is usually missing [RPH+12]

  • Remote change management seeks solutions to allow:

    • Keeping track of versions

    • Restoring previous versions

    • Assessing compatibility of versions

    • Monitoring and detecting changes

    • Tracing back the evolution history (of datasets, concepts, …)

      • For visualization and understanding

    • Propagating changes to synchronize linked datasets


Subfields of Remote Change Management

  • Remote Change Management

    • Versioning

      • Keep track of versions

    • Change monitoring and detection

      • Monitoring: record changes as they happen

      • Detection: identify changes after they happen

    • Change propagation

      • Propagate changes across linked datasets for synchronization purposes


Versioning

  • Versioning

    • Keep track of versions

    • Identify different versions of a dataset

    • Enable transparent access to the “correct” version (smooth interoperation)

  • Issues involved

    • Identification

      • Determine which versions to store and how to identify them

      • Manually or automatically (syntactical, semantical considerations)

      • Packaging of changes

    • Relation between versions

      • A sequence or a tree

    • Compatibility information

      • Backwards/forwards compatibility and how to determine it (often manually)

      • Dataset-wide compatibility or fine-grained compatibility (e.g., at resource level)

      • Metadata on the different versions

    • Transparent access

      • Relate versions with (compatible) data sources, applications etc


Change Monitoring and Detection

[Figure: dataset DL uses dataset DR]

  • Change monitoring

    • Record changes as they happen

      • Manual (error-prone and often incorrect)

      • Automatic (not used in practice)

    • Depends on the goodwill of the dataset owner

    • Sometimes change logs are inaccessible

  • Change detection

    • Identify changes after they happen

    • Based on the previous and current versions

  • In both cases, a change language is required

    • Supported set of changes, along with their semantics

    • Can be low-level or high-level
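The difference between monitoring and detection can be sketched as follows. This is a minimal illustration, assuming a low-level change language with only `Add` and `Delete`; the class `MonitoredDataset` and the function `detect` are hypothetical names, not an existing tool.

```python
class MonitoredDataset:
    """Sketch of change monitoring: every effective update is logged as it happens."""

    def __init__(self, triples=()):
        self.triples = set(triples)
        self.log = []  # ordered list of ("Add" | "Delete", triple)

    def add(self, triple):
        if triple not in self.triples:
            self.triples.add(triple)
            self.log.append(("Add", triple))

    def delete(self, triple):
        if triple in self.triples:
            self.triples.remove(triple)
            self.log.append(("Delete", triple))


def detect(previous, current):
    """Sketch of change detection: compare two versions after the fact."""
    return current - previous, previous - current


t1 = ("Giorgos", "type", "Actor")
t2 = ("G_Birth", "participants", "Giorgos")

ds = MonitoredDataset({t1})
before = set(ds.triples)
ds.add(t2)      # logged
ds.delete(t2)   # logged
added, deleted = detect(before, ds.triples)
print(len(ds.log), len(added), len(deleted))
```

The log keeps the intermediate add and delete, while detection over the two snapshots sees no change at all: monitoring records history, detection only reconstructs a net difference.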


Change Propagation

  • Change propagation

    • Communicate changes to linked datasets for synchronization

  • Push-based or pull-based propagation

    • Push-based: provider-initiated, via “registration” or via monitoring and versioning

    • Pull-based: consumer-initiated

  • Communication based on deltas (rather than versions)

    • Reduce communication overhead

    • Reduce storage requirements

    • On average, 2-3% of a dataset changes between versions [OK02]

  • Deltas are based on a language of changes
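Delta-based propagation can be sketched as below, assuming a low-level change language (sets of added and deleted triples) and a local copy that starts out identical to the old remote version. The function names `compute_delta` and `apply_delta` are hypothetical, and the sample triples are adapted from the running example in these slides.

```python
def compute_delta(old, new):
    """Low-level delta between two versions: (added triples, deleted triples)."""
    return new - old, old - new


def apply_delta(local, added, deleted):
    """Synchronize a local copy by removing deleted triples and adding new ones."""
    return (local - deleted) | added


remote_v1 = {
    ("Existing", "type", "Class"),
    ("Stuff", "subclass", "Existing"),
    ("started_on", "domain", "Existing"),
    ("Giorgos", "type", "Actor"),
    ("participants", "range", "Actor"),
}
remote_v2 = {
    ("Persistent", "type", "Class"),
    ("Stuff", "subclass", "Persistent"),
    ("started_on", "domain", "Persistent"),
    ("Giorgos", "type", "Actor"),
    ("participants", "range", "Actor"),
}

local_copy = set(remote_v1)                       # consumer holds the old version
added, deleted = compute_delta(remote_v1, remote_v2)
local_copy = apply_delta(local_copy, added, deleted)
print(local_copy == remote_v2)
```

Only the delta has to travel between sites. Whether the provider sends it (push) or the consumer asks for it (pull) does not change the application step.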


Talk Structure (B2)

  • Introduction to RDF/S, DLs, OWL

  • Remote change management

    • Introduction, definition of subfields

    • Literature review

    • An approach for change detection [PFF+13]

  • Repair

    • Introduction, definition of subfields

    • Literature review

    • An approach for validity repair [RFC11]

  • Data and Knowledge Evolution

    • Introduction, connection with belief change

    • Understanding the process of change

    • Literature review


Versioning Approaches (1/3)

  • Capture different aspects of versioning, such as:

    • Detecting versions

    • Storing versions efficiently

    • Allowing cross-snapshot queries

      • Find gene products whose functions have not changed in the last 50 versions

      • Determine price fluctuation for x along different versions of the product catalog

  • Early versioning approaches inspired by SVN

    • Good for files, not directly adaptable to semantical languages

  • SHOE language [HH00]

    • Machine-readable version information (e.g., compatibility)

    • Provided by curator as SHOE statements

  • Memento [SSN+10]

    • Fine-grained versioning at URI level (resources, web pages)

    • Machine-readable version information, in the HTTP header

      • Timestamps, traversal information (prior/current versions) etc
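A cross-snapshot query of the kind mentioned above ("functions that have not changed in the last n versions") can be sketched over a list of stored snapshots. This assumes every version is kept in full as a set of triples, which is the simplest and least storage-efficient option; the function name `unchanged_in_last` and the gene data are made up for illustration.

```python
def unchanged_in_last(versions, n):
    """Triples present in every one of the last `n` snapshots (oldest first in `versions`)."""
    if n <= 0:
        raise ValueError("n must be positive")
    recent = versions[-n:]
    if not recent:
        return set()
    return set.intersection(*recent)


# Hypothetical snapshots of a tiny annotation dataset.
v1 = {("geneA", "function", "binding"), ("geneB", "function", "transport")}
v2 = {("geneA", "function", "binding"), ("geneB", "function", "catalysis")}
v3 = {
    ("geneA", "function", "binding"),
    ("geneB", "function", "catalysis"),
    ("geneC", "function", "signaling"),
}
versions = [v1, v2, v3]

print(unchanged_in_last(versions, 3))
```

Approaches that store deltas or shared triples instead of full snapshots would answer the same query without materializing every version.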


Versioning Approaches (2/3)

  • Theoretical foundations for versioning [HP04]

    • Formal definitions to capture notions such as:

      • Compatibility (between versions)

      • Commitment (resources committing to a certain ontology)

      • Ontology perspectives (the part of the web committing to an ontology)

  • Temporal approaches [HS05, PTC05, KLGE07]

    • For capturing temporal relations between versions

    • For allowing cross-snapshot queries

  • Versioning in multi-editor environments [RSDT08]

    • Via change monitoring


Versioning Approaches (3/3)

  • Automatically detecting version relationships [AAM09]

    • Using heuristics based on URIs

  • Study of “relatedness” between versions [CQ13]

    • A model of “relatedness” between vocabularies from various sources

    • Similar to links in web pages

  • POI: Partial Order Index [TTA08]

    • Efficient method for storing versions and their differences

    • Stores several versions, exploiting their common triples for efficient storage


Change Languages (1/2)

  • Change languages necessary for monitoring, detection, propagation

  • Granularity

    • Low-level (or atomic, or elementary)

      • Simple add/remove operations

      • Add(s,p,o), Delete(s,p,o)

      • Simple to detect and define

      • Focus on machine-readability: determinism, well-defined semantics

    • High-level (or complex, or composite)

      • More coarse-grained, compact, closer to editor’s perception and intuition

      • Generalize_Domain(P,A), Delete_Class(A)

      • More interesting; harder to detect and define

      • Focus on human-understandability: often unclear and/or informal semantics


Change Languages (2/2)

  • Many different high-level languages (no standard)

    • [HGR12, JAP09, PFF+13, SK03, AH06, DA09, PTC07, …]

    • Some are domain-specific (e.g., [HGR12])

    • Some are dynamic (e.g., [AH06, DA09, PTC07])

      • Allow custom, user-defined changes

    • Some allow terminological changes (e.g., [PFF+13])

      • Rename, merge, split

      • Common, but tough to detect (easily confused with add/delete)


Representation Issues

  • Deltas are just sets of changes from the change language

  • Changes usually represented using a change ontology

    • Ontology represents changes

    • A specific change is an instance of such an ontology

    • Deltas associated with sets of such instances

    • Different proposals [NCLM06, KFKO02, KN03, PT05]

    • Allows the manipulation and communication of deltas/changes using standard Semantic Web technologies


Change Monitoring Approaches

  • Using a version log [PT05]

    • Logging actions on the dataset

    • Use it for change detection, as well as proper versioning

    • Good quality, high-level change monitoring

    • Based on a dynamic language of changes

  • Using migration specifications [ZZL+03]

    • Similar to logs, but with a more formal structure

  • DBPedia change monitoring [MLA+12]

    • http://live.dbpedia.org/

    • Live versions, as opposed to “standard” versions


Low-Level Change Detection (1/2)

  • SemVersion [VWS+05]

    • Developed in Karlsruhe (FZI, AIFB)

    • Low-level change detection tool for RDF

    • Also provides versioning functionalities

    • Allows cross-snapshot queries

  • For RDF [ILK12]

    • Low-level change detection based on set difference

    • Aggregating and compressing deltas

    • Also dealing with versioning issues

  • For RDF/S [ZTC11]

    • Takes into account semantics (RDFS inference)

    • Four different methods to compute deltas (all based on set difference)

    • Formal analysis of these methods’ properties and semantics

    • Extension: effect of blank nodes on change detection [TLZ12]
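Why inference matters for set-difference deltas can be seen in a small sketch. It compares a delta over explicit triples with a delta over inferred closures. This only illustrates the general idea and does not reproduce the four methods of [ZTC11]; the closure here covers subclass transitivity alone, and all function names are hypothetical.

```python
def subclass_closure(triples):
    """Add all subclass triples implied by transitivity (naive fixpoint)."""
    closed = set(triples)
    changed = True
    while changed:
        changed = False
        subs = [(s, o) for (s, p, o) in closed if p == "subclass"]
        for (a, b) in subs:
            for (c, d) in subs:
                if b == c and (a, "subclass", d) not in closed:
                    closed.add((a, "subclass", d))
                    changed = True
    return closed


def delta(v1, v2):
    """Set-difference delta: (added, deleted)."""
    return v2 - v1, v1 - v2


v1 = {("Birth", "subclass", "Onset"), ("Onset", "subclass", "Event")}
# V2 states explicitly a triple that V1 already implied.
v2 = v1 | {("Birth", "subclass", "Event")}

explicit_delta = delta(v1, v2)
semantic_delta = delta(subclass_closure(v1), subclass_closure(v2))
print(explicit_delta, semantic_delta)
```

The explicit delta reports one added triple, while the closure-based delta is empty because the two versions carry the same information.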


Low-Level Change Detection (2/2)

  • Bubastis (http://www.ebi.ac.uk/fgpt/sw/bubastis/index.html)

    • Simple diff tool (triple-based comparison)

    • Basically RDF, but also supports OWL

  • For DL-Lite [KWZ08]

    • Formal, semantical approach

  • For EL [KWW08]

    • Uses a concept-based description of changes

  • For propositional knowledge bases [FMV10]

    • Propositional, but generic; it can be applied to DLs

    • Formal analysis of the problem

    • Also dealing with propagation semantics


High-Level Change Detection (1/2)

  • For OWL: PromptDiff [NKKM04], OntoView [KFKO02]

    • Employ heuristics and probabilistic methods

    • Evaluation using precision/recall metrics against a gold standard

    • Integrated into tools that also provide versioning functionalities

  • For RDF/S [PFF+13]

    • Dealing with both machine-readability and human-understandability

    • Also dealing with propagation (applying changes)

    • To be discussed in detail later

  • COnto-Diff [HGR12]

    • Rule-based approach

    • Also dealing with propagation


Change Propagation Approaches

  • Usually part of other tools [SMMS02, MMS+03]

    • Versioning, monitoring tools (push-based propagation)

    • Detection tools (pull-based propagation)

    • Evolution and repair tools (pull-based propagation)

      • Adapt your data to be “compatible” with the new remote version

  • SparqlPush [PM10]

    • Push-based propagation of changes on SPARQL “views”

  • PRISM, PRISM++ [CMZ08, CMDZ10]

    • High-level language of schema changes for relational data

      • Also supports changes on the integrity constraints

    • Identifies and propagates the changes required in the data for abiding by the new schema

    • Query and update rewriting

      • For applications that try to access the old schema


Other Change Management Approaches

  • Complete approach for XML [SP10]

    • Representing changes inline with the data using a graph (“evograph”)

    • Supports different change representation languages (both low-level and high-level)

    • Timestamps changes

    • Monitoring: evograph can be used to log the changes

    • Propagation: changes can be accessed and propagated

    • Versioning: timestamps in changes can be used to generate snapshots (versions) at different times

    • Allows cross-snapshot queries

    • Fairly generic, can be adapted for RDF


Talk Structure (B3)

  • Introduction to RDF/S, DLs, OWL

  • Remote change management

    • Introduction, definition of subfields

    • Literature review

    • An approach for change detection [PFF+13]

  • Repair

    • Introduction, definition of subfields

    • Literature review

    • An approach for validity repair [RFC11]

  • Data and Knowledge Evolution

    • Introduction, connection with belief change

    • Understanding the process of change

    • Literature review


Our Approach on Change Detection

  • Purpose of this work: change detection [PFF+13]

    • Detect, a posteriori, the differences (delta or diff) between versions in a concise, intuitive and correct way

  • Main design choices

    • Change detection based on a general-purpose high-level language

    • Human-understandable, but also machine-readable

    • Clear, formal semantics

    • Provable formal properties and functionality guarantees

    • Detection and application (propagation) semantics

[Figure: versions V1, V2, V3, V4, V5 and changes C1, C2, C3, C4]


Sample Evolution

[Figure: graphs of Version 1 (V1) and Version 2 (V2) of the sample dataset, with subsumption and instantiation edges. V1 shows classes Period, Event, Onset, Birth, Actor, Existing and Stuff; V2 shows classes Event, Onset, Birth, Actor, Persistent and Stuff. Both show properties participants and started_on and individuals G_Birth and Giorgos]


Analyzing the Evolution (Using Triples)

Triples in V1 (partial list)

[Event type Class]

[Period type Class]

[Event subclass Period]

[participants type Property]

[participants domain Onset]

[participants range Actor]

[Giorgos type Actor]

[Existing type Class]

[Stuff subclass Existing]

[started_on domain Existing]

[Onset subclass Event]

[Birth subclass Onset]

Triples in V2 (partial list)

[Event type Class]

[participants type Property]

[participants domain Event]

[participants range Actor]

[Giorgos type Actor]

[Persistent type Class]

[Stuff subclass Persistent]

[started_on domain Persistent]

[Onset subclass Event]

[Birth subclass Event]


Low-Level Delta

Triples in V2 but not in V1 (added triples)

[participants domain Event]

[Persistent type Class]

[Stuff subclass Persistent]

[started_on domain Persistent]

[Birth subclass Event]

Triples in V1 but not in V2 (deleted triples)

[Period type Class]

[Event subclass Period]

[participants domain Onset]

[Existing type Class]

[Stuff subclass Existing]

[started_on domain Existing]

[Birth subclass Onset]

Low-Level Delta

Add([participants domain Event])

Add([Persistent type Class])

…

Del([Period type Class])

…
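The low-level delta for the sample evolution can be reproduced with plain set difference. The sketch below uses the partial triple lists of V1 and V2 from the slides, writing the domain triple of V2 as [participants domain Event], the form used on the "Comparing the Deltas" slide. The tuple encoding and variable names are illustrative assumptions.

```python
# Partial triple lists of the two versions, as (subject, predicate, object) tuples.
V1 = {
    ("Event", "type", "Class"),
    ("Period", "type", "Class"),
    ("Event", "subclass", "Period"),
    ("participants", "type", "Property"),
    ("participants", "domain", "Onset"),
    ("participants", "range", "Actor"),
    ("Giorgos", "type", "Actor"),
    ("Existing", "type", "Class"),
    ("Stuff", "subclass", "Existing"),
    ("started_on", "domain", "Existing"),
    ("Onset", "subclass", "Event"),
    ("Birth", "subclass", "Onset"),
}
V2 = {
    ("Event", "type", "Class"),
    ("participants", "type", "Property"),
    ("participants", "domain", "Event"),
    ("participants", "range", "Actor"),
    ("Giorgos", "type", "Actor"),
    ("Persistent", "type", "Class"),
    ("Stuff", "subclass", "Persistent"),
    ("started_on", "domain", "Persistent"),
    ("Onset", "subclass", "Event"),
    ("Birth", "subclass", "Event"),
}

added = V2 - V1      # triples in V2 but not in V1
deleted = V1 - V2    # triples in V1 but not in V2

low_level_delta = [("Add", t) for t in sorted(added)] + [("Del", t) for t in sorted(deleted)]
for op, triple in low_level_delta:
    print(op, triple)
```

The result has five additions and seven deletions, matching the two lists on this slide.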


Analyzing the Evolution (Visually)

[Figure: Version 1 (V1) and Version 2 (V2) of the sample dataset side by side, with subsumption and instantiation edges]

High-Level Delta

Generalize_Domain(participants, Onset, Event)

Pull_Up_Class(Birth, Onset, Event)

Delete_Class(Period, Ø, {Event}, Ø, Ø, Ø, Ø)

Rename_Class(Existing, Persistent)


Comparing the Deltas

Low-level delta

Del([participants domain Onset])
Add([participants domain Event])
Del([Period type Class])
Del([Event subclass Period])
Del([Birth subclass Onset])
Add([Birth subclass Event])

High-level delta

Delete_Class(Period, Ø, {Event}, Ø, Ø, Ø, Ø)
Generalize_Domain(participants, Onset, Event)
Pull_Up_Class(Birth, Onset, Event)

[Figure: Version 1 (V1) and Version 2 (V2) of the sample dataset, with subsumption and instantiation edges]


Associations partitioning
Associations (Partitioning)


Challenges for high level languages
Challenges for High-Level Languages

High-level deltas are superior

More concise (e.g., Rename_Class)

More intuitive (e.g., Pull_Up_Class)

Carry additional information (e.g., Generalize_Domain)

Challenges for high-level languages

Must be deterministic (exactly one high-level delta)

Must be fine-grained enough to capture subtle changes

Must be coarse-grained enough to be concise

Must be intuitive and close to editor’s perception of the changes

Compatible detection and application algorithms

Intuitive results

Efficient


Proposed language l
Proposed Language L

The formal definition of a change consists of:

Changes required in the low-level delta (added/deleted triples)

Conditions that should hold in V1 and/or V2

Generalize_Domain(P, X, Y)

Del([P domain X])

Add([P domain Y])

P existing property in both V1, V2

X, Y existing classes in both V1, V2

X subclass of Y in both V1, V2

Generalize_Domain(participants, Onset, Event): detectable

Similarly for the other changes in L (132 high-level ones)
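The definition of Generalize_Domain can be read as a detection rule: look for a deleted and an added domain triple on the same property, then check the conditions on both versions. A minimal sketch under stated assumptions: versions are sets of (subject, predicate, object) tuples, the type triples for Onset, Event and participants are assumed to be present in both versions, and the subclass condition is checked on explicit triples only (no inference). The function name is illustrative, not taken from [PFF+13].

```python
def detect_generalize_domain(v1, v2):
    """Return every Generalize_Domain(P, X, Y) whose definition is satisfied."""
    added, deleted = v2 - v1, v1 - v2
    both = v1 & v2  # triples holding in both versions
    found = []
    for (p, pred_del, x) in sorted(deleted):
        if pred_del != "domain":
            continue
        for (p2, pred_add, y) in sorted(added):
            if pred_add != "domain" or p2 != p:
                continue
            conditions_hold = (
                (p, "type", "Property") in both    # P is a property in V1 and V2
                and (x, "type", "Class") in both   # X is a class in V1 and V2
                and (y, "type", "Class") in both   # Y is a class in V1 and V2
                and (x, "subclass", y) in both     # X subclass of Y in V1 and V2
            )
            if conditions_hold:
                found.append(("Generalize_Domain", p, x, y))
    return found


# Fragment of the running example (type triples are assumed, see above).
common = {
    ("participants", "type", "Property"),
    ("Onset", "type", "Class"),
    ("Event", "type", "Class"),
    ("Onset", "subclass", "Event"),
}
v1 = common | {("participants", "domain", "Onset")}
v2 = common | {("participants", "domain", "Event")}

detected = detect_generalize_domain(v1, v2)
print(detected)
```

If the two domain triples were swapped (moving the domain from Event down to Onset), the subclass condition would fail and this rule would detect nothing.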


Types and number of defined changes
Types and Number of Defined Changes

Changes (134)

  • Low-Level (2): Add, Del

  • High-Level (132)

    • Basic (54), e.g., Delete_Subclass, Delete_Domain

    • Composite (51), e.g., Pull_Up_Class, Change_Domain

    • Heuristic (27), e.g., Rename_Class, Split_Class


Results on l granularity
Results on L: Granularity

Granularity problem: solved by defining levels of changes

Basic Changes: fine-grained, roughly correspond to low-level

Composite Changes: coarse-grained, group several basic changes together

Heuristic Changes: based on heuristics, necessary for Rename, Merge, Split etc; require mappings between URIs

Problems with determinism

One evolution could correspond to different sets of basic/composite changes

Priorities in detection

Heuristic > Composite > Basic


Results on l determinism
Results on L: Determinism

Each low-level change is associated with exactly one detectable high-level change

Full partitioning of low-level changes into high-level ones

Each pair of versions (V1, V2) is associated with:

Exactly one low-level delta

Exactly one high-level delta

Determinism is necessary

More than one would lead to ambiguities

Less than one would make some inputs (V1, V2) irresolvable


Results on l propagation
Results on L: Propagation

Figure: the Version 1 (V1) and Version 2 (V2) diagrams, annotated with three steps: “Detect C”, “Apply C” and “Apply C⁻¹”. Labels shown: Period, Event, Onset, Birth, Actor, Existing, Persistent, Stuff, participants, started_on, G_Birth, Giorgos.


Results on l deltas keep version history
Results on L: Deltas Keep Version History

Can reproduce all versions as long as you keep (any) one version and the deltas

Deltas are more concise than the versions themselves

Storage and communication efficiency

Figure: versions V1 to V5 linked by the deltas C1 to C4.
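The claim that any one version plus the deltas reproduces the whole history can be sketched as follows. This is an illustration only: it uses low-level deltas (sets of added and deleted triples) as a stand-in for deltas in L, and the helper names and the toy triples are assumptions.

```python
def detect(v_old, v_new):
    """Delta as a pair (added, deleted)."""
    return v_new - v_old, v_old - v_new


def apply_delta(version, delta):
    """Apply a delta: drop the deleted triples, then add the added ones."""
    added, deleted = delta
    return (version - deleted) | added


def invert(delta):
    """The inverse delta swaps additions and deletions."""
    added, deleted = delta
    return deleted, added


# Three toy versions of a dataset.
versions = [
    {("a", "p", "b")},
    {("a", "p", "b"), ("b", "p", "c")},
    {("b", "p", "c")},
]
deltas = [detect(versions[i], versions[i + 1]) for i in range(len(versions) - 1)]

# Keep only the first version and replay the deltas forward.
forward = [versions[0]]
for delta in deltas:
    forward.append(apply_delta(forward[-1], delta))

# Keep only the last version and replay the inverse deltas backward.
backward = [versions[-1]]
for delta in reversed(deltas):
    backward.append(apply_delta(backward[-1], invert(delta)))
backward.reverse()

print(forward == versions, backward == versions)
```

Storing one version and the deltas is only a saving when the deltas are smaller than the versions, which is the conciseness argument made above.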


Change detection evaluation
Change Detection: Evaluation

Detection and application algorithms implemented for evaluation

Performance

Complexity: O(max{N1, N2, N²})

Performance depends on the detected changes (type, number)

Bottleneck: calculating the low-level delta (>80% of total time)

Intuitiveness

Changes in our language are used in practice

Results confirmed by literature/editor notes (CIDOC, GO)

Better than CIDOC’s manually recorded changes (18 changes missed)

Conciseness

Basic ≈ Low-Level

Basic + Composite + Heuristic << Low-Level

Up to 80% reduction, depending on the case


Summary and conclusions rcm
Summary and Conclusions: RCM

  • Remote change management is at the heart of LOD

    • Uncontrolled character of LOD makes it critical

  • Various related fields

    • Versioning, change monitoring and detection, change propagation

    • Unfortunately, not used in practice in LOD

  • Presented a formal approach for change detection [PFF+13]

  • Other possible directions (related to LOD)

    • Best practices should be studied and promoted

      • Automated versioning and monitoring mechanisms embedded in evolution tools/editors

      • Understand and use temporal and provenance metadata on versions

    • Improved change monitoring and detection

      • A standard change language?


Talk structure c1
Talk Structure (C1)

  • Introduction to RDF/S, DLs, OWL

  • Remote change management

    • Introduction, definition of subfields

    • Literature review

    • An approach for change detection [PFF+13]

  • Repair

    • Introduction, definition of subfields

    • Literature review

    • An approach for validity repair [RFC11]

  • Data and Knowledge Evolution

    • Introduction, connection with belief change

    • Understanding the process of change

    • Literature review


Motivation for repair
Motivation for Repair

Published data is usually problematic

Several different types of problems in LOD [HHP+10]

Pedantic web initiative (http://pedantic-web.org/)

Advice for data owners on how to prevent common problems in their data


Causes of data problems
Causes of Data Problems

  • Several reasons for data problems

    • Erroneous data (faulty sensors, human mistakes etc)

    • Different symbolisms and terminology

    • Modeling errors (e.g., all birds fly)

    • Requirements (constraints) on the data may change

      • E.g., when applications’ needs change

    • Reuse data by different providers (no quality guarantees)

    • Quality jeopardized by re-use and open evolution

    • Integration/merging of datasets


Generic approaches
Generic Approaches

  • Four ways to deal with problems in data [HHH+05]

    • Prevent it (careful evolution, merging etc)

      • Can only prevent problems caused by changes in the local dataset

    • Correct it (repair)

      • Actively address the problem (after it appears)

    • Ignore it (consistent query answering, non-monotonic reasoning)

      • CQA: popular in database community; prevents user from noticing the problem by rewriting queries (common denominator approach)

      • NMR: popular in AI community; avoid trivialization of reasoning (paraconsistent reasoning, defeasible reasoning, default reasoning, …)

    • Use versions (versioning)

      • Make sure you refer to the correct (compatible) version

      • Only when the problem is due to a remote change


Subfields of repair
Subfields of Repair

Cleaning

Mainly related to literal quality

Terminology, symbols, metric units etc

Debugging

Consistency (at least one model)

Coherency (no unsatisfiable classes)

Relevant for DL/OWL only

Validity repair

Satisfaction of custom integrity constraints (e.g., business rules)

Expressed in OWL, DL, Datalog or predicate logic

Quality enhancement

Assessing and improving the quality of data

Different dimensions (timeliness, completeness, reputation, …)


Cleaning
Cleaning

  • Literals in LOD are often messy, and have to be “cleaned up”

    • Different formats for names, dates etc

      • &gf name “Giorgos Flouris” vs. &gf name “Flouris, Giorgos”

      • &gf birth_date 03/05/76 vs. &gf birth_date 05/03/76

      • &gf birthplace “Hellas” vs. &gf birthplace “Greece”

    • Different symbols

      • Paris land_area 105,4 vs. Paris land_area 105.4

      • Paris population 2.234.105 vs. Paris population 2,234,105

    • Different metric units

      • Paris land_area 105,4 vs. Paris land_area 40,7

      • &x price 30 vs. &x price 39

    • Inconsistent values

      • &x price 0 vs. &x price “free”

    • Data is not in the desired form (data transformation)

      • LIP6 addr “4, P. Jussieu” → LIP6 street “P. Jussieu”, LIP6 streetno 4
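A few of the literal problems listed above can be handled by small normalization functions. The sketch below is illustrative only: the function names are hypothetical, and each function assumes that the input convention is already known (for example, that “,” is the decimal mark). Ambiguous cases such as 03/05/76 versus 05/03/76 cannot be resolved without that kind of external knowledge.

```python
def normalize_name(name):
    """Turn "Last, First" into "First Last"; leave other names unchanged."""
    if "," in name:
        last, first = (part.strip() for part in name.split(",", 1))
        return f"{first} {last}"
    return name.strip()


def parse_decimal_comma(text):
    """Parse a number that uses '.' for thousands and ',' as the decimal mark."""
    return float(text.replace(".", "").replace(",", "."))


def split_address(addr):
    """Split "4, P. Jussieu" into (street, street number)."""
    number, street = (part.strip() for part in addr.split(",", 1))
    return street, int(number)


print(normalize_name("Flouris, Giorgos"))
print(parse_decimal_comma("105,4"))
print(split_address("4, P. Jussieu"))
```

Tools such as OpenRefine express comparable transformations in their own expression languages; the Python above only illustrates the idea.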


Debugging

hasHorns

hasHorns

Horse

Unicorn

Debugging

  • Coherency

    • No unsatisfiable classes

    • Indicates good modeling

  • Consistency

    • At least one model

    • Avoids reasoning triviality

  • Relevant for DL/OWL only

canFly

canFly

Bird

Penguin

Pengo


Validity repair
Validity Repair

  • Validity repair

    • Satisfaction of custom integrity constraints (e.g., business rules)

    • Encode context- or application-specific requirements

      • PROV-DM: http://www.w3.org/TR/2013/REC-prov-constraints-20130430/

    • Applications may be useless over invalid data

  • Expressed in OWL, DL, Datalog, Datalog±, predicate logic, …

    • Different expressive power

    • Different semantics (OWA/CWA, UNA) [TSBM10, MHS09]

  • Various types of constraints

    • Functional, inverse functional, transitivity, cardinality constraints

    • Disjointness constraints

    • Primary key, foreign key, inclusion constraints

    • Tuple-generating dependencies (tgd), equality-generating dependencies (egd)


Quality enhancement
Quality Enhancement

  • Quality is defined as “fitness for use” [Jur74]

    • Multi-faceted (timeliness, completeness, reputation, …)

    • Task-dependent

    • Subjective

  • Assessing quality

    • Via assessment functions (e.g., [BC09]) or SPARQL queries (e.g., [FH10])

    • Some kind of combined scoring over the relevant dimensions

  • Improving (enhancing) quality

    • Usually manual

    • Tries to improve the assessment score


Talk structure c2
Talk Structure (C2)

  • Introduction to RDF/S, DLs, OWL

  • Remote change management

    • Introduction, definition of subfields

    • Literature review

    • An approach for change detection [PFF+13]

  • Repair

    • Introduction, definition of subfields

    • Literature review

    • An approach for validity repair [RFC11]

  • Data and Knowledge Evolution

    • Introduction, connection with belief change

    • Understanding the process of change

    • Literature review


Cleaning tool openrefine
Cleaning Tool: OpenRefine

Open source

Originally developed by Google (as Google Refine)

http://openrefine.org/

Applies on various representations of the input data

CSV/TSV, Excel, JSON, XML, RDF as XML, etc

RDF extension

Functionalities (related to this talk)

Data exploration and cleaning

Both automated and manual (interface assists in manual cleaning)

Data transformation (format conversion)

Uses GREL (Google Refine Expression Language) and regular expressions


Cleaning tool odcleanstore
Cleaning Tool: ODCleanStore

  • Web application, written in Java

    • Developed by Charles University (Prague)

    • http://www.ksi.mff.cuni.cz/~knap/odcs/sections/odcs.html

  • Functionalities (related to this talk)

    • Cleaning

      • Via “transformers” (policies for cleaning)

      • Expressed using SPARQL or regular expressions

    • Quality assessment

      • Transformer assigns a score to data

    • Validity repair

      • Supports conflict resolution for functional properties

      • Decides what to drop based on the quality of the data items involved

      • Supports aggregation functionalities based on “aggregation policies”


Other cleaning approaches
Other Cleaning Approaches

  • Involve users in the loop [KHS12]

    • Manual requests for improvements (cleaning, quality, …)

    • Patch Request Ontology (PRO)

    • Use a GWAP (Game With A Purpose) for identifying data problems


Debugging literature overview
Debugging: Literature Overview

  • Identify and resolve inconsistency/incoherency

  • Two phases

    • Diagnosis: identify inconsistency/incoherency

    • Repair: remove inconsistency/incoherency

  • Literature mostly dealing with diagnosis

    • Repair requires additional user input

  • Diagnosis is more than reasoning

    • Pinpoint the causes of inconsistency/incoherency

  • Repair

    • User input required (manual or semi-automatic approaches)

    • Automatic approaches also require user input or domain knowledge (ad-hoc solutions)


Debugging approaches
Debugging Approaches

  • Diagnosis using tableau-based algorithms for various DLs

    • Identify minimal sets of responsible axioms

      • [SC03, MLBP06, PT06, WHR+05]

    • Identify responsible parts of axioms (more fine-grained)

      • [KPS+06, LPSV06]

  • Repair

    • Manual: editors and related tools

      • Onion [MWK00], PROMPT [NM00], Chimaera [MFRW00]

    • Semi-automatic

      • Interactive approach via suggestions: ORE tool [LB10]

    • Automatic:

      • Using external information, e.g., for stratified datasets [QP07, MLB05]


Validity repair literature overview
Validity Repair: Literature Overview

  • Identify and resolve invalidity (custom constraints)

  • Two phases

    • Diagnosis: identify invalidity

    • Repair: remove invalidity

  • Literature mostly dealing with diagnosis

    • Repair requires additional user input

  • Diagnosis is more than validation

    • Pinpoint the causes of invalidity

  • Repair

    • User input required (manual or semi-automatic approaches)

    • Automatic approaches also require user input or domain knowledge (ad-hoc solutions)


Validity repair approaches
Validity Repair Approaches

  • Not much work in repairing custom constraints in LOD

    • A large body of related work for the relational setting

      • For various constraint types and repair methodologies

  • Existing tools

    • Stardog (http://www.stardog.com/docs/)

      • Commercial RDF database that supports validation of custom constraints

    • Rondo (relational/XML) [Mel04]

      • Repair based on a fixed “importance” of data items

    • Declarative repairing based on preferences [RFC11]

      • To be discussed in detail later

    • Repairing functional properties ([FRPV+12], Sieve [MMB12])


Data quality frameworks 1 4
Data Quality Frameworks (1/4)

  • Many different quality assessment methodologies and frameworks

    • Several different quality dimensions

    • Different works consider different dimensions

    • Different proposals for their classification and organization

  • There is no single, generally accepted data quality framework

    • Cannot be one

    • Different applications have different needs


Data quality frameworks 2 4
Data Quality Frameworks (2/4)

  • Quality dimensions, quality indicators, scoring functions and assessment metrics [BC09]

    • Different quality dimensions

      • Timeliness, completeness, reputation, …

    • Each dimension associated with different indicators

      • Timeliness: last modification date, creation date, …

    • Each indicator associated with different scoring functions

      • E.g., days since last update

    • Scoring functions from relevant indicators are combined using assessment metrics

      • E.g., Reputation_value*0.6 + days_since_update*0.4
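The chain from indicator to scoring function to assessment metric can be sketched in a few lines. This is an illustration, not the method of [BC09]: the linear decay, the 365-day horizon and the function names are assumptions, and the timeliness indicator is normalized to [0, 1] so that a larger score always means better quality.

```python
def timeliness_score(days_since_update, horizon_days=365):
    """Scoring function for the timeliness dimension.

    Maps "days since last update" to [0, 1], where 1 means updated today.
    The linear decay and the 365-day horizon are illustrative assumptions.
    """
    return max(0.0, 1.0 - days_since_update / horizon_days)


def assessment_metric(reputation_value, days_since_update):
    """Weighted combination of two dimension scores (weights 0.6 and 0.4)."""
    return 0.6 * reputation_value + 0.4 * timeliness_score(days_since_update)


# A source with reputation 0.5, updated today versus 200 days ago.
print(assessment_metric(reputation_value=0.5, days_since_update=0))
print(assessment_metric(reputation_value=0.5, days_since_update=200))
```

Changing the weights or the scoring function changes the ranking of sources, which is one way to see why quality is task-dependent.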


Data quality frameworks 3 4
Data Quality Frameworks (3/4)

[RH09]


Data quality frameworks 4 4
Data Quality Frameworks (4/4)

[ADA98]


Talk structure c3
Talk Structure (C3)

  • Introduction to RDF/S, DLs, OWL

  • Remote change management

    • Introduction, definition of subfields

    • Literature review

    • An approach for change detection [PFF+13]

  • Repair

    • Introduction, definition of subfields

    • Literature review

    • An approach for validity repair [RFC11]

  • Data and Knowledge Evolution

    • Introduction, connection with belief change

    • Understanding the process of change

    • Literature review


Our approach on validity repair
Our Approach on Validity Repair

  • Declarative approach for validity repair [RFC11]

  • Main design choices

    • Both diagnosis and repair

    • Applicable for RDF/S

    • Adopted relational semantics (CWA) for the constraints

    • Generality on the supported constraints (DEDs)

    • Minimal user interaction (all info provided at input)

    • Automatic diagnosis

    • Automatic repair using preferences (provided by the user at input)


Rdf s representation model
RDF/S Representation Model

  • Express RDF/S over an adequate relational schema

    • Hybrid method

      • C_IsA(A,B): A is a subclass of B

      • C_Inst(x,A): x is an instance of A

      • Domain(P,A): the domain of P is A

  • Alternatives

    • Schema-specific

      • One table/predicate for each class/property (A(x), B(x), P(x,y), …)

      • Not amenable to changes (e.g., delete class)

    • Schema-agnostic (triple-store)

      • One table with three columns (spo)

      • Harder to define constraints, less intuitive
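The hybrid method can be illustrated by sorting triples into one relation per kind of statement. A minimal sketch: the relation names C_IsA, C_Inst, Domain, Range, Class, Prop and P_Inst follow the slides, while the mapping from the rdf: and rdfs: vocabulary and the function name `to_relational` are assumptions made for this example.

```python
from collections import defaultdict


def to_relational(triples):
    """Sort RDF/S triples into the relations of the hybrid representation."""
    tables = defaultdict(set)
    for s, p, o in triples:
        if p == "rdfs:subClassOf":
            tables["C_IsA"].add((s, o))       # s is a subclass of o
        elif p == "rdfs:domain":
            tables["Domain"].add((s, o))      # the domain of s is o
        elif p == "rdfs:range":
            tables["Range"].add((s, o))       # the range of s is o
        elif p == "rdf:type" and o == "rdfs:Class":
            tables["Class"].add((s,))         # s is a class
        elif p == "rdf:type" and o == "rdf:Property":
            tables["Prop"].add((s,))          # s is a property
        elif p == "rdf:type":
            tables["C_Inst"].add((s, o))      # s is an instance of o
        else:
            tables["P_Inst"].add((s, o, p))   # s is related to o through p
    return dict(tables)


triples = [
    ("Sensor", "rdf:type", "rdfs:Class"),
    ("geo:location", "rdf:type", "rdf:Property"),
    ("geo:location", "rdfs:domain", "Sensor"),
    ("geo:location", "rdfs:range", "SpatialThing"),
    ("Item1", "rdf:type", "Observation"),
    ("Item1", "geo:location", "ST1"),
]
tables = to_relational(triples)
for name in sorted(tables):
    print(name, sorted(tables[name]))
```

Deleting a class only removes rows here, whereas in the schema-specific alternative it would mean dropping a table.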


Allowed constraints
Allowed Constraints

  • Considered a very general class of constraints

    • Disjunctive Embedded Dependencies (DEDs) [Deu09]

  • Very general class

    • Functional, inverse functional, transitivity, cardinality constraints

    • Disjointness constraints

    • Primary key, foreign key, inclusion constraints

    • Tuple-generating dependencies (tgd), equality-generating dependencies (egd)


Constraints
Constraints

  • Express validity constraints over the aforementioned schema:

    • Class subsumption must be acyclic

      • ∀x,y: C_IsA(x,y) ∧ C_IsA(y,x) → ⊥

    • Correct classification in property instances

      • ∀x,y,p,a: P_Inst(x,y,p) ∧ Domain(p,a) → C_Inst(x,a)

      • ∀x,y,p,a: P_Inst(x,y,p) ∧ Range(p,a) → C_Inst(y,a)

  • Closed World Assumption (CWA)

    • Failure to prove something is taken as proof of its negation

  • Syntactical manipulations on constraints allow

    • Diagnosis

      • Finding violated constraints

    • Repair

      • Identifying repairing options per violation
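Under the CWA, diagnosis amounts to finding variable bindings that satisfy the body of a constraint but not its head. A minimal sketch for the acyclicity and domain constraints, assuming each relation is stored as a Python set of tuples; the function names are illustrative and not taken from [RFC11].

```python
def acyclicity_violations(c_isa):
    """Bindings (x, y) with both C_IsA(x,y) and C_IsA(y,x) in the dataset.

    This catches longer cycles only if c_isa is stored transitively closed.
    """
    return {(x, y) for (x, y) in c_isa if (y, x) in c_isa}


def domain_violations(p_inst, domain, c_inst):
    """Bindings (x, y, p, a) where the body holds but C_Inst(x,a) is absent."""
    return {
        (x, y, p, a)
        for (x, y, p) in p_inst
        for (p2, a) in domain
        if p2 == p and (x, a) not in c_inst  # CWA: absent means false
    }


# A small dataset: Item1 uses geo:location but is not classified as a Sensor.
p_inst = {("Item1", "ST1", "geo:location")}
domain = {("geo:location", "Sensor")}
c_inst = {("Item1", "Observation"), ("ST1", "SpatialThing")}

print(domain_violations(p_inst, domain, c_inst))
print(acyclicity_violations({("A", "B"), ("B", "A"), ("B", "C")}))
```

Each returned binding identifies the exact facts involved in a violation, which is what repair needs in order to list its options.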


Repairing example
Repairing Example

Correct classification in property instances

∀x,y,p,a: P_Inst(x,y,p) ∧ Domain(p,a) → C_Inst(x,a)

Dataset D0

Class(Sensor), Class(SpatialThing), Class(Observation)

Prop(geo:location)

Domain(geo:location,Sensor)

Range(geo:location,SpatialThing)

Inst(Item1), Inst(ST1)

P_Inst(Item1,ST1,geo:location)

C_Inst(Item1,Observation), C_Inst(ST1,SpatialThing)

Figure: schema level with the classes Sensor, SpatialThing and Observation and the property geo:location; data level with Item1 and ST1 linked by geo:location.

Item1 geo:location ST1

Sensor is the domain of geo:location

Item1 is not a Sensor

P_Inst(Item1,ST1,geo:location) ∈ D0

Domain(geo:location,Sensor) ∈ D0

C_Inst(Item1,Sensor) ∉ D0

  • Remove P_Inst(Item1,ST1,geo:location)

  • Remove Domain(geo:location,Sensor)

  • Add C_Inst(Item1,Sensor)
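The three repairing options follow mechanically from the violated constraint: remove either fact that satisfies the body, or add the fact required by the head. A minimal sketch over a fragment of D0, assuming facts are stored as tuples whose first element is the relation name; the helper names are illustrative.

```python
D0 = {
    ("P_Inst", "Item1", "ST1", "geo:location"),
    ("Domain", "geo:location", "Sensor"),
    ("C_Inst", "Item1", "Observation"),
    ("C_Inst", "ST1", "SpatialThing"),
}


def domain_violations(dataset):
    """Bindings (x, y, p, a) that violate the domain constraint under CWA."""
    p_insts = [f for f in dataset if f[0] == "P_Inst"]
    domains = [f for f in dataset if f[0] == "Domain"]
    return [
        (x, y, p, a)
        for (_, x, y, p) in p_insts
        for (_, p2, a) in domains
        if p2 == p and ("C_Inst", x, a) not in dataset
    ]


def resolution_options(violation):
    """The three ways to resolve one violation of the domain constraint."""
    x, y, p, a = violation
    return [
        ("-", ("P_Inst", x, y, p)),  # remove the property instance
        ("-", ("Domain", p, a)),     # remove the domain declaration
        ("+", ("C_Inst", x, a)),     # add the missing classification
    ]


def apply_change(dataset, change):
    """Return a new dataset with one fact added or removed."""
    sign, fact = change
    return dataset | {fact} if sign == "+" else dataset - {fact}


violation = domain_violations(D0)[0]
for change in resolution_options(violation):
    repaired = apply_change(D0, change)
    print(change, "valid:", not domain_violations(repaired))
```

All three options produce a valid dataset; choosing among them is the job of the preferences.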


Preferences for repair
Preferences for Repair

Which repairing option is best?

Data owner determines that via preferences

Preferences

Specified beforehand

High-level “specifications” for the ideal repair

Serve as “instructions” to determine the preferred (optimal) solution


Preferences on datasets
Preferences (On Datasets)

Figure: the candidate repairs D1, D2 and D3 of D0, each scored by a preference on datasets (scores shown: 3, 4, 6).


Preferences on deltas
Preferences (On Deltas)

Figure: the candidate repairs D1, D2 and D3 of D0, each scored by a preference on its delta. Deltas shown: -P_Inst(Item1,ST1,geo:location), -Domain(geo:location,Sensor), +C_Inst(Item1,Sensor); scores shown: 2, 1, 5.


More details on preferences
More Details on Preferences

Preferences on datasets are result-oriented

Consider the quality of the repair result

Ignore the impact of repair

Popular options: prefer newest/trustable information, prefer a specific schema structure

Preferences on deltas are impact-oriented

Consider the impact of repair

Ignore the quality of the repair result

Popular options: minimize schema changes, minimize addition/deletion of information, minimize delta size

Properties of preferences

Quality metrics can be used for stating preferences

Metadata on the data can be used (e.g., provenance)

Can be qualitative or quantitative


Generalizing the approach
Generalizing the Approach

For one violated constraint

Diagnose invalidity

Determine minimal ways to resolve it

Determine and return preferred solution based on the preference

For many violated constraints

Problem becomes more complicated

More than one resolution step is required

Issues:

Resolution order

When and how to filter non-optimal solutions?

Constraint (and resolution) interdependencies


Constraint interdependencies
Constraint Interdependencies

A given resolution may:

Cause other violations (bad)

Resolve other violations (good)

Optimal resolution unknown ‘a priori’

Cannot predict a resolution’s ramifications

Exhaustive, recursive search required (resolution tree)

Two ways to create the resolution tree

Globally-optimal (GO) / locally-optimal (LO)

When and how to filter non-optimal solutions?


Resolution tree creation go
Resolution Tree Creation (GO)

  • Find all minimal resolutions for all the violated constraints, then find the optimal ones

  • Globally-optimal (GO)

    • Find all minimal resolutions for one violation

    • Explore them all

    • Repeat recursively until valid

    • Return the optimal leaves

Optimal repairs (returned)


Resolution tree creation lo
Resolution Tree Creation (LO)

  • Find the minimal and optimal resolutions for one violated constraint, then repeat for the next

  • Locally-optimal (LO)

    • Find all minimal resolutions for one violation

    • Explore the optimal one(s)

    • Repeat recursively until valid

    • Return all remaining leaves

Optimal repair (returned)


Comparison go versus lo

Characteristics of GO

Exhaustive

Less efficient: large resolution trees

Always returns optimal repairs

Insensitive to constraint syntax

Deterministic (result does not depend on resolution order)

Characteristics of LO

Greedy

More efficient: small resolution trees

May return sub-optimal repairs

Sensitive to constraint syntax

Non-deterministic (result may depend on resolution order)

Comparison (GO versus LO)
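The two strategies differ only in where the preference is applied: GO builds the whole resolution tree and filters the leaves, while LO prunes at every step. The sketch below is a simplified illustration, not the algorithms of [RFC11]: it supports only domain and range constraints, and the preference (a cost on deltas where schema changes cost 3 and data changes cost 1) is a hypothetical choice.

```python
def violations(d):
    """Each violation is returned as its list of resolution options."""
    out = []
    for f in sorted(d):
        if f[0] != "P_Inst":
            continue
        _, x, y, p = f
        for g in sorted(d):
            if g[0] == "Domain" and g[1] == p and ("C_Inst", x, g[2]) not in d:
                out.append([("-", f), ("-", g), ("+", ("C_Inst", x, g[2]))])
            if g[0] == "Range" and g[1] == p and ("C_Inst", y, g[2]) not in d:
                out.append([("-", f), ("-", g), ("+", ("C_Inst", y, g[2]))])
    return out


def apply_change(d, change):
    sign, fact = change
    return d | {fact} if sign == "+" else d - {fact}


def cost(delta):
    """Hypothetical preference on deltas: lower is better."""
    return sum(3 if fact[0] in ("Domain", "Range") else 1 for _, fact in delta)


def leaves(d, local, delta=()):
    """Valid leaves of the resolution tree, as (dataset, delta) pairs."""
    found = violations(d)
    if not found:
        return {(d, delta)}
    options = found[0]  # resolve one violation at a time
    if local:  # LO: explore only the locally optimal option(s)
        best = min(cost(delta + (o,)) for o in options)
        options = [o for o in options if cost(delta + (o,)) == best]
    result = set()
    for o in options:
        result |= leaves(apply_change(d, o), local, delta + (o,))
    return result


def repair_go(d):
    """GO: explore every resolution, then return only the optimal leaves."""
    all_leaves = leaves(frozenset(d), local=False)
    best = min(cost(delta) for _, delta in all_leaves)
    return {leaf for leaf in all_leaves if cost(leaf[1]) == best}


def repair_lo(d):
    """LO: prune at each step and return all remaining leaves."""
    return leaves(frozenset(d), local=True)


D0 = {
    ("P_Inst", "Item1", "ST1", "geo:location"),
    ("Domain", "geo:location", "Sensor"),
    ("Range", "geo:location", "SpatialThing"),
    ("C_Inst", "Item1", "Observation"),
    ("C_Inst", "ST1", "SpatialThing"),
}
go_deltas = {delta for _, delta in repair_go(D0)}
lo_deltas = {delta for _, delta in repair_lo(D0)}
print(sorted(go_deltas))
print(go_deltas == lo_deltas)
```

With a single violation the two strategies return the same repairs. They can diverge when resolutions interact, because LO discards branches before their ramifications are known.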


Repair generality results
Repair: Generality Results

  • The approach is very general

    • Thanks to the generality/flexibility of preferences

  • Repair approaches can be captured using adequately designed preferences

    • Using either the LO or the GO strategy

    • All the current approaches that we checked

    • Practically all future ones

      • This has been proved, under some general conditions regarding the behavior of the repair approach

  • Our model can be viewed as a general approach engulfing other repair approaches


Repair algorithms and complexity
Repair: Algorithms and Complexity

Implemented both algorithms

Detailed complexity analysis for GO/LO and various different types of constraints and preferences

Inherently difficult problem

Exponential complexity (in general)

Main exception: LO is polynomial (in special cases)

Theoretical complexity is misleading as to the actual performance of the algorithms


Performance in practice
Performance in Practice

Performance in practice

Linear with respect to dataset size

Linear with respect to tree size

Types of violated constraints (tree width)

Number of violations (tree height) – causes the exponential blowup

Constraint interdependencies (tree height)

Preference (for LO): affects pruning (tree width)

Further performance improvement

Use optimizations

Use LO with restrictive preference

Currently considering a redesign for further improvement


Summary and conclusions repair
Summary and Conclusions: Repair

  • Data usually problematic

    • Different types of problems

  • Repair is done using different approaches depending on the type of the problem

    • Cleaning, debugging, repairing, quality assessment and enrichment

  • Presented a formal approach for validity repair [RFC11]

  • Other possible directions (related to LOD)

    • Most approaches detect problems, but don’t resolve them

    • Efficiency problems (for repairing algorithms)

    • Exploit external knowledge on the cause of the problem (e.g., propagation of invalidity by a linked dataset)


Talk structure d1
Talk Structure (D1)

  • Introduction to RDF/S, DLs, OWL

  • Remote change management

    • Introduction, definition of subfields

    • Literature review

    • An approach for change detection [PFF+13]

  • Repair

    • Introduction, definition of subfields

    • Literature review

    • An approach for validity repair [RFC11]

  • Data and Knowledge Evolution

    • Introduction, connection with belief change

    • Understanding the process of change

    • Literature review


Motivation for evolution
Motivation for Evolution

Reasons for evolution

New observations or experiments

Change in the viewpoint or usage of the dataset

Newly gained access to information (previously classified, unknown or otherwise unavailable)

Incomplete or inaccurate conceptualization

Changes in the world itself

Repairing

Change propagation (cascading evolution in LOD)

Not an LOD-specific problem

But critical for LOD as well


Definition of evolution
Definition of Evolution

The process of modifying a dataset in response to a change in the domain or its conceptualization

Dealing with both data and schema changes

Figure: an Evolution Algorithm takes the Original Dataset and the New Data/Knowledge as input and produces the Modified Dataset.


Evolution setting the scope
Evolution: Setting the Scope

  • Evolution is an overloaded term

  • Phases of evolution

    • Six phases in [SMMS02], five phases in [PT05]

    • Detecting the need for evolution, change propagation, logging changes, versioning etc

  • Scope: apply the change and compute the new dataset

    • Out of scope: deciding on the change, evaluating the result, managing versions, logging changes etc


Explaining evolution 1 4
Explaining Evolution (1/4)

Chess Dataset

Representation Language: RDF

Change: Add([King rdf:type Red])

Figure: the chess dataset, with the classes ChessPiece, Wooden, Plastic, Red, White and Black at the schema level and the instance King at the data level.


Explaining evolution 2 4
Explaining Evolution (2/4)

Chess Dataset

Representation Language: RDF

Change: Del([King rdf:type Black])

Is the King Wooden?

Figure: the chess dataset, with the classes ChessPiece, Wooden, Plastic, Red, White and Black at the schema level and the instance King at the data level.


Explaining evolution 3 4
Explaining Evolution (3/4)

Chess Dataset

Representation Language: RDF

Change: Del([King rdf:type Wooden])

Some domain knowledge required (extra-logical considerations)

Figure: the chess dataset, with the classes ChessPiece, Wooden, Plastic, Red, White and Black at the schema level and the instance King at the data level.


Explaining evolution 4 4
Explaining Evolution (4/4)

Chess Dataset

Representation Language: OWL

Wooden and Plastic are disjoint

[Wooden owl:disjointWith Plastic]

Change: Add([King rdf:type Plastic])

Is the King Black?

Is the King Wooden?

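Figure: the chess dataset, with the classes ChessPiece, Wooden, Plastic, Red, White and Black at the schema level (Wooden and Plastic marked as disjoint) and the instance King at the data level.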


Side effects in evolution
Side-effects in Evolution

  • Changes should not undermine the “quality” of the dataset

    • Side-effects: additional changes that need to be applied along with the original change to maintain knowledge integrity and quality

    • Consistency, coherency, custom constraints, quality metrics, …

  • Main challenge in determining the evolution result

    • Determining side-effects


Determining side effects
Determining Side-effects

  • Challenges in determining side-effects

    • Evolution result not always obvious (even for humans)

      • Understand the process of change

      • Various philosophical considerations involved

    • Selection involved (extra-logical considerations)

      • Domain expertise

      • Preferences (trust, provenance, axiom “strength” or “entrenchment”)

  • Early evolution approaches rather naïve in this respect

    • Ignored such issues or addressed them in an ad-hoc manner


Belief change
Belief Change

Belief change (often referred to as belief revision)

The process of modifying a knowledgebase in the face of new, possibly contradictory knowledge

Mature, well-established field

Focuses on logical formalisms (propositional, first-order logic)

Recent survey on belief change [FH11]

Aims to understand the process of change

The philosophical/logical counterpart of dataset evolution

Can provide solutions and inspiration


Cross fertilization with belief change
Cross-Fertilization with Belief Change

  • Cross-fertilization beneficial [Flo06, FPA05, FPA06]

  • Benefits

    • Similar problems

    • Differences on the underlying intuitions are minimal

    • Belief change field more mature

    • Frame problems and provide inspiration towards a solution

    • Protect from pitfalls

    • Avoid “reinventing the wheel”

  • Problems

    • Representation languages and formalisms are different

    • Assumptions regarding the underlying representation language

      • These assumptions do not hold for LOD representation languages

  • Can reuse the ideas, not the results themselves


Talk structure d2
Talk Structure (D2)

  • Introduction to RDF/S, DLs, OWL

  • Remote change management

    • Introduction, definition of subfields

    • Literature review

    • An approach for change detection [PFF+13]

  • Repair

    • Introduction, definition of subfields

    • Literature review

    • An approach for validity repair [RFC11]

  • Data and Knowledge Evolution

    • Introduction, connection with belief change

    • Understanding the process of change

    • Literature review


Challenges and considerations
Challenges and Considerations

  • List of challenges and problems related to evolution

    • As well as some answers from the belief change field

  • Challenges and the complexity of formalisms

    • Some of the problems do not appear in simpler formalisms (RDF)

    • Some of the problems are only relevant in the presence of schema

      • Data changes are simpler (on a fixed schema)

    • Part of the discussion only relevant for DL, OWL


Importance of implicit data example
Importance of Implicit Data (Example)

Chess Dataset

Representation Language: RDF

Change: Del([King rdf:type Black])

Is the King Wooden?

Figure: the chess dataset, with the classes ChessPiece, Wooden, Plastic, Red, White and Black at the schema level and the instance King at the data level.


Importance of implicit data

Explicit and implicit data equally important

The coherence viewpoint

King is Wooden

The closure of the dataset is considered during changes

Belief set semantics

Implicit data persistent

Explicit support not necessary for implicit data

No discrimination

No need to distinguish explicit data from implicit

Redundant data can be deleted

Explicit data more important than implicit

The foundational viewpoint

King is not Wooden

Only explicit knowledge is considered during changes

Belief base semantics

Implicit data volatile

Retained only as long as there is explicit support

Discrimination

Explicit data should be explicitly marked as such

Redundant data should persist

Importance of Implicit Data
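
The two viewpoints can be contrasted with a minimal sketch. The schema edges below (in particular `Black rdfs:subClassOf Wooden`) are assumptions made for illustration, since the original figure is not reproduced in this transcript; the function names are hypothetical as well. The closure only covers subclass transitivity and type inheritance, not full RDFS inference.

```python
SUBCLASS = "rdfs:subClassOf"
TYPE = "rdf:type"

# Explicit triples of a toy Chess dataset (assumed content, see lead-in).
explicit = {
    ("Wooden", SUBCLASS, "ChessPiece"),
    ("Black", SUBCLASS, "Wooden"),  # assumed schema edge
    ("King", TYPE, "Black"),
}


def closure(triples):
    """Close a triple set under subclass transitivity and type inheritance."""
    closed = set(triples)
    changed = True
    while changed:
        new = set()
        for (s, p, o) in closed:
            if p not in (SUBCLASS, TYPE):
                continue
            for (s2, p2, o2) in closed:
                if p2 == SUBCLASS and s2 == o:
                    new.add((s, p, o2))
        changed = not new <= closed
        closed |= new
    return closed


def delete_coherence(triples, target):
    """Belief set semantics: work on the closure, so implicit triples persist."""
    return closure(triples) - {target}


def delete_foundational(triples, target):
    """Belief base semantics: work on explicit triples, then re-derive."""
    return closure(triples - {target})


target = ("King", TYPE, "Black")
king_wooden = ("King", TYPE, "Wooden")
print(king_wooden in delete_coherence(explicit, target))     # True
print(king_wooden in delete_foundational(explicit, target))  # False
```

In this toy case the deleted triple cannot be re-derived from what remains of the closure. In general, a coherence-style deletion may have to remove further triples to prevent that.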


Redundant Data

Chess Dataset

Representation Language: RDF

Change: Add([King rdf:type Black])

[Figure: Chess dataset. Schema level: ChessPiece, Wooden, Plastic, Red, White, Black. Data level: King.]


The King Is Black

Chess Dataset

Representation Language: RDF

Observation: the King is Black

Change: Add([King rdf:type Black])

Is the King Wooden?

[Figure: Chess dataset. Schema level: ChessPiece, Wooden, Plastic, Red, White, Black. Data level: King.]


Paint It Black

Chess Dataset

Representation Language: RDF

Action: King is painted Black

Change: Add([King rdf:type Black])

Is the King Wooden?

[Figure: Chess dataset. Schema level: ChessPiece, Wooden, Plastic, Red, White, Black. Data level: King.]


Static and Dynamic Worlds

  • Same dataset, same change, but different expected result

    • Different semantics between the two cases [KM91]

    • Different operations

  • Static world change semantics

    • The world does not change, but our perception of it changes

    • Modeling or conceptualization problems, new observation etc

  • Dynamic world change semantics

    • The world changes, and we need to keep ourselves up-to-date

    • No problems with the original conceptualization


Types of Operations

  • Static world

    • Revision (add)

    • Contraction (delete)

  • Dynamic world

    • Update (add)

    • Erasure (delete)

  • Plus some more (forget, expansion, …)

    • Less well-studied

    • Ignored for this talk

    • Irrelevant for LOD or trivial
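
The difference between revision (static world) and update (dynamic world) can be illustrated with a small propositional sketch in the spirit of the book/magazine example associated with [KM91]. The choice of Hamming distance between worlds, and all function names, are assumptions made for brevity; they are one possible instantiation, not the only one.

```python
from itertools import product

ATOMS = ("book", "magazine")  # "is on the table"


def models(formula):
    """All truth assignments over ATOMS (as tuples) that satisfy `formula`."""
    return {
        world
        for world in product((False, True), repeat=len(ATOMS))
        if formula(dict(zip(ATOMS, world)))
    }


def distance(w1, w2):
    """Hamming distance: number of atoms on which two worlds disagree."""
    return sum(a != b for a, b in zip(w1, w2))


def revise(k_models, phi_models):
    """Static world: keep the models of phi closest to K taken as a whole."""
    if not k_models or not phi_models:
        return set(phi_models)
    best = min(distance(k, p) for k in k_models for p in phi_models)
    return {p for p in phi_models
            if any(distance(k, p) == best for k in k_models)}


def update(k_models, phi_models):
    """Dynamic world: move each model of K separately to its closest phi-models."""
    result = set()
    for k in k_models:
        if not phi_models:
            break
        best = min(distance(k, p) for p in phi_models)
        result |= {p for p in phi_models if distance(k, p) == best}
    return result


# K: exactly one of book/magazine is on the table. New information: the book is.
K = models(lambda v: v["book"] != v["magazine"])
PHI = models(lambda v: v["book"])

print(revise(K, PHI))  # only (book=True, magazine=False)
print(update(K, PHI))  # book=True, magazine unknown
```

Revision concludes that the magazine is not on the table (our earlier belief was merely incomplete), whereas update leaves the magazine undetermined (someone may have put the book there).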


Example: Revision and Contraction

Chess Dataset

Representation Language: OWL

Change #1

I believe that the King is not Black

Add([King rdf:type NotBlack], [NotBlack owl:complementOf Black])

Change #2

I do not believe that the King is Black

Del([King rdf:type Black])

[Figure: Chess dataset. Schema level: ChessPiece, Wooden, Plastic, Red, White, Black, NotBlack. Data level: King.]


Expressing the change
Expressing the Change Anja Jentzsch. http://lod-cloud.net/

  • Different paradigms for expressing the change

    • Modification-based

      • “Add([King rdf:type NotBlack], [NotBlack owl:complementOf Black])”

      • The exact modifications that should be applied to accommodate the new knowledge

      • Must know the conceptualization

      • Closer to the ontology expert

    • Fact-based

      • “I believe that the King is not Black”

      • A new fact that should be accommodated in the dataset

      • Extra layer of abstraction (extra step required to determine modifications)

      • Closer to the domain expert

  • Handling multiple changes

    • Iterated belief change

    • Package versus choice semantics (contraction and erasure)

    • Merging
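
As a sketch of the package versus choice distinction, the success conditions for contracting a belief set by a set of sentences can be written as follows. The notation is an assumption, not taken from the slides: $K$ is the belief set, $A$ the set of sentences to contract by, $\div$ the contraction operator and $\mathrm{Cn}$ the consequence operator.

```latex
% Package contraction: none of the (non-tautological) sentences in A survives
A \cap \mathrm{Cn}(\emptyset) = \emptyset \;\Rightarrow\; A \cap (K \div A) = \emptyset

% Choice contraction: at least one sentence in A is given up
A \not\subseteq \mathrm{Cn}(\emptyset) \;\Rightarrow\; A \not\subseteq (K \div A)
```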


Evolution Principles (Partial List)

  • Principle of Success (Primacy of New Information)

    • New information is unconditionally accepted

    • Relaxed in non-prioritized belief change

  • Principle of Validity (Consistency Maintenance)

    • Belief change: usually logical consistency

    • LOD evolution: consistency, coherency, custom constraints, …

  • Principle of Minimal Change

    • Determine the side-effects that have minimal impact

      • But satisfying the other principles

    • Corresponds to the selection process

    • Minimality depends on the task, context, user, application, …

    • Different postulates and intuitions (recovery, relevance etc)

    • Different metrics (model-based, formula-based, cardinality etc)


Understanding the Principles

Chess Dataset

Representation Language: OWL

Wooden and Plastic are disjoint

[Wooden owl:disjointWith Plastic]

Change: Add([King rdf:type Plastic])

Invalidity (basically, inconsistency)

The King is both Wooden and Plastic

Three options (Minimal Change)

[Figure: Chess dataset. Schema level: ChessPiece, Wooden, Plastic (Wooden and Plastic marked disjoint), Red, White, Black. Data level: King.]
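
The interplay of the three principles on this example can be sketched as a brute-force search. The dataset content below is an assumption (the figure is not reproduced here): the King is taken to be explicitly Black, and Black is taken to be a subclass of Wooden. Under these assumptions the search happens to yield three alternatives; whether they coincide with the three options of the original slide cannot be confirmed. Minimality is measured by cardinality, which is only one of several possible metrics.

```python
from itertools import combinations

TYPE, SUBCLASS, DISJOINT = "rdf:type", "rdfs:subClassOf", "disjoint"

# Assumed content of the Chess dataset (see lead-in).
dataset = {
    ("Black", SUBCLASS, "Wooden"),
    ("Wooden", DISJOINT, "Plastic"),
    ("King", TYPE, "Black"),
}
change = ("King", TYPE, "Plastic")


def types_of(triples, instance):
    """All classes of `instance`, following subclass edges upwards."""
    found = {o for (s, p, o) in triples if s == instance and p == TYPE}
    frontier = set(found)
    while frontier:
        frontier = {o for (s, p, o) in triples
                    if p == SUBCLASS and s in frontier} - found
        found |= frontier
    return found


def is_valid(triples):
    """Principle of Validity: no instance is in two disjoint classes."""
    for inst in {s for (s, p, _) in triples if p == TYPE}:
        classes = types_of(triples, inst)
        if any(p == DISJOINT and a in classes and b in classes
               for (a, p, b) in triples):
            return False
    return True


def minimal_side_effects(triples, new_triple):
    """Smallest sets of old triples whose removal restores validity.

    The new triple is never removed (Principle of Success); the smallest
    removal sets are returned (Principle of Minimal Change, by cardinality).
    """
    old = sorted(triples - {new_triple})
    for size in range(len(old) + 1):
        options = [set(removed) for removed in combinations(old, size)
                   if is_valid((triples - set(removed)) | {new_triple})]
        if options:
            return options
    return []


options = minimal_side_effects(dataset, change)
for removed in options:
    print("remove", removed)
```

Choosing among the returned alternatives is the selection process mentioned under the Principle of Minimal Change; it needs extra-logical input such as preferences.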


Non-obvious Side-effects

Chess Dataset

Representation Language: ALC DL

I don’t believe that all White items are Chess_Pieces

Replace subsumptions with:

White ⊓ Chess_Piece ⊑ Plastic

Plastic ⊑ White ⊔ Chess_Piece

[Figure: diagram relating Chess_Piece, White and Plastic, shown next to the Chess dataset. Schema level: ChessPiece, Wooden, Plastic, Red, White, Black. Data level: King.]


Talk Structure (D3)

  • Introduction to RDF/S, DLs, OWL

  • Remote change management

    • Introduction, definition of subfields

    • Literature review

    • An approach for change detection [PFF+13]

  • Repair

    • Introduction, definition of subfields

    • Literature review

    • An approach for validity repair [RFC11]

  • Data and Knowledge Evolution

    • Introduction, connection with belief change

    • Understanding the process of change

    • Literature review


Classes of Belief Change Approaches (1/2)

  • Postulates (one set for each operation)

    • Formalize the principles, using logical conditions

    • Essentially define the properties of a rational change operator

      • Some principles not considered or given varying semantics

      • Principle of Minimal Change is the most controversial

    • Do not uniquely define an operator

      • A class of operators (expected rational results)

      • Extra-logical considerations would determine the actual result

      • Operator-specific (preferences, axiom strength, hard-coded semantics, …)

    • Belief change context [AGM85, KM91, Han91]

    • Evolution context [FKAC13, WWT10, QLB06a, QLB06b]


Classes of Belief Change Approaches (2/2)

  • Construction methods

    • Intuitive constructions for a family of operators of a certain type

    • Representation theorems

      • Proof that the constructed family corresponds exactly to the class of operators that satisfy a certain set of postulates

    • Can be used as “templates” to construct rational change operators

    • Parameterized selection process

      • Preferences, axiom strength, etc

    • Popular in belief change, not so much in evolution

  • Explicit algorithms

    • Implement a specific operator that satisfies some of the postulates

    • Hard-coded or parameterized selection process

    • Popular in evolution context, not so much in belief change
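
As an illustration of a construction method with a parameterized selection process, the following sketch applies the idea of partial meet contraction to a small finite belief base. Working on a finite base rather than a logically closed belief set is a simplification; the base content, the truth-table entailment check and all names are assumptions made for this example.

```python
from itertools import combinations, product

ATOMS = ("p", "q")

# Hypothetical belief base: labels map to truth functions over a valuation.
BASE = {
    "p": lambda v: v["p"],
    "p -> q": lambda v: (not v["p"]) or v["q"],
    "q": lambda v: v["q"],
}


def entails(labels, goal):
    """True iff every valuation satisfying all of `labels` satisfies `goal`."""
    for values in product((False, True), repeat=len(ATOMS)):
        v = dict(zip(ATOMS, values))
        if all(BASE[name](v) for name in labels) and not goal(v):
            return False
    return True


def remainders(labels, goal):
    """Maximal subsets of `labels` that do not entail `goal`."""
    result = []
    for size in range(len(labels), -1, -1):  # largest candidates first
        for subset in combinations(sorted(labels), size):
            candidate = frozenset(subset)
            if entails(candidate, goal):
                continue
            if not any(candidate < bigger for bigger in result):
                result.append(candidate)
    return result


def partial_meet_contraction(labels, goal, select):
    """Intersect the remainders picked by the selection function `select`.

    `select` must return a non-empty sub-list of the remainders it is given.
    """
    rems = remainders(labels, goal)
    if not rems:  # goal is a tautology: nothing can be given up
        return frozenset(labels)
    return frozenset.intersection(*select(rems))


goal = BASE["q"]
full_meet = partial_meet_contraction(set(BASE), goal, lambda rems: rems)
prefer_p = partial_meet_contraction(
    set(BASE), goal, lambda rems: [r for r in rems if "p" in r]
)
print(full_meet)  # keeps only what all remainders share
print(prefer_p)   # keeps "p", gives up "p -> q" and "q"
```

The selection function is the "template" parameter: different preferences yield different operators from the same family.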


Discussion on the Operator Types (1/2)

  • Connections between the various operators

    • Static: revision/contraction interdefinable [AGM85]

    • Dynamic: update/erasure interdefinable [KM91]

    • Model-theoretic characterization of the connection between static/dynamic worlds (revision-update, contraction-erasure) [KM91]

  • Postulates critical for establishing those results

  • Revision and update more useful in practice

    • Contraction/erasure only used to express agnosticism

  • Contraction and erasure more interesting from a theoretical perspective

    • More fundamental operations


Discussion on the Operator Types (2/2)

  • Revise with φ (in belief change)

    • Contract ¬φ

      • This resolves, a priori, any potential inconsistency problems

    • Add φ (without side-effects)

  • Revise with φ (in LOD)

    • Contract data that could potentially cause problems

      • Inconsistency, incoherency, …

    • Add φ (without side-effects)

  • Contraction is the basis for revision

    • Simpler operation

    • Basically, if you know how to contract, you know how to revise

    • Most of the focus in belief change and also in LOD evolution

  • Same for update/erasure
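
The two-step recipe for revision in belief change is commonly written as the Levi identity, and the reverse direction as the Harper identity. The notation below is an assumption, not taken from the slides: $K$ is a belief set, $*$ is revision, $\div$ is contraction, and $+$ is expansion, i.e. $K + \varphi = \mathrm{Cn}(K \cup \{\varphi\})$.

```latex
% Levi identity: revision obtained from contraction followed by expansion
K * \varphi = (K \div \neg\varphi) + \varphi

% Harper identity: contraction obtained from revision
K \div \varphi = K \cap (K * \neg\varphi)
```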


Evolution via Editors

  • Features

    • Intuitive interfaces

      • Easy to add/delete triples (but not facts)

    • Some help for determining the side-effects of a change

      • Embedded reasoners and/or debugging/repair tools to propose side-effects

    • Additional facilities

      • Versioning, monitoring, undo/redo, …

  • Main problems

    • User should be both ontology and domain expert

    • Not applicable in some cases

      • Examples: automated agents, time-critical applications, massive streaming input

    • No formal properties

  • Examples

    • Protégé (http://protege.stanford.edu/)

    • NeOn toolkit (http://neon-toolkit.org/wiki/Main_Page)

    • OntoStudio (http://www.semafora-systems.com/en/products/ontostudio/)

    • KAON2 (http://kaon2.semanticweb.org/)


Declarative Approaches

  • SPARQL Update (http://www.w3.org/TR/sparql11-update/)

    • For RDF

    • Fixed semantics, no side-effects

    • Data and schema operations (also bulk changes)

  • RUL [MSCK05]

    • For RDF/S, taking into account RDFS semantics

    • Fixed semantics, predefined set of side-effects per operation

    • Only for data operations (also bulk changes)

  • EvoPat [RHTA10]

    • Declaratively associate changes with side-effects (using SPARQL)

    • SPARQL queries determine whether side-effects should be applied

    • SPARQL update statements represent such side-effects

  • Tempus fugit [LRV09]

    • Event-driven, declarative specification of the operators’ semantics


Fixed-Operations Approach

  • Standard approach in the early days (e.g., [SMMS02])

    • Set of supported operations (Add_Class, Add_Domain, …)

    • Identify potential problems and side-effects per operation

    • Decision is hard-coded or user-defined (from a set of options)

      • Example: when deleting a subsumption, what happens to implicit subsumptions?

    • Automatic or semi-automatic

  • Problems

    • No consensus on the language of changes

      • No limit on the number of operations

      • What about unknown/unsupported operators?

    • No exhaustive formal analysis of potential side-effects

    • No formal properties or other guarantees

    • Incomplete understanding of the change process


Approaches Inspired by Belief Change (1/2)

  • Revision in ALU DL [LM04]

    • Using preferences among axioms

    • Inspired by “epistemic entrenchment”

  • Revision in generic DLs [QD09]

    • Three model-based revision operators for DLs

    • Emphasis on the Principle of Irrelevance of Syntax

      • Semantical, rather than syntactical, considerations should drive the result

  • Revision in DL-Lite [GQW12]

    • Using a graph-based algorithm

    • For data changes only (Abox)

  • Update and erasure in RDF/S [GHV06, GHV11]

    • Taking into account RDFS inference

    • Update is trivial, erasure is challenging (due to RDFS inference)


Approaches Inspired by Belief Change (2/2)

  • Using the maxi-adjustment algorithm [MLB05, QLB06a, QLB06b]

    • Used to repair inconsistencies in propositional knowledge bases

    • Requires a stratification of the knowledge

    • Adapted for disjunctive DLs

  • Using kernel operators [Han94]

    • Kernels: minimal sets of formulas leading to inconsistency

    • Minimal Incoherence-Preserving Sub-TBoxes (MIPS) [SC03]

    • OWL [HWK06]

    • DLs [QHHP08]

    • Generic formalisms with no negation (such as RDF) [RW07]
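
The kernel idea can be sketched on a tiny propositional base: compute the minimal inconsistent subsets, then let an incision function remove at least one formula from each. The base content, the ranking used by the incision function and all names are assumptions made for illustration; the brute-force enumeration is only practical for very small bases.

```python
from itertools import combinations, product

ATOMS = ("wooden", "plastic")

# Hypothetical base about the King; labels map to truth functions.
BASE = {
    "king_wooden": lambda v: v["wooden"],
    "king_plastic": lambda v: v["plastic"],
    "disjoint": lambda v: not (v["wooden"] and v["plastic"]),
}


def consistent(labels):
    """True iff some valuation satisfies every formula in `labels`."""
    return any(
        all(BASE[name](dict(zip(ATOMS, values))) for name in labels)
        for values in product((False, True), repeat=len(ATOMS))
    )


def kernels(labels):
    """Minimal inconsistent subsets of `labels`, smallest first."""
    found = []
    for size in range(1, len(labels) + 1):
        for subset in combinations(sorted(labels), size):
            candidate = frozenset(subset)
            if consistent(candidate):
                continue
            if not any(k < candidate for k in found):
                found.append(candidate)
    return found


def kernel_repair(labels, incision):
    """Remove the formulas that `incision` cuts out of the kernels."""
    cut = incision(kernels(labels))
    return frozenset(labels) - frozenset(cut)


# Assumed ranking: lower rank means less entrenched, hence removed first.
rank = {"king_wooden": 1, "disjoint": 2, "king_plastic": 3}


def incision(found_kernels):
    """Cut the least entrenched formula out of every kernel."""
    return {min(k, key=rank.get) for k in found_kernels}


print(kernels(set(BASE)))
print(kernel_repair(set(BASE), incision))
```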


Postulation Approaches in Evolution (1/3)

  • AGM: dominating paradigm in belief change [AGM85]

    • The single most influential work in the field of belief change

  • Contributions

    • AGM postulates: two sets of 6 basic and 2 supplementary postulates

      • One set for each operator (revision and contraction)

    • Plus various related results

      • Partial meet contraction

      • Representation theorems

      • Connections between operators

  • Only for classical logics (satisfying certain assumptions)

    • Propositional, first-order, modal logics, …

    • Not for LOD formalisms (RDF/S, DLs, OWL)
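
For reference, the six basic contraction postulates are usually stated along the following lines. The notation is an assumption ($K$ a logically closed belief set, $\div$ contraction, $\mathrm{Cn}$ the consequence operator), and the names and ordering of the postulates vary slightly across presentations.

```latex
\begin{align*}
&\text{Closure:}        && K \div \varphi = \mathrm{Cn}(K \div \varphi) \\
&\text{Inclusion:}      && K \div \varphi \subseteq K \\
&\text{Vacuity:}        && \varphi \notin K \;\Rightarrow\; K \div \varphi = K \\
&\text{Success:}        && \varphi \notin \mathrm{Cn}(\emptyset) \;\Rightarrow\; \varphi \notin K \div \varphi \\
&\text{Extensionality:} && \mathrm{Cn}(\{\varphi\}) = \mathrm{Cn}(\{\psi\}) \;\Rightarrow\; K \div \varphi = K \div \psi \\
&\text{Recovery:}       && K \subseteq \mathrm{Cn}\bigl((K \div \varphi) \cup \{\varphi\}\bigr)
\end{align*}
```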


Postulation Approaches in Evolution (2/3)

  • AGM contraction postulates adapted for monotonic logics [Flo06, FPA05, FPA06]

    • Includes all LOD formalisms

  • But: for many such logics, no contraction operator satisfying the postulates exists

    • Cannot find a proper result in certain cases

    • Necessary and sufficient conditions for the existence of such an operator [FPA06, Flo06]

    • Negative results for RDF/S, OWL, most DLs [FPA05, RWFA13]

  • Problem stems from the postulate of recovery [AGM85]

    • Captures the Principle of Minimal Change

    • Controversial [Han91]


Postulation Approaches in Evolution (3/3)

  • Replacing recovery with optimal recovery [FPA06, FHP+06]

    • Equivalent to recovery for classical logics

    • But weaker in general

    • Not particularly successful either

  • Replacing recovery with relevance [Han91]

    • An intuitive, well-established alternative to recovery

    • Equivalent with recovery for classical logics

    • Applicable under quite general conditions [RWFA13]

      • Applicable for all compact logics

      • Includes RDF/S, practically all DLs and OWL flavors and profiles

    • Adequate for expressing the principles of contraction in LOD languages

    • Connections with recovery established for non-classical logics
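
The relevance postulate is usually formulated as below: a sentence is removed only if it contributes to deriving the contracted sentence. The notation is an assumption ($K$ the original set, $\div$ contraction, $\mathrm{Cn}$ the consequence operator).

```latex
% Relevance: every removed sentence \beta plays a role in implying \varphi
\beta \in K \setminus (K \div \varphi)
\;\Rightarrow\;
\exists K' :\; (K \div \varphi) \subseteq K' \subseteq K,\quad
\varphi \notin \mathrm{Cn}(K'),\quad
\varphi \in \mathrm{Cn}(K' \cup \{\beta\})
```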


Principle of Adequacy of Representation

  • Principle of Adequacy of Representation

    • The evolution result should be expressible in the same formalism as the original dataset

    • Obvious and trivial

  • Not always compatible with our requirements for the evolution result

    • Postulates (e.g., AGM postulates)

    • Specific incarnations of the Principle of Minimal Change

    • Specific computational methods or classes of operators

  • Two stages for the computation [CGKZ12]

    • Find the “optimal” evolution result according to the requirements

    • Express it in the target language (not always possible)

      • Inexpressibility results


Inexpressibility for Classes of Operators

  • Generic contraction methods [CGKZ12]

    • Syntactic: remove a minimal set of explicit axioms

    • Formula-based: remove a minimal set of axioms from the closure

      • Three different semantics for minimality

    • Model-based: modify the model in a minimal manner

      • Eight different methods to find the “minimal” distance between models

  • Existing contraction algorithms can be categorized along these generic classes of methods

  • Different contraction methods not compatible in general (for DLs)

    • Model-based and formula-based are compatible in classical logics

  • Inexpressibility results for DL-Lite, EL (i.e., OWL2 QL, OWL2 EL) [CGKZ12]

  • Proposal: a “hybrid” operator combining ideas from syntactic and formula-based approaches [CGKZ12]


More Inexpressibility Results

  • DL-Lite evolution [CKNZ10]

    • Focusing on model-based and formula-based approaches for contraction

    • Inexpressibility results

    • Propose a formula-based approach

  • DL revision [LLMW06]

    • Model-based approach, limited to Abox only (data level)

    • Inexpressibility results

    • Propose a new DL that supports model-based evolution

  • Approximations

    • DL-LiteF [GLPR07, GLPR09]

      • Update and erasure approximation algorithms for data-level changes only

      • Alternative: extend DL-LiteF to make sure that result is expressible

    • DL-Lite [WWT10]

      • Provide postulates and approximation algorithms for revision


Other Approaches

  • Evolution using ideas from argumentation frameworks [MRF08]

    • ALC DL

    • Inconsistency in a dataset is an “attack” between arguments

    • Acceptability semantics used to resolve such attacks and eliminate inconsistencies

    • Useful for both debugging and evolution

  • Evolution can be reduced to debugging/repair [HHH+05]

    • Apply the change

    • Then repair the result to resolve problems (Principle of Validity)

    • Making sure the change is not “undone” during repair (Principle of Success)


Evolution Under Custom Constraints

  • Evolution in the presence of custom validity constraints [KFAC07, FKAC13]

  • Methodology

    • Apply the change (Principle of Success)

    • Guarantee satisfaction of constraints (Principle of Validity)

    • Use a preference to determine minimality (Principle of Minimal Change)

  • Features

    • Generic method, applied for RDF/S evolution

    • A formal expression of the principles for the proposed setting

    • Exhaustive method to determine all possible side-effects and identify the “best” (according to the preference)

    • Constrain allowed preferences for rationality and performance

  • Based on similar ideas as the repairing approach of [RFC11]


Summary and Conclusions: Evolution

  • The problem of evolution is very challenging

    • Several issues need to be considered

      • Not obvious to a newcomer

      • Often ignored

  • Evolution approaches

    • Direct: manual, based on fixed operators, declarative

    • Indirect: postulation attempts

    • Adapted: adapting belief change algorithms or methods

  • Other possible directions (related to LOD)

    • Adapt for the “linked” character of LOD

      • Evolution during propagation or after change detection

      • Extra knowledge that can be exploited for adapting preferences, fine-tuning of automated algorithms etc


Thank You!


References (1/18)

  • [AAM09] C. Allocca, M. d'Aquin, E. Motta. Detecting Different Versions of Ontologies in Large Ontology Repositories. IWOD-09, 2009.

  • [ADA98] M.L. Abate, K.V. Diegert, H.W. Allen. A Hierarchical Approach to Improving Data Quality. Data Quality Journal, 4(1), 1998.

  • [AGM85] C. Alchourron, P. Gärdenfors, D. Makinson. On the Logic of Theory Change: Partial Meet Contraction and Revision Functions. Journal of Symbolic Logic, 50:510-530, 1985.

  • [AH06] S. Auer, H. Herre. A Versioning and Evolution Framework for RDF Knowledge Bases. PSI-06, Revised Papers, 2006.

  • [BC09] C. Bizer, R. Cyganiak. Quality-driven Information Filtering Using the WIQA Policy Framework. Journal of Web Semantics, 7:1–10, 2009.

  • [BLHL01] T. Berners-Lee, J. Hendler, O. Lassila. The Semantic Web. Scientific American, 2001.


References (2/18)

  • [CGKZ12] B. Cuenca Grau, E. Kharlamov, D. Zheleznyakov. Ontology Contraction: Beyond the Propositional Paradise. AMW-12, 2012.

  • [CKNZ10] D. Calvanese, E. Kharlamov, W. Nutt, D. Zheleznyakov. Evolution of DL-Lite Knowledge Bases. ISWC-10, 2010.

  • [CMDZ10] C.A. Curino, H.J. Moon, A. Deutsch, C. Zaniolo. Update Rewriting and Integrity Constraint Maintenance in a Schema Evolution Support System: PRISM++. PVLDB 4(2):117-128, 2010.

  • [CMZ08] C.A. Curino, H.J. Moon, C. Zaniolo. Graceful Database Schema Evolution: The PRISM Workbench. PVLDB 1(1):761-772, 2008.

  • [CQ13] G. Cheng, Y. Qu. Relatedness Between Vocabularies on the Web of Data: A Taxonomy and an Empirical Study. Web Semantics: Science, Services and Agents on the World Wide Web, 2013. Available at: http://dx.doi.org/10.1016/j.websem.2013.02.001


References (3/18)

  • [Deu09] A. Deutsch. FOL Modeling of Integrity Constraints (Dependencies). Encyclopedia of Database Systems, 2009.

  • [DA09] R. Djedidi, M. Aufaure. Change Management Patterns (CMP) for Ontology Evolution Process. IWOD-09, 2009.


References (4/18)

  • [FH10] C. Furber, M. Hepp. Using Semantic Web Resources for Data Quality Management. EKAW-10, 2010.

  • [FH11] E. Ferme, S.O. Hansson. AGM 25 Years: Twenty-five Years of Research in Belief Change. Journal of Philosophical Logic 40:295-331, 2011.

  • [FHP+06] G. Flouris, Z. Huang, J.Z. Pan, D. Plexousakis, H. Wache. Inconsistencies, Negations and Changes in Ontologies. AAAI-06, 2006.

  • [FKAC13] G. Flouris, G. Konstantinidis, G. Antoniou, V. Christophides. Formal Foundations for RDF/S KB Evolution. International Journal on Knowledge and Information Systems, 35(1):153-191, 2013.

  • [Flo06] G. Flouris. On Belief Change and Ontology Evolution. Ph.D. thesis, University of Crete, 2006.


References (5/18)

  • [FPA05] G. Flouris, D. Plexousakis, G. Antoniou. On Applying the AGM Theory to DLs and OWL. ISWC-05, 2005.

  • [FPA06] G. Flouris, D. Plexousakis, G. Antoniou. On Generalizing the AGM Postulates. STAIRS-06, 2006.

  • [FMK+08] G. Flouris, D. Manakanatas, H. Kondylakis, D. Plexousakis, G. Antoniou. Ontology Change: Classification and Survey. Knowledge Engineering Review, 23(2):117-152, 2008.

  • [FMV10] E. Franconi, T. Meyer, I. Varzinczak. Semantic Diff as the Basis for Knowledge Base Versioning. NMR-10, 2010.

  • [FRPV+12] G. Flouris, Y. Roussakis, M. Poveda-Villalon, P.N. Mendes, I. Fundulaki. Using Provenance for Quality Assessment and Repair in Linked Open Data. EvoDyn-12, 2012.


References (6/18)

  • [GHV06] C. Gutierrez, C. Hurtado, A. Vaisman. The Meaning of Erasing in RDF Under the Katsuno-Mendelzon Approach. WebDB-06, 2006.

  • [GHV11] C. Gutierrez, C. Hurtado, A. Vaisman. RDFS Update: From Theory to Practice. ESWC-11, 2011.

  • [GLPR07] G. De Giacomo, M. Lenzerini, A. Poggi, R. Rosati. On the Approximation of Instance Level Update and Erasure in Description Logics. AAAI-07, 2007.

  • [GLPR09] G. De Giacomo, M. Lenzerini, A. Poggi, R. Rosati. On Instance-level Update and Erasure in Description Logic Ontologies. Journal of Logic and Computation 19(5):745-770, 2009.

  • [GQW12] S. Gao, G. Qi, H. Wang. A New Operator for ABox Revision in DL-Lite. AAAI-12, 2012.

  • [Gru93] T.R. Gruber. A Translation Approach to Portable Ontology Specifications. Knowledge Acquisition, 5 (2), 1993.


References (7/18)

  • [Han91] S.O. Hansson. Belief Contraction Without Recovery. Studia Logica 50(2):251-260, 1991.

  • [Han94] S.O. Hansson. Kernel Contraction. Journal of Symbolic Logic, 59(3):845-859, 1994.

  • [HGR12] M. Hartung, A. Gross, E. Rahm. COnto-diff: Generation of Complex Evolution Mappings for Life Science Ontologies. Journal of Biomedical Informatics, 2012.

  • [HH00] J. Heflin, J. Hendler. Dynamic Ontologies on the Web. AAAI-00, 2000.

  • [HHM+10] H. Halpin, P.J. Hayes, J.P. McCusker, D.L. McGuinness, H.S. Thompson. When owl:sameAs Isn’t the Same: An Analysis of Identity in Linked Data. ISWC-10, 2010.

  • [HHH+05] P. Haase, F. van Harmelen, Z. Huang, H. Stuckenschmidt, Y. Sure. A Framework for Handling Inconsistency in Changing Ontologies. ISWC-05, 2005.

  • [HHP+10] A. Hogan, A. Harth, A. Passant, S. Decker, A. Polleres. Weaving the Pedantic Web. LDOW-10, 2010.

  • [HP04] J. Heflin, J.Z. Pan. A Model Theoretic Semantics for Ontology Versioning. ISWC-04, 2004.

  • [HS05] Z. Huang, H. Stuckenschmidt. Reasoning with Multi-version Ontologies: A Temporal Logic Approach. ISWC-05, 2005.

  • [HWK06] C. Halaschek-Wiener, Y. Katz. Belief Base Revision for Expressive Description Logics. OWLED-06, 2006.


References (8/18)

  • [ILK12] D.H. Im, S.W. Lee, H.J. Kim. A Version Management Framework for RDF Triple Stores. International Journal of Software Engineering and Knowledge Engineering, 22(1):85-106, 2012.

  • [JAP09] M. Javed, Y. Abgaz, C. Pahl. A Pattern-based Framework of Change Operators for Ontology Evolution. OTM-09, 2009.

  • [Jur74] J.M. Juran. The Quality Control Handbook. McGraw-Hill, New York, 1974.


References (9/18)

  • [KFAC07] G. Konstantinidis, G. Flouris, G. Antoniou, V. Christophides. Ontology Evolution: A Framework and its Application to RDF. SWDB-ODBIS-07, 2007.

  • [KFKO02] M. Klein, D. Fensel, A. Kiryakov, D. Ognyanov. Ontology Versioning and Change Detection on the Web. EKAW-02, 2002.

  • [KHS12] M. Knuth, J. Hercher, H. Sack. Collaboratively Patching Linked Data. USEWOD-12, 2012.

  • [KLGE07] N. Keberle, Y. Litvinenko, Y. Gordeyev, V. Ermolayev. Ontology Evolution Analysis with OWL-MeT. IWOD-07, 2007.

  • [KM91] H. Katsuno, A.O. Mendelzon. On the Difference Between Updating a Knowledge Base and Revising It. KR-91, 1991.

  • [KN03] M. Klein, N. Noy. A Component-based Framework for Ontology Evolution. IJCAI-03 Workshop on Ontologies and Distributed Systems, CEUR-WS, vol. 71, 2003.

  • [KPS+06] A. Kalyanpur, B. Parsia, E. Sirin, B. Cuenca Grau. Repairing Unsatisfiable Concepts in OWL Ontologies. ESWC-06, 2006.

  • [KWW08] B. Konev, D. Walther, F. Wolter. The Logical Difference Problem for Description Logic Terminologies. IJCAR-08, 2008.

  • [KWZ08] R. Kontchakov, F. Wolter, M. Zakharyaschev. Can you Tell the Difference Between DL-Lite Ontologies? KR-08, 2008.


References (10/18)

  • [LB10] J. Lehmann, L. Buhmann. ORE - A Tool for Repairing and Enriching Knowledge Bases. ISWC-10, 2010.

  • [LLMW06] H. Liu, C. Lutz, M. Milicic, F. Wolter. Updating Description Logic ABoxes. KR-06, 2006.

  • [LM04] K. Lee, T. Meyer. A Classification of Ontology Modification. AI-04, 2004.

  • [LPSV06] S.C. Lam, J. Pan, D. Sleeman, W. Vasconcelos. A Fine-grained Approach to Resolving Unsatisfiable Ontologies. WI-06, 2006.

  • [LRV09] U. Lusch, S. Rudolph, D. Vrandecic. Tempus Fugit: Towards an Ontology Update Language. ESWC-09, 2009.


References (11/18)

  • [Mel04] S. Melnik. Generic Model Management: Concepts and Algorithms. Springer, 2004.

  • [MFRW00] D.L. McGuinness, R. Fikes, J. Rice, S. Wilder. An Environment for Merging and Testing Large Ontologies. KR-00, 2000.

  • [MHS09] B. Motik, I. Horrocks, U. Sattler. Bridging the Gap Between OWL and Relational Databases. Journal of Web Semantics, 7(2):74-89, 2009.

  • [MLA+12] M. Morsey, J. Lehmann, S. Auer, C. Stadler, S. Hellmann. DBpedia and the Live Extraction of Structured Data from Wikipedia. Program: Electronic library and Information Systems, 46(2):157-181, 2012.

  • [MLB05] T. Meyer, K. Lee, R. Booth. Knowledge Integration for Description Logics. AAAI-05, 2005.

  • [MLBP06] T. Meyer, K. Lee, R. Booth, J.Z. Pan. Finding Maximally Satisfiable Terminologies for the Description Logic ALC. AAAI-06, 2006.


References (12/18)

  • [MMB12] P. Mendes, H. Muhleisen, C. Bizer. Sieve: Linked Data Quality Assessment and Fusion. LWDM-12, 2012.

  • [MMS+03] A. Maedche, B. Motik, L. Stojanovic, R. Studer, R. Volz. An Infrastructure for Searching, Reusing and Evolving Distributed Ontologies. WWW-03, 2003.

  • [MRF08] M. Moguillansky, N. Rotstein, M. Falappa. A Theoretical Model to Handle Ontology Debugging and Change through Argumentation. IWOD-08, 2008.

  • [MSCK05] M. Magiridou, S. Sahtouris, V. Christophides, M. Koubarakis. RUL: A Declarative Update Language for RDF. ISWC-05, 2005.

  • [MWK00] P. Mitra, G. Wiederhold, M.L. Kersten. A Graph-oriented Model for Articulation of Ontology Interdependencies. EDBT-00, 2000.


References (13/18)

  • [NCLM06] N. Noy, A. Chugh, W. Liu, M. Musen. A Framework for Ontology Evolution in Collaborative Environments. ISWC-06, 2006.

  • [NKKM04] N. Noy, S. Kunnatur, M. Klein, M. Musen. Tracking Changes During Ontology Evolution. ISWC-04, 2004.

  • [NM00] N.F. Noy, M.A. Musen. Prompt: Algorithm and Tool for Automated Ontology Merging and Alignment. In AAAI/IAAI-00, 2000.

  • [OK02] D. Ognyanov, A. Kiryakov. Tracking Changes in RDF(S) Repositories. EKAW-02, 2002.


References (14/18)

  • [PFF+13] V. Papavassiliou, G. Flouris, I. Fundulaki, D. Kotzinos, V. Christophides. High-Level Change Detection in RDF(S) KBs. Transactions on Database Systems (TODS), 38(1), 2013.

  • [PM10] A. Passant, P.N. Mendes. SparqlPuSH: Proactive Notification of Data Updates in RDF Stores Using PubSubHubbub. SFSW-10, 2010.

  • [PT05] P. Plessers, O. de Troyer. Ontology Change Detection Using a Version Log. ISWC-05, 2005.

  • [PT06] P. Plessers, O. de Troyer. Resolving Inconsistencies in Evolving Ontologies. ESWC-06, 2006.

  • [PTC05] P. Plessers, O. de Troyer, S. Casteleyn. Event-based Modeling of Evolution for Semantic-driven Systems. CAiSE-05, 2005.

  • [PTC07] P. Plessers, O. de Troyer, S. Casteleyn. Understanding Ontology Evolution: A Change Detection Approach. Web Semantics: Science, Services and Agents on the WWW, 2007.


References (15/18)

  • [QD09] G. Qi, J. Du. Model-based Revision Operators for Terminologies in Description Logics. IJCAI-09, 2009.

  • [QHHP08] G. Qi, P. Haase, Z. Huang, J.Z. Pan. A Kernel Revision Operator for Terminologies. DL-08, 2008.

  • [QLB06a] G. Qi, W. Liu, D. Bell. Knowledge Base Revision in Description Logics. JELIA-06, 2006.

  • [QLB06b] G. Qi, W. Liu, D. Bell. A Revision-based Approach for Handling Inconsistency in Description Logics. NMR-06, 2006.

  • [QP07] G. Qi, J. Pan. A Stratification-based Approach for Inconsistency Handling in Description Logics. IWOD-07, 2007.


References (16/18)

  • [RFC11] Y. Roussakis, G. Flouris, V. Christophides. Declarative Repairing Policies for Curated KBs. HDMS-11, 2011.

  • [RH09] T. Ravn, M. Hoedbolt. How to Measure and Monitor the Quality of Master Data. 2009. Available at: http://www.information-management.com/issues/2007_58/master_data_management_mdm_quality-10015358-1.html

  • [RHTA10] C. Riess, N. Heino, S. Tramp, S. Auer. EvoPat - Pattern-based Evolution and Refactoring of RDF Knowledge Bases. ISWC-10, 2010.

  • [RPH+12] A. Rula, M. Palmonari, A. Harth, S. Stadtmüller, A. Maurino. On the Diversity and Availability of Temporal Information in Linked Open Data. ISWC-12, 2012.

  • [RSDT08] T. Redmond, M. Smith, N. Drummond, T. Tudorache. Managing Change: An Ontology Version Control System. OWLED-08, 2008.

  • [RW07] M.M. Ribeiro, R. Wassermann. Base Revision in Description Logics – Preliminary Results. IWOD-07, 2007.

  • [RWFA13] M.M. Ribeiro, R. Wassermann, G. Flouris, G. Antoniou. Minimal Change: Relevance and Recovery Revisited. AI Journal (to appear), 2013.


References (17/18)

  • [SC03] S. Schlobach, R. Cornet. Non-Standard Reasoning Services for the Debugging of Description Logic Terminologies. IJCAI-03, 2003.

  • [SMMS02] L. Stojanovic, A. Maedche, B. Motik, N. Stojanovic. User-driven Ontology Evolution Management. EKAW-02, 2002.

  • [SK03] H. Stuckenschmidt, M. Klein. Integrity and Change in Modular Ontologies. IJCAI-03, 2003.

  • [SP10] Y. Stavrakas, G. Papastefanatos. Supporting Complex Changes in Evolving Interrelated Web Databanks. CoopIS-10, 2010.

  • [SSN+10] H. Van de Sompel, R. Sanderson, M.L. Nelson, L.L. Balakireva, H. Shankar, S. Ainsworth. An HTTP-Based Versioning Mechanism for Linked Data. LDOW-10, 2010.

  • [TSBM10] J. Tao, E. Sirin, J. Bao, D.L. McGuinness. Integrity Constraints in OWL. AAAI-10, 2010.

  • [TTA08] Y. Tzitzikas, Y. Theoharis, D. Andreou. On Storage Policies for the Semantic Web Repositories that Support Version. ESWC-08, 2008.

  • [TLZ12] Y. Tzitzikas, C. Lantzaki, D. Zeginis. Blank Node Matching and RDF/S Comparison Functions. ISWC-12, 2012.


References (18/18)

  • [VWS+05] M. Volkel, W. Winkler, Y. Sure, S. Kruk, M. Synak. SemVersion: A Versioning System for RDF and Ontologies. ESWC-05, 2005.

  • [WHR+05] H. Wang, M. Horridge, A. Rector, N. Drummond, J. Seidenberg. Debugging OWL-DL Ontologies: A Heuristic Approach. ISWC-05, 2005.

  • [WWT10] Z. Wang, K. Wang, R. Topor. A New Approach to Knowledge Base Revision in DL-Lite. AAAI-10, 2010.

  • [ZAA+13] F. Zablith, G. Antoniou, M. d’Aquin, G. Flouris, H. Kondylakis, E. Motta, D. Plexousakis, M. Sabou. Ontology Evolution: A Process Centric Survey. Knowledge Engineering Review (to appear).

  • [ZTC11] D. Zeginis, Y. Tzitzikas, V. Christophides. On Computing Deltas of RDF/S Knowledge Bases. ACM Transactions on the Web (TWEB) 5(3), 2011.

  • [ZZL+03] Z. Zhang, L. Zhang, C.X. Lin, Y. Zhao, Y. Yu. Data Migration for Ontology Evolution. Poster ISWC-03, 2003.

