APARSEN Webinar on Interoperability and Intelligibility Nov 8, 2013

Topic: Interoperability Strategies and Automated Reasoning (Task 2520 and D25.2)
Yannis Tzitzikas, tzitzik@ics.forth.gr
FORTH (& University of Crete)
Leader of APARSEN WP25

Outline (duration < 20’)
• Context (APARSEN, WP25) (1’)
• Objectives (of Task 2520) (1’)
• Main Results and Example (5’)
• D25.2: Table of Contents (2’)
• The system Epimenides (3’)
• Concluding Remarks (3’)
• Contribution to VCoE (2’)
• Publications (0’)

Context: topic usability (work package timeline over Years 1 to 4)
• WP25 Interoperability and Intelligibility (M20, 33): D25.1, D25.2. To collect and prioritize interoperability objectives, and to advance technical research
• WP27 Scalability (M20, 31): D27.1
• WP16 Common tools, software repository and market place (M8, 48): D16.1
• WP13 Common Standards (M4-M48): D13.1

Task 2520 “Intelligibility Modelling and Reasoning”: The objectives in one slide
Objectives: Propose a modelling approach that enables task performability checking, which in turn could greatly reduce the human effort required for checking or monitoring whether a task on an archived digital object or collection is performable, and consequently whether an interoperability objective is achievable. Such services could also assist preservation planning, especially if converters and emulators can be modeled and exploited by the dependency services.
This is one of the few technical tasks of APARSEN.

Main Results

Main results
• We extended earlier rule-based approaches for dependency management (for digital preservation) so that they also capture converters and emulators.
• We demonstrate how this modeling allows performing the desired reasoning and thus enables offering more advanced digital preservation services.
• These services could greatly reduce the human effort required for checking (or periodically monitoring) whether a task on a digital object is performable.
• A prototype system (Epimenides) based on this approach has been developed.

Example (from the domain of software)
Consider a user who wants to run, on his mobile phone, software source code written many years ago, e.g. code written in the Pascal programming language and stored in a file game.pas.
Questions:
• What can he do?
• What should we (as a community) do?
• Do we have to develop a Pascal compiler for Android OS? Do we have to standardize programming languages? Do we have to standardize operating systems, virtual machines, etc.?
Direction and Answer (according to Task 2520)
• It is worth investigating whether it is already possible to run it on Android by “combining” existing software!
• By applying a series of transformations and emulations

Cont. Suppose we have only the following:
• a converter from Pascal to C++ (say p2c++),
• a C++ compiler (gcc) for Windows OS,
• an emulator of Windows OS executables over Android OS (say emulWin).
It then seems that we could run game.pas on his mobile phone by first converting the Pascal code to C++, then compiling the C++ code, and finally running the executable yielded by the compilation over the emulator.

Cont. The work done in Task 2520 shows how we can model our information in a way that allows this kind of automated reasoning
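The game.pas example can be illustrated with a minimal sketch. It is not the rule-based formalism of D25.2: it swaps in a plain breadth-first search over artifact types, and the type names (PascalSource, CppSource, WinExecutable) are assumptions introduced only for illustration. The tool names p2c++, gcc and emulWin come from the example.

```python
from collections import deque

# Hypothetical facts, named after the slide example.
# Converters (compilers included): (tool, input type, output type).
CONVERTERS = [
    ("p2c++", "PascalSource", "CppSource"),
    ("gcc", "CppSource", "WinExecutable"),
]
# Emulators: (tool, type of artifact it can run, host platform).
EMULATORS = [("emulWin", "WinExecutable", "Android")]


def find_plan(source_type, target_platform):
    """Return the list of tools that makes source_type runnable on
    target_platform, or None if no combination of known tools works."""
    # Artifact types that some emulator can run on the target platform.
    runnable = {etype: name for name, etype, host in EMULATORS
                if host == target_platform}
    queue = deque([(source_type, [])])
    seen = {source_type}
    while queue:
        current, steps = queue.popleft()
        if current in runnable:
            return steps + [runnable[current]]
        for name, src, dst in CONVERTERS:
            if src == current and dst not in seen:
                seen.add(dst)
                queue.append((dst, steps + [name]))
    return None


print(find_plan("PascalSource", "Android"))  # ['p2c++', 'gcc', 'emulWin']
```

The point of the sketch is only that no single tool runs Pascal on Android, yet a chain of existing tools does; a real knowledge base would also record where each converter itself can run.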

A quick look at the D25.2’s Table of Contents

D25.2: Table of Contents
• Discusses the connection with D25.1, i.e. with what Barbara has just presented
• Identifies the main interoperability strategies

Cont.
• Explains how interoperability relates to dependency management
• Lists the requirements for automated reasoning
• Provides a methodology for applying the proposed approach
• Details the technical approach for making this feasible

Cont.
• Which tasks to model? Can we layer them?
• Discusses possible implementation technologies
• Shows that real-world tasks, converters and emulators can be modeled

Cont. Description of EPIMENIDES, the prototype system that we have developed that realizes the proposed approach

Cont.
• Lists use cases from DANS, and discusses applicability in general
• Concluding remarks

The system Epimenides: Proving the technical feasibility of the proposed approach

Epimenides
• Why? For proving the technical feasibility, as well as for demonstration and dissemination purposes, we have built the system Epimenides.
• Results? Positive in all respects.
• How? It is based on semantic web technologies, but the offered reasoning approach is novel (i.e. it is not offered by the existing tools, only by Epimenides).
• Where is it? A deployment for demonstration is web accessible: http://www.ics.forth.gr/isl/epimenides

The .. trailer of Epimenides

Epimenides: Use Cases
• For plain users
• For archivists

For plain users: The user uploads a file or a zipped bundle of files
• Upload a demo zip file
• Upload your own digital objects

The system finds the tasks that usually make sense to apply to the uploaded digital objects
• Rendering for this .txt file
• Runnability for this .exe file
• Requesting performability checking

Getting the results of the Dependency Analysis (the results of the automatic reasoning)
• Reds: inability to perform this task on this file
• Greens: ability to perform these tasks over these objects

Ability to explore the dependencies related to one task
• Direct dependencies of the Rendering task

Use Case for Archivists: Aiding the Definition of new Tasks
• Name of the new task
• Define the dependencies of this task

Use Case for Archivists: Consequences of a Hypothetical Loss
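A hypothetical-loss query can be read as a what-if check: remove one module from what is currently available and see which tasks stop being performable. The sketch below is only an illustration under stated assumptions; the task names, the module sets and the dictionary layout are hypothetical and do not reflect how Epimenides stores its knowledge base.

```python
# Hypothetical knowledge base: each task lists alternative sets of modules,
# any one of which suffices to perform the task.
TASK_OPTIONS = {
    "render secret.doc": [
        {"SuiteOffice"},
        {"emulWin", "MicrosoftOfficeWord.exe"},
    ],
    "run game.pas": [{"p2c++", "gcc", "emulWin"}],
}


def performable(task, available):
    """A task is performable if at least one of its module sets is fully available."""
    return any(option <= available for option in TASK_OPTIONS[task])


def tasks_lost_if(module, available):
    """Tasks that are performable now but would not be after losing `module`."""
    remaining = available - {module}
    return sorted(
        task for task in TASK_OPTIONS
        if performable(task, available) and not performable(task, remaining)
    )


available = {"SuiteOffice", "emulWin", "MicrosoftOfficeWord.exe", "p2c++", "gcc"}
print(tasks_lost_if("emulWin", available))      # ['run game.pas']
print(tasks_lost_if("SuiteOffice", available))  # []
```

Losing emulWin breaks only the task that has no alternative route; rendering the document survives because a second option remains, which is the kind of consequence an archivist would want surfaced before retiring a module.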

Exploring the contents of its Knowledge Base
• Explore the contents of the underlying RDF/S triple store

Epimenides: Evaluating its Usability
We decided to evaluate the usability of Epimenides.
Main questions:
• Can a user understand the main concepts of the approach by using the system?
• How usable is the system per se?
Work done:
• we created a short tutorial
• we defined some scenarios that we asked users to carry out using the system
• we prepared a small questionnaire that the users had to answer after using the system.

Epimenides: Evaluating its Usability Responses (1/2)

cont

D25.2: Concluding Remarks

Concluding Remarks (from D25.2) (1/7)
• Each interoperability objective or challenge (like those described in APARSEN D25.1 Interoperability Objectives and Approaches) can be considered as a kind of demand for the performability of a particular task (or tasks).
• However, each task has various prerequisites for being performed (e.g. operating system, tools, software libraries, parameters, etc.). We call all these dependencies.
• The definition and adoption of standards (for data and services) aids interoperability because it is more probable to have (now and in the future) systems and tools that support these standards than systems and tools that support proprietary formats. From a dependency point of view, standardization essentially reduces the dependencies and makes them more easily resolvable; it does not eliminate dependencies.

(2/7) In all cases (standardization or not), we cannot achieve interoperability when the involved parties are not aware of the dependencies of the exchanged artifacts. However, the ultimate objective is the ability to perform a task, not compliance with a standard. Even if a digital object is not compliant with a standard, there may be tools and processes that can enable the performance of a task on that object. As the scale and complexity of information assets and systems grow towards overwhelming the capability of human archivists and curators (whether system administrators, programmers or designers), it is important to aid this task by offering services that can check whether it is feasible to perform a task over a digital object.

(3/7) For example, a series of conversions and emulations could make feasible the execution of software written in 1986 on a 2013 platform. The process of checking whether this is feasible or not could be too complex for a human, and this is where advanced and automated reasoning services could contribute, because such services could greatly reduce the human effort required for periodically checking (monitoring) whether a task on a digital object is performable.

(4/7) • Towards this vision, D25.2 describes how we have extended earlier rule-based approaches for dependency management to capture converters and emulators, and we have demonstrated that the proposed modeling enables the desired reasoning regarding task performability, which in turn could greatly reduce the human effort required for periodically checking or monitoring whether a task on an archived digital object is performable.

(5/7) • We have provided various examples including examples that show how real converters and emulators can be modeled. We have designed and implemented a proof of concept prototype (Epimenides) for testing whether the proposed reasoning approach behaves as expected. The results were successful, therefore the technical objectives of this task (as described in the DoW) are fully accomplished. • Although the knowledge base of the prototype system (which has been implemented using W3C semantic web technologies) currently represents only some indicative tasks, it can demonstrate the benefits of the proposed approach. In addition we used this prototype system as a means to specify a number of concrete use cases for the case of DANS.

(6/7) • We should also mention that since the implementation is based on W3C standards, it can be straightforwardly enriched with information coming from other external sources (i.e. SPARQL endpoints). In any case we should stress that the methodology presented is general and can be used for extending the modeled tasks, modules, converters and emulators, in order to capture the desired requirements.

(7/7) • For cases where the considered modules have internal and known structure, e.g. as in the case of formally expressed community knowledge (vocabularies, taxonomies, ontologies and semantically described datasets), instead of considering each such module as an atom (undivided element), the internal structure can be exploited for computing more refined gaps. If furthermore, this internal structure is represented using Semantic Web Languages (RDF/S, OWL), which currently form the lingua franca for structured content, then one can apply general purpose (application independent) RDFdiff tools (tools that compute the difference between two RDF/S knowledge bases), for computing more refined gaps. To this end, in this deliverable we have reported some recent contributions that we have made on such tools that concern the management of blank nodes.
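The idea of a more refined gap can be sketched as a comparison between two sets of triples. This is a deliberately naive illustration, not one of the RDF diff tools mentioned above: it assumes every triple is ground (no blank nodes), so plain set difference is enough, and all identifiers in the example data are hypothetical. Blank nodes have no stable names across knowledge bases, which is why a comparison like this one would overstate the difference when they are present and why matching them is a topic in its own right.

```python
def rdf_delta(old, new):
    """Return (added, removed) triples between two knowledge bases,
    each given as a set of (subject, predicate, object) tuples.
    Assumes ground triples only: blank nodes are not handled."""
    return new - old, old - new


# Hypothetical example: what a consumer already knows vs. what a module requires.
known = {
    ("ex:Pascal", "rdfs:subClassOf", "ex:ProgrammingLanguage"),
    ("ex:gcc", "rdf:type", "ex:Compiler"),
}
required = {
    ("ex:Pascal", "rdfs:subClassOf", "ex:ProgrammingLanguage"),
    ("ex:gcc", "rdf:type", "ex:Compiler"),
    ("ex:emulWin", "rdf:type", "ex:Emulator"),
}

added, removed = rdf_delta(known, required)
print(added)    # {('ex:emulWin', 'rdf:type', 'ex:Emulator')}
print(removed)  # set()
```

Instead of reporting that the whole module is missing, the gap shrinks to the single triple the consumer lacks.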

Contributions to the VCoE

Contribution to the VCoE • The methodology for capturing, modeling, managing and exploiting the various interoperability dependencies can be considered as a significant contribution to the VCoE: • expertise in designing and realizing novel inference services for task-performability, risk-detection and for computing intelligibility gaps. • the implemented system (which is already web accessible) can be used for disseminating the results of this work, as well as for investigating and planning future operational applications of this approach, either in the context of single organizations (e.g. the DANS case), or in the context of the VCoE (e.g. as an advanced semantic registry).

Publications

Publications related to D25.2
• Y. Tzitzikas, Y. Marketakis and Y. Kargakis: Conversion and Emulation-aware Dependency Reasoning for Curation Services. iPres 2012 (http://www.ics.forth.gr/~tzitzik/publications/Tzitzikas_2012_iPres_DepMgmtForCovertersEmulators.pdf)
• Yannis Tzitzikas, Christina Lantzaki, Dimitris Zeginis: Blank Node Matching and RDF/S Comparison Functions. International Semantic Web Conference 2012: 591-607
• Christina Lantzaki, Yannis Tzitzikas, Dimitris Zeginis: Demonstrating Blank Node Matching and RDF/S Comparison Functions. International Semantic Web Conference (Posters & Demos) 2012

Thanks for your attention

EXTRA SLIDES

Is it only for software? • The proposed approach is not confined to software. Various interoperability objectives that concern documents and datasets can also be captured. • For example for the case where a user wants to render a MSOffice document on his smart phone, the reasoning approach can infer that this is possible through various ways (e.g. by running the SuiteOffice on his smart phone, or by running MicrosoftOfficeWord.exe over an emulator, or by converting the document to PDF, etc). • For the case of datasets, consider that we want to preserve datasets containing experimental results and would like to preserve their provenance. Suppose that for us provenance means ability to answer questions of the form: who derived the dataset, when this dataset was derived, how it was derived? We can model provenance as a task (that has dependencies) and we can use the dependency reasoning approach for checking for which datasets we have provenance and for which we have not. We could also exploit the reasoning services in order to discover provenance information that was not evident (e.g. result of tools that extract embedded metadata).

Example: Documents#1 (focus: render)

Example (from the domain of documents)
Consider a user who has received the document “secret.doc” on his smart phone and wants to read it.
Questions:
• What can he do?
Answer (according to Task 2520)
• It is worth investigating whether it is already possible to view it on his Android phone by using or “combining” existing software!

Cont.
• The user can read secret.doc in multiple ways:
• By running the Android SuiteOffice on his smart phone
• By running MicrosoftOfficeWord.exe over an emulator of Windows executables on Android
• By converting secret.doc to a PDF file, and then running the Pdf Viewer (SuiteOffice) on the smart phone
The work done in Task 2520 shows how we can model our information in a way that allows inferring these choices automatically.
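The three alternatives for secret.doc can be enumerated mechanically. The sketch below is an assumption-laden illustration rather than the reasoning implemented in Epimenides: it uses plain Python tuples instead of rules, considers only one level of emulation, and the converter name doc2pdf is invented for the example. The other tool names follow the slide.

```python
# Hypothetical facts named after the slide example; "doc2pdf" is an invented name.
RENDERERS = [  # (tool, format it renders, platform it runs on)
    ("SuiteOffice", "doc", "Android"),
    ("Pdf Viewer (SuiteOffice)", "pdf", "Android"),
    ("MicrosoftOfficeWord.exe", "doc", "Windows"),
]
CONVERTERS = [("doc2pdf", "doc", "pdf")]         # (tool, input format, output format)
EMULATORS = [("emulWin", "Windows", "Android")]  # (tool, emulated platform, host platform)


def conversion_chains(fmt, seen=()):
    """Yield (format, converter chain) for fmt and every format reachable from it."""
    yield fmt, []
    seen = seen + (fmt,)
    for tool, src, dst in CONVERTERS:
        if src == fmt and dst not in seen:
            for final, chain in conversion_chains(dst, seen):
                yield final, [tool] + chain


def ways_to_render(fmt, platform):
    """List every tool sequence that renders a file of format fmt on platform."""
    ways = []
    for reached, chain in conversion_chains(fmt):
        for tool, rendered, runs_on in RENDERERS:
            if rendered != reached:
                continue
            if runs_on == platform:
                ways.append(chain + [tool])
            # One level of emulation: run the renderer's platform on the target.
            for emulator, guest, host in EMULATORS:
                if guest == runs_on and host == platform:
                    ways.append(chain + [emulator, tool])
    return ways


for way in ways_to_render("doc", "Android"):
    print(" -> ".join(way))
```

Each printed line corresponds to one of the options on the slide: the native viewer, the emulated Windows application, and conversion to PDF followed by the PDF viewer.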

Example: Dataset#1 (focus: provenance)

Example (from the domain of datasets)
Context:
• Consider that we want to preserve datasets containing experimental results. We want to preserve their provenance, and suppose that for us provenance means the ability to answer questions of the form:
• Who derived the dataset?
• When was this dataset derived?
• How was it derived?
Key points
• We can model provenance as a task (that has dependencies)
• We can use the dependency reasoning approach for checking for which datasets we have provenance and for which we do not.
• We could also exploit the reasoning services in order to discover provenance information that is not evident, e.g. provenance information stored in the metadata embedded in the file

Cont. Suppose we have two datasets, one stored as CSV, another as an MS Excel file.
Questions:
• How to define and then check whether the provenance requirements are met?
• How to harmonize this check?
• Without having to decide on a new metadata schema or to set up a new system?
• How to have control without having to harmonize everything?
Answer (according to Task 2520)
• Achieve uniformity (in checking and management) at the dependency management level
• Exploit automated reasoning for obtaining provenance information that is already (directly or indirectly) available or extractable
• Exploit automated reasoning for checking if the provenance is complete
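Treating provenance as a task with dependencies can be sketched as one uniform check applied to both datasets, whatever their file format. The file names, metadata values and the split into explicit and embedded metadata below are hypothetical; the sketch assumes that some extraction step has already produced the embedded entries.

```python
# The three provenance questions from the example act as the task's dependencies.
PROVENANCE_QUESTIONS = ("who", "when", "how")

# Hypothetical datasets: "explicit" is recorded by the archive,
# "embedded" is assumed to have been extracted from inside the file.
DATASETS = {
    "results.csv": {
        "explicit": {"who": "lab A"},
        "embedded": {},
    },
    "results.xls": {
        "explicit": {"how": "spectrometer export"},
        "embedded": {"who": "lab B", "when": "2013-05-08"},
    },
}


def missing_provenance(record):
    """Return the provenance questions that cannot be answered for a dataset."""
    # Explicit metadata takes precedence over embedded metadata on conflicts.
    known = {**record["embedded"], **record["explicit"]}
    return [q for q in PROVENANCE_QUESTIONS if q not in known]


for name, record in DATASETS.items():
    gaps = missing_provenance(record)
    print(name, "complete" if not gaps else f"missing: {gaps}")
```

The same function is used for the CSV and the Excel dataset, so the check is harmonized at the dependency level without forcing both files into a common metadata schema.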
