
Unifying mediation of knowledge, data and services in a subject domain for problem solving over heterogeneous information resources

Leonid Kalinichenko, Sergey Stupnikov, Victor Zakharov, Vladimir Budzko, Vadim Korolev

Institute of Informatics Problems, Russian Academy of Sciences

Declaration of Intent Draft by IPI RAN

SkTech.RC/IT/Madnick

Outline
  • State of the art in subject mediation reached at IPI RAN
  • Directions of research and development suggested for use in the proposal SkTech.RC/IT/Madnick
    • Investigation of an application-driven approach to scientific problem solving in the subject mediator environment
    • Heterogeneous multidialect mediator infrastructure for semantic integration of data, knowledge and services
    • Mediation of databases with non-traditional data models
    • Storage of very large volumes of data [Zakharov]
    • Cyber security issues [Budzko, Korolev]
  • Self-assessment
  • Coverage by the DoI of the content of the three themes (Scientific Dataspace, Data Quality and Big Data) declared by Prof. Stuart Madnick
Basic principles
  • Subject mediation technology aims to fill the widening gap between users (applications) and heterogeneous distributed information resources
  • independence of the problem domain definition (the mediator definition) from the existing information resources
  • definition of a mediator as a result of the consolidated efforts of the respective scientific community
  • independence of user interfaces from the multiple information resources involved
  • information about new resources can be published at any time, independently of the mediators acting at that time
  • GLAV-based setting for integrating relevant information resources at the mediator
  • integrated access to the information resources in the process of problem solving
  • recursive structure of mediators
Canonical information model synthesis
[Figure: the canonical model is drawn as a kernel with extensions E1, E2 and E3; each extension is linked by a "refines" relation to the corresponding resource information model R1, R2 or R3.]
Resources identification and integration
  • Identification of relevant resources
    • metadata model (capabilities)
    • ontological model (concepts and their relationships)
    • canonical model (structure and behavior)
  • Integration of relevant resources in a mediator (registration)
    • GLAV = Local As View (LAV) + Global As View (GAV)
    • GAV: provides for reconciliation of various conflicts between resource and mediator specifications
    • LAV: resource schemas are registered in the mediator as materialized views over virtual classes of the mediator
    • application problem specifications remain stable under any modification of resources
    • mediators scale with respect to the number of resources
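The registration idea can be illustrated with a minimal Python sketch. Everything named here is hypothetical and not taken from the IPI RAN prototype: the `Star` mediator class, the two catalogs, their column names and the magnitude ranges. The GAV-style part is a mapping function that reconciles naming and unit conflicts; the LAV-style part is reduced to a magnitude interval that describes each resource as a subset of the mediator class, so that resources that cannot contribute to a query are skipped.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass(frozen=True)
class Star:
    """Virtual class of a hypothetical mediator schema (degrees, magnitudes)."""
    ra_deg: float
    dec_deg: float
    mag: float


@dataclass
class Registration:
    """One registered resource (all fields are illustrative assumptions)."""
    name: str
    rows: List[Dict[str, float]]                      # stand-in for remote data
    to_mediator: Callable[[Dict[str, float]], Star]   # GAV-style mapping
    mag_range: Tuple[float, float]                    # LAV-style view description


def query_stars(registrations: List[Registration],
                mag_lo: float, mag_hi: float) -> List[Star]:
    """Answer a mediator-level query: all stars with mag_lo <= mag <= mag_hi."""
    answer = []
    for reg in registrations:
        lo, hi = reg.mag_range
        if hi < mag_lo or lo > mag_hi:
            continue  # the view description shows this resource cannot contribute
        for row in reg.rows:
            star = reg.to_mediator(row)   # reconcile names and units
            if mag_lo <= star.mag <= mag_hi:
                answer.append(star)
    return answer


registrations = [
    Registration(
        name="bright_catalog",
        rows=[{"RAh": 1.0, "DEdeg": 10.0, "Vmag": 5.0}],
        # right ascension is stored in hours here, so convert it to degrees
        to_mediator=lambda r: Star(r["RAh"] * 15.0, r["DEdeg"], r["Vmag"]),
        mag_range=(0.0, 10.0),
    ),
    Registration(
        name="faint_catalog",
        rows=[{"ra": 30.0, "dec": -5.0, "m": 18.0}],
        to_mediator=lambda r: Star(r["ra"], r["dec"], r["m"]),
        mag_range=(15.0, 25.0),
    ),
]

bright = query_stars(registrations, 0.0, 10.0)
print(bright)
```

The query is written only against `Star`, so adding a third catalog means adding one more `Registration` and leaves `query_stars` and its callers unchanged, which is the stability and scalability property listed above.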
Subject mediation: results obtained at IPI RAN (I)
  • A prototype of the subject mediation infrastructure used for problem solving over multiple distributed information resources (specifically, in the astronomy problem domain) [slide 8]
  • Methods and tools for mapping and transformation of information models of heterogeneous resources, intended for their unification in mediation middleware
    • The Model Unifier prototype tool, aimed at partial automation of the unification of heterogeneous information models, has been implemented
    • The first version is based on term-rewriting technology
    • The second version, an Eclipse platform application based on model transformation languages, is under implementation [slide 9]
  • Methods for supporting semantic interoperability of information resources in the context of an application problem domain
    • Tools for identifying resources relevant to a problem on the basis of ontological descriptions of the problem domain
    • Tools for registration of the relevant resources in the mediator
Subject mediation: results obtained at IPI RAN (II)
  • Methods and tools for rewriting non-recursive mediator programs into resource partial programs, oriented toward object schemas of resources and mediators and typed GLAV-views
  • A method for optimized planning of the execution of resource partial programs in a distributed environment
    • takes into account the capabilities of the resources
    • assigns execution sites to operations on the basis of estimative samples
  • Methods for dispersed organization of problem solving in the mediation environment
    • An implementation of a problem in the mediation environment may be dispersed among programming systems, mediators, GLAV-views, wrappers and resources
    • Methods and tools for representing, manipulating and estimating the efficiency of a dispersed organization
    • Algorithms for constructing an efficient dispersed organization
  • An original approach to binding programming languages with the declarative mediator rule language
    • The approach combines static and dynamic binding, overcoming impedance mismatch and allowing dynamic result types
Directions of research and development

Application-driven approach for scientific problem solving

Application-driven approach for scientific problem solving
  • Approaches to the integrated representation of multiple information resources for problem solving:
    • Resource-driven: an integrated representation of multiple resources is created independently of the problem
    • Application-driven: a description of the subject domain of a problem class is created, into which the resources relevant to the problem are mapped
  • The application-driven approach assumes the creation of a subject mediator that supports interaction between a user and resources
Experience of applying the application-driven approach
  • The problem of searching for secondary standards for photometric calibration of optical components of gamma-ray bursts was formulated by the Institute of Space Research of RAS
  • The problem was formalized and implemented by applying subject mediation:
    • A glossary of the problem domain was manually extracted from the textual specification
    • An ontology required for problem solving was constructed
    • Data structures, methods and functions constituting the problem domain schema were defined
    • Resources relevant to the problem were identified in the Astrogrid and VizieR information grids
      • SDSS, USNO B-1, 2MASS, GSC, UCAC, VSX, ASAS, GCVS, NSVS
    • Resources were registered in the mediator and the corresponding GLAV-views were obtained
    • The problem was formulated as a program consisting of a set of declarative rules over the mediator schema
    • The implemented mediator is used by an application that monitors, in real time, e-mails announcing gamma-ray bursts. The application extracts the standards located in the area of a burst and e-mails them to subscribers.
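The last step, selecting standards located in the area of a burst, can be sketched in Python as a simple cone filter. This is an illustration under stated assumptions, not the mediator's rule program: the record layout (`id`, `ra`, `dec` in degrees), the search radius and the sample coordinates are all hypothetical.

```python
import math


def angular_separation_deg(ra1, dec1, ra2, dec2):
    """Great-circle separation by the haversine formula; all angles in degrees."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    h = (math.sin((dec2 - dec1) / 2) ** 2
         + math.cos(dec1) * math.cos(dec2) * math.sin((ra2 - ra1) / 2) ** 2)
    return math.degrees(2 * math.asin(min(1.0, math.sqrt(h))))


def standards_near_burst(standards, burst_ra, burst_dec, radius_deg):
    """Keep the candidate standards lying within radius_deg of the burst position."""
    return [s for s in standards
            if angular_separation_deg(s["ra"], s["dec"],
                                      burst_ra, burst_dec) <= radius_deg]


# Hypothetical candidate standards, as if already unified by the mediator.
candidates = [
    {"id": "std-1", "ra": 10.00, "dec": 20.00},
    {"id": "std-2", "ra": 10.10, "dec": 20.05},
    {"id": "std-3", "ra": 50.00, "dec": -10.00},
]

nearby = standards_near_burst(candidates, burst_ra=10.0, burst_dec=20.0,
                              radius_deg=0.5)
print([s["id"] for s in nearby])
```

In the setting described above, the candidates would come from several registered catalogs through their GLAV-views rather than from an in-memory list.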
Issues requiring further investigations
  • Semantic identification of resources relevant to a mediator
  • Construction of semantic source-to-target schema mappings in the presence of constraints reflecting the specificity of various data models
  • Development of mediator program rewriting algorithms in the presence of source and mediator constraints over the classes of objects
Directions of research and development

Heterogeneous multidialect mediator infrastructure for data, knowledge and services semantic integration

An approach for the infrastructure
  • W3C recently adopted the Rule Interchange Format (RIF) standard, oriented toward interoperability of declarative programs
  • Objective
    • integration of
      • multilanguage knowledge representations and rule-based declarative programs,
      • heterogeneous databases and services
    • built on the basis of unified languages and multidialect mediation infrastructure
  • Idea
    • Combining the RIF standard paradigm and
    • the GLAV approach built on the extensible canonical information model
Modular mediator infrastructure
  • The multidialect construction of the canonical model
    • Mediators are represented as a functional composition of declarative module specifications
    • Each module is based on its own dialect with appropriate semantics
  • Mediator modules as peers:
    • Rule-based modules become mediator components alongside the GLAV-based modules
    • Interoperability of the modules is based on P2P and W3C RIF techniques.
  • Combination of integration and interoperability
    • The information resource integration can be provided in the scope of an individual mediator module
    • The integration approaches in different modules can be different.
  • Rule-based specifications on different levels of the infrastructure
    • Declarative programming over the mediators
    • Various modules of a mediator
    • Schema mapping for semantic integration of the information resources in the mediator
    • etc.
Example of problem solving in the multidialect mediation infrastructure
  • The problem of finding an optimal assignment of applicants among universities
    • A set of n applicants is to be assigned among m universities, where q_i is the quota of the i-th university
    • Applicants rank the universities, and universities rank the applicants, in order of preference
    • The aim is to find an optimal assignment given the quotas of the universities and the two sets of orderings
    • An assignment is unstable if there are two applicants α and β assigned to universities A and B, respectively, although β prefers A to B and A prefers β to α; otherwise the assignment is stable
    • A stable assignment is called optimal if every applicant is at least as well off under it as under any other stable assignment
  • The program calculating the assignment is defined in DLV (ASP)
  • The required information resources are integrated in a subject mediator
  • OntoBroker communicates with the users and, applying its ontologies, formulates queries to the mediator; after collecting the required data, it initiates a DLV program
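The slides define the assignment program in DLV (ASP), and that program is not reproduced in this transcript. Purely to illustrate the problem statement, the Python sketch below swaps in a different, well-known technique: the applicant-proposing deferred-acceptance procedure of Gale and Shapley, which is known to produce a stable assignment that is optimal for the applicants. It also includes a checker that follows the instability definition given above (extended, as an assumption, to treat a free place at a preferred university as a violation). Function names and sample data are assumptions.

```python
def deferred_acceptance(applicant_prefs, university_prefs, quotas):
    """Applicant-proposing deferred acceptance with quotas.

    applicant_prefs:  {applicant: [universities, most preferred first]}
    university_prefs: {university: [applicants, most preferred first]}
    quotas:           {university: number of places}
    Returns {university: [admitted applicants]}.
    """
    rank = {u: {a: i for i, a in enumerate(prefs)}
            for u, prefs in university_prefs.items()}
    next_choice = {a: 0 for a in applicant_prefs}
    admitted = {u: [] for u in university_prefs}
    free = list(applicant_prefs)
    while free:
        a = free.pop()
        prefs = applicant_prefs[a]
        if next_choice[a] >= len(prefs):
            continue                      # list exhausted: a stays unassigned
        u = prefs[next_choice[a]]
        next_choice[a] += 1
        if a not in rank[u]:
            free.append(a)                # u does not rank a; try the next one
            continue
        admitted[u].append(a)             # tentative admission
        if len(admitted[u]) > quotas[u]:
            worst = max(admitted[u], key=lambda x: rank[u][x])
            admitted[u].remove(worst)     # over quota: reject the least preferred
            free.append(worst)
    return admitted


def is_stable(assignment, applicant_prefs, university_prefs, quotas):
    """True if no applicant and university would both prefer each other."""
    rank = {u: {a: i for i, a in enumerate(prefs)}
            for u, prefs in university_prefs.items()}
    placed = {a: u for u, members in assignment.items() for a in members}
    for a, prefs in applicant_prefs.items():
        for u in prefs:
            if u == placed.get(a):
                break                     # the remaining universities rank lower
            if a not in rank[u]:
                continue
            has_room = len(assignment[u]) < quotas[u]
            displaces = any(rank[u][a] < rank[u][b] for b in assignment[u])
            if has_room or displaces:
                return False
    return True


applicant_prefs = {"a": ["X", "Y"], "b": ["X", "Y"], "c": ["Y", "X"]}
university_prefs = {"X": ["b", "a", "c"], "Y": ["a", "b", "c"]}
quotas = {"X": 1, "Y": 2}

result = deferred_acceptance(applicant_prefs, university_prefs, quotas)
print(result, is_stable(result, applicant_prefs, university_prefs, quotas))
```

In the infrastructure described in the slides, the preference lists and quotas would be collected through the subject mediator, and the computation itself would run as a DLV program rather than as procedural code.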
Optimal assignment problem infrastructure
  • Components (from the figure): OntoBroker with its ontologies, Multi-Layered Broker, Synthesis Mediation Environment over the resources, DLV (ASP facilities)
  • Interchange format: RIF-BLD (via XML), with translators BLD → OB, OB → BLD, BLD → Synthesis, Synthesis → BLD, BLD → DLV, DLV → BLD
  • Requests
    • 1. OB2DLV: GetProgram(Loc, Name [Params])
    • 2. OB2SYNTH: GetSchema(Loc, Name [Params])
    • 3. OB2SYNTH: SendExec(Loc, Name, Prog [Pars])
    • 4. OB2DLV: SendExec(Loc, Name, Prog [Pars])
  • Responses
    • 1. DLV2OB: DLV Program (without IDB)
    • 2. SYNTH2OB: Synthesis Schema
    • 3. SYNTH2OB: Result of OB program execution.
    • 4. DLV2OB: Result of DLV program execution.
  • Routing (from the figure labels): requests 2, 3 and responses 2, 3 are exchanged with the Synthesis Mediation Environment; requests 1, 4 and responses 1, 4 are exchanged with DLV
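The numbered requests and responses above suggest a simple routing pattern, sketched below in Python. This is only an illustration of the message flow: it uses no real OntoBroker, DLV or Synthesis API, the class and field names are assumptions, the back ends are stubs, and program texts (RIF-BLD in the slides) are treated as opaque strings.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass(frozen=True)
class Request:
    channel: str       # "OB2DLV" or "OB2SYNTH", as in the numbered requests
    operation: str     # "GetProgram", "GetSchema" or "SendExec"
    loc: str
    name: str
    payload: str = ""  # program text for SendExec; opaque in this sketch


@dataclass(frozen=True)
class Response:
    channel: str       # "DLV2OB" or "SYNTH2OB"
    body: str


class Broker:
    """Hypothetical stand-in for the broker: routes a request by its channel."""

    def __init__(self) -> None:
        self._backends: Dict[str, Callable[[Request], str]] = {}

    def register(self, target: str, handler: Callable[[Request], str]) -> None:
        self._backends[target] = handler

    def send(self, request: Request) -> Response:
        # "OB2DLV" splits into source "OB" and target "DLV"
        source, target = request.channel.split("2", 1)
        body = self._backends[target](request)
        return Response(channel=f"{target}2{source}", body=body)


broker = Broker()
# Stub back ends that only echo what they were asked to do.
broker.register("DLV", lambda r: f"DLV handled {r.operation}({r.name})")
broker.register("SYNTH", lambda r: f"Synthesis handled {r.operation}({r.name})")

schema = broker.send(Request("OB2SYNTH", "GetSchema", "loc", "admissions"))
program = broker.send(Request("OB2DLV", "GetProgram", "loc", "assignment"))
print(schema)
print(program)
```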

Issues to be investigated and prototyped
  • Approaches to constructing rule-based dialect mappings
  • Methods for justifying that the mappings preserve semantics
  • Approaches to modular representation of knowledge in the multidialect mediation environment
  • Approaches to providing interoperability of the multidialect mediator modules
  • Infrastructure design and prototyping
  • Solving real problems in the chosen scientific subject domains
  • Extension of the experience to the Semantic Web area
Directions of research and development

Mediation of databases with non-traditional data models

Non-traditional data models
  • NoSQL data models oriented toward the support of extra-large volumes of data, applying a “key-value” technology for vertical storage
    • Dynamo, BigTable, HBase, Cassandra, MongoDB, CouchDB.
  • Graph data models
    • Neo4j, InfiniteGraph, DEX, InfoGrid, HyperGraphDB, Trinity, supporting flexible data structures.
  • Triple-based data model (expressible in RDF, RDFS)
    • Virtuoso, OWLIM, 5Store, Bigdata.
  • OWL QL profile oriented toward the support of ontological modeling over relational databases and expressed by data dependencies used together with Datalog
  • “Scientific” data models
    • SciDB applying a multidimensional array data model
  • Data models oriented toward Connection Science (Prof. Pentland)
  • For most of these data models, standards do not yet exist
  • Most of these data models and systems are oriented toward “big data” support, applying massively parallel techniques of the MapReduce kind
Planned research results
  • Information-preserving methods for mapping and transforming various classes of non-traditional data models into the canonical one
  • Mappings and transformations for specific data models, together with adequate extensions of the canonical data model
  • Techniques for interpreting the canonical model DML in the DMLs of different classes of non-traditional data models, and approaches to their implementation
  • Architectural decisions on implementing massively parallel techniques at the level of mediators, and evaluation of the achievable performance gain
  • Evaluation of the suitability and efficiency of integrating non-traditional data models of different classes in the GLAV mediation infrastructure for various problem domains
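What "information-preserving" means for a mapping can be shown on a toy example in Python: a mapping from a triple-based representation to object-like records is information preserving if an inverse mapping restores exactly the original triples. This sketch works on data instances only and makes no claim about the canonical model used at IPI RAN or about the formal justification methods mentioned in the slides; the function names and sample triples are assumptions.

```python
from collections import defaultdict


def triples_to_objects(triples):
    """Group (subject, predicate, object) triples into multi-valued records."""
    grouped = defaultdict(lambda: defaultdict(set))
    for s, p, o in triples:
        grouped[s][p].add(o)
    return {s: {p: frozenset(values) for p, values in attrs.items()}
            for s, attrs in grouped.items()}


def objects_to_triples(objects):
    """Inverse mapping: flatten the records back into a set of triples."""
    return {(s, p, o)
            for s, attrs in objects.items()
            for p, values in attrs.items()
            for o in values}


def lossy_triples_to_objects(triples):
    """Counter-example: keeping one value per attribute loses information."""
    return {s: {p: o} for s, p, o in sorted(triples)}


triples = {
    ("star1", "inCatalog", "SDSS"),
    ("star1", "inCatalog", "2MASS"),
    ("star1", "type", "Star"),
    ("star2", "type", "Star"),
}

round_trip = objects_to_triples(triples_to_objects(triples))
print(round_trip == triples)
```

The multi-valued attributes are the design point: a target structure that allowed only one value per attribute, as in `lossy_triples_to_objects`, could not be inverted, which is why extensions of the target model may be needed for some source models.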
Directions of research and development

Storage of very large volumes of data [Zakharov]

Storage of very large volumes of data [Zakharov]
  • The objective is to develop a novel distributed parallel fault-tolerant file system possessing the following capabilities:
    • storage of data volumes of petabyte scale
    • unlimited period of storage
    • scalability
    • efficient multiuser access support in different kinds of networks
    • usage of different storage types (e.g., HDD and flash memory)
  • The experience of existing file system vendors should be taken into account:
    • ReFS (Windows Server 8) by Microsoft
    • VMFS by VMware
    • Lustre
    • ZFS by Sun Microsystems
    • zFS (z/OS) by IBM
    • OneFS by Isilon
Directions of research and development

Cyber security issues [Budzko, Korolev]

Cyber security issues [Budzko, Korolev]
  • Information integrity and availability support for large-scale data gathering & mining
  • Security analysis of technical architectures (network protocols, architectures, operating systems, DBMSs, etc.)
  • Vulnerability analysis
  • Development of threat models
  • Protection from insiders in personal information data centers
Self-assessment (I)
  • Relevance
    • Semantic integration of resources in the context of an application
    • Mediation of knowledge
    • Mediation of non-traditional databases
    • Semantic Web and Big Data orientation.
  • Novelty
    • An intelligent executable layer for declarative, conceptual-level specification of problems in terms of the application domain, for problem solving over diverse resources
    • Methods for information-preserving data model mappings and for their implementation
    • Schema mapping and query rewriting methods in the presence of constraints reflecting the specificity of diverse data models, etc.
  • Breadth of scope
    • Relevant to a broad area of application domains, technologies and research issues.
Self-assessment (II)
  • Challenge
    • Hard theoretical and implementation problems need to be overcome
  • Entrepreneurship possibilities
    • Areas of possible application are very diverse
    • Serious investment is required to reach a proper level of commercialization
  • Educational potential
    • Very broad; various courses can be proposed for master's students
    • Many challenging research topics for PhD research
Scientific DataSpace
  • Large-scale federated data architecture
  • Semantic integration of heterogeneous information
  • Context mediation
  • Semantic web
  • Architecture for semantic mediation and integration of heterogeneous resources
  • Infrastructures: semantic layer for grids and clouds, P2P heterogeneous knowledge-based mediator infrastructures
  • Data model transformation, data model unification, declarative canonical model extension and synthesis
  • Justification of correctness of data model transformation; sets of dependencies (constraints) extending the canonical model core should be decidable and tractable
  • Information resources: semantic description, canonical modeling, wrappers, registries, metadata
  • Problem domains: conceptual description, ontologies, metadata, multidomains, context mediation
  • Semantics-based information resource discovery
  • Semantic schema mapping for data exchange and integration
Data Quality
  • Recognizing and resolving heterogeneous data semantics
  • Effective integration of data from multiple and disparate data sources
  • Semantic schema mapping
  • Justification of correctness of data model (schemas and operations) transformation
  • Dispersed implementation of problems in subject mediation environment
Big Data
  • Data extraction and gathering from the web
  • Federated data systems
  • Parallel infrastructures for high-performance big data manipulation and analysis
  • Large-scale and novel “big data” applications
  • Novel approaches to development of large-scale data warehouses
  • Mediation infrastructure including grids and clouds
  • Integration of non-traditional data models into the canonical data model
  • Parallel infrastructures at the mediation level
  • Distributed parallel fault-tolerant file system
International Cyber Security
  • Secure information architectures
  • Techniques for assessment of threats and vulnerabilities
  • Cyber security issues