Semantic Web & Semantic Web Processes. A course at Universidade da Madeira, Funchal, Portugal, June 16-18, 2005. Dr. Amit P. Sheth, Professor, Computer Sc., Univ. of Georgia; Director, LSDIS lab; CTO/Co-founder, Semagix, Inc. Special Thanks: Cartic Ramakrishnan, Karthik Gomadam.


Semantic Web & Semantic Web Processes

A course at Universidade da Madeira, Funchal, Portugal

June 16-18, 2005

Dr. Amit P. Sheth

Professor, Computer Sc., Univ. of Georgia

Director, LSDIS lab

CTO/Co-founder, Semagix, Inc

Special Thanks: Cartic Ramakrishnan, Karthik Gomadam


Agenda 1

Part I

  • What is Semantic Web?

  • What makes the Semantic web

    • Ontologies – importance of relationships and knowledge

  • Representation and Languages

    • Why XML is not enough

    • Describing semantic web resources: RDF and RDFS

    • OWL

  • Query processing and storage

Part II

  • Metadata, Enabling techniques and technologies

    • Ontology and knowledge engineering: ontology design, ontology population and maintenance, ontology freshness

    • Automated metadata extraction and annotation

    • Computation and reasoning with focus on relationships

    • Example commercial Semantic Web platform


Agenda 2

Part III

  • Semantic web applications: search, integration, analysis

    • Pan-Web and consumer-centric

    • Enterprise

Part IV

  • Semantic Web Services and Processes

    • What are Web Services?

    • What are Web processes?

    • Creating Web processes: Annotation, discovery, composition, etc.

  • Semantic Web Service/Process tools


Part I

  • What is Semantic Web?

  • What makes the Semantic web

    • Ontologies – importance of relationships and knowledge

      • Types and examples of ontologies

    • Metadata and Semantic Annotation -- metadata classifications

  • Representation and Languages

    • Why XML is not enough

    • RDF and RDFS: describing semantic web resources; RDF as a triple, RDF as a graph (show example RDF/S)

    • OWL

  • RDF Query processing and storage


Three generations of Information Systems: Where we have come from, where we are going

  • Generation I (1980s): Data (schema, “semantic data modeling”)

    • Heterogeneous databases / federated databases research

    • Systems: Mermaid, DDTS, Intervisio

  • Generation II (1990s): Metadata (domain model)

    • Metadata-based integration, mediator systems, digital libraries

    • Systems: VisualHarness, InfoHarness, AdaptX/Harness

  • Generation III (2000s): Semantics (ontology, context, relationships, KB)

    • Semantic Web technologies and platforms

    • Systems: MediaAnywhere, InfoQuilt, OBSERVER, Semagix Freedom


Broad Scope of Semantic (Web) Technology

[Figure: semantic technology positioned along three dimensions of agreement]

  • Degree of Agreement: Informal, Semi-Formal, Formal

  • Agreement About: Data/Info., Function, Execution, QoS

  • Scope of Agreement: Task/App, Domain, Industry, Gen. Purpose/Broad Based, Common Sense

  • Regions marked in the figure: Current Semantic Web Focus; Semantic Web Processes; Lots of Useful Semantic Technology (interoperability, integration)

  • Other dimensions: how agreements are reached

Cf: Guarino, Gruber


What is the Semantic Web?

  • "The Semantic Web is an extension of the current web in which information is given well-defined meaning, better enabling computers and people to work in cooperation." -- Tim Berners-Lee, James Hendler, Ora Lassila, The Semantic Web, Scientific American, May 2001

  • Ontologies

  • RDF/RDFS or OWL Syntax – machine processable

  • Semantic Metadata – annotation of web resources


“An ontology is a specification of a conceptualization” (T. Gruber)

  • A conceptualization is the way we think about a domain

  • A specification provides a formal way of writing it down

“Building Ontologies from the Ground Up: When users set out to model their professional activity” – Mark Musen


Conceptualization and Ontology

[Figure: within everything that can be expressed in the language, the ontology constrains the possible interpretations of what can be expressed]

http://www.w3c.it/events/minerva20040706/guarino.pdf


Central Role of Ontology

  • Ontology represents agreement, represents common terminology/nomenclature

  • Ontology is populated with extensive domain knowledge or known facts/assertions

  • Key enabler of semantic metadata extraction from all forms of content:

    • unstructured text (and 150 file formats)

    • semi-structured (HTML, XML) and

    • structured data

  • Ontology is in turn the centerpiece that enables

    • resolution of semantic heterogeneity

    • semantic integration

    • semantically correlating/associating objects and documents


Types of Ontologies (or things close to ontology)

  • Upper ontologies: modeling of time, space, process, etc

  • Broad-based or general purpose ontology/nomenclatures: Cyc, CIRCA ontology (Applied Semantics), SWETO, WordNet

  • Domain-specific or Industry specific ontologies

    • News: politics, sports, business, entertainment

    • Financial Market

    • Terrorism

    • Pharma

    • GlycO, ProPreO

    • GO (a nomenclature), UMLS-inspired ontology, …, MGED

  • Application Specific and Task specific ontologies

    • Anti-money laundering

    • Equity Research

    • Repertoire Management

    • Financial irregularity

Fundamentally different approaches are used to develop ontologies at the two ends of the above spectrum


Building ontology

Three broad approaches:

  • Option 1: social process/manual: many years, committees

    • Can be based on a metadata standard

  • Option 2: automatic taxonomy generation (statistical clustering/NLP): limitations/problems with quality, dependence on corpus, naming

  • Option 3: descriptional component (schema) designed by domain experts; description base (assertional component, extension) populated by automated processes from trusted knowledge sources

Option 2 is being investigated in several research projects;

Option 3 is currently supported by Semagix Freedom


SUMO -- http://ontology.teknowledge.com/


Part of the CYC Upper Ontology

http://www.cyc.com/cyc/technology/whatiscyc_dir/whatdoescycknow


SWETO (Semantic Web Testbed Ontology) Current Status

  • Developed using Semagix technology for free non-commercial usage by the SW community; some initial users

  • V1.4 population includes over 800,000 entities and over 1,500,000 explicit relationships among them

  • Population continues from diverse sources, extending the ontology into multiple domains; new smaller and larger releases due soon, in RDF and OWL versions

  • Significant information for provenance/trust support [UMBC partnership]

  • 97% of disambiguation performed automatically, 2% manually; not yet of high enough quality to serve as an evaluation test set (e.g., low connectivity)

  • Working on test harness, quality measures, and benchmarks


Expressiveness Range: Knowledge Representation and Ontologies

[Figure: a spectrum running from simple taxonomies to expressive ontologies]

  • Spectrum, from least to most expressive: Catalog/ID; Terms/glossary; Thesauri (“narrower term” relation); Informal is-a; Formal is-a; Formal instance; Frames (properties); Value Restriction; Disjointness, Inverse, part of…; General Logical constraints

  • Languages and systems placed along the spectrum: DB Schema, OO, RDF, RDFS, DAML, OWL, Wordnet, UMLS, GO, KEGG, TAMBIS, EcoCyc, BioPAX, Pharma, SWETO, GlycO, CYC, IEEE SUO

Ontology Dimensions After McGuinness and Finin


Gene Ontology (GO)

  • Comprises three independent “ontologies”

    • molecular function of gene products

    • cellular component of gene products

    • biological process representing the gene product’s higher order role.

  • Uses these terms as attributes of gene products in the collaborating databases (gene product associations)

  • Allows queries across databases using GO terms, providing linkage of biological information across species

http://www.geneontology.org/
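The cross-database linkage described above can be illustrated with a minimal Python sketch. The two databases and the gene product names are hypothetical; only the GO identifiers for "DNA binding" and "nucleus" are real terms. The point is that a shared term identifier is enough to query independently maintained databases uniformly.

```python
# Two hypothetical collaborating databases; each maps its own gene products
# to the GO term identifiers they are annotated with.
yeast_db = {"yeast_product_A": {"GO:0003677"}, "yeast_product_B": {"GO:0005634"}}
mouse_db = {"mouse_product_X": {"GO:0003677", "GO:0005634"}}

GO_NAMES = {"GO:0003677": "DNA binding", "GO:0005634": "nucleus"}

def products_with_term(term, databases):
    """Return, per database, the gene products annotated with the given GO term."""
    return {
        name: sorted(product for product, terms in db.items() if term in terms)
        for name, db in databases.items()
    }

dna_binding = products_with_term("GO:0003677", {"yeast": yeast_db, "mouse": mouse_db})
```

Because both databases use the same term identifiers, one query spans species without any per-database translation step.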


GO = Three Ontologies

  • Molecular Function

    • elemental activity or task

    • example: DNA binding

  • Cellular Component

    • location or complex

    • example: cell nucleus

  • Biological Process

    • goal or objective within cell

    • example: secretion

http://www.geneontology.org/


GlycO

  • GlycO: a focused domain ontology for glycomics, embodying knowledge of the structure and metabolism of glycans

    • Contains 770 classes that describe structural features of glycans

    • URL: http://lsdis.cs.uga.edu/projects/glycomics/glyco

  • models the biosynthesis, metabolism, and biological relevance of complex glycans

  • models complex carbohydrates as sets of simpler structures that are connected with rich relationships


GlycO statistics: Ontology schema can be large and complex

  • 770 classes

  • 142 slots

  • Instances Extracted with Semagix Freedom:

    • 69,516 genes (From PharmGKB and KEGG)

    • 92,800 proteins (from SwissProt)

    • 18,343 publications (from CarbBank and MedLine)

    • 12,308 chemical compounds (from KEGG)

    • 3,193 enzymes (from KEGG)

    • 5,872 chemical reactions (from KEGG)

    • 2,210 N-glycans (from KEGG)


GlycO taxonomy

The first levels of the GlycO taxonomy

Most relationships and attributes in GlycO

GlycO exploits the expressiveness of OWL-DL.

Cardinality constraints, value constraints, Existential and Universal restrictions on Range and Domain of properties allow the classification of unknown entities as well as the deduction of implicit relationships.


Query and visualization


A biosynthetic pathway

[Figure: pathway nodes N-glycan_alpha_man_4 and N-glycan_beta_GlcNAc_9, with the enzyme N-acetyl-glucosaminyl_transferase_V]

  • GNT-I attaches GlcNAc at position 2

  • GNT-V attaches GlcNAc at position 6

  • UDP-N-acetyl-D-glucosamine + alpha-D-Mannosyl-1,3-(R1)-beta-D-mannosyl-R2 <=> UDP + N-Acetyl-beta-D-glucosaminyl-1,2-alpha-D-mannosyl-1,3-(R1)-beta-D-mannosyl-R2

  • UDP-N-acetyl-D-glucosamine + G00020 <=> UDP + G00021


The impact of GlycO

  • GlycO models classes of glycans with unprecedented accuracy

  • Implicit knowledge about glycans can be deductively derived

  • Experimental results can be validated according to the model


N-Glycosylation Process (NGP)

By N-glycosylation Process, we mean the identification and quantification of glycopeptides

[Figure: experimental workflow]

  • Cell Culture, extract: Glycoprotein Fraction

  • proteolysis: Glycopeptides Fraction (1)

  • Separation technique I: Glycopeptides Fraction (n)

  • PNGase: Peptide Fraction (n)

  • Separation technique II: Peptide Fraction (n*m)

  • Mass spectrometry: ms data and ms/ms data

  • Data reduction: ms peaklist and ms/ms peaklist

  • Downstream steps and products shown: binning, Peptide identification, N-dimensional array, Peptide list, Data correlation, Signal integration, Glycopeptide identification and quantification


ProPreO - Experimental Proteomics Process Ontology

  • ProPreO models the phases of a proteomics experiment using five fundamental concepts:

    • Data: (Example: a peaklist file from ms/ms raw data)

    • Data_processing_applications: (Example: MASCOT* search engine)

    • Hardware: embodies instrument types used in proteomics (Example: ABI_Voyager_DE_Pro_MALDI_TOF)

    • Parameter_list: describes the different types of parameter lists associated with experimental phases

    • Task: (Example: component separation, used in chromatography)

*http://www.matrixscience.com/


Semantic Annotation of Scientific Data

ms/ms peaklist data

830.9570 194.9604 2
580.2985 0.3592
688.3214 0.2526
779.4759 38.4939
784.3607 21.7736
1543.7476 1.3822
1544.7595 2.9977
1562.8113 37.4790
1660.7776 476.5043

Annotated ms/ms peaklist data

<ms/ms_peak_list>
  <parameter instrument="micromass_QTOF_2_quadropole_time_of_flight_mass_spectrometer"
             mode="ms/ms"/>
  <parent_ion_mass>830.9570</parent_ion_mass>
  <total_abundance>194.9604</total_abundance>
  <z>2</z>
  <mass_spec_peak m/z="580.2985" abundance="0.3592"/>
  <mass_spec_peak m/z="688.3214" abundance="0.2526"/>
  <mass_spec_peak m/z="779.4759" abundance="38.4939"/>
  <mass_spec_peak m/z="784.3607" abundance="21.7736"/>
  <mass_spec_peak m/z="1543.7476" abundance="1.3822"/>
  <mass_spec_peak m/z="1544.7595" abundance="2.9977"/>
  <mass_spec_peak m/z="1562.8113" abundance="37.4790"/>
  <mass_spec_peak m/z="1660.7776" abundance="476.5043"/>
</ms/ms_peak_list>
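The step from a raw peaklist to an annotated one can be sketched in Python with the standard library. This is an illustration under assumptions, not the tooling used in the course: the element name `ms_ms_peak_list` and the attribute name `mz` are substitutes chosen here because "/" is not allowed in XML names, and the `annotate` function is hypothetical.

```python
import xml.etree.ElementTree as ET

# First line: parent ion mass, total abundance, charge; then one peak per line.
RAW = """830.9570 194.9604 2
580.2985 0.3592
688.3214 0.2526"""

def annotate(raw, instrument, mode):
    """Wrap bare peaklist numbers in elements that name what each number means."""
    lines = [line.split() for line in raw.strip().splitlines()]
    parent_mass, total_abundance, charge = lines[0]
    root = ET.Element("ms_ms_peak_list")  # assumed name; see lead-in
    ET.SubElement(root, "parameter", instrument=instrument, mode=mode)
    ET.SubElement(root, "parent_ion_mass").text = parent_mass
    ET.SubElement(root, "total_abundance").text = total_abundance
    ET.SubElement(root, "z").text = charge
    for mz, abundance in lines[1:]:
        ET.SubElement(root, "mass_spec_peak", mz=mz, abundance=abundance)
    return root

root = annotate(RAW, "micromass_QTOF_2_quadropole_time_of_flight_mass_spectrometer", "ms/ms")
xml_text = ET.tostring(root, encoding="unicode")
```

The numbers are unchanged; what the annotation adds is the explicit role of each value, which is what lets later processing steps interpret the file without out-of-band knowledge of the column layout.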




Syntax for Ontologies and Metadata

  • Why not use XML?

  • Why use OWL?

  • Or for that matter why RDF?

  • So many questions …


From XML to OWL

NO SEMANTICS

  • XML

    • surface syntax for structured documents

    • imposes no semantic constraints on the meaning of these documents.

  • XML Schema

    • is a language for restricting the structure of XML documents.

  • RDF

    • is a datamodel for objects ("resources") and relations between them,

    • provides a simple semantics for this datamodel

    • these datamodels can be represented in an XML syntax.

  • RDF Schema

    • is a vocabulary for describing properties and classes of RDF resources

    • with a semantics for generalization-hierarchies of such properties and classes.

  • OWL

    • adds more vocabulary for describing properties and classes:

      • relations between classes (e.g. disjointness),

      • cardinality (e.g. "exactly one"),

      • equality, richer typing of properties,

      • characteristics of properties (e.g. symmetry), and enumerated classes.

[Figure annotation: expressive power increases from XML (no semantics) to OWL (semantics); relationships as first-class objects are the key to semantics]

http://en.wikipedia.org/wiki/Semantic_web#Components_of_the_Semantic_Web


From an alphabet to a Language

  • XML

    • “XML is only the first step to ensuring that computers can communicate freely. XML is an alphabet for computers and as everyone traveling in Europe knows, knowing the alphabet doesn’t mean you can speak Italian or French.” – Business Week, March 18th 2002

    • Example cited by Nicola Guarino in http://www.w3c.it/events/minerva20040706/guarino.pdf

  • RDF/RDFS and OWL would therefore be akin to the language computers use to communicate

  • And ontologies represented in these languages would be akin to the exact interpretations of the concepts being communicated


Syntax for Ontologies and Metadata

  • RDF

    • A simple W3C standard used to describe Web resources

    • Relationships in RDF (properties) are binary relationships between two resources, or between a resource and a literal

    • The related resources take on the roles of Subject and Object, respectively; the property is the Predicate

    • The Subject, Predicate and Object compose an RDF statement

http://www.w3.org/RDF/


What is RDF?

  • Resource Description Framework

  • Proposed as the base semantic web language

  • Data model for describing properties of resources

  • Statements about properties and values of web resources

  • Machine-understandable metadata


RDF Elements

  • Resource:

    • Something that can be described/referenced

    • Identified by a URI

  • Property:

    • Relationship from a resource to a value:

      • Another resource

      • An atomic value/literal

  • Statement:

    • resource -> property -> value


RDF Statement


RDF Model

  • Formal Data Model

    • Directed labeled graph

      • Nodes: resources or literals

      • Edges: properties (relationships/attributes)

      • Labels: URIs of nodes and edges

    • Collection of triples

      • subject (resource)

      • predicate (property)

      • object (resource or literal)

  • W3C recommendation
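The two views of the RDF model, a collection of triples and a directed labeled graph, can be shown side by side in a short Python sketch. The namespace `http://example.org/sample#` and the helper `as_graph` are assumptions made for illustration; the resource names echo the basketball example used in this course.

```python
from collections import defaultdict

# Hypothetical namespace, used only for illustration.
EX = "http://example.org/sample#"

# View 1: the model as a collection of (subject, predicate, object) triples.
triples = {
    (EX + "Kobe_Bryant", EX + "plays_for", EX + "LA_Lakers"),
    (EX + "Kobe_Bryant", "rdfs:label", '"Kobe Bryant"'),  # object is a literal
    (EX + "Miami_Heat", EX + "competes_with", EX + "LA_Lakers"),
}

def as_graph(triples):
    """View 2: the same triples as a directed labeled graph, node -> [(edge label, node)]."""
    graph = defaultdict(list)
    for s, p, o in triples:
        graph[s].append((p, o))
    return graph

graph = as_graph(triples)
outgoing = sorted(graph[EX + "Kobe_Bryant"])
```

Nothing is added or lost in the conversion: each triple is exactly one labeled edge, which is why stores can choose either a triple-centric or a graph-centric organization.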


Graph Model


Triple Model


RDF Syntax

  • Formal syntax

  • Encoded in XML

  • Unambiguous property names and values

  • RDF adds rules for interpretation

  • W3C recommendation


Example

<sample:Athlete rdf:about="&sample;Kobe_Bryant">
  <rdfs:label xml:lang="en">Kobe Bryant</rdfs:label>
  <sample:plays_for rdf:resource="&sample;LA_Lakers"/>
</sample:Athlete>

<sample:Athlete rdf:about="&sample;Shaquille_ONeal">
  <rdfs:label xml:lang="en">Shaquille O'Neal</rdfs:label>
  <sample:plays_for rdf:resource="&sample;Miami_Heat"/>
</sample:Athlete>

<sample:Team rdf:about="&sample;LA_Lakers">
  <rdfs:label xml:lang="en">LA Lakers</rdfs:label>
</sample:Team>

<sample:Team rdf:about="&sample;Miami_Heat">
  <rdfs:label xml:lang="en">Miami Heat</rdfs:label>
  <sample:competes_with rdf:resource="&sample;LA_Lakers"/>
</sample:Team>

<sample:Coach rdf:about="&sample;sample1_Instance_8">
  <rdfs:label xml:lang="en">sample1_Instance_8</rdfs:label>
  <sample:coaches rdf:resource="&sample;LA_Lakers"/>
</sample:Coach>


What is RDFS?

  • RDF Vocabulary Description Language (RDF Schema)

  • Extension of RDF: same data model

    • graph or triples

  • A hierarchy of classes

  • A hierarchy of properties relating classes

  • W3C recommendation


RDF Schema and RDF Instances

[Figure: an RDF schema graph with an instance graph beneath it, for a flight-ticketing example]

  • Schema node labels: Passenger, Ticket, Flight, Customer, Client, FFlyer, FFNo, Payment, CCard, Cash, Bank Account, String, float

  • Property (edge) labels: purchased, for, forflight, paidby, creditedto, holder, fname, lname, number, no, amount, ffid, fflierno

  • Schema-level relations: subClassOf(isA), subPropertyOf

  • Instance layer: resources labeled &r1 through &r11, connected to the schema by typeOf(instance) and to each other by the properties above, with string literals for first names, last names and a frequent-flyer ID (“XYZ123”)

RDFS Core Classes

  • rdfs:Class

    • Class of resources that are RDF classes

    • Instance of rdfs:Class

  • rdfs:Resource

    • All things being described

    • The class type of everything in RDF(S)

    • Instance of rdfs:Class

  • rdf:Property

    • Class of RDF properties

    • Instance of rdfs:Class

http://www.w3.org/TR/rdf-schema/


RDFS Core Properties

  • rdf:type

    • A resource is an instance of a class

    • Instance of rdf:Property

  • rdfs:subClassOf

    • All instances of a class are also instances of another class

    • Instance of rdf:Property

  • rdfs:subPropertyOf

    • All resources related by one property are also related by another property

    • Instance of rdf:Property


RDFS Core Properties (continued)

  • rdfs:range

    • All values of a property are instances of one or more classes

      • The value MUST be an instance of all range classes

    • Instance of rdf:Property

  • rdfs:domain

    • All resources with the given property are instances of one or more classes

      • The resource MUST be an instance of all domain classes

    • Instance of rdf:Property
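What the core properties entail can be made concrete with a small forward-chaining sketch in Python. It covers only the five rules listed on these slides and is not a complete RDFS reasoner; the `ex:` schema (for example `ex:paidby` with domain `ex:Ticket`) is a hypothetical one, loosely named after the ticketing figure.

```python
def rdfs_closure(triples):
    """Apply a few RDFS entailment rules repeatedly until no new triples appear."""
    inferred = set(triples)
    while True:
        new = set()
        for s, p, o in inferred:
            for s2, p2, o2 in inferred:
                # rdfs:subClassOf is transitive
                if p == p2 == "rdfs:subClassOf" and o == s2:
                    new.add((s, "rdfs:subClassOf", o2))
                # instances of a class are also instances of its superclasses
                if p == "rdf:type" and p2 == "rdfs:subClassOf" and o == s2:
                    new.add((s, "rdf:type", o2))
                # resources related by a property are related by its superproperties
                if p2 == "rdfs:subPropertyOf" and p == s2:
                    new.add((s, o2, o))
                # rdfs:domain: the subject is an instance of the domain class
                if p2 == "rdfs:domain" and p == s2:
                    new.add((s, "rdf:type", o2))
                # rdfs:range: the value is an instance of the range class
                if p2 == "rdfs:range" and p == s2:
                    new.add((o, "rdf:type", o2))
        if new <= inferred:
            return inferred
        inferred |= new

# Hypothetical schema and one instance statement.
facts = {
    ("ex:CCard", "rdfs:subClassOf", "ex:Payment"),
    ("ex:paidby", "rdfs:domain", "ex:Ticket"),
    ("ex:paidby", "rdfs:range", "ex:CCard"),
    ("ex:t1", "ex:paidby", "ex:c1"),
}
closure = rdfs_closure(facts)
```

From the single statement `ex:t1 ex:paidby ex:c1`, the rules derive that `ex:t1` is a Ticket and that `ex:c1` is both a CCard and a Payment. Note that domain and range act as inference rules here, not as validation constraints.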


OWL, W3C definition

  • “language for defining structured, Web-based ontology which enables richer integration and interoperability of data across application boundaries”

http://www.w3.org/2004/OWL/


OWL Use Cases

  • Web portals

  • Multimedia Collections

  • Corporate web site management

  • Design documentation

  • Agents and services

  • Ubiquitous computing


OWL Design Goals

  • Shared ontologies

  • Ontology evolution

  • Ontology interoperability

  • Inconsistency detection

  • Expressivity vs. scalability

  • Ease of use

  • Compatibility with other standards

  • Internationalization


What’s in OWL, but not in RDF

  • Ability to be distributed across many systems

    • By means of owl:imports (similar to ‘include’ in C/C++)

  • Scalable to Web needs (?)

  • Compatible with Web standards for:

    • accessibility, and

    • Internationalization

  • Open and extensible


OWL open and extensible

  • RDF Schema (meta-modeling facilities, i.e. classes of classes)

  • OWL Full

  • OWL DL (Description Logics)

  • OWL Lite

    • targeting tool builders


owl:Class

  • Subclass of rdfs:Class

  • Better to forget about classes of classes

  • Top-most class: owl:Thing


OWL Properties

  • Object properties

    • Example: Ana  owns  Cuba

    • If the range is a literal / typed value, then ERROR

  • Datatype properties

    • Example: Ana  age  25

    • XML Schema data types supported

    • DB people happy


Transitivity of properties

X  p1  Y

Y  p1  Z

implies X  p1  Z

  • Transitivity existed already in RDF Schema

    • “subClassOf”, and “subPropertyOf”

  • Example: located_in

    • Atlanta  located_in  Georgia

    • Georgia  located_in  U.S.A.

    • implies Atlanta  located_in  U.S.A.
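The inference licensed by a transitive property amounts to computing a transitive closure. A minimal Python sketch, using the located_in example from this slide (the function name `transitive_closure` is an illustrative choice):

```python
def transitive_closure(pairs):
    """Add (x, z) whenever (x, y) and (y, z) are present, until nothing new appears."""
    closure = set(pairs)
    while True:
        new = {(x, z) for (x, y) in closure for (y2, z) in closure if y == y2}
        if new <= closure:
            return closure
        closure |= new

located_in = {("Atlanta", "Georgia"), ("Georgia", "U.S.A.")}
inferred = transitive_closure(located_in)
```

The closure contains the two asserted statements plus the derived one, Atlanta located_in U.S.A.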


Symmetric properties

X  p1  Y

implies Y  p1  X

  • Example: has_border_with

    • Portugal  has_border_with  Spain

    • implies Spain  has_border_with  Portugal


Functional Properties

X  p1  Y

X  p1  Z

imply Z is the same as Y

(they describe the same entity)

  • example, p1 = has_name

  • Portugal  has_capital  &r1

  • Portugal  has_capital  &r2

Result: &r1 and &r2 represent the same entity


Inverse Functional Properties

Y  p1  A

Z  p1  A

imply Z is the same as Y

(they describe the same entity)

  • example, p1 = has_email

  • &r1:Tim Finin  has_email  [email protected]

  • &r2:Timothy Finin  has_email  [email protected]

Result: &r1 and &r2 represent the same entity
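Both kinds of property lead to the same type of conclusion: two identifiers denote one entity. A minimal Python sketch of that inference follows. The function `same_as_pairs` is hypothetical, and the email address is a placeholder because the address on the slide is not recoverable.

```python
def same_as_pairs(triples, functional=(), inverse_functional=()):
    """Return pairs of identifiers inferred to denote the same entity."""
    pairs = set()
    for s1, p1, o1 in triples:
        for s2, p2, o2 in triples:
            if p1 != p2:
                continue
            # Functional: same subject and property, so the two values are the same.
            if p1 in functional and s1 == s2 and o1 < o2:
                pairs.add((o1, o2))
            # Inverse functional: same value and property, so the two subjects are the same.
            if p1 in inverse_functional and o1 == o2 and s1 < s2:
                pairs.add((s1, s2))
    return pairs

capitals = {("Portugal", "has_capital", "&r1"), ("Portugal", "has_capital", "&r2")}
emails = {
    ("&r1:Tim Finin", "has_email", "placeholder@example.org"),      # placeholder address
    ("&r2:Timothy Finin", "has_email", "placeholder@example.org"),
}

same_capital = same_as_pairs(capitals, functional={"has_capital"})
same_person = same_as_pairs(emails, inverse_functional={"has_email"})
```

The `<` comparisons only keep each unordered pair once. This is the mechanism behind automatic entity disambiguation: no contradiction is reported, the two identifiers are simply merged.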


OWL Cardinality

  • minCardinality

  • maxCardinality

  • “cardinality”

    • When min = max

  • hasValue

    • an individual belongs to the class if the property has the given value
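A rough Python sketch of what these restrictions express is given below. It is a closed-world check over the stated triples only, which is a deliberate simplification: an OWL reasoner works under the open-world assumption and would not treat a missing value as a violation of a minimum cardinality. The helper names are illustrative.

```python
def values_of(triples, subject, prop):
    """All distinct values the subject has for the property."""
    return {o for s, p, o in triples if s == subject and p == prop}

def within_cardinality(triples, subject, prop, min_card=0, max_card=None):
    """Check min <= number of stated values <= max (max_card=None means unbounded)."""
    n = len(values_of(triples, subject, prop))
    return n >= min_card and (max_card is None or n <= max_card)

def has_value(triples, subject, prop, value):
    """hasValue: the subject qualifies if one of its values is the given one."""
    return value in values_of(triples, subject, prop)

triples = {
    ("Portugal", "has_capital", "Lisbon"),
    ("Portugal", "has_border_with", "Spain"),
}

# "cardinality" is the case min = max, here exactly one capital.
exactly_one_capital = within_cardinality(triples, "Portugal", "has_capital", 1, 1)
borders_spain = has_value(triples, "Portugal", "has_border_with", "Spain")
```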


OWL Tools

  • Pellet (umd.edu)

    • DL-based reasoner implemented in Java

  • Euler

    • an inference engine supporting logic-based proofs; determines whether a given set of facts supports a given conclusion

  • FaCT (Ian Horrocks)

    • DL classifier that can also be used for modal logic satisfiability testing


RDF Storages

  • Jena, Sesame, Redland, Triple, 3store, RDFSuite, RDFStore, Kowari, Yars

  • Brahms (developed at LSDIS)

  • Variety of available storages

  • Different APIs and languages

  • Support from RDF to OWL Full, even reasoning

  • Storage and query approach: graph vs. triple-centric

http://www.w3.org/2001/sw/Europe/reports/rdf_scalable_storage_report/


Jena

  • Implemented in Java by HP Laboratories

  • Support for RDF, RDFS and OWL

  • Reasoning / inference engine

  • Support for reified statements

  • In-memory and persistent storage

    (Oracle, MySQL, PostgreSQL)

  • Query language: RDQL, SPARQL

  • Read/write RDF in RDF/XML, N3 and N-Triples format

  • Triple-centric organization and API


Jena – graph abstraction

  • Graph interface is separated from (persistent) triple storage layer

  • Special support for different types of graphs - optimized for performance

  • Supports operations like add, delete, find

“Efficient RDF Storage and Retrieval in Jena2” - Kevin Wilkinson, Craig Sayers, Harumi Kuno, Dave Reynolds


Jena – query processing

  • Converting multiple patterns in query into one query to DB

  • Use DB query optimizer instead of executing multiple queries from Jena level

  • Cluster properties that are likely to be accessed together - optimize for common patterns

  • Associate a table with pattern (best) or span pattern between tables (requires join operation)

  • Query may span between different graphs, but it can be optimized only if they are in the same database
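The first two points, turning several triple patterns into a single database query so the database optimizer can plan the joins, can be sketched with SQLite from the Python standard library. This is not Jena code: the single `triples` table and the function `patterns_to_sql` are assumptions made for illustration, and Jena's actual table layout differs.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE triples (subj TEXT, pred TEXT, obj TEXT)")
conn.executemany("INSERT INTO triples VALUES (?, ?, ?)", [
    ("ex:a", "rdfs:label", "foo"), ("ex:a", "rdf:type", "ex:Thing"),
    ("ex:b", "rdfs:label", "bar"), ("ex:b", "rdf:type", "ex:Other"),
])

def patterns_to_sql(patterns):
    """Translate triple patterns (variables start with '?') into ONE self-join SELECT."""
    select, where, params, seen = [], [], [], {}
    for i, pattern in enumerate(patterns):
        for column, term in zip(("subj", "pred", "obj"), pattern):
            ref = f"t{i}.{column}"
            if term.startswith("?"):
                if term in seen:
                    where.append(f"{ref} = {seen[term]}")  # shared variable becomes a join
                else:
                    seen[term] = ref
                    select.append(f"{ref} AS {term[1:]}")
            else:
                where.append(f"{ref} = ?")  # constant becomes a bound parameter
                params.append(term)
    tables = ", ".join(f"triples t{i}" for i in range(len(patterns)))
    sql = f"SELECT {', '.join(select) or '1'} FROM {tables}"
    if where:
        sql += " WHERE " + " AND ".join(where)
    return sql, params

# The two patterns of the RDQL-style query: (?p rdfs:label "foo") (?p rdf:type ?q)
sql, params = patterns_to_sql([("?p", "rdfs:label", "foo"), ("?p", "rdf:type", "?q")])
rows = conn.execute(sql, params).fetchall()
```

Issuing one joined statement instead of one lookup per pattern leaves join ordering and index selection to the database, which is the optimization the slide describes.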


Redland, Rasqal, Raptor

  • Redland: storage for RDF triples - does not implement any query language by itself

  • This is the main module to include in an RDF manipulation system

  • Implemented in pure C for portability

  • Rich API makes it possible to build modules on top of it

  • Rasqal - RDF query module

    • RDQL

    • SPARQL

  • Raptor - a fast RDF parser


Redland

  • API available in different languages

    • C, C#, Java, Perl, Python, PHP, Ruby, Tcl

  • API for manipulating

    • triples, URI/literals, graphs

  • Portable - can be built on most OSes

  • Scalable to handle millions of triples

    • when using persistent storage

    • but indexing is very space-consuming

  • Support for context and hierarchy of models


Redland - model

  • Abstraction of model to support different storages

  • In-memory and persistent models

    • BerkeleyDB, 3store, MySQL

  • Rich, triple-centric API

“The Design and Implementation of the Redland RDF Application Framework” - David Beckett


Sesame

  • Implemented in Java

  • Database independent

    • idea of SAIL (Storage And Inference Layer)

  • Scalable architecture

  • Implementation of remote models

    • can query different models over network

  • Graph-centric approach

  • Language: RQL


Sesame - architecture

  • RAL - Repository Abstraction Layer

    • makes Sesame storage independent

    • API supports RDF Schema semantics (e.g. subsumption reasoning)

    • can be stacked one on another

    • interface oriented toward persistent storage (DBMS, Object-Relational DB)

    • data returned as streams

    • can even use net-based RDF services (!)

  • Because of poor performance, a cache was implemented as one of the RALs

    • cache mainly for RDFS, as it needs code support in reasoning (subClassOf, ...)

“Sesame: An Architecture for Storing and Querying RDF Data and Schema Information” - Jeen Broekstra, Arjohn Kampman, Frank van Harmelen


Sesame – query module

  • Query module

    • query plan and optimizer similar to already known DB solutions

    • query is translated to a set of simple RAL calls

    • each leaf of the query plan can ‘evaluate itself’ and pull data from RAL

    • data are returned as streams

    • lack of optimization on storage level


Brahms

  • Implemented in C++ (bindings for Java also available)

  • Graph-centric approach

  • Designed to support large in-memory RDF graphs

  • Optimized for speed and memory usage

    • other storages do not offer optimized in-memory implementation for large graphs

    • main memory offers the fastest access; using persistent storage decreases performance

  • In-memory storage with fast precomputed graph snapshot loading

    • minimize cold-start time


Brahms

  • Framework for fast discovery of long association paths in large RDF bases

    • memory and CPU intense algorithms

  • Rich API, but no query language supported

    • higher level query languages do not support variable length association path queries

    • association path discovery algorithms operate on low-level graph API

  • Outperformed Jena, Sesame and Redland during tests for association discovery

    • was also able to work efficiently on much larger in-memory graphs, which the other storages did not handle

“BRAHMS: A WorkBench RDF Store And High Performance Memory System for Semantic Association Discovery” (Technical report) - Maciej Janik, Krys Kochut
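To show what an association path query over a low-level graph API looks like, here is a small Python sketch. It is not the BRAHMS algorithm or API: it is a plain depth-first enumeration of simple paths that ignores edge direction, and the function `find_paths` and the `^-1` label for a backwards edge are conventions invented for this example.

```python
from collections import defaultdict

def find_paths(triples, start, goal, max_length):
    """Enumerate simple paths of at most max_length edges, ignoring edge direction."""
    neighbors = defaultdict(list)
    for s, p, o in triples:
        neighbors[s].append((p, o))
        neighbors[o].append((p + "^-1", s))  # allow traversing the edge backwards
    paths = []
    stack = [(start, [start])]  # a path alternates node, edge label, node, ...
    while stack:
        node, path = stack.pop()
        if node == goal and len(path) > 1:
            paths.append(path)
            continue
        if len(path) // 2 >= max_length:  # len(path) // 2 is the number of edges
            continue
        for label, nxt in neighbors[node]:
            if nxt not in path[::2]:  # keep the path simple: no repeated nodes
                stack.append((nxt, path + [label, nxt]))
    return paths

triples = [
    ("Kobe_Bryant", "plays_for", "LA_Lakers"),
    ("Shaquille_ONeal", "plays_for", "Miami_Heat"),
    ("Miami_Heat", "competes_with", "LA_Lakers"),
]
paths = find_paths(triples, "Kobe_Bryant", "Shaquille_ONeal", max_length=3)
```

The path length is not known in advance, which is why pattern-based query languages with a fixed number of triple patterns are a poor fit, and why the number of candidate paths (and hence memory and CPU use) grows quickly on large graphs.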


Why RDF languages?

  • Find resources based on predicates, values, labels or associations

  • SQL is not good for querying RDF data

    • different models: relational and graph

  • XML query languages cannot deal with graph data

  • Syntactic approach is not enough

  • Semantic querying is required

  • Inferencing is desirable


Available query languages

  • RQL, RDQL, SeRQL, Triple, SPARQL (latest), SquishQL, Versa, N3, RxPath, RDFQL

  • Majority of languages have roots in SQL

  • No single standard comparable to SQL

  • Some languages are tightly coupled with specific storages


RQL

  • Based on OQL

  • Utilizes functional approach with support for generalized path expressions

    • both nodes and edges can become variables

  • Not completely compatible with RDF specification

    • has some additional restrictions

  • Return bindings to variables (no closure)

  • Implemented in RDFSuite and partially in Sesame

    select Res from {Res} ns:label {x} where x=“foo” using namespace ns=…

“RQL: A Declarative Query Language for RDF” - Greg Karvounarakis, Sofia Alexaki, Vassilis Christophides, Dimitris Plexousakis, Michel Scholl


RDQL

  • SQL-like syntax

    • easy to adopt for DB users

  • Can specify patterns of triples to select

  • Schema is not interpreted

  • Not closed under queries

    • output as bindings to selected variables

  • Implemented in Jena

    select ?p, ?q where (?p <rdfs:label> “foo”) (?p <rdf:type> ?q)

“RDQL - A Query Language for RDF” (W3C Member Submission) - Andy Seaborne (HP Labs Bristol)


SeRQL

  • Sesame RDF Query Language

  • Based on RQL and RDQL

  • Support for generalized path expressions and optional matching

  • Query filters

    • select-from-where – returns variable bindings and is not closed

    • construct-from-where – returns a matching subgraph that can itself be queried (closure)

“SeRQL: Sesame RDF query language” - Jeen Broekstra


Triple

  • Derived from F-logic

    • should be easy to adopt for logic programmers

  • Triples are logic expressions

    • S [ P  O ]

  • Queries and triples have the same logic representation

  • Reasoning is a part of language

  • Does not fulfill closure property

  • Implemented in Triple system

    FORALL X <- ( X[rdfs:label -> “foo”] )@default:ln.

“TRIPLE - A Query, Inference, and Transformation Language for the Semantic Web” - Michael Sintek, Stefan Decker


SPARQL

  • W3C effort to standardize a query language

    • best experience and requirements from different languages (like RQL, RDQL)

  • Based on matching graph patterns

    • triples, paths, subgraphs

    • optional blocks and matching

    • matching alternatives (union) and disjunction

  • Many additional operators

    • grouping, sorting, limit results

      PREFIX foaf: <http://xmlns.com/foaf/0.1/>

      SELECT ?name ?mbox

      WHERE { ?x foaf:name ?name . OPTIONAL { ?x foaf:mbox ?mbox } }

“SPARQL Query Language for RDF” (W3C Working Draft) - Eric Prud'hommeaux , Andy Seaborne
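How graph pattern matching with an optional block behaves can be sketched in a few lines of Python. This is a toy matcher written for illustration, not a SPARQL engine; the function names and the sample data are assumptions. It mirrors the query above: every resource with a name is returned, and the mailbox is bound only when one exists.

```python
def match(pattern, triple, binding):
    """Try to extend a binding so that the pattern matches the triple; None on failure."""
    result = dict(binding)
    for term, value in zip(pattern, triple):
        if term.startswith("?"):  # variable: bind it, or check the existing binding
            if result.setdefault(term, value) != value:
                return None
        elif term != value:       # constant: must be equal
            return None
    return result

def solve(triples, patterns, binding=None):
    """Yield every binding that satisfies all patterns (a basic graph pattern)."""
    binding = binding or {}
    if not patterns:
        yield binding
        return
    for triple in triples:
        extended = match(patterns[0], triple, binding)
        if extended is not None:
            yield from solve(triples, patterns[1:], extended)

def solve_optional(triples, required, optional):
    """Keep each required solution; extend it with the optional part when possible."""
    for binding in solve(triples, required):
        extensions = list(solve(triples, optional, binding))
        yield from extensions or [binding]

data = [
    ("ex:alice", "foaf:name", "Alice"),
    ("ex:alice", "foaf:mbox", "mailto:alice@example.org"),
    ("ex:bob", "foaf:name", "Bob"),
]
results = sorted(
    (b["?name"], b.get("?mbox"))
    for b in solve_optional(data, [("?x", "foaf:name", "?name")], [("?x", "foaf:mbox", "?mbox")])
)
```

Without the optional block, Bob would be dropped from the results because no mailbox triple matches; with it, his row is kept and the mailbox variable is simply left unbound.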


Sample path query

“A Comparison of RDF Query Languages” - Peter Haase, Jeen Broekstra, Andreas Eberhart, Raphael Volz


Expressive power of RDF languages

“Ontology Storage and Querying” (Technical Report No 308) - Aimilia Magkanaraki et al.

