SPARQLeR: Extended Sparql for Semantic Association Discovery. Krzysztof Kochut and Maciej Janik. ESWC 2007, Innsbruck, Austria, June 4, 2007. Work supported by the National Science Foundation Grant No. IIS-0325464, entitled “SemDIS: Discovering Complex Relationships in the Semantic Web”.




Paths in RDF

[Figure: example RDF graph whose edges are labeled child, older, and works_for. Three kinds of paths are marked: a directed path, an undirected path, and an undirected path with specific properties and directionality.]


Why are paths interesting?

  • A path describes how entities are related.

    • Relationships on the path define meaning of this connection.

    • Entities on the path specify the content.

  • Do you have migraine? Try taking magnesium!

    • Path discovered by Dr. D. R. Swanson from partial information available in PubMed publications

      • stress can lead to loss of magnesium in the human body

      • migraine patients seem to be experiencing stress

        … that’s why …

      • migraine could lead to a loss of magnesium, so … take magnesium to fight migraine!

Swanson, D.R. Migraine and Magnesium: Eleven Neglected Connections. Perspectives in Biology and Medicine, 31 (4), 526-557, 1988.


Formally, what is a simple path?

  • Simple directed path between resources $r_0$ and $r_n$ in a description base $R$:

    • a sequence $r_0\, p_1\, r_1\, p_2\, r_2, \ldots, p_{n-1}\, r_{n-1}\, p_n\, r_n$ ($n > 0$)

    • $r_0\, p_1\, r_1$, $r_1\, p_2\, r_2$, …, $r_{n-2}\, p_{n-1}\, r_{n-1}$, $r_{n-1}\, p_n\, r_n$ ($n > 0$) are triples in $R$

    • all of the resources $r_i$ ($0 \le i \le n$) in the path are distinct

  • Simple undirected path between resources $r_0$ and $r_n$ in $R$:

    • a sequence $r_0\, p_1\, r_1\, p_2\, r_2, \ldots, p_{n-1}\, r_{n-1}\, p_n\, r_n$ ($n > 0$)

    • for each $r_{i-1}\, p_i\, r_i$ ($0 < i \le n$) in the path, either $r_{i-1}\, p_i\, r_i$ or $r_i\, p_i\, r_{i-1}$ is a triple in $R$

    • all of the resources $r_i$ ($0 \le i \le n$) in the path are distinct
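
The two definitions above can be checked mechanically. The following is a minimal sketch, assuming a description base stored as a Python set of (subject, property, object) tuples; the function name is_simple_path and the sample triples are hypothetical and only illustrate the definitions.

```python
from typing import Sequence, Set, Tuple

Triple = Tuple[str, str, str]  # (subject, property, object)


def is_simple_path(seq: Sequence[str], triples: Set[Triple], directed: bool = True) -> bool:
    """Check whether seq = [r0, p1, r1, ..., pn, rn] is a simple path in `triples`."""
    # The sequence alternates resource, property, resource, ... and needs
    # at least one property (n > 0), so its length is odd and at least 3.
    if len(seq) < 3 or len(seq) % 2 == 0:
        return False
    resources = seq[0::2]
    # All resources r0..rn must be distinct.
    if len(set(resources)) != len(resources):
        return False
    for i in range(1, len(seq), 2):
        src, prop, dst = seq[i - 1], seq[i], seq[i + 1]
        forward = (src, prop, dst) in triples
        backward = (dst, prop, src) in triples
        if directed and not forward:
            return False
        if not directed and not (forward or backward):
            return False
    return True


# Hypothetical description base (names are illustrative only).
R = {
    ("alice", "child", "bob"),
    ("carol", "child", "bob"),
    ("bob", "works_for", "acme"),
}

print(is_simple_path(["alice", "child", "bob", "works_for", "acme"], R))
print(is_simple_path(["alice", "child", "bob", "child", "carol"], R, directed=False))
```

The second call only succeeds in undirected mode, because the triple carol child bob has to be traversed against its direction.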


Paths and SPARQL

  • A SPARQL query can express only static graph patterns.

    • Some flexibility is introduced by an OPTIONAL part, but it does not solve path problems.

  • No support for flexible length path expressions.

    • Glycan biosynthesis pathway in biology has a specific pattern (properties), but its length may be unknown.

    • Path discovery may be of unknown length and pattern, like in Dr. Swanson’s example.


What do we need to discover paths?

  • Knowledge discovery needs more flexible patterns.

    • Patterns may be partially known or even unknown (unrestricted path).

    • Properties on the path, their order and directionality create a specific meaning.

    • Entities on the path provide content.

    • Relationships to entities outside of the path give an additional context.


Proposed extensions

  • A path may have a flexible length

    • For computational reasons, length is limited.

  • Constraints on properties

    • Specific properties must appear in the path.

    • Their order and directionality are meaningful.

    • They can form a repeating pattern.

  • Constraints on resources

    • Specific resources must be on the path.

    • They can be anywhere on the path or at specific positions.


SPARQLeR

  • Extension of SPARQL for semantic association discovery.

  • Seamlessly integrated into the SPARQL syntax.

  • Graph patterns incorporating simple paths with constraints.

  • Constraints are based on regular expressions over properties.


What is a path in SPARQLeR?

  • Path is a meta-property that connects two resources.

    • Defined as a sequence of interleaving properties and resources.

    • Starts and ends with properties (endpoint resources are not included).

    • A path of length 1 is a sequence with just one property.

      <rdfs:Class rdf:about="http://meta.org/rdf-meta-schema#Path">
        <rdfs:isDefinedBy rdf:resource="http://meta.org/rdf-meta-schema#"/>
        <rdfs:subClassOf rdf:resource="http://www.w3.org/1999/02/22-rdf-syntax-ns#Property"/>
        <rdfs:subClassOf rdf:resource="http://www.w3.org/1999/02/22-rdf-syntax-ns#Seq"/>
        <rdfs:label>Path</rdfs:label>
        <rdfs:comment>The class of RDFMS paths.</rdfs:comment>
      </rdfs:Class>


Path patterns in SPARQLeR

  • Meta-property – similar concept to a property

    • Resource -[property]-> Resource

    • Resource -[path]-> Resource

  • Path as a Sequence

    • Test if a resource is in the path:

      • rdfs:member

    • Test if a resource is at a specific position in the path:

      • rdf:_2, rdf:_4, ...

  • SPARQLeR-specific path properties

    • Test all resources or all properties in the path:

      • rdfms:entityResource and rdfms:propertyResource

        Example: all resources on a path must be of type foo:Person
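
To make the sequence view concrete, here is a minimal sketch that models a matched path as a Python list. It assumes that positions are 1-based and count properties and resources alike, and that a path starts and ends with a property. The helper names and the foo:/ex: identifiers are hypothetical and are not part of SPARQLeR.

```python
from typing import List

# A matched path: property, resource, property, ..., property.
# Odd positions (1, 3, ...) hold properties, even positions hold resources.
path: List[str] = ["foo:p1", "ex:a", "foo:p2", "ex:b", "foo:p1", "ex:c", "foo:p3"]


def length(p: List[str]) -> int:
    """Path length is the number of properties (hops) in the sequence."""
    return (len(p) + 1) // 2


def member(p: List[str], element: str) -> bool:
    """Analogue of `%path rdfs:member <element>`."""
    return element in p


def at(p: List[str], k: int) -> str:
    """Analogue of `%path rdf:_k ?x`, with 1-based position k."""
    return p[k - 1]


def entity_resources(p: List[str]) -> List[str]:
    """Analogue of rdfms:entityResource: every resource on the path."""
    return p[1::2]


def property_resources(p: List[str]) -> List[str]:
    """Analogue of rdfms:propertyResource: every property on the path."""
    return p[0::2]


print(length(path), len(path))
print(at(path, 3), at(path, 6))
print(entity_resources(path))
```

A constraint such as "all resources on a path must be of type foo:Person" then amounts to testing every element returned by entity_resources.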


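Path pattern anatomy

[Figure: a match of a path variable, drawn as a sequence of 7 elements (positions 1 to 7) with length 4. Properties p1, p2, and p3 occupy the odd positions 1, 3, 5, 7, and resources occupy the even positions 2, 4, 6. The labels rdfs:member, rdf:_3, rdf:_6, rdfms:entityResource, and rdfms:propertyResource mark the elements that each one selects.]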


Path types in SPARQLeR

  • Directionality of relationships in the path defines its specific semantics.

  • SPARQLeR allows definition of the following path types

    • As defined in graph theory

      • Directed

      • Undirected

    • SPARQLeR specific extension

      • Defined directionality path (includes directed path)


Directionality of properties in path

  • Defined directionality paths:

    • Neither directed nor undirected

    • Each property in a path has a specified directionality

  • Example: simple graph with p relationship

    (a) X p* Y, directed path

    (b) X p* Y, undirected path

    (c) X (p p⁻¹)* Y, directional path

[Figure: three panels (a), (b), and (c), each showing p edges on a path between X and Y.]
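
The three path types can be told apart by the direction in which each edge is traversed. The sketch below assumes a hypothetical graph in which every edge uses property p but the directions alternate. A step taken along an edge is written p and a step taken against it is written -p. The graph and the function names are illustrative only.

```python
import re
from typing import Dict, List, Set, Tuple

# Hypothetical graph: X -p-> a <-p- b -p-> c <-p- Y
TRIPLES: Set[Tuple[str, str, str]] = {
    ("X", "p", "a"),
    ("b", "p", "a"),
    ("b", "p", "c"),
    ("Y", "p", "c"),
}


def steps(nodes: List[str], triples: Set[Tuple[str, str, str]], prop: str = "p") -> List[str]:
    """Return the traversal steps along `nodes`: prop (forward) or -prop (inverse)."""
    out = []
    for src, dst in zip(nodes, nodes[1:]):
        if (src, prop, dst) in triples:
            out.append(prop)
        elif (dst, prop, src) in triples:
            out.append("-" + prop)
        else:
            raise ValueError(f"no {prop} edge between {src} and {dst}")
    return out


def classify(step_list: List[str]) -> Dict[str, bool]:
    """Test the step sequence against the three example expressions."""
    s = "".join(step_list)  # "p" and "-p" are unambiguous when concatenated
    return {
        "directed": re.fullmatch(r"(?:p)*", s) is not None,       # X p* Y, directed
        "undirected": re.fullmatch(r"(?:-?p)*", s) is not None,   # X p* Y, undirected
        "directional": re.fullmatch(r"(?:p-p)*", s) is not None,  # X (p p^-1)* Y
    }


walk = steps(["X", "a", "b", "c", "Y"], TRIPLES)
print(walk)
print(classify(walk))
```

In this hypothetical graph no directed path leads from X to Y, yet the alternating pattern is more specific than a plain undirected match, which is the case a defined directionality path is meant to capture.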


Inverse property operator

  • In standard SPARQL there is no need for an inverse property operator

    • Pattern syntax is based on individual statements, so it is easy to reverse direction.

  • Defining path constraints requires the inverse operator

    • A path expression defines constraints on properties, not on individual statements.

    • Without the inverse property operator, some path constraints would be impossible to express (as shown in the previous example).


RegExp in path constraints

  • Path constraints on properties are based on regular expressions

    • Uses syntax similar to lex

    • Easy for grep users

  • Examples:

    a c* d a+ (b|c) a

    [abc] c? d ( b a⁻¹ )+ c


Path constraints in SPARQLeR

  • Defined as regular path expressions

    • Can specify patterns of properties in the path

    • The directionality requirement needs the inverse operator ('-', minus), written -p

  • Supported regular expressions

    p (single property)

    -p (the inverse of p)

    [p1 p2 ... pn] (class of properties)

    -[p1 p2 ... pn] (class of inverse properties)

    [^p1 p2 ... pn] (complement of properties)

    -[^p1 p2 ... pn] (inverse of complement of properties)

    . (wildcard)

    x | y (alternative)

    xy (concatenation)

    x* (Kleene star)

    x+ (one or more repetitions)

    (x) (match a path matched by x)
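
One way to experiment with these operators is to translate a path expression into an ordinary regular expression over a string of steps. The sketch below is an assumption-laden illustration, not the SPARQLeR implementation: it handles single properties, -p, the wildcard, alternation, concatenation, *, +, ? and parentheses, and it leaves out the bracketed property classes. The names compile_path_expr and matches are hypothetical.

```python
import re
from typing import List

# One token of a path expression: a (possibly inverted) property name or an operator.
TOKEN = re.compile(r"\s*(-?[A-Za-z_][\w:]*|[().|*+?])")


def compile_path_expr(expr: str) -> "re.Pattern":
    """Translate a path expression into a Python regex over space-terminated steps."""
    parts = []
    pos = 0
    expr = expr.strip()
    while pos < len(expr):
        m = TOKEN.match(expr, pos)
        if m is None:
            raise ValueError(f"unsupported syntax at position {pos}: {expr[pos:]!r}")
        tok = m.group(1)
        pos = m.end()
        if tok == ".":
            parts.append(r"(?:\S+ )")  # wildcard: any single step
        elif tok == "(":
            parts.append("(?:")        # grouping only, no capture needed
        elif tok in ")|*+?":
            parts.append(tok)          # operators carry over unchanged
        else:
            parts.append("(?:" + re.escape(tok) + " )")  # one property step
    return re.compile("".join(parts))


def matches(expr: str, step_list: List[str]) -> bool:
    """Check whether the whole sequence of steps satisfies the path expression."""
    text = "".join(step + " " for step in step_list)
    return compile_path_expr(expr).fullmatch(text) is not None


print(matches("(foo:prop -foo:rel)+", ["foo:prop", "-foo:rel", "foo:prop", "-foo:rel"]))
print(matches(".*foo:prop.*", ["x:a", "foo:prop", "x:b"]))
print(matches("a c* d", ["a", "c"]))
```

Each step is terminated by a space so that a property name can never match only part of a longer name.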


Path constraints (cont’d)

  • Class of properties and inverse operator

    • Complement operator can be applied only to defined properties, not their inverses

    • Inverse operator

      • Not allowed inside class of properties

      • The set of inverses is created from the defined properties

    • Example:

      properties: q r s t

      [^r t] matches q s

      -[^q r] matches t⁻¹ s⁻¹ (inverses)

      ([^s t] | -[^t]) matches q r q⁻¹ r⁻¹ s⁻¹
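
The example above can be reproduced with plain set arithmetic. This is a minimal sketch, assuming the defined properties are exactly q, r, s, and t and writing an inverse property as a name prefixed with '-'. The function prop_class is hypothetical.

```python
from typing import Iterable, Set

PROPERTIES: Set[str] = {"q", "r", "s", "t"}  # the defined properties


def prop_class(names: Iterable[str], complement: bool = False, inverse: bool = False) -> Set[str]:
    """Expand a property class into the set of steps it matches."""
    # The complement is taken over the defined properties only.
    chosen = PROPERTIES - set(names) if complement else set(names)
    # The inverse operator applies to the whole class, never inside it.
    return {("-" + p if inverse else p) for p in chosen}


print(sorted(prop_class(["r", "t"], complement=True)))                 # [^r t]
print(sorted(prop_class(["q", "r"], complement=True, inverse=True)))   # -[^q r]
print(sorted(prop_class(["s", "t"], complement=True)
             | prop_class(["t"], complement=True, inverse=True)))      # ([^s t] | -[^t])
```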


Integrating paths into SPARQL

  • Path variable binds a path

    • Name begins with ‘%’ instead of ‘?’

  • Simple patterns – path between two resources

    SELECT ?prop WHERE {<r> ?prop <s>}

    SELECT %path WHERE {<r> %path <s>}

  • Single source path

    SELECT %path, ?res WHERE {<r> %path ?res}


Integrating paths into SPARQL

  • Resources on the path

    SELECT %path WHERE {<r> %path <s> . %path rdfs:member <e>}

    SELECT %path WHERE {<r> %path <s> . %path rdf:_1 <p>}

  • Listing path elements: the list operator

    SELECT list(%path) WHERE {<r> %path <s>}


Expressing path constraints

  • Bounded path length

    • only constants allowed

      FILTER(length(%path)<5)

      FILTER(length(%path)>3 && length(%path)<7)


Expressing path constraints

  • Constraints added as a regular expression filter (existing syntax in SPARQL)

    regex( pathvariable, pathexpr, pathflags )

    FILTER(regex(%path, ".*foo:prop.*", "uis"))

    • Flags: i (instances), s (schema), l (literals), h (match using hierarchy), d (set directionality), u (undirected)

    • Default flags: d i


Some examples

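# Paths from <r> to any resource ?res that follow foo:prop+
# and contain a resource located in Europe.
SELECT list(%path), ?res WHERE
{ <r> %path ?res .
  %path rdfs:member ?x .
  ?x foo:locatedIn wiki:Europe
  FILTER(regex(%path, "foo:prop+")) }

# Undirected paths between <r> and <s> over foo:prop or foo:rel
# whose resources are all of type foo:Person.
SELECT list(%path) WHERE
{ <r> %path <s> .
  %path rdfms:entityResource ?x .
  ?x rdf:type foo:Person
  FILTER(regex(%path, "(foo:prop|foo:rel)+", "u")) }

# Paths of length 4 to 6 between <r> and <s> that alternate
# foo:prop with the inverse of foo:rel.
SELECT list(%path) WHERE
{ <r> %path <s>
  FILTER(length(%path) <= 6 && length(%path) >= 4 &&
         regex(%path, "(foo:prop -foo:rel)+")) }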


SPARQLeR Prototype Implementation

  • Prototype implementation is based on BRAHMS – RDF/S main memory storage.

  • Path search based on a bi-directional BFS for simple paths.

  • Checking of path constraints in regex is implemented as a simulation of DFAs.

Janik, M. and Kochut, K. BRAHMS: A WorkBench RDF Store And High Performance Memory System for Semantic Association Discovery. ISWC 2005.


Implementation details

  • Each path expression (FILTER regex) is translated into a DFA.

    • For a path between two resources, partial constraints are checked while building the search trie from both endpoints, using forward and reverse DFAs.

    • When a path is connected, the forward DFA is used to check the full path constraint.
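
The idea of checking a path constraint while the search is still running can be sketched as follows. This is not the bi-directional BFS of the prototype: it is a simpler, single-direction depth-first enumeration of simple paths that advances a DFA with every hop and prunes a branch as soon as the DFA has no transition. The DFA is written by hand for the expression (-has_reactant has_product)*, and the triples, identifiers, and function names are all hypothetical.

```python
from typing import Dict, Iterator, List, Set, Tuple

Triple = Tuple[str, str, str]
Dfa = Dict[Tuple[int, str], int]  # (state, step) -> next state

# Hypothetical reactions: R1 turns A into B, R2 turns B into C.
TRIPLES: List[Triple] = [
    ("R1", "has_reactant", "A"),
    ("R1", "has_product", "B"),
    ("R2", "has_reactant", "B"),
    ("R2", "has_product", "C"),
]

# Hand-written DFA for (-has_reactant has_product)*; state 0 is initial and accepting.
DFA: Dfa = {(0, "-has_reactant"): 1, (1, "has_product"): 0}
ACCEPTING: Set[int] = {0}


def simple_paths(triples: List[Triple], start: str, goal: str, dfa: Dfa,
                 accepting: Set[int], max_len: int) -> Iterator[List[str]]:
    """Yield simple paths from start to goal whose steps are accepted by the DFA."""
    adj: Dict[str, List[Tuple[str, str]]] = {}
    for s, p, o in triples:
        adj.setdefault(s, []).append((p, o))        # forward step
        adj.setdefault(o, []).append(("-" + p, s))  # inverse step

    def walk(node: str, state: int, visited: Set[str], path: List[str], hops: int):
        if node == goal:
            if hops > 0 and state in accepting:
                yield path[:-1]  # drop the endpoint resource
            return
        if hops == max_len:
            return
        for step, nxt in adj.get(node, []):
            nstate = dfa.get((state, step))
            if nstate is None or nxt in visited:
                continue  # constraint violated, or resource already on the path
            yield from walk(nxt, nstate, visited | {nxt}, path + [step, nxt], hops + 1)

    yield from walk(start, 0, {start}, [], 0)


for found in simple_paths(TRIPLES, "A", "C", DFA, ACCEPTING, max_len=4):
    print(found)
```

Pruning on a missing DFA transition is what keeps the search small when the expression is selective. A bi-directional variant would additionally need a DFA for the reversed expression to grow the search from the goal.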


Experiments: biology pathway

  • Biosynthesis paths in biology (glycomics)

  • How is a specific glycopeptide created from a basic structure?

    • Find the pathway between dolichol phosphate and glycopeptide G00009

      • The path has 15 reactions (30 hops, as each reaction is represented by its substrates and products)

      • Only an undirected path connects the endpoint resources, but a specific directionality pattern is present

[Figure: RDF representation of sample reactions in the path.]


Experiments: biology pathway

  • Functionality test - proof of concept

    N-glycan biosynthesis pathway

SELECT list(%path) WHERE {
  glyco:dolichol_phosphate %path glyco:glyco_peptide_G00009 .
  %path rdfs:member enzyo:R05969
  FILTER ( length(%path) <= 30 &&
           regex(%path, "((-glyco:has_acceptor_substrate|-glyco:has_reactant) glyco:has_product)*") ) }

Ontology: GlycO

Length: 30 hops

Consists of: 15 reactions

Search time: milliseconds (less than 1 tick)...

courtesy of Dr. Alison Vandersall-Nairn, University of Georgia


Experiments

  • Scalability

    • Modified DBLP datasets in RDF (added random citations)

    • Test on increasing dataset (adding older years of publications)

    • Search for cited publications (transitive)

      PREFIX opus: <http://lsdis.cs.uga.edu/projects/semdis/opus#>

      SELECT ?end_publication WHERE {
        <http://dblp.uni-trier.de/rec/bibtex/journals/ai/Huber06> %path ?end_publication
        FILTER ( length(%path) <= 26 && regex(%path, "(opus:cites_publication)*") ) }

B. Aleman-Meza et al. Semantic Analytics on Social Networks: Experiences in Addressing the Problem of Conflict of Interest Detection. WWW 2006.
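
The single-source query above asks which publications are reachable through citations within a bounded number of hops. A minimal sketch of that idea is a breadth-first search with a depth limit. The citation map and the function cited_within are hypothetical, and the sketch returns only the reachable publications, not the paths themselves.

```python
from collections import deque
from typing import Dict, List, Set

# Hypothetical citation data: publication -> publications it cites.
CITES: Dict[str, List[str]] = {
    "Huber06": ["P1", "P2"],
    "P1": ["P3"],
    "P3": ["Huber06", "P4"],
}


def cited_within(cites: Dict[str, List[str]], start: str, max_hops: int) -> Set[str]:
    """Return publications reachable from start by 1..max_hops citation edges."""
    seen = {start}
    found: Set[str] = set()
    frontier = deque([(start, 0)])
    while frontier:
        pub, depth = frontier.popleft()
        if depth == max_hops:
            continue  # length bound reached, do not expand further
        for nxt in cites.get(pub, ()):
            if nxt not in seen:
                seen.add(nxt)
                found.add(nxt)
                frontier.append((nxt, depth + 1))
    return found


print(sorted(cited_within(CITES, "Huber06", 2)))
print(sorted(cited_within(CITES, "Huber06", 26)))
```

Because a shortest path never repeats a resource, every publication found this way is also reachable by a simple path of the same length.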



Experiments – results: single source paths

Search paths up to length 26




More complex uses of path expressions

  • Discover connecting paths with a shared node

    • Path between A and B, length up to 4

    • Path between C and D, length up to 4

    • Both paths have a shared resource

    A %path_1 B

    length(%path_1) <= 4

    C %path_2 D

    length(%path_2) <= 4

    %path_1 rdfs:member ?x

    %path_2 rdfs:member ?x

  • Potential subgraph discovery
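
The shared-node pattern can be sketched by enumerating both sets of paths and intersecting their resources. The code below assumes undirected simple paths of at most 4 hops and a small hypothetical graph. The edge list and function names are illustrative, and a real engine would not need to enumerate every pair of paths.

```python
from itertools import product
from typing import Dict, Iterator, List, Set, Tuple

# Hypothetical graph in which A..B and C..D both pass through x.
EDGES: List[Tuple[str, str, str]] = [
    ("A", "p", "x"),
    ("x", "p", "B"),
    ("C", "q", "x"),
    ("x", "q", "D"),
    ("A", "r", "B"),
]


def undirected_paths(edges, start: str, goal: str, max_len: int) -> Iterator[List[str]]:
    """Yield simple undirected paths as [prop, resource, prop, ..., prop]."""
    adj: Dict[str, List[Tuple[str, str]]] = {}
    for s, p, o in edges:
        adj.setdefault(s, []).append((p, o))
        adj.setdefault(o, []).append((p, s))

    def walk(node: str, visited: Set[str], path: List[str]):
        if node == goal:
            if path:
                yield path[:-1]  # drop the endpoint resource
            return
        if len(path) // 2 == max_len:
            return
        for prop, nxt in adj.get(node, []):
            if nxt not in visited:
                yield from walk(nxt, visited | {nxt}, path + [prop, nxt])

    yield from walk(start, {start}, [])


def shared_resources(path_1: List[str], path_2: List[str]) -> Set[str]:
    """Resources (every second element) that occur on both paths."""
    return set(path_1[1::2]) & set(path_2[1::2])


pairs = [
    (p1, p2, shared_resources(p1, p2))
    for p1, p2 in product(list(undirected_paths(EDGES, "A", "B", 4)),
                          list(undirected_paths(EDGES, "C", "D", 4)))
    if shared_resources(p1, p2)
]
print(pairs)
```

Only the A to B path through x is reported. The direct edge r between A and B has no intermediate resource, so it cannot share one with the C to D path.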


SPARQLeR summary

  • Path expressions

    • use of regular expressions over properties

  • Flexible path specification

    • Undirected

    • Defined directionality paths

      • Directed

    • Length restricted

  • Complex path patterns

    • Test of resources and properties on the path

    • Intersecting paths


Conclusion and future work

  • SPARQLeR extension fits seamlessly into the current SPARQL syntax.

  • Performance of path queries is acceptable if the defined expression is highly selective.

  • Optimization of path queries, complex expressions and multiple paths in query.

  • Inclusion of context.


SPARQLeR

Krys Kochut, Maciej Janik

Thank you

