Knowledge Streams: Stream Processing of Semantic Web Content

Mike Dean

Principal Engineer

Raytheon BBN Technologies

[email protected]


Assumptions

  • Technology – Intermediate

    • Familiarity with RDF and OWL

  • Interest in

    • Stream processing

    • Scalability


Presenter Background

  • Principal Engineer at Raytheon BBN Technologies (1984-present)

  • Principal Investigator for DARPA Agent Markup Language (DAML) Integration and Transition (2000-2005)

    • Chaired the Joint US/EU Committee that developed DAML+OIL and SWRL

  • Developer and/or Principal Investigator for many Semantic Web tools, datasets, and applications (2000-present)

  • Member of the W3C RDF Core, Web Ontology, and Rule Interchange Format Working Groups

    • Co-editor of the W3C OWL Reference

  • Local co-chair for ISWC2009

  • Other SemTech presentations

    • Semantic Query: Solving the Needs of a Net-Centric Data Sharing Environment (2007, w/ Matt Fisher)

    • Semantic Queries and Mediation in a RESTful Architecture (2008, w/ John Gilman and Matt Fisher)

    • Use of SWRL for Ontology Translation (2008)

    • Semantic Web @ BBN: Application to the Digital Whitewater Challenge (2009, w/ John Hebeler)

    • How is the Semantic Web Being Used? An Analysis of the Billion Triples Challenge Corpus (2009)

    • Finding a Good Ontology: The Open Ontology Repository Initiative (2010, w/ Peter Yim and Todd Schneider)


Outline

  • Motivation

  • Vision

  • Building Blocks

  • Demonstration


Motivations

  • Timeliness

  • Performance


Timeliness

  • Streaming minimizes latency

    • Processing elements see events as they occur

    • Resources are expended only when an event occurs

  • This is in contrast to polling

    • Latency averages half the polling interval

    • Resources are expended on every poll

    • Popular web syndication mechanisms such as RSS and Atom involve polling
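
The half-interval figure follows if events are assumed to arrive uniformly at random within a polling interval. A minimal simulation sketch of that assumption (the function name and parameters are illustrative, not from the talk):

```python
import random


def mean_polling_latency(interval_s: float, n_events: int, seed: int = 0) -> float:
    """Average delay between an event and the next poll that observes it."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_events):
        # Assumption: the event occurs at a uniformly random offset
        # inside one polling interval.
        offset = rng.uniform(0.0, interval_s)
        # The event is only noticed when the next poll runs,
        # i.e. at the end of the interval.
        total += interval_s - offset
    return total / n_events
```

With a 60-second interval the result should come out close to 30 seconds, whereas a pushed event is delayed only by transport and processing time.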


Performance

  • Many Semantic Web tools provide streaming parsers rather than, or in addition to, model access

    • Analogous to XML SAX vs. DOM

  • For suitable applications, this can be 10x faster than loading all statements into memory or a KB
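
The SAX vs. DOM analogy can be sketched with Python's standard library. This is only an illustration of the two access styles on a tiny, made-up RDF/XML document; it is not one of the Semantic Web parsers referred to above, and it makes no performance claim of its own.

```python
import xml.sax
from xml.dom import minidom

# Hypothetical sample document using the example.org namespace.
SAMPLE = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:ex="http://example.org/ns#">
  <rdf:Description rdf:about="http://example.org/a"><ex:name>A</ex:name></rdf:Description>
  <rdf:Description rdf:about="http://example.org/b"><ex:name>B</ex:name></rdf:Description>
</rdf:RDF>"""


class DescriptionCounter(xml.sax.ContentHandler):
    """Streaming style: reacts to each element as it is parsed, keeps no tree."""

    def __init__(self):
        super().__init__()
        self.count = 0

    def startElement(self, name, attrs):
        if name == "rdf:Description":
            self.count += 1


def count_streaming(xml_text: str) -> int:
    handler = DescriptionCounter()
    xml.sax.parseString(xml_text.encode("utf-8"), handler)
    return handler.count


def count_in_memory(xml_text: str) -> int:
    # Model style: the whole document is built in memory before it is queried.
    doc = minidom.parseString(xml_text)
    return len(doc.getElementsByTagName("rdf:Description"))
```

Both functions give the same count; the streaming version never holds more than the current event, which is what makes it attractive for large inputs.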


2 Streaming Stories

  • dumpont of OpenCyc (circa 2003)

    • HTML-based ontology visualization tool periodically bogged down daml.org server

    • Reimplementation using event-based Jena ARP parser yielded 10x performance and scalability improvements

  • Billion Triples Challenge 2009

    • Streaming analysis of the 2009 corpus was performed at an overall rate of 103K statements/sec on a Mac laptop with a portable external disk

    • Compare to loading 10-20K statements/second on a server
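
Taken at face value, 103K statements/sec against 10-20K statements/sec is roughly a 5x to 10x difference. The kind of single-pass analysis involved can be sketched as below. This is a simplified illustration, not the code used for the Billion Triples Challenge analysis: it assumes whitespace-separated N-Triples-style lines and does not implement the full grammar.

```python
from collections import Counter
from typing import Iterable


def predicate_counts(lines: Iterable[str]) -> Counter:
    """Count predicate usage in one pass, holding only the counters in memory."""
    counts = Counter()
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        # Subject and predicate contain no whitespace in N-Triples,
        # so the first two fields can be split off safely.
        parts = line.split(None, 2)
        if len(parts) == 3:
            counts[parts[1]] += 1
    return counts
```

Because the function accepts any iterable of lines, it can be fed an open file handle and will never load the full corpus.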


Stream Processing Examples

  • Unix pipes

  • Dataflow architectures

  • Streambase

  • IBM System S/InfoSphere Streams
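
The Unix pipe model maps naturally onto Python generators: each stage pulls items from the previous one and yields results downstream as soon as they are ready. A small sketch, with stage names chosen to mirror `grep` and `cut` (the names and sample data are illustrative only):

```python
from typing import Iterable, Iterator


def grep(lines: Iterable[str], needle: str) -> Iterator[str]:
    """Pass through only the lines that contain `needle`."""
    for line in lines:
        if needle in line:
            yield line


def first_field(lines: Iterable[str]) -> Iterator[str]:
    """Emit the first whitespace-separated field of each non-empty line."""
    for line in lines:
        fields = line.split()
        if fields:
            yield fields[0]


def pipeline(lines: Iterable[str]) -> Iterator[str]:
    # Rough equivalent of: cat input | grep knows | cut -d' ' -f1
    return first_field(grep(lines, "knows"))
```

Nothing is computed until the final consumer iterates, and each line flows through the whole chain before the next one is read.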


Vision: Knowledge Streams

[Diagram; labels recovered from the slide, in extraction order]

  • Data Sources: Semantic Web, Sensor, Network, Gazetteer, Imagery, Database, Archive, Sensor, IM

  • Users: Community of Interest 1, User 1, Community of Interest 2, User 2, User 3

  • Distribution And Processing Elements: aggregation, context, filter, augmentation, inference, distribution, correlation, persistent queries, translation, alerts, CEP, NLP, RSS

  • Processing elements

  • Consume and produce subgraphs

  • Multiple functions may be combined

  • Persistent pipelines

  • Streams of statements comprising object subgraphs

  • URI naming allows drill-down

  • Provenance, timestamps
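
One way to picture processing elements that consume and produce subgraphs carrying provenance and timestamps is sketched below. All class and function names, and the `ex:` identifiers used with them, are hypothetical; the talk does not specify a data model or API.

```python
from dataclasses import dataclass, replace
from typing import Callable, FrozenSet, Iterable, Iterator, Tuple

Triple = Tuple[str, str, str]


@dataclass(frozen=True)
class Subgraph:
    triples: FrozenSet[Triple]
    source: str       # provenance: URI of the producing source (assumed field)
    timestamp: float  # seconds since epoch (assumed field)


Element = Callable[[Iterable[Subgraph]], Iterator[Subgraph]]


def filter_by_predicate(predicate: str) -> Element:
    """Pass only subgraphs that contain at least one triple with `predicate`."""
    def run(stream: Iterable[Subgraph]) -> Iterator[Subgraph]:
        for graph in stream:
            if any(p == predicate for _, p, _ in graph.triples):
                yield graph
    return run


def augment_with(extra: Triple) -> Element:
    """Add one statement to every subgraph, keeping provenance and timestamp."""
    def run(stream: Iterable[Subgraph]) -> Iterator[Subgraph]:
        for graph in stream:
            yield replace(graph, triples=graph.triples | {extra})
    return run


def combine(*elements: Element) -> Element:
    """Chain several elements so that multiple functions act as one."""
    def run(stream: Iterable[Subgraph]) -> Iterator[Subgraph]:
        for element in elements:
            stream = element(stream)
        return iter(stream)
    return run
```

For example, `combine(filter_by_predicate("ex:covers"), augment_with(("ex:img1", "ex:reviewed", "false")))` yields a single element that filters and then augments each subgraph.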


Goals

  • Web-scale

    • Decentralized among multiple sites

    • Heterogeneous implementations

  • Long-lived, persistent connections

    • User accountability

  • Introspection over the processing network for control and optimization

    • E.g. aggregating subscriptions

    • Balance with security, privacy, and autonomy concerns
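
One possible reading of "aggregating subscriptions" is that a node merges identical filters from many subscribers, so each distinct filter is evaluated once per event. The sketch below assumes subscriptions are simple (subscriber, predicate) pairs; both the representation and the function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple

Triple = Tuple[str, str, str]


def aggregate_subscriptions(
    subscriptions: Iterable[Tuple[str, str]]
) -> Dict[str, Set[str]]:
    """Group subscribers by the predicate they asked for."""
    by_predicate: Dict[str, Set[str]] = defaultdict(set)
    for subscriber, predicate in subscriptions:
        by_predicate[predicate].add(subscriber)
    return dict(by_predicate)


def route(triple: Triple, aggregated: Dict[str, Set[str]]) -> Set[str]:
    """Return every subscriber interested in this statement with one lookup."""
    _, predicate, _ = triple
    return aggregated.get(predicate, set())
```

Note that this kind of optimization requires the network to see who subscribed to what, which is where the security and privacy balance comes in.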


Building Blocks

  • RDF Content

  • Existing stream processing frameworks

  • Workflow systems

  • Publish/subscribe message oriented middleware


RDF Payloads

  • Malleable data

    • Standards-based graph structure

    • Can easily add, remove, and transform statements

  • Self-describing

    • Unique naming via URIs

    • References to vocabularies and ontologies

  • Potential for inference
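
Treating an RDF payload as a set of (subject, predicate, object) statements makes the "add, remove, and transform" operations very small. A minimal sketch using plain Python sets rather than any particular RDF library (the helper names and the `example.org` property in the usage note are hypothetical):

```python
from typing import FrozenSet, Tuple

Triple = Tuple[str, str, str]
Graph = FrozenSet[Triple]


def add(graph: Graph, triple: Triple) -> Graph:
    """Return a new graph that also contains `triple`."""
    return graph | {triple}


def remove(graph: Graph, triple: Triple) -> Graph:
    """Return a new graph without `triple` (no error if it is absent)."""
    return graph - {triple}


def rename_predicate(graph: Graph, old: str, new: str) -> Graph:
    """Transform the payload by rewriting one predicate URI to another."""
    return frozenset((s, new if p == old else p, o) for s, p, o in graph)
```

For instance, `rename_predicate(g, "http://xmlns.com/foaf/0.1/name", "http://example.org/ns#fullName")` rewrites every FOAF name statement to the illustrative target vocabulary without touching other statements.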


Workflow Systems

  • Graphical environments for developing processing pipelines

    • Yahoo Pipes, DERI Pipes, SPARQLMotion

    • Nice user interfaces for development and execution

http://pipes.deri.org


Semantic Complex Event Processing

  • Complex Event Processing

    • One of the leading edges of rules technology

    • Formal specification of higher-level events in terms of lower-level events

      • E.g. alert if the moving average increases 15% within a 10 minute window

    • Engine can be compiled/optimized for a specific rule set

    • High-volume deployments in finance and other industries

    • Most implementations focus on self-contained tuples

  • Semantic Complex Event Processing

    • Enrich CEP using Semantic Web technology

    • Emerging topic at recent conferences

  • Early implementations

    • Wrappers around open source CEP engines

    • Native implementation

  • Provides a powerful set of operators and engines for Knowledge Streams
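
The moving-average rule mentioned above can be sketched in a few lines. The slide does not define the rule precisely, so this sketch makes assumptions: the moving average is the mean of the last `avg_size` readings, and an alert fires when the current average is at least 15% above the lowest average seen in the preceding 10 minutes. The class and parameter names are hypothetical, and a real CEP engine would express this declaratively.

```python
from collections import deque
from typing import Deque, Tuple


class MovingAverageRiseDetector:
    def __init__(self, avg_size: int = 3, window_s: float = 600.0, rise: float = 0.15):
        self.values: Deque[float] = deque(maxlen=avg_size)  # last N readings
        self.averages: Deque[Tuple[float, float]] = deque()  # (timestamp, average)
        self.window_s = window_s
        self.rise = rise

    def observe(self, timestamp: float, value: float) -> bool:
        """Feed one reading; return True if the rule fires."""
        self.values.append(value)
        avg = sum(self.values) / len(self.values)
        # Forget averages that fall outside the time window.
        while self.averages and timestamp - self.averages[0][0] > self.window_s:
            self.averages.popleft()
        # Baseline: lowest earlier average still inside the window.
        baseline = min((a for _, a in self.averages), default=avg)
        self.averages.append((timestamp, avg))
        return baseline > 0 and avg >= baseline * (1.0 + self.rise)
```

Each reading is a self-contained tuple of timestamp and value, which is the style most CEP implementations focus on; a semantic variant would operate on RDF subgraphs instead.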


Implementation Approach

  • Well-defined APIs for implementing operators

  • Operator execution containers

    • Could encapsulate existing engines

  • Start with manual processing network configuration, then automate
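
A rough sketch of what an operator API and an execution container might look like, with the processing network wired by hand as in the first step above. Every name here (`Operator`, `Container`, `connect`, `push`) is hypothetical; the talk does not define an API.

```python
from abc import ABC, abstractmethod
from typing import FrozenSet, Iterable, List, Tuple

Triple = Tuple[str, str, str]
Graph = FrozenSet[Triple]


class Operator(ABC):
    """Contract an operator implements: one subgraph in, zero or more out."""

    @abstractmethod
    def process(self, graph: Graph) -> Iterable[Graph]:
        ...


class Container:
    """Hosts one operator and forwards its output to connected containers."""

    def __init__(self, operator: Operator):
        self.operator = operator
        self.downstream: List["Container"] = []

    def connect(self, other: "Container") -> "Container":
        self.downstream.append(other)
        return other  # allows chained, manual wiring

    def push(self, graph: Graph) -> None:
        for result in self.operator.process(graph):
            for container in self.downstream:
                container.push(result)


class DropEmpty(Operator):
    """Example operator: discard subgraphs with no statements."""

    def process(self, graph: Graph) -> Iterable[Graph]:
        return [graph] if graph else []


class Collect(Operator):
    """Example sink: remember everything that arrives, emit nothing."""

    def __init__(self):
        self.seen: List[Graph] = []

    def process(self, graph: Graph) -> Iterable[Graph]:
        self.seen.append(graph)
        return []
```

A container could equally wrap an existing engine behind the same `process` method, which is the encapsulation idea in the list above.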


Use Cases

  • Dissemination of metadata for new satellite imagery

  • Social network changes

  • Alerting of friends’ new publications


Demo

  • Processing using DERI Pipes with new operators

    • Ingest of #SemTechBiz tweets using Twitter Streaming API

    • Conversion of JSON to RDF

    • Mapping to SIOC vocabulary using SWRL rules

    • Enrich by matching Twitter @handles with contacts

    • Persistent buffering using Java Message Service

    • Monitoring
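
The JSON-to-RDF and enrichment steps can be illustrated with a short sketch. It is a stand-in, not the demo's implementation: the demo used DERI Pipes operators and SWRL rules, whereas this maps fields directly in Python. The tweet fields used (`id_str`, `text`, `created_at`, `user.screen_name`) are assumed from the Twitter JSON format, the URI patterns are illustrative, and the `mentionsContact` property is a hypothetical `example.org` term.

```python
import json
import re
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]

SIOC = "http://rdfs.org/sioc/ns#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
DCT_CREATED = "http://purl.org/dc/terms/created"
EX_MENTIONS = "http://example.org/ns#mentionsContact"  # hypothetical property


def tweet_to_triples(raw_json: str) -> List[Triple]:
    """Map one tweet to SIOC-style statements."""
    tweet = json.loads(raw_json)
    name = tweet["user"]["screen_name"]
    account = "http://twitter.com/" + name
    post = "{}/status/{}".format(account, tweet["id_str"])
    return [
        (post, RDF_TYPE, SIOC + "Post"),
        (post, SIOC + "content", tweet["text"]),
        (post, SIOC + "has_creator", account),
        (post, DCT_CREATED, tweet["created_at"]),
        (account, RDF_TYPE, SIOC + "UserAccount"),
    ]


def enrich_mentions(triples: List[Triple], contacts: Dict[str, str]) -> List[Triple]:
    """Add a statement for each @handle in the content that matches a known contact."""
    enriched = list(triples)
    for subject, predicate, obj in triples:
        if predicate != SIOC + "content":
            continue
        for handle in re.findall(r"@(\w+)", obj):
            if handle in contacts:
                enriched.append((subject, EX_MENTIONS, contacts[handle]))
    return enriched
```

Here `contacts` is a plain mapping from handle to contact URI; in a running pipeline it would be backed by whatever contact store is available.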