

Knowledge Streams: Stream Processing of Semantic Web Content

Mike Dean

Principal Engineer

Raytheon BBN Technologies

Assumptions

  • Technology – Intermediate

    • Familiarity with RDF and OWL

  • Interest in

    • Stream processing

    • Scalability

Presenter Background

  • Principal Engineer at Raytheon BBN Technologies (1984-present)

  • Principal Investigator for DARPA Agent Markup Language (DAML) Integration and Transition (2000-2005)

    • Chaired the Joint US/EU Committee that developed DAML+OIL and SWRL

  • Developer and/or Principal Investigator for many Semantic Web tools, datasets, and applications (2000-present)

  • Member of the W3C RDF Core, Web Ontology, and Rule Interchange Format Working Groups

    • Co-editor of the W3C OWL Reference

  • Local co-chair for ISWC2009

  • Other SemTech presentations

    • Semantic Query: Solving the Needs of a Net-Centric Data Sharing Environment (2007, w/ Matt Fisher)

    • Semantic Queries and Mediation in a RESTful Architecture (2008, w/ John Gilman and Matt Fisher)

    • Use of SWRL for Ontology Translation (2008)

    • Semantic Web @ BBN: Application to the Digital Whitewater Challenge (2009, w/ John Hebeler)

    • How is the Semantic Web Being Used? An Analysis of the Billion Triples Challenge Corpus (2009)

    • Finding a Good Ontology: The Open Ontology Repository Initiative (2010, w/ Peter Yim and Todd Schneider)

Outline

  • Motivation

  • Vision

  • Building Blocks

  • Demonstration

Motivations

  • Timeliness

  • Performance

Timeliness

  • Streaming minimizes latency

    • Processing elements see events as they occur

    • Resources are expended only when an event occurs

  • This is in contrast to polling

    • Latency averages half the polling interval

    • Resources are expended on every poll

    • Popular web syndication mechanisms such as RSS and Atom involve polling
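
The "half the polling interval" figure follows from assuming events arrive at uniformly random times: an event that lands at offset U within a cycle of length T waits T - U for the next poll, and the mean of T - U is T/2. The sketch below checks this by simulation. The function name, the 60-second interval, and the uniform-arrival assumption are illustrative and not taken from the slides.

```python
import random


def mean_polling_latency(interval, n_events=100_000, seed=0):
    """Average delay between an event and the next poll that observes it."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_events):
        # Assumption: the event arrives at a uniformly random offset
        # within one polling cycle.
        offset = rng.uniform(0.0, interval)
        # The event is only seen when the next poll fires, at the end
        # of the cycle.
        total += interval - offset
    return total / n_events


if __name__ == "__main__":
    # With a 60-second polling interval the result should be close to 30.
    print(mean_polling_latency(60.0))
```

A push-based stream has no such floor on latency: the delay is whatever delivery and processing take, independent of any interval.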

Performance

  • Many Semantic Web tools provide streaming parsers rather than, or in addition to, model access

    • Analogous to XML SAX vs. DOM

  • For suitable applications, this can be 10x faster than loading all statements into memory or a KB
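
The streaming versus model distinction can be sketched in plain Python. The example assumes a simplified, line-oriented N-Triples input and a hypothetical task (counting predicate usage); it is not the API of any particular Semantic Web toolkit. The streaming version holds one statement at a time, while the in-memory version materializes every statement first.

```python
import io
from collections import Counter


def iter_statements(lines):
    """Yield (subject, predicate, object) one statement at a time.

    Simplified parser: assumes one statement per line with
    whitespace-free subject and predicate terms.
    """
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        subject, predicate, rest = line.split(None, 2)
        obj = rest.rstrip()
        # Drop the "." that terminates the statement.
        if obj.endswith("."):
            obj = obj[:-1].rstrip()
        yield subject, predicate, obj


def count_predicates_streaming(lines):
    """SAX-like: only the running counts are kept in memory."""
    counts = Counter()
    for _, predicate, _ in iter_statements(lines):
        counts[predicate] += 1
    return counts


def count_predicates_in_memory(lines):
    """DOM-like: every statement is loaded before any analysis starts."""
    model = list(iter_statements(lines))
    return Counter(predicate for _, predicate, _ in model)


SAMPLE = """\
<http://example.org/a> <http://xmlns.com/foaf/0.1/name> "Alice A." .
<http://example.org/a> <http://xmlns.com/foaf/0.1/knows> <http://example.org/b> .
<http://example.org/b> <http://xmlns.com/foaf/0.1/name> "Bob" .
"""

if __name__ == "__main__":
    print(count_predicates_streaming(io.StringIO(SAMPLE)))
```

Both functions return the same counts; the difference is that the in-memory version's footprint grows with the size of the input, which is what makes the streaming form attractive for corpus-scale analysis.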

2 Streaming Stories

  • dumpont of OpenCyc (circa 2003)

    • HTML-based ontology visualization tool periodically bogged down server

    • Reimplementation using event-based Jena ARP parser yielded 10x performance and scalability improvements

  • Billion Triples Challenge 2009

    • Streaming analysis of the 2009 corpus was performed at an overall rate of 103K statements/sec on a Mac laptop with a portable external disk

    • Compare to loading 10-20K statements/second on a server

Stream Processing Examples

  • Unix pipes

  • Dataflow architectures

  • StreamBase

  • IBM System S/InfoSphere Streams

Vision: Knowledge Streams

[Diagram labels: Semantic Web Content; Distribution and Processing Elements; Community of Interest 1; Community of Interest 2; User 1; User 2; User 3]

  • Processing elements

  • Consume and produce subgraphs

  • Multiple functions may be combined

  • Persistent pipelines

  • Streams of statements comprising object subgraphs

  • URI naming allows drill-down

  • Provenance, timestamps
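
One way to picture processing elements that consume and produce subgraphs is as composable generator functions. The sketch below is illustrative only: a subgraph is modeled as a frozenset of (subject, predicate, object) tuples, and the element names (`only_type`, `stamp_source`, `combine`) and the choice of `dcterms:source` for provenance are assumptions, not part of the presentation.

```python
from functools import reduce

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
DCT_SOURCE = "http://purl.org/dc/terms/source"


def only_type(rdf_type):
    """Element that passes only subgraphs declaring the given rdf:type."""
    def element(subgraphs):
        for graph in subgraphs:
            if any(p == RDF_TYPE and o == rdf_type for _, p, o in graph):
                yield graph
    return element


def stamp_source(source_uri):
    """Element that adds a provenance statement for every subject."""
    def element(subgraphs):
        for graph in subgraphs:
            subjects = {s for s, _, _ in graph}
            yield graph | {(s, DCT_SOURCE, source_uri) for s in subjects}
    return element


def combine(*elements):
    """Chain several elements into a single processing element."""
    def pipeline(subgraphs):
        return reduce(lambda stream, element: element(stream),
                      elements, subgraphs)
    return pipeline


if __name__ == "__main__":
    image = "http://example.org/vocab#Image"
    stream = [
        frozenset({("http://example.org/img1", RDF_TYPE, image)}),
        frozenset({("http://example.org/doc1", RDF_TYPE,
                    "http://example.org/vocab#Report")}),
    ]
    pipeline = combine(only_type(image),
                       stamp_source("http://example.org/feeds/imagery"))
    for graph in pipeline(stream):
        print(sorted(graph))
```

Because every element has the same shape (an iterator of subgraphs in, an iterator of subgraphs out), a combined pipeline is itself an element and can be reused inside a larger network.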

Goals

  • Web-scale

    • Decentralized among multiple sites

    • Heterogeneous implementations

  • Long-lived, persistent connections

    • User accountability

  • Introspection over the processing network for control and optimization

    • E.g. aggregating subscriptions

    • Balance with security, privacy, and autonomy concerns

Building Blocks

  • RDF Content

  • Existing stream processing frameworks

  • Workflow systems

  • Publish/subscribe message oriented middleware

RDF Payloads

  • Malleable data

    • Standards-based graph structure

    • Can easily add, remove, and transform statements

  • Self-describing

    • Unique naming via URIs

    • References to vocabularies and ontologies

  • Potential for inference
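
The "malleable data" point can be illustrated with a graph held as a plain Python set of (subject, predicate, object) tuples. This is a minimal sketch under that assumption, not an RDF library; the `rename_predicate` helper and the example.org URIs are hypothetical.

```python
EX = "http://example.org/"
FOAF_NAME = "http://xmlns.com/foaf/0.1/name"
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"


def rename_predicate(graph, old, new):
    """Transform: return a copy of the graph with one predicate renamed."""
    return {(s, new if p == old else p, o) for s, p, o in graph}


graph = {(EX + "alice", FOAF_NAME, "Alice")}

# Add a statement.
graph.add((EX + "alice", EX + "draft", "true"))

# Remove it again.
graph.discard((EX + "alice", EX + "draft", "true"))

# Transform the remaining statement to use a different vocabulary term.
graph = rename_predicate(graph, FOAF_NAME, RDFS_LABEL)
```

Since a graph is just a set of statements, each of these steps is a local operation that needs no schema migration, which is what lets a processing element reshape a payload in flight.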

Workflow Systems

  • Graphical environments for developing processing pipelines

    • Yahoo Pipes, DERI Pipes, SPARQLMotion

    • Nice user interfaces for development and execution

Semantic Complex Event Processing

  • Complex Event Processing

    • One of the leading edges of rules technology

    • Formal specification of higher-level events in terms of lower-level events

      • E.g. alert if the moving average increases 15% within a 10 minute window

    • Engine can be compiled/optimized for a specific rule set

    • High-volume deployments in finance and other industries

    • Most implementations focus on self-contained tuples

  • Semantic Complex Event Processing

    • Enrich CEP using Semantic Web technology

    • Emerging topic at recent conferences

  • Early implementations

    • Wrappers around open source CEP engines

    • Native implementation

  • Provides a powerful set of operators and engines for Knowledge Streams
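
The moving-average rule mentioned above can be sketched without a CEP engine. The slide does not define the rule precisely, so this version makes its own assumptions: the moving average is taken over the last few samples, and an alert fires when the current average is at least 15% above the lowest average seen in the preceding 10 minutes. The class name and parameters are illustrative.

```python
from collections import deque


class MovingAverageRiseDetector:
    """Higher-level 'rise' event derived from lower-level value events."""

    def __init__(self, samples=5, window_seconds=600, threshold=0.15):
        self.values = deque(maxlen=samples)   # inputs to the moving average
        self.averages = deque()               # (timestamp, moving average)
        self.window_seconds = window_seconds
        self.threshold = threshold

    def on_event(self, timestamp, value):
        """Consume one event; return True if the alert condition holds."""
        self.values.append(value)
        average = sum(self.values) / len(self.values)

        # Expire averages that fall outside the time window.
        while (self.averages
               and timestamp - self.averages[0][0] > self.window_seconds):
            self.averages.popleft()

        # Baseline: lowest moving average still inside the window.
        baseline = min((a for _, a in self.averages), default=None)
        self.averages.append((timestamp, average))

        return (baseline is not None and baseline > 0
                and average >= baseline * (1 + self.threshold))


if __name__ == "__main__":
    detector = MovingAverageRiseDetector(samples=3)
    for t, v in [(0, 100), (60, 100), (120, 100), (180, 160)]:
        print(t, v, detector.on_event(t, v))
```

A real CEP engine would compile a declarative form of this rule rather than hand-written state handling. The sketch also reflects the "self-contained tuples" limitation noted above, since each event here is just a timestamp and a number.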

Implementation Approach

  • Well-defined APIs for implementing operators

  • Operator execution containers

    • Could encapsulate existing engines

  • Start with manual processing network configuration, then automate
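
The three points above might look like the following in miniature: a small operator API, a container that runs an operator and forwards its output, and a processing network wired by hand. All class and method names here (`Operator`, `Container`, `connect`, `push`, `KeepIf`, `Collect`) are hypothetical; the slides do not specify an API.

```python
from abc import ABC, abstractmethod


class Operator(ABC):
    """Operator API: one subgraph in, zero or more subgraphs out."""

    @abstractmethod
    def process(self, subgraph):
        """Return an iterable of output subgraphs."""


class KeepIf(Operator):
    """Pass a subgraph through only if the test function accepts it."""

    def __init__(self, test):
        self.test = test

    def process(self, subgraph):
        return [subgraph] if self.test(subgraph) else []


class Collect(Operator):
    """Sink operator that records everything it receives."""

    def __init__(self):
        self.seen = []

    def process(self, subgraph):
        self.seen.append(subgraph)
        return []


class Container:
    """Execution container: runs one operator and forwards its output."""

    def __init__(self, operator):
        self.operator = operator
        self.downstream = []

    def connect(self, other):
        self.downstream.append(other)
        return other

    def push(self, subgraph):
        for output in self.operator.process(subgraph):
            for container in self.downstream:
                container.push(output)


if __name__ == "__main__":
    # Manual network configuration: filter -> sink.
    sink = Collect()
    source = Container(KeepIf(lambda g: len(g) > 0))
    source.connect(Container(sink))
    source.push(frozenset())                      # dropped by the filter
    source.push(frozenset({("s", "p", "o")}))     # reaches the sink
    print(sink.seen)
```

Keeping the operator interface this narrow is what would let a container wrap an existing engine: the container only needs to translate `process` calls into whatever the wrapped engine expects.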

Use Cases

  • Dissemination of metadata for new satellite imagery

  • Social network changes

  • Alerting of friends’ new publications

Demo

  • Processing using DERI Pipes with new operators

    • Ingest of #SemTechBiz tweets using Twitter Streaming API

    • Conversion of JSON to RDF

    • Mapping to SIOC vocabulary using SWRL rules

    • Enrich by matching Twitter @handles with contacts

    • Persistent buffering using Java Message Service

    • Monitoring
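
The JSON-to-RDF and SIOC mapping steps can be approximated in a few lines. Note the substitution: the demo performs the SIOC mapping with SWRL rules inside DERI Pipes, whereas this sketch maps fields directly in Python. The tweet fields used (`id_str`, `text`, `user.screen_name`) and the URI patterns are assumptions about the payload, and the function name is hypothetical.

```python
import json

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
SIOC = "http://rdfs.org/sioc/ns#"


def tweet_to_triples(raw_json):
    """Convert one tweet (JSON text) into SIOC-style statements."""
    tweet = json.loads(raw_json)
    # Assumed URI patterns for the account and the individual post.
    user = "https://twitter.com/" + tweet["user"]["screen_name"]
    post = user + "/status/" + tweet["id_str"]
    return [
        (post, RDF_TYPE, SIOC + "Post"),
        (post, SIOC + "content", tweet["text"]),
        (post, SIOC + "has_creator", user),
    ]


if __name__ == "__main__":
    sample = json.dumps({
        "id_str": "1",
        "text": "Listening to the Knowledge Streams talk #SemTechBiz",
        "user": {"screen_name": "example_user"},
    })
    for triple in tweet_to_triples(sample):
        print(triple)
```

Once the tweet is expressed as statements with URIs for the post and its creator, the enrichment step (matching @handles with contacts) becomes a matter of adding further statements about the same creator URI.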