Knowledge Streams: Stream Processing of Semantic Web Content

Mike Dean

Principal Engineer

Raytheon BBN Technologies

[email protected]

Assumptions
  • Technology – Intermediate
    • Familiarity with RDF and OWL
  • Interest in
    • Stream processing
    • Scalability
Presenter Background
  • Principal Engineer at Raytheon BBN Technologies (1984-present)
  • Principal Investigator for DARPA Agent Markup Language (DAML) Integration and Transition (2000-2005)
    • Chaired the Joint US/EU Committee that developed DAML+OIL and SWRL
  • Developer and/or Principal Investigator for many Semantic Web tools, datasets, and applications (2000-present)
  • Member of the W3C RDF Core, Web Ontology, and Rule Interchange Format Working Groups
    • Co-editor of the W3C OWL Reference
  • Local co-chair for ISWC2009
  • Other SemTech presentations
    • Semantic Query: Solving the Needs of a Net-Centric Data Sharing Environment (2007, w/ Matt Fisher)
    • Semantic Queries and Mediation in a RESTful Architecture (2008, w/ John Gilman and Matt Fisher)
    • Use of SWRL for Ontology Translation (2008)
    • Semantic Web @ BBN: Application to the Digital Whitewater Challenge (2009, w/ John Hebeler)
    • How is the Semantic Web Being Used? An Analysis of the Billion Triples Challenge Corpus (2009)
    • Finding a Good Ontology: The Open Ontology Repository Initiative (2010, w/ Peter Yim and Todd Schneider)
Outline
  • Motivation
  • Vision
  • Building Blocks
  • Demonstration
Motivations
  • Timeliness
  • Performance
Timeliness
  • Streaming minimizes latency
    • Processing elements see events as they occur
    • Resources are expended only when an event occurs
  • This is in contrast to polling
    • Latency averages half the polling interval
    • Resources are expended on every poll
    • Popular web syndication mechanisms such as RSS and Atom involve polling
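
The claim that polling latency averages half the polling interval can be checked with a small simulation. This is a minimal sketch, assuming events arrive at uniformly random times within an interval and are only noticed at the next poll; the function name and parameters are illustrative and not from the presentation.

```python
import random

def mean_polling_latency(interval_s, n_events=100_000, seed=0):
    """Average delay between an event and the next poll that observes it."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_events):
        # Assumption: the event occurs at a uniformly random offset
        # within the current polling interval.
        offset = rng.uniform(0.0, interval_s)
        # The event is only seen at the next poll, at the end of the interval.
        total += interval_s - offset
    return total / n_events

latency = mean_polling_latency(60.0)
print(f"mean latency with a 60 s polling interval: {latency:.1f} s")
```

With a 60-second interval the simulated mean comes out close to 30 seconds, whereas a streaming consumer sees each event as it occurs.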
Performance
  • Many Semantic Web tools provide streaming parsers rather than, or in addition to, model access
    • Analogous to XML SAX vs. DOM
  • For suitable applications, this can be 10x faster than loading all statements into memory or a KB
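
The SAX-style approach can be sketched as a generator that hands over one statement at a time, so an analysis such as counting predicates never holds the whole graph in memory. This is a simplified illustration, not the Jena ARP parser mentioned in the talk: it assumes one whitespace-separated N-Triples statement per line and does not implement the full grammar. The sample data and function names are made up.

```python
import io
from collections import Counter

def stream_triples(lines):
    """Yield (subject, predicate, object) one statement at a time.

    Simplified handling: one statement per line, terminated by '.'.
    """
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        subject, predicate, rest = line.split(None, 2)
        obj = rest.rstrip()
        if obj.endswith("."):
            obj = obj[:-1].rstrip()  # drop the statement terminator
        yield subject, predicate, obj

def count_predicates(lines):
    """Streaming analysis: only the counters are kept in memory."""
    counts = Counter()
    for _s, predicate, _o in stream_triples(lines):
        counts[predicate] += 1
    return counts

# Made-up sample input; any iterable of lines (e.g. an open file) works.
SAMPLE = io.StringIO(
    '<http://example.org/a> <http://xmlns.com/foaf/0.1/name> "Alice" .\n'
    '<http://example.org/a> <http://xmlns.com/foaf/0.1/knows> <http://example.org/b> .\n'
    '<http://example.org/b> <http://xmlns.com/foaf/0.1/name> "Bob Smith" .\n'
)
counts = count_predicates(SAMPLE)
print(counts.most_common())
```

Memory use here depends on the number of distinct predicates rather than the number of statements, which is why this style suits large corpora.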
2 Streaming Stories
  • dumpont of OpenCyc (circa 2003)
    • HTML-based ontology visualization tool periodically bogged down daml.org server
    • Reimplementation using event-based Jena ARP parser yielded 10x performance and scalability improvements
  • Billion Triples Challenge 2009
    • Streaming analysis of the 2009 corpus was performed at an overall rate of 103K statements/sec on a Mac laptop with a portable external disk
    • Compare to loading 10-20K statements/second on a server
Stream Processing Examples
  • Unix pipes
  • Dataflow architectures
  • StreamBase
  • IBM System S/InfoSphere Streams
Vision: Knowledge Streams
  • Processing elements
    • Consume and produce subgraphs
    • Multiple functions may be combined
  • Persistent pipelines
    • Streams of statements comprising object subgraphs
    • URI naming allows drill-down
    • Provenance, timestamps
  • [Diagram labels]
    • Data sources: Semantic Web, Sensor Network, Gazetteer, Imagery Database, Archive, Sensor, IM
    • Distribution and processing elements: aggregation, context, filter, augmentation, inference, distribution, correlation, persistent queries, translation, alerts, CEP, NLP, RSS
    • Users: User 1, User 2, User 3; Community of Interest 1, Community of Interest 2
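
One way to picture processing elements that consume and produce subgraphs, and that can be combined, is a chain of Python generators. This is only a sketch under stated assumptions: subgraphs are modeled as frozen sets of (subject, predicate, object) tuples, and the element names, the `http://example.org/` namespace, and the sample events are all hypothetical.

```python
from typing import Callable, FrozenSet, Iterable, Iterator, Tuple

Triple = Tuple[str, str, str]
Subgraph = FrozenSet[Triple]
Element = Callable[[Iterable[Subgraph]], Iterator[Subgraph]]

EX = "http://example.org/"  # hypothetical namespace

def filter_by_predicate(predicate: str) -> Element:
    """Pass through only subgraphs that contain the given predicate."""
    def run(stream):
        for graph in stream:
            if any(p == predicate for _s, p, _o in graph):
                yield graph
    return run

def augment(extra_for: Callable[[Subgraph], Iterable[Triple]]) -> Element:
    """Emit each subgraph with additional statements merged in."""
    def run(stream):
        for graph in stream:
            yield graph | frozenset(extra_for(graph))
    return run

def pipeline(*elements: Element) -> Element:
    """Combine several elements into a single element."""
    def run(stream):
        for element in elements:
            stream = element(stream)
        return stream
    return run

def tag_source(graph):
    # Hypothetical augmentation: record where each subject came from.
    return [(s, EX + "source", EX + "sensor-feed") for s in {s for s, _p, _o in graph}]

events = [
    frozenset({(EX + "obs1", EX + "temperature", '"21.5"')}),
    frozenset({(EX + "obs2", EX + "humidity", '"40"')}),
]
process = pipeline(filter_by_predicate(EX + "temperature"), augment(tag_source))
results = list(process(events))
print(results)
```

Because a pipeline is itself an element, combined functions can be nested or redistributed without changing the elements around them.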

Goals
  • Web-scale
    • Decentralized among multiple sites
    • Heterogeneous implementations
  • Long-lived, persistent connections
    • User accountability
  • Introspection over the processing network for control and optimization
    • E.g. aggregating subscriptions
    • Balance with security, privacy, and autonomy concerns
Building Blocks
  • RDF Content
  • Existing stream processing frameworks
  • Workflow systems
  • Publish/subscribe message-oriented middleware
RDF Payloads
  • Malleable data
    • Standards-based graph structure
    • Can easily add, remove, and transform statements
  • Self-describing
    • Unique naming via URIs
    • References to vocabularies and ontologies
  • Potential for inference
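
The "malleable data" point can be illustrated with a graph held as a plain set of statements, where adding, removing, and transforming are ordinary set operations. This is a minimal sketch rather than any particular RDF library: the `http://example.org/` names and the `internalId` property are hypothetical, while the FOAF and RDFS URIs are the standard ones.

```python
EX = "http://example.org/"          # hypothetical namespace
FOAF = "http://xmlns.com/foaf/0.1/"
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"

# A payload is just a set of (subject, predicate, object) statements.
graph = {
    (EX + "alice", FOAF + "name", '"Alice"'),
    (EX + "alice", EX + "internalId", '"42"'),
}

# Add a statement.
graph.add((EX + "alice", FOAF + "knows", EX + "bob"))

# Remove statements, e.g. strip a property before passing the payload on.
graph = {t for t in graph if t[1] != EX + "internalId"}

# Transform statements, e.g. map one vocabulary term onto another.
RENAME = {FOAF + "name": RDFS_LABEL}
graph = {(s, RENAME.get(p, p), o) for s, p, o in graph}

for statement in sorted(graph):
    print(statement)
```

Since every name is a URI, a downstream element can apply the same operations without prior knowledge of the producer's schema.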
Workflow Systems
  • Graphical environments for developing processing pipelines
    • Yahoo Pipes, DERI Pipes, SPARQLMotion
    • Nice user interfaces for development and execution

http://pipes.deri.org

Semantic Complex Event Processing
  • Complex Event Processing
    • One of the leading edges of rules technology
    • Formal specification of higher-level events in terms of lower-level events
      • E.g. alert if the moving average increases 15% within a 10-minute window
    • Engine can be compiled/optimized for a specific rule set
    • High-volume deployments in finance and other industries
    • Most implementations focus on self-contained tuples
  • Semantic Complex Event Processing
    • Enrich CEP using Semantic Web technology
    • Emerging topic at recent conferences
  • Early implementations
    • Wrappers around open source CEP engines
    • Native implementation
  • Provides a powerful set of operators and engines for Knowledge Streams
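
The moving-average rule given as the CEP example can be sketched in a few lines of plain Python. This is one possible reading, not the behavior of any specific CEP engine: it assumes the average is taken over the last few samples and that "increases 15%" means the current average is at least 15% above the lowest average seen in the preceding 10 minutes. The class name, its parameters, and the readings are hypothetical.

```python
from collections import deque

class MovingAverageRise:
    """Flag events where the moving average rose by `threshold` within `window_s`."""

    def __init__(self, avg_size=5, window_s=600.0, threshold=0.15):
        self.values = deque(maxlen=avg_size)  # samples feeding the moving average
        self.history = deque()                # (timestamp, average) pairs in the window
        self.window_s = window_s
        self.threshold = threshold

    def on_event(self, timestamp, value):
        self.values.append(value)
        avg = sum(self.values) / len(self.values)
        # Expire averages that fall outside the time window.
        while self.history and timestamp - self.history[0][0] > self.window_s:
            self.history.popleft()
        # Baseline: lowest earlier average still in the window (assumption).
        baseline = min((a for _t, a in self.history), default=avg)
        self.history.append((timestamp, avg))
        return baseline > 0 and avg >= baseline * (1 + self.threshold)

detector = MovingAverageRise(avg_size=3)
readings = [100, 100, 100, 120, 140, 160]  # made-up values, one per minute
alerts = [detector.on_event(60.0 * i, v) for i, v in enumerate(readings)]
print(alerts)
```

In this run only the last two readings raise the alert. A real engine would compile such a rule from a declarative specification instead of hand-written state handling.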
Implementation Approach
  • Well-defined APIs for implementing operators
  • Operator execution containers
    • Could encapsulate existing engines
  • Start with manual processing network configuration, then automate
Use Cases
  • Dissemination of metadata for new satellite imagery
  • Social network changes
  • Alerting of friends’ new publications
Demo
  • Processing using DERI Pipes with new operators
    • Ingest of #SemTechBiz tweets using Twitter Streaming API
    • Conversion of JSON to RDF
    • Mapping to SIOC vocabulary using SWRL rules
    • Enrich by matching Twitter @handles with contacts
    • Persistent buffering using Java Message Service
    • Monitoring
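
The JSON-to-RDF and SIOC mapping steps might look roughly like the following. Note the substitution: the demo maps to SIOC with SWRL rules inside DERI Pipes, whereas this sketch does the mapping directly in Python for brevity. The post and account URI patterns, the `Literal` wrapper, and the sample payload are assumptions for illustration; the JSON field names follow the classic Twitter tweet format, and the SIOC and Dublin Core terms are existing vocabulary terms.

```python
import json
from typing import NamedTuple

SIOC = "http://rdfs.org/sioc/ns#"
DCTERMS = "http://purl.org/dc/terms/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

class Literal(NamedTuple):
    """Marks an object as a literal value rather than a URI."""
    value: str

def tweet_to_triples(raw_json):
    """Convert one tweet (JSON text) into SIOC-flavored statements."""
    tweet = json.loads(raw_json)
    user = tweet["user"]["screen_name"]
    # Assumed URI patterns for the account and the post.
    account = f"https://twitter.com/{user}"
    post = f"{account}/status/{tweet['id_str']}"
    return [
        (post, RDF_TYPE, SIOC + "Post"),
        (post, SIOC + "content", Literal(tweet["text"])),
        (post, SIOC + "has_creator", account),
        (post, DCTERMS + "created", Literal(tweet["created_at"])),
        (account, RDF_TYPE, SIOC + "UserAccount"),
    ]

# Made-up sample payload, not real data from the demo.
SAMPLE = json.dumps({
    "id_str": "1",
    "text": "Knowledge Streams talk at #SemTechBiz",
    "created_at": "Mon Jun 06 17:00:00 +0000 2011",
    "user": {"screen_name": "example_user"},
})
triples = tweet_to_triples(SAMPLE)
for triple in triples:
    print(triple)
```

Keeping the account as its own URI is what would let a later enrichment step match Twitter @handles against a contact list.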