Query Language Constructs for Provenance

Murali Mani, Mohamad Alawa, Arunlal Kalyanasundaram

University of Michigan, Flint

Presented at IDEAS 2011.


Provenance Metadata
  • Data about origins of data
  • Applications:
    • Check whether data item is valid – in health records
    • How much do we trust an inference/observation – scientific computation
    • Audit trails – manufacturing/shipping/trading
    • The database community has found provenance useful for
      • updating views
      • maintenance of materialized views
      • interpretation of query results
      • querying probabilistic/uncertain data
    • In short, numerous applications …
OPM (Open Provenance Model) http://openprovenance.org/
  • Developed by several researchers who have been involved with provenance
  • Describes a logical representation of provenance information for a wide variety of applications.
    • Provenance information represented as a directed graph consisting of:
      • Nodes (can be artifact, process, or agent)
      • Edges (dependencies). There are 5 types of edges:
        • used: a process used an artifact
        • wasGeneratedBy: an artifact was generated by a process
        • wasControlledBy: a process was controlled by an agent
        • wasTriggeredBy: a process was triggered by another process
        • wasDerivedFrom: an artifact was derived from another artifact
      • Nodes and edges have annotations (attribute-value pairs)
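
The graph model above can be sketched as a small data structure. This is only an illustration, not an official OPM API: the names `ProvenanceGraph`, `Node`, `Edge`, `add_node`, and `add_edge` are hypothetical, and the table of allowed endpoint kinds is read directly off the five edge definitions in the list.

```python
from dataclasses import dataclass, field

NODE_KINDS = {"artifact", "process", "agent"}

# For each edge type: (kind of the node the edge starts at, kind it ends at).
# Edges point from effect to cause, e.g. a "used" edge goes process -> artifact.
EDGE_TYPES = {
    "used": ("process", "artifact"),
    "wasGeneratedBy": ("artifact", "process"),
    "wasControlledBy": ("process", "agent"),
    "wasTriggeredBy": ("process", "process"),
    "wasDerivedFrom": ("artifact", "artifact"),
}


@dataclass
class Node:
    node_id: str
    kind: str
    annotations: dict = field(default_factory=dict)  # attribute-value pairs


@dataclass
class Edge:
    edge_type: str
    source: str  # id of the node the edge starts at
    target: str  # id of the node the edge ends at
    annotations: dict = field(default_factory=dict)  # attribute-value pairs


class ProvenanceGraph:
    """Hypothetical container for an OPM-style directed graph."""

    def __init__(self):
        self.nodes = {}  # node id -> Node
        self.edges = []  # list of Edge

    def add_node(self, node_id, kind, **annotations):
        if kind not in NODE_KINDS:
            raise ValueError(f"unknown node kind: {kind}")
        self.nodes[node_id] = Node(node_id, kind, annotations)
        return self.nodes[node_id]

    def add_edge(self, edge_type, source, target, **annotations):
        if edge_type not in EDGE_TYPES:
            raise ValueError(f"unknown edge type: {edge_type}")
        source_kind, target_kind = EDGE_TYPES[edge_type]
        if (self.nodes[source].kind != source_kind
                or self.nodes[target].kind != target_kind):
            raise ValueError(f"{edge_type} must go {source_kind} -> {target_kind}")
        edge = Edge(edge_type, source, target, annotations)
        self.edges.append(edge)
        return edge
```

Rejecting edges whose endpoints have the wrong kind is a design choice of this sketch; it keeps malformed provenance out of the graph at insertion time rather than at query time.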
OPM: A Simple Example

A1 and A2 are artifacts.

P is a process that performs division (A1/A2); note the used edges between P and A1, A2.

A3 and A4 are artifacts generated by P (representing the quotient and remainder); note the wasGeneratedBy edges between P and A3, A4.

[Figure: OPM graph with process P, used edges from P to A1 and A2, and wasGeneratedBy edges from A3 and A4 to P.]

Example taken from http://openprovenance.org/tutorial/
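
As a rough illustration, the division example can be written down as a plain edge list and queried for the inputs and outputs of P. The tuple layout and the helper names `inputs_of` and `outputs_of` are assumptions made for this sketch; they do not come from the OPM tutorial.

```python
# Each edge is (edge type, start node, end node); edges point from effect
# to cause, so "used" goes P -> artifact and "wasGeneratedBy" goes artifact -> P.
EDGES = [
    ("used", "P", "A1"),
    ("used", "P", "A2"),
    ("wasGeneratedBy", "A3", "P"),
    ("wasGeneratedBy", "A4", "P"),
]


def inputs_of(edges, process):
    """Artifacts that the given process used."""
    return sorted(end for kind, start, end in edges
                  if kind == "used" and start == process)


def outputs_of(edges, process):
    """Artifacts that were generated by the given process."""
    return sorted(start for kind, start, end in edges
                  if kind == "wasGeneratedBy" and end == process)


print(inputs_of(EDGES, "P"))   # ['A1', 'A2']
print(outputs_of(EDGES, "P"))  # ['A3', 'A4']
```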

Queries for OPM
  • We can write complex “multi-step inference” queries using Datalog/SQL based on the different edges in OPM
    • Example: find artifacts directly or indirectly derived from another artifact (recursive query using wasDerivedFrom edges)
  • However, is this sufficient? We may also need to express
    • Sub-graph isomorphism (given a graph query pattern, check whether the pattern appears in a provenance graph)
      • Studied in graph query languages ([Graph-QL], [OPQL], …)
    • Shortest path queries (using some notion of distance)
      • Typically not studied in graph query languages
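
The recursive query mentioned above (artifacts directly or indirectly derived from a given artifact) might look as follows in SQL, run here through Python's built-in `sqlite3` module. The table name `wasDerivedFrom`, its columns `derived` and `source`, and the function name `derived_from` are assumptions for this sketch.

```python
import sqlite3


def derived_from(edges, origin):
    """Return the artifacts directly or indirectly derived from `origin`.

    `edges` is a list of (derived, source) pairs, one per wasDerivedFrom edge.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE wasDerivedFrom (derived TEXT, source TEXT)")
    conn.executemany("INSERT INTO wasDerivedFrom VALUES (?, ?)", edges)
    rows = conn.execute(
        """
        WITH RECURSIVE reach(artifact) AS (
            -- base case: artifacts derived directly from the origin
            SELECT derived FROM wasDerivedFrom WHERE source = ?
            UNION
            -- recursive case: artifacts derived from something already reached
            SELECT d.derived
            FROM wasDerivedFrom AS d
            JOIN reach AS r ON d.source = r.artifact
        )
        SELECT artifact FROM reach ORDER BY artifact
        """,
        (origin,),
    ).fetchall()
    conn.close()
    return [row[0] for row in rows]


edges = [("B", "A"), ("C", "B"), ("D", "C"), ("X", "Y")]
print(derived_from(edges, "A"))  # ['B', 'C', 'D']
```

Using `UNION` rather than `UNION ALL` discards rows that were already reached, so the recursion also terminates if the derivation data happens to contain a cycle.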
Our approach
  • Two sets of constructs
    • Constructs for Querying Content
      • Select nodes, edges based on annotations (attribute values) associated with them
      • Operators include typical relational algebra operators: select, project, union, …
    • Constructs for Querying Structure
      • 6 basic functions
        • from(e) / to(e): the node where edge e starts / ends
        • from⁻¹(n) / to⁻¹(n): the edges that start at node n / end at node n
        • next(n): the nodes to which there is an edge from n
        • prev(n): the nodes from which there is an edge to n
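      • Generalized selection operator over a graph G, specified by two conditions
        • a node condition specifies what nodes in G must appear in the result
        • an edge condition specifies what edges in G must appear in the result
        • Result: a sub-graph of G (i.e., its nodes are a subset of the nodes of G, and its edges are a subset of the edges of G)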
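
A minimal sketch of the structural constructs, assuming a graph stored as a set of node ids plus a mapping from edge id to (start, end). The class `Graph`, the method names (`from_`, `to`, `from_inv`, `to_inv`, `next`, `prev`), and `generalized_select` are hypothetical names chosen here. The sketch also assumes that a selected edge must have both endpoints among the selected nodes, so that the result is a well-formed sub-graph.

```python
class Graph:
    def __init__(self, nodes, edges):
        self.nodes = set(nodes)   # node ids
        self.edges = dict(edges)  # edge id -> (start node, end node)

    # The six basic functions.
    def from_(self, e):
        return self.edges[e][0]

    def to(self, e):
        return self.edges[e][1]

    def from_inv(self, n):
        return {e for e, (start, _) in self.edges.items() if start == n}

    def to_inv(self, n):
        return {e for e, (_, end) in self.edges.items() if end == n}

    def next(self, n):
        return {end for start, end in self.edges.values() if start == n}

    def prev(self, n):
        return {start for start, end in self.edges.values() if end == n}


def generalized_select(g, node_ids, edge_ids):
    """Return the sub-graph of g made of the selected nodes and edges."""
    node_ids, edge_ids = set(node_ids), set(edge_ids)
    if not node_ids <= g.nodes or not edge_ids <= set(g.edges):
        raise ValueError("selection must be drawn from the nodes and edges of G")
    for e in edge_ids:
        # Assumption of this sketch: no dangling edges in the result.
        if g.from_(e) not in node_ids or g.to(e) not in node_ids:
            raise ValueError(f"edge {e} has an endpoint outside the selected nodes")
    return Graph(node_ids, {e: g.edges[e] for e in edge_ids})


# The division example: P used A1 and A2; A3 and A4 were generated by P.
g = Graph(
    {"P", "A1", "A2", "A3", "A4"},
    {"e1": ("P", "A1"), "e2": ("P", "A2"), "e3": ("A3", "P"), "e4": ("A4", "P")},
)
print(sorted(g.next("P")))      # ['A1', 'A2']
print(sorted(g.prev("P")))      # ['A3', 'A4']
print(sorted(g.from_inv("P")))  # ['e1', 'e2']

sub = generalized_select(g, {"P", "A1"}, {"e1"})
print(sorted(sub.nodes), sub.edges)  # ['A1', 'P'] {'e1': ('P', 'A1')}
```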
Examples of Generalized Selection Operator
  • descendant graph given a set of nodes S
    • node condition = set of nodes n such that there is a path from some s ∈ S to n
    • edge condition = set of edges between the nodes selected by the node condition
  • shortest path graph between s and t
    • edge condition = set of edges on the shortest path between s and t
    • node condition = set of nodes incident to an edge selected by the edge condition
  • Note: The constructs for querying content and for querying structure can be integrated to yield a powerful query model that can express a wide range of queries.
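
The two examples above can be sketched with breadth-first search. Several choices here are assumptions rather than part of the slides: distance is taken to be the number of edges, paths follow edge direction, a "path" has at least one edge (so a node in S is included only if it is reachable from some node in S), and when several shortest paths exist only the first one found is returned. The function names and the edge-dictionary layout are also hypothetical.

```python
from collections import deque


def descendant_graph(edges, sources):
    """edges: edge id -> (start, end). Returns (node set, edge dict)."""
    successors = {}
    for start, end in edges.values():
        successors.setdefault(start, set()).add(end)
    reached = set()
    queue = deque(sources)
    while queue:
        n = queue.popleft()
        for m in successors.get(n, ()):
            if m not in reached:
                reached.add(m)
                queue.append(m)
    # Edge condition: edges between the nodes selected by the node condition.
    kept = {e: (a, b) for e, (a, b) in edges.items()
            if a in reached and b in reached}
    return reached, kept


def shortest_path_graph(edges, s, t):
    """Returns (node set, edge dict) for one shortest s-to-t path, or empty."""
    outgoing = {}
    for e, (start, end) in edges.items():
        outgoing.setdefault(start, []).append((e, end))
    via = {s: None}  # node -> edge through which it was first reached
    queue = deque([s])
    while queue:
        n = queue.popleft()
        if n == t:
            break
        for e, m in outgoing.get(n, ()):
            if m not in via:
                via[m] = e
                queue.append(m)
    if t not in via:
        return set(), {}
    path = {}
    n = t
    while via[n] is not None:  # walk back from t to s
        e = via[n]
        path[e] = edges[e]
        n = edges[e][0]
    # Node condition: nodes incident to an edge selected by the edge condition.
    nodes = {x for pair in path.values() for x in pair}
    return nodes, path


edges = {"e1": ("a", "b"), "e2": ("b", "c"), "e3": ("a", "c"), "e4": ("c", "d")}
print(descendant_graph(edges, {"b"}))
print(shortest_path_graph(edges, "a", "d"))
```

With a weighted notion of distance, the breadth-first search in `shortest_path_graph` would have to be replaced by an algorithm such as Dijkstra's; the surrounding selection logic would stay the same.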
Conclusions and Future Work
  • Observation: A provenance query language should not be restricted to Datalog/SQL.
  • Developed a query model that provides constructs for querying structure and for querying content.
  • Using our query model, we can express a wide range of queries including shortest path (not expressible using SQL/Datalog).
  • [Graph-QL]: He, H., and Singh, A. K. 2008. Graphs-at-a-time: Query Language and Access Methods for Graph Databases. ACM SIGMOD (2008).
  • [OPQL]: Lim, C., Lu, S., Chebotko, A., and Fotouhi, F. 2011. OPQL: A First OPM-Level Query Language for Scientific Workflow Provenance. IEEE SCC (2011).
  • [OPM]: The Open Provenance Model (OPM), available at http://openprovenance.org/