
Provenance Maintenance and Querying on Log-structured Databases



Presentation Transcript


  1. Provenance Maintenance and Querying on Log-structured Databases: A data-centric platform for analyzing distributed systems

  2. An Example Scenario
     [Figure: Alice, foo.com, and nodes A, B, C, D, E, with two routes r1 and r2. Alice asks: "Why did my route to foo.com change?!" Candidate causes: innocent reason, software bugs, malicious attack.]
     • An example scenario: network routing
     • The route to foo.com has suddenly changed
     • Alice wants to understand the exact cause

  3. Anomalies in Distributed Systems
     • For network routing …
         • “YouTube blames Pakistan ISP for global outage” (Feb 2008)
         • “A Chinese ISP momentarily hijacks the Internet” (March 2010)
         • “Unknown fault darkens Australia’s Internet” (Feb 2012)
     • … but also for other application scenarios
         • Distributed hash table: Eclipse attack
         • Cloud computing: misbehaving machines
         • Online multi-player gaming: cheating
     • Goal: To understand and debug the behavior of distributed systems

  4. A Data-centric Perspective
     [Figure: Alice, foo.com, and nodes A, B, C, D, E, annotated with state tuples: route(A, foo.com), route(A, B), route(A, D), route(B, foo.com), route(C, foo.com), link(A, B), link(A, D), link(B, C), link(C, foo.com), ……]
     • We assume a general distributed system
         • The network consists of nodes (routers, middleboxes, ...)
         • The state of a node is a set of tuples (routes, config, ...)
     • Idea: Explanation as reasoning about state dependencies

  5. Provenance as Explanations
     [Figure: Alice, foo.com, and nodes A, B, C, D, E, annotated with tuples: route(A, foo.com), route(B, foo.com), route(C, foo.com), route(D, foo.com), route(E, foo.com), link(A, B), link(B, C), link(C, foo.com), link(D, E), link(E, B)]
     • Provenance for encoding state dependencies
         • Explains the derivation of tuples
         • Captures the dependencies between tuples as a graph
         • Explanation of a tuple is a tree rooted at the tuple
     • Route r1 disappeared due to a link failure between B and C
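As an illustration of "explanation of a tuple is a tree rooted at the tuple", here is a minimal Python sketch. The dependency edges in `DERIVED_FROM` are an assumption made for this example (the slide's figure does not state which tuple depends on which), and the names `DERIVED_FROM`, `explain`, and `base_tuples` are hypothetical.

```python
# Hypothetical dependency edges (assumed for illustration, not taken from the
# slides): each derived tuple maps to the tuples it was derived from.
DERIVED_FROM = {
    "route(A, foo.com)": ["link(A, B)", "route(B, foo.com)"],
    "route(B, foo.com)": ["link(B, C)", "route(C, foo.com)"],
    "route(C, foo.com)": ["link(C, foo.com)"],
}


def explain(tup, derived_from=DERIVED_FROM):
    """Return the provenance tree rooted at `tup` as (tuple, [subtrees])."""
    children = derived_from.get(tup, [])  # base tuples have no dependencies
    return (tup, [explain(child, derived_from) for child in children])


def base_tuples(tree):
    """Collect the leaves of a provenance tree, i.e. the base tuples."""
    tup, subtrees = tree
    if not subtrees:
        return [tup]
    leaves = []
    for sub in subtrees:
        leaves.extend(base_tuples(sub))
    return leaves


def render(tree, depth=0):
    """Return the tree as indented text lines, one tuple per line."""
    tup, subtrees = tree
    lines = ["  " * depth + tup]
    for sub in subtrees:
        lines.extend(render(sub, depth + 1))
    return lines
```

Under these assumed edges, `base_tuples(explain("route(A, foo.com)"))` includes `link(B, C)`, so a failure of that link removes the support for the route, which is the kind of explanation the slide describes for r1.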

  6. Proposal: Provenance Maintenance
     • In traditional database systems
         • Provenance deltas between adjacent system states
         • Logs of all non-deterministic events
         • Replay the events to reconstruct provenance
         • Problem: storage overhead
     • In log-structured database systems
         • Only maintain logs of events, but not the latest system state
         • Natural for provenance support (with no additional cost)
         • Example: Hyder [CIDR 2011], built upon SSDs for web services
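The idea of keeping only a log of events and replaying it to reconstruct state can be sketched as follows. This is a toy in-memory model under stated assumptions, not Hyder's design or the proposed system: the `Event` and `EventLog` names, the two operations `insert` and `delete`, and the `upto` parameter are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    op: str   # "insert" or "delete" (assumed event vocabulary)
    tup: str  # the tuple affected, e.g. "link(B, C)"


class EventLog:
    """Append-only log; state is never stored, only reconstructed by replay."""

    def __init__(self):
        self._events = []

    def append(self, op, tup):
        if op not in ("insert", "delete"):
            raise ValueError(f"unknown op: {op}")
        self._events.append(Event(op, tup))
        return len(self._events)  # number of events logged so far

    def replay(self, upto=None):
        """Rebuild the set of tuples after the first `upto` events (all if None)."""
        events = self._events if upto is None else self._events[:upto]
        state = set()
        for event in events:
            if event.op == "insert":
                state.add(event.tup)
            else:
                state.discard(event.tup)
        return state
```

Because every past state is recoverable from a prefix of the log, `replay(upto=n)` can answer "what did the state look like before the change?" without storing any snapshot.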

  7. Proposal: Provenance Querying
     [Figure: tuple versions route(A,B,3), route(A,B,7), route(A,B,5)]
     • An efficient data structure for provenance querying
         • Backward pointers to the most recent update to the same state
         • Chained pointers for reconstructing a specific state
     • Optimization of provenance querying
         • Naïve approach: reconstruct the complete provenance graph
         • Optimization: only reconstruct the provenance as necessary
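One possible reading of the backward-pointer structure is sketched below: each log entry stores the index of the previous update to the same key, so a query follows one chain instead of replaying the whole log. The `PointerLog` class, its methods, the in-memory `_latest` index, and the order of the values 3, 7, 5 are assumptions for illustration only.

```python
class PointerLog:
    """Append-only log whose entries point back to the previous update of the same key."""

    def __init__(self):
        self._entries = []  # each entry: (key, value, index of previous entry or None)
        self._latest = {}   # key -> index of its most recent entry (assumed helper index)

    def append(self, key, value):
        prev = self._latest.get(key)  # backward pointer, None for the first update
        self._entries.append((key, value, prev))
        index = len(self._entries) - 1
        self._latest[key] = index
        return index

    def history(self, key):
        """Follow the pointer chain and return the key's values, newest first."""
        values = []
        index = self._latest.get(key)
        while index is not None:
            _, value, index = self._entries[index]
            values.append(value)
        return values

    def value_at(self, key, position):
        """Return the key's value as of log index `position` (inclusive), or None."""
        index = self._latest.get(key)
        while index is not None and index > position:
            index = self._entries[index][2]  # step to the previous update
        return None if index is None else self._entries[index][1]
```

Only entries for the queried key are visited, which matches the slide's optimization of reconstructing provenance only as necessary rather than rebuilding the complete graph.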

  8. Project Arrangement
     • Project plan
         • Develop the provenance system on log-structured databases
         • Evaluate the provenance system against several applications
             • Performance impact on the primary system
             • Performance (latency) of provenance queries
     • Budget
         • Time frame
             • System development: 6 months
             • Performance evaluation: 3 months
         • Cost: $75,000
