
PQLite: An Overly Simplistic Query Language for Data Provenance



  1. PQLite: An Overly Simplistic Query Language for Data Provenance • Michael {Leece, Sevilla} • mleece@soe.ucsc.edu, msevilla@soe.ucsc.edu • CMPS203 Final Project • University of California, Santa Cruz, Jack Baskin School of Engineering

  2. Overview • Introduction • Current Work • Design and Implementation • Conclusions

  3. Terminology • Provenance: history + ancestry of an object [1] • Processes • Data • Provenance-Aware Storage System (PASS) • Transparent collection • PQL: Path Query Language • Useful for provenance Ancestry Graph

  4. Applications • Security • File System Search • The Cloud • New Hierarchical File Systems • Yan Li’s Photo Album

  5-6. PQL Broken • Obtained PASSv2 • Ran PQL query on provenance database • Infinite loops • Empty results ({}) • “The problem with PQL and Sage is that the implementation… is really slow, and it’s perhaps too easy to generate PQL queries that do not return any data.” (PASS Team)

  7. PQL Undocumented

  8. Overview • Architecture diagram; labels: User Space, Kernel Space, App1, App2, PASSv2 Modules, VFS, Lasagna FS, .twig, Waldo, BDB, Database Dump

  9. Use Case
     • What we have
         [ P ] 1.0 INODE 4 INODE 12
         [ P ] 1.0 NAME 9 "/file.txt"
         [ P ] 1.0 TYPE 4 "FILE"
         [ P ] 1.0 FREEZETIME 8 TIME 1329510432.493134083
         [ P ] 1.0 FREEZETIME 8 TIME 1329510618.420311721
         [ P ] 1.0 FREEZETIME 8 TIME 1329510676.040716382
         [AP ] 1.1 INPUT 12 --> 2.1
         [AP ] 1.2 INPUT 12 --> 8.1
         [AP ] 1.3 INPUT 12 --> 16.2
         [ PT] 2.0 ARGV 4 [1]"cat"
         [ PT] 2.0 ENV 64 [2]"SHELL=/bin/bash" [3]"TERM=xterm" [4]"XDG_SESSION_COOKIE=06c3f2775eb071081dfacb984bf6c364-1329508695.722050-291519720" [5]"USER=root" [6]"LS_COLORS=no=00:fi=00:di=01;34:ln=01;36:pi=40;33:so=01;35:do=01;35:bd=40;33;01:cd=40;33;01:or=40;31;01:su=37;41:sg=30;43:tw=30;42:ow=34;42:st=37;44:ex=01;32:*.tar=01;31:*.tgz=01;31:*.svgz=01;31:*.arj=01;31:*.taz=01;31:*.lzh=01;31:*.lzma=01;31:*.zip=01;31:*.z=01;31:*.Z=01;31:*.dz=01;31:*.gz=01;31:*.bz2=01;31:*.bz=01;31:*.tbz2=01;31:*.tz=01;31:*.deb=01;31:*.rpm=01;31:*.jar=01;31:*.rar=01;31:*.ace=01;31:*.zoo=01;31:*.cpio=01;31:*.7z=01;31:*.rz=01;31:*.jpg=01;35:*.jpeg=01;35:*.gif=01;35:*.bmp=01;35:*.pbm=01;35:*.pgm=01;35:*.ppm=01;35:*.tga=01;35:*.xbm=01;35:*.xpm=01;35:*.tif=01;35:*.tiff=01;35:*.png=01;35:*.svg=01;35:*.mng=01;35:*.pcx=01;35:*.mov=01;35:*.mpg=01;35:*.mpeg=01;35:*.m2v=01;35:*.mkv=01;35:*.ogm=01;35:*.mp4=01;35:*.m4v=01;35:*.mp4v=01;35:*.vob=01;35:*.qt=01;35:*.nuv=01;35:*.wmv=01;35:*.asf=01;35:*.rm=01;35:*.rmvb=01;35:*.flc=01;35:*.avi=01;35:*.fli=01;35:*.gl=01;35:*.dl=01;35:*.xcf=01;35:*.xwd=01;35:*.yuv=01;35:*.aac=00;36:*.au=00;36:*.flac=00;36:*.mid=00;36:*.midi=00;36:*.mka=00;36:*.mp3=00;36:*.mpc=00;36:*.ogg=00;36:*.ra=00;36:*.wav=00;36:" [7]"MAIL=/var/mail/root" [8]"PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin" [9]"PWD=/test" [10]"LANG=en_US.UTF-8" [11]"SHLVL=1" [12]"HOME=/root" [13]"LOGNAME=root" [14]"LESSOPEN=| /usr/bin/lesspipe %s" [15]"LESSCLOSE=/usr/bin/lesspipe %s %s" [16]"_=/bin/cat" [17]"OLDPWD=/"
         [ ] 2.0 EXECTIME 8 TIME 1329510428.104272662
         [ P ] 2.0 TYPE 4 "PROC"
         [ ] 2.0 PID 4 INT 13739
         [ P ] 2.0 NAME 8 "/bin/cat"
         [A ] 2.0 FORKPARENT 12 --> 14762.0
         [ P ] 2.0 FREEZETIME 8 TIME 1329510428.104272662
     • What we want
         A list of files or processes that are one-step ancestors of “/file.txt”

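The records in the slide 9 dump can be read mechanically. The sketch below is not the project's dump parser. It is a minimal illustration in plain Haskell (base only) that assumes each record has the shape `[flags] pnode.version ATTRIBUTE size value` and that an `INPUT ... --> x.y` record points at an ancestor. Both are readings of the slide, not documented facts about the Waldo dump format, and the names `Ref`, `Record`, `parseRecord`, and `parseDump` are hypothetical.

```haskell
import Data.Char (isDigit)
import Data.Maybe (mapMaybe)

-- | A versioned identifier such as "1.0" (assumed: pnode 1, version 0).
data Ref = Ref { pnode :: Int, version :: Int } deriving (Eq, Show)

-- | The two record shapes this sketch keeps; everything else is skipped.
data Record
  = Name Ref String   -- e.g. 1.0 NAME 9 "/file.txt"
  | Input Ref Ref     -- e.g. 1.1 INPUT 12 --> 2.1
  deriving (Eq, Show)

-- | Parse "1.0" into Ref 1 0; anything else yields Nothing.
parseRef :: String -> Maybe Ref
parseRef s =
  case break (== '.') s of
    (p, '.' : v) | allDigits p && allDigits v -> Just (Ref (read p) (read v))
    _ -> Nothing
  where
    allDigits xs = not (null xs) && all isDigit xs

-- | Parse one dump line. The leading "[flags]" field may contain spaces,
-- so it is dropped up to the first ']' before splitting into words.
-- Names containing spaces are not handled by this sketch.
parseRecord :: String -> Maybe Record
parseRecord line =
  case words (drop 1 (dropWhile (/= ']') line)) of
    [ref, "NAME", _size, quoted] ->
      Name <$> parseRef ref <*> pure (filter (/= '"') quoted)
    [ref, "INPUT", _size, "-->", src] ->
      Input <$> parseRef ref <*> parseRef src
    _ -> Nothing

-- | Keep only the NAME and INPUT records of a whole dump.
parseDump :: String -> [Record]
parseDump = mapMaybe parseRecord . lines
```

Under these assumptions, the `NAME` records would feed a label map and the `INPUT` records an ancestry graph, which is the information the one-step-ancestor question needs.
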
  10-12. Use Case (cont.)
     • Query: SELECT r FROM Graph AS r WHERE r.child = "/file.txt"
     • Abstract Syntax Tree: Select "r" [From [Alias "Graph" "r"]] [Duo Eq (PathType "r" ["child"]) (Str "/file.txt")]
     • Label Map: 1 -> file.txt, 2 -> jazz.jpg, 3 -> bacon.txt, …
     • Response: [(MyNode "/usr/bin/pico" 1,1,[2]), (MyNode "/usr/bin/vi" 2,3,[17,16,15]), (MyNode "/bin/cat" 1,4,[0])]
     • Diagram components: Query Parser, Abstract Syntax Tree, Evaluator, Waldo Database Dump, Dump Parser, Label Map, Ancestry Graph
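
The abstract syntax tree on this slide is already valid Haskell constructor syntax, so it can be given types and evaluated. The following is a minimal sketch, not the authors' evaluator. The constructor names (`Select`, `From`, `Alias`, `Duo`, `Eq`, `PathType`, `Str`) come from the slide, but their type declarations, the `Edge` representation of the ancestry graph, the `parent` field, and the sample edges are all assumptions made for illustration.

```haskell
-- Types reconstructed from the AST printed on the slide (assumed shapes).
data Op     = Eq deriving (Eq, Show)
data Expr   = PathType String [String] | Str String | Duo Op Expr Expr
  deriving (Eq, Show)
data Source = Alias String String deriving (Eq, Show)
data From   = From [Source] deriving (Eq, Show)
data Query  = Select String [From] [Expr] deriving (Eq, Show)

-- | Hypothetical ancestry edge: parent is a one-step ancestor of child.
data Edge = Edge { parent :: String, child :: String } deriving (Eq, Show)

-- | Resolve a path such as ["child"] against one edge.
field :: Edge -> [String] -> Maybe String
field e ["child"]  = Just (child e)
field e ["parent"] = Just (parent e)
field _ _          = Nothing

-- | Value of a leaf expression when the alias is bound to the given edge.
value :: String -> Edge -> Expr -> Maybe String
value alias e (PathType v path) | v == alias = field e path
value _ _ (Str s) = Just s
value _ _ _       = Nothing

-- | Does a WHERE condition hold for this edge?
holds :: String -> Edge -> Expr -> Bool
holds alias e (Duo Eq l r) =
  case (value alias e l, value alias e r) of
    (Just a, Just b) -> a == b
    _                -> False
holds _ _ _ = False

-- | Return the parents of every edge that satisfies all conditions.
eval :: [Edge] -> Query -> [String]
eval graph (Select var [From [Alias "Graph" alias]] conds)
  | var == alias = [parent e | e <- graph, all (holds alias e) conds]
eval _ _ = []

-- The AST exactly as printed on the slide.
exampleQuery :: Query
exampleQuery =
  Select "r" [From [Alias "Graph" "r"]]
    [Duo Eq (PathType "r" ["child"]) (Str "/file.txt")]

-- Made-up edges, reusing program names from the slide's response.
exampleGraph :: [Edge]
exampleGraph =
  [ Edge "/usr/bin/pico" "/file.txt"
  , Edge "/usr/bin/vi"   "/file.txt"
  , Edge "/bin/cat"      "/file.txt"
  , Edge "/bin/cat"      "/jazz.jpg"
  ]
```

With these definitions, `eval exampleGraph exampleQuery` yields the three programs that have an edge into "/file.txt". The slide's `MyNode` response carries extra numeric fields whose meaning the transcript does not explain, so the sketch returns plain names instead.
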

  13-14. Language Specification: Select Statement

  15-16. Language Specification: Expression

  17. Use Case (cont.) • Same query, abstract syntax tree, label map, and response as slide 10

  18. What We Did Well • Functional • It works. (PQLite > PQL) • Easy to use • Intuitive (SQL-like) way of querying a provenance graph • Getting stuff we care about

  19. Lessons Learned • Infinite recursion in parsing • Left recursion in a recursive descent parser • Refined syntax • Began coding too soon • Monads are useful • IO (), Maybe, State, Parsec
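
The left-recursion lesson can be illustrated with a toy grammar. The sketch below is not PQLite's grammar or parser. It is a hand-rolled recursive descent parser over a hypothetical token type, using only the `Maybe` monad from base, that shows why a rule like `expr ::= expr '=' term | term` loops forever and how rewriting it as `expr ::= term ('=' term)*` avoids that.

```haskell
-- Tokens and syntax tree of a toy language (hypothetical names).
data Token = TIdent String | TStr String | TEq deriving (Eq, Show)

data Expr = Ident String | Lit String | Equals Expr Expr
  deriving (Eq, Show)

-- | A parser consumes a prefix of the tokens and returns the remainder.
type Parser a = [Token] -> Maybe (a, [Token])

-- Left-recursive rule:  expr ::= expr '=' term | term
-- Translated directly, expr would call itself on the same input before
-- consuming any token, so the recursion never bottoms out:
--
--   expr ts = do (lhs, rest) <- expr ts   -- no progress, loops forever
--                ...

-- | term ::= identifier | string
term :: Parser Expr
term (TIdent x : ts) = Just (Ident x, ts)
term (TStr s : ts)   = Just (Lit s, ts)
term _               = Nothing

-- | Refactored rule:  expr ::= term ('=' term)*
-- Every recursive step consumes at least one token, so it terminates.
expr :: Parser Expr
expr ts = do
  (lhs, rest) <- term ts
  go lhs rest
  where
    -- Fold each "= term" suffix into a left-nested tree.
    go acc (TEq : ts') = do
      (rhs, rest') <- term ts'
      go (Equals acc rhs) rest'
    go acc rest = Just (acc, rest)
```

For example, `expr [TIdent "r.child", TEq, TStr "/file.txt"]` parses to `Equals (Ident "r.child") (Lit "/file.txt")` with no tokens left over. Parsec, which the slide mentions, offers the `chainl1` combinator for the same term-then-suffixes pattern.
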

  20. References
     • Margo Seltzer, Kiran-Kumar Muniswamy-Reddy, David A. Holland, Uri Braun, and Jonathan Ledlie. Provenance-Aware Storage Systems. Harvard University Computer Science Technical Report TR-18-05, July 2005.
     • Stephanie Jones, Christina Strong, Darrell D. E. Long, and Ethan L. Miller. Tracking Emigrant Data via Transient Provenance. In Proceedings of the 3rd USENIX Workshop on the Theory and Practice of Provenance (TaPP '11), June 2011.
     • Kiran-Kumar Muniswamy-Reddy, Uri Braun, David A. Holland, Peter Macko, Diana Maclean, Daniel Margo, Margo Seltzer, and Robin Smogor. Layering in Provenance Systems. In Proceedings of the 2009 USENIX Annual Technical Conference, San Diego, CA, June 2009.
     • PQL Language Guide and Reference.
