1 / 31

Special Interest Activity

Special Interest Activity. Trio – A System for Integrated Management of Data, Uncertainty, and Lineage. ITK478 Yan Cui. Agenda. Understand Trio new database system System requirements of installing Trio system Source codes/packages for Trio system Procedure of Trio system installation

fancy
Download Presentation

Special Interest Activity

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Special Interest Activity Trio – A System for Integrated Management of Data, Uncertainty, and Lineage ITK478 Yan Cui

  2. Agenda • Understand Trio new database system • System requirements of installing Trio system • Source codes/packages for Trio system • Procedure of Trio system installation • Configuration of Trio System • Experiment Trio DBMS Using TrioExplorer and TrioPlus(demo) • Trio API Integrated Into Other Python Scripts • Trio API and Translator (Python) • Trio Query Language (TriQL) Structure (demo) • Advantage and Disadvantage of using Trio DBMS • Documenting Trio system report/bugs

  3. Understand Trio new database system • Basic concept of new Trio DBMS • Trio is a new kind of database system (DBMS), which was developed by Stanford University Lab at Dec, 2006. • An extended relational model called Uncertainty Lineage Database (ULDB) , and also supports Trio’s query language Called TriQL [1]. • Handles structured data, uncertainty of data, and data lineage together in a fully integrated manner. • Trio System architecture (components) • Uncertainty-Lineage Database (ULDB) • TriQL: The Trio Query Language

  4. Trio System architecture • Four primary components of Trio DBMS • command-line client (TrioPlus) • TrioExplorer • Trio API and translator (Python) • standard relational DBMS (PostgreSQL) • Uncertainty-Lineage of Database • Encoded data table • Lineage table • Trio metadata • Trio Stored Procedures

  5. Uncertainty-Lineage Database (ULDB) • Alternatives • ‘?’ (Maybe) Annotations • Numerical Confidences • Lineage • Sample: Drives (person, color, car) and Saw (witness, color, car) uncertainty tables with/no with confidence.

  6. Uncertainty-Lineage Database (ULDB) • Alternatives • Definition: Alternatives are presenting uncertainty about the contents of a tuple [2] • ‘||’ annotation • Drives(person, color, car) and Saw (witness, color, car) uncertainty tables • ‘Select * from Drives”

  7. Uncertainty-Lineage Database (ULDB) • ‘?’ (Maybe) Annotations • Definition: ‘?’ annotation present the existence of a tuple on the x-tuple, also called maybe x-tuple [2] • Drives(person, color, car) and Saw (witness, color, car) uncertainty tables • ‘select * from Drives’

  8. Uncertainty-Lineage Database (ULDB) • Numerical Confidences • Definition: Numerical confidence also was considered as probability [2]. • Drives(person, color, car) and Saw (witness, color, car) uncertainty tables with confidence • ‘select * from Drives’

  9. Uncertainty-Lineage Database (ULDB) • Lineage • Definition: “recorded at the granularity of tuple alternatives: Lineage connects as x-tuple alternative to other x-tuple alternative.” in [2] • Drives(person, color, car) and Saw (witness, color, car) uncertainty tables with confidence • ‘select person from Drives’

  10. TriQL: The Trio Query Language • Two major parts of TriQL • built-in functions and predicates for querying confidence values and lineage • regular SQL syntax

  11. System requirements of installation • Operating systems - as Linux, Mac OS X, and Win-32 (XP, Vista, and 32-bit Server) • PostgreSQL database (version 8.2.5, 8.1.10, 8.0.14, and 7.4.18)- Linux and Win32 • Python API - windows, Linux/Unix, Mac OS X, OS/2, and Amiga

  12. Source codes/packages for Trio system Listing source codes • Python 2.4 can be downloaded from http://www.python.org/ . • Easy_install can be downloaded from http://peak.telecommunity.com/DevCenter/EasyInstall and the file called ez_setup.py. • Readline 1.7.win32 can be downloaded from http://www.python.org/ . • ctypes-1.0.2.win32-py2.4 can be download from http://www.python.org/ . • PostgreSQL 8.1 can be downloaded from http://www.postgresql.org/ • Graphviz 2.14 is the only version compatible with Trio API. It is available in http://infolab.stanford.edu/trio/code/graphviz-2.14.1.exe . • PyGreSQL can be downloaded from http://www.pygresql.org/ . • Pylons 0.9.5 can be downloaded from http://pylonshq.com/ . • PLY 2.2 can be downloaded from http://www.dabeaz.com/ply/ . • PyParsing can be downloaded from http://pyparsing.wikispaces.com/ . • PyDot can be downloaded from http://code.google.com/p/pydot/downloads/list . • Trio API 1.0 can be downloaded from http://infolab.stanford.edu/~theobald/sources/TRIO.zip .

  13. Procedure of Trio system installation • Python • Python 2.4 windows version (python-2.4.4.msi). • Install Python in C:/Python directory • Set path=c:/Python24; in environment variables • Readline • Readline-1.7.win32-py2.4.exe and install into Python directory • Ctypes • Ctypes-1.0.2.win32-py2.4.exe and install into Python directory • PostgreSQL • PostgreSQL 8.1 windows version (postgresql-8.1.msi ) • Install PostgreSQL 8.1 and set path C:\Program Files\PostgreSQL\8.1\bin; after completed the installation. • Graphviz • Graphviz 2.14 version, install to your workstation and set path C:\PROGRA~1\ATT\Graphviz\bin; in environment variables after completed the installation

  14. Procedure of Trio system installation (cont) • Easy_install- Download ez_setup.py in C:/ directory • PyGreSQL– In command line, cd\ to c: directory, and run python ez_setup.py PyGreSQL to install components. • Pylons– In command line, cd\ to c: directory, and run python ez_setup.py Pylons==0.9.5 to install Pylons. Set path c:\python24\Scripts in environment variables. • PLY– In command line, cd\ to c: directory, and run python ez_setup.py Ply==2.2 or easy_install Ply==2.2. • PyParsing– In command line, cd\ to c: directory, and run python ez_setup.py PyParsing. • PyDot– download the source from website. Access to folder in command line, and then install manually by running ‘python setup.py install’. • Trio API • Download source code in any directory • Copy Trio-1.0\spi\triospi_win32.dll to PostgreSQL’s lib directory and renamed as triospi.dll

  15. Procedure of Trio system installation (cont) PostgreSQL installation

  16. Configuration of Trio System • Windows superuser authentication to access PostgreSQL • TrioExplorer • TrioPlus

  17. Configuration of Trio System • Windows superuser authentication for PostgreSQL • Open pgadmin III for PostgreSQL • Create new login role • Role name ‘myname’ (as same as windows login account) • Set password (Password can be any) and check all role privileges and click ok. • Create new database • Database name ‘myname’ (as same as username) • Owner is ‘myname’ and click ok • Initialize Trio schema information • In Trio-1.0\setup, open setup.py with notepad to comment out the last three codes and put the following. os.system("psql %s %s < setup.sql" % (pgdbname, username)) os.system("psql %s %s < setup_triospi.sql" % (pgdbname, username)) os.system("psql %s %s < trio_get_conf.sql" % (pgdbname, username)) • Save the file, and at the command line, cd \Trio-1.0\setup, and run ‘python setup.py myname myname’

  18. Configuration of Trio System (cont) • pgadmin III for PostgreSQL 8.1

  19. Configuration of Trio System (cont) • TrioExplorer • Make sure PostgreSQL is working. • Running TrioExplore – Ensure path ‘c:\python24\Scripts;’ in environment variables. And double click ‘start_te_server.bat’ under Trio-1.0\explorer. • At the command line, you are now prompted for an admin user login to PostgreSQL, which should have been created along with your PostgreSQL installation and which will be used by TrioExplorer to create new user roles and database instances. • TrioExplorer should now be reachable from your browser using http://localhost:8080/. For new users can now press ‘Create a new user’ and create their own Trio login and database instances, which are then managed by the PostgreSQL server.

  20. Configuration of Trio System (cont) • TrioPlus • Create new PostgreSQL user role and database instance • Run ‘createuser demo’ • Run ‘createdb demo’, the name must be the same as username • Initialize Trio schema information for new user by access as same as windows superuser authentication to access PostgreSQL. Use TrioExplorer will be easily just press ‘Create new role’ in Web. • Connect to new Trio database using the command line clients by running ‘python trioplus.py –u demo –d demo –p’

  21. Trio API Integrated Into Other Python Scripts • TrioCnx • TrioCnx(pgdb) –PyGreSQL connection pgdb • cursor() – return a new TrioCursor object for the current connection. • commint() – commits the current transaction. • rollback() – performs a rollback for the current transaction. • close() – closes the Trio connection • TrioCursor • execute(triql) - Executes a TriQL statement triql for this cursor object. • fetchone() - Fetches a single XTuple object from the current cursor position. • fetchall() - Fetches and returns a list of all XTuple objects beginning from the current cursor position.

  22. Trio API Integrated Into Other Python Scripts (cont) • XTuple • len() - Returns the number of Alternative objects contained in this XTuple object. • getAlternative(idx) - Returns the Alternative object at the designated index idx. • getConfidence() - Returns the confidence value (if any) of this XTuple object as the sum of its Alternative objects' confidence values. • getQuestionMark() - Returns whether this XTuple object has a question mark or not. • Alternative • getLineage() – returns a list of immediate lineage information • traceLineage() – performs a transitive lineage traversal for this alternative back to the base data • getConfidence() - Returns the confidence value of this alternative. • computeConfidence() - Computes the confidence value

  23. Trio API And Translator (Python)

  24. Demo • TrioExplorer and TrioPlus • Trio Query Language (TriQL) Structure • Drop index/table • Create Trio table/index • TriQL language • TriQL manual: http://infolab.stanford.edu/~widom/triql.html • Sample: TriQL Script

  25. Current Supported for TriQL

  26. Table of TriQL contents

  27. Advantage and Disadvantage • Advantage • Open source and free support for any non-benefit users to experience new Trio DBMS • Advanced components in relational DBMS • Computing confidences • Efficient, Convenient, safe, Multi-User storage of and access to, Massive, Persistent • Disadvantage • Time cost for query • Dependency • On development stage

  28. Advantage and Disadvantage • Disadvantage • Time cost for query Using ‘SELECT attr-list FROM X1, X2, ..., Xn WHERE predicate’ as a query example in [6] for a comparison between relational database and ULDB. Over standard relational database: • For each tuple in cross-product of X1, X2, ..., Xn • Evaluate the predicate • If true, project attr-list to create result tuple Over ULDB: • For each tuple in cross-product of X1, X2, ..., Xn • Create “super tuple” T from all combinations of alternatives • Evaluate predicate on each alternative in T ; keep only the true ones • Project attr-list on each alternative to create result tuple • Details: ‘?’, lineage, confidences

  29. Documenting Trio system report/bugs • The install instruction in website http://dbpubs.stanford.edu:8011/doku.php/trio:installation , indicated unclearly the version of Graphviz for Trio system. In Graphviz website http://www.graphviz.org/ only has version 2.16 but not compatable except version 2.14. Graphviz version 2.14 is available for download in http://infolab.stanford.edu/trio/code/graphviz-2.14.1.exe . • The windows authentication supperuser needs to be created first in the PostgreSQL in order to connect to database. After established the connection, TrioExplorer and TrioPlus can use the supperuser’s login and password as windows authentiction to access to database system. However, it doesn’t mention at all in the installation procedure on how to create this typle of new user. The only way to solve it is to use PostgreSQL->pgadmin III manually. • After created the supperuser, I have to modify some codes in setup.py in Trio-1.0->setup directory in order to run ‘python setup.py –u myname –d myname –p’. • TriQL query statements in http://infolab.stanford.edu/~widom/triql.html#options, there are many samples queries not working properly as desired.

  30. References • M. Mutsuzaki, M. Theobald, A. de Keijzer, J. Widom, P. Agrawal, O. Benjelloun, A. Das Sarma, R. Murthy, and T. Sugihara. Trio-One: Layering Uncertainty and Lineage on a Conventional DBMS. Proceedings of the Third Biennial Conference on Innovative Data Systems Research (CIDR '07), Pacific Grove, California, January 2007. Demonstration description. • O. Benjelloun, A. Das Sarma, C. Hayworth, and J. Widom. An Introduction to ULDBs and the Trio System. IEEE Data Engineering Bulletin, Special Issue on Probabilistic Databases, 29(1):5-16, March 2006. • Trio: A System for integrated Management of Data, Uncertainty, and Lineage. Retrieved on November, 18, 2007 from http://infolab.stanford.edu/trio/ . • PostgreSQL. Retrieved on November, 20, 2007 from http://www.postgresql.org/. • Python. Retrieved on November, 15, 2007 from http://www.python.org/. • Trio: A System for Data, Uncertainty, and Lineage. given by Jennifer at various venues, 2006-07. Ppt.

  31. Questions?

More Related