The Structured Advanced Query Page

The Structured Advanced Query Page (SAQP) is a web-based tool for interactively constructing precise queries against Pathway/Genome Databases (PGDBs). It is built on the BioVelo query language and provides a structured form for querying the rich data representations stored in PGDBs. Users select a database and a class, add conditions, and choose output attributes to produce customized result tables.

Presentation Transcript


  1. The Structured Advanced Query Page
     Mario Latendresse, Tomer Altman
     Bioinformatics Research Group, SRI International
     March 2014
     SRI International Bioinformatics

  2. Introduction
     • Structured Advanced Query Page (SAQP)
         • Web page for interactively constructing advanced and precise queries to PGDBs
         • SAQP is not available on Ptools desktop
         • Queries are translated to BioVelo and sent to the server for processing
         • Top Menu Bar command Search -> Advanced Search
         • http://biocyc.org/query.shtml
         • Documentation: http://biocyc.org/webQueryDoc.shtml
     • BioVelo is a query language
         • Like SQL but simpler and no updates allowed
         • Documentation: http://biocyc.org/bioveloLanguage.shtml
     • Free-Form Advanced Query Page (FFAQP) allows Web submission of BioVelo queries

  3. Why a query interface?
     • Allows a structured way to access the rich data representation stored in a PGDB.
     • Most advanced databases have a high-level, declarative method of access (e.g., SQL).
     • Provides an intermediate level of access between graphically browsing the PGDB and programmatically processing the data using Lisp.

  4. The Structured Advanced Query Page
     • 'Structured': it is a dynamic HTML form that makes queries easier to craft, trading the flexibility and power of the FFAQP for simplicity.
     • 'Advanced': it allows writing more precise queries than the basic search interface.
     • 'Page': it is accessed via the Web interface for BioCyc (www.biocyc.org/query.shtml), or from your own Pathway Tools web server.

  5. SAQP Architecture
     • The SAQP is built on top of a high-level functional declarative language called BioVelo, which is built on top of Pathway Tools.
     • BioVelo was designed at SRI.
     • On every result page, you will see the equivalent BioVelo code that was generated from the SAQP and that, in turn, generated the results.
     • You don't need to know anything about BioVelo to use the SAQP, but it might be helpful later if you need to write even more complicated queries using the Free-Form Advanced Query Page (FFAQP).

  6. How to Use the SAQP
     1. Select a database and a class, then add conditions
     2. Select the attributes to output (columns)
     3. Select the output data format (HTML or TXT)
     4. Click the “Submit Query” button
     • Documentation about each attribute is displayed by mousing over its name once it is selected
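
The steps above map onto the parts of a BioVelo comprehension: the database and class become the generator, the conditions follow it, and the selected attributes become the output expression. The Python sketch below only illustrates that mapping. It is not the SAQP's actual implementation, and the function `build_biovelo_query` and its parameters are hypothetical names chosen for this example. The string it produces follows the query syntax shown in the BioVelo examples of this presentation.

```python
def build_biovelo_query(database, cls, outputs, conditions=(), var="x"):
    """Assemble a BioVelo-style comprehension string (hypothetical helper)."""
    # Step 2: the selected output attributes form the output expression.
    columns = [f"{var}^{attr}" for attr in outputs]
    head = columns[0] if len(columns) == 1 else "(" + ", ".join(columns) + ")"
    # Step 1: the database and class selection forms the generator.
    body = [f"{var} <- {database}^^{cls}"]
    # Step 1, continued: each condition is appended after the generator.
    body += [f"{var}^{attr} {op} {value}" for attr, op, value in conditions]
    return f"[{head} : " + ", ".join(body) + "]"


query = build_biovelo_query(
    "ecoli", "proteins", ["?name"],
    conditions=[("dna-footprint-size", "<", 10)],
    var="p",
)
print(query)
```

Steps 3 and 4 (choosing the output format and submitting) happen in the Web form and are not modeled here.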

  7. Example #1
     • A simple query usually consists of querying a particular database about a particular class
     • Find all the proteins in E. coli K-12
     • Display the protein names

  8. Structure of the Results
     • A line that shows the equivalent BioVelo expression that the SAQP generated to answer the query
     • A button to create a SmartTable from the result
     • An HTML table of the results, with the corresponding entries hyperlinked to the matching Pathway Tools Web pages
     • Sorting can be applied on each column
     • If a text data format was requested, a tab-delimited text file is generated instead, containing just the table data
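
When the text format is requested, the tab-delimited output can be loaded with standard tools. The following is a minimal Python sketch, assuming one row per result and one column per selected attribute. The sample rows are placeholders made up for illustration, not actual query results, and the slides do not say whether the file contains a header row, so none is assumed.

```python
import csv
import io

# Placeholder text standing in for a downloaded result file (not real data).
sample = "protein A\t8\nprotein B\t6\n"


def read_result_table(text):
    """Parse tab-delimited result text into a list of rows (lists of strings)."""
    return list(csv.reader(io.StringIO(text), delimiter="\t"))


rows = read_result_table(sample)
for name, size in rows:
    # Values arrive as strings; convert numeric columns explicitly.
    print(name, int(size))
```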

  9. Example #2
     • We will add a condition to example #1
     • Find all the proteins of E. coli K-12 for which the DNA-FOOTPRINT-SIZE is smaller than 10
     • Display the protein name and the DNA footprint size

  10. Example #3
     • In EcoCyc, display polypeptides constrained by experimentally determined molecular weight and isoelectric point (pI)
     • The experimental molecular weight should be between 50 and 100 kD
     • The pI should be less than 7
     • Display the polypeptide name, the experimental molecular weight, and the pI

  11. Example #4
     • The SAQP allows for specifying quantifiers on relations between PGDB classes
     • Extending example #3: keep only the proteins for which at least one of the encoding genes lies within the first 500 kilobases of the E. coli chromosome
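
"At least one" is an existential quantifier over the genes related to each protein. As a rough analogy only, the Python sketch below expresses the same idea with `any`. The records, field names, and positions are made-up toy values, not EcoCyc data, and the sketch does not reproduce the BioVelo syntax that the SAQP generates.

```python
# Toy records (made-up values): each protein lists the left-end positions,
# in base pairs, of the genes that encode it.
proteins = [
    {"name": "protein A", "gene_positions": [120_000, 2_300_000]},
    {"name": "protein B", "gene_positions": [900_000]},
    {"name": "protein C", "gene_positions": []},
]

LIMIT_BP = 500_000  # the first 500 kilobases, expressed in base pairs


def has_gene_within(protein, limit):
    """Existential quantifier: true if at least one gene starts before `limit`."""
    return any(pos < limit for pos in protein["gene_positions"])


selected = [p["name"] for p in proteins if has_gene_within(p, LIMIT_BP)]
print(selected)
```

A protein with no related genes is excluded, since an existential condition over an empty collection is false.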

  12. Exercises
     1) Find all genes of E. coli that contain “trp” in their name.
     2) Find all genes in MetaCyc that have more than one product. Output the gene names and product names.
     3) Find all reactions in E. coli which have the reactant (i.e., the left side) “acetaldehyde”.
     4) Find all monomers in E. coli. A monomer has no components.
     5) Find all reactions in MetaCyc that have more than 4 reactants.
     6) Find all metabolic pathways in MetaCyc that have more than 5 reactions. Output the reaction lists as well as the pathway names.

  13. Introduction to BioVelo
     • BioVelo is based on set and list comprehensions.
     • In mathematics, a set comprehension describes a set of values, as in: {x | x in Prime, x > 100}
     • The output is 'x'; the body has a generator 'x in Prime' and a condition 'x > 100'. Several conditions and several generators can be used.
     • BioVelo uses a concise syntax:
         1) [ output-expression : generator, condition, ... ]
         2) a generator has the form v <- database^^class
         3) a condition uses logical and relational operators
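
Python comprehensions have the same three parts (output, generator, condition), so the mathematical example can be sketched directly. This is an analogy, not BioVelo: the helper `is_prime` is written for this example, and the candidates are limited to numbers below 200 because a Python set must be finite.

```python
def is_prime(n):
    """Trial-division primality test, sufficient for small numbers."""
    if n < 2:
        return False
    i = 2
    while i * i <= n:
        if n % i == 0:
            return False
        i += 1
    return True


# Stand-in for the set "Prime", restricted to values below 200.
primes = [x for x in range(200) if is_prime(x)]

# {x | x in Prime, x > 100}: output x, generator "x in primes", condition "x > 100".
result = {x for x in primes if x > 100}
print(sorted(result))
```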

  14. Examples of BioVelo Queries
     • [r : r <- ecoli^^reactions]
     • [p^name : p <- ecoli^^proteins]
     • [p^?name : p <- ecoli^^proteins]
     • [p^?name : p <- ecoli^^proteins, p^dna-footprint-size < 10]
     • [(g^?name, g^left-end-position) : g <- ecoli^^genes, g^left-end-position < 153000]
     • [(g^?name, k) : g <- ecoli^^genes, k := abs(g^left-end-position - g^right-end-position)+1, k < 200]
     • [(r^?name, c^?name) : r <- ecoli^^reactions, c <- r^left, c in r^right]
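
The query that binds `k` shows a local definition inside a comprehension: the gene length is computed once, filtered on, and returned. A loose Python counterpart is sketched below on made-up toy records (the names and coordinates are not EcoCyc data). It uses the assignment expression `:=`, which requires Python 3.8 or later.

```python
# Toy gene records with made-up coordinates (not real E. coli data).
genes = [
    {"name": "gene A", "left": 1000, "right": 1150},
    {"name": "gene B", "left": 5000, "right": 6200},
    {"name": "gene C", "left": 9100, "right": 9001},
]

# Bind k to the gene length, keep genes shorter than 200, output (name, k).
short_genes = [
    (g["name"], k)
    for g in genes
    if (k := abs(g["left"] - g["right"]) + 1) < 200
]
print(short_genes)
```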
