
The Pathway Tools Schema

cole-rose

Presentation Transcript


  1. The Pathway Tools Schema

  2. Motivations for Understanding Schema • Pathway Tools visualizations and analyses depend upon the software being able to find precise information in precise places within a Pathway/Genome DB • When writing complex queries to PGDBs, those queries must name classes and slots within the schema • A Pathway/Genome Database is a web of interconnected objects; each object represents a biological entity

  3. Reference • Pathway Tools User’s Guide, Volume I • Appendix A: Guide to the Pathway Tools Schema

  4. Web of Relationships for One Enzyme • [Diagram nodes: reaction Succinate + FAD = fumarate + FADH2; Enzymatic-reaction; Succinate dehydrogenase; Sdh-flavo, Sdh-Fe-S, Sdh-membrane-1, Sdh-membrane-2; genes sdhA, sdhB, sdhC, sdhD; TCA Cycle]

  5. Frame Data Model • Frame Data Model -- organizational structure for a PGDB • Knowledge base (KB, Database, DB) • Frames • Slots • Facets • Annotations

  6. Knowledge Base • Collection of frames and their associated slots, values, facets, and annotations • AKA: Database, PGDB • Can be stored within • An Oracle DB • A disk file • A Pathway Tools binary program

  7. Frames • Entities with which facts are associated • Kinds of frames: • Classes: Genes, Pathways, Biosynthetic Pathways • Instances (objects): trpA, TCA cycle • Classes: • Superclass(es) • Subclass(es) • Instance(s) • A symbolic frame name (id, key) uniquely identifies each frame

  8. Frame IDs • Naming conventions for frame IDs • Uniqueness of frame IDs • Frame IDs must be unique within a PGDB • Goal: the same frame ID in different PGDBs should refer to the same biological entity • Because many frames are imported from MetaCyc, this helps ensure consistency of frame names • Frame IDs for newly created frames (not imported) are generated by Pathway Tools • Those frame IDs contain a PGDB-specific identifier • Example: the pattern CPLXzz-nnnn, as in CPLXB3-0035
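
The ID pattern on the slide above can be illustrated with a small Python sketch. This is a toy, not the generator Pathway Tools uses: the helper name `next_frame_id`, the regular expression, and the four-digit zero padding are assumptions inferred only from the single example CPLXB3-0035.

```python
import re

# Assumed shape, inferred from the slide's example "CPLXB3-0035":
# a stem (class prefix plus PGDB-specific tag), a dash, a numeric counter.
FRAME_ID_PATTERN = re.compile(r"^(?P<stem>[A-Z0-9]+)-(?P<number>\d+)$")


def next_frame_id(existing_ids, stem, width=4):
    """Return an unused ID of the form '<stem>-<nnnn>' (hypothetical helper)."""
    used = set()
    for frame_id in existing_ids:
        match = FRAME_ID_PATTERN.match(frame_id)
        if match and match.group("stem") == stem:
            used.add(int(match.group("number")))
    number = max(used, default=0) + 1
    return f"{stem}-{number:0{width}d}"


# IDs already present in an imaginary PGDB; "TRPA" does not follow the pattern.
existing = {"CPLXB3-0035", "CPLXB3-0007", "TRPA"}
new_id = next_frame_id(existing, "CPLXB3")
print(new_id)
```

Because the counter is scoped to the stem, two PGDBs with different tags cannot produce the same generated ID, which is the uniqueness property the slide describes.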

  9. Slots • Encode attributes/properties of a frame • Integer, real number, string, symbols • Represent relationships between frames • The value of a slot is the identifier of another frame • Every slot is described by a “slot frame” in a KB that defines meta information about that slot

  10. Slot Links • [Diagram nodes: reaction Succinate + FAD = fumarate + FADH2; Enzymatic-reaction; Succinate dehydrogenase; Sdh-flavo, Sdh-Fe-S, Sdh-membrane-1, Sdh-membrane-2; genes sdhA, sdhB, sdhC, sdhD; TCA Cycle] • [Link labels (slots): in-pathway, reaction, catalyzes, component-of, product]

  11. Slots • Number of values • Single valued • Multivalued: sets, bags • Slot values • Any LISP object: integer, real, string, symbol (frame name) • Slotunits define properties of slots: datatypes, classes, constraints • Two slots are inverses if they encode opposite relationships • Example: slot Product in class Genes and slot Gene in class Polypeptides
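
A minimal Python sketch of inverse slots, assuming a toy dictionary-based knowledge base rather than the real Ocelot storage. The `add_value` helper and the `INVERSES` table are hypothetical; only the Product/Gene pair comes from the slide, and the sdhA to Sdh-flavo pairing is used purely as an illustration.

```python
# Toy in-memory model, not the Pathway Tools API.
INVERSES = {"Product": "Gene", "Gene": "Product"}  # the pair named on the slide

kb = {}  # frame id -> {slot name -> list of values}


def add_value(frame, slot, value):
    """Add value to frame.slot and mirror the link on the inverse slot, if any."""
    values = kb.setdefault(frame, {}).setdefault(slot, [])
    if value not in values:
        values.append(value)
    inverse = INVERSES.get(slot)
    if inverse is not None:
        back = kb.setdefault(value, {}).setdefault(inverse, [])
        if frame not in back:
            back.append(frame)


# Asserting the link in one direction makes it visible from the other side.
add_value("sdhA", "Product", "Sdh-flavo")
print(kb["Sdh-flavo"]["Gene"])
```

Keeping both directions in sync means a query can start from either the gene or the polypeptide without scanning the whole knowledge base.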

  12. Representation of Function • [Diagram nodes: reaction Succinate + FAD = fumarate + FADH2; Enzymatic-reaction; Succinate dehydrogenase; Sdh-flavo, Sdh-Fe-S, Sdh-membrane-1, Sdh-membrane-2; genes sdhA, sdhB, sdhC, sdhD; TCA Cycle] • [Attribute labels: EC#, Keq, Cofactors, Inhibitors, Molecular wt, pI, Left-end-position]

  13. Monofunctional Monomer • [Diagram nodes: Pathway, Reaction, Enzymatic-reaction, Monomer, Gene]

  14. Bifunctional Monomer • [Diagram nodes: Pathway, two Reactions, two Enzymatic-reactions, Monomer, Gene]

  15. Monofunctional Multimer • [Diagram nodes: Pathway, Reaction, Enzymatic-reaction, Multimer, four Monomers, four Genes]
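
The three diagrams above can be read as a chain of slot links from a gene to its pathway. The Python sketch below walks that chain over a toy dictionary, using the slot names shown on the Slot Links slide (product, component-of, catalyzes, reaction, in-pathway). All frame names and the helpers `follow` and `pathways_of_gene` are hypothetical, and the sketch handles only one level of complex membership.

```python
# Toy frames wired like the "monofunctional multimer" case (two genes shown).
kb = {
    "gene-1": {"product": ["monomer-1"]},
    "gene-2": {"product": ["monomer-2"]},
    "monomer-1": {"component-of": ["multimer"]},
    "monomer-2": {"component-of": ["multimer"]},
    "multimer": {"catalyzes": ["enzrxn-1"]},
    "enzrxn-1": {"reaction": ["rxn-1"]},
    "rxn-1": {"in-pathway": ["pathway-1"]},
}


def follow(frames, slot):
    """Return the de-duplicated values of `slot` across `frames`, in order."""
    seen = []
    for frame in frames:
        for value in kb.get(frame, {}).get(slot, []):
            if value not in seen:
                seen.append(value)
    return seen


def pathways_of_gene(gene):
    """Walk gene -> product -> (complex) -> enzymatic-reaction -> reaction -> pathway."""
    products = follow([gene], "product")
    # A monomer may catalyze directly or through a complex it is a component of.
    enzymes = products + follow(products, "component-of")
    enzrxns = follow(enzymes, "catalyzes")
    reactions = follow(enzrxns, "reaction")
    return follow(reactions, "in-pathway")


print(pathways_of_gene("gene-1"))
```

The same walk covers the monomer cases: a monofunctional monomer simply has a `catalyzes` value of its own, and a bifunctional monomer has two.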

  16. Pathway and Substrates • [Diagram nodes: Pathway, four Reactions, Reactant-1, Reactant-2, Product-1, Product-2] • [Link labels (slots): in-pathway, left, right]

  17. Transcriptional Regulation • [Diagram nodes: trp, apoTrpR, TrpR*trp, RpoSig70, Int001, Int003, Int005, site001, pro001, trpLEDCBA, trpL, trpE, trpD, trpC, trpB, trpA]

  18. Principal Classes • Class names are capitalized, plural, separated by dashes • Genetic-Elements, with subclasses: • Chromosomes • Plasmids • Genes • Transcription-Units • RNAs • rRNAs, snRNAs, tRNAs, Charged-tRNAs • Proteins, with subclasses: • Polypeptides • Protein-Complexes

  19. Principal Classes • Reactions, with subclasses: • Transport-Reactions • Enzymatic-Reactions • Pathways • Compounds-And-Elements

  20. Slots in Multiple Classes • Common-Name • Synonyms • Comment • Citations • DB-Links

  21. Genes Slots • Component-Of (links to replicon, transcription unit) • Left-End-Position • Right-End-Position • Centisome-Position • Transcription-Direction • Product

  22. Proteins Slots • Molecular-Weight-Seq • Molecular-Weight-Exp • pI • Locations • Modified-Form • Unmodified-Form • Component-Of

  23. Polypeptides Slots • Gene

  24. Protein-Complexes Slots • Components

  25. Reactions Slots • EC-Number • Left, Right • DeltaG0 • Keq • Spontaneous?

  26. Enzymatic-Reactions Slots • Enzyme • Reaction • Activators • Inhibitors • Physiologically-Relevant • Cofactors • Prosthetic-Groups • Alternative-Substrates • Alternative-Cofactors

  27. Pathways Slots • Reaction-List • Predecessors • Primaries

  28. GKB Editor • Browse class hierarchy and slot definitions • Tools -> Ontology Browser • GKB Editor described at • http://www.ai.sri.com/~gkb/user-man.html

  29. Pathway Tools Data Access Mechanisms

  30. Introduction • MANY ways to access and update PGDBs • APIs in Java, Perl, and Lisp • Import/export of files in many formats • Registry of Pathway/Genome Databases • Import PGDB data into BioWarehouse • Updating a PGDB from an external genome DB

  31. Pathway Tools APIs • Support programmatic queries and updates to PGDBs • APIs in Java, Perl, and Lisp all provide access to a common set of procedures: • Generic Frame Protocol -- Ocelot object database API • Additional Pathway Tools functions • For more information see • http://bioinformatics.ai.sri.com/ptools/ptools-resources.html

  32. Generic Frame Protocol (GFP) • A library of procedures for accessing Ocelot DBs • GFP specification: • http://www.ai.sri.com/~gfp/spec/paper/paper.html • A small number of GFP functions are sufficient for most complex queries • Knowledge of Pathway Tools schema is critical for using the APIs: • Appendix I of Pathway Tools User’s Guide, Vol I

  33. Generic Frame Protocol • get-class-all-instances (Class) • Returns the instances of Class • Key Pathway Tools classes: • Genetic-Elements • Genes • Proteins • Polypeptides (a subclass of Proteins) • Protein-Complexes (a subclass of Proteins) • Pathways • Reactions • Compounds-And-Elements • Enzymatic-Reactions • Transcription-Units • Promoters • DNA-Binding-Sites
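
A rough Python sketch of what get-class-all-instances does, assuming that "all" means instances of subclasses are included along with direct instances. The tables below are a toy stand-in for a PGDB; only the class names come from the slides, and the instance assignments are illustrative.

```python
# Toy class hierarchy and instance tables, not the Pathway Tools API.
SUBCLASSES = {
    "Proteins": ["Polypeptides", "Protein-Complexes"],
}
DIRECT_INSTANCES = {
    "Polypeptides": ["Sdh-flavo", "Sdh-Fe-S"],
    "Protein-Complexes": ["Succinate dehydrogenase"],
}


def get_class_all_instances(cls):
    """Return direct instances of cls plus instances of all its subclasses."""
    instances = list(DIRECT_INSTANCES.get(cls, []))
    for sub in SUBCLASSES.get(cls, []):
        instances.extend(get_class_all_instances(sub))
    return instances


print(get_class_all_instances("Proteins"))
```

Querying the superclass Proteins returns both polypeptides and complexes, which is why a query usually starts from the most general class that fits.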

  34. Generic Frame Protocol • Notation Frame.Slot means a specified slot of a specified frame • get-slot-value(Frame Slot) • Returns first value of Frame.Slot • get-slot-values(Frame Slot) • Returns all values of Frame.Slot as a list • slot-has-value-p(Frame Slot) • Returns T if Frame.Slot has at least one value • member-slot-value-p(Frame Slot Value) • Returns T if Value is one of the values of Frame.Slot • print-frame(Frame) • Prints the contents of Frame • Note: Frame and Slot must be symbols!
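
The read operations above can be mimicked in a few lines of Python. This is a toy model over a dictionary, not the Lisp, Java, or Perl API: frame and slot names are plain strings here rather than symbols, the synonym values are placeholders, and returning `None` for an empty slot is an assumption.

```python
# Toy knowledge base: frame id -> {slot name -> list of values}.
kb = {
    "trpA": {
        "Common-Name": ["trpA"],
        "Synonyms": ["placeholder-synonym-1", "placeholder-synonym-2"],
    },
}


def get_slot_values(frame, slot):
    """Return all values of frame.slot as a list (empty if none)."""
    return list(kb.get(frame, {}).get(slot, []))


def get_slot_value(frame, slot):
    """Return the first value of frame.slot, or None if the slot is empty."""
    values = get_slot_values(frame, slot)
    return values[0] if values else None


def slot_has_value_p(frame, slot):
    """Return True if frame.slot has at least one value."""
    return bool(get_slot_values(frame, slot))


def member_slot_value_p(frame, slot, value):
    """Return True if value is one of the values of frame.slot."""
    return value in get_slot_values(frame, slot)


print(get_slot_value("trpA", "Common-Name"))
print(get_slot_values("trpA", "Synonyms"))
print(slot_has_value_p("trpA", "Citations"))
```

The single-value and multi-value accessors differ only in what they return, so using `get_slot_value` on a multivalued slot silently drops everything after the first value.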

  35. Generic Frame Protocol • coercible-to-frame-p (Thing) • Returns T if Thing is the name of a frame, or a frame object • save-kb • Saves the current KB

  36. Generic Frame Protocol: Update Operations • put-slot-value(Frame Slot Value) • Replace the current value(s) of Frame.Slot with Value • put-slot-values(Frame Slot Value-List) • Replace the current value(s) of Frame.Slot with Value-List, which must be a list of values • add-slot-value(Frame Slot Value) • Add Value to the current value(s) of Frame.Slot, if any • remove-slot-value(Frame Slot Value) • Remove Value from the current value(s) of Frame.Slot • replace-slot-value(Frame Slot Old-Value New-Value) • In Frame.Slot, replace Old-Value with New-Value • remove-local-slot-values(Frame Slot) • Remove all of the values of Frame.Slot
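
To make the differences between the update operations concrete, here is a toy Python model of them over a dictionary. It is a sketch of the semantics as the slide words them, not the real API; treating a slot as a duplicate-free list in `add_slot_value` is an assumption, and the values "a" through "e" are placeholders.

```python
# Toy knowledge base: frame id -> {slot name -> list of values}.
kb = {}


def _values(frame, slot):
    """Return the mutable value list of frame.slot, creating it if needed."""
    return kb.setdefault(frame, {}).setdefault(slot, [])


def put_slot_values(frame, slot, value_list):
    """Replace the current values of frame.slot with value_list."""
    kb.setdefault(frame, {})[slot] = list(value_list)


def put_slot_value(frame, slot, value):
    """Replace the current values of frame.slot with the single value."""
    put_slot_values(frame, slot, [value])


def add_slot_value(frame, slot, value):
    """Add value to frame.slot (assumption: duplicates are not stored)."""
    values = _values(frame, slot)
    if value not in values:
        values.append(value)


def remove_slot_value(frame, slot, value):
    """Remove value from frame.slot if it is present."""
    values = _values(frame, slot)
    if value in values:
        values.remove(value)


def replace_slot_value(frame, slot, old_value, new_value):
    """In frame.slot, replace old_value with new_value, keeping its position."""
    values = _values(frame, slot)
    kb[frame][slot] = [new_value if v == old_value else v for v in values]


def remove_local_slot_values(frame, slot):
    """Remove all values of frame.slot."""
    kb.setdefault(frame, {})[slot] = []


put_slot_values("trpA", "Synonyms", ["a", "b"])
add_slot_value("trpA", "Synonyms", "c")
remove_slot_value("trpA", "Synonyms", "a")
replace_slot_value("trpA", "Synonyms", "b", "d")
print(kb["trpA"]["Synonyms"])
```

In this model none of the changes persist; in Pathway Tools the corresponding step is save-kb, listed on the previous slide.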

  37. Additional Pathway Tools Functions: Semantic Inference Layer • Semantic inference layer defines built-in functions to compute commonly required relationships in a PGDB • http://bioinformatics.ai.sri.com/ptools/ptools-fns.html

  38. Internal note • Note: Refer to the local copy of ptools-fns.html to go through the semantic inference layer functions

  39. File Import/Export Capabilities • PGDBs can be exported in whole or part to: • SBML – Systems Biology Markup Language – sbml.org • Import supported by many simulation packages • File -> Export -> Selected Reactions to SBML File • Pathway Tools Attribute-Value format and column-delimited format files • http://brg.ai.sri.com/ptools/flatfile-format.shtml • Dump entire PGDB to a suite of files: File -> Export -> Entire DB to Flat Files • Dump selected frames to a single file: File -> Export -> Selected Frames to File

  40. Import/Export • Import from attribute-value or column-delimited files • File -> Import -> Frames From File • Import/Export to/from internal Pathway Tools format that allows pathways, reactions, enzymes, and compounds to be easily moved between Pathway Tools installations • Edit -> Add Pathway to File Export List • File -> Export -> Selected Pathways to File • File -> Import -> Pathways from File • Import/Export to/from MDL molfile format • Edit -> Import compound structure from molfile • Edit -> Export compound structure to molfile

  41. Miscellaneous Exports • Overview -> Highlight -> Save to File • Overview -> Highlight -> Load from File • Gene / Protein Sequence / Save to file • Chromosome -> Show Sequence of a Segment of Replicon

  42. Napster Comes to Bioinformatics • Public sharing of Pathway/Genome Databases • PGDB registry maintained by SRI at URL http://biocyc.org/registry.html • Registry operations • List contents of registry • Download PGDBs listed in the registry • Register PGDBs you have created

  43. Registry Details • Why register your PGDB? • Declare existence of your PGDB in a central location • Facilitate download by other scientists • Why download a PGDB? • Desktop Navigator provides more functionality than Web • Comparative operations • Programmatic querying and processing of PGDB • Registration process • Registered PGDBs have open availability by default • Authors can provide their own license agreements • Registered PGDBs reside on authors’ FTP site

  44. BioWarehouse • Biospice.org

  45. New Import/Export Tools • Suggestions? • Volunteers?

  46. Updating a PGDB From an External Genome DB • Example: AraCyc forms a pathway module to the TAIR DB • TAIR is the authoritative source for gene and gene-product information • Update AraCyc to reflect updates in TAIR

  47. Proposed Approach • Export TAIR to PathoLogic files • Build AraCyc2 from those PathoLogic files (automated PathoLogic only) • Compare AraCyc1 (A1) to AraCyc2 (A2): • A. Import new genes/proteins from A2 into A1 • B. Delete from A1 genes/proteins not found in A2 • C. Rename genes/proteins in A1 whose names changed in A2 • Run name matcher on the updated A1 (A1’) • Check for pathways with no enzymes and report them so the user can keep any that PathoLogic would otherwise delete • What about enzymes that were assigned to a pathway by the hole filler? • Re-run pathway predictor • Remember which pathways the user deletes so they are not re-predicted by PathoLogic • Consider movement of genes from contig to chromosome
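
The comparison step (A, B, C) amounts to a set difference plus a name check. The Python sketch below shows that logic under a strong assumption not stated on the slide: that each gene carries a stable identifier shared by A1 and A2, so a rename can be told apart from a delete plus an import. The function name, identifiers, and gene names are all hypothetical.

```python
def compare_pgdbs(a1, a2):
    """Compare two gene tables, each a dict of stable gene ID -> current name.

    Returns (to_import, to_delete, to_rename) where to_rename maps a gene ID
    to its (old name in A1, new name in A2).
    """
    ids_1, ids_2 = set(a1), set(a2)
    to_import = sorted(ids_2 - ids_1)   # step A: present only in A2
    to_delete = sorted(ids_1 - ids_2)   # step B: present only in A1
    to_rename = {                       # step C: same ID, different name
        gene_id: (a1[gene_id], a2[gene_id])
        for gene_id in sorted(ids_1 & ids_2)
        if a1[gene_id] != a2[gene_id]
    }
    return to_import, to_delete, to_rename


# Hypothetical gene tables for the old (A1) and rebuilt (A2) databases.
a1 = {"G1": "alpha", "G2": "beta", "G3": "gamma"}
a2 = {"G1": "alpha", "G2": "beta-2", "G4": "delta"}

to_import, to_delete, to_rename = compare_pgdbs(a1, a2)
print(to_import, to_delete, to_rename)
```

Reporting the three lists before applying them would let a curator review deletions, which matters because manually curated annotations in A1 are lost once a gene is removed.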
