This guide covers the essential tools and methods for editing and curating pathway/genome databases, focusing on the importance of accurate data curation and the effective use of Instant Patch tools. It outlines the functionalities of various editors, including Compound, Reaction, and Pathway Editors, along with their operations. Key practices for saving changes, managing database connections, and creating links to external databases are discussed. The guide emphasizes maintaining data integrity and offers practical exercises on the Hb. pylori database.
Update your computers! • To install a patch: Tools => Instant Patch => Download and Activate All Patches
Editing Pathway/Genome Databases Ron Caspi Part I: Compounds, Reactions and Pathways
Why Curation is Important! • Database curation greatly enhances the usefulness of the data • “in silico” information is less solid than experimental evidence
Pathway Tools Paradigms • Separate database from user interface • Navigator provides one interface to the DB • Editors provide an alternative interface to the DB • Reuse information whenever possible! • A PGDB should not describe the same biological or chemical entity more than once • Compounds are the building blocks of reactions • Reactions are the building blocks of pathways
List of Editors • Compound Editor • Compound Structure Editor • Reaction Editor • Pathway Editor • Synonym Editor • Protein Editor • Gene Editor • Intron Editor (Eukaryotes only) • Transcription Unit Editor • Publication Editor • Frame Editor • Relationships Editor • Ontology Editor
Invoking the Editors Use the “New” command Or: Right-Click on an Object Handle
Saving Changes • The user must save changes explicitly with Save DB • To discard changes made since last save • File => Revert Current DB
The File Menu: DB commands • List Unsaved Changes in Current DB • Revert Current DB • Refresh All Current DBs • Checkpoint Current DB • Revert to Checkpoint in Current DB • Delete a DB • Save Current DB • Attempt to Reconnect to Oracle
Editing rules: Support Policy • Do not alter DB schema • e.g. do not add or remove classes or slots • Do not modify the EcoCyc or MetaCyc datasets
Compound Editor • Create or edit a compound • Invoke by: New: Compound => New Existing: Right-Click compound name, select Compound Editor • Common name and synonyms • links to other DBs
More Compound Editing • Compound Structure Editor • Mol files • Exporting to other DBs • Merging
Reaction Editor • Create or edit a reaction • Invoke by: New: Reaction => New Existing: Right-Click reaction name, select Reaction Editor • Entering Reaction Equation • Compound Resolver
Pathway Editor • Graphically create and modify pathways • Two tools: • Connections Editor: to add reactions, remove reactions, alter connections • Segment Editor: to enter one or more linear pathway segments • Invoking the pathway editor: New: Pathway => New Existing: Right-Click pathway name, select Pathway Editor command
Connections Editor Operations • Two main display panes: • left: unconnected pathway reactions • right: draws connected reactions (looks like the regular Pathway Tools window) • Connecting reactions: • select initial reaction (in either pane) ===> red and green reactions • select a green reaction • Additional Commands: • Exit: keep changes, abort changes • Reaction: add reaction, add reaction(s) from history, create new reaction frame, clone a reaction frame, add connection, delete predecessor/successor link, disconnect reaction, delete reaction from pathway, choose main compounds for reaction, edit reaction frame • Pathways: enter a linear pathway segment, guess pathway predecessor list, disconnect all reactions, invoke relationships editor, add subpathway by name, add subpathway by substring, add subpathway by class, delete subpathway
Connections Editor Limitations • Ambiguity about ordering in some complicated situations: • link may be ignored • dialog box for disambiguating • pathway drawn in bizarre arrangement • Fix: • try removing the offending link and adding links in a different order • Pathway editor does not handle polymerization pathways • In circular pathways, Pathway editor does not permit specifying which compound should be at the top
Pathway Segment Editor • To enter linear sequence of reactions (arguably) faster than with the Connections Editor • Reactions are specified by EC numbers or reaction substrates • One segment may contain up to 7 reactions
Creating Links with External Databases • Creating links from a pathway/genome db to an external database • To define a new external database: • Tools => Ontology Browser • View => Browse from new root / type Databases • Highlight Databases • Frame => Create => Instance • Enter frame name, frame edit • Enter Common Name, Static-Search-URL e.g. http://gene.pharma.com/dbquery? • Creating links to a pathway/genome db: see http://biocyc.org/linking.shtml
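A minimal sketch of how a Static-Search-URL could be turned into a link for one object. It assumes the object's ID is simply appended (URL-encoded) to the URL prefix; the function name `build_external_link` and the sample ID `G1234` are hypothetical and are not part of Pathway Tools.

```python
from urllib.parse import quote


def build_external_link(static_search_url: str, object_id: str) -> str:
    """Append a URL-encoded object ID to a static search URL prefix.

    Assumption: the external database accepts the ID directly after the prefix.
    """
    return static_search_url + quote(object_id, safe="")


# Hypothetical object ID, used only for illustration.
print(build_external_link("http://gene.pharma.com/dbquery?", "G1234"))
```

Encoding the ID keeps the link valid if an identifier contains spaces or other reserved characters.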
Make sure that… You perform all exercises on the Hb. pylori database, not on your own!!!
Creating New Reactions Create the following five reactions: • ascorbate + H2O = 3-keto-L-gulonate • 3-keto-L-gulonate + ATP = 3-keto-L-gulonate-6-phosphate + ADP • 3-keto-L-gulonate-6-phosphate = L-xylulose-5-phosphate + CO2 • L-xylulose-5-phosphate = L-ribulose-5-phosphate* • L-ribulose-5-phosphate = xylulose-5-phosphate
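Before typing the equations into the Reaction Editor, it can help to see each one as a left side and a right side of compound names. The sketch below splits an equation written in the style used above; `parse_equation` is a hypothetical helper, not a Pathway Tools function, and it assumes compounds are separated by " + " and sides by " = ".

```python
def parse_equation(equation: str) -> tuple[list[str], list[str]]:
    """Split 'A + B = C + D' into (left-side compounds, right-side compounds)."""
    left, right = equation.split(" = ")
    # Split on ' + ' with surrounding spaces so hyphenated names stay intact.
    left_side = [name.strip() for name in left.split(" + ")]
    right_side = [name.strip() for name in right.split(" + ")]
    return left_side, right_side


substrates, products = parse_equation(
    "3-keto-L-gulonate + ATP = 3-keto-L-gulonate-6-phosphate + ADP"
)
print(substrates)
print(products)
```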
Define a New Pathway • Define the pathway L-ascorbate degradation to xylulose-5-phosphate by connecting the reactions together • Assign class: (Pathways -> Degradation/Utilization/Assimilation -> Carboxylates, Other) • Add a link to non-oxidative branch of the pentose phosphate pathway (Generation of precursor metabolites and energy => Pentose phosphate pathways =>) • Add a reverse link from non-oxidative branch of the pentose phosphate pathway to the new pathway
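Connecting the reactions amounts to deciding, for each reaction, which reaction comes before it. The sketch below guesses those links for the five exercise reactions by matching a product of one reaction to a substrate of another. It is a simple heuristic for illustration, not the algorithm behind the Pathway Editor's "guess pathway predecessor list" command; the reaction IDs `rxn1` to `rxn5` and the choice of side compounds are assumptions.

```python
# Hypothetical reaction IDs mapped to (substrates, products) from the exercise.
REACTIONS = {
    "rxn1": (["ascorbate", "H2O"], ["3-keto-L-gulonate"]),
    "rxn2": (["3-keto-L-gulonate", "ATP"], ["3-keto-L-gulonate-6-phosphate", "ADP"]),
    "rxn3": (["3-keto-L-gulonate-6-phosphate"], ["L-xylulose-5-phosphate", "CO2"]),
    "rxn4": (["L-xylulose-5-phosphate"], ["L-ribulose-5-phosphate"]),
    "rxn5": (["L-ribulose-5-phosphate"], ["xylulose-5-phosphate"]),
}

# Assumed side compounds that should not be used to link reactions.
SIDE_COMPOUNDS = {"H2O", "ATP", "ADP", "CO2"}


def guess_predecessors(reactions, side_compounds=SIDE_COMPOUNDS):
    """Return (predecessor, successor) pairs that share a main compound."""
    links = []
    for pred_id, (_, products) in reactions.items():
        main_products = set(products) - side_compounds
        for succ_id, (substrates, _) in reactions.items():
            if pred_id != succ_id and main_products & set(substrates):
                links.append((pred_id, succ_id))
    return links


for predecessor, successor in guess_predecessors(REACTIONS):
    print(predecessor, "->", successor)
```

Excluding side compounds keeps common cofactors such as ATP from creating spurious links between unrelated reactions.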
Pathway Curation • Class • Common Name • Synonyms • Evidence code • Citations • Comments • Links • Hypothetical reactions
Evidence Codes for Pathways • http://brg.ai.sri.com/ptools/evidence-ontology.html • EV-AS: Author statement • NAS - non-traceable • TAS - traceable • EV-COMP: Inferred from computation • AINF - Artificial inference • HINF - Human inference • EV-EXP: Inferred from experiment • IDA - inferred from direct assay • IEP - inferred from expression pattern • IGI - inferred from genetic interaction • IMP - inferred from mutant phenotype • IPI - inferred from physical interaction • EV-IC: Inferred by curator
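The two-level code list above can be held as a small lookup table, which makes it easy to reject a code that does not belong to its parent. This is a sketch only: the dictionary layout, the `full_code` helper, and the assumption that a complete code is written as parent and child joined by a hyphen (for example EV-EXP-IDA) are illustrative, so check the evidence ontology page for the authoritative forms.

```python
# Top-level evidence codes and their sub-codes, as listed on the slide.
EVIDENCE_CODES = {
    "EV-AS": {"NAS": "non-traceable", "TAS": "traceable"},
    "EV-COMP": {"AINF": "artificial inference", "HINF": "human inference"},
    "EV-EXP": {
        "IDA": "inferred from direct assay",
        "IEP": "inferred from expression pattern",
        "IGI": "inferred from genetic interaction",
        "IMP": "inferred from mutant phenotype",
        "IPI": "inferred from physical interaction",
    },
    "EV-IC": {},  # inferred by curator; no sub-codes listed
}


def full_code(parent, child=None):
    """Validate a (parent, child) pair and return the joined code.

    Assumption: a full code is the parent and child joined by '-'.
    """
    if parent not in EVIDENCE_CODES:
        raise ValueError(f"unknown evidence class: {parent}")
    if child is None:
        return parent
    if child not in EVIDENCE_CODES[parent]:
        raise ValueError(f"{child} is not a sub-code of {parent}")
    return f"{parent}-{child}"


print(full_code("EV-EXP", "IDA"))
print(full_code("EV-IC"))
```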
Super Pathways • Create more complex metabolic networks using superpathways • Example: superpathway of alanine biosynthesis composed of alanine biosynthesis I alanine biosynthesis II alanine biosynthesis III
Pathway Export • Export • Edit => Add Pathway to File Export List • File => Export => Selected Pathways to File
Constraint Checking • General rules that constrain the valid relationships among instances • Constraints are checked when new facts are asserted to ensure that the DB remains logically consistent • Constraints on slots: • Domain violation: checks that slots appear only in instances of the appropriate class • Range violation: • value type • value cardinality • Inverse • Cardinality • Lisp-predicate
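To make the domain, value type, and cardinality checks concrete, here is a toy sketch of what checking one slot assertion might look like. The `SlotSpec` structure, the schema contents, and the `check_slot` function are all hypothetical; Pathway Tools performs these checks internally, and the inverse and Lisp-predicate constraints are not modeled here.

```python
from dataclasses import dataclass


@dataclass
class SlotSpec:
    domain: str        # class whose instances may carry the slot
    value_type: type   # required type of each value
    max_values: int    # upper bound on the number of values


# Hypothetical toy schema; class and slot names are illustrative only.
SLOTS = {"COMMON-NAME": SlotSpec(domain="Compounds", value_type=str, max_values=1)}


def check_slot(instance_class, slot, values):
    """Return the names of the constraints a slot assertion would violate."""
    spec = SLOTS[slot]
    violations = []
    if instance_class != spec.domain:
        violations.append("domain")
    if any(not isinstance(value, spec.value_type) for value in values):
        violations.append("value type")
    if len(values) > spec.max_values:
        violations.append("cardinality")
    return violations


print(check_slot("Compounds", "COMMON-NAME", ["ascorbate"]))
print(check_slot("Reactions", "COMMON-NAME", ["a", "b"]))
```

Returning a list of violations, rather than stopping at the first one, lets a curator see every problem with an assertion at once.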
Consistency Checking (correctify-kb) • Removes newlines from names • Converts “<” to “|” in string citations • Checks isozyme sequence similarity • Fixes references from polypeptides to genes • Changes compound names to ids in a variety of slots • Matches physiological regulators to other regulators • Cross-references compounds to reactions • Checks pathways predecessors/reactions/subs • Checks reaction balancing • Checks compound structures • Calculates sub- and super-pathways • Finds missing sub-pathways links • Verifies chromosome components and positions
Run (correctify-kb) • Open the database Hb. pylori (HypCyc) • Run (correctify-kb) • Analyze output