
EGAN Tutorial: Loading Network Data

EGAN Tutorial: Loading Network Data. October 2009. Jesse Paquette, UCSF Helen Diller Family Comprehensive Cancer Center, jesse.paquette@cc.ucsf.edu.


Presentation Transcript


  1. EGAN Tutorial: Loading Network Data
     October 2009
     Jesse Paquette, UCSF Helen Diller Family Comprehensive Cancer Center
     jesse.paquette@cc.ucsf.edu

  2. Preamble
     • This document has many slides with multi-step animations
       • Best viewed in Slide Show mode
     • The EGAN graphical user interface is evolving
       • Icons may change
       • Menus may change
       • Button/widget placement may change
       • This document probably won’t change as quickly
     • Please contact the developers if you notice major discrepancies between this document and EGAN

  3. Loading network data: An overview
     • The EGAN pre-collated network represents only a fraction of available data
     • Additional data can be loaded as
       • Gene sets/association nodes
         • Pathways, annotation terms, articles, transcription factor targets, miRNA targets, conserved domains, significant gene sets/clusters from experiments, etc.
       • Gene-gene edges
         • Protein-protein interactions, literature co-occurrence, expression correlation, sequence homology, transcription factor targets, kinase targets, etc.
     • This document will outline the steps for loading additional gene sets and gene-gene edges into EGAN

  4. Loading gene sets into EGAN

  5. Loading gene sets into EGAN: Gene set file formats
     • Two possible tab-delimited text formats
       • GMT
         • All default pre-collated gene sets in EGAN are specified via GMT files
         • Each row represents a different gene set
       • GMX
         • Transposed GMT
         • Each column represents a different gene set
     • First two columns of GMT (or first two rows of GMX) specify
       • Gene set ID (first column)
         • Can potentially be used to link out to the gene set’s web page via URL
       • Gene set name (second column)
         • Can be empty or the same as the ID
     • Subsequent columns list the genes in each set
     • Gene identifiers must be mappable to Entrez Gene IDs
       • EGAN provides a wide variety of mapping file options
         • Entrez Gene ID, HUGO Gene Symbol, assay-specific IDs, Ensembl, GenBank, UniProt, etc.
       • EGAN expects all entity IDs within a file to be of the same type
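     The GMT layout described on this slide, and its GMX transpose, can be sketched in a few lines of Python. This is only an illustration of the file layout, not code from EGAN: the function names `read_gmt` and `gmt_to_gmx_rows` and the toy gene sets are made up for the example.

```python
import csv
import io
from itertools import zip_longest


def read_gmt(handle):
    """Parse GMT rows into {set_id: (set_name, [gene identifiers])}."""
    gene_sets = {}
    # QUOTE_NONE keeps quote characters in set names as literal text.
    for row in csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE):
        if len(row) < 2:
            continue  # skip blank or malformed rows
        set_id, set_name = row[0], row[1]
        genes = [gene for gene in row[2:] if gene]  # drop empty trailing cells
        gene_sets[set_id] = (set_name, genes)
    return gene_sets


def gmt_to_gmx_rows(gene_sets):
    """Transpose gene sets into GMX rows, where each column is one gene set."""
    columns = [[set_id, name, *genes] for set_id, (name, genes) in gene_sets.items()]
    # Shorter gene sets are padded with empty cells so every row has equal width.
    return [list(row) for row in zip_longest(*columns, fillvalue="")]


# Toy input (hypothetical set IDs): the second set has an empty name.
example = "SET_A\tfirst toy set\tTP53\tBRCA1\nSET_B\t\tEGFR\n"
sets = read_gmt(io.StringIO(example))
gmx = gmt_to_gmx_rows(sets)
```

     Here `sets` maps each gene set ID to its name and gene list, and `gmx` holds the same data with one gene set per column.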

  6. Loading gene sets into EGAN: An example
     • Each row is a gene set
     • First column: gene set IDs
     • Second column: gene set names
     • Later columns: gene identifiers

  7. Loading gene sets into EGAN: An example Save as tab-delimited text
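     If the gene sets come from a script rather than a spreadsheet, the tab-delimited text can also be written directly. The following is a minimal sketch, assuming gene sets are held in a dictionary of the form {set ID: (set name, gene list)}; the helper name `write_gmt` and the toy sets are hypothetical.

```python
import io


def write_gmt(gene_sets, handle):
    """Write {set_id: (set_name, genes)} as one tab-delimited GMT row per set."""
    for set_id, (set_name, genes) in gene_sets.items():
        fields = [set_id, set_name, *genes]
        # A tab or newline inside a field would corrupt the column layout.
        if any("\t" in field or "\n" in field for field in fields):
            raise ValueError(f"field in gene set {set_id!r} contains a tab or newline")
        handle.write("\t".join(fields) + "\n")


# Toy gene sets; the second one leaves the name column empty.
my_sets = {
    "MY_CLUSTER_1": ("up in treated samples", ["TP53", "CDKN1A"]),
    "MY_CLUSTER_2": ("", ["EGFR", "ERBB2"]),
}
buffer = io.StringIO()
write_gmt(my_sets, buffer)
gmt_text = buffer.getvalue()
```

     To produce a file instead of a string, pass a handle from `open("my_sets.gmt", "w", encoding="utf-8")` in place of the `StringIO` buffer.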

  8. Loading gene sets into EGAN: An example
     • Download or construct a gene set file
       • This example will use c2.cgp.v2.5.symbols.gmt from MSigDB (download this file to follow along)
       • You’ll have to log in with your email address to download MSigDB gene sets
     • Launch EGAN H. sapiens

  9. Loading gene sets into EGAN: An example
     • Click on “7) Association Data”
     • Shown are the default pre-collated gene sets. We want to load a new one.
     • Click “Browse…”
     • Select your GMT file and click “Specify gene association set”
     • This GMT file uses Gene Symbols for gene identifiers. Select “HUGO Gene Symbol” from the drop-down menu.
     • Now specify that these gene sets are of type “MSigDB C2: chemical and genetic perturbations” by selecting that option from the drop-down menu. This MSigDB type has been pre-defined for EGAN, which is why it exists in this menu.
     • Finally, click “Add Set”
     • When you are finished loading data, click “Finish – Launch EGAN”

  10. Loading gene sets into EGAN: An example Whenever you change the network configuration by adding or removing files, you will be given the option to save the new configuration to a tab-delimited text file. If you choose to save a .config file, next time you will only need to specify that file (item 3 in the Launch EGAN Wizard).

  11. Loading gene sets into EGAN: An example When EGAN finishes loading, your new set(s) will be available for exploration

  12. Loading gene-gene edges into EGAN

  13. Loading gene-gene edges into EGAN: File formats
      • Two possible tab-delimited text formats
        • SIF (Simple Interaction Format), commonly used in Cytoscape
          • .sif extension (required in EGAN)
          • Each line represents a gene-gene relationship
          • Three columns
            • First column is the first gene
            • Middle column is ignored in EGAN
            • Third column is the second gene
        • EGAN interaction file format
          • .txt file extension
          • Three columns, like SIF
          • Middle column is a PubMed ID
      • Gene identifiers must be mappable to Entrez Gene IDs
        • EGAN provides a wide variety of mapping file options
          • Entrez Gene ID, HUGO Gene Symbol, assay-specific IDs, Ensembl, GenBank, UniProt, etc.
        • EGAN expects all entity IDs within a file to be of the same type
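      Both edge formats share the same three-column shape, so a single reader can cover them. The sketch below is an illustration only, assuming strictly tab-delimited lines with exactly three columns as described on this slide; `read_edges`, the `GENE_*` names and the PubMed ID `12345` are placeholders, not real data.

```python
import io


def read_edges(handle, middle_is_pubmed_id=False):
    """Return (first_gene, second_gene, pubmed_id_or_None) triples."""
    edges = []
    for line in handle:
        fields = line.rstrip("\r\n").split("\t")
        if len(fields) != 3 or not fields[0] or not fields[2]:
            continue  # skip blank or malformed lines
        first, middle, second = fields
        # SIF: the middle column is ignored. EGAN .txt: it holds a PubMed ID.
        pubmed_id = middle if middle_is_pubmed_id else None
        edges.append((first, second, pubmed_id))
    return edges


# Toy SIF content with a blank line in the middle (placeholder gene names).
sif_text = "GENE_A\tkinase-target\tGENE_B\n\nGENE_A\tkinase-target\tGENE_C\n"
sif_edges = read_edges(io.StringIO(sif_text))

# Toy EGAN .txt content; 12345 is a placeholder, not a real citation.
txt_text = "GENE_A\t12345\tGENE_B\n"
txt_edges = read_edges(io.StringIO(txt_text), middle_is_pubmed_id=True)
```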

  14. Loading gene-gene edges into EGAN: An example
      • Each row is a gene-gene relationship
      • First column: first gene
      • Third column: second gene

  15. Loading gene-gene edges into EGAN: An example Save as tab-delimited text
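      An edge file can also be generated from a script. The following is a minimal sketch, assuming edges are (first gene, middle value, second gene) triples; the helper `write_edges` is hypothetical, and the check that a .txt middle column is numeric is an assumption of this example (PubMed IDs are numeric), not a documented EGAN rule.

```python
import tempfile
from pathlib import Path


def write_edges(edges, path):
    """Write (first, middle, second) triples as tab-delimited lines."""
    path = Path(path)
    if path.suffix not in (".sif", ".txt"):
        raise ValueError("use .sif for SIF or .txt for the EGAN interaction format")
    with path.open("w", encoding="utf-8", newline="\n") as out:
        for first, middle, second in edges:
            # Assumption of this sketch: .txt middle columns are numeric PubMed IDs.
            if path.suffix == ".txt" and not middle.isdigit():
                raise ValueError(f"expected a PubMed ID, got {middle!r}")
            out.write(f"{first}\t{middle}\t{second}\n")
    return path


# Toy edge with placeholder gene names; the SIF middle column is ignored by EGAN.
with tempfile.TemporaryDirectory() as tmp:
    sif_path = write_edges([("GENE_A", "kinase-target", "GENE_B")], Path(tmp) / "toy_edges.sif")
    written = sif_path.read_text(encoding="utf-8")
```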

  16. Loading gene-gene edges into EGAN: An example
      • Download or construct a gene-gene edge file
        • This example will use HPN.sif, a set of kinase-target relationships available in the “.sif Gzip-ed files” link at NetworKIN (download this file to follow along)
        • You’ll have to accept the NetworKIN license in order to download data
      • Launch EGAN H. sapiens

  17. Loading gene-gene edges into EGAN: An example
      • Click on “8) Gene Relationship Edges”
      • Shown are the default pre-collated gene-gene edge files. We want to load a new one.
      • Click “Browse…”
      • Select your SIF (or EGAN .txt) file and click “Specify gene-gene edge set”
      • This SIF file uses Gene Symbols for gene identifiers. Select “HUGO Gene Symbol” from the drop-down menu.
      • Now specify that these gene-gene edges are of type “NetworKIN” by selecting that option from the drop-down menu. The NetworKIN type has been pre-defined for EGAN, which is why it exists in this menu.
      • Finally, click “Add Set”
      • When you are finished loading data, click “Finish – Launch EGAN”

  18. Loading gene-gene edges into EGAN: An example Whenever you change the network configuration by adding or removing files, you will be given the option to save the new configuration to a tab-delimited text file. If you choose to save a .config file, next time you will only need to specify that file (item 3 in the Launch EGAN Wizard).

  19. Loading gene-gene edges into EGAN: An example When EGAN finishes loading, your new gene-gene edges will be available for exploration

  20. Loading network data: Tips and hints
      • Both the MSigDB and NetworKIN types were pre-defined in EGAN
        • This may not be the case for your new data
        • You can use the “Custom Node/Custom Edge” types as a default
      • You can specify your own type definitions in a Type Definition file
        • Give your added nodes and edges distinct colors and links
        • See item 4 in the Launch EGAN Wizard
        • Use this type definition file as a template; just add the appropriate lines for your new types
      • You can specify gene set, gene-gene edge and mapping files via URL (or .jar file, but that’s tricky)
        • Just type or paste the URL into the appropriate text field instead of clicking “Browse…”
      • Potential issues to consider
        • Identifiers used in your gene set/gene-gene edge file might not be found in the mapping file
        • Genes in your mapping file might not be present in the network
        • These issues are written (rather crudely) to the Log
          • Inspect the log file if you notice unexpected behavior
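      The first issue in the list above can be checked before launching EGAN. The sketch below assumes the mapping has already been loaded into a dictionary from identifier to Entrez Gene ID; the layout of EGAN's own mapping files is not described in this tutorial, so `find_unmapped` and the small `mapping` dictionary are stand-ins for illustration.

```python
def find_unmapped(identifiers, mapping):
    """Return identifiers absent from the mapping, in first-seen order, without repeats."""
    missing = []
    seen = set()
    for identifier in identifiers:
        if identifier not in mapping and identifier not in seen:
            missing.append(identifier)
            seen.add(identifier)
    return missing


# Hypothetical in-memory mapping: HUGO Gene Symbol -> Entrez Gene ID.
mapping = {"TP53": "7157", "EGFR": "1956"}

# Identifiers as they might appear in a gene set or edge file.
identifiers = ["TP53", "EGFR", "NOT_A_GENE", "TP53", "NOT_A_GENE"]
missing = find_unmapped(identifiers, mapping)
```

      An empty `missing` list suggests every identifier in the file has a mapping entry. It does not guarantee that the mapped genes are present in the network, which is the second issue listed above.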

  21. Questions/comments?
      • Visit http://groups.google.com/group/ucsf-egan for downloads, documentation and discussion
        • Requires an account with Google Groups
