EGAN tutorial: Loading experiment results
October 2009
Jesse Paquette
UCSF Helen Diller Family Comprehensive Cancer Center
[email protected]
Preamble: This document has many slides with multi-step animations. It is best viewed in Slide Show mode.
EGAN is designed to help you interpret the results of exploratory assays
EGAN does not actually do the multivariate statistical analysis for your experiment
It picks up where many useful analysis programs stop: at the gene list
If the entities measured in an assay can be mapped to genes, the results can be loaded in EGAN
MS/MS peptide identifications
Genome-wide SNP/CNV assays
DNA methylation assays
EGAN works best when you load results for all entities measured in the assay
i.e., don’t apply a p-value cutoff on the results before loading into EGAN
A gene that misses the cutoff at p < 0.001 may still be a significant hit
Especially if it is related to other top-hit genes
EGAN will allow you to adjust the statistic/p-value cutoff dynamically
Then you can directly observe how networks/enrichment scores change with different cutoff values
Of course you can still load post-cutoff experiment results
If that’s all you have…
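The effect of moving the cutoff can be illustrated outside EGAN with a minimal Python sketch. It is not EGAN code: the entity IDs, statistics, p-values, and the function name hits_at_cutoff are all hypothetical, chosen only to show why the full result list is more useful than a pre-filtered one.

```python
# Hypothetical results: (entity ID, statistic, p-value).
# All values are invented for illustration only.
results = [
    ("gene_a", 2.1, 0.0002),
    ("gene_b", -1.7, 0.0008),
    ("gene_c", 1.2, 0.004),
    ("gene_d", -0.9, 0.03),
    ("gene_e", 0.1, 0.6),
]

def hits_at_cutoff(rows, p_cutoff):
    """Return the IDs of entities whose p-value is below p_cutoff."""
    return [entity_id for entity_id, _stat, p in rows if p < p_cutoff]

# The hit list grows as the cutoff is relaxed. This is only possible
# because the unfiltered results were kept.
for cutoff in (0.001, 0.01, 0.05):
    hits = hits_at_cutoff(results, cutoff)
    print(f"p < {cutoff}: {len(hits)} hits -> {hits}")
```

If only the rows passing p < 0.001 had been saved, gene_c and gene_d could never reappear at a looser cutoff.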
Loading experiment results into EGAN
The input file is easy to create in Excel from existing result files
Header line required
Header of statistic (second) column will become the experiment name in EGAN
1) Entity ID
e.g., probe set ID, UniProt ID, refSNP ID, etc.
You can use any IDs that can be mapped to Entrez Gene IDs
EGAN provides a wide variety of mapping file options
HUGO Gene Symbol, Affymetrix/Agilent/Illumina IDs, GenBank, Ensembl, UniProt, etc.
EGAN expects that all entity IDs are the same type
2) Statistic (fold-change, regression coefficient, log-odds ratio, etc.)
EGAN visualization schemes work best when the statistic column is centered around 0
Ratio and fold-change data can be centered on 0 by taking the logarithm
3) P-value (unadjusted, adjusted or q-value)
Header line: the statistic (second) column header should be descriptive
Each row represents the analysis result for one entity in the experiment
Three columns: ID, statistic, p-value
Save as tab-delimited text
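For readers who would rather script the file than build it in Excel, here is a minimal Python sketch of the three-column layout described above. It is an illustration under stated assumptions, not an EGAN utility: the probe IDs, values, experiment name, output file name, and the header labels "ID" and "p-value" are all hypothetical. The only requirements taken from this tutorial are the header line, the column order, and the tab delimiter.

```python
import math
import os
import tempfile

# Hypothetical fold-change results: (probe set ID, fold change, p-value).
# IDs and values are invented for illustration only.
raw_results = [
    ("probe_1", 4.0, 0.0005),
    ("probe_2", 0.5, 0.02),
    ("probe_3", 1.0, 0.9),
]

def to_tab_delimited(rows, experiment_name):
    """Build the three-column text: ID, log2 statistic, p-value."""
    # The header of the second column carries the experiment name.
    # "ID" and "p-value" are placeholder header names (an assumption).
    lines = [f"ID\t{experiment_name}\tp-value"]
    for entity_id, fold_change, p_value in rows:
        # log2 maps a fold change of 1 (no change) to 0.
        lines.append(f"{entity_id}\t{math.log2(fold_change)}\t{p_value}")
    return "\n".join(lines) + "\n"

text = to_tab_delimited(raw_results, "tumor_vs_normal_log2FC")

# Save as tab-delimited text (written to the temp directory here).
out_path = os.path.join(tempfile.gettempdir(), "example_results.txt")
with open(out_path, "w") as handle:
    handle.write(text)

print(text)
```

Base 2 is one common choice of logarithm for fold-change data. Any base centers a ratio of 1 on 0.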
This experiment result file uses Affymetrix HG-U133A probe set identifiers.
Select “Affymetrix HG-U133A” from the drop-down menu.
Enrichment calculations in EGAN depend on how we define the background population of genes. In this case we want a gene to be in the background only if it is present in all experiment results.
Select “intersection” from the drop-down list.
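The idea behind the "intersection" background can be sketched in a few lines of Python. This is a conceptual illustration only, not how EGAN computes it internally: the gene IDs, variable names, and the function intersection_background are hypothetical.

```python
# Hypothetical sets of genes measured in each experiment.
# The IDs are invented placeholders, not real Entrez Gene IDs.
expression_genes = {"g1", "g2", "g3", "g4"}
acgh_genes = {"g2", "g3", "g5"}

def intersection_background(gene_sets):
    """Keep only the genes that are present in every experiment."""
    sets = list(gene_sets)
    if not sets:
        return set()
    return set.intersection(*sets)

background = intersection_background([expression_genes, acgh_genes])
print(sorted(background))
```

Genes measured in only one of the two experiments (g1, g4 and g5 here) are left out of the background.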
For simplicity’s sake, we’re not going to cover items 3-5 right now.
Click “Add Experiment”.
Select your experiment and click “Specify empirical data set”.
Select the aCGH experiment and click “Specify empirical data set”.
Now both experiment results are ready to be loaded.
There’s one more thing to consider before launching EGAN: click on “5) Gene Nodes”.
Select the mapping file and click “Specify mapping file”.
For the aCGH clones we have a custom mapping file.
We want to load a new experiment.
The expression results are ready to be loaded. Let’s load the aCGH results.
Click “New Data Set”.
Click “Add Experiment”.
Click on “6) Experiments”.
Finally, click “Finish – Launch EGAN”.
Whenever you change the network configuration by adding or removing files, you will be given the option to save the new configuration to a tab-delimited text file.
If you choose to save a .config file, next time you will only need to specify that file (item 3 in the Launch EGAN Wizard).
Your experiments are now accessible in EGAN: as columns in the Entrez GeneNode Table and as rows in the Experiments Table.