Analyzing extensive gene lists can be overwhelming, especially with limited resources. This guide focuses on prioritizing candidate genes that are promising based on related processes, known protein features, and high-throughput experiment overlaps. We delve into leveraging GEO datasets for expression queries and obtaining information through BioMart for homologs and protein domains. By utilizing GEO profiles and filtering options within BioMart, researchers can effectively identify and retrieve vital data about their genes of interest, facilitating informed follow-up studies.
Working with gene lists: Finding data using GEO & BioMart June 5, 2014
Analyzing a gene list • With hundreds of genes but a limited budget and few lab personnel, you need to narrow the gene list down to candidate genes for follow-up • Pick ones that are “interesting” • Known to be involved in other related processes but not (yet) in your process of interest • Have protein features that suggest a function in your process but have not been characterized • Have no known function or domain but show up in other, related high-throughput experiments, suggesting a key role in your process of interest
Our approach Analyzing gene lists by: • Finding overlap with other high-throughput experiments • Finding additional information using BioMart • Mouse/human homologs • Protein domain content • GO classification
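As a rough illustration of the first point, overlap between two gene lists can be computed with a set intersection. This is a minimal sketch, not the method used in class: the function names are hypothetical, it assumes both lists already use the same ID type, and apart from C04F5.7 the gene IDs are placeholders.

```python
def normalize(gene_ids):
    """Strip whitespace and upper-case IDs so formatting differences do not hide matches."""
    return {g.strip().upper() for g in gene_ids if g.strip()}


def overlap(my_genes, other_genes):
    """Return the shared IDs (sorted) and the fraction of my_genes found in other_genes."""
    mine = normalize(my_genes)
    theirs = normalize(other_genes)
    shared = mine & theirs
    fraction = (len(shared) / len(mine)) if mine else 0.0
    return sorted(shared), fraction


# Placeholder lists: only C04F5.7 comes from the slides.
my_list = ["C04F5.7", "geneA", "geneB"]
published_list = [" c04f5.7", "geneB", "geneC"]

shared, fraction = overlap(my_list, published_list)
print(shared, round(fraction, 2))
```

If the two experiments report different identifier types (for example WormBase IDs versus Affy probe IDs), the IDs would need to be converted to a common type first, otherwise the intersection will be empty for the wrong reason.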
GEO (Gene Expression Omnibus) • GEO Datasets • Curated gene expression datasets • i.e., there is a backlog of experiments that haven’t made it into the database yet • Can search for experiments and conduct differential gene expression queries on some datasets • Can download datasets & do offline analyses • GEO Profiles • Profiles of expression data for individual genes
Why search GEO? • What other experiments have been done that are similar to yours? • GEO datasets • How do my genes of interest behave in other large-scale experiments? • GEO profiles
GEO Profile search Search on a gene name (C04F5.7):
GEO Dataset search “C. elegans”: 4434
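The two searches above are run through the GEO web pages, but similar queries can be sketched against NCBI's E-utilities `esearch` service. The snippet below only builds the request URLs and does not contact NCBI. It assumes the Entrez database names `geoprofiles` and `gds` and an `[Organism]` field restriction; the helper name `esearch_url` and the `retmax` default are illustrative, and the result count will not necessarily match the number on the slide.

```python
from urllib.parse import urlencode

# Base URL of the NCBI E-utilities search endpoint.
EUTILS_ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def esearch_url(db, term, retmax=20):
    """Build an esearch URL for the given Entrez database and search term."""
    params = {"db": db, "term": term, "retmax": retmax, "retmode": "json"}
    return EUTILS_ESEARCH + "?" + urlencode(params)


# GEO Profiles search on a gene name (the gene from the slide).
profile_url = esearch_url("geoprofiles", "C04F5.7")

# GEO DataSets search restricted to an organism (assumed field tag).
dataset_url = esearch_url("gds", "Caenorhabditis elegans[Organism]")

print(profile_url)
print(dataset_url)
```

Opening either URL in a browser, or fetching it with `urllib.request.urlopen`, should return the matching record IDs, which can then be looked up on the GEO site.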
Once a dataset is identified • Download data • SOFT format: tab-delimited data • Issues: • Data are not necessarily processed into experiment/control ratios • If starting with raw data, you may not be able to replicate exactly what the authors did, or you may lack the expertise/software to generate a list of differentially expressed (DE) genes • Look for supplementary data from the publication • Usually they provide a list of all DE genes
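To make the ratio issue concrete, here is a minimal sketch of reading a SOFT-style tab-delimited table and computing log2(experiment/control) for one pair of samples. It assumes metadata lines start with `^`, `!`, or `#`, that the values are on a linear (not already log-transformed) scale, and that the table has an `IDENTIFIER` column; the sample column names and all values are made up for illustration. A single ratio like this is not a substitute for a proper differential expression analysis with replicates.

```python
import csv
import io
import math


def read_soft_table(text):
    """Drop SOFT metadata lines (^, !, #) and parse the rest as a tab-delimited table."""
    data_lines = [ln for ln in text.splitlines() if ln and ln[0] not in "^!#"]
    return list(csv.DictReader(io.StringIO("\n".join(data_lines)), delimiter="\t"))


def log2_ratios(rows, experiment_col, control_col, id_col="IDENTIFIER"):
    """Return {gene: log2(experiment / control)}, skipping missing or non-positive values."""
    ratios = {}
    for row in rows:
        try:
            exp = float(row[experiment_col])
            ctl = float(row[control_col])
        except (TypeError, ValueError):
            continue  # e.g. "null" or an empty cell
        if exp > 0 and ctl > 0:
            ratios[row[id_col]] = math.log2(exp / ctl)
    return ratios


# Made-up example table; GSM_CTRL and GSM_EXP are hypothetical sample names.
example = "\n".join([
    "^DATASET = GDS_example",
    "!dataset_table_begin",
    "ID_REF\tIDENTIFIER\tGSM_CTRL\tGSM_EXP",
    "probe1\tC04F5.7\t100\t400",
    "probe2\tgeneB\t50\tnull",
    "!dataset_table_end",
])

rows = read_soft_table(example)
print(log2_ratios(rows, "GSM_EXP", "GSM_CTRL"))
```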
Choice of dataset for comparison In class demo
BioMart – EBI Ensembl • Use a series of menus • Data source – organism (genes, variation, etc.) • Filters – reduce the number of results • Attributes – what data to return • Can set up very precise and multilayered queries • Can query across multiple organisms • Simple query: • Given a list of gene IDs, you can obtain attributes or sequences for the entire list • Tools • ID converter – very useful, easy to use
Two sites for BioMart access www.biomart.org
BioMart • Filters • C. elegans genes with a human homolog • Specify only genes with >= # isoforms • Protein-coding genes with a transmembrane domain • Attributes • Entrez Gene IDs, WormBase IDs, Affy IDs • Sequence data • Transcript, protein, UTRs, flanking regions, etc.
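The same dataset/filter/attribute structure can be written down as a BioMart XML query instead of being clicked through the menus. The sketch below only builds the XML text and does not send it anywhere. The helper `build_biomart_query` is hypothetical, and the dataset, filter, and attribute names shown (`celegans_gene_ensembl`, `biotype`, `with_hsapiens_homolog`, `ensembl_gene_id`, `external_gene_name`, `entrezgene_id`) are assumptions that should be checked against the names offered by the BioMart site in use, since they change between releases.

```python
import xml.etree.ElementTree as ET


def build_biomart_query(dataset, filters, attributes):
    """Build a BioMart XML query string from one dataset, a filter dict, and an attribute list."""
    query = ET.Element("Query", {
        "virtualSchemaName": "default",
        "formatter": "TSV",
        "header": "1",
        "uniqueRows": "1",
        "datasetConfigVersion": "0.6",
    })
    ds = ET.SubElement(query, "Dataset", {"name": dataset, "interface": "default"})
    for name, value in filters.items():
        if isinstance(value, bool):
            # Boolean filters ("only" / "excluded") use the excluded attribute.
            ET.SubElement(ds, "Filter", {"name": name, "excluded": "0" if value else "1"})
        else:
            ET.SubElement(ds, "Filter", {"name": name, "value": str(value)})
    for name in attributes:
        ET.SubElement(ds, "Attribute", {"name": name})
    header = '<?xml version="1.0" encoding="UTF-8"?><!DOCTYPE Query>'
    return header + ET.tostring(query, encoding="unicode")


# Assumed names: protein-coding C. elegans genes that have a human homolog.
xml_query = build_biomart_query(
    dataset="celegans_gene_ensembl",
    filters={"biotype": "protein_coding", "with_hsapiens_homolog": True},
    attributes=["ensembl_gene_id", "external_gene_name", "entrezgene_id"],
)
print(xml_query)
```

A query string like this is typically submitted as the `query` parameter of a BioMart `martservice` URL. The web interface can also display the XML for a query built through the menus, which is a convenient way to confirm the exact filter and attribute names.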
BioMart • In class demo
Today’s exercise • Compare current dataset from PLoS Pathogens paper to data from a different dataset • Identify & retrieve additional information about C. elegans genes using BioMart