ODD-Genes is a demonstrator showing how Grid technologies enable e-Science and accelerate scientific discovery. It automates routine statistical conditioning of highly variable microarray results, supporting data analysis for research projects such as the study of Wilms Tumour. Using SunDCG’s TOG software for job submission on remote compute resources and OGSA-DAI for data access and discovery, ODD-Genes lets researchers explore multiple views of their data simultaneously and uncover novel targets for investigation and potential therapy. The collaboration includes NeSC/EPCC, GTI and the MRC Human Genetics Unit, all in Edinburgh.
ODD-Genes: Accelerating data-driven scientific discovery
NeSC Review 2003
NeSC 2003-09-30
Introduction
• ODD-Genes Background
• Science enabled by ODD-Genes
• Automating routine statistical conditioning of highly variable microarray results
• Discovering related data sources
• Querying discovered data sources for relevant data
• Identifying significant targets for focussed investigation
• Caveats & further work
ODD-Genes Background
• ODD-Genes is a demonstrator
• Demonstrates how Grid technologies enable e-Science, accelerating scientific discovery
• SunDCG’s TOG software allows for job submission on remote compute resources
• OGSA-DAI provides access, control and discovery of data resources
• ODD-Genes used to investigate Wilms Tumour
• Routine statistical conditioning of microarray results
• Data-driven discovery of novel targets for investigation and potential therapy
• Collaborative project
• NeSC/EPCC, Edinburgh, UK
• Scottish Centre for Genomic Technology and Informatics, Edinburgh, UK (GTI)
• Human Genetics Unit at MRC, Western General Hospital, Edinburgh, UK (HGU)
SunDCG - Enabling Routine Statistical Conditioning
• Choose analysis to perform
• Automates analysis process
• Provides predetermined workflow
• Can run more than one analysis at a time
• Multiple reproducible avenues for investigation
• Reduces cost (human, machine), increases availability
• TOG enables this by allowing access to HPC resources
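The idea of predetermined workflows run side by side can be sketched roughly as follows. This is only an illustration under stated assumptions: the slides do not name the actual conditioning steps, so `log2_transform`, `median_centre` and `z_score` are hypothetical stand-ins, the intensities are made-up values, and a local thread pool stands in for job submission to remote HPC resources through TOG (whose interface is not shown here).

```python
import math
from concurrent.futures import ThreadPoolExecutor
from statistics import mean, median, pstdev


# Hypothetical conditioning steps; the slides do not name the real analyses.
def log2_transform(values):
    """Log-transform raw intensities to compress their dynamic range."""
    return [math.log2(v) for v in values]


def median_centre(values):
    """Subtract the median so that different arrays become comparable."""
    m = median(values)
    return [v - m for v in values]


def z_score(values):
    """Scale to zero mean and unit (population) standard deviation."""
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma for v in values]


# Predetermined workflows: each name maps to a fixed, ordered list of steps,
# so the same choice always reproduces the same conditioning.
WORKFLOWS = {
    "log2+median": [log2_transform, median_centre],
    "log2+zscore": [log2_transform, z_score],
}


def run_workflow(name, raw):
    """Apply every step of one named workflow in order."""
    data = raw
    for step in WORKFLOWS[name]:
        data = step(data)
    return name, data


def run_all(raw, names):
    """Run the chosen analyses concurrently; return one view per analysis."""
    with ThreadPoolExecutor() as pool:
        futures = [pool.submit(run_workflow, n, raw) for n in names]
        return dict(f.result() for f in futures)


raw_intensities = [8.0, 16.0, 32.0, 64.0, 128.0]  # made-up values
views = run_all(raw_intensities, ["log2+median", "log2+zscore"])
```

Because each workflow is a fixed list of steps, rerunning `run_all` on the same input gives the same views, which is the reproducibility property the slide emphasises.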
SunDCG - Conditioning Results
• Results of conditioning can be analysed and investigated
• Researcher potentially has several views of the data to explore, all presented simultaneously (cf. the traditional serialised, manual process)
• Researcher can reproduce this initial conditioning for repeated analyses
• Researcher need not perform each step manually and serially, or ask a dedicated statistician to do so
OGSA-DAI - Results Investigation
• Multiple views of data
• Raw
• Heat Map
• Cluster Map
• Wilms Tumour study takes a new direction
• Two genes appear significant in early development
• Researchers would like more information on these genes…
OGSA-DAI - Data Resource Discovery
• OGSA-DAI uses keywords to locate relevant data resources
• May return data resources previously unknown to the researcher
• Researcher selects the most interesting data resource to query for information about a gene
• Researcher selects the Mouse atlas: a narrow, deep database of spatial gene expression in mouse embryonic development
• Contrast with the GTI database: broad, shallow, genome-wide gene expression across multiple organisms, stages & conditions
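Keyword-based discovery of this kind can be sketched as a lookup against a registry of self-described resources. The sketch below is an assumption-laden illustration, not the OGSA-DAI interface: the `REGISTRY` contents, the resource names and the `discover` function are all hypothetical, and ranking by keyword overlap is just one plausible choice.

```python
# Hypothetical registry: names and keywords are illustrative only and do
# not describe how any real data resource is registered.
REGISTRY = [
    {"name": "mouse-atlas",
     "keywords": {"mouse", "embryo", "spatial", "gene-expression"}},
    {"name": "gti-expression",
     "keywords": {"genome-wide", "gene-expression", "multi-organism"}},
    {"name": "unrelated-archive",
     "keywords": {"astronomy", "images"}},
]


def discover(query_keywords, registry=REGISTRY):
    """Return names of resources sharing a keyword, best match first."""
    query = {k.lower() for k in query_keywords}
    scored = []
    for resource in registry:
        overlap = len(query & resource["keywords"])
        if overlap:  # skip resources with no keyword in common
            scored.append((overlap, resource["name"]))
    # Descending overlap, then name, for a stable and reproducible order.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [name for _, name in scored]


matches = discover(["gene-expression", "embryo"])
```

The researcher does not need to know in advance which resources exist; any resource registered with a matching keyword appears in the result, which is how previously unknown sources can surface.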
OGSA-DAI - Data Resource Query
• OGSA-DAI returns data from query
• Data and annotation displayed
• Data contains references to related images
• Researcher rapidly moves from numeric and textual description to spatial representation of relevant gene expression
• These show that the genes are stem cell markers
• Targets for focussed investigation, potential therapy
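The step from a query result to its related images can be illustrated with a small relational example. Everything here is assumed for illustration: the schema, the placeholder gene names `geneA` and `geneB`, the values and the image paths are made up, and an in-memory SQLite database stands in for a data resource that would really be reached through OGSA-DAI.

```python
import sqlite3

# In-memory stand-in for a discovered data resource (hypothetical schema).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE image (image_id INTEGER PRIMARY KEY, path TEXT);
CREATE TABLE expression (
    gene TEXT, stage TEXT, tissue TEXT, level REAL, image_id INTEGER
);
INSERT INTO image VALUES (1, 'images/placeholder_001.png');
INSERT INTO image VALUES (2, 'images/placeholder_002.png');
INSERT INTO expression VALUES ('geneA', 'stage-1', 'tissue-x', 0.9, 1);
INSERT INTO expression VALUES ('geneB', 'stage-1', 'tissue-y', 0.7, 2);
""")


def query_gene(conn, gene):
    """Return expression records for one gene, each with its image path."""
    rows = conn.execute(
        "SELECT e.stage, e.tissue, e.level, i.path "
        "FROM expression AS e JOIN image AS i ON i.image_id = e.image_id "
        "WHERE e.gene = ? ORDER BY e.stage",
        (gene,),  # parameterised to avoid building SQL from user input
    ).fetchall()
    return [
        {"stage": s, "tissue": t, "level": lvl, "image": p}
        for s, t, lvl, p in rows
    ]


records = query_gene(conn, "geneA")
```

Each record carries both the numeric and textual annotation and a reference to an image, so a front-end could display the spatial view alongside the data.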
ODD-Genes Caveats & Further Work
• ODD-Genes is a demonstrator
• Need to develop production applications for both routine statistical processing and data resource discovery and query
• Need to parameterise routine conditioning appropriately to complete automation
• ODD-Genes requires Grid infrastructure
• Participating researchers need to partner with centres that host application front-ends (or host the infrastructure themselves)
• However, alternatives are often proprietary, expensive and less flexible
• ODD-Genes requires registration by data hosts
• Needs a critical mass of registered data sources