Outline


  1. Outline • Standardization - necessary components • what information should be exchanged • how the information should be exchanged • common terms (ontologies) • common ways of describing data processing • how to query information • ArrayExpress • public repository for microarray data • www.ebi.ac.uk/arrayexpress

  2. What information should be exchanged? • MIAME - Minimum Information About a Microarray Experiment • informal specification • paper published in Nature Genetics • goal - to initiate discussion: • which details are important and which may not be

  3. Ultimate dream • Minimum information is the following table: samples × genes, holding gene expression levels (in mRNA counts/cell) • Pointers to a well-established sample ontology • Pointers to (a) well-established gene database(s)

  4. Currently: MIAME six parts 1. Experimental design: the set of the hybridisation experiments as a whole 2. Array design: each array used and each element (spot) on the array 3. Samples: samples used, the extract preparation and labeling 4. Hybridizations: procedures and parameters 5. Measurements: images, quantitation, specifications 6. Controls: types, values, specifications
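
The six parts above can be read as a checklist. The following is a minimal sketch, assuming a submission is held as a plain dictionary; the key names and the toy submission are hypothetical and are not part of the MIAME specification.

```python
# The six MIAME parts as listed on the slide; the key names are hypothetical.
MIAME_PARTS = (
    "experimental_design",
    "array_design",
    "samples",
    "hybridizations",
    "measurements",
    "controls",
)


def missing_miame_parts(submission: dict) -> list:
    """Return the MIAME parts that are absent or empty in a submission."""
    return [part for part in MIAME_PARTS if not submission.get(part)]


# Toy submission with invented values, for illustration only.
example_submission = {
    "experimental_design": "time course, 4 hybridisations",
    "array_design": "two-colour cDNA array",
    "samples": ["sample 1", "sample 2"],
    "hybridizations": [],   # present but empty, so reported as missing
    "measurements": None,   # present but empty, so reported as missing
}

print(missing_miame_parts(example_submission))
# → ['hybridizations', 'measurements', 'controls']
```

A check like this only confirms that each part is present; whether the details inside each part are sufficient is exactly the discussion MIAME was meant to start.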

  5. MIAMExpress submission procedure (http://www.ebi.ac.uk/miamexpress) • Create account • Login • Pending/New Experiment (E1, E2, …, En) • Samples 1…n - sample protocol • Extracts 1…n - extraction protocol • Hybridisations on Arrays 1…n - hyb protocol • Data 1…n - scanning protocol, image analysis protocol • Combined experiment data - transformation protocol • Final free text comment • Submit • MAGE-ML

  6. How should the information be exchanged? • MAGE-OM - MicroArray Gene Expression Object Model • formal specification - UML (Unified Modeling Language) model • described by a set of diagrams • standardized through the Object Management Group • describes the domain of microarray data • can serve as a source for generating various software artifacts
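
To illustrate how an object model can serve as the source for a generated artifact, here is a minimal sketch that derives an XML serialisation mechanically from a small set of classes. The classes `Sample` and `Hybridisation`, their fields, and the resulting element names are hypothetical stand-ins; they are not taken from MAGE-OM or the MAGE-ML DTD.

```python
from dataclasses import dataclass, fields, is_dataclass
import xml.etree.ElementTree as ET


# Hypothetical two-class model standing in for a much larger object model.
@dataclass
class Sample:
    identifier: str
    species: str


@dataclass
class Hybridisation:
    identifier: str
    array_design: str
    sample: Sample


def to_xml(obj) -> ET.Element:
    """Derive an XML element from a dataclass instance.

    The element name comes from the class name, plain fields become
    attributes, and nested dataclass instances become child elements.
    """
    element = ET.Element(type(obj).__name__)
    for field in fields(obj):
        value = getattr(obj, field.name)
        if is_dataclass(value):
            element.append(to_xml(value))
        else:
            element.set(field.name, str(value))
    return element


hyb = Hybridisation("H1", "A1", Sample("S1", "Homo sapiens"))
xml_text = ET.tostring(to_xml(hyb), encoding="unicode")
print(xml_text)
```

Because the serialiser is driven by the class definitions, changing the model changes the XML format without hand-editing it, which is the appeal of deriving the exchange format from the model.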

  7. MAGE - brief history • August 1997 - Life Sciences Research group formed within the Object Management Group • March 2000 - gene expression RFP issued • December 2000 - initial submissions of proposals for gene expression data standards: • EBI (on behalf of MGED) - MAML • Rosetta (on behalf of GEML community) - GEML + some IDLs • NetGenics - IDLs

  8. MAGE - brief history (2) • Decision to proceed with a joint submission • Decision to base the standard on UML • Submitters’ meetings throughout 2001 • End of January 2002 - MAGE becomes an adopted specification • October 2002 - MAGE becomes an available specification • MAGE-ML - XML language - automatically derived from MAGE • (More than) MIAME-compliant; only a subset can be used

  9. MAGE – an example diagram

  10. Use case of MAGE: ArrayExpress architecture (diagram) • MAGE-OM • MAGE-ML (DTD) • MAGE-ML documents • data loader • ArrayExpress (Oracle) • object/relational mapping - Castor • Tomcat - Java servlets • Velocity template engine - web page templates • MIAMExpress • Browser

  11. ArrayExpress Infrastructure (diagram) • EBI: ArrayExpress (Oracle), MIAMExpress (MySQL), Expression Profiler • Submissions: www, local MIAMExpress installations, array manufacturers, LIMS, data pipelines, MAGE-ML • Queries: www, MAGE-ML • Data analysis: www, data analysis software • Microarray software: MAGE-ML import, export • External bioinformatics databases • Other microarray databases

  12. Common terms (ontologies) • What is an ontology? • formal model of some domain • simplest ontologies – controlled vocabularies • hierarchical, other relations, constraints, … • MGED Ontology • maintained by Chris Stoeckert, UPenn • enables: • unambiguous annotation • therefore, queries • currently sample description • experiment design description to come • multiple formats: RDFS, DAML+OIL
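
A minimal sketch of why a controlled vocabulary with a hierarchy enables unambiguous annotation and, therefore, queries. The terms, the `IS_A` table and the sample annotations are invented for illustration; they are not taken from the MGED Ontology.

```python
# Hypothetical controlled vocabulary: each term maps to its parent ("is_a").
IS_A = {
    "organism part": None,
    "liver": "organism part",
    "brain": "organism part",
    "cerebellum": "brain",
}

# Sample annotations (invented); every value must be a vocabulary term.
ANNOTATIONS = {}


def annotate(sample: str, term: str) -> None:
    """Annotate a sample, rejecting any term outside the vocabulary."""
    if term not in IS_A:
        raise ValueError(f"unknown term: {term!r}")
    ANNOTATIONS[sample] = term


def is_a(term, ancestor: str) -> bool:
    """True if `term` equals `ancestor` or lies below it in the hierarchy."""
    while term is not None:
        if term == ancestor:
            return True
        term = IS_A[term]
    return False


def samples_annotated_with(term: str) -> list:
    """Query samples by term, including all more specific terms."""
    return sorted(s for s, t in ANNOTATIONS.items() if is_a(t, term))


annotate("sample 1", "liver")
annotate("sample 2", "cerebellum")
annotate("sample 3", "brain")

print(samples_annotated_with("brain"))
# → ['sample 2', 'sample 3']
```

With free-text annotation, the query for "brain" would miss the cerebellum sample; the hierarchy is what makes the broader query return it.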

  13. Ontologies and ArrayExpress • Curation team • led by Helen Parkinson • currently 5 curators • Curation tool under development • management of all relevant ontologies “under one roof” • support in distributed ontology development • submission tracking • accession numbers • ...

  14. Common ways of describing data processing • no “deliverables” yet • MAGE can describe data processing • just syntax, too much free text • Laboratory Activity Broker process within OMG - common points? • problem: • it is possible to come up with a universal framework that can describe all possible scenarios of data processing • however, how will it be used in real life?

  15. Workflow and workflow enactment (diagram) • workflow: in (data type) → process type with parameters → out • workflow enactment: in (data) → process instance with parameter values → out • processes: data filtering, clustering, pattern discovery, visualization, ...
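
The distinction between a workflow (process types with declared parameters) and its enactment (process instances with concrete parameter values) can be sketched as follows. All class names, the two toy processes and the data are assumptions made for illustration; this is not a MAGE or OMG interface.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class ProcessType:
    """Workflow level: what kind of data goes in and out, and which parameters exist."""
    name: str
    in_type: type
    out_type: type
    parameters: tuple
    run: Callable


@dataclass
class ProcessInstance:
    """Enactment level: a process type bound to concrete parameter values."""
    process_type: ProcessType
    parameter_values: dict

    def enact(self, data):
        missing = set(self.process_type.parameters) - set(self.parameter_values)
        if missing:
            raise ValueError(f"missing parameter values: {sorted(missing)}")
        if not isinstance(data, self.process_type.in_type):
            raise TypeError(f"{self.process_type.name} expects {self.process_type.in_type}")
        return self.process_type.run(data, **self.parameter_values)


def enact_workflow(steps, data):
    """Run the steps in order and record what was done to obtain the result."""
    log = []
    for step in steps:
        data = step.enact(data)
        log.append((step.process_type.name, dict(step.parameter_values)))
    return data, log


# Two toy process types (hypothetical, deliberately trivial).
filtering = ProcessType(
    "filtering", list, list, ("threshold",),
    lambda data, threshold: [x for x in data if x >= threshold],
)
clustering = ProcessType(
    "clustering", list, dict, ("cut",),
    lambda data, cut: {
        "low": [x for x in data if x < cut],
        "high": [x for x in data if x >= cut],
    },
)

result, log = enact_workflow(
    [ProcessInstance(filtering, {"threshold": 1.0}),
     ProcessInstance(clustering, {"cut": 3.0})],
    [0.5, 2.0, 3.5, 8.0],
)
print(result)  # → {'low': [2.0], 'high': [3.5, 8.0]}
print(log)     # → [('filtering', {'threshold': 1.0}), ('clustering', {'cut': 3.0})]
```

The log is the point of the design: recording each process instance with its parameter values documents how the final result was obtained.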

  16. Benefits • compile “best practices” of data analysis • document what has been done to obtain final results • enable “high-throughput” data analysis work

  17. How to query information • again no “deliverables” • initial plan - MAGE will include query support • all methods were dropped - MAGE is only a data model • ArrayExpress - 2 large components: • repository - retrieve experiments as units, MAGE-based • warehouse - gene- and data-oriented queries, work across experiments • G2G (Jason Stewart) - protocol + query language for distributed queries
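
The two ArrayExpress components answer different kinds of questions. A minimal sketch of the contrast, using an in-memory dictionary; the accession numbers, gene names, values and function names are invented and do not reflect the ArrayExpress schema or any real query interface.

```python
# Toy data: accession numbers, gene names and values are invented.
EXPERIMENTS = {
    "EXP-1": {"geneA": 2.1, "geneB": 0.4},
    "EXP-2": {"geneA": 1.7, "geneC": 3.0},
}


def retrieve_experiment(accession: str) -> dict:
    """Repository-style access: fetch one experiment as a whole unit."""
    return EXPERIMENTS[accession]


def query_gene(gene: str) -> dict:
    """Warehouse-style access: follow one gene across all experiments."""
    return {
        accession: values[gene]
        for accession, values in EXPERIMENTS.items()
        if gene in values
    }


print(retrieve_experiment("EXP-1"))  # → {'geneA': 2.1, 'geneB': 0.4}
print(query_gene("geneA"))           # → {'EXP-1': 2.1, 'EXP-2': 1.7}
```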

  18. Properties (diagram) • ratio • absolute change • confidence measure • design element type • name • array design name • platform type • provider • sample type • species • experiment type • performer • lab • bioassay type

  19. Summary
