
caarray.nci.nih/

caArray: Cancer Array Informatics Open Source Tools for Microarray Data Management, Analysis and Annotation. caArray overview & demo Mervi Heiskanen (15 min) caArray architecture Scott Gustafson (15 min) webCGH overview & demo David Hall (15 min). http://caarray.nci.nih.gov/.



  1. caArray: Cancer Array Informatics. Open Source Tools for Microarray Data Management, Analysis and Annotation • caArray overview & demo: Mervi Heiskanen (15 min) • caArray architecture: Scott Gustafson (15 min) • webCGH overview & demo: David Hall (15 min) • http://caarray.nci.nih.gov/

  2. caArray Data Portal & Data Analysis Tools • Data Portal: Promotes data sharing through submission of original, raw data files with associated experiment and sample information. • Data analysis and visualization tools: • webCGH (NCICB/RTI), XpressionWay (NCICB/SAIC) • caBIG tools: • caWorkbench - Columbia • DWD - UNC Lineberger • GenePattern - MIT/Broad ? • Magellan - UC San Francisco • VISDA - Georgetown • Cancer Molecular Pages - Burnham • Function Express - Wash U Siteman • GoMiner - NCI/CCR

  3. caArray version 1.0 • Key features: • MIAME 1.1 compliant data annotation forms • Support for Affymetrix and GenePix native files • MAGE-ML import and export • Controlled vocabularies (MGED Ontology) • Access to data via the MAGE-OM API • caArray installations: • The NCICB caArray instance supports NCI-funded programs. • Local installations at the cancer centers: caBIG-funded caArray adopters (Lombardi, Wistar, NYU)

  4. caArray listservs: • caArray developers • caArray users • caArray team

  5. caArray: Compliance with Standardization Efforts • MIAME • Minimum Information About a Microarray Experiment • 1.1 Draft 6 (April 1, 2002) • http://www.mged.org/Workgroups/MIAME/miame_1.1.html • MAGE-ML • MicroArray and GeneExpression Object Model and Markup Language • 1.1 (October 2003) • http://www.omg.org/docs/formal/03-10-01.pdf • MGED Ontology • Microarray Gene Expression Data Ontology • 1.1.8 (April 2004) • http://mged.sourceforge.net/ontologies/MGEDontology.php caBIG compatibility guidelines http://cabig.nci.nih.gov/guidelines_documentation/caBIG_Compatibility_Document

  6. class TechnologyType • namespace: http://mged.sourceforge.net/ontologies/MGEDOntology.daml# • documentation: The technology type or platform of the reporters on the array. • type: primitive • superclasses: ArrayDesignPackage • used in classes: FeatureGroup • used in individuals: in_situ_oligo_features, spotted_antibody_features, spotted_colony_features, spotted_ds_DNA_features, spotted_protein_features, spotted_ss_oligo_features

     class CellLineDatabase • namespace: http://mged.sourceforge.net/ontologies/MGEDOntology.daml# • documentation: Database of cell line information. • type: primitive • superclasses: Database • used in classes: CellLine • used in individuals: ATCC_Cultures, CABRI_Human_and_Animal_Cell_lines

  7. caArray Phase 2 • caArray 1.2 (June 2005) • Support for additional file formats via a software toolkit • Public search without login • Copy bio sample information • caArray 1.5 (September 2005) • XpressionWay, pathway visualization tool • Integration with caDSR 3.0 • caArray 1.7 (December 2005) • Store filtered and normalized data • User management user interface • caArray 2.0 (March 2006) • Embedded MAGE-ML validation All releases: Defect fixes and usability enhancements

  8. Acknowledgements • NCICB/SAIC • Development team: • Hangjiong Chen • Scott Gustafson • Juergen Lorenz • John Moy • Sumeet Muju • Beth Neuberger • Phu Tran • Jim Zhou • QA: • Durga Addepalli • Andrew Shinohara • Ye Wu • NCICB/TerpSys • Don Swan, Jamie Keller • Research Triangle Institute • David Hall (webCGH) NCICB Sue Dubman, Mervi Heiskanen, Xioapeng Bian, Subha Madhavan, Carl Schaefer, Gilberto Fragoso, Denise Warzel… and Ken Buetow

  9. caArray's Architecture • Credits to Sumeet Muju and Phu Tran

  10. caArray Architecture [Diagram. Labelled components: Browser; FTP applet; Tomcat web container (Struts, JSP, servlet); EJB container (Vocab, Security, Protocol, Experiment, ArrayDesign and other manager EJBs; MAGE manager); caCORE (caBIO, caDSR, EVS) via vocab interface and security objects; Data Transfer Objects (DTO); Object Relational Bridge (OJB); MAGE-stk; caArray DB; security DB; MAGE-ML native data importer MDB; file uploader MDB; FTP staging area; file share; NetCDF API; MAGE-OM API (MAGE-OM JAR, MAGE-OM objects, RMI, MAGE-OM manager, persistence)]

  11. caArray Interfaces: caArray EJB API • caArray EJB API: Provides transaction control, asynchronous processes, service location, common security and distributed capabilities for submission and retrieval of microarray experiments. • The caArray presentation layer uses this functionality via the caArray EJB API. • Data Transfer Objects (DTOs) are used to transfer data between the calling application and the EJBs. • The APIs can be used for federated access and submission of transaction data.
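
The DTO bullet above can be illustrated with a minimal Python sketch. It is not the caArray EJB API (which is Java); `ExperimentDTO`, `ExperimentManager`, and their fields are hypothetical names chosen for illustration. The point is only that the caller and the service exchange a flat, immutable value object rather than a persistent entity.

```python
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class ExperimentDTO:
    """Hypothetical flat value object passed between caller and service."""
    identifier: str
    name: str
    description: str


class ExperimentManager:
    """Hypothetical stand-in for a manager EJB; stores rows in memory."""

    def __init__(self):
        self._rows = {}

    def submit(self, dto):
        # Copy the DTO's fields into the store; the caller keeps its own copy.
        self._rows[dto.identifier] = asdict(dto)

    def retrieve(self, identifier):
        # Build a fresh DTO from the stored row for the caller.
        return ExperimentDTO(**self._rows[identifier])


manager = ExperimentManager()
sent = ExperimentDTO(
    identifier="gov.nih.nci.ncicb.caarray:Experiment:789:1",
    name="Sample Experiment",
    description="This is a sample experiment.",
)
manager.submit(sent)
received = manager.retrieve(sent.identifier)
print(received == sent, received is sent)
```

The retrieved object is equal to the submitted one but is a separate instance, which is the property that lets a DTO cross a process or network boundary.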

  12. caArray Interfaces: MAGE-OM API • MAGE-OM API: Provides fine-grained search and retrieval of all caArray data via a caBIO-like, RMI-based API. • The MAGE-OM API maps the MAGE objects to the new caArray database schema. • An RMI security module is incorporated for user- and group-level data access. • NetCDF API logic is incorporated for faster retrieval of data. • Built to be grid-enabled.

  13. caArray Middleware • Data Representation • Data Transfer Objects (DTO) • MicroArray Gene Expression Software Toolkit (MAGE-stk) • DTO - MAGE-stk Conversion • Data Persistence • Data Access Layer • ObJectRelationalBridge (OJB) • OJB Abstraction Layer and Data Access Objects (DAO) • EJB Layer • Stateless Session Façade • Bean-managed Persistence • NETCDF Files • Large Data Set • Fast Binary Access • MAGE-ML Import and Export • Message-Driven Beans

  14. MAGE-ML Import and Export: An Example

      <MAGE-ML identifier="gov.nih.nci.ncicb.caarray:MAGEML:123:1">
        <AuditAndSecurity_package>
          <Contact_assnlist>
            <Person identifier="gov.nih.nci.ncicb.caarray:Person:456:1" lastName="Doe" firstName="John">
            </Person>
          </Contact_assnlist>
        </AuditAndSecurity_package>
        <Experiment_package>
          <Experiment_assnlist>
            <Experiment identifier="gov.nih.nci.ncicb.caarray:Experiment:789:1" name="Sample Experiment">
              <Descriptions_assnlist>
                <Description text="This is a sample experiment."></Description>
              </Descriptions_assnlist>
              <Providers_assnreflist>
                <Person_ref identifier="gov.nih.nci.ncicb.caarray:Person:456:1"/>
              </Providers_assnreflist>
            </Experiment>
          </Experiment_assnlist>
        </Experiment_package>
      </MAGE-ML>

      Callouts: Person is an Identifiable element; Person_ref is a referenced Identifiable element to be resolved.
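
The distinction this slide draws between an Identifiable element and a reference to one can be sketched in a few lines of Python using the standard library's `xml.etree.ElementTree`. This is an illustration only, not the caArray importer (which is Java and built on the MAGE-stk SAX parser). The function name `resolve_refs` and the rule that reference elements are those whose tag ends in `_ref` are assumptions made for this sketch.

```python
import xml.etree.ElementTree as ET

# A trimmed, well-formed version of the slide's example document.
MAGE_ML = """\
<MAGE-ML identifier="gov.nih.nci.ncicb.caarray:MAGEML:123:1">
  <AuditAndSecurity_package>
    <Contact_assnlist>
      <Person identifier="gov.nih.nci.ncicb.caarray:Person:456:1"
              lastName="Doe" firstName="John"/>
    </Contact_assnlist>
  </AuditAndSecurity_package>
  <Experiment_package>
    <Experiment_assnlist>
      <Experiment identifier="gov.nih.nci.ncicb.caarray:Experiment:789:1"
                  name="Sample Experiment">
        <Providers_assnreflist>
          <Person_ref identifier="gov.nih.nci.ncicb.caarray:Person:456:1"/>
        </Providers_assnreflist>
      </Experiment>
    </Experiment_assnlist>
  </Experiment_package>
</MAGE-ML>
"""


def resolve_refs(xml_text):
    """Return (identifiables, resolved) for a MAGE-ML document.

    identifiables maps identifier -> defining element.
    resolved lists (ref_tag, identifier, target element or None).
    """
    root = ET.fromstring(xml_text)
    identifiables = {}
    refs = []
    for elem in root.iter():
        ident = elem.get("identifier")
        if ident is None:
            continue
        # Assumption: reference elements are exactly those named *_ref.
        if elem.tag.endswith("_ref"):
            refs.append(elem)
        else:
            identifiables[ident] = elem
    resolved = [
        (ref.tag, ref.get("identifier"), identifiables.get(ref.get("identifier")))
        for ref in refs
    ]
    return identifiables, resolved


identifiables, resolved = resolve_refs(MAGE_ML)
for tag, ident, target in resolved:
    status = target.tag if target is not None else "UNRESOLVED"
    print(tag, "->", status, ident)
```

Collecting every definition before resolving any reference means a reference may appear earlier in the document than the element it points to.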

  15. MAGE-ML Import and Export • The importer is modified from the MAGE-stk's SAX-based MAGE-ML parser to include a persistence mechanism that inserts, updates and resolves (looks up) parsed objects. • Any valid MAGE-ML can be imported. The MAGE-ML is assumed to be valid; validation is typically done using ArrayExpress's MAGEValidator. • Identifiable objects are first resolved from the database by matching their identifier; if resolved, the existing object is updated from the incoming one. • The identifier is the globally unique key of a MAGE object across domains for its entire lifecycle. • The identifier is separate from the persisted MAGE-stk object's primary key, which is internal to caArray.
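
The resolve-then-update rule described on this slide can be sketched as follows. This is a minimal Python illustration with an in-memory dictionary standing in for the database. `IdentifiableStore`, `StoredObject`, and `import_object` are hypothetical names, and merging attributes with `dict.update` is an assumed update policy, not the documented caArray behaviour.

```python
from dataclasses import dataclass, field
from itertools import count


@dataclass
class StoredObject:
    primary_key: int   # internal to the store, never exported
    identifier: str    # globally unique MAGE identifier
    attributes: dict = field(default_factory=dict)


class IdentifiableStore:
    """Hypothetical in-memory stand-in for the caArray database."""

    def __init__(self):
        self._by_identifier = {}
        self._next_pk = count(1)

    def import_object(self, identifier, attributes):
        """Resolve by identifier; update if found, insert otherwise."""
        existing = self._by_identifier.get(identifier)
        if existing is not None:
            # Assumed policy: incoming attributes overwrite stored ones.
            existing.attributes.update(attributes)
            return existing, "updated"
        stored = StoredObject(next(self._next_pk), identifier, dict(attributes))
        self._by_identifier[identifier] = stored
        return stored, "inserted"


store = IdentifiableStore()
ident = "gov.nih.nci.ncicb.caarray:Person:456:1"
first, action1 = store.import_object(ident, {"lastName": "Doe"})
second, action2 = store.import_object(ident, {"firstName": "John"})
print(action1, action2, first.primary_key == second.primary_key)
```

Importing the same identifier twice yields one stored row whose primary key does not change, which is why the identifier, not the primary key, is the key that travels between installations.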

  16. MAGE-ML Export • The entire object graph of an object, e.g., ArrayDesign, Experiment, is traversed to collect all Identifiable objects • The MAGE-stk’s MAGEJava object is utilized to contain all the Identifiable objects collected • When an Identifiable object is encountered, the appropriate method in the MAGEJava object is discovered and invoked using reflection to store the object into it • Ultimately MAGEJava.writeMAGEML(Writer) is invoked to recursively invoke the same method of all the contained Identifiable objects. • Xerces’s XMLSerializer pretty-formats the XML content as it is being written with appropriate new lines and indentations
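
The collect-by-reflection step on this slide can be sketched in Python, where `getattr` plays the role that Java reflection plays in the original. All names here (`Identifiable`, `MageContainer`, `add_person`, `add_experiment`, `collect`) are hypothetical; MAGE-stk's actual `MAGEJava` method names are not given in the slides and are not reproduced here.

```python
class Identifiable:
    """Hypothetical base class: an identifier plus related objects."""

    def __init__(self, identifier, related=()):
        self.identifier = identifier
        self.related = list(related)


class Person(Identifiable):
    pass


class Experiment(Identifiable):
    pass


class MageContainer:
    """Hypothetical stand-in for MAGE-stk's MAGEJava container."""

    def __init__(self):
        self.persons = []
        self.experiments = []

    def add_person(self, obj):
        self.persons.append(obj)

    def add_experiment(self, obj):
        self.experiments.append(obj)


def collect(root, container, seen=None):
    """Traverse the object graph and store each Identifiable once."""
    seen = set() if seen is None else seen
    if root.identifier in seen:
        return
    seen.add(root.identifier)
    # Discover the matching adder by name, e.g. Person -> add_person.
    adder = getattr(container, "add_" + type(root).__name__.lower())
    adder(root)
    for child in root.related:
        collect(child, container, seen)


john = Person("gov.nih.nci.ncicb.caarray:Person:456:1")
experiment = Experiment(
    "gov.nih.nci.ncicb.caarray:Experiment:789:1",
    related=[john, john],  # the same provider reached by two paths
)
container = MageContainer()
collect(experiment, container)
print(len(container.experiments), len(container.persons))
```

Tracking visited identifiers keeps an object that is reachable by several paths from being written more than once, and also prevents infinite recursion on cyclic graphs.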

  17. A caArray Configuration [Diagram. Two sets of components are shown, one marked "caArray 1" and one marked "NCICB". Labels in each: caArray schema, Security, caDSR / EVS, caBIO, caArray EJB API, MAGE-OM API, caWorkbench. Further labels: Java app, MAGE-ML, Grid (future)]

  18. webCGH: A web application for the visualization and analysis of array-based CGH and gene expression data • David Hall, Ph.D. • Research Triangle Institute

  19. arrayCGH

  20. webCGH Functions • Visualization of copy number and gene expression levels • Interrogation of genome features • Data normalization and analysis • Virtual experiments

  21. Whole-genome View

  22. Ideograms

  23. Chromosome 17

  24. Chromosome 17

  25. Zoom

  26. Annotated Genes

  27. Gene List

  28. Gene Watch

  29. Data Flow [Diagram. Labels: Database, Adaptor, Transformer, Op (operations), Analytical Pipeline, Cache, Plot Generator]

  30. Analytical Pipelines

  31. Architecture

  32. Key Design Features

  33. Key Design Features

  34. Past, Present, Future • Dec. 2003 – Version 1.0 • Basic plots, analytics, GEDP • March 2005 – Version 2.0 • More plots, analytics, caArray • Late April 2005 – Version 2.1 • Mouse/human plots • CGH/gene expression • SKY/M-FISH&CGH integration

  35. webCGH Team • NCICB • Mervi Heiskanen • RTI • David Hall • Vesselina Bakalov • Ying Chen • Matt Westlake • Bing Liu • Laxminarayana Ganapathi • Sheping Li • Stuart Allen
