caArray: Cancer Array Informatics


caArray: Cancer Array Informatics

Open Source Tools for Microarray Data Management, Analysis and Annotation

caArray overview & demo

Mervi Heiskanen (15 min)

caArray architecture

Scott Gustafson (15 min)

webCGH overview & demo

David Hall (15 min)

http://caarray.nci.nih.gov/


caArray Data Portal & Data Analysis Tools

  • Data Portal: Promotes data sharing through submission of original, raw data files with associated experiment and sample information.

  • Data analysis and visualization tools:

    • webCGH (NCICB/RTI), XpressionWay (NCICB/SAIC)

    • caBIG tools:

      • caWorkbench - Columbia

      • DWD - UNC Lineberger

      • GenePattern - MIT/Broad ?

      • Magellan - UC San Francisco

      • VISDA - Georgetown

      • Cancer Molecular Pages - Burnham

      • Function Express - Wash U Siteman

      • GoMiner - NCI/CCR


caArray version 1.0

  • Key features:

    • MIAME 1.1 compliant data annotation forms

    • Support for Affymetrix and GenePix native files

    • MAGE-ML import and export

    • Controlled vocabularies (MGED ontology)

    • Access to data via MAGE-OM API

  • caArray installations:

    • NCICB caArray instance supports NCI funded programs.

    • Local installations at the cancer centers: caBIG funded caArray adopters (Lombardi, Wistar, NYU)



caArray: Compliance with Standardization Efforts

  • MIAME

    • Minimum Information About a Microarray Experiment

    • 1.1 Draft 6 (April 1, 2002)

    • http://www.mged.org/Workgroups/MIAME/miame_1.1.html

  • MAGE-ML

    • MicroArray and GeneExpression Object Model and Markup Language

    • 1.1 (October 2003)

    • http://www.omg.org/docs/formal/03-10-01.pdf

  • MGED Ontology

    • Microarray Gene Expression Data Ontology

    • 1.1.8 (April 2004)

    • http://mged.sourceforge.net/ontologies/MGEDontology.php

caBIG compatibility guidelines

http://cabig.nci.nih.gov/guidelines_documentation/caBIG_Compatibility_Document


  • class TechnologyType

  • namespace:

    • http://mged.sourceforge.net/ontologies/MGEDOntology.daml#

  • documentation:

    • The technology type or platform of the reporters on the array.

  • type:

    • primitive

  • superclasses:

    • ArrayDesignPackage

  • used in classes:

    • FeatureGroup

  • used in individuals:

    • in_situ_oligo_features, spotted_antibody_features, spotted_colony_features, spotted_ds_DNA_features, spotted_protein_features, spotted_ss_oligo_features

  • class CellLineDatabase

  • namespace:

    • http://mged.sourceforge.net/ontologies/MGEDOntology.daml#

  • documentation:

    • Database of cell line information.

  • type:

    • primitive

  • superclasses:

    • Database

  • used in classes:

    • CellLine

  • used in individuals:

    • ATCC_Cultures, CABRI_Human_and_Animal_Cell_lines


caArray Phase 2

  • caArray 1.2 (June 2005)

    • Support for additional file formats via a software toolkit

    • Public search without login

    • Copy bio sample information

  • caArray 1.5 (September 2005)

    • XpressionWay, pathway visualization tool

    • Integration with caDSR 3.0

  • caArray 1.7 (December 2005)

    • Store filtered and normalized data

    • User management user interface

  • caArray 2.0 (March 2006)

    • Embedded MAGE-ML validation

All releases: Defect fixes and usability enhancements


Acknowledgements

  • NCICB/SAIC

  • Development team:

  • Hangjiong Chen

  • Scott Gustafson

  • Juergen Lorenz

  • John Moy

  • Sumeet Muju

  • Beth Neuberger

  • Phu Tran

  • Jim Zhou

  • QA:

  • Durga Addepalli

  • Andrew Shinohara

  • Ye Wu

  • NCICB/TerpSys

  • Don Swan, Jamie Keller

  • Research Triangle Institute

  • David Hall (webCGH)

NCICB

Sue Dubman, Mervi Heiskanen, Xiaopeng Bian, Subha Madhavan, Carl Schaefer, Gilberto Fragoso, Denise Warzel… and Ken Buetow



caArray’s Architecture

Credits to Sumeet Muju and Phu Tran


caArray Architecture

[Architecture diagram. Only its labels survive extraction; the grouping and connections of the original figure are not recoverable.]

  • Client side: Browser, FTP applet

  • Containers: Tomcat web container, EJB container

  • Web tier: Struts, JSP, Servlet, File uploader

  • EJBs: Vocab Mgr EJB, Security Mgr EJB, Protocol Mgr EJB, Experiment Mgr EJB, other Mgr EJBs

  • Data handling: Data Transfer Object (DTO), MAGE Manager, MAGE objects (MAGE-stk), Object Relational Bridge (OJB)

  • Import: MAGE-ML (Experiment and ArrayDesign), native data, importer MDB, file MDB

  • caCORE: caBIO, caDSR, EVS, vocab interface, security objects

  • Storage: caArray DB, Security DB, FTP staging area, file share, NetCDF API

  • MAGE-OM API: MAGE-OM JAR, MAGE-OM objects, RMI Mgr, MAGE-OM persistence


caArray Interfaces: caArray EJB API

  • caArrayEJB API: Provides transaction control, asynchronous processes, service location, common security and distributed capabilities for submission and retrieval of microarray experiments.

    • The caArray presentation layer uses this functionality via the caArrayEJB API.

    • Data Transfer Objects (DTOs) are used to transfer data between the calling application and the EJBs.

    • APIs can be used for federated access and submission of transaction data.
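
To make the DTO and facade roles on this slide concrete, here is a minimal Java sketch. It is not the caArrayEJB API: `ExperimentDTO`, `ExperimentManager` and `InMemoryExperimentManager` are hypothetical names, and a plain in-memory map stands in for the EJB container and the database.

```java
import java.io.Serializable;
import java.util.HashMap;
import java.util.Map;

// Hypothetical DTO: a flat, serializable snapshot passed between caller and service.
final class ExperimentDTO implements Serializable {
    private static final long serialVersionUID = 1L;
    final String identifier;
    final String name;
    final String description;

    ExperimentDTO(String identifier, String name, String description) {
        this.identifier = identifier;
        this.name = name;
        this.description = description;
    }
}

// Hypothetical facade: coarse-grained submit/retrieve operations that exchange DTOs only.
interface ExperimentManager {
    void submit(ExperimentDTO experiment);
    ExperimentDTO retrieve(String identifier);
}

// In-memory stand-in for an EJB implementation; the map plays the role of the database.
final class InMemoryExperimentManager implements ExperimentManager {
    private final Map<String, ExperimentDTO> store = new HashMap<>();

    @Override
    public void submit(ExperimentDTO experiment) {
        if (experiment == null || experiment.identifier == null) {
            throw new IllegalArgumentException("experiment and identifier are required");
        }
        store.put(experiment.identifier, experiment);
    }

    @Override
    public ExperimentDTO retrieve(String identifier) {
        return store.get(identifier); // null when nothing was submitted under this identifier
    }
}
```

The caller only ever sees the interface and the DTO, so the implementation behind the facade could be swapped (local, remote, or federated) without changing calling code.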


caArray Interfaces: MAGE-OM API

  • MAGE-OM API: Provides fine-grained search and retrieval of all caArray data via a caBIO-like, RMI-based API.

    • The MAGE-OM API maps the MAGE objects to the new caArray database schema.

    • RMI Security module incorporated for user/group level data access.

    • NetCDF API logic incorporated for faster retrieval of data

    • Built to be grid enabled


caArray Middleware

  • Data Representation

    • Data Transfer Objects (DTO)

    • MicroArray Gene Expression Software Toolkit (MAGE-stk)

    • DTO - MAGE-stk Conversion

  • Data Persistence

    • Data Access Layer

      • ObJectRelationalBridge (OJB)

      • OJB Abstraction Layer and Data Access Objects (DAO)

    • EJB Layer

      • Stateless Session Façade

      • Bean-managed Persistence

    • NETCDF Files

      • Large Data Set

      • Fast Binary Access

  • MAGE-ML Import and Export

    • Message-Driven Beans


MAGE-ML Import and Export: An Example

<MAGE-ML identifier="gov.nih.nci.ncicb.caarray:MAGEML:123:1">
  <AuditAndSecurity_package>
    <Contact_assnlist>
      <Person identifier="gov.nih.nci.ncicb.caarray:Person:456:1"
              lastName="Doe"
              firstName="John">
      </Person>
    </Contact_assnlist>
  </AuditAndSecurity_package>
  <Experiment_package>
    <Experiment_assnlist>
      <Experiment identifier="gov.nih.nci.ncicb.caarray:Experiment:789:1"
                  name="Sample Experiment">
        <Descriptions_assnlist>
          <Description text="This is a sample experiment."></Description>
        </Descriptions_assnlist>
        <Providers_assnreflist>
          <Person_ref identifier="gov.nih.nci.ncicb.caarray:Person:456:1"/>
        </Providers_assnreflist>
      </Experiment>
    </Experiment_assnlist>
  </Experiment_package>
</MAGE-ML>

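  • Identifiable element: e.g., Person, defined once with its identifier attribute

  • Referenced Identifiable element to be resolved: e.g., Person_ref, which carries only the identifier of the element it points to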
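
The distinction between an Identifiable element and a reference to it can be illustrated with a short Java sketch that indexes definitions by identifier and reports references that cannot be resolved within the same document. This is not caArray or MAGE-stk code: `MageMlReferences` is a hypothetical helper, it uses the JDK's DOM parser rather than the SAX-based parser the slides mention, and it assumes that any element whose tag name ends in `_ref` is a reference.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical helper, not part of caArray or MAGE-stk.
final class MageMlReferences {

    // Parses an XML string into a DOM document; checked exceptions are wrapped for brevity.
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    // Assumption: a tag name ending in "_ref" marks a reference, not a definition.
    static boolean isReference(Element e) {
        return e.getTagName().endsWith("_ref");
    }

    // Maps identifier -> defining element for every non-reference element with an identifier.
    static Map<String, Element> indexIdentifiables(Document doc) {
        Map<String, Element> byId = new HashMap<>();
        NodeList all = doc.getElementsByTagName("*");
        for (int i = 0; i < all.getLength(); i++) {
            Element e = (Element) all.item(i);
            if (e.hasAttribute("identifier") && !isReference(e)) {
                byId.put(e.getAttribute("identifier"), e);
            }
        }
        return byId;
    }

    // Lists identifiers used by references that have no definition in this document.
    static List<String> unresolvedReferences(Document doc, Map<String, Element> byId) {
        List<String> missing = new ArrayList<>();
        NodeList all = doc.getElementsByTagName("*");
        for (int i = 0; i < all.getLength(); i++) {
            Element e = (Element) all.item(i);
            if (isReference(e) && !byId.containsKey(e.getAttribute("identifier"))) {
                missing.add(e.getAttribute("identifier"));
            }
        }
        return missing;
    }
}
```

In an importer, a reference left unresolved within the document would be the case that has to be looked up in the database instead.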


MAGE-ML Import and Export

  • Modified from the MAGE-stk’s SAX-based MAGE-ML parser to include a persistence mechanism that inserts, updates and resolves (looks up) parsed objects

  • Any valid MAGE-ML can be imported. The importer assumes its input is valid; validation is typically done using ArrayExpress’s MAGEValidator

  • Identifiable objects are first resolved from the database by matching their identifier; if a match is found, the incoming object is applied as an update to the existing one

    • Identifier represents the globally unique key of a MAGE object across domains for its entire lifecycle

    • Identifier is separate from the persisted MAGE-stk object’s primary key, which is internal to caArray
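
The resolve-then-insert-or-update rule described above can be sketched in a few lines of Java. This is an illustration only, not the caArray importer: `IdentifiableStore` and `Row` are hypothetical names, a map stands in for the database, and a single `name` field stands in for the object's attributes.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical in-memory stand-in for the persistence layer.
final class IdentifiableStore {

    // One persisted object: an internal primary key plus its (updatable) content.
    static final class Row {
        final long primaryKey; // internal key, never written to MAGE-ML
        String name;

        Row(long primaryKey, String name) {
            this.primaryKey = primaryKey;
            this.name = name;
        }
    }

    private final Map<String, Row> rowsByIdentifier = new HashMap<>();
    private long nextPrimaryKey = 1;

    // Resolves by identifier first. Returns true if a new row was inserted,
    // false if an existing row was updated in place.
    boolean importObject(String identifier, String name) {
        Row existing = rowsByIdentifier.get(identifier);
        if (existing != null) {
            existing.name = name; // update keeps the original primary key
            return false;
        }
        rowsByIdentifier.put(identifier, new Row(nextPrimaryKey++, name));
        return true;
    }

    Row find(String identifier) {
        return rowsByIdentifier.get(identifier);
    }
}
```

Importing the same identifier twice leaves one row whose primary key is unchanged, which is why the globally unique identifier and the internal key can evolve independently.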


MAGE-ML Export

  • The entire object graph of an object, e.g., ArrayDesign, Experiment, is traversed to collect all Identifiable objects

  • The MAGE-stk’s MAGEJava object is utilized to contain all the Identifiable objects collected

    • When an Identifiable object is encountered, the appropriate method in the MAGEJava object is discovered and invoked using reflection to store the object into it

  • Finally, MAGEJava.writeMAGEML(Writer) is invoked, which recursively invokes the same method on all the contained Identifiable objects.

  • Xerces’s XMLSerializer pretty-formats the XML content as it is being written with appropriate new lines and indentations
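
The export steps above (traverse the graph, file each Identifiable object into a container via reflection, then write everything out) can be sketched as follows. All class and method names here are hypothetical stand-ins, not the MAGE-stk API: the container's `addPerson`/`addExperiment` naming convention is an assumption made so that reflection has something to discover, and the output is a simplified one-element-per-line format rather than real MAGE-ML.

```java
import java.io.IOException;
import java.io.StringWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.lang.reflect.Method;
import java.util.ArrayList;
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Set;

// Hypothetical stand-in for an Identifiable MAGE object with outgoing associations.
abstract class Identifiable {
    final String identifier;
    final List<Identifiable> associations = new ArrayList<>();

    Identifiable(String identifier) { this.identifier = identifier; }

    void write(Writer out) throws IOException {
        out.write("<" + getClass().getSimpleName() + " identifier=\"" + identifier + "\"/>\n");
    }
}

final class Person extends Identifiable { Person(String id) { super(id); } }
final class Experiment extends Identifiable { Experiment(String id) { super(id); } }

// Hypothetical container: one typed list and one "add<ClassName>" method per class.
final class ExportContainer {
    final List<Person> persons = new ArrayList<>();
    final List<Experiment> experiments = new ArrayList<>();

    public void addPerson(Person p) { persons.add(p); }
    public void addExperiment(Experiment e) { experiments.add(e); }

    // Delegates writing to every contained object, persons first.
    void writeAll(Writer out) throws IOException {
        for (Person p : persons) p.write(out);
        for (Experiment e : experiments) e.write(out);
    }
}

final class GraphExporter {
    // Walks the object graph from root and files each object into the container once.
    static ExportContainer collect(Identifiable root) {
        ExportContainer container = new ExportContainer();
        Set<Identifiable> seen = Collections.newSetFromMap(new IdentityHashMap<Identifiable, Boolean>());
        visit(root, container, seen);
        return container;
    }

    private static void visit(Identifiable node, ExportContainer container, Set<Identifiable> seen) {
        if (!seen.add(node)) return; // already collected; also guards against cycles
        try {
            // Discover the matching adder by name and parameter type, then invoke it.
            String adderName = "add" + node.getClass().getSimpleName();
            Method adder = ExportContainer.class.getDeclaredMethod(adderName, node.getClass());
            adder.invoke(container, node);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("No adder for " + node.getClass().getSimpleName(), e);
        }
        for (Identifiable next : node.associations) visit(next, container, seen);
    }

    static String export(Identifiable root) {
        StringWriter out = new StringWriter();
        try {
            collect(root).writeAll(out);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toString();
    }
}
```

Reflection keeps the traversal code independent of the set of object types: supporting a new type in this sketch means adding a list and an adder to the container, not another branch in the visitor.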


A caArray Configuration

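[Configuration diagram. Only its labels survive extraction; connections between components are not recoverable.]

  • Two installations are shown, labeled "caArray 1" and "NCICB"

  • Labels appearing for each installation: caArray schema, Security (NCICB Security), caARRAY EJB, MAGE-OM API, caWorkbench, caBIO, caDSR / EVS

  • Other labels: JAVA APP, MAGE-ML, GRID (future)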


webCGH

A web application for the visualization and analysis of array-based CGH and gene expression data

David Hall, Ph.D.

Research Triangle Institute



webCGH Functions

  • Visualization of copy number and gene expression levels

  • Interrogation of genome features

  • Data normalization and analysis

  • Virtual experiments










Data Flow

[Data flow diagram. Labels recovered: Database (two), Adaptor (two), Transformer, Analytical Pipeline (Op, Op, Op, Op, X), Cache, Plot Generator.]






Past, Present, Future

  • Dec. 2003 – Version 1.0

    • Basic plots, analytics, GEDP

  • March 2005 – Version 2.0

    • More plots, analytics, caArray

  • Late April 2005 – Version 2.1

    • Mouse/human plots

    • CGH/gene expression

    • SKY/M-FISH & CGH integration


webCGH Team

  • NCICB

    • Mervi Heiskanen

  • RTI

    • David Hall

    • Vesselina Bakalov

    • Ying Chen

    • Matt Westlake

    • Bing Liu

    • Laxminarayana Ganapathi

    • Sheping Li

    • Stuart Allen

