slide1

caArray: Cancer Array Informatics

Open Source Tools for Microarray Data Management, Analysis and Annotation

  • caArray overview & demo: Mervi Heiskanen (15 min)
  • caArray architecture: Scott Gustafson (15 min)
  • webCGH overview & demo: David Hall (15 min)

http://caarray.nci.nih.gov/

slide2

caArray Data Portal & Data Analysis Tools

  • Data Portal: Promotes data sharing through submission of original, raw data files with associated experiment and sample information.
  • Data analysis and visualization tools:
      • webCGH (NCICB/RTI), XpressionWay (NCICB/SAIC)
      • caBIG tools:
        • caWorkbench - Columbia
        • DWD - UNC Lineberger
        • GenePattern - MIT/Broad ?
        • Magellan - UC San Francisco
        • VISDA – Georgetown
        • Cancer Molecular Pages – Burnham
        • Function Express – Wash U Siteman
        • GoMiner – NCI/CCR
caArray version 1.0
  • Key features:
    • MIAME 1.1 compliant data annotation forms
    • Support for Affymetrix and GenePix native files
    • MAGE-ML import and export
    • Controlled vocabularies (MGED ontology)
    • Access to data via MAGE-OM API
  • caArray installations:
    • NCICB caArray instance supports NCI funded programs.
    • Local installations at the cancer centers:
      • caBIG funded caArray adopters (Lombardi, Wistar, NYU)

slide4

caArray listservs:

  • caArray developers
  • caArray users
  • caArray team
caArray: Compliance with Standardization Efforts
  • MIAME
    • Minimum Information About a Microarray Experiment
    • 1.1 Draft 6 (April 1, 2002)
    • http://www.mged.org/Workgroups/MIAME/miame_1.1.html
  • MAGE-ML
    • MicroArray and GeneExpression Object Model and Markup Language
    • 1.1 (October 2003)
    • http://www.omg.org/docs/formal/03-10-01.pdf
  • MGED Ontology
    • Microarray Gene Expression Data Ontology
    • 1.1.8 (April 2004)
    • http://mged.sourceforge.net/ontologies/MGEDontology.php

caBIG compatibility guidelines

http://cabig.nci.nih.gov/guidelines_documentation/caBIG_Compatibility_Document

slide11

class TechnologyType

  • namespace:
    • http://mged.sourceforge.net/ontologies/MGEDOntology.daml#
  • documentation:
    • The technology type or platform of the reporters on the array.
  • type:
    • primitive
  • superclasses:
    • ArrayDesignPackage
  • used in classes:
    • FeatureGroup
  • used in individuals:
    • in_situ_oligo_features
    • spotted_antibody_features
    • spotted_colony_features
    • spotted_ds_DNA_features
    • spotted_protein_features
    • spotted_ss_oligo_features

class CellLineDatabase

  • namespace:
    • http://mged.sourceforge.net/ontologies/MGEDOntology.daml#
  • documentation:
    • Database of cell line information.
  • type:
    • primitive
  • superclasses:
    • Database
  • used in classes:
    • CellLine
  • used in individuals:
    • ATCC_Cultures
    • CABRI_Human_and_Animal_Cell_lines
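
The two ontology entries above can be read as controlled vocabularies: a class names a category and its individuals are the permitted values. The following is a minimal Python sketch of checking an annotation value against those lists. The dictionary layout and the `is_valid_term` helper are assumptions made for illustration; they are not part of the MGED Ontology tooling or of caArray, which is a Java application.

```python
# Individuals copied from the two MGED Ontology classes shown on this slide.
# The dict layout and the helper name are hypothetical, for illustration only.
CONTROLLED_VOCABULARY = {
    "TechnologyType": {
        "in_situ_oligo_features",
        "spotted_antibody_features",
        "spotted_colony_features",
        "spotted_ds_DNA_features",
        "spotted_protein_features",
        "spotted_ss_oligo_features",
    },
    "CellLineDatabase": {
        "ATCC_Cultures",
        "CABRI_Human_and_Animal_Cell_lines",
    },
}


def is_valid_term(ontology_class: str, value: str) -> bool:
    """Return True when value is a listed individual of ontology_class."""
    return value in CONTROLLED_VOCABULARY.get(ontology_class, set())
```

An annotation form could call such a check before accepting a value, so that free-text variants never reach the database.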
caArray Phase 2
  • caArray 1.2 (June 2005)
    • Support for additional file formats via a software toolkit
    • Public search without login
    • Copy bio sample information
  • caArray 1.5 (September 2005)
    • XpressionWay, pathway visualization tool
    • Integration with caDSR 3.0
  • caArray 1.7 (December 2005)
    • Store filtered and normalized data
    • User management user interface
  • caArray 2.0 (March 2006)
    • Embedded MAGE-ML validation

All releases: defect fixes and usability enhancements

slide21

Acknowledgements

  • NCICB/SAIC
    • Development team:
      • Hangjiong Chen
      • Scott Gustafson
      • Juergen Lorenz
      • John Moy
      • Sumeet Muju
      • Beth Neuberger
      • Phu Tran
      • Jim Zhou
    • QA:
      • Durga Addepalli
      • Andrew Shinohara
      • Ye Wu
  • NCICB/TerpSys
    • Don Swan, Jamie Keller
  • Research Triangle Institute
    • David Hall (webCGH)

  • NCICB
    • Sue Dubman, Mervi Heiskanen, Xioapeng Bian, Subha Madhavan, Carl Schaefer, Gilberto Fragoso, Denise Warzel… and Ken Buetow


caARRAY’s Architecture

Credits to Sumeet Muju and Phu Tran

slide23

caArray Architecture

[Figure: caArray architecture diagram. Only its text labels survive in this transcript, grouped here by apparent tier.]

  • Containers: Tomcat web container, EJB container, caCORE (caBIO, caDSR, EVS)
  • Web tier: browser, Struts servlet, JSP, data transfer objects (DTO)
  • EJB tier: vocab, security, protocol, experiment and other manager EJBs; MAGE-ML manager; MAGE-ML importer MDB; file uploader MDB
  • Persistence: object relational bridge (OJB), caArray DB, security DB, NetCDF API, file share
  • File transfer: FTP applet, FTP staging area, native data, MAGE-ML (Experiment and ArrayDesign)
  • Programmatic access: MAGE-OM API, MAGE-OM jar, MAGE-OM objects, RMI manager, MAGE-OM persistence

slide24

caArray Interfaces: caArray EJB API

  • caArray EJB API: Provides transaction control, asynchronous processing, service location, common security and distributed capabilities for the submission and retrieval of microarray experiments.
    • The caArray presentation layer uses this functionality via the caArray EJB API.
    • Data Transfer Objects (DTOs) carry data between the calling application and the EJBs.
    • The APIs can be used for federated access and submission of transaction data.
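
The DTO pattern mentioned above can be sketched as follows. This is an illustration in Python, not caArray code: the real API is built on Java EJBs, and the class names, fields and methods here (`ExperimentDTO`, `ExperimentManagerFacade`, `submit`, `retrieve`) are hypothetical. The point is only that the caller exchanges plain data carriers with a service boundary and never touches the persistence layer directly.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class ExperimentDTO:
    """Plain data carrier with no business logic (hypothetical fields)."""
    identifier: str
    name: str
    description: str = ""


class ExperimentManagerFacade:
    """Hypothetical stand-in for a manager EJB; callers only exchange DTOs."""

    def __init__(self) -> None:
        # An in-memory dict replaces the database for this sketch.
        self._store: Dict[str, ExperimentDTO] = {}

    def submit(self, dto: ExperimentDTO) -> str:
        """Validate and store a submitted experiment, returning its identifier."""
        if not dto.name:
            raise ValueError("experiment name is required")
        self._store[dto.identifier] = dto
        return dto.identifier

    def retrieve(self, identifier: str) -> ExperimentDTO:
        """Return the stored DTO; raises KeyError for unknown identifiers."""
        return self._store[identifier]
```

Because the DTO is immutable and self-contained, it can cross a process or network boundary without dragging server-side state along with it.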
slide25

caArray Interfaces: MAGE-OM API

  • MAGE-OM API: Provides fine-grained search and retrieval of all caArray data via a caBIO-like, RMI-based API.
    • The MAGE-OM API maps the MAGE objects to the new caArray database schema.
    • An RMI security module is incorporated for user- and group-level data access.
    • NetCDF API logic is incorporated for faster retrieval of data.
    • Built to be grid enabled.
caArray Middleware
  • Data Representation
    • Data Transfer Objects (DTO)
    • MicroArray Gene Expression Software Toolkit (MAGE-stk)
    • DTO - MAGE-stk Conversion
  • Data Persistence
    • Data Access Layer
      • ObJectRelationalBridge (OJB)
      • OJB Abstraction Layer and Data Access Objects (DAO)
    • EJB Layer
      • Stateless Session Façade
      • Bean-managed Persistence
    • NETCDF Files
      • Large Data Set
      • Fast Binary Access
  • MAGE-ML Import and Export
    • Message-Driven Beans
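
The "Large Data Set / Fast Binary Access" bullets under NETCDF Files rest on a simple idea: when every value has a fixed binary width, the position of any cell can be computed and read directly, without parsing the rest of the file. The sketch below illustrates that idea with Python's standard `struct` module. It is a stand-in only: it does not use the NetCDF format or API, and the function names are hypothetical.

```python
import io
import struct

# One little-endian 32-bit float per cell; the fixed width makes offsets computable.
VALUE = struct.Struct("<f")


def write_matrix(stream, rows):
    """Write rows of equal length as a flat, row-major block of floats."""
    for row in rows:
        for value in row:
            stream.write(VALUE.pack(value))


def read_cell(stream, n_cols, row, col):
    """Seek straight to one cell instead of reading the whole data set."""
    stream.seek((row * n_cols + col) * VALUE.size)
    return VALUE.unpack(stream.read(VALUE.size))[0]


# Usage: a 2 x 3 matrix held in memory; a real data set would live in a file.
buffer = io.BytesIO()
write_matrix(buffer, [[1.0, 2.5, 3.0], [4.0, 5.5, 6.0]])
```

The same lookup against a text or XML representation would require scanning and parsing everything before the requested value.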
MAGE-ML Import and Export: An Example

<MAGE-ML identifier="gov.nih.nci.ncicb.caarray:MAGEML:123:1">
  <AuditAndSecurity_package>
    <Contact_assnlist>
      <Person identifier="gov.nih.nci.ncicb.caarray:Person:456:1"
              lastName="Doe"
              firstName="John">
      </Person>
    </Contact_assnlist>
  </AuditAndSecurity_package>
  <Experiment_package>
    <Experiment_assnlist>
      <Experiment identifier="gov.nih.nci.ncicb.caarray:Experiment:789:1"
                  name="Sample Experiment">
        <Descriptions_assnlist>
          <Description text="This is a sample experiment."></Description>
        </Descriptions_assnlist>
        <Providers_assnreflist>
          <Person_ref identifier="gov.nih.nci.ncicb.caarray:Person:456:1"/>
        </Providers_assnreflist>
      </Experiment>
    </Experiment_assnlist>
  </Experiment_package>
</MAGE-ML>

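Slide callouts:

  • Identifiable element: an element that carries its own identifier, such as Person or Experiment
  • Referenced Identifiable element to be resolved: a reference such as Person_ref, which points to an Identifiable element by identifier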
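
To make the two callouts concrete, the sketch below parses a well-formed copy of the example with Python's standard `xml.etree.ElementTree` and pairs each `*_ref` element with the Identifiable element that has the same identifier. The helper names are hypothetical, and the convention that reference tags end in `_ref` is taken from this example only; caArray itself uses the Java MAGE-stk parser rather than anything shown here.

```python
import xml.etree.ElementTree as ET

MAGE_ML = """
<MAGE-ML identifier="gov.nih.nci.ncicb.caarray:MAGEML:123:1">
  <AuditAndSecurity_package>
    <Contact_assnlist>
      <Person identifier="gov.nih.nci.ncicb.caarray:Person:456:1"
              lastName="Doe" firstName="John"/>
    </Contact_assnlist>
  </AuditAndSecurity_package>
  <Experiment_package>
    <Experiment_assnlist>
      <Experiment identifier="gov.nih.nci.ncicb.caarray:Experiment:789:1"
                  name="Sample Experiment">
        <Providers_assnreflist>
          <Person_ref identifier="gov.nih.nci.ncicb.caarray:Person:456:1"/>
        </Providers_assnreflist>
      </Experiment>
    </Experiment_assnlist>
  </Experiment_package>
</MAGE-ML>
"""


def index_identifiables(root):
    """Map identifier -> element for every non-reference element that has one."""
    return {
        el.get("identifier"): el
        for el in root.iter()
        if el.get("identifier") and not el.tag.endswith("_ref")
    }


def resolve_references(root):
    """Pair each *_ref element with the Identifiable element it points to."""
    index = index_identifiables(root)
    return [
        (ref, index.get(ref.get("identifier")))
        for ref in root.iter()
        if ref.tag.endswith("_ref")
    ]


root = ET.fromstring(MAGE_ML.strip())
resolved = resolve_references(root)
```

A reference whose identifier is not present in the document resolves to `None` here; in an importer that case would trigger a database lookup instead.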

MAGE-ML Import and Export
  • The importer is a modified version of the MAGE-stk's SAX-based MAGE-ML parser, extended with a persistence mechanism that inserts, updates and resolves (looks up) parsed objects.
  • Any valid MAGE-ML can be imported. The importer assumes its input is valid; validation is typically done separately using ArrayExpress's MAGEValidator.
  • Identifiable objects are first resolved from the database by matching their identifier. If a match is found, the incoming object is applied as an update to the existing one.
    • The identifier is the globally unique key of a MAGE object across domains for its entire lifecycle.
    • The identifier is separate from the persisted MAGE-stk object's primary key, which is internal to caArray.
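
The resolve-then-insert-or-update rule described above can be sketched as follows. This is a Python illustration with an in-memory store standing in for the database; the class and method names are hypothetical and do not come from caArray or MAGE-stk. It also shows the separation between the global identifier and the internal primary key.

```python
import itertools


class IdentifiableStore:
    """In-memory stand-in for the database (hypothetical names throughout)."""

    def __init__(self):
        self._rows = {}                     # internal primary key -> attributes
        self._pk_by_identifier = {}         # global identifier -> primary key
        self._next_pk = itertools.count(1)  # primary keys never leave the store

    def resolve(self, identifier):
        """Look up the primary key for a global identifier, or None."""
        return self._pk_by_identifier.get(identifier)

    def save(self, identifier, attributes):
        """Insert a new object, or update the existing one with that identifier."""
        pk = self.resolve(identifier)
        if pk is None:
            pk = next(self._next_pk)
            self._pk_by_identifier[identifier] = pk
            self._rows[pk] = {"identifier": identifier}
            action = "inserted"
        else:
            action = "updated"
        self._rows[pk].update(attributes)
        return pk, action

    def get(self, pk):
        """Return a copy of the stored attributes for a primary key."""
        return dict(self._rows[pk])
```

Importing the same document twice therefore updates the existing rows rather than creating duplicates.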
MAGE-ML Export
  • The entire object graph of an object (e.g., an ArrayDesign or Experiment) is traversed to collect all Identifiable objects.
  • The MAGE-stk's MAGEJava object is used as the container for all the Identifiable objects collected.
    • When an Identifiable object is encountered, the appropriate method in the MAGEJava object is discovered and invoked using reflection to store the object in it.
  • Finally, MAGEJava.writeMAGEML(Writer) is invoked, which recursively invokes the same method on all the contained Identifiable objects.
  • Xerces's XMLSerializer pretty-prints the XML content as it is being written, adding line breaks and indentation.
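
The collection step of the export can be sketched as a graph traversal with reflective dispatch. The code below is a Python illustration under stated assumptions: `Container` is a hypothetical stand-in for MAGEJava, the `add_<classname>` naming rule is invented for the sketch, and `getattr` plays the role that Java reflection plays in the real exporter.

```python
class Identifiable:
    """Minimal object with an identifier and named associations (hypothetical)."""

    def __init__(self, identifier, **associations):
        self.identifier = identifier
        self.associations = associations  # association name -> list of Identifiable


class Person(Identifiable):
    pass


class Experiment(Identifiable):
    pass


class Container:
    """Hypothetical stand-in for the MAGEJava container object."""

    def __init__(self):
        self.persons = []
        self.experiments = []

    def add_person(self, obj):
        self.persons.append(obj)

    def add_experiment(self, obj):
        self.experiments.append(obj)


def collect(root, container):
    """Walk the object graph once, storing each Identifiable in the container."""
    seen = set()
    stack = [root]
    while stack:
        obj = stack.pop()
        if obj.identifier in seen:
            continue  # shared objects are stored only once
        seen.add(obj.identifier)
        # Reflection: derive the container method name from the object's class.
        adder = getattr(container, "add_" + type(obj).__name__.lower())
        adder(obj)
        for children in obj.associations.values():
            stack.extend(children)
    return container
```

Tracking visited identifiers matters because the same object, such as a contact person, is often reachable through several associations.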
slide30

A caArray Configuration

[Figure: configuration diagram showing caArray installations, one of them at NCICB. Only its text labels survive in this transcript: caArray 1, NCICB, caWorkbench, caBIO, caDSR / EVS, caArray schema, Security, NCICB Security, caARRAY EJB, MAGE-OM API, MAGE-ML, Java app, grid (future).]

slide31

webCGH

A web application for the visualization and analysis of array-based CGH and gene expression data

David Hall, Ph.D.

Research Triangle Institute

webCGH Functions
  • Visualization of copy number and gene expression levels
  • Interrogation of genome features
  • Data normalization and analysis
  • Virtual experiments
Data Flow

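[Figure: data flow diagram. Only its text labels survive in this transcript: Database (two instances), Adaptor (two instances), Transformer, Op (four instances), Analytical Pipeline, Cache, Plot Generator.]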
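
The labels on this slide suggest a pipeline of operations whose output is cached before plotting. The arrows of the diagram are not preserved in the transcript, so the ordering below is an assumption, and none of the names (`compose`, `center`, `clip`, `CachedPipeline`) come from webCGH. The sketch only shows one common way to chain operations and cache their result, in Python.

```python
from functools import reduce


def compose(*ops):
    """Chain operations so that each one receives the previous one's output."""
    return lambda data: reduce(lambda acc, op: op(acc), ops, data)


def center(values):
    """Example op: subtract the mean from every value."""
    mean = sum(values) / len(values)
    return [v - mean for v in values]


def clip(values, limit=1.0):
    """Example op: bound every value to the range [-limit, limit]."""
    return [max(-limit, min(limit, v)) for v in values]


class CachedPipeline:
    """Run the ops once per key and serve repeated requests from a cache."""

    def __init__(self, ops):
        self._run = compose(*ops)
        self._cache = {}
        self.misses = 0

    def __call__(self, key, load):
        if key not in self._cache:
            self.misses += 1
            # load() stands in for fetching data through a database adaptor.
            self._cache[key] = self._run(load())
        return self._cache[key]
```

A plot generator could then request the same data set repeatedly, for example when redrawing at a different zoom level, without rerunning the analysis.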

Past, Present, Future
  • Dec. 2003 – Version 1.0
    • Basic plots, analytics, GEDP
  • March 2005 – Version 2.0
    • More plots, analytics, caArray
  • Late April 2005 – Version 2.1
    • Mouse/human plots
    • CGH/gene expression
    • SKY/M-FISH & CGH integration
webCGH Team
  • NCICB
    • Mervi Heiskanen
  • RTI
    • David Hall
    • Vesselina Bakalov
    • Ying Chen
    • Matt Westlake
    • Bing Liu
    • Laxminarayana Ganapathi
    • Sheping Li
    • Stuart Allen