Session goals

• Review existing APIs, and how they fit with
  • the overall data architecture
  • the MBAT architecture
• Create a strategy for developing and assimilating uniform APIs, and set priorities
• Explore consequences for the MBAT architecture

Architecture; data types and interfaces
[Architecture diagram; labels listed top to bottom]
• Clients: MBAT, WOMBAT, other clients
• Data registration portlets; publication
• Discovery, Retrieval, Analysis, Viz, Integration APIs
• Mediator
• Catalog wrappers, following uniform web service APIs
• Catalogs and indexes: Spatial Registry; BIRNLex, etc.; CCDB; Publication
• Data types: Gene Expression; 2D Images; 2D vector segmentations; 3D Volumes; Surfaces; Phenotype / behavioral; 4+D Volumes (FMRI); Time Series
• Source wrappers, following uniform web service APIs
• Sources: SRB, other sources

APIs

Uniform Web Services API (towards BIRN-ML??)
• Web services are a standard way to access remote functionality across platforms and to assemble applications.
• Atlases access several data types: microarray data, 2D images, 3D volumes, surfaces, segmentations, annotations, phenotype/behavioral data, FMRI, time series, etc.
• Some of these data types have common representation models (e.g. MAGE). These models are typically large and exist in multiple incarnations, and the level of detail they provide is often not needed for data discovery or for common data access and integration tasks.
• It would therefore be useful to envelop such data in a common set of services that expose the most essential data characteristics and represent the common-denominator queries for each data type (e.g. getGenes, getProbes, getStructures...), which any dataset of that type must answer.
• Such services would support multiple clients, including atlases, the BDR interface, the mediator, etc.
• Plug-in architecture vs SOA: no contradiction (the former focuses on a single product, the latter on a larger system).
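The idea of common-denominator queries can be sketched in Python as an abstract interface that every source wrapper implements. This is only an illustration: the snake_case method names mirror getGenes, getProbes and getStructures from the slide, while `InMemorySource`, `genes_across_sources` and all records are hypothetical.

```python
from abc import ABC, abstractmethod


class GeneExpressionSource(ABC):
    """Common-denominator queries every gene expression source answers."""

    @abstractmethod
    def get_genes(self, structure=None):
        """Genes known to the source, optionally limited to one structure."""

    @abstractmethod
    def get_probes(self, gene):
        """Probe IDs for a gene (may be empty if the source has no probes)."""

    @abstractmethod
    def get_structures(self, gene):
        """Anatomical structures in which the gene is reported."""


class InMemorySource(GeneExpressionSource):
    """Hypothetical wrapper over (gene, probe, structure) records."""

    def __init__(self, name, records):
        self.name = name
        self.records = records

    def get_genes(self, structure=None):
        return sorted({g for g, _, s in self.records if structure in (None, s)})

    def get_probes(self, gene):
        return sorted({p for g, p, _ in self.records if g == gene and p is not None})

    def get_structures(self, gene):
        return sorted({s for g, _, s in self.records if g == gene})


def genes_across_sources(sources, structure):
    """A client (atlas, mediator, ...) issues the same call to every source."""
    return {src.name: src.get_genes(structure) for src in sources}


# Made-up records, for illustration only.
source_a = InMemorySource("sourceA", [("Coch", "probe1", "STRd"), ("Tspan2", "probe2", "CTX")])
source_b = InMemorySource("sourceB", [("Coch", None, "STRd")])  # a source without probes
result = genes_across_sources([source_a, source_b], "STRd")
print(result)
```

A client written against `GeneExpressionSource` does not need to know which source it is talking to, which is the point of the uniform API.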

Issues/steps (for MA and 2D)
• Figure out how search requests and outputs are implemented in MBAT (http://www.loni.ucla.edu/twiki/bin/view/MouseBIRN/WebServices), the MA module in BIRN (http://microarray.nbirn.net/), and GN (http://www.genenetwork.org/CGIDoc.html)
• Examine MAGE and see how the same MA requests and outputs can be expressed in MAGE. Then, depending on the results of (1), either abandon MAGE in favor of some simpler XML (potentially embedded in XCEDE?), or rely on MAGE constructs (and include them in XCEDE wrappers for gene expression sources, as a foreign namespace?). This shall be done vis-à-vis the common information requirements of client applications (e.g. GetProbes?, GetGenes?, GetStructures?, etc.)
• In parallel, review the schema used in the MA module for whether it sufficiently reflects the information model for GE data, and update as necessary
• If we decide on the XCEDE route, make sure the mediator can connect with XCEDE sources, be it a database source or a web source in an XCEDE wrapper (see http://mediator.nbirn.net:8080/axis/services/MedTestBService?wsdl – this would involve passing web service calls via the executeQuery method, and conversion between XML output and the mediator’s recordset)
• Identify additional sources or databases to be wrapped in the same API (GN, Gensat, ABA, BIRN MA + GEO + UCSC (VISIGENE) – for MA data; CCDB, ABA, ArcIMS, spatial registry, Gensat – for 2D). Then finalize the signatures
• Make sure terms used in queries and in the output are tagged with BIRNLex terms (e.g. develop controlled vocabularies for each term)
• Implement web services for the GN and MA module (incl. testing/deployment)
• Based on results of (3), update data publication tools (i.e. software for loading data from common CSV and text files into the MA module), and make sure controlled vocabularies are enforced
• Make sure AIDB’s XCEDE wrapper supports the services as well(?). Now CCDB-based.
• Publish and document web services; develop a series of examples of how they can be called from various programming environments and applications
• GEO API: connect the region names with MBAT: need semantic registration; possibly scrape the GEO catalog, reconcile labels with MBAT semantics, and have a service wrapper into GEO data

About XCEDE and MA
• XCEDE is the common schema providing access to BIRN databases.
• HID and the emerging AIDB are being wrapped in XCEDE (see http://www.na-mic.org/Wiki/index.php/Slicer3:Remote_Data_Handling) and, as deployed at BIRN-CC, http://bcc-dev-mediator.nbirn.net:8080/axis/services/HidQuerierWS?wsdl. Web services are being written against XCEDE, so both HID and AIDB will be accessible through XCEDE web services. The goal, therefore, could be to route common metadata requests against gene expression, 2D images, and 3D volumes data through XCEDE, and to extend XCEDE to support additional requests.
• If we switch to CCDB as the image catalog, what components of XCEDE shall be retained?
• MAGE-ML/FUGE/MAGEv2
  • MAGE-ML is derived from the Microarray Gene Expression Object Model (MAGE-OM), which is developed and described using the Unified Modelling Language (UML). MAGE-ML is designed to describe microarray designs, microarray manufacturing information, microarray experiment setup and execution information, gene expression data, and data analysis results. MAGEv2 is being built on top of FuGE as an extension that adds microarray-specific classes (extending Data as ArrayDesign, DesignElementData, etc.; Material as Array, QPCRPlate, etc.; and DimensionElement as DesignElement, extended by Feature, Reporter, and CompositeElement).
• FUGE Home Page
• MAGE Home Page

From XCEDE API
• Gets/Puts:
  • GetProjects, GetProject, GetProjectDetail
  • GetSubject, GetSubjects, GetSubjectDetail
  • GetVisits,…
  • GetStudies,…
  • GetSeries,…
  • Get Data Acquisitions
  • Get Assessments,…
  • getData, Get DataSizeEstimate
• Also some getCapabilities returns (e.g. getMethods)

API Examples: Mediator services
• http://mediator.nbirn.net:8080/axis/services/MedTestBService?wsdl
• SOAP method: executeQuery (loginTimeoutSecs, maxByteCountPerBatch, queryID, queryLifeInSeconds, queryParameters.item0.name, queryParameters.item0.value, queryParameters.item1.name, queryParameters.item1.value, queryString, queryTimeoutSecs, resultLifeInSeconds, securityCertificateString)
• SOAP methods: fetchNextResultBatch, fetchPreviousResultBatch, fetchCurrentResultBatch, fetchRelativeResultBatch, fetchResultBatch, getErrorMessage, getStatistics
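The executeQuery / fetchNextResultBatch pair suggests a cursor-style client loop. The Python sketch below shows that pattern against an in-memory stand-in, not the real SOAP service. `StubMediator`, its simplified signatures, and the convention that an empty batch marks the end of the results are all assumptions; the slide does not say how the real service signals completion.

```python
class StubMediator:
    """In-memory stand-in for the mediator service (not the real SOAP client)."""

    def __init__(self, rows, batch_size):
        self._rows = rows
        self._batch_size = batch_size
        self._cursors = {}

    def executeQuery(self, queryString, queryParameters=None):
        # Assumption: the call returns an ID used by later fetch calls.
        query_id = f"q{len(self._cursors) + 1}"
        self._cursors[query_id] = 0
        return query_id

    def fetchNextResultBatch(self, queryID):
        start = self._cursors[queryID]
        batch = self._rows[start:start + self._batch_size]
        self._cursors[queryID] = start + len(batch)
        return batch


def fetch_all(mediator, query_string):
    """Run a query and collect every batch until an empty one comes back."""
    query_id = mediator.executeQuery(query_string)
    rows = []
    while True:
        batch = mediator.fetchNextResultBatch(query_id)
        if not batch:  # assumed end-of-results convention
            break
        rows.extend(batch)
    return rows


mediator = StubMediator(rows=list(range(7)), batch_size=3)
all_rows = fetch_all(mediator, "hypothetical query string")
print(all_rows)
```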

API Examples: BIRN MA
• BIRN Microarray (http://microarray.nbirn.net/get_data.php? )
• REST service parameters:
  • cmd=<get_probes|get_my_probes>
  • user_id=<int>
  • dset=<all|null>
  • strain=<string>
  • keyword=<string>
  • species=<string>
  • sex=<string>
  • stage=<string>
  • subject_group=<string>
  • anatomy=<string>
  • probe_id=<string>
• GN will need several more:
  • platformID
  • GeneID (proxied by ProbeID here?)
  • ExonID
  • “bestID” (sort by quality, based on a user-selected metric, e.g. highest expression)
• Infomodel: Species – probes – structures – genes
• Passing SQL queries as opposed to just the filters that we have…
• Is there a way to unify what is returned from GN, ABA, BIRN-MA?
• Matrix (from ABA-Neuroblast): which are the best covariants (spatially, semantically, temporally); which genes were most modulated
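As a rough illustration of how a client might assemble such a REST request, the Python sketch below builds the query string from the parameters listed on this slide. It makes no network call. The function name `build_probe_query` and the filter values ("mouse", "hippocampus", "male") are made up, and the accepted value formats are not documented here.

```python
from urllib.parse import urlencode

BASE_URL = "http://microarray.nbirn.net/get_data.php"
COMMANDS = ("get_probes", "get_my_probes")
# Filter names taken from the slide; value formats are assumptions.
FILTERS = {
    "user_id", "dset", "strain", "keyword", "species", "sex",
    "stage", "subject_group", "anatomy", "probe_id",
}


def build_probe_query(cmd="get_probes", **filters):
    """Return the request URL for a probe search; rejects unknown names."""
    if cmd not in COMMANDS:
        raise ValueError(f"unknown cmd: {cmd}")
    unknown = set(filters) - FILTERS
    if unknown:
        raise ValueError(f"unknown filters: {sorted(unknown)}")
    params = {"cmd": cmd}
    params.update({k: v for k, v in filters.items() if v is not None})
    return BASE_URL + "?" + urlencode(params)


url = build_probe_query(species="mouse", anatomy="hippocampus", sex="male")
print(url)
```

All filters are combined in one request, which matches the implicit "AND" semantics noted under Feature Requests.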

API examples: Gensat
• GENSAT: http://maloney.loni.ucla.edu:8080/axis/GensatSource.jws?wsdl
• getGene(geneSym, geneName, exprLevel, anatStruc, stage, sex)
  • Doesn’t have probes; ABA doesn’t have them either (= genes)
• get2DImage(geneSym, geneName, exprLevel, anatStruc, stage, sex, plane)
  • No spatial info
• getDataTypes(dataSourceID)
  • Essentially, a capabilities request

API Examples: GeneNetwork
• http://www.genenetwork.org/webqtl/WebQTL.py?
  • cmd=birn (also: genotype, get, trait, map, interval, correlation…)
  • species=XXXX
  • tissue=XXXX
  • symbol=XXXX
  • ProbeId=XXXX
  • function=XXXX
  • Strain=XXXX
• Check with Amarnath on ontology mapping for genes in GN
• http://www.genenetwork.org/CGIDoc.html

Our expectations for MA data More?

Allen Brain Atlas API
• The API to the Allen Brain Atlas-Mouse Brain consists of a set of services allowing users to programmatically download the complete high-resolution images, 3D volumes, and metadata for more than 20,000 genes in the database.
• In addition to the documentation, a demo has been created to demonstrate the use of the services of the API. The demo's source code is also available…

ABA API details (expressions)
• ImageSeries Structure Expression (ImageSeriesID) -> XML in ABA schema
• Expression Energy Volumes (ImageSeriesID) -> sparse volume file (x,y,z for each voxel where expression energy value > 0, + density of expression)
    Comment:Smoothed energy volume for gene Tspan2 imageseriesId 75144618
    Dimensions:67,41,58
    38,14,4,2.01994e-06
    39,14,4,2.37068e-05
    40,14,4,3.08554e-05
    . . .
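A client could read the sparse volume sample above with a few lines of Python. This is a minimal sketch that assumes one record per line, a `Comment:` line, a `Dimensions:` line, and `x,y,z,value` rows, as the sample suggests; the actual file format may carry more fields, and `parse_sparse_volume` is a hypothetical name.

```python
def parse_sparse_volume(text):
    """Parse a sparse energy volume into (comment, dimensions, voxels)."""
    comment = None
    dims = None
    voxels = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith("Comment:"):
            comment = line[len("Comment:"):].strip()
        elif line.startswith("Dimensions:"):
            dims = tuple(int(v) for v in line[len("Dimensions:"):].split(","))
        else:
            # Data row: voxel coordinates followed by the expression value.
            x, y, z, value = line.split(",")
            voxels[(int(x), int(y), int(z))] = float(value)
    return comment, dims, voxels


sample = """Comment:Smoothed energy volume for gene Tspan2 imageseriesId 75144618
Dimensions:67,41,58
38,14,4,2.01994e-06
39,14,4,2.37068e-05
40,14,4,3.08554e-05
"""

comment, dims, voxels = parse_sparse_volume(sample)
print(dims, len(voxels))
```

Keeping the voxels in a dictionary preserves the sparseness of the file; only voxels with a value above zero are stored.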

ABA API (Genes)
• No need to put sex or strain
• Need to control for dummy entries on front end
• Resolution is another issue
• Provenance information, when multiple probes (have on their site)
• Status=OK|failed|single_best
• Genes (GeneSymbol) -> XML (image-series, gene-expressions)
    <image-series>
      <age>56</age>
      <geneid>12593</geneid>
      <imageseriesdisplayname>Coch-Coronal-05-2779</imageseriesdisplayname>
      <imageseriesid>71717614</imageseriesid>
      <ncbiaccessionnumber>NM_007728</ncbiaccessionnumber>
      <plane>coronal</plane>
      <probeorientation>antisense</probeorientation>
      <projectname>0310</projectname>
      <riboprobename>RP_050623_02_G08</riboprobename>
      <sex>male</sex>
      <specimenid>05-2779</specimenid>
      <strain>C57BL/6J</strain>
      <templateid>143280</templateid>
      <transcriptgi>31982455</transcriptgi>
      <transcriptid>9068</transcriptid>
      <transcriptname />
      <treatmenttype>ISH</treatmenttype>
    </image-series>
    <gene-expression>
      <avgdensity>100.0</avgdensity>
      <avglevel>93.9770317077637</avglevel>
      <geneid>12593</geneid>
      <projectcode>0310</projectcode>
      <rgb>#a0d8e8</rgb>
      <structureid>343</structureid>
      <structurelabel>STRd</structurelabel>
      <structurename>Striatum dorsal region</structurename>
    </gene-expression>
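To show how a client might pull per-structure expression out of such a response, here is a short Python sketch using the standard library XML parser. The `<response>` root element is an assumption added so the fragment parses (the slide shows only the inner elements), the sample is abbreviated to a few fields, and `expression_by_structure` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

# Abbreviated from the slide; the <response> wrapper is assumed.
SAMPLE = """<response>
  <image-series>
    <geneid>12593</geneid>
    <imageseriesid>71717614</imageseriesid>
    <plane>coronal</plane>
  </image-series>
  <gene-expression>
    <avgdensity>100.0</avgdensity>
    <avglevel>93.9770317077637</avglevel>
    <geneid>12593</geneid>
    <structureid>343</structureid>
    <structurelabel>STRd</structurelabel>
    <structurename>Striatum dorsal region</structurename>
  </gene-expression>
</response>"""


def expression_by_structure(xml_text):
    """Map structure label -> average level and density for one gene."""
    root = ET.fromstring(xml_text)
    result = {}
    for ge in root.iter("gene-expression"):
        label = ge.findtext("structurelabel")
        result[label] = {
            "name": ge.findtext("structurename"),
            "level": float(ge.findtext("avglevel")),
            "density": float(ge.findtext("avgdensity")),
        }
    return result


expr = expression_by_structure(SAMPLE)
print(expr["STRd"])
```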

ABA API Details (images)
• Get Image: http://www.brain-map.org/aba/api/image?zoom=[zoom]&path=[filePath]&mime=[mime]&top=[top]&left=[left]&width=[width]&height=[height]
  • Default output = jpeg; zooms = 0…6; path = file path to a file in the image series
  • Top, left – in image coords, for the full-size image (implied Zoomify images)
• ImageProperties (by path; by ImageID):
  • <IMAGE_PROPERTIES WIDTH="15185" HEIGHT="8817" NUMTILES="2832" NUMTIERS="7" NUMIMAGES="1" VERSION="1.8" TILESIZE="256" />
• ImageSeries (ID)
• Have GetImage feature implemented in GN (per Rob)
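The two pieces above can be tied together in a small Python sketch: one helper fills in the Get Image URL template, the other derives the tile pyramid from the image properties. The halving rule in `zoomify_tiers` is an assumption about how the tiers are built; it is included because it reproduces NUMTIERS="7" and NUMTILES="2832" for the sample. The file path used is hypothetical, and no request is sent.

```python
import math
from urllib.parse import urlencode

IMAGE_ENDPOINT = "http://www.brain-map.org/aba/api/image"


def build_image_url(path, zoom, top, left, width, height, mime=None):
    """Fill in the Get Image template; zoom must be within 0..6."""
    if not 0 <= zoom <= 6:
        raise ValueError("zoom must be between 0 and 6")
    params = {"zoom": zoom, "path": path}
    if mime is not None:  # omitted -> service default (jpeg, per the slide)
        params["mime"] = mime
    params.update(top=top, left=left, width=width, height=height)
    return IMAGE_ENDPOINT + "?" + urlencode(params)


def zoomify_tiers(width, height, tile_size):
    """List (width, height, tile count) per tier, full resolution first.

    Assumed rule: halve both dimensions until the image fits in one tile.
    """
    tiers = []
    w, h = width, height
    while True:
        tiles = math.ceil(w / tile_size) * math.ceil(h / tile_size)
        tiers.append((w, h, tiles))
        if w <= tile_size and h <= tile_size:
            break
        w, h = max(1, w // 2), max(1, h // 2)
    return tiers


tiers = zoomify_tiers(15185, 8817, 256)
total_tiles = sum(tiles for _, _, tiles in tiers)
print(len(tiers), total_tiles)

# "hypothetical/file/path" is a placeholder, not a real image path.
url = build_image_url("hypothetical/file/path", zoom=3, top=0, left=0, width=512, height=512)
print(url)
```

Seven tiers also line up with the zoom range 0…6 in the URL template.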

ABA API Demos

Other existing APIs (spatial)
• The registry has a web service interface to find available images in an ROI:
  • http://smartatlas.nbirn.net:8080/axis/services/ImageMetadataForROI?wsdl
  • <request><category>mouse</category><regionofinterest>-2,2,-2,-2,2,-2,2,2,-2,2</regionofinterest><slicenumber>031</slicenumber></request>
• Requesting image fragments:
  • E.g. http://geon15.sdsc.edu/axis/services/ImageQueryService?wsdl
  • The name of the method is getSimpleImageWithSpecs
  • Method inputs:
    • host - 132.239.131.188
    • serviceName - slice_15b_warped1194307428796
    • minx - -6.035011; miny - -8.165376; maxx - 12.088584; maxy - 0.812765
    • imageHeight - 800; imageWidth - 600
• There are standards in the GIS world on how to exchange spatial data, e.g. GML simple features, which are the basis of many application schemas. E.g.
  • <gml:Point srsName="urn:ogc:def:crs:EPSG:6.6:4269"> <gml:pos>45.256 -71.92</gml:pos> </gml:Point>
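For illustration, the registry request shown above can be assembled programmatically. The Python sketch below only serializes the XML payload; how it is passed to the ImageMetadataForROI service, and how the coordinate list is interpreted, are not covered by the slide, so `build_roi_request` is a hypothetical helper rather than part of the service.

```python
import xml.etree.ElementTree as ET


def build_roi_request(category, roi, slice_number):
    """Serialize a region-of-interest request as shown on the slide."""
    request = ET.Element("request")
    ET.SubElement(request, "category").text = category
    # Coordinates are written as one comma-separated list, as in the sample.
    ET.SubElement(request, "regionofinterest").text = ",".join(str(v) for v in roi)
    ET.SubElement(request, "slicenumber").text = slice_number
    return ET.tostring(request, encoding="unicode")


xml_request = build_roi_request("mouse", [-2, 2, -2, -2, 2, -2, 2, 2, -2, 2], "031")
print(xml_request)
```

The slice number is kept as a string so the leading zero in "031" survives.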

Catalog, and catalog services
• The current model:
  • MBAT registers individual source services and queries them for both metadata and data
  • Response time is determined by the slowest of them
• Catalog-based:
  • For each data type, there is a catalog that stores information for initial discovery
    • E.g. probes: ABA:probe1; geneA…;…. GN:probe1 (ensuring unique probe IDs…)
    • Or images: ABA:image1; type=zoomify; URL=…;
  • Discovery queries (getProbes, getProbeInfo, getTissues, getTissueInfo, etc.) are executed against the catalog, while getData calls go against data sources
  • The catalog is synched with data sources periodically (sync services)
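The catalog-based model can be sketched in Python as follows: discovery runs against a local index that is refreshed by a sync step, and only data retrieval touches the sources. The `Source` and `Catalog` classes, their method names, and the records are all hypothetical; the `SOURCE:probe` ID scheme follows the "ABA:probe1 / GN:probe1" example on the slide.

```python
class Source:
    """Hypothetical data source wrapper holding probes and their values."""

    def __init__(self, name, probes):
        self.name = name
        self._probes = probes  # probe id -> {"gene": ..., "values": [...]}
        self.data_calls = 0    # counts how often the source is hit for data

    def list_probes(self):
        return {pid: rec["gene"] for pid, rec in self._probes.items()}

    def get_data(self, probe_id):
        self.data_calls += 1
        return self._probes[probe_id]["values"]


class Catalog:
    """Discovery index over several sources; data stays at the sources."""

    def __init__(self, sources):
        self._sources = {s.name: s for s in sources}
        self._index = {}

    def sync(self):
        # Prefix probe IDs with the source name so they stay unique.
        self._index = {
            f"{name}:{pid}": gene
            for name, src in self._sources.items()
            for pid, gene in src.list_probes().items()
        }

    def get_probes(self, gene):
        """Discovery query: answered from the catalog only."""
        return sorted(k for k, g in self._index.items() if g == gene)

    def get_data(self, qualified_id):
        """Retrieval: forwarded to the owning source."""
        name, pid = qualified_id.split(":", 1)
        return self._sources[name].get_data(pid)


# Made-up sources and values.
aba = Source("ABA", {"probe1": {"gene": "geneA", "values": [0.1, 0.2]}})
gn = Source("GN", {"probe1": {"gene": "geneA", "values": [5.0]}})
catalog = Catalog([aba, gn])
catalog.sync()
hits = catalog.get_probes("geneA")
data = catalog.get_data("GN:probe1")
print(hits, data, aba.data_calls)
```

In this arrangement a slow source delays only the retrieval calls addressed to it, not the discovery queries.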

Feature Requests
• Ability to mix "AND" and "OR" in queries
  • (currently all queries assume "AND" of all parameters)
  • might need a "language" to specify the query
• Ability to request "pages" of results
  • for example, show me the first 10 results
  • similar to most search engine results
• More requests: more complete SQL
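One possible shape for the first two requests is sketched below in Python: a tiny nested query expression that mixes AND and OR, plus page-based slicing of the results. This is not an existing MBAT feature; the `AND`/`OR`/`EQ` helpers, the `search` function, and the records are all invented for illustration.

```python
def AND(*clauses):
    return ("and", clauses)


def OR(*clauses):
    return ("or", clauses)


def EQ(field, value):
    return ("eq", field, value)


def matches(record, query):
    """Evaluate a nested AND/OR/EQ expression against one record."""
    op = query[0]
    if op == "eq":
        _, field, value = query
        return record.get(field) == value
    if op == "and":
        return all(matches(record, q) for q in query[1])
    if op == "or":
        return any(matches(record, q) for q in query[1])
    raise ValueError(f"unknown operator: {op}")


def search(records, query, page=1, page_size=10):
    """Return one page of hits plus the total, like a search engine."""
    hits = [r for r in records if matches(r, query)]
    start = (page - 1) * page_size
    return {"total": len(hits), "page": page, "results": hits[start:start + page_size]}


# Made-up records.
records = [
    {"species": "mouse", "sex": "male", "anatomy": "hippocampus"},
    {"species": "mouse", "sex": "female", "anatomy": "striatum"},
    {"species": "rat", "sex": "male", "anatomy": "hippocampus"},
    {"species": "mouse", "sex": "male", "anatomy": "cortex"},
]
query = AND(EQ("species", "mouse"), OR(EQ("anatomy", "hippocampus"), EQ("anatomy", "striatum")))
first_page = search(records, query, page=1, page_size=1)
second_page = search(records, query, page=2, page_size=1)
print(first_page["total"], first_page["results"], second_page["results"])
```

A nested structure like this could also be serialized into a request parameter, which is one way to provide the query "language" mentioned above.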

Issues/steps (for MA and 2D)
• Figure out how search requests and outputs are implemented in MBAT (http://www.loni.ucla.edu/twiki/bin/view/MouseBIRN/WebServices), the MA module in BIRN (http://microarray.nbirn.net/), and GN (http://www.genenetwork.org/CGIDoc.html)
• Examine MAGE and see how the same MA requests and outputs can be expressed in MAGE. Then, depending on the results of (1), either abandon MAGE in favor of some simpler XML (potentially embedded in XCEDE?), or rely on MAGE constructs (and include them in XCEDE wrappers for gene expression sources, as a foreign namespace?). This shall be done vis-à-vis the common information requirements of client applications (e.g. GetProbes?, GetGenes?, GetStructures?, etc.)
• In parallel, review the schema used in the MA module for whether it sufficiently reflects the information model for GE data, and update as necessary
• If we decide on the XCEDE route, make sure the mediator can connect with XCEDE sources, be it a database source or a web source in an XCEDE wrapper (see http://mediator.nbirn.net:8080/axis/services/MedTestBService?wsdl – this would involve passing web service calls via the executeQuery method, and conversion between XCEDE and the mediator’s recordset)
• Identify additional sources or databases to be wrapped in the same API (GN, Gensat, ABA, BIRN MA – for MA data; CCDB, ABA, ArcIMS, spatial registry, Gensat – for 2D). Then finalize the signatures
• Make sure terms used in queries and in the output are tagged with BIRNLex terms (e.g. develop controlled vocabularies for each term)
• Implement web services for the GN and MA module (incl. testing/deployment)
• Based on results of (3), update data publication tools (i.e. software for loading data from common CSV and text files into the MA module), and make sure controlled vocabularies are enforced
• Make sure AIDB’s XCEDE wrapper supports the services as well(?).
• Publish and document web services; develop a series of examples of how they can be called from various programming environments and applications

API for GE/MA
• What are use cases?
• What is the information model, and what is the catalog?
  • a) species -> subjects (age, sex, etc.) -> strains -> probes -> genes -> tissues
• API for discovery… (by tissues, genes, probe sets,…)
  • getSpecies() -> list of species in the registry
  • getSpeciesInfo -> species metadata from the registry
  • getProbes (species, strain, sex, age)
  • getProbeInfo
• API for retrieval…

[Diagram: GE/MA information model and API calls; labels as they appear]
• Entities: Species; Genetic manipulations (biowarehouse); Subjects (stage|age, sex); Strains (genetic manipulations, CCDB); Tissues (incl. pointer to tissue vocabulary); Probes (resolution, failed or not); Genes (from a master list); Probe-tissue catalog; GE values (categorical or numeric); Manufacturer/provenance info; normalization, units, etc.
• Calls: getSpecies, getSpeciesInfo; getSubjects, ..; getGenes({probeseries}), .. getGenes({tissue}); getProbes({Genes}); getProbeTissues; getGE({probeTissues})
• Discovery stage: ultimately up to the getProbeTissues call
• Data retrieval: getGE

API for 2D images
• What are use cases?
• What is the information model, and what is the catalog?
• API for discovery… (by ROI, by labeled regions, by spatial relations…also, getImageInfo? getImageSeriesInfo?)
• API for retrieval (getImage? getImageStack?)
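Discovery by ROI could, in its simplest form, be a bounding-box test against an image catalog. The Python sketch below assumes every catalog entry records its spatial extent as minx/miny/maxx/maxy in a shared coordinate system; the `ImageRecord` type, the `get_images` function, and the extents are hypothetical, and labeled regions or other spatial relations would need more than this.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ImageRecord:
    """Hypothetical catalog entry: image ID plus its spatial extent."""
    image_id: str
    minx: float
    miny: float
    maxx: float
    maxy: float


def intersects(img, roi):
    """True if the image extent overlaps the ROI (minx, miny, maxx, maxy)."""
    minx, miny, maxx, maxy = roi
    return (img.minx <= maxx and img.maxx >= minx
            and img.miny <= maxy and img.maxy >= miny)


def get_images(catalog, roi):
    """Discovery call: IDs of catalog images whose extent overlaps the ROI."""
    return [img.image_id for img in catalog if intersects(img, roi)]


# Made-up extents.
catalog = [
    ImageRecord("img1", -6.0, -8.0, 12.0, 1.0),
    ImageRecord("img2", 20.0, 20.0, 30.0, 30.0),
]
found = get_images(catalog, (-2.0, -2.0, 2.0, 2.0))
print(found)
```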

[Diagram: 2D image information model and API calls; labels as they appear]
• Use case: find expression for a gene based on gene name or abbreviation
• Entities: Species, strains, subjects, projects; Genes (name, abbrev); Proteins; Gene|protein-tissue catalog; Image catalog (type of image, gene expressed, spatial characteristics: coord system, spatial extent, plane, local XYZs); Image stacks and groups; Imagery servers
• Calls: getImages; getImageInfo (size, orientation..); getImageStacks; getImageStackInfo
• Stages: discovery, retrieval

Have “representatives” for each of the GE and 2D sources, who would vet the schema for whether the sources can be mapped into it without significant losses
