1 / 42

cancer Bioinformatics Infrastructure Objects (caBIO)

cancer Bioinformatics Infrastructure Objects (caBIO). Providing Innovative and Integrative Informatics Solutions Himanso Sahni (SAIC) Sharon Settnek (SAIC). caBIO.

lulu
Download Presentation

cancer Bioinformatics Infrastructure Objects (caBIO)

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. cancer Bioinformatics Infrastructure Objects (caBIO) Providing Innovative and Integrative Informatics Solutions Himanso Sahni (SAIC) Sharon Settnek (SAIC)

  2. caBIO • Thecancer Bioinformatics Infrastructure Objects (caBIO) is an infrastructure which integrates internal and publicly available bioinformatics data spanning multiple scientific disciplines • caBIO objects simulate the behavior of actual bioinformatics components such as genes, chromosomes, sequences, ontologies, trials, agents, etc. • caBIO provides access to a variety of bioinformatics data sources including, Unigene, Homologene, LocusLink, RefSeq, BioCarta, GoldenPath (via DAS), and NCICB’s CGAP (Cancer Genome Anatomy Project) and GAI (Genetic Annotation Initiative) data repositories • caBIO is “open source” and provides an abstraction layer that allows developers to access genomic information using a standardized tool set without concerns for implementation details and data management

  3. caBIO Object Model

  4. Model Extensions Clinical Protocols • A clinical protocols object model facilitates the integration of clinical data with genomic data Animal Models • An animal models object model supports queries between human and animal models of cancer

  5. MAGE-OM Extension • caBIO is currently being extended to included the MAGE-OM object model • Microarray data from the NCICB Gene Expression Data Portal (GEDP) will be retrieved via the caBIO MAGE API

  6. caBIO APIs • A Java API is available for Java programmers • A Simple Object Access Protocol (SOAP) API is provided for non-Java programmers • An HTTP API is also available • Developers can request XML or HTML (via XSLT)

  7. Powered by caBIO! caBIO ApplicationsCancer Molecular Analysis Project (CMAP)

  8. Powered by caBIO! Molecular Targets • A collection of genes organized by pathways can be displayed facilitating the evaluation of anomalies

  9. Powered by caBIO! Targeted Agents • Researchers can retrieve information about agents linked to multiple targets and contexts

  10. Powered by caBIO! Clinical Trials • Researchers can view detailed information about therapeutic trials associated with histology types and agents • A Clinical Protocols Portal is available to allow researchers to search and submit clinical protocols affiliated with Specialized Programs of Research Excellence (SPOREs)

  11. caBIO Architecture • caBIO was designed using a J2EE architecture with client interfaces, server components, back-end objects and data sources • Clients (browsers, applications) can receive information (HTML and XML) from back-end objects over HTTP • Client applications can also communicate with back-end objects via Java RMI (Java applications) • Non-Java based applications can communicate via SOAP or HTTP • Server components communicate with back-end objects via Java RMI • Back-end objects communicate directly with data sources (database, URLs, flat files) • caBIO web services can be advertised to facilitate information sharing • RDF can be used to advertise content to crawlers and agents • A UDDI registry may be configured to advertise services • caBIO services can be advertised via bioMOBY central

  12. Clients Presentation Layer Object Layer Data Sources Web Server Servlet Container JSPs External Databases HTML/HTTP Data Access Objects Servlets Object Managers Browsers SOAP Engine JDBC EVS XML/HTTP Other Apps RMI caDSR UI Bean Domain Objects SOAP HTTP XML Builder Chromosomes Genes URLs XSLT Engine Tissues Clusters Agents RDF FTP Libraries Sequences DTDs Flat Files XML Docs Diseases XSL Style Sheet Other Java Apps caBIO Architecture

  13. Data Sources External Public Databases CGAP Database UniGene RefSeq Reference Sequences Genes, Sequences Chromosomes caBIO GO BioCarta CGAP/ GAI CTEP/ SPOREs Gene Ontology Pathways SNPs Trials Locus Link Homolo Gene UCSC Golden Path DAS Homologs Gene Loci, Locus Link Summaries Gene Annotations Genes, Sequences

  14. caBIO Benefits • Provides an abstraction layer that allows developers to access genomic information using a standardized tool set without concerns for implementation details • Permits access to allow developers to obtain the information they need from a variety of data sources without data management • Manages the display of large volumes of data to assist in load balancing • Provides an effective mechanism for performing complex queries that rely on diverse data sources • Facilitates information sharing without managing linkages between multiple data sources

  15. caBIO Usage Facilitates solving Complex Queries such as: Find me the Pathways, with Genes that are expressed in tissues with a particular Histopathology that includes a particular Organ and a particular Disease.

  16. Java Packages • gov.nih.nci.caBIO.bean • Contains domain objects to access genomic and biomedical components • gov.nih.nci.caBIO.util.das • Primary interface to the UCSC DAS • Uses JAXB to convert DAS DTDs to objects • gov.nih.nci.caBIO.evs • Provides synonym search and concept based search to the NCI’s Enterprise Vocabulary System (EVS) • gov.nih.nci.caBIO.webservices • Provides access to caBIO via SOAP • gov.nih.nci.caBIO.servlet • Provides access to caBIO via HTTP • gov.nih.nci.caBIO.util • Provides interface to caBIO utilities

  17. Java API Domain objects have companion SearchCriteria objects Gene myGene = new Gene(); GeneSearchCriteria criteria = new GeneSearchCriteria(); criteria.setSymbol("pTEN"); SearchResult result = myGene.search(criteria); Gene[] genes = (Gene[]) result.getResultSet(); • caBIO supports nested SearchCriteria • SearchCriteria from one object type can be fed as parameters into SearchCriteria of another type. • Complex queries without any SQL

  18. Traverse Relationships in Model Find me thePathways, withGenesthat are expressed inTissueswith a particularHistopathologythat includes a particularOrganand a particularDisease. INPUT Disease Histopathology Genes Organ Pathways OUTPUT

  19. findPathway Input disease, organ; create SearchCriteria Objects: public Pathway[] findPathway(String disease, String organ) { DiseaseSearchCriteria diseaseCriteria = new DiseaseSearchCriteria(); OrganSearchCriteria organCriteria = new OrganSearchCriteria(); HistopathologySearchCriteria histoCriteria = new HistopathologySearchCriteria(); GeneSearchCriteria geneCriteria = new GeneSearchCriteria(); PathwaySearchCriteria pathCriteria = new PathwaySearchCriteria();

  20. findPathway Nest the SearchCriteria, then do the search: diseaseCriteria.setName(disease); organCriteria.setName(organ); histoCriteria.putSearchCriteria(diseaseCriteria,CriteriaElement.AND); histoCriteria.putSearchCriteria(organCriteria, CriteriaElement.AND); geneCriteria.putSearchCriteria(histoCriteria, CriteriaElement.AND); pathCriteria.putSearchCriteria(geneCriteria, CriteriaElement.AND); Pathway myPathway = new Pathway(); return myPathway.searchPathways(pathCriteria); }

  21. findPathways: Query Results

  22. Web Services: SOAP http://cabio.nci.nih.gov/soap/services/index.html

  23. SOAP API Perl Example use SOAP::Lite; $s = SOAP::Lite ->uri(urn:nci-gene-service) ->proxy("http://cabio.nci.nih.gov/soap/servlet/rpcrouter"); my %searchCriteria=(); $searchCriteria{symbol}=“pTEN”; $som=$s->getGenes(SOAP::Data->type(map =>\%searchCriteria)); $xmldoc = $som->result;

  24. SOAP output with xlinks <?xml version="1.0" encoding="UTF-8" ?> <nci-core> - <gov.nih.nci.caBIO.bean.Gene id="2221" xmlns:xlink="http://www.w3.org/1999/xlink/"> <name>PTEN</name> <title>phosphatase and tensin homolog (mutated in multiple advanced cancers 1)</title> <dbCrossRefs>{LOCUS_LINK=5728, OMIM=601728, UNIGENE=10712}</dbCrossRefs> <Pathwayxlink:href= "http://lpgprot101.nci.nih.gov:5080/CORE/GetXML?operation=Pathway&GeneId=2221" /> [Additional xlinks for ExpressionExperiment, Organ, Chromosome, GeneHomolog, Sequence, Gene Alias, Protein, SNP, and MapLocation] </gov.nih.nci.caBIO.bean.Gene> [2 Additional Genes with “PTEN” in their name] - <searchResult> <hasMore>false</hasMore> <startsAt>1</startsAt><endsAt>3</endsAt> </searchResult> </nci-core>

  25. SOAP with returnHeavyXML Data is now returned in full. Pathway object snippet: <gov.nih.nci.caBIO.bean.Pathway id="92"> <name>ptenPathway</name> <displayValue>PTEN Dependent Cell Cycle Arrest and Apoptosis</displayValue> <pathwayDiagram>ptenPathway.svg</pathwayDiagram> </gov.nih.nci.caBIO.bean.Pathway>

  26. HTTP API Direct access to XML-formatted data via URLs: http://cabio.nci.nih.gov/servlet/GetXML? operation=Gene&Symbol=pTEN Method Parameter Value Search Parameter

  27. HTTP API Direct access to SVG-formatted data via URLs: http://cabio.nci.nih.gov:80/servlet/GetSVG?operation=Pathway&amp;name=g2Pathway&amp;GeneInfoLocation=/servlet/GetXML?operation=Gene&amp;ielikes=.svg Method Parameter Value Search Parameter

  28. BIOgopher • BIOgopher enables a researcher to perform complex queries against caBIO data sources • Researchers can: • Provide local data • Create custom queries • Design custom reports

  29. Importing Local Data • Researchers can import local experiment data in spreadsheet format • Researchers can leverage imported data during the session • Researchers can include imported data in defining custom queries and reports

  30. Creating a Query • Researchers can create a query or access an existing query within a session • Researchers specify the caBIO object that will be the subject of the query

  31. Specifying Search Criteria • Researchers can dynamically specify search criteria • Attributes of caBIO objects related to the chosen subject can be selected as search criteria • Local data can be fetched for inclusion as search criteria • Researchers can browse caBIO data for inclusion in search criteria values

  32. Creating a Report • Researchers can create and format reports based on the selected search criteria • Reports can be viewed and exported as a spreadsheet

  33. BIOgopher Architectural Details • Leveraged the Model-View-Controller 2 (MVC 2) architecture • Abstracted the presentation layer from spreadsheet manipulation, meta-data retrieval, query design, and report generation • Developed a server-side N-dimensional query builder • An object-cube was leveraged in support of object-mining

  34. Presentation Layer • Leverages the Jakarta Struts Project

  35. Spreadsheet Manipulation • Leverages the Apache POI Project

  36. Meta-Data Layer • Leverages NCICB’s caDSR

  37. Query Design • Leverages Java Swing components for trees, nodes and tables

  38. Ontology Interface

  39. caBIO Kernel BIOgopher Client • Facilitates the creation of a federation of caBIO servers to share information between local data sources and the NCICB caBIO server • Leverages the JXTA protocol for peer-to-peer communication Proxy NCICB caBIO server Object/DB bridge 5. Queries Persistence Layer DSI 4. Parses query and DSI and authenticates user (in any) . 1. Sends query and user info 8. Returns results 6.Returns objects to requestor 3. Passes query to NCICB server Local caBIOServer 7.Queries data map Datamap 8.Queries Persistence layer Object/DB bridge 2. Parses query and DSI, and authenticates user .

  40. Future • caBIO "kernel" • Object-level Security Module (fine grain) • Standard LDAP authentication • Vocabulary Object to extend the EVS API • caDSR objects, API • MAGE-OM, API • Animal Models-OM, API • Extend pathway object model to support KEGG and BioCarta interactions • Analytical Tool handling i.e. BLAST.

  41. Future • New Data Sources • Proteins • PDB, PIR, BioJava for protein data • OMIM - For link from proteins to diseases • Agents - Access agent data from EVS and DCP • Pharmacokinetics • Histology, Tissue/Organ - Leverage EVS vocabulary (currently LASH) • PubMed

  42. NCICB Kenneth Buetow Peter Covitz Carl Schaefer Robert Clifford Mike Edmonson Frank Hartel Sherri DeCoronado SAIC Scott Gustafson Mike Connolly Joshua Phillips Kevric ( documentation ) Diane Zimmerman Acknowledgements Visit our new and improved web site: http://ncicb.nci.nih.gov/core/caBIO

More Related