
Brain Data & Knowledge Grid



Presentation Transcript


  1. Brain Data & Knowledge Grid (or: Towards Services for Knowledge-Based Mediation of Neuroscience Information Sources)
     Data-Intensive Computing Environments, San Diego Supercomputer Center (SDSC): Reagan Moore, Chaitan Baru, Amarnath Gupta, Bertram Ludäscher, Richard Marciano, Arcot Rajasekar, Ilya Zaslavsky, ...
     National Center for Microscopy and Imaging Research (NCMIR): Mark Ellisman, Maryann Martone, Steve Peltier, Steve Lamont, ...
     University of California, San Diego

  2. Infrastructure for Sharing Neuroscience Data
     • SOURCES:
       • NCMIR, U.C. San Diego
       • Caltech Neuroimaging
       • Center for Imaging Science, Johns Hopkins
       • Center for Computational Biology, Montana State
       • Laboratory of Neuro Imaging (LONI), UCLA
       • Computational Neurobiology Laboratory, Salk Inst.
       • Van Essen Laboratory, Washington University
       • …
     • Data Management Infrastructure (DICE/NPACI)
       • MIX: Mediation of Information using XML
       • MCAT information discovery
       • SRB data handling
       • HPSS storage
       • ...
     [Figure: example sources (surface atlas, Van Essen Lab; stereotaxic atlas, LONI; MCell, CNL, Salk; CCB, Montana SU; NCMIR, UCSD) above a still-open “Knowledge-based GRID infrastructure ???” layer, which sits on the Data Management Infrastructure (“Data Grid”): GTOMO, Telemicroscopy, Globus, SRB/MCAT, HPSS]

  3. Sharing Resources on the Brain Data Grid
     • Scientific groups ...
       • create data products (e.g., text data, images, simulation data, …)
       • put them in collections
       • add metadata (who created it, what the data is about, …)
       • make them available for sharing (on the web, in data caches, in HPSS, …)
     • Technical challenges ...
       • size & packaging of data
       • heterogeneity: data types, storage technologies, transport mechanisms, authentication, ...
       • access levels: collection, object, fragment; data-specific functions (“data blades”)
     • Data Grid technologies can help ...
       • distributed data management, e.g., Storage Resource Broker/Metadata Catalog (SRB/MCAT), computing (Globus), ...
       • focus is on resource sharing (data, networks, cycles)

  4. Integration Issue: Semantic Integration/Mediation
     • ??? SEMANTIC INTEGRATION ???
     • SYNTACTIC/STRUCTURAL Integration (MIX: Mediation of Information using XML; distributed query processing)
       • Integrated Views (Src-XML => Intgr-XML)
       • Schema Integration (DTD => DTD)
       • Wrapping, Data Extraction (Text => XML)
     • SYSTEM INTEGRATION (storage, query capabilities, protocols & services)
       • SRB/MCAT, Globus, JDBC, DOM, CORBA
       • TCP/IP, grid-ftp, HTTP

  5. Standard Mediator/Wrapper Architecture
     • Client/User-Query: XML Q/A against the INTEGRATED VIEW
     • Mediator: integration logic; still open: domain semantics ???, GRID federation services ???
     • Wrappers (one per source): protocol translation (SRB/MCAT, DOM, X(ML)Query)
     • Levels spanned, top to bottom: domain semantics, structure, syntax, transport, storage
     • (Neuro)Science (Re)Sources: WWW, DB, Files; Lab1, Lab2, Lab3

  6. The Need for Semantic Integration
     • Cross-source queries, e.g.: What is the cerebellar distribution of rat proteins with more than 70% homology with human NCS-1? Any structure specificity? How about other rodents?
     • Still open: ??? Integrated View ???, ??? Integrated View Definition ???, ??? Mediator ???
     • Semantic (knowledge-based) mediation services:
       • data, relationships, constraints are modeled (CMs)
       • cross-source relationships are modeled
     • Wrapped sources: protein localization, morphometry, neurotransmission, Web (CaBP, Expasy)

  7. Hidden Semantics: Protein Localization
     Image labels: Purkinje cell layer of cerebellar cortex; molecular layer of cerebellar cortex; fragment of dendrite
     <protein_localization>
       <neuron type="purkinje cell" />
       <protein channel="red">
         <name>RyR</name>
         …
       </protein>
       <region h_grid_pos="1" v_grid_pos="A">
         <density>
           <structure fraction="0.8">
             <name>spine</name>
             <amount name="RyR">0</amount>
           </structure>
           <structure fraction="0.2">
             <name>branchlet</name>
             <amount name="RyR">30</amount>
           </structure>

  8. Hidden Semantics: Morphometry
     Annotations on the fragment:
     • branch level: a branch level beyond 4 is a branchlet
     • spine: must be dendritic, because Purkinje cells don’t have somatic spines
     <neuron name="purkinje cell">
       <branch level="10">
         <shaft> … </shaft>
         <spine number="1">
           <attachment x="5.3" y="-3.2" z="8.7" />
           <length>12.348</length>
           <min_section>1.93</min_section>
           <max_section>4.47</max_section>
           <surface_area>9.884</surface_area>
           <volume>7.930</volume>
           <head>
             <width>4.47</width>
             <length>1.79</length>
           </head>
         </spine>
         …
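
The two unstated rules on this slide can be written down explicitly. The following is a minimal Python sketch, assuming a well-formed, shortened stand-in for the morphometry fragment; the names `classify_branch`, `annotate`, and `BRANCHLET_MIN_LEVEL` are illustrative and not part of the original system, and the slide gives no rule for levels of 4 or below, so those are left unclassified.

```python
import xml.etree.ElementTree as ET

# Shortened, well-formed stand-in for the slide's fragment (assumption:
# elided parts are simply left out; the second branch is added for contrast).
MORPHOMETRY_XML = """
<neuron name="purkinje cell">
  <branch level="10">
    <spine number="1">
      <length>12.348</length>
      <volume>7.930</volume>
    </spine>
  </branch>
  <branch level="2"/>
</neuron>
"""

BRANCHLET_MIN_LEVEL = 5  # "branch level beyond 4 is a branchlet"


def classify_branch(level: int) -> str:
    """Apply the hidden rule; the slide states nothing for levels <= 4."""
    return "branchlet" if level >= BRANCHLET_MIN_LEVEL else "unclassified"


def annotate(xml_text: str) -> list:
    """Return one record per branch with the implied anatomy made explicit."""
    neuron = ET.fromstring(xml_text)
    is_purkinje = neuron.get("name") == "purkinje cell"
    records = []
    for branch in neuron.findall("branch"):
        level = int(branch.get("level"))
        spines = branch.findall("spine")
        records.append({
            "level": level,
            "structure": classify_branch(level),
            # Purkinje cells have no somatic spines, so a spine found on
            # a Purkinje cell must be dendritic.
            "spine_kind": "dendritic" if (spines and is_purkinje) else None,
            "spine_count": len(spines),
        })
    return records


if __name__ == "__main__":
    for record in annotate(MORPHOMETRY_XML):
        print(record)
```

Keeping the rules in small named functions mirrors the point of the slide: the knowledge lives outside the XML, so a mediator has to supply it separately.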

  9. Knowledge-Based (Semantic) Mediation
     • Multiple Worlds Integration Problem:
       • compatible terms not directly joinable
       • complex, indirect associations among attributes
       • unstated integrity constraints
     • Approach:
       • a “theory” under which terms can be “semantically joined”
       => lift mediation to the level of conceptual models (CMs)
       => formalize domain knowledge; integrity constraints (ICs) become rules over CMs
       => Knowledge-Based/Model-Based (Semantic) Mediation

  10. XML-Based vs. Model-Based Mediation
      • XML-based mediation: Integrated-DTD := XML-QL(Src1-DTD, ...)
        • raw data => XML elements => XML models
        • structural constraints (DTDs): parent, child, sibling, ...; e.g. A = (B*|C),D and B = ...
        • no domain constraints
      • Model-based mediation: Integrated-CM := CM-QL(Src1-CM, ...)
        • raw data => (XML) objects => conceptual models (CMs)
        • classes (C1, C2, C3, ...), relations (R), is-a, has-a, ...
        • logical domain constraints (IF ... THEN ... rules) and a DOMAIN MAP
      • CM ~ {Descr. Logic, ER, UML, RDF/XML(-Schema), …}
      • CM-QL ~ {F-Logic, OIL, DAML, …}

  11. Knowledge-Based Mediator Prototype
      • USER/Client: CM queries & results (exchanged in XML) against the CM (Integrated View)
      • Mediator Engine: Domain Map (DM), Integrated View Definition (IVD), CM plug-in, logic API (capabilities)
        • FL rule processing, LP rule processing, and graph processing on the XSB engine
        • one GCM per source
      • Sources S1, S2, S3: each exports its CM (CM S1, CM S2, CM S3) through a CM-wrapper on top of an XML-wrapper

  12. Mediation Services: Source Registration (System Issues)
      • Registration dimensions: Source Data Type, Query Capability, Result Delivery, Access Protocol
      • Diagram labels: ARC, XML QL, DOOD, SQL; tree, file, table; HTTP, JDBC, SRB; tuple-at-a-time, stream, set-at-a-time; SPJ, selections; binary for viewer

  13. Mediation Services: Source Registration (Semantics Issues)
      • Domain Map Registration
        • provide concept space/ontology
        • … as a private object (“myANATOM”)
        • … merge with others (give “semantic bridges”)
        • … and check for conflicts
      • Conceptual Model Registration
        • schema: classes, associations, attributes
        • domain constraints
        • “put data into context” (linking data to the domain map)

  14. ANATOM Domain Map
      [Figure: the ANATOM domain map]

  15. Senselab (Yale) and NCMIR (UCSD) “Semantic Bridge”
      % a term belongs to a domain if it occurs in any has_a or isa edge
      anatom_dom(X) :- (ucsd_has_a(X,_) ; ucsd_has_a(_,X) ; ucsd_isa(X,_) ; ucsd_isa(_,X)).
      senselab_dom(X) :- (sl_has_a(X,_) ; sl_has_a(_,X) ; sl_isa(X,_) ; sl_isa(_,X)).

      % map Senselab anatom terms to equivalent UCSD ANATOM
      sl2ucsd(X,X) :- senselab_dom(X), anatom_dom(X).
      sl2ucsd('A',axon).
      sl2ucsd('AH',axon).
      sl2ucsd('Dad',spiny_branchlet).   % should map to a PATH not just the end of the path
      sl2ucsd('Dam',main_branches).     % some of the main_branches based on the branch level
      sl2ucsd('Dap',main_branches).
      sl2ucsd('Dbd',spiny_branchlet).
      sl2ucsd('Dbm',main_branches).
      sl2ucsd('Dbp',main_branches).
      sl2ucsd('Ded',spiny_branchlet).
      sl2ucsd('Dem',main_branches).
      sl2ucsd('Dep',main_branches).
      sl2ucsd('T',axon).

      % keep has_a edge if at least one node is known from UCSD
      has_a(X,Y) :- sl2ucsd(_,X), ucsd_has_a(X,Y).
      has_a(X,Y) :- sl2ucsd(_,Y), ucsd_has_a(X,Y).

      % keep all and only UCSD is_a rels
      isa(X,Y) :- ucsd_isa(X,Y).
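
For readers less familiar with logic programming, the bridge rules can be mirrored with plain set operations. This is a rough Python sketch, not the prototype's code: the term mapping in `EXPLICIT_BRIDGE` is copied from the slide, while the edge sets (`UCSD_HAS_A`, `UCSD_ISA`, `SL_HAS_A`, `SL_ISA`) are small hypothetical stand-ins for the real fact bases.

```python
# Term mapping taken from the slide's sl2ucsd facts.
EXPLICIT_BRIDGE = {
    "A": "axon", "AH": "axon", "T": "axon",
    "Dad": "spiny_branchlet", "Dbd": "spiny_branchlet", "Ded": "spiny_branchlet",
    "Dam": "main_branches", "Dap": "main_branches", "Dbm": "main_branches",
    "Dbp": "main_branches", "Dem": "main_branches", "Dep": "main_branches",
}

# Hypothetical edges, for illustration only.
UCSD_HAS_A = {
    ("purkinje_cell", "dendrite"), ("dendrite", "spiny_branchlet"),
    ("dendrite", "main_branches"), ("purkinje_cell", "axon"),
    ("purkinje_cell", "soma"), ("soma", "nucleus"),
}
UCSD_ISA = {("purkinje_cell", "neuron")}
SL_HAS_A = {("neuron", "A"), ("neuron", "Dad")}
SL_ISA = set()


def domain(has_a, isa):
    """All terms that occur in any has_a or isa edge (anatom_dom / senselab_dom)."""
    return {term for edge in has_a | isa for term in edge}


def bridge(sl_dom, ucsd_dom, explicit):
    """sl2ucsd: identity on shared terms plus the explicit mapping facts."""
    return {(t, t) for t in sl_dom & ucsd_dom} | set(explicit.items())


def bridged_has_a(pairs, ucsd_has_a):
    """Keep a UCSD has_a edge if at least one endpoint is a bridge target."""
    known = {ucsd_term for _, ucsd_term in pairs}
    return {(x, y) for (x, y) in ucsd_has_a if x in known or y in known}


pairs = bridge(domain(SL_HAS_A, SL_ISA), domain(UCSD_HAS_A, UCSD_ISA), EXPLICIT_BRIDGE)
has_a = bridged_has_a(pairs, UCSD_HAS_A)
isa = set(UCSD_ISA)  # keep all and only the UCSD is_a relations
```

With these stand-in edges, `has_a` keeps only the edges that touch `axon`, `spiny_branchlet`, or `main_branches`; the soma edges are dropped because neither endpoint is reachable through the bridge.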

  16. Refinement of a Domain Map (Ontology): Putting Data in Context via Registration of new Classes & Relationships
      [Figure: domain map fragment. Concept nodes: Neuron, MyNeuron, Spiny Neuron, Medium Spiny Neuron, Compartment, Soma, Axon, Dendrite, MyDendrite, Neostriatum, Neurotransmitter, GABA, Substance P, Dopamine R, Substantia Nigra Pc, Substantia Nigra Pr, Globus Pallidus Int., Globus Pallidus Ext. Edge and connective labels: ALL:has, exp, =, AND, OR]

  17. Mediation Services: Integrated View Definition
      DERIVE protein_distribution(Protein, Organism, Brain_region, Feature_name, Anatom, Value)
      FROM
        I:protein_label_image[proteins ->> {Protein}; organism -> Organism;
          anatomical_structures ->> {AS:anatomical_structure[name->Anatom]}] ,   % from PROLAB
        NAE:neuro_anatomic_entity[name->Anatom;                                  % from ANATOM
          located_in->>{Brain_region}],
        AS..segments..features[name->Feature_name; value->Value].
      • provided by the domain expert and mediation engineer
      • declarative language (here: Frame-logic)
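
Read operationally, the view is a join of PROLAB image objects with ANATOM entities on the shared structure name `Anatom`. A minimal Python sketch of that reading follows; the nested dictionaries in `IMAGES` and `ENTITIES` are hypothetical sample objects whose keys follow the attribute names in the view definition, and the function is an illustration rather than how the F-logic engine evaluates the rule.

```python
# Hypothetical PROLAB objects (protein_label_image).
IMAGES = [
    {
        "proteins": ["RyR"],
        "organism": "rat",
        "anatomical_structures": [
            {"name": "spiny_branchlet",
             "segments": [{"features": [{"name": "protein_amount", "value": 30}]}]},
            # No ANATOM entity has this name, so the join drops it.
            {"name": "unregistered_structure",
             "segments": [{"features": [{"name": "protein_amount", "value": 7}]}]},
        ],
    },
]

# Hypothetical ANATOM objects (neuro_anatomic_entity).
ENTITIES = [
    {"name": "spiny_branchlet", "located_in": ["cerebellum"]},
]


def protein_distribution(images, entities):
    """Yield (Protein, Organism, Brain_region, Feature_name, Anatom, Value) tuples."""
    by_name = {}
    for entity in entities:
        by_name.setdefault(entity["name"], []).append(entity)

    for image in images:
        for protein in image["proteins"]:
            for structure in image["anatomical_structures"]:
                # Join condition: the same Anatom name in both sources.
                for entity in by_name.get(structure["name"], []):
                    for region in entity["located_in"]:
                        for segment in structure["segments"]:
                            for feature in segment["features"]:
                                yield (protein, image["organism"], region,
                                       feature["name"], structure["name"],
                                       feature["value"])


rows = list(protein_distribution(IMAGES, ENTITIES))
```

The nested loops correspond to the multi-valued (`->>`) attributes and path expressions of the rule; a declarative engine is free to pick a different evaluation order.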

  18. Example Query Evaluation (I)
      • Example: protein_distribution
        • given: organism, protein, brain_region
      • Use DOMAIN-KNOWLEDGE-BASE:
        • recursively traverse the has_a_star paths under brain_region and collect all anatomical_entities
      • Source PROLAB:
        • join with anatomical structures and collect the value of attribute “image.segments.features.feature.protein_amount” where “image.segments.features.feature.protein_name” = protein and “study_db.study.animal.name” = organism
      • Mediator:
        • aggregate over all parents up to brain_region
        • report distribution
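
The three evaluation steps above can be sketched in a few lines of Python. Everything concrete here is an assumption made for illustration: the `HAS_A` fragment, the flattened `PROLAB_ROWS` tuples, the function names, and the choice of summation as the aggregate (the slide only says "aggregate").

```python
from collections import defaultdict

# Hypothetical domain-map fragment: parent -> children along has_a edges.
HAS_A = {
    "cerebellum": ["cerebellar_cortex"],
    "cerebellar_cortex": ["purkinje_cell_layer", "molecular_layer"],
    "molecular_layer": ["spiny_branchlet"],
}

# Hypothetical PROLAB rows: (organism, protein_name, anatomical_entity, protein_amount).
PROLAB_ROWS = [
    ("rat", "RyR", "spiny_branchlet", 30.0),
    ("rat", "RyR", "purkinje_cell_layer", 5.0),
    ("mouse", "RyR", "spiny_branchlet", 99.0),   # filtered out: other organism
    ("rat", "RyR", "hippocampus", 7.0),          # filtered out: not under the region
]


def has_a_star(region, has_a):
    """All entities reachable from region via one or more has_a edges."""
    seen, stack = set(), [region]
    while stack:
        node = stack.pop()
        for child in has_a.get(node, []):
            if child not in seen:
                seen.add(child)
                stack.append(child)
    return seen


def evaluate_protein_distribution(organism, protein, region, has_a, rows):
    """Return {entity: aggregated amount} for the region and everything below it."""
    # Step 1 (domain knowledge base): collect the anatomical entities.
    entities = has_a_star(region, has_a) | {region}

    # Step 2 (source): select and join on the anatomical entity.
    direct = defaultdict(float)
    for org, prot, entity, amount in rows:
        if org == organism and prot == protein and entity in entities:
            direct[entity] += amount

    # Step 3 (mediator): roll amounts up to every parent (sum is an assumption).
    totals = {}
    for entity in entities:
        below = has_a_star(entity, has_a) | {entity}
        totals[entity] = sum(direct.get(e, 0.0) for e in below)
    return totals


distribution = evaluate_protein_distribution("rat", "RyR", "cerebellum", HAS_A, PROLAB_ROWS)
```

Computing the closure at the mediator and pushing only the selection down to the source keeps the source's required query capability small, which matches the division of work listed on the slide.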

  19. Example Query Evaluation (II)
      "How does the parallel fiber output (Yale/SENSELAB) relate to the distribution of Ryanodine Receptors (UCSD/NCMIR)?"
      @SENSELAB: X1 := select output from parallel fiber;
      @MEDIATOR: X2 := “hang off” X1 from Domain Map;
      @MEDIATOR: X3 := subregion-closure(X2);
      @NCMIR:    X4 := select PROT-data(X3, Ryanodine Receptors);
      @MEDIATOR: X5 := compute aggregate(X4);

  20. Mediation Services: Client Registration
      • Client kinds: Update Client, Query Client
      • Diagram labels: Thin Result Viewer, Fat Result Viewer; Navigate/Ad-hoc Query Capability; Query on Schema; Derive Before Insert; Check Data; Merge Before Insert; Client-side Processing; Client-side Buffer; Send Full Data; Context Sensitive; Server-side Buffer; Server-Push/Client-Pull

  21. Example Client: Query Formulation and Result Display
      • combination of ad hoc and navigational queries
      • client-side visualization (left)
      • results are shown in semantic context (right)

  22. Mediation Services: Semantic Annotation Tools
      line drawing ==annotation==> (spatial) database for mediation

  23. Mediator Architecture Blueprint
      Mediation Services
      • Source model lifting:
        • domain knowledge reconciliation
        • model transformation
      • Query formulation:
        • user query
        • integrated view definition
      • Source registration:
        • domain knowledge
        • model & schema
        • query & computation capabilities
      • Query processing:
        • view unfolding
        • semantic optimization
        • capability-based rewriting
      Mediator Layer: Deductive Engine, Model Reasoner, Optimizer
      Wrapper Layer
      • Query interface (down API):
        • SDLIP, SOAP, ...
        • (subsets of) SQL, X(ML)-Query, CPL, ...
        • DOM
        • SRB-based access
      • Result delivery interface (up API):
        • SDLIP, SOAP, ...
        • pull (tuple/set-at-a-time, DOM) vs. push (stream)
        • synchronous/asynchronous
        • direct data/data reference
      Sources: File Sources, RDB Sources, Spatial Sources, HTML Sources, XML Sources, Digital Libraries (Collections)
      Diagram labels: Boston Univ., NCMIR UCSD, Montana Univ., Yale Univ.; SDLIP, ARC, IMS

  24. Coming up: Knowledge-Based/Semantic Mediation of Brain Data
      [Figure: sources (surface atlas, Van Essen Lab; stereotaxic atlas, LONI; MCell, CNL, Salk; CCB, Montana SU; NCMIR, UCSD) feeding Knowledge-Based Mediation; result labels: Result (XML/XSLT), PROTLOC, Result (VML/SVG), ANATOM]

  25. Some Open Issues
      • Data/Knowledge Modeling
        • Extensibility: how to handle a source with new data types and operations?
        • Temporal Data: instrument readings, video microscopy
        • Spatial Data: integrating with spatial database systems
        • Image database systems
      • Conflict Management
        • Grades of certainty
        • Alternative hypotheses
      • Integrating Services
        • Registration and warping of my image slice to a reference
      • Integrating into Larger Applications
        • M-Cell simulation
        • Telemicroscopy
        • Visualization

  26. References
      • Model-Based Mediation with Domain Maps, Bertram Ludäscher, Amarnath Gupta, Maryann Martone, Intl. Conference on Data Engineering (ICDE), Heidelberg, 2001.
      • Knowledge-Based Mediation of Heterogeneous Neuroscience Information Sources, Amarnath Gupta, Bertram Ludäscher, Maryann Martone, Intl. Conference on Scientific and Statistical Databases (SSDBM), Berlin, 2000.
      • Model-Based Information Integration in a Neuroscience Mediator System, Bertram Ludäscher, Amarnath Gupta, Maryann Martone, Intl. Conference on Very Large Data Bases (VLDB), Cairo, 2000.
