
Beyond Federation of Data Collections

Making Information Integration Service a part of NPACI Data Management Infrastructure

Amarnath Gupta

Bertram Ludäscher

Maryann Martone

Ilya Zaslavsky


Collection Federation

  • In this scenario, scientific groups

    • produce data items (e.g., text data, images, simulation data …)

    • put them in collections

    • add metadata (who created it, what the data is about …)

    • make it available for sharing (on the web, in a data cache accessible with VBN, in HPSS with authorization information …)

  • The Problem

    • The data may consist of a large number of small chunks or a small number of large chunks, so data movement is an issue

    • Heterogeneity in data types, storage technologies, networks, and authentication protocols

    • Access has to be supported per collection, per data item, or per data fragment, and may require executing data-specific functions

  • Storage Resource Broker/Metadata Catalog

    • The focus is on making the data available

NPACI AHM,2001


Information Integration

Cross-source queries, for example:

"What is the cerebellar distribution of rat proteins with more than 70% homology with human NCS-1? Any structure specificity? How about other rodents?"

  • Cross-source relationships are modeled

  • Information-producing services can be invoked

  • Data, relationships, and constraints are modeled

[Figure: an Integrated View and its Integrated View Definition (each marked "???") sit above a Mediator (marked "???"), which reaches four sources through wrappers: protein localization, morphometry, neurotransmission, and the Web sources CaBP and Expasy.]
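The wrapper/mediator split in the figure can be sketched in a few lines of Python. This is a minimal sketch under assumptions: the two sources, their native formats, the protein names, and all numbers are hypothetical. Each wrapper only converts native rows into dictionaries of one shared schema, so the mediator can answer the cross-source question without knowing the sources' formats.

```python
from collections import defaultdict

# Hypothetical native data of two sources (formats and values are made up).
HOMOLOGY_LINES = ["rat|ProtA|82", "rat|ProtB|55", "mouse|ProtC|91"]  # organism|protein|% homology to human NCS-1
LOCALIZATION_ROWS = [("ProtA", "molecular layer", 12.0),
                     ("ProtA", "purkinje cell layer", 3.0),
                     ("ProtB", "molecular layer", 7.0)]              # (protein, region, amount)

def homology_wrapper():
    """Wrapper: parse pipe-separated text lines into records of the shared schema."""
    for line in HOMOLOGY_LINES:
        organism, protein, pct = line.split("|")
        yield {"organism": organism, "protein": protein, "homology": float(pct)}

def localization_wrapper():
    """Wrapper: turn tuples into records of the shared schema."""
    for protein, region, amount in LOCALIZATION_ROWS:
        yield {"protein": protein, "region": region, "amount": amount}

def mediator(organism, min_homology):
    """Cross-source query: regional distribution of `organism` proteins whose
    homology exceeds `min_homology` percent."""
    selected = {r["protein"] for r in homology_wrapper()
                if r["organism"] == organism and r["homology"] > min_homology}
    distribution = defaultdict(float)
    for r in localization_wrapper():
        if r["protein"] in selected:
            distribution[r["region"]] += r["amount"]
    return dict(distribution)

print(mediator("rat", 70))
```

Asking "how about other rodents?" only changes the `organism` argument; the wrappers stay untouched.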


Hidden Semantics: Protein Localization

[Figure: image regions labeled "Purkinje cell layer of cerebellar cortex", "Molecular layer of cerebellar cortex", and "Fragment of dendrite".]

<protein_localization>
  <neuron type="purkinje cell"/>
  <protein channel="red">
    <name>RyR</name>
    …
  </protein>
  <region h_grid_pos="1" v_grid_pos="A">
    <density>
      <structure fraction="0.8">
        <name>spine</name>
        <amount name="RyR">0</amount>
      </structure>
      <structure fraction="0.2">
        <name>branchlet</name>
        <amount name="RyR">30</amount>
      </structure>
      …
    </density>
  </region>
</protein_localization>
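To see why these semantics are "hidden", consider what a program can do with the fragment on its own. A minimal sketch using Python's standard-library ElementTree parser: it reads the per-structure RyR amounts, but nothing in the markup says what `fraction` refers to, or that spines and branchlets are parts of a Purkinje dendrite; that knowledge has to come from elsewhere. The fragment is reproduced with its elided parts dropped, and the grid-position key `"1A"` is an illustrative choice.

```python
import xml.etree.ElementTree as ET

# The protein-localization fragment, with its elided parts ("…") dropped.
FRAGMENT = """
<protein_localization>
  <neuron type="purkinje cell"/>
  <protein channel="red"><name>RyR</name></protein>
  <region h_grid_pos="1" v_grid_pos="A">
    <density>
      <structure fraction="0.8"><name>spine</name><amount name="RyR">0</amount></structure>
      <structure fraction="0.2"><name>branchlet</name><amount name="RyR">30</amount></structure>
    </density>
  </region>
</protein_localization>
"""

def amounts_by_structure(xml_text, protein):
    """Return {(grid position, structure name): (fraction, amount)} for one protein."""
    root = ET.fromstring(xml_text.strip())
    result = {}
    for region in root.iter("region"):
        pos = region.get("h_grid_pos") + region.get("v_grid_pos")
        for structure in region.iter("structure"):
            name = structure.findtext("name")
            for amount in structure.iter("amount"):
                if amount.get("name") == protein:
                    result[(pos, name)] = (float(structure.get("fraction")), float(amount.text))
    return result

table = amounts_by_structure(FRAGMENT, "RyR")
print(table)
```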


Hidden Semantics: Morphometry

  • A branch level beyond 4 is a branchlet

  • The spine must be dendritic, because Purkinje cells don't have somatic spines

<neuron name="purkinje cell">
  <branch level="10">
    <shaft>
    </shaft>
    <spine number="1">
      <attachment x="5.3" y="-3.2" z="8.7"/>
      <length>12.348</length>
      <min_section>1.93</min_section>
      <max_section>4.47</max_section>
      <surface_area>9.884</surface_area>
      <volume>7.930</volume>
      <head>
        <width>4.47</width>
        <length>1.79</length>
      </head>
    </spine>
    …
  </branch>
</neuron>
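The two annotations are knowledge that the XML never states. A minimal sketch of making them explicit as rules applied to the parsed data; the fact format `(subject, relation, object)` and the labels `branch@10`, `spine#1`, `is_a`, and `located_on` are illustrative choices, and only the two facts stated on the slide are encoded.

```python
import xml.etree.ElementTree as ET

# The morphometry fragment, trimmed to the elements the rules need.
FRAGMENT = """
<neuron name="purkinje cell">
  <branch level="10">
    <shaft></shaft>
    <spine number="1">
      <attachment x="5.3" y="-3.2" z="8.7"/>
      <length>12.348</length>
    </spine>
  </branch>
</neuron>
"""

# Knowledge stated only as annotations on the slide, made explicit here.
BRANCHLET_ABOVE_LEVEL = 4                # "branch level beyond 4 is a branchlet"
NO_SOMATIC_SPINES = {"purkinje cell"}    # "Purkinje cells don't have somatic spines"

def infer_facts(xml_text):
    """Return (subject, relation, object) facts implied by the annotations."""
    root = ET.fromstring(xml_text.strip())
    cell = root.get("name")
    facts = []
    for branch in root.iter("branch"):
        level = int(branch.get("level"))
        if level > BRANCHLET_ABOVE_LEVEL:
            facts.append((f"branch@{level}", "is_a", "branchlet"))
        for spine in branch.iter("spine"):
            # No somatic spines on this cell type, so any spine must sit on a dendrite.
            if cell in NO_SOMATIC_SPINES:
                facts.append((f"spine#{spine.get('number')}", "located_on", "dendrite"))
    return facts

print(infer_facts(FRAGMENT))
```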

The Problem

  • Multiple Worlds Integration

    • compatible terms not directly joinable

    • complex, indirect associations among schema elements

    • unstated integrity constraints

  • What’s needed?

    • a “theory” under which non-identical terms can be “semantically joined”

      => lift mediation to the level of conceptual models (CMs)

      => domain knowledge and integrity constraints (ICs) become rules over CMs

      => Model-Based Mediation
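The idea of "semantically joining" non-identical terms can be made concrete with a minimal sketch. Assumptions: a tiny hypothetical has_a hierarchy plays the role of the conceptual model, the rows are made up, and the join rule (the right-hand term equals, or is part of, the left-hand term) is one possible rule over a CM, not necessarily the one used by the authors' system.

```python
# Hypothetical fragment of a domain map: whole -> parts (has_a edges).
HAS_A = {
    "purkinje cell": ["soma", "dendrite", "axon"],
    "dendrite": ["branchlet"],
    "branchlet": ["spine"],
}

def parts_of(term):
    """`term` plus everything reachable from it over has_a edges."""
    seen, stack = {term}, [term]
    while stack:
        for part in HAS_A.get(stack.pop(), []):
            if part not in seen:
                seen.add(part)
                stack.append(part)
    return seen

def semantic_join(left, right):
    """Pair rows when the right row's structure equals, or is part of, the left row's structure."""
    return [(l, r) for l in left for r in right if r["structure"] in parts_of(l["structure"])]

# Hypothetical rows from two sources that use different anatomical terms.
morphometry = [{"structure": "dendrite", "cell": "purkinje cell"}]
localization = [{"structure": "spine", "protein": "RyR", "amount": 0},
                {"structure": "branchlet", "protein": "RyR", "amount": 30},
                {"structure": "axon", "protein": "RyR", "amount": 5}]

# A plain equality join finds nothing; the semantic join finds the dendritic parts.
equi_join = [(l, r) for l in morphometry for r in localization if l["structure"] == r["structure"]]
print(len(equi_join), len(semantic_join(morphometry, localization)))
```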

Information Integration

"What is the cerebellar distribution of rat proteins with more than 70% homology with human NCS-1? Any structure specificity? How about other rodents?"

[Figure: the Integrated View, Integrated View Definition, and Mediator (each marked "???") over wrapped sources for protein localization, morphometry, neurotransmission, and the Web sources CaBP and Expasy.]

Example Query Evaluation (I)

  • Example: protein_distribution

    • given: organism, protein, brain_region

    • Use DOMAIN-KNOWLEDGE-BASE:

      • recursively traverse the has_a_star paths under brain_region and collect all anatomical_entities

    • Source PROLAB:

      • join with the anatomical structures and collect the value of attribute “image.segments.features.feature.protein_amount” where “image.segments.features.feature.protein_name” = protein and “study_db.study.animal.name” = organism

    • Mediator:

      • aggregate over all parents up to brain_region

      • report distribution
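The three steps above can be sketched as a small Python program. Assumptions: the has_a edges and the PROLAB rows are hypothetical, the PROLAB attribute paths are flattened into tuple fields (organism, protein name, anatomical structure, protein amount), and the has_a hierarchy is treated as a tree so each entity has one parent on the path back to `brain_region`.

```python
from collections import defaultdict

# Hypothetical has_a edges from the domain knowledge base (parent -> children).
HAS_A = {
    "cerebellum": ["cerebellar cortex"],
    "cerebellar cortex": ["molecular layer", "purkinje cell layer"],
    "molecular layer": ["branchlet"],
    "purkinje cell layer": ["purkinje soma"],
}

# Hypothetical PROLAB rows: (organism, protein, anatomical structure, protein_amount).
PROLAB = [
    ("rat", "RyR", "branchlet", 30.0),
    ("rat", "RyR", "purkinje soma", 4.0),
    ("rat", "RyR", "molecular layer", 2.0),
    ("mouse", "RyR", "branchlet", 11.0),
    ("rat", "IP3R", "branchlet", 8.0),
]

def descendants(region):
    """Recursively traverse has_a edges under `region`; map each entity found to its parent."""
    parent_of, stack = {}, [region]
    while stack:
        node = stack.pop()
        for child in HAS_A.get(node, []):
            if child not in parent_of and child != region:
                parent_of[child] = node
                stack.append(child)
    return parent_of

def protein_distribution(organism, protein, brain_region):
    parent_of = descendants(brain_region)              # DOMAIN-KNOWLEDGE-BASE step
    entities = set(parent_of) | {brain_region}
    totals = defaultdict(float)
    for org, prot, structure, amount in PROLAB:        # PROLAB step: select and join
        if org == organism and prot == protein and structure in entities:
            node = structure
            while True:                                # mediator step: aggregate up to brain_region
                totals[node] += amount
                if node == brain_region:
                    break
                node = parent_of[node]
    return dict(totals)

print(protein_distribution("rat", "RyR", "cerebellum"))
```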

Example Query Evaluation (II)

@SENSELAB: X1 := select output from parallel fiber;

@MEDIATOR: X2 := “hang off” X1 from Domain Map;

@MEDIATOR: X3 := subregion-closure(X2);

@NCMIR: X4 := select PROT-data(X3, Ryanodine Receptors);

@MEDIATOR: X5 := compute aggregate(X4);

"How does the parallel fiber output (Yale/SENSELAB) relate to the distribution of Ryanodine Receptors (UCSD/NCMIR)?"
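The plan can be run as five steps in order, each tagged with the site that performs it. A minimal sketch with in-memory stand-ins for SENSELAB, the domain map, and NCMIR; all data are hypothetical, "hang off" is simplified to a lookup from a source term to a domain-map concept, and the aggregate is assumed to be a sum.

```python
# Hypothetical in-memory stand-ins for the sites in the plan.
SENSELAB = {"parallel fiber": ["purkinje cell dendrite"]}   # neuron -> output targets
HANG_OFF = {"purkinje cell dendrite": "dendrite"}            # source term -> domain-map concept
SUBREGIONS = {"dendrite": ["main branch", "branchlet"], "branchlet": ["spine"]}
NCMIR = {("branchlet", "Ryanodine Receptors"): 30.0,
         ("spine", "Ryanodine Receptors"): 0.0,
         ("soma", "Ryanodine Receptors"): 9.0}

def subregion_closure(concepts):
    """All concepts plus their transitive subregions."""
    result, stack = set(concepts), list(concepts)
    while stack:
        for sub in SUBREGIONS.get(stack.pop(), []):
            if sub not in result:
                result.add(sub)
                stack.append(sub)
    return result

def run_plan(neuron, protein):
    x1 = SENSELAB[neuron]                                  # @SENSELAB: select output
    x2 = {HANG_OFF[t] for t in x1}                         # @MEDIATOR: "hang off" domain map
    x3 = subregion_closure(x2)                             # @MEDIATOR: subregion-closure
    x4 = {region: amount for (region, p), amount in NCMIR.items()
          if p == protein and region in x3}                # @NCMIR: select PROT-data
    x5 = sum(x4.values())                                  # @MEDIATOR: aggregate (assumed sum)
    return x4, x5

print(run_plan("parallel fiber", "Ryanodine Receptors"))
```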

Integration Issues

  • SEMANTIC Integration

  • SYNTACTIC/STRUCTURAL Integration

    • Integrated Views (Src-XML => Intgr-XML)

    • Schema Integration (DTD => DTD)

    • Wrapping, Data Extraction (Text => XML)

  • SYSTEM Integration: TCP/IP, HTTP, CORBA

Supporting systems shown on the slide: MIX (Mediation of Information using XML); Distributed Query Processing; SRB/MCAT (storage, query capabilities, protocols & services)
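Wrapping at the lowest syntactic level means turning a source's text output into XML. A minimal sketch using Python's ElementTree; the `key: value; key: value` line format and its contents are hypothetical, since real wrappers must be written against each source's actual output.

```python
import xml.etree.ElementTree as ET

# Hypothetical semi-structured text as a source might export it.
RAW = """\
protein: RyR; structure: branchlet; amount: 30
protein: RyR; structure: spine; amount: 0
"""

def wrap(text):
    """Extract 'key: value' pairs from each non-empty line and emit one <record> per line."""
    root = ET.Element("records")
    for line in text.splitlines():
        if not line.strip():
            continue
        rec = ET.SubElement(root, "record")
        for field in line.split(";"):
            key, _, value = field.partition(":")
            ET.SubElement(rec, key.strip()).text = value.strip()
    return root

xml_root = wrap(RAW)
print(ET.tostring(xml_root, encoding="unicode"))
```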

The Mediator Architecture

Mediation Services API

Mediator Layer

  • Source model lifting:

    • domain knowledge reconciliation

    • model transformation

  • Query formulation:

    • user query

    • integrated view definition

  • Source registration:

    • domain knowledge

    • model & schema

    • query & computation capabilities

  • Query processing:

    • view unfolding

    • semantic optimization

    • capability-based rewriting

  • Mediator components: Deductive Engine, Model Reasoner, Optimizer

Wrapper Layer

  • Query interface (down API):

    • SDLIP, SOAP, ...

    • (subsets of) SQL, X(ML)-Query, CPL, ...

    • DOM

    • SRB-based access

  • Result delivery interface (up API):

    • SDLIP, SOAP, ...

    • pull (tuple/set-at-a-time, DOM) vs. push (stream)

    • synchronous/asynchronous

    • direct data/data reference

Sources: File, RDB, Spatial, HTML, and XML sources; Digital Libraries (Collections)

Sites and systems shown on the slide: Boston Univ., NCMIR (UCSD), Montana Univ., Yale Univ.; SDLIP, ARC, IMS
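View unfolding, listed under query processing, replaces each reference to an integrated view in a user query with the view's definition, so every remaining subgoal names a source relation. A minimal sketch over conjunctive queries represented as lists of `(predicate, args)` atoms; the view and the relation names `prolab_image`, `prolab_feature`, and `anatom_located_in` are hypothetical, loosely modeled on `protein_distribution`.

```python
# Conjunctive queries as lists of atoms (predicate, args). Terms starting with an
# uppercase letter are variables; everything else is a constant.

# Hypothetical integrated view over hypothetical source relations.
VIEWS = {
    "protein_distribution": (
        ("P", "O", "R", "V"),                                  # view head
        [("prolab_image", ("I", "P", "O", "A")),               # view body
         ("prolab_feature", ("I", "A", "V")),
         ("anatom_located_in", ("A", "R"))],
    ),
}

def is_var(term):
    return term[:1].isupper()

def unfold(query):
    """Replace each view atom by the view body: head variables take the query's
    arguments, body-only variables get a suffix so they stay distinct per atom."""
    result = []
    for i, (pred, args) in enumerate(query):
        if pred not in VIEWS:
            result.append((pred, args))          # already a source relation
            continue
        head, body = VIEWS[pred]
        binding = dict(zip(head, args))
        for body_pred, body_args in body:
            new_args = tuple(binding.get(t, f"{t}_{i}") if is_var(t) else t
                             for t in body_args)
            result.append((body_pred, new_args))
    return result

query = [("protein_distribution", ("ryr", "rat", "cerebellum", "V"))]
for atom in unfold(query):
    print(atom)
```

The unfolded query mentions only source relations, which is what capability-based rewriting then splits across wrappers.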

Mediation Services: Source Registration-I

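  • Column headings: Source | Data Type | Query Capability | Result Delivery | Access Protocol

  • Entries shown on the slide (cell layout not recoverable): ARC; XML, QL, DOOD, SQL; tree, file, table; HTTP, Java, SRB; Tuple-at-a-time, Stream, Set-at-a-time; SPJ, Selections; Binary for Viewer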
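Registering a source's query capability lets the mediator decide what it may push down. A minimal sketch, assuming a hypothetical registration record whose fields follow the column headings; `"SPJ"` and `"Selections"` come from the slide, while `"none"`, the source names, and the rows are made up. A selection goes to the source when it is registered as able to evaluate one; otherwise the mediator fetches everything and filters locally.

```python
from dataclasses import dataclass, field

@dataclass
class SourceRegistration:
    """Hypothetical registration record; field names follow the slide's column headings."""
    name: str
    data_type: str
    query_capability: str      # "SPJ" or "Selections" (slide values), or "none" (assumed)
    result_delivery: str
    access_protocol: str
    rows: list = field(default_factory=list)   # stand-in for the remote data

def can_select(src):
    return src.query_capability in {"SPJ", "Selections"}

def source_call(src, predicate=None):
    """Simulated remote call: a pushed-down predicate is only accepted if registered."""
    if predicate is not None and not can_select(src):
        raise ValueError(f"{src.name} was not registered with selection capability")
    return [r for r in src.rows if predicate is None or predicate(r)]

def capability_based_select(src, predicate):
    """Push the selection to the source when it can evaluate it; otherwise filter at the mediator."""
    if can_select(src):
        return "source", source_call(src, predicate)
    return "mediator", [r for r in source_call(src) if predicate(r)]

def wants_ryr(row):
    return row["protein"] == "RyR"

rows = [{"protein": "RyR", "amount": 30}, {"protein": "IP3R", "amount": 8}]
sql_source = SourceRegistration("db", "table", "Selections", "Set-at-a-time", "HTTP", rows)
file_source = SourceRegistration("files", "file", "none", "Stream", "SRB", rows)
print(capability_based_select(sql_source, wants_ryr))
print(capability_based_select(file_source, wants_ryr))
```

Both calls return the same rows; only the place where the selection runs differs.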

Mediation Services: Source Registration-II

  • Domain Model Registration

    • Here is my concept ontology

      • Keep it only as a private object

      • Merge my ontology with a pre-existing non-private ontology

        • Here are the equivalence relations

      • Detect conflicts between my ontology and a given public ontology

  • Conceptual Schema Registration

    • Classes, methods

    • Constraints

    • Domain Model Reference
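Merging an ontology under equivalence relations, and detecting conflicts, can be sketched with union-find. Assumptions: ontologies are lists of `(child, parent)` is-a edges, equivalences are `(my_term, public_term)` pairs with the public term kept as the representative, and a "conflict" is taken to mean an is-a cycle after merging; real conflict detection covers more cases than this. All terms are illustrative.

```python
def merge_ontologies(my_isa, public_isa, equivalences):
    """Merge two lists of (child, parent) is-a edges, identifying concepts that
    `equivalences` declares equal. Returns (merged_edges, conflicts), where a
    conflict is any concept lying on an is-a cycle after merging."""
    rep = {}

    def find(x):                       # union-find with path halving
        rep.setdefault(x, x)
        while rep[x] != x:
            rep[x] = rep[rep[x]]
            x = rep[x]
        return x

    for mine, theirs in equivalences:  # the public term becomes the representative
        rep[find(mine)] = find(theirs)

    edges = {(find(c), find(p)) for c, p in my_isa + public_isa}
    graph = {}
    for c, p in edges:
        graph.setdefault(c, set()).add(p)

    def on_cycle(start):
        """True if `start` can reach itself by following is-a edges."""
        stack, seen = list(graph.get(start, ())), set()
        while stack:
            node = stack.pop()
            if node == start:
                return True
            if node not in seen:
                seen.add(node)
                stack.extend(graph.get(node, ()))
        return False

    conflicts = sorted(c for c in graph if on_cycle(c))
    return edges, conflicts

public = [("purkinje_cell", "neuron"), ("neuron", "cell")]
equivalences = [("PC", "purkinje_cell"), ("Neuron", "neuron")]
print(merge_ontologies([("PC", "Neuron")], public, equivalences))
print(merge_ontologies([("PC", "Neuron"), ("cell", "Neuron")], public, equivalences))
```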


ANATOM Domain Map


anatom_dom(X) :- (ucsd_has_a(X,_); ucsd_has_a(_,X); ucsd_isa(X,_); ucsd_isa(_,X)).
senselab_dom(X) :- (sl_has_a(X,_); sl_has_a(_,X); sl_isa(X,_); sl_isa(_,X)).

% map senselab anatom terms to equivalent ucsd anatom terms
sl2ucsd(X,X) :- senselab_dom(X), anatom_dom(X).
sl2ucsd('A',axon).
sl2ucsd('AH',axon).
sl2ucsd('Dad',spiny_branchlet). % should REALLY map to a PATH, not just the end of the path
sl2ucsd('Dam',main_branches). % really only SOME of the main_branches, based on the branch level
sl2ucsd('Dap',main_branches).
sl2ucsd('Dbd',spiny_branchlet).
sl2ucsd('Dbm',main_branches).
sl2ucsd('Dbp',main_branches).
sl2ucsd('Ded',spiny_branchlet).
sl2ucsd('Dem',main_branches).
sl2ucsd('Dep',main_branches).
sl2ucsd('T',axon).

% keep a ucsd has_a edge if at least one of its nodes is the target of a senselab mapping
has_a(X,Y) :- sl2ucsd(_,X), ucsd_has_a(X,Y).
has_a(X,Y) :- sl2ucsd(_,Y), ucsd_has_a(X,Y).

% keep all and only ucsd is-a's
isa(X,Y) :- ucsd_isa(X,Y).
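Read as Datalog, these rules can be evaluated bottom-up over sets. A minimal Python sketch: the `sl2ucsd` facts are copied from the rules above, while the contents of `ucsd_has_a`, `ucsd_isa`, `sl_has_a`, and `sl_isa` are hypothetical, since the slide does not list them. The hypothetical edge `soma has_a nucleus` touches no mapped term and is therefore dropped.

```python
# Hypothetical base facts standing in for ucsd_has_a/2, ucsd_isa/2, sl_has_a/2, sl_isa/2.
UCSD_HAS_A = {("purkinje_cell", "dendrite"), ("dendrite", "main_branches"),
              ("main_branches", "spiny_branchlet"), ("purkinje_cell", "axon"),
              ("purkinje_cell", "soma"), ("soma", "nucleus")}
UCSD_ISA = {("purkinje_cell", "neuron")}
SL_HAS_A = {("purkinje_cell", "Dad"), ("purkinje_cell", "T")}
SL_ISA = set()

# Explicit sl2ucsd/2 facts from the rules above.
SL2UCSD_FACTS = {("A", "axon"), ("AH", "axon"), ("Dad", "spiny_branchlet"),
                 ("Dam", "main_branches"), ("Dap", "main_branches"),
                 ("Dbd", "spiny_branchlet"), ("Dbm", "main_branches"),
                 ("Dbp", "main_branches"), ("Ded", "spiny_branchlet"),
                 ("Dem", "main_branches"), ("Dep", "main_branches"), ("T", "axon")}

def domain(*edge_sets):
    """anatom_dom / senselab_dom: every term on either side of an edge."""
    return {term for edges in edge_sets for edge in edges for term in edge}

anatom_dom = domain(UCSD_HAS_A, UCSD_ISA)
senselab_dom = domain(SL_HAS_A, SL_ISA)

# sl2ucsd(X,X) :- senselab_dom(X), anatom_dom(X).   plus the explicit facts
sl2ucsd = {(x, x) for x in senselab_dom & anatom_dom} | SL2UCSD_FACTS
mapped_targets = {y for _, y in sl2ucsd}

# has_a(X,Y) :- sl2ucsd(_,X), ucsd_has_a(X,Y).   has_a(X,Y) :- sl2ucsd(_,Y), ucsd_has_a(X,Y).
has_a = {(x, y) for x, y in UCSD_HAS_A if x in mapped_targets or y in mapped_targets}

# isa(X,Y) :- ucsd_isa(X,Y).
isa = set(UCSD_ISA)

print(sorted(has_a))
```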


[Figure: domain-map fragment. Concepts: Neuron, MyNeuron, Neostriatum, Compartment, Spiny Neuron, Soma, Axon, Dendrite, Medium Spiny Neuron, Neurotransmitter, MyDendrite, GABA, Substance P, Dopamine R, Substantia Nigra Pc, Substantia Nigra Pr, Globus Pallidus Int., Globus Pallidus Ext. Edge and node labels: exp, ALL:has, =, AND, OR.]

Mediation Services: Client Registration

  • Client types: Update Client, Query Client, Thin Result Viewer, Fat Result Viewer

  • Capabilities shown on the slide: Navigate/Ad-hoc Query Capability; Query on Schema; Derive Before Insert; Check Data; Merge Before Insert; Client-side Processing; Client-side Buffer; Send Full Data; Context Sensitive; Server-side Buffer; Server-Push/Client-Pull

Mediation Services: Integrated View Definition

  • For the domain data modeler

  • Currently in a logic language (Frame-logic, F-logic)

    protein_distribution(Protein, Organism, Brain_region, Feature_name, Anatom, Value)
    if
        % from PROLAB
        I:protein_label_image[proteins ->> {Protein};
                              organism -> Organism;
                              anatomical_structures ->> {AS:anatomical_structure[name -> Anatom]}],
        % from ANATOM
        NAE:neuro_anatomic_entity[name -> Anatom;
                                  located_in ->> {Brain_region}],
        AS..segments..features[name -> Feature_name; value -> Value].

  • May be wrapped into a simpler tool
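Operationally, the view is a join of PROLAB image objects with ANATOM entities on the shared variable `Anatom`. A minimal Python sketch with hypothetical objects; the nested-dictionary layout is an assumption, and the set-valued path `AS..segments..features` is modeled here as nested lists, which is an assumption about what the `..` path denotes.

```python
# Hypothetical PROLAB objects (protein_label_image) with nested structures.
PROLAB_IMAGES = [
    {"proteins": {"RyR"}, "organism": "rat",
     "anatomical_structures": [
         {"name": "spiny_branchlet",
          "segments": [{"features": [{"name": "protein_amount", "value": 30}]}]},
         {"name": "soma",
          "segments": [{"features": [{"name": "protein_amount", "value": 4}]}]},
     ]},
]
# Hypothetical ANATOM neuro_anatomic_entity objects.
ANATOM = [
    {"name": "spiny_branchlet", "located_in": {"molecular_layer"}},
    {"name": "soma", "located_in": {"purkinje_cell_layer"}},
]

def protein_distribution():
    """Yield (Protein, Organism, Brain_region, Feature_name, Anatom, Value) tuples,
    joining PROLAB structures with ANATOM entities on the shared name Anatom."""
    located_in = {e["name"]: e["located_in"] for e in ANATOM}
    for img in PROLAB_IMAGES:                               # I:protein_label_image
        for protein in img["proteins"]:                     # proteins ->> {Protein}
            for structure in img["anatomical_structures"]:  # AS:anatomical_structure[name -> Anatom]
                anatom = structure["name"]
                for region in located_in.get(anatom, ()):   # NAE[... located_in ->> {Brain_region}]
                    for segment in structure["segments"]:   # AS..segments..features
                        for feature in segment["features"]:
                            yield (protein, img["organism"], region,
                                   feature["name"], anatom, feature["value"])

for row in protein_distribution():
    print(row)
```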

Mediation Services: Query Formulation Tools

  • A combination of ad hoc and navigational querying

  • Open Issues

    • Recursive queries

    • Aggregate queries

    • Combining data and service requests

Mediation Services: Data Update Tools

Some Open Issues

  • Data/Knowledge Modeling

    • Extensibility: how to handle a source with new data types and operations?

      • Temporal Data: instrument readings, video microscopy

      • Spatial Data: Integrating with spatial database systems

      • Image database systems

    • Conflict Management

      • Grades of certainty

      • Alternate hypotheses

  • Integrating Services

    • Registration and warping of my image slice to a reference

  • Integrating into Larger Applications

    • M-Cell simulation

    • Telemicroscopy

    • Visualization
