The pPOD Core Data Model


The pPOD Core Data Model. The pPOD CDM team: Bill Piel, Shirley Cohen, Tim McPhillips, Shawn Bowers, Sarah Cohen-Boulakia, Val Tannen


Special thanks to Brent Mishler, David Maddison, Jeff Oliver, Rutger Vos, Francois Lutzoni, Martin Ramirez, Jonathan Coddington, Wayne Maddison, Fan Ge, Ashley Green, Jin Ruan, Martin Wu, John Lundberg, John Sullivan


Goals

  • The Core Data Model (CDM) under development in the pPOD project will serve the following purposes:

    • It will allow experimentation with the modeling of provenance in phylogenetic pipelines.

    • It will serve as a schema for a persistence tool that works (1) in standalone mode, (2) with our lab notebook suite, and (3) integrated with Mesquite as a module.

    • It will serve as a target for schema mappings used to connect other AToL databases and resources such as TreeBASE, using the Orchestra integration engine.


The Role of Provenance

Backwards provenance “query”

Starting from a research “product”, e.g., a tree, a supertree, or a matrix, track backwards through stored objects to all the raw input information that led to this product.

Forwards provenance “query”

Starting from a raw input, e.g., a specimen, an image, or a sequence, track forwards through stored objects to all research products that this input contributed to.

In both cases, navigate biological assumptions (e.g., homology assumptions) in both directions.
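The two queries above can be read as reachability over a graph of stored objects. The following is a minimal sketch, not the pPOD implementation: the class name `ProvenanceGraph`, its methods, the object identifiers, and the specimen → image → cell → matrix → tree chain are all assumptions made for illustration.

```python
from collections import defaultdict, deque


class ProvenanceGraph:
    """Directed graph; an edge points from an input to an object derived from it."""

    def __init__(self):
        self._derived = defaultdict(set)  # input -> objects derived from it
        self._sources = defaultdict(set)  # object -> inputs it was derived from

    def add_derivation(self, source, product):
        self._derived[source].add(product)
        self._sources[product].add(source)

    @staticmethod
    def _reach(start, edges):
        # Breadth-first search collecting everything reachable from start.
        seen, queue = set(), deque([start])
        while queue:
            node = queue.popleft()
            for nxt in edges.get(node, ()):
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
        return seen

    def backward(self, product):
        """All stored objects that contributed to `product`."""
        return self._reach(product, self._sources)

    def forward(self, raw_input):
        """All stored objects that `raw_input` contributed to."""
        return self._reach(raw_input, self._derived)

    def raw_inputs(self, product):
        """Contributors that have no recorded source of their own."""
        return {n for n in self.backward(product) if not self._sources.get(n)}


# Hypothetical objects and derivation chain, for illustration only.
g = ProvenanceGraph()
g.add_derivation("specimen-1", "image-1")
g.add_derivation("image-1", "cell-a")
g.add_derivation("specimen-2", "image-2")
g.add_derivation("image-2", "cell-b")
g.add_derivation("cell-a", "matrix-1")
g.add_derivation("cell-b", "matrix-1")
g.add_derivation("matrix-1", "tree-1")

backward_raw = g.raw_inputs("tree-1")
forward_products = g.forward("specimen-1")
```

Keeping both edge directions indexed makes the backward and forward queries symmetric, at the cost of storing each derivation twice.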


Persistence Tool

[Diagram. Labels: persistence manager; RDBMS; CDM (an OO schema); store commands; provenance query; query (phylogenetic query language); Kepler-based workflow tool; Mesquite module; schema mappings; AToL AAA; TreeBASE.]


AToL Data that needs to be modeled in CDM (not an exhaustive list)

Analyzed data: trees, matrices, cells, (row) segments, operational taxonomic units (OTUs), taxa, standard characters and their states, genes, gene fragments

Raw data: standard views, images, sequences, chromatograms, primers, specimens, samples, collections


CDM: Phylogeny Inference Data

Analyzed data: trees, matrices, operational taxonomic units (OTUs), standard taxa

[Class diagram. Classes: Tree, Matrix, StdMatrix, SeqMatrix, OTU, StdTaxon. Association labels: provenance, authority, taxon, isA. Collection types: Set, List.]


Modeling Provenance (1)

[Diagram: Tree and Matrix linked by a provenance association.]

…but also…

  • Software (Parameters)

  • Author

  • Date

Must be modeled and stored explicitly! But it can be provided by automatic workflow tools.


“Kinds” of Provenance

Slide annotation: In our CDM tools

  • Relationship between stored objects

    • E.g., tree T123 was obtained from matrix M456 by Joe Bio on 01/31/2001 using PAUP with parameters… (see previous slide)

    • Tracking through copy or cut/paste operations, possibly across repositories

  • Trace of data moving through a workflow

    • Sequence of timestamps, tool invocations (parameters), authors

  • Trace of data through a logically expressed view/query

    • Can be computed automatically as the view/query output is computed

Slide annotation: In our workflow tool
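The first kind of provenance, a relationship between stored objects, could be stored as an explicit record carrying software, parameters, author, and date. This is only a sketch: the class `Derivation` and its field names are assumptions, not part of the CDM as presented. The slide elides the PAUP parameters, so the example leaves them empty rather than guessing.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class Derivation:
    """One explicit provenance record linking a product to its source object."""
    product_id: str
    source_id: str
    software: str
    author: str
    performed_on: date
    # Tool parameters as name -> value; empty when they were not recorded.
    parameters: dict = field(default_factory=dict)

    def __post_init__(self):
        # Provenance must be stored explicitly, so reject blank key fields.
        for name in ("product_id", "source_id", "software", "author"):
            if not getattr(self, name).strip():
                raise ValueError(f"{name} must not be empty")

    def describe(self) -> str:
        return (
            f"{self.product_id} was obtained from {self.source_id} "
            f"by {self.author} on {self.performed_on.isoformat()} "
            f"using {self.software}"
        )


# The example from the slide; parameters are elided there, so none are given.
record = Derivation(
    product_id="T123",
    source_id="M456",
    software="PAUP",
    author="Joe Bio",
    performed_on=date(2001, 1, 31),
)
summary = record.describe()
```

A workflow tool could create such records automatically at each tool invocation, which matches the point that provenance need not be entered by hand.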


CDM: Morphological Data

Analyzed data: standard matrices, cells, standard characters and their states

Raw data: standard views, images, specimens, collections


[Class diagram for morphological data. Classes: OTU, Specimen, Collection, Matrix, StdMatrix, Cell, Image, StdChar (states : List<string>), StdView. Association labels: prov, code(states). Collection types: List, Set.]


Modeling Provenance (2)

[Diagram: example provenance graph. Nodes: tree T123; matrix M456; cell(0,0), cell(28,23), cell(28,45); img 193, img 194, img 204, img 206, img 211; spec 19, spec 20, spec 21.]


Example of Phylogenetic Query

Find all standard matrices with some character C whose label contains the substring "elytra" and some OTU whose state for character C contains the substring "transverse"; return all such matrices, together with their characters, OTUs and states satisfying the conditions.


Semi-formalized (OQL) query example

SELECT M, label of C, label of X,
       label of state encoded in cell E
FROM   M over all standard matrices,
       C over all characters of M,
       X over all OTUs of M,
       E is the cell corresponding to C and X in M
WHERE  the label of C is like "*elytra*"
  AND  the label of the state encoded in cell E is like "*transverse*"
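To make the semantics of the query concrete, here is a rough in-memory equivalent. It is a sketch under assumptions: the classes `StdChar` and `StdMatrix` borrow names from the CDM diagrams, but their fields, the cell encoding as an index into the character's state list, and all sample labels are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class StdChar:
    label: str
    states: list  # state labels; a cell stores an index into this list


@dataclass
class StdMatrix:
    name: str
    characters: list  # StdChar objects, one per column
    otus: list        # OTU labels, one per row
    # (otu_index, char_index) -> state index; missing key means no coded state
    cells: dict = field(default_factory=dict)


def find_matches(matrices, char_substring, state_substring):
    """Return (matrix, character label, OTU label, state label) per matching cell."""
    results = []
    for m in matrices:
        for ci, c in enumerate(m.characters):
            if char_substring not in c.label:
                continue
            for oi, otu in enumerate(m.otus):
                code = m.cells.get((oi, ci))
                if code is None:
                    continue
                state = c.states[code]
                if state_substring in state:
                    results.append((m.name, c.label, otu, state))
    return results


# Hypothetical sample data, for illustration only.
shape = StdChar("elytra shape", ["rounded", "transverse ridges"])
legs = StdChar("leg count", ["six", "eight"])
m1 = StdMatrix(
    name="M1",
    characters=[shape, legs],
    otus=["OTU1", "OTU2"],
    cells={(0, 0): 0, (0, 1): 0, (1, 0): 1, (1, 1): 0},
)
matches = find_matches([m1], "elytra", "transverse")
```

The substring tests here are case-sensitive; whether the intended `like` operator is case-sensitive is not stated in the slides.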


Molecular Data

Analyzed data: sequence matrices, (row) segments, genes, gene fragments

Raw data: sequences, chromatograms, primers, specimens, samples, collections


[Diagram: a molecular matrix. Labels: gene frag 1, gene frag 2; OTU1, OTU2. Annotations: a row segment (from some sequence); from some contig; from different specimens.]


[Class diagram for molecular data. Classes: SeqMatrix, Row Segment, ColumnSeg, OTU, Contig, Raw Sequence, Protein, GeneFragment, Primer, Chromatogram, Collection, Specimen, Sample. Attribute shown twice: endPos : int. Association labels: prov (one marked "prov???"), isA. Collection types: List, Set.]
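One way to read the row segments and the endPos attribute in the molecular diagrams: a matrix row is an ordered list of segments, each recording only where it ends, so that its start is implied by the previous segment. That reading is an assumption, as are the class `RowSegment`, its fields, the inclusive 0-based positions, and the sample identifiers below.

```python
from dataclasses import dataclass


@dataclass
class RowSegment:
    end_pos: int   # last column covered by this segment (inclusive, 0-based)
    residues: str  # aligned characters for this stretch of the row
    source: str    # id of the sequence or contig the segment came from


def segment_spans(segments):
    """Return (start, end, source) per segment; start follows the previous end."""
    spans, start = [], 0
    for seg in segments:
        if seg.end_pos < start:
            raise ValueError("segments must be ordered by increasing end_pos")
        if len(seg.residues) != seg.end_pos - start + 1:
            raise ValueError("residue count does not match the span")
        spans.append((start, seg.end_pos, seg.source))
        start = seg.end_pos + 1
    return spans


def source_at(segments, column):
    """Provenance lookup: which source supplied the given matrix column?"""
    for start, end, source in segment_spans(segments):
        if start <= column <= end:
            return source
    raise IndexError(column)


# Hypothetical row for one OTU, built from two different sources.
row = [
    RowSegment(end_pos=3, residues="ACGT", source="seq-A"),
    RowSegment(end_pos=9, residues="TTGACA", source="contig-B"),
]
spans = segment_spans(row)
full_row = "".join(seg.residues for seg in row)
origin = source_at(row, 5)
```

Storing only the end position keeps adjacent segments from overlapping or leaving gaps, since every start is derived rather than entered.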