The pPOD (processing PhylOData) Project
a collaborative between
University of Pennsylvania
University of California, Davis
Yale University
University of Florida
AToL Projects' Data
1. Genotypic descriptions and their provenance;
2. Phenotypic descriptions and their provenance;
3. Specimens and their provenance including collection information, voucher deposition, etc.;
4. Interpretation of the primary measurements including homology;
5. Estimates of phylogenies, and information about the methods employed;
6. Supertree construction, and information about the methods employed; and
7. Post-tree analyses such as character evolution hypotheses.
… more …
Need to develop the infrastructure to integrate AToL data sources together with other valuable resources such as publication archival databases, morphological character databases, phylogenomics databases, etc.
Hence the pPOD mission.
An extensible core data model for phylogenetic data, with a query language and a persistence tool;
Components for peer-to-peer data integration and exchange, using schema mappings (Orchestra);
A scientific workflow system for interoperating the data integration components with the local database access components and with analysis tools (Kepler).
Strong support for systematics-oriented provenance management.
Val Tannen (PI), Sue Davidson, Zack Ives, Junhyong Kim (coPIs)
Zhaowei Bao, T.J. Green, Greg Karvounarakis, U Pennsylvania
Bertram Ludaescher (PI), Shawn Bowers, Tim McPhillips (coPIs)
Manish Anand, UC Davis
Reed Beaman (PI), U Florida
Bill Piel (coPI), Greg Jordan, Yale
Peter Buneman, U Edinburgh
Michael Donoghue, Yale
Jim Leebens-Mack, U Georgia
Francois Lutzoni, Duke
David Maddison, U Arizona
Wayne Maddison, U British Columbia
Brent Mishler, UC Berkeley
Bernard Moret, EPF Lausanne
Rod Page, U Glasgow
Mike Sanderson, U Arizona
Todd Vision, U North Carolina and NESCENT
Collaboration with Sarah Cohen-Boulakia and Christine Froidevaux, U Paris-Sud
Many of these people and others have participated in a community feedback meeting, 11-12 Sept. 2007 (at NESCENT).
The pPOD CDM team: Bill Piel, Shirley Cohen, Tim McPhillips, Shawn Bowers, Sarah Cohen-Boulakia, Reed Beaman, Val Tannen
Special thanks to Brent Mishler, David Maddison, Jeff Oliver, Rutger Vos, Francois Lutzoni, Martin Ramirez, Jonathan Coddington, Wayne Maddison, Fan Ge, Ashley Green, Jin Ruan, Martin Wu, John Lundberg, John Sullivan
MIAPA compatibility: working with Jim Leebens-Mack and Todd Vision
3. It will serve as a target for schema mappings used to connect other AToL databases, resources like TreeBASE, etc., using the Orchestra integration engine.
Starting from a research “product”, e.g., a tree, a supertree, or a matrix, track backwards through stored objects to all the raw input information that led to this product.
Starting from a raw input, e.g., a specimen, an image, or a sequence, track forwards through stored objects to all research products that this input contributed to.
In both cases, navigate biological assumptions, e.g., homology assumptions, in both directions.
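The backward and forward tracking described above can be illustrated as reachability over a "derived from" graph. The following is a minimal sketch, not the pPOD implementation: the `ProvenanceGraph` class, its method names, and the object identifiers such as `"specimen:sp1"` are all hypothetical, and a homology assumption is modeled simply as one more node that a matrix depends on.

```python
from collections import defaultdict, deque


class ProvenanceGraph:
    """Hypothetical store of 'product was derived from input' links."""

    def __init__(self):
        self._derived_from = defaultdict(set)    # product -> direct inputs
        self._contributes_to = defaultdict(set)  # input -> direct products

    def record(self, product, inputs):
        """Register that `product` was derived directly from each of `inputs`."""
        for item in inputs:
            self._derived_from[product].add(item)
            self._contributes_to[item].add(product)

    @staticmethod
    def _reach(start, edges):
        """Breadth-first search: everything reachable from `start` via `edges`."""
        seen, queue = set(), deque([start])
        while queue:
            node = queue.popleft()
            for nxt in edges.get(node, ()):
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
        return seen

    def trace_back(self, product):
        """All stored objects (down to raw inputs) that led to `product`."""
        return self._reach(product, self._derived_from)

    def trace_forward(self, raw_input):
        """All stored objects (up to research products) that `raw_input` fed into."""
        return self._reach(raw_input, self._contributes_to)


# Illustrative data only; none of these identifiers come from a real database.
graph = ProvenanceGraph()
graph.record("sequence:s1", ["specimen:sp1"])
graph.record("image:i1", ["specimen:sp1"])
graph.record("matrix:m1", ["sequence:s1", "image:i1", "homology:h1"])
graph.record("tree:t1", ["matrix:m1"])
graph.record("supertree:st1", ["tree:t1"])

inputs_of_tree = graph.trace_back("tree:t1")
products_of_specimen = graph.trace_forward("specimen:sp1")
```

Keeping both edge directions indexed makes the two questions symmetric: the same traversal answers "what led to this tree?" and "what did this specimen contribute to?".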
In our CDM
Analyzed data: trees,
operational taxonomic units (OTUs), taxa,
standard characters and their states,
Raw data: standard views, images,
Technology: object-oriented modeling (from OO databases),
use Hibernate for relational storage
OQL (ODMG) and HQL (Hibernate) for query language
Find all standard matrices
with some character C whose label contains the substring "elytra"
and some OTU whose state for character C contains the substring "transverse";
return all such matrices, together with their characters, OTUs and states satisfying the conditions.
SELECT M, label of C, label of X,
label of state encoded in cell E
FROM M over all standard matrices,
C over all characters of M,
X over all OTUs of M,
E is the cell corresponding
to C and X in M
WHERE the label of C is like "*elytra*"
AND the label of the state encoded
in cell E is like "*transverse*"
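To make the semantics of this query concrete, here is a minimal in-memory sketch in Python. It is not HQL or OQL and does not reflect the actual CDM classes: `Character`, `StandardMatrix`, `find_matches`, and the sample data are hypothetical stand-ins, and the `*` wildcard is interpreted with Python's `fnmatch` module.

```python
from dataclasses import dataclass, field
from fnmatch import fnmatchcase


@dataclass
class Character:
    label: str
    states: dict  # state code -> state label


@dataclass
class StandardMatrix:
    label: str
    characters: list                            # list of Character
    otus: list                                  # OTU labels
    cells: dict = field(default_factory=dict)   # (character index, OTU label) -> state code


def find_matches(matrices, char_pattern="*elytra*", state_pattern="*transverse*"):
    """Return (matrix label, character label, OTU label, state label) tuples."""
    results = []
    for m in matrices:                              # M over all standard matrices
        for ci, c in enumerate(m.characters):       # C over all characters of M
            if not fnmatchcase(c.label, char_pattern):
                continue
            for x in m.otus:                        # X over all OTUs of M
                code = m.cells.get((ci, x))         # E: the cell for C and X in M
                if code is None:
                    continue                        # no state recorded in this cell
                state_label = c.states.get(code, "")
                if fnmatchcase(state_label, state_pattern):
                    results.append((m.label, c.label, x, state_label))
    return results


# Illustrative data only.
beetles = StandardMatrix(
    label="beetles",
    characters=[
        Character("elytra striae", {0: "longitudinal", 1: "transverse ridges"}),
        Character("antenna shape", {0: "filiform", 1: "transverse club"}),
    ],
    otus=["OTU_A", "OTU_B"],
    cells={(0, "OTU_A"): 1, (0, "OTU_B"): 0, (1, "OTU_A"): 1},
)
matches = find_matches([beetles])
```

With this sample data, only the cell for "elytra striae" and OTU_A satisfies both conditions; the "antenna shape" cell has a matching state label but fails the character condition.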
Bill Piel and Greg Jordan (Yale)
Funded in part by NESCENT/Google Summer of Code.
A phylogeny browser and editor designed for large tree visualization and manipulation. Deployable on the web as a Java applet or as a stand-alone application (under construction).
To be integrated with the pPOD CDM query language.
For relational databases:
AliceTable(Attr1, -, Attr2, -) → CDMTable(Attr1, Attr2, -)
From TreeBASE II to pPOD CDM:
study(studyId, studyName, studyAcc),
analysis(anId, studyId, anName, _),
ANALYSISSTEP(stepId, anId, softId, dataId, stepNam, _, parms),
SOFTWARE(softId, soft, -, -, -, -),
analyzeddata(dataId, false, len),
MATRIX(matrixId, dataId, matrixTitle, -, -, -),
matrixrow(rowId, matrixId, mLabelId),
TAXONLABEL(mLabelId, inLabel),
PHYLOTREE(treeId, dataId, treeTitle),
treenode(nodeId, treeId, tLabelId, nodeLabel),
TAXONLABEL(nodeId, otuLabel)
→
MATRIX(matrixId, matrixTitle),
matrixotu(matrixId, otuId, -),
OTU(otuId, otuLabel),
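As a rough illustration of the generic relational form (AliceTable to CDMTable), the sketch below copies the mapped attributes and fills the unmapped target attribute with a fresh placeholder. This is only an assumption about how the "-" positions might be treated: the placeholder scheme and the names `labeled_null` and `map_alice_to_cdm` are hypothetical, and nothing here is the Orchestra API.

```python
from itertools import count

_fresh_ids = count(1)


def labeled_null():
    """Return a fresh placeholder for a target attribute the mapping leaves unspecified."""
    return f"_N{next(_fresh_ids)}"


def map_alice_to_cdm(alice_rows):
    """Apply AliceTable(Attr1, -, Attr2, -) to CDMTable(Attr1, Attr2, -).

    Source positions marked '-' are ignored; the target position marked '-'
    receives a distinct placeholder per generated row (an assumption).
    """
    cdm_rows = []
    for attr1, _unused_a, attr2, _unused_b in alice_rows:
        cdm_rows.append((attr1, attr2, labeled_null()))
    return cdm_rows


# Illustrative rows only.
alice_table = [
    ("m1", "x", "Matrix one", "y"),
    ("m2", "x", "Matrix two", "z"),
]
cdm_table = map_alice_to_cdm(alice_table)
```

Because the mapping is declared separately from the data, adapting to a schema change amounts to editing the attribute correspondence rather than rewriting import code.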
Each peer is able to
Map its schema to other peers' schemas
Query through its own view of the shared data
Publish its updates to the other peers
Automatically reconcile its data with other peers' data through mappings
Curate other peers' data when imported
Define trust policies over the related databases
It is easy to change the mappings to adapt to schema changes, new data needs, ... A much simpler development task!
Zack Ives, Val Tannen, T.J. Green, Greg Karvounarakis
Working on deployment with Reed Beaman
The Kepler/pPOD System
Scientific workflow and provenance support for the AToL community
Bertram Ludaescher, Shawn Bowers, Tim McPhillips, Manish Anand
A workflow for aligning protein sequences using the CIPRes ClustalW service
A workflow for refining protein sequence alignments using the local Gblocks application
Similar workflows for inferring trees using the CIPRes RAxML service...
...the CIPRes MrBayes service...
... and the local Phylip protpars program