Interoperability of phylogenetic data

Weigang Qiu, Rutger Vos

Introduction
• Improving interoperability at the data level.
• We consider two options:
  • “new standard” approach: a new file format, endorsed by this meeting;
  • “abstraction” approach: a parsing and serialization layer plus an intermediate data model layer.
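To make the "abstraction" option concrete, here is a minimal sketch of its three layers: a parser, an intermediate data model, and a serializer. The names `TreeRecord`, `parse_newick`, and `to_nexus` are hypothetical and were not proposed in the talk. The sketch assumes unquoted taxon labels and ignores internal node labels.

```python
import re
from dataclasses import dataclass, field


@dataclass
class TreeRecord:
    """Hypothetical intermediate data model: one named tree and its taxa."""
    name: str
    newick: str
    taxa: list = field(default_factory=list)


def parse_newick(text, name="tree1"):
    """Parsing layer: read a Newick string into the intermediate model."""
    newick = text.strip()
    if not newick.endswith(";"):
        raise ValueError("Newick string must end with ';'")
    # Tip labels directly follow '(' or ','. Quoted labels are not handled.
    taxa = re.findall(r"[(,]\s*([A-Za-z0-9_.]+)", newick)
    return TreeRecord(name=name, newick=newick, taxa=taxa)


def to_nexus(record):
    """Serialization layer: write the intermediate model as minimal NEXUS."""
    lines = [
        "#NEXUS",
        "BEGIN TAXA;",
        f"    DIMENSIONS NTAX={len(record.taxa)};",
        f"    TAXLABELS {' '.join(record.taxa)};",
        "END;",
        "BEGIN TREES;",
        f"    TREE {record.name} = {record.newick}",
        "END;",
    ]
    return "\n".join(lines)


record = parse_newick("((A:0.1,B:0.2):0.3,C:0.4);")
print(to_nexus(record))
```

Each additional format needs its own parser and serializer against the shared model, which is where the mapping effort discussed on later slides comes from.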

A new standard
• What is to be done to achieve this?
• Walkthrough of the steps involved;
• Compare and contrast with the abstraction approach.

Creating a new standard
• Exhaustively define and publish structure and syntax;
• Create an unambiguous validation procedure;
• Create an extension protocol:
  • Governed extension adoption mechanism
  • Versioning
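As an illustration of what an unambiguous validation procedure with versioning and governed extensions could look like, here is a minimal sketch. The document layout (a dictionary with `version`, `taxa`, and `trees` keys) and the `x-` prefix for extension keys are assumptions made for this example only; the talk does not define a concrete format.

```python
# Hypothetical rules for an imagined standard; none of this is from the talk.
SUPPORTED_MAJOR = 1
REQUIRED_KEYS = {"version", "taxa", "trees"}


def validate(doc):
    """Return a list of error messages; an empty list means the document is valid."""
    errors = []

    # Structure: every required key must be present.
    for key in sorted(REQUIRED_KEYS - doc.keys()):
        errors.append(f"missing required key: {key}")

    # Versioning: only the supported major version is accepted.
    if "version" in doc:
        try:
            major = int(str(doc["version"]).split(".")[0])
        except ValueError:
            errors.append(f"malformed version: {doc['version']}")
        else:
            if major != SUPPORTED_MAJOR:
                errors.append(f"unsupported major version: {major}")

    # Content: taxon labels must be unique.
    taxa = doc.get("taxa", [])
    if len(set(taxa)) != len(taxa):
        errors.append("duplicate taxon labels")

    # Extension protocol: unknown keys must use the reserved 'x-' prefix.
    for key in sorted(doc.keys() - REQUIRED_KEYS):
        if not key.startswith("x-"):
            errors.append(f"unknown key without 'x-' prefix: {key}")

    return errors


print(validate({"version": "1.2", "taxa": ["A", "B"], "trees": [], "x-model": "GTR"}))
print(validate({"version": "2.0", "taxa": ["A", "A"], "notes": "draft"}))
```

Because the validator returns every violation instead of stopping at the first, two independent implementations can be compared on the same input, which is one way to keep validation objective.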

Commitment to maintenance
• Standard governance means a long-term commitment by someone or some “body” (NESCent? OBF? Us?), but:
  • So does the abstraction approach,
  • …which doesn’t encourage standardization
  • …and trails rather than leads data trends
  • …neither of which provides impetus for maintenance

New standard, new features
• When designing the standard, add attractive new features from the start:
  • Substitution models;
  • More metadata for taxa, trees, nodes, matrices, sequences, sites;
  • More metadata for the “project” (analysis metadata, logging).

Expanding abstraction architecture
• Even without new features, supporting the union of existing features in the abstraction approach implies a complex ontology and metaformat:
  • Premature generalization
  • Analysis paralysis

Implement I/O in common tools
• Possible early adopters:
  • Services: CIPRES and TreeBASE
  • Analysis apps: PAUP*, HyPhy, MEGA, MrBayes, Mesquite
  • Toolkits: Bio::*

Implementation of abstraction
• Abstraction architecture needs to be developed separately
• By whom? In what language?
• More of a “complete rewrite”
• Doesn’t tap into phylogenetics community expertise
• Risk of “lock-in”

Adoption of new standard
• Advocacy to increase community adoption:
  • Carrots:
    • Access to new features
    • Robust, can be validated objectively
    • Interoperability
  • Stick:
    • Submission requirement for services, databases, journals

Adoption of abstraction architecture
• Adoption is hampered by a catch-22:
  • The abstraction architecture only encourages third-party authors to contribute mappings if doing so “adds value” to their application,
  • But the main added value of the abstraction architecture is the number of contributed mappings

Do the simplest thing that could possibly work
