Presentation Transcript

  1. Nexml: A future data exchange standard for phylogenetics. Rutger Vos, University of British Columbia

  2. Outline: Introduction (The problem • EvoInfo interests • This subproject • Nexus issues • Parsing • Extensibility • XML goodies) • Design (Principles • Re-use • Patterns • Inheritance • References) • Implementation (Approach • Example • Inheritance • Anatomy • Characters • Trees) • Current status (Schema blocks • Parsers & writers • Experiments • To do) • Resources. Introduction (1/7): The problem • Increased automation in evolutionary informatics is hampered by poorly defined “standards”

  3. Introduction (2/7): EvoInfo interests • Addressing interoperability problems by coding our way out of them • Semantics: CDAO • Syntax: Nexml • Transport: PhyloWS

  4. Introduction (3/7): This subproject’s mission • To create a file format like nexus*, but: • Fix (some) problems with nexus • Give access to data at a higher level • Be extensible • Expose data to XML goodies *Maddison, Swofford and Maddison, 1997. NEXUS: An Extensible File Format for Systematic Information. Syst. Biol. 46(4):590-621

  5. Introduction (4/7): Nexus problems • Hard/impossible to validate • No explicit versions • Nothing ever deprecated • No public extensions • Leads to hacks such as ‘mixed’ data, ‘hot comments’ • Phylogenetics post-’80s in private blocks

  6. Introduction (5/7): Parsing plain text versus parsing XML • Processing nexus data involves lexing + parsing + processing • XML allows choosing an existing parser library; the data can then be processed as a structure that hides tokenization issues
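
As a minimal sketch of the XML side of this comparison, the Python standard-library parser can turn a document into a tree without any hand-written lexer. The fragment below is hypothetical: the `nexml`, `otus` and `otu` element names and the `label` attribute are assumptions made for illustration, not taken from the slides.

```python
import xml.etree.ElementTree as ET

# Hypothetical nexml-like fragment; element and attribute names are
# illustrative assumptions, not copied from a published schema.
DOC = """
<nexml version="1.0" generator="example">
  <otus id="taxa1">
    <otu id="t1" label="Homo sapiens"/>
    <otu id="t2" label="Pan troglodytes"/>
  </otus>
</nexml>
"""

root = ET.fromstring(DOC.strip())

# The parser has already dealt with tokenization, quoting and whitespace;
# the calling code only walks the resulting element tree.
labels = {otu.get("id"): otu.get("label") for otu in root.iter("otu")}
print(labels)
```

The same data in a plain-text format would first need a tokenizer that understands quoting, comments and whitespace rules before any structure could be recovered.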

  7. Introduction (6/7): Extensibility • An ‘extensible’ file format should provide the ability to: • define new data types that implement described ‘interfaces’ • attach typed data structures to core types • attach custom XML

  8. Introduction (7/7): XML goodies • Large stack of off-the-shelf tools: • XML parser libraries • Web service toolkits • Native XML databases • Editors / IDEs • Serialization / data binding tools

  9. Design (1/5): Design principles • Re-use of prior art • Follow design patterns • Referencing • Verbose and compact representations

  10. Design (2/5): Re-use of prior art • Generic key/value attachments following Apple’s plist semantics: <dict> <key>prior</key> <float>0.78</float> </dict> • Trees and networks following graphml • General file structure following nexus concepts, i.e. blocks that reference each other
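
The key/value attachment shown on this slide can be read into a native mapping with very little code. The sketch below assumes that keys and values strictly alternate inside `<dict>`, as in the slide's example. The `CONVERTERS` table and the `dict_to_mapping` helper are hypothetical names, and only a small subset of value tags is covered.

```python
import xml.etree.ElementTree as ET

# Converters for a few plist-style value tags; an illustrative subset only.
CONVERTERS = {"float": float, "integer": int, "string": str}


def dict_to_mapping(dict_elem):
    """Pair each <key> child with the value element that follows it."""
    children = list(dict_elem)
    mapping = {}
    for key_elem, value_elem in zip(children[0::2], children[1::2]):
        if key_elem.tag != "key":
            raise ValueError(f"expected <key>, found <{key_elem.tag}>")
        # Unknown value tags fall back to plain strings.
        convert = CONVERTERS.get(value_elem.tag, str)
        mapping[key_elem.text] = convert(value_elem.text)
    return mapping


annotation = ET.fromstring("<dict><key>prior</key><float>0.78</float></dict>")
print(dict_to_mapping(annotation))
```

Because the value tag carries the type, a reader can recover a typed value (here a float) without guessing from the text.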

  11. Design (3/5): XML design patterns • http://www.xmlpatterns.com • “Declare before use” • “Metadata first” • “Venetian blinds” • Abstract inheritance through extension, concrete inheritance through restriction

  12. Design (4/5): Inheritance • “Base”: optional base/lang/href attributes • “Annotated” extends “Base”: optional dict elements • “Labelled” extends “Annotated”: optional label attribute • “IDTagged” extends “Labelled”: required id attribute • “AbstractElement” extends “IDTagged”: in root schema • “ConcreteElement” restricts “AbstractElement”: in instance document

  13. Design (5/5): Referencing • Elements sometimes refer to other elements, much like in nexus • In nexml, elements refer to the id of other elements by the name of the referenced element: <otu id="t1"/> <!-- i.e. OTU, referenced later as: --> <node id="n1" otu="t1"/>
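
A reader can resolve such references with a single id index. The following is a sketch only: the wrapping `<root>` element, the second node `n2` and the `resolve` helper are hypothetical additions for illustration. The `otu` and `node` elements follow the slide.

```python
import xml.etree.ElementTree as ET

# The <otu> and <node> elements follow the slide; the <root> wrapper and
# the unreferenced node n2 are hypothetical additions for this sketch.
FRAGMENT = """
<root>
  <otu id="t1"/>
  <node id="n1" otu="t1"/>
  <node id="n2"/>
</root>
"""

root = ET.fromstring(FRAGMENT.strip())

# Index every element that carries an id attribute.
by_id = {elem.get("id"): elem for elem in root.iter() if elem.get("id")}


def resolve(elem, ref_name):
    """Follow the attribute named after the referenced element type."""
    ref = elem.get(ref_name)
    if ref is None:
        return None  # this element does not reference anything of that type
    target = by_id.get(ref)
    if target is None or target.tag != ref_name:
        raise ValueError(f"{ref_name}={ref!r} does not point to a <{ref_name}>")
    return target


print(resolve(by_id["n1"], "otu").get("id"))
```

Naming the attribute after the referenced element lets the resolver check that the target has the expected type, not just that the id exists.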

  14. Implementation (1/6): Approach • Schema design • Community feedback through wiki, email, telecon, projects (evoinfo, ppod, MIAPA) etc. • Development of processors (perl, java, python, c++, VB) in parallel • Experiments with XML tools (ws, db, data binding tools)

  15. Implementation (2/6): Root element • version="1.0" • generator="mesquite" • Versioned namespace: xmlns:nex="http://www.nexml.org/1.0"
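
An explicit version and a versioned namespace give a reader something to check before processing a file. The sketch below uses the attribute values and namespace URI from the slide. The root element name `nexml` and the `is_supported` helper are assumptions, since the slide does not name the element.

```python
import xml.etree.ElementTree as ET

NEX_NS = "http://www.nexml.org/1.0"  # versioned namespace from the slide

# The root element name "nexml" is an assumption; the slide only lists
# the attributes and the namespace declaration.
DOC = f'<nex:nexml xmlns:nex="{NEX_NS}" version="1.0" generator="mesquite"/>'

root = ET.fromstring(DOC)

# ElementTree reports namespaced tags as "{namespace-uri}localname".
namespace = root.tag[1:].split("}", 1)[0] if root.tag.startswith("{") else None


def is_supported(elem):
    """Accept only documents declaring the namespace and version we know."""
    return namespace == NEX_NS and elem.get("version") == "1.0"


print(namespace, root.get("generator"), is_supported(root))
```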

  16. Implementation (3/6): Inheritance tree for elements

  17. Implementation (4/6): Anatomy of a “block” • <characters id="c1" xsi:type="nex:DnaSeqs" otus="t1"> <dict> <key>desc</key> <string>description…</string> </dict> Contents… </characters>
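
The `xsi:type` attribute on a block tells a reader which concrete subtype it is handling. The sketch below assumes that the annotation `<dict>` sits inside the `<characters>` element and that the `xsi` prefix is bound to the standard XML Schema instance namespace. The namespace declarations are added here so that the fragment parses on its own.

```python
import xml.etree.ElementTree as ET

# Standard XML Schema instance namespace, in ElementTree's "{uri}name" form.
XSI_TYPE = "{http://www.w3.org/2001/XMLSchema-instance}type"

# Attributes follow the slide; the xmlns declarations and the nesting of
# <dict> inside <characters> are assumptions made for this sketch.
DOC = """
<characters xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
            xmlns:nex="http://www.nexml.org/1.0"
            id="c1" xsi:type="nex:DnaSeqs" otus="t1">
  <dict><key>desc</key><string>description</string></dict>
</characters>
"""

block = ET.fromstring(DOC.strip())

# The attribute value is a prefixed name such as "nex:DnaSeqs".
prefix, _, local_type = block.get(XSI_TYPE).partition(":")

print(block.get("id"), local_type, "refers to OTUs block", block.get("otus"))
```

A processor could dispatch on `local_type` to choose, for example, a DNA-specific reader for the block's contents.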

  18. Implementation (5/6): Character classes • Granularity • Data type

  19. Implementation (6/6): Tree classes • Branch type • Topology

  20. Current status (1/4): Schema blocks • Done: • OTUs • characters: dna, rna, nucleotide, protein, categorical, continuous, restriction (compact and verbose) • trees: graphml trees and networks, various edge formats and rootings
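
To illustrate what a graphml-style tree looks like to a reader, the sketch below stores a tree as a flat list of nodes and edges and rebuilds the parent-to-children structure. The `source` and `target` attributes follow graphml's edge convention. The `tree` wrapper, the `length` attribute and all ids are assumptions for illustration, not taken from the slides.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical graphml-style tree: nodes and edges are listed flat, and
# each edge names its source (parent) and target (child) node by id.
TREE = """
<tree id="tree1">
  <node id="n1"/>
  <node id="n2" otu="t1"/>
  <node id="n3" otu="t2"/>
  <edge id="e1" source="n1" target="n2" length="0.5"/>
  <edge id="e2" source="n1" target="n3" length="0.7"/>
</tree>
"""

tree = ET.fromstring(TREE.strip())

# Build parent -> [(child, branch length)] from the edge list.
children = defaultdict(list)
for edge in tree.iter("edge"):
    children[edge.get("source")].append(
        (edge.get("target"), float(edge.get("length")))
    )

# A node that is never the target of an edge has no parent.
targets = {child for kids in children.values() for child, _ in kids}
roots = [node.get("id") for node in tree.iter("node") if node.get("id") not in targets]

print(roots, dict(children))
```

Because edges are explicit elements, the same layout can also describe networks, where a node is the target of more than one edge.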

  21. Current status (2/4): Parsers and writers • Nexml parsers and writers: • mesquite, java, using xmlbeans • Bio::Phylo, perl • pyNexml, python • DAMBE, Visual Basic • stubs for c++ xmlbeans • plans for ruby?

  22. Current status (3/4): Experiments • Included schema in soap wsdl • Indexed files in dbxml • Created large files from tolweb, rbcl • XInclude with tinyseq xml • REST service described using nexml

  23. Current status (4/4): To do • Cross-reference with glossary, ontology • Substitution model descriptions • Publish standard • Follow up on earlier feedback (small fixes) • Sets (in progress, using class identifiers) • More restricted vocabulary attachments (Darwin core) • Distances • Splits

  24. Resources • Base URL: http://www.nexml.org • Wiki: https://www.nescent.org/wg_evoinfo/Future_Data_Exchange_Standard • SourceForge project: http://sourceforge.net/projects/nexml/

  25. Acknowledgements • Contributions: Jason Caravas, Mark Holder, Peter Midford, Jeet Sukumaran, Xuhua Xia • Feedback: wg-evoinfo, pPOD, Wayne Maddison, David Maddison • Additional funding, support: NESCent, GSoC