Introduction to the BinX Library

Introduction to the BinX Library. eDIKT project team Ted Wen tedwen@nesc.ac.uk Robert Carroll robertc@nesc.ac.uk. Agenda. About the BinX project A brief introduction to the BinX language Introduction to the BinX library Advanced API to the BinX library Use cases and requirements


Presentation Transcript


  1. Introduction to the BinX Library eDIKT project team Ted Wen tedwen@nesc.ac.uk Robert Carroll robertc@nesc.ac.uk

  2. Agenda • About the BinX project • A brief introduction to the BinX language • Introduction to the BinX library • Advanced API to the BinX library • Use cases and requirements • Dr Bob Mann • Dr Chris Maynard • Discussion

  3. About the BinX project

  4. The problem • XML is useful to represent metadata • Scientific datasets can be too large in XML • Most scientific data are in binary files • Binary data files are not all standardized • Binary data files are platform-dependent

  5. BinX – a solution • Initially designed for the Grid environment • Annotate data schema for any binary file • Data elements are marked up in XML • Describe three levels of features in a binary file • Underlying physical representation (byte order) • Primitive data types (integer, float) • Structure of the dataset (array, table)
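The lowest of the three levels, byte order, is what makes raw binary files platform-dependent. A minimal sketch using Python's standard `struct` module (not the BinX library; the value 100 is only an illustration):

```python
import struct

value = 100

# The same 16-bit integer written with the two byte orders.
big = struct.pack(">h", value)     # big-endian: most significant byte first
little = struct.pack("<h", value)  # little-endian: least significant byte first

# Reading big-endian bytes as if they were little-endian yields a different number,
# which is why the byte order has to be recorded somewhere (here, in the BinX document).
misread = struct.unpack("<h", big)[0]

print(big.hex(), little.hex(), misread)
```

The bytes carry no marker of their own byte order, so a reader needs external metadata such as the `byteOrder` attribute to interpret them.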

  6. The BinX project at eDIKT • Implement a software library for BinX • Develop a series of tools based on the library • Use C++ for performance • Write portable code for different platforms • Make it robust and easy to use

  7. Development status • Requirement gathering from July 2002 • Development started in October 2002 • Prototype finished in December 2002 • Alpha version complete in April 2003 • Beta version to be released in June 2003

  8. The deliverables • The BinX library • Compiled code on different platforms • Source code with Open Source license • Documentation • User’s guide • Developer’s guide • Utilities and examples

  9. The BinX Language

  10. What is BinX? • The Binary XML Description Language • A language for annotating binary data files • It describes data types, data structures and attributes such as byte order • A BinX document is an XML file with metadata of a binary data file

  11. A BinX document • <dataset byteOrder="bigEndian"> (root element) • <definitions> (data class section) • <defineType typeName="myTyp"> (abstract data type) • <arrayFixed> • <character-8/> • <dim indexTo="9"/> • </arrayFixed> • </defineType> • </definitions> • <file src="myfile.bin"> (data instance section) • <useType typeName="myTyp"/> • <integer-32 varName="X" /> • </file> • </dataset>
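Because a BinX document is ordinary XML, any XML parser can walk it. The sketch below uses Python's standard `xml.etree.ElementTree` rather than the BinX library, and it assumes the document is written exactly as on the slide, with plain quotes and no XML namespace (a real BinX document may declare one):

```python
import xml.etree.ElementTree as ET

# The slide's example document, typed with plain ASCII quotes.
BINX_DOC = """
<dataset byteOrder="bigEndian">
  <definitions>
    <defineType typeName="myTyp">
      <arrayFixed>
        <character-8/>
        <dim indexTo="9"/>
      </arrayFixed>
    </defineType>
  </definitions>
  <file src="myfile.bin">
    <useType typeName="myTyp"/>
    <integer-32 varName="X"/>
  </file>
</dataset>
"""

root = ET.fromstring(BINX_DOC)

# Physical representation declared on the root element.
byte_order = root.get("byteOrder")

# Data class section: names of the user-defined types.
defined_types = [d.get("typeName") for d in root.findall("definitions/defineType")]

# Data instance section: the elements expected in the binary file, in order.
file_elem = root.find("file")
instances = [(child.tag, child.attrib) for child in file_elem]

print(byte_order, defined_types, file_elem.get("src"), instances)
```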

  12. Data elements • Primitive data elements: byte, character, integer, real • Complex data elements: arrays, struct, union • User-defined data elements

  13. Primitive data types • Bit • <bit-1> • Character • <character-8> • <unicodeCharacter-16> • <unicodeCharacter-32> • Integer • <byte-8> • <short-16>, <unsignedShort-16> • <integer-32>, <unsignedInteger-32> • <longInteger-64>, <unsignedLongInteger-64> • Real • <ieeeFloat-32> • <ieeeDouble-64> • <ieeeQuadruple-128>
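Most of these primitive tags correspond directly to fixed-width binary encodings. The table below is a hypothetical mapping onto Python `struct` format codes, written for illustration and not taken from the BinX library. It assumes `byte-8` is unsigned, and it leaves out `bit-1`, the Unicode characters and `ieeeQuadruple-128`, which have no direct `struct` code:

```python
import struct

# Assumed correspondence between BinX primitive tags and struct format codes.
BINX_TO_STRUCT = {
    "character-8": "c",
    "byte-8": "B",  # signedness is an assumption
    "short-16": "h",
    "unsignedShort-16": "H",
    "integer-32": "i",
    "unsignedInteger-32": "I",
    "longInteger-64": "q",
    "unsignedLongInteger-64": "Q",
    "ieeeFloat-32": "f",
    "ieeeDouble-64": "d",
}


def struct_format(tag, byte_order="bigEndian"):
    """Return a struct format string with standard sizes and an explicit byte order."""
    prefix = ">" if byte_order == "bigEndian" else "<"
    return prefix + BINX_TO_STRUCT[tag]


def declared_bits(tag):
    """Width in bits as encoded in the tag name, e.g. 'integer-32' -> 32."""
    return int(tag.rsplit("-", 1)[1])


# Sanity check: the encoded size matches the width in each tag name.
for tag in BINX_TO_STRUCT:
    assert struct.calcsize(struct_format(tag)) * 8 == declared_bits(tag)
```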

  14. Complex data types • Arrays • Repetitive collection of any data element • Multidimensional • Three types of arrays • Fixed length array • Variable-length array • Streamed array • Struct • A sequence of data elements • Union • One of a group of possible data elements conditional to the discriminant

  15. Arrays • Streamed array • <arrayStreamed> • <byte-8/> • <dimStreamed/> • </arrayStreamed> • Fixed-length array • <arrayFixed> • <ieeeDouble-64/> • <dim indexTo="3" name="X" /> • <dim indexTo="4" name="Y" /> • <dim indexTo="5" name="Z" /> • </arrayFixed> • Variable-length array • <arrayVariable sizeRef="byte-8"> • <ieeeFloat-32 /> • <dim indexTo="7"/> • <dimVariable/> • </arrayVariable>
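For the fixed-length array, the dimensions determine how many bytes the array occupies in the file. The sketch below works this out for the slide's three-dimensional example. It assumes that `indexTo` is an inclusive upper bound and that indices start at 0, so each dimension has `indexTo + 1` entries; the slides do not state this explicitly:

```python
import math
import xml.etree.ElementTree as ET

# The fixed-length array from the slide, typed with plain ASCII quotes.
ARRAY = """
<arrayFixed>
  <ieeeDouble-64/>
  <dim indexTo="3" name="X"/>
  <dim indexTo="4" name="Y"/>
  <dim indexTo="5" name="Z"/>
</arrayFixed>
"""

array = ET.fromstring(ARRAY)

# Assumption: indices run from 0 to indexTo inclusive.
extents = {d.get("name"): int(d.get("indexTo")) + 1 for d in array.findall("dim")}

element_count = math.prod(extents.values())
byte_size = element_count * 8  # each ieeeDouble-64 takes 8 bytes

print(extents, element_count, byte_size)
```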

  16. Struct • <struct> • <short-16 varName="ID" /> • <integer-32 varName="Count" /> • <ieeeDouble-64 varName="Var" /> • </struct>
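A struct like this describes a fixed sequence of fields in the file. As a rough illustration (standard `struct` module, not the BinX library), the record can be packed and read back as follows, assuming big-endian byte order and no padding between fields; the sample values are made up:

```python
import struct

# short-16 ID, integer-32 Count, ieeeDouble-64 Var; ">" means big-endian, unpadded.
RECORD = struct.Struct(">hid")

# Build sample bytes, then decode them the way a reader of the file would.
raw = RECORD.pack(7, 1000, 2.5)
record_id, count, var = RECORD.unpack(raw)

print(RECORD.size, record_id, count, var)
```

Without padding the record is 2 + 4 + 8 = 14 bytes. A C compiler writing the same struct natively would often align the fields and produce a larger record, which is the kind of padding difference a reader has to account for.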

  17. Union • <union> • <discriminant> • <byte-8/> • </discriminant> • <case discriminantValue="32"> • <ieeeFloat-32 /> • </case> • <case discriminantValue="64"> • <ieeeDouble-64 /> • </case> • <case discriminantValue="0"> • <void-0 /> • </case> • </union>
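Reading a union means reading the discriminant first and then choosing the case it selects. The helper below is a hypothetical sketch of that logic for the slide's example, not the BinX library's own routine. It assumes big-endian data and a discriminant stored as a single unsigned byte immediately before the value:

```python
import struct

# discriminantValue -> format of the value that follows (None for void-0).
CASES = {32: ">f", 64: ">d", 0: None}


def read_union(buf, offset=0):
    """Return (discriminant, value, next_offset) for the union starting at offset."""
    discriminant = buf[offset]  # the byte-8 discriminant
    fmt = CASES[discriminant]
    if fmt is None:
        # void-0 occupies no bytes
        return discriminant, None, offset + 1
    (value,) = struct.unpack_from(fmt, buf, offset + 1)
    return discriminant, value, offset + 1 + struct.calcsize(fmt)


as_float = read_union(bytes([32]) + struct.pack(">f", 1.5))
as_double = read_union(bytes([64]) + struct.pack(">d", 2.25))
as_void = read_union(bytes([0]))
print(as_float, as_double, as_void)
```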

  18. User-defined data type • <defineType typeName="HeaderStruct"> • <struct> • <character-8 varName="A"/> • <character-8 varName="B" /> • <integer-32 varName="Length" /> • </struct> • </defineType>

  19. Data elements as instances • <file src="myfile.bin"> • <short-16 varName="id"/> • <arrayFixed varName="name"> • <character-8 /> • <dim indexTo="7" /> • </arrayFixed> • <struct varName="record"> • <short-16 /> • <ieeeFloat-32 /> • </struct> • </file>

  20. Reference defined elements • <definitions> • <defineType typeName="A"> • <struct> • <short-16/> • <integer-32/> • </struct> • </defineType> • </definitions> • <file src="myfile.bin"> • <useType typeName="A" varName="FirstUse"/> • <useType typeName="A" varName="SecondUse"/> • </file>

  21. The BinX Library Alpha version

  22. Fundamental requirements • Access to data elements in binary files via BinX • Parse the BinX document • Build in-memory data structures • Read data values from the binary file • Automatic conversion • Byte ordering • Padding • Producing BinX document and binary data • Generate BinX document for data structures • Save assigned data values into binary files

  23. General use cases • Data conversion (byte order) • Data extraction (sub-dataset) • Data combination (two arrays to one) • Data presentation (browse, pure XML)

  24. BinX Components • The library has core functionality to support generic utilities and applications • BinX library core: core functionality (parse BinX document, read binary data) • Utilities: generic tools (data conversion, extraction, packing/unpacking) • Applications: domain-specific

  25. The BinX library core • Input: SchemaBinX, binary data file • Output: DataBinX, in-memory dataset [Diagram: a SchemaBinX document (<dataset> … </dataset>) and a binary file (0101010101) pass through the BinX library, producing DataBinX (<short-16>100</short-16>) and an in-memory data structure whose values are loaded on demand]

  26. The BinX Utilities • DataBinX generator • DataBinX splitter • SchemaBinX creator • Binary file indexer

  27. DataBinX generator • Put binary data inside XML • For browsing, web service return, query result set [Diagram: a BinX document (<dataset> … </dataset>) and a binary file (0101010101) pass through the BinX library, producing DataBinX (<short-16>100</short-16>)]
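The idea of putting binary data inside XML can be sketched for a single value. This is an illustration with Python's standard library, not the DataBinX generator itself; it assumes one big-endian `short-16` holding 100, as in the slide's figure:

```python
import struct
import xml.etree.ElementTree as ET

# Two bytes as they might appear in the binary file (big-endian short, value 100).
raw = struct.pack(">h", 100)

# Decode the value and embed it as text in an element named after its type.
(value,) = struct.unpack(">h", raw)
dataset = ET.Element("dataset")
element = ET.SubElement(dataset, "short-16")
element.text = str(value)

xml_text = ET.tostring(dataset, encoding="unicode")
print(len(raw), len(xml_text), xml_text)
```

Two bytes of data turn into several dozen characters of markup, which is one reason a DataBinX document is far larger than the binary file it was generated from.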

  28. DataBinX splitter • The reverse of DataBinX generator • Generate binary file for testing, transportation • Cross-platform (byte order) [Diagram: DataBinX (<short-16>100</short-16>) passes through the BinX library, producing a BinX document (<dataset> … </dataset>) and a binary file (0101010101)]

  29. SchemaBinX creator • GUI and Web-based utilities • Build BinX document interactively • Create a BinX document based on another

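  30. Binary file indexer • Generating indices for binary data files • Such indices can be used for fast data access [Diagram: a BinX document (<dataset> … </dataset>) and a binary file (0101010101) pass through the BinX library, producing an index table that lists X at 0000 and Y at 0004]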
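An index of this kind can be thought of as a table from element names to byte offsets. The sketch below is a hypothetical illustration, not the indexer utility. It assumes two consecutive big-endian 32-bit integers named X and Y, which would give the offsets 0000 and 0004 shown in the slide's figure:

```python
import struct

# Assumed layout: two 32-bit integers stored back to back.
FIELDS = [("X", ">i"), ("Y", ">i")]


def build_index(fields):
    """Map each field name to its byte offset within the file."""
    index, offset = {}, 0
    for name, fmt in fields:
        index[name] = offset
        offset += struct.calcsize(fmt)
    return index


index = build_index(FIELDS)

# With the index, Y can be read directly without decoding X first.
data = struct.pack(">ii", 11, 22)
(y,) = struct.unpack_from(">i", data, index["Y"])
print(index, y)
```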

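  31. Applications for astronomy • FITS and VOTable conversion [Diagram: a FITS file (header SIMPLE = T … END followed by binary data 01010101) and a VOTable document (<VOTABLE> … </VOTABLE>) are converted through the DataBinX utility, which sits on the BinX library core]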

  32. FITS → DataBinX → VOTable • FITS to VOTable conversion [Diagram components: FITS, preprocessor, SchemaBinX, DataBinX utility, DataBinX, XSLT, XSLT transformer, VOTable]

  33. VOTable → DataBinX → FITS • VOTable to FITS conversion [Diagram components: VOTable, XSLT, XSLT transformer, preprocessor, DataBinX, DataBinX utility, SchemaBinX, binary data, post processor, FITS header, FITS]

  34. FITS-VOTable experiment • Sample FITS file • A data table of 82 rows × 20 fields • File size: 37 KB • DataBinX generated by the DataBinX utility • Time spent: 268 ms • DataBinX document size: 1.2 MB • VOTable transformed by MSXML • Time spent: about 1 second • VOTable document size: 51 KB

  35. Possible future releases • DataBinX parsing • Utilities (GUI BinX editor) • XPath-based data query • DFDL support • Preserving special tags • For comments, application-specific tags • Text file support

  36. Features or issues to consider • Converting floating point numbers • 80-bit, 96-bit, 128-bit floating point • Array manipulation (slice, section) • SAX-based XML document parsing • Use cases in place of DOM parsing • Built into the library or as an add-on component? • Database support • Annotating database tables? • Query database tables through BinX? • Java version of the library • Keeping exactly the same features as the C++ version? • Supporting XQuery • Query binary data files with XQuery on BinX

  37. Support • For problems of usage: • http://www.edikt.org/binx (coming soon) • support@edikt.org • For requirements and suggestions: • tedwen@edikt.org • robertc@edikt.org
