
EXPRESS/Binary Report

EXPRESS/Binary Report. David Price ISO TC184 SC4 Toulouse June 2006. Agenda. Status since last ISO STEP in Italy (added) Walkthrough of current EXPRESS/HDF5 mapping Presentation of prototypes and testing results Issue discussion for next draft of mapping Next actions and plans for testing.


Presentation Transcript


  1. EXPRESS/Binary Report David Price ISO TC184 SC4 Toulouse June 2006

  2. Agenda • Status since last ISO STEP in Italy (added) • Walkthrough of current EXPRESS/HDF5 mapping • Presentation of prototypes and testing results • Issue discussion for next draft of mapping • Next actions and plans for testing

  3. March 2006 Italy STEP Meeting Report Items • Workshop hosted by HDF Group • Workshop Dec 6-8, 2005 • Champaign, Illinois, USA • STEP, ESA, commercial, EXPRESS/Binary and HDF 5 developer attendees • Agenda was • Introduced HDF Group to EXPRESS language and STEP information models • HDF developers provided overview of HDF 5 Concepts and Structures • Walkthrough of EXPRESS/HDF Mapping Draft 0.2 • Presentation by domain experts : AP209 Analysis, STEP TAS, SINDA/G, Ship AP Analysis Needs • Issues/requirements around APIs, programming languages, etc.

  4. Summary Reported at March 2006 Italy STEP Meeting • Many core issues on V0.2 spec addressed at the Dec 2005 workshop at HDF Group US facilities • The basic approach was flawed, V0.2 did not use enough of the HDF capability • V0.3 will be an improvement and should allow better control of efficiency by the application • http://www.exff.org/express_binary • Prototyping will follow V0.3

  5. March 2006 Italy STEP Meeting Action Items • David Price – Publish EXPRESS/HDF Mapping V0.3 due March 24 • Mats Lindeblad – Create New Work Item for June SC4 meeting • David Price - contact Hans-Peter about linking a one-day workshop with the NASA/ESA PDE at the end of April (a day before Monday?) • Keith Hunten – plan session at Eng Analysis sessions at PDES, Inc. Offsite end of March • David/Mats – plan for technical work at June SC4 meeting

  6. Progress Since March • V0.3 published • Short requirements session at PDES, Inc Offsite where the EA team prioritized • Add SELECT • Add redefined attributes (does HDF support this?) • Add schema version attribute (may use URN) • What kind of metadata does NARA require? • National archives project • Also, need an EXPRESS-to-C software to lower barrier to participating in prototyping

  7. Progress Since March (2) • One-day workshop held with pyEXPRESS prototype team led by Alain Fagot and Hans-Peter • David Price Slides/Notes are available • Post-workshop plan to produce V0.4 • EA requirements • better examples • Incorporate feedback/issues from pyEXPRESS • Editor (i.e. David Price) could not provide sufficient time to the project to produce V0.4 or the EXPRESS-to-C software before June vacation • V0.31 was published June 9 adding proposal for subset of SELECT types (one of the EA team priorities)

  8. Current Mapping Walkthrough

  9. Prototypes and Testing results • pyEXPRESS testing (slides from PDE workshop) • Subset of EXPRESS (e.g. no complex instances) • Based on pyTables 1.3, HDF 1.6.5, Python 2.4 • Using same EXPRESS-based API for P21 and HDF access • HDF is just another backend to the pyEXPRESS API • This is a different approach from what is assumed by the EXPRESS/Binary team where direct HDF API access was assumed (is “programmer ease of use” a very high priority?) • Compression (using ZLIB) and chunking make files smaller and more efficient for read/write • Even PC processors are powerful enough that decompression is faster than file access, as HDF lets you read into memory only what you need at any given time • Benchmarks show good results (e.g. 10-50% file size and 75% access times), but also identify areas in the mapping that need improvement (e.g. small HDF files are bigger than P21 and sometimes slower) • STEP TAS will be a NWI in SC4 starting soon
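The chunking and ZLIB point on slide 9 can be illustrated without HDF5 at all. The sketch below is an assumption-laden stand-in, not the pyEXPRESS or HDF5 implementation: it compresses fixed-size runs of values independently with Python's standard `zlib` module, so a reader decompresses only the chunk holding the row it wants. The chunk size and the function names are hypothetical.

```python
import zlib

CHUNK_ROWS = 1000  # hypothetical chunk size, chosen only for illustration


def write_chunks(values, chunk_rows=CHUNK_ROWS):
    """Compress each run of `chunk_rows` values as its own ZLIB blob."""
    chunks = []
    for start in range(0, len(values), chunk_rows):
        text = ",".join(repr(v) for v in values[start:start + chunk_rows])
        chunks.append(zlib.compress(text.encode("ascii")))
    return chunks


def read_row(chunks, row, chunk_rows=CHUNK_ROWS):
    """Decompress only the chunk that contains `row`, then pick the value."""
    text = zlib.decompress(chunks[row // chunk_rows]).decode("ascii")
    return float(text.split(",")[row % chunk_rows])


values = [float(i % 7) for i in range(10_000)]
chunks = write_chunks(values)
one_value = read_row(chunks, 4321)  # touches 1 of the 10 chunks
```

Reading one row here costs one small decompression rather than a scan of the whole array, which is the trade-off the slide describes.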

  10. Issue discussion for next draft of mapping • <Technical work goes here> • David can edit source XML for V0.4 draft to include issue resolution we develop today • EA needs • Check V0.31 SELECT support (DONE) • Add redefined attributes (does HDF support this?) (DONE) • Add schema version attribute (may use URN) • pyEXPRESS Cannes issues • Object ID (i.e. pointers) handling code ID = Integer + string (string is pyTable name, generated from EXPRESS name) (DONE) • Unset values for each datatype within the file (DONE)

  11. Issue discussion for next draft of mapping (2) • Issues • Complex/partial entity instances (ANDOR) (DONE) • David Issue = (Multiple) Inheritance? Had something to do with select types. (DONE) • Defined type of array “TYPE x = aggregate of whatever” (TODO) • Complicated types for array values e.g. SELECT (REAL, INTEGER, ENTITY INSTANCE) (DONE) • We will use the same generic object identifier approach to handle these as to handle complicated SELECT types. • Variable length string • HPdK thinks that these cannot be put in an HDF Compound Datatype. Georg found where the UG seems to say this is allowed (7.1 Complex combinations of datatypes). Maybe it’s a limitation of pyTables? • The current mapping says use Variable length datatypes but it’s not clear if that’s allowed in a Compound Datatype. • We may have to use the general purpose object id capability and have a dataset somewhere containing varying length strings (or find another solution). It does look like you may have to specify the maximum length of the varying length strings. • (DEFER TO EMAIL WITH HDF)
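One way to picture the fallback discussed for variable length strings (a separate dataset of strings, referenced from fixed-size records) is the sketch below. It uses only the Python standard library and does not touch the HDF5 API; `StringPool`, the record layout, and the field order are all hypothetical names and choices made for illustration.

```python
import struct


class StringPool:
    """Stand-in for a side dataset holding varying length strings."""

    def __init__(self):
        self.heap = bytearray()

    def add(self, text):
        """Append the string and return an (offset, length) reference."""
        data = text.encode("utf-8")
        ref = (len(self.heap), len(data))
        self.heap += data
        return ref

    def get(self, ref):
        """Look a string up again from its (offset, length) reference."""
        offset, length = ref
        return bytes(self.heap[offset:offset + length]).decode("utf-8")


# Fixed-size "compound" record: one integer attribute plus a string reference.
# The layout is an assumption, not part of the mapping draft.
RECORD = struct.Struct("<qQQ")  # age, string offset, string length

pool = StringPool()
record = RECORD.pack(42, *pool.add("a name of any length"))
age, offset, length = RECORD.unpack(record)
name = pool.get((offset, length))
```

Every record stays the same size regardless of string length, so no maximum length has to be fixed in the record itself.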

  12. Instance identifiers • Every hdf5 link and hdf5 dataset has an hdf5 object id that is an unsigned 32/64 bit integer • Issue : Is there a problem with using 64 bit integer as part of entity instance ids on a 32 bit platform (i.e. does this place a limit on file size or interoperability?) • H-P thinks the object ids are managed inside a hash table in HDF • Also thinks the object id is not exposed in the hdf API everywhere that we need it • Proposal is to use a tuple of integers that can be used for both an entity instance id and a pointer into the aggregates • (hdf object id, row index)
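The proposed (hdf object id, row index) tuple can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions: the integer keys 101 and 102 merely stand in for HDF5 object ids, the in-memory lists stand in for datasets, and `InstanceRef` and `resolve` are hypothetical names.

```python
from typing import NamedTuple


class InstanceRef(NamedTuple):
    """Entity instance id as proposed on the slide: (object id, row index)."""
    dataset_id: int
    row: int


# Hypothetical in-memory stand-ins for two datasets, keyed by made-up object ids.
datasets = {
    101: [("Ann",), ("Bob",)],           # rows of one entity type: (name,)
    102: [(InstanceRef(101, 1), 30)],    # rows of another: (pointer, age)
}


def resolve(ref):
    """Follow a reference to the row it identifies."""
    return datasets[ref.dataset_id][ref.row]


owner_ref, age = resolve(InstanceRef(102, 0))
owner_row = resolve(owner_ref)
```

The same pair works as an instance identifier and as a pointer stored inside another row, which is the dual use the proposal aims for.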

  13. Complicated Select types • TYPE x = SELECT OF (REAL, INTEGER, LIST OF BOOLEAN, e2); • Proposal is to have each base type in a separate HDF dataset in a separate group • Group for REAL, Group for INTEGER, Group for LIST OF BOOL, etc. • It could be configurable • May have a single dataset for ALL integers in the file used in this way • May have a dataset for each attribute used in this way (similar to how the mapping for aggregate attribute values works now) • For cases where every entity instance that has TYPE x as its domain, you might use the simple type instead of the complicated mapping
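A rough sketch of the "one dataset per base type" proposal for `TYPE x = SELECT OF (REAL, INTEGER, LIST OF BOOLEAN, e2)` follows. Python lists stand in for the HDF datasets, the group names are invented, and the entity case (`e2`) is left out because it would simply store an instance identifier; `SelectStore` is a hypothetical name, not part of any draft.

```python
class SelectStore:
    """One list per base type; a SELECT value is stored as (group, row)."""

    def __init__(self):
        self.groups = {"REAL": [], "INTEGER": [], "LIST_OF_BOOLEAN": []}

    @staticmethod
    def tag_for(value):
        """Pick the group for a value; bool is checked first because it is an int in Python."""
        if isinstance(value, bool):
            raise TypeError("a bare BOOLEAN is not in this SELECT")
        if isinstance(value, float):
            return "REAL"
        if isinstance(value, int):
            return "INTEGER"
        if isinstance(value, list) and all(isinstance(v, bool) for v in value):
            return "LIST_OF_BOOLEAN"
        raise TypeError(f"unsupported SELECT value: {value!r}")

    def put(self, value):
        """Append the value to its group and return a (group, row) reference."""
        tag = self.tag_for(value)
        self.groups[tag].append(value)
        return (tag, len(self.groups[tag]) - 1)

    def get(self, ref):
        tag, row = ref
        return self.groups[tag][row]


store = SelectStore()
refs = [store.put(1.5), store.put(7), store.put([True, False]), store.put(2.5)]
```

The attribute slot in the entity's compound record then only needs to hold the small `(group, row)` reference, whatever the underlying type is.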

  14. Redeclared attributes • Redeclaration things we can address • specialize the attribute domain • Write the encoding of the specialized value in the HDF compound type representing the subtype • type is subtype of original • We only use the object identifier everywhere so this is no problem • rename of attribute • Use new name in HDF compound data type for the subtype • Explicit to derived • Do not put the attribute in the HDF5 compound data type and do not store a value
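The redeclaration rules on slide 14 amount to a small transformation of the supertype's field list. The sketch below assumes a compound type can be described as a list of `(name, kind)` pairs; that representation, the function name, and the example attribute names are hypothetical.

```python
def subtype_fields(supertype_fields, renamed=None, derived=()):
    """Apply the slide's rules to inherited fields.

    renamed: maps an inherited attribute name to its new name in the subtype.
    derived: inherited explicit attributes redeclared as derived; these get
             no field and no stored value.
    Specialising an entity-valued attribute changes nothing here, because
    such a field holds an object identifier either way.
    """
    renamed = renamed or {}
    fields = []
    for name, kind in supertype_fields:
        if name in derived:
            continue
        fields.append((renamed.get(name, name), kind))
    return fields


inherited = [("owner", "object_id"), ("label", "string"), ("area", "real")]
fields = subtype_fields(inherited, renamed={"label": "title"}, derived={"area"})
```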

  15. ANDOR •
      SCHEMA test;
      ENTITY a; name : STRING;
      ENTITY b SUBTYPE OF a; age : INTEGER; x : REAL;
      ENTITY c SUBTYPE OF a; height : REAL; x : BOOLEAN;
      Results in
      test/a
      test/a/name
      test/b
      test/b/name
      test/b/age
      test/c
      test/c/height
      test/b__c
      test/b__c/name
      test/b__c/age
      test/b__c/height
      test/b__c/b__x
      test/b__c/c__x
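The naming of the complex instance group `test/b__c` on slide 15 can be reproduced with a short sketch. The ordering rule used here (inherited attributes, then uniquely named attributes, then clashing attributes qualified by entity name) is inferred from the single example on the slide and is an assumption, as are the dictionary layout and function names.

```python
# Hypothetical in-memory description of the slide's schema.
ENTITIES = {
    "a": {"supertypes": [], "attrs": ["name"]},
    "b": {"supertypes": ["a"], "attrs": ["age", "x"]},
    "c": {"supertypes": ["a"], "attrs": ["height", "x"]},
}


def inherited_attrs(entity):
    """Attributes an entity inherits from all of its supertypes, in order."""
    found = []
    for sup in ENTITIES[entity]["supertypes"]:
        for attr in inherited_attrs(sup) + ENTITIES[sup]["attrs"]:
            if attr not in found:
                found.append(attr)
    return found


def complex_instance_paths(schema, parts):
    """Group and attribute paths for a complex instance such as b AND c."""
    group = f"{schema}/{'__'.join(parts)}"
    shared = []
    for part in parts:
        for attr in inherited_attrs(part):
            if attr not in shared:
                shared.append(attr)
    declared = {}  # attribute name -> entities among `parts` declaring it
    for part in parts:
        for attr in ENTITIES[part]["attrs"]:
            declared.setdefault(attr, []).append(part)
    plain = [a for a, owners in declared.items() if len(owners) == 1]
    clashing = [f"{owner}__{a}"
                for a, owners in declared.items() if len(owners) > 1
                for owner in owners]
    return [group] + [f"{group}/{n}" for n in shared + plain + clashing]


paths = complex_instance_paths("test", ["b", "c"])
```

For the slide's schema this yields the six `test/b__c` entries listed above, with the clashing attribute `x` split into `b__x` and `c__x`.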

  16. Next actions and plans for testing • pyEXPRESS testing based on pyTABLES, there is a C Tables API … Should our other testing be based on that? • Can/should we set up another workshop with HDF Group to complete mapping? • DP Action to talk to Mike Folk about doing something prior to the ISO in October (we remember him saying there was a workshop in DC) • What do testers need to help get them started? • EXPRESS-to-C has been mentioned (if we use C Tables API that’s not useful) • Training? • Test data? • Schemas? • Closing plenary slides for Friday • NWI – Will be created and circulated via telecon before the next ISO STEP meeting.

  17. Notes from Meeting • Are there other sources of MetaData? • Are there other archiving (e.g. NARA) or LTDR standards (e.g. LOTAR)? • If you treat HDF as a “database” what is needed? • What about internal company meta-data? • What about Web-based standards (e.g. Dublin Core)? • Should we just include a generic meta-data “name-value pair” capability? • What about non-STEP data in the same file that the STEP data references (e.g. jpegs)? • Where multiple mappings are still being tested, it is OK to include more than one in the specification. • The specification is currently a guide for prototype testers, not a draft standard. • What are the highest priority requirements? “Performance”, but performance and efficiency of exactly what?

  18. Notes from Meeting (2) • We may need to add some HDF attributes to the Groups and Datasets when they are written to help readers (e.g. number of instances of an entity type that were written) • C Tables API uses this approach so we should look at that to see if we can learn anything for our use. • We need to have more discussion about whether to allow or require writing inverse attribute values into the file, nothing is done there now. • For “read-only files” inverses could be a nice optimization. • Would we need to allow this to be configured? If so, how? • What about the “unnamed inverse” that EXPRESS says exists?
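To make the inverse attribute discussion concrete, the sketch below derives inverse values from forward references once, at write time, which is the kind of precomputation a read-only file could store. The row layout, attribute name, and function name are hypothetical, and nothing here reflects a decision in the mapping.

```python
from collections import defaultdict


def build_inverse(rows, attribute):
    """Map each referenced target to the rows whose `attribute` points at it."""
    inverse = defaultdict(list)
    for row_index, row in enumerate(rows):
        inverse[row[attribute]].append(row_index)
    return dict(inverse)


# Forward references: each row names the row of its owner in another dataset.
rows = [
    {"owner": 0, "age": 30},
    {"owner": 1, "age": 41},
    {"owner": 0, "age": 12},
]
inverse = build_inverse(rows, "owner")
instance_count = len(rows)  # the sort of value that could be kept as an HDF attribute
```

Without the stored inverse, a reader answering "which rows reference owner 0?" has to scan every row of the referencing dataset.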

  19. Action Items • HPdK – Find out how to implement the object id using the HDF 5 API • DP – Find email thread on entity instance identifiers from a year ago, it might be useful for the new proposal • AF – Write text to describe the multi-dataset approach to Aggregate Instances, email to DP who will add to spec V0.4 • DP – Read “fixme” from meeting and fix them. • HPdK – Put example HDF5 files on the Web somewhere for others to view. Mapping document too. • ML – Look at what Vivace stuff can be published publicly. • ML – Look at what can be published to the Vivace Forum 2 (unfortunately, these are same dates as Hershey).
