PDS4 Implementation Simplification

Todd King PDS4 Implementation Simplification

Background The first requirement for the Information Model [IM2008] is: 1. The Information Model shall be developed and maintained independent from any specific technology choices, implementations, or expressions. That is: The IM is implementation neutral.

Current Implementation • Early on it was decided that the target implementation was to be XML. • XML Schema was used to express the IM • It was realized that constraining content (by nodes and missions) would lead to a proliferation of XML schema. • In 2012 Schematron was adopted so that content constraints could be expressed as rules.

Build 3b Review • As with past builds we performed detailed reviews of the IM. • Test Coverage: Approx. 80% of the model (excludes 8 Array classes): Product_Observation, Identification_Area, File_Area_Observational, Table_Binary (Record_Binary, Field_Binary), Table_Character (Record_Character, Field_Character), Table_Delimited (Record_Delimited, Field_Delimited), Group_Field_Binary, Group_Field_Character, Group_Field_Delimited, Time_Coordinates, Investigation_Area, Observing_System, Target_Identification, Mission_Area, Discipline_Area, Internal_Reference, External_Reference, Product_Collection, File_Area_Inventory, Product_Bundle, Bundle_Member_Entry, Context_Area, Reference_List, Product_Browse, File_Area_Browse, Product_Document, Product_Context, Agency, Facility, Instrument_Host, Instrument,Mission, Target, Node, Investigation, Target

Review Principals Be consistent • Apply naming conventions and formation rules across the board. Refactor to Simplest Form • Reducing to common denominators (see Field_*) • Removing elements with single values. (see record_delimiter in Table_Character) Eliminate redundant information by: • Removing counts of things that can be counted (see fields in Record_*) • Bytes in data_type where length in bytes is an attribute. Don't allow the undesirable • Don't accommodate bad practices to make "migration" a literal transform. (see data_type and allowing the changing of byte order field by field) Use technology effectively • What is possible with XML schema + Schematron cannot be done with XML Schema alone. Some principals adopted in the Model where based on XML schema alone.

The Epiphany During our build 3b review it was realized that the trifurcation of Table could be addressed with Schematron rules. And that other constraints in the model can be expressed as rules. Simplification and no loss of rigor.

Applied to Table Class From Table_Binary Record_Binary Field_Binary Table_Character Record_Character Field_Character Table_Delimited Record_Delimited Field_Delimited To Table Record Field <sch:pattern> <sch:rule context="pds:Table_Binary"> <sch:assert test="if (pds:encoding_type) then pds:encoding_type = ('Binary') else true()">...</sch:assert> </sch:rule> </sch:pattern> <sch:pattern> <sch:rule context="pds:Table_Character"> <sch:assert test="if (pds:encoding_type) then pds:encoding_type = ('Character') else true()">...</sch:assert> <sch:assert test="if (pds:record_delimiter) then pds:record_delimiter = ('carriage-return line-feed') else true()">...</sch:assert> </sch:rule> </sch:pattern> <sch:pattern> <sch:rule context="pds:Table_Delimited"> <sch:assert test="if (pds:encoding_type) then pds:encoding_type = ('Character') else true()">...</sch:assert> <sch:assert test="if (pds:record_delimiter) then pds:record_delimiter = ('carriage-return line-feed') else true()">...</sch:assert> <sch:assert test="if (pds:parsing_standard_id) then pds:parsing_standard_id = ('PDS DSV 1') else true()">...</sch:assert> <sch:assert test="if (pds:field_delimiter) then pds:field_delimiter = ('comma', 'horizontal tab', 'semicolon', 'vertical bar') else true()">...</sch:assert> </sch:rule> <sch:pattern> <sch:rule context="pds:Table"> <sch:assert test="pds:encoding_type = ('Binary', 'Character', 'Delimited')"> The attribute pds:encoding_type must be equal to on of the following values 'Binary', 'Character', 'Delimited'.</sch:assert> <sch:report test="pds:encoding_type = ('Binary') and pds:record_delimiter"> The attribute pds:record_delimiter is not necessary when pds:encoding_type is 'Binary'.</sch:report> <sch:report test="pds:encoding_type = ('Binary', 'Character') and pds:field_delimiter"> The attribute pds:field_delimiter is not necessary when pds:encoding_type is '<sch:value-of select="pds:encoding_type" />'.</sch:report> <sch:report test="pds:encoding_type = ('Character', 'Delimited') and not(pds:record_delimiter = ('carriage_return_line_feed'))"> The attribute pds:record_delimiter is required and must be equal one of the following values 'carriage_return_line_feed'.</sch:report> <sch:report test="pds:encoding_type = ('Delimited') and not(pds:field_delimiter = ('comma', 'horizontal_tab', 'semicolon', 'vertical_bar'))"> The attribute pds:field_delimiter is required and must be equal to one of the following values 'comma', 'horizontal_tab', 'semicolon', 'vertical_bar'.</sch:report> </sch:rule> </sch:pattern>

Advantages of Simplification • Concise It is possible to reduce Table to a much simpler form and eliminate 14 classes in the XML schema. • Consistent Fewer variants can make documentation, training and applications more concise. • Effective It makes the most effective use of the chosen implementation technologies (XML Schema and Schematron).

Advantages (Cont.) • If applied to the "Array" class • 9 classes can be consolidated to 1. • If applied to "File_Area“ • 13 classes can be consolidated to 1 • If applied to the "Product“ • 31 classes can be consolidated to 1. • And XPath to elements become more uniform. /Product/Identification_Area/logical_identifier

Impact • Documents • Makes them shorter and more concise. • Software • External - None. • Internal - harvest, registry. • Schedule • Minimal. • To meet immediate needs it can be done with hand edits on the XML Schema+Schematron, later it can be automated.

Why Bother? With so much time, effort and resources invested the IM should be self-consistent, concise, and optimized. We want the IM model to be: • Impressive • The wow factor. • Community impression is important. • An improvement • XML is good, but not enough. • Model should be a refinement. • Innovative • Do something to shift the paradigm. • XML Schema + Schematron is new!

Conclusion • The simplification is an editorial pass • Like a page limit in a proposal. • Can you say the same thing with fewer words? • Are you using a minimal set of terms? • We’re about 80% there. • Some areas of the IM are ideal. i.e., Product_Context is very well formed. • Others still need work. • Focus on the deliverables • XML Schema and Schematron is what will be used. • The IM spec is an internal document.

PDS4 Implementation Simplification

PDS4 Implementation Simplification

Presentation Transcript

Logic Simplification

PDS4 1B Assessment Results

PDS4 Mission Report: MAVEN

Mesh Simplification

SBN PDS4 Migration Planning

PDS4 Impact on Roses

PDS4 Demonstration

Data Compression for PDS4

More Simplification

PDS4 Project Report

PDS4 Mission Needs Assessment

PDS4 Project Update

PDS4 Project Report

PDS4 Mission Report: MAVEN

PDS4 Update

Algebraic Simplification

PDS4

SIMPLIFICATION

Text Simplification

Simplification

Data Compression for PDS4