
Tools used for testing and long-term preservation

Tools used for testing and long-term preservation. Terje Pettersen-Dahl, adviser Department of electronic archives (Elark), National Archives of Norway. Bern, 10.4.2003. System types. Registry-based ERMs. Specialized case handling systems. Information systems. Arkadukt. Noark. ADDMML.


Presentation Transcript


  1. Tools used for testing and long-term preservation. Terje Pettersen-Dahl, adviser, Department of electronic archives (Elark), National Archives of Norway. Bern, 10.4.2003

  2. System types [diagram]: Registry-based ERMs, Specialized case handling systems, Information systems. Tools and standards: Arkadukt, Noark, ADDMML, ArkN3, Arkade

  3. Overview [diagram: Original system; Data files; Structural description; Arkadukt; ADDMML-file; Arkade; Analysis, checks and controls; New ADDMML-file; New data files; For long-term preservation; Access]

  4. Choice of method • Migration. • Preserving only extracts of data from the databases. • Extracts in a software- and hardware-independent format. • In addition to the extracts, we need technical metadata about them.

  5. Metadata • The metadata has to be standardized. • The National Archivist has established a national standard for metadata called ADDMML (Archives Data Description and Manipulation Mark-up Language). • ADDMML is defined as an XML DTD.

  6. Structure in ADDMML The structure is hierarchical. A single extract is called a dataset. A dataset can contain one or more files. A file may contain one or more tables (record-types). Tables contain fields. Fields may contain codes. Hierarchy: Dataset → File → Record-type → Field → Code
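The hierarchy can be pictured as nested containers. The Python sketch below is purely illustrative: the class names (`Dataset`, `DataFile`, `RecordType`, `Field`, `Code`) and the `outline` helper are hypothetical and are not the element names defined by the ADDMML DTD.

```python
from dataclasses import dataclass, field
from typing import List

# Hypothetical model of the ADDMML hierarchy:
# Dataset -> File -> Record-type -> Field -> Code.
# Names are illustrative only, not taken from the ADDMML DTD.


@dataclass
class Code:
    value: str
    meaning: str


@dataclass
class Field:
    name: str
    codes: List[Code] = field(default_factory=list)


@dataclass
class RecordType:
    name: str
    fields: List[Field] = field(default_factory=list)


@dataclass
class DataFile:
    name: str
    record_types: List[RecordType] = field(default_factory=list)


@dataclass
class Dataset:
    name: str
    files: List[DataFile] = field(default_factory=list)


def outline(ds: Dataset) -> List[str]:
    """Flatten the hierarchy into indented lines, one per element."""
    lines = [ds.name]
    for f in ds.files:
        lines.append("  " + f.name)
        for rt in f.record_types:
            lines.append("    " + rt.name)
            for fl in rt.fields:
                lines.append("      " + fl.name)
                for c in fl.codes:
                    lines.append(f"        {c.value} = {c.meaning}")
    return lines


if __name__ == "__main__":
    # Hypothetical example dataset with one file, one record-type, one coded field
    ds = Dataset("cases", [DataFile("case.dat", [RecordType("case", [
        Field("status", [Code("A", "open"), Code("B", "closed")])])])])
    print("\n".join(outline(ds)))
```

Each level only holds a list of the level below it, which mirrors the "one or more" relations on the slide.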

  7. Arkadukt • Arkadukt produces ADDMML-files that are always 100% syntactically correct. • This is a prerequisite for Arkade. • The user does not need to know anything about ADDMML! • The metadata itself is registered as plain text. • Simple registration. • Adapted to the structure of ADDMML.

  8. Arkadukt

  9. Arkade • Arkade has the following functions: • Conversions • Analysis • Checks and controls • Special functions • Additionally, some functions can be initiated from Arkade: • Creation of SAS-dataset • Random quality testing of records

  10. Conversion • Arkade can convert data in different ways. Examples are: • Convert from one character-set to another. • Convert from one file-format to another. • Change the record-delimiter. • Unpack packed fields. • Split repeating groups or record-types into different files. • Convert from one field-format to another. • All conversions are initiated by processes specified in the ADDMML-file.
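Two of these conversions can be sketched in Python. This is a minimal illustration, not Arkade's implementation (Arkade itself is written in SAS): `convert_records` re-encodes a delimited file and swaps its record delimiter, and `unpack_packed_decimal` assumes that "packed fields" means IBM-style packed decimal (COMP-3), which the slides do not state. Both function names are hypothetical.

```python
def convert_records(data: bytes, src_encoding: str, dst_encoding: str,
                    src_delim: str = "\r\n", dst_delim: str = "\n") -> bytes:
    """Decode with the source character-set, swap the record-delimiter,
    and re-encode with the target character-set."""
    records = data.decode(src_encoding).split(src_delim)
    return dst_delim.join(records).encode(dst_encoding)


def unpack_packed_decimal(raw: bytes) -> int:
    """Decode a packed-decimal (COMP-3) field: two BCD digits per byte,
    with the final nibble holding the sign (0xB or 0xD means negative).
    Assumption: this is the packing scheme used in the source data."""
    if not raw:
        raise ValueError("empty packed field")
    digits = []
    for byte in raw[:-1]:
        digits += [byte >> 4, byte & 0x0F]
    digits.append(raw[-1] >> 4)
    sign = raw[-1] & 0x0F
    if any(d > 9 for d in digits) or sign < 0xA:
        raise ValueError(f"not a valid packed-decimal field: {raw.hex()}")
    value = int("".join(str(d) for d in digits))
    return -value if sign in (0xB, 0xD) else value


if __name__ == "__main__":
    latin1 = "Æ;1\r\nØ;2".encode("latin-1")
    print(convert_records(latin1, "latin-1", "utf-8"))
    print(unpack_packed_decimal(bytes([0x12, 0x34, 0x5D])))  # -12345
```

Decoding the whole file to text before splitting keeps multi-byte characters intact; for very large extracts a streaming variant would be preferable.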

  11. Analysis Arkade analyses data on different levels. • File level • Count the total number of records in the file. • Count the total number of characters in the file.
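A minimal sketch of the two file-level counts, assuming newline-delimited records and counting every character including the delimiters (Python counts code points, not bytes). The name `file_level_stats` is hypothetical.

```python
def file_level_stats(text: str, delimiter: str = "\n") -> dict:
    """Count records and characters in a delimited file held as a string.
    Characters include the record delimiters."""
    records = text.split(delimiter)
    if records and records[-1] == "":
        records.pop()  # a trailing delimiter does not start a new record
    return {"records": len(records), "characters": len(text)}


if __name__ == "__main__":
    print(file_level_stats("0001;Oslo\n0002;Bergen\n"))
    # {'records': 2, 'characters': 22}
```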

  12. Analysis (cont.) • Record-type level. • Find the minimum and maximum length for records of this type in the file. • Find the number of fields in records of this type, or the minimum and maximum number if it varies. • Count the number of records of this type in the file. • Produce sorted frequency-lists of the values throughout the file for each field in the record-type. • Produce a cross-reference table for two specified fields from the same record-type.
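The record-type analyses can be sketched with `collections.Counter`. This assumes, hypothetically, that all records passed in belong to one record-type and that fields are separated by `;`; fixed-width files would need a different split. Frequency lists here are sorted by value, which is one reading of "sorted". The function names are illustrative.

```python
from collections import Counter


def record_type_stats(records, sep=";"):
    """Record-type level analysis for records that all belong to one type."""
    rows = [r.split(sep) for r in records]
    lengths = [len(r) for r in records]
    n_fields = [len(row) for row in rows]
    width = max(n_fields, default=0)
    # one Counter per field position; short records simply lack later fields
    freqs = [Counter(row[i] for row in rows if i < len(row)) for i in range(width)]
    return {
        "count": len(records),
        "min_length": min(lengths, default=0),
        "max_length": max(lengths, default=0),
        "min_fields": min(n_fields, default=0),
        "max_fields": max(n_fields, default=0),
        # frequency list per field position, sorted by value
        "frequencies": [sorted(c.items()) for c in freqs],
    }


def cross_reference(records, i, j, sep=";"):
    """Count how often each combination of field i and field j occurs."""
    rows = (r.split(sep) for r in records)
    return Counter((row[i], row[j]) for row in rows)
```

For example, `cross_reference(["1;A;x", "2;B;x", "3;A;yy"], 1, 2)` counts each of the pairs `("A", "x")`, `("B", "x")` and `("A", "yy")` once.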

  13. Analysis (cont.) • Field level • Count the number of empty (NULL) and non-empty values in the field throughout the file. • Find the length and record-number of the shortest and longest data-value (excluding padding) in the field. • Find the minimum and maximum value (including record-number) in the field. • Produce sorted frequency-lists of the values throughout the file in this field. All analyses are initiated by processes in the ADDMML-file.
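A sketch of the field-level analysis, under two assumptions not stated on the slide: padding means leading or trailing blanks, and a value that is blank after stripping counts as NULL. The `convert` parameter (hypothetical) decides how minimum and maximum are compared, for instance `int` for numeric fields, since text comparison would rank "105" below "7".

```python
def field_level_stats(values, convert=str):
    """Field-level analysis of one field's raw values, in file order.
    Record numbers are 1-based; blank padding is stripped, and a value
    that is empty after stripping is counted as NULL."""
    stripped = [v.strip() for v in values]
    present = [(n, v) for n, v in enumerate(stripped, start=1) if v]
    result = {"null": len(values) - len(present), "non_null": len(present)}
    if present:
        # ties are resolved in favour of the earliest record
        shortest = min(present, key=lambda nv: (len(nv[1]), nv[0]))
        longest = max(present, key=lambda nv: (len(nv[1]), -nv[0]))
        smallest = min(present, key=lambda nv: (convert(nv[1]), nv[0]))
        largest = max(present, key=lambda nv: (convert(nv[1]), -nv[0]))
        # (length, record-number) and (value, record-number) pairs
        result["shortest"] = (len(shortest[1]), shortest[0])
        result["longest"] = (len(longest[1]), longest[0])
        result["min"] = (smallest[1], smallest[0])
        result["max"] = (largest[1], largest[0])
    return result
```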

  14. Checks and controls As with analysis, checks are done on different levels. • File level • Check if the given record-length is correct. • Check if the given number of record-types is correct. • Check if the given number of records is correct. • Check if the given number of characters is correct.
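The file-level checks amount to comparing declared values with values measured from the data. In this sketch the declared values are passed as a plain dictionary (hypothetical keys, not ADDMML syntax), and the record-type of a record is assumed, for illustration only, to be identified by its first character.

```python
def check_file(text, declared, type_of=lambda rec: rec[:1], delimiter="\n"):
    """Compare declared file-level metadata with the actual file content.
    Returns (property, declared, actual) for every mismatch."""
    records = text.split(delimiter)
    if records and records[-1] == "":
        records.pop()  # trailing delimiter
    lengths = {len(r) for r in records}
    actual = {
        # a single record-length only exists if all records are equally long
        "record_length": lengths.pop() if len(lengths) == 1 else None,
        "record_types": len({type_of(r) for r in records}),
        "records": len(records),
        "characters": len(text),
    }
    return [(k, v, actual.get(k)) for k, v in declared.items() if actual.get(k) != v]
```

An empty result means every declared value matched; `check_file("A001\nB002\nA003\n", {"records": 4})` would report `[("records", 4, 3)]`.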

  15. Checks and controls (cont.) • Record-type level • Check whether the primary key is unique and does not contain any empty value (NULL). • Check whether the secondary key is unique and does not occur with empty values (NULL). • Check whether the foreign key is either empty or exists in the referenced file, and additionally whether the given type of relation is correct. • Check if the given record length is correct. • Check if the given minimum record length is correct. • Check if the given maximum record length is correct.
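The key checks can be sketched as follows, assuming keys have already been extracted as strings in file order and an empty string represents NULL. The relation labels `"1:1"` and `"1:n"` are a hypothetical encoding of "type of relation". The same `check_primary_key` test also fits the secondary-key rule on this slide.

```python
from collections import Counter


def check_primary_key(keys):
    """Primary (or secondary) key: unique and never empty."""
    problems = [f"record {n}: empty key" for n, k in enumerate(keys, 1) if not k]
    counts = Counter(k for k in keys if k)
    problems += [f"duplicate key {k!r} ({c} times)"
                 for k, c in sorted(counts.items()) if c > 1]
    return problems


def check_foreign_key(fkeys, referenced_keys, relation="1:n"):
    """Foreign key: either empty or present in the referenced file.
    For a declared 1:1 relation, each referenced key may occur only once."""
    ref = set(referenced_keys)
    problems = [f"record {n}: {k!r} not found in referenced file"
                for n, k in enumerate(fkeys, 1) if k and k not in ref]
    if relation == "1:1":
        counts = Counter(k for k in fkeys if k)
        problems += [f"{k!r} referenced {c} times, relation is 1:1"
                     for k, c in sorted(counts.items()) if c > 1]
    return problems
```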

  16. Checks and controls (cont.) • Record-type level (cont.) • Check if given number of fields is correct. • Check if given number of records of this type is correct. • Field level • Check if given field length is correct. • Check if given minimum field length is correct. • Check if given maximum field length is correct. • Check if given data-type and field format is correct. • Check whether the field always has a value (no NULL).

  17. Checks and controls (cont.) • Field level (cont.) • Check uniqueness. • Check given codes against a specified code-set. All checks are initiated by processes in the ADDMML-file.
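The field-level checks from the last two slides can be combined into one pass over a field's values. This is a sketch under stated assumptions: formats are given either as a regular expression or as a `datetime.strptime` format string, blank padding is stripped, and all parameter names are hypothetical rather than ADDMML attributes.

```python
import re
from datetime import datetime


def check_field(values, *, min_length=None, max_length=None, pattern=None,
                date_format=None, required=False, unique=False, codes=None):
    """Return (record number, problem) pairs for one field's values.
    Record numbers are 1-based; blank padding is stripped first."""
    problems, seen = [], set()
    for n, raw in enumerate(values, start=1):
        v = raw.strip()
        if not v:
            if required:  # the field must always have a value (no NULL)
                problems.append((n, "missing value"))
            continue
        if min_length is not None and len(v) < min_length:
            problems.append((n, "too short"))
        if max_length is not None and len(v) > max_length:
            problems.append((n, "too long"))
        if pattern is not None and not re.fullmatch(pattern, v):
            problems.append((n, "wrong format"))
        if date_format is not None:
            try:
                datetime.strptime(v, date_format)
            except ValueError:
                problems.append((n, "invalid date"))
        if codes is not None and v not in codes:
            problems.append((n, "code not in code-set"))
        if unique:
            if v in seen:
                problems.append((n, "duplicate value"))
            seen.add(v)
    return problems
```

For a date field, `check_field(values, date_format="%Y%m%d")` would flag a value such as `20031340` as an invalid date.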

  18. Special-functions Additionally, Arkade has a few special functions: • Verification of check digits in birth-numbers. • Verification of check digits in account-numbers. • Adding key-fields in record-types where these are not given (the key values are implied by the records' positions relative to each other). All special functions are initiated by processes in the ADDMML-file.
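The slides do not say which algorithms Arkade uses. The sketch below assumes the standard Norwegian modulus-11 schemes: an 11-digit birth-number carries two check digits (weights 3, 7, 6, 1, 8, 9, 4, 5, 2 and 5, 4, 3, 2, 7, 6, 5, 4, 3, 2), and an 11-digit account-number carries one (weights 5, 4, 3, 2, 7, 6, 5, 4, 3, 2). It also shows one hypothetical way of deriving missing keys from record order, assuming a parent record is followed by its child records. All function names are illustrative.

```python
FNR_WEIGHTS_1 = (3, 7, 6, 1, 8, 9, 4, 5, 2)
MOD11_WEIGHTS = (5, 4, 3, 2, 7, 6, 5, 4, 3, 2)


def mod11_check_digit(digits, weights):
    """Modulus-11 check digit; None when the remainder makes it 10 (no valid digit)."""
    k = 11 - sum(d * w for d, w in zip(digits, weights)) % 11
    if k == 11:
        return 0
    return None if k == 10 else k


def valid_birth_number(fnr: str) -> bool:
    """Both check digits (positions 10 and 11) must match."""
    if len(fnr) != 11 or not fnr.isdigit():
        return False
    d = [int(c) for c in fnr]
    return (mod11_check_digit(d[:9], FNR_WEIGHTS_1) == d[9]
            and mod11_check_digit(d[:10], MOD11_WEIGHTS) == d[10])


def valid_account_number(acct: str) -> bool:
    """The last of 11 digits is the modulus-11 check digit."""
    if len(acct) != 11 or not acct.isdigit():
        return False
    d = [int(c) for c in acct]
    return mod11_check_digit(d[:10], MOD11_WEIGHTS) == d[10]


def add_parent_keys(records, is_parent):
    """Pair each record with a generated key based on position: each parent
    gets the next sequence number, and following child records inherit the
    key of the nearest preceding parent (0 if no parent has been seen yet)."""
    out, key = [], 0
    for rec in records:
        if is_parent(rec):
            key += 1
        out.append((key, rec))
    return out
```

`add_parent_keys(["H1", "D1", "H2", "D2"], lambda r: r.startswith("H"))` gives `[(1, "H1"), (1, "D1"), (2, "H2"), (2, "D2")]`, so the child records can later be joined to their parents.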

  19. SAS-dataset • Arkade can generate an internal dataset. As Arkade is written in SAS, this internal dataset is a SAS-dataset. • The SAS-dataset can be used further to: • Sort tables • Make extracts • Make statistics • Make a basis for a public version. • Generation of the SAS-dataset is initiated from the screen.

  20. Random tests • Arkade can do random tests on the extracts. Examples: • Look at the first 100 records only (the number can vary and is decided by the user). • Look at every 25th record (again, the number is decided by the user). • Only test the ADDMML-file without processing the extracts. • Random tests are initiated from the screen. • Random tests are mainly used to check syntax and conformity in the data-files.
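Both sampling modes can be expressed with `itertools.islice`, which also works lazily on an open file object, so only the selected records are kept. The helper names are hypothetical, and the defaults of 100 and 25 simply echo the examples on the slide.

```python
from itertools import islice


def first_n(records, n=100):
    """The first n records only."""
    return list(islice(records, n))


def every_kth(records, k=25):
    """Every k-th record: records k, 2k, 3k, ... counting from 1."""
    return list(islice(records, k - 1, None, k))
```

For example, `every_kth(range(1, 101), 25)` returns `[25, 50, 75, 100]`. Passing an open text file instead of a list samples its lines without reading the whole file into memory.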

  21. Arkade

  22. Conditions for Arkade • Arkade depends on a correct ADDMML-file. • To run Arkade there must be data-files, and the references to the data-files have to be correct. • Most logical dependencies also have to be correct.

  23. ArkN3 • Imports data in the format described in the Noark-3 manual. • Tests whether the described format is followed. • Presents cases and registry-records. • Makes it possible to search on different levels. • Analyses the imported data.

  24. International view [diagram]: Dublin Core, ISAD(G), EAD, ADDMML

  25. ISO 15489 and MoReq versus Noark • These new standards are in close harmony with Norwegian theory and Norwegian requirements • But Noark is not a general records management standard • Noark = a detailed application standard, initially for registry systems

  26. ISO 15489 and MoReq versus Noark • Registry- and case handling workflow is integrated in Noark: 1) Registry handling control: follow-up- and “sign-off”-functions connected to case management (MoReq’s workflow-functions are related to capture, retention and availability/distribution) 2) Process management – implements the general specification in MoReq, but is closely related to registry handling and case handling in Noark 3) Board-handling (described in great detail, but only an option in Noark)

  27. ISO 15489 and MoReq versus Noark • MoReq-elements which are given less consideration in Noark: • “freezing” of metadata • audit trails • “robust” metadata capture • It is necessary to map Noark to MoReq’s requirements • It is important for us to have a standard which is related to MoReq • Market considerations (Norwegian suppliers’ export opportunities to EU countries, and vice versa)

  28. General RM-standard • In addition to Noark there is a need for a general Norwegian RM-standard based on MoReq • for systems without registry functions which generate and manage records • E.g. a category for files which is more general and liberal than the category “case” in Noark is necessary • A general standard is also necessary to avoid discrimination against EU suppliers who offer MoReq-based solutions in Norway

  29. RM-standard: possible Norwegian model [diagram components: Specific RM (Process management); Board handling; Case handling & RM workflow; Case handling info. in registry; Registry- & Noark-based process mgmt.*); Other case handling & workflow; Not registry- & Noark-based process mgmt.; Basic workflow; Basic RM (Doc. & metadata capture and other MoReq-specified functions). Level of requirements: “Must”, “Should”, “May”.] *) Noark also requires defined levels of functionality in Basic RM
