
Metadata Acquisition with XML



Presentation Transcript


  1. Metadata Acquisition with XML: Case studies from the Swiss Federal Archives. 9 October 2002 / Stephan Heuscher

  2. Overview • Problems acquiring metadata • Why XML? • Featured Projects • Lessons learned • Conclusions Urbino2002.ppt; Stephan Heuscher; Swiss Federal Archives

  3. Problems acquiring metadata • Documentation • Data format • Data consistency • System borders • Money • Communication with stakeholders

  4. Why XML? XML … • … is an open standard • … is self-explanatory • … is human-readable • … can be validated automatically • … has broad software support • Most products feature XML support
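
Slide 4 lists automatic validation as one reason for choosing XML. The sketch below shows only the weakest form of that check, well-formedness, using Python's standard library. Validation against a schema (such as the `dmp.xsd` referenced on slide 10) needs a schema-aware library and is not shown. The function name and both sample strings are illustrative assumptions, not part of the original presentation.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text):
    """Return True if xml_text parses as well-formed XML, else False.

    This does NOT check the document against a schema or DTD.
    """
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


# Hypothetical samples: the second one never closes <database>.
good = '<archive><database product-name="Oracle"/></archive>'
bad = '<archive><database product-name="Oracle"></archive>'

print(is_well_formed(good))
print(is_well_formed(bad))
```

A check like this can run unattended on every delivered file, which is the practical meaning of "validated automatically" at its simplest level.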

  5. Featured Projects
     • SIARD
       • Archiving of relational databases
       • Manual generation of additional metadata
       • Metadata and content are stored in XML files
     • AMDA
       • Manages metadata for audio data from the Swiss Parliament
       • Does not manage audio data
       • Import of XML metadata
       • Must provide a variety of export formats

  6. SIARD (System Independent Archiving of Relational Databases) [Diagram; labels: Oracle, MS-SQL, ???-DB; Database regeneration; Data and low-level metadata extraction; Digital Archive (to be built); Additional high-level descriptive metadata]

  7. XML use in SIARD
     • SQL-99 (ISO/IEC 9075)
       • Low-level data description
         • Structure
         • Datatypes
         • Constraints
     • XML
       • High-level metadata
       • Table content (thin wrapper)

  8. Data Logic (SQL)

      CREATE TABLE "FLUGLE"."CLASS" (
        "CLASS_ID" NATIONAL CHARACTER VARYING(20) NOT NULL,
        "SCHEDULE_ID" NATIONAL CHARACTER VARYING(20),
        "CLASS_BUILDING" NATIONAL CHARACTER VARYING(25),
        "CLASS_ROOM" NATIONAL CHARACTER VARYING(25),
        "COURSE_ID" NATIONAL CHARACTER VARYING(5),
        "DEPARTMENT_ID" NATIONAL CHARACTER VARYING(20),
        "INSTRUCTOR_ID" NATIONAL CHARACTER VARYING(20),
        "SEMESTER" NATIONAL CHARACTER VARYING(6),
        "SCHOOL_YEAR" TIMESTAMP(0)
      )
      CREATE TABLE "FLUGLE"."CLASS_LOCATION" (
        "CLASS_BUILDING" NATIONAL CHARACTER VARYING(25) NOT NULL,
        "CLASS_ROOM" NATIONAL CHARACTER VARYING(25) NOT NULL
        ...

  9. SIARD Metadata XML

      <?xml version="1.0" encoding="UTF-8"?>
      <archive>
        <database product-name="Oracle" product-version="Personal Oracle9i Release 9.0.1.1.1 - Production. With the Partitioning option. JServer Release 9.0.1.1.1 - Production" table-number="22" view-number="4" archiv-size="175KB">
          <schemas>
            <schema tag-name="FLUGLE" table-number="22" view-number="4">
              <status sql3="true" integrity="true" archiv="true" reason="0" mandatory="true"/>
              <tables>
                <table tag-name="BACKUP_CLASS" column-number="9" row-number="10">
                  <status sql3="true" integrity="false" archiv="true" reason="3" mandatory="true"/>
                  <columns>
                    <column tag-name="CLASS_ID" sql3type="NATIONAL CHARACTER VARYING" sql3size="(20)" type="VARCHAR2" length="20" precision="" scale="" nullable="false" defaultvalue="">
                      <status sql3="true" integrity="true" archiv="true" reason="0" mandatory="true"/>
                    </column>
                    ...
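
To illustrate how the metadata file on slide 9 could be consumed, here is a minimal Python sketch that walks the schema, table and column elements and collects the SQL-99 type of each column. The element and attribute names are taken from the slide. The sample document is a shortened, closed-off version written for this example (the slide itself is truncated), and the function name `list_columns` is hypothetical.

```python
import xml.etree.ElementTree as ET

# Shortened stand-in for the slide's metadata file (assumption: the real
# file closes its elements in the obvious way; the slide is truncated).
SAMPLE = """<archive>
  <database product-name="Oracle" table-number="22" view-number="4">
    <schemas>
      <schema tag-name="FLUGLE" table-number="22" view-number="4">
        <tables>
          <table tag-name="BACKUP_CLASS" column-number="9" row-number="10">
            <columns>
              <column tag-name="CLASS_ID"
                      sql3type="NATIONAL CHARACTER VARYING"
                      sql3size="(20)" nullable="false"/>
            </columns>
          </table>
        </tables>
      </schema>
    </schemas>
  </database>
</archive>"""


def list_columns(xml_text):
    """Return (schema, table, column, sql type, nullable) tuples."""
    root = ET.fromstring(xml_text)
    result = []
    for schema in root.iter("schema"):
        for table in schema.iter("table"):
            for column in table.iter("column"):
                sql_type = column.get("sql3type") + column.get("sql3size")
                result.append((
                    schema.get("tag-name"),
                    table.get("tag-name"),
                    column.get("tag-name"),
                    sql_type,
                    column.get("nullable") == "true",
                ))
    return result


for entry in list_columns(SAMPLE):
    print(entry)
```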

  10. SIARD Data XML

      <?xml version="1.0" encoding="UTF-16"?>
      <dmp-file xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:noNamespaceSchemaLocation="../dmp.xsd">
        <schema tag-name="FLUGLE"/>
        <table tag-name="CLASS"/>
        <column tag-name="CLASS_ID" sql3type="NATIONAL CHARACTER VARYING" sql3size="(20)" defaultvalue="" nullable="false" constraints="PK:PK_CLASS"/>
        <column tag-name="SCHEDULE_ID" sql3type="NATIONAL CHARACTER VARYING" sql3size="(20)" defaultvalue="" nullable="true" constraints="FK:FLUGLE.SCHEDULE_TYPE.SCHEDULE_ID"/>
        ...
        <data>
          <row>6,104200;4,S180;9,POCO HALL;3,150;3,198;5,PHILO;4,E491;6,SPRING;19,1997-03-01 00:00:00;</row>
          <row>6,104500;3,T15;11,NARROW HALL;3,200;3,184;4,HIST;4,D944;6,SPRING;19,1997-03-01 00:00:00;</row>
          ...
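
The `<row>` elements on slide 10 appear to pack each field as a decimal length, a comma, the value itself, and a closing semicolon. That reading is inferred from the two sample rows only, not from SIARD documentation, so the Python sketch below should be treated as an assumption about the format. The function name `parse_row` is hypothetical.

```python
def parse_row(row):
    """Split a length-prefixed row string into its field values.

    Assumed layout per field: <length>,<value>;
    Because the length is explicit, a value may itself contain
    commas or semicolons without breaking the parse.
    """
    fields = []
    pos = 0
    while pos < len(row):
        comma = row.index(",", pos)        # end of the length prefix
        length = int(row[pos:comma])
        start = comma + 1
        end = start + length
        if row[end:end + 1] != ";":
            raise ValueError("expected ';' after field ending at %d" % end)
        fields.append(row[start:end])
        pos = end + 1                      # skip the terminating ';'
    return fields


# First sample row from the slide.
row = ("6,104200;4,S180;9,POCO HALL;3,150;3,198;5,PHILO;"
       "4,E491;6,SPRING;19,1997-03-01 00:00:00;")
print(parse_row(row))
```

The sample row yields nine values, which matches the nine columns of the `CLASS` table on slide 8.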

  11. AMDA (Audio MetaData Acquisition) [Diagram; labels: Audio data; Access DB; Online parliament session metadata (XML); Web interface; Unified XML import; AMDA Metadata; Digital Archive (to be built)]

  12. XML use in AMDA
      • Import
        • XSLT transformation to common format
          • Online metadata
          • Legacy data (Access database)
      • Export
        • Raw XML output transformed using XSLT

  13. AMDA Import XML (raw)

      <?xml version="1.0" encoding="iso-8859-1"?>
      <root>
        <session oid="34695" session_id="session_4609" text_update_time="1002882007656">
          <meeting date="20010917" local_time="1430" location="N" oid="34696" publish_status="final">
            <subject oid="34697" publish_status="draft" subject_type="gesch">
              <gesch_list oid="34698" publish_status="draft" transfer_gesch_list="01.9001;">
                01.9001;
                <gesch_info oid="000000000">
                  <a99_gesch last_modified="2001/03/05 14:43:42 GMT+01:00">
                    <gesch_id raw_id="20019001">2001.9001</gesch_id>
                    <title language="d">
                      <line>Mitteilungen</line>
                      <line>des Präsidenten</line>
                    </title>
                  </a99_gesch>
                </gesch_info>
              </gesch_list>
              <speech_text audio_channel="N" audio_end="1000729995203" audio_start="1000729751250" speaker_id="9005" turnus_nr="1000" turnus_oid="155989">
                <pd_text>
                  <p>Der Beginn dieser Herbstsession ist schmerzlich getrübt von unseren Gedanken an das ...

  14. AMDA Import XML (transformed)

      <?xml version="1.0" encoding="iso8859-1"?>
      <Session id="4609" start="20010917T1430+0200">
        <Geschaefte>
          <Geschaeft nummer="1998.0446" themaDeutsch="Parlamentarische Initiative&#xA;Hämmerle Andrea.&#xA;Post, SBB, Swisscom.&#xA;Arbeitsplätze&#xA;in der ganzen Schweiz" themaFranzoesisch="Initiative parlementaire&#xA;Hämmerle Andrea.&#xA;Poste, CFF, Swisscom.&#xA;Des emplois&#xA;dans toute la Suisse"/>
          <Geschaeft nummer="2001.9001" themaDeutsch="Mitteilungen&#xA;des Präsidenten" themaFranzoesisch="Communications&#xA;du président"/>
          ...
        </Geschaefte>
        <Verhandlungen>
          <Verhandlung geschaeftNummern="2001.9001" rat="V" start="1000729751" dauer="244" bulletin="" bulletinSeiten="825">
            <Votum start="1000729751" dauer="20" sprache="de">
              <Person id="9005" vorname="Peter" nachname="Hess" kanton="ZG" ort="Zug"/>
              <VotumText>Der Beginn dieser Herbstsession ist schmerzlich getrübt von unseren Gedanken ...
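
The slides state that AMDA performs this import step with XSLT. As a rough illustration of the kind of mapping involved, the Python sketch below reproduces a small part of it with the standard library's ElementTree instead of XSLT. The mapping rules are assumptions read off by comparing slides 13 and 14: the `session_` prefix is dropped from the id, date and local time are joined into one start stamp, millisecond timestamps are truncated to seconds, and the duration is rounded to whole seconds. The UTC offset is not present in the raw data, so it is passed in as a parameter. The function name `transform` and the shortened input are hypothetical.

```python
import xml.etree.ElementTree as ET

# Shortened stand-in for the raw import file on slide 13.
RAW = """<root>
  <session oid="34695" session_id="session_4609">
    <meeting date="20010917" local_time="1430" location="N">
      <subject oid="34697" subject_type="gesch">
        <speech_text audio_start="1000729751250"
                     audio_end="1000729995203" speaker_id="9005"/>
      </subject>
    </meeting>
  </session>
</root>"""


def transform(raw_xml, utc_offset="+0200"):
    """Build a <Session> element from the raw format (assumed mapping)."""
    session = ET.fromstring(raw_xml).find("session")
    meeting = session.find("meeting")
    out = ET.Element("Session", {
        # "session_4609" -> "4609"
        "id": session.get("session_id").rsplit("_", 1)[-1],
        "start": meeting.get("date") + "T"
                 + meeting.get("local_time") + utc_offset,
    })
    verhandlungen = ET.SubElement(out, "Verhandlungen")
    for speech in meeting.iter("speech_text"):
        start_ms = int(speech.get("audio_start"))
        end_ms = int(speech.get("audio_end"))
        ET.SubElement(verhandlungen, "Verhandlung", {
            "start": str(start_ms // 1000),                      # ms -> s
            "dauer": str(round((end_ms - start_ms) / 1000)),     # duration in s
        })
    return out


result = transform(RAW)
print(ET.tostring(result, encoding="unicode"))
```

With the sample values from slide 13, this yields the same `id`, `start` and `dauer` values that the `Session` and `Verhandlung` elements carry on slide 14. That agreement is consistent with the assumed mapping but does not confirm it.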

  15. Lessons learned • Transforming and reformatting XML data is easy • Documentation and data integrity are crucial • Agree on rules and standards for XML formats early • Stakeholders’ uses of XML differ greatly

  16. Conclusions
      • XML
        • is not a preservation strategy
        • is only a technology
        • is too new for a common understanding
      • XML provides tools and techniques for concise metadata management
      • Working solutions need both XML and non-XML experience
      • Most problems are still of a human nature
