
Millennium and XML: Repurposing and Customizing Metadata

Lucas Mak and Dao Rong Gong, Michigan State University. May 17 - 20, 2009.


  1. Millennium and XML: Repurposing and Customizing Metadata • Lucas Mak and Dao Rong Gong • Michigan State University • May 17 - 20, 2009

  2. Today’s Outline • Overview of Metadata • Millennium system and XML • Overview of XSLT • Case Studies • Sunday School Books Collection • New Book List • Conclusions and Observations

  3. Metadata • Structured data or information about an information resource. • Types of metadata: • Descriptive • Administrative/Rights • Preservation • Technical • Structural

  4. Descriptive Metadata • Popular descriptive metadata standards • Dublin Core (Simple & Qualified) • MODS • MARCXML • VRA Core • IEEE LOM • TEI Header • EAD

  5. Innovative XML • XML records from Millennium • Retrieved through HTTP query • Data arrangement based on MARC fields • But a MARC field and its subfields are siblings, not parent and child • Optimized for WebPAC display • Brief record (for search result index page display) • Contains data from MARC 245, publication year, and record ID • Full record (for both public and staff MARC display of an individual record)

  6. [Screenshots] Public display • Staff MARC display

  7. Millennium System and XML • [Diagram labels: Millennium, MARC, Metadata Builder, /xrecord, XML Server, XML, OAI Harvester, Delimited Text, Content Pro]

  8. /xrecord

  9. XML Server • XML server query string (search for title “xslt”): • http://magic.msu.edu/xmlopac/?xml=<WXREQ_ROOT><KEY>txslt</KEY></WXREQ_ROOT>
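A minimal Python sketch of how such a query URL might be assembled. The base URL and the `<WXREQ_ROOT>`/`<KEY>` wrapper come from the slide; the helper name `build_xml_query` and the reading of the leading `t` as a title-index prefix are assumptions. The request is percent-encoded here because raw angle brackets are not safe in a URL, and no network call is made.

```python
from urllib.parse import quote
from xml.sax.saxutils import escape

BASE_URL = "http://magic.msu.edu/xmlopac/"  # taken from the slide


def build_xml_query(index_prefix, term):
    """Return an XML server query URL (hypothetical helper, not a Millennium API)."""
    # Escape &, < and > so the search term cannot break the request markup.
    key = escape(index_prefix + term)
    request = "<WXREQ_ROOT><KEY>" + key + "</KEY></WXREQ_ROOT>"
    # Percent-encode the whole request, including "/" in the closing tags.
    return BASE_URL + "?xml=" + quote(request, safe="")


# Assumption: "t" selects the title index, as in the slide's "txslt" example.
url = build_xml_query("t", "xslt")
print(url)
```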

  10. OAI Harvester

  11. MetaData Builder

  12. MetaData Builder

  13. Content Pro in Encore

  14. XSLT • Extensible Stylesheet Language Transformations • Current version (as of 2009): 2.0 • “Transformation” means: • Manipulation of XML documents by creating a new document based on the original document • Usages in library context: • Crosswalking • Data selection and manipulation • Web display • Example: converting EAD into HTML for web display

  15. XSLT • Uses XPath expressions to select/filter data nodes • By name of “Element” • <xsl:for-each select="marc:leader"> • By value of “Element” and/or “Attribute” • <xsl:for-each select="marc:datafield[@tag=650 and @ind2='0']"> • <xsl:if test="$leader7='c'">
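The same three kinds of selection can be tried outside an XSLT processor. The sketch below uses Python's standard `xml.etree.ElementTree`, which supports only a subset of XPath, as a stand-in for the XSLT on the slide. The sample record is invented for illustration, and `leader7` is assumed to mean leader position 07 (bibliographic level).

```python
import xml.etree.ElementTree as ET

NS = {"marc": "http://www.loc.gov/MARC21/slim"}

# Invented MARCXML record, for illustration only.
SAMPLE = """\
<marc:record xmlns:marc="http://www.loc.gov/MARC21/slim">
  <marc:leader>00000cam a2200000 a 4500</marc:leader>
  <marc:datafield tag="650" ind1=" " ind2="0">
    <marc:subfield code="a">Sunday schools</marc:subfield>
  </marc:datafield>
  <marc:datafield tag="650" ind1=" " ind2="7">
    <marc:subfield code="a">Religious education</marc:subfield>
  </marc:datafield>
</marc:record>
"""

record = ET.fromstring(SAMPLE)

# Select by element name, like select="marc:leader".
leader = record.find("marc:leader", NS).text

# Select by attribute values, like marc:datafield[@tag=650 and @ind2='0'].
# ElementTree has no "and", so the two conditions are chained predicates.
lcsh_fields = record.findall("marc:datafield[@tag='650'][@ind2='0']", NS)

# Test a derived value, like <xsl:if test="$leader7='c'">.
leader7 = leader[7]  # 0-based index 7 = leader position 07
is_collection = leader7 == "c"
```

In this sample only the first 650 field has a second indicator of 0, so the chained predicates keep one field and skip the other.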

  16. Case Study One • Sunday School Books Collection • 19th century publications by religious societies • 170 titles digitized and cataloged • Data conversion needs • Source: Millennium • Target: Content Pro • Conversions in: • Format: .marc to XML • Schema and Data Structure: MARC to Qualified Dublin Core

  17. Options for Data Migration • [Diagram labels: Millennium, Create Lists, MARC File, MARCEdit, MARCXML, HTTP Query, Innovative XML, XSLT, Content Pro (QDC)]

  18. Segment of Innovative XML • Field indicator as value of element • MARC field/subfield as value of element • Siblings

  19. Segment of MARC21XML • MARC field/subfield as value of element attribute • Field indicator as value of element attribute • Parent-Child

  20. Segment of MARC21XML • Issues with Innovative XML data conversion needs • Data structured differently from MARC21XML • Availability of existing “Innovative XML to DC/QDC” XSLT? • Not optimized for data manipulation • Complications in data selection • Selection of data node by matching criteria against values in individual elements • A series of matches may be needed to select just one node • Efficiency in processing • Multiple upward, downward, and lateral movements involved in data selection

  21. Final Path of Data Migration • [Diagram labels: Millennium (.marc), Create Lists, MARC File, MARCEdit, MARCXML, XSLT, Content Pro (QDC)]

  22. Design of XSLT • Based on LC’s “MARC To Simple DC” XSLT • Customized mappings according to LC’s suggestions • Crosswalking strategies • Conditional processing (i.e. matching) • boolean(), contains(), starts-with() • <xsl:if>, <xsl:choose>, <xsl:when> • String manipulation • Used in both conditional processing and data selection for output • substring(), substring-before(), substring-after(), translate(), concat(), normalize-space()
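For readers who want to experiment with these string functions outside a stylesheet, here is a small Python sketch of four of them, written to follow XPath 1.0 behaviour as closely as practical. The helper names and the sample MARC values are illustrative assumptions, not part of the presenters' stylesheet.

```python
import re


def substring_before(text, sep):
    """Like XPath substring-before(): '' when sep does not occur."""
    if sep == "":
        return ""
    head, found, _ = text.partition(sep)
    return head if found else ""


def substring_after(text, sep):
    """Like XPath substring-after(): '' when sep does not occur."""
    if sep == "":
        return text
    _, found, tail = text.partition(sep)
    return tail if found else ""


def normalize_space(text):
    """Like XPath normalize-space(): trim and collapse space, tab, CR, LF."""
    return " ".join(re.split(r"[ \t\r\n]+", text.strip(" \t\r\n")))


def translate(text, src, dst):
    """Like XPath translate(): map src chars to dst, dropping unmatched ones."""
    table = {}
    for i, ch in enumerate(src):
        # The first occurrence of a character in src wins, as in XPath.
        table.setdefault(ord(ch), dst[i] if i < len(dst) else None)
    return text.translate(table)


# Hypothetical field values, for illustration only.
place = substring_before("Boston : American Tract Society,", " : ")
publisher = substring_after("Boston : American Tract Society,", " : ")
year = translate("[1857?]", "[]?", "")
title = normalize_space("  The  child's\n book ")
```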

  23. Design of XSLT • Conditional Processing & String Manipulation in De-duplication • Compares MARC 246 against MARC 245 • Chops the trailing period (.) • Converts 245 and 246 into lower case before comparing

      <xsl:for-each select="marc:datafield[@tag=246]/marc:subfield[@code='a']">
        <xsl:if test="not(contains($dataField245Lower, translate(substring(normalize-space(.),1,string-length()-1), $upperCase,$lowerCase)))">
          <xsl:element name="dcterms:alternative">
            <xsl:value-of select="normalize-space(substring(.,1,string-length()-1))"/>
          </xsl:element>
        </xsl:if>
      </xsl:for-each>
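The de-duplication logic can be sketched in Python as follows. This is an illustration of the idea, not a port of the stylesheet: the function names and sample titles are hypothetical, and unlike the XSLT, which always drops the last character, this version removes the final character only when it is a period.

```python
def chop_trailing_period(text):
    """Drop one trailing period, if present (the XSLT drops the last char unconditionally)."""
    return text[:-1] if text.endswith(".") else text


def alternative_titles(title_245, titles_246):
    """Return the 246$a values that are not already contained in the 245 title."""
    title_lower = title_245.lower()  # plays the role of $dataField245Lower
    kept = []
    for raw in titles_246:
        # normalize-space() equivalent, then chop the trailing period.
        candidate = chop_trailing_period(" ".join(raw.split()))
        # Case-insensitive containment test, like contains() after translate().
        if candidate.lower() not in title_lower:
            kept.append(candidate)
    return kept


# Hypothetical titles, for illustration only.
result = alternative_titles(
    "The child's own book : stories for Sunday.",
    ["Stories for Sunday.", "Child's picture book."],
)
```

Here the first 246 value already appears inside the 245 title, so only the second would be emitted as `<dcterms:alternative>`.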

  24. Design of XSLT No <dcterms:alternative> for MARC 246

  25. Design of XSLT • Predicate • Used for data selection and de-duplication • Selects LCSH only (@ind2='0') • Selects unique 650$y only

      <!-- Output MARC 650y as <dcterms:temporal> -->
      <xsl:for-each select="marc:datafield[@tag=650 and @ind2='0']
          [not(marc:subfield[@code='y'] =
               preceding-sibling::marc:datafield[@tag=650 and @ind2='0']/marc:subfield[@code='y'])]
          /marc:subfield[@code='y']">
        <xsl:element name="dcterms:temporal">
          <xsl:value-of select="normalize-space(self::node())"/>
        </xsl:element>
      </xsl:for-each>
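A rough Python equivalent of this predicate, using the standard `xml.etree.ElementTree` module. It is a sketch under assumptions: the sample record is invented, the function name is hypothetical, and duplicates are removed per value after whitespace normalization, whereas the XSLT predicate filters whole 650 fields by comparing raw subfield values.

```python
import xml.etree.ElementTree as ET

NS = {"marc": "http://www.loc.gov/MARC21/slim"}

# Invented MARCXML record, for illustration only.
SAMPLE = """\
<marc:record xmlns:marc="http://www.loc.gov/MARC21/slim">
  <marc:datafield tag="650" ind1=" " ind2="0">
    <marc:subfield code="a">Sunday schools</marc:subfield>
    <marc:subfield code="y">19th century</marc:subfield>
  </marc:datafield>
  <marc:datafield tag="650" ind1=" " ind2="0">
    <marc:subfield code="a">Children's stories</marc:subfield>
    <marc:subfield code="y">19th century</marc:subfield>
  </marc:datafield>
  <marc:datafield tag="650" ind1=" " ind2="7">
    <marc:subfield code="a">Religious education</marc:subfield>
    <marc:subfield code="y">20th century</marc:subfield>
  </marc:datafield>
  <marc:datafield tag="650" ind1=" " ind2="0">
    <marc:subfield code="a">Hymns</marc:subfield>
    <marc:subfield code="y">18th century</marc:subfield>
  </marc:datafield>
</marc:record>
"""


def temporal_terms(record):
    """Return unique 650$y values from LCSH fields (ind2='0'), in document order."""
    path = "marc:datafield[@tag='650'][@ind2='0']/marc:subfield[@code='y']"
    seen = set()
    terms = []
    for subfield in record.findall(path, NS):
        value = " ".join((subfield.text or "").split())  # normalize-space()
        if value and value not in seen:
            seen.add(value)
            terms.append(value)
    return terms


terms = temporal_terms(ET.fromstring(SAMPLE))
```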

  26. Design of XSLT • Hard-coding • Inserted elements that are global to all records <!-- Output <dc:format>application/pdf</dc:format> --><xsl:element name="dc:format"><xsl:text>application/pdf</xsl:text></xsl:element>
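Hard-coding a global element looks much the same in any XML toolkit. The Python sketch below appends the same `dc:format` value to a record. The `record` wrapper element and the function name are placeholders chosen for illustration; only the `dc:format` value comes from the slide.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)  # serialize with the conventional "dc" prefix


def add_global_format(record, mime_type="application/pdf"):
    """Append the same <dc:format> value to a record element."""
    fmt = ET.SubElement(record, "{" + DC_NS + "}format")
    fmt.text = mime_type
    return record


# "record" is a placeholder wrapper, not the actual Content Pro element name.
record = add_global_format(ET.Element("record"))
xml_text = ET.tostring(record, encoding="unicode")
```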

  27. Segment of Source MARCXML

  28. Segment of Output QDC XML

  29. Case Study Two • Library’s book lists • Issues with featured list

  30. Case Study Two • Existing New Book List • Newly cataloged books for browse shelf • New approach using XML and XSLT • New features design • Sorting • RSS feed • Customization

  31. New Book List Based on XML File • Millennium XML server outputs two files • Entire new book list over a rolling period of time • List of daily added books • New Book List program output • Book List in HTML format • RSS feed for daily added books
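To make the RSS output concrete, here is a hedged Python sketch that sorts a list of newly added books and builds a minimal RSS 2.0 feed. The presenters describe an XSLT and PHP pipeline; this is not that code. The record fields (`title`, `link`), the function name, and the example.org URLs are all assumptions, since the slides do not show the layout of the Millennium XML output.

```python
import xml.etree.ElementTree as ET


def build_rss(books, channel_title, channel_link):
    """Build a minimal RSS 2.0 feed, with items sorted by title."""
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = channel_title
    ET.SubElement(channel, "link").text = channel_link
    ET.SubElement(channel, "description").text = "Books added today"
    # Case-insensitive title sort, one of the features the slides mention.
    for book in sorted(books, key=lambda b: b["title"].lower()):
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "title").text = book["title"]
        ET.SubElement(item, "link").text = book["link"]
    return ET.tostring(rss, encoding="unicode")


# Hypothetical daily list, for illustration only.
daily_books = [
    {"title": "beta title", "link": "http://example.org/record/2"},
    {"title": "Alpha title", "link": "http://example.org/record/1"},
]
feed = build_rss(daily_books, "New Books", "http://example.org/newbooks")
```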

  32. Path of Data Processing • [Diagram labels: Millennium XML output, Expect, XSLT, Web Server & PHP, Internet]

  33. Design of XSLT

  34. Design of XSLT

  35. Design of XSLT

  36. Putting It Together

  37. Putting It Together

  38. Observations and Challenges • Millennium System and XML • XSLT processor within Millennium and customizing Innovative XML output • Using XML as data source • Large XML file size • XSLT and data processing • XSLT data manipulation • Lack of built-in functions for conditional data looping etc.

  39. Thank you! makw@mail.lib.msu.edu gongd@msu.edu
