
PubMed Central and the NLM Journal Archiving Vocabulary




Presentation Transcript


  1. PubMed Central and the NLM Journal Archiving Vocabulary

  2. Why XML? • Preserves structure of an article • Lends itself to intelligent processing • Human readable – not dependent on technology • Portable
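The "intelligent processing" point can be illustrated with a minimal sketch. It assumes a small, made-up article fragment; the tag names are illustrative and are not claimed to match any particular DTD. Because the structure is preserved in the markup, a few lines of standard-library Python can pull out the title, authors, and section headings.

```python
import xml.etree.ElementTree as ET

# Hypothetical article fragment; tag names are assumptions for illustration.
ARTICLE_XML = """
<article>
  <front>
    <article-title>Sample Article</article-title>
    <contrib-group>
      <contrib><surname>Smith</surname><given-names>Jane</given-names></contrib>
      <contrib><surname>Lee</surname><given-names>Sam</given-names></contrib>
    </contrib-group>
  </front>
  <body>
    <sec><title>Introduction</title><p>First paragraph.</p></sec>
    <sec><title>Methods</title><p>Second paragraph.</p></sec>
  </body>
</article>
"""


def summarize(xml_text):
    """Extract title, author names, and section headings from the markup."""
    root = ET.fromstring(xml_text.strip())
    title = root.findtext("front/article-title")
    authors = [
        f"{c.findtext('given-names')} {c.findtext('surname')}"
        for c in root.iter("contrib")
    ]
    sections = [s.findtext("title") for s in root.iter("sec")]
    return {"title": title, "authors": authors, "sections": sections}


summary = summarize(ARTICLE_XML)
print(summary)
```

The same extraction from a typeset PDF or an image of the page would need layout heuristics; here the structure is already explicit in the tags.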

  3. PMC Workflow

  4. PubMed Central DTD History: pmc-1.dtd • DTD currently in production • Derived from keton.dtd and BMC article.dtd. • Designed to be a simple DTD for online display and archive. • Written with samples from PNAS, MBC, and BMC. • Why a new DTD? • Elements/attributes had to be added to accommodate new journals. • The DTD would quickly become cumbersome if we had to keep making changes for each new title. • The original “simplicity” of design would lead to confusing data structures as the DTD expanded. • It had moved away from standard XML practices to accommodate source SGML. • It needed an independent review.

  5. The Results: pmc-2.dtd • Mulberry’s Suggestions • Create two DTDs: • one for archiving, to allow us to convert data from multiple sources to our DTD. • a subset for authoring, to allow us to retain some control when publishers create articles against the DTD. • Use proven solutions like XLink and the XHTML table standard. • Use data models to simplify the DTD.

  6. Harvard E-Journal Archiving Project • The Mellon Foundation funded the Harvard Library to study the feasibility of using one DTD for archiving journal articles. • Harvard commissioned Inera, Inc. for the E-Journal Archive DTD Feasibility Study. • Conclusion – yes, it is feasible, but the right DTD does not exist. • A meeting was held in April 2002 to discuss the changes needed to the PMC2 DTD to expand its range to include almost any journal. Attendees included PMC, Mulberry Technologies, Inc. (consultant to PMC), The Mellon Foundation, The Harvard Library, and Inera (consultant to Harvard-Mellon).

  7. Conclusions • 1. PMC and Harvard-Mellon had different ideas about what the DTD should do. Harvard was interested in an Interchange DTD, which would allow publishers to submit in multiple formats, all of which would be valid. PMC was interested in an Archive DTD, which would be open enough to allow conversion of multiple sources into a single format. • 2. If the PMC2 DTD were modularized and some pieces were added (like the OASIS table model), many DTDs could be built from the same elements, giving both flexibility and consistency.

  8. Status • The “NLM Archiving and Interchange DTD Suite” has been created and released. • Mulberry and Inera analyzed hundreds of journals across subjects to ensure that the DTD Suite was powerful enough to tag them. • The “NLM Journal Archiving DTD” and the “Journal Publishing DTD” have been created from the DTD Suite. • The Archiving DTD and the Suite were circulated through Mulberry’s and Inera’s contacts in the electronic publishing world for comments and suggestions. Suggestions that made the DTD more usable were incorporated.

  9. Journal Archiving and Interchange DTD • Purpose is to preserve journal’s intellectual content • Written for • ease of conversion (from other DTDs) • completeness (union of current journal DTDs) • Characteristics • descriptive (tag what is there) • inclusive (preserve as much tagging as possible) • non-enforcing (there is no right way) • almost nothing required • very little required sequence (metadata in order, little else) • many large OR groups (do anything here)

  10. Journal Publishing DTD • Purpose is to provide guidance in creating new journal material • Written for • authoring article content • initial tagging of non-XML content • creating consistent structures • Differences from the Archiving DTD • smaller (not as many elements) • prescriptive (not as many choices) • enforcing (there is one way to do many things) • more required elements
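The contrast between the two DTDs on slides 9 and 10 can be sketched as two toy content-model checks. This is only an illustration under stated assumptions: the element names and both rules are hypothetical and are not taken from the actual DTDs. In DTD syntax the first rule would resemble an OR group such as `(title | p | list | fig | table)*`, the second a required sequence such as `(title, p+)`.

```python
# Hypothetical set of elements allowed inside a section (assumption).
ALLOWED_IN_ARCHIVE = {"title", "p", "list", "fig", "table"}


def valid_archiving(children):
    """Large OR group: any allowed element, in any order, any number of times."""
    return all(tag in ALLOWED_IN_ARCHIVE for tag in children)


def valid_publishing(children):
    """Prescriptive sequence: one title first, then one or more paragraphs."""
    return (
        len(children) >= 2
        and children[0] == "title"
        and all(tag == "p" for tag in children[1:])
    )


loose = ["p", "title", "fig"]    # whatever the source happened to contain
strict = ["title", "p", "p"]     # the one sanctioned shape

print(valid_archiving(loose), valid_publishing(loose))
print(valid_archiving(strict), valid_publishing(strict))
```

The permissive check accepts nearly anything a converter produces, which suits archiving; the prescriptive check rejects the loosely ordered input, which is what gives authored content a consistent structure.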

  11. 2.0 – What is Archiving? • Archiving the submitted file, or archiving the article content? • <x>, </x> - the bone of contention in lists, author lists, etc. • In 2.0, Archiving and Publishing DTDs will allow for both types of archiving. • Creating a third “Authoring DTD”
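The two meanings of "archiving" can be sketched in a few lines, assuming that `<x>` wraps generated text such as the punctuation between names in an author list, as the slide's `<x>, </x>` example suggests. The sample markup and the `strip_generated_text` helper are hypothetical. Keeping the `<x>` elements reproduces the submitted file; removing them leaves only the article content.

```python
import xml.etree.ElementTree as ET


def strip_generated_text(elem):
    """Remove every <x> descendant in place, keeping any text that follows it."""
    prev = None
    for child in list(elem):
        if child.tag == "x":
            tail = child.tail or ""
            # Reattach the text after <x> to the preceding node.
            if prev is None:
                elem.text = (elem.text or "") + tail
            else:
                prev.tail = (prev.tail or "") + tail
            elem.remove(child)
        else:
            strip_generated_text(child)
            prev = child
    return elem


# Hypothetical author list as a publisher might submit it.
submitted = (
    "<contrib-group><name>Smith J</name><x>, </x>"
    "<name>Lee S</name><x> and </x><name>Wu T</name></contrib-group>"
)

root = ET.fromstring(submitted)
as_submitted = "".join(root.itertext())  # text with generated punctuation
content_only = ET.tostring(strip_generated_text(root), encoding="unicode")

print(as_submitted)
print(content_only)
```

With the `<x>` elements in place the rendered string matches what the publisher sent; without them, punctuation becomes a display decision made at rendering time.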

  12. Who Owns the Tagset? The DTDs? • Not “Open Source” • DTDs and Tagset are in the public domain • NLM retains control over changes and additions to the Tagset and DTDs • But: Anyone may use them, or create a new DTD from them, without permission from NLM

  13. What’s Next?: Other DTDs • Because the DTD is built as a set of DTD modules, other document types can be created (relatively) easily using the same content models. • We are building a Books DTD and planning an Online Documentation DTD.

  14. Links • PubMed Central: http://www.pubmedcentral.gov • NLM DTDs and documentation: http://dtd.nlm.nih.gov • jeffbeck@nih.gov
