This document discusses the implementation of reference linking in the Physical Review Online Archive (PROLA) managed by the American Physical Society. It outlines the challenges of linking bibliographic references to full-text articles and explores three main approaches: static, dynamic, and semi-dynamic linking. The semi-dynamic approach is highlighted for its cost-effectiveness and efficiency. The statistics on articles and references illustrate the extensive scope of PROLA’s resources, emphasizing the importance of robust linking for advancing knowledge in physics.
Implementing Reference Linking in PROLA Mark Doyle Manager, Product Development The American Physical Society http://prola.aps.org/ CrossRef - Boston, MA
The American Physical Society • 40,000+ members • Founded in 1898 • Mission: “diffusion and advancement of knowledge of physics” • Publisher of Physical Review journals and Reviews of Modern Physics • 14,500 articles per year (100,000 pages per year)
What is PROLA? • Physical Review Online Archive • Covers all APS journals from 1893-present, but only 1893-1998 is available • Separate subscription from current content journals • 1 year “migrated” each year • APS corpus is 330,000 articles
The Basic Problem • References in an article’s bibliography need to be linked to the full-text article • Citation metadata given: author, journal, volume, page (or other enumeration) • Identify metadata, query linking partners, store results, create links for end users • Keep links up to date, keep system robust and fast, keep costs low
Three General Approaches • Static - query for links at time of publication, create a static HTML file with the appropriate links, and serve that • Dynamic - store linking information in a live database that is queried when the user requests the web page • Semi-dynamic - pre-query links, update them periodically, and generate the HTML with links dynamically
Semi-Dynamic Approach • Lower investment in database technology • Lower costs to mirror • Fast for the user • High availability • Scales well with usage
APS Process Overview
XML File
<references>
….
<citation cid="C3">
  <ref><article><refauth>J. J. Boland</refauth>, <journal>Phys. Rev. Lett.</journal> <volume>67</volume>, <pages>1539</pages> (<date>1991</date>);</article></ref>
  <ref abbrev="prevau"><article><refauth>J. J. Boland</refauth> , <journal>J. Vac. Sci. Technol. A</journal> <volume>10</volume>, <pages>2458</pages> (<date>1992</date>).</article></ref>
</citation>
…..
Process Overview
Parse XML Bibliographic Record • Parse XML-tagged references • Article’s DOI suffix becomes the primary key • Journal, volume, and page information becomes a reference ID (J. Vac. Sci. Technol. A 10, 2458 gets mapped to JVacSciTechnolA.10.2458) • One table holds DOI, reference ID, citation number, reference number • A second table holds article metadata for the querying process
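The mapping from a tagged reference to a reference ID can be sketched in a few lines of Python. This is only an illustration, not the APS code: the function names `make_ref_id` and `parse_citation` are hypothetical, and the rule "drop every character that is not a letter or digit from the journal abbreviation" is an assumption inferred from the single example above (J. Vac. Sci. Technol. A 10, 2458 becoming JVacSciTechnolA.10.2458).

```python
import re
import xml.etree.ElementTree as ET

# One <citation> element, copied from the "XML File" slide.
SAMPLE = (
    '<citation cid="C3"><ref><article><refauth>J. J. Boland</refauth>, '
    '<journal>Phys. Rev. Lett.</journal> <volume>67</volume>, '
    '<pages>1539</pages> (<date>1991</date>);</article></ref> '
    '<ref abbrev="prevau"><article><refauth>J. J. Boland</refauth> , '
    '<journal>J. Vac. Sci. Technol. A</journal> <volume>10</volume>, '
    '<pages>2458</pages> (<date>1992</date>).</article></ref></citation>'
)


def make_ref_id(journal, volume, page):
    """Build a reference ID such as 'JVacSciTechnolA.10.2458'.

    Assumed rule: keep only letters and digits of the journal abbreviation.
    """
    key = re.sub(r"[^A-Za-z0-9]", "", journal)
    return f"{key}.{volume}.{page}"


def parse_citation(xml_text):
    """Return (citation id, reference number, reference ID) rows."""
    citation = ET.fromstring(xml_text)
    rows = []
    for ref_number, ref in enumerate(citation.findall("ref"), start=1):
        article = ref.find("article")
        if article is None:
            continue  # not a journal article reference
        journal = article.findtext("journal")
        volume = article.findtext("volume")
        page = article.findtext("pages")
        if not (journal and volume and page):
            continue  # not enough metadata to build an ID
        rows.append(
            (citation.get("cid"), ref_number,
             make_ref_id(journal.strip(), volume.strip(), page.strip()))
        )
    return rows


for row in parse_citation(SAMPLE):
    print(row)
```

Each returned row corresponds to one entry in the first table described above; the citing article's DOI would be added by the caller.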
Database Schema • ARTICLES (Phys. Rev. DOI, citation number, reference number, reference ID) • ARTICLE_DATA (ref_id, first author, journal, volume, issue, enumeration, year) • ARTICLE_LINKS (ref_id, link type, link data) • QUERY_DATES (ref_id, link type, query date)
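As a rough sketch, the four tables could be declared as follows. SQLite is used here purely as a stand-in so the example runs anywhere; the slides do not say which database is used beyond showing a `SQL>` prompt. The column names `ref_id`, `link_type`, and `link_data` appear in the query shown in this talk, while the remaining column names, the types, and the keys are assumptions.

```python
import sqlite3

# Assumed DDL: only ref_id, link_type and link_data are named in the talk.
SCHEMA = """
CREATE TABLE articles (
    doi              TEXT    NOT NULL,  -- citing Phys. Rev. article
    citation_number  INTEGER NOT NULL,
    reference_number INTEGER NOT NULL,
    ref_id           TEXT    NOT NULL,
    PRIMARY KEY (doi, citation_number, reference_number)
);
CREATE TABLE article_data (
    ref_id       TEXT PRIMARY KEY,
    first_author TEXT,
    journal      TEXT,
    volume       TEXT,
    issue        TEXT,
    enumeration  TEXT,                  -- page or other enumeration
    year         INTEGER
);
CREATE TABLE article_links (
    ref_id    TEXT NOT NULL,
    link_type TEXT NOT NULL,            -- e.g. XREF, ADS, CAS, SPIN, INSPEC
    link_data TEXT NOT NULL,            -- e.g. the DOI for XREF
    PRIMARY KEY (ref_id, link_type)
);
CREATE TABLE query_dates (
    ref_id     TEXT NOT NULL,
    link_type  TEXT NOT NULL,
    query_date TEXT NOT NULL,           -- ISO date of the last query
    PRIMARY KEY (ref_id, link_type)
);
"""


def open_db():
    """Create an in-memory database holding the four tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


def links_for(conn, ref_id):
    """Return all (link_type, link_data) pairs stored for one reference ID."""
    cur = conn.execute(
        "SELECT link_type, link_data FROM article_links "
        "WHERE ref_id = ? ORDER BY link_type",
        (ref_id,),
    )
    return cur.fetchall()


conn = open_db()
conn.executemany(
    "INSERT INTO article_links VALUES (?, ?, ?)",
    [
        ("JVacSciTechnolA.10.2458", "XREF", "10.1116/1.577984"),
        ("JVacSciTechnolA.10.2458", "ADS", "1992JVST...10.2458B"),
    ],
)
print(links_for(conn, "JVacSciTechnolA.10.2458"))
```

Keying every link table on the reference ID rather than on the citing article means a reference cited by many articles is queried and stored only once.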
Query CrossRef and others • Nightly query of CrossRef for new references that don’t yet have a DOI • Track batches in a Scheduler application • A links table tracks the link source (XREF, ADS, CAS, SPIN, INSPEC) and linking data (the DOI for XREF) for each reference ID • A query dates table tracks when we last queried a reference that didn’t match • Periodically rerun queries that haven’t matched
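The selection of references for a nightly batch might look like the sketch below. It is a guess at the logic, not the Scheduler application itself: the function name `due_for_query`, the in-memory data structures, and especially the 90-day retry interval are assumptions (the talk only says unmatched queries are rerun "periodically"). The reference IDs in the usage lines are made up.

```python
from datetime import date, timedelta


def due_for_query(ref_ids, linked, last_queried, today,
                  retry_after=timedelta(days=90), link_type="XREF"):
    """Pick the reference IDs that should be sent to a linking partner.

    linked       -- set of (ref_id, link_type) pairs that already have a link
    last_queried -- dict mapping (ref_id, link_type) to the last query date
    retry_after  -- assumed waiting period before an unmatched query is rerun
    """
    due = []
    for ref_id in ref_ids:
        key = (ref_id, link_type)
        if key in linked:
            continue  # already matched, nothing to do
        last = last_queried.get(key)
        if last is None or today - last >= retry_after:
            due.append(ref_id)  # never queried, or stale enough to retry
    return due


# Hypothetical reference IDs, for illustration only.
today = date(2002, 6, 1)
linked = {("ExampleJ.1.1", "XREF")}
last_queried = {
    ("ExampleJ.2.2", "XREF"): today - timedelta(days=10),
    ("ExampleJ.3.3", "XREF"): today - timedelta(days=120),
}
print(due_for_query(
    ["ExampleJ.1.1", "ExampleJ.2.2", "ExampleJ.3.3", "ExampleJ.4.4"],
    linked, last_queried, today,
))
```

Recording the date of each failed query is what keeps the nightly batch small: only new references and sufficiently old misses are sent out again.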
Links in the Database
SQL> select link_type,link_data from article_links where ref_id='JVacSciTechnolA.10.2458';
LINK_TYPE LINK_DATA
--------- ------------------------------
XREF      10.1116/1.577984
INSPEC    JVTAD600001000000400245800000B
SPIN      JVTAD6000010000004002458000001
ADS       1992JVST...10.2458B
CAS       1:CAS:528:DyaK38XltlygtLg%3D
Statistics • 330,000 articles (1893-present) • 6.4 million (journal) references • 3 million Phys. Rev. references • 1.4 million unique non-APS references • 210,000 CrossRef links (1.8 million links total) • Folding in the APS references which are also in CrossRef, about 30% of our references are in CrossRef
Process Overview
XML Linking File
<?xml version="1.0"?>
<apslinks>
<citlink cid="1" rid="1">
  <link ref_id="PhysRevLett.62.567" type="APS">PhysRevLett.62.567</link></citlink>
…
<citlink cid="3" rid="2">
  <link ref_id="JVacSciTechnolA.10.2458" type="XREF">10.1116/1.577984</link>
  <link ref_id="JVacSciTechnolA.10.2458" type="INSPEC">JVTAD600001000000400245800000B</link>
  ….</apslinks>
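The semi-dynamic step, reading a pre-generated linking file when a page is requested and turning its entries into HTML, can be sketched as below. The function names and the `URL_TEMPLATES` table are hypothetical. Only the XREF case is rendered, using the public DOI resolver; the URL patterns for the other link types are not given in the talk, so they are skipped rather than guessed. The sample file is a well-formed cut-down of the slide above with the elided parts left out.

```python
import html
import xml.etree.ElementTree as ET

LINKS_XML = """<?xml version="1.0"?>
<apslinks>
<citlink cid="1" rid="1">
  <link ref_id="PhysRevLett.62.567" type="APS">PhysRevLett.62.567</link>
</citlink>
<citlink cid="3" rid="2">
  <link ref_id="JVacSciTechnolA.10.2458" type="XREF">10.1116/1.577984</link>
  <link ref_id="JVacSciTechnolA.10.2458" type="INSPEC">JVTAD600001000000400245800000B</link>
</citlink>
</apslinks>"""

# Assumption: only the DOI resolver pattern is known here; other link
# types would each need their own template.
URL_TEMPLATES = {"XREF": "https://doi.org/{}"}


def load_links(xml_text):
    """Map (citation id, reference id) to a list of (link type, link data)."""
    table = {}
    for citlink in ET.fromstring(xml_text).findall("citlink"):
        key = (citlink.get("cid"), citlink.get("rid"))
        table[key] = [(link.get("type"), link.text)
                      for link in citlink.findall("link")]
    return table


def render_links(table, cid, rid):
    """Return HTML anchors for every link type that has a URL template."""
    anchors = []
    for link_type, data in table.get((cid, rid), []):
        template = URL_TEMPLATES.get(link_type)
        if template is None:
            continue  # no known URL pattern for this link type
        url = template.format(data)
        anchors.append(
            f'<a href="{html.escape(url, quote=True)}">{html.escape(link_type)}</a>'
        )
    return " ".join(anchors)


table = load_links(LINKS_XML)
print(render_links(table, "3", "2"))
```

Because the page is assembled from a small per-article file rather than a live database query, the same files can be copied to a mirror site without any database behind it. A production version would also need to percent-encode DOIs that contain reserved URL characters.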
Process Overview
Rendered Links
Conclusions • Simple and pragmatic solutions work • Marked-up content makes it all fit together (obviates the need for extensive labor) • Modest resources are needed to implement and maintain the system • Scheme is easily expanded to include other linking targets
Contact information • http://prola.aps.org/ • doyle@aps.org