OAI @ CERN. OAI Open Day for Europe February 26th 2001, Berlin, Germany Jean-Yves Le Meur CERN Document Server project leader. Background: CERN Library. It contains: HEP documents: preprints, books, journals, photos, notes, presentations, meeting agendas, etc
OAI @ CERN
OAI Open Day for Europe, February 26th 2001, Berlin, Germany
Jean-Yves Le Meur, CERN Document Server project leader
Jean-Yves Le Meur CERN

Background: CERN Library
It contains:
• HEP documents: preprints, books, journals, photos, notes, presentations, meeting agendas, etc.
• 430 000 bibliographic records; 170 000 full text documents
• Aleph 300 library system (Ex-Libris)
• Customized Web interface: WebLib
• Software built on top of Aleph APIs (RPC)
• Two main servers: weblib and doc
• A separate MySQL database for ‘non library’ documents

Community
Users are:
• Physicists at CERN and all over the world
• Distinct hosts counted in 2000:
  • Total of 127 000 distinct hosts: 8 000 at CERN, 93 000 outside CERN, 26 000 unresolved IP addresses
  • On average, 20 000 distinct hosts per month

OAI @ CERN history
Metadata acquisition (since 1994)
• Manual: collection of scanned documents
• Electronic:
  • Web & email submission mechanism
  • Uploader application for metadata transformation
  • Checked by a human
• Long term storage system with an open interface for collecting the metadata
Involvement in OAI (1999)
• Close follow-up since the Santa Fe meeting
• Straightforward objectives for CERN:
  • Simpler metadata exchange
  • Less metadata proofreading

OAI 1.0 @ CERN status
A test collection:
• Composed of books and eprints
• 30 000 records extracted from our Library system
• Stored in a MySQL database (based on MARC 21)
OAI 1.0 compliant, with:
• Three formats supported: oai_dc, oai_marc and oai_rfc1807
• All functions implemented: Identify, ListSets, ListMetadataFormats, GetRecord, ListIdentifiers, ListRecords
• oai:cerncds:xxxx ready but not in production yet

Implementation
Existing infrastructure:
• MARC 21 in use at CERN
• MySQL database with PHP interfacing
• Advanced search interface
• Multiple output (display) formats
OAI “plug-ins”:
• New arguments added to the search.php engine: verb=, etc.
• New output formats added to the supported set
• About three full working days

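Because the OAI layer is exposed as extra arguments to search.php, a request is an ordinary GET URL. The following is a minimal sketch in Python of how a harvester might assemble such URLs. It assumes the test server's base URL from the examples in this talk; the helper name `build_oai_request` is hypothetical and is not part of the CERN software.

```python
from urllib.parse import urlencode

# Base URL of the test server used in the examples; illustrative only.
BASE_URL = "http://cdsdev.cern.ch/casalini/search.php"

# The six OAI 1.0 functions listed as implemented.
OAI_VERBS = {
    "Identify", "ListSets", "ListMetadataFormats",
    "GetRecord", "ListIdentifiers", "ListRecords",
}


def build_oai_request(verb, **arguments):
    """Return the GET URL for one OAI request (hypothetical helper)."""
    if verb not in OAI_VERBS:
        raise ValueError(f"unknown OAI verb: {verb}")
    params = {"verb": verb}
    params.update(arguments)
    # urlencode percent-encodes reserved characters such as ':' in identifiers.
    return BASE_URL + "?" + urlencode(params)


if __name__ == "__main__":
    print(build_oai_request("Identify"))
    print(build_oai_request("GetRecord",
                            identifier="oai:cerncds:2229111",
                            metadataPrefix="oai_dc"))
```

Keeping the verb as just another query argument mirrors the “plug-in” idea: the search engine only has to recognise `verb=` and switch to an OAI output format.
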
Example: Identify
<?xml version="1.0" encoding="UTF-8" ?>
<Identify xmlns="http://www.openarchives.org/OAI/1.0/OAI_Identify"
          xmlns:xsi="http://www.w3.org/2000/10/XMLSchema-instance"
          xsi:schemaLocation="http://www.openarchives.org/OAI/1.0/OAI_Identify http://www.openarchives.org/OAI/1.0/OAI_Identify.xsd">
  <responseDate>2001-02-23T10:59:44+01:00</responseDate>
  <requestURL>http://cdsdev.cern.ch/casalini/search.php?verb=Identify</requestURL>
  <repositoryName>CERN Document Server</repositoryName>
  <baseURL>http://cdsdev.cern.ch/casalini/search.php</baseURL>
  <protocolVersion>1.0</protocolVersion>
  <adminEmail>mailto:cds.support@cern.ch</adminEmail>
  <description>
    <oai-identifier xmlns="http://www.openarchives.org/OAI/oai-identifier"
                    xmlns:xsi="http://www.w3.org/2000/10/XMLSchema-instance"
                    xsi:schemaLocation="http://www.openarchives.org/OAI/oai-identifier http://www.openarchives.org/OAI/oai-identifier.xsd">
      <scheme>oai</scheme>
      <repositoryIdentifier>cerncds</repositoryIdentifier>
      <delimiter>:</delimiter>
      <sampleIdentifier>oai:cerncds:1</sampleIdentifier>
    </oai-identifier>
  </description>
  <description>
    <eprints xmlns="http://www.openarchives.org/OAI/eprints"
             xmlns:xsi="http://www.w3.org/2000/10/XMLSchema-instance"
             xsi:schemaLocation="http://www.openarchives.org/OAI/eprints http://www.openarchives.org/OAI/eprints.xsd">
      <content>
        <URL>http://cdsdev.cern.ch/casalini/</URL>
      </content>
      <metadataPolicy>
        <text>Free and unlimited use by anobody.</text>
        <URL>http://cdsdev.cern.ch/casalini/</URL>
      </metadataPolicy>
      <dataPolicy>
        <text>Full content, i.e. preprints may not be harvested by robots</text>
      </dataPolicy>
      <submissionPolicy>
        <URL>http://cdsdev.cern.ch/casalini/</URL>
      </submissionPolicy>
    </eprints>
  </description>
</Identify>

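The oai-identifier description in the Identify response declares how record identifiers are built: scheme, repository identifier and delimiter. The following is a minimal sketch in Python of how a client might split and validate such an identifier. The default values come from the response above; the names `OaiIdentifier` and `parse_oai_identifier` are hypothetical.

```python
from typing import NamedTuple


class OaiIdentifier(NamedTuple):
    scheme: str
    repository: str
    local_id: str


def parse_oai_identifier(value, scheme="oai", repository="cerncds", delimiter=":"):
    """Split an identifier such as 'oai:cerncds:1' into its three parts.

    Raises ValueError when the identifier does not match the declared
    scheme and repository identifier.
    """
    # Split at most twice so the local part may itself contain the delimiter.
    parts = value.split(delimiter, 2)
    if len(parts) != 3 or parts[0] != scheme or parts[1] != repository or not parts[2]:
        raise ValueError(f"not a {scheme}{delimiter}{repository} identifier: {value!r}")
    return OaiIdentifier(*parts)


if __name__ == "__main__":
    print(parse_oai_identifier("oai:cerncds:1"))
```
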
Example: ListMetadataFormats
<?xml version="1.0" encoding="UTF-8" ?>
<ListMetadataFormats xmlns="http://www.openarchives.org/OAI/1.0/OAI_ListMetadataFormats"
                     xmlns:xsi="http://www.w3.org/2000/10/XMLSchema-instance"
                     xsi:schemaLocation="http://www.openarchives.org/OAI/1.0/OAI_ListMetadataFormats http://www.openarchives.org/OAI/1.0/OAI_ListMetadataFormats.xsd">
  <responseDate>2001-02-23T11:04:25+01:00</responseDate>
  <requestURL>http://cdsdev.cern.ch/casalini/search.php?verb=ListMetadataFormats</requestURL>
  <metadataFormat>
    <metadataPrefix>oai_dc</metadataPrefix>
    <schema>http://www.openarchives.org/OAI/dc.xsd</schema>
    <metadataNamespace>http://purl.org/dc/elements/1.1/</metadataNamespace>
  </metadataFormat>
  <metadataFormat>
    <metadataPrefix>oai_marc</metadataPrefix>
    <schema>http://www.openarchives.org/OAI/oai_marc.xsd</schema>
  </metadataFormat>
  <metadataFormat>
    <metadataPrefix>oai_rfc1807</metadataPrefix>
    <schema>http://www.openarchives.org/OAI/rfc1807.xsd</schema>
  </metadataFormat>
</ListMetadataFormats>

Example: GetRecord
<?xml version="1.0" encoding="UTF-8" ?>
<GetRecord xmlns="http://www.openarchives.org/OAI/1.0/OAI_GetRecord"
           xmlns:xsi="http://www.w3.org/2000/10/XMLSchema-instance"
           xsi:schemaLocation="http://www.openarchives.org/OAI/1.0/OAI_GetRecord http://www.openarchives.org/OAI/1.0/OAI_GetRecord.xsd">
  <responseDate>2001-02-23T11:09:17+01:00</responseDate>
  <requestURL>http://cdsdev.cern.ch/casalini/search.php?verb=GetRecord&amp;identifier=oai%3Acerncds%3A2229111&amp;metadataPrefix=oai_dc</requestURL>
  <record>
    <header>
      <identifier>oai:cerncds:2229111</identifier>
      <datestamp>2000-11-16</datestamp>
    </header>
    <metadata>
      <dc xmlns="http://purl.org/dc/elements/1.1/"
          xmlns:xsi="http://www.w3.org/2000/10/XMLSchema-instance"
          xsi:schemaLocation="http://purl.org/dc/elements/1.1/ http://www.openarchives.org/OAI/dc.xsd">
        <subject>Accelerators and Storage Rings</subject>
        <creator>Katz, Ulrich F</creator>
        <title>Deep inelastic positron-proton scattering in the high-momentum-transfer regime of HERA</title>
      </dc>
    </metadata>
  </record>
</GetRecord>

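A harvester has to cope with two XML namespaces in this response: the OAI envelope and the Dublin Core payload. The following is a minimal sketch in Python, using only the standard library, of how the record above could be read. The embedded sample is a trimmed copy of the response (XML declaration, requestURL and schema attributes omitted), and the function name `parse_get_record` is hypothetical.

```python
import xml.etree.ElementTree as ET

# Namespaces taken from the GetRecord response above.
NS = {
    "oai": "http://www.openarchives.org/OAI/1.0/OAI_GetRecord",
    "dc": "http://purl.org/dc/elements/1.1/",
}

# Trimmed copy of the response, kept only to make the sketch self-contained.
SAMPLE = """
<GetRecord xmlns="http://www.openarchives.org/OAI/1.0/OAI_GetRecord">
  <responseDate>2001-02-23T11:09:17+01:00</responseDate>
  <record>
    <header>
      <identifier>oai:cerncds:2229111</identifier>
      <datestamp>2000-11-16</datestamp>
    </header>
    <metadata>
      <dc xmlns="http://purl.org/dc/elements/1.1/">
        <subject>Accelerators and Storage Rings</subject>
        <creator>Katz, Ulrich F</creator>
        <title>Deep inelastic positron-proton scattering in the high-momentum-transfer regime of HERA</title>
      </dc>
    </metadata>
  </record>
</GetRecord>
"""


def parse_get_record(xml_text):
    """Return header fields and Dublin Core fields of a GetRecord response."""
    root = ET.fromstring(xml_text)
    header = root.find("oai:record/oai:header", NS)
    dc = root.find("oai:record/oai:metadata/dc:dc", NS)
    return {
        "identifier": header.findtext("oai:identifier", namespaces=NS),
        "datestamp": header.findtext("oai:datestamp", namespaces=NS),
        "title": dc.findtext("dc:title", namespaces=NS),
        "creators": [e.text for e in dc.findall("dc:creator", NS)],
        "subjects": [e.text for e in dc.findall("dc:subject", NS)],
    }


if __name__ == "__main__":
    for key, value in parse_get_record(SAMPLE).items():
        print(key, "=", value)
```
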
Example: ListIdentifiers
<?xml version="1.0" encoding="UTF-8" ?>
<ListIdentifiers xmlns="http://www.openarchives.org/OAI/1.0/OAI_ListIdentifiers"
                 xmlns:xsi="http://www.w3.org/2000/10/XMLSchema-instance"
                 xsi:schemaLocation="http://www.openarchives.org/OAI/1.0/OAI_ListIdentifiers http://www.openarchives.org/OAI/1.0/OAI_ListIdentifiers.xsd">
  <responseDate>2001-02-23T11:15:37+01:00</responseDate>
  <requestURL>http://cdsdev.cern.ch/casalini/search.php?verb=ListIdentifiers</requestURL>
  <identifier>oai:cerncds:101</identifier>
  <identifier>oai:cerncds:103</identifier>
  <identifier>oai:cerncds:105</identifier>
  <identifier>oai:cerncds:107</identifier>
  <identifier>oai:cerncds:108</identifier>
  <identifier>oai:cerncds:109</identifier>
  <identifier>oai:cerncds:110</identifier>
  <identifier>oai:cerncds:112</identifier>
  <identifier>oai:cerncds:113</identifier>
  <identifier>oai:cerncds:117</identifier>
  <identifier>oai:cerncds:118</identifier>
  <identifier>oai:cerncds:119</identifier>
  <identifier>oai:cerncds:120</identifier>
  <identifier>oai:cerncds:121</identifier>
  …..
</ListIdentifiers>

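The identifiers returned here end in the internal system number, so a harvester can recover those numbers directly from the list. The following is a minimal sketch in Python, assuming the `oai:cerncds:<number>` form shown above. The sample contains only the first four identifiers of the response, and `list_system_numbers` is a hypothetical name.

```python
import xml.etree.ElementTree as ET

# Namespace of the ListIdentifiers response, in ElementTree's {uri}tag notation.
NS = "{http://www.openarchives.org/OAI/1.0/OAI_ListIdentifiers}"

# First four identifiers of the response above, for a self-contained example.
SAMPLE = """
<ListIdentifiers xmlns="http://www.openarchives.org/OAI/1.0/OAI_ListIdentifiers">
  <identifier>oai:cerncds:101</identifier>
  <identifier>oai:cerncds:103</identifier>
  <identifier>oai:cerncds:105</identifier>
  <identifier>oai:cerncds:107</identifier>
</ListIdentifiers>
"""


def list_system_numbers(xml_text):
    """Return the numeric local part of every identifier in the response."""
    root = ET.fromstring(xml_text)
    # rsplit keeps everything after the last ':' as the system number.
    return [int(e.text.rsplit(":", 1)[1]) for e in root.iter(NS + "identifier")]


if __name__ == "__main__":
    print(list_system_numbers(SAMPLE))
```
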
Example: ListRecords (oai_marc)
…
<varfield id="072" i1=" " i2="7">
  <subfield label="a">Mathematical Physics and Mathematics</subfield>
  <subfield label="2">CERN-CDS</subfield>
</varfield>
<varfield id="245" i1="1" i2=" ">
  <subfield label="a">Sechs Vorträge über ausgewählte Gegenstände aus der reine Mathematik und mathematischen Physik</subfield>
</varfield>
<varfield id="909" i1="c" i2="a">
  <subfield label="a">BOO</subfield>
  <subfield label="b">21</subfield>
</varfield>
</oai_marc>
…
  <subfield label="a">Biography, Geography, History</subfield>
  <subfield label="2">CERN-CDS</subfield>
</varfield>
<varfield id="100" i1=" " i2=" ">
  <subfield label="a">Leroy, Francis</subfield>
</varfield>
<varfield id="245" i1="1" i2=" ">
  <subfield label="a">Dictionnaire encyclopédique des prix Nobel de médecine</subfield>
</varfield>
<varfield id="650" i1=" " i2=" ">
  <subfield label="a">Nobel prize winners</subfield>
  <subfield label="a">chemistry</subfield>
</varfield>
<varfield id="909" i1="c" i2="a">
  <subfield label="a">BOO</subfield>
  <subfield label="b">21</subfield>
</varfield>
</oai_marc>
…

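In oai_marc output each MARC field is a varfield carrying its tag in `id`, and each subfield carries its code in `label`. The following is a minimal sketch in Python of how such a record could be turned into a lookup table. As a simplification, the sample wraps the complete varfields of the second record above in a bare `<oai_marc>` element without namespace or attributes; the function name `marc_fields` is hypothetical.

```python
import xml.etree.ElementTree as ET

# Simplified record: complete varfields from the excerpt above,
# wrapped in a bare root element (an assumption made for this sketch).
SAMPLE = """
<oai_marc>
  <varfield id="100" i1=" " i2=" "><subfield label="a">Leroy, Francis</subfield></varfield>
  <varfield id="245" i1="1" i2=" "><subfield label="a">Dictionnaire encyclopédique des prix Nobel de médecine</subfield></varfield>
  <varfield id="650" i1=" " i2=" "><subfield label="a">Nobel prize winners</subfield><subfield label="a">chemistry</subfield></varfield>
  <varfield id="909" i1="c" i2="a"><subfield label="a">BOO</subfield><subfield label="b">21</subfield></varfield>
</oai_marc>
"""


def marc_fields(xml_text):
    """Map each MARC tag to a list of varfields.

    Each varfield is a list of (label, value) pairs, because a tag may
    repeat and a subfield label may repeat within one varfield.
    """
    fields = {}
    for varfield in ET.fromstring(xml_text).iter("varfield"):
        pairs = [(sf.get("label"), sf.text) for sf in varfield.findall("subfield")]
        fields.setdefault(varfield.get("id"), []).append(pairs)
    return fields


if __name__ == "__main__":
    record = marc_fields(SAMPLE)
    print(record["245"][0][0][1])  # title from field 245, subfield a
```
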
Implementation Issues
How to limit the OAI collection?
• Subset of the whole collection
• Depending on an existing or extra field
  e.g.: “CERN” in the report number
  e.g.: OAI tag inside all records
• Test collection fully separated
Which identifier to use?
• Document number (meaningful but may not exist)
• Internal system number (always exists but meaningless)
How to define sets?
• Within the HEP data providers, subjects (sets) are different
• No limitation on the length of a set? (GET/POST)
  E.g.: Library_Catalogue:Articles_and_Preprints:Theses:Detectors_and_Experimental_Techniques

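The set-length question matters because a hierarchical set name travels in the request URL when GET is used. The following is a minimal sketch in Python of how a set specification is joined from its levels and how long the resulting request becomes. It assumes the colon-separated form shown in the example above and the `set` and `metadataPrefix` arguments of ListRecords; the base URL is the test server's, and the helper names are hypothetical.

```python
from urllib.parse import urlencode

# Base URL of the test server used in the examples; illustrative only.
BASE_URL = "http://cdsdev.cern.ch/casalini/search.php"


def set_spec(levels):
    """Join the levels of a set hierarchy with ':' as in the example above."""
    return ":".join(levels)


def list_records_url(levels, metadata_prefix="oai_dc"):
    """Build a ListRecords GET URL restricted to one set (hypothetical helper)."""
    query = urlencode({
        "verb": "ListRecords",
        "set": set_spec(levels),
        "metadataPrefix": metadata_prefix,
    })
    return f"{BASE_URL}?{query}"


if __name__ == "__main__":
    levels = [
        "Library_Catalogue",
        "Articles_and_Preprints",
        "Theses",
        "Detectors_and_Experimental_Techniques",
    ]
    url = list_records_url(levels)
    # Each ':' becomes '%3A' once encoded, so the URL grows faster than the set name.
    print(len(set_spec(levels)), "characters in the set name")
    print(len(url), "characters in the full GET URL")
```
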
General Issues
Harvester distinction?
• A kind of “OAI Intranet” would be useful
• Different sets for different partners?
OpenURL in OAI?
• OAI format already available as a Web output format in our test collection (e.g.: search by author and get OAI output)
• An agreed protocol is necessary for searching many OAI compliant sites in parallel
Full text data provider within OAI?
• Full text exchange with an agreed protocol
Increase metadata quality?
• Too few mandatory tags in DC
• Specific tags agreed for specific communities

Future
Short term
• CERN as data provider … for CERN specific collections
• CERN as data harvester (and service provider) … for High Energy Physicists
Long term hopes
• All HEP institutes OAI compliant … for metadata AND data
• Parallel searching possible (with OpenURL protocol)
• OAI also used inside CERN between various applications (Engineering Database, Administrative Documents…) to build the CERN electronic archive

Questions?
http://cds.cern.ch
Note: Workshop on the Open Archives Initiative and Peer Review journals in Europe, CERN, Geneva, March 22-24 2001. http://doc.cern.ch/OAI/