Processing electronic literature: CERN case study

C. Pettenati (ETT-SI), M. Draper (ETT-DH). CERN.

  1. Processing electronic literature: CERN case study C. Pettenati (ETT-SI) M. Draper (ETT-DH) CERN

  2. Presentation plan (1) • The CERN Library • Definitions • Grey literature management • Current services • CERN grey literature collection • Submission & Acquisition services • Consultation & Dissemination services

  3. Presentation plan (2) • Tools available to the readers • Future perspectives for grey literature at CERN • Architecture • Hardware configuration • Software architecture • Re-usability

  4. CERN - European Organization for Nuclear Research • European Laboratory for Particle Physics • Fundamental research • Founded in 1954 in Geneva, Switzerland • 20 member states • 540 universities and laboratories, 7000 researchers, 90 nationalities • 5 accelerators, more than 1000 experiments and collaborations • Current year budget: 939 MCHF

  5. The CERN Library • A central unit and four satellites • Few monographs, fewer than 40,000 • 500 open subscriptions to scientific journals • 400 titles available electronically in full text • A very important collection of grey literature, more than 350,000 documents (with full text electronically available from February 1994 onwards)

  6. Definitions (1) • The CERN grey literature collection is composed of: • Documents prepared to be submitted to scientific journals • Documents submitted to conferences • Theses • CERN internal notes (Committee papers, Proposals) • External reports • Pictures (photos & diagrams) • Videotapes on academic training (partly “webcasted”) • Administrative Documents (separate protected access) • CERN internal publication (weekly bulletin)

  7. Definitions (2) • Open Archive • A submission mechanism • A long-term storage system • A management policy for submission and preservation • An open interface to let third parties collect data from the archive • The CERN Preprint Server was an Open Archive long before this definition was set up last year in Santa Fe (see http://www.openarchives.org)

  8. Number of accesses to the CERN Library catalogue

  9. CERN Library collections

  10. Grey literature acquisition procedures • Direct electronic submissions • Official series • Open series • Theses • Downloading from other grey literature servers • Los Alamos, DESY, SLAC, Fermilab, etc. • Email based application: the Uploader • Digitization of paper documents • Exchange with other labs (Annual reports) Harmonization of the record description

  11. E-Submission Web Submission options: • Bibliographic Notice Input/Update • Fulltext document Transfer or Link (TeX, Word, PDF, HTML) • Revised version Transfer • Alert an e-mail distribution list • Forward to Printshop and Mail Office • Ask for approval (internal & scientific notes)

  12. Provenance • More than 40,000 documents processed per year • Internal to CERN 10% • External 90%

  13. Document prepared for publication: Preprints • They are sent to the CERN Library and at the same time submitted to the publisher of a scientific journal • They are distributed via the Library Web server the day after submission • In general they will be published much later, after 8-24 months

  14. Preprints processing procedure • Submission to the Library (record, text, figures) • Submission to a journal • Visibility on Internet the day after • Weekly list preparation • Article publication • Input of the publication note • Record updating (INSPEC, conference proceedings, SLAC db, authors, ...) • Timeline: 1 day, 1 week, 8 - 18 months, ????

  15. Access to the preprint full text • Access to the published text

  16. CERN Preprint full-text server • Record # 123: Author: ....... Title .... • EXT: URL ... • Pub. note: Tit. AA, vol. pp ... • CERN algorithm • URL: ..... • Electronic journal (publisher server)

  17. Document formats • Accepted: TeX/LaTeX, Word, TIFF, HTML, ... • Distributed: PDF, PS, HTML, TIFF, GIF, ...

  18. Formats produced by the electronic submission • Conversion from TeX/LaTeX to PS • Conversion from Word to PS • Conversion from PS to PDF
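
The three conversions on slide 18 chain together, so a TeX or Word submission reaches PDF in two steps. The following is a minimal Python sketch of that chaining only. The `CONVERSIONS` table and the `conversion_path` function are illustrative names, not part of the CERN software, and no actual converter is invoked.

```python
# Conversion steps named on the slide: source format -> target format.
CONVERSIONS = {
    "tex": "ps",
    "latex": "ps",
    "word": "ps",
    "ps": "pdf",
}


def conversion_path(source_format):
    """Return the chain of formats from source_format to its final form."""
    current = source_format.lower()
    path = [current]
    # Follow the table until no further conversion is listed.
    while current in CONVERSIONS:
        current = CONVERSIONS[current]
        path.append(current)
    return path


print(conversion_path("LaTeX"))
print(conversion_path("html"))
```

A format with no listed conversion, such as HTML, comes back as a one-element path, which matches the slide: only TeX/LaTeX, Word and PS are converted.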

  19. Text transmission • FTP by the author him/herself • FTP requested by the CERN Document Server • Automatic transfer from a Web server

  20. Citations management • The document PS format is analysed and citations are automatically extracted • If the cited document is also in the CERN database, a link is inserted next to the citation • The citations cannot always be safely processed automatically
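
The linking step on slide 20 can be illustrated with a small Python sketch. It assumes the citation text has already been extracted from the PS file, that cited documents are recognised by a report number of a hypothetical shape such as "CERN-TH-2000-123", and that the database is a plain dictionary. None of these details come from the presentation.

```python
import re

# Hypothetical report-number shape such as "CERN-TH-2000-123"; real
# citation styles vary far more than this pattern covers.
REPORT_NUMBER = re.compile(r"\b[A-Z]+(?:-[A-Z]+)*-\d{4}-\d{3}\b")


def link_citations(citations, known_records):
    """Pair each citation with a link if its report number is in the database."""
    linked = []
    for citation in citations:
        match = REPORT_NUMBER.search(citation)
        url = known_records.get(match.group(0)) if match else None
        linked.append((citation, url))
    return linked


# Stand-in for the document database: report number -> record URL (made up).
known_records = {"CERN-TH-2000-123": "https://library.example/record/123"}

citations = [
    "[1] A. Author, CERN-TH-2000-123.",
    "[2] B. Author, private communication.",
]
for citation, url in link_citations(citations, known_records):
    print(citation, "->", url or "no link (needs manual check)")
```

Citations without a recognisable identifier fall through with no link, which mirrors the slide's caveat that automatic processing is not always safe.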

  21. Documents submitted to conferences • In general they are prepared at the last minute … • Often the submission to the Library is forgotten • These documents are published later • On the conference server or • As printed conference proceedings • As independent monograph or • Included in a specialized journal • Hard and intensive work to discover them

  22. Annual reports • In general received as exchange • More and more often available electronically • Now processed as periodicals • One record, several issues • Automatic claiming • Link to a new title if required

  23. Theses • Degree and post-graduate • Prepared • On CERN equipment and/or • Under CERN staff supervision • In general defended 12-18 months later • Difficult to retrace

  24. Preprints electronic submission • Clients: PC, MAC, X • CERN LAN • FTP • Formats: TeX, LaTeX, Word, HTML, ..., TIFF • Aleph ILAS • Full Text server

  25. CERN grey literature: Preprints distribution • Aleph ILAS • Full Text server • HTTP • Formats: GIF, TIFF, PS, PDF, HTML • CERN LAN • Internet • Clients: PC, MAC, X

  26. Architecture: software/hardware • EDS (Electronic document submission): Submission + Services, MySQL database, PHP/Perl scripting, Configuration DB (EDS) • DOCUMENT: SUN SPARC 450, 4 CPUs 250 MHz, 80 GigaBytes • CDS (CERN Document Server): SUN SPARC 450, 3 CPUs 300 MHz, Aleph (ORACLE DB), Link Manager, Metadata database, C programming • QUERY ACCESS: Aleph APIs, C programming (CGI), Java interface • WEBLIB (WWW interface): MySQL database, PHP/Perl scripting, Configuration DB (WebLib)

  27. Re-usability • Complete system • Modular: parts can be re-used • Software: • All sources are freely distributed • Databases • Aleph integrated system: commercial (Oracle based) • MySQL databases: freeware • Existing configuration tools • New functions easy to attach

  28. Tools available to the end users • Need to involve the readers directly in the search • Four groups of tools to: • Search • Access • Transfer • Manage

  29. Consultation & Dissemination (1) • Graphical User Interface: WebLib • All catalogues with “Find” and “Browse” • Available indexes on authors, titles, subjects, report numbers, etc. • Words searchable on all fields (including abstracts) • Output sort options • Record metadata available in HTML, LaTeX or PDF • Navigation & Search can be set up by institute, year, subject, etc. • Search history available • Downloading mechanism for many formats (PS, PDF, GIF, etc.) • Linking capabilities for book records to booksellers' records

  30. Consultation & Dissemination (2) • Personal Virtual Library • Results displayed in various formats (brief, detailed or personal) • Individual Alert mechanism (SDI) to e-mail new records • Personal shelf (basket) to keep searches, items, formats & profiles • E-prints • Record description is updated with the publication notes (Journal title + vol/year + starting page number) • Dynamic linking from the notice to the published article • Dynamic linking from the citations of the document to the article • Availability of the link to users with a subscription to the e-journal
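
The E-prints bullets describe a record that gains a publication note (journal title, volume/year, starting page) from which a link to the published article is derived. The presentation does not describe the algorithm itself, so the Python sketch below is only one plausible shape. The `PublicationNote` class, the `URL_TEMPLATES` table, the journal name and the publisher URL are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PublicationNote:
    """Fields the slide lists for a publication note."""
    journal: str
    volume: str
    year: int
    start_page: int


# Hypothetical per-journal URL patterns; real publisher schemes differ.
URL_TEMPLATES = {
    "Example J. Phys.": "https://publisher.example/{volume}/{start_page}",
}


def published_link(note: Optional[PublicationNote]) -> Optional[str]:
    """Build a link to the published article, or None if it cannot be built."""
    if note is None:
        return None  # preprint not yet published
    template = URL_TEMPLATES.get(note.journal)
    if template is None:
        return None  # no known URL pattern for this journal
    return template.format(volume=note.volume, start_page=note.start_page)


note = PublicationNote("Example J. Phys.", "12", 2000, 345)
print(published_link(note))
print(published_link(None))
```

Building the link on request from the note, rather than storing a fixed URL, is what makes the linking dynamic in this sketch: a change in a publisher's URL scheme needs only a template update.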

  31. Document access tools • Web • Z39.50 client/server • Different formats (PDF, PS, TIFF, GIF, HTML, …) • Document size continuously increasing • Strong need for increased bandwidth

  32. Usage measurement • Statistics collection • By country • By IP domain • By IP number • By type of format • By slice of time
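
The breakdowns on slide 32 amount to counting access-log entries along one dimension at a time. A minimal Python sketch follows, assuming each access is available as a dictionary. The field names and the sample entries are made up for illustration and do not reflect the actual CERN log format.

```python
from collections import Counter


def collect_statistics(accesses, dimension):
    """Count accesses grouped by one dimension, e.g. "country" or "format"."""
    return Counter(entry[dimension] for entry in accesses)


# Made-up access log entries; "hour" stands in for the slice of time.
accesses = [
    {"country": "CH", "domain": "cern.ch", "ip": "192.0.2.1", "format": "PDF", "hour": 9},
    {"country": "CH", "domain": "cern.ch", "ip": "192.0.2.2", "format": "PS", "hour": 9},
    {"country": "FR", "domain": "example.fr", "ip": "192.0.2.3", "format": "PDF", "hour": 14},
]

# One tally per breakdown named on the slide.
for dimension in ("country", "domain", "ip", "format", "hour"):
    print(dimension, dict(collect_statistics(accesses, dimension)))
```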

  33. How to prepare a virtual library • The final goal is to provide the end reader with a complete toolbox to search, find, reach, download, use and manage the documents • There are no universal recipes • The CERN Library tries to find its own balance between traditional and electronic literature

  34. Basic components of the CERN virtual (digital) library • An integrated library automation system • A graphical user interface • A network with enough bandwidth • A CD-ROM LAN • An electronic document delivery tool • A collection of external electronic resources • Electronic journals • Grey literature servers • Use of the protocols HTTP, Z39.50 (SR-U)

  35. Future of grey literature at CERN • Usage of the XML format • More intensive distribution before publication • Preparation of metadata done directly by the author • Use of a specialized network search engine

  36. Network • Author: article and DC (Dublin Core) metadata • Webmaster: website • Convert DC metadata • Search service
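
Slide 36 survives only as diagram labels, so the exact workflow is not recoverable. One common way to expose author-supplied Dublin Core metadata on a website is as HTML `<meta>` tags named `DC.<element>`. The Python sketch below assumes that convention; the `dublin_core_meta` function and the sample record layout are illustrative, not taken from the presentation.

```python
from html import escape


def dublin_core_meta(record):
    """Render a flat record as HTML <meta> tags using DC element names."""
    tags = []
    for element, values in record.items():
        for value in values:
            # Escape the value so quotes and ampersands stay valid in HTML.
            tags.append(
                '<meta name="DC.{}" content="{}">'.format(element, escape(value, quote=True))
            )
    return "\n".join(tags)


# Sample record; keys are Dublin Core element names, each with a list of values.
record = {
    "title": ["Processing electronic literature: CERN case study"],
    "creator": ["Pettenati, C.", "Draper, M."],
    "type": ["Text"],
}
print(dublin_core_meta(record))
```

Repeatable elements such as `creator` produce one tag per value, which is why each key maps to a list.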

  37. Conclusion • More and more important role for the grey literature • Contraction of the number of traditional scientific publications • Exponential growth of spontaneous electronic journals

  38. QUESTIONS?
