e-Prints

Metadata Issues for e-Prints: experiences from setting up an Institutional Repository. Jessie Hey, Research Fellow, TARDis Project, University of Southampton. ePrints UK Workshop, Ashmolean Museum, Oxford, 22 Mar 2004.


Presentation Transcript


  1. Metadata Issues for e-Prints: experiences from setting up an Institutional Repository • Jessie Hey, Research Fellow, TARDis Project, University of Southampton • ePrints UK Workshop, Ashmolean Museum, Oxford, 22 Mar 2004

  2. e-Prints A simple illustration of diversity in metadata! • EPrints (software) • e-Prints (Soton) • ePrints (UK project) • eprints (in URLs, emails) • E-print (Network – US gateway)

  3. Searching for e-Prints in Google • e-Prints 1,200,000; eprints 225,000

  4. Plam pilot? • Looking for a PDA? • Just try searching for plam pilot on eBay • Even a sale is not incentive enough

  5. Metadata • The modern word for ‘Data about data’ • Generally structured data describing an e-Print in this context • Describing an object such as a journal article or book chapter or thesis

  6. Metadata issues for today • Who needs the quality? • What kind of quality? • How we approached it in TARDis • the depositor • the process • classification • mediation • Balancing demands the pragmatic way

  7. Who needs the quality? Service providers (i.e. search services) • Analysis in both the e-learning and e-prints communities showed concern about the quality of metadata in individual databases, which has to be good enough to give good search results when the databases are combined in cross-domain search services • Barton, Jane, Currier, Sarah and Hey, Jessie M.N. (2003) Building quality assurance into metadata creation: an analysis based on the learning objects and e-Prints communities of practice. In: 2003 Dublin Core Conference: Supporting Communities of Discourse and Practice - Metadata Research and Applications, DCMI, 39-48. http://eprints.soton.ac.uk/archive/00000020/

  8. As I am in Oxford… • a tribute in Elvish to JRR Tolkien from the Lord of the Rings

  9. Gandalf on Dublin Core metadata • ‘I cannot read the fiery letters,’ said Frodo in a quavering voice. • ‘No’ said Gandalf ‘but I can. ……this in the Common Tongue is what is said, close enough: • One Ring to rule them all, One Ring to find them, • One Ring to bring them all and in the darkness bind them.’

  10. Standards for e-Prints: Dublin Core Metadata Sets • Define minimal metadata elements for simple resource discovery e.g. title, creator, subject and keywords, publisher, date, rights management • Fundamental building blocks for Open Archive Initiative compliant repositories • Software such as GNU EPrints is OAI compliant (in DSpace may need ‘switching on’) • Full text searching (in latest version) will give additional help to compensate for weaknesses
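
As a rough illustration of the minimal element set listed on this slide, the sketch below builds a simple Dublin Core record in the oai_dc XML form that OAI-compliant repositories expose. It is not taken from the talk or from the EPrints software: the helper name `build_dc_record` is hypothetical, and the field values are copied from the Barton, Currier and Hey paper cited on slide 7.

```python
import xml.etree.ElementTree as ET

# Namespaces used by the oai_dc metadata format and the Dublin Core element set.
OAI_DC = "http://www.openarchives.org/OAI/2.0/oai_dc/"
DC = "http://purl.org/dc/elements/1.1/"


def build_dc_record(fields):
    """Build an oai_dc element from (element_name, value) pairs.

    Hypothetical helper for illustration; repeated elements such as
    'creator' are simply listed more than once.
    """
    ET.register_namespace("oai_dc", OAI_DC)
    ET.register_namespace("dc", DC)
    root = ET.Element(f"{{{OAI_DC}}}dc")
    for name, value in fields:
        child = ET.SubElement(root, f"{{{DC}}}{name}")
        child.text = value
    return root


record = build_dc_record([
    ("title", "Building quality assurance into metadata creation: an analysis "
              "based on the learning objects and e-Prints communities of practice"),
    ("creator", "Barton, Jane"),
    ("creator", "Currier, Sarah"),
    ("creator", "Hey, Jessie M.N."),
    ("date", "2003"),
    ("identifier", "http://eprints.soton.ac.uk/archive/00000020/"),
])
xml_text = ET.tostring(record, encoding="unicode")
print(xml_text)
```

Even a record this small is enough for simple resource discovery by a harvester; richer citation detail (pages, conference name) has no dedicated element in simple Dublin Core, which is one reason quality at the source matters.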

  11. Who needs the quality? • Academics (the depositors) need reasonable quality for their publication record whether full text is available or not • Tendency to think a good citation matters less if access leads straight to the full text • An institutional repository needs: • To represent their own work well • To represent their faculty and university well • For publicity and communication • For research assessment and proposals • For promotion

  12. What kind of quality? • Fit for purpose – visibility and citability • Rolls-Royce or Volkswagen Golf or a Skoda? • The Rolls-Royce may not produce a sustainable repository • Library of Congress had to think again with a backlog of millions • A departmental archive had to scrap its editors (too slow) • Need a model with a light touch

  13. Examples to correct From an academic’s current departmental publication record: • Co-author given as Fadden on older references • Given as McFadden on newer ones • McFadden would not find all his papers!

  14. Examples to correct • Authors are not perfect but neither are information specialists or other sources • Recent examples: • Author’s assistant put a conference in year 2400 • ‘Web of Knowledge’ put a conference in 2010 • NB: Amazon proved useful for checking book information from the title page (new Amazon ‘search inside’ service) but main entries may be less accurate

  15. Quality Assurance Procedures • Would like to pick up these and obvious examples of metadata in the wrong field, e.g. book title used for title of chapter • Options include regular checking (e.g. at or close to time of deposit or for annual reporting) or random checking • Visualisation techniques promising but still expensive
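
A light check at or close to the time of deposit could catch the kinds of error mentioned on slides 14 and 15. The sketch below is only an assumption about how such a check might look, not a description of the TARDis procedures: the function `check_record`, the record keys (`year`, `title`, `book_title`) and the `earliest_year` threshold are all hypothetical.

```python
from datetime import date


def check_record(record, today=None, earliest_year=1400):
    """Return a list of human-readable problems found in one metadata record.

    Hypothetical sketch: 'record' is a plain dict with optional keys
    'year' (int), 'title' and 'book_title' (str).
    """
    today = today or date.today()
    problems = []

    year = record.get("year")
    if year is None:
        problems.append("year missing")
    elif not (earliest_year <= year <= today.year + 1):
        # Catches a conference dated 2400, or 2010 when checked in 2004.
        problems.append(f"implausible year: {year}")

    # Metadata in the wrong field: book title entered as the chapter title.
    title = (record.get("title") or "").strip().lower()
    book_title = (record.get("book_title") or "").strip().lower()
    if title and title == book_title:
        problems.append("chapter title is identical to book title")

    return problems


deposit_day = date(2004, 3, 22)
print(check_record({"year": 2400, "title": "A conference paper"}, today=deposit_day))
print(check_record({"year": 2003, "title": "Handbook", "book_title": "Handbook"},
                   today=deposit_day))
```

Checks like these only flag records for a person to look at; they do not decide what the correct value is.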

  16. How we approached it in TARDis • Looked at process from point of view of depositor • to decrease the barriers to deposit • to improve quality by design or example • Looked at metadata required for a good citation • academics using e-print records for many purposes not just visibility • Some information may be easier to strip out if required but harder to add later e.g. • first name or initials – although cultural variations too • journal title or abbreviation

  17. Simple things deter • Questions you can’t answer • No place to put it • Errors which force you to enter it again • On a credit card payment • Date on the card: 06/05 • Date to enter: 06/2005 • How many times do I do this incorrectly!
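
One way to remove the barrier in the credit card example is for the form to accept the date as the user sees it, in either format. The following is a minimal sketch of that idea; the function name `parse_card_expiry` and the assumption that two-digit years belong to the 2000s are illustrative choices, not part of the talk.

```python
import re


def parse_card_expiry(text, century=2000):
    """Parse 'MM/YY' or 'MM/YYYY' into a (month, year) tuple.

    Hypothetical helper: two-digit years are assumed to fall in the
    given century (2000 by default).
    """
    match = re.fullmatch(r"\s*(\d{1,2})\s*/\s*(\d{2}|\d{4})\s*", text)
    if not match:
        raise ValueError(f"unrecognised date: {text!r}")
    month = int(match.group(1))
    year_text = match.group(2)
    if not 1 <= month <= 12:
        raise ValueError(f"month out of range: {month}")
    year = int(year_text)
    if len(year_text) == 2:
        year += century
    return month, year


# Both the form printed on the card and the form the site asks for are accepted.
print(parse_card_expiry("06/05"))
print(parse_card_expiry("06/2005"))
```

The same principle applies to a deposit form: accept what the depositor can copy from the full text and normalise it afterwards, rather than rejecting the entry.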

  18. To help the depositor • Aimed to enter information as the depositor sees it on the full text • Arranged input in the order the information is seen • With relevant information grouped together • With ‘pages’ that are not of daunting size • Fields of a size to view as much of the text as possible

  19. TARDis - Aiding deposit – relevant fields – relevant help

  20. The Process • Added help where examples are useful • Added extra buttons at top to ease navigation • Made mandatory fields where essential • Tension between full details and deterrent • commentary field currently not included although some might find useful

  21. Some ‘quality’ traditions may be less practical • Search service recommendations: capitals only for first word of title except proper nouns • Process is generally ‘cut and paste’ so result is variable and advice ignored • Get Caps, non-caps, rarely ALL CAPS • Found in practice likely to be too time consuming to insist • Think retrieval first rather than consistency
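
"Think retrieval first rather than consistency" can be read as: leave the title as the depositor pasted it, and make the search side tolerant of capitalisation instead. A minimal sketch of that reading follows; the function `title_key` is hypothetical and only handles case and whitespace.

```python
def title_key(title):
    """Return a case- and whitespace-insensitive key for matching titles.

    Hypothetical helper: the stored title is left untouched; only the
    key used for comparison or indexing is normalised.
    """
    return " ".join(title.casefold().split())


variants = [
    "Metadata Issues for e-Prints",
    "Metadata issues for e-prints",
    "METADATA   ISSUES FOR E-PRINTS",
]
# All three capitalisation variants collapse to the same key.
print({title_key(v) for v in variants})
```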

  22. Classification – a specific area of debate • ePrints UK exploring automatic classification with Dewey • TARDis looked at current practice: • Reviewed subject classification in discipline-based and early institutional archives • Found whole variety of choices and levels of complexity

  23. TARDis on subject classification • Discussion of issues and snapshot chart http://tardis.eprints.org • Using basic Library of Congress with view to harvesting, e.g. papers in Oceanography • Added search box to find subject • Departments could use an additional scheme if they wish (software option) • Keywords can be added (cut and paste) if available (sometimes papers also have classification categories added for a journal) • Computer classification generally expensive and requires learning examples but accuracy is improving

  24. Towards the future – subject classification – on the fly

  25. Mediation • TARDis is experimenting with deposit choices • Branch to: • Self archiving (author or local assistant) with light review as pass through submission buffer • Assisted archiving – give us the file with essential details not evident from the full text

  26. Mediation in practice • Current experience: • Assisted archiving often time consuming – meeting the difficult ones – but can add value (e.g. fuller publisher location details such as DOI) • Self archiving less accurate but author may know details which may be missing from full text • Balance likely to change as authors become either more familiar with early deposit or perhaps happy to delegate to save time • Learning curve for us – later may devolve some quality responsibility (use editorial options) • Give additional feedback into software

  27. The challenge of cutting and pasting from PDFs • Sometimes rather like the Hyperbookworms (Jasper Fforde, The Eyre Affair) • Who produce spurious capitals, apostrophes, hyphens • Problems with hyphens, accents and words starting with f! • LaTeX usually the culprit so Humanities have an advantage here
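
Some of the damage described on this slide can be repaired mechanically. The sketch below assumes that the trouble with words starting with f comes from typographic ligatures (such as the single character "ﬁ") and that stray hyphens come from line-end hyphenation; both are assumptions about the cause, and the function `clean_pasted_text` is hypothetical. It uses Unicode NFKC normalisation, which expands ligature characters into ordinary letters.

```python
import re
import unicodedata


def clean_pasted_text(text):
    """Tidy text pasted from a PDF (hypothetical helper, light touch only)."""
    # NFKC expands compatibility characters, e.g. the 'fi' ligature U+FB01.
    text = unicodedata.normalize("NFKC", text)
    # Remove soft hyphens, which are invisible discretionary break points.
    text = text.replace("\u00ad", "")
    # Rejoin words hyphenated across a line break ("meta-\ndata").
    # Caveat: this also joins genuine compounds that happen to break at a
    # line end, so the result still needs a human glance.
    text = re.sub(r"(\w)-[ \t]*\n[ \t]*(\w)", r"\1\2", text)
    # Collapse remaining line breaks and runs of spaces.
    return re.sub(r"\s+", " ", text).strip()


pasted = "classi\ufb01cation of meta-\ndata"
print(clean_pasted_text(pasted))  # prints "classification of metadata"
```

Spurious capitals and misplaced accents are harder to fix automatically and are probably better left to the light review step.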

  28. Balancing demands the pragmatic way • Author deposit changes the equation • Incentives can increase accuracy • Deposit support • Requests by department or university or funding council for up to date records • Collaboration between author, department and information specialist may be best way forward • Aim: light quality control to achieve visibility and citability

  29. The New World of e-Prints • Not so elegant to work in as an Oxford College Library such as Brasenose • But should be just as satisfying to use as it meets new needs

  30. Thank you • For further information: • TARDis http://tardis.eprints.org/ • e-Prints Soton (Research Soton) http://eprints.soton.ac.uk/ • FAIR: Focus on Access to Institutional Resources Programme • Guy, Marieke and Powell, Andy (2004) Improving the Quality of Metadata in Eprint Archives. Ariadne, Issue 38, 30 January 2004. • Barton, Jane, Currier, Sarah and Hey, Jessie M.N. (2003) Building quality assurance into metadata creation: an analysis based on the learning objects and e-Prints communities of practice. In: 2003 Dublin Core Conference: Supporting Communities of Discourse and Practice - Metadata Research and Applications, DCMI, 39-48. http://eprints.soton.ac.uk/archive/00000020/
