Data Mining Library Collection Silos: Print Books and E-books in Library Collections

Lynn Silipigni Connaway, Ed O’Neill, Chandra Prabha, Brian Lavoie

Presentation Transcript


  1. Data Mining Library Collection Silos: Print Books and E-books in Library Collections • Lynn Silipigni Connaway • Ed O’Neill • Chandra Prabha • Brian Lavoie

  2. Collection Assessment • Why assess collections? • Provide data for member libraries for decision-making • Description of the collection • Identify specific subject areas • Determine collection age • Rate of growth • Strengths and weaknesses • Overlap/gap analysis • Identify last copy • Useful information • Outside funding • Library collection comparisons • Remote storage decisions • Collection development and management • Identify role of non-ARL libraries

  3. WorldCat as a Collection • World’s largest bibliographic database • July 1, 2003 = 50 million+ records • 1 billion holdings • Ideal source for data-mining • Characteristics of WorldCat • Age • Subject, using NATC • Holdings by type of library • ARL • Academic, non-ARL • Public • School • Special

  4. WorldCat as a Collection • Use of MARC data elements in WorldCat • Types of materials • Library holdings to determine audience levels • Collection assessment and collection use • Unique titles • Analyze and compare aggregate holdings for libraries • Identify print books (p-books) and electronic books (e-books)

  5. WorldCat Holdings by Library Types

  6. WorldCat Number of Holdings

  7. WorldCat Number of Records

  8. WorldCat Holdings

  9. WorldCat Holdings

  10. Study Objective • Digital materials constitute increasing proportion of library collections • Effective strategies for integrating print and digital materials within a library collection • Eliminate redundancies • Meet user expectations • Data-mining increasingly important to support collection management decisions • WorldCat • World’s largest bibliographic database • Ideal as source for data-mining • Data-mine WorldCat in order to examine characteristics of p-books and e-books

  11. Rationale • Collection management • Development • Cooperation • Deselection • Preservation • Space allocation and management • Meet user expectations • Services for off-site users • Migration from print to digital • Convenient access • 24/7 access • Desk-top delivery

  12. Scope • WorldCat • July 1, 2003 = 50 million+ records • 1 billion holdings • Digital Items • Books • Print (p-book) • Digital (e-book)

  13. Strategy • Identify digital items • Identify digital items with at least one other manifestation in WorldCat • FRBRize database • Work • Distinct intellectual or artistic expression • Cluster works in WorldCat • Manifestation • Physical embodiment of a work • Identify digital items with p-book equivalents • Assumption • If digital items have p-book equivalents, then digital items are e-books • Identify publishers and publication dates
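The matching step in this strategy can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the record keys `work_id`, `is_digital`, and `is_pbook` are hypothetical, and the sketch assumes the FRBR work clustering has already assigned a `work_id` to every record.

```python
from collections import defaultdict


def find_ebooks(records):
    """Return digital records whose work also has a p-book manifestation.

    Each record is a dict with hypothetical keys: 'id', 'work_id'
    (assumed to come from a prior FRBR clustering step), 'is_digital',
    and 'is_pbook'.
    """
    # Group manifestations by the work they embody.
    works = defaultdict(list)
    for rec in records:
        works[rec["work_id"]].append(rec)

    ebooks = []
    for manifestations in works.values():
        # Slide assumption: a digital item counts as an e-book only if
        # the same work has at least one p-book manifestation.
        if any(m["is_pbook"] for m in manifestations):
            ebooks.extend(m for m in manifestations if m["is_digital"])
    return ebooks


sample = [
    {"id": 1, "work_id": "w1", "is_digital": False, "is_pbook": True},
    {"id": 2, "work_id": "w1", "is_digital": True, "is_pbook": False},
    {"id": 3, "work_id": "w2", "is_digital": True, "is_pbook": False},
]
ebook_ids = [r["id"] for r in find_ebooks(sample)]
print(ebook_ids)
```

In the sample, record 3 is digital but its work has no print manifestation, so it is left out, which mirrors the decision to ignore digital items without p-book equivalents.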

  14. Need to Determine • Comparison of p-books and e-books • What is a book? • What is a p-book? • What is an e-book? • What is a digital item? • How do we extend p-book criteria to digital world?

  15. What is a Digital Item? • Working definition of digital item • Computer file • OR Electronic resource • OR Appropriate 856 field • Indicates electronic location or access
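The three-way OR in this working definition can be written as a small predicate. The sketch below is illustrative only: the dictionary keys are hypothetical, the MARC codes in the comments are an assumed mapping that the slides do not spell out, and the judgment of whether an 856 field is "appropriate" is assumed to have been made upstream.

```python
def is_digital_item(record):
    """Return True if a record meets any of the three criteria.

    `record` is a dict with hypothetical keys 'record_type',
    'form_of_item', and 'has_856'.
    """
    # Assumed mapping: type of record 'm' marks a computer file.
    is_computer_file = record.get("record_type") == "m"
    # Assumed mapping: form of item 's' marks an electronic resource.
    is_electronic = record.get("form_of_item") == "s"
    # 856 = electronic location and access; appropriateness is decided elsewhere.
    has_access_field = bool(record.get("has_856"))
    return is_computer_file or is_electronic or has_access_field


print(is_digital_item({"record_type": "a", "form_of_item": " ", "has_856": True}))
```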

  16. What is a P-book? • No consensus for definition of a book • Text (type = a) and monograph (bib level = m) • Broadsides? • Pamphlets? • Government documents? • Children’s books? • Microforms? • Authoritative Definitions • UNESCO • Nonperiodical literary publication consisting of > 49 pages, covers excluded • ANSI • Publications consisting of > 49 pages • Hard covers • US Postal Service (publication) • Publications > 24 pages

  17. A P-book IS: • Based on UNESCO definition • Working definition of a p-book • Printed on paper (excludes microform) • Language material • Monograph • Physical description • Form of item = regular or large print • Title does not include a GMD • Substantial length (> 49 pages; > 25 to include juvenile titles) • Excludes manuscripts (dissertations and theses)
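The working definition reads naturally as a conjunction of tests on a bibliographic record. The following Python sketch shows one way to express it; the `BibRecord` class and its field names are hypothetical, the form-of-item codes are an assumed encoding, and the page threshold is left as a parameter because the slide gives two values (49, or 25 when juvenile titles are to be included).

```python
from dataclasses import dataclass


@dataclass
class BibRecord:
    record_type: str     # 'a' = language material (text)
    bib_level: str       # 'm' = monograph
    form_of_item: str    # assumed coding: ' ' = regular print, 'd' = large print
    has_gmd: bool        # True if the title carries a GMD
    pages: int
    is_manuscript: bool  # dissertations and theses


def is_pbook(rec, min_pages=49):
    """Apply the working p-book definition; every test must pass."""
    return (
        rec.record_type == "a"
        and rec.bib_level == "m"
        and rec.form_of_item in (" ", "d")  # excludes microform and electronic
        and not rec.has_gmd
        and rec.pages > min_pages
        and not rec.is_manuscript
    )


novel = BibRecord("a", "m", " ", False, 320, False)
picture_book = BibRecord("a", "m", " ", False, 32, False)
print(is_pbook(novel), is_pbook(picture_book), is_pbook(picture_book, min_pages=25))
```

Passing `min_pages=25` admits shorter juvenile titles such as the 32-page example, while the default keeps the stricter UNESCO-based threshold.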

  18. What is an E-book? • Difficult to define e-book • Digital version of p-book (straightforward) • New conceptual views of a book in digital environment • Assumption • P-book is well-defined • If digital item has manifestation as a p-book, then digital item must also be a book • If p-book has digital equivalent or vice versa, ignore e-books that have no print equivalent

  19. An E-book IS: • E-Book = Electronic (Digital) + Book • Definition of e-Book: • Digital equivalents of p-books • New conceptual definitions of books in digital environment

  20. WorldCat Record Analysis • P-book records = 24,048,235 (48% of WC) • Digital item records = 795,630 (about 1.6% of WC) • Web sites • Collections of interlinked, Web-accessible materials residing at a single location on the Internet • Documents • Various forms of electronic documents • E-books with no p-book equivalents and no minimum page requirements • Book chapters • Broadsides • Brochures • Pamphlets • Reprints • E-books with p-book equivalents = 76,375 (about 0.15% of WC)
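The shares of WorldCat can be recomputed from the record counts on this slide. The short Python check below is a sketch: the three counts come from the slide, but `WORLDCAT_TOTAL` is an assumption, since the presentation states only "50 million+" records; 50.1 million is chosen because it is consistent with p-books being 48% of the database.

```python
P_BOOKS = 24_048_235
DIGITAL_ITEMS = 795_630
EBOOKS_WITH_PBOOK = 76_375
WORLDCAT_TOTAL = 50_100_000  # assumption: the slides say only "50 million+"


def pct(part, whole):
    """Return part as a percentage of whole, rounded to two decimals."""
    return round(100 * part / whole, 2)


shares = {
    "p-books / WorldCat": pct(P_BOOKS, WORLDCAT_TOTAL),
    "digital items / WorldCat": pct(DIGITAL_ITEMS, WORLDCAT_TOTAL),
    "e-books with p-book equivalents / WorldCat": pct(EBOOKS_WITH_PBOOK, WORLDCAT_TOTAL),
    # This ratio does not depend on the assumed total.
    "e-books with p-book equivalents / digital items": pct(EBOOKS_WITH_PBOOK, DIGITAL_ITEMS),
}
for label, value in shares.items():
    print(f"{label}: {value}%")
```

Under the assumed total, digital items come to roughly 1.6% of WorldCat and e-books with p-book equivalents to roughly 0.15%; independently of that assumption, just under a tenth of the digital items have a p-book equivalent.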

  21. WorldCat Record Analysis • Digital item records (continued) • Interactive learning objects • Computer programs offering self-contained, interactive tutorial or educational experience • Software • Computer programs for creating and manipulating information • Serials • Journals • Proceedings • Images • Theses • Other (2 records) • Computer game • Raw data file

  22. Digital Items in WorldCat

  23. Publication Dates of Digital Items With P-Book Equivalents in WorldCat

  24. Publishers of Digital Items With P-Book Equivalents in WorldCat • Approximately 15,000 unique publishers • Approximately 150 publishers with > 25 records • Top 10 publishers • Institute of Electrical and Electronics Engineers (IEEE) • National Bureau of Economic Research • US Government Printing Office • Springer • Inter-University Consortium for Political and Social Research • PowerKids Press • University of Virginia Library • MIT Press • Microsoft • Broderbund Software and Books

  25. Discussion of Analysis • Small number of • E-books with p-book equivalents • Publishers with > 25 records for e-books with p-book equivalents • Recent publication dates for e-books with p-book equivalents • More Web sites than documents or reprints • Difficult to identify and categorize digital items • Inconsistent cataloging policies and practices for digital items • Inconsistent definitions for types of digital items

  26. Future Research • Establish accepted criteria for defining an e-book independent of p-books • Identify and compare type of library holdings and NATC subjects for p-books and e-books • Identify electronic collection silos • Continue to collect these data to compare for trends • Identify types of content/materials that are better suited for either print or digital environment

  27. Questions and Discussion • connawal@oclc.org • oneill@oclc.org
