
Bandits and Browsing: Data Mining and Network Analysis for Library Collections

Harriett Green, English and Digital Humanities Librarian, University Library; Kirk Hess, Digital Humanities Specialist, University Library; Richard Hislop, Ph.D. candidate, Department of Economics, UIUC. ERRT, April 25, 2012.


Presentation Transcript


  1. Bandits and Browsing: Data Mining and Network Analysis for Library Collections • Harriett Green, English and Digital Humanities Librarian, University Library • Kirk Hess, Digital Humanities Specialist, University Library • Richard Hislop, Ph.D. candidate, Department of Economics, UIUC • ERRT, April 25, 2012

  2. The Problem • How can users effectively find materials in today’s library collections and digital libraries? • Transformation in the acquisitions, access, and storage of library collections with digital materials, off-site storage, etc. • Availability of immense amounts of data • IR literature: user searching patterns

  3. Project • GOAL: To develop information retrieval and analytical tools that could be incorporated into a possible recommender system • Metadata analysis to help users navigate and retrieve items from the collection • Code libraries will allow interdisciplinary study and research about the library itself. • Network analysis can reveal essential information about the collection's structure.

  4. Project Structure • TEAM: Harriett Green (PI), Kirk Hess, Richard Hislop • SUPPORT: I-CHASS Scalable Research Challenge—Michael Simeone, co-PI • TOOLS: Awarded Start-Up Allocation of 30,000 SUs from XSEDE on the SGI Altix UV Blacklight cluster at Pittsburgh Supercomputing Center with XSEDE consultation support

  5. Questions • What other collection items are like X item? How do we show people these related items? • What is the topic area that people want? How do we show people an estimated result of what they want? • How do we create visualizations and recommendations of items in the collection?

  6. The Beginning: Sample Data Set • Initially ran analyses on a 40,000-item English collection • Quantified inefficiencies in subject headings • Developed prototypes of analyses to run on the full UIUC Library catalog data

  7. XSEDE Analysis • Run analyses on entire UIUC Library catalog data • Conduct network analyses on entire UIUC Library catalog data for subject correlations • Extend betweenness calculation to use weighting based on items checked out together • Find clusters that need to be connected via extra subject headings
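
The weighted betweenness idea on this slide can be sketched in a few lines. This is a minimal illustration, not the project's code: it assumes that each edge weight is a co-checkout count, that edge length is `1 / weight` (so frequently co-borrowed items count as "closer"), and it uses Brandes' algorithm with Dijkstra shortest paths. The function name `weighted_betweenness` and the sample headings are hypothetical.

```python
import heapq
import math
from collections import defaultdict

def weighted_betweenness(edges):
    """Unnormalized betweenness for an undirected graph.

    edges: {(u, v): weight}, where a higher weight means a stronger tie.
    Assumption: shortest paths use edge length 1 / weight.
    """
    graph = defaultdict(dict)
    for (u, v), w in edges.items():
        graph[u][v] = graph[v][u] = 1.0 / w
    score = dict.fromkeys(graph, 0.0)
    for source in graph:
        dist = {source: 0.0}
        sigma = defaultdict(float)   # number of shortest paths from source
        sigma[source] = 1.0
        preds = defaultdict(list)    # predecessors on shortest paths
        order, done = [], set()
        heap = [(0.0, source)]
        while heap:
            d, v = heapq.heappop(heap)
            if v in done:
                continue             # stale heap entry
            done.add(v)
            order.append(v)
            for w, length in graph[v].items():
                if w in done:
                    continue
                nd = d + length
                if w in dist and math.isclose(nd, dist[w]):
                    sigma[w] += sigma[v]          # another equally short path
                    preds[w].append(v)
                elif w not in dist or nd < dist[w]:
                    dist[w] = nd                  # strictly shorter path found
                    sigma[w] = sigma[v]
                    preds[w] = [v]
                    heapq.heappush(heap, (nd, w))
        # Back-propagate dependencies from the farthest nodes inward.
        delta = dict.fromkeys(order, 0.0)
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1.0 + delta[w])
            if w != source:
                score[w] += delta[w]
    # Each undirected pair was counted from both endpoints.
    return {node: value / 2.0 for node, value in score.items()}

# Hypothetical co-checkout counts between subject headings.
edges = {
    ("Outlaws", "Folklore"): 1,
    ("Outlaws", "Ballads"): 4,
    ("Ballads", "Folklore"): 4,
    ("Folklore", "England"): 1,
}
scores = weighted_betweenness(edges)
print(scores)
```

In this toy graph "Ballads" would score zero without weights, because "Outlaws" and "Folklore" are directly linked; with co-checkout weighting the heavily used route through "Ballads" becomes the shortest one, which is the kind of effect the weighting is meant to expose.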

  8. Analysis of subject headings • Simple subject analysis can uncover lesser-known correlations

  9. Metadata analysis • Help users and library staff identify and connect search terms to subject headings and metadata in the catalog • Our initial approach: use correlation of subject headings in bibliographic records • Quantifying efficiency: ECS and ACS • Intended result: a recommender system whose analysis provides lists of related topics
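
As a rough illustration of how heading co-occurrence in bibliographic records could yield a list of related topics, here is a minimal sketch. It is not the authors' method: the slide says "correlation" without naming a measure, so this example substitutes Jaccard similarity, and the sample records and the function name `related_headings` are hypothetical.

```python
from collections import Counter
from itertools import combinations

# Hypothetical sample: the set of subject headings on each bibliographic record.
records = [
    {"Outlaws", "Folklore", "England"},
    {"Outlaws", "Folklore"},
    {"Outlaws", "Crime"},
    {"Folklore", "England"},
    {"Crime", "Law"},
]

def related_headings(records, heading, top_n=3):
    """Rank other headings by Jaccard similarity of co-occurrence with `heading`."""
    counts = Counter()       # records per heading
    pair_counts = Counter()  # records per pair of headings
    for subjects in records:
        counts.update(subjects)
        for a, b in combinations(sorted(subjects), 2):
            pair_counts[(a, b)] += 1
    scores = {}
    for (a, b), both in pair_counts.items():
        if heading in (a, b):
            other = b if a == heading else a
            # Jaccard: records with both headings / records with either heading.
            scores[other] = both / (counts[a] + counts[b] - both)
    # Highest score first; ties broken alphabetically for stable output.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:top_n]

related = related_headings(records, "Outlaws")
print(related)
```

Any other association measure (for example a phi coefficient or pointwise mutual information) could be swapped in at the scoring line without changing the rest of the sketch.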

  10. Approach: Finding the right questions • Niche topics are important • Some headings are bridges between subjects • Metadata as a network analysis problem
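
One simple way to make "bridge" headings concrete is to treat headings as nodes, link two headings whenever they appear on the same record, and look for nodes whose removal splits the network. The sketch below does this by brute force. It assumes that definition of a bridge (an articulation point), which the slide does not specify, and the heading pairs are hypothetical.

```python
from collections import defaultdict

def count_components(graph, skip=None):
    """Count connected components, optionally ignoring one node."""
    seen = set() if skip is None else {skip}
    total = 0
    for start in graph:
        if start in seen:
            continue
        total += 1
        seen.add(start)
        stack = [start]
        while stack:                      # depth-first traversal
            node = stack.pop()
            for neighbor in graph[node]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    stack.append(neighbor)
    return total

def bridging_headings(pairs):
    """Headings whose removal disconnects part of the co-occurrence network."""
    graph = defaultdict(set)
    for a, b in pairs:
        graph[a].add(b)
        graph[b].add(a)
    baseline = count_components(graph)
    return sorted(h for h in graph if count_components(graph, skip=h) > baseline)

# Hypothetical co-occurring heading pairs: a literature cluster, a law cluster,
# and "Outlaws" appearing alongside headings from both.
pairs = [
    ("Poetry", "Ballads"), ("Ballads", "Folklore"), ("Poetry", "Folklore"),
    ("Crime", "Law"), ("Law", "Courts"), ("Crime", "Courts"),
    ("Outlaws", "Ballads"), ("Outlaws", "Folklore"),
    ("Outlaws", "Crime"), ("Outlaws", "Law"),
]
bridges = bridging_headings(pairs)
print(bridges)
```

The brute-force check reruns a traversal per heading, which is fine for a small sample; a full catalog would call for a linear-time articulation-point algorithm or a betweenness measure instead.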

  11. Analyzing Circulation Data • Collection use provides information about how to further improve the catalog • Can identify not only the most-important known links, but find connections that need to be added • Database is represented as a network, with traffic between items that are checked out together
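
The circulation network described here can be sketched as follows. This is an illustration under stated assumptions, not the project's implementation: "checked out together" is taken to mean "borrowed by the same patron", the transaction tuples and subject assignments are invented sample data, and the names `co_checkout_weights` and `missing_links` are hypothetical.

```python
from collections import Counter, defaultdict
from itertools import combinations

# Hypothetical (patron_id, item_id) checkout transactions.
transactions = [
    ("p1", "itemA"), ("p1", "itemB"), ("p1", "itemC"),
    ("p2", "itemA"), ("p2", "itemB"),
    ("p3", "itemB"), ("p3", "itemD"),
]

# Hypothetical subject headings per item.
item_subjects = {
    "itemA": {"Outlaws"},
    "itemB": {"Folklore"},
    "itemC": {"Outlaws", "Folklore"},
    "itemD": {"Folklore"},
}

def co_checkout_weights(transactions):
    """Edge weight = number of patrons who borrowed both items."""
    baskets = defaultdict(set)
    for patron, item in transactions:
        baskets[patron].add(item)
    weights = Counter()
    for items in baskets.values():
        for a, b in combinations(sorted(items), 2):
            weights[(a, b)] += 1
    return weights

def missing_links(weights, item_subjects, min_weight=2):
    """Item pairs with heavy traffic but no subject heading in common."""
    return [
        (a, b, w)
        for (a, b), w in sorted(weights.items())
        if w >= min_weight and not (item_subjects[a] & item_subjects[b])
    ]

weights = co_checkout_weights(transactions)
candidates = missing_links(weights, item_subjects)
print(candidates)
```

Pairs returned by `missing_links` are candidates for an added subject heading: patrons treat the items as related, but the catalog metadata does not yet connect them.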

  12. Analyzing user transactions

  13. Other collection analyses • Collection development can be analyzed across time in acquisition of authors and titles • Changes in library policy • Effect of converting collection from Dewey to LOC? • Effect of book location on checkout frequency? (General stacks vs. departmental library vs. high-density storage)

  14. Approaches to Collection Analysis

  15. Challenges for Library Recommender System • Google/Amazon/Netflix vs. Voyager and VuFind: different approaches to users • Keyword searching: word frequency, Solr sorting by proximity and frequency • Recommender systems: build user profiles, clustering of users and of documents • Easy Search: tracking by simple click-throughs
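
To make the contrast with recommender systems concrete, the frequency-based side of keyword searching can be sketched as a small TF-IDF ranking. This is a toy illustration only: it is not Solr's actual scoring formula, it ignores term proximity, and the sample titles and the function name `rank` are hypothetical.

```python
import math
from collections import Counter

# Hypothetical catalog titles keyed by record id.
titles = {
    "rec1": "robin hood and the outlaws of sherwood",
    "rec2": "outlaws and bandits in english folklore",
    "rec3": "a history of english law",
}

def rank(query, docs):
    """Order documents by a simple TF-IDF score for the query terms."""
    tokenized = {doc_id: text.lower().split() for doc_id, text in docs.items()}
    n_docs = len(tokenized)
    doc_freq = Counter()                 # number of documents containing each word
    for words in tokenized.values():
        doc_freq.update(set(words))
    scores = {}
    for doc_id, words in tokenized.items():
        tf = Counter(words)
        score = sum(
            (tf[term] / len(words)) * math.log(1 + n_docs / doc_freq[term])
            for term in query.lower().split()
            if term in tf
        )
        if score > 0:
            scores[doc_id] = score
    # Best score first; ties broken by record id.
    return sorted(scores, key=lambda d: (-scores[d], d))

ranking = rank("english outlaws", titles)
print(ranking)
```

A ranking like this depends only on the words in the query and the records; it knows nothing about who is searching, which is the gap that user profiles and clustering are meant to fill.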

  16. Future Steps • Analyze other data sets from other libraries’ catalogs • Create a suite of tools that libraries can use to calculate and improve economic efficiency • Code libraries that can be shared and used across library systems: reduce the need to re-solve problems (e.g., UTF-8 handling); code uses CSV files for easy integration • Visualize network diagrams of the data for assessments of collections
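
A CSV-based exchange format with explicit UTF-8 handling might look like the sketch below. The column names (`record_id`, `title`, `subjects`), the `;` separator inside the subjects column, and the function name `load_records` are all assumptions made for illustration; the slides do not describe the actual file layout.

```python
import csv
import io

# Hypothetical export: one row per record, subject headings separated by ';'.
SAMPLE_CSV = (
    "record_id,title,subjects\n"
    "1,Les Misérables,Outlaws;France\n"
    "2,Robin Hood,Outlaws; Folklore;England\n"
)

def load_records(csv_file):
    """Read catalog rows from an open text file into a dict keyed by record id."""
    records = {}
    for row in csv.DictReader(csv_file):
        headings = [h.strip() for h in row["subjects"].split(";") if h.strip()]
        records[row["record_id"]] = {"title": row["title"], "subjects": headings}
    return records

# Simulate a UTF-8 file on disk. With a real export this would be:
#   open("catalog.csv", encoding="utf-8", newline="")
raw_bytes = SAMPLE_CSV.encode("utf-8")
with io.TextIOWrapper(io.BytesIO(raw_bytes), encoding="utf-8", newline="") as handle:
    records = load_records(handle)

print(records["1"]["title"], records["2"]["subjects"])
```

Declaring the encoding when the file is opened, rather than relying on the platform default, is the kind of problem that only needs to be solved once in shared code.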

  17. QUESTIONS? Thank you! Harriett Green, green19@illinois.edu Kirk Hess, kirkhess@illinois.edu Richard Hislop, rdhislop@gmail.com
