Usage Data Analysis Using Python

Presentation Transcript


  1. Usage Data Analysis Using Python
     Lei Jin, Electronic Resources Librarian
     Josephine Choi, Library Technician – Acquisitions, Electronic Resources and Serials

  2. Background

  3. Project Goals
     • Implement a reporting system using EZproxy data
     • Cover all electronic subscriptions
     • Integrate demographic details into usage data
     • Explore patterns for evidence-based decision making

  4. Project Milestones
     • Project planning started, Spring 2013
     • First prototype in-house analyzer created, Spring 2014
     • First reports generated, Jan 2015
     • Began open source proxy analyzer testing
     • EZproxy logs and registration data processed, Aug 2016
     • Second prototype in-house analyzer launched, Oct 2016
     • On-campus EZproxy implemented, Jan 2017
     • EZproxy logs and registration data processed, Aug 2018
     • Reports generated using third prototype in-house analyzer, Jan 2019

  5. Scope of the project
     • Who: 1st-year, 2nd-year, 3rd-year, 4th-year, and graduate students
     • When: September 2017 - August 2018
     • What: EZproxy usage data, both on and off campus, plus student profiles from the Registrar's Office
     • Why: dissect usage by student profile, by database, by program, by faculty
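To illustrate how usage rows and student profiles might be combined, here is a minimal sketch of a left join keyed on a user ID. The field names (`user_id`, `year`, `faculty`, `status`), the IDs, and the sample values are all hypothetical; the slides do not show the registrar data schema or the authors' actual join.

```python
# Hypothetical anonymized IDs and profile fields; the real registrar schema is not shown.
profiles = {
    "a1f3": {"year": "1st year", "faculty": "Arts", "status": "full-time"},
    "b7c9": {"year": "graduate", "faculty": "Science", "status": "part-time"},
}
usage = [
    {"user_id": "a1f3", "domain": "code.paperless.com"},
    {"user_id": "zzzz", "domain": "code.paperless.com"},  # no matching profile
]

def attach_profiles(usage, profiles):
    """Left join: keep every usage row, adding profile fields where a match exists."""
    joined = []
    for row in usage:
        profile = profiles.get(row["user_id"], {})
        joined.append({**row, **profile})
    return joined

joined = attach_profiles(usage, profiles)
print(joined[0]["faculty"])  # Arts
```

Keeping unmatched rows (rather than dropping them) makes it possible to see how much usage cannot be attributed to a profile.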

  6. Methodology

  7. Process diagram (labels only): Simplify; Anonymize; EDA (activities per second); Combine; EDA (session); Extract; Future Development; Faculty Profile and other by-products
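The slides name an Anonymize step but do not describe it. One common approach, shown here purely as an assumption and not as the authors' method, is to replace each username with a keyed hash so that records from the same user remain linkable without exposing the login name. The function name `anonymize`, the key value, and the 16-character truncation are all illustrative choices.

```python
import hashlib
import hmac

def anonymize(username, secret_key):
    """Replace a username with a keyed hash so records stay linkable but not readable."""
    digest = hmac.new(secret_key, username.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()[:16]

SECRET_KEY = b"replace-with-a-private-key"  # hypothetical; keep out of shared notebooks

token_a = anonymize("jochoi", SECRET_KEY)
token_b = anonymize("jochoi", SECRET_KEY)
print(token_a == token_b)  # True: the same user always maps to the same token
```

A keyed hash (HMAC) is used rather than a plain hash because usernames are short and guessable, so an unkeyed hash could be reversed by hashing a list of known logins.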

  8. Raw Data
     Here is an example of what the log looks like:

         192.168.1.1 QsmG5smxpT1iPiU - jochoi [10/Dec/2013:15:41:00 -0500] "GET http://ezproxy.lib.ryerson.ca:80/connect?session=sQsmG5smxpT1iPiU&url=http://code.paperless.com HTTP/1.1" 302 0
         192.168.1.1 QsmG5smxpT1iPiU - jochoi [10/Dec/2013:15:41:01 -0500] "GET http://code.paperless.com:80/ HTTP/1.1" 200 2336
         192.168.1.1 QsmG5smxpT1iPiU - jochoi [10/Dec/2013:15:41:02 -0500] "GET http://code.paperless.com:80/default.taf?_function=main HTTP/1.1" 200 1189

     Size:
     • Off-campus: 3.67 GB zipped
     • On-campus: 1.07 GB zipped
     • Equal to 125 GB for 2016-18 before being converted to .csv
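As a rough sketch of how lines like the sample above could be split into fields, the following uses a regular expression. The field order (IP, session ID, a dash, username, timestamp, request, status, bytes) is inferred from the three sample lines only; an actual EZproxy installation may be configured with a different log format, and the names `LOG_PATTERN` and `parse_line` are hypothetical.

```python
import re
from datetime import datetime

# Assumed field order, inferred from the sample lines:
# IP, session ID, "-", username, [timestamp], "request", status, bytes.
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) (?P<session>\S+) \S+ (?P<user>\S+) '
    r'\[(?P<timestamp>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<url>\S+) (?P<protocol>[^"]+)" '
    r'(?P<status>\d{3}) (?P<size>\d+)'
)

def parse_line(line):
    """Return a dict of fields for one log line, or None if it does not match."""
    match = LOG_PATTERN.match(line.strip())
    if match is None:
        return None
    record = match.groupdict()
    record["timestamp"] = datetime.strptime(record["timestamp"], "%d/%b/%Y:%H:%M:%S %z")
    record["status"] = int(record["status"])
    record["size"] = int(record["size"])
    return record

sample = ('192.168.1.1 QsmG5smxpT1iPiU - jochoi [10/Dec/2013:15:41:01 -0500] '
          '"GET http://code.paperless.com:80/ HTTP/1.1" 200 2336')
record = parse_line(sample)
print(record["user"], record["status"], record["size"])  # jochoi 200 2336
```

Returning `None` for non-matching lines lets a caller count and inspect malformed entries instead of failing partway through a large file.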

  9. Tool
     • We use Python for most of the data processing
     • The code is stored in Jupyter notebooks and can be shared with anyone interested
     • Tableau is used as the visualization tool

  10. Acknowledgement
     • We use code by Petrina Collingwood for the first step of the process (‘Simplify’): https://github.com/prcollingwood/ezproxy
     • GitHub is a good starting point for those interested in developing their own data project
     • Some of the code for this project can be found at https://github.com/josiechoi/ezproxy-student

  11. Exploratory Data Analysis (EDA)

  12. Findings
     Access trend based on user group (students vs. staff/faculty)
     • The plot (right) shows the annual trend (per second) from Sept 2017 to Aug 2018, without a moving average applied. The data is split by user group (students vs. staff/faculty)
     • The usage shown here may be affected by cases of massive downloads
     • Students and staff/faculty may follow different patterns
     • Student usage dominates the overall trend in our usage statistics
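A per-second activity count split by user group could be computed along these lines. This is a sketch with made-up records, not the authors' notebook code; the tuple layout `(timestamp, user_group)` and the function name `activity_per_second` are assumptions.

```python
from collections import Counter
from datetime import datetime

# Hypothetical, already-parsed records: (timestamp, user_group)
records = [
    (datetime(2017, 9, 5, 10, 0, 0), "student"),
    (datetime(2017, 9, 5, 10, 0, 0), "student"),
    (datetime(2017, 9, 5, 10, 0, 0), "staff/faculty"),
    (datetime(2017, 9, 5, 10, 0, 1), "student"),
]

def activity_per_second(records):
    """Count log entries per (user_group, second)."""
    counts = Counter()
    for timestamp, group in records:
        counts[(group, timestamp.replace(microsecond=0))] += 1
    return counts

counts = activity_per_second(records)
print(counts[("student", datetime(2017, 9, 5, 10, 0, 0))])  # 2
```

Because every request is counted, a single user downloading many files in a burst inflates these counts, which is the massive-download effect the slide warns about.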

  13. Access trend for on-campus and off-campus
     • The plot (right) shows the annual trend (per second) of on-campus and off-campus usage. The data has been smoothed to mitigate seasonality
     • The trend resembles that of full-time vs. part-time students
     • Both on-campus and off-campus usage show two peaks (a smaller one followed by a bigger one)

  14. Access trend for full-time and part-time students
     • The plot (right) shows the annual trend (per second) from Sept 2017 to Aug 2018. The data is split by full-time/part-time status
     • We smoothed the data (i.e. applied a moving average) to mitigate the impact of seasonality
     • The trend for full-time students clearly shows two peaks in each semester (a small peak followed by a big one). The same pattern was not found in the trend for part-time students
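The moving-average smoothing mentioned here can be sketched as a simple trailing average. The slides do not state the window size or whether the window was trailing or centered, so the window of 3 and the trailing form below are assumptions, and the input numbers are made up.

```python
def moving_average(values, window):
    """Trailing moving average; the first window-1 points average over what is available."""
    if window < 1:
        raise ValueError("window must be at least 1")
    smoothed = []
    running_total = 0.0
    for i, value in enumerate(values):
        running_total += value
        if i >= window:
            # Drop the value that just slid out of the window.
            running_total -= values[i - window]
        smoothed.append(running_total / min(i + 1, window))
    return smoothed

daily_counts = [10, 20, 30, 40, 50]  # made-up numbers for illustration
print(moving_average(daily_counts, 3))  # [10.0, 15.0, 20.0, 30.0, 40.0]
```

A wider window flattens short-term swings more strongly, at the cost of blurring exactly when a peak occurs.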

  15. Access trend for graduate and undergraduate students
     • The plot (right) shows the annual trend (by session) for graduate and undergraduate students
     • The two-peak pattern appears for undergraduate students; it is less prominent for graduate students
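Counting by session rather than by request means each session ID is counted once, however many requests it contains. A minimal sketch, assuming records have already been reduced to `(session_id, date, student_level)` tuples (a hypothetical layout with made-up values):

```python
from collections import defaultdict
from datetime import date

# Hypothetical parsed records: (session_id, date, student_level)
records = [
    ("s1", date(2017, 10, 2), "undergraduate"),
    ("s1", date(2017, 10, 2), "undergraduate"),  # same session, second request
    ("s2", date(2017, 10, 2), "undergraduate"),
    ("s3", date(2017, 10, 2), "graduate"),
]

def sessions_per_day(records):
    """Count distinct session IDs per (level, day)."""
    seen = defaultdict(set)
    for session_id, day, level in records:
        seen[(level, day)].add(session_id)
    return {key: len(ids) for key, ids in seen.items()}

counts = sessions_per_day(records)
print(counts[("undergraduate", date(2017, 10, 2))])  # 2
```

Session counts are less sensitive to bulk downloads than per-second request counts, since a long burst of requests within one session still counts once.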

  16. Reports

  17. Live Demo (Ryerson Intranet, Tableau)

  18. Faculty Profile

  19. Exploring Faculty Usage Based on Platform Domain
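Grouping usage by platform domain requires reducing each requested URL to its host name. The sketch below shows one way to do that with the standard library, using the URLs from the sample log lines; the helper name `platform_domain` is hypothetical, and the slides do not say how the authors derived domains.

```python
from collections import Counter
from urllib.parse import urlsplit

def platform_domain(url):
    """Return the lowercase host name of a URL, without port, or None if absent."""
    return urlsplit(url).hostname

urls = [
    "http://code.paperless.com:80/",
    "http://code.paperless.com:80/default.taf?_function=main",
]
domain_counts = Counter(platform_domain(u) for u in urls)
print(domain_counts.most_common(1))  # [('code.paperless.com', 2)]
```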

  20. Next steps
     • Map EZproxy data to our ERM records
     • Combine EZproxy data with ERM and acquisition data
     • Include faculty research profiles
     • Share reports with collection liaison leads
     • Incorporate reports into the ER evaluation workflow

  21. Questions?