How to Use Statistics for Library Decision-making Diana Very June 27, 2011
MBA for Librarians: Statistics This program will demystify statistical concepts and skills and illustrate their library applications. The instructor will show how data can and should influence all areas of library operations. Learn about studies, tools and resources to assist you in comparing your data with that from other institutions. Create information, knowledge and stories from numerical and qualitative data to enhance decision making. The goal: Manage smarter. Led by Diana Very.
Diana Very Director of LSTA, Statistics, & Research Funding provided for this presentation by IMLS through the LSTA program grant
Statistics tell a story • What • Where • When • How • Why
What is a Statistic? • A statistic is a quantity that is calculated from a sample of data. It is used to give information about unknown values in the corresponding population. For example, the average of the data in a sample is used to give information about the overall average in the population from which that sample was drawn. • It is possible to draw more than one sample from the same population and the value of a statistic will in general vary from sample to sample. For example, the average value in a sample is a statistic. The average values in more than one sample, drawn from the same population, will not necessarily be equal.
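To see why a statistic varies from sample to sample, here is a minimal sketch. The population of daily visit counts is invented for illustration, and the fixed seed is only there to make the run repeatable.

```python
import random
import statistics

# Hypothetical population: visit counts for 200 open days (made-up numbers).
random.seed(2011)  # fixed seed so the sketch is repeatable
population = [random.randint(150, 450) for _ in range(200)]

# Draw two independent samples of 30 days each from the same population.
sample_a = random.sample(population, 30)
sample_b = random.sample(population, 30)

# The sample means are statistics; the population mean is the value they estimate.
population_mean = statistics.mean(population)
mean_a = statistics.mean(sample_a)
mean_b = statistics.mean(sample_b)

print(f"population mean: {population_mean:.1f}")
print(f"sample A mean:   {mean_a:.1f}")
print(f"sample B mean:   {mean_b:.1f}")
```

The two sample means will usually differ from each other and from the population mean, which is the point of the slide.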
Definitions • Mean • Median • Mode • Percentage Change: ((new value - old value) / old value) * 100 • Range • Sample • Standard Deviation • Target Population (Children at a Children’s Program) • Trend • Variance • Correlation
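Several of these definitions can be computed in a few lines. This is a rough sketch using Python's standard `statistics` module. The attendance figures and visit totals are hypothetical, and percentage change is taken as ((new - old) / old) * 100.

```python
import statistics

# Hypothetical attendance at eight story-time sessions (illustrative only).
attendance = [12, 15, 15, 18, 20, 15, 22, 11]

mean_value = statistics.mean(attendance)      # sum divided by count
median_value = statistics.median(attendance)  # middle value once sorted
mode_value = statistics.mode(attendance)      # most frequent value
range_value = max(attendance) - min(attendance)


def percentage_change(old, new):
    """Percentage change from old to new: ((new - old) / old) * 100."""
    return (new - old) / old * 100


# Hypothetical totals: 1,200 visits last year, 1,350 this year.
change = percentage_change(1200, 1350)

print(mean_value, median_value, mode_value, range_value, change)
```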
Example of Population, Mean, Variance, Standard Deviation • Consider a population consisting of the following eight values: 2, 4, 4, 4, 5, 5, 7, 9 • These eight data points have a mean (average) of 5: • (2+4+4+4+5+5+7+9)/8 = 5 • To calculate the population standard deviation, first compute the difference of each data point from the mean, and square the result of each: • (2-5)² = (-3)² = 9 (4-5)² = (-1)² = 1 (4-5)² = (-1)² = 1 (4-5)² = (-1)² = 1 • (5-5)² = 0² = 0 (5-5)² = 0² = 0 (7-5)² = 2² = 4 (9-5)² = 4² = 16 • Next compute the average of these values to get the variance, then take the square root to get the standard deviation: (9+1+1+1+0+0+4+16)/8 = 4 (variance); √4 = 2 (standard deviation)
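The same arithmetic can be checked in code. The sketch below follows the slide's steps by hand and then compares the result with the population formulas in Python's `statistics` module (`pvariance` and `pstdev`).

```python
import math
import statistics

values = [2, 4, 4, 4, 5, 5, 7, 9]  # the eight values from the slide

mean = sum(values) / len(values)
squared_diffs = [(v - mean) ** 2 for v in values]
variance = sum(squared_diffs) / len(values)  # population variance: divide by N
std_dev = math.sqrt(variance)

print(mean, variance, std_dev)  # 5.0 4.0 2.0

# Cross-check against the standard library's population formulas.
assert math.isclose(variance, statistics.pvariance(values))
assert math.isclose(std_dev, statistics.pstdev(values))
```

Note that `statistics.variance` and `statistics.stdev` divide by N - 1 instead of N. Those are the versions to use when the data are a sample rather than the whole population.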
Example of Normal Curve • The quantity calculated in the previous example is the population standard deviation; it is equal to the square root of the variance. • A slightly more complicated real-life example: the average height for adult men in the United States is about 70", with a standard deviation of around 3". This means that most men (about 68%, assuming a normal distribution) have a height within 3" of the mean (67"–73"), which is one standard deviation, and almost all men (about 95%) have a height within 6" of the mean (64"–76"), which is two standard deviations. If the standard deviation were zero, then all men would be exactly 70" tall. If the standard deviation were 20", then men would have much more variable heights, with a typical range of about 50"–90". Three standard deviations account for 99.7% of the population being studied, assuming the distribution is normal (bell-shaped).
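The 68%, 95%, and 99.7% figures can be reproduced with `statistics.NormalDist` (Python 3.8 or later). This sketch assumes, as the slide does, that heights are normally distributed with a mean of 70" and a standard deviation of 3". The helper name `share_within` is made up for this example.

```python
from statistics import NormalDist

# Adult male height as described on the slide: mean 70", standard deviation 3".
heights = NormalDist(mu=70, sigma=3)


def share_within(dist, k):
    """Fraction of the distribution within k standard deviations of the mean."""
    low = dist.mean - k * dist.stdev
    high = dist.mean + k * dist.stdev
    return dist.cdf(high) - dist.cdf(low)


for k in (1, 2, 3):
    print(f"within {k} SD: {share_within(heights, k):.1%}")
```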
Multivariate Analysis Not as scary as it sounds Involves observation and analysis of more than one statistical variable at a time. In design and analysis, the technique is used to perform trade studies across multiple dimensions while taking into account the effects of all variables on the responses of interest. Example: During a production process, a number of different measurements such as the tensile strength, brittleness, diameter, etc. are taken on the same unit. Collectively such data are viewed as multivariate data.
Pearson Correlation The Pearson correlation measures the strength of linear dependence between two variables X and Y. It returns values between +1 and −1 inclusive. +1 implies a perfect positive linear relationship: Y increases as X increases. 0 implies that there is no linear correlation between the variables. −1 implies a perfect negative linear relationship: Y decreases as X increases. For −1 and +1, a linear equation exists that describes the relationship between X and Y perfectly. Values in between indicate a weaker linear relationship in the direction of the sign.
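A minimal sketch of the Pearson calculation, written out by hand so each step is visible. The function name `pearson_r` and the branch figures are hypothetical and are not taken from any study cited in this presentation.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length with at least 2 points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of products of deviations, and sums of squared deviations.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    if ss_x == 0 or ss_y == 0:
        raise ValueError("r is undefined when one variable is constant")
    return cov / math.sqrt(ss_x * ss_y)


# Hypothetical data for five branches: annual visits and annual circulation.
visits = [20_000, 35_000, 50_000, 65_000, 80_000]
circulation = [31_000, 48_000, 77_000, 90_000, 118_000]

r = pearson_r(visits, circulation)
print(f"r = {r:.3f}")
```

A value of r near +1 or −1 only describes how closely the points follow a straight line. It does not by itself show that one variable causes the other.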
Public Library Use Determinants by Diana Very, 4/1/2011 • The hypothesis rested on the assumption that library users come mainly for new books and best sellers, which the library can supply only when the budget allows. The library materials budget was treated as the independent variable, on the assumption that circulation depends on the amount available in the budget. The Pearson correlation coefficient for these two variables was r = -0.502, a moderate negative correlation. Circulation did not rise with the materials budget, so the data do not support the hypothesis. • Circulation is generated by library visits, which suggests that marketing to more of the library service population would increase circulation and use of library materials more than spending more money on new materials would. This study may be helpful when deciding between a marketing budget and a materials budget.
Where to get statistics? • Statistics are everywhere. The statistics that you want to use will depend on what decision you want to make from them. • Some questions that come up for libraries • Who are our customers? • Can we bring in more users?
Public Library Statistical Survey - IMLS • http://harvester.census.gov/imls/publib.asp • Public Libraries in the United States: Fiscal Year 2008 • Release Date: June 2010 • Revised Date: January 2011 • http://harvester.census.gov/imls/pubs/pls/index.asp
Academic Library Statistical Survey • Academic Libraries: 2008 First Look • http://nces.ed.gov/pubsearch/pubsinfo.asp?pubid=2010348 • National Center for Education Statistics • FY2008 edition provides stats on 3,827 academic libraries • Circulations • Public Service Hours • Gate Count • Collection Numbers & Types • Staff
Public School Library/Media Center • http://nces.ed.gov/pubsearch/getpubcats.asp?sid=041# • Several reports are available at this site, but only to 2000. • Federal Libraries and Media Centers reports are also available, but not up to date. • Digest of Education Statistics, 2010 • http://nces.ed.gov/pubsearch/pubsinfo.asp?pubid=2011015 • Contains up-to-date stats for education from kindergarten through graduate school.
Public Library Data Service Statistical Report • The survey for 2010 data is the 23rd edition of the annual survey. • This report is created from a survey sent to 9,272 valid U.S. and Canadian libraries through web contacts. • 1,105 responded to the questionnaire.
Census Data • Home page of data sets and instructions • http://www.census.gov/acs/www/ • American Community Survey – Provides demographics such as population number, races, housing, education, etc., for states, counties, and municipalities • http://www.census.gov/acs/www/ • Guidance for data users – provides instruction on using the data and finding the correct data set • http://www.census.gov/acs/www/guidance_for_data_users/guidance_main/
Other Examples of Library Stats • 2011 State of America’s Libraries — ALA Releases Annual Report: http://ala.org/ala/newspresscenter/mediapresscenter/americaslibraries2011/index.cfm • Library Research Service – Colorado Library Stats: http://www.lrs.org/pub_stats.php • Current Look at Georgia Public Libraries FY 2010: http://www.georgialibraries.org/lib/publiclibinfo/
Where to find comparative statistics • This depends on what type of comparisons you want to make. • In Georgia, the library directors want to compare their system with others in Georgia. • In DeKalb County, Georgia, the library branch managers want to compare their branches to other libraries within the county system. • The Public Library Survey from IMLS provides data for states to compare their state data with other states. • Peer-to-peer comparisons • I’m ok just so I’m better than you… Oh, my!
Compare this year with last year • Use a trend analysis (comparing the same statistic across different years) for staff motivation, accountability reports, marketing and promotional activities. • Stats to Use: • Circulation • Visits • Program attendance • Genre Circulation • Library Cards • Try per capita calculations: • Library Cards per capita • Program attendance per capita • Identify the % not participating.
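A rough sketch of a year-over-year trend and the per capita calculations. All figures are hypothetical. Treating "population without a card" as the share not participating is a simplifying assumption, since it presumes at most one card per resident.

```python
# Hypothetical figures for one library system (illustrative only).
circulation_by_year = {2008: 410_000, 2009: 432_000, 2010: 425_000}
service_population = 85_000
registered_cards = 38_250
program_attendance = 12_750


def percentage_change(old, new):
    """Percentage change from old to new: ((new - old) / old) * 100."""
    return (new - old) / old * 100


# Trend analysis: the same statistic compared across consecutive years.
years = sorted(circulation_by_year)
trend = {
    later: percentage_change(circulation_by_year[earlier], circulation_by_year[later])
    for earlier, later in zip(years, years[1:])
}

# Per capita measures divide a count by the service population.
cards_per_capita = registered_cards / service_population
attendance_per_capita = program_attendance / service_population

# Assumption: one card per resident, so the remainder has no card.
percent_without_card = (1 - cards_per_capita) * 100

for year, change in trend.items():
    print(f"{year}: {change:+.1f}% circulation vs. prior year")
print(f"cards per capita: {cards_per_capita:.2f}")
print(f"program attendance per capita: {attendance_per_capita:.2f}")
print(f"without a card: {percent_without_card:.0f}%")
```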
How to make the decision – Step 1 1. What’s the situation? 4% budget cut
How to make the decision – Step 2 2. Decision tree • Reduce staff – already skeleton staff • Furlough staff – not fair to staff • Reduce hours – possibility • Reduce library collection budget – cut last year to nearly nothing • Reduce outreach services – agreements already in place
How to make the decision – Step 3 3. Justify how to reduce hours • Check into patterns of library use • Check into staff efficiencies • Check into circulation and reference use • Check into website hits and WIFI traffic
Group Work Name a statistic or set of statistics that will answer: • Patterns of library use • Staff efficiencies • Circulation and Reference use • Website and WIFI traffic
If you want to know about patterns of library use, you would collect what type of data? Reference contacts by hour
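If reference contacts are logged with a time, they can be tallied by hour to find the busiest and quietest parts of the day. The sketch below uses a made-up one-day log and an assumed 9:00 to 17:00 schedule. A real decision about hours would need many days of data.

```python
from collections import Counter
from datetime import datetime

# Hypothetical one-day log of reference-desk contacts (timestamps are made up).
contact_log = [
    "2011-06-20 09:15",
    "2011-06-20 10:05", "2011-06-20 10:40",
    "2011-06-20 11:10", "2011-06-20 11:30", "2011-06-20 11:55",
    "2011-06-20 12:20", "2011-06-20 12:50",
    "2011-06-20 13:05", "2011-06-20 13:45",
    "2011-06-20 14:10", "2011-06-20 14:30", "2011-06-20 14:50", "2011-06-20 14:55",
    "2011-06-20 15:20", "2011-06-20 15:40", "2011-06-20 15:58",
    "2011-06-20 16:10", "2011-06-20 16:30",
]

# Count contacts by the hour in which they occurred.
contacts_by_hour = Counter(
    datetime.strptime(stamp, "%Y-%m-%d %H:%M").hour for stamp in contact_log
)

open_hours = range(9, 17)  # assumed schedule: open 9:00, close 17:00

# Simple text chart: one '#' per contact.
for hour in open_hours:
    print(f"{hour:02d}:00  {'#' * contacts_by_hour[hour]}")

busiest = max(open_hours, key=lambda h: contacts_by_hour[h])
quietest = min(open_hours, key=lambda h: contacts_by_hour[h])
print(f"busiest hour starts at {busiest}:00, quietest at {quietest}:00")
```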
If you want to know about staff efficiencies, you would collect what type of data? Circulations per FTE
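Circulations per FTE is a single division, which makes it easy to compare branches side by side. The branch names and figures below are hypothetical, and `circulation_per_fte` is a name chosen for this sketch.

```python
def circulation_per_fte(annual_circulation, fte_staff):
    """Annual circulation divided by full-time-equivalent staff."""
    if fte_staff <= 0:
        raise ValueError("FTE staff must be positive")
    return annual_circulation / fte_staff


# Hypothetical branches: (annual circulation, FTE staff).
branches = {
    "Branch A": (120_000, 8.0),
    "Branch B": (95_000, 5.0),
    "Branch C": (60_000, 4.5),
}

per_fte = {
    name: circulation_per_fte(circ, fte) for name, (circ, fte) in branches.items()
}

# List branches from highest to lowest circulation per FTE.
for name, value in sorted(per_fte.items(), key=lambda item: item[1], reverse=True):
    print(f"{name}: {value:,.0f} circulations per FTE")
```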
If you want to know about circulation and reference use, you would collect what type of data? Collection turnover rate (circulation/collection) Hint hint
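Collection turnover rate, as defined on the slide, is circulation divided by collection size. Here is a small sketch with hypothetical collections. A rate below 1 means the average item circulated less than once in the period.

```python
def turnover_rate(circulation, holdings):
    """Collection turnover rate: circulation divided by items held."""
    if holdings <= 0:
        raise ValueError("holdings must be positive")
    return circulation / holdings


# Hypothetical collections: (annual circulation, items held).
collections = {
    "Adult fiction": (48_000, 12_000),
    "Teen": (3_000, 6_000),
    "Picture books": (30_000, 7_500),
}

rates = {
    name: turnover_rate(circ, held) for name, (circ, held) in collections.items()
}

for name, rate in rates.items():
    print(f"{name}: {rate:.1f} circulations per item")
```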
If you want to know about website hits and WIFI traffic, you would collect what type of data?
Tell the 2011 Clifford Presentation Story 1,518 participants 15 programs @ a Cost of 62 Per Participant http://animoto.com/play/jSIexmvTn8wimFaCnajcbw
Use Statistics to Make Informed Decisions • When is the busiest time at the library? • Do I need more staff to keep up? • Do we really need a new building or can we rearrange the current facility? • Should we arrange the fiction by genre or alphabetically by author? • Why is our teen collection not being used? Old collection? Hidden in the middle of the picture books? Teens don’t know about our books? • Would the community support the program if they knew the benefits?
Thank you!! Contact information: Diana Very Georgia Public Library Service 1800 Century Place, Ste. 150 Atlanta, GA 30345 404-235-7156 firstname.lastname@example.org
References • Standard Deviation from Wikipedia retrieved on 5/12/2011 from http://en.wikipedia.org/wiki/Standard_deviation • PLA - Public Library Data Service Statistical Report. 2010. Presented by the PLA/ALA, ordering information found at http://pla.org/ala/mgrps/divs/pla/plapublications/pldsstatreport/index.cfm • IMLS - Public Libraries in the United States: Fiscal Year 2008. Only available on-line at http://harvester.census.gov/imls/pubs/pls/pub_detail.asp?id=130
References, cont. • Smith, Mark. 1996. Collecting and using public library statistics: A how-to-do-it manual for librarians, Number 56. Neal-Schuman Publishers, Inc. • 2011 State of America’s Libraries — ALA Releases Annual Report : http://ala.org/ala/newspresscenter/mediapresscenter/americaslibraries2011/index.cfm • Library Research Service - Colorado Statistics http://www.lrs.org/pub_stats.php • Multivariate Analysis Concepts, retrieved from http://support.sas.com/publishing/pubcat/chaps/56903.pdf
References, cont. 2 • Multivariate analysis, retrieved from Wikipedia, http://en.wikipedia.org/wiki/Multivariate_analysis • Very, Diana. 2011. Public Library Use Determinants, p 13, e-mail for copy email@example.com.