A Study of Citations in Users’ Online Personal Collections Nishikant Kapoor, John T Butler, Sean M McNee, Gary C Fouty, James A Stemper, Joseph A Konstan GroupLens Research Group and University Libraries, University of Minnesota
Motivation • Citation web data can be used to generate effective recommendations for technical papers • “On the Recommending of Citations for Research Papers”, McNee, et al. (CSCW 2002) • “Enhancing Digital Libraries with TechLens+”, Torres, et al. (JCDL 2004)
Research Objectives • Design & Develop • Personalized digital library services • Understand • Users’ research interests
Research Questions Can we use users’ personal citation collections to offer them personalized DL services? • Can citations in users’ personal collections be resolved to unique identifiers? • How many of those actually resolve to a unique online identifier? • How many of the resolved citations actually lead to an online source for their content or metadata?
Citation Collections • RefWorks users • 96 collections, 30,336 citations • Two outliers (4000+ and 7000+) • Average of 316 citations per collection (30,336 / 96)
User Profile • Users’ personal citation collections • Represent users’ profile • Research interests, Research collaborations • Are single collections • Related, Diverse • Are multiple collections • Task based, Workgroup based
Citation Types [Chart: citation types; category labels J, B, R, N, D, S]
Citation Types [Chart: citation types; category labels W, J, B, R, N, D]
Resolvability • A citation is resolvable if it has • A valid unique ID: DOI for articles, ISBN for books • Enough information to resolve it to a unique ID • All citations that can be represented using a valid unique ID are potentially resolvable.
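The "valid unique ID" test above can be sketched in Python. This is a minimal illustration, not the procedure used in the study: the field names `doi` and `isbn`, the function names, and the DOI pattern (a loose syntactic heuristic) are all assumptions. The ISBN-10 and ISBN-13 checks use the standard check-digit rules.

```python
import re

# Loose syntactic heuristic for a DOI (assumption): "10.", a 4-9 digit
# registrant code, a slash, then a non-empty suffix without whitespace.
DOI_PATTERN = re.compile(r"^10\.\d{4,9}/\S+$")


def is_valid_isbn10(isbn: str) -> bool:
    """Check the ISBN-10 check digit (weights 10..1, sum divisible by 11)."""
    s = isbn.replace("-", "").replace(" ", "").upper()
    if len(s) != 10 or not s.isascii():
        return False
    if not s[:9].isdigit() or not (s[9].isdigit() or s[9] == "X"):
        return False
    total = sum((10 - i) * (10 if ch == "X" else int(ch)) for i, ch in enumerate(s))
    return total % 11 == 0


def is_valid_isbn13(isbn: str) -> bool:
    """Check the ISBN-13 check digit (alternating weights 1 and 3, sum divisible by 10)."""
    s = isbn.replace("-", "").replace(" ", "")
    if len(s) != 13 or not s.isascii() or not s.isdigit():
        return False
    total = sum(int(ch) * (1 if i % 2 == 0 else 3) for i, ch in enumerate(s))
    return total % 10 == 0


def looks_like_doi(doi: str) -> bool:
    """Return True if the string is syntactically plausible as a DOI."""
    return bool(DOI_PATTERN.match(doi.strip()))


def has_valid_unique_id(citation: dict) -> bool:
    """A citation record (hypothetical dict layout) carries a usable DOI or ISBN."""
    doi = citation.get("doi", "")
    isbn = citation.get("isbn", "")
    return looks_like_doi(doi) or is_valid_isbn10(isbn) or is_valid_isbn13(isbn)
```

A syntactically valid ID is only a precondition: whether it actually resolves still has to be checked against an external resolver.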
External Resolvers • DOI and OpenURL Query Interfaces • Citation resolvers at crossref.org (CR) • ISBN Query Interfaces • Citation resolver at worldcat.org (WC)
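As a rough illustration of what a query to an OpenURL-style resolver might look like, the sketch below builds a key/value query string from citation metadata. The base URL is a placeholder, not the CrossRef or WorldCat endpoint, and the function name and dict layout are hypothetical; the parameter names follow common OpenURL 0.1 keys (`aulast`, `title`, `date`, and so on). A real resolver may require additional parameters such as an account identifier.

```python
from urllib.parse import urlencode

# Placeholder endpoint (assumption); substitute the resolver actually in use.
RESOLVER_BASE = "https://resolver.example.org/openurl"

# Common OpenURL 0.1 metadata keys that this sketch forwards if present.
OPENURL_KEYS = (
    "genre", "aulast", "title", "atitle", "volume",
    "issue", "spage", "date", "issn", "isbn",
)


def build_openurl_query(citation: dict, base: str = RESOLVER_BASE) -> str:
    """Build a resolver query URL from the non-empty known fields of a citation."""
    params = {key: str(citation[key]) for key in OPENURL_KEYS if citation.get(key)}
    return f"{base}?{urlencode(params)}"


if __name__ == "__main__":
    print(build_openurl_query({"aulast": "McNee", "date": 2002}))
```

Empty and unrecognized fields are dropped, so sparse citation records still produce a well-formed query.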
Validity • A URL is valid if it leads to a citation’s source online • URLs: a URL may or may not be unique • Validated the existence of the URL, not its accuracy • Did not attempt to retrieve an ID for the citation
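An existence-only check of this kind could be sketched as follows. This is an assumption about how such a check might be written, not the code used in the study: it issues an HTTP HEAD request and treats any 2xx or 3xx status as "exists", without inspecting the content. The `opener` parameter is a hypothetical hook that allows the network call to be replaced in tests.

```python
import urllib.error
import urllib.request


def url_exists(url: str, opener=urllib.request.urlopen, timeout: float = 10.0) -> bool:
    """Return True if the URL answers a HEAD request with a 2xx/3xx status.

    Only existence is checked; the response body is never compared
    against the citation, so accuracy is not verified.
    """
    try:
        request = urllib.request.Request(url, method="HEAD")
        with opener(request, timeout=timeout) as response:
            return 200 <= response.status < 400
    except (urllib.error.URLError, ValueError, OSError):
        # HTTP errors, unreachable hosts, timeouts, and malformed URLs
        # all count as "does not exist" in this sketch.
        return False
```

Some servers reject HEAD requests, so a fuller version might fall back to GET before declaring a URL invalid.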
DOI Resolvability [Chart: DOI resolvability; values not recoverable]
Resolvability Summary 8,540 (47%)
Limitations & Concerns • Only a few resolvers were used • Additional resolvers, such as the Citation Matcher from PubMed, could enhance resolvability further • Dataset too small and diverse • Difficult to find correlations among users • CF-based services work better with larger datasets • Privacy concerns • Users want (a) control (b) anonymity
Future Work • Survey: users’ willingness to share their personal collections • Understand how well users’ personal collections represent their profiles • Prototype of CF-based DL services http://techlens.cs.umn.edu/ RecSys, Minneapolis, Oct 19-20, 2007
Acknowledgements • NSF grant IIS-0534939 • RefWorks (http://www.refworks.com/)
A Study of Citations in Users’ Online Personal Collections Nishikant Kapoor Questions?