
Project Briefing: The Melvyl Recommender Project. Colleen Whitney, California Digital Library and UC Berkeley School of Information; Peter Brantley, California Digital Library.


Presentation Transcript


1. Project Briefing: The Melvyl Recommender Project. Colleen Whitney, California Digital Library and UC Berkeley School of Information; Peter Brantley, California Digital Library.

2. Background • Fundamental changes in user needs and expectations • Library catalogs do not meet these needs • Exploratory project

3. Outline • Text-based discovery system • User interface strategies • Spelling correction • Enhanced relevance ranking • Recommending

4. Text-Based Discovery • eXtensible Text Framework (XTF): built on Lucene and Saxon • Open source, standards-based (XML, XSLT, Java servlets) • Very different from a relational approach • Built-in ranking capability
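
As a rough illustration of that built-in ranking, here is a minimal search against a tiny in-memory index using the classic (circa-2006) Lucene API. The "keywords" field name and the sample record are assumptions for the example, not the project's actual schema.

```java
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.queryParser.QueryParser;
import org.apache.lucene.search.Hits;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.store.RAMDirectory;

public class RankedSearchSketch {
    public static void main(String[] args) throws Exception {
        // Build a one-record in-memory index standing in for the bibliographic testbed.
        RAMDirectory dir = new RAMDirectory();
        IndexWriter writer = new IndexWriter(dir, new StandardAnalyzer(), true);
        Document doc = new Document();
        doc.add(new Field("keywords", "hamlet prince of denmark",
                          Field.Store.YES, Field.Index.TOKENIZED));
        writer.addDocument(doc);
        writer.close();

        // Search: hits come back already ordered by Lucene's tf-idf relevance score.
        IndexSearcher searcher = new IndexSearcher(dir);
        Hits hits = searcher.search(
                new QueryParser("keywords", new StandardAnalyzer()).parse("hamlet"));
        System.out.println(hits.length() + " hit(s), top score " + hits.score(0));
    }
}
```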

5. Testbed • Bibliographic records • MARC export from Melvyl • ~4.2 million UCLA records used in the current prototype • Experimented with using UCB records as well, for a total of 9 million

6. For Further Exploration • Scalability and Performance • Successfully indexed and searched up to 9 million records • How will it do with 35 or 40 million records?

7. Outline • Text-based discovery system • User interface strategies • Spelling correction • Enhanced relevance ranking • Recommending

8. AJAX • Asynchronous JavaScript And XML • Used to call additional services from outside the core system • Render the page, then update portions as data arrives • Adds flexibility while maintaining speed
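
On the server side, such a call can land on a small servlet that returns an XML fragment for the page to splice in. This RecommendServlet, its URL, and its payload are hypothetical, sketched here only to show the shape of the exchange, not the project's actual service.

```java
import java.io.IOException;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

// Hypothetical endpoint an XMLHttpRequest on the results page might call
// after the initial render, e.g. GET /recommend?id=12345.
public class RecommendServlet extends HttpServlet {
    protected void doGet(HttpServletRequest req, HttpServletResponse resp)
            throws IOException {
        String recordId = req.getParameter("id");
        resp.setContentType("text/xml");
        // Placeholder payload; a real service would look up data for recordId.
        resp.getWriter().write("<recommendations for=\"" + recordId + "\"/>");
    }
}
```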

9. FRBR • Functional Requirements for Bibliographic Records • Work (Hamlet) • Expression (in French) • Manifestation (Presses universitaires de France, 1987) • Item (UCLA’s copy of it) • Researching existing implementations • Analyzing how we would apply the concepts, and how we would implement them
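
A hedged sketch of how the four FRBR levels might nest, using the slide's Hamlet example; the class shapes here are illustrative assumptions, not the project's design.

```java
// Illustrative mapping of the FRBR hierarchy onto simple classes.
public class FrbrSketch {
    static class Work          { String title; }                // "Hamlet"
    static class Expression    { Work work; String language; }  // e.g. a French translation
    static class Manifestation { Expression expr; String publisher; int year; } // a published edition
    static class Item          { Manifestation mf; String holder; }            // one library's copy

    public static void main(String[] args) {
        Work hamlet = new Work();
        hamlet.title = "Hamlet";
        Expression french = new Expression();
        french.work = hamlet;
        french.language = "French";
        Manifestation puf = new Manifestation();
        puf.expr = french;
        puf.publisher = "Presses universitaires de France";
        puf.year = 1987;
        Item copy = new Item();
        copy.mf = puf;
        copy.holder = "UCLA";
        System.out.println(copy.holder + " holds the " + copy.mf.year + " "
                + copy.mf.publisher + " edition of " + copy.mf.expr.work.title
                + " in " + copy.mf.expr.language);
    }
}
```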

10. Faceted Browse

11. Faceted Browse • Underlying mechanics in place • For effective browse, will require: • substantial metadata enhancement • significant UI design work
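
As for the underlying mechanics: a facet count is essentially a tally of one metadata field's values across the current result set. The plain-Java sketch below assumes simple single-valued records and is not the XTF implementation.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class FacetSketch {
    // Tally how many records in a result set carry each value of one facet field.
    static Map<String, Integer> facetCounts(List<Map<String, String>> results,
                                            String facetField) {
        Map<String, Integer> counts = new TreeMap<String, Integer>();
        for (Map<String, String> record : results) {
            String value = record.get(facetField);   // e.g. "language" or "format"
            if (value == null) continue;              // record lacks this facet
            Integer c = counts.get(value);
            counts.put(value, c == null ? 1 : c + 1);
        }
        return counts;
    }

    public static void main(String[] args) {
        List<Map<String, String>> results = new ArrayList<Map<String, String>>();
        Map<String, String> r1 = new HashMap<String, String>();
        r1.put("language", "French");
        Map<String, String> r2 = new HashMap<String, String>();
        r2.put("language", "English");
        results.add(r1);
        results.add(r2);
        results.add(r1);
        System.out.println(facetCounts(results, "language")); // {English=1, French=2}
    }
}
```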

12. Outline • Text-based discovery system • User interface strategies • Spelling correction • Enhanced relevance ranking • Recommending

13. Spelling Correction • Goal: 90% correct on first try • Dictionary-based (aspell) vs. index-based • Proper nouns • Multilingual environment

14. Spelling Correction • Chose index-based strategy • “N-gram” speller from Lucene: • “primer” => pri prim rime imer mer • form query from n-grams • retain top 100, rank by closeness to original word • Modified in several ways • adjust for transpositions and insertions • use metaphones • boost on word frequencies • Tested successfully on Wikipedia and aspell datasets
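
The n-gram decomposition itself is straightforward. This plain-Java sketch reproduces the “primer” example with 3- and 4-grams (the slide's list is the starting 3-gram, the three 4-grams, and the ending 3-gram); it illustrates the idea, not the Lucene speller's actual code.

```java
import java.util.ArrayList;
import java.util.List;

public class NGramSketch {
    // All contiguous substrings of length n: the building blocks of the speller's index.
    static List<String> grams(String word, int n) {
        List<String> out = new ArrayList<String>();
        for (int i = 0; i + n <= word.length(); i++)
            out.add(word.substring(i, i + n));
        return out;
    }

    public static void main(String[] args) {
        System.out.println(grams("primer", 3)); // [pri, rim, ime, mer]
        System.out.println(grams("primer", 4)); // [prim, rime, imer]
        // The slide's "pri prim rime imer mer" is the start 3-gram,
        // the three 4-grams, and the end 3-gram of "primer".
    }
}
```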

15. Examples • “Mexaco” => “Did you mean...Mexico?” • “Javasript” => “Did you mean... Javascript” • “frehman” => “Did you mean...freeman” • “flod” => “Your search for flod in keywords returned 12 result(s).” • “Cailfornia” => “Your search for cailfornia in keywords returned 1 result(s).”

16. For Further Exploration • Relative benefits of this approach vs. the increase in indexing time (construction of bi-grams) • Consider when to intervene... only on 0 results? • Multi-word correction

17. Outline • Text-based discovery system • User interface strategies • Spelling correction • Enhanced relevance ranking • Recommending

18. Ranking • Using built-in Lucene capability • “Boosted” with circulation data • ~9 million UCLA circulation transactions • September 1999 – May 2005 • Data from two systems: Taos, Voyager • “Boosted” with holdings data • For 10 UC campuses, provided by OCLC
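
The slides do not give the boost arithmetic, so the formula below is purely an assumption for illustration: scale the content score by a damped (log) function of the circulation or holdings count, so heavily used items rise without swamping content relevance.

```java
public class BoostSketch {
    // One plausible boost: damp raw counts with a log so heavily-circulated
    // or widely-held items don't overwhelm the content relevance score.
    static double boosted(double contentScore, int count) {
        return contentScore * (1.0 + Math.log1p(count));
    }

    public static void main(String[] args) {
        // Same content score; the item circulated or held more ranks higher.
        System.out.println(boosted(2.0, 0));   // 2.0  (no boost)
        System.out.println(boosted(2.0, 50));  // ~9.9 (boosted)
    }
}
```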

19.–22. [Example screenshots comparing content ranking alone against the holdings and circulation “boosts”.]

23. Assessment • Small-scale user test in March • Key questions: • Which ranking method works best for our academic users? • How do academic users evaluate relevance? • Is there a difference based on subject matter expertise?

24. Assessment • Task-based, facilitated and observed • Rotated through 4 ordering methods: • Content ranking only • Content ranking boosted by circulation • Content ranking boosted by holdings • Unranked, sorted by system id • Grouped by naive vs. expert

25.–26. [Assessment figures.]

27. Preliminary Results • In general, all 3 content ranking methods beat unranked in returning “Very Useful” items • All 3 content ranking methods put more “Very Useful” items in top quartiles • Preferences differed by expertise • No clear-cut advantage to a single ranked method • More queries per task using unranked method

28. Results • Additional observations: • All users place strong emphasis on title and publication date in assessing relevance • Expert users rely heavily on author • Many commented that term highlighting helps them assess matches against the query

29. For Further Exploration • Content-based ranking appears well worth pursuing, but consider... • Adjustments to field weights, given observations? • Relative costs of incorporating boosts? • Sources of expanded metadata, which helps users assess relevance.

30. Outline • Text-based discovery system • User interface strategies • Spelling correction • Enhanced relevance ranking • Recommending

31. Recommending • Exploring multiple methods: • Circulation-based • Similarity-based (more like this…) • Same author, subject, call number

32. Circulation-based • “Patrons who checked this out also checked out...”

33. [Diagram: patrons and related items; finding relationships through circulation.]

34.–38. [Diagram sequence: creating sets from patron circulation within a call-number category (TR647, Arts, Architecture and Applied Arts); related items drawn from the same category only.]

39.–41. [Diagram sequence: creating sets; related items sorted by co-occurrence count (4, 3, 2, 1), with a possible sub-sort by overall circulation.]
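
Putting slides 33–41 together, here is a minimal sketch of the set-building step: count how often other items co-occur in the same patrons' checkout histories, sort by that count (the 4, 3, 2, 1 of slide 39), and sub-sort ties by overall circulation. The data shapes are assumptions, and the same-category filter of slides 35–38 is omitted for brevity.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class CoCirculationSketch {
    // "Patrons who checked this out also checked out...": rank candidate items
    // by how many patrons borrowed both, breaking ties on overall circulation.
    // Assumes overallCirc has an entry for every candidate item.
    static List<String> recommend(String item,
                                  Map<String, Set<String>> loansByPatron,
                                  final Map<String, Integer> overallCirc) {
        final Map<String, Integer> cooc = new HashMap<String, Integer>();
        for (Set<String> loans : loansByPatron.values()) {
            if (!loans.contains(item)) continue;      // patron never borrowed the seed item
            for (String other : loans) {
                if (other.equals(item)) continue;
                Integer c = cooc.get(other);
                cooc.put(other, c == null ? 1 : c + 1);
            }
        }
        List<String> ranked = new ArrayList<String>(cooc.keySet());
        Collections.sort(ranked, new Comparator<String>() {
            public int compare(String a, String b) {
                int byCooc = cooc.get(b) - cooc.get(a);          // primary: co-occurrence
                if (byCooc != 0) return byCooc;
                return overallCirc.get(b) - overallCirc.get(a);  // sub-sort: overall circulation
            }
        });
        return ranked;
    }

    public static void main(String[] args) {
        Map<String, Set<String>> loans = new HashMap<String, Set<String>>();
        loans.put("patron1", new HashSet<String>(Arrays.asList("A", "B", "C")));
        loans.put("patron2", new HashSet<String>(Arrays.asList("A", "B")));
        loans.put("patron3", new HashSet<String>(Arrays.asList("A", "C")));
        Map<String, Integer> circ = new HashMap<String, Integer>();
        circ.put("B", 10);
        circ.put("C", 25);
        // [C, B]: B and C tie on co-occurrence (2 each); C wins on overall circulation.
        System.out.println(recommend("A", loans, circ));
    }
}
```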

42.–43. [Example screenshots of circulation-based recommendations.]

44. Does this method work? • Yes: especially for “serendipitous” discovery • But… again, we are not our users. • Testing this method now (April 2006), with similar questions: • Useful for academic audiences? • Subject naïve vs. subject expert?

45. Similarity-based • Generated from the content of the record • “More like this...”
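
Lucene ships a MoreLikeThis helper in its contrib code for exactly this. As a rough plain-Java stand-in, similarity between two records can be scored by term overlap; the tokenization and scoring below are illustrative assumptions, not the project's method.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

public class MoreLikeThisSketch {
    // Crude record similarity: fraction of distinct terms the two record texts share.
    static double overlap(String recordA, String recordB) {
        Set<String> ta = new HashSet<String>(Arrays.asList(recordA.toLowerCase().split("\\W+")));
        Set<String> tb = new HashSet<String>(Arrays.asList(recordB.toLowerCase().split("\\W+")));
        int shared = 0;
        for (String t : ta) if (tb.contains(t)) shared++;
        return (double) shared / Math.max(ta.size(), tb.size());
    }

    public static void main(String[] args) {
        // 0.25: the two titles share only the term "hamlet".
        System.out.println(overlap("Hamlet, prince of Denmark", "Hamlet : a tragedy"));
    }
}
```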

46.–47. [Example screenshots of similarity-based (“more like this”) recommendations.]

48. For Further Exploration • Integrate several methods: • Author, subject linkages • Call number “shelf browse” • “More like this...” • Circulation-based recommendations • Limitations of the circulation-based method • Identify other data rich in human-generated linkages... citations, reading lists...

49. Timeline • Completing user tests on circulation-based recommendations this month. • Wrapping up in June.

50. Many thanks to... • Mellon Foundation • RLG • OCLC • UCLA Library • UC Berkeley Library • CDL Team (Peter Brantley, Lynne Cameron, Rebecca Doherty, Randy Lai, Jane Lee, Martin Haye, Erik Hetzner, Kirk Hastings, Patricia Martin, Felicia Poe, Michael Russell, Lisa Schiff, Roy Tennant, Brian Tingle, Steve Toub...)
