
Project Briefing: The Melvyl Recommender Project. Colleen Whitney, California Digital Library and UC Berkeley School of Information; Peter Brantley, California Digital Library.


Presentation Transcript


1. Project Briefing: The Melvyl Recommender Project. Colleen Whitney, California Digital Library and UC Berkeley School of Information; Peter Brantley, California Digital Library.

2. Background • Fundamental changes in user needs and expectations • Library catalogs do not meet these needs • Exploratory project

3. Outline • Text-based discovery system • User interface strategies • Spelling correction • Enhanced relevance ranking • Recommending

4. Text-Based Discovery • eXtensible Text Framework (XTF): built on Lucene and Saxon • Open source, standards-based (XML, XSLT, Java servlets) • Very different from a relational approach • Built-in ranking capability
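
As a rough illustration of that built-in ranking, here is a minimal search against a tiny in-memory index using the classic (circa-2006) Lucene API. The "keywords" field name and the sample record are assumptions for the example, not the project's actual schema.

```java
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.queryParser.QueryParser;
import org.apache.lucene.search.Hits;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.store.RAMDirectory;

public class RankedSearchSketch {
    public static void main(String[] args) throws Exception {
        // Build a one-record in-memory index standing in for the bibliographic testbed.
        RAMDirectory dir = new RAMDirectory();
        IndexWriter writer = new IndexWriter(dir, new StandardAnalyzer(), true);
        Document doc = new Document();
        doc.add(new Field("keywords", "hamlet prince of denmark",
                          Field.Store.YES, Field.Index.TOKENIZED));
        writer.addDocument(doc);
        writer.close();

        // Search: hits come back already ordered by Lucene's tf-idf relevance score.
        IndexSearcher searcher = new IndexSearcher(dir);
        Hits hits = searcher.search(
                new QueryParser("keywords", new StandardAnalyzer()).parse("hamlet"));
        System.out.println(hits.length() + " hit(s), top score " + hits.score(0));
    }
}
```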

5. Testbed • Bibliographic records • MARC export from Melvyl • ~4.2 million UCLA records used in the current prototype • Experimented with using UCB records as well, for a total of 9 million

6. For Further Exploration • Scalability and Performance • Successfully indexed and searched up to 9 million records • How will it do with 35 or 40 million records?

7. Outline • Text-based discovery system • User interface strategies • Spelling correction • Enhanced relevance ranking • Recommending

8. AJAX • Asynchronous JavaScript And XML • Used to call additional services from outside the core system • Render the page, then update portions as data arrives • Adds flexibility while maintaining speed
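
On the server side, such a call can land on a small servlet that returns an XML fragment for the page to splice in. This RecommendServlet, its URL, and its payload are hypothetical, sketched here only to show the shape of the exchange, not the project's actual service.

```java
import java.io.IOException;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

// Hypothetical endpoint an XMLHttpRequest on the results page might call
// after the initial render, e.g. GET /recommend?id=12345.
public class RecommendServlet extends HttpServlet {
    protected void doGet(HttpServletRequest req, HttpServletResponse resp)
            throws IOException {
        String recordId = req.getParameter("id");
        resp.setContentType("text/xml");
        // Placeholder payload; a real service would look up data for recordId.
        resp.getWriter().write("<recommendations for=\"" + recordId + "\"/>");
    }
}
```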

9. FRBR • Functional Requirements for Bibliographic Records • Work (Hamlet) • Expression (in French) • Manifestation (Presses universitaires de France, 1987) • Item (UCLA’s copy of it) • Researching existing implementations • Analyzing how we would apply the concepts, and how we would implement them
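
A hedged sketch of how the four FRBR levels might nest, using the slide's Hamlet example; the class shapes here are illustrative assumptions, not the project's design.

```java
// Illustrative mapping of the FRBR hierarchy onto simple classes.
public class FrbrSketch {
    static class Work          { String title; }                // "Hamlet"
    static class Expression    { Work work; String language; }  // e.g. a French translation
    static class Manifestation { Expression expr; String publisher; int year; } // a published edition
    static class Item          { Manifestation mf; String holder; }            // one library's copy

    public static void main(String[] args) {
        Work hamlet = new Work();
        hamlet.title = "Hamlet";
        Expression french = new Expression();
        french.work = hamlet;
        french.language = "French";
        Manifestation puf = new Manifestation();
        puf.expr = french;
        puf.publisher = "Presses universitaires de France";
        puf.year = 1987;
        Item copy = new Item();
        copy.mf = puf;
        copy.holder = "UCLA";
        System.out.println(copy.holder + " holds the " + copy.mf.year + " "
                + copy.mf.publisher + " edition of " + copy.mf.expr.work.title
                + " in " + copy.mf.expr.language);
    }
}
```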

10. Faceted Browse

11. Faceted Browse • Underlying mechanics in place • For effective browse, will require: • substantial metadata enhancement • significant UI design work
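
As for the underlying mechanics: a facet count is essentially a tally of one metadata field's values across the current result set. The plain-Java sketch below assumes simple single-valued records and is not the XTF implementation.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class FacetSketch {
    // Tally how many records in a result set carry each value of one facet field.
    static Map<String, Integer> facetCounts(List<Map<String, String>> results,
                                            String facetField) {
        Map<String, Integer> counts = new TreeMap<String, Integer>();
        for (Map<String, String> record : results) {
            String value = record.get(facetField);   // e.g. "language" or "format"
            if (value == null) continue;              // record lacks this facet
            Integer c = counts.get(value);
            counts.put(value, c == null ? 1 : c + 1);
        }
        return counts;
    }

    public static void main(String[] args) {
        List<Map<String, String>> results = new ArrayList<Map<String, String>>();
        Map<String, String> r1 = new HashMap<String, String>();
        r1.put("language", "French");
        Map<String, String> r2 = new HashMap<String, String>();
        r2.put("language", "English");
        results.add(r1);
        results.add(r2);
        results.add(r1);
        System.out.println(facetCounts(results, "language")); // {English=1, French=2}
    }
}
```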

12. Outline • Text-based discovery system • User interface strategies • Spelling correction • Enhanced relevance ranking • Recommending

13. Spelling Correction • Goal: 90% correct on first try • Dictionary-based (aspell) vs. index-based • Proper nouns • Multilingual environment

14. Spelling Correction • Chose index-based strategy • “N-gram” speller from Lucene: • “primer” => pri prim rime imer mer • form query from n-grams • retain top 100, rank by closeness to original word • Modified in several ways • adjust for transpositions and insertions • use metaphones • boost on word frequencies • Tested successfully on Wikipedia and aspell datasets
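
The n-gram decomposition itself is straightforward. This plain-Java sketch reproduces the “primer” example with 3- and 4-grams (the slide's list is the starting 3-gram, the three 4-grams, and the ending 3-gram); it illustrates the idea, not the Lucene speller's actual code.

```java
import java.util.ArrayList;
import java.util.List;

public class NGramSketch {
    // All contiguous substrings of length n: the building blocks of the speller's index.
    static List<String> grams(String word, int n) {
        List<String> out = new ArrayList<String>();
        for (int i = 0; i + n <= word.length(); i++)
            out.add(word.substring(i, i + n));
        return out;
    }

    public static void main(String[] args) {
        System.out.println(grams("primer", 3)); // [pri, rim, ime, mer]
        System.out.println(grams("primer", 4)); // [prim, rime, imer]
        // The slide's "pri prim rime imer mer" is the start 3-gram,
        // the three 4-grams, and the end 3-gram of "primer".
    }
}
```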

15. Examples • “Mexaco” => “Did you mean...Mexico?” • “Javasript” => “Did you mean... Javascript” • “frehman” => “Did you mean...freeman” • “flod” => “Your search for flod in keywords returned 12 result(s).” • “Cailfornia” => “Your search for cailfornia in keywords returned 1 result(s).”

16. For Further Exploration • Relative benefits of this approach vs. the increase in indexing time (construction of bi-grams) • Consider when to intervene... only on 0 results? • Multi-word correction

17. Outline • Text-based discovery system • User interface strategies • Spelling correction • Enhanced relevance ranking • Recommending

18. Ranking • Using built-in Lucene capability • “Boosted” with circulation data • ~9 million UCLA circulation transactions • September 1999 – May 2005 • Data from two systems: Taos, Voyager • “Boosted” with holdings data • For 10 UC campuses, provided by OCLC
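
The slides do not give the boost arithmetic, so the formula below is purely an assumption for illustration: scale the content score by a damped (log) function of the circulation or holdings count, so heavily used items rise without swamping content relevance.

```java
public class BoostSketch {
    // One plausible boost: damp raw counts with a log so heavily-circulated
    // or widely-held items don't overwhelm the content relevance score.
    static double boosted(double contentScore, int count) {
        return contentScore * (1.0 + Math.log1p(count));
    }

    public static void main(String[] args) {
        // Same content score; the item circulated or held more ranks higher.
        System.out.println(boosted(2.0, 0));   // 2.0  (no boost)
        System.out.println(boosted(2.0, 50));  // ~9.9 (boosted)
    }
}
```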

19.–22. [Example screenshots comparing content ranking alone against the holdings and circulation “boosts”.]

23. Assessment • Small-scale user test in March • Key questions: • Which ranking method works best for our academic users? • How do academic users evaluate relevance? • Is there a difference based on subject matter expertise?

24. Assessment • Task-based, facilitated and observed • Rotated through 4 ordering methods: • Content ranking only • Content ranking boosted by circulation • Content ranking boosted by holdings • Unranked, sorted by system id • Grouped by naive vs. expert

25.–26. [Assessment figures.]

27. Preliminary Results • In general, all 3 content ranking methods beat unranked in returning “Very Useful” items • All 3 content ranking methods put more “Very Useful” items in top quartiles • Preferences differed by expertise • No clear-cut advantage to a single ranked method • More queries per task using unranked method

28. Results • Additional observations: • All users place strong emphasis on title and publication date in assessing relevance • Expert users rely heavily on author • Many commented that term highlighting helps them assess matches against the query

29. For Further Exploration • Content-based ranking appears well worth pursuing, but consider... • Adjustments to field weights, given observations? • Relative costs of incorporating boosts? • Sources of expanded metadata, which helps users assess relevance.

30. Outline • Text-based discovery system • User interface strategies • Spelling correction • Enhanced relevance ranking • Recommending

31. Recommending • Exploring multiple methods: • Circulation-based • Similarity-based (more like this…) • Same author, subject, call number

32. Circulation-based • “Patrons who checked this out also checked out...”

33. [Diagram: patrons and related items; finding relationships through circulation.]

34.–38. [Diagram sequence: creating sets from patron circulation within a call-number category (TR647, Arts, Architecture and Applied Arts); related items drawn from the same category only.]

39.–41. [Diagram sequence: creating sets; related items sorted by co-occurrence count (4, 3, 2, 1), with a possible sub-sort by overall circulation.]
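
Putting slides 33–41 together, here is a minimal sketch of the set-building step: count how often other items co-occur in the same patrons' checkout histories, sort by that count (the 4, 3, 2, 1 of slide 39), and sub-sort ties by overall circulation. The data shapes are assumptions, and the same-category filter of slides 35–38 is omitted for brevity.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class CoCirculationSketch {
    // "Patrons who checked this out also checked out...": rank candidate items
    // by how many patrons borrowed both, breaking ties on overall circulation.
    // Assumes overallCirc has an entry for every candidate item.
    static List<String> recommend(String item,
                                  Map<String, Set<String>> loansByPatron,
                                  final Map<String, Integer> overallCirc) {
        final Map<String, Integer> cooc = new HashMap<String, Integer>();
        for (Set<String> loans : loansByPatron.values()) {
            if (!loans.contains(item)) continue;      // patron never borrowed the seed item
            for (String other : loans) {
                if (other.equals(item)) continue;
                Integer c = cooc.get(other);
                cooc.put(other, c == null ? 1 : c + 1);
            }
        }
        List<String> ranked = new ArrayList<String>(cooc.keySet());
        Collections.sort(ranked, new Comparator<String>() {
            public int compare(String a, String b) {
                int byCooc = cooc.get(b) - cooc.get(a);          // primary: co-occurrence
                if (byCooc != 0) return byCooc;
                return overallCirc.get(b) - overallCirc.get(a);  // sub-sort: overall circulation
            }
        });
        return ranked;
    }

    public static void main(String[] args) {
        Map<String, Set<String>> loans = new HashMap<String, Set<String>>();
        loans.put("patron1", new HashSet<String>(Arrays.asList("A", "B", "C")));
        loans.put("patron2", new HashSet<String>(Arrays.asList("A", "B")));
        loans.put("patron3", new HashSet<String>(Arrays.asList("A", "C")));
        Map<String, Integer> circ = new HashMap<String, Integer>();
        circ.put("B", 10);
        circ.put("C", 25);
        // [C, B]: B and C tie on co-occurrence (2 each); C wins on overall circulation.
        System.out.println(recommend("A", loans, circ));
    }
}
```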

42.–43. [Example screenshots of circulation-based recommendations.]

44. Does this method work? • Yes: especially for “serendipitous” discovery • But… again, we are not our users. • Testing this method now (April 2006), with similar questions: • Useful for academic audiences? • Subject naïve vs. subject expert?

45. Similarity-based • Generated from the content of the record • “More like this...”
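
Lucene ships a MoreLikeThis helper in its contrib code for exactly this. As a rough plain-Java stand-in, similarity between two records can be scored by term overlap; the tokenization and scoring below are illustrative assumptions, not the project's method.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

public class MoreLikeThisSketch {
    // Crude record similarity: fraction of distinct terms the two record texts share.
    static double overlap(String recordA, String recordB) {
        Set<String> ta = new HashSet<String>(Arrays.asList(recordA.toLowerCase().split("\\W+")));
        Set<String> tb = new HashSet<String>(Arrays.asList(recordB.toLowerCase().split("\\W+")));
        int shared = 0;
        for (String t : ta) if (tb.contains(t)) shared++;
        return (double) shared / Math.max(ta.size(), tb.size());
    }

    public static void main(String[] args) {
        // 0.25: the two titles share only the term "hamlet".
        System.out.println(overlap("Hamlet, prince of Denmark", "Hamlet : a tragedy"));
    }
}
```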

46.–47. [Example screenshots of similarity-based (“more like this”) recommendations.]

48. For Further Exploration • Integrate several methods: • Author, subject linkages • Call number “shelf browse” • “More like this...” • Circulation-based recommendations • Limitations of the circulation-based method • Identify other data rich in human-generated linkages... citations, reading lists...

49. Timeline • Completing user tests on circulation-based recommendations this month. • Wrapping up in June.

50. Many thanks to... • Mellon Foundation • RLG • OCLC • UCLA Library • UC Berkeley Library • CDL Team (Peter Brantley, Lynne Cameron, Rebecca Doherty, Randy Lai, Jane Lee, Martin Haye, Erik Hetzner, Kirk Hastings, Patricia Martin, Felicia Poe, Michael Russell, Lisa Schiff, Roy Tennant, Brian Tingle, Steve Toub...)
