
CONTENT-BASED BOOK RECOMMENDING USING LEARNING FOR TEXT CATEGORIZATION

TRIVIKRAM BHAT

UNIVERSITY OF TEXAS AT ARLINGTON

DATA MINING, CSE6362

BASED ON THE PAPER BY RAYMOND J. MOONEY AND LORIENE ROY

UNIVERSITY OF TEXAS, AUSTIN

OVERVIEW
  • Introduction
  • Techniques
  • Drawbacks of Existing Systems
  • Advantages of Content Based Systems
  • LIBRA
  • System Description
  • Experimental Results
  • Future Work
  • Conclusions
INTRODUCTION

General goal of a Recommender System

  • Make personalized suggestions based on previous examples of a user's likes and dislikes

Types

  • Existing systems that use Social Filtering methods (base recommendations on other users' preferences)
  • Content Based systems (use information about the item itself to make suggestions)

INTRODUCTION
  • Companies
    • Firefly
    • Net Perceptions
    • LikeMinds
    • Amazon (book recommending)
    • Barnes and Noble (book recommending)
TECHNIQUES
  • Social / Collaborative Filtering
    • Maintain a Database of user preferences
    • Find other users whose known preferences correlate significantly with a given user
  • Content Based Filtering
    • Allows a system to uniquely characterize each user without having to match their interests to someone else’s
    • Items are recommended based on information about the item itself
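
As a rough illustration of the social filtering step above (finding users whose known preferences correlate with a given user), here is a minimal sketch. The slides do not say which correlation measure such systems use; Pearson correlation over co-rated items is assumed here as one common choice, and the function names and data layout are hypothetical.

```python
from math import sqrt

def pearson(ratings_a, ratings_b):
    """Pearson correlation over the items both users have rated.

    Each argument is a dict mapping item id -> numeric rating
    (an assumed data layout, not one taken from the slides).
    """
    shared = sorted(set(ratings_a) & set(ratings_b))
    if len(shared) < 2:
        return 0.0  # not enough overlap to say anything
    xs = [ratings_a[i] for i in shared]
    ys = [ratings_b[i] for i in shared]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant rater carries no correlation signal
    return cov / sqrt(var_x * var_y)

def nearest_neighbors(target, others, k=2):
    """Return up to k user names with the highest positive correlation."""
    scored = [(pearson(target, ratings), name) for name, ratings in others.items()]
    scored.sort(reverse=True)
    return [name for score, name in scored[:k] if score > 0]

target = {"a": 9, "b": 2, "c": 7}
others = {
    "u1": {"a": 8, "b": 1, "c": 6},
    "u2": {"a": 1, "b": 9, "c": 2},
}
neighbors = nearest_neighbors(target, others)
```

The sketch also makes the drawbacks on the next slide concrete: with few co-rated items the correlation is undefined or unreliable, and the computation needs access to other users' ratings.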
DRAWBACKS OF EXISTING SYSTEMS
  • Assume that a given user’s tastes are generally the same as another user’s
  • Assume that there is a sufficient number of ratings
  • Tend to recommend popular titles
  • Need sufficient information about other users, which raises concerns about privacy and access to customer data
ADVANTAGES OF CONTENT BASED SYSTEMS
  • Items are recommended based on the content of the item rather than on other users’ preferences
  • Provides a way to list content features that caused the item to be recommended
  • Allows users to provide initial subject information to aid the system
LIBRA (Learning Intelligent Book Recommending Agent)
  • A database of book information extracted from web pages at Amazon.com
  • Users select a set of training books and rate them on a scale of 1-10
  • System learns a profile of the user using a Bayesian learning algorithm
  • Produces a ranked list of the most recommended additional titles from the system catalog
SYSTEM DESCRIPTION

Extracting information and building a database

  • Perform Amazon subject search
  • Download book description URLs
  • Information extraction using slots to get valuable information about each book
  • Slots currently used include title, authors, and published reviews, among others
  • A simple extraction system is sufficient because the layout of Amazon’s automatically generated pages is regular
  • Some preprocessing is done (author names are converted into unique tokens of the form first_initial_last-name)
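
The author-name preprocessing can be sketched as follows. This is only an illustration under assumptions: the function name is hypothetical, the uppercase output is a guess based on the uppercase tokens shown in the sample profile, and the handling of middle names and hyphenated surnames is a simplification.

```python
import re

def author_token(name):
    """Collapse an author name into a single first_initial_last-name token.

    Assumption: the first word supplies the initial and the last word is
    the surname; middle names and initials are dropped.
    """
    parts = re.findall(r"[A-Za-z']+", name)
    if not parts:
        return ""
    if len(parts) == 1:
        return parts[0].upper()
    return f"{parts[0][0]}_{parts[-1]}".upper()

tokens = [author_token(n) for n in ["Raymond J. Mooney", "Loriene Roy"]]
```

Turning each author into one token keeps a surname from being confused with the same word appearing in ordinary description text.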

SYSTEM DESCRIPTION

Learning a Profile

  • User selects titles to rate (for example, titles by a particular author)
    • The user need not scan the entire database at random
  • Users rate the selected titles on a scale of 1-10
  • A naïve Bayesian text classifier is used to classify a book as either positive (rating 6-10) or negative (rating 1-5)
  • There are N training books $B_e$ ($1 \le e \le N$)
  • Each book has 2 real-valued weights
    • Positive weight $\alpha_{e1} = (r - 1)/9$
    • Negative weight $\alpha_{e0} = 1 - \alpha_{e1}$
    • $r$ = user rating ($1 \le r \le 10$)
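
A small sketch of the rating-to-weight mapping above; the function name is hypothetical. A rating of 1 gives a purely negative example, a rating of 10 a purely positive one, and ratings in between split their weight across both categories.

```python
def rating_weights(r):
    """Map a 1-10 rating to (positive_weight, negative_weight)."""
    if not 1 <= r <= 10:
        raise ValueError("rating must be between 1 and 10")
    positive = (r - 1) / 9
    return positive, 1 - positive

lowest = rating_weights(1)
highest = rating_weights(10)
middle = rating_weights(4)
```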
SYSTEM DESCRIPTION

Parameters

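  • $P(c_j) = \sum_{e=1}^{N} \alpha_{ej} / N$
  • $P(w_k \mid c_j, s_m) = \sum_{e=1}^{N} \alpha_{ej} \, n_{kem} / L(c_j, s_m)$
    • where $\alpha_{ej}$ is the weight of book $B_e$ for category $c_j$ ($j = 1$ positive, $j = 0$ negative) and $n_{kem}$ is the number of times word $w_k$ appears in example $B_e$ in slot $s_m$
    • $L(c_j, s_m) = \sum_{e=1}^{N} \alpha_{ej} \, |d_m|$ denotes the total weighted length of the documents in category $c_j$ and slot $s_m$
    • $d_m$ = the document for slot $s_m$; each book is represented as a vector of such documents, one per slot
  • Strength: measures how much more likely a word in a slot is to appear in a positively rated book than in a negatively rated book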
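
The parameter estimates above can be sketched in code. This is a minimal illustration, not LIBRA's implementation: it reads the formulas as rating-weighted sums over all N training books, uses no smoothing (so unseen words get probability zero, which a practical system would need to handle), and assumes that strength is the log of the ratio of the positive to the negative word probability, which the slides describe only in words. All function names and the data layout are hypothetical.

```python
from collections import Counter
from math import log

def train(books):
    """Estimate priors and per-slot word probabilities.

    books: list of (rating, {slot_name: [words]}) pairs (assumed layout).
    Category index 1 is positive, 0 is negative.
    """
    n = len(books)
    prior = [0.0, 0.0]
    counts = {}   # (category, slot, word) -> weighted word count
    lengths = {}  # (category, slot) -> total weighted document length
    for rating, slots in books:
        positive = (rating - 1) / 9
        for j, alpha in enumerate((1 - positive, positive)):
            prior[j] += alpha / n
            for slot, words in slots.items():
                lengths[(j, slot)] = lengths.get((j, slot), 0.0) + alpha * len(words)
                for word, count in Counter(words).items():
                    key = (j, slot, word)
                    counts[key] = counts.get(key, 0.0) + alpha * count
    return prior, counts, lengths

def word_prob(model, word, j, slot):
    """Unsmoothed estimate of P(word | category j, slot)."""
    _, counts, lengths = model
    total = lengths.get((j, slot), 0.0)
    return counts.get((j, slot, word), 0.0) / total if total else 0.0

def strength(model, word, slot):
    """Assumed definition: log of positive over negative word probability."""
    p_pos = word_prob(model, word, 1, slot)
    p_neg = word_prob(model, word, 0, slot)
    if p_pos == 0 or p_neg == 0:
        raise ValueError("strength is undefined without smoothing")
    return log(p_pos / p_neg)

model = train([
    (10, {"words": ["mars", "mars", "rocket"]}),
    (1, {"words": ["romance", "rocket"]}),
])
```

In this toy data "rocket" appears in both a liked and a disliked book, so its strength is defined; "mars" appears only in the liked book and would need smoothing before a strength could be computed.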

Sample Positive Profile Features

Slot             Word          Strength
WORDS            ZUBRIN        9.85
WORDS            SMOLIN        9.39
WORDS            TREFIL        8.77
WORDS            DOT           8.67
SUBJECTS         COMPARATIVE   8.39
AUTHOR           D GOLDSMITH   8.04
WORDS            ALH           7.97
WORDS            MANNED        7.97
RELATED-TITLES   SETTLE        7.91

SYSTEM DESCRIPTION

Producing, Explaining and Revising Recommendations

  • Once a profile is learned, it is used to predict the preferred ranking of the remaining books
  • Recommendations are reviewed by the user, who may assign their own ratings to examples they believe are incorrectly ranked
  • The system is retrained with the new ratings; repeating this cycle several times produces the best results
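
One way to turn a learned profile into a ranking is to score each unrated book by its log posterior odds of being positive and sort by that score. The slides do not state the exact ranking criterion, so this is an assumption, as are the function names, the dictionary layout of the probability table, and the `floor` value used in place of proper smoothing for unseen words.

```python
from math import log

def log_odds(book_slots, prior, word_probs, floor=1e-6):
    """Log of P(positive | book) / P(negative | book) under naive Bayes.

    book_slots: {slot: [words]}
    prior: (P(negative), P(positive))
    word_probs: {(category, slot, word): probability}
    floor: assumed stand-in probability for words missing from the table
    """
    score = log(prior[1]) - log(prior[0])
    for slot, words in book_slots.items():
        for word in words:
            p_pos = max(word_probs.get((1, slot, word), 0.0), floor)
            p_neg = max(word_probs.get((0, slot, word), 0.0), floor)
            score += log(p_pos) - log(p_neg)
    return score

def rank(catalog, prior, word_probs):
    """Return titles ordered from most to least recommended."""
    return sorted(
        catalog,
        key=lambda title: log_odds(catalog[title], prior, word_probs),
        reverse=True,
    )

word_probs = {
    (1, "words", "mars"): 0.6, (0, "words", "mars"): 0.1,
    (1, "words", "romance"): 0.1, (0, "words", "romance"): 0.6,
}
catalog = {"A": {"words": ["romance"]}, "B": {"words": ["mars"]}}
ranking = rank(catalog, (0.5, 0.5), word_probs)
```

After the user corrects the ratings of any badly placed titles, the corrected books would be added to the training set and the profile relearned before ranking again.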
EXPERIMENTAL RESULTS

Data Collection

  • Several data sets were assembled (LIT1, LIT2, MYST, SCI, SF)
  • In order to present a quantitative picture of performance on a realistic sample, books were selected at random
  • If the user was not familiar with a book, the user was asked to give a rating based on the information provided by the Amazon page describing the book
EXPERIMENTAL RESULTS

Performance Evaluation

  • Performed 10-fold cross validation on the examples
  • Various metrics were used to measure the performance
    • Classification accuracy (Acc): The percentage of examples correctly classified as positive or negative
    • Precision (Pr): The percentage of examples classified as positive that are actually positive
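
The two metrics defined above can be computed as in the sketch below. It assumes the positive class is a rating of 6-10, as on the profile-learning slide; the function name and argument layout are hypothetical.

```python
def accuracy_and_precision(true_ratings, predicted_positive):
    """Return (accuracy %, precision %) for a list of predictions.

    true_ratings: user ratings on the 1-10 scale
    predicted_positive: booleans, True where the classifier said "positive"
    """
    if not true_ratings or len(true_ratings) != len(predicted_positive):
        raise ValueError("inputs must be non-empty and the same length")
    actual = [r >= 6 for r in true_ratings]
    pairs = list(zip(actual, predicted_positive))
    correct = sum(a == p for a, p in pairs)
    true_positives = sum(a and p for a, p in pairs)
    predicted = sum(predicted_positive)
    accuracy = 100.0 * correct / len(actual)
    # Precision is reported as 0 when nothing was predicted positive.
    precision = 100.0 * true_positives / predicted if predicted else 0.0
    return accuracy, precision

acc, prec = accuracy_and_precision(
    [9, 3, 7, 2, 8],
    [True, True, True, False, False],
)
```

In a 10-fold cross validation these figures would be computed on each held-out fold and then averaged.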
EXPERIMENTAL RESULTS

Discussion

  • User-selected examples vs. randomly selected examples
    • User-selected examples are better because the user can rate their own selections accurately
    • Randomly selected examples tend to cover the complete dataset
  • Conclusion: avoid prematurely committing to a specific methodology
EXPERIMENTAL RESULTS
  • Can collaborative and content-based approaches be combined to produce better results?
  • Slots carrying collaborative information: related authors, related titles
  • When these slots were removed, performance degraded

Using both approaches together produces better results

FUTURE WORK
  • Web-Based interface (with a larger body of users)
  • Compare LIBRA’s Content-Based Approach to a standard Collaborative Approach
  • Maximize the utility of the small training set by using various Machine Learning techniques
    • Unsupervised learning
    • Active learning (incremental approach)
  • One effective approach: provide highly rated examples, generate initial recommendations, review the results, give low ratings to bad items, and retrain the system to get new recommendations
CONCLUSIONS
  • Content-Based Approach holds the promise of being able to effectively recommend items that have not been rated
  • Provides accurate recommendations without any knowledge of other users’ preferences
  • Combining collaborative techniques with the content-based approach does provide better results
  • www.cs.utexas.edu/users/ml/recommender.html
  • Partially supported by NSF