Overview • Background • Related Work • Problem • Solution • Questions
What is a recommender system?
The goal of a recommender system is to generate meaningful recommendations to a collection of users for items or products that might interest them.
(P. Melville and V. Sindhwani. Recommender systems. In Encyclopedia of Machine Learning, pages 829-838. 2010.)
Recommender Systems Mainly produce a list of recommendations through • Collaborative filtering • Content-based filtering
Recommender Systems
Collaborative filtering
• Predictions based on
  • a user's past behavior
  • decisions made by other similar users
• Recommends additional items that other similar users chose
Content-based filtering
• Predictions based on
  • discrete characteristics of an item
  • a profile of the user's preferences
• Recommends additional items that are similar to those the user liked in the past
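The collaborative side of this comparison can be illustrated with a minimal user-based sketch. The toy `ratings` data, the cosine similarity measure, and the function names are assumptions made for illustration; the slides do not prescribe a particular similarity or weighting scheme.

```python
from math import sqrt

# Hypothetical toy data: user -> {item: rating on a 1-5 scale}
ratings = {
    "ann": {"a": 5, "b": 4, "c": 1},
    "bob": {"a": 5, "b": 5, "d": 4},
    "cat": {"a": 1, "c": 5, "e": 2},
}

def cosine(u, v):
    """Cosine similarity computed over the items both users rated."""
    common = set(u) & set(v)
    if not common:
        return 0.0
    dot = sum(u[i] * v[i] for i in common)
    norm_u = sqrt(sum(u[i] ** 2 for i in common))
    norm_v = sqrt(sum(v[i] ** 2 for i in common))
    return dot / (norm_u * norm_v)

def recommend_collaborative(user, ratings, top_n=3):
    """Rank unseen items by the similarity-weighted ratings of other users."""
    scores, weights = {}, {}
    for other, other_ratings in ratings.items():
        if other == user:
            continue
        sim = cosine(ratings[user], other_ratings)
        if sim <= 0:
            continue
        for item, rating in other_ratings.items():
            if item in ratings[user]:
                continue  # only recommend items the user has not rated
            scores[item] = scores.get(item, 0.0) + sim * rating
            weights[item] = weights.get(item, 0.0) + sim
    predicted = {item: scores[item] / weights[item] for item in scores}
    return sorted(predicted, key=predicted.get, reverse=True)[:top_n]

print(recommend_collaborative("ann", ratings))
```

Here "ann" is most similar to "bob", so the item only "bob" rated highly ranks first.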
Recommender Systems
Collaborative filtering: Last.fm
• Observes the bands and tracks a user listens to
• Compares them with the listening behavior of other users
• Plays tracks that are not in the user's library but are often played by other users with similar interests
Content-based filtering: Pandora Radio
• Uses the properties of a song or artist
• Seeds a station with songs that have similar properties
• A user disliking a song deemphasizes certain attributes
• A user liking a song emphasizes other attributes
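The content-based behavior described for Pandora Radio (likes emphasize attributes, dislikes deemphasize them) might be sketched as follows. The song catalog, attribute names, and the fixed update step are hypothetical; this is not Pandora's actual algorithm.

```python
# Hypothetical catalog: song -> set of discrete attributes
songs = {
    "s1": {"acoustic", "female_vocal", "folk"},
    "s2": {"acoustic", "folk", "slow"},
    "s3": {"electronic", "fast"},
    "s4": {"electronic", "female_vocal"},
}

def update_profile(profile, song, liked, step=1.0):
    """Raise the weight of a liked song's attributes, lower a disliked song's."""
    for attr in songs[song]:
        profile[attr] = profile.get(attr, 0.0) + (step if liked else -step)

def score(profile, song):
    """Average profile weight over the song's attributes."""
    attrs = songs[song]
    return sum(profile.get(a, 0.0) for a in attrs) / len(attrs)

def recommend_content(profile, heard, top_n=2):
    """Rank songs the user has not heard by how well they match the profile."""
    candidates = [s for s in songs if s not in heard]
    return sorted(candidates, key=lambda s: score(profile, s), reverse=True)[:top_n]

profile = {}
update_profile(profile, "s1", liked=True)    # emphasizes acoustic, female_vocal, folk
update_profile(profile, "s3", liked=False)   # deemphasizes electronic, fast
print(recommend_content(profile, heard={"s1", "s3"}))
```

No other user's behavior is consulted here; the ranking depends only on item attributes and the single user's profile.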
Recommender Systems
Collaborative filtering: Problems
• Cold start: requires a large amount of existing data on a user to make accurate recommendations
• Scalability: with millions of users and products, a large amount of computing power is needed to calculate recommendations
• Sparsity: a large number of items are sold, and even the most active users will have rated only a small subset of the overall database. Thus, even the most popular items have very few ratings.
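To make the sparsity problem concrete, the fraction of missing user-item ratings can be computed directly. The function name and the toy numbers are assumptions for illustration only.

```python
def sparsity(ratings, n_items):
    """Fraction of user-item pairs with no rating (1.0 means completely empty)."""
    n_users = len(ratings)
    observed = sum(len(user_ratings) for user_ratings in ratings.values())
    return 1.0 - observed / (n_users * n_items)

# Hypothetical example: 2 users, a catalog of 4 items, 3 ratings in total
toy = {"u1": {"a": 5, "b": 3}, "u2": {"c": 4}}
print(sparsity(toy, n_items=4))
```

In real catalogs with millions of items this value is typically very close to 1, which is what makes similarity between users hard to estimate.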
Recommender Systems
Content-based filtering: Problems
• Transfer across content types: can the system learn user preferences from users' actions regarding one content source and use them across other content types?
• Cold start: a new user's preference profile is empty or has few ratings
Recommender Systems Other recommendation techniques Demographic • Provides recommendations based on a demographic profile of the user. • Recommended products can be produced for different demographic niches, by combining the ratings of users in those niches.
Recommender Systems Other recommendation techniques Demographic • Advantage • Does not require a history of user ratings like those needed by collaborative and content-based techniques. • Disadvantage • Must gather demographic information
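A minimal sketch of the demographic approach: ratings from users in the same niche are averaged, and a new user in that niche receives the top-rated items without needing any rating history of their own. The age-bracket niches and toy ratings are hypothetical.

```python
from collections import defaultdict

# Hypothetical demographic profiles: user -> niche (here, an age bracket)
niche_of = {"u1": "18-25", "u2": "18-25", "u3": "40-60"}
# Hypothetical ratings as (user, item, rating) triples
rating_rows = [
    ("u1", "game", 5), ("u2", "game", 4), ("u2", "book", 2),
    ("u3", "book", 5), ("u3", "game", 1),
]

def niche_averages(niche_of, rating_rows):
    """Combine the ratings of all users in each niche into per-item averages."""
    grouped = defaultdict(lambda: defaultdict(list))
    for user, item, rating in rating_rows:
        grouped[niche_of[user]][item].append(rating)
    return {
        niche: {item: sum(vals) / len(vals) for item, vals in items.items()}
        for niche, items in grouped.items()
    }

def recommend_for_niche(niche, averages, top_n=1):
    """Recommend the best-rated items within a demographic niche."""
    items = averages.get(niche, {})
    return sorted(items, key=items.get, reverse=True)[:top_n]

averages = niche_averages(niche_of, rating_rows)
print(recommend_for_niche("18-25", averages))
```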
Recommender Systems Other recommendation techniques Utility-based • base their advice on an evaluation of the match between a user’s need and the set of options available • User profile is utility function the user constructs or the system derives for the user
Recommender Systems Other recommendation techniques Utility-based • Advantage • factor non-product attributes, such as vendor reliability and product availability, into the utility computation • Therefore, makes it possible to trade off price against delivery schedule • No cold start/ramp up or sparsity problems
Recommender Systems Other recommendation techniques Utility-based • Disadvantage • User must input utility function (or system derives one from a questionnaire) • Suggestions are static (does not learn)
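The utility-based idea of trading off price against delivery schedule can be sketched with a simple linear utility function. The linear form, the weight names, and the two example options are assumptions; in practice the user (or a questionnaire) would supply the function.

```python
def utility(option, weights):
    """Linear utility: reward vendor reliability, penalize price and delivery time."""
    return (
        weights["reliability"] * option["vendor_reliability"]
        - weights["price"] * option["price"]
        - weights["delivery"] * option["delivery_days"]
    )

def best_option(options, weights):
    """Return the name of the option that maximizes the user's utility."""
    return max(options, key=lambda o: utility(o, weights))["name"]

# Hypothetical options: A is cheaper, B ships faster
options = [
    {"name": "A", "price": 100.0, "delivery_days": 7, "vendor_reliability": 0.9},
    {"name": "B", "price": 120.0, "delivery_days": 2, "vendor_reliability": 0.8},
]

patient = {"price": 1.0, "delivery": 1.0, "reliability": 10.0}
in_a_hurry = {"price": 1.0, "delivery": 5.0, "reliability": 10.0}
print(best_option(options, patient), best_option(options, in_a_hurry))
```

The same catalog yields different recommendations for the two weightings. The weights stay fixed until the user changes them, which reflects the "suggestions are static" disadvantage.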
Recommender Systems Other recommendation techniques Knowledge-based • Suggests products based on inferences about a user’s needs and preferences. • The knowledge will sometimes contain explicit functional knowledge about how certain product features meet user needs.
Recommender Systems Other recommendation techniques Knowledge-based • Advantages • All the advantages of the utility-based approach • Can map from user needs to products
Recommender Systems
Other recommendation techniques: Knowledge-based
• Disadvantages
  • Need for knowledge acquisition
    • Catalog knowledge: about objects and their features
    • Functional knowledge: mapping between user needs and the objects that might satisfy those needs
    • User knowledge: demographic information or specific user needs
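The roles of catalog knowledge and functional knowledge can be illustrated with a small lookup-based sketch. The products, features, and needs below are invented for illustration; a real system would have to acquire this knowledge, which is the disadvantage noted above.

```python
# Hypothetical catalog knowledge: product -> features
catalog = {
    "laptop_a": {"lightweight", "long_battery"},
    "laptop_b": {"dedicated_gpu", "large_screen"},
    "laptop_c": {"lightweight", "dedicated_gpu", "long_battery"},
}
# Hypothetical functional knowledge: user need -> features that satisfy it
functional = {
    "travel": {"lightweight", "long_battery"},
    "gaming": {"dedicated_gpu"},
}

def recommend_for_needs(needs, catalog, functional):
    """Map user needs to required features, then to products that have them all."""
    required = set()
    for need in needs:
        required |= functional[need]
    return sorted(p for p, features in catalog.items() if required <= features)

print(recommend_for_needs(["travel", "gaming"], catalog, functional))
```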
Recommender Systems Hybrid Recommender systems Combine multiple recommendation techniques
Recommender Systems Hybrid Recommender systems Implementations • Making predictions with different approaches separately and then combining them • Adding the capabilities of one or more approaches to another • Unifying the approaches into one model
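The first implementation style, making predictions separately and then combining them, can be sketched as a weighted blend of two score dictionaries. The `alpha` parameter and the example scores are assumptions; other combination rules (switching, cascading, and so on) are equally possible.

```python
def weighted_hybrid(cf_scores, cb_scores, alpha=0.5):
    """Blend collaborative (cf) and content-based (cb) scores per item.

    alpha is the weight given to the collaborative score; an item missing
    from one recommender contributes 0.0 for that recommender.
    """
    items = set(cf_scores) | set(cb_scores)
    return {
        item: alpha * cf_scores.get(item, 0.0)
        + (1 - alpha) * cb_scores.get(item, 0.0)
        for item in items
    }

def rank(scores):
    """Items ordered from highest to lowest combined score."""
    return sorted(scores, key=scores.get, reverse=True)

# Hypothetical outputs of two separate recommenders
cf_scores = {"x": 0.8, "y": 0.2}
cb_scores = {"y": 0.6, "z": 1.0}
print(rank(weighted_hybrid(cf_scores, cb_scores, alpha=0.75)))
```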
Recommender Systems Hybrid Recommender systems Burke, Robin. "Hybrid Recommender Systems: Survey and Experiments." User Modeling and User-Adapted Interaction 12.4 (2002): 331-70. Results confirmed that hybrid recommendation approaches produce better results than pure approaches
Singular value decomposition (SVD) essentially reduces a rank-R matrix to a rank-K approximation.
[Figure: example with 400 unique row vectors, shown at K=10 and K=50]
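The rank reduction performed by singular value decomposition can be sketched with NumPy. The random rank-3 matrix below is a stand-in chosen for illustration, not the data behind the K=10 and K=50 example.

```python
import numpy as np

def truncated_svd(matrix, k):
    """Best rank-k approximation of `matrix` in the least-squares sense."""
    u, s, vt = np.linalg.svd(matrix, full_matrices=False)
    # Keep only the k largest singular values and their singular vectors.
    return (u[:, :k] * s[:k]) @ vt[:k, :]

rng = np.random.default_rng(0)
# Hypothetical 20 x 8 matrix constructed to have rank R = 3
a = rng.normal(size=(20, 3)) @ rng.normal(size=(3, 8))

approx2 = truncated_svd(a, 2)   # K = 2 < R: some information is lost
approx3 = truncated_svd(a, 3)   # K = R: the matrix is recovered

print(np.linalg.matrix_rank(approx2), np.linalg.norm(a - approx3))
```

A smaller K gives a more compact but coarser representation. The reconstruction error (Frobenius norm) equals the square root of the sum of the squared discarded singular values.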
Huang, J.; Rogers, S.; Joo, E. (2014). Improving Restaurants by Extracting Subtopics from Yelp Reviews. In iConference 2014 (Social Media Expo)
• Goal
  • To help restaurants increase their Yelp ratings by identifying what customers ask for in a large volume of high-dimensional reviews
• Result
  • They were able to show what users cared about most in their reviews of a restaurant
  • They were able to isolate the areas of interest for each restaurant
Kim, Ji Won (2014). "Scan and click: The uses and gratifications of social recommendation systems". Computers in Human Behavior, 33, pp. 184-191
• Goal
  • To understand why and how people use social recommendation systems to express opinions
• Result
  • Users viewed social recommendations as potentially expressive tools and used them to express their opinions
  • Results suggest that social recommendation systems offer an additional way to scan collective opinions and express individual opinions
Lu, J., Shambour, Q., Xu, Y., Lin, Q. and Zhang, G. (2013), A web-based personalized business partner recommendation system using fuzzy semantic techniques. Computational Intelligence, 29: 37–69. doi:10.1111/j.1467-8640.2012.00427.x
• Problem
  • Business users searching for relevant business partners are overwhelmed by the amount of information they must sift through
• Solution
  • An intelligent recommendation system called Smart BizSeeker was created to recommend relevant business partners to individual business users
  • The hybrid fuzzy semantic recommendation approach improves on classical collaborative-filtering approaches because the semantic attributes of items provide additional information that alleviates the sparsity and cold-start problems
Z. Huang, W. Chung and H. Chen. A graph model for E-commerce recommender systems. J. Am. Soc. Inf. Sci. Technol. 55(3), pp. 259-274. 2004.
• Goal
  • Develop a graph model that provides a generic data representation able to support different recommendation methods
  • Discover whether combining content-based information with collaborative information improves recommendation quality
• Result
  • The usefulness and flexibility of the graph model were shown by using it with three recommendation methods and a data set from an online bookstore
  • A hybrid recommendation approach achieved greater performance than the collaborative or content-based approach
Rahman, M., Carbunar, B., Ballesteros, J. and Chau, D. H. (2015), To catch a fake: Curbing deceptive Yelp ratings and venues. Statistical Analysis and Data Mining, 8: 147–161.
• Goal
  • Find a way to detect venues whose ratings were affected by fraudulent reviews
• Result
  • A novel system named Marco that identifies venues with abnormal review spikes, series of dissenting reviews, or impactful yet suspicious reviews
  • Marco was demonstrated to be both fast and effective on the tested dataset, with a classification accuracy of 94% for reviews and 95.8% for venues
Gry Agnete Alsos, Elisabet Ljunggren, Liv Toril Pettersen, "Farm-based entrepreneurs: what triggers the start-up of new business activities?", Journal of Small Business and Enterprise Development, Vol. 10 Iss: 4, pp. 435-443
• Goal
  • Investigate why farmers start additional business activities
Weihong, H., Yi, C.: An E-commerce recommender system based on content-based filtering. Wuhan Univ. J. Nat. Sci. 5, 1091–1096 (2006)
• Goal
  • Build an e-commerce recommender system based on content-based filtering
Problem Bob is looking to start a business and make some money
Problem Bob does not know what kind of business to start or where to start it
Problem Bob needs a way to quickly and easily discover a lucrative business idea and where to start it, but there are no existing solutions!
Problem It would be even better if his business was the first to fulfill a big consumer need!
Solution Generating Business Venture Recommendations
Solution
• Analyze the data and determine, for each area, the hottest business categories based on overall stars and review count
• Knowing what is successful in other areas, we can recommend starting a business in one of those categories in an area that does not yet have significant traffic to such a business
• Bob, optionally, enters search terms to narrow down the kind of business he is interested in starting
Solution
• Bob receives recommendations of hot business categories related to his interests, along with locations where there is not yet large traffic for that business category
• Bob gathers information from the local populace on their potential interest in his new business
• If he gets positive feedback, he can start planning his business! Otherwise, he moves on to another of our recommendations and tries again.
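The recommendation workflow above can be sketched end to end on toy data. The record fields mirror the stars and review counts mentioned in the slides, but the "hotness" score (mean stars times total review count), the two thresholds, and the example businesses are assumptions made for illustration.

```python
from collections import defaultdict

# Hypothetical records shaped loosely like Yelp business entries
businesses = [
    {"city": "Phoenix", "category": "Ramen", "stars": 4.5, "review_count": 800},
    {"city": "Phoenix", "category": "Ramen", "stars": 4.0, "review_count": 400},
    {"city": "Phoenix", "category": "Car Wash", "stars": 3.0, "review_count": 50},
    {"city": "Madison", "category": "Ramen", "stars": 4.0, "review_count": 20},
    {"city": "Madison", "category": "Car Wash", "stars": 4.5, "review_count": 600},
]

def recommend_ventures(businesses, min_hotness, max_traffic, interests=None):
    """Return (category, city) pairs: categories hot somewhere, placed in
    cities where that category still has little traffic."""
    stars = defaultdict(list)
    traffic = defaultdict(int)   # total review count per (city, category)
    for b in businesses:
        key = (b["city"], b["category"])
        stars[key].append(b["stars"])
        traffic[key] += b["review_count"]

    # Assumed hotness score: mean stars * total review count in that city.
    hot = {
        category
        for (city, category), s in stars.items()
        if (sum(s) / len(s)) * traffic[(city, category)] >= min_hotness
    }
    if interests:  # optional search terms narrow the categories
        terms = [t.lower() for t in interests]
        hot = {c for c in hot if any(t in c.lower() for t in terms)}

    cities = {b["city"] for b in businesses}
    return sorted(
        (category, city)
        for category in hot
        for city in cities
        if traffic.get((city, category), 0) <= max_traffic
    )

print(recommend_ventures(businesses, min_hotness=1000, max_traffic=100))
print(recommend_ventures(businesses, 1000, 100, interests=["ramen"]))
```

The output is only a candidate list. Gathering interest from the local populace remains a manual step.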
Solution
Data: Yelp Challenge Dataset
• 2.7M reviews and 649K tips by 687K users for 86K businesses
• 566K business attributes, e.g., hours, parking availability, ambience
• Social network of 687K users with a total of 4.2M social edges
• Aggregated check-ins over time for each of the 86K businesses
• 200,000 pictures from the included businesses
Solution
Data: Yelp Challenge Dataset, Assumptions
• Yelp has removed fraudulent reviews and the data is legitimate
• Results from analyzing the Yelp dataset are representative of all the businesses in each location
Solution
Data
• Need a database of keywords associated with each business category, or generate one
Solution
Data Preprocessing (if needed)
• Remove duplicate data (if Yelp hasn't already done so for us)
• Missing values: decide on one of the following
  • Remove instances that have missing values
  • Estimate missing values
  • Ignore missing values when running the data mining algorithm
• Aggregation of features
• Discretization of continuous values
• Feature selection
• Feature extraction
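A few of the preprocessing steps listed above (duplicate removal, estimating missing values, and discretization) can be sketched in plain Python. The row layout, the mean-based estimate, and the bin edges are assumptions; the slides leave the actual choices open.

```python
from bisect import bisect_right

def drop_duplicates(rows):
    """Keep the first occurrence of each identical row, preserving order."""
    seen, unique = set(), []
    for row in rows:
        key = tuple(sorted(row.items()))
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique

def impute_mean(rows, field):
    """Estimate missing values (None) in `field` with the mean of known values."""
    known = [r[field] for r in rows if r[field] is not None]
    mean = sum(known) / len(known)
    return [{**r, field: mean if r[field] is None else r[field]} for r in rows]

def discretize(value, edges, labels):
    """Map a continuous value to a label; a bin includes its lower edge."""
    return labels[bisect_right(edges, value)]

# Hypothetical rows with one duplicate and one missing star rating
rows = [
    {"id": 1, "stars": 4.0},
    {"id": 1, "stars": 4.0},
    {"id": 2, "stars": None},
    {"id": 3, "stars": 2.0},
]
clean = impute_mean(drop_duplicates(rows), "stars")
bins = [discretize(r["stars"], [3.0, 4.0], ["low", "mid", "high"]) for r in clean]
print(clean, bins)
```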