This paper presents an advanced approach to matrix factorization (MF) tailored for implicit feedback databases. We explore the significance of initialization methods, including naive and SimFactor approaches, emphasizing their impact on model accuracy. Through empirical experiments on real-world datasets, we demonstrate that context-aware similarity preservation significantly enhances MF performance. Our findings suggest that leveraging additional metadata and contextual information leads to better feature representations, which is crucial for improving recommendation quality.
Enhancing Matrix Factorization Through Initialization for Implicit Feedback Databases
Balázs Hidasi • Domonkos Tikk
Gravity R&D Ltd. • Budapest University of Technology and Economics
CaRR workshop, 14th February 2012, Lisbon
Outline
• Matrix factorization
• Initialization concept
• Methods
  • Naive
  • SimFactor
• Results
• Discussion
Matrix Factorization
• Collaborative filtering
• One of the most common approaches
• Approximates the rating matrix as a product of low-rank matrices
  [Figure: R ≈ P · Q, with users as rows of R and items as columns]
Matrix factorization
• Initialize P and Q with small random numbers
• Train P and Q
  • Alternating least squares (ALS)
  • Gradient descent
  • Etc.
• Transforms the data to a feature space
  • Separately for users and items
• Noise reduction
• Compression
• Generalization
(A minimal training sketch follows below.)
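As a rough illustration of the recipe above (not code from the talk), here is a minimal MF sketch in Python using stochastic gradient descent on explicit (user, item, rating) triples; the slide also mentions ALS as an alternative. All names and hyperparameters are illustrative.

```python
import numpy as np

def mf_sgd(ratings, num_users, num_items, K=20, lr=0.01, reg=0.02, epochs=10):
    """Approximate the rating matrix R as P @ Q.T, trained by SGD
    over observed (user, item, rating) triples."""
    rng = np.random.default_rng(0)
    # Initialize P and Q with small random numbers (the baseline the talk improves on)
    P = 0.01 * rng.standard_normal((num_users, K))
    Q = 0.01 * rng.standard_normal((num_items, K))
    for _ in range(epochs):
        for u, i, r in ratings:
            p_u, q_i = P[u].copy(), Q[i].copy()
            err = r - p_u @ q_i                    # prediction error for this event
            P[u] += lr * (err * q_i - reg * p_u)   # gradient step on the user factors
            Q[i] += lr * (err * p_u - reg * q_i)   # gradient step on the item factors
    return P, Q
```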
Implicit feedback
• No ratings
• User–item interactions (events)
• Much noisier
  • Presence of an event might not be positive feedback
  • Absence of an event does not mean negative feedback
  • No negative feedback is available!
• More common problem
• MF for implicit feedback
  • Less accurate results due to noise
  • Mostly ALS is used
  • Scalability problems (the rating matrix is dense)
(A sketch of a common implicit-data preparation step follows below.)
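Since the slide notes that ALS is the usual choice for implicit data, a common preparation step is the preference/confidence split used by implicit-ALS-style methods. The sketch below shows one standard way to build these matrices; it is an assumption for illustration, not necessarily the exact weighting used in the talk, and `alpha` is an illustrative hyperparameter.

```python
import numpy as np
from scipy.sparse import csr_matrix

def to_preference_confidence(event_counts: csr_matrix, alpha: float = 40.0):
    """Turn a sparse user-by-item matrix of event counts into a binary
    preference matrix and per-cell confidence weights for observed cells."""
    P = event_counts.copy()
    P.data = np.ones_like(P.data, dtype=float)   # preference: 1 wherever any event occurred
    C = event_counts.copy().astype(float)
    C.data = 1.0 + alpha * C.data                # confidence grows with the event count
    return P, C                                  # unobserved cells keep baseline confidence 1
```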
Concept
• A good MF model
  • The feature vectors of similar entities are similar
  • If the data is too noisy → similar entities won't be similar by their features
• Start MF from a "good" point
  • Feature vector similarities are OK
• Data is more than just events
  • Metadata
    • Info about items/users
  • Contextual data
    • In what context did the event occur
• Can we incorporate those to help implicit MF?
Naive approach
• Describe items using any data we have (detailed later)
• Long, sparse vectors for item description
• Compress these vectors to dense feature vectors
  • PCA, MLP, MF, …
  • Length of desired vectors = number of features in MF
• Use these features as starting points
Naive approach
• Compression and also noise reduction
• Does not really care about similarities
  • But often feature similarities are not that bad
• If MF is used
  • Half of the results are thrown out
  [Figure: description of items (items × descriptors) ≈ item features · descriptor features]
(A sketch of this compression step follows below.)
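A possible rendering of the naive initialization, assuming a truncated SVD as the compressor (the slides also list PCA, MLP, and generic MF as alternatives); `D` and `K` are illustrative names for the description matrix and the number of MF features.

```python
import numpy as np
from scipy.sparse.linalg import svds

def naive_init(D, K):
    """Compress the long, sparse item-description matrix D (items x descriptors)
    into K dense features and use them as the MF starting point."""
    U, s, Vt = svds(D.astype(float), k=K)   # D ≈ U @ diag(s) @ Vt
    item_features = U * s                   # items x K: the part we keep
    # Vt (the descriptor-side factors) is the half of the result that is thrown out
    return item_features
```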
SimFactor algorithm
• Try to preserve similarities better
• Starting from an MF of the item description: D ≈ X · Y' (item features × descriptor features)
• Similarities of items: S = D · D'
• Some metrics require a transformation on D
SimFactor algorithm
• Similarity approximation: S = D · D' ≈ X · Y' · Y · X'
• Y'Y is a K×K symmetric matrix
• Eigendecomposition of Y'Y
SimFactor algorithm
• Eigendecomposition: Y'Y = U · λ · U'
• λ is diagonal → λ = SQRT(λ) · SQRT(λ)
• X · U · SQRT(λ) = (SQRT(λ) · U' · X')' = F
• F is an M×K matrix
• S ≈ F · F' → F is used for initialization
(A sketch of the full SimFactor initialization follows below.)
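Putting the steps together, here is a sketch of a SimFactor-style initialization, assuming a truncated SVD for the initial factorization D ≈ X · Y'. The variable names follow the slides; the SVD choice and the eigenvalue clipping are assumptions, not details given in the talk.

```python
import numpy as np
from scipy.sparse.linalg import svds

def simfactor_init(D, K):
    """SimFactor-style initialization: factorize the description matrix D,
    then rotate/scale the item factors so that F @ F.T approximates the
    item-similarity matrix S = D @ D.T."""
    # Step 1: low-rank factorization of the description matrix, D ≈ X @ Y.T
    U0, s, Vt = svds(D.astype(float), k=K)
    X = U0 * s                            # item features, M x K
    Y = Vt.T                              # descriptor features, N x K

    # Step 2: eigendecomposition of the small K x K symmetric matrix Y'Y
    G = Y.T @ Y
    lam, U = np.linalg.eigh(G)            # G = U @ diag(lam) @ U.T
    lam = np.clip(lam, 0.0, None)         # guard against tiny negative eigenvalues

    # Step 3: F = X @ U @ sqrt(lam), so that S = D D' ≈ X (Y'Y) X' = F F'
    F = X @ U * np.sqrt(lam)              # column k is scaled by sqrt(lam_k)
    return F                              # M x K, used to initialize the MF item factors
```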
Creating the description matrix
• "Any" data about the entity
• Vector-space representation
• For items:
  • Metadata vector (title, category, description, etc.)
  • Event vector (who bought the item)
  • Context-state vector (in which context state was it bought)
  • Context-event vector (in which context state who bought it)
• For users:
  • All of the above except metadata
• Currently: choose one source for the D matrix
• Context used: seasonality
(A sketch of building a context-state description matrix follows below.)
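As an illustration of one description-matrix source, the sketch below builds an item-by-context-state count matrix (e.g. seasonality bins). The `state_of` mapping from timestamps to context states is a hypothetical helper, not something defined in the talk.

```python
import numpy as np
from scipy.sparse import coo_matrix

def context_state_description(events, num_items, num_states, state_of):
    """Item-by-context-state description matrix D: D[i, c] counts how many
    events on item i happened in context state c (e.g. a seasonality bin)."""
    rows, cols = [], []
    for _user, item, timestamp in events:
        rows.append(item)
        cols.append(state_of(timestamp))   # map the timestamp to a context-state index
    data = np.ones(len(rows))
    # duplicate (row, col) pairs are summed when converting to CSR
    return coo_matrix((data, (rows, cols)), shape=(num_items, num_states)).tocsr()
```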
Experiments: Similarity preservation
• Real-life dataset: online grocery shopping events
• SimFactor approximates similarities better
Experiments: Initialization
• Using different description matrices
• And both naive and SimFactor initialization
• Baseline: random initialization
• Evaluation metric: recall@50 (see the sketch below)
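For reference, recall@50 can be computed per user as the fraction of held-out test items that appear among the top 50 recommendations. The sketch below is a generic implementation, not the talk's evaluation code.

```python
import numpy as np

def recall_at_k(ranked_items, test_items, k=50):
    """Fraction of a user's held-out items found in the top-k recommendations."""
    if not test_items:
        return 0.0
    hits = len(set(ranked_items[:k]) & set(test_items))
    return hits / len(test_items)

# Averaged over users (recs and held_out are assumed per-user mappings):
# mean_recall = np.mean([recall_at_k(recs[u], held_out[u], 50) for u in users])
```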
Experiments: Grocery DB
• Up to 6% improvement
• Best methods use SimFactor and user context data
Experiments: "Implicitized" MovieLens
• Keeping 5-star ratings → implicit events
• Up to 10% improvement
• Best methods use SimFactor and item context data
Discussion of results
• SimFactor yields better results than naive
• Context information yields better results than other descriptions
• Context information separates well between entities
  • Grocery: user context
    • People's routines
    • Different types of shopping at different times
  • MovieLens: item context
    • Different types of movies watched at different hours
• Context-based similarity
Why context?
• Grocery example
  • Correlation between context states by users is low
• Why can context-aware algorithms be efficient?
  • Different recommendations in different context states
  • Context differentiates well between entities
  • Easier subtasks
Conclusion & Future work
• SimFactor → similarity-preserving compression
• Similarity-based MF initialization:
  • Build the description matrix from any data
  • Apply SimFactor
  • Use the output as initial features for MF
• Context differentiates well between entities
• Future work:
  • Mixed description matrix (multiple data sources)
  • Multiple description matrices
  • Using different context information
  • Using different similarity metrics
Thanks for your attention!
For more of my recommender systems related research visit my website: http://www.hidasi.eu