Enhancing matrix factorization through initialization for implicit feedback databases

Enhancing Matrix Factorization Through Initialization for Implicit Feedback Databases

Balázs Hidasi

Domonkos Tikk

Gravity R&D Ltd.

Budapest University of Technology and Economics

CaRR workshop, 14th February 2012, Lisbon


Outline

  • Matrix factorization

  • Initialization concept

  • Methods

    • Naive

    • SimFactor

  • Results

  • Discussion


Matrix factorization

  • Collaborative filtering

  • One of the most common approaches

  • Approximates the rating matrix as a product of low-rank matrices

[Figure: rating matrix R (Users × Items) approximated as the product P · Q]


Matrix factorization (cont.)

  • Initialize P and Q with small random numbers

  • Train P and Q

    • Alternating least squares

    • Gradient descent

    • Etc.

  • Transforms the data to a feature space

    • Separately for users and items

    • Noise reduction

    • Compression

    • Generalization
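The slide's recipe (random initialization, then training on the observed entries) can be sketched with stochastic gradient descent, one of the listed options; the toy rating matrix, learning rate, and regularization below are illustrative assumptions, not settings from the talk.

```python
import numpy as np

def mf_sgd(R, K=8, lr=0.01, reg=0.02, epochs=200, seed=0):
    """Basic MF trained by SGD: R (users x items) ~ P @ Q.T,
    with P and Q initialized to small random numbers."""
    rng = np.random.default_rng(seed)
    n_users, n_items = R.shape
    P = 0.1 * rng.standard_normal((n_users, K))
    Q = 0.1 * rng.standard_normal((n_items, K))
    rows, cols = np.nonzero(R)            # observed entries only
    for _ in range(epochs):
        for u, i in zip(rows, cols):
            e = R[u, i] - P[u] @ Q[i]     # prediction error on one entry
            P[u] += lr * (e * Q[i] - reg * P[u])
            Q[i] += lr * (e * P[u] - reg * Q[i])
    return P, Q
```

ALS instead alternates closed-form least-squares solves for P and Q; both start from the same small random initialization that the later slides aim to replace.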


Implicit feedback

  • No ratings

  • User-item interactions (events)

  • Much noisier

    • Presence of an event might not be positive feedback

    • Absence of an event does not mean negative feedback

    • No negative feedback is available!

  • More common problem

  • MF for implicit feedback

    • Less accurate results due to noise

    • Mostly ALS is used

    • Scalability problems (rating matrix is dense)
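A minimal sketch of the ALS variant commonly used for implicit feedback (the Hu–Koren–Volinsky confidence weighting), with dense arrays purely for illustration; `alpha` and the regularization value are assumptions, not the talk's settings. The density problem the slide mentions is visible here: every user–item cell carries a confidence, and the precomputed `QtQ` plus the sparse correction term is what keeps real implementations tractable.

```python
import numpy as np

def ials_user_sweep(R, P, Q, alpha=40.0, reg=0.1):
    """One user-side half-sweep of implicit ALS.

    R holds event counts (users x items). Preference p_ui is 1 where an
    event occurred; confidence c_ui = 1 + alpha * R[u, i] everywhere,
    so unobserved cells act as low-confidence negative feedback.
    """
    K = Q.shape[1]
    QtQ = Q.T @ Q                          # shared across all users
    pref = (R > 0).astype(float)
    for u in range(R.shape[0]):
        cu = 1.0 + alpha * R[u]
        # Q' Cu Q = QtQ + Q' diag(cu - 1) Q; cu - 1 is nonzero only
        # where user u has events, which is what makes this scale.
        A = QtQ + (Q.T * (cu - 1.0)) @ Q + reg * np.eye(K)
        b = (Q.T * cu) @ pref[u]
        P[u] = np.linalg.solve(A, b)       # closed-form least squares
    return P
```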


Concept

  • A good MF model

    • The feature vectors of similar entities are similar

    • If the data is too noisy → similar entities won't be similar by their features

  • Start MF from a "good" point

    • Feature vector similarities are OK

  • Data is more than just events

    • Metadata

      • Info about items/users

    • Contextual data

      • In what context did the event occur

    • Can we incorporate those to help implicit MF?


Naive approach

  • Describe items using any data we have (detailed later)

    • Long, sparse vectors for item description

  • Compress these vectors into dense feature vectors

    • PCA, MLP, MF, …

    • Length of desired vectors = number of features in MF

  • Use these features as starting points
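The compression step above can be sketched with truncated SVD standing in for one of the listed compressors; the toy dimensions in the test are assumed.

```python
import numpy as np

def naive_init(D, K):
    """Compress long, sparse item-description vectors (rows of D) into
    K dense features; these become the starting item features for MF."""
    U, s, Vt = np.linalg.svd(D, full_matrices=False)
    return U[:, :K] * s[:K]               # M x K item features
```

Note that the descriptor-side factor `Vt` is simply discarded here, which is the "half of the results is thrown out" issue raised on the next slide.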


Naive approach (cont.)

  • Compression and also noise reduction

  • Does not really care about similarities

  • But often the feature similarities are not that bad

  • If MF is used

    • Half of the results is thrown out (the descriptor-side features)

[Figure: description of items (Items × Descriptors) factorized into item features and descriptor features; only the item features are kept]


SimFactor algorithm

  • Try to preserve similarities better

  • Starting from an MF of the item description matrix

  • Similarities of items: S = D · D'

    • Some metrics require a transformation on D

[Figure: description matrix D (Items × Descriptors) ≈ item features × descriptor features; item similarities S = D · D']


SimFactor algorithm (cont.)

  • Similarity approximation: S = D · D' ≈ X · (Y'Y) · X'

  • Y'Y is a K×K symmetric matrix

    • Eigendecomposition

[Figure: S ≈ X · (Y'Y) · X', with X the item features and Y' the descriptor features]


SimFactor algorithm (cont.)

  • λ is diagonal → λ = SQRT(λ) · SQRT(λ)

  • F = X · U · SQRT(λ) = (SQRT(λ) · U' · X')'

  • F is an M×K matrix

  • S ≈ F · F' → F is used for initialization

[Figure: Y'Y = U · λ · U'; substituting back, S ≈ X · U · SQRT(λ) · SQRT(λ) · U' · X' = F · F']
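The three SimFactor slides can be put together as a short numpy sketch; truncated SVD stands in for the initial MF of D, and the toy sizes are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
M, N, K = 50, 200, 8                      # items, descriptors, factors

# Hypothetical sparse item-description matrix D (items x descriptors)
D = (rng.random((M, N)) < 0.05).astype(float)

# Step 1: rank-K factorization D ~ X @ Y', here via truncated SVD
U0, s, Vt = np.linalg.svd(D, full_matrices=False)
X = U0[:, :K] * s[:K]                     # item features (M x K)
Y = Vt[:K].T                              # descriptor features (N x K)

# Step 2: item similarities S = D D' ~ X (Y'Y) X'; Y'Y is K x K and
# symmetric, so take its eigendecomposition Y'Y = U diag(lam) U'
lam, U = np.linalg.eigh(Y.T @ Y)
lam = np.clip(lam, 0.0, None)             # guard tiny negative eigenvalues

# Step 3: F = X U SQRT(lam); then S ~ F F', and the M x K matrix F
# initializes the item features of the implicit MF
F = (X @ U) * np.sqrt(lam)
```

Because F · F' = X · (Y'Y) · X', the descriptor-side information that the naive approach throws away is folded back into the item features.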


Creating the description matrix

  • "Any" data about the entity

    • Vector-space representation

  • For items:

    • Metadata vector (title, category, description, etc.)

    • Event vector (who bought the item)

    • Context-state vector (in which context state was it bought)

    • Context-event vector (in which context state who bought it)

  • For users:

    • All of the above except metadata

  • Currently: choose one source for the D matrix

  • Context used: seasonality
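The listed sources can be sketched as count vectors built from a hypothetical event log of (user, item, context-state) triples; all names and sizes here are illustrative, not from the talk.

```python
import numpy as np

# Hypothetical event log: (user, item, context_state) triples, where the
# context state is a seasonality bucket (e.g. day of week), as in the talk
events = [(0, 2, 1), (1, 2, 1), (1, 0, 5), (2, 1, 3)]
n_users, n_items, n_states = 3, 4, 7

# Event vector: who bought the item            -> D is items x users
D_event = np.zeros((n_items, n_users))
# Context-state vector: when was it bought     -> items x states
D_state = np.zeros((n_items, n_states))
# Context-event vector: who bought it and when -> items x (users * states)
D_ctx = np.zeros((n_items, n_users * n_states))

for u, i, c in events:
    D_event[i, u] += 1
    D_state[i, c] += 1
    D_ctx[i, u * n_states + c] += 1
```

The talk uses exactly one such source at a time as D; mixing several sources is listed as future work.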


Experiments: Similarity preservation

  • Real-life dataset: online grocery shopping events

  • SimFactor approximates similarities better


Experiments: Initialization

  • Using different description matrices

  • And both naive and SimFactor initialization

  • Baseline: random initialization

  • Evaluation metric: recall@50
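Recall@50 measures what fraction of each user's held-out items appear in that user's top-50 recommendation list; a minimal sketch (the score arrays and cutoff in the example are illustrative):

```python
import numpy as np

def recall_at_k(scores, held_out, k=50):
    """scores: dict user -> array of predicted scores, one per item.
    held_out: dict user -> list of that user's test items.
    Returns (# held-out items ranked in the top k) / (# held-out items)."""
    hits, total = 0, 0
    for user, items in held_out.items():
        top_k = np.argsort(-scores[user])[:k]   # highest-scored items first
        hits += len(set(top_k) & set(items))
        total += len(items)
    return hits / total
```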


Experiments: Grocery DB

  • Up to 6% improvement

  • Best methods use SimFactor and user context data


Experiments: "Implicitized" MovieLens

  • Keeping 5-star ratings → implicit events

  • Up to 10% improvement

  • Best methods use SimFactor and item context data


Discussion of results

  • SimFactor yields better results than the naive approach

  • Context information yields better results than other descriptions

  • Context information separates well between entities

    • Grocery: user context

      • People's routines

      • Different types of shopping at different times

    • MovieLens: item context

      • Different types of movies watched at different hours

  • Context-based similarity


Why context?

  • Grocery example

    • Correlation between context states by users → low

  • Why can context-aware algorithms be efficient?

    • Different recommendations in different context states

    • Context differentiates well between entities

      • Easier subtasks


Conclusion & Future work

  • SimFactor → similarity-preserving compression

  • Similarity-based MF initialization:

    • Description matrix from any data

    • Apply SimFactor

    • Use the output as initial features for MF

  • Context differentiates between entities well

  • Future work:

    • Mixed description matrix (multiple data sources)

    • Multiple description matrices

    • Using different context information

    • Using different similarity metrics


Thanks for your attention!

For more of my recommender systems related research visit my website: http://www.hidasi.eu