Enhancing Matrix Factorization Through Initialization for Implicit Feedback Databases

Balázs Hidasi

Domonkos Tikk

Gravity R&D Ltd.

Budapest University of Technology and Economics

CaRR workshop, 14th February 2012, Lisbon


Outline

  • Matrix factorization
  • Initialization concept
  • Methods
    • Naive
    • SimFactor
  • Results
  • Discussion


Matrix Factorization

  • Collaborative Filtering
  • One of the most common approaches
  • Approximates the rating matrix as a product of low-rank matrices

[Diagram: rating matrix R (Users × Items) ≈ P · Q]


Matrix Factorization

  • Initialize P and Q with small random numbers
  • Train P and Q
    • Alternating Least Squares
    • Gradient Descent
    • Etc.
  • Transforms the data to a feature space
    • Separately for users and items
    • Noise reduction
    • Compression
    • Generalization
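The training loop described above can be sketched in a few lines; a minimal ALS in Python (not the authors' implementation: it treats every entry of R as observed and omits confidence weighting for brevity):

```python
import numpy as np

def als(R, K=2, reg=0.1, iters=20, seed=0):
    """Minimal ALS sketch for R ≈ P @ Q.T: initialize P and Q with
    small random numbers, then alternately solve the two regularized
    least-squares problems."""
    rng = np.random.default_rng(seed)
    n_users, n_items = R.shape
    P = 0.1 * rng.standard_normal((n_users, K))
    Q = 0.1 * rng.standard_normal((n_items, K))
    I = reg * np.eye(K)
    for _ in range(iters):
        # With Q fixed, the optimal P has a closed form, and vice versa
        P = R @ Q @ np.linalg.inv(Q.T @ Q + I)
        Q = R.T @ P @ np.linalg.inv(P.T @ P + I)
    return P, Q
```

On an exactly low-rank R, a few dozen iterations recover the factorization up to the small regularization bias.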


Implicit feedback

  • No ratings
  • User-item interactions (events)
  • Much noisier
    • Presence of an event might not be positive feedback
    • Absence of an event does not mean negative feedback
    • No negative feedback is available!
  • More common problem
  • MF for implicit feedback
    • Less accurate results due to noise
    • Mostly ALS is used
    • Scalability problems (rating matrix is dense)
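The ALS variant usually used for implicit feedback reweights events by a confidence term; a sketch of one user half-step, assuming Hu-et-al.-style confidences c = 1 + α·r (the α and reg values are illustrative, not from the talk):

```python
import numpy as np

def ials_step(R, P, Q, alpha=40.0, reg=0.1):
    """One ALS half-step for implicit feedback: recompute user factors P
    with item factors Q fixed. Every cell is 'observed' (weight 1 for
    non-events), which is why implicit ALS faces the scalability issue
    the slide mentions. A sketch, not the authors' exact solver."""
    K = Q.shape[1]
    QtQ = Q.T @ Q
    for u in range(R.shape[0]):
        r = R[u]                 # event counts for user u
        c = 1.0 + alpha * r      # confidence: unobserved cells keep weight 1
        # Solve (Q' C_u Q + reg I) p_u = Q' C_u b_u with binary targets b_u
        A = QtQ + Q.T @ ((c - 1.0)[:, None] * Q) + reg * np.eye(K)
        b = Q.T @ (c * (r > 0))
        P[u] = np.linalg.solve(A, b)
    return P
```

The `QtQ` precomputation plus the rank-correction for the few observed items is the standard trick that keeps the dense-matrix cost manageable.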


Concept

  • Good MF model
    • The feature vectors of similar entities are similar
    • If the data is too noisy, similar entities won't be similar by their features
  • Start MF from a "good" point
    • Feature vector similarities are OK
  • Data is more than just events
    • Metadata
      • Info about items/users
    • Contextual data
      • In what context did the event occur
    • Can we incorporate those to help implicit MF?


Naive approach

  • Describe items using any data we have (detailed later)
    • Long, sparse vectors for item description
  • Compress these vectors to dense feature vectors
    • PCA, MLP, MF, …
    • Length of desired vectors = number of features in MF
  • Use these features as starting points
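One way to realize the compression step, assuming truncated SVD as the compressor (the slide equally allows PCA, MLP, or MF; sizes are toy values):

```python
import numpy as np

# Naive initialization sketch: compress long, sparse item descriptors
# into K dense features, then use them to seed the item matrix Q
# instead of small random numbers.
rng = np.random.default_rng(42)
n_items, n_descriptors, K = 100, 500, 10
D = (rng.random((n_items, n_descriptors)) < 0.02).astype(float)  # sparse 0/1

U, s, Vt = np.linalg.svd(D, full_matrices=False)
Q_init = U[:, :K] * s[:K]   # dense n_items × K item features
```

Note that this keeps only the item-side factor `Q_init`; the descriptor-side factor `Vt` is discarded, which is the "half of the results is thrown out" point on the next slide.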


Naive approach

  • Compression and also noise reduction
  • Does not really care about similarities
  • But often the feature similarities are not that bad
  • If MF is used
    • Half of the results is thrown out

[Diagram: description of items ≈ item features × descriptor features]

SimFactor algorithm

  • Try to preserve similarities better
  • Starting from an MF of the item description
  • Similarities of items: DD'
    • Some metrics require a transformation on D

[Diagram: item similarities S = D · D', with D the description-of-items matrix]


SimFactor algorithm

  • Similarity approximation
  • Y'Y is a K×K symmetric matrix
    • Eigendecomposition

[Diagram: S ≈ X (Y'Y) X', with item features X and descriptor features Y; Y'Y is the small K×K inner matrix]


SimFactor algorithm

  • λ is diagonal, so λ = SQRT(λ) · SQRT(λ)
  • X · U · SQRT(λ) = (SQRT(λ) · U' · X')' = F
  • F is an M×K matrix
  • S ≈ F · F' ⇒ F is used for initialization

[Diagram: Y'Y = U λ U'; S ≈ X U SQRT(λ) · SQRT(λ) U' X' = F F']
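The derivation can be checked numerically; a sketch following the slides' notation, where the rank-K factorization of D is exact, so F F' reproduces S = DD' exactly (random toy sizes, not the paper's data):

```python
import numpy as np

# SimFactor sketch: given a rank-K factorization of the description
# matrix D ≈ X Y' (X: item features, Y': descriptor features),
# eigendecompose the small K×K matrix Y'Y and fold it into the item
# side, so that F F' ≈ D D' = S.
rng = np.random.default_rng(0)
M, N, K = 6, 8, 3
X = rng.random((M, K))    # item features (M × K)
Yt = rng.random((K, N))   # descriptor features Y' (K × N)
D = X @ Yt                # description matrix (exactly rank K here)

G = Yt @ Yt.T                   # Y'Y, K×K symmetric PSD
lam, U = np.linalg.eigh(G)      # G = U diag(lam) U'
# Clamp tiny negative eigenvalues from round-off before the square root
F = X @ U @ np.diag(np.sqrt(np.maximum(lam, 0.0)))  # F is M × K

# F F' = X (Y'Y) X' = D D' = S: item similarities are preserved,
# and F can seed the item-feature matrix of the implicit MF.
```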


Creating the description matrix

  • "Any" data about the entity
    • Vector-space representation
  • For items:
    • Metadata vector (title, category, description, etc.)
    • Event vector (who bought the item)
    • Context-state vector (in which context state was it bought)
    • Context-event vector (in which context state who bought it)
  • For users:
    • All of the above except metadata
  • Currently: choose one source for the D matrix
  • Context used: seasonality
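As an illustration, one of the listed sources, the context-state vector, could be assembled like this (the events and context states are purely hypothetical toy data; a context state might be, e.g., a seasonality bucket):

```python
import numpy as np

# D[i, c] counts how often item i was bought in context state c,
# giving one possible description matrix D for SimFactor.
events = [  # (user, item, context_state) triples
    (0, 0, 1), (1, 0, 1), (2, 1, 3), (0, 1, 3), (1, 2, 0),
]
n_items, n_states = 3, 4
D = np.zeros((n_items, n_states))
for _, item, state in events:
    D[item, state] += 1.0
```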


Experiments: Similarity preservation

  • Real-life dataset: online grocery shopping events
  • SimFactor approximates similarities better


Experiments: Initialization

  • Using different description matrices
  • And both naive and SimFactor initialization
  • Baseline: random init
  • Evaluation metric: recall@50
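recall@50 can be computed per user as the fraction of held-out items ranked in the top 50; a minimal sketch of the generic definition (not the authors' exact evaluation code):

```python
import numpy as np

def recall_at_k(scores, test_items, k=50):
    """Fraction of a user's held-out test items that appear in the
    top-k items of the predicted score vector."""
    top_k = np.argsort(-scores)[:k]          # indices of k highest scores
    hits = len(set(top_k.tolist()) & set(test_items))
    return hits / len(test_items)
```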


Experiments: Grocery DB

  • Up to 6% improvement
  • Best methods use SimFactor and user context data


Experiments: "Implicitized" MovieLens

  • Keeping 5-star ratings as implicit events
  • Up to 10% improvement
  • Best methods use SimFactor and item context data


Discussion of results

  • SimFactor yields better results than the naive approach
  • Context information yields better results than other descriptions
  • Context information separates well between entities
    • Grocery: user context
      • People's routines
      • Different types of shopping at different times
    • MovieLens: item context
      • Different types of movies watched at different hours
  • Context-based similarity


Why context?

  • Grocery example
    • Correlation between context states by users is low
  • Why can context-aware algorithms be efficient?
    • Different recommendations in different context states
    • Context differentiates well between entities
      • Easier subtasks


Conclusion & Future work

  • SimFactor: similarity-preserving compression
  • Similarity-based MF initialization:
    • Description matrix from any data
    • Apply SimFactor
    • Use the output as initial features for MF
  • Context differentiates between entities well
  • Future work:
    • Mixed description matrix (multiple data sources)
    • Multiple description matrices
    • Using different context information
    • Using different similarity metrics


Thanks for your attention!

For more of my recommender systems related research visit my website: http://www.hidasi.eu

