Enhancing matrix factorization through initialization for implicit feedback databases

Enhancing Matrix Factorization Through Initialization for Implicit Feedback Databases

Balázs Hidasi

Domonkos Tikk

Gravity R&D Ltd.

Budapest University of Technology and Economics

CaRR workshop, 14th February 2012, Lisbon


Outline

  • Matrix factorization

  • Initialization concept

  • Methods

    • Naive

    • SimFactor

  • Results

  • Discussion


Matrix Factorization

  • Collaborative Filtering

  • One of the most common approaches

  • Approximates the rating matrix as the product of low-rank matrices

[Diagram: the rating matrix R (users × items) approximated as the product of low-rank matrices P and Q]


Matrix Factorization

  • Initialize P and Q with small random numbers

  • Teach P and Q

    • Alternating Least Squares

    • Gradient Descent

    • Etc.

  • Transforms the data to a feature space

    • Separately for users and items

    • Noise reduction

    • Compression

    • Generalization


Implicit Feedback

  • No ratings

  • User-item interactions (events)

  • Much noisier

    • Presence of an event might not be positive feedback

    • Absence of an event does not mean negative feedback

    • No negative feedback is available!

  • More common problem

  • MF for implicit feedback

    • Less accurate results due to noise

    • Mostly ALS is used

    • Scalability problems (rating matrix is dense)
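A common way to feed such data to ALS is to split event counts into a binary preference and a confidence weight (a hedged sketch of the usual weighting scheme; `alpha` is a hypothetical scaling constant, not from the slides):

```python
import numpy as np

# Event counts per (user, item); zero means "no observed event".
events = np.array([[3., 0., 1.],
                   [0., 1., 0.]])

pref = (events > 0).astype(float)  # preference: did any event occur?
alpha = 40.0                       # hypothetical confidence scaling
conf = 1.0 + alpha * events        # absence still gets a small weight
```

Note that `conf` assigns a weight to every user-item pair, so the effective rating matrix is dense, which is exactly the scalability problem mentioned above.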


Concept

  • Good MF model

    • The feature vectors of similar entities are similar

    • If the data is too noisy, similar entities won't be similar by their features

  • Start MF from a "good" point

    • Feature vector similarities are OK

  • Data is more than just events

    • Metadata

      • Info about items/users

    • Contextual data

      • In what context did the event occur

    • Can we incorporate those to help implicit MF?


Naive Approach

  • Describe items using any data we have (detailed later)

    • Long, sparse vectors for item description

  • Compress these vectors to dense feature vectors

    • PCA, MLP, MF, …

    • Length of desired vectors = number of features in the MF

  • Use these features as starting points
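A minimal sketch of these steps with truncated SVD as the compressor (one of the options listed above; the sizes and names are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
M, L, K = 20, 100, 5            # items, description length, MF feature count
# Long, sparse 0/1 item description vectors, stacked into a matrix.
D = (rng.random((M, L)) < 0.05).astype(float)

# Compress each row to K dense features via truncated SVD.
U, s, Vt = np.linalg.svd(D, full_matrices=False)
item_init = U[:, :K] * s[:K]    # M x K dense item features
```

Here `item_init` would seed the item factor matrix of the MF; the descriptor-side factor `Vt` is simply discarded.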


Naive Approach

  • Compression and also noise reduction

  • Does not really care about similarities

  • But often the feature similarities are not that bad

  • If MF is used

    • Half of the result (one factor matrix) is thrown out

[Diagram: the description matrix (items × descriptors) factorized into item features and descriptor features]

SimFactor Algorithm

  • Try to preserve similarities better

  • Starting from an MF of the item description

  • Similarities of items: D · D'

    • Some metrics require a transformation on D

[Diagram: D (description of items) ≈ item features × descriptor features; item similarities S = D · D']


SimFactor Algorithm

  • Similarity approximation: S = D · D' = X · (Y'Y) · X'

  • Y'Y is a K×K symmetric matrix

    • Eigendecomposition: Y'Y = U · λ · U'

Itemfeatures (X’)

Itemsimilarities

(S)

Itemfeatures (X)

Descriptorsfeatures

(Y’)

Descriptorsfeatures

(Y)

Itemfeatures (X’)

Itemsimilarities

(S)

Itemfeatures (X)

Y’Y


SimFactor Algorithm

  • λ is diagonal, so λ = √λ · √λ

  • F = X · U · √λ = (√λ · U' · X')'

  • F is an M×K matrix

  • S ≈ F · F', so F is used for initialization

[Diagram: Y'Y = U · λ · U', so S ≈ (X · U · √λ) · (X · U · √λ)' = F · F']
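The identity behind SimFactor can be checked numerically. Assuming exact factors X and Y with D = X · Y', eigendecomposing Y'Y and folding the eigenvalues into X reproduces the similarities exactly (toy sizes; all names illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
M, N, K = 8, 12, 3
X = rng.normal(size=(M, K))        # item features from an MF of D
Y = rng.normal(size=(N, K))        # descriptor features, D = X @ Y.T

G = Y.T @ Y                        # K x K symmetric Gram matrix
lam, U = np.linalg.eigh(G)         # G = U @ diag(lam) @ U.T, lam >= 0
F = X @ U @ np.diag(np.sqrt(lam))  # M x K matrix used for initialization

S = (X @ Y.T) @ (X @ Y.T).T        # item similarities S = D D'
print(np.allclose(S, F @ F.T))     # True
```

In practice D ≈ X · Y' only approximately, so S ≈ F · F' rather than an exact equality.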


Creating the Description Matrix

  • "Any" data about the entity

    • Vector-space representation

  • For items:

    • Metadata vector (title, category, description, etc.)

    • Event vector (who bought the item)

    • Context-state vector (in which context state it was bought)

    • Context-event vector (in which context state who bought it)

  • For users:

    • All of the above except metadata

  • Currently: choose one source for the D matrix

  • Context used: seasonality
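For example, the context-state variant with seasonality could be built like this (toy data; all names hypothetical):

```python
import numpy as np

# Toy (user, item, season) purchase events.
events = [(0, 0, "winter"), (1, 0, "winter"),
          (0, 1, "summer"), (2, 1, "summer"), (2, 2, "summer")]
seasons = ["winter", "spring", "summer", "autumn"]
n_items = 3

# Context-state description: D[i, c] = event count of item i in state c.
D = np.zeros((n_items, len(seasons)))
for user, item, season in events:
    D[item, seasons.index(season)] += 1
print(D[0])   # [2. 0. 0. 0.]: item 0 was bought twice, both in winter
```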


Experiments: Similarity Preservation

  • Real-life dataset: online grocery shopping events

  • SimFactor approximates similarities better


Experiments: Initialization

  • Using different description matrices

  • And both naive and SimFactor initialization

  • Baseline: random init

  • Evaluation metric: recall@50
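Per user, the recall metric can be computed as (a hypothetical helper, not code from the paper):

```python
def recall_at_k(recommended, relevant, k):
    """Fraction of the user's relevant items that appear in the top-k list."""
    top_k = set(recommended[:k])
    return len(top_k & set(relevant)) / len(relevant)

print(recall_at_k([3, 1, 4, 2, 5], relevant=[1, 9], k=3))  # 0.5
```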


Experiments: Grocery DB

  • Up to 6% improvement

  • The best methods use SimFactor and user context data


Experiments: "Implicitized" MovieLens

  • Keeping 5-star ratings as implicit events

  • Up to 10% improvement

  • The best methods use SimFactor and item context data


Discussion of Results

  • SimFactor yields better results than the naive approach

  • Context information yields better results than other descriptions

  • Context information separates entities well

    • Grocery: user context

      • People's routines

      • Different types of shopping at different times

    • MovieLens: item context

      • Different types of movies watched at different hours

  • Context-based similarity


Why Context?

  • Grocery example

    • Correlation between context states by users is low

  • Why can context-aware algorithms be efficient?

    • Different recommendations in different context states

    • Context differentiates well between entities

      • Easier subtasks


Conclusion &amp; Future Work

  • SimFactor Similaritypreservingcompression

  • Similaritybased MF initialization:

    • Descriptionmatrixfromanydata

    • ApplySimFactor

    • Use output asinitialfeaturesfor MF

  • Contextdifferentitatesbetweenentitieswell

  • Futurework:

    • Mixed descriptionmatrix (multipledatasources)

    • Multipledescriptionmatrix

    • Usingdifferentcontextinformation

    • Usingdifferentsimilaritymetrics


Thanks for your attention!

For more of my recommender-systems research, visit my website: http://www.hidasi.eu

