
Multimedia Databases



  1. Multimedia Databases LSI and SVD

  2. Text - Detailed outline • text • problem • full text scanning • inversion • signature files • clustering • information filtering and LSI

  3. Information Filtering + LSI • [Foltz+,’92] Goal: • users specify interests (= keywords) • the system alerts them on suitable news documents • Major contribution: LSI = Latent Semantic Indexing • latent (‘hidden’) concepts

  4. Information Filtering + LSI Main idea • map each document into some ‘concepts’ • map each term into some ‘concepts’ ‘Concept’:~ a set of terms, with weights, e.g. • “data” (0.8), “system” (0.5), “retrieval” (0.6) -> DBMS_concept

  5. Information Filtering + LSI Pictorially: term-document matrix (BEFORE)

  6. Information Filtering + LSI Pictorially: concept-document matrix and...

  7. Information Filtering + LSI ... and concept-term matrix

  8. Information Filtering + LSI Q: How to search, e.g., for ‘system’?

  9. Information Filtering + LSI A: find the corresponding concept(s); and the corresponding documents

  10. Information Filtering + LSI A: find the corresponding concept(s); and the corresponding documents

  11. Information Filtering + LSI Thus it works like an (automatically constructed) thesaurus: we may retrieve documents that DON’T have the term ‘system’, but they contain almost everything else (‘data’, ‘retrieval’)
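The thesaurus-like behavior can be sketched in a few lines of numpy. The term-document matrix below is invented for illustration (it is not the deck's example); a query for ‘system’ still retrieves a document that never mentions the term, because the document and the term map onto the same latent concept.

```python
import numpy as np

# Hypothetical 5-term x 4-document matrix (terms: data, system,
# retrieval, brain, lung); the values are made up for illustration.
terms = ["data", "system", "retrieval", "brain", "lung"]
A = np.array([[1, 1, 1, 0],
              [1, 1, 0, 0],
              [1, 0, 1, 0],   # doc 2 contains data+retrieval, NOT 'system'
              [0, 0, 0, 1],
              [0, 0, 0, 1]], dtype=float)

U, s, Vt = np.linalg.svd(A, full_matrices=False)
k = 1                                  # keep only the strongest concept
concept_docs = Vt[:k]                  # concept-to-document weights

q = np.zeros(len(terms))
q[terms.index("system")] = 1.0         # query: 'system'
q_concept = U[:, :k].T @ q             # map the query into concept space
scores = q_concept @ concept_docs      # score every document

# doc 2 gets a positive score even though it lacks 'system';
# the medical doc 3 scores ~0.
print(scores)
```

Note that the sign ambiguity of the singular vectors cancels out in the scores, since both `U` and `Vt` come from the same decomposition.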

  12. SVD - Detailed outline • Motivation • Definition - properties • Interpretation • Complexity • Case studies • Additional properties

  13. SVD - Motivation • problem #1: text - LSI: find ‘concepts’ • problem #2: compression / dim. reduction

  14. SVD - Motivation • problem #1: text - LSI: find ‘concepts’

  15. SVD - Motivation • problem #2: compress / reduce dimensionality

  16. Problem - specs • ~10**6 rows; ~10**3 columns; no updates; • random access to any cell(s); small error: OK

  17. SVD - Motivation

  18. SVD - Motivation

  19. SVD - Detailed outline • Motivation • Definition - properties • Interpretation • Complexity • Case studies • Additional properties

  20. SVD - Definition A[n x m] = U[n x r] Λ[r x r] (V[m x r])T • A: n x m matrix (e.g., n documents, m terms) • U: n x r matrix (n documents, r concepts) • Λ: r x r diagonal matrix (‘strength’ of each concept; r: rank of the matrix) • V: m x r matrix (m terms, r concepts)
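The shapes can be checked directly with numpy (a sketch; note that numpy returns the diagonal of Λ as a 1-d vector `s` rather than an r x r matrix):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.random((6, 4))                  # n=6 "documents" x m=4 "terms"

# full_matrices=False gives the reduced SVD used on the slide:
# U is n x r, s holds the r diagonal entries of Lambda, Vt is r x m.
U, s, Vt = np.linalg.svd(A, full_matrices=False)
print(U.shape, s.shape, Vt.shape)            # (6, 4) (4,) (4, 4)
print(np.allclose(A, U @ np.diag(s) @ Vt))   # True: A is recovered exactly
```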

  21. SVD - Definition • A = UΛVT - example:

  22. SVD - Properties THEOREM [Press+92]: it is always possible to decompose a matrix A into A = UΛVT, where • U, Λ, V: unique (*) • U, V: column-orthonormal (i.e., columns are unit vectors, orthogonal to each other) • UTU = I; VTV = I (I: identity matrix) • Λ: diagonal entries (the singular values) are positive and sorted in decreasing order
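These properties are easy to verify numerically (a sketch with a random matrix):

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.random((5, 3))
U, s, Vt = np.linalg.svd(A, full_matrices=False)

# Column-orthonormal factors: UTU = I and VTV = I.
print(np.allclose(U.T @ U, np.eye(3)))                     # True
print(np.allclose(Vt @ Vt.T, np.eye(3)))                   # True
# Singular values: non-negative and sorted in decreasing order.
print(bool(np.all(s >= 0) and np.all(s[:-1] >= s[1:])))    # True
```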

  23. SVD - Example • A = UΛVT - example: [figure: term-document matrix A (terms: data, inf., retrieval, brain, lung; rows: CS and MD documents) written as the product U x Λ x VT]

  24. SVD - Example • A = UΛVT - example: [figure: the same product, with the columns of U labeled CS-concept and MD-concept]

  25. SVD - Example • A = UΛVT - example: U is the doc-to-concept similarity matrix [figure: the same product, with U highlighted]

  26. SVD - Example • A = UΛVT - example: the diagonal of Λ gives the ‘strength’ of each concept (e.g., of the CS-concept) [figure: the same product, with Λ highlighted]

  27. SVD - Example • A = UΛVT - example: VT is the term-to-concept similarity matrix [figure: the same product, with VT highlighted]

  28. SVD - Example • A = UΛVT - example: term-to-concept similarity matrix [figure: the same product, with an entry of VT highlighted]
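The example figures are lost in this transcript, so the block below uses a stand-in matrix with the same structure (CS documents using only ‘data’/‘inf.’/‘retrieval’, MD documents using only ‘brain’/‘lung’; the numbers are invented). The SVD recovers exactly two concepts, one per document group:

```python
import numpy as np

# Hypothetical stand-in: rows are documents (first four CS, last
# three MD), columns are the terms data, inf., retrieval, brain, lung.
A = np.array([[1, 1, 1, 0, 0],
              [2, 2, 2, 0, 0],
              [1, 1, 1, 0, 0],
              [5, 5, 5, 0, 0],
              [0, 0, 0, 2, 2],
              [0, 0, 0, 3, 3],
              [0, 0, 0, 1, 1]], dtype=float)

U, s, Vt = np.linalg.svd(A, full_matrices=False)
# Only two non-zero singular values: the CS and the MD concept.
print(np.round(s, 2))
# Row 0 of Vt is the CS-concept (loads only on data/inf./retrieval);
# row 1 is the MD-concept (loads only on brain/lung).
print(np.round(Vt[:2], 2))
```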

  29. SVD - Detailed outline • Motivation • Definition - properties • Interpretation • Complexity • Case studies • Additional properties

  30. SVD - Interpretation #1 ‘documents’, ‘terms’ and ‘concepts’: • U: document-to-concept similarity matrix • V: term-to-concept sim. matrix • Λ: its diagonal elements: ‘strength’ of each concept

  31. SVD - Interpretation #2 • best axis to project on: (‘best’ = min sum of squares of projection errors)

  32. SVD - Motivation

  33. SVD - Interpretation #2 SVD gives the best axis to project on: v1 (minimum RMS error) [figure: 2-d points with the projection axis v1]

  34. SVD - Interpretation #2

  35. SVD - Interpretation #2 • A = UΛVT - example: [figure: the decomposition, with v1 marked as the first row of VT]

  36. SVD - Interpretation #2 • A = UΛVT - example: variance (‘spread’) on the v1 axis [figure: the decomposition, with the first diagonal entry of Λ highlighted]

  37. SVD - Interpretation #2 • A = UΛVT - example: • UΛ gives the coordinates of the points on the projection axes [figure: the decomposition, with the product UΛ highlighted]

  38. SVD - Interpretation #2 • More details • Q: how exactly is dim. reduction done? [figure: the decomposition A = UΛVT]

  39. SVD - Interpretation #2 • More details • Q: how exactly is dim. reduction done? • A: set the smallest singular values to zero [figure: the decomposition, with the smallest entries of Λ zeroed]
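Zeroing the smallest singular values can be sketched as follows; the Frobenius error of the resulting rank-k approximation equals the energy in the discarded singular values:

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.random((8, 5))
U, s, Vt = np.linalg.svd(A, full_matrices=False)

k = 2                                   # keep the two strongest concepts
s_trunc = s.copy()
s_trunc[k:] = 0                         # zero out the smallest singular values
A_k = U @ np.diag(s_trunc) @ Vt         # rank-k approximation of A

err = np.linalg.norm(A - A_k)           # Frobenius-norm error
print(round(err, 4), "==", round(float(np.sqrt(np.sum(s[k:] ** 2))), 4))
```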

  40. SVD - Interpretation #2 [figure: A ≈ UΛVT with the smallest singular value set to zero]

  41. SVD - Interpretation #2 [figure: the corresponding column of U and row of VT can be dropped]

  42. SVD - Interpretation #2 [figure: the smaller factors that remain]

  43. SVD - Interpretation #2 [figure: A approximated by the resulting low-rank product]

  44. SVD - Interpretation #2 Equivalent: ‘spectral decomposition’ of the matrix: [figure: A written as the product U x Λ x VT]

  45. SVD - Interpretation #2 Equivalent: ‘spectral decomposition’ of the matrix: [figure: the product with columns u1, u2 of U, diagonal entries λ1, λ2 of Λ, and rows v1, v2 of VT labeled]

  46. SVD - Interpretation #2 Equivalent: ‘spectral decomposition’ of the matrix: A = λ1 u1 v1T + λ2 u2 v2T + ... [figure: the n x m matrix as a sum of rank-1 matrices]

  47. SVD - Interpretation #2 ‘spectral decomposition’ of the matrix: A = λ1 u1 v1T + λ2 u2 v2T + ... (r terms; each ui is n x 1, each viT is 1 x m)
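The rank-1 sum on slides 46-47 can be checked directly (a sketch):

```python
import numpy as np

rng = np.random.default_rng(3)
A = rng.random((4, 3))
U, s, Vt = np.linalg.svd(A, full_matrices=False)

# A equals the sum of r rank-1 'concept' matrices: lambda_i * u_i * v_i^T.
A_sum = sum(s[i] * np.outer(U[:, i], Vt[i]) for i in range(len(s)))
print(np.allclose(A, A_sum))            # True
```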

  48. SVD - Interpretation #2 approximation / dim. reduction: keep only the first few terms of A = λ1 u1 v1T + λ2 u2 v2T + ... (assume λ1 >= λ2 >= ...) • Q: how many?

  49. SVD - Interpretation #2 A (heuristic [Fukunaga]): keep 80-90% of the ‘energy’ (= sum of squares of the λi’s), assuming λ1 >= λ2 >= ...
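The energy heuristic amounts to picking the smallest k whose leading singular values carry the desired fraction of the total squared energy. A minimal sketch (`keep_energy` is a hypothetical helper, not from the deck):

```python
import numpy as np

def keep_energy(s, fraction=0.9):
    """Smallest k such that the first k singular values hold at least
    `fraction` of the total energy (sum of squared singular values)."""
    energy = np.cumsum(s ** 2) / np.sum(s ** 2)
    return int(np.searchsorted(energy, fraction) + 1)

s = np.array([10.0, 5.0, 2.0, 0.5, 0.1])
# Energy fractions: first term alone ~77%, first two ~96.7%,
# so k = 2 suffices at the 90% level.
print(keep_energy(s, 0.9))              # 2
```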

  50. SVD - Interpretation #3 • finds the non-zero ‘blobs’ in a data matrix [figure: block-structured matrix and its SVD factors]
