Indexing and Data Mining in Multimedia Databases

Presentation Transcript

  1. Indexing and Data Mining in Multimedia Databases Christos Faloutsos CMU www.cs.cmu.edu/~christos

  2. Outline Goal: ‘Find similar / interesting things’ • Problem - Applications • Indexing - similarity search • New tools for Data Mining: Fractals • Conclusions • Resources C. Faloutsos

  3. Problem Given a large collection of (multimedia) records, find similar/interesting things, i.e.: • Allow fast, approximate queries, and • Find rules/patterns

  4. Sample queries • Similarity search • Find pairs of branches with similar sales patterns • Find medical cases similar to Smith's • Find pairs of sensor series that move in sync • Find shapes like a spark-plug

  5. Sample queries (cont’d) • Rule discovery • Clusters (of branches; of sensor data; ...) • Forecasting (total sales for next year?) • Outliers (e.g., unexpected part failures; fraud detection)

  6. Outline Goal: ‘Find similar / interesting things’ • Problem - Applications • Indexing - similarity search • New tools for Data Mining: Fractals • Conclusions • Related projects @ CMU and resources

  7. Indexing - Multimedia Problem: • given a set of (multimedia) objects, • find the ones similar to a desired query object

  8. [Figure: three time-series plots, $price vs. day (1 to 365)] distance function: by expert

  9. ‘GEMINI’ - Pictorially [Figure: sequences S1, ..., Sn (value vs. day, 1 to 365) are mapped to points F(S1), ..., F(Sn) in feature space; feature axes: e.g., avg and e.g., std]
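
The mapping F pictured above can be sketched as follows. This is a minimal illustration, not the author's implementation: it assumes the two features are the average and the standard deviation (as the figure labels suggest), it uses a root-mean-square distance as a stand-in for the expert's distance function, and a linear scan over the feature points stands in for a spatial access method. The names `features`, `true_distance` and `range_query` are hypothetical. With these particular choices the distance between feature points never exceeds the RMS distance between the sequences, so the filter step discards no true answer.

```python
import numpy as np

def features(series):
    """Map a sequence S to a 2-d feature point F(S) = (average, std)."""
    s = np.asarray(series, dtype=float)
    return np.array([s.mean(), s.std()])

def true_distance(a, b):
    """Assumed stand-in for the expert's distance: RMS difference."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return float(np.sqrt(np.mean((a - b) ** 2)))

def range_query(collection, feats, query, eps):
    """Return indices of sequences within eps of the query."""
    fq = features(query)
    # Filter step: cheap test in feature space (a linear scan here;
    # a spatial index would take its place in practice).
    candidates = np.nonzero(np.linalg.norm(feats - fq, axis=1) <= eps)[0]
    # Refine step: discard false alarms with the full distance.
    return [int(i) for i in candidates
            if true_distance(collection[i], query) <= eps]

# Synthetic data: 200 random-walk "price" sequences of 365 days each.
rng = np.random.default_rng(0)
collection = rng.normal(size=(200, 365)).cumsum(axis=1)
feats = np.array([features(s) for s in collection])

# Query: a slightly perturbed copy of sequence 0.
query = collection[0] + rng.normal(scale=0.1, size=365)
hits = range_query(collection, feats, query, eps=1.0)
print(hits)
```

The filter-and-refine split is the design point: most sequences are rejected using two numbers each, and the full distance is computed only for the survivors.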

  10. Remaining issues • how to extract features automatically? • how to merge similarity scores from different media?

  11. Outline Goal: ‘Find similar / interesting things’ • Problem - Applications • Indexing - similarity search • Visualization: FastMap • Relevance feedback: FALCON • Data Mining / Fractals • Conclusions

  12. FastMap [Figure: labels ‘~100’, ‘~1’, ‘??’]

  13. FastMap • Multi-dimensional scaling (MDS) can do that, but in O(N**2) time • We want a linear algorithm: FastMap [SIGMOD95]
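
A minimal sketch of a FastMap-style projection, assuming the commonly described procedure: pick two far-apart pivot objects, place every object on the line through them with the cosine law, then repeat on the residual distances. Only a distance function is needed, and each output coordinate costs a number of distance evaluations linear in N. The function name, its parameters and the pivot heuristic details are illustrative assumptions, not taken from the slides.

```python
import numpy as np

def fastmap(objects, dist, k, seed=0):
    """Map objects to k-d points using only the distance function `dist`."""
    rng = np.random.default_rng(seed)
    n = len(objects)
    coords = np.zeros((n, k))

    def d2(i, j, col):
        # Squared distance left over after the first `col` coordinates.
        base = dist(objects[i], objects[j]) ** 2
        resid = base - np.sum((coords[i, :col] - coords[j, :col]) ** 2)
        return max(resid, 0.0)

    for col in range(k):
        # Pivot heuristic: start anywhere, jump to the farthest object twice.
        b = int(rng.integers(n))
        a = max(range(n), key=lambda i: d2(i, b, col))
        b = max(range(n), key=lambda i: d2(i, a, col))
        dab2 = d2(a, b, col)
        if dab2 == 0.0:
            break  # nothing left to explain; remaining coordinates stay 0
        for i in range(n):
            # Cosine law: position of object i along the line a-b.
            coords[i, col] = (d2(a, i, col) + dab2 - d2(b, i, col)) / (2.0 * np.sqrt(dab2))
    return coords

def euclid(p, q):
    return float(np.linalg.norm(np.asarray(p) - np.asarray(q)))

# Usage: 30 random 2-d points, mapped back to 2-d from distances alone.
pts = np.random.default_rng(1).random((30, 2))
coords = fastmap(pts, euclid, k=2)
print(coords[:3])
```

For truly Euclidean input whose dimensionality equals k, the pairwise distances of the output should match those of the input up to rounding; for arbitrary distance functions the mapping is only approximate.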

  14. Applications: time sequences • given n co-evolving time sequences • visualize them + find rules [ICDE00] [Figure: exchange rate vs. time; series DEM, JPY, HKD]

  15. Applications - financial • currency exchange rates [ICDE00] [Figure: plot with labels FRF, GBP, JPY, HKD, USD(t), USD(t-5)]

  16. Applications - financial • currency exchange rates [ICDE00] [Figure: plot with labels FRF, DEM, HKD, JPY, USD, GBP, USD(t), USD(t-5)]

  17. Application: VideoTrails [ACM MM97]

  18. VideoTrails - usage • scene-cut detection (about 10% errors) • scene classification (e.g., dialogue vs. action)

  19. Outline Goal: ‘Find similar / interesting things’ • Problem - Applications • Indexing - similarity search • Visualization: FastMap • Relevance feedback: FALCON • Data Mining / Fractals • Conclusions

  20. Merging similarity scores • e.g., video: text, color, motion, audio • weights change with the query! • solution 1: user specifies weights • solution 2: user gives examples, and we ‘learn’ what he/she wants: relevance feedback (Rocchio, MARS, MindReader) • but: how about disjunctive queries?
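
The two solutions above can be sketched as follows, under stated assumptions. `merged_distance` is a hypothetical helper that combines per-medium distances with user-supplied weights (solution 1). `rocchio` is the textbook Rocchio update, which moves a single query point toward the average of the examples marked relevant (solution 2); the default values of `alpha`, `beta` and `gamma` are illustrative, not taken from the slides.

```python
import numpy as np

def merged_distance(per_medium, weights):
    """Weighted average of per-medium distances (weights given by the user)."""
    w = np.asarray(weights, dtype=float)
    return float(np.dot(w / w.sum(), np.asarray(per_medium, dtype=float)))

def rocchio(query, relevant, nonrelevant, alpha=1.0, beta=0.75, gamma=0.15):
    """Textbook Rocchio update: one new query point learned from examples."""
    q = alpha * np.asarray(query, dtype=float)
    if len(relevant):
        q = q + beta * np.mean(relevant, axis=0)
    if len(nonrelevant):
        q = q - gamma * np.mean(nonrelevant, axis=0)
    return q

# Distances of two video clips to a query, per medium:
# (text, color, motion, audio). The numbers are made up.
clip_a = [0.2, 0.9, 0.5, 0.4]
clip_b = [0.8, 0.1, 0.5, 0.4]

text_heavy = [4, 1, 1, 1]
color_heavy = [1, 4, 1, 1]
print(merged_distance(clip_a, text_heavy), merged_distance(clip_b, text_heavy))
print(merged_distance(clip_a, color_heavy), merged_distance(clip_b, color_heavy))

new_query = rocchio([0.0, 0.0], [[1.0, 1.0], [3.0, 1.0]], [],
                    alpha=1.0, beta=1.0, gamma=0.0)
print(new_query)
```

Changing the weights flips which clip ranks first, which is the point of "weights change with the query". The Rocchio update always yields a single point, so it cannot represent a query whose good examples fall into separate groups.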

  21. ‘FALCON’ [Figure labels: ‘Vs’, ‘Inverted Vs’] Trader wants only ‘unstable’ stocks

  22. “Single query point” methods [Figure: scatter of ‘+’ marks with a single query point ‘x’; panel: Rocchio]

  23. “Single query point” methods [Figure: three panels (Rocchio, MindReader, MARS), each a scatter of ‘+’ marks with a single query point ‘x’] The averaging effect in action...

  24. Main idea: FALCON Contours [Wu+, vldb2000] [Figure: ‘+’ marks with contours around them; axes: feature1 (e.g., temperature), feature2 (e.g., frequency)]
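
One way such contours can arise is sketched below. It assumes an aggregate distance in the form of a power mean with a negative exponent over the distances to all ‘good’ examples, so that being close to any one example is enough (a soft OR). The function name, the value alpha = -5 and the sample points are illustrative assumptions; the slides do not give the formula.

```python
import numpy as np

def aggregate_distance(x, good_points, alpha=-5.0):
    """Power mean (exponent alpha < 0) of the distances from x to each good point."""
    g = np.asarray(good_points, dtype=float)
    d = np.linalg.norm(g - np.asarray(x, dtype=float), axis=1)
    if np.any(d == 0.0):
        return 0.0  # x coincides with a good example
    return float(np.mean(d ** alpha) ** (1.0 / alpha))

# Two separate groups of good examples (a disjunctive query).
good = [[0.0, 0.0], [10.0, 10.0]]

near_one_group = [0.5, 0.0]   # close to the first example
in_between = [5.0, 5.0]       # the centroid of the examples

print(aggregate_distance(near_one_group, good))
print(aggregate_distance(in_between, good))
```

A single-query-point method would rank the centroid (5, 5) best, although it is far from every good example; the negative exponent makes the point near one group score much better than the centroid.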

  25. Conclusions for indexing + visualization • GEMINI: fast indexing, exploiting off-the-shelf SAMs (spatial access methods) • FastMap: automatic feature extraction in O(N) time • FALCON: relevance feedback for disjunctive queries

  26. Outline Goal: ‘Find similar / interesting things’ • Problem - Applications • Indexing - similarity search • New tools for Data Mining: Fractals • Conclusions • Resources

  27. Data mining & fractals – Road map • Motivation – problems / case study • Definition of fractals and power laws • Solutions to posed problems • More examples

  28. Problem #1 - spatial d.m. Galaxies (Sloan Digital Sky Survey w/ B. Nichol) • ‘spiral’ and ‘elliptical’ galaxies (stores & households; mpg & MTBF...) • patterns? (not Gaussian; not uniform) • attraction/repulsion? • separability??

  29. Problem #2: dim. reduction • given attributes x1, ..., xn • possibly, non-linearly correlated • drop the useless ones (Q: why? A: to avoid the ‘dimensionality curse’)

  30. Answer: • Fractals / self-similarities / power laws

  31. What is a fractal? = self-similar point set, e.g., Sierpinski triangle: zero area; infinite length! ...

  32. Definitions (cont’d) • Paradox: Infinite perimeter; Zero area! • ‘dimensionality’: between 1 and 2 • actually: log(3)/log(2) = 1.58… (long story)
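
A short version of the ‘long story’, as a worked equation. It uses the standard similarity dimension of a self-similar set; the symbols N, s, A_n and P_n are introduced here for illustration and do not appear in the slides.

```latex
% Similarity dimension of a set made of N copies of itself, each scaled by s.
% The Sierpinski triangle consists of N = 3 copies scaled by s = 1/2.
D = \frac{\log N}{\log (1/s)}, \qquad
N = 3,\; s = \tfrac{1}{2}
\;\Rightarrow\;
D = \frac{\log 3}{\log 2} \approx 1.585

% The paradox: after n subdivision steps, with A_0 and P_0 the area and
% perimeter of the starting triangle, 3 of 4 sub-triangles are kept.
A_n = \left(\tfrac{3}{4}\right)^{n} A_0 \to 0, \qquad
P_n = \left(\tfrac{3}{2}\right)^{n} P_0 \to \infty
```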

  33. Intrinsic (‘fractal’) dimension • Q: fractal dimension of a line? • e.g.: #cylinders; miles / gallon

  34. Intrinsic (‘fractal’) dimension • Q: fractal dimension of a line? • A: nn(<= r) ~ r^1, where nn(<= r) is the number of neighbors within distance r (‘power law’: y = x^a)

  35. Intrinsic (‘fractal’) dimension • Q: fractal dimension of a line? • A: nn(<= r) ~ r^1 (‘power law’: y = x^a) • Q: fd of a plane? • A: nn(<= r) ~ r^2 • fd == slope of log(nn) vs. log(r)

  36. Sierpinski triangle [Figure: log(#pairs within <= r) vs. log(r); slope 1.58] • this plot == ‘correlation integral’
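
The slope of the correlation integral can be estimated numerically. The sketch below is an illustration under assumptions: points on the Sierpinski triangle are generated with the ‘chaos game’ (repeatedly jumping halfway toward a random corner), pairs are counted by brute force, and the slope is fitted by least squares over a hand-picked range of radii. All function names and parameter values are hypothetical.

```python
import numpy as np

def sierpinski(n, seed=0, burn_in=20):
    """Sample n points of the Sierpinski triangle with the chaos game."""
    rng = np.random.default_rng(seed)
    corners = np.array([[0.0, 0.0], [1.0, 0.0], [0.5, np.sqrt(3.0) / 2.0]])
    p = np.array([0.25, 0.25])
    pts = np.empty((n + burn_in, 2))
    for i, c in enumerate(rng.integers(3, size=n + burn_in)):
        p = (p + corners[c]) / 2.0  # jump halfway toward a random corner
        pts[i] = p
    return pts[burn_in:]  # drop the first points, which are still off the set

def pair_counts(points, radii):
    """Number of point pairs within distance <= r, for each r (brute force)."""
    pts = np.asarray(points, dtype=float)
    diff = pts[:, None, :] - pts[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    pair_d = dist[np.triu_indices(len(pts), k=1)]
    return np.array([(pair_d <= r).sum() for r in radii])

def fractal_dimension(points, radii):
    """Slope of log(#pairs within <= r) vs. log(r)."""
    counts = pair_counts(points, radii)
    slope, _intercept = np.polyfit(np.log(radii), np.log(counts), 1)
    return float(slope)

radii = np.logspace(-2, -1, 8)
fd_triangle = fractal_dimension(sierpinski(1500), radii)

t = np.random.default_rng(1).random(1000)
fd_line = fractal_dimension(np.column_stack([t, 2.0 * t]), radii)

print(f"Sierpinski triangle: {fd_triangle:.2f}, line: {fd_line:.2f}")
```

The estimate depends on the sample size and on the range of radii: radii that are too small leave too few pairs, and radii close to the size of the data set flatten the curve.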

  37. Road map • Motivation – problems / case studies • Definition of fractals and power laws • Solutions to posed problems • More examples • Conclusions

  38. Solution #1: spatial d.m. Galaxies (Sloan Digital Sky Survey w/ B. Nichol - ‘BOPS’ plot - [sigmod2000]) • clusters? • separable? • attraction/repulsion? • data ‘scrubbing’ – duplicates?

  39. Solution #1: spatial d.m. [Figure: log(#pairs within <= r) vs. log(r); curves: ell-ell, spi-spi, spi-ell] • 1.8 slope • plateau! • repulsion!

  40. Solution #1: spatial d.m. [w/ Seeger, Traina, Traina, SIGMOD00] [Figure: log(#pairs within <= r) vs. log(r); curves: ell-ell, spi-spi, spi-ell] • 1.8 slope • plateau! • repulsion!

  41. spatial d.m. Heuristic on choosing # of clusters [Figure: radii r1 and r2]

  42. Solution #1: spatial d.m. [Figure: log(#pairs within <= r) vs. log(r); curves: ell-ell, spi-spi, spi-ell] • 1.8 slope • plateau! • repulsion!

  43. Solution #1: spatial d.m. [Figure: log(#pairs within <= r) vs. log(r); curves: ell-ell, spi-spi, spi-ell; annotation: duplicates] • 1.8 slope • plateau! • repulsion!!

  44. Problem #2: Dim. reduction

  45. Solution: • drop the attributes that don’t increase the ‘partial f.d.’ (PFD) • definition: the PFD of attribute set A is the f.d. of the projected cloud of points [w/ Traina, Traina, Wu, SBBD00]
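
A minimal sketch of the idea, not the published algorithm: it estimates the f.d. as the slope of the pair-count plot and runs a simple greedy forward pass that keeps an attribute only if adding it raises the PFD by more than a threshold. The forward pass, the threshold `min_gain`, the radii and the synthetic data are all assumptions made for illustration; the result of such a pass can depend on the order of the attributes.

```python
import numpy as np

def fractal_dimension(points, radii):
    """Slope of log(#pairs within <= r) vs. log(r), by brute-force counting."""
    pts = np.asarray(points, dtype=float)
    diff = pts[:, None, :] - pts[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    pair_d = dist[np.triu_indices(len(pts), k=1)]
    counts = np.array([(pair_d <= r).sum() for r in radii])
    slope, _intercept = np.polyfit(np.log(radii), np.log(counts), 1)
    return float(slope)

def select_attributes(data, radii, min_gain=0.3):
    """Keep attribute j only if it raises the partial f.d. by more than min_gain."""
    kept, current = [], 0.0
    for j in range(data.shape[1]):
        pfd = fractal_dimension(data[:, kept + [j]], radii)  # PFD of kept + j
        if pfd - current > min_gain:
            kept.append(j)
            current = pfd
    return kept

# Synthetic data: attribute 1 is a non-linear function of attribute 0,
# attribute 2 is independent of both.
rng = np.random.default_rng(0)
a = rng.random(1000)
c = rng.random(1000)
data = np.column_stack([a, a ** 2, c])

radii = np.logspace(-1.7, -1, 8)
kept = select_attributes(data, radii)
print(kept)
```

Attribute 1 adds no new degree of freedom (the points still lie on a curve), so it does not raise the PFD, even though its correlation with attribute 0 is non-linear.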

  46. Problem #2: dim. reduction [Figure: example point sets with global FD=1; labels: PFD=1, PFD~1, PFD=0, PFD=1, PFD~1]

  47. Problem #2: dim. reduction [Figure: example point sets with global FD=1; labels: PFD=1, PFD=1, PFD=0, PFD=1, PFD~1] Notice: ‘max variance’ would fail here

  48. Problem #2: dim. reduction [Figure: example point sets with global FD=1; labels: PFD=1, PFD~1, PFD=0, PFD=1, PFD~1] Notice: SVD would fail here

  49. Road map • Motivation – problems / case studies • Definition of fractals and power laws • Solutions to posed problems • More examples • fractals • power laws • Conclusions

  50. disk traffic [Figure: #bytes vs. time] • Not Poisson, not(?) iid - BUT: self-similar • How to model it?