Indexing and Data Mining in Multimedia Databases


# Indexing and Data Mining in Multimedia Databases


##### Presentation Transcript

1. Indexing and Data Mining in Multimedia Databases Christos Faloutsos CMU www.cs.cmu.edu/~christos

2. Outline Goal: ‘Find similar / interesting things’ • Problem - Applications • Indexing - similarity search • New tools for Data Mining: Fractals • Conclusions • Resources C. Faloutsos

3. Problem Given a large collection of (multimedia) records, find similar/interesting things, i.e.: • Allow fast, approximate queries, and • Find rules/patterns

4. Sample queries • Similarity search • Find pairs of branches with similar sales patterns • Find medical cases similar to Smith's • Find pairs of sensor series that move in sync • Find shapes like a spark-plug

5. Sample queries (cont’d) • Rule discovery • Clusters (of branches; of sensor data; ...) • Forecasting (total sales for next year?) • Outliers (e.g., unexpected part failures; fraud detection)

6. Outline Goal: ‘Find similar / interesting things’ • Problem - Applications • Indexing - similarity search • New tools for Data Mining: Fractals • Conclusions • Related projects @ CMU and resources

7. Indexing - Multimedia Problem: • given a set of (multimedia) objects, • find the ones similar to a desired query object

8. [Figure: three time sequences, each plotting \$price against day (1 to 365)] • distance function: by expert

9. ‘GEMINI’ - Pictorially [Figure: each sequence S1, ..., Sn (value vs. day, 1 to 365) is mapped to a point F(S1), ..., F(Sn) in feature space; example features: avg and std]
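The mapping pictured on slide 9 can be sketched as a filter-and-refine range query. This is a minimal illustration, not the authors' implementation: the function names (`features`, `range_query`) are hypothetical, the true distance is assumed to be Euclidean over equal-length sequences, and std is the population standard deviation. Under those assumptions, sqrt(n) times the distance in (avg, std) space never exceeds the true distance, so the cheap filter cannot discard a true match.

```python
import math
import random

def features(seq):
    """Map a sequence to the 2-d feature point (avg, std) from the slide."""
    n = len(seq)
    avg = sum(seq) / n
    std = math.sqrt(sum((v - avg) ** 2 for v in seq) / n)  # population std
    return (avg, std)

def euclidean(a, b):
    """Euclidean distance between two equal-length sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def range_query(query, sequences, eps):
    """Return indices of sequences within distance eps of the query.

    Filter: sqrt(n) * (distance in feature space) lower-bounds the true
    Euclidean distance, so anything it rejects is a true non-match.
    Refine: survivors are checked against the true distance.
    """
    n = len(query)
    fq = features(query)
    slack = 1e-9  # guards the filter against floating-point round-off
    candidates = [i for i, s in enumerate(sequences)
                  if math.sqrt(n) * euclidean(fq, features(s)) <= eps + slack]
    return [i for i in candidates if euclidean(query, sequences[i]) <= eps]

# Hypothetical data: 100 random "yearly" sequences of 365 daily values.
random.seed(42)
data = [[random.gauss(50, 10) for _ in range(365)] for _ in range(100)]
matches = range_query(data[0], data, eps=250.0)
```

In a real system the feature points would be stored in a spatial access method rather than scanned linearly; the scan here only keeps the sketch short.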

10. Remaining issues • how to extract features automatically? • how to merge similarity scores from different media?

11. Outline Goal: ‘Find similar / interesting things’ • Problem - Applications • Indexing - similarity search • Visualization: FastMap • Relevance feedback: FALCON • Data Mining / Fractals • Conclusions

12. FastMap [Figure; surviving labels: ‘~100’, ‘~1’, ‘??’]

13. FastMap • Multi-dimensional scaling (MDS) can do that, but in O(N**2) time • We want a linear algorithm: FastMap [SIGMOD95]
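To make the "linear algorithm" concrete, here is a hedged sketch of a single FastMap coordinate. It uses only a distance function: two far-apart pivot objects are chosen heuristically, and every object is projected onto the line through them with the cosine law. The function name `fastmap_1d` and the parameters `start` and `pivot_passes` are assumptions for illustration. Each pass makes O(N) distance calls, so the cost stays linear in the number of objects.

```python
import math

def fastmap_1d(objects, dist, start=0, pivot_passes=3):
    """Project objects onto one coordinate using only pairwise distances."""
    n = len(objects)
    # Heuristic pivot choice: hop to the farthest object a few times.
    b = start
    for _ in range(pivot_passes):
        a = max(range(n), key=lambda i: dist(objects[b], objects[i]))
        b = max(range(n), key=lambda i: dist(objects[a], objects[i]))
    d_ab = dist(objects[a], objects[b])
    if d_ab == 0:
        return [0.0] * n  # all objects coincide
    # Cosine law: position of each object along the pivot line a -> b.
    return [(dist(objects[a], o) ** 2 + d_ab ** 2 - dist(objects[b], o) ** 2)
            / (2 * d_ab)
            for o in objects]

# Example with 2-d points; any object type works if `dist` accepts it.
points = [(0.0, 0.0), (1.0, 2.0), (3.0, 6.0), (6.0, 12.0)]
coords = fastmap_1d(points, math.dist)
```

Further coordinates would be obtained by repeating the step on the residual distances after removing the part already explained by the first coordinate; that recursion is omitted here.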

14. Applications: time sequences • given n co-evolving time sequences • visualize them + find rules [ICDE00] [Figure: exchange rate vs. time for DEM, JPY, HKD]

15. Applications - financial • currency exchange rates [ICDE00] [Figure: plot with points labeled FRF, GBP, JPY, HKD, USD(t), USD(t-5)]

16. Applications - financial • currency exchange rates [ICDE00] [Figure: plot with points labeled FRF, DEM, HKD, JPY, USD, GBP, USD(t), USD(t-5)]

17. Application: VideoTrails [ACM MM97]

18. VideoTrails - usage • scene-cut detection (about 10% errors) • scene classification (e.g., dialogue vs. action)

19. Outline Goal: ‘Find similar / interesting things’ • Problem - Applications • Indexing - similarity search • Visualization: FastMap • Relevance feedback: FALCON • Data Mining / Fractals • Conclusions

20. Merging similarity scores • e.g., video: text, color, motion, audio • weights change with the query! • solution 1: user specifies weights • solution 2: user gives examples • and we ‘learn’ what he/she wants: relevance feedback (Rocchio, MARS, MindReader) • but: how about disjunctive queries?

21. ‘FALCON’ • Trader wants only ‘unstable’ stocks [Figure: price patterns shaped like Vs and inverted Vs]

22. “Single query point” methods [Figure: positive examples (+) around one query point (x); method: Rocchio]

23. “Single query point” methods • The averaging effect in action... [Figure: clouds of positive examples (+), each with a single query point (x), for Rocchio, MindReader, and MARS]

24. Main idea: FALCON Contours [Wu+, VLDB 2000] [Figure: positive examples (+) plotted on feature1 (e.g., temperature) vs. feature2 (e.g., frequency)]
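The contrast between the "single query point" methods and a disjunctive query can be sketched with two scoring functions. The first scores a candidate by its distance to the average of the good examples. The second combines the distances to all good examples with a generalized mean whose exponent is negative, so a candidate scores well if it is close to any one example. The generalized-mean form and the default `alpha=-5.0` are assumptions made for illustration and are not taken from the slides; the function names are hypothetical.

```python
import math

def euclidean(p, q):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def centroid_distance(x, good):
    """Single-query-point score: distance to the average good example."""
    dim = len(good[0])
    centroid = [sum(g[k] for g in good) / len(good) for k in range(dim)]
    return euclidean(x, centroid)

def aggregate_distance(x, good, alpha=-5.0):
    """Disjunctive score: small when x is near ANY good example.

    A generalized mean with negative alpha (an assumed value) is dominated
    by the smallest distance, which acts like a soft OR over the examples.
    """
    ds = [euclidean(x, g) for g in good]
    if min(ds) == 0.0:
        return 0.0
    return (sum(d ** alpha for d in ds) / len(ds)) ** (1.0 / alpha)

# Two separate groups of good examples, i.e. a disjunctive query.
good = [(0.0, 0.0), (10.0, 0.0)]
midpoint = (5.0, 0.0)     # between the groups, close to neither
near_one = (0.5, 0.0)     # close to the first group
```

With these inputs the centroid score prefers `midpoint`, which is the averaging effect of slide 23, while the aggregate score prefers `near_one`.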

25. Conclusions for indexing + visualization • GEMINI: fast indexing, exploiting off-the-shelf spatial access methods (SAMs) • FastMap: automatic feature extraction in O(N) time • FALCON: relevance feedback for disjunctive queries

26. Outline Goal: ‘Find similar / interesting things’ • Problem - Applications • Indexing - similarity search • New tools for Data Mining: Fractals • Conclusions • Resources

27. Data mining & fractals – Road map • Motivation – problems / case study • Definition of fractals and power laws • Solutions to posed problems • More examples

28. Problem #1 - spatial d.m. Galaxies (Sloan Digital Sky Survey w/ B. Nichol) • ‘spiral’ and ‘elliptical’ galaxies • (stores & households; mpg & MTBF...) • patterns? (not Gaussian; not uniform) • attraction/repulsion? • separability??

29. Problem #2: dim. reduction • given attributes x1, ..., xn • possibly non-linearly correlated • drop the useless ones (Q: why? A: to avoid the ‘dimensionality curse’)

30. Answer: • Fractals / self-similarities / power laws

31. What is a fractal? = self-similar point set, e.g., Sierpinski triangle: zero area; infinite length! ...

32. Definitions (cont’d) • Paradox: infinite perimeter; zero area! • ‘dimensionality’: between 1 and 2 • actually: log(3)/log(2) = 1.58… (long story)

33. Intrinsic (‘fractal’) dimension • Q: fractal dimension of a line? • E.g.: #cylinders; miles / gallon

34. Intrinsic (‘fractal’) dimension • Q: fractal dimension of a line? • A: nn(<= r) ~ r^1 (‘power law’: y = x^a)

35. Intrinsic (‘fractal’) dimension • Q: fractal dimension of a line? • A: nn(<= r) ~ r^1 (‘power law’: y = x^a) • Q: fd of a plane? • A: nn(<= r) ~ r^2 • fd == slope of (log(nn) vs. log(r))

36. Sierpinski triangle [Figure: log(#pairs within <= r) vs. log(r), i.e. the ‘correlation integral’; slope 1.58]
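The rule "fd == slope of log(#pairs) vs. log(r)" can be tried numerically. The sketch below counts pairs of points within distance r for a few radii and fits a least-squares line in log-log space. The function names, the radius range, and the use of the "chaos game" to sample Sierpinski-triangle points are illustrative assumptions, not part of the slides. The estimate is sensitive to the radius range and to the sample size, so it should land near 1 for a line and near log(3)/log(2) for the triangle rather than hit those values exactly.

```python
import bisect
import math
import random

def correlation_dimension(points, r_min, r_max, steps=8):
    """Slope of log(#pairs within <= r) vs. log(r), by least squares."""
    # All pairwise distances, sorted once: O(N^2), fine for small samples.
    dists = sorted(math.dist(points[i], points[j])
                   for i in range(len(points))
                   for j in range(i + 1, len(points)))
    xs, ys = [], []
    for k in range(steps):
        r = r_min * (r_max / r_min) ** (k / (steps - 1))  # geometric grid
        count = bisect.bisect_right(dists, r)             # pairs with d <= r
        if count > 0:
            xs.append(math.log(r))
            ys.append(math.log(count))
    if len(xs) < 2:
        raise ValueError("not enough non-empty radii to fit a slope")
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    return (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
            / sum((x - mx) ** 2 for x in xs))

def sierpinski(n, seed=0):
    """Sample n points of the Sierpinski triangle with the chaos game."""
    rng = random.Random(seed)
    verts = [(0.0, 0.0), (1.0, 0.0), (0.5, math.sqrt(3) / 2)]
    x, y = 0.25, 0.25
    pts = []
    for i in range(n + 20):
        vx, vy = rng.choice(verts)
        x, y = (x + vx) / 2, (y + vy) / 2  # jump halfway to a random vertex
        if i >= 20:                        # skip the first few transient points
            pts.append((x, y))
    return pts

line = [(i / 499, 0.0) for i in range(500)]
fd_line = correlation_dimension(line, 0.01, 0.16)
fd_triangle = correlation_dimension(sierpinski(600), 0.01, 0.16)
```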

37. Road map • Motivation – problems / case studies • Definition of fractals and power laws • Solutions to posed problems • More examples • Conclusions

38. Solution #1: spatial d.m. Galaxies (Sloan Digital Sky Survey w/ B. Nichol - ‘BOPS’ plot - [SIGMOD00]) • clusters? • separable? • attraction/repulsion? • data ‘scrubbing’ – duplicates?

39. Solution #1: spatial d.m. • 1.8 slope • plateau! • repulsion! [Figure: log(#pairs within <= r) vs. log(r) for ell-ell, spi-spi, and spi-ell pairs]

40. Solution #1: spatial d.m. [w/ Seeger, Traina, Traina, SIGMOD00] • 1.8 slope • plateau! • repulsion! [Figure: log(#pairs within <= r) vs. log(r) for ell-ell, spi-spi, and spi-ell pairs]

41. spatial d.m. • Heuristic on choosing # of clusters [Figure: radii r1 and r2 marked]

42. Solution #1: spatial d.m. • 1.8 slope • plateau! • repulsion! [Figure: log(#pairs within <= r) vs. log(r) for ell-ell, spi-spi, and spi-ell pairs]

43. Solution #1: spatial d.m. • 1.8 slope • plateau! • repulsion!! • duplicates [Figure: log(#pairs within <= r) vs. log(r) for ell-ell, spi-spi, and spi-ell pairs]

44. Problem #2: Dim. reduction

45. Solution: • drop the attributes that don’t increase the ‘partial f.d.’ (PFD) • dfn: the PFD of an attribute set A is the f.d. of the cloud of points projected onto A [w/ Traina, Traina, Wu, SBBD00]
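One way to read the rule on slide 45 is as a greedy pass over the attributes: project the data onto the attributes kept so far plus one candidate, estimate the fractal dimension of that projection, and keep the candidate only if the estimate rises noticeably. The sketch below follows that reading. The forward order, the `threshold` value, the radius range (which assumes attributes scaled to roughly [0, 1]), and all function names are assumptions for illustration, not the published algorithm.

```python
import bisect
import math
import random

def fractal_dimension(points, r_min=0.02, r_max=0.2, steps=8):
    """Slope of log(#pairs within <= r) vs. log(r), by least squares."""
    dists = sorted(math.dist(points[i], points[j])
                   for i in range(len(points))
                   for j in range(i + 1, len(points)))
    xs, ys = [], []
    for k in range(steps):
        r = r_min * (r_max / r_min) ** (k / (steps - 1))
        count = bisect.bisect_right(dists, r)
        if count > 0:
            xs.append(math.log(r))
            ys.append(math.log(count))
    if len(xs) < 2:
        return 0.0  # too few pairs in range to fit a slope
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    return (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
            / sum((x - mx) ** 2 for x in xs))

def project(rows, attrs):
    """Keep only the listed attribute positions of every row."""
    return [tuple(row[a] for a in attrs) for row in rows]

def select_attributes(rows, threshold=0.3):
    """Keep an attribute only if it raises the partial f.d. by > threshold."""
    kept, current = [], 0.0
    for a in range(len(rows[0])):
        pfd = fractal_dimension(project(rows, kept + [a]))
        if pfd - current > threshold:
            kept.append(a)
            current = pfd
    return kept

# Hypothetical data: x, y = x^2 (non-linearly tied to x), z independent.
rng = random.Random(1)
rows = []
for _ in range(300):
    x = rng.random()
    rows.append((x, x * x, rng.random()))
kept = select_attributes(rows)
```

Because y is a non-linear function of x, adding it leaves the projected cloud on a curve and the estimate barely moves, whereas the independent z spreads the cloud over an area.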

46. Problem #2: dim. reduction [Figure: example point clouds; labels: global FD=1; PFD=1, PFD~1, PFD=0, PFD=1, PFD~1]

47. Problem #2: dim. reduction • Notice: ‘max variance’ would fail here [Figure: example point clouds; labels: global FD=1; PFD=1, PFD=1, PFD=0, PFD=1, PFD~1]

48. Problem #2: dim. reduction • Notice: SVD would fail here [Figure: example point clouds; labels: global FD=1; PFD=1, PFD~1, PFD=0, PFD=1, PFD~1]

49. Road map • Motivation – problems / case studies • Definition of fractals and power laws • Solutions to posed problems • More examples • fractals • power laws • Conclusions

50. Disk traffic [Figure: #bytes vs. time] • Not Poisson, not(?) iid - BUT: self-similar • How to model it?