1 / 76

Carnegie Mellon Univ. Dept. of Computer Science 15-415 - Database Applications

Carnegie Mellon Univ. Dept. of Computer Science 15-415 - Database Applications. C. Faloutsos Multimedia Indexing. General Overview. Relational model – SQL; db design Indexing; Q-opt;Transaction processing Advanced topics Distributed Databases RAID Authorization / Stat. DB

zeheb
Download Presentation

Carnegie Mellon Univ. Dept. of Computer Science 15-415 - Database Applications

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Carnegie Mellon Univ.Dept. of Computer Science15-415 - Database Applications C. Faloutsos Multimedia Indexing

  2. General Overview • Relational model – SQL; db design • Indexing; Q-opt;Transaction processing • Advanced topics • Distributed Databases • RAID • Authorization / Stat. DB • Spatial Access Methods • Multimedia Indexing 15-415 - C. Faloutsos

  3. Multimedia - Detailed outline • multimedia • Motivation / problem definition • Main idea / time sequences • images • sub-pattern matching • automatic feature extraction / FastMap 15-415 - C. Faloutsos

  4. Problem Given a large collection of (multimedia) records (eg. stocks) Allow fast, similarity queries 15-415 - C. Faloutsos

  5. Applications • time series: financial, marketing (click-streams!), ECGs, sound; • images: medicine, digital libraries, education, art • higher-d signals: scientific db (eg., astrophysics), medicine (MRI scans), entertainment (video) 15-415 - C. Faloutsos

  6. Sample queries • find medical cases similar to Smith's • Find pairs of stocks that move in sync • Find pairs of documents that are similar (plagiarism?) • find faces similar to ‘Tiger Woods’ 15-415 - C. Faloutsos

  7. Detailed problem defn.: Problem: • given a set of multimedia objects, • find the ones similar to a desirable query object • for example: 15-415 - C. Faloutsos

  8. $price $price $price 1 1 1 365 365 365 day day day distance function: by expert (eg, Euclidean distance) 15-415 - C. Faloutsos

  9. Types of queries • whole match vs sub-pattern match • range query vs nearest neighbors • all-pairs query 15-415 - C. Faloutsos

  10. Design goals • Fast (faster than seq. scan) • ‘correct’ (ie., no false alarms; no false dismissals) 15-415 - C. Faloutsos

  11. Multimedia - Detailed outline • multimedia • Motivation / problem definition • Main idea / time sequences • images • sub-pattern matching • automatic feature extraction / FastMap 15-415 - C. Faloutsos

  12. $price $price $price 1 1 1 365 365 365 day day day Main idea • Eg., time sequences, ‘whole matching’, range queries, Euclidean distance 15-415 - C. Faloutsos

  13. Main idea • Seq. scanning works - how to do faster? 15-415 - C. Faloutsos

  14. Idea: ‘GEMINI’ (GEneric Multimedia INdexIng) Extract a few numerical features, for a ‘quick and dirty’ test 15-415 - C. Faloutsos

  15. S1 F(S1) 1 365 1 365 F(Sn) day day Sn ‘GEMINI’ - Pictorially eg,. std eg, avg 15-415 - C. Faloutsos

  16. GEMINI Solution: Quick-and-dirty' filter: • extract n features (numbers, eg., avg., etc.) • map into a point in n-d feature space • organize points with off-the-shelf spatial access method (‘SAM’) • discard false alarms 15-415 - C. Faloutsos

  17. GEMINI Important: Q: how to guarantee no false dismissals? A1: preserve distances (but: difficult/impossible) A2: Lower-bounding lemma: if the mapping ‘makes things look closer’, then there are no false dismissals 15-415 - C. Faloutsos

  18. GEMINI Important: Q: how to extract features? A: “if I have only one number to describe my object, what should this be?” 15-415 - C. Faloutsos

  19. Time sequences Q: what features? 15-415 - C. Faloutsos

  20. Time sequences Q: what features? A: Fourier coefficients (we’ll see them in detail soon) 15-415 - C. Faloutsos

  21. Time sequences white noise brown noise Fourier spectrum ... in log-log 15-415 - C. Faloutsos

  22. Time sequences Eg.: 15-415 - C. Faloutsos

  23. Time sequences conclusion: colored noises are well approximated by their first few Fourier coefficients colored noises appear in nature: 15-415 - C. Faloutsos

  24. brown noise: stock prices (1/f2 energy spectrum) pink noise: works of art (1/f spectrum) black noises: water reservoirs (1/fb , b>2) (slope: related to ‘Hurst exponent’, for self-similar traffic, like, eg. Ethernet/web [Schroeder], [Leland+] Time sequences 15-415 - C. Faloutsos

  25. Time sequences - results keep the first 2-3 Fourier coefficients faster than seq. scan NO false dismissals (see book) total time cleanup-time r-tree time # coeff. kept 15-415 - C. Faloutsos

  26. improvements/variations: [Kanellakis+Goldin], [Mendelzon+Rafiei] could use Wavelets, or DCT could use segment averages [Yi+2000] Time sequences - improvements: 15-415 - C. Faloutsos

  27. Multimedia - Detailed outline • multimedia • Motivation / problem definition • Main idea / time sequences • images (color, shapes) • sub-pattern matching • automatic feature extraction / FastMap 15-415 - C. Faloutsos

  28. Images - color what is an image? A: 2-d array 15-415 - C. Faloutsos

  29. Images - color Color histograms, and distance function 15-415 - C. Faloutsos

  30. Images - color Mathematically, the distance function is: 15-415 - C. Faloutsos

  31. Problem: ‘cross-talk’: Features are not orthogonal -> SAMs will not work properly Q: what to do? A: feature-extraction question Images - color 15-415 - C. Faloutsos

  32. possible answers: avg red, avg green, avg blue it turns out that this lower-bounds the histogram distance -> no cross-talk SAMs are applicable Images - color 15-415 - C. Faloutsos

  33. Images - color time performance: seq scan w/ avg RGB selectivity 15-415 - C. Faloutsos

  34. Multimedia - Detailed outline • multimedia • Motivation / problem definition • Main idea / time sequences • images (color; shape) • sub-pattern matching • automatic feature extraction / FastMap 15-415 - C. Faloutsos

  35. distance function: Euclidean, on the area, perimeter, and 20 ‘moments’ (Q: how to normalize them? Images - shapes 15-415 - C. Faloutsos

  36. distance function: Euclidean, on the area, perimeter, and 20 ‘moments’ (Q: how to normalize them? A: divide by standard deviation) Images - shapes 15-415 - C. Faloutsos

  37. distance function: Euclidean, on the area, perimeter, and 20 ‘moments’ (Q: other ‘features’ / distance functions? Images - shapes 15-415 - C. Faloutsos

  38. distance function: Euclidean, on the area, perimeter, and 20 ‘moments’ (Q: other ‘features’ / distance functions? A1: turning angle A2: dilations/erosions A3: ... ) Images - shapes 15-415 - C. Faloutsos

  39. distance function: Euclidean, on the area, perimeter, and 20 ‘moments’ Q: how to do dim. reduction? Images - shapes 15-415 - C. Faloutsos

  40. distance function: Euclidean, on the area, perimeter, and 20 ‘moments’ Q: how to do dim. reduction? A: Karhunen-Loeve (= centered PCA/SVD) Images - shapes 15-415 - C. Faloutsos

  41. Performance: ~10x faster Images - shapes log(# of I/Os) all kept # of features kept 15-415 - C. Faloutsos

  42. Case study: Informedia • Video database system, developed at CMU • 2+ TB of video data (broadcast news) • retrieval by text, image and face similarity www.informedia.cs.cmu.edu/ 15-415 - C. Faloutsos

  43. Case study: Informedia • next foils: visualization features • by space • by time • by concept 15-415 - C. Faloutsos

  44. geo mapping • automatic place recognition • ambiguity resol. + • lookup 15-415 - C. Faloutsos

  45. 15-415 - C. Faloutsos

  46. time line 15-415 - C. Faloutsos

  47. concept space 15-415 - C. Faloutsos

  48. Multimedia - Detailed outline • multimedia • Motivation / problem definition • Main idea / time sequences • images (color; shape) • sub-pattern matching • automatic feature extraction / FastMap 15-415 - C. Faloutsos

  49. Sub-pattern matching • Problem: find sub-sequences that match the given query pattern 15-415 - C. Faloutsos

  50. $price $price 1 400 1 365 day day $price 30 1 300 day 15-415 - C. Faloutsos

More Related