1 / 30

# Multimedia DBs

Multimedia DBs. PAA and APCA. Another approach: segment the time series into equal parts, store the average value for each part. Use an index to store the averages and the segment end points. X. X. X. X'. X'. X'. SVD. DFT. DWT. eigenwave 0. 0. Haar 0. eigenwave 1. 1. 0. 0. 0.

## Multimedia DBs

E N D

### Presentation Transcript

1. Multimedia DBs

2. PAA and APCA • Another approach: segment the time series into equal parts, store the average value for each part. • Use an index to store the averages and the segment end points

3. X X X X' X' X' SVD DFT DWT eigenwave 0 0 Haar 0 eigenwave 1 1 0 0 0 20 20 20 80 80 80 100 100 100 40 40 40 140 140 140 60 60 60 120 120 120 Haar 1 2 eigenwave 2 Haar 2 3 eigenwave 3 Haar 3 4 eigenwave 4 5 Haar 4 6 eigenwave 5 Haar 5 7 eigenwave 6 Haar 6 eigenwave 7 Haar 7 Feature Spaces Korn, Jagadish, Faloutsos 1997 Chan & Fu 1999 Agrawal, Faloutsos, Swami 1993

4. sv6 sv1 value axis sv7 sv5 sv4 sv2 sv3 sv8 time axis Piecewise Aggregate Approximation (PAA) Original time series (n-dimensional vector) S={s1, s2, …, sn} n’-segment PAA representation (n’-d vector) S = {sv1 ,sv2, …, svn’} PAA representation satisfies the lower bounding lemma (Keogh, Chakrabarti, Mehrotra and Pazzani, 2000; Yi and Faloutsos 2000)

5. sv6 sv1 sv7 sv5 sv4 sv2 sv3 sv8 Adaptive Piecewise Constant Approximation (APCA) sv3 n’/2-segment APCA representation (n’-d vector) S= { sv1, sr1, sv2, sr2, …, svM , srM } (M is the number of segments = n’/2) sv1 sv2 sv4 sr1 sr2 sr3 sr4 Can we improve upon PAA? n’-segment PAA representation (n’-d vector) S = {sv1 ,sv2, …, svN}

6. Reconstruction error PAAReconstruction error APCA APCA approximates original signal better than PAA Improvement factor = 3.77 1.69 1.21 1.03 3.02 1.75

7. APCA Representation can be computed efficiently • Near-optimal representation can be computed in O(nlog(n)) time • Optimal representation can be computed in O(n2M) (Koudas et al.)

8. Exact (Euclidean) distance D(Q,S) S Q S S Q Q’ DLB(Q’,S) D(Q,S) D(Q,S) DLB(Q’,S) Distance Measure Lower bounding distance DLB(Q,S)

9. R1 R1 R3 R2 R4 S2 S5 S3 R3 S1 S4 S6 R4 R2 R3 R2 S8 R4 S9 S8 S7 S9 S1 S2 S3 S4 S5 S6 S7 2M-dimensional APCA space Index on 2M-dimensional APCA space Any feature-based index structure can used (e.g., R-tree, X-tree, Hybrid Tree)

10. MINDIST(Q,R2) MINDIST(Q,R3) R1 S5 S2 R3 S3 S1 S4 Q S6 MINDIST(Q,R4) R2 S8 R4 S9 S7 k-nearest neighbor Algorithm • For any node U of the index structure with MBR R, MINDIST(Q,R) £ D(Q,S) for any data item S under U

11. smax3 smax1 smax2 smax4 smin1 smin3 smin2 smin4 Index Modification for MINDIST Computation APCA point S= { sv1, sr1, sv2, sr2, …, svM, srM } R1 S2 S5 sv3 R3 S3 S1 S6 S4 sv1 R2 S8 R4 sv2 S9 sv4 S7 sr2 sr3 sr1 sr4 APCA rectangle S= (L,H) where L= { smin1, sr1, smin2, sr2, …, sminM, srM } and H = { smax1, sr1, smax2, sr2, …, smaxM, srM }

12. REGION 2 H= { h1, h2, h3, h4 , h5, h6 } h3 value axis l3 h1 l1 h5 REGION 3 l5 REGION 1 l2 l4 h4 l6 h2 h6 L= { l1, l2, l3, l4 , l5, l6 } time axis MBR Representation in time-value space We can view the MBR R=(L,H) of any node U as two APCA representations L= { l1, l2, …, l(N-1), lN }and H= { h1, h2, …, h(N-1), hN }

13. REGION i h(2i-1) l(2i-1) h2i l(2i-2)+1 REGION 2 h3 l3 h1 value axis REGION 3 h5 l1 l5 REGION 1 l2 l4 h4 h6 h2 l6 time axis Regions M regions associated with each MBR; boundaries of ith region:

14. t1 t2 Regions • ith region is active at time instant t if it spans across t • The value st of any time series S under node U at time instant t must lie in one of the regions active at t (Lemma 2) REGION 2 h3 value axis l3 h1 REGION 3 h5 l1 l5 REGION 1 l2 l4 h4 h6 h2 l6 time axis

15. t1 MINDIST(Q,R) = MINDIST Computation For time instant t, MINDIST(Q, R, t) = minregion G active at t MINDIST(Q,G,t) MINDIST(Q,R,t1) =min(MINDIST(Q, Region1, t1), MINDIST(Q, Region2, t1)) =min((qt1 - h1)2 , (qt1 - h3)2 ) =(qt1 - h1)2 REGION 2 h3 l3 h1 REGION 3 h5 l1 l5 REGION 1 l2 l4 h4 h6 h2 l6 Lemma3: MINDIST(Q,R) £ D(Q,C) for any time series C under node U

16. Approximate Search • A simpler definition of the distance in the feature space is the following: • But there is one problem… what? DLB(Q’,S)

17. Multimedia dbs • A multimedia database stores also images • Again similarity queries (content based retrieval) • Extract features, index in feature space, answer similarity queries using GEMINI • Again, average values help!

18. Images - color what is an image? A: 2-d array

19. Images - color Color histograms, and distance function

20. Images - color Mathematically, the distance function is:

21. Problem: ‘cross-talk’: Features are not orthogonal -> SAMs will not work properly Q: what to do? A: feature-extraction question Images - color

22. possible answers: avg red, avg green, avg blue it turns out that this lower-bounds the histogram distance -> no cross-talk SAMs are applicable Images - color

23. Images - color time performance: seq scan w/ avg RGB selectivity

24. distance function: Euclidean, on the area, perimeter, and 20 ‘moments’ (Q: how to normalize them? Images - shapes

25. distance function: Euclidean, on the area, perimeter, and 20 ‘moments’ (Q: how to normalize them? A: divide by standard deviation) Images - shapes

26. distance function: Euclidean, on the area, perimeter, and 20 ‘moments’ (Q: other ‘features’ / distance functions? Images - shapes

27. distance function: Euclidean, on the area, perimeter, and 20 ‘moments’ (Q: other ‘features’ / distance functions? A1: turning angle A2: dilations/erosions A3: ... ) Images - shapes

28. distance function: Euclidean, on the area, perimeter, and 20 ‘moments’ Q: how to do dim. reduction? Images - shapes

29. distance function: Euclidean, on the area, perimeter, and 20 ‘moments’ Q: how to do dim. reduction? A: Karhunen-Loeve (= centered PCA/SVD) Images - shapes

30. Performance: ~10x faster Images - shapes log(# of I/Os) all kept # of features kept

More Related