## Multimedia DBs


**PAA and APCA**

• Another approach: segment the time series into equal parts and store the average value for each part.
• Use an index to store the averages and the segment end points.

**Feature Spaces**

[Figure: a time series X and its approximation X' under DFT (Agrawal, Faloutsos, Swami 1993), DWT with Haar basis functions 0-7 (Chan & Fu 1999), and SVD with eigenwaves 0-7 (Korn, Jagadish, Faloutsos 1997).]

**Piecewise Aggregate Approximation (PAA)**

[Figure: a time series approximated by 8 equal-length segments with values sv1, ..., sv8 (value axis vs. time axis).]

• Original time series (n-dimensional vector): $S = \{s_1, s_2, \ldots, s_n\}$
• n'-segment PAA representation (n'-d vector): $S = \{sv_1, sv_2, \ldots, sv_{n'}\}$, where $sv_i$ is the average of the n/n' values in the ith segment
• The PAA representation satisfies the lower bounding lemma (Keogh, Chakrabarti, Mehrotra and Pazzani, 2000; Yi and Faloutsos, 2000).

**Adaptive Piecewise Constant Approximation (APCA)**

Can we improve upon PAA? APCA lets the segments have different lengths, so each segment also stores where it ends.

[Figure: the same series approximated by PAA (8 equal segments, sv1, ..., sv8) and by APCA (4 variable-length segments with values sv1, ..., sv4 and right endpoints sr1, ..., sr4).]

• n'/2-segment APCA representation (n'-d vector): $S = \{sv_1, sr_1, sv_2, sr_2, \ldots, sv_M, sr_M\}$, where M = n'/2 is the number of segments, $sv_i$ is the average value of segment i, and $sr_i$ is its right endpoint. Both PAA and APCA therefore use n' numbers.
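The averaging step and the lower bounding lemma above can be illustrated with a short Python sketch. It is not the authors' code: the function names (`segment_means`, `paa_right_ends`, `euclidean`, `lower_bound`) are hypothetical, segments are described by 1-based inclusive right endpoints in the style of the $sr_i$ values, and the bound used is the segment-length-weighted distance between segment averages. It holds because, for a segment of length $\ell$, $\sum_j (q_j - s_j)^2 \ge \ell\,(\bar q - \bar s)^2$. The sketch takes the segmentation as given; it does not compute an APCA segmentation.

```python
import math
from typing import List, Sequence


def segment_means(series: Sequence[float], right_ends: Sequence[int]) -> List[float]:
    """Average value of each segment.

    right_ends holds the 1-based inclusive right endpoint of every segment
    (strictly increasing, last one == len(series)), like sr_1, ..., sr_M.
    """
    means, start = [], 0
    for end in right_ends:
        segment = series[start:end]
        means.append(sum(segment) / len(segment))
        start = end
    return means


def paa_right_ends(n: int, n_segments: int) -> List[int]:
    """Equal-length segmentation used by PAA (assumes n is a multiple of n_segments)."""
    if n_segments <= 0 or n % n_segments:
        raise ValueError("sketch assumes n is a multiple of n_segments")
    width = n // n_segments
    return [width * (i + 1) for i in range(n_segments)]


def euclidean(q: Sequence[float], s: Sequence[float]) -> float:
    """Exact distance D(Q, S) between two equal-length series."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(q, s)))


def lower_bound(q: Sequence[float], s_means: Sequence[float],
                right_ends: Sequence[int]) -> float:
    """Length-weighted distance between segment averages of Q and S.

    Q is averaged over the same segments as S, so the result never
    exceeds euclidean(q, s).
    """
    q_means = segment_means(q, right_ends)
    total, prev = 0.0, 0
    for end, qv, sv in zip(right_ends, q_means, s_means):
        total += (end - prev) * (qv - sv) ** 2
        prev = end
    return math.sqrt(total)


if __name__ == "__main__":
    q = [1, 2, 3, 4, 5, 6, 7, 8]
    s = [2, 2, 2, 2, 6, 6, 6, 6]
    ends = paa_right_ends(len(s), 2)      # [4, 8]
    s_bar = segment_means(s, ends)        # [2.0, 6.0]
    print(lower_bound(q, s_bar, ends))    # about 1.414, i.e. sqrt(2)
    print(euclidean(q, s))                # about 3.464, i.e. sqrt(12)
```

Only the stored averages of S are needed for the bound, which is why the index can hold the reduced vectors instead of the raw series.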
**Reconstruction error: PAA vs. APCA**

APCA approximates the original signal better than PAA.

[Figure: reconstruction error of PAA and of APCA on six examples; improvement factors 3.77, 1.69, 1.21, 1.03, 3.02, 1.75.]

**APCA representation can be computed efficiently**

• A near-optimal representation can be computed in $O(n \log n)$ time.
• The optimal representation can be computed in $O(n^2 M)$ time (Koudas et al.).

**Distance Measure**

• Exact (Euclidean) distance D(Q,S) between a query Q and a time series S.
• Lower bounding distance $D_{LB}(Q',S)$, where Q' is the query averaged over the same segments as the APCA representation of S.

[Figure: Q, S and Q', illustrating D(Q,S) and $D_{LB}(Q',S)$.]

**Index on 2M-dimensional APCA space**

[Figure: APCA points S1, ..., S9 grouped into minimum bounding rectangles (MBRs) R1, ..., R4 in the 2M-dimensional APCA space, with the corresponding index tree.]

• Any feature-based index structure can be used (e.g., R-tree, X-tree, Hybrid Tree).

**k-nearest neighbor Algorithm**

[Figure: query Q and its distances MINDIST(Q,R2), MINDIST(Q,R3), MINDIST(Q,R4) to the MBRs.]

• For any node U of the index structure with MBR R, MINDIST(Q,R) ≤ D(Q,S) for any data item S under U. Hence a node whose MINDIST already exceeds the distance of the current kth nearest neighbor can be skipped without missing any answer.

**Index Modification for MINDIST Computation**

• APCA point: $S = \{sv_1, sr_1, sv_2, sr_2, \ldots, sv_M, sr_M\}$
• APCA rectangle: S = (L, H), where $L = \{smin_1, sr_1, smin_2, sr_2, \ldots, smin_M, sr_M\}$ and $H = \{smax_1, sr_1, smax_2, sr_2, \ldots, smax_M, sr_M\}$; $smin_i$ and $smax_i$ are the minimum and maximum values of S within segment i.

[Figure: an APCA series with segment minima smin1, ..., smin4 and maxima smax1, ..., smax4, and the rectangles stored in the index.]

**MBR Representation in time-value space**

We can view the MBR R = (L, H) of any node U as two APCA representations, $L = \{l_1, l_2, \ldots, l_{N-1}, l_N\}$ and $H = \{h_1, h_2, \ldots, h_{N-1}, h_N\}$.

[Figure: an MBR with L = {l1, ..., l6} and H = {h1, ..., h6} drawn in time-value space as three regions.]

**Regions**

M regions are associated with each MBR. The ith region spans values from $l_{2i-1}$ to $h_{2i-1}$ and time instants from $l_{2i-2}+1$ to $h_{2i}$.

• The ith region is active at time instant t if it spans across t.
• The value $s_t$ of any time series S under node U at time instant t must lie in one of the regions active at t (Lemma 2).

[Figure: the regions active at time instants t1 and t2.]

**MINDIST Computation**

For time instant t, $MINDIST(Q,R,t) = \min_{\text{region } G \text{ active at } t} MINDIST(Q,G,t)$, where MINDIST(Q,G,t) is the squared distance from $q_t$ to the value range of region G (zero if $q_t$ falls inside it). For example:

$MINDIST(Q,R,t_1) = \min(MINDIST(Q,\text{Region1},t_1), MINDIST(Q,\text{Region2},t_1)) = \min((q_{t_1}-h_1)^2, (q_{t_1}-h_3)^2) = (q_{t_1}-h_1)^2$

$MINDIST(Q,R) = \sqrt{\sum_{t=1}^{n} MINDIST(Q,R,t)}$

Lemma 3: MINDIST(Q,R) ≤ D(Q,C) for any time series C under node U.

**Approximate Search**

• A simpler definition of the distance in the feature space is the following: [Equation: $D_{LB}(Q',S)$]
• But there is one problem… what?

**Multimedia DBs**

• A multimedia database also stores images.
• Again, we ask similarity queries (content-based retrieval).
• Extract features, index them in feature space, and answer similarity queries using GEMINI.
• Again, average values help!

**Images - color**

• What is an image? A: a 2-d array.
• Color histograms, and a distance function on them.
• Mathematically, the distance function is $d_{hist}^2(\vec{x},\vec{y}) = (\vec{x}-\vec{y})^T A (\vec{x}-\vec{y})$, where A is a color-similarity matrix whose entry $a_{ij}$ measures how similar colors i and j are.
• Problem: "cross-talk": the features are not orthogonal, so SAMs (spatial access methods) will not work properly. Q: what to do? A: it is a feature-extraction question.
• Possible answer: avg red, avg green, avg blue. It turns out that this lower-bounds the histogram distance, so there is no cross-talk and SAMs are applicable.

[Figure: time performance of sequential scan vs. search with avg RGB, as a function of selectivity.]

**Images - shapes**

• Distance function: Euclidean, on the area, perimeter, and 20 "moments".
• Q: how to normalize them? A: divide by the standard deviation.
• Q: other "features" / distance functions? A1: turning angle; A2: dilations/erosions; A3: ...
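The normalization answer above (divide by the standard deviation, then use Euclidean distance) can be sketched in Python. This is a hedged illustration, not the original system: the helper names `feature_scales`, `normalize` and `shape_distance` are hypothetical, the standard deviation is the population one computed over the whole collection, constant features are left unscaled to avoid division by zero, and the toy vectors hold only area and perimeter instead of area, perimeter and 20 moments.

```python
import math
import statistics
from typing import List, Sequence


def feature_scales(vectors: Sequence[Sequence[float]]) -> List[float]:
    """Population standard deviation of each feature over the whole collection."""
    scales = []
    for j in range(len(vectors[0])):
        sd = statistics.pstdev([v[j] for v in vectors])
        scales.append(sd if sd > 0 else 1.0)  # constant feature: avoid dividing by zero
    return scales


def normalize(vector: Sequence[float], scales: Sequence[float]) -> List[float]:
    """Divide every feature by its standard deviation."""
    return [x / s for x, s in zip(vector, scales)]


def shape_distance(a: Sequence[float], b: Sequence[float],
                   scales: Sequence[float]) -> float:
    """Euclidean distance between two normalized feature vectors."""
    return math.dist(normalize(a, scales), normalize(b, scales))


if __name__ == "__main__":
    # Toy (area, perimeter) vectors; real ones would also carry the 20 moments.
    shapes = [[10.0, 12.0], [20.0, 18.0], [30.0, 24.0]]
    scales = feature_scales(shapes)
    print(math.dist(shapes[0], shapes[2]))               # raw: the area difference dominates
    print(shape_distance(shapes[0], shapes[2], scales))  # about 3.464, i.e. sqrt(12)
```

Without scaling, the area difference (20) outweighs the perimeter difference (12); after dividing by the standard deviations, each feature contributes 6 to the squared distance in this example.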
**Images - shapes**

• Q: how to do dimensionality reduction on these features? A: Karhunen-Loeve (= centered PCA/SVD).

**Performance: ~10x faster**

[Figure: Images - shapes; log(# of I/Os) as a function of the # of features kept, compared with keeping all features.]
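Karhunen-Loeve, i.e. PCA on the centered feature vectors, can be sketched in plain Python. To stay dependency-free, this sketch finds the principal axes by power iteration on the covariance matrix (with Gram-Schmidt against the axes already found) instead of calling a library SVD. The names `kl_transform` and `project` are hypothetical, and the fixed iteration count assumes the leading eigenvalues are reasonably well separated.

```python
import math
from typing import List, Optional, Sequence, Tuple

Vector = List[float]


def _orthonormalize(v: Sequence[float], basis: Sequence[Vector]) -> Optional[Vector]:
    """Remove the components along `basis` (Gram-Schmidt) and rescale to unit length."""
    w = list(v)
    for b in basis:
        dot = sum(x * y for x, y in zip(w, b))
        w = [x - dot * y for x, y in zip(w, b)]
    norm = math.sqrt(sum(x * x for x in w))
    return [x / norm for x in w] if norm > 1e-12 else None


def kl_transform(data: Sequence[Sequence[float]], k: int,
                 iters: int = 500) -> Tuple[Vector, List[Vector]]:
    """Return (mean, components): the top-k principal axes of the centered data."""
    n, d = len(data), len(data[0])
    mean = [sum(row[j] for row in data) / n for j in range(d)]
    centered = [[row[j] - mean[j] for j in range(d)] for row in data]
    cov = [[sum(r[i] * r[j] for r in centered) / n for j in range(d)] for i in range(d)]

    components: List[Vector] = []
    for _ in range(min(k, d)):
        # Pick a starting vector outside the span of the axes found so far.
        starts = [[1.0 + 0.1 * i for i in range(d)]]
        starts += [[1.0 if i == j else 0.0 for i in range(d)] for j in range(d)]
        candidates = (_orthonormalize(s, components) for s in starts)
        v = next(u for u in candidates if u is not None)
        for _ in range(iters):
            w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
            w = _orthonormalize(w, components)
            if w is None:  # no variance left in the remaining directions
                break
            v = w
        components.append(v)
    return mean, components


def project(x: Sequence[float], mean: Sequence[float],
            components: Sequence[Vector]) -> Vector:
    """Coordinates of x in the reduced k-dimensional feature space."""
    return [sum((xi - mi) * ci for xi, mi, ci in zip(x, mean, c)) for c in components]


if __name__ == "__main__":
    # Toy 3-feature vectors that vary mostly along the direction (1, 2, 0).
    data = [[float(t), 2.0 * t, 0.5 if t % 2 else -0.5] for t in range(10)]
    mean, comps = kl_transform(data, k=2)
    print([project(x, mean, comps) for x in data[:3]])
```

Because the kept axes are orthonormal, the Euclidean distance between two reduced vectors never exceeds the distance between the full vectors. That is the lower-bounding property GEMINI needs to avoid false dismissals when only the first k features are indexed.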
