Multimedia DBs
Multimedia DBs - PowerPoint PPT Presentation


Multimedia DBs

  • A multimedia database stores text, strings, and images

  • Similarity queries (content-based retrieval)

    • Given an image, find the images in the database that are similar (or you can “describe” the query image)

  • Extract features, index in feature space, and answer similarity queries using GEMINI

  • Again, average values help!

    (Used in QBIC – IBM Almaden)


Image Features

  • Features extracted from an image are based on:

    • Color distribution

    • Shapes and structure

    • …


Images - color

Q: what is an image?

A: a 2-d array of RGB values


Images - color

Color histograms, and a distance function between them


Images - color

Mathematically, the distance function between a vector x and a query q is:

D(x, q) = (x − q)T A (x − q) = Σij aij (xi − qi) (xj − qj)

A = I ? (then D is the plain squared Euclidean distance; off-diagonal aij capture cross-color similarity)
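The quadratic-form distance above can be sketched in a few lines of Python; the toy 3-bin histograms and the identity matrix for A are illustrative, and in practice A would encode perceptual color similarity:

```python
def hist_distance_sq(x, q, A):
    """Squared quadratic-form distance: D(x,q) = sum_ij a_ij (x_i-q_i)(x_j-q_j).

    A = identity reduces this to the plain squared Euclidean distance."""
    d = [xi - qi for xi, qi in zip(x, q)]
    n = len(d)
    return sum(A[i][j] * d[i] * d[j] for i in range(n) for j in range(n))

# toy 3-bin histograms with A = I
I3 = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
print(hist_distance_sq([0.5, 0.3, 0.2], [0.4, 0.4, 0.2], I3))  # ≈ 0.02
```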


Images - color

Problem: ‘cross-talk’: the features are not orthogonal, so SAMs (spatial access methods) will not work properly

Q: what to do?

A: this is a feature-extraction question


Images - color

Possible answer: avg red, avg green, avg blue

It turns out that the distance on these averages lower-bounds the histogram distance, so there is no cross-talk and SAMs are applicable
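A sketch of the resulting filter-and-refine (GEMINI-style) search. For simplicity the exact distance here is plain Euclidean on the histograms, and the lower bound on the averages follows from Cauchy-Schwarz; QBIC proves an analogous bound for the quadratic-form distance. All names are illustrative:

```python
import math

def euclid(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def avg_feature(hist, bin_vals):
    # "average color" of a histogram: sum_i h_i * c_i
    return sum(h * c for h, c in zip(hist, bin_vals))

def lower_bound(avg_x, avg_q, bin_vals):
    # Cauchy-Schwarz: |(h_x - h_q) . c| <= ||h_x - h_q|| * ||c||, so this
    # scaled average-distance can never exceed the Euclidean histogram distance
    return abs(avg_x - avg_q) / math.sqrt(sum(c * c for c in bin_vals))

def range_query(db, q, eps, bin_vals):
    q_avg = avg_feature(q, bin_vals)
    hits = []
    for hist in db:
        # filter: cheap lower bound, guaranteed no false dismissals
        if lower_bound(avg_feature(hist, bin_vals), q_avg, bin_vals) <= eps:
            # refine: exact distance on the full histogram
            if euclid(hist, q) <= eps:
                hits.append(hist)
    return hits

db = [[1, 0, 0], [0, 1, 0], [0.5, 0.5, 0]]
print(range_query(db, [1, 0, 0], 0.3, bin_vals=[0, 1, 2]))  # only the exact match survives
```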


Images - color

[Performance plot: response time vs. selectivity, sequential scan vs. the avg-RGB method]


Images - shapes

distance function: Euclidean, on the area, perimeter, and 20 ‘moments’

(Q: how to normalize them?

A: divide by the standard deviation)

(Q: other ‘features’ / distance functions?

A1: turning angle

A2: dilations/erosions

A3: ... )

Q: how to do dimensionality reduction?

A: Karhunen-Loève (= centered PCA/SVD)


Images - shapes

Performance: ~10x faster than keeping all ‘moments’

[Plot: log(# of I/Os) vs. # of features kept, with “all kept” marked]
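A minimal sketch of the Karhunen-Loève transform with NumPy; the toy matrix stands in for the shape-feature vectors (one row per shape), and the choice of k = 2 is illustrative:

```python
import numpy as np

def kl_transform(X, k):
    """Karhunen-Loeve transform: center the data (rows = objects),
    then project onto the top-k singular directions."""
    Xc = X - X.mean(axis=0)               # the centering is what makes it KL
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:k].T                  # k-dimensional projections

# toy stand-in for the 22-dimensional shape features, reduced to 2-d
X = np.array([[1.0, 2.0, 3.0],
              [2.0, 4.1, 6.0],
              [3.0, 6.0, 9.2],
              [4.0, 8.1, 12.0]])
Y = kl_transform(X, 2)
```

By construction the first retained coordinate carries the most variance, the second the next most, and so on.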


Dimensionality Reduction

  • Many problems (like time-series and image similarity) can be expressed as proximity problems in a high dimensional space

  • Given a query point we try to find the points that are close…

  • But in high-dimensional spaces things are different!


Effects of High-dimensionality

  • Assume a uniformly distributed set of points in [0,1]^d

  • Consider a range query with side length 0.1 in each dimension: the query selectivity in 100-d is 10^-100

  • If we want constant selectivity (0.1), the side length must be 0.1^(1/100) ≈ 0.977, i.e. ~1!
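The arithmetic behind these numbers:

```python
def selectivity(side, d):
    """Fraction of uniform points in [0,1]^d captured by a hypercube query."""
    return side ** d

def side_needed(sel, d):
    """Query side length needed to reach a given selectivity."""
    return sel ** (1.0 / d)

print(selectivity(0.1, 100))   # ≈ 1e-100: the query captures essentially nothing
print(side_needed(0.1, 100))   # ≈ 0.977: almost the full extent of every axis
```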


Effects of High-dimensionality

  • Surface is everything!

  • Probability that a uniform point lies within 0.1 of the boundary (a union of (d−1)-dimensional surfaces) is 1 − 0.8^d:

    • d = 2: 0.36

    • d = 10: ≈ 0.89

    • d = 100: ≈ 1
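The probabilities follow from the complement rule: a point is “deep inside” only if every coordinate avoids both 0.1-wide boundary strips:

```python
def near_boundary_prob(d, eps=0.1):
    """P(a uniform point in [0,1]^d is within eps of the boundary) = 1 - (1 - 2*eps)^d."""
    return 1 - (1 - 2 * eps) ** d

print(round(near_boundary_prob(2), 2))     # 0.36
print(round(near_boundary_prob(10), 2))    # 0.89
print(round(near_boundary_prob(100), 6))   # 1.0 (to 6 decimals)
```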


Effects of High-dimensionality

  • Number of grid cells and surfaces

    • Number of k-dimensional surfaces in a d-dimensional hypercube: C(d,k) · 2^(d−k)

    • Binary partitioning (one split per dimension) already creates 2^d cells

  • Indexing in high dimensions is extremely difficult: the “curse of dimensionality”
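The face-count formula checks out against the familiar 3-d cube:

```python
from math import comb

def k_faces(d, k):
    """Number of k-dimensional faces of a d-dimensional hypercube: C(d,k) * 2^(d-k)."""
    return comb(d, k) * 2 ** (d - k)

print([k_faces(3, k) for k in range(3)])   # [8, 12, 6]: vertices, edges, faces
```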


X-tree

  • Performance is impacted by the amount of overlap between index nodes

    • Overlapping nodes force the search to follow multiple paths

    • Overlap, multi-overlap, weighted overlap

  • Behaves like an R*-tree when overlap is small

  • Degenerates to sequential access when overlap is large

  • When an overflow occurs

    • Split into two nodes if overlap is small

    • Otherwise create a super-node with twice the capacity

    • Tradeoffs made locally over different regions of data space

  • No performance comparisons with linear scan!


Pyramid Tree

  • Designed for range queries

  • Map each d-dimensional point to 1-d value

  • Build B+-tree on 1-d values

  • A range query is transformed into a set of 1-d ranges

  • More efficient than X-tree, Hilbert order, and sequential scan


Pyramid transformation

  • 2d pyramids, with their tops meeting at the center of the data space

  • Points in different pyramids are ordered by pyramid id

  • Points within a pyramid are ordered by height

  • value(v) = pyramid(v) + height(v)
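A sketch of this mapping for a point v in [0,1]^d, following the Pyramid-Technique: the coordinate deviating most from the center picks the pyramid, and that deviation is the height:

```python
def pyramid_value(v):
    """Map a d-dimensional point in [0,1]^d to its 1-d value: pyramid id + height."""
    d = len(v)
    # the dimension with the largest deviation from the center determines the pyramid
    jmax = max(range(d), key=lambda j: abs(v[j] - 0.5))
    height = abs(v[jmax] - 0.5)                    # in [0, 0.5), so ids never collide
    pyramid = jmax if v[jmax] < 0.5 else jmax + d  # 2d pyramids, numbered 0 .. 2d-1
    return pyramid + height

print(pyramid_value([0.9, 0.5]))   # pyramid 2 (the "high-x" pyramid), height 0.4
```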


Vector Approximation (VA) file

  • Tile the d-dimensional data space uniformly

  • Use a fixed number of bits per dimension (e.g., 8)

    • 2^8 = 256 partitions along each dimension

    • 256^d tiles

  • Approximate each point by its corresponding tile

    • size of an approximation = 8d bits = d bytes

    • size of each point = 4d bytes (assuming a 4-byte word per dimension)

  • 2-step query approach: the first step scans the VA file, the second refines on the actual vectors
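A sketch of the quantization and the space arithmetic, assuming 8 bits per dimension:

```python
def va_approximation(v, bits=8):
    """Replace each coordinate (in [0,1)) by the number of its tile."""
    parts = 2 ** bits                      # 256 partitions per dimension for 8 bits
    return tuple(min(int(x * parts), parts - 1) for x in v)

def sizes(d, bits=8, word_bytes=4):
    """(bytes per approximation, bytes per exact point)."""
    return (bits * d) // 8, word_bytes * d

print(va_approximation((0.0, 0.5, 0.999)))  # (0, 128, 255)
print(sizes(64))                            # (64, 256): the VA file is 4x smaller
```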


Simple NN searching

  • δ = distance to kth NN so far

  • For each approximation ai

    • If lb(q,ai) < δ then

      • Compute r = distance(q,vi)

      • If r < δ then

        • Add point i to the set of NNs

        • Update δ

  • Performance based on ordering of vectors and their approximations
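The first-phase scan above can be sketched as follows (k-NN under Euclidean distance; lb is the distance from q to a candidate's tile, and names such as simple_va_nn are illustrative):

```python
import math

def quantize(v, bits):
    parts = 2 ** bits
    return tuple(min(int(x * parts), parts - 1) for x in v)

def lb(q, approx, bits):
    """Lower bound: distance from q to the nearest point of the tile."""
    w = 1.0 / (2 ** bits)
    s = 0.0
    for x, a in zip(q, approx):
        lo, hi = a * w, (a + 1) * w
        if x < lo:
            s += (lo - x) ** 2
        elif x > hi:
            s += (x - hi) ** 2
    return math.sqrt(s)

def simple_va_nn(q, vectors, bits=8, k=1):
    approxs = [quantize(v, bits) for v in vectors]
    nns = []                                    # (distance, index), kept sorted
    for i, (v, a) in enumerate(zip(vectors, approxs)):
        delta = nns[-1][0] if len(nns) == k else math.inf
        if lb(q, a, bits) < delta:              # filter on the approximation
            r = math.dist(q, v)                 # exact distance only for survivors
            if r < delta:
                nns = sorted(nns + [(r, i)])[:k]
    return nns

print(simple_va_nn((0.0, 0.0), [(0.1, 0.1), (0.9, 0.9), (0.2, 0.2)]))
```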


Near-optimal NN searching

  • δ = kth smallest ub(q,a) so far

  • For each approximation ai

    • Compute lb(q,ai) and ub(q,ai)

    • If lb(q,ai) <= δ then

      • InsertHeap(Heap, lb(q,ai), i)

      • If ub(q,ai) < δ then update δ


Near-optimal NN searching (2)

  • δ = distance to kth NN so far

  • Repeat

    • Examine the next entry (li,i) from the heap

    • If δ < li then break

    • Else

      • Compute r = distance(q,vi)

      • If r < δ then

        • Add point i to the set of NNs

        • Update δ

    • Forever

  • Only a sub-linear (log n) number of vectors needs to be visited after the first phase
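The loop above can be sketched as below; the heap of (lower bound, index) pairs is assumed to come from the first phase, and the bounds must genuinely lower-bound the exact distances:

```python
import heapq, math

def phase2(q, vectors, heap, k=1):
    """Visit candidates in increasing lower-bound order; stop as soon as the
    next lower bound already exceeds the current k-th NN distance."""
    nns = []                                   # (distance, index), kept sorted
    while heap:
        l, i = heapq.heappop(heap)
        delta = nns[-1][0] if len(nns) == k else math.inf
        if delta < l:                          # no later candidate can do better
            break
        r = math.dist(q, vectors[i])
        if r < delta:
            nns = sorted(nns + [(r, i)])[:k]
    return nns

heap = [(0.0, 0), (0.5, 1), (4.0, 2)]          # illustrative phase-1 output
heapq.heapify(heap)
print(phase2((0.1, 0.0), [(0.0, 0.0), (0.5, 0.5), (3.0, 3.0)], heap))
```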


SS-tree and SR-tree

  • SS-tree: uses spheres for index nodes

    • Higher fanout, since the storage cost per entry is reduced

  • SR-tree: uses both rectangles and spheres for index nodes

    • An index node is defined by the intersection of the two volumes

    • More accurate representation of the data

    • Higher storage cost


Metric Tree (M-tree)

  • Definition of a metric

    • d(x,y) >= 0

    • d(x,y) = d(y,x)

    • d(x,y) + d(y,z) >= d(x,z)

    • d(x,x) = 0

  • Works even for non-vector spaces, e.g.:

    • Edit distance

    • d(u,v) = sqrt((u−v)T A (u−v)), as used in QBIC
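For instance, the edit (Levenshtein) distance is a metric on strings, with no vectors in sight; a standard dynamic-programming sketch:

```python
def edit_distance(u, v):
    """Levenshtein distance: minimum number of insertions, deletions, and
    substitutions turning u into v. Symmetric, non-negative, triangle-obeying."""
    m, n = len(u), len(v)
    prev = list(range(n + 1))                 # distances from u[:0] to prefixes of v
    for i in range(1, m + 1):
        cur = [i] + [0] * n
        for j in range(1, n + 1):
            cur[j] = min(prev[j] + 1,                        # deletion
                         cur[j - 1] + 1,                     # insertion
                         prev[j - 1] + (u[i - 1] != v[j - 1]))  # substitution
        prev = cur
    return prev[n]

print(edit_distance("kitten", "sitting"))   # 3
```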


Basic idea

[Diagram: parent p with routing objects x and y; any object z in y's subtree satisfies d(y,z) <= r(y)]

Index entry = (routing object, distance to parent, covering radius)

All objects in a subtree are within a distance of the “covering radius” from the routing object.


Range queries

[Diagram: query q with range t; routing objects x and y under parent p; z in y's subtree with covering radius r(y)]

d(q,z) >= d(q,y) − d(y,z)   (triangle inequality)

d(y,z) <= r(y)

So, d(q,z) >= d(q,y) − r(y)

If d(q,y) − r(y) > t then d(q,z) > t for every z in y's subtree

Prune subtree y if d(q,y) − r(y) > t   (C1)


Range queries

[Diagram as above: query q with range t, parent p, routing object y]

Prune subtree y if d(q,y) − r(y) > t   (C1)

C1 still requires computing d(q,y); the stored distance d(p,y) gives a cheaper test:

d(q,y) >= d(q,p) − d(p,y)

d(q,y) >= d(p,y) − d(q,p)

So, d(q,y) >= |d(q,p) − d(p,y)|

If |d(q,p) − d(p,y)| − r(y) > t then d(q,y) − r(y) > t

Prune subtree y if |d(q,p) − d(p,y)| − r(y) > t   (C2)


Range query algorithm

  • RQ(q, t, Root, Subtrees S1, S2, …)

    • For each subtree Si

      • prune if condition C2 holds

      • otherwise compute distance to root of Si and prune if condition C1 holds

      • otherwise search the children of Si
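The algorithm above can be sketched over a simplified node structure; Node, metric, and the little two-leaf tree are all illustrative, and any metric can replace the Euclidean demo:

```python
import math

class Node:
    """Simplified M-tree node: routing object, stored distance to the parent's
    routing object, covering radius, child nodes, and (for leaves) data points."""
    def __init__(self, obj, d_parent, radius, children=(), points=()):
        self.obj, self.d_parent, self.r = obj, d_parent, radius
        self.children, self.points = list(children), list(points)

def metric(a, b):
    return math.dist(a, b)          # demo metric; edit distance etc. also qualifies

def range_query(node, q, t, d_q_parent=None):
    hits = []
    # C2: prune using only pre-stored distances (no metric evaluation)
    if d_q_parent is not None and abs(d_q_parent - node.d_parent) - node.r > t:
        return hits
    d_qy = metric(q, node.obj)
    if d_qy - node.r > t:           # C1: prune after one metric evaluation
        return hits
    hits += [p for p in node.points if metric(q, p) <= t]
    for child in node.children:
        hits += range_query(child, q, t, d_qy)
    return hits

leaf1 = Node((1, 1), metric((0, 0), (1, 1)), 1.5, points=[(1, 1), (2, 1)])
leaf2 = Node((8, 8), metric((0, 0), (8, 8)), 1.0, points=[(8, 8)])
root = Node((0, 0), 0.0, 13.0, children=[leaf1, leaf2])
print(range_query(root, (0, 0), 2.5))   # leaf2 is pruned by C2
```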


Nearest neighbor query

  • Maintain a priority list of the k smallest NN distances found so far

  • Minimum possible distance from q to the subtree with root x: dmin(q,x) = max(d(q,x) − r(x), 0)

    • |d(q,p) − d(p,x)| − r(x) <= d(q,x) − r(x), so pruning may not even need to compute d(q,x)

  • Maximum possible distance from q to the subtree with root x: dmax(q,x) = d(q,x) + r(x)

[Diagram: for any z in x's subtree, d(q,x) − r(x) <= d(q,z) <= d(q,x) + r(x)]


Nearest neighbor query

  • Maintain an estimate dp = the kth smallest dmax seen so far

  • Prune a subtree x if dmin(q,x) >= dp
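The two bounds as code (illustrative helper names):

```python
def dmin(d_qx, r_x):
    """Closest any object in x's subtree can possibly be to q."""
    return max(d_qx - r_x, 0.0)

def dmax(d_qx, r_x):
    """Farthest any object in x's subtree can possibly be from q."""
    return d_qx + r_x

# with dp = k-th smallest dmax seen so far, subtree x is pruned when dmin >= dp
print(dmin(3.0, 5.0), dmax(3.0, 5.0))   # 0.0 8.0
```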


References

  • Christos Faloutsos, Ron Barber, Myron Flickner, Jim Hafner, Wayne Niblack, Dragutin Petkovic, William Equitz: Efficient and Effective Querying by Image Content. JIIS 3(3/4): 231-262 (1994)

  • Stefan Berchtold, Daniel A. Keim, Hans-Peter Kriegel: The X-tree : An Index Structure for High-Dimensional Data. VLDB 1996: 28-39

  • Stefan Berchtold, Christian Böhm, Hans-Peter Kriegel: The Pyramid-Technique: Towards Breaking the Curse of Dimensionality. SIGMOD Conference 1998: 142-153

  • Roger Weber, Hans-Jörg Schek, Stephen Blott: A Quantitative Analysis and Performance Study for Similarity-Search Methods in High-Dimensional Spaces. VLDB 1998: 194-205

  • Paolo Ciaccia, Marco Patella, Pavel Zezula: M-tree: An Efficient Access Method for Similarity Search in Metric Spaces. VLDB 1997: 426-435

