CS338

CS338 Additional topics

What Are the Applications? • Entertainment: video on demand; interactive television; games, virtual worlds, etc. • Education: distance learning; hypermedia and/or multimodal courseware; digital libraries • Communications: teleconferencing; Web; email • Business: e-commerce; e-business; e-banking • Law enforcement: hypermedia/multimodal search and archive; multimedia criminal archive • Military, intelligence: multimedia databases; virtual simulation of battlefields • Medicine: multimodal medical databases; virtual diagnosis; telesurgery • Art and music: digital sound and music; computerized art, etc.

Media Indexing • Each media modality requires its own indexing scheme and possibly its own storage data structure in the database • Text --- 0 dimensional media (ASCII strings) • Audio --- 1 dimensional media (axis: time) • Image --- 2 dimensional media (axes: x, y) • Video --- 3 dimensional media (axes: x, y, and time or frame number)

Text Fundamentals • One of the most popular modalities • May be viewed as a linear stream of data • Typically represented as an ASCII string for each document • Content indexing still requires a certain level of understanding • No general, satisfactory solutions, but the problem is not as acute as in the image domain • If certain requirements are relaxed, there are “general” solutions available • Retrieval problem: • User wants to find documents related to a topic T • The search program typically tries to find the documents in the “document database” that contain the string T • Two essential problems: • Synonymy: given a word T (i.e., specifying a topic), the word T may not occur anywhere in a document D, even though D is in fact closely related to the topic T in question • Polysemy: the same word may mean many different things in different contexts

Basic Terminologies • Term: an indexed word in an archive (i.e., database) • Term space: the whole set of words indexed in an archive • Document: represented by the set of its words that are indexed • Document space: the whole set of documents in an archive • Frequency: the number of occurrences of a word in a particular document • Frequency table: assuming all the words have been preprocessed with a stop list and stemming, and assuming there are M words and N documents, the M × N matrix in which each entry is the frequency of that word in that document • In all the indexing techniques, an archive is actually represented by the frequency table • In real applications, the database is huge, so the matrix is also huge • Indexing methods are required to handle this huge matrix
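The frequency table described above can be sketched in a few lines. This is a minimal illustration, not a production indexer: the function name `frequency_table`, the whitespace tokenizer, and the two sample documents are assumptions made for the example, and stemming is left out.

```python
from collections import Counter

def frequency_table(documents, stop_words=frozenset()):
    """Return (terms, table): table[i][j] is the count of terms[i] in document j."""
    # One Counter per document, skipping stop-list words (no stemming here).
    counts = [Counter(w for w in doc.lower().split() if w not in stop_words)
              for doc in documents]
    # Term space: every indexed word, sorted so that row order is stable.
    terms = sorted(set().union(*counts))
    # M x N matrix: one row per term, one column per document.
    table = [[c[t] for c in counts] for t in terms]
    return terms, table

# Hypothetical two-document archive.
docs = ["the ship sank", "the ship and the iceberg"]
terms, table = frequency_table(docs, stop_words={"the", "and"})
```

Here `terms` is the term space and each row of `table` is one term's frequencies across the document space.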

Precision* • D: finite set of documents • T: finite set of topics, i.e., a finite set of strings • A: any algorithm that takes as input a topic string $t \in T$ and returns as output a set $A(t)$ of documents • Precision of A w.r.t. the predicate relevant and the topic $t \in T$ is defined: $P_t\% = 100 \times \frac{1 + \mathrm{card}(\{d \in D \mid d \in A(t) \wedge \mathrm{relevant}(t,d)\})}{1 + \mathrm{card}(\{d \in D \mid d \in A(t)\})}$ • Precision of A w.r.t. the predicate relevant, the document set D, and the topic set T is defined: $P\% = 100 \times \frac{\sum_{t \in T} P_t}{\mathrm{card}(T)}$, where $P_t = P_t\% / 100$ • Measures how many of the answers returned are in fact correct

Recall* • Recall of A w.r.t. the predicate relevant and the topic $t \in T$ is defined: $R_t\% = 100 \times \frac{1 + \mathrm{card}(\{d \in D \mid d \in A(t) \wedge \mathrm{relevant}(t,d)\})}{1 + \mathrm{card}(\{d \in D \mid \mathrm{relevant}(t,d)\})}$ • Recall of A w.r.t. the predicate relevant, the document set D, and the topic set T is defined: $R\% = 100 \times \frac{\sum_{t \in T} R_t}{\mathrm{card}(T)}$, where $R_t = R_t\% / 100$ • Measures how many of the correct documents are in fact retrieved by the query
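A small sketch of the per-topic precision and recall definitions, including the "1 +" terms in numerator and denominator. The function name and the document identifiers are hypothetical; the sets stand in for A(t) and for the documents satisfying relevant(t, d).

```python
def precision_recall(returned, relevant):
    """Per-topic precision and recall in percent, with the '1 +' terms."""
    returned, relevant = set(returned), set(relevant)
    hits = len(returned & relevant)          # returned AND relevant
    precision = 100.0 * (1 + hits) / (1 + len(returned))
    recall = 100.0 * (1 + hits) / (1 + len(relevant))
    return precision, recall

# Hypothetical topic: 3 documents returned, 4 documents actually relevant.
precision, recall = precision_recall({"d1", "d2", "d3"},
                                     {"d2", "d3", "d4", "d5"})
```

With two hits, precision is 100 × 3/4 and recall is 100 × 3/5. Averaging these values over all topics in T gives the archive-level figures.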

Text Indexing • Preprocessing • Key word detection • Proper noun detection • Word expansion • Stemming • Stop list • Light parsing • Named entity tagging • Layout analysis • Indexing • Inverted files • Vector space indexing • Regular vector space indexing • Latent semantic indexing • Signature files

Preprocessing • Key word detection • String matching against a prepared list of words • Sometimes uses heuristics, e.g., proper noun detection by looking for capitalized words • Stop list • A set of words that do not “discriminate” between the documents in a given archive • There are general stop words (e.g., a, the), but in most situations stop words are archive-specific (e.g., in CS literature, computer could be a stop word) • The list must be compiled explicitly; its words are then excluded from all the documents by direct string matching • Word stemming • Many words share the same semantic stem but have small syntactic variations (e.g., names, naming, named) • They may all be represented by their stem word (e.g., name) • Requires stemming rules; typically uses heuristics; needs to pay attention to special cases (e.g., run, running, ran)

Preprocessing, Cont’d • Named entity tagging • A method of light parsing --- a step towards natural language understanding • A “direct” way to resolve the polysemy issue, but mostly applied to proper noun processing • Many words may have different meanings in different contexts: • I am leaving for Washington, D.C. • The one dollar bill shows George Washington. • Layout analysis • A multimedia document is typically presented with data in different modalities • A specific presentation specifies a spatial layout indicating spatial correlation between data in different modalities • This correlation is lost in the hypertext source and needs to be recovered --- layout analysis • In general, it is a hard problem; typically relies on heuristics • In certain situations, it is relatively easy to find a solution

Inverted Files • For each term, maintains a list of pointers (posting_list), each of which points to a document in which this term appears • Typically the inverted files are organized using sophisticated primary-key access methods (e.g., B-trees, hashing) • Pros: • Easy to implement • Fast to execute • Supports synonym indexing (e.g., using threaded list) to a certain degree • Cons: • Storage overhead (can reach up to 300% of the original file size if too much information is kept) • Cost of updating and reorganizing the index if the database changes significantly • Still not a complete solution to the synonymy problem in a broad sense • Widely used in many applications (the most popular approach) and across different languages
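A minimal sketch of an inverted file, assuming a plain Python dictionary in place of the B-tree or hashing access method and integer document ids in place of pointers. The names `build_inverted_file` and `lookup` and the sample documents are made up for illustration.

```python
from collections import defaultdict

def build_inverted_file(documents):
    """Map each term to its posting list: ids of the documents containing it."""
    postings = defaultdict(list)
    for doc_id, text in enumerate(documents):
        # set() so that a document appears once per term, however often it occurs.
        for term in sorted(set(text.lower().split())):
            postings[term].append(doc_id)
    return dict(postings)

def lookup(index, term):
    """Return the posting list for a term (empty if the term is not indexed)."""
    return index.get(term.lower(), [])

# Hypothetical archive of three short documents.
index = build_inverted_file(["Titanic sank", "maritime tragedies", "Titanic tragedies"])
```

Note that `lookup(index, "aviation")` returns nothing even for documents that are semantically related, which is the limitation discussed on the next slide.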

Inverted Files May Not Always Work • It is not a complete solution to the synonymy problem in a broad sense --- it does not address the semantics contained in a document • Example: • Titanic • Maritime aviation tragedies • The semantic correlation between the two documents is obvious, but they don’t share any common words; nor are their words synonyms • Requires semantic understanding and correlation • Techniques • NLP --- understanding --- expensive and not reliable • LSI --- directly finds correlation --- inexpensive and reliable

Zipf’s Law* • Assume that the whole vocabulary is sorted by word occurrence frequency in decreasing order • The occurrence frequency of a word is inversely proportional to its rank in the sorting: $f = \frac{1}{r \ln(1.78\,V)}$, where f is the occurrence frequency of the word; r is its rank in the sorting; V is the size of the vocabulary • Means that a few vocabulary words appear very often, while the majority of vocabulary words appear once or twice; also true for other languages

Latent Semantic Indexing* • Tries to find a relatively small subset of K words which discriminate between the M documents in the archive • In the literature, K may be as small as 200 • Now only needs to search the k nearest neighbors in K dimensional space, as opposed to N dimensional space; a significant saving of computation • Question is how to find such a relatively small subset of words that discriminate between the M documents in the archive • Algorithm: • Given the database represented as the matrix A (M × N, one row per document) • Compute the singular value decomposition of A into U, Σ, V, i.e., $A = U \Sigma V^T$ • Keep the first K singular values in Σ to become Σ’, and accordingly truncate U and V to become U’ and V’ • Now each document may be indexed by the corresponding row vector in U’, which is in a space of dimension K << N • Given a query, it may be viewed as a document, and thus represented as a vector q • Since $U' = A V' \Sigma'^{-1}$, the corresponding query vector in the new space is $q' = q V' \Sigma'^{-1}$ • Finally, find the first k nearest neighbors of q’ among the rows of U’ in the new K dimensional space
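The query projection can be derived in a few steps. The sketch below assumes A is M × N with one row per document, that the columns of V’ are orthonormal (as the SVD provides), and that the K retained singular values are nonzero so that Σ’ is invertible.

```latex
\begin{aligned}
A &= U \Sigma V^{T}
  && \text{full SVD} \\
A V' &= U \Sigma V^{T} V' = U' \Sigma'
  && \text{since } V^{T} V' \text{ selects the first } K \text{ columns} \\
U' &= A V' \Sigma'^{-1}
  && \text{multiply on the right by } \Sigma'^{-1} \\
u'_i &= a_i V' \Sigma'^{-1}
  && \text{row } i \text{ of } A \text{ (document } i\text{) maps to row } i \text{ of } U' \\
q' &= q V' \Sigma'^{-1}
  && \text{a query } q \ (1 \times N) \text{ treated as one more row, giving } q' \ (1 \times K)
\end{aligned}
```

The dimensions check out as (1 × N)(N × K)(K × K) = 1 × K, so q’ lives in the same K dimensional space as the rows of U’.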

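Diagrammatical Illustration of LSI* • Full decomposition: $A_{M \times N} = U_{M \times N}\, \Sigma_{N \times N}\, V^{T}_{N \times N}$ • Truncated to K singular values: $A_{M \times N} \approx U'_{M \times K}\, \Sigma'_{K \times K}\, V'^{T}_{K \times N}$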

Remaining Question* • Even though K << N saves significantly in computation and in memory storage, K is still too high for the vectors to be indexed in “usual” data structures • The solution is to use “unusual” data structures --- e.g., TV-trees

SBC Scheme • Time/Frequency Mapping: a filter bank or FFT decomposes the input audio signal into subbands • Psychoacoustic Model: looks at the subbands as well as the original signal and determines masking thresholds dynamically using psychoacoustic information • Quantizer and Coding: each of the subband samples is quantized and encoded so as to keep the quantization noise below the masking threshold • Frame Packing: assembles all the quantized samples into frames • Frame Unpacking: frames are unpacked • Reconstruction: subband samples are decoded • Frequency/Time Mapping: turns the decoded subband samples back into a single signal • Example of SBC: MPEG audio --- 3 different layers, each a self-contained SBC coder with its own time-frequency mapping, psychoacoustic model, and quantizer --- tradeoff between computation burden and compression performance • Layer 1: simplest, fastest, but poorest compression • Layer 2: moderate in both computation and compression • Layer 3: most complicated, most expensive, but best compression

Image characteristics

Image Fundamentals • Digital images, like digital audio signals, are obtained from two levels of digitization: • Spatial level --- in addition to the Nyquist sampling theorem, which governs the resolution, the representational geometry of the sampling grid also needs attention in analysis, even though in practice each sampling point is just a blob: • Square • Triangle • Hexagon • Intensity level --- the same as in audio signals, called quantization; depending on the nature of the signal, the result may be a single intensity value or an intensity vector (e.g., color) • Each spatial digitization element is called a pixel; the corresponding quantized intensity is called the pixel value • A digital image may be viewed as a matrix whose entries are the pixel values

Basic Image Operations • Algebraic Operations: • Addition: $I = I_1 + I_2$, same resolutions, pixelwise addition • Subtraction: $I = I_1 - I_2$, same resolutions, pixelwise subtraction • Scalar Multiplication: $I = \alpha I_1$, pixelwise multiplication • Scalar Division: $I = I_1 / \alpha$, pixelwise division • Logic Operations: • AND: $I = I_1$ AND $I_2$, same resolutions, pixelwise, bitwise AND • OR: $I = I_1$ OR $I_2$, same resolutions, pixelwise, bitwise OR • NOT: $I =$ NOT $I_1$, pixelwise, bitwise NOT • Correlation: • Image I: N×N, Mask G: (2k+1)×(2k+1) • Ignoring the “boundary effect”: $(I \otimes G)(i,j) = \sum_{x=-k}^{k} \sum_{y=-k}^{k} I(i+x, j+y)\, G(x+k, y+k)$ • How to handle the “boundary effect”? • Ignore the boundary pixels of $I \otimes G$, i.e., only compute it for i, j = k, …, N-k-1 (pixels indexed from 0) • Expand I to (N+2k)×(N+2k) by adding k rows and columns of pixels with value 0 at each of the four boundaries of I before the correlation • Convolution: • Same as the correlation except for $(I * G)(i,j) = \sum_{x=-k}^{k} \sum_{y=-k}^{k} I(i-x, j-y)\, G(x+k, y+k)$ • Same ways to handle the “boundary effect”
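The correlation and convolution sums can be sketched directly on lists of lists. This illustration assumes the zero-padding option for the boundary effect (out-of-range pixels read as 0), and the function names are chosen for the example. Convolution is obtained by flipping the mask in both directions and correlating, which follows from substituting x → -x, y → -y in the convolution sum.

```python
def correlate(image, mask):
    """Correlation of an N x N image with a (2k+1) x (2k+1) mask, zero padded."""
    n, k = len(image), len(mask) // 2

    def pixel(i, j):
        # Zero padding: anything outside the image counts as 0.
        return image[i][j] if 0 <= i < n and 0 <= j < n else 0

    return [[sum(pixel(i + x, j + y) * mask[x + k][y + k]
                 for x in range(-k, k + 1) for y in range(-k, k + 1))
             for j in range(n)] for i in range(n)]

def convolve(image, mask):
    """Convolution = correlation with the mask flipped in both directions."""
    flipped = [row[::-1] for row in mask[::-1]]
    return correlate(image, flipped)

# A unit impulse makes the difference between the two operations visible.
impulse = [[0, 0, 0], [0, 1, 0], [0, 0, 0]]
mask = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
```

Convolving the impulse reproduces the mask, while correlating it produces the flipped mask.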

Edge Detection • Edges correspond to discontinuities, caused by: • Changes in surface orientation • Changes in depth • Changes in surface reflectance • Cast shadows • Edge detectors cannot distinguish between the various kinds of discontinuities; nor can they detect illusory edges • [Figure: examples of depth, surface, reflectance, and illumination discontinuities]

Edge Detection, Cont’d • Goal: extraction of “significant” edges from images • Hope to distinguish local image edges from 3D edges • Only information available is the image intensity surface • Edges: characterized by rapid changes in the image intensity surface or its derivatives • Observations: • Edges are visually apparent where abrupt changes occur in an image feature (e.g., image brightness or color) • Local edges have a location, an orientation, and a strength (“edge contrast”) • Local edges are only loosely related to lines, curves, object boundaries, occlusion boundaries, etc.

Ideal vs. Real Discontinuities • Ideal “Step” • Ideal “Ramp” • Ideal “Strip” • Ideal “Roof” • Real Edges

Edge Detection Techniques* • Edges are high-frequency components of an image • High-frequency components may be detected by taking derivatives of an image • Taking derivatives of an image may be approximated by taking differences between adjacent pixels • Typically an edge pixel has two components, computed from the horizontal and vertical differences $P_x$ and $P_y$: • Strength $S = \sqrt{P_x(i,j)^2 + P_y(i,j)^2}$ • Orientation $\theta = \arctan(P_y(i,j)/P_x(i,j))$ • Edge detectors cannot distinguish between “true” edges and noise • Examples of edge detection masks (rows separated by semicolons): • 1×2: dx: [-1 1], dy: [-1; 1] • 1×3: dx: [-1 0 1], dy: [-1; 0; 1] • 2×2 Roberts: [0 1; -1 0] and [1 0; 0 -1] • 3×3 Sobel: dx: [-1 0 1; -2 0 2; -1 0 1], dy: [1 2 1; 0 0 0; -1 -2 -1]
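A sketch of edge strength and orientation at one interior pixel, assuming the standard 3×3 Sobel masks. It uses `math.atan2` rather than a plain arctangent of the ratio so that a zero horizontal difference does not cause a division by zero; the function names and the test image are illustrative assumptions.

```python
import math

# Standard 3x3 Sobel masks (horizontal and vertical differences).
SOBEL_DX = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]
SOBEL_DY = [[1, 2, 1], [0, 0, 0], [-1, -2, -1]]

def apply_mask(image, mask, i, j):
    """Correlate a 3x3 mask with the neighborhood of interior pixel (i, j)."""
    return sum(image[i + x][j + y] * mask[x + 1][y + 1]
               for x in (-1, 0, 1) for y in (-1, 0, 1))

def sobel_edge(image, i, j):
    """Return (strength, orientation in radians) at interior pixel (i, j)."""
    px = apply_mask(image, SOBEL_DX, i, j)
    py = apply_mask(image, SOBEL_DY, i, j)
    return math.hypot(px, py), math.atan2(py, px)

# Hypothetical 4x4 image with a vertical step edge between columns 1 and 2.
step = [[0, 0, 10, 10] for _ in range(4)]
strength, theta = sobel_edge(step, 1, 1)
```

On this step image the vertical difference cancels, so the whole response comes from the horizontal mask.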

Region Detection • Regions are opposite to edges; look for continuities or homogeneities • Regions stand for low-frequency components in an image • Like edges, in real images, it is very difficult to distinguish a “true” region boundary from noise • Goal: partitioning an image into different regions (i.e., connected components), each having uniform properties in certain defined image features: • Intensity values • Color values • Texture • Local gradient • Formal definition*: a region detection of an image I is a partition of the set of pixels of I into a set of regions $\{R_j\}$, j = 1, …, k, s.t. • $I = \bigcup_{j=1}^{k} R_j$: every pixel belongs to at least one region • $R_i \cap R_j = \emptyset$ if $i \neq j$: no pixel belongs to more than one region • p is connected to p’ for all $p, p' \in R_j$: spatial coherence • For a certain predicate P, if $P(R_j)$ is true for some j, then $P(R_j \cup R_i)$ is false if $i \neq j$ and $R_j$, $R_i$ are adjacent

Region Detection Basic Approaches • Two basic approaches • Region Growing • Start with many trivial regions (e.g., pixels) • Merge regions into larger regions based on some similarity criteria • Continue merging till no further merges are possible • Region Splitting • Start with a single large region (e.g., an entire image) • Split into several smaller regions based on some “splitting” criteria • Continue until no further splits are possible (i.e., regions are uniform) • Split and Merge: hybrid approach • Combination: split followed by merges, or vice versa • Split and merge decisions can be either • Local: • A pixel and its immediate neighbors • A region and its immediate neighbors • Global: on the basis of a large number of pixels scattered through the image

Image indexing

Image Indexing: Image Features and Similarity Matching • Image feature based similarity matching is the essential approach to image indexing and retrieval • Every similarity function depends on a set of well defined image features • Image features • Color features • Color histograms, color correlogram • Texture features • Gabor wavelet features, Fractal features • Statistical features • Histograms, moments • Transform features in other domains • Fourier features, wavelet features, fractal features • Intensity profile features • Gaussian features

Histogram • A statistical feature of an image: count the number of pixels for each intensity bucket • An intensity bucket may be a group of intensity values, or may just be a single intensity value • A vector that can be displayed as a 1D signal • Example: [Figure: an original image of small integer pixel values, its histogram over the intensity range 0 to 20, and the region segmentation obtained with threshold T=9] • General case: [Figure: histogram; region segmentation through thresholding in the valleys of the histogram]

Histogram Manipulations • Change pixels’ intensity values by manipulating the histogram; typically a global effect • Stretch • Shrink • Slide

Histogram Based Similarity* • Given two images I and I’ and their normalized histograms H and H’, assuming H and H’ both have the same number of buckets n • Function 1: if $\|H - H'\| <$ Threshold, then I and I’ are similar • $\|H - H'\|$ in L2 distance: $\sqrt{\sum_{i=1}^{n} (H(i) - H'(i))^2}$ • $\|H - H'\|$ in L1 distance: $\sum_{i=1}^{n} |H(i) - H'(i)|$ • Function 2: compute the normalized correlation $H \cdot H' \in [-1, 1]$ and threshold the value for similarity matching • Function 3: normalized intersection of histograms: $S(I,I') = \frac{\sum_{i=1}^{n} \min(H(i), H'(i))}{\sum_{i=1}^{n} H(i)} \in [0,1]$ • Pros: insensitive to change in image resolution, histogram size, occlusion, depth, and viewpoints • Cons: expensive • Improvements • Only use a small number of “peaks” in the histogram • Divide the whole image into subimages and conduct histogram matching in each subimage • For color images, H is just a set of (e.g., 3) vectors; definitions may be changed accordingly
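The distance and intersection functions can be sketched as follows. The function names, the equal-width bucketing rule in `normalized_histogram`, and the sample histograms are assumptions for illustration; the L2 distance is taken with the square root.

```python
import math

def normalized_histogram(pixels, n_buckets, max_value=255):
    """Fraction of pixels falling in each of n_buckets equal-width buckets."""
    counts = [0] * n_buckets
    for p in pixels:
        counts[min(p * n_buckets // (max_value + 1), n_buckets - 1)] += 1
    return [c / len(pixels) for c in counts]

def l1_distance(h1, h2):
    return sum(abs(a - b) for a, b in zip(h1, h2))

def l2_distance(h1, h2):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(h1, h2)))

def intersection(h1, h2):
    """Normalized histogram intersection; 1 for identical normalized histograms."""
    return sum(min(a, b) for a, b in zip(h1, h2)) / sum(h1)

# Two hypothetical normalized histograms with n = 4 buckets.
h = [0.5, 0.5, 0.0, 0.0]
h_prime = [0.25, 0.25, 0.5, 0.0]
```

A similarity decision then compares `l1_distance(h, h_prime)` or `l2_distance(h, h_prime)` against a chosen threshold, or thresholds the intersection value from below.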

Moments* • Given image f(x,y), the (p+q)th order moment is defined: $m(p,q) = \iint f(x,y)\, x^p y^q \,dx\,dy$, p, q = 0, 1, 2, … • Moment representation theorem: the infinite set of moments {m(p,q), p, q = 0, 1, 2, …} uniquely determines f(x,y), and vice versa • A statistic to characterize an image • According to the theorem, only the whole set of moments can uniquely characterize the image. Can we truncate to the first finite number of moments? • In practice, only a finite number of moments can be used for similarity matching, so matching moments is only a necessary condition for similarity, just like histograms • Given moments m(p,q) of f(x,y) up to the order p+q=N, a “similar” function may be reconstructed as $g(x,y) = \sum_{i,j=0}^{N} h(i,j)\, x^i y^j$ by solving for the unknowns h(i,j) in a set of equations obtained by equating the moments of g(x,y) to m(p,q) • Problem: when more moments become available, all the unknowns h(i,j) have to be solved for again --- moments defined this way are coupled • The solution is to change to orthogonal moments

Orthogonal Moments* • Define the orthogonal Legendre polynomials: $P_0(x) = 1$, $P_n(x) = \frac{1}{n!\,2^n} \frac{d^n}{dx^n}(x^2 - 1)^n$, n = 1, 2, …, with $\int_{-1}^{1} P_n(x) P_m(x)\,dx = \frac{2}{2n+1}\,\delta(m-n)$, where $\delta(x) = 1$ if x = 0, and 0 otherwise • Given an image f(x,y), $x, y \in [-1,1]$ W.L.O.G., the orthogonal moments $\lambda(p,q)$ are defined: $\lambda(p,q) = \frac{(2p+1)(2q+1)}{4} \int_{-1}^{1}\int_{-1}^{1} f(x,y)\, P_p(x) P_q(y)\,dx\,dy$ • Similarly, f(x,y) can be reconstructed as $f(x,y) = \sum_{p=0}^{\infty} \sum_{q=0}^{\infty} \lambda(p,q)\, P_p(x) P_q(y)$ • The relationship between $\lambda(p,q)$ and m(p,q) is: with $P_m(x) = \sum_{j=0}^{m} c(m,j)\, x^j$, $\lambda(p,q) = \frac{(2p+1)(2q+1)}{4} \sum_{j=0}^{p} \sum_{k=0}^{q} c(p,j)\, c(q,k)\, m(j,k)$ • c(m,j) is the coefficient of $x^j$ in the Legendre polynomial $P_m(x)$ • Now an approximation to f(x,y) can be obtained by truncating $\lambda(p,q)$ at a given finite order p+q=N: $f(x,y) \approx g(x,y) = \sum_{p=0}^{N} \sum_{q=0}^{N-p} \lambda(p,q)\, P_p(x) P_q(y)$ • The $\lambda(p,q)$ already computed do not need to be updated when more or fewer moments are used

Moment Invariants* • Certain functions of moments are invariant under geometric transforms such as translation, scaling, and rotation • Goal: these invariant functions may be used for image similarity matching • Translation: define the central moments $\mu(p,q) = \iint (x - \bar{x})^p (y - \bar{y})^q f(x,y)\,dx\,dy$, $\bar{x} = m(1,0)/m(0,0)$, $\bar{y} = m(0,1)/m(0,0)$ • Scaling: under a scale change x’ = ax, y’ = ay, the moments of f(ax, ay) change to $\mu'(p,q) = \mu(p,q) / a^{p+q+2}$; the normalized moments defined below are invariant to scale change: $\eta(p,q) = \mu'(p,q) / (\mu'(0,0))^{(p+q+2)/2}$ • Rotation and reflection: under the change of coordinates x’ = a1 x + a2 y, y’ = a3 x + a4 y, the following functions of the transformed moments $\eta(p,q)$ are invariant for rotation (a1 = a4 = cos θ, a2 = -a3 = sin θ) or reflection (a1 = -a4 = cos θ, a2 = a3 = sin θ): • For first order moments: $\eta(0,1) = \eta(1,0) = 0$ • For second order moments: $\phi_1 = \eta(2,0) + \eta(0,2)$, $\phi_2 = (\eta(2,0) - \eta(0,2))^2 + 4\,\eta(1,1)^2$
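A sketch of the second-order invariants on a discrete image, with sums over pixels standing in for the integrals and the second invariant taken in its usual form (η(2,0) − η(0,2))² + 4 η(1,1)². The function names and the tiny test image are hypothetical. The check rotates the image by 90 degrees and shifts it inside a larger frame, two transforms that a pixel grid represents exactly.

```python
def central_moment(image, p, q):
    """Discrete central moment mu(p, q); x is the row index, y the column index."""
    m00 = sum(sum(row) for row in image)
    xbar = sum(x * v for x, row in enumerate(image) for v in row) / m00
    ybar = sum(y * v for row in image for y, v in enumerate(row)) / m00
    return sum((x - xbar) ** p * (y - ybar) ** q * v
               for x, row in enumerate(image) for y, v in enumerate(row))

def normalized_moment(image, p, q):
    """eta(p, q) = mu(p, q) / mu(0, 0) ** ((p + q + 2) / 2)."""
    return central_moment(image, p, q) / central_moment(image, 0, 0) ** ((p + q + 2) / 2)

def invariants(image):
    """Second-order invariants (phi1, phi2)."""
    n20, n02, n11 = (normalized_moment(image, p, q)
                     for p, q in ((2, 0), (0, 2), (1, 1)))
    return n20 + n02, (n20 - n02) ** 2 + 4 * n11 ** 2

# Hypothetical 3x3 image, its 90-degree rotation, and a translated copy.
image = [[0, 1, 2], [0, 3, 0], [1, 0, 0]]
rotated = [list(r) for r in zip(*image[::-1])]
shifted = [[0] * 5] + [[0, 0] + row for row in image]
```

Up to floating-point rounding, `invariants` returns the same pair for all three images.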

Gaussian Invariants* • Observation: intensity functions need not be continuous • The derivative of a possibly discontinuous function can be made well posed if the function is convolved with the derivative of a smooth function • Use the Gaussian function as the smooth function: $G(x,\sigma) = \frac{1}{2\pi\sigma^2} \exp\left(-\frac{x^2}{2\sigma^2}\right)$, written with a scalar x as in the 1D case; • in 2D, x is replaced by a vector (x,y) • Given an image I, its complete derivatives of order n at scale σ at point x: $I_{i_1 i_2 \ldots i_n, \sigma}(x) = (I \otimes G_{i_1 i_2 \ldots i_n})(x, \sigma)$ • where each of $i_1, \ldots, i_n$ is x or y, indicating the variables of differentiation, and $\otimes$ denotes convolution • Define $J^N[I](x,\sigma) = \{ I_{i_1 \ldots i_n, \sigma} \mid n = 0, \ldots, N \}$, called the N-Jet • Example: N=2: $J^2[I](x,\sigma) = \{ I_{\sigma}(x), I_{x,\sigma}(x), I_{y,\sigma}(x), I_{xx,\sigma}(x), I_{xy,\sigma}(x), I_{yy,\sigma}(x) \}$ • Theorem: for any order N, the local N-Jet at scale σ locally contains all the information required to reconstruct I at the scale of observation σ up to order N

Gaussian Invariants, Cont’d* • In practice, to allow scale invariance, compute the N-Jet at each location for multiple scales: $\{J^N[I](x,\sigma_1), J^N[I](x,\sigma_2), \ldots, J^N[I](x,\sigma_K)\}$ • Example: N=2, K=3, the set of invariants at each location: • $d_0 = I$ --- intensity • $d_1 = I_x^2 + I_y^2$ --- magnitude • $d_2 = I_{xx} + I_{yy}$ --- Laplacian • $d_3 = I_{xx} I_x I_x + 2 I_{xy} I_x I_y + I_{yy} I_y I_y$ • $d_4 = I_{xx}^2 + 2 I_{xy}^2 + I_{yy}^2$ • d0 is omitted as it is sensitive to gray-level shifts • For each location, sample d1 to d4 at three scales ($\sigma_1, \sigma_2, \sigma_3$) --- forming a 12 element vector D • For each image, sample locations uniformly, and compute D for each sampled location • Similarity matching becomes finding whether an image contains a set of D’s that are similar to the D’s from the query image

Fourier Transform* • Motivation: decomposition of a function f(x) into the weighted summation of an infinite number of sine and cosine basis functions • Formally, given f(x), its Fourier transform F(u) is defined: FT: $F(u) = \int_{-\infty}^{\infty} f(x) \exp(-j 2\pi x u)\,dx$; IFT: $f(x) = \int_{-\infty}^{\infty} F(u) \exp(j 2\pi u x)\,du$ • Here $j = \sqrt{-1}$, so F(u) is in general a complex function • Similarly, a 2D discrete image f(m,n) and its Fourier transform F(k,l) are defined: $F(k,l) = \left[\sum_{m=0}^{M-1} \sum_{n=0}^{N-1} f(m,n) \exp(-j 2\pi k m / M) \exp(-j 2\pi l n / N)\right] R(M,N)$, $f(m,n) = \frac{1}{MN} \left[\sum_{k=0}^{M-1} \sum_{l=0}^{N-1} F(k,l) \exp(j 2\pi m k / M) \exp(j 2\pi n l / N)\right] R(M,N)$ • where the image f(m,n) is limited to m = 0, …, M-1; n = 0, …, N-1 • R(M,N) is defined as 1 for m = 0, …, M-1 and n = 0, …, N-1, and 0 elsewhere

Fourier Features* • Fourier transform properties: given $f(x) \Leftrightarrow F(u)$: $f(ax) \Leftrightarrow \frac{1}{|a|} F(u/a)$; $f(x-a) \Leftrightarrow F(u) \exp(-j 2\pi u a)$; for 2D, $f(x,y) \Leftrightarrow F(u,v)$, and if f(x,y) rotates by θ then F(u,v) rotates by θ • Given an image f(x,y), index |F(u,v)|, which is: • Location invariant (after using central coordinates) • Scaling invariant (after normalization) • Rotation invariant (after using central coordinates in frequency domain) • Caution: indexing only on |F(u,v)| may cause false matches • Given $f(x,y) \Leftrightarrow F(u,v) = |F(u,v)| \exp(j\phi(u,v))$ • It is possible that $f_1(x,y) \neq f_2(x,y)$ while $|F_1(u,v)| = |F_2(u,v)|$ • In fact, most information is contained in the phase $\phi(u,v)$ • Given a constant A, $\mathrm{IFT}\{A \exp(j\phi(u,v))\} \approx f(x,y)$ • But $\mathrm{IFT}\{|F(u,v)| \exp(jA)\} \approx$ nothing recognizable
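Both the shift invariance of the magnitude and the false-match caution can be checked numerically. The sketch below uses a direct 1D discrete Fourier transform for brevity (the 2D case applies the same transform along each axis); the signal values are made up, and the shift is circular because the DFT treats the signal as periodic.

```python
import cmath

def dft(signal):
    """Direct O(n^2) discrete Fourier transform of a 1D signal."""
    n = len(signal)
    return [sum(signal[m] * cmath.exp(-2j * cmath.pi * k * m / n)
                for m in range(n))
            for k in range(n)]

def magnitudes(signal):
    return [abs(c) for c in dft(signal)]

# Hypothetical signal, a circularly shifted copy, and a reversed copy.
signal = [1.0, 2.0, 0.0, -1.0, 3.0, 0.5]
shifted = signal[-2:] + signal[:-2]
reversed_signal = signal[::-1]
```

The shifted copy has the same magnitude spectrum, as intended for a location-invariant index. The reversed copy is a different signal with the same magnitude spectrum as well, which is exactly the kind of false match that indexing on the magnitude alone cannot rule out.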

Color Fundamentals • Colorimetry --- Psychophysics of color perception • Basic result: Trichromatic Theory --- most of the colors observed in daily life can be perfectly reproduced by a mixture of three fixed colors, and the proportions of the mixture are uniquely determined • Only “most of the colors”, not all the colors --- there are exceptions • “Mixture” means algebraic addition, i.e., components could be negative • The three “basic” colors are called primaries • The choices of primaries are broad, not unique, as long as they are independent, i.e., none of them may be obtained from the other two • Mixtures obey addition and proportional laws, i.e., if x, y, z are the primaries, then • $U = ax + by + cz$, $V = mx + ny + pz$ $\Rightarrow$ $U + V = (a+m)x + (b+n)y + (c+p)z$ • $U = ax + by + cz$ $\Rightarrow$ $sU = sax + sby + scz$ • Metamerism: different spectral energy distributions can yield an identical color

Video

MPEG Standard • Only specified as a standard --- actual CODECs are up to many different algorithms, most of them are proprietary • All MPEG algorithms are intended for both applications • Asymmetric: frequent use of decompression process while compression process is performed once (e.g., movie on demand, electronic publishing, e-education, distance learning) • Symmetric: equal use of compression and decompression processes (e.g., multimedia mail, video conferencing) • Decoding is easy • MPEG1 decoding in software on most platforms • Hardware decoders widely available with low prices • Windows graphics accelerators with MPEG decoding now entering market (e.g., Diamond) • Encoding is expensive • Sequential software encoders are 20:1 real-time • Real-time encoders use parallel processing • Real-time hardware encoders are expensive • MPEG standard consists of 3 parts: • Synchronization and multiplexing of video and audio • Video • Audio

Compression Mode of H.261* • Selection depends on answers to several key questions: • Should a motion vector (MV) be transmitted? • Inter vs. Intra compression? • Should the quantizer step size be changed? • Specifically, selection is based on the following values: • Variance of the original macroblock (MB) • The macroblock difference (bd) • The displaced macroblock difference (dbd) • Selection algorithm: • If the variance of dbd is smaller than that of bd, as determined by a threshold, select mode Inter + MC (motion compensation), and the MV needs to be transmitted as side information • Else, the MV will not be transmitted; if the original MB has a smaller variance, select Intra mode, where the DCT of each 8x8 block of the original picture elements is computed; else, select Inter mode (with zero MV), and the difference blocks (prediction error) are DCT encoded

H.261 Coding Scheme* • In each MB, each block is 64 point DCT coded; this applies to the four luminance blocks and the two chroma blocks (U and V) • A variable thresholding is applied before quantization to increase the number of zero coefficients; the accuracy of the coefficients is 12 bits with dynamic range in [-2048, 2047] • Within a MB the same quantizer is used for all coefficients except for the Intra DC; the same quantizer is used for both luminance and chrominance coding; the Intra DC coefficient is separately quantized • After quantization, coefficients are scanned in zigzag order and coded as a series of pairs (the run length of zeros preceding the coefficient, the coefficient value) • Example: the 8x8 block of quantized coefficients with rows 3 0 0 0 0 0 0 0 / 2 0 0 0 0 0 0 0 / 0 0 0 0 0 0 0 0 / 0 0 0 0 0 0 0 0 / 1 0 0 0 0 0 0 0 / … (all remaining coefficients 0) is coded as (0, 3), (1, 2), (7, 1), EOB • In most implementations the quantizer step size is adjusted based on a measure of buffer fullness to obtain the desired bit rate; the buffer size is chosen not to exceed the maximum allowable coding delay (150 ms)
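The zigzag scan and the (run, value) pairing can be sketched as below. This is only an illustration of the pairing step: the function names are invented, the scan follows the usual zigzag order over anti-diagonals, and the variable-length coding of the pairs is not shown.

```python
def zigzag_order(n=8):
    """(row, col) positions of an n x n block in zigzag scan order."""
    order = []
    for s in range(2 * n - 1):                 # s = row + col of one anti-diagonal
        rows = range(max(0, s - n + 1), min(s, n - 1) + 1)
        if s % 2 == 0:
            rows = reversed(rows)              # even diagonals run bottom-left to top-right
        order.extend((r, s - r) for r in rows)
    return order

def run_length(block):
    """Code a block as (zeros preceding the coefficient, coefficient) pairs plus EOB."""
    pairs, run = [], 0
    for r, c in zigzag_order(len(block)):
        value = block[r][c]
        if value == 0:
            run += 1
        else:
            pairs.append((run, value))
            run = 0
    pairs.append("EOB")                        # trailing zeros are implied by EOB
    return pairs

# The block from the example: nonzero coefficients at (0,0), (1,0), and (4,0).
block = [[0] * 8 for _ in range(8)]
block[0][0], block[1][0], block[4][0] = 3, 2, 1
coded = run_length(block)
```

Position (4,0) is the eleventh entry of the scan, so seven zeros separate it from the coefficient at (1,0), which is where the pair (7, 1) comes from.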

Comparison b/w H.261 and MPEG-1*
H.261 | MPEG-1
Sequential access | Random access
One basic frame rate | Flexible frame rate
CIF and QCIF images only | Flexible image size
I and P frames only | I, P, and B frames
MC over 1 frame | MC over 1 or more frames
1 pixel MV accuracy | ½ pixel MV accuracy
Variable threshold + uniform quantization | Quantization matrix (predefined)
No GOP structure | GOP structure
GOB structure | Slice structure

Differences b/w MPEG-2 and MPEG-1 Video* • Bandwidth requirement: MPEG-1 --- 1.2 Mbps; MPEG-2 --- 2-20 Mbps • MB structures: alternative subsampling of the chroma channels, giving 3 subsampling formats: • 4:2:0 --- 1 MB = 6 blocks (4Y, 1Cr, 1Cb) • 4:2:2 --- 1 MB = 8 blocks (4Y, 2Cr, 2Cb) • 4:4:4 --- 1 MB = 12 blocks (4Y, 4Cr, 4Cb) • MPEG-2 accepts both progressive and interlaced inputs • Progressive video: like MPEG-1, all pictures are frame pictures • Interlaced video: the input consists of a sequence of fields; two options: • Every field is encoded independently (field pictures) • Two fields are encoded together as a composite frame (frame pictures) • Allowed to switch between frame pictures and field pictures on a frame to frame basis; frame encoding is preferred for relatively still images while field encoding is preferred for images with significant motion

MPEG-4 • Finalized in October of 1998; available in standards in early 1999 • Technical features: • Represent units of aural, visual, or audiovisual content, called “media objects”. These media objects can be of natural or synthetic origin; this means they could be recorded with a camera or microphone, or generated with a computer • Describe the composition of these objects to create compound media objects that form audiovisual scenes • Multiplex and synchronize the data associated with media objects, so that they can be transported over network channels providing a QoS appropriate for the nature of the specific media objects • Interact with the audiovisual scene generated at the receiver’s end • Enables • Authors to produce content with greater reusability and flexibility • Network service providers to have transparent information to maintain QoS • End users to interact with content at higher levels within the limits set by the authors

MPEG History • MPEG-1 is targeted for video CD-ROM • MPEG-2 is targeted for Digital Television • MPEG-3 was initiated for HDTV, but was later absorbed into MPEG-2 and therefore abandoned • MPEG-4 is targeted to provide the standardized technological elements enabling the integration of the production, distribution, and content access paradigms of the fields of digital television, interactive graphics, and interactive multimedia • MPEG-7, formally named “Multimedia Content Description Interface”, is targeted to create a standard for describing the multimedia content data that will support some degree of interpretation of the information’s meaning, which can be passed onto, or accessed by, a device or a computer code; MPEG-7 is not aimed at any one application in particular; rather, the elements that MPEG-7 standardizes will support as broad a range of applications as possible • MPEG-21 is now under design and review

Buffer Retrieval Scheduling* • FCFS: given an interval of time, process data (sectors) in the order of request arrival • Seek time = $\sum_{i=1}^{k} |s_i - s_{i-1}| / v$ • $s_i$ --- sector location of the i-th request ($s_0$ is the initial head position) • v --- head velocity • SCAN: given an interval of time, first sort all the requests in terms of sector numbers, then process requests from the beginning • Seek time is much smaller than that of FCFS, but needs sorting • Example: requests: 25, 5, 35, 5, 10, with the head initially at 1 • SCAN-EDF: also consider deadlines of requests; process requests with earlier deadlines first; in each “group” of requests, use SCAN • Example:
Job ID | Sector | Deadline
1 | 15 | 10
2 | 20 | 5
3 | 10 | 10
4 | 35 | 10
5 | 50 | 5
The {2, 5} group (deadline 5) is processed first, then the {1, 3, 4} group (deadline 10). Within each group, use SCAN
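The two examples can be worked through with a short sketch. It reports the total seek distance, which is the seek time multiplied by the head velocity v, and it models SCAN as a single sweep in increasing sector order, which is enough here because the head starts below every request. The function names are chosen for the illustration.

```python
def seek_distance(start, sectors):
    """Sum of |s_i - s_(i-1)| over the service order, starting from `start`."""
    total, position = 0, start
    for s in sectors:
        total += abs(s - position)
        position = s
    return total

def scan(requests):
    """Service order for a single sweep in increasing sector order."""
    return sorted(requests)

def scan_edf(jobs):
    """jobs: (job_id, sector, deadline). Earliest deadline first, SCAN within a deadline."""
    return [job_id for job_id, sector, deadline
            in sorted(jobs, key=lambda job: (job[2], job[1]))]

requests = [25, 5, 35, 5, 10]                      # head initially at sector 1
fcfs_cost = seek_distance(1, requests)             # 24 + 20 + 30 + 30 + 5
scan_cost = seek_distance(1, scan(requests))       # 4 + 0 + 5 + 15 + 10

jobs = [(1, 15, 10), (2, 20, 5), (3, 10, 10), (4, 35, 10), (5, 50, 5)]
order = scan_edf(jobs)
```

For these requests FCFS travels 109 sectors and SCAN 34, and SCAN-EDF serves the jobs in the order 2, 5, 3, 1, 4.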

Placement Algorithms* • How to lay out the data over a CD-ROM to optimize the retrieval? • Real time files (RTFs): a triple $(l_f, b_f, p_f)$ such that: • $l_f$: number of blocks of the file f • $b_f$: number of sectors in each block of the file f • $p_f$: period of the file f, i.e., the number of sectors from the start of one block to the start of the next (a block plus the gap that follows it) • Example: (4, 2, 7) describes 4 blocks of 2 sectors each, with block starts 7 sectors apart • Start Assignment Problem (SAP) • Given the start position st(f) of an RTF, the sectors of block i are: $occ_i(f) = \{ j \mid st(f) + (i-1)p_f \le j \le st(f) + (i-1)p_f + b_f - 1 \}$ • Note that the sectors are numbered from 0 • All the sectors of f are: $occ(f) = \bigcup_{i=1}^{l_f} occ_i(f)$ • Example: (4, 2, 7), st(f) = 3 $\Rightarrow$ $occ_1(f)$ = {3, 4}, $occ_2(f)$ = {10, 11}, $occ_3(f)$ = {17, 18}, $occ_4(f)$ = {24, 25} • Non-collision axiom: for all $f_i, f_j \in F$ (the set of RTFs to be mapped to the sectors {1,…,N}), $f_i \neq f_j \Rightarrow occ(f_i) \cap occ(f_j) = \emptyset$. If a function st satisfying this axiom exists, it is called a placement function • SAP tries to find $st(f_i)$ for all $f_i \in F$ such that the non-collision axiom holds; it is an NP-hard problem in general
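The occupancy sets and the non-collision check can be sketched as follows, treating the third component of an RTF as the distance in sectors between consecutive block starts, which is what the (4, 2, 7) example implies. The function names are hypothetical, and the sketch only verifies a given start assignment; it does not search for one, and it ignores the upper bound N on the sector numbers.

```python
def occ(rtf, start):
    """All sectors occupied by an RTF (blocks, sectors_per_block, period) placed at `start`."""
    blocks, block_size, period = rtf
    return {start + i * period + s
            for i in range(blocks) for s in range(block_size)}

def is_placement(files, starts):
    """True if the start positions satisfy the non-collision axiom."""
    used = set()
    for rtf, start in zip(files, starts):
        sectors = occ(rtf, start)
        if used & sectors:          # some sector is claimed by two files
            return False
        used |= sectors
    return True

f = (4, 2, 7)
sectors_of_f = occ(f, 3)
```

Two copies of this file interleave without collision when started at sectors 3 and 5, but collide at sector 4 when started at 3 and 4.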
