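Uncompressed graphics, audio, and video data require considerable storage capacity; even with CD storage technology, storing uncompressed video is often impractical.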

Storage space
• Uncompressed graphics, audio, and video data require considerable storage capacity, which in the case of uncompressed video is often not even feasible given today's CD technology.

Coding requirements
• Images have considerably higher storage requirements than text; audio and video are even more demanding. Not only is a huge amount of storage required, but the data rates needed to communicate continuous media are also significant.

• To compare the data storage and bandwidth requirements of different visual media (text, graphics, image), the following specifications are based on a typical window size of 640×480 pixels on a screen:

• For the representation of text, two bytes are used for each character, allowing for some Asian language variants. Each character is displayed using 8×8 pixels, which is sufficient for the display of ASCII characters.

• For the representation of vector graphics, a typical still image is composed of 500 lines [BHS91]. Each line is defined by its horizontal position, vertical position, and an 8-bit attribute field. The horizontal axis is represented using 10 bits (log2(640), rounded up), and the vertical axis is coded using 9 bits (log2(480), rounded up).

• In very simple color display modes, a single pixel of a bitmap can be represented by 256 different colors; therefore, one byte per pixel is needed.
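The storage figures implied by these three specifications can be worked out directly. The sketch below is only an illustration: the function names are hypothetical, and it assumes that each vector line stores two end points (each with a 10-bit horizontal and a 9-bit vertical coordinate) plus the 8-bit attribute, since the text gives only the per-coordinate bit widths.

```python
import math

WIDTH, HEIGHT = 640, 480  # window size used throughout the comparison


def text_bytes(char_w=8, char_h=8, bytes_per_char=2):
    """Bytes needed to fill the whole window with characters."""
    chars = (WIDTH // char_w) * (HEIGHT // char_h)  # 80 * 60 = 4800
    return chars * bytes_per_char


def vector_bytes(lines=500, attribute_bits=8):
    """Bytes for a still image made of straight lines.

    Assumption (not stated in the text): every line stores two end
    points, each with one x and one y coordinate, plus one attribute.
    """
    x_bits = math.ceil(math.log2(WIDTH))   # 10 bits
    y_bits = math.ceil(math.log2(HEIGHT))  # 9 bits
    bits_per_line = 2 * (x_bits + y_bits) + attribute_bits  # 46 bits
    return lines * bits_per_line / 8


def bitmap_bytes(bytes_per_pixel=1):
    """Bytes for a bitmap with 256 colors (one byte per pixel)."""
    return WIDTH * HEIGHT * bytes_per_pixel


print(text_bytes())    # 9600
print(vector_bytes())  # 2875.0
print(bitmap_bytes())  # 307200
```

Under these assumptions a full screen of text needs roughly 9.4 KB, the vector image about 2.8 KB, and the bitmap 300 KB.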

• The next examples specify continuous media and derive the amount of storage required for one second of playback:

• An uncompressed audio signal of telephone quality is sampled at 8 kHz and quantized with 8 bits per sample. This leads to a bandwidth requirement of 64 kbit/s and a storage requirement of 64 kbit for one second of playback.

• An uncompressed stereo audio signal of CD quality is sampled at a rate of 44.1 kHz and quantized with 16 bits per sample; hence, the storage requirement is 44.1 kHz × 16 bits = 705.6 × 10^3 bits per channel for one second of playback, and the bandwidth (throughput) requirement is 705.6 × 10^3 bit/s per channel. Both stereo channels together therefore need twice that, 1,411.2 × 10^3 bit/s.

• According to the European PAL standard, video is defined by 625 lines and 25 frames per second. The luminance and color difference signals are encoded separately. The resulting digital data streams are combined using a multiplexing technique (4:2:2). In accordance with CCIR 601 (a studio standard for digital video), a sampling rate of 13.5 MHz is used for luminance Y. The sampling rate for each chrominance signal (R-Y and B-Y) is 6.75 MHz.

• If each sample is coded uniformly with 8 bits, the bandwidth requirement is (13.5 MHz + 6.75 MHz + 6.75 MHz) × 8 bits = 216 × 10^6 bit/s. HDTV doubles the number of lines and uses an aspect ratio of 16:9. This leads to a data rate that is higher by a factor of 5.33 compared to today's TV rate.
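The data rates quoted for continuous media follow from multiplying sampling rate by bits per sample. The sketch below reproduces them; the function names are hypothetical. The derivation of the HDTV factor is an assumption on our part (twice the lines, twice the horizontal resolution, and a 16:9 picture instead of 4:3), because the text states only the result.

```python
def audio_bit_rate(sample_rate_hz, bits_per_sample, channels=1):
    """Uncompressed audio data rate in bit/s."""
    return sample_rate_hz * bits_per_sample * channels


def ccir601_bit_rate(bits_per_sample=8):
    """Uncompressed 4:2:2 video data rate in bit/s."""
    luminance_hz = 13_500_000
    chrominance_hz = 6_750_000  # applies to each of R-Y and B-Y
    return (luminance_hz + 2 * chrominance_hz) * bits_per_sample


telephone = audio_bit_rate(8_000, 8)       # 64000
cd_channel = audio_bit_rate(44_100, 16)    # 705600
cd_stereo = audio_bit_rate(44_100, 16, 2)  # 1411200
video = ccir601_bit_rate()                 # 216000000

# Assumed derivation: 2x lines, 2x horizontal resolution, 16:9 vs. 4:3.
hdtv_factor = 2 * 2 * (16 / 9) / (4 / 3)

print(telephone, cd_channel, cd_stereo, video)
print(round(hdtv_factor, 2))  # 5.33
```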

• In the case of video, processing uncompressed data streams in an integrated multimedia system leads to secondary storage requirements in the range of at least gigabytes, and in the range of megabytes for buffer storage. The throughput that must be transferred between different systems can be as high as 140 Mbit/s. This kind of data transfer rate is not realizable with today's technology, or in the near future with reasonably priced hardware.

• In a retrieval mode application, the following demands arise:
• Fast forward and backward retrieval with simultaneous display should be possible. This implies a fast search for information in multimedia databases.
• Random access to single images and audio frames of a data stream should be possible, with an access time of less than 0.5 seconds. This access should be faster than a conventional CD audio system to maintain the interactive character of the application.

• Decompression of images, video, or audio should be possible without a link to other data units. This allows random access and editing.

• For both dialogue and retrieval mode, the following requirements apply:
• To support scalable video in different systems, it is necessary to define a format independent of frame size and video frame rate.
• Various audio and video data rates should be supported; usually this leads to different qualities. Thus, depending on specific system conditions, the data rates can be adjusted.

• It must be possible to synchronize audio with video data, as well as with other media.
• To make an economical solution possible, coding should be realized using software (for a cheap and low-quality solution) or VLSI chips (for a high-quality solution).
• It should be possible to generate data on one multimedia system and reproduce these data on another system. The compression technique should be compatible.

Source, entropy, and hybrid coding
• Compression techniques fit into different categories, as shown in Table 6.1. For their use in multimedia systems, we can distinguish among entropy, source, and hybrid encoding. Entropy encoding is a lossless process, while source encoding is a lossy process. Most multimedia systems use hybrid techniques, which are a combination of the two.

Table 6.1: A rough classification of coding/compression techniques in multimedia systems.

• Entropy coding is used regardless of the media's specific characteristics. The data stream to be compressed is considered to be a simple digital sequence, and the semantics of the data are ignored. Entropy encoding is an example of lossless encoding, as the decompression process regenerates the data completely. Run-length coding is an example of entropy encoding that is used for data compression in file systems.

• Source coding takes into account the semantics of the data. The degree of compression that can be reached by source encoding depends on the data contents. In the case of lossy compression techniques, a one-way relation between the original data stream and the encoded data stream exists; the data streams are similar but not identical.

• Different source encoding techniques make extensive use of the characteristics of the specific medium. An example is sound source coding, where sound is transformed from time-dependent to frequency-dependent sound concatenations, followed by the encoding of the formants (see Chapter 3, Speech Generation). This transformation substantially reduces the amount of data.
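The "similar but not identical" relation of a lossy technique can be seen in a very small experiment. The sketch below uses uniform quantization of 8-bit samples as a stand-in for a lossy step; it is not the formant coding mentioned above, and the function names and the step size of 16 are assumptions chosen only for illustration.

```python
def quantize(samples, step=16):
    """Map each 8-bit sample to the index of its interval (lossy)."""
    return [s // step for s in samples]


def dequantize(indices, step=16):
    """Reconstruct each sample as the midpoint of its interval."""
    return [i * step + step // 2 for i in indices]


original = [12, 13, 200, 201, 90]
restored = dequantize(quantize(original))

print(original)  # [12, 13, 200, 201, 90]
print(restored)  # [8, 8, 200, 200, 88]
```

The restored values stay within half a step of the originals, but the exact input cannot be recovered from the quantized indices, which is the one-way relation described in the text.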

Some basic compression techniques
• Sampled images, audio, and video data streams often contain sequences of the same bytes. By replacing these repeated byte sequences with the number of occurrences, a substantial reduction of data can be achieved. This is called run-length coding, which is indicated by a special flag that does not occur as a part of the data stream itself. This flag byte can also be realized by using any other of the 255 different bytes in the compressed data stream.

• To illustrate such byte stuffing, we define the exclamation mark "!" as the special flag. A single occurrence of the exclamation mark is interpreted as the special flag during decompression. Two consecutive exclamation marks are interpreted as one exclamation mark occurring within the data. The overall run-length coding procedure can be described as follows: if a byte occurs at least four consecutive times, the number of occurrences is counted. The compressed data contains this byte, followed by the special flag and the number of its occurrences. This allows the compression of between 4 and 259 bytes into only three bytes. Because at least four consecutive bytes are always compressed, the number of occurrences can be stored with an offset of 4, so the actual repetition count is the stored value plus 4.

• Depending on the algorithm, one or more bytes can be used to indicate the length. In the following example, the character "C" occurs 8 consecutive times and is "compressed" to the 3 characters "C!8":
• Uncompressed data: ABCCCCCCCCDEFGGG
• Run-length coded: ABC!8DEFGGG
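A minimal sketch of this run-length scheme, working on bytes, might look as follows. Several details are assumptions that the text leaves open: the count is stored in a single byte with an offset of 4 (so the run of eight C's is stored as the value 4 rather than the character "8"), and a run is shortened by one byte whenever its stored count would equal the flag byte, so that the decoder can always tell a count from a doubled flag. The function names are hypothetical.

```python
FLAG = ord("!")          # special flag byte
MIN_RUN = 4              # shortest run that is compressed
MAX_RUN = MIN_RUN + 255  # 259, the longest run one count byte can hold


def _literal(b: int) -> bytes:
    """A data byte; the flag byte itself is doubled (byte stuffing)."""
    return bytes([FLAG, FLAG]) if b == FLAG else bytes([b])


def rle_encode(data: bytes) -> bytes:
    out = bytearray()
    i = 0
    while i < len(data):
        b = data[i]
        run = 1
        while i + run < len(data) and data[i + run] == b and run < MAX_RUN:
            run += 1
        if run - MIN_RUN == FLAG:
            run -= 1  # assumed rule: never store a count equal to FLAG
        if run >= MIN_RUN:
            out += _literal(b)            # the repeated byte
            out.append(FLAG)              # the special flag
            out.append(run - MIN_RUN)     # count, stored with offset 4
            i += run
        else:
            out += _literal(b)
            i += 1
    return bytes(out)


def rle_decode(data: bytes) -> bytes:
    out = bytearray()
    i = 0
    while i < len(data):
        b = data[i]
        if b != FLAG:
            out.append(b)
            i += 1
        elif i + 1 < len(data) and data[i + 1] == FLAG:
            out.append(FLAG)  # doubled flag is a literal "!"
            i += 2
        else:
            count = data[i + 1] + MIN_RUN
            out += bytes([out[-1]]) * (count - 1)  # one copy already output
            i += 2
    return bytes(out)


packed = rle_encode(b"ABCCCCCCCCDEFGGG")
print(packed)              # b'ABC!\x04DEFGGG'
print(rle_decode(packed))  # b'ABCCCCCCCCDEFGGG'
```

The three G's at the end stay uncompressed because the run is shorter than four bytes.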

• Run-length encoding is a generalization of zero suppression, which assumes that just one symbol appears particularly often in sequences. The blank (space) in text is such a symbol; single blanks or pairs of blanks are ignored. Starting with a sequence of three blanks, they are replaced by an M-byte (which has the same function as the exclamation mark before) and a byte that specifies the number of blanks in this sequence.

• Accordingly, sequences of 3 up to a maximum of 258 blanks can be reduced to only two bytes. The number of occurrences is stored with an offset of −3 (because three blanks are already implied). Further variations are tabulators used to substitute a specific number of zeros (or blanks), and the definition of different M-bytes that each stand for a different number of zero bytes. An M4-byte could replace 8 zero bytes, and an M5-byte could substitute a sequence of 16 zero bytes. An M5-byte followed by an M4-byte would then represent 24 zero bytes.
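Zero suppression for blanks can be sketched in the same way. The marker value chosen for the M-byte (0x1A) is purely hypothetical, as are the function names, and the sketch assumes that the marker never occurs in the input; it rejects input that violates this assumption.

```python
M_BYTE = 0x1A            # hypothetical marker, assumed absent from the data
BLANK = 0x20             # the suppressed symbol (space)
MIN_RUN = 3              # shortest run that is replaced
MAX_RUN = MIN_RUN + 255  # 258 blanks fit into marker + one count byte


def suppress_blanks(data: bytes) -> bytes:
    if M_BYTE in data:
        raise ValueError("marker byte occurs in the input")
    out = bytearray()
    i = 0
    while i < len(data):
        if data[i] != BLANK:
            out.append(data[i])
            i += 1
            continue
        run = 1
        while i + run < len(data) and data[i + run] == BLANK and run < MAX_RUN:
            run += 1
        if run >= MIN_RUN:
            out += bytes([M_BYTE, run - MIN_RUN])  # count with offset of -3
        else:
            out += bytes([BLANK]) * run  # single blanks and pairs are kept
        i += run
    return bytes(out)


def restore_blanks(data: bytes) -> bytes:
    out = bytearray()
    i = 0
    while i < len(data):
        if data[i] == M_BYTE:
            out += bytes([BLANK]) * (data[i + 1] + MIN_RUN)
            i += 2
        else:
            out.append(data[i])
            i += 1
    return bytes(out)


print(suppress_blanks(b"a   b"))           # b'a\x1a\x00b'
print(len(suppress_blanks(b" " * 258)))    # 2
```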

JPEG
• JPEG applies to color and gray-scaled still images [LOW91, MP91, Wal91]. Fast coding and decoding of still images is also used for video sequences, known as Motion JPEG. Today, parts of JPEG are already available as software-only packages or together with specific hardware support.

• The JPEG implementation should be independent of image size.
• The JPEG implementation should be applicable to any image and pixel aspect ratio.
• Color representation itself should be independent of the special implementation.
• Image content may be of any complexity, with any statistical characteristics.

• The JPEG standard specification should be state-of-the-art (or near) regarding the compression factor and achieved image quality.
• Processing complexity must permit a software solution to run on as many available standard processors as possible. Additionally, the use of specialized hardware should substantially enhance image quality.

• Sequential decoding (line-by-line) and progressive decoding (refinement of the whole image) should be possible. A lossless, hierarchical coding of the same image with different resolutions, similar to Photo-CD images, should be supported.
• Figure 6.3 outlines the steps of JPEG compression in accordance with the overall scheme shown in Figure 6.1. Four different variants of image compression can be determined that lead to four modes. Each mode itself includes further combinations:

Figure 6.3: Steps of the JPEG compression process: picture preparation (pixel; block, MCU), picture processing (predictor; FDCT), quantization, and entropy encoding (run-length; Huffman; arithmetic).

• The lossy sequential DCT-based mode (baseline process) must be supported by every JPEG implementation.
• The expanded lossy DCT-based mode provides a set of further enhancements to the baseline process.
• The lossless mode has a low compression ratio that allows perfect reconstruction of the original image.
• The hierarchical mode accommodates images of different resolutions and selects its algorithms from the three modes defined above.

• The baseline process uses the following techniques: block, MCU, FDCT, run-length, and Huffman, which are explained together with the other modes in more detail in this section. In the next section, image preparation for all modes is presented; the remaining steps of image processing, quantization, and entropy encoding are then described.

H.261 (p×64)
• The driving force behind the H.261 (p×64) video coding standard is ISDN. The two B-channels of an ISDN connection (or part of them) can be used to transfer video in addition to audio data. This implies that both users connected via the B-channel have to use the same codec for video signals. Note that codec means encoder and decoder, i.e., encoding and decoding, compression and decompression.

• In the case of an ISDN connection, exactly two B-channels and one D-channel are available at the user interface. The European ISDN hierarchy allows a connection with 30 B-channels, which were originally intended for PABX. Here, we use B-channels to specify one or more ISDN channels. The primary ISDN applications considered were videophone and video conferencing systems. For these dialogue applications, coding and decoding must be carried out in real time.
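The name p×64 refers to the available bit rate: p channels of 64 kbit/s each. The sketch below works out the rate for the two configurations mentioned above; the function name is hypothetical, and the range of 1 to 30 for p reflects the 30 B-channels of the European hierarchy.

```python
B_CHANNEL_BIT_RATE = 64_000  # bit/s carried by one ISDN B-channel


def px64_bit_rate(p: int) -> int:
    """Total bit rate in bit/s when p B-channels are bundled."""
    if not 1 <= p <= 30:
        raise ValueError("p must be between 1 and 30")
    return p * B_CHANNEL_BIT_RATE


print(px64_bit_rate(2))   # 128000 (two B-channels at the user interface)
print(px64_bit_rate(30))  # 1920000 (30 B-channels)
```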

• The CCITT recommendation H.261 was developed for real-time processing of encoding and decoding. The maximum signal delay of both compression and decompression must not exceed 150 milliseconds. If the end-to-end delay is too long, an application using this technology will be affected considerably.

The MPEG standard
• The MPEG standard was developed by ISO/IEC JTC1/SC29/WG11 to cover motion video as well as audio coding according to the ISO/IEC standardization process. Considering the state of the art in CD-technology digital mass storage, MPEG strives for a compressed data stream rate of about 1.2 Mbit/s, which is today's typical CD-ROM data transfer rate. MPEG can deliver a data rate of at most 1,856,000 bit/s, which should not be exceeded [ISO93a]. Data rates for audio are between 32 and 448 kbit/s; this data rate enables video and audio compression of acceptable quality.

• The MPEG standard explicitly considers functionalities of other standards:
• JPEG. Since a video sequence can be regarded as a sequence of still images, and the JPEG standard development was always ahead of the MPEG standard, the MPEG standard makes use of JPEG.
• H.261. Since the H.261 standard was already available during the work on the MPEG standard, the working group strove for compatibility with this standard.

• MPEG is suitable for symmetric as well as asymmetric compression. Asymmetric compression requires more effort for coding than for decoding. Compression is carried out once, whereas decompression is performed many times. A typical application area is retrieval systems. Symmetric compression requires equal effort for the compression and decompression processes. Interactive dialogue applications, which require a restricted end-to-end delay, make use of this encoding technique.

Data streams: audio stream
• MPEG specifies a syntax for the interleaved audio and video data streams. An audio data stream consists of frames, which are divided into audio access units. Each audio access unit is composed of slots. At the lowest complexity (layer 1), a slot consists of four bytes. In any other layer, it consists of one byte. A frame always consists of a fixed number of samples. Most important is the audio access unit, which is the smallest possible audio sequence of compressed data that can be completely decoded independently of all other data.

• The audio access units of one frame lead to a playing time of 8 milliseconds at 48 kHz, 8.7 milliseconds at 44.1 kHz, and 12 milliseconds at 32 kHz. In the case of stereo signals, data from both channels are merged into one frame.
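These playing times follow from dividing the fixed number of samples per frame by the sampling rate. The sketch below assumes 384 samples per frame, which is the layer 1 frame size and the value that reproduces the three figures above; the text itself does not state the sample count, and the function name is hypothetical.

```python
SAMPLES_PER_FRAME = 384  # assumption: frame size of MPEG audio layer 1


def frame_duration_ms(sample_rate_hz: int) -> float:
    """Playing time of one frame in milliseconds."""
    return 1000 * SAMPLES_PER_FRAME / sample_rate_hz


for rate in (48_000, 44_100, 32_000):
    print(rate, round(frame_duration_ms(rate), 1))
# 48000 8.0
# 44100 8.7
# 32000 12.0
```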

Video stream
• A video data stream is comprised of six layers:
(1) At the highest level, the sequence layer, data buffering is handled. A data stream should have low requirements in terms of storage capacity. For this reason, at the beginning of the sequence layer there are the following two entries: the constant bit rate of the sequence and the storage capacity that is needed for decoding. In the processing scheme, a video-buffer-verifier is inserted after the quantizer. The resulting data rate is used to verify the delay caused by decoding.

• The video-buffer-verifier influences the quantizer and forms a kind of control loop. Several successive sequences could have a varying data rate. During decoding of several immediately following sequences, there is no direct relationship between the end of one sequence and the beginning of the next one. The basic parameters of the decoder are set again and an initialization is executed at this time.

(2) The group of pictures layer is the next layer. This layer consists of a minimum of one I-frame, which is the first frame. Random access to this image is always possible. At this layer, it is possible to distinguish between the order of images in the data stream and their order during display. The first image of a data stream always has to be an I-frame. Therefore, the decoder decodes and stores the reference frame first. In the order of display, a B-frame can occur before an I-frame.

(3) The picture layer contains a whole picture. The temporal reference is defined by an image number. Note that there are data fields defined in this layer which are not yet used in MPEG. The decoder is not allowed to use these data fields, as they are designated for future extensions.

(4) The next layer is the slice layer. Each slice consists of a number of macro blocks that may vary from one image to the next. Additionally, the DCT quantization of each macro block of a slice is specified.
(5) The fifth layer is the macro block layer. It contains the sum of the features of each macro block as described above.
(6) The lowest layer is the block layer (described above).
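The difference between display order and data stream order in the group of pictures layer can be illustrated with a short reordering routine. This is a sketch under a common convention, assumed here rather than taken from the text: every run of B-frames is transmitted after the reference frame (I or P) that follows it in display order, so the decoder has that reference available first. The frame labels and the function name are hypothetical.

```python
def display_to_stream_order(frames):
    """Reorder frame labels from display order to data stream order.

    Each label starts with its frame type: "I", "P", or "B".
    """
    stream = []
    pending_b = []  # B-frames waiting for their following reference frame
    for frame in frames:
        if frame[0] == "B":
            pending_b.append(frame)
        else:
            stream.append(frame)      # reference frame is sent first
            stream.extend(pending_b)  # then the B-frames displayed before it
            pending_b = []
    stream.extend(pending_b)  # trailing B-frames, kept at the end here
    return stream


display = ["B1", "B2", "I3", "B4", "B5", "P6"]
print(display_to_stream_order(display))
# ['I3', 'B1', 'B2', 'P6', 'B4', 'B5']
```

In the data stream the I-frame comes first, even though two B-frames precede it on the screen.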

MPEG-2
• The MPEG group developed the MPEG-2 video standard, which specifies the coded bit stream for high-quality digital video. As a compatible extension, MPEG-2 video builds upon the completed MPEG-1 standard by supporting interlaced video formats and a number of other advanced features, including those to support HDTV.

• MPEG-2 considers a structure similar to that of the hierarchical mode of JPEG. The hierarchy consists of different scaling levels of the compressed motion images, i.e., video is encoded at different qualities [Lip91, GV92]. The scaling may act on different parameters.

MPEG-4
• Work on another MPEG initiative for very low bit rate coding of audio-visual programs started in September 1993 at ISO/IEC JTC1. It is scheduled to achieve Committee Draft (CD) status in 1995 or 1996.
• This work will require the development of fundamentally new algorithmic techniques, including model-based image coding of human interaction with multimedia environments, and low-rate speech coding for use in environments like the European Mobile Telephony System (GSM).

Digital Video Interactive (DVI)
• Digital Video Interactive (DVI) is a technology that includes coding algorithms. The fundamental components are a VLSI chip set for the video subsystem, a well-specified data format for audio and video files, an application user interface to the audio-visual kernel (AVK, the kernel software interface to the DVI hardware), and compression as well as decompression algorithms [HKL+91, Lut91, Rip89]. In this section, we will concentrate mainly on compression and decompression. DVI can process data, text, graphics, still images, video, and audio. The original essential characteristic was the asymmetric technique of video compression and decompression known as Presentation-Level Video (PLV).
