Distance-Reciprocal Distortion Measure for Binary Document Images

IEEE Signal Processing Letters, vol. 11, no. 2, Feb. 2004

I. Introduction • Watermark embedding or data hiding in a binary image inevitably introduces visible artifacts. An effective visual distortion measure is therefore needed to evaluate and compare the performance of such applications.

• Subjective measure: costly, but important since a human is the ultimate viewer. • Objective measure: repeatable and easier to implement. • However, an objective measure does not always agree with the subjective one. • An objective measure based on a model of human visual perception is therefore preferred.

[Figure: f(x,y) → Processing → g(x,y)]
• The peak signal-to-noise ratio (PSNR) is a popular distortion measure in image and video processing:

$$\mathrm{PSNR} = 10\log_{10}\frac{M N P^2}{\sum_{x=1}^{M}\sum_{y=1}^{N}\left[f(x,y)-g(x,y)\right]^2} \qquad (1)$$

• where M and N are the dimensions of the image, and P is the maximum peak-to-peak signal swing, e.g., P = 255 for an 8-bit image.

For binary images, PSNR does not match subjective assessment well, since it is a point-based measurement that does not take the mutual relations between pixels into account.
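A minimal Python sketch of Eq. (1) illustrates this point-based behavior. It assumes binary images stored as nested lists of 0/1, so P = 1; the function name `psnr` and the 8×8 test image are illustrative only. Flipping one pixel that touches a stroke and one pixel far from it gives exactly the same PSNR.

```python
import math

def psnr(f, g, peak=1.0):
    """PSNR of Eq. (1): 10*log10(M*N*P^2 / sum of squared pixel differences).

    f, g: original and processed images as equal-sized 2-D lists.
    peak: P, the peak-to-peak swing (1 for 0/1 binary images, 255 for 8-bit).
    """
    m, n = len(f), len(f[0])
    sse = sum((f[x][y] - g[x][y]) ** 2 for x in range(m) for y in range(n))
    if sse == 0:
        return math.inf  # identical images
    return 10 * math.log10(m * n * peak ** 2 / sse)

# Hypothetical 8x8 image: a vertical black stroke (1) in column 1 on white (0).
f = [[1 if y == 1 else 0 for y in range(8)] for _ in range(8)]
g_near = [row[:] for row in f]
g_near[0][2] = 1  # flip a pixel touching the stroke
g_far = [row[:] for row in f]
g_far[7][7] = 1   # flip a pixel far from the stroke
print(psnr(f, g_near), psnr(f, g_far))  # both equal 10*log10(64)
```

Because every flipped binary pixel adds exactly 1 to the sum of squared differences, PSNR depends only on how many pixels flip, not on where.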

The perception of distortion in binary (document) images is very different from that in natural images. • In a particular language, such as English, readers know very well what each alphabetic character should look like. • Consequently, (i) distortion in document images can be more obtrusive than distortion in natural images, and (ii) distortion measures proposed for color or gray-level images are often not applicable to binary images.

II. Distance-Reciprocal Distortion Measure • A number of single-letter images are used to study distortion in binary document images. • Each single-letter image is converted, using Adobe Acrobat 5.0 at a resolution of 150 dots per inch (dpi), from an uppercase or lowercase letter typed in Microsoft Word with a font size of 10 or 12.

Observation • For a binary document image, the distance between two pixels plays a major role in their mutual interference as perceived by the human eye. • Readers are so familiar with alphabetic characters that even a single-pixel distortion can be perceived easily. • The main factor in distortion perception is focusing, i.e., whether the distortion lies in the viewer’s focus.

The distortion (flipping) of a pixel is more visible when the pixel lies within the viewer's field of focus. • The closer two pixels are, the more noticeable a change to one of them is when the viewer focuses on the other. • Further, under magnification each pixel is essentially a black or white square, so a diagonal neighbor lies farther from the pixel in focus (center-to-center distance √2) than a horizontal or vertical neighbor (distance 1). • Hence, diagonal neighbors have less effect on a center pixel in focus than horizontal or vertical neighbors.

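The distortion of a processed image g(x,y) relative to the original image f(x,y) is measured using a weight matrix in which each weight is the reciprocal of the distance from the center pixel; hence the name Distance-Reciprocal Distortion (DRD) measure. • Assume the weight matrix W_m is of size m×m, where m = 2n+1 and n is a positive integer. The center element of this matrix is at (i_C, j_C), where i_C = j_C = (m+1)/2, and the elements of W_m are

$$W_m(i,j) = \begin{cases} 0, & \text{if } i = i_C \text{ and } j = j_C,\\[2pt] \dfrac{1}{\sqrt{(i-i_C)^2 + (j-j_C)^2}}, & \text{otherwise} \end{cases} \qquad (2)$$

• The normalized weight matrix is defined as

$$W_{Nm}(i,j) = \frac{W_m(i,j)}{\sum_{i=1}^{m}\sum_{j=1}^{m} W_m(i,j)} \qquad (3)$$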
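Eqs. (2) and (3) can be sketched in Python as follows, using 0-based indices so that the center (i_C, j_C) becomes (m // 2, m // 2); `weight_matrices` is a hypothetical helper name. For m = 3 the horizontal and vertical neighbors get weight 1 and the diagonal neighbors 1/√2 ≈ 0.707, which matches the observation that diagonal neighbors matter less.

```python
import math

def weight_matrices(m):
    """Return (W_m, W_Nm) of Eqs. (2)-(3) as m x m nested lists.

    Indices are 0-based, so the center (i_C, j_C) = ((m+1)/2, (m+1)/2)
    of the 1-based formulation becomes (m // 2, m // 2).
    """
    if m < 3 or m % 2 == 0:
        raise ValueError("m must be 2n + 1 with n a positive integer")
    c = m // 2
    # Eq. (2): reciprocal of the Euclidean distance to the center; center weight 0.
    w = [[0.0 if (i, j) == (c, c) else 1.0 / math.hypot(i - c, j - c)
          for j in range(m)]
         for i in range(m)]
    # Eq. (3): normalize so that all weights sum to 1.
    total = sum(sum(row) for row in w)
    w_n = [[v / total for v in row] for row in w]
    return w, w_n

w, w_n = weight_matrices(5)  # m = 5, the size used in the experiments
```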

Suppose that there are S flipped pixels in g(x,y). Each flipped pixel has a distance-reciprocal distortion DRD_k, k = 1, 2, …, S. • For the k-th flipped pixel at (x,y)_k in the processed image g(x,y), the distortion is computed from an m×m block B_k in f(x,y) centered at (x,y)_k. • The distortion DRD_k measured for this flipped pixel is

$$\mathrm{DRD}_k = \sum_{i=1}^{m}\sum_{j=1}^{m} \left|D_k(i,j)\right| \times W_{Nm}(i,j) \qquad (4)$$

where W_Nm is the normalized weight matrix of (3) and the (i,j)-th element of the difference matrix D_k is

$$D_k(i,j) = B_k(i,j) - g\big[(x,y)_k\big] \qquad (5)$$

• Thus, DRD_k equals the weighted sum of the pixels in the block B_k of the original image that differ from the flipped pixel in the processed image.

The center pixel of B_k does not contribute to DRD_k, since its weight is always zero. • For flipped pixels near an image edge or corner, where a full m×m neighborhood does not exist, the missing part of the neighborhood can be filled with the value g[(x,y)_k]. Because the filled entries give D_k(i,j) = 0, this is equivalent to simply ignoring the missing neighbors.
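A minimal sketch of Eqs. (4) and (5) for one flipped pixel, assuming 0/1 nested-list images and m = 5 by default; `normalized_weights` and `drd_k` are illustrative names, not the authors' code. Neighbors outside the image are skipped, which is equivalent to the padding rule above because padded entries would give D_k(i,j) = 0.

```python
import math

def normalized_weights(m):
    """W_Nm of Eqs. (2)-(3), 0-based, center at (m // 2, m // 2)."""
    c = m // 2
    w = [[0.0 if (i, j) == (c, c) else 1.0 / math.hypot(i - c, j - c)
          for j in range(m)] for i in range(m)]
    total = sum(sum(row) for row in w)
    return [[v / total for v in row] for row in w]

def drd_k(f, g, x, y, m=5):
    """DRD_k of Eq. (4) for the flipped pixel at (x, y).

    B_k is the m x m block of the original image f centered at (x, y), and
    D_k(i, j) = B_k(i, j) - g[x][y] as in Eq. (5). Neighbors outside the image
    are skipped, equivalent to padding them with g[x][y] (D_k = 0 there).
    """
    w_n = normalized_weights(m)
    h = m // 2
    rows, cols = len(f), len(f[0])
    total = 0.0
    for i in range(m):
        for j in range(m):
            u, v = x + i - h, y + j - h
            if 0 <= u < rows and 0 <= v < cols:
                # The center term is included but has weight 0, so it adds nothing.
                total += abs(f[u][v] - g[x][y]) * w_n[i][j]
    return total
```

For an isolated flip inside an all-white 5×5 area every neighbor differs from the new value, so DRD_k reaches its maximum of 1 (the normalized weights sum to 1). A flip next to a black stroke scores lower, because the stroke pixels already agree with the new black value.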

After visiting all S flipped pixel positions, we sum the distortions DRD_k seen from each flipped pixel to obtain the distortion of g(x,y):

$$\mathrm{DRD} = \frac{\sum_{k=1}^{S}\mathrm{DRD}_k}{\mathrm{NUBN}} \qquad (6)$$

where NUBN estimates the valid (nonempty) area of the image and is defined as the number of non-uniform (neither all-black nor all-white) 8×8 blocks in f(x,y). • The total pixel count M×N is not used in the denominator because uniform areas (e.g., all-white blocks) are common in binary document images and could significantly affect the distortion value if included.
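Eq. (6) and the NUBN count can be sketched as below, with hypothetical names `nubn` and `total_drd`. The text does not say how partial blocks at the right and bottom borders are handled; this sketch assumes they are skipped, which is consistent with the 312 blocks reported for the 198×109 test image (24 × 13 full 8×8 tiles).

```python
def nubn(f, block=8):
    """Number of non-uniform block x block tiles in the original image f.

    A tile is non-uniform if it contains both black and white pixels.
    Assumption: partial tiles at the right/bottom borders are skipped.
    """
    rows, cols = len(f), len(f[0])
    count = 0
    for r in range(0, rows - block + 1, block):
        for c in range(0, cols - block + 1, block):
            values = {f[r + i][c + j] for i in range(block) for j in range(block)}
            if len(values) > 1:
                count += 1
    return count

def total_drd(drd_values, f):
    """Eq. (6): sum of the per-pixel DRD_k values divided by NUBN of f."""
    n = nubn(f)
    if n == 0:
        raise ValueError("f has no non-uniform 8x8 blocks; DRD is undefined")
    return sum(drd_values) / n
```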

III. Experimental Results • The original image is of size 198×109, and 122 (≈39%) of its 312 8×8 blocks are non-uniform.
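As a quick consistency check, assuming only complete 8×8 blocks are counted, the block total and the percentage follow from the image size:

$$\left\lfloor \frac{198}{8} \right\rfloor \times \left\lfloor \frac{109}{8} \right\rfloor = 24 \times 13 = 312, \qquad \frac{122}{312} \approx 0.391 \approx 39\%$$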

The design criteria for generating independent test images: • the number of flipped pixels is the same in every test image; • the test images produced by this distortion generator should vary widely in how noticeable the flipping is. • There are 1763 black pixels in Fig. 2. • We flip 40 pixels in the original image with some randomness to generate test images with varying amounts of visual distortion.

1) The positions of all 1763 black pixels are recorded in a 2×1763 matrix. • 2) Forty of the 1763 black pixels are chosen at random using a uniformly distributed random number generator. • 3) For each chosen black pixel, one pixel in its neighborhood is flipped. As shown in Fig. 3, the pixel to be flipped is selected from the Band 1 pixel (the black pixel itself), the eight Band 2 pixels, or the 16 Band 3 pixels with probabilities p1, p2, and p3, respectively, where p1 + p2 + p3 = 1. For Band 2 and Band 3, one neighbor is randomly chosen among the pixels of that band.

4) A total of 60,000 test images are generated by running the generator 10,000 times for each of the six settings of p1, p2, and p3, with p3 = 0, 0.2, 0.4, 0.6, 0.8, and 1, p1 = (1 − p3)/10, and p2 = 9·p1. • 5) Generated images with fewer than 40 flipped pixels are discarded; that is, cases in which at least one pixel is flipped more than once are dropped.
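Steps 1) to 5) can be sketched in Python as follows. This is an illustrative reconstruction, not the authors' code, and all function names are hypothetical. It assumes black = 1 in a nested-list image, records black-pixel positions as (x, y) pairs instead of a 2×1763 matrix, draws the 40 seed pixels without replacement, chooses a band with probabilities (p1, p2, p3) and then a pixel uniformly within that band, and skips any pick that falls outside the image (such an image then has fewer than 40 flips and is dropped by step 5).

```python
import random

def band_offsets(band):
    """Offsets (dx, dy) of Band 1 (the pixel itself), Band 2 (8 pixels at
    chessboard distance 1) or Band 3 (16 pixels at chessboard distance 2)."""
    r = band - 1
    return [(dx, dy)
            for dx in range(-r, r + 1)
            for dy in range(-r, r + 1)
            if max(abs(dx), abs(dy)) == r]

def probability_sets():
    """Step 4): (p1, p2, p3) with p3 = 0, 0.2, ..., 1, p1 = (1 - p3)/10, p2 = 9*p1."""
    sets = []
    for p3 in (0.0, 0.2, 0.4, 0.6, 0.8, 1.0):
        p1 = (1 - p3) / 10
        sets.append((p1, 9 * p1, p3))
    return sets

def generate_test_image(f, p1, p2, p3, n_flips=40, rng=random):
    """Hypothetical distortion generator following steps 1)-3); returns a flipped copy of f."""
    # Step 1: record positions of all black pixels as (x, y) pairs.
    black = [(x, y) for x, row in enumerate(f) for y, v in enumerate(row) if v == 1]
    # Step 2: choose n_flips black pixels at random (assumed without replacement).
    seeds = rng.sample(black, n_flips)
    g = [row[:] for row in f]
    rows, cols = len(f), len(f[0])
    for x, y in seeds:
        # Step 3: pick Band 1/2/3 with probabilities p1/p2/p3, then one pixel in that band.
        band = rng.choices([1, 2, 3], weights=[p1, p2, p3])[0]
        dx, dy = rng.choice(band_offsets(band))
        u, v = x + dx, y + dy
        if 0 <= u < rows and 0 <= v < cols:  # assumption: off-image picks are skipped
            g[u][v] = 1 - g[u][v]
    return g

def keep_image(f, g, n_flips=40):
    """Step 5): keep an image only if exactly n_flips pixels differ from f."""
    changed = sum(a != b for row_f, row_g in zip(f, g) for a, b in zip(row_f, row_g))
    return changed == n_flips
```

Running `generate_test_image` 10,000 times for each of the six tuples from `probability_sets()` and keeping only images for which `keep_image` is true mirrors steps 4) and 5). A pixel hit twice is flipped back, so any repeated hit lowers the flip count to 38 or fewer and the image is discarded.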

Since all the test images are generated from the same original image and they have the same number of flipped pixels, they have the same PSNR of 27.33 dB according to (1). • One set of the test images generated is shown in Fig.4.
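As a worked check with P = 1 and 40 flipped pixels, each flip contributes a squared difference of exactly 1 to the denominator of (1):

$$\mathrm{PSNR} = 10\log_{10}\frac{198 \times 109 \times 1^2}{40} = 10\log_{10}(539.55) \approx 27.32\ \text{dB}$$

This is in close agreement with the reported 27.33 dB.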

Next, all generated test images are divided into four groups according to their computed DRD values, with group 1 having the smallest values and group 4 the largest. • The subjective assessment is carried out by 60 observers. Each observer is given the original image and four sets of test images, printed on 80 gsm paper using an HP LaserJet 4100 printer.
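The text does not state how the group boundaries were chosen. A minimal sketch, assuming four groups of (nearly) equal size after sorting the images by DRD; `split_by_drd` is a hypothetical name:

```python
def split_by_drd(drd_values, n_groups=4):
    """Split image indices into n_groups by ascending DRD.

    Group 1 holds the smallest DRD values, group n_groups the largest.
    Assumption: groups are as equal in size as possible.
    """
    order = sorted(range(len(drd_values)), key=lambda i: drd_values[i])
    size, extra = divmod(len(order), n_groups)
    groups, start = [], 0
    for k in range(n_groups):
        end = start + size + (1 if k < extra else 0)
        groups.append(order[start:end])
        start = end
    return groups
```

Sorting indices rather than values keeps track of which test image lands in which group, which is what the subjective test needs when it draws one image per group.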

Each set consists of four test images, one randomly chosen from each of the four groups. Observers rank the four images in each set according to the visual distortion they perceive, viewing them at a comfortable distance under normal indoor lab lighting. • A smaller ranking score indicates less distortion: the scores are 1, 2, 3, and 4, with 1 for the least and 4 for the most perceived distortion.

The ranking scores collected from the 60 observers are analyzed and compared with the rankings given by the average DRD values (with m = 5), as shown in Table II. • Although all the test images have the same PSNR, their DRD values differ, and the average DRD values of the four groups have a normalized correlation of 0.964 with the mean subjective rankings, indicating a very good match between the objective measure and the subjective evaluation.
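The normalized correlation of 0.964 is reported without its formula. Two common definitions are sketched below as assumptions: the cosine-style normalized correlation Σab / √(Σa² · Σb²), and the Pearson coefficient, which subtracts the means first. Either would be applied to the four group-average DRD values and the four mean subjective rankings of Table II; the numbers in the example are hypothetical, not the paper's data.

```python
import math

def normalized_correlation(a, b):
    """Cosine-style normalized correlation: sum(a*b) / sqrt(sum(a^2) * sum(b^2))."""
    num = sum(x * y for x, y in zip(a, b))
    den = math.sqrt(sum(x * x for x in a) * sum(y * y for y in b))
    return num / den

def pearson(a, b):
    """Pearson correlation coefficient (means removed before normalizing)."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    return normalized_correlation([x - ma for x in a], [y - mb for y in b])

# Hypothetical numbers, NOT the values of Table II.
avg_drd = [0.5, 0.8, 1.1, 1.6]
mean_rank = [1.4, 2.2, 2.8, 3.6]
print(normalized_correlation(avg_drd, mean_rank), pearson(avg_drd, mean_rank))
```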

The distribution of the subjective ranking scores for each group is shown in a subfigure of Fig. 5. • In each subfigure, the abscissa represents the four ranking scores (1, 2, 3, and 4), and the ordinate shows how many times each score was given by the 60 human evaluators. • Since each of the 60 observers is given four sets of test images, each group receives 240 scores in total.
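Tallying the per-group histograms and the mean ranking is straightforward. A sketch, assuming the scores are collected as a dict from group number to a list of scores from 1 to 4 (`score_summary` and the example data are hypothetical); with 60 observers and four sets each, every real list would hold 240 scores.

```python
from collections import Counter

def score_summary(scores_by_group):
    """Per group: counts of ranking scores 1, 2, 3, 4 and the mean score."""
    summary = {}
    for group, scores in scores_by_group.items():
        counts = Counter(scores)  # missing scores count as 0
        summary[group] = ([counts[s] for s in (1, 2, 3, 4)],
                          sum(scores) / len(scores))
    return summary

# Hypothetical scores for two groups (the real data have 240 scores per group).
example = {1: [1, 1, 2, 1, 3], 4: [4, 4, 3, 4, 2]}
print(score_summary(example))
```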
