
Fast Class Rendering Using Multiresolution Classification in Discrete Cosine Transform Domain




  1. Fast Class Rendering Using Multiresolution Classification in Discrete Cosine Transform Domain Presented by Li-Jen Kao July, 2005

  2. Outline • Introduction • Feature Extraction • Classification Scheme • Experimental Results • Conclusion

  3. 1 Introduction • Classification of objects (or patterns) into a number of predefined classes has been extensively studied in a wide variety of applications, such as • optical character recognition (OCR) • speech recognition • face recognition • We may consider the design of classification systems in terms of two subproblems: • feature extraction • classification

  4. Feature extraction: • Features are functions of the measurements performed on a class of objects. • Feature extraction has not found a general solution in most applications. • Our purpose is to design a general classification scheme that is less dependent on domain-specific knowledge. • Reliable and general features are therefore required.

  5. Discrete Cosine Transform (DCT) • It helps separate an image into parts of differing importance with respect to the image's visual quality. • Due to the energy compacting property of DCT, much of the signal energy has a tendency to lie at low frequencies.

  6. Four advantages in applying DCT • The features extracted by DCT are general and reliable; they can be applied to most vision-oriented applications. • The amount of data to be stored can be reduced tremendously. • Multiresolution classification and progressive matching are achieved naturally. • The DCT is scale-invariant and less sensitive to noise and distortion.

  7. Two philosophies of classification • Statistical • the measurements that describe an object are treated only formally as statistical variables, neglecting their “meaning” • Structural • regards objects as compositions of structural units, usually called primitives

  8. 2 Feature Extraction via DCT • The DCT coefficients C(u, v) of an N×N image represented by x(i, j) can be defined as C(u, v) = \alpha(u)\,\alpha(v) \sum_{i=0}^{N-1} \sum_{j=0}^{N-1} x(i, j) \cos\!\left[\frac{(2i+1)u\pi}{2N}\right] \cos\!\left[\frac{(2j+1)v\pi}{2N}\right], where \alpha(0) = \sqrt{1/N} and \alpha(u) = \sqrt{2/N} for u > 0.
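As a sketch, the definition above maps directly onto an orthonormal basis matrix B with B[u, i] = α(u)·cos((2i+1)uπ/2N), so that C = B·x·Bᵀ. The following minimal NumPy illustration is not the authors' implementation, just the standard DCT-II written out:

```python
import numpy as np

def dct_basis(N):
    """Orthonormal DCT-II basis: B[u, i] = a(u) * cos((2i+1) u pi / 2N),
    with a(0) = sqrt(1/N) and a(u) = sqrt(2/N) for u > 0."""
    n = np.arange(N)
    B = np.cos(np.pi * (2 * n[None, :] + 1) * n[:, None] / (2 * N))
    B[0] *= np.sqrt(0.5)
    return B * np.sqrt(2.0 / N)

def dct2(x):
    """2-D DCT coefficients C(u, v) of a square image x(i, j)."""
    B = dct_basis(x.shape[0])
    return B @ x @ B.T

def idct2(C):
    """Inverse 2-D DCT; B is orthonormal, so its transpose inverts it."""
    B = dct_basis(C.shape[0])
    return B.T @ C @ B
```

Because the basis is orthonormal, `idct2(dct2(x))` recovers the image exactly, which is what makes the progressive reconstruction on the following slides possible.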

  9. Figure 1. The DCT coefficients of the character image “為”.

  10. Figure 2. Illustration of the multiresolution ability of DCT: (a) the original image of size 48×48; (b) the reconstructed image of size 8×8; (c) the reconstructed image of size 16×16; (d) the reconstructed image of size 32×32.
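The multiresolution effect shown in Figure 2 can be sketched by keeping only the top-left k×k block of DCT coefficients and inverting the transform. This is a hedged NumPy sketch; the function names are illustrative, not taken from the paper:

```python
import numpy as np

def dct_basis(N):
    """Orthonormal DCT-II basis matrix (a(0) = sqrt(1/N), else sqrt(2/N))."""
    n = np.arange(N)
    B = np.cos(np.pi * (2 * n[None, :] + 1) * n[:, None] / (2 * N))
    B[0] *= np.sqrt(0.5)
    return B * np.sqrt(2.0 / N)

def reconstruct_at_resolution(image, k):
    """Zero all but the lowest k x k frequencies and invert the DCT;
    larger k keeps more detail, i.e. a higher resolution level."""
    B = dct_basis(image.shape[0])
    C = B @ image @ B.T            # forward 2-D DCT
    low = np.zeros_like(C)
    low[:k, :k] = C[:k, :k]        # keep the top-left (low-frequency) block
    return B.T @ low @ B           # inverse transform
```

By Parseval's relation the reconstruction error equals the energy of the discarded coefficients, so it shrinks monotonically as k grows from 8 toward 48.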

  11. 3. The Proposed Classification Scheme • The ultimate goal of classification is to classify an unknown pattern x to one of M possible classes (c1, c2,…, cM). • Each pattern is represented by a set of D features, viewed as a D-dimensional feature vector.

  12. 3.1. Our classification model • In the training mode: • the feature extraction module finds the appropriate features for representing the input patterns, and the classifier is trained. • In the classification mode: • the trained classifier assigns the input pattern to one of the pattern classes based on the measured features.

  13. To alleviate the burden of the classification process, it is usually divided into two stages: • Coarse Classification • Fine Classification

  14. Figure 3. Model for multiresolution classification

  15. 3.2. Coarse classification module • In the training mode: • The features of each training sample are first extracted by DCT and quantized. • The D most significant quantized DCT features of each training sample are then transformed into a code, called the grid code (GC), which corresponds to a grid of the feature space partitioned by the quantization method. • Training samples with the same GC are similar and can be classified into one coarse class. • Therefore, the information about all possible GCs is gathered in the training mode.

  16. In the classification mode: • The classes with the same GC as that of the test sample are chosen as the candidates of the test sample.

  17. 3.2.1. Quantization • The 2-D DCT coefficient F(u, v) is quantized to F'(u, v) according to the following equation: F'(u, v) = \mathrm{round}\!\left(F(u, v)/q\right), where q is the quantization step. • Most of the high-frequency coefficients will be quantized to zero and only the most significant coefficients will be retained.
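A minimal sketch of this step, assuming a single uniform quantization step q (the slide's exact quantization table is not preserved in the transcript):

```python
import numpy as np

def quantize(C, step):
    """Uniform quantization of DCT coefficients: anything smaller in
    magnitude than step/2 rounds to zero, which removes most of the
    (small) high-frequency coefficients."""
    return np.round(C / step).astype(int)
```

For example, with step 10 a coefficient of 100 survives as 10 while coefficients of 3 and 2 vanish.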

  18. 3.2.2. Grid Code Transformation • After the quantization process, the D most significant quantized DCT features of sample Oi are obtained, say [qi1, qi2, …, qiD]. • The significance of each DCT coefficient is decided according to the following zigzag order: F(0,0), F(0,1), F(1,0), F(2,0), F(1,1), F(0,2), F(0,3), F(1,2), F(2,1), F(3,0), F(3,1), …, and so on. • Because the value of qij may be negative, for ease of operation we transform qij to a nonnegative integer dij by adding a number, say kj, to qij. • In this way, object Oi can be transformed into a D-digit GC. • This process is called the grid code transformation (GCT).
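A hedged sketch of the GCT: the zigzag traversal matches the order listed on the slide, but using a single shared offset in place of the per-digit constants kj, and a tuple of digits as the code, are simplifying assumptions:

```python
def zigzag(D, size):
    """First D (u, v) pairs in the zigzag order of the slide:
    (0,0), (0,1), (1,0), (2,0), (1,1), (0,2), (0,3), ..."""
    order = sorted(((u, v) for u in range(size) for v in range(size)),
                   key=lambda p: (p[0] + p[1],
                                  p[0] if (p[0] + p[1]) % 2 else -p[0]))
    return order[:D]

def grid_code(Cq, D, offset):
    """GCT: take the D most significant quantized coefficients in zigzag
    order, shift them nonnegative (the offset plays the role of k_j),
    and collect the digits into one grid code."""
    return tuple(int(Cq[u, v]) + offset for u, v in zigzag(D, Cq.shape[0]))
```

Two samples land in the same coarse class exactly when their digit tuples, i.e. their GCs, are equal.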

  19. 3.2.3. Grid Code Sorting and Elimination • After the GCT, we obtain a list of triplets (Ti, Ci, GCi), where • Ti is the ID of a training sample, • Ci is the ID of the class the training sample belongs to, and • GCi is the grid code of the training sample. • The list is then sorted in ascending order of GC. • Given the GC of a test sample, we can retrieve the list of candidate classes with the same GC.

  20. Elimination of Redundancy • Redundancy occurs when training samples belonging to the same class have the same GC. • This redundancy can be eliminated by building a compact lookup table that contains only the GCs and their corresponding classes. • Given a GC, this table then yields the relevant classes very quickly via binary search.
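The sorted, redundancy-free table and its binary-search lookup can be sketched as follows (a minimal illustration; the function names and the tuple representation of a GC are assumptions, not the paper's data structures):

```python
from bisect import bisect_left
from collections import defaultdict

def build_gc_table(triplets):
    """Collapse (Ti, Ci, GCi) triplets into one entry per distinct GC,
    each holding the set of classes seen with that grid code."""
    table = defaultdict(set)
    for _sample_id, class_id, gc in triplets:
        table[gc].add(class_id)
    codes = sorted(table)                       # sorted for binary search
    return codes, [table[gc] for gc in codes]

def candidate_classes(codes, classes, gc):
    """Binary-search the sorted GC list for a test sample's code."""
    i = bisect_left(codes, gc)
    return classes[i] if i < len(codes) and codes[i] == gc else set()
```

A hash table keyed by GC would serve equally well; the sorted list mirrors the binary-search formulation on the slide.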

  21. 3.3. The fine classification module • Progressive matching method • Adding more DCT coefficients usually increases the resolution level of an image. • If the current resolution is not high enough to distinguish one character from the others, we raise the resolution level so that the discrimination power improves. • Establishing the templates for each class • Templates are established in the DCT domain: the average DCT coefficients of size N×N are computed from the set of training samples of each class. • In this way, M sets of average DCT coefficients are obtained and serve as the templates of the M classes.

  22. The sum of squared differences (SSD) is used as the matching criterion. • The matching of x and Ti is decomposed into K iterations, each corresponding to matching under a block of size nk×nk. • After the kth iteration, the block size is enlarged from nk×nk to nk+1×nk+1 (nk+1 = nk + d). • The process is repeated until one of the stopping criteria is satisfied; the criteria are chosen so as to • 1) preserve enough signal energy in the block, and • 2) reject unqualified classes as soon as possible.
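The iterations above can be sketched as follows. This is a hedged illustration: the block-size schedule and the rejection rule (drop a class once its SSD exceeds a fixed multiple of the current best) are illustrative assumptions, not the paper's exact stopping criteria:

```python
import numpy as np

def progressive_match(x_dct, templates, block_sizes=(8, 16, 32, 48),
                      reject_ratio=4.0):
    """Compare the test sample's DCT coefficients against each class
    template over growing n x n blocks, pruning weak candidates early."""
    candidates = list(templates)
    for n in block_sizes:
        # SSD over the low-frequency n x n block only.
        ssd = {c: float(np.sum((x_dct[:n, :n] - templates[c][:n, :n]) ** 2))
               for c in candidates}
        best = min(ssd.values())
        # Reject classes whose mismatch is already far above the best.
        candidates = [c for c in candidates
                      if ssd[c] <= reject_ratio * best + 1e-12]
        if len(candidates) == 1:
            break
    return min(candidates, key=lambda c: ssd[c])
```

Because each block contains the previous one, SSDs computed at a low resolution are lower bounds on the full-resolution SSDs, which is what justifies rejecting candidates early.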

  23. 4 Experimental Results • 18,600 samples (about 640 categories) were extracted from the Kin-Guan (金剛) scripture. • Each character image was transformed into a 48×48 bitmap. • 1,000 of the 18,600 samples were used for testing and the others for training. • The D most significant DCT coefficients were quantized and transformed into a GC for each sample.

  24. Figure 4. Reduction and accuracy rate using our coarse classification scheme

  25. Figure 5. Accuracy rate using both coarse and fine classification

  26. 5 Conclusions • This paper presents a multiresolution classification scheme based on DCT for vision-based applications. • The DCT features of a pattern can be extracted progressively according to their significance. • When classifying an unknown object, most of the improbable candidate classes can be eliminated at lower resolution levels. • Experiments were conducted on recognizing handwritten characters in Chinese palaeography and showed that our approach performs well in this application domain.

  27. Future Work • Since only preliminary experiments have been conducted to test our approach, much work remains to improve this system. • For example, since features of different types complement one another in classification performance, using several types of vision-oriented features simultaneously could improve classification accuracy.
