Institute of Computer Science and Information Engineering, 洪詡淮 P76994610

Deformation tolerant generalized Hough transform for sketch-based image retrieval in complex scenes • M. Anelli, L. Cinque, Enver Sangineto • Image and Vision Computing 25 (2007) 1802–1813

Outline • 1. Introduction • 2. Methods • 3. Results • 4. Conclusion

Introduction

Introduction (1/4) • Over the last 12–15 years, the availability of digital visual information has grown very quickly. • Content-Based Image Retrieval (CBIR) is a research area that aims to develop tools for retrieving visual information by its perceptual content.

Introduction (2/4) • In Image Retrieval by Sketch, the query is a stylized sketch drawn by the user to specify the shape features she wants to find in the images of the system's database. • Inexact matching between the sketch and the images, and segmentation, are the two main problems that a sketch-based image retrieval system has to deal with.

Introduction (3/4) • Most methods and techniques for shape-based image retrieval fall into three main categories: • statistical techniques • deformable template matching • multiscale representations

Introduction (4/4) • We modify the Generalized Hough Transform (GHT) in two ways: • First, we spread the voting result in order to deal with small local deformations without increasing the overall asymptotic space and time complexity. • Second, once the most likely position of the sketch in the image has been localized using the votes in the accumulator, the shape segmentation is further verified.

Methods

Preprocessing • Edges are extracted with the Canny edge detector. • The first filter aims at deleting edge pixels surrounded by a disordered, thick texture. • The second filter deals with ordered textures (e.g., a sheaf of parallel lines).

The first filter • C(p): a square mask of $n_1 \times n_1$ pixels centered at pixel p. • N: the number of edge pixels in C(p). • φ(p'): the gradient direction of a generic edge pixel p'; σ² is the variance of φ(p') over the edge pixels p' in C(p). • We cancel the edge pixel p from the edge map if $N > \theta_1 \land \sigma^2 > \theta_2$, where θ1 and θ2 are two pre-fixed thresholds ($n_1 = 40$, $\theta_1 = 260$, $\theta_2 = 0.165$).

The second filter • Let N' be the number of edge pixels p' belonging to the $n_2 \times n_2$ mask D(p) centered at p and such that φ(p') = φ(p). • We cancel p if $N' > \theta_3$ ($n_2 = 20$, $\theta_3 = 120$).
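The two salience filters can be sketched in Python as below. This is a minimal brute-force illustration, not the authors' implementation: it assumes the edge map is a set of (x, y) coordinates, that gradient directions are already quantized so that φ(p') = φ(p) can be tested with `==`, and that both filters are evaluated on the original edge map. The name `salience_filter` is hypothetical.

```python
from statistics import pvariance

def salience_filter(edges, phi, n1=40, theta1=260, theta2=0.165,
                    n2=20, theta3=120):
    """Return the edge pixels that survive both salience filters (sketch).

    edges: set of (x, y) edge-pixel coordinates (e.g. from Canny)
    phi:   dict mapping each edge pixel to its quantized gradient direction
    Windows use half-width n // 2, so their side is n + 1 for even n.
    """
    h1, h2 = n1 // 2, n2 // 2
    kept = set()
    for (px, py) in edges:
        # C(p): edge pixels inside the square centered at p
        in_c = [q for q in edges
                if abs(q[0] - px) <= h1 and abs(q[1] - py) <= h1]
        # Filter 1: many edge pixels whose directions are widely spread
        if len(in_c) > theta1 and pvariance([phi[q] for q in in_c]) > theta2:
            continue
        # Filter 2: many edge pixels in D(p) sharing p's direction
        n_same = sum(1 for q in edges
                     if abs(q[0] - px) <= h2 and abs(q[1] - py) <= h2
                     and phi[q] == phi[(px, py)])
        if n_same > theta3:
            continue
        kept.add((px, py))
    return kept
```

A practical version would work on a raster with sliding-window counts instead of the quadratic scan used here for readability.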

Edge map after preprocessing • From now on, I denotes the edge map of the currently analyzed database image after the salience filters have been applied.

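Generalized Hough Transform (GHT) • The GHT has two phases: template construction and shape detection. • I. Template construction: building the R-Table • Step 1: Choose a reference point inside the template shape, usually the centroid (Xc, Yc). • Step 2: Create an empty R-Table indexed by the angles Φi, i = 1, 2, …, K, with an angular increment of π/K, so that Φ ranges from 0 to 180 degrees. • Step 3: For each edge point (X, Y), compute the parameters (r, α), where r is the distance from (X, Y) to (Xc, Yc) and α is the direction of the vector from (X, Y) to (Xc, Yc).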

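Generalized Hough Transform (GHT) • Step 4: Compute φ (the tangent direction) at (X, Y) and add (r, α) to the entry Φi closest to φ. • Step 5: Repeat Steps 3 and 4 until all edge points have been processed; the result is the R-Table.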

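Generalized Hough Transform (GHT) • II. Shape detection • Step 1: Create a 2D Hough table H(xc, yc) with all entries initialized to 0. • Step 2: For each edge point (x, y), compute the tangent angle φ'. • Step 3: In the R-Table, find the Φi closest to φ'; for each (r, α) stored at Φi, compute $x_c = x + r\cos\alpha$, $y_c = y + r\sin\alpha$. • Step 4: Increment H(xc, yc) by 1; repeat Steps 2 and 3 until all edge points have been processed. • Step 5: Find the local maxima of H(xc, yc); each maximum (xc, yc) is the centroid position of a detected shape.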
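The two GHT phases can be sketched as follows, assuming edge points and their tangent directions are given as parallel Python lists. `build_r_table` and `detect` are hypothetical names, and for brevity `detect` returns only the global maximum of H rather than all local maxima.

```python
import math
from collections import defaultdict

def _bin(phi, K):
    """Index of the R-Table entry Phi_i closest to phi (directions mod pi)."""
    return round((phi % math.pi) / (math.pi / K)) % K

def build_r_table(points, tangents, K=36):
    """Phase I: store (r, alpha) of every template edge point under its angle bin."""
    xc = sum(x for x, _ in points) / len(points)   # reference point = centroid
    yc = sum(y for _, y in points) / len(points)
    table = defaultdict(list)
    for (x, y), phi in zip(points, tangents):
        r = math.hypot(xc - x, yc - y)
        alpha = math.atan2(yc - y, xc - x)         # direction from point to centroid
        table[_bin(phi, K)].append((r, alpha))
    return table

def detect(edge_points, tangents, table, K=36):
    """Phase II: vote in H and return the best centroid and the table H."""
    H = defaultdict(int)
    for (x, y), phi in zip(edge_points, tangents):
        for r, alpha in table.get(_bin(phi, K), []):
            xc = round(x + r * math.cos(alpha))
            yc = round(y + r * math.sin(alpha))
            H[(xc, yc)] += 1
    return max(H, key=H.get), H
```

For example, building the table from the outline of a square and running `detect` on the same outline shifted by (30, 40) puts the peak at the shifted centroid, with one vote per edge point.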

Deformation tolerant GHT (DTGHT) • I: the edge map after preprocessing • S: the user-drawn sketch • Seg: the set of line segments of I • T: the R-Table • m: the cardinality of T ($m = \#T = \#S$) • Building the R-Table: if $p_k$ is a point of S, then $T[k] = p_r - p_k$, $p_r$ being the centroid of S.

Deformation tolerant GHT (DTGHT) • φI(p) and φS[k] denote, respectively, the direction of the point p in I and of the point pk in S. • To improve accuracy, φI(p) and φS[k] are computed from adjacent points in the same segment, using a constant offset δ (δ = 10) along the segment; pj denotes the jth point of a given segment s (and analogously for φS).
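A sketch of the DTGHT R-Table and of the direction estimate. The slide does not reproduce the direction formula, so the central difference below (between the points δ positions before and after pj, clamped at the segment ends, with orientations taken modulo π) is an assumption, as are the function names.

```python
import math

def segment_directions(seg, delta=10):
    """Orientation at each point of an ordered segment (ASSUMED central difference)."""
    n = len(seg)
    dirs = []
    for j in range(n):
        a = seg[max(j - delta, 0)]        # point delta steps back (clamped)
        b = seg[min(j + delta, n - 1)]    # point delta steps ahead (clamped)
        dirs.append(math.atan2(b[1] - a[1], b[0] - a[0]) % math.pi)
    return dirs

def build_dtght_table(sketch):
    """T[k] = p_r - p_k, with p_r the centroid of the sketch S."""
    m = len(sketch)
    xr = sum(x for x, _ in sketch) / m
    yr = sum(y for _, y in sketch) / m
    return [(xr - x, yr - y) for x, y in sketch]
```

Unlike the classic R-Table, T here is a plain array indexed by k, so no direction binning is involved.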

Deformation tolerant GHT (DTGHT) • Nevertheless, we do not use φS[k] to index T as in the original GHT. • In fact we aim at looking for a shape S’ contained in I which is similar but not necessarily identical to S. • Hence, we usually expect that a point p in S and a corresponding point p’ in S’ are quite differently oriented.

Voting Procedure • We perform a vote operation analogous to the voting phase of the original GHT. • α = π/8 • The votes are accumulated in the array A.
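A possible voting loop, assuming that α acts as an orientation tolerance: a point p of I votes for p + T[k] only when φI(p) and φS[k] differ by at most α. This reading of α is not confirmed by the slide, and the function names are hypothetical.

```python
import math

def angle_diff(a, b):
    """Smallest difference between two undirected orientations (radians)."""
    d = abs(a - b) % math.pi
    return min(d, math.pi - d)

def dtght_vote(image_pts, image_dirs, T, sketch_dirs, w, h, alpha=math.pi / 8):
    """Accumulate votes in A (indexed A[y][x]) for a w x h edge map.

    ASSUMPTION: p votes for p + T[k] only if |phi_I(p) - phi_S[k]| <= alpha.
    """
    A = [[0] * w for _ in range(h)]
    for (x, y), phi_i in zip(image_pts, image_dirs):
        for (dx, dy), phi_s in zip(T, sketch_dirs):
            if angle_diff(phi_i, phi_s) <= alpha:
                cx, cy = round(x + dx), round(y + dy)
                if 0 <= cx < w and 0 <= cy < h:   # drop votes falling outside I
                    A[cy][cx] += 1
    return A
```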

Cluster the Votes in A • A fixed vote dispersion window W is used. • Let W be a square mask of $(2l+1) \times (2l+1)$ cells (l is defined below). • W(p) is the set of all the nonzero cells of A contained in the mask W when its center is positioned at p. • The "mass" M(p) of W(p) is the sum of the values of the elements of W(p). • The maximum of M(p) corresponds to the mass of the region with the highest concentration of votes.

Compute M(p) • M(p) is incrementally built using a technique similar to the integral image. • Wi(p) represents the nonzero elements of the ith column of the mask W(p).

Compute M(p) • Let C(x, y) now be the cumulative row sum computed with respect to the yth column of A.
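The window mass M(p) can be obtained with a cost per pixel that does not depend on l by combining per-column cumulative sums (in the spirit of C(x, y) above) with a horizontal sliding sum. The sketch below is a generic version of that idea, assuming the window is clipped at the image border; it is not claimed to reproduce the paper's exact recurrence, and `window_masses` and `best_centroid` are hypothetical names.

```python
def window_masses(A, l):
    """M[y][x] = sum of A over the (2l+1) x (2l+1) window centered at (x, y)."""
    h, w = len(A), len(A[0])
    # C[y][x] = A[0][x] + ... + A[y-1][x]; row 0 is all zeros
    C = [[0] * w for _ in range(h + 1)]
    for y in range(h):
        for x in range(w):
            C[y + 1][x] = C[y][x] + A[y][x]
    M = [[0] * w for _ in range(h)]
    for y in range(h):
        top, bottom = max(y - l, 0), min(y + l, h - 1)
        # mass of each column restricted to rows top..bottom, O(1) per column
        col = [C[bottom + 1][x] - C[top][x] for x in range(w)]
        s = sum(col[: min(l, w - 1) + 1])     # window for x = 0
        for x in range(w):
            M[y][x] = s
            if x + l + 1 < w:                 # column entering the window
                s += col[x + l + 1]
            if x - l >= 0:                    # column leaving the window
                s -= col[x - l]
    return M

def best_centroid(A, l):
    """P = argmax_p M(p), returned as (x, y)."""
    M = window_masses(A, l)
    return max(((x, y) for y in range(len(M)) for x in range(len(M[0]))),
               key=lambda p: M[p[1]][p[0]])
```

The effect of the window is visible in a small case: five single votes scattered around one cell outweigh an isolated cell holding three votes, because the window gathers the scattered ones.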

Compute M(p) • If $P = \arg\max_{p \in I} M(p)$, then P is, with high probability, the point of I corresponding to the centroid of the shape most similar to S. • The window delimits the deformation tolerance area, i.e., the region within which the points of $S_P$ may vary; hence the parameter l determines the size of the shape details that the system ignores during matching. • We set $l = \beta d$, where d is the diagonal of I and $\beta = 0.03$ (in our trials l = 12, which leads to a window side of 25 pixels).
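The choice of l is simple arithmetic. As an illustration with a hypothetical 320 × 240 image, the diagonal is 400 pixels, so β = 0.03 gives l = 12 and a window side of 2l + 1 = 25 pixels, matching the values quoted above. `window_half_size` is a hypothetical helper.

```python
import math

def window_half_size(width, height, beta=0.03):
    """l = beta * d, with d the diagonal of the image, rounded to whole pixels."""
    d = math.hypot(width, height)
    return round(beta * d)

l = window_half_size(320, 240)   # diagonal = 400 px
side = 2 * l + 1                 # side of the (2l+1) x (2l+1) window
```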

Example of system’s output

Line segment matching • $S_P$ is the projection of S onto I, with P as its center of mass. • Thickly textured regions and cluttered backgrounds can randomly concentrate their votes at a single point that does not correspond to any shape S' similar to S, so the peak P has to be verified.

Line segment matching • Extraneous vs. valid segments • A point p of I is a valid point if: • i is a valid hypothesis for p. • We call a segment $s_i$ a valid segment if $\#V_i \ge k_1 \cdot \#s_i$, where $k_1 = 0.7$ and $V_i$ is the set of all the valid points of $s_i$.

Line segment matching • A point p of I is a nearby point if: • We call a segment $s_i$ an extraneous segment if $s_i$ is not a valid segment and $\#N_i \ge k_2 \cdot \#s_i$, where $k_2 = 0.2$ and $N_i$ is the set of all the nearby points of $s_i$. • Let V be the subset of Seg composed of all the valid segments. • Let E be the subset of Seg composed of all the extraneous segments.
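Once the point tests are available, segment classification reduces to counting. Because the slides do not reproduce the valid-point and nearby-point conditions, they are passed in here as hypothetical predicates `is_valid` and `is_nearby`; `classify_segments` is likewise a hypothetical name.

```python
def classify_segments(segments, is_valid, is_nearby, k1=0.7, k2=0.2):
    """Split Seg into valid segments V and extraneous segments E.

    segments: list of segments, each a list of (x, y) points of I
    is_valid, is_nearby: HYPOTHETICAL stand-ins for the paper's point tests
    """
    V, E = [], []
    for s in segments:
        n_valid = sum(1 for p in s if is_valid(p))
        n_near = sum(1 for p in s if is_nearby(p))
        if n_valid >= k1 * len(s):        # #V_i >= k1 * #s_i
            V.append(s)
        elif n_near >= k2 * len(s):       # not valid and #N_i >= k2 * #s_i
            E.append(s)
    return V, E
```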

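Matching Test • If the total number of points in the extraneous segments exceeds the total number of points in the valid segments, the image is rejected. • Otherwise, the ratio of valid points in all the valid segments is computed and used as the similarity estimate (the ratio is taken with respect to m).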
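A sketch of the matching test, assuming the similarity is the number of valid points in the valid segments divided by m = #S; the exact normalization is an assumption. V and E are lists of segments (point lists), `is_valid` is a hypothetical stand-in for the valid-point test, and `matching_test` is a hypothetical name.

```python
def matching_test(V, E, is_valid, m):
    """Return None if the image is rejected, else the similarity estimate.

    ASSUMPTION: similarity = (valid points in valid segments) / m, m = #S.
    """
    pts_V = sum(len(s) for s in V)
    pts_E = sum(len(s) for s in E)
    if pts_E > pts_V:                 # extraneous evidence outweighs valid evidence
        return None
    n_valid = sum(1 for s in V for p in s if is_valid(p))
    return n_valid / m
```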

Similarity

Similarity rank • The DTGHT, like the original GHT, is neither rotation nor scale invariant. • In the off-line preprocessing of each database image, we produce a pyramidal representation of I composed of 5 different resolution levels.

Similarity rank • SISim denotes the final scale-invariant similarity estimate between I and S. • We can assume that the user usually draws a sketch in its expected orientation (e.g., a "horizontal" car or horse, a "vertical" tree); thus rotation invariance can often be ignored to speed up the system.
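A sketch of the pyramid and of SISim under two stated assumptions: each level rescales the edge points by a hypothetical factor of 0.75 (the actual scale factors are not given here), and SISim is taken as the best per-level similarity. Rescaling edge points directly is a simplification of building a pyramid from the image itself; `sim` stands for any per-level DTGHT similarity, and all names are hypothetical.

```python
def downscale_edges(edges, s):
    """Edge points of a coarser level: coordinates scaled by s and truncated."""
    return {(int(x * s), int(y * s)) for x, y in edges}

def build_pyramid(edges, levels=5, factor=0.75):
    """Level 0 is the original edge map; level i is scaled by factor ** i."""
    return [downscale_edges(edges, factor ** i) for i in range(levels)]

def sisim(pyramid, sim):
    """ASSUMPTION: the scale-invariant similarity is the maximum over levels."""
    return max(sim(level) for level in pyramid)
```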

Similarity rank

Results

Computational complexity • n: the number of edge pixels of I • $N_1 = w \times h$ (the image size), $m = \#S$, $N = \#Seg$ • k: the number of scale iterations ($N \ll n, N_1$) • The cost is analyzed for each phase: R-Table construction, voting, search for the maximum of M, construction of the sets V and E, classification of extraneous vs. valid segments, and the matching test.

Computational complexity • The worst-case computational cost of the original GHT is $O(h(nm + N_1))$, with h iterations for different discrete values of scale. • From this comparison, we can state that the DTGHT and the GHT have the same asymptotic worst-case behavior. • Moreover, the DTGHT needs fewer iterations than the GHT to deal with the same range of scale changes (i.e., k < h).

Experimental results • We implemented our method in non-optimized Java code and tested it on a 1.7 GHz Pentium IV. • Matching time per image: less than 2 s, about 1 s on average. • Image sizes range from 200 × 200 to 380 × 350 pixels. • The times include 5 iterations per image for the 5 corresponding image scale values. • The times do not include the preprocessing.

Experimental results • The system's database is composed of 283 images randomly collected from the Web. • No manual segmentation was performed on the images to separate the objects of interest from their background or from other adjacent or occluding objects. Lighting conditions and noise levels are not fixed either.

Experimental results

Experimental results • Comparison to other approaches • The DTGHT is run 24 times, with a 15° rotation step between runs. • No scale iterations are applied; the object's minimum enclosing rectangle is used to set the scale parameters. • Kimia dataset.
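Generating the 24 rotated versions, 15° apart, could look like the sketch below, assuming the sketch is rotated about its centroid; `rotated_sketches` is a hypothetical helper, and each rotated copy would then be matched with the DTGHT as usual.

```python
import math

def rotated_sketches(sketch, step_deg=15, n=24):
    """Return n copies of the sketch rotated by 0, step, 2*step, ... degrees."""
    cx = sum(x for x, _ in sketch) / len(sketch)
    cy = sum(y for _, y in sketch) / len(sketch)
    out = []
    for i in range(n):
        t = math.radians(i * step_deg)
        c, s = math.cos(t), math.sin(t)
        # standard 2D rotation about the centroid (cx, cy)
        out.append([(cx + c * (x - cx) - s * (y - cy),
                     cy + s * (x - cx) + c * (y - cy)) for x, y in sketch])
    return out
```

With 24 steps of 15° the copies cover the full 360°, which is why a fixed number of runs suffices for rotation.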

Experimental results • Comparison to other approaches • We obtained the second-best result. • Our system is the only one among those listed in Table 2 that can be reliably applied to images containing occlusions and non-uniform backgrounds.

Experimental results • Comparison to other approaches • Caltech 101 dataset, composed of real images with significant texture and clutter. • Processing 160 images for a given query took about 140 seconds, including 5 different scale iterations per image.

Conclusion

Conclusion • The DTGHT is an effective technique for the two main problems of sketch-based image retrieval: image segmentation and inexact matching. • Inexact matching can be achieved with a large vote dispersion window, and a dynamic programming approach makes this process efficient.

Conclusion • Segmentation is further achieved by comparing the sketch with the candidate image lines. • We have also shown that, unlike most existing sketch-based image retrieval approaches, the DTGHT can efficiently deal with images with cluttered backgrounds.

Thank You!
