
Local Gradient Histogram For Word Spotting In Unconstrained Handwritten Documents

בס"ד. Local Gradient Histogram For Word Spotting In Unconstrained Handwritten Documents. By Jose A. Rodriguez, Florent Perronnin Xerox Research Centre, Europe. Presenting : Alon Tzeiri & Gennady Arhangorodsky. Introduction The Article – Background & Contribution







  1. בס"ד Local Gradient Histogram For Word Spotting In Unconstrained Handwritten Documents By Jose A. Rodriguez, FlorentPerronnin Xerox Research Centre, Europe Presenting : AlonTzeiri & Gennady Arhangorodsky

  2. Outline • Introduction • The Article – Background & Contribution • The Article's Feature • Experiments – experiments with HMM & results – experiments with DTW & results

  3. Handwritten Word Spotting (HWS) is the pattern classification task of detecting specific keywords in handwritten document images. The main difficulty is the high intra-writer and inter-writer variability. The scenario of this work is an unconstrained handwritten word spotting task: it includes a variety of writing styles, document layouts, spontaneous writing, artifacts and spelling mistakes. For instance, a historical document to be indexed may have been written by several writers, so we must cope with the differences between their handwriting. The goal is to detect keywords in realistic, unrestricted conditions. Introduction

  4. Therefore, an important decision in the representation phase is the choice of a word descriptor. • A word is described either by a single feature vector or by a sequence of feature vectors. • The evaluation of such feature vectors will be shown later on. Two main kinds of word descriptors: 1. Holistic – extract a single feature vector per word image. Advantage: sufficient for digit recognition or small-vocabulary word recognition, and faster to evaluate. Disadvantage: limited performance. 2. Local / sequential – the word is described as a 1-D sequence of feature vectors. Advantage: far more accurate for describing a word (as will be shown later on). Disadvantage: takes more time to evaluate. Introduction – cont'd

  5. Once we have spotted the desired keywords in an image we can: • route mail based on the presence of specified keywords • index historical documents • extract meta-data from document images • categorize documents. The motivation for handwritten word spotting

  6. Outline • Introduction • The Article – Background & Contribution • The Article's Feature • Experiments – experiments with HMM & results – experiments with DTW & results

  7. The main contribution of this article is a new sequential feature set that obtains performance well beyond the feature sets usually used in an unconstrained word spotting task. The secondary contribution is an experimental comparison of local word descriptors for word spotting. To give a comparison that is independent of the classifier employed, results are reported using both DTW and HMM. Contribution

  8. Let M be the original image and Mx = Gauss(M, x) the image after Gaussian blurring with strength x. • Let (a,b) be a point; M[a,b] is the value of the pixel at this point in image M. A keypoint is a point that is an extremum of the function Fx(a,b) = |Mx[a,b] − M[a,b]|. • Scale-Invariant Feature Transform (SIFT) is an algorithm that uses the orientation of the gradient at keypoints found at several blur strengths in order to identify major objects inside an image. • This algorithm is the inspiration for this work because it uses the orientation of the gradient and classifies each orientation into bins, which is similar to the features used in this experiment. • Hence, we would like to describe words by generating a sequence of such descriptors, as in SIFT, by moving a sliding window over a word image (elaborated later). SIFT – the inspiration for the experiment. Find keypoints; discard those with low contrast and those too close to the edges.
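To make the keypoint response above concrete, here is a minimal NumPy/SciPy sketch (not the authors' code): it computes Fx(a,b) = |Mx[a,b] − M[a,b]| and keeps local maxima, discarding low-contrast points and points too close to the border, as described on the slide. The blur strength, contrast threshold and border margin are illustrative assumptions.

```python
import numpy as np
from scipy.ndimage import gaussian_filter, maximum_filter

def keypoint_response(M, sigma=1.6):
    """F_sigma(a, b) = |Gauss(M, sigma)[a, b] - M[a, b]| for every pixel."""
    M = M.astype(float)
    return np.abs(gaussian_filter(M, sigma) - M)

def find_keypoints(M, sigma=1.6, contrast_thresh=10.0, border=5):
    """Local maxima of the response, discarding low-contrast points
    and points too close to the image border."""
    F = keypoint_response(M, sigma)
    is_peak = F == maximum_filter(F, size=3)             # 3x3 local maxima
    is_peak &= F > contrast_thresh                        # reject low contrast
    is_peak[:border, :] = is_peak[-border:, :] = False    # reject border rows
    is_peak[:, :border] = is_peak[:, -border:] = False    # reject border columns
    return np.argwhere(is_peak)                           # (row, col) keypoints
```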

  9. Outline • Introduction • The Article – Background & Contribution • The Article's Feature • Experiments – experiments with HMM & results – experiments with DTW & results

  10. How does our HWS system work? 1. Segmentation – extract sub-images that potentially represent words, employing state-of-the-art techniques based on projection profiles and clustering of gap distances. 2. Fast rejection & pruning – a classifier using holistic features performs a first rejection pass; it prunes about 90% of the segmented words while falsely rejecting about 5% of the keywords.

  11. How does our HWS system work? 3. Normalization – non-pruned word images are normalized with respect to slant, skew and text height. 4. Feature vector computation – for each normalized image (i.e. each non-pruned word) a sequence of feature vectors is computed (detailed later). 5. Score assignment – each sequence of feature vectors (i.e. each word) is considered a keyword if its score exceeds a predefined threshold. The study focuses on step 4, the computation of the feature vector sequence, and more precisely on the choice of a feature set. A few feature sets already exist, but due to their limited performance a new feature set is introduced and compared to them. First, let's have a brief look at the existing feature sets, since they will later be compared to the new feature set using HMM and DTW:

  12. Column features. These features are computed on the foreground pixels of each column, and several values are concatenated (a small sketch follows below): 1. number of foreground pixels (i.e. pixels of the word itself) 2. mean of the pixel positions 3. second-order moment, which can be one of the covariance matrix elements, or possibly the variance itself 4. min/max of the pixel positions 5. the difference of the min/max positions from the previous column 6. number of black/white transitions 7. number of pixels between the upper line and the base line. State-of-the-art features
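A minimal sketch of how a few of these column features could be computed for a binarized word image (1 = foreground). This is an illustration, not the authors' implementation, and it omits feature 7, which requires the estimated base/upper lines.

```python
import numpy as np

def column_features(binary_word):
    """Per-column features for a binary word image (1 = foreground ink),
    following the list on the slide (feature 7 is omitted in this sketch)."""
    img = np.asarray(binary_word).astype(int)
    H, W = img.shape
    rows = np.arange(H)
    feats = []
    prev_top, prev_bot = 0, 0
    for x in range(W):
        col = img[:, x]
        n = col.sum()                                       # 1. number of foreground pixels
        if n > 0:
            mean = (rows * col).sum() / n                   # 2. mean position (centre of gravity)
            second = (rows ** 2 * col).sum() / n            # 3. second-order moment
            top, bot = rows[col > 0][0], rows[col > 0][-1]  # 4. min/max pixel positions
        else:
            mean = second = 0.0
            top, bot = 0, 0
        d_top, d_bot = top - prev_top, bot - prev_bot       # 5. change from previous column
        transitions = np.abs(np.diff(col)).sum()            # 6. black/white transitions
        feats.append([n, mean, second, top, bot, d_top, d_bot, transitions])
        prev_top, prev_bot = top, bot
    return np.array(feats, dtype=float)                     # one feature row per column
```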

  13. State-of-the-art features

  14. State-of-the-art features – [figure: the word image after Gaussian filtering, vertical gradient filtering and horizontal gradient filtering]

  15. 2. Pixel-count features. The window, which spans several columns, is divided into 4x4 cells after adjusting its height to the actual extent of the foreground pixels, and the number of pixels in each cell is counted. State-of-the-art features – cont'd
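A small sketch of the pixel-count idea, assuming a binary window (1 = foreground) and a 4x4 grid; the cropping choice and cell-boundary rounding are illustrative assumptions, not the exact implementation.

```python
import numpy as np

def pixel_count_features(window, grid=(4, 4)):
    """Pixel-count features: crop the window to the vertical extent of the
    foreground, split it into a grid of cells, and count ink pixels per cell."""
    rows = np.where(window.any(axis=1))[0]
    if len(rows) == 0:
        return np.zeros(grid[0] * grid[1])
    cropped = window[rows[0]:rows[-1] + 1, :]          # adjust height to content
    M, N = grid
    row_edges = np.linspace(0, cropped.shape[0], M + 1).astype(int)
    col_edges = np.linspace(0, cropped.shape[1], N + 1).astype(int)
    counts = [cropped[row_edges[i]:row_edges[i + 1],
                      col_edges[j]:col_edges[j + 1]].sum()
              for i in range(M) for j in range(N)]
    return np.array(counts, dtype=float)
```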

  16. 3. Gaussian filter features. The DTW algorithm uses three features that are computed per pixel and concatenated into a feature vector for each window: (a) the value after applying a vertical gradient filter, (b) the value after applying a horizontal gradient filter, (c) the value after applying a Gaussian filter. Each window is 15 pixels high and 1 pixel wide, so there are 45 features, because each of the 15 pixels contributes its three filter responses to the final vector. State-of-the-art features – cont'd
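The following sketch illustrates one plausible reading of these per-pixel filter features, assuming the word image has already been normalized to 15 rows; the exact filters and parameters in the cited work may differ, so treat the kernel choices here as assumptions.

```python
import numpy as np
from scipy.ndimage import gaussian_filter, sobel

def gaussian_filter_features(word_image, sigma=1.0):
    """Per-column features: for each pixel of a (H x 1) window, take the
    Gaussian-smoothed value plus the vertical and horizontal gradient
    responses, giving H * 3 features per column (45 when H = 15)."""
    img = word_image.astype(float)
    smooth = gaussian_filter(img, sigma)     # (c) Gaussian filter response
    grad_v = sobel(img, axis=0)              # (a) vertical gradient response
    grad_h = sobel(img, axis=1)              # (b) horizontal gradient response
    # Stack the three responses per pixel, then flatten each column.
    stacked = np.stack([smooth, grad_v, grad_h], axis=-1)        # H x W x 3
    return stacked.transpose(1, 0, 2).reshape(img.shape[1], -1)  # W x (H*3)
```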

  17. Local Gradient Histogram features! The experiments will show that they lead to improved performance in a word spotting task. The Proposal

  18. 1. Sliding window: • Given a word image I(x,y) with height H and width W, we center a window of width w &lt; W and height H at each column of I. • At each position, a feature vector is computed that depends only on the pixels inside that window. • Hence, we get a sequence of W feature vectors (see the sketch below). Computing the feature-vector sequence
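A minimal sketch of the sliding-window step: one window per column, one feature vector per window. The zero-padding at the image borders and the `feature_fn` callback are illustrative assumptions (for example, the LGH frame function sketched later).

```python
import numpy as np

def sliding_window_sequence(word_image, w, feature_fn):
    """Slide a window of width w (and full height H) across the word image,
    centring it on each column, and compute one feature vector per position.
    The image is zero-padded so every column yields a window, giving W vectors."""
    H, W = word_image.shape
    half = w // 2
    padded = np.pad(word_image, ((0, 0), (half, half)))
    return np.stack([feature_fn(padded[:, x:x + w]) for x in range(W)])
```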

  19. 2. Division of the window into cells: each window is subdivided into rectangular cells employing one of the following schemes: (i) split the window into M x N cells of identical dimensions; (ii) the same as above, but only for the window area that actually contains foreground pixels; (iii) independently subdivide the three window regions determined by the lower and upper baselines, resulting in a grid of (A+B+C) x N cells. Computing the feature-vector sequence – cont'd

  20. 3. Gradient histogram computation: • At each cell a (local) gradient histogram feature is extracted. • Gradients + smoothing: we take the image I(x,y) and smooth it into L(x,y) for de-noising. • The horizontal and vertical gradients Gx and Gy are then computed by convolving L with a derivative filter (several such filters can be chosen). Computing the feature-vector sequence – cont'd
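A small sketch of the smoothing and gradient step, assuming a Gaussian filter for L(x,y) and simple central differences for the derivatives; the actual derivative filter is a design choice, as noted on the slide.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def smoothed_gradients(I, sigma=1.0):
    """Smooth the word image I into L for de-noising, then compute the
    horizontal (Gx) and vertical (Gy) gradients with central differences."""
    L = gaussian_filter(I.astype(float), sigma)
    Gy, Gx = np.gradient(L)     # np.gradient returns d/drow first, then d/dcol
    return L, Gx, Gy
```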

  21. 3. Gradient histogram computation – cont'd. Having chosen one of these derivative filters, we obtain for each pixel (x,y) the gradient magnitude m and direction θ: m(x,y) = sqrt(Gx(x,y)² + Gy(x,y)²), θ(x,y) = arctan(Gy(x,y) / Gx(x,y)). Computing the feature-vector sequence – cont'd

  22. 3. Gradient histogram computation – cont'd. • The gradient angles are quantized into a number T of regularly spaced orientations. • For each pixel (x,y) we determine which of the T orientations is closest to θ(x,y) • and then add its magnitude m(x,y) to the corresponding bin. Computing the feature-vector sequence – cont'd
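A minimal sketch of this hard assignment, assuming θ comes from arctan2 (range [−π, π]) and T bin centres at k·2π/T; both conventions are assumptions for illustration.

```python
import numpy as np

def hard_binned_histogram(m, theta, T=8):
    """Quantize gradient directions into T evenly spaced orientations and add
    each pixel's magnitude to the single closest bin (hard assignment)."""
    bin_width = 2 * np.pi / T
    # Index of the closest of the T orientation centres 0, 2*pi/T, ...
    closest = np.round(theta / bin_width).astype(int) % T
    hist = np.zeros(T)
    np.add.at(hist, closest.ravel(), m.ravel())   # accumulate magnitudes per bin
    return hist
```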

  23. 3. Gradient histogram computation – cont'd. Assigning each gradient only to the closest orientation may result in aliasing noise. To reduce its impact, the gradient magnitude of a pixel can be shared between the two closest bins, in proportion to the angular distance to each bin centre (see the previous slide): if θ lies between bin centres θi and θi+1 with spacing Δθ, the pixel contributes m·(θi+1 − θ)/Δθ to bin i and m·(θ − θi)/Δθ to bin i+1, as in the sketch below. Computing the feature-vector sequence – cont'd
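A minimal sketch of the soft assignment: each magnitude is split linearly between the two nearest orientation bins, with the same assumptions about θ and the bin centres as in the previous sketch.

```python
import numpy as np

def soft_binned_histogram(m, theta, T=8):
    """Share each pixel's magnitude between the two closest orientation bins,
    in proportion to the angular distance to each bin centre, which reduces
    the aliasing caused by hard assignment."""
    bin_width = 2 * np.pi / T
    pos = (theta % (2 * np.pi)) / bin_width     # continuous bin coordinate in [0, T)
    lower = np.floor(pos).astype(int) % T       # first of the two closest bins
    upper = (lower + 1) % T                     # second closest bin (wraps around)
    frac = pos - np.floor(pos)                  # distance to the lower bin centre
    hist = np.zeros(T)
    np.add.at(hist, lower.ravel(), (m * (1 - frac)).ravel())
    np.add.at(hist, upper.ravel(), (m * frac).ravel())
    return hist
```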

  24. 3. Gradient histogram computation – cont'd. Shown below is the calculation of one feature vector, which is computed for each window position. • As said earlier, there are W such window positions, so each word is characterized by a sequence of W of these cell-histogram vectors. Computing the feature-vector sequence – cont'd

  25. 4. Frame normalization: • The feature vector at one window position is called a frame. • A frame is the concatenation of the gradient histograms computed in each cell. • A performance gain is obtained by scaling each frame so that its components sum to 1. Computing the feature-vector sequence – cont'd

  26. 5. Summary: • If each window contains M x N cells (in the case of a regular split) and each cell is represented by a histogram of T bins, • then each position of the sliding window is characterized by a feature vector of M x N x T components. • The word is characterized by a sequence of W such vectors. Computing the feature-vector sequence – cont'd
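Putting the pieces together, here is a sketch of one frame for a regular M x N split. It reuses `soft_binned_histogram` from the earlier sketch and assumes `Gx`, `Gy` are the window's smoothed gradients (e.g. from `smoothed_gradients` above); cell boundaries and the normalization detail are illustrative, not the authors' exact implementation.

```python
import numpy as np

def lgh_frame(window, Gx, Gy, M=4, N=4, T=8):
    """One frame: split the window into M x N cells, compute a T-bin gradient
    histogram per cell, concatenate, and normalize the frame to sum to 1,
    giving an M * N * T dimensional feature vector per window position."""
    m = np.hypot(Gx, Gy)              # gradient magnitude per pixel
    theta = np.arctan2(Gy, Gx)        # gradient direction per pixel
    H, W = window.shape
    row_edges = np.linspace(0, H, M + 1).astype(int)
    col_edges = np.linspace(0, W, N + 1).astype(int)
    frame = []
    for i in range(M):
        for j in range(N):
            r0, r1 = row_edges[i], row_edges[i + 1]
            c0, c1 = col_edges[j], col_edges[j + 1]
            # Reuses soft_binned_histogram from the earlier sketch.
            frame.append(soft_binned_histogram(m[r0:r1, c0:c1],
                                               theta[r0:r1, c0:c1], T))
    frame = np.concatenate(frame)
    total = frame.sum()
    return frame / total if total > 0 else frame    # frame normalization
```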

  27. The process

  28. Example

  29. Outline • Introduction • The Article – Background & Contribution • The Article's Feature • Experiments – experiments with HMM & results – experiments with DTW & results

  30. The input consists of 630 letters in French from a customer service department. • The letters were processed by segmentation algorithms in order to extract word candidates. • The fast rejection step described earlier is then applied to those word candidates. • The experiment uses the 10 most frequent words, with 208–750 examples of each word. • These words are also used as the keywords. The input for the experiment

  31. Before the HMM is run, the keyword examples are divided into 5 groups: • four are used for training (i.e. building) the model, • the 5th group is used as the test group, i.e. the one the experiment is actually run on. • These steps are repeated 5 times. • In the DTW test, 5 random images are used as queries. • Those queries are the references for comparing words, based on their distance to the queries. Comparing to other tests

  32. The negative of the distance to the closest of these queries is used as the similarity score (see the sketch below). In the DTW test this was repeated 5 times and the results were averaged across the repetitions. Comparing to other tests – cont'd
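A minimal sketch of this scoring rule using a plain DTW distance over frame sequences (one frame per row); the Euclidean frame distance and the absence of a band constraint are simplifying assumptions, not the exact setup of the experiment.

```python
import numpy as np

def dtw_distance(A, B):
    """Plain dynamic time warping between two sequences of feature vectors
    (rows), using Euclidean frame distances; no band constraint here."""
    n, m = len(A), len(B)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = np.linalg.norm(A[i - 1] - B[j - 1])
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]

def similarity_score(candidate, queries):
    """Score of a candidate word: negative distance to its closest query."""
    return -min(dtw_distance(candidate, q) for q in queries)
```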

  33. DET (Detection Error Trade-off) curves compare one type of error to a complementary type as a function of a detection threshold. • A point on the curve indicates the error rates for a specific threshold. • Thus, a DET curve is the trajectory traced by that point as the threshold is varied. • Usually the axes show the respective error rates as percentages, so the smaller the distance from the origin, the better the algorithm. • In all the curves in this experiment the false rejection rate is plotted against the false acceptance rate: • a false rejection is an element that scored under the threshold but really does correspond to the keyword; • a false acceptance is an element that passed the threshold but does not correspond to the keyword. DET curve definition
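A small sketch of how the DET points could be computed from a set of scores and ground-truth labels; sweeping the observed scores as thresholds is an illustrative choice, not necessarily how the paper's curves were produced.

```python
import numpy as np

def det_points(scores, is_keyword):
    """For every threshold, compute the false rejection rate (keywords scoring
    below the threshold) and the false acceptance rate (non-keywords scoring
    at or above it); each threshold gives one point of the DET curve."""
    scores = np.asarray(scores, dtype=float)
    is_keyword = np.asarray(is_keyword, dtype=bool)
    thresholds = np.unique(scores)
    fr, fa = [], []
    for t in thresholds:
        accepted = scores >= t
        fr.append(np.mean(~accepted[is_keyword]))    # missed keywords
        fa.append(np.mean(accepted[~is_keyword]))    # accepted non-keywords
    return np.array(fr), np.array(fa)
```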

  34. In order to calibrate the experiment, the most advantageous grid configuration has to be found. The precision was averaged across thresholds to produce the mean average precision for different grid layouts of the moving window, using HMM. Calibration
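As an illustration of averaging precision over the ranking, here is a sketch of (non-interpolated) average precision for one keyword; this is a standard definition and not necessarily the exact measure used in the paper.

```python
import numpy as np

def average_precision(scores, is_keyword):
    """Average precision for one keyword: precision averaged over the ranks
    at which true keyword instances are retrieved, summarising the
    precision/recall trade-off across all thresholds."""
    order = np.argsort(scores)[::-1]                   # best score first
    hits = np.asarray(is_keyword, dtype=bool)[order]
    precisions = np.cumsum(hits) / (np.arange(len(hits)) + 1)
    return precisions[hits].mean() if hits.any() else 0.0
```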

  35. The optimal number of orientation bins was found to be 8. The fitted grid performs better because it is reasonably restricted to the actual content inside the window. Calibration – cont'd

  36. Mean average precision for the HMM method, and DET curves. • The DET curves are for the word "Résiliation". Results

  37. Again the mean average precision, this time using DTW. Results – cont'd
