Handwritten English Numerals Recognition System in Segmented Form

A Method for Recognizing Handwritten English Numerals in Segmented Form
Arka Prasanna Ghosh

Description of the Problem
• Handwritten data
• English numerals and alphabets
• Segmented form and cursive form
Here only numerals in segmented form are considered.

Objective
• Write the data on a properly designed form.
• Scan the data sheet into the computer as an image.
• Identify the curve in the image as an English numeral.
• E.g., machine reading of handwritten zip codes in the postal system.

Statistically,
• it is a classification problem
• with 10 populations
• observations are (continuous) curves in the plane (R^2)
• data are available in discrete form (as a set of points in N^2)

Data
Data were collected from 95 individuals (actually 100, but 5 had to be thrown out for various reasons) on the ten numerals (0, 1, 2, …, 9). The sheets were scanned and stored as image files.

Organising the Data
• Using built-in MATLAB functions (imread, imwrite, dither, etc.), the data are converted to monochrome images: color → grayscale → monochrome
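The grayscale-to-monochrome step can be illustrated with a minimal Python sketch. This is an assumption-laden stand-in, not the slides' procedure: the slides use MATLAB's built-in functions (including dither), whereas the sketch below applies a plain fixed threshold. The function name `to_monochrome` and the threshold value 128 are hypothetical choices made for illustration.

```python
def to_monochrome(gray, threshold=128):
    """Map a grayscale intensity matrix (0..255) to a 0-1 matrix.

    Assumption: a fixed threshold is used here in place of dithering.
    Pixels darker than the threshold become 0 (black), the rest 1 (white).
    """
    return [[0 if value < threshold else 1 for value in row] for row in gray]


# Small example using intensity levels that occur in the slides' matrix
gray = [
    [255, 181, 128],
    [49, 0, 255],
]
mono = to_monochrome(gray)
print(mono)
```

A different threshold, or a true dithering step, would change which borderline pixels (such as those with intensity 128 or 181) end up black.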

The Intensity Matrix ( Grayscale) 255 255 255 255 255 255 255 255 255 255 255 255 255 255 255 255 255 255 255 255 255 255 255 255 255 255 255 255 255 255 255 255 255 255 255 255 255 255 255 255 255 255 255 255 181 128 181 255 255 255 255 255 255 255 049 255 255 255 255 255 255 255 128 0 0 128 0 255 255 255 255 255 255 255 255 255 255 255 255 255 0 255 255 255 255 128 0 49 255 255 255 255 255 255 255 255 255 255 0 255 255 255 255 255 255 0 0 255 255 255 255 255 255 255 255 255 255 0 0 255 255 255 255 255 0 128 255 255 255 255 255 255 255 255 255 255 0 255 255 255 255 255 255 0 255 255 255 255 255 255 255 255 255 255 255 049 255 255 255 255 049 0 0 0 255 255 255 255 255 255 255 255 255 255 255 255 255 255 181 181 255 255 0 0 0 255 255 255 255 255 255 255 255 255 255 255 255 255 255 255 255 0 255 0 255 255 255 255 255 255 255 255 255 255 255 255 255 255 255 255 255 0 0 181 255 255 255 255 255 255 255 255 255 255 255 255 255 255 255 255 0 0 128 255 255 255 255 255 255 255 255 255 255 255 255 255 255 255 255 255 0 0 255 255 255 255 255 255 255 255 255 49 255 255 255 255 255 255 255 0 0 255 255 255 255 255 255 255 255 255 0 255 255 255 255 255 255 128 0 255 255 255 255 255 255 255 255 255 255 255 0 0 0 128 128 0 0 255 255 255 255 255 255 255 255 255 255 255 255 255 49 0 0 0 181 255 255 255 255 255 193 255 255 255 255 255 255 255 255 255 255 255 255 255 255 255 255 255 189 175 255 255 255 255 255 255 255 255 255 255 255 255 255 255 255 255 255 255 255 255 255 49 255 255 255 255 255 255 255 255 255 255 255 255 255 255 255 255 193

The 0-1 Matrix ( monochrome )

Data (cont’d)
• Each observation is now a 0-1 matrix (size 34 x 44).
• We have 95 x 10 of them.
• Data reduction:
• It is sufficient to record only the coordinates of the 0’s in the matrix (the positions of the black dots in the image).
• Each observation is now a set of bivariate coordinates (an n x 2 array, typically with n = 100 to 150), where before it was a 34 x 44 matrix.
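The data-reduction step might look like the following Python sketch. The function name `black_pixel_coords` and the tiny 4 x 5 example matrix are hypothetical; the slides only state that the positions of the 0's are recorded, so the (row, column) ordering used here is an assumption.

```python
def black_pixel_coords(matrix):
    """Return the (row, column) positions of the 0 entries in a 0-1 matrix."""
    return [
        (r, c)
        for r, row in enumerate(matrix)
        for c, value in enumerate(row)
        if value == 0
    ]


# Tiny 4 x 5 matrix standing in for a 34 x 44 monochrome image
image = [
    [1, 1, 0, 1, 1],
    [1, 0, 0, 1, 1],
    [1, 1, 0, 1, 1],
    [1, 1, 0, 1, 1],
]
coords = black_pixel_coords(image)
print(coords)  # [(0, 2), (1, 1), (1, 2), (2, 2), (3, 2)]
```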

Plan
• Define a distance (of sorts) between two such sets of coordinates.
• Use the first 70 observations of each numeral to develop a method of identification.
• Use the last 25 observations to assess the performance of the method.

Cleaning
[Figure: a numeral before cleaning and after cleaning]
• “Stray dots” and “holes” in the observation are removed by a 3x3 nearest-neighbourhood smoothing method (?)
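One possible reading of the 3x3 neighbourhood cleaning is sketched below in Python. The slides themselves mark the method with "(?)", so the exact rule is not known; the rule used here is an assumption: a black pixel with no black pixels among its 8 neighbours is treated as a stray dot, and a white pixel whose 8 neighbours are all black is treated as a hole. The function name `clean` and the parameters `stray_max` and `hole_min` are hypothetical.

```python
def clean(image, stray_max=0, hole_min=8):
    """Remove stray dots and fill holes in a 0-1 matrix (0 = black).

    Assumed rule: decisions are based on the number of black pixels among
    the 8 neighbours in the original image; pixels outside the image
    count as white.
    """
    rows, cols = len(image), len(image[0])

    def black_neighbours(r, c):
        count = 0
        for dr in (-1, 0, 1):
            for dc in (-1, 0, 1):
                if dr == 0 and dc == 0:
                    continue
                rr, cc = r + dr, c + dc
                if 0 <= rr < rows and 0 <= cc < cols and image[rr][cc] == 0:
                    count += 1
        return count

    cleaned = [row[:] for row in image]  # leave the input untouched
    for r in range(rows):
        for c in range(cols):
            n = black_neighbours(r, c)
            if image[r][c] == 0 and n <= stray_max:
                cleaned[r][c] = 1  # isolated black pixel: stray dot
            elif image[r][c] == 1 and n >= hole_min:
                cleaned[r][c] = 0  # white pixel enclosed by black: hole
    return cleaned


noisy = [
    [0, 1, 1, 1, 1],  # stray dot at (0, 0)
    [1, 1, 0, 0, 0],
    [1, 1, 0, 1, 0],  # hole at (2, 3)
    [1, 1, 0, 0, 0],
    [1, 1, 1, 1, 1],
]
for row in clean(noisy):
    print(row)
```

Looser thresholds (for example `stray_max=1`) would remove more pixels at the cost of thinning genuine strokes.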

Standardization
• All the curves need to be brought to a comparable location and scale.
• Each observation is a set of bivariate points.
• Calculate the coordinate-wise mean and s.d.
• If the s.d. is not 0 for a coordinate, normalize that coordinate (subtract the mean and divide by the s.d.).
• If the s.d. = 0 for a coordinate, adjust only for location, not scale, in that coordinate.
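The standardization rule above can be sketched in Python as follows. The function name `standardize` is hypothetical, and the use of the population s.d. (`statistics.pstdev`) rather than the sample s.d. is an assumption, since the slides do not say which is used.

```python
from statistics import mean, pstdev


def standardize(points):
    """Centre and scale a list of (x, y) points coordinate-wise.

    A coordinate whose s.d. is 0 is only centred, not scaled.
    Assumption: the population s.d. is used.
    """
    def normalize(values):
        m, s = mean(values), pstdev(values)
        if s == 0:
            return [v - m for v in values]
        return [(v - m) / s for v in values]

    xs = normalize([p[0] for p in points])
    ys = normalize([p[1] for p in points])
    return list(zip(xs, ys))


# A perfectly straight stroke: the second coordinate has s.d. 0
stroke = [(0, 3), (1, 3), (2, 3), (3, 3), (4, 3)]
standardized = standardize(stroke)
print(standardized)
```

The zero-s.d. branch matters for numerals such as a perfectly straight "1", where one coordinate is constant and division by the s.d. would fail.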

Defining the Distance
For each point in Fig. 1, the minimum (Euclidean) distance to any of the points in Fig. 2 is calculated; that gives n1 distances. The same is done for each point in Fig. 2, which yields n2 more distances. The distance between the two figures is defined to be the mean (Case 1) or the minimum (Case 2) of these n1 + n2 distance values.
[Figure: Fig. 1 in red dots, Fig. 2 in blue dots, overlap in violet]
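The distance defined above can be written as a short Python sketch. The function name `figure_distance` and the `summary` parameter are hypothetical; "mean" corresponds to Case 1 and "min" to Case 2.

```python
import math


def figure_distance(fig1, fig2, summary="mean"):
    """Distance between two point sets, each a list of (x, y) tuples.

    For every point in one figure, take the Euclidean distance to the
    nearest point of the other figure, in both directions, then
    summarize the n1 + n2 values by their mean or their minimum.
    """
    def nearest(p, others):
        return min(math.dist(p, q) for q in others)

    distances = [nearest(p, fig2) for p in fig1]
    distances += [nearest(q, fig1) for q in fig2]
    if summary == "mean":
        return sum(distances) / len(distances)
    if summary == "min":
        return min(distances)
    raise ValueError("summary must be 'mean' or 'min'")


fig1 = [(0, 0), (1, 0)]
fig2 = [(0, 1), (1, 1), (2, 1)]
print(figure_distance(fig1, fig2, "mean"))
print(figure_distance(fig1, fig2, "min"))
```

This brute-force version compares every pair of points, which is affordable at the stated sizes of roughly 100 to 150 points per figure.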

Distance between two Figures

Classification Algorithm
For any new figure (from the test set):
• Its distances from all the 0's in the training set are calculated (that gives 70 values), and their average (Case A) or minimum (Case B) is taken as the distance from the 0-set.
• Similarly, the distances from the 1-set, 2-set, …, 9-set are calculated.
• The new figure is identified as a member of the set that has the smallest distance from it.
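The classification rule can be sketched in Python as below. All names (`figure_distance`, `classify`, the `training` dictionary) and the toy point sets are hypothetical; the figure-to-figure distance used here is the mean of nearest-point distances, and `summary` selects the average (Case A) or minimum (Case B) over a numeral's training figures.

```python
import math


def figure_distance(fig1, fig2):
    """Mean of nearest-point Euclidean distances, taken in both directions."""
    def nearest(p, others):
        return min(math.dist(p, q) for q in others)

    distances = [nearest(p, fig2) for p in fig1]
    distances += [nearest(q, fig1) for q in fig2]
    return sum(distances) / len(distances)


def classify(new_fig, training, summary="mean"):
    """Return the label whose training set is closest to new_fig.

    training maps each label to a list of figures (lists of (x, y) points).
    """
    scores = {}
    for label, figures in training.items():
        d = [figure_distance(new_fig, f) for f in figures]
        scores[label] = sum(d) / len(d) if summary == "mean" else min(d)
    return min(scores, key=scores.get)


# Toy training set with two labels and two figures each
training = {
    0: [[(-1, 0), (0, -1), (1, 0), (0, 1)],
        [(-1, 0), (0, -0.9), (1, 0), (0, 0.9)]],
    1: [[(-1, 0), (0, 0), (1, 0)],
        [(-1, 0.1), (0, 0), (1, -0.1)]],
}
new_figure = [(-1, 0), (0, 0.05), (1, 0)]
print(classify(new_figure, training, "mean"))
print(classify(new_figure, training, "min"))
```

With the minimum (Case B) the rule behaves like a nearest-neighbour classifier over individual training figures, while the average (Case A) compares the new figure with each numeral's training set as a whole.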

Results
[Table: columns "With Mean (Case A)" and "With Minimum (Case B)"; rows "Without Rotation (Case 1)" and "With Rotation (Case 2), grid search (-30, 30, 10 deg)"]
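The slides mention a rotation variant with a grid search over (-30, 30, 10 deg) but do not spell out how it is applied. One plausible reading, sketched below in Python purely as an assumption, is to rotate one figure about the origin by each angle from -30 to 30 degrees in steps of 10 and keep the smallest resulting distance. The names `rotate`, `figure_distance` and `rotation_search_distance` are hypothetical, and rotating about the origin presumes the figures have already been centred.

```python
import math


def rotate(points, degrees):
    """Rotate (x, y) points about the origin by the given angle in degrees."""
    t = math.radians(degrees)
    c, s = math.cos(t), math.sin(t)
    return [(c * x - s * y, s * x + c * y) for x, y in points]


def figure_distance(fig1, fig2):
    """Mean of nearest-point Euclidean distances, taken in both directions."""
    def nearest(p, others):
        return min(math.dist(p, q) for q in others)

    distances = [nearest(p, fig2) for p in fig1]
    distances += [nearest(q, fig1) for q in fig2]
    return sum(distances) / len(distances)


def rotation_search_distance(fig1, fig2, angles=range(-30, 31, 10)):
    """Smallest distance over a grid of rotations applied to fig1."""
    return min(figure_distance(rotate(fig1, a), fig2) for a in angles)


base = [(-1, 0), (0, 0), (1, 0)]
tilted = rotate(base, 20)  # the same stroke, tilted by 20 degrees
print(figure_distance(tilted, base))
print(rotation_search_distance(tilted, base))
```

Because the grid includes 0 degrees, the searched distance can never exceed the unrotated one.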
