
Optical Character Recognition Tool

Optical Character Recognition Tool. Bijay Dahal {2008/BCT/509}, Kabindra Shrestha {2008/BCT/516}, Raj Kumar Shrestha {2008/BCT/527}. Objectives: to convert alphanumeric characters from an image into plain text, and to get a general idea of image processing.


Presentation Transcript


  1. Optical Character Recognition Tool • Bijay Dahal {2008/BCT/509} • Kabindra Shrestha {2008/BCT/516} • Raj Kumar Shrestha {2008/BCT/527}

  2. Objectives • To convert alphanumeric characters from an image into plain text. • To get a general idea of image processing.

  3. Tools/Technology Used

  4. Overview • Takes an image as input. • Converts it into plain text. • Recognizes alphanumeric characters only. • Recognized text can be edited and saved. [Figure: loaded image alongside the converted, editable text]

  5. System Architecture • Get Image -> Binarization -> Thinning -> Line Segment -> Character Segment -> Feature Extraction -> Matrix Matching -> Save Text [Diagram; additional labels on the diagram: Bold, Thin]
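
The slides do not say how the matrix-matching stage compares a character against its references. One common reading is nearest-template lookup: each known character has a stored reference vector, and the input is assigned the label of the closest one. The sketch below assumes that reading; the squared-difference distance, the function names, and the `templates` dictionary are illustrative assumptions, not the project's actual code.

```python
def squared_distance(a, b):
    """Sum of squared differences between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def match(features, templates):
    """Return the label whose reference vector is closest to `features`.

    `templates` is a hypothetical dict mapping a character label to its
    stored reference vector. Every template is compared in turn, so the
    cost grows linearly with the number of stored characters.
    """
    return min(templates,
               key=lambda label: squared_distance(features, templates[label]))
```

Because every input character is compared against every stored template, a scheme like this is simple but becomes slow as the template set grows.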

  6. Methodology/Algorithms • Otsu Binarization Algorithm • Hilditch Skeletonization Algorithm (Thinning)
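
As a rough illustration of the first algorithm named here, Otsu's method picks the grey level that maximises the between-class variance of the image histogram. The sketch below is a minimal pure-Python version, assuming 8-bit grey levels and dark ink on a light background; the function names are illustrative and not taken from the project.

```python
def otsu_threshold(pixels):
    """Return the grey level t (0-255) that maximises between-class variance.

    `pixels` is a flat iterable of integer grey levels in 0..255.
    Pixels <= t form one class, pixels > t the other.
    """
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1
    total = sum(hist)
    sum_all = sum(level * count for level, count in enumerate(hist))

    best_t, best_var = 0, -1.0
    w_low = 0    # number of pixels with grey level <= t
    sum_low = 0  # sum of grey levels of those pixels
    for t in range(256):
        w_low += hist[t]
        sum_low += t * hist[t]
        if w_low == 0:
            continue          # lower class still empty
        w_high = total - w_low
        if w_high == 0:
            break             # upper class empty from here on
        mean_low = sum_low / w_low
        mean_high = (sum_all - sum_low) / w_high
        between = w_low * w_high * (mean_low - mean_high) ** 2
        if between > best_var:
            best_var, best_t = between, t
    return best_t


def binarize(image, threshold):
    """Map a 2-D list of grey levels to 1 (dark, ink) or 0 (light, paper)."""
    return [[1 if p <= threshold else 0 for p in row] for row in image]
```

A typical call would flatten the image for `otsu_threshold` and then pass the resulting threshold to `binarize`.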

  7. Algorithms (contd…) • Generic Segmentation
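
The slide does not describe the segmentation procedure. A common approach, assumed here purely for illustration, is projection profiles: rows with no ink separate text lines, and within a line, columns with no ink separate characters. The function names below are hypothetical.

```python
def runs(profile):
    """Return (start, end) pairs, end exclusive, of consecutive non-zero entries."""
    spans, start = [], None
    for i, value in enumerate(profile):
        if value and start is None:
            start = i                      # a run of ink begins
        elif not value and start is not None:
            spans.append((start, i))       # the run ended at the previous index
            start = None
    if start is not None:
        spans.append((start, len(profile)))
    return spans


def segment_lines(binary):
    """Split a binary image (1 = ink) into text lines using row sums."""
    return runs([sum(row) for row in binary])


def segment_characters(binary, top, bottom):
    """Split one text line (rows top..bottom-1) into characters using column sums."""
    width = len(binary[0])
    column_profile = [sum(binary[r][c] for r in range(top, bottom))
                      for c in range(width)]
    return runs(column_profile)
```

This simple scheme relies on blank rows and columns between glyphs, so touching characters or inclined text would not be separated correctly.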

  8. (contd…) • Feature Extraction (zoning) • Based on zones: 5 horizontal and 5 vertical zones => 25 features • Based on upper and lower profiles: 10 vertical zones => 20 features • Based on left and right profiles: 10 horizontal zones => 20 features • Total number of features: 25 + 20 + 20 = 65
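
The feature counts above can be reproduced with a short sketch. The slide gives only the zone counts, so the details here are assumptions: each zone feature is taken to be the fraction of ink pixels in the zone, and each profile feature is the mean normalised distance from the glyph's edge to the first ink pixel within a zone. The glyph is assumed to be a binary 2-D list of at least 10 x 10 pixels; all names are illustrative.

```python
def bounds(n, k):
    """Boundaries that split range(n) into k nearly equal consecutive parts."""
    return [i * n // k for i in range(k + 1)]


def mean(values):
    values = list(values)
    return sum(values) / len(values) if values else 0.0


def zone_features(glyph, k=5):
    """Ink density in each cell of a k x k grid (25 features for k=5)."""
    rb, cb = bounds(len(glyph), k), bounds(len(glyph[0]), k)
    return [mean(glyph[r][c]
                 for r in range(rb[i], rb[i + 1])
                 for c in range(cb[j], cb[j + 1]))
            for i in range(k) for j in range(k)]


def first_ink(line):
    """Index of the first ink pixel in `line`, or len(line) if there is none."""
    for i, value in enumerate(line):
        if value:
            return i
    return len(line)


def profile_features(glyph, k=10):
    """Upper/lower profiles over k vertical zones, left/right over k horizontal zones."""
    h, w = len(glyph), len(glyph[0])
    cols = [[glyph[r][c] for r in range(h)] for c in range(w)]
    upper = [first_ink(col) / h for col in cols]
    lower = [first_ink(col[::-1]) / h for col in cols]
    left = [first_ink(row) / w for row in glyph]
    right = [first_ink(row[::-1]) / w for row in glyph]

    def pooled(values, b):
        return [mean(values[b[i]:b[i + 1]]) for i in range(k)]

    cb, rb = bounds(w, k), bounds(h, k)
    return pooled(upper, cb) + pooled(lower, cb), pooled(left, rb) + pooled(right, rb)


def extract_features(glyph):
    """Concatenate 25 zone + 20 upper/lower + 20 left/right features."""
    upper_lower, left_right = profile_features(glyph)
    return zone_features(glyph) + upper_lower + left_right
```

With these definitions the resulting vector has 25 + 20 + 20 = 65 entries, matching the total on the slide.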

  9. Schedule • Off days: exam time (25 days), Dashain holidays (15 days), Tihar holidays (3 days).

  10. Challenges/Problems Faced • Choosing the correct algorithm. • The algorithms were hard to implement. • Once implemented, the output was not accurate. • Accuracy of matrix matching.

  11. Conclusion • Text in an image is converted to a text file. • The simplest algorithms were used; accuracy is about 40%-60%.

  12. Limitations • Cannot recognize text in noisy images. • Cannot detect inclined text in an image. • Matrix matching is slow. • Poor thinning and noise make some text unrecognizable.

  13. Future Enhancements • Scanner image input. • Recognize PDF and other image formats. • Nepali / Devanagari font support. • Support for different fonts. • Output in PDF or Word file format. • Skew correction and noise reduction. • Handwriting recognition. • Neural networks.

  14. References • Bates, K. S. (2010). Head First Java. O'Reilly. • Improving Optical Character Recognition: http://www.csc.villanova.edu/~mdamian/csc3990/csrs2008/07-csrs2008-AJPalkovic.PDF • Evaluation of OCR Algorithms for Images: http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.89.9539&rep=rep1&type=PDF • Otsu Thresholding, The Lab Book Pages: http://www.labbookpages.co.uk/software/imgProc/otsuThreshold.html • Image Segmentation: http://people.cs.uchicago.edu/~pff/segment/ • Hilditch Algorithm: http://cis.k.hosei.ac.jp/~wakahara/Hilditch.c • Skeletonization: http://cgm.cs.mcgill.ca/~godfried/teaching/projects97/azar/skeleton.html • Java OCR, Ron Cemer's Blog: http://www.roncemer.com/software-development/java-ocr

  15. Thank You …
