
Coursework for ISSALE - 2014 Project Demonstration

Coursework for ISSALE - 2014 Project Demonstration: Sinhala Language OCR, by Kasun Perera, Chamila Liyanage, Tharaka Viswakula and Laksri Wijerathna. Sinhala script consists of 18 vowels, 40 consonants, 18 modifiers and other symbols (rakaranshaya, yansaya). Font: Abhaya.


Presentation Transcript


  1. Coursework for ISSALE - 2014 Project Demonstration SINHALA LANGUAGE OCR • Kasun Perera • Chamila Liyanage • Tharaka Viswakula • Laksri Wijerathna

  2. Sinhala Script consists of: • 18 vowels • 40 consonants

  3. Sinhala Script • 18 modifiers • Other symbols (rakaranshaya, yansaya) • Font: Abhaya • Font size: 12

  4. Selected characters

  5. Document Image • The document image has 16 different character types and 11 samples of each character type.

  6. Line and Main Body Segmentation • All lines were segmented correctly • Number of lines in input image: 9 • Program outputs 9 line segments • 100% accuracy • All main bodies were segmented correctly (no diacritics) • 100% accuracy
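The slides do not say how the lines were segmented. A common technique for this step is a horizontal projection profile, and the following is a minimal sketch of it under that assumption, not the authors' program. The function name `segment_lines`, the `min_ink` threshold and the toy page are all hypothetical.

```python
from typing import List, Tuple


def segment_lines(image: List[List[int]], min_ink: int = 1) -> List[Tuple[int, int]]:
    """Return (top, bottom) row ranges of text lines, bottom exclusive.

    `image` is assumed to be a binarized page: 1 = ink pixel, 0 = background.
    """
    # Horizontal projection profile: number of ink pixels in each row.
    profile = [sum(row) for row in image]

    lines = []
    start = None
    for y, ink in enumerate(profile):
        if ink >= min_ink and start is None:
            start = y                     # entering a text line
        elif ink < min_ink and start is not None:
            lines.append((start, y))      # leaving a text line
            start = None
    if start is not None:                 # a line that touches the bottom edge
        lines.append((start, len(profile)))
    return lines


# Toy page (illustrative only): two bands of ink separated by blank rows.
page = [
    [0, 0, 0, 0],
    [1, 1, 0, 1],
    [0, 1, 1, 0],
    [0, 0, 0, 0],
    [1, 0, 0, 1],
    [0, 0, 0, 0],
]
segments = segment_lines(page)
print(len(segments), segments)
```

Comparing `len(segments)` with the known number of lines gives the kind of check reported on the slide (9 expected, 9 produced). Sinhala diacritics above or below the main body can leave thin gaps between rows, so a real page may need a higher `min_ink` or merging of nearby segments.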

  7. Decision Tree Recognition Results • Creation of training (35) and test (15) data • Decision tree created in Weka using the training data • Accuracy tested using the test data • Overall accuracy: 70% • Poorly recognized characters: 702 - නි / 708 - ල් / 711 - සි / 712 - ත්
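An overall accuracy figure and a list of poorly recognized characters can both be derived from pairs of true and predicted labels. The sketch below shows one way to do that; it is not the Weka workflow used in the project, the `evaluate` helper and the 0.6 threshold are hypothetical, and the toy labels are made up rather than taken from the slides.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def evaluate(pairs: List[Tuple[str, str]]) -> Tuple[float, Dict[str, float]]:
    """Return (overall accuracy, per-class accuracy) for (truth, predicted) pairs."""
    if not pairs:
        raise ValueError("need at least one (truth, predicted) pair")
    correct = defaultdict(int)
    total = defaultdict(int)
    for truth, predicted in pairs:
        total[truth] += 1
        if truth == predicted:
            correct[truth] += 1
    overall = sum(correct.values()) / sum(total.values())
    per_class = {label: correct[label] / total[label] for label in total}
    return overall, per_class


# Toy data (illustrative only): three classes, two test samples each.
toy_pairs = [("A", "A"), ("A", "A"), ("B", "B"), ("B", "C"), ("C", "A"), ("C", "B")]
overall, per_class = evaluate(toy_pairs)

# Classes whose accuracy falls below an arbitrary threshold.
weak = sorted(label for label, acc in per_class.items() if acc < 0.6)
print(f"overall accuracy: {overall:.1%}")
print("weak classes:", weak)
```

Reporting per-class accuracy next to the overall figure makes it easier to see whether a few character classes, such as the four listed on the slide, account for most of the errors.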

  8. Tesseract Recognition Results • Overall accuracy: 93.181%

  9. Complete OCR - DT Method • Overall accuracy: 28%

  10. Complete OCR - Tesseract • Overall accuracy: 92.8%
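The slides do not state how accuracy was measured for the complete OCR runs. One common measure for whole-document output is character accuracy based on edit distance between the ground-truth text and the OCR output. The sketch below assumes that measure; `edit_distance` and `char_accuracy` are hypothetical helper names.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimum insertions, deletions and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def char_accuracy(truth: str, output: str) -> float:
    """1 - (edit distance / ground-truth length), clamped at 0."""
    if not truth:
        raise ValueError("ground truth must not be empty")
    return max(0.0, 1.0 - edit_distance(truth, output) / len(truth))


# Illustrative strings only, not taken from the project's output file.
print(char_accuracy("abcd", "abxd"))
```

Python strings are sequences of Unicode code points, so a Sinhala consonant followed by a vowel sign (for example නි) counts as two units here. Scoring per written character would require grouping code points into clusters first.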

  11. Tesseract Output File

  12. Conclusion • Test dataset (15): Tesseract accuracy 93%, DT accuracy 70% • Document image: Tesseract accuracy 92.8%, DT accuracy 28%

  13. ස්තුතියි...! (Thank you...!)
