
Imaged Document Text Retrieval without OCR

IEEE Trans. on PAMI, vol. 24, no. 6, June 2002. Presenter: 周遵儒.


Presentation Transcript


  1. Imaged Document Text Retrieval without OCR • IEEE Trans. on PAMI, vol. 24, no. 6, June 2002 • Presenter: 周遵儒

  2. Outline • Introduction • HTD and VTD • Class of Character Objects • Similarity Measure of Documents • Experimental Results • Conclusions

  3. Introduction • Retrieval of Imaged Documents • Processing with OCR vs. without OCR • Language dependence vs. language independence

  4. Procedure • Image Preprocessing • Feature extraction of character objects • Horizontal Traverse Density (HTD) • Vertical Traverse Density (VTD) • Clustering • To identify classes of character objects • Document representation • Hash Table • N-Gram • To construct indexes for imaged document retrieval

  5. Features: HTD and VTD
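The transcript does not preserve how this slide defined the two features. As a minimal sketch, assuming a traverse density is the number of separate ink runs crossed by each scan line (rows for HTD, columns for VTD), the features of one binarized character object could be computed as below. The names `count_runs` and `traverse_densities` and the toy bitmap are illustrative, not taken from the paper.

```python
from typing import List, Tuple

Bitmap = List[List[int]]  # 1 = ink (black) pixel, 0 = background


def count_runs(line: List[int]) -> int:
    """Count the separate ink runs crossed along one scan line."""
    runs = 0
    previous = 0
    for pixel in line:
        if pixel and not previous:  # background-to-ink transition starts a run
            runs += 1
        previous = pixel
    return runs


def traverse_densities(bitmap: Bitmap) -> Tuple[List[int], List[int]]:
    """Return (HTD, VTD) under the assumed definition.

    HTD has one entry per row (horizontal scan lines);
    VTD has one entry per column (vertical scan lines).
    """
    htd = [count_runs(row) for row in bitmap]
    vtd = [count_runs(list(column)) for column in zip(*bitmap)]
    return htd, vtd


# A crude 5x5 letter "H" used only as toy input.
H = [
    [1, 0, 0, 0, 1],
    [1, 0, 0, 0, 1],
    [1, 1, 1, 1, 1],
    [1, 0, 0, 0, 1],
    [1, 0, 0, 0, 1],
]

htd, vtd = traverse_densities(H)
print(htd, vtd)
```

Because the features count strokes rather than recognize letters, nothing in this step depends on the script or language of the document.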

  6. Class of Character Objects • Unsupervised clustering with HTD and VTD • Distance measure of character objects

  7. Distance Measure of Character Objects
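The distance formula on this slide did not survive in the transcript, so the following is only a hypothetical stand-in showing how unsupervised clustering over HTD and VTD could work. It assumes each object is an `(htd, vtd)` pair, resamples both vectors to a fixed length so that objects of different sizes are comparable, uses a mean absolute difference as the distance, and groups objects with simple greedy leader clustering. The function names, the resampling length, and the threshold are all assumptions.

```python
from typing import List, Sequence, Tuple

Features = Tuple[Sequence[float], Sequence[float]]  # (HTD, VTD) of one object


def resample(vector: Sequence[float], length: int) -> List[float]:
    """Nearest-neighbour resampling of a vector to a fixed length."""
    n = len(vector)
    return [vector[min(n - 1, (i * n) // length)] for i in range(length)]


def vector_distance(a: Sequence[float], b: Sequence[float], length: int) -> float:
    """Mean absolute difference after resampling both vectors."""
    ra, rb = resample(a, length), resample(b, length)
    return sum(abs(x - y) for x, y in zip(ra, rb)) / length


def object_distance(p: Features, q: Features, length: int = 8) -> float:
    """Assumed distance: HTD distance plus VTD distance."""
    return vector_distance(p[0], q[0], length) + vector_distance(p[1], q[1], length)


def cluster(objects: List[Features], threshold: float) -> List[int]:
    """Greedy leader clustering; returns one class id per object."""
    leaders: List[Features] = []
    labels: List[int] = []
    for obj in objects:
        for class_id, leader in enumerate(leaders):
            if object_distance(obj, leader) <= threshold:
                labels.append(class_id)
                break
        else:  # no existing class is close enough, so open a new one
            leaders.append(obj)
            labels.append(len(leaders) - 1)
    return labels


small_h = ([2, 2, 1, 2, 2], [1, 1, 1, 1, 1])
large_h = ([2, 2, 2, 2, 1, 1, 2, 2, 2, 2], [1] * 10)
bar = ([1, 1, 1, 1, 1], [1, 1, 1])

print(cluster([small_h, large_h, bar], threshold=0.25))
```

The output of this step is a sequence of class ids, one per character object in reading order, which is the form of input the document representation step needs.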

  8. Examples of Character Objects

  9. Similarity Measure of Documents • N-Gram Algorithm • Cosine angle between two documents
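As a rough illustration of the two bullets above, the sketch below treats a document as a sequence of character-object class ids, counts its n-grams in a hash table (a `collections.Counter`), and scores two documents by the cosine of the angle between their n-gram count vectors. The choice of n = 3, the function names, and the toy sequences are assumptions made for illustration; the transcript does not give the paper's exact weighting.

```python
import math
from collections import Counter
from typing import List


def ngram_counts(class_ids: List[int], n: int = 3) -> Counter:
    """Hash table mapping each n-gram of class ids to its frequency."""
    return Counter(
        tuple(class_ids[i:i + n]) for i in range(len(class_ids) - n + 1)
    )


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine of the angle between two sparse n-gram count vectors."""
    dot = sum(count * b[gram] for gram, count in a.items())
    norm_a = math.sqrt(sum(count * count for count in a.values()))
    norm_b = math.sqrt(sum(count * count for count in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # an empty document is treated as dissimilar to everything
    return dot / (norm_a * norm_b)


# Toy class-id sequences standing in for three imaged documents.
doc_a = ngram_counts([0, 1, 2, 0, 1, 2])
doc_b = ngram_counts([0, 1, 2, 0, 1, 2, 3])
doc_c = ngram_counts([5, 6, 7, 8])

print(cosine_similarity(doc_a, doc_b))  # high: most trigrams are shared
print(cosine_similarity(doc_a, doc_c))  # zero: no trigram in common
```

The class ids only need to be consistent within the collection being indexed; they never have to be mapped to actual characters, which is what lets retrieval proceed without OCR.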

  10. Corpus • UW1 database (600 dpi)

  11. Experimental Results • Corpus I • E01-E26

  12. Experimental Results • Corpus II

  13. Experimental Results

  14. Experimental Results

  15. Experimental Results

  16. Experimental Results

  17. Conclusion and Future Work • A new method for imaged document retrieval without OCR • Language-independent retrieval • Improvement of robustness for different fonts and noisy documents
