
Tesseract OCR

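Tesseract OCR. 97703036, Computer Science (4th year), 李昱安. About Tesseract OCR. Tesseract OCR is an OCR engine developed by researchers at HP between 1985 and 1994; at the time it placed among the top three in the OCR accuracy competition held by the University of Nevada, Las Vegas (UNLV). It was released as open source in 2005, and Google has sponsored its development since 2006. Google claims that Tesseract OCR is the most accurate open source OCR engine.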


Presentation Transcript


  1. Tesseract OCR. 97703036, Computer Science (4th year), 李昱安

  2. About Tesseract OCR

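  3. About Tesseract OCR • Tesseract OCR is an OCR engine developed by researchers at HP between 1985 and 1994; at the time it placed among the top three in the OCR accuracy competition held by the University of Nevada, Las Vegas (UNLV). • It was released as open source in 2005, and Google has sponsored its development since 2006. • Google claims that Tesseract OCR is the most accurate open source OCR engine.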

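  4. About Tesseract OCR • Supports more than 30 scripts/languages • Performs page layout analysis and supports vertical text • Input image requirements: uncompressed TIF format; the background must be white; the text may be in full colour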
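The input requirements on this slide can be illustrated with a small sketch that assembles a basic command line of the form `tesseract imagename outputbase -l lang`. The helper name `build_tesseract_command` and the file names are hypothetical, and the TIFF check only mirrors the requirement stated on the slide (later Tesseract releases also read other image formats). The function builds the argument list without running anything.

```python
from pathlib import Path


def build_tesseract_command(image_path, output_base, lang="eng"):
    """Return the argument list for a basic Tesseract CLI run.

    Hypothetical helper for illustration: it only builds the command and
    does not execute it. The TIFF check reflects the slide's requirement.
    """
    image = Path(image_path)
    if image.suffix.lower() not in {".tif", ".tiff"}:
        raise ValueError(f"expected a TIFF image, got: {image.name}")
    # tesseract <imagename> <outputbase> -l <lang>
    return ["tesseract", str(image), str(output_base), "-l", lang]


if __name__ == "__main__":
    # "chi_tra" is the language code of the Traditional Chinese model.
    print(build_tesseract_command("page.tif", "page", lang="chi_tra"))
```

The returned list could be passed to `subprocess.run` on a machine where Tesseract is installed; Tesseract would then write the recognized text to `page.txt`.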

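  5. About Tesseract OCR • The outline of each character is approximated by a polygon, and the four-dimensional vector (x-position, y-position, direction, length) of the polygon's segments is used as its feature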
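A minimal sketch of the feature described on this slide, assuming the polygonal approximation of the outline has already been computed and is given as a closed list of vertices. Taking the segment midpoint as the x/y position and measuring direction as an angle in radians are assumptions made here for illustration; the slide does not specify them, and this is not Tesseract's own code. The function name `segment_features` is hypothetical.

```python
import math


def segment_features(polygon):
    """Return one (x, y, direction, length) tuple per polygon segment.

    polygon: list of (x, y) vertices of a closed outline approximation.
    Assumptions: position = segment midpoint, direction = atan2 angle.
    """
    features = []
    n = len(polygon)
    for i in range(n):
        x0, y0 = polygon[i]
        x1, y1 = polygon[(i + 1) % n]  # wrap around to close the outline
        dx, dy = x1 - x0, y1 - y0
        features.append((
            (x0 + x1) / 2.0,      # x-position
            (y0 + y1) / 2.0,      # y-position
            math.atan2(dy, dx),   # direction in radians
            math.hypot(dx, dy),   # length
        ))
    return features


if __name__ == "__main__":
    square = [(0, 0), (2, 0), (2, 2), (0, 2)]
    for feature in segment_features(square):
        print(feature)
```

Each character then becomes a set of such 4-D vectors, which a classifier can compare against vectors stored for known characters.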

  6. About Tesseract OCR

  7. We've already sorted through the specs, and laid our hands on its rather sexy frame, now Fujifilm's offering up a more palatable price tag than we expected for its throwback X10 shooter. Starting sometime in early October, the X100's more affordable little brother will set nostalgic point-and-shooters back $599.99 — about $100 bones less than the estimated $715 to $860 ballpark we threw out back in September. If you'll recall, the X10 packs a 12 megapixel EXR CMOS sensor, f/2-2.8, 28-112mm manual zoom lens, up to 12,800 ISO sensitivity, 1080p video, an optical viewfinder, and pop-up flash. No word yet on a final release date. Full PR after the break.

  8. wevealmay 591191 u1¢911g\1 me 91995, ma had 9111 mas .111 19 num my mme, 119wn1j;aLm~s 9ef9119g up 1 19919 1191991119 P1199 mg um we 91999191 f91 19 u1¢9w\199k x19 =\199¢91. sm-ang 5919911919 an may 09191199 me x199's 19919 959199919 me \119u191 wan 591 1199111919 11919199117 =\199¢91s\199k $599.99 _ 1119111 s199\19119=1& um me 9919191911 sm 19 sa99\19up9¢kw9 u1¢9w99¢ 9991119 s9p¢ \191. 1fy911'u Emu, me x19 119919 1 12 19991911191 nxncmos 591591, f/29.9, 2s»112mm 91991111 19919 19.15, up 19 12,999 xsosensiavify, 199911 v;a99, 99 9p¢1<=1 v;9ws9a91, ma P99911 mash. N9w91ay919111s91\191m9a=¢9. n1u1m9n91u19\1m11_

  9. We‘ve alrmdy sorted through the specs, and laid our hands on is rather sexy fimlne, now Fujifi.l.m‘soffering up 1 more palatable price mg than we expected for is Lhrawback X10 shooter. Starlingsometime in mrly October, the X1oo's more afifordable little brother will set uostz.lp'c p0inl~and—shoolelsback s599.99 _ about $1ooboues 1§ than the eflimaled $115 me $86oba.l.lparkwe LhrewoulbackinSeplbet. Ifyo\|'l.llemll, the X10 packs 1 12 megppixel EXRCMOS sensor, f/2fl.B, 2E>n2m|:umm zoom lens, up to 12,800 150 sensitivity, 10801; video, an optiml viewfinder, and popllpflash.Nowordyetouzfinallelmsedale. F\|.|.lPRafien.heb!m.k.

  10. Adapting the Tesseract Open Source OCR Engine forMultilingualOCRRay Smith DariaAntonova Dar-ShyangLeeGoogle |nc., 1600 Amphitheatre Pkwy, Mountain View, CA 94043, USA.AbstractWe describe eforts to adapt the Tesseract open source OCRengine for multiple scripts and languages. Eflort has beenconcentrated on enabling genmicniulti-lingual operation suchthat negligible eust0nti:ati0n is required far a new languagebevmrrlprorirling a cmpusaftert.

  11. 為了落實國民教育的精砷 ,也為了提昇國家人力素質促進競爭力教育部多年以來 一 直致力於推動教育普及化及延長國民教育 c

  12. The (quick) [brown] {fox} jumps!Over the $43,456.78 <lazy> #90 dog& duck/goose, as 12.5% of E-mailfrom aspammer@website.com is spam.Der ,,schnelle” braune Fuchs springtfiber den faulenHund. Le renardbrun<<rapide» saute par-dessus le chienparesseux. La volpemarronerapidasaltasoprail cane pigro. El zorromarrénrépidosaltasobre el perroperezoso. A raposamarromrzipidasaltasobre 0 cfiopreguieoso.

  13. 就在十月初的時候 7 這間 日本著名的相機製造商總算肯透露其定價將訂在 US$599_99 (約 N丁$18,300 、 • HK$4,700) 之譜 7 坦白靚還直 我們,D中所想像的價位啊 ! (感覺至少會比 GRD 貴些吧 ? 直是 • 想不到 XD) 預計將於十一月初開賣 7 不過包括美國及中港台地區目前都仍未公佈確切的發售日期 7 但可 • 以確定的是您將會有更多預算可以先準偏好它的皮套 、 背帶等相闆周暹 7 讓呈晝台復古囷格的 X10 更有味 • 道 ° 透過引用來源可看到完整的新間稿 、 台灠宮網介紹以及日本宮網的賣拍樣本 °

  14. Execution time

  15. Accuracy

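  16. Conclusion • Recognition accuracy for English and Western European characters is high; once the font reaches a certain size, accuracy is above 99% (counted per character) • Recognition with the officially provided Traditional Chinese model tends to produce many wrong characters, and punctuation, symbols, and digits are also often misrecognized, even when the font is made very large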
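The slides do not say how per-character accuracy was computed. One common way, sketched here as an assumption, is one minus the edit distance between the reference text and the OCR output, divided by the reference length. The function names `edit_distance` and `char_accuracy` are hypothetical.

```python
def edit_distance(a, b):
    """Levenshtein distance between strings a and b (row-by-row DP)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(
                prev[j] + 1,               # delete ca
                cur[j - 1] + 1,            # insert cb
                prev[j - 1] + (ca != cb),  # substitute, or match at no cost
            ))
        prev = cur
    return prev[-1]


def char_accuracy(truth, ocr):
    """Character-level accuracy of ocr against the reference text truth."""
    if not truth:
        raise ValueError("reference text must not be empty")
    return max(0.0, 1.0 - edit_distance(truth, ocr) / len(truth))


if __name__ == "__main__":
    # "already" was read as "alrmdy" in the sample on slide 9.
    print(round(char_accuracy("already", "alrmdy"), 3))
```

For the word "already" read as "alrmdy", the distance is 2 (one substitution and one deletion) over 7 reference characters, giving an accuracy of about 0.714.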

  17. 試看超立方暟光學文字辨識中文是否準確成式戎戍戌戒找我或咸試看超立方暟光學文字辨識中文是否準確成式戎戍戌戒找我或咸

  18. http://code.google.com/p/tesseract-ocr
