1 / 12

Recognition Assistance Linguistic Feedback for Treating Errors of Recognition Processes

Recognition Assistance Linguistic Feedback for Treating Errors of Recognition Processes. Gábor Prószéky & Mátyás Naszódi. proszeky@morphologic.hu & naszodim@morphologic.hu. www. .hu. Rome, 21 May 2003. IKTA-063/2000 . Project Details. Consortium: MorphoLogic (100%)

everly
Download Presentation

Recognition Assistance Linguistic Feedback for Treating Errors of Recognition Processes

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Recognition AssistanceLinguistic Feedback for Treating Errors of Recognition Processes Gábor Prószéky & Mátyás Naszódi proszeky@morphologic.hu & naszodim@morphologic.hu www..hu Rome, 21 May 2003

  2. IKTA-063/2000 Project Details • Consortium:MorphoLogic (100%) • Project Timeframe: 1 Sept 2000 – 28 Feb 2002 (completed) • Total budget: 19,2 M HUF[approx. 76 000 €] • Funding received: 50% Rome, 21 May 2003

  3. IKTA-063/2000 Project Objectives • Support for three recognition processes: OCR, handwriting, speech • A new principle for generalized recognition assistance: an algorithm for blinking and attentive listening – backed by linguistic knowledge • Development of necessarytechnology • Prototype of the first application Rome, 21 May 2003

  4. IKTA-063/2000 ... this is the only way Why this way? OCR Unknown characters Input character graph Spellingchecker Statistical post-processor Combinedpost-processor comparator text Input Linguistic character graph Rome, 21 May 2003

  5. IKTA-063/2000 Features of the Prototype • equipped with a module forsegmentation of continuous input • handles incorrect data, in terms of timing and quality • applies linguistic modules in a parallel manner Rome, 21 May 2003

  6. IKTA-063/2000 Goals • better recognition quality for current (low noise) input types • a solution for recognizing noisy input • „reviving” the shrinking OCR market • trying a method that has a significant international impact Rome, 21 May 2003

  7. IKTA-063/2000 The Segmentation Module • simoultaneous operation (new!) of • (1) lexical analysis,(2) morphological analyisis,(3) handling syntactic patterns describing the error model • handling underspecified input: in terms of quality and timing (new!) Rome, 21 May 2003

  8. IKTA-063/2000 Some Characteristics of OCRed Hungarian Texts • average sentence length: 40 characters • average ambiguity: in every 3 characters • an average of 3 alternatives • that would require 1,500,000 tries • our ‘traditional’ analysis speed: 1000 words/sec • this means 20 hours of experimenting • t = c·elength (t – time, c – speed) Rome, 21 May 2003

  9. IKTA-063/2000 Instead: The FSA Used in the Project • finite state automaton: 100,000 words/sec • we ask the system whether the input is a prefix of something known or not (can it be continued?) • Recognition time:‘traditional’ analysis: t = c·elengthFSA: t = c·length·log(length) Rome, 21 May 2003

  10. IKTA-063/2000 Handling OCR Alternatives Rome, 21 May 2003

  11. IKTA-063/2000 Handling phonetic alternatives OM, 2002. június 12.

  12. ... every project and every presentation must come to an end …

More Related