Laserfiche Clinic 2006-2007
Adam Field

Ben Tribelhorn, PM

Aaron Wolin

Stephen Smith

Advisor: Zach Dodds

Liaison Luncheon @ HMC, Sept. 12th, 2006


The Problem

[Figure: raw image → OCR-able image]

Project goal: To convert pictures of documents taken with a digital camera into images that can be organized using Laserfiche's OCR and database technologies.


Some important cases:

  • presence of paperclips and/or staples

  • varied or confusing backgrounds (including stacks of papers)

  • one or more document edges lying outside the image

  • knowing when the system has failed

  • camera perspective issues: documents not imaged head-on (?)

  • other important cases?


Approaches

Outside-In

  • Approach taken by previous clinic

  • Finding document corners

  • Unwarping to 8.5 x 11"

Inside-Out (?)

  • Possible approach taken by current clinic

  • First analyzing text-line boundaries

  • Then unwarping to straighten them
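The outside-in unwarping step maps four detected document corners onto an 8.5 x 11" rectangle. A minimal sketch of that mapping, assuming the corners are already found and ordered top-left, top-right, bottom-right, bottom-left, computes a 3x3 perspective transform (homography) from the four correspondences by solving the standard 8x8 linear system. The function names, the made-up corner coordinates, and the 100 pixels-per-inch output scale are hypothetical; corner detection and pixel resampling are not shown.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def solve_linear(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a*x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("degenerate corner configuration")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def homography(src: List[Point], dst: List[Point]) -> List[List[float]]:
    """3x3 matrix H (with h33 = 1) mapping each src point to its dst point."""
    a, b = [], []
    for (x, y), (u, v) in zip(src, dst):
        # u = (h11 x + h12 y + h13) / (h31 x + h32 y + 1), likewise for v
        a.append([x, y, 1, 0, 0, 0, -u * x, -u * y])
        b.append(u)
        a.append([0, 0, 0, x, y, 1, -v * x, -v * y])
        b.append(v)
    h = solve_linear(a, b) + [1.0]
    return [h[0:3], h[3:6], h[6:9]]


def apply(h: List[List[float]], p: Point) -> Point:
    """Map one point through H, dividing by the homogeneous coordinate."""
    x, y = p
    w = h[2][0] * x + h[2][1] * y + h[2][2]
    return ((h[0][0] * x + h[0][1] * y + h[0][2]) / w,
            (h[1][0] * x + h[1][1] * y + h[1][2]) / w)


DPI = 100  # hypothetical output resolution (pixels per inch)
page = [(0, 0), (8.5 * DPI, 0), (8.5 * DPI, 11 * DPI), (0, 11 * DPI)]
corners = [(120, 80), (900, 140), (860, 1250), (90, 1180)]  # made-up detections
H = homography(corners, page)
print([apply(H, p) for p in corners])
```

To produce the unwarped page, one would typically invert H and, for each output pixel, sample the input image at the inverse-mapped location.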


Camera Document Restoration for OCR

  • Able to detect the type of distortion or the severity of the warping

  • Uses the “vertical stroke boundaries” (VSBs) of characters

[Figure: VSBs]

  • Several algorithms use VSBs to detect and correct distortion in the image

Lu and Tan. “Camera Document Restoration for OCR.” http://www.m.cs.osakafu-u.ac.jp/cbdar/proceedings/papers/O1-3.pdf
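This slide does not spell out how Lu and Tan measure distortion type or severity. As a hypothetical proxy only, not their method: given the tilt of each detected VSB (degrees from vertical), a shared nonzero mean tilt suggests uniform skew, while a large spread of tilts suggests perspective or page curl, because those bend strokes by different amounts across the page. The function name, labels, and both tolerances below are made-up placeholders.

```python
from statistics import mean, pstdev
from typing import List, Tuple


def warp_profile(vsb_angles_deg: List[float],
                 skew_tol: float = 1.0,
                 spread_tol: float = 2.0) -> Tuple[str, float, float]:
    """Hypothetical classifier: returns (label, mean tilt, tilt spread)."""
    if not vsb_angles_deg:
        raise ValueError("no vertical stroke boundaries found")
    mu = mean(vsb_angles_deg)       # tilt shared by all strokes
    sigma = pstdev(vsb_angles_deg)  # how much the tilt varies stroke to stroke
    if sigma > spread_tol:
        label = "perspective-or-warp"  # tilt changes across the page
    elif abs(mu) > skew_tol:
        label = "skew"                 # roughly uniform tilt
    else:
        label = "flat"
    return label, mu, sigma


print(warp_profile([0.2, -0.1, 0.3, 0.0])[0])
print(warp_profile([4.8, 5.1, 5.0, 5.2])[0])
print(warp_profile([-6.0, -2.0, 2.0, 6.0])[0])
```

The spread value doubles as a crude severity score: the larger it is, the more the strokes disagree.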


Finding Vertical Stroke Boundaries

  • First, find connected components

  • Find the "top" and "base" lines for each line of text

  • Scan between the top and base lines, searching for pixels that form relatively straight lines roughly orthogonal to the text line
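The first two steps can be sketched on a binary image stored as a list of rows (1 = ink). This minimal sketch labels 8-connected components with a breadth-first search and, assuming the components all belong to one text line, takes the median top and median bottom of their bounding boxes as the top and base lines. Using medians, so that ascenders, descenders, and punctuation do not dominate, is our assumption and not necessarily what Lu, Chen, and Ko do; the function names are hypothetical.

```python
from collections import deque
from statistics import median
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # (top, left, bottom, right), inclusive


def components(img: List[List[int]]) -> List[Box]:
    """Bounding boxes of 8-connected ink regions, found by breadth-first search."""
    h, w = len(img), len(img[0])
    seen = [[False] * w for _ in range(h)]
    boxes = []
    for r in range(h):
        for c in range(w):
            if img[r][c] and not seen[r][c]:
                seen[r][c] = True
                queue = deque([(r, c)])
                top, left, bottom, right = r, c, r, c
                while queue:
                    y, x = queue.popleft()
                    top, bottom = min(top, y), max(bottom, y)
                    left, right = min(left, x), max(right, x)
                    for dy in (-1, 0, 1):
                        for dx in (-1, 0, 1):
                            ny, nx = y + dy, x + dx
                            if (0 <= ny < h and 0 <= nx < w
                                    and img[ny][nx] and not seen[ny][nx]):
                                seen[ny][nx] = True
                                queue.append((ny, nx))
                boxes.append((top, left, bottom, right))
    return boxes


def top_and_base(boxes: List[Box]) -> Tuple[float, float]:
    """Median top row and median bottom row of one text line's components."""
    return median(b[0] for b in boxes), median(b[2] for b in boxes)


# three "letters": one with an ascender in column 3
img = [
    [0, 0, 0, 1, 0, 0, 0],
    [1, 1, 0, 1, 0, 1, 1],
    [1, 1, 0, 1, 0, 1, 1],
]
boxes = components(img)
print(boxes, top_and_base(boxes))
```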

[Figure: Tip point tracing process.]

Lu, Chen, and Ko. “Perspective rectification of document images using fuzzy set and morphological operations.” http://vlab.ee.nus.edu.sg/~bmchen/papers/ivc.pdf
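The scanning step can be sketched as follows, under assumptions of our own rather than the paper's: starting from each ink pixel on the top line that is a left stroke boundary (no ink immediately to its left), follow ink down to the base line, allowing a drift of at most one column per row, and keep the run if its overall tilt from vertical is within a hypothetical `max_tilt_deg`. The greedy "straight down first" tracing and the 20° default are illustrative choices.

```python
import math
from typing import List, Optional, Tuple

Image = List[List[int]]  # binary image, 1 = ink


def trace_stroke(img: Image, top: int, base: int,
                 col: int) -> Optional[Tuple[int, int]]:
    """Follow ink down from (top, col) to the base line, allowing a drift of
    one column per row; return (start_col, end_col), or None if broken."""
    if not img[top][col]:
        return None
    x = col
    for y in range(top + 1, base + 1):
        for nx in (x, x - 1, x + 1):  # prefer straight down, then diagonals
            if 0 <= nx < len(img[y]) and img[y][nx]:
                x = nx
                break
        else:
            return None  # no ink below: the run does not span the band
    return col, x


def vertical_strokes(img: Image, top: int, base: int, left: int, right: int,
                     max_tilt_deg: float = 20.0) -> List[Tuple[int, int, float]]:
    """Hypothetical VSB finder: left boundaries of strokes in columns
    left..right that span top..base and tilt at most max_tilt_deg."""
    strokes = []
    for col in range(left, right + 1):
        if col > 0 and img[top][col - 1]:
            continue  # not a left boundary: ink continues to the left
        run = trace_stroke(img, top, base, col)
        if run is None:
            continue
        tilt = math.degrees(math.atan2(run[1] - run[0], base - top))
        if abs(tilt) <= max_tilt_deg:
            strokes.append((run[0], run[1], tilt))
    return strokes


img = [
    [1, 0, 0, 1, 0, 0],
    [1, 0, 0, 0, 1, 0],
    [1, 0, 0, 0, 0, 1],
]
print(vertical_strokes(img, 0, 2, 0, 5))        # only the upright stroke
print(vertical_strokes(img, 0, 2, 0, 5, 50.0))  # also the 45-degree diagonal
```

The returned tilts are the kind of per-stroke angles a warp-severity estimate could consume.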


A Fast Orientation and Skew Detection Algorithm

  • Uses connected components and nearest neighbors to find document skew

  • Places the text-line angles into two histograms spanning ±90°, with precisions of 1.0° and 0.1°

  • The skew angle is the histogram peak

Avila and Lins. “A Fast Orientation and Skew Detection Algorithm for Monochromatic Document Images.” http://delivery.acm.org/10.1145/1100000/1096631/p118-avila.pdf
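A minimal sketch in the spirit of this slide, not Avila and Lins' exact algorithm: it takes component centroids as input, links each centroid to its nearest neighbor (brute force, O(n²)), folds each link angle into (-90°, 90°], finds the peak of a coarse 1.0° histogram, then refines it with a 0.1° histogram restricted to angles near that peak. The function names and the ±1.5° refinement window are our assumptions; the sign of the result follows the input coordinate system (in image coordinates, y points down).

```python
import math
from collections import Counter
from typing import List, Tuple

Point = Tuple[float, float]


def fold(angle_deg: float) -> float:
    """Map an angle to (-90, 90]: a link and its reverse give the same line."""
    while angle_deg > 90:
        angle_deg -= 180
    while angle_deg <= -90:
        angle_deg += 180
    return angle_deg


def nearest_neighbour_angles(centroids: List[Point]) -> List[float]:
    """Angle from each centroid to its nearest neighbour (brute force)."""
    if len(centroids) < 2:
        raise ValueError("need at least two components")
    angles = []
    for i, (x1, y1) in enumerate(centroids):
        x2, y2 = min((p for k, p in enumerate(centroids) if k != i),
                     key=lambda p: (p[0] - x1) ** 2 + (p[1] - y1) ** 2)
        angles.append(fold(math.degrees(math.atan2(y2 - y1, x2 - x1))))
    return angles


def skew_estimate(centroids: List[Point]) -> float:
    """Coarse 1.0-degree histogram peak, refined with 0.1-degree bins."""
    angles = nearest_neighbour_angles(centroids)
    coarse = Counter(round(a) for a in angles)
    peak = max(coarse, key=coarse.get)
    # refine using only angles in the peak bin and its neighbouring bins
    fine = Counter(round(a, 1) for a in angles if abs(a - peak) <= 1.5)
    return max(fine, key=fine.get)


# ten components along a line tilted by 5 degrees
pts = [(10.0 * i, 10.0 * i * math.tan(math.radians(5))) for i in range(10)]
print(skew_estimate(pts))
```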


Problem Taxonomy

[Chart: document difficulty vs. warp severity]

Document difficulty: Hand-writing, Magazines/Newspaper, Forms, Mostly text documents

Warp severity: Geometric, Skew, Perspective


Problem Priorities?

[Chart: document difficulty vs. warp severity, with regions labeled "primary focus" and "secondary focus"]

Document difficulty: Hand-writing, Magazines/Newspaper, Forms, Mostly text documents

Warp severity: Geometric, Skew, Perspective


Pair 1's plan

  • Finding character strokes

  • Estimating warp severity

  • Thresholding

Picture from Ben and Stephen.
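The plan does not say which thresholding method is used. As one standard option, Otsu's method picks the gray level that maximizes the between-class variance of the image histogram. A minimal sketch for 8-bit grayscale pixels given as a flat list, assuming dark text on a light page; the function names are hypothetical.

```python
from typing import List


def otsu_threshold(pixels: List[int]) -> int:
    """Return t such that pixels <= t form one class and pixels > t the other,
    chosen to maximise between-class variance (Otsu's method)."""
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    sum_all = sum(i * h for i, h in enumerate(hist))
    best_t, best_var = 0, -1.0
    w0 = 0    # number of pixels at or below t
    sum0 = 0  # sum of their gray levels
    for t in range(256):
        w0 += hist[t]
        sum0 += t * hist[t]
        w1 = total - w0
        if w0 == 0 or w1 == 0:
            continue  # one class is empty: no split at this level
        mu0 = sum0 / w0
        mu1 = (sum_all - sum0) / w1
        between = w0 * w1 * (mu0 - mu1) ** 2  # constant 1/total^2 omitted
        if between > best_var:
            best_var, best_t = between, t
    return best_t


def binarize(pixels: List[int], t: int) -> List[int]:
    """1 = ink (dark), 0 = background."""
    return [1 if p <= t else 0 for p in pixels]


px = [20] * 50 + [30] * 50 + [200] * 50 + [220] * 50
t = otsu_threshold(px)
print(t, binarize([10, 250], t))
```

A single global threshold can struggle with the uneven lighting typical of camera images; a locally adaptive threshold is a common alternative.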


Pair 2's plan

  • Least-squares line fitting

  • Visualizing the processing

  • Finding skew estimates
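For the line-fitting and skew-estimation steps, one minimal sketch fits y = m·x + b to the points of each text line by ordinary least squares and reports that line's skew as atan(m) in degrees; the page-level estimate here is the median over lines. Which points to fit (for example, the bottom-center of each component) and the median aggregation are our assumptions; the function names are hypothetical.

```python
import math
from statistics import median
from typing import List, Tuple

Point = Tuple[float, float]


def fit_line(points: List[Point]) -> Tuple[float, float]:
    """Ordinary least-squares fit of y = m*x + b; returns (m, b)."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    sxx = sum((x - mx) ** 2 for x, _ in points)
    if sxx == 0:
        raise ValueError("points share one x value; slope undefined")
    sxy = sum((x - mx) * (y - my) for x, y in points)
    m = sxy / sxx
    return m, my - m * mx


def page_skew(lines: List[List[Point]]) -> float:
    """Median of the per-line skew angles, in degrees."""
    return median(math.degrees(math.atan(fit_line(pts)[0])) for pts in lines)


lines = [[(0, 0), (10, 1), (20, 2)], [(0, 50), (10, 51), (20, 52)]]
print(fit_line([(0, 1), (1, 3), (2, 5)]), page_skew(lines))
```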

  • Two-tier assessment:

      1) Is the result reasonable?

      2) OCR accuracy
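Both tiers can be sketched under assumptions of our own. Tier 1 here calls a result reasonable if every per-line skew estimate lies within a hypothetical tolerance of their median. Tier 2 scores OCR output against hand-typed ground truth as character accuracy, 1 − (Levenshtein edit distance / ground-truth length); this is one common metric, not necessarily the one the team will adopt. All names and the 2° tolerance are placeholders.

```python
from statistics import median
from typing import List


def is_reasonable(line_skews_deg: List[float], tol_deg: float = 2.0) -> bool:
    """Tier 1 (hypothetical check): per-line skews should roughly agree."""
    if not line_skews_deg:
        return False
    mid = median(line_skews_deg)
    return all(abs(s - mid) <= tol_deg for s in line_skews_deg)


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance (insertions, deletions, substitutions)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute
        prev = cur
    return prev[-1]


def char_accuracy(ocr: str, truth: str) -> float:
    """Tier 2: 1 - edit distance / ground-truth length (can drop below 0)."""
    if not truth:
        raise ValueError("empty ground truth")
    return 1 - edit_distance(ocr, truth) / len(truth)


print(is_reasonable([1.0, 1.2, 0.8]), char_accuracy("Lasertiche", "Laserfiche"))
```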

Picture from Aaron & Adam.


Tentative Schedule

Th 9/21 (11:30 am) Call - progress update

T 9/26 Initial presentation @ Harvey Mudd

Th 9/28 Prototype of each algorithm

F 10/6? Site visit and presentation @ Laserfiche

Weekly conference calls with Ed Heaney

Accessible codebase and performance updates

Other deliverables?


Comments?


Other Papers


Hand-writing, Magazines, Forms, Plain Text

Image Warping: Skew, Perspective, Geometric


Taxonomy

Hand-writing, Magazines/Newspaper, Forms, Mostly text documents

Geometric, Skew, Perspective
