
Laserfiche Clinic 2006-2007

Team: Adam Field, Ben Tribelhorn (PM), Aaron Wolin, Stephen Smith

Advisor: Zach Dodds

Liaison Luncheon @ HMC, Sept. 12th, 2006

slide2
The Problem

[Figure: raw image and the corresponding OCR-able image]

Project goal: To convert pictures of documents taken with a digital camera into images that can be organized using Laserfiche's OCR and database technologies.

slide3
The Problem

Some important cases:

  • presence of paperclips and/or staples
  • varied/confusing backgrounds (including stacks of papers)
  • one or more document edges lying outside the image
  • knowing when the system has failed
  • camera perspective issues: documents not imaged head-on (?)
  • other important cases?

slide4
Approaches

Outside-In
  • Approach taken by previous clinic
  • Finding document corners
  • Unwarping to 8.5 x 11"

Inside-Out (?)
  • Possible approach taken by current clinic
  • First analyzing text-line boundaries
  • Then unwarping to straighten them
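As a rough illustration of the outside-in idea, the sketch below assumes the four document corners have already been found and that the page is flat, so a single perspective transform (homography) maps them onto an 8.5 x 11" rectangle. The slides do not say how the previous clinic performed its unwarping; the function names, the 100-pixels-per-inch output scale, and the corner coordinates are all hypothetical.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]
Matrix = List[List[float]]


def solve_linear(a: Matrix, b: List[float]) -> List[float]:
    """Solve a @ x = b with Gaussian elimination and partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("degenerate corner configuration")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def homography(src: Sequence[Point], dst: Sequence[Point]) -> Matrix:
    """3x3 perspective transform taking four src points to four dst points (h33 fixed to 1)."""
    a: Matrix = []
    b: List[float] = []
    for (x, y), (u, v) in zip(src, dst):
        a.append([x, y, 1.0, 0.0, 0.0, 0.0, -x * u, -y * u])
        b.append(u)
        a.append([0.0, 0.0, 0.0, x, y, 1.0, -x * v, -y * v])
        b.append(v)
    h = solve_linear(a, b)
    return [h[0:3], h[3:6], h[6:8] + [1.0]]


def map_point(h: Matrix, p: Point) -> Point:
    """Apply the homography to one point, dividing by the projective coordinate w."""
    x, y = p
    w = h[2][0] * x + h[2][1] * y + h[2][2]
    return ((h[0][0] * x + h[0][1] * y + h[0][2]) / w,
            (h[1][0] * x + h[1][1] * y + h[1][2]) / w)


DPI = 100  # hypothetical output resolution in pixels per inch
# Target page corners in order: top-left, top-right, bottom-right, bottom-left.
PAGE = [(0.0, 0.0), (8.5 * DPI, 0.0), (8.5 * DPI, 11.0 * DPI), (0.0, 11.0 * DPI)]

corners = [(120.0, 80.0), (900.0, 140.0), (860.0, 1250.0), (90.0, 1180.0)]  # hypothetical detections
H = homography(corners, PAGE)
print([map_point(H, p) for p in corners])
```

To render the unwarped page, one would normally loop over output pixels and sample the photo through the inverse transform, `homography(PAGE, corners)`. A homography only models a flat page seen in perspective; curled or folded pages need a non-linear model.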
slide5
Camera Document Restoration for OCR
  • Can detect the type of distortion or the severity of the warping
  • Uses the “vertical stroke boundaries” (VSBs) of characters

[Figure: VSBs]

  • Several algorithms use VSBs to detect and correct image distortion

Lu and Tan. “Camera Document Restoration for OCR.” http://www.m.cs.osakafu-u.ac.jp/cbdar/proceedings/papers/O1-3.pdf
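Lu and Tan's classification procedure is not reproduced on this slide. As a hypothetical proxy only, the sketch below summarizes a set of VSBs (given as pixel line segments) by their mean tilt from vertical and the spread of those tilts: a uniform tilt suggests skew or shear, while a large spread suggests perspective or warping. The function names, the 1° tolerance, and the category labels are assumptions.

```python
import math
from statistics import mean, pstdev
from typing import List, Tuple

Segment = Tuple[Tuple[float, float], Tuple[float, float]]  # ((x0, y0), (x1, y1)) in pixels


def tilt_from_vertical(seg: Segment) -> float:
    """Signed angle in degrees between a stroke segment and the image's vertical axis."""
    (x0, y0), (x1, y1) = seg
    if y1 < y0:  # orient each segment top-to-bottom so the sign is consistent
        x0, y0, x1, y1 = x1, y1, x0, y0
    return math.degrees(math.atan2(x1 - x0, y1 - y0))


def summarize_vsbs(vsbs: List[Segment], tol_deg: float = 1.0) -> Tuple[str, float, float]:
    """Hypothetical proxy: the mean tilt flags uniform skew/shear, the tilt spread flags warping."""
    angles = [tilt_from_vertical(s) for s in vsbs]
    avg, spread = mean(angles), pstdev(angles)
    if spread > tol_deg:
        kind = "non-uniform (perspective or warp)"
    elif abs(avg) > tol_deg:
        kind = "uniform tilt (skew or shear)"
    else:
        kind = "no obvious distortion"
    return kind, avg, spread
```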

slide6
Finding Vertical Stroke Boundaries
  • Find connected components first
  • Find the "top" and "base" lines for each line of text
  • Scan between the top and base lines, searching for pixels that form straight lines roughly orthogonal to the text line

[Figure: tip point tracing process]

Lu, Chen, and Ko. “Perspective rectification of document images using fuzzy set and morphological operations.” http://vlab.ee.nus.edu.sg/~bmchen/papers/ivc.pdf
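A much-simplified version of these three steps might look like the sketch below. It assumes a binarized image (1 = ink) holding a single text line, takes the median component top and bottom rows as the top and base lines (not the tip point tracing of Lu, Chen, and Ko), and only detects strictly vertical strokes by counting ink pixels per column inside that band. All function names and the 0.8 coverage threshold are hypothetical.

```python
from collections import deque
from statistics import median_low
from typing import List, Tuple

Image = List[List[int]]            # binarized image: 1 = ink, 0 = background
Component = List[Tuple[int, int]]  # (row, col) pixels of one component


def connected_components(img: Image) -> List[Component]:
    """Group ink pixels into 8-connected components with a breadth-first flood fill."""
    h, w = len(img), len(img[0])
    seen = [[False] * w for _ in range(h)]
    comps: List[Component] = []
    for r in range(h):
        for c in range(w):
            if not img[r][c] or seen[r][c]:
                continue
            seen[r][c] = True
            queue, pixels = deque([(r, c)]), []
            while queue:
                y, x = queue.popleft()
                pixels.append((y, x))
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        if 0 <= ny < h and 0 <= nx < w and img[ny][nx] and not seen[ny][nx]:
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            comps.append(pixels)
    return comps


def top_and_base(comps: List[Component]) -> Tuple[int, int]:
    """Crude top/base lines for one text line: median of component top rows and bottom rows."""
    tops = [min(y for y, _ in comp) for comp in comps]
    bottoms = [max(y for y, _ in comp) for comp in comps]
    return median_low(tops), median_low(bottoms)


def vertical_stroke_columns(img: Image, top: int, base: int, min_frac: float = 0.8) -> List[int]:
    """Columns whose ink count between the top and base lines covers at least min_frac of the band."""
    band = base - top + 1
    return [c for c in range(len(img[0]))
            if sum(img[r][c] for r in range(top, base + 1)) >= min_frac * band]
```

Medians are used so that a few ascenders or descenders do not drag the estimated top and base lines.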

slide7
A Fast Orientation and Skew Detection Algorithm
  • Uses connected components and nearest neighbors to find document skew
  • Places the text-line angles (range ±90°) into two histograms, with precisions of 1.0° and 0.1°
  • The skew angle is the histogram peak

Avila and Lins. “A Fast Orientation and Skew Detection Algorithm for Monochromatic Document Images.” http://delivery.acm.org/10.1145/1100000/1096631/p118-avila.pdf
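The two-pass histogram idea can be sketched as follows. This is not Avila and Lins's exact procedure: it assumes connected-component centroids are already available, uses the angle from each centroid to its nearest neighbor as the angle sample, and refines the 1.0° peak with 0.1° bins over a window one degree either side of the coarse peak. Angles are in degrees with y pointing down, as in image coordinates; the function names are hypothetical.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def nearest_neighbour_angles(centroids: List[Point]) -> List[float]:
    """Angle in degrees, folded into [-90, 90), from each centroid to its nearest neighbor."""
    angles = []
    for i, (x0, y0) in enumerate(centroids):
        j = min((k for k in range(len(centroids)) if k != i),
                key=lambda k: (centroids[k][0] - x0) ** 2 + (centroids[k][1] - y0) ** 2)
        a = math.degrees(math.atan2(centroids[j][1] - y0, centroids[j][0] - x0))
        angles.append((a + 90.0) % 180.0 - 90.0)  # a and a + 180 describe the same line
    return angles


def histogram_peak(angles: List[float], lo: float, hi: float, step: float) -> float:
    """Center of the fullest bin of a histogram over [lo, hi) with the given bin width."""
    bins = [0] * int(round((hi - lo) / step))
    for a in angles:
        if lo <= a < hi:
            bins[min(int((a - lo) / step), len(bins) - 1)] += 1
    best = max(range(len(bins)), key=lambda i: bins[i])
    return lo + (best + 0.5) * step


def estimate_skew(centroids: List[Point]) -> float:
    """Coarse 1.0-degree pass over [-90, 90), then a 0.1-degree pass around the coarse peak."""
    angles = nearest_neighbour_angles(centroids)
    coarse = histogram_peak(angles, -90.0, 90.0, 1.0)
    return histogram_peak(angles, coarse - 1.0, coarse + 1.0, 0.1)
```

Widening the fine window to one coarse bin on each side keeps the refinement stable when samples straddle a 1° bin edge.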

slide8
Problem Taxonomy

[Figure: chart placing document difficulty (Hand-writing, Magazines/Newspaper, Forms, Mostly text documents) against warp severity (Geometric, Skew, Perspective)]

slide9
Problem Priorities?

[Figure: taxonomy chart of document difficulty (Hand-writing, Magazines/Newspaper, Forms, Mostly text documents) against warp severity (Geometric, Skew, Perspective), with regions labeled "primary focus" and "secondary focus"]

slide10
Pair 1's plan
  • Finding character strokes
  • Estimating warp severity
  • Thresholding

[Figure: picture from Ben and Stephen]
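The slides do not name a thresholding method. Otsu's method is one common global choice, swapped in here for illustration: the sketch assumes an 8-bit grayscale image stored as nested lists, returns the level that maximizes the between-class variance, and then marks pixels at or below that level as ink. The function names are hypothetical.

```python
from typing import List

Gray = List[List[int]]  # 8-bit grayscale image, values 0..255


def otsu_threshold(gray: Gray) -> int:
    """Return the level t that maximizes between-class variance (Otsu's method)."""
    hist = [0] * 256
    for row in gray:
        for v in row:
            hist[v] += 1
    total = sum(hist)
    sum_all = sum(i * n for i, n in enumerate(hist))
    w_b = 0    # pixel count of the dark class (levels <= t)
    sum_b = 0  # intensity sum of the dark class
    best_t, best_var = 0, -1.0
    for t in range(256):
        w_b += hist[t]
        sum_b += t * hist[t]
        w_f = total - w_b  # pixel count of the light class
        if w_b == 0 or w_f == 0:
            continue
        mean_b = sum_b / w_b
        mean_f = (sum_all - sum_b) / w_f
        between = w_b * w_f * (mean_b - mean_f) ** 2
        if between > best_var:
            best_var, best_t = between, t
    return best_t


def binarize(gray: Gray, t: int) -> List[List[int]]:
    """1 = ink (pixel at or below t), 0 = background."""
    return [[1 if v <= t else 0 for v in row] for row in gray]
```

A single global threshold can struggle with the uneven lighting typical of camera photos; a locally adaptive threshold is a common alternative.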

slide11
Pair 2's plan
  • Least-squares line fitting
  • Visualizing the processing
  • Finding skew estimates
  • Two-tier assessment:
    1) reasonable?
    2) OCR accuracy

[Figure: picture from Aaron & Adam]
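One way to combine least-squares line fitting with skew estimation: fit y = m·x + b to points sampled along a text line (for example, the bottom center of each connected component, which is an assumption) and report atan(m) as the skew. The `looks_reasonable` check is a hypothetical stand-in for the first "reasonable?" tier; its 45° and 3-pixel limits are made up for illustration.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def fit_line(points: List[Point]) -> Tuple[float, float]:
    """Ordinary least-squares fit of y = m*x + b; returns (m, b)."""
    n = len(points)
    sx = sum(x for x, _ in points)
    sy = sum(y for _, y in points)
    sxx = sum(x * x for x, _ in points)
    sxy = sum(x * y for x, y in points)
    denom = n * sxx - sx * sx
    if denom == 0:
        raise ValueError("need at least two distinct x values")
    m = (n * sxy - sx * sy) / denom
    b = (sy - m * sx) / n
    return m, b


def skew_degrees(points: List[Point]) -> float:
    """Skew of the fitted line in degrees (y pointing down, as in image coordinates)."""
    m, _ = fit_line(points)
    return math.degrees(math.atan(m))


def looks_reasonable(points: List[Point], max_skew: float = 45.0, max_rms: float = 3.0) -> bool:
    """Hypothetical first-tier check: modest angle and points lying close to the fitted line."""
    m, b = fit_line(points)
    rms = math.sqrt(sum((y - (m * x + b)) ** 2 for x, y in points) / len(points))
    return abs(math.degrees(math.atan(m))) <= max_skew and rms <= max_rms
```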

slide12
Tentative Schedule
  • Th 9/21 (11:30 am): call, progress update
  • T 9/26: initial presentation @ Harvey Mudd
  • Th 9/28: prototype of each algorithm
  • F 10/6?: site visit and presentation @ Laserfiche
  • Weekly conference calls with Ed Heaney
  • Accessible codebase and performance updates
  • Other deliverables?

slide15
[Figure: chart of document type (Hand Writing, Magazines, Forms, Plain Text) against image warping (Skew, Perspective, Geometric)]

slide16
Taxonomy

[Figure: chart of Hand-writing, Magazines/Newspaper, Forms, and Mostly text documents against Geometric, Skew, and Perspective distortion]
