Extracting Names Using Layout Clues in Genealogical Books




Aaron Stewart

David W. Embley

March 20, 2010

Finding Names
  • Name recognition in genealogical texts
  • Focus: Lists, Directories
Finding Names

Which side was easier?

It’s easy for us to spot names… But how does a computer do it?

Finding Names
  • Natural Language Processing
  • Stanford Named Entity Recognizer
  • Apache UIMA Framework
  • MEMM (maximum-entropy Markov model)
  • CRF (conditional random field)

BYU OntoES Ontology Extraction System
  • Dictionary
  • Regular Expressions
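
A dictionary-plus-regular-expression baseline of the kind listed above might look roughly like the following. This is a minimal sketch, not OntoES itself: the word lists, the pattern, and the function name `extract_names` are all hypothetical.

```python
import re

# Hypothetical mini-dictionaries; a real system would load far larger lists.
GIVEN_NAMES = {"John", "Mary", "William", "Sarah"}
SURNAMES = {"Smith", "Stewart", "Embley"}

# Capitalized word, optional middle initial, capitalized word.
NAME_PATTERN = re.compile(r"\b([A-Z][a-z]+)(?: [A-Z]\.)? ([A-Z][a-z]+)\b")


def extract_names(text):
    """Return pattern matches in which at least one word is in a dictionary."""
    names = []
    for match in NAME_PATTERN.finditer(text):
        first, last = match.group(1), match.group(2)
        # The regex proposes candidates; the dictionaries filter them.
        if first in GIVEN_NAMES or last in SURNAMES:
            names.append(match.group(0))
    return names


print(extract_names("Born to John A. Smith and Mary Jones in Salem."))
```

The regular expression over-generates (it would also match "New York"), so the dictionary lookup acts as a filter. Names missing from both dictionaries are lost, which is one reason a baseline like this can have low recall.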
Ancestry.com Data
  • Word text
  • Word bounding boxes
  • Genres:
    • Genealogical Books
    • City Directories
    • Yearbooks
    • Newspapers
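
To make the "word text plus bounding box" input concrete, here is a small sketch of how such OCR output could be represented and grouped into text lines. The field names, the pixel tolerance, and the grouping rule are assumptions for illustration; the actual Ancestry.com data format is not described in the slides.

```python
from dataclasses import dataclass


@dataclass
class Word:
    """One OCR word with its bounding box in page pixels (hypothetical schema)."""
    text: str
    left: int
    top: int
    right: int
    bottom: int


def group_into_lines(words, tolerance=5):
    """Group words into lines when their top edges are within `tolerance` pixels."""
    lines = []
    for word in sorted(words, key=lambda w: (w.top, w.left)):
        # Compare against the first word of the current line.
        if lines and abs(lines[-1][0].top - word.top) <= tolerance:
            lines[-1].append(word)
        else:
            lines.append([word])
    # Order the words of each line from left to right.
    return [sorted(line, key=lambda w: w.left) for line in lines]


page = [
    Word("John", 70, 102, 110, 116),
    Word("Smith,", 10, 100, 60, 115),
    Word("Mary", 30, 130, 70, 145),
]
for line in group_into_lines(page):
    print(" ".join(w.text for w in line))
```

Once words are grouped into lines, the left edge of the first word in each line gives the line's starting margin, which is the layout clue the later steps rely on.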
Margin Finder – Future Work
  • ABBYY FineReader handles –
    • Paragraphs
    • Newspaper columns
  • But has trouble with –
    • Hanging indents
    • Outline indentation (possibly)
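
One simple way to detect hanging indents from line positions is to cluster the left edges of the lines into a small set of margins and number them. The sketch below assumes that approach; it is not the authors' margin finder, and the function name and the 8-pixel tolerance are hypothetical.

```python
def indentation_levels(line_lefts, tolerance=8):
    """Map each line's left x-coordinate to a 1-based indentation level.

    Left edges within `tolerance` pixels of an already-found margin are
    treated as belonging to that margin.
    """
    margins = []
    for x in sorted(line_lefts):
        # Start a new margin only when x is clearly right of the last one.
        if not margins or x - margins[-1] > tolerance:
            margins.append(x)
    levels = []
    for x in line_lefts:
        # Assign each line to its nearest margin.
        nearest = min(range(len(margins)), key=lambda i: abs(margins[i] - x))
        levels.append(nearest + 1)
    return levels


# Entry line, its indented continuation, then two more entries.
print(indentation_levels([50, 90, 52, 49, 91]))  # [1, 2, 1, 1, 2]
```

With a hanging indent, the first line of each entry falls on level 1 and its continuation lines on level 2, so a level-1 marker signals the start of a new entry.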
Pattern Finding
  • Apply baseline name extractor (OntoES)
  • Apply margin finder and insert markers
  • Find left and right context for each name
  • Apply common context patterns to extract more names
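
The steps above can be sketched on a token stream in which margin markers appear as ordinary tokens. Everything here is illustrative: the marker token `<L1>`, the span representation, and both function names are assumptions, and the real system may define contexts differently.

```python
from collections import Counter


def collect_contexts(tokens, seed_spans):
    """Count the (left token, right token) pair around each seed name span.

    Spans are (start, end) token indices with `end` exclusive.
    """
    contexts = Counter()
    for start, end in seed_spans:
        left = tokens[start - 1] if start > 0 else "<START>"
        right = tokens[end] if end < len(tokens) else "<END>"
        contexts[(left, right)] += 1
    return contexts


def apply_context(tokens, left, right, max_len=4):
    """Return spans of 1 to `max_len` tokens enclosed by `left` ... `right`."""
    spans = []
    for i, tok in enumerate(tokens):
        if tok != left:
            continue
        for j in range(i + 2, min(i + 2 + max_len, len(tokens))):
            if tokens[j] == right:
                spans.append((i + 1, j))
                break
    return spans


tokens = ["<L1>", "Smith", ",", "John", ";", "farmer",
          "<L1>", "Jones", ",", "Mary", ";", "teacher",
          "<L1>", "Brown", ",", "Asa", ";", "clerk"]
seeds = [(1, 4), (7, 10)]  # names the baseline extractor found

(left, right), _count = collect_contexts(tokens, seeds).most_common(1)[0]
for start, end in apply_context(tokens, left, right):
    print(" ".join(tokens[start:end]))
```

In this toy page the baseline misses "Brown , Asa", but the most common context (a level-1 marker on the left, a semicolon on the right) recovers it. The same mechanism can also pull in non-names wherever the context is ambiguous, so a pattern may hurt precision on pages with a different layout.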
Pattern Finding

1. Apply baseline name extractor (OntoES)

Pattern Finding

2. Apply margin finder and insert markers

[Figure: sample page with LEVEL 1 and LEVEL 2 margin markers inserted]

Pattern Finding

3. Find left and right context for each name

[Figure: sample page with LEVEL 1 and LEVEL 2 margin markers]

Pattern Finding

4. Apply common context patterns to extract more names

[Figure: sample page with LEVEL 1 and LEVEL 2 margin markers]

Pattern Finding – Sample Results

Baseline Results

  • Precision: 40%
  • Recall: 31.25%
  • F1: 35.09%

Results of Most Salient Pattern

  • Precision: 51.52%
  • Recall: 53.12%
  • F1: 52.31%

Not all results are this good!
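
As a quick check on the figures above, F1 is the harmonic mean of precision and recall. The helper below is a generic sketch (the function name is our own) that reproduces both reported F1 values from the reported precision and recall.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (result is in the input's units)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


print(round(f1_score(40.0, 31.25), 2))   # baseline: 35.09
print(round(f1_score(51.52, 53.12), 2))  # most salient pattern: 52.31
```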

Challenges
  • Evaluation
    • More aligned data
    • Annotation tool
  • Other books
    • Centered and right-aligned text
    • Knowing when to apply patterns