automatically identifying records from the extracted data fields of genealogical microfilm l.
Download
Skip this Video
Loading SlideShow in 5 Seconds..
Automatically Identifying Records from the Extracted Data Fields of Genealogical Microfilm PowerPoint Presentation
Download Presentation
Automatically Identifying Records from the Extracted Data Fields of Genealogical Microfilm

Loading in 2 Seconds...

play fullscreen
1 / 24

Automatically Identifying Records from the Extracted Data Fields of Genealogical Microfilm - PowerPoint PPT Presentation


  • 187 Views
  • Uploaded on

Automatically Identifying Records from the Extracted Data Fields of Genealogical Microfilm. Kenneth Tubbs. Microfilm Image. Input. Table Zones. The coordinates of each table cell The printed text in ASCII for each cell, if any. Whether or not the cell is empty. Table Zones. Identify

loader
I am the owner, or an agent authorized to act on behalf of the owner, of the copyrighted work described.
capcha
Download Presentation

PowerPoint Slideshow about 'Automatically Identifying Records from the Extracted Data Fields of Genealogical Microfilm' - albert


Download Now An Image/Link below is provided (as is) to download presentation

Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author.While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server.


- - - - - - - - - - - - - - - - - - - - - - - - - - E N D - - - - - - - - - - - - - - - - - - - - - - - - - -
Presentation Transcript
automatically identifying records from the extracted data fields of genealogical microfilm

Automatically Identifying Records from the Extracted Data Fields of Genealogical Microfilm

Kenneth Tubbs

input
Input

Table

Zones

  • The coordinates of each table cell
  • The printed text in ASCII for each cell, if any.
  • Whether or not the cell is empty.
algorithm

Table

Zones

Identify

Structure

Record

Patterns

Match

Attributes

Genealogical Ontology

Check

Constraints

Algorithm
identify structure
Identify Structure
  • Identify Table Primitives
  • Aggregate Table Primitives
  • Sort Candidates

Identify

Structure

identify structure6

Name

Identify Structure
  • Identify Table Primitives

Column:

[[table_label width] [table_value width]+] {below}

Identify

Structure

identify structure7

Name

Identify Structure
  • Identify Table Primitives

Row:

[[table_label height] [table_value height]+] {left}

Identify

Structure

identify structure8

Row Primitive

Column Primitive

Identify Structure
  • Identify Table Primitives

Printed Text

Hand-written Text

Identify

Structure

identify structure9
Identify Structure

2. Identify Table Primitives

  • Probabilistic Rules are associated with each
  • primitive type.
  • Examples
  • Column primitives should be factored left to right. (.9)
  • Row primitives factor the Column primitives below them. (.7)

Identify

Structure

identify structure10

A

B

C

D

E

F

G

H

I

J

K

L

Identify Structure

2. Aggregate Table Primitives

Identify

Structure

identify structure11

G

H

I

J

K

L

Identify Structure

2. Aggregate Table Primitives

[G H I J K L] or

[G] [ H I J K L] or

[K] [G H I J L] or

[G] [H I J [K][L]] or

Others

Identify

Structure

identify structure12
Identify Structure

2. Sort Candidates

  • The candidates are evaluated based on:
  • The confidence of the table primitive matches.
  • The probability the the rules used are correct.

Identify

Structure

identify structure13
Identify Structure

2. Sort Candidates

  • [G] [ H I J K L]
  • [G H I J K L]
  • [G] [H I J [K][L]]
  • [K] [G H I J L]
  • Others

Identify

Structure

match attributes
Match Attributes
  • Identify Possible Mappings
  • Sort Candidates

Match

Attributes

match attributes15

Name

Name

Sex

Gender

Female Age

Female, Age

Genealogical Ontology

Match Attributes
  • Identify Possible Mappings

Mapping types

Printed Text

  • Identical Matches
  • Synonym Matches
  • Composite Matches
  • Human-Aided Matches

Match

Attributes

match attributes16
Match Attributes

2. Sort Candidates

  • The candidates are evaluated based on:
  • The type of the match.
  • The confidence of the match.

Match

Attributes

check constraints
Check Constraints
  • Identify the individual records
  • Evaluate the records with the Genealogical Ontology.

Check

Constraints

check constraints18
Check Constraints

Table (Address , Age) = 4.1

Address

1

1

1

4.1

3.9

4.2

Name

Age

Gender

Check

Constraints

check constraints19
Check Constraints

Ontology (Address, Age) = 1.5 * 4.3 * .9 = 5.805

Age

Name

Gender

5

1.1

10

1.1

.9

1.1

1.5

1.3

4.3

1.3

Address

Family

Person

Check

Constraints

check constraints20
Check Constraints

Constraint_Score =

1 2

(1\(2n)) *  | Ontology(i, j) – Table(i,j) |2

  • The variables “i” and “j” are attributes.
  • The sum is over all combinations of “i” and “j”.
  • The variable “n” is number of attributes.

Check

Constraints

check constraints21
Check Constraints

The algorithm sorts the candidates by their constraint score.

The algorithm creates rules to prevent the factoring of the attributes the receive low constraint scores.

Check

Constraints

algorithm22

Table

Zones

Identify

Structure

Record

Patterns

Match

Attributes

Genealogical Ontology

Check

Constraints

Algorithm
final remarks
Final Remarks
  • The algorithm produces:
  • Record Patterns
    • Attributes for each record
    • Geometry for each record
  • 2. Attribute mappings from the table to the ontology.
final remarks24
Final Remarks
  • Given extracted values for the information written by hand,
  • the process can extract the records into an XML file.
  • Individuals can then query the XML files and index
  • back into the original microfilm images.