A System for Understanding Imaged Infographics and Its Applications

Weihua Huang, Chew Lim Tan

School of Computing

National University of Singapore

Outline
  • Introduction
  • Syntactic and semantic information in scientific charts
  • Chart recognition
  • Chart interpretation
  • Applications
  • Experiment results
  • Conclusion
Introduction
  • Information graphics (infographics) are frequently used in various kinds of documents.
  • Recognition and interpretation of infographics are important for automatic document processing and information retrieval.
    • Recognition task: what are the elements/components in an infographic?
    • Interpretation task: what does an infographic try to tell?
  • This paper focuses on one type of infographic: scientific charts.
Introduction
  • Imaged infographics are harder to recognize and interpret because everything is in pixels!


Scientific Charts
  • Syntactic elements (labels from the figure):
    • Chart title
    • Origin
    • X-axis title, X-axis label, X-axis ticks, X-axis end
    • Y-axis label, Y-axis unit, Y-axis ticks, Y-axis end
    • Data components
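Scientific Charts
  • Semantic information (labels from the figure): tabular data, its graphical representation, and the intended message (comparison, trend, distribution, etc.)
  • Recognition and interpretation are the reverse process: starting from the graphical representation, they recover the tabular data and the intended message.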

Chart Recognition
  • Preprocessing
    • Text/graphics separation: connected component analysis splits the original image into text components and a graphical image
    • Edge detection: the Canny edge detector turns the graphical image into an edge map
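The text/graphics separation step can be illustrated with a minimal sketch. It labels 8-connected foreground pixels in a tiny binary image and then splits the components by bounding-box size. The threshold `max_text_size`, the function names, and the toy image are assumptions made for illustration; the slides only state that connected component analysis is used, not which criterion separates text from graphics.

```python
from collections import deque


def connected_components(image):
    """Return the 8-connected components of foreground (1) pixels as pixel lists."""
    rows, cols = len(image), len(image[0])
    seen = [[False] * cols for _ in range(rows)]
    components = []
    for r in range(rows):
        for c in range(cols):
            if image[r][c] != 1 or seen[r][c]:
                continue
            # Breadth-first flood fill starting from an unvisited foreground pixel.
            queue, pixels = deque([(r, c)]), []
            seen[r][c] = True
            while queue:
                y, x = queue.popleft()
                pixels.append((y, x))
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < rows and 0 <= nx < cols
                                and image[ny][nx] == 1 and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            components.append(pixels)
    return components


def separate_text_graphics(image, max_text_size=3):
    """Split components by bounding-box size (hypothetical criterion)."""
    text, graphics = [], []
    for pixels in connected_components(image):
        ys = [y for y, _ in pixels]
        xs = [x for _, x in pixels]
        height = max(ys) - min(ys) + 1
        width = max(xs) - min(xs) + 1
        # Small components are treated as characters, large ones as graphics.
        if max(height, width) <= max_text_size:
            text.append(pixels)
        else:
            graphics.append(pixels)
    return text, graphics


# Toy image: an L-shaped pair of axes plus one small character-like blob.
IMAGE = [
    [1, 0, 0, 0, 0, 0, 0, 0],
    [1, 0, 0, 1, 1, 0, 0, 0],
    [1, 0, 0, 1, 1, 0, 0, 0],
    [1, 0, 0, 0, 0, 0, 0, 0],
    [1, 1, 1, 1, 1, 1, 1, 1],
]
text, graphics = separate_text_graphics(IMAGE)
print(len(text), "text component(s),", len(graphics), "graphical component(s)")
```

A real system would likely use more cues than size alone (aspect ratio, density, alignment with neighbouring components); the single threshold here only keeps the example short.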
Chart Recognition
  • Graphical symbol construction
    • Vectorization (from the figure)
      • Edge map, processed with DSCC: straight segments
      • Ellipse fitting: circular arcs, elliptic arcs
    • Detection of coordinate lines
      • Geometric constraint between candidate lines
      • Coverage of other lines in the candidate plot area
      • Attachment of text blocks

Chart Recognition
  • Graphical symbol construction (cont.)
    • Construction of data components
      • Bottom-up process with the vectorized edges and intersections
      • Model-based parsing rules using the domain knowledge
      • Example:
        BarChart = {x-axis, y-axis, BarSet}, where
        BarSet = {Bar}, where the number of elements ≥ 2, and
        Bar = {l1, l2, l3 | l1 ⊥ l3, l2 ⊥ l3, l3 || x-axis, CE(l1, l3), CE(l2, l3), EL(l1, x-axis), EL(l2, x-axis)}
      • Constraints:
        a || b: line a is parallel to line b.
        a ⊥ b: line a is perpendicular to line b.
        CE(a, b): shapes a and b share one common endpoint.
        EL(a, b): one endpoint of shape a lies on shape b.
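The Bar rule can be checked mechanically once line segments are available. The following is a rough sketch, not the authors' parser: the tolerances (`tol_deg`, `tol`) and all function names are hypothetical, and a segment is simply a pair of (x, y) endpoints. The usage lines take their coordinates from the Bar1 and X-axis values listed in the sample result of these slides.

```python
import math


def angle_between(a, b):
    """Unsigned angle in degrees (0 to 90) between the lines carrying a and b."""
    ax, ay = a[1][0] - a[0][0], a[1][1] - a[0][1]
    bx, by = b[1][0] - b[0][0], b[1][1] - b[0][1]
    cos = abs(ax * bx + ay * by) / (math.hypot(ax, ay) * math.hypot(bx, by))
    return math.degrees(math.acos(min(1.0, cos)))


def parallel(a, b, tol_deg=5.0):
    return angle_between(a, b) <= tol_deg


def perpendicular(a, b, tol_deg=5.0):
    return angle_between(a, b) >= 90.0 - tol_deg


def ce(a, b, tol=3.0):
    """CE(a, b): a and b share one common endpoint (within tol pixels)."""
    return any(math.dist(p, q) <= tol for p in a for q in b)


def point_segment_distance(p, seg):
    (x1, y1), (x2, y2) = seg
    dx, dy = x2 - x1, y2 - y1
    # Project p onto the segment and clamp the projection to its endpoints.
    t = ((p[0] - x1) * dx + (p[1] - y1) * dy) / (dx * dx + dy * dy)
    t = max(0.0, min(1.0, t))
    return math.dist(p, (x1 + t * dx, y1 + t * dy))


def el(a, b, tol=3.0):
    """EL(a, b): one endpoint of a lies on b (within tol pixels)."""
    return any(point_segment_distance(p, b) <= tol for p in a)


def is_bar(l1, l2, l3, x_axis):
    """Check the Bar constraints: two sides l1, l2 and a top l3."""
    return (perpendicular(l1, l3) and perpendicular(l2, l3)
            and parallel(l3, x_axis)
            and ce(l1, l3) and ce(l2, l3)
            and el(l1, x_axis) and el(l2, x_axis))


# Coordinates of Bar1 and the X axis from the sample result in the slides.
x_axis = ((239, 304), (994, 290))
left = ((281, 302), (281, 249))
top = ((281, 249), (345, 248))
right = ((345, 248), (346, 301))
print(is_bar(left, right, top, x_axis))
```

Tolerances are needed because vectorized segments from a scanned image are rarely exactly perpendicular or exactly touching; the values chosen here are arbitrary.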

Chart Recognition
  • Text grouping
    • Yuan’s method is used to group connected components
  • Text recognition
    • Omnipage Scansoft Capture SDK 12.0
    • Errors are manually corrected.
Chart Recognition
  • Sample result:
    • Type: bar chart
    • Bars (drawn in green):
      Bar1: (281,249), (345,248), (346,301), (281,302)
      Bar2: (430,109), (494,108), (499,298), (435,299)
      Bar3: (581,134), (645,132), (648,296), (585,298)
      ……
    • Axes (drawn in red):
      X: (239,304) to (994,290)
      Y: (239,304) to (236,100)

Chart Interpretation
  • Associating text with graphics
    • Assign syntactic role to each text block
    • Label graphical symbols using the text blocks
    • 11 roles of text in scientific charts are identified
    • The problem is modeled as classification of text blocks
Chart Interpretation
  • Associating text with graphics (cont.)
    • To train the classifier and to classify a new text block, the following features are defined:
      • Distance to the nearest graphical symbol
      • Type of the nearest graphical symbol
      • Relative position of the text block and the graphical symbol
      • Type of the text string itself
      • Centricity of a text block
    • The C4.5 learning algorithm is used to build the decision tree.
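C4.5 chooses the attribute to split on by gain ratio, that is, the information gain of the attribute divided by its split information. The sketch below computes that criterion for categorical features only (C4.5 also thresholds numeric features such as a distance, which is left out here). The four training rows, the feature names, and the two roles are toy values invented for illustration and are not data from the paper.

```python
import math
from collections import Counter


def entropy(values):
    """Shannon entropy (in bits) of a list of categorical values."""
    total = len(values)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(values).values())


def gain_ratio(rows, feature, target="role"):
    """Information gain of `feature` divided by its split information."""
    labels = [row[target] for row in rows]
    groups = {}
    for row in rows:
        groups.setdefault(row[feature], []).append(row[target])
    # Expected entropy of the target after splitting on the feature.
    remainder = sum(len(g) / len(rows) * entropy(g) for g in groups.values())
    gain = entropy(labels) - remainder
    split_info = entropy([row[feature] for row in rows])
    return gain / split_info if split_info > 0 else 0.0


# Toy text blocks (hypothetical values, not from the paper).
ROWS = [
    {"nearest_symbol": "x-axis", "string_type": "numeric", "role": "X-axis label"},
    {"nearest_symbol": "x-axis", "string_type": "word", "role": "X-axis label"},
    {"nearest_symbol": "y-axis", "string_type": "numeric", "role": "Y-axis label"},
    {"nearest_symbol": "y-axis", "string_type": "word", "role": "Y-axis label"},
]

for name in ("nearest_symbol", "string_type"):
    print(name, gain_ratio(ROWS, name))
```

On this toy data the type of the nearest graphical symbol separates the two roles perfectly, so a tree would split on it first, while the string type carries no information.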
Chart Interpretation
  • Obtaining the tabular data
    • Assign a label to each data entry if its label is not directly presented.
      D1: distance to the nearest label on the left (L1).
      D2: distance to the nearest label on the right (L2).
      If (D1 < D2) label = L1
      Else if (D1 > D2) label = L2
      Else label = L1 + L2
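The D1/D2 rule translates almost directly into code. This is a small sketch under stated assumptions: labels are given as (x position, text) pairs, distances are measured along the x-axis only, "L1 + L2" is rendered as the two label strings joined by a space, and an entry with labels on only one side takes the nearest label on that side. None of these details are specified in the slides, and the function name is hypothetical.

```python
def assign_label(entry_x, labels):
    """Pick a label for a data entry at horizontal position entry_x.

    labels is a list of (x_position, text) pairs.
    """
    left = [(entry_x - x, text) for x, text in labels if x < entry_x]
    right = [(x - entry_x, text) for x, text in labels if x >= entry_x]
    if not left and not right:
        return None
    if not right:
        return min(left)[1]
    if not left:
        return min(right)[1]
    d1, l1 = min(left)    # nearest label on the left
    d2, l2 = min(right)   # nearest label on the right
    if d1 < d2:
        return l1
    if d1 > d2:
        return l2
    return f"{l1} {l2}"   # equidistant: combine both labels


labels = [(100, "1983"), (200, "1984")]
print(assign_label(120, labels))
print(assign_label(190, labels))
print(assign_label(150, labels))
```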

Chart Interpretation
  • Obtaining the tabular data (cont.)
    • Calculate the value for each data entry if its value is not directly presented.
      H1: data height
      H2: unit height
      Value per unit height: 30 (in the example figure)
      Data value: H1 * 30 / H2
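The value formula is a simple proportion between pixel heights, as the short sketch below shows. The function name is hypothetical. H1 is taken from the left edge of Bar1 in the sample result, (281,249) to (281,302); the unit height of 20 pixels is an assumed value, since the slides do not give the tick spacing.

```python
def data_value(data_height, unit_height, value_per_unit):
    """Convert a pixel height to a data value: H1 * value_per_unit / H2."""
    if unit_height <= 0:
        raise ValueError("unit height must be positive")
    return data_height * value_per_unit / unit_height


bar_height = 302 - 249   # H1: pixel height of Bar1 in the sample result
unit_height = 20         # H2: hypothetical pixel distance between two ticks
print(data_value(bar_height, unit_height, 30))
```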

Chart Interpretation
  • Generating chart description
    • XML format description
      • Keeping data in the tabular form
      • Good for querying on data value or label
    • Natural language description
      • Fact-based sentences generated from templates
      • Good for factoid questions
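Both description formats can be produced from the same table of (label, value) pairs. The sketch below is only an illustration: the XML element and attribute names, the sentence template, and the sample values are all invented here, because the slides do not show the actual schema or templates used by the system.

```python
import xml.etree.ElementTree as ET


def chart_to_xml(chart_type, x_title, y_title, entries):
    """Serialize tabular chart data to an XML string (hypothetical schema)."""
    chart = ET.Element("chart", type=chart_type)
    ET.SubElement(chart, "x-axis", title=x_title)
    ET.SubElement(chart, "y-axis", title=y_title)
    data = ET.SubElement(chart, "data")
    for label, value in entries:
        ET.SubElement(data, "entry", label=str(label), value=str(value))
    return ET.tostring(chart, encoding="unicode")


def chart_to_sentences(x_title, y_title, entries):
    """Generate one fact-based sentence per data entry from a fixed template."""
    template = "The value of {y} for {x} {label} is {value}."
    return [template.format(y=y_title, x=x_title, label=label, value=value)
            for label, value in entries]


# Made-up values for illustration only.
entries = [("1983", 90), ("1984", 120)]
xml_text = chart_to_xml("bar chart", "year", "fatalities", entries)
sentences = chart_to_sentences("year", "fatalities", entries)
print(xml_text)
print(sentences[1])
```

Keeping the data as attributes of `entry` elements makes it easy to query by label or by value, while the generated sentences can be appended to the document text for a text-based question answering system.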
Applications
  • Enriching OCR output
    • Traditional OCR output: text (electronic) + figures (image format)
    • The information in figures is not extracted
    • The proposed system helps to extract more information
    • The tabular data obtained can be used to reproduce the document in machine-readable form.

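Applications
  • Enriching OCR output (cont.)
    • Approach (from the figure):
      • Segmentation splits the scanned document into imaged text, imaged infographics, and layout information
      • OCR converts the imaged text into electronic text
      • The proposed system converts each imaged infographic into an XML description
      • The electronic text, XML descriptions, and layout information are used for document reproduction
    • Question: where to insert the infographics?
      Clue: look for the figure number in the text.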
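Looking for the figure number can be done with a regular expression over the recognized text. The following sketch records, for each figure number, the index of the first paragraph that mentions it, which is one plausible insertion point. The pattern, the function name, and the sample paragraphs are assumptions for illustration; the slides do not describe how the system matches figure references.

```python
import re

# Matches "Figure 3", "Fig. 3", "fig 3", and similar (hypothetical pattern).
FIGURE_REF = re.compile(r"\b(?:Figure|Fig\.?)\s*(\d+)", re.IGNORECASE)


def find_insertion_points(paragraphs):
    """Map each figure number to the index of the first paragraph citing it."""
    first_mention = {}
    for index, paragraph in enumerate(paragraphs):
        for match in FIGURE_REF.finditer(paragraph):
            first_mention.setdefault(int(match.group(1)), index)
    return first_mention


paragraphs = [
    "Road safety has been studied for decades.",
    "As shown in Fig. 2, the number of fatalities changed over the years.",
    "Figure 1 gives an overview; see also figure 2.",
]
print(find_insertion_points(paragraphs))
```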

Applications
  • Assisting QA systems
    • Question type 1: factoid question
    • Example: “How many fatalities were there in the year 1984?”
    • Solution: add the NL description of the infographics to the original text
    • Question parsing and answer extraction: Cui et al.’s method based on soft pattern matching
Applications
  • Assisting QA systems (cont.)
    • Question type 2: query-like question
    • Example: “What is the maximum number of fatalities among all years?”
    • Solution: translate the question into one of the pre-defined queries
    • Question translation: semantic parser proposed by Mooney et al.
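The idea of pre-defined queries over the tabular data can be sketched as follows. Note that this example replaces the semantic parser with a naive keyword lookup, purely to keep it short; the query names, the keyword table, and the data values are all hypothetical and do not come from the paper.

```python
def highest(table):
    """Return the (label, value) pair with the largest value."""
    return max(table.items(), key=lambda item: item[1])


def lowest(table):
    """Return the (label, value) pair with the smallest value."""
    return min(table.items(), key=lambda item: item[1])


def total(table):
    """Return the sum of all values."""
    return sum(table.values())


# Keyword to pre-defined query (a stand-in for real question translation).
PREDEFINED_QUERIES = {"maximum": highest, "minimum": lowest, "total": total}


def answer(question, table):
    """Run the first pre-defined query whose keyword appears in the question."""
    words = question.lower().split()
    for keyword, query in PREDEFINED_QUERIES.items():
        if keyword in words:
            return query(table)
    return None


# Made-up tabular data for illustration only.
table = {"1983": 90, "1984": 120, "1985": 75}
print(answer("What is the maximum number of fatalities among all years?", table))
```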
Experiment Results
  • Chart recognition and classification: using a collection of 200 scientific chart images
Experiment Results
  • Text block classification: using a collection of 200 scientific chart images
Experiment Results
  • Question answering: using 10 scanned document pages from the UW database I
Conclusion
  • A system for recognizing and interpreting imaged infographics is introduced.
  • The current focus is on scientific charts, a commonly used type of infographic.
  • The system can be generalized to handle a wider variety of infographics.
  • The system can be enhanced to handle more complex layouts, special effects, etc.

Thank you!

Questions?