Letters from Descartes in digital format



An exercise in conversion

Dirk Roorda

@ eHumanities 2012-01-26

Overview
  • the task
  • the method
  • the lessons
  • the result
    • demo
The Task: converting from ...

JapAM

Descartes Correspondence

ca. 700 letters

69,237 lines

600 formulas

4.2 MB (without the 311 pictures)

The task: converting to ...

CKCC corpus Descartes

XML : Text Encoding Initiative (TEI)

~ 35,000 elements, of which

7,200 metadata

7,700 paragraphs

6,200 formulas

6,000 text-formattings

4,200 structure

2,900 page-breaks

538 images

The (re)Sources

EJB Metadata

EJB's head

Google Books

The method

observation

non-algorithmic changes

consolidation

proofs

Observation

Use digital equipment:
  • your text editor
  • your scripting language
  • your regular expressions

observation: italic scopes

replace

=(.*?)$

by

<italic>match1</italic>

???

Aargh!#@\€]
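
The slide does not say what went wrong, so the following is only a guess at the kind of surprise involved. It is a minimal Python sketch (the original conversion was written in Perl, and the sample line is invented), assuming that `=` opens an italic scope. Because the pattern ends in `$`, the lazy `(.*?)` is forced to run to the end of the line, so a scope that should close earlier swallows the rest of the line.

```python
import re

# Hypothetical sample line: '=' is assumed to open an italic scope
# that should cover only the word 'Meditationes'.
line = "He cites =Meditationes and then continues in roman."

# The replacement from the slide, with Python's \1 standing in for 'match1'.
naive = re.sub(r"=(.*?)$", r"<italic>\1</italic>", line, flags=re.MULTILINE)
print(naive)
```

The pattern alone cannot know where the scope ends; that information has to come from the source conventions or from an expert.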

consolidating: metadata

conversion process

metadata combining

The anatomy of conversion

convert.pl

100 KB of program code text

=

25 densely typed pages

=

3427 lines

of which

2175 real code lines

Code/Input = 1/32 (2175 real code lines against 69,237 input lines)
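
The ratio can be checked from the figures on the slides. A small Python sketch, using only the numbers given above (the variable names are made up for illustration):

```python
input_lines = 69_237   # lines in the source, from the task slide
total_lines = 3_427    # all lines in convert.pl
code_lines = 2_175     # real code lines in convert.pl

# How many input lines each real code line has to deal with, on average.
ratio = input_lines / code_lines
# Share of the program text that is real code.
code_share = code_lines / total_lines

print(f"input lines per code line: {ratio:.1f}")
print(f"share of real code lines: {code_share:.0%}")
```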

Statistics

1/3 of the tasks need 2/3 of the code

share of code (number of tasks in parentheses)

formulas: (2) 37 %

headers, openers, closers: (3) 16 %

meta and images: (3) 11 %

share of run time of the same tasks

formulas: (2) 29 %

headers, openers, closers: (3) 6 %

meta and images: (3) 10 %

total run time: (25) 40 sec
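
The headline claim can be checked against the three rows. A small Python sketch, reading the parenthesised numbers as task counts (an assumption; the slide does not label them):

```python
# name: (tasks, % of code, % of run time), copied from the slide.
# Reading the parenthesised numbers as task counts is an assumption.
groups = {
    "formulas": (2, 37, 29),
    "headers, openers, closers": (3, 16, 6),
    "meta and images": (3, 11, 10),
}
total_tasks = 25

tasks = sum(g[0] for g in groups.values())
code_pct = sum(g[1] for g in groups.values())
run_pct = sum(g[2] for g in groups.values())

print(f"{tasks} of {total_tasks} tasks ({tasks / total_tasks:.0%})")
print(f"{code_pct} % of the code, {run_pct} % of the run time")
```

Under that reading, roughly a third of the tasks take about two thirds of the code but less than half of the run time.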

The tricks of conversion
  1. Unicode is your friend
  2. Split into many subtasks
  3. task = configuration + workflow
  4. Count and check
  5. Performance matters
  6. Do not give up automation
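
"Count and check" could look roughly like this. A minimal Python sketch, not the check used in convert.pl: the `[[...]]` formula marker in the source is hypothetical, and `count_and_check` is an invented helper name. `<formula>` is the TEI element for formulas.

```python
import re

def count_and_check(source: str, output: str) -> dict:
    """Compare how many formula markers went in with how many elements came out."""
    # Hypothetical source convention: a formula is written as [[...]].
    n_in = len(re.findall(r"\[\[.*?\]\]", source))
    # Counting opening <formula> tags is enough for a sanity check.
    n_out = len(re.findall(r"<formula\b", output))
    return {"in": n_in, "out": n_out, "ok": n_in == n_out}

source = "Let [[x^2]] and [[y^2]] be given."
output = "Let <formula>x^2</formula> and <formula>y^2</formula> be given."
report = count_and_check(source, output)
print(report)
```

A mismatch between the two counts points at a subtask that silently dropped or duplicated something.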
2. Split into many subtasks

(2a) that can be run separately

(2b) that can be reordered easily
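
One way to get subtasks that can be run separately and reordered easily is to keep them as named functions and to put the order in a plain list. The Python sketch below is an illustration under assumptions, not the structure of convert.pl: the two subtasks, the `|` page-break marker, and the names `TASKS`, `WORKFLOW` and `run` are all invented. `<pb/>` is the TEI page-break element.

```python
from typing import Callable, Dict, List

Task = Callable[[str], str]

def normalise_spaces(text: str) -> str:
    # Collapse runs of whitespace into single spaces.
    return " ".join(text.split())

def mark_page_breaks(text: str) -> str:
    # Hypothetical source convention: '|' marks a page break.
    return text.replace("|", "<pb/>")

# Every subtask is registered under a name so it can be run on its own.
TASKS: Dict[str, Task] = {
    "normalise_spaces": normalise_spaces,
    "mark_page_breaks": mark_page_breaks,
}

# Configuration: which subtasks run, and in which order.
WORKFLOW: List[str] = ["normalise_spaces", "mark_page_breaks"]

def run(text: str, workflow: List[str]) -> str:
    # Workflow: apply the configured subtasks one after another.
    for name in workflow:
        text = TASKS[name](text)
    return text

sample = "first  page | second   page"
full = run(sample, WORKFLOW)
single = run(sample, ["mark_page_breaks"])
print(full)
print(single)
```

Reordering or skipping a subtask is then a change to the list, not to the code.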

5. Performance matters!

was 30+ seconds

is now 2.07 seconds

many new subtasks based on same template

(gain = 15 subtasks * 30 sec = 7.5 min per run)

many, many runs before everything is OK

(gain = 100 runs * 7.5 min = 12.5 hours CPU-time)
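
The two gains follow from simple unit conversions. A Python sketch of the arithmetic, taking 30 seconds saved per subtask as the round figure the slide uses and reading 15 as the number of subtasks and 100 as the number of runs (both readings are assumptions):

```python
seconds_saved = 30   # per subtask, the round figure used on the slide
subtasks = 15        # assumed: subtasks built on the same template
runs = 100           # assumed: runs before everything is OK

# 15 * 30 s = 450 s, converted to minutes.
gain_per_run_min = subtasks * seconds_saved / 60
# 100 * 7.5 min = 750 min, converted to hours.
total_gain_hours = runs * gain_per_run_min / 60

print(gain_per_run_min, "min per run")
print(total_gain_hours, "hours in total")
```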

6. Do not give up automation

we used a lot of expert knowledge

which has all been transferred to

  • the source
  • consolidated extra inputs

so the conversion is still repeatable and modifiable
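
Keeping expert decisions in a consolidated input, not in hand-edited output, might look like the Python sketch below. Everything in it is hypothetical: the CSV layout, the letter ids, the corrections themselves, and the function names. The point is only that the conversion reads the decisions as data on every run.

```python
import csv
import io

# Hypothetical corrections table: one expert decision per row.
CORRECTIONS_CSV = """letter_id,old,new
1001,Mersene,Mersenne
1002,Huygen,Huygens
"""

def load_corrections(handle):
    """Group the (old, new) pairs by letter id."""
    table = {}
    for row in csv.DictReader(handle):
        table.setdefault(row["letter_id"], []).append((row["old"], row["new"]))
    return table

def apply_corrections(letter_id, text, corrections):
    """Apply only the corrections recorded for this letter."""
    for old, new in corrections.get(letter_id, []):
        text = text.replace(old, new)
    return text

corrections = load_corrections(io.StringIO(CORRECTIONS_CSV))
fixed = apply_corrections("1001", "A Mersene, 1630", corrections)
print(fixed)
```

Changing a decision then means editing one row and rerunning the conversion.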

Thank You

conversion program
