Letters from Descartes in digital format

Letters from Descartes in digital format. An exercise in conversion. Dirk Roorda @ eHumanities, 2012-01-26. Overview: the task, the method, the lessons, the result, demo. The Task: converting from ... JapAM Descartes Correspondence: ca. 700 letters, 69,237 lines, 600 formulas

Presentation Transcript


  1. Letters from Descartes in digital format An exercise in conversion Dirk Roorda @ eHumanities 2012-01-26

  2. overview • the task • the method • the lessons • the result • demo

  3. The Task: converting from ... JapAM Descartes Correspondence: ca. 700 letters, 69,237 lines, 600 formulas, 4.2 MB (without the 311 pictures)

  4. The task: converting to ... CKCC corpus Descartes, XML: Text Encoding Initiative (TEI), ~35,000 elements, of which: 7,200 metadata, 7,700 paragraphs, 6,200 formulas, 6,000 text-formattings, 4,200 structure, 2,900 page-breaks, 538 images

  5. The (re)Sources EJB Metadata EJB's head Google Books

  6. The tools

  7. The method observation non-algorithmic changes consolidation proofs

  8. Observation use digital equipment: -your text-editor -your scripting language -your regular expressions

  9. observation: italic scopes replace =(.*?)$ by <italic>match1</italic> ??? Aargh!#@\€]
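The replacement rule on this slide can be tried out in a few lines. The sketch below is in Python rather than the Perl of convert.pl, and it assumes that `=` opens an italic scope running to the end of the line; the function name `mark_italics` and the sample strings are hypothetical.

```python
import re

# Assumption: '=' opens an italic scope that runs to the end of the line.
# This is the naive rule from the slide: replace =(.*?)$ by <italic>...</italic>.
ITALIC = re.compile(r"=(.*?)$", re.MULTILINE)


def mark_italics(text: str) -> str:
    """Wrap everything from the first '=' on a line up to the line end."""
    return ITALIC.sub(r"<italic>\1</italic>", text)


if __name__ == "__main__":
    print(mark_italics("plain =emphasised words\nno markup here"))
    # A second '=' on the same line is swallowed into the first scope.
    print(mark_italics("a =b =c"))
```

A line holding two `=` markers ends up as a single scope with a stray `=` inside it, which is one way such a simple rule can go wrong on real input.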

  10. observation: greek

  11. non-algorithmic changes

  12. closers: hints

  13. consolidating: metadata conversion process metadata combining

  14. merging meta

  15. proofs: formulas

  16. proofs: formulas in gif

  17. quick formula checking

  18. The anatomy of conversion: convert.pl, 100 KB of program code text = 25 densely typed pages = 3427 lines, of which 2175 real code lines. Code/Input = 1/32

  19. Statistics: 1/3 of the tasks need 2/3 of the code. Code: formulas (2) 37 %; headers, openers, closers (3) 16 %; meta and images (3) 11 %. Run time of same tasks: formulas (2) 29 %; headers, openers, closers (3) 6 %; meta and images (3) 10 %. Total run time (25): 40 sec

  20. The tricks of conversion • Unicode is your friend • Split into many subtasks • task = configuration + workflow • Count and check • Performance matters • Do not give up automation

  21. 1. Unicode is your friend

  22. 2. Split into many subtasks (2a) that can be run separately (2b) that can be reordered easily

  23. 3. task = config + workflow
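Tricks 2 and 3 can be illustrated together. This is a minimal sketch in Python, not the structure of convert.pl itself: every subtask is a function taking the text and its own configuration, and the workflow is just an ordered list of subtask names, so steps can be run alone or reordered. All names (`SUBTASKS`, `CONFIG`, `run`) and the `&alpha;` input notation are hypothetical.

```python
from typing import Callable, Dict, List

Subtask = Callable[[str, dict], str]


def normalise_whitespace(text: str, config: dict) -> str:
    """Collapse runs of whitespace into single spaces if configured to."""
    return " ".join(text.split()) if config.get("collapse", True) else text


def replace_entities(text: str, config: dict) -> str:
    """Replace each source notation by its Unicode character."""
    for old, new in config["map"].items():
        text = text.replace(old, new)
    return text


# Workflow pieces: each subtask is registered under a name.
SUBTASKS: Dict[str, Subtask] = {
    "whitespace": normalise_whitespace,
    "entities": replace_entities,
}

# Configuration: the data each subtask needs, kept apart from the code.
CONFIG: Dict[str, dict] = {
    "whitespace": {"collapse": True},
    "entities": {"map": {"&alpha;": "\u03b1"}},
}


def run(text: str, order: List[str], config: Dict[str, dict] = CONFIG) -> str:
    """Apply the named subtasks in the given order."""
    for name in order:
        text = SUBTASKS[name](text, config[name])
    return text


if __name__ == "__main__":
    print(run("a   &alpha;  b", ["whitespace", "entities"]))
    print(run("a   &alpha;  b", ["entities"]))  # a single subtask on its own
```

Keeping the order as plain data means a new run with a different sequence, or with one step left out, needs no code change.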

  24. 4. Count and check (ad nauseam)
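One way to count and check is to tally the elements of the converted XML and compare them with the numbers expected from the source. The sketch below uses Python's standard library; the tag names and the tiny sample document are made up for illustration and are not taken from the CKCC corpus.

```python
import xml.etree.ElementTree as ET
from collections import Counter
from typing import Dict, List, Tuple


def count_elements(xml_text: str) -> Counter:
    """Count how often each element name occurs in the document."""
    root = ET.fromstring(xml_text)
    return Counter(el.tag for el in root.iter())


def check_counts(counts: Counter, expected: Dict[str, int]) -> List[Tuple[str, int, int]]:
    """Return (tag, expected, actual) for every count that does not match."""
    return [
        (tag, want, counts[tag])
        for tag, want in expected.items()
        if counts[tag] != want
    ]


if __name__ == "__main__":
    sample = "<text><p>one <formula>x</formula></p><p>two</p><pb/></text>"
    counts = count_elements(sample)
    print(counts)
    print(check_counts(counts, {"p": 2, "formula": 1, "pb": 1}))
```

An empty list of mismatches after each run is a cheap signal that a changed subtask did not silently drop or duplicate material.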

  25. 5. Performance matters! Was 30+ seconds, is now 2.07 seconds. Many new subtasks based on same template (gain = 15 * 30 sec = 7.5 min per run). Many, many runs before everything is OK (gain = 100 * 7.5 min = 12.5 hours CPU-time)

  26. 6. Do not give up automation: we used a lot of expert knowledge, which has all been transferred to • the source • consolidated extra inputs, so the conversion is still repeatable and modifiable. Thank You. conversion program
