Digital encoding of text

Digital encoding of text. Tomaž Erjavec. Scholarly digital editions of Slovenian literature http://nl.ijs.si/e-zrc/. Content provider: Institute of Slovenian Literature – Scientific research centre of the Slovenian Academy of Sciences and Arts, Ljubljana

Presentation Transcript


  1. Digital encoding of text Tomaž Erjavec

  2. Scholarly digital editions of Slovenian literature http://nl.ijs.si/e-zrc/
     • Content provider: Institute of Slovenian Literature – Scientific research centre of the Slovenian Academy of Sciences and Arts, Ljubljana
     • Technology provider: Department of Knowledge Technologies, Jožef Stefan Institute, Ljubljana

  3. Freising Manuscripts (FM):
     • Three religious texts:
        • FM I: a confession form
        • FM II: a homily on penitence and remission
        • FM III: a confession form
     • Provenance: Upper Carinthia or Freising (Austria, Germany)
     • Place of use: Carinthian estates of the Freising diocese
     • Written after 27 May 972 and not after 1023

  4. The history of the Freising Manuscripts
     • Discovered by B. J. Docen in 1806 in the Munich State Library
     • Many printed editions since then
     • First diplomatic transcription in 1827 by P. Köppen and A. H. Vostokov, Sanktpeterburg
     • Critical edition by the Slovenian Academy of Sciences and Arts (1992, 1993, 2004)

  5. The printed edition of 2004, our source, contains:
     • Diplomatic transcription (DT) with apparatus, comparing 9 older DTs
     • Critical transcription (CT) with apparatus, comparing 13 older CTs
     • Phonetic transcription (PT) in IPA, with apparatus
     • Translations into Latin and 3 modern languages
     • Dictionary of all words in the CT, with the PT, the 4 translations and Old Church Slavonic, and examples (concordances)
     • Bibliography with 600+ items
     • Introductions

  6. The goal of the e-edition: to gather the 200-year history of FM editions
     • Annotated text of all major transcriptions so far: the history of understanding
     • Alignment of all 16 transcriptions and translations: understanding through comparison
     • Sound recording added to the phonetic transcription: understanding through experiencing
     • Addition of translations (Polish, Italian): understanding for non-Slovenian speakers
     • Integration of materials: understanding for all

  7. Production of the e-edition
     • Electronic original: a local editor format or re-keyed Word files
     • Conversion: dedicated Perl and XSLT filters
     • Target format: the Text Encoding Initiative (TEI) Guidelines P4
     • View format: XSLT transform into HTML
     • Rapid prototyping and a cyclical process of refinement
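
The slides do not show the filters themselves. As a rough illustration of the "View format" step, here is a minimal Python sketch that maps a few TEI elements onto HTML with the standard library's xml.etree.ElementTree, standing in for the XSLT stylesheet the edition actually used. The element mapping and the class attributes are assumptions, and the sketch assumes un-namespaced element names.

```python
import xml.etree.ElementTree as ET

# Hypothetical mapping from TEI element names to HTML element names;
# anything not listed falls back to <span>.
TEI_TO_HTML = {
    "div": "div",
    "head": "h2",
    "p": "p",
    "lb": "br",
    "hi": "em",
    "note": "aside",
}


def tei_to_html(tei: ET.Element) -> ET.Element:
    """Recursively convert a TEI element and its children into HTML elements.

    The original TEI name is kept in a class attribute so CSS can style it.
    """
    html = ET.Element(TEI_TO_HTML.get(tei.tag, "span"), {"class": tei.tag})
    html.text = tei.text
    for child in tei:
        converted = tei_to_html(child)
        # Text following a child's end tag (its tail) stays with the child.
        converted.tail = child.tail
        html.append(converted)
    return html


def render(tei_source: str) -> str:
    """Parse a TEI fragment and serialise its HTML view as a string."""
    return ET.tostring(tei_to_html(ET.fromstring(tei_source)), encoding="unicode")


if __name__ == "__main__":
    print(render("<div><head>FM I</head><p>first line<lb/>second line</p></div>"))
```

Keeping the TEI name as a class lets one stylesheet decide later how much of the markup to show, which fits the cyclical refinement the slide describes.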

  8. Challenging issues
     • Complex characters (cf. the ZRCola font: http://zrcola.zrc-sazu.si/)
     • Adding speech to the e-edition (manual segmentation, errors in the originals, inserting phrase and sentence boundaries into parallel views)
     • Dictionary conversion (idiosyncratic format, complex structure, difficult cross-references)
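
For the speech challenge, a minimal sketch of how manually segmented audio could be linked to the text: each segment carries a hypothetical id shared with a phrase or sentence of the transcription, plus start and end times, and a sorted index answers "which segment is playing at time t?". Overlapping or zero-length segments, typical symptoms of segmentation errors, are rejected up front. The Segment and SpeechIndex names are illustrative, not part of the edition.

```python
from bisect import bisect_right
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    """One manually cut stretch of the recording (hypothetical model)."""
    seg_id: str   # id shared with the matching phrase/sentence in the transcription
    start: float  # start time in seconds
    end: float    # end time in seconds (exclusive)


class SpeechIndex:
    """Look up which transcription segment is being played at a given time."""

    def __init__(self, segments):
        self.segments = sorted(segments, key=lambda s: s.start)
        for seg in self.segments:
            if seg.end <= seg.start:
                raise ValueError(f"segment {seg.seg_id} has non-positive length")
        for prev, cur in zip(self.segments, self.segments[1:]):
            if cur.start < prev.end:
                raise ValueError(f"segments {prev.seg_id} and {cur.seg_id} overlap")
        self.starts = [s.start for s in self.segments]

    def at(self, t: float):
        """Return the id of the segment covering time t, or None in a gap."""
        i = bisect_right(self.starts, t) - 1
        if i >= 0 and t < self.segments[i].end:
            return self.segments[i].seg_id
        return None
```

A viewer could call `at()` on each playback tick and highlight the returned segment in every column of a parallel view, as long as all versions use the same segment ids.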

  9. Examples: the TEI-encoded phonetic transcription
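
One way a conversion filter might turn re-keyed transcription lines into TEI-style markup, sketched in Python rather than Perl: each non-blank line becomes the text following a numbered `<lb/>` (line beginning) inside a single `<p>`. The choice of `div/@type`, `@subtype`, `p` and `lb/@n` is an assumption for illustration, not the encoding actually used in the edition.

```python
import xml.etree.ElementTree as ET


def lines_to_tei(lines, kind="phonetic"):
    """Wrap re-keyed transcription lines in a TEI-style <div>.

    Each non-blank line becomes the text following a numbered <lb/>
    inside a single <p>. Element and attribute choices are illustrative.
    """
    div = ET.Element("div", {"type": "transcription", "subtype": kind})
    p = ET.SubElement(div, "p")
    kept = [line.strip() for line in lines if line.strip()]
    for n, text in enumerate(kept, start=1):
        lb = ET.SubElement(p, "lb", {"n": str(n)})
        lb.tail = text  # ElementTree escapes &, < and > on output
    return div


if __name__ == "__main__":
    tei = lines_to_tei(["first line", "", "second & third"])
    print(ET.tostring(tei, encoding="unicode"))
```

Building the tree with an XML library rather than printing strings means characters such as `&` in the source cannot produce ill-formed output.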

  10. BS (Brižinski spomeniki, i.e. the Freising Manuscripts) dictionary
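
The source dictionary format is described only as idiosyncratic, so the format below is invented for illustration: one entry per line as `headword: gloss; gloss [-> target, target]`. The sketch parses entries, fails loudly on lines it cannot read, and lists cross-references whose target headword has no entry, the kind of check that difficult cross-references call for.

```python
import re
from dataclasses import dataclass, field

# Invented source format, one entry per line:
#   headword: gloss; gloss [-> target, target]
ENTRY = re.compile(
    r"^(?P<head>[^:]+):\s*(?P<glosses>[^\[]*?)\s*(?:\[->\s*(?P<refs>[^\]]*)\])?\s*$"
)


@dataclass
class Entry:
    headword: str
    glosses: list[str]
    refs: list[str] = field(default_factory=list)


def parse(lines):
    """Parse entries into a dict keyed by headword; fail on unparsable lines."""
    entries = {}
    for lineno, line in enumerate(lines, start=1):
        if not line.strip():
            continue
        m = ENTRY.match(line)
        if m is None:
            raise ValueError(f"line {lineno}: cannot parse {line!r}")
        head = m["head"].strip()
        glosses = [g.strip() for g in m["glosses"].split(";") if g.strip()]
        refs = [r.strip() for r in (m["refs"] or "").split(",") if r.strip()]
        # Note: a repeated headword overwrites the earlier entry;
        # homographs would need numbered keys.
        entries[head] = Entry(head, glosses, refs)
    return entries


def dangling_refs(entries):
    """Return (headword, target) pairs whose target has no entry of its own."""
    return [
        (e.headword, target)
        for e in entries.values()
        for target in e.refs
        if target not in entries
    ]
```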

  11. BS Bibliography

  12. BS basic parallel view
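
A parallel view is straightforward to build once every version shares segment identifiers. This Python sketch assumes a hypothetical master list of segment ids in reading order and, for each version (e.g. DT, CT), a mapping from segment id to text. It produces table rows with empty cells where a version lacks a segment, and reports segments missing from the master order, which usually signal an alignment error.

```python
def parallel_rows(order, versions):
    """Build the rows of a parallel view.

    order:    segment ids in reading order (a hypothetical shared id scheme)
    versions: version name -> {segment id: text}
    Returns a header row, then one row per segment; a version that lacks
    a segment gets an empty cell.
    """
    names = list(versions)
    rows = [["segment", *names]]
    for seg in order:
        rows.append([seg, *(versions[name].get(seg, "") for name in names)])
    return rows


def unaligned(order, versions):
    """Map each version to the sorted segment ids missing from the master order."""
    known = set(order)
    report = {}
    for name, segments in versions.items():
        extra = sorted(set(segments) - known)
        if extra:
            report[name] = extra
    return report


if __name__ == "__main__":
    versions = {"DT": {"s1": "a", "s2": "b"}, "CT": {"s1": "A", "s3": "C"}}
    for row in parallel_rows(["s1", "s2"], versions):
        print(" | ".join(row))
    print(unaligned(["s1", "s2"], versions))
```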

  13. Further work in finishing the BS e-edition
     • TEI header (Slovene and English, also as an HTML view)
     • Better treatment of Private Use Area (PUA) characters (documented in the header, with fallbacks)
     • Resolving outstanding content issues
     • Better overall structure and linking
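
Unicode reserves Private Use Area code points (general category `Co`) for characters such as the special glyphs a font like ZRCola may define. A minimal sketch of a fallback pass: PUA characters listed in a hypothetical FALLBACK table are replaced by an approximation in standard Unicode, while unmapped ones are kept but reported so they can be documented in the header. The table entry is a placeholder, not a real ZRCola mapping.

```python
import unicodedata

# Hypothetical fallback table: PUA character -> approximation in standard Unicode.
# The single entry is a placeholder, not an actual ZRCola code point mapping.
FALLBACK = {
    "\ue000": "e\u0301",  # placeholder: approximate with "e" + COMBINING ACUTE ACCENT
}


def is_pua(ch: str) -> bool:
    """True if ch lies in a Unicode Private Use Area (general category 'Co')."""
    return unicodedata.category(ch) == "Co"


def with_fallback(text: str, table=FALLBACK):
    """Replace mapped PUA characters; return (new text, sorted unmapped code points).

    Unmapped PUA characters are left in place so PUA-aware fonts can still
    render them, and are reported so they can be documented.
    """
    out, missing = [], set()
    for ch in text:
        if is_pua(ch) and ch in table:
            out.append(table[ch])
        else:
            if is_pua(ch):
                missing.add(f"U+{ord(ch):04X}")
            out.append(ch)
    return "".join(out), sorted(missing)
```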

  14. Further work: general goals
     • Incorporating language technologies into the e-editions (concordancing, lemmatisation, part-of-speech tagging)
     • Adaptable Web interface for viewing (select what to see and how: corrections, emendations, notes, facsimile)
     • Accessing and connecting the e-library as a whole (cataloguing, searching)
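
Concordancing is the simplest of the listed language technologies. Below is a minimal keyword-in-context (KWIC) sketch in Python. It assumes a naive `\w+` tokeniser and exact, case-insensitive matching of word forms. Grouping inflected forms would need lemmatisation, and this tokeniser would split words that contain combining diacritics.

```python
import re


def kwic(text: str, keyword: str, width: int = 3):
    """Key Word In Context: find every token equal to keyword (case-insensitive)
    and return (left context, token, right context) with up to `width` tokens
    on each side."""
    tokens = re.findall(r"\w+", text)  # naive tokeniser: drops punctuation
    target = keyword.casefold()
    hits = []
    for i, tok in enumerate(tokens):
        if tok.casefold() == target:
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            hits.append((left, tok, right))
    return hits


def format_kwic(hits):
    """Right-align the left contexts so the keywords line up in one column."""
    pad = max((len(left) for left, _, _ in hits), default=0)
    return [f"{left:>{pad}} [{tok}] {right}" for left, tok, right in hits]


if __name__ == "__main__":
    sample = "The cat sat on the mat. The dog sat too."
    for line in format_kwic(kwic(sample, "sat", width=2)):
        print(line)
```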
