Processing Textual Sources for Linguistic and Literary Research: What a 'Solitary Scholar' Can Do

Alexei Lavrentiev Alexei.Lavrentev@ens-lsh.fr Ecole Normale Supérieure Lettres et Sciences humaines, Lyon, France. Processing Textual Sources for Linguistic and Literary Research: What a 'Solitary Scholar' Can Do. University of Kentucky, October 24 2007. Two projects.

    Presentation Transcript
    1. Alexei Lavrentiev (Alexei.Lavrentev@ens-lsh.fr), Ecole Normale Supérieure Lettres et Sciences humaines, Lyon, France. Processing Textual Sources for Linguistic and Literary Research: What a 'Solitary Scholar' Can Do. University of Kentucky, October 24, 2007

    2. Two projects • Scholarly re-edition of an 1861 “Anonymous” folklore collection • Corpus of Medieval French manuscript transcriptions for the study of punctuation

    3. Folklore Project 1/14

    4. Folklore Project 2/14 Project Team • Vera Kuznetsova • Senior Researcher, Institute of Philology SB RAS • Specialist in Russian folklore • Olga Laguta • Professor, Novosibirsk State University • Linguist • Alexei Lavrentiev

    5. Folklore Project 3/14 Objectives • Verify the authenticity of folklore texts in the collection • Analyze linguistic features of the texts • Learn more about the author of the collection • Make these texts available to the scholarly community

    6. Folklore Project 4/14 Challenges • Encode data in a sustainable format (TEI XML) using available tools • Microsoft Office (Word, Access) • XML processing software (XML Spy) • Perl • Configure the tools for users with virtually no experience in IT

    7. Folklore Project 5/14 Workflow [diagram; box labels: Word documents, Metadata, Perl script, Tokenized XML-TEI documents, XSL stylesheets, Access database, Lemmatized XML-TEI documents, Printed edition, Linguistic analysis, Vocabulary with contexts]

    8. Folklore Project 6/14 Word document

    9. Folklore Project 7/14 Metadata file [1. File name] chtochelovekzakhochet ; [number] 20 ; [2. Text title (in the source)] Что человек захочет, то и сделает (What a person wants, he will do) ; [3. Text title (working)] Что человек захочет (What a person wants) ; [4. Team editing the electronic version] Sector of the Russian Language in Siberia, Institute of Philology SB RAS ; [5. Responsible contributors] : [role] Text entry and preliminary markup ; [full name] Kuznetsova Vera Stanislavovna, Aleshina Olga Nikolaevna ; [role] Conversion to XML-TEI format, validation ; [full name] Lavrentiev Alexei Mikhailovich . [6. Project information] : Corpus of Russian folklore prose texts (legends) ; [7. Source information] : [Information about editor(s), compiler(s), etc.] : [role] preparation for publication ; [full name] Kuznetsova Vera Stanislavovna ; [role] compiler of the collection ; [full name] anonymous ; [role] author of the recording ; [full name] not specified . [Place of recording] not specified ; [Publisher] printing house of F. Ivanov ; [Place of publication] Saint Petersburg ; [Year of publication] 1861 ; [ISBN] ???? .
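The metadata file is a flat sequence of bracketed labels followed by values. A minimal Python sketch of how such a file could be read into (label, value) pairs is given here; the project's own reader is part of a Perl script that the slides do not show, so the function name `parse_metadata`, the regular expression, and the treatment of `;`, `:` and `.` as separators are all assumptions. The sample string reuses a few fields from the slide, with labels given in English.

```python
import re

# Assumed grammar: "[label] value" units; " ; ", " : " and " . " act as separators.
FIELD = re.compile(r"\[([^\]]+)\]([^\[]*)")

def parse_metadata(text):
    """Return (label, value) pairs in document order.

    Labels can repeat (e.g. [role], [full name]), so a list is returned
    rather than a dict. Group headers such as "[7. Source information] :"
    come back with an empty value.
    """
    pairs = []
    for label, raw in FIELD.findall(text):
        # Trim surrounding spaces and separator characters only at the edges,
        # so an internal abbreviation such as "F." is left alone.
        pairs.append((label.strip(), raw.strip(" ;:.\t\n")))
    return pairs

sample = (
    "[1. File name] chtochelovekzakhochet ; [number] 20 ; "
    "[7. Source information] : [Publisher] printing house of F. Ivanov; "
    "[Place of publication] Saint Petersburg ; "
    "[Year of publication] 1861 ; [ISBN] ???? ."
)

for label, value in parse_metadata(sample):
    print(f"{label!r}: {value!r}")
```

A value that legitimately ends in a full stop would lose it under this trimming rule, which is one reason to treat the sketch as illustrative only.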

    10. Folklore Project 8/14 Perl script • Takes a Word document saved in HTML (filtered) format • Takes the metadata • Produces an XML-TEI document • Tokenizes the text and assigns IDs to <w> and <s> elements • Transforms analytical markup into <seg type="…"> elements
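The tokenization step can be illustrated with a short Python sketch. It is not the project's Perl script, whose source is not shown: the sentence-splitting rule, the ID scheme (`t1_s001`, `t1_w0001`) and the function name `tokenize` are assumptions made for the example. Only the `<s>` and `<w>` element names and the idea of giving each an ID come from the slide.

```python
import re
import xml.etree.ElementTree as ET

# xml:id lives in the predefined XML namespace.
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

def tokenize(text, prefix="t1"):
    """Wrap sentences in <s> and words in <w>, each with an xml:id.

    Assumption: a sentence ends at '.', '!' or '?' followed by whitespace.
    Punctuation and spaces are kept as plain text between <w> elements.
    """
    body = ET.Element("body")
    w_count = 0
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    for s_num, sentence in enumerate(sentences, start=1):
        s_el = ET.SubElement(body, "s", {XML_ID: f"{prefix}_s{s_num:03d}"})
        last_w = None
        for chunk in re.findall(r"\w+|\W+", sentence):
            if re.match(r"\w", chunk):
                w_count += 1
                last_w = ET.SubElement(
                    s_el, "w", {XML_ID: f"{prefix}_w{w_count:04d}"}
                )
                last_w.text = chunk
            elif last_w is None:
                # Non-word material before the first word of the sentence.
                s_el.text = (s_el.text or "") + chunk
            else:
                # Non-word material after a word is stored as that word's tail.
                last_w.tail = (last_w.tail or "") + chunk
    return body

body = tokenize("What a person wants, he will do. Indeed!")
print(ET.tostring(body, encoding="unicode"))
```

Word IDs run continuously across sentences here so that each word can be addressed on its own, for instance from a database table.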

    11. Folklore Project 9/14 XML Document

    12. Folklore Project 10/14 XSLT Stylesheets • Produce legible text for proofreading • Produce tables to be exported to the database
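The two outputs named on this slide can be sketched in Python as a stand-in for the XSLT stylesheets, which the slides do not reproduce. The sample document, the column names and the function names `reading_text` and `word_table` are assumptions; the sketch only shows the general idea of deriving both a proofreading text and a database-ready table from the same tokenized XML.

```python
import csv
import io
import xml.etree.ElementTree as ET

XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

# Hypothetical tokenized fragment (not taken from the project files).
SAMPLE = """<body>
  <s xml:id="s1"><w xml:id="w1">What</w> <w xml:id="w2">a</w> <w xml:id="w3">person</w> <w xml:id="w4">wants</w>.</s>
  <s xml:id="s2"><w xml:id="w5">Indeed</w>!</s>
</body>"""

def reading_text(root):
    """One line of plain text per <s>, with whitespace collapsed."""
    lines = []
    for s in root.iter("s"):
        lines.append(" ".join("".join(s.itertext()).split()))
    return "\n".join(lines)

def word_table(root):
    """Tab-separated rows (word ID, sentence ID, form) for database import."""
    out = io.StringIO()
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow(["word_id", "sentence_id", "form"])
    for s in root.iter("s"):
        for w in s.iter("w"):
            writer.writerow([w.get(XML_ID), s.get(XML_ID), w.text])
    return out.getvalue()

root = ET.fromstring(SAMPLE)
print(reading_text(root))
print(word_table(root))
```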

    13. Folklore Project 11/14 Access Database

    14. Folklore Project 12/14 Access Database

    15. Folklore Project 13/14 Access Database

    16. Folklore Project 14/14 Results • Printed edition • texts • linguistic analysis supplement • indexes • XML-TEI lemmatized text corpus • XSLT stylesheets • Access database • morphological table • forms for lemmatization and dictionary • Problem: no direct connection between the printed edition and the XML texts

    17. Punctuation Project 1/12 Challenges • Create an adequate representation of linguistically relevant data from a medieval manuscript • Multiple visualizations according to various editing traditions • Annotate and analyze the use of punctuation marks

    18. Punctuation Project 2/12 Project “History” • 1994-1999: first transcriptions using ASCII special characters • 2001: first annotation using Excel • 2003: XML-TEI (Charrette-style) transcriptions • 2005-2007: XML-TEI (Menota-style) transcriptions

    19. Punctuation Project 3/12 “Special” data to be encoded

    20. Punctuation Project 3/12 “Special” data to be encoded • Variant character glyphs

    21. Punctuation Project 3/12 “Special” data to be encoded • Variant character glyphs • Abbreviations

    22. Punctuation Project 3/12 “Special” data to be encoded • Variant character glyphs • Abbreviations • Large initials • “Abnormal” word spacing

    23. Punctuation Project 4/12 Multiple visualizations
        “Normalized” Presentation: [ § 7]  Endementres qu'il parloient einsi si entra laienz uns vaslez qui dist au roi: « Sire noveles vos aport mout merveilleuses. – Queles ?
        XML Transcription:
            <p n="7">
              <lb n="6"/>
              <w xml:id="w016_0251">
                <norm>Endementres</norm>
                <dipl>ENdementres</dipl>
                <facs><mdv_dropcap letter="E" color="blue" size="2" sizeAct="2">E</mdv_dropcap>Ndementre&slong;</facs>
              </w>
              <w aggl="elision" xml:id="w016_0252">
                <norm>qu</norm>
                <dipl>qu</dipl>
                <facs>qu</facs>
              </w>
        “Diplomatic” Presentation: [ § 7]  ENdementres qu'il parloient einsi si entralaienz uns uaslez qui dist au roi. Sire noueles uos aport mout merueilleuses. Queles
        “Imitative” Presentation: [ § 7]  ENdementreſ quıl parloıent eínſı ſı entͣ laıenz unſ uaſlez quı dıſt au roı . Sıre noueleſ uoſ apot mout merueılleuſeſ . Queleſ
        Extract from Ms. Lyon BM, P.A. 77, Queste del saint Graal. Photo: BM Lyon. Transcription: Graal Project
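How one word-level transcription can yield several presentations is sketched here in Python. The first two `<w>` elements follow the XML extract on the slide, with the `&slong;` entity written as a literal ſ because no DTD is declared here; the third word is hypothetical, reconstructed from the presentation lines. The handling of `aggl="elision"` (an apostrophe in the normalized and diplomatic layers, nothing in the facsimile layer, and no following space) is an assumption inferred from the three presentations, not a documented rule, and `render` is an illustrative name.

```python
import xml.etree.ElementTree as ET

SAMPLE = """<p n="7">
  <w xml:id="w016_0251">
    <norm>Endementres</norm>
    <dipl>ENdementres</dipl>
    <facs><mdv_dropcap letter="E" color="blue" size="2" sizeAct="2">E</mdv_dropcap>Ndementreſ</facs>
  </w>
  <w aggl="elision" xml:id="w016_0252">
    <norm>qu</norm>
    <dipl>qu</dipl>
    <facs>qu</facs>
  </w>
  <!-- hypothetical third word, not shown in the slide's XML extract -->
  <w xml:id="w016_0253">
    <norm>il</norm>
    <dipl>il</dipl>
    <facs>ıl</facs>
  </w>
</p>"""

def render(paragraph, layer):
    """Concatenate one layer ('norm', 'dipl' or 'facs') of every <w>."""
    parts = []
    for w in paragraph.iter("w"):
        form = "".join(w.find(layer).itertext()).strip()
        if w.get("aggl") == "elision":
            # Assumed rule: elided words attach to the next word.
            parts.append(form + ("'" if layer in ("norm", "dipl") else ""))
        else:
            parts.append(form + " ")
    return "".join(parts).strip()

paragraph = ET.fromstring(SAMPLE)
for layer in ("norm", "dipl", "facs"):
    print(layer, "->", render(paragraph, layer))
```

Keeping all layers inside each `<w>` means a presentation is chosen at output time, without touching the transcription itself.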

    24. Punctuation Project 5/12 Encoding choices • “Menota-style” TEI extension • Multiple representation at a word level (norm, dipl, facs, pal?) • Additional elements • punct, mdv_dropcap, mdv_lb… • Additional attributes • w/@aggl, punct/@force...

    25. Punctuation Project 6/12 Workflow • Compact syntax transcription • xml + “shortcut” characters (cf. Wiki) • Text description using Access Database • Ms Description • Text typology • Expanding to a standard XML format using a Perl script • Export to tabular format for annotation • Re-integration of annotation to XML documents • Export and analysis using Weblex software
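The export and re-integration steps of this workflow can be sketched as a round trip in Python. The compact syntax and the project's Perl scripts are not shown in the slides, so this example starts from already expanded XML. The `punct` element and its `force` attribute are named in the encoding choices, but the presence of `xml:id` on `punct`, the value "strong", the column layout and the function names are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

# Hypothetical expanded fragment; IDs and layout are illustrative.
SAMPLE = """<p n="7">
  <w xml:id="w1">roi</w><punct xml:id="p1">.</punct>
  <w xml:id="w2">Sire</w><w xml:id="w3">noueles</w><punct xml:id="p2">.</punct>
</p>"""

def export_table(root):
    """Tab-separated table of punctuation marks, ready for manual annotation."""
    lines = ["punct_id\tmark\tforce"]
    for p in root.iter("punct"):
        lines.append("\t".join([p.get(XML_ID), p.text or "", p.get("force", "")]))
    return "\n".join(lines)

def reintegrate(root, table):
    """Copy the 'force' column of an annotated table back onto <punct>."""
    by_id = {p.get(XML_ID): p for p in root.iter("punct")}
    for line in table.splitlines()[1:]:  # skip the header row
        punct_id, _mark, force = line.split("\t")
        if force:
            by_id[punct_id].set("force", force)

root = ET.fromstring(SAMPLE)
table = export_table(root)
# Simulate an annotator filling in one cell ("strong" is a placeholder value).
annotated = table.replace("p1\t.\t", "p1\t.\tstrong")
reintegrate(root, annotated)
print(ET.tostring(root, encoding="unicode"))
```

Matching rows to elements by ID, rather than by position, keeps the round trip safe if rows are sorted or filtered during annotation.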

    26. Punctuation Project 7/12 Compact syntax

    27. Punctuation Project 8/12 Manuscript description

    28. Punctuation Project 9/12 Expanded XML

    29. Punctuation Project 10/12 Annotation

    30. Punctuation Project 11/12 Weblex

    31. Punctuation Project 12/12 Results • 25 fragments of manuscripts transcribed and described • Encoding guidelines • Integrated database of text descriptors (editions and transcriptions) • Perl scripts for conversions • XSLT stylesheets

    32. Thank You!