Representing dictionaries with the TEI


  1. Representing dictionaries with the TEI Proposal for basic guidelines Laurent Romary - Max Planck Digital Library With the help of Susanne Alt - CNRS

  2. Background • The P5 edition of the TEI guidelines • XML • ODD - Roma • Modules and classes • DTD, RelaxNG, W3C schemas • The dictionary chapter • Very close to the P4 version • Work to be done • Enhancing the coherence with the class system • Providing more examples • …

  3. Proposal for today • Browse through the main features of the dictionary chapter • Identify questionable issues • Select best practices • Work with Roma and implement (part of) the best practices • Minimal schema that a dictionary project can start with • Bottom-up approach to customization • Discuss conformance

  4. Dictionaries as TEI documents • Same general document structure as any other TEI document • <teiHeader>, <text> • Define a common strategy concerning source identification with general text sources • Specific documentation of previous editions • Intuition that <teiCorpus> is not to be retained here • <front>, <body>, <back> • Divisions… • Strong case for unnumbered <div>s • Can we recommend/implement a basic dictionary-oriented typology?

  5. Issues [see Wuerzburg.xml] • Providing precise guidelines for • <publicationStmt> • Clarify the role and possible content of <publisher> • <sourceDesc> • Base the guidelines on <biblStruct> (<biblItem>?) and <listBibl>

  6. Describing dictionary entries • A variety of possible objects • <entry>, <entryFree>, <superEntry>, <dictScrap> • <hom>, <re> • First issue: dealing with the editorial workflow • Keep <dictScrap> for ongoing tagging activity • Depends on the degree of structure of the dictionary • Stay consistent in the use of <entry>/<entryFree>/<superEntry>/<hom> • Strong feeling for limiting ourselves to <entry> • Point to the importance of <re> • Embedded entries

  7. Finding the right granularity • The core lexical unit: <entry> • Should be used coherently in a dictionary project to gather up homogeneous lexical objects • Possible combination with: • <superEntry> to group sets of homographs • Should only be used to record such a feature when it exists in legacy data • Should be avoided for new editorial projects • <hom> to subdivide senses in groups of homonyms

  8. Example • Recording a series of homographs with <superEntry> <body> <entry/> <entry/> <superEntry> <entry type="hom" n="1"/> <entry type="hom" n="2"/> </superEntry> </body> • Issues • Values of ‘n’ attribute according to the source • Values of type defined in ‘att.entryLike’

  9. Example • Recording a series of homographs with <hom> <entry> <hom n="1"> <sense n="1"/><sense n="2"/> </hom> <hom n="2"> <sense n="1"/><sense n="2"/><sense n="3"/> </hom> </entry> • Issues • Weak boundary between polysemes and homonyms • Why not just have separate entries?
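
The <hom> grouping on this slide can be inspected programmatically. The following is a minimal sketch, assuming the fragment is parsed on its own without the TEI namespace (real TEI P5 files normally declare one); the function name `senses_per_hom` is illustrative and not part of any TEI tool.

```python
import xml.etree.ElementTree as ET

# Entry copied from the slide: two homonyms, each with numbered senses.
ENTRY = """
<entry>
  <hom n="1"><sense n="1"/><sense n="2"/></hom>
  <hom n="2"><sense n="1"/><sense n="2"/><sense n="3"/></hom>
</entry>
"""

def senses_per_hom(entry_xml):
    """Map each <hom> number to the list of its <sense> numbers."""
    entry = ET.fromstring(entry_xml)
    return {
        hom.get("n"): [sense.get("n") for sense in hom.findall("sense")]
        for hom in entry.findall("hom")
    }

if __name__ == "__main__":
    print(senses_per_hom(ENTRY))
```

Note that sense numbering restarts inside each <hom>, so a sense is only identified by the pair (homonym number, sense number). Splitting the homonyms into separate entries, as the slide suggests, would remove that extra level.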

  10. From word to senses… • Background • Semasiological vs. onomasiological views on lexical data • Two complementary data organisations • Two sets of standards • In ISO: TMF (ISO 16642) vs. LMF • In the TEI: Terminology vs. Print dictionary chapters

  11. The LMF Model • [Diagram: Lexical DB linked to Global Info and to Lexical Entry; Lexical Entry linked to Sense and to Form; associations labelled with cardinalities 1..1 and 0..n]

  12. Consequences for dictionaries • Strong <form> to <sense> orientation • <form> qualifies the entry, with the identification of the headword and its morphological variations • <sense> is subordinated to the choice made for <form> • Role of grammatical information • Overall qualification of the entry • Qualification of morphological variants • Issue • <re> does not necessarily fit into the theory

  13. Example • Basic structure of an <entry> <entry> <form> <orth>chat</orth> </form> <sense> <def>Petit animal familier</def> </sense> </entry>
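
To illustrate the form-to-sense orientation, here is a small sketch that reads the headword from <form>/<orth> and the definitions from <sense>/<def>. It assumes the un-namespaced fragment from the slide; `headword_and_definitions` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# Basic entry from the slide: one form, one sense.
ENTRY = """
<entry>
  <form><orth>chat</orth></form>
  <sense><def>Petit animal familier</def></sense>
</entry>
"""

def headword_and_definitions(entry_xml):
    """Return the headword and the list of definitions of an <entry>."""
    entry = ET.fromstring(entry_xml)
    headword = entry.findtext("form/orth")  # first <orth> inside a <form>
    definitions = [d.text for d in entry.findall("sense/def")]
    return headword, definitions

if __name__ == "__main__":
    print(headword_and_definitions(ENTRY))
```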

  14. Representing form and grammar • General issues • Multiple forms • <orth>, <pron>, etc. • Compounds • May be represented using embedded forms • Role of grammar (<gramGrp>) • In isolation: qualifies the entry • Within a form: marks special features associated with the form • Inflexions • Can be represented by means of additional <form> elements

  15. Example • A simple entry <entry> <form> <orth>chat</orth> <pron>ʃa</pron> </form> <gramGrp> <pos>N</pos> <gen>m</gen> </gramGrp> </entry>

  16. Example • Simple entry with inflected form <entry> <form type="lemma"> <orth>chat</orth> </form> <gramGrp> <pos>N</pos> <gen>m</gen> </gramGrp> <form type="inflected"> <orth>chats</orth> <gramGrp> <number>p</number> </gramGrp> </form> </entry>
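
The two roles of <gramGrp> (qualifying the whole entry versus qualifying a single form) can be separated when reading the entry. This is a sketch only, assuming un-namespaced XML and straight quotes around attribute values; `gram_features` and `describe_entry` are illustrative names.

```python
import xml.etree.ElementTree as ET

# Entry from the slide, with an entry-level and a form-level <gramGrp>.
ENTRY = """
<entry>
  <form type="lemma"><orth>chat</orth></form>
  <gramGrp><pos>N</pos><gen>m</gen></gramGrp>
  <form type="inflected">
    <orth>chats</orth>
    <gramGrp><number>p</number></gramGrp>
  </form>
</entry>
"""

def gram_features(parent):
    """Features of the <gramGrp> that is a direct child of parent, if any."""
    group = parent.find("gramGrp")
    return {} if group is None else {feat.tag: feat.text for feat in group}

def describe_entry(entry_xml):
    """Return (entry-level features, [(form type, orth, form-level features)])."""
    entry = ET.fromstring(entry_xml)
    forms = [
        (form.get("type"), form.findtext("orth"), gram_features(form))
        for form in entry.findall("form")
    ]
    return gram_features(entry), forms

if __name__ == "__main__":
    entry_features, forms = describe_entry(ENTRY)
    print(entry_features)
    for form in forms:
        print(form)
```

Because `find("gramGrp")` only looks at direct children, the plural marking stays attached to "chats" and is not mistaken for a property of the entry as a whole.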

  17. <form>: the case of the Campe dictionary • Step 1: Dealing with the presence of determiners <form type="lemma"> <form type="determiner"> <orth>Das</orth> </form> <form type="headword"> <orth>Aak</orth> </form> </form>

  18. <form>: the case of the Campe dictionary • Step 2: adding grammatical information <form type="lemma"> <form type="determiner"> <orth>Das</orth> <gramGrp> <pos value="D"/> <gen>n</gen> </gramGrp> </form> <form type="headword"> <orth>Aak</orth> <gramGrp> <pos>N</pos> <gen>n</gen> </gramGrp> </form> </form>

  19. <form>: the case of the Campe dictionary • Step 3: dealing with inflected forms <form type="inflected"> <form type="determiner"> <orth>des</orth> <gramGrp>…</gramGrp> </form> <form type="headword"> <orth><oVar><oRef/>-es</oVar></orth> <gramGrp> <case value="G">G</case> </gramGrp> </form> </form>

  20. Main arguments for the proposed changes • Coherent use of <form> and <orth> • Accounts for a coherent access to orthographic information in form/orth • Coherent use of grammatical features • Danger of tag abuse with • <gram type="art_n">Das</gram> • ‘type’ attribute should indicate a grammatical feature • <gram> content should be the value of that feature • Non-differentiation of features (art_n -> pos + gen)
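
The rewriting "art_n -> pos + gen" could be automated for legacy data along the following lines. This is a hedged sketch: the `CONFLATED` table and the function `split_gram` are hypothetical, the values "D" and "n" are taken from the Campe slides, and the features are written as element content (slide 18 uses a `value` attribute for the determiner instead).

```python
import xml.etree.ElementTree as ET

# Hypothetical lookup table: one conflated 'type' value per legacy tag,
# split into the separate grammatical features it stands for.
CONFLATED = {"art_n": {"pos": "D", "gen": "n"}}

def split_gram(gram_xml):
    """Rewrite <gram type="art_n">Das</gram> as a <form> with <orth> and <gramGrp>."""
    gram = ET.fromstring(gram_xml)
    form = ET.Element("form", {"type": "determiner"})
    ET.SubElement(form, "orth").text = gram.text  # the word itself goes in <orth>
    group = ET.SubElement(form, "gramGrp")
    for feature, value in CONFLATED[gram.get("type")].items():
        ET.SubElement(group, feature).text = value  # one element per feature
    return ET.tostring(form, encoding="unicode")

if __name__ == "__main__":
    print(split_gram('<gram type="art_n">Das</gram>'))
```

The orthographic content ends up in form/orth and each grammatical feature gets its own element, which is the coherence argument made on the slide.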

  21. <sense>: main components • Core elements • <def>: to provide the definition • <dicteg> • Need to establish guidelines on the identification of sources • <etym>: a complex issue…

  22. Documenting examples • <dicteg><q>Ta gamine est assise trop <oRef/>, elle ne dépasse pas de la table.</q></dicteg> • <dicteg><cit> <q>Ta gamine est assise trop <oRef/>, elle ne dépasse pas de la table.</q> <bibl>Benoit M., Michel C., Le Parler de Metz...</bibl> </cit></dicteg> • <dicteg> <cit> <q>Ta gamine est assise trop <oRef/>, elle ne dépasse pas de la table.</q> <biblStruct> <author>BENOIT M, MICHEL C.</author> <title>Le Parler de Metz et du pays messin</title> <imprint> <pubPlace>Metz</pubPlace> <publisher>Serpenoise</publisher> <date>2001</date> <biblScope>p. 38</biblScope> </imprint> </biblStruct> </cit> </dicteg>
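
A consumer of such examples typically needs the quotation with <oRef/> expanded, plus its source. The sketch below works on the <cit>/<bibl> variant. The slides do not give the headword of this entry, so the caller supplies it and the demo uses the placeholder string "[headword]"; `expand_quote` and `example_with_source` are illustrative names.

```python
import xml.etree.ElementTree as ET

# Second variant from the slide: quotation plus a free-text <bibl>.
EXAMPLE = (
    "<dicteg><cit>"
    "<q>Ta gamine est assise trop <oRef/>, elle ne dépasse pas de la table.</q>"
    "<bibl>Benoit M., Michel C., Le Parler de Metz...</bibl>"
    "</cit></dicteg>"
)

def expand_quote(q, headword):
    """Return the text of <q> with every <oRef/> replaced by headword."""
    parts = [q.text or ""]
    for child in q:
        if child.tag == "oRef":
            parts.append(headword)
        else:
            parts.append("".join(child.itertext()))
        parts.append(child.tail or "")  # text following the child element
    return "".join(parts)

def example_with_source(example_xml, headword):
    """Return (expanded quotation, bibliographic source string)."""
    cit = ET.fromstring(example_xml).find("cit")
    return expand_quote(cit.find("q"), headword), cit.findtext("bibl")

if __name__ == "__main__":
    # "[headword]" is a placeholder, not the actual headword of the entry.
    print(example_with_source(EXAMPLE, "[headword]"))
```

With the <biblStruct> variant the source would be read field by field (author, title, date) instead of as a single string, which is one reason to give guidelines on how sources are identified.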

  23. A quick glimpse into Roma • A journey in three steps • Adding the PD module and generating a schema • Checking out elements • Expressing constraints on specific values

  24. Final discussion • What does it mean to be TEI conformant?
