1 / 48

The Guidelines (P5) of the Text Encoding Initiative (TEI)

The Guidelines (P5) of the Text Encoding Initiative (TEI). Laurent Romary Max Planck Digital Library With many thanks to Lou Burnard (OUCS). In the beginning. Michael Sperberg-McQuen. Lou Burnard. Antonio Zampolli. 1. Novembre 1987: Vassar College, Poughkeepsie.

gonzalesd
Download Presentation

The Guidelines (P5) of the Text Encoding Initiative (TEI)

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. The Guidelines (P5) of the Text Encoding Initiative (TEI) Laurent Romary Max Planck Digital Library With many thanksto Lou Burnard (OUCS)

  2. In the beginning Michael Sperberg-McQuen Lou Burnard Antonio Zampolli 1. Novembre 1987: Vassar College, Poughkeepsie Seite 1

  3. Digitizing source documents Further work on documents The basic scenario? Seite 2

  4. material transcriptions of early printed books organization library metaphor technologies XSLT rendering in the browser; simple search via php TEI source visible? Yes, via view source Early America’s Digital Archive http://www.mith2.umd.edu/eada/ Seite 3

  5. material multiple recensions from different traditions of the Buddhist Canon organization aligned texts for each sutra technologies Cocoon, eXist, Xquery TEI source visible? yes Samyukta Agama http://buddhistinformatics.chibs.edu.tw/BZA/ Seite 4

  6. Can scientists bear standards? • Standards are essentially bad for scientists • Freezing knowledge • Making one lose time that could be dedicated to research • Forcing diverging views to agree • …especially if the work is done by others • A positive view on standards • Documenting data • Giving semantics to data • Pooling data from various origins • Allowing interoperability of tools • A possible answer • Standards as specification platforms • Does the TEI provide you with this? Seite 5

  7. An overview of the TEI elements

  8. Basic structure(s) • Every TEI-conformant document comprises a header followed by (at least one) text • the header contains: • mandatory file description • optional encoding, profile and revision descriptions • the header is essential for: • bibliographic control and identification • resource documentation and processing Seite 7

  9. Structure of a TEI text • In the simplest case, a text just consists of paragraphs or clauses or verse lines • A TEI text has a little more structure: it contains • optional front matter • optional back matter • a body Seite 8

  10. The body of a text usually has divisions • usually nested one with another • the type attribute labels a particular level e.g. as "part" or "chapter“ • the n attribute gives a particular division a name or number • the xml:id attribute gives a particular division a unique identifier Seite 9

  11. For example... <text> <front> <!-- titlepage, etc here --> </front> <body> <div type="book" n="I" xml:id="JA0100"> <head>Book I.</head> <div type="chapter" n="1" xml:id="JA0101"> <head>Of writing lives in general...</head> <!-- remainder of chapter 1 here --> </div> <div n="2" xml:id="JA0102"> <!-- chapter 2 here --> </div> <!-- remainder of book 1 here --> </div> <div type="book" n="II" xml:id="JA0200"> <!-- book 2 here --> </div> <!-- remaining books here --> </body> </text> Seite 10

  12. TEI global attributes • The attribute class att.global defines these for all elements: • xml:id supplies a unique identifier • n supplies a (non-unique) name or number • rend gives a suggestion about rendition (appearance) • xml:lang identifies the language using an ISO standard code • The linking module extends this class with: • corresp, synch, ana for specific association types • next, prev for aggregating fragmented elements Seite 11

  13. Text components What are divisions composed of? • prose is mostly paragraphs (p) • verse is mostly lines (l), sometimes in hierarchic groups (lg) • drama is mostly speeches (sp) containing p or l elements interspersed with stage directions (stage) These may be mixed, and may also appear directly within undivided texts .... but divisions can also contain embedded text or quote elements. Seite 12

  14. For example <div type="book"> <l>Of Man's first disobedience, and the fruit</l> <l>Of that forbidden tree whose mortal taste</l> <l>Brought death into the World, and all our woe,</l> <l>With loss of Eden...</l> .... </div> <lg type="haiku"> <l n="1">Summer grass —</l> <l n="2">all that's left</l> <l n="3">of warriors' dreams</l> </lg> Seite 13

  15. For example <stage>Enter Barnardo and Francisco, two Sentinels,at several doors</stage> <sp who="Barnardo"> <l part="f">Who's there? </l> </sp> <sp who="Francisco"> <l>Nay, answer me. Stand and unfold yourself.</l> </sp> <sp who="Barnardo"> <l part="i">Long live the king! </l> </sp> <sp who="Francisco"> <l part="m">Barnardo? </l> </sp> <sp who="Barnardo"> <l part="f">He. </l> </sp> Seite 14

  16. not to mention <p>.... And he wrote on one side of the paper: <quote> <p>HELP!</p> <p>PIGLIT (ME)</p> </quote> and on the other side: <quote>IT'S ME PIGLIT, HELP HELP</quote> </p> <p>Then he put the paper in the bottle... </p> Seite 15

  17. What are speeches, paragraphs, and lines made of? • phrases that are conventionally typographically distinct • “data-like” (names, numbers, dates, times, addresses) • editorial interventions (corrections, regularizations, additions, omissions ...) • cross references and links • lists, notes, graphics, tables, bibliographic citations... • all kinds of annotations! Which of these you need to markup will depend on your research agenda Seite 16

  18. for example... <head>Of writing lives in general,and particularly of <title>Pamela</title>, with a word by the bye of <name key="#CIBC03">Colley Cibber</name> and others.</head> <p>It is a trite but true observation, that <q>examples work more forcibly on the mind than precepts</q> </p> <p> <name key="#JA">Mr. Joseph Andrews</name>, <rs>the hero of our ensuing history</rs>, was esteemed to be ...</p> Seite 17

  19. Direct speech • Use the who attribute to show speakers • Speeches can be nested in other speeches <q who="Wilson"> Spaulding, he came down into the office just this day eight weeks with this very paper in his hand, and he says: <q who="Spaulding">I wish to the Lord, Mr. Wilson, that I was a red-headed man.</q> </q> Seite 18

  20. Foreign language phrases • The xml:lang attribute may be attached to any element • Use <foreign> if nothing else is available • Use ISO 639-2 code to identify language Have you read <title xml:lang="deu">Die Dreigroschenoper</title>? <mentioned xml:lang="fra">Savoir-faire</mentioned> is French for know-how. John has real <foreign xml:lang="fra">savoir-faire</foreign>. Seite 19

  21. ‘Inter’ class elements • <list> • lists of all kinds • <note> • notes (authorial or editorial) • <figure> • pictures or figures • <table> • tables • <bibl> • bibliographic descriptions Seite 20

  22. for example... <list type="xmas"> <label>For my true love</label> <item> <list type="bullets"> <item>three calling birds></item> <item>two french hens</item> <item>a partridge in a pear tree</item> </list> </item> <label>For Uncle Joe</label> <item>socks as usual</item> </list> Seite 21

  23. Example <figure> <head>Mr Fezziwig's Ball</head> <figDesc>A Cruikshank engraving showing Mr Fezziwig leading a group of revellers.</figDesc> <graphic url="fezz.gif"/> </figure> Seite 22

  24. Feeling overwhelmed? All of this is just one way of looking at the TEI. 1. The TEI is a modular system: you use it to build an encoding scheme appropriate to your needs, by selecting specific modules 2. Each module defines a group of elements and attributes 3. Elements are classified structurally and semantically Define your goals before using the TEI! Seite 23

  25. Some other modules Your choice from: 1. Transcription of spoken texts 2. Dictionaries and lexica 3. Varieties of linguistic annotation 4. Nonstandard characters and glyphs 5. Linking, alignment, non-hierarchic structures 6. Detailed metadata (the TEI Header) 7. Manuscript Description 8. Text-critical apparatus 9. Physical description 10. Onomastics and ontologies 11. The ODD system Seite 24

  26. Customizing the TEIHow do you use the TEI modular structure?

  27. The global TEI architecture Seite 26

  28. Following the TEI spirit Conformance to the TEI means: • Sharing a common text encoding culture • Sharing the same vocabulary (when applicable) • Allowing user autonomy in defining modifications (extensions, customization), but sharing the mechanisms to do so The TEI gives you a lot of help in following these rules. Seite 27

  29. Important concepts The TEI's literary programming with ODD (One Document Does it all) provides: • Schema specification • User oriented documentation • Modularity: all specifications pertaining to a coherent sub-domain of the TEI • Classes: identifying shared behaviours or semantics • Extensibility: a consequence of the above mechanisms Seite 28

  30. The TEI ODD in practice The TEI Guidelines, its schema, and its schema fragments, are all produced from a single XML resource containing: 1. Descriptive prose (lots of it) 2. Examples of usage (plenty) 3. Formal declarations for components of the TEI Abstract Model: • elements and attributes • Modules • classes and macros Seite 29

  31. Possibilities of customizing the TEI The TEI has over 20 modules. A working project will: • Choose the modules they need • Probably narrow the set of elements within a module • Probably add local datatype constraints • Possibly add new elements • Possibly localize the names of elements Seite 30

  32. Quick and simple access to the TEI Imagine that you have seen your colleague next door doing some encoding with the TEI and want to do the same thing: • Go to Roma at http://tei.oucs.ox.ac.uk/Roma/ • Generate a schema [Schema] • Make a trial with the editor, creating a simple document • Get back to Roma and make basic documentation Seite 31

  33. Roma:Start Seite 32

  34. Roma:Schema Select Seite 33

  35. Roma:Generate Doc Seite 34

  36. Subsetting the TEI Suppose you now feel you want to use some more of the TEI, but not all of it • Go to Roma... • Look at [Modules] • Explore default modules by pointing to main elements (by order of interest). You can throw away most things, but • In textstructure, you should really keep TEI, text, body and div • In core, most people need p, q, list, pb and head • From header, keep everything unless you really understand the details • Start checking out elements • Make editorial choices (numbered vs. unnumbered head’s) Seite 35

  37. Roma: Modules Seite 36

  38. Roma: Change Module Seite 37

  39. Adding TEI objects You can add your own elements and attributes. But • make very sure you are not just making something which is syntactic sugar for an existing TEI concept • do not rename existing elements - you can do that directly in ODD • if you want facilities from a very different field of discourse, such as maths or vector graphics, use the existing standards in that area • consider interoperability Seite 38

  40. Roma: Add Element Seite 39

  41. Under the hood TEI customizations are themselves expressed in TEI XML, using elements from the tagdocs module. For example: <schemaSpec ident="myTEIlite"> <desc>This is TEI Lite with simplified heads</desc> <moduleRef key="tei"/> <moduleRef key="core"/> <moduleRef key="textstructure"/> <moduleRef key="header"/> <moduleRef key="linking"/> <elementSpec ident="head" mode="change"> <content> <rng:text/> </content> </elementSpec> </schemaSpec> produces something like TEI Lite, with a slight change Seite 40

  42. What does an ODD look like? <elementSpec module="spoken" ident="pause"> <classes> <memberOf key="model.divPart.spoken"/> <memberOf key="att.timed"/> <memberOf key="att.typed"/> </classes> <content> <rng:empty/> </content> <attList> <attDef ident="who" usage="opt"> <gloss>A unique identifier</gloss> <desc>supplies the identifier of the person or group pausing. Its value is the identifier of a <gi>person</gi> or <gi>persGrp</gi> element in the TEI header. </desc> <datatype> <rng:ref name="data.pointer"/> </datatype> </attDef> </attList> <desc>a pause either between or within utterances.</desc> </elementSpec> Seite 41

  43. ... from which we generate element pause { pause.content, pause.attributes } pause.content = empty pause.attributes = att.global.attributes, att.timed.attributes, att.typed.attributes, att.ascribed.attributes, model.divPart.spoken |= pause att.timed |= pause att.typed |= pause att.ascribed |= pause Seite 42

  44. .. or <!ELEMENT %n.pause; %om.RR; EMPTY> <!ATTLIST %n.pause; %att.global.attributes; %att.timed.attributes; %att.typed.attributes; %att.ascribed.attributes;> <!ENTITY % model.divPart.spoken "%x.model.divPart.spoken; %n.event; | %n.kinesic; | %n.pause; | %n.shift; | %n.u; | %n.vocal; | %n.writing;"> Seite 43

  45. ... and, indeed Seite 44

  46. MPDL CoLaboratory (MPDL CoLab) • Platform for community building and knowledge exchange • Aim: • improve exchange of explicit knwoledge and make tacit and individual know-how explicit • Supports community-building processes • Connects people with similar fields of interest and goals • within the MPS: MPDL, librarians, scientists • Outside: underlying basis of our national and international collaborations • Provide information about existing standards and best practices in the domain of supporting scientific life cycles • Ensuring long-term compatibility between local and centralized initiatives within the MPDL Seite 45

  47. Seite 46

  48. Exploring TEI P5 • Visit http://www.tei-c.org/release/doc/tei-p5-doc/html/ • alphabetical lists of classes, macros, elements • each chapter describes a distinct module • each module presents a semantically related list of elements, with examples of their use Feedback and advice available to all on tei-l@listserv.brown.edu Seite 47

More Related