More Text Encoding Initiative (TEI)

Presentation Transcript


  1. More Text Encoding Initiative (TEI) 6/30 XML + XSLT for Libraries

  2. Today Basic anatomy of TEI Capturing the structure of source documents Capturing more than the structure Building personographies Using TEICorpus In class continue Assignment 5: Mark up digital texts in TEI

  3. Basic anatomy of TEI • <TEI> is the root element • <teiHeader> - where the metadata about the digital document you are creating goes • this element is similar to <eadheader> in EAD • <text> - where the transcription of the source document is captured

  4. Required elements of <teiHeader> • <fileDesc> - a wrapper element for capturing these required elements: <titleStmt> - title of your TEI document (not the original document you are transcribing) <publicationStmt> - for publication information about your TEI document <sourceDesc> - for describing the original document you are transcribing
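As a rough illustration of the required header elements listed above, the following Python sketch embeds a minimal TEI document and checks that <fileDesc> contains all three required children. The sample titles and wording are placeholders, and the helper name `missing_header_parts` is hypothetical; the sketch assumes the document declares the TEI P5 namespace, and it is not a substitute for validating against a TEI schema.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"  # the TEI P5 namespace

# Hypothetical minimal document; titles and wording are placeholders.
SAMPLE = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader>
    <fileDesc>
      <titleStmt><title>Sample postcard: a digital edition</title></titleStmt>
      <publicationStmt><p>Unpublished classroom exercise.</p></publicationStmt>
      <sourceDesc><p>Transcribed from a handwritten postcard.</p></sourceDesc>
    </fileDesc>
  </teiHeader>
  <text><body><p>Transcription goes here.</p></body></text>
</TEI>"""

REQUIRED = ["titleStmt", "publicationStmt", "sourceDesc"]


def missing_header_parts(xml_text):
    """Return the required <fileDesc> children that are absent."""
    root = ET.fromstring(xml_text)
    file_desc = root.find(f"{{{TEI_NS}}}teiHeader/{{{TEI_NS}}}fileDesc")
    if file_desc is None:
        return list(REQUIRED)
    return [name for name in REQUIRED
            if file_desc.find(f"{{{TEI_NS}}}{name}") is None]


print(missing_header_parts(SAMPLE))  # prints []
```

Running the same check on a header with <sourceDesc> removed would report that element as missing.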

  5. <teiHeader> examples • While there are several required elements inside <teiHeader>, the structure of these elements is pretty flexible • A less structured example that uses <p> tags: http://slis.uiowa.edu/~jlee/239/sampledocs/sampleTEIbook.xml • A more structured example that uses more detailed tags such as <msIdentifier>: http://slis.uiowa.edu/~jlee/239/sampledocs/NoblePostcardsTEI.xml

  6. Capturing the structure of your source document

  7. Determining the level of your markup • We will be transforming our TEI documents to web display as HTML. • The more structure you capture in your transcription, the more flexible your display options will be later.

  8. The <text> element • <text> contains a single text of any kind • You decide the scope of the <text> element • A poem? • A play? • An essay? • A collection of essays?

  9. The <div> element • Within <text>, <div> is used to describe some discrete structure of the source document • You decide what <div> should represent: • One poem? One stanza of a poem? • One book? One chapter?

  10. Sample <div> structure • In this example, <div> represents one chapter: <text> <body> <div> <head type="chapter">Chapter 1</head> <p>In this chapter, we will focus on…</p> </div> <div> <head type="chapter">Chapter 2</head> <p>In chapter one, you learned…</p> </div> </body> </text>
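To show why captured structure pays off later, here is a small Python sketch that reads the chapter-level <div> markup from the slide above and builds a simple table of contents. The function name `table_of_contents` is hypothetical, and the fragment omits the TEI namespace, as the slides do; a real TEI P5 file would need namespace-aware lookups.

```python
import xml.etree.ElementTree as ET

# The <div> example from the slide, without a namespace declaration.
FRAGMENT = """<text>
  <body>
    <div>
      <head type="chapter">Chapter 1</head>
      <p>In this chapter, we will focus on...</p>
    </div>
    <div>
      <head type="chapter">Chapter 2</head>
      <p>In chapter one, you learned...</p>
    </div>
  </body>
</text>"""


def table_of_contents(xml_text):
    """Return (heading, paragraph count) for each top-level <div> in <body>."""
    body = ET.fromstring(xml_text).find("body")
    return [(div.findtext("head"), len(div.findall("p")))
            for div in body.findall("div")]


print(table_of_contents(FRAGMENT))  # prints [('Chapter 1', 1), ('Chapter 2', 1)]
```

Had the chapters been transcribed as undifferentiated paragraphs, no such listing could be derived automatically.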

  11. The <group> element • For more complex source documents, use <group> tags to capture a series of <text> elements • For example, encoding a book of poems and using <text> for each poem and <div> to capture stanzas • <text>  <front> <!-- biographical notice by editor -->  </front>  <group>    <text> <!-- first poem -->    </text>    <text> <!-- second poem -->    </text>  </group></text>

  12. The <ab> element • The anonymous block element, <ab>, is used to encode a discrete chunk of text • It is generally used to describe paragraph-like elements, like <p> tags in HTML

  13. Encoding line breaks • To retain original breaks in texts: • encode them with line break <lb/> elements within anonymous block <ab> elements <ab>Line one of text <lb/> Line two of text</ab> • encode them with separate <ab> elements <ab>This is the first paragraph…</ab> <ab>This is the second paragraph…</ab>

  14. Encoding more than the structure of your source document…

  15. Capturing images • To include an image of the source document, use the <facsimile> element before the <text> element: <facsimile> <graphic url="http://digital.lib.uiowa.edu/u?/noble,1184"/> </facsimile> *The URL points to a publicly accessible image file

  16. Identifying names • Use the <name>, <orgName>, or <persName> elements anywhere within the transcription: <div> <p>As I haven't time to write a letter I will just drop you a postal. How is <persName>Hattie</persName>? I have got a cold but that's all. this postal is kinda dirty but I got cause it is just what we will do isn't it. Just wait we'll let them know you're not dead. ha ha</p> <signed>bye. <persName>Golda</persName></signed> </div>

  17. Identifying places • <placeName> for geo-political place names • <placeName>Rochester, NY</placeName> • <placeName><settlement type="city">Rochester</settlement>, <region type="state">New York</region></placeName> • <geogName> for places named in terms of geographic features such as mountains, lakes, or rivers, independently of geo-political units • <geogName type="river">Mississippi River</geogName>

  18. Identifying dates • <date> contains a date in any format • <time> contains a phrase defining a time of day in any format. • the attribute @when normalizes the date or time in a standard form, e.g. yyyy-mm-dd. • <date when="1945-10-24">24 Oct 45</date> • <date when="1996-09-24T07:25:00Z">September 24th, 1996 at 3:25 in the morning</date> • <time when="1999-01-04T20:42:00-05:00">Jan 4 1999 at 8 pm</time>
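As a sketch of what the normalized @when values make possible, the Python below reads two of the <date> examples from the slide above and converts each @when into a date or date-time object, whatever the human-readable text says. The helper names `parse_when` and `normalized_dates` are hypothetical, and the sketch assumes @when holds either a full yyyy-mm-dd date or a full date-time with a time zone; partial values such as a bare year are not handled here.

```python
import xml.etree.ElementTree as ET
from datetime import date, datetime

# Two <date> examples from the slide, wrapped in a <p> so they parse as one fragment.
FRAGMENT = """<p>
  <date when="1945-10-24">24 Oct 45</date>
  <date when="1996-09-24T07:25:00Z">September 24th, 1996 at 3:25 in the morning</date>
</p>"""


def parse_when(value):
    """Parse a @when value that is a full date or a full date-time with offset."""
    if "T" in value:
        # %z accepts both "Z" and offsets such as "-05:00" (Python 3.7+).
        return datetime.strptime(value, "%Y-%m-%dT%H:%M:%S%z")
    return date.fromisoformat(value)


def normalized_dates(xml_text):
    """Pair each <date>'s transcribed text with its machine-readable value."""
    root = ET.fromstring(xml_text)
    return [(el.text, parse_when(el.get("when"))) for el in root.iter("date")]


for text, value in normalized_dates(FRAGMENT):
    print(text, "->", value.isoformat())
```

The transcription keeps the wording of the source ("24 Oct 45"), while the attribute carries a value that software can sort, filter, or compare.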

  19. Other elements can record date + time information • Normalized dates and times can be expressed for other elements through attributes • A complete table of "date-able" elements: http://www.tei-c.org/release/doc/tei-p5-doc/en/html/ref-att.datable.html • For example: <birth when="1981-01-23">January 23, 1981</birth>

  20. Expressing date spans and ambiguous dates • @notBefore specifies the earliest possible date for the event • @notAfter specifies the latest possible date for the event • @from indicates the starting point of the period • @to indicates the ending point of the period • Each attribute also has an ISO 8601 variant with an -iso suffix, as in this example: <residence notBefore-iso="1907-09-09" notAfter-iso="1910-09-06"></residence>
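One way to read the bounding attributes above: a candidate date is consistent with the encoding only if it falls on or between the earliest and latest possible dates. The Python sketch below applies that test to the <residence> example from the slide. The function name `within_bounds` is hypothetical, and the sketch assumes both bounds are full yyyy-mm-dd dates.

```python
import xml.etree.ElementTree as ET
from datetime import date

# The <residence> example from the slide.
FRAGMENT = '<residence notBefore-iso="1907-09-09" notAfter-iso="1910-09-06"></residence>'


def within_bounds(element, candidate):
    """True if candidate lies within the element's notBefore/notAfter bounds (inclusive)."""
    earliest = element.get("notBefore-iso") or element.get("notBefore")
    latest = element.get("notAfter-iso") or element.get("notAfter")
    if earliest and candidate < date.fromisoformat(earliest):
        return False
    if latest and candidate > date.fromisoformat(latest):
        return False
    return True


residence = ET.fromstring(FRAGMENT)
print(within_bounds(residence, date(1908, 5, 1)))  # prints True
print(within_bounds(residence, date(1911, 1, 1)))  # prints False
```

A missing bound is treated as open-ended, so an element with only @notBefore accepts any later date.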

  21. Elements applicable to correspondence • <opener> groups together dateline, byline, salutation, and similar phrases appearing as a preliminary group at the start of a division, especially of a letter. • <closer> groups together salutations, datelines, and similar phrases appearing as a final group at the end of a division, especially of a letter. • <dateline> contains a brief description of the place, date, time, etc. of production of a letter, newspaper story, or other work, prefixed or suffixed to it as a kind of heading or trailer. • <salute> contains the salutation in the opening/closing of a letter, preface, etc. • <signed> contains the closing signature

  22. Sample use of <opener> and <closer> • <div type="letter" n="14"> <head>Letter XIV: Miss Clarissa Harlowe to Miss Howe</head><opener><dateline>Thursday evening, March 2.</dateline></opener> <p>On Hannah's depositing my long letter ...</p> <p>An interruption obliges me to conclude myself   in some hurry, as well as fright, what I must ever be,</p><closer><salute>Yours more than my own,</salute><signed>Clarissa Harlowe</signed></closer></div> • (Taken from http://www.tei-c.org/release/doc/tei-p5-doc/en/html/DS.html#DSOC)

  23. Building a personography • A personography is a list of normalized biographical data about persons tagged in your TEI document • It can be referenced in multiple TEI documents • It can be used to enhance search + browse tools

  24. The <listPerson> element • Personographies are contained within <sourceDesc> in the header • @xml:id is used to uniquely identify a person <listPerson> <person> <persName xml:id="HJ"><forename>Hattie</forename> <surname>Jacobs</surname></persName> <sex>female</sex> <residence notBefore-iso="1907-09-09" notAfter-iso="1910-09-06"></residence> </person> </listPerson>

  25. Referencing personography data in the transcription • Use @ref to refer to the @xml:id you assigned to that person <address> <addrLine> Miss <persName ref="#HJ">Hattie Jacobs</persName> </addrLine> <settlement>Madrid</settlement> <region>Iowa</region></address>
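The link between @ref and @xml:id can be followed programmatically, which is what lets a search or browse tool treat "Hattie" and "Hattie Jacobs" as the same person. The Python sketch below is a minimal illustration under stated assumptions: the sample document is abbreviated (its header omits the other required elements), it has no namespace declaration, and the function name `resolve_people` is hypothetical.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes xml:id under the predefined XML namespace.
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

# Abbreviated, hypothetical document combining the personography and a tagged mention.
SAMPLE = """<TEI>
  <teiHeader>
    <fileDesc>
      <sourceDesc>
        <listPerson>
          <person>
            <persName xml:id="HJ"><forename>Hattie</forename> <surname>Jacobs</surname></persName>
            <sex>female</sex>
          </person>
        </listPerson>
      </sourceDesc>
    </fileDesc>
  </teiHeader>
  <text>
    <body>
      <p>How is <persName ref="#HJ">Hattie</persName>?</p>
    </body>
  </text>
</TEI>"""


def resolve_people(xml_text):
    """Pair each tagged name in the transcription with its personography entry."""
    root = ET.fromstring(xml_text)
    # Index each personography name by its xml:id.
    index = {}
    for person in root.iter("person"):
        name = person.find("persName")
        if name is not None and name.get(XML_ID):
            full_name = " ".join("".join(name.itertext()).split())
            index[name.get(XML_ID)] = full_name
    # Look up every mention inside <text>; unresolved references map to None.
    mentions = []
    for mention in root.find("text").iter("persName"):
        ref = mention.get("ref", "")
        mentions.append((mention.text, index.get(ref.lstrip("#"))))
    return mentions


print(resolve_people(SAMPLE))  # prints [('Hattie', 'Hattie Jacobs')]
```

The leading "#" in @ref marks a pointer to an identifier in the same document, so it is stripped before the lookup.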

  26. Other global lists • Similarly, you can use @xml:id to create a global list of other elements • <listPlace> • <listOrg> • <listBibl> • <listEvent>

  27. Using <teiCorpus> • <teiCorpus> can be used as a wrapper root element for multiple <TEI> documents • <teiCorpus> has its own global header for capturing metadata about all of the <TEI> documents it contains • Example – postcards: http://slis.uiowa.edu/~jlee/239/sampledocs/NoblePostcardsTEI.xml
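To make the wrapper idea concrete, this Python sketch nests two small <TEI> documents inside one <teiCorpus> that carries its own header. Everything in it is illustrative: the two postcard documents, their titles, and the function name `build_corpus` are hypothetical, and the namespace is omitted for brevity, as in the slides.

```python
import xml.etree.ElementTree as ET

# Two hypothetical, abbreviated TEI documents.
POSTCARD_1 = """<TEI><teiHeader><fileDesc>
<titleStmt><title>Postcard 1</title></titleStmt>
<publicationStmt><p>Unpublished.</p></publicationStmt>
<sourceDesc><p>Handwritten postcard.</p></sourceDesc>
</fileDesc></teiHeader><text><body><p>First card.</p></body></text></TEI>"""

POSTCARD_2 = POSTCARD_1.replace("Postcard 1", "Postcard 2").replace("First", "Second")


def build_corpus(corpus_title, tei_documents):
    """Wrap parsed <TEI> documents in a <teiCorpus> with its own global header."""
    corpus = ET.Element("teiCorpus")
    file_desc = ET.SubElement(ET.SubElement(corpus, "teiHeader"), "fileDesc")
    ET.SubElement(ET.SubElement(file_desc, "titleStmt"), "title").text = corpus_title
    ET.SubElement(ET.SubElement(file_desc, "publicationStmt"), "p").text = "Unpublished."
    ET.SubElement(ET.SubElement(file_desc, "sourceDesc"), "p").text = "A set of postcards."
    # Each member document keeps its own <teiHeader> inside the corpus.
    for document in tei_documents:
        corpus.append(ET.fromstring(document))
    return corpus


corpus = build_corpus("Sample postcard corpus", [POSTCARD_1, POSTCARD_2])
print([child.tag for child in corpus])  # prints ['teiHeader', 'TEI', 'TEI']
```

The corpus header describes the collection as a whole, while each nested document still describes itself.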

  28. Take a break

  29. In class • Continue Assignment 5: Mark up digital texts in TEI • If you have finished encoding the basic structure in your TEI documents: • try enhancing your markup with name, date, and place information • try nesting your TEI documents within one <teiCorpus> document • try building a personography
