
Beyond TEI-Lite...


Presentation Transcript

  1. Beyond TEI-Lite... An overview of various TEI pizze

  2. The TEI base modules • Prose • Verse • Drama • Transcribed speech • Print dictionaries • Terminological databases

  3. The Verse base module • Adds numbered <lg1>, <lg2> etc. to the <lg> in the core (by analogy with <div> and <div1>) • Additional attributes met, real, and rhyme for metrical and rhyme analysis • Metrical notations may be defined by a <metDecl> element in the header

  4. Line groups <lg type="stanza"><lg type="sestet"> <l>In the first year of Freedom's second dawn</l> <l>Died George the Third; although no tyrant, one</l> <l>Who shielded tyrants, till each sense withdrawn</l> <l>Left him nor mental nor external sun:</l> <l>A better farmer ne'er brushed dew from lawn,</l> <l>A worse king never left a realm undone!</l></lg> <lg type="couplet"> <l>He died &mdash; but left his subjects still behind, </l> <l>One half as mad &mdash; and t'other no less blind. </l> </lg></lg> <lg1 type="stanza"><lg2 type="sestet"> <l>In the first year of Freedom's second dawn</l> <l>Died George the Third; although no tyrant, one</l> <l>Who shielded tyrants, till each sense withdrawn</l> <l>Left him nor mental nor external sun:</l> <l>A better farmer ne'er brushed dew from lawn,</l> <l>A worse king never left a realm undone!</l></lg2> <lg2 type="couplet"> <l>He died &mdash; but left his subjects still behind, </l> <l>One half as mad &mdash; and t'other no less blind. </l> </lg2></lg1>

  5. Metrical analysis <lg1 met="-+-+-+/"> <l real="-+-+-++">This morn, thy gallant bark, love,</l> <l>Sail'd on the sunny sea;</l> <!-- … --> </lg1>
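
The interplay of met and real can be sketched in a few lines of Python. This is only an illustration, assuming the fragment is well-formed XML and assuming that a line without its own real attribute simply takes the met pattern of its enclosing group; the function name `effective_meters` is hypothetical.

```python
import xml.etree.ElementTree as ET

# Well-formed version of the slide's fragment (the elided lines are omitted).
VERSE = """
<lg1 met="-+-+-+/">
  <l real="-+-+-++">This morn, thy gallant bark, love,</l>
  <l>Sail'd on the sunny sea;</l>
</lg1>
"""

def effective_meters(xml_text):
    """Return (line text, pattern, deviates) for every <l> in the group.

    Assumption: a line with no `real` attribute follows the group's `met`.
    """
    group = ET.fromstring(xml_text)
    default = group.get("met")
    rows = []
    for line in group.iter("l"):
        real = line.get("real")
        rows.append((line.text.strip(), real or default, real is not None))
    return rows

meters = effective_meters(VERSE)
for text, pattern, deviates in meters:
    # Lines whose realization departs from the expected pattern get a star.
    print(f"{pattern:8} {'*' if deviates else ' '} {text}")
```

Only the first line is flagged, because it is the only one carrying a real attribute.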

  6. Rhyme <lg rhyme="AB-BBA"> <l>The sunlight on the garden</l> <l>Hardens and grows cold,</l> <l>We cannot cage the minute</l> <l>Within its nets of gold</l> <l>When all is told</l> <l>We cannot beg for pardon.</l> </lg> <lg rhyme="AB-BBA"> <l>The sunlight on the <seg id="A1">garden</seg></l> <l><seg id="A2">Harden</seg>s and grows <seg id="B1">cold,</seg></l> <l>We cannot cage the <seg id="C1">minute</seg></l> <l>Wi<seg id="C2">thin it</seg>s nets of <seg id="B2">gold</seg></l> <l>When all is <seg id="B3">told</seg></l> <l>We cannot beg for <seg id="A3">pardon</seg>.</l> </lg> <linkGrp type="rhyme"> <link targets="A1 A2 A3"/> <link targets="B1 B2 B3"/> <link targets="C1 C2"/> </linkGrp>
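
To show what the <link> elements buy you, here is a minimal Python sketch that resolves each link's targets back to the rhyming segments. It assumes the markup is well-formed XML; the <text> wrapper and the function name `rhyme_sets` are added purely for illustration.

```python
import xml.etree.ElementTree as ET

POEM = """
<text>
  <lg rhyme="AB-BBA">
    <l>The sunlight on the <seg id="A1">garden</seg></l>
    <l><seg id="A2">Harden</seg>s and grows <seg id="B1">cold,</seg></l>
    <l>We cannot cage the <seg id="C1">minute</seg></l>
    <l>Wi<seg id="C2">thin it</seg>s nets of <seg id="B2">gold</seg></l>
    <l>When all is <seg id="B3">told</seg></l>
    <l>We cannot beg for <seg id="A3">pardon</seg>.</l>
  </lg>
  <linkGrp type="rhyme">
    <link targets="A1 A2 A3"/>
    <link targets="B1 B2 B3"/>
    <link targets="C1 C2"/>
  </linkGrp>
</text>
"""

def rhyme_sets(xml_text):
    """Return one list of segment texts per <link>, in document order."""
    root = ET.fromstring(xml_text)
    # Index every <seg> by its id so link targets can be looked up.
    by_id = {seg.get("id"): seg.text.strip() for seg in root.iter("seg")}
    return [[by_id[t] for t in link.get("targets").split()]
            for link in root.iter("link")]

rhymes = rhyme_sets(POEM)
for group in rhymes:
    print(" / ".join(group))
```

Because the segments are addressed by identifier, internal rhymes such as "minute" and "thin it" are captured as easily as line-final ones.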

  7. The Drama base module adds... • Specialised elements to front matter: • <set>, <prologue>, <epilogue>, <performance> • <castList>, <castItem>, <role>, <roleDesc>,<actor> • Stage business: <move> • Specialised elements for film or tv scripts • <view>, <camera>, <caption>, <sound>

  8. Cast lists <castList><head>ACTEURS.</head> <castItem><role id="JUP">JUPITER</role> <roleDesc> Arlequin.</roleDesc></castItem> <castItem><role id="MER">MERCURE</role> <roleDesc> Scaramouche.</roleDesc></castItem> <castItem><role>ISABELLE.</role></castItem> <castItem><role>PIERROT.</role></castItem> <castItem><roleDesc>OMBRES qui sortent des Enfers.</roleDesc></castItem> <castItem><roleDesc>QUATRE FURIES.</roleDesc></castItem> <castItem><role>PLUTON.</role></castItem> <castItem><roleDesc>Plusieurs Danseurs &amp; Danseuses.</roleDesc></castItem></castList>

  9. Speeches and stage directions <sp who="JUP"><speaker>JUPITER</speaker> <l>Si ma Ma&icirc;tresse est infidelle,</l> <l>Je veux en &ecirc;tre convaincu,</l> <l>Mercure, ce soir avec elle</l> <l>T&acirc;che de me faire cocu.</l></sp> <stage>Mercure fait plusieurs lazzy, &amp; lui fait entendre que sa Ma&icirc;tresse est dans l'empire de Pluton. ... Quatre Furies sortent aussi des Enfers, qui dansent. Mercure dit &agrave; Pluton.</stage> <sp who="MER"><l>Pluton, faites-nous donc paro&icirc;tre</l> <l>Les habitans de ce s&eacute;jour:</l> <l> Afin de les mieux reconno&icirc;tre, </l> <l> Que chacun passe tour &agrave; tour. </l></sp>
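
The who attribute on <sp> ties each speech to a <role> in the cast list, even when no <speaker> label is printed. The Python sketch below follows that link. It assumes well-formed XML with the character entities already replaced by literal characters (the standard-library parser does not know &icirc; and friends); the <text> wrapper and the name `speeches` are hypothetical.

```python
import xml.etree.ElementTree as ET

PLAY = """
<text>
  <castList>
    <castItem><role id="JUP">JUPITER</role><roleDesc>Arlequin.</roleDesc></castItem>
    <castItem><role id="MER">MERCURE</role><roleDesc>Scaramouche.</roleDesc></castItem>
  </castList>
  <sp who="JUP"><speaker>JUPITER</speaker>
    <l>Si ma Maîtresse est infidelle,</l>
    <l>Je veux en être convaincu,</l></sp>
  <sp who="MER">
    <l>Pluton, faites-nous donc paroître</l>
    <l>Les habitans de ce séjour:</l></sp>
</text>
"""

def speeches(xml_text):
    """Return (role name, number of verse lines) for each <sp>."""
    root = ET.fromstring(xml_text)
    # Only roles that carry an id can be the target of a who attribute.
    roles = {r.get("id"): r.text.strip()
             for r in root.iter("role") if r.get("id")}
    result = []
    for sp in root.iter("sp"):
        name = roles.get(sp.get("who"), "?")  # "?" marks an unresolved speaker
        result.append((name, len(sp.findall("l"))))
    return result

cast_speeches = speeches(PLAY)
print(cast_speeches)
```

The second speech has no <speaker> element, yet it still resolves to MERCURE through its who attribute.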

  10. Scripts, captions, FX... <camera>Zoom in to overlay showing some stock film of hansom cabs galloping past.</camera> <caption>London, 1895.</caption> <caption>The residence of Mr Oscar Wilde.</caption> <sound>Suitably classy music starts.</sound> <view>Mix through to Wilde's drawing room. A crowd of suitably dressed folk are engaged in typically brilliant conversation, laughing affectedly and drinking champagne.</view> <sp who="TJ"><speaker>Prince of Wales</speaker> <p>My congratulations, Wilde. Your latest play is a great success.</p></sp>

  11. Transcribing speech • normalization issues • ease of reading vs accuracy • interpretation vs prosody • analogous to problems of handling digitized images

  12. The Spoken base module • components : <u> <event> <kinesic> <vocal> <pause> <shift> • contextual information in header <settingDesc> <particDesc> • facilities for synchronization and timing

  13. Features of speech

  14. Utterances • Basic unit of discourse, corresponding to speaker turns • Optionally grouped into higher-level divisions (<div>s), e.g. to mark discourse function • Linked by the who attribute to a <person> description in the header

  15. Vocals and events • Empty elements are used to mark paralinguistic phenomena <u who="Jan">This is just delicious</u> <event desc="telephone rings"/> <u who="Kim">I'll get it</u> <u who="Tom">I used to <vocal desc="cough"/> smoke a lot</u> <u who="Bob"><vocal desc="sniff"/>He thinks he's tough</u> <vocal who="Ann" desc="snorts"/>

  16. Voice quality and prosody • The <shift> element is used to mark changes in voice quality • Other prosodic features may be marked using specific kinds of <seg> or entity references <u who="LB"> <shift feature="loud" new="f"/>Elizabeth</u> <u who="EB">Yes</u> <u who="LB"><shift/>Come and try this <pause/> <shift feature="loud" new="ff"/>come on</u>

  17. Another example <u who="MAR">you never <pause/> take this cat for show and tell <pause dur="5"/> meow meow</u> <u who="ROS">yeah well I dont want to</u> <event desc="toy cat has bell in tail which continues to make a tinkling sound"/> <vocal who="MAR" desc="meows"/> <u who="ROS">because it is so old</u> <u who="MAR">how <reg orig="bout">about</reg> your cat <pause/> yours is new <kinesic desc="shows Father the cat"/></u> <u who="FAT" trans="pause">that <pause/> darling</u> <u who="MAR"><s>no mine isnt old</s> <s>mine is just um a little dirty</s></u>

  18. Participant Description <person id="P1" sex="F" age="mid"> <birth date="1950-01-12"> <date>12 Jan 1950</date> <name type="place">Shropshire, UK</name> </birth> <firstLang>English</firstLang> <langKnown>French</langKnown> <residence>Long term resident of Hull</residence> <education>University postgraduate</education> <occupation>Unknown</occupation> <socecstatus source="PEP" code="B2"/> </person> <person id="P1" sex="F" age="mid"> <p>Female informant, well-educated, born in Shropshire UK, 12 Jan 1950, of unknown occupation. Speaks French fluently. Socio-Economic status B2 in the PEP classification scheme.</p> </person>

  19. Setting Description <settingDesc> <setting who="P1 P2"><name type="city">Bedford</name> <name type="region">UK: South East</name> <date value="1989">early spring, 1989</date> <locale>rug of a suburban home</locale><activity>playing</activity></setting> <setting who="P3"><name type="city">Bedford</name><name type="region">UK: South East</name><date value="1989">early spring, 1989</date><locale>at the sink</locale> <activity>washing-up</activity></setting> <setting who="P4"><name type="place">London, UK</name> <time>unknown</time><locale>broadcasting studio</locale> <activity>radio performance</activity> </setting></settingDesc> • e.g. from P2

  20. Timing • Pausing • use the <pause> element • Duration • use the dur attribute • Overlap • use the trans attribute

  21. Overlap Have you heard the the election results? its a disaster its a miracle <u id="A1" who="A">Have you heard the</u> <u id="B1" who="B" trans="latching">the election results? </u> <u id="A2" who="A" trans="pause">its a disaster</u> <u id="B2" who="B" trans="overlap">its a miracle </u>
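
As a rough illustration of how the trans attribute might be consumed, the Python sketch below lists each turn with its transition type and then picks out the overlapping ones. It assumes well-formed XML and assumes that an utterance with no trans attribute counts as a "smooth" transition; the <text> wrapper and the name `turns` are hypothetical.

```python
import xml.etree.ElementTree as ET

TALK = """
<text>
  <u id="A1" who="A">Have you heard the</u>
  <u id="B1" who="B" trans="latching">the election results?</u>
  <u id="A2" who="A" trans="pause">its a disaster</u>
  <u id="B2" who="B" trans="overlap">its a miracle</u>
</text>
"""

def turns(xml_text):
    """Return (speaker, transition, words) for every <u>.

    Assumption: a missing trans attribute means a smooth transition.
    """
    root = ET.fromstring(xml_text)
    return [(u.get("who"), u.get("trans", "smooth"), u.text.strip())
            for u in root.iter("u")]

all_turns = turns(TALK)
overlapping = [t for t in all_turns if t[1] == "overlap"]
for who, trans, words in all_turns:
    print(f"{who} [{trans}] {words}")
```

Note that trans only says how a turn begins relative to the previous one; it does not say which words overlap, which is what the synchronization mechanisms are for.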

  22. The Dictionary base tagset • primarily for printed dictionaries, rather than lexica or dictionary production systems • <entry>, <entryFree>, and <superEntry> • <sense> and <hom> • logical structure vs. typographic fidelity

  23. Constituents of a Dictionary Entry • the form group • the grammatical-information group • the definition or translation • etymology • examples • usage information • cross-references to other entries • notes and related entries

  24. Dictionary components (1) • <form> grouping element for one or more of <orth> <pron> <hyph> <syll> <stress> etc. • <gramGrp> groups specialised grammatical tags <gen>, <number> etc • <def> for definition text, <trans> for translation • <etym> for etymology

  25. Dictionary components (2) • examples <eg> • usage note <usg> • label <lbl> • related entries <re> and specialized pointers <oRef>, <pRef> etc

  26. Simple example <entry><form><orth>OATS,</orth></form> <gram>n. s.</gram> <etym>[aten, Sax.]</etym> <def>A grain, which in England is generally given to horses; but in Scotland supports the people.</def></entry>

  27. The additional modules • Linking, segmentation and alignment • Simple analytic mechanisms • Feature structures • Certainty and responsibility • Transcription of primary sources • Text-critical analysis • Names and dates • Graphs, networks and trees • Tables, formulae and graphics • Language corpora

  28. Linking, segmentation, alignment • Provides generic segmentation elements • Provides extended pointer syntax and linking • <xptr>, <xref>, <link>, <linkGrp> etc. • Extensive set of attributes for linkage, correspondence, synchronization, aggregation, alternation, and interpretation

  29. Generic segmentation elements • <seg> for arbitrary (nesting) segmentation • <s> for end-to-end segmentation • use type attribute to subcategorise • <anchor> for points • Segmentation is the key to successful linking and analysis

  30. Clustering • (Difficulty (is being expressed) • with ((the method) (to be used))) <s>Difficulty <seg>is being expressed</seg> with <seg><seg>the method</seg> <seg>to be used</seg></seg></s>
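
The bracketed notation and the nested <seg> markup carry the same information, which a short recursive function can demonstrate. This is a sketch in Python, assuming well-formed XML; the function name `bracket` is hypothetical.

```python
import xml.etree.ElementTree as ET

SENTENCE = ("<s>Difficulty <seg>is being expressed</seg> with "
            "<seg><seg>the method</seg> <seg>to be used</seg></seg></s>")

def bracket(elem):
    """Render an element and its nested segments as a parenthesized string."""
    parts = []
    if elem.text and elem.text.strip():
        parts.append(elem.text.strip())
    for child in elem:
        parts.append(bracket(child))          # recurse into nested segments
        if child.tail and child.tail.strip():
            parts.append(child.tail.strip())  # text following the child
    return "(" + " ".join(parts) + ")"

clustered = bracket(ET.fromstring(SENTENCE))
print(clustered)
# (Difficulty (is being expressed) with ((the method) (to be used)))
```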

  31. discontinuous segments • fundamental problem • join by internal or external links “You put it,” Quill reminded him, “in the safe.” <s id="s1">"You put it,"</s> <s id="s2">Quill reminded him,</s> <s id="s3">"in the safe."</s>

  32. discontinuous segments “You put it,” Quill reminded him, “in the safe.” • can also use PART attribute to indicate that segments are incomplete <s id="s1" next="s3">"You put it,"</s> <s id="s2">Quill reminded him,</s> <s id="s3" prev="s1">"in the safe."</s>

  33. discontinuous segments “You put it,” Quill reminded him, “in the safe.” <s id="s1">“You put it,”</s> <s id="s2">Quill reminded him,</s> <s id="s3">“in the safe.”</s> <join targets="s1 s3" result="s"/>
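
A <join> is an external link: it leaves the segments where they are and names the pieces that belong together. The Python sketch below reassembles the virtual element a <join> describes. It assumes well-formed XML; the <p> wrapper and the name `joined_texts` are added for illustration only.

```python
import xml.etree.ElementTree as ET

PASSAGE = ('<p><s id="s1">“You put it,”</s> '
           '<s id="s2">Quill reminded him,</s> '
           '<s id="s3">“in the safe.”</s>'
           '<join targets="s1 s3" result="s"/></p>')

def joined_texts(xml_text):
    """Return the reassembled text for each <join>, in target order."""
    root = ET.fromstring(xml_text)
    # Index every identified element so join targets can be resolved.
    by_id = {e.get("id"): e for e in root.iter() if e.get("id")}
    return [" ".join(by_id[t].text.strip() for t in j.get("targets").split())
            for j in root.iter("join")]

speech = joined_texts(PASSAGE)
print(speech[0])
```

The narrator's interruption (s2) is skipped, so the two halves of the quoted speech read as one sentence.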

  34. Translation pairs <s id="s1" corresp="s2" lang="EN">For a long time I used to go to bed early</s> <s id="s2" corresp="s1" lang="FR">Longtemps je me couchais de bonne heure</s> and/or.... <correspGrp type="trans"> <link targets="s1 s2"/> </correspGrp>

  35. Synchronization • of whole elements • of points in time <u id="A2" who="A" synch="B2">its a disaster</u> <u id="B2" who="B">its a miracle</u> <u id="A1" who="A">Have you heard <anchor id="AO1"/>the</u> <u id="B1" who="B" start="AO1"><anchor id="BO1"/>the election results? yes</u>

  36. Analytic mechanisms • Specific kinds of segment for linguistic analyses • Specialized interpretive pointers (<span> and <spanGrp>) • The ana attribute and its possible targets • <interp> and <interpGrp> • feature systems <fs> and <fsd>

  37. Arbitrary characterizations • The <span> element can be used to point into a stretch of a text and characterize it in any way • Targets must be SGML identifiers <spanGrp resp="LB" type="thematic"> <span value="ships" from="P1" to="P2"/> <span value="shoes" from="P4" to="P8"/> <span value="sealing wax" from="P12" to="P14"/> </spanGrp>

  38. More detailed analysis • the ana attribute is of type IDREFS • what does VVD identify? • a prose description • an <interp> element • a feature structure <w ana="VVD">annotated</w>

  39. using interp... <w ana="VVD">annotated</w> <w ana="NN2">corpora</w> <interp id="VVD" type="lexical class" value="verb past tense"/> <interp id="NN2" type="lexical class" value="noun plural"/>
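
Resolving an ana value against <interp> elements is a plain identifier lookup. The Python sketch below shows one way to do it, assuming well-formed XML; since ana is of type IDREFS, each word may point at several interpretations, so the value is split on whitespace. The <text> wrapper and the name `annotate` are hypothetical.

```python
import xml.etree.ElementTree as ET

TAGGED = """
<text>
  <w ana="VVD">annotated</w> <w ana="NN2">corpora</w>
  <interp id="VVD" type="lexical class" value="verb past tense"/>
  <interp id="NN2" type="lexical class" value="noun plural"/>
</text>
"""

def annotate(xml_text):
    """Return (word, [interpretation values]) for every <w>."""
    root = ET.fromstring(xml_text)
    interps = {i.get("id"): i.get("value") for i in root.iter("interp")}
    return [(w.text, [interps[ref] for ref in w.get("ana", "").split()])
            for w in root.iter("w")]

tagged_words = annotate(TAGGED)
for word, values in tagged_words:
    print(word, "->", ", ".join(values))
```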

  40. Hierarchic bundling of interps • nouns can be common or proper • nouns can be singular or plural <interpGrp value="nominal"> <interpGrp value="common"> <interp value="singular"/> <interp value="plural"/> </interpGrp> </interpGrp>

  41. Feature structures • a feature structure consists of a bundle of features • a feature has a name and a value • values may be binary switches, symbols, strings, or feature structures • bundling may be constrained in various (not necessarily hierarchic) ways

  42. Using a feature structure... <w ana="NN2">corpora</w> <fs id="NN2"> <f name="class"><sym value="noun"/></f> <f name="number"><sym value="plural"/></f> <f name="proper"><minus/></f> </fs> <fs id="NN1"> <f name="class"><sym value="noun"/></f> <f name="number"><sym value="singular"/></f> <f name="proper"><plus/></f> </fs>

  43. ...feature definitions may be stored as a feature library... <fLib> <f id="FCN" name="class"><sym value="noun"/></f> <f id="FN1" name="number"><sym value="singular"/></f> <f id="FN2" name="number"><sym value="plural"/></f> <f id="FPM" name="proper"><minus/></f> ... </fLib>

  44. ...and invoked by reference <fLib> <f id="FCN" name="class"><sym value="noun"/></f> <f id="FN1" name="number"><sym value="singular"/></f> <f id="FN2" name="number"><sym value="plural"/></f> <f id="FPM" name="proper"><minus/></f> ... </fLib> <fs id="NN1" feats="FCN FPM FN1"/> <fs id="NN2" feats="FCN FPM FN2"/>
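
Invoking features by reference means a processor has to expand each feats value against the library before it can use the feature structure. The Python sketch below does that expansion. It assumes well-formed XML, leaves out the elided part of the library, and maps <plus/> and <minus/> to Python booleans as a convenience; the <text> wrapper and the names `value_of` and `expand` are hypothetical.

```python
import xml.etree.ElementTree as ET

LIBRARY = """
<text>
  <fLib>
    <f id="FCN" name="class"><sym value="noun"/></f>
    <f id="FN1" name="number"><sym value="singular"/></f>
    <f id="FN2" name="number"><sym value="plural"/></f>
    <f id="FPM" name="proper"><minus/></f>
  </fLib>
  <fs id="NN1" feats="FCN FPM FN1"/>
  <fs id="NN2" feats="FCN FPM FN2"/>
</text>
"""

def value_of(feature):
    """Return a feature's value: a symbol string, or a bool for plus/minus."""
    child = feature[0]
    if child.tag == "sym":
        return child.get("value")
    return {"plus": True, "minus": False}[child.tag]

def expand(xml_text):
    """Map each <fs> id to a dict of feature name -> value."""
    root = ET.fromstring(xml_text)
    library = {f.get("id"): f for f in root.iter("f")}
    return {fs.get("id"): {library[ref].get("name"): value_of(library[ref])
                           for ref in fs.get("feats").split()}
            for fs in root.iter("fs")}

structures = expand(LIBRARY)
print(structures["NN2"])
```

Each feature is defined once and shared: NN1 and NN2 differ only in which number feature they reference.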

  45. Not covered here • Certainty and responsibility • Names and dates • Graphs, networks and trees • Tables, formulae and graphics • Language corpora

  46. Summary Scholars want a lot! • orthographic transcription • all languages of all types of all times • pointer(s) to digital recording or images • markup of proper nouns, dates, times, etc. • part-of-speech and morphological tagging • syntactic, semantic, stylistic or other analyses • cross references to other material on the topic • editorial commentary and annotation • etc., etc., etc. The TEI scheme is designed to facilitate these and more: but to get the best out of it, you have to know what is there...
