
OSIS – A Closer Look

OSIS – A Closer Look. Steven J. DeRose, Ph.D. Chair, Bible Technologies Group http://www.bibletechnologies.net sderose@acm.org November 22, 2002. Why have a standard? (first, for publishers). Can reduce the costs of: Editing and publication process Software purchase, training, maintenance


Presentation Transcript


  1. OSIS – A Closer Look Steven J. DeRose, Ph.D. Chair, Bible Technologies Group http://www.bibletechnologies.net sderose@acm.org November 22, 2002

  2. Why have a standard? (first, for publishers) • Can reduce the costs of: • Editing and publication process • Software purchase, training, maintenance • Rekeying, scanning, and conversion • Lets texts survive when your WP or typesetting program goes obsolete • Facilitates multi-format, multi-platform delivery and distribution • Enables use of generic tools

  3. Why have a standard? (next, for users) • Lets you obtain the same texts regardless of what reading and other tools you use • Because the publisher does no more work to support 10 than to support 1 • Helps texts survive when your book-reading software goes obsolete • Reduced costs • Better, more reliable resources • Enables communities of interest • Shared notes, collaborative study, …

  4. The medium picture • [Diagram: XML/OSIS text surrounded by XHTML, Typeset, OCR, Braille, HTML, PDF, WPs, Open eBook, Other XML, Palmtops, Cell delivery] • Cost savings usually start here • 4+7 convertors instead of 4×7 (and reality is bigger)
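
The arithmetic on this slide can be sketched as follows. This is only an illustration, assuming 4 input formats and 7 output formats as the slide's numbers suggest; the function names are made up for the example.

```python
def direct_convertors(n_inputs: int, n_outputs: int) -> int:
    """Without a shared format, every (input, output) pair needs its own convertor."""
    return n_inputs * n_outputs


def hub_convertors(n_inputs: int, n_outputs: int) -> int:
    """With a shared hub format, each input and each output needs one convertor."""
    return n_inputs + n_outputs


if __name__ == "__main__":
    print("direct:", direct_convertors(4, 7))
    print("via hub:", hub_convertors(4, 7))
```

The gap widens as formats are added, which is the point of "and reality is bigger".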

  5. The basic principle: “Descriptive markup” • WPs only see “huge, bold, space before” • Now find/reformat all chapter headings • Expensive to apply a house style or look/feel • Hard to create diverse forms: • Web, paper, and braille publication • A perfect user could use stylesheets • But interfaces make inconsistent work easier • Instead: say what kind of portion each is • A formatter applies rules by kind

  6. Why should I separate out the formatting? • It speeds your work • You can use a stylesheet from someone else, and not have to do any manual formatting • Typesetter can enhance formatting without risking corrupting your content • Therefore, less time wasted reviewing galleys • Multiple formats from the same source • Print, braille, Web, etc. • House styles for different journals • Last-minute changes are safer, cheaper • Especially crucial for Bible publishing

  7. Why not just use HTML? • HTML is nice but lacks • Units like poem, chapter, verse, inscription • Ways to annotate for meaning, grammar, etc. • Support for reference systems: "Matt 1:1" • Multi-purpose tags like <b>, <i>, etc. • Are hard to tease apart when you need to • HTML limitations encourage using tables to force layout, making re-use infeasible • And…

  8. Compare • <item> <desc>Cashmere sweater</desc> <price unit='yen'>120000</price></item><item> <desc>Socks</desc> <price unit='yen'>1000</price></item> versus: • <br>Cashmere sweater, ¥120000<br>Socks, ¥1000

  9. Why is the markup better? • When relations are marked, an indexer can match price with item • If not, there is no reliable way • (there are lots of ways one might guess…) • A search for “Cashmere and ¥1000” hits • Needlessly annoying the searcher • How many false hits have you had like this? Markup is not just about formatting
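
A minimal sketch of the contrast, using the sweater and socks data from the previous slide. The `<catalog>` wrapper element and both function names are assumptions added so the example can run; they are not part of the slides.

```python
import xml.etree.ElementTree as ET

# Marked-up version: each price is explicitly tied to its item.
MARKED_UP = """<catalog>
  <item><desc>Cashmere sweater</desc><price unit='yen'>120000</price></item>
  <item><desc>Socks</desc><price unit='yen'>1000</price></item>
</catalog>"""

# Flat version: only line breaks and punctuation hint at the relations.
FLAT = "<br>Cashmere sweater, ¥120000<br>Socks, ¥1000"


def items_matching(xml_text, word, price):
    """Return descriptions of items whose own desc and price both match."""
    root = ET.fromstring(xml_text)
    hits = []
    for item in root.iter("item"):
        desc = item.findtext("desc", default="")
        item_price = item.findtext("price", default="")
        if word.lower() in desc.lower() and item_price == price:
            hits.append(desc)
    return hits


def flat_text_matches(text, word, price):
    """Naive search: both strings occur somewhere in the text."""
    return word.lower() in text.lower() and ("¥" + price) in text


if __name__ == "__main__":
    print(items_matching(MARKED_UP, "Cashmere", "1000"))
    print(flat_text_matches(FLAT, "Cashmere", "1000"))
```

The structured search finds no cashmere item at ¥1000, while the naive text search reports a hit because "Cashmere" and "¥1000" both appear somewhere in the flat string.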

  10. How do you spell XML? • The Extensible Markup Language • HTML on steroids (sort of) • Key features: • Intrinsic support for Unicode • Ability to create your own units • Ability to validate how they are used • (no chapters inside footnotes, etc.) • Very easy for computers to process • Separates formatting (remember earlier)

  11. OSIS and XML • OSIS is an application of XML • XML specifies the syntax • OSIS specifies a lexicon for our genre  Life would be easy if natural languages were that simple! • There are many other lexica for XML • Humanities: Text Encoding Initiative • Closely related to OSIS

  12. What is OSIS, really? • OSIS defines: • A set of XML element types • p, verse, inscription, note, … • Certain attributes for those types • type="devotional" • A standard form for Biblical references • A consistent way to write them down • A way to specify within-verse locations • A way to refer to editions and translations, or to refer to a passage generically

  13. Concept: a hierarchy • [Tree diagram; node labels as extracted:] osis • osisText • header • work osisWork='KJV' • title, language, identifier • div type='book' • div type='chapter' • p, p • verse, verse, verse osisID='Gen.1.3' • text content, note, text content, inscription

  14. What's under the covers? • All of this is represented by inserting markers ("tags") into the text • Like HTML but more consistent • All starts and ends are explicit • Three kinds: • Start tags: <p> • End tags: </p> • Empty tags: <milestone/> • A start tag, its content, and its end tag together form an element: <p>Jesus wept.</p>

  15. What else is there? • Elements can contain other elements • <div type="chapter"> <verse>In the beginning...</verse> <verse>And the Word...</verse>...</div> • Many elements can also contain text • Some elements require or prohibit others • No <div> inside <abbr> • An empty tag just marks a point • <milestone type="pb"/>
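
The "no <div> inside <abbr>" rule can be illustrated with a small tree walk. This is a sketch only: in practice the OSIS schema and a validating parser enforce such rules, and the `FORBIDDEN_DESCENDANTS` table below is a hypothetical one-entry stand-in for that schema.

```python
import xml.etree.ElementTree as ET

# Hypothetical rule table: element names that may not occur anywhere
# below the named ancestor. Only the slide's single example is listed.
FORBIDDEN_DESCENDANTS = {"abbr": {"div"}}


def nesting_violations(xml_text):
    """Return (ancestor, element) pairs that break the rule table."""
    root = ET.fromstring(xml_text)
    violations = []

    def walk(element, ancestors):
        for ancestor in ancestors:
            if element.tag in FORBIDDEN_DESCENDANTS.get(ancestor, set()):
                violations.append((ancestor, element.tag))
        for child in element:
            walk(child, ancestors + [element.tag])

    walk(root, [])
    return violations


if __name__ == "__main__":
    ok = '<div type="chapter"><verse>text <abbr>v.</abbr></verse></div>'
    bad = '<p><abbr>v. <div type="chapter"/></abbr></p>'
    print(nesting_violations(ok))
    print(nesting_violations(bad))
```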

  16. Attributes • Usually modify a whole element • Appear only inside start tags • <name type="nonhuman">Baal</name> • <div type="chapter">…</div> • <verse osisID="Rev.22.21"> • <q who="God"> • <transChange type="added">

  17. The full set of (68) tags • abbr • actor • caption • castGroup • castItem • castList • catchWord • cell • closer • contributor • coverage • creator • date • description • div • divineName • figure • foreign • format • head • header • hi • identifier • index • inscription • item • l • label • language • lg • list • mentioned • milestone • milestoneEnd • milestoneStart • name • note • osis • osisCorpus • osisText • p • publisher • q • rdg • reference • refSystem • relation • revisionDesc • rights • role • roleDesc • row • salute • seg • signed • source • speaker • speech • subject • table • teiHeader • title • transChange • type • verse • w • work

  18. Don't panic • A lot of these get used once each, in the header, almost as a ritual • You can paste a sample header and fill it in • About a dozen form the Dublin Core set for cataloging and identification info • Most of the rest fall into nice groups • The hard parts (later) include • Milestones • Quotes when they cross verses/paragraphs

  19. Three major pieces to OSIS • The markup elements and their attributes • Defined by a schema • The standardized reference system • Partly defined in the schema • Partly defined in grammar and prose • The authority system • A way to declare formal/normalized names • Declaration portion still in process

  20. Basic OSIS markup (What's in a name?)

  21. Sample markup <div type="testament"> <div type="book" osisID="Gen"> <div type="chapter" osisID="Gen.1"> <verse osisID="Gen.1.1">In the beginning God created the heaven and the earth.</verse> <verse osisID="Gen.1.2">And the earth was without form, and void; and darkness was upon the face of the deep. And the Spirit of God moved upon the face of the waters.</verse> <verse osisID="Gen.1.3">And God said, Let there be light: and there was light.</verse> <verse osisID="Gen.1.31">And God saw every thing that he had made, and, behold, it was very good. And the evening and the morning were the sixth day. <note type="x-StudyNote">And the evening...: Heb. And the evening was, and the morning was etc.</note></verse> </div></div></div> </osisText></osis>
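
Markup like this is straightforward to process with a generic XML parser. The sketch below, using Python's standard library, collects verse text keyed by osisID. The `SAMPLE` string is an abridged, self-contained version of the slide's markup (closed off so it parses on its own), and `verses_by_id` is a name made up for the example.

```python
import xml.etree.ElementTree as ET

# Abridged from the slide; only the div/verse structure is kept.
SAMPLE = """<div type="book" osisID="Gen">
  <div type="chapter" osisID="Gen.1">
    <verse osisID="Gen.1.1">In the beginning God created the heaven and the earth.</verse>
    <verse osisID="Gen.1.3">And God said, Let there be light: and there was light.</verse>
  </div>
</div>"""


def verses_by_id(xml_text):
    """Map each verse's osisID to its text content."""
    root = ET.fromstring(xml_text)
    # itertext() also picks up text inside child elements such as <note>;
    # a real tool would probably want to skip or separate those.
    return {
        verse.get("osisID"): "".join(verse.itertext()).strip()
        for verse in root.iter("verse")
    }


if __name__ == "__main__":
    for osis_id, text in verses_by_id(SAMPLE).items():
        print(osis_id, "->", text)
```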

  22. Big generic elements • div  Testament, book, chap, section • type the type of division, as above • divTitle optional display title • title  Title of any div • list  Genealogies and other lists • label • item • table  Mainly for appendixes, etc. • row • cell

  23. Book/chapter/verse • Large units all use the <div> element • It has a type attribute, with values • appendix • book • chapter • concordance • glossary • As with most attributes you can add new values if they start with "x-" • <div type='x-toronto-thing'> • We expect to add more div types in time • <verse osisID="Rev.3.20"> Note: There are no separate tags for testament, book, or chapter
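
The "x-" extension convention lends itself to a simple check. This is a sketch under a loud assumption: `LISTED_DIV_TYPES` holds only the five values named on this slide, not the schema's full list (other slides in this deck also use "testament" and "commentary", for example).

```python
# Only the values named on the slide; NOT the complete list from the schema.
LISTED_DIV_TYPES = {"appendix", "book", "chapter", "concordance", "glossary"}


def is_acceptable_div_type(value):
    """Accept a listed value, or any non-empty extension starting with 'x-'."""
    if value in LISTED_DIV_TYPES:
        return True
    return value.startswith("x-") and len(value) > len("x-")


if __name__ == "__main__":
    for candidate in ("chapter", "x-toronto-thing", "toronto-thing"):
        print(candidate, is_acceptable_div_type(candidate))
```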

  24. Small items • abbr  <abbr expansion="">… • divineName  <divineName>The Lord… • foreign  <foreign lang="">Talitha… • hi  Emphasis in notes/comm • inscription  Mene, mene, tekel, parsin • mentioned  The name <mentioned>Peter • name  Destroyed the <name type= "nonhuman">Baals</name> • p  The ubiquitous paragraph • q  Quotations (more later)

  25. Genre-specific elements • Epistolary: salute, closer • <closer>I, Paul, sign this with my own hand.</closer> • Illustrations: figure • May contain caption, note, index • Poetry: lg, l • Also used for other line-oriented text • lg (line group) can be nested • Drama: speech, speaker • speaker ok in: speech cell closer div inscription l p q salute verse • who attribute can point to a castItem in the header

  26. Inscription <verse osisID="Dan.5.25">This is the inscription that was written: <inscription>Mene, Mene, Tekel, Parsin<note type="">Aramaic UPARSIN (that is, AND PARSIN)</note></inscription></verse> • How many inscriptions can you think of?

  27. About the source/target layout • <milestone> • Use to mark point events • page and column breaks of a source manuscript • Intended screen breaks for display • Types: column footer header line page screen • Note: Do not confuse with milestoneStart and milestoneEnd, which stand in for several other elements when they must cross verse/p boundaries in certain ways.

  28. About the text itself • transChange  Changed in translation • Types: added amplified changed deleted moved • rdg  Variant readings • Used only within notes (for now) • <note>Some ancient mss <rdg>kiss the Son</rdg></note> • seg  (extensions) • w word-level linguistics • Attributes: POS, morph, lemma, gloss, src, xlit

  29. Attributes of all elements (all are optional) • Listed as name (type): meaning • osisRef (osisRefType) • annotateWork (anything): I am about W • annotateType (osisAnnotation): My relation to W • ews (anything) • ID (xs:ID): For Web to link to • lang (languageType): language, wr sys • osisID (osisIDType): reference to here • resp (anything): responsible person • splitID (anything): (later) • type (anything) • subType (anything) • n (anything): name/num of unit

  30. The reference system (I am named, therefore I am)

  31. Header overview • Purpose • Identify the file as an XML file • Identify the file as using the OSIS schema • Say whether it's one text or a collection • Identify and declare names for: • The work itself (title, author, etc) • Other works referenced • Verse reference systems used • Characters in the text <castList>

  32. Header sample <?xml version="1.0" encoding="UTF-8" ?> <osis xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:noNamespaceSchemaLocation="osisCore.1.1.xsd"> <osisText osisIDWork="KJV" osisRefWork="defaultReferenceScheme"> <header> <work osisWork="KJV"> <title>King James Version of 1769</title> <identifier type="OSIS">KJV</identifier> <language>en</language> <refSystem>Bible.KJV</refSystem></work> <work osisWork="defaultReferenceScheme"> <refSystem>Bible.KJV</refSystem></work> </header>
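
A sketch of how a program might read the work declarations from a header like this one. The `HEADER` string repeats only the `<header>` portion of the slide so that it parses on its own; `declared_works` is a name invented for the example.

```python
import xml.etree.ElementTree as ET

# The <header> element from the slide, isolated so it is well-formed by itself.
HEADER = """<header>
  <work osisWork="KJV">
    <title>King James Version of 1769</title>
    <identifier type="OSIS">KJV</identifier>
    <language>en</language>
    <refSystem>Bible.KJV</refSystem>
  </work>
  <work osisWork="defaultReferenceScheme">
    <refSystem>Bible.KJV</refSystem>
  </work>
</header>"""


def declared_works(header_xml):
    """Map each osisWork name to a dict of its child element values."""
    root = ET.fromstring(header_xml)
    return {
        work.get("osisWork"): {
            child.tag: (child.text or "").strip() for child in work
        }
        for work in root.iter("work")
    }


if __name__ == "__main__":
    for name, fields in declared_works(HEADER).items():
        print(name, fields)
```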

  33. Other header elements • osisCorpus • Use inside <osis> when there will be several texts in one document, as for a polyglot • osisCorpus can have its own header • osisCorpus then contains osisText elements • teiHeader • Allows including a fuller TEI-style header • Work uses the standard "Dublin Core" tags to give catalog/bibliography info

  34. Dublin Core • title  The title of the work or collection • creator  The primary author • contributor  Other contributors (set 'role') • identifier  ISBN or similar unique ID of work • date  Publication date • language  Primary language of the work • rights  Statement of permissions/rights • publisher  Name of the publisher • description  An abstract or precis of the work • format  What representation (=OSIS) • coverage  Intended audience and scope • relation  • source  If derived from another work • subject  LCSH or similar subject descr • type • refSystem  (OSIS only, not in D.C.)

  35. Identifying parts of the work • osisID must be specified on any element that has a canonical reference: • <verse osisID="Luk.3.10"> • <p osisID="Rev.3.20"> • <div type="chapter" osisID="Luk.3"> • 3-letter book names, periods to separate • HTML <a name="…"> available as well • More useful in notes/commentary, not Bible • Back-of-book index entries • <index level1="Idols" level2= "burning of" level3="by Hezekiah"> • <index level1="False gods" see="Idols">

  36. When it won't come out even • If several verses are translated as (say) a p • Put all the appropriate osisIDs on the p • <p osisID="Matt.1.1 Matt.1.2"> • If a verse is split across paragraphs • Tag each part; use splitID to number them • <p>…<verse osisID="1Pe.1.3" splitID="1">…</verse></p> <p>…<verse osisID="1Pe.1.3" splitID="2">…</verse>…</p> • milestoneStart … milestoneEnd • Used to mark units that cross boundaries • abbr closer div foreign l lg q salute seg signed speech verse
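
A consumer that wants whole verses back can regroup the split parts. The sketch below assumes splitID values are integers giving the order of the parts, which is how the slide's example uses them; the placeholder verse text and the function name are made up for the example.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Placeholder text stands in for the elided content on the slide.
SAMPLE = """<div>
  <p>before <verse osisID="1Pe.1.3" splitID="1">first part</verse></p>
  <p><verse osisID="1Pe.1.3" splitID="2">second part</verse> after</p>
</div>"""


def join_split_verses(xml_text):
    """Rejoin verse parts that share an osisID, ordered by splitID."""
    root = ET.fromstring(xml_text)
    parts = defaultdict(list)
    for verse in root.iter("verse"):
        # Assumption: splitID is numeric; an unsplit verse counts as part 1.
        order = int(verse.get("splitID", "1"))
        text = "".join(verse.itertext()).strip()
        parts[verse.get("osisID")].append((order, text))
    return {
        osis_id: " ".join(text for _, text in sorted(chunks))
        for osis_id, chunks in parts.items()
    }


if __name__ == "__main__":
    print(join_split_verses(SAMPLE))
```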

  37. References • Reference to other places/works • <note>See also <reference osisRef= "Mat.1.1">Matthew</reference> for a similar theme.</note> • div, figure, note, and reference can also directly refer: • <div type="commentary" osisRef="Luk.3.10"> • This identifies the passage this commentary div is about. • HTML <a href="…"> also available • (more useful in notes/commentary, not Bible)

  38. Reference syntax • NIV.Heb:Psa.42.1-Psa.43.12@cp[12] • [Diagram labels: work ref, canonical ref, range ref, grain ref, fine-grain ref; edition, refsystem, book, chapter, verse, grain type, grain value; cp = 'code point', ~= character]
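
To make the parts of the example reference concrete, here is a rough parsing sketch. It is not the official OSIS reference grammar: the regular expression is an assumption that handles just the shape shown on this slide (optional work prefix before ":", a start, an optional "-" range end, and an optional "@type[value]" grain).

```python
import re

# Assumed shape only: [work:]start[-end][@grainType[grainValue]]
OSIS_REF = re.compile(
    r"^(?:(?P<work>[^:]+):)?"          # e.g. NIV.Heb
    r"(?P<start>[^-@]+)"               # e.g. Psa.42.1
    r"(?:-(?P<end>[^@]+))?"            # e.g. Psa.43.12
    r"(?:@(?P<grain_type>\w+)\[(?P<grain_value>[^\]]+)\])?$"  # e.g. cp[12]
)


def parse_reference(ref):
    """Split a reference string into its named parts (missing parts are None)."""
    match = OSIS_REF.match(ref)
    if match is None:
        raise ValueError(f"unrecognized reference: {ref!r}")
    return match.groupdict()


if __name__ == "__main__":
    print(parse_reference("NIV.Heb:Psa.42.1-Psa.43.12@cp[12]"))
    print(parse_reference("Gen.1.1"))
```

The book, chapter, and verse inside `start` and `end` can then be separated by splitting on the periods.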

  39. Notes • Notes are placed right where they are referenced in the text. • Notes have several types • allusion alternative background citation devotional exegesis explanation study translation enumeration variant • Additional types must start with "x-" • catchWord -- marks referenced text cited within a note • <note><catchWord>hello</catchWord> may also be translated "goodbye" here.</note> • rdg -- marks alternate readings

  40. On to the authority system The name is the thing, and the true name is the true thing. To know the name is to control the thing. -- Ursula LeGuin

  41. Cast-lists • To declare cast of characters • Provides a formal ID for each • Can refer to ID from <speaker>, <q>, etc. • castList • castGroup • castItem • actor • role • roleDesc

  42. The authority system • Only supported for castList at present • We intend to provide • A schema for declaring sets of formal names • A way to invoke such lists in documents • Standard name sets for • Bible versions • Versification schemes • People, places, etc. in the Bible • Journals, classical literature, and other works commonly cited in Biblical studies

  43. OSIS in practice Tourist to police officer: Can you tell me how to get to Carnegie Hall? Officer to tourist: Practice, practice, practice.

  44. How do I know if the markup is correct? • 5 levels of 'correct': SLipshod, Only well-formed, Valid, Accurate, Complete • SL: no check required • O: Load in IE 5+ • V: xp, xmetal, and other true validators • A: requires human proofreading and interpretation • C: there is always more that could be marked up
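
The "only well-formed" level can be checked by any XML parser, not just a browser. A minimal sketch with Python's standard library follows; note that it says nothing about validity against the OSIS schema, which needs a true validator as the slide says. The function name and sample strings are made up for the example.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text):
    """True if the text parses as XML; this does not check schema validity."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError:
        return False
    return True


if __name__ == "__main__":
    good = '<div type="chapter"><verse>In the beginning...</verse></div>'
    # The end tag does not match the start tag, so this is not well-formed.
    bad = '<div type="chapter"><verse>In the beginning...</verse></chapter>'
    print(is_well_formed(good), is_well_formed(bad))
```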

  45. Tools vs. today • Today we will use the raw form • Experts will need to know this • Users should have protective software • Some XML editing programs: • SoftQuad XMetal -- $300 • Open Office -- free, very promising • Some generic-enough HTML editors: • BBEdit, emacs, Netscape Communicator

  46. Getting to OSIS • The cleaner your data, the easier it is • Data is seldom as clean as you think it is • Structured formats (USFM, XSEM, LGM, ThML) are the easiest sources • Tools: • Perl/awk/sed/cc and the like • XSLT if coming from XML • BTG has sponsored development of several convertors. • BTG will maintain a repository of utilities
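
As a flavor of what a small conversion script looks like, here is a sketch in Python rather than Perl/awk/sed. The input format, one verse per line written as "Book chapter:verse text", is purely hypothetical and is not one of the formats named on the slide; real sources need much more care.

```python
import re
from xml.sax.saxutils import escape, quoteattr

# Hypothetical input line: "<book> <chapter>:<verse> <text>"
LINE = re.compile(r"^(?P<book>\w+) (?P<chapter>\d+):(?P<verse>\d+) (?P<text>.+)$")


def line_to_verse(line):
    """Turn one input line into a <verse> element with an osisID."""
    match = LINE.match(line.strip())
    if match is None:
        raise ValueError(f"unrecognized line: {line!r}")
    osis_id = f"{match['book']}.{match['chapter']}.{match['verse']}"
    # escape() protects &, <, > in the text; quoteattr() quotes the attribute.
    return f"<verse osisID={quoteattr(osis_id)}>{escape(match['text'])}</verse>"


if __name__ == "__main__":
    print(line_to_verse("Gen 1:3 And God said, Let there be light: and there was light."))
```

Rejecting lines that do not match, instead of guessing, reflects the slide's warning that data is seldom as clean as you think it is.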

  47. Getting your OSIS XML to display in IE • Make sure the document is at least WF (well-formed) • Name it filename.xml • Refer to a stylesheet if you want formatting instead of just an outline view <?xml version="1.0"?><!DOCTYPE osis []><?xml-stylesheet href="mystyle.css" type="text/css"?><osis xmlns="http://www.bibletechnologies.org/namespaces/OSIS-1.1"><header>…

  48. Getting your OSIS printed • Most typesetting programs now import XML • OSIS converts easily to most relevant XML schemas, using XSLT • Word processors are also gaining ability to import arbitrary XML • Typesetting firms, esp. for journals, are starting to accept XML as well.

  49. Near-term concerns of OSIS • Linguistic annotation • Formal name lists for people, places, translations, etc. • Connecting text to multimedia • Greater support for secondary genres • Tool development and conformance

  50. How you can help • Find the best place to apply OSIS in your organization, and do it. • Join a Working Group • Send feedback, feature requests, etc. • Join a Working Group • Convert or create OSIS texts • Join a Working Group • Create a converter for your current format • Join a Working Group • Tell your friends and colleagues • Join a Working Group
