Life after HTML: An Introduction to the Future of Electronic Publication
Explore the evolution from HTML to XML, its benefits, and applications. Learn about information interchange, the XML family, and the significance of structured data in electronic publishing.
Life after HTML: an introduction to the future of electronic publication. Lou Burnard, Humanities Computing Unit, Oxford University. http://users.ox.ac.uk/~lou
What went wrong? The web today!!!
who cares? • application developers and maintainers (the desperate perl hacker) • tools builders (the mythical CS grad student) • document creators and conservators • document managers • you and me, anxious to communicate
Information Interchange (1) [diagram: nodes A, B, C, D, E] 20 translations required (n² - n)
Information Interchange (2) [diagram: nodes A, B, C, D, E around a Common Interchange Standard] 10 translations required (2n)
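The two counts on these slides follow directly from the formulas. A minimal sketch, assuming one translator per direction between any two formats; the function names are illustrative and not from the talk:

```python
def pairwise_translations(n: int) -> int:
    """Translators needed when each of n formats converts directly to every other."""
    return n * n - n


def hub_translations(n: int) -> int:
    """Translators needed when each format converts to and from one common standard."""
    return 2 * n


# Compare both approaches for a few system counts, including the slides' n = 5.
for n in (2, 3, 5, 10, 50):
    print(n, pairwise_translations(n), hub_translations(n))
```

The two approaches cost the same at n = 3, and the common standard needs fewer translators for every n above that.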
What is XML? • eXtensible Markup Language • An activity of the World Wide Web Consortium (W3C) • original goal: delivering SGML on the web • new goals: refocus web development • Rewriting the rules of the game? • Adding intelligence to data • Database exchange • Client-side processing • Access to richer data • Better data management http://www.w3.org/pub/WWW/Markup/Activity
The XML WG Hall of Fame • Jon Bosak, Sun (Chair) • Paula Angerstein, Texcel • Tim Bray, Textuality & Netscape • James Clark • Dan Connolly, W3C • Steve DeRose, INSO • Dave Hollander, HP • Eliot Kimber, Isogen • Tom Magliery, NCSA • Eve Maler, ArborText • Murray Maloney, Muzmo & Veo Systems • Makoto Murata, Fuji Xerox • Joel Nava, Adobe • Conleth O'Connell, Vignette • Jean Paoli, Microsoft • Peter Sharpe, SoftQuad • C. M. Sperberg-McQueen, UIC • John Tigue, DataChannel (plus a cast of hundreds on the SIG)
What is a document? • content: the components (words, images, etc.) which make up a document • structure: the organization and inter-relationship of the components • presentation: how a document looks and what processes are applied to it
Separating these things means... • the content can be re-used • the structure can be formally validated • the presentation can be customized for • different media • different audiences • … in short, the information can be uncoupled from its processing • This is not a new idea! But it’s a good one...
The XML family • XML (Extensible Markup Language): • A subset of SGML (ISO 8879) designed for easy implementation • XLink (Extensible Linking Language): • A set of standard hypertext mechanisms based on HyTime (ISO/IEC 10744) and the Text Encoding Initiative (TEI) • XSL (Extensible Stylesheet Language): • A standard stylesheet language for structured information derived from DSSSL (ISO/IEC 10179) and key CSS concepts
like HTML, XML must... • be usable on the net (but not restricted to it!) • support a wide variety of applications • be compatible with SGML • be easy to process • have few optional features (ideally none) • be human-legible and reasonably clear • be specified in a way that is both formal and concise
unlike HTML... • XML is an extensible markup language • XML markup can be verified • XML markup reflects the meaning of your data, not its appearance
Some intelligent questions... Perec, Georges Life - a users manual. Collins, 1988. Translated from the French [La vie mode d’emploi] by David Bellos. xviii+581 pp. 841.941 Literature - French - 20th century • what’s the author’s name? • what titles have the classification …? • what authors have the name… ? • what translators are there ? • which books have more than 400 pages?
… which non-extensible markup doesn’t help us answer <p><b>Perec, Georges</b> <I>Life - a users manual. Collins, 1988. Translated from the French </I>[La vie mode d’emploi] <I> by David Bellos. xviii+581 pp. 841.941</I> Literature - French - 20th century Perec, Georges Life - a users manual. Collins, 1988. Translated from the French [La vie mode d’emploi] by David Bellos. xviii+581 pp. 841.941 Literature - French - 20th century
Extensible (user-defined) markup <author>Perec, Georges</author> <title>Life - a users manual</title><publisher>Collins</publisher><publDate>1988</publDate><note>Translated from the French [<title>La vie mode d’emploi</title>] by <translator>David Bellos</translator></note> <pages>xviii+581</pages> <ddc>841.941</ddc><keywords><term>Literature</term> <term>French</term> <term>20th century</term></keywords>
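Once the record is marked up this way, the "intelligent questions" become simple lookups. A minimal sketch using Python's standard xml.etree.ElementTree; the enclosing <book> element and the page_count helper are assumptions added for illustration, since the slide shows the elements without a root:

```python
import re
import xml.etree.ElementTree as ET

# The slide's markup, wrapped in a hypothetical <book> root so that it
# parses as a single XML document (the wrapper is an assumption).
RECORD = """<book>
<author>Perec, Georges</author>
<title>Life - a users manual</title>
<publisher>Collins</publisher><publDate>1988</publDate>
<note>Translated from the French [<title>La vie mode d'emploi</title>]
by <translator>David Bellos</translator></note>
<pages>xviii+581</pages>
<ddc>841.941</ddc>
<keywords><term>Literature</term> <term>French</term>
<term>20th century</term></keywords>
</book>"""

book = ET.fromstring(RECORD)

# What's the author's name?
author = book.findtext("author")

# What translators are there? (searched at any depth, since the
# <translator> element sits inside <note>)
translators = [t.text for t in book.iter("translator")]

# What classification does the title have?
ddc = book.findtext("ddc")


def page_count(pages: str) -> int:
    """Largest arabic number in a pagination string such as 'xviii+581'."""
    numbers = [int(n) for n in re.findall(r"\d+", pages)]
    return max(numbers, default=0)


# Does the book have more than 400 pages?
is_long = page_count(book.findtext("pages")) > 400

print(author, translators, ddc, is_long)
```

None of these lookups is possible against the <b> and <I> version of the same record, because nothing there says which run of text is the author or the page count.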
Verifiable markup • well-formed XML markup • tags (etc.) are syntactically correct • every tag has an end-tag • tags are properly nested • valid XML markup • only declared tags are used • all tag occurrences conform to specified positional constraints
Well-formedness <?xml version="1.0" standalone="yes"?> • <greeting>hello world!</greeting> • <greeting>hello world!</Greeting> • <grunting> <greeting>hello</greeting> world!</grunting> • <greeting><grunting>hello</greeting> world!</grunting> • <greeting type="loud">ho!</greeting> • <greeting type=loud>ho!</greeting> • <greeting file="ho.wav"/> • <greeting file="ho.wav">
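Any XML parser can tell the well-formed examples on this slide from the broken ones. A minimal sketch with Python's standard xml.etree.ElementTree, assuming the examples are retyped with straight ASCII quotes and each as a single element; the is_well_formed helper is illustrative:

```python
import xml.etree.ElementTree as ET


def is_well_formed(markup: str) -> bool:
    """Return True if the parser accepts the markup as a complete XML document."""
    try:
        ET.fromstring(markup)
    except ET.ParseError:
        return False
    return True


EXAMPLES = [
    '<greeting>hello world!</greeting>',
    '<greeting>hello world!</Greeting>',                       # end-tag case differs
    '<grunting><greeting>hello</greeting> world!</grunting>',
    '<greeting><grunting>hello</greeting> world!</grunting>',  # tags overlap
    '<greeting type="loud">ho!</greeting>',
    '<greeting type=loud>ho!</greeting>',                      # unquoted attribute value
    '<greeting file="ho.wav"/>',
    '<greeting file="ho.wav">',                                # element never closed
]

for markup in EXAMPLES:
    print("ok " if is_well_formed(markup) else "BAD", markup)
```

The check needs no dtd: well-formedness is purely a matter of syntax, which is why the parser can decide it from the markup alone.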
A Valid XML Document • invokes a Document Type Definition (dtd) • a dtd specifies • names for all your tags • names and default values for their attributes • rules about how tags can nest • names for re-usable pieces of data (entities) • and a few other things • XML dtds are much simpler than SGML dtds
A simple dtd <!ELEMENT greeting (#PCDATA)> a greeting consists of character data... <!ELEMENT name (#PCDATA)> <!ATTLIST name reg CDATA #IMPLIED> as does a name, which can also have an attribute called reg <!ELEMENT grunting (#PCDATA|greeting|name)* > a grunting contains zero or more of the other things, possibly mixed up with some character data
When do you need a dtd? • at document preparation time (definitely) • validation, checking, consistency • at document processing time (probably) • simplifies generic/specific processing • may clarify intended semantics • at document delivery time (possibly) • strictly unnecessary for wf docs • but reduces processing effort
Where do I get a dtd? • flood of industry announcements • some recent examples • Resource Description Framework (for metadata) • Channel Definition Format (for push technologies) • Electronic Data Interchange (banking etc.) • Handheld Device Markup Language (sic) • Chemical Markup Language (chemical modelling) • Math Markup Language (maths!) • Text Encoding Initiative (scholarly texts)
The meaning of markup • ontologically speaking… • markup may be performative or descriptive • markup asserts an intention or interpretation which cannot be formally defined • tags have no predefined meaning • presentation or behaviour of an XML document is specified elsewhere
Where is the behaviour of an XML document defined? • in a stylesheet • using XSL or CSS • possibly embedded in a program, applet, script, or Java bean • defined for that particular dtd, tagset, or tag • by reference to pre-existing mutual agreement amongst user communities • aka “namespaces” • by reference to a Document Object Model
XLink: the future of hypertext “We believe in the interconnectedness of all things” (F. Braudel)
Some linking terminology • a link asserts a relationship between linkends • links may be typed • link behaviour is what happens when a link is activated • transclusion: new content appears without displacing current content • linkends may be single or multiple resources • linkends may be target or source with respect to each other
Linking in HTML • link behaviour is tied to particular tags • only two types • <A> replace in same (or new) window • <IMG> transclude inline (usually) • link targets are always whole documents • cannot reassemble fragments • cannot add links to read-only documents • linkends are inherently fragile
XLink aims to do better • formerly XLL, formerly XML-Link • two components • XLink • XPointer • working drafts at http://www.w3.org/TR/WD-xlink http://www.w3.org/TR/WD-xptr • WARNING: This is all subject to change!
XLink goals (1) • Provide advanced linking constructs within XML documents (XLink) • To anything
XLink goals (2) • Provide advanced addressing into XML document structure (XPointer) • From anything
XPointer is… • for pointing to subparts of XML resources (even if they don’t have IDs) • based on the Text Encoding Initiative (TEI) “extended pointer” notation • usable in association with URLs/URIs <a href="http://some.url.com/Thing/foo.xml#id(foo)"> <!ENTITY bar SYSTEM "http://some.url.com/Thing/foo.xml#id(foo)">
An XPointer consists of • a series of location terms in the form termname(parms) • terms are separated by a dot id(foo).child(3,SEC).child(4,LIST) • each term is the location source for the next • you can also use terms which point at strings, attributes, etc.
XPointer advantages • a compact syntax which scales well • as robust as possible • any changes “off the path” won’t (necessarily) break the link • IDs are as safe as it gets... • if there’s an ID nearby, point to it and walk down/up • if not, walk down from the root
XPointers: a flavour • An XPointer addresses the tree that the markup represents, not the markup itself • Location terms address particular nodes in the tree, e.g. • absolutely, e.g. id(), html() • relatively, e.g. child(), descendant(), ancestor(), psibling(), fsibling() • string and attribute matches • can also specify spans
id() and html() id(concepts) html(baz)
child() and descendant() child(1,chapter).child(2,section) descendant(1,abstract)
XPointer examples • id(intro).child(3,div1) the third <div1> within the element with identifier INTRO • html(foo).child(2,div1).child(4,p).child(1,quote,lang,"LAT") the first <quote> whose LANG attribute is set to "LAT" within the fourth <p> of the second <div1> of whatever element contains an HTML <A NAME="foo"> • descendant(#all,para) every <para> within the current location source • span(child(1,pb,n,"14"),child(1,pb,n,"23")) everything between the first <pb> whose N attribute is "14" and the first one whose N attribute is "23"
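The way each location term becomes the location source for the next can be shown with a toy resolver. This is only a sketch of the working-draft notation used on these slides, not an implementation of any XPointer specification: it handles just id(name) and child(n,type), assumes identifiers live in an attribute literally named "id", and the resolve function and sample document are made up for illustration.

```python
import re
import xml.etree.ElementTree as ET

# One location term: termname(parms), with no nested parentheses.
TERM = re.compile(r"(\w+)\(([^()]*)\)")


def resolve(root, pointer):
    """Walk a dot-separated pointer made only of id() and child() terms."""
    current = root  # location source for the first term
    for term in pointer.split("."):
        match = TERM.fullmatch(term)
        if match is None:
            raise ValueError(f"unsupported location term: {term!r}")
        name = match.group(1)
        parms = [p.strip() for p in match.group(2).split(",")]
        if name == "id" and len(parms) == 1:
            # Absolute term: search the whole tree.
            # Assumption: IDs are stored in an attribute named "id".
            found = [e for e in root.iter() if e.get("id") == parms[0]]
            current = found[0] if found else None
        elif name == "child" and len(parms) == 2:
            # Relative term: the n-th child of the given element type.
            position, tag = int(parms[0]), parms[1]
            same_type = [e for e in current if e.tag == tag]
            in_range = 0 < position <= len(same_type)
            current = same_type[position - 1] if in_range else None
        else:
            raise ValueError(f"unsupported location term: {term!r}")
        if current is None:
            return None  # the path broke; nothing to point at
    return current


# A made-up document to point into.
DOC = ET.fromstring(
    '<doc><front id="intro">'
    '<div1>one</div1><p>aside</p><div1>two</div1><div1>three</div1>'
    '</front></doc>'
)

target = resolve(DOC, "id(intro).child(3,div1)")
print(target.text)
```

Inserting another <p> anywhere inside the <front> element would not change the result, which is the "off the path" robustness the slides describe; inserting another <div1> before the third one would.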
XLink proper • allows you to invent your own linking elements and define their behaviour • the xml:link attribute is used to specify the linking properties of your element • allows you to create link databases • “standoff” markup allows you to link to non-modifiable documents • inline vs out-of-line links
Link behaviours • show attribute • new/replace/embed • actuate attribute • user/auto • behavior attribute • “for other instructions”
The importance of XLink • Not just about fancy capabilities and new ways of associating information • Promotes the creation of advanced information structures and site management • Makes possible an industry devoted to knowledge management (that's us!) • For example: OED + LION
Transforming XML documents [diagram labels: valid XML documents; script; Transformation Tool; XML or non-XML documents]
XSL: the final piece • Standard Style Sheet Language • Combines DSSSL “flow objects” and CSS objects • Uses XML syntax (rather than Scheme) • Also uses ECMAScript for extensions • Automatic conversion from CSS
XSL is the next step for publishing • XSL is not just about translation • user-configurability • enhanced clients • Single source for print and online delivery • XSL is intended to complete the internationalization of publishing
Tools you can use now • Editing/creating documents • emacs + psgml; XED; any SGML editor • Parsers • free standing: SP • java applets: (many) • embedded in applications http://www.stud.ifi.uio.no/~larsga/linker/XMLtools.html
Tools you can use now • Browsers and viewers • HyBrick; IE5; Netscape 4; Amaya; XMetaL… • Toolkits • DOM support now in Perl, Tcl… • Transformers • Jade
The wider picture • XML is not just about exchanging data between machines • It's also about communication between humans • XML is not just about the web • It's about information in general • XML is not just about technology • It's also about the relationship between content creators and software vendors
How we will use XML (1) [diagram label: xml] Heterogeneous clients interfacing with a single database