The XML/SGML Conundrum

Presented by Joseph V. Gangemi

Senior Consultant

J.V.G. Consulting Services

©2005

Agenda
  • Compare aspects of XML and SGML
  • Explore rationale for choosing between them
  • Discuss effect of XML on publishing applications
  • Some personal thoughts on SGML
  • Where do we go from here?
  • Audience participation is encouraged

J.V.G. Consulting Services © 2005

What is a Conundrum?
  • A question or problem having only a conjectural answer
    • Should I use SGML or XML?
    • Do I need a DTD or an XSD?
    • Will publishing survive the XML marketing hype?
  • An intricate and difficult problem
    • Does XML address the needs of my application?

What is SGML?
  • Markup Language for Text Processing
    • ISO Standard 8879
    • Syntax rules for defining a markup language
      • Does not include a set of tags
    • Full-featured to support a wide array of applications
    • Perceived as complex, but in reality, text applications are complex
      • Not all features are used in all applications

What is XML?
  • Subset of SGML
    • Designed for the transport of text over the Internet
  • Extensible Markup Language meant to complement HTML
    • More generic; not a formatting language
    • Structure rather than format oriented
  • Less complex from a programmer’s perspective
    • Not necessarily the user’s!

The XML Myth
  • XML does not require a DTD or XSD
    • Yes and No
    • From a programming perspective, the DTD or XSD is optional (for some applications)
      • This is also true for SGML
    • From a practical design perspective, IT IS MANDATORY
  • No inherent benefit to XML (or SGML) without one
    • If you cannot define and enforce a tag set, calling it a language is a misnomer.
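
The "optional from a programming perspective" point can be seen with any non-validating parser. A minimal sketch, assuming Python's standard `xml.etree.ElementTree` and an invented memo tag set: the parse succeeds even though nothing defines or enforces the tags, including the stray `bogus` element.

```python
import xml.etree.ElementTree as ET

# Hypothetical instance: no DOCTYPE, no schema, ad hoc tag names.
doc = "<memo><to>Editor</to><body>Ship it.</body><bogus/></memo>"

# The parser only checks well-formedness, so this succeeds.
root = ET.fromstring(doc)
tags = [child.tag for child in root]
print(tags)
```

Nothing here rejects `bogus`; that enforcement is what a DTD or XSD adds.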

Choosing between XML and SGML

(Putting aside the marketing appeal and looking at the technology)

XML / SGML Feature Sets
  • Review SGML features omitted from XML
    • Original Intent of feature
    • Reason for omission from XML
  • Effect of Omission on Publishing Applications
    • How is user affected?
    • How is product vendor affected?

SGML Declaration
  • Describes the processing environment in which the document can be processed.
    • Identifies character set
    • Specifies features required for successful processing
  • Purely technical information
    • User has minimal, if any, awareness of it
    • Product vendors benefit by being able to tailor product to each client’s application environment
  • Once environment is defined for an organization, it has minimal value (except, of course, to the vendor).
  • XML has a defined environment (World Wide Web)
    • In effect, it has a built-in SGML declaration
    • Processing is restricted to the defined environment

SGML DCL Features
  • XML defaults the following features to NO (which means they are not supported)
    • DATATAG
    • OMITTAG
    • RANK
    • LINK
    • CONCUR
    • SUBDOC
    • SHORTREF

Datatag
  • A string of characters that acts as both data and the end-tag of the currently open element.
  • Actually defined poorly since its purpose is to delimit repetitive elements. If it closes an element, it inherently opens the next one too.
  • Never implemented because short references became a better way to achieve same objective.
  • Considered an irrelevant technique and is not supported by vendors.

Omittag
  • Refers to tag minimization
    • Start and end tags can be omitted under certain parsing conditions
  • Method of reducing the character count or physical size of a file, back when that mattered
  • Vendor products make the feature moot.
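
As a rough illustration of what dropping OMITTAG means in practice, the sketch below feeds an XML parser a fragment with omitted end-tags (which an SGML parser could accept, given a DTD that declares those end-tags omissible) and the fully tagged equivalent. It assumes Python's standard `xml.etree.ElementTree`; the `list`/`item` tag names and the `is_well_formed` helper are invented for the example.

```python
import xml.etree.ElementTree as ET

minimized = "<list><item>one<item>two</list>"  # end-tags omitted, SGML style
full = "<list><item>one</item><item>two</item></list>"

def is_well_formed(text):
    """Return True if an XML parser accepts the text."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

print(is_well_formed(minimized), is_well_formed(full))
```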

Rank
  • Rank IS rank
    • Irrelevant concept, poorly conceived, badly defined, and eschewed by us elite purists in the industry
    • Attempt to take linear tagging for typesetting and treat it hierarchically
    • Even Charles wants to see this gone!
  • No relevance to vendor or user

Link
  • Method whereby a process can be associated with an element through an attribute
  • A mechanism for inserting process-specific information into a document
  • Attempt to connect some esoteric concepts that rambled around in Charles's mind with some real world processing
  • Difficult to define, impossible to understand, and not considered relevant to most forms of text processing
  • Users ignore it; vendors shun it.

Concur
  • Concur
    • Good concept, but never fully grasped by the user community
    • Concurrent document structures coexisting in the same instance.
  • Primarily meant to express the document structure and the formatting structure associated with the document concurrently
  • Never implemented
  • Replaced (kind of) by namespaces

Subdoc
  • Ability to include subordinate documents into a master document
  • Defined primarily to support things like anthologies
  • Subdocument may have its own DTD, but must conform to a single SGML declaration
  • Limited value to the user
  • Overkill for the vendor
  • Workarounds are simple and more practical

Short References
  • Character strings that represent an entity reference
  • Originally intended to reduce keystrokes
  • Allows characters in content to act as mark-up
  • A Shortref declaration defines a set of string-to-entity mappings
    • Named set
    • Different mappings for same string in different sets
  • Activated by USEMAP declaration that associates map set to element
    • In effect for duration of element
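
The map mechanics above can be modeled in a few lines. This is a toy sketch in Python, not an SGML parser: the three dictionaries stand in for ENTITY, SHORTREF, and USEMAP declarations, and every name in them (`cell`, `rowmap`, `row`) is hypothetical.

```python
# Toy stand-ins for the three kinds of declaration involved.
ENTITIES = {"cell": "</cell><cell>"}       # entity name -> replacement text
SHORTREF_MAPS = {"rowmap": {"|": "cell"}}  # named map: string -> entity name
USEMAP = {"row": "rowmap"}                 # element type -> map active inside it

def expand(element, content):
    """Expand short-reference strings using the map active for the element."""
    mapping = SHORTREF_MAPS.get(USEMAP.get(element, ""), {})
    for string, entity in mapping.items():
        content = content.replace(string, ENTITIES[entity])
    return content

print(expand("row", "a|b|c"))   # map is active: "|" acts as markup
print(expand("para", "a|b"))    # no map for para: "|" stays data
```

The same character is markup inside `row` and plain data inside `para`, which is the "in effect for duration of element" behavior.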

Feature Recap
  • DATATAG – not supported in SGML or XML
    • No effect on user or vendor
  • OMITTAG – not supported in XML
    • Vendor tools have reduced its value
  • RANK – not supported in SGML or XML
  • LINK – not supported by vendors or XML
  • CONCUR – not supported by vendors or XML
  • SUBDOC – not supported in XML
    • minimal SGML support
  • SHORTREF – not supported in XML
  • No negative impact on the publishing process

Shared Features
  • SGML and XML share many features
      • Not always equally
    • Public and System Identifiers
    • Notations (with restrictions)
    • Parameter entities (with restrictions)
    • Marked sections (with restrictions)
    • Character and Entity References

Public Identifiers
  • Used to identify something associated with but separate from the document instance
  • A consistent way to refer to another entity regardless of what it is or where it is
  • Must be unique within its processing universe
    • Global: must be registered with central authority
    • Local: unregistered, but managed within the processing scope
  • Resolved through a standard catalog entry
  • XML copped out and requires system identifiers
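
Catalog resolution amounts to a lookup from public identifier to a current location. A minimal sketch in Python, assuming the plain-text `PUBLIC "public-id" "system-id"` catalog entry form; the file paths and the ACME identifier are invented, and `load_catalog` and `resolve` are hypothetical helper names.

```python
import shlex

# Hypothetical catalog: public identifier -> where the entity lives today.
CATALOG_TEXT = '''
PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "dtd/xhtml1-strict.dtd"
PUBLIC "-//ACME//DTD Memo//EN" "dtd/memo.dtd"
'''

def load_catalog(text):
    """Collect PUBLIC entries into a dict; other lines are ignored."""
    catalog = {}
    for line in text.splitlines():
        fields = shlex.split(line)
        if len(fields) == 3 and fields[0] == "PUBLIC":
            catalog[fields[1]] = fields[2]
    return catalog

def resolve(catalog, public_id, system_id=None):
    """Prefer the catalog entry; fall back to the system identifier."""
    return catalog.get(public_id, system_id)

catalog = load_catalog(CATALOG_TEXT)
print(resolve(catalog, "-//ACME//DTD Memo//EN"))
```

Moving a file means editing one catalog line rather than every document that refers to it, which is why public identifiers suit entities that change.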

System Identifiers
  • A system identifier points to a specific object
    • XML uses a URI to specify the entity (usually a file)
    • Since XML is designed for the Web, the URI is invariably a URL
    • Entity is considered static and not likely to change
  • In SGML, a System Identifier usually points to a physical file
    • Seldom used in a production application
    • Entities considered dynamic and subject to change
    • Unregistered, local public identifiers are preferred

Public vs System Identifiers
  • XML is designed for the Web
    • Represents text in a specific instance for a specific purpose
    • URLs are preferred method of accessing external entities
    • System Identifiers support URLs easily
  • SGML is non-denominational
    • Designed for text in any environment
    • Supports information management from data capture, through editorial processing, to finished product
    • Public Identifiers are more versatile and better suited to changing entities
  • XML syntax supports Public Identifiers
    • Not all XML-compliant applications do.
    • XML products derived from SGML products usually do.

Notation and Parameter Entities
  • Notation is similar in XML and SGML
    • Vendor support is usually proprietary
    • More conceptual than practical
    • No industry-wide implementation model across platforms
  • Parameter entities are supported in XML, but XML restricts their use to DTDs; i.e., not permitted in marked sections (or XSDs)

Marked Sections
  • In the document instance, XML restricts Marked Sections to CDATA (INCLUDE / IGNORE sections survive only in the external DTD subset)
    • Recognizes string <![CDATA[ before the content and ]]> after the content
    • CDATA is not parsed by the parser; i.e., embedded tags are ignored
  • SGML allows any type of data in a marked section, even parsable data
    • You can control if the marked section is included (parsed or at least passed to the application) or ignored by the parser
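
The CDATA behavior is easy to check. A small sketch, assuming Python's standard `xml.etree.ElementTree` and an invented `para` element: the embedded tags and the bare ampersand come through as character data, and no child elements are created.

```python
import xml.etree.ElementTree as ET

para = ET.fromstring(
    "<para><![CDATA[Use <b> for bold & <i> for italic]]></para>"
)

print(para.text)   # the markup characters arrive as plain text
print(len(para))   # number of child elements
```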

Effect on Publishing
  • XML restrictions limit value of Marked Sections in a publishing application
    • Need include / ignore option
    • Require parameter entity support to implement include / ignore
  • Workarounds are tedious, especially if Marked Sections are not on element boundaries

More XML Variants
  • The PIC (processing instruction close) delimiter is ?>
  • Quantities and capacities are effectively unlimited
  • Names are case sensitive
    • (not necessarily a good thing)
  • Underscore and colon are allowed in names
  • Names can use Unicode characters and are not restricted to ASCII
    • Unicode is not widely supported in publishing systems
  • SGML can and does accept these variants by modifying the SGML declaration
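
Some of these variants can be observed directly. A brief sketch, assuming Python's standard `xml.etree.ElementTree`; the element names and the `parses` helper are made up for the example.

```python
import xml.etree.ElementTree as ET

def parses(text):
    """Return True if an XML parser accepts the text."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

mixed_case = parses("<Para>text</para>")        # names are case sensitive
underscore = parses("<_note>text</_note>")      # underscore may start a name
unicode_name = parses("<résumé>text</résumé>")  # names are not limited to ASCII

print(mixed_case, underscore, unicode_name)
```

The colon is left out on purpose: this parser is namespace-aware and treats a colon in a name as a prefix separator.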

Built-in XML Entity References
  • Predefined entities in XML
    • &amp; for ampersand
    • &lt; for less than (<)
    • &gt; for greater than (>)
    • &apos; for apostrophe (')
    • &quot; for quotation mark (")
  • Not predefined in SGML
    • Must be declared if used
  • Programmer’s convenience if DTD not used.
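
The "convenience if DTD not used" point can be sketched as follows, assuming Python's standard `xml.etree.ElementTree`. The five predefined references resolve with no declarations at all, while any other named entity (`nbsp` is used here only as a familiar example) is an error unless a DTD declares it.

```python
import xml.etree.ElementTree as ET

# All five predefined entities, with no DTD anywhere in sight.
text = ET.fromstring("<p>&amp; &lt; &gt; &apos; &quot;</p>").text

# A named entity that is not predefined and not declared.
try:
    ET.fromstring("<p>&nbsp;</p>")
    nbsp_ok = True
except ET.ParseError:
    nbsp_ok = False

print(text, nbsp_ok)
```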

External Entity References
  • References to external data entities in content are not supported in XML
    • Significant restriction to data organization facilities built into SGML
    • Often used to represent embedded symbols in running text
      • Unicode replaces this approach, but not always supported
    • External entities must be managed within the application’s environment
  • Simple workaround
    • Use empty element with an attribute whose value is declared as an entity
    • Affects DTD or XSD because element must be declared
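
The workaround can be sketched as follows, assuming Python's standard `xml.parsers.expat`. The `doc`/`graphic` tag set, the `logo` entity, and the `resolve_graphics` helper are invented. Because expat does not validate, the code does the attribute-to-entity lookup itself instead of relying on the `ENTITY` attribute type.

```python
import xml.parsers.expat

DOC = """<!DOCTYPE doc [
  <!NOTATION gif SYSTEM "image/gif">
  <!ENTITY logo SYSTEM "logo.gif" NDATA gif>
  <!ELEMENT doc (graphic)>
  <!ELEMENT graphic EMPTY>
  <!ATTLIST graphic src ENTITY #REQUIRED>
]>
<doc><graphic src="logo"/></doc>"""

def resolve_graphics(text):
    """Return (system id, notation) for each graphic element, in order."""
    entities = {}  # entity name -> (system id, notation name)
    found = []

    def on_entity(name, base, system_id, public_id, notation):
        entities[name] = (system_id, notation)

    def on_start(tag, attrs):
        if tag == "graphic":
            found.append(entities[attrs["src"]])

    parser = xml.parsers.expat.ParserCreate()
    parser.UnparsedEntityDeclHandler = on_entity
    parser.StartElementHandler = on_start
    parser.Parse(text, True)
    return found

print(resolve_graphics(DOC))
```

The extra `graphic` element is the DTD cost the slide mentions: the reference now has to be a declared element rather than a bare entity reference in running text.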

Choosing XML or SGML
  • Compare inherent features
    • Identify features that apply to your application
    • Estimate effort to support omitted features
      • Imprecise SWAG is usually sufficient
  • Will DTD or XSD define your doctype(s)?
    • If you can’t define it, it won’t work.
    • Can you use a reasonable workaround?

Where do you stand?
  • DTDs are better than XSDs in general
  • XSDs are better than DTDs in general
  • DTDs / XSDs are better for text applications
  • DTDs / XSDs are better for data processing applications
  • DTDs are obsolete

Where do you stand?
  • DTDs are better than XSDs in general
  • XSDs are better than DTDs in general
  • DTDs are better for text processing applications
  • DTDs / XSDs are better for data processing applications
  • DTDs are obsolete

Where do you stand?
  • DTDs are better than XSDs in general
  • XSDs are better than DTDs in general
  • DTDs are better for text processing applications
  • XSDs are better for data processing applications
  • DTDs are obsolete

Here is where I stand
  • DTDs are better than XSDs in general
  • XSDs are better than DTDs in general
  • DTDs are better for text processing applications
  • XSDs are better for data processing applications
  • DTDs are NOT obsolete

Dem’s fighting woids !!!

DTDs Are Better
  • Easy to use as a working notation during document design
  • Easier for less technical people (like editors) to follow as a visual notation for a document’s structure
  • Easier for a person to read and interpret
  • Clear, concise, and user friendly
  • Able to be processed by computers as well

DTDs are Better
  • Specifically geared to address text notation requirements
    • Text content
    • Text order and hierarchy
    • Text appearance (required or optional)
    • Text occurrence (repeatable)
  • Other data characteristics are not relevant to the application
  • Established methodology with existing support from vendor community
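
Order, optionality, and repetition are exactly what a DTD content model states, and the notation maps closely onto a regular expression over child element names. A hedged sketch in Python: the `section` model is invented, and the regex is translated from it by hand (no DTD is actually parsed).

```python
import re
import xml.etree.ElementTree as ET

# DTD-style content model: <!ELEMENT section (title, para+, note?)>
# Hand-translated: one title, one or more para, an optional note.
SECTION_MODEL = re.compile(r"title,(para,)+(note,)?")

def matches_model(element):
    """Check the sequence of child names against the content model."""
    children = "".join(child.tag + "," for child in element)
    return SECTION_MODEL.fullmatch(children) is not None

good = ET.fromstring("<section><title/><para/><para/></section>")
bad = ET.fromstring("<section><para/><title/></section>")

print(matches_model(good), matches_model(bad))
```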

DTDs are Better
  • Designed for text processing
    • Supports editorial activity
    • Easily changed as needs evolve
  • Addresses text content issues with exceptions
    • Inclusions allow elements to occur randomly within text (good when used correctly)
    • Exclusions eliminate recursion that could introduce processing anomalies
      • reduces tag set substantially
  • External entities can be declared in the external DTD subset at the start of the instance

XSDs Are Better
  • Data applications have different requirements
    • Document structure is simple
    • If data is in a database, design issues are minimal
  • Primarily used by programmers and other technical personnel (not editors, per se)
  • Characteristics of data are relevant

XSDs Are Better
  • Simplified parser because schema is in same syntax as data
  • Simplified parsing because data is well-formed
  • Physical characteristics of data can be validated as well as structure
  • No exceptions to contend with

XSDs are Better
  • Variations in data are minimal
    • Not intended for editorial processing
    • Exceptions cannot occur because content is in the instance
      • No inclusions
      • No exclusions
  • External entities need not be declared because instance contains specific URLs
  • Entity declarations are not relevant because all entity references are resolved in the instance

Another Myth
  • XML Schema is easier to parse than a DTD
  • Syntactically maybe, because it uses the same XML parser as the content
  • Semantically, not really since the same hierarchical structure expressed in the DTD must be determined; i.e., the internal document object must be built
  • Additional support for the data types and corresponding validation processing also make the process more complex
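
The syntactic half of the argument can be sketched with Python's standard `xml.etree.ElementTree`; the two-element schema is invented. The ordinary XML parser reads the schema without complaint, but what comes back is a generic element tree. Turning `xs:element` and `xs:decimal` into an actual validation model is separate work that this code does not attempt.

```python
import xml.etree.ElementTree as ET

XS = "{http://www.w3.org/2001/XMLSchema}"

SCHEMA = """<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="price" type="xs:decimal"/>
  <xs:element name="sku" type="xs:string"/>
</xs:schema>"""

# Same parser as for any instance document.
root = ET.fromstring(SCHEMA)

# The parser hands back attributes as strings; it attaches no meaning to them.
declared = {el.get("name"): el.get("type") for el in root.findall(XS + "element")}
print(declared)
```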

Ah, but it’s free!!!
  • And you get what you pay for
  • Only the parser is free; there is so much more to the application than the parser
  • And there is XSL in two flavors
    • XSLT for data transformation
    • XSL-FO for data formatting
    • Unproven technology with horrendous syntax
    • Expectations exceed language’s potential
  • Ah, but it’s free!!!

Clean Up SGML
  • Eliminate features that time has proven irrelevant
  • Adjust Basic Concrete Syntax to meet today’s needs
  • Improve support for multiple DTDs
  • Simplify parsing requirements wherever possible
  • Add <!DATATYPE …> declaration to DTD
    • For die-hards who think it's necessary
  • Incorporate XML variants (already done)
    • Web SGML Annex

XML Extensions to SGML
  • HCRO delimiter (for hex numeric character references); for XML this is &#x
  • EMPTYNRM feature that allows elements declared EMPTY to have end-tags
  • NESTC delimiter (NET-enabling start-tag close) – permits empty tag, e.g. <tag/>
  • Duplicate enumerated attribute tokens are allowed
    • Goldfarbism, specifically rejected by committee
  • Relaxation of rules on use of parameter entity references inside groups
    • The rules were wrong anyway
  • Multiple ATTLIST declarations for a single element type
  • ATTLIST declarations which don't declare any attributes
    • What? Must be a programming thing
  • KEEPRSRE feature that turns off SGML's rules for ignoring RSs and REs
    • The rules are inconsistent anyway and sometimes outright wrong
  • Fully-tagged SGML documents need not be type-valid
    • This makes all XML documents, including those that are well-formed but not valid, conforming SGML documents
  • Predefined data character entities in the SGML declaration
    • (for ampersand, less than, and so on)
  • Unlimited capacities and quantities
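
Three of these extensions show up in everyday XML and can be exercised together. A small sketch, assuming Python's standard `xml.etree.ElementTree` and invented `p`/`br` tags: a hex character reference (HCRO), an empty-element tag (NESTC), and an empty element written with an explicit end-tag (EMPTYNRM).

```python
import xml.etree.ElementTree as ET

# &#x41; is a hex character reference for "A";
# <br/> and <br></br> are two spellings of an empty element.
root = ET.fromstring("<p>&#x41;<br/><br></br></p>")

print(root.text)
print([child.tag for child in root])
```

Both spellings of the empty element come out of the parser as the same thing: a `br` element with no content.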

Extend XML
  • Add publishing features that time has proven relevant
    • Support different character sets
      • ASCII is still around
    • External entity references in content
    • Catalog support for Public Identifiers
    • Full Marked Section support
    • Continued support for the DTD
      • Add <!DATATYPE … > declaration
  • Adjust Basic Concrete Syntax to meet publishing’s needs

Use the Right Approach
  • SGML is for Text Processing
    • Data capture and editorial processing
    • Complex entity management
      • Reusable text entities
      • Illustration management
  • XML is for delivery over the Web
    • Designed for the Web environment
    • Converts easily to HTML

Avoid the Hype
  • SGML is an obsolete technology
    • Proven over 20 years in the field for practical applications
    • Adaptable to meet wide array of text processing needs
  • DTDs are bad for your programmers
    • Parsers already exist
    • Proven methodology
    • User friendly

More Hype
  • XML is better than SGML
    • XML IS SGML
    • XML is a SUBSET of SGML
      • More limited; fewer capabilities
  • XSDs will replace DTDs
    • For data processing applications, why not?
    • Overkill for text applications
      • And more restrictive
      • No entity references!

What Should Publishers Do?
  • Stop ignoring your SGML vendor
    • Use the four-letter acronym, it’s OK
    • SGML is your friend
    • Tell your boss it’s XML
      • Can they tell the difference?
      • It’s got angle brackets, doesn’t it?
  • Use your clout to get what you need
    • Stop following the leader (remember the lemmings)
    • Where do you want to go today? (MS jingle in background)
  • Don’t accept the programmer’s position as your own
    • You have different needs
    • Free parsers do not make better applications

XML for Publishing
  • XML is going off on a tangent that does not benefit publishers
    • Data processing advocates have hijacked it for their own purposes
  • Demand the features you require to support your applications
  • XML is not an easier way to do text processing for publishing

Conclusion
  • SGML is a viable methodology for text applications, especially publishing
  • Market forces have caused it to be denigrated
  • SGML needs to be modernized to overcome its poor image
  • Call it XML if you must, but make sure the features you need are there
    • Help your vendors understand your needs
    • Keep asking for support for the missing features or else they will go away

XML Has Its Place Too
  • XML is a wonderful technology for the Web
  • It should (but it won’t) replace HTML
    • Separates content from format
    • Content is well-structured with a defined tag set
    • Cascading style sheets work!
  • Insufficient browser support
    • Converts easily to HTML
  • XML is drifting away from publishing’s needs
    • Work together to keep it on course

The END
  • J.V.G. Consulting Services
  • Joseph V. Gangemi
  • Senior Consultant
  • JVGangemi@comcast.net
  • 856-809-0517
