Combined XML, SGML Issues

  1. Combined XML, SGML Issues William J. ‘Bill’ McCalpin MIT, LIT, CDIA, EDP AIIM 2002 - March 6, 2002 MHE - the print2image2Internet consultants

  2. About MHE • MHE is the “print2image2Internet” consulting firm • MHE’s principals have nearly 40 years of experience in electronic print streams, in taking electronic print streams to imaging systems, and now in taking legacy information to the Internet • See http://www.mhe-consulting.com

  3. About the Speaker • William J. ‘Bill’ McCalpin is a principal at MHE • Mr. McCalpin was the first - and for years the only - person in the world to have the MIT, LIT, CDIA, and EDP designations • Mr. McCalpin serves on the AIIM Accreditation Committee and AIIM Conference Committee

  4. About the Speaker (cont.) • Mr. McCalpin is on the Xplor Board of Directors and is Treasurer • Mr. McCalpin recently completed a two-year stint as Xploration Editor-in-Chief • Mr. McCalpin is a frequent speaker at both AIIM and Xplor

  5. What Do You Say When They Ask You, “When Are You Going To Support XML?”

  6. But The Real Question Is, “Why Should I Support XML?”

  7. Agenda • What is XML? • What do we do in “e-Business”? • When do you want to use XML? • The Right Way and the Wrong Way to use XML • The Flow of Information • The XML Bubble • The answer to “when” and “why”

  8. What is XML?

  9. XML And SGML • XML is the eXtensible Markup Language • XML is a restricted subset (an application profile) of SGML, the Standard Generalized Markup Language, an ISO standard (ISO 8879) • XML is “extensible” because people and enterprises with common interests get together to define the tags which describe their data

  10. XML and HTML • HTML is a tagged language, but its tags are a fixed set of 40 or 50 “grammatical” tags like <p> or <h1> • XML is a tagged language, and its tags are (usually) created and agreed to by “domains” or vertical industry segments, e.g., <account_number> or <city>

  11. The ‘Document’ • A document is “an organized collection of information in time” • A document contains information which can be understood by human or machine, and has validity at some period in time • The information in a document can be organized in many ways - as text, bitmaps, print streams, tagged languages, etc.

  12. The New Document • Per this definition, the document: • does not depend on which organization of the information is used (so long as author and recipient agree) • does not depend on the medium (paper, film, optical, magnetic or even parchment are all fine) • does not have to have presentation information, because the recipient may be a machine

  13. Three Parts of an XML ‘Document’ • Tagged Data (in XML) • Tag Definitions (in DTD or Schema) • Presentation (in XSL or CSS)

  14. The XML Document • Data: data values bounded by XML tags • Presentation: either CSS (Cascading Style Sheets, as used for HTML) or XSL (formatting information written in XML) • Tag Definitions: either a DTD (Document Type Definition, the old SGML form of definition) or a Schema (definitions written in XML)

  15. Data In the XML Document • Data is the purpose of an XML document • Each piece of data is specifically identified by a tag • Data is organized because the tags match patterns in the DTD or Schema • An example of data in XML:

  16. Data Example in XML

         <AUTHOR>
           <NAME>William J. "Bill" McCalpin, EDPP, CDIA, MIT, LIT</NAME>
           <JOBTITLE>Principal</JOBTITLE>
           <AFFILIATION>MHE</AFFILIATION>
           <ADDRESS>
             <STREET>1400 Cheyenne Dr.</STREET>
             <CITY>Richardson</CITY>
             <STATE>Texas</STATE>
             <ZIPCODE>75080</ZIPCODE>
             <EMAIL>mccalpin@mhe-consulting.com</EMAIL>
           </ADDRESS>
         </AUTHOR>
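
As a hedged illustration of why tagged data suits a software recipient, the record from slide 16 can be read with Python's standard `xml.etree.ElementTree` module. This sketch is not part of the original presentation, and the variable names are illustrative only.

```python
import xml.etree.ElementTree as ET

# The AUTHOR record from slide 16, reproduced as a string.
AUTHOR_XML = """
<AUTHOR>
  <NAME>William J. "Bill" McCalpin, EDPP, CDIA, MIT, LIT</NAME>
  <JOBTITLE>Principal</JOBTITLE>
  <AFFILIATION>MHE</AFFILIATION>
  <ADDRESS>
    <STREET>1400 Cheyenne Dr.</STREET>
    <CITY>Richardson</CITY>
    <STATE>Texas</STATE>
    <ZIPCODE>75080</ZIPCODE>
    <EMAIL>mccalpin@mhe-consulting.com</EMAIL>
  </ADDRESS>
</AUTHOR>
"""

author = ET.fromstring(AUTHOR_XML)

# Each value is addressed by its tag, not by its position on a page.
city = author.findtext("ADDRESS/CITY")
zipcode = author.findtext("ADDRESS/ZIPCODE")
affiliation = author.findtext("AFFILIATION")

print(city, zipcode, affiliation)
```

Because the lookup is by tag name, the program keeps working if the sender reorders the fields or adds new ones it does not care about.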

  17. Presentation in XML • Tags in XML don’t have natural formatting (unlike HTML), so if presentation is needed, it must be explicitly defined • CSS can be used for HTML and XML • XSL can be parsed by an XML parser, and it can be used by XML and XSLT • XSL example:

  18. Presentation Example

         <?xml version="1.0"?>
         <xsl:stylesheet xmlns:xsl="http://www.w3.org/TR/WD-xsl">
         <xsl:template match="author">
         <TABLE WIDTH="100%" BORDER="1" CELLSPACING="0"... <TR>
         <TD COLSPAN="2">
         <TABLE WIDTH="100%" BORDER="1" CELLSPACING="0"...
         <FONT COLOR="#000000"><xsl:value-of select="name"/></FONT>
         </TD>
         ...
         </xsl:template>
         </xsl:stylesheet>

  19. Why Two Style Sheet Languages?

  20. DTD/Schema in XML • The DTD is the “old” (SGML) way of defining not only what tags are valid, but their relative order, number, mandatory/optional attributes, and so on • The Schema is a total rewrite - written in XML itself - which defines all of the above as well as possible legal values for a tag (e.g., integer, date, days of the week, etc.)

  21. Schema Example

         <?xml version="1.0"?>
         <Schema name="sample_schema" ...>
         ...
         <!-- ********** Element Types ************ -->
         <!-- *** data *** -->
         <ElementType name="author">
           <element type="name" minOccurs="1" maxOccurs="1"/>
         </ElementType>
         ...
         </Schema>
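
The kind of rule a DTD or Schema expresses (which children an element may contain, and how many times) can be sketched in a few lines of Python. This is a toy occurrence check under stated assumptions, not a real DTD or Schema validator; the `RULES` table and the `check_element` function are hypothetical names introduced only for this illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical rule table: element name -> {child name: (min, max)}.
# A max of None would mean "unbounded". The single rule mirrors the
# minOccurs="1" maxOccurs="1" declaration shown on slide 21.
RULES = {
    "author": {"name": (1, 1)},
}

def check_element(elem):
    """Return a list of occurrence violations for one element (no recursion)."""
    problems = []
    for child_name, (lo, hi) in RULES.get(elem.tag, {}).items():
        count = len(elem.findall(child_name))
        if count < lo:
            problems.append(
                f"<{elem.tag}> needs at least {lo} <{child_name}>, found {count}")
        if hi is not None and count > hi:
            problems.append(
                f"<{elem.tag}> allows at most {hi} <{child_name}>, found {count}")
    return problems

good = ET.fromstring("<author><name>Bill McCalpin</name></author>")
bad = ET.fromstring("<author><name>A</name><name>B</name></author>")

print(check_element(good))
print(check_element(bad))
```

A real validating parser does this work from the DTD or Schema itself, which is why sender and recipient only need to agree on the definitions rather than on custom checking code.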

  22. What do we do in “e-Business”?

  23. What is “e-Business”? • Of course, e-Business is really just doing business using 100% electronic methods such as the Internet • In e-Business, we do transactions or exchange information using electronic media rather than the usual paper media • e-Business can be broken down into two parts: B2C and B2B

  24. B2C • B2C is “Business to Consumer” • Your business generates the information, and a consumer receives it • The consumer is normally interested only in the data and its presentation • Thus, in this scenario, the consumer needs only an XML document and CSS/XSL - which is more or less the same as HTML!

  25. Important Fact #1 • When you are engaged in B2C, and the recipient is a consumer with a “thin” client, then HTML is usually sufficient • Supplying the data in XML is usually a waste of time, because the recipient gets no additional value from the XML over HTML • XHTML is just HTML which is XML compliant
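
The statement that XHTML is HTML which is XML compliant can be shown with a small sketch: a generic XML parser accepts a fragment in which every tag is closed, and rejects the looser form that classic HTML tolerates. The two fragments and the helper name `is_well_formed_xml` are assumptions made for this illustration.

```python
import xml.etree.ElementTree as ET

# XHTML style: every element is closed, including the empty <br/>.
XHTML_FRAGMENT = "<p>Account <b>12345</b><br/></p>"
# Classic HTML style: <br> is left open, which HTML browsers tolerate.
HTML_FRAGMENT = "<p>Account <b>12345</b><br></p>"

def is_well_formed_xml(text):
    """Return True if a generic XML parser accepts the text."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

print(is_well_formed_xml(XHTML_FRAGMENT))
print(is_well_formed_xml(HTML_FRAGMENT))
```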

  26. B2B • B2B is “Business to Business” • Your business generates the information, and another business receives it • Frequently, the recipient is not a person, but a software process in the business • Thus, in this scenario, the recipient often needs only the XML data and the reference to the DTD or Schema - no presentation may be needed!

  27. Important Fact #2 • When you are engaged in B2B, and the recipient is a software process, then XML is often the most appropriate format • Binary data formats may be smaller, but will require more work and more maintenance • Don’t send presentation information unless the recipient actually wants your presentation information!
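
The size-versus-maintenance trade-off in Fact #2 can be sketched by encoding the same two fields both ways. The binary layout below (a 4-byte integer followed by a 20-byte padded name) is a hypothetical format invented for this example, and the field values are borrowed from the invoice example later in the deck.

```python
import struct
import xml.etree.ElementTree as ET

# The same two fields, once as self-describing XML...
xml_msg = (b"<invoice><account_no>12345</account_no>"
           b"<name>Bill McCalpin</name></invoice>")

# ...and once in a hypothetical fixed binary layout:
# little-endian 4-byte unsigned int, then a 20-byte zero-padded name.
LAYOUT = "<I20s"
binary_msg = struct.pack(LAYOUT, 12345, b"Bill McCalpin")

# Reading the XML needs only the tag names.
invoice = ET.fromstring(xml_msg)
account_from_xml = int(invoice.findtext("account_no"))

# Reading the binary form needs the exact layout, agreed out of band.
account_from_bin, raw_name = struct.unpack(LAYOUT, binary_msg)
name_from_bin = raw_name.rstrip(b"\x00").decode("ascii")

print(len(xml_msg), len(binary_msg))
print(account_from_xml, account_from_bin, name_from_bin)
```

The binary message is shorter, but any change to the layout (a longer name field, a new field in the middle) has to be coordinated with every recipient, which is the extra work and maintenance the slide refers to.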

  28. When do you want to use XML?

  29. When Do I Use XML? • As we have seen, XML is best suited for the preservation of the “author’s” content • And (X)HTML is best suited for presentation of information to an end user • And this leads us to...

  30. Important Fact #3 • In today’s market: • XML is better utilized when communicating with a “thick” client - that is, most B2B in which a software process is the recipient • (X)HTML is better utilized when communicating with a “thin” client - that is, most B2C in which an Internet browser is the recipient • And when is this not true?

  31. Exceptions to Fact #3 • XML can be used in B2C when the browser is used with so much Java and other local applications that the overall process resembles a thick client • (X)HTML can be used in B2B if the recipient is just a human being rather than a software process, e.g., when information is transmitted only to be viewed

  32. The Right Way And The Wrong Way To Use XML

  33. CML: Chemical Markup Language • One of the early “vertical” implementations of XML • The official site is http://www.xml-cml.org/ • A “better” site is http://www.ch.ic.ac.uk/chimeral/ • CML uses the trio of tagged data, Schema, and XSL

  34. A CML XML Document

         <molecule title="caffeine" id="mol_caffeine">
           <formula>C8 H10 N4 O2</formula>
           <string title="CAS">58-08-2</string>
           ...
         </molecule>

  35. The CML Schema

         <?xml version="1.0"?>
         <Schema name="cml_dev_karne"
                 xmlns="urn:schemas-microsoft-com:xml-data"
                 xmlns:dt="urn:schemas-microsoft-com:datatypes">
         ...
         <!-- ********** Element Types ************ -->
         <!-- *** data *** -->
         <ElementType name="molecule" content="eltOnly" model="open" order="many">
           <element type="formula" minOccurs="0" maxOccurs="*"/>
         ...

  36. A CML Stylesheet

         <xsl:template match="molecule">
         <TABLE WIDTH="100%" BORDER="1" CELLSPACING="0" CELLPADDING="3"
                BORDERCOLOR="#CCCCFF" BGCOLOR="#EEEEFF">
         <TR>
         <TD COLSPAN="2">
         <FONT COLOR="#0000AA">Formula
         <FONT COLOR="#000000"><xsl:value-of select="formula"/></FONT></TD><TD>
         ...

  37. The CML Document • Note that each data item is tagged • Note that each tag matches the standard Schema • Note that the data is used to create a complex image in the browser - but not the only possible image!

  38. A Print to XML/HTML Conversion • A print stream does not contain any metadata, only data and presentation information • Tags cannot be meaningful unless they are reverse-engineered • The result might be only the tagged data and the stylesheet • Too often, the XML looks like:

  39. Bad XML Example

         /* text positioning information */
         .ps0{position:absolute;top:533px;left:29px;width:40px;}
         .ps1{position:absolute;top:533px;left:317px;width:38px;}
         .ps2{position:absolute;top:533px;left:454px;width:90px;}
         ...
         /* font properties information */
         .ft1{font-weight:bold;font-size:22px;}
         .ft2{font-size:17px;}
         .ft3{font-size:11px;}
         <!-- text starts here -->
         <SPAN CLASS="ps0"><NOBR>Account Number</NOBR></SPAN>
         <SPAN CLASS="ps1"><NOBR>12345</NOBR></SPAN>
         <SPAN CLASS="ps2"><NOBR>Name</NOBR></SPAN>
         ...
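
What "reverse-engineering" meaningful tags from positioned text might involve can be sketched as follows. The sketch rests on a strong assumption that is not stated in the presentation: on a given row, labels and values alternate from left to right. The function name `spans_to_record` and the root tag `record` are hypothetical; the three spans are the ones visible on slide 39.

```python
import xml.etree.ElementTree as ET

# (top, left, text) for the three spans shown on slide 39.
SPANS = [
    (533, 29, "Account Number"),
    (533, 317, "12345"),
    (533, 454, "Name"),
]

def spans_to_record(spans, root_tag="record"):
    """Pair each label with the value to its right and emit tagged XML.

    Assumes labels and values alternate in reading order; a trailing
    label without a value is dropped.
    """
    ordered = sorted(spans)  # by top, then left: reading order
    texts = [text for _, _, text in ordered]
    root = ET.Element(root_tag)
    for label, value in zip(texts[0::2], texts[1::2]):
        tag = label.lower().replace(" ", "_")  # "Account Number" -> account_number
        ET.SubElement(root, tag).text = value
    return root

record = spans_to_record(SPANS)
xml_text = ET.tostring(record, encoding="unicode")
print(xml_text)
```

Even this small case shows the weakness of the approach: the pairing rule is a guess about the page layout, and it has to be rediscovered for every print format, whereas XML produced at the source carries the tags from the start.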

  40. An Image to XML Example • Most information may not be tagged

         <invoice>
           <account_no>12345</account_no>
           <name>Bill McCalpin</name>
           <data>70 02 02 02 02 FE A7 47 47 48 03 F9 A7 42 27 4A 74….</data>
         </invoice>

  41. The Flow of Information

  42. The Flow of Information • E-Business is about the flow of information between parties as well as within the enterprise • Traditionally, as information moves through the business process, we lose as much information as we add • Look at how we used to treat information:

  43. As Information Flow Used to Be

  44. As Information Flow Used To Be [Diagram; labels as extracted: Data • Data • Toner on paper • Data awareness (metadata) • Presentation information • Scan • Composer • X’010101’ (bits) • Archive • Zap!]

  45. As Information Flow Is Today

  46. As Information Flow Is Today [Diagram; labels as extracted: Data • Data • Web page, emails, etc. • Data awareness (metadata) • Presentation information • Transform • Composer • Text and graphics • PDF • Zap!]

  47. As Information Flow Should Be

  48. As Information Flow Should Be [Diagram; labels as extracted: email • Data • Data • Data awareness (metadata) • Data awareness (metadata) • WAP • Complete XML documents • Web page • Presentation information • archive • paper • User]

  49. Or, As In The XML Bubble... [Diagram; labels as extracted: Web page • Process • Add presentation • Data & metadata • email • Data & metadata • Data & metadata • Process • Cell phones • B2B applications • Archive]

  50. Important Fact #4 • Use XML to delay the loss of important information • Don’t throw away information until you commit the document to a final format which can’t support it • In other words, keep the information in XML as long as possible
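
Fact #4 can be illustrated with a minimal sketch in which one XML record stays the source of truth and presentation is added only at the last step, once per delivery channel. The function names `to_html` and `to_plain_text` are hypothetical, and the record reuses the tagged fields of the invoice example from slide 40.

```python
import xml.etree.ElementTree as ET
from html import escape

# The data and its tags are kept in XML for as long as possible.
INVOICE_XML = ("<invoice><account_no>12345</account_no>"
               "<name>Bill McCalpin</name></invoice>")

def to_html(invoice):
    """Render the record as an HTML table for a browser (a 'thin' client)."""
    rows = "".join(
        f"<tr><th>{escape(field.tag)}</th><td>{escape(field.text or '')}</td></tr>"
        for field in invoice
    )
    return f"<table>{rows}</table>"

def to_plain_text(invoice):
    """Render the same record as plain text, e.g. for an email body."""
    return "\n".join(f"{field.tag}: {field.text or ''}" for field in invoice)

invoice = ET.fromstring(INVOICE_XML)
html_page = to_html(invoice)
email_body = to_plain_text(invoice)

print(html_page)
print(email_body)
```

Both renderings discard the tags, but the XML record itself is untouched, so a further channel (an archive, a B2B feed) can still be served from the complete information.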
