Explore the evolution and impact of document technologies, from legacy systems to XML integration, addressing data formats, literacy, and the role of intelligent agents in document processing.
The XML Bubble
William J. “Bill” McCalpin, EDPP, CDIA, MIT, LIT
Principal, MHE
MHE - Consultants for Document and Datament Technologies
Xplor 21st Global Conference and Exhibit
Miami Beach, Florida
October 30, 2000
Introduction: The Hegelian Dialectic
Thesis, Antithesis, Synthesis
In the philosophy of Hegel, these words describe the inevitable transition of thought, by contradiction and reconciliation, from an initial conviction to its opposite and then to a new, higher conception that involves but transcends both of them.
The Hegelian Dialectic
• Thesis: Most businesses have well-established, productive legacy systems
• Antithesis: XML is springing forth everywhere
• Synthesis: XML will be integrated with legacy systems - enhancing some processes, changing many others, and eliminating some altogether
• In short, XML will affect what you do
The Document In The 20th Century
What Is A Document?
• The American Heritage Dictionary defines a document as “information in writing placed on a medium such as paper, often used as a record.”
• Documents have been placed on clay tablets, gold leaf, animal skins, all types of paper, microfilm, optical storage, and so on
Information And Presentation
• In every case, the document represents a fundamental union of information and presentation
• But “presentation” presumes that the primary audience for the document is a human being
• With the coming of the Internet, this is no longer the case
The Curse Of Presentation
• Composition products require that you specify a printer, even before you know where the document will print
Why Are Print, Image, And Presentation Formats Incompatible?
Printing And Imaging Formats
• Many printing formats: AFP, Metacode, DJDE, XES (UDK), PostScript, PCL, etc.
• All formats use external resources like fonts, forms, graphics, etc., although sometimes inconsistently
• Most are escape-sequence based, some are formal data architectures, and some are almost programming languages
Printing And Imaging Formats
• Many imaging formats - while most used CCITT Group 4 for image compression, most also had proprietary data wrappers
• Later systems adopted text-based formats such as PDF, although storing other print streams is not unknown
• Systems that store text-based formats must wrestle with resource issues
Different Print Formats
• Why do printers have different formats? Because of physical constraints imposed by the hardware:
• resources reduce the amount of data sent through the pipeline to the printer
• pages must be imaged in a fraction of a second
• complex graphics can be developed on the printer, but this needs a special language
Different Imaging Formats
• Why do imaging systems have different formats? Because of physical constraints imposed by the hardware:
• Mass storage was expensive
• Indexing schemes were too close to the application
• Text was sometimes avoided because of resource issues
• Interoperability with other products was an issue
Result
• In each case, data architecture decisions were made in order to enhance some aspect of legibility of the stored objects.
• If there were no requirement to present the information (to a human reader), then the requirement for custom data formats for each vendor would probably disappear!
Universal Literacy: Who’s reading our documents?
The Road To Universal Literacy
• First, only the few could read
• After the printing press, the many began to read
• Eventually, educational reforms brought the ability to read to all
Literacy In The Internet Age
• Can there be a spread of literacy beyond “all”?
• How many webpages have you ever read?
• You will never be able to keep up with the Web – alone
Intelligent Agents
• Just around the corner is software that will read the Web for us – not search, but read
• So we have to spread literacy to an audience beyond “all” – people, that is
• Does increased quality in presentation mean better computer literacy?
Noise On The Net
• Think of the average webpage:
• three dimensional spinning objects
• marquees scrolling across the bottom
• multiple frames bookmarks
• audio
• These items are all designed to attract the eye – your eye
• This does nothing for the machine reading the webpage
The Cost Of Data Differences
“NASA lost a $125 million Mars orbiter because one engineering team used metric units while another used English units for a key spacecraft operation...” (CNN, 9/30/99)
The Nature Of XML
XML And SGML
• XML is eXtensible Markup Language
• XML is a subset (an application profile) of SGML, Standard Generalized Markup Language, an ISO standard (ISO 8879)
• XML is “extensible” because people and enterprises with common interests get together to define the tags which describe their data
XML And Print Formats
• In most print formats, something like account number would be:
• AMB 200 AMI 300 SCFL 01 STO 0,90 TRN 12345-67890
• In XML, the same information is:
• <account_number>12345-67890</account_number>
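To make the contrast concrete, here is a minimal sketch of how a program could read the tagged value. The element name comes from the slide; the function name and the idea of a one-element document are assumptions made only for illustration.

```python
import xml.etree.ElementTree as ET

# The element name is taken from the slide; reducing the document to this
# single element is an assumption for illustration.
xml_fragment = "<account_number>12345-67890</account_number>"

def read_account_number(xml_text):
    """Find the account number by element name, not by position on a page."""
    element = ET.fromstring(xml_text)
    if element.tag != "account_number":
        # Search the descendants when the element is nested in a larger document.
        element = element.find(".//account_number")
    if element is None:
        raise ValueError("no <account_number> element found")
    return element.text

print(read_account_number(xml_fragment))
```

The reader needs no knowledge of fonts, orientation, or page coordinates; the tag alone identifies the data.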
XML And Print Formats
• The nature of all print formats is to be focused on the presentation of the information.
• XML is focused on the “author’s content”; that is, information is described as what it is, not how it looks.
Why XML Over Print?
• Given that print formats are focused on the presentation, it is often difficult for the non-human reader to derive information out of the print data.
• E.g., we could have:
• AMB 200 AMI 300 SCFL 01 STO 0,90 TRN 12345 RMI 120 TRN - RMI 24 TRN 67890
• Note the data is not required to be contiguous
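The difficulty can be sketched in a few lines. The example below treats the slide's control sequence as plain text, which is a simplifying assumption (real print streams are binary and far richer), and the function name is hypothetical. It reassembles the account number by concatenating the TRN payloads and ignoring the positioning controls.

```python
import re

# Control mnemonics exactly as they appear on the slide. Treating the print
# stream as a plain-text string is an assumption made for illustration.
MNEMONICS = "AMB|AMI|SCFL|STO|TRN|RMI"
CONTROL = re.compile(rf"({MNEMONICS})\s*(.*?)(?=\s*(?:{MNEMONICS})|$)")

def reassemble_text(stream):
    """Join the payload of every TRN control, skipping positioning controls."""
    return "".join(
        argument.strip()
        for name, argument in CONTROL.findall(stream)
        if name == "TRN"
    )

split_stream = "AMB 200 AMI 300 SCFL 01 STO 0,90 TRN 12345 RMI 120 TRN - RMI 24 TRN 67890"
print(reassemble_text(split_stream))
```

Even this toy version shows the hazard: it discards spacing, would misread any data that happens to contain a control mnemonic, and has no way of knowing that the reassembled string is an account number.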
Separating Information From Presentation
• XML enables the total separation of information from presentation
• Thus, some XML objects have only tagged information, while others have content and presentation information
[Diagram labels: XML, XML, XSL]
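The separation can be illustrated with a small sketch. Python's standard library has no XSLT processor, so two plain functions stand in for XSL stylesheets here; the `statement` and `balance` elements and both function names are hypothetical. The point is only that one tagged source can feed several presentations.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: only <account_number> appears in the slides.
DATA = (
    "<statement>"
    "<account_number>12345-67890</account_number>"
    "<balance>100.00</balance>"
    "</statement>"
)

def render_text(xml_text):
    """Plain-text presentation: one 'name: value' line per element."""
    root = ET.fromstring(xml_text)
    return "\n".join(f"{child.tag}: {child.text}" for child in root)

def render_html(xml_text):
    """HTML presentation built from the very same information."""
    root = ET.fromstring(xml_text)
    rows = "".join(
        f"<tr><th>{child.tag}</th><td>{child.text}</td></tr>" for child in root
    )
    return f"<table>{rows}</table>"

print(render_text(DATA))
print(render_html(DATA))
```

Neither function changes the data; swapping the presentation means swapping the function, which is the role a stylesheet plays.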
The Four Spaces
Dr. Davidson’s DocumentSpace
• Dr. Keith Davidson, EDPP, hypothesized that we work in something called the “DocumentSpace”
• He believes that industries will become spaces under the influence of the Internet
Three Spaces
• Dr. Davidson stated that there were three spaces: PrintSpace, MarketSpace, and DecisionSpace
• PrintSpace comprises our existing industry
• MarketSpace covers documents used in financial transactions
• DecisionSpace deals with documents used in knowledge management
Three Spaces Become Four
• I have added a fourth space: ArchiveSpace, the use of documents in archival and records management to preserve information
• These four spaces can be viewed as --->
The Use Of The Document In The Four Spaces
Document And Information
• The document is used as a container of information, particularly in the exchange of information across the boundaries between the four spaces
• Documents are used for two reasons:
• (1) The lack of common data standards across the four spaces, and
• (2) The requirement that humans be able to read and process the information
Print To Image
Print To Image Format
• Print formats are Metacode, DJDE, AFP, PCL, PostScript, and so on
• Image formats are TIFF, MO:DCA, other proprietary formats using CCITT Group 4, and PDF
• Only AFP & MO:DCA, and PostScript & PDF are closely related, but PostScript to PDF requires a transform, and AFP and MO:DCA are often not implemented the same way
Print To Market
Print To Market Formats
• Print formats are Metacode, DJDE, AFP, PCL, PostScript, and so on
• Financial Interchange formats are OFX/IFX, XML, and “transaction” data
• The significant data must be extracted from the print stream to create data for SGML-based formats - a sometimes hazardous process
• However, using the original transaction data instead may not be correct
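One way to reduce the hazard is to validate each extracted value before wrapping it in markup. The sketch below is an illustration under stated assumptions: the `statement` element, the function name, and the five-digits-hyphen-five-digits pattern (inferred only from the example account number used earlier in this deck) are all hypothetical.

```python
import re
import xml.etree.ElementTree as ET

# Assumed account-number shape, inferred only from the deck's example value.
ACCOUNT_FORMAT = re.compile(r"\d{5}-\d{5}")

def to_xml(extracted):
    """Wrap extracted print-stream fields in XML, rejecting suspicious values."""
    account = extracted.get("account_number", "")
    if not ACCOUNT_FORMAT.fullmatch(account):
        # A partial value usually means the extraction missed a fragment.
        raise ValueError(f"suspicious account number: {account!r}")
    root = ET.Element("statement")  # hypothetical wrapper element
    ET.SubElement(root, "account_number").text = account
    return ET.tostring(root, encoding="unicode")

print(to_xml({"account_number": "12345-67890"}))
```

A value such as "12345", which is what a naive extractor would return from a non-contiguous print stream, is rejected instead of being passed silently into a financial interchange format.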
Print To Knowledge
Print To Knowledge Formats
• Print formats are Metacode, DJDE, AFP, PCL, PostScript, and so on
• True Knowledge Management does not yet exist - it’s often blob management
• XML and its many related standards will make KM possible, if you think of KM as something like human knowledge
• As noted, deriving XML from existing processes can be hazardous
The Growth Of The XML Bubble
[Figure sequence: a map of business document applications (Compliance, Archive, New Sales, Reprints, Policy Print, Reports, Notices, CRM, 1:1 Mark., Campaign Manage., Billing, HR Pol. & Proc., EDI). XML and EBPP are added from the second frame onward.]
William J. “Bill” McCalpin, EDPP, CDIA, MIT, LIT
Principal, MHE
1400 Cheyenne Dr., Richardson, Texas 75080-3921
972-231-3660 (v) 972-690-4521 (f)
mccalpin@mhe-consulting.com