Explore the evolution of document presentation from traditional to electronic formats, the significance of printing in the Internet era, and future Internet formats.
Traditional Electronic Printing On The Internet
William J. “Bill” McCalpin, EDPP, CDIA, MIT, LIT
Principal, MHE
MHE - Consultants for Document and Datament Technologies
Xplor 21st Global Conference and Exhibit
Miami Beach, Florida
October 30, 2000
Printing Versus The Internet
Printing Versus The Internet
• Electronic printing is a $125,000,000,000 (US) industry worldwide (www.xplor.org)
• There are now an estimated 98,685,000 host computers on the Internet (www.mids.org)
• Xplor International estimates that the production of both paper documents and electronic documents is still increasing
• So, for a while yet, we’re living in a hybrid world
Printing Versus The Internet
• Customer service needs identical look and feel in paper and electronic documents
• Regulatory agencies continue to have an interest in document presentation
• Customers need a re-education process as documents change media
• Hence, there are good reasons in the short run to be concerned about presentation
The Nature Of Print Streams
EBCDIC Versus ASCII
• BCD - Binary Coded Decimal
• BCDIC - Binary Coded Decimal Interchange Code
• EBCDIC - IBM Extended Binary Coded Decimal Interchange Code
• ASCII - American Standard Code for Information Interchange
EBCDIC Line Data
• EBCDIC encoded - 8 bit
• Record-oriented because of IBM operating systems
• Carriage controls
  • Machine carriage controls
  • ANSI carriage controls
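As an illustration of how a carriage control drives vertical spacing, here is a minimal sketch that interprets the conventional ANSI control characters (blank = advance one line, `0` = two lines, `-` = three lines, `+` = overprint, `1` = new page) in the first column of each record. It assumes the records have already been decoded to text; the names `LINE_ADVANCE`, `overlay`, and `render_pages` are hypothetical, and machine carriage controls are not handled.

```python
# Lines to advance before printing, keyed by ANSI carriage control character.
LINE_ADVANCE = {" ": 1, "0": 2, "-": 3}


def overlay(base, over):
    """Overprint: non-blank characters of `over` replace those of `base`."""
    width = max(len(base), len(over))
    base, over = base.ljust(width), over.ljust(width)
    return "".join(o if o != " " else b for b, o in zip(base, over))


def render_pages(records):
    """Turn records with a leading carriage control into pages of lines."""
    pages = [[]]
    for record in records:
        control, text = record[:1], record[1:]
        page = pages[-1]
        if control == "1":
            # Skip to the top of a new page, then print.
            if page:
                pages.append([])
            pages[-1].append(text)
        elif control == "+" and page:
            # No advance: print on top of the previous line.
            page[-1] = overlay(page[-1], text)
        else:
            # Unknown controls are treated as single spacing in this sketch.
            advance = LINE_ADVANCE.get(control, 1)
            page.extend([""] * (advance - 1))
            page.append(text)
    return pages


if __name__ == "__main__":
    sample = ["1INVOICE", " Item A", "0Total", "1Next page"]
    for number, page in enumerate(render_pages(sample), start=1):
        print(f"--- page {number} ---")
        print("\n".join(page))
```

The control character is consumed and never printed, which is why a record-oriented stream needs no line-end bytes of its own.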
ASCII Line Data
• ASCII encoded - 7 bit
• ‘Record’ orientation is not intrinsic to OS
• Text files use print controls to delimit records
• Common print controls
  • x’0d’ carriage return
  • x’0a’ line feed
  • x’0c’ form feed
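Because ASCII text has no intrinsic records, a reader has to rebuild pages and lines from the print controls listed above. The following is a minimal sketch of that idea, assuming the stream is plain 7-bit ASCII; the function name `split_pages` is hypothetical.

```python
import re


def split_pages(data):
    """Split an ASCII print stream into pages (on x'0c') and lines (on CR/LF)."""
    text = data.decode("ascii")
    chunks = text.split("\x0c")
    if chunks and chunks[0] == "":
        chunks = chunks[1:]  # the stream began with a form feed
    pages = []
    for chunk in chunks:
        lines = re.split(r"\r\n|\r|\n", chunk)
        if lines[-1] == "":
            lines.pop()  # a trailing line end is a terminator, not an empty line
        pages.append(lines)
    return pages


if __name__ == "__main__":
    stream = b"\x0c     This is text\r\n"
    print(split_pages(stream))
```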
The EBCDIC Family Tree
• EBCDIC text
• 1403 data - EBCDIC records with a carriage control
• LCDS - ‘Line conditioned’ data stream
• 3800 Mod I
• 3211 data with Xerox DJDEs
• Others
• AFP, MO:DCA, and IPDS
The ASCII Family Tree
• ASCII text
• ASCII text with print controls
• ASCII text with escape sequences
  • Epson MX-80
  • Xerox UDK (XES)
  • QMS QUIC
  • IBM PPDS
  • HP PCL
  • Xerox Metacode
• Print programming languages using ASCII
  • Interpress
  • PostScript
Line Data And Conditioned Line Data
• 1403, 3211, other EBCDIC line data streams, including Xerox DJDE
• 3800 Mod I and other IBM data streams
• ASCII text files of all sorts
EBCDIC record with a carriage control (characters, then the high and low hex digit of each byte):
    1     This is text
    F44444E88A48A4A8AA
    100000389209203573
ASCII record with print controls (FF, CR, and LF shown as stacked letters, then the high and low hex digit of each byte):
    F                 CL
    F     This is textRF
    02222256672672767700
    C00000489309304584DA
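The two hex dumps on this slide can be reproduced with a few lines of code. This sketch assumes Python's `cp037` codec as a stand-in for the EBCDIC code page used on the slide (the deck does not name one); `vertical_hex` is a hypothetical helper that returns the row of high hex digits and the row of low hex digits.

```python
def vertical_hex(data):
    """Return (high digits, low digits) for a vertical hex dump of `data`."""
    digits = data.hex().upper()
    return digits[0::2], digits[1::2]


# EBCDIC record: carriage control '1', five blanks, then the text.
ebcdic_record = "1     This is text".encode("cp037")

# ASCII record: form feed, five blanks, the text, then CR and LF.
ascii_record = "\x0c     This is text\r\n".encode("ascii")

if __name__ == "__main__":
    for label, record in (("EBCDIC", ebcdic_record), ("ASCII", ascii_record)):
        high, low = vertical_hex(record)
        print(label)
        print("  " + high)
        print("  " + low)
```

The same twelve characters of text end up as entirely different byte values in the two encodings, which is why no print stream can be moved between the two families without translation.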
Print Data With Escape Sequences
• Epson and many other impact printers
• Xerox UDK (XES)
• QMS QUIC
• IBM PPDS
• HP PCL
• Xerox Metacode
• AFP, MO:DCA, and IPDS
    X’01060001040002000154686973206973207465787401’
    AMB 100 AMI 300 STO 0,90 SCFL 3 SVI 14 TRN “This is text”
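The hex sample on this slide mixes binary control bytes with the printable text. As a rough illustration, the sketch below pulls runs of printable ASCII out of such a stream. It makes no assumption about which printer language the bytes belong to or what the control bytes mean; `printable_runs` and its `min_len` parameter are hypothetical.

```python
import re

SAMPLE_HEX = "01060001040002000154686973206973207465787401"


def printable_runs(data, min_len=4):
    """Return runs of at least `min_len` printable ASCII bytes found in `data`."""
    pattern = rb"[\x20-\x7e]{%d,}" % min_len
    return [run.decode("ascii") for run in re.findall(pattern, data)]


if __name__ == "__main__":
    stream = bytes.fromhex(SAMPLE_HEX)
    print(len(stream), "bytes")
    print(printable_runs(stream))
```

Recovering the text this way loses all positioning and font information, which is exactly what the control bytes around it carry.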
Print Programming Languages
• Interpress
• PostScript (and PDF)
    %!PS-Adobe-2.0
    %%Title: Blue Book Program 7, on page 157
    %%EndComments
    /Times-Roman findfont 18 scalefont setfont
    72 500 moveto
    (This is text) show
    ...
The Nature Of Internet Formats
Common Internet Formats
• The most commonly used data format on the Internet is HTML - HyperText Markup Language
• The next expected wave on the Internet is XML (eXtensible Markup Language) and its related standards such as XSL, SVG, etc.
• As a secondary standard, PDF is widely used to present static documents
HTML
• HTML is an application of SGML
• HTML has a set of 40 to 50 tags, which are “grammar” based
• HTML tags have default presentation characteristics, but these can be overridden with CSS (Cascading Style Sheets)
Sample HTML
    <!doctype html public "-//w3c//dtd html 4.0 transitional//en">
    <html>
    <h1>Poison Ivy Vineyards</h1>
    <p>Poison Ivy Vineyards is an experiment in growing wine-quality grapes in a backyard in a residential neighborhood in Richardson, Texas. This website serves as a running diary of the steps I took to create the vineyard and - eventually - to make wine.</p>
    </html>
XML
• XML is eXtensible Markup Language, which means that you can make up the tags
• Since a browser can’t know how to format the tags, default formatting is in outline form
• Normally, you would use XSL (or CSS) to describe how each tag is to be formatted
Sample XML
    <NAME>William J. "Bill" McCalpin, EDPP, CDIA, MIT, LIT</NAME>
    <JOBTITLE>Principal</JOBTITLE>
    <AFFILIATION>MHE</AFFILIATION>
    <ADDRESS>
      <STREET>1400 Cheyenne Dr.</STREET>
      <CITY>Richardson</CITY>
      <STATE>Texas</STATE>
      <ZIPCODE>75080</ZIPCODE>
      <EMAIL>mccalpin@mhe-consulting.com</EMAIL>
    </ADDRESS>
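With no style sheet, about all a program can do with made-up tags is show their nesting. The sketch below prints an abbreviated version of the sample as an indented outline using Python's standard `xml.etree.ElementTree`. The `<CONTACT>` wrapper is an assumption added here because an XML document needs a single root element; `outline` is a hypothetical helper, and its output only approximates what any particular browser shows.

```python
import xml.etree.ElementTree as ET

SAMPLE = """<CONTACT>
  <JOBTITLE>Principal</JOBTITLE>
  <AFFILIATION>MHE</AFFILIATION>
  <ADDRESS>
    <CITY>Richardson</CITY>
    <STATE>Texas</STATE>
  </ADDRESS>
</CONTACT>"""


def outline(elem, depth=0):
    """Return one indented line per element, with its text if it has any."""
    text = (elem.text or "").strip()
    line = "  " * depth + elem.tag + (f": {text}" if text else "")
    lines = [line]
    for child in elem:
        lines.extend(outline(child, depth + 1))
    return lines


if __name__ == "__main__":
    root = ET.fromstring(SAMPLE)
    print("\n".join(outline(root)))
```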
Sample XSL
    This is an <emph>important</emph> point.
    <xsl:template match="emph">
      <fo:sequence font-weight="bold">
        <xsl:process-children/>
      </fo:sequence>
    </xsl:template>
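The template on this slide says, in effect, "wherever `emph` occurs, emit its content in bold". The sketch below imitates that single rule in Python. It is not an XSL processor: the `RULES` table, the `<para>` wrapper, and the choice of an HTML `span` as the output are all assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# Source tag -> (output tag, output attributes); one entry per "template".
RULES = {"emph": ("span", {"style": "font-weight: bold"})}


def apply_rules(root):
    """Rewrite every element that matches a rule; children are kept as-is."""
    for elem in root.iter():
        if elem.tag in RULES:
            new_tag, attributes = RULES[elem.tag]
            elem.tag = new_tag
            elem.attrib.update(attributes)
    return root


if __name__ == "__main__":
    source = "<para>This is an <emph>important</emph> point.</para>"
    result = apply_rules(ET.fromstring(source))
    print(ET.tostring(result, encoding="unicode"))
```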
PDF
• PDF is Adobe’s Portable Document Format
• PDF is a print stream, not an SGML instance
• PDF is similar to PostScript, but more portable, because it carries its own resources
• PDF provides good fidelity, at a price
Sample PDF
    %PDF-1.1
    ...
    2 0 obj
    <<
    /CreationDate (D:19960809191047)
    /Producer (Acrobat Distiller 2.1 for Windows)
    /Creator (Adobe PageMaker 6.0)
    /Author (Doc)
    /Keywords ()
    /Title (bills)
    /Subject ()
    >>
    endobj
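The object in this sample is plain text, so its key/value pairs can be read with very little code. The sketch below is a rough illustration only: it handles simple `/Key (value)` pairs like the ones shown, not escaped or nested parentheses or any other PDF syntax, and `parse_info` and `parse_pdf_date` are hypothetical helper names.

```python
import re
from datetime import datetime

INFO_OBJECT = """2 0 obj
<<
/CreationDate (D:19960809191047)
/Producer (Acrobat Distiller 2.1 for Windows)
/Creator (Adobe PageMaker 6.0)
/Author (Doc)
/Keywords ()
/Title (bills)
/Subject ()
>>
endobj"""


def parse_info(obj_text):
    """Collect simple '/Key (value)' pairs into a dict."""
    return dict(re.findall(r"/(\w+)\s*\(([^()]*)\)", obj_text))


def parse_pdf_date(value):
    """Parse the leading 'D:YYYYMMDDHHMMSS' part of a PDF date string."""
    digits = value[2:] if value.startswith("D:") else value
    return datetime.strptime(digits[:14], "%Y%m%d%H%M%S")


if __name__ == "__main__":
    info = parse_info(INFO_OBJECT)
    print(info["Producer"])
    print(parse_pdf_date(info["CreationDate"]).isoformat())
```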
Limits Of Browsers
A Normal HTML Page
Default Font Increased
Using Ghouly Solid
Adjusting The Fonts
Methods Of Moving Traditional Electronic Print To The Internet
Five Methods
• Conversion to PDF
• Rasterization to gif or jpeg
• Recomposition into HTML/XML
• “Conversion” to normal HTML/XML
• Translation to highly formatted HTML/XML
Conversion to PDF
• This is a print stream to print stream conversion
• The output in PDF usually looks very similar to the original printed document
• Many tools which create the PDF also add value, such as hypertext links, bookmarking, et cetera, to the PDF document
Pros And Cons Of PDF
• Pros:
  • High fidelity to original document
  • Reader is widespread and free
  • Reasonably transportable
  • Widely used in some circles (e.g., IRS)
• Cons:
  • PDF files tend to be large
  • PDF documents are paper-size centric
  • Browser requires a “plug-in”*
PDF Sample
    %PDF-1.1
    ...
    2 0 obj
    <<
    /CreationDate (D:19960809191047)
    /Producer (Acrobat Distiller 2.1 for Windows)
    /Creator (Adobe PageMaker 6.0)
    /Author (Doc)
    /Keywords ()
    /Title (bills)
    /Subject ()
    >>
    endobj
Sources For * To PDF
• Composition Tools - create new PDF documents from source code
• Transforms - translate existing formatted print streams into PDF
• Larger Systems - composition or translation capabilities inserted transparently into document systems
• See Xplor Products and Services Reference Guide
Rasterization to gif or jpeg
• The print stream is “rasterized”, that is, converted to a bitmap format
• GIF (Graphics Interchange Format): invented by CompuServe for graphics; supports only 256 colors, or 8 bits per pixel
• JPEG (Joint Photographic Experts Group): designed for images with more than 256 colors, with better compression, but it is “lossy”
• Excellent discussion of each at http://www.efuse.com/Design/web_graphics_basics.html
Pros And Cons Of Rasterization
• Pros:
  • Image is exact copy of original document
  • Image can be viewed on any browser which takes gifs and jpegs
• Cons:
  • Resolution is hardcoded at one size
  • There’s no text to search
  • Download is longer
  • No correspondence of printed pages and “HTML” pages
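A little arithmetic shows why resolution and download time appear among the cons. The sketch below computes the uncompressed size of one rasterized page. The page size (US letter, 8.5 x 11 inches) and the 300 dpi resolution are assumptions chosen for illustration, and real GIF or JPEG files will be smaller after compression; `raster_bytes` is a hypothetical helper.

```python
def raster_bytes(width_in, height_in, dpi, bits_per_pixel):
    """Uncompressed size in bytes of a page rasterized at a fixed resolution."""
    width_px = round(width_in * dpi)
    height_px = round(height_in * dpi)
    return width_px * height_px * bits_per_pixel // 8


if __name__ == "__main__":
    # 8 bits per pixel gives 2**8 = 256 possible colors, the GIF limit.
    print(2 ** 8, "colors at 8 bits per pixel")
    for bits in (1, 8, 24):
        size = raster_bytes(8.5, 11, 300, bits)
        print(f"{bits:2d} bits/pixel: {size:,} bytes uncompressed")
```

Because the pixel grid is fixed when the page is rasterized, enlarging the image in the browser cannot add detail; a higher resolution has to be paid for up front in file size.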
Sample Rasterization
• This page was originally created in PDF, then rasterized, and converted to a jpeg
Recomposition into HTML/XML
• Data is extracted from a print stream
• Templates have been created in advance
• The extracted data is merged into the templates
• There may be fewer or more output pages in HTML than were in the print stream
• Templates are built to be the most effective in the browser window
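The extract-and-merge steps above can be sketched in a few lines. Everything specific here is an assumption made for illustration: the sample records (a carriage control in column one, then "label: value" text), the field names, and the HTML template. A real print stream would need extraction rules written for its own layout.

```python
from html import escape
from string import Template

# Hypothetical line data: carriage control in column 1, then the text.
PRINT_RECORDS = [
    "1Telephone Bill",
    " Account: 555-0100",
    " Amount due: 42.17",
]

# Template created in advance, designed for the browser rather than the page.
PAGE_TEMPLATE = Template(
    "<html>\n"
    "<h1>$title</h1>\n"
    "<p>Account: $account</p>\n"
    "<p>Amount due: $amount_due</p>\n"
    "</html>"
)


def extract_fields(records):
    """Pull the title and the 'label: value' pairs out of the print records."""
    fields = {"title": records[0][1:].strip()}
    for record in records[1:]:
        label, _, value = record[1:].partition(":")
        fields[label.strip().lower().replace(" ", "_")] = value.strip()
    return fields


def recompose(records):
    """Merge the extracted (and HTML-escaped) data into the template."""
    fields = {key: escape(value) for key, value in extract_fields(records).items()}
    return PAGE_TEMPLATE.substitute(fields)


if __name__ == "__main__":
    print(recompose(PRINT_RECORDS))
```

Only the data survives the trip; the original page layout is discarded, which is why the HTML pages need not match the printed ones.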
Pros And Cons Of Recomposition
• Pros:
  • HTML/XML pages are well-suited for the browser
  • HTML/XML is considered by some to be simpler than PDF
• Cons:
  • HTML/XML pages don’t necessarily match the printed pages
  • All pages (templates) must be pre-composed
Sample Recomposition
• This document is a sample telephone bill which has been divided into 11 HTML pages
• Note how the HTML pages are divided by subject, not by page overflow
“Conversion” to normal HTML/XML
• Both data and formatting information are extracted from the print file
• Some formats easily correspond to an HTML tag, e.g., a heading to <h1>
• More complex formatting can be approximated by the use of table tags
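One way to picture this method is as a lookup from print formatting to the nearest HTML tag. The sketch below maps hypothetical text runs, each a (text, point size, bold) tuple, to tags. The 18-point heading threshold and the run format are assumptions, not taken from any particular conversion tool, and table-based layout is not attempted.

```python
from html import escape

HEADING_POINT_SIZE = 18  # assumed threshold for treating a run as a heading


def to_html(runs):
    """Map (text, point_size, bold) runs to the closest ordinary HTML tags."""
    parts = []
    for text, point_size, bold in runs:
        body = escape(text)
        if point_size >= HEADING_POINT_SIZE:
            parts.append(f"<h1>{body}</h1>")
        elif bold:
            parts.append(f"<p><b>{body}</b></p>")
        else:
            parts.append(f"<p>{body}</p>")
    return "\n".join(parts)


if __name__ == "__main__":
    runs = [
        ("Poison Ivy Vineyards", 18, False),
        ("Balance Forward", 10, True),
        ("5,000.00", 10, False),
    ]
    print(to_html(runs))
```

Exact positions and font metrics are dropped in the mapping, so the browser decides the final appearance.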
Pros And Cons of “Conversion”
• Pros:
  • HTML/XML pages look similar to printed pages
  • Pages are in HTML/XML, not PDF or raster
• Cons:
  • Fidelity is approximate
  • Reader can substantially alter the presentation
  • Graphics may not be supported
Sample “Conversion”
Translation to highly formatted HTML/XML
• This method uses particular CSS commands to do “exact” placement of text in the window of the browser
• This is as close as XML gets (today) to being a print stream
• Fonts are still subject to user override
Pros And Cons Of Translation
• Pros:
  • Author has very good control over the presentation of text
• Cons:
  • Much of the value of a tagged language is lost
  • Portrait print pages still don’t fit on landscape browser windows
  • May not work with all browsers
  • Fonts can still be overridden
Sample Translation
    <HTML>
    <HEAD>
    .ps9{position:absolute;top:676px;left:454px;width:65px;}
    .ps10{position:absolute;top:676px;left:535px;width:66px;}
    .ps11{position:absolute;top:676px;left:1102px;width:70px;}
    <SPAN CLASS="ps9"><NOBR>Balance</NOBR></SPAN>
    <SPAN CLASS="ps10"><NOBR>Forward</NOBR></SPAN>
    <SPAN CLASS="ps11"><NOBR>5,000.00</NOBR></SPAN>
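Output like the sample above is mechanical to produce once each piece of text has a position. The sketch below generates one CSS rule and one `SPAN` per text item. It assumes the positions have already been converted to pixels, and the `translate` function and its input tuples of (text, top, left, width) are hypothetical.

```python
from html import escape


def translate(items, first_class=9):
    """Return (css_rules, spans) placing each (text, top, left, width) item."""
    rules, spans = [], []
    for number, (text, top, left, width) in enumerate(items, start=first_class):
        rules.append(
            f".ps{number}{{position:absolute;top:{top}px;"
            f"left:{left}px;width:{width}px;}}"
        )
        spans.append(f'<SPAN CLASS="ps{number}"><NOBR>{escape(text)}</NOBR></SPAN>')
    return rules, spans


if __name__ == "__main__":
    items = [
        ("Balance", 676, 454, 65),
        ("Forward", 676, 535, 66),
        ("5,000.00", 676, 1102, 70),
    ]
    rules, spans = translate(items)
    print("\n".join(rules + spans))
```

Every word carries its own coordinates, so the markup says where the text goes but nothing about what it means, which is the loss of tagged-language value noted among the cons.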
William J. “Bill” McCalpin, EDPP, CDIA, MIT, LIT
Principal, MHE
1400 Cheyenne Dr.
Richardson, Texas 75080-3921
972-231-3660 (v)
972-690-4521 (f)
mccalpin@mhe-consulting.com