
Text processing

Text processing. Programming Language Design and Implementation (4th Edition) by T. Pratt and M. Zelkowitz, Prentice Hall, 2001, Section 12.1. Desktop publishing. Traditional publication systems: WYSIWYG (what you see is what you get); typewriters are examples of early WYSIWYG systems.


Presentation Transcript


  1. Text processing
     Programming Language Design and Implementation (4th Edition) by T. Pratt and M. Zelkowitz, Prentice Hall, 2001, Section 12.1

  2. Desktop publishing
     Traditional publication systems:
     • WYSIWYG: what you see is what you get
     • Typewriters are examples of early WYSIWYG systems
     • More complex today: multiple fonts, colors, embedded graphics
     • Need for embedded commands to describe the layout of a document
     Three approaches to desktop publishing:
     • WYSIWYG
     • Page description languages
     • Document compiling

  3. WYSIWYG
     A common approach in the PC world:
     • Tools like Microsoft Word and Corel WordPerfect
     • Embedded commands in the document control layout (fonts, colors, font size, location of objects)
     • Rich Text Format (RTF): an ASCII language for describing such layout. It can be used to pass documents among different word processors.

  4. LaTeX
     TeX: document processing system
     • developed by Donald Knuth
     • a macro processing system for the creation of string text (i.e., documents)
     • arcane syntax
     LaTeX: macros for TeX
     • a set of macros developed for TeX by Leslie Lamport
     • creates a series of environments and control structures similar to programming language structures
     • for lack of a better term, we often refer to the "compiling" of the book as the various chapters are processed by the TeX program
     • this book was developed using LaTeX

  5. LaTeX execution
     Executes much like a traditional compiler:
     • First pass: Read in the text and create the output format. Create a symbol table for all internal references (section numbers, page numbers, figure numbers). Create the table of contents and index, if desired.
     • Second pass: Read in the text and create the output format. This time, internal references are correct because of the symbol table created during pass 1.
     • Third pass: If pass 2 made no changes to the symbol table, the output is the same as pass 2; otherwise repeat pass 2 until no further changes are made to the symbol table.
     [Why more than 2 passes? Think of putting a table of contents at the beginning of a report.]
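The repeat-until-stable idea can be sketched in a few lines of Python. This is only a toy model, not how TeX is implemented: the names `run_pass` and `compile_document`, the (title, pages) section format, and the rule that the table of contents takes one page per two entries are all assumptions made for illustration.

```python
import math


def run_pass(sections, symbols, entries_per_page=2):
    """One pass over the document; returns the symbol table it produced.

    `sections` is a list of (title, length_in_pages) pairs. `symbols` is the
    table left by the previous pass (empty on the first pass), so the table
    of contents can only be typeset from the second pass onward.
    """
    # Assumed layout rule: the table of contents needs one page per two entries.
    toc_pages = math.ceil(len(symbols) / entries_per_page)
    page = 1 + toc_pages
    new_symbols = {}
    for title, length in sections:
        new_symbols[title] = page  # record where this section starts
        page += length
    return new_symbols


def compile_document(sections, max_passes=10):
    """Repeat passes until the symbol table stops changing."""
    symbols = {}
    for passes in range(1, max_passes + 1):
        new_symbols = run_pass(sections, symbols)
        if new_symbols == symbols:
            return symbols, passes
        symbols = new_symbols
    raise RuntimeError("references did not stabilize")


symbols, passes = compile_document([("Intro", 2), ("Method", 3), ("Results", 1)])
print(passes, symbols)
```

In this model the first pass has no table of contents, the second pass inserts one (which shifts every section), and the third pass confirms that nothing moved again.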

  6. LaTeX features
     LaTeX creates environments that make TeX easier to use. These behave much like C or Pascal scope rules.
     For example, one can begin and end a list of items:
     • Numbered
         \begin{enumerate}
         \item text      [prints as number 1]
         \item text      [prints as number 2]
         \end{enumerate} [end of list]
     • Bulleted ("itemize")
     • Named ("description")
     Starting new sections or subsections automatically adjusts the appropriate section numbers.
     LaTeX has a syntax similar to the block-structured style of a programming language.
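The scope-like behavior of nested numbered lists can be modeled with a stack of counters, one per open environment. The sketch below is illustrative only: the class name `ListNumbering` is hypothetical, and the dotted labels ("1.2") are a simplification, since LaTeX's default labels for nested lists look different.

```python
class ListNumbering:
    """Toy model of nested enumerate environments (not LaTeX's own code)."""

    def __init__(self):
        self._counters = []  # one counter per currently open environment

    def begin(self):
        # Entering an environment opens a new scope with a fresh counter.
        self._counters.append(0)

    def item(self):
        # Each item bumps the innermost counter only.
        self._counters[-1] += 1
        return ".".join(str(n) for n in self._counters)

    def end(self):
        # Leaving the environment discards its counter; the outer one resumes.
        self._counters.pop()


numbering = ListNumbering()
labels = []
numbering.begin()
labels.append(numbering.item())   # first outer item
numbering.begin()
labels.append(numbering.item())   # first inner item
labels.append(numbering.item())   # second inner item
numbering.end()
labels.append(numbering.item())   # outer numbering continues
numbering.end()
print(labels)
```

The inner counter disappears when its environment ends, much as a local variable goes out of scope at the end of a block.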

  7. LaTeX structure

  8. LaTeX execution
     By invoking LaTeX, the latex.tex macros are read into TeX to create commands for chapters, sections, subsections, figures, tables, lists, and the numerous other structures needed to write simple documents.
     The \documentstyle command (in LaTeX) allows the user to add other style features.
     • The required article parameter causes article.sty to be read in to tailor latex.tex with the commands needed for an article. For example, there are no chapters in articles, but for style book (i.e., book.sty), chapters are defined.
     • 11pt defines the size of the text font (11-point type), and art11.sty is read, giving additional information on line and character spacing for 11-point type. The TeX program along with article.sty and art11.sty forms the standard way to process a LaTeX article.
     • mystyle.sty defines additional macros a user can add to tailor LaTeX for a specific document.

  9. Page description languages
     Postscript consists of five components:
     1. An interpreter for performing calculations. A simple postfix execution stack is the basic model.
     2. A language syntax. This is based on Forth.
     3. Painting extensions. An extension to Forth with painting commands for managing the process of painting text and pictures on a sheet of paper.
     4. A virtual machine. Postscript defines a virtual machine for drawing information (text and graphics) on a page. The showpage operator causes the described page to be displayed.
     5. Conventions. A series of conventions, not part of the formal Postscript language, that various printers use for consistency in presentation. Following these conventions makes it easier to transport Postscript documents from one system to another.

  10. Postscript execution model
      A Postscript program consists of a sequence of commands that represent the postfix form of the algorithm necessary to paint the document.
      Postscript execution begins with two entries initially on the stack, which the program may not remove:
      • systemdict is the system dictionary, which represents the initial binding of Postscript objects to their internal representation.
      • userdict is the user dictionary, which represents the new definitions included within this execution of a Postscript program. This may include redefinition of primitive objects already defined in systemdict.

  11. Sample Postscript command
      Each argument is stacked on the Postscript stack:
          /box {newpath 0 0 moveto 0 1 lineto 3 1 lineto 3 0 lineto closepath} def
      • /box: Add the name box to the stack. The / says this is a definition and not to evaluate the argument, only move it to the stack (like quote in LISP).
      • newpath: Start a new path.
      • moveto: Take the top two stack arguments and move the cursor to that (X,Y) location.
      • lineto: Draw a line from the current cursor to the (X,Y) address given by the top two stack numbers.
      • closepath: Draw a line back to the starting point of the path.
      • def: Everything within { ... } is defined to be the command box.
      [Note that the command box now draws a rectangle from (0,0) to (0,1) to (3,1) to (3,0) and back to (0,0).]
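To make the stack behavior concrete, here is a minimal Python sketch of an interpreter for just the operators used in the box example. It is not a Postscript implementation: the class `MiniPostscript`, the helpers `tokenize` and `parse`, and the choice to record the path as a list of points are assumptions made for illustration, and nothing is actually painted.

```python
def tokenize(text):
    """Split source text into tokens, treating { and } as separate tokens."""
    return text.replace("{", " { ").replace("}", " } ").split()


def parse(tokens):
    """Group the tokens between { and } into nested lists (procedure bodies)."""
    stack = [[]]
    for tok in tokens:
        if tok == "{":
            stack.append([])
        elif tok == "}":
            body = stack.pop()
            stack[-1].append(body)
        else:
            stack[-1].append(tok)
    return stack[0]


class MiniPostscript:
    def __init__(self):
        self.operands = []   # operand stack
        self.userdict = {}   # user definitions created by def
        self.path = []       # points of the current path

    def run(self, program):
        for obj in program:
            if isinstance(obj, list) or obj.startswith("/"):
                self.operands.append(obj)        # procedure body or quoted name: just stack it
            elif obj in self.userdict:
                self.run(self.userdict[obj])     # user-defined command: execute its body
            elif obj == "def":
                body = self.operands.pop()
                name = self.operands.pop()
                self.userdict[name[1:]] = body   # bind the name (without /) to the body
            elif obj == "newpath":
                self.path = []
            elif obj in ("moveto", "lineto"):
                y = self.operands.pop()
                x = self.operands.pop()
                self.path.append((x, y))
            elif obj == "closepath":
                self.path.append(self.path[0])   # return to the starting point
            else:
                self.operands.append(float(obj)) # anything else is assumed to be a number


source = "/box {newpath 0 0 moveto 0 1 lineto 3 1 lineto 3 0 lineto closepath} def"
interp = MiniPostscript()
interp.run(parse(tokenize(source)))   # defines box; the path is still empty
interp.run(parse(tokenize("box")))    # executes box and builds the path
print(interp.path)
```

Running the definition alone leaves the path empty; only the later use of box walks the four corners and returns to the start.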

  12. Summary
      Note the differences between the models:
      • LaTeX and MS Word define the layout of the final document.
      • Postscript defines a program which computes the final layout. A Postscript printer contains an interpreter that executes the Postscript program to produce the final printed document.

  13. Postscript execution stacks
      1. The operand stack contains the operands as they are stacked, executed, and unstacked.
      2. The dictionary stack contains only dictionary objects. This stack defines the scope and context of each definition.
      3. The execution stack contains executable objects. For the most part, these are functions in intermediate stages of execution.
      4. The graphics state stack manages the context for painting objects on the page.
      A Postscript program is a sequence of ASCII characters. As each token is read, its definition is looked up in the dictionary stack (first in userdict and then in systemdict) and executed by an appropriate action.
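The lookup order (userdict first, then systemdict) can be sketched with Python's `collections.ChainMap`, which searches a sequence of dictionaries in order. The dictionary contents below are placeholder strings invented for illustration, not real Postscript objects.

```python
from collections import ChainMap

# Placeholder entries standing in for built-in operators.
systemdict = {"add": "built-in add", "showpage": "built-in showpage"}
userdict = {}

# Searched front to back: userdict first, then systemdict.
dict_stack = ChainMap(userdict, systemdict)

userdict["box"] = "user procedure box"
userdict["showpage"] = "user redefinition of showpage"

found_box = dict_stack["box"]            # only defined in userdict
found_showpage = dict_stack["showpage"]  # userdict shadows systemdict
found_add = dict_stack["add"]            # falls through to systemdict

# Pushing another dictionary on top opens a new scope for definitions.
inner = dict_stack.new_child({"box": "inner box"})
print(found_box, found_showpage, found_add, inner["box"], sep="\n")
```

A name redefined in userdict hides the system version without changing systemdict itself, which matches the redefinition of primitive objects described for userdict.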

  14. Document conventions
      Conventions built into all Postscript interpreters:
      • The leading comment should be %!PS. That informs the interpreter that the file is a Postscript program.
      • Each page of a document is usually bracketed by a save and a restore command to isolate that page from the effects of other pages.
      • %%DocumentFonts: a list of fonts used in the document
      • %%Title: an arbitrary string, the title of the document
      • %%Creator: the name of the program that created the file
      • %%CreationDate: the date and time of creation
      • %%Pages: the number of pages in the document
      • %%BoundingBox: the four values that represent the lower left and upper right corners of the area of the page that is actually painted by the program. This allows the pages to be inserted into other documents.
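Because these conventions are ordinary comments, another program can read them without interpreting any Postscript. The Python sketch below collects the %% header comments into a dictionary. The function name `parse_header` and the sample file contents are hypothetical, and the sketch assumes %%BoundingBox carries four integers.

```python
def parse_header(text):
    """Collect the %%Key: value comments at the top of a Postscript file."""
    lines = text.splitlines()
    if not lines or not lines[0].startswith("%!PS"):
        raise ValueError("missing %!PS leading comment")
    header = {}
    for line in lines[1:]:
        if not line.startswith("%%"):
            break  # the header ends at the first line that is not a %% comment
        key, sep, value = line[2:].partition(":")
        if sep:  # skip comments without a value
            header[key] = value.strip()
    if "BoundingBox" in header:
        # Assumed format: lower-left x, lower-left y, upper-right x, upper-right y.
        header["BoundingBox"] = tuple(int(v) for v in header["BoundingBox"].split())
    return header


# Made-up sample file for illustration.
sample = """%!PS
%%Title: Sample report
%%Creator: hypothetical-tool
%%Pages: 2
%%BoundingBox: 0 0 612 792
%%EndComments
newpath 0 0 moveto
"""

header = parse_header(sample)
print(header)
```

A program that embeds the page in another document would need only the bounding box from this header to scale and position it.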

  15. Postscript summary
      • Postscript was developed as a virtual machine architecture that can be used to create printable documents.
      • The Postscript for a document is not meant to be read by a programmer. However, the syntax is quite simple and easily understood.
      • Adobe has developed Postscript further with the creation of its Portable Document Format (PDF). PDF is, roughly, a compressed form of Postscript.
      • PDF readers are freely available over the Internet, and most Web browsers can display PDF files.
      • PDF has become ubiquitous for the transmission and display of formatted documents.
      • Giving away PDF display programs was a shrewd move for Adobe because it sells the Acrobat program needed to create PDF documents.
