
Text Processing in Unix


devin

Presentation Transcript


  1. Text Processing in Unix

  2. Where is MS Word?????? • Unix has never had a standard word processing system • Although several word processors are now available (WordPerfect, StarOffice, ApplixWare) none has reached the status of a standard • Portability is an issue • Unix has traditionally relied upon text processors instead • roff • nroff/troff • TeX

  3. Text Processing • Text processing is a two step process • First the document is created using a text editor such as vi • Document contains plain text and formatting codes • Second step is to print the document through a formatter, such as nroff • The formatter interprets the codes embedded in the document so that it is printed according to your specification • Note that the editor does not format, and the formatter is of no use for editing; they are two distinct operations

  4. The Formatting Process • [Figure: Text Editing → Document File → Text Formatting → Final Doc; the example document is "The Great American Novel"]

  5. Formatter Characteristics • roff-like formatting systems describe actions you want to have performed on your text • indent • italicize • bold • They use very primitive, low-level commands • It would be nice instead to define the type of text (paragraph, footnote, heading) and externally specify the particular format you want applied to this type of text

  6. A Programming Language • Like PostScript, nroff is a programming language • Like C, it has a set of preprocessors • It has macros • It has a full set of registers • Most people do not need to understand nroff's programming features to use it effectively, any more than they need to know C to use Unix

  7. Macros • Macros translate high-level, result-oriented commands into low-level commands • Start paragraph • Start heading • Start section • There are two principal macro packages • -mm (memorandum macros) • -ms (manuscript macros) • -mm is the more robust of the two packages • While -ms was designed to do everything, -mm was designed for robustness • Many simple errors in -ms lead to mystifying results, while -mm usually provides helpful error messages

  8. Preprocessors • Although nroff is a powerful formatter, it can’t do everything • You can set tabs but nroff can’t analyze the widths of columns and automatically set widths • nroff has some ability for drawing lines and boxes but it is hard to do manually • Although nroff has access to mathematical symbols, building equations directly is too hard • Preprocessors are used to extend the abilities of nroff • Preprocessors are designed to address a specific typesetting specialty, such as equations or line drawings

  9. The preprocessor translates portions of your document to nroff primitive commands and leaves the rest alone • Preprocessors work in "the Unix way" • They are filters • They perform one specific function rather than causing nroff to bloat • They are used with pipes • tbl mydoc | troff • Four common preprocessors • tbl - for managing tabular data • eqn (or neqn) - for typesetting mathematical equations • refer - for bibliographical references • pic - for creating line drawings
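
The "translate your own portion, leave the rest alone" behaviour can be sketched as a toy filter in Python. This is not how tbl is implemented: the names `preprocess` and `shout` are hypothetical, and upper-casing stands in for the real translation to nroff primitives. The sketch only shows that a filter rewrites the region between its own delimiters (.TS and .TE here) and passes every other line through unchanged.

```python
def preprocess(lines, start, end, transform):
    """Rewrite only the lines between the start and end markers; pass the rest through."""
    out, inside = [], False
    for line in lines:
        if line == start:
            inside = True
            out.append(line)
        elif line == end:
            inside = False
            out.append(line)
        elif inside:
            out.append(transform(line))
        else:
            out.append(line)
    return out


def shout(line):
    # Stand-in for a real translation step (hypothetical, for illustration only).
    return line.upper()


doc = ["Some prose.", ".TS", "tbl\tTables of data", ".TE", "More prose."]
result = preprocess(doc, ".TS", ".TE", shout)
print("\n".join(result))
```

Because each filter touches only its own region, several of them can be chained with pipes without interfering with one another.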

  10. Basic nroff Commands • nroff contains two types of commands • dot commands • embedded commands • Dot Commands • Must stand alone on a line • Have a period (dot) as the first character on the line • Many dot commands accept one or more arguments to give them information about what to do • Example: .sp produces one extra (blank) line of output • .sp 3 produces three extra blank lines • At least one space or tab must separate the arguments

  11. Embedded Commands • Embedded commands start with a backslash ( \ ) • Often used to access a special character • Most accept an argument, which must immediately follow the command without any intervening space • Some features, such as point size or font changes, can be done with either a dot or an embedded command • In general, dot commands are used to control global features • Embedded commands are used to control local features • .\" starts a comment; the rest of the line is not printed

  12. Spacing Commands • .sp controls vertical spacing • By itself, .sp inserts a blank line • .sp num inserts num blank lines • .sp numi adds num inches of blank space • Absolute movement is also possible • .sp |3i moves to the position 3 inches from the top of page • .sp |-2i moves to 2 inches from the bottom of page • Note: most macro packages turn on no-space mode after printing page headers, so any top-of-page requests will be ignored. If you always want to produce the space, put a .rs on the line preceding the .sp command

  13. .vs controls vertical spacing between lines • Default is generally about 20% more than the font point size • To change, use .vs Np, where N is the spacing in points, e.g., .vs 12p for 12-point spacing (12/72 inch) • .ls controls line spacing • .ls 2 causes double-spaced output
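
The arithmetic behind these numbers can be checked with a short Python sketch. The function names are hypothetical; the only facts used are that a point is 1/72 inch and the slide's rule of thumb that the default spacing is about 20% more than the point size.

```python
POINTS_PER_INCH = 72  # the "p" unit: 1 point = 1/72 inch


def points_to_inches(points):
    """Convert a distance in points to inches."""
    return points / POINTS_PER_INCH


def typical_vertical_spacing(point_size):
    """Rule of thumb from the slide: about 20% more than the point size."""
    return point_size * 1.2


print(points_to_inches(12))          # spacing set by .vs 12p, in inches
print(typical_vertical_spacing(10))  # typical spacing for 10-point type, in points
```

So 10-point type is typically set with about 12 points of vertical spacing, which is one sixth of an inch per line.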

  14. .ne tests to see how much space is left between the current location and the next trap, which generally signifies the end of text on the page • .ne 2v tests to see if there are at least two lines of space remaining on the page • If there isn't enough space, the page position is advanced to the next trap, which usually prints the footer and advances to the next page • .ne is used to avoid widows, where the first or last line of a paragraph is isolated on a different page from the rest of the paragraph
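
The test that .ne performs can be modelled in a few lines of Python. This is a toy model with a hypothetical function name, and it measures everything in whole lines from the top of the page, which is a simplification.

```python
def need(position, trap, needed):
    """Toy model of .ne; all arguments are in lines from the top of the page.

    Returns True when the space left before the trap is smaller than the space
    requested, i.e. when the formatter would advance to the trap.
    """
    return (trap - position) < needed


# One line left before the trap, two requested: advance to the next page.
print(need(position=59, trap=60, needed=2))
# Ten lines left: the text starts on the current page.
print(need(position=50, trap=60, needed=2))
```

Placing .ne 2v before a paragraph therefore keeps its first two lines together on the same page.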

  15. .in is used to control indenting • Measured in ems (approximately the width of an M) • .in 5m indents all following text by 5 ems • .ti controls temporary indent for the next line • .br causes a break in the text-filling process • Often used in the header/return address portion of a letter to separate lines:
        Ken Frazer
        .br
        PO Box 1234
        .br
        Coppell, TX

  16. .bp forces a new page • .ta sets the tab stops and requires arguments • .ta 8n 16n 24n 32n 40n sets tabs every 8 character positions • Can also use other measurement units • .ta .5i 1i 1.5i 2i • Tabs remain in effect until changed so in theory, you only need to set them once • However, in practice you should set them every time you use them because the standard macro packages and preprocessors fiddle with them all the time

  17. \u and \d are used to produce half line motions • \| and \^ are narrow space codes • \| is 1/6 em, \^ is 1/12 em • \c is the end-of-line continuation mark • Normally an end-of-line character is converted to a space when the lines are stitched together. When a \c is used at the end of a line, the space is discarded so the following line is attached to the current line without an intervening space.
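
The effect of \c on line joining can be sketched in Python. This is a toy model with a hypothetical function name; it ignores line length and every other escape, and only shows where the joining space is kept or dropped.

```python
def join_input_lines(lines):
    """Toy model: join input lines with a space, except after a trailing \\c."""
    out = ""
    for line in lines:
        if line.endswith("\\c"):
            out += line[:-2]   # drop the \c and add no space
        else:
            out += line + " "  # the end of line becomes a space
    return out.rstrip()


print(join_input_lines(["un\\c", "believable", "story"]))
# prints: unbelievable story
```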

  18. Filling and Adjusting • .na and .ad control the adjustment of the margins • .na (no adjust) causes nroff to stop adjusting margins. Words will still be collected to form an output line, but intervening spaces will not be added to make the margins align. • .ad tells nroff to resume adjusting margins • .ce centers the next input line; .ce N centers the next N input lines, and .ce 0 cancels any centering still pending • Input lines following the .ce command are not filled
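
The difference between filling and adjusting can be sketched in Python. This is a minimal model, not nroff's algorithm: the function names are hypothetical, hyphenation is ignored, and extra spaces are simply handed to the leftmost gaps, whereas the real formatter distributes them differently.

```python
def fill(words, width):
    """Collect words into lines no longer than width (fill mode)."""
    lines, current = [], []
    for word in words:
        if current and len(" ".join(current + [word])) > width:
            lines.append(current)
            current = []
        current.append(word)
    if current:
        lines.append(current)
    return lines


def adjust(line_words, width):
    """Pad the gaps between words so the line is exactly width characters wide."""
    gaps = len(line_words) - 1
    if gaps == 0:
        return line_words[0]
    spare = width - sum(len(w) for w in line_words)
    base, extra = divmod(spare, gaps)
    out = ""
    for i, word in enumerate(line_words[:-1]):
        out += word + " " * (base + (1 if i < extra else 0))
    return out + line_words[-1]


def format_text(text, width, adjusting=True):
    """Fill always; adjust only when adjusting is True (.ad) and not on the last line."""
    lines = fill(text.split(), width)
    result = []
    for i, words in enumerate(lines):
        last = i == len(lines) - 1
        result.append(adjust(words, width) if adjusting and not last else " ".join(words))
    return result


text = "the quick brown fox jumps over the lazy dog"
print("\n".join(format_text(text, 16)))                   # like .ad
print("\n".join(format_text(text, 16, adjusting=False)))  # like .na
```

With adjusting off, the same words land on the same lines; only the padding that aligns the right margin disappears.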

  19. .nf and .fi control the nroff fill mode • .nf causes nroff to stop collecting input lines to produce appropriately long output lines • .fi tells nroff to resume filling lines • .nf can be used instead of .br for producing letter headings:
        .nf
        Ken Frazer
        PO Box 1234
        Coppell, TX
        .fi
        .sp 2
        Dear Alice,

  20. Fonts • .ft is used to switch fonts • Fonts are referred to either by number or by one- or two-character names • 1 is Times Roman • 2 is Times Roman italic • R is also Times Roman • I is Times Roman italic • \fn is another way to switch fonts • \f1plain\f2italics\f3bold\f1 produces plainitalicsbold, with "plain" in roman, "italics" in italic, and "bold" in bold
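
How a line with embedded font changes breaks into runs can be sketched in Python. This is a toy parser with hypothetical names: it handles only one-character \fX escapes, and it assumes the conventional mapping of positions 1, 2, 3 to the names R, I, B (roman, italic, bold). Two-character font names are not handled.

```python
import re

# Assumed mapping of font positions and one-letter names to styles.
FONT_NAMES = {"1": "roman", "2": "italic", "3": "bold",
              "R": "roman", "I": "italic", "B": "bold"}


def font_runs(text, start="roman"):
    """Split text on one-character \\fX escapes into (font, text) runs."""
    runs, font = [], start
    for piece in re.split(r"(\\f.)", text):
        if piece.startswith("\\f") and len(piece) == 3:
            font = FONT_NAMES[piece[2]]   # switch font, emit nothing
        elif piece:
            runs.append((font, piece))
    return runs


print(font_runs(r"\f1plain\f2italics\f3bold\f1"))
```

The escapes themselves produce no output and no space, which is why the three words run together in the result.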

  21. Hyphenation • nroff will automatically hyphenate words at the end of a line, but like any automatic hyphenator, it sometimes makes mistakes • .nh turns off automatic hyphenation • .hy enables automatic hyphenation. It accepts a numeric argument that controls when hyphenation is used • .hy 2 disables hyphenation for the last line of a page • .hy 4 disables splitting off the last two characters of a word • .hy 8 disables splitting off the first two characters of a word
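
The values 2, 4, and 8 are additive, so .hy 14 applies all three restrictions at once, and .hy 0 turns hyphenation off. A small Python sketch (the names `HY_FLAGS` and `describe_hy` are hypothetical) decodes an argument into the restrictions it selects:

```python
HY_FLAGS = {
    2: "do not hyphenate the last line of a page",
    4: "do not split off the last two characters of a word",
    8: "do not split off the first two characters of a word",
}


def describe_hy(n):
    """List the restrictions selected by the argument of .hy N."""
    if n == 0:
        return ["hyphenation off"]
    return ["hyphenation on"] + [text for bit, text in HY_FLAGS.items() if n & bit]


for line in describe_hy(12):   # 12 = 4 + 8
    print(line)
```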

  22. .hw allows you to specify how certain words should be hyphenated • .hw de-vice proc-ess cata-logue un-known trans-portable

  23. File Switching • nroff has two commands to switch from one input file to another • .so switches from the current file to the file named as an argument. When the inserted file is completely read, nroff returns to the original file and continues reading from the point just past the .so. • .nx switches from the current file to the file named as an argument. All processing stops when the new file is completely read. Any text in the original file that follows the .nx command will not be processed.
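
The contrast between the two commands can be modelled in Python over an in-memory set of files. This is a toy model following the description above, not nroff's implementation: `read_input` and the file names are hypothetical, and the exact behaviour of real nroff when the commands are nested is not modelled.

```python
def read_input(name, files):
    """Toy model of .so and .nx over a {filename: [lines]} mapping.

    Returns (lines, stopped); stopped is True once a .nx file has been consumed.
    """
    out = []
    for line in files[name]:
        if line.startswith(".so "):
            included, stopped = read_input(line[4:], files)
            out.extend(included)
            if stopped:
                return out, True   # assumption: a .nx in the included file ends everything
        elif line.startswith(".nx "):
            included, _ = read_input(line[4:], files)
            out.extend(included)
            return out, True       # never return to the rest of this file
        else:
            out.append(line)
    return out, False


files = {
    "main": ["intro", ".so part1", "middle", ".nx part2", "never read"],
    "part1": ["p1"],
    "part2": ["p2"],
}
lines, stopped = read_input("main", files)
print(lines)
```

The line after .so is still processed once part1 has been read, while the line after .nx never is.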

  24. -ms Macros • -ms was the first widely used macro package • To use: • nroff -ms inputfile | lpr

  25. Commands • .NH n produces a numbered heading, where n is the heading level, text is the heading text, and .LP starts a new paragraph:
        .NH n
        text
        .LP
      • .SH produces a section heading:
        .SH
        text
        .LP

  26. Paragraph Commands • .PP starts a normal paragraph with first line indented • .LP starts a normal paragraph with all lines flush left • .IP label starts an indented paragraph. All lines are indented on the left and the optional label is printed to the left of the first line. • .XP starts an exdented paragraph. All lines except the first will be indented on the left.

  27. Overall Document Format • .1C or .2C switches to one-column (the default) or two-column format • .DA date prints the date at the bottom of the page. The optional date argument overrides the current date. • .ND inhibits printing the date at the bottom of the page. • .OH 'L'C'R', .EH 'L'C'R', .OF 'L'C'R', .EF 'L'C'R': these macros produce Headers and Footers on Odd or Even pages. Each three-part header consists of text L for the left, C for the center, and R for the right. In a header or footer, % will print the page number.

  28. Type Styles • .R, .I wd1 wd2, .B wd1 wd2: switch font to Roman, italic, or bold. For .I or .B, if wd1 is supplied, it alone will be in italic or bold. If wd2 is supplied, it will follow wd1, without a separating space, and be in the surrounding font. • .SM, .NL, .LG: switch to a smaller, normal-size, or larger typeface. .SM or .LG can be repeated to increase the size change. • .UL word underlines a single word.

  29. Displays and Footnotes • .DS x and .DE display the enclosed text in no-fill mode:
        .DS x
        text
        .DE
      text will be moved to the following page, leaving a blank region, if it doesn't fit. Optional argument x may be L for a flush-left display, I for a slightly indented display, C for a line-by-line centered display, or B for a block centered display. The default is an indented display. • .LD, .ID, and .CD display multipage text: .LD replaces .DS L, .ID replaces .DS I, and .CD replaces .DS C

  30. .KS and .KE keep the enclosed text together:
        .KS
        text
        .KE
      text will be moved to the following page if it doesn't fit. A blank space may be produced at the bottom of the current page. • .KF and .KE make a floating keep:
        .KF
        text
        .KE
      text will float to the start of the following page if it doesn't fit on the current page. Following text may be moved forward to fill the bottom of the page.

  31. .EQ x n and .EN enclose an equation:
        .EQ x n
        text
        .EN
      text will be processed by the eqn preprocessor. Optional argument x may be I for an indented equation, L for a flush-left equation, or C for a centered equation. Centered is the default. An argument n may follow the equation type. It will be placed flush left to identify the equation. • .TS and .TE enclose a table:
        .TS
        text
        .TE
      text will be processed by the tbl preprocessor.

  32. .RS and .RE shift the enclosed text to the right:
        .RS
        text
        .RE
      • .FS and .FE enclose a footnote:
        .FS
        text
        .FE
      text is a footnote that will be placed at the bottom of the page. Berkeley -ms allows \** to number footnotes automatically.

  33. First Page Formats • .RP uses the AT&T Released Paper style • .TM uses the Berkeley Thesis style • .TL uses text for a title:
        .TL
        text
      • .AU specifies an author's name (text) and optional address and phone number (loc and ext, respectively):
        .AU loc ext
        text

  34. .AI specifies an author's institution:
        .AI
        text
      • .AB is used for the abstract:
        .AB
        text
        .AE
      • .SG inserts the author's signature (name) in the text

  35. Table of Contents • .XS n uses text as a TOC entry, with page number n:
        .XS n
        text
        .XE
      • .PX prints the table of contents

  36. Using the Preprocessors • Implemented as filters so piping is appropriate • tbl infile | pic | neqn | nroff | lpr

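  37. tbl Preprocessor • Input (the two columns in each data line are separated by a tab):
        .TS
        center box;
        C S
        RI L.
        Text Preprocessors
        .sp .3v
        tbl	Tables of data
        eqn	Equations
        refer	References
        pic	Line Drawings
        .TE
      • Output: a centered, boxed table headed "Text Preprocessors" with the rows tbl / Tables of data, eqn / Equations, refer / References, pic / Line Drawings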

  38. eqn Preprocessor
        .EQ
        int { { e sup {i omega t} + e sup {-i omega t} } over {2 pi} }
        .EN

  39. pic Preprocessor
        .PS
        box ht .4 wid .6
        box ht .6 wid .8 with .c at last box.c
        PC: box ht .3 wid 1 with .n at last box.s
        "PC" at PC.c + (-.35,0)
        box ht .15 wid .3 at PC.c
        box ht same at PC.c + (.3,0)
        .PE
      [Figure: the resulting line drawing, labeled PC]
