
TeX2Star

A System for Converting TeX to OpenOffice, by Jeffrey Starr.



  1. TeX2Star: A System for Converting TeX to OpenOffice • By Jeffrey Starr

  2. Overview • Why does conversion matter? • Why has it not already been done? • Why is it difficult? • Proposal: TeX->OpenOffice • Proposal: TeX->DVI->OpenOffice • Solution • Unsolved problems

  3. What is OpenOffice? • Open Source office suite • Based on StarOffice, currently owned by Sun Microsystems • Cross-Platform • XML based, standards driven • Semantic-based format

  4. What is TeX? • Written by Donald E. Knuth • Solution to declining standards in mathematical typography • Heavily used in mathematics and physics • Both a program and a programming language • Presentation-based format

  5. Why Bother to Convert? • TeX is rare outside mathematical circles • Conflicts with publishing software • Does not fit within the current word processing model • TeX's purpose is to produce journal-quality typography, not to facilitate editing of content.

  6. Aside: Editable Output • TeX has many presentation outputs: • DVI • PostScript • PDF • PNG • TIFF • Fax • TeX has no direct editable outputs.

  7. Solution: TeX->OpenOffice • Why use the outputs? Read the original document. • Perfect knowledge of content and (presentational) intent • Write a program that reads TeX and outputs OpenOffice, instead of DVI

  8. Problems with TeX->OpenOffice • TeX is a large system • Eight years of development • Too large for a semester • Irregular • Non-balanced • Many special cases

  9. TeX is Irregular • An irregular language is one in which typical rules of processing are violated • Irregular '\atop': (TeX) • {numerator \atop denominator} • Regular '\frac': (LaTeX) • \frac{numerator}{denominator}

  10. TeX is not balanced • A language that is balanced will have an explicit beginning and end to each grouping • Non-balanced font commands: (TeX) • \bf this is bold \rm this is normal, roman text • Balanced font commands: (LaTeX) • \textbf{this is bold} this is back to normal

  11. TeX has many special cases • \par may either: • explicitly end a paragraph • do nothing (if in math mode) • do nothing (if in restricted horizontal mode) • tell TeX to build the current page • \par is also irregular (it acts on material already processed, in the reverse direction) and unbalanced (it may or may not be preceded by \indent, a primitive to start a paragraph)

  12. Solution: TeX->DVI->OpenOffice • Let TeX deal with TeX • Run TeX on the original text • Read the resultant DVI output • Process the DVI output to OpenOffice

  13. Problem: Lack of semantic data • DVI contains font definitions, a text stream, and descriptions of black boxes • Fonts contain characters, but do not say what those characters are • Especially a problem with ligatures and kerning: the single glyph “ﬀ” vs. the two letters “ff” • Also a problem with bold and italic text: bold and italic are their own fonts
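
To make the problem concrete, here is a minimal Python sketch of what a DVI reader has to guess. It assumes the document uses Computer Modern text fonts in the OT1 layout, where the f-ligatures sit at positions 0x0B to 0x0F; the `FONT_STYLES` table and the `decode_char` helper are illustrative names, not part of TeX2Star.

```python
# OT1 (Computer Modern text) slots for the f-ligatures. The DVI file only
# stores the slot number; the meaning has to come from a table like this.
OT1_LIGATURES = {0x0B: "ff", 0x0C: "fi", 0x0D: "fl", 0x0E: "ffi", 0x0F: "ffl"}

# Assumed mapping from font name to style: bold and italic are separate
# fonts, so the style can only be inferred from the font in use.
FONT_STYLES = {"cmr10": set(), "cmbx10": {"bold"}, "cmti10": {"italic"}}


def decode_char(font_name, code):
    """Return (text, styles) for one character code set in the given font."""
    styles = FONT_STYLES.get(font_name, set())
    if code in OT1_LIGATURES:
        return OT1_LIGATURES[code], styles
    # Assumption: remaining codes are letters and digits, which OT1 keeps
    # at their ASCII positions.
    return chr(code), styles
```

For example, `decode_char("cmr10", 0x0B)` recovers the two letters behind the ligature glyph, and `decode_char("cmbx10", ord("a"))` reports the letter as bold only because of the font name.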

  14. Solution: Add Annotations • Use interpositioning and the TeX primitive '\special' to send extra information to the DVI file • \special leaves comments that can be read later • Reading the DVI with proper annotations allows the text to retain some level of semantic information • The difference is between knowing that the next character is smaller and raised versus knowing that the next character is a superscript
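
The annotation idea can be sketched in Python as follows. The sketch assumes the DVI file has already been decoded into a list of `("text", ...)` and `("special", ...)` events, and that the redefined macros emit specials such as `\special{t2s:begin superscript}`; the `t2s:` prefix and the begin/end vocabulary are hypothetical, since the slides do not give the actual annotation format.

```python
PREFIX = "t2s:"  # hypothetical marker identifying our own \special payloads


def annotate(events):
    """Pair each text run with the annotations active when it was typeset."""
    active = []  # annotation names currently open, oldest first
    out = []
    for kind, payload in events:
        if kind == "special" and payload.startswith(PREFIX):
            action, _, name = payload[len(PREFIX):].partition(" ")
            if action == "begin":
                active.append(name)
            elif action == "end" and name in active:
                # Close the most recently opened annotation with this name.
                idx = len(active) - 1 - active[::-1].index(name)
                del active[idx]
        elif kind == "text":
            out.append((payload, tuple(active)))
    return out
```

With this, a raised "2" arrives labeled as a superscript rather than as a smaller character at a different height.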

  15. Problem: Unbalanced Tags • Some primitives are balanced, but many are not • Tags may affect the document for an arbitrary length of time, or be local to a paragraph or a specific block of text

  16. Solution: Balancing • Algorithm: • Given: database of tags • start tag, end tag, 'insert end tag' tags • Go through list of tags, find one that needs help balancing • Go forward along list, finding nearest tag that closes the previous tag, or end of document • Insert end of tag into the list of tags
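
The balancing pass above might look like the following Python sketch. The tag database here is a made-up example covering two font switches, and `\endbf` and `\endit` are hypothetical end markers invented for illustration (they are not TeX commands); the real database contents are not given in the slides.

```python
# Assumed database: for each start tag, its end tag and the set of tags
# that force the end tag to be inserted ('insert end tag' tags).
TAG_DB = {
    r"\bf": {"end": r"\endbf", "closed_by": {r"\rm", r"\it", r"\bf"}},
    r"\it": {"end": r"\endit", "closed_by": {r"\rm", r"\bf", r"\it"}},
}


def balance(items, db=TAG_DB):
    """Return a copy of items in which every start tag has an end tag."""
    result = list(items)
    i = 0
    while i < len(result):
        entry = db.get(result[i])
        if entry is not None:
            # Go forward to the nearest explicit end tag, the nearest tag
            # that implicitly closes this one, or the end of the document.
            j = i + 1
            while (j < len(result)
                   and result[j] != entry["end"]
                   and result[j] not in entry["closed_by"]):
                j += 1
            # Insert the end tag unless an explicit one was found.
            if j == len(result) or result[j] != entry["end"]:
                result.insert(j, entry["end"])
        i += 1
    return result
```

Applied to the earlier unbalanced example, `balance([r"\bf", "this is bold", r"\rm", "this is normal"])` places the end marker just before `\rm`, so the bold run has an explicit beginning and end.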

  17. Post Document Editing • Further balancing and insertion of tags may be necessary after first sweep through file • Tables: • OpenOffice format requires number of columns to be specified • We don't know how many columns will be needed until after we read the entire table • Solution: After processing, go back and insert the needed information
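
A rough Python sketch of the table fix-up: rows are collected first, and the column count is filled in only once the whole table has been read. The element and attribute names follow the OpenOffice table vocabulary as best recalled and are heavily simplified (no namespace declarations, table names, or styles), so treat the output as illustrative rather than as a valid document fragment; `table_xml` is a hypothetical helper name.

```python
from xml.sax.saxutils import escape


def table_xml(rows):
    """Build table markup from rows (lists of cell strings) read earlier."""
    # The column count is only known after every row has been seen.
    columns = max((len(row) for row in rows), default=0)
    parts = ["<table:table>"]
    parts.append(
        f'<table:table-column table:number-columns-repeated="{columns}"/>'
    )
    for row in rows:
        parts.append("<table:table-row>")
        # Pad short rows so every row has the same number of cells.
        for cell in list(row) + [""] * (columns - len(row)):
            parts.append(
                f"<table:table-cell><text:p>{escape(cell)}</text:p>"
                "</table:table-cell>"
            )
        parts.append("</table:table-row>")
    parts.append("</table:table>")
    return "\n".join(parts)
```

Calling `table_xml([["a", "b", "c"], ["d"]])` emits a column declaration for three columns, which could not have been written when the first cell was read.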

  18. Unsolved Problems • Footnotes: • Defined by position on the page • Automatic positioning conflicts with the paragraph detection tool • Unable to distinguish between footnotes, extra paragraphs, headers, and footers • Non-English alphabets

  19. Conclusion • Semantics of the document are lost in TeX itself, so there is no hope of recovery • Overt presentation can be recovered for editing • The method works to translate an irregular, non-well-formed language into a regular, well-formed language (XML)
