## Unicode Support for Mathematics (by Ava)


Overview

- Unicode math characters
- Semantics of math characters
- Unicode and markup
- Multiple ways of encoding math characters
- Not yet standardized math characters
- Inputting math symbols

Unicode Math Characters

- Unicode 3.0 has 340 math characters: in ASCII, the Mathematical Operators block (U+2200–U+22FF), the arrows, and the combining marks
- 996 math alphanumeric characters are in Unicode 3.1’s Plane 1
- 591 new math symbols and operators are in Unicode 3.2’s BMP
- One math variant selector
- One new combining character (U+20E5 COMBINING REVERSE SOLIDUS OVERLAY)

Basic Set of Alphanumeric Characters

- Latin digits (0 - 9)
- Upper- & lowercase Latin letters (a - z, A - Z)
- Uppercase Greek letters Α - Ω plus the nabla ∇ and the variant of theta ϴ given by U+03F4
- Lowercase Greek letters α - ω plus the partial differential sign ∂ and glyph variants of ε, θ, κ, φ, ρ, and π
- Only unaccented forms of letters are used

Math Alphanumeric Characters

- Math needs various Latin and Greek alphabets like normal, bold, italic, script, Fraktur, and open-face (double-struck)
- May appear to be font variations, but have distinct semantics
- Without these distinctions you get gibberish, violating the Unicode rule that plain text must contain enough information to be rendered legibly, and nothing more
- Plain-text searches should distinguish between alphabets, e.g., a search for script ℋ shouldn’t match plain H
- Reduces markup verbosity

Legibility Loss

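Without math alphabets, the Hamiltonian formula

$$\mathcal{H} = \int d\tau\,\left[\varepsilon E^2 + \mu H^2\right]$$

becomes an integral equation, because the script ℋ collapses into the ordinary H that also appears in the integrand:

$$H = \int d\tau\,\left[\varepsilon E^2 + \mu H^2\right]$$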

Math Alphanumeric Chars (cont)

| Alphabet | Characters |
|---|---|
| Plain | a-z, A-Z, 0-9, α-ω, Α-Ω |
| Bold | a-z, A-Z, 0-9, α-ω, Α-Ω |
| Italic | a-z, A-Z, α-ω, Α-Ω |
| Bold italic | a-z, A-Z, α-ω, Α-Ω |
| Script | a-z, A-Z |
| Bold script | a-z, A-Z |
| Fraktur | a-z, A-Z |
| Bold Fraktur | a-z, A-Z |
| Double-struck | a-z, A-Z, 0-9 |
| Sans-serif | a-z, A-Z, 0-9 |
| Sans-serif bold | a-z, A-Z, 0-9, α-ω, Α-Ω |
| Sans-serif italic | a-z, A-Z |
| Sans-serif bold italic | a-z, A-Z, α-ω, Α-Ω |
| Monospace | a-z, A-Z, 0-9 |
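The Plane 1 Latin alphabets in the table sit in a fixed order, each occupying 52 consecutive code points (A-Z, then a-z) starting at U+1D400, so a renderer or input method can compute them arithmetically. Below is a minimal C++ sketch, assuming ASCII letter input; the enum and function names are hypothetical, and it deliberately ignores the reserved holes (such as italic h) covered on the Compatibility Holes slide.

```cpp
// The 13 Latin math alphabets of Plane 1, in code-point order.
// Each alphabet holds A-Z followed by a-z: 52 consecutive code points.
enum class MathStyle : int {
    Bold, Italic, BoldItalic, Script, BoldScript, Fraktur, DoubleStruck,
    BoldFraktur, SansSerif, SansSerifBold, SansSerifItalic,
    SansSerifBoldItalic, Monospace
};

// Returns the Plane 1 code point for ASCII letter c in the given style,
// or 0 if c is not an ASCII letter. Holes are NOT redirected here.
char32_t math_letter(char c, MathStyle style) {
    int index;
    if (c >= 'A' && c <= 'Z') {
        index = c - 'A';
    } else if (c >= 'a' && c <= 'z') {
        index = 26 + (c - 'a');
    } else {
        return 0;
    }
    return static_cast<char32_t>(0x1D400 + 52 * static_cast<int>(style) + index);
}
```

For example, `math_letter('a', MathStyle::Italic)` yields U+1D44E, the italic a normally used for a variable.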

How to Display Math Alphabets?

- Can use the operating system’s Unicode surrogate-pair support (the math alphanumerics are in Plane 1, outside the BMP)
- Alternatively, bind to standard fonts and use the corresponding BMP characters
- The second approach is probably faster, and displaying Unicode requires font binding in any event. But most traditional fonts are not suited to math alphabetic characters
- A single math font may look more consistent
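Because the math alphanumerics lie outside the BMP, a UTF-16 system stores each one as a surrogate pair. The following sketch shows that arithmetic; the function name is illustrative, and it assumes the input is a valid Unicode scalar value (no range or surrogate checks).

```cpp
#include <string>

// Encodes one code point as UTF-16. Plane 1 characters need a surrogate
// pair; BMP characters such as U+2260 need a single code unit.
std::u16string to_utf16(char32_t cp) {
    std::u16string out;
    if (cp < 0x10000) {
        out += static_cast<char16_t>(cp);
    } else {
        char32_t v = cp - 0x10000;                           // 20 significant bits
        out += static_cast<char16_t>(0xD800 + (v >> 10));    // high surrogate
        out += static_cast<char16_t>(0xDC00 + (v & 0x3FF));  // low surrogate
    }
    return out;
}
```

Bold A (U+1D400) comes out as the pair D835 DC00, while ≠ (U+2260) stays a single code unit.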

Math Alphabetics via Glyph Variants

- One approach to the math alphanumerics would be to use a set of math glyph variant selectors
- Such a tag would follow a base character imparting a math style
- Approach was dropped since it seemed likely to be abused
- One math variant selector does exist to offer a different line slant for some composite symbols
- Other variant selectors are being defined for nonmath purposes, e.g., Han variants
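The one math variant selector, VARIATION SELECTOR-1 (U+FE00), follows its base character, just like the tag approach described above. A plain-text search that wants a variant to match its base symbol could simply drop the selector. This sketch assumes that loose-matching policy, which is a design choice rather than a Unicode requirement; the names are hypothetical.

```cpp
#include <string>

constexpr char32_t kMathVariantSelector = 0xFE00;  // VARIATION SELECTOR-1

// Returns text with VS1 removed, so that "base + U+FE00" and the bare
// base symbol compare equal in a loose search.
std::u32string strip_variant_selectors(const std::u32string& text) {
    std::u32string out;
    out.reserve(text.size());
    for (char32_t c : text) {
        if (c != kMathVariantSelector) out += c;
    }
    return out;
}
```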

Multiple Character Encodings

- As with nonmath characters, math symbols can often be encoded in multiple ways, composed and decomposed
- E.g., ≠ can be U+003D, U+0338 or U+2260
- Recommendation: use the fully composed symbol, e.g., U+2260 for ≠
- For alphabetic characters, use combining-mark sequences to get consistent typography
- Some representations use markup for the alphabetic cases. This allows multicharacter combining marks.
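Preferring the fully composed symbol amounts to a canonical composition step. This C++ sketch hard-codes four pairs whose canonical decompositions end in U+0338 COMBINING LONG SOLIDUS OVERLAY. It is only an illustration of the idea; real code should rely on a full Unicode normalization (NFC) implementation.

```cpp
#include <cstddef>
#include <string>

// A few canonical compositions "base + U+0338" -> negated operator.
struct Composition { char32_t base, composed; };
constexpr Composition kNegations[] = {
    {0x003D, 0x2260},  // =  -> NOT EQUAL TO
    {0x003C, 0x226E},  // <  -> NOT LESS-THAN
    {0x003E, 0x226F},  // >  -> NOT GREATER-THAN
    {0x2208, 0x2209},  // ∈  -> NOT AN ELEMENT OF
};

// Replaces each known "base, U+0338" pair with its composed character.
std::u32string compose_negations(const std::u32string& in) {
    std::u32string out;
    for (std::size_t i = 0; i < in.size(); ++i) {
        if (i + 1 < in.size() && in[i + 1] == 0x0338) {
            bool composed = false;
            for (const auto& c : kNegations) {
                if (c.base == in[i]) {
                    out += c.composed;
                    ++i;  // skip the combining overlay too
                    composed = true;
                    break;
                }
            }
            if (composed) continue;
        }
        out += in[i];
    }
    return out;
}
```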

Compatibility Holes

- Compatibility holes (reserved positions) exist in some Unicode sequences to avoid duplicate encodings (ugh!)
- E.g., U+2071-U+2073 are holes for ¹²³, which are U+00B9, U+00B2, and U+00B3, respectively
- Math alphanumerics have holes wherever the character already exists among the Letterlike Symbols (U+2100–U+214F), e.g., italic h is U+210E PLANCK CONSTANT.
- Recommendation: you can use the hole codes internally, but must import and export the standard codes.
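Importing and exporting the standard codes needs a table from each hole to its Letterlike Symbols character. The sketch below lists the 24 holes of the Latin math alphabets as we understand them. The table is transcribed by hand and should be checked against the Unicode Character Database before use, and the function names are hypothetical.

```cpp
// Reserved math alphanumeric positions ("holes") and the Letterlike
// Symbols characters that must be used in interchange instead.
struct Hole { char32_t hole, standard; };
constexpr Hole kHoles[] = {
    {0x1D455, 0x210E},                                        // italic h
    {0x1D49D, 0x212C}, {0x1D4A0, 0x2130}, {0x1D4A1, 0x2131},  // script B E F
    {0x1D4A3, 0x210B}, {0x1D4A4, 0x2110}, {0x1D4A7, 0x2112},  // script H I L
    {0x1D4A8, 0x2133}, {0x1D4AD, 0x211B},                     // script M R
    {0x1D4BA, 0x212F}, {0x1D4BC, 0x210A}, {0x1D4C4, 0x2134},  // script e g o
    {0x1D506, 0x212D}, {0x1D50B, 0x210C}, {0x1D50C, 0x2111},  // Fraktur C H I
    {0x1D515, 0x211C}, {0x1D51D, 0x2128},                     // Fraktur R Z
    {0x1D53A, 0x2102}, {0x1D53F, 0x210D}, {0x1D545, 0x2115},  // double-struck C H N
    {0x1D547, 0x2119}, {0x1D548, 0x211A}, {0x1D549, 0x211D},  // double-struck P Q R
    {0x1D551, 0x2124},                                        // double-struck Z
};

// Export: an internal hole code becomes the standard character.
char32_t export_char(char32_t c) {
    for (const auto& h : kHoles)
        if (h.hole == c) return h.standard;
    return c;
}

// Import: a standard Letterlike character becomes its internal hole code.
// Only sensible inside text already known to be math.
char32_t import_char(char32_t c) {
    for (const auto& h : kHoles)
        if (h.standard == c) return h.hole;
    return c;
}
```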

Nonstandard Characters

- People will always invent new math characters that aren’t yet standardized.
- Use the Private Use Area for these, with higher-level markup indicating that they are math.
- This approach can lead to collisions in the math community (unless a standard is maintained)
- Cut/copy in plain text can have collisions with other uses of the private use area
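Detecting private-use characters is a simple range test. The sketch below checks the three Private Use Areas (the BMP range and supplementary planes 15 and 16). The `has_private_use` helper and its use as a pre-copy warning are hypothetical.

```cpp
#include <algorithm>
#include <string>

// True for code points in Unicode's Private Use Areas:
// U+E000-U+F8FF, U+F0000-U+FFFFD, and U+100000-U+10FFFD.
bool is_private_use(char32_t c) {
    return (c >= 0xE000 && c <= 0xF8FF) ||
           (c >= 0xF0000 && c <= 0xFFFFD) ||
           (c >= 0x100000 && c <= 0x10FFFD);
}

// Flags text containing private-use characters, e.g. before a plain-text
// copy, where the receiving application may interpret them differently.
bool has_private_use(const std::u32string& text) {
    return std::any_of(text.begin(), text.end(), is_private_use);
}
```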

Unicode and Markup

- Unicode was never intended to represent all aspects of text
- Language attribute: sort order, word breaks
- Rich (fancy) text formatting: built-up fractions
- Content tags: headings, abstract, author, figure
- Glyph variants: Poetica font: 58 ampersands; Mantinia font: novel ligatures (TT, TE, etc.)
- MathML adds XML tags for math constructs, but seems awfully wordy

Unicode Plain Text

- Can do a lot with plain text, e.g., BiDi
- Grey zone: use of embedded codes
- Unicode ascribes semantics to characters, e.g., paragraph mark, right-to-left mark
- Lots of interesting punctuation characters in range U+2000 to U+204F
- Extensive character semantics/properties tables, including mathematical, numerical

Unicode Character Semantics

- Math characters have the math property
- Math characters are numeric, variable, or operator, but not a combination
- Properties are useful in parsing math plain text
- MathML doesn’t use these properties: every quantity is explicitly tagged
- Properties still can be useful for inputting text for MathML (no one wants to type all those tags!)
- Sometimes default properties need to be overruled
- Would be useful to have more math properties
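To show how such properties help a parser, here is a deliberately simplified classifier. The ranges are assumptions chosen for illustration (ASCII plus a few Unicode blocks), not Unicode's actual property data, and the names are hypothetical. Note that the math digits at U+1D7CE–U+1D7FF must be tested before the math letters.

```cpp
#include <string_view>

enum class MathClass { Numeric, Variable, Operator, Other };

// Illustrative classification of one character for a plain-text math parser.
MathClass classify(char32_t c) {
    if ((c >= U'0' && c <= U'9') || (c >= 0x1D7CE && c <= 0x1D7FF))
        return MathClass::Numeric;                   // ASCII and math digits
    if ((c >= U'a' && c <= U'z') || (c >= U'A' && c <= U'Z') ||
        (c >= 0x0391 && c <= 0x03C9) ||              // Greek letters (approx.)
        (c >= 0x1D400 && c <= 0x1D7CB))              // math alphabets
        return MathClass::Variable;
    if (std::u32string_view(U"+-*/=<>!^_").find(c) != std::u32string_view::npos ||
        c == 0x00D7 || c == 0x00B7 || c == 0x2044 ||  // ×, ·, fraction slash
        (c >= 0x2200 && c <= 0x22FF))                 // Mathematical Operators
        return MathClass::Operator;
    return MathClass::Other;
}
```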

Plain Text Encoding

- In TeX, a fraction’s numerator is what follows a { up to the keyword \over
- The denominator is what follows the \over up to the matching }
- { } are not printed
- Simple rules give unambiguous “plain text”, but results don’t look like math
- How to make a plain text that looks like math?

Simple plain text encoding

- Simple operand is a span of alphanumeric characters
- E.g., simple numerator or denominator is terminated by any operator
- Operators include arithmetic operators, most whitespace characters, all U+22xx, an argument “break” operator (displayed as small raised dot), sub/superscript operators
- Fraction operator is given by the Unicode fraction slash operator U+2044

Fractions

- abc/d gives $\frac{abc}{d}$
- More complicated operands use parentheses ( ), brackets [ ], or braces { }
- Outermost parens aren’t displayed in built-up form
- E.g., plain text (a + c)/d displays as $\frac{a+c}{d}$
- Easier to read than TeX’s {a + c \over d}
- MathML: <mfrac><mrow><mi>a</mi><mo>+</mo> <mi>c</mi></mrow><mrow><mi>d</mi> </mrow></mfrac>
- Neat feature: plain text looks like math
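The fraction rules above can be sketched as a small rewriter that turns linear fractions into LaTeX’s \frac. For simplicity it uses ASCII '/' in place of U+2044 FRACTION SLASH and recognizes only alphanumeric runs and ( ) groups as operands, not the full operator set of the slides; the function names are hypothetical.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

namespace {

// Characters that may appear in a simple (unbracketed) operand.
bool is_simple(char c) { return std::isalnum(static_cast<unsigned char>(c)) != 0; }

// Index of the '(' that matches the ')' at `close`, or npos if unbalanced.
std::size_t match_open(const std::string& s, std::size_t close) {
    int depth = 0;
    for (std::size_t i = close + 1; i-- > 0;) {
        if (s[i] == ')') ++depth;
        else if (s[i] == '(' && --depth == 0) return i;
    }
    return std::string::npos;
}

// Index of the ')' that matches the '(' at `open`, or npos if unbalanced.
std::size_t match_close(const std::string& s, std::size_t open) {
    int depth = 0;
    for (std::size_t i = open; i < s.size(); ++i) {
        if (s[i] == '(') ++depth;
        else if (s[i] == ')' && --depth == 0) return i;
    }
    return std::string::npos;
}

}  // namespace

// Rewrites each linear fraction num/den as \frac{num}{den}; the outer
// parentheses of a bracketed operand are dropped, as in built-up form.
std::string build_up(std::string s) {
    std::size_t from = 0, slash;
    while ((slash = s.find('/', from)) != std::string::npos) {
        std::size_t nb = slash, de = slash + 1;  // fraction spans [nb, de)
        std::string num, den;
        if (slash > 0 && s[slash - 1] == ')') {   // bracketed numerator
            nb = match_open(s, slash - 1);
            if (nb == std::string::npos) { from = slash + 1; continue; }
            num = s.substr(nb + 1, slash - nb - 2);
        } else {                                  // simple numerator
            while (nb > 0 && is_simple(s[nb - 1])) --nb;
            num = s.substr(nb, slash - nb);
        }
        if (de < s.size() && s[de] == '(') {      // bracketed denominator
            std::size_t close = match_close(s, de);
            if (close == std::string::npos) { from = slash + 1; continue; }
            den = s.substr(de + 1, close - de - 1);
            de = close + 1;
        } else {                                  // simple denominator
            while (de < s.size() && is_simple(s[de])) ++de;
            den = s.substr(slash + 1, de - slash - 1);
        }
        if (num.empty() || den.empty()) { from = slash + 1; continue; }
        s.replace(nb, de - nb, "\\frac{" + num + "}{" + den + "}");
        from = nb;  // rescan: operands may themselves contain fractions
    }
    return s;
}
```

Because it rescans from the start of each replaced fraction, nested input such as x/(a/b) becomes \frac{x}{\frac{a}{b}}.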

Subscripts and Superscripts

- Unicode has numeric subscripts and superscripts along with some operators (U+2070-U+208E)
- Others need some kind of markup like <msup>…</msup>
- With special subscript and superscript operators (not yet in Unicode), these scripts can be encoded in nested form
- Use parentheses as for fractions to overrule built-in precedence order
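The numeric scripts that Unicode does encode are not contiguous for superscripts, because ¹²³ were already in Latin-1 (the compatibility holes noted earlier). A C++ sketch of the digit mapping follows; the function names are hypothetical, and non-digits pass through unchanged.

```cpp
#include <string>

// Superscript digits: 1, 2, 3 come from Latin-1; 0 and 4-9 from U+2070-U+2079.
std::u32string to_superscript(const std::string& digits) {
    static const char32_t sup[10] = {0x2070, 0x00B9, 0x00B2, 0x00B3, 0x2074,
                                     0x2075, 0x2076, 0x2077, 0x2078, 0x2079};
    std::u32string out;
    for (char c : digits) {
        if (c >= '0' && c <= '9') out += sup[c - '0'];
        else out += static_cast<char32_t>(static_cast<unsigned char>(c));
    }
    return out;
}

// Subscript digits are contiguous: U+2080-U+2089.
std::u32string to_subscript(const std::string& digits) {
    std::u32string out;
    for (char c : digits) {
        if (c >= '0' && c <= '9') out += static_cast<char32_t>(0x2080 + (c - '0'));
        else out += static_cast<char32_t>(static_cast<unsigned char>(c));
    }
    return out;
}
```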

Presentation markup

- Presentation markup directs how the math should be rendered.

    <mrow>
      <mi>E</mi>
      <mo>=</mo>
      <mrow>
        <mi>m</mi>
        <mo>&#x2062;</mo>  <!-- U+2062 INVISIBLE TIMES -->
        <msup>
          <mi>c</mi>
          <mn>2</mn>
        </msup>
      </mrow>
    </mrow>

Content markup

- Content markup describes the meaning of the expression, not the format.

    <apply>
      <eq/>
      <ci>E</ci>
      <apply>
        <times/>
        <ci>m</ci>
        <apply>
          <power/>
          <ci>c</ci>
          <cn>2</cn>
        </apply>
      </apply>
    </apply>

Symbol Entry

- GUI PCs can display myriad glyphs, mathematical symbols, and international characters
- Hard to input special symbols. Menu methods are slow. Hot keys are great but hard to learn
- Reexamine and improve symbol-input and storage methods
- With left/right Ctrl/Alt keys, the PC keyboard gives direct access to 600 symbols. Maximum possible (every combination of roughly 100 keys) = $2^{100} \approx 10^{30}$
- Use on-screen, customizable, keyboards and symbol boxes
- Drag & drop any symbol into apps or onto keyboards

Hex to Unicode Input Method

- Type Unicode character hexadecimal code
- Make corrections as need be
- Type Alt+x to convert to character
- Type Alt+x to convert back to hex (useful especially for “missing glyph” character)
- Resolve ambiguities by selection
- Input higher-plane chars using 5 or 6-digit code
- New MS Word standard

Built-Up Formula Heuristics

- Math characters identify themselves and neighbors as math
- E.g., the fraction slash (U+2044), ASCII operators, U+2200–U+22FF, and U+20D0–U+20FF identify their neighbors as mathematical
- Math characters include the various Latin and Greek math alphabets
- When heuristics fail, user can select math mode: WYSIWYG instead of visible math on/off codes
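The neighbor rule can be sketched as a marking pass over the text. The set of ASCII operators here is an assumption and the function names are hypothetical; note that a plain hyphen in prose would also trigger it, which is exactly where the manual math-mode override comes in.

```cpp
#include <cstddef>
#include <string>
#include <string_view>
#include <vector>

// Characters that identify themselves and their neighbors as math.
bool is_math_char(char32_t c) {
    return c == 0x2044                                   // FRACTION SLASH
        || (c < 0x80 && std::string_view("+-*/=<>^").find(static_cast<char>(c))
                            != std::string_view::npos)   // ASCII operators (assumed set)
        || (c >= 0x2200 && c <= 0x22FF)                  // Mathematical Operators
        || (c >= 0x20D0 && c <= 0x20FF)                  // combining marks for symbols
        || (c >= 0x1D400 && c <= 0x1D7FF);               // math alphanumerics
}

// Marks each math character and its immediate neighbors as math.
std::vector<bool> math_zone(const std::u32string& s) {
    std::vector<bool> mark(s.size(), false);
    for (std::size_t i = 0; i < s.size(); ++i) {
        if (!is_math_char(s[i])) continue;
        mark[i] = true;
        if (i > 0) mark[i - 1] = true;
        if (i + 1 < s.size()) mark[i + 1] = true;
    }
    return mark;
}
```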

Operator Precedence

- Everyone knows that multiply takes precedence over add, e.g., 3+5×3 = 18, not 24
- C-language precedence is too intricate for most programmers to use extensively
- TeX doesn’t use precedence; relies on { } to define operator scope
- In general, ( ) can be used to clarify or overrule precedence
- Precedence reduces clutter, so some precedence is desirable (else things look like LISP!)
- But keep it simple enough to remember easily
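Everyday precedence can be sketched with a recursive-descent evaluator in which each precedence level is one function. This minimal version handles non-negative integers, '+', '-', '*' (standing in for ×) and parentheses, with no whitespace; it illustrates the idea only and is not the layout precedence table that follows. The names are hypothetical.

```cpp
#include <cctype>
#include <cstddef>
#include <stdexcept>
#include <string>

namespace {

long parse_sum(const std::string& s, std::size_t& i);

// primary := digits | '(' sum ')'   -- parentheses overrule precedence
long parse_primary(const std::string& s, std::size_t& i) {
    if (i < s.size() && s[i] == '(') {
        ++i;
        long v = parse_sum(s, i);
        if (i >= s.size() || s[i] != ')') throw std::runtime_error("missing ')'");
        ++i;
        return v;
    }
    if (i >= s.size() || !std::isdigit(static_cast<unsigned char>(s[i])))
        throw std::runtime_error("expected a number");
    long v = 0;
    while (i < s.size() && std::isdigit(static_cast<unsigned char>(s[i])))
        v = v * 10 + (s[i++] - '0');
    return v;
}

// product := primary ('*' primary)*   -- binds tighter than + and -
long parse_product(const std::string& s, std::size_t& i) {
    long v = parse_primary(s, i);
    while (i < s.size() && s[i] == '*') {
        ++i;
        v *= parse_primary(s, i);
    }
    return v;
}

// sum := product (('+' | '-') product)*   -- left-associative
long parse_sum(const std::string& s, std::size_t& i) {
    long v = parse_product(s, i);
    while (i < s.size() && (s[i] == '+' || s[i] == '-')) {
        char op = s[i++];
        long rhs = parse_product(s, i);
        v = (op == '+') ? v + rhs : v - rhs;
    }
    return v;
}

}  // namespace

long evaluate(const std::string& s) {
    std::size_t i = 0;
    long v = parse_sum(s, i);
    if (i != s.size()) throw std::runtime_error("unexpected trailing input");
    return v;
}
```

With this structure, `evaluate("3+5*3")` gives 18 and `evaluate("(3+5)*3")` gives 24, matching the slide’s example.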

Layout Operator Precedence

| Operator class | Operators |
|---|---|
| Subscript, superscript | ↓ ↑ |
| Integral, sum | ∫ ∑ ∏ |
| Functions | √ |
| Times, divide | / * × · • |
| Other operators | Space " . , = - + Tab |
| Right brackets | ) ] } \| |
| Left brackets | ( [ { |
| End of paragraph | FF EOP |

Mathematics as a Programming Language

- Fortran made great steps in getting computers to understand mathematics
- Java and C# accept Unicode variable names
- C++ has preprocessor and operator overloading, but needs extensions to be really powerful
- Use Unicode characters including math alphanumerics
- Use plain-text encoding of mathematical expressions
- Can’t use all mathematical expressions as code, but can go much further than current languages go
- When to multiply? In the abstract, multiplication is infinitely fast and precise, but not on a computer

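    #include <cmath>
    #include <complex>

    using cplx = std::complex<double>;

    // The slide's fragment wrapped as a function so it compiles; the
    // parameters are its free variables, with the original names kept.
    cplx compute_alpha1(double alpha0, double gamma, double gamma1,
                        double Delta, double I2, double T1)
    {
        const double gammap = gamma * std::sqrt(1.0 + I2);
        const cplx upsilon(gamma + gamma1, Delta);
        const cplx alphainc =
            alpha0 * (1.0 - (gamma * gamma * I2 / gammap) / (gammap + upsilon));
        cplx alphacoh;
        if (gamma1 == 0.0 && std::fabs(Delta * T1) < 0.01) {
            alphacoh = -0.5 * alpha0 * I2 * std::pow(gamma / gammap, 3);
        } else {
            const double Gamma = 1.0 / T1 + gamma1;
            const cplx I2sF = (I2 / T1) / cplx(Gamma, Delta);
            const cplx betap2 = upsilon * (upsilon + gamma * I2sF);
            const cplx beta = std::sqrt(betap2);
            alphacoh = 0.5 * gamma * alpha0
                     * (I2sF * (gamma + upsilon) / (gammap * gammap - betap2))
                     * ((1.0 + gamma / beta) * (beta - upsilon) / (beta + upsilon)
                        - (1.0 + gamma / gammap) * (gammap - upsilon) / (gammap + upsilon));
        }
        return alphainc + alphacoh;
    }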

Conclusions

- Unicode provides great support for math in both marked up and plain text
- Unicode character properties facilitate plain-text encoding of mathematics but aren’t used in MathML
- Heuristics allow plain text to be built up
- Need two more Unicode assignments: subscript and superscript operators
- On-screen keyboards and symbol boxes aid formula entry
- Unicode math characters could be useful for programming languages
