1 / 22

MathML and Digital Libraries

MathML and Digital Libraries. Timothy W. Cole ( t-cole3@uiuc.edu ), University of Illinois at Urbana-Champaign Workshop on International Scientific Data, Standards & Digital Libraries @ JCDL 2005, Denver, CO. Digital Typography?.

mchism
Download Presentation

MathML and Digital Libraries

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. MathML and Digital Libraries Timothy W. Cole (t-cole3@uiuc.edu), University of Illinois at Urbana-Champaign Workshop on International Scientific Data, Standards & Digital Libraries@ JCDL 2005, Denver, CO

  2. Digital Typography? Typography, when considered “as the servant of mathematics,” should have as a goal “to communicate mathematics effectively by making it possible to publish mathematical papers and books of high quality, without excessive cost.” -- Donald Knuth (inventor of TEX), 1978 • Where do we go from TEX? t-cole3@uiuc.edu University of Illinois at UC

  3. MathML Antecedents • HTML Math (Dave Raggett, 1994; HTML 3.0/3.2) • Limited functionality • Community preference to keep HTML simple, generic • SGML ISO 12083 (1994) Mathematics DTD • Exclusively presentational • Not as sophisticated as MathML • W3C Math Working Group Formed 1997 • MathML 1.0 released 1998 • MathML 2.0 released 2001 t-cole3@uiuc.edu University of Illinois at UC

  4. MathML – Presentation Markup • Used to describe the layout structure of mathematical notation – focus on visual constructs (about 30 elements). Includes: • Token elements – primarily PCData content models • Layout Schemata, for building expressions – element content models • Semantic hints via invisible characters are encouraged <math xmlns='http://www.w3.org/1998/Math/MathML‘> <mrow> <msqrt> <mi>z</mi> </msqrt> <mo>=</mo> <msup> <mi>z</mi> <mfrac><mn>1</mn><mn>2</mn></mfrac> </msup> </mrow> </math> t-cole3@uiuc.edu University of Illinois at UC

  5. MathML – Content (Semantic) Markup • Intended to support encoding of the underlying mathematical structure of an expression, rather than any particular rendering for the expression – about 120 elements • Limited scope, designed to express “commonplace mathematical constructs,” i.e., through about first 2 years of college (calculus) • can be extended,e.g., through the use of OpenMath <math xmlns='http://www.w3.org/1998/Math/MathML‘> <apply><eq/> <apply><root/> <degree><cn type='integer'>2</cn></degree><ci>z</ci></apply> <apply><power/> <ci>z</ci><cn type='rational'>1<sep/>2</cn></apply> </apply> </math> t-cole3@uiuc.edu University of Illinois at UC

  6. Displaying MathML in Web Browsers • Mozilla / Netscape 7.x renders MathML natively • But not all elements supported, or supported well • Internet Explorer does not render MathML natively • But plug-ins available – TechExplorer, MathPlayer • Syntax for using MathML different between browsers • W3C provides XSLT that allows a page to viewable in both • Rendering quality also depends on fonts available • Thousands of code points have been added to Unicode • STIXFonts.org is creating more than 4,000 freely available new glyphs for these Unicode points (rest were pre-existing) t-cole3@uiuc.edu University of Illinois at UC

  7. Other Software Support for MathML (2004) • Editors: MathType + Word, Integre MathML Editor, SciWriter, MathFlow, Scientific Workplace, MathFlow + Epic/XMetaL, OpenOffice, others • Interactive Components: Techexplorer, WebEQ • TeX Converters: TeXML, Hermes, tex4ht, others • Computation: Maple, Mathematica, REDUCE, others t-cole3@uiuc.edu University of Illinois at UC

  8. Using MathML (as of 2004) • MathML incorporated into a number of document types: • Docbook, NIH Journal Archiving & Publishing, etc. • MathML is being used in STM publications : • In production: American Institute of Physics, US Patent Office • In pilot: Springer, Wiley, Marcel Dekker, Elsevier, others • MathML behind the scenes in e-Learning software: • Blackboard, WebCT, eCollege, Diploma.EDU, MapleTA, other • BUT adoption by author community limited to date t-cole3@uiuc.edu University of Illinois at UC

  9. Current Status & Open Issues • MathML 2.0 is stable, available, & in increasing use • Proving useful for many publication & educational applications • Tailored for use on the Web; interactivity demostrated • Commitment to stay open & maintain backward compatibility • Limitations, Issues, & Needs • Utility for research mathematics? • Use in / cross-walks with other Sci-Tech MLs • Low-level: fonts / glyphs, character entity definitions, … • Semantic utility not yet proven (being investigated):- Comparing MathML expressions for degree of equivalence- Searching by MathML expressions t-cole3@uiuc.edu University of Illinois at UC

  10. An Example: Wolfram Functions Website

  11. MathML in Functions Site

  12. But… • Rendering of MathML typically not exactly the same application to application • For most complex functions, MathML not enough – must augment with annotations meaningful to Mathematica • Applications still rely on something other than MathML for complex, behind the scenes functionality • No canonical forms – no accepted normalization rules t-cole3@uiuc.edu University of Illinois at UC

  13. Experiment: Using MathML to Link Literature • Hypotheses: • Readers of a journal article may want to know more about mathematical functions used in article … And vice-versa • Functions site keywords/phrases & MathML found in body of articles may be useful for linking, search & discovery • Approach: • Examine journal article testbed for presence of Functions site keywords; add links & terminology to article metadata • Extract selected MathML occurrences from article body; “normalize”; add to article metadata t-cole3@uiuc.edu University of Illinois at UC

  14. Article Testbed: Illinois DLI-I / D-Lib Test Suite • Funded 1994-98 under DLI-I (NSF, DARPA, & NASA) Continued 1998-2001 under CNRI’s D-Lib Test Suite • Featured: • Multi-publisher, Markup-Based Full-Text Journal Testbed. • Studies of Processing, Indexing, Normalization, Retrieval, Rendering, Linking & End-User Searching Needs. • Testbed contains 75,000+ Articles from 60 Journal Titles • Received as SGML (various DTDs); converted to XML • Content from AIP, APS, ASCE, IEE, ASM, ACM, Elsevier • Additional support from IEEE, NRL, NTT Learning Systems t-cole3@uiuc.edu University of Illinois at UC

  15. Searching for Occurrences of Functions Site Metadata Terms in Article Testbed • Functions Site: 83,000 pages containing keywords in HTML <meta> elements • Approximately 7,500 unique keywords • Not all keywords useful for full-text search • Eliminated numerals, Mathematica expressions, single word phrases, phrases containing selected words • About 1,000 keywords left after automated filtering • 367 of these keywords appear in journal article testbed • 44,000 articles contain at least one keyword • On average each of those 44,000 contain 2 keywords t-cole3@uiuc.edu University of Illinois at UC

  16. Issues in Using Functions Keywords/phrases for Characterizing Articles • Vocabulary switching • Keywords in mathematics, not necessarily terms used in physics, electrical engineering, & computer science • Meaning of words in full-text less precise • Synonyms, less specific use, etc. • Context – same descriptive keywords may map to many different branches of Functions site • E.g., 300 occurrences of “q-series” in Functions site;which are relevant to 18 articles containing this keyword? • Use hierarchical tree of Functions Site to place user in reasonable proximity t-cole3@uiuc.edu University of Illinois at UC

  17. Using MathML Found in Article Body • 2.5 million occurrences of MathML in testbed, but most not useful for searching & linking • MathML used for greek characters in text • Many instances short, inline fragments • Quality – many instances can’t be parsed by Mathematica • Criteria • Explicitly equations (block display format) • Minimum length, 200 bytes • Accepted by Mathematica kernel • About 3% of occurrences meet criteria t-cole3@uiuc.edu University of Illinois at UC

  18. Observations from this early experiment • Currently, text cues in articles seem more effective for linking than mathematical cues • Mathematics embedded in articles needs lots of processing to begin to be useful for discovery, linking, … • We need to understand mathematical equivalents of language grammar, noun-phrases, … t-cole3@uiuc.edu University of Illinois at UC

  19. Searching Mathematics Directly

  20. Search Results

  21. More Like This One

  22. Links • Wolfram Functions Website http://functions.wolfram.com/ http://functions.wolfram.com/formulasearch • W3C Math Working Group http://www.w3.org/Math/ http://www.w3.org/Math/testsuite/ • Design Science http://www.dessci.com/ http://www.dessci.com/en/products/webeq/interactive/default.htm • Enhancing the Searching of Mathematics (IMA Hot Topic) http://www.ima.umn.edu/complex/spring/searching.html t-cole3@uiuc.edu University of Illinois at UC

More Related