Discourse Level Software: Current Status and Future Directions Nov. 16, 2004 Lars Huttar (email@example.com) Knowledge Management Services
Abstract (I) • Discourse analysis (DA, a.k.a. textlinguistics) is a task frequently cited as needing computer-assisted tools. • Some tools are currently available for certain tasks, but as yet there are no user-ready applications built specifically for the discourse charting commonly used on the field.
Abstract (II) • This presentation will review a few of the existing tools most pertinent to DA on the field, and software that is planned or under development. • I will also mention the conceptual model for constituent charting described in my thesis, which uses XML encoding of text and analysis, from which a chart is rendered via XSL.
Overview • The need for discourse analysis software • What’s already out there? • What’s coming down the pike?
Need for Discourse Software The task: • Help the user produce charts, diagrams, and summaries of texts in such a way as to facilitate discovery of discourse patterns and to expedite testing of hypotheses.
Major features desired (Geoffrey Hunt, Kent Spielmann) • Import (interlinear) text • Segment and move pieces into chart columns • Mark genre(s) • Configurable auto-highlighting, e.g. color by POS • Toggle highlighting of certain features • Manual annotation of features, incl. coherence and prominence • Search text, interlinear text, and annotations • Chart/summary of results, hyperlinked to data • Accessible to MTTs/OTTs
Current Practice • Pencil & paper • MS Word • MS Excel • A few brave souls use other tools
The Right Tools? Specialized tools could make it quicker and easier!
How to Address the Need? • Use existing software • SIL FieldWorks DA tool(s) • Extend existing tools?
What’s already here? • MDA • BART • RSTTool • MATE • CiCaDA
Multilinear Discourse Analysis • Generate statistics and diagrams relating to span analysis, topic continuity statistics, and other issues • Input is an SFM marked up text (e.g. from Shoebox) • In Beta 2 • More info: firstname.lastname@example.org
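To make the "SFM marked up text" input concrete, here is a minimal Python sketch of reading standard format marker records. It is not MDA's own parser. The marker names (\ref, \tx, \ft) are common Shoebox conventions used here as assumptions, and the sample text is invented for illustration.

```python
import re

# A standard format marker line: backslash, marker name, optional content.
SFM_LINE = re.compile(r"^\\(\w+)\s*(.*)$")


def parse_sfm(text, record_marker="ref"):
    """Split SFM text into records (dicts), starting a new record at each record_marker."""
    records = []
    current = None
    last_marker = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue  # blank lines only separate records visually
        match = SFM_LINE.match(line)
        if match:
            marker, value = match.group(1), match.group(2)
            if marker == record_marker or current is None:
                current = {}
                records.append(current)
            # A marker repeated within one record is joined with a space.
            current[marker] = (current.get(marker, "") + " " + value).strip()
            last_marker = marker
        elif current is not None and last_marker is not None:
            # A line without a marker continues the previous field.
            current[last_marker] = (current[last_marker] + " " + line).strip()
    return records


# Invented sample; marker names are assumptions, not a fixed standard.
sample = r"""\ref T1.001
\tx Mi go long taun.
\ft I went to town.

\ref T1.002
\tx Em i stap.
\ft He stayed.
"""

records = parse_sfm(sample)
for rec in records:
    print(rec["ref"], "|", rec["tx"], "|", rec["ft"])
```

Once the text is in records like these, span or topic-continuity counts can be computed per sentence; which fields carry the relevant annotations depends on the project's own marker set.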
Biblical Analysis Research Tool • BART – has features supporting discourse analysis of biblical texts • Comes with extensive built-in morphosyntax markup; supports customizable tagging and complex queries. • Only for biblical texts; can’t enter vernacular texts. • Part of TW, or available from WordSearch Corp. • www.sil.org/translation/bart.htm
RSTTool • Lets user diagram relations between text “chunks.” • Free download from http://www.wagsoft.com/RSTTOOL • User can define own set of relations, schemas, etc. such as SSA or Longacre’s propositional relations. • Can generate statistics based on the tree structures built by the user. • File format is XML-based. • Text can be edited even after structuring has begun.
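As an illustration of the kind of statistics an XML-based relation file makes possible, the following Python sketch counts how often each relation is used. The element and attribute names (segment, group, relname, parent) are assumptions chosen for this example and should not be read as a specification of RSTTool's file format; the sample text is invented.

```python
from collections import Counter
import xml.etree.ElementTree as ET

# Hypothetical relation file: each node names its parent and the relation to it.
SAMPLE = """<rst>
  <body>
    <segment id="1" parent="5" relname="span">It rained all night.</segment>
    <segment id="2" parent="1" relname="result">The river rose.</segment>
    <segment id="3" parent="1" relname="result">The road was cut.</segment>
    <segment id="4" parent="5" relname="elaboration">Nobody could travel.</segment>
    <group id="5" type="span"/>
  </body>
</rst>"""


def relation_counts(xml_source, structural=("span",)):
    """Count relation names, skipping purely structural links such as 'span'."""
    root = ET.fromstring(xml_source)
    counts = Counter()
    for node in root.iter():
        rel = node.get("relname")
        if rel and rel not in structural:
            counts[rel] += 1
    return counts


counts = relation_counts(SAMPLE)
for name, n in counts.most_common():
    print(f"{name}: {n}")
```

Because the user defines the relation set, the same count works for SSA relations or Longacre's propositional relations without any change to the code.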
MATE Workbench • Tool “to aid in the display, editing and querying of annotated speech corpora” • Encodes data in XML and displays via XSL-like stylesheets; could be programmed to produce various displays. • In “early demo” version (2001). Looks like it has potential, but I can’t get it to run on my machine. • http://mate.nis.sdu.dk/
CiCaDA • Produce fairly feature-complete constituent charts from XML data using XSLT stylesheets. • Encode text, column assignments, and chart configuration in XML; chart is produced automatically. • Open standards promote modification/reuse of data. • There is no “application;” no user-friendly way to enter the XML data.
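The idea of encoding text, column assignments, and chart configuration in XML and deriving the chart automatically can be sketched as follows. This is only an illustration: Python's standard library stands in for the XSLT step, and the element and attribute names (chart, column, clause, seg, col) are hypothetical, not CiCaDA's actual schema.

```python
import xml.etree.ElementTree as ET

# Hypothetical encoding: <chart> configures the columns, each <clause> is one
# chart row, and each <seg> is assigned to a column by its "col" attribute.
SAMPLE = """<text>
  <chart>
    <column id="pre" label="Pre-nuclear"/>
    <column id="S" label="Subject"/>
    <column id="V" label="Verb"/>
    <column id="O" label="Object"/>
  </chart>
  <clause n="1">
    <seg col="S">The man</seg>
    <seg col="V">saw</seg>
    <seg col="O">a dog</seg>
  </clause>
  <clause n="2">
    <seg col="pre">Then</seg>
    <seg col="S">he</seg>
    <seg col="V">ran</seg>
  </clause>
</text>"""


def build_chart(xml_source):
    """Return (header, rows) for a constituent chart described in XML."""
    root = ET.fromstring(xml_source)
    columns = [(c.get("id"), c.get("label")) for c in root.find("chart")]
    rows = []
    for clause in root.findall("clause"):
        cells = {col_id: [] for col_id, _ in columns}
        for seg in clause.findall("seg"):
            # A segment naming an unconfigured column raises KeyError.
            cells[seg.get("col")].append((seg.text or "").strip())
        rows.append([clause.get("n")] + [" ".join(cells[col_id]) for col_id, _ in columns])
    header = ["#"] + [label for _, label in columns]
    return header, rows


def render(header, rows):
    """Render the chart as a plain-text table with aligned columns."""
    table = [header] + rows
    widths = [max(len(row[i]) for row in table) for i in range(len(header))]
    return "\n".join(
        " | ".join(cell.ljust(w) for cell, w in zip(row, widths)).rstrip()
        for row in table
    )


header, rows = build_chart(SAMPLE)
print(render(header, rows))
```

Keeping the column configuration in the data rather than in the rendering code is what lets the same text be re-charted with a different column layout by editing only the chart element.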
Helps available • LinguaLinks Library has several items, including: • Analyzing Discourse: a Manual of Basic Concepts – Dooley & Levinsohn (avail. on the web as well as in LLL). Very practical.
Do you know of others? • Please let me know if you are aware of other useful discourse-level software tools!
What’s coming? • TCC • AGTK • FieldWorks DA tools
TCC • “A tool for drawing syntax trees” – could also be used for discourse “chunking” and highlighting • Looks very easy to use. Collapsible tree makes it easy to browse large text structures. • Supports Latin-1 charset. • Author is taking feedback to make TCC more useful for SIL’s work. • Still in beta. No release schedule. • Info: http://ulrikp.org/
Annotation Graph ToolKit • AGTK is a toolkit for annotating texts • TreeTrans – edit syntactic trees; charting & chunking possible • InterTrans – interlinearize text (very beta) • Saves in an abstract XML format; potentially a good basis for a “Lego” solution • Not ready for end users.
SIL FieldWorks DA Tool(s) • FW DA software is still on the drawing board but is a high priority. • Would leverage the huge benefits of all the work that has gone into FieldWorks! • FW tools already support interlinear text, text annotations/tagging and highlighting. • Preliminary work has begun on design of constituent charting features. • Wish list for DA features exists but requirements not yet prioritized. Guidance team has not yet been formed.
Conclusion • There are some good tools already out there for certain tasks related to DA. Unfortunately they don’t interoperate much, and there are no domain-aware applications for constituent charting. • SIL FieldWorks tools, as they become available, should cover certain DA tasks well, such as constituent charting.