Kirrkirr: Software for browsing and visual exploration of a structured Warlpiri dictionary

Kevin Jansz kjansz@sultry.arts.usyd.edu.au

Department of Linguistics, University of Sydney, Australia

Christopher Manning

Departments of Computer Science and Linguistics, Stanford University, USA

Nitin Indurkhya

School of Applied Science, Nanyang Technological University, Singapore

Objectives
  • Provide innovative ways for representing a dictionary, through creative use of web technology
  • Provide practical, educationally useful access to information that can be customised to suit the needs of many users (at low labour cost)
  • Examine the richness of lexical structure

Initial target: the Warlpiri dictionary.

Research Program: Lexicon
  • A language is more than individual words with a definition
    • it is a vast network of associations between words and within and across the concepts represented by words
  • Aim to provide people with a better understanding of this conceptual map.
  • Traditional paper dictionaries offer very limited ways for making such networks visible
  • There are no such limitations on a computer
Research: Computational Lexicography
  • Dictionaries on computers are now commonplace
    • Few utilise the potential of the new medium
    • Many present a plain, search-oriented representation of the paper version
  • Goal: fun dictionary tools that are effective for language learning, browsing
    • Like flicking through pages of a paper dictionary
    • Words are grouped by their meaning and their association with each other
    • Key to the effectiveness of this browsing is that the user has control over the way this is presented.
Initial focus: Warlpiri
  • Warlpiri is an Australian Aboriginal language spoken in the Tanami Desert (north-west of Alice Springs)
  • There are a number of factors influencing this choice:
    • One of the most comprehensive lexical databases for any Australian language (Laughren & Nash 1983)
    • Relatively large community of people interested in learning their traditional language
    • Until now, results haven’t been produced in a format usable by the community (only raw printouts)
Educational goals
  • Dictionary structure and usability are often dictated by professional linguists, while the needs of others (speakers, semi-speakers, young users, second language learners) are not met
  • The low level of literacy in the region makes an e-dictionary potentially more useful than a paper edition
    • less dependent on good knowledge of spelling and alphabetical order.
    • Making it fun and easy to use, and providing multimedia content and word pronunciations, are a considerable help as well
Kirrkirr: A Warlpiri dictionary browser

(Jansz 1998; Jansz, Manning and Indurkhya 1999)

  • An environment for the interactive exploration of dictionaries.
  • Current work has been only with Warlpiri, but the design is general (Arrernte coming soon!)
  • Attempts to more fully utilise graphical interfaces, hypertext, multimedia, and different ways of indexing and accessing information
  • It can either be run over the web [high bandwidth] or run locally (here Java’s main advantage is cross-platform support).
Overview
  • Animated graph layout of word relationships
  • Formatted entries
  • A Notes facility for ‘jotting in the margin’
  • Multimedia: audio, pictures
  • Advanced searching interfaces
  • Semantic domain browsing
  • Others in planning: formatting (XSL) editing, figuration patterns.
  • These attempt to cater to users with different interests and competence levels
MRD Structure
  • The internal structures of current Machine Readable Dictionaries (MRDs) usually merely mimic the structure of the printed form (Boguraev 1990)
  • Some work, notably WordNet (Miller 1995), has involved a fundamental rethinking of dictionary content and organisation (in WordNet, organisation is via “synsets”, which are related by links such as part, subkind and opposite)
  • But there has been little in the way of software to make such research truly usable by different communities of users.
The lexical database
  • Original materials stored in an ad hoc format of markup using backslash codes with some (rather odd) nesting of structural tags
  • These were converted to XML using an error-correcting, stack-based parser (written in Perl).
    • The inconsistency and flexibility of dictionary entries actually made this a surprisingly difficult task.
    • But the parser tries to impose data integrity
  • Use of XML gives a clear structure to the lexical data, and makes available many (free) tools
  • Result remains a portable, tangible text file
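The conversion step above can be illustrated with a minimal Python sketch (the original parser was written in Perl and is not reproduced here). The backslash codes (`\entry`, `\sense`, `\endentry`, `\hw`, `\gl`), the element names and the repair rules are all assumptions made for illustration; the real Warlpiri source format is not shown on the slides. The sketch keeps a stack of open structural tags and uses it to close anything left open and to drop stray closing codes.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# Hypothetical codes that open a nested structure; every other code is a one-line field.
STRUCTURAL = {"entry", "sense"}


def backslash_to_xml(lines):
    """Convert toy backslash-coded lines to XML, repairing unbalanced nesting."""
    out = ["<DICTIONARY>"]
    stack = []  # currently open structural tags, innermost last

    def close_down_to(tag):
        # Pop and close open elements until `tag` itself has been closed.
        while stack:
            top = stack.pop()
            out.append(f"</{top.upper()}>")
            if top == tag:
                break

    for raw in lines:
        line = raw.strip()
        if not line.startswith("\\"):
            continue  # this sketch skips anything that is not a code line
        code, _, text = line[1:].partition(" ")
        code = code.lower()
        if not code.isalpha():
            continue  # skip codes that could not serve as element names
        if code in STRUCTURAL:
            if code in stack:
                # Assumed repair rule: a structure cannot nest inside itself,
                # so a repeated opener means the previous one was never closed.
                close_down_to(code)
            stack.append(code)
            out.append(f"<{code.upper()}>")
        elif code.startswith("end") and code[3:] in STRUCTURAL:
            tag = code[3:]
            if tag in stack:
                close_down_to(tag)  # also closes anything left open inside it
            # A closing code with no matching opener is silently dropped.
        else:
            out.append(f"<{code.upper()}>{escape(text.strip())}</{code.upper()}>")

    while stack:  # close whatever is still open at end of input
        out.append(f"</{stack.pop().upper()}>")
    out.append("</DICTIONARY>")
    return "\n".join(out)


# Made-up input: the first entry is never closed and there is a stray \endsense.
SAMPLE = r"""
\entry
\hw jaja
\sense
\gl placeholder gloss
\entry
\hw example
\endsense
\endentry
"""

xml_text = backslash_to_xml(SAMPLE.splitlines())
root = ET.fromstring(xml_text)  # parses only because the nesting was repaired
print(xml_text)
```

Because the output is checked by parsing it back with a standard XML parser, a repair rule that leaves the nesting unbalanced fails loudly rather than producing a silently corrupt file.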
XML indexing - challenges
  • Few XML parsers make single entries retrievable from the file
  • Typically, the entire XML document is put in memory
  • This is not practical when parsing significant XML databases (e.g., the Warlpiri dictionary is approx. 10 MB).
XML Dictionary Indexing (XDI)
  • Hierarchical structure of XML lends itself to indexing
    • Each entry in the XML file can be considered as a separate entity
  • To make the Warlpiri dictionary usable for Kirrkirr an ad hoc indexing system was developed
    • Uses a slightly modified Ælfred XML parser
    • Entries indexed by headword in a separate index file
  • The system returns an XML document object containing the single dictionary entry, facilitating:
    • processing for related words (Graph layout)
    • XSL processing to HTML
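The idea of indexing entries by headword and file position can be sketched in Python as follows. This is not Kirrkirr's XDI code (which is Java and uses a modified Ælfred parser); the regular expressions, the `HW` and `GL` element names, the function names and the in-memory `dict` are assumptions for illustration, and the sketch scans the whole file once to build the index, whereas the slides describe a separate index file.

```python
import io
import re
import xml.etree.ElementTree as ET

# Assumed layout: <ENTRY> elements are not nested and each contains one <HW>.
ENTRY_RE = re.compile(rb"<ENTRY>.*?</ENTRY>", re.DOTALL)
HW_RE = re.compile(rb"<HW>(.*?)</HW>", re.DOTALL)


def build_index(data: bytes) -> dict:
    """Map each headword to the (byte offset, byte length) of its <ENTRY>."""
    index = {}
    for match in ENTRY_RE.finditer(data):
        hw = HW_RE.search(match.group(0))
        if hw:
            headword = hw.group(1).decode("utf-8").strip()
            index[headword] = (match.start(), match.end() - match.start())
    return index


def fetch_entry(dictionary_file, index, headword):
    """Seek to one entry and parse only that entry into an element object."""
    offset, length = index[headword]
    dictionary_file.seek(offset)
    return ET.fromstring(dictionary_file.read(length))


# Made-up two-entry dictionary; io.BytesIO stands in for the file on disk.
SAMPLE = (
    b"<DICTIONARY>\n"
    b"<ENTRY><HW>jaja</HW><GL>placeholder gloss A</GL></ENTRY>\n"
    b"<ENTRY><HW>example</HW><GL>placeholder gloss B</GL></ENTRY>\n"
    b"</DICTIONARY>\n"
)

index = build_index(SAMPLE)
dictionary_file = io.BytesIO(SAMPLE)
entry = fetch_entry(dictionary_file, index, "example")
print(entry.tag, entry.findtext("GL"))
```

After the index is built, a lookup reads and parses only the bytes of the requested entry, so the cost of a lookup does not depend on the size of the dictionary file.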

Kirrkirr’s XML Index Process

[Figure: Kirrkirr dictionary browser; index in memory (headword → file position, one row per entry); XML-formatted Warlpiri dictionary file (<DICTIONARY> containing a sequence of <ENTRY> ... </ENTRY> elements), accessed across the file system or web; XML parser producing an XML document object; XSL processor combining that object with an XSL file to produce an HTML document.]

XDI in Kirrkirr
  • The XML indexing process considerably improves efficiency as only requested entries are parsed
  • Parsed entries are kept temporarily in a cache
  • Thus Kirrkirr uses XML as a middle ground between the structure and indexing of a relational database and the freedom and functionality of text.
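A temporary cache of parsed entries might look like the following Python sketch. The slides do not say how Kirrkirr's cache decides what to discard, so the least-recently-used policy, the `EntryCache` class and the `fake_parse` loader are assumptions for illustration only.

```python
from collections import OrderedDict


class EntryCache:
    """Keep the most recently used parsed entries, up to a fixed capacity."""

    def __init__(self, load_entry, capacity=100):
        self._load_entry = load_entry  # function: headword -> parsed entry
        self._capacity = capacity
        self._entries = OrderedDict()  # least recently used first

    def get(self, headword):
        if headword in self._entries:
            self._entries.move_to_end(headword)  # mark as recently used
            return self._entries[headword]
        entry = self._load_entry(headword)  # cache miss: parse on demand
        self._entries[headword] = entry
        if len(self._entries) > self._capacity:
            self._entries.popitem(last=False)  # evict the least recently used
        return entry


# Hypothetical loader that records which headwords actually had to be parsed.
parsed = []


def fake_parse(headword):
    parsed.append(headword)
    return {"HW": headword}


cache = EntryCache(fake_parse, capacity=2)
for hw in ["jaja", "example", "jaja", "other", "example"]:
    cache.get(hw)
print(parsed)
```

In this run the second request for "jaja" is served from the cache, while "example" has to be parsed again because adding "other" pushed it out of the two-entry cache.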
XQL - Potential
  • An alternative to investigate for the future is using a standard query language – such as XQL – to get material out of the XML dictionary, rather than using our ad hoc index.
  • At the moment not a huge issue since most retrieval is focussed on components of a particular word
XQL - Optimizations
  • Revamp data structure
    • reduce redundancy, amount to load at start-up
  • PDOM (Persistent Document Object Model)
    • represents an XML document as a collection of objects in a tree-like model
  • XQL (XML Query Language)
    • query language for XML
    • e.g. /DICTIONARY/ENTRY[9]
    • DICTIONARY/ENTRY[HW='jaja']
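Queries of this shape can be tried out with the limited XPath support in Python's standard `xml.etree.ElementTree`, as in the sketch below. This uses XPath, not XQL, so it only approximates the slide's examples. ElementTree counts positions from 1, and the slide does not say which convention its `[9]` follows. The ten-entry dictionary and the `wordN` headwords are made up.

```python
import xml.etree.ElementTree as ET

# Made-up dictionary of ten entries, with "jaja" as the third headword.
headwords = [f"word{i}" for i in range(1, 11)]
headwords[2] = "jaja"
root = ET.fromstring(
    "<DICTIONARY>"
    + "".join(f"<ENTRY><HW>{hw}</HW></ENTRY>" for hw in headwords)
    + "</DICTIONARY>"
)

# Positional lookup, relative to the <DICTIONARY> root element.
# ElementTree positions start at 1, so this is the ninth <ENTRY> child.
ninth = root.find("ENTRY[9]")

# Lookup by content: every <ENTRY> whose <HW> child has the text 'jaja'.
matches = root.findall("ENTRY[HW='jaja']")

print(ninth.findtext("HW"))
print([m.findtext("HW") for m in matches])
```

Note that this sketch parses the whole document into memory first, which is the approach the indexing slides set out to avoid for a file of the Warlpiri dictionary's size.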
Performance - Startup time
  • Impact on Startup time.
Customised Presentation of Dictionary Content
  • Produced dynamically from the XML by using XSL (via James Clark’s XT)
  • XSL allows easy modelling of some user preferences.
  • This is useful as many users find information overload quite confusing and demotivating
  • Can produce bilingual or monolingual dictionary
  • Opportunities for various output styles, and formats such as RTF or TeX for printing.
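The effect of a user preference on the generated page can be illustrated with a small Python sketch. It is a plain-Python stand-in, not XSL and not Kirrkirr's stylesheets: the `render_entry` function, the `show_english` preference and the `HW`, `DEF` and `GL` element names are assumptions, and the entry text is placeholder content.

```python
import xml.etree.ElementTree as ET
from html import escape


def render_entry(entry, show_english=True):
    """Render one entry as an HTML fragment, honouring a display preference."""
    parts = [f"<h2>{escape(entry.findtext('HW', default=''))}</h2>"]
    for definition in entry.findall("DEF"):  # assumed: definition in the source language
        parts.append(f"<p>{escape(definition.text or '')}</p>")
    if show_english:  # bilingual view adds the assumed English gloss elements
        for gloss in entry.findall("GL"):
            parts.append(f"<p><i>{escape(gloss.text or '')}</i></p>")
    return "\n".join(parts)


# Made-up entry with placeholder text.
entry = ET.fromstring(
    "<ENTRY><HW>jaja</HW>"
    "<DEF>placeholder definition</DEF>"
    "<GL>placeholder English gloss</GL></ENTRY>"
)

bilingual_html = render_entry(entry, show_english=True)
monolingual_html = render_entry(entry, show_english=False)
print(bilingual_html)
print(monolingual_html)
```

The same entry data yields two different pages, which is the property the slides attribute to generating the presentation dynamically rather than storing pre-formatted files.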
Performance - XSL Presentation
  • Creates minimal load on the application
  • Requires file creation permission for the applet
  • Takes load off file system (no need for 9000+ pre-generated files)
  • Gives the user the opportunity to customise the formatting.
User study

Mim Corris & Jane Simpson

  • User testing with Warlpiri children (primary and secondary students), adults and teachers.
  • Purely qualitative observational study of dictionary use. (Doing anything much else would be difficult)
  • Teachers using a domain-specific dictionary extract still found the interface more efficient to use for language tasks.
Initial reactions - enthusiastic
  • Despite teachers’ concerns that the system would be too hard for children, primary students used the software with relative ease.
  • Students were given the opportunity to spend ‘free time’ with Kirrkirr
    • time was spent looking up unfamiliar words from the day before.
Conclusions
  • While we have focused our research on Warlpiri, the system can be easily applied to other languages
  • The key to the effectiveness of the browsing interfaces is that users can customise their functionality, thanks to the flexibility of XML and the Kirrkirr technology
  • Throughout this research, the educational interests of the user have been the highest priority.
  • We hope to better understand the usefulness and practicality of innovative dictionary browsing environments.

Links

  • Kirrkirr homepage: www.sultry.arts.usyd.edu.au/kirrkirr
  • Kevin’s Thesis Homepage: www.sultry.arts.usyd.edu.au/kjansz/thesis