
Mātāpuna Dictionary Database System


  1. Mātāpuna Dictionary Database System: The Open Source, Multi-user, Web-based Dictionary Writing System. Dave Moskovitz, DWS 2004, Brno

  2. Outline Mātāpuna – Dave Moskovitz – www.thinktank.co.nz • Background info • Design criteria • Functions • Database structure • Future development • Lab • Call for collaboration

  3. Background – New Zealand / Aotearoa / Māori • 4 million people; 268,000 km² • 15% Māori; 1 in 4 of those speak Māori • Median age 22; median income NZD 14,000 (compared to 35 and 18,500 for pākehā) • Māori is an official language, in the Polynesian language group • Uses the standard Roman character set with macrons

  4. Background – New Zealand in the Pacific

  5. Background – The Mātāpuna project • First monolingual dictionary of Māori, written from a Māori cultural perspective • Target of 20,000 entries (1 entry = 1 definition) • Designed for language learners with some proficiency • 3+ year project under the auspices of Te Taura Whiri i te Reo Māori / The Māori Language Commission • 4 writers, one editor, one lexicographer, one project manager, admin support, one geek

  6. Background – The Mātāpuna team • Pou Temara, Phil Matthews, Te Waireka Walker, Ruka Broughton, Wiha Te Rakihawea, Hēni Jacobs, Sharon Armstrong • Not in photo: Te Awanuiārangi Black, Te Haumihiata Mason, Dave Moskovitz

  7. Background – Dave • BA (Hons) in Computer Science, University of California, Berkeley • Began PhD in Applied Linguistics – NZ Sign Language phonology • 25 years in IT industry • Background in Application Development, Systems Architecture, System Performance, Internet • 3rd lexicography project, after Dictionary of NZ Sign Language and Oxford NZ Dictionary • Open Source bigot

  8. Background – Software • Free Software – Open Source – GPL • Uses Linux, Apache, mod_perl, Postgres; runs on any old hardware (e.g. Pentium 600) • Browser-based • About 4,000 lines of Perl code • Won Computerworld excellence award for use of IT in Government

  9. Open Source is Good for Lexicography • Free • Market is too small to support proprietary software • Everyone’s needs are unique – and you can modify the source code to suit • Open source programmers are not hard to find • Low risk and futureproof: no vendor lock-in • Everyone helps each other • Software is open, but data is not (necessarily)

  10. Design Criteria • Easy to use by untrained lexicographers • Support workflow and management as well as entry • End-to-end processing • Produce printed output as well as web access • Multi-user • Multilingual interface, easy to add languages • Unicode-based, allows any character set to be used

  11. Sample Output

  12. Functions • Add • Search • Edit • Corpus search • Reports

  13. Functions – Add

  14. Functions – Search

  15. Functions – Edit

  16. Functions – Corpus search

  17. Functions – Reports

  18. Functions – Validation • Field-based, including: orthography; punctuation; blank fields; undefined word / not in defining vocabulary; synonym rules
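The field-based checks on this slide can be sketched roughly as follows. This is only an illustration in Python, not the project's Perl code: the field names (`headword`, `definition`), the rule that a definition ends with a full stop, and the sample data are all assumptions. The orthography check relies on the Māori alphabet (a, e, h, i, k, m, n, ng, o, p, r, t, u, w, wh, plus macron vowels). Synonym rules are left out here.

```python
import re
import unicodedata

# Letters used in Māori orthography; "g" is only valid as part of "ng".
MAORI_LETTERS = set("aehikmnoprtuwgāēīōū")


def check_orthography(text):
    """Return True if text uses only Māori letters, spaces and hyphens."""
    text = unicodedata.normalize("NFC", text).lower()
    if any(ch not in MAORI_LETTERS and ch not in " -" for ch in text):
        return False
    # Reject any "g" that is not preceded by "n".
    return re.search(r"(?<!n)g", text) is None


def validate_entry(entry, defining_vocab):
    """Return a list of problems found in a hypothetical entry dict."""
    problems = []
    # Blank check: both fields are assumed to be mandatory.
    for field in ("headword", "definition"):
        if not entry.get(field, "").strip():
            problems.append(f"{field}: blank")
    headword = entry.get("headword", "").strip()
    definition = unicodedata.normalize("NFC", entry.get("definition", "").strip())
    if headword and not check_orthography(headword):
        problems.append("headword: orthography")
    # Punctuation check (assumed rule): a definition ends with a full stop.
    if definition and not definition.endswith("."):
        problems.append("definition: punctuation")
    # Defining-vocabulary check: every word in the definition must be known.
    for word in re.findall(r"[^\W\d_]+", definition.lower()):
        if word not in defining_vocab:
            problems.append(f"definition: '{word}' not in defining vocabulary")
    return problems


# Illustrative sample data only.
vocab = {"he", "wāhi", "noho"}
print(validate_entry({"headword": "whare", "definition": "He wāhi noho."}, vocab))
print(validate_entry({"headword": "house", "definition": ""}, vocab))
```

Returning a list of problems rather than stopping at the first one lets a writer fix everything in a single pass before submitting the entry.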

  19. Functions – Workflow • Basic workflow: Add → Self check → Editor 1 → Editor 2 • Editor can make minor changes, or send the entry back to the owner • Owner is notified of any changes by email • You can always view the history of an entry
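One way to picture this workflow is as a small state machine. The Python sketch below is an assumption-laden illustration, not the actual implementation: the class and method names are invented, a sent-back entry is assumed to return to the "Self check" stage, and email is stood in for by an `outbox` list.

```python
from dataclasses import dataclass, field

# Stage names as listed on the slide.
STAGES = ["Add", "Self check", "Editor 1", "Editor 2"]


@dataclass
class Entry:
    headword: str
    owner: str
    stage: str = "Add"
    history: list = field(default_factory=list)  # every change, oldest first
    outbox: list = field(default_factory=list)   # stands in for email

    def _record(self, actor, action, note=""):
        self.history.append((actor, action, self.stage, note))
        # The owner is notified of changes made by someone else.
        if actor != self.owner:
            message = f"To {self.owner}: {actor} {action} '{self.headword}' {note}"
            self.outbox.append(message.strip())

    def advance(self, actor):
        """Move the entry to the next stage."""
        i = STAGES.index(self.stage)
        if i == len(STAGES) - 1:
            raise ValueError("entry is already at the final stage")
        self.stage = STAGES[i + 1]
        self._record(actor, "advanced")

    def send_back(self, editor, note):
        """An editor returns the entry to its owner (assumed: to 'Self check')."""
        if not self.stage.startswith("Editor"):
            raise ValueError("only an editor stage can send an entry back")
        self.stage = "Self check"
        self._record(editor, "sent back", note)


# Hypothetical users: "hemi" writes, "rawiri" edits.
entry = Entry("whare", owner="hemi")
entry.advance("hemi")      # Add -> Self check
entry.advance("hemi")      # Self check -> Editor 1
entry.send_back("rawiri", "check macrons")
print(entry.stage, len(entry.history), entry.outbox)
```

Keeping the history as an append-only list is what makes it possible to view every step an entry has been through.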

  20. Functions – Synonym handling • Entries allow for synonym ‘families’ • Master – slave (tuakana – teina) relationship • Masters can’t have masters and slaves can’t have slaves • Slave definitions are printed from the master • All cross-references managed
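The two constraints on synonym families keep every family exactly one level deep. A minimal Python sketch of that rule follows; the class, its method names and the placeholder headwords are assumptions for illustration, and the slide's tuakana (master) and teina (slave) terms are used as identifiers.

```python
class SynonymFamilies:
    """One-level synonym families: each teina points to exactly one tuakana."""

    def __init__(self):
        self.definitions = {}  # headword -> its own definition text
        self.tuakana_of = {}   # teina headword -> tuakana headword

    def add(self, headword, definition=""):
        self.definitions[headword] = definition

    def link(self, teina, tuakana):
        """Make `teina` a slave of `tuakana`, enforcing the two rules."""
        if teina not in self.definitions or tuakana not in self.definitions:
            raise KeyError("both headwords must exist before linking")
        if teina == tuakana:
            raise ValueError("an entry cannot be its own master")
        # Slaves can't have slaves.
        if tuakana in self.tuakana_of:
            raise ValueError(f"{tuakana} is a slave, so it cannot have slaves")
        # Masters can't have masters.
        if teina in self.tuakana_of.values():
            raise ValueError(f"{teina} is a master, so it cannot have a master")
        self.tuakana_of[teina] = tuakana

    def printed_definition(self, headword):
        """A slave's definition is printed from its master."""
        master = self.tuakana_of.get(headword, headword)
        return self.definitions[master]


# Placeholder headwords, not real dictionary data.
families = SynonymFamilies()
families.add("alpha", "Definition held by the master entry.")
families.add("beta")
families.add("gamma")
families.link("beta", "alpha")
print(families.printed_definition("beta"))
```

Because no chain can be longer than one link, finding the definition to print never needs more than a single lookup.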

  21. Functions – Multilingual interface • 186 text snippets • Can add additional languages
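A snippet-based interface of this kind can be sketched as a lookup table keyed by snippet and language. The Python below is a guess at the general shape only: the snippet keys, the language codes, the fallback to a default language and the helper names are all assumptions, and only two of the snippets are mocked up.

```python
# Hypothetical snippet table: snippet key -> {language code: text}.
SNIPPETS = {
    "search": {"en": "Search", "mi": "Rapu"},
    "reports": {"en": "Reports"},  # no "mi" text yet
}


def snippet(key, lang, default_lang="en"):
    """Return the text for `key` in `lang`, falling back to the default language."""
    texts = SNIPPETS[key]
    return texts.get(lang, texts[default_lang])


def add_language(lang, translations):
    """Add or update texts for one language from a {key: text} mapping."""
    for key, text in translations.items():
        SNIPPETS[key][lang] = text


def missing(lang):
    """List snippet keys that still lack a text in `lang`."""
    return sorted(key for key, texts in SNIPPETS.items() if lang not in texts)


print(snippet("search", "mi"))   # text exists in "mi"
print(snippet("reports", "mi"))  # falls back to "en"
print(missing("mi"))
```

With a fallback in place, a new language can be added gradually, and a report like `missing()` shows which snippets remain untranslated.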

  22. Functions – Multilingual interface

  23. Database Structure • wordclass • category • examplesource • headword • qastatus • hwarchive • matapunauser • activityjournal

  24. Future development • Multiple citations • Bilingual / multilingual • More corpus material (and better corpus performance) • Advanced search • Better user administration • XML / SGML export • More languages • … what do you want or need?

  25. Lab – Words from the Olympics • 15 users, 15 categories of words • Rawiri is the editor • Practise entering definitions, linking synonyms, playing with major and minor senses, searching, breaking validation rules … • Be nice to Rawiri, he can send work back to you to get fixed

  26. Call for collaboration • This is Free Software • Use it and contribute enhancements • It’s robust and capable of producing a major lexicographical work • We are interested in your feedback and participation

  27. Call for collaboration • Contact: Dave Moskovitz, Thinktank Consulting Limited, PO Box 15-212, Wellington, New Zealand; dave@thinktank.co.nz; +64 27 220 2202
