Dictionaries for the Human Language Technologies virtual network



  1. Dictionaries for the Human Language Technologies virtual network Dr Mariëtta Alberts Focus Area Manager Standardisation and Terminology Development Pan South African Language Board (PanSALB)

  2. Outline of presentation • Introduction • Reviewing Human Language Technologies • Scope of HLT • Potential of HLT • Multilingualism and HLT • The South African HLT initiative • History of South African HLT project • National Facility • South African HLT model • Terminology Training initiative of PanSALB • Conclusion Afrilex, 13-15 July 2005, UFS, Bloemfontein

  3. 1. Introduction • South Africa is on the verge of establishing a Human Language Technology (HLT) Centre • The Centre will probably be managed as a national facility • It will provide an appropriate and sustainable virtual (or otherwise) infrastructure conducive to the development and effective management of reusable electronic text and speech resources

  4. 2. Reviewing Human Language Technologies (HLT) • Human Language Technologies are enabling technologies • They enable human beings to interact with computers by using human language (text and speech)

  5. Human Language Technologies include: • high-level parsing and machine translation • applications in education and training • public service (e-governance and e-commerce applications) • voice-operated educational systems • voice-operated commercial systems that can be used by illiterate people

  6. Human Language Technologies: • Provide interfaces that enable spoken human-machine interaction (telephone-based information systems, automated booking systems) • Provide linguistic assistance (spelling and grammar checking) • Provide access to multilingual polythematic information • Empower people to actively participate in the Information Society

  7. 2.1 The scope of HLT: • Text-based language processing: text analysis (e.g. spellcheckers, term extraction, search engines); summarisation; text translation • Speech processing: speech recognition (e.g. desktop or telephony environment); speech synthesis

  8. 2.2 Potential of HLT: • Access for all to the information era • Enhanced mother-tongue or first language teaching • Affordable multilingual documents • Improved functionality and quality of languages • Contact with the developing-world context

  9. Potential of HLT... • Availability of multilingual words and polythematic terminology: indicator of development • Specialised communication has a central axle or hub in terminology • Standardised terminology contributes to quality of translations, interpreting and communication • Streamlined translation and interpreting services provide competitive advantages

  10. 2.3 Multilingualism and HLT: South African situation • South Africa has a high illiteracy rate • Only 22% of the citizens can function through the medium of English • A small percentage of South Africans have access to computers - fewer still are IT literate • The divide is even greater between rural and urban areas

  11. Effective e-government is necessary (e.g. birth certificates, identity documents, marriage and death certificates, telephone, electricity and water bills, traffic fines, etc.) • All citizens should have access to information in the languages they understand best (e.g. 11 official languages; South African Sign Language; Khoe and San languages) • Government should communicate with citizens in their own languages regarding key services (e.g. health; safety and security; education; postal services; justice (courts); banks (economy); media (electronic and print); labour (jobs); social welfare (pensions); etc.)

  12. Language Policy and Legislation • Multilingual policy since 1994 - South African Constitution of 1996 (Act 108 of 1996) • Mechanisms for protecting and promoting linguistic rights were put in place • Section 6 of the South African Constitution specifically mentions the principles of language policy, which take into consideration the multilingual nature of South African society

  13. Establishment of PanSALB • The Pan South African Language Board (PanSALB) (Act 59 of 1995) was established: • to develop, promote and ensure use of South Africa’s eleven official languages, South African Sign Language (SASL) and the Khoe and San languages, and • to promote respect for other languages used in the country (e.g. heritage languages: Dutch, French, German, Hindi, KiSwahili, Portuguese, Tamil, etc.)

  14. PanSALB oversees the implementation of the National Language Policy Framework (NLPF) to give all citizens access to services through: • 9 Provincial Language Committees (PLCs): assist Provinces with language policy formulation and implementation • 13 National Language Bodies (NLBs): standardisation (e.g. spelling and orthography rules); terminology development; dictionary needs (general vocabulary); literacy and media; research and education • 11 National Lexicography Units (NLUs): compilation of comprehensive monolingual and other types of dictionaries

  15. 3. The South African HLT initiative 3.1 History • Lexinet research programme of HSRC (1988) (Wordnet, Termnet, Docnet, Transnet, Ailang, etc.) • PanSALB and DACST (now DAC) initiated the HLT project in 1999 • The former Minister of DACST appointed a panel of experts to investigate the establishment of an HLT virtual network • The HLT task team concluded that an HLT National Facility should be established • The developers of the envisaged HLT National Facility should ensure that HLT advances multilingualism in different respects, i.e.:

  16. Key government documents in the languages the citizens can understand best • Electronic systems to connect lexicographers and terminologists with other language practitioners • Electronic systems to disseminate lexicographical and terminological data • Electronic systems to connect translators and other language workers with word and term banks • Central government assistance to meet communication needs of all its citizens • Local and provincial governments to serve as focal points of information dissemination (e.g. multipurpose community centres)

  17. 3. The South African HLT initiative 3.2 National Facility • Purpose of HLT project: • to fast track the use and development of indigenous languages • to promote the SA government’s policy of multilingualism • to facilitate better service delivery for citizens to access or supply information in any of the official languages

  18. Basic premises for the development of HLT: • development and effective management of reusable text and speech resources in all official languages of SA; • capacity building with respect to research and development in the field of HLT; and • stimulation of an HLT industry that will provide language-based electronic products which, in turn, will be applicable in all relevant sectors, especially in the government sector.

  19. 3.3 SA Human Language Technologies Model • The South African HLT model is based on a model being implemented by the European Union (EU) • The EU model is effectively implemented in the EU Framework Programmes (FP 3/4/5/6) • The South African HLT model will grow exponentially as expertise and resources are developed

  21. 3.3.1 Aims of envisaged HLT virtual network • An e-government process needs to provide citizens with: • Access to online facilities • Required and necessary service delivery • Infrastructure to make it work • Two basic prerequisites are: • A technical infrastructure (IT access; proven and multipurpose IT systems; online language services) • Human capital (capacity building, e.g. trained and reskilled language practitioners)

  22. 3.3.2 Identified needs: • Low general awareness level regarding HLT benefits • Interdisciplinary curricula at tertiary level to advance HLT development • Systematic presentation of short dedicated HLT courses • Theoretical and practical training in the fields of lexicography and terminology • Job creation should be carefully planned • Upgrade and maintain a knowledge base on HLT

  23. 3.3.3 Proposed three-step strategy for development of HLT model: • Step 1: Applied research and capacity building, production of language resources, development of enabling technologies and of an HLT industry. • Step 2: Development of a legal framework to ensure systematic acquisition, administration and conservation of electronic language resources. • Step 3: Development of an infrastructure to manage the implementation of the proposed HLT model.

  25. 3.3.4 Role players • Government services: national, provincial and local (e.g. e-government, e-learning, e-commerce, etc.) • Parastatal institutions (e.g. PanSALB) • Private sector • Academia (tertiary education) • Education (primary and secondary education)

  26. 3.3.5 Progress • Parsing (Zulu and other African languages) by Special Interest Group (SiG), African Languages Association of Southern Africa (ALASA) • Speech recognition (Tourism: pilot booking service) • Amalgamated Banks of South Africa (ABSA) multilingual pilot project: ATM screen prompts and telephone banking prompts in African languages (Zulu, Xhosa and South Sotho)

  27. Progress... • TISSA (Telephone Interpreting Service of South Africa) (all ports of entry; health services; police charge offices; etc.) • Spellcheckers: Afrikaans developed by North-West University; African languages by University of Pretoria/North-West University; future development to be a combined effort • Microsoft human/machine interface: combined effort on terminology development • Afrilingo: e-learning tool for language acquisition (11 official SA languages)

  28. Progress ... • TshwaneLex: dedicated computer software program for data capturing (lexicography) • 11 National Lexicography Units (NLUs) of PanSALB: Monolingual dictionaries for each of the 11 official South African languages • NLUs: Data collection and building of corpora • NLUs: on-line dictionaries (e.g. Afrikaans, Northern Sotho (Sesotho sa Leboa)) • TshwaneTerm: dedicated computer software program for data capturing (terminology)??

  29. Progress ... • National term bank (multilingual, polythematic): Terminology Coordination Section (TCS) of the National Language Service (NLS), Department of Arts and Culture (DAC) • Latin terminology: interactive multilingual e-learning project (PanSALB, CLTAL, Trydian Interactive) • Mathematics on-line dictionary project: South African Multilingual Mathematical Lexicon (SAMML)

  30. Lexicographical and Terminological information available on HLT virtual network • SA Government has approved the development of a human language technology (HLT) virtual network • All lexicography and terminology endeavours to be part of HLT virtual network • For multilingual words and terms to be available on HLT virtual network to end-users (subject specialists, students, language practitioners, general public) - dictionaries are needed!!!

  31. 4. New terminology training initiative from PanSALB: • Members of TCs, NLBs: Guidelines to verify and authenticate terms • Skills development: Language practitioners: terminologists, lexicographers (e.g. NLUs), translators, interpreters, linguists, teachers, journalists, language students, etc. • Skills development: subject specialists • Reskilling: Unemployed language workers

  33. [Diagram: Human Language Technology Virtual Network. Labels shown: NLUs, NLBs, PLCs, TCS, NLS, LUs, University A, School for Languages, Lexicography, Terminology, Statistics, Zoology, Psychology]

  34. 5. Conclusion: • Development of skills • Enhancement of South African languages • Development of languages into functional languages • Dissemination of multilingual polythematic (speech and text) information within the South African community • Better communication among all citizens in different spheres of life • Improvement of computer literacy

  35. “Utilising technology for the development of the South African languages and developing these languages for use with Human Language Technology applications such as spellcheckers, translation memories and speech-recognition systems will enhance the status of the indigenous languages and will result in increased job opportunities in the language field.” Dr Ben Ngubane (former Minister of Arts, Culture, Science and Technology), 2003