A comprehensive overview of the creation of Tamil WordNet for enhancing machine translation capabilities, including details on resources, funding, software used, statistics, modules, and project timeline.
29 April 2013 DRAVIDIAN WORDNET
Tamil Thesaurus
• Preliminary work on lexical semantics.
• Monumental work on Tamil Thesaurus.
• Ontological classification of Tamil vocabulary.
• Rajendran, S. (2001). tamizhc coRkaLanjciyam (in Tamil). Tamil University Publication.
Domains in Tamil Thesaurus
• Tamil vocabulary is classified into four major domains:
  • Entities
  • Abstracts
  • Events
  • Relationals
Lexical Hierarchy of the Domain 'Construction'
parumaippeyarkaL 'concrete nouns'
  aHRinaippeyarkaL 'irrational nouns'
    uyirillaatavai 'non-living beings'
      uruvaakkiya maRRum patananjceyta poruTkaL 'manufactured and processed items'
        kaTTappaTTavai 'constructed'
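A hierarchy like this can be stored as a child-to-parent map and walked upward to recover the full hypernym chain of a term. The sketch below is only an illustration: it assumes the slide lists the levels from most general to most specific, and the names `PARENT` and `hypernym_chain` are hypothetical, not taken from the Tamil WordNet software.

```python
# Hypothetical child -> parent map for the 'Construction' hierarchy.
# Assumption: the slide lists levels from most general to most specific.
PARENT = {
    "kaTTappaTTavai": "uruvaakkiya maRRum patananjceyta poruTkaL",
    "uruvaakkiya maRRum patananjceyta poruTkaL": "uyirillaatavai",
    "uyirillaatavai": "aHRinaippeyarkaL",
    "aHRinaippeyarkaL": "parumaippeyarkaL",
}


def hypernym_chain(term):
    """Return the list [term, parent, grandparent, ...] up to the root."""
    chain = [term]
    while chain[-1] in PARENT:
        chain.append(PARENT[chain[-1]])
    return chain


if __name__ == "__main__":
    # Print the path from 'constructed' up to 'concrete nouns'.
    for level in hypernym_chain("kaTTappaTTavai"):
        print(level)
```

Storing only the parent link keeps each entry small; the full path is rebuilt on demand.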
Nouns
Relations and examples:
• Synonymy: viiTu 'house' – illam 'house'
• Hypernymy-Hyponymy: paLLi 'school' – kalviccaalai 'educational institution'
• Hyponymy-Hypernymy: kalluuri 'college' – aracukkalluuri 'govt college'
• Holonymy-Meronymy: ndaaRkaali 'chair' – kaal 'leg'
• Meronymy-Holonymy: cakkaram 'wheel' – vaNTi 'cart'
• Related Verb: paTittal 'reading' – paTi 'read'
• Coordinate terms: kooyil 'temple' – macuuti 'mosque'
Verbs
Relations and examples:
• Synonym: paTi 'read' – payilu 'read'
• Hypernymy: cuvai 'taste' – uNar
• Troponymy: keeL 'ask' – kenjcu 'plead'
• Nominal: paruku 'drink' – parukutal 'drinking'
• Related Noun: kaNTupiTi 'discover' – kaNTupiTippu 'discovery'
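The noun and verb relations listed on these slides can be held as labelled edges between words, with the inverse edge added automatically for paired relations such as holonymy/meronymy. This is a minimal sketch under stated assumptions: the class `RelationStore`, the `INVERSE` table, and the reading of each example pair (for instance, that kalviccaalai is the hypernym of paLLi) are illustrative choices, not the schema of the actual Tamil WordNet database.

```python
from collections import defaultdict

# Assumed inverse pairs; relations without an entry are stored one way only.
INVERSE = {
    "synonym": "synonym",
    "hypernym": "hyponym",
    "hyponym": "hypernym",
    "holonym": "meronym",
    "meronym": "holonym",
}


class RelationStore:
    """Hypothetical in-memory store of (word, relation) -> related words."""

    def __init__(self):
        self._edges = defaultdict(set)

    def add(self, source, relation, target):
        """Record that `target` is the `relation` of `source`."""
        self._edges[(source, relation)].add(target)
        inverse = INVERSE.get(relation)
        if inverse is not None:
            self._edges[(target, inverse)].add(source)

    def related(self, word, relation):
        """Return the sorted words linked to `word` by `relation`."""
        return sorted(self._edges.get((word, relation), set()))


store = RelationStore()
store.add("viiTu", "synonym", "illam")          # 'house' / 'house'
store.add("paLLi", "hypernym", "kalviccaalai")  # 'school' -> 'educational institution'
store.add("ndaaRkaali", "meronym", "kaal")      # 'chair' has part 'leg'
store.add("keeL", "troponym", "kenjcu")         # 'ask' -> 'plead' (one way only)

if __name__ == "__main__":
    print(store.related("kaal", "holonym"))
    print(store.related("kalviccaalai", "hyponym"))
```

Adding the inverse at insertion time means a lookup from either end of a pair needs no extra search.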
Tamil WordNet
• Objective: To build a WordNet for Tamil to enhance machine translation
• Resources: Tamil Thesaurus, Technical Glossaries (Tamil University Publications), Princeton English WordNet
• Funding Agency: Tamil Software Development Fund, Tamil Virtual University – 4 lacs
• Time Frame: 18 months
Details
• Software used
  • Front-end – Java
  • Back-end – MySQL database
• Project Deliverables
  • 50k root words
  • Relationships coded
  • Stand-alone and web-based interface
  • Embedded morphological analyser
Statistics
• Total Words: 50497
• Unique Senses: 41013
• Nouns: 46710
• Verbs: 2881
• Adjectives: 416
• Adverbs: 490
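The four part-of-speech counts add up exactly to the reported total (46710 + 2881 + 416 + 490 = 50497). The short check below reproduces that sum and derives each category's share of the vocabulary; the variable names are illustrative, and the figures are those on the slide.

```python
# Figures copied from the Statistics slide.
pos_counts = {"nouns": 46710, "verbs": 2881, "adjectives": 416, "adverbs": 490}
total_words = 50497
unique_senses = 41013

# The per-category counts should account for every word in the total.
assert sum(pos_counts.values()) == total_words

# Share of each part of speech, as a percentage rounded to one decimal place.
shares = {pos: round(100 * count / total_words, 1) for pos, count in pos_counts.items()}

# Average number of word entries per unique sense.
words_per_sense = total_words / unique_senses

if __name__ == "__main__":
    for pos, share in shares.items():
        print(f"{pos}: {share}%")
    print(f"words per sense: {words_per_sense:.2f}")
```

Nouns make up roughly 92.5% of the entries, so verbs, adjectives and adverbs together account for well under a tenth of the lexicon.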
Total Words: 50497
Unique Senses: 41013
Project Completed (2004)
http://www.nrcfosshelpline.in/code/wiki/TamilWordnet
Standalone version – Tamil WordNet (Snapshot)
Web-version – Tamil WordNet (Snapshot)
First Effort on Dravidian Languages
• National Workshop on WordNet for Dravidian Languages
• 2-3 June 2003
• Organized by AU-KBC Research Centre, Chennai; Central Institute of Indian Languages, Mysore; and Tamil University.
• Hands-on experience on a specified domain: construction
• Report available on the Global WordNet website
MHRD Project
• Creation of Machine Translation tools and resources for English to Dravidian Languages: Pilot Study
• Aim: to develop a Machine Translation (MT) system and the linguistic resources it needs for English-Dravidian language pairs (Tamil, Malayalam, Telugu and Kannada).
• This would facilitate the creation of rich educational content in Indian languages.
• All the tools and the translation system are to be based on machine learning methodologies, so that computer science graduates and other non-linguists can immediately participate in the national mission on literacy by contributing additional tools for language translation.
Modules
• Module 1: Machine Translation
  • Aims at developing teaching material corresponding to the tools developed, so that it can be delivered as part of the undergraduate computer science and engineering curriculum on data mining/machine learning.
  • This will ensure the critical mass of manpower required to sustain the translation effort needed for the national mission on education.
• Module 2: Training
  • Aims at training 500 faculty members selected from across the country in machine translation methodologies using machine learning techniques.
• Module 3: Dravidian WordNet
  • Aims at developing the Dravidian WordNet required for translation.
Total Budget
• IIT Bombay – 15 lacs
• Amrita University – 40 lacs
• Tamil University – 15 lacs
• University of Hyderabad – 15 lacs
• Dravidian University – 15 lacs
Time Frame
• 12 months
• March 30, 2009 – March 29, 2010
Work done
• Part of a one-year pilot project involving Tamil, Telugu, Malayalam and Kannada
• Funding Agency: Ministry of HRD
• Duration: 18 months (July 2009 – Dec 2010)
• Deliverable: 13k synsets
• 7k synsets linked to IndoWordNet, available at http://www.cfilt.iitb.ac.in/wordnet/webhwn/wn.php
Statistics on Dravidian WordNet
Publications
• 'Tamil WordNet', Proceedings of the Fifth Global WordNet Conference, IIT-Bombay, 31 Jan-4 Feb 2010 (S. Rajendran)
• 'Building a WordNet for Dravidian Languages', Proceedings of the Fifth Global WordNet Conference, IIT-Bombay, 31 Jan-4 Feb 2010 (S. Rajendran, S. Gopakumar, V. Dhanalakshmi)
• 'Representation of Kinship in WordNet', Proceedings of the 9th International Tamil Internet Conference, Coimbatore, 23-27 June 2010 (S. Arulmozi)
• 'Polysemy in Tamil and other Indian Languages', Proceedings of the Fifth Global WordNet Conference, IIT-Bombay, 31 Jan-4 Feb 2010 (S. Arulmozi & Panchanan Mohanty)
• 'Telugu WordNet', Proceedings of the Fifth Global WordNet Conference, IIT-Bombay, 31 Jan-4 Feb 2010 (S. Arulmozi)
First IndoWordNet Workshop
• Amrita University
• 11-14 June 2009
• The necessity of developing linked WordNets for the different languages of India was stressed
• Challenges such as language divergence, lexical semantics, and embedding WordNet in MT and cross-lingual search applications can be addressed
• Participation from groups: Hindi, Marathi, Sanskrit, Nepali, Assamese, Bodo, Manipuri, Konkani, Kashmiri, Tamil, Telugu, Malayalam, Kannada
• Proposal on Indhradhanush
Dravidian WordNet
• Present Project
• Funded by DIT.
Links
• Tamil WordNet – Open Source: http://www.nrcfosshelpline.in/code/wiki/TamilWordnet
• VerbNet (English): http://verbs.colorado.edu/~mpalmer/projects/verbnet.html
• Princeton English WordNet: http://wordnet.princeton.edu/
• Global WordNet Association: http://www.globalwordnet.org/
• WordNets in the World: http://www.globalwordnet.org/gwa/wordnet_table.htm
• WordNet Bibliography: http://lit.csci.unt.edu/~wordnet/
• IndoWordNet: http://www.cfilt.iitb.ac.in/wordnet/webhwn/wn.php
Thank you!