Centre for Excellence in Computational Engineering and Networking (CEN), Amrita Vishwa Vidyapeetham

A Novel Approach to Morphological Analysis for Tamil Language. Presented by M. Anand Kumar and V. Dhanalakshmi, CEN, Amrita. Guided by Dr. K. P. Soman, Head, CEN, Amrita University.

Presentation Transcript


  1. Centre for Excellence in Computational Engineering and Networking (CEN), Amrita Vishwa Vidyapeetham

  2. A Novel Approach to Morphological Analysis for Tamil Language • Presented by M. Anand Kumar and V. Dhanalakshmi, CEN, Amrita. • Guided by Dr. K. P. Soman, Head, CEN, Amrita University, and Dr. S. Rajendran, Head, Dept. of Linguistics, Tamil University.

  3. Overview • Introduction • Tamil Morphological Analyzer • Machine Learning (SVM) • Formulation (data creation) • GUI • Conclusion

  4. Introduction • The grammar of any language can be broadly divided into morphology and syntax. • Morphology deals with words and their construction. • Syntax deals with how words are put together in order to make meaningful sentences.

  5. Introduction cont... • Morphological analysis is the process of segmenting words into morphemes and analyzing how the word is formed. • Morphemes are the smallest meaning-bearing units of a language.

  6. Introduction cont... • A morphological analyzer is the basic tool for building word-processing tools such as spell checkers and grammar checkers. • It is also the first step towards NLP systems that do deep linguistic processing, such as natural language interfaces, machine translation, search engines, text summarization, information extraction and story understanding systems.

  7. Tamil Morphological Analyzer • Tamil is a morphologically rich, agglutinative language. • Each root word can be affixed with several morphemes. • Separate analyzers are built for nouns and verbs.

  8. Tamil Noun • Noun Root + suffixes (Case Marker, Euphonic Increment, Plural Marker)

  10. Tamil Noun Examples • வண்டுகளுக்கு = வண்டு+கள்+உக்கு • ஆற்றில் = ஆறு+ற்+இல் • மரத்தை = மரம்+த்+ஐ

  11. Tamil Verb • Verb Root + Tense Marker + PNG (person-number-gender) Marker

  13. Tamil Verb Examples • படித்தான் = படி+த்த்+ஆன் • போனாள் = போ+இன்+ஆள் • வருகின்றேன் = வா+கின்ற்+ஏன்

  14. Machine Learning • Machine learning is a branch of Artificial Intelligence concerned with the design of algorithms that learn from examples. • A major focus of machine learning research is learning automatically to recognize complex patterns and make intelligent decisions based on data. • In short, computers are made to learn from data.

  15. Machine Learning • In a machine learning approach, all rules, including complex spelling rules, are handled through classification. • Machine learning approaches do not require any hand-coded morphological rules. • Machine learning approaches can be applied directly to natural language processing tasks.

  16. Machine Learning • For any machine learning approach, data creation plays the key role. • The approach needs only corpora annotated with linguistic information. • The morphological or linguistic rules are extracted automatically from the annotated corpora.

  17. Support Vector Machine • Support vector approaches have been around since the mid-1990s. • The Support Vector Machine is an approach to supervised pattern classification that has been applied successfully to a wide range of classification problems. • Here the morphological analysis problem is converted into a classification problem.

  18. SVMTool • SVMTool is an open-source generator of sequential taggers based on Support Vector Machines. • SVMTool was developed mainly for POS tagging; here it is used for the classification tasks of the morphological analyzer. • The SVMTool software package consists of three main components: the model learner (SVMTlearn), the tagger (SVMTagger) and the evaluator (SVMTeval).

  19. Drawbacks in Rule-Based Systems • A rule-based approach depends on a set of rules and dictionaries, including morpheme dictionaries. • Each rule works on the output of the previous rule, so if one rule fails, the failure affects every rule that follows.

  20. Formulation: Data Creation • Classification for verbs and nouns. • 32 paradigms for verbs • 25 paradigms for nouns • 563 inflections for verbs • 313 inflections for nouns

  21. Formulation: Data Creation • Classification for verbs and nouns. • Verb: 32 paradigms, 563 inflections • Noun: 25 paradigms, 313 inflections

  22. Models • Two models are used. • Model-I: segmentation of morphemes • Model-II: grammatical tagging of morphemes, which also handles the morphotactics

  23. Data Creation for Model-I • Romanization • Grapheme Segmentation • Syllable Splitting • Consonant-Vowel Representation • Segmentation
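
The consonant-vowel representation step can be pictured with a minimal Python sketch. The transcript does not give the actual romanization table or the grapheme rules, so the vowel set below and the per-character labelling are assumptions made only for illustration (a digraph such as "th" is treated here as two consonant characters); `cv_pattern` is a hypothetical helper name.

```python
# Assumed vowel inventory for the romanized text; uppercase letters are
# taken to stand for long vowels, as in "padiththAn". This is a guess,
# not the romanization scheme used by the authors.
ASSUMED_VOWELS = frozenset("aeiouAEIOU")


def cv_pattern(romanized: str) -> str:
    """Label every character of a romanized word as C (consonant) or V (vowel)."""
    return "".join("V" if ch in ASSUMED_VOWELS else "C" for ch in romanized)


if __name__ == "__main__":
    word = "padiththAn"
    print(word, cv_pattern(word))
```

A grapheme-aware version would first group characters such as "th" into one unit before labelling; that grouping is left out because the grapheme inventory is not shown in the slides.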

  24. Input Data Formulation

  25. Sample Data • Model-I Training Data • Input: characters with their consonant-vowel representation • Output: characters with morpheme boundaries
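
One way to read this input/output description is as per-character sequence labelling. The sketch below turns a segmented word into rows of (character, C/V class, boundary label). The row layout, the label names "B" (a morpheme boundary follows) and "O" (no boundary), and the vowel set are all assumptions; the real training file format is not shown in the transcript.

```python
ASSUMED_VOWELS = frozenset("aeiouAEIOU")  # hypothetical vowel inventory


def boundary_rows(segmented: str, marker: str = "*"):
    """Convert 'padi*thth*An*' into per-character training rows.

    Each row is (character, 'C' or 'V', label), where the label is 'B'
    if a morpheme boundary follows the character and 'O' otherwise.
    """
    rows = []
    for ch in segmented:
        if ch == marker:
            if rows:  # mark the previous character as morpheme-final
                char, cv, _ = rows[-1]
                rows[-1] = (char, cv, "B")
        else:
            cv = "V" if ch in ASSUMED_VOWELS else "C"
            rows.append((ch, cv, "O"))
    return rows


if __name__ == "__main__":
    for row in boundary_rows("padi*thth*An*"):
        print(*row)
```

Framed this way, segmentation becomes a classification decision for each character, which is the kind of task a sequential tagger can learn from annotated examples.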

  26. Sample Data • Model-I (Segmentation of Morphemes) Training Data

  27. Model-I • Input: padiththAn. • Output: padi*thth*An*. • Model-I segments the given word into its morphemes.
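
The Model-I output shown above can be split back into a list of morphemes with a few lines of Python. This is only a sketch: `split_morphemes` is a hypothetical helper, and treating the final period as sentence punctuation rather than as part of the data is an assumption.

```python
def split_morphemes(segmented: str, marker: str = "*"):
    """Split a Model-I style output such as 'padi*thth*An*.' into morphemes."""
    # Assumption: a trailing period is punctuation, not part of the word.
    cleaned = segmented.strip().rstrip(".")
    # Drop the empty string left by the final boundary marker.
    return [piece for piece in cleaned.split(marker) if piece]


if __name__ == "__main__":
    morphemes = split_morphemes("padi*thth*An*.")
    print(morphemes)
    # Concatenating the morphemes gives back the unsegmented input word.
    print("".join(morphemes))
```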

  28. Model-II (Sample Data) • Model-II Training Data • Input: morphemes • Output: grammatical category of each morpheme, in morphotactic order

  29. Example • Model-II (Morpheme Tagging) Training Data

  30. Model-II • Input: padi*thth*An*. • Output: padi <Verb> thth <Past> An <3sm> • Model-II identifies the grammatical category of each morpheme, in morphotactic order.
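
The Model-II output format above pairs each morpheme with a tag in angle brackets. The Python sketch below formats and parses that notation. The tags are supplied by hand purely to show the notation (in the system described here they come from the trained tagger), and both function names are hypothetical.

```python
import re


def format_analysis(morphemes, tags):
    """Render morphemes and tags in the 'morpheme <Tag>' style of the slide."""
    if len(morphemes) != len(tags):
        raise ValueError("each morpheme needs exactly one tag")
    return " ".join(f"{m} <{t}>" for m, t in zip(morphemes, tags))


def parse_analysis(line):
    """Recover (morpheme, tag) pairs from a 'morpheme <Tag>' string."""
    return re.findall(r"(\S+)\s+<([^>]+)>", line)


if __name__ == "__main__":
    line = format_analysis(["padi", "thth", "An"], ["Verb", "Past", "3sm"])
    print(line)
    print(parse_analysis(line))
```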

  31. Implementation • Morphological analysis is redefined as a classification task. • Generally there are three steps in a morphological analyzer: pre-processing, segmentation of morphemes, and identification of morphemes.

  32. Implementation

  34. Conclusion • This novel, machine-learning-based approach to morphological analysis gives better results. • The corpus created for this purpose can also be used to develop a Tamil spell checker and other text-processing tools. • We are currently implementing the same methodology for other Dravidian languages.

  36. Thank you
