
A Simple English-to-Punjabi Translation System

By: Shailendra Singh





Presentation Transcript


  1. A Simple English-to-Punjabi Translation System By: Shailendra Singh

  2. Introduction • The Internet has increased multilingualism and the size of the language industry • Internet users need information in a language they fully understand • Machine Translation (MT) systems are needed to translate text from a source language into a target language • Current English-to-Punjabi translation systems are mostly based on word-for-word translation only • The focus here is a simple machine translation system from English to Punjabi

  3. Introduction • English follows Subject-Verb-Object (SVO) word order • Punjabi follows Subject-Object-Verb (SOV) word order
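The word-order difference can be sketched as a single reordering step. This is a minimal Python sketch assuming the clause has already been split into subject, verb and object; the `Clause` type and the `svo_to_sov` function are hypothetical names, and only the order of the constituents changes (nothing is translated).

```python
from typing import NamedTuple


class Clause(NamedTuple):
    """A clause already split into its three main constituents (assumed input)."""
    subject: str
    verb: str
    obj: str


def svo_to_sov(clause: Clause) -> list[str]:
    """Reorder an English SVO clause into Punjabi-style SOV order.

    Only the constituent order changes; the words are not translated.
    """
    return [clause.subject, clause.obj, clause.verb]


english = Clause(subject="I", verb="eat", obj="vegetables")
print(" ".join(english))              # I eat vegetables
print(" ".join(svo_to_sov(english)))  # I vegetables eat
```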

  4. Literature Review • Many MT strategies have been applied over time • They range from the direct approach to more recent ones such as example-based machine translation (EBMT)

  5. Direct Approach • The most primitive approach • Translation is word-for-word or phrase-for-phrase • Needs a very large bilingual dictionary • Very little language analysis is involved, because translation relies mostly on the dictionary • In short, the results are often inaccurate and contain many errors • Example system: SYSTRAN
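The direct approach reduces to a dictionary lookup per token, which is why word order errors survive. A minimal sketch, assuming a toy dictionary (hypothetical; its romanized Punjabi forms are taken from the presentation's own example sentence). Unknown words are passed through in angle brackets so the gaps stay visible.

```python
# Toy bilingual dictionary (hypothetical; romanized Punjabi forms
# taken from the presentation's example sentence).
DICTIONARY = {
    "i": "Meh",
    "eat": "khaan",
    "vegetables": "savaji",
}


def direct_translate(sentence: str) -> str:
    """Word-for-word translation: look up each token, keep the source order."""
    output = []
    for token in sentence.lower().split():
        # Unknown words are marked rather than silently dropped.
        output.append(DICTIONARY.get(token, f"<{token}>"))
    return " ".join(output)


print(direct_translate("I eat vegetables"))
# Meh khaan savaji  (English SVO order is kept, so the verb is misplaced)
```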

  6. Rule Based • Uses many different types of rules, for example syntactic rules, lexical rules, lexical transfer rules, rules for syntactic generation and rules for morphology • Analysis starts by building a morphological tree, which is transformed into a syntactic tree and finally into a semantic tree (Hutchins 1994) • The crucial step is the transformation (transfer) from the source language to the target language • All the rules refer to the particular grammars of the languages involved in the translation • Example systems: Ariane, SUSY
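A single structural transfer rule can be illustrated on a syntax tree. This sketch covers only the transfer step at the syntactic level (no morphological or semantic trees), and the rule "VP -> V NP becomes VP -> NP V" and the tuple-based tree encoding are assumptions chosen to match the SVO/SOV contrast described earlier, not rules from any named system.

```python
from typing import Union

# A parse tree node is (label, child, child, ...); leaves are plain strings.
Tree = Union[str, tuple]


def transfer(tree: Tree) -> Tree:
    """Apply one hypothetical transfer rule bottom-up:
    VP -> V NP   becomes   VP -> NP V   (English SVO to Punjabi SOV).
    """
    if isinstance(tree, str):
        return tree
    label, *children = tree
    children = [transfer(child) for child in children]
    if (label == "VP" and len(children) == 2
            and isinstance(children[0], tuple) and children[0][0] == "V"):
        children = [children[1], children[0]]
    return (label, *children)


def leaves(tree: Tree) -> list[str]:
    """Read the words back off the tree, left to right."""
    if isinstance(tree, str):
        return [tree]
    return [word for child in tree[1:] for word in leaves(child)]


source = ("S", ("NP", "I"), ("VP", ("V", "eat"), ("NP", "vegetables")))
print(" ".join(leaves(transfer(source))))  # I vegetables eat
```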

  7. Rule Based • Advantages: Deep analysis of the translation process • Disadvantages: 1) Requires extensive linguistic knowledge. 2) It is impossible to write rules that cover an entire language. 3) Transformation rules are always specific to a single language pair, so the system is difficult to oversee. 4) Inconsistencies appear as the number of rules grows, and development is costly

  8. Knowledge Based • Mainly describes a rule-based system displaying extensive semantic and pragmatic knowledge of a domain, including an ability to reason, to some limited extent, about concepts in the domain (Arnold et al. 1994, p. 190) • Most features are the same as in rule-based systems • Distinctive in that it is focused on a particular domain, which minimizes ambiguity • Example system: KANT, which translates electronic manuals

  9. Knowledge Based • Advantages: Avoids ambiguity and therefore gives high-quality translations • Disadvantages: 1) Focused on a single domain, which limits its capability 2) The chosen domain must be modelled with enough knowledge to support the translation
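Restricting a system to one domain is what lets it resolve ambiguous words. The sketch below is a hypothetical illustration: the `SENSES` table, its domains and its sense labels are invented, and English sense glosses stand in for Punjabi output. The same word resolves differently depending on the configured domain, and a word or domain outside the knowledge base fails outright, mirroring the disadvantages listed above.

```python
# Hypothetical sense inventory: each ambiguous English word maps to one
# sense per domain. A real knowledge-based system holds far richer
# semantic and pragmatic knowledge; this only shows the disambiguation idea.
SENSES = {
    "drive": {"computing": "storage device", "automotive": "operate a vehicle"},
    "boot": {"computing": "start a computer", "automotive": "car trunk"},
}


def disambiguate(word: str, domain: str) -> str:
    """Pick the word's sense for the system's fixed domain.

    Raises KeyError if the word or domain is outside the knowledge base.
    """
    return SENSES[word][domain]


for word in ("drive", "boot"):
    print(word, "->", disambiguate(word, "computing"))
# drive -> storage device
# boot -> start a computer
```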

  10. Statistical Based • Sparked by an experiment in 1989 by a team from IBM • The results of the experiment seemed attractive and acceptable • The methodology was based entirely on statistical methods • Marked a new approach to MT known as 'corpus-based' • A vast language corpus is the main component • Words and phrases are aligned using the corpus, and translation probabilities are then calculated (Hutchins 1994) • Example system: Candide
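The "align, then calculate probabilities" step can be illustrated with IBM Model 1, the simplest of the IBM word-alignment models, trained by expectation-maximization (EM). This is a minimal sketch (no NULL word, no distortion), not a description of Candide's actual configuration, and the three-sentence corpus is hypothetical, built from the words of the presentation's example sentence.

```python
from collections import defaultdict

# Toy parallel corpus (hypothetical): (English, romanized Punjabi) pairs.
CORPUS = [
    ("I eat vegetables", "Meh savaji khaan"),
    ("eat vegetables", "savaji khaan"),
    ("vegetables", "savaji"),
]


def ibm_model1(corpus, iterations=20):
    """Estimate word translation probabilities t(f | e) with IBM Model 1 EM."""
    pairs = [(e.split(), f.split()) for e, f in corpus]
    f_vocab = {f for _, fs in pairs for f in fs}
    # Uniform initialisation of t(f | e).
    t = defaultdict(lambda: 1.0 / len(f_vocab))
    for _ in range(iterations):
        count = defaultdict(float)  # expected counts c(f, e)
        total = defaultdict(float)  # expected counts c(e)
        for es, fs in pairs:
            for f in fs:
                # E-step: share f's count among all English words it may align to.
                norm = sum(t[(f, e)] for e in es)
                for e in es:
                    delta = t[(f, e)] / norm
                    count[(f, e)] += delta
                    total[e] += delta
        # M-step: renormalise the expected counts into probabilities.
        t = defaultdict(float, {fe: c / total[fe[1]] for fe, c in count.items()})
    return t


t = ibm_model1(CORPUS)
for e in ("I", "eat", "vegetables"):
    best = max(("Meh", "khaan", "savaji"), key=lambda f: t[(f, e)])
    print(e, "->", best)
# I -> Meh
# eat -> khaan
# vegetables -> savaji
```

Because "vegetables" appears alone with "savaji", EM attributes "savaji" to it first, which in turn frees "khaan" for "eat" and "Meh" for "I".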

  11. Statistical Based • Disadvantages: 1) Requires training on huge, good-quality bilingual corpora 2) Struggles with complex translations, because the process becomes too complex to handle

  12. Example Based • Translation is done by analogy • Relies on past translation examples • Past translation examples are assumed to be syntactically, grammatically and semantically accurate • The translation examples are kept in a store, also known as a corpus; hence this approach is also 'corpus-based' • The sentence to be translated is matched against the examples in the database • The closest match is selected and adapted to fit the input sentence
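Selecting the closest match can be sketched with Python's standard `difflib.SequenceMatcher`, comparing word lists. The similarity metric is an assumption (the presentation does not say how matches are scored), and the example base is hypothetical: only the first pair comes from the presentation, and the other target sides are placeholders.

```python
from difflib import SequenceMatcher

# Hypothetical example base of (English, Punjabi) pairs. Only the first
# pair comes from the presentation; the other targets are placeholders.
EXAMPLE_BASE = [
    ("I am going to eat vegetables", "Meh savaji khaan lageaa hai"),
    ("she reads a book", "<stored translation 2>"),
    ("we are going to the market", "<stored translation 3>"),
]


def closest_example(sentence: str) -> tuple[str, str]:
    """Return the stored pair whose English side best matches the input.

    Similarity is SequenceMatcher.ratio() over lower-cased word lists,
    i.e. 2 * matching tokens / total tokens.
    """
    words = sentence.lower().split()

    def score(pair: tuple[str, str]) -> float:
        return SequenceMatcher(None, words, pair[0].lower().split()).ratio()

    return max(EXAMPLE_BASE, key=score)


print(closest_example("I am going to eat rice"))
# ('I am going to eat vegetables', 'Meh savaji khaan lageaa hai')
```

The retrieved pair would then be adapted, for example by replacing "vegetables" with the translation of "rice", which is the "replaced according to the input" step described above.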

  13. Methodology • In this paper we take the EBMT approach • Justification for choosing EBMT: • It avoids the tractability, scalability and performance problems that older MT strategies face because they depend on extensive linguistic knowledge • Very little formal work has been done on the Punjabi language itself, which is a barrier to a rule-based approach • EBMT instead needs correct and accurate example translations that fit the situation

  14. Methodology • The first step in EBMT is preparing the corpus • The design of the corpus is as follows:

  15. Methodology • The sentence to be translated is "I am going to eat vegetables" • First, the sentence is tokenized • Next come morphological analysis and part-of-speech tagging • The key is then matched against the corpus • In this case the key is "eat"
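The tokenize, tag and key-match steps can be sketched as a small pipeline. This is a minimal sketch under stated assumptions: the tag lexicon (Penn Treebank-style tags) and the set of corpus keys are hypothetical, tagging is a plain lookup rather than a trained tagger, and morphological analysis is reduced to lower-casing.

```python
import re
from typing import Optional

# Hypothetical mini tag lexicon and corpus index keyed by content verb;
# neither comes from the presentation.
TAGS = {
    "i": "PRP", "am": "VBP", "going": "VBG", "to": "TO",
    "eat": "VB", "vegetables": "NNS",
}
CORPUS_KEYS = {"eat"}


def tokenize(sentence: str) -> list[str]:
    """Split into lower-cased word tokens, dropping punctuation."""
    return re.findall(r"[a-z]+", sentence.lower())


def tag(tokens: list[str]) -> list[tuple[str, str]]:
    """Look each token up in the tag lexicon; unknown words get 'UNK'."""
    return [(tok, TAGS.get(tok, "UNK")) for tok in tokens]


def find_key(tagged: list[tuple[str, str]]) -> Optional[str]:
    """Return the first verb that indexes an entry in the corpus."""
    for tok, pos in tagged:
        # Auxiliaries such as "am" and "going" are verbs but index nothing.
        if pos.startswith("VB") and tok in CORPUS_KEYS:
            return tok
    return None


tagged = tag(tokenize("I am going to eat vegetables"))
print(tagged)
print(find_key(tagged))  # eat
```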

  16. Methodology • Based on the template, the input sentence has the form NP P eat N and the output has the form NP N khaan P • The template slots are filled by lexicon lookup
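Template filling can be sketched with a regular expression holding one named group per slot. The slot reading is an assumption: NP is taken as the subject, P as the aspect phrase ("am going to" / "lageaa hai") and N as the object noun, which reproduces the presentation's example. The lexicon is hypothetical and uses the romanized Punjabi from that example.

```python
import re

# Hypothetical lexicon for the template slots (romanized Punjabi from
# the presentation's example sentence).
LEXICON = {
    "NP": {"I": "Meh"},
    "P": {"am going to": "lageaa hai"},
    "N": {"vegetables": "savaji"},
}

# Input template "NP P eat N" as a regex with one named group per slot;
# the output template "NP N khaan P" reorders the translated slots.
INPUT_TEMPLATE = re.compile(r"(?P<NP>\w+) (?P<P>.+?) eat (?P<N>\w+)")
OUTPUT_TEMPLATE = "{NP} {N} khaan {P}"


def apply_template(sentence: str) -> str:
    """Match the input template, translate each slot, fill the output.

    Raises ValueError if the template does not match, and KeyError if a
    slot filler is missing from the lexicon.
    """
    match = INPUT_TEMPLATE.fullmatch(sentence)
    if match is None:
        raise ValueError(f"no template matches: {sentence!r}")
    slots = {name: LEXICON[name][text] for name, text in match.groupdict().items()}
    return OUTPUT_TEMPLATE.format(**slots)


print(apply_template("I am going to eat vegetables"))
# Meh savaji khaan lageaa hai
```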

  17. Methodology • "I am going to eat vegetables" • "Meh savaji khaan lageaa hai"

  18. Analysis • Little linguistic knowledge is needed • Faster, because little linguistic processing is involved

  19. Conclusion • EBMT is suitable when little formal linguistic knowledge of the languages involved is available • EBMT works well when deep linguistic analysis is not needed

  20. References • Hutchins, J. (1994). Research methods and system designs in machine translation: a ten-year review, 1984-1994. International conference 'Machine translation: ten years on'. • Arnold, D., Balkan, L., Humphreys, R. L., Meijer, S. & Sadler, L. (1994). Machine translation: an introductory guide. Blackwells/NCC, London.

  21. Thank You

  22. Q & A
