Automatic Translation of Nominal Compound into Hindi

Prashant Mathur

IIIT Hyderabad

Soma Paul

IIIT Hyderabad

OUTLINE
  • What is a Nominal Compound (NC)?
  • Translation variation of English NCs into Hindi
  • Motivation
  • Approach
  • Results
  • Future Work
  • Bibliography

Prashant Mathur

Nominal Compound
  • A construct of two or more nouns.
  • The rightmost noun is the head; the preceding nouns are its modifiers.

oil pump: a device used to pump oil

customer satisfaction indices: indices that indicate the rate of customer satisfaction

  • Two-word nominal compounds are the object of study here.

Variation in translating English NC into Hindi

As Nominal Compound

  • ‘Hindu texts’ hindU SastroM, ‘milk production’ dugdha utpAdana

As Genitive Construction

  • ‘rice husk’ cAval kI bhUsI
  • ‘room temperature’ kamare kA tApamAna

As a single word

  • ‘cow dung’ gobar

As Adjective Noun Construction

  • ‘nature cure’ prAkratik cikitsA, ‘hill camel’ pahARI UMTa

As other syntactic phrase

  • ‘wax work’ mom par kalAkArI (‘work on wax’)
  • ‘body pain’ SarIr meM dard (‘pain in body’)

Others

  • ‘hand luggage’ haat meM le jaaye jaane vaale saamaan (‘luggage carried in the hand’)

Motivation
  • Issues in translation
    • Choice of the appropriate target lexeme during lexical substitution; and
    • Selection of the right target construct type.
    • NCs occur frequently in corpora, but each individual compound occurs only a few times.
    • NCs are too varied to be precompiled into an exhaustive list of translations.

Therefore …
  • NCs are to be handled on the fly.
  • Translating NCs from English into Hindi is a challenging NLP task.

With Google Translate
  • When tested on the same dataset that was used to evaluate our system

Approach
  • Translation template generation
  • Extraction of NC from English corpus
  • Sense disambiguation of components
  • Lexical substitution of the component nouns using a bilingual dictionary
  • Preparing translation candidates
  • Corpus Search of translation candidates and their Ranking.

Translation Template Generation

We surveyed 50,000 sentences of parallel corpora and found the following construction types.

Some Templates

A total of 44 templates were formed; some of them are shown below.

  • Nominal Compound
    • H1 H2
  • Genitive
    • H1 kA H2
    • H1 ke H2
    • H1 kI H2
  • Long Phrases
    • H1 pe H2
    • H1 meM H2
    • H1 par H2
    • H1 ke xvArA H2
    • H1 se prApwa H2
  • Adjective
    • H1-ikA H2
  • Single-Word
    • H1
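The templates listed above can be stored as plain strings with H1/H2 slots and filled with candidate Hindi words. The following is a minimal Python sketch under our own assumptions, not the authors' implementation: the names `TEMPLATES`, `SLOT` and `instantiate` are hypothetical, only the templates shown on this slide are included (not all 44), and the adjective template is filled by naive string concatenation, whereas real adjective formation (as in ‘nature cure’ prAkratik cikitsA) needs morphological rules.

```python
import re

# Hypothetical subset of the 44 templates: only those listed on the slide.
# "H1" is the slot for the Hindi modifier noun, "H2" for the Hindi head noun.
TEMPLATES = {
    "nominal_compound": ["H1 H2"],
    "genitive": ["H1 kA H2", "H1 ke H2", "H1 kI H2"],
    "long_phrase": ["H1 pe H2", "H1 meM H2", "H1 par H2",
                    "H1 ke xvArA H2", "H1 se prApwa H2"],
    "adjective": ["H1-ikA H2"],  # naive: no morphological adjective formation
    "single_word": ["H1"],
}

SLOT = re.compile(r"H[12]")


def instantiate(template: str, h1: str, h2: str) -> str:
    """Fill the H1/H2 slots in a single regex pass.

    A single pass means replacement text is never rescanned, so a Hindi word
    that happens to contain "H" (WX uses it for the visarga) is left intact.
    """
    return SLOT.sub(lambda m: h1 if m.group() == "H1" else h2, template)


if __name__ == "__main__":
    for kind, templates in TEMPLATES.items():
        for t in templates:
            print(f"{kind:17} {instantiate(t, 'saDaka', 'surakRA')}")
```

Keeping the template alongside each generated string is useful later, because the CTQ ranking scores a candidate in the context of its template t.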

Extraction

1. TreeTagger is a POS tagger that also outputs the lemma of each word.

Word    TreeTagger output (word_POS_lemma)

rods    rods_NNS_rod

2. As stated earlier, we consider only noun-noun sequences as nominal compounds.
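A minimal Python sketch of this extraction step, assuming tokens arrive in the word_POS_lemma form shown above (TreeTagger itself prints tab-separated columns, so a real pipeline would parse those instead). The function names are hypothetical. Treating only the common-noun tags NN/NNS as nouns, and keeping noun runs of exactly two tokens so that a longer compound such as ‘customer satisfaction indices’ does not yield a spurious pair, are our assumptions for restricting extraction to two-word NCs.

```python
# Assumption: only common-noun tags count; proper nouns (NP, NPS) are skipped.
NOUN_TAGS = {"NN", "NNS"}


def parse_token(token: str) -> tuple[str, str, str]:
    """Split a 'word_POS_lemma' token such as 'rods_NNS_rod'."""
    word, pos, lemma = token.rsplit("_", 2)
    return word, pos, lemma


def extract_two_noun_compounds(tagged: str) -> list[tuple[str, str]]:
    """Return (modifier, head) lemma pairs for noun runs of exactly two tokens."""
    compounds: list[tuple[str, str]] = []
    run: list[str] = []
    tokens = [parse_token(t) for t in tagged.split()]
    tokens.append(("", "END", ""))  # sentinel flushes the final noun run
    for _, pos, lemma in tokens:
        if pos in NOUN_TAGS:
            run.append(lemma)
        else:
            if len(run) == 2:
                compounds.append((run[0], run[1]))
            run = []
    return compounds


sentence = ("Campaigns_NNS_campaign for_IN_for road_NN_road "
            "safety_NN_safety are_VBP_be organized_VBN_organize")
print(extract_two_noun_compounds(sentence))
```

Returning lemmas rather than surface forms means a plural head such as ‘pumps’ is looked up in the bilingual dictionary under ‘pump’.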

[Figure: Lexical Substitution (diagram not recoverable from the transcript)]

Step 3: Sense Disambiguation of Components
  • Goal: to reduce the number of translation candidates
  • Example:

“Campaigns for road safety are organized to keep everyone safer on the Indian roads.”

WordNet::SenseRelate by Ted Pedersen

  • 80% accuracy when disambiguating NCs.

Lexical Substitution
  • How do we translate the selected sense into Hindi?
    • We do not have a direct WordNet mapping from English to Hindi.
    • We therefore use an alternative method.

Step 4: Lexical Substitution
  • Acquire all possible translations for all the words within a synset.

Contd…
  • Select the Hindi words that are common translations of all English words in the synset, if there are any

Selected words are: maarg, saDak, raastaa

All words are selected
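The selection rule above amounts to a set intersection over the dictionary translations of the synset members. The Python sketch below is an illustration, not the authors' code: `substitute` and `TOY_DICT` are hypothetical, the toy synset {road, route} and its dictionary entries are invented for the example, and falling back to all translations when no word is shared is our reading of the "All words are selected" callout rather than a confirmed detail. Synset members missing from the dictionary are simply ignored.

```python
def substitute(synset_words: list[str], bilingual: dict[str, list[str]]) -> list[str]:
    """Pick Hindi translations for one disambiguated synset.

    Keeps the Hindi words shared by every English member that has a
    dictionary entry; if none is shared, returns all of their translations
    (assumed fallback).
    """
    options = [set(bilingual[w]) for w in synset_words if bilingual.get(w)]
    if not options:
        return []
    common = set.intersection(*options)
    return sorted(common if common else set.union(*options))


# Hypothetical toy dictionary; not the authors' bilingual resource.
TOY_DICT = {
    "road": ["maarg", "saDak", "raastaa"],
    "route": ["maarg", "raastaa"],
}

print(substitute(["road", "route"], TOY_DICT))
```

Sorting the result only makes the output deterministic; the method itself does not order the selected words.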

Step 5: Preparing Translation Candidates
  • For “road safety”
  • Candidates generated from the templates:

mArga para surakRA,

mArga surakRA,

saDaka para surakRA,

saDaka kI surakRA

...
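Candidate preparation crosses every modifier translation with every head translation and every template, so the candidate set has size |H1| × |H2| × |T|. A minimal Python sketch, with hypothetical names (`fill`, `prepare_candidates`) and a reduced template list chosen only for illustration:

```python
import itertools
import re

SLOT = re.compile(r"H[12]")


def fill(template: str, h1: str, h2: str) -> str:
    """Substitute the H1/H2 slots of a template in one pass."""
    return SLOT.sub(lambda m: h1 if m.group() == "H1" else h2, template)


def prepare_candidates(h1_options, h2_options, templates):
    """Return (candidate, template) pairs for every combination.

    The template is kept with each candidate because the ranking step
    scores a candidate in the context of its template.
    """
    return [
        (fill(t, a, b), t)
        for t, a, b in itertools.product(templates, h1_options, h2_options)
    ]


cands = prepare_candidates(["mArga", "saDaka"], ["surakRA"],
                           ["H1 H2", "H1 kI H2", "H1 meM H2"])
for text, template in cands:
    print(f"{template:10} -> {text}")
```

Because the count grows multiplicatively, every Hindi word removed by sense disambiguation in Step 3 removes a whole row of candidates from the corpus search.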

Step 6: Corpus Search
  • Hindi corpus (raw): 28 million words
  • The corpus is indexed
  • Search by pattern matching
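A minimal Python sketch of the search step together with the count-based baseline ranking. It assumes an exact-phrase match over whitespace tokens; the in-memory n-gram `Counter` is a stand-in for whatever index the authors used over 28 million words, and `TOY_CORPUS`, `build_ngram_index` and `rank_by_count` are hypothetical. The toy sentences are invented for illustration.

```python
from collections import Counter


def build_ngram_index(sentences, max_n=4):
    """Count every 1..max_n token n-gram.

    max_n=4 covers the longest templates shown (e.g. "H1 ke xvArA H2"),
    assuming single-token Hindi nouns.
    """
    index = Counter()
    for sentence in sentences:
        tokens = sentence.split()
        for n in range(1, max_n + 1):
            for i in range(len(tokens) - n + 1):
                index[" ".join(tokens[i:i + n])] += 1
    return index


def rank_by_count(candidates, index):
    """Baseline ranking: keep attested candidates, most frequent first."""
    hits = [(c, index[c]) for c in candidates if index[c] > 0]
    return sorted(hits, key=lambda pair: (-pair[1], pair[0]))


# Invented toy corpus in WX transliteration.
TOY_CORPUS = [
    "vaha cunAva ke samaya AyA",
    "cunAva ke samaya",
    "vivAha kI praWA purAnI hE",
]

index = build_ngram_index(TOY_CORPUS)
print(rank_by_count(["cunAva ke samaya", "cunAva kA samaya", "cunAva samaya"], index))
print(rank_by_count(["mArga kI surakRA", "mArga surakRA"], index))
```

An empty result corresponds to the situation described for ‘road safety’, where no candidate is attested and a back-off score or a default construct is needed.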

Example
  • election time → cunAva ke samaya
  • temple community → maMxira kA samAja
  • marriage customs → vivAha kI praWA

But we did not find any translation for

road safety → ∅

CTQ (Corpus-based Translation Quality)
  • Rates a given translation candidate on both
    • the fully specified translation, and
    • its parts, in the context of the translation template in question.

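$$\mathrm{CTQ}(w_1^H, w_2^H, t) = \alpha\, P(w_1^H, w_2^H, t) + \beta\, P(w_1^H, t)\, P(w_2^H, t)\, P(t)$$

  • t is the translation template used
  • w1H, w2H are the Hindi translations of the components of the NC
  • α = 1, β = 0 if P(w1H, w2H, t) > 0; otherwise the score backs off to the product term P(w1H, t) P(w2H, t) P(t). No other values of α and β were tried.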

Contd..
  • Example: road safety, where P(w1H, w2H, t) = 0 (no full candidate was found in the corpus)
  • road → mArga, mArga ke, mArga meM, saDaka, saDaka par, …
  • safety → surakRA, ke surakRA, meM surakRA, …
  • P(mArga, meM) × P(meM, surakRA) × P(meM) = (2.28 × 10^-5) × (9.14 × 10^-6) × 0.286 ≈ 6.0 × 10^-11
  • P(mArga, kI) × P(kI, surakRA) × P(kI) = (1.35 × 10^-5) × (3.82857143 × 10^-5) × 0.228 ≈ 1.18 × 10^-10
  • The higher score goes to “mArga kI surakRA”
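The CTQ score and the worked example can be checked with a few lines of Python. This is a sketch, not the authors' code: `ctq` is a hypothetical name, and setting α = 0, β = 1 in the back-off case is our assumption, consistent with the unweighted product used in the example. The probabilities are copied from the ‘road safety’ example rather than estimated from a corpus.

```python
def ctq(p_full: float, p_w1_t: float, p_w2_t: float, p_t: float) -> float:
    """CTQ(w1H, w2H, t) = alpha*P(w1H, w2H, t) + beta*P(w1H, t)*P(w2H, t)*P(t).

    alpha=1, beta=0 when the full candidate is attested (as on the slide);
    otherwise alpha=0, beta=1 (assumed), so the score backs off to the product.
    """
    alpha, beta = (1.0, 0.0) if p_full > 0 else (0.0, 1.0)
    return alpha * p_full + beta * p_w1_t * p_w2_t * p_t


# Probabilities from the worked example; the full candidate is unattested.
score_mem = ctq(0.0, 2.28e-5, 9.14e-6, 0.286)       # mArga meM surakRA
score_ki = ctq(0.0, 1.35e-5, 3.82857143e-5, 0.228)  # mArga kI surakRA
print(f"{score_mem:.3g} {score_ki:.3g}")
```

The two scores come out at roughly 5.96 × 10^-11 and 1.18 × 10^-10, so the genitive candidate with kI ranks first, matching the slide.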

Ranking
  • Baseline ranking:
    • Count-based ranking
  • A stronger ranking measure: CTQ

(borrowed from Baldwin and Tanaka (2004))

Results

[Chart: results; axis labels and series names were lost in extraction. Plotted values: 62.1, 56.2, 54.1, 53.6, 50, 46.1, 28, 28.5, 24.6, 19, 24, 14]

Contd..
  • Measure taken to improve recall:
    • Using the genitive as the default construct when no translation for an NC is found
  • Motivation:
    • We conducted an experiment on development data.
    • We checked whether the NCs for which no translation was found during corpus search can legitimately be translated as genitive constructs.
    • The heuristic worked in 59% of cases.
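A minimal Python sketch of the genitive fallback. The slides do not say how kA, ke or kI is chosen, so the sketch assumes standard Hindi agreement with the head noun (kA for masculine singular, ke for masculine plural, kI for feminine), with the head in the direct case and the modifier already in its oblique form; ke also appears before oblique masculine heads, as in cunAva ke samaya, which this sketch does not model. `GENITIVE`, `HEAD_GENDER` and `translate_nc` are hypothetical, and the gender lexicon covers only heads from the earlier examples.

```python
# Genitive postposition agreeing with the head noun (H2), direct case assumed.
GENITIVE = {("m", "sg"): "kA", ("m", "pl"): "ke",
            ("f", "sg"): "kI", ("f", "pl"): "kI"}

# Hypothetical gender lexicon for heads used in the earlier examples.
HEAD_GENDER = {"surakRA": "f", "tApamAna": "m", "bhUsI": "f"}


def translate_nc(ranked: list[str], h1_oblique: str, h2: str, number: str = "sg") -> str:
    """Return the top corpus-ranked candidate, else a default genitive."""
    if ranked:
        return ranked[0]
    postposition = GENITIVE[(HEAD_GENDER[h2], number)]
    return f"{h1_oblique} {postposition} {h2}"


print(translate_nc([], "mArga", "surakRA"))
print(translate_nc([], "kamare", "tApamAna"))
```

A head missing from the lexicon raises a `KeyError` here; a fuller version would need a real gender lexicon or a morphological analyser.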

Results
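  • Using the genitive as the default construct where the system fails to produce a translation

[Chart: results with the genitive default; axis labels and series names were lost in extraction. Plotted values: 57, 54, 44.5, 24.8]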

Related works
  • Similar approaches (searching the corpus for instantiated translation templates) were adopted by
    • Bungum and Oepen (2009) for Norwegian-to-English nominal compound translation
    • Tanaka and Baldwin (2004) for English-to-Japanese nominal compound translation and vice versa

Conclusion
  • Novelty of our approach
    • Applying a WSD tool to the source language to select the correct sense of the nominal components
    • Result: the number of possible translation candidates to be searched in the target-language corpus is significantly reduced.

Future Work
  • Translation of NCs with more than two nouns
  • Using semantic features provided in the UW-Dictionary
  • Varying α and β in the CTQ ranking to improve results

Bibliography
  • Baldwin, Timothy and Takaaki Tanaka (2004). Translation by Machine of Complex Nominals: Getting It Right.
  • Tanaka, Takaaki and Timothy Baldwin. Translation Selection for Japanese-English Noun-Noun Compounds.
  • Rackow, Ulrike, Ido Dagan and Ulrike Schwall. Automatic Translation of Noun Compounds.
  • Bungum, Lars and Stephan Oepen (2009). Norwegian to English nominal compound translation.