
EXTRACTING COMPLEX PREDICATES IN HINDI ACROSS PARALLEL CORPORA

Guide: Prof. Amitabha Mukerjee. By: Amit Kumar (10074), Ankit Modi (10104).


Presentation Transcript


  1. EXTRACTING COMPLEX PREDICATES IN HINDI ACROSS PARALLEL CORPORA Guide: Prof. Amitabha Mukerjee. By: Amit Kumar (10074), Ankit Modi (10104)

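  2. INTRODUCTION A Complex Predicate (CP) is a multi-word compound that functions as a single verb. Ex: उसने किताब वापस कर दिया ("He returned the book"). Ex: मुझे बच्चों के माता-पिताओं के साथ काम करना भी अच्छा लगता है जो कि अक्सर सलाह लेने आते हैं - यह जानकार खुशी होती है कि आप किसी की मदद कर सकते हैं | ("I also enjoy working with the children's parents who often come to me for advice - it's good to know you can help")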

  3. INTRODUCTION CP = Word (W) + Light Verb (LV). Ex: उसने किताब वापस कर दिया ("He returned the book"), where कर दिया (CP) = कर (W) + दिया (LV). A Light Verb is a verb that has little semantic content of its own and therefore forms a predicate with some additional expression, which is usually a noun. Ex: देना (to give), लेना (to take), पाना (to get), उठाना (to lift)

  4. PROBLEM STATEMENT Given a parallel English-Hindi corpus, detect complex predicates (CPs), using the fact that a CP is a multi-word expression whose meaning is distinct from that of its light verb (LV).

  5. MOTIVATION CPs improve the expressiveness of a language, and Hindi is rich in them.

  6. MOTIVATION CPs improve the expressiveness of a language, and Hindi is rich in them. Detecting CPs is a difficult task.

  7. MOTIVATION CPs improve the expressiveness of a language, and Hindi is rich in them. Detecting CPs is a difficult task. Detected CPs are an important resource for tasks such as WordNet construction and linguistic analysis.

  8. FRAMEWORK (1) Aligned English-Hindi corpus. English: I also enjoy working with the children's parents who often come to me for advice - it's good to know you can help. Hindi: मुझे बच्चों के माता-पिताओं के साथ काम करना भी अच्छा लगता है जो कि अक्सर सलाह लेने आते हैं - यह जानकार खुशी होती है कि आप किसी की मदद कर सकते हैं |

  9. FRAMEWORK (1) Aligned English-Hindi corpus. (2) Search for Hindi LVs and their morphological forms. English: I also enjoy working with the children's parents who often come to me for advice - it's good to know you can help. Hindi: मुझे बच्चों के माता-पिताओं के साथ काम करना भी अच्छा लगता है जो कि अक्सर सलाह लेने आते हैं - यह जानकार खुशी होती है कि आप किसी की मदद कर सकते हैं |

  10. FRAMEWORK (1) Aligned English-Hindi corpus. (2) Search for Hindi LVs and their morphological forms. (3) Search for the equivalent English meaning of the LVs. English: I also enjoy working with the children's parents who often come to me for advice - it's good to know you can help. Hindi: मुझे बच्चों के माता-पिताओं के साथ काम करना भी अच्छा लगता है जो कि अक्सर सलाह लेने आते हैं - यह जानकार खुशी होती है कि आप किसी की मदद कर सकते हैं |

  11. FRAMEWORK (1) Aligned English-Hindi corpus. (2) Search for Hindi LVs and their morphological forms. (3) Search for the equivalent English meaning of the LVs. (4) Scan left of those LVs whose English meaning is not found. English: I also enjoy working with the children's parents who often come to me for advice - it's good to know you can help. Hindi: मुझे बच्चों के माता-पिताओं के साथ काम करना भी अच्छा लगता है जो कि अक्सर सलाह लेने आते हैं - यह जानकार खुशी होती है कि आप किसी की मदद कर सकते हैं |

  12. FRAMEWORK (1) Aligned English-Hindi corpus. (2) Search for Hindi LVs and their morphological forms. (3) Search for the equivalent English meaning of the LVs. (4) Scan left of those LVs whose English meaning is not found. (5) Collect the Hindi word (W) if it is not a stop word; otherwise keep scanning. English: I also enjoy working with the children's parents who often come to me for advice - it's good to know you can help. Hindi: मुझे बच्चों के माता-पिताओं के साथ काम करना भी अच्छा लगता है जो कि अक्सर सलाह लेने आते हैं - यह जानकार खुशी होती है कि आप किसी की मदद कर सकते हैं |

  13. FRAMEWORK (1) Aligned English-Hindi corpus. (2) Search for Hindi LVs and their morphological forms. (3) Search for the equivalent English meaning of the LVs. (4) Scan left of those LVs whose English meaning is not found. (5) Collect the Hindi word (W) if it is not a stop word; otherwise keep scanning. (6) CP = W + LV, unless W is an exit word. English: I also enjoy working with the children's parents who often come to me for advice - it's good to know you can help. Hindi: मुझे बच्चों के माता-पिताओं के साथ काम करना भी अच्छा लगता है जो कि अक्सर सलाह लेने आते हैं - यह जानकार खुशी होती है कि आप किसी की मदद कर सकते हैं |
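
The framework steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' code: the word lists (LV_FORMS, LV_ENGLISH, STOP_WORDS, EXIT_WORDS) are tiny hypothetical samples, the function name extract_cps is made up for this sketch, and tokenization is plain whitespace splitting.

```python
# Illustrative sketch of the LV-based CP extraction steps.
# Every word list below is a small hypothetical sample, not the real resource.

# Surface form of a Hindi light verb -> its lemma (hypothetical sample).
LV_FORMS = {
    "करना": "करना", "कर": "करना", "किया": "करना",
    "देना": "देना", "दिया": "देना",
    "लेना": "लेना", "लेने": "लेना", "लिया": "लेना",
}

# English forms counted as a literal translation of each light verb (hypothetical sample).
LV_ENGLISH = {
    "करना": {"do", "does", "did", "done", "doing"},
    "देना": {"give", "gives", "gave", "given", "giving"},
    "लेना": {"take", "takes", "took", "taken", "taking"},
}

STOP_WORDS = {"भी", "ही", "तो"}  # skipped while scanning left (assumed list)
EXIT_WORDS = {"है", "हैं", "कि", "की", "के", "को", "से", "में", "पर", "ने", "और"}  # reject the candidate (assumed list)


def extract_cps(hindi_sentence, english_sentence):
    """Return (W, LV) pairs for one aligned sentence pair."""
    hindi = hindi_sentence.split()
    english = {w.strip(".,;:!?-'\"").lower() for w in english_sentence.split()}
    cps = []
    for i, token in enumerate(hindi):
        lemma = LV_FORMS.get(token)
        if lemma is None:
            continue  # not a known light-verb form
        if LV_ENGLISH[lemma] & english:
            continue  # literal English meaning found, so the verb is not "light" here
        j = i - 1
        while j >= 0 and hindi[j] in STOP_WORDS:
            j -= 1  # keep scanning left past stop words
        if j < 0 or hindi[j] in EXIT_WORDS:
            continue  # nothing to the left, or W is an exit word
        cps.append((hindi[j], token))
    return cps


if __name__ == "__main__":
    hi = ("मुझे बच्चों के माता-पिताओं के साथ काम करना भी अच्छा लगता है जो कि अक्सर "
          "सलाह लेने आते हैं - यह जानकार खुशी होती है कि आप किसी की मदद कर सकते हैं |")
    en = ("I also enjoy working with the children's parents who often come "
          "to me for advice - it's good to know you can help")
    print(extract_cps(hi, en))
```

With these sample lists, the example sentence pair yields the candidates (काम, करना), (सलाह, लेने) and (मदद, कर). A real run would depend on the full LV list, its morphological forms, and the actual stop-word and exit-word lists.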

  14. SAMPLE RESULTS So far, we have extracted 10,000 CPs, but more morphological forms still need to be added to the Hindi LV list.
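
One possible way to add morphological forms to the LV list is to generate candidate forms from each verb stem and keep only those attested in a Hindi corpus. The sketch below is an assumption about how this could be done, not the authors' procedure: the function name harvest_forms, the suffix list, and the irregular forms are hypothetical samples.

```python
from collections import Counter


def harvest_forms(corpus_tokens, stems, suffixes, irregular=None, min_count=1):
    """Map each light-verb lemma to the candidate forms attested in the corpus.

    stems:     {stem: lemma}, e.g. {"कर": "करना"}         (hypothetical input)
    suffixes:  endings to attach to each stem              (hypothetical input)
    irregular: {lemma: [forms]} for forms that are not stem + suffix
    """
    counts = Counter(corpus_tokens)
    irregular = irregular or {}
    forms = {}
    for stem, lemma in stems.items():
        candidates = {stem} | {stem + s for s in suffixes}
        candidates |= set(irregular.get(lemma, ()))
        # Keep only candidates seen at least min_count times in the corpus.
        forms[lemma] = sorted(c for c in candidates if counts[c] >= min_count)
    return forms


if __name__ == "__main__":
    tokens = "वह काम करता है और वे सलाह लेते हैं उसने किया".split()
    stems = {"कर": "करना", "ले": "लेना"}
    suffixes = ["ना", "ता", "ते", "ती", "ने"]
    irregular = {"करना": ["किया"], "लेना": ["लिया"]}
    print(harvest_forms(tokens, stems, suffixes, irregular))
```

Filtering against corpus counts keeps the list limited to forms that actually occur; raising min_count is a simple guard against spelling noise in a large corpus.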

  15. Code Snapshot

  16. RESOURCES
     • English-Hindi parallel corpora: http://ufal.mff.cuni.cz/euromatrixplus/downloads.html
     • List of Hindi light verbs: Reverse Complex Predicates, by Shakthi Poornima, Department of Linguistics, University at Buffalo (SUNY)
     • Morphological forms of English verbs: http://www.englishpage.com/irregularverbs/irregularverbs.html
     • Morphological forms of Hindi verbs: extracted from a large Hindi corpus (Blog Corpus)

  17. REFERENCES
     • [1] Mining Complex Predicates in Hindi Using a Parallel Hindi-English Corpus. R. Mahesh K. Sinha, IIT Kanpur
     • [2] Detecting Complex Predicates in Hindi using POS Projection across Parallel Corpora. Amitabha Mukerjee, Ankit Soni and Achla M Raina, IIT Kanpur
     • [3] Complex Predicates in Indian Languages and Wordnets. Pushpak Bhattacharyya, Debasri Chakrabarti and Vaijayanthi M. Sarma. Language Resources and Evaluation 40(3-4): 331-355
     • Wikipedia: 1. http://en.wikipedia.org/wiki/Light_verb 2. http://en.wikipedia.org/wiki/Compound_verb

  18. Thank you. Questions?

  19. OTHER APPROACHES
     • [2] This problem was solved using word alignment and POS tagging of parallel sentences.
     • [3] Derivation of complex predicates has also been dealt with linguistically and computationally.
     • CPs have been mined using computational methods and then categorized using statistical analysis [Sriram and Joshi, 2005].
     • Chakrabarti et al. (2008) present a method for automatic extraction of CPs only from a corpus based on linguistic features.
