# EXTRACTING COMPLEX PREDICATES IN HINDI ACROSS PARALLEL CORPORA

GUIDE: Prof. Amitabha Mukerjee. By: Amit Kumar (10074), Ankit Modi (10104).

INTRODUCTION

A Complex Predicate (CP) is a multi-word compound that functions as a single verb.

Ex: उसने किताब वापस कर दिया (He returned the book.)

मुझे बच्चों के माता-पिताओं के साथ काम करना भी अच्छा लगता है जो कि अक्सर सलाह लेने आते हैं - यह जानकार खुशी होती है कि आप किसी की मदद कर सकते हैं ।

CP = Word + Light Verb

Ex: उसने किताब वापस कर दिया

कर दिया (CP) = कर (W) + दिया (LV)

A Light Verb is a verb that has little semantic content of its own and therefore forms a predicate with some additional expression, which is usually a noun. Ex: देना, लेना, पाना, उठाना (to give, to take, to get, to lift)
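The decomposition CP = W + LV can be illustrated with a small lookup. This is only a sketch: the table `LIGHT_VERB_FORMS` and the helper `split_cp` are hypothetical names introduced here, and the handful of inflected forms is hand-picked for illustration rather than taken from the presenters' light-verb list.

```python
# Hypothetical, hand-written table: inflected light-verb form -> dictionary form.
# A real list would be far larger; these few entries are for illustration only.
LIGHT_VERB_FORMS = {
    "दिया": "देना", "देता": "देना", "देना": "देना",
    "लिया": "लेना", "लेता": "लेना", "लेना": "लेना",
    "पाया": "पाना", "पाना": "पाना",
    "उठाया": "उठाना", "उठाना": "उठाना",
}


def split_cp(candidate):
    """Split a two-word candidate into (W, LV form, LV lemma), or return None."""
    tokens = candidate.split()
    if len(tokens) != 2:
        return None
    word, verb = tokens
    lemma = LIGHT_VERB_FORMS.get(verb)
    if lemma is None:
        return None  # second token is not a known light-verb form
    return word, verb, lemma


print(split_cp("कर दिया"))
```

Keeping the inflected form and the lemma side by side makes it possible to match a verb as it appears in text and still look up its dictionary entry.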

PROBLEM STATEMENT

Given a parallel English-Hindi corpus, we have to detect complex predicates (CPs),

using the fact that a CP is a multi-word expression whose meaning is distinct from that of its light verb (LV).

MOTIVATION

CPs improve the expressiveness of a language, and Hindi has them in abundance

Detecting CPs is a difficult task

Their detection provides an important resource for tasks such as WordNet construction and linguistic analysis

Framework

Aligned English-Hindi corpus:

I also enjoy working with the children's parents who often come to me for advice - it's good to know you can help

मुझे बच्चों के माता-पिताओं के साथ काम करना भी अच्छा लगता है जो कि अक्सर सलाह लेने आते हैं - यह जानकार खुशी होती है कि आप किसी की मदद कर सकते हैं ।

Search for the Hindi LV and its morphological forms

Search for the equivalent English meaning of the LV

Collect the Hindi word (W) if it is not a stop word; otherwise keep scanning

CP = W + LV, unless W is an exit word

Sample Results

So far we have extracted 10,000 CPs, but we still need to add more morphological forms to the Hindi LV list.

Resources

• List of Hindi Light Verbs: Reverse Complex Predicates by Shakthi Poornima, Department of Linguistics, University at Buffalo, SUNY

• Morphological forms of English verbs: http://www.englishpage.com/irregularverbs/irregularverbs.html

• Morphological forms of Hindi verbs: Extracted from the large Hindi corpus (Blog Corpus)
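The slides do not say how the Hindi morphological forms were extracted from the corpus. One crude possibility, shown purely as an assumption, is to strip the infinitive ending ना, attach a small set of suffixes, and keep whichever resulting strings actually occur in the corpus. The suffix inventory and the name `harvest_forms` are hypothetical.

```python
from collections import Counter

# Assumed suffix inventory; real Hindi morphology has many more endings
# and irregular forms (e.g. दिया from देना) that this sketch will miss.
SUFFIXES = ("ना", "ने", "नी", "ता", "ती", "ते")


def harvest_forms(infinitive, corpus_tokens, min_count=1):
    """Collect corpus tokens that look like stem + known suffix of `infinitive`."""
    if not infinitive.endswith("ना"):
        raise ValueError("expected an infinitive ending in ना")
    stem = infinitive[: -len("ना")]
    candidates = {stem + suffix for suffix in SUFFIXES}
    counts = Counter(t for t in corpus_tokens if t in candidates)
    return {form for form, n in counts.items() if n >= min_count}


tokens = "वह सलाह लेने आते हैं और किताब लेता है लेकिन लेकर नहीं".split()
print(harvest_forms("लेना", tokens))
```

Restricting matches to stem plus a known suffix avoids picking up unrelated words that merely share the prefix, such as लेकिन, at the cost of missing irregular forms.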

References

• [1] Mining Complex Predicates In Hindi Using A Parallel Hindi-English Corpus, R. Mahesh K. Sinha, IIT Kanpur

• [2] Detecting Complex Predicates in Hindi using POS Projection across Parallel Corpora, Amitabha Mukerjee, Ankit Soni and Achla M Raina, IIT Kanpur

• [3] Complex Predicates in Indian Languages and Wordnets. Pushpak Bhattacharyya, Debasri Chakrabarti and Vaijayanthi M. Sarma. Language Resources and Evaluation 40(3-4): 331-355

• Wikipedia: 1. http://en.wikipedia.org/wiki/Light_verb 2. http://en.wikipedia.org/wiki/Compound_verb

Thank you

Questions?

Other Approaches

[2] This problem has been addressed using word alignment and POS tagging of parallel sentences.

[3] The derivation of complex predicates has also been treated linguistically and computationally.

CPs have been mined using computational methods and then categorized using statistical analysis [Sriram and Joshi, 2005].

Chakrabarti et al. (2008) present a method for automatic extraction of CPs only from a corpus, based on linguistic features.