
Indo WordNet A WordNet for Hindi

Debasri Chakrabarti, Dipak Kumar Narayan, Prabhakar Pandey, Madhu Prasad Sharma. Centre for Technology Development for Indian Languages, Computer Science and Engineering Department, IIT Bombay.


Presentation Transcript


  1. Indo WordNet: A WordNet for Hindi • Debasri Chakrabarti, Dipak Kumar Narayan, Prabhakar Pandey, Madhu Prasad Sharma • Centre for Technology Development for Indian Languages, Computer Science and Engineering Department, IIT Bombay

  2. Introduction • WordNet – A lexical database • Searching the dictionary conceptually • Different organizing principles for different syntactic categories • Synsets (synonym sets) are the basic building blocks • A lexical knowledge base is the heart of any intelligent information processing system

  3. WordNet for Hindi • Hindi WordNet is an online lexical database for the Hindi language • Its design is inspired by the English WordNet • Unique features • Graded antonyms and meronymy relationships • Efficient underlying database design • Cross part-of-speech linkage

  4. Semantic relations in WordNet • Synonymy • Hypernymy / Hyponymy • Antonymy • Meronymy / Holonymy • Gradation • Entailment • Troponymy

  5. Semantic Relations • Synonymy • True synonyms are rare • Synonymy is relative to a context • {घर, कमरा} (house, room) • {घर, आवास} (house, dwelling) • {घर, जन्मकुंडलीय स्थान} (house, position in a horoscope) • {घर, स्वदेश} (home, homeland)
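
Context-dependent synonymy can be pictured as one word form belonging to several synsets. The following is a minimal sketch under that reading, not the Hindi WordNet implementation: the `SYNSETS` list simply reuses the four sets from this slide, and the function name `senses_of` is a hypothetical helper.

```python
# Each synset is a set of word forms that are interchangeable in one context.
# The four synsets below are the ones listed on the slide for "घर" (house/home).
SYNSETS = [
    frozenset({"घर", "कमरा"}),
    frozenset({"घर", "आवास"}),
    frozenset({"घर", "जन्मकुंडलीय स्थान"}),
    frozenset({"घर", "स्वदेश"}),
]


def senses_of(word):
    """Return every synset that contains the given word form (hypothetical helper)."""
    return [synset for synset in SYNSETS if word in synset]


# "घर" is a member of all four synsets, so it has four senses here,
# while "कमरा" appears in only one of them.
ghar_senses = senses_of("घर")
kamra_senses = senses_of("कमरा")
```

Looking a word up therefore returns a list of synsets rather than a single synonym list, which is why two words can be synonyms in one context and not in another.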

  6. Semantic Relations • Hypernymy and Hyponymy • Relation between word meanings (synsets) • X is a hyponym of Y if X is a kind of Y • Hyponymy is transitive and asymmetrical • Hypernymy is the inverse of hyponymy • Example: lion → animal → living entity → entity (शेर → पशु → सजीव → अस्तित्व)
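
Transitivity and asymmetry can be checked by walking the direct links upward. This is an illustrative sketch only: it assumes each synset has a single direct hypernym and that the links contain no cycles, and the names `HYPERNYM`, `hypernym_chain`, and `is_hyponym_of` are hypothetical. The data is the example chain from this slide.

```python
# Direct hypernym links (child -> parent), taken from the slide's example chain.
# Assumption: one parent per entry and no cycles.
HYPERNYM = {
    "lion": "animal",
    "animal": "living entity",
    "living entity": "entity",
}


def hypernym_chain(word):
    """Follow direct hypernym links upward and collect every ancestor in order."""
    chain = []
    while word in HYPERNYM:
        word = HYPERNYM[word]
        chain.append(word)
    return chain


def is_hyponym_of(x, y):
    """X is a hyponym of Y if Y is reachable from X through hypernym links."""
    return y in hypernym_chain(x)


chain = hypernym_chain("lion")
```

Because `is_hyponym_of` follows links in one direction only, "lion" is a hyponym of "entity" (transitivity) while "entity" is not a hyponym of "lion" (asymmetry).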

  7. Semantic Relations • Antonymy • Oppositeness in meaning • Relation between word forms • Meronymy and Holonymy • Part-whole relation, e.g. a branch is a part of a tree • X is a meronym of Y if X is a part of Y • Meronymy is transitive and asymmetrical • Holonymy is the inverse relation of meronymy

  8. Troponym and Entailment • Entailment • {खर्राटा लेना – सोना} (to snore – to sleep) • Troponym • {लँगड़ाना, कदमताल करना – चलना} (to limp, to march – to walk) • {फुसफुसाना – बोलना} (to whisper – to speak)

  9. Antonymy Relation

  10. Meronymy Relation

  11. Gradation

  12. Classification of verbs • Simple verbs (सरल क्रिया): सोना, खाना • Conjunct verbs (संयुक्त क्रिया) • Compound verbs (सामासिक क्रिया): खाना-पीना • Causative verbs (प्रेरणात्मक क्रिया): सुलवाना

  13. WordNet Sub-Graph • [Figure: sub-graph centred on the synset {घर, गृह}] • Gloss: मनुष्यों का छाया हुआ वह स्थान जो दीवारों से घेर कर बनाया जाता है • Edge labels: Hypernymy, Hyponymy, Meronymy • Other nodes: संरचना; {आवास, निवास}; रसोईघर; आँगन; शयन कक्ष; बरामदा; अध्ययन कक्ष; अतिथि गृह; आश्रम; झोपड़ी

  14. Design and Implementation • Basic relations or lexical links are between synonym sets • The lexical database is stored in MySQL • Sub-tasks identified • Database design • Data entry interface • Implementation of Organizer Utility • Application programs to access and display the information in the lexical database

  15. Data Entry Interface • GUI designed in Java/JFC • Separate screens for data entry of different categories • Automatic generation of synset IDs • Screen to view the entered data

  16. Synset Entry Interface

  17. Organizer Utility • Designed to preprocess the data • Reflexive pointers are generated • e.g. if A is a hypernym of B, then "B is a hyponym of A" is automatically generated • Each semantic relation is mapped to a separate table (normalized) • Font conversion • Roman Hindi → DV-TTYogesh
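
The reflexive-pointer step can be sketched as follows. The slides state that the actual Organizer Utility is written in Perl, so this Python version is only an illustration of the idea. The triple format, the `INVERSE` mapping, and the function name `add_reflexive_pointers` are assumptions, not taken from the original tool.

```python
# Assumed mapping from each relation to its inverse (not from the original tool).
INVERSE = {
    "hypernym": "hyponym",
    "hyponym": "hypernym",
    "meronym": "holonym",
    "holonym": "meronym",
}


def add_reflexive_pointers(relations):
    """Return the input triples plus the inverse of each one.

    Each triple (a, relation, b) is read as "a is a <relation> of b".
    """
    triples = list(relations)  # copy, so generators can be passed in safely
    result = set(triples)
    for a, relation, b in triples:
        inverse = INVERSE.get(relation)
        if inverse is not None:
            result.add((b, inverse, a))
    return result


# "A is a hypernym of B" yields "B is a hyponym of A" automatically.
expanded = add_reflexive_pointers([("A", "hypernym", "B")])
```

Generating the inverse links in a preprocessing pass means the data entry operators only have to record each relation once.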

  18. Storage Structure • Relation between Synsets • tblNounHypernyms • Relation between Word-forms • tblNounAntonyms
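
The difference between the two kinds of table can be sketched as a schema. Only the table names `tblNounHypernyms` and `tblNounAntonyms` come from the slide. The column names, the sample rows, and the use of Python's built-in `sqlite3` in place of MySQL are all assumptions made so that the example is self-contained.

```python
import sqlite3

# In-memory SQLite stands in for the MySQL back end (assumption for the sketch).
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Relation between synsets: a pair of synset ids is enough.
    CREATE TABLE tblNounHypernyms (
        synset_id   INTEGER NOT NULL,
        hypernym_id INTEGER NOT NULL,
        PRIMARY KEY (synset_id, hypernym_id)
    );
    -- Relation between word forms: each side names a word within a synset.
    CREATE TABLE tblNounAntonyms (
        synset_id         INTEGER NOT NULL,
        word              TEXT    NOT NULL,
        antonym_synset_id INTEGER NOT NULL,
        antonym_word      TEXT    NOT NULL
    );
    """
)

# Placeholder rows; the ids and words are invented for illustration.
conn.execute("INSERT INTO tblNounHypernyms VALUES (?, ?)", (1, 2))
conn.execute("INSERT INTO tblNounAntonyms VALUES (?, ?, ?, ?)", (3, "day", 4, "night"))

hypernyms = conn.execute(
    "SELECT hypernym_id FROM tblNounHypernyms WHERE synset_id = ?", (1,)
).fetchall()
antonyms = conn.execute(
    "SELECT antonym_word FROM tblNounAntonyms WHERE word = ?", ("day",)
).fetchall()
```

Keeping a word column in the antonym table reflects the point made earlier in the slides that antonymy holds between word forms, whereas hypernymy holds between whole synsets.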

  19. System Statistics • Over 8500 synsets entered in the database • MySQL used as the back-end database server • Data entry interface designed in Java/JFC • Organizer utility written in Perl • Web-based data retrieval system developed in HTML and PHP • DV-TTYogesh font used to display Hindi text

  20. Application of WordNet • Word Sense Disambiguation • Interface to Internet Search Engines • Text classification • Information Retrieval system • Document Similarity

  21. Conclusion • The structure of the Hindi language has been studied and new features have been introduced in the Hindi WordNet • Currently over 8500 synsets have been inserted into the database • The MySQL database has been found to be quite efficient • The web interface for querying the lexical database is under continuous evolution
