
IndoWordNet Database Design


  1. IndoWordNet Database Design
     Presented by: Konkani NLP Team, Goa University

  2. Brief Outline
     • Objectives
     • Background
     • Requirements
     • Proposed database design
     • Database design details
     • Issues to be resolved
     • Tools and scripts
     • APIs
     • IndoWordNet API
     • Layers of the API
     • Class diagram
     • Sample API code

  3. Objectives
     • To finalize the database design.
     • To finalize the tools/scripts necessary for distributing the database.
     • To design the API.
     • To demonstrate the API.

  4. Background
     IndoWordNet is a multilingual WordNet that links the WordNets of different Indian languages. A WordNet is a crucial resource for a language, aiding NLP tasks such as Machine Translation and Information Retrieval. Databases are necessary to maintain the data for one or more WordNets, and the database needs to support the development of both online and offline applications.

  5. Requirements
     • The database design should accommodate multiple languages.
     • Store synsets of different languages.
     • Store semantic relations.
     • Store lexical relationships.
     • Store ontological details.
     • Allow any additional information to be stored for each synset.
     • Avoid duplication of data.
     • Open, scalable, modular design.
     • Independent of the storage technology.

  6. Proposed Database Design
     • Software platform: the reference implementation is done using MySQL.
        • MySQL is available free of charge.
        • It is supported on Windows and Linux.
     • Database design details:
        • wordnet_master: contains language-independent data.
        • wordnet_<respective_language>: contains language-dependent data, i.e. the synset data for a language.
        • wordnet_admin: contains the data needed for administrative purposes.

  7. wordnet_master
     The wordnet_master database maintains the data shared by all languages. It includes the tables for semantic relations, and it will include all ontology-related tables, in English. Language-specific data is kept in the wordnet_<respective_language> database.

  8. List of tables in wordnet_master
     • wn_master_category: maintains the grammatical categories, such as noun and verb.
     • wn_master_language: maintains information about the languages in the database.
     • wn_master_language_lss_range: maintains the language-specific synset (LSS) range for each language.
     • wn_master_synset_file: associates a file with a synset.
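One plausible reading of wn_master_language_lss_range is that each language reserves a contiguous block of synset IDs for synsets that exist only in that language. The Java sketch below assumes exactly that and shows how such a range table could be used to find which language owns an ID; the record `LssRange`, its fields and the class `LssRanges` are hypothetical, not the actual schema.

```java
import java.util.List;
import java.util.Optional;

// Hypothetical row of wn_master_language_lss_range: one reserved
// block of synset IDs per language (field names are assumptions).
record LssRange(String language, long firstId, long lastId) {
    LssRange {
        if (firstId > lastId) {
            throw new IllegalArgumentException("firstId must not exceed lastId");
        }
    }

    // True if the synset ID falls inside this language's reserved block.
    boolean contains(long synsetId) {
        return synsetId >= firstId && synsetId <= lastId;
    }
}

final class LssRanges {
    private LssRanges() {}

    // Returns the language whose reserved block contains the ID, if any.
    static Optional<String> ownerOf(long synsetId, List<LssRange> ranges) {
        return ranges.stream()
                .filter(r -> r.contains(synsetId))
                .map(LssRange::language)
                .findFirst();
    }
}
```

If ranges are kept non-overlapping, at most one language can match, so `findFirst()` is unambiguous.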

  9. Tables for maintaining semantic relations
     • wn_rel_hypernymy_hyponymy: maintains hypernymy and hyponymy, an IS-A-KIND-OF semantic relation between synsets.
     • wn_rel_meronymy_holonymy: maintains meronymy and holonymy, a PART-WHOLE semantic relation between synsets.
     • wn_rel_troponymy: maintains the troponymy semantic relation between synsets.
     • wn_rel_causative: maintains the causative semantic relation between synsets.

  10. (continued)
     • wn_rel_entailment: maintains the entailment semantic relation between synsets.
     • wn_rel_similar: maintains the relation between similar synsets.
     • wn_rel_also_see: maintains relations between synsets other than the regular semantic relations.
     • wn_rel_noun_verb_link: maintains the relation between a noun synset and its associated verb synset.

  11. (continued)
     • wn_rel_noun_adjective_attribute_link: maintains the relation between a noun synset and the associated adjective (attribute) synset that goes with it.
     • wn_rel_adjective_modifies_noun: maintains the relation between an adjective synset and the noun synset it modifies.
     • wn_rel_adverb_modifies_verb: maintains the relation between an adverb synset and the verb synset it modifies.
     • wn_rel_near_synsets: maintains the near-synset relation between synsets.

  12. (continued)
     • wn_property_antonymy_gradation: maintains the properties of relations such as antonymy, e.g. colour or gender.
     • wn_property_meronymy_holonymy: maintains the properties of relations such as meronymy and holonymy, e.g. component-object or feature-activity.
     • wn_relation_types: maintains information about all the relation tables.
     • wn_semantic_relations: maintains the semantic relations with respect to the synsets.
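Because each relation lives in its own table, client code needs one place that maps a relation to its table and records which relations come in pairs. The enum below is a hypothetical sketch of such a mapping, with table names copied from the slides. It assumes that a paired relation (hypernymy/hyponymy, meronymy/holonymy) is read from the shared table with source and target swapped; the real contents of wn_relation_types may differ.

```java
import java.util.Optional;

// Hypothetical mapping from each semantic relation to the wn_rel_* table
// named on the slides. Paired relations share one table; this sketch assumes
// the inverse is obtained by reading the same rows with source and target swapped.
enum Relation {
    HYPERNYMY("wn_rel_hypernymy_hyponymy", "HYPONYMY"),
    HYPONYMY("wn_rel_hypernymy_hyponymy", "HYPERNYMY"),
    MERONYMY("wn_rel_meronymy_holonymy", "HOLONYMY"),
    HOLONYMY("wn_rel_meronymy_holonymy", "MERONYMY"),
    TROPONYMY("wn_rel_troponymy", null),
    CAUSATIVE("wn_rel_causative", null),
    ENTAILMENT("wn_rel_entailment", null),
    SIMILAR("wn_rel_similar", null),
    ALSO_SEE("wn_rel_also_see", null),
    NOUN_VERB_LINK("wn_rel_noun_verb_link", null),
    NOUN_ADJECTIVE_ATTRIBUTE_LINK("wn_rel_noun_adjective_attribute_link", null),
    ADJECTIVE_MODIFIES_NOUN("wn_rel_adjective_modifies_noun", null),
    ADVERB_MODIFIES_VERB("wn_rel_adverb_modifies_verb", null),
    NEAR_SYNSETS("wn_rel_near_synsets", null);

    private final String table;
    private final String inverseName; // null when no paired relation shares the table

    Relation(String table, String inverseName) {
        this.table = table;
        this.inverseName = inverseName;
    }

    String table() {
        return table;
    }

    // The paired relation stored in the same table, if there is one.
    Optional<Relation> inverse() {
        return Optional.ofNullable(inverseName).map(Relation::valueOf);
    }
}
```

With this, `Relation.HYPONYMY.table()` and `Relation.HYPERNYMY.table()` both resolve to wn_rel_hypernymy_hyponymy, so the pair is stored once, in line with the "avoid duplication of data" requirement. The column names are not given in the slides.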

  13. Tables for maintaining ontology relations
     • wn_ontology_nodes: maintains the ontology types or positions (common information, in English).
     • wn_ontology_tree: maintains the hierarchical relationship between ontology types. The root node of the ontology hierarchy has id = 1.
     • wn_ontology_synset_map: links a synset/concept to a position in the ontology.
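Since wn_ontology_tree stores parent-child links and the root has id 1, the full ontology path of a node can be recovered by walking parent links up to the root. The sketch assumes the table's rows have been loaded into a child-to-parent map; the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical in-memory view of wn_ontology_tree: child id -> parent id.
final class OntologyTree {
    static final int ROOT_ID = 1; // the slides state the root node has id 1

    private final Map<Integer, Integer> parentOf;

    OntologyTree(Map<Integer, Integer> parentOf) {
        this.parentOf = Map.copyOf(parentOf);
    }

    // Returns the node ids from the root down to the given node.
    List<Integer> pathFromRoot(int nodeId) {
        List<Integer> path = new ArrayList<>();
        Set<Integer> seen = new HashSet<>();
        int current = nodeId;
        while (current != ROOT_ID) {
            if (!seen.add(current)) {
                throw new IllegalStateException("cycle at node " + current);
            }
            path.add(current);
            Integer parent = parentOf.get(current);
            if (parent == null) {
                throw new IllegalArgumentException("node " + current + " has no parent");
            }
            current = parent;
        }
        path.add(ROOT_ID);
        Collections.reverse(path);
        return path;
    }
}
```

Combined with wn_ontology_synset_map, this gives the chain of ontology positions above the node a synset is mapped to. The cycle check guards against bad data, since the walk would otherwise never terminate.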

  14. wordnet_<respective_language>
     The wordnet_<respective_language> database keeps the tables with information specific to a particular language, such as synset details, the words of the language and examples. <respective_language> is replaced by the name of any language in the IndoWordNet group, e.g. Assamese, Hindi, Konkani, Oriya, Punjabi or Urdu, as applicable; for example, wordnet_bodo.
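Because language databases follow the wordnet_<respective_language> pattern, a client can derive the database name from the language name. Database names cannot be bound as JDBC parameters, so a sketch like the one below validates the identifier before it is spliced into SQL. The allowed-character rule, the reserved-name check and the class name are assumptions.

```java
import java.util.Locale;
import java.util.Set;
import java.util.regex.Pattern;

// Hypothetical helper for the three-database layout described in the slides.
final class WordNetDatabases {
    static final String MASTER = "wordnet_master";
    static final String ADMIN = "wordnet_admin";

    // Assumed rule: lowercase ASCII letters only, e.g. "konkani", "bodo".
    private static final Pattern LANGUAGE = Pattern.compile("[a-z]+");
    // These suffixes are taken by the shared databases.
    private static final Set<String> RESERVED = Set.of("master", "admin");

    private WordNetDatabases() {}

    static String forLanguage(String language) {
        String normalized = language.trim().toLowerCase(Locale.ROOT);
        if (!LANGUAGE.matcher(normalized).matches() || RESERVED.contains(normalized)) {
            throw new IllegalArgumentException("invalid language name: " + language);
        }
        return "wordnet_" + normalized;
    }
}
```

For example, `WordNetDatabases.forLanguage("Konkani")` yields `wordnet_konkani`, while an input containing spaces or semicolons is rejected before it can reach a query.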

  15. wordnet_admin
     This database keeps other related tables, such as:
     • Feedback table
     • FAQ table
     • Website administration tables
     • User + password table
     • …

  16. Fig. 1: Some of the important tables in the WordNet database, colour-coded to distinguish the data shared by all languages (language-independent data) from the data that differs for each language (language-dependent data).

  17. Issues to be Resolved
     • Should the following tables be stored as language-independent or as language-dependent data, in view of the changes in POS category reported by the language groups?
        • wn_rel_adjective_modifies_noun
        • wn_rel_adverb_modifies_verb
        • wn_rel_noun_adjective_attribute_link
        • wn_rel_noun_verb_link

  18. (continued)
     • In the table wn_ontology_nodes the data should be only in English; data in other languages can be kept in the respective language databases.
     • Needs to be done NOW: approve the master and <respective_language> tables of each language.

  19. Tools & Scripts
     • A tool to populate data into the various tables of the database, such as:
        • wn_synset
        • wn_word
        • wn_synset_words
        • wn_synset_example
     • Scripts to create language-specific data tables.
     • Scripts to dump and restore data.
     • Scripts to manage/update incremental changes made to tables in wordnet_master.
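For the dump-and-restore scripts, MySQL's standard mysqldump client can export each database separately, so wordnet_master and a single language database can be distributed independently. The Java sketch below only assembles the command line and does not run it. `--user`, `--single-transaction`, `--result-file` and `--databases` are existing mysqldump options; the class name and the `<database>.sql` file naming are assumptions.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical builder for a per-database mysqldump invocation.
final class DumpCommand {
    private DumpCommand() {}

    static List<String> forDatabase(String user, String database) {
        List<String> cmd = new ArrayList<>();
        cmd.add("mysqldump");
        cmd.add("--user=" + user);
        cmd.add("--single-transaction");                 // consistent snapshot for InnoDB tables
        cmd.add("--result-file=" + database + ".sql");   // assumed file naming
        cmd.add("--databases");                          // emit CREATE DATABASE / USE statements
        cmd.add(database);
        return cmd;
    }
}
```

The list can be passed to `new ProcessBuilder(cmd).inheritIO().start()`. Restoring is then a matter of feeding the file to the `mysql` client. The password is deliberately left to a MySQL option file rather than the command line.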

  20. Graphical user interface to populate data into the database tables

  21. Questions?

  22. APIs
     • An Application Programming Interface (API) is a set of commands, functions and protocols that programmers can use when building software.
     • It allows programmers to use predefined functions to interact with a system instead of writing them from scratch.
     • Characteristics of a good API:
        • Easy to learn and use; hard to misuse.
        • Easy to read and maintain the code that uses it.
        • Programming-language neutral.
        • Sufficiently powerful to support all computational requirements.

  23. IndoWordNet API
     • It allows a user to use the API without knowledge of the database design.
     • The API has an object-oriented design.
     • The API is designed to support single or multiple languages.
     • The API design consists of two layers:
        • Application layer
        • Database layer
     • The Database layer changes depending on the DBMS, but the Application layer remains mostly unchanged.
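The split between the Application layer and the Database layer can be sketched as an interface that the application code depends on, with one implementation per DBMS. Everything below is hypothetical, since the slides do not give the real method signatures. An in-memory store stands in for a MySQL-backed one so the sketch runs without a database.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Database layer contract: the only thing the application layer sees.
interface SynsetStore {
    Optional<String> findConcept(long synsetId);
    boolean updateConcept(long synsetId, String concept);
}

// One possible implementation; a JDBC/MySQL version would implement the
// same interface, so swapping the DBMS leaves the application layer unchanged.
final class InMemorySynsetStore implements SynsetStore {
    private final Map<Long, String> concepts = new HashMap<>();

    void put(long synsetId, String concept) {
        concepts.put(synsetId, concept);
    }

    @Override
    public Optional<String> findConcept(long synsetId) {
        return Optional.ofNullable(concepts.get(synsetId));
    }

    @Override
    public boolean updateConcept(long synsetId, String concept) {
        // replace() only updates existing keys and returns null otherwise
        return concepts.replace(synsetId, concept) != null;
    }
}

// Application layer: written against the interface only.
final class SynsetService {
    private final SynsetStore store;

    SynsetService(SynsetStore store) {
        this.store = store;
    }

    String conceptOf(long synsetId) {
        return store.findConcept(synsetId)
                .orElseThrow(() -> new IllegalArgumentException("unknown synset " + synsetId));
    }
}
```

Replacing `InMemorySynsetStore` with a database-backed class requires no change to `SynsetService`. This is the property the two-layer design aims for.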

  24. Application Layer
     • The Application layer implements the logic of the IndoWordNet requirements, providing classes and objects for all the operations performed on synsets, relations, the ontology, other master data, etc.
     • A reference implementation is being done in Java and PHP.

  25. The Application layer consists of the following classes:
     • IWAPIClass: initialises the API library for use, maintains the master tables and manages connectivity to the language-specific databases.
     • IWSynset: represents a synset.
     • IWWord: represents a word.
     • IWSynsetCollection: a collection of synsets.
     • IWWordCollection: the collection of words for a synset.
     • IWOntology: represents the ontology; each synset is mapped to a position in the ontology tree.

  26. (continued)
     • IWOntologyCollection: the collection of child nodes for a given ontology node.
     • IWExampleCollection: a collection of examples.
     • IWFile: represents a file.
     • IWDataFile: represents a data file.
     • IWPictureFile: represents a picture file.
     • IWFileCollection: a collection of files.

  27. The Application layer allows operations such as:
     • get all the synsets
     • get the various relations for a given synset/word
     • get the words for a given synset
     • add a new source or domain
     • add a new relation
     • update the records in a table
     • delete a synset/source/domain
     • modify ontology information
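Some of these operations interact: deleting a synset should also remove the relations that mention it. The following hypothetical in-memory sketch covers "add a new relation", "get the relations for a synset" and "delete a synset". The class names and the choice to store relations as (source, type, target) triples are assumptions, not the actual IndoWordNet classes.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// A directed semantic relation between two synsets (hypothetical shape).
record SynsetRelation(long source, String type, long target) {}

final class SynsetGraph {
    private final Map<Long, String> concepts = new HashMap<>();
    private final List<SynsetRelation> relations = new ArrayList<>();

    void addSynset(long id, String concept) {
        concepts.put(id, concept);
    }

    // Adds a relation only when both endpoint synsets exist.
    void addRelation(long source, String type, long target) {
        if (!concepts.containsKey(source) || !concepts.containsKey(target)) {
            throw new IllegalArgumentException("unknown synset");
        }
        relations.add(new SynsetRelation(source, type, target));
    }

    // All relations in which the synset is the source or the target.
    List<SynsetRelation> relationsOf(long id) {
        List<SynsetRelation> result = new ArrayList<>();
        for (SynsetRelation r : relations) {
            if (r.source() == id || r.target() == id) {
                result.add(r);
            }
        }
        return result;
    }

    // Deletes the synset and every relation that refers to it.
    boolean deleteSynset(long id) {
        if (concepts.remove(id) == null) {
            return false;
        }
        relations.removeIf(r -> r.source() == id || r.target() == id);
        return true;
    }
}
```

In a MySQL-backed implementation the same cleanup could come from foreign keys declared with ON DELETE CASCADE, assuming the relation tables use InnoDB foreign keys; the slides do not say whether they do.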

  28. Database Layer
     The Database layer encapsulates the database design. It provides a standard interface to the Application layer and supports all the operations that need to be performed on the database.

  29. The Database layer consists of the following classes:
     • IWDb: connects to a language-dependent database.
     • IWCon: sets up a connection to a database.
     • IWStatement: contains all the queries needed by the Application layer, as well as the basic operations such as insertion, update, deletion and selection.
     • IWResult: returns the results of executed queries to the Application layer.
     • IWField: returns the proper data type to the application irrespective of the database data type, and vice versa.
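IWField is described as handing the application the proper data type regardless of the database's own type. A minimal sketch of that idea maps JDBC type codes from `java.sql.Types` to Java classes. The slides do not say which types the real IWField handles, so the selection below, including widening small integers to `Integer`, is an assumption.

```java
import java.sql.Types;

// Hypothetical core of an IWField-style mapping: JDBC type code -> Java class.
final class FieldTypes {
    private FieldTypes() {}

    static Class<?> javaTypeFor(int sqlType) {
        return switch (sqlType) {
            case Types.CHAR, Types.VARCHAR, Types.LONGVARCHAR,
                 Types.NCHAR, Types.NVARCHAR, Types.LONGNVARCHAR -> String.class;
            // small integer types are widened to Integer in this sketch
            case Types.TINYINT, Types.SMALLINT, Types.INTEGER -> Integer.class;
            case Types.BIGINT -> Long.class;
            case Types.BIT, Types.BOOLEAN -> Boolean.class;
            case Types.DATE -> java.sql.Date.class;
            case Types.TIMESTAMP -> java.sql.Timestamp.class;
            case Types.BINARY, Types.VARBINARY, Types.LONGVARBINARY, Types.BLOB -> byte[].class;
            default -> throw new IllegalArgumentException("unsupported SQL type: " + sqlType);
        };
    }
}
```

The type code would come from `ResultSetMetaData.getColumnType(column)`, so the Application layer never needs to know how a column is declared in MySQL or another DBMS.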

  30. Class Diagram

  31. Sample API code
     • Set up the database connection:
          IWDb dbobject = new IWDb(IWAPIClass.Language_Name);
     • Create an object for a synset:
          IWSynset synsetobject = new IWSynset(synsetID, dbobject);
     • Get the concept for a synset:
          String concept = synsetobject.getConcept();
     • Set the concept of a synset:
          boolean flag = synsetobject.setConcept("conceptDefinition");
     • Get the word collection for a synset:
          IWWordCollection words = synsetobject.getWords();

  32. Questions?

  33. THANK YOU
