
Language and Tools for Lexical Resource Management

Asanee Kawtrakul (1), Aree Thunkijjanukij (2), Preeda Lertpongwipusana (1), Poonna Yospanya (1). (1) Department of Computer Engineering, Faculty of Engineering; (2) Thai National AGRIS Center; Kasetsart University.


Presentation Transcript


  1. Language and Tools for Lexical Resource Management • Asanee Kawtrakul (1), Aree Thunkijjanukij (2), Preeda Lertpongwipusana (1), Poonna Yospanya (1) • (1) Department of Computer Engineering, Faculty of Engineering; (2) Thai National AGRIS Center; Kasetsart University • 23 January 2003, APAN-Fukuoka

  2. Acknowledgement • JIRCAS: Japan International Research Center for Agricultural Sciences • Organizing committee • Kasetsart University

  3. Outline • Background & Motivation • Problems in Lexical Resource Preparation • Requirements for Lexical Resource Management • Proposed Language and tools • Conclusion and Next steps

  4. Background and Motivation • Thailand is an agriculture-based country with rich knowledge and data in the agricultural field • A great quantity of agricultural information is scattered across unstructured and unrelated text • Skimming/digesting and integrating become essential • Knowledge is spread around the world • Knowledge discovery without a language barrier is also needed

  5. The Basic Idea Behind [diagram components: Internet; Gathering Module; Agricultural Document Collection; Indexing and Clustering Module; Data Cube; Summarization Module; Translation Module; Graphical User Interface]

  6. Textual Data as an Input • “Let us focus on Canada’s agricultural products. In 1998, there were 1,216 registered commercial egg producers in Canada. Ontario produced 39.8% of all eggs in Canada; Quebec was second with 16.6%. The western provinces have a combined egg production of 35.6% and the eastern provinces have a combined production of 8.0%.” • Courtesy of Agriculture and Agri-Food Canada, http://www.agr.ca/cb

  7. Summarization and Translation as a Result

  8. The Development of Agricultural System for Knowledge Acquisition and Dissemination • 5-year project (2001-2005) • Collaborative work between: • Thai National AGRIS Center: providing the bilingual thesaurus (AGROVOC) • Department of Computer Engineering: developing NLP techniques for searching, summarizing, and translation, including tools for lexical resource management • Funded by Kasetsart University Research and Development Institution

  9. [Diagram components: Document Warehouse; Acquisition System; Linguist/Domain Expert; Very Large Corpus; Rules; Thesaurus; Lexicon; Linguistic Knowledge Base; Document Indexing & Clustering; Gathering Module; Internet/Intranet] • Intelligent Search Engine • With Translation • With Summarization

  10. Thai Agricultural Thesaurus • The English vocabulary totals 27,531 terms • Only 10,280 terms have been translated into Thai (excluding scientific names) • Scientific names are not translated • e.g., Oryza (genus) sativa (species) for rice, or family names

  11. Problems in a Hand-Coded Thesaurus • Scalability • Reliability and Coherence • Rigidity • Cost

  12. Fermented Fish [diagram labels: Fermented Fish; Foods; Processed Products; Bakery Product; Canned Products; Dietetic Foods; Dried Products; Frozen Foods; Frozen Products; Fermented Foods; Fermented Products; Alcoholic Beverage; milk; Fermented Foods; Fermented Fish]

  13. [Diagram labels: Foods; Processed Products; Products; Fermented Foods; Local Product; Fermented Fish]

  14. “Commercial Vegetables: The September index, at 107, was up 1.9 percent from last month but 3.6 percent below September 1998. Price increases for lettuce, tomatoes, broccoli, and celery more than offset price decreases for onions, carrots, and cucumbers.” • Commercial Vegetable: tomatoes, cucumbers, carrots, broccoli

  15. Expert domain: BT VEGETABLES; BROCCOLI (type=leaf vegetable, color=green); SWEET PEPPER (type=fruit vegetable, color=red, green, yellow); TOMATOES (type=fruit vegetable, color=red, yellow); NT tomatoes: CHERRY TOMATOES (type=fruit vegetable, color=red); RT LYCOPERSICON ESCULENTUM; BT type=taxonomic SOLANACEAE color=red; NT CAPSICUM, NICOTIANA • User category: Commercial Vegetable: broccoli, carrot, tomato • Keyword assigned: tomato, tomatoes

  16. Other Major Problems (1) • Access to textual information • Language variation: • There are many ways to express the same idea. Ex: “thinning flower” uses “deblossoming”; “thinning branch” uses “pruning” • How can the computer know that the words a person uses are related to words found in stored text? Ex: user: “thinning branch”; computer: “pruning”

  17. Requirement (1) • Access to textual information • Need intelligent browsing from related concept to related concept, rather than from occurrences of stemmed character strings
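The concept-based access described in slides 16 and 17 can be sketched as a lookup that first maps a user phrase to its preferred thesaurus term and only then matches documents. This is a minimal illustration, not the project's search engine: the `USE` table holds only the two pairs given on the slide, and `normalize`, `to_concept`, `search`, and the sample documents are hypothetical names and data introduced here.

```python
# Non-preferred phrase -> preferred term, taken from the two slide examples.
USE = {
    "thinning flower": "deblossoming",
    "thinning branch": "pruning",
}


def normalize(text):
    """Lower-case the text and collapse runs of whitespace."""
    return " ".join(text.lower().split())


def to_concept(phrase):
    """Map a user phrase to its preferred term (or itself if unknown)."""
    key = normalize(phrase)
    return USE.get(key, key)


def search(query, documents):
    """Return documents mentioning the query's concept or any of its variants."""
    concept = to_concept(query)
    variants = {concept} | {v for v, c in USE.items() if c == concept}
    # Plain substring matching keeps the sketch short; it is an assumption,
    # not a claim about how the real system matches text.
    return [d for d in documents if any(v in normalize(d) for v in variants)]


# Hypothetical documents, for illustration only.
docs = ["Pruning is done after harvest.", "Apply fertilizer in spring."]
results = search("thinning branch", docs)
```

Here a query for "thinning branch" retrieves the document that only mentions "pruning", which is the behaviour the requirement asks for.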

  18. Other Major Problems(2) • Transforming from unstructured to structured information

  19. Requirement (2) • Need Application-based Frame about product price • Knowledge representation in table form • Consisting of attributes and their values Attributes Values

  20. Problems in Translation: Pragmatic and Semantic • “The September All Farm Products Index was 97 percent of its 1990-92 base, down 1.0 percent from the August index and 2.0 percent below the September 1998 Index” • Using ontology: 0.97 * averagePrice of years 1990-1992; September of year ??; August; Year 1997; down 0.02 * price(September 1998)
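The arithmetic hidden in the index sentence can be made explicit with a small worked example. This is a sketch under stated assumptions: the 1990-92 base average is normalized to 100, "down 1.0 percent from August" and "2.0 percent below September 1998" are both read as relative decreases, and `resolve_index` is a hypothetical helper, not part of the presented system.

```python
def resolve_index(base_average, pct_of_base, pct_down_from_prev, pct_below_year_ago):
    """Turn the relative statements in the sentence into absolute index values."""
    current = pct_of_base / 100 * base_average            # 0.97 * base average
    previous = current / (1 - pct_down_from_prev / 100)   # implied August value
    year_ago = current / (1 - pct_below_year_ago / 100)   # implied September 1998 value
    return current, previous, year_ago


# Assumption: the 1990-92 base average is normalized to 100.
sep, aug, sep_1998 = resolve_index(100.0, 97, 1.0, 2.0)
```

Under these assumptions the September value is 97, and the implied August and September 1998 values come out at roughly 97.98 and 98.98.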

  21. “Year 1990-1992” meaning

  22. Requirement (3) • The lexicon should have semantic constraints between lexical entities and restrictions on usage categories

  23. Summary of Problems Related to the Lexicon • In terms of coverage • Extensional coverage, i.e., the number of entries • Intensional coverage, i.e., the number of information fields • In terms of the semantic domain covered by the application • Meaning interpretation with respect to objects, subject matter, topics of discourse, and pragmatic interpretation • The user category with reference to the intended system users • Commercial products vs. plant products vs. family products

  24. One Solution • Encoding world knowledge in the structures attached to each lexical item, which needs both a language and tools

  25. The Design of Lexicon: Requirement Specification • Macrostructure: lexicon structure in terms of relations between lexical entries • e.g., hierarchical taxonomies, which are characteristic of thesauri of semantically related word families • Microstructure: types of information for each entry • Pronunciation or phonemic transcription • Syntactic properties • Meaning • Pragmatics of their use in real context and language

  26. Microstructure (cont'd) • A lexical entity could contain slots/scripts for each specific domain and needs an intelligent analyzer and language understanding • Supports information extraction • Supplies missing values

  27. Lexical Resource Management Language • which is able to: • Handle heterogeneity of linguistic knowledge structures. • Handle exceptions and inconsistencies of natural languages. • Provide an intuitive means to store and manipulate both linguistic and world knowledge.

  28. Language Features • The language is designed to enable: • Support for heterogeneous structures. • Sufficient provisions to handle exceptions and inconsistencies of natural languages (this is achieved through the +/- operators). • Deduction of knowledge from rules. • Detection and prevention of potential integrity violations.

  29. Language and Tools Specification Requirements • Flexibility: almost any structure can be defined in this model. • Extensibility: extending a structure is simple. • Maturability: structure reformation and deformation are supported. • Integrity: meta-relations help prevent malformed or semantically ill-formed data entries. • Dealing with inconsistencies is feasible.

  30. Some Syntactic Elements • Knowledge manipulations are achieved through these primitives: • def is used to define structures not already existing. • redef changes aspects of existing structures. • undef removes specified structures from the knowledge base. • ret is used to retrieve structures from the knowledge base.

  31. Examples • Hierarchies: tree structures representing generalization semantics, or classes, of atoms. [diagram: thing, with children animate (human, animal) and inanimate] • Caption: A semantic tree represented by a hierarchy structure

  32. Usage Examples • Defining a hierarchy • def thing(animate(human+animal)+inanimate). • Adding the ‘plant’ and ‘vehicle’ concepts • def animate(plant+vehicle). • Reparenting the ‘vehicle’ concept • redef animate(vehicle) inanimate(vehicle). • Removing the ‘human’ concept • undef human. (provided that there is only a single instance of ‘human’)
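To make the effect of the four statements in slide 32 concrete, the sketch below replays them on an in-memory tree. It is an illustrative model only, not the authors' implementation: the `Hierarchy` class and its method names are hypothetical, and the choice to remove a node's whole subtree on `undefine` is an assumption the slides do not confirm.

```python
class Hierarchy:
    """Tree of concepts stored as child -> parent links (hypothetical model)."""

    def __init__(self, root):
        self.parent = {root: None}

    def define(self, parent, *children):
        # Mirrors `def parent(child1+child2).`: add new children under a node.
        if parent not in self.parent:
            raise KeyError(f"unknown parent: {parent}")
        for child in children:
            if child in self.parent:
                raise ValueError(f"already defined: {child}")
            self.parent[child] = parent

    def redefine(self, node, new_parent):
        # Mirrors `redef old(node) new(node).`: move a node to another parent.
        if node not in self.parent or new_parent not in self.parent:
            raise KeyError("unknown node")
        if new_parent == node or new_parent in self.descendants(node):
            raise ValueError("a node cannot be moved under itself")
        self.parent[node] = new_parent

    def undefine(self, node):
        # Mirrors `undef node.`; removing the whole subtree is an assumption.
        for name in [node, *self.descendants(node)]:
            del self.parent[name]

    def children(self, node):
        return sorted(c for c, p in self.parent.items() if p == node)

    def descendants(self, node):
        found = []
        for child in self.children(node):
            found.append(child)
            found.extend(self.descendants(child))
        return found


h = Hierarchy("thing")
h.define("thing", "animate", "inanimate")   # def thing(animate(...)+inanimate).
h.define("animate", "human", "animal")
h.define("animate", "plant", "vehicle")     # def animate(plant+vehicle).
h.redefine("vehicle", "inanimate")          # redef animate(vehicle) inanimate(vehicle).
h.undefine("human")                         # undef human.
```

After these calls, `h.children("animate")` holds `animal` and `plant`, and `vehicle` sits under `inanimate`.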

  33. Usage Examples (2) • Defining case frames for verbs • First, we need to define meta-relations for words belonging to the sub-hierarchy ‘verb’. • def meta case(verb, sub:thing). • def meta case(verb, sub:thing, obj:thing). • Then, we define case frames for several verbs. • def case(eat, sub:human+animal, obj:food). • def case(fly, sub:bird-penguin). (here, we emphasize the use of +/- operators)
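The +/- operators in the case frames above can be read as "include this class" and "exclude this class". The sketch below checks role fillers against such selectors. It is one interpretation, not the language's actual semantics: the `PARENT` table (including `sparrow` and `rice`) is hypothetical sample data, and `parse_selector`, `matches`, and `check` are names introduced for illustration.

```python
# Hypothetical child -> parent links for a small concept hierarchy.
PARENT = {
    "animate": "thing", "human": "animate", "animal": "animate",
    "bird": "animal", "penguin": "bird", "sparrow": "bird",
    "food": "thing", "rice": "food",
}


def is_a(concept, ancestor):
    """True if `concept` equals `ancestor` or lies below it in the hierarchy."""
    while concept is not None:
        if concept == ancestor:
            return True
        concept = PARENT.get(concept)
    return False


def parse_selector(selector):
    """Split a selector such as 'bird-penguin' into (included, excluded) classes."""
    included, excluded, sign, name = [], [], "+", ""
    for ch in selector.replace(" ", "") + "+":
        if ch in "+-":
            if name:
                (included if sign == "+" else excluded).append(name)
            sign, name = ch, ""
        else:
            name += ch
    return included, excluded


def matches(concept, selector):
    included, excluded = parse_selector(selector)
    return (any(is_a(concept, c) for c in included)
            and not any(is_a(concept, c) for c in excluded))


# Case frames copied from the slide.
CASE_FRAMES = {
    "eat": {"sub": "human+animal", "obj": "food"},
    "fly": {"sub": "bird-penguin"},
}


def check(verb, **roles):
    """True if every role filler satisfies the verb's case frame."""
    frame = CASE_FRAMES[verb]
    return set(roles) == set(frame) and all(
        matches(roles[role], frame[role]) for role in frame)


sparrow_flies = check("fly", sub="sparrow")
penguin_flies = check("fly", sub="penguin")
```

With this reading, a sparrow satisfies the `fly` frame while a penguin is rejected by the `-penguin` exception.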

  34. Hierarchy & Set [diagram: a hierarchy rooted at c1 with nodes w1, w2, c2, w3, w4, p1, w7, w5, w6; a set c3 with members f1, f2, f3, f4]

  35. Defining a Hierarchy • def c1(“w1”(“w3”)+c2(“w4”)+“w2”). • def “w5”+“w6” under “w4”. • def “p1”(“w7”) under “w2”. [diagram: the resulting hierarchy over c1, w1, w2, c2, w3, w4, p1, w7, w5, w6]

  36. Manipulating the Hierarchy • redef “w4” under “w2”. • undef “w1”. [diagram: the hierarchy over c1, w1, w2, c2, w3, w4, p1, w7, w5, w6]

  37. Defining a Set • def c3{[f1]+[f2]+[f3]}. • def [f4] in c3. [diagram: set c3 with members f1, f2, f3, f4]

  38. Defining a Relation • def meta r1(c2, c3). Template defined. • def r1(“w4”, [f1]). Relation defined. • def r1(“w1”, [f3]). Constraint violated. Definition not allowed. [diagram labels: c2, w1, w4, w5, w6; c3, f1, f2, f3, f4; r1; r1’ (inherited)]
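The integrity check in slide 38 can be sketched as a relation whose template restricts the first argument to a sub-hierarchy and the second to a set. This is a simplified model, not the tool's implementation: it uses the hierarchy from slide 35 and the set from slide 37, while the `Relation` class, its method names, and the reading that descendants such as w5 and w6 inherit a relation from w4 are assumptions drawn from the "inherited" label.

```python
# Hierarchy from slide 35 as child -> parent links; set c3 from slide 37.
PARENT = {
    "c1": None, "w1": "c1", "w3": "w1", "c2": "c1", "w4": "c2",
    "w2": "c1", "w5": "w4", "w6": "w4", "p1": "w2", "w7": "p1",
}
SETS = {"c3": {"f1", "f2", "f3", "f4"}}


def ancestors(node):
    """Return the node followed by all of its ancestors up to the root."""
    chain = []
    while node is not None:
        chain.append(node)
        node = PARENT.get(node)
    return chain


class Relation:
    """Relation guarded by a meta template (domain class, range set)."""

    def __init__(self, name, domain_class, range_set):
        self.name = name
        self.domain_class = domain_class
        self.range_set = range_set
        self.pairs = {}  # entity -> set of features

    def define(self, entity, feature):
        # The entity must lie under the domain class, the feature in the set.
        if self.domain_class not in ancestors(entity):
            raise ValueError(f"Constraint violated: {entity} is not under {self.domain_class}")
        if feature not in SETS[self.range_set]:
            raise ValueError(f"Constraint violated: {feature} is not in {self.range_set}")
        self.pairs.setdefault(entity, set()).add(feature)

    def lookup(self, entity):
        # Assumed inheritance: collect features defined on the entity or any ancestor.
        result = set()
        for node in ancestors(entity):
            result |= self.pairs.get(node, set())
        return result


r1 = Relation("r1", "c2", "c3")      # def meta r1(c2, c3).
r1.define("w4", "f1")                # accepted: w4 is under c2
try:
    r1.define("w1", "f3")            # rejected: w1 is not under c2
    violation = None
except ValueError as err:
    violation = str(err)
```

In this model `r1.lookup("w5")` returns the feature defined on w4, while the attempt on w1 is refused.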

  39. Synset & Surrogates • A synset is an unnamed set identified by its unique ID. • Members of a synset are considered synonymous, with different degrees of synonymity. • A distance graph is automatically constructed within a synset, with surrogates acting as representatives of synset members. • Entities with identical features are attached to the same surrogate.

  40. Synset & Surrogates [diagram: synset#1 containing entities w1, w2, w3, w4, w6, p1, p2, p3, each carrying features among f1 to f4 and attached to surrogates s1 to s5; the surrogate network is internally constructed]

  41. Synset & Multilingual Lexicon • Synset members are not confined to a single language; entities from different languages may belong to the same synset. • A distance matrix is computed from the number of differing features over each pair of surrogates. • Traversing from a word to its nearest words is handled by the system, so words with potentially the closest semantics can be determined.
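The distance computation described in slides 39 to 41 can be sketched as follows. This is a minimal sketch under assumptions: "number of different features" is interpreted as the size of the symmetric difference of two feature sets, and the surrogate feature sets, the word-to-surrogate mapping, and the function names are hypothetical examples, not data from the presentation.

```python
from itertools import combinations

# Hypothetical surrogates with their feature sets.
SURROGATES = {
    "s1": {"f1", "f4"},
    "s2": {"f1", "f2"},
    "s3": {"f3"},
}
# Hypothetical synset members; w1 and w2 have identical features and
# therefore share surrogate s1.
MEMBERS = {"w1": "s1", "w2": "s1", "w3": "s2", "w4": "s3"}


def feature_distance(a, b):
    """Number of features present in exactly one of the two sets."""
    return len(a ^ b)


def distance_matrix(surrogates):
    """Symmetric distance table keyed by (surrogate, surrogate) pairs."""
    dist = {(s, s): 0 for s in surrogates}
    for a, b in combinations(surrogates, 2):
        d = feature_distance(surrogates[a], surrogates[b])
        dist[a, b] = dist[b, a] = d
    return dist


def nearest_words(word, members, dist):
    """Other synset members at the smallest surrogate distance from `word`."""
    home = members[word]
    others = [(dist[home, s], w) for w, s in members.items() if w != word]
    if not others:
        return []
    best = min(d for d, _ in others)
    return sorted(w for d, w in others if d == best)


DIST = distance_matrix(SURROGATES)
closest_to_w3 = nearest_words("w3", MEMBERS, DIST)
```

Because distances are stored per surrogate rather than per word, members written in different languages can be compared as long as they are attached to surrogates in the same synset.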

  42. Expected Result

  43. Keyword Generated

  44. Keyword generated: “Fruit vegetable”, red

  45. Expert domain: BT VEGETABLES; tomatoes • Keyword generated: “Fruit vegetable”, red

  46. Expert domain: BT VEGETABLES; BROCCOLI (type=leaf vegetable, color=green); tomatoes • Keyword generated: “Fruit vegetable”, red

  47. Expert domain: BT VEGETABLES; BROCCOLI (type=leaf vegetable, color=green); SWEET PEPPER (type=fruit vegetable, color=red, green, yellow); tomatoes • Keyword generated: “Fruit vegetable”, red: Sweet pepper

  48. Expert domain: BT VEGETABLES; BROCCOLI (type=leaf vegetable, color=green); SWEET PEPPER (type=fruit vegetable, color=red, green, yellow); TOMATOES (type=fruit vegetable, color=red, yellow); tomatoes • Keyword generated: “Fruit vegetable”, red: Sweet pepper, Tomatoes

  49. Expert domain: BT VEGETABLES; BROCCOLI (type=leaf vegetable, color=green); SWEET PEPPER (type=fruit vegetable, color=red, green, yellow); TOMATOES (type=fruit vegetable, color=red, yellow); NT tomatoes: CHERRY TOMATOES (type=fruit vegetable, color=red) • Keyword generated: “Fruit vegetable”, red: Sweet pepper, Tomatoes, Cherry Tomatoes

  50. Expert domain: BT VEGETABLES; BROCCOLI (type=leaf vegetable, color=green); SWEET PEPPER (type=fruit vegetable, color=red, green, yellow); TOMATOES (type=fruit vegetable, color=red, yellow); NT tomatoes: CHERRY TOMATOES (type=fruit vegetable, color=red); RT LYCOPERSICON ESCULENTUM; BT type=taxonomic SOLANACEAE color=red; NT CAPSICUM, NICOTIANA • Keyword generated: “Fruit vegetable”, red: Sweet pepper, Tomatoes, Cherry Tomatoes
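The keyword generation walked through in slides 44 to 50 amounts to filtering thesaurus entries by their attributes. The sketch below reproduces that walk with the four entries and attribute values shown on the slides; the list-of-dicts layout and the `generate_keywords` function are hypothetical choices made for illustration.

```python
# Entries and attribute values as shown in the expert-domain thesaurus.
THESAURUS = [
    {"term": "Broccoli", "type": "leaf vegetable", "color": {"green"}},
    {"term": "Sweet pepper", "type": "fruit vegetable",
     "color": {"red", "green", "yellow"}},
    {"term": "Tomatoes", "type": "fruit vegetable", "color": {"red", "yellow"}},
    {"term": "Cherry Tomatoes", "type": "fruit vegetable", "color": {"red"}},
]


def generate_keywords(entries, type_, color):
    """Return the terms whose type matches and whose colors include `color`."""
    return [e["term"] for e in entries
            if e["type"] == type_ and color in e["color"]]


keywords = generate_keywords(THESAURUS, "fruit vegetable", "red")
```

For the query “Fruit vegetable”, red, the filter yields Sweet pepper, Tomatoes, and Cherry Tomatoes, matching the list built up across the slides.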
