This research paper addresses the problem of finding the appropriate word(s) whose meaning matches a given definition. It proposes a Meaning-to-Word (MTW) system that uses Turkish Monolingual Dictionary and Turkish WordNet as resources for word retrieval. The system employs techniques such as tokenization, stemming, stop word elimination, stem matching, and query expansion to improve the accuracy of word retrieval. The results show that the MTW system outperforms traditional dictionary-based methods in finding suitable words for a given definition.
USING WORDNET TO RETRIEVE WORDS FROM THEIR MEANINGS İlknur Durgar El-Kahlout and Kemal Oflazer Sabancı University İstanbul, Turkey
Problem • For a given definition, find the appropriate word (or words) • A traditional dictionary is of no use • From a dictionary, find an appropriate word that has a “similar” definition
Example • User definition: Akımı ölçmek için kullanılan alet (A device that is used to measure the current) • In the dictionary: akımölçer: elektrik akımının şiddetini ölçmeye yarayan araç, ampermetre (ammeter: a device that measures the intensity of electrical current, amperemeter)
Applications • Computer-assisted language learning • Solving crossword puzzles • Reverse dictionary
Outline • Problem statement • Meaning-to-Word System (MTW) • Our Approach • Methods • Results • Result Summary • Conclusion
Problem Statement • Find the “similarity” between two definitions • User definition: Akımı ölçmek için kullanılan alet (A device that is used to measure the current) • Dictionary definition: Elektrik akımının şiddetini ölçmeye yarayan araç, ampermetre (a device that measures the intensity of electrical current, amperemeter)
Meaning-to-Word (MTW) • addresses the problem of finding the appropriate word (or words), whose meaning “matches” the given definition • Two subproblems • finding words whose definitions are "similar" to the query in some sense • ranking the candidate words using a variety of ways
Information Flow in MTW • User Definition → query → Search in Dictionary → candidates → Rank Candidates → List of words
Available Resources • Turkish Monolingual Dictionary • About 50,000 entries • Turkish WordNet • About 11,000 synsets
Normalization • User Definition → Normalization → query → Search in Dictionary → candidates → Rank Candidates → List of words
Normalization • Tokenization • Stemming • Stop Word Elimination
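The three normalization steps can be sketched as follows. This is only an illustration: the stop-word list and the suffix table are tiny hypothetical stand-ins, and a real Turkish system would use a morphological analyzer rather than suffix stripping.

```python
import re

# Hypothetical, tiny resources for illustration only; the slides do not list the real ones.
STOP_WORDS = {"için", "ve", "bir", "ile"}
SUFFIXES = ["ının", "ılan", "mek", "mak", "ini", "ı"]


def tokenize(text):
    # Lowercase and keep runs of letters. Turkish-specific casing (I/ı, İ/i) is not handled here.
    return re.findall(r"[^\W\d_]+", text.lower())


def stem(token):
    # Toy stemmer: strip the longest listed suffix, keeping at least two characters.
    for suffix in sorted(SUFFIXES, key=len, reverse=True):
        if token.endswith(suffix) and len(token) - len(suffix) >= 2:
            return token[: -len(suffix)]
    return token


def normalize(definition):
    # Tokenization, then stemming, then stop-word elimination (the order on the slide).
    stems = [stem(t) for t in tokenize(definition)]
    return [s for s in stems if s not in STOP_WORDS]


print(normalize("akımı ölçmek için kullanılan alet"))
```

With these toy tables the example query reduces to the stems akım, ölç, kullan, and alet.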
Query Processing • User Definition → query → Query Processing → Search in Dictionary → candidates → Rank Candidates → List of words
Query Processing • Subset Generation • Search with different sets of words • Select informative words from the user’s query • Query: daha önce hiç evlenmemiş kişi (a person who has never been married) • {önce, evlen, kişi} (before, marry, person) • {evlen, kişi} (marry, person), {önce, kişi} (before, person), {önce, evlen} (before, marry) • {evlen} (marry), {önce} (before), {kişi} (person)
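A minimal sketch of subset generation, assuming the query has already been normalized to a list of stems. The function name generate_subsets is ours, not taken from the slides.

```python
from itertools import combinations


def generate_subsets(stems):
    # Enumerate every non-empty subset of the query stems, largest subsets first.
    subsets = []
    for size in range(len(stems), 0, -1):
        subsets.extend(combinations(stems, size))
    return subsets


for subset in generate_subsets(["önce", "evlen", "kişi"]):
    print(subset)
```

A query with n stems yields 2^n - 1 subsets, so this enumeration is only practical for short definitions.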
Query Processing • Subset Sorting • An unordered list of subsets is insufficient • Rank the generated subsets • 1) By the number of words: {önce, evlen, kişi} (before, marry, person), then {evlen, kişi} (marry, person) • 2) By the sum of the logarithms of the word frequencies: {evlen, kişi} (marry, person), then {önce, kişi} (before, person)
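The two sorting criteria could be combined as in the sketch below. The frequency counts are invented for illustration, and the direction of the second criterion is an assumption: the slides do not say whether a lower or a higher sum ranks first, so here rarer (presumably more informative) subsets come first.

```python
import math

# Hypothetical corpus frequencies; the real counts are not given in the slides.
FREQ = {"önce": 5000, "evlen": 300, "kişi": 2000}


def log_freq_sum(subset):
    # Sum of the logarithms of the word frequencies in the subset.
    return sum(math.log(FREQ[w]) for w in subset)


def sort_subsets(subsets):
    # 1) More words first; 2) assumed tie-break: smaller log-frequency sum first.
    return sorted(subsets, key=lambda s: (-len(s), log_freq_sum(s)))


subsets = [
    ("önce",), ("evlen",), ("kişi",),
    ("önce", "kişi"), ("evlen", "kişi"), ("önce", "evlen"),
    ("önce", "evlen", "kişi"),
]
for subset in sort_subsets(subsets):
    print(subset)
```

With these made-up counts, {evlen, kişi} is placed before {önce, kişi}, which matches the order shown on the slide.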
Searching for Meanings • User Definition → query → Search in Dictionary → candidates → Rank Candidates → List of words
Searching for Meanings • Two methods • Stem Matching • Query Expansion (using WordNet)
Stem Matching • Morphological normalization of words • Find meanings that contain morphological variants of the original definition
Stem Matching (Ex.) • Query: akımı ölçmek için kullanılan alet (A device that is used to measure the current) • Candidate stems per word: akımı: ak (white), akım (current), akı (flux); ölçmek: ölç (measure); için: için (to), iç (drink); kullanılan: kullan (use), kul (slave); alet: alet (device) • On the original slide, the matching stems are highlighted in color
Stem Matching • Query: akımı ölçmek için kullanılan alet (A device that is used to measure the current) • Dictionary definition: elektrik akımının şiddetini ölçmeye yarayan araç, ampermetre (a device that measures the intensity of electrical current, amperemeter)
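Stem matching can be sketched as counting how many query words have at least one candidate stem in a dictionary definition. The dictionary below is a hypothetical, pre-stemmed toy with invented stem sets; it is not the real resource, and the function names are ours.

```python
def count_matches(query_stems, definition_stems):
    # A query word matches if any of its candidate stems occurs in the definition.
    return sum(1 for candidates in query_stems if candidates & definition_stems)


def search(query_stems, dictionary):
    # Return every headword whose definition shares at least one stem with the query.
    hits = {}
    for word, definition_stems in dictionary.items():
        matched = count_matches(query_stems, definition_stems)
        if matched:
            hits[word] = matched
    return hits


# Candidate stems for "akımı ölçmek için kullanılan alet", one set per word.
query = [{"ak", "akım", "akı"}, {"ölç"}, {"için", "iç"}, {"kullan", "kul"}, {"alet"}]

# Hypothetical pre-stemmed definitions.
dictionary = {
    "akımölçer": {"elektrik", "akım", "şiddet", "ölç", "yara", "araç", "ampermetre"},
    "kalem": {"yaz", "kullan", "araç"},
    "elma": {"meyve"},
}

print(search(query, dictionary))
```

In this toy run, alet (device) in the query does not match araç (device) in the definition of akımölçer, because the two words share no stem.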
Stem Matching • Drawbacks • Generates noisy stems: ilim (science, my city) → ilim (science), il (city) • Conflates two words with very different meanings to the same stem: ilim (science, my city), ilde (in the city) → il (city) • Cannot find relations between similar words: kimse (someone) / kişi (person), bölüm (part) / kısım (portion)
Using Query Expansion • Two different approaches: • Expand the query with related words (synonyms, specializations, generalizations) • Expand the query with terms from the relevant answers to the unexpanded query • WordNet synonyms are used in MTW: {besin, gıda} (food, nourishment), {iyileş, düzel} (to get better) / {iyileş, geliş} (to improve)
Query Expansion (Ex.) • Query: akımı ölçmek için kullanılan alet (A device that is used to measure the current) • Stems: ak (white), akım (current), akı (flux); ölç (measure); için (to), iç (drink); kullan (use), kul (slave); alet (device) • Synonyms added from WordNet: beyaz, faydalan, araç, debi, yararlan, gereç, akış, köle
Query Expansion (Ex.) • Query: akımı ölçmek için kullanılan alet (A device that is used to measure the current) • Dictionary definition: elektrik akımının şiddetini ölçmeye yarayan araç, ampermetre (a device that measures the intensity of electrical current, amperemeter)
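Synonym-based expansion can be sketched by adding the synonyms of each candidate stem before matching. The synonym table below is a small hand-made stand-in for Turkish WordNet, the assignment of synonyms to stems is our reading of the example, and the stemmed definition is a toy.

```python
# Hypothetical stand-in for WordNet synsets.
SYNONYMS = {
    "alet": {"araç", "gereç"},
    "kullan": {"faydalan", "yararlan"},
    "ak": {"beyaz"},
    "kul": {"köle"},
}


def expand(query_stems):
    # Add the synonyms of every candidate stem to that word's candidate set.
    expanded = []
    for candidates in query_stems:
        extra = set()
        for stem in candidates:
            extra |= SYNONYMS.get(stem, set())
        expanded.append(candidates | extra)
    return expanded


def count_matches(query_stems, definition_stems):
    # A query word matches if any of its candidates occurs in the definition.
    return sum(1 for candidates in query_stems if candidates & definition_stems)


query = [{"ak", "akım", "akı"}, {"ölç"}, {"için", "iç"}, {"kullan", "kul"}, {"alet"}]
definition = {"elektrik", "akım", "şiddet", "ölç", "yara", "araç", "ampermetre"}

print(count_matches(query, definition))
print(count_matches(expand(query), definition))
```

After expansion, alet also matches through its synonym araç, so the toy definition of akımölçer gains one matched word.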
Ranking • User Definition → query → Search in Dictionary → candidates → Rank Candidates → List of words
Ranking • A very important part of MTW • Having the right answer in the retrieved set is not enough • The aim is to have the right answer near the top of the retrieved set (e.g., within the top 50 answers)
Ranking • Simple but effective methods • Number of matched words • Subset informativeness - frequency of words in the subset • Ratio of number of matched words to the number of words in the candidate dictionary definition • Longest Common Subsequence - order of the matched words
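Three of the ranking signals listed above (number of matched words, match ratio, and longest common subsequence) can be sketched as follows. How the signals are combined is an assumption here: this sketch simply compares them in that order, and the candidate definitions are invented, pre-stemmed toys.

```python
def lcs_length(a, b):
    # Length of the longest common subsequence of two stem lists (dynamic programming).
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            curr.append(prev[j - 1] + 1 if x == y else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def score(query, definition):
    matched = len(set(query) & set(definition))
    ratio = matched / len(definition)  # matched words relative to definition length
    return (matched, ratio, lcs_length(query, definition))


def rank(query, candidates):
    # Higher score tuples first; the tuple order is our assumption, not the authors' formula.
    return sorted(candidates, key=lambda w: score(query, candidates[w]), reverse=True)


query = ["akım", "ölç", "araç"]
candidates = {  # hypothetical stemmed definitions
    "pusula": ["yön", "bul", "araç"],
    "terazi": ["ağırlık", "ölç", "araç"],
    "akımölçer": ["elektrik", "akım", "şiddet", "ölç", "araç"],
}
print(rank(query, candidates))
```

The longest common subsequence rewards candidates whose matched words appear in the same order as in the query.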
Some Statistics • Training sets: • 50 queries from users • 50 queries from a dictionary • Test sets: • 50 queries from users • 50 queries from a separate dictionary
Stem Matching (all stems included) • Low percentage in the top 10 for user queries, but very high results for dictionary queries
Stem Matching (longest stem included, heuristic) • Improvement for user queries, slightly better performance for dictionary queries
Query Expansion (WordNet, all stems included) • Better results for user queries, no change for dictionary queries
Query Expansion (WordNet, longest stem included, heuristic) • Better performance than ‘longest stem matching’ for user queries, but worse performance for dictionary queries
Result Summary • Stem Matching (longest stem included) • 60% success in real user queries • 96% success in dictionary queries • Query Expansion (all stems included) • 68% success in real user queries • 92% success in dictionary queries
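A success rate of this kind can be computed as the fraction of queries whose correct word appears within the first N ranked results. The sketch below assumes that definition of success (the conclusion refers to the first 50 results); the function name and the sample data are hypothetical.

```python
def success_at_n(ranked_results, gold, n=50):
    # Fraction of queries whose gold word is among the first n returned words.
    hits = sum(
        1 for query, answer in gold.items()
        if answer in ranked_results.get(query, [])[:n]
    )
    return hits / len(gold)


# Toy data: two queries, each with a ranked candidate list (made up for illustration).
ranked_results = {
    "q1": ["akımölçer", "terazi"],
    "q2": ["evli", "bekar"],
}
gold = {"q1": "akımölçer", "q2": "bekar"}

print(success_at_n(ranked_results, gold, n=1))
print(success_at_n(ranked_results, gold, n=2))
```

Under this definition, 34 of 50 queries answered within the cutoff would correspond to a 68% success rate.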
Conclusion • We have implemented a ‘Meaning to Word’ system for Turkish • Results on unseen data are rather satisfactory • Query expansion is better on real user queries • However, it cannot find the words for all queries • 68% of real user queries and 90% of dictionary queries are found in the first 50 results