
Second Order Co-occurrence PMI for Determining the Semantic Similarity of Words



  1. Second Order Co-occurrence PMI for Determining the Semantic Similarity of Words. Md. Aminul Islam and Diana Inkpen, University of Ottawa

  2. Outline • Introduction • Different Methods • SOC-PMI Method • A Walk-through Example • Evaluation and Experimental results • Applications and Future Work • Conclusion • References

  3. Introduction • Semantic relatedness: the degree to which two words are related (or not). • Semantic similarity: a special case, or subset, of semantic relatedness. • Humans can easily judge whether a pair of words is related in some way. For example, most would agree that table and furniture are more semantically similar than car and toothbrush. Similarly, glass and water are semantically related but not similar.

  4. Introduction (cont…) • Measures of the semantic similarity of words have been used for a long time in NLP applications and related areas, such as:
  - Automatic creation of thesauri
  - Automatic indexing
  - Text annotation
  - Text summarization
  - Text classification
  - Word sense disambiguation
  - Information extraction and retrieval
  - Automatic correction of word errors in text

  5. Different Methods • Statistical methods:
  - Cosine
  - PMI
  - Average mutual information
  - LSA
  - PMI-IR
  • Dictionary-based methods:
  - Approaches using WordNet and other semantic networks
  - Approaches using Roget's thesaurus
  • Hybrid methods:
  - Resnik's information-based approach
  - Jiang and Conrath's combined approach
  - Lin's universal similarity measure

  6. SOC-PMI method • It is a statistical method • It uses pointwise mutual information to sort lists of important neighbor words of the two target words • Then it considers the words that are common to both lists and aggregates their PMI values (each taken from the opposite list), as sketched in the code below
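
A minimal sketch of this aggregation step, assuming the two PMI-ranked neighbour lists have already been computed (the helper name, the made-up PMI values for 'engineer' and 'market', and the use of list lengths in place of β1 and β2 are all illustrative assumptions; the exact definitions follow in the walk-through):

```python
GAMMA = 3  # the exponent gamma; the walk-through example sets it to 3

def soc_pmi_from_lists(x, y):
    """Combine two PMI-ranked neighbour lists into a SOC-PMI score.
    x maps neighbour words of W1 to their positive PMI with W1;
    y does the same for W2."""
    common = set(x) & set(y)
    # Each common word contributes its PMI from the *opposite* list,
    # raised to GAMMA; list lengths stand in for beta_1 and beta_2 here.
    f_beta1 = sum(y[t] ** GAMMA for t in common)
    f_beta2 = sum(x[t] ** GAMMA for t in common)
    return f_beta1 / len(x) + f_beta2 / len(y)

# Hypothetical neighbour lists: the PMI values of the shared words are the
# walk-through's; the 'engineer' and 'market' values are made up.
x = {"industry": 1.807, "recession": 2.544, "engineer": 1.2}
y = {"industry": 3.029, "recession": 2.544, "market": 0.9}
print(soc_pmi_from_lists(x, y))
```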

  7. A Walk-through Example • W1 = car and W2 = automobile • The following 12 sentences (Table 1) are our corpus of text after preprocessing (tokenization, stop-word elimination and lemmatization). Table 1: Sample texts after cleaning (m = 70 tokens, n = 43 types)

  8. A Walk-through Example (cont…) Table 2: Types and frequencies. The type frequency function f^t(t_i) = |{k : c_k = t_i}|, where i = 1, 2, …, n and c_k is the k-th corpus token, tells us how many times the type t_i appears in the entire corpus, e.g., f^t('car') = 6.
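
In code, the type frequency function is just a token count over the corpus (a minimal sketch; the tiny token list is a stand-in, not the slide's actual 70-token corpus):

```python
from collections import Counter

# The cleaned corpus as a flat list of tokens c_1 .. c_m.
tokens = ["car", "industry", "recession", "car", "automobile", "industry"]
f_t = Counter(tokens)          # f_t[t] = |{k : c_k = t}|
m, n = len(tokens), len(f_t)   # m tokens, n distinct types
print(f_t["car"], m, n)        # 2 6 4 (the slide's corpus: 6, 70, 43)
```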

  9. A Walk-through Example (cont…) Table 3: Bigram frequencies for words W1 and W2 in a window of 11 words (α = 5). Let f^b(t_i, W) = |{k : t_k = W and t_{k±j} = t_i}|, where i = 1, 2, …, n and −α ≤ j ≤ α, be the bigram frequency function. f^b(t_i, W) tells us how many times the word t_i appears with the word W within a window of 2α + 1 words, e.g., f^b('engineer', W1) = 3 and f^b('industry', W2) = 7.
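
A direct implementation of f^b (a sketch; it reads the definition as counting every occurrence of t_i within α positions of each occurrence of W):

```python
from collections import Counter

def bigram_freq(tokens, w, alpha=5):
    """f_b(., W): for every occurrence of W in tokens, count the types
    appearing within alpha positions of it (a window of 2*alpha + 1 words)."""
    f_b = Counter()
    for k, tok in enumerate(tokens):
        if tok != w:
            continue
        lo, hi = max(0, k - alpha), min(len(tokens), k + alpha + 1)
        for j in range(lo, hi):
            if j != k:
                f_b[tokens[j]] += 1
    return f_b

# e.g. bigram_freq(tokens, "car")["engineer"] gives f_b('engineer', W1)
```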

  10. A Walk-through Example (cont…) Table 4: Sets X and Y of words with PMI values
  • Apply the pointwise mutual information (PMI) function only to those words having f^b(t_i, W) > 0.
  • For word W1, we define a set of words X, sorted in descending order by their PMI values with W1, and take the top-most β1 words having f^pmi(t_i, W1) > 0: X = {X_i}, where i = 1, 2, …, β1 and f^pmi(t_1, W1) ≥ f^pmi(t_2, W1) ≥ … ≥ f^pmi(t_{β1−1}, W1) ≥ f^pmi(t_{β1}, W1).
  • Similarly, for word W2 we define a set of words Y, sorted in descending order by their PMI values with W2, taking the top-most β2 words having f^pmi(t_i, W2) > 0.
  • The calculation of β1 and β2 is shown on the next slide.
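
The PMI function itself is not spelled out on this slide; the paper uses the standard corpus PMI built from the frequency functions defined above (stated here for completeness, with m, f^b and f^t as on the previous slides):

```latex
\[
  f^{\mathrm{pmi}}(t_i, W) \;=\; \log_2 \frac{f^{b}(t_i, W)\cdot m}{f^{t}(t_i)\, f^{t}(W)}
\]
```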

  11. A Walk-through Example (cont…)
  • We define the β-PMI summation function. For word W1: f_β(W1) = Σ_{i=1}^{β1} (f^pmi(X_i, W2))^γ, which sums all the positive PMI values of words in set Y also common to the words in set X (each PMI value taken from the opposite list); the function for W2 is defined symmetrically.
  • Here β1 = (ln f^t(W1))² · (log₂ n)/δ = (ln 6)² · (log₂ 43)/0.7 = 24.88 and, similarly, β2 = 24.88 (see the sketch below). For this small corpus, we have chosen δ = 0.7 and γ = 3.
  • Then we compute f_β(W1) = (f^pmi('recession', W2))^γ + (f^pmi('industry', W2))^γ = (2.544)³ + (3.029)³ = 44.255, and similarly f_β(W2) = (f^pmi('industry', W1))^γ + (f^pmi('recession', W1))^γ = (1.807)³ + (2.544)³ = 22.364.
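
The β and f_β computations in code (a sketch: the β formula is reconstructed from the slide's own numbers, since (ln 6)² · log₂(43)/0.7 ≈ 24.88 matches f^t(W1) = 6, n = 43 and δ = 0.7; the PMI values below are the walk-through's):

```python
import math

def beta(f_t_w, n, delta=0.7):
    """Number of top-PMI neighbours kept for a word with corpus
    frequency f_t_w in a corpus of n types."""
    return (math.log(f_t_w) ** 2) * math.log2(n) / delta

print(beta(6, 43))  # 24.886..., the slide's beta_1 (= beta_2 = 24.88)

def f_beta(common, pmi_opposite, gamma=3):
    """beta-PMI summation: sum the positive opposite-list PMI values,
    each raised to gamma, over the words common to both neighbour sets."""
    return sum(pmi_opposite[t] ** gamma for t in common if pmi_opposite[t] > 0)

print(f_beta({"recession", "industry"},
             {"recession": 2.544, "industry": 3.029}))  # ~44.255 = f_beta(W1)
print(f_beta({"recession", "industry"},
             {"recession": 2.544, "industry": 1.807}))  # ~22.365 = f_beta(W2)
```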

  12. A Walk-through Example (cont…) • Finally, the semantic PMI similarity function normalizes each β-PMI sum by its β and adds them: Sim(W1, W2) = f_β(W1)/β1 + f_β(W2)/β2. For the example corpus, Sim(car, automobile) = 3.101.
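
Under the same assumptions as the sketches above, the final combination is one line (the division by β1 and β2 normalizes for how many neighbour words each target word contributes):

```python
def soc_pmi(f_beta_w1, f_beta_w2, beta1, beta2):
    """Semantic PMI similarity: Sim(W1, W2) = f_beta(W1)/beta1 + f_beta(W2)/beta2."""
    return f_beta_w1 / beta1 + f_beta_w2 / beta2
```

Note that W1 and W2 never enter this step directly: only their shared high-PMI neighbours do, which is what lets the measure score word pairs that never co-occur.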

  13. Evaluation and Experimental results • TOEFL and ESL synonym questions • Correlation with human ratings of noun pairs • Example of a synonym question: haphazardly (a) dangerously (b) densely (c) randomly (d) linearly
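
Answering a synonym question then reduces to an argmax over the choices (a sketch; the toy score dictionary is hypothetical, standing in for real SOC-PMI values):

```python
def answer_synonym_question(stem, choices, similarity):
    """Pick the choice scored most similar to the stem word."""
    return max(choices, key=lambda c: similarity(stem, c))

# Hypothetical similarity scores for the slide's example question:
scores = {("haphazardly", "randomly"): 2.9, ("haphazardly", "densely"): 0.4}
sim = lambda a, b: scores.get((a, b), 0.0)
print(answer_synonym_question(
    "haphazardly", ["dangerously", "densely", "randomly", "linearly"], sim))
# -> randomly
```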

  14. Evaluation and Experimental results (cont…) • We used the BNC (British National Corpus) • The BNC has around 100 million tokens and 7 million lines of text. Figure 1: Results on the 80 TOEFL questions

  15. Evaluation and Experimental results (cont…) Figure 2: Results on the 50 ESL questions

  16. Evaluation and Experimental results (cont…) Figure 3: Correlation of noun pairs

  17. Applications and Future Work • Measuring the semantic similarity of two texts • Detecting semantic outliers in speech recognition transcripts • Automatic correction of word errors in text • Discovering word senses directly from text • Resolving semantic heterogeneity in databases • In text mining and data mining, extracting interesting relational information from a corpus • As a tool to aid the automatic construction of lists of synonyms

  18. Conclusions • The SOC-PMI method provides the best results among the statistical approaches compared • It is comparable to the lexicon-based approaches • Because it relies on second-order co-occurrence, it can determine the semantic similarity of two words even when they never co-occur within the window size anywhere in the corpus

  19. References
  • Brown, P. F., DeSouza, P. V., Mercer, R. L., Della Pietra, V. J. and Lai, J. C. 1992. Class-based n-gram models of natural language. Computational Linguistics, 18:467-479.
  • Buckley, C., Salton, G., Allan, J. and Singhal, A. 1995. Automatic query expansion using SMART: TREC 3. In Proceedings of the Third Text REtrieval Conference (TREC-3), Gaithersburg, MD.
  • Grefenstette, G. 1993. Automatic thesaurus generation from raw text using knowledge-poor techniques. In Making Sense of Words, 9th Annual Conference of the UW Centre for the New OED and Text Research.
  • Jarmasz, M. and Szpakowicz, S. 2003. Roget's thesaurus and semantic similarity. In Proceedings of the International Conference RANLP-2003, Borovets, Bulgaria, pages 212-219.
  • Landauer, T. K. and Dumais, S. T. 1997. A solution to Plato's problem: The latent semantic analysis theory of the acquisition, induction, and representation of knowledge. Psychological Review, 104(2):211-240.
  • Lesk, M. E. 1986. Automatic sense disambiguation using machine readable dictionaries: How to tell a pine cone from an ice cream cone. In Proceedings of SIGDOC 1986, Toronto, June.
  • Li, H. and Abe, N. 1998. Word clustering and disambiguation based on co-occurrence data. In Proceedings of COLING-ACL 1998, pages 749-755.
  • Lin, C. Y. and Hovy, E. H. 2003. Automatic evaluation of summaries using n-gram co-occurrence statistics. In Proceedings of HLT-NAACL 2003, Edmonton, Canada, May.
  • Lin, D. 1998. Automatic retrieval and clustering of similar words. In Proceedings of COLING-ACL 1998, pages 768-774.

  20. References (cont…)
  • Madhavan, J., Bernstein, P., Doan, A. and Halevy, A. 2005. Corpus-based schema matching. In Proceedings of the International Conference on Data Engineering (ICDE-05).
  • Miller, G. A. and Charles, W. G. 1991. Contextual correlates of semantic similarity. Language and Cognitive Processes, 6(1):1-28.
  • Pantel, P. and Lin, D. 2002. Discovering word senses from text. In Proceedings of the ACM SIGKDD Conference on Knowledge Discovery and Data Mining, pages 613-619.
  • Resnik, P. 1995. Using information content to evaluate semantic similarity. In Proceedings of the 14th International Joint Conference on Artificial Intelligence, pages 448-453, Montreal.
  • Rubenstein, H. and Goodenough, J. B. 1965. Contextual correlates of synonymy. Communications of the ACM, 8(10):627-633.
  • Turney, P. D. 2001. Mining the web for synonyms: PMI-IR versus LSA on TOEFL. In Proceedings of the Twelfth European Conference on Machine Learning (ECML-2001), pages 491-502.
  • Vechtomova, O. and Robertson, S. 2000. Integration of collocation statistics into the probabilistic retrieval model. In 22nd Annual Colloquium on Information Retrieval Research, Cambridge, England.
  • Xu, J. and Croft, W. B. 2000. Improving the effectiveness of information retrieval with local context analysis. ACM Transactions on Information Systems, 18(1):79-112.
  • Yarowsky, D. 1992. Word-sense disambiguation using statistical models of Roget's categories trained on large corpora. In Proceedings of COLING-92, Nantes, France, pages 454-460.

  21. Thanks
