Discriminating Word Senses Using McQuitty’s Similarity Analysis

Amruta Purandare, University of Minnesota, Duluth. Advisor: Dr. Ted Pedersen. Research supported by a National Science Foundation (NSF) Faculty Early Career Development Award (#0092784).

Presentation Transcript


  1. Discriminating Word Senses Using McQuitty’s Similarity Analysis • Amruta Purandare, University of Minnesota, Duluth • Advisor: Dr. Ted Pedersen • Research supported by a National Science Foundation (NSF) Faculty Early Career Development Award (#0092784)

  2. Discriminating “line” • They will begin line formation before ceremony • Connect modem to any jack on your line • Quit printing after the last line of each file • Your line will not get tied while you are connected to net • Stand balanced and comfortable during line up • Lines that do not fit a page are truncated • New line service provides reliable connections • Pages are separated by line feed characters • They stand far right when in line formation

  3. The same contexts grouped into three clusters • Cluster 1: They will begin line formation before ceremony; Stand balanced and comfortable during line up; They stand far right when in line formation • Cluster 2: Your line will not get tied while you are connected to net; Connect modem to any jack on your line; New line service provides reliable connections • Cluster 3: Quit printing after the last line of each file; Lines that do not fit a page are truncated; Pages are separated by line feed characters

  4. Introduction • What is Word Sense Discrimination? • Unsupervised learning • [Diagram: Training → Features; Test → Feature Vectors → Similarity matrix → Clusters → Evaluate]

  5. Representing context • Features (from training): Bigrams, Unigrams, Second-Order Co-occurrences (SOCs; Schütze, 1998), Mixture • Feature vectors (binary) • Measuring similarity: Cosine, Match
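
The slide names two similarity measures over binary feature vectors. Below is a minimal Python sketch, assuming that "Match" is the matching coefficient (the number of features two contexts share) and that "Cosine" is the ordinary cosine of the two vectors. The function names and the unigram-only `binary_vector` helper are illustrative, not SenseClusters code.

```python
import math


def binary_vector(context_tokens, features):
    """Map a tokenized context to a 0/1 vector over a fixed feature list.

    Only unigram features are checked here: a feature fires if the word
    occurs anywhere in the context.
    """
    present = set(context_tokens)
    return [1 if f in present else 0 for f in features]


def match(a, b):
    """Matching coefficient: number of features present in both contexts."""
    return sum(x & y for x, y in zip(a, b))


def cosine(a, b):
    """Cosine of two binary vectors: the match count scaled by vector lengths.

    For 0/1 vectors the squared length equals the number of 1s, so
    |a| * |b| = sqrt(sum(a) * sum(b)).
    """
    na, nb = sum(a), sum(b)
    if na == 0 or nb == 0:
        return 0.0
    return match(a, b) / math.sqrt(na * nb)


def similarity_matrix(vectors, measure):
    """Pairwise similarities between all test contexts (symmetric)."""
    return [[measure(u, v) for v in vectors] for u in vectors]
```

For a = [1, 1, 1, 0, 0] and b = [1, 1, 0, 0, 0], `match` gives 2 while `cosine` gives 2/√6 ≈ 0.82. Because cosine divides by the vector lengths, a context that fires many features is not automatically similar to everything, which is one way to read the later conclusion that cosine's scaling helps.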

  6. Feature examples
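
Second-order co-occurrences (Schütze, 1998) represent a context indirectly, through the words that its words co-occur with in training data. The sketch below is one simple version under assumptions of ours: first-order word vectors are raw co-occurrence counts within a window, and a context vector is the average of its words' vectors. `cooccurrence_vectors` and `soc_context_vector` are hypothetical helpers.

```python
from collections import defaultdict


def cooccurrence_vectors(training_contexts, window=5):
    """First-order word vectors: word -> {neighbor: co-occurrence count}.

    Two words co-occur when the second lies at most window - 1 positions
    after the first; counts are recorded in both directions.
    """
    vectors = defaultdict(lambda: defaultdict(int))
    for tokens in training_contexts:
        for i, w in enumerate(tokens):
            for j in range(i + 1, min(i + window, len(tokens))):
                vectors[w][tokens[j]] += 1
                vectors[tokens[j]][w] += 1
    return vectors


def soc_context_vector(context_tokens, word_vectors, dims):
    """Second-order context vector: the average of the word vectors of the
    context's words. Words with no training vector are skipped."""
    rows = [word_vectors[w] for w in context_tokens if w in word_vectors]
    if not rows:
        return [0.0] * len(dims)
    return [sum(r.get(d, 0) for r in rows) / len(rows) for d in dims]
```

Trained on `connect modem to phone jack` and `dial tone on phone line`, the one-word contexts `modem` and `tone` share no words, so their binary first-order vectors have a match of 0; their SOC vectors both carry weight on the `phone` dimension, so they come out similar.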

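  7. McQuitty’s method • Pedersen & Bruce, 1997 • Agglomerative • UPGMA / Average Link • Stopping rules: number of clusters, score cutoff • Update rule: when two clusters merge, a third cluster that scored x and y against them gets the new score (x + y)/2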
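
McQuitty's method can be sketched as below, assuming the input is the similarity matrix of the test contexts and that both stopping rules from the slide are optional parameters (`num_clusters` and `cutoff` are our names). Strictly, the (x + y)/2 update averages the two old scores without weighting by cluster size, the variant often labeled WPGMA; a size-weighted UPGMA update would instead use (n_i·x + n_j·y)/(n_i + n_j).

```python
def _pair(a, b):
    """Canonical key for an unordered pair of cluster ids."""
    return (a, b) if a < b else (b, a)


def mcquitty(sim, num_clusters=None, cutoff=None):
    """Agglomerative clustering with McQuitty's (x + y) / 2 update.

    sim is a symmetric similarity matrix over the test contexts. At each
    step the two most similar clusters merge; the merged cluster's score to
    any other cluster k is the plain average of the two old scores to k.
    Stops when num_clusters clusters remain, or when the best available
    score is below cutoff; with neither set, it merges everything.
    """
    clusters = {i: [i] for i in range(len(sim))}
    score = {(i, j): sim[i][j] for i in clusters for j in clusters if i < j}
    while len(clusters) > 1:
        if num_clusters is not None and len(clusters) <= num_clusters:
            break
        (i, j), best = max(score.items(), key=lambda kv: kv[1])
        if cutoff is not None and best < cutoff:
            break
        clusters[i].extend(clusters.pop(j))  # merge cluster j into cluster i
        for k in clusters:
            if k != i:
                x, y = score[_pair(i, k)], score[_pair(j, k)]
                score[_pair(i, k)] = (x + y) / 2
        # Forget every score that mentions the absorbed cluster j.
        score = {p: v for p, v in score.items() if j not in p}
    return sorted(sorted(members) for members in clusters.values())
```

On a 4 × 4 matrix where contexts 0 and 1 score 0.9 and contexts 2 and 3 score 0.8 (all other pairs scoring lower), asking for two clusters, or setting a cutoff of 0.5, returns [[0, 1], [2, 3]].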

  8. Evaluation • [Confusion table of clusters c1 to c4 against senses sense1 (majority), sense2, sense3, sense4]

  9. Evaluation • Accuracy = 38/55 = 0.69 • [Confusion table with senses reordered as sense3, sense4, sense1, sense2 to align with the clusters]

  10. Majority Sense Classifier • Maj. = 17/55 = 0.31
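
Slides 8 to 10 compare clusters with the gold senses after assigning each cluster to a sense. One plausible reading, which is an assumption here: choose the one-to-one cluster-to-sense assignment that maximizes the number of correctly placed instances, then divide by the total. The sketch brute-forces the assignment (fine for a few clusters, and it assumes there are no more clusters than senses); the confusion matrix in the note below is made up.

```python
from itertools import permutations


def best_mapping_accuracy(confusion):
    """confusion[c][s] = number of instances of sense s placed in cluster c.

    Tries every one-to-one assignment of clusters to senses and keeps the
    one with the most correctly placed instances. Assumes the number of
    clusters does not exceed the number of senses.
    """
    n_clusters, n_senses = len(confusion), len(confusion[0])
    total = sum(map(sum, confusion))
    best_score, best_map = -1, None
    for senses in permutations(range(n_senses), n_clusters):
        score = sum(confusion[c][s] for c, s in enumerate(senses))
        if score > best_score:
            best_score, best_map = score, senses
    return best_score / total, dict(enumerate(best_map))


def majority_baseline(confusion):
    """Accuracy of always guessing the most frequent gold sense."""
    sense_totals = [sum(col) for col in zip(*confusion)]
    return max(sense_totals) / sum(sense_totals)
```

For the hypothetical table [[2, 8, 0], [7, 1, 1], [0, 1, 5]] (rows are clusters, columns are senses), the best assignment maps clusters 0, 1, 2 to senses 1, 0, 2 and gives 20/25 = 0.80, against a majority baseline of 10/25 = 0.40. For more clusters, an assignment solver such as SciPy's `linear_sum_assignment` with `maximize=True` avoids the factorial search.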

  11. Experimental Data

  12. Scope of the experiments • 584 experiments (73 × 4 × 2) • 73 words: 72 Senseval-2 words plus LINE • 4 feature types: Bigrams, Unigrams, SOCs, Mix • 2 similarity measures: Match, Cosine • Window = 5 for Bigrams and SOCs • Frequency cutoff = 2
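
The feature settings above can be sketched as follows. We assume, following our understanding of the NSP window convention, that a window of 5 lets the second word of a bigram sit up to 4 positions after the first, and that a frequency cutoff of 2 keeps only features seen at least twice in training. The helper names are hypothetical.

```python
from collections import Counter


def unigram_features(training_contexts, min_count=2):
    """Words seen at least min_count times in the training contexts."""
    counts = Counter(w for tokens in training_contexts for w in tokens)
    return sorted(w for w, c in counts.items() if c >= min_count)


def bigram_features(training_contexts, window=5, min_count=2):
    """Ordered word pairs (w1, w2) where w2 occurs at most window - 1
    positions after w1 (so window=2 means adjacent words only), kept if
    seen at least min_count times."""
    counts = Counter()
    for tokens in training_contexts:
        for i, w1 in enumerate(tokens):
            for j in range(i + 1, min(i + window, len(tokens))):
                counts[(w1, tokens[j])] += 1
    return sorted(p for p, c in counts.items() if c >= min_count)
```

On a three-sentence toy corpus such as `connect modem to phone line`, `connect modem to the line` and `last line of the page`, window=5 keeps six repeated pairs such as ("connect", "modem") and ("modem", "line"), while window=2 keeps only the adjacent pairs that repeat.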

  13. Senseval-2 Results by POS • 29 nouns (Maj = 0.57), 28 verbs (Maj = 0.51), 15 adjectives (Maj = 0.64) • [Chart: number of words of each POS for which an experiment obtained accuracy above the majority baseline]

  14. Senseval-2 Results by feature • [Bar chart over SOC, UNI and BI; values shown: 32, 18, 38] • 72 words × 2 measures = 144 experiments per feature

  15. Senseval-2 Results by measure • [Bar chart over COS and MAT; values shown: 49, 39] • 72 words × 3 features = 216 experiments per measure

  16. Line Results • Maj = 0.16, since the 6 senses of LINE are uniformly distributed

  17. Sample Confusion Table (fine.soc.cos) • Senses: S0 = elegant, S1 = small-grained, S2 = superior, S3 = satisfactory, S4 = thin • Precision = 36/60 = 60.00%

  18. Conclusions • A small set of SOCs was powerful (half the number of unigram/bigram features) • Scaling done by cosine helps! • Need more training data! • Need to improve feature selection (tests of association), extraction (stemming) and matching (fuzzy matching) strategies for bigrams • Explore new features: POS, collocations

  19. Recent work • PDL implementation • Cluto clustering toolkit (http://www-users.cs.umn.edu/~karypis/cluto): 6 clustering methods, 12 merging criteria • Plans: comparing clustering in similarity space vs. vector space (Schütze, 1998); stopping rules

  20. Sense labeling • “formation”: They will begin line formation before ceremony; Stand balanced and comfortable during line up; They stand far right when in line formation • “phone”: Your line will not get tied while you are connected to net; Connect modem to any jack on your line; New line service provides reliable connections • “text”: Quit printing after the last line of each file; Lines that do not fit a page are truncated; Pages are separated by line feed characters

  21. Software Packages • SenseClusters (Our Discrimination Toolkit) http://www.d.umn.edu/~tpederse/senseclusters.html • PDL (Used to implement clustering algorithms) http://pdl.perl.org/ • NSP (Used for extracting features) http://www.d.umn.edu/~tpederse/nsp.html • SenseTools (Used for preprocessing, feature matching) http://www.d.umn.edu/~tpederse/sensetools.html • Cluto (Clustering Toolkit) http://www-users.cs.umn.edu/~karypis/cluto

More Related