Learning Within-Sentence Semantic Coherence

Presentation Transcript


  1. Learning Within-Sentence Semantic Coherence. Elena Eneva, Rose Hoberman, Lucian Lita. Carnegie Mellon University

  2. Semantic (in)Coherence • Trigram: content words unrelated • Effect on speech recognition: • Actual Utterance: “THE BIRDFLU HAS AFFECTED CHICKENS FOR YEARS BUT ONLY RECENTLY BEGAN MAKING HUMANS SICK” • Top Hypothesis: “THE BIRDFLU HAS AFFECTED SECONDS FOR YEARS BUT ONLY RECENTLY BEGAN MAKING HUMAN SAID” • Our goal: model semantic coherence

  3. A Whole Sentence Exponential Model [Rosenfeld 1997] • P(s) = (1/Z) · P0(s) · exp( Σi λi fi(s) ) • P0(s) is an arbitrary initial model (typically an N-gram) • fi(s) are arbitrary computable properties of s (aka features) • λi are the corresponding feature weights • Z is a universal normalizing constant
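  A minimal sketch of how a sentence would be scored under such a model, working in log space; the function and argument names below are illustrative, not from the paper:

      def log_score(sentence, log_p0, features, lambdas, log_z=0.0):
          """Whole-sentence exponential model:
              log P(s) = log P0(s) + sum_i lambda_i * f_i(s) - log Z
          log_p0:   callable giving the baseline (e.g. trigram) log-probability of s
          features: list of callables f_i(s), the sentence-level features
          lambdas:  matching list of feature weights
          log_z:    log of the universal normalizing constant Z
          """
          return (log_p0(sentence)
                  + sum(lam * f(sentence) for lam, f in zip(lambdas, features))
                  - log_z)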

  4. A Methodology for Feature Induction • Given a corpus T of training sentences: • Train the best-possible baseline model, P0(s) • Use P0(s) to generate a corpus T0 of “pseudo-sentences” (see the sampler sketch below) • Pose a challenge: find (computable) differences that allow discrimination between T and T0 • Encode the differences as features fi(s) • Train a new whole-sentence exponential model incorporating these features
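  A sketch of step 2, generating the pseudo-corpus T0, assuming the baseline P0 is a count-based trigram model stored as a dict mapping (w1, w2) to a dict of next-word counts; this is illustrative only, not the paper's sampler:

      import random

      def sample_sentence(trigram_counts, max_len=40):
          """Draw one pseudo-sentence by repeatedly sampling the next word
          given the two previous words."""
          history = ("<s>", "<s>")
          words = []
          for _ in range(max_len):
              nexts = trigram_counts.get(history)
              if not nexts:
                  break
              tokens, counts = zip(*nexts.items())
              word = random.choices(tokens, weights=counts, k=1)[0]
              if word == "</s>":
                  break
              words.append(word)
              history = (history[1], word)
          return " ".join(words)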

  5. Discrimination Task: Are these content words generated from a trigram or a natural sentence? • - - - feel - - sacrifice - - sense - - - - - - - - -meant - - - - - - - - trust - - - - truth • - - kind - free trade agreements - - - living - - ziplock bag - - - - - - university japan's daiwa bank stocks step –

  6. Building on Prior Work • Define “content words” (all but the 50 most frequent words) • Goal: model the distribution of content words in a sentence • Simplify: model pairwise co-occurrences (“content word pairs”) • Collect contingency tables and calculate a measure of association for each
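  One way the per-pair contingency tables could be collected is sketched below; the counting scheme (per-sentence co-occurrence) and helper names are assumptions, not the paper's code:

      from collections import Counter
      from itertools import combinations

      def cooccurrence_tables(sentences, stopwords):
          """Build a 2x2 contingency table for every content-word pair:
          (sentences containing both words, only the first, only the second, neither).
          Content words are everything outside the stopword list (the top-50 words)."""
          n_sentences = 0
          word_in = Counter()   # sentences containing each content word
          pair_in = Counter()   # sentences containing both words of a pair
          for sent in sentences:
              n_sentences += 1
              content = set(w for w in sent if w not in stopwords)
              word_in.update(content)
              pair_in.update(combinations(sorted(content), 2))
          tables = {}
          for (w1, w2), both in pair_in.items():
              only1 = word_in[w1] - both
              only2 = word_in[w2] - both
              neither = n_sentences - both - only1 - only2
              tables[(w1, w2)] = (both, only1, only2, neither)
          return tables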

  7. Q Correlation Measure Derived from Co-occurrence Contingency Table • Q values range from –1 to +1
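  The transcript does not reproduce the formula; assuming Q is the standard Yule's Q association statistic for a 2x2 contingency table (which has exactly this -1 to +1 range), a sketch:

      def yule_q(both, only1, only2, neither):
          """Yule's Q for a 2x2 contingency table: (AD - BC) / (AD + BC).
          +1 = perfect positive association, -1 = perfect negative, 0 = independence."""
          ad = both * neither
          bc = only1 * only2
          return 0.0 if ad + bc == 0 else (ad - bc) / (ad + bc)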

  8. Density Estimates • We hypothesized: • Trigram sentences: word-pair correlation completely determined by distance • Natural sentences: word-pair correlation independent of distance • Used kernel density estimation to estimate the distribution of Q values in each corpus at varying distances
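  A sketch of the density-estimation step with a Gaussian kernel (the paper does not specify the kernel or tooling, so SciPy's gaussian_kde is an assumption):

      import numpy as np
      from scipy.stats import gaussian_kde

      def q_density_by_distance(q_samples):
          """Fit one kernel density estimate per word-pair distance.
          q_samples: dict mapping distance -> list of observed Q values
          Returns a dict mapping distance -> callable density estimate."""
          return {d: gaussian_kde(np.asarray(qs)) for d, qs in q_samples.items()}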

  9. Q Distributions • [plots: density over Q values at Distance = 1 and Distance = 3, Trigram Generated (dashed) vs. Broadcast News]

  10. Likelihood Ratio Feature • Example: “she is a country singer searching for fame and fortune in nashville” • Q(country, nashville) = 0.76, Distance = 8 • Pr(Q=0.76 | d=8, BNews) = 0.32, Pr(Q=0.76 | d=8, Trigram) = 0.11 • Likelihood ratio = 0.32 / 0.11 = 2.9
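  Continuing the hypothetical sketch above, the likelihood-ratio feature for a single word pair could be computed from the two sets of fitted densities:

      def likelihood_ratio(q, distance, bnews_kde, trigram_kde):
          """Ratio of the estimated density of a Q value under the natural
          (Broadcast News) model vs. the trigram-generated model, at the
          observed word-pair distance."""
          return float(bnews_kde[distance](q)[0] / trigram_kde[distance](q)[0])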

  11. Simpler Features • Q Value based • Mean, median, min, max of Q values for content word pairs in the sentence (Cai et al 2000) • Percentage of Q values above a threshold • High/low correlations across large/small distances • Other • Word and phrase repetition • Percentage of stop words • Longest sequence of consecutive stop/content words
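  A sketch of how such sentence-level features might be computed; the feature names, the threshold value, and the stopword set are placeholders rather than the paper's exact definitions:

      import statistics

      STOPWORDS = set()  # placeholder: the 50 most frequent words

      def simple_features(sentence_qs, words, q_threshold=0.5):
          """Q-value summary statistics plus stopword-based features for one sentence.
          sentence_qs: Q values of all content-word pairs in the sentence
          words:       the tokenized sentence"""
          feats = {
              "q_mean": statistics.mean(sentence_qs),
              "q_median": statistics.median(sentence_qs),
              "q_min": min(sentence_qs),
              "q_max": max(sentence_qs),
              "q_frac_above": sum(q > q_threshold for q in sentence_qs) / len(sentence_qs),
              "stopword_frac": sum(w in STOPWORDS for w in words) / len(words),
          }
          # longest run of consecutive stopwords
          run = best = 0
          for w in words:
              run = run + 1 if w in STOPWORDS else 0
              best = max(best, run)
          feats["longest_stopword_run"] = best
          return feats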

  12. Datasets • LM and contingency tables (Q values) derived from 103 million words of Broadcast News (BN) • From the remainder of the BN corpus and from sentences sampled from the trigram LM: • Q value distributions estimated from ~100,000 sentences • Decision tree trained and tested on ~60,000 sentences • Disregarded sentences with < 7 words, e.g.: • “Mike Stevens says it’s not real” • “We’ve been hearing about it”

  13. Experiments • Learners: • C5.0 decision tree • Boosted decision stumps with AdaBoost.MH • Methodology: • 5-fold cross-validation on ~60,000 sentences • Boosting for 300 rounds (see the scikit-learn sketch below)
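  The original experiments used C5.0 and AdaBoost.MH; as a rough present-day stand-in, here is a scikit-learn sketch with boosted decision stumps and 5-fold cross-validation (X and y are assumed to hold the per-sentence features and the real-vs-trigram labels):

      from sklearn.ensemble import AdaBoostClassifier
      from sklearn.tree import DecisionTreeClassifier
      from sklearn.model_selection import cross_val_score

      def cv_accuracy(X, y):
          """5-fold cross-validated accuracy of decision stumps boosted for 300 rounds."""
          stump = DecisionTreeClassifier(max_depth=1)
          booster = AdaBoostClassifier(estimator=stump, n_estimators=300)
          return cross_val_score(booster, X, y, cv=5).mean()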

  14. Results

  15. Shannon-Style Experiment • 50 sentences • ½ “real” and ½ trigram-generated • Stopwords replaced by dashes • 30 participants • Average accuracy of 73.77% ± 6 • Best individual accuracy 84% • Our classifier: • Accuracy of 78.9% ± 0.42

  16. Summary • Introduced a set of statistical features which capture aspects of semantic coherence • Trained a decision tree to classify with accuracy of 80% • Next step: incorporate features into exponential LM

  17. Future Work • Combat data sparsity • Confidence intervals • Different correlation statistic • Stemming or clustering vocabulary • Evaluate derived features • Incorporate into an exponential language model • Evaluate the model on a practical application

  18. Agreement among Participants

  19. Expected Perplexity Reduction • Semantic coherence feature is active for: • 78% of broadcast news sentences • 18% of trigram-generated sentences • Kullback-Leibler divergence: .814 • Average perplexity reduction per word = .0419 (2^.814/21); per sentence? • Features modify the probability of the entire sentence • The effect of the feature on per-word probability is small

  20. Distribution of Likelihood Ratio • [plot: density over likelihood values, Trigram Generated (dashed) vs. Broadcast News]

  21. Discrimination Task • Natural Sentence: • but it doesn't feel like a sacrifice in a sense that you're really saying this is you know i'm meant to do things the right way and you trust it and tell the truth • Trigram-Generated: • they just kind of free trade agreements which have been living in a ziplock bag that you say that i see university japan's daiwa bank stocks step though

  22. Q Values at Distance 1 • [plot: density over Q values, Trigram Generated (dashed) vs. Broadcast News]

  23. Q Values at Distance 3 • [plot: density over Q values, Trigram Generated (dashed) vs. Broadcast News]

  24. Outline • The problem of semantic (in)coherence • Incorporating this into the whole-sentence exponential LM • Finding better features for this model using machine learning • Semantic coherence features • Experiments and results
