
Detecting Genre Shift

Detecting Genre Shift. Mark Dredze, Tim Oates, Christine Piatko. Paper to appear at EMNLP-10. Natural Language Processing and Machine Learning: extracting findings from scientific papers; genetic epidemiology (development domain); PubMed search produces thousands of papers.


Presentation Transcript


  1. Detecting Genre Shift Mark Dredze, Tim Oates, Christine Piatko Paper to appear at EMNLP-10

  2. Natural Language Processing and Machine Learning • Extracting findings from scientific papers • Genetic epidemiology (development domain) • PubMed search produces thousands of papers • Manually reviewed to extract findings • Findings determine relevant papers/studies • Automate this process with ML/NLP methods • Create searchable database of findings • Allow machine inference over findings • Suggest new scientific hypotheses

  3. Genre Shift in Statistical NLP … told that John Paul Stevens is retiring this summer … … President Barack Obama is urging members to … Named Entity Recognition

  4. Supervised Machine Learning for Named Entity Recognition Today the Atlantic Ocean is in an uproar and North Carolina remains in a state of anxiety.

  5. Supervised Machine Learning for Named Entity Recognition

  6. Genre Shift in Statistical NLP … told that John Paul Stevens is retiring this summer … … PRESIDENT BARACK OBAMA IS URGING MEMBERS TO… Named Entity Recognition ???

  7. This is a Pervasive Problem • Extracting regulatory pathways from online bioinformatics journals using a parser trained on the WSJ • Finding faces in images of disaster victims using a model trained on “mug shot” images • Identifying RNA sequences that regulate gene expression in a lab in Baltimore using a model trained on data gathered in a lab in Germany When things change in a way that’s harmful, we’d like to know!

  8. Data Streams Change Over Time Sentiment classification from movie reviews • Natural drift • Users unaware of system limitations

  9. Detecting Genre Shift • Genre shift hurts system performance (accuracy) • Two problems: • Detect changes in a stream of numbers (A-distance) • Convert the document stream to a stream of informative numbers (margin)

  10. Detecting Genre Shift Genre shift hurts system performance (accuracy) • Measure accuracy directly • Requires labeled examples! • Look for changes in feature distributions • Words become more/less common • New words appear

  11. Measuring Changes in Streams: The A-Distance [Figure: distributions P and P’] • A nonparametric, distribution-independent measure of changes in univariate, real-valued data streams (Kifer, Ben-David, and Gehrke, 2004)

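  12. Measuring Changes in Streams: The A-Distance [Figure: windows P and P’; distance compared against threshold ε]

  13. Measuring Changes in Streams: The A-Distance [Figure: windows P and P’; distance compared against threshold ε]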
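The A-distance check on a univariate stream can be sketched roughly as follows. This is a minimal sketch, not the authors' implementation: it assumes the tiles are equal-width intervals on a fixed range, that the distance is twice the largest difference in tile probability between two windows, and that a fixed reference window is compared against a sliding recent window. The names `tile_index`, `a_distance`, and `detect_shift` and all parameter names are hypothetical.

```python
def tile_index(value, lo, hi, n_tiles):
    """Map a value to one of n_tiles equal-width tiles on [lo, hi]; clamp outliers."""
    if value <= lo:
        return 0
    if value >= hi:
        return n_tiles - 1
    return min(int((value - lo) / (hi - lo) * n_tiles), n_tiles - 1)


def a_distance(window_p, window_q, lo, hi, n_tiles):
    """Twice the largest gap in empirical tile probability between two windows."""
    counts_p = [0] * n_tiles
    counts_q = [0] * n_tiles
    for v in window_p:
        counts_p[tile_index(v, lo, hi, n_tiles)] += 1
    for v in window_q:
        counts_q[tile_index(v, lo, hi, n_tiles)] += 1
    gaps = (abs(cp / len(window_p) - cq / len(window_q))
            for cp, cq in zip(counts_p, counts_q))
    return 2 * max(gaps)


def detect_shift(stream, n, epsilon, lo, hi, n_tiles):
    """Return the index of the instance at which a shift is first flagged, else None.

    The first n values form the reference window P; the most recent n values
    form the sliding window P'. Both choices are assumptions of this sketch.
    """
    reference = stream[:n]
    for end in range(2 * n, len(stream) + 1):
        recent = stream[end - n:end]
        if a_distance(reference, recent, lo, hi, n_tiles) > epsilon:
            return end - 1
    return None
```

For example, `detect_shift(margins, n=50, epsilon=0.5, lo=-2.0, hi=2.0, n_tiles=4)` scans a list of margins and reports where the recent window first differs from the reference window by more than ε. The tiling range and tile count here are arbitrary illustrative values.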

  14. Changes in Document Streams X … President Barack Obama is urging members to …

  15. Changes in Document Streams [Figure: feature vector X with counts Obama = 4, embassy = 1, …] … President Barack Obama is urging members to …

  16. Changes in Document Streams [Figure: weight vector W (Obama = 1.6, embassy = 0.1, …) and feature vector X (Obama = 4, embassy = 1, …)] 1.6 * 4 + 0.1 * 1 + … = 3.7 … President Barack Obama is urging members to …

  17. Changes in Document Streams [Figure: weight vector W (Obama = 1.6, embassy = 0.1, …) and feature vector X (Obama = 4, embassy = 1, …)] 1.6 * 4 + 0.1 * 1 + … = 3.7 … President Barack Obama is urging members to … • WX = margin • sign of WX is class label (+/-) • magnitude of WX is “certainty” in label
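The margin computation on this slide can be sketched as a sparse dot product. This is a minimal sketch assuming features and weights are stored as dictionaries keyed by word; the function names are hypothetical. Only the two terms visible on the slide are used, so the result is 6.5 rather than the slide's 3.7, which also includes the elided terms.

```python
def margin(weights, features):
    """Dot product W.X over the features present in the document."""
    return sum(weights.get(name, 0.0) * value for name, value in features.items())


def predict(weights, features):
    """Return (label, certainty): the sign and the magnitude of the margin."""
    m = margin(weights, features)
    label = 1 if m >= 0 else -1
    return label, abs(m)


# Values taken from the slide; the elided "…" terms are not reproduced.
weights = {"Obama": 1.6, "embassy": 0.1}
features = {"Obama": 4, "embassy": 1}
label, certainty = predict(weights, features)
```

No label is needed to compute the margin, which is why a stream of unlabeled documents can be turned into a stream of numbers this way.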

  18. Why Margins? • We have an easy way of producing them from unlabeled examples! • We want to track feature changes • Margins are linear combinations of feature values • Removing important features yields smaller margins • Only track features that matter, features with zero (small) weight don’t affect margin (much) • Spoiler alert! Tracking margins works really well for unsupervised detection of genre shifts.

  19. Accuracy vs. Margins DVD to Electronics

  20. Accuracy vs. Margins DVD to Electronics [Plot legend: average in block; average over last 100 instances]
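The two smoothing schemes named in the plot legend can be sketched as below. This is a minimal sketch assuming "average in block" means non-overlapping blocks and "average over last 100 instances" means a trailing window; the slide does not define either precisely, and the function names are hypothetical.

```python
from collections import deque


def block_averages(values, block_size):
    """Average of each consecutive, non-overlapping block (a trailing partial block is dropped)."""
    return [sum(values[i:i + block_size]) / block_size
            for i in range(0, len(values) - block_size + 1, block_size)]


def trailing_averages(values, window=100):
    """Average over the most recent `window` values, reported at every instance."""
    recent = deque(maxlen=window)
    running_sum = 0.0
    averages = []
    for v in values:
        if len(recent) == window:
            running_sum -= recent[0]  # value about to be evicted by append
        recent.append(v)
        running_sum += v
        averages.append(running_sum / len(recent))
    return averages
```

Either function can be applied to a list of 0/1 correctness indicators (accuracy) or to a list of margin magnitudes.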

  21. Accuracy vs. Margins DVD to Electronics

  22. Confidence Weighted Margins • Margins can be viewed as a measure of confidence • We detect when confidence in classifications drops • Confidence Weighted (CW) learning refines this idea • Gaussian distribution over weight vectors • Mean of weight vector: $\mu \in \mathbb{R}^N$ • Diagonal covariance matrix: $\sigma \in \mathbb{R}^{N \times N}$ • Low variance → high confidence • Normalized margin: $\mu \cdot x / (x^\top \sigma x)^{0.5}$ • Called VARIANCE in slides that follow [Figure: example weights (μ = 1.6, σ = 0.02) and (μ = 0.1, σ = 1.74)]
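The normalized margin on this slide can be sketched for the diagonal-covariance case, where the quadratic form reduces to a weighted sum of squared feature values. This is a minimal sketch: the dictionary representation and the function name are hypothetical, and the pairing of the slide's σ values with the two weights (1.6 with 0.02, 0.1 with 1.74) is a reading of the figure, not something the transcript states outright.

```python
import math


def normalized_margin(mean, variance, features):
    """mu.x / sqrt(x^T sigma x) for a diagonal covariance stored per feature."""
    dot = sum(mean.get(name, 0.0) * value for name, value in features.items())
    quad = sum(variance.get(name, 0.0) * value * value
               for name, value in features.items())
    if quad == 0.0:
        return 0.0  # assumption of this sketch: no variance information, report 0
    return dot / math.sqrt(quad)


mean = {"Obama": 1.6, "embassy": 0.1}        # weight means from the slide
variance = {"Obama": 0.02, "embassy": 1.74}  # diagonal entries, pairing assumed
features = {"Obama": 4, "embassy": 1}
score = normalized_margin(mean, variance, features)
```

Features whose weights have high variance inflate the denominator, so a document that relies on poorly estimated weights receives a smaller normalized margin than the raw margin alone would suggest.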

  23. Experiments • Datasets • Sentiment classification between domains (Blitzer et al., 2007) • DVDs, electronics, books, kitchen appliances • Spam classification between users (Jiang and Zhai, 2007) • Named entity classification between genres (ACE 2005) • News articles, broadcast news, telephone, blogs, etc. • Algorithms • Baselines: SVM, MIRA, CW • Our method: VARIANCE

  24. Experiments • Simulated domain shifts between each pair of genres • 38 pairs, 10 trials each with different random instance orderings • 500 source examples • 1500 target examples • False change • 11 datasets with no shift, 10 trials with different random instance orderings • If no shift found then detection recorded as end of target examples when computing averages
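The simulated-shift protocol on this slide can be sketched as follows. This is a minimal sketch assuming instances are drawn without replacement from each genre and concatenated, with the shift point at the boundary; the function names and the `seed` parameter are hypothetical. The rule for missed detections follows the last bullet: a stream with no detection is scored as if the shift were found at the end of the target examples.

```python
import random


def build_shift_stream(source, target, n_source=500, n_target=1500, seed=0):
    """Randomly order n_source source instances followed by n_target target instances."""
    rng = random.Random(seed)
    stream = rng.sample(source, n_source) + rng.sample(target, n_target)
    shift_point = n_source  # index of the first target instance
    return stream, shift_point


def detection_delay(detected_at, shift_point, stream_length):
    """Instances between the true shift and its detection.

    A missed shift (None) is recorded at the end of the stream. A negative
    result means the detector fired before the shift, i.e. a false alarm.
    """
    if detected_at is None:
        detected_at = stream_length
    return detected_at - shift_point
```

Repeating `build_shift_stream` with different seeds gives the different random instance orderings used across trials.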

  25. Comparing Algorithms [Plot annotations: “Good for our approach!”, “Good for baseline”; axis: instances from point of shift]

  26. SVM vs. VARIANCE

  27. SVM vs. VARIANCE

  28. Summary of Results Thus Far • VARIANCE detected shifts faster than … • SVM 34 times out of 38 • MIRA 26 times out of 38 • CW 27 times out of 38

  29. Gradual Shifts

  30. What if you have labels? • STEPD: a Statistical Test of Equal Proportions to Detect concept drift (Nishida and Yamauchi, 2007) • Monitors accuracy of classifier from stream of labeled examples • Parameters: window size, W, and threshold, α
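A detector in the spirit of STEPD can be sketched with a test of equal proportions with continuity correction, comparing accuracy over the most recent W labeled examples against accuracy over everything before them. This is a sketch under stated assumptions rather than a faithful reimplementation: the one-sided normal approximation, the default `window` and `alpha` values, and the function names are assumptions, and the original paper should be consulted for the exact procedure.

```python
import math


def equal_proportions_p_value(correct_old, n_old, correct_recent, n_recent):
    """One-sided p-value for a difference between two accuracy proportions."""
    pooled = (correct_old + correct_recent) / (n_old + n_recent)
    spread = pooled * (1 - pooled) * (1 / n_old + 1 / n_recent)
    if spread == 0:
        return 1.0  # both windows all-correct or all-wrong: no evidence of change
    diff = abs(correct_old / n_old - correct_recent / n_recent)
    stat = (diff - 0.5 * (1 / n_old + 1 / n_recent)) / math.sqrt(spread)
    return 0.5 * math.erfc(stat / math.sqrt(2))  # upper tail of the standard normal


def detect_accuracy_drop(outcomes, window=30, alpha=0.003):
    """outcomes: list of 1 (correct) / 0 (wrong). Return first flagged index or None."""
    for end in range(2 * window, len(outcomes) + 1):
        old, recent = outcomes[:end - window], outcomes[end - window:end]
        p = equal_proportions_p_value(sum(old), len(old), sum(recent), len(recent))
        if sum(recent) / len(recent) < sum(old) / len(old) and p < alpha:
            return end - 1
    return None
```

Unlike the margin-based approach, this requires a correctness label for every instance in the stream. The slicing here recomputes sums at each step for clarity; running counts would be the natural optimization.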

  31. Comparison to STEPD

  32. What about false positives?

  33. The A-Distance: Choosing Parameters [Figure: window P of n points, tile A, threshold ε]

  34. The A-Distance: Choosing Parameters [Figure: window P of n points, tile A, threshold ε]

  35. The A-Distance: Choosing Parameters • A-distance paper gives bounds on FPs and FNs • Bounds depend on n and ε • Bounds do not depend on tiling! • So loose as to be meaningless • No guidance on how to choose tiling • What if tiles lie outside support of data?

  36. Better Bounds • $P_A$ = true probability of a point falling in tile A • h = number of points that actually fell in A • $p_A = h/n$ = ML estimate of $P_A$ • Define $P'_A$, $h'$, and $p'_A$ for second window • Suppose $P_A = P'_A$, then any change detected is a false positive • What is the probability that $|p_A - p'_A| > \varepsilon/2$?
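The question on this slide can be answered exactly when the true tile probability is known, by summing over all pairs of binomial counts. This is a minimal sketch under that assumption: both windows hold n points, the counts h and h' are independent binomial draws with the same tile probability, and the function names are hypothetical. The slides go on to treat the tile probability as unknown, which this sketch does not do.

```python
import math


def binomial_pmf(n, p):
    """Probability of h successes in n trials, for every h from 0 to n."""
    return [math.comb(n, h) * p ** h * (1 - p) ** (n - h) for h in range(n + 1)]


def false_positive_probability(n, p_tile, epsilon):
    """P(|h/n - h'/n| > epsilon/2) when both windows share tile probability p_tile."""
    pmf = binomial_pmf(n, p_tile)
    threshold = epsilon / 2
    total = 0.0
    for h, prob_h in enumerate(pmf):
        for h2, prob_h2 in enumerate(pmf):
            if abs(h - h2) / n > threshold:
                total += prob_h * prob_h2
    return total
```

Evaluating `false_positive_probability(200, p, eps)` over a grid of `p` and `eps` values shows how the false positive rate for a single tile varies with the tile's mass and the threshold, which is the kind of dependence the looser bounds ignore.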

  37. Posterior Over $P_A$ • B(a, b) is the Beta function over a + b Bernoulli trials • a trials have one outcome (point lands in tile A) • b trials have the other (point lands in some other tile)
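A Beta posterior over the tile probability can be sketched as below. This is a minimal sketch that assumes a uniform prior, so that h points in the tile out of n give a Beta(h + 1, n - h + 1) posterior; the slide does not state the prior or the exact parameterization, and the function names are hypothetical.

```python
import math


def log_beta(a, b):
    """Logarithm of the Beta function B(a, b), computed via log-gamma."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)


def beta_pdf(p, a, b):
    """Beta(a, b) density, evaluated on the open interval (0, 1)."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return math.exp((a - 1) * math.log(p) + (b - 1) * math.log(1 - p) - log_beta(a, b))


def tile_posterior(h, n):
    """Posterior parameters for the tile probability under a uniform prior (assumed)."""
    return h + 1, n - h + 1


def posterior_mean(h, n):
    a, b = tile_posterior(h, n)
    return a / (a + b)
```

Weighting a fixed-probability false positive rate by `beta_pdf` over a grid of tile probabilities is one way to average out the unknown true probability; whether that matches the derivation on the following slides cannot be confirmed from the transcript.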

  38. False Positives: Two Cases

  39. Don’t worry, I’m not going to explain this (much)

  40. Probability of a FP (n = 200)

  41. Probability of FN

  42. Minimizing Expected Loss

  43. Moving Forward [Diagram labels: Twitter; Transcribed Broadcast News; Newswire; Genre Classifier]

  44. Genre Shift “Fix” … told that John Paul Stevens is retiring this summer … … PRESIDENT BARACK OBAMA IS URGING MEMBERS TO… Named Entity Recognition

  45. Genre Shift “Fix” … told that John Paul Stevens is retiring this summer … … PRESIDENT BARACK OBAMA IS URGING MEMBERS TO… … President Barack Obama is urging members to … Named Entity Recognition

  46. Conclusion • Changes in margins convey useful information about changes in classification accuracy • No need for labeled examples! • The A-distance applied to margin streams finds genre shifts with few false positives/negatives • Confidence weighted margins normalized by variance detect shifts faster than SVM, MIRA, or (non-normalized) CW margins • Our approach even works with gradual shifts and compares favorably to shift detectors that use labeled examples

  47. Thank you!
