
Toward Zero Resources (or how to get something from nothing)


Presentation Transcript


1. Toward Zero Resources (or how to get something from nothing)
   • Towards Spoken Term Discovery at Scale with Zero Resources (Jansen, Church & Hermansky, Interspeech-2010)
   • NLP on Spoken Documents Without ASR (Dredze, Jansen, Coppersmith & Church, EMNLP-2010)

2. We Don’t Need Speech Recognition To Process Speech
   • At least for some tasks

3. Linking without Labeling
   • ASR = Linking + Labeling
     • Linking: find repetitions
     • Labeling: assign text strings
   • BOW (Bag of Words) → BOP (Bag of Pseudo-terms)
     • Pseudo-Terms: Linking (without Labeling)
     • BOP: sufficient for many NLP tasks

4. Speech Processing Chain
   • [Flow diagram] Conventional chain: Speech Collection → Speech Recognition → Full Transcripts (or Manual Transcripts) → Text Processing → Information Retrieval, Corpus Organization, Information Extraction, Sentiment Analysis
   • [Flow diagram] This Talk: a Bag of Words Representation in place of full transcripts; good enough for many tasks

5. Our Goal
   • [Diagram] Link audio segments: find long (1 s) repetitions (Interspeech-2010), rather than speech recognition, which also labels segments with text to produce full transcripts
   • [Diagram] Extract features for text processing: BOW → BOP (EMNLP-2010), e.g. a feature vector 0 0 1 0 0 1 1 1

6. Definitions
   • Towards: not there yet
   • Zero Resources: no nothing (no knowledge of language/domain)
     • The next crisis will be where we are least prepared
     • No training data, no dictionaries, no models, no linguistics
     • Low Resources: a little more than zero
   • Spoken Term Discovery (Linking without Labeling)
     • Spoken Term Detection (Word Spotting): standard task
       • Find instances of a spoken phrase in a spoken document
       • Input: spoken phrase + spoken document
     • Spoken Term Discovery: non-standard task
       • Input: spoken document (without a spoken phrase)
       • Output: spoken phrases (interesting repeated intervals in the document)

7. What makes an interval of speech interesting?
   • Cues from text processing:
     • Long (~1 sec, such as “The Ed Sullivan Show”)
     • Repeated
     • Bursty (tf * IDF); a scoring sketch follows this slide
       • tf: lots of repetitions within a particular document
       • IDF: with relatively few repetitions across other documents
   • Unique to speech processing:
     • Given-New: first mention is articulated more carefully than subsequent mentions
     • Dialog between two parties (A & B):
       • A: utters an important phrase
       • B: what?
       • A: repeats the important phrase
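A minimal sketch of the burstiness (tf * IDF) cue applied to discovered terms; this is not code from the papers, the per-conversation term counts are hypothetical, and the scoring is just the standard tf-idf formula.

```python
import math
from collections import Counter

def tfidf(term_counts_per_doc):
    """term_counts_per_doc: list of Counter, one per spoken document,
    mapping a (pseudo-)term id to its repetition count in that document."""
    n_docs = len(term_counts_per_doc)
    # Document frequency: how many documents each term appears in at all.
    df = Counter()
    for counts in term_counts_per_doc:
        df.update(counts.keys())
    scores = []
    for counts in term_counts_per_doc:
        scores.append({
            term: tf * math.log(n_docs / df[term])  # tf * IDF: bursty terms score high
            for term, tf in counts.items()
        })
    return scores

# Hypothetical counts: "term_12" repeats a lot in doc 0 but nowhere else -> bursty.
docs = [Counter({"term_12": 5, "term_5": 1}),
        Counter({"term_5": 2}),
        Counter({"term_5": 1, "term_80": 3})]
print(tfidf(docs)[0])
```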

8. Related Work (Mostly Speech Literature and Mostly from Boston)
   • Other approaches
     • Phone recognition (Lincoln Labs): use existing phone recognizers to create phone n-grams for topic classification (Hazen et al., 2007, 2008)
     • Self-organizing units (BBN): unsupervised discovery of phone-like units for topic classification (Garcia and Gish, 2006; Siu et al., 2010)
     • Find recurring patterns of speech (MIT-CSAIL): Park and Glass, 2006, 2008
   • Similar goals
     • Audio summarization without ASR: finds similar regions to include in a summary (Zhu, 2009, ACL)

9. n² Time & Space
   • But the constants are attractive
   • Sparsity (a sparse dotplot sketch follows this slide)
     • Redesigned algorithms to take advantage of sparsity
     • Median Filtering
     • Hough Transform
     • Line Segment Search
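A minimal sketch of the kind of sparse frame-by-frame dotplot the slide alludes to, under assumptions not stated here: per-frame feature vectors (e.g. phone posteriorgrams) and cosine similarity with a hard threshold. The median filtering, Hough transform, and line segment search steps are only noted in comments, not implemented.

```python
import numpy as np

def sparse_dotplot(frames_a, frames_b, thresh=0.8):
    """Frame-by-frame cosine-similarity dotplot, stored sparsely.
    frames_a, frames_b: (T, D) arrays of per-frame acoustic features.
    Only cells above `thresh` are kept, which is what makes the n^2
    matrix tractable in practice."""
    a = frames_a / np.linalg.norm(frames_a, axis=1, keepdims=True)
    b = frames_b / np.linalg.norm(frames_b, axis=1, keepdims=True)
    sim = a @ b.T                      # dense n^2 block (tiled for real corpora)
    # Long (~1 s) repetitions show up as diagonal runs of high-similarity
    # cells; median filtering / Hough-style line segment search (omitted
    # here) extract those runs from the sparse cell set below.
    i, j = np.nonzero(sim > thresh)
    return list(zip(i.tolist(), j.tolist()))

# Toy usage with random "features"; real input would be ~100 frames/sec.
rng = np.random.default_rng(0)
A, B = rng.random((200, 40)), rng.random((180, 40))
print(len(sparse_dotplot(A, B, thresh=0.9)))
```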

10. Representations for Learning
   • Back to NLP…
   • Group matched segments into Pseudo-Terms
   • BOW (bag of words) → BOP (bag of pseudo-terms)
   • [Figure: matched segments mapped to feature vectors, e.g. 0 0 1 0 0 1 1 1]

11. Creating Pseudo-Terms
   • [Figure: matched segments grouped into pseudo-terms P1, P2, P3]

12. Example Pseudo-Terms
   • [Figure: pseudo-terms term_5, term_6, term_63, term_113–term_122 shown alongside the transcript phrases they cover: our_life_insurance, term, life_insurance, how_much_we, long_term, budget_for, our_life_insurance, budget, end_of_the_month, stay_within_a_certain, you_know, have_to, certain_budget]

13. Graph Based Clustering
   • Nodes: each matched audio segment
   • Edges: an edge between two segments if their fractional overlap exceeds a threshold
   • Extract connected components of the graph
   • This work: one pseudo-term for each connected component (a minimal sketch follows this slide)
   • Future work: better graph clustering algorithms
   • [Figure: matches such as “keep track”, “keep track of”, “a paper”, “newspapers”, “newspaper” grouped into Pseudo-term 1 and Pseudo-term 2]
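A minimal sketch of the graph clustering the slide describes, assuming each matched segment is just a (start, end) interval in seconds and that the only edges come from the fractional-overlap test; union-find stands in for an explicit graph library.

```python
def fractional_overlap(a, b):
    """a, b: (start, end) times of two matched audio segments (seconds)."""
    inter = max(0.0, min(a[1], b[1]) - max(a[0], b[0]))
    return inter / min(a[1] - a[0], b[1] - b[0])

def pseudo_terms(segments, threshold=0.5):
    """One pseudo-term per connected component: segments are nodes, and an
    edge joins two segments whose fractional overlap exceeds the threshold."""
    parent = list(range(len(segments)))
    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path compression
            x = parent[x]
        return x
    def union(x, y):
        parent[find(x)] = find(y)
    for i in range(len(segments)):
        for j in range(i + 1, len(segments)):
            if fractional_overlap(segments[i], segments[j]) > threshold:
                union(i, j)
    clusters = {}
    for i, seg in enumerate(segments):
        clusters.setdefault(find(i), []).append(seg)
    return list(clusters.values())

# Toy segments (start, end): the first two overlap heavily -> one pseudo-term.
print(pseudo_terms([(1.0, 2.0), (1.1, 2.1), (5.0, 6.0)]))
```

Connected components are deliberately simple; as the slide notes, better graph clustering algorithms are future work.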

14. Tradeoff in Cluster Quality
   • We need to find the right tradeoff for our task
   • Select the tradeoff based on dev data (a selection sketch follows this slide)
   • [Figure: similarity threshold axis, smaller threshold → fewer pseudo-terms, larger threshold → more pseudo-terms; e.g. term_5, term_63, term_116 covering our_life_insurance / life_insurance / our_life_insurance]
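A rough sketch of selecting the threshold on dev data; `cluster_fn` could be the connected-components sketch above, and `score_on_dev` is an assumed task-specific metric (e.g. topic-classification accuracy on held-out conversations). Neither function is specified on the slide.

```python
def pick_threshold(segments, cluster_fn, score_on_dev, candidates=(0.3, 0.5, 0.7, 0.9)):
    """Sweep candidate similarity/overlap thresholds and keep whichever
    produces pseudo-terms that score best on development data.
    cluster_fn(segments, threshold) -> list of pseudo-term clusters;
    score_on_dev(clusters) -> a scalar dev-set score (assumed, hypothetical)."""
    return max(candidates, key=lambda t: score_on_dev(cluster_fn(segments, t)))
```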

15. Feature Vectors: BOW → BOP
   • [Figure: BOW example: “Four score and seven years is a lot of years.” → counts for four, score, seven, years = 1 1 1 2; BOP example: the pseudo-term sequence term_12 term_5 term_12 → counts term_12 = 2, term_5 = 1]
   • Question: are pseudo-terms good enough? (a BOP-counting sketch follows this slide)
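A minimal sketch of building BOP count vectors, assuming each conversation side has already been reduced to a list of pseudo-term ids; the example mirrors the slide's term_12/term_5 counts.

```python
from collections import Counter

def bop_vectors(conversations, vocab=None):
    """conversations: list of pseudo-term id lists (e.g. ["term_12", "term_5", ...]),
    one per conversation side. Returns count vectors, the BOP analogue of
    bag-of-words counts over a transcript."""
    if vocab is None:
        vocab = sorted({t for conv in conversations for t in conv})
    vectors = []
    for conv in conversations:
        counts = Counter(conv)
        vectors.append([counts.get(t, 0) for t in vocab])
    return vocab, vectors

# Mirrors the slide: "term_12 term_5 term_12" gives term_12 a count of 2.
vocab, X = bop_vectors([["term_12", "term_5", "term_12"], ["term_5"]])
print(vocab, X)
```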

16. Evaluation: Data
   • Switchboard telephone speech corpus
     • 600 conversation sides, 6 topics, 60+ hours of audio
     • Topics: family life, news media, public education, exercise, pets, taxes
   • Identify all pairs of matched regions
   • Graph clustering to produce pseudo-terms
   • O(n²) on 60+ hours is a lot!
     • With efficient algorithms and sparsity, not as bad as you think
     • 500 terapixel dotplot from 60+ hours of speech (a back-of-the-envelope calculation follows this slide)
     • Compute time: 100 cores, 5 hours
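A back-of-the-envelope check (not on the slide) of the ~500 terapixel figure, assuming a typical 100 frames-per-second acoustic frame rate.

```python
# Why the dotplot over 60+ hours is roughly 500 terapixels.
hours = 61            # "60+ hours" of Switchboard audio
frames_per_sec = 100  # assumption: standard 10 ms frame rate, not stated on the slide
n_frames = hours * 3600 * frames_per_sec  # ~22 million frames
dotplot_cells = n_frames ** 2             # all-pairs frame similarity
print(f"{n_frames:.2e} frames -> {dotplot_cells / 1e12:.0f} terapixels")
# ~2.2e7 frames -> roughly 480 terapixels, i.e. the ~500 terapixel dotplot
```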

17. Evaluation
   • Representations:
     • Manual transcripts as bag of words (requires full speech recognition)
     • Pseudo-terms (requires acoustic model)

18. Two Evaluation Tasks
   • Topic clustering (unsupervised)
     • Automatically discover latent topics in conversations
     • Standard clusterer given the correct number of topics
   • Topic classification (supervised)
     • Learn topic labels from supervised data
     • Several classification algorithms: CW (Dredze et al., 2008), MaxEnt
     • 10-fold CV
   • (a minimal sketch of both setups follows this slide)
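A minimal sketch of the two evaluation setups using scikit-learn, with assumptions: k-means as the "standard clusterer", LogisticRegression as a stand-in for MaxEnt (the CW classifier is not sketched), and random toy counts in place of the real BOW/BOP matrix.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

def evaluate(X, y, n_topics=6):
    """X: (n_conversations, n_terms) BOW or BOP count matrix; y: topic labels."""
    # Unsupervised: cluster into the correct number of topics (6 on the slide).
    clusters = KMeans(n_clusters=n_topics, n_init=10, random_state=0).fit_predict(X)
    # Supervised: 10-fold cross-validated topic classification.
    acc = cross_val_score(LogisticRegression(max_iter=1000), X, y, cv=10).mean()
    return clusters, acc

# Toy usage: random counts for 60 "conversation sides" over 50 "terms", 6 topics.
rng = np.random.default_rng(0)
X = rng.integers(0, 3, size=(60, 50))
y = np.repeat(np.arange(6), 10)
print(evaluate(X, y)[1])
```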

  19. Clustering (Unsupervised) Results

  20. Classification (Supervised) Results

21. Future Directions (More something from nothing)
   • Extend NLP of speech to new areas
     • Languages, domains, settings where we have little data for speech recognition
   • BOW (BOP) sufficient for many NLP tasks
     • BOW (BOP) → TF*IDF!
   • Lingering Questions
     • What else can we do? Topic models? Information extraction? Information retrieval? …
