
Masoud Rouhizadeh

Integrating Latent Semantic Analysis and Language Model for Character Prediction in a Binary Response Typing Interface. Seminar on Speech and Language Processing for Augmentative and Alternative Communication. Masoud Rouhizadeh.


Presentation Transcript


  1. Integrating Latent Semantic Analysis and Language Model for Character Prediction in a Binary Response Typing Interface • Seminar on Speech and Language Processing for Augmentative and Alternative Communication Masoud Rouhizadeh

  2. Introduction • Most word or character prediction systems make use of word-based and/or character-based n-gram language models. • Some work enriches such language models with further syntactic or semantic information (Wandmacher et al. 2007 & 2008). • This talk examines the predictive power of Latent Semantic Analysis (LSA) for character prediction in the typing interface developed by Brian Roark (Roark 2009).

  3. Roark's binary-switch typing interface • Binary switch • Static/dynamic grid • Different language model contributions • Different scanning modes: Row-Column, RSVP, Huffman

  4. Latent Semantic Analysis (LSA) • A technique to model semantic similarity based on co-occurrence distributions of words • LSA is able to relate coherent contexts to specific content words • Good at predicting the occurrence of a content word in the presence of other thematically related terms
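The reduction at the core of LSA can be sketched as follows: factor a term-by-document count matrix with SVD and keep only the top k singular values, so that each term gets a dense k-dimensional vector. This is a minimal sketch assuming a numpy environment; the tiny count matrix is invented for illustration, not the data from the talk.

```python
import numpy as np

# Invented 4-term x 3-document count matrix, for illustration only.
M = np.array([[1., 1., 0.],
              [1., 0., 1.],
              [0., 1., 1.],
              [0., 0., 1.]])

# Thin SVD: M = U * diag(s) * Vt
U, s, Vt = np.linalg.svd(M, full_matrices=False)

# Keep the top-k dimensions; rows of term_vectors are the reduced
# term representations used for semantic similarity.
k = 2
term_vectors = U[:, :k] * s[:k]
```

Similarity between two terms is then measured between rows of `term_vectors` rather than rows of the raw count matrix, which is what lets LSA relate thematically related words that never co-occur directly.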

  5. LSA, an example of documents 1. The Neatest Little Guide to Stock Market Investing 2. Investing For Dummies, 4th Edition 3. The Little Book of Common Sense Investing: The Only Way to Guarantee Your Fair Share of Stock Market Returns 4. The Little Book of Value Investing 5. Value Investing: From Graham to Buffett and Beyond 6. Rich Dad's Guide to Investing: What the Rich Invest in, That the Poor and the Middle Class Do Not! 7. Investing in Real Estate, 5th Edition 8. Stock Investing For Dummies 9. Rich Dad's Advisors: The ABC's of Real Estate Investing: The Secrets of Finding Hidden Profits Most Investors Miss

  6. Preprocessing and tokenizing • Tokenizing • Removing ignored characters • Turning everything into lowercase • Removing stop words
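The preprocessing steps on this slide can be sketched in a few lines. The stop-word list below is a small illustrative sample, not the one used in the talk, and the "ignored characters" are assumed to be anything non-alphanumeric.

```python
import re

# Illustrative stop-word sample; a real system would use a fuller list.
STOP_WORDS = {"the", "of", "to", "and", "a", "in", "for", "that", "do", "not"}

def preprocess(document):
    # Lowercase and tokenize, dropping ignored (non-alphanumeric) characters.
    tokens = re.findall(r"[a-z0-9']+", document.lower())
    # Remove stop words.
    return [t for t in tokens if t not in STOP_WORDS]

print(preprocess("The Little Book of Value Investing"))
# → ['little', 'book', 'value', 'investing']
```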

  7. Term-by-document matrix
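A term-by-document matrix has one row per vocabulary term and one column per document, with cells holding occurrence counts. A minimal sketch, using made-up preprocessed documents rather than the book titles from the earlier slide:

```python
from collections import Counter

def term_document_matrix(docs):
    # docs: list of token lists (already preprocessed).
    vocab = sorted({t for doc in docs for t in doc})
    # Row per term, column per document; cells hold raw counts.
    matrix = [[Counter(doc)[term] for doc in docs] for term in vocab]
    return vocab, matrix

docs = [["stock", "market", "investing"],
        ["investing", "dummies"],
        ["value", "investing", "value"]]
vocab, M = term_document_matrix(docs)
# vocab → ['dummies', 'investing', 'market', 'stock', 'value']
# M[1]  → [1, 1, 1]   (row for 'investing': once in each document)
```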

  8. Cosine similarity • Each word is represented as a vector, e.g. • A = [0, 0, 1, 1, 0, 0, 0, 0, 0] • B = [1, 0, 1, 0, 0, 0, 0, 1, 0] • cosine(A, B) ≈ 0.4082
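The cosine value on this slide can be reproduced directly: the two vectors share one nonzero dimension, so the dot product is 1, and the norms are √2 and √3, giving 1/√6 ≈ 0.4082.

```python
import math

def cosine(u, v):
    # Cosine similarity: dot product divided by the product of norms.
    dot = sum(a * b for a, b in zip(u, v))
    norms = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norms

A = [0, 0, 1, 1, 0, 0, 0, 0, 0]
B = [1, 0, 1, 0, 0, 0, 0, 1, 0]
print(round(cosine(A, B), 4))  # → 0.4082
```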

  9. Integrating LSA and language model • LSA is a bag-of-words model and is shown to be reliable at predicting a word within a context • Making it more sensitive to local context: • Pa is estimated from the LSA cosine similarity of w1 and w2 • Pb is estimated from the bigram probability of w1 w2 • P(w2|w1) = λPa + (1−λ)Pb
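The linear interpolation above is a one-liner. The weight λ = 0.5 below is only an illustrative choice; the talk does not state the value used.

```python
def interpolated_prob(p_lsa, p_bigram, lam=0.5):
    # P(w2|w1) = lam * Pa + (1 - lam) * Pb, where Pa comes from the
    # LSA cosine similarity and Pb from the bigram model.
    # lam = 0.5 is an illustrative weight, not the talk's value.
    return lam * p_lsa + (1 - lam) * p_bigram

# e.g. Pa = 0.4, Pb = 0.2 give 0.3 with equal weighting
print(interpolated_prob(0.4, 0.2))
```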

  10. From word to character prediction • In Roark's typing interface we are interested in predicting characters, rather than words. • Sort the upcoming words based on their probabilities • Evaluated by RSVP simulation
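One way to turn word probabilities into a character ranking, sketched under the assumption that each candidate word's probability mass is summed onto the character it contributes after the typed prefix (the function name and probabilities below are hypothetical, not from the talk):

```python
from collections import defaultdict

def rank_next_characters(word_probs, prefix):
    # word_probs: dict of candidate words -> probabilities, e.g. from
    # the interpolated LSA+bigram model. Sum each word's mass onto the
    # character it would contribute next, then sort by that mass.
    char_mass = defaultdict(float)
    for word, p in word_probs.items():
        if word.startswith(prefix) and len(word) > len(prefix):
            char_mass[word[len(prefix)]] += p
    return sorted(char_mass, key=char_mass.get, reverse=True)

# Hypothetical probabilities after typing "computer ba":
probs = {"backup": 0.4, "bags": 0.3, "batteries": 0.2, "backpack": 0.1}
print(rank_next_characters(probs, "ba"))  # → ['c', 'g', 't']
```

Here "c" ranks first because both "backup" and "backpack" contribute mass to it, mirroring the ranking shown on the walkthrough slides.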

  11. From word to character prediction • Example: typing "backup" after "computer". Candidate next words include association, accessories, arts, architecture, …, bags, backup, backpack, batteries, backgrounds, brands, … The ranked character list begins _, e, a, i, c, f, o, n, d, g, … Selected: B

  12. From word to character prediction • After "b": candidates narrow to bags, backup, backpack, batteries, backgrounds, brands, brain, … Ranking begins a, r, e, … Selected: A

  13. From word to character prediction • After "ba": candidates bags, backup, backpack, batteries, backgrounds, … Ranking begins c, g, t, … Selected: C

  14. From word to character prediction • After "bac": candidates backup, backpack, backgrounds, … Ranking begins k, … Selected: K

  15. From word to character prediction • After "back": candidates backup, backpack, backgrounds, … Ranking begins g, p, u, … Selected: U

  16. From word to character prediction • After "backu": candidate backup. Ranking begins p, … Selected: P

  17. Evaluation • Simulation mode • Trained and tested on a small part of the NY Times portion of the English Gigaword corpus • RSVP scanning mode

  18. Results • Model: LSA+Bigram vs. character-frequency scanning • Metric: average keystrokes per sentence • 17.79% keystroke savings per sentence

  19. Conclusion • Word-based language models are shown to be effective in character prediction • The integration of LSA and a bigram language model works well in predicting upcoming words • With larger LSA and bigram models we expect better results

  20. Thank you.
