
Natural Language Processing (highlights)




Presentation Transcript


  1. Natural Language Processing (highlights) Fall 2012 : Chambers

  2. Early NLP • Dave: Open the pod bay doors, HAL. • HAL: I’m sorry Dave. I’m afraid I can’t do that.

  3. Commercial NLP

  4. NLP is hard. (news headlines) • Minister Accused Of Having 8 Wives In Jail • Juvenile Court to Try Shooting Defendant • Teacher Strikes Idle Kids • Miners refuse to work after death • Local High School Dropouts Cut in Half • Red Tape Holds Up New Bridges • Clinton Wins on Budget, but More Lies Ahead • Hospitals Are Sued by 7 Foot Doctors • Police: Crack Found in Man's Buttocks

  5. NLP needs to adapt.

  6. NLP needs to adapt. http://xkcd.com/1083/

  7. NLP is also a Knowledge Problem

  8. Language Models • Language Modeling • Build probabilities of words and phrases • Author Detection • Who wrote this email? (is it spam?) • Historical analysis, who was the author of this book? • Intelligence community, who wrote this incendiary blog?

  9. Language Models: Author ID • “It was the year of Our Lord one thousand seven hundred and seventy-five. Spiritual revelations were conceded to England at that favoured period, as at this. Mrs. Southcott had recently attained her five-and-twentieth blessed birthday.” - Charles Dickens • “Mr. Bennet was among the earliest of those who waited on Mr. Bingley. He had always intended to visit him, though to the last always assuring his wife that he should not go; and till the evening after the visit was paid she had no knowledge of it.” - Jane Austen • “Baby, baby, baby oooh Like baby, baby, baby nooo Like baby, baby, baby oooh I thought you'd always be mine” - Justin Bieber

  10. Motivation • We want to predict something. • We have some text related to this something. • something = target label Y • text = text features X • Given X, what is the most probable Y?
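The question "Given X, what is the most probable Y?" can be written as a decision rule. The sketch below uses Bayes' rule to rewrite it; the factoring into P(X | Y) P(Y) is the usual generative reading and is an assumption here, since the slide itself only states the question.

```latex
\hat{Y} = \arg\max_{Y} P(Y \mid X)
        = \arg\max_{Y} \frac{P(X \mid Y)\, P(Y)}{P(X)}
        = \arg\max_{Y} P(X \mid Y)\, P(Y)
```

P(X) can be dropped in the last step because it is the same for every candidate Y.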

  11. Motivation: Author Detection “Alas the day! take heed of him; he stabbed me in mine own house, and that most beastly: in good faith, he cares not what mischief he does. If his weapon be out: he will foin like any devil; he will spare neither man, woman, nor child.” X = the passage above. Y = ? (one of { Charles Dickens, William Shakespeare, Herman Melville, Jane Austen, Homer, Leo Tolstoy })
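One way to make the author-detection setup concrete is to train a separate unigram language model per candidate author and pick the author whose model gives the passage the highest probability. This is a minimal sketch, not the course's implementation: the function names, the add-one smoothing, and the uniform prior over authors are all assumptions made for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def train(samples):
    """samples maps author -> training text (hypothetical data).

    Returns per-author word counts and the shared vocabulary.
    """
    counts = {author: Counter(tokenize(text)) for author, text in samples.items()}
    vocab = set()
    for counter in counts.values():
        vocab.update(counter)
    return counts, vocab


def log_prob(tokens, counter, vocab_size):
    """log P(X | Y) under a unigram model with add-one smoothing.

    One extra slot in the denominator is reserved for unseen words.
    """
    total = sum(counter.values())
    return sum(
        math.log((counter[token] + 1) / (total + vocab_size + 1))
        for token in tokens
    )


def most_probable_author(text, counts, vocab):
    """Return the author Y maximizing log P(X | Y), assuming a uniform prior."""
    tokens = tokenize(text)
    return max(counts, key=lambda author: log_prob(tokens, counts[author], len(vocab)))
```

With real training text per author (for example, a few chapters each), `most_probable_author(passage, *train(samples))` returns the label Y for a passage X. Unigrams ignore word order, so longer n-grams would be a natural next step.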

  12. N-gram Terminology • Unigrams: single words • Bigrams: pairs of words • Trigrams: three word phrases • 4-grams, 5-grams, 6-grams, etc. Example: “I saw a lizard yesterday” • Unigrams: I | saw | a | lizard | yesterday | </s> • Bigrams: <s> I | I saw | saw a | a lizard | lizard yesterday | yesterday </s> • Trigrams: <s> <s> I | <s> I saw | I saw a | saw a lizard | a lizard yesterday | lizard yesterday </s>
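The n-gram lists on this slide can be generated mechanically. The sketch below assumes whitespace tokenization and the padding convention the slide's example appears to follow: n - 1 start symbols `<s>` and one end symbol `</s>`. The function name `ngrams` is hypothetical.

```python
def ngrams(sentence, n):
    """Return the n-grams of a sentence as tuples.

    Pads with (n - 1) start symbols and a single end symbol.
    """
    tokens = sentence.split()
    padded = ["<s>"] * (n - 1) + tokens + ["</s>"]
    return [tuple(padded[i:i + n]) for i in range(len(padded) - n + 1)]


if __name__ == "__main__":
    for n in (1, 2, 3):
        print(n, [" ".join(gram) for gram in ngrams("I saw a lizard yesterday", n)])
```

Counting how often each tuple occurs in a corpus is the starting point for the word and phrase probabilities that a language model needs.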

  13. Sentiment Analysis

  14. It's about finding out what people think...

  15. Online social media sentiment apps • Several sentiment sites • Twitter Sentiment: http://twittersentiment.appspot.com/ • Twendz: http://twendz.waggeneredstrom.com/ • Twitrratr: http://twitrratr.com/

  16. Or was she?

  17. Twitter for Stock Market Prediction • “Hey Jon, Derek in Atlanta is having a bacon and egg, er, sandwich. Is that good for wheat futures?”

  18. Sometimes science is hype • The Bollen paper has since been strongly questioned by others in the field. • It contained some overuse of statistical significance tests that could have overestimated how well sentiment actually aligned with market movements. • Nobody has been able to recreate their findings.

  19. Monitor Real-World Events

  20. Learn a Lexicon • Find some data that is labeled • Movie reviews have star ratings • Manually label data yourself • Use a noisy label, such as “#angry” on tweets • Learn a model from the labeled data • Naïve Bayes Classifier • MaxEnt Model (you have not yet learned) • Decision Trees • etc. Try it now!
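To illustrate the "learn a model from the labeled data" step, here is a minimal sketch of a multinomial Naïve Bayes classifier trained on labeled text. The class name, the add-one smoothing, and the choice to skip words never seen in training are assumptions for illustration, not details from the slides.

```python
import math
from collections import Counter, defaultdict


class NaiveBayes:
    """Multinomial Naive Bayes over whitespace tokens (illustrative sketch)."""

    def __init__(self):
        self.label_counts = Counter()            # documents per label
        self.word_counts = defaultdict(Counter)  # label -> word -> count
        self.vocab = set()

    def train(self, labeled_docs):
        """labeled_docs is an iterable of (text, label) pairs."""
        for text, label in labeled_docs:
            self.label_counts[label] += 1
            for word in text.lower().split():
                self.word_counts[label][word] += 1
                self.vocab.add(word)

    def score(self, text, label):
        """log P(label) + sum of log P(word | label), with add-one smoothing."""
        total_docs = sum(self.label_counts.values())
        logp = math.log(self.label_counts[label] / total_docs)
        total_words = sum(self.word_counts[label].values())
        for word in text.lower().split():
            if word not in self.vocab:
                continue  # assumption: ignore words never seen in training
            count = self.word_counts[label][word]
            logp += math.log((count + 1) / (total_words + len(self.vocab)))
        return logp

    def classify(self, text):
        """Return the label with the highest score."""
        return max(self.label_counts, key=lambda label: self.score(text, label))
```

The labels can come from any of the sources listed on the slide: star ratings, manual annotation, or noisy hashtags such as "#angry". The learned per-label word counts double as a crude lexicon, since words with very different smoothed probabilities across labels are the sentiment-bearing ones.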

  21. Track Population Moods

  22. Information Extraction http://www.youtube.com/watch?v=YLR1byL0U8M

  23. Current Examples • Fact extraction about people. Instant biographies. • Search “tom hanks” on google • Never-ending Language Learning • http://rtw.ml.cmu.edu/rtw/

  24. Where is the Naval Academy? • The United States Naval Academy (also known as USNA, Annapolis, or Navy) is a four-year coeducational federal service academy located in Annapolis. • Start your tour at the Armel-Leftwich Visitor Center of the United States Naval Academy, Annapolis, Md. • this is a great place to walk around, whether you are a 1st time or frequent visitor to annapolis. the academy's campus is situated along the creek, thus offering beautiful views of the water and horizons. P(annapolis | sentence) = P(annapolis | features/ngrams/etc.)

  25. Extracting structured knowledge Each article can contain hundreds or thousands of items of knowledge... “The Lawrence Livermore National Laboratory (LLNL) in Livermore, California is a scientific research laboratory founded by the University of California in 1952.” • LLNL EQ Lawrence Livermore National Laboratory • LLNL LOC-IN California • Livermore LOC-IN California • LLNL IS-A scientific research laboratory • LLNL FOUNDED-BY University of California • LLNL FOUNDED-IN 1952
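The items of knowledge on this slide are naturally stored as (subject, relation, object) triples. The sketch below pulls three of them out of the example sentence with two hand-written regular expressions. The patterns and the function name `extract` are hypothetical and tuned to this one sentence; real extraction systems learn such patterns rather than hard-coding them.

```python
import re

SENTENCE = (
    "The Lawrence Livermore National Laboratory (LLNL) in Livermore, "
    "California is a scientific research laboratory founded by the "
    "University of California in 1952."
)


def extract(sentence):
    """Return a list of (subject, relation, object) triples."""
    triples = []
    entity = None

    # Pattern 1 (assumed): "Full Name (ABBR)" yields ABBR EQ Full Name.
    match = re.search(r"(?:The )?((?:[A-Z][a-z]+ )+)\(([A-Z]+)\)", sentence)
    if match:
        entity = match.group(2)
        triples.append((entity, "EQ", match.group(1).strip()))

    # Pattern 2 (assumed): "founded by [the] X in YEAR".
    match = re.search(r"founded by (?:the )?(.+?) in (\d{4})", sentence)
    if match and entity:
        triples.append((entity, "FOUNDED-BY", match.group(1)))
        triples.append((entity, "FOUNDED-IN", match.group(2)))

    return triples


if __name__ == "__main__":
    for triple in extract(SENTENCE):
        print(triple)
```

The LOC-IN and IS-A relations are left out on purpose: they need more than surface patterns (for example, knowing that Livermore is a city), which is part of why information extraction is hard.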

  26. Sentence Parsing

  27. Sentence Parsing • “Fed raises interest rates”

  28. Example 2 “I saw the man on the hill with a telescope.”

  29. Words barely affect structure. [Figure: two parse trees labeled “telescopes” and “planets”, one marked “Incorrect” and the other “Correct!!!”]

  30. Machine Translation Start at ~6min in. http://www.youtube.com/watch?v=Nu-nlQqFCKg

  31. Machine Translation • Commercial-grade translation • translate.google.com

  32. Machine Translation • How to model translations? • Words: P( casa | house ) • Spurious words: P( a | null ) • Fertility: Pn( 1 | house ) • English word translates to one Spanish word • Distortion: Pd( 5 | 2 ) • The 2nd English word maps to the 5th Spanish word

  33. Distortion • Encourage translations to follow the diagonal… • P( 4 | 4 ) * P( 5 | 5 ) * …

  34. Learning Translations • Huge corpus of “aligned sentences”. • Europarl • Corpus of European Parliament proceedings • The EU is mandated to translate into all 21 official languages • 21 languages, (semi-) aligned to each other • P( casa | house ) = (count all casa/house pairs!) • Pd( 5 | 2 ) = (count all sentences where 2nd word went to 5th word)
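Both counts on this slide reduce to the same operation: relative frequency over aligned pairs. The sketch below assumes the word alignments are already known, which is a simplification, since in practice they have to be inferred from the aligned sentences. The function name and the toy pairs are hypothetical.

```python
from collections import Counter, defaultdict


def conditional_probs(pairs):
    """Estimate P(target | source) by relative frequency from (source, target) pairs."""
    counts = defaultdict(Counter)
    for source, target in pairs:
        counts[source][target] += 1
    return {
        source: {target: n / sum(targets.values()) for target, n in targets.items()}
        for source, targets in counts.items()
    }


# Hypothetical aligned word pairs: (English word, Spanish word).
word_pairs = [("house", "casa"), ("house", "casa"), ("house", "hogar"), ("the", "la")]

# Hypothetical aligned positions: (English position, Spanish position).
position_pairs = [(2, 5), (2, 2), (1, 1), (2, 2)]

word_table = conditional_probs(word_pairs)            # word_table["house"]["casa"] is P(casa | house)
distortion_table = conditional_probs(position_pairs)  # distortion_table[2][5] is Pd(5 | 2)
```

On this toy data, two of the three "house" alignments go to "casa", so P( casa | house ) comes out as 2/3, and one of the three alignments from position 2 goes to position 5, so Pd( 5 | 2 ) is 1/3.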

  35. Machine Translation Technology • Hand-held devices for military • Speak English -> recognition -> translation -> generate Urdu • Translate web documents • Education technology? • Doesn’t yet receive much of a focus

  36. Text Influence

  37. Text Influence • Can text style influence people? • Can a computer learn to adapt language to accomplish a goal? • Obama 2012 campaign • Sent emails to people every day asking for donations • Sent variations of email, and learned what features caused more donations • http://www.businessweek.com/articles/2012-11-29/the-science-behind-those-obama-campaign-e-mails

  38. Mobile Devices

  39. Mobile Devices • Keystroke prediction has been around for a while now. • New idea: learn individual user preferences • New idea: use a user’s social media text to train on • http://www.youtube.com/watch?v=3hQT-o8ch0o • http://www.youtube.com/watch?v=kA5Horw_SOE
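Learning individual user preferences can be sketched as a bigram model trained on one user's own messages: the keyboard suggests the words that this user most often typed after the previous word. The function names and the example texts are hypothetical, and real keyboards combine a model like this with a general-purpose one.

```python
from collections import Counter, defaultdict


def train_bigrams(texts):
    """Count, for each word, which words followed it in the user's texts."""
    following = defaultdict(Counter)
    for text in texts:
        tokens = ["<s>"] + text.lower().split()
        for prev, nxt in zip(tokens, tokens[1:]):
            following[prev][nxt] += 1
    return following


def suggest(following, prev_word, k=3):
    """Return up to k most frequent next words after prev_word."""
    candidates = following.get(prev_word.lower(), Counter())
    return [word for word, _ in candidates.most_common(k)]


if __name__ == "__main__":
    # Hypothetical messages from one user (e.g. their social media posts).
    user_texts = ["see you at the gym", "at the gym now", "meet at the office"]
    model = train_bigrams(user_texts)
    print(suggest(model, "the"))
```

Training on a user's social media text, as the slide proposes, only changes where `texts` comes from; the counting stays the same.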
