
Text Mining Presidential Speeches – Brand Management

Text Mining Presidential Speeches – Brand Management. Timothy D’Auria, BostonDecision.com, May 14th, 2012.





Presentation Transcript


  1. Text Mining Presidential Speeches – Brand Management

    Timothy D’Auria BostonDecision.com May 14th, 2012 Disclaimer: Boston Decision believes the information contained herein to be accurate. However, Boston Decision, LLC makes no guarantees and no warranties, written, oral or implied, including without limitation any implied warranties of merchantability, fitness, or accuracy. Recipient assumes all responsibility for use of the information contained herein.
  2. Boston Decision: Mine – Predict – Automate. Provide the skills, resources, and expertise.
  3. How It All Started
  4. Thought… Track how a candidate changes positions over time.
  5. How to Do It
  6. Sentiment Analysis: sentiment scored on a scale from -1 (negative) to +1 (positive)
  7. Standard Deviation of Sentiment: Average Sentiment = (+1 + (-1)) / 2 = 0. Squared difference of +1 from the mean = (+1 - 0)^2 = 1. Squared difference of -1 from the mean = (-1 - 0)^2 = 1. Standard Deviation = sqrt((1 + 1) / (n - 1)) = sqrt(2 / 1) = sqrt(2) ≈ 1.41 (sample standard deviation with n = 2 statements).
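The arithmetic on this slide can be checked in a few lines of R. This is a minimal sketch: the two scores are the slide's +1 and -1, and the point is that R's built-in sd() uses the same n - 1 denominator as the hand calculation.

```r
# Two statements on the same issue: one fully positive (+1), one fully negative (-1).
sentiment <- c(1, -1)

m  <- mean(sentiment)                           # (+1 + -1) / 2 = 0
sq <- (sentiment - m)^2                         # squared differences: 1 and 1
s  <- sqrt(sum(sq) / (length(sentiment) - 1))   # sample SD divides by n - 1 = 1

s               # sqrt(2), about 1.414
sd(sentiment)   # base R's sd() also divides by n - 1, so it agrees
```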
  8. Sentiment Analysis
  9. The Flip-Flop Score: FFScore = sum of the sentiment standard deviations across all pertinent issues.
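A minimal sketch of the Flip-Flop Score, assuming each of a candidate's statements has already been given a sentiment score between -1 and +1 and tagged with the issue it addresses. The deck does not show how statements were scored or tagged, so the data frame and the helper name ffScore are hypothetical.

```r
# Hypothetical sentiment scores (-1 to +1) for one candidate's statements,
# each tagged with the issue it addresses. Illustrative values only.
statements <- data.frame(
  issue     = c("economy", "economy", "health", "health", "health", "energy", "energy"),
  sentiment = c(0.8,       0.6,       1.0,      -1.0,     0.5,      0.2,      0.2)
)

# Flip-Flop Score: sum over issues of the standard deviation of sentiment.
ffScore <- function(issue, sentiment) {
  per.issue <- tapply(sentiment, issue, sd)  # one SD per issue (n - 1 denominator)
  sum(per.issue, na.rm = TRUE)               # an issue with one statement gives NA; skip it
}

ffScore(statements$issue, statements$sentiment)
```

Here the health issue, where the candidate swings from +1 to -1, contributes most of the score, while the perfectly consistent energy statements contribute zero.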
  10. The Problem: Flip-flops are rare. Flip-flops are rarely clean-cut.
  11. Finding: Candidates are fairly consistent in their message.
  12. Candidate Fingerprint: Given a random speech where we are unsure who said it…
  13. Prediction: Can we predict who said it? With what accuracy?
  14. Why bother? Automatic brand consistency; plagiarism detection. With a simple change to the text used: predict effective campaign messaging; predict profitable content; identify comments indicating readiness to buy; optimal keyword selection.
  15. Speech Sources: http://obamaspeeches.com/, http://mittromneycentral.com/speeches/
  16. The Technology: R (free) with the packages tm, wordcloud, kernlab, plyr, class, and Snowball; RStudio (free)
  17. Stored the Speeches
  18. Speeches
  19. Define Speech Directories in R:
      candidates <- c("romney", "obama")
      pathname <- "C:/Users/tdauria/Google Drive/meetups/05/speeches"
  20. Create a Corpus: A corpus is a container for documents.
  21. What is a document? A document contains text information.
  22. What is a document? A text file, a paragraph, a sentence, etc.
  23. Speech Corpus: one corpus per candidate; each document is a single speech.
      s.cor <- Corpus(DirSource(directory = s.dir, encoding = "ANSI"))
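To make the "one corpus per candidate, one document per speech" layout concrete without depending on tm, here is a base-R stand-in: a named list holding one character vector per candidate, with one element per speech file. It assumes each candidate has a subdirectory named after them under pathname holding plain-text files; the helper name readSpeeches is hypothetical and is not the deck's code.

```r
# Hypothetical helper: read every speech file for each candidate into a named list.
# Assumes the layout <pathname>/<candidate>/<speech files>.
readSpeeches <- function(pathname, candidates) {
  speeches <- lapply(candidates, function(cand) {
    files <- list.files(file.path(pathname, cand), full.names = TRUE)
    # One string per file: join the file's lines with spaces.
    texts <- vapply(files, function(f) paste(readLines(f, warn = FALSE), collapse = " "),
                    character(1))
    setNames(texts, basename(files))
  })
  setNames(speeches, candidates)
}

# Usage with throwaway files in a temporary directory.
tmp <- file.path(tempdir(), "speeches")
for (cand in c("romney", "obama")) {
  dir.create(file.path(tmp, cand), recursive = TRUE, showWarnings = FALSE)
}
writeLines("We will restore the free economy.", file.path(tmp, "romney", "s1.txt"))
writeLines("It is time for change.", file.path(tmp, "obama", "s1.txt"))
writeLines("Hope is on the way.", file.path(tmp, "obama", "s2.txt"))

corpora <- readSpeeches(tmp, c("romney", "obama"))
lengths(corpora)   # number of speeches per candidate
```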
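  24. Clean the Corpus: What is the value of text? Upper versus lower case; stop words such as The, A, An, This, That; word variants such as Telephone, phone, phones.
  25. Cleanup Function:
      cleanCorpus <- function(corpus) {
        # Apply text-mining clean-up functions from the tm package.
        # convertPrettyApostrophe is the author's own helper (not part of tm); its code is not shown.
        corpus.tmp <- tm_map(corpus, convertPrettyApostrophe)
        corpus.tmp <- tm_map(corpus.tmp, removePunctuation)
        # Current tm versions require base functions such as tolower to be wrapped in content_transformer().
        corpus.tmp <- tm_map(corpus.tmp, content_transformer(tolower))
        corpus.tmp <- tm_map(corpus.tmp, removeWords, stopwords("english"))
        corpus.tmp <- tm_map(corpus.tmp, stemDocument, language = "english")
        # Strip whitespace last so the gaps left by removed words are collapsed.
        corpus.tmp <- tm_map(corpus.tmp, stripWhitespace)
        return(corpus.tmp)
      }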
  26. Remove Punctuation: corpus.tmp <- tm_map(corpus.tmp, removePunctuation)
  27. To Lowercase: corpus.tmp <- tm_map(corpus.tmp, content_transformer(tolower))
  28. Remove English Stopwords: corpus.tmp <- tm_map(corpus.tmp, removeWords, stopwords("english"))
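The slides' before/after text for these clean-up steps did not survive, so here is a base-R approximation of lowercasing, punctuation removal, and stopword removal applied to one sentence. It is not tm's implementation: the stopword list is a tiny hypothetical sample (tm's stopwords("english") is much longer), and stemming is left out.

```r
# Tiny hypothetical stopword sample; tm's stopwords("english") is far longer.
stopwords.sample <- c("the", "a", "an", "this", "that", "is", "of", "and", "to", "we")

cleanText <- function(text, stops = stopwords.sample) {
  text   <- tolower(text)                        # to lowercase
  text   <- gsub("[[:punct:]]+", "", text)       # remove punctuation
  tokens <- strsplit(text, "[[:space:]]+")       # split each string on whitespace
  # Drop stopwords and empty tokens, then rejoin with single spaces.
  vapply(tokens, function(w) paste(w[w != "" & !(w %in% stops)], collapse = " "),
         character(1))
}

cleanText("This is the Time for CHANGE, and we will make it happen!")
# "time for change will make it happen"
```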
  29. Term Document Matrix
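A term-document matrix has one row per term and one column per document, with each cell holding a term count. The deck builds it with tm; as a sketch of the data structure only, here is a base-R version working on already-cleaned text. The helper name termDocMatrix and the three toy speeches are hypothetical.

```r
# Base-R sketch of a term-document matrix: rows = terms, columns = documents,
# cells = how often each term occurs in each document.
termDocMatrix <- function(docs) {
  tokens <- strsplit(docs, "[[:space:]]+")
  terms  <- sort(unique(unlist(tokens)))
  counts <- vapply(tokens, function(w) as.vector(table(factor(w, levels = terms))),
                   integer(length(terms)))
  m <- matrix(counts, nrow = length(terms))   # keeps a matrix even with a single term
  dimnames(m) <- list(terms, names(docs))
  m
}

docs <- c(speech1 = "jobs economy jobs", speech2 = "hope change", speech3 = "economy hope")
tdm <- termDocMatrix(docs)
tdm
```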
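  30. Sparse Terms: Most terms are used infrequently and won’t add value to the analysis, so remove them. With a sparsity setting of 0.7, a term is kept only if it appears in more than 30% of the speeches: s.tdm <- removeSparseTerms(s.tdm, 0.7)
  31. Word Cloud: a visual tool for exploring how frequently words are used: wordcloud(term, freq)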
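The sparsity threshold is easy to misread, so here is a base-R version of the filter on a toy matrix. It keeps a term only if it occurs in more than (1 - sparse) of the documents, which as far as I can tell is the rule tm::removeSparseTerms applies; the helper name removeSparse and the data are hypothetical.

```r
# Keep a term only if it occurs (count > 0) in more than (1 - sparse) of the documents.
removeSparse <- function(m, sparse) {
  stopifnot(sparse > 0, sparse < 1)
  keep <- rowSums(m > 0) > ncol(m) * (1 - sparse)
  m[keep, , drop = FALSE]
}

# Toy term-document matrix: 3 terms x 10 speeches.
m <- rbind(
  economy = c(2, 1, 3, 1, 0, 2, 1, 4, 1, 2),  # appears in 9 of 10 speeches
  hope    = c(0, 0, 1, 0, 0, 1, 0, 0, 1, 1),  # appears in 4 of 10
  granite = c(0, 0, 0, 0, 1, 0, 0, 0, 0, 0)   # appears in 1 of 10
)
rownames(removeSparse(m, 0.7))   # needs more than 10 * 0.3 = 3 speeches: "economy" "hope"
```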
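A word cloud needs two inputs, the terms and their frequencies. A minimal sketch of getting them from a term-document matrix by summing each term's row; the toy matrix is illustrative, and the drawing step only runs when the wordcloud package is installed and R is interactive.

```r
# Toy term-document matrix (terms x speeches); values are illustrative.
m <- rbind(economy = c(3, 2, 4), jobs = c(1, 2, 2), hope = c(0, 1, 0))
colnames(m) <- c("s1", "s2", "s3")

# Total count of each term across all speeches, most frequent first.
freq <- sort(rowSums(m), decreasing = TRUE)
freq   # economy 9, jobs 5, hope 1

# Draw the cloud only in an interactive session with wordcloud installed.
if (interactive() && requireNamespace("wordcloud", quietly = TRUE)) {
  wordcloud::wordcloud(words = names(freq), freq = freq, min.freq = 1)
}
```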
  32. Obama Word Cloud
  33. Romney Word Cloud
  34. Relationship between concepts

  35. Romney
  36. Romney
  37. Romney: Take leadership; free economy; business opportunity; Obama takes away freedom
  38. Obama
  39. Obama
  40. Obama: Caring for people; time for change; hope
  41. Romney vs. Obama: Obama’s themes are broader (judged by the height of terms on the dendrogram). Romney’s themes are more business-oriented; Obama’s are more personal.
  42. Concept Framing: How does each candidate frame a topic in terms of other topics? (Daniel P. Parker, U. Penn)
  43. Term Associations: Economy, Energy, Health, Military. findAssocs(tdm[[1]][[2]], 'economy', 0.50) lists the terms whose frequency across speeches correlates with 'economy' at 0.50 or higher.
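The dendrogram slides group terms that tend to appear together in the same speeches. A minimal sketch with base R's hclust on a toy term-document matrix; the slides do not say which distance or linkage was used, so Euclidean distance between term rows and hclust's default complete linkage are assumptions.

```r
# Toy term-document matrix: rows = terms, columns = speeches.
m <- rbind(
  economy  = c(4, 3, 0, 1, 5),
  jobs     = c(3, 3, 0, 1, 4),
  business = c(2, 3, 0, 0, 4),
  hope     = c(0, 1, 4, 3, 0),
  change   = c(0, 1, 3, 4, 1)
)
colnames(m) <- paste0("speech", 1:5)

fit    <- hclust(dist(m))     # dist() compares rows, i.e. terms; complete linkage by default
groups <- cutree(fit, k = 2)  # cut the tree into two clusters of terms
groups                        # economy/jobs/business vs. hope/change

# Draw the dendrogram when running interactively; join height reflects dissimilarity.
if (interactive()) plot(fit, main = "Term dendrogram (toy data)")
```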
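What findAssocs reports is a correlation between term frequencies across documents. A base-R sketch of the same idea: correlate one term's row with every other term's row and keep those at or above the cutoff, rounded to two decimals. The helper name assocs and the toy matrix are hypothetical.

```r
# Correlate one term's counts across speeches with every other term's counts.
assocs <- function(m, term, corlimit) {
  others <- m[rownames(m) != term, , drop = FALSE]
  r <- apply(others, 1, function(row) cor(m[term, ], row))  # Pearson correlation
  round(sort(r[r >= corlimit], decreasing = TRUE), 2)
}

# Toy term-document matrix: rows = terms, columns = speeches.
m <- rbind(
  economy = c(5, 3, 0, 1, 4),
  jobs    = c(4, 3, 1, 1, 3),
  taxes   = c(2, 1, 0, 0, 2),
  hope    = c(0, 1, 4, 3, 1)
)
assocs(m, "economy", 0.50)   # jobs and taxes; hope is negatively correlated
```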
  44. Economy
  45. Energy
  46. Health
  47. Military
  48. Create a Predictive Model: input a speech, output a name
  49. Term Document Matrix
  50. Term Document Matrix
  51. Hypothesis: Candidates have unique linguistic patterns. These patterns can serve as a fingerprint.
  52. Predict an unknown
  53. K-Nearest Neighbor Algorithm: Which past speech most closely matches the speech we are trying to identify? (Diagram: an unknown speech "?" among labeled Romney and Obama speeches.)
  54. K-Nearest Neighbor Algorithm: Closeness is measured by plotting each speech as a point whose coordinates are its term frequencies. (Diagram: an unknown speech "?" among labeled Romney and Obama speeches.)
  55. K-Nearest Neighbor Algorithm: K-nearest neighbor is one of hundreds of possible modeling approaches. It is fast, simple, and easy to conceptualize. Accurate?
  56. Some Manipulation: transpose the term-document matrix so that each row is a speech and each column is a term: s.mat <- t(as.matrix(tdm[["tdm"]]))
  57. Hold-Out Sample: Testing
  58. Run Model: knn(training data, test data, training answers), from the class package. Runs in microseconds.
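The idea on these slides fits in a few lines of base R: treat each speech as a point whose coordinates are its term frequencies, measure Euclidean distance to the unknown speech, and take a majority vote among the k closest. This is a minimal sketch with a hypothetical helper name (knnPredict) and toy data, not the class package's implementation.

```r
# Minimal k-nearest-neighbor classifier.
# train: speeches x terms matrix; labels: speaker of each row; newdoc: term counts.
knnPredict <- function(train, labels, newdoc, k = 1) {
  d <- sqrt(colSums((t(train) - newdoc)^2))  # Euclidean distance to every training speech
  nearest <- labels[order(d)[seq_len(k)]]    # labels of the k closest speeches
  names(which.max(table(nearest)))           # majority vote; ties go to the alphabetically first label
}

train <- matrix(c(5, 4, 0, 1,
                  6, 3, 1, 0,
                  0, 1, 5, 4,
                  1, 0, 4, 5),
                nrow = 4, byrow = TRUE,
                dimnames = list(NULL, c("economy", "business", "hope", "change")))
labels  <- c("romney", "romney", "obama", "obama")
unknown <- c(economy = 1, business = 1, hope = 4, change = 3)

knnPredict(train, labels, unknown, k = 3)   # "obama"
```

Unlike this sketch, class::knn breaks ties at random rather than alphabetically.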
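A sketch of the hold-out test with class::knn, whose arguments are the training rows, the test rows, and the training answers, as on the slide. The speech-by-term matrix here is simulated toy data standing in for s.mat, and the 30% hold-out share and k = 1 are assumptions.

```r
library(class)

# Simulated speech-by-term counts for two speakers with different word habits.
set.seed(42)
romney <- matrix(rpois(20 * 4, lambda = c(6, 5, 1, 1)), ncol = 4, byrow = TRUE)
obama  <- matrix(rpois(20 * 4, lambda = c(1, 1, 6, 5)), ncol = 4, byrow = TRUE)
s.mat  <- rbind(romney, obama)
colnames(s.mat) <- c("economy", "business", "hope", "change")
speaker <- factor(rep(c("romney", "obama"), each = 20))

# Hold out 30% of the speeches for testing; train on the rest.
test.idx <- sample(nrow(s.mat), size = round(0.3 * nrow(s.mat)))
pred <- knn(train = s.mat[-test.idx, ], test = s.mat[test.idx, ],
            cl = speaker[-test.idx], k = 1)

table(actual = speaker[test.idx], predicted = pred)
```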
  59. The Results

  60. Confusion Matrix: actual vs. predicted speaker
  61. Accuracy: actual vs. predicted. Accuracy = sum of the diagonal / n = 19 / 19 = 100%
  62. Validation: Resample new test cases and rerun the model, then average the accuracy results. Average accuracy = 95%
  63. Score Algorithm: a program that takes in a speech and outputs the speaker. It accepts a file or URL: scoreSpeech(new speech, knn.train.data)
  64. Takeaways: Data is all around us. There is a shift toward unstructured data (80%). Automation. Any business, any industry, any data.
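The accuracy formula in code: cross-tabulate actual against predicted speakers, then divide the diagonal (correct predictions) by the total. The five labels below are illustrative only, not the deck's 19 test speeches.

```r
# Illustrative actual and predicted speakers for five test speeches.
lv <- c("obama", "romney")
actual    <- factor(c("obama", "obama",  "romney", "romney", "obama"), levels = lv)
predicted <- factor(c("obama", "romney", "romney", "romney", "obama"), levels = lv)

cm <- table(actual = actual, predicted = predicted)  # confusion matrix
cm
accuracy <- sum(diag(cm)) / sum(cm)                  # correct predictions lie on the diagonal
accuracy                                             # 4 / 5 = 0.8
```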
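The validation step (draw a new test set, rerun the model, repeat, average) is repeated random hold-out. A minimal sketch using class::knn; the helper name resampledAccuracy, the 50 repeats, the 30% test share, and the simulated data are all assumptions.

```r
library(class)

# Average accuracy over repeated random train/test splits.
# x: speeches x terms matrix; y: factor of speakers.
resampledAccuracy <- function(x, y, repeats = 50, test.share = 0.3, k = 1) {
  acc <- replicate(repeats, {
    test.idx <- sample(nrow(x), size = round(test.share * nrow(x)))
    pred <- knn(x[-test.idx, , drop = FALSE], x[test.idx, , drop = FALSE],
                cl = y[-test.idx], k = k)
    mean(pred == y[test.idx])   # accuracy on this split
  })
  mean(acc)
}

# Simulated data: two speakers with different term-usage profiles.
set.seed(1)
x <- rbind(matrix(rpois(80, c(6, 5, 1, 1)), ncol = 4, byrow = TRUE),
           matrix(rpois(80, c(1, 1, 6, 5)), ncol = 4, byrow = TRUE))
y <- factor(rep(c("romney", "obama"), each = 20))
resampledAccuracy(x, y)
```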
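The deck does not show the code behind scoreSpeech, so this is a hypothetical sketch of how such a helper could work: read text from a file, a URL, or a string, count only the terms the model was trained on, and ask class::knn for the nearest training speech. The helper name scoreSpeechSketch is made up; it assumes train is a speech-by-term count matrix built with the same cleaning steps, and a web page would also need its HTML stripped first.

```r
library(class)

# Hypothetical: source may be a file path, an http(s) URL, or raw text.
scoreSpeechSketch <- function(source, train, labels, k = 1) {
  text <- if (file.exists(source) || grepl("^https?://", source)) {
    paste(readLines(source, warn = FALSE), collapse = " ")  # readLines also accepts URLs
  } else source
  words <- strsplit(gsub("[[:punct:]]+", "", tolower(text)), "[[:space:]]+")[[1]]
  # Count only the training vocabulary; unseen words become NA and are dropped.
  counts <- table(factor(words, levels = colnames(train)))
  newdoc <- matrix(as.vector(counts), nrow = 1, dimnames = list(NULL, colnames(train)))
  as.character(knn(train, newdoc, cl = labels, k = k))
}

train <- matrix(c(5, 4, 0, 1,
                  6, 3, 1, 0,
                  0, 1, 5, 4,
                  1, 0, 4, 5), nrow = 4, byrow = TRUE,
                dimnames = list(NULL, c("economy", "business", "hope", "change")))
labels <- factor(c("romney", "romney", "obama", "obama"))
scoreSpeechSketch("Hope and change, change and hope: hope!", train, labels)   # "obama"
```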
  65. Next Event

    End of May / early June: Profits from Data Mining. Tim D’Auria, tdauria@bostondecision.com, Boston Decision, LLC: Automate & Predict Business, http://www.bostondecision.com