Words that will inspire

Explore the data and insights behind popular TED Talks, leveraging text analytics and predictive modeling. Discover influential factors that drive talk popularity and learn how R libraries and Shiny facilitate data transformations and tool deployment.

cardoza

Presentation Transcript


  1. Words that will inspire. Eduardo Contreras Cortes, www.speakthedata.com, @edco_one

  2. The Motivation
     • "I have a dream that one day…"
     • "We choose to go to the moon in this decade…"
     • "We shall fight on the beaches…"

  3. The Inspiration

  4. The back-of-the-envelope calculation

  5. But we need more data!
     • Sufficient number of talks
     • Ideally same format and style
     • Transcripts to be scrapable
     • A way to track progress of popularity

  6. Eureka! https://www.kaggle.com/rounakbanik/ted-talks

  7. The approach
     • I. Data extraction and feature engineering
     • II. Data analysis and model ensemble
     • III. Model insights

  8. I. Data Extraction and Feature Engineering “The” Source: www.tidytextmining.com/

  9. I. Data Extraction and Feature Engineering
     • Dataset
        • More than 2,500 TED Talks from all TED events
        • Full transcript of each talk
        • Available data: number of views, comments and ratings
        • Enriched dataset: filmed date, published date and duration
     • Building features
        • Word counts: number of sentences and words per minute, average words per sentence
        • Audience reaction: laughs, questions, applause
        • N-gram word analysis: frequency of words like “I”, “You”, “Going To”, “Want”
     • Target to predict: binary classification of whether the TED Talk is among the most-viewed talks
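The features listed on this slide can be illustrated with a small sketch. The talk itself used R (tidytext); the version below is plain Python, for illustration only. The function name `transcript_features`, the assumption that audience reactions appear as parenthesised tags such as "(Laughter)", and the rough sentence-splitting rule are all hypothetical choices, not details taken from the presentation.

```python
import re
from collections import Counter


def transcript_features(transcript: str, duration_minutes: float) -> dict:
    """Compute simple delivery features from one talk transcript (illustrative)."""
    # Assumption: audience reactions are transcribed as tags like "(Laughter)".
    laughs = len(re.findall(r"\(laughter\)", transcript, flags=re.IGNORECASE))
    applause = len(re.findall(r"\(applause\)", transcript, flags=re.IGNORECASE))

    # Remove every parenthesised tag before counting spoken words.
    spoken = re.sub(r"\([^)]*\)", " ", transcript)

    words = re.findall(r"[a-z']+", spoken.lower())
    # Rough heuristic: runs of '.', '!' or '?' end a sentence.
    sentences = [s for s in re.split(r"[.!?]+", spoken) if s.strip()]
    n_sentences = max(len(sentences), 1)  # avoid division by zero on empty input

    unigrams = Counter(words)
    bigrams = Counter(zip(words, words[1:]))

    return {
        "words_per_minute": len(words) / duration_minutes,
        "sentences_per_minute": len(sentences) / duration_minutes,
        "avg_words_per_sentence": len(words) / n_sentences,
        "laughs": laughs,
        "applause": applause,
        "questions": spoken.count("?"),
        "freq_i": unigrams["i"],
        "freq_you": unigrams["you"],
        "freq_want": unigrams["want"],
        "freq_going_to": bigrams[("going", "to")],
    }


sample = (
    "I want you to imagine this. Are you going to try? (Laughter) "
    "I am going to. (Applause)"
)
features = transcript_features(sample, duration_minutes=0.5)
print(features)
```

Each talk would yield one such row, and the rows together would form the feature table used in the later steps.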

  10. II. Data analysis and model ensemble (four steps, numbered 1 to 4 on the slide)
     • Correlation Analysis
        • Remove variables that were correlated
     • Descriptive Analysis
        • Duration of the talks to be similar
        • Frequency analysis of n-grams
        • Standardised views by time shown on the website
     • Additional Feature Engineering
        • Combine n-grams to reduce features
     • Model Assessment
        • From simple to complex models
        • Understand the most relevant variables of the models
        • Produce an explainable model!
     • Libraries used, as grouped on the slide
        • Tidyverse, Tidytext, Smbinning, Wordcloud
        • Cor, Vinf
        • Cor, Vinf
        • ROCR, glm, randomForest, Xgboost, H2O
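The "remove variables that were correlated" step can be sketched as a greedy filter on pairwise Pearson correlation. This is a plain Python illustration, not the R code used in the talk. The 0.8 threshold, the helper names `pearson` and `drop_correlated`, and the toy data are assumptions made for the example.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length numeric lists (non-constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def drop_correlated(columns: dict, threshold: float = 0.8) -> list:
    """Keep a feature only if |r| with every already-kept feature is below threshold."""
    kept = []
    for name, values in columns.items():
        if all(abs(pearson(values, columns[k])) < threshold for k in kept):
            kept.append(name)
    return kept


# Toy data: the second column is exactly proportional to the first.
data = {
    "words_per_minute": [120, 150, 160, 180, 140],
    "sentences_per_minute": [10, 13, 14, 16, 12],
    "laughs": [3, 0, 7, 1, 4],
}
selected = drop_correlated(data)
print(selected)
```

The greedy order matters: whichever feature of a correlated pair comes first is the one retained, so columns would normally be ordered by how interpretable or useful each one is.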

  11. II. Data analysis and model ensemble
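One way to read the standardisation of views by time on the website, together with the binary "most viewed" target, is sketched below in plain Python (the talk used R). The views-per-day definition, the 25% cut-off, the reference date and the sample talks are all assumptions for illustration; the presentation does not state the exact rule.

```python
from datetime import date


def views_per_day(views: int, published: date, as_of: date) -> float:
    """Views divided by days online, so older talks are not favoured."""
    days_online = max((as_of - published).days, 1)
    return views / days_online


def label_top_talks(rates: list, top_fraction: float = 0.25) -> list:
    """Return 1 for talks in the top `top_fraction` by rate, else 0.

    Ties at the cut-off are all labelled 1.
    """
    n_top = max(1, round(len(rates) * top_fraction))
    cutoff = sorted(rates, reverse=True)[n_top - 1]
    return [1 if r >= cutoff else 0 for r in rates]


as_of = date(2018, 1, 1)  # hypothetical snapshot date
talks = [
    (1_000_000, date(2017, 1, 1)),
    (1_000_000, date(2010, 1, 1)),
    (200_000, date(2017, 10, 1)),
    (50_000, date(2016, 1, 1)),
]
rates = [views_per_day(v, p, as_of) for v, p in talks]
labels = label_top_talks(rates)
print(labels)
```

In this toy data the first two talks have the same raw view count, but only the recently published one is labelled as a top talk once time online is taken into account.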

  12. III. Model Insights
     • The selected model
        • Logistic Regression Scorecard with 7 variables
        • AUC: 76%, Accuracy: 73%
     • Shorter duration talks were more effective
        • 2X More Effective
     • Speak slowly: fewer words per minute is better
        • 2X More Effective
     • Ask questions! The more the better!
        • 1.5X More Effective
     • Make your audience laugh!
        • 1.5X More Effective
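For readers unfamiliar with the two metrics quoted for the selected model, here is a minimal plain Python sketch of how AUC and accuracy are computed from predicted scores. The talk used R (ROCR). The labels and scores below are made up, and the 0.5 threshold is an assumption.

```python
def accuracy(y_true, y_score, threshold=0.5):
    """Share of cases where the thresholded score matches the true label."""
    preds = [1 if s >= threshold else 0 for s in y_score]
    return sum(p == t for p, t in zip(preds, y_true)) / len(y_true)


def auc(y_true, y_score):
    """Probability that a random positive outscores a random negative (ties count half)."""
    pos = [s for s, t in zip(y_score, y_true) if t == 1]
    neg = [s for s, t in zip(y_score, y_true) if t == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Made-up labels (1 = top talk) and model scores.
y_true = [1, 0, 1, 0, 1, 0]
y_score = [0.9, 0.4, 0.6, 0.7, 0.3, 0.2]

print(f"AUC: {auc(y_true, y_score):.2f}")
print(f"Accuracy: {accuracy(y_true, y_score):.2f}")
```

AUC depends only on how the scores rank the talks, while accuracy also depends on the chosen threshold, which is why the two figures can differ for the same model.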

  13. III. Model Insights

  14. III. Model Insights
     • Libraries used
        • Developed in Shiny
        • Deployed in Shinyapps.io
        • Shinydashboard as layout
        • Tidyverse for data transformations
        • Plotly for graphics
     www.speakthedata.com

  15. Final remarks
     • Text analytics and predictive modelling revealed influential factors that predict the popularity of talks
     • R libraries eased the work of data transformation and modelling
     • Shiny and Shinyapps.io facilitated the deployment of the tool

  16. Thank you! www.speakthedata.com @edco_one
