
CS 679: Advanced NLP




Presentation Transcript


  1. This work is licensed under a Creative Commons Attribution-Share Alike 3.0 Unported License. CS 679: Advanced NLP. Lecture #1: Introduction to Text Mining

  2. Objectives for Today • Quick course info. • Overview of Text Mining • Discuss your applications of Text Mining • Elements of Text Mining • Introduce course objectives

  3. Course Info. • Office Hours: Tue. & Thu. 3-4pm (without appointment) OR by appointment • TA: TBD • Web page: https://facwiki.cs.byu.edu/cs679 • Syllabus • Regularly updated schedule: due dates, reading assignments, project guidelines, lecture notes • Google Group “BYU CS 679” • Email: ringger AT cs DOT byu DOT edu • Grades: http://gradebook.byu.edu

  4. Assignments • Readings, each with a report of at most one page • Mostly research papers (see course web page for all hyperlinks) • Usually one reading report per week • Intro. Projects (Presentation, Report) • Semester Project (Proposal, Presentation, Report)

  5. Course Policies • Early • Late • Grades • Other (See Syllabus for details)

  6. Text Mining: The process of discovering previously unknown information in large text collections. (Paraphrased from M. Hearst)

  7. Other Definitions • Looking for patterns in unstructured text (Nahm) • Text mining applies the same analytical functions of data mining to the domain of textual information (Doore)

  8. “Search” versus “Discover” • Structured Data: Search (goal-oriented) = Data Retrieval; Discover (opportunistic) = Data Mining • Unstructured Data (Text): Search (goal-oriented) = Information Retrieval; Discover (opportunistic) = Text Mining (Credit: adapted from slide by Nathan Treloar, AvaQuest)
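
The search/discover distinction on slide 8 can be illustrated with a toy sketch. Everything here is an assumption made for illustration and is not taken from the lecture: the three example documents, the helper names `search` and `discover_pairs`, and the choice of term co-occurrence counting as a stand-in for "discovery".

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy corpus, invented for this illustration.
docs = [
    "aspirin reduces headache",
    "aspirin relieves headache pain",
    "caffeine causes insomnia",
]

def search(docs, query):
    """Goal-oriented: return the documents that contain a known query term."""
    q = query.lower()
    return [d for d in docs if q in d.lower().split()]

def discover_pairs(docs, min_count=2):
    """Opportunistic: surface term pairs that co-occur in several documents,
    without being told in advance which terms to look for."""
    pair_counts = Counter()
    for d in docs:
        terms = sorted(set(d.lower().split()))
        pair_counts.update(combinations(terms, 2))
    return [(pair, c) for pair, c in pair_counts.most_common() if c >= min_count]

hits = search(docs, "aspirin")
patterns = discover_pairs(docs)
```

In this sketch `search` needs the user to supply the term of interest, while `discover_pairs` proposes the pair ("aspirin", "headache") on its own because it recurs across documents. Real text mining systems use far richer statistics than raw co-occurrence counts.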

  9. Your Exciting Applications

  10. F2011: Your Exciting Applications

  11. W2011: Exciting Applications

  12. 2010: Exciting Applications

  13. 2009: Exciting Applications

  14. Additional Applications • News Mining • Sentiment Detection • Summarization • Trend Analysis • Association Detection

  15. Course Objectives • Acquire experience conducting exploratory data analysis on large collections of text • Gain in-depth experience with and understanding of approaches to: document classification, sentiment classification, feature engineering, feature selection, document clustering, unsupervised topic identification, and visualization (including document summarization) • Build a foundation of techniques for approximate Bayesian reasoning for unsupervised text analysis

  16. Course Objectives (2) • Obtain experience with techniques for evaluating and visualizing the results of unsupervised learning processes • Independently investigate methods of your choice! • Apply your methods to learn something important from a significant text corpus of your choice

  17. Simplistic Text Mining Process (Credit: NCSA)

  18. Methods • Feature Engineering • Feature Selection • Information Extraction • Categorization (Supervised) • Clustering (Unsupervised) • Topic Identification / Topic Modeling • Visualization
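
Two of the methods listed on slide 18, feature engineering and the document similarity that underlies clustering, can be sketched in a few lines. This is a minimal sketch under stated assumptions, not the course's own method: the example documents, the crude regex tokenizer, the helper names `tokenize`, `tfidf_vectors`, and `cosine`, and the choice of TF-IDF weighting are all illustrative.

```python
import math
import re
from collections import Counter

def tokenize(text):
    """Lowercase and keep runs of letters (a deliberately crude tokenizer)."""
    return re.findall(r"[a-z]+", text.lower())

def tfidf_vectors(docs):
    """Feature engineering: map each document to a sparse TF-IDF dict."""
    token_lists = [tokenize(d) for d in docs]
    doc_freq = Counter()
    for tokens in token_lists:
        doc_freq.update(set(tokens))  # count each term once per document
    n = len(docs)
    vectors = []
    for tokens in token_lists:
        counts = Counter(tokens)
        vectors.append({
            term: count * math.log(n / doc_freq[term])
            for term, count in counts.items()
        })
    return vectors

def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)

# Hypothetical toy corpus, invented for this illustration.
docs = [
    "the team won the football game",
    "the football team lost the game",
    "the senate passed the budget bill",
]
vectors = tfidf_vectors(docs)
sim_sports = cosine(vectors[0], vectors[1])  # two sports documents
sim_mixed = cosine(vectors[0], vectors[2])   # sports vs. politics
```

Because "the" occurs in every document, its inverse document frequency is zero, so the weighting acts as a simple form of feature selection. A clustering method would group the first two documents, whose similarity is higher than that of the mixed pair.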

  19. Some Available Data Sets • 20 Newsgroups -- Usenet • Reuters (1990s) newswire • Del.icio.us bookmarked web pages • Enron Email • Movie Reviews • Gamespot game reviews • General Conference • State of the Union • Campaign Speeches… • Yours!

  20. Assignment • Reading for next time: • Course Syllabus • "Tapping the Power of Text Mining" by Fan et al. (CACM 9/2006) • "Text-Mining the Voice of the People" by Evangelopoulos et al. (CACM 2/2012) • Skim: Alta Plana Text Analytics Report • Reading Report #1 • % Completed • Questions
