What is Text Analysis

Text analysis, also known as text analytics, refers to the representation, processing, and modeling of textual data to derive useful insights. An important element of text analysis is text mining, the process of finding relationships and interesting patterns in large text collections.

Ducat1

Presentation Transcript


  1. Welcome to Ducat India Language | Industrial Training | Digital Marketing | Web Technology | Testing+ | Database | Networking | Mobile Application | ERP | Graphic | Big Data | Cloud Computing Apply Now Training & Placement Call now: 70-70-90-50-90 www.ducatindia.com

  2. What is Text Analysis?

     Text analysis, also known as text analytics, refers to the representation, processing, and modeling of textual data to derive useful insights. An important element of text analysis is text mining, the process of finding relationships and interesting patterns in large text collections.

     Steps of Text Analysis

     A text analysis problem usually includes three important steps: parsing, search and retrieval, and text mining.

     Parsing: Parsing is the process that takes unstructured text and imposes a structure on it for further analysis. The unstructured text can be a plain text file, a weblog, an Extensible Markup Language (XML) file, a HyperText Markup Language (HTML) file, or a Word document. Parsing deconstructs the provided text and renders it in a more structured way for the subsequent steps.

     Search and retrieval: Search and retrieval is the identification of the documents in a corpus that contain search items such as specific words, phrases, topics, or entities like people or organizations. These search items are generally known as key terms. Search and retrieval originated in the field of library science and is now used extensively by web search engines.
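The first two steps can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the two-document corpus is invented, the helper names (`TextExtractor`, `parse_html`, `build_index`, `search`) are hypothetical, and a simple in-memory index stands in for whatever a real search system would use. Parsing here means stripping HTML tags and splitting the remaining text into lowercase words.

```python
from collections import defaultdict
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects the text nodes of an HTML document and discards the tags."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        if data.strip():
            self.chunks.append(data.strip())


def parse_html(html):
    """Parsing step: turn raw HTML into a list of lowercase words."""
    extractor = TextExtractor()
    extractor.feed(html)
    extractor.close()
    return " ".join(extractor.chunks).lower().split()


def build_index(corpus):
    """Map each term to the set of ids of the documents that contain it."""
    index = defaultdict(set)
    for doc_id, html in corpus.items():
        for term in parse_html(html):
            index[term].add(doc_id)
    return index


def search(index, key_term):
    """Search and retrieval step: sorted ids of documents containing the key term."""
    return sorted(index.get(key_term.lower(), set()))


# Hypothetical corpus, invented for this example.
corpus = {
    "doc1": "<html><body><p>Text mining finds patterns</p></body></html>",
    "doc2": "<html><body><h1>Parsing</h1><p>Parsing imposes structure on text</p></body></html>",
}
index = build_index(corpus)
print(search(index, "text"))     # both documents mention "text"
print(search(index, "parsing"))  # only doc2 mentions "parsing"
```

The index is built once and then queried by key term, which is the general idea behind retrieval in a corpus. Real systems also handle phrases, topics, and entities, which this sketch does not attempt.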

  3. What is Text Analysis?

     Text mining: Text mining uses the terms and indexes produced by the prior two steps to find meaningful insights pertaining to domains or problems of interest.

     Representing Text

     Tokenization is the task of separating (also called tokenizing) words from the body of the text. After tokenization, the raw text has been converted into a collection of tokens, where each token is generally a word. A common approach is tokenizing on spaces. For example, take the following tweet:

     I once had a gf back in the day. Then the bPhone came out lol

     Tokenization based on spaces would output this list of tokens:

     {I, once, had, a, gf, back, in, the, day., Then, the, bPhone, came, out, lol}

     Tokenization is a much more difficult task than one might expect. For example, should words like state-of-the-art, Wi-Fi, and San Francisco be considered one token or more? Should words like Résumé, résumé, and resume all map to the same token? Tokenization is even more difficult beyond English. In German, for example, there are many unsegmented compound nouns. In Chinese, there are no spaces between words. Japanese has several writing systems intermingled. The list goes on.
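A short Python sketch of tokenizing on spaces, next to one possible alternative rule. The regular expression and the function names (`tokenize_on_spaces`, `tokenize_words`) are assumptions made for illustration, not a standard. The alternative treats a token as a run of word characters that may be joined internally by hyphens or apostrophes.

```python
import re

tweet = "I once had a gf back in the day. Then the bPhone came out lol"


def tokenize_on_spaces(text):
    """Split on whitespace; punctuation stays attached to the neighboring word."""
    return text.split()


def tokenize_words(text):
    """Illustrative rule: word characters, optionally joined by '-' or "'"."""
    return re.findall(r"\w+(?:[-']\w+)*", text)


print(tokenize_on_spaces(tweet))
print(tokenize_words(tweet))
print(tokenize_words("state-of-the-art Wi-Fi in San Francisco"))
```

Splitting on spaces keeps `day.` with its trailing period, while the regex rule yields `day`. The regex keeps `state-of-the-art` and `Wi-Fi` as single tokens but still splits `San Francisco` in two, which shows that every rule embeds a decision about what counts as one token.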

  4. What is Text Analysis?

     Another text normalization technique is called case folding, which reduces all letters to lowercase (or, where applicable, to uppercase). For the previous tweet, after case folding the text would become this:

     i once had a gf back in the day. then the bphone came out lol

     After normalizing the text by tokenization and case folding, it needs to be represented in a more structured way. A simple yet widely used approach to represent text is called bag-of-words.

     Read More: https://tutorials.ducatindia.com/data-science/what-is-text-analysis/
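Case folding and a bag-of-words representation can be sketched as follows. The slide does not define bag-of-words further, so this uses the usual meaning of the term: a document is represented by the count of each term, ignoring word order. The function name `bag_of_words` is hypothetical, and tokenizing on spaces is an assumption carried over from the tweet example.

```python
from collections import Counter

tweet = "I once had a gf back in the day. Then the bPhone came out lol"


def bag_of_words(text):
    """Case-fold the text, tokenize on spaces, and count each token."""
    tokens = text.casefold().split()
    return Counter(tokens)


bag = bag_of_words(tweet)
print(tweet.casefold())
print(bag.most_common(3))
```

After case folding, `Then` and `bPhone` become `then` and `bphone`. The token `the` occurs twice in the tweet, so its count in the bag is 2, and every other token has a count of 1. Python's `str.casefold` behaves like `str.lower` for this ASCII text and is more aggressive for some non-English characters.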

  5. Thank you
