NLTK Cookbook Text

CSCE 771 Natural Language Processing

  • Topics

    • Smoothing again:

  • Readings: Chapters

January 16, 2013


Python Text Processing with NLTK 2.0 Cookbook

  • Tokenizing Text and WordNet Basics

  • Replacing and Correcting Words

  • Creating Custom Corpora

  • Part-of-Speech Tagging

  • Extracting Chunks

  • Transforming Chunks and Trees

  • Text Classification

  • Distributed Processing and Handling Large Datasets

  • Parsing Specific Data


Chapter 1. Tokenizing Text and WordNet Basics

  • In this chapter, we will cover:

  • Tokenizing text into sentences

  • Tokenizing sentences into words

  • Tokenizing sentences using regular expressions

  • Filtering stopwords in a tokenized sentence

  • Looking up synsets for a word in WordNet

  • Looking up lemmas and synonyms in WordNet

  • Calculating WordNet synset similarity

  • Discovering word collocations
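
As a rough illustration of two of the topics above (tokenizing with regular expressions and filtering stopwords), here is a minimal sketch that uses only Python's standard `re` module rather than NLTK itself. The function names `tokenize` and `filter_stopwords` and the tiny `STOPWORDS` set are assumptions made up for this example; they are not taken from the book or from NLTK.

```python
import re

# Hypothetical, deliberately tiny stopword list for illustration only;
# NLTK ships a much larger English list in its stopwords corpus.
STOPWORDS = {"a", "an", "the", "is", "of", "and", "to", "in"}


def tokenize(text):
    """Return runs of word characters and apostrophes, so "can't" stays whole."""
    return re.findall(r"[\w']+", text)


def filter_stopwords(tokens, stopwords=STOPWORDS):
    """Drop tokens whose lowercase form appears in the stopword set."""
    return [t for t in tokens if t.lower() not in stopwords]


tokens = tokenize("The cat can't sit in the hat.")
content_words = filter_stopwords(tokens)
print(tokens)
print(content_words)
```

Matching tokens with a pattern, instead of splitting on whitespace, discards punctuation while keeping contractions together. A real stopword list would come from a corpus rather than a hand-written set.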


Chapter 2. Replacing and Correcting Words

  • In this chapter, we will cover:

  • Stemming words

  • Lemmatizing words with WordNet

  • Translating text with Babelfish

  • Replacing words matching regular expressions

  • Removing repeating characters

  • Spelling correction with Enchant

  • Replacing synonyms

  • Replacing negations with antonyms

  • Perkins, Jacob (2010-11-09). Python Text Processing with NLTK 2.0 Cookbook (p. 25). Packt Publishing. Kindle Edition.


Chapter 3. Creating Custom Corpora

  • In this chapter, we will cover:

  • Setting up a custom corpus

  • Creating a word list corpus

  • Creating a part-of-speech tagged word corpus

  • Creating a chunked phrase corpus

  • Creating a categorized text corpus

  • Creating a categorized chunk corpus reader

  • Lazy corpus loading

  • Creating a custom corpus view

  • Creating a MongoDB backed corpus reader

  • Corpus editing with file locking

  • Perkins, Jacob (2010-11-09). Python Text Processing with NLTK 2.0 Cookbook (p. 45). Packt Publishing. Kindle Edition.


Chapter 4. Part-of-Speech Tagging

  • Default tagging

  • Training a unigram part-of-speech tagger

  • Combining taggers with backoff tagging

  • Training and combining Ngram taggers

  • Creating a model of likely word tags

  • Tagging with regular expressions

  • Affix tagging

  • Training a Brill tagger

  • Training the TnT tagger

  • Using WordNet for tagging

  • Tagging proper names


Chapter 5. Extracting Chunks

  • In this chapter, we will cover:

  • Chunking and chinking with regular expressions

  • Merging and splitting chunks with regular expressions

  • Expanding and removing chunks with regular expressions

  • Partial parsing with regular expressions

  • Training a tagger-based chunker

  • Classification-based chunking

  • Extracting named entities

  • Extracting proper noun chunks

  • Extracting location chunks

  • Training a named entity chunker

  • Perkins, Jacob (2010-11-09). Python Text Processing with NLTK 2.0 Cookbook (p. 111). Packt Publishing. Kindle Edition.


Chapter 6. Transforming Chunks and Trees

  • In this chapter, we will cover:

  • Filtering insignificant words

  • Correcting verb forms

  • Swapping verb phrases

  • Swapping noun cardinals

  • Swapping infinitive phrases

  • Singularizing plural nouns

  • Chaining chunk transformations

  • Converting a chunk tree to text

  • Flattening a deep tree

  • Creating a shallow tree

  • Converting tree nodes

  • Perkins, Jacob (2010-11-09). Python Text Processing with NLTK 2.0 Cookbook (p. 143). Packt Publishing. Kindle Edition.


Chapter 7. Text Classification

  • In this chapter, we will cover:

  • Bag of Words feature extraction

  • Training a naive Bayes classifier

  • Training a decision tree classifier

  • Training a maximum entropy classifier

  • Measuring precision and recall of a classifier

  • Calculating high information words

  • Combining classifiers with voting

  • Classifying with multiple binary classifiers

  • Perkins, Jacob (2010-11-09). Python Text Processing with NLTK 2.0 Cookbook (p. 167). Packt Publishing. Kindle Edition.
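
To make the first topic in this chapter concrete, here is a minimal sketch of bag-of-words feature extraction in plain Python. It assumes the common convention of mapping each word that is present to `True`, which is the dictionary-of-features shape that NLTK-style classifiers accept. The helper names and the `ignore` parameter are illustrative assumptions, not a confirmed copy of the book's code.

```python
def bag_of_words(words):
    """Map every distinct word to True; word order and counts are discarded."""
    return {word: True for word in words}


def bag_of_words_excluding(words, ignore):
    """Same as bag_of_words, but skip any word listed in `ignore` (hypothetical helper)."""
    return bag_of_words(set(words) - set(ignore))


features = bag_of_words(["the", "quick", "brown", "fox", "the"])
filtered = bag_of_words_excluding(["the", "quick", "brown", "fox"], ["the"])
print(features)
print(filtered)
```

Because only presence is recorded, a repeated word such as "the" contributes a single feature. Filtering out uninformative words before building the dictionary keeps the feature set smaller.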


Chapter 8. Distributed Processing and Handling Large Datasets

  • In this chapter, we will cover:

  • Distributed tagging with execnet

  • Distributed chunking with execnet

  • Parallel list processing with execnet

  • Storing a frequency distribution in Redis

  • Storing a conditional frequency distribution in Redis

  • Storing an ordered dictionary in Redis

  • Distributed word scoring with Redis and execnet

  • Perkins, Jacob (2010-11-09). Python Text Processing with NLTK 2.0 Cookbook (p. 201). Packt Publishing. Kindle Edition.


Chapter 9. Parsing Specific Data

  • In this chapter, we will cover:

  • Parsing dates and times with Dateutil

  • Time zone lookup and conversion

  • Tagging temporal expressions with Timex

  • Extracting URLs from HTML with lxml

  • Cleaning and stripping HTML

  • Converting HTML entities with BeautifulSoup

  • Detecting and converting character encodings

  • Perkins, Jacob (2010-11-09). Python Text Processing with NLTK 2.0 Cookbook (p. 227). Packt Publishing. Kindle Edition.

