
NLTK & Python Day 4



Presentation Transcript


  1. NLTK & Python Day 4 LING 681.02 Computational Linguistics Harry Howard Tulane University

  2. Course organization • I have requested that Python and NLTK be installed on the computers in this room. LING 681.02, Prof. Howard, Tulane University

  3. NLPP §1 Language processing & Python §1.1 Computing with language

  4. Loading the book's texts
     >>> from nltk.book import *
     *** Introductory Examples for the NLTK Book ***
     Loading text1, ..., text9 and sent1, ..., sent9
     Type the name of the text or sentence to view it.
     Type: 'texts()' or 'sents()' to list the materials.
     text1: Moby Dick by Herman Melville 1851
     text2: Sense and Sensibility by Jane Austen 1811
     text3: The Book of Genesis
     text4: Inaugural Address Corpus
     text5: Chat Corpus
     text6: Monty Python and the Holy Grail
     text7: Wall Street Journal
     text8: Personals Corpus
     text9: The Man Who Was Thursday by G . K . Chesterton 1908
     >>>

  5. Searching text
     • Show every token of a word in context, called concordance view:
       text1.concordance("monstrous")
     • Show the words that appear in a similar range of contexts:
       text1.similar("monstrous")
     • Show the contexts that two words share (the argument is a list of words):
       text2.common_contexts(["monstrous", "very"])
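To make the idea of a concordance view concrete, here is a minimal sketch in plain Python 3 of what such a search does. It is not NLTK's implementation: the helper name `simple_concordance`, its `width` parameter, and the toy token list are all assumptions made for illustration.

```python
def simple_concordance(tokens, word, width=2):
    """Return every occurrence of `word` with up to `width` tokens of context per side."""
    lines = []
    for i, token in enumerate(tokens):
        if token.lower() == word.lower():
            left = tokens[max(0, i - width):i]      # context before the hit
            right = tokens[i + 1:i + 1 + width]     # context after the hit
            lines.append(" ".join(left + [token] + right))
    return lines

# Toy data (hypothetical), standing in for a real text such as text1.
tokens = ['a', 'most', 'monstrous', 'size', '.',
          'That', 'monstrous', 'bulk', 'of', 'the', 'whale']

for line in simple_concordance(tokens, 'monstrous'):
    print(line)
# a most monstrous size .
# . That monstrous bulk of
```

NLTK's own `concordance` prints aligned, fixed-width lines; the sketch only shows the underlying idea of scanning the token list and slicing out the neighbours of each hit.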

  6. Searching text, cont.
     • Plot how far each token of a word is from the beginning of a text:
       text1.dispersion_plot(["monstrous"])
     • Needs NumPy & Matplotlib, though it didn't work for me.
     • Generate random text:
       text1.generate()
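A dispersion plot draws one mark for each position at which a word occurs. If the plotting libraries are unavailable, the underlying data can still be computed by hand. The sketch below assumes plain Python 3 and a toy token list; `word_offsets` is a hypothetical helper, not part of NLTK.

```python
def word_offsets(tokens, word):
    """Return the positions (counted from 0) at which `word` occurs in `tokens`."""
    return [i for i, token in enumerate(tokens) if token == word]

# Toy data (hypothetical), standing in for a real text.
tokens = ['the', 'whale', 'and', 'the', 'sea', 'and', 'the', 'ship']

print(word_offsets(tokens, 'the'))    # [0, 3, 6]
print(word_offsets(tokens, 'whale'))  # [1]
```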

  7. Counting vocabulary
     • Count the word and punctuation tokens in a text:
       len(text1)
     • List the distinct words, i.e. the word types, in a text:
       set(text1)
     • Count how many types there are in a text:
       len(set(text1))
     • Count the tokens of a word type:
       text1.count("smote")
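The same counting operations work on any Python list of strings, so they can be tried without loading a corpus. This is a small sketch on a made-up sentence (an assumption, not one of the book's texts).

```python
# Toy token list (hypothetical); punctuation counts as a token too.
tokens = ['the', 'cat', 'sat', 'on', 'the', 'mat', '.']

n_tokens = len(tokens)            # every token, repeats included
types = set(tokens)               # distinct tokens only
n_types = len(types)
n_the = tokens.count('the')       # tokens of one particular type

print(n_tokens)  # 7
print(n_types)   # 6
print(n_the)     # 2
```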

  8. Lexical richness or diversity
     • The lexical richness or diversity of a text can be estimated as tokens per type:
       len(text1) / len(set(text1))
     • The frequency of a type can be estimated as its percentage of all tokens:
       100 * text1.count('a') / len(text1)
     • In Python 2 this is integer division, however, so the fractional part is dropped.
     • The book's remedy (p. 8) is from __future__ import division, written with two underscores on each side; typing _future_ with single underscores raises an error.
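The difference between integer and true division is easy to see on a small example. The sketch below assumes Python 3, where `/` is always true division and `//` reproduces the truncating behaviour described above; the token list is made up.

```python
# Toy token list (hypothetical).
tokens = ['the', 'cat', 'sat', 'on', 'the', 'mat', '.']

n_tokens = len(tokens)        # 7 tokens
n_types = len(set(tokens))    # 6 types

floor_ratio = n_tokens // n_types   # integer (floor) division drops the fraction
true_ratio = n_tokens / n_types     # true division keeps it

share_of_the = 100 * tokens.count('the') / n_tokens  # percentage of all tokens

print(floor_ratio)             # 1
print(round(true_ratio, 2))    # 1.17
print(round(share_of_the, 1))  # 28.6
```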

  9. Making your own function in Python
     • To save you from typing the same thing over and over, you can define your own function:
       >>> def lexical_diversity(text):
       ...     return len(text) / len(set(text))
     • You call this function by typing its name and filling in the argument, a text name, in the parentheses:
       >>> lexical_diversity(text1)
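The point of a function is that its body refers to the parameter, so the same definition works for any text passed in. The sketch below assumes Python 3 and two made-up token lists; the second helper, `percentage`, is an additional illustration rather than something from the slide.

```python
def lexical_diversity(text):
    """Average number of tokens per type in `text` (a list of tokens)."""
    return len(text) / len(set(text))

def percentage(count, total):
    """Express `count` as a percentage of `total`."""
    return 100 * count / total

# Toy data (hypothetical): one list with repeats, one without.
repetitive = ['to', 'be', 'or', 'not', 'to', 'be']
opening = ['Call', 'me', 'Ishmael', '.']

print(lexical_diversity(repetitive))  # 1.5  (6 tokens, 4 types)
print(lexical_diversity(opening))     # 1.0  (every token is distinct)
print(percentage(opening.count('me'), len(opening)))  # 25.0
```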

  10. Other functions
      • Sort the word types in a text alphabetically:
        sorted(set(text1))
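One detail worth seeing in a small example is that Python sorts strings by character code, so punctuation and capitalized words come before lowercase ones. The token list below is made up for illustration.

```python
# Toy token list (hypothetical) with a repeated word, a capital, and punctuation.
tokens = ['me', 'Call', '.', 'Ishmael', 'me']

sorted_types = sorted(set(tokens))   # set() removes the repeat, sorted() returns a list
print(sorted_types)  # ['.', 'Call', 'Ishmael', 'me']
```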

  11. Exercises 1.8.…
      • 4. … How many words are there in text2? How many distinct words are there?
      • 5. Compare the lexical diversity scores for humor and romance fiction in Table 1.1. Which genre is more lexically diverse?
      • 8. Consider the following Python expression: len(set(text4)). State the purpose of this expression. Describe the two steps involved in performing this computation.

  12. NLPP §1.2 A Closer Look at Python: Texts as Lists of Words

  13. The representation of a text
      • We will think of a text as nothing more than a sequence of words and punctuation.
      • The opening sentence of Moby Dick:
        >>> sent1 = ['Call', 'me', 'Ishmael', '.']
      • The bracketed material is known as a list in Python.
      • We can inspect it by typing the name.
      • How would you find out how many words it has?

  14. List construction
      • Append one list to the end of another with '+', known as concatenation:
        >>> ['Monty', 'Python'] + ['and', 'the', 'Holy', 'Grail']
        ['Monty', 'Python', 'and', 'the', 'Holy', 'Grail']
        >>> sent4 + sent1
        ['Fellow', '-', 'Citizens', 'of', 'the', 'Senate', 'and', 'of', 'the', 'House', 'of', 'Representatives', ':', 'Call', 'me', 'Ishmael', '.']
      • Append a single item to a list:
        >>> sent1.append("Some")
        >>> sent1
        ['Call', 'me', 'Ishmael', '.', 'Some']
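A point that is easy to miss here is that `+` builds a new list and leaves both operands untouched, whereas `append` changes the list in place. The sketch below shows the contrast with short lists; the variable names `title` and `combined` are assumptions for illustration.

```python
sent1 = ['Call', 'me', 'Ishmael', '.']
title = ['Monty', 'Python']

combined = title + sent1      # concatenation returns a new list
print(combined)               # ['Monty', 'Python', 'Call', 'me', 'Ishmael', '.']
print(title)                  # ['Monty', 'Python']  (unchanged)

sent1.append('Some')          # append modifies sent1 itself and returns None
print(sent1)                  # ['Call', 'me', 'Ishmael', '.', 'Some']
print(len(sent1))             # 5
```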

  15. List indexing
      • Each element in a list is numbered in sequence; this number is known as the element's index.
      • Show the item that occurs at an index such as 173 in a text:
        >>> text4[173]
        'awaken'
      • Show the index of an element's first occurrence:
        >>> text4.index('awaken')
        173
      • Show the elements between two indices (slicing):
        >>> text5[16715:16735]
        >>> text5[16715:]
        >>> text5[:16735]
      • Assign an element to an index:
        >>> sent1[0] = 'First'
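These indexing operations can be tried on a short list, where the results are easy to check by eye. The sketch assumes a made-up seven-token list; note that a slice `[m:n]` includes index `m` but stops before index `n`.

```python
# Toy token list (hypothetical).
sent = ['Call', 'me', 'Ishmael', '.', 'Some', 'years', 'ago']

print(sent[2])            # Ishmael
print(sent.index('me'))   # 1

middle = sent[1:3]        # indices 1 and 2; index 3 is excluded
tail = sent[4:]           # from index 4 to the end
head = sent[:2]           # from the start up to, not including, index 2
print(middle)             # ['me', 'Ishmael']
print(tail)               # ['Some', 'years', 'ago']
print(head)               # ['Call', 'me']

sent[0] = 'First'         # assignment replaces the element at that index
print(sent[0])            # First
```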

  16. Python counts from 0
      • Create a list:
        >>> sent = ['word1', 'word2', 'word3', 'word4', 'word5',
        ...         'word6', 'word7', 'word8', 'word9', 'word10']
      • Find the first word:
        >>> sent[0]
        'word1'
      • Find the last word:
        >>> sent[9]
        'word10'
      • What does sent[10] do?
      • It produces a runtime error, IndexError: list index out of range, because the ten elements are indexed 0 through 9.
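The off-by-one behaviour can be checked directly: a list of ten elements has valid indices 0 through 9, and the last one is always `len(sent) - 1`. The sketch below assumes Python 3 and catches the error so that the script keeps running.

```python
# Build the same ten-word list programmatically.
sent = ['word' + str(i) for i in range(1, 11)]

print(sent[0])               # word1
print(sent[len(sent) - 1])   # word10, since the last valid index is 9

try:
    sent[10]                 # one past the end
except IndexError as error:
    message = str(error)
    print('IndexError:', message)   # IndexError: list index out of range
```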

  17. List exercises

  18. Next time NLPP: finish §1 and do all exercises; do up to Ex 8 in §2
