
I256: Applied Natural Language Processing

I256: Applied Natural Language Processing. Marti Hearst Aug 30, 2006. Today. Introductions Python Basics. Introduction to NLTK. The Natural Language Toolkit (NLTK) provides: Basic classes for representing data relevant to natural language processing.


Presentation Transcript


  1. I256: Applied Natural Language Processing Marti Hearst Aug 30, 2006

  2. Today • Introductions • Python Basics

  3. Introduction to NLTK • The Natural Language Toolkit (NLTK) provides: • Basic classes for representing data relevant to natural language processing. • Standard interfaces for performing tasks, such as tokenization, tagging, and parsing. • Standard implementations of each task, which can be combined to solve complex problems. • Pre-parsed corpora and tools to access them. Slide by Diane Litman

  4. NLTK: Example Modules • nltk_lite.tokenize: processing individual elements of text, such as words or sentences. • nltk_lite.probability: modeling frequency distributions and probabilistic systems. • nltk_lite.tag: tagging tokens with supplemental information, such as parts of speech or wordnet sense tags. • nltk_lite.parser: high-level interface for parsing texts. Slide by Diane Litman
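
To make the tokenize and probability modules above more concrete, here is a minimal sketch of the two ideas (splitting text into tokens, then counting them) using only the Python standard library. It is not NLTK code: the function names `tokenize` and `frequency_distribution` and the sample sentence are assumptions made up for illustration, and the syntax is Python 3.

```python
import re
from collections import Counter


def tokenize(text):
    """Split text into lowercase word tokens (a crude stand-in for a real tokenizer)."""
    return re.findall(r"[a-z0-9']+", text.lower())


def frequency_distribution(tokens):
    """Count how often each token occurs."""
    return Counter(tokens)


# Hypothetical sample text, not taken from any corpus.
sample = "The cat sat on the mat. The mat was flat."
tokens = tokenize(sample)
freq = frequency_distribution(tokens)

print(tokens[:4])
print(freq.most_common(2))
```

A real tokenizer handles punctuation, contractions, and sentence boundaries far more carefully; the point here is only the shape of the pipeline.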

  5. Python and Natural Language Processing • Python is a great language for NLP: • Simple (and fun!) • Powerful string manipulation • Easy to debug: • Interpreted language • Easy to test small steps incrementally • Exceptions • Easy to structure • Modules • Object oriented programming Slide by Diane Litman

  6. An Interpreted Language • The interpreter processes what you’ve typed as soon as you hit <return>: >>> 3 * 4 12 >>> • Python is sensitive to leading whitespace • If you put in extra spaces, or too few, it will complain. • If you type a multi-line command, you must do the indenting; the interpreter helps you with this: >>> if 4 > 3: print "duh" duh >>>
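
The whitespace rule above can be checked without triggering an error at the prompt. The sketch below asks Python's built-in `compile` whether three small snippets parse. The helper name `compiles` and the snippets are assumptions for illustration, and the code uses Python 3 syntax (`print` as a function), unlike the `print "duh"` statement on the slide.

```python
def compiles(source):
    """Return True if the source parses, False on an indentation error."""
    try:
        compile(source, "<example>", "exec")
        return True
    except IndentationError:
        return False


indented = "if 4 > 3:\n    print('duh')\n"   # body indented under the if
not_indented = "if 4 > 3:\nprint('duh')\n"   # too few spaces
extra_space = "x = 1\n  y = 2\n"             # extra spaces with no block to open

print(compiles(indented))
print(compiles(not_indented))
print(compiles(extra_space))
```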

  7. Some Python Basics • Strings
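
The slide's own examples did not survive in this transcript, so here is a small sketch of common string operations. The sample sentence and variable names are assumptions, and the syntax is Python 3.

```python
sentence = "Natural language processing is fun"

length = len(sentence)              # number of characters
lowered = sentence.lower()          # strings are immutable; this returns a new one
first_word = sentence[:7]           # slicing: characters 0 through 6
words = sentence.split()            # split on whitespace into a list of words
joined = "-".join(words[:2])        # glue the first two words back together
has_language = "language" in sentence  # substring test

print(length, first_word, joined, has_language)
print(lowered)
```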

  8. Some Python Basics • Lists
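
As with strings, the slide's list examples are missing from the transcript. The following is a minimal sketch of basic list operations, with made-up token values and Python 3 syntax.

```python
tokens = ["the", "cat", "sat"]

tokens.append("down")           # lists are mutable: append changes the list in place
first = tokens[0]               # indexing starts at 0
last = tokens[-1]               # negative indexes count from the end
middle = tokens[1:3]            # slicing returns a new list
tokens_sorted = sorted(tokens)  # sorted() returns a new list; tokens is unchanged

print(tokens)
print(first, last, middle)
print(tokens_sorted)
```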

  9. Some Python Basics • Iteration over Lists
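
A short sketch of iterating over a list, again with hypothetical token values and Python 3 syntax. It shows a plain `for` loop, `enumerate` for position plus value, and a list comprehension as a compact alternative.

```python
tokens = ["the", "cat", "sat", "on", "the", "mat"]

# A for loop visits each element in order.
lengths = []
for token in tokens:
    lengths.append(len(token))

# enumerate yields (position, element) pairs.
positions_of_the = []
for position, token in enumerate(tokens):
    if token == "the":
        positions_of_the.append(position)

# A list comprehension builds a new list in one expression.
long_tokens = [token for token in tokens if len(token) > 2]

print(lengths)
print(positions_of_the)
print(long_tokens)
```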

  10. Modules and Packages Python modules “package program code and data for reuse.” (Lutz) Similar to library in C, package in Java. Python packages are hierarchical modules (i.e., modules that contain other modules). Three commands for accessing modules: import, from…import, reload. Slide by Diane Litman

  11. Modules and Packages: import • The import command loads a module: # Load the regular expression module >>> import re • To access the contents of a module, use dotted names: # Use the search function from the re module >>> re.search('\w+', str) • To list the contents of a module, use dir: >>> dir(re) ['DOTALL', 'I', 'IGNORECASE',…] Slide by Diane Litman
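
The import steps above can be run end to end as follows. This is a sketch in Python 3 syntax. The variable `text` and its value are assumptions, chosen so the example does not reuse the name of the built-in `str` type.

```python
import re  # load the regular expression module

text = "hello world"  # hypothetical input string

# Dotted name: the search function lives inside the re module.
match = re.search(r"\w+", text)
first_word = match.group()

# dir() lists the names defined in the module.
names = dir(re)

print(first_word)
print("IGNORECASE" in names, "search" in names)
```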

  12. Modules and Packages: from…import • The from…import command loads individual functions and objects from a module: # Load the search function from the re module >>> from re import search • Once an individual function or object is loaded with from…import, it can be used directly: # Use the search function from the re module >>> search('\w+', str) Slide by Diane Litman
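
The same search written with from…import, as a runnable sketch in Python 3 syntax. The variable `text` and its value are assumptions for illustration.

```python
from re import search  # load only the search function

text = "tagging tokens"  # hypothetical input string

# No dotted name is needed: search is now a name in this namespace.
match = search(r"\w+", text)
first_word = match.group()

print(first_word)
```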

  13. Import vs. from…import • import: Keeps module functions separate from user functions. Requires the use of dotted names. Works with reload. • from…import: Puts module functions and user functions together. More convenient names. Does not work with reload. Slide by Diane Litman
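
One practical consequence of the comparison above: because from…import puts module names alongside your own, a user function with the same name silently replaces the imported one, while the dotted name stays safe. The sketch below uses the standard `math` module; the user-defined `sqrt` is a deliberately silly hypothetical.

```python
import math
from math import sqrt  # sqrt now shares a namespace with user functions


def sqrt(x):
    """A user function that happens to reuse the name; it replaces the imported sqrt."""
    return "my sqrt of %s" % x


dotted_result = math.sqrt(16)  # dotted name still reaches the module's function
direct_result = sqrt(16)       # plain name now reaches the user function

print(dotted_result)
print(direct_result)
```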

  14. Modules and Packages: reload • If you edit a module, you must use the reload command before the changes become visible in Python: >>> import mymodule ... >>> reload (mymodule) • The reload command only affects modules that have been loaded with import; it does not update individual functions and objects loaded with from...import. Slide by Diane Litman
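
The reload behavior above can be demonstrated with a throwaway module written to a temporary directory. This sketch makes several assumptions: it uses Python 3, where reload is no longer a built-in and is called as `importlib.reload`; the module name `mymodule_demo` and its `greeting` function are invented for the demo; and bytecode caching is switched off so that the quick edit is always picked up.

```python
import importlib
import os
import sys
import tempfile

sys.dont_write_bytecode = True  # avoid stale cached bytecode in this quick demo

module_dir = tempfile.mkdtemp()
module_path = os.path.join(module_dir, "mymodule_demo.py")


def write_module(greeting_text):
    """(Re)write the demo module so that greeting() returns greeting_text."""
    with open(module_path, "w") as handle:
        handle.write("def greeting():\n    return %r\n" % greeting_text)


write_module("first version")
sys.path.insert(0, module_dir)
importlib.invalidate_caches()

import mymodule_demo                  # loaded with import
from mymodule_demo import greeting    # loaded with from...import

before = mymodule_demo.greeting()

write_module("second version, edited")  # simulate editing the module file
importlib.reload(mymodule_demo)

after_dotted = mymodule_demo.greeting()  # dotted name sees the edit
after_direct = greeting()                # name from from...import is still the old function

print(before)
print(after_dotted)
print(after_direct)
```

The last line is the case the slide warns about: the directly imported name keeps pointing at the function object from before the reload.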

  15. Configuring the Python IDE • Called IDLE • You can set key bindings • Go to Options > Configure IDLE • Select Keys tab • Select an action and specify an alternative binding • Click Save as New Custom Key Set • Give it a name • Click Apply so it takes effect • If you want to use an existing binding (say, Control-A) • First find the command that has that binding • Change it to something else • Click Apply • Now choose your command and change its binding to Control-A

  16. For Next Week • Monday: holiday, no class • Sign up for the email list! • Mail to: majordomo@sims.berkeley.edu • Put in msg body: subscribe anlp • For Wed Sept 6 • Finish the programming tutorial • Do the regular expression tutorial • We’ll go through some regexes in class.
