

Text classification Day 35

LING 681.02

Computational Linguistics

Harry Howard

Tulane University

Course organization
  • http://www.tulane.edu/~ling/NLP/

LING 681.02, Prof. Howard, Tulane University

Classification
  • What is it?
  • Supervision
    • A classifier is supervised if it is built on training corpora containing the correct label for each input.
      • This usually means that the program can calculate an error when the predicted label does not match the correct label.
    • A classifier is unsupervised if it is built on training corpora that do not contain the correct label for each input.
      • There is no way to calculate an error.
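To make "calculate an error" concrete, here is a minimal sketch of an error rate computed against correct labels. The function name `error_rate` and the two label lists are hypothetical, made up for illustration; nothing here comes from the course materials.

```python
def error_rate(predicted, gold):
    """Fraction of inputs whose predicted label differs from the correct one."""
    if len(predicted) != len(gold):
        raise ValueError("predicted and gold must have the same length")
    if not gold:
        raise ValueError("need at least one labelled input")
    mistakes = sum(1 for p, g in zip(predicted, gold) if p != g)
    return mistakes / len(gold)

# Hypothetical data: correct labels and a classifier's guesses.
gold = ["female", "male", "male", "female"]
predicted = ["female", "male", "female", "female"]

print(error_rate(predicted, gold))  # prints 0.25 (one mistake out of four)
```

Without the `gold` list, as in the unsupervised case, this calculation cannot be carried out.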

Diagram of supervised classification

Philosophical question
  • Does supervised classification work for the majority of stuff that you learned spontaneously as a child?
  • NO, life does not come neatly labelled.

Algorithm
  • Divide the corpus into three sets:
    • training set
    • test set
    • development (dev-test) set
  • Choose an initial set of features that will be used to classify the corpus.
    • The part of the program that looks for the features in the corpus is called a feature extractor.
  • Train the classifier on the training set.
  • Run it on the development set.
  • Refine the feature extractor based on the errors produced on the development set, then retrain.
  • Run the improved classifier on the test set.
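The steps above can be sketched in plain Python. This is only an illustration under stated assumptions: the corpus is a hypothetical list of twenty names, the 80/10/10 split is an arbitrary choice, and `train` is a toy learner (most frequent label per feature value) standing in for a real classifier such as the ones NLTK provides.

```python
import random
from collections import Counter, defaultdict

def split_corpus(labelled, train_frac=0.8, dev_frac=0.1, seed=0):
    """Shuffle the labelled items and cut them into train, dev-test and test sets."""
    items = list(labelled)
    random.Random(seed).shuffle(items)
    n_train = int(len(items) * train_frac)
    n_dev = int(len(items) * dev_frac)
    return (items[:n_train],
            items[n_train:n_train + n_dev],
            items[n_train + n_dev:])

def train(train_set, extract):
    """Toy learner: remember the most frequent label for each feature value."""
    by_feature = defaultdict(Counter)
    overall = Counter()
    for item, label in train_set:
        by_feature[extract(item)][label] += 1
        overall[label] += 1
    default = overall.most_common(1)[0][0]  # fallback for unseen feature values
    table = {feat: c.most_common(1)[0][0] for feat, c in by_feature.items()}
    return lambda item: table.get(extract(item), default)

def accuracy(classify, labelled):
    """Share of items whose predicted label matches the correct label."""
    return sum(classify(x) == y for x, y in labelled) / len(labelled)

def last_letter(name):
    """The feature extractor: here, a single feature."""
    return name[-1].lower()

# Hypothetical labelled corpus, far too small for real use.
corpus = [(n, "female") for n in ["Anna", "Maria", "Sophie", "Emily", "Laura",
                                  "Julie", "Naomi", "Sarah", "Helen", "Grace"]]
corpus += [(n, "male") for n in ["Mark", "Peter", "Carlos", "Victor", "Robert",
                                 "Frank", "Pedro", "Thomas", "James", "Scott"]]

train_set, dev_set, test_set = split_corpus(corpus)
classify = train(train_set, last_letter)

# Inspect dev-test errors to decide how to refine the feature extractor.
dev_errors = [(x, y, classify(x)) for x, y in dev_set if classify(x) != y]
print("dev accuracy:", accuracy(classify, dev_set))
print("dev errors to inspect:", dev_errors)

# Only once the feature extractor is final is the test set touched.
print("test accuracy:", accuracy(classify, test_set))
```

Keeping the test set out of the refinement loop is the point of the three-way split: the test score then estimates performance on data the feature extractor was never tuned against.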

Choosing the right features
  • Use too few, and the data will be underfitted.
    • The classifier is too vague and makes too many mistakes.
  • Use too many, and the data will be overfitted.
    • The classifier is too specific and will not generalize to new examples.
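A small sketch may help to see both failure modes. Everything in it is an assumption made for illustration: the names are hand-picked, the learner is a toy (most frequent label per feature value), and the three feature extractors are hypothetical. With so few examples the numbers only illustrate the idea.

```python
from collections import Counter, defaultdict

def train(pairs, extract):
    """Toy learner: most frequent label per feature value, with a global fallback."""
    by_feature, overall = defaultdict(Counter), Counter()
    for item, label in pairs:
        by_feature[extract(item)][label] += 1
        overall[label] += 1
    default = overall.most_common(1)[0][0]
    table = {f: c.most_common(1)[0][0] for f, c in by_feature.items()}
    return lambda item: table.get(extract(item), default)

def accuracy(classify, pairs):
    return sum(classify(x) == y for x, y in pairs) / len(pairs)

# Hypothetical, hand-picked data.
train_data = [("Anna", "female"), ("Maria", "female"), ("Sophie", "female"),
              ("Emily", "female"), ("Mark", "male"), ("Peter", "male"),
              ("Carlos", "male")]
new_data = [("Julia", "female"), ("Victor", "male")]

extractors = {
    "too few (no information)": lambda name: None,
    "too many (the whole name)": lambda name: name.lower(),
    "last letter": lambda name: name[-1].lower(),
}

for label, extract in extractors.items():
    clf = train(train_data, extract)
    print(label,
          "| training accuracy:", round(accuracy(clf, train_data), 2),
          "| new-name accuracy:", round(accuracy(clf, new_data), 2))
```

The uninformative extractor does poorly everywhere, which is underfitting. The whole-name extractor is perfect on the training names but has nothing to say about unseen names, which is overfitting.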

Example: gender id
  • What would the features be?
    • A name ending in a, e, or i is likely to be female.
    • A name ending in k, o, r, s, or t is likely to be male.
  • Explain how classification would work.
  • NLTK code pp. 223-4.
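As a rough illustration of how classification would work with these features, the sketch below hard-codes the two ending lists from the slide. It is not the NLTK code from the book: a trained classifier would learn from labelled names how strongly each ending predicts each gender, whereas this version simply applies the rule. The names `gender_features` and `guess_gender` are chosen here for illustration.

```python
FEMALE_ENDINGS = set("aei")    # endings listed on the slide as female
MALE_ENDINGS = set("korst")    # endings listed on the slide as male

def gender_features(name):
    """Feature extractor: a dictionary with one feature, the final letter."""
    if not name:
        raise ValueError("name must not be empty")
    return {"last_letter": name[-1].lower()}

def guess_gender(name):
    """Apply the slide's rule directly; endings outside both lists stay undecided."""
    last = gender_features(name)["last_letter"]
    if last in FEMALE_ENDINGS:
        return "female"
    if last in MALE_ENDINGS:
        return "male"
    return "unknown"

for name in ["Maria", "Mark", "Evelyn"]:
    print(name, gender_features(name), guess_gender(name))
```

Returning the features as a dictionary keeps the extractor in the shape that NLTK classifiers accept as a feature set, so the same function could be reused when training a real classifier.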

More examples
  • Classify movie reviews as positive or negative.
    • How?
  • Classify POS of words.
    • How?
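One possible answer to both "How?" questions is to change only the feature extractor. The sketch below shows two hypothetical extractors: one records which vocabulary words a review contains, the other records a word's final letters as a cue to its part of speech. The vocabulary and example inputs are made up for illustration.

```python
def document_features(words, vocabulary):
    """For a review: does it contain each word of a chosen vocabulary?"""
    present = {w.lower() for w in words}
    return {f"contains({w})": (w in present) for w in vocabulary}

def suffix_features(word, max_len=3):
    """For a single word: its last 1..max_len letters."""
    word = word.lower()
    return {f"suffix({n})": word[-n:]
            for n in range(1, max_len + 1) if len(word) >= n}

review = "A truly wonderful film".split()
print(document_features(review, ["wonderful", "boring"]))
print(suffix_features("running"))
```

Either dictionary can then be paired with a label (positive/negative, or a POS tag) to form a training example.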

Beyond the word
  • Look at the word's context.
    • As we have seen, this is crucial to POS tagging.
  • Classify instant messages (IMs) by the dialogue acts that they instantiate.
    • What could be some such acts?
    • statement, emotion, yes-no question
    • How?
  • Recognizing textual entailment
    • … is the task of determining whether a given piece of text T entails another text called the "hypothesis".
    • How?
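For the first point, looking at a word's context, a feature extractor can be given the whole sentence and a position instead of a single word. This is a minimal sketch; the feature names and the `<START>`/`<END>` placeholders are assumptions made for illustration.

```python
def context_features(sentence, i):
    """Features for the word at position i, including its neighbours."""
    word = sentence[i]
    return {
        "word": word.lower(),
        "suffix(2)": word[-2:].lower(),
        "prev_word": sentence[i - 1].lower() if i > 0 else "<START>",
        "next_word": sentence[i + 1].lower() if i < len(sentence) - 1 else "<END>",
    }

sentence = ["I", "can", "fly"]
print(context_features(sentence, 1))
```

Here the neighbouring "I" is the kind of evidence that can separate the verb "can" from the noun "can", which the word alone cannot do.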

RTE example
  • T: Parviz Davudi was representing Iran at a meeting of the Shanghai Co-operation Organisation (SCO), the fledgling association that binds Russia, China and four former Soviet republics of central Asia together to fight terrorism.
  • H: China is a member of SCO.
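One simple, admittedly crude, way to turn a T/H pair into features is to count how many hypothesis words also occur in the text and how many do not. The sketch below applies that idea to the example above; the tokenizer and the feature names `word_overlap` and `hyp_only` are assumptions made for illustration, and a real system would need much more than word overlap.

```python
import re

def tokens(text):
    """Lower-cased set of alphanumeric word tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def rte_features(text, hypothesis):
    """Count hypothesis words that are, and are not, found in the text."""
    t, h = tokens(text), tokens(hypothesis)
    return {"word_overlap": len(h & t), "hyp_only": len(h - t)}

T = ("Parviz Davudi was representing Iran at a meeting of the Shanghai "
     "Co-operation Organisation (SCO), the fledgling association that binds "
     "Russia, China and four former Soviet republics of central Asia together "
     "to fight terrorism.")
H = "China is a member of SCO."

print(rte_features(T, H))  # prints {'word_overlap': 4, 'hyp_only': 2}
```

The intuition is that a hypothesis whose words nearly all appear in the text is more likely to be entailed, while many hypothesis-only words suggest new information that the text does not support.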


Next time

Finish NLPP §6

Go on to NLPP §7

Extracting info from text
