


I256: Applied Natural Language Processing

Marti Hearst

Sept 27, 2006



Evaluation Measures



  • Precision:

    • Proportion of those you labeled X that the gold standard thinks really is X

    • #correctly labeled by alg / all labels assigned by alg

    • #True Positive / (#True Positive + #False Positive)

  • Recall:

    • Proportion of the items labeled X in the gold standard that you actually label X

    • #correctly labeled by alg / all possible correct labels

    • #True Positive / (#True Positive + #False Negative)
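
A minimal sketch (not from the slides) of these two formulas in Python, computing each score from true-positive, false-positive, and false-negative counts:

    def precision(tp, fp):
        # of everything the algorithm labeled X, how much really was X?
        return tp / (tp + fp)

    def recall(tp, fn):
        # of everything that really was X, how much did the algorithm find?
        return tp / (tp + fn)

    print(precision(tp=8, fp=2))  # 0.8
    print(recall(tp=8, fn=4))     # 0.666...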



F-measure

  • Can “cheat” with precision scores by labeling (almost) nothing with X.

  • Can “cheat” on recall by labeling everything with X.

  • The better you do on precision, the worse you do on recall, and vice versa

  • The F-measure is a balance between the two.

    • 2*precision*recall / (recall+precision)
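
Continuing the sketch above: the balanced F-measure is the harmonic mean of the two scores, so it is high only when precision and recall are both high.

    def f_measure(p, r):
        # harmonic mean of precision and recall
        return 2 * p * r / (p + r)

    print(f_measure(0.8, 8 / 12))  # ~0.727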



Evaluation Measures

  • Accuracy:

    • Proportion that you got right

    • (#True Positive + #True Negative) / N

      N = TP + TN + FP + FN

  • Error:

    • (#False Positive + #False Negative)/N
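
The same counts give accuracy and error, which always sum to 1. A small sketch with illustrative numbers (not from the slides):

    def accuracy(tp, tn, fp, fn):
        n = tp + tn + fp + fn          # N = TP + TN + FP + FN
        return (tp + tn) / n

    def error(tp, tn, fp, fn):
        n = tp + tn + fp + fn
        return (fp + fn) / n

    print(accuracy(tp=8, tn=80, fp=2, fn=4))  # ~0.936
    print(error(tp=8, tn=80, fp=2, fn=4))     # ~0.064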



Prec/Recall vs. Accuracy/Error

  • When to use Precision/Recall?

    • Useful when there are only a few positives and many, many negatives

    • Also good for ranked ordering

      • Search results ranking

  • When to use Accuracy/Error

    • When every item has to be judged, and it’s important that every item be correct.

    • Error is better when the differences between algorithms are very small; it lets you focus on small improvements.

      • Speech recognition



Evaluating Partial Parsing

  • How do we evaluate it?



Evaluating Partial Parsing



Testing our Simple Rule

  • Let’s see where we missed:
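
One way to see the misses, sketched with NLTK's ChunkScore and reusing cp and test_sents from the earlier sketch; missed() returns the gold-standard chunks our rule failed to find:

    from nltk.chunk.util import ChunkScore

    score = ChunkScore()
    for gold in test_sents:
        score.score(gold, cp.parse(gold.leaves()))  # compare guess to gold
    print(score.missed()[:10])  # false negatives: chunks we should have found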



Update rules; Evaluate Again
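
The updated rules did not survive extraction either; a hypothetical update in the same spirit broadens the NP pattern and re-scores:

    # hypothetical broader rule: also allow possessive pronouns
    cp2 = nltk.RegexpParser(r"NP: {<DT|PRP\$>?<JJ.*>*<NN.*>+}")
    print(cp2.evaluate(test_sents))     # did precision and recall move?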



Evaluate on More Examples



Incorrect vs. Missed

  • Add code to print out which were incorrect
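
Continuing the ChunkScore sketch, incorrect() is the mirror image of missed():

    print(score.incorrect()[:10])   # false positives: chunks we wrongly proposed
    print(score.precision(), score.recall(), score.f_measure())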



Missed vs. Incorrect



What is a Good Chunking Baseline?
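
The baseline code itself did not survive extraction. One standard baseline (a hypothetical stand-in here, following the NLTK book rather than the original slide) is a chunker with no rules at all, which proposes no chunks; its IOB tag accuracy is deceptively high even though precision and recall are zero:

    cp0 = nltk.RegexpParser("")      # empty grammar: never chunk anything
    print(cp0.evaluate(test_sents))  # high IOB accuracy, zero precision/recall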



The Tree Data Structure
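
Chunkers return nltk.Tree objects; an illustrative sketch of the type (example sentence is mine, not the slide's):

    from nltk import Tree

    np = Tree('NP', [('the', 'DT'), ('little', 'JJ'), ('cat', 'NN')])
    s = Tree('S', [np, ('sat', 'VBD')])
    print(s)             # (S (NP the/DT little/JJ cat/NN) sat/VBD)
    print(s.leaves())    # flat list of (word, tag) pairs
    print(s[0].label())  # 'NP'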



Baseline Code (continued)



Evaluating the Baseline



Cascaded Chunking
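
The slide's grammar did not survive; here is a sketch of cascaded chunking with RegexpParser, patterned on the NLTK book rather than the original deck. Each stage can use the chunks built by earlier stages, and loop reruns the whole cascade:

    grammar = r"""
      NP: {<DT|JJ|NN.*>+}           # stage 1: noun phrases
      PP: {<IN><NP>}                # stage 2: preposition + NP
      VP: {<VB.*><NP|PP|CLAUSE>+$}  # stage 3: verb + complements
      CLAUSE: {<NP><VP>}            # stage 4: NP followed by VP
    """
    cp = nltk.RegexpParser(grammar, loop=2)  # run the cascade twice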



Next Time

  • Summarization

