Natural Language Processing and Computational Linguistics
Using TF-IDF Anomalies to Cluster Documents on Subject Matter
An Analysis Using Word, Simple Noun Phrase, and Complex Noun Phrase Frequencies
Whitney St.Charles
Research Alliance in Math and Science 2007
Mentors:
Yu (Cathy) Jiao, Ph.D.
Robert Patton, Ph.D.
Computational Sciences and Engineering Division
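The title names TF-IDF as the basis for clustering. Below is a minimal sketch of one common TF-IDF variant, tf(t, d) · log(N / df(t)). It assumes raw counts for tf and the natural log for idf. The sketch only shows how per-document term weights are computed. It is not the author's anomaly-detection or clustering method, and the function name `tf_idf` is hypothetical. The "terms" can be single words or, as the subtitle suggests, simple or complex noun phrases.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: weight} dict per document.

    Assumption: tf is the raw count of the term in the document and
    idf is log(N / df), one common variant among several.
    """
    n_docs = len(docs)
    # Document frequency: the number of documents that contain each term.
    df = Counter()
    for doc in docs:
        df.update(set(doc))
    weights = []
    for doc in docs:
        tf = Counter(doc)
        weights.append({t: c * math.log(n_docs / df[t]) for t, c in tf.items()})
    return weights


# Terms may be words or noun phrases; these toy documents are illustrative only.
docs = [
    ["noun", "phrase", "noun"],
    ["noun", "cluster"],
    ["cluster", "documents"],
]
w = tf_idf(docs)
print(w[0])
```

A term that occurs in every document gets idf log(1) = 0, so it contributes nothing to distinguishing documents by subject.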
Example: The Brown Corpus http://www.edict.com.hk/concordance/WWWConcappE.htm
tried/vbd to/to keep/vb everyone/n awake/adj.
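In the tagged example above, each token is written as `word/tag`. The sketch below splits such text into (word, tag) pairs. It splits on the last `/` so that a word which itself contains a slash stays intact. The function name `parse_tagged` is hypothetical. The trailing sentence period is left out of the sample string so that it does not attach to the last tag.

```python
def parse_tagged(text):
    """Split whitespace-separated 'word/tag' tokens into (word, tag) pairs."""
    pairs = []
    for token in text.split():
        # rpartition splits on the LAST '/', so 'and/or/cc' -> ('and/or', 'cc').
        word, sep, tag = token.rpartition("/")
        if not sep:  # token carries no tag; skip it
            continue
        pairs.append((word, tag))
    return pairs


sentence = "tried/vbd to/to keep/vb everyone/n awake/adj"
print(parse_tagged(sentence))
# [('tried', 'vbd'), ('to', 'to'), ('keep', 'vb'), ('everyone', 'n'), ('awake', 'adj')]
```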
Static rule-based extraction
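The source does not give the extraction rules themselves. The sketch below shows what a single static (hand-written, fixed) noun-phrase rule could look like when it is applied to tagged text: an optional determiner, any number of adjectives, then one or more nouns. The tag set (`det`, `adj`, `n`, modelled loosely on the `n` and `adj` tags in the Brown Corpus example), the rule, and the names `TAG_CODES`, `NP_RULE`, and `extract_noun_phrases` are all hypothetical. The approach maps each tag to one letter so that an ordinary regular expression can match tag sequences.

```python
import re

# Hypothetical one-letter codes for a simplified tag set.
TAG_CODES = {"det": "D", "adj": "A", "n": "N"}

# One static rule: optional determiner, any adjectives, one or more nouns.
NP_RULE = re.compile(r"D?A*N+")


def extract_noun_phrases(tagged):
    """Return the word spans in `tagged` ([(word, tag), ...]) that match NP_RULE."""
    # Tags outside the table become 'O', which the rule never matches.
    # Each tag maps to exactly one character, so string offsets equal token offsets.
    codes = "".join(TAG_CODES.get(tag, "O") for _, tag in tagged)
    phrases = []
    for m in NP_RULE.finditer(codes):
        words = [w for w, _ in tagged[m.start():m.end()]]
        phrases.append(" ".join(words))
    return phrases


tagged = [("the", "det"), ("old", "adj"), ("man", "n"), ("saw", "vbd"),
          ("the", "det"), ("telescope", "n")]
print(extract_noun_phrases(tagged))  # ['the old man', 'the telescope']
```

Because the rule is fixed in advance, it cannot adapt to errors it makes on new text. The error-driven approach learns its rules from data instead.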
“I saw the man with the telescope.”
An ‘error-driven’ approach for learning an ordered set of rules
1. Generate all rules that correct at least one error.
2. For each rule:
(a) Apply to a copy of the most recent state of the training set.
(b) Score result
3. Select rule with best score.
4. Update training set by applying selected rule.
5. Stop if score is smaller than some pre-set threshold T; otherwise repeat from step 1.
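The steps above follow the pattern of transformation-based (error-driven) learning. Below is a minimal sketch applied to correcting part-of-speech tags. It makes several assumptions. It uses a single hypothetical rule template, "change tag `frm` to `to` when the previous token's tag is `prev`". Its score is the net reduction in errors. It represents the training set as parallel lists of current and gold tags. The names `Rule`, `apply_rule`, `errors`, and `learn_rules` are illustrative. One deliberate difference from the listed order: the threshold check (step 5) runs before the update (step 4), so a rule that scores below T is never kept.

```python
from collections import namedtuple

# Hypothetical template: change tag `frm` to `to` when the previous tag is `prev`.
Rule = namedtuple("Rule", "frm to prev")


def apply_rule(rule, tags):
    """Return a new tag list; conditions are read from the old list, not the new one."""
    new = list(tags)
    for i in range(1, len(tags)):
        if tags[i] == rule.frm and tags[i - 1] == rule.prev:
            new[i] = rule.to
    return new


def errors(tags, gold):
    """Count positions where the current tag differs from the gold tag."""
    return sum(t != g for t, g in zip(tags, gold))


def learn_rules(tags, gold, threshold=1):
    """Greedily learn an ordered list of rules; returns (rules, final_tags)."""
    learned = []
    while True:
        # 1. Generate every rule that would correct at least one current error.
        candidates = {Rule(tags[i], gold[i], tags[i - 1])
                      for i in range(1, len(tags)) if tags[i] != gold[i]}
        if not candidates:
            break
        # 2. Apply each rule to a copy of the current state and score the result.
        base = errors(tags, gold)
        scored = [(base - errors(apply_rule(r, tags), gold), r) for r in candidates]
        # 3. Select the rule with the best score (ties broken deterministically).
        best_score, best = max(scored)
        # 5. Stop once the best score falls below the threshold T.
        if best_score < threshold:
            break
        # 4. Update the training state by applying the selected rule.
        tags = apply_rule(best, tags)
        learned.append(best)
    return learned, tags


gold = ["to", "vb", "n", "to", "vb"]
initial = ["to", "n", "n", "to", "n"]  # e.g. output of a simple baseline tagger
rules, final = learn_rules(initial, gold)
print(rules)  # [Rule(frm='n', to='vb', prev='to')]
```

The learned list is ordered. When tagging new text, the rules are applied in the order they were learned, each to the output of the previous one.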
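$$\text{Precision} = \frac{\text{number correct}}{\text{number of responses}}$$
$$\text{Recall} = \frac{\text{number correct}}{\text{number in key}}$$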
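Precision divides the number of correct responses by the number of responses. Recall divides it by the number of items in the answer key. The sketch below computes both. It assumes that responses and key items are comparable set elements, here hypothetical (start, end) noun-phrase spans. A response counts as correct only if it exactly matches a key item. The function name `precision_recall` is illustrative.

```python
def precision_recall(responses, key):
    """Score a set of system responses against an answer key (both sets)."""
    correct = len(responses & key)  # exact matches only
    precision = correct / len(responses) if responses else 0.0
    recall = correct / len(key) if key else 0.0
    return precision, recall


key = {(0, 3), (4, 6), (8, 9)}
responses = {(0, 3), (4, 5), (8, 9), (10, 12)}
p, r = precision_recall(responses, key)
print(p, r)  # 0.5 0.6666666666666666
```

Here two of the four responses match the key, so precision is 2/4. Two of the three key items are found, so recall is 2/3.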
With 82% fewer comparisons!