This project evaluates how accurately two classifiers, Naïve Bayes and a name classifier, match elements between university data sources (course and faculty listings). It reports attempted improvements to Naïve Bayes and an algorithm that combines both classifiers for better matching.
Jeff Roth Project 3
Basic Results (FP = false positives)

Course (Target = Reed; Training = Rice, uwm, Washington; Source = wsu)
  Naïve Bayes: 7 / 12 correct, 6 / 16 FP
  Name Classifier: 12 / 15 correct, 0 / 19 FP

Course (Target = Rice; Training = Reed, uwm, Washington; Source = wsu)
  Naïve Bayes: 7 / 10 * correct, 5 / 16 FP
  Name Classifier: 12 / 13 correct, 0 / 19 FP

Faculty (Target = Berkeley; Training = Cornell, Texas, Washington; Source = Michigan)
  Naïve Bayes: 6 / 10 correct, 3 / 10 FP
  Name Classifier: 14 / 14 correct, 0 / 14 FP

Faculty (Target = Cornell; Training = Berkeley, Texas, Washington; Source = Michigan)
  Naïve Bayes: 5 / 10 correct, 3 / 10 FP
  Name Classifier: 14 / 14 correct, 0 / 14 FP
“Improved” Naïve Bayes

Course (Target = Reed; Training = Rice, uwm, Washington; Source = wsu)
  Naïve Bayes: 7 / 12 correct, 7 / 16 FP

Course (Target = Rice; Training = Reed, uwm, Washington; Source = wsu)
  Naïve Bayes: 7 / 10 * correct, 5 / 16 FP

Faculty (Target = Berkeley; Training = Cornell, Texas, Washington; Source = Michigan)
  Naïve Bayes: 6 / 10 correct, 3 / 10 FP

Faculty (Target = Cornell; Training = Berkeley, Texas, Washington; Source = Michigan)
  Naïve Bayes: 5 / 10 correct, 3 / 10 FP

Improvements:
1. Classify in log space: $v_{NB} = \arg\max_{v_j \in V} \left[ \log P(v_j) + \sum_i \log P(a_i \mid v_j) \right]$ (already included in the basic results).
2. If a word in the document being classified has no match, use $1 / (2\,|\mathit{Vocabulary}|)$ as its probability (did not help).
3. Divide the score by the number of words in the test document and take the global maximum (dropped).
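The improvements above can be sketched in Python as a log-space Naïve Bayes scorer with the out-of-vocabulary fallback from improvement 2 and the optional per-word normalization from improvement 3. This is a minimal sketch under several assumptions that the slides do not state: documents are bags of word tokens, "no match" is read as "the word never appeared in any training document", and in-vocabulary words use add-one (Laplace) smoothing. The names `train` and `classify` are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List, Optional, Set, Tuple

Doc = List[str]  # a document as a bag of word tokens


def train(docs_by_class: Dict[str, List[Doc]]) -> Tuple[Set[str], Dict[str, float], Dict[str, Counter]]:
    """Estimate class priors P(v_j) and per-class word counts from labelled docs."""
    vocab = {w for docs in docs_by_class.values() for d in docs for w in d}
    total_docs = sum(len(docs) for docs in docs_by_class.values())
    priors = {c: len(docs) / total_docs for c, docs in docs_by_class.items()}
    counts = {c: Counter(w for d in docs for w in d) for c, docs in docs_by_class.items()}
    return vocab, priors, counts


def classify(doc: Doc, vocab: Set[str], priors: Dict[str, float],
             counts: Dict[str, Counter], normalize: bool = False) -> Tuple[Optional[str], float]:
    """Return (best class, score) with score = log P(v_j) + sum_i log P(a_i | v_j)."""
    unseen = 1.0 / (2 * len(vocab))  # improvement 2: probability for out-of-vocabulary words
    best_class, best_score = None, -math.inf
    for c, prior in priors.items():
        n = sum(counts[c].values())  # total word occurrences in class c
        score = math.log(prior)
        for w in doc:
            if w in vocab:
                # Add-one (Laplace) estimate: an assumption, not stated on the slide.
                p = (counts[c][w] + 1) / (n + len(vocab))
            else:
                p = unseen
            score += math.log(p)
        if normalize and doc:
            score /= len(doc)  # improvement 3: per-word score, comparable across documents
        if score > best_score:
            best_class, best_score = c, score
    return best_class, best_score


docs = {"course": [["lecture", "exam"], ["exam", "homework"]],
        "faculty": [["professor", "research"], ["research", "office"]]}
vocab, priors, counts = train(docs)
print(classify(["exam", "homework"], vocab, priors, counts)[0])  # course
```

Under this reading, the fallback adds the same constant log term to every class's score, so it cannot change which class wins for a given document; that would be consistent with it not helping. Likewise, dividing by document length leaves each document's argmax unchanged and only matters when scores from different documents are compared, as in a global maximum.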
Combination

Course (Target = Reed; Training = Rice, uwm, Washington; Source = wsu)
  Name Classifier: 13 / 15 correct, 0 / 19 FP

Course (Target = Rice; Training = Reed, uwm, Washington; Source = wsu)
  Name Classifier: 12 / 13 correct, 0 / 19 FP

Faculty (Target = Berkeley; Training = Cornell, Texas, Washington; Source = Michigan)
  Name Classifier: 14 / 14 correct, 0 / 14 FP

Faculty (Target = Cornell; Training = Berkeley, Texas, Washington; Source = Michigan)
  Name Classifier: 14 / 14 correct, 0 / 14 FP

Combination algorithm:
1. Match a source element to a target element if Naïve Bayes and the name matcher agree.
2. Match the remaining unmatched target elements to source elements using the name matcher.
3. Match any still-unmatched target elements to source elements using Naïve Bayes.
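The three-pass combination algorithm can be sketched in Python as follows. It assumes each matcher has already proposed at most one source element per target element, given as a dictionary (`None` meaning no proposal); the function name `combine_matches` and this input shape are hypothetical. The sketch does not prevent two target elements from being mapped to the same source element, because the slides do not say whether that is allowed.

```python
from typing import Dict, Optional

# target element -> proposed source element (None = no proposal)
Proposals = Dict[str, Optional[str]]


def combine_matches(nb: Proposals, name: Proposals) -> Dict[str, str]:
    """Combine Naive Bayes and name-matcher proposals in three passes."""
    targets = sorted(set(nb) | set(name))
    result: Dict[str, str] = {}
    # Pass 1: keep matches on which both matchers agree.
    for t in targets:
        if nb.get(t) is not None and nb.get(t) == name.get(t):
            result[t] = nb[t]
    # Pass 2: fill still-unmatched targets from the name matcher.
    for t in targets:
        if t not in result and name.get(t) is not None:
            result[t] = name[t]
    # Pass 3: fill anything left from Naive Bayes.
    for t in targets:
        if t not in result and nb.get(t) is not None:
            result[t] = nb[t]
    return result


nb = {"a": "x", "b": "y", "c": "z"}
name = {"a": "x", "b": "w", "d": "v"}
print(combine_matches(nb, name))  # {'a': 'x', 'b': 'w', 'c': 'z', 'd': 'v'}
```

Because the name matcher is consulted before Naïve Bayes in pass 2, it wins every disagreement, and Naïve Bayes only fills targets the name matcher left empty.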