Taking the Kitchen Sink Seriously: An Ensemble Approach to Word Sense Disambiguation, from Christopher Manning et al.
Overview
• 23 student WSD projects combined in a 2-layer voting scheme (an ensemble of ensemble classifiers).
• Performed well on SENSEVAL-2: 4th place out of 21 supervised systems on the English Lexical Sample task.
• Offers valuable lessons for both WSD and ensemble methods in general.
System Overview
• 23 different "1st order" classifiers:
  • Independently developed WSD systems.
  • Use a variety of algorithms (naïve Bayes, n-gram models, etc.).
• These 1st order classifiers are combined into a variety of 2nd order classifiers/voting mechanisms.
• The 2nd order classifiers vary with respect to:
  • The algorithm used to combine the 1st order classifiers.
  • The number of voters: each takes the top k 1st order classifiers, where k is one of {1, 3, 5, 7, 9, 11, 13, 15}.
Voting Algorithms
• Majority vote (each vote has weight 1).
• Weighted voting, with weights determined by EM:
  • Chooses weights that maximize the likelihood of the 2nd order training instances, where the probability of a sense (given the votes) is defined as the sum of the weighted votes for that sense.
• Maximum entropy, using features derived from the votes of the 1st order classifiers.
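The first two schemes can be sketched as one weighted-voting function with different weight vectors; majority voting is simply the uniform-weight special case. (A minimal illustration; the function names and the normalization step are mine, not from the paper.)

```python
from collections import defaultdict

def weighted_vote(votes, weights):
    """Combine 1st-order predictions into a distribution over senses.

    votes   -- list of sense labels, one per 1st-order classifier
    weights -- one non-negative weight per classifier

    P(sense | votes) is the normalized sum of the weights of the
    classifiers that voted for that sense.
    """
    scores = defaultdict(float)
    for sense, w in zip(votes, weights):
        scores[sense] += w
    total = sum(scores.values())
    return {sense: s / total for sense, s in scores.items()}

def majority_vote(votes):
    """Majority voting: the special case where every weight is 1."""
    return weighted_vote(votes, [1.0] * len(votes))

# Five voters disambiguating "bank": 3 say 'money', 2 say 'river'.
dist = majority_vote(["river", "money", "money", "river", "money"])
# → {'river': 0.4, 'money': 0.6}
```

EM would fit the `weights` vector to held-out 2nd-order training data; the maximum-entropy combiner instead treats the votes themselves as features of a classifier.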
Classifier Construction Process
For each word:
• Train each 1st order classifier on ¾ of the training data.
• Use the remaining ¼ to rank the 1st order classifiers.
• For each 2nd order classifier:
  • Take the top k 1st order classifiers for this word.
  • Train the 2nd order classifier on ¾ of the training data using this ensemble.
• Rank the 2nd order classifiers on the held-out ¼ of the training data.
• Take the top 2nd order classifier as the classifier for this word, and retrain it on all of the training data.
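The per-word recipe above can be sketched end to end. Everything here is a stand-in: the stub classifier classes and function names are hypothetical (the real 1st-order systems were 23 student projects), but the train-on-¾ / rank-on-¼ / retrain-on-all flow follows the slide.

```python
class ConstantSense:
    """Stub 1st-order classifier: always predicts one sense."""
    def __init__(self, sense):
        self.sense = sense
    def train(self, data):
        pass
    def predict(self, instance):
        return self.sense
    def accuracy(self, data):
        return sum(self.predict(x) == y for x, y in data) / len(data)

class MajorityEnsemble:
    """Stub 2nd-order classifier: unweighted vote over an ensemble."""
    def __init__(self, ensemble):
        self.ensemble = ensemble
    def train(self, data):
        pass
    def predict(self, instance):
        votes = [c.predict(instance) for c in self.ensemble]
        return max(set(votes), key=votes.count)
    def accuracy(self, data):
        return sum(self.predict(x) == y for x, y in data) / len(data)

def select_classifier_for_word(first_orders, second_order_makers, data,
                               ks=(1, 3, 5, 7, 9, 11, 13, 15)):
    """Per-word selection following the 3/4-train / 1/4-rank recipe."""
    cut = 3 * len(data) // 4
    train, held_out = data[:cut], data[cut:]

    # 1. Train each 1st-order classifier on 3/4 of the data,
    #    then rank them on the remaining 1/4.
    for c in first_orders:
        c.train(train)
    ranked = sorted(first_orders, key=lambda c: c.accuracy(held_out),
                    reverse=True)

    # 2. Build a 2nd-order candidate for each (voting scheme, top-k) pair,
    #    train on the same 3/4, and rank on the same held-out 1/4.
    candidates = [make(ranked[:k])
                  for make in second_order_makers
                  for k in ks if k <= len(ranked)]
    for c in candidates:
        c.train(train)
    best = max(candidates, key=lambda c: c.accuracy(held_out))

    # 3. Keep the top 2nd-order classifier; retrain on all the data.
    best.train(data)
    return best

# Tiny worked example: two constant voters, majority voting, k = 1.
data = ([("ctx%d" % i, "money") for i in range(6)] +
        [("ctx%d" % i, "river") for i in range(2)])
voters = [ConstantSense("money"), ConstantSense("river")]
chosen = select_classifier_for_word(voters, [MajorityEnsemble], data, ks=(1,))
```

In the toy run the held-out quarter happens to be all "river" instances, so the "river" voter ranks first and the selected 2nd-order classifier predicts "river"; with real data the held-out split would of course be drawn randomly.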
Results
• 61.7% accuracy in the SENSEVAL-2 competition (4th place).
• After the competition, performance was improved:
  • Used global performance (i.e., over all words) as a tie-breaker when ranking both the 1st and 2nd order classifiers.
  • Accuracy improved to 63.9% (which would have placed 2nd).
Results for 2nd Order Classifiers
• Results are averaged over all words.
• Note MaxEnt's ability to resist dilution: its accuracy degrades least as the number of voters k grows and weaker 1st order classifiers join the ensemble.
Evaluating Effects of Combination
• We want different classifiers to make different mistakes.
• We can measure this differentiation as the average, over all pairs of 1st order classifiers, of the fraction of errors that are shared: the lower this shared fraction, the greater the error independence.
• As error independence and word difficulty grow, so does the advantage of combination.
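The pairwise shared-error measure can be sketched as follows. The slide does not pin down the denominator, so the overlap of the two error sets relative to their union (Jaccard overlap) is assumed here; the function names are mine.

```python
from itertools import combinations

def shared_error_fraction(errs_a, errs_b):
    """Fraction of errors the two classifiers share.

    errs_a, errs_b -- sets of instance ids each classifier got wrong.
    Measured as |intersection| / |union| (an assumed formulation).
    """
    union = errs_a | errs_b
    if not union:
        return 0.0
    return len(errs_a & errs_b) / len(union)

def mean_pairwise_overlap(error_sets):
    """Average shared-error fraction over all pairs of classifiers.

    Lower values mean more error-independent classifiers, which is
    what makes combination pay off.
    """
    pairs = list(combinations(error_sets, 2))
    return sum(shared_error_fraction(a, b) for a, b in pairs) / len(pairs)

# Three classifiers' error sets over some shared test instances:
overlap = mean_pairwise_overlap([{1, 2, 3}, {3, 4}, {5}])
# → (0.25 + 0.0 + 0.0) / 3 ≈ 0.083
```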
Lessons for WSD
• Every word is a separate problem:
  • Every 1st and 2nd order classifier had some words on which it did best.
• Implementation details matter:
  • Large or small window sizes work better than medium window sizes, suggesting that senses are determined both at a very local, collocational level and at a very general, topical level.
  • Smoothing is very important.
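The window-size observation can be made concrete with a bag-of-context-words helper: a small window catches collocational cues next to the target, while a large one catches topical cues far away. (An illustrative helper of my own, not code from any of the systems.)

```python
def window_features(tokens, target_idx, size):
    """Bag of context words within `size` tokens of the target word."""
    lo = max(0, target_idx - size)
    hi = min(len(tokens), target_idx + size + 1)
    return {w for i, w in enumerate(tokens[lo:hi], start=lo)
            if i != target_idx}

sent = "he sat on the bank of the muddy river watching the water".split()
i = sent.index("bank")
local = window_features(sent, i, 2)    # collocational: {'on', 'the', 'of'}
topical = window_features(sent, i, 10) # topical: includes 'river', 'water'
```

Here the strong disambiguating cue "river" only appears in the large window, while "bank of the" style collocations live in the small one; mid-sized windows get neither signal cleanly, consistent with the lesson above.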
Lessons for Ensemble Methods
• Variety within the ensemble is desirable:
  • Qualitatively different approaches are better than minor perturbations of similar approaches.
  • We can measure the extent to which this ideal is achieved (e.g., via error independence).
• Variety among combination algorithms helps as well:
  • In particular, it can help with overfitting, because different algorithms begin to overtrain at different points.