
Online Active Learning with Imbalanced Classes



  1. Zahra Ferdowsi. Online Active Learning with Imbalanced Classes. October 15th, 2013. Accenture Technology Labs / DePaul University

  2. Do we always have enough labeled data to train the classifier?

  3. Active Learning Scenario • Large number of unlabeled examples • Interactive setting, with domain experts in the process • Limited labeling resources • High labeling costs

  4. Healthcare example: motivation of this study • Inefficiencies in the healthcare insurance process result in large monetary losses affecting corporations and consumers • $91 billion over-spent in the US every year on health administration and insurance (McKinsey study, Nov. 2008) • 131 percent increase in insurance premiums over the past 10 years

  5. Health Insurance Claim Process

  6. Healthcare example • Claim payment errors drive a significant portion of these inefficiencies • Increased administrative costs and service issues for health plans • Overpayment of claims: a direct loss • Underpayment of claims: loss of interest for the insurer, loss of revenue for the provider

  7. Early Rework Detection: how it was done before • 1. Random Audits for Quality Control • Flow: Claims Database → Random Samples → Manual Audits → Auditors • Extremely low hit rates • Long audit times due to fully manual audits

  8. Early Rework Detection: how it was done before • 2. Hypothesis and Rule-Based Audits • Flow: generate expert hypotheses → Database Queries → Claims Database → Hypothesis-based audits → Auditors • Better hit rates, but still a lot of manual effort in discovering, building, updating, executing, and maintaining the hypotheses

  9. Data • Duration: 2 years • Number of claims: 3.5 million • Labeled claims: 121k (49k rework) • Number of features: 16k

  10. Features • Member information • Provider information • Claim header • Contract information, total amount billed, diagnosis code, date of service • Claim line details • amount billed per service, procedure code, counter for the procedure (quantity)

  11. Predictive Modeling • Domain characteristics • High dimensional data • Sparse data • Fast training, updating and scoring required • Ability to generate explanation for domain experts • Classifier: Linear SVMs • Distance from margin is used as the ranking score
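A minimal sketch of this scoring step, assuming scikit-learn's LinearSVC as a stand-in for the LibSVM-trained linear SVM on the slides (the function and variable names here are hypothetical, not from the paper):

    # Illustrative sketch, not the authors' exact pipeline.
    import numpy as np
    from sklearn.svm import LinearSVC

    def rank_claims(X_train, y_train, X_pool):
        """Rank pool examples by signed distance from the SVM hyperplane."""
        clf = LinearSVC()                       # fast to train on sparse, high-dimensional data
        clf.fit(X_train, y_train)
        scores = clf.decision_function(X_pool)  # signed margin distance per example
        return np.argsort(-scores), scores      # highest-scoring (most rework-like) first

The decision_function output doubles as the ranking score shown to auditors and as the raw material for the instance selection strategies on the following slides.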

  12. Well-known Instance Selection Strategies (ISS) • Uncertainty • Distance to the hyperplane (Shen et al., 2004) • Entropy (Settles, 2008) • Clustering • Density (cosine similarity): average similarity to all other cases (Shen et al., 2004) • Hierarchical (Dasgupta, 2008) • k-means using cosine similarity (Zhu et al., 2001)
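Given the decision scores from the sketch above, the margin-based uncertainty strategy reduces to a few lines; select_uncertain is a hypothetical helper, not a name from the paper:

    # Sketch: uncertainty sampling via distance to the hyperplane.
    import numpy as np

    def select_uncertain(scores, n):
        """Pick the n pool examples closest to the hyperplane, i.e. most uncertain."""
        return np.argsort(np.abs(scores))[:n]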

  13. Well-known ISS (cont.) • Hybrid approach: Density × Uncertainty (Zhu et al., 2008; Settles and Craven, 2008) • Query-by-Committee: measuring the level of disagreement among a committee of classifiers (Melville and Mooney, 2004)
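The hybrid strategy can be sketched the same way; the exact weighting in the cited papers differs, so treat the inverse-margin uncertainty term here as an illustrative assumption rather than their formula:

    # Sketch of a Density * Uncertainty hybrid, not the cited papers' exact weighting.
    import numpy as np
    from sklearn.metrics.pairwise import cosine_similarity

    def select_density_uncertainty(X_pool, scores, n):
        """Prefer uncertain examples that also lie in dense regions of the pool."""
        uncertainty = 1.0 / (np.abs(scores) + 1e-8)       # nearer the margin = more uncertain
        density = cosine_similarity(X_pool).mean(axis=1)  # average similarity to the pool
        return np.argsort(-(uncertainty * density))[:n]   # top-n hybrid scores

The pairwise similarity matrix is O(n^2) in the pool size, so on a pool of millions of claims the density term would typically be computed on a subsample.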

  14. Experimental Setup • Active-learning loop (flowchart, sketched in code below): 1) select n instances randomly from the pool set; 2) remove the selected instances from the pool set; 3) add these instances, with labels, to the training set; 4) train the classifier on the training set; 5) use the classifier to measure precision @ k% on the testing set; 6) if the pool set is not exhausted, select n instances from the pool set using an instance selection strategy and go back to step 2; otherwise, end • 5-fold cross-validation • Evaluation metric: precision at top 1%, 2%, and 5% • Number of instances labeled in each iteration: n = 100 • SVM as the base classifier, using LibSVM
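A runnable outline of that loop, under the same assumptions as the earlier sketches (LinearSVC in place of LibSVM, dense arrays, binary 0/1 labels); select is any function mapping the current classifier and pool to ranked pool indices, so the ISS sketches above can be wrapped to fit:

    # Illustrative outline of the experimental loop, not the authors' exact code.
    import numpy as np
    from sklearn.svm import LinearSVC

    def precision_at_k(clf, X_test, y_test, k_frac=0.01):
        """Precision among the top k% highest-scoring test examples (binary 0/1 labels)."""
        scores = clf.decision_function(X_test)
        k = max(1, int(len(y_test) * k_frac))
        return y_test[np.argsort(-scores)[:k]].mean()

    def active_learning_run(X_lab, y_lab, X_pool, y_pool, X_test, y_test, select, n=100):
        """Repeat train / evaluate / query-n-labels until the pool set is exhausted."""
        curve = []
        while len(y_pool) > 0:
            clf = LinearSVC().fit(X_lab, y_lab)
            curve.append(precision_at_k(clf, X_test, y_test))
            idx = select(clf, X_pool)[: min(n, len(y_pool))]  # the ISS picks the batch
            X_lab = np.vstack([X_lab, X_pool[idx]])           # oracle reveals the labels
            y_lab = np.concatenate([y_lab, y_pool[idx]])
            keep = np.ones(len(y_pool), dtype=bool)
            keep[idx] = False
            X_pool, y_pool = X_pool[keep], y_pool[keep]
        return curve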

  15. How do existing ISS perform? Claims data set

  16. How do existing ISS perform? Claims data set

  17. Experiments on more datasets • KDD Cup 1999 dataset for network intrusion detection; the "probing" intrusion type is used as the positive label • HIVA is a chemoinformatics dataset used to predict which compounds are active against the AIDS HIV infection • ZEBRA is an embryology dataset that provides a feature representation of zebrafish embryo cells to determine whether they are in division (meiosis) or not

  18. How do existing ISS perform? ZEBRA data set

  19. Do existing ISS work? • No ISS is consistently the best across all domains and at all precision levels • Creating a validation set to choose among them is challenging, since labeled data are scarce and expensive to obtain • We propose an unsupervised score that can predict the performance of an ISS without using any additional labeled examples

  20. Proposed Unsupervised Scores • MS on Unlabeled set (MSU): mean score of the top k% instances in the unlabeled set • MS on Labeled set (MSL): mean score of the top k% instances in the labeled set from the previous iteration • MS on All (MSA): mean score of the top k% instances in the combined set (the unlabeled set plus the labeled set from the last iteration)
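A sketch of MSU under the earlier assumptions (decision scores from the linear SVM, with k matching the precision@k evaluation level):

    # Sketch of the MSU score as defined on the slide.
    import numpy as np

    def msu(clf, X_pool, k_frac=0.01):
        """MSU: mean decision score over the top k% of the unlabeled pool."""
        scores = clf.decision_function(X_pool)
        k = max(1, int(len(scores) * k_frac))
        return np.sort(scores)[-k:].mean()

MSL and MSA follow the same pattern, computed over the labeled instances from the previous iteration and over the combined set, respectively.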

  21. Do the new unsupervised scores work? • The graphs show a high correlation between the score and precision (figure: certainty on the Claims data set)

  22. Do they work? • The correlation values are promising

  23. Can we use the unsupervised score to predict the best ISS in each iteration? • The online algorithm has two components: • the unsupervised score (MSU), which tracks the performance of an individual ISS without using any validation set • a simple online algorithm that uses MSU to switch between different strategies • Existing alternatives for comparison: • CEM (Classification Entropy Maximization) as the score • a multi-armed bandit algorithm for switching between ISS
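The slide names the two components but not the precise switching rule, so the following is only a simplified illustration of the idea (query with whichever strategy shows the largest recent MSU gain), not the paper's algorithm:

    # Simplified sketch of MSU-based strategy switching; the actual rule is not given here.
    import numpy as np

    def choose_strategy(msu_history, iteration):
        """msu_history: one list of past MSU values per candidate strategy."""
        n = len(msu_history)
        if iteration < 2 * n:                  # warm-up: observe each strategy twice
            return iteration % n
        gains = [h[-1] - h[-2] for h in msu_history]
        return int(np.argmax(gains))           # exploit the largest recent MSU gain

A multi-armed bandit algorithm such as the one in [4] plays the same role in the existing alternatives mentioned above.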

  24. Online Active Learning

  25. How does the online algorithm work? HIVA data set

  26. Conclusion • We proposed an online algorithm for active learning that switches between different candidate ISS for classification on imbalanced data sets • The online algorithm has two components: • a score, MSU, that tracks the performance of an individual ISS without using any validation set • a simple online algorithm that uses the change in MSU to switch between different strategies • The online approach works better than (or at least on par with) the best individual ISS and achieves 80%-100% of the highest possible precision

  27. Questions

  28. References [1] Active Learning Challenge. [2] KDD Cup 1999. [3] J. Attenberg and F. Provost. Inactive learning?: Difficulties employing active learning in practice. SIGKDD Explorations Newsletter, 12, March 2011. [4] P. Auer, N. Cesa-Bianchi, Y. Freund, and R. E. Schapire. The nonstochastic multiarmed bandit problem. SIAM J. Comput., 32(1):48-77, 2002. [5] Y. Baram, R. El-Yaniv, K. Luz, and M. Warmuth. Online choice of active learning algorithms. Journal of Machine Learning Research, 2004. [6] C.-C. Chang and C.-J. Lin. LIBSVM: A library for support vector machines, 2001. [7] P. Donmez and J. G. Carbonell. Active sampling for rank learning via optimizing the area under the ROC curve. In Proceedings of the 31st European Conference on IR Research on Advances in Information Retrieval, ECIR '09, pages 78-89, Berlin, Heidelberg, 2009. Springer. [8] P. Donmez, J. G. Carbonell, and P. N. Bennett. Dual strategy active learning. In ECML, 2007.

  29. References (cont.) [9] J. He and J. Carbonell. Nearest-neighbor-based active learning for rare category detection. In J. Platt, D. Koller, Y. Singer, and S. Roweis, editors, Advances in Neural Information Processing Systems 20, MIT Press, Cambridge, MA, 2008. [10] M. Kumar, R. Ghani, and Z.-S. Mei. Data mining to predict and prevent errors in health insurance claims processing. In KDD '10, New York, NY, USA, 2010. [11] A. McCallum and K. Nigam. Employing EM in pool-based active learning for text classification. In Proceedings of the International Conference on Machine Learning (ICML), pages 359-367. Morgan Kaufmann, 1998. [12] H. T. Nguyen and A. Smeulders. Active learning using pre-clustering. In ICML, 2004. [13] B. Settles. Active learning literature survey. Computer Sciences Technical Report 1648, University of Wisconsin-Madison, 2009. [14] B. Settles and M. Craven. An analysis of active learning strategies for sequence labeling tasks. In EMNLP, 2008. [15] S. Tong and D. Koller. Support vector machine active learning with applications to text classification. In Proceedings of the International Conference on Machine Learning (ICML), pages 999-1006. Morgan Kaufmann, 2000.
