ABSTRACT
Query by Committee is an approach in which disagreement among an ensemble of hypotheses is used to select data for labeling. Query by Bagging and Query by Boosting are practical implementations of this approach that use bagging and boosting, respectively, to build the committees. A committee should be made up of consistent hypotheses that are very different from one another. Decorate is a recently developed method that constructs diverse committees using artificial training data. This paper introduces Active Decorate, which uses Decorate committees to select good training examples.
Introduction
An important property of a good ensemble for committee-based active learning is diversity. The Decorate method directly constructs diverse committees by employing specially constructed artificial training examples. Active Decorate requires fewer examples than Decorate, produces considerable reductions in error, and outperforms both QbyBag and QbyBoost.
Query By Committee: generalized approach
Algorithm:
Given:
T - set of training examples
U - set of unlabeled training examples
BaseLearn - base learning algorithm
k - number of selective sampling iterations
m - size of each sample
1. Repeat k times:
2.    Generate a committee of classifiers, C* = EnsembleMethod(BaseLearn, T)
3.    For all xj in U, compute Utility(C*, xj), based on the current committee
4.    Select a subset S of m examples that maximizes the utility
5.    Label the examples in S
6.    Remove the examples in S from U and add them to T
7. Return EnsembleMethod(BaseLearn, T)
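The loop above can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation: `ensemble_method`, `utility`, and `oracle` are hypothetical callables supplied by the caller (`ensemble_method` is assumed to already wrap BaseLearn), and step 4 is approximated by taking the m examples with the highest individual utility.

```python
from typing import Any, Callable, List, Sequence, Tuple

Example = Any
Labeled = Tuple[Example, Any]


def query_by_committee(
    labeled: Sequence[Labeled],
    unlabeled: Sequence[Example],
    ensemble_method: Callable[[List[Labeled]], Any],  # hypothetical: builds C* from T
    utility: Callable[[Any, Example], float],         # hypothetical: Utility(C*, xj)
    oracle: Callable[[Example], Any],                 # hypothetical: returns the true label
    k: int,
    m: int,
):
    """Run k rounds of selective sampling, labeling up to m examples per round."""
    T = list(labeled)    # training set, grows each round
    U = list(unlabeled)  # unlabeled pool, shrinks each round
    for _ in range(k):
        if not U:
            break  # nothing left to query
        committee = ensemble_method(T)
        # Rank the pool by utility under the current committee, highest first.
        ranked = sorted(U, key=lambda x: utility(committee, x), reverse=True)
        # Assumption: the utility-maximizing subset S is the top-m examples.
        S, U = ranked[:m], ranked[m:]
        # Label the selected examples and move them into the training set.
        T.extend((x, oracle(x)) for x in S)
    # Final committee trained on everything labeled so far.
    return ensemble_method(T), T
```

Passing the ensemble builder and the utility function as arguments keeps the loop independent of whether the committee comes from bagging, boosting, or Decorate.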
Query By Committee: generalized approach (contd.)
The Gibbs algorithm is used to generate the committee of hypotheses used for sample selection, but for many interesting problems it is computationally intractable. To address this, QbyBag and QbyBoost were introduced; they use bagging and AdaBoost, respectively, to construct the committees for sample selection. Both compute the utility of a candidate example from its margin. The margin is defined as the difference between the number of votes in the current committee for the most popular class label and the number of votes for the second most popular label. Examples with smaller margins are considered to have higher utility.
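As a small illustration of the vote-based margin, the sketch below counts the committee's votes for one example and subtracts the runner-up count from the winner's. The function name `vote_margin` and the list-of-labels input are assumptions made for this example.

```python
from collections import Counter
from typing import Hashable, Sequence


def vote_margin(votes: Sequence[Hashable]) -> int:
    """Votes for the most popular label minus votes for the second most popular."""
    top_two = Counter(votes).most_common(2)
    top = top_two[0][1]
    # If every member voted for the same label, the runner-up has zero votes.
    second = top_two[1][1] if len(top_two) > 1 else 0
    return top - second


# Hypothetical committee of five members voting on two unlabeled examples.
split_vote = ["spam", "ham", "spam", "ham", "spam"]
unanimous = ["spam"] * 5
# The smaller margin marks the example the committee disagrees on most,
# so it would be queried first.
query_first = min([split_vote, unanimous], key=vote_margin)
```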
Active Decorate
Active Decorate uses the ensemble method Decorate to create a diverse committee. To evaluate the expected utility of unlabeled examples, margins on the examples are used. Because the ensemble method provides class probabilities, instead of just the most likely class label, the margin is now defined as the difference between the highest and the second-highest predicted probabilities.
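The probability-based margin can be sketched as follows. The slide does not say how the ensemble's class probabilities are obtained, so this example assumes they are the average of the members' probability vectors; `probability_margin` and its input layout are hypothetical names for illustration.

```python
from typing import List, Sequence


def probability_margin(member_probs: Sequence[Sequence[float]]) -> float:
    """Highest minus second-highest ensemble class probability for one example.

    member_probs[i][c] is member i's predicted probability of class c.
    Assumption: the ensemble probability is the mean over members.
    Requires at least two classes.
    """
    n_members = len(member_probs)
    n_classes = len(member_probs[0])
    ensemble: List[float] = [
        sum(p[c] for p in member_probs) / n_members for c in range(n_classes)
    ]
    top, second = sorted(ensemble, reverse=True)[:2]
    return top - second


# Two hypothetical members, two classes: ensemble is [0.7, 0.3].
confident = probability_margin([[0.6, 0.4], [0.8, 0.2]])
# Members disagree: ensemble is [0.5, 0.5], so the margin is zero
# and this example has the highest utility.
uncertain = probability_margin([[0.9, 0.1], [0.1, 0.9]])
```

Unlike vote counts, this margin can separate two examples on which the committee votes identically but with different confidence.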
Metrics used to analyze the effectiveness of the practical implementation are:
• Target error
• Data utilization ratio
• Error reduction rate
The target error rate is defined as the error that Decorate can achieve on the dataset.
The data utilization ratio is the number of training examples required to achieve the target error, divided by the number of examples required by Decorate.
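A rough sketch of the data utilization ratio, assuming each method's learning curve is available as (number of training examples, error) pairs and that the numerator refers to the active learner being evaluated. The function names and the curve values are invented purely for illustration and are not results from the paper.

```python
from typing import List, Optional, Tuple

Curve = List[Tuple[int, float]]  # (number of training examples, error rate)


def examples_to_reach(curve: Curve, target_error: float) -> Optional[int]:
    """Smallest training-set size on the curve whose error is at or below the target."""
    for n, err in sorted(curve):
        if err <= target_error:
            return n
    return None  # target never reached on this curve


def data_utilization_ratio(
    active_curve: Curve, decorate_curve: Curve, target_error: float
) -> Optional[float]:
    """Examples needed by the active learner divided by examples needed by Decorate."""
    n_active = examples_to_reach(active_curve, target_error)
    n_decorate = examples_to_reach(decorate_curve, target_error)
    if n_active is None or n_decorate is None:
        return None
    return n_active / n_decorate


# Made-up learning curves, for illustration only.
active = [(10, 0.30), (20, 0.20), (30, 0.15)]
decorate = [(10, 0.35), (20, 0.28), (40, 0.20), (60, 0.15)]
ratio = data_utilization_ratio(active, decorate, target_error=0.20)
```

A ratio below 1 means the active learner reached the target error with fewer labeled examples than Decorate.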