Boosting is another committee-based approach. It relies on weights in both phases: learning and prediction.
During the learning phase, the boosting procedure runs a learning algorithm a number of times, each time on a slightly different composition of the training set.
At each iteration, the boosting algorithm:
- Starts with the training set built in the previous iteration,
- Trains a new model,
- Evaluates the model error on the training patterns,
- Calculates the model weight based on that error,
- Finally, builds a new training set by over-sampling the incorrectly classified and under-sampling the correctly classified training patterns. The over-/under-sampling factor derives from the model weight (see the sketch after this description).
The training set for the first iteration is the training set provided for the whole learning procedure.
The algorithm stops when the maximum number of iterations has been reached or when the model error becomes too big (that is, when the model weight gets too close to 0 and the corresponding model is therefore ineffective).
The output of the learning phase is thus a number of models, less than or equal to the selected maximum number of iterations.
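To make the loop concrete, here is a minimal Python sketch of an AdaBoost-style learning phase based on resampling. The function names and the `base_learner` interface are illustrative assumptions for this sketch, not KNIME's implementation.

```python
import math
import random

def boosting_learner(patterns, labels, base_learner, max_iterations=10):
    """AdaBoost-style learning loop with resampling (illustrative sketch).

    `base_learner(X, y)` is assumed to return a fitted model exposing a
    .predict(list_of_patterns) method; these names are not KNIME's API.
    """
    models, weights = [], []
    X, y = list(patterns), list(labels)  # iteration 1 uses the original training set
    for _ in range(max_iterations):
        model = base_learner(X, y)                      # train a new model
        wrong = [model.predict([p])[0] != t for p, t in zip(X, y)]
        error = sum(wrong) / len(X)                     # model error on the training patterns
        if error >= 0.5:                                # weight would be <= 0: model ineffective
            break
        error = max(error, 1e-10)                       # guard against a perfect model
        alpha = 0.5 * math.log((1 - error) / error)     # model weight derived from its error
        models.append(model)
        weights.append(alpha)
        # Over-sample misclassified / under-sample correctly classified
        # patterns; the resampling factor derives from the model weight.
        factors = [math.exp(alpha) if w else math.exp(-alpha) for w in wrong]
        idx = random.choices(range(len(X)), weights=factors, k=len(X))
        X, y = [X[i] for i in idx], [y[i] for i in idx]
    return models, weights
```

Note how a model error of 0.5 or more would yield a non-positive weight, which is why the loop stops at that point.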
Notice that boosting can be applied to any training algorithm; however, it is particularly helpful in the case of weak classifiers. On the other hand, boosting techniques are quite sensitive to noise and outliers, that is, they are prone to overfitting.
The prediction phase loops over all models available from the learning phase and provides a prediction based on the majority vote, weighted by the model weights, for classifiers and on a weighted average for regression techniques.
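Both combination rules fit in a few lines of Python. The sketch below assumes the same illustrative model interface as the learning sketch above.

```python
from collections import defaultdict

def predict_class(models, weights, pattern):
    # Classification: majority vote over all models, with each vote
    # weighted by the corresponding model weight.
    votes = defaultdict(float)
    for model, alpha in zip(models, weights):
        votes[model.predict([pattern])[0]] += alpha
    return max(votes, key=votes.get)

def predict_regression(models, weights, pattern):
    # Regression: weighted average of the individual model predictions.
    return sum(alpha * model.predict([pattern])[0]
               for model, alpha in zip(models, weights)) / sum(weights)
```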
KNIME implements AdaBoost, one of the most commonly used boosting algorithms, with two meta-nodes in the “Mining -> Ensemble Learning” category: the “Boosting Learner” and the “Boosting Predictor” meta-nodes.
The “Boosting Learner” meta-node (see figure below) implements the learning loop via the “Boosting Learner Loop Start” node and the “Boosting Learner Loop End” node. At each iteration, the “Boosting Learner Loop End” node:
- Identifies the misclassified patterns,
- Calculates the model error,
- Calculates the model weight.
The “Boosting Learner Loop Start” node then uses the model weight and the misclassified patterns to alter the composition of the training set for the next iteration.
For each iteration, the boosting loop outputs the model, its error, and its weight.
The “Boosting Predictor” meta-node receives the list of models from the learner meta-node and the test set patterns. For each test pattern, it loops over all models and weighs their prediction results.
The “Boosting Predictor Loop Start” node starts the boosting predictor loop by identifying the weight column and the model column (see settings in its configuration window).
The “Boosting Predictor Loop End” node implements the majority vote on all model results and assigns the final value to the test pattern. Its configuration window requires the identification of the prediction column.
The loop body just includes the predictor of the mining model selected for the learning phase.
Below you can see our implementation of boosting in KNIME, with a decision tree on the cars-85.csv data set. The task here was to predict a car’s fuel system based on its number of doors, wheel base, and width, using a maximum of 5 models. Boosting produced 5 models (the maximum number allowed) and reached an accuracy of 0.714.
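For readers who prefer scripting, a rough analogue of this workflow can be sketched with scikit-learn’s AdaBoostClassifier. The file path, column names, and missing-value marker are assumptions about cars-85.csv, and the result will not exactly match the KNIME workflow.

```python
# Rough scripting analogue of the KNIME boosting workflow above.
import pandas as pd
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Assumed column names and "?" as the missing-value marker.
data = pd.read_csv("cars-85.csv", na_values="?").dropna(
    subset=["num-of-doors", "wheel-base", "width", "fuel-system"])
X = pd.get_dummies(data[["num-of-doors", "wheel-base", "width"]])
y = data["fuel-system"]
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=1)

# At most 5 models, mirroring the maximum number of iterations above.
# For scikit-learn < 1.2, use base_estimator= instead of estimator=.
clf = AdaBoostClassifier(estimator=DecisionTreeClassifier(max_depth=3),
                         n_estimators=5)
clf.fit(X_train, y_train)
print("accuracy:", clf.score(X_test, y_test))
```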