One example is bootstrap aggregation, or bagging for short: an ensemble meta-learning technique that trains many classifiers, each on a different bootstrap sample of the training data (a random sample drawn with replacement), and uses the majority vote across the predictions of all those classifiers to select the final prediction for a test pattern (see http://www.dataminingreporting.com/blog/bagging for more details).
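To make the procedure concrete, here is a minimal from-scratch sketch of bagging in Python. It is illustrative only, assuming scikit-learn's DecisionTreeClassifier as the base learner and synthetic data; it is not how KNIME implements the technique.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

# Bagging sketch: train each classifier on a bootstrap sample (drawn with
# replacement) and combine the predictions by majority vote.
X, y = make_classification(n_samples=500, random_state=0)
rng = np.random.default_rng(0)

models = []
for _ in range(25):
    idx = rng.integers(0, len(X), size=len(X))   # bootstrap sample of rows
    models.append(DecisionTreeClassifier().fit(X[idx], y[idx]))

# Majority vote over all member predictions, one column per test pattern
votes = np.stack([m.predict(X) for m in models])
y_pred = np.apply_along_axis(lambda v: np.bincount(v).argmax(), 0, votes)
```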
Boosting is another committee-based ensemble method. It uses weights in both phases: learning and prediction. During the learning phase, the boosting procedure trains the learning algorithm a number of times, each time re-weighting the training instances according to the errors of the previous model. During the prediction phase, it produces the final prediction via a weighted majority vote for classification and a weighted average for regression (see http://www.dataminingreporting.com/blog/boosting for more details).
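The re-weighting idea is easiest to see in a simplified AdaBoost-style sketch. This is only an illustration, assuming binary labels and decision stumps as weak learners, not a production implementation:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

# Simplified AdaBoost sketch: each round refits a weak learner with instance
# weights that emphasize the rows the previous round misclassified; the final
# prediction is a weighted vote of all rounds.
X, y = make_classification(n_samples=500, random_state=0)
y_pm = np.where(y == 1, 1, -1)           # labels recoded to {-1, +1}
w = np.full(len(X), 1 / len(X))          # uniform initial instance weights

stumps, alphas = [], []
for _ in range(20):
    stump = DecisionTreeClassifier(max_depth=1).fit(X, y_pm, sample_weight=w)
    pred = stump.predict(X)
    err = w[pred != y_pm].sum()                        # weighted error
    alpha = 0.5 * np.log((1 - err) / max(err, 1e-10))  # learner weight
    w *= np.exp(-alpha * y_pm * pred)    # up-weight misclassified rows
    w /= w.sum()
    stumps.append(stump)
    alphas.append(alpha)

# Weighted vote: sign of the alpha-weighted sum of member predictions
y_pred = np.sign(sum(a * s.predict(X) for a, s in zip(alphas, stumps)))
```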
The most famous ensemble learning technique, however, is surely the random forest, or decision tree ensemble. A random forest trains a number of decision trees, each on a different subset of rows and/or columns, randomly selected at each iteration. The output model is then an ensemble of differently trained decision tree models, and a simple majority vote produces the final prediction (see http://www.dataminingreporting.com/blog/decision-tree-ensemble-decision-tree-forest for more details).
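For illustration, a random forest can be trained in a few lines with scikit-learn; the parameter names below are scikit-learn's, not KNIME's, and the data is synthetic:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Each tree sees a bootstrap sample of the rows and considers a random subset
# of the columns at every split; the forest predicts by majority vote.
X, y = make_classification(n_samples=500, n_features=20, random_state=0)
forest = RandomForestClassifier(
    n_estimators=100,      # number of trees in the ensemble
    max_features="sqrt",   # random column subset considered at each split
    bootstrap=True,        # random row subset (with replacement) per tree
    random_state=0,
).fit(X, y)
y_pred = forest.predict(X)  # majority vote across the 100 trees
```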
However, any combination of models and their predictions can be packaged as an ensemble model, and KNIME can actually do that packaging for you with the Table to PMML Ensemble node!
Let’s suppose you have trained an MLP neural network, a decision tree, and a Naïve Bayes model on your data to solve your problem, and that you would like to combine their predictions via a majority vote (sketched in code after the steps below). Well, that is actually very easy to do with KNIME.
1. You need to work with PMML models. Thus, a PMML to Cell node converts each model into a PMML-type cell in the output data table.
2. All PMML models are then concatenated into a single data table.
3. Finally, the Table to PMML Ensemble node converts the list of models into a single ensemble model, which queries all of them and selects the prediction according to the chosen output method. In this example, we used the majority vote as the selection criterion for the final prediction, but other output criteria are available, such as the prediction with the maximum value, the prediction of the first model in the list, or even all predictions from all models involved.
4. A JPMML Classifier node can then interpret the ensemble model and produce the correct prediction following the selected output strategy.
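Conceptually, the workflow above amounts to something like the following Python sketch, where three independently trained models vote on each row. The model choices mirror the example above; the code is only an analogy for what the KNIME ensemble nodes do, not their implementation:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.naive_bayes import GaussianNB
from sklearn.neural_network import MLPClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, random_state=0)

# Three independently trained models, as in the KNIME workflow above
models = [
    MLPClassifier(max_iter=1000, random_state=0).fit(X, y),
    DecisionTreeClassifier(random_state=0).fit(X, y),
    GaussianNB().fit(X, y),
]

# Majority vote across the three predictions for each row, conceptually
# analogous to choosing "majority vote" as the ensemble's output method
votes = np.stack([m.predict(X) for m in models])
y_pred = np.apply_along_axis(lambda v: np.bincount(v).argmax(), 0, votes)
```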
Steps 1-3 are summarized in the metanode reported in the figure below.