Cross-tab tables group data rows by the distinct values of 2 or more columns and show the aggregated values of one additional column. The total values of each row and column as well as the overall total are usually shown. Here is an example, where the data rows occurrences are reported grouped by age and sex for a hypothetical data set. In a pivot logic, “sex” is the pivot column and “age” is the group column.
The new “Pivoting” node in KNIME 2.4
KNIME 2.4 has a completely new “Pivoting” node that allows building a cross-tab table without any effort.
The new “Pivoting” node takes one input (= the data set) and produces 3 output data tables:
- The pivot table itself, which is the same table produced by the old “Pivoting” node in the KNIME versions prior to 2.4
- The group totals table, that is the row totals
- The pivot totals table, that is the column totals
Let’s take the adult data set from the UCI Repository (http://archive.ics.uci.edu/ml/datasets/Adult) and let’s build a crosstab table using the “workclass” as group column and the “income” as pivot column and let’s count the data rows as aggregation variable.
After a “File Reader” node, we introduce a “Pivoting” node.
The configuration window
The configuration window of the “Pivoting” node consists of 3 tabs.
- Tab “Groups” contains the selection framework for the group column(s)
- Tab “Pivots” contains the selection framework for pivots column(s) plus a few crosstab options, like “Ignore missing values”, “Append overall totals”, and “Ignore domain”. “Append overall totals”, in particular, calculates the sum of the row/column totals.
- Tab “Options” selects the aggregation column(s) and the aggregation method for each aggregation column. It also contains a listbox to define the name of the resulting column(s).
Note. Multiple group and pivot columns are now possible. In this example, we build a cross tab only across “workclass” and “Income”. However, more sub-grouping and sub-pivoting columns can be added for a more complex cross-tab table.
- “workclass” in the “Groups” tab
- “Income” in the “Pivots” tab. As Andreas pointed out in one of the comments (thanks!) we also enabled the “Append overall totals” flag to have the overall totals in the "Pivots Totals" table.
- “age” in the “Options” tab with aggregation method = “Count”. Since we use “Count” any other column would work as aggregation column.
The crosstab table
These are the three output data tables generated by the “Pivoting” node after configuration and execution.
In order to create this cross-tab structure, we join the “Pivot” table and the “Group Totals” table on the “workclass” values. Then we concatenate the resulting table with the “Pivot Totals” table.
The final workflow and the final table
This is the final workflow.
Try it and let me know how it works for you!