The question is not trivial, since KNIME does not allow interactions across rows but only across columns in a data set. It seems it will be possible in KNIME 2.1 version inside the Java Snippet node. However, I am still working with the 2.0 version and I had to implement a workflow that calculated YTD values. The final solution was actually not too complicated.
Let's suppose that we have a data set with a number of values on the x-axis and all the months of the current year on the y-axis. We do receive the data in this format and we need to produce the ouput also in this format with the year months on the y-axis. There are three main steps to calculate its YTD values.
- transpose the data set so to have the months as column headers
- calculate the YTD values across each row
- transpose the data back so to have the months as Row ID and the original and new YTD values as column headers
- All rows must contain ONLY DOUBLE values. If values are mixed in a row, for example double and string, a transpose operation on that row will produce a column with unknown type with a number of consequent problems on unknown type columns (for example: Java snippet node does not work on unknown type columns). This means that all no double value columns must be dropped from the data. This is what the first node, a Column Filter, does.
- Same problem for rows with all missing values. A transpose operation will produce a column with unkonwn type for that row. Here we can proceed with a "Missing Values" node, but if we do not want to alter the data set content is enough to insert a "temp" column with constant value 0.0 at the end of the data set. The temp values will force columns with all empty values to take the type DOUBLE. This is the task of the "Rule Engine" node. (default =0.0 as a String, column_name = temp).
- String to Number converts the values 0.0 in temp column from String into Double.
- We can finally perform the Transpose operation. Now all months are the column headers and all our original values are the RowID.
- Now we have the data in the desired form where an YTD operation across rows would be allowed. Let's remove all DOUBLE missing values in order to avoid undesired states of the following Java snippet nodes.
- We need now to filter out the row "temp". it is not necessary, but just to make the YTD operation cleaner. Column "temp" was created with column header "temp". This means that we have now a row with RowID "temp". Since there is no other way to identify this extraneous row, we need to import all Row IDs into a real column (let's call it "items") and from there filter the row where the value of "items " is "temp". This is the task of the RowID node and the Row Filter node.
- Finally, the Column Filter node removes this artificially created column "items".
Here we have 12 Java Snippet nodes, each one calculating the YTD values for each month of the year, as for example for the month of April:
double sum = 0.0;
sum += ($Jan$+$Feb$+$Mar$+$Apr$);
The new value is then written in column Apr_YTD and appended at the end of the data set.
Now we need to trasnpose the whole data set back, to have the months on the y-axis and the values (original + YTD values this time) on the X_axis.
- The first Column Filter gets rid of the original values. Keeps only the newly calculated YTD values.
- We perform a rename of the columns "<month>_YTD" into just the name of the "<month>", that is column "Apr_YTD" will be renamed just "Apr".
- We perform the transpose back operation
- We rename all values that are now the column headers from "<column_header>" to "<column_header>_YTD".
- The node RowID creates a copy column of the RowID with the name of the month.
- The rule engine is writing for each row the current year from a workflow variable.
- And finally the Row Filter selects the month of which YTD value we are interested in.