KNIME 2.5 introduces a new node for string manipulation: the "String Manipulation" node.
The advantage of the "String Manipulation" node is that it can do almost everything we need when dealing with strings: calculate the length of a string, compare two strings, convert a string to uppercase or lowercase, replace a substring or all occurrences of a character inside a string, capitalize the words in a string, find the position of a character or substring, extract a substring, and so on.
The configuration window of the “String Manipulation” node has an “Expression Editor” at the bottom center. Here a number of string functions can be combined to obtain the desired string transformation.
The available string functions are listed above in the “Function List” panel. The “Description” panel on the right explains the task of the selected function.
Functions can also be displayed in smaller groups by selecting a category in the “Category List” menu above the “Function List” panel.
On the left, in the “Column List” panel, all available input data columns are displayed.
Double-clicking a column or a function automatically inserts it into the “Expression Editor”. String values must be enclosed in quotation marks, for example "abc", when entered in the “Expression Editor”.
The “Insert Missing As Null” flag makes the node produce a null string, instead of an empty data cell, when the input data contains missing values.
Finally, the configuration window requires the name of the output column: either a new column to append or an existing column to overwrite with the resulting string.
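To make the kinds of operations listed above more concrete, here is a minimal Python sketch that applies them to a single string. It uses only Python's built-in string methods, not the function names of the “String Manipulation” node, and the sample value and the helper name `describe` are made up for illustration.

```python
# Plain-Python analogues of the string operations listed above.
# These are NOT KNIME function names; the sample value is hypothetical.

def describe(text: str, other: str) -> dict:
    """Apply several common string operations to `text`."""
    return {
        "length": len(text),                 # string length
        "equal_to_other": text == other,     # compare two strings
        "upper": text.upper(),               # all uppercase
        "lower": text.lower(),               # all lowercase
        "replaced": text.replace("a", "o"),  # replace every "a" with "o"
        "capitalized": text.title(),         # capitalize each word
        "position": text.find("data"),       # first occurrence, -1 if absent
        "substring": text[0:4],              # first four characters
    }


result = describe("knime data table", "KNIME data table")

# Operations can be chained, much like functions combined in one expression:
# take the first word, then capitalize it.
combined = "knime data table"[:5].capitalize()
```

Chaining several calls in one expression mirrors the way functions are nested inside the “Expression Editor” to build a single transformation.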
In KNIME two different entities describe the data: the data values themselves and the data domain. The data domain contains the set of unique data values. Some nodes work on the data values directly, some work on the data domain only, some read from the data domain and write into the data values, and some do the opposite.
Recently I worked with the "String Replacer" node. The "String Replacer" node replaces a pattern occurring in a string, or the whole string, with a new string. I had a data set with contractID values such as A123, B123, C123, and so on, and I wanted to reduce all of them to just 123. So I applied a "String Replacer" node. However, the "String Replacer" node acts on the data values and does not update the data domain. Next I used a "Row Filter" node, which correctly works on the data values. The list of values offered in its pattern menu, however, originates from the data domain, so it still showed the old values. I also applied a "Pivoting" node. The "Pivoting" node grouped the contractID values according to both the data domain and the data values. In the end I had both the old values (A123, B123, C123, ...) and the new value (123) as pivot columns.
In general, a discrepancy can arise between the data values and the data domain. If this is not the desired behaviour, the two can be synchronized with a "Domain Calculator" node. The "Domain Calculator" node recalculates and overwrites the data domain of the selected column based on the current column values. I applied the "Domain Calculator" node to the contractID column at the output port of the "String Replacer" node. After the "Domain Calculator" node had run, the data domain and the data values for contractID agreed again.
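The discrepancy and its fix can be illustrated with a small Python sketch. This is only a toy model under simplifying assumptions, not how KNIME stores tables internally: the class `Column` and the functions `replace_prefix` and `recalculate_domain` are hypothetical names invented for this example.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Column:
    """Toy model: cell values plus a separately stored set of unique values."""
    values: list
    domain: set = field(default_factory=set)

    def recalculate_domain(self) -> None:
        # Rebuild the domain from the current values, which is the role
        # the text assigns to the "Domain Calculator" node.
        self.domain = set(self.values)


def replace_prefix(column: Column) -> None:
    # Strip one leading uppercase letter from each value. The domain is
    # deliberately left untouched, mimicking a step that edits values only.
    column.values = [re.sub(r"^[A-Z]", "", v) for v in column.values]


contract_id = Column(values=["A123", "B123", "C123"])
contract_id.recalculate_domain()        # values and domain start in sync

replace_prefix(contract_id)
stale_domain = set(contract_id.domain)  # still holds the old values

contract_id.recalculate_domain()
fresh_domain = set(contract_id.domain)  # now matches the current values
```

In this model, anything that reads `domain` between the replacement and the recalculation sees A123, B123, and C123, even though every cell already contains 123.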
From time to time it can be useful to insert a "Domain Calculator" node to avoid unwanted mismatches between the data domain and the data values.