The amount of data grows with every project, so much so that I sometimes have to wait a long time for a workflow to execute fully. In my last project, for example, I had to deal with 176 million rows!
It is therefore understandable that I want to use a smaller data set while building the workflow, and the real data set only in the final, full execution. This means implementing a switch, controlled by a flow variable, that makes the workflow run either on all the data or only on a subset of it, depending on the flow variable's value.
The "String Radio Buttons" Quickform
Creating a two-value flow variable is easy with the "String Radio Buttons" Quickform node. Here, the "String Radio Buttons" Quickform creates a flow variable named "port" with two possible values: "partial" or "full". If "partial" is selected, the workflow runs on a subset of the data; if "full" is selected, it runs on the full data set. The configuration window of the "String Radio Buttons" Quickform node is shown below.
The data consists of 6 files. Reading the full data set means looping over all 6 files and collecting the data read from each one. Reading a subset of the data means reading only one of the files.
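Outside KNIME, the full/partial logic just described can be sketched in plain Java. This is only an illustration of the idea, not how the KNIME nodes are implemented: the file names are hypothetical, and the file reader is passed in as a function so that the example runs without real files.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

// Hypothetical model of the two modes: "full" reads every file, "partial" reads only one.
class SubsetReader {

    // Select which files to read: all of them for "full", only the first one for "partial".
    static List<String> filesToRead(List<String> allFiles, String port) {
        if ("full".equals(port)) {
            return allFiles;
        }
        if ("partial".equals(port)) {
            return allFiles.isEmpty() ? List.of() : allFiles.subList(0, 1);
        }
        throw new IllegalArgumentException("Unexpected port value: " + port);
    }

    // Loop over the selected files and concatenate their rows,
    // mimicking the loop that collects the data read from each file.
    static List<String> read(List<String> allFiles, String port,
                             Function<String, List<String>> readFile) {
        List<String> rows = new ArrayList<>();
        for (String file : filesToRead(allFiles, port)) {
            rows.addAll(readFile.apply(file));
        }
        return rows;
    }

    public static void main(String[] args) {
        // Hypothetical names for the 6 input files.
        List<String> files = List.of("data1.csv", "data2.csv", "data3.csv",
                                     "data4.csv", "data5.csv", "data6.csv");
        // Fake reader: every file contributes two rows tagged with its name.
        Function<String, List<String>> fakeReader = f -> List.of(f + ":row1", f + ":row2");

        System.out.println(read(files, "full", fakeReader).size());    // prints 12
        System.out.println(read(files, "partial", fakeReader).size()); // prints 2
    }
}
```

Taking the first file for "partial" mirrors the choice made later in the workflow, where only the first path is passed on to the reader.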
The input comes from a "List Files" node, which produces the list of files available in a selected location on your machine (see picture below).
The selected location is set in the configuration window of the "List Files" node, and the node outputs a data table in which each row contains the path to one of the files in that location.
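As a rough plain-Java analogue of what the "List Files" node produces, the sketch below lists the regular files directly inside a folder using `java.nio.file.Files.list`. It is an assumption-laden illustration, not the node's implementation: it does not descend into subfolders, and the alphabetical sorting is a choice made here so that "the first file" is well defined.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class ListFilesSketch {

    // Returns the paths of the regular files directly inside `location`
    // (subfolders are skipped), sorted so that the first entry is deterministic.
    static List<Path> listFiles(Path location) throws IOException {
        // Files.list opens a directory stream, so close it with try-with-resources.
        try (Stream<Path> entries = Files.list(location)) {
            return entries
                    .filter(Files::isRegularFile)
                    .sorted()
                    .collect(Collectors.toList());
        }
    }

    public static void main(String[] args) throws IOException {
        // Folder to scan: first command-line argument, or the current directory.
        Path location = Path.of(args.length > 0 ? args[0] : ".");
        for (Path file : listFiles(location)) {
            System.out.println(file.toAbsolutePath()); // one "row" per file
        }
    }
}
```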
How to connect a File Reader to a switch node?
One branch of the switch loops over all the files; the other branch reads just one of them. The problem here is: how do I force a "File Reader" node to follow the enabling/disabling of the switch node?
The switch node outputs a data table, and all nodes connected to an output port are enabled or disabled together with that port. However, the "File Reader" node has no data input port, so it cannot be connected to the switch node.
The solution is to make the connection via a flow variable. The data table with all the paths, output by the switch node, is converted into a flow variable by a "TableRow To Variable" node. The resulting flow variable, which contains only the first path from the switch output table, is passed to the "File Reader" node and used to set its file path. This way, if the switch output port is disabled, the whole branch, including the "File Reader" node, is disabled too.
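The reasoning behind this trick can be modelled in a few lines of Java. In this hypothetical sketch, an inactive switch port is represented by `Optional.empty()`: the "TableRow To Variable" step keeps only the first path, and the reader, which has no data input, only ever sees that path variable, so an inactive branch simply produces nothing. The path names are made up, and treating an empty table like an inactive port is a simplification of this sketch, not documented KNIME behaviour.

```java
import java.util.List;
import java.util.Optional;

class FirstPathToVariable {

    // Switch output: Optional.empty() stands for an inactive port.
    // Like "TableRow To Variable", only the first row of the table is used.
    // (An empty table also yields no path here; that is a choice of this sketch.)
    static Optional<String> firstPath(Optional<List<String>> switchOutput) {
        return switchOutput.flatMap(rows -> rows.stream().findFirst());
    }

    // The reader has no data input; it only receives the path variable.
    // No variable means the branch is inactive and nothing is read.
    static Optional<String> readFile(Optional<String> pathVariable) {
        return pathVariable.map(path -> "contents of " + path);
    }

    public static void main(String[] args) {
        Optional<List<String>> active =
                Optional.of(List.of("/data/file1.csv", "/data/file2.csv"));
        Optional<List<String>> inactive = Optional.empty();

        System.out.println(readFile(firstPath(active)));   // Optional[contents of /data/file1.csv]
        System.out.println(readFile(firstPath(inactive))); // Optional.empty
    }
}
```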
Finishing the workflow
The switch block is then closed by an "End IF" node, which collects the resulting data.
A "Java Edit Variable" node transforms the initial "full"/"partial" selection into the value that controls which output port of the switch node is active.
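The mapping performed by the "Java Edit Variable" node can be sketched as a small Java method. This is plain standalone Java, not the node's own snippet syntax. It assumes that the switch node accepts "top" and "bottom" as values for its active port, and that the loop branch (all files) hangs off the top port; check both against your own switch node configuration.

```java
class PortChoiceMapper {

    // Assumed mapping: "full" activates the top port (loop over all files),
    // "partial" activates the bottom port (single-file branch).
    static String toPortChoice(String port) {
        switch (port) {
            case "full":
                return "top";
            case "partial":
                return "bottom";
            default:
                // Fail loudly on anything the radio buttons should never produce.
                throw new IllegalArgumentException(
                        "Expected \"full\" or \"partial\", got: " + port);
        }
    }

    public static void main(String[] args) {
        System.out.println(toPortChoice("full"));    // top
        System.out.println(toPortChoice("partial")); // bottom
    }
}
```

Throwing on an unexpected value, rather than silently defaulting to one branch, makes a misconfigured Quickform visible right away.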
The final workflow is shown in the figure below.
If we choose "full", all the data is read by looping over all available files.
If we choose "partial", only the content of the first file is read and imported into KNIME.