A second whitepaper has been published in the series "Usable Customer Intelligence from Social Media Data". The subtitle of this second whitepaper is "Clustering the Social Community". This whitepaper takes a further step in analyzing and positioning the users of a social media forum.
It is based on the data created in the previous whitepaper "Usable Customer Intelligence from Social Media Data: Network Analytics meets Text Mining". There, text mining and network analytics techniques were combined to extend the feature set describing each forum user.

This feature set was then the basis for this second analysis. We clustered the data using the k-Means algorithm and identified a few clusters of neutral and inactive users, one cluster of very active and enthusiastic users (your superfans), a few smaller clusters of still active and still quite enthusiastic fans, and finally some clusters of negative fans with various degrees of activity.

A few interesting conclusions emerged from this study, such as how to proceed with users from the different clusters, and how the concepts of leader and follower relate to the concept of general activity.

The whitepaper was developed with the KNIME Team and, as with the previous one, the PDF file and the KNIME workflows can be downloaded from the KNIME whitepaper site.
 
 
Has it ever happened to you that you imported a KNIME workflow without a problem and then spent the next two hours looking for the workflow input data? Well, it has happened to me many times. So, at some point, I asked whether there is a trick to export a workflow WITH the input data and WITHOUT all the other intermediate data. And of course there is ...

First of all, you need a reader node in your workflow, like a File Reader or a Table Reader. The node has to be already configured with a valid filename for the whole process to work.

After you have created and configured the reader node in the workflow, go to the file system and open the corresponding folder. The folder for the node should have a path like <WORKSPACE_FOLDER>/<WORKFLOW_GROUP>/<WORKFLOW>/File Reader (or Table Reader).

In the node folder create a new folder, name it "drop", and fill it with the file to be read. This triggers the creation of a flow variable containing the filepath at the next node reset.
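The file system step above can also be scripted. The following is only a minimal sketch in Python, not part of KNIME itself: the function name `stage_input_in_drop_folder` and the paths in the commented usage are hypothetical, and you have to substitute the actual node folder from your own workspace.

```python
import shutil
from pathlib import Path


def stage_input_in_drop_folder(node_folder: Path, input_file: Path) -> Path:
    """Create <node_folder>/drop and copy input_file into it.

    Returns the path of the copied file. The node folder must already
    exist, i.e. the reader node has been created and configured.
    """
    if not node_folder.is_dir():
        raise FileNotFoundError(f"Node folder not found: {node_folder}")
    drop_folder = node_folder / "drop"
    drop_folder.mkdir(exist_ok=True)  # no error if "drop" is already there
    return Path(shutil.copy2(input_file, drop_folder))


# Hypothetical usage; adapt both paths to your own workspace:
# stage_input_in_drop_folder(
#     Path("my_workspace/my_group/my_workflow/File Reader"),
#     Path("data/input.csv"),
# )
```

The workflow still has to be reset in KNIME afterwards, as described in the next step, so that the flow variable for the drop folder gets created.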

Go back into the workflow editor of KNIME and reset the workflow.

Reopen the node for configuration, select the "Flow Variables" tab, open the parameter DataURL for the File Reader node, and select the new flow variable "knime.node(drop)". This flow variable contains the path to the file in the local drop folder. At this point the reader node is configured to automatically read the file in the drop folder. If you go back to the "Settings" tab you will see the message "The DataURL parameter is controlled by a variable".

Now reset the workflow if you have executed it and export it with the option "Exclude data from export" disabled.
To export a KNIME workflow, right-click the workflow in the Workflow Projects panel and select "Export KNIME workflow".

If you now reimport the workflow that you have just exported, you should get the same workflow, with the drop folder in the reader node's folder, with the input file in the drop folder, and with the reader node automatically configured to read the data file in the drop folder.

I found this trick extremely useful for exporting data and workflows at the same time without losing any configuration settings, especially when those workflows were meant to be used by others.
 
 
There are many different ways in KNIME to create a flow variable:

- as a global flow variable via the "Workflow Variable Administration" window;
- as a local flow variable via:
    - the "Quickform" node
    - the "Java Edit Variable" node
    - the "TableRow to Variable" node
    - the flow variable button in the configuration window
    - the unlabelled box in the "Flow Variables" tab in the configuration window

This last way of creating a flow variable on some branch in the middle of a workflow is the least documented one.

Let's look at an example. A Row Filter node extracts all products with a given name, like "brand 1". After that, a second Row Filter node extracts all products with a similar product name, like "brand 1 discounted".

If the name of the first product never changes, we can use hardcoded settings in the first Row Filter node. We can then transfer the product name into a flow variable, append "discounted" to it and reuse it in the configuration settings of the second Row Filter node.
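To make the logic of this example concrete, here is a rough sketch in plain Python. It only mimics what the two Row Filter nodes and the flow variable do; it is not KNIME code. The helper `filter_by_name` and the small product table are made up for illustration.

```python
def filter_by_name(rows, name):
    """Keep only the rows whose product name matches exactly (like a Row Filter)."""
    return [row for row in rows if row["product"] == name]


# Made-up sample data, only for illustration
products = [
    {"product": "brand 1", "price": 10.0},
    {"product": "brand 1 discounted", "price": 8.0},
    {"product": "brand 2", "price": 12.0},
]

# Hardcoded setting of the first Row Filter, exposed as a flow variable
product_name = "brand 1"
# New value derived from the flow variable by appending "discounted"
discounted_name = product_name + " discounted"

first_filter_output = filter_by_name(products, product_name)
second_filter_output = filter_by_name(products, discounted_name)
```

The point is that the product name is written down only once: the second filter follows automatically if the first setting ever changes.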

In the "Flow Variables" tab of the configuration window of each node there is a box without a label next to each setting. This box can be used to create a new flow variable with a specific name and to fill it with the value of the configuration setting it refers to. By typing a new name into the box, you create a new flow variable with that name and with that setting's value (see figure below).

The function of this unlabelled box is equivalent to the function of the "create variable" flag in the flow variable button where this is available.
Let's see another example where this feature might turn out to be useful.
One of the most common tasks for a workflow is to read a file with the "File Reader" node, do some processing on the data, and write the processed data to a file with a slightly different name.

If the file name of the input file does not change, we can use a "File Reader" node with hardcoded settings in the configuration window. We could then move that filename into a flow variable, change it, and reuse it to write the final data. The filename setting from the File Reader node can be transferred into a flow variable by using the unlabeled box of the "Flow Variables" tab of its configuration window.
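The "change it" step is plain string manipulation. Inside KNIME you might do it with a node such as "Java Edit Variable"; the Python sketch below only illustrates the idea outside of KNIME. The function name `derive_output_filename`, the default suffix `_processed`, and the example path are assumptions chosen for illustration, and the sketch assumes a plain file path with forward slashes rather than a URL.

```python
from pathlib import PurePosixPath


def derive_output_filename(input_filename: str, suffix: str = "_processed") -> str:
    """Insert a suffix between the file name and its extension.

    The folder and the extension are kept as they are.
    """
    path = PurePosixPath(input_filename)
    return str(path.with_name(path.stem + suffix + path.suffix))


# Value taken from the reader's filename setting (hypothetical path)
input_filename = "/data/sales.csv"
output_filename = derive_output_filename(input_filename)
print(output_filename)  # prints /data/sales_processed.csv
```

The derived name can then be used as the filename setting of the writer node, so only the reader node carries a hardcoded value.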

I hope this post helps clarify the usage of the mysterious unlabelled box in the "Flow Variables" tab of each node.
 
 
Remember? In a previous post on May 21, I posted the link to the new KDNuggets poll about the most used data analysis tool. I monitored the votes for KNIME more or less every day. Below are the results starting from May 23, 2012:
Most KNIME users had voted by May 26. A few KNIME users were still voting until the end of May. All in all I observed 259 votes for KNIME.
However, in the final results, the accepted votes for KNIME by KDNuggets were only 174 (http://www.kdnuggets.com/polls/2012/analytics-data-mining-big-data-software.html).
I cannot really explain this drop in the accepted votes, but here it is.
According to KDNuggets, the KNIME users for 2011 were 174.
The KNIME users who voted in the KDNuggets poll 2012 and that I monitored reached a peak of 259.
Considering that not all KNIME users voted in the KDNuggets poll 2012, these numbers suggest a very interesting growth in the adoption of the KNIME platform for data analysis.
 
 
I found this very interesting example of integrating R and KNIME for social network analysis. I hope you will enjoy it as much as I did.
http://estanislao.com.br/blog/social-network-analysis-in-knime-for-r-users/