DMR - Data Mining and Reporting
KDNuggets has started the 2012 poll about the most used data analytics tools.
I recently discovered how to extract cluster information from a k-Means model in KNIME.
The k-Means model is output by a k-Means node as a PMML model. So, first of all we convert the PMML model into an XML table cell with the "PMML To Cell" node. After that we start with the XML parsing of the cell content.
If you write the XML content of the cell into a file with the "XML Writer" node, you can see that the root is <PMML>, followed by <ClusteringModel>, and then by <Cluster>. A <Cluster> contains all info for a particular cluster of the k-Means model.
The "XPath" node applies the XPath query "/dns:PMML/dns:ClusteringModel/dns:Cluster" and extracts all <Cluster> elements into one cell.
The "Ungroup" node assigns each <Cluster> to one data row.
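For readers who want to check the same query outside KNIME, here is a minimal Python sketch of the "one <Cluster> per row" step. It is not what the KNIME nodes do internally. The sample PMML, the cluster names, and the function name extract_clusters are made up for illustration, and the namespace URI assumes a PMML 4.1 export, so adjust it to match the xmlns attribute of your own file.

```python
import xml.etree.ElementTree as ET

# Assumption: the model was exported as PMML 4.1. Check the xmlns attribute
# on the <PMML> root of your own file and change this URI if it differs.
NS = {"dns": "http://www.dmg.org/PMML-4_1"}

# Hypothetical, heavily trimmed k-Means model used only for this example.
SAMPLE_PMML = """<PMML xmlns="http://www.dmg.org/PMML-4_1" version="4.1">
  <ClusteringModel modelName="k-means" functionName="clustering"
                   modelClass="centerBased" numberOfClusters="2">
    <Cluster name="cluster_0"><Array n="2" type="real">1.5 2.5</Array></Cluster>
    <Cluster name="cluster_1"><Array n="2" type="real">7.0 8.0</Array></Cluster>
  </ClusteringModel>
</PMML>"""


def extract_clusters(pmml_text):
    """Return the list of <Cluster> elements, one per cluster of the model."""
    root = ET.fromstring(pmml_text)  # root is the <PMML> element
    # The path is relative to the root, so it starts at ClusteringModel.
    return root.findall("dns:ClusteringModel/dns:Cluster", NS)


if __name__ == "__main__":
    for cluster in extract_clusters(SAMPLE_PMML):
        print(cluster.get("name"))
```

Each element of the returned list plays the role of one data row after the "Ungroup" node.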
Finally, a number of "XPath" nodes extract further information about each cluster. For example, an "XPath" node with the query "/dns:Cluster/@name" extracts the cluster name, and an "XPath" node with the query "/dns:Cluster/dns:Array/text()" extracts the text content of the <Array> element inside <Cluster>, that is, the prototype values of the attributes used.
The attribute values are all concatenated together in one long, space-separated string. We need to use a "Cell Splitter" node to get the individual values.
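The name extraction and the splitting step can be sketched in Python as well. Again, this is only an illustration under stated assumptions: the <Cluster> snippet, its values, and the function name cluster_prototype are hypothetical, and the namespace URI assumes PMML 4.1.

```python
import xml.etree.ElementTree as ET

# Assumption: PMML 4.1 namespace; adjust to the xmlns of your own model.
NS = {"dns": "http://www.dmg.org/PMML-4_1"}

# Hypothetical single <Cluster> element, as it would look after ungrouping.
CLUSTER_XML = (
    '<Cluster xmlns="http://www.dmg.org/PMML-4_1" name="cluster_0">'
    '<Array n="3" type="real">5.006 3.418 1.464</Array>'
    "</Cluster>"
)


def cluster_prototype(cluster_xml):
    """Return (cluster name, list of prototype values) for one <Cluster>."""
    cluster = ET.fromstring(cluster_xml)
    # Counterpart of the query /dns:Cluster/@name
    name = cluster.get("name")
    # Counterpart of the query /dns:Cluster/dns:Array/text()
    array_text = cluster.findtext("dns:Array", default="", namespaces=NS)
    # Counterpart of the "Cell Splitter" step: split on whitespace.
    values = [float(v) for v in array_text.split()]
    return name, values


if __name__ == "__main__":
    print(cluster_prototype(CLUSTER_XML))
```

Splitting on whitespace works here because the prototype values in the <Array> element are separated by spaces.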
Below is the sub-workflow used to extract the k-Means prototype information from a PMML model produced by a k-Means node.
Rosaria Silipo has been mining data since her master's degree and kept mining data throughout her doctoral program, her postdoc, and most of her subsequent positions.