I have worked on many different data analysis projects over the past years. Sometimes I was an expert on the subject; sometimes I was just lending my data experience to topics I knew very little about. At the end of the day, it is all data!

While most data investigations share the same techniques across different disciplines, one kind of project kept coming back to me over the years: customer intelligence. Customer intelligence is a collection of data analysis techniques aimed at gaining insight into the customer experience. It produces a single financially accountable view of all customer-related information.

A few customer intelligence applications require cutting-edge technology and cannot yet exploit structured and consolidated techniques. For example, the analysis of social networks leverages the most modern algorithms for text mining and network analytics. However, most customer intelligence solutions have been around for quite some time and can take advantage of years of accumulated experience on the subject. For example, customer segmentation and churn prediction are by now mature areas of data analytics and can rely on traditional classification, clustering, and/or predictive techniques.

Customer Segmentation.
Customer segmentation groups customers together and therefore makes it possible to treat different groups of customers differently. There are two main strategies for grouping customers, depending on the data at hand and/or on the goal of the analysis.

In one case, we might want to isolate customers according to a specific target feature. This is really a customer classification task. Customer classification infers the value of the target feature from all other descriptive features. That is, it allows for the creation of different groups of customers producing similar values of the target feature.

In a previous project, the goal was to get to know the groups of customers producing different revenues for the company. That is, we wanted to better understand which kinds of customers were most and least valuable for the company's revenue, in terms of demographics, spending habits, budget, and likelihood to buy other products.

The analysis separated very high value customers from just high value customers. Each group has the following features.

Very high value customers are curious to know (and probably buy) new products, making them interesting targets for further promotions. On the negative side, the same very high value customers are not necessarily the most loyal ones. Indeed, they are as curious about new products as about new companies and might easily switch to the competition.

By contrast, high value customers consist mainly of old faithful customers, who have been buying the same product for years but have never been tempted to buy something else, for whatever reason (reduced budget or reduced curiosity). Such customers, though still very valuable for the company, have been excluded from the promotions of new products.

The second strategy is to group customers together blindly, that is, without privileging any particular feature in their descriptive pattern. This is called customer clustering, and it assigns similar customers to the same cluster. The similarity across customers is calculated on all descriptive features, without trying to identify any pre-defined group of customers.
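To make the idea of blind grouping concrete, here is a minimal sketch in plain Python. It uses k-means as one common clustering choice; this is an assumption for illustration, not necessarily the algorithm used in the project described here. The customer values, the two features (revenue and years as a customer, both scaled to the 0-1 range), and the starting centroids are all hypothetical.

```python
import math

def kmeans(points, centroids, max_iter=100):
    """Plain k-means on numeric feature vectors, starting from given centroids."""
    centroids = [list(c) for c in centroids]
    assignment = None
    for _ in range(max_iter):
        # Assignment step: each customer goes to the nearest centroid,
        # with distance computed on all descriptive features.
        new_assignment = [
            min(range(len(centroids)), key=lambda k: math.dist(p, centroids[k]))
            for p in points
        ]
        if new_assignment == assignment:
            break  # nothing moved, so the clusters are stable
        assignment = new_assignment
        # Update step: each centroid becomes the mean of its members.
        for k in range(len(centroids)):
            members = [p for p, a in zip(points, assignment) if a == k]
            if members:  # an empty cluster keeps its old centroid
                centroids[k] = [sum(col) / len(members) for col in zip(*members)]
    return assignment, centroids

# Hypothetical customers: (scaled revenue, scaled years as a customer).
customers = [
    (0.9, 0.2), (0.8, 0.3),  # high revenue, fairly recent
    (0.5, 0.9), (0.6, 0.8),  # moderate revenue, long-standing
    (0.0, 0.9), (0.0, 0.8),  # zero revenue, long-standing
]

# Hypothetical starting centroids: one customer picked from each region.
labels, centroids = kmeans(customers, [customers[0], customers[2], customers[4]])
print(labels)  # [0, 0, 1, 1, 2, 2]
```

Note that no target feature is singled out: revenue is just one column among the others, and the groups emerge only from the overall similarity between customers. In practice the features should be normalized first, as they are assumed to be here, so that no single feature dominates the distance.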

This is a classic customer segmentation strategy when nothing is known about the customers or when we want to discover something new about the customer base without preconceptions. In the same project, this kind of analysis led to the identification of a non-negligible group of customers that we called the zero-value faithful customers.

The zero-value faithful customers had remained with the company for many years while producing zero revenue. These customers took advantage of a number of the company’s free promotions without ever turning into profitable customers. Needless to say, such customers have been removed from future promotions, saving the company a considerable amount of money.

This is just a small taste of what customer segmentation analysis can discover in your customer data. Depending on how rich your data is, customer segmentation can enrich your knowledge about the customer base and consequently your revenue. In fact, the selection of the right customers for new promotions, the expansion of the market base, the retention of the high-value old faithful customers, as well as the exclusion of the zero-value old faithful customers are all keys to higher profits and can make you more competitive in today’s market.

In addition, customer segmentation, like other data analytics procedures, benefits from years of experience, which makes it solidly grounded, easy to run routinely, and inexpensive. The availability of such standard and consolidated procedures means that customer intelligence is no longer an expensive toy for large companies only, but is now affordable for small businesses as well. Do not let your competition know more about your customers than you do!

Contact us at www.dataminingreporting.weebly.com/contact_us for a quick offer on a customer segmentation application or on any other data analytics procedure on your customer data!


 
 
A second whitepaper has been published in the series "Usable Customer Intelligence from Social Media Data". The subtitle of this second whitepaper is "Clustering the Social Community". This whitepaper takes a further step in analyzing and positioning the users of a social media forum.
It is based on the data created in the previous whitepaper "Usable Customer Intelligence from Social Media Data: Network Analytics meets Text Mining". There, text mining and network analytics techniques were combined to extend the feature set describing each forum user.

This feature set was then the basis for this second analysis. Here we clustered the data using the k-Means algorithm and identified a few clusters of neutral and inactive users, one cluster of very active and enthusiastic users (your superfans), a few smaller clusters of still active and still quite enthusiastic fans, and finally some clusters of negative fans with various degrees of activity.

A few interesting conclusions emerged from this study, such as how to proceed with users from the different clusters, or how the concepts of leader and follower relate to the concept of general activity.

The whitepaper was developed with the KNIME Team and, as with the previous one, the PDF file and the KNIME workflows are downloadable from the KNIME whitepaper site.
 
 
I recently discovered how to extract cluster information from a k-Means model in KNIME.

The k-Means model is output by a k-Means node as a PMML model. So, first of all we convert the PMML model into an XML table cell with the "PMML To Cell" node. After that we start with the XML parsing of the cell content.

If you write the XML content of the cell into a file with the "XML Writer" node, you can see that the root is <PMML>, followed by <ClusteringModel>, and then by <Cluster>. A <Cluster> contains all info for a particular cluster of the k-Means model.

The "XPath" node implements the XPath query "/dns:PMML/dns:ClusteringModel/dns:Cluster" and extracts all <Cluster> elements into one cell.
The "Ungroup" node assigns each <Cluster> to one data row.
Finally, a number of "XPath" nodes extract further information about each cluster. For example, an "XPath" node with XPath query "/dns:Cluster/@name" extracts the cluster name, and an "XPath" node with XPath query "/dns:Cluster/dns:Array/text()" extracts the text content of the <Array> inside <Cluster>, that is, the prototype values of the attributes used.

The attribute values are all concatenated together in a long string. We need to use a "Cell Splitter" node to get the single values.
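For readers who want to see the same extraction outside KNIME, here is a rough Python equivalent of the steps above, using only the standard library. It is a sketch under assumptions: the PMML snippet is a hypothetical, heavily trimmed stand-in for a real k-Means model, the cluster names and values are made up, and the namespace URI depends on the PMML version your model was written with. The "dns" prefix simply mirrors the prefix used in the XPath queries above.

```python
import xml.etree.ElementTree as ET

# Hypothetical, trimmed-down PMML clustering model (a real one also
# contains a header, a data dictionary, clustering fields, and so on).
PMML_TEXT = """
<PMML version="4.2" xmlns="http://www.dmg.org/PMML-4_2">
  <ClusteringModel modelName="k-means" functionName="clustering"
                   modelClass="centerBased" numberOfClusters="2">
    <Cluster name="cluster_0"><Array n="2" type="real">0.85 0.25</Array></Cluster>
    <Cluster name="cluster_1"><Array n="2" type="real">0.0 0.85</Array></Cluster>
  </ClusteringModel>
</PMML>
"""

# Assumed namespace; adjust the URI to the PMML version of your file.
NS = {"dns": "http://www.dmg.org/PMML-4_2"}

def extract_prototypes(pmml_text):
    """Return {cluster name: list of prototype values} from a PMML string."""
    root = ET.fromstring(pmml_text.strip())  # the root element is <PMML>
    prototypes = {}
    # One <Cluster> element per cluster (the "XPath" + "Ungroup" steps).
    for cluster in root.findall("dns:ClusteringModel/dns:Cluster", NS):
        name = cluster.get("name")  # same as the @name query
        # Text content of <Array>: all values in one long string.
        array_text = cluster.findtext("dns:Array", default="", namespaces=NS)
        # Split the string into single values (the "Cell Splitter" step).
        prototypes[name] = [float(v) for v in array_text.split()]
    return prototypes

prototypes = extract_prototypes(PMML_TEXT)
print(prototypes)
```

Splitting on whitespace works here because the prototype values in the array are space-separated numbers; string-typed arrays with quoted entries would need more careful parsing.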

Below is the sub-workflow used to extract the information about the k-Means prototypes from a PMML model produced by a k-Means node.