DMR - Data Mining and Reporting
This post is about a rather inconspicuous node which I find quite useful, so I thought it deserved a mention in this blog: the "One2Many" node.
Many analytics algorithms, for example distance-based algorithms, only accept numerical values. When a nominal data column is present, the analyst therefore needs a few tricks to include that column in the analysis. One trick is of course to assign numbers to (to encode) the different nominal values. Another trick is to create a binary matrix, with one column for each nominal value.
Let's look at an example of such a binary matrix, reading the adult.data file. This data set contains a string field, "native-country". We want to transform a data column with a few nominal values, like "United States", "France", "Canada", and so on, into a matrix whose column headers are the nominal values of the column and whose values are 0/1, depending on whether the record/person was born in that country or not (see table below).
That is exactly what the "One2Many" node does. It takes the string values in the nominal column and turns them into the headers of a matrix, then assigns a value of 0 or 1 to each new column for each record, depending on the original column value for that record, and finally appends the resulting matrix to the original data table.
Applying the "One2Many" node to the "native-country" column of the adult data set, we get the following data representation, where "United States" is translated into "1" under the column "United States" and "0" under all other native-country related columns.
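Outside KNIME, the same transformation (often called one-hot encoding) can be sketched in a few lines of plain Python. The column name "native-country" comes from the adult data set; the rest is illustrative:

```python
def one_to_many(rows, column):
    """One-hot encode a nominal column: one new 0/1 column per distinct
    value, appended to each record (a sketch of the "One2Many" idea)."""
    # the distinct nominal values become the new column headers
    values = sorted({row[column] for row in rows})
    for row in rows:
        for value in values:
            row[value] = 1 if row[column] == value else 0
    return values

rows = [{"native-country": "United States"},
        {"native-country": "France"}]
one_to_many(rows, "native-country")
# rows[0] now also carries {"United States": 1, "France": 0}
```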
The node "Many2One" runs the opposite transformation. Given a binary matrix, it condenses all 0/1 values into one string value in one nominal column.
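The reverse direction can be sketched the same way; the column names below are hypothetical:

```python
def many_to_one(rows, value_columns, new_column):
    """Collapse a set of 0/1 columns back into one nominal column
    (a sketch of the "Many2One" idea)."""
    for row in rows:
        # exactly one of the value columns is expected to hold a 1
        row[new_column] = next(c for c in value_columns if row[c] == 1)

rows = [{"United States": 1, "France": 0},
        {"United States": 0, "France": 1}]
many_to_one(rows, ["United States", "France"], "native-country")
# rows[0]["native-country"] == "United States"
```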
Recently I was involved in the organization of the first country-specific KNIME User Day: the KNIME User Day Italia in Milano on October 9th, 2012. The day was packed with technical presentations, KNIME updates, and networking breaks.
The first presentation, about what KNIME has to offer and what it can do for your company, both with the free desktop version and with the commercial products, captured the attention of the first-timers and other inexperienced KNIME users. The following talk listed the new features in the latest KNIME releases and was extremely well received by the more experienced part of the audience.
After this first short part dedicated to describing KNIME in its general and latest features, the community sessions began. The quality of the technical talks was remarkably high, higher than at many of the analytics conferences I have been to.
Alessandro Usseglio Viretta from InNumero opened the community session with a talk showing a simulation of the Fama and French approach to measure the amount of skill and luck in fund management. A Linear Regression node was used to estimate the alpha of the investment fund, and the bootstrapping technique was implemented by means of the Shuffle node.
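To give a rough idea of the technique (this is a minimal sketch, not the speaker's actual workflow), estimating alpha as the intercept of a linear regression and building a null distribution by shuffling the fund returns can be written in plain Python:

```python
import random

def regression_alpha(market, fund):
    """Estimate alpha as the intercept of the OLS regression
    fund = alpha + beta * market."""
    n = len(market)
    mx = sum(market) / n
    my = sum(fund) / n
    beta = (sum((x - mx) * (y - my) for x, y in zip(market, fund))
            / sum((x - mx) ** 2 for x in market))
    return my - beta * mx

def shuffled_alphas(market, fund, n_runs=1000, seed=42):
    """Null distribution of alpha: shuffling the fund returns destroys
    any real skill, so the remaining spread of alphas reflects luck."""
    rng = random.Random(seed)
    fund = list(fund)
    alphas = []
    for _ in range(n_runs):
        rng.shuffle(fund)
        alphas.append(regression_alpha(market, fund))
    return alphas
```

The fund's actual alpha can then be compared against the shuffled distribution to judge whether it is larger than luck alone would produce.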
Phil Winters then described a number of possible data mining applications in the field of customer intelligence, mainly concentrating on segmentation and the use of decision trees and k-means to select groups of customers.
The third talk, from Andrea Scarso at Moneyfarm, implemented a psychological evaluation of the customer in order to advise him/her on the most suitable investment option in terms of the customer’s evaluated risk attitude.
Alfredo Roccato then gave a talk about a classic churn retention application. It used a time series prediction technique to identify in advance which customers are at risk of abandoning the company.
Alessandro La Torraca from the University Milano-Bicocca presented a very interesting work: inferring the demographic features of an online-newspaper reader from his/her online behavior.
Two more chemistry-oriented talks brought us to the end of the day. Even though chemistry is not my specialty, I picked up a few interesting hints about how to use KNIME even from these last two talks.
In summary, the first KNIME User Day Italia was a huge success, thanks to the organizers, the highly qualified speakers, and the receptive and enthusiastic KNIME user community.
If by chance KNIME organizes a KNIME User Day workshop in your neighborhood, do not miss it! It is a fantastic opportunity for networking and knowledge sharing.
I have worked on many different data analysis projects over the past years. Sometimes I was an expert on the subject, sometimes I was just lending my data experience to topics I knew very little about. At the end of the day, it is all data!
While most data investigations share the same techniques across different disciplines, many similar projects on the same topic kept coming back to me over the years: customer intelligence projects. Customer intelligence is a collection of data analysis techniques with the goal of gaining insight into the customer experience. It produces a single, financially accountable view of all customer-related information.
A few customer intelligence applications require cutting-edge technology and cannot yet exploit structured and consolidated techniques. For example, the analysis of social networks leverages the most modern algorithms for text mining and network analytics. However, most customer intelligence solutions have been around for quite some time and can take advantage of years of previously existing experience on the subject. For example, customer segmentation and churn retention are by now mature areas of data analytics and can rely on traditional classification, clustering, and/or predictive techniques.
Customer segmentation groups customers together and therefore allows us to treat different groups of customers differently. There are mainly two strategies for grouping customers, depending on the data at hand and/or on the goal of the analysis.
In one case, we might want to isolate customers exhibiting a specific value of a target feature. This is really a customer classification task. Customer classification infers the value of the target feature from all other descriptive features. That is, it allows for the creation of different groups of customers producing similar values of the target feature.
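As a toy illustration of such a classification task (not the actual project code), a depth-1 decision tree, also known as a decision stump, splits customers on a single feature to predict the target; the "spend" and "value" fields below are hypothetical:

```python
from collections import Counter

def fit_stump(rows, feature, target):
    """Fit a decision stump: try each observed value of one numeric
    feature as a threshold, predict the majority target class on each
    side, and keep the split with the most correct predictions."""
    best = None
    for t in sorted({r[feature] for r in rows}):
        left = [r[target] for r in rows if r[feature] <= t]
        right = [r[target] for r in rows if r[feature] > t]
        pred_l = Counter(left).most_common(1)[0][0] if left else None
        pred_r = Counter(right).most_common(1)[0][0] if right else None
        correct = sum(1 for r in rows
                      if (r[feature] <= t and r[target] == pred_l)
                      or (r[feature] > t and r[target] == pred_r))
        if best is None or correct > best[0]:
            best = (correct, t, pred_l, pred_r)
    _, t, pred_l, pred_r = best
    # return a classifier for new customers
    return lambda r: pred_l if r[feature] <= t else pred_r

customers = [{"spend": 10, "value": "low"}, {"spend": 20, "value": "low"},
             {"spend": 200, "value": "high"}, {"spend": 300, "value": "high"}]
classify = fit_stump(customers, "spend", "value")
# classify({"spend": 250}) -> "high"
```

A real decision tree simply stacks such splits recursively; the single-split version already shows how a target feature is inferred from a descriptive one.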
In a previous project, the goal was to get to know better the groups of customers producing different revenues for the company. That is, to better understand which kinds of customers were most/least valuable for the company's revenues, in terms of demographics, spending habits, budget, and likelihood to buy other products.
The results of the analysis separated very high-value customers from merely high-value customers, each group with the following features.
Very high value customers are curious to know (and probably buy) new products, making them interesting targets for further promotions. On the negative side, the same very high value customers are not necessarily the most loyal ones. Indeed, they are as curious about new products as about new companies and might easily switch to the competition.
On the contrary, high-value customers consist mainly of old, faithful customers, who have been buying the same product for years but have never been tempted to buy something else, for whatever reason (reduced budget or reduced curiosity). Such customers, though still very valuable for the company, were excluded from the promotions of new products.
The second strategy is to group customers together blindly, that is, without privileging any particular feature in their descriptive pattern. This is called customer clustering and assigns similar customers to the same cluster. The similarity across customers is calculated on all descriptive features, without trying to identify any pre-defined group of customers.
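Customer clustering in this blind sense is typically done with an algorithm like k-means, which can be sketched in plain Python; the two-dimensional points below stand in for customer feature vectors:

```python
import random

def kmeans(points, k, n_iter=100, seed=0):
    """Minimal k-means: alternate between assigning each point to its
    nearest center and moving each center to its cluster's mean."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    for _ in range(n_iter):
        # assignment step: nearest center by squared Euclidean distance
        clusters = [[] for _ in range(k)]
        for p in points:
            i = min(range(k),
                    key=lambda c: sum((a - b) ** 2
                                      for a, b in zip(p, centers[c])))
            clusters[i].append(p)
        # update step: each center becomes the mean of its cluster
        new_centers = []
        for i, cluster in enumerate(clusters):
            if cluster:
                new_centers.append(tuple(sum(dim) / len(cluster)
                                         for dim in zip(*cluster)))
            else:
                new_centers.append(centers[i])
        if new_centers == centers:  # converged
            break
        centers = new_centers
    return centers, clusters
```

Run on customer feature vectors, each returned cluster is one customer segment, discovered without any pre-defined target.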
This is a classic customer segmentation strategy when nothing is known about the customers, or when we want to discover something new about the customer base without prejudice. In the same project, this kind of analysis led to the identification of a non-negligible group of customers that we called the zero-value faithful customers.
The zero-value faithful customers remained with the company for many years while producing zero revenue. These customers took advantage of a number of the company’s free promotions without ever turning into profitable customers. Needless to say, such customers were removed from future promotions, saving the company a considerable amount of money.
This is just a small taste of what customer segmentation analysis can discover among your customer data. Depending on how rich your data is, customer segmentation can enrich your knowledge about the customer base and consequently your revenue. In fact, the selection of the right customers for new promotions, the expansion of the market base, the retention of the high-value old faithful customers, as well as the exclusion of the zero-value old faithful customers, are all keys to higher profits and can make you more and more competitive in today’s market.
In addition, customer segmentation, like other data analytics procedures, benefits from years of experience, which makes it solidly grounded, easy to run routinely, and relatively inexpensive. The availability of such standard, consolidated procedures means customer intelligence is no longer an expensive toy for large companies only, but is now affordable for small businesses as well. Do not let your competition know more about your customers than you do!
Contact us at www.dataminingreporting.weebly.com/contact_us for a quick offer on a customer segmentation application or on any other data analytics procedure on your customer data!
Rosaria Silipo has been mining data since her master’s degree and has kept mining data throughout her doctoral program, her postdoc, and most of her subsequent job positions.