I often consider data representation the first step of data mining. Indeed, visualization is the first way to investigate a body of unfamiliar data. In my work, before proceeding to the actual analysis, I am always asked to produce a few reports that show a few major variables.
However, a visualization can turn out to be as cryptic as the underlying data itself. To make data visualization effective, we need to:
- Detect the important variables in the data
- Find the independent and the dependent variables
- Represent the data in a way that the independent variables are easy to spot
An example where an informative data representation actually made money
I would like to tell a story here about good and bad data representation.
A friend of mine is an osteopath and works for a large healthcare organization. On the side, she sees a few patients on her own in her local village. Good as she is as an osteopath, she ran into a few problems with the financial administration of her local practice. She came to me for help with her patient management system. In particular, she wanted a system to visualize:
- the patients,
- the number of visits for each patient,
- an alarm at the 10th visit, as a reminder to send the patient the bill,
- possibly a record of when the bill was sent and when the payment was received.
I should add that she did not want a computer application. She wanted a paper-based scheme to use as a reminder for the whole billing process. At this point, she was convinced that she had forgotten to send the bill to a few patients, simply because she had not noticed that their 10th visit had already passed.
The paper scheme she was working on was initially shaped like this:
After working with her for half an hour to understand what her needs were, and after a few intermediate attempts, we came up with a different scheme, still based on paper as she wished.
The rows were sorted by visit number. The first row contained the first visit, the second row the second visit, and so on. The 10th row was marked with color. Once a patient reached the 10th row, the bill was sent and, hopefully, the money would be received.
The information is organized visually and sorted according to the visit number.
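My friend wanted paper, not software, but the logic of the sheet is simple enough to sketch in a few lines of Python for anyone who wants to print a similar grid. This is only an illustration under my own assumptions: I assume one column per patient and one row per visit number, and the names `build_sheet`, `patients_to_bill` and the sample patients are made up for the example. Only the rule "mark the 10th visit" comes from the scheme described above.

```python
BILLING_VISIT = 10  # from the story: the bill goes out at the 10th visit


def build_sheet(visits_by_patient, rows=BILLING_VISIT):
    """Return the sheet as text lines: one row per visit number,
    one column per patient (the column layout is an assumption)."""
    patients = sorted(visits_by_patient)
    width = max((len(p) for p in patients), default=0) + 2
    lines = ["visit".ljust(6) + "".join(p.ljust(width) for p in patients)]
    for n in range(1, rows + 1):
        # "x" means the patient has already had visit number n
        cells = "".join(
            ("x" if visits_by_patient[p] >= n else ".").ljust(width)
            for p in patients
        )
        # The billing row is flagged, like the colored row on paper
        marker = "<- send bill" if n == BILLING_VISIT else ""
        lines.append(str(n).ljust(6) + cells + marker)
    return lines


def patients_to_bill(visits_by_patient):
    """Patients who have reached (or passed) the billing visit."""
    return sorted(p for p, n in visits_by_patient.items() if n >= BILLING_VISIT)


# Made-up sample data, purely for illustration
visits = {"Patient A": 10, "Patient B": 4, "Patient C": 12}

for line in build_sheet(visits):
    print(line)
print("Bill due:", patients_to_bill(visits))
```

The point is the same as on paper: because the rows are ordered by visit number, the flagged row makes it obvious at a glance who has reached the billing visit.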
Simple as it is, this scheme generated a very high financial return.