In my last work on the Internet of Things with Phil Winters, briefly presented at the KNIME UGM 2014, I implemented a Lean Restocking Alert System for a bike share service. The goal was to trigger an alarm for each bike station that was likely to need restocking in the following hour.
The data, made publicly available by Capital Bike Share, flagged the hours of bike restocking for each station. The restocking process was triggered manually from the main monitoring center. To predict the need for restocking I used a decision tree, whose task was to predict the restocking flag of the next hour. To align the restocking flag of the next hour with the current data, I used a Lag Column node with Lag Interval set to 1.
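Outside KNIME, the same alignment can be reproduced with a one-step shift of the target column. Below is a minimal Python/pandas sketch of the idea; the column names (bikes, docks, restocking_flag) and the values are my own assumptions, not the actual names in the Capital Bike Share data.

```python
import pandas as pd

# Hypothetical hourly data for one bike station (column names are assumed).
df = pd.DataFrame({
    "hour": [7, 8, 9, 10, 11, 12],
    "bikes": [18, 15, 12, 8, 5, 3],
    "docks": [20, 20, 20, 20, 20, 20],
    "restocking_flag": [0, 0, 0, 1, 1, 1],
})

# Same effect as the Lag Column node with Lag Interval = 1:
# align the restocking flag of the NEXT hour with the current row.
df["restocking_flag_next_hour"] = df["restocking_flag"].shift(-1)

# The last row has no following hour and is dropped before training.
df = df.dropna(subset=["restocking_flag_next_hour"])
```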
Since more is usually better, as input features I used all current information available in the data (weather and calendar information) as well as a variable number of past values of the bike ratio (current number of bikes / number of available docks). So far not much thinking was required: a decision tree with all available input features, and then some.
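To give an idea of how such a feature table could be assembled outside KNIME, here is a short pandas sketch that builds on the data frame above; the number of past values (n_lags) is an arbitrary assumption.

```python
# Bike ratio = current number of bikes / number of available docks.
df["bike_ratio"] = df["bikes"] / df["docks"]

# Add a variable number of past values of the bike ratio as extra features.
n_lags = 2  # assumed value; in the workflow this was left as a free parameter
for lag in range(1, n_lags + 1):
    df[f"bike_ratio_lag_{lag}"] = df["bike_ratio"].shift(lag)

# Drop the first rows, which do not yet have a full history of past values.
df = df.dropna()
```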
Now though, if we really want to do some thinking, we might wonder whether more is always better and, if not, which subset of input features performs best. For example, does the past ratio play an important role in the prediction model? If not, wouldn't it be better to simplify the model to a leaner subset of input features? To answer these and other questions about the optimal subset of input features, I dug up the Feature Elimination meta-node (see figure below).
The Feature Elimination meta-node implements a loop, finding the best performing (n-i)-dimensional training set at each iteration i. It starts with n input features, then finds the subset of n-1 input features with the lowest error on the test set, then the subset of n-2 input features, again with the lowest error on the test set, and so on. At each iteration i, the best performing subset of n-i input features is found by iterating over all remaining input features and leaving one out each time. The input feature whose absence degrades the predictive model performance the least is left out for good.
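Just to make the procedure concrete, here is a rough Python sketch of the same greedy backward elimination loop, using scikit-learn's DecisionTreeClassifier on a generic train/test split stored in pandas data frames. It only illustrates the idea, not the meta-node's internals, and all variable names are assumptions.

```python
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

def backward_feature_elimination(X_train, y_train, X_test, y_test, features):
    """Greedy backward elimination: at each iteration, drop the feature whose
    removal hurts the test-set performance the least."""
    results = []              # (feature subset, test error) for each iteration
    current = list(features)
    while len(current) > 1:
        best_subset, best_error = None, None
        for feature in current:
            subset = [f for f in current if f != feature]
            model = DecisionTreeClassifier().fit(X_train[subset], y_train)
            error = 1.0 - accuracy_score(y_test, model.predict(X_test[subset]))
            if best_error is None or error < best_error:
                best_subset, best_error = subset, error
        current = best_subset   # the feature left out here is gone for good
        results.append((current, best_error))
    return results
```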
The configuration window of this node can be used interactively (i.e. you manually select the n-i input features to train your final model with) or automatically (i.e. you define an error threshold, and the training set with the smallest dimensionality and an error below that threshold is passed on to the following nodes).
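The automatic mode can be approximated on top of the results of the sketch above: given an error threshold, keep the smallest subset whose error stays below it. A minimal sketch, assuming the results list returned by the previous function:

```python
def leanest_subset_below_threshold(results, threshold):
    """Return the smallest feature subset with a test error below the threshold,
    or None if no subset is acceptable (in that case, keep all features)."""
    acceptable = [(subset, err) for subset, err in results if err <= threshold]
    if not acceptable:
        return None
    return min(acceptable, key=lambda item: len(item[0]))
```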
The configuration window of the Backward Feature Elimination Filter node (see figure below) clearly shows the error increasing as the dimensionality of the training set decreases, until the values are no longer acceptable. It is also interesting to see that too many input features might confuse the predictive model and sometimes lead to worse performance than a leaner set of input features.
This does make sense, since the hour of the day and the flag indicating a working day are predictive factors of the amount of traffic in the city and around each bike station, while the current bike ratio is a descriptor of the current restocking situation.
In conclusion, more is not always better. Sometimes, thinking ahead and defining a data set with leaner dimensionality might help the system in terms of computational effort as well as performance.