However, interpreting the decision tree rules is only half of the story. The other half is the space coverage of those rules. Once a rule is created, how many patterns does it affect, and how important are those patterns? That is, if a rule covers only a few patterns, do they form a genuinely separate group, or are they just random patterns picked up by an over-trained tree?
In today’s blog post, we do two things:
- We translate a decision tree into a set of interpretable rules
- We check the space coverage of such rules in a scatter plot
The decision tree is then, on the one hand, exported as an image through a “Decision Tree To Image” node and, on the other hand, converted into a set of rules of the form:
IF <condition> => <predicted class>
The node that converts a decision tree model into a set of rules is the “Decision Tree to Ruleset” node.
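To make the idea of the conversion concrete, here is a minimal Python sketch of how a tree can be flattened into IF <condition> => <predicted class> rules: every path from the root to a leaf becomes one rule. This is not the implementation of the KNIME node. The toy tree, its feature names (age, calls), and its thresholds are invented purely for illustration.

```python
# Hypothetical toy tree: feature names, thresholds, and classes are
# invented for illustration and do not come from the workflow in this post.
tree = {
    "feature": "age", "threshold": 40,
    "left": {"label": "no churn"},
    "right": {
        "feature": "calls", "threshold": 3,
        "left": {"label": "no churn"},
        "right": {"label": "churn"},
    },
}


def tree_to_rules(node, conditions=()):
    """Walk from the root to each leaf; every leaf yields one rule string."""
    if "label" in node:
        # Leaf reached: join the conditions collected along the path.
        condition = " AND ".join(conditions) if conditions else "TRUE"
        return [f"IF {condition} => {node['label']}"]
    feature, threshold = node["feature"], node["threshold"]
    left = tree_to_rules(node["left"], conditions + (f"{feature} <= {threshold}",))
    right = tree_to_rules(node["right"], conditions + (f"{feature} > {threshold}",))
    return left + right


rules = tree_to_rules(tree)
for rule in rules:
    print(rule)
```

A tree with three leaves produces three rules, one per branch, which is why the number of rules always matches the number of branches in the tree.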
At this point we could apply the ruleset to a dataset and get the same results as with a “Decision Tree Predictor” node. That is not the goal of this post. Here we want to see which points of the data set are covered by which rule of the decision tree. We still use the ruleset for this purpose, but we first modify it a bit. The <predicted class> in each rule string is replaced by the rule name itself, as if the target class were the rule name rather than the original class. This modified ruleset is then applied to the data set, so that each data point is labelled with the name of the rule that covers it. Data points are then colored based on the assigned rule name.
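The relabelling trick can be sketched in a few lines of Python. The rules, rule names, and data points below are hypothetical and only illustrate the mechanics: the outcome of each rule is swapped for the rule's own name, the first matching rule labels each data point, and a simple count per rule gives a rough measure of coverage.

```python
from collections import Counter

# Hypothetical ruleset: (rule name, condition, predicted class).
# Names, conditions, and classes are assumptions made for this example.
rules = [
    ("rule1", lambda r: r["age"] <= 40, "no churn"),
    ("rule2", lambda r: r["age"] > 40 and r["calls"] <= 3, "no churn"),
    ("rule3", lambda r: r["age"] > 40 and r["calls"] > 3, "churn"),
]

# Replace the predicted class of each rule with the rule name itself.
relabelled = [(name, condition, name) for name, condition, _ in rules]


def covering_rule(row, ruleset):
    """Return the outcome of the first rule whose condition matches the row."""
    for _, condition, outcome in ruleset:
        if condition(row):
            return outcome
    return None  # no rule covers this data point


# Invented data points, standing in for the real data set.
data = [
    {"age": 25, "calls": 1},
    {"age": 33, "calls": 5},
    {"age": 38, "calls": 0},
    {"age": 52, "calls": 2},
    {"age": 61, "calls": 6},
]

labels = [covering_rule(row, relabelled) for row in data]
coverage = Counter(labels)  # number of data points covered by each rule
print(coverage)
```

In a scatter plot, the labels list would drive the color of each point, while the counts hint at which rules cover very few points and may be candidates for pruning.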
Below is the final WebPortal page, with the list of rules, the image of the decision tree, and the scatter plot with the data points aggregated by the color of the covering rule.
There are 5 branches in the decision tree, corresponding to 5 rules, each one with a different color. In the scatter plot of age vs. churn risk, rule number 1, in blue, covers most of the customers; rules number 2 and 3 cover the customers with the highest churn risk; and rules number 4 and 5 cover the customers with lower churn risk. Rule number 5 is also the least representative and, if removed, its job could probably be done by rule number 4.