Algorithmic Optimizations¶

See also

Most of the content in this chapter has been taken from the publication ‘On the efficient implementation of classification rule learning’, Michael Rapp, Johannes Fürnkranz and Eyke Hüllermeier (2023).

Rule learning methods have a long history of active research in the machine learning community. They are not only a common choice in applications that demand human-interpretable classification models but have also been shown to achieve state-of-the-art performance when used in ensemble methods. Unfortunately, only little information can be found in the literature about the various implementation details that are crucial for the efficient induction of rule-based models. In this documentation, we provide a detailed discussion of algorithmic concepts and approximations that enable applying our rule learning algorithms to large amounts of data. The techniques discussed in this chapter are used by both, the BOOMER algorithm and the SeCo algorithm, which is made possible by their common framework.

The following sections shed some light on the internals of some of the most efficient rule learning algorithms available today. First and foremost, this refers to the idea of using a so-called pre-sorted search algorithm for the construction of individual rules. In addition, we elaborate on extensions of this fundamental principle that allow dealing with nominal features and missing feature values natively. In contrast to most statistical machine learning approaches, including artificial neural networks, the ability to handle such data naturally, without any pre-processing techniques, is an advantage of symbolic learning methods. We also discuss how sparsity in the feature values of training examples, which is a common characteristic of many datasets, can be exploited to further reduce computational costs. In domains with many numerical features, the ability to exploit sparsity in the feature values does not provide any significant advantages in terms of scalability. As an alternative, we also investigate a histogram-based search algorithm that generalizes existing ideas from decision tree learning to rule-based models. Finally, we elaborate on different possibilities to use parallelization to further speed up the training of predictive models.