Missing Feature Values

The ability to deal with training data, where the feature values of individual examples are partly unknown, is a common requirement of many real-world classification problems. Different strategies for handling missing values can be found in the rule learning literature[1]. The algorithms developed by us, ignore examples with missing values when evaluating the conditions that can be added to a rule’s body with respect to a certain feature. Consequently, rules that contain conditions on a feature \(A_{l}\) in their body can never be satisfied by examples \(\boldsymbol{x}_{n}\) for which the feature value \(x_{nl}\) is missing.

Only minor adjustments are necessary to apply the pre-sorted search algorithm to a feature for which some examples may lack a value. First of all, the examples that do not assign a value to the respective feature are excluded from the sorted feature vector traversed by the algorithm and therefore are ignored when enumerating the possible thresholds of conditions and aggregating the statistics of examples they cover. Instead, the algorithm keeps track of the examples that do not assign a value to a particular feature in a separate data structure. When searching for possible conditions concerned with the respective feature, the statistics of the examples that are known to lack a value for the feature must be subtracted from the globally aggregated statistics (previously denoted as \(\boldsymbol{s}\)). This ensures that the statistics that are aggregated while processing a vector of sorted feature values, as well as the statistics that are computed as the difference between previously aggregated statistics and the globally aggregated ones, do not include examples with missing values, the considered refinements of a rule cannot cover.