Using the Python API

The BOOMER algorithm and the SeCo algorithm provided by this project are published as the packages mlrl-boomer and mlrl-seco, respectively (see Installation). The former is implemented by the classes mlrl.boosting.BoomerClassifier and mlrl.boosting.BoomerRegressor, whereas the latter is implemented by the class mlrl.seco.SeCoClassifier. All of these classes follow the conventions of a scikit-learn estimator. Therefore, they can be used similarly to other machine learning methods that are included in this popular framework. The getting started guide that is provided by the scikit-learn developers is a good starting point for learning about the framework’s functionalities and how to use them.

Fitting an Estimator

Whereas the SeCo algorithm is restricted to classification problems, the BOOMER algorithm can also be used for solving regression problems. In the following, we demonstrate the use of these algorithms.

For simplicity, the following examples use small, hard-coded data matrices as inputs to the algorithms. In practice, one usually retrieves the data from files rather than manually specifying the values of the feature and ground truth matrices. Information about supported datasets can be found here.

Classification Problems

The following examples show how the classification algorithms can be fit to exemplary training data:

from mlrl.boosting import BoomerClassifier

clf = BoomerClassifier()  # Create a new estimator
x = [[  1,  2,  3],  # Two training examples with three features
     [ 11, 12, 13]]
y = [[1, 0],  # Ground truth labels of each training example
     [0, 1]]
clf.fit(x, y)

from mlrl.seco import SeCoClassifier

clf = SeCoClassifier()  # Create a new estimator
x = [[  1,  2,  3],  # Two training examples with three features
     [ 11, 12, 13]]
y = [[1, 0],  # Ground truth labels of each training example
     [0, 1]]
clf.fit(x, y)

The fit method accepts two inputs, x and y:

  • A two-dimensional feature matrix x, where each row corresponds to a training example and each column corresponds to a particular feature.

  • A one- or two-dimensional, binary label matrix y, where each row corresponds to a training example and each column corresponds to a label. A non-zero element indicates that the respective label is relevant to an example, whereas an element equal to zero denotes an irrelevant label. In multi-label classification, where each example may be associated with several labels, the label matrix is two-dimensional. However, the algorithms are also capable of dealing with traditional binary classification problems, where a one-dimensional vector of ground truth labels is provided to the learning algorithm.

Both x and y are expected to be numpy arrays or equivalent array-like data types.
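To make the label semantics described above concrete, the following sketch lists the relevant labels of each example in a label matrix. It uses plain Python lists as array-like inputs. The helper relevant_labels is hypothetical and not part of the mlrl packages.

```python
def relevant_labels(y):
    """Return, for each example, the indices of its relevant (non-zero) labels.

    `y` may be a two-dimensional label matrix (multi-label classification) or a
    one-dimensional label vector (binary classification).
    """
    result = []
    for row in y:
        # A one-dimensional vector holds a single label per example
        labels = row if isinstance(row, (list, tuple)) else [row]
        result.append([index for index, value in enumerate(labels) if value != 0])
    return result


y_multi_label = [[1, 0],  # Label 0 is relevant to the first example
                 [0, 1]]  # Label 1 is relevant to the second example
y_binary = [1, 0]         # The only label is relevant to the first example

print(relevant_labels(y_multi_label))  # [[0], [1]]
print(relevant_labels(y_binary))       # [[0], []]
```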

Regression Problems

To solve regression problems rather than the classification problems shown above, the BOOMER algorithm can be used as follows:

from mlrl.boosting import BoomerRegressor

clf = BoomerRegressor()  # Create a new estimator
x = [[  1,  2,  3],  # Two training examples with three features
     [ 11, 12, 13]]
y = [[0.34, -1.20],  # Ground truth scores of each training example
     [1.43,  0.78]]
clf.fit(x, y)

The arguments that must be passed to the fit method are similar to the ones used in classification problems and are expected to be numpy arrays or equivalent array-like data types: x is a feature matrix and y is a one- or two-dimensional ground truth matrix, where each row corresponds to a training example and each column corresponds to a numerical output variable to be predicted. The algorithm supports predicting a single output variable as well as multiple ones.

Using Sparse Matrices

In addition to dense matrices like numpy arrays, the algorithms also support scipy sparse matrices. In cases where the feature matrix consists mostly of zeros (or of any other single value), this can require significantly less memory and may speed up training. Sparse matrices can be provided to the fit method via the arguments x and y just as before. Optionally, the value that should be used for sparse elements in the feature matrix x can be specified via the keyword argument sparse_feature_value:

clf.fit(x, y, sparse_feature_value=0.0)
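As a rough illustration of why sparse formats save memory, the following sketch converts a dense feature matrix into the three arrays of the compressed sparse row (CSR) layout, storing only elements that differ from the sparse value. The helpers to_csr and to_dense are hypothetical and written in pure Python for illustration only. In practice, one would construct a scipy sparse matrix, e.g., a scipy.sparse.csr_matrix, and pass it to the fit method.

```python
def to_csr(dense, sparse_value=0.0):
    """Convert a dense matrix to CSR arrays, skipping elements equal to `sparse_value`."""
    data, indices, indptr = [], [], [0]
    for row in dense:
        for column, value in enumerate(row):
            if value != sparse_value:
                data.append(value)      # Explicitly stored value
                indices.append(column)  # Column index of the stored value
        indptr.append(len(data))        # End of the current row in `data`
    return data, indices, indptr


def to_dense(data, indices, indptr, num_columns, sparse_value=0.0):
    """Restore the dense matrix, filling elements that are not stored with `sparse_value`."""
    dense = []
    for start, end in zip(indptr, indptr[1:]):
        row = [sparse_value] * num_columns
        for value, column in zip(data[start:end], indices[start:end]):
            row[column] = value
        dense.append(row)
    return dense


x = [[0.0, 0.0, 3.0, 0.0],
     [0.0, 0.0, 0.0, 0.0],
     [1.5, 0.0, 0.0, 2.0]]
data, indices, indptr = to_csr(x)
print(data)     # [3.0, 1.5, 2.0]
print(indices)  # [2, 0, 3]
print(indptr)   # [0, 1, 1, 3]
```

In this example, only three of the twelve feature values are stored explicitly. All remaining elements are implied by the sparse value.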

Nominal and Ordinal Features

The algorithms provided by this project are capable of dealing with nominal and ordinal features. In both cases, the corresponding feature values are expected to be integers. Unlike ordinal and numerical (real-valued) feature values, nominal feature values (including binary ones) cannot be sorted. If nominal or ordinal features are present in a dataset, it is necessary to inform the algorithms about these features. Otherwise, they will be treated as numerical ones. As can be seen in the following, the keyword arguments ordinal_feature_indices and nominal_feature_indices are meant to be used for specifying the indices of ordinal and nominal features, respectively:

clf.fit(x, y, nominal_feature_indices=[0, 2], ordinal_feature_indices=[1])
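Because nominal and ordinal feature values are expected to be integers, categorical data typically has to be encoded before training. The following is a minimal sketch of such an encoding in plain Python. The column layout, the category names and the helper encode_column are assumptions made for illustration and are not part of the mlrl packages.

```python
def encode_column(values, categories):
    """Map each categorical value to the integer position of its category."""
    mapping = {category: code for code, category in enumerate(categories)}
    return [mapping[value] for value in values]


# Hypothetical raw data with one nominal, one ordinal and one numerical feature
colors = ['red', 'green', 'red']      # Nominal: categories have no order
sizes = ['small', 'large', 'medium']  # Ordinal: small < medium < large
lengths = [0.7, 2.3, 1.1]             # Numerical: used as is

# For nominal features, any stable assignment of integers is sufficient
color_codes = encode_column(colors, sorted(set(colors)))
# For ordinal features, the integers must preserve the order of the categories
size_codes = encode_column(sizes, ['small', 'medium', 'large'])

# Assemble the feature matrix, one row per training example
x = [list(row) for row in zip(color_codes, size_codes, lengths)]
nominal_feature_indices = [0]
ordinal_feature_indices = [1]

print(x)  # [[1, 0, 0.7], [0, 2, 2.3], [1, 1, 1.1]]
```

The resulting index lists can then be passed to the fit method via the keyword arguments nominal_feature_indices and ordinal_feature_indices.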

Custom Weights for Training Examples

By default, all training examples have identical weights. This means that incorrect predictions for each of these examples are penalized in the same way by the training algorithm. However, in some use cases, e.g., when dealing with imbalanced data, it might be desirable to penalize incorrect predictions for some examples more heavily than for others. For this reason, it is possible to provide arbitrary (positive) integer- or real-valued weights to an algorithm’s fit method via the keyword argument sample_weight:

clf.fit(x, y, sample_weight=[1.5, 1])
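For imbalanced binary problems, one common heuristic (not prescribed by this project) is to weight each example inversely proportional to the frequency of its class. The following is a minimal sketch in plain Python, where the helper balanced_weights is hypothetical:

```python
from collections import Counter


def balanced_weights(y):
    """Weight each example by `n / (k * count)`, where `n` is the number of
    examples, `k` the number of distinct classes and `count` the frequency of
    the example's class."""
    counts = Counter(y)
    num_examples, num_classes = len(y), len(counts)
    return [num_examples / (num_classes * counts[label]) for label in y]


y = [1, 0, 0, 0]  # One positive and three negative examples
sample_weight = balanced_weights(y)

# The positive example receives the weight 2.0, each negative example 2/3
print(sample_weight)
```

With these weights, both classes contribute equally to the total weight, and the weights sum up to the number of examples.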

Setting Algorithmic Parameters

In the previous examples, the algorithms’ default configurations are used. However, in many cases it is desirable to adjust the algorithmic behavior by providing custom values for one or several of an algorithm’s parameters. This can be achieved by passing the names and values of the respective parameters as constructor arguments:

clf = BoomerClassifier(max_rules=100, loss='logistic_example_wise')
clf = BoomerRegressor(max_rules=100, loss='squared_error_example_wise')
clf = SeCoClassifier(max_rules=100, heuristic='m-estimate')

Descriptions of all available parameters are provided for both the BOOMER and the SeCo algorithm.

Making Predictions

Once an estimator has been fitted to the training data, its predict method can be used to obtain predictions for previously unseen examples:

pred = clf.predict(x)
print(pred)

In this example, we use the estimator to predict for the same data that has previously been used for training. In the case of the classification problem shown above, this results in the original ground truth labels being printed:

[[1 0]
 [0 1]]

The argument x that must be passed to the predict method has the same semantics as for the fit method. It can either be a numpy array, an equivalent array-like data type, or a scipy sparse matrix. In the latter case, the value that should be used for sparse elements in the feature matrix x can be specified via the keyword argument sparse_feature_value:

pred = clf.predict(x, sparse_feature_value=0.0)

By default, the data type of the ground truth is also used for the predictions. If a different type should be used, it can be specified via the keyword argument dtype:

import numpy as np

pred = clf.predict(x, dtype=np.float32)

Predicting Probabilities

As a probabilistic machine learning method, the mlrl.boosting.BoomerClassifier is capable of predicting probability estimates. These probabilities can be obtained by invoking a previously fitted estimator’s predict_proba method:

pred = clf.predict_proba(x)
print(pred)

In case of a multi-label classification problem, the probabilities are given as a matrix, where each row and column corresponds to a query example and label, respectively. Each value in the matrix specifies the probability of the respective label being relevant to an example. Furthermore, as shown in the following example, all values are in the range [0, 1]:

[[0.98 0.23]
 [0.19 0.84]]

Following scikit-learn conventions, when dealing with single-label problems, the output format differs from the format used above. In the single-label case, a matrix with two columns is returned. The values in these columns correspond to the probability of the positive or negative class being correct for an example, respectively.

Finally, the predict_proba method supports the same keyword arguments as the predict method described above.
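A probability matrix like the one shown above can be turned into binary predictions by comparing each value to a threshold. The following sketch assumes a threshold of 0.5, which is an assumption made for illustration. The helper binarize is hypothetical, and the predict method of the library may derive its binary predictions differently.

```python
def binarize(probabilities, threshold=0.5):
    """Mark a label as relevant (1) if its probability reaches the threshold, else 0."""
    return [[1 if p >= threshold else 0 for p in row] for row in probabilities]


pred = [[0.98, 0.23],  # Probability estimates taken from the example above
        [0.19, 0.84]]
print(binarize(pred))                 # [[1, 0], [0, 1]]
print(binarize(pred, threshold=0.9))  # [[1, 0], [0, 0]]
```

Raising the threshold trades recall for precision: with a threshold of 0.9, the second label is no longer predicted as relevant to the second example.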

Predicting Scores

In addition to probabilities, the mlrl.boosting.BoomerClassifier may also provide real-valued scores as predictions. These scores are the raw predictions from which binary predictions and probabilities are derived. As shown in the example below, scores can be obtained from a fitted estimator via the function decision_function:

pred = clf.decision_function(x)
print(pred)

The scores are given as a matrix, where each row and column corresponds to a query example and label, respectively. As shown in the example below, each score is a real number in the range (\(-\infty\), \(+\infty\)). A positive score indicates that the corresponding label is likely to be relevant to an example, whereas a negative score indicates that it is likely to be irrelevant. Moreover, the absolute value of a score corresponds to the confidence of the model being correct.

[[ 4.62 -2.48]
 [-1.92  3.34]]

The function decision_function supports the same keyword arguments as the predict function discussed earlier.
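The interpretation of scores described above can be sketched in a few lines of plain Python. The helper interpret_scores is hypothetical, and treating a score of exactly zero as irrelevant is an assumption made for illustration.

```python
def interpret_scores(scores):
    """Split each score into a binary label (its sign) and a confidence (its absolute value)."""
    labels = [[1 if score > 0 else 0 for score in row] for row in scores]
    confidences = [[abs(score) for score in row] for row in scores]
    return labels, confidences


pred = [[ 4.62, -2.48],  # Scores taken from the example above
        [-1.92,  3.34]]
labels, confidences = interpret_scores(pred)
print(labels)       # [[1, 0], [0, 1]]
print(confidences)  # [[4.62, 2.48], [1.92, 3.34]]
```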

Accessing the Rules in a Model

In some cases, it might be desirable to access the rules in a model that has been learned via the fit method. For this purpose, we provide a convenient API that is illustrated in the following example:

clf = clf.fit(x, y)

for rule in clf.model_:
    for condition in rule.body:
        print(f'{condition.feature_index} {condition.comparator} {condition.threshold}')

    for prediction in rule.head:
        print(f'{prediction.output_index} {prediction.value}')

For details, we refer to the API reference of the classes RuleModel, Rule, Condition and Prediction.
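The attributes used in the loop above can also be combined into a single textual representation per rule. The following sketch relies only on the attributes shown in that loop. The helper format_rule and the output format are hypothetical, and the rule is built from stand-in objects so that the example runs without a fitted estimator. Representing the comparator as a string is an assumption.

```python
from types import SimpleNamespace


def format_rule(rule):
    """Render a rule as text using only the attributes shown in the loop above."""
    body = ' & '.join(f'{c.feature_index} {c.comparator} {c.threshold}' for c in rule.body)
    head = ', '.join(f'{p.output_index} = {p.value}' for p in rule.head)
    return f'{{{body}}} => ({head})'


# Stand-in objects that mimic the attributes of a rule; with a fitted
# estimator, one would iterate over `clf.model_` instead
rule = SimpleNamespace(
    body=[SimpleNamespace(feature_index=0, comparator='<=', threshold=6.0)],
    head=[SimpleNamespace(output_index=0, value=1.25),
          SimpleNamespace(output_index=1, value=-0.5)])

print(format_rule(rule))  # {0 <= 6.0} => (0 = 1.25, 1 = -0.5)
```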