Output of Experimental Results

One of the most important features provided by the command line API is the ability to output a wide variety of experimental results that provide valuable insights into the models learned by a machine learning algorithm, the predictions they provide, and the data they have been trained on.

Each of these outputs can either be printed to the console or saved to output files. The latter requires specifying a directory where the output files should be saved. As shown in the examples below, the path to this directory must be provided via the argument --output-dir.

Note

The path of the directory where experimental results should be saved can be either absolute or relative to the working directory.

Evaluation Results

TODO

Predictions

In cases where the performance metrics obtained via the arguments --print-evaluation or --store-evaluation are not sufficient for a detailed analysis, it may be desired to directly inspect the predictions provided by the evaluated models. They can be printed on the console, together with the ground truth labels, by providing the argument --print-predictions:

boomer --data-dir /path/to/datasets/ --dataset dataset-name --print-predictions true

Alternatively, the argument --store-predictions can be used to save the predictions, as well as the ground truth labels, to .arff files:

boomer --data-dir /path/to/datasets/ --dataset dataset-name --output-dir /path/to/results/ --store-predictions true

Tip

Depending on the Types of Predictions the machine learning models used in an experiment are supposed to provide, the predictions stored in the resulting output files are either binary values (if binary predictions are provided) or real values (if regression scores or probability estimates are provided). When working with real-valued predictions, the option decimals may be supplied to the arguments --print-predictions and --store-predictions to specify the number of decimals that should be included in the output (see here for more information).
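
For example, assuming the bracket notation used by the command line API for specifying options (see the linked documentation for the exact syntax), the output could be restricted to two decimals as follows:

boomer --data-dir /path/to/datasets/ --dataset dataset-name --print-predictions true{decimals=2}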

When using Train-Test-Splits, a single model is trained and queried for predictions for the test set. These predictions are written into a single output file. When using an Evaluation on the Training Data, predictions are also obtained for the training set and written into an additional output file. The names of the output files indicate whether the predictions have been obtained for the training or test set, respectively:

  • predictions_train_overall.arff

  • predictions_test_overall.arff

When using a Cross Validation for performance evaluation, a model is trained for each fold. Similar to before, the names of the output files indicate whether the predictions correspond to the training or test data:

  • predictions_train_fold-1.arff

  • predictions_test_fold-1.arff

  • predictions_train_fold-2.arff

  • predictions_test_fold-2.arff

  • predictions_train_fold-3.arff

  • predictions_test_fold-3.arff

  • predictions_train_fold-4.arff

  • predictions_test_fold-4.arff

  • predictions_train_fold-5.arff

  • predictions_test_fold-5.arff

Prediction Characteristics

By using the command line argument --print-prediction-characteristics, characteristics regarding a model’s predictions can be printed:

boomer --data-dir /path/to/datasets/ --dataset dataset-name --print-prediction-characteristics true

Alternatively, these statistics can be written into a .csv file by using the argument --store-prediction-characteristics:

boomer --data-dir /path/to/datasets/ --dataset dataset-name --output-dir /path/to/results/ --store-prediction-characteristics true

Tip

The output produced by the arguments --print-prediction-characteristics and --store-prediction-characteristics can be customized via several options described here. It is possible to exclude certain statistics from the output, to specify whether they should be given as percentages, and to control how many decimal places should be used.
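
For illustration, a hypothetical invocation that requests percentages and two decimal places might look as follows, assuming options named percentage and decimals (see the linked description for the exact option names) and the bracket notation used by the command line API:

boomer --data-dir /path/to/datasets/ --dataset dataset-name --print-prediction-characteristics true{percentage=true,decimals=2}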

The statistics obtained via the arguments given above correspond to the test data for which predictions are obtained from the model. Consequently, they depend on the strategy used for splitting a dataset into training and test sets. When using Train-Test-Splits, predictions for a single test set are obtained and their characteristics are written into a file. In addition, statistics for the training data are written into an additional output file when using an Evaluation on the Training Data:

  • prediction_characteristics_train_overall.csv

  • prediction_characteristics_test_overall.csv

When using a Cross Validation, the data is split into several parts, each of which is used once for prediction. Multiple output files are needed to save the statistics for the different cross validation folds. For example, a 5-fold cross validation results in the following files:

  • prediction_characteristics_fold-1.csv

  • prediction_characteristics_fold-2.csv

  • prediction_characteristics_fold-3.csv

  • prediction_characteristics_fold-4.csv

  • prediction_characteristics_fold-5.csv

The statistics obtained via the previous commands include the following:

  • The number of labels: Indicates for how many labels predictions have been obtained.

  • The sparsity of the prediction matrix: The percentage of labels predicted as irrelevant for all examples.

  • The average label cardinality: The average number of labels predicted as relevant for each example.

  • The number of distinct label vectors: The number of unique label combinations predicted for different examples.

  • The label imbalance ratio: A measure for the imbalance between labels predicted as relevant and irrelevant, respectively. [1]

Data Characteristics

To obtain insightful statistics regarding the characteristics of a dataset, the command line argument --print-data-characteristics may be helpful:

boomer --data-dir /path/to/datasets/ --dataset dataset-name --print-data-characteristics true

If you prefer to write the statistics into a .csv file, the argument --store-data-characteristics can be used:

boomer --data-dir /path/to/datasets/ --dataset dataset-name --output-dir /path/to/results/ --store-data-characteristics true

Tip

As shown here, the arguments --print-data-characteristics and --store-data-characteristics come with several options that allow excluding specific statistics from the respective output. It is also possible to specify whether percentages should be preferred for presenting the statistics. Additionally, the number of decimals to be included in the output can be limited.
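
For example, assuming the same bracket notation and hypothetical option names as above (percentage and decimals; the linked description lists the exact names), the output might be customized as follows:

boomer --data-dir /path/to/datasets/ --dataset dataset-name --print-data-characteristics true{percentage=true,decimals=2}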

The statistics provided by the previous commands are obtained on the training data and therefore depend on the strategy used for splitting a dataset into training and test sets. If Train-Test-Splits are used, a single training set is used and its characteristics are saved to a file:

  • data_characteristics_overall.csv

In contrast, when using a Cross Validation, the data is split into several parts, each of which is used once for training. As a result, multiple output files are created in such a scenario. For example, a 5-fold cross validation results in the following files:

  • data_characteristics_fold-1.csv

  • data_characteristics_fold-2.csv

  • data_characteristics_fold-3.csv

  • data_characteristics_fold-4.csv

  • data_characteristics_fold-5.csv

The output produced by the previous commands includes the following information regarding a dataset’s features:

  • The number of examples and features contained in a dataset: Besides the total number of features, the number of features per type (numerical, ordinal, or nominal) is also given.

  • The sparsity of the feature matrix: This statistic is calculated as the percentage of elements in the feature matrix that are equal to zero.

In addition, the following statistics regarding the labels in a dataset are provided (a small worked example is given after the list):

  • The total number of available labels

  • The sparsity of the label matrix: This statistic is calculated as the percentage of irrelevant labels among all examples.

  • The average label cardinality: The average number of relevant labels per example.

  • The number of distinct label vectors: The number of unique label combinations present in a dataset.

  • The label imbalance ratio: An important metric in multi-label classification measuring the degree to which the distribution of relevant and irrelevant labels is unbalanced. [1]
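
To illustrate these statistics, consider a small hypothetical dataset with three examples and three labels, whose label matrix consists of the rows [1 0 0], [0 1 0], and [1 1 0]: out of the 9 elements, 4 are relevant, resulting in a sparsity of 5 / 9 ≈ 55.6%, an average label cardinality of 4 / 3 ≈ 1.33, and 3 distinct label vectors.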

Label Vectors

We refer to the unique label combinations present for different examples in a dataset as label vectors. They can be printed by using the command line argument --print-label-vectors:

boomer --data-dir /path/to/datasets/ --dataset dataset-name --print-label-vectors true

If you prefer writing the label vectors into an output file, the argument --store-label-vectors can be used:

boomer --data-dir /path/to/datasets/ --dataset dataset-name --output-dir /path/to/results/ --store-label-vectors true

When using Train-Test-Splits for splitting the available data into distinct training and test sets, a single output file is created. It stores the label vectors present in the training data:

  • label_vectors_overall.csv

When using a Cross Validation, several models are trained on different parts of the dataset. The label vectors present in each of these training sets are written into separate output files. For example, the following files result from a 5-fold cross validation:

  • label_vectors_fold-1.csv

  • label_vectors_fold-2.csv

  • label_vectors_fold-3.csv

  • label_vectors_fold-4.csv

  • label_vectors_fold-5.csv

The above commands output each label vector present in a dataset, as well as its frequency, i.e., the number of examples it is associated with. Moreover, each label vector is assigned a unique index. By default, label vectors are given in the following format, where the n-th element indicates whether the n-th label is relevant (1) or not (0):

[0 0 1 1 1 0]

By setting the option sparse to the value true, an alternative representation can be used (see here). It consists of the indices of all relevant labels in a label vector (counting from zero and sorted in increasing order), while all irrelevant ones are omitted. Due to its compactness, this representation is particularly well-suited when dealing with a large number of labels:

[2 3 4]
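
For example, assuming the bracket notation used by the command line API for specifying options, the sparse representation may be requested as follows:

boomer --data-dir /path/to/datasets/ --dataset dataset-name --print-label-vectors true{sparse=true}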

Model Characteristics

To obtain a quick overview of some statistics that characterize a rule-based model learned by one of the algorithms provided by this project, the command line argument --print-model-characteristics can be useful:

boomer --data-dir /path/to/datasets/ --dataset dataset-name --print-model-characteristics true

The above command results in a tabular representation of the characteristics being printed on the console. If one intends to write them into a .csv file instead, the argument --store-model-characteristics may be used:

boomer --data-dir /path/to/datasets/ --dataset dataset-name --output-dir /path/to/results/ --store-model-characteristics true

Model characteristics are obtained for each model trained during an experiment. This means that a single output file is created when using Train-Test-Splits:

  • model_characteristics_overall.csv

When using a Cross Validation, several models are trained on different parts of the available data, resulting in multiple output files being saved to the output directory. For example, the following files are created when conducting a 5-fold cross validation:

  • model_characteristics_fold-1.csv

  • model_characteristics_fold-2.csv

  • model_characteristics_fold-3.csv

  • model_characteristics_fold-4.csv

  • model_characteristics_fold-5.csv

The statistics captured by the previous commands include the following:

  • Statistics about conditions: Information about the number of rules in a model, as well as the different types of conditions contained in their bodies.

  • Statistics about predictions: The distribution of positive and negative predictions provided by the rules in a model.

  • Statistics per local rule: The minimum, average, and maximum number of conditions and predictions the rules in a model contain in their bodies and heads, respectively.

Rules

It is considered one of the advantages of rule-based machine learning models that they capture patterns found in the training data in a human-comprehensible form. This makes it possible to manually inspect the models and reason about their predictive behavior. To help with this task, the command line API allows outputting the rules in a model using a textual representation. If the text should be printed on the console, the following command specifying the argument --print-rules can be used:

boomer --data-dir /path/to/datasets/ --dataset dataset-name --print-rules true

Alternatively, by using the argument --store-rules, a textual representation of models can be written into a text file in the specified output directory:

boomer --data-dir /path/to/datasets/ --dataset dataset-name --output-dir /path/to/results/ --store-rules true

Tip

Both the --print-rules and --store-rules arguments come with several options that allow customizing the textual representation of models. An overview of these options is provided here.
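
For example, assuming the bracket notation used by the command line API and a hypothetical option named decimals for limiting the precision of the thresholds shown in rule bodies, such a customization might look as follows (the linked overview lists the actual options):

boomer --data-dir /path/to/datasets/ --dataset dataset-name --print-rules true{decimals=2}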

When using Train-Test-Splits, only a single model is trained. Consequently, the above command results in a single output file being created:

  • rules_overall.txt

A Cross Validation results in multiple output files being written, each one corresponding to the model trained for an individual fold. For example, a 5-fold cross validation produces the following files:

  • rules_fold-1.txt

  • rules_fold-2.txt

  • rules_fold-3.txt

  • rules_fold-4.txt

  • rules_fold-5.txt

Each rule in a model consists of a body and a head (we use the notation body => head). The body specifies to which examples a rule applies. It consists of one or several conditions that compare the feature values of given examples to thresholds derived from the training data. The head of a rule consists of the predictions it provides for individual labels. The predictions provided by a head may be restricted to a subset of the available labels or even a single one.

If not configured otherwise, the first rule in a model is a default rule. Unlike the other rules, it does not contain any conditions in its body and therefore applies to any given example. As shown in the following example, it always provides predictions for all available labels:

{} => (label1 = -1.45, label2 = 1.45, label3 = -1.89, label4 = -1.94)

The prediction for a particular label is positive if most examples are associated with the respective label, otherwise it is negative. The ratio between the number of examples that are associated with a label and those that are not affects the absolute size of the default prediction. Large values indicate a strong preference towards predicting a particular label as relevant or irrelevant, depending on the sign.

The remaining rules only apply to examples that satisfy all of the conditions in their bodies. For example, the body of the following rule consists of two conditions:

{feature1 <= 1.53 & feature2 > 7.935} => (label1 = -0.31)

Examples that satisfy all conditions in a rule’s body are said to be “covered” by the rule. If this is the case, the rule assigns a positive or negative value to one or several labels. Similar to the default rule, a positive value expresses a preference towards predicting the corresponding label as relevant. A negative value contributes towards predicting the label as irrelevant. The absolute size of the value corresponds to the weight of the rule’s prediction. The larger the value, the stronger the impact of the respective rule, compared to the other ones.
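
To illustrate, assume a model consisting only of the two rules shown above and an example with feature1 = 1.2 and feature2 = 8.0, which is covered by both. In additive models like the ones learned by the BOOMER algorithm, where the scores provided by all covering rules are summed up, the aggregated score for label1 amounts to -1.45 + (-0.31) = -1.76, i.e., the second rule strengthens the default rule's preference towards predicting label1 as irrelevant.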

Probability Calibration Models

Some machine learning algorithms provided by this project allow obtaining probabilistic predictions. These predictions can optionally be fine-tuned via calibration models to improve the reliability of the probability estimates. We support two types of calibration models for tuning marginal and joint probabilities, respectively. If one needs to inspect these calibration models, the command line arguments --print-marginal-probability-calibration-model and --print-joint-probability-calibration-model may be helpful:

boomer --data-dir /path/to/datasets/ --dataset dataset-name --print-marginal-probability-calibration-model true --print-joint-probability-calibration-model true

Alternatively, representations of the calibration models can be written into .csv files by using the arguments --store-marginal-probability-calibration-model and --store-joint-probability-calibration-model:

boomer --data-dir /path/to/datasets/ --dataset dataset-name --output-dir /path/to/results/ --store-marginal-probability-calibration-model true --store-joint-probability-calibration-model true

Tip

All of the above arguments come with options for customizing the textual representation of the calibration models. A more detailed description of these options is available here.
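
For example, assuming the bracket notation used by the command line API and a hypothetical option named decimals, the precision of the printed probabilities and thresholds might be limited as follows (the linked description lists the actual options):

boomer --data-dir /path/to/datasets/ --dataset dataset-name --print-marginal-probability-calibration-model true{decimals=4}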

Calibration models are learned during training and depend on the training data. Train-Test-Splits, where only a single model is trained, result in a single file being created for each type of calibration model:

  • marginal_probability_calibration_model_overall.csv

  • joint_probability_calibration_model_overall.csv

In contrast, a Cross Validation produces multiple output files. Each one corresponds to a calibration model learned on the training data for an individual fold. For example, the following files are created when using a 5-fold cross validation:

  • marginal_probability_calibration_model_fold-1.csv

  • marginal_probability_calibration_model_fold-2.csv

  • marginal_probability_calibration_model_fold-3.csv

  • marginal_probability_calibration_model_fold-4.csv

  • marginal_probability_calibration_model_fold-5.csv

  • joint_probability_calibration_model_fold-1.csv

  • joint_probability_calibration_model_fold-2.csv

  • joint_probability_calibration_model_fold-3.csv

  • joint_probability_calibration_model_fold-4.csv

  • joint_probability_calibration_model_fold-5.csv