Overview of Arguments¶
In addition to the mandatory arguments that must be provided to the command line API to specify the dataset to be used for training, a wide variety of optional arguments are available as well. In the following, we provide an overview of these arguments and discuss their respective purposes.
Note
The arguments -h or --help result in a description of all available command line arguments being printed.
Note
When running the program with the argument -v or --version, the version of the software package is printed. The output also includes information about third-party dependencies it uses, the build options that have been used for building the package, as well as information about hardware resources it may utilize.
Note
The argument --log-level controls the level of detail used for log messages (Default value = info). It can be set to the values debug, info, warn, warning, error, critical, fatal or notset, where the first one provides the greatest level of detail and the last one disables logging entirely.
Basic Usage¶
A more detailed description of the following arguments can be found here.
The most basic command for running the program, only including mandatory arguments, is as follows:
testbed <module_or_source_file> --data-dir /path/to/dataset/ --dataset dataset-name
Module¶
The program dynamically loads a Python module or source file that provides an integration with a specific machine learning algorithm. To specify the module or source file to be used, the following mandatory arguments must be provided:
<module_or_source_file>The fully qualified name of a Python module, or an absolute or relative path to a Python source file, providing a Python class that extends frommlrl.testbed.runnables.Runnable. The name of the class must beRunnable, unless an alternative name is specified via the optional command line argument-ror--runnable.
The following optional arguments allow additional control over the loading mechanism:
-ror--runnable(Default value =Runnable) The name of the class extendingmlrl.testbed.runnables.Runnablethat resides within the module or source file specified via the argument<module_or_source_file>.
The arguments given above can be used to integrate any scikit-learn compatible machine learning algorithm with the command line API. You can learn about this here.
Dataset¶
The following mandatory arguments must always be given to specify the dataset that should be used, as well as the location where it should be loaded from.
--data-dirAn absolute or relative path to the directory where the data set files are located.--datasetThe name of the data set files (without suffix).
Optionally, the following arguments can be used to provide additional information about the dataset.
--sparse-feature-value(Default value =0.0) The value that should be used for sparse elements in the feature matrix. Does only have an effect if a sparse format is used for the representation of the feature matrix, depending on the parameter--feature-format.
Problem Type¶
The command line API can conduct experiments for classification and regression problems. When dealing with the latter, the type of the machine learning problem must explicitly be specified via the following argument:
--problem-type(Default value =classification)classificationThe dataset is considered as a classification data set.regressionThe dataset is considered as a regression data set.
Performance Evaluation¶
A more detailed description of the following arguments can be found here.
One of the most important capabilities of the command line API is to train machine learning models and obtain an unbiased estimate of their predictive performance. For this purpose, the available data must be split into training and test data. The former is used to train models and the latter is used for evaluation afterward, whereas the evaluation metrics depend on the type of predictions provided by a model.
Strategies for Data Splitting¶
--data-split(Default value =train-test)train-testThe available data is split into a single training and test set. Given thatdataset-nameis provided as the value of the argument--dataset, the training data must be stored in a file nameddataset-name_training.arff, whereas the test data must be stored in a file nameddataset-name_test.arff. If no such files are available, the program searches for a file with the namedataset-name.arffand splits it into training and test data automatically. The following options may be specified using the bracket notation:test_size(Default value =0.33) The fraction of the available data to be included in the test set, if the training and test set are not provided as separate files. Must be in (0, 1).
cross-validationA cross validation is performed. Given thatdataset-nameis provided as the value of the argument--dataset, the data for individual folds must be stored in files nameddataset-name_fold-1,dataset-name_fold-2, etc. If no such files are available, the program searches for a file with the namedataset-name.arffand splits it into training and test data for the individual folds automatically. The following options may be specified using the bracket notation:num_folds(Default value =10) The total number of cross validation folds to be performed. Must be at least 2.current_fold(Default value =0) The cross validation fold to be performed. Must be in [1,num_folds] or 0, if all folds should be performed.
noneThe available data is not split into separate training and test sets, but the entire data is used for training and evaluation. This strategy should only be used for testing purposes, as the evaluation results will be highly biased and overly optimistic. Given thatdataset-nameis provided as the value of the argument--dataset, the data must be stored in a file nameddataset-name.arff.
--evaluate-training-data(Default value =false)trueThe models are not only evaluated on the test data, but also on the training data.falseThe models are only evaluated on the test data.
Types of Predictions¶
--prediction-type(Default value =binary)scoresThe learner is instructed to predict scores. In this case, ranking measures are used for evaluation.probabilitiesThe learner is instructed to predict probability estimates. In this case, ranking measures are used for evaluation.binaryThe learner is instructed to predict binary labels. In this case, bi-partition evaluation measures are used for evaluation.
Incremental Evaluation¶
--incremental-evaluation(Default value =false)trueEnsemble models are evaluated repeatedly, using only a subset of their ensemble members with increasing size, e.g., the first 100, 200, … rules.min_size(Default value =0) The minimum number of ensemble members to be evaluated. Must be at least 0.max_size(Default value =0) The maximum number of ensemble members to be evaluated. Must be greater thanmin_sizeor 0, if all ensemble members should be evaluated.step_size(Default value =1) The number of additional ensemble members to be evaluated at each repetition. Must be at least 1.
falseModels are evaluated only once as a whole.
Data Pre-Processing¶
A more detailed description of the following arguments can be found here.
Depending on the characteristics of a dataset, it might be desirable to apply one of the following pre-processing techniques before training and evaluating machine learning models.
One-Hot-Encoding¶
--one-hot-encoding(Default value =false)trueOne-hot-encoding is used to encode nominal features.falseThe algorithm’s ability to natively handle nominal features is used.
Saving and Loading Models¶
A more detailed description of the following arguments can be found here.
Because the training of models can be time-consuming, it might be desirable to store them on disk for later use. This requires to specify the path of a directory where models should be saved.
--model-dir(Default value =None)An absolute or relative path to the directory where models should be stored. If such models are found in the specified directory, they are used instead of learning a new model from scratch. If no models are available, the trained models are saved in the specified directory once training has completed.
Saving and Loading Parameters¶
A more detailed description of the following arguments can be found here.
As an alternative to storing the models learned by an algorithm, the algorithmic parameters used for training can be saved to disk. This may help to remember the configuration used for training a model and enables to reload the same parameter setting for additional experiments.
--parameter-dir(Default value =None)An absolute or relative path to the directory where configuration files, which specify the parameters to be used by the algorithm, are located. If such files are found in the specified directory, the specified parameter settings are used instead of the parameters that are provided via command line arguments.
--print-parameters(Default value =false)trueAlgorithmic parameters are printed on the console.falseAlgorithmic parameters are not printed on the console.
--store-parameters(Default value =false)
Output of Experimental Results¶
A more detailed description of the following arguments can be found here.
To provide valuable insights into the models learned by an algorithm, the predictions they provide, or the data they have been derived from, a wide variety of experimental results can be written to output files or printed on the console. If the results should be written to files, it is necessary to specify an output directory:
--output-dirAn absolute or relative path to the directory where experimental results should be saved.
Evaluation Results¶
--print-evaluation(Default value =true)trueThe evaluation results in terms of common metrics are printed on the console. The following options may be specified using the bracket notation:decimals(Default value =2) The number of decimals to be used for evaluation scores or 0, if the number of decimals should not be restricted.percentage(Default value =true)true, if evaluation scores should be given as a percentage, if possible,falseotherwise.enable_all(Default value =true)true, if all supported metrics should be used unless specified otherwise,falseif all metrics should be disabled by default.hamming_loss(Default value =true)true, if evaluation scores according to the Hamming loss should be printed,falseotherwise. Does only have an effect when dealing with multi-label data and if the parameter--prediction-typeis set tolabels.hamming_accuracy(Default value =true)true, if evaluation scores according to the Hamming accuracy metric should be printed,falseotherwise. Does only have an effect when dealing with multi-label data and if the parameter--prediction-typeis set tolabels.subset_zero_one_loss(Default value =true)true, if evaluation scores according to the subset 0/1 loss should be printed,falseotherwise. Does only have an effect when dealing with multi-label data and if the parameter--prediction-typeis set tolabels.subset_accuracy(Default value =true)true, if evaluation scores according to the subset accuracy metric should be printed,falseotherwise. Does only have an effect when dealing with multi-label data and if the parameter--prediction-typeis set tolabels.micro_precision(Default value =true)true, if evaluation scores according to the micro-averaged precision metric should be printed,falseotherwise. Does only have an effect when dealing with multi-label data and if the parameter--prediction-typeis set tolabels.micro_recall(Default value =true)true, if evaluation scores according to the micro-averaged recall metric should be printed,falseotherwise. Does only have an effect when dealing with multi-label data and if the parameter--prediction-typeis set tolabels.micro_f1(Default value =true)true, if evaluation scores according to the micro-averaged F1-measure should be printed,falseotherwise. Does only have an effect when dealing with multi-label data and if the parameter--prediction-typeis set tolabels.micro_jaccard(Default value =true)true, if evaluation scores according to the micro-averaged Jaccard metric should be printed,falseotherwise. Does only have an effect when dealing with multi-label data and if the parameter--prediction-typeis set tolabels.macro_precision(Default value =true)true, if evaluation scores according to the macro-averaged precision metric should be printed,falseotherwise. Does only have an effect when dealing with multi-label data and if the parameter--prediction-typeis set tolabels.macro_recall(Default value =true)true, if evaluation scores according to the macro-averaged recall metric should be printed,falseotherwise. Does only have an effect when dealing with multi-label data and if the parameter--prediction-typeis set tolabels.macro_f1(Default value =true)true, if evaluation scores according to the macro-averaged F1-measure should be printed,falseotherwise. Does only have an effect when dealing with multi-label data and if the parameter--prediction-typeis set tolabels.macro_jaccard(Default value =true)true, if evaluation scores according to the macro-averaged Jaccard metric should be printed,falseotherwise. Does only have an effect when dealing with multi-label data and if the parameter--prediction-typeis set tolabels.example_wise_precision(Default value =true)true, if evaluation scores according to the example-wise precision metric should be printed,falseotherwise. Does only have an effect when dealing with multi-label data and if the parameter--prediction-typeis set tolabels.example_wise_recall(Default value =true)true, if evaluation scores according to the example-wise recall metric should be printed,falseotherwise. Does only have an effect when dealing with multi-label data and if the parameter--prediction-typeis set tolabels.example_wise_f1(Default value =true)true, if evaluation scores according to the example-wise F1-measure should be printed,falseotherwise. Does only have an effect when dealing with multi-label data and if the parameter--prediction-typeis set tolabels.example_wise_jaccard(Default value =true)true, if evaluation scores according to the example-wise Jaccard metric should be printed,falseotherwise. Does only have an effect when dealing with multi-label data and if the parameter--prediction-typeis set tolabels.accuracy(Default value =true)true, if evaluation scores according to the accuracy metric should be printed,falseotherwise. Does only have an effect when dealing with single-label data and if the parameter--prediction-typeis set tolabels.zero_one_loss(Default value =true)true, if evaluation scores according to the 0/1 loss should be printed,falseotherwise. Does only have an effect when dealing with single-label data and if the parameter--prediction-typeis set tolabels.precision(Default value =true)true, if evaluation scores according to the precision metric should be printed,falseotherwise. Does only have an effect when dealing with single-label data and if the parameter--prediction-typeis set tolabels.recall(Default value =true)true, if evaluation scores according to the recall metric should be printed,falseotherwise. Does only have an effect when dealing with single-label data and if the parameter--prediction-typeis set tolabels.f1(Default value =true)true, if evaluation scores according to the F1-measure should be printed,falseotherwise. Does only have an effect when dealing with single-label data and if the parameter--prediction-typeis set tolabels.jaccard(Default value =true)true, if evaluation scores according to the Jaccard metric should be printed,falseotherwise. Does only have an effect when dealing with single-label data and if the parameter--prediction-typeis set tolabels.mean_absolute_error(Default value =true)true, if evaluation scores according to the mean absolute error metric should be printed,falseotherwise. Does only have an effect if the parameter--prediction-typeis set toprobabilitiesorscores.mean_squared_error(Default value =true)true, if evaluation scores according to the mean squared error metric should be printed,falseotherwise. Does only have an effect if the parameter--prediction-typeis set toprobabilitiesorscores.mean_absolute_error(Default value =true)true, if evaluation scores according to the mean absolute error metric should be printed,falseotherwise. Does only have an effect if the parameter--prediction-typeis set toprobabilitiesorscores.mean_absolute_percentage_error(Default value =true)true, if evaluation scores according to the mean absolute percentage error metric should be printed,falseotherwise. Does only have an effect if the parameter--prediction-typeis set toprobabilitiesorscores.rank_loss(Default value =true)true, if evaluation scores according to the rank loss should be printed,falseotherwise. Does only have an effect when dealing with multi-label data and if the parameter--prediction-typeis set toprobabilitiesorscores.coverage_error(Default value =true)true, if evaluation scores according to the coverage error metric should be printed,falseotherwise. Does only have an effect when dealing with multi-label data and if the parameter--prediction-typeis set toprobabilitiesorscores.lrap(Default value =true)true, if evaluation scores according to the label ranking average precision metric should be printed,falseotherwise. Does only have an effect when dealing with multi-label data and if the parameter--prediction-typeis set toprobabilitiesorscores.dcg(Default value =true)true, if evaluation scores according to the discounted cumulative gain metric should be printed,falseotherwise. Does only have an effect when dealing with multi-label data and if the parameter--prediction-typeis set toprobabilitiesorscores.ndcg(Default value =true)true, if evaluation scores according to the normalized discounted cumulative gain metric should be printed,falseotherwise. Does only have an effect when dealing with multi-label data and if the parameter--prediction-typeis set toprobabilitiesorscores.
falseThe evaluation results are not printed on the console.
--store-evaluation(Default value =true)trueThe evaluation results in terms of common metrics are written into .csv files. Does only have an effect if the parameter--output-diris specified.decimals(Default value =0) The number of decimals to be used for evaluation scores or 0, if the number of decimals should not be restricted.percentage(Default value =true)true, if evaluation scores should be given as a percentage, if possible,falseotherwise.enable_all(Default value =true)true, if all supported metrics should be used unless specified otherwise,falseif all metrics should be disabled by default.hamming_loss(Default value =true)true, if evaluation scores according to the Hamming loss should be stored,falseotherwise. Does only have an effect when dealing with multi-label data and if the parameter--prediction-typeis set tolabels.hamming_accuracy(Default value =true)true, if evaluation scores according to the Hamming accuracy metric should be stored,falseotherwise. Does only have an effect when dealing with multi-label data and if the parameter--prediction-typeis set tolabels.subset_zero_one_loss(Default value =true)true, if evaluation scores according to the subset 0/1 loss should be stored,falseotherwise. Does only have an effect when dealing with multi-label data and if the parameter--prediction-typeis set tolabels.subset_accuracy(Default value =true)true, if evaluation scores according to the subset accuracy metric should be stored,falseotherwise. Does only have an effect when dealing with multi-label data and if the parameter--prediction-typeis set tolabels.micro_precision(Default value =true)true, if evaluation scores according to the micro-averaged precision metric should be stored,falseotherwise. Does only have an effect when dealing with multi-label data and if the parameter--prediction-typeis set tolabels.micro_recall(Default value =true)true, if evaluation scores according to the micro-averaged recall metric should be stored,falseotherwise. Does only have an effect when dealing with multi-label data and if the parameter--prediction-typeis set tolabels.micro_f1(Default value =true)true, if evaluation scores according to the micro-averaged F1-measure should be stored,falseotherwise. Does only have an effect when dealing with multi-label data and if the parameter--prediction-typeis set tolabels.micro_jaccard(Default value =true)true, if evaluation scores according to the micro-averaged Jaccard metric should be stored,falseotherwise. Does only have an effect when dealing with multi-label data and if the parameter--prediction-typeis set tolabels.macro_precision(Default value =true)true, if evaluation scores according to the macro-averaged precision metric should be stored,falseotherwise. Does only have an effect when dealing with multi-label data and if the parameter--prediction-typeis set tolabels.macro_recall(Default value =true)true, if evaluation scores according to the macro-averaged recall metric should be stored,falseotherwise. Does only have an effect when dealing with multi-label data and if the parameter--prediction-typeis set tolabels.macro_f1(Default value =true)true, if evaluation scores according to the macro-averaged F1-measure should be stored,falseotherwise. Does only have an effect when dealing with multi-label data and if the parameter--prediction-typeis set tolabels.macro_jaccard(Default value =true)true, if evaluation scores according to the macro-averaged Jaccard metric should be stored,falseotherwise. Does only have an effect when dealing with multi-label data and if the parameter--prediction-typeis set tolabels.example_wise_precision(Default value =true)true, if evaluation scores according to the example-wise precision metric should be stored,falseotherwise. Does only have an effect when dealing with multi-label data and if the parameter--prediction-typeis set tolabels.example_wise_recall(Default value =true)true, if evaluation scores according to the example-wise recall metric should be stored,falseotherwise. Does only have an effect when dealing with multi-label data and if the parameter--prediction-typeis set tolabels.example_wise_f1(Default value =true)true, if evaluation scores according to the example-wise F1-measure should be stored,falseotherwise. Does only have an effect when dealing with multi-label data and if the parameter--prediction-typeis set tolabels.example_wise_jaccard(Default value =true)true, if evaluation scores according to the example-wise Jaccard metric should be stored,falseotherwise. Does only have an effect when dealing with multi-label data and if the parameter--prediction-typeis set tolabels.accuracy(Default value =true)true, if evaluation scores according to the accuracy metric should be stored,falseotherwise. Does only have an effect when dealing with single-label data and if the parameter--prediction-typeis set tolabels.zero_one_loss(Default value =true)true, if evaluation scores according to the 0/1 loss should be stored,falseotherwise. Does only have an effect when dealing with single-label data and if the parameter--prediction-typeis set tolabels.precision(Default value =true)true, if evaluation scores according to the precision metric should be stored,falseotherwise. Does only have an effect when dealing with single-label data and if the parameter--prediction-typeis set tolabels.recall(Default value =true)true, if evaluation scores according to the recall metric should be stored,falseotherwise. Does only have an effect when dealing with single-label data and if the parameter--prediction-typeis set tolabels.f1(Default value =true)true, if evaluation scores according to the F1-measure should be stored,falseotherwise. Does only have an effect when dealing with single-label data and if the parameter--prediction-typeis set tolabels.jaccard(Default value =true)true, if evaluation scores according to the Jaccard metric should be stored,falseotherwise. Does only have an effect when dealing with single-label data and if the parameter--prediction-typeis set tolabels.mean_absolute_error(Default value =true)true, if evaluation scores according to the mean absolute error metric should be stored,falseotherwise. Does only have an effect if the parameter--prediction-typeis set toprobabilitiesorscores.mean_squared_error(Default value =true)true, if evaluation scores according to the mean squared error metric should be stored,falseotherwise. Does only have an effect if the parameter--prediction-typeis set toprobabilitiesorscores.mean_absolute_error(Default value =true)true, if evaluation scores according to the mean absolute error metric should be stored,falseotherwise. Does only have an effect if the parameter--prediction-typeis set toprobabilitiesorscores.mean_absolute_percentage_error(Default value =true)true, if evaluation scores according to the mean absolute percentage error metric should be stored,falseotherwise. Does only have an effect if the parameter--prediction-typeis set toprobabilitiesorscores.rank_loss(Default value =true)true, if evaluation scores according to the rank loss should be stored,falseotherwise. Does only have an effect when dealing with multi-label data and if the parameter--prediction-typeis set toprobabilitiesorscores.coverage_error(Default value =true)true, if evaluation scores according to the coverage error metric should be stored,falseotherwise. Does only have an effect when dealing with multi-label data and if the parameter--prediction-typeis set toprobabilitiesorscores.lrap(Default value =true)true, if evaluation scores according to the label ranking average precision metric should be stored,falseotherwise. Does only have an effect when dealing with multi-label data and if the parameter--prediction-typeis set toprobabilitiesorscores.dcg(Default value =true)true, if evaluation scores according to the discounted cumulative gain metric should be stored,falseotherwise. Does only have an effect when dealing with multi-label data and if the parameter--prediction-typeis set toprobabilitiesorscores.ndcg(Default value =true)true, if evaluation scores according to the normalized discounted cumulative gain metric should be stored,falseotherwise. Does only have an effect when dealing with multi-label data and if the parameter--prediction-typeis set toprobabilitiesorscores.training_time(Default value =true)true, if the time that was needed for training should be stored,falseotherwise.prediction_time(Default value =true)true, if the time that was needed for prediction should be stored,falseotherwise.
falseThe evaluation results are not written into .csv files.
Predictions¶
--print-predictions(Default value =false)trueThe predictions for individual examples and outputs are printed on the console.decimals(Default value =2) The number of decimals to be used for real-valued predictions or 0, if the number of decimals should not be restricted.
falseThe predictions are not printed on the console.
--store-predictions(Default value =false)trueThe predictions for individual examples and outputs are written into ARFF files. Does only have an effect if the parameter--output-diris specified.decimals(Default value =0) The number of decimals to be used for real-valued predictions or 0, if the number of decimals should not be restricted.
falsePredictions are not written into ARFF files.
Prediction Characteristics¶
--print-prediction-characteristics(Default value =false)trueThe characteristics of binary predictions are printed on the console. Does only have an effect if the parameter--predict-probabilitiesis set tofalse.decimals(Default value =2) The number of decimals to be used for characteristics or 0, if the number of decimals should not be restricted.percentage(Default value =true)true, if the characteristics should be given as a percentage, if possible,falseotherwise.outputs(Default value =true)true, if the number of outputs should be printed,falseotherwise.output_density(Default value =true)true, if the density of the ground truth matrix should be printed,falseotherwise.output_sparsity(Default value =true)true, if the sparsity of the ground truth matrix should be printed,falseotherwise.label_imbalance_ratio(Default value =true, classification only)true, if the label imbalance ratio should be printed,falseotherwise.label_cardinality(Default value =true, classification only)true, if the average label cardinality should be printed,falseotherwise.distinct_label_vectors(Default value =true, classification only)true, if the number of distinct label vectors should be printed,falseotherwise.
falseThe characteristics of predictions are not printed on the console.
--store-prediction-characteristics(Default value =false)trueThe characteristics of binary predictions are written into .csv files. Does only have an effect if the parameter--predict-probabilitiesis set tofalse.decimals(Default value =0) The number of decimals to be used for characteristics or 0, if the number of decimals should not be restricted.percentage(Default value =true)true, if the characteristics should be given as a percentage, if possible,falseotherwise.outputs(Default value =true)true, if the number of outputs should be stored,falseotherwise.output_density(Default value =true)true, if the density of the ground truth matrix should be stored,falseotherwise.output_sparsity(Default value =true)true, if the sparsity of the ground truth matrix should be stored,falseotherwise.label_imbalance_ratio(Default value =true, classification only)true, if the label imbalance ratio should be stored,falseotherwise.label_cardinality(Default value =true, classification only)true, if the average label cardinality should be stored,falseotherwise.distinct_label_vectors(Default value =true, classification only)true, if the number of distinct label vectors should be stored,falseotherwise.
falseThe characteristics of predictions are not written into .csv files.
Data Characteristics¶
--print-data-characteristics(Default value =false)trueThe characteristics of the training data set are printed on the consoledecimals(Default value =2) The number of decimals to be used for characteristics or 0, if the number of decimals should not be restricted.percentage(Default value =true)true, if the characteristics should be given as a percentage, if possible,falseotherwise.outputs(Default value =true)true, if the number of outputs should be printed,falseotherwise.output_density(Default value =true)true, if the density of the ground truth matrix should be printed,falseotherwise.output_sparsity(Default value =true)true, if the sparsity of the ground truth matrix should be printed,falseotherwise.label_imbalance_ratio(Default value =true, classification only)true, if the label imbalance ratio should be printed,falseotherwise.label_cardinality(Default value =true, classification only)true, if the average label cardinality should be printed,falseotherwise.distinct_label_vectors(Default value =true, classification only)true, if the number of distinct label vectors should be printed,falseotherwise.examples(Default value =true)true, if the number of examples should be printed,falseotherwise.features(Default value =true)true, if the number of features should be printed,falseotherwise.numerical_features(Default value =true)true, if the number of numerical features should be printed,falseotherwise.nominal_features(Default value =true)true, if the number of nominal features should be printed,falseotherwise.feature_density(Default value =true)true, if the feature density should be printed,falseotherwise.feature_sparsity(Default value =true)true, if the feature sparsity should be printed,falseotherwise.
falseThe characteristics of the training data set are not printed on the console
--store-data-characteristics(Default value =false)trueThe characteristics of the training data set are written into a .csv file. Does only have an effect if the parameter--output-diris specified.decimals(Default value =0) The number of decimals to be used for characteristics or 0, if the number of decimals should not be restricted.percentage(Default value =true)true, if the characteristics should be given as a percentage, if possible,falseotherwise.outputs(Default value =true)true, if the number of outputs should be stored,falseotherwise.output_density(Default value =true)true, if the density of the ground truth matrix should be stored,falseotherwise.output_sparsity(Default value =true)true, if the sparsity of the ground truth matrix should be stored,falseotherwise.label_imbalance_ratio(Default value =true, classification only)true, if the label imbalance ratio should be stored,falseotherwise.label_cardinality(Default value =true, classification only)true, if the average label cardinality should be stored,falseotherwise.distinct_label_vectors(Default value =true, classification only)true, if the number of distinct label vectors should be stored,falseotherwise.examples(Default value =true)true, if the number of examples should be stored,falseotherwise.features(Default value =true)true, if the number of features should be stored,falseotherwise.numerical_features(Default value =true)true, if the number of numerical features should be stored,falseotherwise.nominal_features(Default value =true)true, if the number of nominal features should be stored,falseotherwise.feature_density(Default value =true)true, if the feature density should be stored,falseotherwise.feature_sparsity(Default value =true)true, if the feature sparsity should be stored,falseotherwise.
falseThe characteristics of the training data set are not written into a .csv file.
Label Vectors¶
--print-label-vectors(Default value =false, classification only)trueThe unique label vectors contained in the training data are printed on the console. The following options may be specified using the bracket notation:sparse(Default value =false)true, if a sparse representation of label vectors should be used,falseotherwise.
falseThe unique label vectors contained in the training data are not printed on the console.
--store-label-vectors(Default value =false, classification only)trueThe unique label vectors contained in the training data are written into a .csv file. Does only have an effect if the parameter`--output-diris specified. The following options may be specified using the bracket notation:sparse(Default value =false)true, if a sparse representation of label vectors should be used,falseotherwise.
falseThe unique label vectors contained in the training data are not written into a .csv file.
Model Characteristics¶
--print-model-characteristics(Default value =false)trueThe characteristics of rule models are printed on the consolefalseThe characteristics of rule models are not printed on the console
--store-model-characteristics(Default value =false)
Rules¶
--print-rules(Default value =false)trueThe induced rules are printed on the console. The following options may be specified using the bracket notation:print_feature_names(Default value =true)true, if the names of features should be printed instead of their indices,falseotherwise.print_output_names(Default value =true)true, if the names of outputs should be printed instead of their indices,falseotherwise.print_nominal_values(Default value =true)true, if the names of nominal values should be printed instead of their numerical representation,falseotherwise.print_bodies(Default value =true)true, if the bodies of rules should be printed,falseotherwise.print_heads(Default value =true)true, if the heads of rules should be printed,falseotherwise.decimals_body(Default value =2) The number of decimals to be used for numerical thresholds of conditions in a rule’s body or 0, if the number of decimals should not be restricted.decimals_head(Default value =2) The number of decimals to be used for predictions in a rule’s head or 0, if the number of decimals should not be restricted.
falseThe induced rules are not printed on the console.
--store-rules(Default value =false)trueThe induced rules are written into a TXT file. Does only have an effect if the parameter--output-diris specified. The following options may be specified using the bracket notation:print_feature_names(Default value =true)true, if the names of features should be printed instead of their indices,falseotherwise.print_output_names(Default value =true)true, if the names of outputs should be printed instead of their indices,falseotherwise.print_nominal_values(Default value =true)true, if the names of nominal values should be printed instead of their numerical representation,falseotherwise.print_bodies(Default value =true)true, if the bodies of rules should be printed,falseotherwise.print_heads(Default value =true)true, if the heads of rules should be printed,falseotherwise.decimals_body(Default value =2) The number of decimals to be used for numerical thresholds of conditions in a rule’s body or 0, if the number of decimals should not be restricted.decimals_head(Default value =2) The number of decimals to be used for predictions in a rule’s head or 0, if the number of decimals should not be restricted.
falseThe induced rules are not written into a TXT file.
Probability Calibration Models¶
--print-marginal-probability-calibration-model(Default value =false)trueThe model for the calibration of marginal probabilities is printed on the console. The following options may be specified using the bracket notation:decimals(Default value =2) The number of decimals to be used for thresholds and probabilities or 0, if the number of decimals should not be restricted.
falseThe model for the calibration of marginal probabilities is not printed on the console.
--store-marginal-probability-calibration-model(Default value =false)trueThe model for the calibration of marginal probabilities is written into a .csv file. Does only have an effect if the parameter--output-diris specified. The following options may be specified using the bracket notation:decimals(Default value =0) The number of decimals to be used for thresholds and probabilities or 0, if the number of decimals should not be restricted.
falseThe model for the calibration of marginal probabilities is not written into a .csv file.
--print-joint-probability-calibration-model(Default value =false)trueThe model for the calibration of joint probabilities is printed on the console. The following options may be specified using the bracket notation:decimals(Default value =2) The number of decimals to be used for thresholds and probabilities or 0, if the number of decimals should not be restricted.
falseThe model for the calibration of joint probabilities is not printed on the console.
--store-joint-probability-calibration-model(Default value =false)trueThe model for the calibration of joint probabilities is written into a .csv file. Does only have an effect if the parameter--output-diris specified. The following options may be specified using the bracket notation:decimals(Default value =2) The number of decimals to be used for thresholds and probabilities or 0, if the number of decimals should not be restricted.
falseThe model for the calibration of joint probabilities is not written into a .csv file.
Setting Algorithmic Parameters¶
In addition to the command line arguments that are discussed above, it is often desirable to not rely on the default configuration of the BOOMER algorithm in an experiment, but to use a custom configuration. For this purpose, all the algorithmic parameters that are discussed in the section Overview of Parameters may be set by providing corresponding arguments to the command line API.
In accordance with the syntax that is typically used by command line programs, the parameter names must be given according to the following syntax that slightly differs from the names that are used by the programmatic Python API:
All argument names must start with two leading dashes (
--).Underscores (
_) must be replaced with dashes (-).
For example, the value of the parameter feature_binning may be set as follows:
testbed mlrl.boosting \
--data-dir /path/to/datasets/ \
--dataset dataset-name \
--feature-binning equal-width
testbed mlrl.seco \
--data-dir /path/to/datasets/ \
--dataset dataset-name \
--feature-binning equal-width
Some algorithmic parameters, including the parameter feature_binning, allow to specify additional options as key-value pairs by using a bracket notation. This is also supported by the command line API, where the options may not contain any spaces and special characters like { or } must be escaped by using single-quotes ('):
testbed mlrl.boosting\
--data-dir /path/to/datasets/ \
--dataset dataset-name \
--feature-binning equal-width'{bin_ratio=0.33,min_bins=2,max_bins=64}'
testbed mlrl.seco \
--data-dir /path/to/datasets/ \
--dataset dataset-name \
--feature-binning equal-width'{bin_ratio=0.33,min_bins=2,max_bins=64}'