Using the Command Line API

As an alternative to using the algorithms provided by this project in your own Python program (see Using the Python API), you can use the command line API provided by the package mlrl-testbed (see Installation) to run experiments without writing any code. It currently provides the following functionality:

  • The predictive performance in terms of commonly used evaluation measures can be assessed by using predefined splits of a dataset into training and test data or via cross validation.

  • Experimental results can be written into output files. This includes evaluation scores, the predictions of a model, textual representations of rules, as well as the characteristics of models or datasets.

  • Models can be stored on disk and reloaded for later use.

Running Experiments

The following minimal working example shows how to use the command line API to apply the BOOMER algorithm or the SeCO algorithm to a particular dataset:

boomer --data-dir /path/to/datasets/ --dataset dataset-name
seco --data-dir /path/to/datasets/ --dataset dataset-name

Both arguments included in the above commands are mandatory:

  • --data-dir An absolute or relative path to the directory where the dataset files are located.

  • --dataset The name of the dataset files (without suffix).

The program expects the dataset files to be provided in the Mulan format. It requires two files to be present in the specified directory:

  1. An .arff file that specifies the feature values and ground truth labels of the training examples.

  2. An .xml file that specifies the names of the labels.
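
Before starting an experiment, it can be useful to check that both files are in place. The following is a minimal sketch of such a check, not part of the package itself. It assumes that the files are named after the value passed to --dataset, followed by the suffixes .arff and .xml; the helper name missing_dataset_files is hypothetical.

```python
from pathlib import Path
from typing import List


def missing_dataset_files(data_dir: str, dataset: str) -> List[str]:
    """Return the names of the expected dataset files that do not exist.

    Assumption: the files are named "<dataset>.arff" and "<dataset>.xml".
    """
    directory = Path(data_dir)
    expected = [f"{dataset}.arff", f"{dataset}.xml"]
    # Keep only the file names that cannot be found in the data directory
    return [name for name in expected if not (directory / name).is_file()]


if __name__ == "__main__":
    missing = missing_dataset_files("/path/to/datasets/", "dataset-name")
    if missing:
        print("Missing files: " + ", ".join(missing))
    else:
        print("All dataset files are present.")
```

The helper only reports which files are missing and leaves it to the caller to decide how to proceed.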

The Mulan dataset format is commonly used for benchmark datasets that allow the performance of different machine learning approaches to be compared in empirical studies. A collection of publicly available benchmark datasets can be found here.

If an .xml file is not provided, the program tries to retrieve the number of labels from the @relation declaration contained in the .arff file, as intended by the MEKA project’s dataset format. According to the MEKA format, the number of labels may be specified by including the substring “-C L” in the @relation name, where “L” is the number of leading attributes in the dataset that should be treated as labels.
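
To illustrate how the “-C L” convention can be read from an .arff file, here is a minimal sketch. It is not the program's actual implementation. The function names are hypothetical, and the sketch assumes that “L” is written as a non-negative integer, as described above.

```python
import re
from typing import Optional


def find_relation_name(arff_text: str) -> Optional[str]:
    """Return the name given in the @relation declaration, if there is one."""
    for line in arff_text.splitlines():
        stripped = line.strip()
        # ARFF keywords are matched case-insensitively in this sketch
        if stripped.lower().startswith("@relation"):
            name = stripped[len("@relation"):].strip()
            return name.strip("'\"")  # drop surrounding quotes, if any
    return None


def parse_num_labels(relation_name: str) -> Optional[int]:
    """Extract L from a substring "-C L", or return None if it is absent."""
    match = re.search(r"-C\s+(\d+)", relation_name)
    return int(match.group(1)) if match else None


if __name__ == "__main__":
    example = "@relation 'example: -C 3'\n@attribute label1 {0,1}\n"
    name = find_relation_name(example)
    print(parse_num_labels(name) if name is not None else None)
```

With the illustrative declaration used in the example, the first three attributes of the dataset would be treated as labels.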

Optional Arguments

In addition to the mandatory arguments that specify the dataset used for training, a wide variety of optional arguments is available for customizing the program’s behavior. An overview of all available command line arguments is provided in the section Overview of Arguments. For example, an output directory where experimental results should be stored can be specified as follows:

boomer --data-dir /path/to/datasets/ --dataset dataset-name --output-dir /path/to/output/
seco --data-dir /path/to/datasets/ --dataset dataset-name --output-dir /path/to/output/

Algorithmic parameters that control the behavior of the machine learning algorithm can also be set via command line arguments. For example, as described in the section Setting Algorithmic Parameters, the value of the parameter feature_binning can be specified as follows:

boomer --data-dir /path/to/datasets/ --dataset dataset-name --feature-binning equal-width
seco --data-dir /path/to/datasets/ --dataset dataset-name --feature-binning equal-width

Some algorithmic parameters, including the parameter feature_binning, come with additional options in the form of key-value pairs. These options can be specified by using the Bracket Notation, as shown below:

boomer --data-dir /path/to/datasets/ --dataset dataset-name --feature-binning equal-width'{bin_ratio=0.33,min_bins=2,max_bins=64}'
seco --data-dir /path/to/datasets/ --dataset dataset-name --feature-binning equal-width'{bin_ratio=0.33,min_bins=2,max_bins=64}'
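
When such commands are assembled from a script, building the argument list programmatically avoids shell quoting mistakes around the braces. The following is a minimal sketch that uses only the arguments shown above. The helper names bracket_notation and build_command are hypothetical and not part of the package.

```python
import shlex
from typing import Dict, List, Optional


def bracket_notation(value: str, options: Optional[Dict[str, object]] = None) -> str:
    """Format a parameter value and its options as 'value{key1=value1,...}'."""
    if not options:
        return value
    pairs = ",".join(f"{key}={val}" for key, val in options.items())
    return f"{value}{{{pairs}}}"


def build_command(program: str, data_dir: str, dataset: str,
                  feature_binning: str) -> List[str]:
    """Assemble the argument list for one experiment."""
    return [
        program,
        "--data-dir", data_dir,
        "--dataset", dataset,
        "--feature-binning", feature_binning,
    ]


binning = bracket_notation(
    "equal-width", {"bin_ratio": 0.33, "min_bins": 2, "max_bins": 64})
args = build_command("boomer", "/path/to/datasets/", "dataset-name", binning)

# Print a shell-safe representation of the command (requires Python 3.8+)
print(shlex.join(args))
```

Because each argument is a separate list element, the list could be passed to subprocess.run without additional quoting, assuming the boomer executable is available on the PATH.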

Bracket Notation

Each algorithmic parameter is identified by a unique name. Depending on its type, a parameter accepts either a number or a string from a predefined set of possible values (boolean values are also represented as strings).

In addition to the specified value, some parameters allow additional options to be provided as key-value pairs. These options must be provided by using the following bracket notation:

'value{key1=value1,key2=value2}'

For example, the parameter feature_binning accepts additional options and may be configured as follows:

'equal-width{bin_ratio=0.33,min_bins=2,max_bins=64}'
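
To make the structure of the notation concrete, the following minimal sketch splits such a string into the value and its options. It is an illustration only and does not reproduce the parser used by the command line API. The function name is hypothetical, all option values are kept as strings, and nested braces are not handled.

```python
from typing import Dict, Tuple


def parse_bracket_notation(text: str) -> Tuple[str, Dict[str, str]]:
    """Split 'value{key1=value1,key2=value2}' into the value and its options."""
    value, brace, rest = text.partition("{")
    if not brace:
        # No options were given, so the whole string is the value
        return text, {}
    if not rest.endswith("}"):
        raise ValueError(f"Missing closing brace in '{text}'")

    options: Dict[str, str] = {}
    for pair in filter(None, rest[:-1].split(",")):
        key, separator, option_value = pair.partition("=")
        if not separator or not key.strip():
            raise ValueError(f"Expected key=value, but got '{pair}'")
        options[key.strip()] = option_value.strip()
    return value, options


if __name__ == "__main__":
    value, options = parse_bracket_notation(
        "equal-width{bin_ratio=0.33,min_bins=2,max_bins=64}")
    print(value)
    print(options)
```

Converting the option values to numbers or booleans is left to the caller, because the expected type depends on the individual option.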