mlrl.testbed package

Submodules

mlrl.testbed.args module

Author: Michael Rapp (michael.rapp.ml@gmail.com)

Provides functions for parsing command line arguments.

class mlrl.testbed.args.LogLevel(value)

Bases: enum.Enum

An enumeration of the supported log levels.

CRITICAL = 'critical'
DEBUG = 'debug'
ERROR = 'error'
FATAL = 'fatal'
INFO = 'info'
NOTSET = 'notset'
WARN = 'warn'
WARNING = 'warning'
mlrl.testbed.args.add_learner_arguments(parser: argparse.ArgumentParser)
mlrl.testbed.args.add_log_level_argument(parser: argparse.ArgumentParser)
mlrl.testbed.args.add_random_state_argument(parser: argparse.ArgumentParser)
mlrl.testbed.args.add_rule_learner_arguments(parser: argparse.ArgumentParser)
mlrl.testbed.args.boolean_string(s)
mlrl.testbed.args.current_fold_string(s)
mlrl.testbed.args.log_level(s)
mlrl.testbed.args.optional_string(s)
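The functions above are listed without descriptions. The following sketch shows one plausible way to turn the documented LogLevel values into a command line argument. It assumes that a log level string should be mapped to the constant of the same name in Python's logging module. The converter `parse_log_level` is hypothetical and is not the implementation of `log_level(s)`.

```python
import argparse
import logging
from enum import Enum


class LogLevel(Enum):
    # Values copied from mlrl.testbed.args.LogLevel as documented above
    CRITICAL = 'critical'
    DEBUG = 'debug'
    ERROR = 'error'
    FATAL = 'fatal'
    INFO = 'info'
    NOTSET = 'notset'
    WARN = 'warn'
    WARNING = 'warning'


def parse_log_level(s: str) -> int:
    """Hypothetical argparse type converter: maps a LogLevel value to a logging constant."""
    try:
        level = LogLevel(s.lower())
    except ValueError:
        valid = ', '.join(member.value for member in LogLevel)
        raise argparse.ArgumentTypeError(f'Invalid log level "{s}". Must be one of {valid}')
    # Every member name is also the name of a constant in the logging module
    return getattr(logging, level.name)


parser = argparse.ArgumentParser()
parser.add_argument('--log-level', type=parse_log_level, default=logging.INFO)
args = parser.parse_args(['--log-level', 'debug'])
```

Raising `argparse.ArgumentTypeError` from a `type` callable lets argparse print a regular usage error instead of a stack trace.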

mlrl.testbed.bbc_cv module

Author: Michael Rapp (michael.rapp.ml@gmail.com)

Implements “Bootstrap Bias Corrected Cross Validation” (BBC-CV) for evaluating different configurations of a learner and obtaining unbiased performance estimates (see https://link.springer.com/article/10.1007/s10994-018-5714-4).

class mlrl.testbed.bbc_cv.BbcCv(configurations: List[dict], adapter: mlrl.testbed.bbc_cv.BbcCvAdapter, bootstrapping: mlrl.testbed.bbc_cv.Bootstrapping, learner: mlrl.common.learners.Learner)

Bases: mlrl.testbed.interfaces.Randomized

An implementation of “Bootstrap Bias Corrected Cross Validation” (BBC-CV).

evaluate(observer: mlrl.testbed.bbc_cv.BbcCvObserver)
Parameters

observer – The BbcCvObserver to be used

store_predictions()
class mlrl.testbed.bbc_cv.BbcCvAdapter(data_set: mlrl.testbed.training.DataSet, num_folds: int, model_dir: str)

Bases: mlrl.testbed.training.CrossValidation

An adapter that must be implemented for each type of model to be used with BBC-CV to obtain predictions for given test examples.

fit(x, y)
predict(x)
run()
class mlrl.testbed.bbc_cv.BbcCvObserver

Bases: abc.ABC

A base class for all observers that should be notified about the predictions and ground truth labellings that result from applying the BBC-CV method.

abstract evaluate(configurations: List[dict], meta_data: mlrl.testbed.data.MetaData, ground_truth_tuning: numpy.ndarray, predictions_tuning: numpy.ndarray, ground_truth_test: numpy.ndarray, predictions_test: numpy.ndarray, current_bootstrap: int, num_bootstraps: int)
Parameters
  • configurations – The configurations that have been provided to the BBC-CV method

  • meta_data – The meta data of the data set

  • ground_truth_tuning – The ground truth of the examples that belong to the tuning set

  • predictions_tuning – The predictions for the examples that belong to the tuning set

  • ground_truth_test – The ground truth of the examples that belong to the test set

  • predictions_test – The predictions for the examples that belong to the test set

  • current_bootstrap – The current bootstrap iteration

  • num_bootstraps – The total number of bootstrap iterations

class mlrl.testbed.bbc_cv.Bootstrapping

Bases: mlrl.testbed.interfaces.Randomized

abstract bootstrap(meta_data: mlrl.testbed.data.MetaData, prediction_matrix, ground_truth_matrix, configurations: List[dict], observer: mlrl.testbed.bbc_cv.BbcCvObserver)
class mlrl.testbed.bbc_cv.CV(data_set: mlrl.testbed.training.DataSet, num_folds: int, prediction_matrix, ground_truth_matrix, configurations: List[dict], observer: mlrl.testbed.bbc_cv.BbcCvObserver)

Bases: mlrl.testbed.training.CrossValidation

class mlrl.testbed.bbc_cv.CVBootstrapping(data_set: mlrl.testbed.training.DataSet, num_folds: int)

Bases: mlrl.testbed.bbc_cv.Bootstrapping

bootstrap(meta_data: mlrl.testbed.data.MetaData, prediction_matrix, ground_truth_matrix, configurations: List[dict], observer: mlrl.testbed.bbc_cv.BbcCvObserver)
class mlrl.testbed.bbc_cv.DefaultBbcCvObserver(target_measure, target_measure_is_loss: bool, output_dir: Optional[str] = None)

Bases: mlrl.testbed.bbc_cv.BbcCvObserver

An observer that determines the best configuration per bootstrap iteration and computes the evaluation measures averaged over all iterations.

evaluate(configurations: List[dict], meta_data: mlrl.testbed.data.MetaData, ground_truth_tuning: numpy.ndarray, predictions_tuning: numpy.ndarray, ground_truth_test: numpy.ndarray, predictions_test: numpy.ndarray, current_bootstrap: int, num_bootstraps: int)
Parameters
  • configurations – The configurations that have been provided to the BBC-CV method

  • meta_data – The meta data of the data set

  • ground_truth_tuning – The ground truth of the examples that belong to the tuning set

  • predictions_tuning – The predictions for the examples that belong to the tuning set

  • ground_truth_test – The ground truth of the examples that belong to the test set

  • predictions_test – The predictions for the examples that belong to the test set

  • current_bootstrap – The current bootstrap iteration

  • num_bootstraps – The total number of bootstrap iterations

class mlrl.testbed.bbc_cv.DefaultBootstrapping(num_bootstraps: int)

Bases: mlrl.testbed.bbc_cv.Bootstrapping

bootstrap(meta_data: mlrl.testbed.data.MetaData, prediction_matrix, ground_truth_matrix, configurations: List[dict], observer: mlrl.testbed.bbc_cv.BbcCvObserver)
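The bootstrap step performed by the classes above can be sketched with NumPy, following the procedure from the referenced paper. The out-of-sample predictions of all configurations are pooled. In each iteration, a bootstrap sample serves as the tuning set, the configuration with the lowest loss on it is selected, and only that configuration is evaluated on the out-of-bag examples. The layout of `prediction_matrix` (configurations × examples × labels), the use of Hamming loss as the target measure and the name `bbc_cv_estimate` are assumptions for illustration. They are not the API of this module.

```python
import numpy as np


def hamming_loss(ground_truth: np.ndarray, predictions: np.ndarray) -> float:
    # Fraction of incorrectly predicted labels (assumed target measure, lower is better)
    return float(np.mean(ground_truth != predictions))


def bbc_cv_estimate(ground_truth: np.ndarray, prediction_matrix: np.ndarray,
                    num_bootstraps: int, random_state: int = 1) -> float:
    """
    ground_truth:      shape (num_examples, num_labels)
    prediction_matrix: shape (num_configurations, num_examples, num_labels), pooled
                       out-of-sample predictions obtained via cross validation
    """
    rng = np.random.default_rng(random_state)
    num_examples = ground_truth.shape[0]
    test_losses = []

    for _ in range(num_bootstraps):
        # Tuning set: bootstrap sample drawn with replacement
        tuning_indices = rng.integers(0, num_examples, size=num_examples)
        # Test set: out-of-bag examples that have not been drawn
        test_mask = np.ones(num_examples, dtype=bool)
        test_mask[tuning_indices] = False

        if not test_mask.any():
            continue

        # Select the configuration with the lowest loss on the tuning set
        tuning_losses = [hamming_loss(ground_truth[tuning_indices], predictions[tuning_indices])
                         for predictions in prediction_matrix]
        best = int(np.argmin(tuning_losses))

        # Evaluate only the selected configuration on the out-of-bag examples
        test_losses.append(hamming_loss(ground_truth[test_mask], prediction_matrix[best][test_mask]))

    # Bias-corrected estimate: average over all bootstrap iterations
    return float(np.mean(test_losses))
```

Because the selection step is repeated inside every bootstrap iteration, the optimistic bias of picking the best configuration on the same data is not carried over to the reported estimate.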

mlrl.testbed.data module

Author: Michael Rapp (michael.rapp.ml@gmail.com)

Provides functions for handling multi-label data.

class mlrl.testbed.data.Attribute(attribute_name: str, attribute_type: mlrl.testbed.data.AttributeType, nominal_values: Optional[List[str]] = None)

Bases: object

Represents a numerical or nominal attribute that is contained in a data set.

class mlrl.testbed.data.AttributeType(value)

Bases: enum.Enum

All supported types of attributes.

NOMINAL = 2
NUMERIC = 1
class mlrl.testbed.data.Label(name: str)

Bases: mlrl.testbed.data.Attribute

Represents a label that is contained in a data set.

class mlrl.testbed.data.MetaData(attributes: List[mlrl.testbed.data.Attribute], labels: List[mlrl.testbed.data.Attribute], labels_at_start: bool)

Bases: object

Stores the meta data of a multi-label data set.

get_attribute_indices(attribute_type: Optional[mlrl.testbed.data.AttributeType] = None) -> List[int]

Returns a list that contains the indices of all attributes of a specific type (in ascending order).

Parameters

attribute_type – The type of the attributes whose indices should be returned or None, if all indices should be returned

Returns

A list that contains the indices of all attributes of the given type
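The documented behavior of get_attribute_indices (optional filtering by type, indices in ascending order) can be illustrated with a minimal stand-in. The dataclasses below only mirror the documented constructor signatures; the method body is a hypothetical sketch, assuming that the attributes are stored in a list in their original order.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Optional


class AttributeType(Enum):
    NUMERIC = 1
    NOMINAL = 2


@dataclass
class Attribute:
    attribute_name: str
    attribute_type: AttributeType
    nominal_values: Optional[List[str]] = None


@dataclass
class MetaData:
    attributes: List[Attribute]
    labels: List[Attribute]
    labels_at_start: bool

    def get_attribute_indices(self, attribute_type: Optional[AttributeType] = None) -> List[int]:
        # enumerate() visits the attributes in order, so the indices are ascending
        return [index for index, attribute in enumerate(self.attributes)
                if attribute_type is None or attribute.attribute_type == attribute_type]


meta_data = MetaData(
    attributes=[Attribute('age', AttributeType.NUMERIC),
                Attribute('color', AttributeType.NOMINAL, ['red', 'green']),
                Attribute('height', AttributeType.NUMERIC)],
    labels=[],
    labels_at_start=False,
)
```

For example, the nominal indices returned here are what a one-hot encoder needs to know which columns to encode.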

mlrl.testbed.data.load_data_set(data_dir: str, arff_file_name: str, meta_data: mlrl.testbed.data.MetaData, feature_dtype=numpy.float32, label_dtype=numpy.uint8) -> (scipy.sparse.lil_matrix, scipy.sparse.lil_matrix)

Loads a multi-label data set from an ARFF file given its meta data.

Parameters
  • data_dir – The path of the directory that contains the ARFF file

  • arff_file_name – The name of the ARFF file (including the suffix)

  • meta_data – The meta data

  • feature_dtype – The requested dtype of the feature matrix

  • label_dtype – The requested dtype of the label matrix

Returns

A scipy.sparse.lil_matrix of type feature_dtype, shape (num_examples, num_features), representing the feature values of the examples, as well as a scipy.sparse.lil_matrix of type label_dtype, shape (num_examples, num_labels), representing the corresponding label vectors

mlrl.testbed.data.load_data_set_and_meta_data(data_dir: str, arff_file_name: str, xml_file_name: str, feature_dtype=numpy.float32, label_dtype=numpy.uint8) -> (scipy.sparse.lil_matrix, scipy.sparse.lil_matrix, mlrl.testbed.data.MetaData)

Loads a multi-label data set from an ARFF file and the corresponding Mulan XML file.

Parameters
  • data_dir – The path of the directory that contains the files

  • arff_file_name – The name of the ARFF file (including the suffix)

  • xml_file_name – The name of the XML file (including the suffix)

  • feature_dtype – The requested dtype of the feature matrix

  • label_dtype – The requested dtype of the label matrix

Returns

A scipy.sparse.lil_matrix of type feature_dtype, shape (num_examples, num_features), representing the feature values of the examples, a scipy.sparse.lil_matrix of type label_dtype, shape (num_examples, num_labels), representing the corresponding label vectors, as well as the data set’s meta data

mlrl.testbed.data.one_hot_encode(x, y, meta_data: mlrl.testbed.data.MetaData, encoder=None)

One-hot encodes the nominal attributes contained in a data set, if any.

If the given feature matrix is sparse, it is converted into a dense matrix. The function also returns an updated copy of the given meta data from which the attributes have been removed, because applying one-hot encoding invalidates the original attributes.

Parameters
  • x – A np.ndarray or scipy.sparse matrix, shape (num_examples, num_features), representing the features of the examples in the data set

  • y – A np.ndarray or scipy.sparse matrix, shape (num_examples, num_labels), representing the labels of the examples in the data set

  • meta_data – The meta data of the data set

  • encoder – The ‘ColumnTransformer’ to be used or None, if a new encoder should be created

Returns

A np.ndarray, shape (num_examples, num_encoded_features), representing the encoded features of the given examples, the encoder that has been used, as well as the updated meta data
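The encoding step described above can be sketched with scikit-learn's ColumnTransformer and OneHotEncoder. This assumes that the nominal column indices are already known (for example from get_attribute_indices) and that non-nominal columns are passed through unchanged. The function name `one_hot_encode_nominal` is hypothetical, and the library's choice of encoder options may differ.

```python
import numpy as np
from scipy.sparse import issparse, lil_matrix
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder


def one_hot_encode_nominal(x, nominal_indices, encoder=None):
    # Sparse feature matrices are converted into dense ones
    if issparse(x):
        x = x.toarray()
    if len(nominal_indices) == 0:
        return x, encoder
    if encoder is None:
        # Encode the nominal columns and pass all other columns through unchanged;
        # sparse_threshold=0 forces a dense result
        encoder = ColumnTransformer(
            [('one_hot', OneHotEncoder(handle_unknown='ignore'), list(nominal_indices))],
            remainder='passthrough', sparse_threshold=0)
        encoder.fit(x)
    return encoder.transform(x), encoder


x_train = lil_matrix(np.array([[0, 1.5], [1, 2.5], [2, 3.5]]))
x_train_encoded, encoder = one_hot_encode_nominal(x_train, nominal_indices=[0])
# Reuse the fitted encoder for test data so that both matrices have the same columns
x_test_encoded, _ = one_hot_encode_nominal(np.array([[1, 9.0]]), [0], encoder)
```

Passing the fitted encoder back in is what keeps training and test features aligned; the encoded columns come first, followed by the passed-through ones.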

mlrl.testbed.data.save_arff_file(output_dir: str, arff_file_name: str, x: numpy.ndarray, y: numpy.ndarray, meta_data: mlrl.testbed.data.MetaData)

Saves a multi-label data set to an ARFF file.

Parameters
  • output_dir – The path of the directory where the ARFF file should be saved

  • arff_file_name – The name of the ARFF file (including the suffix)

  • x – A np.ndarray or scipy.sparse matrix, shape (num_examples, num_features), that stores the features of the examples that are contained in the data set

  • y – A np.ndarray or scipy.sparse matrix, shape (num_examples, num_labels), that stores the labels of the examples that are contained in the data set

  • meta_data – The meta data of the data set that should be saved

mlrl.testbed.data.save_data_set(output_dir: str, arff_file_name: str, x: numpy.ndarray, y: numpy.ndarray) -> mlrl.testbed.data.MetaData

Saves a multi-label data set to an ARFF file. All attributes in the data set are considered to be numerical.

Parameters
  • output_dir – The path of the directory where the ARFF file should be saved

  • arff_file_name – The name of the ARFF file (including the suffix)

  • x – A np.ndarray or scipy.sparse matrix, shape (num_examples, num_features), that stores the features of the examples that are contained in the data set

  • y – A np.ndarray or scipy.sparse matrix, shape (num_examples, num_labels), that stores the labels of the examples that are contained in the data set

Returns

The meta data of the data set that has been saved
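To illustrate what such a file may look like, the following sketch writes a small multi-label data set in the ARFF layout used by Mulan: all features are declared numeric and each label is declared as a binary nominal attribute with values {0,1}, placed after the features. The attribute names `feature{i}` and `label{j}`, the relation name and the function `write_multi_label_arff` are hypothetical; the exact output of save_data_set may differ.

```python
import io

import numpy as np


def write_multi_label_arff(out, relation: str, x: np.ndarray, y: np.ndarray):
    # Header: every feature is declared numeric, every label as a binary nominal attribute
    out.write(f'@relation {relation}\n\n')
    for i in range(x.shape[1]):
        out.write(f'@attribute feature{i} numeric\n')
    for j in range(y.shape[1]):
        out.write(f'@attribute label{j} {{0,1}}\n')
    out.write('\n@data\n')
    # One line per example: feature values followed by the label vector
    for features, labels in zip(x, y):
        values = [repr(float(v)) for v in features] + [str(int(v)) for v in labels]
        out.write(','.join(values) + '\n')


buffer = io.StringIO()
write_multi_label_arff(buffer, 'example', np.array([[1.0, 2.0]]), np.array([[1, 0]]))
arff_text = buffer.getvalue()
```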

mlrl.testbed.data.save_data_set_and_meta_data(output_dir: str, arff_file_name: str, xml_file_name: str, x: numpy.ndarray, y: numpy.ndarray) -> mlrl.testbed.data.MetaData

Saves a multi-label data set to an ARFF file and its meta data to an XML file. All attributes in the data set are considered to be numerical.

Parameters
  • output_dir – The path of the directory where the ARFF file and the XML file should be saved

  • arff_file_name – The name of the ARFF file (including the suffix)

  • xml_file_name – The name of the XML file (including the suffix)

  • x – An array of type float, shape (num_examples, num_features), representing the features of the examples that are contained in the data set

  • y – An array of type float, shape (num_examples, num_labels), representing the label vectors of the examples that are contained in the data set

Returns

The meta data of the data set that has been saved

mlrl.testbed.data.save_meta_data(output_dir: str, xml_file_name: str, meta_data: mlrl.testbed.data.MetaData)

Saves the meta data of a multi-label data set to an XML file.

Parameters
  • output_dir – The path of the directory where the XML file should be saved

  • xml_file_name – The name of the XML file (including the suffix)

  • meta_data – The meta data of the data set
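A Mulan XML file lists the names of the labels in a `labels` element that declares the namespace `http://mulan.sourceforge.net/labels`. The sketch below builds such a structure with the standard library's ElementTree and parses it back. The function `labels_to_mulan_xml` is hypothetical and only covers the label names, not any other meta data the library may store.

```python
import xml.etree.ElementTree as ET
from typing import List

# Default namespace used by Mulan label files
MULAN_NAMESPACE = 'http://mulan.sourceforge.net/labels'


def labels_to_mulan_xml(label_names: List[str]) -> str:
    root = ET.Element('labels', xmlns=MULAN_NAMESPACE)
    # One <label name="..."/> element per label, in the order of the label matrix
    for name in label_names:
        ET.SubElement(root, 'label', name=name)
    return ET.tostring(root, encoding='unicode')


xml_string = labels_to_mulan_xml(['amazed', 'happy'])
# When parsing, the default namespace becomes part of each tag name
parsed = ET.fromstring(xml_string)
names = [label.get('name') for label in parsed.findall(f'{{{MULAN_NAMESPACE}}}label')]
```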

mlrl.testbed.data_characteristics module

Author: Michael Rapp (michael.rapp.ml@gmail.com)

Provides functions to determine certain characteristics of multi-label data sets.

class mlrl.testbed.data_characteristics.DataCharacteristics(num_examples: int, num_nominal_features: int, num_numerical_features: int, feature_density: float, num_labels: int, label_density: float, avg_label_imbalance_ratio: float, avg_label_cardinality: float, num_distinct_label_vectors: int)

Bases: object

Stores characteristics of a multi-label data set.

class mlrl.testbed.data_characteristics.DataCharacteristicsCsvOutput(output_dir: str, clear_dir: bool = True)

Bases: mlrl.testbed.data_characteristics.DataCharacteristicsOutput

Writes the characteristics of a data set to a CSV file.

write_data_characteristics(experiment_name: str, characteristics: mlrl.testbed.data_characteristics.DataCharacteristics, total_folds: int, fold: Optional[int] = None)

Writes the characteristics of a data set to the output.

Parameters
  • experiment_name – The name of the experiment

  • characteristics – The characteristics of the data set

  • total_folds – The total number of folds

  • fold – The fold for which the characteristics should be written or None, if no cross validation is used

class mlrl.testbed.data_characteristics.DataCharacteristicsLogOutput

Bases: mlrl.testbed.data_characteristics.DataCharacteristicsOutput

Outputs the characteristics of a data set using the logger.

write_data_characteristics(experiment_name: str, characteristics: mlrl.testbed.data_characteristics.DataCharacteristics, total_folds: int, fold: Optional[int] = None)

Writes the characteristics of a data set to the output.

Parameters
  • experiment_name – The name of the experiment

  • characteristics – The characteristics of the data set

  • total_folds – The total number of folds

  • fold – The fold for which the characteristics should be written or None, if no cross validation is used

class mlrl.testbed.data_characteristics.DataCharacteristicsOutput

Bases: abc.ABC

An abstract base class for all outputs to which the characteristics of a data set may be written.

abstract write_data_characteristics(experiment_name: str, characteristics: mlrl.testbed.data_characteristics.DataCharacteristics, total_folds: int, fold: Optional[int] = None)

Writes the characteristics of a data set to the output.

Parameters
  • experiment_name – The name of the experiment

  • characteristics – The characteristics of the data set

  • total_folds – The total number of folds

  • fold – The fold for which the characteristics should be written or None, if no cross validation is used

class mlrl.testbed.data_characteristics.DataCharacteristicsPrinter(outputs: List[mlrl.testbed.data_characteristics.DataCharacteristicsOutput])

Bases: object

A class that prints the characteristics of data sets.

print(experiment_name: str, x, y, meta_data: mlrl.testbed.data.MetaData, current_fold: int, num_folds: int)
Parameters
  • experiment_name – The name of the experiment

  • x – A numpy.ndarray or scipy.sparse matrix, shape (num_examples, num_features), that stores the feature values

  • y – A numpy.ndarray or scipy.sparse matrix, shape (num_examples, num_labels), that stores the ground truth labels

  • meta_data – The meta data of the data set

  • current_fold – The current fold

  • num_folds – The total number of folds

mlrl.testbed.data_characteristics.density(m) -> float

Calculates and returns the density of a given feature or label matrix.

Parameters

m – A numpy.ndarray or scipy.sparse matrix, shape (num_rows, num_cols), that stores the feature values of training examples or their labels

Returns

The fraction of non-zero elements in the given matrix among all elements
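A minimal sketch of this computation for both dense NumPy arrays and scipy.sparse matrices, using `count_nonzero()` so that explicitly stored zeros in a sparse matrix are not counted. This is an illustration of the documented definition, not necessarily the module's implementation.

```python
import numpy as np
from scipy.sparse import csr_matrix, issparse


def density(m) -> float:
    # count_nonzero() ignores explicitly stored zeros in sparse matrices
    num_non_zero = m.count_nonzero() if issparse(m) else np.count_nonzero(m)
    num_rows, num_cols = m.shape
    return num_non_zero / (num_rows * num_cols)


m = np.array([[1, 0],
              [0, 0]])
dense_density = density(m)
sparse_density = density(csr_matrix(m))
```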

mlrl.testbed.data_characteristics.distinct_label_vectors(y) -> int

Determines and returns the number of distinct label vectors in a label matrix.

Parameters

y – A numpy.ndarray or scipy.sparse matrix, shape (num_examples, num_labels), that stores the labels of training examples

Returns

The number of distinct label vectors in the given matrix
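Counting distinct label vectors amounts to counting the unique rows of the label matrix. The sketch below densifies sparse input for simplicity, which is an assumption that costs memory for large label matrices; the library may handle sparse matrices differently.

```python
import numpy as np
from scipy.sparse import csr_matrix, issparse


def distinct_label_vectors(y) -> int:
    # Densify sparse input for simplicity (memory-intensive for large label matrices)
    y = y.toarray() if issparse(y) else np.asarray(y)
    # np.unique with axis=0 returns the unique rows, i.e., the distinct label vectors
    return int(np.unique(y, axis=0).shape[0])


y = np.array([[1, 0],
              [1, 0],
              [0, 1]])
num_distinct = distinct_label_vectors(y)
num_distinct_sparse = distinct_label_vectors(csr_matrix(y))
```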

mlrl.testbed.data_characteristics.label_cardinality(y) -> float

Calculates and returns the average label cardinality of a given label matrix.

Parameters

y – A numpy.ndarray or scipy.sparse matrix, shape (num_examples, num_labels), that stores the labels of training examples

Returns

The average number of relevant labels per training example
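Since the label cardinality is the average number of relevant labels per example, it equals the number of non-zero entries in the label matrix divided by the number of rows. A short sketch of that computation for dense and sparse input:

```python
import numpy as np
from scipy.sparse import csr_matrix, issparse


def label_cardinality(y) -> float:
    # Total number of relevant labels, divided by the number of examples
    num_relevant = y.count_nonzero() if issparse(y) else np.count_nonzero(y)
    return num_relevant / y.shape[0]


y = np.array([[1, 1],
              [0, 1]])
cardinality = label_cardinality(y)
cardinality_sparse = label_cardinality(csr_matrix(y))
```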

mlrl.testbed.data_characteristics.label_imbalance_ratio(y) -> float

Calculates and returns the average label imbalance ratio of a given label matrix.

Parameters

y – A numpy.ndarray or scipy.sparse matrix, shape (num_examples, num_labels), that stores the labels of training examples

Returns

The label imbalance ratio averaged over the available labels
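A sketch of this measure, assuming the common "mean imbalance ratio" definition: for each label, the number of examples of the most frequent label is divided by the number of examples of that label, and the ratios are averaged. Ignoring labels without any relevant example is an additional assumption to avoid division by zero; the module may handle such labels differently.

```python
import numpy as np
from scipy.sparse import issparse


def label_imbalance_ratio(y) -> float:
    y = y.toarray() if issparse(y) else np.asarray(y)
    # Number of examples for which each label is relevant
    counts = np.count_nonzero(y, axis=0)
    # Assumption: labels that are never relevant are ignored
    counts = counts[counts > 0]
    if counts.size == 0:
        return 0.0
    # Per-label ratio: count of the most frequent label divided by this label's count
    return float(np.mean(counts.max() / counts))


y = np.array([[1, 1],
              [1, 0],
              [1, 0],
              [1, 0]])
imbalance = label_imbalance_ratio(y)
```

Here the first label is relevant for 4 examples and the second for 1, giving per-label ratios of 1 and 4 and an average of 2.5.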

mlrl.testbed.evaluation module

Author: Michael Rapp (michael.rapp.ml@gmail.com)

Provides classes for evaluating the predictions or rankings provided by a multi-label learner according to different measures. The evaluation results can be written to one or several outputs, e.g. to the console or to a file.

class mlrl.testbed.evaluation.AbstractEvaluation(*args: mlrl.testbed.evaluation.EvaluationOutput)

Bases: mlrl.testbed.evaluation.Evaluation

An abstract base class for all classes that evaluate the predictions provided by a classifier or ranker and write the results to one or several outputs.

evaluate(experiment_name: str, meta_data: mlrl.testbed.data.MetaData, predictions, ground_truth, first_fold: int, current_fold: int, last_fold: int, num_folds: int, train_time: float, predict_time: float)

Evaluates the predictions provided by a classifier or ranker.

Parameters
  • experiment_name – The name of the experiment

  • meta_data – The meta data of the data set

  • predictions – The predictions provided by the classifier

  • ground_truth – The true labels

  • first_fold – The first cross validation fold or 0, if no cross validation is used

  • current_fold – The current cross validation fold starting at 0, or 0 if no cross validation is used

  • last_fold – The last cross validation fold or 0, if no cross validation is used

  • num_folds – The total number of cross validation folds or 1, if no cross validation is used

  • train_time – The time needed to train the model

  • predict_time – The time needed to make predictions

class mlrl.testbed.evaluation.ClassificationEvaluation(*args: mlrl.testbed.evaluation.EvaluationOutput)

Bases: mlrl.testbed.evaluation.AbstractEvaluation

Evaluates the predictions of a single- or multi-label classifier according to commonly used bipartition measures.
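Which bipartition measures ClassificationEvaluation reports is not listed here. The sketch below uses scikit-learn to compute three widely used ones on a multi-label indicator matrix: Hamming loss, subset accuracy and micro-averaged F1. The choice of measures and their names in the dictionary are assumptions for illustration.

```python
import numpy as np
from sklearn.metrics import accuracy_score, f1_score, hamming_loss

ground_truth = np.array([[1, 0, 1],
                         [0, 1, 0]])
predictions = np.array([[1, 0, 0],
                        [0, 1, 0]])

scores = {
    # Fraction of incorrectly predicted labels (1 of 6)
    'Hamming loss': hamming_loss(ground_truth, predictions),
    # Fraction of examples whose label vector is predicted exactly (subset accuracy)
    'Subset accuracy': accuracy_score(ground_truth, predictions),
    # F1 computed from TP/FP/FN counts pooled over all labels
    'Micro F1': f1_score(ground_truth, predictions, average='micro'),
}
```

With 2 true positives, no false positives and 1 false negative, micro precision is 1 and micro recall is 2/3, so micro F1 is 0.8.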

class mlrl.testbed.evaluation.Evaluation

Bases: abc.ABC

An abstract base class for all classes that evaluate the predictions provided by a classifier or ranker.

abstract evaluate(experiment_name: str, meta_data: mlrl.testbed.data.MetaData, predictions, ground_truth, first_fold: int, current_fold: int, last_fold: int, num_folds: int, train_time: float, predict_time: float)

Evaluates the predictions provided by a classifier or ranker.

Parameters
  • experiment_name – The name of the experiment

  • meta_data – The meta data of the data set

  • predictions – The predictions provided by the classifier

  • ground_truth – The true labels

  • first_fold – The first cross validation fold or 0, if no cross validation is used

  • current_fold – The current cross validation fold starting at 0, or 0 if no cross validation is used

  • last_fold – The last cross validation fold or 0, if no cross validation is used

  • num_folds – The total number of cross validation folds or 1, if no cross validation is used

  • train_time – The time needed to train the model

  • predict_time – The time needed to make predictions

class mlrl.testbed.evaluation.EvaluationCsvOutput(output_dir: str, clear_dir: bool = True, output_predictions: bool = False, output_individual_folds: bool = True)

Bases: mlrl.testbed.evaluation.EvaluationOutput

Writes evaluation results to CSV files.

write_evaluation_results(experiment_name: str, evaluation_result: mlrl.testbed.evaluation.EvaluationResult, total_folds: int, fold: Optional[int] = None)

Writes an evaluation result to the output.

Parameters
  • experiment_name – The name of the experiment

  • evaluation_result – The evaluation result to be written

  • total_folds – The total number of folds

  • fold – The fold for which the results should be written or None, if no cross validation is used or if the overall results, averaged over all folds, should be written

write_predictions(experiment_name: str, meta_data: mlrl.testbed.data.MetaData, predictions, ground_truth, total_folds: int, fold: Optional[int] = None)

Writes predictions to the output.

Parameters
  • experiment_name – The name of the experiment

  • meta_data – The meta data of the data set

  • predictions – The predictions

  • ground_truth – The ground truth

  • total_folds – The total number of folds

  • fold – The fold for which the predictions should be written or None, if no cross validation is used

class mlrl.testbed.evaluation.EvaluationLogOutput(output_predictions: bool = False, output_individual_folds: bool = True)

Bases: mlrl.testbed.evaluation.EvaluationOutput

Outputs evaluation results using the logger.

write_evaluation_results(experiment_name: str, evaluation_result: mlrl.testbed.evaluation.EvaluationResult, total_folds: int, fold: Optional[int] = None)

Writes an evaluation result to the output.

Parameters
  • experiment_name – The name of the experiment

  • evaluation_result – The evaluation result to be written

  • total_folds – The total number of folds

  • fold – The fold for which the results should be written or None, if no cross validation is used or if the overall results, averaged over all folds, should be written

write_predictions(experiment_name: str, meta_data: mlrl.testbed.data.MetaData, predictions, ground_truth, total_folds: int, fold: Optional[int] = None)

Writes predictions to the output.

Parameters
  • experiment_name – The name of the experiment

  • meta_data – The meta data of the data set

  • predictions – The predictions

  • ground_truth – The ground truth

  • total_folds – The total number of folds

  • fold – The fold for which the predictions should be written or None, if no cross validation is used

class mlrl.testbed.evaluation.EvaluationOutput(output_predictions: bool, output_individual_folds: bool)

Bases: abc.ABC

An abstract base class for all outputs to which evaluation results may be written.

abstract write_evaluation_results(experiment_name: str, evaluation_result: mlrl.testbed.evaluation.EvaluationResult, total_folds: int, fold: Optional[int] = None)

Writes an evaluation result to the output.

Parameters
  • experiment_name – The name of the experiment

  • evaluation_result – The evaluation result to be written

  • total_folds – The total number of folds

  • fold – The fold for which the results should be written or None, if no cross validation is used or if the overall results, averaged over all folds, should be written

abstract write_predictions(experiment_name: str, meta_data: mlrl.testbed.data.MetaData, predictions, ground_truth, total_folds: int, fold: Optional[int] = None)

Writes predictions to the output.

Parameters
  • experiment_name – The name of the experiment

  • meta_data – The meta data of the data set

  • predictions – The predictions

  • ground_truth – The ground truth

  • total_folds – The total number of folds

  • fold – The fold for which the predictions should be written or None, if no cross validation is used

class mlrl.testbed.evaluation.EvaluationResult

Bases: object

Stores the evaluation results according to different measures.

avg(name: str) -> (float, float)

Returns the score and standard deviation according to a specific measure averaged over all available folds.

Parameters

name – The name of the measure

Returns

A tuple consisting of the averaged score and standard deviation

avg_dict() -> Dict
dict(fold: int) -> Dict
get(name: str, fold: int) -> float

Returns the score according to a specific measure and fold.

Parameters
  • name – The name of the measure

  • fold – The fold the score corresponds to

Returns

The score

put(name: str, score: float, fold: int, num_folds: int)

Adds a new score according to a specific measure to the evaluation result.

Parameters
  • name – The name of the measure

  • score – The score according to the measure

  • fold – The fold the score corresponds to

  • num_folds – The total number of cross validation folds
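The interplay of put, get and avg can be illustrated with a small stand-in class that stores one dictionary of scores per fold. The class `EvaluationResultSketch` is hypothetical; in particular, the use of the population standard deviation (NumPy's default, ddof=0) is an assumption.

```python
from typing import Dict, List, Optional, Tuple

import numpy as np


class EvaluationResultSketch:
    """Hypothetical stand-in that stores one score per measure and fold."""

    def __init__(self):
        self.results: Optional[List[Dict[str, float]]] = None

    def put(self, name: str, score: float, fold: int, num_folds: int):
        # Allocate one dictionary per fold when the first score is added
        if self.results is None:
            self.results = [{} for _ in range(num_folds)]
        self.results[fold][name] = score

    def get(self, name: str, fold: int) -> float:
        return self.results[fold][name]

    def avg(self, name: str) -> Tuple[float, float]:
        # Mean and population standard deviation over all folds that contain the measure
        scores = [fold_results[name] for fold_results in self.results if name in fold_results]
        return float(np.mean(scores)), float(np.std(scores))


result = EvaluationResultSketch()
result.put('Hamming loss', 0.1, fold=0, num_folds=2)
result.put('Hamming loss', 0.3, fold=1, num_folds=2)
mean, std = result.avg('Hamming loss')
```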

class mlrl.testbed.evaluation.RankingEvaluation(*args: mlrl.testbed.evaluation.EvaluationOutput)

Bases: mlrl.testbed.evaluation.AbstractEvaluation

Evaluates the predictions of a multi-label ranker according to commonly used ranking measures.
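The ranking measures used by RankingEvaluation are not listed here. As an illustration, scikit-learn provides several common ones that operate on real-valued label scores instead of binary predictions. The selection below is an assumption and may not match the measures the module reports.

```python
import numpy as np
from sklearn.metrics import coverage_error, label_ranking_average_precision_score, label_ranking_loss

ground_truth = np.array([[1, 0, 0],
                         [0, 0, 1]])
# One score per label; a higher score means the label is ranked higher
label_scores = np.array([[0.75, 0.5, 1.0],
                         [1.0, 0.2, 0.1]])

ranking_scores = {
    # Average fraction of (relevant, irrelevant) label pairs that are ordered incorrectly
    'Ranking loss': label_ranking_loss(ground_truth, label_scores),
    # Average depth in the ranking needed to cover all relevant labels
    'Coverage error': coverage_error(ground_truth, label_scores),
    # Label ranking average precision
    'LRAP': label_ranking_average_precision_score(ground_truth, label_scores),
}
```

In the first example, the relevant label is ranked second; in the second example, it is ranked last. This gives a coverage error of 2.5 and an LRAP of (1/2 + 1/3) / 2.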

mlrl.testbed.experiments module

Author: Michael Rapp (michael.rapp.ml@gmail.com)

Provides classes for performing experiments.

class mlrl.testbed.experiments.Experiment(base_learner: mlrl.common.learners.Learner, data_set: mlrl.testbed.training.DataSet, num_folds: int = 1, current_fold: int = -1, train_evaluation: Optional[mlrl.testbed.evaluation.Evaluation] = None, test_evaluation: Optional[mlrl.testbed.evaluation.Evaluation] = None, parameter_input: Optional[mlrl.testbed.parameters.ParameterInput] = None, model_printer: Optional[mlrl.testbed.model_characteristics.ModelPrinter] = None, model_characteristics_printer: Optional[mlrl.testbed.model_characteristics.ModelCharacteristicsPrinter] = None, data_characteristics_printer: Optional[mlrl.testbed.data_characteristics.DataCharacteristicsPrinter] = None, persistence: Optional[mlrl.testbed.persistence.ModelPersistence] = None)

Bases: mlrl.testbed.training.CrossValidation, abc.ABC

An experiment that trains and evaluates a single multi-label classifier or ranker on a specific data set using cross validation or separate training and test sets.

run()

mlrl.testbed.interfaces module

Author: Michael Rapp (michael.rapp.ml@gmail.com)

Provides common interfaces that are implemented by several classes.

class mlrl.testbed.interfaces.Randomized

Bases: abc.ABC

A base class for all classifiers, rankers or modules that use RNGs.

Attributes

random_state – The seed to be used by RNGs

random_state: int = 1
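A sketch of how a module derived from Randomized might use the documented `random_state` attribute. Creating a fresh NumPy generator from the seed on every call makes results reproducible. The subclass `RandomSubsetSelector` is hypothetical and only mirrors the documented class attribute.

```python
from abc import ABC

import numpy as np


class Randomized(ABC):
    # Class attribute as documented: the seed used by RNGs, defaulting to 1
    random_state: int = 1


class RandomSubsetSelector(Randomized):
    """Hypothetical module that draws all randomness from its random_state."""

    def select(self, num_elements: int, num_samples: int) -> np.ndarray:
        # A fresh generator seeded with random_state makes every call reproducible
        rng = np.random.default_rng(self.random_state)
        return rng.choice(num_elements, size=num_samples, replace=False)


first = RandomSubsetSelector()
second = RandomSubsetSelector()
# Overriding the seed on an instance shadows the class-level default
other = RandomSubsetSelector()
other.random_state = 42
```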

mlrl.testbed.io module

Author: Michael Rapp (michael.rapp.ml@gmail.com)

Provides functions for writing and reading files.

mlrl.testbed.io.clear_directory(directory: str)

Deletes all files contained in a directory (excluding subdirectories).

Parameters

directory – The directory to be cleared
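The documented behavior (delete files, keep subdirectories) can be sketched with the standard library. Collecting the file paths before deleting avoids modifying the directory while it is being iterated. This is an illustration, not necessarily the module's implementation.

```python
import os
import tempfile


def clear_directory(directory: str):
    # Collect regular files only; subdirectories and their contents are kept
    with os.scandir(directory) as entries:
        file_paths = [entry.path for entry in entries if entry.is_file()]
    for path in file_paths:
        os.remove(path)


with tempfile.TemporaryDirectory() as tmp:
    open(os.path.join(tmp, 'results.csv'), 'w').close()
    os.mkdir(os.path.join(tmp, 'subdir'))
    clear_directory(tmp)
    remaining = sorted(os.listdir(tmp))
```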

mlrl.testbed.io.create_csv_dict_reader(csv_file) -> csv.DictReader

Creates and returns a DictReader that allows reading from a CSV file.

Parameters

csv_file – The CSV file

Returns

The ‘DictReader’ that has been created

mlrl.testbed.io.create_csv_dict_writer(csv_file, header) -> csv.DictWriter

Creates and returns a DictWriter that allows writing dictionaries to a CSV file.

Parameters
  • csv_file – The CSV file

  • header – A list that contains the headers of the CSV file. They must correspond to the keys in the dictionary that should be written to the file

Returns

The DictWriter that has been created
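A sketch of such a writer based on Python's csv.DictWriter, writing to an in-memory buffer. Whether the library writes the header row immediately, as done here, is an assumption, and the measure names are only examples.

```python
import csv
import io


def create_csv_dict_writer(csv_file, header) -> csv.DictWriter:
    # The keys of each dictionary passed to writerow() must match the header
    writer = csv.DictWriter(csv_file, fieldnames=header)
    # Assumption: the header row is written right away
    writer.writeheader()
    return writer


buffer = io.StringIO()
writer = create_csv_dict_writer(buffer, ['Hamming loss', 'Subset 0/1 loss'])
writer.writerow({'Hamming loss': 0.1, 'Subset 0/1 loss': 0.5})
lines = buffer.getvalue().splitlines()
```

By default, DictWriter raises a ValueError if a dictionary contains a key that is not part of the header, which is why the header and the keys must correspond.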

mlrl.testbed.io.get_file_name(name: str, suffix: str)

Returns a file name, including a suffix.

Parameters
  • name – The name of the file (without suffix)

  • suffix – The suffix of the file

Returns

The file name

mlrl.testbed.io.get_file_name_per_fold(name: str, suffix: str, fold: int)

Returns a file name, including a suffix, that corresponds to a certain fold.

Parameters
  • name – The name of the file (without suffix)

  • suffix – The suffix of the file

  • fold – The cross validation fold the file corresponds to, or None if the file does not correspond to a specific fold

Returns

The file name

mlrl.testbed.io.open_readable_csv_file(directory: str, file_name: str, fold: int)

Opens a CSV file to be read from.

Parameters
  • directory – The directory where the file is located

  • file_name – The name of the file to be opened (without suffix)

  • fold – The cross validation fold the file corresponds to, or None if the file does not correspond to a specific fold

Returns

The file that has been opened

mlrl.testbed.io.open_writable_csv_file(directory: str, file_name: str, fold: Optional[int] = None, append: bool = False)

Opens a CSV file to be written to.

Parameters
  • directory – The directory where the file is located

  • file_name – The name of the file to be opened (without suffix)

  • fold – The cross validation fold the file corresponds to, or None if the file does not correspond to a specific fold

  • append – True if new data should be appended to the file in case it already exists, False otherwise

Returns

The file that has been opened

mlrl.testbed.io.open_writable_txt_file(directory: str, file_name: str, fold: Optional[int] = None, append: bool = False)

Opens a text file to be written to.

Parameters
  • directory – The directory where the file is located

  • file_name – The name of the file to be opened (without suffix)

  • fold – The cross validation fold the file corresponds to, or None if the file does not correspond to a specific fold

  • append – True if new data should be appended to the file in case it already exists, False otherwise

Returns

The file that has been opened

mlrl.testbed.io.write_xml_file(xml_file, root_element: xml.etree.ElementTree.Element, encoding='utf-8')

Writes an XML structure to a file.

Parameters
  • xml_file – The XML file

  • root_element – The root element of the XML structure

  • encoding – The encoding to be used
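With the standard library, writing an XML structure to a file can be sketched by wrapping the root element in an ElementTree and serializing it with an XML declaration. Writing to an in-memory BytesIO buffer stands in for a real file here; whether the library also adds a declaration or indentation is an assumption.

```python
import io
import xml.etree.ElementTree as ET


def write_xml_file(xml_file, root_element: ET.Element, encoding='utf-8'):
    # Wrap the root element in a tree and serialize it, including an XML declaration
    tree = ET.ElementTree(root_element)
    tree.write(xml_file, encoding=encoding, xml_declaration=True)


root = ET.Element('labels')
ET.SubElement(root, 'label', name='label0')
# With a byte encoding such as utf-8, ElementTree writes bytes, so a binary buffer is used
buffer = io.BytesIO()
write_xml_file(buffer, root)
content = buffer.getvalue()
```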

mlrl.testbed.main_boomer module

Author: Michael Rapp (michael.rapp.ml@gmail.com)

class mlrl.testbed.main_boomer.BoomerRunnable

Bases: mlrl.testbed.runnables.RuleLearnerRunnable

mlrl.testbed.main_boomer.main()

mlrl.testbed.main_seco module

Author: Michael Rapp (michael.rapp.ml@gmail.com)

class mlrl.testbed.main_seco.SeCoRunnable

Bases: mlrl.testbed.runnables.RuleLearnerRunnable

mlrl.testbed.main_seco.main()

mlrl.testbed.model_characteristics module

Author: Michael Rapp (michael.rapp.ml@gmail.com)

Provides classes for printing textual representations of models. The models can be written to one or several outputs, e.g. to the console or to a file.

class mlrl.testbed.model_characteristics.ModelCharacteristicsPrinter

Bases: abc.ABC

A class that prints the characteristics of a Learner’s model.

print(experiment_name: str, learner: mlrl.common.learners.Learner, current_fold: int, num_folds: int)
class mlrl.testbed.model_characteristics.ModelPrinter(print_options: str, outputs: List[mlrl.testbed.model_characteristics.ModelPrinterOutput])

Bases: abc.ABC

An abstract base class for all classes that print a textual representation of a Learner’s model.

print(experiment_name: str, meta_data: mlrl.testbed.data.MetaData, learner: mlrl.common.learners.Learner, current_fold: int, num_folds: int)

Prints a textual representation of a Learner’s model.

Parameters
  • experiment_name – The name of the experiment

  • meta_data – The meta data of the training data set

  • learner – The learner

  • current_fold – The current cross validation fold starting at 0, or 0 if no cross validation is used

  • num_folds – The total number of cross validation folds or 1, if no cross validation is used

class mlrl.testbed.model_characteristics.ModelPrinterLogOutput

Bases: mlrl.testbed.model_characteristics.ModelPrinterOutput

Outputs the textual representation of a model using the logger.

write_model(experiment_name: str, model: str, total_folds: int, fold: Optional[int] = None)

Writes a textual representation of a model to the output.

Parameters
  • experiment_name – The name of the experiment

  • model – The textual representation of the model

  • total_folds – The total number of folds

  • fold – The fold for which the results should be written or None, if no cross validation is used or if the overall results, averaged over all folds, should be written

class mlrl.testbed.model_characteristics.ModelPrinterOutput

Bases: abc.ABC

An abstract base class for all outputs to which textual representations of models may be written.

abstract write_model(experiment_name: str, model: str, total_folds: int, fold: Optional[int] = None)

Writes a textual representation of a model to the output.

Parameters
  • experiment_name – The name of the experiment

  • model – The textual representation of the model

  • total_folds – The total number of folds

  • fold – The fold for which the results should be written or None, if no cross validation is used or if the overall results, averaged over all folds, should be written
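The following sketch illustrates how a custom output might implement this interface and how a printer could forward one textual model to several outputs. Because mlrl is not imported here, ModelPrinterOutput is redefined as a stand-in with the documented signature; InMemoryModelOutput and write_to_all are hypothetical names, and the forwarding behavior is an assumption about what ModelPrinter does with its list of outputs.

```python
from abc import ABC, abstractmethod
from typing import List, Optional, Tuple


class ModelPrinterOutput(ABC):
    """Stand-in mirroring the documented abstract interface (not imported from mlrl)."""

    @abstractmethod
    def write_model(self, experiment_name: str, model: str, total_folds: int, fold: Optional[int] = None):
        pass


class InMemoryModelOutput(ModelPrinterOutput):
    """Hypothetical output that collects textual models in a list instead of logging them."""

    def __init__(self):
        self.records: List[Tuple[str, Optional[int], str]] = []

    def write_model(self, experiment_name: str, model: str, total_folds: int, fold: Optional[int] = None):
        # Keep the fold as given; None stands for "no cross validation" or "overall"
        self.records.append((experiment_name, fold, model))


def write_to_all(outputs: List[ModelPrinterOutput], experiment_name: str, model: str,
                 total_folds: int, fold: Optional[int] = None):
    # Hypothetical fan-out: the same text is passed to every configured output
    for output in outputs:
        output.write_model(experiment_name, model, total_folds, fold)


out_a, out_b = InMemoryModelOutput(), InMemoryModelOutput()
write_to_all([out_a, out_b], 'exp', '<textual model>', total_folds=10, fold=0)
print(out_a.records)  # [('exp', 0, '<textual model>')]
```

Keeping the outputs behind a common abstract method means a printer does not need to know whether text ends up in the log, in a file, or, as here, in memory.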

class mlrl.testbed.model_characteristics.ModelPrinterTxtOutput(output_dir: str, clear_dir: bool = True)

Bases: mlrl.testbed.model_characteristics.ModelPrinterOutput

Writes the textual representation of a model to a text file.

write_model(experiment_name: str, model: str, total_folds: int, fold: Optional[int] = None)

Writes a textual representation of a model to the output.

Parameters
  • experiment_name – The name of the experiment

  • model – The textual representation of the model

  • total_folds – The total number of folds

  • fold – The fold for which the results should be written or None, if no cross validation is used or if the overall results, averaged over all folds, should be written

class mlrl.testbed.model_characteristics.RuleModelCharacteristics(default_rule_index: int, default_rule_pos_predictions: int, default_rule_neg_predictions: int, num_leq: numpy.ndarray, num_gr: numpy.ndarray, num_eq: numpy.ndarray, num_neq: numpy.ndarray, num_pos_predictions: numpy.ndarray, num_neg_predictions: numpy.ndarray)

Bases: object

Stores the characteristics of a RuleModel.

class mlrl.testbed.model_characteristics.RuleModelCharacteristicsCsvOutput(output_dir: str, clear_dir: bool = True)

Bases: mlrl.testbed.model_characteristics.RuleModelCharacteristicsOutput

Writes the characteristics of a RuleModel to a CSV file.

COL_CONDITIONS = 'conditions'
COL_EQ_CONDITIONS = 'conditions using == operator'
COL_GR_CONDITIONS = 'conditions using > operator'
COL_LEQ_CONDITIONS = 'conditions using <= operator'
COL_NEG_PREDICTIONS = 'neg. predictions'
COL_NEQ_CONDITIONS = 'conditions using != operator'
COL_NOMINAL_CONDITIONS = 'nominal conditions'
COL_NUMERICAL_CONDITIONS = 'numerical conditions'
COL_POS_PREDICTIONS = 'pos. predictions'
COL_PREDICTIONS = 'predictions'
COL_RULE_NAME = 'Rule'
write_model_characteristics(experiment_name: str, characteristics: mlrl.testbed.model_characteristics.RuleModelCharacteristics, total_folds: int, fold: Optional[int] = None)

Writes the characteristics of a RuleModel to the output.

Parameters
  • experiment_name – The name of the experiment

  • characteristics – The characteristics of the model

  • total_folds – The total number of folds

  • fold – The fold for which the characteristics should be written or None, if no cross validation is used
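The column names above suggest how per-rule totals can be derived from the arrays stored in RuleModelCharacteristics. The helper below is a hypothetical sketch, not the library's implementation; it assumes that each array holds one entry per rule, that conditions using <= and > count as numerical conditions, and that conditions using == and != count as nominal conditions.

```python
import numpy as np


def summarize_rules(num_leq, num_gr, num_eq, num_neq, num_pos_predictions, num_neg_predictions):
    """Hypothetical helper deriving per-rule totals, keyed by the documented column names.

    Assumption: every argument is a NumPy array with one entry per rule.
    """
    numerical = num_leq + num_gr  # assumed: <= and > are the numerical operators
    nominal = num_eq + num_neq    # assumed: == and != are the nominal operators
    return {
        'numerical conditions': numerical,
        'nominal conditions': nominal,
        'conditions': numerical + nominal,
        'predictions': num_pos_predictions + num_neg_predictions,
    }


summary = summarize_rules(np.array([1, 0]), np.array([2, 1]), np.array([0, 1]), np.array([0, 0]),
                          np.array([3, 1]), np.array([0, 2]))
print(summary['conditions'])  # [3 2]
```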

class mlrl.testbed.model_characteristics.RuleModelCharacteristicsLogOutput

Bases: mlrl.testbed.model_characteristics.RuleModelCharacteristicsOutput

Outputs the characteristics of a RuleModel using the logger.

write_model_characteristics(experiment_name: str, characteristics: mlrl.testbed.model_characteristics.RuleModelCharacteristics, total_folds: int, fold: Optional[int] = None)

Writes the characteristics of a RuleModel to the output.

Parameters
  • experiment_name – The name of the experiment

  • characteristics – The characteristics of the model

  • total_folds – The total number of folds

  • fold – The fold for which the characteristics should be written or None, if no cross validation is used

class mlrl.testbed.model_characteristics.RuleModelCharacteristicsOutput

Bases: abc.ABC

An abstract base class for all outputs to which the characteristics of an MLRuleLearner’s model may be written.

abstract write_model_characteristics(experiment_name: str, characteristics: mlrl.testbed.model_characteristics.RuleModelCharacteristics, total_folds: int, fold: Optional[int] = None)

Writes the characteristics of a RuleModel to the output.

Parameters
  • experiment_name – The name of the experiment

  • characteristics – The characteristics of the model

  • total_folds – The total number of folds

  • fold – The fold for which the characteristics should be written or None, if no cross validation is used

class mlrl.testbed.model_characteristics.RuleModelCharacteristicsPrinter(outputs: List[mlrl.testbed.model_characteristics.RuleModelCharacteristicsOutput])

Bases: mlrl.testbed.model_characteristics.ModelCharacteristicsPrinter

A class that prints the characteristics of an MLRuleLearner’s model.

class mlrl.testbed.model_characteristics.RuleModelCharacteristicsVisitor

Bases: mlrl.common.cython.rule_model.RuleModelVisitor

A visitor that allows to determine the characteristics of a RuleModel.

visit_complete_head(head: mlrl.common.cython.rule_model.CompleteHead)

Must be implemented by subclasses in order to visit the heads of rules that predict for all available labels.

Parameters

head – A CompleteHead to be visited

visit_conjunctive_body(body: mlrl.common.cython.rule_model.ConjunctiveBody)

Must be implemented by subclasses in order to visit the bodies of rules that are given as a conjunction of several conditions.

Parameters

body – A ConjunctiveBody to be visited

visit_empty_body(_: mlrl.common.cython.rule_model.EmptyBody)

Must be implemented by subclasses in order to visit bodies of rules that do not contain any conditions.

Parameters

body – An EmptyBody to be visited

visit_partial_head(head: mlrl.common.cython.rule_model.PartialHead)

Must be implemented by subclasses in order to visit the heads of rules that predict for a subset of the available labels.

Parameters

head – A PartialHead to be visited
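The visitor pattern behind RuleModelCharacteristicsVisitor can be sketched with stand-in classes. EmptyBody and ConjunctiveBody below are hypothetical placeholders, not the Cython classes from mlrl.common.cython.rule_model, and the accept function stands in for the dispatch that the rule model is assumed to perform internally. The visitor tallies operators per rule, analogous to the num_leq, num_gr, num_eq and num_neq arrays of RuleModelCharacteristics.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class EmptyBody:  # hypothetical stand-in, not mlrl.common.cython.rule_model.EmptyBody
    pass


@dataclass
class ConjunctiveBody:  # hypothetical stand-in: conditions as (feature_index, operator, threshold)
    conditions: List[tuple]


class OperatorCountingVisitor:
    """Counts the operators used by each visited rule body."""

    def __init__(self):
        self.num_leq, self.num_gr, self.num_eq, self.num_neq = [], [], [], []

    def visit_empty_body(self, _: EmptyBody):
        # A rule without conditions (e.g. a default rule) contributes zeros
        for counts in (self.num_leq, self.num_gr, self.num_eq, self.num_neq):
            counts.append(0)

    def visit_conjunctive_body(self, body: ConjunctiveBody):
        operators = [op for _, op, _ in body.conditions]
        self.num_leq.append(operators.count('<='))
        self.num_gr.append(operators.count('>'))
        self.num_eq.append(operators.count('=='))
        self.num_neq.append(operators.count('!='))


def accept(body, visitor):
    # Dispatch on the body's type, as a rule model would when traversing its rules
    if isinstance(body, EmptyBody):
        visitor.visit_empty_body(body)
    else:
        visitor.visit_conjunctive_body(body)


visitor = OperatorCountingVisitor()
for body in [EmptyBody(), ConjunctiveBody([(0, '<=', 0.5), (1, '>', 2.0), (2, '==', 1.0)])]:
    accept(body, visitor)
print(visitor.num_leq, visitor.num_eq)  # [0, 1] [0, 1]
```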

class mlrl.testbed.model_characteristics.RuleModelFormatter(attributes: List[mlrl.testbed.data.Attribute], labels: List[mlrl.testbed.data.Attribute], print_feature_names: bool, print_label_names: bool, print_nominal_values: bool)

Bases: mlrl.common.cython.rule_model.RuleModelVisitor

Allows creating a textual representation of the rules in a RuleModel.

get_text() → str

Returns the textual representation that has been created via the format method.

Returns

The textual representation

visit_complete_head(head: mlrl.common.cython.rule_model.CompleteHead)

Must be implemented by subclasses in order to visit the heads of rules that predict for all available labels.

Parameters

head – A CompleteHead to be visited

visit_conjunctive_body(body: mlrl.common.cython.rule_model.ConjunctiveBody)

Must be implemented by subclasses in order to visit the bodies of rules that are given as a conjunction of several conditions.

Parameters

body – A ConjunctiveBody to be visited

visit_empty_body(_: mlrl.common.cython.rule_model.EmptyBody)

Must be implemented by subclasses in order to visit bodies of rules that do not contain any conditions.

Parameters

body – An EmptyBody to be visited

visit_partial_head(head: mlrl.common.cython.rule_model.PartialHead)

Must be implemented by subclasses in order to visit the heads of rules that predict for a subset of the available labels.

Parameters

head – A PartialHead to be visited

class mlrl.testbed.model_characteristics.RulePrinter(print_options: str, outputs: List[mlrl.testbed.model_characteristics.ModelPrinterOutput])

Bases: mlrl.testbed.model_characteristics.ModelPrinter

Allows printing a textual representation of an MLRuleLearner’s rule-based model.

mlrl.testbed.parameters module

Author: Michael Rapp (michael.rapp.ml@gmail.com)

Provides classes for parameter tuning.

class mlrl.testbed.parameters.NestedCrossValidation(num_nested_folds: int)

Bases: mlrl.testbed.parameters.ParameterSearch

Allows searching for optimal parameters using (nested) cross validation.

search(meta_data: mlrl.testbed.data.MetaData, x, y, first_fold: int, current_fold: int, last_fold: int, num_folds: int)

Tests different parameter settings given a training data set.

Parameters
  • meta_data – The meta data of the training data set

  • x – The feature matrix of the training examples

  • y – The label matrix of the training examples

  • first_fold – The first fold or 0, if no cross validation is used

  • current_fold – The current fold starting at 0, or 0 if no cross validation is used

  • last_fold – The last fold or 0, if no cross validation is used

  • num_folds – The total number of cross validation folds or 1, if no cross validation is used
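A minimal sketch of the inner search that nested cross validation performs within one outer training fold: each candidate configuration is scored on num_nested_folds splits of that training data, and the configuration with the best mean score is kept. The function names and the fit_and_score callback are hypothetical, and the first_fold/current_fold/last_fold bookkeeping of the actual search method is omitted.

```python
import random
from typing import Callable, Dict, List, Sequence, Tuple


def kfold_indices(num_examples: int, num_folds: int, seed: int = 1) -> List[List[int]]:
    """Shuffles the example indices and splits them into num_folds roughly equal folds."""
    indices = list(range(num_examples))
    random.Random(seed).shuffle(indices)
    return [indices[i::num_folds] for i in range(num_folds)]


def inner_cv_search(configurations: Sequence[Dict], num_examples: int, num_nested_folds: int,
                    fit_and_score: Callable[[Dict, List[int], List[int]], float]) -> Tuple[Dict, float]:
    """Returns the configuration with the best mean score across the nested folds.

    fit_and_score(config, train_idx, test_idx) is a hypothetical callback that trains on
    train_idx and returns a score on test_idx (higher is better).
    """
    folds = kfold_indices(num_examples, num_nested_folds)
    best_params, best_score = None, float('-inf')
    for config in configurations:
        scores = []
        for k, test_idx in enumerate(folds):
            # All folds except the k-th one form the nested training set
            train_idx = [i for j, fold in enumerate(folds) if j != k for i in fold]
            scores.append(fit_and_score(config, train_idx, test_idx))
        mean_score = sum(scores) / len(scores)
        if mean_score > best_score:
            best_params, best_score = config, mean_score
    return best_params, best_score


configs = [{'c': 1}, {'c': 3}, {'c': 5}]
best, score = inner_cv_search(configs, 12, 3, lambda cfg, train, test: -abs(cfg['c'] - 3))
print(best, score)  # {'c': 3} 0.0
```

Because the search only ever sees the outer training fold, the outer test fold stays untouched and can still provide an unbiased estimate for the selected configuration.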

class mlrl.testbed.parameters.ParameterCsvInput(input_dir: str)

Bases: mlrl.testbed.parameters.ParameterInput

Reads parameter settings from CSV files.

read_parameters(fold: Optional[int] = None) → dict

Reads a parameter setting from the input.

Parameters

fold – The fold, the parameter setting corresponds to, or None, if the parameter setting does not correspond to a specific fold

Returns

A dictionary that stores the parameters

class mlrl.testbed.parameters.ParameterCsvOutput(output_dir: str, clear_dir: bool = True)

Bases: mlrl.testbed.parameters.ParameterOutput

Writes parameter settings to CSV files.

write_parameters(parameters: dict, score: float, total_folds: int, fold: Optional[int] = None)

Writes a parameter setting to the output.

Parameters
  • parameters – A dictionary that stores the parameters

  • score – The evaluation score that has been achieved using the parameter setting

  • total_folds – The total number of folds

  • fold – The fold, the parameter setting corresponds to, or None, if the parameter setting does not correspond to a specific fold
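The round trip below sketches how a parameter setting might be persisted per fold using Python's csv module, in the spirit of ParameterCsvOutput and ParameterCsvInput. The file naming scheme and the parameter names are made up for illustration, and the score argument of write_parameters is not stored here. Values read back from a CSV file are strings, so converting them to numbers is left to the caller.

```python
import csv
import os
import tempfile
from typing import Optional


def parameter_file(directory: str, fold: Optional[int]) -> str:
    # Hypothetical naming scheme; the actual file names used by mlrl may differ
    suffix = 'overall' if fold is None else 'fold-{}'.format(fold)
    return os.path.join(directory, 'parameters_{}.csv'.format(suffix))


def write_parameters(directory: str, parameters: dict, fold: Optional[int] = None):
    with open(parameter_file(directory, fold), 'w', newline='') as f:
        writer = csv.DictWriter(f, fieldnames=list(parameters.keys()))
        writer.writeheader()
        writer.writerow(parameters)


def read_parameters(directory: str, fold: Optional[int] = None) -> dict:
    with open(parameter_file(directory, fold), newline='') as f:
        # The first data row holds the parameter setting; all values are strings
        return dict(next(csv.DictReader(f)))


with tempfile.TemporaryDirectory() as directory:
    write_parameters(directory, {'max_rules': 100, 'shrinkage': 0.3}, fold=0)
    restored = read_parameters(directory, fold=0)
print(restored)  # {'max_rules': '100', 'shrinkage': '0.3'}
```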

class mlrl.testbed.parameters.ParameterInput

Bases: abc.ABC

An abstract base class for all inputs from which parameter settings may be read.

abstract read_parameters(fold: Optional[int] = None) → dict

Reads a parameter setting from the input.

Parameters

fold – The fold, the parameter setting corresponds to, or None, if the parameter setting does not correspond to a specific fold

Returns

A dictionary that stores the parameters

class mlrl.testbed.parameters.ParameterLogOutput

Bases: mlrl.testbed.parameters.ParameterOutput

Outputs parameter settings using the logger.

write_parameters(parameters: dict, score: float, total_folds: int, fold: Optional[int] = None)

Writes a parameter setting to the output.

Parameters
  • parameters – A dictionary that stores the parameters

  • score – The evaluation score that has been achieved using the parameter setting

  • total_folds – The total number of folds

  • fold – The fold, the parameter setting corresponds to, or None, if the parameter setting does not correspond to a specific fold

class mlrl.testbed.parameters.ParameterOutput

Bases: abc.ABC

An abstract base class for all outputs to which parameter settings may be written.

abstract write_parameters(parameters: dict, score: float, total_folds: int, fold: Optional[int] = None)

Writes a parameter setting to the output.

Parameters
  • parameters – A dictionary that stores the parameters

  • score – The evaluation score that has been achieved using the parameter setting

  • total_folds – The total number of folds

  • fold – The fold, the parameter setting corresponds to, or None, if the parameter setting does not correspond to a specific fold

class mlrl.testbed.parameters.ParameterSearch

Bases: mlrl.testbed.interfaces.Randomized, abc.ABC

A base class for all classes that implement strategies to search for optimal parameters given a training data set.

abstract get_params()

Returns the best parameter setting tested so far.

Returns

A dictionary that stores the parameters

abstract get_score()

Returns the evaluation score that has been achieved using the best parameter setting.

Returns

An evaluation score

abstract search(meta_data: mlrl.testbed.data.MetaData, x, y, first_fold: int, current_fold: int, last_fold: int, num_folds: int)

Tests different parameter settings given a training data set.

Parameters
  • meta_data – The meta data of the training data set

  • x – The feature matrix of the training examples

  • y – The label matrix of the training examples

  • first_fold – The first fold or 0, if no cross validation is used

  • current_fold – The current fold starting at 0, or 0 if no cross validation is used

  • last_fold – The last fold or 0, if no cross validation is used

  • num_folds – The total number of cross validation folds or 1, if no cross validation is used

class mlrl.testbed.parameters.ParameterTuning(data_set: mlrl.testbed.training.DataSet, num_folds: int, current_fold: int, parameter_search: mlrl.testbed.parameters.ParameterSearch, *args: mlrl.testbed.parameters.ParameterOutput)

Bases: mlrl.testbed.training.CrossValidation

Uses a ParameterSearch to tune parameters, either for a single training data set or for each of the training data sets used in cross validation, and writes the optimal parameters to one or several outputs.

mlrl.testbed.persistence module

Author: Michael Rapp (michael.rapp.ml@gmail.com)

Provides classes for saving/loading models to/from disk.

class mlrl.testbed.persistence.ModelPersistence(model_dir: str)

Bases: object

Allows saving a model to a file and loading it later.

load_model(model_name: str, fold: Optional[int] = None, raise_exception: bool = False)

Loads a model from a file.

Parameters
  • model_name – The name of the model to be loaded

  • fold – The fold, the model corresponds to, or None if no cross validation is used

  • raise_exception – True, if an exception should be raised when an error occurs, False, if None should be returned in such a case

Returns

The loaded model

save_model(model, model_name: str, fold: Optional[int] = None)

Saves a model to a file.

Parameters
  • model – The model to be persisted

  • model_name – The name of the model to be persisted

  • fold – The fold, the model corresponds to, or None if no cross validation is used
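A sketch of the save/load contract documented above, assuming that models are picklable and stored with the pickle module; the class name and the file naming scheme are hypothetical. As documented for load_model, it either propagates an error or returns None, depending on raise_exception.

```python
import os
import pickle
import tempfile
from typing import Optional


class PickleModelPersistence:
    """Hypothetical sketch of a ModelPersistence-like class based on pickle."""

    def __init__(self, model_dir: str):
        self.model_dir = model_dir

    def _path(self, model_name: str, fold: Optional[int]) -> str:
        # Hypothetical naming scheme: one file per model name and fold
        suffix = 'overall' if fold is None else 'fold-{}'.format(fold)
        return os.path.join(self.model_dir, '{}_{}.model'.format(model_name, suffix))

    def save_model(self, model, model_name: str, fold: Optional[int] = None):
        with open(self._path(model_name, fold), 'wb') as f:
            pickle.dump(model, f, pickle.HIGHEST_PROTOCOL)

    def load_model(self, model_name: str, fold: Optional[int] = None, raise_exception: bool = False):
        try:
            with open(self._path(model_name, fold), 'rb') as f:
                return pickle.load(f)
        except (OSError, pickle.UnpicklingError, EOFError):
            # Either propagate the error or signal failure by returning None
            if raise_exception:
                raise
            return None


with tempfile.TemporaryDirectory() as model_dir:
    persistence = PickleModelPersistence(model_dir)
    persistence.save_model({'num_rules': 3}, 'boomer', fold=2)
    loaded = persistence.load_model('boomer', fold=2)
    missing = persistence.load_model('boomer', fold=3)  # no such file, so None is returned
print(loaded, missing)  # {'num_rules': 3} None
```

Storing one file per fold lets a later run reuse the model trained on a specific cross validation fold instead of retraining it. Only load pickle files from trusted sources, since unpickling can execute arbitrary code.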

mlrl.testbed.runnables module

Author: Michael Rapp (michael.rapp.ml@gmail.com)

Provides base classes for programs that can be configured via command line arguments.

class mlrl.testbed.runnables.RuleLearnerRunnable

Bases: mlrl.testbed.runnables.Runnable, abc.ABC

A base class for all programs that perform an experiment that involves training and evaluation of a rule learner.

class mlrl.testbed.runnables.Runnable

Bases: abc.ABC

A base class for all programs that can be configured via command line arguments.

run(parser: argparse.ArgumentParser)
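The sketch below shows one way a Runnable could connect an argparse.ArgumentParser to an experiment: a subclass registers its arguments, and run parses them and starts the experiment. The hook methods _configure_arguments and _run, the extra argv parameter (added only so that the example runs without a real command line) and the argument names are all hypothetical.

```python
import argparse
from abc import ABC, abstractmethod
from typing import List, Optional


class Runnable(ABC):
    """Stand-in for the documented base class; argv is an extra, hypothetical parameter."""

    def run(self, parser: argparse.ArgumentParser, argv: Optional[List[str]] = None):
        self._configure_arguments(parser)
        args = parser.parse_args(argv)  # argv=None falls back to sys.argv[1:]
        return self._run(args)

    @abstractmethod
    def _configure_arguments(self, parser: argparse.ArgumentParser):
        pass

    @abstractmethod
    def _run(self, args: argparse.Namespace):
        pass


class ExampleRunnable(Runnable):
    """Hypothetical subclass; the argument names are made up for illustration."""

    def _configure_arguments(self, parser):
        parser.add_argument('--data-dir', type=str, required=True)
        parser.add_argument('--folds', type=int, default=1)

    def _run(self, args):
        return 'data={}, folds={}'.format(args.data_dir, args.folds)


result = ExampleRunnable().run(argparse.ArgumentParser(), ['--data-dir', 'data', '--folds', '10'])
print(result)  # data=data, folds=10
```

A main function such as mlrl.testbed.main_boomer.main() could then plausibly reduce to constructing the runnable and calling run with a fresh parser, although this is an assumption about its implementation.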

mlrl.testbed.training module

Author: Michael Rapp (michael.rapp.ml@gmail.com)

Provides classes for training and evaluating multi-label classifiers using either cross validation or separate training and test sets.

class mlrl.testbed.training.CrossValidation(data_set: mlrl.testbed.training.DataSet, num_folds: int, current_fold: int)

Bases: mlrl.testbed.interfaces.Randomized, abc.ABC

A base class for all classes that use cross validation or a train-test split to train and evaluate a multi-label classifier or ranker.

run()
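As a rough illustration of the template-method structure suggested by an abstract CrossValidation class with a run method, the sketch below splits example indices into folds and delegates each fold to a subclass. All names are hypothetical. The sketch does not shuffle the examples, so the split is deterministic, whereas the documented class derives from Randomized and presumably uses a random state.

```python
from abc import ABC, abstractmethod
from typing import List


class CrossValidationSketch(ABC):
    """Hypothetical template: run() splits the examples and delegates each fold to a subclass."""

    def __init__(self, num_examples: int, num_folds: int):
        self.num_examples = num_examples
        self.num_folds = num_folds

    def run(self):
        indices = list(range(self.num_examples))  # not shuffled, to keep the sketch deterministic
        for fold in range(self.num_folds):
            test_idx = indices[fold::self.num_folds]
            test_set = set(test_idx)
            train_idx = [i for i in indices if i not in test_set]
            self._train_and_evaluate(train_idx, test_idx, fold)

    @abstractmethod
    def _train_and_evaluate(self, train_idx: List[int], test_idx: List[int], fold: int):
        pass


class RecordingCv(CrossValidationSketch):
    """Records the splits instead of training a model."""

    def __init__(self, num_examples: int, num_folds: int):
        super().__init__(num_examples, num_folds)
        self.splits = []

    def _train_and_evaluate(self, train_idx, test_idx, fold):
        self.splits.append((fold, train_idx, test_idx))


cv = RecordingCv(num_examples=6, num_folds=3)
cv.run()
print(cv.splits[0])  # (0, [1, 2, 4, 5], [0, 3])
```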
class mlrl.testbed.training.DataSet(data_dir: str, data_set_name: str, use_one_hot_encoding: bool)

Bases: object

Stores the properties of a data set to be used for training and evaluating multi-label classifiers.

Module contents