mlrl.testbed package

Submodules

mlrl.testbed.args module

Author: Michael Rapp (michael.rapp.ml@gmail.com)

Provides functions for parsing command line arguments.

class mlrl.testbed.args.LogLevel(value)

Bases: enum.Enum

An enumeration.

CRITICAL = 'critical'

DEBUG = 'debug'

ERROR = 'error'

FATAL = 'fatal'

INFO = 'info'

NOTSET = 'notset'

WARN = 'warn'

WARNING = 'warning'

mlrl.testbed.args.add_learner_arguments(parser: argparse.ArgumentParser)

mlrl.testbed.args.add_log_level_argument(parser: argparse.ArgumentParser)

mlrl.testbed.args.add_random_state_argument(parser: argparse.ArgumentParser)

mlrl.testbed.args.add_rule_learner_arguments(parser: argparse.ArgumentParser)

mlrl.testbed.args.boolean_string(s)

mlrl.testbed.args.current_fold_string(s)

mlrl.testbed.args.log_level(s)

mlrl.testbed.args.optional_string(s)
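
The single-argument signatures above suggest that these functions are meant to be passed to argparse as type converters. The following is a minimal sketch under that assumption. The helpers parse_boolean and parse_log_level, the flag names and the accepted spellings are hypothetical and are not taken from the library.

```python
import argparse
import logging


def parse_boolean(s: str) -> bool:
    """Hypothetical converter: maps 'true'/'false' (case-insensitive) to a bool."""
    value = s.strip().lower()
    if value == 'true':
        return True
    if value == 'false':
        return False
    raise argparse.ArgumentTypeError(f'Expected "true" or "false", got "{s}"')


def parse_log_level(s: str) -> int:
    """Hypothetical converter: maps a level name such as 'debug' to a logging constant."""
    levels = {
        'debug': logging.DEBUG,
        'info': logging.INFO,
        'warn': logging.WARNING,
        'warning': logging.WARNING,
        'error': logging.ERROR,
        'critical': logging.CRITICAL,
        'fatal': logging.CRITICAL,
        'notset': logging.NOTSET,
    }
    try:
        return levels[s.strip().lower()]
    except KeyError:
        raise argparse.ArgumentTypeError(f'Unknown log level "{s}"')


# The flag names below are assumptions made for this example only
parser = argparse.ArgumentParser()
parser.add_argument('--log-level', type=parse_log_level, default=logging.INFO)
parser.add_argument('--store-predictions', type=parse_boolean, default=False)
args = parser.parse_args(['--log-level', 'debug', '--store-predictions', 'True'])
```

Raising argparse.ArgumentTypeError lets argparse report an invalid value as a regular usage error instead of a traceback.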

mlrl.testbed.bbc_cv module

Author: Michael Rapp (michael.rapp.ml@gmail.com)

Implements “Bootstrap Bias Corrected Cross Validation” (BBC-CV) for evaluating different configurations of a learner and obtaining unbiased performance estimates (see https://link.springer.com/article/10.1007/s10994-018-5714-4).

class mlrl.testbed.bbc_cv.BbcCv(configurations: List[dict], adapter: mlrl.testbed.bbc_cv.BbcCvAdapter, bootstrapping: mlrl.testbed.bbc_cv.Bootstrapping, learner: mlrl.common.learners.Learner)

Bases: mlrl.testbed.interfaces.Randomized

An implementation of “Bootstrap Bias Corrected Cross Validation” (BBC-CV).

evaluate(observer: mlrl.testbed.bbc_cv.BbcCvObserver)

Parameters: observer – The BbcCvObserver to be used

store_predictions()

class mlrl.testbed.bbc_cv.BbcCvAdapter(data_set: mlrl.testbed.training.DataSet, num_folds: int, model_dir: str)

Bases: mlrl.testbed.training.CrossValidation

An adapter that must be implemented for each type of model to be used with BBC-CV to obtain predictions for given test examples.

fit(x, y)

predict(x)

run()

class mlrl.testbed.bbc_cv.BbcCvObserver

Bases: abc.ABC

A base class for all observers that should be notified about the predictions and ground truth labellings that result from applying the BBC-CV method.

abstract evaluate(configurations: List[dict], meta_data: mlrl.testbed.data.MetaData, ground_truth_tuning: numpy.ndarray, predictions_tuning: numpy.ndarray, ground_truth_test: numpy.ndarray, predictions_test: numpy.ndarray, current_bootstrap: int, num_bootstraps: int)

Parameters

configurations – The configurations that have been provided to the BBC-CV method
meta_data – The meta data of the data set
ground_truth_tuning – The ground truth of the examples that belong to the tuning set
predictions_tuning – The predictions for the examples that belong to the tuning set
ground_truth_test – The ground truth of the examples that belong to the test set
predictions_test – The predictions for the examples that belong to the test set
current_bootstrap – The current bootstrap iteration
num_bootstraps – The total number of bootstrap iterations

class mlrl.testbed.bbc_cv.Bootstrapping

Bases: mlrl.testbed.interfaces.Randomized

abstract bootstrap(meta_data: mlrl.testbed.data.MetaData, prediction_matrix, ground_truth_matrix, configurations: List[dict], observer: mlrl.testbed.bbc_cv.BbcCvObserver)

class mlrl.testbed.bbc_cv.CV(data_set: mlrl.testbed.training.DataSet, num_folds: int, prediction_matrix, ground_truth_matrix, configurations: List[dict], observer: mlrl.testbed.bbc_cv.BbcCvObserver)

Bases: mlrl.testbed.training.CrossValidation

class mlrl.testbed.bbc_cv.CVBootstrapping(data_set: mlrl.testbed.training.DataSet, num_folds: int)

Bases: mlrl.testbed.bbc_cv.Bootstrapping

bootstrap(meta_data: mlrl.testbed.data.MetaData, prediction_matrix, ground_truth_matrix, configurations: List[dict], observer: mlrl.testbed.bbc_cv.BbcCvObserver)

class mlrl.testbed.bbc_cv.DefaultBbcCvObserver(target_measure, target_measure_is_loss: bool, output_dir: Optional[str] = None)

Bases: mlrl.testbed.bbc_cv.BbcCvObserver

An observer that determines the best configuration per bootstrap iteration and computes the evaluation measures averaged over all iterations.

evaluate(configurations: List[dict], meta_data: mlrl.testbed.data.MetaData, ground_truth_tuning: numpy.ndarray, predictions_tuning: numpy.ndarray, ground_truth_test: numpy.ndarray, predictions_test: numpy.ndarray, current_bootstrap: int, num_bootstraps: int)

Parameters

configurations – The configurations that have been provided to the BBC-CV method
meta_data – The meta data of the data set
ground_truth_tuning – The ground truth of the examples that belong to the tuning set
predictions_tuning – The predictions for the examples that belong to the tuning set
ground_truth_test – The ground truth of the examples that belong to the test set
predictions_test – The predictions for the examples that belong to the test set
current_bootstrap – The current bootstrap iteration
num_bootstraps – The total number of bootstrap iterations

class mlrl.testbed.bbc_cv.DefaultBootstrapping(num_bootstraps: int)

Bases: mlrl.testbed.bbc_cv.Bootstrapping

bootstrap(meta_data: mlrl.testbed.data.MetaData, prediction_matrix, ground_truth_matrix, configurations: List[dict], observer: mlrl.testbed.bbc_cv.BbcCvObserver)
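
The bootstrap step of BBC-CV can be illustrated as follows. This is a simplified, dependency-free sketch for single-label 0/1 predictions and does not reproduce the classes above. The names error_rate and bbc_estimate, as well as the layout of prediction_matrix (one row per configuration), are assumptions made for the example.

```python
import random
from typing import List


def error_rate(predictions: List[int], ground_truth: List[int], indices: List[int]) -> float:
    """Fraction of the selected examples that are misclassified."""
    return sum(predictions[i] != ground_truth[i] for i in indices) / len(indices)


def bbc_estimate(prediction_matrix: List[List[int]], ground_truth: List[int],
                 num_bootstraps: int, random_state: int = 1) -> float:
    """
    prediction_matrix[c][i] holds the out-of-sample prediction of configuration c for example i.
    Returns the loss of the configuration selected in each iteration, averaged over all iterations.
    """
    rng = random.Random(random_state)
    num_examples = len(ground_truth)
    losses = []

    for _ in range(num_bootstraps):
        # Draw the tuning set with replacement; examples that are never drawn form the test set
        tuning = [rng.randrange(num_examples) for _ in range(num_examples)]
        drawn = set(tuning)
        test = [i for i in range(num_examples) if i not in drawn]

        if not test:
            continue  # Every example was drawn at least once, nothing is left for testing

        # Select the configuration with the lowest loss on the tuning set...
        best = min(prediction_matrix, key=lambda p: error_rate(p, ground_truth, tuning))
        # ...and evaluate only that configuration on the held-out test set
        losses.append(error_rate(best, ground_truth, test))

    return sum(losses) / len(losses) if losses else float('nan')


ground_truth = [0, 1] * 10
perfect = list(ground_truth)
always_zero = [0] * 20
estimate = bbc_estimate([perfect, always_zero], ground_truth, num_bootstraps=50)
```

Because the configuration is chosen on the tuning set and scored on examples that were not drawn, the averaged loss is not inflated by the selection itself.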

mlrl.testbed.data module

Author: Michael Rapp (michael.rapp.ml@gmail.com)

Provides functions for handling multi-label data.

class mlrl.testbed.data.Attribute(attribute_name: str, attribute_type: mlrl.testbed.data.AttributeType, nominal_values: Optional[List[str]] = None)

Bases: object

Represents a numerical or nominal attribute that is contained by a data set.

class mlrl.testbed.data.AttributeType(value)

Bases: enum.Enum

All supported types of attributes.

NOMINAL = 2

NUMERIC = 1

class mlrl.testbed.data.Label(name: str)

Bases: mlrl.testbed.data.Attribute

Represents a label that is contained by a data set.

class mlrl.testbed.data.MetaData(attributes: List[mlrl.testbed.data.Attribute], labels: List[mlrl.testbed.data.Attribute], labels_at_start: bool)

Bases: object

Stores the meta data of a multi-label data set.

get_attribute_indices(attribute_type: Optional[mlrl.testbed.data.AttributeType] = None) → List[int]

Returns a list that contains the indices of all attributes of a specific type (in ascending order).

Parameters: attribute_type – The type of the attributes whose indices should be returned or None, if all indices should be returned
Returns: A list that contains the indices of all attributes of the given type
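
The filtering behavior described above can be sketched without the library. In this example the attributes are reduced to a plain list of types, and both the stand-in enumeration and the free function are illustrative rather than the library's implementation.

```python
from enum import Enum
from typing import List, Optional


class AttributeType(Enum):
    """Stand-in for the enumeration documented above."""
    NUMERIC = 1
    NOMINAL = 2


def get_attribute_indices(attribute_types: List[AttributeType],
                          attribute_type: Optional[AttributeType] = None) -> List[int]:
    """Returns the indices of all attributes of the given type, or all indices if the type is None."""
    return [i for i, t in enumerate(attribute_types)
            if attribute_type is None or t == attribute_type]


types = [AttributeType.NUMERIC, AttributeType.NOMINAL, AttributeType.NUMERIC]
nominal_indices = get_attribute_indices(types, AttributeType.NOMINAL)
all_indices = get_attribute_indices(types)
```

Iterating with enumerate yields the indices in ascending order, matching the documented ordering.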

mlrl.testbed.data.load_data_set(data_dir: str, arff_file_name: str, meta_data: mlrl.testbed.data.MetaData, feature_dtype=<class 'numpy.float32'>, label_dtype=<class 'numpy.uint8'>) -> (<class 'scipy.sparse._lil.lil_matrix'>, <class 'scipy.sparse._lil.lil_matrix'>)

Loads a multi-label data set from an ARFF file given its meta data.

Parameters

data_dir – The path of the directory that contains the ARFF file
arff_file_name – The name of the ARFF file (including the suffix)
meta_data – The meta data
feature_dtype – The requested dtype of the feature matrix
label_dtype – The requested dtype of the label matrix

Returns

A scipy.sparse.lil_matrix of type feature_dtype, shape (num_examples, num_features), representing the feature values of the examples, as well as a scipy.sparse.lil_matrix of type label_dtype, shape (num_examples, num_labels), representing the corresponding label vectors

mlrl.testbed.data.load_data_set_and_meta_data(data_dir: str, arff_file_name: str, xml_file_name: str, feature_dtype=<class 'numpy.float32'>, label_dtype=<class 'numpy.uint8'>) -> (<class 'scipy.sparse._lil.lil_matrix'>, <class 'scipy.sparse._lil.lil_matrix'>, <class 'mlrl.testbed.data.MetaData'>)

Loads a multi-label data set from an ARFF file and the corresponding Mulan XML file.

Parameters

data_dir – The path of the directory that contains the files
arff_file_name – The name of the ARFF file (including the suffix)
xml_file_name – The name of the XML file (including the suffix)
feature_dtype – The requested type of the feature matrix
label_dtype – The requested type of the label matrix

Returns

A scipy.sparse.lil_matrix of type feature_dtype, shape (num_examples, num_features), representing the feature values of the examples, a scipy.sparse.lil_matrix of type label_dtype, shape (num_examples, num_labels), representing the corresponding label vectors, as well as the data set’s meta data

mlrl.testbed.data.one_hot_encode(x, y, meta_data: mlrl.testbed.data.MetaData, encoder=None)

One-hot encodes the nominal attributes contained in a data set, if any.

If the given feature matrix is sparse, it will be converted into a dense matrix. Also, an updated variant of the given meta data, from which the attributes have been removed, is returned, because one-hot encoding invalidates the original attributes.

Parameters

x – A np.ndarray or scipy.sparse matrix, shape (num_examples, num_features), representing the features of the examples in the data set
y – A np.ndarray or scipy.sparse matrix, shape (num_examples, num_labels), representing the labels of the examples in the data set
meta_data – The meta data of the data set
encoder – The ‘ColumnTransformer’ to be used or None, if a new encoder should be created

Returns

A np.ndarray, shape (num_examples, num_encoded_features), representing the encoded features of the given examples, the encoder that has been used, as well as the updated meta data
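
The effect of one-hot encoding on a feature matrix can be shown with a small dependency-free sketch. It operates on nested lists rather than on arrays and does not use a ColumnTransformer. The function one_hot_encode_columns, its nominal_values argument and the resulting column order are assumptions made for illustration.

```python
from typing import Dict, List, Sequence


def one_hot_encode_columns(rows: Sequence[Sequence], nominal_values: Dict[int, List[str]]) -> List[List[float]]:
    """
    Replaces each nominal column by one indicator column per possible value.
    `nominal_values` maps the index of a nominal column to the list of its possible values.
    Numerical columns are copied unchanged and keep their relative position.
    """
    encoded = []
    for row in rows:
        encoded_row = []
        for index, value in enumerate(row):
            if index in nominal_values:
                # One indicator per possible value, set to 1.0 only for the observed value
                encoded_row.extend(1.0 if value == v else 0.0 for v in nominal_values[index])
            else:
                encoded_row.append(float(value))
        encoded.append(encoded_row)
    return encoded


x = [[0.5, 'red'], [1.5, 'blue']]
x_encoded = one_hot_encode_columns(x, {1: ['red', 'green', 'blue']})
```

The number of columns grows from 2 to 4 here, which is why meta data that describes the original attributes no longer applies to the encoded matrix.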

mlrl.testbed.data.save_arff_file(output_dir: str, arff_file_name: str, x: numpy.ndarray, y: numpy.ndarray, meta_data: mlrl.testbed.data.MetaData)

Saves a multi-label data set to an ARFF file.

Parameters

output_dir – The path of the directory where the ARFF file should be saved
arff_file_name – The name of the ARFF file (including the suffix)
x – A np.ndarray or scipy.sparse matrix, shape (num_examples, num_features), that stores the features of the examples that are contained in the data set
y – A np.ndarray or scipy.sparse matrix, shape (num_examples, num_labels), that stores the labels of the examples that are contained in the data set
meta_data – The meta data of the data set that should be saved

mlrl.testbed.data.save_data_set(output_dir: str, arff_file_name: str, x: numpy.ndarray, y: numpy.ndarray) → mlrl.testbed.data.MetaData

Saves a multi-label data set to an ARFF file. All attributes in the data set are considered to be numerical.

Parameters

output_dir – The path of the directory where the ARFF file should be saved
arff_file_name – The name of the ARFF file (including the suffix)
x – A np.ndarray or scipy.sparse matrix, shape (num_examples, num_features), that stores the features of the examples that are contained in the data set
y – A np.ndarray or scipy.sparse matrix, shape (num_examples, num_labels), that stores the labels of the examples that are contained in the data set

Returns

The meta data of the data set that has been saved

mlrl.testbed.data.save_data_set_and_meta_data(output_dir: str, arff_file_name: str, xml_file_name: str, x: numpy.ndarray, y: numpy.ndarray) → mlrl.testbed.data.MetaData

Saves a multi-label data set to an ARFF file and its meta data to an XML file. All attributes in the data set are considered to be numerical.

Parameters

output_dir – The path of the directory where the ARFF file and the XML file should be saved
arff_file_name – The name of the ARFF file (including the suffix)
xml_file_name – The name of the XML file (including the suffix)
x – An array of type float, shape (num_examples, num_features), representing the features of the examples that are contained in the data set
y – An array of type float, shape (num_examples, num_labels), representing the label vectors of the examples that are contained in the data set

Returns

The meta data of the data set that has been saved

mlrl.testbed.data.save_meta_data(output_dir: str, xml_file_name: str, meta_data: mlrl.testbed.data.MetaData)

Saves the meta data of a multi-label data set to an XML file.

Parameters

output_dir – The path of the directory where the XML file should be saved
xml_file_name – The name of the XML file (including the suffix)
meta_data – The meta data of the data set

mlrl.testbed.data_characteristics module

Author: Michael Rapp (michael.rapp.ml@gmail.com)

Provides functions to determine certain characteristics of multi-label data sets.

class mlrl.testbed.data_characteristics.DataCharacteristics(num_examples: int, num_nominal_features: int, num_numerical_features: int, feature_density: float, num_labels: int, label_density: float, avg_label_imbalance_ratio: float, avg_label_cardinality: float, num_distinct_label_vectors: int)

Bases: object

Stores characteristics of a multi-label data set.

class mlrl.testbed.data_characteristics.DataCharacteristicsCsvOutput(output_dir: str, clear_dir: bool = True)

Bases: mlrl.testbed.data_characteristics.DataCharacteristicsOutput

Writes the characteristics of a data set to a CSV file.

write_data_characteristics(experiment_name: str, characteristics: mlrl.testbed.data_characteristics.DataCharacteristics, total_folds: int, fold: Optional[int] = None)

Writes the characteristics of a data set to the output.

Parameters

experiment_name – The name of the experiment
characteristics – The characteristics of the data set
total_folds – The total number of folds
fold – The fold for which the characteristics should be written or None, if no cross validation is used

class mlrl.testbed.data_characteristics.DataCharacteristicsLogOutput

Bases: mlrl.testbed.data_characteristics.DataCharacteristicsOutput

Outputs the characteristics of a data set using the logger.

write_data_characteristics(experiment_name: str, characteristics: mlrl.testbed.data_characteristics.DataCharacteristics, total_folds: int, fold: Optional[int] = None)

Writes the characteristics of a data set to the output.

Parameters

experiment_name – The name of the experiment
characteristics – The characteristics of the data set
total_folds – The total number of folds
fold – The fold for which the characteristics should be written or None, if no cross validation is used

class mlrl.testbed.data_characteristics.DataCharacteristicsOutput

Bases: abc.ABC

An abstract base class for all outputs that the characteristics of a data set may be written to.

abstract write_data_characteristics(experiment_name: str, characteristics: mlrl.testbed.data_characteristics.DataCharacteristics, total_folds: int, fold: Optional[int] = None)

Writes the characteristics of a data set to the output.

Parameters

experiment_name – The name of the experiment
characteristics – The characteristics of the data set
total_folds – The total number of folds
fold – The fold for which the characteristics should be written or None, if no cross validation is used

class mlrl.testbed.data_characteristics.DataCharacteristicsPrinter(outputs: List[mlrl.testbed.data_characteristics.DataCharacteristicsOutput])

Bases: object

A class that allows printing the characteristics of data sets.

print(experiment_name: str, x, y, meta_data: mlrl.testbed.data.MetaData, current_fold: int, num_folds: int)

Parameters

experiment_name – The name of the experiment
x – A numpy.ndarray or scipy.sparse matrix, shape (num_examples, num_features), that stores the feature values
y – A numpy.ndarray or scipy.sparse matrix, shape (num_examples, num_labels), that stores the ground truth labels
meta_data – The meta data of the data set
current_fold – The current fold
num_folds – The total number of folds

mlrl.testbed.data_characteristics.density(m) → float

Calculates and returns the density of a given feature or label matrix.

Parameters: m – A numpy.ndarray or scipy.sparse matrix, shape (num_rows, num_cols), that stores the feature values of training examples or their labels
Returns: The fraction of non-zero elements in the given matrix among all elements
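
The returned fraction can be reproduced with a few lines of plain Python. This sketch works on nested lists instead of numpy or scipy.sparse matrices and is not the library's implementation.

```python
from typing import Sequence


def density(m: Sequence[Sequence[float]]) -> float:
    """Returns the fraction of non-zero elements among all elements of a matrix given as nested lists."""
    num_elements = sum(len(row) for row in m)
    num_non_zero = sum(1 for row in m for value in row if value != 0)
    return num_non_zero / num_elements if num_elements > 0 else 0.0


y = [[1, 0, 0], [0, 1, 1]]
label_density = density(y)
```

Three of the six entries are non-zero, so the density of this label matrix is 0.5.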

mlrl.testbed.data_characteristics.distinct_label_vectors(y) → int

Determines and returns the number of distinct label vectors in a label matrix.

Parameters: y – A numpy.ndarray or scipy.sparse matrix, shape (num_examples, num_labels), that stores the labels of training examples
Returns: The number of distinct label vectors in the given matrix

mlrl.testbed.data_characteristics.label_cardinality(y) → float

Calculates and returns the average label cardinality of a given label matrix.

Parameters: y – A numpy.ndarray or scipy.sparse matrix, shape (num_examples, num_labels), that stores the labels of training examples
Returns: The average number of relevant labels per training example

mlrl.testbed.data_characteristics.label_imbalance_ratio(y) → float

Calculates and returns the average label imbalance ratio of a given label matrix.

Parameters: y – A numpy.ndarray or scipy.sparse matrix, shape (num_examples, num_labels), that stores the labels of training examples
Returns: The label imbalance ratio averaged over the available labels
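
The three label statistics above can be sketched on nested lists as follows. The imbalance ratio is computed according to one common definition (the frequency of the most frequent label divided by the frequency of each label, averaged over all labels that occur at least once). Whether the library uses exactly this definition is an assumption.

```python
from typing import Sequence


def distinct_label_vectors(y: Sequence[Sequence[int]]) -> int:
    """Counts the distinct rows of a label matrix."""
    return len({tuple(row) for row in y})


def label_cardinality(y: Sequence[Sequence[int]]) -> float:
    """Average number of relevant (non-zero) labels per example."""
    return sum(sum(1 for value in row if value != 0) for row in y) / len(y)


def label_imbalance_ratio(y: Sequence[Sequence[int]]) -> float:
    """Assumed definition: mean over labels of (highest label frequency / frequency of the label)."""
    num_labels = len(y[0])
    counts = [sum(1 for row in y if row[j] != 0) for j in range(num_labels)]
    counts = [c for c in counts if c > 0]  # Labels that never occur are skipped to avoid division by zero
    max_count = max(counts)
    return sum(max_count / c for c in counts) / len(counts)


y = [[1, 0, 0], [1, 1, 0], [1, 1, 0], [1, 0, 1]]
num_distinct = distinct_label_vectors(y)
cardinality = label_cardinality(y)
imbalance = label_imbalance_ratio(y)
```

In this example the labels occur 4, 2 and 1 times, which gives per-label ratios of 1, 2 and 4.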

mlrl.testbed.evaluation module

Author: Michael Rapp (michael.rapp.ml@gmail.com)

Provides classes for evaluating the predictions or rankings provided by a multi-label learner according to different measures. The evaluation results can be written to one or several outputs, e.g. to the console or to a file.

class mlrl.testbed.evaluation.AbstractEvaluation(*args: mlrl.testbed.evaluation.EvaluationOutput)

Bases: mlrl.testbed.evaluation.Evaluation

An abstract base class for all classes that evaluate the predictions provided by a classifier or ranker and allow writing the results to one or several outputs.

evaluate(experiment_name: str, meta_data: mlrl.testbed.data.MetaData, predictions, ground_truth, first_fold: int, current_fold: int, last_fold: int, num_folds: int, train_time: float, predict_time: float)

Evaluates the predictions provided by a classifier or ranker.

Parameters

experiment_name – The name of the experiment
meta_data – The meta data of the data set
predictions – The predictions provided by the classifier
ground_truth – The true labels
first_fold – The first cross validation fold or 0, if no cross validation is used
current_fold – The current cross validation fold starting at 0, or 0 if no cross validation is used
last_fold – The last cross validation fold or 0, if no cross validation is used
num_folds – The total number of cross validation folds or 1, if no cross validation is used
train_time – The time needed to train the model
predict_time – The time needed to make predictions

class mlrl.testbed.evaluation.ClassificationEvaluation(*args: mlrl.testbed.evaluation.EvaluationOutput)

Bases: mlrl.testbed.evaluation.AbstractEvaluation

Evaluates the predictions of a single- or multi-label classifier according to commonly used bipartition measures.
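
Two measures that are commonly used to compare predicted and true label vectors are the Hamming loss and the subset 0/1 loss. The sketch below computes both on nested lists. Whether this class reports exactly these measures is not stated here, so they serve only as examples of bipartition measures.

```python
from typing import Sequence


def hamming_loss(ground_truth: Sequence[Sequence[int]], predictions: Sequence[Sequence[int]]) -> float:
    """Fraction of individual labels that are predicted incorrectly."""
    num_labels = len(ground_truth[0])
    errors = sum(t != p
                 for true_row, predicted_row in zip(ground_truth, predictions)
                 for t, p in zip(true_row, predicted_row))
    return errors / (len(ground_truth) * num_labels)


def subset_zero_one_loss(ground_truth: Sequence[Sequence[int]], predictions: Sequence[Sequence[int]]) -> float:
    """Fraction of examples whose predicted label vector differs from the true one in any label."""
    mismatches = sum(list(t) != list(p) for t, p in zip(ground_truth, predictions))
    return mismatches / len(ground_truth)


ground_truth = [[1, 0, 1], [0, 1, 0]]
predictions = [[1, 0, 0], [0, 1, 0]]
hl = hamming_loss(ground_truth, predictions)
szo = subset_zero_one_loss(ground_truth, predictions)
```

A single wrong label out of six yields a Hamming loss of 1/6, whereas the subset 0/1 loss counts the whole first example as wrong and is therefore 0.5.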

class mlrl.testbed.evaluation.Evaluation

Bases: abc.ABC

An abstract base class for all classes that evaluate the predictions provided by a classifier or ranker.

abstract evaluate(experiment_name: str, meta_data: mlrl.testbed.data.MetaData, predictions, ground_truth, first_fold: int, current_fold: int, last_fold: int, num_folds: int, train_time: float, predict_time: float)

Evaluates the predictions provided by a classifier or ranker.

Parameters

experiment_name – The name of the experiment
meta_data – The meta data of the data set
predictions – The predictions provided by the classifier
ground_truth – The true labels
first_fold – The first cross validation fold or 0, if no cross validation is used
current_fold – The current cross validation fold starting at 0, or 0 if no cross validation is used
last_fold – The last cross validation fold or 0, if no cross validation is used
num_folds – The total number of cross validation folds or 1, if no cross validation is used
train_time – The time needed to train the model
predict_time – The time needed to make predictions

class mlrl.testbed.evaluation.EvaluationCsvOutput(output_dir: str, clear_dir: bool = True, output_predictions: bool = False, output_individual_folds: bool = True)

Bases: mlrl.testbed.evaluation.EvaluationOutput

Writes evaluation results to CSV files.

write_evaluation_results(experiment_name: str, evaluation_result: mlrl.testbed.evaluation.EvaluationResult, total_folds: int, fold: Optional[int] = None)

Writes an evaluation result to the output.

Parameters

experiment_name – The name of the experiment
evaluation_result – The evaluation result to be written
total_folds – The total number of folds
fold – The fold for which the results should be written or None, if no cross validation is used or if the overall results, averaged over all folds, should be written

write_predictions(experiment_name: str, meta_data: mlrl.testbed.data.MetaData, predictions, ground_truth, total_folds: int, fold: Optional[int] = None)

Writes predictions to the output.

Parameters

experiment_name – The name of the experiment
meta_data – The meta data of the data set
predictions – The predictions
ground_truth – The ground truth
total_folds – The total number of folds
fold – The fold for which the predictions should be written or None, if no cross validation is used

class mlrl.testbed.evaluation.EvaluationLogOutput(output_predictions: bool = False, output_individual_folds: bool = True)

Bases: mlrl.testbed.evaluation.EvaluationOutput

Outputs evaluation results using the logger.

write_evaluation_results(experiment_name: str, evaluation_result: mlrl.testbed.evaluation.EvaluationResult, total_folds: int, fold: Optional[int] = None)

Writes an evaluation result to the output.

Parameters

experiment_name – The name of the experiment
evaluation_result – The evaluation result to be written
total_folds – The total number of folds
fold – The fold for which the results should be written or None, if no cross validation is used or if the overall results, averaged over all folds, should be written

write_predictions(experiment_name: str, meta_data: mlrl.testbed.data.MetaData, predictions, ground_truth, total_folds: int, fold: Optional[int] = None)

Writes predictions to the output.

Parameters

experiment_name – The name of the experiment
meta_data – The meta data of the data set
predictions – The predictions
ground_truth – The ground truth
total_folds – The total number of folds
fold – The fold for which the predictions should be written or None, if no cross validation is used

class mlrl.testbed.evaluation.EvaluationOutput(output_predictions: bool, output_individual_folds: bool)

Bases: abc.ABC

An abstract base class for all outputs that evaluation results may be written to.

abstract write_evaluation_results(experiment_name: str, evaluation_result: mlrl.testbed.evaluation.EvaluationResult, total_folds: int, fold: Optional[int] = None)

Writes an evaluation result to the output.

Parameters

experiment_name – The name of the experiment
evaluation_result – The evaluation result to be written
total_folds – The total number of folds
fold – The fold for which the results should be written or None, if no cross validation is used or if the overall results, averaged over all folds, should be written

abstract write_predictions(experiment_name: str, meta_data: mlrl.testbed.data.MetaData, predictions, ground_truth, total_folds: int, fold: Optional[int] = None)

Writes predictions to the output.

Parameters

experiment_name – The name of the experiment
meta_data – The meta data of the data set
predictions – The predictions
ground_truth – The ground truth
total_folds – The total number of folds
fold – The fold for which the predictions should be written or None, if no cross validation is used
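
The relationship between an evaluation and its outputs can be sketched with a small abstract base class and one concrete output. All names in this example (ResultOutput, InMemoryOutput, Evaluator, report) are hypothetical, and results are simplified to a dictionary of scores.

```python
from abc import ABC, abstractmethod
from typing import Dict, List, Optional


class ResultOutput(ABC):
    """Hypothetical counterpart of an output that evaluation results may be written to."""

    @abstractmethod
    def write_evaluation_results(self, experiment_name: str, scores: Dict[str, float], total_folds: int,
                                 fold: Optional[int] = None):
        pass


class InMemoryOutput(ResultOutput):
    """Collects formatted lines instead of logging them or writing them to a file."""

    def __init__(self):
        self.lines: List[str] = []

    def write_evaluation_results(self, experiment_name, scores, total_folds, fold=None):
        # A fold of None stands for results that do not belong to an individual fold
        scope = 'overall' if fold is None else f'fold {fold} of {total_folds}'
        for name, score in sorted(scores.items()):
            self.lines.append(f'{experiment_name} ({scope}): {name} = {score}')


class Evaluator:
    """Forwards each result to every output it was constructed with."""

    def __init__(self, *outputs: ResultOutput):
        self.outputs = outputs

    def report(self, experiment_name, scores, total_folds, fold=None):
        for output in self.outputs:
            output.write_evaluation_results(experiment_name, scores, total_folds, fold)


first_output, second_output = InMemoryOutput(), InMemoryOutput()
Evaluator(first_output, second_output).report('demo', {'Hamming Loss': 0.25}, total_folds=1)
```

Passing the outputs as variadic arguments mirrors the *args constructor signatures of the evaluation classes and allows console and file outputs to be combined freely.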

class mlrl.testbed.evaluation.EvaluationResult

Bases: object

Stores the evaluation results according to different measures.

avg(name: str) -> (<class 'float'>, <class 'float'>)

Returns the score and standard deviation according to a specific measure averaged over all available folds.

Parameters: name – The name of the measure
Returns: A tuple consisting of the averaged score and standard deviation

avg_dict() → Dict

dict(fold: int) → Dict

get(name: str, fold: int) → float

Returns the score according to a specific measure and fold.

Parameters

name – The name of the measure
fold – The fold the score corresponds to

Returns

The score

put(name: str, score: float, fold: int, num_folds: int)

Adds a new score according to a specific measure to the evaluation result.

Parameters

name – The name of the measure
score – The score according to the measure
fold – The fold the score corresponds to
num_folds – The total number of cross validation folds
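
How per-fold scores might be stored and averaged is sketched below. The class FoldScores is a hypothetical stand-in for the documented methods put, get and avg. Its internal layout and the use of the population standard deviation are assumptions.

```python
import statistics
from typing import Dict, List, Optional, Tuple


class FoldScores:
    """Hypothetical stand-in that keeps one list of per-fold scores for each measure."""

    def __init__(self):
        self.scores: Dict[str, List[Optional[float]]] = {}

    def put(self, name: str, score: float, fold: int, num_folds: int):
        # Allocate one slot per fold the first time a measure is seen
        values = self.scores.setdefault(name, [None] * num_folds)
        values[fold] = score

    def get(self, name: str, fold: int) -> float:
        return self.scores[name][fold]

    def avg(self, name: str) -> Tuple[float, float]:
        # Folds without a score are ignored when averaging
        values = [v for v in self.scores[name] if v is not None]
        return statistics.mean(values), statistics.pstdev(values)


result = FoldScores()
result.put('Hamming Loss', 0.25, fold=0, num_folds=2)
result.put('Hamming Loss', 0.75, fold=1, num_folds=2)
mean, std = result.avg('Hamming Loss')
```

Passing num_folds with every call to put allows the storage to be sized on first use, which may be the reason the documented signature includes it.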

class mlrl.testbed.evaluation.RankingEvaluation(*args: mlrl.testbed.evaluation.EvaluationOutput)

Bases: mlrl.testbed.evaluation.AbstractEvaluation

Evaluates the predictions of a multi-label ranker according to commonly used ranking measures.

mlrl.testbed.experiments module

Author: Michael Rapp (michael.rapp.ml@gmail.com)

Provides classes for performing experiments.

class mlrl.testbed.experiments.Experiment(base_learner: mlrl.common.learners.Learner, data_set: mlrl.testbed.training.DataSet, num_folds: int = 1, current_fold: int = -1, train_evaluation: Optional[mlrl.testbed.evaluation.Evaluation] = None, test_evaluation: Optional[mlrl.testbed.evaluation.Evaluation] = None, parameter_input: Optional[mlrl.testbed.parameters.ParameterInput] = None, model_printer: Optional[mlrl.testbed.model_characteristics.ModelPrinter] = None, model_characteristics_printer: Optional[mlrl.testbed.model_characteristics.ModelCharacteristicsPrinter] = None, data_characteristics_printer: Optional[mlrl.testbed.data_characteristics.DataCharacteristicsPrinter] = None, persistence: Optional[mlrl.testbed.persistence.ModelPersistence] = None)

Bases: mlrl.testbed.training.CrossValidation, abc.ABC

An experiment that trains and evaluates a single multi-label classifier or ranker on a specific data set using cross validation or separate training and test sets.

run()

mlrl.testbed.interfaces module

Author: Michael Rapp (michael.rapp.ml@gmail.com)

Provides common interfaces that are implemented by several classes.

class mlrl.testbed.interfaces.Randomized

Bases: abc.ABC

A base class for all classifiers, rankers or modules that use RNGs.

Attributes: random_state – The seed to be used by RNGs

random_state: int = 1
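
A class-level seed with a default value of 1 can be used as sketched below. The base class is re-declared here as a stand-in so that the example runs without the library, and ShuffledIndices is a hypothetical subclass.

```python
import random
from abc import ABC
from typing import List


class Randomized(ABC):
    """Stand-in for the documented base class: exposes a seed with a default of 1."""
    random_state: int = 1


class ShuffledIndices(Randomized):
    """Hypothetical subclass that derives all of its randomness from `random_state`."""

    def sample(self, num_examples: int) -> List[int]:
        # A fresh generator per call makes the result depend on the seed only
        rng = random.Random(self.random_state)
        indices = list(range(num_examples))
        rng.shuffle(indices)
        return indices


first = ShuffledIndices().sample(5)
second = ShuffledIndices().sample(5)
```

Both calls use the default seed and therefore return the same permutation, which is what makes experiments repeatable.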

mlrl.testbed.io module

Author: Michael Rapp (michael.rapp.ml@gmail.com)

Provides functions for writing and reading files.

mlrl.testbed.io.clear_directory(directory: str)

Deletes all files contained in a directory (excluding subdirectories).

Parameters: directory – The directory to be cleared
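
The documented behavior (files are deleted, subdirectories are kept) can be sketched with the standard library. The function below is an illustrative re-implementation under that reading, not the library's code.

```python
import os
import tempfile


def clear_directory(directory: str):
    """Deletes all regular files directly inside `directory` and leaves subdirectories untouched."""
    for name in os.listdir(directory):
        path = os.path.join(directory, name)
        if os.path.isfile(path):
            os.remove(path)


with tempfile.TemporaryDirectory() as tmp:
    # Create one file and one subdirectory, then clear the directory
    open(os.path.join(tmp, 'results.csv'), 'w').close()
    os.mkdir(os.path.join(tmp, 'models'))
    clear_directory(tmp)
    remaining = os.listdir(tmp)
```

After the call only the subdirectory is left, which is what the clear_dir option of the file-based outputs relies on, assuming it delegates to this function.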

mlrl.testbed.io.create_csv_dict_reader(csv_file) → csv.DictReader

Creates and returns a DictReader that allows reading from a CSV file.

Parameters: csv_file – The CSV file
Returns: The ‘DictReader’ that has been created

mlrl.testbed.io.create_csv_dict_writer(csv_file, header) → csv.DictWriter

Creates and returns a DictWriter that allows writing a dictionary to a CSV file.

Parameters

csv_file – The CSV file
header – A list that contains the headers of the CSV file. They must correspond to the keys in the dictionary that should be written to the file

Returns

The DictWriter that has been created
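
The round trip through csv.DictWriter and csv.DictReader looks as follows when the standard library is used directly. An in-memory buffer stands in for the CSV file, and the column names are made up for the example. Delimiter and quoting settings used by the library are not documented here and are left at their defaults.

```python
import csv
import io

header = ['Measure', 'Score']
rows = [{'Measure': 'Hamming Loss', 'Score': 0.25},
        {'Measure': 'Subset 0/1 Loss', 'Score': 0.5}]

# The keys of each dictionary must match the header passed as `fieldnames`
csv_file = io.StringIO(newline='')
writer = csv.DictWriter(csv_file, fieldnames=header)
writer.writeheader()
writer.writerows(rows)

# Rewind the buffer and read the rows back; all values are returned as strings
csv_file.seek(0)
reader = csv.DictReader(csv_file)
read_rows = list(reader)
```

A dictionary with a key that is missing from the header makes DictWriter raise a ValueError by default, which is why the header and the keys must correspond.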

mlrl.testbed.io.get_file_name(name: str, suffix: str)

Returns a file name, including a suffix.

Parameters

name – The name of the file (without suffix)
suffix – The suffix of the file

Returns

The file name

mlrl.testbed.io.get_file_name_per_fold(name: str, suffix: str, fold: int)

Returns a file name, including a suffix, that corresponds to a certain fold.

Parameters

name – The name of the file (without suffix)
suffix – The suffix of the file
fold – The cross validation fold the file corresponds to, or None if the file does not correspond to a specific fold

Returns

The file name

mlrl.testbed.io.open_readable_csv_file(directory: str, file_name: str, fold: int)

Opens a CSV file to be read from.

Parameters

directory – The directory where the file is located
file_name – The name of the file to be opened (without suffix)
fold – The cross validation fold the file corresponds to, or None if the file does not correspond to a specific fold

Returns

The file that has been opened

mlrl.testbed.io.open_writable_csv_file(directory: str, file_name: str, fold: Optional[int] = None, append: bool = False)

Opens a CSV file to be written to.

Parameters

directory – The directory where the file is located
file_name – The name of the file to be opened (without suffix)
fold – The cross validation fold the file corresponds to, or None if the file does not correspond to a specific fold
append – True if new data should be appended to the file when it already exists, False otherwise

Returns

The file that has been opened

mlrl.testbed.io.open_writable_txt_file(directory: str, file_name: str, fold: Optional[int] = None, append: bool = False)

Opens a text file to be written to.

Parameters

directory – The directory where the file is located
file_name – The name of the file to be opened (without suffix)
fold – The cross validation fold the file corresponds to, or None if the file does not correspond to a specific fold
append – True if new data should be appended to the file when it already exists, False otherwise

Returns

The file that has been opened

mlrl.testbed.io.write_xml_file(xml_file, root_element: xml.etree.ElementTree.Element, encoding='utf-8')

Writes an XML structure to a file.

Parameters

xml_file – The XML file
root_element – The root element of the XML structure
encoding – The encoding to be used
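
Writing an ElementTree structure to a file-like object can be sketched with the standard library alone. The helper write_xml, the pretty-printing via minidom and the element names are assumptions made for the example and need not match the library's output.

```python
import io
import xml.etree.ElementTree as ET
from xml.dom import minidom


def write_xml(xml_file, root_element: ET.Element, encoding: str = 'utf-8'):
    """Serializes the tree, re-indents it and writes the encoded bytes to a binary file object."""
    raw = ET.tostring(root_element, encoding)
    pretty = minidom.parseString(raw).toprettyxml(indent='  ', encoding=encoding)
    xml_file.write(pretty)


# Illustrative structure with a single child element
root = ET.Element('labels')
ET.SubElement(root, 'label', name='label-1')

buffer = io.BytesIO()
write_xml(buffer, root)
xml_text = buffer.getvalue().decode('utf-8')
```

Because toprettyxml returns bytes when an encoding is given, the target must be opened in binary mode.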

mlrl.testbed.main_boomer module

Author: Michael Rapp (michael.rapp.ml@gmail.com)

class mlrl.testbed.main_boomer.BoomerRunnable

Bases: mlrl.testbed.runnables.RuleLearnerRunnable

mlrl.testbed.main_boomer.main()

mlrl.testbed.main_seco module

Author: Michael Rapp (michael.rapp.ml@gmail.com)

class mlrl.testbed.main_seco.SeCoRunnable

Bases: mlrl.testbed.runnables.RuleLearnerRunnable

mlrl.testbed.main_seco.main()

mlrl.testbed.model_characteristics module

Author: Michael Rapp (michael.rapp.ml@gmail.com)

Provides classes for printing textual representations of models. The models can be written to one or several outputs, e.g. to the console or to a file.

class mlrl.testbed.model_characteristics.ModelCharacteristicsPrinter

Bases: abc.ABC

A class that allows printing the characteristics of a Learner’s model.

print(experiment_name: str, learner: mlrl.common.learners.Learner, current_fold: int, num_folds: int)

class mlrl.testbed.model_characteristics.ModelPrinter(print_options: str, outputs: List[mlrl.testbed.model_characteristics.ModelPrinterOutput])

Bases: abc.ABC

An abstract base class for all classes that allow printing a textual representation of a Learner’s model.

print(experiment_name: str, meta_data: mlrl.testbed.data.MetaData, learner: mlrl.common.learners.Learner, current_fold: int, num_folds: int)

Prints a textual representation of a Learner’s model.

Parameters

experiment_name – The name of the experiment
meta_data – The meta data of the training data set
learner – The learner
current_fold – The current cross validation fold starting at 0, or 0 if no cross validation is used
num_folds – The total number of cross validation folds or 1, if no cross validation is used

class mlrl.testbed.model_characteristics.ModelPrinterLogOutput

Bases: mlrl.testbed.model_characteristics.ModelPrinterOutput

Outputs the textual representation of a model using the logger.

write_model(experiment_name: str, model: str, total_folds: int, fold: Optional[int] = None)

Writes a textual representation of a model to the output.

Parameters

experiment_name – The name of the experiment
model – The textual representation of the model
total_folds – The total number of folds
fold – The fold for which the results should be written or None, if no cross validation is used or if the overall results, averaged over all folds, should be written

class mlrl.testbed.model_characteristics.ModelPrinterOutput

Bases: abc.ABC

An abstract base class for all outputs that textual representations of models may be written to.

abstract write_model(experiment_name: str, model: str, total_folds: int, fold: Optional[int] = None)

Writes a textual representation of a model to the output.

Parameters

experiment_name – The name of the experiment
model – The textual representation of the model
total_folds – The total number of folds
fold – The fold for which the results should be written or None, if no cross validation is used or if the overall results, averaged over all folds, should be written
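A minimal sketch of how a custom output could implement this interface. To stay self-contained, ModelPrinterOutputSketch is a local stand-in that mirrors only the documented write_model signature; it is not the real mlrl.testbed class, and InMemoryModelOutput and the model strings are hypothetical.

```python
from abc import ABC, abstractmethod
from typing import Dict, Optional


class ModelPrinterOutputSketch(ABC):
    """Stand-in that mirrors the documented write_model signature (assumption)."""

    @abstractmethod
    def write_model(self, experiment_name: str, model: str, total_folds: int,
                    fold: Optional[int] = None):
        pass


class InMemoryModelOutput(ModelPrinterOutputSketch):
    """Hypothetical output that keeps textual representations in a dictionary."""

    def __init__(self):
        self.models: Dict[Optional[int], str] = {}

    def write_model(self, experiment_name: str, model: str, total_folds: int,
                    fold: Optional[int] = None):
        # `fold` is None if no cross validation is used or for overall results
        self.models[fold] = model


output = InMemoryModelOutput()
output.write_model('demo', 'overall model text', total_folds=1)
output.write_model('demo', 'model text for fold 0', total_folds=2, fold=0)
```

Keying the dictionary by fold keeps the None case (no cross validation, or overall results) separate from the per-fold entries.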

class mlrl.testbed.model_characteristics.ModelPrinterTxtOutput(output_dir: str, clear_dir: bool = True)

Bases: mlrl.testbed.model_characteristics.ModelPrinterOutput

Writes the textual representation of a model to a text file.

write_model(experiment_name: str, model: str, total_folds: int, fold: Optional[int] = None)

Writes a textual representation of a model to the output.

Parameters

experiment_name – The name of the experiment
model – The textual representation of the model
total_folds – The total number of folds
fold – The fold for which the results should be written or None, if no cross validation is used or if the overall results, averaged over all folds, should be written

class mlrl.testbed.model_characteristics.RuleModelCharacteristics(default_rule_index: int, default_rule_pos_predictions: int, default_rule_neg_predictions: int, num_leq: numpy.ndarray, num_gr: numpy.ndarray, num_eq: numpy.ndarray, num_neq: numpy.ndarray, num_pos_predictions: numpy.ndarray, num_neg_predictions: numpy.ndarray)

Bases: object

Stores the characteristics of a RuleModel.

class mlrl.testbed.model_characteristics.RuleModelCharacteristicsCsvOutput(output_dir: str, clear_dir: bool = True)

Bases: mlrl.testbed.model_characteristics.RuleModelCharacteristicsOutput

Writes the characteristics of a RuleModel to a CSV file.

COL_CONDITIONS = 'conditions'

COL_EQ_CONDITIONS = 'conditions using == operator'

COL_GR_CONDITIONS = 'conditions using > operator'

COL_LEQ_CONDITIONS = 'conditions using <= operator'

COL_NEG_PREDICTIONS = 'neg. predictions'

COL_NEQ_CONDITIONS = 'conditions using != operator'

COL_NOMINAL_CONDITIONS = 'nominal conditions'

COL_NUMERICAL_CONDITIONS = 'numerical conditions'

COL_POS_PREDICTIONS = 'pos. predictions'

COL_PREDICTIONS = 'predictions'

COL_RULE_NAME = 'Rule'

write_model_characteristics(experiment_name: str, characteristics: mlrl.testbed.model_characteristics.RuleModelCharacteristics, total_folds: int, fold: Optional[int] = None)

Writes the characteristics of a RuleModel to the output.

Parameters

experiment_name – The name of the experiment
characteristics – The characteristics of the model
total_folds – The total number of folds
fold – The fold for which the characteristics should be written or None, if no cross validation is used
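The following sketch illustrates how per-rule condition counts could be written under some of the column names listed above. It uses only the standard csv module; the row layout, the rule naming and the function write_rows are assumptions for illustration, not the library's actual file format.

```python
import csv
import io

# Column names as documented for RuleModelCharacteristicsCsvOutput
COL_RULE_NAME = 'Rule'
COL_CONDITIONS = 'conditions'
COL_LEQ_CONDITIONS = 'conditions using <= operator'
COL_GR_CONDITIONS = 'conditions using > operator'
COL_EQ_CONDITIONS = 'conditions using == operator'
COL_NEQ_CONDITIONS = 'conditions using != operator'


def write_rows(num_leq, num_gr, num_eq, num_neq) -> str:
    """Writes one row per rule to a CSV text (hypothetical layout)."""
    buffer = io.StringIO()
    header = [COL_RULE_NAME, COL_CONDITIONS, COL_LEQ_CONDITIONS,
              COL_GR_CONDITIONS, COL_EQ_CONDITIONS, COL_NEQ_CONDITIONS]
    writer = csv.DictWriter(buffer, fieldnames=header, lineterminator='\n')
    writer.writeheader()

    for i, (leq, gr, eq, neq) in enumerate(zip(num_leq, num_gr, num_eq, num_neq)):
        writer.writerow({
            COL_RULE_NAME: 'Rule ' + str(i + 1),
            # The total number of conditions is the sum over all operators
            COL_CONDITIONS: leq + gr + eq + neq,
            COL_LEQ_CONDITIONS: leq,
            COL_GR_CONDITIONS: gr,
            COL_EQ_CONDITIONS: eq,
            COL_NEQ_CONDITIONS: neq,
        })

    return buffer.getvalue()


csv_text = write_rows(num_leq=[1, 0], num_gr=[2, 1], num_eq=[0, 1], num_neq=[0, 0])
```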

class mlrl.testbed.model_characteristics.RuleModelCharacteristicsLogOutput

Bases: mlrl.testbed.model_characteristics.RuleModelCharacteristicsOutput

Outputs the characteristics of a RuleModel using the logger.

write_model_characteristics(experiment_name: str, characteristics: mlrl.testbed.model_characteristics.RuleModelCharacteristics, total_folds: int, fold: Optional[int] = None)

Writes the characteristics of a RuleModel to the output.

Parameters

experiment_name – The name of the experiment
characteristics – The characteristics of the model
total_folds – The total number of folds
fold – The fold for which the characteristics should be written or None, if no cross validation is used

class mlrl.testbed.model_characteristics.RuleModelCharacteristicsOutput

Bases: abc.ABC

An abstract base class for all outputs that the characteristics of an MLRuleLearner’s model may be written to.

abstract write_model_characteristics(experiment_name: str, characteristics: mlrl.testbed.model_characteristics.RuleModelCharacteristics, total_folds: int, fold: Optional[int] = None)

Writes the characteristics of a RuleModel to the output.

Parameters

experiment_name – The name of the experiment
characteristics – The characteristics of the model
total_folds – The total number of folds
fold – The fold for which the characteristics should be written or None, if no cross validation is used

class mlrl.testbed.model_characteristics.RuleModelCharacteristicsPrinter(outputs: List[mlrl.testbed.model_characteristics.RuleModelCharacteristicsOutput])

Bases: mlrl.testbed.model_characteristics.ModelCharacteristicsPrinter

A class that allows printing the characteristics of an MLRuleLearner’s model.

class mlrl.testbed.model_characteristics.RuleModelCharacteristicsVisitor

Bases: mlrl.common.cython.rule_model.RuleModelVisitor

A visitor that determines the characteristics of a RuleModel.

visit_complete_head(head: mlrl.common.cython.rule_model.CompleteHead)

Must be implemented by subclasses in order to visit the heads of rules that predict for all available labels.

Parameters: head – A CompleteHead to be visited

visit_conjunctive_body(body: mlrl.common.cython.rule_model.ConjunctiveBody)

Must be implemented by subclasses in order to visit the bodies of rules that are given as a conjunction of several conditions.

Parameters: body – A ConjunctiveBody to be visited

visit_empty_body(_: mlrl.common.cython.rule_model.EmptyBody)

Must be implemented by subclasses in order to visit bodies of rules that do not contain any conditions.

Parameters: body – An EmptyBody to be visited

visit_partial_head(head: mlrl.common.cython.rule_model.PartialHead)

Must be implemented by subclasses in order to visit the heads of rules that predict for a subset of the available labels.

Parameters: head – A PartialHead to be visited

class mlrl.testbed.model_characteristics.RuleModelFormatter(attributes: List[mlrl.testbed.data.Attribute], labels: List[mlrl.testbed.data.Attribute], print_feature_names: bool, print_label_names: bool, print_nominal_values: bool)

Bases: mlrl.common.cython.rule_model.RuleModelVisitor

Allows creating a textual representation of the rules in a RuleModel.

get_text() → str

Returns the textual representation that has been created via the format method.

Returns: The textual representation

visit_complete_head(head: mlrl.common.cython.rule_model.CompleteHead)

Must be implemented by subclasses in order to visit the heads of rules that predict for all available labels.

Parameters: head – A CompleteHead to be visited

visit_conjunctive_body(body: mlrl.common.cython.rule_model.ConjunctiveBody)

Must be implemented by subclasses in order to visit the bodies of rules that are given as a conjunction of several conditions.

Parameters: body – A ConjunctiveBody to be visited

visit_empty_body(_: mlrl.common.cython.rule_model.EmptyBody)

Must be implemented by subclasses in order to visit bodies of rules that do not contain any conditions.

Parameters: body – An EmptyBody to be visited

visit_partial_head(head: mlrl.common.cython.rule_model.PartialHead)

Must be implemented by subclasses in order to visit the heads of rules that predict for a subset of the available labels.

Parameters: head – A PartialHead to be visited
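The visitor methods and get_text described above follow the visitor pattern: each visit call contributes a piece of text, and get_text returns what has been collected. The sketch below shows the idea with local stand-in classes. The attributes conditions and scores, the class names ending in Sketch, and the output format are assumptions; the real bodies and heads live in mlrl.common.cython.rule_model and may expose different data.

```python
from typing import List


class EmptyBodySketch:
    """Stand-in for a rule body that does not contain any conditions."""


class ConjunctiveBodySketch:
    """Stand-in for a rule body given as a conjunction of conditions."""

    def __init__(self, conditions: List[str]):
        self.conditions = conditions


class CompleteHeadSketch:
    """Stand-in for a rule head that predicts for all available labels."""

    def __init__(self, scores: List[float]):
        self.scores = scores


class FormatterSketch:
    """Hypothetical visitor that collects text while bodies and heads are visited."""

    def __init__(self):
        self.text = ''

    def visit_empty_body(self, _: EmptyBodySketch):
        self.text += '{}'

    def visit_conjunctive_body(self, body: ConjunctiveBodySketch):
        self.text += '{' + ' & '.join(body.conditions) + '}'

    def visit_complete_head(self, head: CompleteHeadSketch):
        # A head closes the current rule, so a line break is appended
        self.text += ' => (' + ', '.join(str(s) for s in head.scores) + ')\n'

    def get_text(self) -> str:
        return self.text


formatter = FormatterSketch()
# First rule: empty body, e.g. a default rule
formatter.visit_empty_body(EmptyBodySketch())
formatter.visit_complete_head(CompleteHeadSketch([0.5, -0.5]))
# Second rule: conjunction of two conditions
formatter.visit_conjunctive_body(ConjunctiveBodySketch(['f1 <= 1.0', 'f2 > 3.0']))
formatter.visit_complete_head(CompleteHeadSketch([1.0, 0.0]))
```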

class mlrl.testbed.model_characteristics.RulePrinter(print_options: str, outputs: List[mlrl.testbed.model_characteristics.ModelPrinterOutput])

Bases: mlrl.testbed.model_characteristics.ModelPrinter

Allows printing a textual representation of an MLRuleLearner’s rule-based model.

mlrl.testbed.parameters module

Author: Michael Rapp (michael.rapp.ml@gmail.com)

Provides classes for parameter tuning.

class mlrl.testbed.parameters.NestedCrossValidation(num_nested_folds: int)

Bases: mlrl.testbed.parameters.ParameterSearch

Allows searching for optimal parameters using (nested) cross validation.

search(meta_data: mlrl.testbed.data.MetaData, x, y, first_fold: int, current_fold: int, last_fold: int, num_folds: int)

Tests different parameter settings given a training data set.

Parameters

meta_data – The meta data of the training data set
x – The feature matrix of the training examples
y – The label matrix of the training examples
first_fold – The first fold or 0, if no cross validation is used
current_fold – The current fold starting at 0, or 0 if no cross validation is used
last_fold – The last fold or 0, if no cross validation is used
num_folds – The total number of cross validation folds or 1, if no cross validation is used

class mlrl.testbed.parameters.ParameterCsvInput(input_dir: str)

Bases: mlrl.testbed.parameters.ParameterInput

Reads parameter settings from CSV files.

read_parameters(fold: Optional[int] = None) → dict

Reads a parameter setting from the input.

Parameters: fold – The fold that the parameter setting corresponds to, or None if the parameter setting does not correspond to a specific fold
Returns: A dictionary that stores the parameters

class mlrl.testbed.parameters.ParameterCsvOutput(output_dir: str, clear_dir: bool = True)

Bases: mlrl.testbed.parameters.ParameterOutput

Writes parameter settings to CSV files.

write_parameters(parameters: dict, score: float, total_folds: int, fold: Optional[int] = None)

Writes a parameter setting to the output.

Parameters

parameters – A dictionary that stores the parameters
score – The evaluation score that has been achieved using the parameter setting
total_folds – The total number of folds
fold – The fold that the parameter setting corresponds to, or None if the parameter setting does not correspond to a specific fold
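As a rough illustration of what writing and reading a parameter setting as CSV can look like, the sketch below round-trips a dictionary through the standard csv module. The one-row layout, the function names and the parameter names max_rules and shrinkage are assumptions; the actual files produced by ParameterCsvOutput and consumed by ParameterCsvInput may be organized differently.

```python
import csv
import io


def write_parameters_sketch(parameters: dict) -> str:
    """Writes a parameter setting as a header row plus one data row (assumed layout)."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=sorted(parameters), lineterminator='\n')
    writer.writeheader()
    writer.writerow(parameters)
    return buffer.getvalue()


def read_parameters_sketch(csv_text: str) -> dict:
    """Reads the first data row back into a dictionary; all values are strings."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return dict(next(reader))


# Hypothetical parameter names, used only for illustration
csv_text = write_parameters_sketch({'shrinkage': 0.3, 'max_rules': 100})
parameters = read_parameters_sketch(csv_text)
```

Note that values read from CSV come back as strings, so a caller would have to convert them to the expected types.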

class mlrl.testbed.parameters.ParameterInput

Bases: abc.ABC

abstract read_parameters(fold: Optional[int] = None) → dict

Reads a parameter setting from the input.

Parameters: fold – The fold that the parameter setting corresponds to, or None if the parameter setting does not correspond to a specific fold
Returns: A dictionary that stores the parameters

class mlrl.testbed.parameters.ParameterLogOutput

Bases: mlrl.testbed.parameters.ParameterOutput

Outputs parameter settings using the logger.

write_parameters(parameters: dict, score: float, total_folds: int, fold: Optional[int] = None)

Writes a parameter setting to the output.

Parameters

parameters – A dictionary that stores the parameters
score – The evaluation score that has been achieved using the parameter setting
total_folds – The total number of folds
fold – The fold that the parameter setting corresponds to, or None if the parameter setting does not correspond to a specific fold

class mlrl.testbed.parameters.ParameterOutput

Bases: abc.ABC

An abstract base class for all outputs that parameter settings may be written to.

abstract write_parameters(parameters: dict, score: float, total_folds: int, fold: Optional[int] = None)

Writes a parameter setting to the output.

Parameters

parameters – A dictionary that stores the parameters
score – The evaluation score that has been achieved using the parameter setting
total_folds – The total number of folds
fold – The fold that the parameter setting corresponds to, or None if the parameter setting does not correspond to a specific fold

class mlrl.testbed.parameters.ParameterSearch

Bases: mlrl.testbed.interfaces.Randomized, abc.ABC

A base class for all classes that implement strategies to search for optimal parameters given a training data set.

abstract get_params()

Returns the best parameter setting tested so far.

Returns: A dictionary that stores the parameters

abstract get_score()

Returns the evaluation score that has been achieved using the best parameter setting.

Returns: An evaluation score

abstract search(meta_data: mlrl.testbed.data.MetaData, x, y, first_fold: int, current_fold: int, last_fold: int, num_folds: int)

Tests different parameter settings given a training data set.

Parameters

meta_data – The meta data of the training data set
x – The feature matrix of the training examples
y – The label matrix of the training examples
first_fold – The first fold or 0, if no cross validation is used
current_fold – The current fold starting at 0, or 0 if no cross validation is used
last_fold – The last fold or 0, if no cross validation is used
num_folds – The total number of cross validation folds or 1, if no cross validation is used
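The three abstract methods above suggest a simple contract: search tests candidate settings, while get_params and get_score report the best one found. The sketch below illustrates that contract with an exhaustive grid search. ParameterSearchSketch is a local stand-in, not the real class; GridSearchSketch, the score_function callback, the toy scoring rule and the assumption that larger scores are better are all hypothetical.

```python
from abc import ABC, abstractmethod
from itertools import product
from typing import Callable, Dict, List, Optional


class ParameterSearchSketch(ABC):
    """Stand-in that mirrors the documented abstract methods (assumption)."""

    @abstractmethod
    def search(self, meta_data, x, y, first_fold: int, current_fold: int,
               last_fold: int, num_folds: int):
        pass

    @abstractmethod
    def get_params(self):
        pass

    @abstractmethod
    def get_score(self):
        pass


class GridSearchSketch(ParameterSearchSketch):
    """Hypothetical search that scores every combination in a parameter grid."""

    def __init__(self, grid: Dict[str, List], score_function: Callable[[dict, object, object], float]):
        self.grid = grid
        self.score_function = score_function
        self.best_params: Optional[dict] = None
        self.best_score: Optional[float] = None

    def search(self, meta_data, x, y, first_fold: int, current_fold: int,
               last_fold: int, num_folds: int):
        names = sorted(self.grid)
        for values in product(*(self.grid[name] for name in names)):
            params = dict(zip(names, values))
            score = self.score_function(params, x, y)
            # Assumption: larger scores are better
            if self.best_score is None or score > self.best_score:
                self.best_params, self.best_score = params, score

    def get_params(self):
        return self.best_params

    def get_score(self):
        return self.best_score


def toy_score(params: dict, x, y) -> float:
    # Toy scoring rule that peaks at shrinkage=0.3 and max_rules=100
    return -abs(params['shrinkage'] - 0.3) - abs(params['max_rules'] - 100) / 1000


grid_search = GridSearchSketch({'shrinkage': [0.1, 0.3, 0.5], 'max_rules': [50, 100]}, toy_score)
# Arguments follow the documented convention for "no cross validation"
grid_search.search(None, x=[], y=[], first_fold=0, current_fold=0, last_fold=0, num_folds=1)
```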

class mlrl.testbed.parameters.ParameterTuning(data_set: mlrl.testbed.training.DataSet, num_folds: int, current_fold: int, parameter_search: mlrl.testbed.parameters.ParameterSearch, *args: mlrl.testbed.parameters.ParameterOutput)

Bases: mlrl.testbed.training.CrossValidation

Tunes parameters for a single training data set, or for all training data sets used in cross validation, using a ParameterSearch, and writes the optimal parameters to one or several outputs.

mlrl.testbed.persistence module

Author: Michael Rapp (michael.rapp.ml@gmail.com)

Provides classes for saving/loading models to/from disk.

class mlrl.testbed.persistence.ModelPersistence(model_dir: str)

Bases: object

Allows saving a model to a file and loading it later.

load_model(model_name: str, fold: Optional[int] = None, raise_exception: bool = False)

Loads a model from a file.

Parameters

model_name – The name of the model to be loaded
fold – The fold that the model corresponds to, or None if no cross validation is used
raise_exception – True if an exception should be raised when an error occurs, False if None should be returned in that case

Returns

The loaded model, or None if an error occurred and raise_exception is False

save_model(model, model_name: str, fold: Optional[int] = None)

Saves a model to a file.

Parameters

model – The model to be persisted
model_name – The name of the model to be persisted
fold – The fold that the model corresponds to, or None if no cross validation is used
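The documented behaviour of save_model and load_model (one file per model name and fold, None on failure unless raise_exception is set) can be sketched as follows. The use of pickle, the file naming scheme and the function names are assumptions made for illustration; the source does not state how ModelPersistence serializes models or names its files.

```python
import os
import pickle
import tempfile
from typing import Optional


def file_name_sketch(model_name: str, fold: Optional[int]) -> str:
    # Hypothetical naming scheme: the fold is appended only if one is given
    suffix = '' if fold is None else '_fold-' + str(fold + 1)
    return model_name + suffix + '.model'


def save_model_sketch(model_dir: str, model, model_name: str, fold: Optional[int] = None):
    path = os.path.join(model_dir, file_name_sketch(model_name, fold))
    with open(path, 'wb') as output_file:
        pickle.dump(model, output_file)


def load_model_sketch(model_dir: str, model_name: str, fold: Optional[int] = None,
                      raise_exception: bool = False):
    path = os.path.join(model_dir, file_name_sketch(model_name, fold))
    try:
        with open(path, 'rb') as input_file:
            return pickle.load(input_file)
    except OSError:
        # Mirrors the documented flag: re-raise or fall back to None
        if raise_exception:
            raise
        return None


with tempfile.TemporaryDirectory() as model_dir:
    save_model_sketch(model_dir, {'rules': 3}, 'demo', fold=0)
    loaded = load_model_sketch(model_dir, 'demo', fold=0)
    missing = load_model_sketch(model_dir, 'demo', fold=1)
```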

mlrl.testbed.runnables module

Author: Michael Rapp (michael.rapp.ml@gmail.com)

Provides base classes for programs that can be configured via command line arguments.

class mlrl.testbed.runnables.RuleLearnerRunnable

Bases: mlrl.testbed.runnables.Runnable, abc.ABC

A base class for all programs that perform an experiment that involves training and evaluation of a rule learner.

class mlrl.testbed.runnables.Runnable

Bases: abc.ABC

A base class for all programs that can be configured via command line arguments.

run(parser: argparse.ArgumentParser)
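A minimal sketch of how a program configured via command line arguments might be structured around run(parser). RunnableSketch is a local stand-in; the _run hook, the argv parameter, EchoRunnable and the --log-level option defined here are assumptions for illustration and are not taken from the library.

```python
import argparse
from abc import ABC, abstractmethod


class RunnableSketch(ABC):
    """Stand-in base class: parses the arguments, then delegates to a hook."""

    def run(self, parser: argparse.ArgumentParser, argv=None):
        # `argv` is a hypothetical extra that makes the sketch testable
        args = parser.parse_args(argv)
        return self._run(args)

    @abstractmethod
    def _run(self, args):
        pass


class EchoRunnable(RunnableSketch):
    """Hypothetical program that only reports the configured log level."""

    def _run(self, args):
        return 'log level: ' + args.log_level


parser = argparse.ArgumentParser(description='Hypothetical experiment')
parser.add_argument('--log-level', type=str, default='info')
result = EchoRunnable().run(parser, argv=['--log-level', 'debug'])
```

Separating argument parsing from the actual work lets every subclass share the same entry point while adding its own options to the parser.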

mlrl.testbed.training module

Author: Michael Rapp (michael.rapp.ml@gmail.com)

Provides classes for training and evaluating multi-label classifiers using either cross validation or separate training and test sets.

class mlrl.testbed.training.CrossValidation(data_set: mlrl.testbed.training.DataSet, num_folds: int, current_fold: int)

Bases: mlrl.testbed.interfaces.Randomized, abc.ABC

A base class for all classes that use cross validation or a train-test split to train and evaluate a multi-label classifier or ranker.

run()

class mlrl.testbed.training.DataSet(data_dir: str, data_set_name: str, use_one_hot_encoding: bool)

Bases: object

Stores the properties of a data set to be used for training and evaluating multi-label classifiers.
