mlrl.testbed package

Submodules

mlrl.testbed.bbc_cv module

Author: Michael Rapp (mrapp@ke.tu-darmstadt.de)

Implements “Bootstrap Bias Corrected Cross Validation” (BBC-CV) for evaluating different configurations of a learner and obtaining unbiased performance estimates (see https://link.springer.com/article/10.1007/s10994-018-5714-4).
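The following is a minimal sketch of the general idea behind BBC-CV as described in the referenced article, not the code of this module. It assumes that the out-of-sample predictions of all configurations have already been collected via cross validation. The names bbc_cv_estimate and zero_one_loss, as well as all parameters, are hypothetical and only serve as an illustration.

```python
import random
from typing import Callable, List, Sequence


def zero_one_loss(ground_truth: List[int], predictions: List[int]) -> float:
    """Fraction of examples that are predicted incorrectly."""
    errors = sum(1 for t, p in zip(ground_truth, predictions) if t != p)
    return errors / len(ground_truth)


def _loss_of(config: int, indices: List[int], prediction_matrix, ground_truth, loss) -> float:
    """Loss of a single configuration, restricted to the given examples."""
    truth = [ground_truth[i] for i in indices]
    predicted = [prediction_matrix[i][config] for i in indices]
    return loss(truth, predicted)


def bbc_cv_estimate(prediction_matrix: Sequence[Sequence[int]],
                    ground_truth: Sequence[int],
                    loss: Callable[[List[int], List[int]], float] = zero_one_loss,
                    num_bootstraps: int = 100,
                    random_state: int = 1) -> float:
    """
    prediction_matrix[i][c] is the out-of-sample prediction of configuration c
    for example i (hypothetical layout, chosen for this sketch).
    """
    rng = random.Random(random_state)
    num_examples = len(ground_truth)
    num_configs = len(prediction_matrix[0])
    test_losses = []

    for _ in range(num_bootstraps):
        # Examples sampled with replacement form the tuning set ...
        tuning = [rng.randrange(num_examples) for _ in range(num_examples)]
        in_bag = set(tuning)
        # ... and the examples that were never sampled form the test set
        test = [i for i in range(num_examples) if i not in in_bag]

        if not test:
            continue  # Skip the rare case without any out-of-bag examples

        # Select the configuration that performs best on the tuning set
        best = min(range(num_configs),
                   key=lambda c: _loss_of(c, tuning, prediction_matrix, ground_truth, loss))
        # Evaluate the selected configuration on the held-out examples
        test_losses.append(_loss_of(best, test, prediction_matrix, ground_truth, loss))

    if not test_losses:
        raise ValueError('No bootstrap iteration produced out-of-bag examples')

    return sum(test_losses) / len(test_losses)
```

Because the configuration is selected on the tuning set and scored on examples that did not take part in the selection, the averaged loss is not inflated by the selection step itself.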

class mlrl.testbed.bbc_cv.BbcCv(configurations: List[dict], adapter: mlrl.testbed.bbc_cv.BbcCvAdapter, bootstrapping: mlrl.testbed.bbc_cv.Bootstrapping, learner: mlrl.common.learners.Learner)

Bases: mlrl.testbed.interfaces.Randomized

An implementation of “Bootstrap Bias Corrected Cross Validation” (BBC-CV).

evaluate(observer: mlrl.testbed.bbc_cv.BbcCvObserver)
Parameters

observer – The BbcCvObserver to be used

store_predictions()
class mlrl.testbed.bbc_cv.BbcCvAdapter(data_set: mlrl.testbed.training.DataSet, num_folds: int, model_dir: str)

Bases: mlrl.testbed.training.CrossValidation

An adapter that must be implemented for each type of model to be used with BBC-CV to obtain predictions for given test examples.

fit(x, y)
predict(x)
run()
class mlrl.testbed.bbc_cv.BbcCvObserver

Bases: abc.ABC

A base class for all observers that should be notified about the predictions and ground truth labelings that result from applying the BBC-CV method.

abstract evaluate(configurations: List[dict], meta_data: mlrl.testbed.data.MetaData, ground_truth_tuning: numpy.ndarray, predictions_tuning: numpy.ndarray, ground_truth_test: numpy.ndarray, predictions_test: numpy.ndarray, current_bootstrap: int, num_bootstraps: int)
Parameters
  • configurations – The configurations that have been provided to the BBC-CV method

  • meta_data – The meta data of the data set

  • ground_truth_tuning – The ground truth of the examples that belong to the tuning set

  • predictions_tuning – The predictions for the examples that belong to the tuning set

  • ground_truth_test – The ground truth of the examples that belong to the test set

  • predictions_test – The predictions for the examples that belong to the test set

  • current_bootstrap – The current bootstrap iteration

  • num_bootstraps – The total number of bootstrap iterations

class mlrl.testbed.bbc_cv.Bootstrapping

Bases: mlrl.testbed.interfaces.Randomized

abstract bootstrap(meta_data: mlrl.testbed.data.MetaData, prediction_matrix, ground_truth_matrix, configurations: List[dict], observer: mlrl.testbed.bbc_cv.BbcCvObserver)
class mlrl.testbed.bbc_cv.CV(data_set: mlrl.testbed.training.DataSet, num_folds: int, prediction_matrix, ground_truth_matrix, configurations: List[dict], observer: mlrl.testbed.bbc_cv.BbcCvObserver)

Bases: mlrl.testbed.training.CrossValidation

class mlrl.testbed.bbc_cv.CVBootstrapping(data_set: mlrl.testbed.training.DataSet, num_folds: int)

Bases: mlrl.testbed.bbc_cv.Bootstrapping

bootstrap(meta_data: mlrl.testbed.data.MetaData, prediction_matrix, ground_truth_matrix, configurations: List[dict], observer: mlrl.testbed.bbc_cv.BbcCvObserver)
class mlrl.testbed.bbc_cv.DefaultBbcCvObserver(target_measure, target_measure_is_loss: bool, output_dir: Optional[str] = None)

Bases: mlrl.testbed.bbc_cv.BbcCvObserver

An observer that determines the best configuration per bootstrap iteration and computes the evaluation measures averaged over all iterations.

evaluate(configurations: List[dict], meta_data: mlrl.testbed.data.MetaData, ground_truth_tuning: numpy.ndarray, predictions_tuning: numpy.ndarray, ground_truth_test: numpy.ndarray, predictions_test: numpy.ndarray, current_bootstrap: int, num_bootstraps: int)
Parameters
  • configurations – The configurations that have been provided to the BBC-CV method

  • meta_data – The meta data of the data set

  • ground_truth_tuning – The ground truth of the examples that belong to the tuning set

  • predictions_tuning – The predictions for the examples that belong to the tuning set

  • ground_truth_test – The ground truth of the examples that belong to the test set

  • predictions_test – The predictions for the examples that belong to the test set

  • current_bootstrap – The current bootstrap iteration

  • num_bootstraps – The total number of bootstrap iterations

class mlrl.testbed.bbc_cv.DefaultBootstrapping(num_bootstraps: int)

Bases: mlrl.testbed.bbc_cv.Bootstrapping

bootstrap(meta_data: mlrl.testbed.data.MetaData, prediction_matrix, ground_truth_matrix, configurations: List[dict], observer: mlrl.testbed.bbc_cv.BbcCvObserver)

mlrl.testbed.data module

Author: Michael Rapp (mrapp@ke.tu-darmstadt.de)

Provides functions for handling multi-label data.

class mlrl.testbed.data.Attribute(attribute_name: str, attribute_type: mlrl.testbed.data.AttributeType, nominal_values: Optional[List[str]] = None)

Bases: object

Represents a numerical or nominal attribute that is contained by a data set.

class mlrl.testbed.data.AttributeType(value)

Bases: enum.Enum

All supported types of attributes.

NOMINAL = 2
NUMERIC = 1
class mlrl.testbed.data.Label(name: str)

Bases: mlrl.testbed.data.Attribute

Represents a label that is contained by a data set.

class mlrl.testbed.data.MetaData(attributes: List[mlrl.testbed.data.Attribute], labels: List[mlrl.testbed.data.Attribute], labels_at_start: bool)

Bases: object

Stores the meta data of a multi-label data set.

get_attribute_indices(attribute_type: Optional[mlrl.testbed.data.AttributeType] = None) List[int]

Returns a list that contains the indices of all attributes of a specific type (in ascending order).

Parameters

attribute_type – The type of the attributes whose indices should be returned or None, if all indices should be returned

Returns

A list that contains the indices of all attributes of the given type
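As an illustration of the filtering described above, the following sketch selects attribute indices by type. It does not use the classes of this package. The enum only mirrors the documented values of AttributeType, and the function attribute_indices as well as the (name, type) tuples are assumptions made for this example.

```python
from enum import Enum
from typing import List, Optional, Tuple


class AttributeType(Enum):
    """Stand-in that mirrors the documented enum values."""
    NUMERIC = 1
    NOMINAL = 2


def attribute_indices(attributes: List[Tuple[str, AttributeType]],
                      attribute_type: Optional[AttributeType] = None) -> List[int]:
    """Returns the indices of all attributes of the given type in ascending order."""
    return [index for index, (_, current_type) in enumerate(attributes)
            if attribute_type is None or current_type == attribute_type]


attributes = [('age', AttributeType.NUMERIC),
              ('color', AttributeType.NOMINAL),
              ('height', AttributeType.NUMERIC)]
nominal = attribute_indices(attributes, AttributeType.NOMINAL)
everything = attribute_indices(attributes)
```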

mlrl.testbed.data.load_data_set(data_dir: str, arff_file_name: str, meta_data: mlrl.testbed.data.MetaData, feature_dtype=<class 'numpy.float32'>, label_dtype=<class 'numpy.uint8'>) -> (<class 'scipy.sparse.lil.lil_matrix'>, <class 'scipy.sparse.lil.lil_matrix'>)

Loads a multi-label data set from an ARFF file given its meta data.

Parameters
  • data_dir – The path of the directory that contains the ARFF file

  • arff_file_name – The name of the ARFF file (including the suffix)

  • meta_data – The meta data

  • feature_dtype – The requested dtype of the feature matrix

  • label_dtype – The requested dtype of the label matrix

Returns

A scipy.sparse.lil_matrix of type feature_dtype, shape (num_examples, num_features), representing the feature values of the examples, as well as a scipy.sparse.lil_matrix of type label_dtype, shape (num_examples, num_labels), representing the corresponding label vectors

mlrl.testbed.data.load_data_set_and_meta_data(data_dir: str, arff_file_name: str, xml_file_name: str, feature_dtype=<class 'numpy.float32'>, label_dtype=<class 'numpy.uint8'>) -> (<class 'scipy.sparse.lil.lil_matrix'>, <class 'scipy.sparse.lil.lil_matrix'>, <class 'mlrl.testbed.data.MetaData'>)

Loads a multi-label data set from an ARFF file and the corresponding Mulan XML file.

Parameters
  • data_dir – The path of the directory that contains the files

  • arff_file_name – The name of the ARFF file (including the suffix)

  • xml_file_name – The name of the XML file (including the suffix)

  • feature_dtype – The requested type of the feature matrix

  • label_dtype – The requested type of the label matrix

Returns

A scipy.sparse.lil_matrix of type feature_dtype, shape (num_examples, num_features), representing the feature values of the examples, a scipy.sparse.lil_matrix of type label_dtype, shape (num_examples, num_labels), representing the corresponding label vectors, as well as the data set’s meta data

mlrl.testbed.data.one_hot_encode(x, y, meta_data: mlrl.testbed.data.MetaData, encoder=None)

One-hot encodes the nominal attributes contained in a data set, if any.

If the given feature matrix is sparse, it will be converted into a dense matrix. In addition, an updated variant of the given meta data, where the attributes have been removed, will be returned, as the original attributes become invalid once one-hot encoding has been applied.

Parameters
  • x – A np.ndarray or scipy.sparse.matrix, shape (num_examples, num_features), representing the features of the examples in the data set

  • y – A np.ndarray or scipy.sparse.matrix, shape (num_examples, num_labels), representing the labels of the examples in the data set

  • meta_data – The meta data of the data set

  • encoder – The ‘ColumnTransformer’ to be used or None, if a new encoder should be created

Returns

A np.ndarray, shape (num_examples, num_encoded_features), representing the encoded features of the given examples, the encoder that has been used, as well as the updated meta data
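To show what one-hot encoding does to nominal attributes, here is a small pure-Python sketch. It is not the implementation of one_hot_encode, which refers to a ‘ColumnTransformer’. The function one_hot_encode_rows, the row-based data layout and the column order of the result (untouched columns first, encoded columns last) are assumptions made for this example.

```python
from typing import Dict, List, Sequence, Tuple


def one_hot_encode_rows(rows: Sequence[Sequence],
                        nominal_indices: List[int]) -> Tuple[List[List[float]], Dict[int, list]]:
    """Replaces each nominal column by one binary column per distinct value."""
    # Collect the distinct values of each nominal column in a fixed order
    categories = {index: sorted({row[index] for row in rows}) for index in nominal_indices}
    encoded = []

    for row in rows:
        # Keep all remaining columns unchanged
        new_row = [float(value) for index, value in enumerate(row) if index not in categories]

        # Append one indicator column per distinct nominal value
        for index in nominal_indices:
            new_row.extend(1.0 if row[index] == category else 0.0
                           for category in categories[index])

        encoded.append(new_row)

    return encoded, categories


encoded, categories = one_hot_encode_rows([[1.5, 'red'], [2.0, 'blue']], nominal_indices=[1])
```

The number of columns changes as a result, which is why meta data that describes the original attributes no longer matches the encoded matrix.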

mlrl.testbed.data.save_arff_file(output_dir: str, arff_file_name: str, x: numpy.ndarray, y: numpy.ndarray, meta_data: mlrl.testbed.data.MetaData)

Saves a multi-label data set to an ARFF file.

Parameters
  • output_dir – The path of the directory where the ARFF file should be saved

  • arff_file_name – The name of the ARFF file (including the suffix)

  • x – A np.ndarray or scipy.sparse matrix, shape (num_examples, num_features), that stores the features of the examples that are contained in the data set

  • y – A np.ndarray or scipy.sparse matrix, shape (num_examples, num_labels), that stores the labels of the examples that are contained in the data set

  • meta_data – The meta data of the data set that should be saved

mlrl.testbed.data.save_data_set(output_dir: str, arff_file_name: str, x: numpy.ndarray, y: numpy.ndarray) mlrl.testbed.data.MetaData

Saves a multi-label data set to an ARFF file. All attributes in the data set are considered to be numerical.

Parameters
  • output_dir – The path of the directory where the ARFF file should be saved

  • arff_file_name – The name of the ARFF file (including the suffix)

  • x – A np.ndarray or scipy.sparse matrix, shape (num_examples, num_features), that stores the features of the examples that are contained in the data set

  • y – A np.ndarray or scipy.sparse matrix, shape (num_examples, num_labels), that stores the labels of the examples that are contained in the data set

Returns

The meta data of the data set that has been saved

mlrl.testbed.data.save_data_set_and_meta_data(output_dir: str, arff_file_name: str, xml_file_name: str, x: numpy.ndarray, y: numpy.ndarray) mlrl.testbed.data.MetaData

Saves a multi-label data set to an ARFF file and its meta data to an XML file. All attributes in the data set are considered to be numerical.

Parameters
  • output_dir – The path of the directory where the ARFF file and the XML file should be saved

  • arff_file_name – The name of the ARFF file (including the suffix)

  • xml_file_name – The name of the XML file (including the suffix)

  • x – An array of type float, shape (num_examples, num_features), representing the features of the examples that are contained in the data set

  • y – An array of type float, shape (num_examples, num_labels), representing the label vectors of the examples that are contained in the data set

Returns

The meta data of the data set that has been saved

mlrl.testbed.data.save_meta_data(output_dir: str, xml_file_name: str, meta_data: mlrl.testbed.data.MetaData)

Saves the meta data of a multi-label data set to an XML file.

Parameters
  • output_dir – The path of the directory where the XML file should be saved

  • xml_file_name – The name of the XML file (including the suffix)

  • meta_data – The meta data of the data set

mlrl.testbed.evaluation module

Author: Michael Rapp (mrapp@ke.tu-darmstadt.de)

Provides classes for evaluating the predictions or rankings provided by a multi-label learner according to different measures. The evaluation results can be written to one or several outputs, e.g. to the console or to a file.

class mlrl.testbed.evaluation.AbstractEvaluation(*args: mlrl.testbed.evaluation.EvaluationOutput)

Bases: mlrl.testbed.evaluation.Evaluation

An abstract base class for all classes that evaluate the predictions provided by a classifier or ranker and write the results to one or several outputs.

evaluate(experiment_name: str, meta_data: mlrl.testbed.data.MetaData, predictions, ground_truth, first_fold: int, current_fold: int, last_fold: int, num_folds: int, train_time: float, predict_time: float)

Evaluates the predictions provided by a classifier or ranker.

Parameters
  • experiment_name – The name of the experiment

  • meta_data – The meta data of the data set

  • predictions – The predictions provided by the classifier

  • ground_truth – The true labels

  • first_fold – The first cross validation fold or 0, if no cross validation is used

  • current_fold – The current cross validation fold starting at 0, or 0 if no cross validation is used

  • last_fold – The last cross validation fold or 0, if no cross validation is used

  • num_folds – The total number of cross validation folds or 1, if no cross validation is used

  • train_time – The time needed to train the model

  • predict_time – The time needed to make predictions

class mlrl.testbed.evaluation.ClassificationEvaluation(*args: mlrl.testbed.evaluation.EvaluationOutput)

Bases: mlrl.testbed.evaluation.AbstractEvaluation

Evaluates the predictions of a single- or multi-label classifier according to commonly used bipartition measures.

class mlrl.testbed.evaluation.Evaluation

Bases: abc.ABC

An abstract base class for all classes that evaluate the predictions provided by a classifier or ranker.

abstract evaluate(experiment_name: str, meta_data: mlrl.testbed.data.MetaData, predictions, ground_truth, first_fold: int, current_fold: int, last_fold: int, num_folds: int, train_time: float, predict_time: float)

Evaluates the predictions provided by a classifier or ranker.

Parameters
  • experiment_name – The name of the experiment

  • meta_data – The meta data of the data set

  • predictions – The predictions provided by the classifier

  • ground_truth – The true labels

  • first_fold – The first cross validation fold or 0, if no cross validation is used

  • current_fold – The current cross validation fold starting at 0, or 0 if no cross validation is used

  • last_fold – The last cross validation fold or 0, if no cross validation is used

  • num_folds – The total number of cross validation folds or 1, if no cross validation is used

  • train_time – The time needed to train the model

  • predict_time – The time needed to make predictions

class mlrl.testbed.evaluation.EvaluationCsvOutput(output_dir: str, clear_dir: bool = True, output_predictions: bool = False, output_individual_folds: bool = True)

Bases: mlrl.testbed.evaluation.EvaluationOutput

Writes evaluation results to CSV files.

write_evaluation_results(experiment_name: str, evaluation_result: mlrl.testbed.evaluation.EvaluationResult, total_folds: int, fold: Optional[int] = None)

Writes an evaluation result to the output.

Parameters
  • experiment_name – The name of the experiment

  • evaluation_result – The evaluation result to be written

  • total_folds – The total number of folds

  • fold – The fold for which the results should be written or None, if no cross validation is used or if the overall results, averaged over all folds, should be written

write_predictions(experiment_name: str, meta_data: mlrl.testbed.data.MetaData, predictions, ground_truth, total_folds: int, fold: Optional[int] = None)

Writes predictions to the output.

Parameters
  • experiment_name – The name of the experiment

  • meta_data – The meta data of the data set

  • predictions – The predictions

  • ground_truth – The ground truth

  • total_folds – The total number of folds

  • fold – The fold for which the predictions should be written or None, if no cross validation is used

class mlrl.testbed.evaluation.EvaluationLogOutput(output_predictions: bool = False, output_individual_folds: bool = True)

Bases: mlrl.testbed.evaluation.EvaluationOutput

Outputs evaluation results using the logger.

write_evaluation_results(experiment_name: str, evaluation_result: mlrl.testbed.evaluation.EvaluationResult, total_folds: int, fold: Optional[int] = None)

Writes an evaluation result to the output.

Parameters
  • experiment_name – The name of the experiment

  • evaluation_result – The evaluation result to be written

  • total_folds – The total number of folds

  • fold – The fold for which the results should be written or None, if no cross validation is used or if the overall results, averaged over all folds, should be written

write_predictions(experiment_name: str, meta_data: mlrl.testbed.data.MetaData, predictions, ground_truth, total_folds: int, fold: Optional[int] = None)

Writes predictions to the output.

Parameters
  • experiment_name – The name of the experiment

  • meta_data – The meta data of the data set

  • predictions – The predictions

  • ground_truth – The ground truth

  • total_folds – The total number of folds

  • fold – The fold for which the predictions should be written or None, if no cross validation is used

class mlrl.testbed.evaluation.EvaluationOutput(output_predictions: bool, output_individual_folds: bool)

Bases: abc.ABC

An abstract base class for all outputs that evaluation results may be written to.

abstract write_evaluation_results(experiment_name: str, evaluation_result: mlrl.testbed.evaluation.EvaluationResult, total_folds: int, fold: Optional[int] = None)

Writes an evaluation result to the output.

Parameters
  • experiment_name – The name of the experiment

  • evaluation_result – The evaluation result to be written

  • total_folds – The total number of folds

  • fold – The fold for which the results should be written or None, if no cross validation is used or if the overall results, averaged over all folds, should be written

abstract write_predictions(experiment_name: str, meta_data: mlrl.testbed.data.MetaData, predictions, ground_truth, total_folds: int, fold: Optional[int] = None)

Writes predictions to the output.

Parameters
  • experiment_name – The name of the experiment

  • meta_data – The meta data of the data set

  • predictions – The predictions

  • ground_truth – The ground truth

  • total_folds – The total number of folds

  • fold – The fold for which the predictions should be written or None, if no cross validation is used

class mlrl.testbed.evaluation.EvaluationResult

Bases: object

Stores the evaluation results according to different measures.

avg(name: str) -> (<class 'float'>, <class 'float'>)

Returns the score and standard deviation according to a specific measure averaged over all available folds.

Parameters

name – The name of the measure

Returns

A tuple consisting of the averaged score and standard deviation

avg_dict() Dict
dict(fold: int) Dict
get(name: str, fold: int) float

Returns the score according to a specific measure and fold.

Parameters
  • name – The name of the measure

  • fold – The fold the score corresponds to

Returns

The score

put(name: str, score: float, fold: int, num_folds: int)

Adds a new score according to a specific measure to the evaluation result.

Parameters
  • name – The name of the measure

  • score – The score according to the measure

  • fold – The fold the score corresponds to

  • num_folds – The total number of cross validation folds
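The bookkeeping described for EvaluationResult can be sketched as follows. This is not the class itself. The name FoldScores and the internal data layout are hypothetical, and the use of the population standard deviation is an assumption, since the documentation does not state which variant is computed.

```python
import statistics
from typing import Dict, List, Optional, Tuple


class FoldScores:
    """Stores one score per measure and fold (illustrative stand-in)."""

    def __init__(self):
        self._scores: Dict[str, List[Optional[float]]] = {}

    def put(self, name: str, score: float, fold: int, num_folds: int):
        # Reserve one slot per fold the first time a measure is seen
        values = self._scores.setdefault(name, [None] * num_folds)
        values[fold] = score

    def get(self, name: str, fold: int) -> float:
        return self._scores[name][fold]

    def avg(self, name: str) -> Tuple[float, float]:
        # Only consider the folds for which a score is available
        values = [value for value in self._scores[name] if value is not None]
        return statistics.mean(values), statistics.pstdev(values)


result = FoldScores()
result.put('Hamming loss', 0.5, fold=0, num_folds=2)
result.put('Hamming loss', 1.0, fold=1, num_folds=2)
mean_score, std_dev = result.avg('Hamming loss')
```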

class mlrl.testbed.evaluation.RankingEvaluation(*args: mlrl.testbed.evaluation.EvaluationOutput)

Bases: mlrl.testbed.evaluation.AbstractEvaluation

Evaluates the predictions of a multi-label ranker according to commonly used ranking measures.

mlrl.testbed.experiments module

Author: Michael Rapp (mrapp@ke.tu-darmstadt.de)

Provides classes for performing experiments.

class mlrl.testbed.experiments.Experiment(base_learner: mlrl.common.learners.Learner, data_set: mlrl.testbed.training.DataSet, num_folds: int = 1, current_fold: int = -1, train_evaluation: Optional[mlrl.testbed.evaluation.Evaluation] = None, test_evaluation: Optional[mlrl.testbed.evaluation.Evaluation] = None, parameter_input: Optional[mlrl.testbed.parameters.ParameterInput] = None, model_printer: Optional[mlrl.testbed.printing.ModelPrinter] = None, persistence: Optional[mlrl.testbed.persistence.ModelPersistence] = None)

Bases: mlrl.testbed.training.CrossValidation, abc.ABC

An experiment that trains and evaluates a single multi-label classifier or ranker on a specific data set using cross validation or separate training and test sets.

run()

mlrl.testbed.interfaces module

Author: Michael Rapp (mrapp@ke.tu-darmstadt.de)

Provides common interfaces that are implemented by several classes.

class mlrl.testbed.interfaces.Randomized

Bases: abc.ABC

A base class for all classifiers, rankers or modules that use RNGs.

Attributes

random_state – The seed to be used by RNGs

random_state: int = 1

mlrl.testbed.io module

Author: Michael Rapp (mrapp@ke.tu-darmstadt.de)

Provides functions for writing and reading files.

mlrl.testbed.io.clear_directory(directory: str)

Deletes all files contained in a directory (excluding subdirectories).

Parameters

directory – The directory to be cleared
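A possible way to delete the files in a directory while leaving subdirectories untouched is sketched below. It only illustrates the documented behavior and is not the code of clear_directory. The function name clear_files and the file names are hypothetical.

```python
import os
import tempfile


def clear_files(directory: str) -> None:
    """Deletes all regular files in a directory, but keeps its subdirectories."""
    # Collect the paths first, so that nothing is deleted while iterating
    with os.scandir(directory) as entries:
        files = [entry.path for entry in entries if entry.is_file()]

    for path in files:
        os.remove(path)


with tempfile.TemporaryDirectory() as tmp:
    open(os.path.join(tmp, 'results.csv'), 'w').close()
    os.mkdir(os.path.join(tmp, 'sub'))
    open(os.path.join(tmp, 'sub', 'kept.csv'), 'w').close()

    clear_files(tmp)

    remaining = sorted(os.listdir(tmp))
    nested = os.listdir(os.path.join(tmp, 'sub'))
```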

mlrl.testbed.io.create_csv_dict_reader(csv_file) csv.DictReader

Creates and returns a DictReader that allows reading from a CSV file.

Parameters

csv_file – The CSV file

Returns

The ‘DictReader’ that has been created

mlrl.testbed.io.create_csv_dict_writer(csv_file, header) csv.DictWriter

Creates and returns a DictWriter that allows writing a dictionary to a CSV file.

Parameters
  • csv_file – The CSV file

  • header – A list that contains the headers of the CSV file. They must correspond to the keys in the dictionary that should be written to the file

Returns

The DictWriter that has been created
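The DictWriter and DictReader classes come from Python's standard csv module. The round trip below shows how the header relates to the keys of the dictionaries that are written. The delimiter and quoting options used by this package are not documented here, so the example relies on the defaults of the csv module, and the column names are made up.

```python
import csv
import io

header = ['measure', 'score']
rows = [{'measure': 'Hamming loss', 'score': 0.12},
        {'measure': 'Subset 0/1 loss', 'score': 0.4}]

# An in-memory buffer stands in for an opened CSV file
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=header)
writer.writeheader()
writer.writerows(rows)

# Read the same content back. Note that all values are returned as strings
buffer.seek(0)
loaded = list(csv.DictReader(buffer))
```

A dictionary with a key that is missing from the header causes DictWriter to raise a ValueError by default, which is why the header must match the keys.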

mlrl.testbed.io.get_file_name(name: str, suffix: str)

Returns a file name, including a suffix.

Parameters
  • name – The name of the file (without suffix)

  • suffix – The suffix of the file

Returns

The file name

mlrl.testbed.io.get_file_name_per_fold(name: str, suffix: str, fold: int)

Returns a file name, including a suffix, that corresponds to a certain fold.

Parameters
  • name – The name of the file (without suffix)

  • suffix – The suffix of the file

  • fold – The cross validation fold that the file corresponds to or None, if the file does not correspond to a specific fold

Returns

The file name
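The sketch below shows one way to derive file names with and without a fold. The naming scheme, i.e., the ‘_fold-’ infix and the dot before the suffix, is purely an assumption for this illustration. The actual names produced by get_file_name and get_file_name_per_fold may differ.

```python
from typing import Optional


def file_name(name: str, suffix: str) -> str:
    """Joins a name and a suffix (hypothetical scheme)."""
    return name + '.' + suffix


def file_name_per_fold(name: str, suffix: str, fold: Optional[int] = None) -> str:
    """Adds the fold to the name, if the file corresponds to a specific fold."""
    if fold is not None:
        name = name + '_fold-' + str(fold)

    return file_name(name, suffix)


overall = file_name_per_fold('evaluation', 'csv')
first_fold = file_name_per_fold('evaluation', 'csv', fold=0)
```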

mlrl.testbed.io.open_readable_csv_file(directory: str, file_name: str, fold: int)

Opens a CSV file to be read from.

Parameters
  • directory – The directory where the file is located

  • file_name – The name of the file to be opened (without suffix)

  • fold – The cross validation fold that the file corresponds to or None, if the file does not correspond to a specific fold

Returns

The file that has been opened

mlrl.testbed.io.open_writable_csv_file(directory: str, file_name: str, fold: Optional[int] = None, append: bool = False)

Opens a CSV file to be written to.

Parameters
  • directory – The directory where the file is located

  • file_name – The name of the file to be opened (without suffix)

  • fold – The cross validation fold that the file corresponds to or None, if the file does not correspond to a specific fold

  • append – True, if new data should be appended to the file in case it already exists, False otherwise

Returns

The file that has been opened

mlrl.testbed.io.open_writable_txt_file(directory: str, file_name: str, fold: Optional[int] = None, append: bool = False)

Opens a text file to be written to.

Parameters
  • directory – The directory where the file is located

  • file_name – The name of the file to be opened (without suffix)

  • fold – The cross validation fold that the file corresponds to or None, if the file does not correspond to a specific fold

  • append – True, if new data should be appended to the file in case it already exists, False otherwise

Returns

The file that has been opened

mlrl.testbed.io.write_xml_file(xml_file, root_element: xml.etree.ElementTree.Element, encoding='utf-8')

Writes an XML structure to a file.

Parameters
  • xml_file – The XML file

  • root_element – The root element of the XML structure

  • encoding – The encoding to be used
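Serializing an xml.etree.ElementTree.Element with a given encoding can be done with the standard library alone, as the sketch below shows. It is not the code of write_xml_file. The element and attribute names are illustrative, and an in-memory buffer is used in place of a real file.

```python
import io
import xml.etree.ElementTree as ET

# Build a small XML structure (the element names are made up)
root = ET.Element('labels')
ET.SubElement(root, 'label', name='label_1')
ET.SubElement(root, 'label', name='label_2')

# Write the structure, including the XML declaration, using UTF-8
buffer = io.BytesIO()
ET.ElementTree(root).write(buffer, encoding='utf-8', xml_declaration=True)
xml_text = buffer.getvalue().decode('utf-8')

# Parse the result again to check that it is well-formed
parsed = ET.fromstring(buffer.getvalue())
label_names = [label.get('name') for label in parsed.findall('label')]
```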

mlrl.testbed.parameters module

Author: Michael Rapp (mrapp@ke.tu-darmstadt.de)

Provides classes for parameter tuning.

class mlrl.testbed.parameters.NestedCrossValidation(num_nested_folds: int)

Bases: mlrl.testbed.parameters.ParameterSearch

Searches for optimal parameters using (nested) cross validation.

search(meta_data: mlrl.testbed.data.MetaData, x, y, first_fold: int, current_fold: int, last_fold: int, num_folds: int)

Tests different parameter settings given a training data set.

Parameters
  • meta_data – The meta data of the training data set

  • x – The feature matrix of the training examples

  • y – The label matrix of the training examples

  • first_fold – The first fold or 0, if no cross validation is used

  • current_fold – The current fold starting at 0, or 0 if no cross validation is used

  • last_fold – The last fold or 0, if no cross validation is used

  • num_folds – The total number of cross validation folds or 1, if no cross validation is used

class mlrl.testbed.parameters.ParameterCsvInput(input_dir: str)

Bases: mlrl.testbed.parameters.ParameterInput

Reads parameter settings from CSV files.

read_parameters(fold: Optional[int] = None) dict

Reads a parameter setting from the input.

Parameters

fold – The fold that the parameter setting corresponds to or None, if the parameter setting does not correspond to a specific fold

Returns

A dictionary that stores the parameters

class mlrl.testbed.parameters.ParameterCsvOutput(output_dir: str, clear_dir: bool = True)

Bases: mlrl.testbed.parameters.ParameterOutput

Writes parameter settings to CSV files.

write_parameters(parameters: dict, score: float, total_folds: int, fold: Optional[int] = None)

Writes a parameter setting to the output.

Parameters
  • parameters – A dictionary that stores the parameters

  • score – The evaluation score that has been achieved using the parameter setting

  • total_folds – The total number of folds

  • fold – The fold that the parameter setting corresponds to or None, if the parameter setting does not correspond to a specific fold

class mlrl.testbed.parameters.ParameterInput

Bases: abc.ABC

abstract read_parameters(fold: Optional[int] = None) dict

Reads a parameter setting from the input.

Parameters

fold – The fold that the parameter setting corresponds to or None, if the parameter setting does not correspond to a specific fold

Returns

A dictionary that stores the parameters

class mlrl.testbed.parameters.ParameterLogOutput

Bases: mlrl.testbed.parameters.ParameterOutput

Outputs parameter settings using the logger.

write_parameters(parameters: dict, score: float, total_folds: int, fold: Optional[int] = None)

Writes a parameter setting to the output.

Parameters
  • parameters – A dictionary that stores the parameters

  • score – The evaluation score that has been achieved using the parameter setting

  • total_folds – The total number of folds

  • fold – The fold that the parameter setting corresponds to or None, if the parameter setting does not correspond to a specific fold

class mlrl.testbed.parameters.ParameterOutput

Bases: abc.ABC

An abstract base class for all outputs that parameter settings may be written to.

abstract write_parameters(parameters: dict, score: float, total_folds: int, fold: Optional[int] = None)

Writes a parameter setting to the output.

Parameters
  • parameters – A dictionary that stores the parameters

  • score – The evaluation score that has been achieved using the parameter setting

  • total_folds – The total number of folds

  • fold – The fold that the parameter setting corresponds to or None, if the parameter setting does not correspond to a specific fold

class mlrl.testbed.parameters.ParameterSearch

Bases: mlrl.testbed.interfaces.Randomized, abc.ABC

A base class for all classes that implement strategies to search for optimal parameters given a training data set.

abstract get_params()

Returns the best parameter setting tested so far.

Returns

A dictionary that stores the parameters

abstract get_score()

Returns the evaluation score that has been achieved using the best parameter setting.

Returns

An evaluation score

abstract search(meta_data: mlrl.testbed.data.MetaData, x, y, first_fold: int, current_fold: int, last_fold: int, num_folds: int)

Tests different parameter settings given a training data set.

Parameters
  • meta_data – The meta data of the training data set

  • x – The feature matrix of the training examples

  • y – The label matrix of the training examples

  • first_fold – The first fold or 0, if no cross validation is used

  • current_fold – The current fold starting at 0, or 0 if no cross validation is used

  • last_fold – The last fold or 0, if no cross validation is used

  • num_folds – The total number of cross validation folds or 1, if no cross validation is used
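A search strategy has to test several parameter settings and remember the best one together with its score. The sketch below illustrates this with an exhaustive grid search. It is not an implementation of ParameterSearch. The functions parameter_grid and grid_search are hypothetical, the evaluation function is supplied by the caller, and it is assumed that smaller scores are better.

```python
from itertools import product
from typing import Callable, Dict, Iterator, List, Optional, Tuple


def parameter_grid(options: Dict[str, List]) -> Iterator[dict]:
    """Yields every combination of the given parameter values."""
    names = sorted(options)

    for values in product(*(options[name] for name in names)):
        yield dict(zip(names, values))


def grid_search(options: Dict[str, List],
                evaluate: Callable[[dict], float]) -> Tuple[Optional[dict], Optional[float]]:
    """Returns the setting with the smallest score and the score itself."""
    best_params, best_score = None, None

    for params in parameter_grid(options):
        score = evaluate(params)

        # Keep track of the best setting tested so far
        if best_score is None or score < best_score:
            best_params, best_score = params, score

    return best_params, best_score


options = {'shrinkage': [0.1, 0.3, 1.0], 'max_rules': [100, 500]}
best_params, best_score = grid_search(
    options, lambda p: abs(p['shrinkage'] - 0.3) + abs(p['max_rules'] - 500) / 1000)
```

In a nested cross validation, the evaluation function would train and evaluate a model on the inner folds of the training data for each setting.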

class mlrl.testbed.parameters.ParameterTuning(data_set: mlrl.testbed.training.DataSet, num_folds: int, current_fold: int, parameter_search: mlrl.testbed.parameters.ParameterSearch, *args: mlrl.testbed.parameters.ParameterOutput)

Bases: mlrl.testbed.training.CrossValidation

Tunes the parameters for a single training data set or all training data sets that are used in cross validation using a ParameterSearch and writes the optimal parameters to one or several outputs.

mlrl.testbed.persistence module

Author: Michael Rapp (mrapp@ke.tu-darmstadt.de)

Provides classes for saving/loading models to/from disk.

class mlrl.testbed.persistence.ModelPersistence(model_dir: str)

Bases: object

Saves a model to a file and loads it again later.

load_model(model_name: str, fold: Optional[int] = None, raise_exception: bool = False)

Loads a model from a file.

Parameters
  • model_name – The name of the model to be loaded

  • fold – The fold that the model corresponds to, or None if no cross validation is used

  • raise_exception – True, if an exception should be raised if an error occurs, False, if None should be returned in that case

Returns

The loaded model

save_model(model, model_name: str, fold: Optional[int] = None)

Saves a model to a file.

Parameters
  • model – The model to be persisted

  • model_name – The name of the model to be persisted

  • fold – The fold that the model corresponds to, or None if no cross validation is used
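The documented behavior, i.e., one file per model and fold and an optional exception on failure, can be sketched with Python's pickle module as shown below. Whether ModelPersistence uses pickle is an assumption, and the file naming as well as the free functions save_model and load_model are hypothetical stand-ins for the methods of the class.

```python
import os
import pickle
import tempfile
from typing import Optional


def _model_path(model_dir: str, model_name: str, fold: Optional[int]) -> str:
    # Hypothetical naming scheme, chosen for this sketch only
    fold_part = '' if fold is None else '_fold-' + str(fold)
    return os.path.join(model_dir, model_name + fold_part + '.model')


def save_model(model_dir: str, model, model_name: str, fold: Optional[int] = None):
    with open(_model_path(model_dir, model_name, fold), 'wb') as output:
        pickle.dump(model, output)


def load_model(model_dir: str, model_name: str, fold: Optional[int] = None,
               raise_exception: bool = False):
    try:
        with open(_model_path(model_dir, model_name, fold), 'rb') as source:
            return pickle.load(source)
    except (OSError, pickle.UnpicklingError):
        if raise_exception:
            raise
        return None  # Signal that no model could be loaded


with tempfile.TemporaryDirectory() as tmp:
    save_model(tmp, {'num_rules': 3}, 'example', fold=0)
    restored = load_model(tmp, 'example', fold=0)
    missing = load_model(tmp, 'example', fold=1)
```

Since unpickling can execute arbitrary code, files written this way should only be loaded if they come from a trusted source.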

mlrl.testbed.printing module

Author: Michael Rapp (mrapp@ke.tu-darmstadt.de)

Provides classes for printing textual representations of models. The models can be written to one or several outputs, e.g. to the console or to a file.

class mlrl.testbed.printing.ModelPrinter(print_options: str, outputs: List[mlrl.testbed.printing.ModelPrinterOutput])

Bases: abc.ABC

An abstract base class for all classes that print a textual representation of an MLLearner’s model.

print(experiment_name: str, meta_data: mlrl.testbed.data.MetaData, learner: mlrl.common.learners.Learner, current_fold: int, num_folds: int)

Prints a textual representation of an MLLearner’s model.

Parameters
  • experiment_name – The name of the experiment

  • meta_data – The meta data of the training data set

  • learner – The learner

  • current_fold – The current cross validation fold starting at 0, or 0 if no cross validation is used

  • num_folds – The total number of cross validation folds or 1, if no cross validation is used

class mlrl.testbed.printing.ModelPrinterLogOutput

Bases: mlrl.testbed.printing.ModelPrinterOutput

Outputs the textual representation of a model using the logger.

write_model(experiment_name: str, model: str, total_folds: int, fold: Optional[int] = None)

Writes a textual representation of a model to the output.

Parameters
  • experiment_name – The name of the experiment

  • model – The textual representation of the model

  • total_folds – The total number of folds

  • fold – The fold for which the results should be written or None, if no cross validation is used or if the overall results, averaged over all folds, should be written

class mlrl.testbed.printing.ModelPrinterOutput

Bases: abc.ABC

An abstract base class for all outputs that textual representations of models may be written to.

abstract write_model(experiment_name: str, model: str, total_folds: int, fold: Optional[int] = None)

Writes a textual representation of a model to the output.

Parameters
  • experiment_name – The name of the experiment

  • model – The textual representation of the model

  • total_folds – The total number of folds

  • fold – The fold for which the results should be written or None, if no cross validation is used or if the overall results, averaged over all folds, should be written

class mlrl.testbed.printing.ModelPrinterTxtOutput(output_dir: str, clear_dir: bool = True)

Bases: mlrl.testbed.printing.ModelPrinterOutput

Writes the textual representation of a model to a text file.

write_model(experiment_name: str, model: str, total_folds: int, fold: Optional[int] = None)

Writes a textual representation of a model to the output.

Parameters
  • experiment_name – The name of the experiment

  • model – The textual representation of the model

  • total_folds – The total number of folds

  • fold – The fold for which the results should be written or None, if no cross validation is used or if the overall results, averaged over all folds, should be written

class mlrl.testbed.printing.RulePrinter(print_options: str, outputs: List[mlrl.testbed.printing.ModelPrinterOutput])

Bases: mlrl.testbed.printing.ModelPrinter

Prints a textual representation of an MLRuleLearner’s rule-based model.

mlrl.testbed.training module

Author: Michael Rapp (mrapp@ke.tu-darmstadt.de)

Provides classes for training and evaluating multi-label classifiers using either cross validation or separate training and test sets.

class mlrl.testbed.training.CrossValidation(data_set: mlrl.testbed.training.DataSet, num_folds: int, current_fold: int)

Bases: mlrl.testbed.interfaces.Randomized, abc.ABC

A base class for all classes that use cross validation or a train-test split to train and evaluate a multi-label classifier or ranker.

run()
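How a data set can be partitioned into folds in a reproducible way is sketched below. This is not the splitting logic of CrossValidation. The function k_fold_indices is hypothetical, and it merely assumes that a fixed seed, comparable to random_state, should always lead to the same folds.

```python
import random
from typing import Iterator, List, Tuple


def k_fold_indices(num_examples: int, num_folds: int,
                   random_state: int = 1) -> Iterator[Tuple[int, List[int], List[int]]]:
    """Yields (fold, training indices, test indices) for each fold."""
    indices = list(range(num_examples))
    # A dedicated RNG with a fixed seed makes the split reproducible
    random.Random(random_state).shuffle(indices)
    # Distribute the shuffled examples evenly among the folds
    folds = [indices[i::num_folds] for i in range(num_folds)]

    for fold in range(num_folds):
        test = sorted(folds[fold])
        train = sorted(index for other, members in enumerate(folds)
                       if other != fold for index in members)
        yield fold, train, test


splits = list(k_fold_indices(num_examples=10, num_folds=5))
```

Each example is used for testing exactly once, while the remaining examples of the respective fold are used for training.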
class mlrl.testbed.training.DataSet(data_dir: str, data_set_name: str, use_one_hot_encoding: bool)

Bases: object

Stores the properties of a data set to be used for training and evaluating multi-label classifiers.

Module contents