mlrl.testbed package
Submodules
mlrl.testbed.args module
Author: Michael Rapp (michael.rapp.ml@gmail.com)
Provides functions for parsing command line arguments.
- class mlrl.testbed.args.LogLevel(value)
Bases: enum.Enum
An enumeration.
- CRITICAL = 'critical'
- DEBUG = 'debug'
- ERROR = 'error'
- FATAL = 'fatal'
- INFO = 'info'
- NOTSET = 'notset'
- WARN = 'warn'
- WARNING = 'warning'
- mlrl.testbed.args.add_learner_arguments(parser: argparse.ArgumentParser)
- mlrl.testbed.args.add_log_level_argument(parser: argparse.ArgumentParser)
- mlrl.testbed.args.add_random_state_argument(parser: argparse.ArgumentParser)
- mlrl.testbed.args.add_rule_learner_arguments(parser: argparse.ArgumentParser)
- mlrl.testbed.args.boolean_string(s)
- mlrl.testbed.args.current_fold_string(s)
- mlrl.testbed.args.log_level(s)
- mlrl.testbed.args.optional_string(s)
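The single-argument signatures of log_level and boolean_string suggest that they act as argparse type converters. The following is a minimal sketch of that pattern, not the package's implementation: the function bodies, the mapping onto the constants of the logging module, and the flag names --log-level and --one-hot-encoding are assumptions made for illustration.

```python
import argparse
import logging


def log_level(s: str) -> int:
    """Hypothetical converter: maps a name such as 'debug' to a logging level."""
    levels = {
        'debug': logging.DEBUG,
        'info': logging.INFO,
        'warn': logging.WARNING,
        'warning': logging.WARNING,
        'error': logging.ERROR,
        'critical': logging.CRITICAL,
        'fatal': logging.CRITICAL,
        'notset': logging.NOTSET,
    }
    try:
        return levels[s.lower()]
    except KeyError:
        raise argparse.ArgumentTypeError('Invalid log level: ' + s)


def boolean_string(s: str) -> bool:
    """Hypothetical converter: accepts 'true' or 'false' (case-insensitive)."""
    value = s.lower()
    if value == 'true':
        return True
    if value == 'false':
        return False
    raise argparse.ArgumentTypeError('Expected "true" or "false", got: ' + s)


# The flag names below are assumptions, chosen only for this example.
parser = argparse.ArgumentParser()
parser.add_argument('--log-level', type=log_level, default=logging.INFO)
parser.add_argument('--one-hot-encoding', type=boolean_string, default=False)
args = parser.parse_args(['--log-level', 'debug', '--one-hot-encoding', 'true'])
```

Passing the converters via the type argument lets argparse report invalid values as regular usage errors.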
mlrl.testbed.bbc_cv module
Author: Michael Rapp (michael.rapp.ml@gmail.com)
Implements “Bootstrap Bias Corrected Cross Validation” (BBC-CV) for evaluating different configurations of a learner and obtaining unbiased estimates of their performance (see https://link.springer.com/article/10.1007/s10994-018-5714-4).
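The general idea of BBC-CV can be sketched as follows: the out-of-sample predictions of all configurations are pooled, the best configuration is selected on a bootstrap sample, and it is scored on the examples left out of that sample. This is a simplified illustration that assumes one prediction per example and a loss where lower is better. The names bbc_cv and zero_one_loss are hypothetical and do not correspond to the classes documented in this module.

```python
import numpy as np


def zero_one_loss(y_true, y_pred) -> float:
    """Fraction of examples that are predicted incorrectly."""
    return float(np.mean(y_true != y_pred))


def bbc_cv(predictions, ground_truth, loss, num_bootstraps=100, seed=1) -> float:
    """
    predictions:  array, shape (num_examples, num_configurations), holding the
                  out-of-sample predictions pooled over all cross validation folds
    ground_truth: array, shape (num_examples,)
    loss:         callable(y_true, y_pred) -> float, lower is better
    """
    rng = np.random.default_rng(seed)
    num_examples, num_configurations = predictions.shape
    scores = []

    for _ in range(num_bootstraps):
        # Draw examples with replacement; the remaining ones are out-of-bag
        in_bag = rng.integers(0, num_examples, size=num_examples)
        out_of_bag = np.setdiff1d(np.arange(num_examples), in_bag)

        if out_of_bag.size == 0:
            continue

        # Select the configuration that performs best on the bootstrap sample
        tuning_losses = [loss(ground_truth[in_bag], predictions[in_bag, c])
                         for c in range(num_configurations)]
        best = int(np.argmin(tuning_losses))

        # Score the selected configuration on the held-out examples
        scores.append(loss(ground_truth[out_of_bag], predictions[out_of_bag, best]))

    return float(np.mean(scores))


# Toy data: configuration 0 is always right, configuration 1 is always wrong
rng = np.random.default_rng(0)
ground_truth = rng.integers(0, 2, size=50)
predictions = np.column_stack([ground_truth, 1 - ground_truth])
estimate = bbc_cv(predictions, ground_truth, zero_one_loss, num_bootstraps=20)
```

Because the configuration is chosen and scored on disjoint sets of examples, the averaged score is not inflated by the selection step.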
- class mlrl.testbed.bbc_cv.BbcCv(configurations: List[dict], adapter: mlrl.testbed.bbc_cv.BbcCvAdapter, bootstrapping: mlrl.testbed.bbc_cv.Bootstrapping, learner: mlrl.common.learners.Learner)
Bases: mlrl.testbed.interfaces.Randomized
An implementation of “Bootstrap Bias Corrected Cross Validation” (BBC-CV).
- evaluate(observer: mlrl.testbed.bbc_cv.BbcCvObserver)
- Parameters
observer – The BbcCvObserver to be used
- store_predictions()
- class mlrl.testbed.bbc_cv.BbcCvAdapter(data_set: mlrl.testbed.training.DataSet, num_folds: int, model_dir: str)
Bases: mlrl.testbed.training.CrossValidation
An adapter that must be implemented for each type of model to be used with BBC-CV to obtain predictions for given test examples.
- fit(x, y)
- predict(x)
- run()
- class mlrl.testbed.bbc_cv.BbcCvObserver
Bases: abc.ABC
A base class for all observers that should be notified about the predictions and ground truth labellings that result from applying the BBC-CV method.
- abstract evaluate(configurations: List[dict], meta_data: mlrl.testbed.data.MetaData, ground_truth_tuning: numpy.ndarray, predictions_tuning: numpy.ndarray, ground_truth_test: numpy.ndarray, predictions_test: numpy.ndarray, current_bootstrap: int, num_bootstraps: int)
- Parameters
configurations – The configurations that have been provided to the BBC-CV method
meta_data – The meta data of the data set
ground_truth_tuning – The ground truth of the examples that belong to the tuning set
predictions_tuning – The predictions for the examples that belong to the tuning set
ground_truth_test – The ground truth of the examples that belong to the test set
predictions_test – The predictions for the examples that belong to the test set
current_bootstrap – The current bootstrap iteration
num_bootstraps – The total number of bootstrap iterations
- class mlrl.testbed.bbc_cv.Bootstrapping
Bases: mlrl.testbed.interfaces.Randomized
- abstract bootstrap(meta_data: mlrl.testbed.data.MetaData, prediction_matrix, ground_truth_matrix, configurations: List[dict], observer: mlrl.testbed.bbc_cv.BbcCvObserver)
- class mlrl.testbed.bbc_cv.CV(data_set: mlrl.testbed.training.DataSet, num_folds: int, prediction_matrix, ground_truth_matrix, configurations: List[dict], observer: mlrl.testbed.bbc_cv.BbcCvObserver)
- class mlrl.testbed.bbc_cv.CVBootstrapping(data_set: mlrl.testbed.training.DataSet, num_folds: int)
Bases: mlrl.testbed.bbc_cv.Bootstrapping
- bootstrap(meta_data: mlrl.testbed.data.MetaData, prediction_matrix, ground_truth_matrix, configurations: List[dict], observer: mlrl.testbed.bbc_cv.BbcCvObserver)
- class mlrl.testbed.bbc_cv.DefaultBbcCvObserver(target_measure, target_measure_is_loss: bool, output_dir: Optional[str] = None)
Bases: mlrl.testbed.bbc_cv.BbcCvObserver
An observer that determines the best configuration per bootstrap iteration and computes the evaluation measures averaged over all iterations.
- evaluate(configurations: List[dict], meta_data: mlrl.testbed.data.MetaData, ground_truth_tuning: numpy.ndarray, predictions_tuning: numpy.ndarray, ground_truth_test: numpy.ndarray, predictions_test: numpy.ndarray, current_bootstrap: int, num_bootstraps: int)
- Parameters
configurations – The configurations that have been provided to the BBC-CV method
meta_data – The meta data of the data set
ground_truth_tuning – The ground truth of the examples that belong to the tuning set
predictions_tuning – The predictions for the examples that belong to the tuning set
ground_truth_test – The ground truth of the examples that belong to the test set
predictions_test – The predictions for the examples that belong to the test set
current_bootstrap – The current bootstrap iteration
num_bootstraps – The total number of bootstrap iterations
- class mlrl.testbed.bbc_cv.DefaultBootstrapping(num_bootstraps: int)
Bases: mlrl.testbed.bbc_cv.Bootstrapping
- bootstrap(meta_data: mlrl.testbed.data.MetaData, prediction_matrix, ground_truth_matrix, configurations: List[dict], observer: mlrl.testbed.bbc_cv.BbcCvObserver)
mlrl.testbed.data module
Author: Michael Rapp (michael.rapp.ml@gmail.com)
Provides functions for handling multi-label data.
- class mlrl.testbed.data.Attribute(attribute_name: str, attribute_type: mlrl.testbed.data.AttributeType, nominal_values: Optional[List[str]] = None)
Bases: object
Represents a numerical or nominal attribute that is contained by a data set.
- class mlrl.testbed.data.AttributeType(value)
Bases: enum.Enum
All supported types of attributes.
- NOMINAL = 2
- NUMERIC = 1
- class mlrl.testbed.data.Label(name: str)
Bases: mlrl.testbed.data.Attribute
Represents a label that is contained by a data set.
- class mlrl.testbed.data.MetaData(attributes: List[mlrl.testbed.data.Attribute], labels: List[mlrl.testbed.data.Attribute], labels_at_start: bool)
Bases: object
Stores the meta data of a multi-label data set.
- get_attribute_indices(attribute_type: Optional[mlrl.testbed.data.AttributeType] = None) -> List[int]
Returns a list that contains the indices of all attributes of a specific type (in ascending order).
- Parameters
attribute_type – The type of the attributes whose indices should be returned or None, if all indices should be returned
- Returns
A list that contains the indices of all attributes of the given type
- mlrl.testbed.data.load_data_set(data_dir: str, arff_file_name: str, meta_data: mlrl.testbed.data.MetaData, feature_dtype=numpy.float32, label_dtype=numpy.uint8) -> (scipy.sparse.lil_matrix, scipy.sparse.lil_matrix)
Loads a multi-label data set from an ARFF file given its meta data.
- Parameters
data_dir – The path of the directory that contains the ARFF file
arff_file_name – The name of the ARFF file (including the suffix)
meta_data – The meta data
feature_dtype – The requested dtype of the feature matrix
label_dtype – The requested dtype of the label matrix
- Returns
A scipy.sparse.lil_matrix of type feature_dtype, shape (num_examples, num_features), representing the feature values of the examples, as well as a scipy.sparse.lil_matrix of type label_dtype, shape (num_examples, num_labels), representing the corresponding label vectors
- mlrl.testbed.data.load_data_set_and_meta_data(data_dir: str, arff_file_name: str, xml_file_name: str, feature_dtype=numpy.float32, label_dtype=numpy.uint8) -> (scipy.sparse.lil_matrix, scipy.sparse.lil_matrix, mlrl.testbed.data.MetaData)
Loads a multi-label data set from an ARFF file and the corresponding Mulan XML file.
- Parameters
data_dir – The path of the directory that contains the files
arff_file_name – The name of the ARFF file (including the suffix)
xml_file_name – The name of the XML file (including the suffix)
feature_dtype – The requested type of the feature matrix
label_dtype – The requested type of the label matrix
- Returns
A scipy.sparse.lil_matrix of type feature_dtype, shape (num_examples, num_features), representing the feature values of the examples, a scipy.sparse.lil_matrix of type label_dtype, shape (num_examples, num_labels), representing the corresponding label vectors, as well as the data set’s meta data
- mlrl.testbed.data.one_hot_encode(x, y, meta_data: mlrl.testbed.data.MetaData, encoder=None)
One-hot encodes the nominal attributes contained in a data set, if any.
If the given feature matrix is sparse, it will be converted into a dense matrix. Also, an updated variant of the given meta data, where the attributes have been removed, will be returned, as the original attributes become invalid by applying one-hot-encoding.
- Parameters
x – A np.ndarray or scipy.sparse.matrix, shape (num_examples, num_features), representing the features of the examples in the data set
y – A np.ndarray or scipy.sparse.matrix, shape (num_examples, num_labels), representing the labels of the examples in the data set
meta_data – The meta data of the data set
encoder – The ‘ColumnTransformer’ to be used or None, if a new encoder should be created
- Returns
A np.ndarray, shape (num_examples, num_encoded_features), representing the encoded features of the given examples, the encoder that has been used, as well as the updated meta data
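The effect of one-hot encoding on a dense feature matrix can be illustrated with plain NumPy. This sketch deliberately does not use the ColumnTransformer mentioned above, it keeps the columns in their original order (which the actual function may not do), and the name one_hot_encode_columns and its nominal_indices parameter are hypothetical.

```python
import numpy as np


def one_hot_encode_columns(x, nominal_indices):
    """
    Replaces each nominal column of a dense matrix by one binary column per
    distinct value. Numerical columns are copied unchanged.
    """
    x = np.asarray(x)
    columns = []

    for i in range(x.shape[1]):
        column = x[:, i]

        if i in nominal_indices:
            # One indicator column per distinct value, in sorted order
            values = np.unique(column)
            columns.append((column[:, None] == values[None, :]).astype(np.float32))
        else:
            columns.append(column[:, None].astype(np.float32))

    return np.hstack(columns)


# Column 0 is numerical, column 1 is nominal with the values 0 and 2
x = np.array([[1.5, 0.0], [2.5, 2.0], [3.5, 0.0]])
encoded = one_hot_encode_columns(x, nominal_indices={1})
```

Here the nominal column is expanded into two indicator columns, so the encoded matrix has three columns instead of two.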
- mlrl.testbed.data.save_arff_file(output_dir: str, arff_file_name: str, x: numpy.ndarray, y: numpy.ndarray, meta_data: mlrl.testbed.data.MetaData)
Saves a multi-label data set to an ARFF file.
- Parameters
output_dir – The path of the directory where the ARFF file should be saved
arff_file_name – The name of the ARFF file (including the suffix)
x – A np.ndarray or scipy.sparse matrix, shape (num_examples, num_features), that stores the features of the examples that are contained in the data set
y – A np.ndarray or scipy.sparse matrix, shape (num_examples, num_labels), that stores the labels of the examples that are contained in the data set
meta_data – The meta data of the data set that should be saved
- mlrl.testbed.data.save_data_set(output_dir: str, arff_file_name: str, x: numpy.ndarray, y: numpy.ndarray) -> mlrl.testbed.data.MetaData
Saves a multi-label data set to an ARFF file. All attributes in the data set are considered to be numerical.
- Parameters
output_dir – The path of the directory where the ARFF file should be saved
arff_file_name – The name of the ARFF file (including the suffix)
x – A np.ndarray or scipy.sparse matrix, shape (num_examples, num_features), that stores the features of the examples that are contained in the data set
y – A np.ndarray or scipy.sparse matrix, shape (num_examples, num_labels), that stores the labels of the examples that are contained in the data set
- Returns
The meta data of the data set that has been saved
- mlrl.testbed.data.save_data_set_and_meta_data(output_dir: str, arff_file_name: str, xml_file_name: str, x: numpy.ndarray, y: numpy.ndarray) -> mlrl.testbed.data.MetaData
Saves a multi-label data set to an ARFF file and its meta data to an XML file. All attributes in the data set are considered to be numerical.
- Parameters
output_dir – The path of the directory where the ARFF file and the XML file should be saved
arff_file_name – The name of the ARFF file (including the suffix)
xml_file_name – The name of the XML file (including the suffix)
x – An array of type float, shape (num_examples, num_features), representing the features of the examples that are contained in the data set
y – An array of type float, shape (num_examples, num_labels), representing the label vectors of the examples that are contained in the data set
- Returns
The meta data of the data set that has been saved
- mlrl.testbed.data.save_meta_data(output_dir: str, xml_file_name: str, meta_data: mlrl.testbed.data.MetaData)
Saves the meta data of a multi-label data set to an XML file.
- Parameters
output_dir – The path of the directory where the XML file should be saved
xml_file_name – The name of the XML file (including the suffix)
meta_data – The meta data of the data set
mlrl.testbed.data_characteristics module
Author: Michael Rapp (michael.rapp.ml@gmail.com)
Provides functions to determine certain characteristics of multi-label data sets.
- class mlrl.testbed.data_characteristics.DataCharacteristics(num_examples: int, num_nominal_features: int, num_numerical_features: int, feature_density: float, num_labels: int, label_density: float, avg_label_imbalance_ratio: float, avg_label_cardinality: float, num_distinct_label_vectors: int)
Bases: object
Stores characteristics of a multi-label data set.
- class mlrl.testbed.data_characteristics.DataCharacteristicsCsvOutput(output_dir: str, clear_dir: bool = True)
Bases: mlrl.testbed.data_characteristics.DataCharacteristicsOutput
Writes the characteristics of a data set to a CSV file.
- write_data_characteristics(experiment_name: str, characteristics: mlrl.testbed.data_characteristics.DataCharacteristics, total_folds: int, fold: Optional[int] = None)
Writes the characteristics of a data set to the output.
- Parameters
experiment_name – The name of the experiment
characteristics – The characteristics of the data set
total_folds – The total number of folds
fold – The fold for which the characteristics should be written or None, if no cross validation is used
- class mlrl.testbed.data_characteristics.DataCharacteristicsLogOutput
Bases: mlrl.testbed.data_characteristics.DataCharacteristicsOutput
Outputs the characteristics of a data set using the logger.
- write_data_characteristics(experiment_name: str, characteristics: mlrl.testbed.data_characteristics.DataCharacteristics, total_folds: int, fold: Optional[int] = None)
Writes the characteristics of a data set to the output.
- Parameters
experiment_name – The name of the experiment
characteristics – The characteristics of the data set
total_folds – The total number of folds
fold – The fold for which the characteristics should be written or None, if no cross validation is used
- class mlrl.testbed.data_characteristics.DataCharacteristicsOutput
Bases: abc.ABC
An abstract base class for all outputs to which the characteristics of a data set may be written.
- abstract write_data_characteristics(experiment_name: str, characteristics: mlrl.testbed.data_characteristics.DataCharacteristics, total_folds: int, fold: Optional[int] = None)
Writes the characteristics of a data set to the output.
- Parameters
experiment_name – The name of the experiment
characteristics – The characteristics of the data set
total_folds – The total number of folds
fold – The fold for which the characteristics should be written or None, if no cross validation is used
- class mlrl.testbed.data_characteristics.DataCharacteristicsPrinter(outputs: List[mlrl.testbed.data_characteristics.DataCharacteristicsOutput])
Bases: object
A class that prints the characteristics of data sets.
- print(experiment_name: str, x, y, meta_data: mlrl.testbed.data.MetaData, current_fold: int, num_folds: int)
- Parameters
experiment_name – The name of the experiment
x – A numpy.ndarray or scipy.sparse matrix, shape (num_examples, num_features), that stores the feature values
y – A numpy.ndarray or scipy.sparse matrix, shape (num_examples, num_labels), that stores the ground truth labels
meta_data – The meta data of the data set
current_fold – The current fold
num_folds – The total number of folds
- mlrl.testbed.data_characteristics.density(m) -> float
Calculates and returns the density of a given feature or label matrix.
- Parameters
m – A numpy.ndarray or scipy.sparse matrix, shape (num_rows, num_cols), that stores the feature values of training examples or their labels
- Returns
The fraction of non-zero elements in the given matrix among all elements
- mlrl.testbed.data_characteristics.distinct_label_vectors(y) -> int
Determines and returns the number of distinct label vectors in a label matrix.
- Parameters
y – A numpy.ndarray or scipy.sparse matrix, shape (num_examples, num_labels), that stores the labels of training examples
- Returns
The number of distinct label vectors in the given matrix
- mlrl.testbed.data_characteristics.label_cardinality(y) -> float
Calculates and returns the average label cardinality of a given label matrix.
- Parameters
y – A numpy.ndarray or scipy.sparse matrix, shape (num_examples, num_labels), that stores the labels of training examples
- Returns
The average number of relevant labels per training example
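The three characteristics described so far follow directly from their definitions. A minimal sketch for dense NumPy arrays is shown below. It reuses the documented function names for readability but is not the package's code, and it does not handle scipy.sparse input.

```python
import numpy as np


def density(m) -> float:
    """Fraction of non-zero elements among all elements of the matrix."""
    m = np.asarray(m)
    return np.count_nonzero(m) / m.size


def label_cardinality(y) -> float:
    """Average number of relevant (non-zero) labels per example."""
    y = np.asarray(y)
    return float(np.mean(np.count_nonzero(y, axis=1)))


def distinct_label_vectors(y) -> int:
    """Number of distinct rows in the label matrix."""
    y = np.asarray(y)
    return int(np.unique(y, axis=0).shape[0])


# Four examples with three labels each
y = np.array([[1, 0, 1],
              [0, 0, 1],
              [1, 0, 1],
              [0, 0, 0]])

label_density = density(y)            # 5 of 12 elements are non-zero
cardinality = label_cardinality(y)    # (2 + 1 + 2 + 0) / 4
num_distinct = distinct_label_vectors(y)
```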
- mlrl.testbed.data_characteristics.label_imbalance_ratio(y) -> float
Calculates and returns the average label imbalance ratio of a given label matrix.
- Parameters
y – A numpy.ndarray or scipy.sparse matrix, shape (num_examples, num_labels), that stores the labels of training examples
- Returns
The label imbalance ratio averaged over the available labels
mlrl.testbed.evaluation module
Author: Michael Rapp (michael.rapp.ml@gmail.com)
Provides classes for evaluating the predictions or rankings provided by a multi-label learner according to different measures. The evaluation results can be written to one or several outputs, e.g. to the console or to a file.
- class mlrl.testbed.evaluation.AbstractEvaluation(*args: mlrl.testbed.evaluation.EvaluationOutput)
Bases: mlrl.testbed.evaluation.Evaluation
An abstract base class for all classes that evaluate the predictions provided by a classifier or ranker and write the results to one or several outputs.
- evaluate(experiment_name: str, meta_data: mlrl.testbed.data.MetaData, predictions, ground_truth, first_fold: int, current_fold: int, last_fold: int, num_folds: int, train_time: float, predict_time: float)
Evaluates the predictions provided by a classifier or ranker.
- Parameters
experiment_name – The name of the experiment
meta_data – The meta data of the data set
predictions – The predictions provided by the classifier
ground_truth – The true labels
first_fold – The first cross validation fold or 0, if no cross validation is used
current_fold – The current cross validation fold starting at 0, or 0 if no cross validation is used
last_fold – The last cross validation fold or 0, if no cross validation is used
num_folds – The total number of cross validation folds or 1, if no cross validation is used
train_time – The time needed to train the model
predict_time – The time needed to make predictions
- class mlrl.testbed.evaluation.ClassificationEvaluation(*args: mlrl.testbed.evaluation.EvaluationOutput)
Bases: mlrl.testbed.evaluation.AbstractEvaluation
Evaluates the predictions of a single- or multi-label classifier according to commonly used bipartition measures.
- class mlrl.testbed.evaluation.Evaluation
Bases: abc.ABC
An abstract base class for all classes that evaluate the predictions provided by a classifier or ranker.
- abstract evaluate(experiment_name: str, meta_data: mlrl.testbed.data.MetaData, predictions, ground_truth, first_fold: int, current_fold: int, last_fold: int, num_folds: int, train_time: float, predict_time: float)
Evaluates the predictions provided by a classifier or ranker.
- Parameters
experiment_name – The name of the experiment
meta_data – The meta data of the data set
predictions – The predictions provided by the classifier
ground_truth – The true labels
first_fold – The first cross validation fold or 0, if no cross validation is used
current_fold – The current cross validation fold starting at 0, or 0 if no cross validation is used
last_fold – The last cross validation fold or 0, if no cross validation is used
num_folds – The total number of cross validation folds or 1, if no cross validation is used
train_time – The time needed to train the model
predict_time – The time needed to make predictions
- class mlrl.testbed.evaluation.EvaluationCsvOutput(output_dir: str, clear_dir: bool = True, output_predictions: bool = False, output_individual_folds: bool = True)
Bases: mlrl.testbed.evaluation.EvaluationOutput
Writes evaluation results to CSV files.
- write_evaluation_results(experiment_name: str, evaluation_result: mlrl.testbed.evaluation.EvaluationResult, total_folds: int, fold: Optional[int] = None)
Writes an evaluation result to the output.
- Parameters
experiment_name – The name of the experiment
evaluation_result – The evaluation result to be written
total_folds – The total number of folds
fold – The fold for which the results should be written or None, if no cross validation is used or if the overall results, averaged over all folds, should be written
- write_predictions(experiment_name: str, meta_data: mlrl.testbed.data.MetaData, predictions, ground_truth, total_folds: int, fold: Optional[int] = None)
Writes predictions to the output.
- Parameters
experiment_name – The name of the experiment
meta_data – The meta data of the data set
predictions – The predictions
ground_truth – The ground truth
total_folds – The total number of folds
fold – The fold for which the predictions should be written or None, if no cross validation is used
- class mlrl.testbed.evaluation.EvaluationLogOutput(output_predictions: bool = False, output_individual_folds: bool = True)
Bases: mlrl.testbed.evaluation.EvaluationOutput
Outputs evaluation results using the logger.
- write_evaluation_results(experiment_name: str, evaluation_result: mlrl.testbed.evaluation.EvaluationResult, total_folds: int, fold: Optional[int] = None)
Writes an evaluation result to the output.
- Parameters
experiment_name – The name of the experiment
evaluation_result – The evaluation result to be written
total_folds – The total number of folds
fold – The fold for which the results should be written or None, if no cross validation is used or if the overall results, averaged over all folds, should be written
- write_predictions(experiment_name: str, meta_data: mlrl.testbed.data.MetaData, predictions, ground_truth, total_folds: int, fold: Optional[int] = None)
Writes predictions to the output.
- Parameters
experiment_name – The name of the experiment
meta_data – The meta data of the data set
predictions – The predictions
ground_truth – The ground truth
total_folds – The total number of folds
fold – The fold for which the predictions should be written or None, if no cross validation is used
- class mlrl.testbed.evaluation.EvaluationOutput(output_predictions: bool, output_individual_folds: bool)
Bases: abc.ABC
An abstract base class for all outputs to which evaluation results may be written.
- abstract write_evaluation_results(experiment_name: str, evaluation_result: mlrl.testbed.evaluation.EvaluationResult, total_folds: int, fold: Optional[int] = None)
Writes an evaluation result to the output.
- Parameters
experiment_name – The name of the experiment
evaluation_result – The evaluation result to be written
total_folds – The total number of folds
fold – The fold for which the results should be written or None, if no cross validation is used or if the overall results, averaged over all folds, should be written
- abstract write_predictions(experiment_name: str, meta_data: mlrl.testbed.data.MetaData, predictions, ground_truth, total_folds: int, fold: Optional[int] = None)
Writes predictions to the output.
- Parameters
experiment_name – The name of the experiment
meta_data – The meta data of the data set
predictions – The predictions
ground_truth – The ground truth
total_folds – The total number of folds
fold – The fold for which the predictions should be written or None, if no cross validation is used
- class mlrl.testbed.evaluation.EvaluationResult
Bases: object
Stores the evaluation results according to different measures.
- avg(name: str) -> (float, float)
Returns the score and standard deviation according to a specific measure averaged over all available folds.
- Parameters
name – The name of the measure
- Returns
A tuple consisting of the averaged score and standard deviation
- avg_dict() -> Dict
- dict(fold: int) -> Dict
- get(name: str, fold: int) -> float
Returns the score according to a specific measure and fold.
- Parameters
name – The name of the measure
fold – The fold the score corresponds to
- Returns
The score
- put(name: str, score: float, fold: int, num_folds: int)
Adds a new score according to a specific measure to the evaluation result.
- Parameters
name – The name of the measure
score – The score according to the measure
fold – The fold the score corresponds to
num_folds – The total number of cross validation folds
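The put, get and avg methods describe a store of scores per measure and fold. The following sketch shows one way such a store could behave. The class name EvaluationResultSketch and its internal dictionary are hypothetical, and the use of the population standard deviation is an assumption, since the documentation does not state which variant is computed.

```python
import statistics
from typing import Dict, Tuple


class EvaluationResultSketch:
    """Hypothetical stand-in that stores one score per measure and fold."""

    def __init__(self):
        # Maps the name of a measure to a mapping from folds to scores
        self.measures: Dict[str, Dict[int, float]] = {}

    def put(self, name: str, score: float, fold: int, num_folds: int):
        if not 0 <= fold < num_folds:
            raise ValueError('Invalid fold: ' + str(fold))
        self.measures.setdefault(name, {})[fold] = score

    def get(self, name: str, fold: int) -> float:
        return self.measures[name][fold]

    def avg(self, name: str) -> Tuple[float, float]:
        # Average and standard deviation over all folds stored so far
        scores = list(self.measures[name].values())
        return statistics.mean(scores), statistics.pstdev(scores)


result = EvaluationResultSketch()
result.put('Hamming Loss', 0.1, fold=0, num_folds=2)
result.put('Hamming Loss', 0.3, fold=1, num_folds=2)
mean, std_dev = result.avg('Hamming Loss')
```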
- class mlrl.testbed.evaluation.RankingEvaluation(*args: mlrl.testbed.evaluation.EvaluationOutput)
Bases: mlrl.testbed.evaluation.AbstractEvaluation
Evaluates the predictions of a multi-label ranker according to commonly used ranking measures.
mlrl.testbed.experiments module
Author: Michael Rapp (michael.rapp.ml@gmail.com)
Provides classes for performing experiments.
- class mlrl.testbed.experiments.Experiment(base_learner: mlrl.common.learners.Learner, data_set: mlrl.testbed.training.DataSet, num_folds: int = 1, current_fold: int = - 1, train_evaluation: Optional[mlrl.testbed.evaluation.Evaluation] = None, test_evaluation: Optional[mlrl.testbed.evaluation.Evaluation] = None, parameter_input: Optional[mlrl.testbed.parameters.ParameterInput] = None, model_printer: Optional[mlrl.testbed.model_characteristics.ModelPrinter] = None, model_characteristics_printer: Optional[mlrl.testbed.model_characteristics.ModelCharacteristicsPrinter] = None, data_characteristics_printer: Optional[mlrl.testbed.data_characteristics.DataCharacteristicsPrinter] = None, persistence: Optional[mlrl.testbed.persistence.ModelPersistence] = None)
Bases: mlrl.testbed.training.CrossValidation, abc.ABC
An experiment that trains and evaluates a single multi-label classifier or ranker on a specific data set using cross validation or separate training and test sets.
- run()
mlrl.testbed.interfaces module
Author: Michael Rapp (michael.rapp.ml@gmail.com)
Provides common interfaces that are implemented by several classes.
mlrl.testbed.io module
Author: Michael Rapp (michael.rapp.ml@gmail.com)
Provides functions for writing and reading files.
- mlrl.testbed.io.clear_directory(directory: str)
Deletes all files contained in a directory (excluding subdirectories).
- Parameters
directory – The directory to be cleared
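The described behaviour, removing files but leaving subdirectories in place, can be sketched with the standard library alone. The body below is an illustration under that reading of the description, not the package's implementation.

```python
import os
import tempfile


def clear_directory(directory: str):
    """Deletes all regular files in a directory, but keeps its subdirectories."""
    for file_name in os.listdir(directory):
        path = os.path.join(directory, file_name)

        if os.path.isfile(path):
            os.remove(path)


# Demonstration in a temporary directory with one file and one subdirectory
with tempfile.TemporaryDirectory() as tmp:
    open(os.path.join(tmp, 'results.csv'), 'w').close()
    os.mkdir(os.path.join(tmp, 'models'))
    clear_directory(tmp)
    remaining = os.listdir(tmp)
```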
- mlrl.testbed.io.create_csv_dict_reader(csv_file) -> csv.DictReader
Creates and returns a DictReader for reading from a CSV file.
- Parameters
csv_file – The CSV file
- Returns
The ‘DictReader’ that has been created
- mlrl.testbed.io.create_csv_dict_writer(csv_file, header) -> csv.DictWriter
Creates and returns a DictWriter for writing dictionaries to a CSV file.
- Parameters
csv_file – The CSV file
header – A list that contains the headers of the CSV file. They must correspond to the keys in the dictionary that should be written to the file
- Returns
The DictWriter that has been created
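The reader and writer functions wrap csv.DictReader and csv.DictWriter from the standard library. A minimal round trip might look as follows. The default CSV dialect and the immediate writing of the header row are assumptions, as the documentation does not specify either.

```python
import csv
import io


def create_csv_dict_writer(csv_file, header) -> csv.DictWriter:
    # Assumption: the header row is written as soon as the writer is created
    writer = csv.DictWriter(csv_file, fieldnames=header)
    writer.writeheader()
    return writer


def create_csv_dict_reader(csv_file) -> csv.DictReader:
    return csv.DictReader(csv_file)


# An in-memory buffer stands in for a file opened on disk
buffer = io.StringIO(newline='')
writer = create_csv_dict_writer(buffer, ['Measure', 'Score'])
writer.writerow({'Measure': 'Hamming Loss', 'Score': 0.25})

buffer.seek(0)
rows = list(create_csv_dict_reader(buffer))
```

Note that a DictReader returns all values as strings, so numeric scores must be converted after reading.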
- mlrl.testbed.io.get_file_name(name: str, suffix: str)
Returns a file name, including a suffix.
- Parameters
name – The name of the file (without suffix)
suffix – The suffix of the file
- Returns
The file name
- mlrl.testbed.io.get_file_name_per_fold(name: str, suffix: str, fold: int)
Returns a file name, including a suffix, that corresponds to a certain fold.
- Parameters
name – The name of the file (without suffix)
suffix – The suffix of the file
fold – The cross validation fold the file corresponds to, or None if the file does not correspond to a specific fold
- Returns
The file name
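How a fold might be reflected in a file name can be sketched as follows. The naming scheme used here, a "_fold_" infix followed by a one-based fold number, is purely hypothetical. The documentation only states that the fold may be None, in which case the name is not tied to a fold.

```python
from typing import Optional


def get_file_name(name: str, suffix: str) -> str:
    """Joins a name and a suffix, e.g. 'evaluation' and 'csv'."""
    return name + '.' + suffix


def get_file_name_per_fold(name: str, suffix: str, fold: Optional[int]) -> str:
    # Hypothetical scheme: append the one-based fold number, if any
    if fold is not None:
        name = name + '_fold_' + str(fold + 1)

    return get_file_name(name, suffix)


overall_name = get_file_name_per_fold('evaluation', 'csv', None)
first_fold_name = get_file_name_per_fold('evaluation', 'csv', 0)
```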
- mlrl.testbed.io.open_readable_csv_file(directory: str, file_name: str, fold: int)
Opens a CSV file to be read from.
- Parameters
directory – The directory where the file is located
file_name – The name of the file to be opened (without suffix)
fold – The cross validation fold the file corresponds to, or None if the file does not correspond to a specific fold
- Returns
The file that has been opened
- mlrl.testbed.io.open_writable_csv_file(directory: str, file_name: str, fold: Optional[int] = None, append: bool = False)
Opens a CSV file to be written to.
- Parameters
directory – The directory where the file is located
file_name – The name of the file to be opened (without suffix)
fold – The cross validation fold the file corresponds to, or None if the file does not correspond to a specific fold
append – True, if new data should be appended to the file, if it already exists, False otherwise
- Returns
The file that has been opened
- mlrl.testbed.io.open_writable_txt_file(directory: str, file_name: str, fold: Optional[int] = None, append: bool = False)
Opens a text file to be written to.
- Parameters
directory – The directory where the file is located
file_name – The name of the file to be opened (without suffix)
fold – The cross validation fold the file corresponds to, or None if the file does not correspond to a specific fold
append – True, if new data should be appended to the file, if it already exists, False otherwise
- Returns
The file that has been opened
- mlrl.testbed.io.write_xml_file(xml_file, root_element: xml.etree.ElementTree.Element, encoding='utf-8')
Writes an XML structure to a file.
- Parameters
xml_file – The XML file
root_element – The root element of the XML structure
encoding – The encoding to be used
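Writing an element tree to an open file can be done with xml.etree.ElementTree from the standard library. The sketch below assumes a file opened in binary mode and writes an XML declaration. Whether the actual function does either, or pretty-prints the output, is not stated in the documentation. The element names are a toy structure chosen for this example.

```python
import io
import xml.etree.ElementTree as ET


def write_xml_file(xml_file, root_element: ET.Element, encoding='utf-8'):
    """Serializes the given root element into a file opened in binary mode."""
    tree = ET.ElementTree(root_element)
    tree.write(xml_file, encoding=encoding, xml_declaration=True)


# Toy structure with a single label
root = ET.Element('labels')
ET.SubElement(root, 'label', name='label1')

# An in-memory buffer stands in for a file opened with mode 'wb'
buffer = io.BytesIO()
write_xml_file(buffer, root)
parsed = ET.fromstring(buffer.getvalue())
```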
mlrl.testbed.main_boomer module
Author: Michael Rapp (michael.rapp.ml@gmail.com)
- class mlrl.testbed.main_boomer.BoomerRunnable
- mlrl.testbed.main_boomer.main()
mlrl.testbed.main_seco module
Author: Michael Rapp (michael.rapp.ml@gmail.com)
- class mlrl.testbed.main_seco.SeCoRunnable
- mlrl.testbed.main_seco.main()
mlrl.testbed.model_characteristics module
Author: Michael Rapp (michael.rapp.ml@gmail.com)
Provides classes for printing textual representations of models. The models can be written to one or several outputs, e.g. to the console or to a file.
- class mlrl.testbed.model_characteristics.ModelCharacteristicsPrinter
Bases: abc.ABC
A class that prints the characteristics of a Learner’s model.
- print(experiment_name: str, learner: mlrl.common.learners.Learner, current_fold: int, num_folds: int)
- class mlrl.testbed.model_characteristics.ModelPrinter(print_options: str, outputs: List[mlrl.testbed.model_characteristics.ModelPrinterOutput])
Bases: abc.ABC
An abstract base class for all classes that print a textual representation of a Learner’s model.
- print(experiment_name: str, meta_data: mlrl.testbed.data.MetaData, learner: mlrl.common.learners.Learner, current_fold: int, num_folds: int)
Prints a textual representation of a Learner’s model.
- Parameters
experiment_name – The name of the experiment
meta_data – The meta data of the training data set
learner – The learner
current_fold – The current cross validation fold starting at 0, or 0 if no cross validation is used
num_folds – The total number of cross validation folds or 1, if no cross validation is used
- class mlrl.testbed.model_characteristics.ModelPrinterLogOutput
Bases: mlrl.testbed.model_characteristics.ModelPrinterOutput
Outputs the textual representation of a model using the logger.
- write_model(experiment_name: str, model: str, total_folds: int, fold: Optional[int] = None)
Writes a textual representation of a model to the output.
- Parameters
experiment_name – The name of the experiment
model – The textual representation of the model
total_folds – The total number of folds
fold – The fold for which the results should be written or None, if no cross validation is used or if the overall results, averaged over all folds, should be written
- class mlrl.testbed.model_characteristics.ModelPrinterOutput
Bases: abc.ABC
An abstract base class for all outputs to which textual representations of models may be written.
- abstract write_model(experiment_name: str, model: str, total_folds: int, fold: Optional[int] = None)
Writes a textual representation of a model to the output.
- Parameters
experiment_name – The name of the experiment
model – The textual representation of the model
total_folds – The total number of folds
fold – The fold for which the results should be written or None, if no cross validation is used or if the overall results, averaged over all folds, should be written
- class mlrl.testbed.model_characteristics.ModelPrinterTxtOutput(output_dir: str, clear_dir: bool = True)
Bases:
mlrl.testbed.model_characteristics.ModelPrinterOutput
Writes the textual representation of a model to a text file.
- write_model(experiment_name: str, model: str, total_folds: int, fold: Optional[int] = None)
Write a textual representation of a model to the output.
- Parameters
experiment_name – The name of the experiment
model – The textual representation of the model
total_folds – The total number of folds
fold – The fold for which the results should be written or None, if no cross validation is used or if the overall results, averaged over all folds, should be written
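The write_model contract shared by the three outputs above can be illustrated with a minimal, self-contained sketch. It does not reproduce the mlrl implementation: the class names OutputSketch and InMemoryOutput, the header format, and the assumption that fold is zero-based are all made up for this example.

```python
from abc import ABC, abstractmethod
from typing import List, Optional


class OutputSketch(ABC):
    """Hypothetical stand-in for ModelPrinterOutput."""

    @abstractmethod
    def write_model(self, experiment_name: str, model: str, total_folds: int,
                    fold: Optional[int] = None):
        pass


class InMemoryOutput(OutputSketch):
    """Collects the text instead of logging it or writing it to a file."""

    def __init__(self):
        self.entries: List[str] = []

    def write_model(self, experiment_name: str, model: str, total_folds: int,
                    fold: Optional[int] = None):
        if fold is None:
            # No cross validation is used, or overall results are written
            header = experiment_name + ' (overall)'
        else:
            # Assumption: folds are zero-based, so 1 is added for display
            header = '%s (fold %d/%d)' % (experiment_name, fold + 1, total_folds)
        self.entries.append(header + ':\n' + model)


output = InMemoryOutput()
output.write_model('demo', 'rule text', total_folds=3, fold=0)
output.write_model('demo', 'rule text', total_folds=1)
```

A logger-based or file-based output would differ only in where the assembled text ends up.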
- class mlrl.testbed.model_characteristics.RuleModelCharacteristics(default_rule_index: int, default_rule_pos_predictions: int, default_rule_neg_predictions: int, num_leq: numpy.ndarray, num_gr: numpy.ndarray, num_eq: numpy.ndarray, num_neq: numpy.ndarray, num_pos_predictions: numpy.ndarray, num_neg_predictions: numpy.ndarray)
Bases:
object
Stores the characteristics of a RuleModel.
- class mlrl.testbed.model_characteristics.RuleModelCharacteristicsCsvOutput(output_dir: str, clear_dir: bool = True)
Bases:
mlrl.testbed.model_characteristics.RuleModelCharacteristicsOutput
Writes the characteristics of a RuleModel to a CSV file.
- COL_CONDITIONS = 'conditions'
- COL_EQ_CONDITIONS = 'conditions using == operator'
- COL_GR_CONDITIONS = 'conditions using > operator'
- COL_LEQ_CONDITIONS = 'conditions using <= operator'
- COL_NEG_PREDICTIONS = 'neg. predictions'
- COL_NEQ_CONDITIONS = 'conditions using != operator'
- COL_NOMINAL_CONDITIONS = 'nominal conditions'
- COL_NUMERICAL_CONDITIONS = 'numerical conditions'
- COL_POS_PREDICTIONS = 'pos. predictions'
- COL_PREDICTIONS = 'predictions'
- COL_RULE_NAME = 'Rule'
- write_model_characteristics(experiment_name: str, characteristics: mlrl.testbed.model_characteristics.RuleModelCharacteristics, total_folds: int, fold: Optional[int] = None)
Writes the characteristics of a RuleModel to the output.
- Parameters
experiment_name – The name of the experiment
characteristics – The characteristics of the model
total_folds – The total number of folds
fold – The fold for which the characteristics should be written or None, if no cross validation is used
- class mlrl.testbed.model_characteristics.RuleModelCharacteristicsLogOutput
Bases:
mlrl.testbed.model_characteristics.RuleModelCharacteristicsOutput
Outputs the characteristics of a RuleModel using the logger.
- write_model_characteristics(experiment_name: str, characteristics: mlrl.testbed.model_characteristics.RuleModelCharacteristics, total_folds: int, fold: Optional[int] = None)
Writes the characteristics of a RuleModel to the output.
- Parameters
experiment_name – The name of the experiment
characteristics – The characteristics of the model
total_folds – The total number of folds
fold – The fold for which the characteristics should be written or None, if no cross validation is used
- class mlrl.testbed.model_characteristics.RuleModelCharacteristicsOutput
Bases:
abc.ABC
An abstract base class for all outputs to which the characteristics of an MLRuleLearner’s model may be written.
- abstract write_model_characteristics(experiment_name: str, characteristics: mlrl.testbed.model_characteristics.RuleModelCharacteristics, total_folds: int, fold: Optional[int] = None)
Writes the characteristics of a RuleModel to the output.
- Parameters
experiment_name – The name of the experiment
characteristics – The characteristics of the model
total_folds – The total number of folds
fold – The fold for which the characteristics should be written or None, if no cross validation is used
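How per-rule counts could be turned into CSV rows is sketched below. The column names are taken from the COL_* constants documented above; everything else is an assumption for illustration, in particular that conditions using <= and > count as numerical, that conditions using == and != count as nominal, and that rules are named "Rule 1", "Rule 2", and so on. Plain lists stand in for the numpy arrays of RuleModelCharacteristics, and the helper name characteristics_to_csv is hypothetical.

```python
import csv
import io
from typing import List

# Column names as documented for RuleModelCharacteristicsCsvOutput
COL_RULE_NAME = 'Rule'
COL_CONDITIONS = 'conditions'
COL_NUMERICAL_CONDITIONS = 'numerical conditions'
COL_NOMINAL_CONDITIONS = 'nominal conditions'
COL_PREDICTIONS = 'predictions'


def characteristics_to_csv(num_leq: List[int], num_gr: List[int], num_eq: List[int],
                           num_neq: List[int], num_pos: List[int],
                           num_neg: List[int]) -> str:
    """Hypothetical helper that writes one CSV row per rule."""
    buffer = io.StringIO()
    columns = [COL_RULE_NAME, COL_CONDITIONS, COL_NUMERICAL_CONDITIONS,
               COL_NOMINAL_CONDITIONS, COL_PREDICTIONS]
    writer = csv.DictWriter(buffer, fieldnames=columns, lineterminator='\n')
    writer.writeheader()
    counts = zip(num_leq, num_gr, num_eq, num_neq, num_pos, num_neg)
    for i, (leq, gr, eq, neq, pos, neg) in enumerate(counts):
        numerical = leq + gr  # assumption: <= and > conditions are numerical
        nominal = eq + neq    # assumption: == and != conditions are nominal
        writer.writerow({
            COL_RULE_NAME: 'Rule ' + str(i + 1),
            COL_CONDITIONS: numerical + nominal,
            COL_NUMERICAL_CONDITIONS: numerical,
            COL_NOMINAL_CONDITIONS: nominal,
            COL_PREDICTIONS: pos + neg,
        })
    return buffer.getvalue()


csv_text = characteristics_to_csv(num_leq=[1, 0], num_gr=[1, 2], num_eq=[1, 0],
                                  num_neq=[0, 0], num_pos=[1, 2], num_neg=[1, 0])
```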
- class mlrl.testbed.model_characteristics.RuleModelCharacteristicsPrinter(outputs: List[mlrl.testbed.model_characteristics.RuleModelCharacteristicsOutput])
Bases:
mlrl.testbed.model_characteristics.ModelCharacteristicsPrinter
A class that prints the characteristics of an MLRuleLearner’s model.
- class mlrl.testbed.model_characteristics.RuleModelCharacteristicsVisitor
Bases:
mlrl.common.cython.rule_model.RuleModelVisitor
A visitor that determines the characteristics of a RuleModel.
- visit_complete_head(head: mlrl.common.cython.rule_model.CompleteHead)
Must be implemented by subclasses in order to visit the heads of rules that predict for all available labels.
- Parameters
head – A CompleteHead to be visited
- visit_conjunctive_body(body: mlrl.common.cython.rule_model.ConjunctiveBody)
Must be implemented by subclasses in order to visit the bodies of rules that are given as a conjunction of several conditions.
- Parameters
body – A ConjunctiveBody to be visited
- visit_empty_body(_: mlrl.common.cython.rule_model.EmptyBody)
Must be implemented by subclasses in order to visit bodies of rules that do not contain any conditions.
- Parameters
_ – An EmptyBody to be visited
- visit_partial_head(head: mlrl.common.cython.rule_model.PartialHead)
Must be implemented by subclasses in order to visit the heads of rules that predict for a subset of the available labels.
- Parameters
head – A PartialHead to be visited
- class mlrl.testbed.model_characteristics.RuleModelFormatter(attributes: List[mlrl.testbed.data.Attribute], labels: List[mlrl.testbed.data.Attribute], print_feature_names: bool, print_label_names: bool, print_nominal_values: bool)
Bases:
mlrl.common.cython.rule_model.RuleModelVisitor
Creates a textual representation of the rules in a RuleModel.
- get_text() → str
Returns the textual representation that has been created via the format method.
- Returns
The textual representation
- visit_complete_head(head: mlrl.common.cython.rule_model.CompleteHead)
Must be implemented by subclasses in order to visit the heads of rules that predict for all available labels.
- Parameters
head – A CompleteHead to be visited
- visit_conjunctive_body(body: mlrl.common.cython.rule_model.ConjunctiveBody)
Must be implemented by subclasses in order to visit the bodies of rules that are given as a conjunction of several conditions.
- Parameters
body – A ConjunctiveBody to be visited
- visit_empty_body(_: mlrl.common.cython.rule_model.EmptyBody)
Must be implemented by subclasses in order to visit bodies of rules that do not contain any conditions.
- Parameters
_ – An EmptyBody to be visited
- visit_partial_head(head: mlrl.common.cython.rule_model.PartialHead)
Must be implemented by subclasses in order to visit the heads of rules that predict for a subset of the available labels.
- Parameters
head – A PartialHead to be visited
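The visitor methods above can be pictured with a small stand-alone sketch. It does not use the Cython classes from mlrl.common.cython.rule_model: the body and head classes, their attributes, and the output format below are hypothetical stand-ins chosen for this example.

```python
from typing import Dict, List


class EmptyBodySketch:
    """Hypothetical stand-in for EmptyBody (a body without conditions)."""


class ConjunctiveBodySketch:
    """Hypothetical stand-in for ConjunctiveBody."""

    def __init__(self, conditions: List[str]):
        self.conditions = conditions


class PartialHeadSketch:
    """Hypothetical stand-in for PartialHead (predicts for a subset of labels)."""

    def __init__(self, predictions: Dict[str, int]):
        self.predictions = predictions


class FormatterSketch:
    """Builds one line of text per rule while bodies and heads are visited."""

    def __init__(self):
        self.text = ''

    def visit_empty_body(self, _):
        self.text += '{}'

    def visit_conjunctive_body(self, body: ConjunctiveBodySketch):
        self.text += '{' + ' & '.join(body.conditions) + '}'

    def visit_partial_head(self, head: PartialHeadSketch):
        items = ', '.join('%s = %s' % (k, v) for k, v in head.predictions.items())
        self.text += ' => (' + items + ')\n'

    def get_text(self) -> str:
        return self.text


formatter = FormatterSketch()
# Each rule is visited as a body followed by its head
formatter.visit_empty_body(EmptyBodySketch())
formatter.visit_partial_head(PartialHeadSketch({'label1': 1}))
formatter.visit_conjunctive_body(ConjunctiveBodySketch(['f1 <= 0.5', 'f2 > 1.0']))
formatter.visit_partial_head(PartialHeadSketch({'label2': 0}))
text = formatter.get_text()
```

The sketch assumes that the body of a rule is visited before its head, which the documentation above does not state.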
- class mlrl.testbed.model_characteristics.RulePrinter(print_options: str, outputs: List[mlrl.testbed.model_characteristics.ModelPrinterOutput])
Bases:
mlrl.testbed.model_characteristics.ModelPrinter
Prints a textual representation of an MLRuleLearner’s rule-based model.
mlrl.testbed.parameters module
Author: Michael Rapp (michael.rapp.ml@gmail.com)
Provides classes for parameter tuning.
- class mlrl.testbed.parameters.NestedCrossValidation(num_nested_folds: int)
Bases:
mlrl.testbed.parameters.ParameterSearch
Searches for optimal parameters using (nested) cross validation.
- search(meta_data: mlrl.testbed.data.MetaData, x, y, first_fold: int, current_fold: int, last_fold: int, num_folds: int)
Tests different parameter settings given a training data set.
- Parameters
meta_data – The meta data of the training data set
x – The feature matrix of the training examples
y – The label matrix of the training examples
first_fold – The first fold or 0, if no cross validation is used
current_fold – The current fold starting at 0, or 0 if no cross validation is used
last_fold – The last fold or 0, if no cross validation is used
num_folds – The total number of cross validation folds or 1, if no cross validation is used
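The general idea of nested cross validation can be sketched as follows: each outer training set is split again into inner folds, so that parameter settings are compared without touching the outer test examples. This is a generic illustration, not the logic of NestedCrossValidation.search; the helper k_fold and its interleaved splitting scheme are assumptions made for this example.

```python
from typing import Iterator, List, Tuple


def k_fold(indices: List[int], num_folds: int) -> Iterator[Tuple[int, List[int], List[int]]]:
    """Hypothetical helper that yields (fold, training indices, test indices)."""
    for fold in range(num_folds):
        # Every num_folds-th example, starting at the fold index, is held out
        test = indices[fold::num_folds]
        held_out = set(test)
        train = [i for i in indices if i not in held_out]
        yield fold, train, test


outer_indices = list(range(6))
splits = []

for outer_fold, outer_train, outer_test in k_fold(outer_indices, 3):
    # The inner folds only use examples from the outer training set
    for inner_fold, inner_train, inner_validation in k_fold(outer_train, 2):
        splits.append((outer_fold, inner_fold, inner_train, inner_validation, outer_test))
```

In a real search, each parameter setting would be trained on inner_train and scored on inner_validation, and only the best setting would be evaluated on outer_test.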
- class mlrl.testbed.parameters.ParameterCsvInput(input_dir: str)
Bases:
mlrl.testbed.parameters.ParameterInput
Reads parameter settings from CSV files.
- read_parameters(fold: Optional[int] = None) → dict
Reads a parameter setting from the input.
- Parameters
fold – The fold to which the parameter setting corresponds, or None if the parameter setting does not correspond to a specific fold
- Returns
A dictionary that stores the parameters
- class mlrl.testbed.parameters.ParameterCsvOutput(output_dir: str, clear_dir: bool = True)
Bases:
mlrl.testbed.parameters.ParameterOutput
Writes parameter settings to CSV files.
- write_parameters(parameters: dict, score: float, total_folds: int, fold: Optional[int] = None)
Writes a parameter setting to the output.
- Parameters
parameters – A dictionary that stores the parameters
score – The evaluation score that has been achieved using the parameter setting
total_folds – The total number of folds
fold – The fold to which the parameter setting corresponds, or None if the parameter setting does not correspond to a specific fold
- class mlrl.testbed.parameters.ParameterInput
Bases:
abc.ABC
- abstract read_parameters(fold: Optional[int] = None) → dict
Reads a parameter setting from the input.
- Parameters
fold – The fold to which the parameter setting corresponds, or None if the parameter setting does not correspond to a specific fold
- Returns
A dictionary that stores the parameters
- class mlrl.testbed.parameters.ParameterLogOutput
Bases:
mlrl.testbed.parameters.ParameterOutput
Outputs parameter settings using the logger.
- write_parameters(parameters: dict, score: float, total_folds: int, fold: Optional[int] = None)
Writes a parameter setting to the output.
- Parameters
parameters – A dictionary that stores the parameters
score – The evaluation score that has been achieved using the parameter setting
total_folds – The total number of folds
fold – The fold to which the parameter setting corresponds, or None if the parameter setting does not correspond to a specific fold
- class mlrl.testbed.parameters.ParameterOutput
Bases:
abc.ABC
An abstract base class for all outputs to which parameter settings may be written.
- abstract write_parameters(parameters: dict, score: float, total_folds: int, fold: Optional[int] = None)
Writes a parameter setting to the output.
- Parameters
parameters – A dictionary that stores the parameters
score – The evaluation score that has been achieved using the parameter setting
total_folds – The total number of folds
fold – The fold to which the parameter setting corresponds, or None if the parameter setting does not correspond to a specific fold
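A round trip between a CSV output and a CSV input for parameter settings might look like the sketch below. The functions mirror the names write_parameters and read_parameters, but they are free-standing and hypothetical: the file naming scheme, the one-row file layout, and the parameter names shrinkage and max_rules are assumptions made for this example, and the score and total_folds arguments are left out for brevity.

```python
import csv
import os
import tempfile
from typing import Optional


def file_name(prefix: str, fold: Optional[int]) -> str:
    # Hypothetical naming scheme: one file per fold, or one overall file
    suffix = 'overall' if fold is None else 'fold_' + str(fold + 1)
    return prefix + '_' + suffix + '.csv'


def write_parameters(output_dir: str, parameters: dict, fold: Optional[int] = None):
    path = os.path.join(output_dir, file_name('parameters', fold))
    with open(path, 'w', newline='') as csv_file:
        writer = csv.DictWriter(csv_file, fieldnames=sorted(parameters))
        writer.writeheader()
        writer.writerow(parameters)


def read_parameters(input_dir: str, fold: Optional[int] = None) -> dict:
    path = os.path.join(input_dir, file_name('parameters', fold))
    with open(path, newline='') as csv_file:
        # The first (and only) data row holds the parameter setting
        return dict(next(csv.DictReader(csv_file)))


with tempfile.TemporaryDirectory() as directory:
    write_parameters(directory, {'shrinkage': 0.3, 'max_rules': 500}, fold=0)
    restored = read_parameters(directory, fold=0)
```

Values read back from a CSV file are strings, so a caller of this sketch would have to convert them to the expected types.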
- class mlrl.testbed.parameters.ParameterSearch
Bases:
mlrl.testbed.interfaces.Randomized, abc.ABC
A base class for all classes that implement strategies to search for optimal parameters given a training data set.
- abstract get_params()
Returns the best parameter setting tested so far.
- Returns
A dictionary that stores the parameters
- abstract get_score()
Returns the evaluation score that has been achieved using the best parameter setting.
- Returns
An evaluation score
- abstract search(meta_data: mlrl.testbed.data.MetaData, x, y, first_fold: int, current_fold: int, last_fold: int, num_folds: int)
Tests different parameter settings given a training data set.
- Parameters
meta_data – The meta data of the training data set
x – The feature matrix of the training examples
y – The label matrix of the training examples
first_fold – The first fold or 0, if no cross validation is used
current_fold – The current fold starting at 0, or 0 if no cross validation is used
last_fold – The last fold or 0, if no cross validation is used
num_folds – The total number of cross validation folds or 1, if no cross validation is used
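The interplay of search, get_params and get_score can be illustrated with an exhaustive grid search. This is a simplified sketch and not a subclass of ParameterSearch: the class GridSearchSketch, its constructor arguments, the toy score function, and the parameter names are hypothetical, and the sketch assumes that higher scores are better.

```python
from itertools import product
from typing import Callable, Dict, List, Optional


class GridSearchSketch:
    """Hypothetical search strategy that tries every combination in a grid."""

    def __init__(self, grid: Dict[str, List], score_function: Callable[[dict], float]):
        self.grid = grid
        self.score_function = score_function
        self.best_params: Optional[dict] = None
        self.best_score: Optional[float] = None

    def search(self):
        names = sorted(self.grid)
        for values in product(*(self.grid[name] for name in names)):
            params = dict(zip(names, values))
            score = self.score_function(params)
            # Assumption: a higher score indicates a better parameter setting
            if self.best_score is None or score > self.best_score:
                self.best_params, self.best_score = params, score

    def get_params(self) -> Optional[dict]:
        return self.best_params

    def get_score(self) -> Optional[float]:
        return self.best_score


def toy_score(params: dict) -> float:
    # Made-up score that peaks at shrinkage=0.3 and max_rules=500
    return -abs(params['shrinkage'] - 0.3) - abs(params['max_rules'] - 500) / 1000


search = GridSearchSketch({'shrinkage': [0.1, 0.3, 1.0], 'max_rules': [100, 500]}, toy_score)
search.search()
best = search.get_params()
```

In an actual search, the score function would train a learner on the given x and y and evaluate it, for example on the folds of a cross validation.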
- class mlrl.testbed.parameters.ParameterTuning(data_set: mlrl.testbed.training.DataSet, num_folds: int, current_fold: int, parameter_search: mlrl.testbed.parameters.ParameterSearch, *args: mlrl.testbed.parameters.ParameterOutput)
Bases:
mlrl.testbed.training.CrossValidation
Tunes parameters for a single training data set or for all training data sets that are used in cross validation using a ParameterSearch and writes the optimal parameters to one or several outputs.
mlrl.testbed.persistence module
Author: Michael Rapp (michael.rapp.ml@gmail.com)
Provides classes for saving/loading models to/from disk.
- class mlrl.testbed.persistence.ModelPersistence(model_dir: str)
Bases:
object
Saves a model to a file and loads it later.
- load_model(model_name: str, fold: Optional[int] = None, raise_exception: bool = False)
Loads a model from a file.
- Parameters
model_name – The name of the model to be loaded
fold – The fold to which the model corresponds, or None if no cross validation is used
raise_exception – True, if an exception should be raised when an error occurs, False, if None should be returned in that case
- Returns
The loaded model
- save_model(model, model_name: str, fold: Optional[int] = None)
Saves a model to a file.
- Parameters
model – The model to be persisted
model_name – The name of the model to be persisted
fold – The fold to which the model corresponds, or None if no cross validation is used
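The save/load behavior described above, including the effect of raise_exception, can be sketched with the standard pickle module. Whether ModelPersistence itself uses pickle is not stated in this documentation; the class PersistenceSketch, the file naming scheme, and the set of caught exceptions are assumptions made for this example.

```python
import os
import pickle
import tempfile
from typing import Optional


class PersistenceSketch:
    """Hypothetical stand-in for ModelPersistence, based on pickle."""

    def __init__(self, model_dir: str):
        self.model_dir = model_dir

    def _path(self, model_name: str, fold: Optional[int]) -> str:
        # Hypothetical naming scheme: the fold is appended if cross validation is used
        suffix = '' if fold is None else '_fold-' + str(fold + 1)
        return os.path.join(self.model_dir, model_name + suffix + '.model')

    def save_model(self, model, model_name: str, fold: Optional[int] = None):
        os.makedirs(self.model_dir, exist_ok=True)
        with open(self._path(model_name, fold), 'wb') as model_file:
            pickle.dump(model, model_file)

    def load_model(self, model_name: str, fold: Optional[int] = None,
                   raise_exception: bool = False):
        try:
            with open(self._path(model_name, fold), 'rb') as model_file:
                return pickle.load(model_file)
        except (OSError, pickle.UnpicklingError):
            if raise_exception:
                raise
            # Errors are swallowed and None is returned instead
            return None


with tempfile.TemporaryDirectory() as directory:
    persistence = PersistenceSketch(directory)
    persistence.save_model({'rules': 3}, 'demo', fold=0)
    loaded = persistence.load_model('demo', fold=0)
    missing = persistence.load_model('demo', fold=1)
    try:
        persistence.load_model('demo', fold=1, raise_exception=True)
        raised = False
    except FileNotFoundError:
        raised = True
```

Unpickling data from untrusted sources is unsafe, so a scheme like this should only be used for files written by the same program.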
mlrl.testbed.runnables module
Author: Michael Rapp (michael.rapp.ml@gmail.com)
Provides base classes for programs that can be configured via command line arguments.
- class mlrl.testbed.runnables.RuleLearnerRunnable
Bases:
mlrl.testbed.runnables.Runnable, abc.ABC
A base class for all programs that perform an experiment that involves training and evaluation of a rule learner.
mlrl.testbed.training module
Author: Michael Rapp (michael.rapp.ml@gmail.com)
Provides classes for training and evaluating multi-label classifiers using either cross validation or separate training and test sets.
- class mlrl.testbed.training.CrossValidation(data_set: mlrl.testbed.training.DataSet, num_folds: int, current_fold: int)
Bases:
mlrl.testbed.interfaces.Randomized, abc.ABC
A base class for all classes that use cross validation or a train-test split to train and evaluate a multi-label classifier or ranker.
- run()
- class mlrl.testbed.training.DataSet(data_dir: str, data_set_name: str, use_one_hot_encoding: bool)
Bases:
object
Stores the properties of a data set to be used for training and evaluating multi-label classifiers.