mlrl.testbed.evaluation module

Author: Michael Rapp (michael.rapp.ml@gmail.com)

Provides classes for evaluating the predictions or rankings provided by a multi-label learner according to different measures. The evaluation results can be written to one or several outputs, e.g., to the console or to a file.
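As a rough sketch of how the classes documented below fit together, one or several sinks can be attached to a writer. The construction shown here only uses signatures from this page; accessing the sink classes through EvaluationWriter follows from the layout of this page, the output directory name is illustrative, and the experiment code that actually invokes the writer is assumed and not shown.

    from mlrl.testbed.evaluation import BinaryEvaluationWriter, EvaluationWriter

    # Sinks decide where evaluation results end up: the console and CSV files
    # in the (hypothetical) directory 'results', respectively.
    sinks = [
        EvaluationWriter.LogSink(),
        EvaluationWriter.CsvSink(output_dir='results'),
    ]

    # A writer that evaluates binary predictions according to bipartition
    # measures and forwards the results to all of the above sinks.
    writer = BinaryEvaluationWriter(sinks)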

class mlrl.testbed.evaluation.BinaryEvaluationWriter(sinks: List[Sink])

Bases: EvaluationWriter

Evaluates the quality of binary predictions provided by a single- or multi-label classifier according to commonly used bipartition measures.

class mlrl.testbed.evaluation.EvaluationFunction(option: str, name: str, evaluation_function, percentage: bool = True, **kwargs)

Bases: Formatter

An evaluation function.

evaluate(ground_truth, predictions) float

Applies the evaluation function to the given predictions and ground truth labels.

Parameters:
  • ground_truth – The ground truth

  • predictions – The predictions

Returns:

An evaluation score
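
A minimal sketch of wrapping an existing metric as an evaluation function, assuming scikit-learn's hamming_loss as the wrapped callable; the option key 'hamming_loss' and the display name 'Hamming Loss' are illustrative values, not constants defined by this module.

    import numpy as np
    from sklearn.metrics import hamming_loss

    from mlrl.testbed.evaluation import EvaluationFunction

    # Wrap the metric together with an option key and a human-readable name
    measure = EvaluationFunction('hamming_loss', 'Hamming Loss', hamming_loss)

    # Multi-label ground truth and binary predictions in indicator format
    ground_truth = np.array([[1, 0, 1], [0, 1, 0]])
    predictions = np.array([[1, 0, 0], [0, 1, 0]])

    # Applies the wrapped function to the given predictions and ground truth
    score = measure.evaluate(ground_truth, predictions)  # 1 of 6 labels differs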

class mlrl.testbed.evaluation.EvaluationWriter(sinks: List[Sink])

Bases: OutputWriter, ABC

An abstract base class for all classes that evaluate the predictions provided by a learner and allow the evaluation results to be written to one or several sinks.

class CsvSink(output_dir: str, options: ~mlrl.common.options.Options = <mlrl.common.options.Options object>)

Bases: CsvSink

Allows writing evaluation results to CSV files.

write_output(meta_data: MetaData, data_split: DataSplit, data_type: DataType | None, prediction_scope: PredictionScope | None, output_data, **kwargs)

See mlrl.testbed.output_writer.OutputWriter.Sink.write_output()

class EvaluationResult

Bases: Formattable, Tabularizable

Stores the evaluation results according to different measures.

avg(measure: Formatter, **kwargs) Tuple[str, str]

Returns the score according to a specific measure, averaged over all available folds, as well as the corresponding standard deviation.

Parameters:

measure – The measure

Returns:

A tuple consisting of textual representations of the averaged score and standard deviation

avg_dict(**kwargs) Dict[Formatter, str]

Returns a dictionary that stores, for each measure, the score averaged across all folds, as well as the corresponding standard deviation.

Returns:

A dictionary that stores textual representations of the averaged score and standard deviation for each measure

dict(fold: int | None, **kwargs) Dict[Formatter, str]

Returns a dictionary that stores the scores for a specific fold according to each measure.

Parameters:

fold – The fold the scores correspond to or None, if no cross validation is used

Returns:

A dictionary that stores textual representations of the scores for the given fold according to each measure

format(options: Options, **kwargs) str

See mlrl.testbed.output_writer.Formattable.format()

get(measure: Formatter, fold: int | None, **kwargs) str

Returns the score according to a specific measure.

Parameters:
  • measure – The measure

  • fold – The fold the score corresponds to or None, if no cross validation is used

Returns:

A textual representation of the score

put(measure: Formatter, score: float, num_folds: int, fold: int | None)

Adds a new score according to a specific measure to the evaluation result.

Parameters:
  • measure – The measure

  • score – The score according to the measure

  • num_folds – The total number of cross validation folds

  • fold – The fold the score corresponds to or None, if no cross validation is used

tabularize(options: Options, **kwargs) List[Dict[str, str]] | None

See mlrl.testbed.output_writer.Tabularizable.tabularize()
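
A sketch of how scores for individual cross-validation folds might be collected and averaged. The measure is an EvaluationFunction wrapping scikit-learn's hamming_loss (any Formatter would do), the scores are made-up values, and the assumption that EvaluationResult can be instantiated without arguments follows from the signature shown above.

    from sklearn.metrics import hamming_loss

    from mlrl.testbed.evaluation import EvaluationFunction, EvaluationWriter

    # An illustrative measure; any Formatter can be used
    measure = EvaluationFunction('hamming_loss', 'Hamming Loss', hamming_loss)
    result = EvaluationWriter.EvaluationResult()

    # Store one (made-up) score per fold of a 2-fold cross validation...
    result.put(measure, score=0.12, num_folds=2, fold=0)
    result.put(measure, score=0.18, num_folds=2, fold=1)

    # ...and retrieve textual representations of individual and averaged scores
    first_fold_score = result.get(measure, fold=0)
    avg_score, std_dev = result.avg(measure)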

KWARG_FOLD = 'fold'

class LogSink(options: ~mlrl.common.options.Options = <mlrl.common.options.Options object>)

Bases: LogSink

Allows writing evaluation results to the console.

write_output(meta_data: MetaData, data_split: DataSplit, data_type: DataType | None, prediction_scope: PredictionScope | None, output_data, **kwargs)

See mlrl.testbed.output_writer.OutputWriter.Sink.write_output()

class mlrl.testbed.evaluation.ProbabilityEvaluationWriter(sinks: List[Sink])

Bases: ScoreEvaluationWriter

Evaluates the quality of probability estimates provided by a single- or multi-label classifier according to commonly used regression and ranking measures.

class mlrl.testbed.evaluation.ScoreEvaluationWriter(sinks: List[Sink])

Bases: EvaluationWriter

Evaluates the quality of regression scores provided by a single- or multi-output regressor according to commonly used regression and ranking measures.
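
If several kinds of predictions are evaluated, a separate writer might be set up for each of them alongside the BinaryEvaluationWriter sketched at the top of this page. This is only a sketch of the constructors documented above, not a statement about how the surrounding experiment wires the writers together.

    from mlrl.testbed.evaluation import (EvaluationWriter, ProbabilityEvaluationWriter,
                                         ScoreEvaluationWriter)

    sinks = [EvaluationWriter.LogSink()]

    probability_writer = ProbabilityEvaluationWriter(sinks)  # probability estimates
    score_writer = ScoreEvaluationWriter(sinks)              # regression scores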