mlrl.testbed.evaluation module

Author: Michael Rapp (michael.rapp.ml@gmail.com)

Provides classes for evaluating the predictions or rankings provided by a multi-label learner according to different measures. The evaluation results can be written to one or several outputs, e.g., to the console or to a file.
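As a rough sketch of how the classes documented below fit together, one or several sinks can be attached to a writer. The construction shown here only uses signatures from this page; accessing the sink classes through EvaluationWriter follows from the layout of this page, the output directory name is illustrative, and the experiment code that actually invokes the writer is assumed and not shown.

    from mlrl.testbed.evaluation import BinaryEvaluationWriter, EvaluationWriter

    # Sinks decide where evaluation results end up: the console and CSV files
    # in the (hypothetical) directory 'results', respectively.
    sinks = [
        EvaluationWriter.LogSink(),
        EvaluationWriter.CsvSink(output_dir='results'),
    ]

    # A writer that evaluates binary predictions according to bipartition
    # measures and forwards the results to all of the above sinks.
    writer = BinaryEvaluationWriter(sinks)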

class mlrl.testbed.evaluation.BinaryEvaluationWriter(sinks: List[Sink])

Bases: EvaluationWriter

Evaluates the quality of binary predictions provided by a single- or multi-label classifier according to commonly used bipartition measures.

class mlrl.testbed.evaluation.EvaluationFunction(option: str, name: str, evaluation_function, percentage: bool = True, **kwargs)

Bases: Formatter

An evaluation function.

evaluate(ground_truth, predictions) float

Applies the evaluation function to the given predictions and ground truth labels.

Parameters:
  • ground_truth – The ground truth

  • predictions – The predictions

Returns:

An evaluation score
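
A minimal sketch of wrapping an existing metric as an evaluation function, assuming scikit-learn's hamming_loss as the wrapped callable; the option key 'hamming_loss' and the display name 'Hamming Loss' are illustrative values, not constants defined by this module.

    import numpy as np
    from sklearn.metrics import hamming_loss

    from mlrl.testbed.evaluation import EvaluationFunction

    # Wrap the metric together with an option key and a human-readable name
    measure = EvaluationFunction('hamming_loss', 'Hamming Loss', hamming_loss)

    # Multi-label ground truth and binary predictions in indicator format
    ground_truth = np.array([[1, 0, 1], [0, 1, 0]])
    predictions = np.array([[1, 0, 0], [0, 1, 0]])

    # Applies the wrapped function to the given predictions and ground truth
    score = measure.evaluate(ground_truth, predictions)  # 1 of 6 labels differs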

class mlrl.testbed.evaluation.EvaluationWriter(sinks: List[Sink])

Bases: OutputWriter, ABC

An abstract base class for all classes that evaluate the predictions provided by a learner and allow the evaluation results to be written to one or several sinks.

class CsvSink(output_dir: str, options: ~mlrl.common.options.Options = <mlrl.common.options.Options object>)

Bases: CsvSink

Allows writing evaluation results to CSV files.

write_output(meta_data: MetaData, data_split: DataSplit, data_type: DataType | None, prediction_scope: PredictionScope | None, output_data, **kwargs)

See mlrl.testbed.output_writer.OutputWriter.Sink.write_output()

class EvaluationResult

Bases: Formattable, Tabularizable

Stores the evaluation results according to different measures.

avg(measure: Formatter, **kwargs) Tuple[str, str]

Returns the score according to a specific measure, averaged over all available folds, as well as the corresponding standard deviation.

Parameters:

measure – The measure

Returns:

A tuple consisting of textual representations of the averaged score and standard deviation

avg_dict(**kwargs) Dict[Formatter, str]

Returns a dictionary that stores, for each measure, the score averaged across all folds, as well as the corresponding standard deviation.

Returns:

A dictionary that stores textual representations of the averaged score and standard deviation for each measure

dict(fold: int | None, **kwargs) Dict[Formatter, str]

Returns a dictionary that stores the scores for a specific fold according to each measure.

Parameters:

fold – The fold the scores correspond to or None, if no cross validation is used

Returns:

A dictionary that stores textual representations of the scores for the given fold according to each measure

format(options: Options, **kwargs) str

See mlrl.testbed.output_writer.Formattable.format()

get(measure: Formatter, fold: int | None, **kwargs) str

Returns the score according to a specific measure.

Parameters:
  • measure – The measure

  • fold – The fold the score corresponds to or None, if no cross validation is used

Returns:

A textual representation of the score

put(measure: Formatter, score: float, num_folds: int, fold: int | None)

Adds a new score according to a specific measure to the evaluation result.

Parameters:
  • measure – The measure

  • score – The score according to the measure

  • num_folds – The total number of cross validation folds

  • fold – The fold the score corresponds to or None, if no cross validation is used

tabularize(options: Options, **kwargs) List[Dict[str, str]] | None

See mlrl.testbed.output_writer.Tabularizable.tabularize()
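
A sketch of how scores for individual cross-validation folds might be collected and averaged. The measure is an EvaluationFunction wrapping scikit-learn's hamming_loss (any Formatter would do), the scores are made-up values, and the assumption that EvaluationResult can be instantiated without arguments follows from the signature shown above.

    from sklearn.metrics import hamming_loss

    from mlrl.testbed.evaluation import EvaluationFunction, EvaluationWriter

    # An illustrative measure; any Formatter can be used
    measure = EvaluationFunction('hamming_loss', 'Hamming Loss', hamming_loss)
    result = EvaluationWriter.EvaluationResult()

    # Store one (made-up) score per fold of a 2-fold cross validation...
    result.put(measure, score=0.12, num_folds=2, fold=0)
    result.put(measure, score=0.18, num_folds=2, fold=1)

    # ...and retrieve textual representations of individual and averaged scores
    first_fold_score = result.get(measure, fold=0)
    avg_score, std_dev = result.avg(measure)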

KWARG_FOLD = 'fold'

class LogSink(options: ~mlrl.common.options.Options = <mlrl.common.options.Options object>)

Bases: LogSink

Allows writing evaluation results to the console.

write_output(meta_data: MetaData, data_split: DataSplit, data_type: DataType | None, prediction_scope: PredictionScope | None, output_data, **kwargs)

See mlrl.testbed.output_writer.OutputWriter.Sink.write_output()

class mlrl.testbed.evaluation.ProbabilityEvaluationWriter(sinks: List[Sink])

Bases: ScoreEvaluationWriter

Evaluates the quality of probability estimates provided by a single- or multi-label classifier according to commonly used regression and ranking measures.

class mlrl.testbed.evaluation.ScoreEvaluationWriter(sinks: List[Sink])

Bases: EvaluationWriter

Evaluates the quality of regression scores provided by a single- or multi-output regressor according to commonly used regression and ranking measures.
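
If several kinds of predictions are evaluated, a separate writer might be set up for each of them alongside the BinaryEvaluationWriter sketched at the top of this page. This is only a sketch of the constructors documented above, not a statement about how the surrounding experiment wires the writers together.

    from mlrl.testbed.evaluation import (EvaluationWriter, ProbabilityEvaluationWriter,
                                         ScoreEvaluationWriter)

    sinks = [EvaluationWriter.LogSink()]

    probability_writer = ProbabilityEvaluationWriter(sinks)  # probability estimates
    score_writer = ScoreEvaluationWriter(sinks)              # regression scores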