mlrl.testbed.data_characteristics module

Author: Michael Rapp (michael.rapp.ml@gmail.com)

Provides classes for printing certain characteristics of multi-label data sets. The characteristics can be written to one or several outputs, e.g., to the console or to a file.

class mlrl.testbed.data_characteristics.DataCharacteristicsWriter(sinks: List[Sink])

Bases: OutputWriter

Writes the characteristics of a data set to one or several sinks.

class CsvSink(output_dir: str, options: mlrl.common.options.Options = <mlrl.common.options.Options object>)

Bases: CsvSink

Writes the characteristics of a data set to a CSV file.

class DataCharacteristics(feature_characteristics: FeatureCharacteristics, label_characteristics: LabelCharacteristics)

Bases: Formattable, Tabularizable

Stores characteristics of a feature matrix and a label matrix.

format(options: Options, **_) -> str

See mlrl.testbed.output_writer.Formattable.format()

tabularize(options: Options, **_) -> List[Dict[str, str]] | None

See mlrl.testbed.output_writer.Tabularizable.tabularize()
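The two methods above produce a textual and a tabular view of the same values. The following is a minimal sketch of that pairing, not the library's implementation: the class name CharacteristicsSketch, the plain dict passed as options, and the output layout are all hypothetical and chosen only for illustration.

```python
from typing import Dict, List, Optional


class CharacteristicsSketch:
    """Hypothetical stand-in; this is NOT the library's DataCharacteristics class."""

    def __init__(self, values: Dict[str, float]):
        self.values = values

    def format(self, options: Optional[dict] = None, **_) -> str:
        # One "name: value" line per characteristic, e.g. for a console sink.
        return '\n'.join(f'{name}: {value}' for name, value in self.values.items())

    def tabularize(self, options: Optional[dict] = None, **_) -> Optional[List[Dict[str, str]]]:
        # A single row that maps column names to string values, e.g. for a CSV sink.
        if not self.values:
            return None
        return [{name: str(value) for name, value in self.values.items()}]


sketch = CharacteristicsSketch({'Features': 4, 'Feature density': 0.375})
text = sketch.format()
rows = sketch.tabularize()
print(text)
print(rows)
```

In this sketch, a console-style sink would consume the string returned by format(), whereas a CSV-style sink would consume the rows returned by tabularize().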

class LogSink(options: mlrl.common.options.Options = <mlrl.common.options.Options object>)

Bases: LogSink

Writes the characteristics of a data set to the console.

class mlrl.testbed.data_characteristics.FeatureCharacteristics(meta_data: MetaData, x)

Bases: object

Stores characteristics of a feature matrix.

property feature_density

The feature density.

property feature_sparsity

The feature sparsity.
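This reference does not state how density and sparsity are computed. A minimal sketch, assuming the common definition in which density is the fraction of non-zero entries in the feature matrix and sparsity is its complement. The helper names density and sparsity are hypothetical, and a plain list of rows stands in for the matrix x.

```python
from typing import Sequence


def density(matrix: Sequence[Sequence[float]]) -> float:
    """Fraction of non-zero entries in a dense matrix (assumed definition)."""
    num_elements = sum(len(row) for row in matrix)
    if num_elements == 0:
        # Avoid a division by zero for empty matrices.
        return 0.0
    num_non_zero = sum(1 for row in matrix for value in row if value != 0)
    return num_non_zero / num_elements


def sparsity(matrix: Sequence[Sequence[float]]) -> float:
    """Complement of the density (assumed definition)."""
    return 1.0 - density(matrix)


# Two examples with four features each; three of the eight entries are non-zero.
x = [[0.0, 1.5, 0.0, 0.0],
     [2.0, 0.0, 0.0, 3.0]]
print(density(x))   # 0.375
print(sparsity(x))  # 0.625
```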

property num_features

The total number of features.

property num_nominal_features

The total number of nominal features.

property num_numerical_features

The total number of numerical features.

property num_ordinal_features

The total number of ordinal features.
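The four counts above can be illustrated by tallying per-feature types. This is a sketch under assumptions: the FeatureType enum and the feature_types list are hypothetical stand-ins for whatever MetaData actually stores, and the sketch assumes every feature is exactly one of numerical, ordinal, or nominal.

```python
from collections import Counter
from enum import Enum


class FeatureType(Enum):
    """Hypothetical feature types; not taken from the library."""
    NUMERICAL = 'numerical'
    ORDINAL = 'ordinal'
    NOMINAL = 'nominal'


# Hypothetical meta-data: one type per feature (column) of the feature matrix.
feature_types = [
    FeatureType.NUMERICAL,
    FeatureType.NOMINAL,
    FeatureType.NUMERICAL,
    FeatureType.ORDINAL,
]

counts = Counter(feature_types)
num_features = len(feature_types)
num_numerical_features = counts[FeatureType.NUMERICAL]
num_ordinal_features = counts[FeatureType.ORDINAL]
num_nominal_features = counts[FeatureType.NOMINAL]

print(num_features, num_numerical_features, num_ordinal_features, num_nominal_features)
```

Under the stated assumption, the three per-type counts add up to the total number of features.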