mlrl.testbed_sklearn.experiments.dataset module

Author: Michael Rapp (michael.rapp.ml@gmail.com)

Provides classes for representing tabular datasets.

class mlrl.testbed_sklearn.experiments.dataset.Attribute(name: str, attribute_type: AttributeType, nominal_values: list[str] | None = None)

Bases: object

An attribute contained in a dataset, e.g., a feature, a ground truth label, or a regression score.

Attributes:

name: The name of the attribute
attribute_type: The type of the attribute
nominal_values: A list that contains the possible values in case of a nominal feature

attribute_type: AttributeType
name: str
nominal_values: list[str] | None = None
class mlrl.testbed_sklearn.experiments.dataset.AttributeType(*values)

Bases: Enum

All supported types of attributes.

NOMINAL = 3
NUMERICAL = 1
ORDINAL = 2
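The following minimal sketch mirrors `Attribute` and `AttributeType` with standard-library stand-ins, so it runs without the package installed. Modelling `Attribute` as a dataclass is an assumption based on its field listing, and the attribute names (`age`, `color`) are made up for illustration.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class AttributeType(Enum):
    """Local stand-in with the documented enum values."""
    NUMERICAL = 1
    ORDINAL = 2
    NOMINAL = 3


@dataclass
class Attribute:
    """Simplified stand-in for the documented Attribute class (assumed to be a dataclass)."""
    name: str
    attribute_type: AttributeType
    nominal_values: Optional[list[str]] = None


# A numerical feature needs no nominal values...
age = Attribute('age', AttributeType.NUMERICAL)

# ...whereas a nominal feature lists its possible values.
color = Attribute('color', AttributeType.NOMINAL, nominal_values=['red', 'green', 'blue'])

print(age)
print(color.nominal_values)
```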
class mlrl.testbed_sklearn.experiments.dataset.TabularDataset(*args, **kwargs)

Bases: Any

A tabular dataset consisting of two matrices, x and y, that store the features of the examples and their ground truth, respectively.

Attributes:

x: A lil_array, shape (num_examples, num_features), that stores the features of examples
y: A lil_array, shape (num_examples, num_outputs), that stores the ground truth of examples
features: A list that contains all features in the dataset
outputs: A list that contains all outputs in the dataset
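A minimal sketch of the two matrices, built with SciPy's `lil_array` directly rather than through the `TabularDataset` constructor, whose arguments are not documented here. The sizes and values are made-up example data; the point is that `x` and `y` share their number of rows, one per example, while `y` has one column per output.

```python
import numpy as np
from scipy.sparse import lil_array

num_examples, num_features, num_outputs = 3, 4, 2

# Feature matrix: one row per example, one column per feature.
x = lil_array((num_examples, num_features), dtype=np.float64)
x[0, 1] = 0.5
x[2, 3] = 1.0

# Ground-truth matrix: one row per example, one column per output (e.g., a label).
y = lil_array((num_examples, num_outputs), dtype=np.uint8)
y[0, 0] = 1
y[1, 1] = 1

# Both matrices must agree on the number of examples (rows).
assert x.shape[0] == y.shape[0]
print(x.shape, y.shape)
```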

enforce_dense_features() → TabularDataset

Creates and returns a copy of this dataset, where the feature values have been converted into a dense format.

Returns:

The dataset that has been created

enforce_dense_outputs() → TabularDataset

Creates and returns a copy of this dataset, where the ground truth has been converted into a dense format.

Returns:

The dataset that has been created
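The sketch below illustrates the copy-then-densify behaviour described for `enforce_dense_features()` and `enforce_dense_outputs()` using a hypothetical `SimpleDataset` class. It assumes that "dense format" means a NumPy array obtained via SciPy's `toarray()`; the library may represent dense data differently.

```python
import copy
from dataclasses import dataclass
from typing import Any

import numpy as np
from scipy.sparse import issparse, lil_array


@dataclass
class SimpleDataset:
    """Hypothetical, stripped-down stand-in for TabularDataset."""
    x: Any  # features, sparse or dense
    y: Any  # ground truth, sparse or dense

    def enforce_dense_features(self) -> 'SimpleDataset':
        # Shallow-copy the dataset so the original keeps its sparse features.
        result = copy.copy(self)
        result.x = self.x.toarray() if issparse(self.x) else np.array(self.x)
        return result

    def enforce_dense_outputs(self) -> 'SimpleDataset':
        # Same idea for the ground truth matrix.
        result = copy.copy(self)
        result.y = self.y.toarray() if issparse(self.y) else np.array(self.y)
        return result


x = lil_array((2, 3))
x[0, 2] = 1.0
y = lil_array((2, 1))
y[1, 0] = 1

dataset = SimpleDataset(x, y)
dense = dataset.enforce_dense_features()
print(type(dense.x).__name__, issparse(dataset.x))
```

Returning a copy rather than converting in place keeps the original dataset usable by code that expects sparse input.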

features: list[Attribute]
get_feature_indices(*feature_types: AttributeType) → list[int]

Returns the indices of all features whose type is one of the given types, in ascending order. If no types are given, the indices of all features are returned.

Parameters:

feature_types – The types of the features whose indices should be returned

Returns:

A list that contains the indices of all features of the given types
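The selection rule, including the fallback when no types are passed, can be sketched as a plain function. `get_feature_indices` here is a hypothetical free function that takes a list of feature types rather than a dataset, and `AttributeType` is a local stand-in with the documented values.

```python
from enum import Enum


class AttributeType(Enum):
    NUMERICAL = 1
    ORDINAL = 2
    NOMINAL = 3


def get_feature_indices(types_of_features, *feature_types):
    """Hypothetical helper: indices of features whose type is among feature_types.

    If no types are given, the indices of all features are returned.
    enumerate() visits features in order, so the result is ascending.
    """
    return [
        index for index, feature_type in enumerate(types_of_features)
        if not feature_types or feature_type in feature_types
    ]


types = [AttributeType.NUMERICAL, AttributeType.NOMINAL, AttributeType.ORDINAL, AttributeType.NOMINAL]
print(get_feature_indices(types, AttributeType.NOMINAL))  # [1, 3]
print(get_feature_indices(types))                         # [0, 1, 2, 3]
```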

get_num_features(*feature_types: AttributeType) → int

Returns the number of features whose type is one of the given types. If no types are given, all features are counted.

Parameters:

feature_types – The types of the features to be counted

Returns:

The number of features of the given types
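Counting follows the same selection rule. This is again a hypothetical free-function sketch over a list of feature types, with a local `AttributeType` stand-in, not the library's implementation.

```python
from enum import Enum


class AttributeType(Enum):
    NUMERICAL = 1
    ORDINAL = 2
    NOMINAL = 3


def get_num_features(types_of_features, *feature_types):
    """Hypothetical helper: number of features whose type is among feature_types."""
    if not feature_types:
        # No types given: every feature is counted.
        return len(types_of_features)
    return sum(1 for feature_type in types_of_features if feature_type in feature_types)


types = [AttributeType.NUMERICAL, AttributeType.NOMINAL, AttributeType.ORDINAL, AttributeType.NOMINAL]
print(get_num_features(types, AttributeType.NOMINAL, AttributeType.ORDINAL))  # 3
print(get_num_features(types))                                                 # 4
```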

property has_sparse_features: bool

True, if feature values in the dataset are sparse, False otherwise.

property has_sparse_outputs: bool

True, if the ground truth in the dataset is sparse, False otherwise.
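Assuming that "sparse" refers to the storage format of `x` and `y`, both properties could be answered with SciPy's `issparse()`. The helper `has_sparse` below is hypothetical and only illustrates that check.

```python
import numpy as np
from scipy.sparse import issparse, lil_array


def has_sparse(matrix) -> bool:
    # Hypothetical check: a matrix counts as sparse if it is a SciPy sparse container.
    return issparse(matrix)


x_sparse = lil_array((2, 2))
x_dense = np.zeros((2, 2))
print(has_sparse(x_sparse), has_sparse(x_dense))  # True False
```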

property num_examples: int

The number of examples in the dataset.

property num_features: int

The number of features in the dataset.

property num_outputs: int

The number of outputs in the dataset.
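Assuming the counts are derived from the matrix shapes (rows of `x` for examples, columns of `x` for features, columns of `y` for outputs), they can be read off as follows. The library may instead use the lengths of `features` and `outputs`, which would give the same values for a consistent dataset; the sizes below are made up.

```python
from scipy.sparse import lil_array

x = lil_array((5, 3))  # 5 examples, 3 features
y = lil_array((5, 2))  # 5 examples, 2 outputs

num_examples = x.shape[0]  # rows of the feature matrix
num_features = x.shape[1]  # columns of the feature matrix
num_outputs = y.shape[1]   # columns of the ground-truth matrix
print(num_examples, num_features, num_outputs)  # 5 3 2
```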

outputs: list[Attribute]
x: lil_array
y: lil_array