mlrl.testbed_sklearn.experiments.dataset module
Author: Michael Rapp (michael.rapp.ml@gmail.com)
Provides classes for representing tabular datasets.
- class mlrl.testbed_sklearn.experiments.dataset.Attribute(name: str, attribute_type: AttributeType, nominal_values: list[str] | None = None)
Bases: object
An attribute, e.g., a feature, a ground truth label, or a regression score, that is contained by a dataset.
- Attributes:
name: The name of the attribute
attribute_type: The type of the attribute
nominal_values: A list that contains the possible values if the attribute is nominal
- attribute_type: AttributeType
- class mlrl.testbed_sklearn.experiments.dataset.AttributeType(*values)
Bases: Enum
All supported types of attributes.
- NOMINAL = 3
- NUMERICAL = 1
- ORDINAL = 2
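The constructor signature and enum values documented above can be illustrated with a minimal sketch. The two classes below are stand-ins written for this example only; they mirror the documented names and values but are not the library's implementation, and the attribute names 'age' and 'color' are made up.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class AttributeType(Enum):
    """Stand-in that mirrors the documented enum values (assumption)."""
    NUMERICAL = 1
    ORDINAL = 2
    NOMINAL = 3


@dataclass
class Attribute:
    """Stand-in that mirrors the documented constructor signature (assumption)."""
    name: str
    attribute_type: AttributeType
    nominal_values: Optional[list] = None


# A numerical attribute does not need a list of nominal values.
age = Attribute('age', AttributeType.NUMERICAL)

# A nominal attribute lists the values it may take.
color = Attribute('color', AttributeType.NOMINAL, nominal_values=['red', 'green', 'blue'])
```

With the real package installed, the same two constructor calls should work after importing `Attribute` and `AttributeType` from `mlrl.testbed_sklearn.experiments.dataset`, provided the signature shown above is current.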
- class mlrl.testbed_sklearn.experiments.dataset.TabularDataset(*args, **kwargs)
Bases: Any
A tabular dataset consisting of two matrices x and y, storing the features of examples and their respective ground truth.
- Attributes:
x: A lil_array, shape (num_examples, num_features), that stores the features of examples
y: A lil_array, shape (num_examples, num_outputs), that stores the ground truth of examples
features: A list that contains all features in the dataset
outputs: A list that contains all outputs in the dataset
- enforce_dense_features() → TabularDataset
Creates and returns a copy of this dataset, where the feature values have been converted into a dense format.
- Returns:
The dataset that has been created
- enforce_dense_outputs() → TabularDataset
Creates and returns a copy of this dataset, where the ground truth has been converted into a dense format.
- Returns:
The dataset that has been created
- get_feature_indices(*feature_types: AttributeType) → list[int]
Returns a list that contains the indices of all features whose type is one of the given types, in ascending order. If no types are given, all indices are returned.
- Parameters:
feature_types – The types of the features whose indices should be returned
- Returns:
A list that contains the indices of all features of the given types
- get_num_features(*feature_types: AttributeType) → int
Returns the number of features whose type is one of the given types. If no types are given, all features are counted.
- Parameters:
feature_types – The types of the features to be counted
- Returns:
The number of features of the given types
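The filtering behavior described for get_feature_indices and get_num_features can be sketched with plain Python. The functions below are hypothetical free-standing stand-ins that take the feature types as an explicit list; the real methods operate on a TabularDataset and may be implemented differently.

```python
from enum import Enum


class AttributeType(Enum):
    """Stand-in that mirrors the documented enum values (assumption)."""
    NUMERICAL = 1
    ORDINAL = 2
    NOMINAL = 3


def get_feature_indices(types_per_feature, *feature_types):
    """Return the ascending indices of features whose type is among `feature_types`.

    If no types are given, every index is returned.
    """
    wanted = set(feature_types)
    return [
        index for index, feature_type in enumerate(types_per_feature)
        if not wanted or feature_type in wanted
    ]


def get_num_features(types_per_feature, *feature_types):
    """Count the features whose type is among `feature_types` (all if none are given)."""
    return len(get_feature_indices(types_per_feature, *feature_types))


# Hypothetical dataset with four features of mixed types.
types_per_feature = [
    AttributeType.NUMERICAL,
    AttributeType.NOMINAL,
    AttributeType.NUMERICAL,
    AttributeType.ORDINAL,
]

categorical = get_feature_indices(types_per_feature, AttributeType.NOMINAL, AttributeType.ORDINAL)
everything = get_feature_indices(types_per_feature)
num_numerical = get_num_features(types_per_feature, AttributeType.NUMERICAL)
```

Here `categorical` is `[1, 3]`, `everything` is `[0, 1, 2, 3]`, and `num_numerical` is `2`. Because the indices come from a single pass over the features, the ascending order follows without an explicit sort.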
- property has_sparse_features: bool
True, if feature values in the dataset are sparse, False otherwise.