mlrl.testbed.data module
Author: Michael Rapp (michael.rapp.ml@gmail.com)
Provides functions for handling multi-label data.
- class mlrl.testbed.data.Attribute(attribute_name: str, attribute_type: AttributeType, nominal_values: List[str] | None = None)
Bases: object
Represents a numerical or nominal attribute that is contained by a data set.
- class mlrl.testbed.data.AttributeType(value, names=None, *values, module=None, qualname=None, type=None, start=1, boundary=None)
Bases: Enum
All supported types of attributes.
- NOMINAL = 3
- NUMERICAL = 1
- ORDINAL = 2
- class mlrl.testbed.data.Label(name: str)
Bases: Attribute
Represents a label that is contained by a data set.
- class mlrl.testbed.data.MetaData(attributes: List[Attribute], labels: List[Attribute], labels_at_start: bool)
Bases: object
Stores the meta-data of a multi-label data set.
- get_attribute_indices(attribute_types: Set[AttributeType] | None = None) → List[int]
Returns the indices (in ascending order) of all attributes whose type is contained in a given set of types.
- Parameters:
attribute_types – A set that contains the types of the attributes whose indices should be returned or None, if all indices should be returned
- Returns:
A list that contains the indices of all attributes of the given types
- get_num_attributes(attribute_types: Set[AttributeType] | None = None) → int
Returns the number of attributes whose type is contained in a given set of types.
- Parameters:
attribute_types – A set that contains the types of the attributes to be counted or None, if all attributes should be counted
- Returns:
The number of attributes of the given types
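The filtering behaviour described for get_attribute_indices and get_num_attributes can be illustrated with a minimal, standalone sketch. The classes and the free function below are hypothetical stand-ins that only mirror the signatures listed in this reference; they are not the library's implementation.

```python
from enum import Enum
from typing import List, NamedTuple, Optional, Set


class AttributeType(Enum):
    # Stand-in enum; the values mirror the ones listed in this reference.
    NUMERICAL = 1
    ORDINAL = 2
    NOMINAL = 3


class Attribute(NamedTuple):
    # Stand-in for the documented Attribute class (assumption, not the real one).
    attribute_name: str
    attribute_type: AttributeType
    nominal_values: Optional[List[str]] = None


def get_attribute_indices(attributes: List[Attribute],
                          attribute_types: Optional[Set[AttributeType]] = None) -> List[int]:
    # None selects every attribute; otherwise keep attributes whose type is in the set.
    # enumerate() yields indices in ascending order.
    return [
        index for index, attribute in enumerate(attributes)
        if attribute_types is None or attribute.attribute_type in attribute_types
    ]


attributes = [
    Attribute('age', AttributeType.NUMERICAL),
    Attribute('color', AttributeType.NOMINAL, ['red', 'green']),
    Attribute('height', AttributeType.NUMERICAL),
]

all_indices = get_attribute_indices(attributes)
nominal_indices = get_attribute_indices(attributes, {AttributeType.NOMINAL})
# Counting is just the length of the filtered index list.
num_numerical = len(get_attribute_indices(attributes, {AttributeType.NUMERICAL}))
```

In this sketch, counting attributes reuses the index filter, which keeps both operations consistent with each other.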
- mlrl.testbed.data.load_data_set(data_dir: str, arff_file_name: str, meta_data: MetaData, feature_dtype=numpy.float32, label_dtype=numpy.uint8) → Tuple[lil_matrix, lil_matrix]
Loads a multi-label data set from an ARFF file given its meta-data.
- Parameters:
data_dir – The path of the directory that contains the ARFF file
arff_file_name – The name of the ARFF file (including the suffix)
meta_data – The meta-data
feature_dtype – The requested data type of the feature matrix
label_dtype – The requested data type of the label matrix
- Returns:
A scipy.sparse.lil_matrix of type feature_dtype, shape (num_examples, num_features), representing the feature values of the examples, as well as a scipy.sparse.lil_matrix of type label_dtype, shape (num_examples, num_labels), representing the corresponding label vectors
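One plausible reading of why the meta-data is needed here is that labels_at_start tells the loader whether the label columns precede or follow the feature columns in the ARFF file. The sketch below illustrates that column split on plain Python lists. The helper split_features_and_labels is hypothetical, and the interpretation of labels_at_start is an assumption that this reference does not confirm.

```python
from typing import List, Sequence, Tuple


def split_features_and_labels(rows: Sequence[Sequence[float]], num_labels: int,
                              labels_at_start: bool) -> Tuple[List[List[float]], List[List[float]]]:
    """Splits each row into a feature part and a label part (hypothetical helper)."""
    features, labels = [], []
    for row in rows:
        if labels_at_start:
            # Assumed layout: label columns first, feature columns afterwards.
            labels.append(list(row[:num_labels]))
            features.append(list(row[num_labels:]))
        else:
            # Assumed layout: feature columns first, label columns at the end.
            split = len(row) - num_labels
            features.append(list(row[:split]))
            labels.append(list(row[split:]))
    return features, labels


# Two examples with two features followed by two labels each.
rows = [[0.5, 1.2, 1, 0], [0.1, 3.4, 0, 1]]
x, y = split_features_and_labels(rows, num_labels=2, labels_at_start=False)
```

The documented function returns scipy.sparse.lil_matrix objects rather than nested lists; plain lists are used here only to keep the sketch free of dependencies.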
- mlrl.testbed.data.load_data_set_and_meta_data(data_dir: str, arff_file_name: str, xml_file_name: str, feature_dtype=numpy.float32, label_dtype=numpy.uint8) → Tuple[lil_matrix, lil_matrix, MetaData]
Loads a multi-label data set from an ARFF file and the corresponding Mulan XML file.
- Parameters:
data_dir – The path of the directory that contains the files
arff_file_name – The name of the ARFF file (including the suffix)
xml_file_name – The name of the XML file (including the suffix)
feature_dtype – The requested data type of the feature matrix
label_dtype – The requested data type of the label matrix
- Returns:
A scipy.sparse.lil_matrix of type feature_dtype, shape (num_examples, num_features), representing the feature values of the examples, a scipy.sparse.lil_matrix of type label_dtype, shape (num_examples, num_labels), representing the corresponding label vectors, as well as the data set’s meta-data
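For orientation, a Mulan XML file conventionally lists the names of the labels so that they can be told apart from the features in the ARFF file. The sketch below reads label names from such a document with the standard library. The sample content and label names are made up for illustration, and the code is not taken from this module.

```python
import xml.etree.ElementTree as ET

# Namespace conventionally used by Mulan label files.
MULAN_NAMESPACE = 'http://mulan.sourceforge.net/labels'

# Made-up sample content with two labels.
xml_content = """<labels xmlns="http://mulan.sourceforge.net/labels">
  <label name="beach"></label>
  <label name="sunset"></label>
</labels>
"""

root = ET.fromstring(xml_content)

# Elements in a default namespace must be addressed as '{namespace}tag'.
label_names = [element.get('name') for element in root.findall(f'{{{MULAN_NAMESPACE}}}label')]
```

Any ARFF attribute whose name appears in this list would then be treated as a label, and the remaining attributes as features.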
- mlrl.testbed.data.one_hot_encode(x, y, meta_data: MetaData, encoder=None)
One-hot encodes the nominal attributes contained in a data set, if any.
If the given feature matrix is sparse, it is converted into a dense matrix. An updated variant of the given meta-data, from which the attributes have been removed, is also returned, because the original attributes become invalid once one-hot encoding has been applied.
- Parameters:
x – A np.ndarray or scipy.sparse matrix, shape (num_examples, num_features), representing the features of the examples in the data set
y – A np.ndarray or scipy.sparse matrix, shape (num_examples, num_labels), representing the labels of the examples in the data set
meta_data – The meta-data of the data set
encoder – The ColumnTransformer to be used or None, if a new encoder should be created
- Returns:
A np.ndarray, shape (num_examples, num_encoded_features), representing the encoded features of the given examples, the encoder that has been used, as well as the updated meta-data
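To show why the original attribute list no longer matches the encoded matrix, here is a hand-rolled, dependency-free sketch of one-hot encoding selected columns. It is not the ColumnTransformer-based approach that the encoder parameter refers to. The helper name and the output column order (encoded nominal columns first, remaining columns afterwards) are assumptions of this sketch.

```python
from typing import Dict, List, Sequence


def one_hot_encode_columns(rows: Sequence[Sequence], nominal_indices: Sequence[int]) -> List[List[float]]:
    """One-hot encodes the given columns and passes the others through (hypothetical helper)."""
    # Collect the sorted distinct values observed in each nominal column.
    categories: Dict[int, List] = {
        index: sorted({row[index] for row in rows}) for index in nominal_indices
    }
    encoded_rows = []
    for row in rows:
        encoded: List[float] = []
        # One binary column per distinct value of each nominal column.
        for index in nominal_indices:
            encoded.extend(1.0 if row[index] == value else 0.0 for value in categories[index])
        # All remaining columns are appended unchanged.
        encoded.extend(float(value) for i, value in enumerate(row) if i not in categories)
        encoded_rows.append(encoded)
    return encoded_rows


# Column 0 is nominal with the values 'green' and 'red'; column 1 is numerical.
rows = [['red', 1.5], ['green', 2.5], ['red', 3.5]]
encoded = one_hot_encode_columns(rows, nominal_indices=[0])
```

The two input columns become three output columns, so a per-column attribute description of the input can no longer be applied to the result.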
- mlrl.testbed.data.save_arff_file(output_dir: str, arff_file_name: str, x: ndarray, y: ndarray, meta_data: MetaData)
Saves a multi-label data set to an ARFF file.
- Parameters:
output_dir – The path of the directory where the ARFF file should be saved
arff_file_name – The name of the ARFF file (including the suffix)
x – A np.ndarray or scipy.sparse matrix, shape (num_examples, num_features), that stores the features of the examples that are contained in the data set
y – A np.ndarray or scipy.sparse matrix, shape (num_examples, num_labels), that stores the labels of the examples that are contained in the data set
meta_data – The meta-data of the data set that should be saved
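As a reminder of what a dense ARFF file looks like, the following sketch formats a small data set as ARFF text using only the standard library. The AttributeSpec alias and the format_arff helper are hypothetical and independent of this module. Declaring the label as a nominal attribute with the values 0 and 1 is a common convention for multi-label ARFF files and is assumed here.

```python
from typing import List, Optional, Sequence, Tuple

# (name, nominal values), where None marks a numeric attribute (hypothetical stand-in).
AttributeSpec = Tuple[str, Optional[List[str]]]


def format_arff(relation: str, attributes: Sequence[AttributeSpec], rows: Sequence[Sequence]) -> str:
    """Formats a dense data set as ARFF text (hypothetical helper)."""
    lines = [f'@relation {relation}', '']
    for name, nominal_values in attributes:
        if nominal_values is None:
            lines.append(f'@attribute {name} numeric')
        else:
            # Nominal attributes enumerate their allowed values in curly braces.
            values = ','.join(nominal_values)
            lines.append(f'@attribute {name} {{{values}}}')
    lines += ['', '@data']
    # One comma-separated line per example.
    lines += [','.join(str(value) for value in row) for row in rows]
    return '\n'.join(lines) + '\n'


attributes = [('feature1', None), ('label1', ['0', '1'])]
rows = [[0.5, 1], [1.5, 0]]
arff_text = format_arff('example', attributes, rows)
```

Names that contain spaces or special characters would additionally need quoting, which this sketch leaves out.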
- mlrl.testbed.data.save_data_set(output_dir: str, arff_file_name: str, x: ndarray, y: ndarray) → MetaData
Saves a multi-label data set to an ARFF file. All attributes in the data set are considered to be numerical.
- Parameters:
output_dir – The path of the directory where the ARFF file should be saved
arff_file_name – The name of the ARFF file (including the suffix)
x – A np.ndarray or scipy.sparse matrix, shape (num_examples, num_features), that stores the features of the examples that are contained in the data set
y – A np.ndarray or scipy.sparse matrix, shape (num_examples, num_labels), that stores the labels of the examples that are contained in the data set
- Returns:
The meta-data of the data set that has been saved
- mlrl.testbed.data.save_data_set_and_meta_data(output_dir: str, arff_file_name: str, xml_file_name: str, x: ndarray, y: ndarray) → MetaData
Saves a multi-label data set to an ARFF file and its meta-data to an XML file. All attributes in the data set are considered to be numerical.
- Parameters:
output_dir – The path of the directory where the ARFF file and the XML file should be saved
arff_file_name – The name of the ARFF file (including the suffix)
xml_file_name – The name of the XML file (including the suffix)
x – An array of type float, shape (num_examples, num_features), representing the features of the examples that are contained in the data set
y – An array of type float, shape (num_examples, num_labels), representing the label vectors of the examples that are contained in the data set
- Returns:
The meta-data of the data set that has been saved
- mlrl.testbed.data.save_meta_data(output_dir: str, xml_file_name: str, meta_data: MetaData)
Saves the meta-data of a multi-label data set to an XML file.
- Parameters:
output_dir – The path of the directory where the XML file should be saved
xml_file_name – The name of the XML file (including the suffix)
meta_data – The meta-data of the data set
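Assuming the XML file follows the Mulan convention of listing label names, the content written for a data set might be produced along the lines of the sketch below. The helper format_label_xml is hypothetical, and the exact output of save_meta_data is not specified in this reference.

```python
import xml.etree.ElementTree as ET
from typing import Sequence

# Namespace conventionally used by Mulan label files.
MULAN_NAMESPACE = 'http://mulan.sourceforge.net/labels'


def format_label_xml(label_names: Sequence[str]) -> str:
    """Builds a Mulan-style XML document that lists the given label names (hypothetical helper)."""
    # Declare the default namespace as a plain attribute on the root element.
    root = ET.Element('labels', {'xmlns': MULAN_NAMESPACE})
    for name in label_names:
        ET.SubElement(root, 'label', {'name': name})
    return ET.tostring(root, encoding='unicode')


xml_text = format_label_xml(['beach', 'sunset'])

# Reading the document back yields the same label names in the same order.
parsed_names = [
    element.get('name')
    for element in ET.fromstring(xml_text).findall(f'{{{MULAN_NAMESPACE}}}label')
]
```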