Contents Menu Expand Light mode Dark mode Auto light/dark, in light mode Auto light/dark, in dark mode Skip to content
BOOMER 0.15.4
BOOMER 0.15.4

Quickstart

  • Installation
  • Using the Python API
  • Using the Command Line API

User Guide

  • Foundations
    • Rule Learning Algorithms
    • Problem Definition
    • Conceptual Framework
  • The BOOMER Algorithm
    • Methodology
    • Overview of Parameters
  • The SeCo Algorithm
    • Overview of Parameters
  • Algorithmic Optimizations
    • Pre-Sorted Search
    • Exploiting Feature Sparsity
    • Nominal Features
    • Missing Feature Values
    • Histogram-based Search
    • Multi-Threading
  • mlrl-testbed
    • Introduction
    • Supported Dataset Formats
    • Performance Evaluation
    • Data Pre-Processing
    • Saving and Loading Data
    • Using Your Own Algorithms
    • Batch Mode
    • Run Mode
    • Read Mode
    • Overview of Arguments
  • References

Developer Guide

  • Project Structure
  • Building from Source
  • Generating the Documentation
  • Continuous Integration
  • Coding Standards
  • Python API Reference
    • Package mlrl-testbed-sklearn
      • mlrl.testbed_sklearn.experiments package
        • mlrl.testbed_sklearn.experiments.input package
          • mlrl.testbed_sklearn.experiments.input.dataset package
            • mlrl.testbed_sklearn.experiments.input.dataset.preprocessors package
              • mlrl.testbed_sklearn.experiments.input.dataset.preprocessors.extension module
              • mlrl.testbed_sklearn.experiments.input.dataset.preprocessors.one_hot_encoder module
            • mlrl.testbed_sklearn.experiments.input.dataset.splitters package
              • mlrl.testbed_sklearn.experiments.input.dataset.splitters.extension module
              • mlrl.testbed_sklearn.experiments.input.dataset.splitters.splitter_bipartition module
              • mlrl.testbed_sklearn.experiments.input.dataset.splitters.splitter_cross_validation module
            • mlrl.testbed_sklearn.experiments.input.dataset.extension module
          • mlrl.testbed_sklearn.experiments.input.sources package
            • mlrl.testbed_sklearn.experiments.input.sources.source_svm module
        • mlrl.testbed_sklearn.experiments.output package
          • mlrl.testbed_sklearn.experiments.output.characteristics package
            • mlrl.testbed_sklearn.experiments.output.characteristics.data package
              • mlrl.testbed_sklearn.experiments.output.characteristics.data.characteristics module
              • mlrl.testbed_sklearn.experiments.output.characteristics.data.characteristics_data module
              • mlrl.testbed_sklearn.experiments.output.characteristics.data.characteristics_prediction module
              • mlrl.testbed_sklearn.experiments.output.characteristics.data.extension module
              • mlrl.testbed_sklearn.experiments.output.characteristics.data.extension_prediction module
              • mlrl.testbed_sklearn.experiments.output.characteristics.data.matrix_feature module
              • mlrl.testbed_sklearn.experiments.output.characteristics.data.matrix_label module
              • mlrl.testbed_sklearn.experiments.output.characteristics.data.matrix_output module
              • mlrl.testbed_sklearn.experiments.output.characteristics.data.writer_data module
              • mlrl.testbed_sklearn.experiments.output.characteristics.data.writer_prediction module
          • mlrl.testbed_sklearn.experiments.output.dataset package
            • mlrl.testbed_sklearn.experiments.output.dataset.dataset module
            • mlrl.testbed_sklearn.experiments.output.dataset.dataset_ground_truth module
            • mlrl.testbed_sklearn.experiments.output.dataset.dataset_prediction module
            • mlrl.testbed_sklearn.experiments.output.dataset.extension_ground_truth module
            • mlrl.testbed_sklearn.experiments.output.dataset.extension_prediction module
            • mlrl.testbed_sklearn.experiments.output.dataset.writer_ground_truth module
            • mlrl.testbed_sklearn.experiments.output.dataset.writer_prediction module
          • mlrl.testbed_sklearn.experiments.output.evaluation package
            • mlrl.testbed_sklearn.experiments.output.evaluation.evaluation_result module
            • mlrl.testbed_sklearn.experiments.output.evaluation.extension module
            • mlrl.testbed_sklearn.experiments.output.evaluation.extractor_classification module
            • mlrl.testbed_sklearn.experiments.output.evaluation.extractor_ranking module
            • mlrl.testbed_sklearn.experiments.output.evaluation.extractor_regression module
            • mlrl.testbed_sklearn.experiments.output.evaluation.measures_classification module
            • mlrl.testbed_sklearn.experiments.output.evaluation.measures_ranking module
            • mlrl.testbed_sklearn.experiments.output.evaluation.measures_regression module
            • mlrl.testbed_sklearn.experiments.output.evaluation.writer module
          • mlrl.testbed_sklearn.experiments.output.label_vectors package
            • mlrl.testbed_sklearn.experiments.output.label_vectors.extension module
            • mlrl.testbed_sklearn.experiments.output.label_vectors.label_vector_histogram module
            • mlrl.testbed_sklearn.experiments.output.label_vectors.label_vectors module
            • mlrl.testbed_sklearn.experiments.output.label_vectors.writer module
        • mlrl.testbed_sklearn.experiments.prediction package
          • mlrl.testbed_sklearn.experiments.prediction.extension module
          • mlrl.testbed_sklearn.experiments.prediction.predictor module
          • mlrl.testbed_sklearn.experiments.prediction.predictor_global module
        • mlrl.testbed_sklearn.experiments.dataset module
        • mlrl.testbed_sklearn.experiments.experiment module
        • mlrl.testbed_sklearn.experiments.problem_domain module
      • mlrl.testbed_sklearn.runnables module
    • Package mlrl-util
      • mlrl.util.arrays module
      • mlrl.util.cli module
      • mlrl.util.format module
      • mlrl.util.options module
      • mlrl.util.validation module
      • mlrl.util.version module
    • Package mlrl-testbed-slurm
      • mlrl.testbed_slurm.arguments module
      • mlrl.testbed_slurm.extension module
      • mlrl.testbed_slurm.runner module
      • mlrl.testbed_slurm.sbatch module
    • Package mlrl-testbed
      • mlrl.testbed.experiments package
        • mlrl.testbed.experiments.input package
          • mlrl.testbed.experiments.input.dataset package
            • mlrl.testbed.experiments.input.dataset.preprocessors package
              • mlrl.testbed.experiments.input.dataset.preprocessors.preprocessor module
            • mlrl.testbed.experiments.input.dataset.splitters package
              • mlrl.testbed.experiments.input.dataset.splitters.arguments module
              • mlrl.testbed.experiments.input.dataset.splitters.splitter module
              • mlrl.testbed.experiments.input.dataset.splitters.splitter_no module
            • mlrl.testbed.experiments.input.dataset.arguments module
            • mlrl.testbed.experiments.input.dataset.dataset module
            • mlrl.testbed.experiments.input.dataset.extension module
            • mlrl.testbed.experiments.input.dataset.reader module
          • mlrl.testbed.experiments.input.meta_data package
            • mlrl.testbed.experiments.input.meta_data.meta_data module
            • mlrl.testbed.experiments.input.meta_data.reader module
          • mlrl.testbed.experiments.input.model package
            • mlrl.testbed.experiments.input.model.extension module
            • mlrl.testbed.experiments.input.model.model module
            • mlrl.testbed.experiments.input.model.reader module
          • mlrl.testbed.experiments.input.parameters package
            • mlrl.testbed.experiments.input.parameters.extension module
            • mlrl.testbed.experiments.input.parameters.parameters module
            • mlrl.testbed.experiments.input.parameters.reader module
          • mlrl.testbed.experiments.input.sources package
            • mlrl.testbed.experiments.input.sources.source module
            • mlrl.testbed.experiments.input.sources.source_csv module
            • mlrl.testbed.experiments.input.sources.source_pickle module
            • mlrl.testbed.experiments.input.sources.source_text module
            • mlrl.testbed.experiments.input.sources.source_yaml module
          • mlrl.testbed.experiments.input.arguments module
          • mlrl.testbed.experiments.input.data module
          • mlrl.testbed.experiments.input.extension module
          • mlrl.testbed.experiments.input.policies module
          • mlrl.testbed.experiments.input.reader module
        • mlrl.testbed.experiments.output package
          • mlrl.testbed.experiments.output.dataset package
            • mlrl.testbed.experiments.output.dataset.dataset module
          • mlrl.testbed.experiments.output.evaluation package
            • mlrl.testbed.experiments.output.evaluation.evaluation_result module
            • mlrl.testbed.experiments.output.evaluation.extension module
            • mlrl.testbed.experiments.output.evaluation.measurements module
            • mlrl.testbed.experiments.output.evaluation.measures module
            • mlrl.testbed.experiments.output.evaluation.writer module
          • mlrl.testbed.experiments.output.meta_data package
            • mlrl.testbed.experiments.output.meta_data.arguments module
            • mlrl.testbed.experiments.output.meta_data.extension module
            • mlrl.testbed.experiments.output.meta_data.meta_data module
            • mlrl.testbed.experiments.output.meta_data.writer module
          • mlrl.testbed.experiments.output.model package
            • mlrl.testbed.experiments.output.model.arguments module
            • mlrl.testbed.experiments.output.model.extension module
            • mlrl.testbed.experiments.output.model.model module
            • mlrl.testbed.experiments.output.model.writer module
          • mlrl.testbed.experiments.output.parameters package
            • mlrl.testbed.experiments.output.parameters.arguments module
            • mlrl.testbed.experiments.output.parameters.extension module
            • mlrl.testbed.experiments.output.parameters.parameters module
            • mlrl.testbed.experiments.output.parameters.writer module
          • mlrl.testbed.experiments.output.sinks package
            • mlrl.testbed.experiments.output.sinks.sink module
            • mlrl.testbed.experiments.output.sinks.sink_csv module
            • mlrl.testbed.experiments.output.sinks.sink_log module
            • mlrl.testbed.experiments.output.sinks.sink_pickle module
            • mlrl.testbed.experiments.output.sinks.sink_text module
            • mlrl.testbed.experiments.output.sinks.sink_yaml module
          • mlrl.testbed.experiments.output.arguments module
          • mlrl.testbed.experiments.output.data module
          • mlrl.testbed.experiments.output.extension module
          • mlrl.testbed.experiments.output.policies module
          • mlrl.testbed.experiments.output.writer module
        • mlrl.testbed.experiments.context module
        • mlrl.testbed.experiments.data module
        • mlrl.testbed.experiments.dataset module
        • mlrl.testbed.experiments.dataset_type module
        • mlrl.testbed.experiments.experiment module
        • mlrl.testbed.experiments.file_path module
        • mlrl.testbed.experiments.fold module
        • mlrl.testbed.experiments.meta_data module
        • mlrl.testbed.experiments.prediction_scope module
        • mlrl.testbed.experiments.prediction_type module
        • mlrl.testbed.experiments.problem_domain module
        • mlrl.testbed.experiments.recipe module
        • mlrl.testbed.experiments.state module
        • mlrl.testbed.experiments.table module
        • mlrl.testbed.experiments.timer module
      • mlrl.testbed.extensions package
        • mlrl.testbed.extensions.extension module
      • mlrl.testbed.modes package
        • mlrl.testbed.modes.mode module
        • mlrl.testbed.modes.mode_batch module
        • mlrl.testbed.modes.mode_read module
        • mlrl.testbed.modes.mode_run module
        • mlrl.testbed.modes.mode_single module
        • mlrl.testbed.modes.util module
      • mlrl.testbed.util package
        • mlrl.testbed.util.format module
        • mlrl.testbed.util.io module
        • mlrl.testbed.util.math module
        • mlrl.testbed.util.yml module
      • mlrl.testbed.arguments module
      • mlrl.testbed.command module
      • mlrl.testbed.main module
      • mlrl.testbed.program_info module
      • mlrl.testbed.runnables module
    • Package mlrl-common
      • mlrl.common.config package
        • mlrl.common.config.parameters module
      • mlrl.common.cython package
        • mlrl.common.cython.example_weights module
        • mlrl.common.cython.feature_binning module
        • mlrl.common.cython.feature_info module
        • mlrl.common.cython.feature_matrix module
        • mlrl.common.cython.feature_sampling module
        • mlrl.common.cython.instance_sampling module
        • mlrl.common.cython.label_matrix module
        • mlrl.common.cython.learner module
        • mlrl.common.cython.learner_classification module
        • mlrl.common.cython.learner_regression module
        • mlrl.common.cython.multi_threading module
        • mlrl.common.cython.output_matrix module
        • mlrl.common.cython.output_sampling module
        • mlrl.common.cython.output_space_info module
        • mlrl.common.cython.package_info module
        • mlrl.common.cython.partition_sampling module
        • mlrl.common.cython.post_optimization module
        • mlrl.common.cython.prediction module
        • mlrl.common.cython.probability_calibration module
        • mlrl.common.cython.regression_matrix module
        • mlrl.common.cython.rng module
        • mlrl.common.cython.rule_induction module
        • mlrl.common.cython.rule_model module
        • mlrl.common.cython.stopping_criterion module
      • mlrl.common.testbed package
        • mlrl.common.testbed.experiments package
          • mlrl.common.testbed.experiments.output package
            • mlrl.common.testbed.experiments.output.characteristics package
              • mlrl.common.testbed.experiments.output.characteristics.model package
                • mlrl.common.testbed.experiments.output.characteristics.model.characteristics module
                • mlrl.common.testbed.experiments.output.characteristics.model.extension module
                • mlrl.common.testbed.experiments.output.characteristics.model.statistics module
                • mlrl.common.testbed.experiments.output.characteristics.model.writer module
            • mlrl.common.testbed.experiments.output.label_vectors package
              • mlrl.common.testbed.experiments.output.label_vectors.extension module
            • mlrl.common.testbed.experiments.output.model_text package
              • mlrl.common.testbed.experiments.output.model_text.extension module
              • mlrl.common.testbed.experiments.output.model_text.model_text module
              • mlrl.common.testbed.experiments.output.model_text.writer module
          • mlrl.common.testbed.experiments.prediction package
            • mlrl.common.testbed.experiments.prediction.predictor_incremental module
          • mlrl.common.testbed.experiments.experiment module
        • mlrl.common.testbed.program_info module
        • mlrl.common.testbed.runnables module
      • mlrl.common.learners module
      • mlrl.common.mixins module
      • mlrl.common.package_info module
    • Package mlrl-boosting
      • mlrl.boosting.config package
        • mlrl.boosting.config.parameters module
      • mlrl.boosting.cython package
        • mlrl.boosting.cython.head_type module
        • mlrl.boosting.cython.label_binning module
        • mlrl.boosting.cython.learner module
        • mlrl.boosting.cython.learner_boomer module
        • mlrl.boosting.cython.learner_classification module
        • mlrl.boosting.cython.package_info module
        • mlrl.boosting.cython.post_processor module
        • mlrl.boosting.cython.prediction module
        • mlrl.boosting.cython.probability_calibration module
        • mlrl.boosting.cython.regularization module
      • mlrl.boosting.testbed package
        • mlrl.boosting.testbed.experiments package
          • mlrl.boosting.testbed.experiments.output package
            • mlrl.boosting.testbed.experiments.output.probability_calibration package
              • mlrl.boosting.testbed.experiments.output.probability_calibration.extension module
              • mlrl.boosting.testbed.experiments.output.probability_calibration.model_isotonic module
              • mlrl.boosting.testbed.experiments.output.probability_calibration.model_no module
              • mlrl.boosting.testbed.experiments.output.probability_calibration.writer module
        • mlrl.boosting.testbed.runnables module
      • mlrl.boosting.learners module
      • mlrl.boosting.package_info module
    • Package mlrl-testbed-arff
      • mlrl.testbed_arff.experiments package
        • mlrl.testbed_arff.experiments.input package
          • mlrl.testbed_arff.experiments.input.sources package
            • mlrl.testbed_arff.experiments.input.sources.source_arff module
        • mlrl.testbed_arff.experiments.output package
          • mlrl.testbed_arff.experiments.output.sinks package
            • mlrl.testbed_arff.experiments.output.sinks.sink_arff module
    • Package mlrl-seco
      • mlrl.seco.config package
        • mlrl.seco.config.parameters module
      • mlrl.seco.cython package
        • mlrl.seco.cython.heuristic module
        • mlrl.seco.cython.learner module
        • mlrl.seco.cython.learner_seco module
        • mlrl.seco.cython.lift_function module
        • mlrl.seco.cython.package_info module
        • mlrl.seco.cython.stopping_criterion module
      • mlrl.seco.testbed package
        • mlrl.seco.testbed.runnables module
      • mlrl.seco.learners module
      • mlrl.seco.package_info module
  • C++ API Reference
    • Library libmlrlcommon
      • File aggregation_function.hpp
      • File array.hpp
      • File body.hpp
      • File body_conjunctive.hpp
      • File body_empty.hpp
      • File condition.hpp
      • File condition_list.hpp
      • File coverage_mask.hpp
      • File default_rule.hpp
      • File dll_exports.hpp
      • File example_weights.hpp
      • File example_weights_equal.hpp
      • File example_weights_real_valued.hpp
      • File feature_binning.hpp
      • File feature_binning_equal_frequency.hpp
      • File feature_binning_equal_width.hpp
      • File feature_binning_no.hpp
      • File feature_info.hpp
      • File feature_info_equal.hpp
      • File feature_info_mixed.hpp
      • File feature_matrix.hpp
      • File feature_matrix_c_contiguous.hpp
      • File feature_matrix_column_wise.hpp
      • File feature_matrix_csc.hpp
      • File feature_matrix_csr.hpp
      • File feature_matrix_fortran_contiguous.hpp
      • File feature_matrix_row_wise.hpp
      • File feature_sampling.hpp
      • File feature_sampling_no.hpp
      • File feature_sampling_predefined.hpp
      • File feature_sampling_without_replacement.hpp
      • File feature_space.hpp
      • File feature_space_tabular.hpp
      • File feature_subspace.hpp
      • File feature_type.hpp
      • File feature_type_nominal.hpp
      • File feature_type_numerical.hpp
      • File feature_type_ordinal.hpp
      • File feature_vector.hpp
      • File feature_vector_binary.hpp
      • File feature_vector_binned.hpp
      • File feature_vector_equal.hpp
      • File feature_vector_missing.hpp
      • File feature_vector_nominal.hpp
      • File feature_vector_numerical.hpp
      • File feature_vector_ordinal.hpp
      • File global_pruning.hpp
      • File global_pruning_no.hpp
      • File global_pruning_post.hpp
      • File global_pruning_pre.hpp
      • File head.hpp
      • File head_complete.hpp
      • File head_partial.hpp
      • File index_vector.hpp
      • File index_vector_complete.hpp
      • File index_vector_partial.hpp
      • File indexed_value.hpp
      • File instance_sampling.hpp
      • File instance_sampling_no.hpp
      • File instance_sampling_stratified_example_wise.hpp
      • File instance_sampling_stratified_output_wise.hpp
      • File instance_sampling_with_replacement.hpp
      • File instance_sampling_without_replacement.hpp
      • File interval.hpp
      • File iterator_binned.hpp
      • File iterator_equal.hpp
      • File iterator_forward_non_zero_index.hpp
      • File iterator_forward_sparse.hpp
      • File iterator_forward_sparse_binary.hpp
      • File iterator_index.hpp
      • File iterators.hpp
      • File label_matrix_c_contiguous.hpp
      • File label_matrix_csr.hpp
      • File label_matrix_row_wise.hpp
      • File label_vector.hpp
      • File label_vector_set.hpp
      • File learner.hpp
      • File learner_classification.hpp
      • File learner_classification_common.hpp
      • File learner_common.hpp
      • File learner_regression.hpp
      • File learner_regression_common.hpp
      • File library_info.hpp
      • File math.hpp
      • File matrix_c_contiguous.hpp
      • File matrix_dense.hpp
      • File matrix_lil.hpp
      • File matrix_lil_binary.hpp
      • File matrix_sparse_binary.hpp
      • File matrix_sparse_set.hpp
      • File measure_distance.hpp
      • File measure_evaluation.hpp
      • File measure_evaluation_sparse.hpp
      • File memory.hpp
      • File model_builder.hpp
      • File model_builder_intermediate.hpp
      • File multi_threading.hpp
      • File multi_threading_manual.hpp
      • File multi_threading_no.hpp
      • File opencl.hpp
      • File openmp.hpp
      • File output_matrix.hpp
      • File output_sampling.hpp
      • File output_sampling_no.hpp
      • File output_sampling_round_robin.hpp
      • File output_sampling_without_replacement.hpp
      • File output_space_info.hpp
      • File output_space_info_no.hpp
      • File partition.hpp
      • File partition_bi.hpp
      • File partition_sampling.hpp
      • File partition_sampling_bi_random.hpp
      • File partition_sampling_bi_stratified_example_wise.hpp
      • File partition_sampling_bi_stratified_output_wise.hpp
      • File partition_sampling_no.hpp
      • File partition_single.hpp
      • File post_optimization.hpp
      • File post_optimization_no.hpp
      • File post_optimization_phase_list.hpp
      • File post_optimization_sequential.hpp
      • File post_optimization_unused_rule_removal.hpp
      • File post_processor.hpp
      • File post_processor_no.hpp
      • File prediction.hpp
      • File prediction_complete.hpp
      • File prediction_evaluated.hpp
      • File prediction_matrix_dense.hpp
      • File prediction_matrix_sparse_binary.hpp
      • File prediction_partial.hpp
      • File predictor.hpp
      • File predictor_binary.hpp
      • File predictor_binary_no.hpp
      • File predictor_common.hpp
      • File predictor_probability.hpp
      • File predictor_probability_no.hpp
      • File predictor_score.hpp
      • File predictor_score_no.hpp
      • File probability_calibration.hpp
      • File probability_calibration_isotonic.hpp
      • File probability_calibration_joint.hpp
      • File probability_calibration_marginal.hpp
      • File probability_calibration_no.hpp
      • File properties.hpp
      • File quality.hpp
      • File refinement.hpp
      • File refinement_comparator_fixed.hpp
      • File refinement_comparator_single.hpp
      • File regression_matrix_c_contiguous.hpp
      • File regression_matrix_csr.hpp
      • File regression_matrix_row_wise.hpp
      • File ring_buffer.hpp
      • File rng.hpp
      • File rule_compare_function.hpp
      • File rule_induction.hpp
      • File rule_induction_top_down_beam_search.hpp
      • File rule_induction_top_down_greedy.hpp
      • File rule_list.hpp
      • File rule_model.hpp
      • File rule_model_assemblage.hpp
      • File rule_model_assemblage_sequential.hpp
      • File rule_pruning.hpp
      • File rule_pruning_irep.hpp
      • File rule_pruning_no.hpp
      • File rule_refinement.hpp
      • File rule_refinement_statistics_based.hpp
      • File score_processor.hpp
      • File score_vector.hpp
      • File score_vector_binned_dense.hpp
      • File score_vector_bit.hpp
      • File score_vector_dense.hpp
      • File statistics.hpp
      • File statistics_provider.hpp
      • File statistics_space.hpp
      • File statistics_state.hpp
      • File statistics_subset.hpp
      • File statistics_subset_resettable.hpp
      • File statistics_update.hpp
      • File statistics_update_candidate.hpp
      • File statistics_update_candidate_common.hpp
      • File statistics_weighted.hpp
      • File stopping_criterion.hpp
      • File stopping_criterion_list.hpp
      • File stopping_criterion_no.hpp
      • File stopping_criterion_size.hpp
      • File stopping_criterion_time.hpp
      • File stratified_sampling_example_wise.hpp
      • File stratified_sampling_output_wise.hpp
      • File strings.hpp
      • File threads.hpp
      • File types.hpp
      • File validation.hpp
      • File vector_bit.hpp
      • File vector_dense.hpp
      • File vector_dok_binary.hpp
      • File vector_sparse_array.hpp
      • File vector_sparse_array_binary.hpp
      • File view.hpp
      • File view_composite.hpp
      • File view_compressed.hpp
      • File view_functions.hpp
      • File view_matrix.hpp
      • File view_matrix_c_contiguous.hpp
      • File view_matrix_composite.hpp
      • File view_matrix_csc.hpp
      • File view_matrix_csc_binary.hpp
      • File view_matrix_csr.hpp
      • File view_matrix_csr_binary.hpp
      • File view_matrix_dense.hpp
      • File view_matrix_fortran_contiguous.hpp
      • File view_matrix_lil.hpp
      • File view_matrix_sparse.hpp
      • File view_matrix_sparse_binary.hpp
      • File view_matrix_sparse_set.hpp
      • File view_vector.hpp
      • File view_vector_binned.hpp
      • File view_vector_bit.hpp
      • File view_vector_composite.hpp
      • File view_vector_compressed.hpp
      • File view_vector_indexed.hpp
      • File view_vector_sparse_set.hpp
      • File weight_sampling.hpp
      • File weight_vector.hpp
      • File weight_vector_bit.hpp
      • File weight_vector_dense.hpp
      • File weight_vector_equal.hpp
      • File weight_vector_out_of_sample.hpp
    • Library libmlrlboosting
      • File blas.hpp
      • File default_rule_auto.hpp
      • File discretization_function.hpp
      • File discretization_function_probability.hpp
      • File discretization_function_score.hpp
      • File dll_exports.hpp
      • File feature_binning_auto.hpp
      • File head_type.hpp
      • File head_type_auto.hpp
      • File head_type_complete.hpp
      • File head_type_partial_dynamic.hpp
      • File head_type_partial_fixed.hpp
      • File head_type_single.hpp
      • File iterator_diagonal.hpp
      • File label_binning.hpp
      • File label_binning_auto.hpp
      • File label_binning_equal_width.hpp
      • File label_binning_no.hpp
      • File lapack.hpp
      • File learner.hpp
      • File learner_boomer_classifier.hpp
      • File learner_boomer_regressor.hpp
      • File learner_classification.hpp
      • File learner_common.hpp
      • File library_info.hpp
      • File loss.hpp
      • File loss_decomposable.hpp
      • File loss_decomposable_logistic.hpp
      • File loss_decomposable_sparse.hpp
      • File loss_decomposable_squared_error.hpp
      • File loss_decomposable_squared_hinge.hpp
      • File loss_non_decomposable.hpp
      • File loss_non_decomposable_logistic.hpp
      • File loss_non_decomposable_squared_error.hpp
      • File loss_non_decomposable_squared_hinge.hpp
      • File math.hpp
      • File matrix_c_contiguous_numeric.hpp
      • File matrix_sparse_set_numeric.hpp
      • File parallel_rule_refinement_auto.hpp
      • File parallel_statistic_update_auto.hpp
      • File partition_sampling_auto.hpp
      • File predictor_binary_auto.hpp
      • File predictor_binary_common.hpp
      • File predictor_binary_example_wise.hpp
      • File predictor_binary_gfm.hpp
      • File predictor_binary_output_wise.hpp
      • File predictor_probability_auto.hpp
      • File predictor_probability_common.hpp
      • File predictor_probability_marginalized.hpp
      • File predictor_probability_output_wise.hpp
      • File predictor_score_common.hpp
      • File predictor_score_output_wise.hpp
      • File probability_calibration_isotonic.hpp
      • File probability_function_chain_rule.hpp
      • File probability_function_joint.hpp
      • File probability_function_logistic.hpp
      • File probability_function_marginal.hpp
      • File regularization.hpp
      • File regularization_manual.hpp
      • File regularization_no.hpp
      • File rule_compare_function.hpp
      • File rule_evaluation.hpp
      • File rule_evaluation_decomposable.hpp
      • File rule_evaluation_decomposable_complete.hpp
      • File rule_evaluation_decomposable_complete_binned.hpp
      • File rule_evaluation_decomposable_partial_dynamic.hpp
      • File rule_evaluation_decomposable_partial_dynamic_binned.hpp
      • File rule_evaluation_decomposable_partial_fixed.hpp
      • File rule_evaluation_decomposable_partial_fixed_binned.hpp
      • File rule_evaluation_decomposable_single.hpp
      • File rule_evaluation_decomposable_sparse.hpp
      • File rule_evaluation_non_decomposable.hpp
      • File rule_evaluation_non_decomposable_complete.hpp
      • File rule_evaluation_non_decomposable_complete_binned.hpp
      • File rule_evaluation_non_decomposable_partial_dynamic.hpp
      • File rule_evaluation_non_decomposable_partial_dynamic_binned.hpp
      • File rule_evaluation_non_decomposable_partial_fixed.hpp
      • File rule_evaluation_non_decomposable_partial_fixed_binned.hpp
      • File rule_list_builder.hpp
      • File shrinkage_constant.hpp
      • File statistic.hpp
      • File statistic_format.hpp
      • File statistic_format_auto.hpp
      • File statistic_format_dense.hpp
      • File statistic_format_sparse.hpp
      • File statistic_type.hpp
      • File statistic_type_float32.hpp
      • File statistic_type_float64.hpp
      • File statistics.hpp
      • File statistics_decomposable.hpp
      • File statistics_non_decomposable.hpp
      • File statistics_provider_decomposable_dense.hpp
      • File statistics_provider_decomposable_sparse.hpp
      • File statistics_provider_non_decomposable_dense.hpp
      • File transformation_binary.hpp
      • File transformation_binary_example_wise.hpp
      • File transformation_binary_gfm.hpp
      • File transformation_binary_output_wise.hpp
      • File transformation_probability.hpp
      • File transformation_probability_marginalized.hpp
      • File transformation_probability_output_wise.hpp
      • File vector_statistic_decomposable_dense.hpp
      • File vector_statistic_decomposable_sparse.hpp
      • File vector_statistic_non_decomposable_dense.hpp
      • File view_statistic_non_decomposable_dense.hpp
    • Library libmlrlseco
      • File confusion_matrix.hpp
      • File decision_list_builder.hpp
      • File dll_exports.hpp
      • File head_type.hpp
      • File head_type_partial.hpp
      • File head_type_single.hpp
      • File heuristic.hpp
      • File heuristic_accuracy.hpp
      • File heuristic_f_measure.hpp
      • File heuristic_laplace.hpp
      • File heuristic_m_estimate.hpp
      • File heuristic_precision.hpp
      • File heuristic_recall.hpp
      • File heuristic_wra.hpp
      • File learner.hpp
      • File learner_common.hpp
      • File learner_seco_classifier.hpp
      • File library_info.hpp
      • File lift_function.hpp
      • File lift_function_kln.hpp
      • File lift_function_no.hpp
      • File lift_function_peak.hpp
      • File matrix_coverage_dense.hpp
      • File matrix_statistic_decomposable_dense.hpp
      • File predictor_binary_output_wise.hpp
      • File rule_compare_function.hpp
      • File rule_evaluation.hpp
      • File rule_evaluation_decomposable.hpp
      • File rule_evaluation_decomposable_partial.hpp
      • File rule_evaluation_decomposable_single.hpp
      • File statistics.hpp
      • File statistics_decomposable.hpp
      • File statistics_provider_decomposable_dense.hpp
      • File stopping_criterion_coverage.hpp
      • File vector_confusion_matrix_dense.hpp

Further Information

  • Release Notes
  • Contributors
  • Code of Conduct
  • MIT License
  • Source Code
  • Issue Tracker
Back to top
View this page

Pre-Sorted Search¶

For the efficient induction of single rules, an efficient evaluation of a rule’s possible refinements is crucial. Instead of evaluating each possible refinement in isolation, this can often be sped up by integrating the evaluation of multiple refinements into a single pass through the data. In this section, we discuss pre-sorting of examples as a way that works particularly well for numeric data. We will first discuss the base algorithm and subsequently show how it can be extended to deal with sparse, nominal, and missing feature values.

A pre-sorted search algorithm sorts the available training examples by their values for individual features before training starts. Afterward, the examples are repeatedly processed in this predetermined order to build a model. This idea originates from early work on the efficient construction of decision trees[1][2]. Due to the conceptual similarities between tree- and rule-based models, it can easily be generalized to rule learning methods. For example, it is used by JRip, an implementation of RIPPER[3] that is part of the WEKA[4] project, or the implementations of SLIPPER[5] and RuleFit[6] included in the imodels[7] package for interpretable models. Both rule- and tree-based learning approaches require enumerating the thresholds that may be used to make up nodes in a decision tree or conditions in a rule, respectively. These thresholds result from the feature values of the training examples, given in the form of a feature matrix as previously defined here. For each training example, it assigns a feature value to each of the available features. In the following, we restrict ourselves to numerical features before discussing means to deal with nominal features or missing feature values.

Memory Layout¶

When searching for the best condition that may be added to a rule, the available features are dealt with independently. For each feature \(A_{l}\) to be considered by the algorithm, the thresholds that may be used by the first condition of a rule result from a vector of feature values \(( x_{1l}, \dots, x_{nl} )\) that corresponds to the \(l\)-th column of the feature matrix. As different features are dealt with in isolation, we usually omit the index of the respective feature for brevity. To facilitate column-wise access to the feature matrix, it should be given in the Fortran-contiguous memory layout, also known as column-major order. The following illustration show representations of a \(3 \times 3\) matrix in the Fortran-contiguous and compressed sparse column (CSC) format. The former uses a single one-dimensional array to store all values in column-major order, whereas the latter, which is better suited for sparse data, uses the three arrays:

  1. The array values stores all non-zero values in column-major order.

  2. For each value in values, row_indices stores the index of the corresponding row, starting at zero.

  3. The \(i\)-th element in column_indices specifies the index of the first element in values and row_indices that belongs to the \(i\)-th column.

Representation of a 3×3 matrix in the Fortran-contiguous and compressed sparse column (CSC) format Representation of a 3×3 matrix in the Fortran-contiguous and compressed sparse column (CSC) format

Feature Permutation¶

To enumerate the thresholds for a particular feature, the elements in the corresponding column vector must be sorted in increasing order. For this purpose, we use a bijective permutation function \(\tau: \mathbb{N}^{+} \rightarrow \mathbb{N}^{+}\), where \(\tau ( i )\) specifies the index of the example that corresponds to the \(i\)-th element in the sorted vector

(1)¶\[( x_{\tau ( 1 )}, \dots, x_{\tau ( N )} ) \text{ with } \tau ( i ) \leq \tau ( i + 1 ), \forall i \in [ 1, N ).\]

An exemplary vector of sorted feature values, together with the corresponding thresholds, is shown below. Each of the thresholds is typically computed by averaging two adjacent feature values. Because these values do not change as additional conditions or rules should be learned, sorting the values that correspond to a particular feature is necessary only once during training, and previously sorted vectors can be kept in memory for repeated access.

An exemplary vector of sorted feature values, together with the corresponding thresholds An exemplary vector of sorted feature values, together with the corresponding thresholds

Note, however, that this representation must not necessarily be used as the memory layout for storing the values of a particular feature. For storing ordinal and nominal feature values, our algorithm use a sparse memory layout, similar to the CSC layout presented above, for maximum efficiency.

Indicator Function¶

If an existing rule should be refined by adding a condition to its body, only a subset of the feature values must be considered to make up potential thresholds. The subset corresponds to the examples that satisfy the existing conditions. We use an indicator function \(I: \mathbb{N}^{+} \rightarrow \{ 0, 1 \}\) to check whether individual examples should be taken into account by the search algorithm:

(2)¶\[\begin{split}I ( n ) = \begin{cases} 1 & \text{if example } \boldsymbol{x}_{n} \text{ is currently covered} \\ 0 & \text{otherwise}. \end{cases}\end{split}\]

If an example is not covered, its feature value may not be used to make up thresholds for additional conditions. A data structure that helps to keep track of the covered examples efficiently, rather than comparing the feature values of each example to the existing conditions, is presented in the section Exploiting Feature Sparsity. This data structure also facilitates dealing with sparse feature values.

Enumeration of Conditions¶

As shown in the pseudocode below, the feature values that correspond to a particular feature are processed in sorted order to enumerate the thresholds of potential conditions. When dealing with numerical features, the thresholds result from averaging adjacent feature values (cf. line 9). The calculation of thresholds is restricted to the feature values of examples that are covered according to the indicator function \(I\) and have non-zero weights according to a weight vector \(\boldsymbol{w}\). The weights result from the application of an (optional) sampling method (cf. lines 3 and 7). When dealing with numerical features, each threshold can be used to form two distinct conditions, using the relational operator \(\leq\) or \(>\), respectively.

\[\begin{split}\textbf{in:}\quad & \text{Vector of sorted feature values } (x_{\tau( n )})_n^N, \\ & \text{quality of the current rule } q, \text{ indicator function } I, \\ & \text{weights of training examples } \boldsymbol{w}, \\ & \text{statistics } S = \{ \boldsymbol{s}_n \}_n^N, \text{ globally aggregated statistics } \boldsymbol{s}, \\ \\ \text{1:} \quad & \text{best refinement } r^* = \emptyset, \text{ best quality } q^* = q \\ \text{2:} \quad & \textbf{for } i = 1 \textbf{ to } N \textbf{ do} \\ \text{3:} \quad & \quad \textbf{if } I ( \tau (i)) = 1 \textbf{ and } w_{\tau ( i )} > 0 \textbf{ then} \\ \text{4:} \quad & \quad \quad \textbf{break} \\ \text{5:} \quad & \text{initialize sum of statistics } \boldsymbol{s}' = \boldsymbol{s}_{\tau ( i )} \\ \text{6:} \quad & \textbf{for } j = i + 1 \textbf{ to } N \textbf{ do} \\ \text{7:} \quad & \quad \textbf{if } I ( \tau (j)) = 1 \textbf{ and } w_{\tau ( j )} > 0 \textbf{ then} \\ \text{8:} \quad & \quad \quad \text{update sum } \boldsymbol{s}' = \boldsymbol{s}' + \boldsymbol{s}_{\tau (j)} \\ \text{9:} \quad & \quad \quad \text{threshold } t = \text{avg} ( x_{\tau (i)}, x_{\tau (j)} ) \\ \text{10:} \quad & \quad \quad \text{updated head } \boldsymbol{\hat{p}}', \text{ quality } q' = \texttt{FIND\_HEAD} ( \boldsymbol{s}' ) \\ \text{11:} \quad & \quad \quad \textbf{if } q' \text{ is better than } q^* \textbf{ then} \\ \text{12:} \quad & \quad \quad \quad \text{update refinement } r^* = \{ t, \leq, \boldsymbol{\hat{p}}' \} \text{ and its quality } q^* = r' \\ \text{13:} \quad & \quad \quad \boldsymbol{\hat{p}}', q' = \texttt{FIND\_HEAD} ( \boldsymbol{s} - \boldsymbol{s}' ) \\ \text{14:} \quad & \quad \quad \textbf{if } q' \text{ is better than } q^* \textbf{ then} \\ \text{15:} \quad & \quad \quad \quad \text{update refinement } r^* = \{ t, >, \boldsymbol{\hat{p}}' \} \text{ and its quality } q^* = r' \\ \text{16:} \quad & \quad \quad i = j \\ \text{17:} \quad & \textbf{return} \text{ best refinement } r^*, \text{ its quality } q^*\end{split}\]

As can be seen in the image below, when a condition that uses the former operator is added to a rule, it results in all examples that correspond to the previously processed feature values being covered. In contrast, a condition that uses the latter operator covers all the other examples.

Coverage of numerical conditions that can be created from a single threshold 1.0 using the $\leq$ or $>$ operator Coverage of numerical conditions that can be created from a single threshold 1.0 using the $\leq$ or $>$ operator

Aggregation of Statistics¶

As can be seen in the lines 10 and 13 of the pseudocode given above, it is necessary to construct a head for each candidate rule that results from adding a new condition to a rule. In addition, the quality of the resulting rule must be assessed in terms of a numerical score. Both the predictions provided by a head and the estimated quality depend on the aggregated label space statistics of the covered examples. We exploit the fact that conditions using the \(\leq\) operator, when evaluated in sorted order by increasing thresholds, are satisfied by a superset of the examples covered by the previous condition using the same operator but a smaller threshold. Processing the possible conditions in the aforementioned order enables the pre-sorted search algorithm to compute the aggregated statistics (corresponding to a vector of gradients and Hessians in case of the BOOMER algorithm and confusion matrices in case of the SeCo algorithm) incrementally (cf. line 8). For the efficient evaluation of conditions that use the \(>\) operator, the search algorithm is provided with the statistics that result from the aggregation over all training examples that are currently covered and have a non-zero weight (denoted as \(\boldsymbol{s}\)). The difference between the globally aggregated statistics and the previously aggregated ones (\(\boldsymbol{s} - \boldsymbol{s}'\)) yields the aggregated statistics of the examples covered by such a condition (cf. line 13). As the global aggregation of statistics does not depend on a particular feature, this operation must be performed only once per rule, even when searching for a rule’s best refinement across multiple features. The following image provides an example of how the aggregated statistics are computed for the conditions that can be created from a single threshold.

Aggregation of statistics depending on the coverage of conditions that use the $\leq$ or $>$ operator and a single threshold 1.0 Aggregation of statistics depending on the coverage of conditions that use the $\leq$ or $>$ operator and a single threshold 1.0

[1]

Manish Mehta, Rakesh Agrawal, and Jorma Rissanen (1996). ‘SLIQ: A Fast Scalable Classifier for Data Mining’. In: Proc. International Conference on Extending Database Technology, pp. 18-32.

[2]

John C. Shafer, Rakesh Agrawal, and Manish Mehta (1996). ‘SPRINT: A Scalable Parallel Classifier for Data Mining’. In: Proc. International Conference on Very Large Data Bases, 96, pp. 544-555.

[3]

William W. Cohen (1995). ‘Fast Effective Rule Induction’. In: Proc. International Conference on Machine Learning (ICML), pp. 115–123.

[4]

Mark Hall, Eibe Frank, Geoffrey Holmes, Bernhard Pfahringer, Peter Reutemann, and Ian H. Witten (2009). ‘The WEKA Data Mining Software: An Update’. In: SIGKDD Explorations, 11.1, pp. 10-18.

[5]

William W. Cohen and Yoram Singer (1999). ‘A Simple, Fast, and Effective Rule Learner’. In: Proc. AAAI Conference on Artificial Intelligence, pp. 335–342.

[6]

Jerome H. Friedman and Bogdan E. Popescu (2008). ‘Predictive learning via rule ensembles’. In: The Annals of Applied Statistics, 2.3, pp. 916-954.

[7]

Chandan Singh, Keyan Nasseri, Yan Shuo Tan, Tiffany Tang, and Bin Yu (2021). ‘imodels: a python package for fitting interpretable models’. In: Journal of Open Source Software, 6.61, p. 3192.

Next
Exploiting Feature Sparsity
Previous
Algorithmic Optimizations
Copyright © 2020-2026, Michael Rapp et al.
Made with Sphinx and @pradyunsg's Furo
On this page
  • Pre-Sorted Search
    • Memory Layout
    • Feature Permutation
    • Indicator Function
    • Enumeration of Conditions
    • Aggregation of Statistics