BOOMER 0.15.4

Back to top
View this page

Histogram-based Search

The exploitation of feature sparsity helps reduce training times on many benchmark datasets, as they often come with high feature sparsity. However, it does not provide significant advantages on datasets with low feature sparsity. Our algorithms provide an alternative to the pre-sorted search algorithm to efficiently deal with the latter type of datasets. It is based on assigning examples with similar values for a particular feature to a predefined number of bins and using an aggregated representation of their corresponding label space statistics, referred to as histograms. Depending on how many bins are used, this approach drastically reduces the number of candidates the rule induction algorithm must consider. Histogram-based approaches have previously been used to deal with complex classification tasks in modern implementations of gradient boosted decision trees, such as XGBoost[1] or LightGBM[2]. In the following, we discuss a generalization of the underlying concept, which has evolved from prior research on decision tree learning[3][4][5][6], to rule learning methods.

Assigning Examples to Bins

A histogram-based rule induction algorithm requires grouping the available training examples into a predefined number of bins. In principle, different approaches can be used to determine such a mapping[7]. We restrict ourselves to unsupervised binning methods, where the assignment is solely based on the feature values of the training examples. This is in contrast to supervised methods, such as the weighted quantile sketch approach that originates from the XGBoost algorithm[1], where information about the true class labels of individual examples, or even their label space statistics, is taken into account. Compared to approaches that use the label space statistics to map examples to bins, unsupervised binning methods can usually be implemented more efficiently. This is because a mapping solely based on feature values remains unchanged for the entire training process, whereas the statistics of individual examples are subject to change and require adjusting the mapping whenever the model is refined.

Equal-width Feature Binning

The first binning method that we consider for our experiments is referred to as equal-width binning. This method, which is commonly used to discretize numerical feature values, is based on dividing the range of values for a particular feature into equally-sized intervals, such that the absolute difference between the smallest and largest value in each bin is the same. Given a predefined number of bins \(B\), the maximum difference between the values that are assigned to an interval is calculated as

\[w = \frac{\textit{max} - \textit{min}}{B},\]

where \(\textit{min}\) and \(\textit{max}\) denote the smallest and largest value of the feature among the training examples, respectively. Based on the value \(w\), a mapping \(\sigma: \mathbb{R} \rightarrow \mathbb{N}^{+}\) from individual feature values \(x_{n}\) to the index of the corresponding bin can be obtained as

\[\sigma_{\textit{eq.-width}} ( x_{n} ) = \text{min} ( \lfloor \frac{x_{n} - \textit{min}}{w} \rfloor + 1, B ).\]
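The mapping above can be illustrated with a minimal Python sketch. The function name `equal_width_bin` and the handling of a feature whose values are all identical are assumptions made for this illustration and are not taken from the library's implementation.

```python
import math


def equal_width_bin(x, min_value, max_value, num_bins):
    """Return the 1-based bin index of the feature value x."""
    if max_value == min_value:
        # Assumption: if all values are identical, a single bin is used.
        return 1
    width = (max_value - min_value) / num_bins  # w in the formula above
    # The largest value would fall into bin B + 1, hence the cap at B.
    return min(math.floor((x - min_value) / width) + 1, num_bins)


values = [0.0, 1.0, 2.5, 5.0, 7.5, 10.0]
bins = [equal_width_bin(x, min(values), max(values), num_bins=4) for x in values]
print(bins)  # [1, 1, 2, 3, 4, 4]
```

With four bins over the range from 0 to 10, each interval has a width of 2.5, and the outer `min` ensures that the largest value is assigned to the last bin.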

Equal-frequency Feature Binning

Another well-known method to discretize numerical features is equal-frequency binning. Unlike equal-width binning, which is supposed to result in bins with values close to each other, this discretization method aims to obtain bins that contain approximately the same number of values. To determine the bins for a particular feature, the available examples are first sorted in ascending order by their respective feature values. This results in a sorted vector of feature values \(( x_{\tau ( 1 )}, \dots, x_{\tau ( N )} )\), where the permutation function \(\tau ( i )\) specifies the index of the example that corresponds to the \(i\)-th element in the sorted vector. Afterward, the sorted values are divided into a predefined number of intervals \(B\), such that each bin contains approximately \(N / B\) values. Given an individual feature value \(x_{n}\), whose position in the sorted vector is \(\tau^{-1} ( n )\), the index of the corresponding bin is calculated as

\[\sigma_{\textit{eq.-freq.}} ( x_{n} ) = \left\lfloor \frac{\tau^{-1} ( n ) - 1}{N / B} \right\rfloor + 1.\]

In practice, examples with identical feature values should be prevented from being assigned to different bins. However, for reasons of brevity, this is omitted from the above formula.
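As a rough illustration, equal-frequency binning could be sketched in Python as follows. The function name `equal_frequency_bins` is hypothetical, and the tie handling shown here (reusing the bin of the preceding identical value) is only one possible way to keep identical values together. It is an assumption, not necessarily the strategy used by the library.

```python
def equal_frequency_bins(values, num_bins):
    """Return a list with the 1-based bin index of every feature value."""
    n = len(values)
    # order[i] is the index of the example at position i of the sorted
    # vector, i.e., the permutation tau with 0-based positions.
    order = sorted(range(n), key=lambda i: values[i])
    bins = [0] * n
    previous_value, previous_bin = None, None
    for position, example in enumerate(order):
        # position equals the 1-based rank minus one.
        bin_index = position * num_bins // n + 1
        # Assumption: identical values are kept in the same bin.
        if values[example] == previous_value:
            bin_index = previous_bin
        bins[example] = bin_index
        previous_value, previous_bin = values[example], bin_index
    return bins


print(equal_frequency_bins([3.0, 1.0, 2.0, 2.0, 5.0, 4.0], num_bins=3))
# [2, 1, 1, 1, 3, 3]
```

In this example, the second occurrence of the value 2.0 would fall into the second bin by rank alone, but it is moved to the first bin so that both occurrences stay together. As a consequence, bins no longer contain exactly the same number of values.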

Assigning Discrete Values to Bins

To handle datasets that do not only include numerical feature values, but also come with nominal features, we use an appropriate binning method to deal with the latter. It creates a bin for each discrete value encountered in the training data and assigns examples with identical values to the same bin.
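A minimal sketch of this idea in Python is given below. The function name `nominal_bins` and the choice to number bins in the order in which values are first encountered are assumptions made for illustration.

```python
def nominal_bins(values):
    """Assign one bin per distinct value, numbered in order of appearance."""
    bin_of_value = {}
    bins = []
    for value in values:
        if value not in bin_of_value:
            # Create a new bin for a value that has not been seen before.
            bin_of_value[value] = len(bin_of_value) + 1
        bins.append(bin_of_value[value])
    return bins, bin_of_value


bins, mapping = nominal_bins(["red", "green", "red", "blue", "green"])
print(bins)     # [1, 2, 1, 3, 2]
print(mapping)  # {'red': 1, 'green': 2, 'blue': 3}
```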

Enumeration of Thresholds

We denote the set of example indices that have been assigned to the \(b\)-th bin via a mapping function \(\sigma\) as

\[\mathcal{B}_{b} = \{ n \in \{ 1, \dots, N \} \rvert \sigma ( x_{n} ) = b \}.\]

Given \(B\) bins previously created for a particular feature, one can obtain \(B - 1\) thresholds that the conditions of potential candidate rules may use. The \(b\)-th threshold separates the examples that correspond to the bins \(\mathcal{B}_{1}, \dots, \mathcal{B}_{b}\), which satisfy a condition that uses the \(\leq\) operator, from the examples that have been assigned to the bins \(\mathcal{B}_{b + 1}, \dots, \mathcal{B}_{B}\), which satisfy a condition that uses the \(>\) operator. The individual thresholds \(t_{1}, \dots, t_{B - 1}\) are calculated as the average of the largest feature value in the bin \(\mathcal{B}_{b}\) and the smallest feature value in the neighboring bin \(\mathcal{B}_{b + 1}\). Depending on the characteristics of the binning method at hand, some bins may remain empty. For the enumeration of potential thresholds, bins that are not associated with any examples should be ignored. When dealing with bins that have been created from nominal feature values, all examples in a particular bin have the same feature value. In such a case, the conditions in a rule’s body may test for the presence or absence of each of these \(B\) feature values.
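For numerical features, the enumeration of thresholds can be sketched as follows. The function name `enumerate_thresholds` and its signature are hypothetical. The sketch assumes 1-based bin indices and skips empty bins, so that each threshold lies between two neighboring non-empty bins.

```python
def enumerate_thresholds(values, bin_indices, num_bins):
    """Return the thresholds between neighboring non-empty bins."""
    smallest = [None] * num_bins
    largest = [None] * num_bins
    for x, b in zip(values, bin_indices):
        i = b - 1  # bin indices are 1-based
        if smallest[i] is None or x < smallest[i]:
            smallest[i] = x
        if largest[i] is None or x > largest[i]:
            largest[i] = x
    # Bins without any examples are ignored.
    non_empty = [i for i in range(num_bins) if smallest[i] is not None]
    # Average of the largest value in one bin and the smallest in the next.
    return [(largest[lo] + smallest[hi]) / 2
            for lo, hi in zip(non_empty, non_empty[1:])]


print(enumerate_thresholds([0.0, 1.0, 2.5, 5.0, 7.5, 10.0], [1, 1, 2, 3, 4, 4], 4))
# [1.75, 3.75, 6.25]
print(enumerate_thresholds([0.0, 1.0, 9.0, 10.0], [1, 1, 4, 4], 4))
# [5.0]
```

In the second call, the bins 2 and 3 are empty, which is why only a single threshold between the first and the last bin remains.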

Creation of Histograms

When using unsupervised binning methods, the mapping of examples to bins and the thresholds resulting from individual bins need to be determined only once during training. They should be obtained when a particular feature is considered by the rule induction algorithm for the first time and should be kept in memory for repeated access. In contrast, the histograms that serve as a basis for evaluating candidate rules must be created from scratch whenever a rule is refined. As shown in the pseudocode below, they result from aggregating the label space statistics of examples that have been assigned to the same bin. Examples that do not satisfy the conditions that have previously been added to the body of a rule must be ignored. We use an indicator function \(I\) to keep track of the examples that are covered by a rule. In addition, the extent to which the statistics of individual training examples contribute to a histogram depends on their respective weights. This enables the histogram-based search algorithm to use different samples of the available training examples to induce individual rules.

\[\begin{split}\textbf{in:}\quad & \text{Bins } (\mathcal{B}_{b})_b^B, \text{ statistics } S = \{ \boldsymbol{s}_n \}_n^N, \\ & \text{ indicator function } I, \text{ weights of training examples } \boldsymbol{w} \\ \\ \text{1:} \quad & \text{initialize empty histogram } S' = \{ \boldsymbol{s}'_{b} \}_b^B, \text{ where all elements of } \boldsymbol{s}'_b \text{ are set to zero} \\ \text{2:} \quad & \textbf{for } n = 1 \textbf{ to } N \textbf{ do} \\ \text{3:} \quad & \quad \textbf{if } I ( n ) = 1 \textbf{ and } w_{n} > 0 \textbf{ then} \\ \text{4:} \quad & \quad \quad \text{obtain bin index } b = \sigma ( x_n ) \\ \text{5:} \quad & \quad \quad \text{update } S' \text{ by setting } \boldsymbol{s}'_b = \boldsymbol{s}'_b + w_n \boldsymbol{s}_n \\ \text{6:} \quad & \textbf{return} \text{ histogram } S'\end{split}\]
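The pseudocode can be transcribed into a short Python sketch. The function name `create_histogram` and the representation of statistics as plain lists of floats are assumptions made for illustration. The actual data structures used by the library may differ.

```python
def create_histogram(bin_indices, statistics, covered, weights, num_bins):
    """Aggregate the weighted statistics of covered examples per bin."""
    num_outputs = len(statistics[0]) if statistics else 0
    # Step 1: one statistic vector per bin, initialized with zeros.
    histogram = [[0.0] * num_outputs for _ in range(num_bins)]
    for n, b in enumerate(bin_indices):
        # Step 3: skip examples that are not covered or have zero weight.
        if covered[n] and weights[n] > 0:
            # Step 5: add the weighted statistics to the bin (1-based index).
            for k, s in enumerate(statistics[n]):
                histogram[b - 1][k] += weights[n] * s
    return histogram


histogram = create_histogram(
    bin_indices=[1, 1, 2, 3],
    statistics=[[1.0, -1.0], [0.5, 0.5], [2.0, 0.0], [1.0, 1.0]],
    covered=[True, True, False, True],
    weights=[1, 2, 1, 0],
    num_bins=3,
)
print(histogram)  # [[2.0, 0.0], [0.0, 0.0], [0.0, 0.0]]
```

In this example, the third example is not covered by the rule and the fourth one has a weight of zero, so only the first two examples contribute to the histogram.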

Evaluation of Refinements

When using a histogram-based search algorithm, evaluating candidate rules in terms of a given loss function, in the case of the BOOMER algorithm, or a heuristic, in the case of the SeCo algorithm, follows the same principles as its pre-sorted counterpart. However, instead of taking the feature values of individual training examples into account for making up conditions that can be added to a rule’s body, the conditions to be considered by the histogram-based algorithm result from the predetermined thresholds that correspond to the bins for a particular feature. Even when an existing rule is refined, i.e., when it covers only a subset of the training examples, the thresholds remain unchanged to increase the algorithm’s efficiency. Similar to the pre-sorted rule induction algorithm, the histogram-based approach is based on incrementally aggregating the statistics of training examples that are covered by the considered refinements. However, instead of aggregating statistics at the level of individual training examples, it relies on the statistics that correspond to the individual bins of a histogram. For the efficient evaluation of conditions that use the \(>\) operator in the case of numerical features, or the \(\neq\) operator in the case of nominal features, the algorithm is provided with globally aggregated statistics that are determined beforehand. It computes the difference between the globally aggregated statistics and the previously processed statistics that correspond to individual bins, as discussed for the pre-sorted search algorithm. The statistics of examples with missing feature values are excluded from the globally aggregated statistics, as described in the section on missing feature values. In addition, the respective examples are ignored when determining the mapping to individual bins. Consequently, the histogram-based rule induction method can handle missing feature values.
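The incremental aggregation for numerical features can be sketched as follows. The function name `aggregate_refinements` is hypothetical, and the sketch only illustrates how the statistics for the \(\leq\) and \(>\) conditions are obtained from a histogram. Scoring the resulting statistics with a loss function or heuristic is deliberately left out, and empty bins as well as missing feature values are not handled.

```python
def aggregate_refinements(histogram):
    """Return, per threshold, the statistics covered by <= and > conditions."""
    num_outputs = len(histogram[0])
    # Globally aggregated statistics, determined once beforehand.
    total = [sum(bin_stats[k] for bin_stats in histogram)
             for k in range(num_outputs)]
    accumulated = [0.0] * num_outputs
    refinements = []
    # B bins result in B - 1 thresholds.
    for b in range(len(histogram) - 1):
        # Statistics covered by the condition "<= t_b": bins 1, ..., b.
        accumulated = [a + s for a, s in zip(accumulated, histogram[b])]
        # Statistics covered by "> t_b": difference to the global total.
        complement = [t - a for t, a in zip(total, accumulated)]
        refinements.append((b + 1, accumulated, complement))
    return refinements


result = aggregate_refinements([[2.0, 0.0], [1.0, 1.0], [0.5, -1.0]])
for threshold_index, leq_stats, gt_stats in result:
    print(threshold_index, leq_stats, gt_stats)
# 1 [2.0, 0.0] [1.5, 0.0]
# 2 [3.0, 1.0] [0.5, -1.0]
```

Because the complement is derived by subtraction, each bin is visited only once, regardless of which of the two operators a candidate condition uses.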


[1]

Tianqi Chen and Carlos Guestrin (2016). ‘XGBoost: A Scalable Tree Boosting System’. In: Proc. ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 785–794.

[2]

Guolin Ke, Qi Meng, Thomas Finley, Taifeng Wang, Wei Chen, Weidong Ma, Qiwei Ye, and Tie-Yan Liu (2017). ‘LightGBM: A Highly Efficient Gradient Boosting Decision Tree’. In: Proc. Advances in Neural Information Processing Systems 30, pp. 3146–3154.

[3]

Khaled Alsabti, Sanjay Ranka, and Vineet Singh (1998). ‘CLOUDS: A Decision Tree Classifier for Large Datasets’. In: Proc. International Conference on Knowledge Discovery and Data Mining, pp. 2–8.

[4]

Ruoming Jin and Gagan Agrawal (2003). ‘Communication and Memory Efficient Parallel Decision Tree Construction’. In: Proc. SIAM International Conference on Data Mining, pp. 119–129.

[5]

Ping Li, Qiang Wu, and Christopher Burges. ‘McRank: Learning to Rank Using Multiple Classification and Gradient Boosting’. In: Advances in Neural Information Processing Systems, 20.

[6]

Chandrika Kamath, Erick Cantú-Paz, and David Littau (2002). ‘Approximate Splitting for Ensembles of Trees using Histograms’. In: Proc. SIAM International Conference on Data Mining, pp. 370–383.

[7]

Sotiris B. Kotsiantis and Dimitris Kanellopoulos (2006). ‘Discretization Techniques: A recent survey’. In: GESTS International Transactions on Computer Science and Engineering, 32.1, pp. 47–58.

Copyright © 2020-2026, Michael Rapp et al.