mlrl.common.cython.stopping_criterion module

@author: Michael Rapp (michael.rapp.ml@gmail.com)

class mlrl.common.cython.stopping_criterion.AggregationFunction(value, names=None, *values, module=None, qualname=None, type=None, start=1, boundary=None)

Bases: Enum

Specifies different types of aggregation functions that can be used to aggregate the values that are stored in a buffer.

ARITHMETIC_MEAN = 2
MAX = 1
MIN = 0
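To illustrate what each enum value means, the following minimal sketch applies the three aggregation functions to the values of a buffer. `AggregationFunctionSketch` and `aggregate` are hypothetical local stand-ins that only mirror the documented enum values. They are not imported from mlrl, and the library's internal aggregation code may differ.

```python
from enum import Enum
from statistics import fmean


class AggregationFunctionSketch(Enum):
    # Local stand-in mirroring the documented values of AggregationFunction.
    MIN = 0
    MAX = 1
    ARITHMETIC_MEAN = 2


def aggregate(values, function):
    """Aggregates the values stored in a buffer according to the given function."""
    if not values:
        raise ValueError("cannot aggregate an empty buffer")
    if function is AggregationFunctionSketch.MIN:
        return min(values)
    if function is AggregationFunctionSketch.MAX:
        return max(values)
    if function is AggregationFunctionSketch.ARITHMETIC_MEAN:
        return fmean(values)
    raise ValueError(f"unsupported aggregation function: {function}")


buffer = [0.30, 0.25, 0.35]
print(aggregate(buffer, AggregationFunctionSketch.MIN))  # 0.25
print(aggregate(buffer, AggregationFunctionSketch.MAX))  # 0.35
```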
class mlrl.common.cython.stopping_criterion.AggregationFunctionImpl(value, names=None, *values, module=None, qualname=None, type=None, start=1, boundary=None)

Bases: IntFlag

ARITHMETIC_MEAN = 2
MAX = 1
MIN = 0
class mlrl.common.cython.stopping_criterion.PostPruningConfig

Bases: object

Defines an interface for all classes that allow configuring a stopping criterion that keeps track of the number of rules contained in the best-performing model, where performance is measured on the examples in the training or holdout set according to a certain measure.

This stopping criterion assesses the performance of the current model after every interval rules and checks whether the current model is the best one evaluated so far.
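The selection described above can be sketched as follows. This is a minimal illustration, not the library's implementation, and it rests on several assumptions: `losses[i]` is a hypothetical list holding the loss (lower is better) of the model made of the first i + 1 rules, the model is assessed at every multiple of `interval`, models with fewer than `min_rules` rules are ignored, and ties are resolved in favor of the smaller model.

```python
def best_num_rules(losses, interval, min_rules=1):
    """Returns the number of rules of the best model found when checking every `interval` rules.

    `losses[i]` is assumed to be the loss of the model consisting of the first i + 1 rules.
    Returns None if no model satisfies the constraints.
    """
    if interval < 1 or min_rules < 1:
        raise ValueError("interval and min_rules must be at least 1")
    best_n, best_loss = None, float("inf")
    for n in range(interval, len(losses) + 1, interval):
        if n < min_rules:
            continue  # models that are too small are not eligible
        loss = losses[n - 1]
        if loss < best_loss:  # strict comparison keeps the smaller model on ties
            best_n, best_loss = n, loss
    return best_n


losses = [0.5, 0.4, 0.35, 0.3, 0.32, 0.28, 0.29, 0.31]
print(best_num_rules(losses, interval=2))  # 6
print(best_num_rules(losses, interval=4))  # 4
```

With `interval=4`, only the models with 4 and 8 rules are assessed, so the model with 6 rules is never considered even though it has the lowest loss overall.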

get_interval() int

Returns the interval that is used to check whether the current model is the best one evaluated so far.

Returns:

The interval that is used to check whether the current model is the best one evaluated so far

get_min_rules() int

Returns the minimum number of rules that must be included in a model.

Returns:

The minimum number of rules that must be included in a model

is_holdout_set_used() bool

Returns whether the quality of the current model’s predictions is measured on the holdout set, if available, or whether the training set is used instead.

Returns:

True, if the quality of the current model’s predictions is measured on the holdout set, if available, False, if the training set is used instead

is_remove_unused_rules() bool

Returns whether rules that have been induced, but are not used, should be removed from the final model or not.

Returns:

True, if unused rules should be removed from the model, False otherwise

set_interval(interval: int) PostPruningConfig

Sets the interval that should be used to check whether the current model is the best one evaluated so far.

Parameters:

interval – The interval that should be used to check whether the current model is the best one evaluated so far, e.g., a value of 10 means that the best model may include 10, 20, … rules

Returns:

A PostPruningConfig that allows further configuration of the stopping criterion

set_min_rules(min_rules: int) PostPruningConfig

Sets the minimum number of rules that must be included in a model.

Parameters:

min_rules – The minimum number of rules that must be included in a model. Must be at least 1

Returns:

A PostPruningConfig that allows further configuration of the stopping criterion

set_remove_unused_rules(remove_unused_rules: bool) PostPruningConfig

Sets whether rules that have been induced, but are not used, should be removed from the final model or not.

Parameters:

remove_unused_rules – True, if unused rules should be removed from the model, False otherwise

Returns:

A PostPruningConfig that allows further configuration of the stopping criterion
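The effect of this flag can be sketched as follows, assuming that "unused" rules are those induced after the best model found by the stopping criterion. `final_model` and its parameters are hypothetical names chosen for illustration; the library's internal handling of unused rules is not shown here.

```python
def final_model(rules, num_used_rules, remove_unused_rules):
    """Returns the rules kept in the final model and the rules used for prediction.

    `num_used_rules` is assumed to be the size of the best model found by the stopping criterion.
    """
    used = list(rules[:num_used_rules])
    # If unused rules are not removed, they stay in the model but are ignored for prediction.
    kept = used if remove_unused_rules else list(rules)
    return kept, used


rules = ["r1", "r2", "r3", "r4", "r5"]
kept, used = final_model(rules, num_used_rules=3, remove_unused_rules=False)
print(kept)  # ['r1', 'r2', 'r3', 'r4', 'r5']
print(used)  # ['r1', 'r2', 'r3']
```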

set_use_holdout_set(use_holdout_set: bool) PostPruningConfig

Sets whether the quality of the current model’s predictions should be measured on the holdout set, if available, or whether the training set should be used instead.

Parameters:

use_holdout_set – True, if the quality of the current model’s predictions should be measured on the holdout set, if available, False, if the training set should be used instead

Returns:

A PostPruningConfig that allows further configuration of the stopping criterion
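Because every setter returns a PostPruningConfig, calls can be chained. The following sketch uses a hypothetical stand-in class, `PostPruningConfigSketch`, that mirrors the documented getters and setters so that the example runs without mlrl installed. It does not reproduce the library's default values (attributes start unset), and the only validation it performs is the documented constraint that `min_rules` must be at least 1.

```python
class PostPruningConfigSketch:
    """Hypothetical stand-in mirroring the getters and setters documented for PostPruningConfig."""

    def __init__(self):
        # Unset until configured; the library's real defaults are not reproduced here.
        self._interval = None
        self._min_rules = None
        self._use_holdout_set = None
        self._remove_unused_rules = None

    def set_interval(self, interval: int) -> "PostPruningConfigSketch":
        self._interval = interval
        return self  # returning self enables method chaining

    def set_min_rules(self, min_rules: int) -> "PostPruningConfigSketch":
        if min_rules < 1:
            raise ValueError("min_rules must be at least 1")
        self._min_rules = min_rules
        return self

    def set_use_holdout_set(self, use_holdout_set: bool) -> "PostPruningConfigSketch":
        self._use_holdout_set = use_holdout_set
        return self

    def set_remove_unused_rules(self, remove_unused_rules: bool) -> "PostPruningConfigSketch":
        self._remove_unused_rules = remove_unused_rules
        return self

    def get_interval(self) -> int:
        return self._interval

    def get_min_rules(self) -> int:
        return self._min_rules

    def is_holdout_set_used(self) -> bool:
        return self._use_holdout_set

    def is_remove_unused_rules(self) -> bool:
        return self._remove_unused_rules


config = (PostPruningConfigSketch()
          .set_interval(10)
          .set_min_rules(50)
          .set_use_holdout_set(True)
          .set_remove_unused_rules(False))
print(config.get_interval(), config.get_min_rules())  # 10 50
```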

class mlrl.common.cython.stopping_criterion.PrePruningConfig

Bases: object

Allows configuring a stopping criterion that stops the induction of rules as soon as the quality of a model’s predictions for the examples in a holdout set does not improve according to a certain measure.

This stopping criterion assesses the performance of the current model after every updateInterval rules and stores its quality in a buffer that keeps track of the last numCurrent iterations. If the capacity of this buffer has already been reached, the oldest quality is passed on to a second buffer of size numPast. Every stopInterval rules, it is decided whether the rule induction should be stopped. To this end, the numCurrent qualities in the first buffer, as well as the numPast qualities in the second buffer, are aggregated according to a certain aggregation_function. If the percentage improvement that results from comparing the more recent qualities in the first buffer to the older qualities in the second buffer is greater than a certain minImprovement, the rule induction is continued; otherwise, it is stopped.
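The two-buffer procedure described above can be simulated with the following minimal sketch. It is an illustration under stated assumptions, not the library's implementation: `losses` is a hypothetical list where `losses[i]` is the holdout loss (lower is better) of the model with i + 1 rules, the improvement is computed as the relative decrease `(past - current) / past` of the aggregated losses, and a stop decision is only made once both buffers are full and at least `min_rules` rules have been learned.

```python
from collections import deque
from statistics import fmean

# Hypothetical mapping from aggregation names to Python functions.
AGGREGATIONS = {"min": min, "max": max, "arithmetic_mean": fmean}


def simulate_pre_pruning(losses, update_interval, stop_interval, num_current, num_past,
                         min_improvement, min_rules=1, aggregation="arithmetic_mean"):
    """Returns the number of rules after which the induction of rules would be stopped.

    `losses[i]` is assumed to be the loss (lower is better) of the model with i + 1 rules.
    """
    if stop_interval % update_interval != 0:
        raise ValueError("stop_interval must be a multiple of update_interval")
    aggregate = AGGREGATIONS[aggregation]
    current = deque()              # holds at most `num_current` recent qualities
    past = deque(maxlen=num_past)  # the oldest past quality is dropped automatically

    for num_rules in range(1, len(losses) + 1):
        if num_rules % update_interval == 0:
            if len(current) == num_current:
                # First buffer is full: pass its oldest quality on to the second buffer.
                past.append(current.popleft())
            current.append(losses[num_rules - 1])

        buffers_full = len(current) == num_current and len(past) == num_past
        if num_rules % stop_interval == 0 and num_rules >= min_rules and buffers_full:
            past_quality = aggregate(past)
            current_quality = aggregate(current)
            # Relative improvement of the recent qualities over the older ones.
            improvement = ((past_quality - current_quality) / past_quality
                           if past_quality > 0 else 0.0)
            if improvement <= min_improvement:
                return num_rules  # not enough improvement: stop

    return len(losses)  # the criterion never triggered


losses = [1.0, 0.8, 0.6, 0.4, 0.39, 0.39, 0.39, 0.39]
print(simulate_pre_pruning(losses, update_interval=1, stop_interval=1,
                           num_current=2, num_past=2, min_improvement=0.1))  # 7
```

In this example, the loss plateaus at 0.39, so once the second buffer only contains qualities from shortly before the plateau, the relative improvement drops below 10% and the induction stops after 7 rules.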

get_aggregation_function() AggregationFunction

Returns the type of the aggregation function that is used to aggregate the values that are stored in a buffer.

Returns:

A value of the enum AggregationFunction that specifies the type of the aggregation function that is used to aggregate the values that are stored in a buffer

get_min_improvement() float

Returns the minimum improvement that must be reached for the rule induction to be continued.

Returns:

The minimum improvement that must be reached for the rule induction to be continued

get_min_rules() int

Returns the minimum number of rules that must have been learned before the induction of rules may be stopped.

Returns:

The minimum number of rules that must have been learned before the induction of rules may be stopped

get_num_current() int

Returns the number of the most recent iterations that are stored in a buffer.

Returns:

The number of the most recent iterations that are stored in a buffer

get_num_past() int

Returns the number of past iterations whose quality scores are stored in a buffer.

Returns:

The number of past iterations whose quality scores are stored in a buffer

get_stop_interval() int

Returns the interval that is used to decide whether the induction of rules should be stopped.

Returns:

The interval that is used to decide whether the induction of rules should be stopped

get_update_interval() int

Returns the interval that is used to update the quality of the current model.

Returns:

The interval that is used to update the quality of the current model

is_holdout_set_used() bool

Returns whether the quality of the current model’s predictions is measured on the holdout set, if available, or whether the training set is used instead.

Returns:

True, if the quality of the current model’s predictions is measured on the holdout set, if available, False, if the training set is used instead

is_remove_unused_rules() bool

Returns whether rules that have been induced, but are not used, should be removed from the final model or not.

Returns:

True, if unused rules should be removed from the model, False otherwise

set_aggregation_function(aggregation_function: AggregationFunction) PrePruningConfig

Sets the type of the aggregation function that should be used to aggregate the values that are stored in a buffer.

Parameters:

aggregation_function – A value of the enum AggregationFunction that specifies the type of the aggregation function that should be used to aggregate the values that are stored in a buffer

Returns:

A PrePruningConfig that allows further configuration of the stopping criterion

set_min_improvement(min_improvement: float) PrePruningConfig

Sets the minimum improvement that must be reached for the rule induction to be continued.

Parameters:

min_improvement – The minimum relative improvement that must be reached for the rule induction to be continued, given as a fraction (e.g., 0.01 for 1%). Must be in [0, 1]

Returns:

A PrePruningConfig that allows further configuration of the stopping criterion

set_min_rules(min_rules: int) PrePruningConfig

Sets the minimum number of rules that must have been learned before the induction of rules may be stopped.

Parameters:

min_rules – The minimum number of rules that must have been learned before the induction of rules may be stopped. Must be at least 1

Returns:

A PrePruningConfig that allows further configuration of the stopping criterion

set_num_current(num_current: int) PrePruningConfig

Sets the number of the most recent iterations that should be stored in a buffer.

Parameters:

num_current – The number of the most recent iterations that should be stored in a buffer. Must be at least 1

Returns:

A PrePruningConfig that allows further configuration of the stopping criterion

set_num_past(num_past: int) PrePruningConfig

Sets the number of past iterations that should be stored in a buffer.

Parameters:

num_past – The number of past iterations that should be stored in a buffer. Must be at least 1

Returns:

A PrePruningConfig that allows further configuration of the stopping criterion

set_remove_unused_rules(remove_unused_rules: bool) PrePruningConfig

Sets whether rules that have been induced, but are not used, should be removed from the final model or not.

Parameters:

remove_unused_rules – True, if unused rules should be removed from the model, False otherwise

Returns:

A PrePruningConfig that allows further configuration of the stopping criterion

set_stop_interval(stop_interval: int) PrePruningConfig

Sets the interval that should be used to decide whether the induction of rules should be stopped.

Parameters:

stop_interval – The interval that should be used to decide whether the induction of rules should be stopped, e.g., a value of 10 means that the rule induction might be stopped after 10, 20, … rules. Must be a multiple of the update interval

Returns:

A PrePruningConfig that allows further configuration of the stopping criterion

set_update_interval(update_interval: int) PrePruningConfig

Sets the interval that should be used to update the quality of the current model.

Parameters:

update_interval – The interval that should be used to update the quality of the current model, e.g., a value of 5 means that the model quality is assessed every 5 rules. Must be at least 1

Returns:

A PrePruningConfig that allows further configuration of the stopping criterion
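The interplay between the update interval and the stop interval can be sketched as follows. The helper `schedule` is hypothetical and only enumerates the rule counts at which the model quality is assessed and at which a stop decision is made, enforcing the documented constraint that the stop interval must be a multiple of the update interval.

```python
def schedule(num_rules, update_interval, stop_interval):
    """Returns the rule counts at which the quality is assessed and at which a stop decision is made."""
    if update_interval < 1 or stop_interval < 1:
        raise ValueError("update_interval and stop_interval must be at least 1")
    if stop_interval % update_interval != 0:
        raise ValueError("stop_interval must be a multiple of update_interval")
    updates = list(range(update_interval, num_rules + 1, update_interval))
    stops = list(range(stop_interval, num_rules + 1, stop_interval))
    return updates, stops


updates, stops = schedule(30, update_interval=5, stop_interval=10)
print(updates)  # [5, 10, 15, 20, 25, 30]
print(stops)    # [10, 20, 30]
```

Presumably, the multiple constraint exists so that every stop decision coincides with a freshly assessed model quality, which the output above reflects: each entry of `stops` also appears in `updates`.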

set_use_holdout_set(use_holdout_set: bool) PrePruningConfig

Sets whether the quality of the current model’s predictions should be measured on the holdout set, if available, or whether the training set should be used instead.

Parameters:

use_holdout_set – True, if the quality of the current model’s predictions should be measured on the holdout set, if available, False, if the training set should be used instead

Returns:

A PrePruningConfig that allows further configuration of the stopping criterion

class mlrl.common.cython.stopping_criterion.SizeStoppingCriterionConfig

Bases: object

Allows configuring a stopping criterion that ensures that the number of induced rules does not exceed a certain maximum.

get_max_rules() int

Returns the maximum number of rules that may be induced.

Returns:

The maximum number of rules that may be induced

set_max_rules(max_rules: int) SizeStoppingCriterionConfig

Sets the maximum number of rules that should be induced.

Parameters:

max_rules – The maximum number of rules that should be induced. Must be at least 1

Returns:

A SizeStoppingCriterionConfig that allows further configuration of the stopping criterion
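The effect of this criterion can be sketched with a simple loop. `induce_rules` is a hypothetical helper that draws rules from an arbitrary iterable standing in for the rule learner; it stops as soon as `max_rules` rules have been collected, which also works for an endless rule generator.

```python
from itertools import count


def induce_rules(candidate_rules, max_rules):
    """Collects rules from a (hypothetical) rule generator until `max_rules` rules have been induced."""
    if max_rules < 1:
        raise ValueError("max_rules must be at least 1")
    model = []
    for rule in candidate_rules:
        model.append(rule)
        if len(model) >= max_rules:
            break  # size stopping criterion met: no further rules are induced
    return model


print(induce_rules(count(), max_rules=3))  # [0, 1, 2]
```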

class mlrl.common.cython.stopping_criterion.TimeStoppingCriterionConfig

Bases: object

Allows configuring a stopping criterion that ensures that a certain time limit is not exceeded.

get_time_limit() int

Returns the time limit.

Returns:

The time limit in seconds

set_time_limit(time_limit: int) TimeStoppingCriterionConfig

Sets the time limit.

Parameters:

time_limit – The time limit in seconds. Must be at least 1

Returns:

A TimeStoppingCriterionConfig that allows further configuration of the stopping criterion
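A time limit can be sketched as follows, assuming that the elapsed time is checked between rules. `induce_rules_with_time_limit` and `induce_rule` are hypothetical names, and the clock is injectable so that the example is deterministic; with the default `time.monotonic`, wall-clock seconds are measured. Because the check only happens between rules, the last rule may finish slightly after the limit.

```python
import itertools
import time


def induce_rules_with_time_limit(induce_rule, time_limit, clock=time.monotonic):
    """Repeatedly calls the hypothetical `induce_rule` until `time_limit` seconds have elapsed."""
    if time_limit < 1:
        raise ValueError("time_limit must be at least 1")
    start = clock()
    rules = []
    # The limit is checked between rules, so the last rule may finish slightly after it.
    while clock() - start < time_limit:
        rules.append(induce_rule())
    return rules


# A fake clock that advances by 0.4 seconds per call keeps the example deterministic.
ticks = itertools.count(0, 0.4)
rules = induce_rules_with_time_limit(lambda: "rule", time_limit=1, clock=lambda: next(ticks))
print(len(rules))  # 2
```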