File heuristic.hpp

namespace seco
class IHeuristic
#include <heuristic.hpp>

Defines an interface for all heuristics that calculate numerical scores assessing the quality of rules, based on the elements of confusion matrices. Given the elements of a confusion matrix, such a heuristic calculates a numerical score in [0, 1].

All heuristics must be implemented as gain metrics, i.e., rules with a greater numerical score are considered better than those with a smaller numerical score.

All heuristics must treat positive and negative labels equally, i.e., if the ground truth and a rule’s predictions were inverted, the resulting numerical scores must remain unchanged.

Public Functions

inline virtual ~IHeuristic()
virtual float32 evaluateConfusionMatrix(float32 tp, float32 fp, float32 fn, float32 tn) const = 0

Calculates and returns a numerical score in [0, 1] given the elements of a confusion matrix. All elements must be equal to or greater than 0. If a rule does not cover any elements, i.e., if CIN + CIP + CRN + CRP == 0 (equivalently, tp + fp == 0), the worst possible quality 0 must be returned.

According to the notation in https://ke-tud.github.io/bibtex/attachments/single/432, a confusion matrix consists of 8 elements, namely CIN, CIP, CRN, CRP, UIN, UIP, URN and URP. The individual symbols used in this notation have the following meaning:

  • The first symbol denotes whether the corresponding labels are covered (C) or uncovered (U) by the rule.

  • The second symbol denotes relevant (R) or irrelevant (I) labels according to the ground truth.

  • The third symbol denotes labels for which the prediction of a rule is positive (P) or negative (N).

This results in the terminology given in the following table:

           | ground-   |           |
           | truth     | predicted |
-----------|-----------|-----------|-----
 covered   |         0 |         0 | CIN
           |         0 |         1 | CIP
           |         1 |         0 | CRN
           |         1 |         1 | CRP
-----------|-----------|-----------|-----
 uncovered |         0 |         0 | UIN
           |         0 |         1 | UIP
           |         1 |         0 | URN
           |         1 |         1 | URP

However, all common heuristics are calculated based on confusion matrices with 4 elements, denoted as TP, FP, TN and FN. We define these elements in terms of the previous notation:

  • True Positives: TP = CIN + CRP

  • False Positives: FP = CIP + CRN

  • True Negatives: TN = UIN + URP

  • False Negatives: FN = UIP + URN

Real numbers are used for the individual elements to cater for scenarios where real-valued weights are assigned to the individual labels or examples.
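The aggregation from the 8-element notation to the 4-element notation can be sketched as follows. This is only an illustration that follows the four sums exactly as listed above; the types `ConfusionMatrix8` and `ConfusionMatrix4` and the function `aggregate` are hypothetical names that are not part of heuristic.hpp, and `float32` is assumed to be an alias for a 32-bit float.

```cpp
#include <cassert>

// Assumption: the library defines float32 as a 32-bit float. It is redefined
// here only so that the sketch compiles on its own.
using float32 = float;

// Hypothetical container for the 8-element notation (not part of the library).
struct ConfusionMatrix8 {
    float32 cin, cip, crn, crp;  // covered labels
    float32 uin, uip, urn, urp;  // uncovered labels
};

// Hypothetical container for the 4-element notation (not part of the library).
struct ConfusionMatrix4 {
    float32 tp, fp, tn, fn;
};

// Sums the 8 elements into 4, using the definitions given in the text.
inline ConfusionMatrix4 aggregate(const ConfusionMatrix8& m) {
    // All elements must be equal to or greater than 0.
    assert(m.cin >= 0 && m.cip >= 0 && m.crn >= 0 && m.crp >= 0);
    assert(m.uin >= 0 && m.uip >= 0 && m.urn >= 0 && m.urp >= 0);
    ConfusionMatrix4 result;
    result.tp = m.cin + m.crp;
    result.fp = m.cip + m.crn;
    result.tn = m.uin + m.urp;
    result.fn = m.uip + m.urn;
    return result;
}
```

Because the elements are real numbers, the same function also works when each label or example contributes a fractional weight instead of a count of 1.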

Parameters:
  • tp – The number of true positives, i.e., the number of covered labels for which the prediction in the rule’s head is correct

  • fp – The number of false positives, i.e., the number of covered labels for which the prediction in the rule’s head is incorrect

  • fn – The number of false negatives, i.e., the number of uncovered labels for which the prediction in the rule’s head would be correct

  • tn – The number of true negatives, i.e., the number of uncovered labels for which the prediction in the rule’s head would be incorrect

Returns:

The quality that has been calculated
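As an illustration of the contract described above, a minimal implementation of the interface might look like the following sketch. It assumes that `float32` is an alias for a 32-bit float and declares a local copy of `IHeuristic` so that the example is self-contained. The class `PrecisionSketch` is a hypothetical name and is not the library's own precision heuristic.

```cpp
#include <cassert>

// Assumption: the library defines float32 as a 32-bit float.
using float32 = float;

// Local copy of the documented interface, so that the sketch compiles alone.
class IHeuristic {
    public:
        virtual ~IHeuristic() {}

        virtual float32 evaluateConfusionMatrix(float32 tp, float32 fp, float32 fn, float32 tn) const = 0;
};

// Hypothetical heuristic that scores a rule by the fraction of covered labels
// that are predicted correctly, i.e., tp / (tp + fp).
class PrecisionSketch final : public IHeuristic {
    public:
        float32 evaluateConfusionMatrix(float32 tp, float32 fp, float32 fn, float32 tn) const override {
            // All elements must be equal to or greater than 0.
            assert(tp >= 0 && fp >= 0 && fn >= 0 && tn >= 0);
            float32 numCovered = tp + fp;

            // A rule that covers nothing receives the worst possible quality.
            if (numCovered == 0) {
                return 0;
            }

            // Gain metric in [0, 1]: greater scores indicate better rules.
            return tp / numCovered;
        }
};
```

With tp = 3 and fp = 1, this sketch returns 0.75. With tp = 0 and fp = 0, it returns 0 regardless of fn and tn.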

class IHeuristicFactory
#include <heuristic.hpp>

Defines an interface for all factories that create instances of the type IHeuristic.

Public Functions

inline virtual ~IHeuristicFactory()
virtual std::unique_ptr<IHeuristic> create() const = 0

Creates and returns a new object of type IHeuristic.

Returns:

A unique pointer to an object of type IHeuristic that has been created
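A factory implementation could be sketched as shown below. The interfaces are redeclared locally so that the example compiles on its own, `float32` is assumed to be an alias for a 32-bit float, and `RecallSketch` and `RecallSketchFactory` are hypothetical names that do not correspond to classes in the library.

```cpp
#include <cassert>
#include <memory>

// Assumption: the library defines float32 as a 32-bit float.
using float32 = float;

// Local copies of the documented interfaces.
class IHeuristic {
    public:
        virtual ~IHeuristic() {}

        virtual float32 evaluateConfusionMatrix(float32 tp, float32 fp, float32 fn, float32 tn) const = 0;
};

class IHeuristicFactory {
    public:
        virtual ~IHeuristicFactory() {}

        virtual std::unique_ptr<IHeuristic> create() const = 0;
};

// Hypothetical heuristic that calculates tp / (tp + fn).
class RecallSketch final : public IHeuristic {
    public:
        float32 evaluateConfusionMatrix(float32 tp, float32 fp, float32 fn, float32 tn) const override {
            assert(tp >= 0 && fp >= 0 && fn >= 0 && tn >= 0);

            // Return the worst quality if nothing is covered, and avoid a
            // division by zero if tp + fn is 0.
            if (tp + fp == 0 || tp + fn == 0) {
                return 0;
            }

            return tp / (tp + fn);
        }
};

// Hypothetical factory that hands out a new heuristic on every call.
class RecallSketchFactory final : public IHeuristicFactory {
    public:
        std::unique_ptr<IHeuristic> create() const override {
            return std::make_unique<RecallSketch>();
        }
};
```

Returning a `std::unique_ptr` makes the caller the sole owner of each created heuristic, so the factory itself does not need to track the objects it has created.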

class IHeuristicConfig
#include <heuristic.hpp>

Defines an interface for all classes that configure a heuristic.

Subclassed by seco::AccuracyConfig, seco::FMeasureConfig, seco::LaplaceConfig, seco::MEstimateConfig, seco::PrecisionConfig, seco::RecallConfig, seco::WraConfig

Public Functions

inline virtual ~IHeuristicConfig()
virtual std::unique_ptr<IHeuristicFactory> createHeuristicFactory() const = 0

Creates and returns a new object of type IHeuristicFactory according to the specified configuration.

Returns:

A unique pointer to an object of type IHeuristicFactory that has been created
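The chain from configuration to factory to heuristic might be wired together as in the following sketch. All three interfaces are redeclared locally, `float32` is assumed to be an alias for a 32-bit float, and the classes `FBetaSketch`, `FBetaSketchFactory` and `FBetaSketchConfig`, including the `setBeta` function, are hypothetical. They are not the library's FMeasureConfig and only illustrate how a parameter could travel from a configuration object to the heuristic.

```cpp
#include <cassert>
#include <memory>

// Assumption: the library defines float32 as a 32-bit float.
using float32 = float;

// Local copies of the documented interfaces.
class IHeuristic {
    public:
        virtual ~IHeuristic() {}
        virtual float32 evaluateConfusionMatrix(float32 tp, float32 fp, float32 fn, float32 tn) const = 0;
};

class IHeuristicFactory {
    public:
        virtual ~IHeuristicFactory() {}
        virtual std::unique_ptr<IHeuristic> create() const = 0;
};

class IHeuristicConfig {
    public:
        virtual ~IHeuristicConfig() {}
        virtual std::unique_ptr<IHeuristicFactory> createHeuristicFactory() const = 0;
};

// Hypothetical heuristic: (1 + b^2) * tp / ((1 + b^2) * tp + b^2 * fn + fp).
class FBetaSketch final : public IHeuristic {
    private:
        float32 beta_;

    public:
        explicit FBetaSketch(float32 beta) : beta_(beta) {}

        float32 evaluateConfusionMatrix(float32 tp, float32 fp, float32 fn, float32 tn) const override {
            assert(tp >= 0 && fp >= 0 && fn >= 0 && tn >= 0);

            // A rule that covers nothing receives the worst possible quality.
            if (tp + fp == 0) {
                return 0;
            }

            float32 betaSquared = beta_ * beta_;
            float32 numerator = (1 + betaSquared) * tp;
            // The denominator is positive here, because tp + fp > 0.
            return numerator / (numerator + betaSquared * fn + fp);
        }
};

// Hypothetical factory that passes the configured parameter on.
class FBetaSketchFactory final : public IHeuristicFactory {
    private:
        float32 beta_;

    public:
        explicit FBetaSketchFactory(float32 beta) : beta_(beta) {}

        std::unique_ptr<IHeuristic> create() const override {
            return std::make_unique<FBetaSketch>(beta_);
        }
};

// Hypothetical configuration that stores the parameter until a factory is requested.
class FBetaSketchConfig final : public IHeuristicConfig {
    private:
        float32 beta_ = 1;

    public:
        void setBeta(float32 beta) {
            assert(beta >= 0);
            beta_ = beta;
        }

        std::unique_ptr<IHeuristicFactory> createHeuristicFactory() const override {
            return std::make_unique<FBetaSketchFactory>(beta_);
        }
};
```

In this arrangement the configuration object stays mutable until a factory is requested, while the factory and the heuristics it creates hold a fixed copy of the parameter.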