Coding Standards¶
As is common for open source projects, where everyone is invited to contribute, we rely on coding standards to ensure that new code works as expected, does not break existing functionality, and adheres to the best practices we have agreed on. These coding standards are described below.
Testing the Code¶
To detect problems with the project’s source code early during development, the project comes with unit and integration tests for its C++ and Python code. If you want to execute all of these tests on your own system, you can use the following command:
./build tests
build.bat tests
This will result in all tests being run and their results being reported. The tests are run automatically for pull requests via Continuous Integration whenever relevant parts of the source code have been modified.
Note
If you want to execute the tests for the C++ or Python code independently, you can use the build target tests_cpp or tests_python instead of tests.
Important
Tests for the C++ code are only executed if the project has been compiled with testing support enabled. As described in the section Build Options, testing support is enabled by default if the GoogleTest framework is available on the system.
Using Multiple Workers¶
If the execution of Python tests should be parallelized across multiple workers, the environment variable NUM_WORKERS may be used:
NUM_WORKERS=4 ./build tests
$env:NUM_WORKERS = 4
build.bat tests
Note
If the environment variable NUM_WORKERS is set to the value auto, the number of workers is chosen automatically based on the number of available CPU cores.
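As a rough illustration of what the value auto could mean in practice, the following Python sketch resolves a worker count from the environment variable NUM_WORKERS. The function name resolve_num_workers and the fallback to a single worker when the variable is unset are assumptions made for this example; the build system's actual logic may differ.

```python
import os


def resolve_num_workers(env=None):
    """Hypothetical helper: derive a worker count from NUM_WORKERS."""
    env = os.environ if env is None else env
    value = env.get("NUM_WORKERS", "").strip()
    if not value:
        # Assumption: run sequentially when the variable is unset.
        return 1
    if value.lower() == "auto":
        # os.cpu_count() may return None, so fall back to one worker.
        return os.cpu_count() or 1
    workers = int(value)
    if workers < 1:
        raise ValueError("NUM_WORKERS must be a positive integer or 'auto'")
    return workers
```

Passing a dictionary instead of reading os.environ directly keeps the helper easy to try out, e.g. resolve_num_workers({"NUM_WORKERS": "auto"}).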
Failing Fast¶
If the execution should be aborted as soon as a single test fails, the environment variable FAIL_FAST can be used as shown below:
FAIL_FAST=true ./build tests
$env:FAIL_FAST = "true"
build.bat tests
Running Only Failed Tests¶
To run only the tests that failed during the last run (or all tests if none failed), the environment variable ONLY_FAILED can be specified:
ONLY_FAILED=true ./build tests
$env:ONLY_FAILED = "true"
build.bat tests
Running Only Selected Tests¶
It is also possible to only run the tests for certain subprojects (see Project Structure) by providing their names as a comma-separated list via the environment variable SUBPROJECTS:
SUBPROJECTS=boosting,seco ./build tests
$env:SUBPROJECTS = "boosting,seco"
build.bat tests
Alternatively, markers of the tests that should be run can be specified via the environment variable MARKERS. This provides more fine-grained control, but applies only to Python tests. The following markers are available: boosting, seco, classification, regression.
MARKERS=boosting,seco ./build tests
$env:MARKERS = "boosting,seco"
build.bat tests
Finally, it is also possible to run tests by their name. For this purpose, the environment variable TEST_NAME allows specifying a substring to search for in the names of tests. Only tests whose names contain the substring are run by the following command:
TEST_NAME=test_evaluation ./build tests
$env:TEST_NAME = "test_evaluation"
build.bat tests
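To make the selection by name more concrete, the following Python sketch filters a list of test names by a substring. It assumes a plain, case-sensitive substring match; the function select_tests and the example test names are hypothetical, and the actual test runner may match names differently.

```python
def select_tests(test_names, substring):
    """Hypothetical helper: keep only tests whose name contains `substring`."""
    return [name for name in test_names if substring in name]


# Hypothetical test names, used for illustration only.
all_tests = [
    "test_evaluation_train_test",
    "test_evaluation_cross_validation",
    "test_model_persistence",
]
selected = select_tests(all_tests, "test_evaluation")
```

Here, selected contains the two tests with test_evaluation in their name, whereas test_model_persistence is skipped.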
Overwriting Output Files¶
When using the build target tests_python, the environment variable OVERWRITE_OUTPUT_FILES may be utilized to overwrite the files in the directory python/tests/res/out/ with the actual output of the corresponding test cases:
OVERWRITE_OUTPUT_FILES=true ./build tests_python
$env:OVERWRITE_OUTPUT_FILES = "true"
build.bat tests_python
Code Style¶
We aim to enforce a consistent code style across the entire project. For this purpose, we employ the following tools.
C++ Source Files¶
For formatting the C++ code, we use clang-format. The desired C++ code style is defined in the file .clang-format.
In addition, cpplint is used for static code analysis. It is configured according to the file .cpplint.cfg.
Python Source Files¶
We use ruff for linting Python code and enforcing the code style defined in the file .ruff.toml.
For static type checking of Python code, we use mypy. It uses the configuration file .mypy.ini.
Cython Source Files¶
isort is used to keep the ordering of imports in Cython source files consistent according to the configuration file .isort.cfg.
To automatically detect and remove unused variables and imports, as well as unnecessary pass statements, in Cython code, we employ autoflake. The configuration of this tool can be found in the file .autoflake.toml.
For linting Cython source files, we rely on cython-lint.
Configuration Files¶
We employ config-formatter for formatting .cfg files.
For applying a consistent style to Markdown files, including those used for writing the documentation, we use mdformat.
We apply yamlfix to YAML files to enforce the code style defined in the file .yamlfix.toml.
We use taplo for validating and formatting TOML files according to the configuration file .taplo.toml.
If you have modified the project’s source code, you can check whether it adheres to our coding standards via the following command:
./build test_format
build.bat test_format
Note
If you want to check for compliance with the C++, Python or Cython code style independently, you can use the build target test_format_cpp, test_format_python or test_format_cython instead of test_format. Using the build target test_format_cfg, test_format_md, test_format_yaml or test_format_toml results in the style of .cfg, Markdown, YAML or TOML files being checked, respectively.
In order to automatically format the project’s source files according to our style guidelines, the following command can be used:
./build format
build.bat format
Note
If you want to format only the C++ source files, you can specify the build target format_cpp instead of format. Accordingly, the targets format_python and format_cython may be used to format only the Python or Cython source files. If you want to format .cfg, Markdown, YAML or TOML files, you should use the target format_cfg, format_md, format_yaml or format_toml, respectively.
Whenever any source files have been modified, a Continuous Integration job is run automatically to verify whether they adhere to our code style guidelines.
Versioning Scheme¶
We use Semantic Versioning to assign unique version numbers in the form MAJOR.MINOR.PATCH to the individual releases of our software packages. We refer to releases that come with an incremented major version as major releases. When a release increases the minor version, we refer to it as a feature release. Updates that include bugfixes or minor improvements come with an increased patch version and are referred to as bugfix releases.
Tip
An overview of past releases, together with a description of the changes they introduced compared to the previous version, can be found in the release notes.
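The terminology of the versioning scheme can be summarized in a few lines of Python. The sketch below classifies a release by comparing its version number with the previous one; the function names parse_version and release_type are chosen for this example and are not part of the project.

```python
def parse_version(version):
    """Split a 'MAJOR.MINOR.PATCH' string into a tuple of three integers."""
    major, minor, patch = (int(part) for part in version.split("."))
    return major, minor, patch


def release_type(previous, current):
    """Classify a release as 'major', 'feature' or 'bugfix'."""
    old, new = parse_version(previous), parse_version(current)
    if new <= old:
        raise ValueError("the new version must be greater than the previous one")
    if new[0] > old[0]:
        return "major"
    if new[1] > old[1]:
        return "feature"
    return "bugfix"
```

For example, going from 1.2.3 to 1.3.0 is classified as a feature release, while going from 1.2.3 to 1.2.4 is a bugfix release.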
Bugfix Releases¶
Bugfix releases are limited to backward-compatible changes, such as bug fixes, performance optimizations, improvements to the build system, or updates of the documentation. They are neither allowed to introduce any compatibility-breaking changes to the command line API, nor to any of the programmatic APIs in the project’s Python or C++ code.
Feature Releases¶
Feature releases may only come with changes that do not break compatibility with the command line API or programmatic APIs provided by previous versions. As a consequence, new functionality can be added to the algorithms provided by this project, as long as it does not break existing functionality. In contrast, the removal of features is only allowed in major releases.
Feature releases with the major version 0 are not obliged to maintain API compatibility, because these releases are considered to represent an early stage of development, where things may change drastically from one version to another.
Major Releases¶
Increments of the major version indicate big leaps in the software’s development. They are reserved for new versions of the software that introduce new functionality, fundamentally change how the software works, or come with compatibility-breaking changes. In general, major releases are not guaranteed to be compatible with past releases in any way. In particular, they may introduce compatibility-breaking API changes, affecting the command line API or programmatic APIs in the project’s Python or C++ code. Moreover, models that have been trained using an older version are not guaranteed to work after updating to a new major release and may have to be retrained from scratch.
Release Process¶
To enable releasing new major, feature, or bugfix releases at any time, we maintain a branch for each type of release:
main contains all changes that will be included in the next major release (including changes on the feature and bugfix branch).
feature comes with the changes that will be part of an upcoming feature release (including changes on the bugfix branch).
bugfix is restricted to minor changes that will be published as a bugfix release.
We do not allow directly pushing to the above branches. Instead, all changes must be submitted via pull requests and require certain checks to pass.
Downstream Merges¶
Once modifications to one of the branches have been merged, Continuous Integration jobs are used to automatically update downstream branches via pull requests. If all checks for such pull requests are successful, they are merged automatically. If there are any merge conflicts, they must be resolved manually. Following this procedure, changes to the feature branch are merged into the main branch, whereas changes to the bugfix branch are first merged into the feature branch and then into the main branch (see description of merge_feature.yml and merge_bugfix.yml in Automated Releases).
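The order of the downstream merges described above can be sketched as a simple lookup. The following Python snippet only models the direction of the merges; the names DOWNSTREAM and downstream_merges are hypothetical and do not reflect how the workflows merge_feature.yml and merge_bugfix.yml are implemented.

```python
# Each branch is merged into the branch it maps to.
DOWNSTREAM = {"bugfix": "feature", "feature": "main"}


def downstream_merges(branch):
    """Return the ordered (source, target) merges that follow a change to `branch`."""
    merges = []
    while branch in DOWNSTREAM:
        target = DOWNSTREAM[branch]
        merges.append((branch, target))
        branch = target
    return merges
```

A change to the bugfix branch therefore yields two merges, first into feature and then into main, whereas a change to main yields none.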
Triggering Releases¶
We use a Continuous Integration job for triggering a new release, including the changes of one of the branches mentioned above (see description of release.yml in Automated Releases). Depending on the release branch, the job automatically collects the corresponding changelog entries from the files changelog-main.md, changelog-feature.md, and changelog-bugfix.md and updates the file CHANGELOG.md in the project’s root directory accordingly. Afterward, it will publish the new release on GitHub, which will in turn trigger the publishing of pre-built packages (see description of publish.yml in Publishing Packages).
Upstream Merges¶
Whenever a new release has been published, the release branch is merged into the upstream branches (see description of merge_release.yml in Automated Releases), i.e., major releases result in the feature and bugfix branches being updated, whereas feature releases result in the bugfix branch being updated. The version of the release branch and the affected branches are updated accordingly. The file version in the project’s root directory specifies the version of each of these branches. Similarly, the file version-dev keeps track of the version number used for development releases (see description of publish_development.yml in Publishing Packages).
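Which branches are updated after a release follows from the order of the branches. The Python sketch below derives them from that order; the names BRANCH_ORDER and branches_updated_after_release are assumptions made for this example and do not correspond to the contents of merge_release.yml.

```python
# Branches ordered from the most restricted to the most inclusive one.
BRANCH_ORDER = ["bugfix", "feature", "main"]


def branches_updated_after_release(release_branch):
    """Return the branches that receive the release branch after a release."""
    index = BRANCH_ORDER.index(release_branch)
    # All branches that come before the release branch are updated,
    # starting with the one closest to it.
    return list(reversed(BRANCH_ORDER[:index]))
```

A release from main updates feature and bugfix, a release from feature updates only bugfix, and a release from bugfix updates no other branch.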
Dependencies¶
Adding dependencies to a software project always comes at a cost. Maintainers need to continuously test their software as new versions of dependencies are released and major changes in their APIs may break existing functionality. For this reason, we try to keep the number of dependencies at a minimum.
That being said, we still rely on several dependencies for Continuous Integration, compiling our source code, generating the documentation, or running the algorithms provided by this project. When using pre-built packages from PyPI, there is no need to care about these dependencies, as they are already included in the packages. When building from source, dependencies are automatically installed by the build system once they are needed, unless explicitly stated in the documentation.
Supported Python Versions¶
The packages provided by this project are built from source code that must be compiled for specific Python versions. As a consequence, they do not work with arbitrary Python releases, but can only be used with the versions they have been built for. The supported Python versions are stored in the file version-python and are regularly updated as new releases of Python are published. The following command provided by the project’s build system updates the supported versions in the file version-python to include the latest Python release:
./build update_python_version
build.bat update_python_version
Python Dependencies¶
Python dependencies that are required by different aspects of the project, such as the build system, the documentation, or our own Python code, are defined in separate requirements.txt and pyproject.template.toml files. For dependencies that use Semantic Versioning, we specify the earliest and latest version we support. For other dependencies, we require a specific version number. This approach strives for a balance between flexibility for users and convenience for developers. On the one hand, supporting a range of versions gives users more freedom, as our packages can be used more flexibly together with other packages that rely on the same dependencies. On the other hand, the project’s maintainers do not need to manually update dependencies when a minor release is published, whereas major updates still require manual intervention.
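The effect of supporting a range of versions can be illustrated with a short Python sketch. It assumes that a range consists of an inclusive lower bound and an exclusive upper bound and that all versions have three numeric components; the function names and the version numbers are hypothetical and not taken from the project's requirements files.

```python
def version_tuple(version):
    """Convert a 'MAJOR.MINOR.PATCH' string into a comparable tuple."""
    return tuple(int(part) for part in version.split("."))


def is_within_range(version, lower, upper):
    """Check whether lower <= version < upper (assumed range semantics)."""
    return version_tuple(lower) <= version_tuple(version) < version_tuple(upper)


# A new minor release stays within the range, a new major release does not.
minor_update_ok = is_within_range("1.4.2", "1.2.0", "2.0.0")
major_update_ok = is_within_range("2.0.0", "1.2.0", "2.0.0")
```

Under these assumptions, a minor update of a dependency is picked up without touching the requirements, whereas a major update falls outside the range until a maintainer widens it.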
To ease the life of developers, the following command provided by the project’s build system may be used to update the versions of outdated dependencies:
./build update_dependencies
build.bat update_dependencies
Note
If you want to restrict the above commands to the build-time dependencies, required by the project’s build system, or the runtime dependencies, required for running its algorithms, you can use the targets update_build_dependencies and update_runtime_dependencies instead.
GitHub Actions¶
Our Continuous Integration (CI) jobs heavily rely on so-called Actions, which are reusable building blocks provided by third-party developers. As with all dependencies, updates to these Actions may introduce breaking changes. To reduce the risk of updates breaking our CI jobs, we pin the Actions to a certain version. Usually, we only restrict the major version required by a job, rather than specifying a specific version. This allows minor updates, which are less likely to cause problems, to take effect without manual intervention.
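The following Python sketch illustrates why restricting only the major version lets minor updates take effect automatically. It assumes release tags of the form v4.1.2 and pins of the form v4; the function names are hypothetical.

```python
def major_version(tag):
    """Extract the major version from a tag such as 'v4.1.2' or 'v4'."""
    return int(tag.lstrip("v").split(".")[0])


def satisfies_pin(release_tag, pinned_tag):
    """Check whether a release is covered by a pin on the major version."""
    return major_version(release_tag) == major_version(pinned_tag)
```

With a pin of v4, a release tagged v4.1.2 is covered, whereas v5.0.0 is not and requires the pin to be updated.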
The project’s build system can automatically check for outdated Actions used by the project’s CI jobs. The following command may be used to update the versions of outdated Actions automatically:
./build update_github_actions
build.bat update_github_actions
Note
The above commands query the GitHub API for the latest version of relevant GitHub Actions. You can optionally specify an API token to be used for these queries via the environment variable GITHUB_TOKEN. If no token is provided, repeated requests might fail due to GitHub’s rate limit.
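To show how an optional token typically enters such a query, the following Python sketch prepares a request for the latest release of a repository via GitHub's REST API. It is not the build system's implementation: the function build_latest_release_request is hypothetical, and the request is only constructed, not sent.

```python
import os
import urllib.request


def build_latest_release_request(repository, env=None):
    """Prepare a request for the latest release of `repository` ("owner/name")."""
    env = os.environ if env is None else env
    url = f"https://api.github.com/repos/{repository}/releases/latest"
    request = urllib.request.Request(url)
    request.add_header("Accept", "application/vnd.github+json")
    token = env.get("GITHUB_TOKEN")
    if token:
        # Authenticated requests are subject to a higher rate limit.
        request.add_header("Authorization", f"Bearer {token}")
    return request
```

Without a token, the request is sent anonymously and counts toward the stricter rate limit for unauthenticated clients.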
GitHub Runners¶
For running Continuous Integration (CI) jobs, we use runners hosted by GitHub. Runners are available for different operating systems and architectures, which is particularly relevant when building packages for the various target platforms we support. To avoid breaking the build process when GitHub updates its runners, we specify the exact version required by a particular CI job.
Our build system provides the following command to update the versions of outdated runners automatically:
./build update_github_runners
build.bat update_github_runners