feat: add extras mechanism for finer-grained dependency selection
Inspired by: #36 (comment).

This commit adopts the extras (optional dependencies) mechanism of setuptools to work around limitations of specific dependencies
(e.g. no support for some Python interpreter versions).

The changes use the following conventions for extras names:

- `[all]`: install all dependencies from all extras
- `[X-sampler]`: install all dependencies needed for the X sampler to work
- `[X-loss]`: install all dependencies needed for the X loss function to work.

We do not yet have an example of the last item, but for forward compatibility of the nomenclature we keep the `-sampler` suffix.

E.g. for GPy, we could have an extra called `gp-sampler` that installs GPy on demand; GPy is not installed if the user does not need it.

This commit also includes a mechanism to handle import errors for dependencies of a component that are not installed.
The mechanism raises an exception with a useful error message
pointing to the extra that is missing from the user's local installation of black-it.
marcofavorito committed Mar 16, 2023
1 parent 3e09454 commit d954d9d
Showing 10 changed files with 230 additions and 9 deletions.
21 changes: 18 additions & 3 deletions README.md
@@ -38,19 +38,34 @@ matter of days, with no need to reimplement all the plumbings from scratch.

This project requires Python v3.8 or later.

To install the latest version of the package from [PyPI](https://pypi.org/project/black-it/):
To install the latest version of the package from [PyPI](https://pypi.org/project/black-it/), with all the extra dependencies (recommended):
```
pip install black-it
pip install "black-it[all]"
```

Or, directly from GitHub:

```
pip install git+https://github.com/bancaditalia/black-it.git#egg=black-it
pip install git+https://github.com/bancaditalia/black-it.git#egg="black-it[all]"
```

If you'd like to contribute to the package, please read the [CONTRIBUTING.md](./CONTRIBUTING.md) guide.

### Feature-specific Package Dependencies

We use the [optional dependencies mechanism of `setuptools`](https://setuptools.pypa.io/en/latest/userguide/dependency_management.html#optional-dependencies)
(also called _extras_) to allow users to avoid dependencies for features they don't use.

For the basic features of the package, you can install the `black-it` package without extras, e.g. `pip install black-it`.
However, for certain components, you will need to install some more extras using the syntax `pip install black-it[extra-1,extra-2,...]`.

For example, the [Gaussian Process Sampler](https://bancaditalia.github.io/black-it/samplers/#black_it.samplers.gaussian_process.GaussianProcessSampler)
depends on the Python package [`GPy`](https://github.com/SheffieldML/GPy/).
If the Gaussian Process sampler is not needed by your application, you can avoid its installation by just installing `black-it` as explained above.
However, if you need the sampler, you must install `black-it` with the `gp-sampler` extra: `pip install black-it[gp-sampler]`.

The special extra `all` will install all the dependencies.

## Quick Example

The GitHub repo of Black-it contains a series of ready-to-run calibration examples.
112 changes: 112 additions & 0 deletions black_it/_load_dependency.py
@@ -0,0 +1,112 @@
# Black-box ABM Calibration Kit (Black-it)
# Copyright (C) 2021-2023 Banca d'Italia
#
# This program is free software: you can redistribute it and/or modify
# it under the terms of the GNU Affero General Public License as
# published by the Free Software Foundation, either version 3 of the
# License, or (at your option) any later version.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU Affero General Public License for more details.
#
# You should have received a copy of the GNU Affero General Public License
# along with this program. If not, see <http://www.gnu.org/licenses/>.

"""
Python module to handle extras dependencies loading and import errors.
This is a private module of the library. There should be no point in using it directly from client code.
"""

import sys
from typing import Optional

# known extras and their dependencies
_GPY_PACKAGE_NAME = "GPy"
_GP_SAMPLER_EXTRA_NAME = "gp-sampler"

_XGBOOST_PACKAGE_NAME = "xgboost"
_XGBOOST_SAMPLER_EXTRA_NAME = "xgboost-sampler"


class DependencyNotInstalled(Exception):
    """Library exception for when a required dependency is not installed."""

    def __init__(self, component_name: str, package_name: str, extra_name: str) -> None:
        """Initialize the exception object."""
        message = (
            f"Cannot import package '{package_name}', required by component {component_name}. "
            f"To solve the issue, you can install the extra '{extra_name}': pip install black-it[{extra_name}]"
        )
        super().__init__(message)


class GPyNotSupportedOnPy311Exception(Exception):
    """Specific exception class for import error of GPy on Python 3.11."""

    __ERROR_MSG = (
        f"The GaussianProcessSampler depends on '{_GPY_PACKAGE_NAME}', which is not supported on Python 3.11; "
        f"see https://github.com/bancaditalia/black-it/issues/36"
    )

    def __init__(self) -> None:
        """Initialize the exception object."""
        super().__init__(self.__ERROR_MSG)


def _check_import_error_else_raise_exception(
    import_error: Optional[ImportError],
    component_name: str,
    package_name: str,
    black_it_extra_name: str,
) -> None:
    """
    Check an import error; raise the DependencyNotInstalled exception with a useful message.
    Args:
        import_error: the ImportError object generated by the failed attempt. If None, then no error occurred.
        component_name: the component for which the dependency is needed
        package_name: the Python package name of the dependency
        black_it_extra_name: the name of the black-it extra to install to solve the issue.
    """
    if import_error is None:
        # nothing to do.
        return

    # an import error happened; we need to raise error to the caller
    raise DependencyNotInstalled(component_name, package_name, black_it_extra_name)


def _check_gpy_import_error_else_raise_exception(
    import_error: Optional[ImportError],
    component_name: str,
    package_name: str,
    black_it_extra_name: str,
) -> None:
    """
    Check the GPy import error and, if an error occurred, raise an error with a useful message.
    We need to handle two cases:
    - the user is using Python 3.11: the GPy package cannot be installed there;
      see https://github.com/SheffieldML/GPy/issues/998
    - the user did not install the 'gp-sampler' extra.
    Args:
        import_error: the ImportError object generated by the failed attempt. If None, then no error occurred.
        component_name: the component for which the dependency is needed
        package_name: the Python package name of the dependency
        black_it_extra_name: the name of the black-it extra to install to solve the issue.
    """
    if import_error is None:
        # nothing to do.
        return

    # sys.version_info has five fields, so compare only (major, minor)
    if sys.version_info[:2] == (3, 11):
        raise GPyNotSupportedOnPy311Exception()

    _check_import_error_else_raise_exception(
        import_error, component_name, package_name, black_it_extra_name
    )
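
A note on the Python 3.11 check in `_check_gpy_import_error_else_raise_exception`: `sys.version_info` is a five-field named tuple (major, minor, micro, releaselevel, serial), so a test for one specific minor version has to look at the first two fields only. The sketch below illustrates this; `running_on` is a hypothetical helper name, not part of black-it.

```python
import sys


def running_on(major: int, minor: int) -> bool:
    """Return True if the interpreter is exactly Python <major>.<minor>."""
    # Only the first two fields (major, minor) are compared.
    return sys.version_info[:2] == (major, minor)


# The full five-field version tuple never equals a two-element tuple.
assert sys.version_info != (sys.version_info.major, sys.version_info.minor)
assert running_on(sys.version_info.major, sys.version_info.minor)
```

A range comparison such as `(3, 11) <= sys.version_info < (3, 12)` gives the same result, because tuples are compared lexicographically.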
25 changes: 23 additions & 2 deletions black_it/samplers/gaussian_process.py
@@ -20,14 +20,26 @@
from enum import Enum
from typing import Optional, Tuple, cast

import GPy
import numpy as np
from GPy.models import GPRegression
from numpy.typing import NDArray
from scipy.special import erfc # pylint: disable=no-name-in-module

from black_it._load_dependency import (
    _GP_SAMPLER_EXTRA_NAME,
    _GPY_PACKAGE_NAME,
    _check_import_error_else_raise_exception,
)
from black_it.samplers.surrogate import MLSurrogateSampler

_GPY_IMPORT_ERROR: Optional[ImportError]
try:
    import GPy
    from GPy.models import GPRegression
except ImportError as e:
    _GPY_IMPORT_ERROR = e
else:
    _GPY_IMPORT_ERROR = None


class _AcquisitionTypes(Enum):
    """Enumeration of allowed acquisition types."""
@@ -71,6 +83,8 @@ def __init__( # pylint: disable=too-many-arguments
            optimize_restarts: number of independent random trials of the optimization of the GP hyperparameters
            acquisition: type of acquisition function, it can be 'expected_improvement' or simply 'mean'
        """
        self.__check_gpy_import_error()

        self._validate_acquisition(acquisition)

        super().__init__(
@@ -81,6 +95,13 @@ def __init__( # pylint: disable=too-many-arguments
        self.acquisition = acquisition
        self._gpmodel: Optional[GPRegression] = None

    @classmethod
    def __check_gpy_import_error(cls) -> None:
        """Check if an import error happened while attempting to import the 'GPy' package."""
        _check_import_error_else_raise_exception(
            _GPY_IMPORT_ERROR, cls.__name__, _GPY_PACKAGE_NAME, _GP_SAMPLER_EXTRA_NAME
        )

    @staticmethod
    def _validate_acquisition(acquisition: str) -> None:
        """
25 changes: 24 additions & 1 deletion black_it/samplers/xgboost.py
@@ -19,15 +19,27 @@
from typing import Optional, cast

import numpy as np
import xgboost as xgb
from numpy.typing import NDArray

from black_it._load_dependency import (
    _XGBOOST_PACKAGE_NAME,
    _XGBOOST_SAMPLER_EXTRA_NAME,
    _check_import_error_else_raise_exception,
)
from black_it.samplers.surrogate import MLSurrogateSampler

MAX_FLOAT32 = np.finfo(np.float32).max
MIN_FLOAT32 = np.finfo(np.float32).min
EPS_FLOAT32 = np.finfo(np.float32).eps

_XGBOOST_IMPORT_ERROR: Optional[ImportError]
try:
    import xgboost as xgb
except ImportError as e:
    _XGBOOST_IMPORT_ERROR = e
else:
    _XGBOOST_IMPORT_ERROR = None


class XGBoostSampler(MLSurrogateSampler):
    """This class implements xgboost sampling."""
@@ -64,6 +76,7 @@ def __init__( # pylint: disable=too-many-arguments
        References:
            Lamperti, Roventini, and Sani, "Agent-based model calibration using machine learning surrogates"
        """
        self.__check_xgboost_import_error()
        super().__init__(
            batch_size, random_state, max_deduplication_passes, candidate_pool_size
        )
@@ -75,6 +88,16 @@ def __init__( # pylint: disable=too-many-arguments
        self._n_estimators = n_estimators
        self._xg_regressor: Optional[xgb.XGBRegressor] = None

    @classmethod
    def __check_xgboost_import_error(cls) -> None:
        """Check if an import error happened while attempting to import the 'xgboost' package."""
        _check_import_error_else_raise_exception(
            _XGBOOST_IMPORT_ERROR,
            cls.__name__,
            _XGBOOST_PACKAGE_NAME,
            _XGBOOST_SAMPLER_EXTRA_NAME,
        )

    @property
    def colsample_bytree(self) -> float:
        """Get the colsample_bytree parameter."""
7 changes: 5 additions & 2 deletions pyproject.toml
@@ -34,7 +34,6 @@ classifiers = [

[tool.poetry.dependencies]
python = ">=3.8,<3.11"
GPy = "^1.10.0"
ipywidgets = "^7.7.0"
matplotlib = "^3.5.2"
numpy = ">=1.23.3,<1.24.0"
Expand All @@ -43,7 +42,6 @@ scikit-learn = "^1.1.0"
seaborn = "^0.11.2"
statsmodels = "^0.13.2"
tables = "^3.7.0"
xgboost = "^1.7.2"

[tool.poetry.dev-dependencies]
bandit = "^1.7.4"
@@ -84,6 +82,11 @@ tox = "^3.25.0"
twine = "^4.0.0"
vulture = "^2.3"

[project.optional-dependencies]
gp-sampler = ["GPy~=1.10.0"]
xgboost-sampler = ["xgboost~=1.7.2"]
all = ["GPy~=1.10.0", "xgboost~=1.7.2"]

[build-system]
requires = ["poetry-core>=1.0.0"]
build-backend = "poetry.core.masonry.api"
4 changes: 4 additions & 0 deletions tests/test_calibrator.py
@@ -38,8 +38,12 @@
from black_it.search_space import SearchSpace

from .fixtures.test_models import NormalMV # type: ignore
from .utils.base import no_gpy_installed, no_python311_for_gpy, no_xgboost_installed


@no_python311_for_gpy
@no_gpy_installed
@no_xgboost_installed
class TestCalibrate: # pylint: disable=too-many-instance-attributes,attribute-defined-outside-init
    """Test the Calibrator.calibrate method."""
3 changes: 3 additions & 0 deletions tests/test_samplers/test_gaussian_process.py
@@ -22,6 +22,9 @@

from black_it.samplers.gaussian_process import GaussianProcessSampler, _AcquisitionTypes
from black_it.search_space import SearchSpace
from tests.utils.base import no_gpy_installed, no_python311_for_gpy

pytestmark = [no_python311_for_gpy, no_gpy_installed]


class TestGaussianProcess2D: # pylint: disable=attribute-defined-outside-init
4 changes: 4 additions & 0 deletions tests/test_samplers/test_xgboost.py
@@ -23,6 +23,10 @@
from black_it.search_space import SearchSpace

from ..fixtures.test_models import BH4 # type: ignore
from ..utils.base import no_xgboost_installed

pytestmark = no_xgboost_installed


expected_params = np.array([[0.24, 0.26], [0.26, 0.02], [0.08, 0.24], [0.15, 0.15]])

34 changes: 33 additions & 1 deletion tests/utils/base.py
@@ -16,14 +16,19 @@

"""Generic utility functions."""
import dataclasses
import importlib
import shutil
import signal
import subprocess # nosec B404
import sys
import types
from functools import wraps
from typing import Callable, List, Type, Union
from typing import Any, Callable, List, Optional, Type, Union

import pytest
from _pytest.mark.structures import MarkDecorator

from black_it._load_dependency import _GPY_PACKAGE_NAME, _XGBOOST_PACKAGE_NAME
from tests.conftest import DEFAULT_SUBPROCESS_TIMEOUT


@@ -170,3 +175,30 @@ def wrapper(*args, **kwargs): # type: ignore
        return wrapper

    return decorator


def try_import_else_none(module_name: str) -> Optional[types.ModuleType]:
    """Try to import a module; if it fails, return None."""
    try:
        return importlib.import_module(module_name)
    except ImportError:
        return None


def try_import_else_skip(package_name: str, **skipif_kwargs: Any) -> MarkDecorator:
    """Try to import the package; else skip the test(s)."""
    return pytest.mark.skipif(
        try_import_else_none(package_name) is None,
        reason=f"Cannot run the test because the package '{package_name}' is not installed",
        **skipif_kwargs,
    )


no_python311_for_gpy = pytest.mark.skipif(
    (3, 11) <= sys.version_info < (3, 12),
    reason="GPy not supported on Python 3.11, see: https://github.com/bancaditalia/black-it/issues/36",
)


no_gpy_installed = try_import_else_skip(_GPY_PACKAGE_NAME)
no_xgboost_installed = try_import_else_skip(_XGBOOST_PACKAGE_NAME)
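
The skip markers above detect a missing package by actually importing it. As a possible alternative (an assumption for illustration, not what this commit does), the presence of a top-level module can also be probed without importing it, using `importlib.util.find_spec` from the standard library. `is_installed` is a hypothetical helper name.

```python
import importlib.util


def is_installed(module_name: str) -> bool:
    """Return True if a top-level module can be located, without importing it."""
    # find_spec returns None when no finder can locate the top-level module.
    return importlib.util.find_spec(module_name) is not None
```

The trade-off is that a module may be locatable yet still fail at import time (for example, because of a broken binary dependency), so importing, as done in `try_import_else_none`, is the stricter check.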
4 changes: 4 additions & 0 deletions tox.ini
@@ -11,6 +11,10 @@ basepython = python3
[testenv]
setenv =
    PYTHONPATH = {toxinidir}
extras =
    all
    gp-sampler
    xgboost-sampler
deps =
    pytest>=7.1.2,<7.2.0
    pytest-cov>=3.0.0,<3.1.0
