Skip to content

Commit

Permalink
Split the two proposals and add comments from the call.
Browse files Browse the repository at this point in the history
  • Loading branch information
janosg committed Jul 9, 2024
1 parent dfab0a6 commit cd80345
Show file tree
Hide file tree
Showing 3 changed files with 175 additions and 109 deletions.
168 changes: 59 additions & 109 deletions docs/source/development/eep-02-typing.md
Original file line number Diff line number Diff line change
Expand Up @@ -6,7 +6,7 @@
+------------+------------------------------------------------------------------+
| Author | `Janos Gabler <https://github.com/janosg>`_ |
+------------+------------------------------------------------------------------+
| Status | Draft |
| Status | Accepted |
+------------+------------------------------------------------------------------+
| Type | Standards Track |
+------------+------------------------------------------------------------------+
Expand Down Expand Up @@ -36,12 +36,6 @@ which are dictionaries with a fixed set of required keys.
This enhancement proposal outlines how we can accommodate the changes needed to reap the
benefits of static typing without breaking users' code in too many places.

A few deprecations and breaking changes will, however, be unavoidable. Since we are
already interrupting users, we can use this deprecation cycle as a chance to better
align some names in estimagic with SciPy and other optimization libraries. This will
make it even easier for scipy users to switch to estimagic. These changes will be marked
as independent of the core proposal and summarized in [aligning names](aligning-names).

## Motivation and resources

- [Writing Python like it's Rust](https://kobzol.github.io/rust/python/2023/05/20/writing-python-like-its-rust.html).
Expand Down Expand Up @@ -115,8 +109,7 @@ i.e. `maximize`, `minimize`, `slice_plot`, `criterion_plot`, `params_plot`,

#### Current situation

The objective or criterion function is the function being optimized. Currently, it is
called `criterion` in estimagic.
The objective or criterion function is the function being optimized.

The same criterion function can work for scalar, least-squares and likelihood
optimizers. Moreover, a criterion function can return additional data that is stored in
Expand Down Expand Up @@ -194,9 +187,6 @@ and therefore omitted here.
`root_contributions`, `contributions` and `value` even though any of them could be
constructed out of the `root_contributions`. This redundancy of information means that
we need to check the consistency of all user provided function outputs.
- What we call `criterion` is called `fun` in `scipy.optimize.minimize`. Therefore,
users that come from SciPy need to change their code instead of just importing
`minimize` from a different package.

#### Proposal

Expand Down Expand Up @@ -268,12 +258,6 @@ def least_squares_sphere(params):
And analogous for scalar and likelihood functions, where again the `mark.scalar`
decorator is optional.

##### Renaming `criterion` to `fun`

On top of the described changes, we suggest to rename `criterion` to `fun` to align the
naming with `scipy.optimize`. Similarly, `criterion_kwargs` should be renamed to
`fun_kwargs`.

##### Optionally replace decorators by type hints

The purpose of the decorators is to tell us the output type of the criterion function.
Expand Down Expand Up @@ -775,51 +759,20 @@ Python variable names.
user wants to try out a grid of values for one tuning parameter while keeping all
other options constant.

**Secondary problems**

The following problems are not related to the specific goals of this enhancement
proposal but it might be a good idea to address them in the same deprecation cycle.

- In an effort to make everything very descriptive, some names got too long. For example
`"convergence.absolute_gradient_tolerance"` is very long but most people are so
familiar with reading `"gtol_abs"` (from SciPy and NLopt) that
`"convergence.gtol_abs"` would be a better name.
- It might have been a bad idea to harmonize default values for similar options that
appear in multiple optimizers. Sometimes, the options, while similar in spirit, are
defined slightly differently and usually algorithm developers will set all tuning
parameters to maximize performance on a benchmark set they care about. If we change
how options are handled in estimagic, we should consider just harmonizing names and
not default values.

#### Proposal

In total, we want to offer four entry points for the configuration of optimizers:

1. Instead of passing an `Algorithm` class (as described in
[Algorithm Selection](algorithm-selection)) the user can create an instance of their
selected algorithm. When creating the instance, they have autocompletion for all
options supported by the selected algorithm. `Algorithm`s are immutable.
1. Given an instance of an `Algorithm`, a user can easily create a modified copy of that
instance by using the `with_option` method.
1. We can provide additional methods `with_stopping` and `with_convergence` that call
`with_option` internally but provide two additional features:
1. They validate that the option is indeed a stopping/convergence criterion.
1. They allow to omit the `convergence_` or `stopping_` at the beginning of the
option name and can thus reduce repetition in the option names.
1. As before, the user can pass a global set of options to `maximize` or `minimize`. We
continue to support option dictionaries but also allow `AlgorithmOption` objects that
enable better autocomplete and immutability. We can construct them using a similar
pre-commit hook approach as discussed in [algorithm selection](algorithm-selection).
Global options override the options that were directly passed to an optimizer. For
consistency, `AlgorithmOptions` can offer the `with_stopping`, `with_convergence` and
`with_option` copy-constructors, so users can modify options safely. Probably, this
approach should be featured less prominently in the documentation as it offers no
guarantees that the specified options are compatible with the selected algorithm.
We want to offer multiple entry points for passing additional options to algorithms.
Users can pick the one that works best for their particular use-case. The current
solution remains valid but not recommended.

The previous example continues to work. Examples of the new possibilities are:
##### Configured algorithms

Instead of passing an `Algorithm` class (as described in
[Algorithm Selection](algorithm-selection)) the user can create an instance of their
selected algorithm. When creating the instance, they have autocompletion for all options
supported by the selected algorithm. `Algorithm`s are immutable.

```python
# configured algorithm
algo = em.algorithms.scipy_lbfgsb(
stopping_max_iterations=1000,
stopping_max_criterion_evaluations=1500,
Expand All @@ -832,6 +785,32 @@ minimize(
)
```

##### Copy constructors on algorithms

Given an instance of an `Algorithm`, a user can easily create a modified copy of that
instance by using the `with_option` method.

```python
# using copy constructors to create variants
base_algo = em.algorithms.fides(stopping_max_iterations=1000)
algorithms = [base_algo.with_option(initial_radius=r) for r in [0.1, 0.2, 0.5]]

for algo in algorithms:
minimize(
# ...
algorithm=algo,
# ...
)
```

We can provide additional methods `with_stopping` and `with_convergence` that call
`with_option` internally but provide two additional features:

1. They validate that the option is indeed a stopping/convergence criterion.
1. They allow to omit the `convergence_` or `stopping_` at the beginning of the option
name and can thus reduce repetition in the option names. This recreates the
namespaces we currently achieve with the dot notation:

```python
# using copy constructors for better namespaces
algo = (
Expand All @@ -852,21 +831,21 @@ minimize(
)
```

```python
# using copy constructors to create variants
base_algo = em.algorithms.fides(stopping_max_iterations=1000)
algorithms = [base_algo.with_option(initial_radius=r) for r in [0.1, 0.2, 0.5]]
##### Global option object

for algo in algorithms:
minimize(
# ...
algorithm=algo,
# ...
)
```
As before, the user can pass a global set of options to `maximize` or `minimize`. We
continue to support option dictionaries but also allow `AlgorithmOption` objects that
enable better autocomplete and immutability. We can construct them using a similar
pre-commit hook approach as discussed in [algorithm selection](algorithm-selection).
Global options override the options that were directly passed to an optimizer. For
consistency, `AlgorithmOptions` can offer the `with_stopping`, `with_convergence` and
`with_option` copy-constructors, so users can modify options safely. Probably, this
approach should be featured less prominently in the documentation as it offers no
guarantees that the specified options are compatible with the selected algorithm.

The previous example continues to work. Examples of the new possibilities are:

```python
# option object
options = em.AlgorithmOptions(
stopping_max_iterations=1000,
stopping_max_criterion_evaluations=1500,
Expand Down Expand Up @@ -938,7 +917,10 @@ returns a tuple of the criterion value and the derivative instead.

#### Proposal

We keep the three arguments but rename them to `fun`, `jac` and `fun_and_jac`.
```{note}
The following section uses the new names `fun`, `jac` and `fun_and_jac` instead of
`criterion`, `derivative` and `criterion_and_derivative`.
```

To improve the integration with modern automatic differentiation frameworks, `jac` or
`fun_and_jac` can also be a string `"jax"` or a more autocomplete friendly enum
Expand Down Expand Up @@ -1293,6 +1275,11 @@ also prepares adding a public `minimize` method that internally calls the
To make things more concrete, here are prototypes for components related to the
`InternalProblem` and `InternalOptimizeResult`.

```{note}
The names of the internal problem are already aligned with the new names for
the objective function and its derivatives.
```

```python
from numpy.typing import NDArray
from dataclasses import dataclass
Expand Down Expand Up @@ -1707,41 +1694,6 @@ does it.

The general structure of the documentation is not affected by this enhancement proposal.

(aligning-names)=

## Aligning names

### Suggested changes

| **Old Name** | **Proposed Name** | **Source** |
| ------------------------------------------ | ------------------------- | ---------- |
| `criterion` | `fun` | scipy |
| `criterion_kwargs` | `fun_kwargs` | |
| `derivative` | `jac` | scipy |
| `derivative_kwargs` | `jac_kwargs` | |
| `criterion_and_derivative` | `fun_and_jac` | |
| `criterion_and_derivative_kwargs` | `fun_and_jac_kwargs` | |
| `stopping_max_criterion_evaluations` | `stopping_maxfun` | scipy |
| `stopping_max_iterations` | `stopping_maxiter` | scipy |
| `convergence_absolute_criterion_tolerance` | `convergence_ftol_abs` | NlOpt |
| `convergence_relative_criterion_tolerance` | `convergence_ftol_rel` | NlOpt |
| `convergence_absolute_params_tolerance` | `convergence_xtol_abs` | NlOpt |
| `convergence_relative_params_tolerance` | `convergence_xtol_rel` | NlOpt |
| `convergence_absolute_gradient_tolerance` | `convergence_gtol_abs` | NlOpt |
| `convergence_relative_gradient_tolerance` | `convergence_gtol_rel` | NlOpt |
| `convergence_scaled_gradient_tolerance` | `convergence_gtol_scaled` | |

### Things we do not want to align

- We do not want to rename `algorithm` to `method` because our algorithm names are
different from scipy, so people who switch over from scipy need to adjust their code
anyways.
- We do not want to rename `algo_options` to `options` for the same reason.
- We do not want to rename `params` to `x0`. It would align estimagic more with scipy,
but `params` is just easier to pronounce and use as a word than `x0`. Moreover, `x0`
has the strong connotation that this is a vector and not a nested dictionary or other
pytree.

## Summary of breaking changes

- The internal algorithm interface changes completely without deprecations
Expand All @@ -1759,13 +1711,11 @@ The following deprecations become active in version `0.5.0`. The functionality w
removed in version `0.6.0` which should be scheduled for approximately half a year after
the realease of `0.5.0`.

- Multiple arguments of `minimize` and multiple `algo_options` get renamed to align them
better with SciPy and/or NlOpt. See [Aligning Names](aligning-names) for details.
- Returning a `dict` in the objective function io deprecated. Return `FunctionValue`
instead. In addition, likelihood and least-squares problems need to be decorated with
`em.mark.likelihood` and `em.mark_least_squares`.
- The arguments `lower_bounds`, `upper_bounds`, `soft_lower_bounds` and
`soft_uppper_bounds` are deprecated. Use `bounds` instead. `bounds` can be
`soft_upper_bounds` are deprecated. Use `bounds` instead. `bounds` can be
`estimagic.Bounds` or `scipy.optimize.Bounds` objects.
- Specifying constraints with dictionaries is deprecated. Use the corresponding subclass
of `em.constraints.Constraint` instead. In addition, all selection methods except for
Expand Down
115 changes: 115 additions & 0 deletions docs/source/development/eep-03-alignment.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,115 @@
(eepalignment)=

# EEP-03: Alignment with SciPy

```{eval-rst}
+------------+------------------------------------------------------------------+
| Author | `Janos Gabler <https://github.com/janosg>`_ |
+------------+------------------------------------------------------------------+
| Status | Accepted |
+------------+------------------------------------------------------------------+
| Type | Standards Track |
+------------+------------------------------------------------------------------+
| Created | 2024-07-09 |
+------------+------------------------------------------------------------------+
| Resolution | |
+------------+------------------------------------------------------------------+
```

## Abstract

This enhancement proposal explains how we will better align estimagic with
`scipy.minimize`. Scipy is the most widely used optimizer library in Python and most of
our new users are switching over from SciPy.

The goal is therefore simple: Make it as easy as possible for SciPy users to use
estimagic. In most cases this means that the only thing that has to be changed is the
import statement for the `minimize` function:

```python
# from scipy.optimize import minimize
from estimagic import minimize
```

## Design goals

- If we can make code written for SciPy run with estimagic, we should do so
- If we cannot make it run, the user should get a helpful error message that explains
how the code needs to be adjusted.

## Aligning names

| **Old Name** | **Proposed Name** | **Source** |
| ------------------------------------------ | ------------------------- | ---------- |
| `criterion` | `fun` | scipy |
| `criterion_kwargs` | `fun_kwargs` | |
| `params` | `x0` | |
| `derivative` | `jac` | scipy |
| `derivative_kwargs` | `jac_kwargs` | |
| `criterion_and_derivative` | `fun_and_jac` | |
| `criterion_and_derivative_kwargs` | `fun_and_jac_kwargs` | |
| `stopping_max_criterion_evaluations` | `stopping_maxfun` | scipy |
| `stopping_max_iterations` | `stopping_maxiter` | scipy |
| `convergence_absolute_criterion_tolerance` | `convergence_ftol_abs` | NlOpt |
| `convergence_relative_criterion_tolerance` | `convergence_ftol_rel` | NlOpt |
| `convergence_absolute_params_tolerance` | `convergence_xtol_abs` | NlOpt |
| `convergence_relative_params_tolerance` | `convergence_xtol_rel` | NlOpt |
| `convergence_absolute_gradient_tolerance` | `convergence_gtol_abs` | NlOpt |
| `convergence_relative_gradient_tolerance` | `convergence_gtol_rel` | NlOpt |
| `convergence_scaled_gradient_tolerance` | `convergence_gtol_scaled` | |

## Names we do not want to align

- We do not want to rename `algorithm` to `method` because our algorithm names are
different from SciPy, so people who switch over from SciPy need to adjust their code
anyways.
- We do not want to rename `algo_options` to `options` for the same reason.

Instead we can provide aliases for those.

## Additional aliases

To make it even easier for SciPy users to switch to estimagic, we can provide additional
aliases in `minimize` and `maximize` that let them used their SciPy code without changes
or help to adjust it by showing good error messages. The following arguments are
relevant:

- `method`: In SciPy this is used instead of `algorithm` to select the optimization
algorithm. We opted against simply renaming `algorithm` to `method` because our naming
scheme of algorithms is (and has to be) different from SciPy. By using `method`
instead of `algorithm`, users could select SciPy algorithms by their SciPy name. If
`method` and `algorithm` are both provided, they would get an error.
- `tol`: We do not want to support one `tol` argument for all kinds of different
convergence criteria but could raise an error for people who use it and point them to
the relevant parts of our documentation.
- `args`: we can support `args` as an alternative to `fun_kwargs`
- `options`: This is the SciPy counterpart to our `algo_options`. We do not want to
support this as our option names are different but we can provide a good error message
with pointers to our documentation if someone uses it.
- `hess` and `hessp`: Currently we don't support closed form hessians. If we support
them they will be called `hess`. In the meantime, this can raise a
`NotImplementedError`.
- `callback`: Currently we do not support `callback`s. If we support them they will be
called `callback` and be as compatible with SciPy as possible. In the meantime we can
raise a `NotImplementedError`.
- If a user sets `jac=True` we raise and error and explain how to use `fun_and_jac`
instead.

## Aligning default values

Currently we try to align default values for convergence criteria and other algorithm
options across algorithms and even across optimizer packages. This means that sometimes
algorithms that are used via estimagic produce different results than the same algorithm
used via SciPy or other packages.

Moreover, it is possible that we deviate from algorithm options that the original
authors carefully picked because they maximize performance on a relevant benchmark set.

I therefore propose that in the future we do not try to align algorithm options across
algorithms and packages.

## Implementation

All renamings are done with a careful deprecation cycle. The deprecations become active
in version `0.5.0`. Old names will be removed in version `0.6.0` which should be
scheduled for approximately half a year after the release of `0.5.0`.
1 change: 1 addition & 0 deletions docs/source/development/eeps.md
Original file line number Diff line number Diff line change
Expand Up @@ -13,4 +13,5 @@ maxdepth: 1
eep-00-governance-model.md
eep-01-pytrees.md
eep-02-typing.md
eep-03-alignment.md
```

0 comments on commit cd80345

Please sign in to comment.