Init the refactoring #143

Merged
merged 101 commits into from Apr 11, 2024
Conversation

@nhuet (Contributor) commented Mar 14, 2024

Full refactoring of the library

The main goal is to ease the introduction of custom layers by users, and thus the implementation of new decomon layers.

Main features:

  • backward and forward layers are merged into a single DecomonLayer
  • To implement a new decomon layer, one only has to implement (a minimal sketch follows this list):
    • linear (or affine) case: get_affine_representation() -> w, b so that layer(x) can be written as x * w + b
    • generic case:
      • forward_ibp_propagate(): propagation of constant bounds through the layer
      • get_affine_relaxation() -> w_l, b_l, w_u, b_u such that x * w_l + b_l <= layer(x) <= x * w_u + b_u
        See the tutorial tutorials/z_Advanced/custom_layer.ipynb for more details.
  • a keras model can now have a multidimensional input (previously it needed to be flattened), but it is still supposed to have a single input (though potentially several outputs)
  • a keras model can include submodels, which must also have a single input and a single output. The backward propagation is now managed so that this does not decrease the tightness of the bounds, as the resulting model is unwrapped as if it were a single model.
  • The affine bounds can have several representations:
    • diagonal: the weights w are represented (batch element by batch element) by their "diagonal" (potentially multid) so that, for a given batch element,
      w_generic = reshape(diag(ravel(w_diag)), w_diag.shape + w_diag.shape)
      In that case, w.shape == b.shape
    • from_linear: this means that the affine bounds come from a linear layer/model, and thus
      • they are independent of the batch => no batch axis
      • upper bound == lower bound
    • a mix of both
    • generic: batch axis + full shape
  • a dedicated function batch_multi_dot manages the product x * w. This is a batch-by-batch dot product of tensors allowing x to be multi-dimensional. This function is responsible for managing all the above-mentioned representations of w:
    • if x is flattened and w generic, this is the keras batch_dot() or Dot layer call
    • if w is from a linear layer, this is keras tensordot()
    • if w is diagonal and from a linear layer, this is a mere element-wise product
    • the same function is also used to merge affine bounds and "multiply" weight tensors together
  • ForwardMode is replaced by booleans ibp and affine (to unify the existing mix)
  • All operations needed to create inputs, merge bounds, and convert formats are implemented as dedicated layers to keep a clean graph for decomon models => Fuse, ReduceCrownBounds, ConvertOutput, ForwardInput, BackwardInput, DecomonOracle, ...
  • forward layers no longer pass the perturbation domain input along in their output, and they take it as input only in affine mode:
    • if both affine and ibp bounds are computed => to get the tightest bound between the ibp bounds and the affine bounds turned into constant bounds
    • if affine for a non-linear layer => to get the forward oracle (constant bounds on keras inputs) from the previous layer
  • backward layers no longer take the forward layer output directly as input, but rather:
    • the previous backward layers' outputs (bounds to propagate)
    • oracle bounds if needed (i.e. constant bounds on the keras layer input).
      These oracle bounds come from dedicated DecomonOracle layers that convert either the output of a forward layer from a first forward conversion, or the affine bounds from a sub-crown launched starting from the layer of interest. In the future, other oracles could be used to inject external information on the bounds of keras layers' inputs.
  • decomon layers' inputs and outputs are flat (to be consistent with decomon models, which follow the keras model api), but internal methods use split inputs/outputs. These conversions are managed by InputsOutputsSpecs, which is also responsible for detecting the representation of affine bounds.
  • backward conversion for merge layers is now handled
  • keras models with multiple outputs are now handled and tested
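
As mentioned in the list above, here is a minimal sketch of the custom-layer contract for the linear case. It is self-contained (NumPy stand-ins, no real DecomonLayer base class); the method name get_affine_representation() comes from the description above, but the hypothetical layer and the actual base-class signatures are illustrative assumptions.

```python
# Hedged sketch: a hypothetical elementwise layer computing x * scale + shift,
# together with the affine representation its decomon counterpart would expose.
import numpy as np

class SketchDecomonScale:
    linear = True  # linear case: only the affine representation is needed

    def __init__(self, scale: np.ndarray, shift: np.ndarray):
        self.scale = scale
        self.shift = shift

    def get_affine_representation(self):
        # Return (w, b) such that layer(x) == x * w + b.
        # The layer is elementwise, so w is stored in diagonal form
        # (w.shape == b.shape, see the representations listed above).
        return self.scale, self.shift

x = np.array([1.0, 2.0, 3.0])
layer = SketchDecomonScale(scale=np.full(3, 2.0), shift=np.zeros(3))
w, b = layer.get_affine_representation()
assert np.allclose(x * w + b, 2.0 * x)  # matches layer(x)
```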

Goodies:

  • a dedicated plot_model() allows showing the graph of a decomon model (a usage sketch follows this list). This is a modified version of the original keras plot_model() (and thus still works on plain keras models) that
    • for each DecomonLayer, displays
      • the name of the underlying keras layer
      • the direction of propagation with a distinct color
    • displays other utility decomon layers in a separate color
    • accepts arguments to specify other attributes to show and to choose their colors => see tests/visu-decomon.ipynb for more details
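
A usage sketch: the exact import path of decomon's plot_model is not stated above, so this runnable example calls the plain keras original that it modifies (the decomon version is described as a drop-in that adds propagation colors). Requires pydot and graphviz.

```python
# Hedged sketch using keras.utils.plot_model, which the decomon version modifies.
import keras

model = keras.Sequential(
    [
        keras.Input(shape=(8,)),
        keras.layers.Dense(4, activation="relu"),
        keras.layers.Dense(1),
    ]
)
keras.utils.plot_model(model, to_file="model.png", show_shapes=True)
```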

To be continued:

  • the core architecture has been set
  • some layers have been implemented: Dense, Activation with linear, relu, or softsign activation functions, Add
  • other layers are still to be reimplemented

@nhuet nhuet marked this pull request as draft March 14, 2024 10:18
@nhuet nhuet force-pushed the refactoring branch 4 times, most recently from a246ea9 to 3500ee8 on March 18, 2024 at 22:24
nhuet added 25 commits March 19, 2024 11:34
Starting from python 3.9, we can
- replace Tuple, Dict, List, Type from typing with the builtins tuple, dict, list, type
- replace typing.Callable with collections.abc.Callable
- replace typing.Sequence with collections.abc.Sequence

See https://peps.python.org/pep-0585/
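
A minimal example of the change (standard PEP 585 style, not code from this PR):

```python
# Python 3.9+: builtin generics and collections.abc replace typing aliases.
from collections.abc import Callable, Sequence

def apply_all(funcs: list[Callable[[int], int]], xs: Sequence[int]) -> dict[int, int]:
    # Later functions overwrite earlier ones for the same x.
    return {x: f(x) for f in funcs for x in xs}

print(apply_all([lambda x: x + 1], range(3)))  # {0: 1, 1: 2, 2: 3}
```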
This will reflect the structure of keras.layers.

For instance
- Dense can be found in keras.layers.core.dense,
- DecomonDense will be found in decomon.layers.core.dense
Some imports are not working anymore because of the partial refactoring
With the refactoring, some code is not relevant anymore.
This is a multid equivalent of batch_dot() on tensors.
We perform a dot product on tensors by summing over a range of axes
instead of a single axis.

In the first tensor, we perform it on the last axes, starting from a
given one.
In the second tensor, we perform it on the first axes (after the batch
axis), ending at a given one.

The option `missing_batchsize` is used to apply batch_multi_dot even
when the batch axis is missing for one of the entries. (In that case, we
use keras.ops.tensordot under the hood.)
This can be useful when combining affine bounds having a batch size
with a layer affine representation without the batch dimension.

By default, the number of axes to merge is the number of non-batch axes
in the first arg, so that a linear operator represented by tensor `w`
operates on an input tensor `x` as `batch_multi_dot(x, w,
missing_batchsize=(False, True))`.
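
A self-contained sketch of these semantics in NumPy (the real batch_multi_dot may differ in details such as argument names):

```python
# Multi-axis, batch-aware dot product, as described in the commit message above.
import numpy as np

batch, h, w_dim, out = 5, 2, 3, 4
x = np.random.rand(batch, h, w_dim)  # batched, multid input

# missing_batchsize=(False, True): w has no batch axis (linear layer case),
# so the product reduces to a tensordot over x's non-batch axes.
w_linear = np.random.rand(h, w_dim, out)
y = np.tensordot(x, w_linear, axes=([1, 2], [0, 1]))
assert y.shape == (batch, out)

# Fully batched w: the same product becomes a batch-by-batch multi-axis dot.
w_batched = np.random.rand(batch, h, w_dim, out)
y2 = np.einsum("bhw,bhwo->bo", x, w_batched)
assert y2.shape == (batch, out)
```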
The same layer class is now used for forward and backward propagation.

The goal is to simplify the implementation of the decomon version
of a custom layer for users wanting to apply decomon algorithms to a
model involving such a custom layer.

When implementing a new decomon layer, it is sufficient to implement
- get_affine_bounds(): returns an affine relaxation of the keras layer
- get_ibp_bounds(): returns a constant relaxation of the keras layer

In the case of a linear keras layer, one can set the class attribute
`linear` to True and then implement get_affine_representation() instead
of get_affine_bounds(). Some computations will be simplified in this
case. In particular, no oracle bounds are needed to propagate affine
bounds.

One can also, for performance reasons, directly override
forward_affine_propagate() and backward_affine_propagate() instead of
implementing get_affine_bounds().

The main attributes of the decomon layers are
- layer: the underlying keras layer
- perturbation_domain: the domain over which we propagate bounds
- ibp: whether we propagate constant bounds (forward only)
- affine: whether we propagate affine bounds (meaningless for backward)
- propagation: forward or backward, the direction of bounds propagation

The ibp and affine booleans replace the previous modes, as they seem to be
the relevant variables when testing inside decomon layer methods.
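
As an illustration of these two methods, here is a self-contained NumPy sketch for an elementwise relu, given oracle bounds (l, u) on the layer input. The triangle upper bound is the classic relaxation; the lower-bound slope choice is one common heuristic, not necessarily decomon's.

```python
import numpy as np

def get_ibp_bounds(l, u):
    # Constant relaxation: relu is monotonic, so apply it to the bounds.
    return np.maximum(l, 0.0), np.maximum(u, 0.0)

def get_affine_bounds(l, u):
    # Affine relaxation in diagonal form:
    # w_l * x + b_l <= relu(x) <= w_u * x + b_u for all x in [l, u].
    w_u = np.where(u <= 0, 0.0, np.where(l >= 0, 1.0, u / np.maximum(u - l, 1e-12)))
    b_u = np.where((l < 0) & (u > 0), -w_u * l, 0.0)
    w_l = np.where(u <= 0, 0.0, np.where(l >= 0, 1.0, (np.abs(u) >= np.abs(l)).astype(float)))
    b_l = np.zeros_like(l)
    return w_l, b_l, w_u, b_u

l, u = np.array([-1.0, 0.5, -2.0]), np.array([1.0, 2.0, -0.5])
w_l, b_l, w_u, b_u = get_affine_bounds(l, u)
for x in (l, u, (l + u) / 2):  # soundness check at a few points of the box
    assert np.all(w_l * x + b_l <= np.maximum(x, 0.0) + 1e-9)
    assert np.all(np.maximum(x, 0.0) <= w_u * x + b_u + 1e-9)
```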
This was taking model_input_dim under the hypothesis that the model input
was flattened. This change allows dropping that hypothesis.
We update get_lower_box and get_upper_box so that we can use
tensor-multid (like images) for x_min and x_max.

For now,
- we do not update get_lower_ball
- we do not treat the case where w.shape == b.shape
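
For context, a NumPy sketch of the box-bound formula such functions rely on (ignoring the batch axis; get_lower_box's real signature is not shown here, so the one below is illustrative):

```python
import numpy as np

def lower_over_box(x_min, x_max, w, b):
    # min over x in [x_min, x_max] of (x . w + b)
    #   = x_min . w+  +  x_max . w-  +  b, with w+ = max(w, 0), w- = min(w, 0).
    axes = tuple(range(x_min.ndim))  # sum over every (possibly multid) input axis
    w_pos, w_neg = np.maximum(w, 0.0), np.minimum(w, 0.0)
    return (
        np.tensordot(x_min, w_pos, axes=(axes, axes))
        + np.tensordot(x_max, w_neg, axes=(axes, axes))
        + b
    )

x_min, x_max = np.zeros((2, 2)), np.ones((2, 2))  # image-like 2x2 input box
w, b = np.random.randn(2, 2, 3), np.zeros(3)      # affine map to 3 outputs
print(lower_over_box(x_min, x_max, w, b))         # shape (3,)
```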
We only implement get_affine_representation().
We could probably compute faster, for tensor-multid inputs,
by avoiding artificially creating a "big" weights representation,
and working directly with the kernel itself.
Less naive version where we directly override
forward_ibp_propagate, forward_affine_propagate, and
backward_affine_propagate to avoid artificially repeating the kernel in
the case of tensor-multid inputs.
"Diagonal" tensors can be represented by their diagonal. In that case,
batch_multid_dot simplifies and results in a mere element-wise product,
with the correct broadcasting.

Diagonal tensors are tensors that can be represented, batch element by
batch element, as

x_full = K.reshape(K.diag(K.ravel(x_diag)), x_diag.shape + x_diag.shape)

x_diag being their "diagonal" representation that can be multid.

It will be useful when we represent an affine operator by (w, b) with
weights tensor w of the same shape as bias tensor b.
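
A quick NumPy check of this representation: contracting against the full tensor is the same as a mere elementwise product with the diagonal.

```python
import numpy as np

x_diag = np.arange(1.0, 7.0).reshape(2, 3)  # multid "diagonal" representation
x_full = np.reshape(np.diag(np.ravel(x_diag)), x_diag.shape + x_diag.shape)

v = np.random.rand(2, 3)
full_dot = np.tensordot(v, x_full, axes=([0, 1], [0, 1]))  # generic multi-dot
assert np.allclose(full_dot, v * x_diag)                   # elementwise product
```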
When layer affine bounds or affine bounds to propagate are represented
in diagonal mode (ie w.shape == b.shape), we need to specify it to
batch_multid_dot.

For the non-naive DecomonDense implementation, this is not a trivial task
in backward mode, so it is not yet implemented.
By convention, empty affine bounds mean identity bounds,
i.e. w_l = w_u = identity, b_l = b_u = 0;
thus we return the other bounds unchanged in that case.
… types

- empty affine bounds => identity bounds
- diagonal bounds
- bounds w/o batch
…ounds

For empty affine bounds (i.e. identity), we add get_lower_x and
get_upper_x methods that return lower and upper bounds on the model
input x.

We implement them in the box case.
We also take care of diagonal / w/o-batchsize inputs in the box case.

Ball perturbation domains are to be completed later.
nhuet added 9 commits March 19, 2024 11:56
Also add a notebook showing graphs of several decomon models,
with an example of customization showing more attributes and changing
the color of a specific layer.
- InputsOutputsSpecs:
    - remove all methods from the previous api
    - remove the perturbation_domain attribute
    - move into decomon.layers.inputs_outputs_specs
- ForwardMode: removed, replaced by the ibp + affine booleans
- PerturbationDomain and related stuff -> decomon.perturbation_domain
- decomon.core -> decomon.constants: contains only enumerations
- decomon.keras_utils: remove unused operations like BatchDiag and
  BatchIdentity
- decomon.utils: keep only activation relaxations, moved into
  decomon.layers.activations.utils. Remove get_linear_hull_s_shape(),
  which relies on the old inputs_outputs_specs api (and thus needs
  ForwardMode => cannot be imported)
- decomon.metrics: removed as it relies on the previous api and would
  fail to import (needs ForwardMode)
- decomon.wrappers: kept but untested. Needs to be adapted
- decomon.wrappers_with_tuning: removed as it needs decomon.metrics
- decomon.models.crown: removed. Specific layers for conversion (like
  ReduceCrown, Fuse, ...) are now in dedicated modules within
  decomon.layers
@nhuet nhuet force-pushed the refactoring branch 2 times, most recently from 7e2c25b to d3cbec2 on March 19, 2024 at 13:11
@nhuet nhuet marked this pull request as ready for review March 19, 2024 14:04
@ducoffeM ducoffeM merged commit 3d3162e into airbus:refactor Apr 11, 2024
16 checks passed
@nhuet nhuet deleted the refactoring branch April 15, 2024 15:30