Developer Guide Transforms
Note: transform design is changing due to work for issue #4855. This documentation will be updated in due course, but for now, the proposed design changes can be found here.
Transforms (or preprocessors) are callables that convert input data into a form that is ready to be consumed by deep learning models. In general, a transform can also have internal states, so that calling it with different data inputs will give consistently processed outputs.
In MONAI, the transform takes the following pattern:
class Transform:
    def __init__(self, system_params):
        # set states using system parameters
        self.some_states = ...  # from system_params

    def __call__(self, input_data, data_params):
        # using self.some_states and data_params
        # process data and return output_data
A typical usage of Transform is:
transform = Transform(system_params) # construct a transform instance
output_data = transform(input_data, data_params) # apply the transform
Users and developers interact directly with these interfaces.
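As a concrete illustration of this pattern, here is a minimal sketch. The class ScaleAndShift and its parameters factor and offset are hypothetical names chosen for this example and are not part of MONAI; factor stands in for system_params and offset for data_params.

```python
import numpy as np


class ScaleAndShift:
    """Hypothetical transform for illustration only (not a MONAI class)."""

    def __init__(self, factor):
        # system parameter: known and fixed before any data is seen
        self.factor = factor

    def __call__(self, input_data, offset=0.0):
        # `offset` plays the role of data_params: it is supplied per call
        return np.asarray(input_data) * self.factor + offset


transform = ScaleAndShift(factor=2.0)  # construct a transform instance
output_data = transform(np.array([1.0, 2.0, 3.0]), offset=1.0)  # apply it
print(output_data)  # [3. 5. 7.]
```

The static value lives on the instance, while the per-call value is passed alongside the data, which mirrors the split between the constructor and the call method above.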
With the goal of readability and flexibility, the following sections discuss assumptions and designs of the interfaces.
- transform's system parameters
- transform's input data and additional data parameters
- transform's shape convention
- randomizable transforms
- universal adaptors
System parameters are "static" information that is not data-dependent. They are known and fixed parameters before we have access to input_data.
The parameters are stored as instance variables (self.some_states) in each transform.
In a multi-processing context, the transform's instance variables are not shared among different workers. Once constructed, each worker process operates on its own states.
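The effect of per-worker state can be sketched without starting real processes. In the example below, copy.deepcopy is used as a stand-in for the copy of the transform that each worker ends up with; this is a simplification, and CountCalls is a hypothetical class, not a MONAI transform.

```python
import copy


class CountCalls:
    """Hypothetical stateful transform, used only for illustration."""

    def __init__(self, start=0):
        self.num_calls = start  # instance state set from a system parameter

    def __call__(self, input_data):
        self.num_calls += 1
        return input_data


original = CountCalls()
# stand-in for what each worker process holds: its own copy of the instance
worker_a = copy.deepcopy(original)
worker_b = copy.deepcopy(original)

worker_a([1, 2, 3])
worker_a([4, 5, 6])
worker_b([7, 8, 9])

# state changes made in one copy are invisible to the others
print(original.num_calls, worker_a.num_calls, worker_b.num_calls)  # 0 2 1
```

A practical consequence is that state accumulated inside a worker should not be expected to appear on the instance held by the main process.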
MONAI provides both
- vanilla transforms, and
- their dictionary-based counterparts.
The rationale is described here.
The main differences are in the assumptions on input_data and data_params:
Vanilla transforms should work seamlessly with NumPy ndarrays. Their call method has the form:
def __call__(self, input_data, data_params):
    # process input_data
- input_data: a multi-dimensional array,
- data_params: additional information from the data, to be used when processing the input_data. The data_params are runtime parameters; any static parameters should go into system_params in the transform's constructor.
For example, a vanilla RandRotate90 transform works as follows:
import numpy as np
from monai.transforms import RandRotate90

img = np.array((1, 2, 3, 4)).reshape((1, 2, 2))
rotator = RandRotate90(prob=0.0, max_k=3, axes=(1, 2))  # prob=0.0: never rotates
img_result = rotator(img)
print(type(img_result))
print(img_result)
# output:
# <class 'numpy.ndarray'>
# [[[1 2]
#   [3 4]]]
The vanilla transforms are located in monai/transforms/transforms.py in the codebase.
Dictionary-based transforms assume that input_data is a dictionary containing ndarrays, and that data_params is also part of the dictionary. The transform's call method therefore has the form:
def __call__(self, input_data):
    # process dict(input_data)
For example, a dictionary-based RandRotate90d transform works as follows:
import numpy as np
from monai.transforms import RandRotate90d

data = {
    'img': np.array((1, 2, 3, 4)).reshape((1, 2, 2)),
    'seg': np.array((1, 2, 3, 4)).reshape((1, 2, 2)),
    'unused': 5,
}
rotator = RandRotate90d(keys=('img', 'seg'), prob=0.8)
data_result = rotator(data)
print(data_result)
# one possible output (the rotation is random):
# {'unused': 5, 'img': array([[[4, 3],
#         [2, 1]]]), 'seg': array([[[4, 3],
#         [2, 1]]])}
These transforms are adaptors on top of the vanilla transforms, and make it easier to compose multiple transforms:
composed = monai.transforms.Compose([Transform1d(system_params),
                                     Transform2d(system_params),
                                     ...])
output_data = composed(input_data)  # input_data is a dictionary
The dictionary-based transforms are located in monai/transforms/composables.py in the codebase.
These transforms take [TransformClassName]d as the class name, indicating that it is a dictionary-based adaptor for the vanilla transform monai.transforms.transforms.TransformClassName.
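The adaptor idea can be sketched as follows. Flip, Flipd, and compose are hypothetical names written for this example, assuming only NumPy; they illustrate the wrapping and naming pattern and are not MONAI's implementation.

```python
import numpy as np


class Flip:
    """Hypothetical vanilla transform: flips an array along one axis."""

    def __init__(self, axis):
        self.axis = axis

    def __call__(self, img):
        return np.flip(img, axis=self.axis)


class Flipd:
    """Hypothetical dictionary-based adaptor around Flip."""

    def __init__(self, keys, axis):
        self.keys = keys
        self.flipper = Flip(axis)

    def __call__(self, data):
        d = dict(data)  # shallow copy so the caller's dictionary is untouched
        for key in self.keys:
            d[key] = self.flipper(d[key])
        return d


def compose(transforms):
    """Chain transforms so that each one receives the previous output."""
    def apply(data):
        for t in transforms:
            data = t(data)
        return data
    return apply


data = {
    "img": np.array([[[1, 2], [3, 4]]]),
    "seg": np.array([[[0, 1], [1, 0]]]),
    "unused": 5,
}
composed = compose([Flipd(keys=("img", "seg"), axis=1), Flipd(keys=("img",), axis=2)])
result = composed(data)
print(result["img"])     # [[[4 3]
                         #   [2 1]]]
print(result["unused"])  # 5
```

Because every dictionary-based transform accepts and returns a dictionary, the transforms chain without any glue code, and keys that are not listed simply pass through.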
Most of the pre-processing transforms assume the input ND arrays have the shape [num_channels, spatial_dims], where
- spatial_dims may have
  - 0 elements ([num_channels], e.g. classification labels),
  - 1 element ([num_channels, w], spatially 1D),
  - 2 elements ([num_channels, h, w], spatially 2D),
  - ...
  - N elements ([num_channels, d, h, w, ...], spatially ND).
- num_channels must be greater than or equal to 1 (input data with shape [spatial_dims] has to be reshaped into [1, spatial_dims] beforehand).
- each transform may or may not support all spatially ND inputs.
- the returned ND arrays from a transform should take the same shape convention.
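A short sketch of the convention, assuming only NumPy. The helper spatial_ndim is a hypothetical function written for this example, not a MONAI utility.

```python
import numpy as np

image_2d = np.zeros((64, 48))   # shape [h, w], no channel dimension yet
channel_first = image_2d[None]  # shape [1, h, w], ready for a transform


def spatial_ndim(arr):
    """Count spatial dimensions under the [num_channels, spatial_dims] convention."""
    if arr.ndim < 1 or arr.shape[0] < 1:
        raise ValueError("expected at least one channel in the first dimension")
    return arr.ndim - 1


print(channel_first.shape)          # (1, 64, 48)
print(spatial_ndim(channel_first))  # 2
print(spatial_ndim(np.array([3])))  # 0, e.g. a classification label
```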
Most of the post-processing transforms assume the input ND arrays have the shape [num_channels, spatial_dims] (updated since v0.6; previously [batch_size, num_channels, spatial_dims]).
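With a channel-first convention, a batched array can be processed one item at a time. The sketch below assumes only NumPy; argmax_channel is a hypothetical post-processing step written for this example, not a MONAI transform.

```python
import numpy as np


def argmax_channel(pred):
    """Hypothetical post-processing step for one [num_channels, spatial_dims] array."""
    return np.argmax(pred, axis=0)[None]  # keep a single leading channel


batch = np.random.RandomState(0).rand(4, 3, 8, 8)  # [batch_size, num_channels, h, w]
# apply the channel-first transform to each item rather than to the whole batch
results = [argmax_channel(item) for item in batch]
print(len(results), results[0].shape)  # 4 (1, 8, 8)
```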
MONAI provides a randomizable interface so that each transform can generate processed data subject to some random factors (often used in training data augmentation).
The interface has:
- an R variable to store the random number generator container np.random.RandomState. All derived classes should use self.R instead of np.random to generate random factors, e.g., np.random.rand() should be replaced by self.R.rand().
- a randomize() method, where all self.R related random factors are generated.
- a set_random_state method, to set the random number generator container's state.
The interface is located at monai/transforms/compose.py in the codebase.
These transforms take Rand[TransformClassName][d] as the class name, indicating that it is a randomized transform for monai.transforms.transforms.TransformClassName[d].
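The three parts of the interface can be sketched as below. This is a simplified stand-in assuming only NumPy: RandomizableSketch and RandAddOffset are hypothetical names, and the method signatures are reduced for illustration, so they may differ from the ones in the codebase.

```python
import numpy as np


class RandomizableSketch:
    """Simplified stand-in for the randomizable interface (not MONAI's class)."""

    def __init__(self):
        self.R = np.random.RandomState()

    def set_random_state(self, seed=None):
        # replace the generator so that results become reproducible
        self.R = np.random.RandomState(seed)
        return self

    def randomize(self):
        raise NotImplementedError


class RandAddOffset(RandomizableSketch):
    """Hypothetical randomized transform that adds a random offset."""

    def __init__(self, max_offset, prob=0.5):
        super().__init__()
        self.max_offset = max_offset
        self.prob = prob
        self._do_transform = False
        self._offset = 0.0

    def randomize(self):
        # all random factors are drawn here, and only from self.R
        self._do_transform = self.R.rand() < self.prob
        self._offset = self.R.uniform(-self.max_offset, self.max_offset)

    def __call__(self, img):
        self.randomize()
        return img + self._offset if self._do_transform else img


img = np.zeros((1, 2, 2))
a = RandAddOffset(max_offset=0.1, prob=1.0).set_random_state(seed=0)(img)
b = RandAddOffset(max_offset=0.1, prob=1.0).set_random_state(seed=0)(img)
print(np.array_equal(a, b))  # True
```

Drawing every random factor from self.R inside randomize() is what makes seeding through set_random_state sufficient to reproduce a run.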
For the randomized dictionary-based transforms, when len(keys) is greater than 1, we expect the values of the corresponding keys to be updated simultaneously (most of the transforms of this kind follow this pattern).
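One way to obtain this behaviour is to draw the random factors once per call and reuse them for every key. The sketch below assumes only NumPy; RandFlipdSketch is a hypothetical class written for this example, not a MONAI transform.

```python
import numpy as np


class RandFlipdSketch:
    """Hypothetical dictionary-based random flip (illustration only)."""

    def __init__(self, keys, prob=0.5, axis=1, seed=None):
        self.keys = keys
        self.prob = prob
        self.axis = axis
        self.R = np.random.RandomState(seed)
        self._do_transform = False

    def randomize(self):
        self._do_transform = self.R.rand() < self.prob

    def __call__(self, data):
        d = dict(data)
        self.randomize()  # drawn once per call, then shared by every key
        if self._do_transform:
            for key in self.keys:
                d[key] = np.flip(d[key], axis=self.axis)
        return d


data = {"img": np.arange(4).reshape(1, 2, 2), "seg": np.arange(4).reshape(1, 2, 2)}
flipper = RandFlipdSketch(keys=("img", "seg"), prob=0.5, seed=0)
for _ in range(5):
    out = flipper(data)
    # img and seg are either both flipped or both left unchanged
    assert np.array_equal(out["img"], out["seg"])
```

Keeping an image and its segmentation under the same random decision preserves their spatial correspondence, which is the usual reason for updating the keys together.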